Amazon S3

Checkpointing to and restoring from Amazon S3 works the same way as with local storage.

Both general-purpose and directory S3 buckets are supported. Directory buckets offer better performance at the cost of lower availability. See Amazon S3 Storage Classes for more details.

Prerequisites

  1. Create an account with Cedana to get access to the storage/s3 plugin. See authentication.

  2. Set the Cedana URL & authentication token in the configuration.

  3. Install the storage/s3 plugin with sudo cedana plugin install storage/s3.

  4. Set AWS credentials in the configuration.

  5. Ensure the daemon is running (see installation).

  6. Run a health check to confirm the plugin is ready (see health checks).

Checkpoint

To checkpoint to an S3 bucket, set --dir to a path that starts with s3://<bucket>, for example:

cedana dump ... --dir s3://my-bucket/path/to/dir

For example, as explained in managed checkpoint/restore, to checkpoint a job to S3:

cedana dump job my-job-1 --dir s3://checkpoints-bucket

If you run cedana job list, you will see the latest checkpoint:

ID            TIME                 SIZE     PATH
my-job-1      2025-02-19 12:30:36  -        s3://checkpoints-bucket/dump-job.tar
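
Since the destination is chosen by the s3://<bucket> prefix on --dir, a wrapper script may want to check that value before calling cedana. The following is a minimal sketch, assuming a hypothetical helper named is_s3_dir (it is not part of Cedana). The dump command is only echoed, so the snippet runs without a daemon:

```shell
#!/bin/sh
# Hypothetical helper (not part of Cedana): succeeds only if the argument
# has the form s3://<bucket> or s3://<bucket>/<prefix>.
is_s3_dir() {
    case "$1" in
        s3://?*)
            rest=${1#s3://}      # drop the scheme
            bucket=${rest%%/*}   # text before the first slash
            [ -n "$bucket" ]     # reject an empty bucket name
            ;;
        *)
            return 1
            ;;
    esac
}

DIR="s3://checkpoints-bucket"
if is_s3_dir "$DIR"; then
    # Echo instead of executing; remove the echo to run the real command.
    echo "cedana dump job my-job-1 --dir $DIR"
else
    echo "not an S3 path: $DIR" >&2
fi
```

The check is purely syntactic: it does not confirm that the bucket exists or that the configured AWS credentials can write to it.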

Restore

Similarly, to restore from an S3 bucket, set --path to your checkpoint path in S3.

As explained in managed checkpoint/restore, a job can also be restored by its job ID. Restoring my-job-1 this way automatically uses its latest checkpoint, which is stored in S3.
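
The two restore forms can be sketched as follows. This is a dry run that only prints the command lines. The restore job form is an assumption made by analogy with cedana dump job above and is not confirmed by this page. The checkpoint path is the one shown in the PATH column of the job list, and "..." stands for the same elided arguments as in the dump example:

```shell
#!/bin/sh
# Dry-run sketch: these functions only print a command line, so the
# snippet runs without Cedana installed.

# Restore from an explicit checkpoint path in S3 (uses the --path flag).
restore_path_cmd() {
    printf 'cedana restore ... --path %s\n' "$1"
}

# ASSUMPTION: "cedana restore job <id>" mirrors "cedana dump job <id>".
restore_job_cmd() {
    printf 'cedana restore job %s\n' "$1"
}

restore_path_cmd "s3://checkpoints-bucket/dump-job.tar"
restore_job_cmd "my-job-1"
```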

Compression

All compression algorithms supported for basic checkpoint/restore are supported. See compression for more information.

For better performance when checkpointing or restoring large processes or containers remotely, especially with GPUs, always use compression. The lz4 compression algorithm is a good compromise between speed and compression ratio.

Streaming

The storage/s3 plugin also supports high-performance, low-overhead streaming of checkpoints. Follow the instructions in checkpoint/restore streamer to use streaming with this plugin.

Enable by default

To checkpoint to S3 by default, set the Checkpoint.Dir field in the configuration to a path that starts with s3://.
