Amazon S3

Checkpointing to and restoring from Amazon S3 works the same way as it does with local storage.

Both general-purpose and directory S3 buckets are supported. Directory buckets offer better performance at the cost of lower availability. See Amazon S3 Storage Classes for more details.

Prerequisites

  1. Create an account with Cedana to get access to the GPU plugin. See authentication.

  2. Set the Cedana URL and authentication token in the configuration.

  3. Install the storage/s3 plugin with sudo cedana plugin install storage/s3.

  4. Set AWS credentials in the configuration.

  5. Ensure the daemon is running (see installation).

  6. Run a health check to confirm the plugin is ready (see health checks).

Checkpoint

To checkpoint to an S3 bucket, set --dir to a path that starts with s3://<bucket>. For example:

cedana dump ... --dir s3://my-bucket/path/to/dir

To checkpoint a managed job to S3 (see managed checkpoint/restore):

cedana dump job my-job-1 --dir s3://checkpoints-bucket

If you run cedana job list, you will see the latest checkpoint:

ID            TIME                 SIZE     PATH
my-job-1      2025-02-19 12:30:36  -        s3://checkpoints-bucket/dump-job.tar
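Scripts that assemble the --dir value from separate parts can normalize it first, so that it always has the s3://<bucket> form described above. The following is a minimal sketch, not part of Cedana: the function name s3_dir is hypothetical, and the cedana dump arguments in the usage comment are elided exactly as in the example above.

```shell
#!/bin/sh
# Hypothetical helper (not provided by Cedana): build an s3:// path from a
# bucket name and an optional key prefix, for use as the --dir value.
s3_dir() {
    bucket=$1
    prefix=${2:-}
    if [ -z "$bucket" ]; then
        echo "s3_dir: bucket name is required" >&2
        return 1
    fi
    # Strip a single leading and a single trailing slash from the prefix.
    prefix=${prefix#/}
    prefix=${prefix%/}
    if [ -n "$prefix" ]; then
        printf 's3://%s/%s\n' "$bucket" "$prefix"
    else
        printf 's3://%s\n' "$bucket"
    fi
}

# Usage:
#   cedana dump ... --dir "$(s3_dir my-bucket path/to/dir)"
```

Keeping the path construction in one place avoids accidental double slashes or a missing s3:// scheme when the bucket and prefix come from different configuration sources.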

Restore

Similarly, to restore from an S3 bucket, set --path to your checkpoint path in S3. For example:

cedana restore ... --path s3://my-bucket/path/to/dump.tar

To restore a managed job from S3 (see managed checkpoint/restore):

cedana restore job --attach my-job-1

This automatically restores from the latest checkpoint for my-job-1, which is stored in S3.
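When restoring from an explicit --path, it can help to confirm that the checkpoint object exists before starting the restore. The sketch below assumes the AWS CLI is installed and configured separately from Cedana. The helper names s3_bucket, s3_key, and checkpoint_exists are hypothetical. Only aws s3api head-object is an existing AWS CLI command.

```shell
#!/bin/sh
# Hypothetical helpers (not provided by Cedana): split an s3:// path into its
# bucket and object key.
s3_bucket() {
    rest=${1#s3://}
    printf '%s\n' "${rest%%/*}"
}

s3_key() {
    rest=${1#s3://}
    case $rest in
        */*) printf '%s\n' "${rest#*/}" ;;
        *)   ;;  # bucket only, no key: print nothing
    esac
}

# Succeeds if the object exists and is readable with the current AWS CLI
# credentials (assumption: the AWS CLI is installed and configured).
checkpoint_exists() {
    aws s3api head-object \
        --bucket "$(s3_bucket "$1")" \
        --key "$(s3_key "$1")" >/dev/null 2>&1
}

# Usage:
#   path=s3://my-bucket/path/to/dump.tar
#   checkpoint_exists "$path" && cedana restore ... --path "$path"
```

This check uses the AWS CLI's own credentials, which may differ from the AWS credentials set in the Cedana configuration, so a passing check does not guarantee that the plugin can read the object.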

Compression

All compression algorithms available for basic checkpoint/restore are also supported with S3. See compression for more information.

When checkpointing or restoring large processes or containers remotely, especially with GPUs, always use compression for better performance. The lz4 compression algorithm is a good compromise between speed and compression ratio.

Streaming

The storage/s3 plugin also supports high-performance, low-overhead streaming of checkpoints. Follow the instructions in checkpoint/restore streamer to use streaming with this plugin.

Enable by default

To enable streaming by default, set the Checkpoint.Dir field in the configuration to a path that starts with s3://.
