Cedana Storage
Cedana Storage is a global storage service for checkpoints, backed by multiple cloud providers for low latency and high availability. It is the fastest way to get started with remote checkpoint/restore, since you only need to be authenticated with Cedana.
If you're using Cedana on an Amazon EKS cluster, you'll likely get higher performance using Amazon S3. Similarly, if you're using Cedana on a GKE cluster, you'll likely get higher performance using Google Cloud Storage.
Prerequisites
Create an account with Cedana to get access to the GPU plugin. See authentication.
Set the Cedana URL & authentication token in the configuration.
Install the storage/cedana plugin with:
sudo cedana plugin install storage/cedana
Ensure the daemon is running; see installation.
Run a health check to ensure the plugin is ready; see health checks.
Checkpoint
To checkpoint to Cedana Storage, set --dir to a path of the form cedana://<path>. For example:
cedana dump ... --dir cedana://path/to/dir
As explained in managed checkpoint/restore, to checkpoint a job to Cedana Storage:
cedana dump job my-job-1 --dir cedana://my-checkpoints
If you run cedana job list, you will see the latest checkpoint:
ID        TIME                 SIZE  PATH
my-job-1  2025-02-19 12:30:36  -     cedana://my-checkpoints/dump-job.tar
Restore
Similarly, to restore from Cedana Storage, set --path to your checkpoint path in Cedana Storage. For example:
cedana restore ... --path cedana://path/to/dump.tar
As explained in managed checkpoint/restore, to restore a job from Cedana Storage:
cedana restore job --attach my-job-1
This automatically restores from the latest checkpoint for my-job-1, which is stored in Cedana Storage.
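The checkpoint and restore commands above can be combined into a small script. The following is a minimal sketch, not an official tool: cedana_uri is a hypothetical helper introduced here for illustration, and the job name and checkpoint directory are the ones used in the examples above. The script only prints the commands (a dry run) so it can be inspected before anything is executed.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Cedana CLI): turn a path into a
# cedana://<path> URI, stripping any leading slashes first.
cedana_uri() {
    path=$(printf '%s' "$1" | sed 's|^/*||')
    printf 'cedana://%s\n' "$path"
}

JOB_ID="my-job-1"                        # job name from the examples above
CKPT_DIR=$(cedana_uri "my-checkpoints")  # checkpoint directory in Cedana Storage

# Dry run: print the commands instead of running them.
# Remove the leading `echo` to execute them against a running daemon.
echo cedana dump job "$JOB_ID" --dir "$CKPT_DIR"
echo cedana restore job --attach "$JOB_ID"
```

Keeping the URI construction in one place makes it harder to pass a local path by accident when the intent is to write to Cedana Storage.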
Compression
All compression algorithms supported for basic checkpoint/restore are supported. See compression for more information.
Streaming
High-performance, low-overhead streaming of checkpoints is also supported by the storage/cedana plugin. Follow the instructions in checkpoint/restore streamer to use streaming with this plugin.
Enable by default
To use Cedana Storage by default, set the Checkpoint.Dir field in the configuration to a path that starts with cedana://.