Checkpoint/restore basics

The Cedana daemon is designed to checkpoint/restore processes as well as containers.

Checkpoint

To checkpoint:

cedana dump <type> ...

Where <type> can be process, containerd, runc, etc. See feature matrix for all plugins that support checkpointing.

For example, to checkpoint a process:

cedana dump process <PID> --dir /tmp

A --dir flag can be used to specify the parent directory where the checkpoint will be stored. If not provided, the checkpoint will be stored in the default checkpoint directory as specified in the configuration, or in /tmp if not set. You may also specify a --name flag to give a custom name to the checkpoint file.

See CLI reference for all available options for process checkpoint.

Restore

To restore:

cedana restore <type> ...

Where <type> can be process, containerd, runc, etc. See feature matrix for all plugins that support restoring.

For example, to restore a process:

cedana restore process --path <path-to-dump>

Notice that for restore the flag is called --path instead of --dir (as in dump), this is because it can be a path to a compressed file, or to a directory if not compressed.

See CLI reference for all available options for process restore.

Managed checkpoint/restore

As explained in managed process/container, a job can be of any type, and thus can be checkpointed and restored using the cedana dump job and cedana restore job subcommands.

The cedana dump/restore job subcommands have the same options as their non-managed counterparts, but with pretty good defaults. For e.g., the --path flag is not required for cedana restore job, as the checkpoint path is stored in the job metadata.

If you do cedana job list after checkpointing a job, you will see the latest checkpoint time and size:

JOB             TYPE       PID  STATUS  GPU  CHECKPOINT     SIZE     LOG
famous_hopper7  process  32675  halted  no   3 seconds ago  610 KiB

To view all checkpoints for a job, use cedana job checkpoints <job_id>:

ID                                    TIME                 SIZE     PATH
141d52b4-0d1f-4911-a0da-abfab3358d16  2025-02-19 12:32:01  586 KiB  /tmp/dump-process-famous_hopper7-1739986321.tar
386dcce4-a29d-4acb-ab03-12d41b7c42ce  2025-02-19 12:30:36  610 KiB  /tmp/dump-process-famous_hopper7-1739986236.tar

Compression

The cedana dump subcommand supports a --compression flag to specify the compression algorithm to use. For example:

cedana dump process <PID> --dir /tmp --name xyz --compression gzip

This will create a compressed checkpoint file with the path /tmp/xyz.tar.gz. The --name flag is optional, and if not provided, the daemon will choose a unique name based on some metadata.

When restoring, the daemon will automatically detect the compression algorithm used and decompress the file. Simply provide the path to the compressed file:

cedana restore process --path /tmp/xyz.tar.gz

Supported values for --compression are none, tar, gzip, lz4, zlib.

You may also specify the default compression algorithm in the configuration.

Advanced

Last updated

Was this helpful?