Checkpoint/restore with GPUs
Prerequisites
Create an account with Cedana, to get access to the GPU plugin. See authentication.
Set the Cedana URL & authentication token in the configuration.
Install the GPU plugin with
sudo cedana plugin install gpu
.Ensure the daemon is running, see installation.
Do a health check to ensure the plugin is ready, see health checks.
Usage
NOTE: GPU checkpoint/restore is only possible for managed processes/containers, i.e., those that are spawned using cedana run
(see managed process/container).
You may clone the cedana-samples repository for some example GPU workloads.
Run a process with GPU support:
Checkpoint:
Restore:
For all available CLI options, see CLI reference. Directly interacting with daemon is also possible through gRPC, see API reference.
Last updated
Was this helpful?