Checkpoint/restore with GPUs
Checkpoint/restore with GPUs is currently only supported for NVIDIA GPUs.
Create an account with Cedana to get access to the GPU plugin. See .
Set the Cedana URL & authentication token in the .
Install a GPU plugin.
Option 1: GPU Plugin
The GPU plugin is Cedana's proprietary plugin for high performance GPU checkpoint/restore. If unavailable to you, check option 2.
Minimum NVIDIA driver version: 452 (API 11.8)
Maximum NVIDIA driver version: 550 (API 12.4). Newer drivers are unstable and may not work.
Minimum CRIU version: 3.0
Option 2: CRIU CUDA Plugin
Minimum NVIDIA driver version: 570 (API 12.8)
Minimum CRIU version: 4.0
Check out for a performance comparison between the two plugins.
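The driver constraints for the two options can be checked before installing a plugin. The following Python sketch is illustrative only: the function names are hypothetical, and it encodes just the major driver version bounds listed above (452 to 550 for Option 1, 570 or newer for Option 2). It assumes nvidia-smi is on the PATH when querying the installed driver, and it does not check the CRIU version.

```python
import subprocess

# Bounds copied from the requirements listed above (major driver versions).
GPU_PLUGIN_MIN, GPU_PLUGIN_MAX = 452, 550  # Option 1: GPU plugin
CRIU_CUDA_MIN = 570                        # Option 2: CRIU CUDA plugin


def driver_major(version: str) -> int:
    """Return the major component of a driver version such as '550.54.14'."""
    return int(version.strip().split(".")[0])


def compatible_options(version: str) -> list[str]:
    """List which plugin options the given driver version satisfies."""
    major = driver_major(version)
    options = []
    if GPU_PLUGIN_MIN <= major <= GPU_PLUGIN_MAX:
        options.append("GPU plugin")
    if major >= CRIU_CUDA_MIN:
        options.append("CRIU CUDA plugin")
    return options


def installed_driver_version() -> str:
    """Query the installed driver via nvidia-smi (assumes it is on PATH)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    # One line is printed per GPU; all GPUs share the same driver.
    return out.splitlines()[0].strip()


if __name__ == "__main__":
    version = installed_driver_version()
    print(version, "->", compatible_options(version) or "no listed option")
```

For example, `compatible_options("535.104.05")` returns `["GPU plugin"]`, while a 560-series driver falls outside both listed ranges and returns an empty list.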
Ensure the daemon is running, see .
Do a health check to ensure the plugin is ready, see .
Run a process with GPU support:
Checkpoint:
Restore:
NOTE: Cedana GPU checkpoint/restore is only possible for managed processes/containers, i.e., those that are spawned using cedana run --gpu-enabled (see ).
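The run, checkpoint, and restore steps above can be scripted. The Python sketch below only assembles the command lines and prints them; it does not execute anything. Of the arguments shown, only cedana run --gpu-enabled comes from this page. The process, dump job, and restore job subcommands, the --jid flag, and the helper names are assumptions made for illustration, so confirm them against cedana --help before use.

```python
import shlex

# NOTE: only "cedana run --gpu-enabled" is taken from this page. The other
# subcommands and the --jid flag are ASSUMPTIONS; verify with `cedana --help`.


def run_command(job_id: str, workload: list[str]) -> list[str]:
    """Build a command that starts a managed, GPU-enabled job (assumed syntax)."""
    return ["cedana", "run", "process", "--gpu-enabled", "--jid", job_id, "--", *workload]


def dump_command(job_id: str) -> list[str]:
    """Build a command that checkpoints the managed job (assumed syntax)."""
    return ["cedana", "dump", "job", job_id]


def restore_command(job_id: str) -> list[str]:
    """Build a command that restores the managed job (assumed syntax)."""
    return ["cedana", "restore", "job", job_id]


if __name__ == "__main__":
    job = "gpu-demo"                    # hypothetical job ID
    workload = ["python3", "train.py"]  # hypothetical GPU workload
    for argv in (run_command(job, workload), dump_command(job), restore_command(job)):
        # shlex.join quotes each argument so the line can be pasted into a shell.
        print(shlex.join(argv))
```

Keeping each command as an argument list avoids shell-quoting problems if the list is later passed to `subprocess.run`.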
You may clone the for some example GPU workloads.
You can checkpoint/restore normally as you do for CPU workloads. See .
For all available CLI options, see . Interacting directly with the daemon is also possible through gRPC, see .