Cedana
Cedana Daemon
Checkpoint/restore with GPUs


Last updated 7 days ago

Checkpoint/restore with GPUs is currently only supported for NVIDIA GPUs.

Prerequisites

  1. Create an account with Cedana to get access to the GPU plugin. See authentication.

  2. Set the Cedana URL and authentication token in the configuration.

  3. Install a GPU plugin.

  • Option 1: GPU Plugin

    sudo cedana plugin install gpu

    The GPU plugin is Cedana's proprietary plugin for high-performance GPU checkpoint/restore. If it is unavailable to you, use option 2.

    • Minimum NVIDIA driver version: 452 (CUDA API 11.8)

    • Maximum NVIDIA driver version: 550 (CUDA API 12.4). Newer drivers are unstable and may not work.

    • Minimum CRIU version: 3.0

  • Option 2: CRIU CUDA Plugin

    sudo cedana plugin install criu/cuda

    • Minimum NVIDIA driver version: 570 (CUDA API 12.8)

    • Minimum CRIU version: 4.0

    Check out Cedana vs. CRIU CUDA for GPU Checkpoint/Restore for a performance comparison between the two plugins.

  4. Ensure the daemon is running; see installation.

  5. Run a health check to ensure the plugin is ready; see health checks.
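
The driver bounds listed under the two plugin options can be turned into a quick compatibility check. The sketch below is illustrative only: `pick_plugin` is a hypothetical helper, not part of the Cedana CLI, and it compares only the driver's major version against the numbers given above. It does not check the CRIU version or whether your account has access to the GPU plugin.

```shell
# Hypothetical helper (not a cedana command): map an NVIDIA driver version
# to the plugin whose documented driver range covers it.
pick_plugin() {
  major="${1%%.*}"                      # "550.54.15" -> "550"
  case "$major" in
    ''|*[!0-9]*) echo "unknown"; return 1 ;;   # empty or not a number
  esac
  if [ "$major" -ge 570 ]; then
    echo "criu/cuda"                    # meets the CRIU CUDA plugin minimum
  elif [ "$major" -ge 452 ] && [ "$major" -le 550 ]; then
    echo "gpu"                          # inside the GPU plugin's driver range
  else
    echo "none"                         # outside both documented ranges
  fi
}
```

For example, `pick_plugin "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"` prints `gpu`, `criu/cuda`, or `none` for the first GPU's driver.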

Usage (GPU plugin)

  1. Run a process with GPU support:

    cedana run process --attach --gpu-enabled --jid <job_id> -- cedana-samples/gpu_smr/vector_add

  2. Checkpoint:

    cedana dump job <job_id>

  3. Restore:

    cedana restore job --attach <job_id>
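
The three steps above share a single job ID, so it can help to generate the commands together. The following is a minimal sketch: `gpu_cr_commands` is a hypothetical helper that only prints the commands shown above with the job ID and workload filled in; it does not run them. It is assumed here that the checkpoint is issued from a second terminal while the attached process is still running.

```shell
# Hypothetical helper: print the run, checkpoint, and restore commands
# for one job ID. Nothing is executed; copy the lines into your terminals.
gpu_cr_commands() {
  jid="$1"          # job ID shared by all three commands
  shift             # remaining arguments: the workload and its arguments
  printf 'cedana run process --attach --gpu-enabled --jid %s -- %s\n' "$jid" "$*"
  printf 'cedana dump job %s\n' "$jid"
  printf 'cedana restore job --attach %s\n' "$jid"
}
```

For example, `gpu_cr_commands vecadd-1 cedana-samples/gpu_smr/vector_add` prints the three commands for a job named `vecadd-1` (an arbitrary example ID).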

Usage (CRIU CUDA plugin)

NOTE: Cedana GPU checkpoint/restore is only possible for managed processes/containers, i.e., those spawned using cedana run --gpu-enabled or managed using cedana manage --gpu-enabled (see managed process/container).

You may clone the cedana-samples repository for some example GPU workloads.

You can checkpoint/restore as you normally do for CPU workloads. See checkpoint/restore basics.

For all available CLI options, see the CLI reference. Interacting directly with the daemon is also possible through gRPC; see the API reference.

managed process/container
cedana-samples repository
checkpoint/restore basics
CLI reference
API reference
authentication
configuration
installation
health checks
Cedana vs. CRIU CUDA for GPU Checkpoint/Restore