Experimental GPU SMR on Kubernetes

SMR of GPU workloads in Kubernetes is still experimental! Some of the rough edges involved in setting up nodes are planned to be automated and smoothed out soon.

Unsupported Configurations

  • Ubuntu 24.04: While technically supported, we cannot guarantee functionality because it ships a newer libelf version that requires linking against a 2.38+ libc. Since we use the system libelf inside the containers, most containers may not operate correctly on this system, which effectively enforces a minimum libc version requirement on container images.

  • CUDA Versions >12.4: We officially support CUDA versions 12.0 through 12.4. Newer versions may work, but we have not tested them thoroughly. Due to the nature of our APIs, it can be difficult to determine whether issues stem from version mismatches or other factors.

  • glibc Versions <2.31: We do not support glibc versions lower than 2.31. While we plan to transition to static binaries for some components, glibc 2.31 will remain the minimum requirement for the short term. Additionally, our Kubernetes systems currently use CRIU 4.0, which also mandates at least glibc 2.31. You can quickly check your system against these constraints as shown below.
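
The commands below are a minimal sanity-check sketch, assuming a glibc-based distribution with the NVIDIA driver already installed:

# glibc version (must be 2.31 or newer)
ldd --version | head -n1

# NVIDIA driver version and the CUDA version reported by the driver
nvidia-smi --query-gpu=driver_version --format=csv,noheader
nvidia-smi | grep "CUDA Version"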

Setup Cedana Shim for GPU Support

Step 0: Get access to a K8s Node

We currently support Ubuntu 22.04 as the default; CentOS-based systems are partially supported and are being actively tested.

If you have access to EKS, start by creating a cluster with a GPU node.

# here we create a cluster with only one node, as it makes testing easier
# disable the automatic NVIDIA plugin install and install the drivers manually; automatic installs can often cause issues or break
eksctl create cluster --node-type=g4dn.xlarge --install-nvidia-plugin=false --node-ami-family=Ubuntu2204 -N 1 --ssh-access --ssh-public-key=<your-pub-keys>

# ssh into the node, install the drivers as shown below, and ensure CUDA is available

If you don't have a Kubernetes cluster but have a GPU VPS or a GPU installed on an Ubuntu Linux box:

  1. Update your drivers to match the prerequisites.

  2. Ensure you have libcuda.so on your system: ldconfig -v | grep libcuda

  3. Install Kubernetes on it.

    1. We recommend using k3sup:
       a. k3sup install --local will set up a k3s cluster for you. Note: /var/lib/rancher/k3s/data/current/bin/ contains the containerd-shim we will need to replace.
       b. Ensure you copy the export KUBECONFIG commands from the output and paste them into your .bashrc

sudo apt update
sudo apt install nvidia-driver-550 # you can list supported drivers with `sudo ubuntu-drivers list --gpgpu`

# ensure you have cuda
sudo /sbin/ldconfig -v | grep libcuda

# if you do not have it, follow https://docs.nvidia.com/cuda/cuda-installation-guide-linux/ to install the cuda-toolkit
# note: you might have to search the internet to get a cuda toolkit archive for a specific version you support
# You can get 12.4 from here: https://developer.nvidia.com/cuda-12-4-0-download-archive

# install k8s
k3sup install --local
# setup exports and source .bashrc
kubectl get node -o wide # should show the host machine as a node

# for production/HA you can connect multiple nodes together, but for testing 1 will do

Step 1: Download the Cedana Containerd Shim

First, download the Cedana fork of the containerd shim.

curl -X GET -L https://${ORGANIZATION}.cedana.ai/k8s/cedana-shim/latest \
     -H "Authorization: Bearer $CEDANA_AUTH_TOKEN" > containerd-shim-runc-v2

Step 2: Stop the Kubernetes Containerd Service

If you are using Shim v2, stop the containerd service before replacing the shim.

Note: this is not required in a test cluster, but in a production cluster ensure containerd is stopped so that no requests get misassigned or dropped, just in case.

sudo systemctl stop containerd

Step 3: Install the New Shim Binary

Move the downloaded shim to the appropriate directory.

install containerd-shim-runc-v2 /usr/bin/containerd-shim-runc-v2
# note: the path to containerd-shim-runc-v2 depends on your installation of kubernetes
# consider running `which containerd-shim-runc-v2` and installing in the provided path

Step 4: Restart the Containerd Service

Start the containerd service again.

sudo systemctl start containerd

Step 5: Using Shim v1 (if applicable)

If you are using Shim v1, replace the binary with Shim v2, and then update the containerd config:

# Update the containerd config to use the v2 runtime
# Change `io.containerd.runc.v1` to `io.containerd.runc.v2`

# Restart containerd
sudo systemctl restart containerd
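
As a rough sketch, this change usually amounts to switching the runtime_type entries in the containerd config from the v1 to the v2 runc runtime. The config path below (/etc/containerd/config.toml) is the common default and is an assumption; k3s, for example, generates its config from a template elsewhere:

# back up the config, switch any v1 runc runtime entries to v2, and restart
sudo cp /etc/containerd/config.toml /etc/containerd/config.toml.bak
sudo sed -i 's/io\.containerd\.runc\.v1/io.containerd.runc.v2/g' /etc/containerd/config.toml
sudo systemctl restart containerd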

Prerequisites

Before installing Cedana using the Helm chart, ensure that the following are installed:

  • NVIDIA base drivers and CUDA drivers (version 12.1 to 12.4)

  • nvidia-smi is available

Follow the instructions in Cedana Cluster Installation, and ensure you have Cedana set up before proceeding further.

Verify that the Cedana helper pod logs indicate a valid CUDA version and display the message "GPU Enabled."
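
As a sketch of how to verify this, assuming the helper runs as a pod in the namespace you installed the Helm chart to (the exact pod and namespace names may differ in your install):

# find the helper pod and check its logs for the CUDA version and "GPU Enabled"
kubectl get pods -n cedana-controller-system -o wide | grep helper
kubectl logs -n cedana-controller-system <helper-pod-name> | grep -iE "cuda|gpu"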

Running a Container with CUDA

Once everything is set up, you can run a container with CUDA support. Make sure to set the CEDANA_GPU environment variable in the container spec:

apiVersion: v1
kind: Pod
metadata:
  name: cuda-pod
  namespace: default
spec:
  containers:
    - name: cuda-container-test
      # note: it will also work fine if you have older cuda versions
      image: swarnimcedana/cuda-vectoradd:cuda-12.4
      tty: true
      env:
        - name: CEDANA_GPU
          value: "1"  # Value is irrelevant, but it must be set to enable GPU support

Notes:

  • Using tty: true is recommended, as the container might take a little time to start, and logs may not appear immediately if the buffer isn't flushed correctly.

  • If your program expects newline buffering, ensure that it is compatible with the delayed log output during initial startup.

Performing a Save

There are two ways to perform a save. You can either use our CLI and run cedana dump runc --id <container-id> --path <checkpoint-path-on-local-filesystem>, or you can set up an ingress/port-forward to our manager to get access to a basic set of APIs for performing checkpoint/restore.
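
For the CLI route, a minimal sketch, run directly on the node hosting the container (the runc root shown here is the containerd default used throughout this section, and the dump flags are the ones quoted above):

# list runc containers managed by containerd to find the container id
sudo runc --root /run/containerd/runc/k8s.io list

# checkpoint the container to a path on the node's local filesystem
sudo cedana dump runc --id <container-id> --path /tmp/ckpt-test

The rest of this section walks through the API route by port-forwarding the manager service: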

# namespace we installed helm chart to
$ export CEDANA_NAMESPACE="cedana-controller-system"

# run the command below in the background (note the trailing &)
$ kubectl port-forward svc/cedana-cedana-helm-manager-service -n $CEDANA_NAMESPACE 1324:1324 &

# note the root depends on your k8s setup
# the default for k8s on containerd is `/run/containerd/runc/k8s.io`
# if you have started cuda-pod it should now show up with its container id
$ export ROOT="/run/containerd/runc/k8s.io"

# now you can get the container id by listing containers in the given namespace using our API
$ curl -X GET -H "Content-Type: application/json" -d '{ "root" : "'$ROOT'" }' http://localhost:1324/list/default 

# setup variables from above information
$ export CHECKPOINT_CONTAINER=cuda-container-test
$ export CHECKPOINT_SANDBOX=cuda-pod
$ export NAMESPACE=default
# path to store checkpoint on node's local filesystem
$ export CHECKPOINT_PATH=/tmp/ckpt-test

# finally you can run a checkpoint
$ curl -X POST -H "Content-Type: application/json" -d '{
  "checkpoint_data": {
    "container_name": "'$CHECKPOINT_CONTAINER'",
    "sandbox_name": "'$CHECKPOINT_SANDBOX'",
    "namespace": "'$NAMESPACE'",
    "checkpoint_path": "'$CHECKPOINT_PATH'",
    "root": "'$ROOT'"
  }
}' http://localhost:1324/checkpoint
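
To confirm the checkpoint was written, list the checkpoint path on the node's local filesystem (this assumes you have shell access to the node):

# run on the node that hosted the container
ls -lh /tmp/ckpt-test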

Performing a Resume

For a resume, we first require creating a new container with the same image but with its root PID sleeping, so that it can be replaced by us. We plan to improve this workflow soon, but until then it is a requirement.

apiVersion: v1
kind: Pod
metadata:
  name: cuda-pod-restore
  namespace: default
spec:
  containers:
    - name: cuda-container-test
      # note: it will also work fine if you have older cuda versions
      image: swarnimcedana/cuda-vectoradd:cuda-12.4
      args: ["sh", "-c", "sleep infinity"]
      tty: true
      env:
        - name: CEDANA_GPU
          value: "1"  # Value is irrelevant, but it must be set to enable GPU support

After the restore pod is set up and running, you can attempt your restore, which should resume from the previously taken checkpoint:

# ensure pod is created and setup
$ kubectl create -f cuda-pod-restore.yaml

$ export ROOT="/run/containerd/runc/k8s.io"

# setup variables from above information
$ export RESTORE_CONTAINER=cuda-container-test
$ export RESTORE_SANDBOX=cuda-pod-restore
$ export NAMESPACE=default

# path to store checkpoint on node's local filesystem
$ export CHECKPOINT_PATH=/tmp/ckpt-test

# now you can try running the restore
curl -X POST -H "Content-Type: application/json" -d '{
  "checkpoint_data": {
    "container_name": "'$RESTORE_CONTAINER'",
    "sandbox_name": "'$RESTORE_SANDBOX'",
    "namespace": "'$NAMESPACE'",
    "checkpoint_path": "'$CHECKPOINT_PATH'",
    "root": "'$ROOT'"
  }
}' http://localhost:1324/restore
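
If the restore succeeds, the sleeping root process in cuda-pod-restore is replaced by the checkpointed workload, so the pod's logs should continue from where the original container left off:

kubectl logs cuda-pod-restore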

With these steps completed, you should be able to leverage Cedana GPU support within your Kubernetes containers.

  • We recommend using this new service as the default only for experimental purposes.

    • For other workloads, use the runtime: label in the pod spec configuration.
