Experimental GPU SMR on Kubernetes

SMR of GPU workloads in Kubernetes is still experimental! Some of the jank involved in setting up nodes is planned to be automated/smoothed out soon.

Unsupported Configurations

  • Ubuntu 24.04: While technically supported, we cannot guarantee functionality due to its newer libelf version, which requires linking against a 2.38+ glibc. Because we use the system libelf inside the containers, this enforces a minimum glibc version requirement on container images, so most containers may not run correctly on this system.

  • CUDA Versions: We officially support CUDA up to version 12.4. Newer versions may work, but we have not tested them thoroughly, and due to the nature of our APIs it can be hard to tell whether a failure stems from a version mismatch or something else.

  • glibc Versions: We do not support glibc versions lower than 2.35. While we plan to transition some components to static binaries, glibc 2.31 will remain the minimum even then. Additionally, our Kubernetes systems currently use CRIU 4.0, which also mandates at least glibc 2.35. A quick version check is shown below.
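
    • To check a node's glibc version:

    # glibc reports its version on the first line of `ldd --version`;
    # it must be 2.35 or newer (see above)
    ldd --version | head -n 1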

Prerequisites

  • GPU Support: Ensure that the CUDA toolkit and NVIDIA drivers are properly installed.

    • Verify that the NVIDIA binaries are on your PATH and the libraries are visible to ldconfig.

    # Check for nvidia-smi and nvcc in PATH
    nvidia-smi --version
    nvcc --version
    
    # Typically, CUDA binaries are installed at:
    export PATH=$PATH:/usr/local/cuda/bin
    
    # Check the ldconfig cache for the CUDA driver library
    /sbin/ldconfig -p | grep libcuda # This should display libcuda.so
    
    # Alternatively, add to LD_LIBRARY_PATH
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64

    • Note: Most setups should work out of the box if CUDA was installed via system packages. If the paths are not picked up, consider rebooting or sourcing $HOME/.profile.

  • Container Runtime: Ensure your setup uses runc. We currently do not support crun or other runtimes.
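
    • To confirm which runtime containerd currently uses, dump the merged config (a quick sanity check):

    # Should report "runc" on a stock node (or "cedana" after the config
    # changes described below)
    sudo containerd config dump | grep default_runtime_name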

  • Installing Cedana: With the node set up, you can install Cedana via the Helm chart (see instructions; a hedged sketch follows below). The Cedana DaemonSet will handle the installation of necessary packages and enable GPU support if nvidia-smi is found in the PATH.

    • The DaemonSet will also download and install any additional packages and libraries required.
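
    • A minimal install sketch; the repository URL and chart name below are placeholders, so follow the linked Helm instructions for the real values:

    # Hypothetical repo URL and chart name; see the official instructions
    helm repo add cedana https://example.com/cedana-helm
    helm install cedana cedana/cedana \
        --namespace cedana-system --create-namespace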

  • Download the Shim File: Retrieve the shim file using the command below:

    curl -H "Authorization: Bearer CEDANA_AUTH_TOKEN" -o \
        containerd-shim-runc-v2-cedana "${CEDANA_URL}/k8s/cedana-shim"
    • Make sure to set the correct permissions:

    chmod 755 containerd-shim-runc-v2-cedana
    sudo install containerd-shim-runc-v2-cedana /usr/local/bin
    • You have two options for integrating the shim:

      1. Add a Separate Runtime: Configure it in containerd’s config.toml (usually found at /etc/containerd/config.toml). If you are unsure which file containerd actually loads, check the containerd unit’s journalctl logs for the config argument it was started with.

      2. Replace the Existing Binary: This is the simplest approach for quick setup, though it may disrupt some containers. We recommend it for initial experiments; a sketch follows below.
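
      For example, to swap in the Cedana shim in place (a sketch; the stock shim's path varies by install, so locate it first):

      # Locate the stock shim, keep a backup, then install the Cedana
      # shim under the same name
      SHIM="$(which containerd-shim-runc-v2)"
      sudo cp "$SHIM" "$SHIM.bak"
      sudo install containerd-shim-runc-v2-cedana "$SHIM"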

Making Containerd Config Changes

  • If adding a new runtime:

    • Name the new runtime to match the shim binary you installed in /usr/local/bin: containerd resolves the runtime type io.containerd.runc.v2-cedana to the binary containerd-shim-runc-v2-cedana.

    • Set this new runtime as the default and specify the runtime type as io.containerd.runc.v2-cedana.

    • Restart the containerd service (commands shown after the example config below).

    version = 3
    
    [plugins."io.containerd.cri.v1.runtime".containerd]
      default_runtime_name = "cedana" # the default is "runc"; only set this
      # to "cedana" for experimental tests
    
      [plugins."io.containerd.cri.v1.runtime".containerd.runtimes]
        # Cedana: Custom runtime using the Cedana shim
        [plugins."io.containerd.cri.v1.runtime".containerd.runtimes.cedana]
          runtime_type = "io.containerd.runc.v2-cedana"
          [plugins."io.containerd.cri.v1.runtime".containerd.runtimes.cedana.options]
            SystemdCgroup = true
    
        # Other runtimes can be defined here if needed
        # Example:
        # [plugins."io.containerd.cri.v1.runtime".containerd.runtimes.crun]
        #   runtime_type = "io.containerd.runc.v2"
        #   [plugins."io.containerd.cri.v1.runtime".containerd.runtimes.crun.options]
        #     BinaryName = "/usr/local/bin/crun"
    
        # gVisor: https://gvisor.dev/
        # [plugins."io.containerd.cri.v1.runtime".containerd.runtimes.gvisor]
        #   runtime_type = "io.containerd.runsc.v1"
    
        # Kata Containers: https://katacontainers.io/
        # [plugins."io.containerd.cri.v1.runtime".containerd.runtimes.kata]
        #   runtime_type = "io.containerd.kata.v2"
    
    [debug]
      level = "debug"

Usage Flow

  • You can now start new workloads, which will have the LD_PRELOAD and required mounts configured automatically.

  • Due to forced LD_PRELOAD and mounts, there may be compatibility issues with some containers.

  • We recommend using this new service as the default only for experimental purposes.

    • For other workloads, select the runtime explicitly in the pod spec, e.g. via a RuntimeClass and the runtimeClassName field (see the sketch below).
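
    • A minimal sketch of pinning one pod to the Cedana runtime via a RuntimeClass. It assumes the runtime was registered as "cedana" in config.toml; the pod itself is a hypothetical smoke test:

    cat <<'EOF' | kubectl apply -f -
    apiVersion: node.k8s.io/v1
    kind: RuntimeClass
    metadata:
      name: cedana
    handler: cedana             # must match the runtime name in config.toml
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: cuda-smoke-test     # hypothetical test workload
    spec:
      runtimeClassName: cedana
      restartPolicy: Never
      containers:
        - name: cuda
          image: nvidia/cuda:12.4.1-base-ubuntu22.04
          command: ["nvidia-smi"]
          resources:
            limits:
              nvidia.com/gpu: 1
    EOF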
