SLURM Setup

Setup

This guide walks you through installing and setting up the Cedana checkpoint/restore plugin for SLURM. Perform these steps on every SLURM controller and compute node in your cluster.

Installing our plugin

Automated Installation

First, download and run the installation script. The script installs the Cedana agent and all the dependencies the plugin needs to function correctly.

For now, you can install the daemon either from source or from the released binaries.

Prerequisites

Since Cedana depends on CRIU, you will need to ensure its dependencies are installed.

Using apt (Debian/Ubuntu)

apt-get install -y libnet-dev libprotobuf-c-dev libnl-3-dev libbsd-dev libcap-dev libseccomp-dev libgpgme11-dev libnftables1

Using dnf/yum (Fedora/CentOS)

yum install -y libnet-devel protobuf-c-devel libnl3-devel libbsd-devel libcap-devel libseccomp-devel gpgme-devel nftables-devel
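
On a mixed cluster it can help to pick the dependency list automatically. The following is a minimal sketch, assuming the Debian-style -dev package names belong with apt-get and the -devel names with dnf/yum. The function names deps_install_cmd and detect_pkg_manager are hypothetical helpers, not part of Cedana.

```shell
#!/bin/sh
# Sketch: choose the CRIU dependency list that matches the host's package manager.
# Package names are the ones listed in this guide; helper names are hypothetical.

APT_PKGS="libnet-dev libprotobuf-c-dev libnl-3-dev libbsd-dev libcap-dev libseccomp-dev libgpgme11-dev libnftables1"
RPM_PKGS="libnet-devel protobuf-c-devel libnl3-devel libbsd-devel libcap-devel libseccomp-devel gpgme-devel nftables-devel"

# Print the install command for the given package manager name.
deps_install_cmd() {
    case "$1" in
        apt-get) echo "apt-get install -y $APT_PKGS" ;;
        dnf|yum) echo "$1 install -y $RPM_PKGS" ;;
        *) echo "unsupported package manager: $1" >&2; return 1 ;;
    esac
}

# Print the first supported package manager found on PATH.
detect_pkg_manager() {
    for pm in apt-get dnf yum; do
        if command -v "$pm" >/dev/null 2>&1; then
            echo "$pm"
            return 0
        fi
    done
    return 1
}

# Usage (run as root):
#   pm=$(detect_pkg_manager) && eval "$(deps_install_cmd "$pm")"
```

The helpers only print the command, so you can review it before running it as root.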

Build from source

Build

make cedana

Install

make install

Build and install (with all plugins)

make all

Try make help to see all available targets.

Download from releases

Download the latest release from the releases page. The example below uses v0.9.245:

curl -L -o cedana.tar.gz https://github.com/cedana/cedana/releases/download/v0.9.245/cedana-amd64.tar.gz
tar -xzvf cedana.tar.gz
chmod +x cedana
mv cedana /usr/local/bin/cedana
rm cedana.tar.gz
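
If you want to pin or change the version, the download URL can be built from a variable and the result checked afterwards. This is a sketch only: it assumes other releases follow the same URL pattern as the v0.9.245 link above, and release_url and check_installed are hypothetical helper names.

```shell
#!/bin/sh
# Assumption: every release uses the same asset path as the v0.9.245 link above.
CEDANA_VERSION="${CEDANA_VERSION:-v0.9.245}"

# Print the amd64 tarball URL for a given release tag.
release_url() {
    echo "https://github.com/cedana/cedana/releases/download/$1/cedana-amd64.tar.gz"
}

# Print where a binary resolves on PATH, or fail if it is missing.
check_installed() {
    path=$(command -v "$1") || { echo "$1: not found on PATH" >&2; return 1; }
    echo "$1 -> $path"
}

# Usage:
#   curl -L -o cedana.tar.gz "$(release_url "$CEDANA_VERSION")"
#   ...extract and move the binary as shown above...
#   check_installed cedana
```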

Install CRIU

A modified version of CRIU is shipped as a plugin for Cedana, so you don't need to install it separately. Install it with:

sudo cedana plugin install criu

This version of CRIU is not a requirement for Cedana, but it is recommended for certain features, such as the checkpoint/restore streamer.

To install CRIU independently, see the CRIU installation guide.

Start the daemon

You can directly start the daemon with:

sudo cedana daemon start

If you use systemd and built Cedana from source, you can also install it as a service:

make install-systemd

Health check the daemon

Run a health check to confirm that the daemon supports your system and is ready to accept requests. See health checks for more information.


Enable and Start the Service

After installation, enable the Cedana service so that it starts automatically on system boot, then start it to activate it immediately.

Run the following commands:

# Enable the service to start on boot
sudo systemctl enable cedana

# Start the service now
sudo systemctl start cedana
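
To confirm that both commands took effect, one option is a small check like the sketch below. It relies only on systemctl is-enabled and systemctl is-active. The function name service_ready and its optional second argument (which lets you substitute a stub for systemctl) are illustrative, not part of Cedana.

```shell
#!/bin/sh
# Succeed only if the unit is both enabled at boot and currently active.
# $1: unit name; $2: systemctl command to use (defaults to systemctl).
service_ready() {
    unit="$1"
    ctl="${2:-systemctl}"
    "$ctl" is-enabled --quiet "$unit" && "$ctl" is-active --quiet "$unit"
}

# Usage:
#   service_ready cedana && echo "cedana is enabled and running"
```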

Once these steps are completed on all nodes, the Cedana plugin will be installed and running in your cluster.
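
Because every node needs the service, a cluster-wide check can be useful. The sketch below is one possible approach, assuming passwordless SSH from the machine you run it on and using sinfo to list compute nodes (controller nodes may need to be added by hand). The function name failed_nodes and the RUNNER variable are hypothetical.

```shell
#!/bin/sh
# Read node names from stdin and print each node where the cedana unit is not active.
# RUNNER defaults to ssh; override it to use a different remote-execution command.
failed_nodes() {
    runner="${RUNNER:-ssh}"
    sort -u | while read -r node; do
        [ -n "$node" ] || continue
        # </dev/null keeps the remote command from consuming the node list on stdin.
        "$runner" "$node" systemctl is-active --quiet cedana </dev/null || echo "$node"
    done
}

# Usage (lists compute nodes known to SLURM, one per line):
#   sinfo -h -N -o "%N" | failed_nodes
```

An empty result means the service reported active on every node that was checked.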
