Installation
Setup
This guide will walk you through the installation and setup of the Cedana checkpoint/restore plugin for SLURM.
These steps must be performed on every SLURM controller and compute node in your cluster. We're rolling out support soon to automate installation on your cluster, as we already do for Kubernetes.
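Until that automation ships, one way to repeat the steps is to loop over the nodes from a host that has the Slurm client tools. This is a sketch under stated assumptions: it assumes SSH access to every node, lists nodes with sinfo, and the helper names (unique_nodes, slurm_nodes, run_on_all_nodes) are hypothetical. The controller host may not appear in sinfo output, so handle it separately.

```shell
#!/usr/bin/env bash
# Sketch: run a setup command on every node Slurm knows about.
# Assumptions: Slurm client tools on this host, SSH access to each node.
# The controller (slurmctld host) may not be listed by sinfo; handle it separately.

# Read node names (one per line) on stdin; drop blanks and print each name once.
unique_nodes() {
  sed '/^$/d' | sort -u
}

# -N: one line per node (per partition), -h: no header, -o '%N': node name only.
slurm_nodes() {
  sinfo -N -h -o '%N' | unique_nodes
}

# Hypothetical helper: run the given command on each node over SSH.
run_on_all_nodes() {
  local node
  for node in $(slurm_nodes); do
    echo "== ${node}"
    ssh "${node}" "$@"
  done
}
```

For example, assuming passwordless sudo on the nodes, run_on_all_nodes sudo cedana plugin install criu gpu would repeat the plugin step on each listed node.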
Installing Cedana
Prerequisites
Cedana depends on a few packages that must be installed on each node. Install them with your distribution's package manager:
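If you provision nodes with a script, a small wrapper can pick the right package set automatically. This is a sketch: the package lists are copied from the commands in this section, the function names are hypothetical, and it simply checks for dnf, then yum, then apt-get. Run install_prereqs as root.

```shell
#!/usr/bin/env bash
# Sketch: install Cedana's prerequisites with whichever package manager is present.
# Package lists are taken verbatim from this guide; run as root.

# Print the package list for a given package manager; fail for unknown ones.
prereq_packages() {
  case "$1" in
    dnf|yum)
      echo "libnet-devel protobuf-c-devel libnl3-devel libbsd-devel libcap-devel libseccomp-devel gpgme-devel nftables-devel" ;;
    apt-get)
      echo "libnet-dev libprotobuf-c-dev libnl-3-dev libbsd-dev libcap-dev libseccomp-dev libgpgme11-dev libnftables1" ;;
    *)
      return 1 ;;
  esac
}

# Print the first supported package manager found on PATH.
detect_pkg_manager() {
  local pm
  for pm in dnf yum apt-get; do
    if command -v "$pm" >/dev/null 2>&1; then
      echo "$pm"
      return 0
    fi
  done
  return 1
}

install_prereqs() {
  local pm pkgs
  pm="$(detect_pkg_manager)" || { echo "no supported package manager found" >&2; return 1; }
  pkgs="$(prereq_packages "$pm")"
  # Unquoted $pkgs is intentional: each package becomes a separate argument.
  # shellcheck disable=SC2086
  "$pm" install -y $pkgs
}
```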
Using dnf/yum (Fedora/CentOS)
yum install -y libnet-devel protobuf-c-devel libnl3-devel libbsd-devel libcap-devel libseccomp-devel gpgme-devel nftables-devel
Using apt (Debian/Ubuntu)
apt-get install -y libnet-dev libprotobuf-c-dev libnl-3-dev libbsd-dev libcap-dev libseccomp-dev libgpgme11-dev libnftables1
Getting Cedana
You can either download a published release or build from source. To download and install a release (v0.9.245 in this example):
curl -L -o cedana.tar.gz https://github.com/cedana/cedana/releases/download/v0.9.245/cedana-amd64.tar.gz
tar -xzvf cedana.tar.gz
chmod +x cedana
mv cedana /usr/local/bin/cedana
rm cedana.tar.gz
See our daemon repo for information on building from source.
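To pin a different release, you can parameterize the download. This sketch assumes that other versions (and any architecture other than amd64) follow the same URL pattern as v0.9.245, and that the tarball contains a single cedana binary at its top level, as the commands above imply. CEDANA_VERSION, cedana_release_url, and install_cedana are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: download and install a pinned Cedana release.
# Assumption: every release uses the same URL layout as v0.9.245.

CEDANA_VERSION="${CEDANA_VERSION:-v0.9.245}"

# Build the release tarball URL; arch defaults to amd64 (other arches are an assumption).
cedana_release_url() {
  local version="$1" arch="${2:-amd64}"
  echo "https://github.com/cedana/cedana/releases/download/${version}/cedana-${arch}.tar.gz"
}

# Runs in a subshell so set -e and the cleanup trap stay local to this function.
install_cedana() (
  set -euo pipefail
  tmp="$(mktemp -d)"
  trap 'rm -rf "$tmp"' EXIT
  # -f: fail on HTTP errors instead of saving an error page as the tarball.
  curl -fL -o "$tmp/cedana.tar.gz" "$(cedana_release_url "$CEDANA_VERSION")"
  tar -xzf "$tmp/cedana.tar.gz" -C "$tmp"
  # install copies the binary and sets its mode in one step (replaces chmod + mv).
  sudo install -m 0755 "$tmp/cedana" /usr/local/bin/cedana
)
```

Usage: CEDANA_VERSION=v0.9.245 install_cedana. The temporary directory is removed even if a step fails, which the plain rm at the end of the manual commands does not guarantee.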
Installing Plugins
To install a plugin from the online registry, you need to be authenticated. See Authentication for more information.
The plugins you need depend on your cluster. We ship plugins for different containerization frameworks, with Singularity support coming soon. For example, to let Cedana manage GPU workloads, install the criu and gpu plugins:
sudo cedana plugin install criu gpu
We ship our own modified version of the CRIU binary as a plugin; it is required for any checkpoint/restore in userspace.
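On a mixed cluster you may want GPU support only on nodes that have a GPU. This sketch assumes that a probe command being on PATH (nvidia-smi by default) marks a GPU node; plugins_for_node is a hypothetical helper, and criu is always included because checkpoint/restore depends on it.

```shell
#!/usr/bin/env bash
# Sketch: choose which Cedana plugins to install on this node.
# Assumption: the probe command (default nvidia-smi) exists only on GPU nodes.

plugins_for_node() {
  local probe="${1:-nvidia-smi}"
  local plugins="criu"   # always needed for checkpoint/restore
  if command -v "$probe" >/dev/null 2>&1; then
    plugins="$plugins gpu"
  fi
  echo "$plugins"
}
```

Usage: sudo cedana plugin install $(plugins_for_node)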
The daemon requires root privileges for checkpoint/restore operations. Check the CLI reference for all options.
You can directly start the daemon with:
sudo cedana daemon start
If you built Cedana from source and use systemd, you can also install the daemon as a service:
make install-systemd
Run make help to see all available targets.
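If some nodes run the daemon as a service and others do not, a helper can choose the right start command. This sketch assumes that a systemd unit named cedana (the name used by the systemctl commands in this guide) exists only where make install-systemd has been run; cedana_start_cmd is a hypothetical helper that only prints the command.

```shell
#!/usr/bin/env bash
# Sketch: decide how to start the Cedana daemon on this node.
# Assumption: the "cedana" unit exists only if it was installed via make install-systemd.

# Succeeds if systemctl is available and knows a unit named cedana.
cedana_unit_installed() {
  command -v systemctl >/dev/null 2>&1 && systemctl cat cedana >/dev/null 2>&1
}

# Print (do not run) the start command; the daemon itself needs root.
cedana_start_cmd() {
  if cedana_unit_installed; then
    echo "systemctl start cedana"
  else
    echo "cedana daemon start"
  fi
}
```

Usage: sudo $(cedana_start_cmd). Printing rather than executing keeps the root-only step explicit in the caller.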
Health check the daemon
The daemon can be health checked to ensure it fully supports the system and is ready to accept requests. See health checks for more information.
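Right after startup the daemon may not be ready yet, so scripts often retry the health check for a short time. This sketch is a generic retry loop with a timeout; the actual health-check command is documented in the CLI reference and is not reproduced here, so it is passed in as arguments. wait_until is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: retry a command once per second until it succeeds or the timeout elapses.
# Usage: wait_until <timeout-seconds> <command> [args...]

wait_until() {
  local timeout="$1"; shift
  local deadline=$(( SECONDS + timeout ))   # SECONDS: bash's seconds-since-start counter
  until "$@"; do
    if (( SECONDS >= deadline )); then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep 1
  done
}
```

For example, wait_until 30 followed by the health-check command from the CLI reference returns 0 as soon as the check passes.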
Enable and Start the Service
If you installed the daemon as a systemd service, enable it so that it starts automatically at boot, then start it so that it runs immediately.
Run the following commands:
# Enable the service to start on boot
sudo systemctl enable cedana
# Start the service now
sudo systemctl start cedana
Once these steps are completed on all nodes, the Cedana plugin will be installed and running in your cluster.
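systemctl can also enable and start a unit in one step with enable --now, and is-enabled and is-active report its state. This sketch wraps both for the cedana unit used above; enable_and_verify is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: enable and start the service in one step, then confirm its state.
# Uses standard systemctl subcommands; the unit name defaults to "cedana".

enable_and_verify() {
  local unit="${1:-cedana}"
  sudo systemctl enable --now "$unit" || return 1
  # --quiet suppresses output; only the exit status is checked.
  if systemctl is-enabled --quiet "$unit" && systemctl is-active --quiet "$unit"; then
    echo "$unit: enabled and active"
  else
    echo "$unit: not running; inspect 'journalctl -u $unit'" >&2
    return 1
  fi
}
```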