Installation
Begin checkpointing, migrating, and restoring stateful workloads in SLURM in under 5 minutes!
You can install Cedana on a SLURM node in 3 ways:
[Option 1] Using web installer (sane defaults)
[Option 2] Using Cedana
[Option 3] Building from source
These steps must be performed on all SLURM controller and compute nodes in your cluster!
To use Cedana in SLURM, you need to be registered with us! Reach out to [email protected] to get set up with an organization.
You can also deploy fully self-hosted, with zero limitations on where you can store your checkpoints! Check out configuration. If you have any questions, please reach out to us at [email protected].
Using web installer
The web installer will automatically install the latest stable version of Cedana and all plugins required for SLURM support with sane defaults.
Install
Check Authentication for more details on how to get an authentication token.
export CEDANA_URL=https://myorg.cedana.ai/v1
export CEDANA_AUTH_TOKEN=your_auth_token
curl -fsSL ${CEDANA_URL}/install/slurm -H "Authorization: Bearer ${CEDANA_AUTH_TOKEN}" | sudo -E bash
Use the ?version=x.y.z query parameter to install a specific version.
Use ?build=alpha&version=feat/my-branch to install an alpha build from a branch.
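The version and build query parameters can be appended to the installer URL in a small helper, sketched here. The function name build_install_url is hypothetical and not part of Cedana; only the /install/slurm path and the version and build parameters come from the instructions above. The sketch only prints URLs and leaves the actual curl call commented out.

```shell
# Hypothetical helper (not part of Cedana): builds the installer URL from a
# base URL plus optional version and build values.
build_install_url() {
  base="$1"
  version="${2:-}"
  build="${3:-}"
  query=""
  if [ -n "$build" ]; then
    query="build=${build}"
  fi
  if [ -n "$version" ]; then
    # Add "&" only when a build parameter is already present.
    query="${query:+${query}&}version=${version}"
  fi
  # Emit "?query" only when at least one parameter was given.
  printf '%s\n' "${base}/install/slurm${query:+?${query}}"
}

# Placeholder organization URL, as in the example above.
CEDANA_URL="${CEDANA_URL:-https://myorg.cedana.ai/v1}"

build_install_url "$CEDANA_URL"                           # latest stable
build_install_url "$CEDANA_URL" "x.y.z"                   # specific version
build_install_url "$CEDANA_URL" "feat/my-branch" "alpha"  # alpha build from a branch

# To run the installer against one of these URLs (not executed here):
# curl -fsSL "$(build_install_url "$CEDANA_URL" "x.y.z")" \
#   -H "Authorization: Bearer ${CEDANA_AUTH_TOKEN}" | sudo -E bash
```

The -E flag on sudo preserves the exported environment, which is what lets configuration overrides set as environment variables reach the installer script.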
Configure
For configuration, follow the instructions in Cedana Daemon configuration. To override a configuration item, set its corresponding environment variable when calling the installer script.
Using Cedana
You can also install SLURM support directly from Cedana.
Install
First, install Cedana by following the instructions in Cedana Daemon installation.
Then, install the slurm plugin:
Installing the plugins directly
For deployments that require installing the plugin files manually, you can download the files directly.
First, download the Cedana binary.
To download the SLURM plugin binaries, use the Cedana binary. For configuration, follow the instructions in Cedana Daemon configuration. To override a configuration item, set its corresponding environment variable when calling the installer script.
For nightly SLURM plugin binaries, use the alpha builds.
The installer will download the cedana-slurm binary to /usr/local/bin and the SLURM plugin files to /usr/local/lib. To install the files manually, copy them to these directories.
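As a minimal sketch of the manual transfer, assuming the downloaded files sit together in one directory on a Linux node with GNU coreutils: the function name install_cedana_slurm_files and its arguments are hypothetical, and the only file names taken from this page are cedana-slurm and spank_cedana.so. The demonstration copies placeholder files into a temporary tree so nothing on the system is modified.

```shell
# Hypothetical helper: copy downloaded files into <prefix>/bin and <prefix>/lib.
# prefix defaults to /usr/local, matching the directories named above.
install_cedana_slurm_files() {
  src_dir="$1"
  prefix="${2:-/usr/local}"
  # The cedana-slurm binary goes to <prefix>/bin and must be executable.
  install -D -m 0755 "${src_dir}/cedana-slurm" "${prefix}/bin/cedana-slurm"
  # Plugin shared objects go to <prefix>/lib; copy every .so in the source dir.
  for so in "${src_dir}"/*.so; do
    [ -e "$so" ] || continue
    install -D -m 0644 "$so" "${prefix}/lib/$(basename "$so")"
  done
}

# Demonstration against throwaway directories with empty placeholder files.
demo_src="$(mktemp -d)"
demo_prefix="$(mktemp -d)"
: > "${demo_src}/cedana-slurm"
: > "${demo_src}/spank_cedana.so"
install_cedana_slurm_files "$demo_src" "$demo_prefix"
ls "${demo_prefix}/bin" "${demo_prefix}/lib"
```

On a real node you would pass your download directory as the first argument, omit the second, and run with enough privileges to write to /usr/local.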
Update /etc/slurm/plugstack.conf to include spank_cedana.so.
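A hedged sketch of registering the plugin, using the standard SPANK plugstack line format of a keyword followed by the plugin path. The optional keyword and the path /usr/local/lib/spank_cedana.so are assumptions (SLURM also accepts required, and Cedana may expect plugin arguments not shown here); add_spank_plugin is a hypothetical helper. The demonstration writes to a temporary file rather than the real configuration.

```shell
# Hypothetical helper: append a SPANK plugin line to a plugstack file only if
# the plugin path is not already listed.
# ASSUMPTION: the "optional" keyword; SLURM also accepts "required".
add_spank_plugin() {
  conf="$1"
  plugin="$2"
  # -F: match a fixed string, -q: quiet. Keeps repeated runs idempotent.
  if [ -f "$conf" ] && grep -Fq "$plugin" "$conf"; then
    return 0
  fi
  printf 'optional %s\n' "$plugin" >> "$conf"
}

# Demonstration on a temporary file; on a real node the target would be
# /etc/slurm/plugstack.conf and the command would need root.
demo_conf="$(mktemp)"
add_spank_plugin "$demo_conf" /usr/local/lib/spank_cedana.so
add_spank_plugin "$demo_conf" /usr/local/lib/spank_cedana.so   # second call is a no-op
cat "$demo_conf"
```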
Update /etc/slurm/slurm.conf to include the plugins.
Reload the slurmctld and slurmd with
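One common way to do this on systemd-managed hosts is sketched below. The unit names slurmctld and slurmd, the use of systemctl, and the choice of a full restart are assumptions about your deployment, not commands taken from Cedana's documentation. The helper names run and restart_slurm_daemon are hypothetical, and the sketch only prints the commands unless RUN=1 is set.

```shell
# Dry-run wrapper: prints each command unless RUN=1 is set in the environment.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "+ $*"
  fi
}

# Restart the SLURM daemon that matches this node's role.
# ASSUMPTION: systemd units named slurmctld (controller) and slurmd (compute).
restart_slurm_daemon() {
  case "$1" in
    controller) run sudo systemctl restart slurmctld ;;
    compute)    run sudo systemctl restart slurmd ;;
    *) echo "unknown role: $1 (expected controller or compute)" >&2; return 1 ;;
  esac
}

restart_slurm_daemon controller
restart_slurm_daemon compute
```

Run the controller variant on the node hosting slurmctld and the compute variant on every compute node.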
On the database node (slurmdbd), start cedana-slurm.
Configure
For configuration, follow the instructions in Cedana Daemon configuration.
On all nodes, initialize the /etc/cedana/config.json with
On the slurmdbd node, if you are using systemd, create the service file
Privileged Mode (root)
In privileged mode, the checkpointing and restoring are done as root. Privileged mode requires no additional configuration.
Unprivileged Mode (user)
In unprivileged mode, checkpointing and restoring are done as the job's user, i.e., the UID of the SLURM job performs the checkpoint and restore. This configuration is useful when root privileges are restricted for security purposes. For example, an NFS mount with root_squash requires unprivileged mode.
To enable unprivileged mode, initialize the /etc/cedana/config.json with
In addition, the cedana-slurm, cedana, and criu binaries must have the required capabilities for users to perform checkpoint and restore.
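To see which file capabilities are currently set on these binaries, a check along the following lines may help. It is a sketch that assumes the getcap utility from libcap is installed and that the binaries are on PATH; report_caps is a hypothetical helper, and it only reports what is set. It does not state which capabilities Cedana requires.

```shell
# Hypothetical helper: print the file capabilities of each named binary.
report_caps() {
  for name in "$@"; do
    # Resolve the binary on PATH; an empty result means it is not installed.
    path="$(command -v "$name" 2>/dev/null || true)"
    if [ -z "$path" ]; then
      echo "$name: not found on PATH"
      continue
    fi
    if ! command -v getcap >/dev/null 2>&1; then
      echo "$name: getcap unavailable"
      continue
    fi
    # getcap prints nothing when no file capabilities are set.
    caps="$(getcap "$path")"
    echo "$name: ${caps:-no file capabilities set}"
  done
}

report_caps cedana-slurm cedana criu
```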
Building from source
Check make help for available build targets.
Build all binaries:
By default, the binaries will be built using the cedana/cedana-slurm:build docker image.
These binaries are useless on their own. You need to install Cedana to use them.
First, install Cedana by following the instructions in Cedana Daemon installation. Then, install the slurm plugin and set up SLURM support:
You need to be in the build directory for the cedana slurm setup command to work, as it needs to find the binaries you just built.
You're all set up! Let's checkpoint some workloads. Continue to Checkpoint/restore to get started.
Uninstall
To remove Cedana SLURM completely, run: