Installation

Start checkpointing, migrating, and restoring stateful workloads in SLURM in under 5 minutes!

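You can install Cedana on a SLURM node in any of the following ways, each covered in its own section below:

  1. Using web installer
  2. Using Cedana
  3. Installing the plugins directly
  4. Building from source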

You can also deploy fully self-hosted, with zero limitations on where you can store your checkpoints! Check out configuration. If you have any questions, please reach out to us at [email protected].

Using web installer

The web installer will automatically install the latest stable version of Cedana and all plugins required for SLURM support with sane defaults.

Install

export CEDANA_URL=https://myorg.cedana.ai/v1
export CEDANA_AUTH_TOKEN=your_auth_token

curl -fsSL ${CEDANA_URL}/install/slurm -H "Authorization: Bearer ${CEDANA_AUTH_TOKEN}" | sudo -E bash
  • Use ?version=x.y.z query parameter to install a specific version.

  • Use ?build=alpha&version=feat/my-branch to install an alpha build from a branch.
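The query parameters above can be appended to the installer URL before piping it to bash. The following is a minimal sketch, assuming a hypothetical helper named `cedana_installer_url` (it is not part of Cedana) that only builds and prints the URL so it can be inspected before anything is installed:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Cedana): build the installer URL from
# CEDANA_URL plus the optional query parameters described above.
cedana_installer_url() {
  local version="${1:-}"   # e.g. x.y.z, or a branch name for alpha builds
  local build="${2:-}"     # e.g. alpha
  local url="${CEDANA_URL:?CEDANA_URL must be set}/install/slurm"
  local query=""

  if [ -n "$build" ]; then
    query="build=${build}"
  fi
  if [ -n "$version" ]; then
    query="${query:+${query}&}version=${version}"
  fi
  printf '%s%s\n' "$url" "${query:+?${query}}"
}

# Print the URL first, then pass it to curl once it looks right:
#   curl -fsSL "$(cedana_installer_url x.y.z)" \
#     -H "Authorization: Bearer ${CEDANA_AUTH_TOKEN}" | sudo -E bash
```

Calling the helper with no arguments yields the plain installer URL, which installs the latest stable version as described above.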

Configure

For configuration, follow the instructions in Cedana Daemon configuration. To override a configuration item, set its corresponding environment variable when calling the installer script.

Using Cedana

You can also install SLURM support directly from Cedana.

Install

First, install Cedana by following the instructions in Cedana Daemon installation.

Then, install the slurm plugin:

Installing the plugins directly

For deployments that require installing the plugin files manually, you can download the files directly.

First, download the Cedana binary.

To download the SLURM plugin binaries, use the Cedana binary. For configuration, follow the instructions in Cedana Daemon configuration. To override a configuration item, set its corresponding environment variable when calling the installer script.

For nightly SLURM plugin binaries, use the alpha builds.

The installer will download the cedana-slurm binary to /usr/local/bin and the SLURM plugin files to /usr/local/lib. To install the files manually, transfer them to these directories.
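Transferring the files by hand could look like the following minimal sketch. It assumes the downloaded files sit in one local directory and that the plugin files are shared objects matching `*.so` (only spank_cedana.so is named in this guide). The function name `install_cedana_slurm_files` and the overridable prefix are hypothetical, added so the copy can be tried in a scratch directory first:

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy downloaded files into place.
# $1 = directory holding the downloaded files
# $2 = install prefix (defaults to /usr/local, as described above)
install_cedana_slurm_files() {
  local src="$1"
  local prefix="${2:-/usr/local}"

  # The binary goes to <prefix>/bin with executable permissions.
  install -D -m 0755 "$src/cedana-slurm" "$prefix/bin/cedana-slurm" || return 1

  # Plugin shared objects go to <prefix>/lib.
  # Assumption: plugin files match *.so (for example spank_cedana.so).
  local so
  for so in "$src"/*.so; do
    [ -e "$so" ] || continue
    install -D -m 0644 "$so" "$prefix/lib/$(basename "$so")" || return 1
  done
}

# Example (try a scratch prefix first; /usr/local needs root):
#   install_cedana_slurm_files ./downloads /tmp/cedana-prefix
```

The `-D` flag of GNU `install` creates missing parent directories, so the sketch also works against an empty prefix.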

Update /etc/slurm/plugstack.conf to include spank_cedana.so.
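A SPANK entry in plugstack.conf has the form `required|optional <plugin path> [arguments]`. The sketch below appends such an entry only if the file does not already mention spank_cedana.so. Loading the plugin as `required`, the absence of plugin arguments, and the helper name `add_spank_cedana` are assumptions made for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: add the Cedana SPANK plugin to plugstack.conf once.
# $1 = path to plugstack.conf (defaults to /etc/slurm/plugstack.conf)
# $2 = path to the plugin (defaults to /usr/local/lib/spank_cedana.so)
add_spank_cedana() {
  local conf="${1:-/etc/slurm/plugstack.conf}"
  local plugin="${2:-/usr/local/lib/spank_cedana.so}"
  # Assumption: the plugin is loaded as "required"; SPANK also accepts "optional".
  local line="required ${plugin}"

  touch "$conf" || return 1
  # Do nothing if spank_cedana.so is already referenced.
  if grep -q 'spank_cedana\.so' "$conf"; then
    return 0
  fi
  printf '%s\n' "$line" >> "$conf"
}

# Example (editing /etc/slurm needs root):
#   add_spank_cedana /etc/slurm/plugstack.conf
```

Because the helper checks for an existing entry, running it again does not duplicate the line.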

Update /etc/slurm/slurm.conf to include the plugins.

Reload slurmctld and slurmd.
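One way to do this, assuming the daemons run as systemd units named slurmctld and slurmd, is a full restart of each unit. The sketch below uses a hypothetical `restart_slurm_daemons` helper with a dry-run switch so the commands can be reviewed first. Your site may prefer a different reload procedure:

```shell
#!/usr/bin/env bash
# Hypothetical helper: restart the SLURM daemons given as arguments.
# Assumes systemd units named slurmctld and slurmd.
# With DRY_RUN=1 the commands are printed instead of executed.
restart_slurm_daemons() {
  local unit
  for unit in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "systemctl restart ${unit}"
    else
      sudo systemctl restart "${unit}" || return 1
    fi
  done
}

# Controller node:  restart_slurm_daemons slurmctld
# Compute nodes:    restart_slurm_daemons slurmd
```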

On the database node (slurmdbd), start cedana-slurm.

Configure

For configuration, follow the instructions in Cedana Daemon configuration.

On all nodes, initialize the /etc/cedana/config.json with

On the slurmdbd node, if you are using systemd, create the service file
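As a rough illustration of such a service file, the sketch below writes a minimal systemd unit. The unit name, the description, and especially the `ExecStart` line are assumptions: it presumes cedana-slurm lives in /usr/local/bin and runs in the foreground with no arguments, so adjust it to the start command your cedana-slurm version expects. The helper `write_cedana_slurm_unit` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal systemd unit for cedana-slurm.
# $1 = unit file path (defaults to /etc/systemd/system/cedana-slurm.service)
write_cedana_slurm_unit() {
  local unit_path="${1:-/etc/systemd/system/cedana-slurm.service}"
  cat > "$unit_path" <<'EOF'
[Unit]
Description=Cedana SLURM service (sketch)
After=network-online.target

[Service]
# Assumption: cedana-slurm runs in the foreground with no extra arguments.
ExecStart=/usr/local/bin/cedana-slurm
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Example (writing to /etc/systemd/system needs root):
#   write_cedana_slurm_unit /etc/systemd/system/cedana-slurm.service
```

After the file is in place, `sudo systemctl daemon-reload` followed by `sudo systemctl enable --now cedana-slurm` would load the unit and start it.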

Privileged Mode (root)

In privileged mode, checkpointing and restoring are performed as root. Privileged mode requires no additional configuration.

Unprivileged Mode (user)

In unprivileged mode, checkpointing and restoring are performed as the job's user, i.e., the UID of the SLURM job performs the checkpoint and restore. This configuration is useful when root privileges are restricted for security purposes. For example, an NFS share exported with root_squash requires unprivileged mode.

To enable unprivileged mode, initialize the /etc/cedana/config.json with

In addition, the cedana-slurm, cedana, and criu binaries must have the required capabilities for users to perform checkpoint and restore.
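File capabilities are typically granted with `setcap`. The sketch below applies one capability set to each binary passed to it. The default of `cap_checkpoint_restore+eip` is an assumption based on the capability CRIU uses for non-root operation (Linux 5.9 and later). The exact set Cedana requires is not listed here, so override `CAPS` as needed. The helper name `grant_checkpoint_caps` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: grant file capabilities to the given binaries.
# Assumption: cap_checkpoint_restore+eip is the capability set; override
# it with the CAPS environment variable if more capabilities are needed.
# With DRY_RUN=1 the commands are printed instead of executed.
grant_checkpoint_caps() {
  local caps="${CAPS:-cap_checkpoint_restore+eip}"
  local bin
  for bin in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "setcap ${caps} ${bin}"
    else
      sudo setcap "${caps}" "${bin}" || return 1
    fi
  done
}

# Example, resolving each binary from PATH:
#   grant_checkpoint_caps "$(command -v cedana-slurm)" \
#     "$(command -v cedana)" "$(command -v criu)"
#   getcap "$(command -v criu)"   # inspect the result
```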

Building from source

Check make help for available build targets.

Build all binaries:

By default, the binaries will be built using the cedana/cedana-slurm:build Docker image.

These binaries are useless on their own. You need to install Cedana to use them.

First, install Cedana by following the instructions in Cedana Daemon installation. Then, install the SLURM plugin and set up SLURM support:

You need to be in the build directory for the cedana slurm setup command to work, as it needs to find the binaries you just built.

You're all set up! Let's checkpoint some workloads. Continue to Checkpoint/restore to get started.

Uninstall

To remove Cedana SLURM completely, run:
