# Installation

You can install Cedana on a SLURM node in 3 ways:

1. \[Option 1] [Using web installer (sane defaults)](#using-web-installer)
2. \[Option 2] [Using Cedana](#using-cedana)
3. \[Option 3] [Building from source](#building-from-source)

{% hint style="warning" %}
These steps must be performed on **all SLURM controller and compute nodes** in your cluster!
{% endhint %}

{% hint style="success" %}
To use Cedana with SLURM, you need to be registered with us! Reach out to [founders@cedana.ai](mailto:founders@cedana.ai) to get set up with an organization.
{% endhint %}

{% hint style="info" %}
You can also deploy fully self-hosted, with zero limitations on where you can store your checkpoints! Check out [configuration](https://docs.cedana.ai/daemon/get-started/configuration). If you have any questions, please reach out to us at <founders@cedana.ai>.
{% endhint %}

## Using web installer

The web installer will automatically install the latest stable version of Cedana and all plugins required for SLURM support with sane defaults.

### Install

{% hint style="success" %}
Check [Authentication](/get-started/authentication.md) for more details on how to get an authentication token.
{% endhint %}

```sh
export CEDANA_URL=https://myorg.cedana.ai/v1
export CEDANA_AUTH_TOKEN=your_auth_token

curl -fsSL ${CEDANA_URL}/install/slurm -H "Authorization: Bearer ${CEDANA_AUTH_TOKEN}" | sudo -E bash
```
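If either variable is empty, the `curl` command above requests a malformed URL and fails with an error that does not point at the cause. The following is a minimal pre-flight sketch. `require_cedana_env` is a hypothetical helper and is not part of Cedana:

```sh
# Hypothetical helper (not part of Cedana): report every named variable that is unset or empty.
require_cedana_env() {
  local var value missing=0
  for var in "$@"; do
    # Indirect lookup via eval keeps this compatible with POSIX-style shells
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "error: $var is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, before running the installer:
#   require_cedana_env CEDANA_URL CEDANA_AUTH_TOKEN || exit 1
```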

* Append the `?version=x.y.z` query parameter to the installer URL to install a specific version.
* Append `?build=alpha&version=feat/my-branch` to the installer URL to install an alpha build from a branch.
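When you combine query parameters, the URL must be quoted. An unquoted `&` ends the command and sends it to the background, and an unquoted `?` is a glob character. The following sketch builds the URL in the parameter order shown above. `cedana_installer_url` is a hypothetical helper, not a Cedana command:

```sh
# Hypothetical helper: build the SLURM installer URL with optional query parameters.
# Usage: cedana_installer_url [version] [build]
cedana_installer_url() {
  local url="${CEDANA_URL}/install/slurm"
  local query=""
  if [ -n "${2:-}" ]; then
    query="build=$2"
  fi
  if [ -n "${1:-}" ]; then
    # Prefix with '&' only when a previous parameter exists
    query="${query:+$query&}version=$1"
  fi
  # Append '?query' only when at least one parameter was given
  printf '%s\n' "${url}${query:+?$query}"
}

# Keep the result quoted when passing it to curl:
#   curl -fsSL "$(cedana_installer_url feat/my-branch alpha)" \
#     -H "Authorization: Bearer ${CEDANA_AUTH_TOKEN}" | sudo -E bash
```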

### Configure

For configuration, follow the instructions in [Cedana Daemon configuration](https://docs.cedana.ai/daemon/get-started/configuration). To override a configuration item, export its corresponding environment variable before running the installer script. `sudo -E` passes the variable through to the script.

```sh
export CEDANA_CHECKPOINT_DIR=s3://my-checkpoints-bucket
export CEDANA_CHECKPOINT_STREAMS=4
export CEDANA_CHECKPOINT_COMPRESSION=gzip

curl -fsSL ${CEDANA_URL}/install/slurm -H "Authorization: Bearer ${CEDANA_AUTH_TOKEN}" | sudo -E bash
```

## Using Cedana

You can also install SLURM support directly from Cedana.

### Install

First, install Cedana by following instructions on [Cedana Daemon installation](https://docs.cedana.ai/daemon/get-started/installation).

Then, install the `slurm` plugin and set up SLURM support:

```sh
sudo cedana plugin install slurm
sudo cedana slurm setup
```

### Installing the plugins directly

For deployments that require installing the plugin files manually, you can download the files directly.

First, download the Cedana binary.

```sh
curl -fsSL https://github.com/cedana/cedana/releases/latest/download/cedana-amd64.tar.gz | tar xzf - && chmod 755 cedana
```

Use the downloaded Cedana binary to fetch the SLURM plugin binaries. For configuration, follow the instructions in [Cedana Daemon configuration](https://docs.cedana.ai/daemon/get-started/configuration). To override a configuration item, export its corresponding environment variable before running the commands below.

```sh
export CEDANA_URL=https://myorg.cedana.ai/v1
export CEDANA_AUTH_TOKEN=your_auth_token

# -E preserves the exported CEDANA_* variables for the root process
sudo -E ./cedana plugin remove slurm slurm/wlm
sudo -E ./cedana plugin install slurm slurm/wlm
```

For nightly SLURM plugin binaries, use the `alpha` builds.

```sh
export CEDANA_URL=https://myorg.cedana.ai/v1
export CEDANA_AUTH_TOKEN=your_auth_token
export CEDANA_PLUGINS_BUILDS=alpha

# -E preserves the exported CEDANA_* variables for the root process
sudo -E ./cedana plugin remove slurm slurm/wlm
sudo -E ./cedana plugin install slurm@main slurm/wlm@main
```

These commands download the `cedana-slurm` binary to `/usr/local/bin` and the SLURM plugin files to `/usr/local/lib`. Copy the files to the directories used by your SLURM installation:

```sh
# install to the worker nodes (slurmd), controller nodes (slurmctld), and the database node (slurmdbd)
sudo install /usr/local/bin/cedana-slurm <binary-directory>/cedana-slurm
sudo install /usr/local/lib/cli_filter_cedana.so <slurm-plugin-directory>/cli_filter_cedana.so
sudo install /usr/local/lib/job_submit_cedana.so <slurm-plugin-directory>/job_submit_cedana.so
sudo install /usr/local/lib/task_cedana.so <slurm-plugin-directory>/task_cedana.so
sudo install /usr/local/lib/spank_cedana.so <slurm-plugin-directory>/spank_cedana.so
```
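If you manage many nodes, you can script the copy step. The following is a minimal sketch. `install_cedana_slurm_files` is a hypothetical helper, not shipped with Cedana. On a running cluster, the SLURM plugin directory is the `PluginDir` value reported by `scontrol show config`:

```sh
# Hypothetical helper: copy the Cedana SLURM files from the download locations
# into the binary and plugin directories your SLURM installation uses.
# Usage: install_cedana_slurm_files <binary-directory> <slurm-plugin-directory> [src-bin] [src-lib]
install_cedana_slurm_files() {
  local bin_dir="$1" plugin_dir="$2"
  local src_bin="${3:-/usr/local/bin}" src_lib="${4:-/usr/local/lib}"
  local plugin
  install -m 755 "$src_bin/cedana-slurm" "$bin_dir/cedana-slurm" || return 1
  for plugin in cli_filter_cedana job_submit_cedana task_cedana spank_cedana; do
    install -m 755 "$src_lib/$plugin.so" "$plugin_dir/$plugin.so" || return 1
  done
}

# Example (run as root on every slurmd, slurmctld, and slurmdbd node):
#   install_cedana_slurm_files /usr/local/bin /usr/lib64/slurm
```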

Update `/etc/slurm/plugstack.conf` to load `spank_cedana.so`:

```diff
+required <slurm-plugin-directory>/spank_cedana.so
```
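Appending the line by hand is easy to repeat by accident when you rerun the setup. The following sketch makes the edit idempotent. `enable_cedana_spank` is a hypothetical helper, and it must run as root to modify the real `/etc/slurm/plugstack.conf`:

```sh
# Hypothetical helper (not shipped with Cedana): add the Cedana SPANK line to
# plugstack.conf only if it is missing, so re-running setup never duplicates it.
# Usage: enable_cedana_spank <slurm-plugin-directory> [plugstack-conf]
enable_cedana_spank() {
  local line="required $1/spank_cedana.so"
  local conf="${2:-/etc/slurm/plugstack.conf}"
  # -x: match the whole line, -F: fixed string, -q: no output
  if grep -qxF "$line" "$conf" 2>/dev/null; then
    return 0
  fi
  # Start on a fresh line if the file does not end with a newline
  if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
    printf '\n' >> "$conf"
  fi
  printf '%s\n' "$line" >> "$conf"
}
```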

Update `/etc/slurm/slurm.conf` to enable the Cedana plugins:

```diff
-TaskPlugin=task/affinity,task/cgroup
+TaskPlugin=task/affinity,task/cgroup,task/cedana
+CliFilterPlugins=cli_filter/cedana
+JobSubmitPlugins=job_submit/cedana
```
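If your `slurm.conf` already lists other CLI filter or job submit plugins, you must append the Cedana entries to those lists instead of adding duplicate keys. The following is a minimal sketch of that edit. `add_slurm_plugin` is a hypothetical helper. It assumes one `Key=value` per line with no spaces around `=`, and it matches the key case-sensitively, although SLURM itself treats keys case-insensitively:

```sh
# Hypothetical helper (not shipped with Cedana): add a plugin to a comma-separated
# slurm.conf parameter, creating the parameter if needed and skipping it if present.
# Assumes one "Key=value" per line with no spaces around "=".
# Usage: add_slurm_plugin <Key> <plugin> [slurm-conf]
add_slurm_plugin() {
  local key="$1" plugin="$2" conf="${3:-/etc/slurm/slurm.conf}"
  local current tmp rc
  # Value of the last uncommented "Key=" line; "TaskPlugin=" does not match "TaskPluginParam="
  current=$(sed -n "s/^[[:space:]]*$key=//p" "$conf" | tail -n 1)
  case ",$current," in
    *",$plugin,"*) return 0 ;;  # already enabled
  esac
  if [ -z "$current" ]; then
    printf '%s=%s\n' "$key" "$plugin" >> "$conf"
    return 0
  fi
  # Rewrite via a temp file; `cat >` keeps the original file's owner and mode
  tmp=$(mktemp) || return 1
  sed "s|^[[:space:]]*$key=.*|$key=$current,$plugin|" "$conf" > "$tmp" &&
    cat "$tmp" > "$conf"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Example (as root, keeping slurm.conf identical on all nodes):
#   add_slurm_plugin TaskPlugin task/cedana
#   add_slurm_plugin CliFilterPlugins cli_filter/cedana
#   add_slurm_plugin JobSubmitPlugins job_submit/cedana
```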

Restart `slurmctld` on the controller nodes and `slurmd` on the compute nodes:

```sh
# On controller nodes
sudo systemctl restart slurmctld
# On compute nodes
sudo systemctl restart slurmd
```
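After the restart, you can confirm that SLURM picked up the new settings. The following is a minimal sketch. It assumes `scontrol show config` prints one `Key = value` line per parameter. `verify_cedana_slurm_config` is a hypothetical helper:

```sh
# Hypothetical check (not shipped with Cedana): confirm that the running SLURM
# configuration lists each Cedana plugin under the expected parameter.
verify_cedana_slurm_config() {
  local config entry key plugin missing=0
  config=$(scontrol show config) || return 1
  for entry in TaskPlugin:task/cedana CliFilterPlugins:cli_filter/cedana JobSubmitPlugins:job_submit/cedana; do
    key=${entry%%:*}     # text before the first ':'
    plugin=${entry#*:}   # text after the first ':'
    # Anchor on "Key" followed by '=', so TaskPluginParam is not picked up
    if ! printf '%s\n' "$config" | grep -E "^${key}[[:space:]]*=" | grep -qF "$plugin"; then
      echo "missing $plugin in $key" >&2
      missing=1
    fi
  done
  return "$missing"
}
```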

On the database node (slurmdbd), start `cedana-slurm`.

```sh
export CEDANA_URL=https://myorg.cedana.ai/v1
export CEDANA_AUTH_TOKEN=your_auth_token

sudo -E cedana-slurm daemon start
```

### Configure

For configuration, follow instructions on [Cedana Daemon configuration](https://docs.cedana.ai/daemon/get-started/configuration).

On all nodes, initialize `/etc/cedana/config.json`:

```sh
export CEDANA_URL=https://myorg.cedana.ai/v1
export CEDANA_AUTH_TOKEN=your_auth_token

sudo -E cedana version --init-config
```

On the `slurmdbd` node, if you use systemd, create the service file:

```sh
export LOG_PATH=/var/log/cedana-slurm.log
export SERVICE_FILE=/etc/systemd/system/cedana-slurm.service
export APP_PATH=/usr/local/bin/cedana-slurm

# Writing to /etc/systemd/system requires root, so pipe through `sudo tee`
cat <<EOF | sudo tee "$SERVICE_FILE" >/dev/null
[Unit]
Description=Cedana SLURM Daemon

[Service]
ExecStart=$APP_PATH daemon start
User=root
Group=root
Restart=no
StandardError=append:$LOG_PATH
StandardOutput=append:$LOG_PATH

[Install]
WantedBy=multi-user.target
EOF
```
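The unit file does nothing until systemd reloads it and the service is enabled. The following sketch runs the standard `systemctl` steps. `start_cedana_slurm_service` is a hypothetical helper name. Because of the `StandardOutput`/`StandardError` settings, the daemon's output goes to `/var/log/cedana-slurm.log`:

```sh
# Hypothetical helper: register and start the cedana-slurm unit, then show its status.
start_cedana_slurm_service() {
  # Re-read unit files so systemd sees the new cedana-slurm.service
  sudo systemctl daemon-reload || return 1
  # Start now and on every boot (the unit is WantedBy=multi-user.target)
  sudo systemctl enable --now cedana-slurm.service || return 1
  sudo systemctl --no-pager status cedana-slurm.service
}
```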

#### Privileged Mode (root)

In privileged mode, checkpoint and restore run as root. This mode requires no additional configuration.

#### Unprivileged Mode (user)

In unprivileged mode, checkpoint and restore run as the job's user, i.e., under the UID of the SLURM job. Use this mode when root privileges are restricted for security reasons. For example, an NFS export with `root_squash` maps root to an unprivileged user, so storage on such an export requires unprivileged mode.

To enable unprivileged mode, initialize `/etc/cedana/config.json` with `CEDANA_SLURM_UNPRIVILEGED` set:

```sh
export CEDANA_URL=https://myorg.cedana.ai/v1
export CEDANA_AUTH_TOKEN=your_auth_token
export CEDANA_SLURM_UNPRIVILEGED=1

sudo -E cedana version --init-config
```

In addition, the `cedana-slurm`, `cedana`, and `criu` binaries need the following file capabilities so that job users can checkpoint and restore:

```sh
# Run as root; CAP_CHECKPOINT_RESTORE requires Linux 5.9 or later
sudo setcap CAP_SYS_PTRACE,CAP_DAC_READ_SEARCH,CAP_CHECKPOINT_RESTORE+eip /usr/local/bin/criu
sudo setcap CAP_SYS_PTRACE,CAP_DAC_READ_SEARCH,CAP_CHECKPOINT_RESTORE+eip /usr/local/bin/cedana
sudo setcap CAP_SYS_PTRACE,CAP_DAC_READ_SEARCH,CAP_CHECKPOINT_RESTORE+eip /usr/local/bin/cedana-slurm
```
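File capabilities are lost when a binary is replaced, for example on upgrade, so re-checking them is worthwhile. The following is a minimal sketch built on `getcap` from libcap, which prints lowercase capability names. The exact output format differs between libcap versions, so the check only looks for each name as a substring. `check_cedana_caps` is a hypothetical helper:

```sh
# Hypothetical check (not shipped with Cedana): verify that each binary carries
# the three capabilities granted above. Prints one line per missing capability.
check_cedana_caps() {
  local bin caps cap missing=0
  for bin in "$@"; do
    caps=$(getcap "$bin" 2>/dev/null)
    for cap in cap_sys_ptrace cap_dac_read_search cap_checkpoint_restore; do
      case "$caps" in
        *"$cap"*) ;;
        *) echo "missing $cap on $bin" >&2; missing=1 ;;
      esac
    done
  done
  return "$missing"
}

# Usage:
#   check_cedana_caps /usr/local/bin/criu /usr/local/bin/cedana /usr/local/bin/cedana-slurm
```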

## Building from source

Check `make help` for available build targets.

Build all binaries:

```sh
make all
```

By default, the binaries are built inside the `cedana/cedana-slurm:build` Docker image.

These binaries cannot be used on their own; Cedana must be installed to use them.

First, install Cedana by following instructions on [Cedana Daemon installation](https://docs.cedana.ai/daemon/get-started/installation). Then, install the `slurm` plugin and set up SLURM support:

```sh
sudo cedana plugin install slurm
cd build && sudo cedana slurm setup
```

{% hint style="info" %}
You need to be in the `build` directory for the `cedana slurm setup` command to work, as it needs to find the binaries you just built.
{% endhint %}

You're all set up! Let's checkpoint some workloads. Continue to [Checkpoint/restore](/cedana-slurm/cr.md) to get started.

## Uninstall

To remove Cedana SLURM completely, run:

```sh
sudo cedana slurm destroy
```


