Deploying Cedana on your cluster

Using helm chart

We provide a Helm chart to easily install our services on your cluster.

warning

Currently, the host image should be debian:bookworm, or ubuntu:22.04 and above.

Installation

Make sure you have the helm tool installed.

# install cedanacontroller-system
helm install cedana oci://registry-1.docker.io/cedana/cedana-helm \
--create-namespace -n cedanacontroller-system
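
To verify the installation, check that the controller pods are running (a quick sanity check; pod names may vary by chart version):

# The controller should appear in the namespace created above
kubectl get pods -n cedanacontroller-system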

Using Cedana API

We provide endpoints for attaching and detaching our cluster services to and from your Kubernetes cluster.

warning

When running Cedana against an EKS cluster, your container runtime and AMI must use cgroup v2. The minimum Linux kernel version is 6.1.x; on AWS, Amazon Linux 2023 meets both requirements.

Authentication

Create an account at https://auth.cedana.com/ui/registration. With your email and password, you can then grab a session token as follows:

export LOGIN_URL=$(curl -s -X GET -H "Accept: application/json" 'https://auth.cedana.com/self-service/login/api' | jq -r '.ui.action')

Use this action flow URL to authenticate and grab a token:

export AUTH_TOKEN=$(curl -s -X POST -H "Accept: application/json" -H "Content-Type: application/json" -d '{"identifier": "your-email", "password": "your-password", "method": "password"}' "$LOGIN_URL" | jq -r '.session_token')

This token is valid for 720 hours and can be used to authenticate all requests to our services.
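
To confirm the token works, you can query the session endpoint (this assumes the auth service exposes the standard Ory Kratos /sessions/whoami route, which the login flow above suggests):

# Should return your session details if the token is valid
curl -s -H "X-Session-Token: $AUTH_TOKEN" https://auth.cedana.com/sessions/whoami | jq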

Bootstrapping

For a quick setup, set the following environment variables:

export CLUSTER_NAME=<cluster-name>
export SA_NAME=<service-account-name>
export SA_TOKEN_NAME=<service-account-token-name>

To get started, create a service account that Cedana will use to deploy the Cedana binary onto your instances.

kubectl -n kube-system create serviceaccount $SA_NAME

Now, create a cluster role binding to make the service account cluster-admin by applying the following:

cluster-role-binding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: service-account-admin
subjects:
  - kind: ServiceAccount
    name: $SA_NAME
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io

Apply it with:

envsubst < cluster-role-binding.yaml | kubectl apply -f -
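
You can verify the binding took effect with an impersonation check (an optional sanity check):

# Prints "yes" if the service account now has cluster-admin rights
kubectl auth can-i '*' '*' --as=system:serviceaccount:kube-system:$SA_NAME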

To obtain an authentication token, start by applying the following secret:

sa-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: $SA_TOKEN_NAME
  namespace: kube-system
  annotations:
    kubernetes.io/service-account.name: $SA_NAME
type: kubernetes.io/service-account-token

Apply it with:

envsubst < sa-secret.yaml | kubectl apply -f -

We can now gather the data needed for Cedana's Kubernetes attach endpoint. Get the service account token:

export SA_TOKEN=$(kubectl get secret $SA_TOKEN_NAME -n kube-system -o jsonpath='{.data.token}' | base64 --decode)

Get the cluster's certificate authority data (left base64-encoded):

export SA_CERT=$(kubectl get secret $SA_TOKEN_NAME -n kube-system -o jsonpath='{.data.ca\.crt}')

Finally, get your cluster endpoint URL (this example uses the AWS CLI against EKS):

export CLUSTER_URL=$(aws eks describe-cluster --name $CLUSTER_NAME | jq -r ".cluster.endpoint")

We can now hit the attach to Kubernetes endpoint and deploy Cedana to your Kubernetes cluster:

curl -X POST -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AUTH_TOKEN" -d '{
  "server": "'$CLUSTER_URL'",
  "token": "'$SA_TOKEN'",
  "cert": "'$SA_CERT'",
  "versions": { "controller_version": "latest", "binary_version": "latest" }
}' https://sandbox.cedana.ai/kubernetes/attach
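
Once the attach call returns, you can confirm the deployment landed (this assumes the controller runs in the same cedanacontroller-system namespace used by the Helm install):

# The operator pod and the Cedana CRD should both appear
kubectl get pods -n cedanacontroller-system
kubectl get crd | grep -i cedana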

Once Cedana is attached, a CustomResourceDefinition called Cedana and a Kubernetes operator called Cedana_Controller are deployed to your Kubernetes cluster. You can now create an instance of the Cedana resource and perform checkpoint/restore of containers in your cluster. The Cedana_Controller pod also runs a REST service, whose endpoints are documented below.

Using the service

Now that the controller is set up, you can use the service to perform checkpoints and restores.

Known Limitations

  • We don't currently support checkpoint/restore of the io_uring API; consider checkpointing before creating and setting up any io_uring instances.
  • We don't currently detect your container runtime and adjust our checkpointing behaviour automatically; for CRI-O and rootfs checkpoints, use the separate API endpoints documented below.
  • Restore requires the target pod to not be active. This generally means you should put the restore pod to sleep with a custom command and args, e.g. while true; do sleep infinity; done; (see the example spec after this list).
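
A minimal restore-target pod spec might look like the following sketch; the pod name, container name, and image are placeholders, and only the command/args pattern matters:

restore-target.yaml
apiVersion: v1
kind: Pod
metadata:
  name: restore-target          # hypothetical name
  namespace: default
spec:
  containers:
    - name: app                 # the container you will restore into
      image: your-app:latest    # hypothetical image
      command: ["/bin/sh", "-c"]
      args: ["while true; do sleep infinity; done;"]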

Checkpoint/Restore - REST Service

The Cedana REST service provides a REST API for checkpointing and restoring containers in your Kubernetes cluster, and runs alongside the Cedana controller. Below are curl commands illustrating the schema of the API. All examples use the in-cluster IP of the cedanacontroller pod; for out-of-cluster checkpoint and restore, you can expose the pod with a Kubernetes Service and an external IP, or simply port-forward to it:

export CEDANA_CONTROLLER=$(kubectl get pods -n cedanacontroller-system | grep cedanacontroller-controller-manager | awk '{print $1}')
kubectl port-forward $CEDANA_CONTROLLER -n cedanacontroller-system 1324:1324

where CEDANA_CONTROLLER is the name of the cedana controller pod.

List Containers in Namespace

Choose ROOT depending on your container runtime:

  • K8s
export ROOT=/run/containerd/runc/k8s.io
  • K3s
export ROOT=/host/run/containerd/runc/k8s.io
  • default
export ROOT=/run/runc

Set the controller URL (e.g. localhost when port-forwarding) and the target namespace (e.g. default):

export CONTROLLER_URL=<controller_cluster_url>
export NAMESPACE=<namespace>

List containers in a specific namespace by querying Kubernetes pods with specific labels:

curl -X GET -H 'Content-Type: application/json' -d '{
  "root": "'$ROOT'"
}' http://$CONTROLLER_URL:1324/list/$NAMESPACE | jq

Response:

  • Returns a JSON array of the containers in the specified namespace.

Save container name and sandbox name for checkpointing and restoring:

export CHECKPOINT_CONTAINER=<checkpoint-container-name>
export CHECKPOINT_SANDBOX=<checkpoint-sandbox-name>
export RESTORE_CONTAINER=<restore-container-name>
export RESTORE_SANDBOX=<restore-sandbox-name>
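
To find candidate values, you can inspect the first entry of the /list response before setting these (the exact field names depend on the API's response shape):

# Print the first container object returned for the namespace
curl -s -X GET -H 'Content-Type: application/json' -d '{"root": "'$ROOT'"}' \
  http://$CONTROLLER_URL:1324/list/$NAMESPACE | jq '.[0]'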

Checkpoint

Initiate a checkpoint for a container:

export CHECKPOINT_PATH=/tmp/ckpt-$(date +%s%N)
curl -X POST -H "Content-Type: application/json" -d '{
  "checkpoint_data": {
    "container_name": "'$CHECKPOINT_CONTAINER'",
    "sandbox_name": "'$CHECKPOINT_SANDBOX'",
    "namespace": "'$NAMESPACE'",
    "checkpoint_path": "'$CHECKPOINT_PATH'",
    "root": "'$ROOT'"
  },
  "leave_running": true
}' http://$CONTROLLER_URL:1324/checkpoint

Arguments:

  • container_name: Name of the container to checkpoint.
  • sandbox_name: Name of the sandbox to checkpoint.
  • namespace: Namespace in which the container resides.
  • checkpoint_path: Optional directory to dump checkpoint into (empty for remote).
  • root: The runc root path for your container runtime (see the ROOT values above).
  • leave_running: Optional flag; when true, the container keeps running after the checkpoint (default false, meaning the container is terminated on checkpoint).

Response:

  • checkpoint_id: A UUID associated with the checkpoint; used for remote restore.
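
Assuming checkpoint_id is returned as a top-level JSON field (an assumption about the response shape), you can capture it for a later remote restore; note that checkpoint_path is left empty for a remote checkpoint:

# Remote checkpoint: empty checkpoint_path, capture the returned checkpoint_id
export CHECKPOINT_ID=$(curl -s -X POST -H "Content-Type: application/json" -d '{
  "checkpoint_data": {
    "container_name": "'$CHECKPOINT_CONTAINER'",
    "sandbox_name": "'$CHECKPOINT_SANDBOX'",
    "namespace": "'$NAMESPACE'",
    "checkpoint_path": "",
    "root": "'$ROOT'"
  },
  "leave_running": true
}' http://$CONTROLLER_URL:1324/checkpoint | jq -r '.checkpoint_id')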

Restore

Restore a container from a checkpoint:

curl -X POST -H "Content-Type: application/json" -d '{
  "checkpoint_data": {
    "container_name": "'$RESTORE_CONTAINER'",
    "sandbox_name": "'$RESTORE_SANDBOX'",
    "namespace": "'$NAMESPACE'",
    "checkpoint_path": "'$CHECKPOINT_PATH'",
    "root": "'$ROOT'"
  }
}' http://$CONTROLLER_URL:1324/restore

Arguments:

  • container_name: Name of the container to restore into.
  • sandbox_name: Name of the sandbox to restore into.
  • namespace: Namespace in which the container resides.
  • checkpoint_path: Optional directory to restore checkpoint from (empty for remote).
  • checkpoint_id: Optional identifier to restore checkpoint from (empty for local).
  • root: The runc root path for your container runtime (see the ROOT values above).

Response:

  • Status Code: 200 OK
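
Since checkpoint_id can be supplied in place of checkpoint_path (see the arguments above), a remote restore might look like this sketch, which assumes the same field nesting as the local example:

curl -X POST -H "Content-Type: application/json" -d '{
  "checkpoint_data": {
    "container_name": "'$RESTORE_CONTAINER'",
    "sandbox_name": "'$RESTORE_SANDBOX'",
    "namespace": "'$NAMESPACE'",
    "checkpoint_path": "",
    "checkpoint_id": "'$CHECKPOINT_ID'",
    "root": "'$ROOT'"
  }
}' http://$CONTROLLER_URL:1324/restore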

CRI-O Rootfs Checkpoint

The rootfs checkpoint endpoint snapshots the read and write layers of a container in Kubernetes. We do an image diff against an original image ref and commit the changes to a new image ref. This endpoint specifically supports the CRI-O runtime manager; it will soon be deprecated in favor of a combined endpoint that is agnostic of your container runtime manager. The resulting image is pushed to the registry prefixed in the image ref, as long as the credentials live on the node. The image can then be used in a new pod spec.

Here is an example checkpoint hitting our REST service on port 1324 of the cedanacontroller-manager pod:

curl -X POST -H "Content-Type: application/json" -d '{
  "container_name": "'$CONTAINER_NAME'",
  "sandbox_name": "'$SANDBOX_NAME'",
  "namespace": "'$NAMESPACE'",
  "image_ref": "'$IMAGE_REF'",
  "new_image_ref": "'$NEW_IMAGE_REF'"
}' http://$CONTROLLER_URL:1324/checkpoint/rootfs/crio

Arguments:

  • container_name: Name of the container to checkpoint.
  • sandbox_name: Name of the sandbox to checkpoint.
  • namespace: Namespace in which the container resides.
  • image_ref: The reference of the original image to diff against (ex. cedana/checkpoint:latest).
  • new_image_ref: The new image reference to commit the changes to; the image is pushed to the registry prefixed in this ref.

Response:

  • Status Code: 200 OK
  • Returns the same image reference used in the CRI-O content store.
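
The committed image can then be referenced in a new pod spec, for example (pod and container names are placeholders):

restored-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: restored-from-rootfs    # hypothetical name
spec:
  containers:
    - name: app                 # hypothetical container name
      image: <new-image-ref>    # the $NEW_IMAGE_REF committed above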

Rootfs Checkpoint

The rootfs checkpoint endpoint snapshots the read and write layers of a container in Kubernetes. We do the image diff by way of the containerd image service. If you use either containerd or CRI-O as your higher-level runtime, both should support this feature, as both use the containerd content store.
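
The address argument used below is the containerd socket on your worker nodes; on a stock containerd install this is typically:

export ADDRESS=/run/containerd/containerd.sock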

Here is an example checkpoint hitting our REST service on port 1324 of the cedanacontroller-manager pod:

curl -X POST -H "Content-Type: application/json" -d '{
  "container_name": "'$CONTAINER_NAME'",
  "sandbox_name": "'$SANDBOX_NAME'",
  "namespace": "'$NAMESPACE'",
  "address": "'$ADDRESS'",
  "image_ref": "'$IMAGE_REF'"
}' http://$CONTROLLER_URL:1324/checkpoint/rootfs

Arguments:

  • container_name: Name of the container to checkpoint.
  • sandbox_name: Name of the sandbox to checkpoint.
  • namespace: Namespace in which the container resides.
  • address: The containerd sock address located on your worker nodes.
  • image_ref: The reference ID of your image (ex. cedana/checkpoint:latest).

Response:

  • Status Code: 200 OK
  • Returns the same image reference used in the containerd content store.

Rootfs Restore

Restore a container from a checkpointed rootfs image:

curl -X POST -H "Content-Type: application/json" -d '{
  "sandbox_name": "'$SANDBOX_NAME'",
  "container_name": "'$CONTAINER_NAME'",
  "namespace": "'$NAMESPACE'",
  "address": "'$ADDRESS'",
  "image_ref": "'$IMAGE_REF'"
}' http://$CONTROLLER_URL:1324/restore/rootfs

Arguments:

  • container_name: Name of the container to restore into.
  • sandbox_name: Name of the sandbox to restore into.
  • namespace: Namespace in which the container resides.
  • address: The containerd sock address located on your worker nodes.
  • image_ref: The reference ID of your image (ex. cedana/checkpoint:latest).

Response:

  • Status Code: 200 OK

Detach

To remove Cedana from your cluster, hit the DELETE endpoint:

curl -X DELETE -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AUTH_TOKEN" -d '{
  "server": "'$CLUSTER_URL'",
  "token": "'$SA_TOKEN'",
  "cert": "'$SA_CERT'"
}' https://sandbox.cedana.ai/kubernetes/destroy
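
To confirm removal, check that the Cedana resources are gone (again assuming the cedanacontroller-system namespace from the install):

# Both commands should come back empty after a successful detach
kubectl get pods -n cedanacontroller-system
kubectl get crd | grep -i cedana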