API

The Kubernetes controller ships with API endpoints, so once it is set up you can use the service to checkpoint and restore.

Checkpoint/Restore - REST Service

The Cedana REST Service provides a REST API for checkpointing and restoring containers in your Kubernetes cluster. The API runs concurrently with the Cedana Controller. Below are curl commands illustrating the schema of the API. All curl commands use the in-cluster IP of the cedanacontroller pod. To checkpoint and restore from outside the cluster, you can expose the pod and create an external IP address with a Kubernetes Service, or forward the controller's port to your local machine:

# Name of the controller pod (requires CEDANA_NAMESPACE to be set)
export CEDANA_CONTROLLER=$(kubectl get pods -n "$CEDANA_NAMESPACE" | grep manager | awk '{print $1}')

# The release name is "cedana" if you installed with: helm install "cedana" $CHART_PATH
export HELM_INSTALL_NAME="cedana"

# Forward local port 1324 to port 1324 of the manager
kubectl port-forward "$HELM_INSTALL_NAME-cedana-helm-manager" -n "$CEDANA_NAMESPACE" 1324:1324
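As an alternative to port-forwarding, exposing the controller pod through a Kubernetes Service could look roughly like the sketch below. It is not taken from the Cedana documentation: the Service name `cedana-api`, the fallback pod name, and the `DRY_RUN` switch are hypothetical, and port 1324 is assumed from the port-forward command above. By default the script only prints the kubectl commands it would run.

```shell
#!/bin/sh
# Sketch: expose the controller pod through a LoadBalancer Service.
# Assumptions: port 1324 comes from the port-forward command; the Service
# name "cedana-api" and the fallback pod name are placeholders.
CEDANA_NAMESPACE="${CEDANA_NAMESPACE:-default}"
CEDANA_CONTROLLER="${CEDANA_CONTROLLER:-cedana-controller-pod}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

expose_controller() {
  # Create a Service that selects the controller pod by its labels.
  run kubectl expose pod "$CEDANA_CONTROLLER" \
    --namespace "$CEDANA_NAMESPACE" \
    --name cedana-api \
    --type LoadBalancer \
    --port 1324 --target-port 1324
  # The EXTERNAL-IP column is filled in once the provider assigns an address.
  run kubectl get service cedana-api --namespace "$CEDANA_NAMESPACE"
}

expose_controller
```

Run it with `DRY_RUN=0` to apply the commands. Note that `kubectl expose pod` needs the pod to carry labels, and whether a `LoadBalancer` Service receives an external IP depends on your cluster provider.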

Known Limitations and Pitfalls

Checkpoint and restore of the io_uring API is not currently supported. Consider checkpointing before the application creates and sets up its rings.
The checkpointing services do not automatically detect your setup and change their behaviour. For CRI-O and rootfs checkpoints, for example, use the separately provided API endpoints.
Restore requires that the pod being restored into is not active. In general, this means putting the restore pod to sleep with a custom command and args, such as while true; do sleep infinity; done;.
Read-only AMIs are not supported. For example, the GKE-optimized images won't work.
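To illustrate the sleeping restore pod described above, here is a minimal sketch that renders a pod manifest whose container runs only the sleep loop. The pod name, namespace variable, and image are hypothetical placeholders; in practice the image would presumably need to match the workload being restored.

```shell
#!/bin/sh
# Sketch: render a manifest for a pod that idles until a restore happens.
# All names below are placeholders, not values from the Cedana docs.
POD_NAME="${POD_NAME:-restore-target}"
RESTORE_NAMESPACE="${RESTORE_NAMESPACE:-default}"
IMAGE="${IMAGE:-ubuntu:22.04}"

render_restore_pod() {
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${POD_NAME}
  namespace: ${RESTORE_NAMESPACE}
spec:
  containers:
    - name: ${POD_NAME}
      image: ${IMAGE}
      # Custom command and args keep the container asleep.
      command: ["/bin/sh", "-c"]
      args: ["while true; do sleep infinity; done;"]
EOF
}

# Print the manifest; pipe it to kubectl to create the pod:
#   render_restore_pod | kubectl apply -f -
render_restore_pod
```

Printing the manifest first makes it easy to review before applying it. The loop relies on the image's `sleep` accepting `infinity`, as GNU coreutils does.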


