Managing Storage
While we can move the state of your process (including both CPU and GPU state), open files need some careful consideration.
A restored or migrated process expects each file it was writing to have exactly the same size as at checkpoint time, so that it can pick up where it left off. We currently handle such files in three ways, one for each of three filesystem-writing regimes:
Files up to roughly 100 MB: Tmpfs snapshotting
Files between roughly 100 MB and 1 GB: Rootfs snapshotting
Files much larger than 1 GB: Volume snapshotting
Each scenario is described below.
Tmpfs Snapshotting
The first and recommended method is as simple as telling Cedana that you'd like a folder to be mounted as a Cedana-managed folder. In this case, we just take the files with us on checkpoint, so no extra consideration is required.
To use this feature, set the CEDANA_PERSISTENT_MOUNTS environment variable on your container to a comma-separated list of the folders you'd like "mounted" inside the container.
Here's an example:
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  namespace: default
spec:
  restartPolicy: Never
  runtimeClassName: cedana
  containers:
    - name: runner
      image: some-image
      imagePullPolicy: Always
      command: ["sh", "-c", "while true; do date >> /tmp/out/current_date.txt; sleep 1; done"]
      env:
        - name: CEDANA_PERSISTENT_MOUNTS
          value: /tmp/out
Note that this requires runtimeClassName to be set to cedana. 
This is very useful for:
Small files
JIT-compiled files (such as those Triton produces)
Intermediate/scratch files created during execution
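To manage more than one folder, the value is a comma-separated list. The fragment below is a minimal sketch of the env section only; it assumes the same variable name as the pod example above, and both paths are hypothetical:

```yaml
# Sketch only: the folder paths are illustrative placeholders.
env:
  - name: CEDANA_PERSISTENT_MOUNTS
    # Comma-separated, no spaces between entries (assumed format).
    value: /tmp/out,/tmp/cache
```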
Rootfs Snapshotting 
The second method takes the container's entire root filesystem with us on checkpoint. Because the diff between checkpoint intervals can vary unpredictably, we push it to your image registry, which requires some extra setup, described below.
Secrets setup
Add your image registry secret to our cloud using your account:
curl -X POST <CEDANA-URL>/v2/secrets \
    -H 'Authorization: Bearer <CEDANA-API-KEY>' \
    -H 'Content-Type: application/json' \
    -d '{ "image_source" : "docker.io/cedana/cedana-checkpoints",  "image_secret": "user:<access-token>" }'
You can also add it via the UI.
Because this is still a fairly new feature, we may wipe stored secrets whenever we update our services, to minimize exposure of any customer secrets.
This may be annoying, so let us know via support and we will get them permanently updated for your deployment.
Performing a rootfs snapshot
You can choose to perform a rootfs-only snapshot (where the process state is not saved) or a runc+rootfs snapshot, which includes container runtime state. 
You can trigger rootfs (filesystem) checkpoints and restores however you like, either through the UI or via our API.
E2E UI Example
Volume Snapshotting
The final method requires coordination with your CSI driver. We take advantage of the volume snapshot primitives already present in Kubernetes (https://kubernetes.io/docs/concepts/storage/volume-snapshots/) and carry a reference to the snapshot with the checkpoint, so a restore starts from a Kubernetes Volume Snapshot.
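For orientation, this is roughly what the underlying Kubernetes primitives look like. It is a generic sketch of the standard VolumeSnapshot API, not Cedana-specific configuration; whether these objects are created for you or by you is not specified here. All names (data-snapshot, data-pvc, csi-snapclass, csi-storage) and the 10Gi size are hypothetical placeholders:

```yaml
# Sketch only: every name and size below is an illustrative placeholder.
# 1. Snapshot an existing PVC through the CSI driver.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: data-snapshot
  namespace: default
spec:
  # Must refer to a VolumeSnapshotClass backed by your CSI driver.
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: data-pvc
---
# 2. On restore, a new PVC is populated from that snapshot.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc-restored
  namespace: default
spec:
  storageClassName: csi-storage
  dataSource:
    name: data-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```

This only works if your cluster has the snapshot CRDs and a CSI driver that supports snapshots installed.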