# Manual Checkpoint/Restore

For CPU workloads, no additional configuration is required. With Cedana running on your cluster, you can start by deploying the sample stateful reinforcement learning job below (running [Stable Baselines 3](https://github.com/DLR-RM/stable-baselines3)).

## Deploy

```yaml
# test-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: cedana-sample-ppo-sb3
  labels:
    app: cedana-sample-ppo-sb3
spec:
  restartPolicy: Never
  containers:
    - name: cedana-sample-container
      image: "cedana/cedana-samples:latest"
      command: ["python3", "/app/cpu_smr/rl/ppo_sb3.py"]
      resources:
        requests:
          cpu: "1"
        limits:
          cpu: "1"
```

{% hint style="info" %}
For automation such as resuming on failure, [Kubernetes Jobs](https://kubernetes.io/docs/concepts/workloads/controllers/job/) are usually a better fit than a bare Pod. See [checkpoint/restoring Jobs](/cedana-kubernetes/jobs.md) for more information.
{% endhint %}

Deploy this pod to your cluster using:

```bash
kubectl apply -f test-pod.yaml
```
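Before checkpointing, it can help to confirm that the pod is actually running and to note which node it landed on. The sketch below wraps standard `kubectl` commands in two helper functions; the function names `wait_and_locate` and `recent_logs` are illustrative and not part of Cedana, and the 120-second timeout is an arbitrary choice.

```bash
# Name of the sample pod from test-pod.yaml.
POD="cedana-sample-ppo-sb3"

# Wait until the pod is Ready, then print the node it was scheduled on.
# Noting the node now makes it easy to spot a move after a later restore.
wait_and_locate() {
  kubectl wait --for=condition=Ready "pod/${POD}" --timeout=120s >&2 || return 1
  kubectl get pod "${POD}" -o jsonpath='{.spec.nodeName}'
}

# Print the most recent training output from the pod.
recent_logs() {
  kubectl logs "${POD}" --tail=20
}
```

After `kubectl apply`, running `wait_and_locate` prints the node name once the pod is Ready, and `recent_logs` shows how far training has progressed.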

## Checkpoint

You can either create a heartbeat policy that checkpoints the pod automatically at regular intervals, or checkpoint it manually from the [Pods Page](https://ui.cedana.com/monitoring/pods).

## Restore

You can manually restore the workload on the [Checkpoints Page](https://ui.cedana.com/checkpoints).

{% hint style="info" %}
Automated restores currently work best *within* the Kubernetes lifecycle, where we integrate cleanly. For example, if the pod is managed by a controller (e.g. a [Kubernetes Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/)) that automatically reschedules it on node failure or eviction, the rescheduled pod restores from the latest checkpoint instead of starting from scratch. See [checkpoint/restoring Jobs](/cedana-kubernetes/jobs.md) for more information.
{% endhint %}
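After a restore, you may want to check where the workload is running now. The sketch below uses only standard `kubectl` commands. It assumes the restored pod still carries the `app=cedana-sample-ppo-sb3` label from `test-pod.yaml`, which this page does not confirm, and the helper names `list_sample_pods` and `moved_from` are illustrative.

```bash
# Assumption: the restored pod keeps the app label from test-pod.yaml.
SELECTOR="app=cedana-sample-ppo-sb3"

# Print "<pod name> <node name>" for every pod matching the label.
list_sample_pods() {
  kubectl get pods -l "${SELECTOR}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.nodeName}{"\n"}{end}'
}

# Succeed when the workload now runs on a different node.
# $1: node noted before the checkpoint, $2: node observed after the restore.
moved_from() {
  [ -n "$2" ] && [ "$1" != "$2" ]
}
```

For example, `moved_from node-a node-b && echo "workload moved"` prints the message only when the two node names differ. A restore onto the same node is also possible, in which case the check simply fails.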

## Example

The video below walks through this workflow:

{% embed url="https://youtu.be/J_yEUqt66Rw" %}

If you've made it this far, congratulations! You've successfully used Cedana to move a stateful workload between nodes and have it pick up work where it left off.

Take a look at the left sidebar to see more examples, such as [GPU Save/Migrate/Resume on Kubernetes](https://github.com/cedana/cedana-gitbook/blob/main/examples/gpu-smr-on-kubernetes.md).

