Level 3 - Customization

Customizing the system to fit your needs!

By now, you should have been able to:

  • Checkpoint and restore workloads from our library of GPU samples

  • Hotswap running workloads safely and effectively

  • Configure automation that takes heartbeat checkpoints in the background and performs automatic restores, either to recover from failures or to forcibly evict pods from a node

To round things out, let's step through some of the customizations you can make to your Cedana deployment to suit your needs. The full list is in Additional Configuration.

Storage

By default, checkpoints are sent to our Google Cloud bucket, which is not optimal from a latency perspective. Storage is also the first component to decouple if you'd like to deploy Cedana in a hermetic, self-hosted environment!

Fortunately, our system can stream checkpoints anywhere it can write to. For instance, to use your own S3 bucket, specify the following values during a Helm install:

  • checkpointDir: s3://<bucket>/<path>

  • awsAccessKeyId

  • awsSecretAccessKey

  • awsRegion

Then reinstall Cedana with Helm so the new values take effect!
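As a rough sketch, the four values above can be passed on the command line with Helm's `--set-string` flag. The Python below assumes the chart accepts them as top-level keys named exactly as listed; the release name, the chart reference, and the helpers `parse_checkpoint_dir` and `helm_storage_args` are hypothetical, so check Additional Configuration for the actual value paths before relying on it.

```python
import os
import shlex
from urllib.parse import urlparse


def parse_checkpoint_dir(checkpoint_dir: str) -> tuple[str, str]:
    """Split an s3://<bucket>/<path> URI into (bucket, path); raise on bad input."""
    parsed = urlparse(checkpoint_dir)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected s3://<bucket>/<path>, got {checkpoint_dir!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def helm_storage_args(release: str, chart: str, checkpoint_dir: str,
                      access_key_id: str, secret_access_key: str,
                      region: str) -> list[str]:
    """Build a `helm upgrade --install` argv that sets the four storage values.

    ASSUMPTION: the chart exposes these as top-level values named exactly
    checkpointDir, awsAccessKeyId, awsSecretAccessKey and awsRegion.
    """
    parse_checkpoint_dir(checkpoint_dir)  # fail fast on a malformed URI
    values = {
        "checkpointDir": checkpoint_dir,
        "awsAccessKeyId": access_key_id,
        "awsSecretAccessKey": secret_access_key,
        "awsRegion": region,
    }
    # --reuse-values keeps values from the existing release instead of
    # resetting everything else to chart defaults.
    args = ["helm", "upgrade", "--install", release, chart, "--reuse-values"]
    for key, value in values.items():
        # --set-string stops Helm from coercing values into numbers/booleans.
        args += ["--set-string", f"{key}={value}"]
    return args


if __name__ == "__main__":
    argv = helm_storage_args(
        release="cedana",                # hypothetical release name
        chart="<your-cedana-chart>",     # placeholder: the chart you installed from
        checkpoint_dir="s3://my-bucket/cedana/checkpoints",
        access_key_id=os.environ.get("AWS_ACCESS_KEY_ID", ""),
        secret_access_key=os.environ.get("AWS_SECRET_ACCESS_KEY", ""),
        region=os.environ.get("AWS_REGION", "us-east-1"),
    )
    # Print the command with the secret masked; pass `argv` to
    # subprocess.run(argv, check=True) to actually apply it.
    print(shlex.join(
        "awsSecretAccessKey=****" if a.startswith("awsSecretAccessKey=") else a
        for a in argv
    ))
```

Passing secrets with `--set-string` can leave them in shell history and process listings, so a values file passed with `-f` and kept out of version control may be preferable in production.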

Self-Hosting

As mentioned above, Cedana can be self-hosted; however, we offer this on a case-by-case basis because access requirements differ between customers. We are SOC 2 Type II compliant: reach out to [email protected] for the full report.
