Examples

Below you'll find examples of Cedana in use. They're intended to show the breadth of applications and systems that Cedana can enable.

Deploying a Jupyter Notebook

You want to run a Jupyter notebook in a cloud environment, taking advantage of GPU resources.

A sample job.yml file for this would look like:

    instance_specs:
      vram_gb: 8
      gpu: "NVIDIA"

    work_dir: "work_dir"

    setup:
      run:
        - "sudo apt-get update && sudo apt-get install -y python3-pip python3-dev"
        - "pip install jupyter"

    task:
      run:
        - "jupyter notebook --port 8080"

Calling cedana-cli run job.yml spins up the optimal instance for you (see Optimizer FAQ for more details).
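
To open the notebook from your local machine, one option is a plain SSH port forward to the instance. This is just a sketch: <ssh-user> and <instance-ip> are placeholders, and how you obtain them depends on your setup.

    # forward the remote notebook port (8080) to a local port (8888)
    ssh -L 8888:localhost:8080 <ssh-user>@<instance-ip>

    # then browse to http://localhost:8888 locally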

Say you plan on walking away from the Jupyter notebook for the night and don't want to spend $20 x 8h (roughly the cost of an A100 on AWS) on wasted compute time. To checkpoint your process exactly as it is, you can run:

    cedana-cli commune checkpoint -j JOBID 

The checkpoint sits in our managed blob storage, and you can use it to restore onto a fresh instance and continue working with:

    cedana-cli restore job -j JOBID 

which pulls the latest checkpoint and restores it onto a new instance.
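
Putting the pieces together, an end-of-day checkpoint/restore cycle uses only the commands above:

    # launch the notebook job
    cedana-cli run job.yml

    # before stepping away for the night, checkpoint the running process
    cedana-cli commune checkpoint -j JOBID

    # the next morning, restore the latest checkpoint onto a fresh instance
    cedana-cli restore job -j JOBID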

Running llama.cpp inference

Running a llama.cpp inference server using Cedana is even easier. To quickly get started:

    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp

Download the weights and store them in llama.cpp/models/. Check out the llama.cpp documentation for guides on how to get the LLaMA weights.

Assuming you've prepared the data as instructed by the llama.cpp README, you can either push the entire folder during the instance setup process (not recommended, since moving 20+ GB over the wire takes a long time) or trim the models folder down to just the quantized model.
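
For reference, producing that quantized model follows the llama.cpp README of the ggml era; the script and binary names below (convert.py, quantize) come from that README and may differ in newer llama.cpp versions:

    # convert the raw LLaMA weights in models/7B/ to ggml FP16 format
    python3 convert.py models/7B/

    # quantize the FP16 model down to 4-bit (q4_0)
    ./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin q4_0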

Your models folder should look like this:

    |-- models
    |   |-- 7B
    |   |   |-- ggml-model-q4_0.bin
    |   |   `-- params.json
    |   |-- ggml-vocab.bin
    |   |-- tokenizer.model
    |   `-- tokenizer_checklist.chk

With this, spinning up inference is super simple on Cedana:

    instance_specs:
      max_price_usd_hour: 1.0
      memory_gb: 16

    work_dir: "llama.cpp" # assuming models dir is populated w/ only a quantized ggml model

    setup:
      run:
        - "cd llama.cpp && make -j" # might have to make again if a different arch
    task:
      run:
        - "cd llama.cpp && ./server -m models/7B/ggml-model-q4_0.bin -c 2048" # if we've sent a quantized model already over ssh, can just start the server

Once spun up (using cedana-cli run llama_7b.yml), you have an inference server running in the cloud, on a spot instance that's managed for you.
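
To sanity-check the deployment, you can hit the server's HTTP completion endpoint. This is a sketch: <instance-ip> is a placeholder for your instance's address, and it assumes the server is listening on its default port, 8080.

    curl --request POST \
      --url http://<instance-ip>:8080/completion \
      --header "Content-Type: application/json" \
      --data '{"prompt": "Building a website can be done in 10 simple steps:", "n_predict": 128}'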