Quickstart
This guide provides a brief overview of the basic commands for checkpointing and restoring SLURM jobs using Cedana.
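The examples use the following sample batch script, which prints a counter once per second to give you a long-running job to checkpoint and restore:

```bash
#!/bin/bash
#SBATCH --job-name=hello_world # Job name
#SBATCH --output=hello_world.out # Standard output log
#SBATCH --error=hello_world.err # Standard error log
#SBATCH --time=00:15:00 # Time limit (hh:mm:ss); the loop alone runs for just over 10 minutes
#SBATCH --nodes=1 # Run on 1 node
#SBATCH --ntasks=1 # Run 1 task

echo "Starting counter job on $(hostname)..."

# Loop from 0 to 600, printing one value per second (601 iterations)
for i in {0..600}
do
  echo "Counter: $i"
  sleep 1
done

echo "Job finished successfully."
```

Checkpointing a job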
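A job has to be submitted and running before it can be checkpointed. The minimal sketch below wraps the standard SLURM `sbatch` command; it assumes the sample script is saved as `hello_world.sh` (a hypothetical filename), and the `submit_job` function name is illustrative only. With `--parsable`, `sbatch` prints just the job ID (followed by `;cluster` on multi-cluster setups), so the ID can be captured in a variable.

```bash
#!/bin/bash
# Hypothetical helper: submit a batch script and print only its job ID.
# Assumes the sample script is saved as hello_world.sh.
submit_job() {
  local script="${1:-hello_world.sh}"
  # --parsable prints "jobid" or "jobid;cluster"; keep the part before ";".
  sbatch --parsable "$script" | cut -d ';' -f 1
}

# Example usage on a SLURM login node:
#   JOB_ID=$(submit_job hello_world.sh)
#   squeue -j "$JOB_ID"        # confirm the job is running
#   tail -f hello_world.out    # watch the counter advance
```

Capturing the ID this way saves copying it out of the usual `Submitted batch job <id>` message.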

Restoring a job
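Because the sample job prints a counter, it also offers a simple way to check a restore: a job that resumed from its checkpoint keeps counting up from the value it had reached, instead of starting again from 0. The helper below is a hypothetical sketch, not part of Cedana or SLURM; it prints the most recent counter value from a log file and assumes `hello_world.out` by default, so point it at whichever file the job is actually writing to in your setup.

```bash
#!/bin/bash
# Hypothetical helper, not part of Cedana or SLURM: print the most recent
# counter value from a log written by the sample job.
last_counter() {
  local log_file="${1:-hello_world.out}"
  # Keep only "Counter: N" lines, take the last one, and strip the prefix.
  grep '^Counter: ' "$log_file" | tail -n 1 | sed 's/^Counter: //'
}

# Example usage:
#   before=$(last_counter hello_world.out)   # just before checkpointing
#   after=$(last_counter hello_world.out)    # some time after restoring
#   echo "before: $before, after: $after"
```

Run it once just before checkpointing and again after restoring; for a resumed job, the second value continues on from the first.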
