Managing Kubernetes Jobs
The right question is: what schedulers don't we support?
apiVersion: batch/v1
kind: Job
metadata:
  name: cuda-vector-add-job
  namespace: default
spec:
  backoffLimit: 1
  completions: 1
  parallelism: 1
  completionMode: NonIndexed
  manualSelector: false
  podReplacementPolicy: TerminatingOrFailed
  template:
    metadata:
      labels:
        job-name: cuda-vector-add-job
    spec:
      restartPolicy: Never
      runtimeClassName: cedana
      priorityClassName: indiv-priority
      volumes:
        - name: repo-data
          emptyDir: {}
      initContainers:
        - name: clone-repo
          image: alpine/git:latest
          command:
            - sh
            - -c
            - |
              echo "Init ran at $(date)" >> /workspace/init-log.txt
              git clone https://github.com/mirror/busybox.git /workspace || echo "Clone failed"
          volumeMounts:
            - name: repo-data
              mountPath: /workspace
      containers:
        - name: cuda-vector-add
          image: cedana/cedana-samples:cuda
          command:
            - /bin/sh
            - -c
            - |
              gpu_smr/vector_add
          env:
            - name: CEDANA_CHECKPOINT
              value: job-preemption-test-3
          resources:
            limits:
              nvidia.com/gpu: 1 # request one GPU
            requests:
              nvidia.com/gpu: 1
          volumeMounts:
            - name: repo-data
              mountPath: /workspace
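
The Job above sets priorityClassName: indiv-priority, so a PriorityClass with that name has to exist in the cluster before the pod can be admitted. The source does not show that object, so the following is only a minimal sketch of what it might look like. The name comes from the manifest; the value, preemptionPolicy, and description are assumptions chosen for illustration and should be adjusted to match your cluster's priority scheme.

```yaml
# Hypothetical PriorityClass backing the Job's priorityClassName.
# Only the name is taken from the Job manifest; every other field is an assumption.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: indiv-priority
value: 1000                               # assumed; higher values are scheduled first
globalDefault: false                      # do not apply to pods without a priorityClassName
preemptionPolicy: PreemptLowerPriority    # Kubernetes default, stated explicitly
description: "Assumed priority class for individual GPU jobs."
```

PriorityClass is cluster-scoped, so it takes no namespace. It would need to be created before the Job is submitted.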

