Additional Configuration

Configuration

This document outlines the configurable parameters for the Cedana Helm chart, as defined in the values.yaml file. For the most up-to-date list of values, see https://github.com/cedana/cedana-helm-charts/blob/main/cedana-helm/values.yaml.

Global Settings

These settings control the overall behavior of the deployment.

| Parameter | Description | Default |
|---|---|---|
| `nameOverride` | Overrides the name of the chart. | `"cedana"` |
| `fullnameOverride` | Overrides the full name of the release. | `"cedana"` |
| `installKueue` | If set to `true`, Kueue will be installed. Note: the Kueue CRDs must be applied before enabling this option. You can apply them with `kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.10.1/manifests.yaml`. | `false` |
| `kubernetesClusterDomain` | The Kubernetes cluster domain. | `cluster.local` |
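
A minimal sketch of overriding these settings in your own values file, assuming the keys sit at the top level of values.yaml as the table implies. The file name `my-values.yaml` is hypothetical; pass it to Helm with `-f my-values.yaml` when running `helm install` or `helm upgrade`.

```yaml
# my-values.yaml (hypothetical file name): top-level overrides
nameOverride: "cedana"
fullnameOverride: "cedana"
# Only enable after the Kueue CRDs have been applied (see installKueue above)
installKueue: true
kubernetesClusterDomain: cluster.local
```

Values you leave out of the file keep the chart defaults listed in the table.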

Cedana Configuration (cedanaConfig)

This section contains the core configuration for the Cedana platform.

| Parameter | Description | Default |
|---|---|---|
| `cedanaAuthToken` | Your authentication token for the Cedana platform. | `""` |
| `cedanaUrl` | The URL for the Cedana API. | `""` |
| `cedanaClusterName` | A unique name for your cluster within the Cedana platform. | `""` |
| `cedanaSqsQueueUrl` | The SQS queue URL used for communication with Cedana. | `""` |
| `checkpointStreams` | The number of parallel streams for checkpoint/restore operations. `0` disables streaming. | `4` |
| `checkpointCompression` | The compression algorithm for checkpoints. Options: `"none"`, `"tar"`, `"lz4"`, `"gzip"`, `"zlib"`. | `"lz4"` |
| `gpuPoolSize` | The number of GPU controllers to keep warm, which improves startup and restore time for GPU workloads. | `0` |
| `gpuFreezeType` | The default freeze type for GPU workloads. Options: `"IPC"`, `"NCCL"`. | `"IPC"` |
| `gpuShmSize` | The shared memory size for GPU workloads, in bytes. The default, 8589934592 bytes, is 8 GiB. | `"8589934592"` |
| `gpuLdLibPath` | Additional `LD_LIBRARY_PATH` entries used to locate CUDA libraries. | `"/run/nvidia/driver/usr/lib/x86_64-linux-gnu"` |
| `pluginsBuilds` | The build type for plugins. Use `"release"` for stable versions or `"alpha"` for development branches. | `"release"` |
| `pluginsNativeVersion` | The version of the native plugin to use. | `"latest"` |
| `pluginsCriuVersion` | The version of the CRIU plugin to use. | `"v4.1-cedana.01"` |
| `pluginsRuntimeShimVersion` | The version of the runtime shim plugin to use. | `"v0.6.1"` |
| `pluginsGpuVersion` | The version of the GPU plugin to use. | `"v0.5.5"` |
| `pluginsStreamerVersion` | The version of the streamer plugin to use. | `"v0.0.8"` |
| `profilingEnabled` | If `true`, enables profiling. | `true` |
| `metricsOtel` | If `true`, enables OpenTelemetry metrics. | `true` |
| `logLevel` | The logging level. | `"info"` |
| `preExistingSecret` | The name of a custom pre-existing secret to use. To enable it, uncomment this key in values.yaml and provide the secret's name. | Not set (commented out; example value `cedana-secret-user`) |
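
As a sketch, the snippet below fills in the `cedanaConfig` connection fields that default to empty strings and adjusts two checkpoint settings. Values in angle brackets are placeholders, `my-cluster` is a hypothetical cluster name, and the streaming and compression choices are illustrative rather than recommendations.

```yaml
cedanaConfig:
  # Placeholders: replace with the values for your Cedana account
  cedanaAuthToken: "<your-auth-token>"
  cedanaUrl: "<your-cedana-api-url>"
  cedanaClusterName: "my-cluster"   # hypothetical; must be unique within Cedana
  # Illustrative tuning: gzip compression, 8 parallel checkpoint/restore streams
  checkpointCompression: "gzip"
  checkpointStreams: 8
  logLevel: "info"
```

To keep the token out of the values file, you can instead set `preExistingSecret` to the name of an existing secret; check values.yaml for the keys that secret is expected to contain.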

Daemon Helper (daemonHelper)

Configuration for the daemon-helper DaemonSet.

| Parameter | Description | Default |
|---|---|---|
| `upgradeAndRestart` | If `true`, the daemon helper will be upgraded and restarted. | `false` |
| `service.annotations` | Annotations to add to the daemon helper service. | `{}` |
| `image.repository` | The repository for the cedana-helper image. | `cedana/cedana-helper` |
| `image.tag` | The tag for the cedana-helper image. | `v0.9.251` |
| `image.imagePullPolicy` | The image pull policy. | `IfNotPresent` |
| `updateStrategy.maxSurge` | The maximum number of pods that can be created above the desired number of pods during an update. | `0` |
| `updateStrategy.maxUnavailable` | The maximum number of pods that can be unavailable during the update process. | `1` |
| `tolerations` | Tolerations for the daemon helper pods. | `[]` |
| `affinity` | Affinity settings for the daemon helper pods. | `{}` |
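
A sketch of a `daemonHelper` override, assuming the dotted parameter names above map to nested keys under `daemonHelper`. It pins the helper image and lets the DaemonSet schedule onto tainted GPU nodes. The `nvidia.com/gpu` taint is an assumption about your nodes; use whatever taints your cluster actually applies.

```yaml
daemonHelper:
  image:
    repository: cedana/cedana-helper
    tag: v0.9.251              # pin explicitly so upgrades are deliberate
    imagePullPolicy: IfNotPresent
  updateStrategy:
    maxSurge: 0                # no extra pods are created during a rollout
    maxUnavailable: 1          # at most one helper pod is down at a time
  tolerations:
    # Hypothetical taint; match the taints on your own nodes
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
```

Kubernetes rejects a rolling update in which `maxSurge` and `maxUnavailable` are both 0, so keep at least one of them non-zero.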

Service Account (serviceAccount)

Configuration for the Kubernetes Service Account.

| Parameter | Description | Default |
|---|---|---|
| `create` | If `true`, a Service Account will be created. | `true` |
| `automount` | If `true`, the Service Account's API credentials will be automatically mounted. | `true` |
| `annotations` | Annotations to add to the Service Account. | `{}` |
| `name` | The name of the Service Account. If not set and `create` is `true`, a name is generated. | `"cedana-controller-manager"` |

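To reuse a Service Account that already exists instead of letting the chart create one, a sketch would set `create` to `false` and point `name` at it. `my-existing-sa` is a hypothetical name, and the account must already exist in the release namespace.

```yaml
serviceAccount:
  create: false
  automount: true
  # Hypothetical name of a Service Account you manage yourself
  name: my-existing-sa
```
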
Controller Manager (controllerManager)

Configuration for the cedana-controller-manager.

| Parameter | Description | Default |
|---|---|---|
| `autoscaling.enabled` | If `true`, enables autoscaling for the controller manager. | `false` |
| `autoscaling.replicaCount` | The number of replicas for the controller manager. | `1` |
| `autoscaling.deploymentRevisionHistoryLimit` | The number of old ReplicaSets to retain. | `10` |
| `service.annotations` | Annotations for the controller manager service. | `{}` |
| `service.ports` | The ports for the controller manager service. | `TCP/1324` |
| `manager.podAnnotations` | Annotations for the controller manager pods. | `{}` |
| `manager.args` | Arguments for the controller manager container. | `[--health-probe-bind-address=:8081, --metrics-bind-address=127.0.0.1:8080, --leader-elect]` |
| `manager.containerSecurityContext` | The security context for the manager container, which is configured to be non-privileged. | `allowPrivilegeEscalation: false, capabilities: { drop: [ALL] }` |
| `manager.image.repository` | The repository for the cedana-controller image. | `cedana/cedana-controller` |
| `manager.image.tag` | The tag for the cedana-controller image. | `v0.4.5` |
| `manager.image.imagePullPolicy` | The image pull policy. | `IfNotPresent` |
| `manager.resources` | Resource requests and limits for the manager container. Empty by default; uncomment the entries in values.yaml to set custom values. | `{}` |
| `rbac.resources` | Resource requests and limits for the RBAC proxy container. Empty by default; uncomment the entries in values.yaml to set custom values. | `{}` |
| `tolerations` | Tolerations for the controller manager pods. | `[]` |
| `affinity` | Affinity settings for the controller manager pods. | `{}` |
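
A sketch of setting explicit resources for the manager and RBAC proxy containers, assuming `manager` and `rbac` are nested under `controllerManager` as the table implies. The CPU and memory figures are illustrative only, not Cedana recommendations; size them for your cluster.

```yaml
controllerManager:
  manager:
    image:
      repository: cedana/cedana-controller
      tag: v0.4.5
    # Illustrative values only
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 500m
        memory: 256Mi
  rbac:
    # Illustrative values for the RBAC proxy container
    resources:
      requests:
        cpu: 10m
        memory: 64Mi
```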

Shared Memory Configuration (shmConfig)

Optional configuration to increase /dev/shm size on nodes, which is useful for workloads requiring large shared memory.

| Parameter | Description | Default |
|---|---|---|
| `enabled` | Set to `true` to enable the increase of /dev/shm size. | `false` |
| `size` | The desired size for /dev/shm (e.g., `"10G"`, `"20G"`). | `"10G"` |
| `minBytes` | The size threshold, in bytes, used to decide whether /dev/shm is remounted. The default, 10737418240 bytes, is 10 GiB. | `"10737418240"` |

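A sketch of raising /dev/shm to 20G. The defaults pair `size: "10G"` with `minBytes: "10737418240"` (10 × 1024³ bytes, i.e. 10 GiB), so this example keeps the two consistent by setting `minBytes` to 20 GiB. That pairing is an assumption drawn from the defaults; check values.yaml for exactly how the threshold is applied.

```yaml
shmConfig:
  enabled: true
  size: "20G"
  # 20 * 1024^3 bytes = 20 GiB, mirroring how the defaults pair 10G with 10737418240
  minBytes: "21474836480"
```
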
Metrics Service (metricsService)

Configuration for the metrics service.

| Parameter | Description | Default |
|---|---|---|
| `ports` | The ports for the metrics service. | `HTTPS/8443` |
| `type` | The type of the metrics service. | `ClusterIP` |
