Configuration
This document outlines the configurable parameters for the Cedana Helm chart, as defined in the values.yaml file. For the up-to-date configuration, see https://github.com/cedana/cedana-helm-charts/blob/main/cedana-helm/values.yaml.
Global Settings
These settings control the overall behavior of the deployment.
nameOverride
Overrides the name of the chart.
"cedana"
fullnameOverride
Overrides the full name of the release.
"cedana"
installKueue
If set to true, Kueue will be installed. Note: The Kueue CRDs must be applied before enabling this option. You can apply them with kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.10.1/manifests.yaml.
false
kubernetesClusterDomain
The Kubernetes cluster domain.
cluster.local
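A minimal values override for the global settings might look like the sketch below. It assumes these keys sit at the top level of values.yaml, as this section's layout suggests; the file name my-values.yaml is hypothetical.

```yaml
# my-values.yaml (hypothetical file name)
# Assumes the global settings are top-level keys in values.yaml.
nameOverride: "cedana"
fullnameOverride: "cedana"
# Apply the Kueue CRDs first (see the installKueue note) before setting this to true.
installKueue: true
kubernetesClusterDomain: cluster.local
```

A file like this is typically passed to Helm with the -f flag at install or upgrade time.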
Cedana Configuration (cedanaConfig)
This section contains the core configuration for the Cedana platform.
cedanaAuthToken
Your authentication token for the Cedana platform.
""
cedanaUrl
The URL for the Cedana API.
""
cedanaClusterName
A unique name for your cluster within the Cedana platform.
""
cedanaSqsQueueUrl
The SQS queue URL for communication with Cedana.
""
checkpointStreams
The number of parallel streams for checkpoint/restore operations. 0 disables streaming.
4
checkpointCompression
The compression algorithm for checkpoints. Options: "none", "tar", "lz4", "gzip", "zlib".
"lz4"
gpuPoolSize
The number of GPU controllers to keep warm to improve startup/restore time for GPU workloads.
0
gpuFreezeType
The default freeze type for GPU workloads. Options: "IPC", "NCCL".
"IPC"
gpuShmSize
The shared memory size for GPU workloads, in bytes. The default is 8 GiB (8589934592 bytes).
"8589934592"
gpuLdLibPath
Additional LD_LIBRARY_PATH to locate CUDA libraries.
"/run/nvidia/driver/usr/lib/x86_64-linux-gnu"
pluginsBuilds
Specifies the build type for plugins. Use "release" for stable versions or "alpha" for development branches.
"release"
pluginsNativeVersion
The version of the native plugin to use.
"latest"
pluginsCriuVersion
The version of the CRIU plugin to use.
"v4.1-cedana.01"
pluginsRuntimeShimVersion
The version of the runtime shim plugin to use.
"v0.6.1"
pluginsGpuVersion
The version of the GPU plugin to use.
"v0.5.5"
pluginsStreamerVersion
The version of the streamer plugin to use.
"v0.0.8"
profilingEnabled
If true, enables profiling.
true
metricsOtel
If true, enables OpenTelemetry metrics.
true
logLevel
The logging level.
"info"
preExistingSecret
To use a custom pre-existing secret, uncomment this setting and provide the secret's name.
cedana-secret-user (commented out)
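As an illustration of how the cedanaConfig parameters above fit together, here is a hedged sketch of an override. The token, URL, and cluster name are placeholders, and the gpuPoolSize value is a hypothetical choice, not a recommendation from the chart.

```yaml
cedanaConfig:
  # Placeholders: substitute the values issued for your Cedana account.
  cedanaAuthToken: "<your-auth-token>"
  cedanaUrl: "<your-cedana-api-url>"
  cedanaClusterName: "my-cluster"   # hypothetical name
  # Checkpoint behavior: 4 parallel streams with lz4 compression (the documented defaults).
  checkpointStreams: 4
  checkpointCompression: "lz4"
  # Keep one GPU controller warm (hypothetical value; the default is 0).
  gpuPoolSize: 1
  gpuFreezeType: "IPC"
  logLevel: "info"
```

Settings left out of the override keep the defaults listed in this section.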
Daemon Helper (daemonHelper)
Configuration for the daemon-helper DaemonSet.
upgradeAndRestart
If true, the daemon helper will be upgraded and restarted.
false
service.annotations
Annotations to add to the daemon helper service.
{}
image.repository
The repository for the cedana-helper image.
cedana/cedana-helper
image.tag
The tag for the cedana-helper image.
v0.9.251
image.imagePullPolicy
The image pull policy.
IfNotPresent
updateStrategy.maxSurge
The maximum number of pods that can be created over the desired number of pods.
0
updateStrategy.maxUnavailable
The maximum number of pods that can be unavailable during the update process.
1
tolerations
Tolerations for the daemon helper pods.
[]
affinity
Affinity settings for the daemon helper pods.
{}
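The dotted parameter names above suggest a nested layout under daemonHelper. The sketch below assumes that nesting and assumes tolerations follow the standard Kubernetes pod toleration shape; the taint key shown is a hypothetical example for GPU nodes.

```yaml
daemonHelper:
  image:
    repository: cedana/cedana-helper
    tag: v0.9.251
    imagePullPolicy: IfNotPresent
  updateStrategy:
    # Replace pods one at a time without creating extras (the documented defaults).
    maxSurge: 0
    maxUnavailable: 1
  tolerations:
    # Hypothetical taint key; use whatever taints your nodes actually carry.
    - key: "nvidia.com/gpu"
      operator: "Exists"
      effect: "NoSchedule"
  affinity: {}
```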
Service Account (serviceAccount)
Configuration for the Kubernetes Service Account.
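As a rough illustration, the parameters in this section could be combined as in the sketch below. The annotation key and value are hypothetical and only show where annotations would go.

```yaml
serviceAccount:
  create: true
  automount: true
  annotations:
    # Hypothetical annotation, shown only to illustrate the map shape.
    example.com/owner: "platform-team"
  name: "cedana-controller-manager"
```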
create
If true, a Service Account will be created.
true
automount
If true, the Service Account's API credentials are automatically mounted.
true
annotations
Annotations to add to the Service Account.
{}
name
The name of the Service Account. If not set and create is true, a name is generated.
"cedana-controller-manager"
Controller Manager (controllerManager)
Configuration for the cedana-controller-manager.
autoscaling.enabled
If true, enables autoscaling for the controller manager.
false
autoscaling.replicaCount
The number of replicas for the controller manager.
1
autoscaling.deploymentRevisionHistoryLimit
The number of old ReplicaSets to retain.
10
service.annotations
Annotations for the controller manager service.
{}
service.ports
The ports for the controller manager service.
TCP/1324
manager.podAnnotations
Annotations for the controller manager pods.
{}
manager.args
Arguments for the controller manager container.
[--health-probe-bind-address=:8081, --metrics-bind-address=127.0.0.1:8080, --leader-elect]
manager.containerSecurityContext
The security context for the manager container, which is configured to be non-privileged.
allowPrivilegeEscalation: false, capabilities: { drop: [ALL] }
manager.image.repository
The repository for the cedana-controller image.
cedana/cedana-controller
manager.image.tag
The tag for the cedana-controller image.
v0.4.5
manager.image.imagePullPolicy
The image pull policy.
IfNotPresent
manager.resources
Resource limits and requests for the manager container. Default is minimal. Uncomment to set custom values.
{}
rbac.resources
Resource limits and requests for the RBAC proxy container. Default is minimal. Uncomment to set custom values.
{}
tolerations
Tolerations for the controller manager pods.
[]
affinity
Affinity settings for the controller manager pods.
{}
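To set custom resources for the manager container, an override along the following lines may work. It assumes the nesting implied by the dotted parameter names and assumes manager.resources takes the standard Kubernetes requests/limits structure; the CPU and memory figures are hypothetical.

```yaml
controllerManager:
  autoscaling:
    enabled: false
    replicaCount: 1
  manager:
    image:
      repository: cedana/cedana-controller
      tag: v0.4.5
      imagePullPolicy: IfNotPresent
    # Hypothetical sizing; tune to your cluster.
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 500m
        memory: 256Mi
```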
Shared Memory Configuration (shmConfig)
Optional configuration to increase /dev/shm size on nodes, which is useful for workloads requiring large shared memory.
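A hedged sketch of enabling a larger /dev/shm with the parameters in this section follows. The documented defaults pair "10G" with 10737418240 bytes (10 × 1024³), so this example assumes the same relationship for a hypothetical 20G target.

```yaml
shmConfig:
  enabled: true
  size: "20G"
  # 20 * 1024^3 bytes; assumes minBytes should track size the way the defaults do.
  minBytes: "21474836480"
```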
enabled
Set to true to enable the increase of /dev/shm size.
false
size
The desired size for /dev/shm (e.g., "10G", "20G").
"10G"
minBytes
The minimum size in bytes that will trigger a remount of /dev/shm.
"10737418240"
Metrics Service (metricsService)
Configuration for the metrics service.
ports
The ports for the metrics service.
HTTPS/8443
type
The type of the metrics service.
ClusterIP