Module lifecycle stage: General Availability
The module has installation requirements.
The NodeGroup snippets below cover the typical GPU sharing modes. To manage GPU nodes, the NVIDIA driver and the NVIDIA Container Toolkit must be installed on them, either manually or through a NodeGroupConfiguration. When spec.gpu is set in a NodeGroup, the module configures containerd on its nodes with default_runtime = "nvidia" (also via NodeGroupConfiguration) and deploys Node Feature Discovery (NFD), GPU Feature Discovery (GFD), the Device Plugin, and the MIG and DCGM components to those nodes.
GPU nodes are usually tainted (for example, node-role=gpu:NoSchedule) so that regular workloads are not scheduled on them.
Workloads that need a GPU add a matching toleration and a nodeSelector for the GPU node label.
Exclusive (one GPU per Pod)
Each Pod gets an entire physical GPU; the cluster exposes the nvidia.com/gpu resource.
apiVersion: deckhouse.io/v1
kind: NodeGroup
metadata:
  name: gpu-exclusive
spec:
  nodeType: Static # or CloudStatic/CloudEphemeral as needed
  gpu:
    sharing: Exclusive
  nodeTemplate:
    labels:
      node-role/gpu: ""
    taints:
    - key: node-role
      value: gpu
      effect: NoSchedule
TimeSlicing (4 partitions)
Up to four Pods share one GPU through time-slicing, and Pods still request nvidia.com/gpu. The snippet shows only the spec.gpu part of the NodeGroup.
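For illustration, here is a sketch of a workload for such a node group. It assumes the time-sliced NodeGroup keeps the node-role/gpu label and the node-role=gpu taint from the Exclusive example and that the node has one physical GPU; the Job name is hypothetical. With partitionCount: 4, all four Pods, each requesting nvidia.com/gpu: 1, can be scheduled at the same time on that single GPU.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: cuda-vectoradd-timeslicing   # hypothetical name
spec:
  parallelism: 4                     # run four Pods concurrently
  completions: 4
  template:
    spec:
      restartPolicy: OnFailure
      nodeSelector:
        node-role/gpu: ""            # assumes the label from the Exclusive example
      tolerations:
      - key: node-role               # assumes the taint from the Exclusive example
        value: gpu
        effect: NoSchedule
      containers:
      - name: cuda-vectoradd
        image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
        resources:
          limits:
            nvidia.com/gpu: 1        # one time-sliced share, not a whole GPU
```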
spec:
  gpu:
    sharing: TimeSlicing
    timeSlicing:
      partitionCount: 4
MIG (profile all-1g.5gb)
A MIG-capable GPU is partitioned in hardware, and the node exposes resources such as nvidia.com/mig-1g.5gb. The snippet shows only the spec.gpu part of the NodeGroup.
spec:
  gpu:
    sharing: MIG
    mig:
      partedConfig: all-1g.5gb
For per-index custom MIG partitioning, use partedConfig: custom with customConfigs (see the FAQ).
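A sketch of a Pod that consumes one MIG slice, assuming the all-1g.5gb profile and the same node-role/gpu label and node-role=gpu taint as in the Exclusive example; the Pod name is hypothetical. The only difference from a whole-GPU request is the resource name: the Pod asks for nvidia.com/mig-1g.5gb instead of nvidia.com/gpu.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd-mig           # hypothetical name
spec:
  restartPolicy: OnFailure
  nodeSelector:
    node-role/gpu: ""                # assumes the label from the Exclusive example
  tolerations:
  - key: node-role                   # assumes the taint from the Exclusive example
    value: gpu
    effect: NoSchedule
  containers:
  - name: cuda-vectoradd
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
    resources:
      limits:
        nvidia.com/mig-1g.5gb: 1     # one 1g.5gb MIG instance
```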
Enable DRA mode
Switch the entire GPU management stack to Dynamic Resource Allocation (requires Kubernetes ≥ 1.34):
apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: gpu
spec:
  enabled: true
  version: 1
  settings:
    dra:
      enabled: true
After this is applied, the module deploys gpu-controller, gpu-node-agent, and nvidia-adapter into the d8-nvidia-gpu namespace and automatically removes the Device Plugin stack from d8-nvidia-gpu.
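With DRA, a Pod requests a GPU through a ResourceClaim rather than through a device-plugin extended resource. The sketch below uses the Kubernetes resource.k8s.io/v1 API, which is GA in 1.34. The DeviceClass name gpu.nvidia.com is an assumption, not something this module is confirmed to publish; check the classes available in the cluster with kubectl get deviceclasses. The object names are hypothetical, and the label and taint are assumed to match the Exclusive example.

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu                   # hypothetical name
spec:
  spec:
    devices:
      requests:
      - name: gpu
        exactly:
          deviceClassName: gpu.nvidia.com   # ASSUMPTION: verify with `kubectl get deviceclasses`
---
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd-dra           # hypothetical name
spec:
  restartPolicy: OnFailure
  nodeSelector:
    node-role/gpu: ""                # assumes the label from the Exclusive example
  tolerations:
  - key: node-role                   # assumes the taint from the Exclusive example
    value: gpu
    effect: NoSchedule
  resourceClaims:
  - name: gpu                        # a claim is created per Pod from the template
    resourceClaimTemplateName: single-gpu
  containers:
  - name: cuda-vectoradd
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
    resources:
      claims:
      - name: gpu                    # refers to spec.resourceClaims[].name
```

Using a ResourceClaimTemplate rather than a standalone ResourceClaim gives each Pod its own claim, which matches the one-GPU-per-Pod pattern of the other examples.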
Smoke-test Job (CUDA vectoradd)
apiVersion: batch/v1
kind: Job
metadata:
  name: cuda-vectoradd
spec:
  template:
    spec:
      restartPolicy: OnFailure
      nodeSelector:
        node-role/gpu: ""
      tolerations:
      - key: node-role
        value: gpu
        effect: NoSchedule
      containers:
      - name: cuda-vectoradd
        image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
        resources:
          limits:
            nvidia.com/gpu: 1
If the Pod finishes with status Succeeded, the GPU is present and correctly configured.