Module lifecycle stage: General Availability.
The module has requirements that must be met before installation (see below).
Below are NodeGroup snippets for typical GPU modes. GPU-node management requires the NVIDIA driver and the NVIDIA Container Toolkit to be installed on the nodes (manually or via NodeGroupConfiguration). When spec.gpu is present, the module configures containerd with default_runtime = "nvidia" (via NodeGroupConfiguration) and deploys NFD, GFD, the Device Plugin, MIG, and DCGM on those nodes.
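As an illustration, driver and toolkit installation can be scripted with a NodeGroupConfiguration. The sketch below is an assumption, not the module's canonical installer: the resource name, weight, bundle, driver package, and script body are all illustrative and assume Ubuntu-based nodes.

```yaml
# Hypothetical sketch: a step executed on nodes of the gpu-exclusive NodeGroup.
# Adapt the script body to your OS, driver branch, and package repositories.
apiVersion: deckhouse.io/v1alpha1
kind: NodeGroupConfiguration
metadata:
  name: install-nvidia-toolkit.sh   # assumed name
spec:
  weight: 31                        # assumed ordering weight
  nodeGroups: ["gpu-exclusive"]
  bundles: ["ubuntu-lts"]           # assumed OS bundle
  content: |
    # Illustrative only: install the NVIDIA driver and Container Toolkit.
    apt-get update
    apt-get install -y nvidia-driver-535 nvidia-container-toolkit
```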
GPU nodes are usually tainted (e.g. node-role=gpu:NoSchedule) so that regular workloads don't land there. Workloads that need a GPU add a matching toleration and nodeSelector.
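For example, a minimal Pod spec fragment that tolerates such a taint and pins the workload to GPU nodes (the label and taint keys match the NodeGroup snippets in this section):

```yaml
spec:
  nodeSelector:
    node-role/gpu: ""
  tolerations:
    - key: node-role
      value: gpu
      effect: NoSchedule
```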
Exclusive (one Pod — one GPU)
Each Pod gets an entire physical GPU; the cluster exposes nvidia.com/gpu.
```yaml
apiVersion: deckhouse.io/v1
kind: NodeGroup
metadata:
  name: gpu-exclusive
spec:
  nodeType: Static # or CloudStatic/CloudEphemeral as needed
  gpu:
    sharing: Exclusive
  nodeTemplate:
    labels:
      node-role/gpu: ""
    taints:
      - key: node-role
        value: gpu
        effect: NoSchedule
```

TimeSlicing (4 partitions)
Up to four Pods time-share one GPU. Pods still request nvidia.com/gpu.
```yaml
spec:
  gpu:
    sharing: TimeSlicing
    timeSlicing:
      partitionCount: 4
```

MIG (profile all-1g.5gb)
Hardware-partition a MIG-capable GPU; resources like nvidia.com/mig-1g.5gb appear.
```yaml
spec:
  gpu:
    sharing: MIG
    mig:
      partedConfig: all-1g.5gb
```

For per-index custom MIG partitioning use partedConfig: custom with customConfigs (see FAQ).
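Under MIG, a Pod requests a slice resource instead of a whole GPU. A minimal fragment, using the resource name that corresponds to the all-1g.5gb profile above:

```yaml
resources:
  limits:
    nvidia.com/mig-1g.5gb: 1
```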
Smoke-test Job (CUDA vectoradd)
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: cuda-vectoradd
spec:
  template:
    spec:
      restartPolicy: OnFailure
      nodeSelector:
        node-role/gpu: ""
      tolerations:
        - key: node-role
          value: gpu
          effect: NoSchedule
      containers:
        - name: cuda-vectoradd
          image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
          resources:
            limits:
              nvidia.com/gpu: 1
```

If the Pod finishes with Succeeded, the GPU is present and configured correctly.