Examples | gpu | Deckhouse

Module lifecycle stage: General Availability
The module has installation requirements

Below are NodeGroup snippets for typical GPU modes. Managing GPU nodes requires the NVIDIA driver and the NVIDIA Container Toolkit to be installed on the nodes, either manually or via NodeGroupConfiguration. When spec.gpu is present in a NodeGroup, this module configures containerd with default_runtime = "nvidia" (via NodeGroupConfiguration) and deploys the NFD, GFD, Device Plugin, MIG, and DCGM components on the nodes of that group.

GPU nodes are usually tainted (for example, node-role=gpu:NoSchedule) so that regular workloads are not scheduled onto them. Workloads that need a GPU must add a matching toleration and a nodeSelector.

Exclusive (one Pod — one GPU)

Each Pod gets an entire physical GPU, which is exposed as the nvidia.com/gpu resource.

apiVersion: deckhouse.io/v1
kind: NodeGroup
metadata:
  name: gpu-exclusive
spec:
  nodeType: Static # or CloudStatic/CloudEphemeral as needed
  gpu:
    sharing: Exclusive
  nodeTemplate:
    labels:
      node-role/gpu: ""
    taints:
    - key: node-role
      value: gpu
      effect: NoSchedule

TimeSlicing (4 partitions)

Up to four Pods time-share one GPU. Pods still request the nvidia.com/gpu resource.

spec:
  gpu:
    sharing: TimeSlicing
    timeSlicing:
      partitionCount: 4
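
To illustrate how workloads consume a time-sliced GPU, the sketch below runs four Pods in parallel, each requesting one nvidia.com/gpu. It assumes the NodeGroup also carries the node-role/gpu label and the node-role=gpu:NoSchedule taint from the Exclusive example; the Job name is hypothetical, and the image is the one used in the smoke test on this page. Note that NVIDIA time-slicing shares compute time only and does not isolate GPU memory between Pods.

```yaml
# Sketch only: the Job name is an assumption, not something the module creates.
apiVersion: batch/v1
kind: Job
metadata:
  name: cuda-vectoradd-timeslicing
spec:
  completions: 4
  parallelism: 4 # equals partitionCount, so all four Pods can share one GPU
  template:
    spec:
      restartPolicy: OnFailure
      nodeSelector:
        node-role/gpu: "" # assumes the label from the Exclusive example
      tolerations:
      - key: node-role
        value: gpu
        effect: NoSchedule
      containers:
      - name: cuda-vectoradd
        image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
        resources:
          limits:
            nvidia.com/gpu: 1 # one time slice, not a whole physical GPU
```

On a node with a single GPU, all four Pods should be able to run at the same time; a fifth Pod requesting nvidia.com/gpu would stay Pending until a slice is released.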

MIG (profile `all-1g.5gb`)

This mode partitions a MIG-capable GPU in hardware; resources such as nvidia.com/mig-1g.5gb then become available.

spec:
  gpu:
    sharing: MIG
    mig:
      partedConfig: all-1g.5gb
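
A workload then requests a MIG slice by its resource name instead of nvidia.com/gpu. The sketch below assumes the node advertises nvidia.com/mig-1g.5gb as described above and that the NodeGroup uses the same label and taint as the Exclusive example; the Pod name is hypothetical.

```yaml
# Sketch only: the Pod name is an assumption.
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd-mig
spec:
  restartPolicy: OnFailure
  nodeSelector:
    node-role/gpu: "" # assumes the label from the Exclusive example
  tolerations:
  - key: node-role
    value: gpu
    effect: NoSchedule
  containers:
  - name: cuda-vectoradd
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
    resources:
      limits:
        nvidia.com/mig-1g.5gb: 1 # one 1g.5gb MIG instance
```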

For custom MIG partitioning per GPU index, set partedConfig: custom and describe the layout in customConfigs (see the FAQ).

Enable DRA mode

Switch the entire GPU management stack to Dynamic Resource Allocation (requires Kubernetes ≥ 1.34):

apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: gpu
spec:
  enabled: true
  version: 1
  settings:
    dra:
      enabled: true

After the ModuleConfig is applied, the module deploys gpu-controller, gpu-node-agent, and nvidia-adapter into the d8-nvidia-gpu namespace. The Device Plugin stack in d8-nvidia-gpu is removed automatically.
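
In DRA mode, workloads generally request devices through resource claims rather than through nvidia.com/gpu limits. The sketch below uses the upstream Kubernetes resource.k8s.io/v1 API to show the general shape of such a request. The DeviceClass name gpu.example.com and the object names are hypothetical placeholders, not names confirmed for this module; check which DeviceClass objects exist in the cluster and substitute the real one.

```yaml
# Sketch only: deviceClassName and all object names below are assumptions.
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        exactly:
          deviceClassName: gpu.example.com # HYPOTHETICAL, replace with a real DeviceClass
---
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd-dra
spec:
  restartPolicy: OnFailure
  tolerations:
  - key: node-role
    value: gpu
    effect: NoSchedule
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: single-gpu # a ResourceClaim is created per Pod from this template
  containers:
  - name: cuda-vectoradd
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
    resources:
      claims:
      - name: gpu # refers to the entry in spec.resourceClaims
```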

Smoke-test Job (CUDA vectoradd)

apiVersion: batch/v1
kind: Job
metadata:
  name: cuda-vectoradd
spec:
  template:
    spec:
      restartPolicy: OnFailure
      nodeSelector:
        node-role/gpu: ""
      tolerations:
      - key: node-role
        value: gpu
        effect: NoSchedule
      containers:
      - name: cuda-vectoradd
        image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
        resources:
          limits:
            nvidia.com/gpu: 1

If the Job's Pod finishes in the Succeeded phase, the GPU is present and configured correctly.
