Available in: EE

The module lifecycle stage: General Availability.
The module has requirements for installation.

GPU module

The module brings the NVIDIA stack to Deckhouse Kubernetes Platform for GPU workloads: NFD/GFD, device plugin (Exclusive/TimeSlicing/MIG), MIG manager, and DCGM/Exporter with Grafana dashboards.

Prerequisites

  • NVIDIA driver and NVIDIA Container Toolkit are installed on target nodes (the containerd runtime is configured by this module via NodeGroupConfiguration).
  • spec.gpu is set in the target NodeGroup (sharing: Exclusive, TimeSlicing, or MIG).

What the module deploys

  • Node Feature Discovery (NFD) and GPU Feature Discovery (GFD) for GPU labeling.
  • NVIDIA device plugin supporting Exclusive/TimeSlicing/MIG modes.
  • MIG manager and configs for MIG-capable GPUs.
  • DCGM Exporter and ready Grafana dashboards for GPU health.

How to enable

Set GPU parameters in the target NodeGroup — the module will label its nodes and deploy the components in the d8-nvidia-gpu namespace:

```yaml
apiVersion: deckhouse.io/v1
kind: NodeGroup
metadata:
  name: gpu
spec:
  cri: Containerd
  gpu:
    sharing: MIG # or Exclusive / TimeSlicing
    mig:
      partedConfig: all-1g.5gb
```

How it works (step-by-step)

Below is the full sequence and the key labels/taints involved.

Key labels (who sets them and why)

  • node.deckhouse.io/gpu="" — set by the module hook on GPU NodeGroup nodes; together with node.deckhouse.io/gpu-setup-complete="" it participates in scheduling “global” stack DaemonSets (NFD worker, DCGM/Exporter, etc.).
  • node.deckhouse.io/device-gpu.config=<Exclusive|TimeSlicing|MIG> — set by the hook; used by config-manager in GFD and the device plugin to select the right config.
  • node.deckhouse.io/gpu-setup-complete="" — set by gpu-sysctl.sh (NodeGroupConfiguration) after local checks/config; until it appears, the stack is not scheduled onto the node (NFD worker, GFD, device plugin, DCGM/Exporter).
  • feature.node.kubernetes.io/pci-*.present=true — published by NFD; used (together with the NodeGroup selector) to schedule GFD onto nodes with NVIDIA PCI devices.
  • nvidia.com/* (for example nvidia.com/gpu.count, nvidia.com/mig.capable=true|false) — published by NFD from feature files generated by GFD (NFD master is allowed to write into nvidia.com). The nvidia.com/gpu.count>0 label participates in scheduling the device plugin.
  • nvidia.com/mig.config=<profile|all-disabled> — set by the hook: desired MIG profile (e.g., all-1g.5gb) or all-disabled for MIG rollback/disable.
  • nvidia.com/mig.config.state=<pending|rebooting|success|failed> — set by nvidia-mig-manager during reconfiguration.
  • taint mig-reconfigure=true:NoSchedule — set/removed by nvidia-mig-manager while the operation is running.
  • annotations update.node.deckhouse.io/disruption-approved, update.node.deckhouse.io/draining, update.node.deckhouse.io/drained — used for controlled node drain during MIG.
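Putting the labels above together, a fully configured MIG-capable node typically carries a set like the following. This is an illustrative fragment only: the node name is hypothetical, and the exact values depend on the hardware and the chosen profile.

```yaml
# Illustrative label set on a configured MIG node (values are examples)
apiVersion: v1
kind: Node
metadata:
  name: gpu-worker-1                          # hypothetical node name
  labels:
    node.deckhouse.io/gpu: ""                 # set by the module hook
    node.deckhouse.io/device-gpu.config: "MIG"
    node.deckhouse.io/gpu-setup-complete: ""  # set by gpu-sysctl.sh after local checks
    nvidia.com/mig.capable: "true"            # published by NFD from GFD feature files
    nvidia.com/mig.config: "all-1g.5gb"       # desired MIG profile, set by the hook
    nvidia.com/mig.config.state: "success"    # set by nvidia-mig-manager
```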

Common flow (all modes)

  1. You set/change spec.gpu in the target NodeGroup.
  2. Helm deploys (or removes) components in d8-nvidia-gpu:
    • GFD and device plugin as per-NodeGroup DaemonSets;
    • NFD master/gc on master nodes; NFD worker/DCGM/Exporter on GPU nodes.
  3. The module hook updates node labels:
    • adds/updates node.deckhouse.io/gpu="" and node.deckhouse.io/device-gpu.config=...;
    • for MIG it adds/updates nvidia.com/mig.config=...;
    • on GPU disable it removes node.deckhouse.io/gpu and node.deckhouse.io/device-gpu.config; if the node had nvidia.com/mig.config it sets it to all-disabled to trigger MIG rollback.
  4. NodeGroupConfiguration scripts run on the node (by weight): gpu-check.sh → gpu-runtime.sh → gpu-sysctl.sh.
    • on success the node gets node.deckhouse.io/gpu-setup-complete="";
    • on cleanup/errors the label is cleared, the runtime drop-in is removed, and sysctl is restored.
  5. After that, the stack “converges” on the node:
    • NFD publishes feature.node.kubernetes.io/pci-* and serves custom feature files;
    • GFD writes GPU feature files into /etc/kubernetes/node-feature-discovery/features.d, and NFD publishes them as nvidia.com/*;
    • the device plugin exposes resources (depending on the mode), DCGM/Exporter start exposing metrics.

Exclusive mode

Enable

  1. Set spec.gpu.sharing: Exclusive.
  2. The module sets node.deckhouse.io/device-gpu.config=Exclusive; after node.deckhouse.io/gpu-setup-complete is present, it brings up GFD and the device plugin.
  3. The node exposes nvidia.com/gpu; each Pod gets a full GPU.
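A workload then requests a whole GPU through the extended resource. A minimal sketch; the Pod name and image are placeholders, not part of the module:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-job            # hypothetical name
spec:
  containers:
    - name: cuda
      image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04  # example image
      resources:
        limits:
          nvidia.com/gpu: 1 # one full GPU in Exclusive mode
```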

Disable

  1. Remove spec.gpu from the NodeGroup (or move nodes to a non-GPU NodeGroup).
  2. The module removes node.deckhouse.io/gpu and node.deckhouse.io/device-gpu.config from the node; NodeGroupConfiguration clears node.deckhouse.io/gpu-setup-complete and rolls back runtime/sysctl changes.
  3. GPU stack DaemonSets stop being scheduled onto the node; NFD GC eventually cleans nvidia.com/* labels.

TimeSlicing mode

Enable

  1. Set spec.gpu.sharing: TimeSlicing and optionally spec.gpu.timeSlicing.partitionCount (default is 4).
  2. The module sets node.deckhouse.io/device-gpu.config=TimeSlicing; the device plugin applies the time-slicing config.
  3. The node still exposes nvidia.com/gpu, but with more “virtual” slots (the number of slots per GPU is set by partitionCount).
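A TimeSlicing NodeGroup with an explicit partition count might look like this (a sketch following the example in “How to enable”; the NodeGroup name is a placeholder):

```yaml
apiVersion: deckhouse.io/v1
kind: NodeGroup
metadata:
  name: gpu-shared          # hypothetical name
spec:
  cri: Containerd
  gpu:
    sharing: TimeSlicing
    timeSlicing:
      partitionCount: 8     # default is 4; each physical GPU is exposed as 8 slots
```

Note that time-sliced “slots” share the GPU without memory or fault isolation, so this mode suits trusted, interruption-tolerant workloads.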

Disable

  1. Switch to Exclusive or remove spec.gpu from the NodeGroup.
  2. On mode switch, node.deckhouse.io/device-gpu.config changes and GFD/device plugin reload configs; on full GPU disable follow Exclusive → Disable.

MIG mode

Enable

  1. Set:
    • spec.gpu.sharing: MIG
    • spec.gpu.mig.partedConfig: <profile name> (e.g., all-1g.5gb)
  2. The module sets node.deckhouse.io/device-gpu.config=MIG and nvidia.com/mig.config=<profile>.
  3. Once GFD/NFD publishes nvidia.com/mig.capable=true (GPU supports MIG), nvidia-mig-manager is scheduled to the node (it runs as a DaemonSet and reacts to changes of the nvidia.com/mig.config label).
  4. When it needs to reconfigure MIG, it:
    • sets nvidia.com/mig.config.state=pending;
    • “pauses” GPU clients by setting the following node labels to paused-for-mig-change:
      • nvidia.com/gpu.deploy.device-plugin
      • nvidia.com/gpu.deploy.gpu-feature-discovery
      • nvidia.com/gpu.deploy.dcgm-exporter
      • nvidia.com/gpu.deploy.dcgm
      • nvidia.com/gpu.deploy.nvsm
    • sets taint mig-reconfigure=true:NoSchedule;
    • waits for update.node.deckhouse.io/disruption-approved (or an already started drain via update.node.deckhouse.io/draining/update.node.deckhouse.io/drained), then sets update.node.deckhouse.io/draining=bashible and waits for update.node.deckhouse.io/drained;
    • deletes (and waits for shutdown of) GPU client pods on the node: device plugin, GFD, DCGM Exporter, DCGM, plus validators (cuda/plugin);
    • applies the selected MIG profile (may set nvidia.com/mig.config.state=rebooting and reboot if needed);
    • finishes with nvidia.com/mig.config.state=success (or failed), removes the mig-reconfigure taint, runs kubectl uncordon, removes update.node.deckhouse.io/drained/update.node.deckhouse.io/disruption-approved, restores nvidia.com/gpu.deploy.* to true, and returns the node back to service.
  5. After applying the profile, the cluster exposes resources like nvidia.com/mig-<profile> (e.g., nvidia.com/mig-1g.5gb).
  6. If the nvidia-mig-manager Pod is restarted/removed, the preStop hook waits for an active operation to finish (/processing file), then runs kubectl uncordon, removes the mig-reconfigure=true:NoSchedule taint, and removes update.node.deckhouse.io/drained/update.node.deckhouse.io/disruption-approved (best-effort, may not finish on forced termination).
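Once the profile is applied (step 5), a Pod requests a MIG slice through the corresponding extended resource. A sketch assuming the all-1g.5gb profile from the example above; the Pod name and image are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mig-job                    # hypothetical name
spec:
  containers:
    - name: worker
      image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04  # example image
      resources:
        limits:
          nvidia.com/mig-1g.5gb: 1 # one 1g.5gb MIG instance
```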

Disable (MIG → non-MIG, or full GPU disable)

  1. When switching away from MIG (Exclusive/TimeSlicing) or removing spec.gpu, the module sets nvidia.com/mig.config=all-disabled to roll back MIG on the node.
  2. nvidia-mig-manager applies all-disabled in the same way (taint + drain + operation). If nvidia.com/mig.capable is already missing (e.g., after GPU/NFD labels were removed), the manager is still scheduled while nvidia.com/mig.config=all-disabled exists to finish the rollback and remove the taint.
  3. After successful MIG disable (nvidia.com/mig.config.state=success), the script removes nvidia.com/mig.config and nvidia.com/mig.config.state if the node is no longer in MIG mode or GPU is disabled, so the manager does not “hang” on the node waiting for a label change.
  4. The manager does not run on GPUs without MIG support (nvidia.com/mig.capable=false); use Exclusive or TimeSlicing for such nodes.

Monitoring

DCGM Exporter publishes GPU metrics; the bundled Grafana dashboards show GPU utilization, health, and MIG state.