Available with limitations in CSE Lite (1.73), CSE Pro (1.73)

Available without limitations in: EE

The module lifecycle stage: General Availability
The module has requirements for installation

Migration from Device Plugin mode to DRA

DRA mode is experimental. Do not enable it in production clusters.

When dra.enabled is switched from false to true, the module performs an automatic migration:

  1. The check_migration hook detects the existing d8-nvidia-gpu namespace and patches it with the label gpu.deckhouse.io/managed-by=gpu.
  2. On the next reconcile, migrationReady becomes true and Helm renders the DRA templates.
  3. The NVIDIA Device Plugin stack (device plugin, GFD, MIG manager, DCGM) is removed by Helm; the DRA stack is deployed into the same d8-nvidia-gpu namespace.

No manual intervention is required beyond setting dra.enabled: true in ModuleConfig.
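
As a minimal sketch, a ModuleConfig that triggers the migration could look like the following. It assumes schema version 1 and the settings.dra.enabled parameter described in the Parameters section, and that the gpu module is already enabled; merge it with any settings you already have rather than applying it as-is.

```yaml
apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: gpu
spec:
  version: 1
  enabled: true
  settings:
    dra:
      # Switching this from false to true starts the automatic migration.
      # DRA mode is experimental: do not enable it in production clusters.
      enabled: true
```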

How to explicitly enable the module…

You may explicitly enable or disable the module in one of the following ways:

  • Via Deckhouse web UI. In the “System” → “System Management” → “Deckhouse” → “Modules” section, open the gpu module and enable (or disable) the “Module enabled” toggle. Save changes.

    Example: module enable/disable interface (screenshot).
  • Via Deckhouse CLI (d8).

    Use the d8 system module enable command to enable the module, or the d8 system module disable command to disable it. This requires Deckhouse CLI (d8) configured to work with the cluster.

    Example of enabling the module:

    d8 system module enable gpu
  • Using ModuleConfig gpu.

    Set spec.enabled to true or false in ModuleConfig gpu (create it if necessary).

    Example of a manifest that enables the gpu module:

    apiVersion: deckhouse.io/v1alpha1
    kind: ModuleConfig
    metadata:
      name: gpu
    spec:
      enabled: true

How to configure the module…

You can configure the module in one of the following ways:

  • Via Deckhouse web UI.

    In the “System” → “System Management” → “Deckhouse” → “Modules” section, open the gpu module and enable the “Advanced Settings” switch. Fill in the required fields in the “Configuration” tab or specify the module settings in YAML format on the “YAML” tab, excluding the settings section. Save the changes.

    Example: module setup interface (screenshot).

    You can also edit the ModuleConfig object gpu on the “YAML” tab in the module settings window (“System” → “System Management” → “Deckhouse” → “Modules”, open the module gpu) by specifying the schema version in the spec.version parameter and the necessary module parameters in the spec.settings section.

  • Via Deckhouse CLI (d8) (requires Deckhouse CLI (d8) configured to work with the cluster).

    Edit the existing ModuleConfig gpu (for more details on configuring Deckhouse, see the documentation) by executing the following command:

    d8 k edit mc gpu

    Make the necessary changes in the spec.settings section. If necessary, specify the schema version in the spec.version parameter. Save the changes.

    You can also create a manifest file for ModuleConfig gpu using the example below. Fill in the spec.settings section with the required module parameters. If necessary, specify the schema version in the spec.version parameter.

    Apply the manifest using the following command (specify the manifest file name):

    d8 k apply -f <FILENAME>

    Example of a manifest for ModuleConfig gpu:

    apiVersion: deckhouse.io/v1alpha1
    kind: ModuleConfig
    metadata:
      name: gpu
    spec:
      version: 1
      enabled: true
      settings: # Module parameters from the "Parameters" section below.

How to change the module release channel…

To change the module release channel, follow the instructions.

Requirements

Kubernetes version: 1.34 or later.

Deckhouse version: 1.75 or later.

Parameters

Schema version: 1

  • settings
    object
    • settings.dra
      object

      Default: {}

      • settings.dra.allowCrossNamespaceSharing
        boolean
        Allow sharing MPS and time-slicing across namespaces.

        Default: false

      • settings.dra.enabled
        boolean

        Enable DRA (Dynamic Resource Allocation) mode for GPU management.

        Switches the GPU management stack from Device Plugin mode (NFD/GFD, nvidia-device-plugin) to DRA.

        Requires Kubernetes >= 1.34.

        When enabled, disables the Device Plugin/NFD/GFD stack.

        DRA mode is experimental. Do not enable it in production clusters.

        Default: false

    • settings.logLevel
      string
      Operator logging level.

      Default: Info

      Allowed values: Trace, Debug, Info, Error

      Example:

      logLevel: Info
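
The parameters listed above nest under spec.settings in ModuleConfig gpu. The following sketch combines them in one manifest; the values are illustrative only, not recommendations, and the dra values shown are simply the documented defaults.

```yaml
apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: gpu
spec:
  version: 1
  enabled: true
  settings:
    # Allowed values: Trace, Debug, Info, Error (default: Info).
    logLevel: Debug
    dra:
      # Default: false. DRA mode is experimental.
      enabled: false
      # Default: false.
      allowCrossNamespaceSharing: false
```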