The module lifecycle stage: General Availability
The module has requirements for installation

The Deckhouse Kubernetes Platform installs CRDs but does not remove them when a module is disabled. If you no longer need the created CRDs, delete them manually.

PhysicalGPU

Scope: Cluster
Version: v1alpha1

PhysicalGPU is the Schema for the physicalgpus API.
  • apiVersion
    string
    APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
  • kind
    string
    Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
  • metadata
    object
  • spec
    object
    PhysicalGPUSpec defines the desired state of PhysicalGPU. v0: desired state is intentionally empty.
  • status
    object
    PhysicalGPUStatus defines the observed state of PhysicalGPU.
    • status.capabilities
      object
      Capabilities is a snapshot of device capabilities.
      • status.capabilities.architecture
        string
        Architecture is the GPU architecture name (for example “Hopper”, “RDNA3”, “Xe-HPC”).
      • status.capabilities.computeAPI
        string
        ComputeAPI is the primary compute API (“CUDA”, “ROCm”, “oneAPI”).
      • status.capabilities.computeCapability
        string
        ComputeCapability is the compute capability string (for example “8.0” for NVIDIA, empty for others).
      • status.capabilities.memoryMiB
        integer
        MemoryMiB is the total framebuffer memory in MiB.
      • status.capabilities.partitionSupport
        object
        PartitionSupport describes partition capabilities (MIG/SPX/SR-IOV).
        • status.capabilities.partitionSupport.profiles
          array of objects
          Profiles lists supported partition profiles.
          • status.capabilities.partitionSupport.profiles.maxInstances
            integer
            MaxInstances is the maximum number of instances for this profile.
          • status.capabilities.partitionSupport.profiles.memoryMiB
            integer
            MemoryMiB is the framebuffer size for the profile.
          • status.capabilities.partitionSupport.profiles.name
            string
            Name is the profile name (for example “1g.6gb”, “spx-half”).
          • status.capabilities.partitionSupport.profiles.profileID
            integer
            ProfileID is the numeric profile ID.
          • status.capabilities.partitionSupport.profiles.resources
            object
            Resources is a vendor-agnostic map of resource counters (for example {“multiprocessors”: 14, “copy-engines”: 1}).
          • status.capabilities.partitionSupport.profiles.sliceCount
            integer
            SliceCount is the number of slices consumed by the profile.
        • status.capabilities.partitionSupport.supported
          boolean

          Required value

          Supported indicates whether partitioning is supported on this GPU.
        • status.capabilities.partitionSupport.technology
          string
          Technology is the partition technology name (“MIG”, “SPX”, “SR-IOV”).
        • status.capabilities.partitionSupport.totalSlices
          integer
          TotalSlices is the total number of partition slices on the GPU.
      • status.capabilities.powerLimitMaxW
        integer
        PowerLimitMaxW is the maximum supported power limit in watts.
      • status.capabilities.powerLimitMinW
        integer
        PowerLimitMinW is the minimum supported power limit in watts.
      • status.capabilities.productName
        string
        ProductName is the GPU product name (for example “NVIDIA A30”).
      • status.capabilities.vendor
        string
        Vendor is the GPU vendor identifier (for example “nvidia”, “amd”, “intel”).
    • status.conditions
      array of objects
      Conditions represent the health and readiness of the GPU.
      • status.conditions.lastTransitionTime
        string
        lastTransitionTime is the last time the condition transitioned from one status to another. This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable.
      • status.conditions.message
        string
        message is a human-readable message indicating details about the transition. This may be an empty string.

        Maximum length: 32768

      • status.conditions.observedGeneration
        integer
        observedGeneration represents the .metadata.generation that the condition was set based upon. For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date with respect to the current state of the instance.

        Allowed values: 0 <= X

      • status.conditions.reason
        string
        reason contains a programmatic identifier indicating the reason for the condition’s last transition. Producers of specific condition types may define expected values and meanings for this field, and whether the values are considered a guaranteed API. The value should be a CamelCase string. This field may not be empty.

        Length: 1..1024

        Pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$

      • status.conditions.status
        string
        status of the condition, one of True, False, Unknown.

        Allowed values: True, False, Unknown

      • status.conditions.type
        string
        type of condition in CamelCase or in foo.example.com/CamelCase.

        Maximum length: 316

        Pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$

    • status.currentState
      object
      CurrentState reflects current driver binding and runtime details.
      • status.currentState.computeAPIVersion
        string
        ComputeAPIVersion is the compute API version string at runtime.
      • status.currentState.driverVersion
        string
        DriverVersion is the driver version string.
      • status.currentState.gpuUUID
        string
        GPUUUID is the GPU UUID reported by the vendor driver.
      • status.currentState.partition
        object
        Partition describes the current partition mode state.
        • status.currentState.partition.mode
          string
          Mode is the current partition mode reported by the adapter.
      • status.currentState.powerLimitCurrentW
        integer
        PowerLimitCurrentW is the current power limit in watts.
      • status.currentState.powerLimitEnforcedW
        integer
        PowerLimitEnforcedW is the enforced power limit in watts.
    • status.nodeInfo
      object
      NodeInfo contains node identification and bare-metal detection.
      • status.nodeInfo.bareMetal
        boolean
        BareMetal indicates whether the node is detected as bare metal.
      • status.nodeInfo.nodeName
        string
        NodeName is the Kubernetes node name where the GPU is detected.
    • status.pciInfo
      object
      PCIInfo holds PCI identification details for the device.
      • status.pciInfo.address
        string
        Address is the PCI address, for example “0000:02:00.0”.
      • status.pciInfo.class
        object
        Class describes the PCI class.
        • status.pciInfo.class.code
          string
          Code is the PCI class code (base+subclass), for example “0302”.
      • status.pciInfo.device
        object
        Device describes the PCI device.
        • status.pciInfo.device.id
          string
          ID is the PCI device ID, for example “20b7”.
      • status.pciInfo.vendor
        object
        Vendor describes the PCI vendor.
        • status.pciInfo.vendor.id
          string
          ID is the PCI vendor ID, for example “10de”.
    • status.phase
      string
      Phase is the lifecycle phase of the GPU, derived from conditions by gpu-controller.

      Allowed values: Pending, Initializing, Ready, Reconfiguring, Failed, Lost
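
The constraints listed above for status.conditions and status.phase (allowed values, length limits, and patterns) can be checked client-side before relying on an object's status. The following is a minimal sketch, not part of the module: the function names `condition_errors` and `status_errors` are hypothetical, the status is assumed to be already loaded as a plain dictionary, and the condition type and reason in the usage example are made-up values for illustration only.

```python
import re

# Constraints copied from the field reference above.
REASON_RE = re.compile(r"^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$")
TYPE_RE = re.compile(
    r"^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?"
    r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$"
)
CONDITION_STATUSES = {"True", "False", "Unknown"}
PHASES = {"Pending", "Initializing", "Ready", "Reconfiguring", "Failed", "Lost"}


def condition_errors(cond):
    """Return the names of fields in one conditions entry that break a constraint."""
    errors = []
    ctype = cond.get("type", "")
    # type: maximum length 316, must match the documented pattern.
    if len(ctype) > 316 or not TYPE_RE.fullmatch(ctype):
        errors.append("type")
    # status: one of True, False, Unknown (as strings).
    if cond.get("status") not in CONDITION_STATUSES:
        errors.append("status")
    reason = cond.get("reason", "")
    # reason: length 1..1024, must match the documented pattern.
    if not 1 <= len(reason) <= 1024 or not REASON_RE.fullmatch(reason):
        errors.append("reason")
    # message: may be empty, maximum length 32768.
    if len(cond.get("message", "")) > 32768:
        errors.append("message")
    # observedGeneration: allowed values 0 <= X.
    if cond.get("observedGeneration", 0) < 0:
        errors.append("observedGeneration")
    return errors


def status_errors(status):
    """Check status.phase and every entry of status.conditions."""
    errors = []
    if "phase" in status and status["phase"] not in PHASES:
        errors.append("phase")
    for i, cond in enumerate(status.get("conditions", [])):
        errors += [f"conditions[{i}].{name}" for name in condition_errors(cond)]
    return errors


if __name__ == "__main__":
    # Hypothetical status: the condition type and reason are illustrative only.
    sample = {
        "phase": "Ready",
        "conditions": [
            {"type": "Ready", "status": "True", "reason": "DriverLoaded", "message": ""}
        ],
    }
    print(status_errors(sample))
```

An empty list means the status satisfies every constraint checked here. The sketch uses `fullmatch` rather than `match` so that a trailing newline cannot slip past the `$` anchor in the documented patterns.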