Custom resource NodeGroup

NodeGroup

Scope: Cluster

Describes the runtime parameters of the node group.

Example:

# NodeGroup for cloud nodes in AWS.
apiVersion: deckhouse.io/v1
kind: NodeGroup
metadata:
  name: test
spec:
  nodeType: CloudEphemeral
  cloudInstances:
    zones:
      - eu-west-1a
      - eu-west-1b
    minPerZone: 1
    maxPerZone: 2
    classReference:
      kind: AWSInstanceClass
      name: test
  nodeTemplate:
    labels:
      tier: test
---
# NodeGroup for static nodes on bare metal servers (or VMs).
apiVersion: deckhouse.io/v1
kind: NodeGroup
metadata:
  name: worker
spec:
  nodeType: Static

metadata
object
- metadata.name
  string
  Pattern: ^[a-z0-9]([-a-z0-9]*[a-z0-9])?$
  Maximum length: 42
spec
object
Required value
- spec.chaos
  object
  Chaos monkey settings.
  
  Example:
  chaos:
    mode: DrainAndDelete
    period: 24h
  - spec.chaos.mode
    string
    The chaos monkey mode:
    
    DrainAndDelete — drains and deletes a node when triggered;
    
    Disabled — leaves this NodeGroup intact.
    Default: "Disabled"
    Allowed values: Disabled, DrainAndDelete
  - spec.chaos.period
    string
    The time interval to use for the chaos monkey.
    
    It is specified as a string containing the time unit in hours and minutes: 30m, 1h, 2h30m, 24h.
    
    Default: "6h"
    Pattern: ^([0-9]+h([0-9]+m)?|[0-9]+m)$
- spec.cloudInstances
  object
  Parameter for provisioning the cloud-based VMs.
  
  Caution! Can only be used together with nodeType: CloudEphemeral.
  - spec.cloudInstances.classReference
    object
    Required value
    The reference to the InstanceClass object. It is unique for each cloud-provider-* module.
    - spec.cloudInstances.classReference.kind
      string
      The object type (e.g., OpenStackInstanceClass). The object type is specified in the documentation of the corresponding cloud-provider-* module.
      
      Allowed values: OpenStackInstanceClass, GCPInstanceClass, VsphereInstanceClass, AWSInstanceClass, YandexInstanceClass, AzureInstanceClass, VCDInstanceClass, ZvirtInstanceClass, DynamixInstanceClass, HuaweiCloudInstanceClass
    - spec.cloudInstances.classReference.name
      string
      The name of the required InstanceClass object (e.g., finland-medium).
  - spec.cloudInstances.maxPerZone
    integer
    Required value
    The maximum number of instances for the group in each zone.
    
    This value is used as the upper bound in cluster-autoscaler.
    
    Allowed values: 0 <= X
  - spec.cloudInstances.maxSurgePerZone
    integer
    The maximum number of instances to roll out simultaneously in the group in each zone.
    
    Default: 1
    Allowed values: 0 <= X
  - spec.cloudInstances.maxUnavailablePerZone
    integer
    The maximum number of unavailable instances (during rollout) in the group in each zone.
    
    Default: 0
    Allowed values: 0 <= X
  - spec.cloudInstances.minPerZone
    integer
    Required value
    The minimum number of instances for the group in each zone.
    
    This value is used in the MachineDeployment object and as a lower bound in cluster-autoscaler.
    
    Allowed values: 0 <= X
  - spec.cloudInstances.priority
    integer
    Priority of the node group.
    
    When scaling a cluster, the autoscaler first selects node groups with a higher priority. If several node groups have the same priority, the autoscaler randomly selects one of them.
    
    Using priorities can be convenient to prefer ordering cheaper nodes (for example, spot instances) over more expensive ones.
  - spec.cloudInstances.quickShutdown
    Deprecated
    boolean
    Lowers CloudEphemeral machine drain timeout to 5 minutes.
  - spec.cloudInstances.standby
    integer or string
    The total number of overprovisioned nodes for this NodeGroup in all zones.
    
    An overprovisioned node is a cluster node on which resources are reserved that are available at any time for scaling. The presence of such a node allows the cluster autoscaler not to wait for node initialization (which may take several minutes), but to immediately place a load on it.
    
    The value can be an absolute number (for example, 2) or a percentage of desired nodes (for example, 10%). If a percentage is specified, the absolute number is calculated based on the percentage of the maximum number of nodes (the maxPerZone parameter) rounded down, but not less than one.
    
    Pattern: ^[0-9]+%?$
  - spec.cloudInstances.standbyHolder
    object
    Amount of reserved resources.
    
    Used to determine whether to order overprovisioned nodes.
    - spec.cloudInstances.standbyHolder.notHeldResources
      Deprecated
      object
      Deprecated: the parameter is no longer used. Use the overprovisioningRate parameter.
      
      Describes the resources that will not be held (consumed) by the standby holder.
      - spec.cloudInstances.standbyHolder.notHeldResources.cpu
        integer or string
        Describes the amount of CPU that will not be held by standby holder on Nodes from this NodeGroup.
        
        The value can be an absolute number of cpus (for example, 2) as well as a milli representation (for example, 1500m).
        
        Pattern: ^[0-9]+m?$
      - spec.cloudInstances.standbyHolder.notHeldResources.memory
        integer or string
        Describes the amount of memory that will not be held by standby holder on Nodes from this NodeGroup.
        
        The value can be an absolute number of bytes (for example, 128974848) as well as a fixed-point number using one of memory suffixes: G, Gi, M, Mi.
        
        Pattern: ^[0-9]+(\.[0-9]+)?(E|P|T|G|M|K|Ei|Pi|Ti|Gi|Mi|Ki)?$
    - spec.cloudInstances.standbyHolder.overprovisioningRate
      integer
      Percentage of reserved resources calculated from the capacity of a node of a NodeGroup.
      
      Default: 50
      Allowed values: 1 <= X <= 80
  - spec.cloudInstances.zones
    array of strings
    List of availability zones to create instances in.
    
    The default value depends on the cloud provider selected and usually corresponds to all zones of the region being used.
    
    Example:
    zones:
      - Helsinki
      - Espoo
      - Tampere
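  As a rough illustration of how the cloudInstances parameters fit together, the sketch below describes a CloudEphemeral group that keeps some overprovisioned capacity ready. The group name, the InstanceClass name, the zones, and all numeric values are hypothetical; only the field names come from the reference above.

  ```yaml
  apiVersion: deckhouse.io/v1
  kind: NodeGroup
  metadata:
    name: worker-standby            # hypothetical group name
  spec:
    nodeType: CloudEphemeral        # cloudInstances requires this node type
    cloudInstances:
      classReference:
        kind: AWSInstanceClass
        name: worker                # hypothetical InstanceClass object
      zones:
        - eu-west-1a
        - eu-west-1b
      minPerZone: 1                 # lower bound for cluster-autoscaler
      maxPerZone: 5                 # upper bound for cluster-autoscaler
      maxSurgePerZone: 1
      maxUnavailablePerZone: 0
      priority: 50                  # illustrative value; higher is preferred
      standby: "10%"                # absolute number or percentage
      standbyHolder:
        overprovisioningRate: 30    # allowed range is 1 to 80
  ```

  The standby value is quoted so that it is read as a string and matches the documented pattern.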
- spec.cri
  object
  Container runtime parameters.
  - spec.cri.containerd
    object
    Containerd runtime parameters.
    
    If used, cri.type must be set to Containerd.
    - spec.cri.containerd.maxConcurrentDownloads
      integer
      Set the max concurrent downloads for each pull.
      
      Default: 3
  - spec.cri.docker
    Deprecated
    object
    Docker settings for nodes.
    - spec.cri.docker.manage
      boolean
      Enable Docker maintenance from bashible.
      
      Default: true
    - spec.cri.docker.maxConcurrentDownloads
      integer
      Set the max concurrent downloads for each pull.
      
      Default: 3
  - spec.cri.notManaged
    object
    Settings for an unmanaged CRI on nodes.
    - spec.cri.notManaged.criSocketPath
      string
      Path to CRI socket.
  - spec.cri.type
    string
    Container runtime type.
    
    If not specified, the defaultCRI value from the initial cluster configuration (the cluster-configuration.yaml parameter of the d8-cluster-configuration secret in the kube-system namespace) is used.
    
    Note! Docker is deprecated.
    
    Allowed values: Docker, Containerd, NotManaged
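  The two fragments below sketch how cri.type pairs with its type-specific section. The download limit and the socket path are hypothetical values chosen for illustration.

  ```yaml
  # Containerd runtime with a custom pull concurrency.
  spec:
    cri:
      type: Containerd              # required when cri.containerd is used
      containerd:
        maxConcurrentDownloads: 5   # illustrative value; the default is 3
  ---
  # A runtime that is not managed, reached through its socket.
  spec:
    cri:
      type: NotManaged
      notManaged:
        criSocketPath: /run/containerd/containerd.sock   # hypothetical path
  ```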
- spec.disruptions
  object
  Disruptions settings for nodes.
  
  Example:
  disruptions:
    approvalMode: Automatic
    automatic:
      drainBeforeApproval: false
      windows:
        - from: '06:00'
          to: '08:00'
          days:
            - Tue
            - Sun
  - spec.disruptions.approvalMode
    string
    The approval mode for disruptive updates:
    
    Manual — disable automatic disruption approval; the alert will be displayed if disruption is needed. Caution! The master node group update mode must be Manual to avoid issues with draining.
    
    Automatic — automatically approve disruption-involving updates.
    
    RollingUpdate — in this mode, a new node with new settings will be created; then, the old node will be deleted. Available only for cloud nodes.
    
    If the RollingUpdate mode is not used, when updating, the node is first drained and then updated (rebooted) and put back into operation (uncordoned). Note that in this case, the cluster must have sufficient resources to accommodate the load while the node being updated is unavailable. In the RollingUpdate mode, the node is replaced by the updated node, i.e., an extra node appears in the cluster for the duration of the update. In cloud infrastructures, the RollingUpdate mode is convenient, for example, if there are no resources in the cluster to temporarily host the load from the node being updated.
    Default: "Automatic"
    Allowed values: Manual, Automatic, RollingUpdate
  - spec.disruptions.automatic
    object
    Additional parameters for the Automatic mode.
    - spec.disruptions.automatic.drainBeforeApproval
      boolean
      Drain Pods from the nodes before approving disruption.
      
      Caution! This setting is ignored (nodes are approved without draining Pods):
      
      for the nodeGroup master with a single node;
      
      for a single ready node in a nodeGroup picked out for Deckhouse placement.
      
      Default: true
    - spec.disruptions.automatic.windows
      array of objects
      Time windows for node disruptive updates.
      - spec.disruptions.automatic.windows.days
        array of strings
        Days of the week when nodes can be updated.
        
        Examples:
        
        days: Mon
        
        days: Wed
        
        Element of the array
        string
        Day of the week.
        
        Allowed values: Mon, Tue, Wed, Thu, Fri, Sat, Sun
      - spec.disruptions.automatic.windows.from
        string
        Required value
        Start time of disruptive update window (UTC timezone).
        
        Pattern: ^(?:\d|[01]\d|2[0-3]):[0-5]\d$
        Example:
        from: '13:00'
      - spec.disruptions.automatic.windows.to
        string
        Required value
        End time of disruptive update window (UTC timezone).
        
        Pattern: ^(?:\d|[01]\d|2[0-3]):[0-5]\d$
        Example:
        to: '18:30'
  - spec.disruptions.rollingUpdate
    object
    Additional parameters for the RollingUpdate mode.
    - spec.disruptions.rollingUpdate.windows
      array of objects
      Time windows for node disruptive updates.
      - spec.disruptions.rollingUpdate.windows.days
        array of strings
        Days of the week when nodes can be updated.
        
        Examples:
        
        days: Mon
        
        days: Wed
        
        Element of the array
        string
        Day of the week.
        
        Allowed values: Mon, Tue, Wed, Thu, Fri, Sat, Sun
      - spec.disruptions.rollingUpdate.windows.from
        string
        Required value
        Start time of disruptive update window (UTC timezone).
        
        Pattern: ^(?:\d|[01]\d|2[0-3]):[0-5]\d$
        Example:
        from: '13:00'
      - spec.disruptions.rollingUpdate.windows.to
        string
        Required value
        End time of disruptive update window (UTC timezone).
        
        Pattern: ^(?:\d|[01]\d|2[0-3]):[0-5]\d$
        Example:
        to: '18:30'
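  For the RollingUpdate mode, the update windows are set under rollingUpdate rather than automatic. A minimal sketch, with hypothetical window times and days:

  ```yaml
  spec:
    disruptions:
      approvalMode: RollingUpdate   # available only for cloud nodes
      rollingUpdate:
        windows:
          - from: '02:00'           # start of the window, UTC
            to: '05:00'             # end of the window, UTC
            days:
              - Sat
              - Sun
  ```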
- spec.fencing
  object
  Enable fencing controller for this group.
  - spec.fencing.mode
    string
    Required value
    Fencing supports a single mode: Watchdog. In this mode, the watchdog kernel module (watchdog timer) is used.
    
    The Watchdog implementation includes the following components:
    
    Fencing-agent — a DaemonSet that is deployed on a specific group of nodes (NodeGroups).
    
    Once started, the agent activates Watchdog and sets a special label node-manager.deckhouse.io/fencing-enabled on the node where it is functioning. The agent regularly checks if the Kubernetes API is available. If the API is available, the agent sends a signal to Watchdog, which resets the watchdog timer. The agent also monitors special service labels on the node and, depending on their presence, enables or disables the Watchdog.
    
    A softdog kernel module with parameters soft_margin=60 and soft_panic=1 is used as Watchdog. This means that the timeout period of the watchdog timer is 60 seconds. After this time expires, kernel-panic occurs and the node remains in this state until the user reboots it.
    
    Fencing-controller — a controller that monitors all nodes with the node-manager.deckhouse.io/fencing-enabled label. If any node becomes unavailable for more than 60 seconds, the controller removes all pods from that node and then removes the node itself.
    Allowed values: Watchdog
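  Since Watchdog is the only allowed mode, enabling fencing for a group comes down to a single field. A minimal fragment, assuming the rest of the NodeGroup spec is already in place:

  ```yaml
  spec:
    fencing:
      mode: Watchdog                # the only allowed value
  ```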
- spec.kubelet
  object
  Kubelet settings for nodes.
  - spec.kubelet.containerLogMaxFiles
    integer
    How many rotated log files to store before deleting them.
    
    Default: 4
    Allowed values: 1 <= X <= 20
  - spec.kubelet.containerLogMaxSize
    string
    Maximum log file size before it is rotated.
    
    Default: "50Mi"
    Pattern: \d+[Ei|Pi|Ti|Gi|Mi|Ki|E|P|T|G|M|k|m]
  - spec.kubelet.maxPods
    integer
    Set the max count of pods per node.
    
    Default: 110
  - spec.kubelet.resourceReservation
    object
    Management of resource reservation for system daemons on a node.
    
    More info in the Kubernetes documentation.
    - spec.kubelet.resourceReservation.mode
      string
      Defines the system resource reservation mode:
      
      Off — disables resource reservation.
      
      Auto — reserves resources based on the Node capacity.
      
      Static — reserves resources based on the values provided by the user via the static parameter.
      
      Note that a dedicated cgroup is currently not used for resource reservation (the kubelet --system-reserved-cgroup flag is not used).
      
      Default: "Auto"
    - spec.kubelet.resourceReservation.static
      object
      Resource reservation parameters for the ‘Static’ mode.
      - spec.kubelet.resourceReservation.static.cpu
        integer or string
        Pattern: \d+[m]
      - spec.kubelet.resourceReservation.static.ephemeralStorage
        integer or string
        Pattern: \d+[Ei|Pi|Ti|Gi|Mi|Ki|E|P|T|G|M|k|m]
      - spec.kubelet.resourceReservation.static.memory
        integer or string
        Pattern: \d+[Ei|Pi|Ti|Gi|Mi|Ki|E|P|T|G|M|k|m]
  - spec.kubelet.rootDir
    string
    Directory path for managing kubelet files (volume mounts, etc.).
    
    Default: "/var/lib/kubelet"
  - spec.kubelet.topologyManager
    object
    Controls a topology manager.
    
    The topology manager allows the Kubernetes scheduler to take resource topology data into account when scheduling pods.
    
    In a cluster with NUMA nodes, the topology manager can improve application performance by aligning workloads with NUMA nodes that are more resource-efficient.
    
    For more information, see the Kubernetes documentation.
    - spec.kubelet.topologyManager.enabled
      boolean
      Enables Topology Management on a node.
      
      Default: false
    - spec.kubelet.topologyManager.policy
      string
      Resource/topology alignment policy to use when scheduling a pod.
      
      There are four supported policies:
      
      None — no resource alignment (default).
      
      BestEffort — a preferred NUMA node is defined for each container. If there are no preferred nodes for the pod available, it will be scheduled to a node without taking topology manager data into account, while the already running pod will keep running.
      
      Restricted — a preferred NUMA node is defined for each pod container. If there are no preferred nodes available, a new pod will not be scheduled to a node while an already running pod will be stopped.
      
      SingleNumaNode — a preferred NUMA node is defined for each pod container. If there is no NUMA node available that suits all pod containers, a new pod will not be scheduled while an already running pod will be stopped.
      
      More information can be found in the Kubernetes documentation.
      
      Default: "None"
      Allowed values: None, BestEffort, Restricted, SingleNumaNode
    - spec.kubelet.topologyManager.scope
      string
      Defines granularity with which resources are aligned:
      
      Container. The preferred resources are computed for a container. The computation does not take into account that multiple containers may belong to the same pod.
      
      Pod. The preferred resources are computed for the pod. All containers in the pod are treated as a whole in the computation.
      
      Default: "Container"
      Allowed values: Container, Pod
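  The kubelet parameters above can be combined as in the sketch below, which sets log rotation, a static resource reservation, and a NUMA-aware topology policy. All values are hypothetical and only illustrate the accepted formats.

  ```yaml
  spec:
    kubelet:
      containerLogMaxSize: 100Mi    # rotate a log file once it reaches this size
      containerLogMaxFiles: 5       # allowed range is 1 to 20
      maxPods: 150
      resourceReservation:
        mode: Static                # take values from the static section
        static:
          cpu: 500m
          memory: 1Gi
          ephemeralStorage: 1Gi
      topologyManager:
        enabled: true
        policy: SingleNumaNode
        scope: Pod                  # align all containers of a pod as a whole
  ```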
- spec.nodeDrainTimeoutSecond
  integer
  Maximum duration (in seconds) for which the controller attempts to reschedule pods to a new machine during node draining.
- spec.nodeTemplate
  object
  Specification of some of the fields that will be maintained in all nodes of the group.
  
  Example:
  nodeTemplate:
    labels:
      environment: production
      app: warp-drive-ai
    annotations:
      ai.fleet.com/discombobulate: 'true'
    taints:
      - effect: NoExecute
        key: ship-class
        value: frigate
  - spec.nodeTemplate.annotations
    object
    Similar to the standard metadata.annotations field.
    
    Example:
    annotations:
      ai.fleet.com/discombobulate: 'true'
  - spec.nodeTemplate.labels
    object
    Similar to the standard metadata.labels field.
    
    Example:
    labels:
      environment: production
      app: warp-drive-ai
  - spec.nodeTemplate.taints
    array of objects
    Similar to the .spec.taints field of the Node object.
    
    Available fields: effect, key, and value.
    
    Example:
    taints:
      - effect: NoExecute
        key: ship-class
        value: frigate
    - spec.nodeTemplate.taints.effect
      string
      Allowed values: NoSchedule, PreferNoSchedule, NoExecute
    - spec.nodeTemplate.taints.key
      string
    - spec.nodeTemplate.taints.value
      string
- spec.nodeType
  string
  Required value
  The type of nodes this group provides:
  - CloudEphemeral — nodes for this group will be automatically created (and deleted) in the cloud of the specified cloud provider;
  - CloudPermanent — nodes from ProviderClusterConfiguration will be created via dhctl;
  - CloudStatic — a static node (created manually or using any external tools) hosted in the cloud integrated with one of the cloud providers. This node has the CSI running, and it is managed by the cloud-controller-manager: the Node object automatically gets the information about the zone and region based on the cloud data; if a node gets deleted from the cloud, its corresponding Node object will be deleted in Kubernetes;
  - Static — a static node hosted on a bare metal or virtual machine. The cloud-controller-manager does not manage the node even if one of the cloud providers is enabled.
  Allowed values: CloudEphemeral, CloudPermanent, CloudStatic, Static
- spec.operatingSystem
  object
  Operating System settings for nodes.
  - spec.operatingSystem.manageKernel
    Deprecated
    boolean
    This parameter has no effect. Earlier, it enabled kernel maintenance on behalf of bashible.
    
    Default: true
- spec.staticInstances
  object
  Parameter for provisioning static machines to the cluster.
  - spec.staticInstances.count
    integer
    The number of existing instances to bootstrap and add to the cluster.
    
    Default: 0
    Allowed values: 0 <= X
  - spec.staticInstances.labelSelector
    object
    A label selector is a label query over a set of resources. The result of matchLabels and matchExpressions are ANDed. An empty label selector matches all objects. A null label selector matches no objects.
    - spec.staticInstances.labelSelector.matchExpressions
      array of objects
      A list of label selector requirements. The requirements are ANDed.
      
      A label selector requirement is a selector that contains values, a key, and an operator that relates the key and values.
      - spec.staticInstances.labelSelector.matchExpressions.key
        string
        key is the label key that the selector applies to.
      - spec.staticInstances.labelSelector.matchExpressions.operator
        string
        operator represents a key’s relationship to a set of values. Valid operators are In, NotIn, Exists and DoesNotExist.
      - spec.staticInstances.labelSelector.matchExpressions.values
        array of strings
        values is an array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. This array is replaced during a strategic merge patch.
        
        Element of the array
        string
        Pattern: [a-z0-9]([-a-z0-9]*[a-z0-9])?
        Length: 1..63
    - spec.staticInstances.labelSelector.matchLabels
      object
      matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels map is equivalent to an element of matchExpressions, whose key field is “key”, the operator is “In”, and the values array contains only “value”. The requirements are ANDed.
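  To show how matchLabels and matchExpressions are ANDed, the sketch below selects static instances for a group. The label keys and values (role, rack, rack-a, rack-b) and the instance count are hypothetical.

  ```yaml
  spec:
    nodeType: Static
    staticInstances:
      count: 3                      # number of instances to bootstrap
      labelSelector:
        matchLabels:
          role: worker              # must match exactly
        matchExpressions:
          - key: rack
            operator: In            # values must be non-empty for In and NotIn
            values:
              - rack-a
              - rack-b
  ```

  With this selector, only objects labeled role=worker whose rack label is rack-a or rack-b would match.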
- spec.update
  object
  - spec.update.maxConcurrent
    integer or string
    Maximum number of concurrently updating nodes.
    
    Can be set as absolute count or as a percent of total nodes.
    
    Default: 1
    Pattern: ^[1-9][0-9]*%?$
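  Both accepted forms of maxConcurrent are sketched below; the numbers themselves are hypothetical.

  ```yaml
  # Absolute count of nodes updated at the same time.
  spec:
    update:
      maxConcurrent: 2
  ---
  # Percentage of the total number of nodes.
  spec:
    update:
      maxConcurrent: "25%"
  ```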

Describes the runtime parameters of the node group.

metadata
object
- metadata.name
  string
  Pattern: ^[a-z0-9]([-a-z0-9]*[a-z0-9])?$
  Maximum length: 42
spec
object
Required value
- spec.chaos
  object
  Chaos monkey settings.
  
  Example:
  chaos:
    mode: DrainAndDelete
    period: 24h
  - spec.chaos.mode
    string
    The chaos monkey mode:
    
    DrainAndDelete — drains and deletes a node when triggered;
    
    Disabled — leaves this NodeGroup intact.
    Default: "Disabled"
    Allowed values: Disabled, DrainAndDelete
  - spec.chaos.period
    string
    The time interval to use for the chaos monkey (can be specified in the Go format).
    
    Default: "6h"
    Pattern: ^[0-9]+[mh]{1}$
- spec.cloudInstances
  object
  Parameter for provisioning the cloud-based VMs.
  
  Caution! Can only be used together with nodeType: CloudEphemeral.
  - spec.cloudInstances.classReference
    object
    Required value
    The reference to the InstanceClass object. It is unique for each cloud-provider-* module.
    - spec.cloudInstances.classReference.kind
      string
      The object type (e.g., OpenStackInstanceClass). The object type is specified in the documentation of the corresponding cloud-provider-* module.
      
      Allowed values: OpenStackInstanceClass, GCPInstanceClass, VsphereInstanceClass, AWSInstanceClass, YandexInstanceClass, AzureInstanceClass, VCDInstanceClass, ZvirtInstanceClass, DynamixInstanceClass, HuaweiCloudInstanceClass
    - spec.cloudInstances.classReference.name
      string
      The name of the required InstanceClass object (e.g., finland-medium).
  - spec.cloudInstances.maxPerZone
    integer
    Required value
    The maximum number of instances for the group in each zone.
    
    This value is used as the upper bound in cluster-autoscaler.
    
    Allowed values: 0 <= X
  - spec.cloudInstances.maxSurgePerZone
    integer
    The maximum number of instances to roll out simultaneously in the group in each zone.
    
    Default: 1
    Allowed values: 0 <= X
  - spec.cloudInstances.maxUnavailablePerZone
    integer
    The maximum number of unavailable instances (during rollout) in the group in each zone.
    
    Default: 0
    Allowed values: 0 <= X
  - spec.cloudInstances.minPerZone
    integer
    Required value
    The minimum number of instances for the group in each zone.
    
    This value is used in the MachineDeployment object and as a lower bound in cluster-autoscaler.
    
    Allowed values: 0 <= X
  - spec.cloudInstances.standby
    integer or string
    The total number of overprovisioned nodes for this NodeGroup in all zones.
    
    An overprovisioned node is a cluster node on which resources are reserved that are available at any time for scaling. The presence of such a node allows the cluster autoscaler not to wait for node initialization (which may take several minutes), but to immediately place a load on it.
    
    The value can be an absolute number (for example, 2) or a percentage of desired nodes (for example, 10%). If a percentage is specified, the absolute number is calculated based on the percentage of the maximum number of nodes (the maxPerZone parameter) rounded down, but not less than one.
    
    Pattern: ^[0-9]+%?$
  - spec.cloudInstances.standbyHolder
    object
    Amount of reserved resources.
    
    Used to determine whether to order overprovisioned nodes.
    - spec.cloudInstances.standbyHolder.notHeldResources
      object
      Describes the resources that will not be held (consumed) by the standby holder.
      - spec.cloudInstances.standbyHolder.notHeldResources.cpu
        integer or string
        Describes the amount of CPU that will not be held by standby holder on Nodes from this NodeGroup.
        
        The value can be an absolute number of cpus (for example, 2) as well as a milli representation (for example, 1500m).
        
        Pattern: ^[0-9]+m?$
      - spec.cloudInstances.standbyHolder.notHeldResources.memory
        integer or string
        Describes the amount of memory that will not be held by standby holder on Nodes from this NodeGroup.
        
        The value can be an absolute number of bytes (for example, 128974848) as well as a fixed-point number using one of memory suffixes: G, Gi, M, Mi.
        
        Pattern: ^[0-9]+(\.[0-9]+)?(E|P|T|G|M|K|Ei|Pi|Ti|Gi|Mi|Ki)?$
  - spec.cloudInstances.zones
    array of strings
    List of availability zones to create instances in.
    
    The default value depends on the cloud provider selected and usually corresponds to all zones of the region being used.
    
    Example:
    zones:
      - Helsinki
      - Espoo
      - Tampere
- spec.cri
  object
  Container runtime parameters.
  - spec.cri.containerd
    object
    Containerd runtime parameters.
    
    If used, cri.type must be set to Containerd.
    - spec.cri.containerd.maxConcurrentDownloads
      integer
      Set the max concurrent downloads for each pull.
      
      Default: 3
  - spec.cri.docker
    object
    Docker settings for nodes.
    
    Note! Docker is deprecated.
    - spec.cri.docker.manage
      boolean
      Enable Docker maintenance from bashible.
      
      Default: true
    - spec.cri.docker.maxConcurrentDownloads
      integer
      Set the max concurrent downloads for each pull.
      
      Default: 3
  - spec.cri.notManaged
    object
    Settings for an unmanaged CRI on nodes.
    - spec.cri.notManaged.criSocketPath
      string
      Path to CRI socket.
  - spec.cri.type
    string
    Container runtime type.
    
    If not specified, the defaultCRI value from the initial cluster configuration (the cluster-configuration.yaml parameter of the d8-cluster-configuration secret in the kube-system namespace) is used.
    
    Note! Docker is deprecated.
    
    Allowed values: Docker, Containerd, NotManaged
- spec.disruptions
  object
  Disruptions settings for nodes.
  
  Example:
  disruptions:
    approvalMode: Automatic
    automatic:
      drainBeforeApproval: false
      windows:
        - from: '06:00'
          to: '08:00'
          days:
            - Tue
            - Sun
  - spec.disruptions.approvalMode
    string
    The approval mode for disruptive updates:
    
    Manual — disable automatic disruption approval; the alert will be displayed if disruption is needed. Caution! The master node group update mode must be Manual to avoid issues with draining.
    
    Automatic — automatically approve disruption-involving updates.
    
    RollingUpdate — in this mode, a new node with new settings will be created; then, the old node will be deleted. Available only for cloud nodes.
    
    If the RollingUpdate mode is not used, when updating, the node is first drained and then updated (rebooted) and put back into operation (uncordoned). Note that in this case, the cluster must have sufficient resources to accommodate the load while the node being updated is unavailable. In the RollingUpdate mode, the node is replaced by the updated node, i.e., an extra node appears in the cluster for the duration of the update. In cloud infrastructures, the RollingUpdate mode is convenient, for example, if there are no resources in the cluster to temporarily host the load from the node being updated.
    Default: "Automatic"
    Allowed values: Manual, Automatic, RollingUpdate
  - spec.disruptions.automatic
    object
    Additional parameters for the Automatic mode.
    - spec.disruptions.automatic.drainBeforeApproval
      boolean
      Drain Pods from the nodes before approving disruption.
      
      Caution! This setting is ignored (nodes are approved without draining Pods):
      
      for the nodeGroup master with a single node;
      
      for a single ready node in a nodeGroup picked out for Deckhouse placement.
      
      Default: true
    - spec.disruptions.automatic.windows
      array of objects
      Time windows for node disruptive updates.
      - spec.disruptions.automatic.windows.days
        array of strings
        Days of the week when nodes can be updated.
        
        Examples:
        
        days: Mon
        
        days: Wed
        
        Element of the array
        string
        Day of the week.
        
        Allowed values: Mon, Tue, Wed, Thu, Fri, Sat, Sun
      - spec.disruptions.automatic.windows.from
        string
        Required value
        Start time of disruptive update window (UTC timezone).
        
        Pattern: ^(?:\d|[01]\d|2[0-3]):[0-5]\d$
        Example:
        from: '13:00'
      - spec.disruptions.automatic.windows.to
        string
        Required value
        End time of disruptive update window (UTC timezone).
        
        Pattern: ^(?:\d|[01]\d|2[0-3]):[0-5]\d$
        Example:
        to: '18:30'
  - spec.disruptions.rollingUpdate
    object
    Additional parameters for the RollingUpdate mode.
    - spec.disruptions.rollingUpdate.windows
      array of objects
      Time windows for node disruptive updates.
      - spec.disruptions.rollingUpdate.windows.days
        array of strings
        Days of the week when nodes can be updated.
        
        Examples:
        
        days: Mon
        
        days: Wed
        
        Element of the array
        string
        Day of the week.
        
        Allowed values: Mon, Tue, Wed, Thu, Fri, Sat, Sun
      - spec.disruptions.rollingUpdate.windows.from
        string
        Required value
        Start time of disruptive update window (UTC timezone).
        
        Pattern: ^(?:\d|[01]\d|2[0-3]):[0-5]\d$
        Example:
        from: '13:00'
      - spec.disruptions.rollingUpdate.windows.to
        string
        Required value
        End time of disruptive update window (UTC timezone).
        
        Pattern: ^(?:\d|[01]\d|2[0-3]):[0-5]\d$
        Example:
        to: '18:30'
- spec.kubelet
  object
  Kubelet settings for nodes.
  - spec.kubelet.containerLogMaxFiles
    integer
    How many rotated log files to store before deleting them.
    
    WARNING! This parameter does nothing if CRI type is Docker.
    
    Default: 4
    Allowed values: 1 <= X <= 20
  - spec.kubelet.containerLogMaxSize
    string
    Maximum log file size before it is rotated.
    
    WARNING! This parameter does nothing if CRI type is Docker.
    
    Default: "50Mi"
    Pattern: \d+[Ei|Pi|Ti|Gi|Mi|Ki|E|P|T|G|M|k|m]
  - spec.kubelet.maxPods
    integer
    Set the max count of pods per node.
    
    Default: 110
  - spec.kubelet.rootDir
    string
    Directory path for managing kubelet files (volume mounts, etc.).
    
    Default: "/var/lib/kubelet"
- spec.nodeTemplate
  object
  Specification of some of the fields that will be maintained in all nodes of the group.
  
  Example:
  nodeTemplate:
    labels:
      environment: production
      app: warp-drive-ai
    annotations:
      ai.fleet.com/discombobulate: 'true'
    taints:
      - effect: NoExecute
        key: ship-class
        value: frigate
  - spec.nodeTemplate.annotations
    object
    Similar to the standard metadata.annotations field.
    
    Example:
    annotations:
      ai.fleet.com/discombobulate: 'true'
  - spec.nodeTemplate.labels
    object
    Similar to the standard metadata.labels field.
    
    Example:
    labels:
      environment: production
      app: warp-drive-ai
  - spec.nodeTemplate.taints
    array of objects
    Similar to the .spec.taints field of the Node object.
    
    Available fields: effect, key, and value.
    
    Example:
    taints:
      - effect: NoExecute
        key: ship-class
        value: frigate
    - spec.nodeTemplate.taints.effect
      string
      Allowed values: NoSchedule, PreferNoSchedule, NoExecute
    - spec.nodeTemplate.taints.key
      string
    - spec.nodeTemplate.taints.value
      string
- spec.nodeType
  string
  Required value
  The type of nodes this group provides.
  - Cloud — nodes for this group will be automatically created (and deleted) in the cloud of the specified cloud provider;
  - Static — a static node hosted on a bare-metal server or a virtual machine. The cloud-controller-manager does not manage the node even if one of the cloud providers is enabled;
  - Hybrid — a static node (created manually or using any external tools) hosted in a cloud integrated with one of the cloud providers. This node has the CSI running, and it is managed by the cloud-controller-manager: the Node object automatically gets the information about the zone and region based on the cloud data; if a node gets deleted from the cloud, its corresponding Node object will be deleted in Kubernetes.
  Allowed values: Cloud, Static, Hybrid
- spec.operatingSystem
  object
  Operating System settings for nodes.
  - spec.operatingSystem.manageKernel
    boolean
    Enable kernel maintenance from bashible.
    
    Default: true

Defines the runtime parameters of a node group.

metadata
object
- metadata.name
  string
  Pattern: ^[a-z0-9]([-a-z0-9]*[a-z0-9])?$
  Maximum length: 42
spec
object
Required value
- spec.chaos
  object
  Chaos monkey settings.
  
  Example:
  chaos:
    mode: DrainAndDelete
    period: 24h
  - spec.chaos.mode
    string
    The chaos monkey mode:
    
    DrainAndDelete — drains and deletes a node when triggered;
    
    Disabled — leaves this NodeGroup intact.
    Default: "Disabled"
    Allowed values: Disabled, DrainAndDelete
  - spec.chaos.period
    string
    The time interval to use for the chaos monkey (can be specified in the Go format).
    
    Default: "6h"
    Pattern: ^[0-9]+[mh]{1}$
- spec.cloudInstances
  object
  Parameter for provisioning the cloud-based VMs.
  
  Caution! Can only be used together with nodeType: CloudEphemeral.
  - spec.cloudInstances.classReference
    object
    Required value
    The reference to the InstanceClass object. It is unique for each cloud-provider-* module.
    - spec.cloudInstances.classReference.kind
      string
      The object type (e.g., OpenStackInstanceClass). The object type is specified in the documentation of the corresponding cloud-provider- module.
      
      Allowed values: OpenStackInstanceClass, GCPInstanceClass, VsphereInstanceClass, AWSInstanceClass, YandexInstanceClass, AzureInstanceClass, VCDInstanceClass, ZvirtInstanceClass, DynamixInstanceClass, HuaweiCloudInstanceClass
    - spec.cloudInstances.classReference.name
      string
      The name of the required InstanceClass object (e.g., finland-medium).
  - spec.cloudInstances.maxPerZone
    integer
    Required value
    The maximum number of instances for the group in each zone.
    
    This value is used as the upper bound in cluster-autoscaler.
    
    With a value of 0, you need to set capacity for some InstanceClass types. See the description of the relevant InstanceClass for details.
    
    Allowed values: 0 <= X
  - spec.cloudInstances.maxSurgePerZone
    integer
    The maximum number of instances to roll out simultaneously in the group in each zone.
    
    Default: 1
    Allowed values: 0 <= X
  - spec.cloudInstances.maxUnavailablePerZone
    integer
    The maximum number of unavailable instances (during rollout) in the group in each zone.
    
    Default: 0
    Allowed values: 0 <= X
  - spec.cloudInstances.minPerZone
    integer
    Required value
    The minimum number of instances for the group in each zone.
    
    This value is used in the MachineDeployment object and as a lower bound in cluster-autoscaler.
    
    Allowed values: 0 <= X
  - spec.cloudInstances.standby
    integer or string
    The total number of overprovisioned nodes for this NodeGroup across all zones.
    
    An overprovisioned node is a cluster node on which resources are reserved that are available at any time for scaling. The presence of such a node allows the cluster autoscaler not to wait for node initialization (which may take several minutes), but to immediately place a load on it.
    
    The value can be an absolute number (for example, 2) or a percentage of desired nodes (for example, 10%). If a percentage is specified, the absolute number is calculated based on the percentage of the maximum number of nodes (the maxPerZone parameter) rounded down, but not less than one.
    
    Pattern: ^[0-9]+%?$
  - spec.cloudInstances.standbyHolder
    object
    Amount of reserved resources.
    
    Used to determine whether to order overprovisioned nodes.
    - spec.cloudInstances.standbyHolder.notHeldResources
      object
      Describes the resources that will not be held (consumed) by the standby holder.
      - spec.cloudInstances.standbyHolder.notHeldResources.cpu
        integer or string
        Describes the amount of CPU that will not be held by standby holder on Nodes from this NodeGroup.
        
        The value can be an absolute number of cpus (for example, 2) as well as a milli representation (for example, 1500m).
        
        Pattern: ^[0-9]+m?$
      - spec.cloudInstances.standbyHolder.notHeldResources.memory
        integer or string
        Describes the amount of memory that will not be held by standby holder on Nodes from this NodeGroup.
        
        The value can be an absolute number of bytes (for example, 128974848) as well as a fixed-point number using one of memory suffixes: G, Gi, M, Mi.
        
        Pattern: ^[0-9]+(\.[0-9]+)?(E|P|T|G|M|K|Ei|Pi|Ti|Gi|Mi|Ki)?$
  - spec.cloudInstances.zones
    array of strings
    List of availability zones to create instances in.
    
    The default value depends on the cloud provider selected and usually corresponds to all zones of the region being used.
    
    Example:
    zones:
      - Helsinki
      - Espoo
      - Tampere
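The cloudInstances parameters above can be combined roughly as follows. This is a sketch under stated assumptions: the class kind and name reuse the examples from the field descriptions, while the instance counts and the standby and notHeldResources values are illustrative and only chosen to satisfy the documented patterns.

```yaml
spec:
  cloudInstances:
    classReference:
      kind: OpenStackInstanceClass
      name: finland-medium
    # Lower and upper bounds per zone, as used by cluster-autoscaler.
    minPerZone: 1
    maxPerZone: 5
    maxSurgePerZone: 1
    maxUnavailablePerZone: 0
    # Either an absolute number or a percentage is accepted.
    standby: 10%
    standbyHolder:
      notHeldResources:
        cpu: 1500m
        memory: 512Mi
    zones:
      - Helsinki
      - Espoo
      - Tampere
```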
- spec.cri
  object
  Container runtime parameters.
  - spec.cri.containerd
    object
    Containerd runtime parameters.
    
    If used, cri.type must be set to Containerd.
    - spec.cri.containerd.maxConcurrentDownloads
      integer
      Set the max concurrent downloads for each pull.
      
      Default: 3
  - spec.cri.type
    string
    Container runtime type.
    
    If not specified, the defaultCRI value from the initial cluster configuration (the cluster-configuration.yaml parameter of the d8-cluster-configuration secret in the kube-system namespace) is used.
    
    Note! Docker is deprecated.
    
    Allowed values: Docker, Containerd, NotManaged
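For instance, a group that uses containerd might be configured along these lines. The download limit is an illustrative value rather than a recommendation.

```yaml
spec:
  cri:
    # containerd settings apply only when type is Containerd.
    type: Containerd
    containerd:
      maxConcurrentDownloads: 5
```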
- spec.disruptions
  object
  Disruptions settings for nodes.
  
  Example:
  disruptions:
    approvalMode: Automatic
    automatic:
      drainBeforeApproval: false
      windows:
        - from: '06:00'
          to: '08:00'
          days:
            - Tue
            - Sun
  - spec.disruptions.approvalMode
    string
    The approval mode for disruptive updates:
    
    Manual — disable automatic disruption approval; the alert will be displayed if disruption is needed. Caution! The master node group update mode must be Manual to avoid issues with draining.
    
    Automatic — automatically approve disruption-involving updates.
    
    RollingUpdate — in this mode, a new node with new settings will be created; then, the old node will be deleted. Available only for cloud nodes.
    
    If the RollingUpdate mode is not used, when updating, the node is first drained and then updated (rebooted) and put back into operation (uncordoned). Note that in this case, the cluster must have sufficient resources to accommodate the load while the node being updated is unavailable. In the RollingUpdate mode, the node is replaced by the updated node, i.e., an extra node appears in the cluster for the duration of the update. In cloud infrastructures, the RollingUpdate mode is convenient, for example, if there are no resources in the cluster to temporarily host the load from the node being updated.
    Default: "Automatic"
    Allowed values: Manual, Automatic, RollingUpdate
  - spec.disruptions.automatic
    object
    Additional parameters for the Automatic mode.
    - spec.disruptions.automatic.drainBeforeApproval
      boolean
      Drain Pods from the nodes before approving disruption.
      
      Caution! This setting is ignored (nodes will be approved without draining Pods):
      
      for the nodeGroup master with a single node;
      
      for a single ready node in a nodeGroup picked out for Deckhouse placement.
      
      Default: true
    - spec.disruptions.automatic.windows
      array of objects
      Time windows for node disruptive updates.
      - spec.disruptions.automatic.windows.days
        array of strings
        Days of the week when the node can be updated.
        
        Examples:
        
        days: Mon
        
        days: Wed
        
        Element of the array
        string
        Day of the week.
        
        Allowed values: Mon, Tue, Wed, Thu, Fri, Sat, Sun
      - spec.disruptions.automatic.windows.from
        string
        Required value
        Start time of disruptive update window (UTC timezone).
        
        Pattern: ^(?:\d|[01]\d|2[0-3]):[0-5]\d$
        Example:
        from: '13:00'
      - spec.disruptions.automatic.windows.to
        string
        Required value
        End time of disruptive update window (UTC timezone).
        
        Pattern: ^(?:\d|[01]\d|2[0-3]):[0-5]\d$
        Example:
        to: '18:30'
  - spec.disruptions.rollingUpdate
    object
    Additional parameters for the RollingUpdate mode.
    - spec.disruptions.rollingUpdate.windows
      array of objects
      Time windows for node disruptive updates.
      - spec.disruptions.rollingUpdate.windows.days
        array of strings
        Days of the week when the node can be updated.
        
        Examples:
        
        days: Mon
        
        days: Wed
        
        Element of the array
        string
        Day of the week.
        
        Allowed values: Mon, Tue, Wed, Thu, Fri, Sat, Sun
      - spec.disruptions.rollingUpdate.windows.from
        string
        Required value
        Start time of disruptive update window (UTC timezone).
        
        Pattern: ^(?:\d|[01]\d|2[0-3]):[0-5]\d$
        Example:
        from: '13:00'
      - spec.disruptions.rollingUpdate.windows.to
        string
        Required value
        End time of disruptive update window (UTC timezone).
        
        Pattern: ^(?:\d|[01]\d|2[0-3]):[0-5]\d$
        Example:
        to: '18:30'
- spec.docker
  object
  Docker settings for nodes.
  
  If used, cri.type must be set to Docker.
  
  Note! Docker is deprecated.
  - spec.docker.manage
    boolean
    Enable Docker maintenance from bashible.
    
    Default: true
  - spec.docker.maxConcurrentDownloads
    integer
    Set the max concurrent downloads for each pull.
    
    Default: 3
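A minimal sketch of the Docker settings, shown only to make the dependency on cri.type explicit. Both values are the documented defaults; since Docker is deprecated, this fragment is mainly relevant to existing groups.

```yaml
spec:
  cri:
    # The docker section is valid only when the CRI type is Docker.
    type: Docker
  docker:
    manage: true
    maxConcurrentDownloads: 3
```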
- spec.kubelet
  object
  Kubelet settings for nodes.
  - spec.kubelet.containerLogMaxFiles
    integer
    How many rotated log files to store before deleting them.
    
    WARNING! This parameter does nothing if CRI type is Docker.
    
    Default: 4
    Allowed values: 1 <= X <= 20
  - spec.kubelet.containerLogMaxSize
    string
    Maximum log file size before it is rotated.
    
    WARNING! This parameter does nothing if CRI type is Docker.
    
    Default: "50Mi"
    Pattern: \d+[Ei|Pi|Ti|Gi|Mi|Ki|E|P|T|G|M|k|m]
  - spec.kubelet.maxPods
    integer
    Set the max count of pods per node.
    
    Default: 110
  - spec.kubelet.rootDir
    string
    Directory path for managing kubelet files (volume mounts, etc.).
    
    Default: "/var/lib/kubelet"
- spec.kubernetesVersion
  string
  The desired minor version of Kubernetes.
  
  By default, it corresponds to the version selected for the cluster globally (see installation documentation) or to the current version of the control plane (if the global version is not defined).
  
  Allowed values: 1.28, 1.29, 1.30, 1.31, 1.32
  Example:
  kubernetesVersion: '1.29'
- spec.nodeTemplate
  object
  Specification of some of the fields that will be maintained in all nodes of the group.
  
  Example:
  nodeTemplate:
    labels:
      environment: production
      app: warp-drive-ai
    annotations:
      ai.fleet.com/discombobulate: 'true'
    taints:
      - effect: NoExecute
        key: ship-class
        value: frigate
  - spec.nodeTemplate.annotations
    object
    Similar to the standard metadata.annotations field.
    
    Example:
    annotations:
      ai.fleet.com/discombobulate: 'true'
  - spec.nodeTemplate.labels
    object
    Similar to the standard metadata.labels field.
    
    Example:
    labels:
      environment: production
      app: warp-drive-ai
  - spec.nodeTemplate.taints
    array of objects
    Similar to the .spec.taints field of the Node object.
    
    Available fields: effect, key, and value.
    
    Example:
    taints:
      - effect: NoExecute
        key: ship-class
        value: frigate
    - spec.nodeTemplate.taints.effect
      string
      Allowed values: NoSchedule, PreferNoSchedule, NoExecute
    - spec.nodeTemplate.taints.key
      string
    - spec.nodeTemplate.taints.value
      string
- spec.nodeType
  string
  Required value
  The type of nodes this group provides.
  - Cloud — nodes for this group will be automatically created (and deleted) in the cloud of the specified cloud provider;
  - Static — a static node hosted on a bare-metal server or a virtual machine. The cloud-controller-manager does not manage the node even if one of the cloud providers is enabled;
  - Hybrid — a static node (created manually or using any external tools) hosted in a cloud integrated with one of the cloud providers. This node has the CSI running, and it is managed by the cloud-controller-manager: the Node object automatically gets the information about the zone and region based on the cloud data; if a node gets deleted from the cloud, its corresponding Node object will be deleted in Kubernetes.
  Allowed values: Cloud, Static, Hybrid
- spec.operatingSystem
  Deprecated
  object
  Operating System settings for nodes.
  - spec.operatingSystem.manageKernel
    Deprecated
    boolean
    Enable kernel maintenance from bashible.
    
    Default: true
- spec.static
  object
  Static node parameters.
  - spec.static.internalNetworkCIDRs
    array of strings
    Subnet CIDR
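As a hedged illustration, a static group that declares its internal subnet could look like this. The CIDR is a made-up example value, not taken from the reference above.

```yaml
spec:
  nodeType: Static
  static:
    internalNetworkCIDRs:
      # Hypothetical subnet; replace with the network your nodes actually use.
      - 10.2.2.0/24
```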
