Instance
Scope: Cluster
Version: v1alpha1
Describes an implementation-independent ephemeral machine resource.
InstanceTypesCatalog
Scope: Cluster
Version: v1alpha1
Contains information discovered from the cloud.
NodeGroup
Scope: Cluster
Describes the runtime parameters of the node group.
Example:
# NodeGroup for cloud nodes in AWS.
apiVersion: deckhouse.io/v1
kind: NodeGroup
metadata:
  name: test
spec:
  nodeType: CloudEphemeral
  cloudInstances:
    zones:
      - eu-west-1a
      - eu-west-1b
    minPerZone: 1
    maxPerZone: 2
    classReference:
      kind: AWSInstanceClass
      name: test
  nodeTemplate:
    labels:
      tier: test
---
# NodeGroup for static nodes on bare metal servers (or VMs).
apiVersion: deckhouse.io/v1
kind: NodeGroup
metadata:
  name: worker
spec:
  nodeType: Static
- object
Parameter for provisioning the cloud-based VMs.
Caution! Can only be used together with nodeType: CloudEphemeral.
- object
Required value
The reference to the InstanceClass object. It is unique for each cloud-provider-* module.
- string
The object type (e.g., OpenStackInstanceClass). The object type is specified in the documentation of the corresponding cloud-provider-* module.
Allowed values:
OpenStackInstanceClass, GCPInstanceClass, VsphereInstanceClass, AWSInstanceClass, YandexInstanceClass, AzureInstanceClass, VCDInstanceClass, ZvirtInstanceClass, DynamixInstanceClass
- integer
The maximum number of unavailable instances (during rollout) in the group in each zone.
Default:
0
Allowed values:
0 <= X
- integer
Priority of the node group.
When scaling a cluster, the autoscaler first selects the node groups with the highest priority. If several node groups have the same priority, the autoscaler picks one of them at random.
Priorities are useful for preferring cheaper nodes (for example, spot instances) over more expensive ones.
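As an illustration, the sketch below defines two node groups so that the autoscaler would try the spot group first. It assumes the parameter is named priority and is placed under cloudInstances (the parameter name is not shown in this section); the group and instance class names are hypothetical.

```yaml
# Assumption: the key is `priority` under `cloudInstances`.
apiVersion: deckhouse.io/v1
kind: NodeGroup
metadata:
  name: spot-workers          # hypothetical name
spec:
  nodeType: CloudEphemeral
  cloudInstances:
    priority: 50              # higher value: selected first when scaling
    minPerZone: 0
    maxPerZone: 5
    classReference:
      kind: AWSInstanceClass
      name: spot              # hypothetical instance class
---
apiVersion: deckhouse.io/v1
kind: NodeGroup
metadata:
  name: on-demand-workers     # hypothetical name
spec:
  nodeType: CloudEphemeral
  cloudInstances:
    priority: 10              # lower value: used after higher-priority groups
    minPerZone: 0
    maxPerZone: 5
    classReference:
      kind: AWSInstanceClass
      name: on-demand         # hypothetical instance class
```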
- integer or string
The total number of overprovisioned nodes for this NodeGroup in all zones.
An overprovisioned node is a cluster node on which resources are reserved and kept available at any time for scaling. Such a node lets the cluster autoscaler place a load immediately instead of waiting for node initialization (which may take several minutes).
The value can be an absolute number (for example, 2) or a percentage of desired nodes (for example, 10%). If a percentage is specified, the absolute number is calculated as that percentage of the maximum number of nodes (the maxPerZone parameter), rounded down, but not less than one.
Pattern:
^[0-9]+%?$
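A minimal sketch of both accepted value forms follows. It assumes the parameter is named standby and is placed under cloudInstances (the name is not shown in this section); the instance class name is hypothetical.

```yaml
# Assumption: the key is `standby` under `cloudInstances`.
spec:
  nodeType: CloudEphemeral
  cloudInstances:
    maxPerZone: 10
    standby: 10%              # percentage form; an absolute value such as 2 also matches ^[0-9]+%?$
    classReference:
      kind: AWSInstanceClass
      name: worker            # hypothetical instance class
```

With the percentage form, the resulting node count is derived from maxPerZone as described above and never drops below one.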
- object
Amount of reserved resources.
Used to determine whether to order overprovisioned nodes.
- object
Deprecated: the parameter is no longer used. Use the overprovisioningRate parameter.
Describes the resources that will not be held (consumed) by the standby holder.
- integer or string
Describes the amount of CPU that will not be held by the standby holder on nodes from this NodeGroup.
The value can be an absolute number of CPUs (for example, 2) or a value in millicores (for example, 1500m).
Pattern:
^[0-9]+m?$
- integer or string
Describes the amount of memory that will not be held by the standby holder on nodes from this NodeGroup.
The value can be an absolute number of bytes (for example, 128974848) or a fixed-point number with one of the memory suffixes: G, Gi, M, Mi.
Pattern:
^[0-9]+(\.[0-9]+)?(E|P|T|G|M|K|Ei|Pi|Ti|Gi|Mi|Ki)?$
- object
Containerd runtime parameters.
If used, cri.type must be set to Containerd.
- string
Container runtime type.
If not specified, the defaultCRI value from the initial cluster configuration is used (the cluster-configuration.yaml parameter from the d8-cluster-configuration secret in the kube-system namespace).
Note! Docker is deprecated.
Allowed values:
Docker, Containerd, NotManaged
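For instance, a node group can pin its container runtime instead of inheriting defaultCRI. Only cri.type comes from the text above; this fragment assumes it sits under spec of a NodeGroup.

```yaml
# Fragment of a NodeGroup spec (placement under `spec` is an assumption).
spec:
  cri:
    type: Containerd          # one of: Docker (deprecated), Containerd, NotManaged
```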
- object
Disruptions settings for nodes.
Example:
disruptions:
  approvalMode: Automatic
  automatic:
    drainBeforeApproval: false
    windows:
      - from: '06:00'
        to: '08:00'
        days:
          - Tue
          - Sun
- string
The approval mode for disruptive updates:
  - Manual: disables automatic disruption approval; an alert is displayed if a disruption is needed. Caution! The master node group update mode must be Manual to avoid issues with draining.
  - Automatic: automatically approves disruption-involving updates.
  - RollingUpdate: a new node with the new settings is created, then the old node is deleted. Available only for cloud nodes.
If the RollingUpdate mode is not used, the node being updated is first drained, then updated (rebooted), and then put back into operation (uncordoned). Note that in this case, the cluster must have sufficient resources to accommodate the load while the node being updated is unavailable. In the RollingUpdate mode, the node is replaced by the updated node, i.e., an extra node appears in the cluster for the duration of the update. In cloud infrastructures, the RollingUpdate mode is convenient, for example, if there are no resources in the cluster to temporarily host the load from the node being updated.
Default:
"Automatic"
Allowed values:
Manual, Automatic, RollingUpdate
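A minimal sketch of a cloud node group that replaces nodes instead of draining them in place is given below. The disruptions and approvalMode keys follow the example above; the group and instance class names are hypothetical.

```yaml
apiVersion: deckhouse.io/v1
kind: NodeGroup
metadata:
  name: front                 # hypothetical name
spec:
  nodeType: CloudEphemeral    # RollingUpdate is available only for cloud nodes
  disruptions:
    approvalMode: RollingUpdate
  cloudInstances:
    minPerZone: 1
    maxPerZone: 3
    classReference:
      kind: AWSInstanceClass
      name: front             # hypothetical instance class
```

This choice trades a temporary extra node for not needing spare capacity in the cluster during the update.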
- boolean
Drain Pods from the nodes before approving disruption.
Caution! This setting is ignored (nodes will be approved without draining Pods):
  - for the nodeGroup master with a single node;
  - for a single ready node in a nodeGroup picked out for Deckhouse placement.
Default:
true
- array of strings
Days of the week when the node can be updated.
Examples:
days: Mon
days: Wed
- array of strings
Days of the week when the node can be updated.
Examples:
days: Mon
days: Wed
- string
Required value
Fencing supports a single mode: Watchdog. In this mode, the watchdog kernel module (watchdog timer) is used.
The Watchdog implementation includes the following components:
  - Fencing-agent: a DaemonSet that is deployed on a specific group of nodes (NodeGroups).
    Once started, the agent activates Watchdog and sets a special label node-manager.deckhouse.io/fencing-enabled on the node where it is running. The agent regularly checks if the Kubernetes API is available. If the API is available, the agent sends a signal to Watchdog, which resets the watchdog timer. The agent also monitors special service labels on the node and, depending on their presence, enables or disables the Watchdog.
    A softdog kernel module with parameters soft_margin=60 and soft_panic=1 is used as Watchdog. This means that the timeout period of the watchdog timer is 60 seconds. After this time expires, a kernel panic occurs and the node remains in this state until the user reboots it.
  - Fencing-controller: a controller that monitors all nodes with the node-manager.deckhouse.io/fencing-enabled label. If any node becomes unavailable for more than 60 seconds, the controller removes all pods from that node and then removes the node itself.
Allowed values:
Watchdog
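Enabling fencing for a group could look like the fragment below. The key names fencing and mode are assumptions (the parameter names are not shown in this section); only the Watchdog value is taken from the text above.

```yaml
# Assumption: the parameter is `fencing.mode` under the NodeGroup `spec`.
spec:
  fencing:
    mode: Watchdog            # the only supported mode
```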
- object
Management of resource reservation for system daemons on a node.
More info in the Kubernetes documentation.
- string
Defines the system resource reservation mode:
  - Off: disables resource reservation.
  - Auto: reserves resources based on the Node capacity.
  - Static: reserves resources based on the values provided by the user via the static parameter.
Note that a dedicated cgroup is currently not used for resource reservation (the --system-reserved-cgroup flag is not used).
Default:
"Auto"
- object
Controls a topology manager.
The topology manager allows the Kubernetes scheduler to take resource topology data into account when scheduling pods.
In a cluster with NUMA nodes, the topology manager can improve application performance by aligning workloads with NUMA nodes that are more resource-efficient.
For more information, see the Kubernetes documentation.
- string
Resource/topology alignment policy to use when scheduling a pod.
There are four supported policies:
  - None: no resource alignment (default).
  - BestEffort: a preferred NUMA node is defined for each container. If there are no preferred nodes available for the pod, it will be scheduled to a node without taking topology manager data into account, while an already running pod will keep running.
  - Restricted: a preferred NUMA node is defined for each pod container. If there are no preferred nodes available, a new pod will not be scheduled to a node, while an already running pod will be stopped.
  - SingleNumaNode: a preferred NUMA node is defined for each pod container. If there is no NUMA node available that suits all pod containers, a new pod will not be scheduled, while an already running pod will be stopped.
More information can be found in the Kubernetes documentation.
Default:
"None"
Allowed values:
None, BestEffort, Restricted, SingleNumaNode
- string
Defines the granularity with which resources are aligned:
  - Container: the preferred resources are computed for a container. The computation does not take into account that multiple containers may belong to the same pod.
  - Pod: the preferred resources are computed for the pod. All containers in the pod are treated as a whole in the computation.
Default:
"Container"
Allowed values:
Container, Pod
- object
Specification of some of the fields that will be maintained in all nodes of the group.
Example:
nodeTemplate:
  labels:
    environment: production
    app: warp-drive-ai
  annotations:
    ai.fleet.com/discombobulate: 'true'
  taints:
    - effect: NoExecute
      key: ship-class
      value: frigate
- object
Similar to the standard metadata.annotations field.
Example:
annotations:
  ai.fleet.com/discombobulate: 'true'
- object
Similar to the standard metadata.labels field.
Example:
labels:
  environment: production
  app: warp-drive-ai
- array of objects
Similar to the .spec.taints field of the Node object.
Caution! Only the effect, key, and value fields are available.
Example:
taints:
  - effect: NoExecute
    key: ship-class
    value: frigate
- string
Required value
The type of nodes this group provides:
  - CloudEphemeral: nodes for this group will be automatically created (and deleted) in the cloud of the specified cloud provider;
  - CloudPermanent: nodes from ProviderClusterConfiguration will be created via dhctl;
  - CloudStatic: a static node (created manually or using any external tools) hosted in the cloud integrated with one of the cloud providers. This node has the CSI running, and it is managed by the cloud-controller-manager: the Node object automatically gets the information about the zone and region based on the cloud data; if a node gets deleted from the cloud, its corresponding Node object will be deleted in Kubernetes;
  - Static: a static node hosted on a bare metal or virtual machine. The cloud-controller-manager does not manage the node even if one of the cloud providers is enabled.
Allowed values:
CloudEphemeral, CloudPermanent, CloudStatic, Static
- object
A label selector is a label query over a set of resources. The result of matchLabels and matchExpressions are ANDed. An empty label selector matches all objects. A null label selector matches no objects.
- array of objects
matchExpressions is a list of label selector requirements. The requirements are ANDed.
A label selector requirement is a selector that contains values, a key, and an operator that relates the key and values.
- array of strings
values is an array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. This array is replaced during a strategic merge patch.
- object
matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels map is equivalent to an element of matchExpressions, whose key field is “key”, the operator is “In”, and the values array contains only “value”. The requirements are ANDed.
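To make the AND semantics concrete, here is a sketch of a selector that combines both forms. The parent key labelSelector and all label names are hypothetical; matchLabels, matchExpressions, key, operator, and values follow the standard Kubernetes label selector described above.

```yaml
labelSelector:                    # hypothetical parent key
  matchLabels:
    role: worker                  # equivalent to: key "role", operator In, values ["worker"]
  matchExpressions:
    - key: environment
      operator: In
      values:                     # must be non-empty for In / NotIn
        - production
        - staging
    - key: decommissioned
      operator: DoesNotExist      # values must be empty for Exists / DoesNotExist
```

An object matches only if it carries role=worker, has environment set to production or staging, and has no decommissioned label.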
Describes the runtime parameters of the node group.
- string
The time interval to use for the chaos monkey (can be specified in the Go format).
Default:
"6h"
Pattern:
^[0-9]+[mh]{1}$
- object
Parameter for provisioning the cloud-based VMs.
Caution! Can only be used together with nodeType: CloudEphemeral.
- object
Required value
The reference to the InstanceClass object. It is unique for each cloud-provider-* module.
- string
The object type (e.g., OpenStackInstanceClass). The object type is specified in the documentation of the corresponding cloud-provider-* module.
Allowed values:
OpenStackInstanceClass, GCPInstanceClass, VsphereInstanceClass, AWSInstanceClass, YandexInstanceClass, AzureInstanceClass, VCDInstanceClass, ZvirtInstanceClass, DynamixInstanceClass
- integer
The maximum number of unavailable instances (during rollout) in the group in each zone.
Default:
0
Allowed values:
0 <= X
- integer or string
The total number of overprovisioned nodes for this NodeGroup in all zones.
An overprovisioned node is a cluster node on which resources are reserved and kept available at any time for scaling. Such a node lets the cluster autoscaler place a load immediately instead of waiting for node initialization (which may take several minutes).
The value can be an absolute number (for example, 2) or a percentage of desired nodes (for example, 10%). If a percentage is specified, the absolute number is calculated as that percentage of the maximum number of nodes (the maxPerZone parameter), rounded down, but not less than one.
Pattern:
^[0-9]+%?$
- object
Amount of reserved resources.
Used to determine whether to order overprovisioned nodes.
- object
Describes the resources that will not be held (consumed) by the standby holder.
- integer or string
Describes the amount of CPU that will not be held by the standby holder on nodes from this NodeGroup.
The value can be an absolute number of CPUs (for example, 2) or a value in millicores (for example, 1500m).
Pattern:
^[0-9]+m?$
- integer or string
Describes the amount of memory that will not be held by the standby holder on nodes from this NodeGroup.
The value can be an absolute number of bytes (for example, 128974848) or a fixed-point number with one of the memory suffixes: G, Gi, M, Mi.
Pattern:
^[0-9]+(\.[0-9]+)?(E|P|T|G|M|K|Ei|Pi|Ti|Gi|Mi|Ki)?$
- object
Containerd runtime parameters.
If used, cri.type must be set to Containerd.
- string
Container runtime type.
If not specified, the defaultCRI value from the initial cluster configuration is used (the cluster-configuration.yaml parameter from the d8-cluster-configuration secret in the kube-system namespace).
Note! Docker is deprecated.
Allowed values:
Docker, Containerd, NotManaged
- object
Disruptions settings for nodes.
Example:
disruptions:
  approvalMode: Automatic
  automatic:
    drainBeforeApproval: false
    windows:
      - from: '06:00'
        to: '08:00'
        days:
          - Tue
          - Sun
- string
The approval mode for disruptive updates:
  - Manual: disables automatic disruption approval; an alert is displayed if a disruption is needed. Caution! The master node group update mode must be Manual to avoid issues with draining.
  - Automatic: automatically approves disruption-involving updates.
  - RollingUpdate: a new node with the new settings is created, then the old node is deleted. Available only for cloud nodes.
If the RollingUpdate mode is not used, the node being updated is first drained, then updated (rebooted), and then put back into operation (uncordoned). Note that in this case, the cluster must have sufficient resources to accommodate the load while the node being updated is unavailable. In the RollingUpdate mode, the node is replaced by the updated node, i.e., an extra node appears in the cluster for the duration of the update. In cloud infrastructures, the RollingUpdate mode is convenient, for example, if there are no resources in the cluster to temporarily host the load from the node being updated.
Default:
"Automatic"
Allowed values:
Manual, Automatic, RollingUpdate
- boolean
Drain Pods from the nodes before approving disruption.
Caution! This setting is ignored (nodes will be approved without draining Pods):
  - for the nodeGroup master with a single node;
  - for a single ready node in a nodeGroup picked out for Deckhouse placement.
Default:
true
- array of strings
Days of the week when the node can be updated.
Examples:
days: Mon
days: Wed
- array of strings
Days of the week when the node can be updated.
Examples:
days: Mon
days: Wed
- object
Specification of some of the fields that will be maintained in all nodes of the group.
Example:
nodeTemplate:
  labels:
    environment: production
    app: warp-drive-ai
  annotations:
    ai.fleet.com/discombobulate: 'true'
  taints:
    - effect: NoExecute
      key: ship-class
      value: frigate
- object
Similar to the standard metadata.annotations field.
Example:
annotations:
  ai.fleet.com/discombobulate: 'true'
- object
Similar to the standard metadata.labels field.
Example:
labels:
  environment: production
  app: warp-drive-ai
- array of objects
Similar to the .spec.taints field of the Node object.
Caution! Only the effect, key, and value fields are available.
Example:
taints:
  - effect: NoExecute
    key: ship-class
    value: frigate
- string
Required value
The type of nodes this group provides:
  - Cloud: nodes for this group will be automatically created (and deleted) in the cloud of the specified cloud provider;
  - Static: a static node hosted on a bare metal or virtual machine. The cloud-controller-manager does not manage the node even if one of the cloud providers is enabled;
  - Hybrid: a static node (created manually or using any external tools) hosted in the cloud integrated with one of the cloud providers. This node has the CSI running, and it is managed by the cloud-controller-manager: the Node object automatically gets the information about the zone and region based on the cloud data; if a node gets deleted from the cloud, its corresponding Node object will be deleted in Kubernetes.
Allowed values:
Cloud, Static, Hybrid
Defines the runtime parameters of a node group.
- string
The time interval to use for the chaos monkey (can be specified in the Go format).
Default:
"6h"
Pattern:
^[0-9]+[mh]{1}$
- object
Parameter for provisioning the cloud-based VMs.
Caution! Can only be used together with nodeType: CloudEphemeral.
- object
Required value
The reference to the InstanceClass object. It is unique for each cloud-provider-* module.
- string
The object type (e.g., OpenStackInstanceClass). The object type is specified in the documentation of the corresponding cloud-provider-* module.
Allowed values:
OpenStackInstanceClass, GCPInstanceClass, VsphereInstanceClass, AWSInstanceClass, YandexInstanceClass, AzureInstanceClass, VCDInstanceClass, ZvirtInstanceClass, DynamixInstanceClass
- integer
Required value
The maximum number of instances for the group in each zone.
This value is used as the upper bound in cluster-autoscaler.
With a value of 0, you need to set capacity for some InstanceClass. See the description of the corresponding InstanceClass for details.
Allowed values:
0 <= X
- integer
The maximum number of unavailable instances (during rollout) in the group in each zone.
Default:
0
Allowed values:
0 <= X
- integer or string
The total number of overprovisioned nodes for this NodeGroup in all zones.
An overprovisioned node is a cluster node on which resources are reserved and kept available at any time for scaling. Such a node lets the cluster autoscaler place a load immediately instead of waiting for node initialization (which may take several minutes).
The value can be an absolute number (for example, 2) or a percentage of desired nodes (for example, 10%). If a percentage is specified, the absolute number is calculated as that percentage of the maximum number of nodes (the maxPerZone parameter), rounded down, but not less than one.
Pattern:
^[0-9]+%?$
- object
Amount of reserved resources.
Used to determine whether to order overprovisioned nodes.
- object
Describes the resources that will not be held (consumed) by the standby holder.
- integer or string
Describes the amount of CPU that will not be held by the standby holder on nodes from this NodeGroup.
The value can be an absolute number of CPUs (for example, 2) or a value in millicores (for example, 1500m).
Pattern:
^[0-9]+m?$
- integer or string
Describes the amount of memory that will not be held by the standby holder on nodes from this NodeGroup.
The value can be an absolute number of bytes (for example, 128974848) or a fixed-point number with one of the memory suffixes: G, Gi, M, Mi.
Pattern:
^[0-9]+(\.[0-9]+)?(E|P|T|G|M|K|Ei|Pi|Ti|Gi|Mi|Ki)?$
- object
Containerd runtime parameters.
If used, cri.type must be set to Containerd.
- string
Container runtime type.
If not specified, the defaultCRI value from the initial cluster configuration is used (the cluster-configuration.yaml parameter from the d8-cluster-configuration secret in the kube-system namespace).
Note! Docker is deprecated.
Allowed values:
Docker, Containerd, NotManaged
- object
Disruptions settings for nodes.
Example:
disruptions:
  approvalMode: Automatic
  automatic:
    drainBeforeApproval: false
    windows:
      - from: '06:00'
        to: '08:00'
        days:
          - Tue
          - Sun
- string
The approval mode for disruptive updates:
  - Manual: disables automatic disruption approval; an alert is displayed if a disruption is needed. Caution! The master node group update mode must be Manual to avoid issues with draining.
  - Automatic: automatically approves disruption-involving updates.
  - RollingUpdate: a new node with the new settings is created, then the old node is deleted. Available only for cloud nodes.
If the RollingUpdate mode is not used, the node being updated is first drained, then updated (rebooted), and then put back into operation (uncordoned). Note that in this case, the cluster must have sufficient resources to accommodate the load while the node being updated is unavailable. In the RollingUpdate mode, the node is replaced by the updated node, i.e., an extra node appears in the cluster for the duration of the update. In cloud infrastructures, the RollingUpdate mode is convenient, for example, if there are no resources in the cluster to temporarily host the load from the node being updated.
Default:
"Automatic"
Allowed values:
Manual, Automatic, RollingUpdate
- boolean
Drain Pods from the nodes before approving disruption.
Caution! This setting is ignored (nodes will be approved without draining Pods):
  - for the nodeGroup master with a single node;
  - for a single ready node in a nodeGroup picked out for Deckhouse placement.
Default:
true
- array of strings
Days of the week when the node can be updated.
Examples:
days: Mon
days: Wed
- array of strings
Days of the week when the node can be updated.
Examples:
days: Mon
days: Wed
- object
Docker settings for nodes.
If used, cri.type must be set to Docker.
Note! Docker is deprecated.
- string
The desired minor version of Kubernetes.
By default, it corresponds to the version selected for the cluster globally (see installation documentation) or to the current version of the control plane (if the global version is not defined).
Allowed values:
1.27, 1.28, 1.29, 1.30, 1.31
Example:
kubernetesVersion: '1.29'
- object
Specification of some of the fields that will be maintained in all nodes of the group.
Example:
nodeTemplate:
  labels:
    environment: production
    app: warp-drive-ai
  annotations:
    ai.fleet.com/discombobulate: 'true'
  taints:
    - effect: NoExecute
      key: ship-class
      value: frigate
- object
Similar to the standard metadata.annotations field.
Example:
annotations:
  ai.fleet.com/discombobulate: 'true'
- object
Similar to the standard metadata.labels field.
Example:
labels:
  environment: production
  app: warp-drive-ai
- array of objects
Similar to the .spec.taints field of the Node object.
Caution! Only the effect, key, and value fields are available.
Example:
taints:
  - effect: NoExecute
    key: ship-class
    value: frigate
- string
Required value
The type of nodes this group provides:
  - Cloud: nodes for this group will be automatically created (and deleted) in the cloud of the specified cloud provider;
  - Static: a static node hosted on a bare metal or virtual machine. The cloud-controller-manager does not manage the node even if one of the cloud providers is enabled;
  - Hybrid: a static node (created manually or using any external tools) hosted in the cloud integrated with one of the cloud providers. This node has the CSI running, and it is managed by the cloud-controller-manager: the Node object automatically gets the information about the zone and region based on the cloud data; if a node gets deleted from the cloud, its corresponding Node object will be deleted in Kubernetes.
Allowed values:
Cloud, Static, Hybrid
NodeGroupConfiguration
Scope: Cluster
Version: v1alpha1
Executes bash scripts on nodes.
Read more in the module documentation.
- array of strings
Required value
Bundles for step execution. You can set '*' to select all bundles.
See the list of possible bundles in the allowedBundles module parameter.
Examples:
bundles:
  - ubuntu-lts
  - centos-7
bundles:
  - ubuntu-lts
bundles:
  - "*"
- string
Required value
A bash script that does the same things you would do in a configuration step.
You can use Go Template to generate a script.
The list of parameters available for use in templates can be retrieved from the bashible-apiserver-context Secret as follows:
kubectl -n d8-cloud-instance-manager get secrets bashible-apiserver-context -o jsonpath='{.data.input\.yaml}' | base64 -d
For example:
{{- range .nodeUsers }}
echo 'Tuning environment for user {{ .name }}'
# Some code for tuning user environment
{{- end }}
You can also use the pre-defined bashbooster commands in the script. For example:
bb-event-on 'bb-package-installed' 'post-install'
post-install() {
  bb-log-info "Setting reboot flag due to kernel was updated"
  bb-flag-set reboot
}
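Put together, a complete resource might look like the sketch below. The bundles list and the templated script come from the descriptions above; the field names content and nodeGroups, as well as the resource name, are assumptions, since the parameter names are not shown in this section.

```yaml
apiVersion: deckhouse.io/v1alpha1
kind: NodeGroupConfiguration
metadata:
  name: tune-users.sh           # hypothetical name
spec:
  bundles:
    - "*"                       # run on all bundles
  nodeGroups:                   # assumed field: node groups the script applies to
    - "*"
  content: |                    # assumed field name for the bash script
    {{- range .nodeUsers }}
    echo 'Tuning environment for user {{ .name }}'
    {{- end }}
```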
NodeUser
Scope: Cluster
Defines the Linux users to create on all nodes.
The user’s home directory is created in the /home/deckhouse/ directory.
Deprecated resource. Support for the resource might be removed in a later release.
Defines the Linux users to create on all nodes.
The user’s home directory is created in the /home/deckhouse/ directory.
SSHCredentials
Scope: Cluster
Version: v1alpha1
Contains credentials required by Cluster API Provider Static (CAPS) to connect over SSH. CAPS connects to the server (virtual machine) defined in the StaticInstance custom resource to manage its state.
A reference to this resource is specified in the credentialsRef parameter of the StaticInstance resource.
- string
APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info…
- string
Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info…
StaticInstance
Scope: Cluster
Version: v1alpha1
StaticInstance describes a machine for the Cluster API Provider Static.
- string
APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info…
- string
Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info…