This page describes the architecture of the node-manager module for CloudPermanent nodes.
Module architecture
The following simplifications are made in the diagram:
- The diagram shows containers in different pods interacting directly with each other. In reality, they communicate via the corresponding Kubernetes Services (internal load balancers). Service names are omitted if they are obvious from the diagram context. Otherwise, the Service name is shown above the arrow.
- Pods may run multiple replicas. However, each pod is shown as a single replica in the diagram.
The Level 2 C4 architecture of the node-manager module and its interactions with other Deckhouse Kubernetes Platform (DKP) components are shown in the following diagram:

Module components
Bashible is a key component of the Cluster & Infrastructure subsystem that enables the operation of the node-manager module. However, it is not part of the module itself, as it runs at the OS level as a system service. For Bashible details, refer to the corresponding documentation section.
The module managing CloudPermanent nodes consists of the following components:
- Bashible-api-server: A Kubernetes Extension API Server deployed on master nodes. It generates bashible scripts from templates stored in custom resources. When kube-apiserver receives a request for resources containing bashible bundles, it forwards the request to bashible-api-server and returns the generated result. For more details about bashible and bashible-api-server, refer to the corresponding documentation section.
- Early-oom (DaemonSet): A pod deployed on every node. It reads resource load metrics from /proc and terminates pods under high load before kubelet does. Enabled by default, but can be disabled in the module configuration if it causes issues for normal node operation. Includes the following containers:
  - psi-monitor: Monitors the PSI (Pressure Stall Information) metric, which reflects how long processes wait for resources such as CPU, memory, or I/O.
  - kube-rbac-proxy: A sidecar container providing an RBAC-based authorization proxy for secure access to the early-oom metrics.
- Fencing-agent (DaemonSet): Deployed to a specific node group when the spec.fencing parameter of the NodeGroup custom resource is enabled. After startup, the agent activates the Watchdog timer and sets the label node-manager.deckhouse.io/fencing-enabled on the node. The agent periodically checks Kubernetes API availability. If the API is reachable, it sends a signal to the Watchdog, resetting the timer. It also monitors maintenance labels on the node and enables or disables the Watchdog accordingly. The Linux kernel softdog module is used as the Watchdog with the parameters soft_margin=60 and soft_panic=1. This means the timeout is 60 seconds. If the timeout expires, a kernel panic occurs, and the node remains in that state until it is manually rebooted. Consists of a single container:
  - fencing-agent: Performs the checks described above and writes to /dev/watchdog to signal the Watchdog.
- Fencing-controller: A controller that watches all nodes labeled with node-manager.deckhouse.io/fencing-enabled. If a node is unavailable for more than 60 seconds, the controller deletes all pods from that node and then removes the node itself.
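The request forwarding performed for bashible-api-server relies on the standard Kubernetes API aggregation layer: kube-apiserver proxies requests for a registered API group to the extension server. As a hedged illustration only — the group name, version, Service name, and namespace below are assumptions for the example, not values taken from the module — such a registration looks roughly like this:

```yaml
# Illustrative sketch of an aggregation-layer registration.
# All names below are hypothetical, not the module's actual values.
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1alpha1.bashible.example.io   # hypothetical group/version
spec:
  group: bashible.example.io           # hypothetical API group
  version: v1alpha1
  service:
    name: bashible-api-server          # hypothetical Service name
    namespace: d8-system               # hypothetical namespace
  groupPriorityMinimum: 1000
  versionPriority: 15
```

With such an APIService in place, kube-apiserver transparently forwards any request under the registered group to the named Service and relays the generated response back to the client.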
Module interactions
The module interacts with the following components:
- Kube-apiserver:
  - Manages Node resources.
  - Authorizes metric requests.
- Node filesystem:
  - /proc: Reads PSI metrics for OOM handling.
  - /dev/watchdog: Sends signals to reset the Watchdog timer.
The following external components interact with the module:
- Kube-apiserver:
  - Forwards requests for bashible resources to bashible-api-server.
- Prometheus-main:
  - Collects metrics from node-manager module components.
Architecture features specific to CloudPermanent nodes
- Nodes are persistent: they are created, managed, and deleted by the user. Node management is performed not directly in the infrastructure, but via the dhctl utility executed as part of the DKP installer.
- Terraform-manager is a module used for automated management of cloud infrastructure resources. It checks the Terraform state and applies non-destructive changes to infrastructure resources. The module architecture is described on the corresponding documentation page.
- Csi-driver is used to provision disks in the cloud infrastructure.
- Cloud-controller-manager is used to provision load balancers and other infrastructure resources according to its specification.
- Infrastructure-provider is not required. All node management operations are performed by the user via the dhctl utility and the terraform-manager module.
- Automatic node scaling is not supported.