Available in editions: CE, BE, SE, SE+, EE

The module lifecycle stage: General Availability

The control-plane-manager (CPM) module is responsible for managing the cluster’s control plane components. It runs on all master nodes of the cluster (nodes that have the node-role.kubernetes.io/control-plane: "" label).

The control-plane-manager:

  • Manages certificates required for the operation of the control plane (renews certificates and re-issues them in response to configuration changes, among other things). This feature allows the CPM to automatically maintain a secure control plane configuration and quickly add additional SANs for organizing secure access to the Kubernetes API.
  • Configures components. The CPM module automatically creates the required configs and manifests of the control plane components.
  • Upgrades/downgrades components. Makes sure that the versions of the components in the cluster are the same.
  • Manages the configuration of the etcd cluster and its members. The CPM module scales master nodes and migrates the cluster from single-master to multi-master (and vice versa).
  • Configures kubeconfig. The CPM module maintains up-to-date kubeconfig files on control-plane nodes. It generates, renews, and updates kubeconfigs for control-plane components and the admin kubeconfig (admin.conf). By default, it creates a symlink for the root user (/root/.kube/config -> admin.conf). When the user-authz module is enabled, the symlink can be turned off via the rootKubeconfigSymlink parameter in the control-plane-manager module (see FAQ). The CPM also hardens file permissions on admin.conf and super-admin.conf.
  • Extends scheduler functionality by integrating external plugins via webhooks. This is managed through the KubeSchedulerWebhookConfiguration resource and allows more complex logic to be used when scheduling workloads in the cluster. For example:
    • placing data storage application pods closer to the data itself,
    • state-based node prioritization (network load, storage subsystem status, etc.),
    • dividing nodes into zones, etc.
  • Monitors the control plane. The CPM module organizes secure metrics collection and provides a basic set of monitoring rules for the control plane components. For details, refer to Control plane monitoring.

Managing certificates

The CPM module manages certificates of the control-plane components, such as:

  • Server certificates for kube-apiserver & etcd. These are stored in the secret d8-pki of the namespace kube-system:
    • the root CA kubernetes certificate (ca.crt & ca.key),
    • the root CA etcd certificate (etcd/ca.crt & etcd/ca.key),
    • the RSA certificate and the key for signing Service Accounts (sa.pub & sa.key),
    • the root CA certificate for the extension API servers (front-proxy-ca.key & front-proxy-ca.crt).
  • Client certificates for connecting control-plane components to each other. The CPM module issues and renews them, and re-issues them if something has changed (e.g., the SAN list). These certificates are stored on the nodes only:
    • The server-side API server certificate (apiserver.crt & apiserver.key).
    • The client-side certificate for connecting kube-apiserver to kubelet (apiserver-kubelet-client.crt & apiserver-kubelet-client.key).
    • The client-side certificate for connecting kube-apiserver to etcd (apiserver-etcd-client.crt & apiserver-etcd-client.key).
    • The client-side certificate for connecting kube-apiserver to the extension API servers (front-proxy-client.crt & front-proxy-client.key).
    • The server-side etcd certificate (etcd/server.crt & etcd/server.key).
    • The client-side certificate for connecting etcd to other cluster members (etcd/peer.crt & etcd/peer.key).
    • The client-side certificate for connecting kubelet to etcd for performing health-checks (etcd/healthcheck-client.crt & etcd/healthcheck-client.key).

Also, the CPM module lets you add additional SANs to certificates (this way, you can quickly and effortlessly add more “entry points” to the Kubernetes API).

The CPM module automatically updates the kubeconfig configuration when certificates are changed.
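The re-issue trigger described above (a change in the SAN list) can be pictured as a comparison between the SANs already present in a certificate and the SANs that are currently desired. The following is only an illustrative sketch: the function names san_diff and needs_reissue are hypothetical and are not part of the module, and the real decision logic may take other factors into account (for example, certificate expiry).

```python
from typing import Iterable, List, Tuple


def san_diff(current: Iterable[str], desired: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (missing, extra) SANs, comparing the certificate with the desired list.

    Hypothetical helper: "missing" SANs are desired but absent from the certificate,
    "extra" SANs are present in the certificate but no longer desired.
    """
    current_set, desired_set = set(current), set(desired)
    missing = sorted(desired_set - current_set)
    extra = sorted(current_set - desired_set)
    return missing, extra


def needs_reissue(current: Iterable[str], desired: Iterable[str]) -> bool:
    """Assumption for this sketch: any difference in the SAN set triggers a re-issue."""
    missing, extra = san_diff(current, desired)
    return bool(missing or extra)


if __name__ == "__main__":
    in_cert = ["kubernetes", "10.0.0.1"]
    wanted = ["kubernetes", "10.0.0.1", "api.example.com"]
    print(san_diff(in_cert, wanted), needs_reissue(in_cert, wanted))
```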

Scaling

The CPM module supports running the control plane in single-master or multi-master mode.

In the single-master mode:

  • kube-apiserver only uses the etcd instance that is hosted on the same node;
  • A proxy server that responds on localhost is configured on the node; kube-apiserver responds on the IP address of the master node.

In the multi-master mode, control plane components are automatically deployed in a fault-tolerant manner:

  • kube-apiserver is configured to work with all etcd instances;
  • An additional proxy server that processes localhost requests is set up on each master node. By default, the proxy server sends requests to the local kube-apiserver instance. If it is unavailable, the proxy tries to connect to other kube-apiserver instances.
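The fallback behaviour of the local proxy in multi-master mode can be summarised as "local instance first, then the others". The sketch below only illustrates that selection order; pick_apiserver and the is_available callback are hypothetical names, and the actual proxy is not implemented this way.

```python
from typing import Callable, Iterable, Optional


def pick_apiserver(
    local: str,
    others: Iterable[str],
    is_available: Callable[[str], bool],
) -> Optional[str]:
    """Choose a kube-apiserver endpoint: prefer the local one, fall back to the rest.

    is_available is a hypothetical health probe supplied by the caller.
    Returns None when no endpoint is reachable.
    """
    if is_available(local):
        return local
    for endpoint in others:
        if is_available(endpoint):
            return endpoint
    return None


if __name__ == "__main__":
    # Made-up addresses: the local instance is down, one remote instance is up.
    healthy = {"10.0.0.2:6443"}
    print(pick_apiserver("10.0.0.1:6443", ["10.0.0.2:6443", "10.0.0.3:6443"], healthy.__contains__))
```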

Scaling master nodes

The control-plane nodes are scaled automatically using the node-role.kubernetes.io/control-plane="" label:

  • Attaching the node-role.kubernetes.io/control-plane="" label to a node results in deploying control plane components on this node, connecting the new etcd node to the etcd cluster, and regenerating all the necessary certificates and config files.
  • Removing the node-role.kubernetes.io/control-plane="" label results in deleting all control plane components on a node, gracefully removing it from the etcd cluster, and regenerating all the necessary config files and certificates.

Manual etcd actions are required when decreasing the number of master nodes from two to one. In all other cases, the necessary actions are performed automatically. Note that scaling down from any number of master nodes to one always ends with the two-to-one step, so it requires manual actions as well.
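The rule above can be expressed as a small check. This is an illustrative sketch only; requires_manual_etcd_action is a hypothetical name, not something the module provides.

```python
def requires_manual_etcd_action(current_masters: int, target_masters: int) -> bool:
    """Return True when a scaling operation passes through the two-to-one step.

    Any scale-down that ends with a single master node includes the step
    from two nodes to one, which is the case that needs manual etcd actions.
    """
    if current_masters < 1 or target_masters < 1:
        raise ValueError("a cluster needs at least one master node")
    return current_masters >= 2 and target_masters == 1


if __name__ == "__main__":
    for current, target in [(3, 1), (3, 2), (1, 3), (2, 1)]:
        print(current, "->", target, requires_manual_etcd_action(current, target))
```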

Dynamic terminated pod garbage collection threshold

The CPM module automatically sets --terminated-pod-gc-threshold based on the cluster size:

  • Small clusters (fewer than 100 nodes): 1000 terminated Pods.
  • Medium clusters (from 100 to 300 nodes): 3000 terminated Pods.
  • Large clusters (from 300 nodes): 6000 terminated Pods.

Note. This feature only takes effect in environments where the --terminated-pod-gc-threshold parameter is configurable. On managed Kubernetes services (such as EKS, GKE, AKS), this setting is controlled by the provider.
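The size tiers listed above map onto a simple lookup. The sketch below is an illustration, not the module's code: the function name is hypothetical, and because the text lists 300 nodes in both the medium and the large tier, the sketch assumes that a cluster with exactly 300 nodes is treated as large.

```python
def terminated_pod_gc_threshold(node_count: int) -> int:
    """Map a cluster size to a --terminated-pod-gc-threshold value.

    Assumption: exactly 300 nodes counts as a large cluster
    (the documented ranges overlap at that boundary).
    """
    if node_count < 0:
        raise ValueError("node_count must not be negative")
    if node_count < 100:
        return 1000  # small cluster
    if node_count < 300:
        return 3000  # medium cluster
    return 6000      # large cluster


if __name__ == "__main__":
    for nodes in (10, 100, 250, 500):
        print(nodes, terminated_pod_gc_threshold(nodes))
```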

Configuring Control Plane Resource Requests

The module allows you to configure the total CPU and memory resource requests for control plane components on each master node: kube-apiserver, etcd, kube-controller-manager, and kube-scheduler.

Use the resourcesRequests parameter in the control-plane-manager ModuleConfig:

apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: control-plane-manager
spec:
  version: 3
  enabled: true
  settings:
    resourcesRequests:
      cpu: 1000m
      memory: 500Mi

The specified values are used as a common requests budget for control plane components on each master node. Deckhouse Kubernetes Platform (DKP) distributes this budget between control plane static pods when rendering their manifests.

These settings do not apply if the cluster control plane is managed by a cloud provider, for example in GKE, AKS, or EKS.

Version control

Patch versions of control plane components (i.e. within the minor version, for example, from 1.31.13 to 1.31.14) are upgraded automatically together with the DKP version updates. You can’t manage patch version upgrades.

Upgrading minor versions of control plane components (e.g. from 1.32.* to 1.33.*) can be managed using the kubernetesVersion parameter. It specifies the automatic update mode (if set to Automatic) or the desired minor version of the control plane. The default control plane version (to use with kubernetesVersion: Automatic) as well as a list of supported Kubernetes versions can be found in the documentation.

The control plane upgrade is performed in a safe way for both single-master and multi-master clusters. The API server may be temporarily unavailable during the upgrade. This does not affect the operation of applications in the cluster, so the upgrade can be performed without scheduling a maintenance window.

If the target version (set with the kubernetesVersion parameter) does not match the current control plane version in the cluster, a smart strategy for changing component versions is applied:

  • General remarks
    • Updating in different NodeGroups is performed in parallel. Within each NodeGroup, nodes are updated sequentially, one at a time.
  • When upgrading:
    • Upgrades are carried out sequentially, one minor version at a time: 1.32 -> 1.33, 1.33 -> 1.34, 1.34 -> 1.35.
    • At each step, the control plane version is upgraded first, followed by kubelet upgrades on the cluster nodes.
  • When downgrading:
    • Successful downgrading is only guaranteed for a single version down from the maximum minor version of the control plane ever used in the cluster.
    • kubelets on the cluster nodes are downgraded first, followed by the control plane components.
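The version-change rules above (one minor version per upgrade step, and a guaranteed downgrade only one minor version below the highest version ever used) can be sketched as follows. The helper names are hypothetical and the code only models the ordering of steps; it does not perform any upgrade and does not reflect how DKP implements the strategy.

```python
from typing import List, Tuple


def parse_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '1.33' or '1.33.4'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def upgrade_steps(current: str, target: str) -> List[str]:
    """List the minor versions visited when upgrading from current to target."""
    cur_major, cur_minor = parse_minor(current)
    tgt_major, tgt_minor = parse_minor(target)
    if cur_major != tgt_major:
        raise ValueError("major version changes are out of scope for this sketch")
    if tgt_minor < cur_minor:
        raise ValueError("target is older than current: this is a downgrade")
    return [f"{cur_major}.{minor}" for minor in range(cur_minor + 1, tgt_minor + 1)]


def is_downgrade_guaranteed(max_version_ever_used: str, target: str) -> bool:
    """True only when target is exactly one minor version below the maximum ever used."""
    max_major, max_minor = parse_minor(max_version_ever_used)
    tgt_major, tgt_minor = parse_minor(target)
    return max_major == tgt_major and max_minor - tgt_minor == 1


if __name__ == "__main__":
    print(upgrade_steps("1.32", "1.35"))
    print(is_downgrade_guaranteed("1.34", "1.33"), is_downgrade_guaranteed("1.34", "1.32"))
```

At each step returned by upgrade_steps, the text above implies the control plane is upgraded first and the kubelets follow; for a downgrade the order is reversed.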

Exposing the Kubernetes API

The kube-apiserver component is only available within the internal cluster network by default. This module provides simple and secure access to the Kubernetes API from outside the cluster.

Via Ingress

By configuring the apiserver.publishAPI.ingress parameters, you can publish the API server on a dedicated domain (for more details, see the section on service domains in the documentation).

During configuration, you can specify:

  • The list of network addresses and subnets from which connections are allowed;
  • The Ingress controller on which the publication occurs;
  • Whether to use a manually provided TLS certificate, one obtained via cert-manager, or an automatic self-signed certificate.

By default, a special CA certificate will be generated and a kubeconfig generator will be automatically configured.

Via a Service of type LoadBalancer

By configuring the apiserver.publishAPI.loadBalancer parameters, you can create a service of type LoadBalancer named kube-system/d8-control-plane-apiserver.

During configuration, you can specify:

  • The list of network addresses and subnets from which connections are allowed;
  • The external port of the service;
  • Annotations on the service for load balancer provider settings.

Exposing the Kubernetes API over Ingress

The kube-apiserver component (without advanced configuration) is only accessible in the internal cluster network. This module enables easy and secure access to the Kubernetes API from outside the cluster. The API server is exposed on a dedicated domain (for more details, see the section on service domains in the documentation).

When configuring, you can:

  • list the network addresses from which connections are allowed;
  • list the groups that are allowed to access the API server;
  • specify the Ingress controller used for authentication.

By default, a special CA certificate will be generated and the kubeconfig generator will be automatically configured.

Auditing

Kubernetes Auditing can help you keep track of operations in your namespaces or troubleshoot the cluster. You can configure it by setting the appropriate Audit Policy. As a result, the log file /var/log/kube-audit/audit.log will contain audit events according to the configured policy.

By default, in a cluster with DKP, a basic policy is created for logging events:

  • related to the creation, deletion, and changing of resources;
  • performed on behalf of ServiceAccounts from the system namespaces (kube-system, d8-*);
  • performed on resources in the system namespaces (kube-system, d8-*).

The basic policy can be disabled by setting the basicAuditPolicyEnabled flag to false.

When OIDC authentication is configured, additional user information is included in audit logs under the user.extra field:

  • user-authn.deckhouse.io/name — user’s display name
  • user-authn.deckhouse.io/preferred_username — user’s preferred username
  • user-authn.deckhouse.io/dex-provider — Dex provider identifier (requires federated:id scope)
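As an illustration of how these fields could be read back, the sketch below parses a single JSON-formatted audit event and collects the three keys listed above from user.extra. It assumes the audit log is written as one JSON event per line and that values in user.extra are lists of strings, as in the Kubernetes audit Event format; the function name and the sample event are made up for this example.

```python
import json
from typing import Dict

# The keys listed above.
OIDC_EXTRA_KEYS = (
    "user-authn.deckhouse.io/name",
    "user-authn.deckhouse.io/preferred_username",
    "user-authn.deckhouse.io/dex-provider",
)


def oidc_user_info(audit_line: str) -> Dict[str, str]:
    """Return the OIDC-related user.extra values found in one audit event.

    Keys that are absent from the event are simply omitted from the result.
    """
    event = json.loads(audit_line)
    extra = event.get("user", {}).get("extra", {})
    return {key: extra[key][0] for key in OIDC_EXTRA_KEYS if extra.get(key)}


if __name__ == "__main__":
    # Made-up sample event, trimmed to the fields used here.
    sample = json.dumps({
        "verb": "get",
        "user": {
            "username": "jane@example.com",
            "extra": {"user-authn.deckhouse.io/name": ["Jane Doe"]},
        },
    })
    print(oidc_user_info(sample))
```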

For instructions on setting up policies, see the corresponding FAQ section.

Control plane monitoring

The module organizes secure metrics collection and provides a basic set of monitoring rules for the following cluster components:

  • kube-apiserver;
  • kube-controller-manager;
  • kube-scheduler;
  • kube-etcd.

Feature Gates

You can configure feature gates using the enabledFeatureGates parameter of the control-plane-manager ModuleConfig.

Changing the list of feature gates causes a restart of the corresponding component (for example, kube-apiserver, kube-scheduler, kube-controller-manager, kubelet).

The following example enables the ComponentFlagz and ComponentStatusz feature gates:

apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: control-plane-manager
spec:
  version: 3
  enabled: true
  settings:
    enabledFeatureGates:
      - ComponentFlagz
      - ComponentStatusz

If a feature gate is not supported or is deprecated, the monitoring system generates the D8ProblematicFeatureGateInUse alert indicating that the feature gate will not be applied.

The Kubernetes version update (controlled by the kubernetesVersion parameter) will not occur if the list of enabled feature gates for the new version of Kubernetes includes deprecated feature gates.

More information about feature gates is available in the Kubernetes documentation.
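Before changing kubernetesVersion, it can be useful to compare the enabled feature gates with the list available for the target version. The sketch below is a simplified illustration: it copies only a few entries from the per-version lists in this section, the function name is hypothetical, and it treats "absent from the target version's list" as the blocking condition, which is an assumption and may differ from the check DKP actually performs.

```python
from typing import Dict, Iterable, List, Set

# A small excerpt of the per-version lists from this section (not the full lists).
AVAILABLE_FEATURE_GATES: Dict[str, Set[str]] = {
    "1.35": {"ComponentFlagz", "ComponentStatusz", "HPAScaleToZero", "MemoryQoS"},
    "1.36": {"HPAScaleToZero", "MemoryQoS", "NativeHistograms"},
}


def problematic_gates(enabled: Iterable[str], target_version: str) -> List[str]:
    """Return enabled feature gates that are not listed for the target version."""
    available = AVAILABLE_FEATURE_GATES[target_version]
    return sorted(set(enabled) - available)


if __name__ == "__main__":
    enabled = ["ComponentFlagz", "ComponentStatusz", "MemoryQoS"]
    for version in ("1.35", "1.36"):
        print(version, problematic_gates(enabled, version))
```

An empty result means that, under the assumption above, the enabled gates would not block the update to that version.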

List of available feature gates:

Kubernetes 1.31

  • AuthorizeNodeWithSelectors
  • AuthorizeWithSelectors
  • ConcurrentWatchObjectDecode
  • CrossNamespaceVolumeDataSource
  • DisableAllocatorDualWrite
  • HPAScaleToZero
  • ImageVolume
  • InPlacePodVerticalScaling
  • JobManagedBy
  • KubeletPodResourcesGet
  • MaxUnavailableStatefulSet
  • MemoryQoS
  • MutatingAdmissionPolicy
  • OrderedNamespaceDeletion
  • ProcMountType
  • RecoverVolumeExpansionFailure
  • RelaxedEnvironmentVariableValidation
  • ResourceHealthStatus
  • SELinuxMount
  • SchedulerQueueingHints
  • StrictCostEnforcementForVAP
  • StrictCostEnforcementForWebhooks

Kubernetes 1.32

  • AllowUnsafeMalformedObjectDeletion
  • ComponentFlagz
  • ComponentStatusz
  • ConcurrentWatchObjectDecode
  • CrossNamespaceVolumeDataSource
  • DisableAllocatorDualWrite
  • HPAScaleToZero
  • ImageVolume
  • InPlacePodVerticalScaling
  • InPlacePodVerticalScalingAllocatedStatus
  • KubeletPodResourcesGet
  • MaxUnavailableStatefulSet
  • MemoryQoS
  • MutatingAdmissionPolicy
  • OrderedNamespaceDeletion
  • PodLifecycleSleepActionAllowZero
  • PodLogsQuerySplitStreams
  • ProcMountType
  • RelaxedDNSSearchValidation
  • ResourceHealthStatus
  • SELinuxChangePolicy
  • SELinuxMount
  • SchedulerAsyncPreemption

Kubernetes 1.33

  • AllowParsingUserUIDFromCertAuth
  • AllowUnsafeMalformedObjectDeletion
  • ComponentFlagz
  • ComponentStatusz
  • ConcurrentWatchObjectDecode
  • ContainerStopSignals
  • CrossNamespaceVolumeDataSource
  • DeploymentReplicaSetTerminatingReplicas
  • DisableAllocatorDualWrite
  • HPAConfigurableTolerance
  • HPAScaleToZero
  • ImageVolume
  • KubeletPSI
  • KubeletPodResourcesGet
  • ListFromCacheSnapshot
  • MaxUnavailableStatefulSet
  • MemoryQoS
  • MutatingAdmissionPolicy
  • PodLogsQuerySplitStreams
  • PodObservedGenerationTracking
  • PreferSameTrafficDistribution
  • ResourceHealthStatus
  • SELinuxMount
  • StrictIPCIDRValidation

Kubernetes 1.34

  • AllowUnsafeMalformedObjectDeletion
  • ComponentFlagz
  • ComponentStatusz
  • ConcurrentWatchObjectDecode
  • ContainerRestartRules
  • ContainerStopSignals
  • CrossNamespaceVolumeDataSource
  • DeploymentReplicaSetTerminatingReplicas
  • EnvFiles
  • HPAConfigurableTolerance
  • HPAScaleToZero
  • HostnameOverride
  • ImageVolume
  • MaxUnavailableStatefulSet
  • MemoryQoS
  • MutatingAdmissionPolicy
  • PodLogsQuerySplitStreams
  • ResourceHealthStatus
  • SELinuxMount
  • StrictIPCIDRValidation

Kubernetes 1.35

  • AllowUnsafeMalformedObjectDeletion
  • CRDObservedGenerationTracking
  • ComponentFlagz
  • ComponentStatusz
  • ConcurrentWatchObjectDecode
  • ConstrainedImpersonation
  • ContainerStopSignals
  • CrossNamespaceVolumeDataSource
  • HPAScaleToZero
  • MemoryQoS
  • MutablePVNodeAffinity
  • MutablePodResourcesForSuspendedJobs
  • MutableSchedulingDirectivesForSuspendedJobs
  • MutatingAdmissionPolicy
  • NodeDeclaredFeatures
  • PodLogsQuerySplitStreams
  • ResourceHealthStatus
  • RestartAllContainersOnContainerExits
  • SELinuxMount
  • StrictIPCIDRValidation
  • TaintTolerationComparisonOperators

Kubernetes 1.36

  • AllowUnsafeMalformedObjectDeletion
  • CRDObservedGenerationTracking
  • CRIListStreaming
  • ConcurrentWatchObjectDecode
  • ContainerStopSignals
  • ControllerManagerReleaseLeaderElectionLockOnExit
  • CrossNamespaceVolumeDataSource
  • HPAScaleToZero
  • MemoryQoS
  • MutablePVNodeAffinity
  • NativeHistograms
  • PersistentVolumeClaimUnusedSinceTime
  • PodLogsQuerySplitStreams
  • SELinuxMount
  • ShardedListAndWatch
  • TaintTolerationComparisonOperators

Protecting sensitive fields in custom resources

The CRDSensitiveData feature gate provides field-level protection for sensitive data in resources marked with the x-kubernetes-sensitive-data: true annotation. This feature is implemented as a patch to kube-apiserver (apiextensions-apiserver) and is supported starting from Kubernetes version 1.31.

The x-kubernetes-sensitive-data marker is validated by kube-apiserver when applying a resource:

  • The marker requires the CRDSensitiveData feature gate to be enabled. It is enabled by default and shouldn’t be specified manually.
  • The marker can’t be set on the root of the schema (the openAPIV3Schema node). To protect all fields of a resource, add the marker to the spec property (or a subtree below it), not to the schema root: the root also includes system fields (apiVersion, kind, metadata), which cannot be encrypted.
  • The field type must be one of the OpenAPI v3 types: string, integer, number, boolean, object, or array. Applying the marker to an object or array makes the entire subtree sensitive.
  • Fields defined with x-kubernetes-int-or-string: true are supported.
  • The marker is not allowed inside anyOf, oneOf, allOf, or not branches (this is enforced by the structural schema validator).

If at least one field in the resource schema is marked with x-kubernetes-sensitive-data: true, the following protection mechanisms are applied to all custom resources of this type:

  • Encryption in etcd: The entire resource is encrypted using the same mechanism as Kubernetes Secrets. Requires enabling the apiserver.encryptionEnabled parameter.
  • RBAC-based field filtering: For get, list, and watch requests, sensitive fields are removed from API responses if the caller does not have the corresponding permissions on the <resource>/sensitive subresource.
  • Audit log masking: Values of sensitive fields are replaced with "******" in audit logs, regardless of RBAC permissions and audit level.
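The audit log masking behaviour can be pictured with a small, self-contained sketch. It is not the kube-apiserver patch itself: mask_sensitive_fields is a hypothetical helper, and the field paths are passed in explicitly here, whereas the real implementation derives them from the x-kubernetes-sensitive-data markers in the CRD schema.

```python
import copy
from typing import Any, Dict, Iterable, Sequence

MASK = "******"


def mask_sensitive_fields(
    resource: Dict[str, Any],
    sensitive_paths: Iterable[Sequence[str]],
) -> Dict[str, Any]:
    """Return a copy of the resource with values at the given paths replaced by MASK.

    Each path is a sequence of keys, for example ("spec", "password").
    Masking an object-valued field hides its whole subtree. Paths that do
    not exist in the resource are ignored. The input is left unmodified.
    """
    masked = copy.deepcopy(resource)
    for path in sensitive_paths:
        *parents, leaf = path
        node: Any = masked
        for key in parents:
            node = node.get(key)
            if not isinstance(node, dict):
                break  # the path does not exist in this resource
        else:
            if leaf in node:
                node[leaf] = MASK
    return masked


if __name__ == "__main__":
    # Made-up custom resource used only for illustration.
    obj = {"kind": "Database", "spec": {"host": "db.local", "password": "s3cret"}}
    print(mask_sensitive_fields(obj, [("spec", "password")]))
```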

To also encrypt the protected resources in etcd, set the apiserver.encryptionEnabled parameter to true. The CRDSensitiveData feature gate is enabled by default and shouldn’t be specified manually:

apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: control-plane-manager
spec:
  version: 3
  enabled: true
  settings:
    apiserver:
      encryptionEnabled: true

Enabling encryptionEnabled is irreversible and triggers a kube-apiserver restart.

For details, see the following sections:

  • FAQ: Instructions for enabling sensitive data protection.
  • Examples: Configuration examples and results.

Third-party components

List of third-party software used in the control-plane-manager module:

  • Kubernetes

    License: Apache License 2.0

    An open source system for managing containerized applications across multiple hosts.

  • Etcd 3.6.10

    License: Apache License 2.0

    A distributed reliable key-value store for the most critical data of a distributed system.