How do I add a master node?

Static or hybrid cluster

Adding a master node to a static or hybrid cluster is no different from adding a regular node to the cluster. Use the corresponding instruction to do so. All the actions needed to configure the control plane components on the new master nodes are performed automatically. Wait until the master nodes reach Ready status.

Cloud cluster

Before adding nodes, make sure you have the necessary quota limits.

To add one or more master nodes to a cloud cluster, follow these steps:

  • Determine the Deckhouse version and edition used in the cluster by running the following command on the master node or a host with configured kubectl access to the cluster:
    kubectl -n d8-system get deployment deckhouse \
    -o jsonpath='version-{.metadata.annotations.core\.deckhouse\.io\/version}, edition-{.metadata.annotations.core\.deckhouse\.io\/edition}' \
    | tr '[:upper:]' '[:lower:]'
  • Run the corresponding version and edition of the Deckhouse installer:
    docker run --pull=always -it -v "$HOME/.ssh/:/tmp/.ssh/" \
    registry.deckhouse.io/deckhouse/<DECKHOUSE_EDITION>/install:<DECKHOUSE_VERSION> bash

    For example, if the Deckhouse version in the cluster is v1.28.0 and the Deckhouse edition is ee, the command to run the installer will be:

    docker run --pull=always -it -v "$HOME/.ssh/:/tmp/.ssh/" \
    registry.deckhouse.io/deckhouse/ee/install:v1.28.0 bash

    Change the container registry address if necessary (e.g., if you use an internal container registry).

  • Run the following command inside the installer container (use the --ssh-bastion-* parameters if using a bastion host):
    dhctl config edit provider-cluster-configuration --ssh-agent-private-keys=/tmp/.ssh/<SSH_KEY_FILENAME> --ssh-user=<USERNAME> \
    --ssh-host <SSH_HOST>
  • Specify the required number of master node replicas in the masterNodeGroup.replicas field and save the changes (see the illustrative fragment after this list).
  • Start the scaling process by running the following command (specify the appropriate cluster access parameters, as in the previous step):
    dhctl converge --ssh-agent-private-keys=/tmp/.ssh/<SSH_KEY_FILENAME> --ssh-user=<USERNAME> --ssh-host <SSH_HOST>
  • Answer Yes to the question Do you want to CHANGE objects state in the cloud?.
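
For illustration, after editing the masterNodeGroup.replicas field, the relevant fragment of the provider cluster configuration might look as follows (the surrounding fields depend on your cloud provider and are omitted here):

  masterNodeGroup:
    replicas: 3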

All other actions are performed automatically. Wait until the master nodes reach Ready status.
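
For example, you can watch the nodes until all masters report Ready (this label selector assumes the standard master role label):

  kubectl get nodes -l node-role.kubernetes.io/master -w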

How do I delete the master node?

  1. Does the deletion lead to the etcd cluster losing its quorum?
    • If the deletion does not lead to the etcd cluster losing its quorum:
      • If a virtual machine with a master node can be deleted (there are no other necessary services on it), then you can delete the virtual machine in the usual way.
      • If you can’t delete the master node right away (for example, it is used for backups or is involved in the deployment process), you have to stop the container runtime on the node. In the case of Docker:
         systemctl stop docker
         systemctl disable docker

        In the case of Containerd:

         systemctl stop containerd
         systemctl disable containerd
         kill $(ps ax | grep containerd-shim | grep -v grep | awk '{print $1}')
    • If the deletion may result in etcd losing its quorum (the 2 -> 1 migration), stop kubelet on the node (without stopping the etcd container):

       systemctl stop kubelet
       systemctl stop bashible.timer
       systemctl stop bashible
       systemctl disable kubelet
       systemctl disable bashible.timer
       systemctl disable bashible
  2. Delete the Node object from Kubernetes (see the example after this list).
  3. Wait until the etcd member is automatically deleted.
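
For example, step 2 for a master node named master-2 (the name here is a placeholder) would be:

  kubectl delete node master-2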

How do I dismiss the master role while keeping the node?

  1. Remove the node-role.kubernetes.io/master and node.deckhouse.io/group labels, then wait for the etcd member to be automatically deleted (see the example after this list).
  2. Exec to the node and run the following commands:
      rm -f /etc/kubernetes/manifests/{etcd,kube-apiserver,kube-scheduler,kube-controller-manager}.yaml
      rm -f /etc/kubernetes/{scheduler,controller-manager}.conf
      rm -f /etc/kubernetes/authorization-webhook-config.yaml
      rm -f /etc/kubernetes/admin.conf /root/.kube/config
      rm -rf /etc/kubernetes/deckhouse
      rm -rf /etc/kubernetes/pki/{ca.key,apiserver*,etcd/,front-proxy*,sa.*}
      rm -rf /var/lib/etcd
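
As a sketch, step 1 can be done with kubectl (assuming the labels named above; the node name is a placeholder, and the trailing dash removes a label):

  kubectl label node <NODE_NAME> node-role.kubernetes.io/master- node.deckhouse.io/group-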

How do I view the list of etcd members?

  1. Exec to the etcd Pod:
      kubectl -n kube-system exec -ti $(kubectl -n kube-system get pod -l component=etcd,tier=control-plane -o name | head -n1) -- sh
  2. Execute the command:
      ETCDCTL_API=3 etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key --endpoints https://127.0.0.1:2379/ member list
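
The output contains one line per member. An illustrative line (IDs, names, and addresses will differ in your cluster):

  65ae09fb1c61b820, started, master-0, https://10.0.0.1:2380, https://10.0.0.1:2379, false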

What if something went wrong?

The control-plane-manager saves backups to /etc/kubernetes/deckhouse/backup. They can be useful in diagnosing the issue.
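
For example, to see which backups are available on a master node:

  ls -la /etc/kubernetes/deckhouse/backup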

What if the etcd cluster fails?

  1. Stop etcd on all nodes except one by deleting the /etc/kubernetes/manifests/etcd.yaml file. This remaining node will serve as the starting point for the new multi-master cluster.
  2. On the remaining node, edit the etcd manifest /etc/kubernetes/manifests/etcd.yaml and add the --force-new-cluster parameter to spec.containers.command (see the illustrative fragment after this list).
  3. After the new cluster is ready, remove the --force-new-cluster parameter.
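
For illustration, the fragment of /etc/kubernetes/manifests/etcd.yaml edited in step 2 might look like this (the existing flags vary from cluster to cluster and are shown here only as placeholders):

  spec:
    containers:
    - name: etcd
      command:
      - etcd
      - --data-dir=/var/lib/etcd    # existing flags, kept as-is
      - --force-new-cluster         # added temporarily, removed in step 3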

Caution! This operation is unsafe and breaks the guarantees given by the consensus protocol. Note that it brings the cluster to the state that was saved on the node. Any pending entries will be lost.

How do I enable event auditing?

Kubernetes Auditing can help you if you need to keep track of operations or troubleshoot the cluster. You can configure it by setting the appropriate Audit Policy.

Currently, the following fixed log rotation parameters are in use:

  • The maximum disk space for logs is limited to 1000 MB.
  • Logs older than 7 days are also deleted.

Depending on the Policy settings and the number of requests to the apiserver, the amount of logs collected may be high; in some cases, logs may be kept for less than 30 minutes.

There must be some log scraper on the master nodes (filebeat, promtail) that monitors the log directory:

  /var/log/kube-audit/audit.log
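
A minimal filebeat input for this could look as follows (a sketch that assumes the audit log path above):

  filebeat.inputs:
  - type: log
    paths:
    - /var/log/kube-audit/audit.log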

Cautionary note

⚠️ Note that the current implementation of this feature isn’t safe and may lead to a temporary failure of the control-plane.

The apiserver will not be able to start if there are unsupported options or a typo in the secret.

If apiserver is unable to start, you have to manually disable the --audit-log-* parameters in the /etc/kubernetes/manifests/kube-apiserver.yaml manifest and restart apiserver using the following command:

docker stop $(docker ps | grep kube-apiserver- | awk '{print $1}')
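
If the node runs containerd instead of Docker, a similar restart can be performed with crictl (a sketch; kubelet will recreate the Pod from the manifest):

  crictl stopp $(crictl pods --name=kube-apiserver -q)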

After the restart, you will be able to fix the Secret or delete it.

Enabling and configuring

The following parameter in the d8-system/deckhouse ConfigMap enables the audit:

  controlPlaneManager: |
    auditPolicyEnabled: true
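
For example, the parameter can be set by editing the ConfigMap directly:

  kubectl -n d8-system edit configmap deckhouse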

The parameters are configured via the kube-system/audit-policy Secret, which must contain a Base64-encoded audit-policy.yaml file:

apiVersion: v1
kind: Secret
metadata:
  name: audit-policy
  namespace: kube-system
data:
  audit-policy.yaml: <base64>

An example

The minimum viable example of the audit-policy.yaml file looks as follows:

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
  omitStages:
  - RequestReceived

You can find detailed information about configuring the audit-policy.yaml file in the official Kubernetes documentation: https://kubernetes.io/docs/tasks/debug-application-cluster/audit/

Useful commands

Create a Secret from the file:

kubectl -n kube-system create secret generic audit-policy --from-file=./audit-policy.yaml

Delete a Secret from the cluster:

kubectl -n kube-system delete secret audit-policy

How do I speed up the restart of Pods if the connection to the node has been lost?

By default, a node is marked as unavailable if it does not report its state for 40 seconds. After another 5 minutes, its Pods will be rescheduled to other nodes. Thus, the overall application unavailability lasts approximately 6 minutes.

In specific cases, if an application cannot run in multiple instances, there is a way to lower its unavailability time:

  1. Reduce the period required for the node to become Unreachable if the connection to it is lost by setting the nodeMonitorGracePeriodSeconds parameter.
  2. Set a lower timeout for evicting Pods on a failed node using the failedNodePodEvictionTimeoutSeconds parameter.

An example:

  controlPlaneManager: |
    nodeMonitorGracePeriodSeconds: 10
    failedNodePodEvictionTimeoutSeconds: 50

In this case, if the connection to the node is lost, the applications will be restarted in about 1 minute.

Cautionary note

Both these parameters directly impact the CPU and memory resources consumed by the control plane. By lowering timeouts, we force system components to send statuses more frequently and check the resource state more often.

When deciding on the appropriate threshold values, consider the resources consumed by the control plane nodes (monitoring graphs can help you here). Note that the lower these parameters are set, the more resources you may need to allocate to these nodes.