How do I add a master node to a static or hybrid cluster?
It is important to have an odd number of masters to ensure a quorum.
Adding a master node to a static or hybrid cluster is no different from adding a regular node to the cluster; follow the corresponding instructions. All the actions required to configure the cluster control plane components on the new master nodes are performed automatically. Wait until the master nodes reach the `Ready` status.
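A hedged way to check this (assuming the master nodes carry the standard `node-role.kubernetes.io/control-plane` label):

```shell
kubectl get nodes -l node-role.kubernetes.io/control-plane
```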
How do I add master nodes to a cloud cluster (single-master to multi-master)?
Before adding nodes, ensure you have the required quotas in the cloud provider.
It is important to have an odd number of masters to ensure a quorum.
- Run the appropriate edition and version of the Deckhouse installer container on the local machine (change the container registry address if necessary):

  ```shell
  DH_VERSION=$(kubectl -n d8-system get deployment deckhouse -o jsonpath='{.metadata.annotations.core\.deckhouse\.io\/version}')
  DH_EDITION=$(kubectl -n d8-system get deployment deckhouse -o jsonpath='{.metadata.annotations.core\.deckhouse\.io\/edition}')
  docker run --pull=always -it -v "$HOME/.ssh/:/tmp/.ssh/" \
    registry.deckhouse.io/deckhouse/${DH_EDITION}/install:${DH_VERSION} bash
  ```
- In the installer container, run the following command and specify the required number of replicas using the `masterNodeGroup.replicas` parameter:

  ```shell
  dhctl config edit provider-cluster-configuration --ssh-agent-private-keys=/tmp/.ssh/<SSH_KEY_FILENAME> \
    --ssh-user=<USERNAME> --ssh-host <MASTER-NODE-0-HOST>
  ```
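  For illustration, the relevant fragment of the edited configuration might look like this (a hedged sketch; the surrounding fields depend on your cloud provider):

  ```yaml
  masterNodeGroup:
    replicas: 3
  ```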
- In the installer container, run the following command to start scaling:

  ```shell
  dhctl converge --ssh-agent-private-keys=/tmp/.ssh/<SSH_KEY_FILENAME> --ssh-user=<USERNAME> --ssh-host <MASTER-NODE-0-HOST>
  ```
- Wait until the required number of master nodes are `Ready` and all `control-plane-manager` instances are up and running:

  ```shell
  kubectl -n kube-system wait pod --timeout=10m --for=condition=ContainersReady -l app=d8-control-plane-manager
  ```
How do I reduce the number of master nodes in a cloud cluster (multi-master to single-master)?
- Run the appropriate edition and version of the Deckhouse installer container on the local machine (change the container registry address if necessary):

  ```shell
  DH_VERSION=$(kubectl -n d8-system get deployment deckhouse -o jsonpath='{.metadata.annotations.core\.deckhouse\.io\/version}')
  DH_EDITION=$(kubectl -n d8-system get deployment deckhouse -o jsonpath='{.metadata.annotations.core\.deckhouse\.io\/edition}')
  docker run --pull=always -it -v "$HOME/.ssh/:/tmp/.ssh/" \
    registry.deckhouse.io/deckhouse/${DH_EDITION}/install:${DH_VERSION} bash
  ```
- In the installer container, run the following command and set `masterNodeGroup.replicas` to `1`:

  ```shell
  dhctl config edit provider-cluster-configuration --ssh-agent-private-keys=/tmp/.ssh/<SSH_KEY_FILENAME> \
    --ssh-user=<USERNAME> --ssh-host <MASTER-NODE-0-HOST>
  ```
- Remove the following labels from the master nodes to be deleted:
  - `node-role.kubernetes.io/control-plane`
  - `node-role.kubernetes.io/master`
  - `node.deckhouse.io/group`

  Use the following command to remove the labels:

  ```shell
  kubectl label node <MASTER-NODE-N-NAME> \
    node-role.kubernetes.io/control-plane- node-role.kubernetes.io/master- node.deckhouse.io/group-
  ```
- Make sure that the master nodes to be deleted are no longer listed as etcd cluster members:

  ```shell
  kubectl -n kube-system exec -ti $(kubectl -n kube-system get pod -l component=etcd,tier=control-plane -o name | head -n1) -- sh -c \
    "ETCDCTL_API=3 etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt \
    --cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key \
    --endpoints https://127.0.0.1:2379/ member list -w table"
  ```
- Drain the nodes being deleted:

  ```shell
  kubectl drain <MASTER-NODE-N-NAME> --ignore-daemonsets --delete-emptydir-data
  ```
- Shut down the virtual machines corresponding to the nodes to be deleted, then remove those node instances and their attached disks (`kubernetes-data-master-<N>`) from the cloud.
- In the cluster, delete the Pods running on the nodes being deleted:

  ```shell
  kubectl delete pods --all-namespaces --field-selector spec.nodeName=<MASTER-NODE-N-NAME> --force
  ```
- In the cluster, delete the Node objects associated with the nodes being deleted:

  ```shell
  kubectl delete node <MASTER-NODE-N-NAME>
  ```
- In the installer container, run the following command to start scaling:

  ```shell
  dhctl converge --ssh-agent-private-keys=/tmp/.ssh/<SSH_KEY_FILENAME> --ssh-user=<USERNAME> --ssh-host <MASTER-NODE-0-HOST>
  ```
How do I dismiss the master role while keeping the node?
- Remove the `node.deckhouse.io/group: master` and `node-role.kubernetes.io/control-plane: ""` labels.
- Make sure that the master node to be deleted is no longer listed as a member of the etcd cluster:

  ```shell
  kubectl -n kube-system exec -ti $(kubectl -n kube-system get pod -l component=etcd,tier=control-plane -o name | head -n1) -- sh -c \
    "ETCDCTL_API=3 etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt \
    --cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key \
    --endpoints https://127.0.0.1:2379/ member list -w table"
  ```
- Exec to the node and run the following commands:

  ```shell
  rm -f /etc/kubernetes/manifests/{etcd,kube-apiserver,kube-scheduler,kube-controller-manager}.yaml
  rm -f /etc/kubernetes/{scheduler,controller-manager}.conf
  rm -f /etc/kubernetes/authorization-webhook-config.yaml
  rm -f /etc/kubernetes/admin.conf /root/.kube/config
  rm -rf /etc/kubernetes/deckhouse
  rm -rf /etc/kubernetes/pki/{ca.key,apiserver*,etcd/,front-proxy*,sa.*}
  rm -rf /var/lib/etcd/member/
  ```
How do I switch to a different OS image in a multi-master cluster?
- Run the appropriate edition and version of the Deckhouse installer container on the local machine (change the container registry address if necessary):

  ```shell
  DH_VERSION=$(kubectl -n d8-system get deployment deckhouse -o jsonpath='{.metadata.annotations.core\.deckhouse\.io\/version}')
  DH_EDITION=$(kubectl -n d8-system get deployment deckhouse -o jsonpath='{.metadata.annotations.core\.deckhouse\.io\/edition}')
  docker run --pull=always -it -v "$HOME/.ssh/:/tmp/.ssh/" \
    registry.deckhouse.io/deckhouse/${DH_EDITION}/install:${DH_VERSION} bash
  ```
- In the installer container, run the following command and specify the required OS image using the `masterNodeGroup.instanceClass` parameter (specify the addresses of all master nodes using the `--ssh-host` parameter):

  ```shell
  dhctl config edit provider-cluster-configuration --ssh-agent-private-keys=/tmp/.ssh/<SSH_KEY_FILENAME> --ssh-user=<USERNAME> \
    --ssh-host <MASTER-NODE-0-HOST> --ssh-host <MASTER-NODE-1-HOST> --ssh-host <MASTER-NODE-2-HOST>
  ```
Repeat the steps below for each master node one by one, starting with the node with the highest number (suffix 2) and ending with the node with the lowest number (suffix 0).
- Select the master node to update (enter its name):

  ```shell
  NODE="<MASTER-NODE-N-NAME>"
  ```
- Run the following command to remove the `node-role.kubernetes.io/control-plane`, `node-role.kubernetes.io/master`, and `node.deckhouse.io/group` labels from the node:

  ```shell
  kubectl label node ${NODE} \
    node-role.kubernetes.io/control-plane- node-role.kubernetes.io/master- node.deckhouse.io/group-
  ```
- Make sure that the node is no longer listed as an etcd cluster member:

  ```shell
  kubectl -n kube-system exec -ti $(kubectl -n kube-system get pod -l component=etcd,tier=control-plane -o name | head -n1) -- sh -c \
    "ETCDCTL_API=3 etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt \
    --cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key \
    --endpoints https://127.0.0.1:2379/ member list -w table"
  ```
- Drain the node:

  ```shell
  kubectl drain ${NODE} --ignore-daemonsets --delete-emptydir-data
  ```
- Shut down the virtual machine associated with the node, then remove the node instance and its attached disks (`kubernetes-data`) from the cloud.
- In the cluster, delete the Pods remaining on the node being deleted:

  ```shell
  kubectl delete pods --all-namespaces --field-selector spec.nodeName=${NODE} --force
  ```
- In the cluster, delete the Node object for the node being deleted:

  ```shell
  kubectl delete node ${NODE}
  ```
- In the installer container, run the following command to create the updated node:

  ```shell
  dhctl converge --ssh-agent-private-keys=/tmp/.ssh/<SSH_KEY_FILENAME> --ssh-user=<USERNAME> \
    --ssh-host <MASTER-NODE-0-HOST> --ssh-host <MASTER-NODE-1-HOST> --ssh-host <MASTER-NODE-2-HOST>
  ```
- On the newly created node, check the `bashible.service` systemd unit log. Wait until the node configuration is complete (the message `nothing to do` will appear in the log):

  ```shell
  journalctl -fu bashible.service
  ```
- Make sure the node is listed as an etcd cluster member:

  ```shell
  kubectl -n kube-system exec -ti $(kubectl -n kube-system get pod -l component=etcd,tier=control-plane -o name | head -n1) -- sh -c \
    "ETCDCTL_API=3 etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt \
    --cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key \
    --endpoints https://127.0.0.1:2379/ member list -w table"
  ```
- Make sure `control-plane-manager` is running on the node:

  ```shell
  kubectl -n kube-system wait pod --timeout=10m --for=condition=ContainersReady \
    -l app=d8-control-plane-manager --field-selector spec.nodeName=${NODE}
  ```
- Proceed to update the next node (repeat the steps above).
How do I switch to a different OS image in a single-master cluster?
- Convert your single-master cluster to a multi-master one, as described in the guide on adding master nodes to a cluster. In addition to increasing the number of replicas, you can also specify the image with the required OS version using the `masterNodeGroup.instanceClass` parameter (see the sketch after this list).
- Update the master nodes following the instructions in the previous section.
- Convert your multi-master cluster back to a single-master one, as described in the guide on excluding master nodes from the cluster.
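For illustration, a hedged fragment of the edited configuration. The fields inside `instanceClass` depend on your cloud provider; `imageName` and its value here are only assumptions for the example:

```yaml
masterNodeGroup:
  replicas: 3
  instanceClass:
    # Provider-specific fields; an image reference is shown as an example.
    imageName: ubuntu-22.04-cloud-amd64
```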
How do I view the list of etcd members?
Option 1
- Exec to the etcd Pod:

  ```shell
  kubectl -n kube-system exec -ti $(kubectl -n kube-system get pod -l component=etcd,tier=control-plane -o name | head -n1) -- sh
  ```
- Execute the command:

  ```shell
  ETCDCTL_API=3 etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/ca.crt \
    --key /etc/kubernetes/pki/etcd/ca.key --endpoints https://127.0.0.1:2379/ member list
  ```
Option 2
Use the `etcdctl endpoint status` command. The fifth field in the output will be `true` for the leader.
Example:

```shell
$ ETCDCTL_API=3 etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/ca.crt \
  --key /etc/kubernetes/pki/etcd/ca.key --endpoints https://127.0.0.1:2379/ endpoint status
https://10.2.1.101:2379, ade526d28b1f92f7, 3.5.3, 177 MB, false, false, 42007, 406566258, 406566258,
https://10.2.1.102:2379, d282ac2ce600c1ce, 3.5.3, 182 MB, true, false, 42007, 406566258, 406566258,
```
What if something went wrong?
The control-plane-manager saves backups to `/etc/kubernetes/deckhouse/backup`. They can be useful in diagnosing the issue.
What if the etcd cluster fails?
- Stop etcd on all nodes except one by deleting the `/etc/kubernetes/manifests/etcd.yaml` file. The remaining node will serve as the starting point for the new multi-master cluster.
- On the remaining node, edit the etcd manifest `/etc/kubernetes/manifests/etcd.yaml` and add the `--force-new-cluster` argument to `spec.containers.command` (see the sketch after this list).
- After the new cluster is ready, remove the `--force-new-cluster` argument.
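A hedged illustration of the manifest edit; only the relevant part of the `command` list is shown, and the actual arguments in your manifest will differ:

```yaml
spec:
  containers:
  - name: etcd
    command:
    - etcd
    # ...existing arguments stay in place...
    - --force-new-cluster  # remove this flag once the new cluster is ready
```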
Caution! This operation is unsafe and breaks the guarantees given by the consensus protocol. Note that it brings the cluster to the state that was saved on the node. Any pending entries will be lost.
How do I configure additional audit policies?
- Enable the `auditPolicyEnabled` flag in the module configuration:

  ```yaml
  apiVersion: deckhouse.io/v1alpha1
  kind: ModuleConfig
  metadata:
    name: control-plane-manager
  spec:
    version: 1
    settings:
      apiserver:
        auditPolicyEnabled: true
  ```
- Create the `kube-system/audit-policy` Secret containing a Base64-encoded YAML file:

  ```yaml
  apiVersion: v1
  kind: Secret
  metadata:
    name: audit-policy
    namespace: kube-system
  data:
    audit-policy.yaml: <base64>
  ```

  The minimum viable example of the `audit-policy.yaml` file looks as follows:

  ```yaml
  apiVersion: audit.k8s.io/v1
  kind: Policy
  rules:
  - level: Metadata
    omitStages:
    - RequestReceived
  ```
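  Alternatively, a hedged one-liner that creates the Secret from a local `audit-policy.yaml` file (kubectl performs the Base64 encoding itself):

  ```shell
  kubectl -n kube-system create secret generic audit-policy --from-file=./audit-policy.yaml
  ```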
You can find detailed information about configuring the `audit-policy.yaml` file in the official Kubernetes audit documentation.
How do I omit Deckhouse built-in policy rules?
Set the `apiserver.basicAuditPolicyEnabled` module parameter to `false`.
An example:

```yaml
apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: control-plane-manager
spec:
  version: 1
  settings:
    apiserver:
      auditPolicyEnabled: true
      basicAuditPolicyEnabled: false
```
How do I stream audit logs to stdout instead of files?
Set the `apiserver.auditLog.output` parameter to `Stdout`.
An example:

```yaml
apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: control-plane-manager
spec:
  version: 1
  settings:
    apiserver:
      auditPolicyEnabled: true
      auditLog:
        output: Stdout
```
How do I deal with the audit log?
There must be a log scraper on the master nodes (log-shipper, promtail, filebeat) that monitors the log file `/var/log/kube-audit/audit.log`.
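For illustration, a minimal filebeat input sketch for this file (a hedged fragment, not a complete filebeat configuration):

```yaml
# filebeat.yml (fragment); assumes a filebeat version with the filestream input (7.14+).
filebeat.inputs:
- type: filestream
  id: kube-audit
  paths:
    - /var/log/kube-audit/audit.log
```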
The following fixed log rotation parameters are in use:
- The maximum disk space is limited to `1000 MB`.
- Logs older than `7 days` will be deleted.
Depending on the `Policy` settings and the number of requests to the apiserver, the amount of logs collected may be high. Thus, in some cases, logs may be kept for less than 30 minutes.
Cautionary note
Note that the current implementation of this feature isn’t safe and may lead to a temporary failure of the control plane.
The apiserver will not be able to start if there are unsupported options or a typo in the Secret.
If apiserver is unable to start, you have to manually disable the `--audit-log-*` parameters in the `/etc/kubernetes/manifests/kube-apiserver.yaml` manifest and restart apiserver using the following command:

```shell
docker stop $(docker ps | grep kube-apiserver- | awk '{print $1}')
# Or (depending on your CRI).
crictl stopp $(crictl pods --name=kube-apiserver -q)
```
After the restart, you will be able to fix the Secret or delete it:

```shell
kubectl -n kube-system delete secret audit-policy
```
How do I speed up the restart of Pods if the connection to the node has been lost?
By default, a node is marked as unavailable if it does not report its state for 40 seconds. After another 5 minutes, its Pods will be rescheduled to other nodes. Thus, the overall application unavailability lasts approximately 6 minutes.
In specific cases, if an application cannot run in multiple instances, there is a way to lower its unavailability time:
- Reduce the period required for the node to become `Unreachable` if the connection to it is lost, by setting the `nodeMonitorGracePeriodSeconds` parameter.
- Set a lower timeout for evicting Pods from a failed node using the `failedNodePodEvictionTimeoutSeconds` parameter.
An example:

```yaml
apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: control-plane-manager
spec:
  version: 1
  settings:
    nodeMonitorGracePeriodSeconds: 10
    failedNodePodEvictionTimeoutSeconds: 50
```
In this case, if the connection to the node is lost, the applications will be restarted in about 1 minute.
Cautionary note
Both these parameters directly impact the CPU and memory resources consumed by the control plane. By lowering timeouts, we force system components to send statuses more frequently and check the resource state more often.
When deciding on the appropriate threshold values, consider the resources consumed by the control nodes (monitoring graphs can help you with this). Note that the lower these parameters are, the more resources you may need to allocate to these nodes.
How do I make an etcd backup?
Log in to any control-plane node as the `root` user and run the following script:

```shell
#!/usr/bin/env bash
# Take an etcd snapshot from the first available etcd member:
# save it to /tmp inside the Pod, stream it out, verify the checksum, and clean up.
for pod in $(kubectl get pod -n kube-system -l component=etcd,tier=control-plane -o name); do
  if kubectl -n kube-system exec "$pod" -- sh -c "ETCDCTL_API=3 /usr/bin/etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key --endpoints https://127.0.0.1:2379/ snapshot save /tmp/${pod##*/}.snapshot" && \
    kubectl -n kube-system exec "$pod" -- gzip -c /tmp/${pod##*/}.snapshot | zcat > "${pod##*/}.snapshot" && \
    kubectl -n kube-system exec "$pod" -- sh -c "cd /tmp && sha256sum ${pod##*/}.snapshot" | sha256sum -c && \
    kubectl -n kube-system exec "$pod" -- rm "/tmp/${pod##*/}.snapshot"; then
    mv "${pod##*/}.snapshot" etcd-backup.snapshot
    break
  fi
done
```
The etcd snapshot file `etcd-backup.snapshot` will be created in the current directory from one of the etcd cluster members. You can use this file to restore the etcd cluster state in the future.
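To sanity-check the resulting file, a hedged example (assumes `etcdctl` is installed on the node where the file resides):

```shell
ETCDCTL_API=3 etcdctl snapshot status etcd-backup.snapshot -w table
```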
Also, we recommend making a backup of the `/etc/kubernetes` directory, which contains:
- manifests and configurations of the control-plane components;
- the Kubernetes cluster PKI.

This directory will help you quickly restore a cluster in case of a complete loss of the control-plane nodes, without creating a new cluster and without rejoining the remaining nodes into it.
We recommend encrypting the etcd snapshot backups as well as the backup of the `/etc/kubernetes/` directory, and storing them outside the Deckhouse cluster. You can use a third-party file backup tool for this, for example: Restic, Borg, Duplicity, etc.
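For illustration, a hedged Restic sketch; the repository path `/mnt/backup/restic-repo` is only an example, and in practice the repository should live outside the cluster:

```shell
# Initialize the repository once; Restic encrypts all data it stores.
restic -r /mnt/backup/restic-repo init
# Back up the etcd snapshot and the control-plane configuration.
restic -r /mnt/backup/restic-repo backup etcd-backup.snapshot /etc/kubernetes/
```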
See the etcd documentation to learn about disaster recovery procedures based on snapshots.