Recovery from failures
During its operation, DVP automatically creates backups of configuration and data that may be useful in case of problems. These backups are saved in the `/etc/kubernetes/deckhouse/backup` directory. If any issues or unexpected situations occur during operation, you can use these backups to restore the system to a previously healthy state.
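For example, you can inspect the available backups directly on a master node. A minimal sketch (the exact file names depend on your cluster):

```shell
# List the backups DVP keeps on the master node.
ls -lh /etc/kubernetes/deckhouse/backup
```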
Restoring etcd cluster functionality
If the etcd cluster is not functioning and cannot be restored from a backup, you can attempt to recover it from scratch by following the steps below.
- On all nodes that are part of your etcd cluster, except one, delete the `etcd.yaml` manifest located in `/etc/kubernetes/manifests/`. This will leave only one active node, from which the multi-master cluster state will be restored.
- On the remaining node, open the `etcd.yaml` manifest and add the `--force-new-cluster` flag under `spec.containers.command`.
- After the cluster is successfully restored, remove the `--force-new-cluster` flag.
This operation is destructive: it completely wipes the existing data and initializes a new cluster based on the state preserved on the remaining node. All pending records will be lost.
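A minimal sketch of these steps on the nodes' filesystems is shown below; the backup path is only an example, and the flag has to be added to the etcd container command in the manifest by hand:

```shell
# On every etcd node EXCEPT the one you keep: remove the static pod manifest
# (kubelet stops the etcd pod once the manifest is gone; backup path is illustrative).
mv /etc/kubernetes/manifests/etcd.yaml /root/etcd.yaml.bak

# On the remaining node: edit the manifest and add the flag to the etcd command,
# e.g. under spec.containers[0].command:
#   - --force-new-cluster
vim /etc/kubernetes/manifests/etcd.yaml

# After the cluster is healthy again, remove --force-new-cluster from the manifest.
```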
Restoring a master node when kubelet fails to load control plane components
This situation may occur in a cluster with a single master node if the images of the control plane components were deleted on that master (for example, if the `/var/lib/containerd` directory was removed). In this case, kubelet cannot pull the control plane component images when restarted, since the master node lacks the authorization parameters required to access `registry.deckhouse.io`.
Below are instructions on how to restore the master node.
containerd
- To restore the master node, execute the following command in any cluster running under DVP:

  d8 k -n d8-system get secrets deckhouse-registry -o json | jq -r '.data.".dockerconfigjson"' | base64 -d | jq -r '.auths."registry.deckhouse.io".auth'

- Copy the command's output and use it for setting the `AUTH` variable on the corrupted master.
- Next, pull the images of the control plane components to the corrupted master:

  for image in $(grep "image:" /etc/kubernetes/manifests/* | awk '{print $3}'); do
    crictl pull --auth $AUTH $image
  done

- Restart kubelet after pulling the images (see the example below).
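For reference, a minimal sketch of finishing these steps, assuming the auth string from the first command has been copied into the shell (the placeholder value below is illustrative):

```shell
# Set AUTH to the string printed by the first command (placeholder shown here).
AUTH="<paste the output of the first command here>"

# ...pull the images as shown in the loop above, then restart kubelet so it
# recreates the control plane static pods from the manifests.
systemctl restart kubelet
```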
etcd restore
Viewing etcd cluster members
Option 1
Use the `etcdctl member list` command.
Example:
for pod in $(d8 k -n kube-system get pod -l component=etcd,tier=control-plane -o name); do
d8 k -n kube-system exec "$pod" -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key \
--endpoints https://127.0.0.1:2379/ member list -w table
if [ $? -eq 0 ]; then
break
fi
done
Warning. The last column of the output table (`IS LEARNER`) shows whether an etcd member is in the learner state; it does not indicate which member is the leader.
Option 2
To obtain information about etcd cluster nodes in tabular form, use the `etcdctl endpoint status` command. For the leader, the `IS LEADER` column will show `true`.
Example:
for pod in $(d8 k -n kube-system get pod -l component=etcd,tier=control-plane -o name); do
d8 k -n kube-system exec "$pod" -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key \
--endpoints https://127.0.0.1:2379/ endpoint status --cluster -w table
if [ $? -eq 0 ]; then
break
fi
done
Restoring the etcd cluster in case of complete unavailability
- Stop all etcd nodes except one by deleting the `etcd.yaml` manifest on the others.
- On the remaining node, add the `--force-new-cluster` option to the etcd startup command.
- After the cluster is restored, remove this option.
Be careful: these actions completely erase the previous data and form a new etcd cluster.
Recovering etcd after the `panic: unexpected removal of unknown remote peer` error
In some cases, manual restoration via `etcdutl snapshot restore` can help (a sketch follows the list below):
- Save a local snapshot from `/var/lib/etcd/member/snap/db`.
- Use `etcdutl` with the `--force-new-cluster` option to restore.
- Completely wipe the `/var/lib/etcd` directory and place the restored snapshot there.
- Remove any "stuck" etcd/kube-apiserver containers and restart the node.
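As an illustration only, the restore might look roughly like the sketch below. The snapshot copy path, the temporary data directory, and the use of `--skip-hash-check` (assumed to be needed because the copied `db` file is not a formal etcdctl snapshot) are assumptions; adapt them to your node and etcd version.

```shell
# Save a copy of the local snapshot before wiping anything (path is illustrative).
cp /var/lib/etcd/member/snap/db /root/etcd-backup.db

# Restore into a temporary data directory; --skip-hash-check is assumed to be
# required because the copied db file lacks a snapshot integrity hash.
etcdutl snapshot restore /root/etcd-backup.db \
  --data-dir /var/lib/etcd-restored \
  --skip-hash-check

# Completely wipe the old data directory and move the restored data into place.
rm -rf /var/lib/etcd/*
mv /var/lib/etcd-restored/member /var/lib/etcd/

# Finally, remove any stuck etcd / kube-apiserver containers (e.g. with crictl)
# and restart the node so the control plane static pods come up cleanly.
```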
Actions to take when etcd database exceeds quota-backend-bytes limit
When the etcd database volume reaches the limit set by the `quota-backend-bytes` parameter, etcd switches to read-only mode: the database stops accepting new entries but remains available for reading. You can check whether you are facing this situation by running the following command:
d8 k -n kube-system exec -ti $(d8 k -n kube-system get pod -l component=etcd,tier=control-plane -o name | sed -n 1p) -- \
etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key \
--endpoints https://127.0.0.1:2379/ endpoint status -w table --cluster
This command uses the substitution `$(d8 k -n kube-system get pod -l component=etcd,tier=control-plane -o name | sed -n 1p)`, which automatically inserts the name of the first Pod matching the specified labels.
If you see a message like `alarm:NOSPACE` in the `ERRORS` field, take the following steps:
- Edit `/etc/kubernetes/manifests/etcd.yaml`: find the line with `--quota-backend-bytes` and double the specified value. If there is no such line, add one, for example `- --quota-backend-bytes=8589934592`, which sets the limit to 8 GB.
- Disarm the active alarm raised when the limit was reached. To do this, execute the command:

  d8 k -n kube-system exec -ti $(d8 k -n kube-system get pod -l component=etcd,tier=control-plane -o name | sed -n 1p) -- \
  etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key \
  --endpoints https://127.0.0.1:2379/ alarm disarm

- Change the `maxDbSize` parameter in the `control-plane-manager` settings to match the value specified in the manifest (a sketch follows this list).
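As a hedged illustration, aligning the module setting with the manifest value might look roughly as follows. The settings path shown (`settings.etcd.maxDbSize`) is an assumption; check the `control-plane-manager` module documentation for your DVP version.

```shell
# Open the control-plane-manager module configuration for editing.
d8 k edit moduleconfig control-plane-manager

# Then set maxDbSize to the same number of bytes as --quota-backend-bytes,
# for example (assumed structure):
#   spec:
#     settings:
#       etcd:
#         maxDbSize: 8589934592
```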
High availability
If any component of the control plane becomes unavailable, the cluster temporarily maintains its current state but cannot process new events. For example:
- If `kube-controller-manager` fails, Deployment scaling will stop working.
- If `kube-apiserver` is unavailable, no requests can be made to the Kubernetes API, although existing applications will continue to function.
However, prolonged unavailability of control plane components disrupts the processing of new objects, handling of node failures, and other operations. Over time, this can lead to cluster degradation and impact user applications.
To mitigate these risks, the control plane should be scaled to a high-availability configuration of at least three nodes. This is especially critical for etcd, which requires a quorum to elect a leader. Quorum is a majority (N/2 + 1) of the total number of nodes.
Example:
| Cluster size | Quorum (majority) | Max fault tolerance |
|---|---|---|
| 1 | 1 | 0 |
| 3 | 2 | 1 |
| 5 | 3 | 2 |
| 7 | 4 | 3 |
| 9 | 5 | 4 |
An even number of nodes does not improve fault tolerance but increases replication overhead.
In most cases, three etcd nodes are sufficient. Use five if high availability is critical. More than seven is rarely necessary and not recommended due to high resource consumption.
After new control plane nodes are added:
- The label `node-role.kubernetes.io/control-plane=""` is applied.
- A DaemonSet launches control plane pods on the new nodes.
- DVP creates or updates files in `/etc/kubernetes`: manifests, configuration files, certificates, etc.
- All DVP modules that support high availability will enable it automatically, unless the global `highAvailability` setting is manually overridden.
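To verify the result, you can, for example, list the control plane nodes and pods. A minimal sketch (output columns depend on your cluster):

```shell
# List nodes carrying the control-plane role label.
d8 k get nodes -l node-role.kubernetes.io/control-plane -o wide

# Check that etcd and the other control plane pods are running on the new nodes.
d8 k -n kube-system get pods -l tier=control-plane -o wide
```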
Control plane node removal is performed in reverse:
- The labels `node-role.kubernetes.io/control-plane`, `node-role.kubernetes.io/master`, and `node.deckhouse.io/group` are removed.
- DVP removes its pods from these nodes.
- etcd members on the nodes are automatically deleted.
- If the number of nodes drops from two to one, etcd may enter read-only mode. In this case, you must start etcd with the `--force-new-cluster` flag, which should be removed after a successful startup.