etcd backup

Automatic backup

Deckhouse creates a CronJob kube-system/d8-etcd-backup-*, which is triggered at 00:00 UTC. The etcd data backup is saved as the archive /var/lib/etcd/etcd-backup.tar.gz on every master node.
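
To confirm the backup job exists and to copy the latest archive off a master node, commands along these lines can be used (the host placeholder is illustrative):

d8 k -n kube-system get cronjobs | grep d8-etcd-backup
scp root@<MASTER-NODE-HOST>:/var/lib/etcd/etcd-backup.tar.gz ./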

Manual backup using Deckhouse CLI

In clusters running Deckhouse v1.65 and higher, an etcd data backup can be created with a single d8 backup etcd command:

d8 backup etcd --kubeconfig $KUBECONFIG ./etcd.db
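
To sanity-check the resulting snapshot, etcdctl can report its hash, revision, and total size (a sketch; assumes etcdctl is installed locally):

etcdctl snapshot status ./etcd.db -w table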

Manual backup with etcdctl

This method is not recommended for Deckhouse v1.65 and higher; use the d8 backup etcd command instead.

On Deckhouse v1.64 and earlier, run the following script on any master node as root:

#!/usr/bin/env bash
set -e

# Take an etcd snapshot from the local etcd pod, then pack it together with
# the /etc/kubernetes directory into kube-backup.tar.gz in the current directory.
pod=etcd-`hostname`
d8 k -n kube-system exec "$pod" -- /usr/bin/etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key --endpoints https://127.0.0.1:2379/ snapshot save /var/lib/etcd/${pod##*/}.snapshot && \
mv /var/lib/etcd/"${pod##*/}.snapshot" etcd-backup.snapshot && \
cp -r /etc/kubernetes/ ./ && \
tar -cvzf kube-backup.tar.gz ./etcd-backup.snapshot ./kubernetes/
rm -r ./kubernetes ./etcd-backup.snapshot

A kube-backup.tar.gz file will be created in the current directory, containing a snapshot of the etcd database taken from one of the etcd cluster nodes. The snapshot can be used to restore the state of the etcd cluster.

It is also recommended to back up the directory /etc/kubernetes, which contains:

  • the configuration files and manifests of the control-plane components;
  • the Kubernetes cluster PKI.

We recommend storing backup copies of the etcd cluster snapshots, as well as a backup of the directory /etc/kubernetes/ in encrypted form outside the Deckhouse cluster. For this, you can use third-party file backup tools, such as Restic, Borg, Duplicity, etc.
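
For example, a minimal Restic sketch for shipping the archive to an encrypted off-cluster repository (the repository URL and password file are illustrative; Restic encrypts data by default):

export RESTIC_REPOSITORY=sftp:backup@backup-host:/backups/etcd
export RESTIC_PASSWORD_FILE=/root/.restic-password
restic init    # run once to create the repository
restic backup kube-backup.tar.gz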

Full cluster state recovery from etcd backup

The following are the steps for restoring a cluster to a previous state from a backup in case of complete data loss.

Recovering a cluster with one master node

To correctly recover a cluster with one master node, follow these steps:

  1. Download the etcdctl utility to the server (preferably of the same version as etcd in the cluster).

     wget "https://github.com/etcd-io/etcd/releases/download/v3.5.4/etcd-v3.5.4-linux-amd64.tar.gz"
     tar -xzvf etcd-v3.5.4-linux-amd64.tar.gz && mv etcd-v3.5.4-linux-amd64/etcdctl /usr/local/bin/etcdctl
    

    You can check the etcd version in your cluster by running the following command:

     d8 k -n kube-system exec -ti etcd-$(hostname) -- etcdctl version
    
  2. Stop etcd.

    Etcd runs as a static pod, so it’s enough to move the manifest file:

     mv /etc/kubernetes/manifests/etcd.yaml ~/etcd.yaml
    
  3. Backup the current etcd data.

     cp -r /var/lib/etcd/member/ /var/lib/deckhouse-etcd-backup
    
  4. Clean up the etcd directory.

     rm -rf /var/lib/etcd/member/
    
  5. Place the etcd backup in ~/etcd-backup.snapshot.

  6. Restore the etcd database.

      ETCDCTL_API=3 etcdctl snapshot restore ~/etcd-backup.snapshot --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/ca.crt \
      --key /etc/kubernetes/pki/etcd/ca.key --endpoints https://127.0.0.1:2379/ --data-dir=/var/lib/etcd
    
  7. Start etcd (a post-start health check is shown after this list).

     mv ~/etcd.yaml /etc/kubernetes/manifests/etcd.yaml
    

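After etcd starts, a minimal health check can be run with the etcdctl binary downloaded in step 1 (a sketch assuming the standard Deckhouse PKI paths):

etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/ca.crt \
--key /etc/kubernetes/pki/etcd/ca.key --endpoints https://127.0.0.1:2379/ endpoint health
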
Recovering a multi-master cluster

To properly recover a multi-master cluster, follow these steps:

  1. Explicitly enable High Availability (HA) mode using the global parameter highAvailability. This is necessary, for example, to avoid losing one Prometheus replica and its PVC, since HA is disabled by default in single-master mode (an example of enabling it is shown after this list).

  2. Switch the cluster to single-master mode, according to the instructions for cloud clusters, or manually remove the static master nodes from the cluster.

  3. On the remaining single master node, follow the steps to restore etcd from backup as described in the guide for a single-master cluster.

  4. Once etcd is restored and running, remove the master nodes that were deleted in step 2 from the cluster using the following command (specify the node name):

     d8 k delete node <MASTER_NODE_I>
    
  5. Restart all cluster nodes.

  6. Wait for the tasks from the Deckhouse queue to complete:

     d8 k -n d8-system exec svc/deckhouse-leader -c deckhouse -- deckhouse-controller queue main
    
  7. Switch the cluster back to multi-master mode in accordance with the instructions for cloud clusters or for static or hybrid clusters.
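
For step 1, HA mode can be enabled, for example, by patching the global module configuration (a sketch; verify the parameter against your Deckhouse version):

d8 k patch mc global --type=merge -p '{"spec":{"settings":{"highAvailability":true}}}'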

Restoring a Kubernetes object from an etcd backup

A short scenario for restoring individual objects from an etcd backup:

  1. Get a backup of your data.

  2. Start a temporary etcd instance.

  3. Fill it with data from the backup.

  4. Get descriptions of the required objects using the etcdhelper utility.

Steps for restoring objects from an etcd backup

In the example:

  • etcd-snapshot.bin is a file with a backup of etcd data (snapshot);
  • infra-production is the namespace in which you want to restore the objects.

  1. Start a pod with a temporary etcd instance.

    It is desirable that the version of the etcd instance you are starting matches the version of etcd from which the backup was created. For simplicity, the instance is launched not locally, but in the cluster, since the cluster already has an etcd image.

    • Prepare the etcd.pod.yaml file with the pod manifest:

      cat <<EOF >etcd.pod.yaml
      apiVersion: v1
      kind: Pod
      metadata:
        name: etcdrestore
        namespace: default
      spec:
        containers:
        - command:
          - /bin/sh
          - -c
          - "sleep 96h"
          image: IMAGE
          imagePullPolicy: IfNotPresent
          name: etcd
          volumeMounts:
          - name: etcddir
            mountPath: /default.etcd
        volumes:
        - name: etcddir
          emptyDir: {}
      EOF
      
    • Set the current name of the etcd image:

      IMG=`kubectl -n kube-system get pod -l component=etcd -o jsonpath="{.items[0].spec.containers[*].image}"`
      sed -i -e "s#IMAGE#$IMG#" etcd.pod.yaml
      
    • Create a pod:

      kubectl create -f etcd.pod.yaml
      
    • Copy etcdhelper and the etcd snapshot to the pod container.

      etcdhelper can be built from source or copied from a pre-built image (e.g. the etcdhelper image on Docker Hub).

      Example:

      kubectl cp etcd-snapshot.bin default/etcdrestore:/tmp/etcd-snapshot.bin
      kubectl cp etcdhelper default/etcdrestore:/usr/bin/etcdhelper
      
    • In the container, set permissions to run etcdhelper, restore the data from the backup, and start etcd.

      Example:

      ~ # kubectl -n default exec -it etcdrestore -- sh
      / # chmod +x /usr/bin/etcdhelper
      / # etcdctl snapshot restore /tmp/etcd-snapshot.bin
      / # etcd &
      
    • Get the descriptions of the desired cluster objects by filtering them with grep.

      Example:

      ~ # kubectl -n default exec -it etcdrestore -- sh
      / # mkdir /tmp/restored_yaml
      / # cd /tmp/restored_yaml
      /tmp/restored_yaml # for o in `etcdhelper -endpoint 127.0.0.1:2379 ls /registry/ | grep infra-production` ; do etcdhelper -endpoint 127.0.0.1:2379 get $o > `echo $o | sed -e "s#/registry/##g;s#/#_#g"`.yaml ; done
      

      The sed replacement in the example allows object descriptions to be saved to files named after the etcd registry structure. For example, the key /registry/deployments/infra-production/supercronic becomes the file deployments_infra-production_supercronic.yaml.

  2. Copy the received object descriptions from the pod to the master node using the command:

     d8 k cp default/etcdrestore:/tmp/restored_yaml restored_yaml
    
  3. Remove the creation time, UID, status, and other operational data from the received object descriptions (a cleanup sketch is shown after this list), then restore the objects using the command:

     d8 k create -f restored_yaml/deployments_infra-production_supercronic.yaml
    
  4. The pod with the temporary etcd instance can be deleted using the command:

     d8 k delete -f etcd.pod.yaml
    
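For step 3, the operational fields can be stripped, for example, with the yq utility (a hypothetical helper; editing the files manually works just as well):

yq -i 'del(.metadata.creationTimestamp) | del(.metadata.uid) | del(.metadata.resourceVersion) | del(.metadata.managedFields) | del(.status)' \
restored_yaml/deployments_infra-production_supercronic.yaml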

How to get a list of etcd cluster nodes (option 1)

Use the etcdctl member list command.

Example:

d8 k -n kube-system exec -ti $(d8 k -n kube-system get pod -l component=etcd,tier=control-plane -o name | head -n1) -- \
etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key \
--endpoints https://127.0.0.1:2379/ member list -w table

Warning. The last column of the output table (IS LEARNER) shows whether an etcd cluster member is in the learner state; it does not indicate the leader.

How to get a list of etcd cluster nodes (option 2)

Use the etcdctl endpoint status command, passing the address of each control-plane node after the --endpoints flag.

The true value in the fifth column of the output indicates the leader.

An example of a script that automatically passes the addresses of all control-plane nodes to the command:

MASTER_NODE_IPS=($(d8 k get nodes -l \
node-role.kubernetes.io/control-plane="" \
-o 'custom-columns=IP:.status.addresses[?(@.type=="InternalIP")].address' \
--no-headers))
unset ENDPOINTS_STRING
for master_node_ip in ${MASTER_NODE_IPS[@]}
do ENDPOINTS_STRING+="--endpoints https://${master_node_ip}:2379 "
done
d8 k -n kube-system exec -ti $(d8 k -n kube-system get pod \
-l component=etcd,tier=control-plane -o name | head -n1) \
-- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/ca.crt \
--key /etc/kubernetes/pki/etcd/ca.key \
$(echo -n $ENDPOINTS_STRING) endpoint status -w table

How to add master nodes to a cloud cluster

The following describes the conversion of a single-master cluster into a multi-master cluster.

Before adding nodes, ensure you have the required quotas in the cloud provider.

It is important to have an odd number of master nodes to ensure quorum: for example, a three-node etcd cluster tolerates the loss of one node, while a four-node cluster tolerates no more failures than a three-node one.

  1. Make a backup of etcd and the /etc/kubernetes directory.
  2. Transfer the archive to a server outside the cluster (e.g., to a local machine).
  3. Ensure there are no alerts in the cluster that can prevent the update of the master nodes.

    To get a list of alerts in a cluster, run the command:

     kubectl get clusteralerts
    

    To view a specific alert, run the command:

     kubectl get clusteralerts <ALERT_NAME> -o yaml
    
  4. Make sure that the Deckhouse queue is empty.

    To view the status of all Deckhouse job queues, run the command:

     kubectl -n d8-system exec -it svc/deckhouse-leader -c deckhouse -- deckhouse-controller queue list
    

    Output example (queues are empty):

     Summary:
     - 'main' queue: empty.
     - 88 other queues (0 active, 88 empty): 0 tasks.
     - no tasks to handle.
    

    To view the status of the main Deckhouse task queue, run the command:

     kubectl -n d8-system exec -it svc/deckhouse-leader -c deckhouse -- deckhouse-controller queue main
    

    Output example (queue main is empty):

     Queue 'main': length 0, status: 'waiting for task 0s'
    
  5. Run the appropriate edition and version of the Deckhouse installer container on the local machine (change the container registry address if necessary):

    DH_VERSION=$(kubectl -n d8-system get deployment deckhouse -o jsonpath='{.metadata.annotations.core\.deckhouse\.io\/version}')
    DH_EDITION=$(kubectl -n d8-system get deployment deckhouse -o jsonpath='{.metadata.annotations.core\.deckhouse\.io\/edition}' | tr '[:upper:]' '[:lower:]')
    docker run --pull=always -it -v "$HOME/.ssh/:/tmp/.ssh/" \
      registry.deckhouse.io/deckhouse/${DH_EDITION}/install:${DH_VERSION} bash
    
  6. In the installer container, run the following command to check the state before working:

    dhctl terraform check --ssh-agent-private-keys=/tmp/.ssh/<SSH_KEY_FILENAME> --ssh-user=<USERNAME> --ssh-host <MASTER-NODE-0-HOST>
    

    The command output should indicate that Terraform found no inconsistencies and no changes are required.

  7. In the installer container, run the following command and specify the required number of replicas using the masterNodeGroup.replicas parameter:

    dhctl config edit provider-cluster-configuration --ssh-agent-private-keys=/tmp/.ssh/<SSH_KEY_FILENAME> --ssh-user=<USERNAME> \
      --ssh-host <MASTER-NODE-0-HOST>
    

    For Yandex Cloud, when using external addresses on master nodes, the number of array elements in the masterNodeGroup.instanceClass.externalIPAddresses parameter must equal the number of master nodes. If Auto is used (public IP addresses are provisioned automatically), the number of array elements must still equal the number of master nodes.

    To illustrate, with three master nodes (masterNodeGroup.replicas: 3) and automatic address reservation, the masterNodeGroup.instanceClass.externalIPAddresses parameter would look as follows:

    externalIPAddresses:
    - "Auto"
    - "Auto"
    - "Auto"
    
  8. In the installer container, run the following command to start scaling:

    dhctl converge --ssh-agent-private-keys=/tmp/.ssh/<SSH_KEY_FILENAME> --ssh-user=<USERNAME> --ssh-host <MASTER-NODE-0-HOST>
    
  9. Wait until the required number of master nodes are Ready and all control-plane-manager instances are up and running (an optional etcd membership check is shown after this list):

    kubectl -n kube-system wait pod --timeout=10m --for=condition=ContainersReady -l app=d8-control-plane-manager
    
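Optionally, confirm that all master nodes have joined the etcd cluster by reusing the member-list command from the sections above:

d8 k -n kube-system exec -ti $(d8 k -n kube-system get pod -l component=etcd,tier=control-plane -o name | head -n1) -- \
etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/ca.crt \
--key /etc/kubernetes/pki/etcd/ca.key --endpoints https://127.0.0.1:2379/ member list -w table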

How to reduce the number of master nodes in a cloud cluster

The following describes the conversion of a multi-master cluster into a single-master.

The steps described below must be performed starting from the first master node of the cluster (master-0). This is because the cluster is always scaled in order: for example, it is impossible to delete master-0 and master-1 while keeping master-2.

  1. Make a backup of etcd and the /etc/kubernetes directory.
  2. Transfer the archive to a server outside the cluster (e.g., to a local machine).
  3. Ensure there are no alerts in the cluster that can prevent the update of the master nodes.

    To get a list of alerts in a cluster, run the command:

     kubectl get clusteralerts
    

    To view a specific alert, run the command:

     kubectl get clusteralerts <ALERT_NAME> -o yaml
    
  4. Make sure that the Deckhouse queue is empty.

    To view the status of all Deckhouse job queues, run the command:

     kubectl -n d8-system exec -it svc/deckhouse-leader -c deckhouse -- deckhouse-controller queue list
    

    Output example (queues are empty):

     Summary:
     - 'main' queue: empty.
     - 88 other queues (0 active, 88 empty): 0 tasks.
     - no tasks to handle.
    

    To view the status of the main Deckhouse task queue, run the command:

     kubectl -n d8-system exec -it svc/deckhouse-leader -c deckhouse -- deckhouse-controller queue main
    

    Output example (queue main is empty):

     Queue 'main': length 0, status: 'waiting for task 0s'
    
  5. Run the appropriate edition and version of the Deckhouse installer container on the local machine (change the container registry address if necessary):

    DH_VERSION=$(kubectl -n d8-system get deployment deckhouse -o jsonpath='{.metadata.annotations.core\.deckhouse\.io\/version}')
    DH_EDITION=$(kubectl -n d8-system get deployment deckhouse -o jsonpath='{.metadata.annotations.core\.deckhouse\.io\/edition}' | tr '[:upper:]' '[:lower:]')
    docker run --pull=always -it -v "$HOME/.ssh/:/tmp/.ssh/" \
      registry.deckhouse.io/deckhouse/${DH_EDITION}/install:${DH_VERSION} bash
    
  6. In the installer container, run the following command to check the state before working:

    dhctl terraform check --ssh-agent-private-keys=/tmp/.ssh/<SSH_KEY_FILENAME> --ssh-user=<USERNAME> --ssh-host <MASTER-NODE-0-HOST>
    

    The command output should indicate that Terraform found no inconsistencies and no changes are required.

  7. Run the following command in the installer container and set masterNodeGroup.replicas to 1:

    dhctl config edit provider-cluster-configuration --ssh-agent-private-keys=/tmp/.ssh/<SSH_KEY_FILENAME> \
      --ssh-user=<USERNAME> --ssh-host <MASTER-NODE-0-HOST>
    

    For Yandex Cloud, when using external addresses on master nodes, the number of array elements in the masterNodeGroup.instanceClass.externalIPAddresses parameter must equal the number of master nodes. If Auto is used (public IP addresses are provisioned automatically), the number of array elements must still equal the number of master nodes.

    To illustrate, with a single master node (masterNodeGroup.replicas: 1) and automatic address reservation, the masterNodeGroup.instanceClass.externalIPAddresses parameter would look as follows:

    externalIPAddresses:
    - "Auto"
    
  8. Remove the following labels from the master nodes to be deleted:
    • node-role.kubernetes.io/control-plane
    • node-role.kubernetes.io/master
    • node.deckhouse.io/group

      Use the following command to remove labels:

       kubectl label node <MASTER-NODE-N-NAME> node-role.kubernetes.io/control-plane- node-role.kubernetes.io/master- node.deckhouse.io/group-
      
  9. Make sure that the master nodes to be deleted are no longer listed as etcd cluster members:

    kubectl -n kube-system exec -ti \
    $(kubectl -n kube-system get pod -l component=etcd,tier=control-plane -o json | jq -r '.items[] | select( .status.conditions[] | select(.type == "ContainersReady" and .status == "True")) | .metadata.name' | head -n1) -- \
    etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt \
    --cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key \
    --endpoints https://127.0.0.1:2379/ member list -w table
    
  10. Drain the nodes being deleted:

    kubectl drain <MASTER-NODE-N-NAME> --ignore-daemonsets --delete-emptydir-data
    
  11. Shut down the virtual machines corresponding to the nodes being deleted, and remove their instances and the attached disks (kubernetes-data-master-<N>) from the cloud.

  12. In the cluster, delete the Pods running on the nodes being deleted:

    kubectl delete pods --all-namespaces --field-selector spec.nodeName=<MASTER-NODE-N-NAME> --force
    
  13. In the cluster, delete the Node objects associated with the nodes being deleted:

    kubectl delete node <MASTER-NODE-N-NAME>
    
  14. In the installer container, run the following command to start scaling:

    dhctl converge --ssh-agent-private-keys=/tmp/.ssh/<SSH_KEY_FILENAME> --ssh-user=<USERNAME> --ssh-host <MASTER-NODE-0-HOST>
    

Rebuilding etcd cluster

A rebuild may be required if the etcd cluster has collapsed, or when migrating from a multi-master cluster to a single-master cluster.

  1. Select the node from which to start restoring the etcd cluster. In case of migrating to a single-master cluster, this is the node where etcd should remain.
  2. Stop etcd on all other nodes. To do this, delete the file /etc/kubernetes/manifests/etcd.yaml.
  3. On the remaining node, in the manifest /etc/kubernetes/manifests/etcd.yaml, add the --force-new-cluster argument to the spec.containers.command field (a sketch is shown below).

  4. After the cluster has been successfully started, remove the --force-new-cluster parameter.

This operation is destructive: it completely destroys consensus and starts the etcd cluster from the state that was saved on the selected node. Any pending entries will be lost.
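
For step 3, the flag can be added, for example, with sed (a fragile sketch that assumes the standard manifest layout; review the file afterwards and verify with grep):

sed -i '/^    - etcd$/a\    - --force-new-cluster' /etc/kubernetes/manifests/etcd.yaml
grep -- --force-new-cluster /etc/kubernetes/manifests/etcd.yaml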

Eliminating infinite restart

This option may be needed if starting etcd with the --force-new-cluster argument does not restore it. This can happen after an unsuccessful converge of the master nodes, when the new master node was created with the old etcd disk and a changed local-network address, and no other master nodes remain. Use this method if the etcd container is stuck in an endless restart loop and its log contains the error panic: unexpected removal of unknown remote peer.

  1. Install the etcdutl utility (a download sketch is given after this list).
  2. From the current local etcd database file (/var/lib/etcd/member/snap/db), restore a new data directory:

     ./etcdutl snapshot restore /var/lib/etcd/member/snap/db --name <HOSTNAME> \
     --initial-cluster=<HOSTNAME>=https://<ADDRESS>:2380 --initial-advertise-peer-urls=https://<ADDRESS>:2380 \
     --skip-hash-check=true --data-dir /var/lib/etcdtest
    

    where:

    • <HOSTNAME> is the name of the master node;
    • <ADDRESS> is the address of the master node.
  3. Run the following commands to switch to the restored data directory:

     cp -r /var/lib/etcd /tmp/etcd-backup
     rm -rf /var/lib/etcd
     mv /var/lib/etcdtest /var/lib/etcd
    
  4. Find the etcd and kube-apiserver containers:

     crictl ps -a --name "^etcd|^kube-apiserver"
    
  5. Remove the found etcd and kube-apiserver containers:

     crictl rm <CONTAINER-ID>
    
  6. Restart the master node.
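
For step 1, etcdutl can be obtained the same way as etcdctl earlier in this document, since it ships in the same etcd release tarball (the version below is an example; match your cluster's etcd version):

wget "https://github.com/etcd-io/etcd/releases/download/v3.5.4/etcd-v3.5.4-linux-amd64.tar.gz"
tar -xzvf etcd-v3.5.4-linux-amd64.tar.gz && mv etcd-v3.5.4-linux-amd64/etcdutl /usr/local/bin/etcdutl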