Manual cluster restore

Restoring a cluster with a single control plane node

To properly restore the cluster, follow these steps on the master node:

  1. Prepare the etcdutl utility. Locate and copy the executable on the node:

    cp $(find /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/ \
    -name etcdutl -print | tail -n 1) /usr/local/bin/etcdutl
    

    Check the version of etcdutl:

    etcdutl version
    

    Make sure the output of etcdutl version is displayed without errors.

    If etcdutl is not found, download the binary from the official etcd repository, choosing a version that matches your cluster’s etcd version:

    wget "https://github.com/etcd-io/etcd/releases/download/v3.6.1/etcd-v3.6.1-linux-amd64.tar.gz"
    tar -xzvf etcd-v3.6.1-linux-amd64.tar.gz && mv etcd-v3.6.1-linux-amd64/etcdutl /usr/local/bin/etcdutl
    
  2. Check the etcd version in the cluster (if the Kubernetes API is accessible):

    d8 k -n kube-system exec -ti etcd-$(hostname) -- etcdutl version
    

    If the command executes successfully, it will display the current etcd version.

  3. Stop etcd. Move the etcd manifest to prevent kubelet from launching the etcd pod:

    mv /etc/kubernetes/manifests/etcd.yaml ~/etcd.yaml
    
  4. Make sure the etcd pod is stopped:

    crictl ps | grep etcd
    

    If the command returns nothing, the etcd pod has been successfully stopped.

  5. Back up the current etcd data. Create a backup copy of the member directory:

    cp -r /var/lib/etcd/member/ /var/lib/deckhouse-etcd-backup
    

    This backup will allow you to roll back in case of issues.

  6. Clean the etcd directory. Remove old data to prepare for restore:

    rm -rf /var/lib/etcd
    

    Verify that /var/lib/etcd is now empty or does not exist:

    ls -la /var/lib/etcd
    
  7. Place the etcd snapshot file. Copy or move the etcd-backup.snapshot file to the current user’s (root) home directory:

    cp /path/to/backup/etcd-backup.snapshot ~/etcd-backup.snapshot
    

    Ensure the file is readable:

    ls -la ~/etcd-backup.snapshot
    
  8. Restore the etcd database from the snapshot using etcdutl:

    ETCDCTL_API=3 etcdutl snapshot restore ~/etcd-backup.snapshot  --data-dir=/var/lib/etcd
    

    After the command completes, check that files have appeared in /var/lib/etcd/, reflecting the restored state.
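
    For example, you can list the restored data directory; the member subdirectory with snap and wal data should be present:

    ls -la /var/lib/etcd/member/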

  9. Start etcd. Move the manifest back so that kubelet relaunches the etcd pod:

    mv ~/etcd.yaml /etc/kubernetes/manifests/etcd.yaml
    
  10. Wait for the pod to be created and reach Running state. Make sure it is up and running:

    crictl ps --label io.kubernetes.pod.name=etcd-$HOSTNAME
    

    Pod startup may take some time. Once etcd is running, the cluster will be restored from the snapshot.

    Output example:

    CONTAINER        IMAGE            CREATED              STATE     NAME      ATTEMPT     POD ID          POD
    4b11d6ea0338f    16d0a07aa1e26    About a minute ago   Running   etcd      0           ee3c8c7d7bba6   etcd-gs-test
    
  11. Restart the master node.

Restoring a multi-master cluster

To properly restore a multi-master cluster, follow these steps:

  1. Enable High Availability (HA) mode. This is necessary to preserve at least one Prometheus replica and its PVC, since HA is disabled by default in single-master clusters.
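
    For example, HA mode can be enabled via the highAvailability parameter of the global Deckhouse settings (a minimal sketch; verify the parameter against your platform configuration):

    d8 k patch mc global --type=merge -p '{"spec":{"settings":{"highAvailability":true}}}'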

  2. Switch the cluster to single master mode:

    • In a static cluster, manually remove the additional master nodes.
  3. Restore etcd from the backup on the only remaining master node. Follow the instructions for restoring a cluster with a single control-plane node.

  4. Once etcd is restored, remove the records of the previously deleted master nodes from the cluster using the following command (replace with the actual node name):

    d8 k delete node <MASTER_NODE_NAME>
    
  5. Reboot all cluster nodes. Ensure that after the reboot all nodes are available and functioning correctly.
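
    For example, check that all nodes are available and in the Ready state:

    d8 k get nodes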

  6. Wait for Deckhouse to process all tasks in the queue:

    d8 platform queue main
    
  7. Switch the cluster back to multi-master mode.

Once you go through these steps, the cluster will be successfully restored in the multi-master configuration.

Restoring individual objects

Restoring Kubernetes objects from an etcd backup

To restore individual cluster objects (e.g., specific Deployments, Secrets, or ConfigMaps) from an etcd snapshot, follow these steps:

  1. Launch a temporary etcd instance. Create a separate etcd instance that runs independently of the main cluster.
  2. Load data from the backup copy into the temporary etcd instance. Use the existing etcd snapshot file to populate the temporary instance with the necessary data.
  3. Export the manifests of the required objects in YAML format.
  4. Restore the cluster objects from the exported YAML files.

Example of steps to restore objects from an etcd backup

In the example below, etcd-backup.snapshot is an etcd snapshot and infra-production is the namespace in which objects need to be restored.

  • To decode objects from etcd, you will need the auger tool. It can be built from source on any machine that has Docker installed (it cannot be built on cluster nodes).

    git clone -b v1.0.1 --depth 1 https://github.com/etcd-io/auger
    cd auger
    make release
    build/auger -h
    
  • The resulting executable build/auger, as well as the etcd snapshot from the backup copy, must be uploaded to the master node on which the following actions will be performed.

The following actions are performed on the master node to which the etcd snapshot file and the auger tool were copied:

  1. Set the correct access permissions for the backup file:

    chmod 644 etcd-backup.snapshot
    
  2. Set the full paths to the snapshot file and the tool in environment variables:

    SNAPSHOT=/root/etcd-restore/etcd-backup.snapshot
    AUGER_BIN=/root/auger
    chmod +x $AUGER_BIN
    
  3. Run a Pod with a temporary instance of etcd.
    • Create the Pod manifest. It is scheduled on the current master node via the $HOSTNAME variable and mounts the snapshot file specified by the $SNAPSHOT variable, which is then restored into the temporary etcd instance:

      cat <<EOF >etcd.pod.yaml 
      apiVersion: v1
      kind: Pod
      metadata:
        name: etcdrestore
        namespace: default
      spec:
        nodeName: $HOSTNAME
        tolerations:
        - operator: Exists
        initContainers:
        - command:
          - etcdutl
          - snapshot
          - restore
          - "/tmp/etcd-snapshot"
          - --data-dir=/default.etcd
          image: $(kubectl -n kube-system get pod -l component=etcd -o jsonpath="{.items[*].spec.containers[*].image}" | cut -f 1 -d ' ')
          imagePullPolicy: IfNotPresent
          name: etcd-snapshot-restore
          # Uncomment the fragment below to set limits for the container if the node does not have enough resources to run it.
          # resources:
          #   requests:
          #     ephemeral-storage: "200Mi"
          #   limits:
          #     ephemeral-storage: "500Mi"
          volumeMounts:
          - name: etcddir
            mountPath: /default.etcd
          - name: etcd-snapshot
            mountPath: /tmp/etcd-snapshot
            readOnly: true
        containers:
        - command:
          - etcd
          image: $(kubectl -n kube-system get pod -l component=etcd -o jsonpath="{.items[*].spec.containers[*].image}" | cut -f 1 -d ' ')
          imagePullPolicy: IfNotPresent
          name: etcd-temp
          volumeMounts:
          - name: etcddir
            mountPath: /default.etcd
        volumes:
        - name: etcddir
          emptyDir: {}
          # Use the snippet below instead of emptyDir: {} to set limits for the container if the node's resources are insufficient to run it.
          # emptyDir:
          #  sizeLimit: 500Mi
        - name: etcd-snapshot
          hostPath:
            path: $SNAPSHOT
            type: File
      EOF
      
    • Create Pod from the resulting manifest:

      d8 k create -f etcd.pod.yaml
      
  4. Set environment variables. In this example:

    • infra-production — the namespace to search resources in.

    • /root/etcd-restore/output — the directory for the recovered resource manifests.

    • /root/auger — the path to the auger executable.

      FILTER=infra-production
      BACKUP_OUTPUT_DIR=/root/etcd-restore/output
      mkdir -p $BACKUP_OUTPUT_DIR && cd $BACKUP_OUTPUT_DIR
      
  5. The commands below filter the required resources by $FILTER and write them to the $BACKUP_OUTPUT_DIR directory:

    files=($(kubectl -n default exec etcdrestore -c etcd-temp -- etcdctl  --endpoints=localhost:2379 get / --prefix --keys-only | grep "$FILTER"))
    for file in "${files[@]}"
    do
      OBJECT=$(kubectl -n default exec etcdrestore -c etcd-temp -- etcdctl  --endpoints=localhost:2379 get "$file" --print-value-only | $AUGER_BIN decode)
      FILENAME=$(echo $file | sed -e "s#/registry/##g;s#/#_#g")
      echo "$OBJECT" > "$BACKUP_OUTPUT_DIR/$FILENAME.yaml"
      echo $BACKUP_OUTPUT_DIR/$FILENAME.yaml
    done
    
  6. Restore objects from exported YAML files.

  7. Delete the Pod with a temporary instance of etcd:

    d8 k -n default delete pod etcdrestore
    

Restoring cluster objects from exported YAML files

To restore objects from exported YAML files, follow these steps:

  1. Prepare the YAML files for restoration. Before applying them to the cluster, remove technical fields that may be outdated or interfere with the recovery process:

    • creationTimestamp
    • UID
    • status

    You can edit these fields manually or use YAML/JSON processing tools such as yq or jq.
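
    For example, a minimal sketch of removing these fields in place with yq (v4 syntax; substitute the actual file path):

    yq -i 'del(.metadata.creationTimestamp) | del(.metadata.uid) | del(.status)' <PATH_TO_FILE>.yaml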

  2. Create the objects in the cluster. To restore individual resources, run:

    d8 k create -f <PATH_TO_FILE>.yaml
    

    You can specify either a single file or a directory path.

  3. To restore multiple objects at once, use the find command:

    find $BACKUP_OUTPUT_DIR -type f -name "*.yaml" -exec d8 k create -f {} \;
    

    This will locate all .yaml files within the specified $BACKUP_OUTPUT_DIR and apply them sequentially using d8 k create.

After completing these steps, the selected objects will be recreated in the cluster based on the definitions in the YAML files.

Restoring objects after changing the master node IP address

This section describes a scenario where only the IP address of the master node has changed, and all other objects in the etcd backup (such as CA certificates) remain valid. It assumes the restoration is performed in a single-master-node cluster.

To restore etcd objects after changing the master node’s IP address, follow these steps:

  1. Restore etcd from the backup. Use the standard etcd restore procedure with a snapshot. Make sure not to change any parameters during restoration other than the etcd data itself.

  2. Update the IP address in static configuration files:

    • Check the Kubernetes component manifest files located in /etc/kubernetes/manifests/.
    • Review kubelet’s system configuration files (typically found in /etc/systemd/system/kubelet.service.d/ or similar directories).
    • Update the IP address in any other configurations that reference the old address, if necessary.
  3. Regenerate certificates that were issued for the old IP. Delete or move old certificates related to the API server and etcd (if applicable). Then generate new certificates, specifying the new master node IP address as a SAN (Subject Alternative Name).

  4. Restart all services that use the updated configurations and certificates. Force kubelet to restart control-plane manifests (API server, etcd, etc.). Either restart the system services manually (e.g., systemctl restart kubelet) or ensure they restart automatically.

  5. Wait for kubelet to regenerate its own certificate.

These actions can be performed either automatically using a script, or manually by running the required commands step-by-step.

Automated recovery when changing the IP address

To simplify cluster recovery after the master node’s IP address changes, use the script provided below. Before running the script:

  1. Specify the correct paths and IP addresses:
    • ETCD_SNAPSHOT_PATH: The path to the etcd snapshot backup.
    • OLD_IP: The old master node IP address used when the backup was created.
    • NEW_IP: The new IP address of the master node.
  2. Make sure the Kubernetes version (KUBERNETES_VERSION) matches the one used in the cluster. This is necessary for downloading the correct version of kubeadm.
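
    For example, you can check the version used by the node locally:

    kubelet --version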

  3. Download etcdutl if it is not installed.

  4. After running the script, wait for the kubelet to regenerate its certificate with the new IP address. You can verify this in the /var/lib/kubelet/pki/ directory, where a new certificate should appear.
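
    For example:

    ls -la /var/lib/kubelet/pki/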

Recovery script

ETCD_SNAPSHOT_PATH="./etcd-backup.snapshot" # Path to the etcd snapshot.
OLD_IP=10.242.32.34                         # Old master node IP address.
NEW_IP=10.242.32.21                         # New master node IP address.
KUBERNETES_VERSION=1.28.0                   # Kubernetes version.

mv /etc/kubernetes/manifests/etcd.yaml ~/etcd.yaml
mkdir ./etcd_old
mv /var/lib/etcd ./etcd_old
ETCDUTL_PATH=$(find /var/lib/containerd/ -name etcdutl -print | tail -n 1)

ETCDCTL_API=3 $ETCDUTL_PATH snapshot restore "$ETCD_SNAPSHOT_PATH" --data-dir=/var/lib/etcd

mv ~/etcd.yaml /etc/kubernetes/manifests/etcd.yaml

find /etc/kubernetes/ -type f -exec sed -i "s/$OLD_IP/$NEW_IP/g" {} ';'
find /etc/systemd/system/kubelet.service.d -type f -exec sed -i "s/$OLD_IP/$NEW_IP/g" {} ';'
find  /var/lib/bashible/ -type f -exec sed -i "s/$OLD_IP/$NEW_IP/g" {} ';'

mkdir -p ./old_certs/etcd
mv /etc/kubernetes/pki/apiserver.* ./old_certs/
mv /etc/kubernetes/pki/etcd/server.* ./old_certs/etcd/
mv /etc/kubernetes/pki/etcd/peer.* ./old_certs/etcd/

curl -LO https://dl.k8s.io/v$KUBERNETES_VERSION/bin/linux/amd64/kubeadm
chmod +x kubeadm
./kubeadm init phase certs all --config /etc/kubernetes/deckhouse/kubeadm/config.yaml

crictl ps --name 'kube-apiserver' -o json | jq -r '.containers[0].id' | xargs crictl stop
crictl ps --name 'kubernetes-api-proxy' -o json | jq -r '.containers[0].id' | xargs crictl stop
crictl ps --name 'etcd' -o json | jq -r '.containers[].id' | xargs crictl stop

systemctl daemon-reload
systemctl restart kubelet.service

Manual recovery after changing the IP address

If you prefer to manually make changes during cluster recovery after the master node’s IP address has changed, follow these steps:

  1. Restore etcd from the backup:

    • Move the etcd manifest so that kubelet stops the corresponding pod:

      mv /etc/kubernetes/manifests/etcd.yaml ~/etcd.yaml
      
    • Create a directory to temporarily store the previous etcd data:

      mkdir ./etcd_old
      mv /var/lib/etcd ./etcd_old
      
    • Find or download the etcdutl utility if it’s not available, and perform the snapshot restore:

      ETCD_SNAPSHOT_PATH="./etcd-backup.snapshot" # Path to the etcd snapshot.
      ETCDUTL_PATH=$(find /var/lib/containerd/ -name etcdutl -print | tail -n 1)

      ETCDCTL_API=3 $ETCDUTL_PATH snapshot restore \
        "$ETCD_SNAPSHOT_PATH" \
        --data-dir=/var/lib/etcd
      
    • Restore the etcd manifest so kubelet starts the etcd pod again:

      mv ~/etcd.yaml /etc/kubernetes/manifests/etcd.yaml
      
    • Verify etcd is running by checking the pod list using crictl ps | grep etcd or reviewing the kubelet logs.

  2. Update the IP address in static configuration files. If the old IP address is used in manifests or kubelet services, replace it with the new one:

     OLD_IP=10.242.32.34                         # Old master node IP address.
     NEW_IP=10.242.32.21                         # New master node IP address.
    
     find /etc/kubernetes/ -type f -exec sed -i "s/$OLD_IP/$NEW_IP/g" {} ';'
     find /etc/systemd/system/kubelet.service.d -type f -exec sed -i "s/$OLD_IP/$NEW_IP/g" {} ';'
     find  /var/lib/bashible/ -type f -exec sed -i "s/$OLD_IP/$NEW_IP/g" {} ';'
    
  3. Regenerate certificates issued for the old IP address:

    • Prepare a directory to store the old certificates:

      mkdir -p ./old_certs/etcd
      mv /etc/kubernetes/pki/apiserver.* ./old_certs/
      mv /etc/kubernetes/pki/etcd/server.* ./old_certs/etcd/
      mv /etc/kubernetes/pki/etcd/peer.* ./old_certs/etcd/
      
    • Install or download kubeadm to match the current Kubernetes version:

      KUBERNETES_VERSION=1.28.0 # Kubernetes version.
      curl -LO https://dl.k8s.io/v$KUBERNETES_VERSION/bin/linux/amd64/kubeadm
      chmod +x kubeadm
      
    • Generate new certificates with the updated IP:

      ./kubeadm init phase certs all --config /etc/kubernetes/deckhouse/kubeadm/config.yaml
      

      The new IP address will be included in the generated certificates.

  4. Restart all services that use the updated configuration and certificates. To immediately stop active containers, run:

     crictl ps --name 'kube-apiserver' -o json | jq -r '.containers[0].id' | xargs crictl stop
     crictl ps --name 'kubernetes-api-proxy' -o json | jq -r '.containers[0].id' | xargs crictl stop
     crictl ps --name 'etcd' -o json | jq -r '.containers[].id' | xargs crictl stop
    
     systemctl daemon-reload
     systemctl restart kubelet.service
    

    Kubelet will restart the necessary pods, and Kubernetes components will load the new certificates.

  5. Wait for kubelet to regenerate its own certificate. Kubelet will automatically generate a new certificate with the updated IP address:

    • Check the /var/lib/kubelet/pki/ directory.
    • Ensure the new certificate is present and valid.
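
    For example, a minimal check (assuming kubelet serving certificate rotation is enabled and the current certificate is kubelet-server-current.pem; the exact file name may differ in your setup):

     ls -la /var/lib/kubelet/pki/
     openssl x509 -in /var/lib/kubelet/pki/kubelet-server-current.pem -noout -dates -ext subjectAltName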

Once these steps are completed, the cluster will be successfully restored and fully functional with the new master node IP address.

Creating backups with Deckhouse CLI

Deckhouse CLI (d8) provides the backup command for creating backups of various cluster components:

  • etcd: Snapshot of the Deckhouse key-value data store.
  • cluster-config: Archive containing key configuration objects of the cluster.
  • loki: Export of logs from the built-in Loki API.

Backing up etcd

An etcd snapshot allows you to preserve the current state of the cluster at the key-value storage level. This is a full dump that can be used for recovery.

To create a snapshot, run the following command:

d8 backup etcd <path-to-snapshot> [flags]

Flags:

  • -p, --etcd-pod string: Name of the etcd pod to snapshot.
  • -h, --help: Show help for the etcd command.
  • --verbose: Enable verbose output for detailed logging.

Example:

d8 backup etcd etcd-backup.snapshot

Example output:

2025/04/22 08:38:58 Trying to snapshot etcd-sandbox-master-0
2025/04/22 08:39:01 Snapshot successfully taken from etcd-sandbox-master-0

Automatic etcd backup

Deckhouse automatically performs a daily etcd backup using the d8-etcd-backup CronJob in the kube-system namespace. The job creates a snapshot of the database, compresses it, and saves the archive locally on the node at /var/lib/etcd/:

etcdctl snapshot save etcd-backup.snapshot
tar -czvf etcd-backup.tar.gz etcd-backup.snapshot
mv etcd-backup.tar.gz /var/lib/etcd/etcd-backup.tar.gz

To configure automatic etcd backups, use the control-plane-manager module. The required parameters are specified in its configuration:

  • etcd.backup.enabled: Enables the daily etcd backup.
  • etcd.backup.cronSchedule: Cron-formatted schedule for running the backup. The local time of kube-controller-manager is used.
  • etcd.backup.hostPath: Path on the master nodes where etcd backup archives will be stored.

Example configuration fragment:

apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: control-plane-manager
spec:
  version: 1
  settings:
    etcd:
      backup:
        enabled: true
        cronSchedule: "0 1 * * *"
        hostPath: "/var/lib/etcd"

Cluster configuration backup

The d8 backup cluster-config command creates an archive containing a set of key resources related to the cluster configuration. This is not a full backup of all objects, but a specific whitelist.

To create the backup, run the following command:

d8 backup cluster-config <path-to-backup-file>

Example:

d8 backup cluster-config /backup/cluster-config-2025-04-21.tar

The archive includes only those objects that meet the following criteria:

  • CustomResource objects whose CRDs are annotated with:

    backup.deckhouse.io/cluster-config=true
    
  • StorageClasses with the label:

    heritage=deckhouse
    
  • Secrets and ConfigMaps from namespaces starting with d8- or kube-, if they are explicitly listed in the whitelist file.

  • Cluster-level Roles and Bindings (ClusterRole and ClusterRoleBinding), if they are not labeled with:

    heritage=deckhouse
    

The backup includes only CR objects, but not the CRD definitions themselves. To fully restore the cluster, the corresponding CRDs must already be present (e.g., installed by Deckhouse modules).
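
For example, to see which CRDs fall under the first criterion (annotated for cluster-config backup), you can run a check like this (a sketch using jq):

d8 k get crds -o json | jq -r '.items[] | select(.metadata.annotations["backup.deckhouse.io/cluster-config"] == "true") | .metadata.name'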

Example whitelist content:

  • d8-system namespace:
    • Secret: d8-cluster-terraform-state, $regexp:^d8-node-terraform-state-(.*)$, deckhouse-registry
    • ConfigMap: d8-deckhouse-version-info
  • kube-system namespace:
    • ConfigMap: d8-cluster-is-bootstraped, d8-cluster-uuid, extension-apiserver-authentication
    • Secret: d8-cloud-provider-discovery-data, d8-cluster-configuration, d8-cni-configuration, d8-control-plane-manager-config, d8-node-manager-cloud-provider, d8-pki, d8-provider-cluster-configuration, d8-static-cluster-configuration, d8-secret-encryption-key
  • d8-cert-manager namespace:
    • Secret: cert-manager-letsencrypt-private-key, selfsigned-ca-key-pair

Exporting logs from Loki

The d8 backup loki command is intended for exporting logs from the built-in Loki. This is not a full backup, but rather a diagnostic export: the resulting data cannot be restored back into Loki.

To perform the export, d8 accesses the Loki API using the loki ServiceAccount in the d8-monitoring namespace, authenticated via a token stored in a Kubernetes secret.

The loki ServiceAccount is automatically created starting from Deckhouse v1.69.0. However, to use the d8 backup loki command, you must manually create the token secret and assign the necessary Role and RoleBinding if they are not already present.

Apply the manifests below before running d8 backup loki to ensure the command can properly authenticate and access the Loki API.

Example manifests:

---
apiVersion: v1
kind: Secret
metadata:
  name: loki-api-token
  namespace: d8-monitoring
  annotations:
    kubernetes.io/service-account.name: loki
type: kubernetes.io/service-account-token
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: access-to-loki-from-d8
  namespace: d8-monitoring
rules:
  - apiGroups: ["apps"]
    resources:
      - "statefulsets/http"
    resourceNames: ["loki"]
    verbs: ["create", "get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: access-to-loki-from-d8
  namespace: d8-monitoring
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: access-to-loki-from-d8
subjects:
  - kind: ServiceAccount
    name: loki
    namespace: d8-monitoring
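
Before running the export, you can verify that the token for the loki ServiceAccount has been issued, for example:

d8 k -n d8-monitoring get secret loki-api-token -o jsonpath='{.data.token}' | wc -c

A non-zero value means the token is in place.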

To create a log backup, run the following command:

d8 backup loki [flags]

Example:

d8 backup loki --days 1 > ./loki.log

Flags:

  • --start, --end: Time range boundaries in the format “YYYY-MM-DD HH:MM:SS”.
  • --days: The time window size for log export (default is 5 days).
  • --limit: The maximum number of log lines per request (default is 5000).
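
For example, to export logs for a specific time range (boundaries in the format described for --start and --end):

d8 backup loki --start "2025-04-20 00:00:00" --end "2025-04-21 00:00:00" > ./loki.log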

You can list all available flags using the following command:

d8 backup loki --help