The module lifecycle stage: General Availability
Volume cleanup on deletion
Volume cleanup is available only in commercial editions of Deckhouse.
When files are deleted, the operating system does not physically delete the content but only marks the corresponding blocks as “free”. If a new volume receives physical blocks previously used by another volume, data from the previous user may remain in them.
Data leakage scenario example
- User #1 placed files in a volume requested from StorageClass 1 on node 1 (in “Block” or “Filesystem” mode).
- User #1 deleted the files and the volume.
- The physical blocks that the volume occupied become “free” but are not overwritten.
- User #2 requested a new volume from StorageClass 1 on node 1 in “Block” mode.
- There is a risk that some or all blocks previously occupied by user #1 will be allocated to user #2 again.
- In this case, user #2 may recover user #1’s data.
Thick volumes
To prevent data leakage through thick volumes, the volumeCleanup parameter is provided, which allows you to select a method for cleaning the volume before deleting the PersistentVolume (PV).
Possible values:
- Parameter not set: No additional actions are performed when the volume is deleted. Data may remain accessible to the next user.
- `RandomFillSinglePass`: The volume is overwritten with random data once before deletion. Not recommended for solid-state drives, as it reduces the drive's lifespan.
- `RandomFillThreePass`: The volume is overwritten with random data three times before deletion. Not recommended for solid-state drives, as it reduces the drive's lifespan.
- `Discard`: All blocks of the volume are marked as free using the `discard` system call before deletion. Use this option only for solid-state drives.
Most modern solid-state drives guarantee that a block marked with `discard` will not return its previous data when read. This makes the `Discard` option the most effective way to prevent leaks on solid-state drives.

However, clearing a cell is a relatively slow operation, so the device performs it in the background. In addition, many drives cannot clear individual cells, only groups of cells (pages). Because of this, not all drives guarantee that freed data becomes unavailable immediately, and even drives that claim this guarantee do not always honor it.

Do not use a drive unless it guarantees Deterministic Read After TRIM (DRAT) and Deterministic Read Zero after TRIM (RZAT) and this behavior has been verified.
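As an illustration, volume cleanup for thick volumes is configured in the storage class definition. The sketch below assumes the `volumeCleanup` field sits under `spec.lvm` of a LocalStorageClass; the resource name and the LVMVolumeGroup name are hypothetical, so check the LocalStorageClass reference for the exact field location in your module version:

```yaml
apiVersion: storage.deckhouse.io/v1alpha1
kind: LocalStorageClass
metadata:
  name: thick-with-cleanup   # hypothetical name
spec:
  lvm:
    type: Thick
    # Assumed location of the cleanup setting; allowed values:
    # RandomFillSinglePass, RandomFillThreePass, Discard
    volumeCleanup: Discard
    lvmVolumeGroups:
      - name: vg-1-on-worker-0   # hypothetical LVMVolumeGroup name
  reclaimPolicy: Delete
  volumeBindingMode: WaitForFirstConsumer
```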
Thin volumes
When a thin volume block is freed via `discard` from the guest operating system, the command is forwarded to the underlying device. If a hard disk is used, or if the solid-state drive does not support `discard`, the data may remain in the thin pool until the block is reused.

Users are given access only to thin volumes, not to the thin pool itself; they can only receive a volume allocated from the pool. For thin volumes, a thin-pool block is zeroed when it is reused, which prevents leaks between clients. This is guaranteed by the `thin_pool_zero=1` setting in LVM.
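To confirm on a node that zeroing is enabled for a thin pool, you can query LVM directly. A minimal sketch, assuming a volume group `vg-1` with a thin pool `thindata` (both names are hypothetical):

```shell
# Global default for zeroing newly provisioned thin-pool blocks (1 = enabled)
lvmconfig allocation/thin_pool_zero

# Per-pool setting: the "Zero" column shows whether zeroing is enabled
lvs -o lv_name,zero vg-1/thindata
```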
Migrating data between PVCs
Use the following script to transfer data from one PVC to another:
- Copy the script below to a file named `migrate.sh` on any master node.
- Run the script with the required parameters:

  ```shell
  migrate.sh NAMESPACE SOURCE_PVC_NAME DESTINATION_PVC_NAME
  ```
```shell
#!/bin/bash

ns=$1
src=$2
dst=$3

if [[ -z $3 ]]; then
  echo "You must give as args: namespace source_pvc_name destination_pvc_name"
  exit 1
fi

echo "Creating job yaml"
cat > migrate-job.yaml << EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: migrate-pv-$src
  namespace: $ns
spec:
  template:
    spec:
      containers:
      - name: migrate
        image: debian
        command: [ "/bin/bash", "-c" ]
        args:
        - apt-get update && apt-get install -y rsync &&
          ls -lah /src_vol /dst_vol &&
          df -h &&
          rsync -avPS --delete /src_vol/ /dst_vol/ &&
          ls -lah /dst_vol/ &&
          du -shxc /src_vol/ /dst_vol/
        volumeMounts:
        - mountPath: /src_vol
          name: src
          readOnly: true
        - mountPath: /dst_vol
          name: dst
      restartPolicy: Never
      volumes:
      - name: src
        persistentVolumeClaim:
          claimName: $src
      - name: dst
        persistentVolumeClaim:
          claimName: $dst
  backoffLimit: 1
EOF

kubectl create -f migrate-job.yaml
kubectl -n $ns get jobs -o wide

kubectl_completed_check=0

echo "Waiting for data migration to be completed"
# Poll the migration pod until it reaches the Completed state
while [[ $kubectl_completed_check -eq 0 ]]; do
  kubectl -n $ns get pods | grep migrate-pv-$src
  sleep 5
  kubectl_completed_check=$(kubectl -n $ns get pods | grep migrate-pv-$src | grep "Completed" | wc -l)
done
echo "Data migration completed"
```
Creating volume snapshots
The ability to work with volume snapshots is available only in commercial editions of Deckhouse Kubernetes Platform and only when using LVM-thin volumes.
To work with volume snapshots, the snapshot-controller module must be connected.
For detailed information about snapshots, see the Kubernetes documentation.
To create a volume snapshot, follow these steps:
- Enable the `snapshot-controller` module:

  ```shell
  d8 s module enable snapshot-controller
  ```

- To create a volume snapshot, run the following command with the necessary parameters:

  ```shell
  d8 k apply -f - <<EOF
  apiVersion: snapshot.storage.k8s.io/v1
  kind: VolumeSnapshot
  metadata:
    name: my-snapshot
    namespace: <namespace-name> # Namespace name where the PVC is located
  spec:
    volumeSnapshotClassName: sds-local-volume-snapshot-class
    source:
      persistentVolumeClaimName: <pvc-name> # Name of the PVC for which the snapshot is created
  EOF
  ```

  Warning: the `sds-local-volume-snapshot-class` class is created automatically. The `deletionPolicy` parameter is set to `Delete`, so `VolumeSnapshotContent` is deleted when the associated `VolumeSnapshot` is deleted.

- Check the snapshot status:

  ```shell
  d8 k get volumesnapshot
  ```

  The command outputs a list of all snapshots and their current status.
Setting StorageClass as default
Add the `storageclass.kubernetes.io/is-default-class: "true"` annotation to the corresponding StorageClass resource:

```shell
d8 k annotate storageclasses.storage.k8s.io <storageClassName> storageclass.kubernetes.io/is-default-class=true
```
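To verify the change, list the StorageClasses; the default one is marked with `(default)` next to its name:

```shell
d8 k get sc
```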
Selecting nodes for module operation
The module runs on nodes that have the labels specified in the `nodeSelector` field of the module settings. To view or change these labels:
- To display the module settings, run the following command:

  ```shell
  d8 k edit mc sds-local-volume
  ```

  Example output:

  ```yaml
  apiVersion: deckhouse.io/v1alpha1
  kind: ModuleConfig
  metadata:
    name: sds-local-volume
  spec:
    enabled: true
    settings:
      dataNodes:
        nodeSelector:
          my-custom-label-key: my-custom-label-value
  status:
    message: ""
    version: "1"
  ```

- Run the command to view labels in the `nodeSelector` field:

  ```shell
  d8 k get mc sds-local-volume -o=jsonpath={.spec.settings.dataNodes.nodeSelector}
  ```

  Example output:

  ```yaml
  nodeSelector:
    my-custom-label-key: my-custom-label-value
  ```

- The module selects nodes that have all the specified labels. Modify the `nodeSelector` field to change the list of nodes, then add the corresponding labels to the desired nodes (see the example after this list).

  Warning: you can specify any number of labels in the `nodeSelector` field, but all of them must be present on a node. The module starts the `csi-node` pod only on nodes that have all the specified labels.

- After adding labels, verify that the `csi-node` pods are running on the nodes:

  ```shell
  d8 k -n d8-sds-local-volume get pod -owide
  ```
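For example, to add a node to module management using the label from the example above (the node name `worker-0` is hypothetical):

```shell
d8 k label node worker-0 my-custom-label-key=my-custom-label-value
```

After the label is applied, the module should schedule a `csi-node` pod to that node.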
Removing a node from module management
To remove a node from module management, remove the labels specified in the nodeSelector field in the module settings. To do this:
- Run the command to view labels in `nodeSelector`:

  ```shell
  d8 k get mc sds-local-volume -o=jsonpath={.spec.settings.dataNodes.nodeSelector}
  ```

  Example output:

  ```yaml
  nodeSelector:
    my-custom-label-key: my-custom-label-value
  ```

- Remove the specified labels from the node (see the example after this list):

  ```shell
  d8 k label node %node-name% %label-from-selector%-
  ```

  Warning: to remove a label, add a minus sign after the label key instead of its value.

- Verify that the `csi-node` pod has been removed from the node:

  ```shell
  d8 k -n d8-sds-local-volume get po -owide
  ```
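For example, to detach a node labeled as in the example above (the node name `worker-0` is hypothetical):

```shell
d8 k label node worker-0 my-custom-label-key-
```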
If the `csi-node` pod remains on the node after the labels are removed:

- Verify that the labels are actually removed from the node:

  ```shell
  d8 k get node <node-name> --show-labels
  ```

- Ensure that the node has no LVMVolumeGroup resources that are used in LocalStorageClass resources. For details, see "Checking dependent LVMVolumeGroup resources on a node".

  Note that LVMVolumeGroup and LocalStorageClass resources that prevent the node from being removed from module management carry the `storage.deckhouse.io/sds-local-volume-candidate-for-eviction` label. The node itself carries the `storage.deckhouse.io/sds-local-volume-need-manual-eviction` label.
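To find the resources that block the removal, you can filter by the labels mentioned above. A sketch relying only on standard label selectors (the label is matched by key, without a value):

```shell
# LVMVolumeGroup and LocalStorageClass resources marked as eviction candidates
d8 k get lvg,lsc -l storage.deckhouse.io/sds-local-volume-candidate-for-eviction

# Check whether the node is flagged for manual eviction
d8 k get node <node-name> --show-labels | grep sds-local-volume-need-manual-eviction
```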
Creating thin storage
- Get a list of available BlockDevice resources in the cluster:

  ```shell
  d8 k get bd
  ```

  Example output:

  ```console
  NAME                                           NODE       CONSUMABLE   SIZE    PATH
  dev-ef4fb06b63d2c05fb6ee83008b55e486aa1161aa   worker-0   false        100Gi   /dev/nvme1n1
  dev-7e4df1ddf2a1b05a79f9481cdf56d29891a9f9d0   worker-1   false        100Gi   /dev/nvme1n1
  dev-53d904f18b912187ac82de29af06a34d9ae23199   worker-2   false        100Gi   /dev/nvme1n1
  ```

- Create an LVMVolumeGroup resource for the `worker-0` node:

  ```shell
  d8 k apply -f - <<EOF
  apiVersion: storage.deckhouse.io/v1alpha1
  kind: LVMVolumeGroup
  metadata:
    name: "vg-2-on-worker-0"
  spec:
    type: Local
    local:
      nodeName: "worker-0"
    blockDeviceSelector:
      matchExpressions:
        - key: kubernetes.io/metadata.name
          operator: In
          values:
            - dev-ef4fb06b63d2c05fb6ee83008b55e486aa1161aa
    actualVGNameOnTheNode: "vg-2"
    thinPools:
      - name: thindata
        size: 100Gi
  EOF
  ```

- Wait for the LVMVolumeGroup resource to transition to the `Ready` state:

  ```shell
  d8 k get lvg vg-2-on-worker-0 -w
  ```

  After it reaches the `Ready` state, an LVM volume group named `vg-2` with a thin pool named `thindata` will be created on the `worker-0` node from the block device `/dev/nvme1n1`.

- Create an LVMVolumeGroup resource for the `worker-1` node:

  ```shell
  d8 k apply -f - <<EOF
  apiVersion: storage.deckhouse.io/v1alpha1
  kind: LVMVolumeGroup
  metadata:
    name: "vg-2-on-worker-1"
  spec:
    type: Local
    local:
      nodeName: "worker-1"
    blockDeviceSelector:
      matchExpressions:
        - key: kubernetes.io/metadata.name
          operator: In
          values:
            - dev-7e4df1ddf2a1b05a79f9481cdf56d29891a9f9d0
    actualVGNameOnTheNode: "vg-2"
    thinPools:
      - name: thindata
        size: 100Gi
  EOF
  ```

- Wait for the LVMVolumeGroup resource to transition to the `Ready` state:

  ```shell
  d8 k get lvg vg-2-on-worker-1 -w
  ```

  After it reaches the `Ready` state, an LVM volume group named `vg-2` with a thin pool named `thindata` will be created on the `worker-1` node from the block device `/dev/nvme1n1`.

- Create an LVMVolumeGroup resource for the `worker-2` node:

  ```shell
  d8 k apply -f - <<EOF
  apiVersion: storage.deckhouse.io/v1alpha1
  kind: LVMVolumeGroup
  metadata:
    name: "vg-2-on-worker-2"
  spec:
    type: Local
    local:
      nodeName: "worker-2"
    blockDeviceSelector:
      matchExpressions:
        - key: kubernetes.io/metadata.name
          operator: In
          values:
            - dev-53d904f18b912187ac82de29af06a34d9ae23199
    actualVGNameOnTheNode: "vg-2"
    thinPools:
      - name: thindata
        size: 100Gi
  EOF
  ```

- Wait for the LVMVolumeGroup resource to transition to the `Ready` state:

  ```shell
  d8 k get lvg vg-2-on-worker-2 -w
  ```

  After it reaches the `Ready` state, an LVM volume group named `vg-2` with a thin pool named `thindata` will be created on the `worker-2` node from the block device `/dev/nvme1n1`.

- Create a LocalStorageClass resource:

  ```shell
  d8 k apply -f - <<EOF
  apiVersion: storage.deckhouse.io/v1alpha1
  kind: LocalStorageClass
  metadata:
    name: local-storage-class
  spec:
    lvm:
      lvmVolumeGroups:
        - name: vg-2-on-worker-0
          thin:
            poolName: thindata
        - name: vg-2-on-worker-1
          thin:
            poolName: thindata
        - name: vg-2-on-worker-2
          thin:
            poolName: thindata
      type: Thin
    reclaimPolicy: Delete
    volumeBindingMode: WaitForFirstConsumer
  EOF
  ```

- Wait for the LocalStorageClass resource to transition to the `Created` state:

  ```shell
  d8 k get lsc local-storage-class -w
  ```

- Verify that the corresponding StorageClass has been created:

  ```shell
  d8 k get sc local-storage-class
  ```
You can now create PVCs, specifying the StorageClass named `local-storage-class`.
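As an illustration, a PVC requesting a thin volume from this StorageClass could look like the following (the PVC name and size are hypothetical):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-thin-pvc   # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: local-storage-class
```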
Checking dependent LVMVolumeGroup resources on a node
Follow these steps:
- Display LocalStorageClass resources:

  ```shell
  d8 k get lsc
  ```

- Check the list of LVMVolumeGroup resources used in each LocalStorageClass.

  Display the contents of all LocalStorageClass resources:

  ```shell
  d8 k get lsc -oyaml
  ```

  Or display the contents of a specific resource:

  ```shell
  d8 k get lsc <lsc-name> -oyaml
  ```

  Example of a LocalStorageClass resource:

  ```yaml
  apiVersion: v1
  items:
    - apiVersion: storage.deckhouse.io/v1alpha1
      kind: LocalStorageClass
      metadata:
        finalizers:
          - storage.deckhouse.io/local-storage-class-controller
        name: test-sc
      spec:
        fsType: ext4
        lvm:
          lvmVolumeGroups:
            - name: test-vg
          type: Thick
        reclaimPolicy: Delete
        volumeBindingMode: WaitForFirstConsumer
      status:
        phase: Created
  kind: List
  ```

  The `spec.lvm.lvmVolumeGroups` field lists the LVMVolumeGroup resources in use.

- Display the list of LVMVolumeGroup resources:

  ```shell
  d8 k get lvg
  ```

  Example output:

  ```console
  NAME                 THINPOOLS   CONFIGURATION APPLIED   PHASE   NODE          SIZE     ALLOCATED SIZE   VG    AGE
  vg0-on-astra-1-8     0/0         True                    Ready   astra-1-8     5116Mi   0                vg0   180d
  vg0-on-master-0      0/0         True                    Ready   p-master-0    5116Mi   0                vg0   182d
  vg0-on-redos-murom   0/0         True                    Ready   redos-murom   5116Mi   0                vg0   32d
  vg0-on-worker-1      0/0         True                    Ready   p-worker-1    5116Mi   0                vg0   225d
  vg0-on-worker-2      0/0         True                    Ready   p-worker-2    5116Mi   0                vg0   225d
  vg1-on-redos-murom   1/1         True                    Ready   redos-murom   3068Mi   3008Mi           vg1   32d
  vg1-on-worker-1      1/1         True                    Ready   p-worker-1    3068Mi   3068Mi           vg1   190d
  vg1-on-worker-2      1/1         True                    Ready   p-worker-2    3068Mi   3068Mi           vg1   190d
  ```

- Verify that the node has no LVMVolumeGroup resources that are used in LocalStorageClass resources.
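To quickly see which LVMVolumeGroup resources reside on a particular node, you can filter the list by the NODE column. A simple sketch (the node name is taken from the example output above):

```shell
d8 k get lvg | grep -w p-worker-1
```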
Before removing a node from module management, delete dependent resources manually to avoid losing control over created volumes.