The module lifecycle stage: Experimental
The module has requirements for installation
Enabling Required Modules
The sds-elastic module is in Experimental stage. Experimental modules are not enabled by default. Set allowExperimentalModules: true in the deckhouse ModuleConfig before enabling the module.
Enable sds-elastic together with its companion modules:
- sds-node-configurator: owns the BlockDevice and LVMVolumeGroup CRDs that ElasticCluster selects from.
- csi-ceph: owns the CephClusterConnection and CephStorageClass CRDs the controller writes into.
- snapshot-controller: optional, needed only for VolumeSnapshot support.
d8 k apply -f - <<EOF
apiVersion: v1
kind: List
items:
  - apiVersion: deckhouse.io/v1alpha1
    kind: ModuleConfig
    metadata:
      name: sds-node-configurator
    spec:
      enabled: true
      version: 1
  - apiVersion: deckhouse.io/v1alpha1
    kind: ModuleConfig
    metadata:
      name: snapshot-controller
    spec:
      enabled: true
      version: 1
  - apiVersion: deckhouse.io/v1alpha1
    kind: ModuleConfig
    metadata:
      name: csi-ceph
    spec:
      enabled: true
      version: 1
  - apiVersion: deckhouse.io/v1alpha1
    kind: ModuleConfig
    metadata:
      name: sds-elastic
    spec:
      enabled: true
      version: 1
EOF

Wait until every module reaches the Ready state:
d8 k get module sds-node-configurator snapshot-controller csi-ceph sds-elastic -w

Selecting Data Nodes
settings.dataNodes.nodeSelector declares which Kubernetes Nodes are eligible to host sds-elastic data. The controller places the label storage.deckhouse.io/sds-elastic-node="" on every matching Node and removes it from Nodes that no longer match.
Downstream consumers — the sds-node-configurator agent (it picks up BlockDevice discovery on data nodes) and your ElasticCluster.spec.storage.nodeSelector — use this label as a nodeAffinity term.
apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: sds-elastic
spec:
  enabled: true
  version: 1
  settings:
    dataNodes:
      nodeSelector:
        node-role.deckhouse.io/storage: ""

If the field is omitted, the empty selector matches every Node, so every Node in the cluster gets storage.deckhouse.io/sds-elastic-node="".
Narrowing dataNodes.nodeSelector does not redistribute data. If a Node that already hosts OSDs falls outside the new selector, its storage.deckhouse.io/sds-elastic-node label is removed and data on that Node becomes unreachable until the Node is brought back under the selector.
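Before narrowing the selector, it can help to list the Nodes that currently carry the data-node label but would fall outside the new selector. The sketch below is a minimal example, not part of the module: the function name is hypothetical, and it assumes the new selector is a single label key that eligible Nodes carry (as in the example above), so that "key is absent" means "Node will lose the label".

```shell
# List Nodes that carry the sds-elastic data-node label but do NOT carry
# the label key required by a narrowed dataNodes.nodeSelector.
# $1: label key the new selector will require (assumption: a single key).
nodes_losing_data_label() {
  # Selector syntax: "<key>" means the label exists, "!<key>" means it is absent.
  selector=$(printf '%s,!%s' 'storage.deckhouse.io/sds-elastic-node' "$1")
  d8 k get nodes -l "$selector" -o name
}

# Usage (hypothetical):
#   nodes_losing_data_label node-role.deckhouse.io/storage
```

Any Node printed by this check that already hosts OSDs should be brought under the new selector before the ModuleConfig is changed.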
Preparing Storage Nodes
ElasticCluster consumes BlockDevice CRs (managed by sds-node-configurator) selected by labels and provisions one OSD per matched device.
- Pick the nodes that will host Ceph daemons and label them. The example uses node-role.deckhouse.io/storage:

  d8 k label node <node-name> node-role.deckhouse.io/storage=

- Make sure each storage node has at least one unused raw block device (no partitions, filesystem, or LVM signatures). sds-node-configurator discovers them and creates a corresponding BlockDevice CR. Verify:

  d8 k get blockdevices.storage.deckhouse.io -o wide

- Add a label that the ElasticCluster will use to select OSD-eligible devices. The example uses app=elastic-osd:

  d8 k label blockdevice <bd-name> app=elastic-osd
Deploying an ElasticCluster
The example below bootstraps a Ceph cluster on every node carrying the node-role.deckhouse.io/storage label, consuming every BlockDevice labelled app=elastic-osd.
d8 k apply -f - <<EOF
apiVersion: storage.deckhouse.io/v1alpha1
kind: ElasticCluster
metadata:
  name: ceph-prod
spec:
  storage:
    nodeSelector:
      matchExpressions:
        - { key: node-role.deckhouse.io/storage, operator: Exists }
    blockDeviceSelector:
      matchLabels:
        app: elastic-osd
  network:
    public: 10.12.0.0/16
    cluster: 10.12.0.0/16
EOF

Wait until the ElasticCluster reports Ready:
d8 k get elasticcluster ceph-prod -w

The Phase column is expected to switch from Pending to InProgress and finally to Ready. The full per-stage progression is exposed through conditions: StorageReady → CephClusterReady → CredentialsReady → CsiCephReady → aggregate Ready.
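To see which stage a cluster is waiting on, the condition chain above can be walked in order. This is a minimal sketch under stated assumptions: both function names are hypothetical, and the conditions are assumed to live under .status.conditions, the usual Kubernetes layout.

```shell
# Print every condition of an ElasticCluster as "<type>=<status>", one per line.
# Assumption: conditions are stored under .status.conditions.
ec_conditions() {
  d8 k get elasticcluster "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
}

# Read "<type>=<status>" lines from stdin and print the first stage, in the
# documented order, that is not yet True. Prints nothing when all are True.
first_pending_stage() {
  conditions=$(cat)
  for stage in StorageReady CephClusterReady CredentialsReady CsiCephReady Ready; do
    if ! printf '%s\n' "$conditions" | grep -qx "${stage}=True"; then
      echo "$stage"
      return 0
    fi
  done
}

# Usage (hypothetical):
#   ec_conditions ceph-prod | first_pending_stage
```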
Verify the underlying objects:
d8 k get lvmvolumegroup -l sds-elastic.deckhouse.io/cluster=ceph-prod
d8 k get lvmlogicalvolume -l sds-elastic.deckhouse.io/cluster=ceph-prod
d8 k get pv -l sds-elastic.deckhouse.io/cluster=ceph-prod
d8 k -n d8-sds-elastic get pod -o wide

The controller also creates an internal ElasticClusterCredential that mirrors rook-ceph-mon Secret fields:

d8 k get elasticclustercredential ceph-prod -o yaml

BlockDevice Adoption and Ownership
Once an ElasticCluster selects a BlockDevice for the first time, the controller patches it with the sds-elastic.deckhouse.io/cluster=<cluster-name> label. The label is the durable record of which cluster owns the device and drives several behaviors:
- Single owner per BlockDevice. If a BlockDevice matches the blockDeviceSelector of two ElasticCluster resources, the second one cannot adopt it. The controller refuses to overwrite the existing label and surfaces StorageReady=False with Reason=OwnershipConflict and a message listing each contested BD and its current owner. No LVMVolumeGroup, LVMLogicalVolume, or local PersistentVolume is created until every conflict is resolved; even free BDs in the selector remain unadopted while a conflict is pending.

  To resolve a conflict, decide which cluster should own the BD and clear the label on the other side:

  d8 k label blockdevice <bd-name> sds-elastic.deckhouse.io/cluster-

  Or remove the conflicting ElasticCluster entirely. The next reconcile picks the BD up.

- Sticky adoption: adopted BlockDevices stay with the cluster. Once a BD has been labelled by the controller, it remains part of the cluster’s working set even if it later drifts out of blockDeviceSelector or nodeSelector (for example, the operator narrows the selector, the device’s labels change, or its node is relabelled). This is intentional: the OSD on top of it is already provisioned, the local PV is bound to a specific node, and dropping it from the working set would shrink CephCluster.spec.storageClassDeviceSets[0].count and risk data unavailability. The cluster’s OSD count is therefore monotonic for the lifetime of an ElasticCluster: it can grow when new BDs match the selector but never shrinks on its own.

  As a side effect, sds-node-configurator flips BlockDevice.status.consumable to false once a VG appears on the device. Sticky adoption prevents this from kicking the BD out of the working set on the very next reconcile.

- Releasing a BlockDevice. There is no automatic disown path at this experimental stage (planned as part of B20, OwnerReferences and finalizer-driven teardown). Deleting the ElasticCluster does NOT cascade to the per-device objects: the controller only removes the Rook CephCluster and the csi-ceph CephClusterConnection, leaving the LVMVolumeGroup / LVMLogicalVolume / local PersistentVolume and the BD label for you to clean up by label (see Deleting Resources below). To retire a single BD from a live cluster, manually delete the corresponding LVMLogicalVolume and LVMVolumeGroup, and only then clear the label:

  d8 k delete lvmlogicalvolume <name>
  d8 k delete lvmvolumegroup <name>
  d8 k label blockdevice <bd-name> sds-elastic.deckhouse.io/cluster-

  Doing this while pools still hold useful data risks losing replicas.

- Editing the selectors after creation. ElasticCluster.spec.storage.nodeSelector and spec.storage.blockDeviceSelector are editable after creation: run kubectl edit elasticcluster <name> and adjust the matchers. The validating webhook on UPDATE enforces two safety rails:

  - Orphan-guard. If an edit would leave an already-adopted BD outside the new selector pair (its labels no longer match blockDeviceSelector, or its status.nodeName is no longer in the set produced by nodeSelector), the webhook rejects the request and lists the offending BDs. Adopted BDs cannot be released automatically; follow the manual procedure above first.
  - Pre-flight conflict detection. If a widening edit would pull in a BD already labelled by another ElasticCluster, the webhook rejects the request and reports the contested BDs along with their current owners. Resolve the conflict (clear the label, or delete the other EC) before retrying.

  spec.network remains immutable on UPDATE: changing the public/cluster CIDRs on a live cluster invalidates mon endpoints and host-network bindings, and there is no safe automatic remediation. To change the network configuration, delete and re-create the ElasticCluster.
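The single-owner rule can be checked before creating or widening an ElasticCluster by listing devices that match the intended selector but already carry another cluster's ownership label. The following is a minimal sketch: the function name is hypothetical, and it assumes the device selector can be written as a plain label selector string such as app=elastic-osd.

```shell
# List BlockDevices that match a device selector but are already labelled
# as owned by a different ElasticCluster.
# $1: name of the cluster that should adopt the devices
# $2: label selector for the devices, for example "app=elastic-osd"
contested_blockdevices() {
  cluster="$1"
  selector="$2"
  owner_key="sds-elastic.deckhouse.io/cluster"
  # "<key>" requires the owner label to exist, "<key>!=<value>" excludes our own cluster.
  # -L adds the owner label as an extra output column.
  d8 k get blockdevice \
    -l "${selector},${owner_key},${owner_key}!=${cluster}" \
    -L "${owner_key}"
}

# Usage (hypothetical):
#   contested_blockdevices ceph-prod app=elastic-osd
```

An empty result suggests that no OwnershipConflict is expected for that selector at the time of the check.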
Declaring StorageClasses
Pools and the matching csi-ceph StorageClasses are declared per ElasticStorageClass. One ESC produces one Ceph pool + one CephStorageClass named after the ESC.
RBD pool with default replication (3 replicas)
d8 k apply -f - <<EOF
apiVersion: storage.deckhouse.io/v1alpha1
kind: ElasticStorageClass
metadata:
  name: ceph-prod-rbd
spec:
  clusterRef: ceph-prod
  type: RBD
  replication: ConsistencyAndAvailability
EOF

CephFS pool with default replication (3 replicas)
d8 k apply -f - <<EOF
apiVersion: storage.deckhouse.io/v1alpha1
kind: ElasticStorageClass
metadata:
  name: ceph-prod-cephfs
spec:
  clusterRef: ceph-prod
  type: CephFS
  replication: ConsistencyAndAvailability
EOF

The ErasureCodedCompact replication mode is temporarily disabled and cannot be selected.
Pool that survives two simultaneous host failures (HighRedundancy)
d8 k apply -f - <<EOF
apiVersion: storage.deckhouse.io/v1alpha1
kind: ElasticStorageClass
metadata:
  name: ceph-prod-rbd-hr
spec:
  clusterRef: ceph-prod
  type: RBD
  replication: HighRedundancy
EOF

HighRedundancy produces a 4-replica pool (size=4, min_size=2, requireSafeReplicaSize=true):
- two simultaneous host failures keep I/O continuous (the 2 remaining replicas equal min_size);
- a third simultaneous failure pauses I/O but does not lose data: Ceph backfills the surviving copy onto free cluster space and resumes;
- data is lost only at the fourth simultaneous failure.
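The three cases above follow from comparing the number of surviving replicas with min_size. The sketch below illustrates that arithmetic only; it is a simplified model with a hypothetical function name, assuming one replica per host so that every host failure removes exactly one replica.

```shell
# Classify the state of a replicated pool after simultaneous host failures.
# $1: pool size (replica count), $2: min_size, $3: number of failed hosts
# Simplifying assumption: each failed host held exactly one replica.
pool_state_after_failures() {
  size="$1"
  min_size="$2"
  failures="$3"
  remaining=$((size - failures))
  if [ "$remaining" -le 0 ]; then
    echo "data-lost"          # no replica survives
  elif [ "$remaining" -lt "$min_size" ]; then
    echo "io-paused"          # data survives, but below min_size
  else
    echo "io-continues"       # at least min_size replicas remain
  fi
}

# Usage for HighRedundancy (size=4, min_size=2):
#   pool_state_after_failures 4 2 2   # prints io-continues
#   pool_state_after_failures 4 2 3   # prints io-paused
#   pool_state_after_failures 4 2 4   # prints data-lost
```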
The mode requires at least 5 storage nodes (4 for the pool’s CRUSH placement at failureDomain=host and 5 to host a 5-mon quorum). The first time you create a HighRedundancy ESC against an ElasticCluster, the controller automatically promotes the underlying CephCluster to mon.count=5, mgr.count=3 (the standard topology is 3, 2). The promotion is sticky: deleting the last HighRedundancy ESC does NOT roll the counts back, because silently weakening a live cluster’s fault-tolerance guarantee is unsafe.
A validating webhook gates ESC creation on the same thresholds so the sticky promotion cannot fire on an undersized cluster. CREATE of an ESC with replication: HighRedundancy is rejected when:
- the parent ElasticCluster referenced by spec.clusterRef does not exist;
- fewer than 5 nodes match ElasticCluster.spec.storage.nodeSelector (the 5-mon quorum floor);
- adopted BlockDevice resources of the parent EC live on fewer than 4 distinct nodes (the 4-replica CRUSH placement floor).
So the bootstrap order is fixed: apply the ElasticCluster first, wait until at least four storage nodes have adopted BDs (check via kubectl get bd -l sds-elastic.deckhouse.io/cluster=<ec> or EC.status.phase=Ready), and only then apply the HighRedundancy ESC. Trying to ship the EC and the HR ESC in the same kubectl apply is rejected by admission — the EC arrives first, but its adopted-BD set is still empty when the ESC admission runs.
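The two numeric thresholds can be checked by hand before applying a HighRedundancy ESC. The following sketch mirrors them locally; it does not replace the webhook. The function names are hypothetical, and the usage lines assume the node-role.deckhouse.io/storage selector and the ceph-prod cluster from the earlier examples.

```shell
# Count distinct non-empty lines read from stdin.
count_distinct() {
  grep -v '^$' | sort -u | wc -l | tr -d ' '
}

# Compare observed counts with the documented HighRedundancy floors.
# $1: nodes matching the ElasticCluster nodeSelector
# $2: distinct nodes that host adopted BlockDevices
hr_preflight() {
  if [ "$1" -lt 5 ]; then
    echo "blocked: $1 selector nodes, need 5 for the mon quorum"
    return 1
  fi
  if [ "$2" -lt 4 ]; then
    echo "blocked: adopted BDs on $2 nodes, need 4 for CRUSH placement"
    return 1
  fi
  echo "ok"
}

# Usage (hypothetical):
#   selector_nodes=$(d8 k get nodes -l node-role.deckhouse.io/storage -o name | count_distinct)
#   bd_nodes=$(d8 k get bd -l sds-elastic.deckhouse.io/cluster=ceph-prod \
#     -o jsonpath='{range .items[*]}{.status.nodeName}{"\n"}{end}' | count_distinct)
#   hr_preflight "$selector_nodes" "$bd_nodes"
```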
The audit trail lives on ElasticCluster.status.cephTopology:
d8 k get elasticcluster ceph-prod -o jsonpath='{.status.cephTopology}'
# {"monCount":5,"mgrCount":3,"reason":"HighRedundancyESCPresent","lastPromotedAt":"2026-…"}

Possible reason values: Standard, HighRedundancyESCPresent, StickyHighWaterMark. To force a recompute (for example, after deliberately scaling down to a smaller cluster), clear the field via the status subresource and trigger a reconcile:

d8 k patch elasticcluster ceph-prod \
  --type=merge --subresource=status \
  -p '{"status":{"cephTopology":null}}'

Wait until each ESC reports Ready:

d8 k get elasticstorageclass -w

The conditions transition is PoolReady → CsiStorageClassReady → aggregate Ready.
Verify the resulting csi-ceph objects and Kubernetes StorageClasses:
d8 k get cephclusterconnection
d8 k get cephstorageclass
d8 k get sc

A CephClusterConnection named after the parent ElasticCluster (ceph-prod) and one CephStorageClass per ElasticStorageClass (ceph-prod-rbd, ceph-prod-cephfs) are expected. Each csi-ceph CephStorageClass produces a Kubernetes StorageClass with the same name, ready to be consumed by PersistentVolumeClaim resources.
The internal helm-managed StorageClass sds-elastic-osd (provisioner kubernetes.io/no-provisioner, volumeBindingMode: WaitForFirstConsumer) backs OSD-local PersistentVolumes and is intentionally not user-facing — ElasticStorageClass resources cannot reuse this name (the validating webhook rejects them).
Deleting Resources
Deleting an ElasticCluster
Deleting an ElasticCluster is reversible as long as no ElasticStorageClass still references it: the controller removes only the resources you cannot delete by hand — the Rook CephCluster and the csi-ceph CephClusterConnection, both protected by the vendor-cr-validation webhook. The OSD disks and the mon store are left intact, so the cluster can be re-created from the same devices.
Order of operations:
- Delete every dependent ElasticStorageClass first (see below). The controller refuses to start the cluster teardown while any ESC references it.

- Delete the ElasticCluster:

  d8 k delete elasticcluster ceph-prod

  While a finalizer holds the CR, the controller deletes the CephCluster and CephClusterConnection, then releases the CR.

- Clean up the remaining controller-labelled objects by hand; they are intentionally preserved (no automatic cascade):

  # inspect what is still labelled with the cluster name
  d8 k get pv,lvmlogicalvolume,lvmvolumegroup -l sds-elastic.deckhouse.io/cluster=ceph-prod
  d8 k delete pv -l sds-elastic.deckhouse.io/cluster=ceph-prod
  d8 k delete lvmlogicalvolume -l sds-elastic.deckhouse.io/cluster=ceph-prod
  d8 k delete lvmvolumegroup -l sds-elastic.deckhouse.io/cluster=ceph-prod
  # finally clear the cluster label from the BlockDevices
  d8 k label blockdevice -l sds-elastic.deckhouse.io/cluster=ceph-prod sds-elastic.deckhouse.io/cluster-

  Keep the ElasticClusterCredential if you plan to re-create the cluster with the same identity.
While the teardown is in progress the ElasticCluster Ready condition explains what is blocking it:
| Reason | Meaning | Action |
|---|---|---|
| StorageClassesExist | One or more ElasticStorageClass still reference this cluster. | Delete the listed ElasticStorageClass objects first. |
| VolumesExist | The storage backend still has bound PersistentVolumes. | Delete the remaining PersistentVolumes; teardown then continues automatically. |
| Terminating | Backend resources are being removed. | Wait for completion. |
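The reasons in the table above can be read straight from the Ready condition and mapped to the next step. This is a minimal sketch with hypothetical function names; it assumes the condition is stored under .status.conditions with a reason field, the usual Kubernetes layout.

```shell
# Print the reason of the Ready condition of an ElasticCluster.
# Assumption: conditions are stored under .status.conditions.
ec_ready_reason() {
  d8 k get elasticcluster "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].reason}'
}

# Translate a documented teardown reason into the next manual step.
ec_teardown_next_step() {
  case "$1" in
    StorageClassesExist) echo "delete the listed ElasticStorageClass objects" ;;
    VolumesExist)        echo "delete the remaining PersistentVolumes" ;;
    Terminating)         echo "wait for completion" ;;
    *)                   echo "unrecognized reason: $1" ;;
  esac
}

# Usage (hypothetical):
#   ec_teardown_next_step "$(ec_ready_reason ceph-prod)"
```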
Deleting an ElasticStorageClass
Delete an ElasticStorageClass to remove the corresponding pool and CephStorageClass:
d8 k delete elasticstorageclass ceph-prod-rbd

Deleting an ElasticStorageClass is destructive: it tears down the underlying storage pool / filesystem and the data stored in it. Make sure no application still needs the data first.
While a finalizer holds the ElasticStorageClass, the controller runs an ordered teardown:
- It refuses to delete anything while any PersistentVolume provisioned from this StorageClass is still Bound. Delete the consuming PersistentVolumeClaims first; this guard cannot be overridden.
- Once nothing is bound, it removes the CephStorageClass and tears down the backing pool / filesystem.
For block (RBD) classes, a pool that still holds data is preserved by default. To permanently delete it (the data in the pool is lost), authorise the destructive purge with the force-deletion annotation:
d8 k annotate elasticstorageclass ceph-prod-rbd sds-elastic.deckhouse.io/force-deletion=true

For shared-filesystem (CephFS) classes there is no force override: the filesystem is removed automatically once it is empty, which you achieve by deleting the remaining PersistentVolumes for the StorageClass.
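Because the force-deletion annotation does not bypass the bound-volume guard, it can be worth listing the PersistentVolumes of a class before deleting it. The sketch below uses hypothetical function names and relies only on standard PersistentVolume fields (spec.storageClassName, status.phase).

```shell
# List PersistentVolumes of a StorageClass as "<name> <phase>", one per line.
# $1: StorageClass name, for example ceph-prod-rbd
pvs_of_storageclass() {
  d8 k get pv --no-headers \
    -o custom-columns=NAME:.metadata.name,CLASS:.spec.storageClassName,PHASE:.status.phase |
    awk -v sc="$1" '$2 == sc { print $1, $3 }'
}

# Succeed only when no PersistentVolume of the class is still Bound.
no_bound_pvs() {
  ! pvs_of_storageclass "$1" | grep -q ' Bound$'
}

# Usage (hypothetical):
#   no_bound_pvs ceph-prod-rbd || echo "delete the consuming PersistentVolumeClaims first"
```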
While the teardown is in progress the ElasticStorageClass Ready condition explains what is blocking it:
| Reason | Meaning | Action |
|---|---|---|
| BoundVolumesExist | PersistentVolumes provisioned from this StorageClass are still bound. | Delete the consuming PersistentVolumeClaims. The force annotation does not override this. |
| DataPresentInPool | The block pool still holds data (RBD only). | Set sds-elastic.deckhouse.io/force-deletion=true to permanently delete the pool and its data. |
| FilesystemNotEmpty | The filesystem still has volumes (CephFS only). | Delete the remaining PersistentVolumes for this StorageClass. |
| Terminating | Backend resources are being removed. | Wait for completion. |
PV / LVM / BlockDevice cleanup after deleting an ElasticCluster is manual (see above); end-to-end OwnerReferences-driven GC is tracked as backlog item B20.
Disabling the Module
Disabling the module stops the controller and the Rook operator. Data stored in Ceph clusters managed by this module may become unavailable or be lost. Always delete every ElasticCluster, ElasticStorageClass and ElasticClusterCredential object before disabling the module.
A validating webhook on the sds-elastic ModuleConfig rejects setting spec.enabled: false while any ElasticCluster still exists. This prevents accidentally tearing down the controller and the Rook operator while a live Ceph cluster (OSD data on host disks) is still under management. Follow the ordered teardown below; the disable is accepted only once the last ElasticCluster is gone.
- Delete every ElasticStorageClass and wait until the controller has removed the pools and csi-ceph StorageClasses:

  d8 k get elasticstorageclasses.storage.deckhouse.io

  Wait until the command returns No resources found.

- Delete every ElasticCluster and wait for cluster teardown:

  d8 k get elasticclusters.storage.deckhouse.io

  Wait until the command returns No resources found.

- Optionally remove the ElasticClusterCredential. It is a cluster-scoped identity backup and does not gate the disable (only a live ElasticCluster blocks it). Delete it unless you plan to re-create the cluster with the same identity:

  d8 k get elasticclustercredentials.storage.deckhouse.io
  d8 k delete elasticclustercredential <name>

- Disable the module. Disabling requires the modules.deckhouse.io/allow-disabling: "true" annotation on the ModuleConfig:

  d8 k annotate moduleconfig sds-elastic modules.deckhouse.io/allow-disabling=true --overwrite
  d8 k patch moduleconfig sds-elastic --type=merge -p '{"spec":{"enabled":false}}'
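The two "wait until No resources found" steps of the ordered teardown can be folded into one check that runs before the final patch. This is a minimal sketch with hypothetical function names; it only counts objects and does not delete anything.

```shell
# Count objects of a resource kind; prints 0 when none exist.
count_objects() {
  d8 k get "$1" -o name 2>/dev/null | wc -l | tr -d ' '
}

# Succeed only when no ElasticStorageClass and no ElasticCluster remain.
ready_to_disable() {
  for kind in elasticstorageclasses.storage.deckhouse.io \
              elasticclusters.storage.deckhouse.io; do
    n=$(count_objects "$kind")
    if [ "$n" -ne 0 ]; then
      echo "$n $kind object(s) still exist"
      return 1
    fi
  done
  echo "no ElasticStorageClass or ElasticCluster left"
}

# Usage (hypothetical): run the disable commands only after the check passes.
#   ready_to_disable && echo "safe to annotate and patch the ModuleConfig"
```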
Forcing the Module Off While ElasticClusters Remain
This bypasses the safety guard. Use it only for disaster recovery, when you deliberately want to keep the ElasticCluster CRs and their on-disk data but stop the module from managing them. The Ceph cluster will be left orphaned (no operator), and the controller finalizers on the leftover CRs are stripped by the module-delete hook so the API server can garbage-collect them. OSD data on host disks and dataDirHostPath are not erased, but they are no longer managed and may become unrecoverable through normal means.
If you must disable the module without deleting the ElasticClusters first, set the sds-elastic.deckhouse.io/force-disable: "true" annotation on the ModuleConfig. With this annotation present, the webhook allows spec.enabled: false regardless of how many ElasticClusters exist:
d8 k annotate moduleconfig sds-elastic sds-elastic.deckhouse.io/force-disable=true --overwrite
d8 k annotate moduleconfig sds-elastic modules.deckhouse.io/allow-disabling=true --overwrite
d8 k patch moduleconfig sds-elastic --type=merge -p '{"spec":{"enabled":false}}'

Checking Cluster Health
The controller exposes coarse-grained progress on each CR through conditions. For an ElasticCluster:
d8 k describe elasticcluster <cluster-name>

Useful conditions: StorageReady, CephClusterReady, CredentialsReady, CsiCephReady, UpgradeReady, UpgradeInProgress, and the aggregate Ready.
The UPGRADING printcolumn (and the underlying UpgradeInProgress condition) tracks the per-daemon convergence picture Rook publishes under CephCluster.status.ceph.versions.overall. While the map carries more than one key the cluster is mid-rollout and UPGRADING stays True for the whole window — including the mon → mgr → osd → mds rolling phases when CephCluster.status.phase=Progressing and the FSM gates downstream stages. UpgradeInProgress flips back to False only once versions.overall has a single key matching the desired version. Note that EC.status.cephVersion.running (the Ceph printcolumn) reports the lagging version present in versions.overall while daemons disagree, so it shows what callers will still hit on the slowest-rolling daemon (typically OSDs), not Rook’s already-bumped target marker.
For an ElasticStorageClass:
d8 k describe elasticstorageclass <esc-name>

Useful conditions: PoolReady, CsiStorageClassReady, and the aggregate Ready.
For a deeper Ceph-level inspection, exec into a Rook toolbox pod:
d8 k -n d8-sds-elastic exec -it deploy/rook-ceph-tools -- ceph status
d8 k -n d8-sds-elastic exec -it deploy/rook-ceph-tools -- ceph osd tree