Administration

The module lifecycle stage: General Availability

Commander internals

Diagram

Deckhouse Commander has one external dependency: a PostgreSQL database.

The API server is the central component. Data is stored in PostgreSQL. Options for installing Deckhouse Commander with a DBMS are listed in the Installation section.

The API server provides both external APIs (for the web application and for external integrations) and internal APIs for working with clusters.

The web application uses the API to manage clusters and other Deckhouse Commander entities.

Clusters are managed through asynchronous operations called tasks. The cluster manager is a service that monitors tasks and executes them. A task can be a cluster installation, a cluster deletion, or a reconciliation of the cluster state with the specified configuration.

The cluster manager is single-threaded, so the throughput of cluster processing depends on the number of clusters and the number of cluster manager replicas. When a cluster is created through the API, the API server creates an installation task, and a free cluster manager instance picks it up. The same happens for cluster update, deletion, and reconciliation operations. The cluster manager is also responsible for launching installers (dhctl servers): for each Deckhouse Kubernetes Platform (DKP) cluster, it launches a dhctl server replica of exactly the version that cluster requires.
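
The relationship between the number of replicas and processing time can be illustrated with a small model. This is a sketch only, not Deckhouse Commander code: the function name plan_rounds is hypothetical, and the model assumes that every task takes the same time, which real installation and reconciliation tasks do not.

```python
from collections import deque


def plan_rounds(tasks, replicas):
    """Group queued tasks into rounds, one task per free replica per round.

    Simplifying assumption: all tasks take equally long, so every
    replica becomes free at the same moment.
    """
    if replicas < 1:
        raise ValueError("at least one cluster manager replica is required")
    queue = deque(tasks)
    rounds = []
    while queue:
        # A single-threaded replica works on exactly one task at a time.
        batch = [queue.popleft() for _ in range(min(replicas, len(queue)))]
        rounds.append(batch)
    return rounds


if __name__ == "__main__":
    tasks = ["install cluster-a", "reconcile cluster-b", "delete cluster-c"]
    for replicas in (1, 2, 3):
        print(replicas, "replica(s):", len(plan_rounds(tasks, replicas)), "round(s)")
```

With three queued tasks, one replica needs three rounds, while two replicas need two. In this model, adding replicas beyond the number of pending tasks gives no further benefit.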

In each cluster, Deckhouse Commander automatically installs the commander-agent module. This module is responsible for synchronizing Kubernetes resources in the application cluster, as well as sending telemetry to the Deckhouse Commander API server. Telemetry includes basic metrics (total number of CPUs, total memory, number of nodes, and total storage), DKP version, Kubernetes version, and DKP component availability.

Network Availability Requirements For Components

Deckhouse Commander components require network connectivity with application clusters. The connectivity does not have to be permanent. However, the target and actual cluster configurations are compared only while connectivity is available, and a cluster can be brought to the desired state only with network access. Full functionality requires the following network access:

- 22/TCP from Deckhouse Commander to the application cluster:
  - the cluster manager (dhctl) connects via SSH to the master node for initial setup or to destroy the cluster.
- 443/TCP or 80/TCP from the application cluster to Deckhouse Commander:
  - the exact port (443, 80, or another value) depends on the ingress controller parameters in the control cluster and on the https.mode parameter, which is set in the global settings or in the Deckhouse Commander module settings;
  - the agent fetches the configuration to apply updates and also sends telemetry to the server;
  - the agent creates a TCP tunnel through which the cluster manager controls the infrastructure part of the configuration.
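
A quick way to verify these requirements is to attempt a plain TCP connection in each direction. The following is a minimal sketch that uses only the Python standard library. The host names are placeholders, and the function name tcp_port_open is hypothetical. Run the port 22 check from the Deckhouse Commander side and the port 443 (or 80) check from a node of the application cluster.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    # Placeholder addresses: replace with your master node and Commander host.
    print("SSH to master node:", tcp_port_open("master.example.com", 22))
    print("HTTPS to Commander:", tcp_port_open("commander.example.com", 443))
```

A successful connection only shows that the port is reachable. It does not confirm that SSH credentials or TLS settings are correct.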

By default, Deckhouse Commander configures the commander-agent to use HTTPS. To change this, you need to explicitly set the commander-agent module configuration in the cluster template:

apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: commander-agent
  labels:
    heritage: deckhouse-commander
spec:
  enabled: true
  version: 1
  settings:
    # http is set explicitly ↓
    commanderUrl: "http://{{ .dc_domain }}/agent_api/{{ .dc_clusterUUID }}"

Data encryption

Deckhouse Commander encrypts sensitive data stored in the database using keys that are automatically generated when the module is enabled and stored in the commander-envs secret.

Attention: Save the keys in a secure location so that the database can be restored if problems occur. Without the keys, the data cannot be restored!

❯ kubectl -n d8-commander get secret commander-envs -oyaml
apiVersion: v1
data:
  ACTIVE_RECORD_ENCRYPTION_DETERMINISTIC_KEY: YVBBNVh5QUxoZjc1Tk5uTXphc3BXN2FrVGZacDBsUFk=
  ACTIVE_RECORD_ENCRYPTION_KEY_DERIVATION_SALT: eEVZMGR0NlRaY0FNZzUySzdPODR3WXpranZiQTYySHo=
  ACTIVE_RECORD_ENCRYPTION_PRIMARY_KEY: RUdZOFdodWxVT1hpeHlib2Q3Wld3TUlMNjhSOW81a0M=
kind: Secret
metadata:
...
  name: commander-envs
  namespace: d8-commander
type: Opaque
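
The values in the data field of a Kubernetes Secret are base64-encoded. When saving the keys, it may be convenient to decode them first. Below is a minimal sketch that reads the JSON form of the secret (kubectl -n d8-commander get secret commander-envs -o json) from standard input. The function name decode_secret_data is hypothetical, and the script assumes that the values are UTF-8 text.

```python
import base64
import json
import sys


def decode_secret_data(secret_json):
    """Decode the base64 values of the `data` field of a Kubernetes Secret.

    `secret_json` is the text printed by `kubectl get secret ... -o json`.
    Assumption: every value is UTF-8 text.
    """
    secret = json.loads(secret_json)
    return {
        name: base64.b64decode(value).decode("utf-8")
        for name, value in (secret.get("data") or {}).items()
    }


if __name__ == "__main__":
    for name, value in sorted(decode_secret_data(sys.stdin.read()).items()):
        print(f"{name}={value}")
```

The output contains the keys in plain text, so redirect it straight into a protected store rather than printing it to a shared terminal or log.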

Collecting logs of the history of changes

In Deckhouse Commander version 1.9 and later, events related to the history of changes are printed to the standard output and are tagged with the ["audit"] label. These logs can be collected and sent using the log-shipper module.

Example logs:

{"level":"INFO","time":"2025-06-18 14:22:15 +0300","request_id":"ea09d409dc3c95dcf658fc2c2838084b","pid":19,"tags":["audit"],"auditable_type":"ClusterSettings","auditable_id":"8a0041ef-6c30-48bc-b3ca-e9db3e22be47","action":"create","user_type":"User","remote_address":"82.150.57.81","request_uuid":"ea09d409dc3c95dcf658fc2c2838084b","workspace_slug":"xcjtd","user_name":"admin@company.my","audited_changes":{"cluster_manager":{"sync":{"mode":"auto"},"check_interval":1}}}
{"level":"INFO","time":"2025-06-18 14:22:15 +0300","request_id":"ea09d409dc3c95dcf658fc2c2838084b","pid":19,"tags":["audit"],"auditable_type":"Cluster","auditable_id":"056f7fe5-7d22-4a76-b5e2-f225c0a99613","action":"create","user_type":"User","remote_address":"82.150.57.81","request_uuid":"ea09d409dc3c95dcf658fc2c2838084b","workspace_slug":"xcjtd","user_name":"admin@company.my","audited_changes":{"name":"mycluster","archived_at":null}}
{"level":"INFO","time":"2025-06-18 14:23:57 +0300","request_id":"a1eaf50bbc87a8cca4cd17d8be8fffdb","pid":12,"tags":["audit"],"auditable_type":"ClusterSettings","auditable_id":"707c46b1-b2c8-4fab-9392-8216a2058219","action":"create","user_type":"AuthToken","remote_address":"238.106.231.86","request_uuid":"a1eaf50bbc87a8cca4cd17d8be8fffdb","workspace_slug":"bfqcc","user_name":"api-user","audited_changes":{"cluster_manager":{"sync":{"mode":"auto"},"check_interval":1}}}
{"level":"INFO","time":"2025-06-18 14:23:57 +0300","request_id":"a1eaf50bbc87a8cca4cd17d8be8fffdb","pid":12,"tags":["audit"],"auditable_type":"Cluster","auditable_id":"42d432aa-8250-4ef0-b260-51639e1445d0","action":"create","user_type":"AuthToken","remote_address":"238.106.231.86","request_uuid":"a1eaf50bbc87a8cca4cd17d8be8fffdb","workspace_slug":"bfqcc","user_name":"api-user","audited_changes":{"name":"15731486914-1-con-1-30","archived_at":null}}
{"level":"INFO","time":"2025-06-18 14:28:56 +0300","request_id":"069566a46c004e53b686189587d484a9","pid":19,"tags":["audit"],"auditable_type":"ClusterSettings","auditable_id":"402a4d4d-5c14-4466-a1f3-3d990d7cf35a","action":"create","user_type":"User","remote_address":"30.231.184.26","request_uuid":"069566a46c004e53b686189587d484a9","workspace_slug":"xcjtd","user_name":"user@company.my","audited_changes":{"cluster_manager":{"sync":{"mode":"auto"},"check_interval":1}}}
{"level":"INFO","time":"2025-06-18 14:28:56 +0300","request_id":"069566a46c004e53b686189587d484a9","pid":19,"tags":["audit"],"auditable_type":"Cluster","auditable_id":"9ee687d4-18fe-423c-bbaa-e8e46ea47e67","action":"create","user_type":"User","remote_address":"30.231.184.26","request_uuid":"069566a46c004e53b686189587d484a9","workspace_slug":"xcjtd","user_name":"user@company.my","audited_changes":{"name":"mycluster2","archived_at":null}}
{"level":"INFO","time":"2025-06-18 14:29:06 +0300","request_id":"d29b248fbce414db8b71f821a3b1886e","pid":12,"tags":["audit"],"auditable_type":"Cluster","auditable_id":"e0f3c3de-2129-4b75-b927-72a8eb26902b","action":"update","user_type":"User","remote_address":"30.231.184.26","request_uuid":"d29b248fbce414db8b71f821a3b1886e","workspace_slug":"xcjtd","user_name":"user@company.my","audited_changes":{"archived_at":[null,"2025-06-18T14:29:05.943+03:00"]}}
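
Each audit record is a single JSON object per line, so the stream is easy to post-process. The following sketch filters records tagged with "audit" and prints a one-line summary for each. The field names are taken from the example records above. The function names audit_events and summarize are hypothetical.

```python
import json
import sys


def audit_events(lines):
    """Yield parsed log records whose `tags` list contains "audit"."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON log line, skip it
        if isinstance(record, dict) and "audit" in (record.get("tags") or []):
            yield record


def summarize(record):
    """Build a short summary; missing fields are shown as "-"."""
    fields = ("time", "user_name", "action", "auditable_type", "auditable_id")
    return " ".join(str(record.get(field, "-")) for field in fields)


if __name__ == "__main__":
    # Pipe the backend logs into this script.
    for event in audit_events(sys.stdin):
        print(summarize(event))
```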

Configuration example:

apiVersion: deckhouse.io/v1alpha2
kind: ClusterLoggingConfig
metadata:
  name: commander-audit-logs
spec:
  destinationRefs:
  - loki-example
  kubernetesPods:
    labelSelector:
      matchLabels:
        app: backend
    namespaceSelector:
      labelSelector:
        matchLabels:
          kubernetes.io/metadata.name: d8-commander
  labelFilter:
  - field: message
    operator: Regex
    values:
    - .*\[\"audit\"\].*
  type: KubernetesPods
---
apiVersion: deckhouse.io/v1alpha1
kind: ClusterLogDestination
metadata:
  name: loki-example
spec:
  type: Loki
  loki:
    endpoint: http://loki-example.loki.svc:3100

For more detailed configuration information, see the documentation of the log-shipper module.
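
Before deploying the configuration, the labelFilter pattern can be checked against sample log lines. The sketch below uses Python's re module purely as a stand-in. The regular expression engine used by log-shipper may differ in details, so treat the result as a sanity check rather than a guarantee. The function name is_audit_message is hypothetical.

```python
import re

# The same pattern as in the labelFilter of the ClusterLoggingConfig example.
AUDIT_PATTERN = re.compile(r'.*\[\"audit\"\].*')


def is_audit_message(message):
    """Return True if the message contains the literal text ["audit"]."""
    return AUDIT_PATTERN.match(message) is not None


if __name__ == "__main__":
    print(is_audit_message('{"level":"INFO","tags":["audit"],"action":"create"}'))
    print(is_audit_message('{"level":"INFO","tags":["request"]}'))
```

Note that the pattern matches only the exact text ["audit"]. A record with several tags, such as ["audit","other"], would not match it.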

Changing the storage class

Option 1 (preferred)

Perform a backup of the database instance

kubectl -n d8-commander exec -t commander-postgres-0 -- su - postgres -c "pg_dump -Fc -b -v -d commander" > commander.dump
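
Before moving on, it is worth confirming that the dump was actually written. Archives created by pg_dump in the custom format (-Fc) start with the bytes PGDMP, which gives a cheap check. This is a minimal sketch. The function name looks_like_custom_format_dump is hypothetical, and the check does not prove that the archive is complete.

```python
import sys


def looks_like_custom_format_dump(path):
    """Return True if the file starts with the "PGDMP" magic bytes."""
    with open(path, "rb") as dump:
        return dump.read(5) == b"PGDMP"


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "commander.dump"
    if looks_like_custom_format_dump(path):
        print(f"{path}: looks like a pg_dump custom-format archive")
    else:
        sys.exit(f"{path}: not a custom-format archive, repeat the backup")
```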

Change storageClass in the module settings, replacing <NEW_STORAGECLASS_NAME> with the name of the required storage class

To list the available storage classes, run kubectl get storageclasses

kubectl patch moduleconfig commander --type=merge -p '{"spec":{"settings":{"postgres":{"internal":{"storageClass":"<NEW_STORAGECLASS_NAME>"}}}}}'

moduleconfig.deckhouse.io/commander patched

Wait until the Deckhouse queue is empty

kubectl -n d8-system exec svc/deckhouse-leader -c deckhouse -- deckhouse-controller queue main

Queue 'main': length 0, status: 'waiting for task 5s'
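
When this wait is scripted, the queue length can be extracted from the command output. The sketch below parses the line format shown in the sample above. Other Deckhouse versions may print the queue differently, so the pattern is an assumption. The function names queue_length and queue_is_empty are hypothetical.

```python
import re
import sys

# Matches lines such as: Queue 'main': length 0, status: 'waiting for task 5s'
QUEUE_LINE = re.compile(r"Queue '(?P<name>[^']+)': length (?P<length>\d+)")


def queue_length(output, name="main"):
    """Return the length reported for the named queue, or None if absent."""
    for match in QUEUE_LINE.finditer(output):
        if match.group("name") == name:
            return int(match.group("length"))
    return None


def queue_is_empty(output, name="main"):
    return queue_length(output, name) == 0


if __name__ == "__main__":
    # Pipe the output of the deckhouse-controller queue command into this script.
    sys.exit(0 if queue_is_empty(sys.stdin.read()) else 1)
```

The exit code is 0 only when the main queue reports length 0, so the script can be used as the condition of a shell retry loop.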

Check the logs of the postgres operator

kubectl -n d8-operator-postgres logs deployments/operator-postgres

{"cluster-name":"d8-commander/commander-postgres","level":"info","msg":"cluster has been updated","pkg":"controller","time":"2024-05-19T20:36:22Z","worker":0}

Increase the number of replicas of the PostgreSQL database (optional)

Skip this step if HighAvailability mode is active and PostgreSQL already has 2 replicas

kubectl -n d8-commander patch postgresqls.acid.zalan.do commander-postgres --type=merge -p '{"spec":{"numberOfInstances":2}}'

postgresql.acid.zalan.do/commander-postgres patched

Check the logs of the operator and the postgres instance

kubectl -n d8-operator-postgres logs deployments/operator-postgres

{"cluster-name":"d8-commander/commander-postgres","level":"info","msg":"cluster has been updated","pkg":"controller","time":"2024-05-19T20:36:22Z","worker":0}

kubectl -n d8-commander logs commander-postgres-1

2024-05-19 20:38:15,648 INFO: no action. I am (commander-postgres-1), a secondary, and following a leader (commander-postgres-0)

Perform the master switch

kubectl -n d8-commander exec -it commander-postgres-0 -- patronictl failover

Current cluster topology
+ Cluster: commander-postgres --------+---------+---------+----+-----------+
| Member               | Host         | Role    | State   | TL | Lag in MB |
+----------------------+--------------+---------+---------+----+-----------+
| commander-postgres-0 | 10.111.3.167 | Leader  | running |  5 |           |
| commander-postgres-1 | 10.111.2.239 | Replica | running |  5 |         0 |
+----------------------+--------------+---------+---------+----+-----------+
Candidate ['commander-postgres-1'] []: commander-postgres-1
Are you sure you want to failover cluster commander-postgres, demoting current leader commander-postgres-0? [y/N]: y
2024-05-19 20:40:52.63041 Successfully failed over to "commander-postgres-1"
+ Cluster: commander-postgres --------+---------+---------+----+-----------+
| Member               | Host         | Role    | State   | TL | Lag in MB |
+----------------------+--------------+---------+---------+----+-----------+
| commander-postgres-0 | 10.111.3.167 | Replica | stopped |    |   unknown |
| commander-postgres-1 | 10.111.2.239 | Leader  | running |  5 |           |
+----------------------+--------------+---------+---------+----+-----------+

Make sure that both DB instances are in the running state

kubectl -n d8-commander exec -t commander-postgres-0 -- patronictl list
+ Cluster: commander-postgres --------+---------+---------+----+-----------+
| Member               | Host         | Role    | State   | TL | Lag in MB |
+----------------------+--------------+---------+---------+----+-----------+
| commander-postgres-0 | 10.111.3.167 | Replica | running |  6 |         0 |
| commander-postgres-1 | 10.111.2.239 | Leader  | running |  6 |           |
+----------------------+--------------+---------+---------+----+-----------+
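
The same check can be automated by parsing the table that patronictl list prints. The sketch below assumes the table layout shown in the sample above. The function names parse_patroni_table and all_running are hypothetical.

```python
import sys


def parse_patroni_table(output):
    """Parse the ASCII table printed by `patronictl list` into dicts."""
    rows = [line.strip() for line in output.splitlines()
            if line.strip().startswith("|")]
    if not rows:
        return []
    # The first "|" row is the header; the remaining rows are members.
    header = [cell.strip() for cell in rows[0].strip("|").split("|")]
    members = []
    for row in rows[1:]:
        cells = [cell.strip() for cell in row.strip("|").split("|")]
        members.append(dict(zip(header, cells)))
    return members


def all_running(members):
    """Return True if there is at least one member and all are running."""
    return bool(members) and all(m.get("State") == "running" for m in members)


if __name__ == "__main__":
    members = parse_patroni_table(sys.stdin.read())
    for member in members:
        print(member.get("Member"), member.get("Role"), member.get("State"))
    sys.exit(0 if all_running(members) else 1)
```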

Check that the disk of the new DB replica was created with the necessary storageClass

kubectl -n d8-commander get pvc --selector application=spilo
NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pgdata-commander-postgres-0   Bound    pvc-fd80fde4-d0e2-4b5f-9e3a-eac998191f11   2Gi        RWO            network-hdd    36h
pgdata-commander-postgres-1   Bound    pvc-7af2f442-3097-4fe3-a795-5ad18bb11351   2Gi        RWO            network-ssd    2m54s
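
For a scripted check, the JSON output of kubectl is easier to process than the table. The sketch below reads the output of kubectl -n d8-commander get pvc --selector application=spilo -o json and reports which claims are not yet on the expected storage class. The function names are hypothetical. network-ssd is the example class name used in this section.

```python
import json
import sys


def pvc_storage_classes(pvc_list_json):
    """Map PVC name to storage class from `kubectl get pvc -o json` output."""
    items = json.loads(pvc_list_json).get("items", [])
    return {
        item["metadata"]["name"]: item.get("spec", {}).get("storageClassName")
        for item in items
    }


def pvcs_on_other_class(pvc_list_json, expected):
    """Return the sorted names of PVCs whose storage class differs."""
    classes = pvc_storage_classes(pvc_list_json)
    return sorted(name for name, sc in classes.items() if sc != expected)


if __name__ == "__main__":
    expected = sys.argv[1] if len(sys.argv) > 1 else "network-ssd"
    for name in pvcs_on_other_class(sys.stdin.read(), expected):
        print(f"{name} is not on storage class {expected}")
```

At this point in the procedure, the claim of the first instance is expected to be reported, because it still uses the old storage class.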

Delete the disk and pod of the first postgres instance

kubectl -n d8-commander delete pvc pgdata-commander-postgres-0 --wait=false
kubectl -n d8-commander delete po commander-postgres-0

Check logs

kubectl -n d8-commander logs commander-postgres-0

2024-05-19 20:43:33,293 INFO: Lock owner: commander-postgres-1; I am commander-postgres-0
2024-05-19 20:43:33,293 INFO: establishing a new patroni connection to the postgres cluster
2024-05-19 20:43:33,357 INFO: no action. I am (commander-postgres-0), a secondary, and following a leader (commander-postgres-1)

Check that the disk was created with the correct storageClass

kubectl -n d8-commander get pvc
NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pgdata-commander-postgres-0   Bound    pvc-fd80fde4-d0e2-4b5f-9e3a-eac998191f11   2Gi        RWO            network-ssd    2m6s
pgdata-commander-postgres-1   Bound    pvc-7af2f442-3097-4fe3-a795-5ad18bb11351   2Gi        RWO            network-ssd    7m11s

Perform the master switch one more time

kubectl -n d8-commander exec -it commander-postgres-0 -- patronictl failover

Current cluster topology
+ Cluster: commander-postgres --------+---------+---------+----+-----------+
| Member               | Host         | Role    | State   | TL | Lag in MB |
+----------------------+--------------+---------+---------+----+-----------+
| commander-postgres-0 | 10.111.3.189 | Replica | running |  6 |         0 |
| commander-postgres-1 | 10.111.2.239 | Leader  | running |  6 |           |
+----------------------+--------------+---------+---------+----+-----------+
Candidate ['commander-postgres-0'] []: commander-postgres-0
Are you sure you want to failover cluster commander-postgres, demoting current leader commander-postgres-1? [y/N]: y
2024-05-19 20:46:11.69855 Successfully failed over to "commander-postgres-0"
+ Cluster: commander-postgres --------+---------+---------+----+-----------+
| Member               | Host         | Role    | State   | TL | Lag in MB |
+----------------------+--------------+---------+---------+----+-----------+
| commander-postgres-0 | 10.111.3.189 | Leader  | running |  6 |           |
| commander-postgres-1 | 10.111.2.239 | Replica | stopped |    |   unknown |
+----------------------+--------------+---------+---------+----+-----------+

Make sure that both DB instances are in the running state

kubectl -n d8-commander exec -t commander-postgres-0 -- patronictl list
+ Cluster: commander-postgres --------+---------+---------+----+-----------+
| Member               | Host         | Role    | State   | TL | Lag in MB |
+----------------------+--------------+---------+---------+----+-----------+
| commander-postgres-0 | 10.111.3.189 | Leader  | running |  6 |           |
| commander-postgres-1 | 10.111.2.239 | Replica | running |  6 |         0 |
+----------------------+--------------+---------+---------+----+-----------+

Reduce the number of replicas of the PostgreSQL database (optional)

Skip this step if HighAvailability mode is active and PostgreSQL has 2 replicas

kubectl -n d8-commander patch postgresqls.acid.zalan.do commander-postgres --type=merge -p '{"spec":{"numberOfInstances":1}}'

postgresql.acid.zalan.do/commander-postgres patched

Check the operator logs

kubectl -n d8-operator-postgres logs deployments/operator-postgres

{"cluster-name":"d8-commander/commander-postgres","level":"info","msg":"cluster has been updated","pkg":"controller","time":"2024-05-19T20:50:22Z","worker":0}

Delete the disk and pod of the second postgres instance (if HighAvailability mode is active and PostgreSQL has 2 replicas)

Skip this step if HighAvailability mode is not active

kubectl -n d8-commander delete pvc pgdata-commander-postgres-1 --wait=false
kubectl -n d8-commander delete po commander-postgres-1

Check logs

kubectl -n d8-commander logs commander-postgres-1

2024-05-19 20:53:33,293 INFO: Lock owner: commander-postgres-0; I am commander-postgres-1
2024-05-19 20:53:33,293 INFO: establishing a new patroni connection to the postgres cluster
2024-05-19 20:53:33,357 INFO: no action. I am (commander-postgres-1), a secondary, and following a leader (commander-postgres-0)

Check that the disk was created with the necessary storageClass

kubectl -n d8-commander get pvc
NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pgdata-commander-postgres-0   Bound    pvc-fd80fde4-d0e2-4b5f-9e3a-eac998191f11   2Gi        RWO            network-ssd    7m6s
pgdata-commander-postgres-1   Bound    pvc-7af2f442-3097-4fe3-a795-5ad18bb11351   2Gi        RWO            network-ssd    1m11s

Make sure that both DB instances are in the running state

kubectl -n d8-commander exec -t commander-postgres-0 -- patronictl list
+ Cluster: commander-postgres --------+---------+---------+----+-----------+
| Member               | Host         | Role    | State   | TL | Lag in MB |
+----------------------+--------------+---------+---------+----+-----------+
| commander-postgres-0 | 10.111.3.189 | Leader  | running |  6 |           |
| commander-postgres-1 | 10.111.2.239 | Replica | running |  6 |         0 |
+----------------------+--------------+---------+---------+----+-----------+

Delete the unused disk of the temporary database replica (if HighAvailability mode is not active)

Skip this step if HighAvailability mode is active and PostgreSQL has 2 replicas
```
kubectl -n d8-commander delete pvc pgdata-commander-postgres-1

persistentvolumeclaim "pgdata-commander-postgres-1" deleted
```

Option 2

Perform a backup of the database instance

kubectl -n d8-commander exec -t commander-postgres-0 -- su - postgres -c "pg_dump -Fc -b -v -d commander" > commander.dump

Turn off the commander module

kubectl patch moduleconfig commander --type=merge -p '{"spec":{"enabled":false}}'

moduleconfig.deckhouse.io/commander patched

Wait until the Deckhouse queue is empty

kubectl -n d8-system exec svc/deckhouse-leader -c deckhouse -- deckhouse-controller queue main

Queue 'main': length 0, status: 'waiting for task 5s'

Check that the d8-commander namespace has been deleted

kubectl get namespace d8-commander
Error from server (NotFound): namespaces "d8-commander" not found

Set the required storage class and enable the commander module

kubectl patch moduleconfig commander --type=merge -p '{"spec":{"enabled":true,"settings":{"postgres":{"internal":{"storageClass":"<NEW_STORAGECLASS_NAME>"}}}}}'

moduleconfig.deckhouse.io/commander patched

Wait until the Deckhouse queue is empty

kubectl -n d8-system exec svc/deckhouse-leader -c deckhouse -- deckhouse-controller queue main

Queue 'main': length 0, status: 'waiting for task 5s'

Check that the DB instance has the Running status

kubectl -n d8-commander get po commander-postgres-0

NAME                   READY   STATUS    RESTARTS   AGE
commander-postgres-0   1/1     Running   0          2m4s

Restore the previously saved backup of the database

kubectl -n d8-commander exec -i commander-postgres-0 -- su - postgres -c "pg_restore -v -c --if-exists -Fc -d commander" < commander.dump
