Commander internals
Deckhouse Commander has one external dependency: a PostgreSQL database.
The API server is the central component. Data is stored in PostgreSQL. Options for installing Deckhouse Commander with a DBMS are listed in the Installation section.
The API server provides external APIs, used by the web application and by external integrations, as well as internal APIs for working with clusters.
The web application uses the API to manage clusters and other Deckhouse Commander entities.
Clusters are managed through asynchronous operations called tasks. The cluster manager is a service that monitors tasks and executes them. A task can be a cluster installation, a cluster deletion, or a reconciliation of the cluster state with the specified configuration.
The cluster manager is single-threaded, so cluster processing throughput depends on the number of clusters and the number of cluster manager replicas. When a cluster is created through the API, the API server creates an installation task, and a free cluster manager instance picks it up. The same happens for cluster update, delete, and reconciliation operations. The cluster manager is responsible for launching installers (dhctl servers): for each Deckhouse Kubernetes Platform (DKP) cluster it launches a dhctl server replica of exactly the required version.
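These components run as pods in the d8-commander namespace (the namespace used throughout the examples below), so a quick, informal way to see which API server, cluster manager, and dhctl replicas are currently deployed is simply to list the pods; exact pod names vary between versions:

# List the Deckhouse Commander pods (API server, cluster manager, dhctl servers, PostgreSQL).
kubectl -n d8-commander get pods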
In each cluster, Deckhouse Commander automatically installs the commander-agent module. This module is responsible for synchronizing Kubernetes resources in the application cluster, as well as for sending telemetry to the Deckhouse Commander API server. Telemetry includes basic metrics (total number of CPUs, total memory, number of nodes, and total storage), the DKP version, the Kubernetes version, and DKP component availability.
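As an informal check (not part of the official procedure), you can confirm from inside an application cluster that the agent has been rolled out by Commander by reading its module configuration, which Commander creates from the cluster template (an example template snippet is shown below):

# Run against the application cluster; ModuleConfig is a cluster-scoped resource.
kubectl get moduleconfig commander-agent -o yaml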
Network availability requirements for components
Deckhouse Commander components require network connectivity with application clusters. This connectivity is not needed all the time. However, the target and actual cluster configurations are compared only while a connection is available, and only with network access can the cluster be brought to the desired state. The following network access is needed for full functionality (a quick connectivity check is sketched after the list):
- 22/TCP from Deckhouse Commander to the application cluster:
  - The cluster manager (dhctl) connects via SSH to the master node for the initial setup or for destroying the cluster.
- 443/TCP or 80/TCP from the application cluster to Deckhouse Commander:
  - The port (443, 80, or another value) depends on the Ingress controller parameters in the control cluster and on the https.mode parameter in the control cluster, specified in the global settings or in the Deckhouse Commander module settings.
  - The agent fetches the configuration in order to apply updates and also sends telemetry to the server.
  - The agent creates a TCP tunnel through which the cluster manager controls the infrastructure part of the configuration.
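For a quick manual check of these paths, a minimal sketch is shown below; the master node address and the Commander domain are placeholders to substitute yourself, and nc/curl are just convenient stand-ins for any reachability test:

# From the Deckhouse Commander side: verify that SSH on the master node is reachable.
nc -vz <master-node-address> 22

# From the application cluster side: verify that the Commander endpoint responds (443 or 80, as configured).
curl -k -o /dev/null -w '%{http_code}\n' https://<commander-domain>/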
By default, Deckhouse Commander configures the commander-agent module to use HTTPS. To change this, explicitly set the commander-agent module configuration in the cluster template:
apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: commander-agent
  labels:
    heritage: deckhouse-commander
spec:
  enabled: true
  version: 1
  settings:
    # http is set explicitly ↓
    commanderUrl: "http://{{ .dc_domain }}/agent_api/{{ .dc_clusterUUID }}"
Data encryption
Deckhouse Commander encrypts sensitive data stored in the database using keys that are automatically generated when the module is enabled and stored in the commander-envs secret.
Attention: It is extremely important to save the keys in a secure location so that the database can be restored if problems occur. Without the keys, it is impossible to restore the data!
❯ kubectl -n d8-commander get secret commander-envs -oyaml

apiVersion: v1
data:
  ACTIVE_RECORD_ENCRYPTION_DETERMINISTIC_KEY: YVBBNVh5QUxoZjc1Tk5uTXphc3BXN2FrVGZacDBsUFk=
  ACTIVE_RECORD_ENCRYPTION_KEY_DERIVATION_SALT: eEVZMGR0NlRaY0FNZzUySzdPODR3WXpranZiQTYySHo=
  ACTIVE_RECORD_ENCRYPTION_PRIMARY_KEY: RUdZOFdodWxVT1hpeHlib2Q3Wld3TUlMNjhSOW81a0M=
kind: Secret
metadata:
  ...
  name: commander-envs
  namespace: d8-commander
type: Opaque
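One straightforward way to keep the keys safe, for example, is to export the whole secret to a file stored outside the cluster; the file name below is arbitrary:

# Save the secret with the encryption keys to a local file.
kubectl -n d8-commander get secret commander-envs -o yaml > commander-envs-backup.yaml

# If the secret is ever lost, it can be re-created from this file.
kubectl apply -f commander-envs-backup.yaml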
Collecting change history logs
In Deckhouse Commander version 1.9 and later, change history events are printed to standard output and tagged with the [audit] label. These logs can be collected and shipped using the log-shipper module.
Example log entries:
2024-12-18 17:27:17.503 [2024-12-18 17:27:17.466636971] [1ee79b305b5fdbd577231ad6dd8f2951] [12] [audit] {"auditable_type":"ClusterSettings","auditable_id":"7f0e29e8-f427-4c5f-9ecf-de457b2d427f","action":"create","user_type":"User","remote_address":null,"request_uuid":"05a02066-a8e9-4e23-a79f-5e5a32533e61","workspace_slug":"wfqmr","user_name":"admin@deckhouse.io"}
2024-12-18 17:27:17.503 [2024-12-18 17:27:17.471032472] [1ee79b305b5fdbd577231ad6dd8f2951] [12] [audit] {"auditable_type":"Cluster","auditable_id":"b8004e7e-cc57-499c-a1e8-38cc5ba78220","action":"create","user_type":"User","remote_address":null,"request_uuid":"92f4bef8-a9a4-466b-9644-6bdf1775f24c","workspace_slug":"wfqmr","user_name":"admin@deckhouse.io"}
2024-12-18 17:28:29.121 [2024-12-18 17:28:29.115108882] [8ebaca9149c21f8949a877b27bb363f8] [12] [audit] {"auditable_type":"Record","auditable_id":"477b5718-b18b-4468-b95b-700b3068ad91","action":"create","user_type":"User","remote_address":null,"request_uuid":"7b590af2-acfb-4708-a66f-09e42c1f4b7c","workspace_slug":"wfqmr","user_name":"admin@deckhouse.io"}
2024-12-18 17:28:29.148 [2024-12-18 17:28:29.146244011] [612d6c1911933035ff3a4800a9faa6cf] [12] [audit] {"auditable_type":"Record","auditable_id":"9a5ad514-c7cf-41e1-ad12-edadb6fd521c","action":"create","user_type":"User","remote_address":null,"request_uuid":"e462f9da-e879-4795-9506-2dbb747fed08","workspace_slug":"wfqmr","user_name":"admin@deckhouse.io"}
2024-12-18 17:57:12.360 [2024-12-18 17:57:12.358538533] [6472d45c-bdd7-4c08-8159-4c90939ed47c] [12] [audit] {"auditable_type":"Cluster","auditable_id":"b8004e7e-cc57-499c-a1e8-38cc5ba78220","action":"update","user_type":null,"remote_address":null,"request_uuid":"cfe477a3-3a30-49fe-9a59-55f7030b890d","workspace_slug":"wfqmr","user_name":null}
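Before wiring up log-shipper, you can check on the fly that audit events are being emitted; this assumes the API server pods carry the app: backend label used in the configuration example below:

# Print only the audit events from the Commander backend pods.
kubectl -n d8-commander logs -l app=backend --tail=1000 | grep '\[audit\]'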
Configuration example:
apiVersion: deckhouse.io/v1alpha1
kind: ClusterLoggingConfig
metadata:
  name: commander-audit-logs
spec:
  destinationRefs:
    - loki-example
  kubernetesPods:
    labelSelector:
      matchLabels:
        app: backend
    namespaceSelector:
      matchNames:
        - d8-commander
  labelFilter:
    - field: message
      operator: Regex
      values:
        - .*\[audit\].*
  type: KubernetesPods
---
apiVersion: deckhouse.io/v1alpha1
kind: ClusterLogDestination
metadata:
  name: loki-example
spec:
  type: Loki
  loki:
    endpoint: http://loki-example.loki.svc:3100
For more detailed configuration information, see the documentation of the log-shipper module.
Changing the storage class
Option 1 (preferred)
- Perform a backup of the database instance (an optional dump integrity check is sketched right after the command):

  kubectl -n d8-commander exec -t commander-postgres-0 -- su - postgres -c "pg_dump -Fc -b -v -d commander" > commander.dump
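  Optionally, you can sanity-check the dump before proceeding: pg_restore -l only lists the archive's table of contents and does not modify anything (it requires the pg_restore client locally):

  pg_restore -l commander.dump | head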
- Change storageClass in the module settings, replacing <NEW_STORAGECLASS_NAME> with the name of the required storage class. The list of available storage classes can be found with kubectl get storageclasses.

  kubectl patch moduleconfig commander --type=merge -p '{"spec":{"settings":{"postgres":{"internal":{"storageClass":"<NEW_STORAGECLASS_NAME>"}}}}}'
  moduleconfig.deckhouse.io/commander patched

  Wait until the Deckhouse queue is empty:

  kubectl -n d8-system exec svc/deckhouse-leader -c deckhouse -- deckhouse-controller queue main
  Queue 'main': length 0, status: 'waiting for task 5s'

  Check the logs of the postgres operator:

  kubectl -n d8-operator-postgres logs deployments/operator-postgres
  {"cluster-name":"d8-commander/commander-postgres","level":"info","msg":"cluster has been updated","pkg":"controller","time":"2024-05-19T20:36:22Z","worker":0}
- Increase the number of replicas of the PostgreSQL database (optional)

  Skip this step if HighAvailability mode is active and PostgreSQL already has 2 replicas. A command to watch the new replica come up is sketched at the end of this step.

  kubectl -n d8-commander patch postgresqls.acid.zalan.do commander-postgres --type=merge -p '{"spec":{"numberOfInstances":2}}'
  postgresql.acid.zalan.do/commander-postgres patched

  Check the logs of the operator and of the postgres instance:

  kubectl -n d8-operator-postgres logs deployments/operator-postgres
  {"cluster-name":"d8-commander/commander-postgres","level":"info","msg":"cluster has been updated","pkg":"controller","time":"2024-05-19T20:36:22Z","worker":0}

  kubectl -n d8-commander logs commander-postgres-1
  2024-05-19 20:38:15,648 INFO: no action. I am (commander-postgres-1), a secondary, and following a leader (commander-postgres-0)
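  While the new replica is being provisioned, it can be convenient to watch the postgres pods directly; the application=spilo label is the same one used for the PVCs later in this procedure:

  kubectl -n d8-commander get po --selector application=spilo -w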
- Perform the master switch

  kubectl -n d8-commander exec -it commander-postgres-0 -- patronictl failover

  Current cluster topology
  + Cluster: commander-postgres -------+---------+---------+----+-----------+
  | Member               | Host         | Role    | State   | TL | Lag in MB |
  +----------------------+--------------+---------+---------+----+-----------+
  | commander-postgres-0 | 10.111.3.167 | Leader  | running |  5 |           |
  | commander-postgres-1 | 10.111.2.239 | Replica | running |  5 |         0 |
  +----------------------+--------------+---------+---------+----+-----------+
  Candidate ['commander-postgres-1'] []: commander-postgres-1
  Are you sure you want to failover cluster commander-postgres, demoting current leader commander-postgres-0? [y/N]: y
  2024-05-19 20:40:52.63041 Successfully failed over to "commander-postgres-1"
  + Cluster: commander-postgres -------+---------+---------+----+-----------+
  | Member               | Host         | Role    | State   | TL | Lag in MB |
  +----------------------+--------------+---------+---------+----+-----------+
  | commander-postgres-0 | 10.111.3.167 | Replica | stopped |    |   unknown |
  | commander-postgres-1 | 10.111.2.239 | Leader  | running |  5 |           |
  +----------------------+--------------+---------+---------+----+-----------+

  Make sure that both DB instances are in the running state:

  kubectl -n d8-commander exec -t commander-postgres-0 -- patronictl list

  + Cluster: commander-postgres -------+---------+---------+----+-----------+
  | Member               | Host         | Role    | State   | TL | Lag in MB |
  +----------------------+--------------+---------+---------+----+-----------+
  | commander-postgres-0 | 10.111.3.167 | Replica | running |  6 |         0 |
  | commander-postgres-1 | 10.111.2.239 | Leader  | running |  6 |           |
  +----------------------+--------------+---------+---------+----+-----------+

  Check that the disk of the new DB replica was created with the required storageClass:

  kubectl -n d8-commander get pvc --selector application=spilo

  NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
  pgdata-commander-postgres-0   Bound    pvc-fd80fde4-d0e2-4b5f-9e3a-eac998191f11   2Gi        RWO            network-hdd    36h
  pgdata-commander-postgres-1   Bound    pvc-7af2f442-3097-4fe3-a795-5ad18bb11351   2Gi        RWO            network-ssd    2m54s
- Delete the disk and pod of the first postgres instance

  kubectl -n d8-commander delete pvc pgdata-commander-postgres-0 --wait=false
  kubectl -n d8-commander delete po commander-postgres-0

  Check logs:

  kubectl -n d8-commander logs commander-postgres-0
  2024-05-19 20:43:33,293 INFO: Lock owner: commander-postgres-1; I am commander-postgres-0
  2024-05-19 20:43:33,293 INFO: establishing a new patroni connection to the postgres cluster
  2024-05-19 20:43:33,357 INFO: no action. I am (commander-postgres-0), a secondary, and following a leader (commander-postgres-1)

  Check that the disk was created with the correct storageClass:

  kubectl -n d8-commander get pvc

  NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
  pgdata-commander-postgres-0   Bound    pvc-fd80fde4-d0e2-4b5f-9e3a-eac998191f11   2Gi        RWO            network-ssd    2m6s
  pgdata-commander-postgres-1   Bound    pvc-7af2f442-3097-4fe3-a795-5ad18bb11351   2Gi        RWO            network-ssd    7m11s
- Perform the master switch one more time

  kubectl -n d8-commander exec -it commander-postgres-0 -- patronictl failover

  Current cluster topology
  + Cluster: commander-postgres -------+---------+---------+----+-----------+
  | Member               | Host         | Role    | State   | TL | Lag in MB |
  +----------------------+--------------+---------+---------+----+-----------+
  | commander-postgres-0 | 10.111.3.189 | Replica | running |  6 |         0 |
  | commander-postgres-1 | 10.111.2.239 | Leader  | running |  6 |           |
  +----------------------+--------------+---------+---------+----+-----------+
  Candidate ['commander-postgres-0'] []: commander-postgres-0
  Are you sure you want to failover cluster commander-postgres, demoting current leader commander-postgres-1? [y/N]: y
  2024-05-19 20:46:11.69855 Successfully failed over to "commander-postgres-0"
  + Cluster: commander-postgres -------+---------+---------+----+-----------+
  | Member               | Host         | Role    | State   | TL | Lag in MB |
  +----------------------+--------------+---------+---------+----+-----------+
  | commander-postgres-0 | 10.111.3.189 | Leader  | running |  6 |           |
  | commander-postgres-1 | 10.111.2.239 | Replica | stopped |    |   unknown |
  +----------------------+--------------+---------+---------+----+-----------+

  Make sure that both DB instances are in the running state:

  kubectl -n d8-commander exec -t commander-postgres-0 -- patronictl list

  + Cluster: commander-postgres -------+---------+---------+----+-----------+
  | Member               | Host         | Role    | State   | TL | Lag in MB |
  +----------------------+--------------+---------+---------+----+-----------+
  | commander-postgres-0 | 10.111.3.189 | Leader  | running |  6 |         0 |
  | commander-postgres-1 | 10.111.2.239 | Replica | running |  6 |           |
  +----------------------+--------------+---------+---------+----+-----------+
- Reduce the number of replicas of the PostgreSQL database (optional)

  Skip this step if HighAvailability mode is active and PostgreSQL has 2 replicas.

  kubectl -n d8-commander patch postgresqls.acid.zalan.do commander-postgres --type=merge -p '{"spec":{"numberOfInstances":1}}'
  postgresql.acid.zalan.do/commander-postgres patched

  Check the operator logs:

  kubectl -n d8-operator-postgres logs deployments/operator-postgres
  {"cluster-name":"d8-commander/commander-postgres","level":"info","msg":"cluster has been updated","pkg":"controller","time":"2024-05-19T20:50:22Z","worker":0}
- Delete the disk and pod of the second postgres instance (if HighAvailability mode is active and PostgreSQL has 2 replicas)

  Skip this step if HighAvailability mode is not active.

  kubectl -n d8-commander delete pvc pgdata-commander-postgres-1 --wait=false
  kubectl -n d8-commander delete po commander-postgres-1

  Check logs:

  kubectl -n d8-commander logs commander-postgres-1
  2024-05-19 20:53:33,293 INFO: Lock owner: commander-postgres-0; I am commander-postgres-1
  2024-05-19 20:53:33,293 INFO: establishing a new patroni connection to the postgres cluster
  2024-05-19 20:53:33,357 INFO: no action. I am (commander-postgres-1), a secondary, and following a leader (commander-postgres-0)

  Check that the disk was created with the required storageClass:

  kubectl -n d8-commander get pvc

  NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
  pgdata-commander-postgres-0   Bound    pvc-fd80fde4-d0e2-4b5f-9e3a-eac998191f11   2Gi        RWO            network-ssd    7m6s
  pgdata-commander-postgres-1   Bound    pvc-7af2f442-3097-4fe3-a795-5ad18bb11351   2Gi        RWO            network-ssd    1m11s

  Make sure that both DB instances are in the running state:

  kubectl -n d8-commander exec -t commander-postgres-0 -- patronictl list

  + Cluster: commander-postgres -------+---------+---------+----+-----------+
  | Member               | Host         | Role    | State   | TL | Lag in MB |
  +----------------------+--------------+---------+---------+----+-----------+
  | commander-postgres-0 | 10.111.3.189 | Leader  | running |  6 |         0 |
  | commander-postgres-1 | 10.111.2.239 | Replica | running |  6 |           |
  +----------------------+--------------+---------+---------+----+-----------+
- Delete the unused disk of the temporary database replica (if HighAvailability mode is not active)

  Skip this step if HighAvailability mode is active and PostgreSQL has 2 replicas.

  kubectl -n d8-commander delete pvc pgdata-commander-postgres-1
  persistentvolumeclaim "pgdata-commander-postgres-1" deleted
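After the migration, it is worth confirming that the Commander components are healthy; a minimal check (pod names differ between versions) is to make sure all pods in the module namespace are Running:

kubectl -n d8-commander get po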
Option 2
- Perform a backup of the database instance:

  kubectl -n d8-commander exec -t commander-postgres-0 -- su - postgres -c "pg_dump -Fc -b -v -d commander" > commander.dump
- Turn off the commander module

  kubectl patch moduleconfig commander --type=merge -p '{"spec":{"enabled":false}}'
  moduleconfig.deckhouse.io/commander patched

  Wait until the Deckhouse queue is empty:

  kubectl -n d8-system exec svc/deckhouse-leader -c deckhouse -- deckhouse-controller queue main
  Queue 'main': length 0, status: 'waiting for task 5s'

  Check that the d8-commander namespace has been deleted:

  kubectl get namespace d8-commander
  Error from server (NotFound): namespaces "d8-commander" not found
- Set the required storage class and enable the commander module

  kubectl patch moduleconfig commander --type=merge -p '{"spec":{"enabled":true,"settings":{"postgres":{"internal":{"storageClass":"<NEW_STORAGECLASS_NAME>"}}}}}'
  moduleconfig.deckhouse.io/commander patched

  Wait until the Deckhouse queue is empty:

  kubectl -n d8-system exec svc/deckhouse-leader -c deckhouse -- deckhouse-controller queue main
  Queue 'main': length 0, status: 'waiting for task 5s'

  Check that the DB instance has the Running status:

  kubectl -n d8-commander get po commander-postgres-0
  NAME                   READY   STATUS    RESTARTS   AGE
  commander-postgres-0   1/1     Running   0          2m4s
- Restore the previously saved backup of the database (a quick verification is sketched after the command):

  kubectl -n d8-commander exec -it commander-postgres-0 -- su - postgres -c "pg_restore -v -c --if-exists -Fc -d commander" < commander.dump
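  A quick way to confirm that the data is back, for example, is to list the tables in the restored database; any non-empty listing is a good sign:

  kubectl -n d8-commander exec -t commander-postgres-0 -- su - postgres -c "psql -d commander -c '\dt'"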