Commander internals

[Architecture diagram]

Deckhouse Commander has one external dependency: a PostgreSQL database.

The API server is the central component; its data is stored in PostgreSQL. Options for installing Deckhouse Commander with a DBMS are listed in the Installation section.

The API server provides external APIs (for the web application and for external integrations) as well as internal APIs for working with clusters.

The web application uses the API to manage clusters and other Deckhouse Commander entities.

Clusters are managed through asynchronous operations called tasks. The cluster manager is a service that monitors tasks and executes them. A task can be a cluster installation, a cluster deletion, or a reconciliation of the cluster state with the specified configuration.

The cluster manager is single-threaded, so cluster processing throughput depends on the number of clusters and the number of cluster manager replicas. When a cluster is created in the API, the API server creates an installation task, and a free cluster manager instance picks it up. The same happens for cluster update, deletion, and reconciliation operations. The cluster manager is also responsible for launching the installers, the dhctl servers: for each Deckhouse Kubernetes Platform (DKP) cluster, it launches a dhctl server replica of exactly the required version.
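
As a quick runtime sketch, you can list the pods in the module's namespace to see these components; the exact pod names vary between versions and are not guaranteed here:

# Expect to see the API server, cluster manager replicas, and per-version
# dhctl server pods (pod naming is illustrative, not guaranteed).
kubectl -n d8-commander get pods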

In each application cluster, Deckhouse Commander automatically installs the commander-agent module. The module is responsible for synchronizing Kubernetes resources in the application cluster and for sending telemetry to the Deckhouse Commander API server. Telemetry includes basic metrics (total number of CPUs, total memory, number of nodes, and total storage), the DKP version, the Kubernetes version, and the availability of DKP components.
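
As a sketch, the agent's configuration can be inspected from inside an application cluster via the standard DKP ModuleConfig resource (settings such as commanderUrl, shown later on this page, live under spec.settings):

# Run inside the application cluster: check that the commander-agent
# module is enabled and inspect its settings (including commanderUrl).
kubectl get moduleconfig commander-agent -o yaml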

Network availability requirements for components

Deckhouse Commander components require network connectivity with application clusters. Constant connectivity is not required; however, the target and actual cluster configurations are compared only while a connection is available, and only with network access can a cluster be brought to its desired state. The following network access is needed for full functionality (a connectivity check sketch follows the list):

  • 22/TCP from Deckhouse Commander to the application cluster
    • Cluster Manager (dhctl) connects via SSH to the master node for initial setup or for destroying the cluster.
  • 443/TCP or 80/TCP from the application cluster to Deckhouse Commander
    • 443, 80, or another value, depending on the ingress controller parameters and on the https.mode parameter in the control cluster, specified in the global settings or in the Deckhouse Commander module settings;
    • the agent collects configuration to update it and also sends telemetry to the server;
    • the agent creates a TCP tunnel through which the Cluster Manager controls the infrastructure part of the configuration.
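
A minimal connectivity check sketch, assuming placeholder values <MASTER_NODE_IP> for an application cluster master node and <COMMANDER_DOMAIN> for the Deckhouse Commander address (both are hypothetical placeholders):

# From the Deckhouse Commander side: verify SSH reachability of the master node.
nc -vz <MASTER_NODE_IP> 22

# From the application cluster side: verify HTTP(S) reachability of the
# Commander API (use http:// and port 80 if https.mode disables TLS).
curl -skI https://<COMMANDER_DOMAIN>/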

By default, Deckhouse Commander configures the commander-agent to use HTTPS. To change this, you need to explicitly set the commander-agent module configuration in the cluster template:

apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: commander-agent
  labels:
    heritage: deckhouse-commander
spec:
  enabled: true
  version: 1
  settings:
    # http is set explicitly ↓
    commanderUrl: "http://{{ .dc_domain }}/agent_api/{{ .dc_clusterUUID }}"

Data encryption

Deckhouse Commander encrypts sensitive data stored in the database using keys that are automatically generated when the module is enabled and stored in the commander-envs secret.

Attention: it is extremely important to save the keys in a secure location so that the database can be restored if problems occur. Without the keys, the data cannot be restored!

❯ kubectl -n d8-commander get secret commander-envs -oyaml
apiVersion: v1
data:
  ACTIVE_RECORD_ENCRYPTION_DETERMINISTIC_KEY: YVBBNVh5QUxoZjc1Tk5uTXphc3BXN2FrVGZacDBsUFk=
  ACTIVE_RECORD_ENCRYPTION_KEY_DERIVATION_SALT: eEVZMGR0NlRaY0FNZzUySzdPODR3WXpranZiQTYySHo=
  ACTIVE_RECORD_ENCRYPTION_PRIMARY_KEY: RUdZOFdodWxVT1hpeHlib2Q3Wld3TUlMNjhSOW81a0M=
kind: Secret
metadata:
...
  name: commander-envs
  namespace: d8-commander
type: Opaque
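
For example, the keys can be saved outside the cluster with a plain YAML dump (keep the resulting file in a secure location, such as a secrets vault):

# Save the encryption keys to a file for safekeeping.
kubectl -n d8-commander get secret commander-envs -o yaml > commander-envs-backup.yaml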

Collecting change history logs

In Deckhouse Commander 1.9 and later, change history events are written to standard output and tagged with the [audit] label. These logs can be collected and shipped using the log-shipper module.

Example logs:

2024-12-18 17:27:17.503 [2024-12-18 17:27:17.466636971] [1ee79b305b5fdbd577231ad6dd8f2951] [12] [audit] {"auditable_type":"ClusterSettings","auditable_id":"7f0e29e8-f427-4c5f-9ecf-de457b2d427f","action":"create","user_type":"User","remote_address":null,"request_uuid":"05a02066-a8e9-4e23-a79f-5e5a32533e61","workspace_slug":"wfqmr","user_name":"admin@deckhouse.io"}
2024-12-18 17:27:17.503 [2024-12-18 17:27:17.471032472] [1ee79b305b5fdbd577231ad6dd8f2951] [12] [audit] {"auditable_type":"Cluster","auditable_id":"b8004e7e-cc57-499c-a1e8-38cc5ba78220","action":"create","user_type":"User","remote_address":null,"request_uuid":"92f4bef8-a9a4-466b-9644-6bdf1775f24c","workspace_slug":"wfqmr","user_name":"admin@deckhouse.io"}
2024-12-18 17:28:29.121 [2024-12-18 17:28:29.115108882] [8ebaca9149c21f8949a877b27bb363f8] [12] [audit] {"auditable_type":"Record","auditable_id":"477b5718-b18b-4468-b95b-700b3068ad91","action":"create","user_type":"User","remote_address":null,"request_uuid":"7b590af2-acfb-4708-a66f-09e42c1f4b7c","workspace_slug":"wfqmr","user_name":"admin@deckhouse.io"}
2024-12-18 17:28:29.148 [2024-12-18 17:28:29.146244011] [612d6c1911933035ff3a4800a9faa6cf] [12] [audit] {"auditable_type":"Record","auditable_id":"9a5ad514-c7cf-41e1-ad12-edadb6fd521c","action":"create","user_type":"User","remote_address":null,"request_uuid":"e462f9da-e879-4795-9506-2dbb747fed08","workspace_slug":"wfqmr","user_name":"admin@deckhouse.io"}
2024-12-18 17:57:12.360 [2024-12-18 17:57:12.358538533] [6472d45c-bdd7-4c08-8159-4c90939ed47c] [12] [audit] {"auditable_type":"Cluster","auditable_id":"b8004e7e-cc57-499c-a1e8-38cc5ba78220","action":"update","user_type":null,"remote_address":null,"request_uuid":"cfe477a3-3a30-49fe-9a59-55f7030b890d","workspace_slug":"wfqmr","user_name":null}
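
These events can also be inspected directly, without log-shipper; the app=backend label selector below is an assumption inferred from the configuration example that follows:

# Tail audit events from the API server pods (label selector assumed).
kubectl -n d8-commander logs -l app=backend --tail=200 | grep '\[audit\]'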

Configuration example:

apiVersion: deckhouse.io/v1alpha1
kind: ClusterLoggingConfig
metadata:
  name: commander-audit-logs
spec:
  destinationRefs:
  - loki-example
  kubernetesPods:
    labelSelector:
      matchLabels:
        app: backend
    namespaceSelector:
      matchNames:
      - d8-commander
  labelFilter:
  - field: message
    operator: Regex
    values:
    - .*\[audit\].*
  type: KubernetesPods
---
apiVersion: deckhouse.io/v1alpha1
kind: ClusterLogDestination
metadata:
  name: loki-example
spec:
  type: Loki
  loki:
    endpoint: http://loki-example.loki.svc:3100

For more detailed configuration information, see the documentation of the log-shipper module.

Changing the storage class

Option 1 (preferred)

  1. Perform a backup of the database instance

    kubectl -n d8-commander exec -t commander-postgres-0 -- su - postgres -c "pg_dump -Fc -b -v -d commander" > commander.dump
    
  2. Change storageClass in the module settings, replacing <NEW_STORAGECLASS_NAME> with the name of the necessary storage class

    The list of available storage classes can be found using the command kubectl get storageclasses

    kubectl patch moduleconfig commander --type=merge -p '{"spec":{"settings":{"postgres":{"internal":{"storageClass":"<NEW_STORAGECLASS_NAME>"}}}}}'
    
    moduleconfig.deckhouse.io/commander patched
    

    Wait until the Deckhouse queue is empty

    kubectl -n d8-system exec svc/deckhouse-leader -c deckhouse -- deckhouse-controller queue main
    
    Queue 'main': length 0, status: 'waiting for task 5s'
    

    Check the logs of the postgres operator

    kubectl -n d8-operator-postgres logs deployments/operator-postgres
    
    {"cluster-name":"d8-commander/commander-postgres","level":"info","msg":"cluster has been updated","pkg":"controller","time":"2024-05-19T20:36:22Z","worker":0}
    
  3. Increase the number of replicas of the PostgreSQL database (optional)

    Skip this step if HighAvailability mode is active and PostgreSQL already has 2 replicas

    kubectl -n d8-commander patch postgresqls.acid.zalan.do commander-postgres --type=merge -p '{"spec":{"numberOfInstances":2}}'
    
    postgresql.acid.zalan.do/commander-postgres patched
    

    Check the logs of the operator and the postgres instance

    kubectl -n d8-operator-postgres logs deployments/operator-postgres
    
    {"cluster-name":"d8-commander/commander-postgres","level":"info","msg":"cluster has been updated","pkg":"controller","time":"2024-05-19T20:36:22Z","worker":0}
    
    kubectl -n d8-commander logs commander-postgres-1
    
    2024-05-19 20:38:15,648 INFO: no action. I am (commander-postgres-1), a secondary, and following a leader (commander-postgres-0)
    
  4. Perform the master switch

    kubectl -n d8-commander exec -it commander-postgres-0 -- patronictl failover
    
    Current cluster topology
    + Cluster: commander-postgres --------+---------+---------+----+-----------+
    | Member               | Host         | Role    | State   | TL | Lag in MB |
    +----------------------+--------------+---------+---------+----+-----------+
    | commander-postgres-0 | 10.111.3.167 | Leader  | running |  5 |           |
    | commander-postgres-1 | 10.111.2.239 | Replica | running |  5 |         0 |
    +----------------------+--------------+---------+---------+----+-----------+
    Candidate ['commander-postgres-1'] []: commander-postgres-1
    Are you sure you want to failover cluster commander-postgres, demoting current leader commander-postgres-0? [y/N]: y
    2024-05-19 20:40:52.63041 Successfully failed over to "commander-postgres-1"
    + Cluster: commander-postgres --------+---------+---------+----+-----------+
    | Member               | Host         | Role    | State   | TL | Lag in MB |
    +----------------------+--------------+---------+---------+----+-----------+
    | commander-postgres-0 | 10.111.3.167 | Replica | stopped |    |   unknown |
    | commander-postgres-1 | 10.111.2.239 | Leader  | running |  5 |           |
    +----------------------+--------------+---------+---------+----+-----------+
    

    Make sure that both DB instances are in the running state

    kubectl -n d8-commander exec -t commander-postgres-0 -- patronictl list
    + Cluster: commander-postgres --------+---------+---------+----+-----------+
    | Member               | Host         | Role    | State   | TL | Lag in MB |
    +----------------------+--------------+---------+---------+----+-----------+
    | commander-postgres-0 | 10.111.3.167 | Replica | running |  6 |         0 |
    | commander-postgres-1 | 10.111.2.239 | Leader  | running |  6 |           |
    +----------------------+--------------+---------+---------+----+-----------+
    

    Check that the disk of the new DB replica was created with the necessary storageClass

    kubectl -n d8-commander get pvc --selector application=spilo
    NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    pgdata-commander-postgres-0   Bound    pvc-fd80fde4-d0e2-4b5f-9e3a-eac998191f11   2Gi        RWO            network-hdd    36h
    pgdata-commander-postgres-1   Bound    pvc-7af2f442-3097-4fe3-a795-5ad18bb11351   2Gi        RWO            network-ssd    2m54s
    
  5. Delete the disk and pod of the first postgres instance

    kubectl -n d8-commander delete pvc pgdata-commander-postgres-0 --wait=false
    kubectl -n d8-commander delete po commander-postgres-0
    

    Check logs

    kubectl -n d8-commander logs commander-postgres-0
    
    2024-05-19 20:43:33,293 INFO: Lock owner: commander-postgres-1; I am commander-postgres-0
    2024-05-19 20:43:33,293 INFO: establishing a new patroni connection to the postgres cluster
    2024-05-19 20:43:33,357 INFO: no action. I am (commander-postgres-0), a secondary, and following a leader (commander-postgres-1)
    

    Check that the disk was created with the correct storageClass

    kubectl -n d8-commander get pvc
    NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    pgdata-commander-postgres-0   Bound    pvc-fd80fde4-d0e2-4b5f-9e3a-eac998191f11   2Gi        RWO            network-ssd    2m6s
    pgdata-commander-postgres-1   Bound    pvc-7af2f442-3097-4fe3-a795-5ad18bb11351   2Gi        RWO            network-ssd    7m11s
    
  6. Perform the master switch one more time

    kubectl -n d8-commander exec -it commander-postgres-0  -- patronictl failover
    
    Current cluster topology
    + Cluster: commander-postgres --------+---------+---------+----+-----------+
    | Member               | Host         | Role    | State   | TL | Lag in MB |
    +----------------------+--------------+---------+---------+----+-----------+
    | commander-postgres-0 | 10.111.3.189 | Replica | running |  6 |         0 |
    | commander-postgres-1 | 10.111.2.239 | Leader  | running |  6 |           |
    +----------------------+--------------+---------+---------+----+-----------+
    Candidate ['commander-postgres-0'] []: commander-postgres-0
    Are you sure you want to failover cluster commander-postgres, demoting current leader commander-postgres-1? [y/N]: y
    2024-05-19 20:46:11.69855 Successfully failed over to "commander-postgres-0"
    + Cluster: commander-postgres --------+---------+---------+----+-----------+
    | Member               | Host         | Role    | State   | TL | Lag in MB |
    +----------------------+--------------+---------+---------+----+-----------+
    | commander-postgres-0 | 10.111.3.189 | Leader  | running |  6 |           |
    | commander-postgres-1 | 10.111.2.239 | Replica | stopped |    |   unknown |
    +----------------------+--------------+---------+---------+----+-----------+
    

    Make sure that both DB instances are in the running state

    kubectl -n d8-commander exec -t commander-postgres-0 -- patronictl list
    + Cluster: commander-postgres --------+---------+---------+----+-----------+
    | Member               | Host         | Role    | State   | TL | Lag in MB |
    +----------------------+--------------+---------+---------+----+-----------+
    | commander-postgres-0 | 10.111.3.189 | Leader  | running |  6 |         0 |
    | commander-postgres-1 | 10.111.2.239 | Replica | running |  6 |           |
    +----------------------+--------------+---------+---------+----+-----------+
    
  7. Reduce the number of replicas of the PostgreSQL database (optional)

    Skip this step if HighAvailability mode is active and PostgreSQL runs 2 replicas

    kubectl -n d8-commander patch postgresqls.acid.zalan.do commander-postgres --type=merge -p '{"spec":{"numberOfInstances":1}}'
    
    postgresql.acid.zalan.do/commander-postgres patched
    

    Check the operator logs

    kubectl -n d8-operator-postgres logs deployments/operator-postgres
    
    {"cluster-name":"d8-commander/commander-postgres","level":"info","msg":"cluster has been updated","pkg":"controller","time":"2024-05-19T20:50:22Z","worker":0}
    
  • Delete the disk and pod of the second instance (if HighAvailability mode is active and PostgreSQL has 2 replicas)

    Skip this step if HighAvailability mode is not active

    kubectl -n d8-commander delete pvc pgdata-commander-postgres-1 --wait=false
    kubectl -n d8-commander delete po commander-postgres-1
    

    Check logs

    kubectl -n d8-commander logs commander-postgres-1
    
    2024-05-19 20:53:33,293 INFO: Lock owner: commander-postgres-0; I am commander-postgres-1
    2024-05-19 20:53:33,293 INFO: establishing a new patroni connection to the postgres cluster
    2024-05-19 20:53:33,357 INFO: no action. I am (commander-postgres-1), a secondary, and following a leader (commander-postgres-0)
    

    Check that the disk was created with the necessary storageClass

    kubectl -n d8-commander get pvc
    NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    pgdata-commander-postgres-0   Bound    pvc-fd80fde4-d0e2-4b5f-9e3a-eac998191f11   2Gi        RWO            network-ssd    7m6s
    pgdata-commander-postgres-1   Bound    pvc-7af2f442-3097-4fe3-a795-5ad18bb11351   2Gi        RWO            network-ssd    1m11s
    

    Make sure that both DB instances are in the running state

    kubectl -n d8-commander exec -t commander-postgres-0 -- patronictl list
    + Cluster: commander-postgres --------+---------+---------+----+-----------+
    | Member               | Host         | Role    | State   | TL | Lag in MB |
    +----------------------+--------------+---------+---------+----+-----------+
    | commander-postgres-0 | 10.111.3.189 | Leader  | running |  6 |         0 |
    | commander-postgres-1 | 10.111.2.239 | Replica | running |  6 |           |
    +----------------------+--------------+---------+---------+----+-----------+
    
  • Delete the unused disk of the temporary database replica (if HighAvailability is not active)

    Skip this step if HighAvailability mode is active and PostgreSQL has 2 replicas

    kubectl -n d8-commander delete pvc pgdata-commander-postgres-1
    
    persistentvolumeclaim "pgdata-commander-postgres-1" deleted
    

Option 2

  1. Perform a backup of the database instance

    kubectl -n d8-commander exec -t commander-postgres-0 -- su - postgres -c "pg_dump -Fc -b -v -d commander" > commander.dump
    
  2. Turn off the commander module

    kubectl patch moduleconfig commander --type=merge -p '{"spec":{"enabled":false}}'
    
    moduleconfig.deckhouse.io/commander patched
    

    Wait until the Deckhouse queue is empty

    kubectl -n d8-system exec svc/deckhouse-leader -c deckhouse -- deckhouse-controller queue main
    
    Queue 'main': length 0, status: 'waiting for task 5s'
    

    Check that the d8-commander namespace has been deleted

    kubectl get namespace d8-commander
    Error from server (NotFound): namespaces "d8-commander" not found
    
  3. Set the required storage class and enable the commander module

     kubectl patch moduleconfig commander --type=merge -p '{"spec":{"enabled":true,"settings":{"postgres":{"internal":{"storageClass":"<NEW_STORAGECLASS_NAME>"}}}}}'
    
     moduleconfig.deckhouse.io/commander patched
    

    Wait until the Deckhouse queue is empty

    kubectl -n d8-system exec svc/deckhouse-leader -c deckhouse -- deckhouse-controller queue main
    
    Queue 'main': length 0, status: 'waiting for task 5s'
    

    Check that the DB instance has the Running status

    kubectl -n d8-commander get po commander-postgres-0
    
    NAME                   READY   STATUS    RESTARTS   AGE
    commander-postgres-0   1/1     Running   0          2m4s
    
  4. Restore the previously saved database backup

    kubectl -n d8-commander exec -it commander-postgres-0 -- su - postgres -c "pg_restore -v -c --if-exists -Fc -d commander" < commander.dump
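
    As a quick sanity check after the restore (a sketch: listing tables only confirms that the schema is back, not full data integrity):

    # List tables in the restored database.
    kubectl -n d8-commander exec -t commander-postgres-0 -- su - postgres -c "psql -d commander -c '\dt'"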