Commander internals

Diagram

Commander has one external dependency: a PostgreSQL database.

The API server is the central component; it stores its data in PostgreSQL. The options for installing Commander together with a DBMS are listed in the Installation section below.

The API server provides both external APIs, used by the web application and for external integrations, and internal APIs for working with clusters.

The web application uses the API to manage clusters and other Commander entities.

Clusters are managed through asynchronous operations called tasks. The cluster manager is a service that monitors tasks and executes them. A task can be a cluster installation, a cluster deletion, or a reconciliation of the cluster state with the specified configuration.

The cluster manager is single-threaded, so cluster processing throughput depends on the number of clusters and the number of cluster manager replicas. When a cluster is created via the API, the API server creates an installation task, and a free cluster manager instance picks it up. The same happens for cluster update, deletion, and reconciliation operations.
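
To see how many replicas of each Commander component are currently running (the exact Deployment names may vary between Commander versions), you can list the Deployments in the module namespace:

kubectl -n d8-commander get deployments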

The cluster manager manages clusters through a dedicated component, dhctl server. In the target architecture, the cluster manager launches a dhctl server replica of exactly the required version for each DKP cluster individually. However, dhctl server is still under active development, so there is currently a limit on the DKP versions that Commander can install. See the “Current limitations” section below.

In each cluster, Commander automatically installs the commander-agent module. This module synchronizes Kubernetes resources in the application cluster and sends telemetry to the Commander API server. Telemetry currently includes basic metrics (CPU, memory, number of nodes, and total storage space), the DKP version, the Kubernetes version, and the availability of DKP components. In addition to the agent, the console module is installed in the cluster; it is used to manage an individual cluster via Commander on the Administration tab. Service components in the application cluster are placed on system nodes by default.
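
Assuming the agent and console modules are enabled through ModuleConfig resources, as is usual for DKP modules, you can check their state from inside an application cluster:

kubectl get moduleconfig commander-agent console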

Commander also uses additional services that are not shown in the diagram: the renderer and the connector. The renderer generates and validates cluster configurations, while the connector powers the cluster administration interface.

Resource requirements

To start using Commander, we recommend creating a fault-tolerant management cluster that includes the following node groups (NodeGroup):

NodeGroup    Number of nodes   CPU, cores   Memory, GB   Disk, GB
master       3                 4            8            50
system       2                 4            8            50
frontend     2                 4            8            50
commander    3                 8            12           50
  • PostgreSQL in HighAvailability mode with two replicas requires 1 core and 1 GB of memory on 2 separate nodes.
  • The API server in HighAvailability mode with two replicas needs 1 core and 1 GB of memory on two separate nodes.
  • Service components used for rendering configurations and connecting to application clusters require 0.5 cores and 128 MB of memory per cluster.
  • The cluster manager and dhctl server together require resources proportional to the number of clusters they serve and the number of DKP versions they serve simultaneously.
  • Up to 2 cores per node can be occupied by DKP service components (for example: runtime-audit-engine, istio, cilium, log-shipper).
Number of clusters   CPU, cores   Memory, GB   Number of 8/8 nodes   Number of 8/12 nodes
10                   9            16           3 (=24/24)            2 (=16/24)
25                   10           19           3 (=24/24)            3 (=24/36)
100                  15           29           4 (=32/32)            4 (=32/48)
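
To illustrate the first row: two PostgreSQL replicas (about 2 cores) plus two API server replicas (about 2 cores) plus 10 × 0.5 = 5 cores for the per-cluster service components give the 9 cores listed for 10 clusters; the rest of the memory budget is presumably consumed by the cluster manager and dhctl server.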

Installation

  • If you are using your own database server, set the connection parameters in ModuleConfig/commander.

  • The PostgreSQL database can also be deployed in the cluster with postgres-operator, which is installed by enabling the operator-postgres module. In this case, enable operator-postgres first and make sure that its CRDs appear in the cluster; then enable the Commander module.

Below, we will describe these options in more detail.

If you are using your own PostgreSQL installation

Important: PostgreSQL extensions plpgsql and pgcrypto are required for Commander to work properly.
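
If you manage the database server yourself, the extensions can be created with psql (a minimal sketch, assuming a database named commander and a role with privileges to create extensions):

psql -d commander -c 'CREATE EXTENSION IF NOT EXISTS plpgsql;'
psql -d commander -c 'CREATE EXTENSION IF NOT EXISTS pgcrypto;'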

To enable Commander, create a ModuleConfig:

apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: commander
spec:
  enabled: true
  version: 1
  settings:
    postgres:
      mode: External
      external:
        host: "..."     # Mandatory field
        port: "..."     # Mandatory field
        user: "..."     # Mandatory field
        password: "..." # Mandatory field
        db: "..."       # Mandatory field
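
Before enabling the module, it may be useful to verify that the database is reachable from inside the cluster. A minimal sketch (the pod name, image, and connection string are placeholders for your values):

kubectl run pg-check --rm -it --restart=Never --image=postgres:16 -- psql "postgresql://user:password@host:5432/db" -c 'SELECT 1;'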

If you’re using the operator-postgres module

Step 1: Enabling operator-postgres (optional)

You can use your own postgres-operator installation, version v1.10.0 or later.

First, enable the operator-postgres module and wait for it to become active:

apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: operator-postgres
spec:
  enabled: true

Then, wait until the Deckhouse task queue becomes empty to make sure the module is enabled:

kubectl -n d8-system exec -t deploy/deckhouse -c deckhouse -- deckhouse-controller queue main

After that, ensure that the necessary CRDs are installed:

kubectl get crd | grep postgresqls.acid.zalan.do

Step 2: Enabling commander

Next, enable the commander module:

apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: commander
spec:
  enabled: true
  version: 1
  settings:
    nodeSelector:
      node.deckhouse.io/group: commander
    postgres:
      mode: Internal
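
Once Deckhouse has processed the change, the database and Commander components should appear in the module namespace:

kubectl -n d8-commander get pods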

Data encryption

Starting with version 1.6, Commander encrypts sensitive data stored in the database using keys that are automatically generated by Commander and stored in the commander-envs secret.

Attention: It is extremely important to save the keys in a secure location so that the database can be restored in case of problems. Without the keys, it will be impossible to restore the data!

kubectl -n d8-commander get secret commander-envs -oyaml
apiVersion: v1
data:
  ACTIVE_RECORD_ENCRYPTION_DETERMINISTIC_KEY: YVBBNVh5QUxoZjc1Tk5uTXphc3BXN2FrVGZacDBsUFk=
  ACTIVE_RECORD_ENCRYPTION_KEY_DERIVATION_SALT: eEVZMGR0NlRaY0FNZzUySzdPODR3WXpranZiQTYySHo=
  ACTIVE_RECORD_ENCRYPTION_PRIMARY_KEY: RUdZOFdodWxVT1hpeHlib2Q3Wld3TUlMNjhSOW81a0M=
kind: Secret
metadata:
...
  name: commander-envs
  namespace: d8-commander
type: Opaque
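
For example, to keep a copy of the keys outside the cluster, save the secret to a file and store it securely:

kubectl -n d8-commander get secret commander-envs -o yaml > commander-envs-backup.yaml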

Changing the storage class

Option 1 (preferred)

  1. Perform a backup of the database instance

    kubectl -n d8-commander exec -t commander-postgres-0 -- su - postgres -c "pg_dump -Fc -b -v -d commander" > commander.dump
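
    Optionally, check that the dump is readable; if pg_restore is available on your machine, it can list the archive's table of contents without restoring anything:

    pg_restore -l commander.dump | head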
    
  2. Change storageClass in the module settings, replacing <NEW_STORAGECLASS_NAME> with the name of the necessary storage class

    The list of available storage classes can be found using the command kubectl get storageclasses

    kubectl patch moduleconfig commander --type=merge -p '{"spec":{"settings":{"postgres":{"internal":{"storageClass":"<NEW_STORAGECLASS_NAME>"}}}}}'
    
    moduleconfig.deckhouse.io/commander patched
    

    Wait until the Deckhouse queue is empty

    kubectl -n d8-system exec -it $( (kubectl -n d8-system get leases.coordination.k8s.io deckhouse-leader-election -o jsonpath={.spec.holderIdentity} || printf "deployments/deckhouse") | cut -d. -f1) -c deckhouse -- deckhouse-controller queue main
    
    Queue 'main': length 0, status: 'waiting for task 5s'
    

    Check the logs of the postgres operator

    kubectl -n d8-operator-postgres logs deployments/operator-postgres
    
    {"cluster-name":"d8-commander/commander-postgres","level":"info","msg":"cluster has been updated","pkg":"controller","time":"2024-05-19T20:36:22Z","worker":0}
    
  3. Increase the number of replicas of the PostgreSQL database (optional)

    This step must be skipped if the HighAvailability mode is active and PostgreSQL has 2 replicas
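
    To check how many replicas are currently configured:

    kubectl -n d8-commander get postgresqls.acid.zalan.do commander-postgres -o jsonpath='{.spec.numberOfInstances}'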

    kubectl -n d8-commander patch postgresqls.acid.zalan.do commander-postgres --type=merge -p '{"spec":{"numberOfInstances":2}}'
    
    postgresql.acid.zalan.do/commander-postgres patched
    

    Check the logs of the operator and the postgres instance

    kubectl -n d8-operator-postgres logs deployments/operator-postgres
    
    {"cluster-name":"d8-commander/commander-postgres","level":"info","msg":"cluster has been updated","pkg":"controller","time":"2024-05-19T20:36:22Z","worker":0}
    
    kubectl -n d8-commander logs commander-postgres-1
    
    2024-05-19 20:38:15,648 INFO: no action. I am (commander-postgres-1), a secondary, and following a leader (commander-postgres-0)
    
  4. Perform the master switch

    kubectl -n d8-commander exec -it commander-postgres-0 -- patronictl failover
    
    Current cluster topology
    + Cluster: commander-postgres --------+---------+---------+----+-----------+
    | Member               | Host         | Role    | State   | TL | Lag in MB |
    +----------------------+--------------+---------+---------+----+-----------+
    | commander-postgres-0 | 10.111.3.167 | Leader  | running |  5 |           |
    | commander-postgres-1 | 10.111.2.239 | Replica | running |  5 |         0 |
    +----------------------+--------------+---------+---------+----+-----------+
    Candidate ['commander-postgres-1'] []: commander-postgres-1
    Are you sure you want to failover cluster commander-postgres, demoting current leader commander-postgres-0? [y/N]: y
    2024-05-19 20:40:52.63041 Successfully failed over to "commander-postgres-1"
    + Cluster: commander-postgres --------+---------+---------+----+-----------+
    | Member               | Host         | Role    | State   | TL | Lag in MB |
    +----------------------+--------------+---------+---------+----+-----------+
    | commander-postgres-0 | 10.111.3.167 | Replica | stopped |    |   unknown |
    | commander-postgres-1 | 10.111.2.239 | Leader  | running |  5 |           |
    +----------------------+--------------+---------+---------+----+-----------+
    

    Make sure that both DB instances are in the running state

    kubectl -n d8-commander exec -t commander-postgres-0 -- patronictl list
    + Cluster: commander-postgres --------+---------+---------+----+-----------+
    | Member               | Host         | Role    | State   | TL | Lag in MB |
    +----------------------+--------------+---------+---------+----+-----------+
    | commander-postgres-0 | 10.111.3.167 | Replica | running |  6 |         0 |
    | commander-postgres-1 | 10.111.2.239 | Leader  | running |  6 |           |
    +----------------------+--------------+---------+---------+----+-----------+
    

    Check that the disk of the new DB replica was created with the necessary storageClass

    kubectl -n d8-commander get pvc --selector application=spilo
    NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    pgdata-commander-postgres-0   Bound    pvc-fd80fde4-d0e2-4b5f-9e3a-eac998191f11   2Gi        RWO            network-hdd    36h
    pgdata-commander-postgres-1   Bound    pvc-7af2f442-3097-4fe3-a795-5ad18bb11351   2Gi        RWO            network-ssd    2m54s
    
  5. Delete the disk and pod of the first postgres instance

    kubectl -n d8-commander delete pvc pgdata-commander-postgres-0 --wait=false
    kubectl -n d8-commander delete po commander-postgres-0
    

    Check logs

    kubectl -n d8-commander logs commander-postgres-0
    
    2024-05-19 20:43:33,293 INFO: Lock owner: commander-postgres-1; I am commander-postgres-0
    2024-05-19 20:43:33,293 INFO: establishing a new patroni connection to the postgres cluster
    2024-05-19 20:43:33,357 INFO: no action. I am (commander-postgres-0), a secondary, and following a leader (commander-postgres-1)
    

    Check that the disk was created with the correct storageClass

    kubectl -n d8-commander get pvc
    NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    pgdata-commander-postgres-0   Bound    pvc-fd80fde4-d0e2-4b5f-9e3a-eac998191f11   2Gi        RWO            network-ssd    2m6s
    pgdata-commander-postgres-1   Bound    pvc-7af2f442-3097-4fe3-a795-5ad18bb11351   2Gi        RWO            network-ssd    7m11s
    
  6. Perform the master switch one more time

    kubectl -n d8-commander exec -it commander-postgres-0 -- patronictl failover
    
    Current cluster topology
    + Cluster: commander-postgres --------+---------+---------+----+-----------+
    | Member               | Host         | Role    | State   | TL | Lag in MB |
    +----------------------+--------------+---------+---------+----+-----------+
    | commander-postgres-0 | 10.111.3.189 | Replica | running |  6 |         0 |
    | commander-postgres-1 | 10.111.2.239 | Leader  | running |  6 |           |
    +----------------------+--------------+---------+---------+----+-----------+
    Candidate ['commander-postgres-0'] []: commander-postgres-0
    Are you sure you want to failover cluster commander-postgres, demoting current leader commander-postgres-1? [y/N]: y
    2024-05-19 20:46:11.69855 Successfully failed over to "commander-postgres-0"
    + Cluster: commander-postgres --------+---------+---------+----+-----------+
    | Member               | Host         | Role    | State   | TL | Lag in MB |
    +----------------------+--------------+---------+---------+----+-----------+
    | commander-postgres-0 | 10.111.3.189 | Leader  | running |  6 |           |
    | commander-postgres-1 | 10.111.2.239 | Replica | stopped |    |   unknown |
    +----------------------+--------------+---------+---------+----+-----------+
    

    Make sure that both DB instances are in the running state

    kubectl -n d8-commander exec -t commander-postgres-0 -- patronictl list
    + Cluster: commander-postgres --------+---------+---------+----+-----------+
    | Member               | Host         | Role    | State   | TL | Lag in MB |
    +----------------------+--------------+---------+---------+----+-----------+
    | commander-postgres-0 | 10.111.3.189 | Leader  | running |  6 |         0 |
    | commander-postgres-1 | 10.111.2.239 | Replica | running |  6 |           |
    +----------------------+--------------+---------+---------+----+-----------+
    
  7. Reduce the number of replicas of the PostgreSQL database (optional)

    This step must be skipped if HighAvailability mode is active and PostgreSQL has 2 replicas

    kubectl -n d8-commander patch postgresqls.acid.zalan.do commander-postgres --type=merge -p '{"spec":{"numberOfInstances":1}}'
    
    postgresql.acid.zalan.do/commander-postgres patched
    

    Check the operator logs

    kubectl -n d8-operator-postgres logs deployments/operator-postgres
    
    {"cluster-name":"d8-commander/commander-postgres","level":"info","msg":"cluster has been updated","pkg":"controller","time":"2024-05-19T20:50:22Z","worker":0}
    
  8. Delete the disk and pod of the second postgres instance (if HighAvailability mode is active and PostgreSQL has 2 replicas)

    This step must be skipped if HighAvailability mode is not active

    kubectl -n d8-commander delete pvc pgdata-commander-postgres-1 --wait=false
    kubectl -n d8-commander delete po commander-postgres-1
    

    Check logs

    kubectl -n d8-commander logs commander-postgres-1
    
    2024-05-19 20:53:33,293 INFO: Lock owner: commander-postgres-0; I am commander-postgres-1
    2024-05-19 20:53:33,293 INFO: establishing a new patroni connection to the postgres cluster
    2024-05-19 20:53:33,357 INFO: no action. I am (commander-postgres-1), a secondary, and following a leader (commander-postgres-0)
    

    Check that the disk was created with the necessary storageClass

    kubectl -n d8-commander get pvc
    NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    pgdata-commander-postgres-0   Bound    pvc-fd80fde4-d0e2-4b5f-9e3a-eac998191f11   2Gi        RWO            network-ssd    7m6s
    pgdata-commander-postgres-1   Bound    pvc-7af2f442-3097-4fe3-a795-5ad18bb11351   2Gi        RWO            network-ssd    1m11s
    

    Make sure that both DB instances are in the running state

    kubectl -n d8-commander exec -t commander-postgres-0 -- patronictl list
    + Cluster: commander-postgres --------+---------+---------+----+-----------+
    | Member               | Host         | Role    | State   | TL | Lag in MB |
    +----------------------+--------------+---------+---------+----+-----------+
    | commander-postgres-0 | 10.111.3.189 | Leader  | running |  6 |         0 |
    | commander-postgres-1 | 10.111.2.239 | Replica | running |  6 |           |
    +----------------------+--------------+---------+---------+----+-----------+
    
  9. Delete the unused disk of the temporary database replica (if HighAvailability is not active)

    This step must be skipped if HighAvailability is active and PostgreSQL has 2 replicas

    kubectl -n d8-commander delete pvc pgdata-commander-postgres-1
    
    persistentvolumeclaim "pgdata-commander-postgres-1" deleted
    

Option 2

  1. Perform a backup of the database instance

    kubectl -n d8-commander exec -t commander-postgres-0 -- su - postgres -c "pg_dump -Fc -b -v -d commander" > commander.dump
    
  2. Disable the commander module

    kubectl patch moduleconfig commander --type=merge -p '{"spec":{"enabled":false}}'
    
    moduleconfig.deckhouse.io/commander patched
    

    Wait until the Deckhouse queue is empty

    kubectl -n d8-system exec -it $( (kubectl -n d8-system get leases.coordination.k8s.io deckhouse-leader-election -o jsonpath={.spec.holderIdentity} || printf "deployments/deckhouse") | cut -d. -f1) -c deckhouse -- deckhouse-controller queue main
    
    Queue 'main': length 0, status: 'waiting for task 5s'
    

    Check that the d8-commander namespace has been deleted

    kubectl get namespace d8-commander
    Error from server (NotFound): namespaces "d8-commander" not found
    
  3. Set the required storage class and enable the commander module

     kubectl patch moduleconfig commander --type=merge -p '{"spec":{"enabled":true,"settings":{"postgres":{"internal":{"storageClass":"<NEW_STORAGECLASS_NAME>"}}}}}'
    
     moduleconfig.deckhouse.io/commander patched
    

    Wait until the Deckhouse queue is empty

    kubectl -n d8-system exec -it $( (kubectl -n d8-system get leases.coordination.k8s.io deckhouse-leader-election -o jsonpath={.spec.holderIdentity} || printf "deployments/deckhouse") | cut -d. -f1) -c deckhouse -- deckhouse-controller queue main
    
    Queue 'main': length 0, status: 'waiting for task 5s'
    

    Check that the DB instance has the Running status

    kubectl -n d8-commander get po commander-postgres-0
    
    NAME                   READY   STATUS    RESTARTS   AGE
    commander-postgres-0   1/1     Running   0          2m4s
    
  4. Restore the previously saved database backup

    kubectl -n d8-commander exec -it commander-postgres-0 -- su - postgres -c "pg_restore -v -c --if-exists -Fc -d commander" < commander.dump
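
    After the restore completes, you can run a quick sanity check that the Commander tables are present (table names depend on the Commander version):

    kubectl -n d8-commander exec -t commander-postgres-0 -- su - postgres -c "psql -d commander -c '\dt'"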