Commander internals
Commander has one external dependency: a PostgreSQL database.
The API server is the central component. Data is stored in PostgreSQL. Options for installing Commander with a DBMS are listed in the section below.
The API server provides both external APIs (for the web application and for external integrations) and internal APIs for working with clusters.
The web application uses the API to manage clusters and other Commander entities.
Asynchronous operations — tasks — are used to manage clusters. The cluster manager is a service that monitors tasks and executes them. Tasks can be cluster installation, cluster deletion, or cluster state reconciliation with the specified configuration.
The cluster manager is single-threaded, so cluster processing throughput depends on the number of clusters and the number of cluster manager replicas. When a cluster is created through the API, the API server creates an installation task, and a free cluster manager instance picks it up. The same happens for cluster update, deletion, and reconciliation operations.
The cluster manager uses a special component, dhctl server, to manage clusters. In the target design, the cluster manager launches a dhctl server replica of exactly the version required by each DKP cluster. However, dhctl server is under active development, so there is currently a limit on the DKP versions that Commander can install. See the “Current limitations” section below.
In each cluster, Commander automatically installs the commander-agent module. This module is responsible for synchronizing Kubernetes resources in the application cluster and for sending telemetry to the Commander API server. Telemetry currently includes basic metrics (CPU, memory, number of nodes, and total storage space), the DKP version, the Kubernetes version, and the availability of DKP components. In addition to the agent, the console module is also installed in the cluster. This module is used to manage an individual cluster from the Administration tab in Commander.
Service components in the application cluster are placed on system nodes by default.
Commander also uses additional services that are not shown in the diagram — renderer and connector. The renderer is responsible for generating and validating cluster configurations, and the connector is responsible for the operation of the cluster administration interface.
Requirements for resources
To start using Commander, we recommend creating a fault-tolerant management cluster that will include the following node sets (NodeGroup):
Node Group | Number of nodes | CPU, cores | Memory, GB | Disk, GB |
---|---|---|---|---|
master | 3 | 4 | 8 | 50 |
system | 2 | 4 | 8 | 50 |
frontend | 2 | 4 | 8 | 50 |
commander | 3 | 8 | 12 | 50 |
- PostgreSQL in HighAvailability mode (two replicas) requires 1 core and 1 GB of memory on two separate nodes.
- The API server in HighAvailability mode (two replicas) requires 1 core and 1 GB of memory on two separate nodes.
- Service components used for rendering configurations and connecting to application clusters require 0.5 cores and 128 MB of memory per cluster.
- Cluster manager and dhctl server together require resources based on the number of clusters they serve and the number of DKP versions they serve simultaneously.
- Up to 2 cores per node can be occupied by DKP service components (for example: runtime-audit-engine, istio, cilium, log-shipper).
Number of clusters | CPU, cores | Memory, GB | Number of 8-core/8 GB nodes (total cores/GB) | Number of 8-core/12 GB nodes (total cores/GB) |
---|---|---|---|---|
10 | 9 | 16 | 3 (=24/24) | 2 (=16/24) |
25 | 10 | 19 | 3 (=24/24) | 3 (=24/36) |
100 | 15 | 29 | 4 (=32/32) | 4 (=32/48) |
Installation
- If you are using your own database server, set the connection parameters in ModuleConfig/commander.
- The PostgreSQL database can also be deployed in the cluster with postgres-operator, which is installed by enabling the operator-postgres module. In this case, enable that module first and make sure its CRDs appear in the cluster. Then enable the Commander module.
Below, we will describe these options in more detail.
If you are using your own PostgreSQL installation
Important: PostgreSQL extensions plpgsql and pgcrypto are required for Commander to work properly.
To enable Commander, create a ModuleConfig:
apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: commander
spec:
  enabled: true
  version: 1
  settings:
    postgres:
      mode: External
      external:
        host: "..." # Mandatory field
        port: "..." # Mandatory field
        user: "..." # Mandatory field
        password: "..." # Mandatory field
        db: "..." # Mandatory field
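Since Commander does not work properly without the plpgsql and pgcrypto extensions, it can help to check the external server before enabling the module. The following is a minimal sketch, not part of Commander: the function names and the connection values in the usage comment are hypothetical, and it assumes that the psql client is installed locally and that the password is supplied through the standard PGPASSWORD variable or a .pgpass file.

```shell
#!/bin/sh
# Extensions that Commander requires, as stated in the note above.
REQUIRED_EXTENSIONS="plpgsql pgcrypto"

# Reads extension names from stdin (one per line) and prints every
# required extension that is absent from that list.
missing_extensions() {
  available=$(cat)
  for ext in $REQUIRED_EXTENSIONS; do
    printf '%s\n' "$available" | grep -qx "$ext" || printf '%s\n' "$ext"
  done
}

# Prints the names of the extensions that the server is able to install.
# Arguments: host, port, user, database (all placeholders).
list_available_extensions() {
  psql "host=$1 port=$2 user=$3 dbname=$4" -At \
    -c "SELECT name FROM pg_available_extensions;"
}

# Usage (hypothetical connection values):
#   list_available_extensions db.example.internal 5432 commander commander | missing_extensions
```

An empty result means that both extensions are available on the server.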
If you’re using the operator-postgres module
Step 1: Enable operator-postgres (optional)
This step is optional: you can instead use your own postgres-operator installation, version v1.10.0 or later.
First, enable the operator-postgres module and wait for it to become active:
apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: operator-postgres
spec:
  enabled: true
Then, wait until the Deckhouse task queue becomes empty to make sure the module is enabled:
kubectl -n d8-system exec -t deploy/deckhouse -c deckhouse -- deckhouse-controller queue main
After that ensure that necessary CRDs are installed:
kubectl get crd | grep postgresqls.acid.zalan.do
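For automation, the two checks above can be combined into a small script. This is a sketch under assumptions: the function names are hypothetical, the emptiness test relies on the queue command printing a line that contains "length 0" (as in the sample output later in this document), and the polling interval and timeout are arbitrary.

```shell
#!/bin/sh
# Succeeds if the text on stdin reports an empty Deckhouse queue,
# that is, contains "length 0" followed by a non-digit character.
queue_is_empty() {
  grep -q "length 0[^0-9]"
}

# Polls the main queue every 5 seconds until it is reported empty.
wait_for_empty_queue() {
  until kubectl -n d8-system exec deploy/deckhouse -c deckhouse -- \
      deckhouse-controller queue main | queue_is_empty; do
    sleep 5
  done
}

# Waits until the postgres-operator CRD is established.
# Note: kubectl wait fails immediately if the CRD does not exist yet.
wait_for_postgres_crd() {
  kubectl wait --for condition=established --timeout=120s \
    crd/postgresqls.acid.zalan.do
}

# Usage:
#   wait_for_empty_queue && wait_for_postgres_crd
```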
Step 2: Enable commander
Next, enable the commander module.
apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: commander
spec:
  enabled: true
  version: 1
  settings:
    nodeSelector:
      node.deckhouse.io/group: commander
    postgres:
      mode: Internal
Data encryption
Starting from version 1.6, Commander encrypts sensitive data stored in the database using keys that it generates automatically and stores in the commander-envs secret.
Attention: It is extremely important to save the keys to a secure location in order to be able to restore the database in case of any problems. If there are no keys, it will be impossible to restore the data!
❯ kubectl -n d8-commander get secret commander-envs -oyaml
apiVersion: v1
data:
  ACTIVE_RECORD_ENCRYPTION_DETERMINISTIC_KEY: YVBBNVh5QUxoZjc1Tk5uTXphc3BXN2FrVGZacDBsUFk=
  ACTIVE_RECORD_ENCRYPTION_KEY_DERIVATION_SALT: eEVZMGR0NlRaY0FNZzUySzdPODR3WXpranZiQTYySHo=
  ACTIVE_RECORD_ENCRYPTION_PRIMARY_KEY: RUdZOFdodWxVT1hpeHlib2Q3Wld3TUlMNjhSOW81a0M=
kind: Secret
metadata:
  ...
  name: commander-envs
  namespace: d8-commander
type: Opaque
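One way to follow the advice above is to export the whole secret to a file that only the current user can read, and then move that file to secure storage. A minimal sketch; the function name and the default file name are hypothetical:

```shell
#!/bin/sh
# Saves the commander-envs secret to a local file.
# Argument: output file (defaults to a hypothetical name).
backup_commander_keys() {
  out_file="${1:-commander-envs.backup.yaml}"
  # umask 077 makes a newly created file readable by its owner only;
  # the subshell keeps the umask change from leaking into the caller.
  ( umask 077 && kubectl -n d8-commander get secret commander-envs -o yaml > "$out_file" )
}

# Usage:
#   backup_commander_keys /secure/location/commander-envs.yaml
```

The resulting file holds the keys base64-encoded but not encrypted, so it should be protected as carefully as the keys themselves.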
Changing the storage class
Option 1 (preferred)
- Perform a backup of the database instance:

  kubectl -n d8-commander exec -t commander-postgres-0 -- su - postgres -c "pg_dump -Fc -b -v -d commander" > commander.dump
- Change storageClass in the module settings, replacing <NEW_STORAGECLASS_NAME> with the name of the required storage class. To list the available storage classes, run kubectl get storageclasses.

  kubectl patch moduleconfig commander --type=merge -p '{"spec":{"settings":{"postgres":{"internal":{"storageClass":"<NEW_STORAGECLASS_NAME>"}}}}}'
  moduleconfig.deckhouse.io/commander patched

  Wait until the Deckhouse queue is empty:

  kubectl -n d8-system exec -it $( (kubectl -n d8-system get leases.coordination.k8s.io deckhouse-leader-election -o jsonpath={.spec.holderIdentity} || printf "deployments/deckhouse") | cut -d. -f1) -c deckhouse -- deckhouse-controller queue main
  Queue 'main': length 0, status: 'waiting for task 5s'

  Check the logs of the postgres operator:

  kubectl -n d8-operator-postgres logs deployments/operator-postgres
  {"cluster-name":"d8-commander/commander-postgres","level":"info","msg":"cluster has been updated","pkg":"controller","time":"2024-05-19T20:36:22Z","worker":0}
- Increase the number of replicas of the PostgreSQL database (optional). Skip this step if HighAvailability mode is active and PostgreSQL has 2 replicas.

  kubectl -n d8-commander patch postgresqls.acid.zalan.do commander-postgres --type=merge -p '{"spec":{"numberOfInstances":2}}'
  postgresql.acid.zalan.do/commander-postgres patched

  Check the logs of the operator and the postgres instance:

  kubectl -n d8-operator-postgres logs deployments/operator-postgres
  {"cluster-name":"d8-commander/commander-postgres","level":"info","msg":"cluster has been updated","pkg":"controller","time":"2024-05-19T20:36:22Z","worker":0}

  kubectl -n d8-commander logs commander-postgres-1
  2024-05-19 20:38:15,648 INFO: no action. I am (commander-postgres-1), a secondary, and following a leader (commander-postgres-0)
- Perform the master switch:

  kubectl -n d8-commander exec -it commander-postgres-0 -- patronictl failover
  Current cluster topology
  + Cluster: commander-postgres --------+---------+---------+----+-----------+
  | Member               | Host         | Role    | State   | TL | Lag in MB |
  +----------------------+--------------+---------+---------+----+-----------+
  | commander-postgres-0 | 10.111.3.167 | Leader  | running |  5 |           |
  | commander-postgres-1 | 10.111.2.239 | Replica | running |  5 |         0 |
  +----------------------+--------------+---------+---------+----+-----------+
  Candidate ['commander-postgres-1'] []: commander-postgres-1
  Are you sure you want to failover cluster commander-postgres, demoting current leader commander-postgres-0? [y/N]: y
  2024-05-19 20:40:52.63041 Successfully failed over to "commander-postgres-1"
  + Cluster: commander-postgres --------+---------+---------+----+-----------+
  | Member               | Host         | Role    | State   | TL | Lag in MB |
  +----------------------+--------------+---------+---------+----+-----------+
  | commander-postgres-0 | 10.111.3.167 | Replica | stopped |    |   unknown |
  | commander-postgres-1 | 10.111.2.239 | Leader  | running |  5 |           |
  +----------------------+--------------+---------+---------+----+-----------+

  Make sure that both DB instances are in the running state:

  kubectl -n d8-commander exec -t commander-postgres-0 -- patronictl list
  + Cluster: commander-postgres --------+---------+---------+----+-----------+
  | Member               | Host         | Role    | State   | TL | Lag in MB |
  +----------------------+--------------+---------+---------+----+-----------+
  | commander-postgres-0 | 10.111.3.167 | Replica | running |  6 |         0 |
  | commander-postgres-1 | 10.111.2.239 | Leader  | running |  6 |           |
  +----------------------+--------------+---------+---------+----+-----------+

  Check that the disk of the new DB replica was created with the required storageClass:

  kubectl -n d8-commander get pvc --selector application=spilo
  NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
  pgdata-commander-postgres-0   Bound    pvc-fd80fde4-d0e2-4b5f-9e3a-eac998191f11   2Gi        RWO            network-hdd    36h
  pgdata-commander-postgres-1   Bound    pvc-7af2f442-3097-4fe3-a795-5ad18bb11351   2Gi        RWO            network-ssd    2m54s
- Delete the disk and pod of the first postgres instance:

  kubectl -n d8-commander delete pvc pgdata-commander-postgres-0 --wait=false
  kubectl -n d8-commander delete po commander-postgres-0

  Check the logs:

  kubectl -n d8-commander logs commander-postgres-0
  2024-05-19 20:43:33,293 INFO: Lock owner: commander-postgres-1; I am commander-postgres-0
  2024-05-19 20:43:33,293 INFO: establishing a new patroni connection to the postgres cluster
  2024-05-19 20:43:33,357 INFO: no action. I am (commander-postgres-0), a secondary, and following a leader (commander-postgres-1)

  Check that the disk was created with the correct storageClass:

  kubectl -n d8-commander get pvc
  NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
  pgdata-commander-postgres-0   Bound    pvc-fd80fde4-d0e2-4b5f-9e3a-eac998191f11   2Gi        RWO            network-ssd    2m6s
  pgdata-commander-postgres-1   Bound    pvc-7af2f442-3097-4fe3-a795-5ad18bb11351   2Gi        RWO            network-ssd    7m11s
- Perform the master switch one more time:

  kubectl -n d8-commander exec -it commander-postgres-0 -- patronictl failover
  Current cluster topology
  + Cluster: commander-postgres --------+---------+---------+----+-----------+
  | Member               | Host         | Role    | State   | TL | Lag in MB |
  +----------------------+--------------+---------+---------+----+-----------+
  | commander-postgres-0 | 10.111.3.189 | Replica | running |  6 |         0 |
  | commander-postgres-1 | 10.111.2.239 | Leader  | running |  6 |           |
  +----------------------+--------------+---------+---------+----+-----------+
  Candidate ['commander-postgres-0'] []: commander-postgres-0
  Are you sure you want to failover cluster commander-postgres, demoting current leader commander-postgres-1? [y/N]: y
  2024-05-19 20:46:11.69855 Successfully failed over to "commander-postgres-0"
  + Cluster: commander-postgres --------+---------+---------+----+-----------+
  | Member               | Host         | Role    | State   | TL | Lag in MB |
  +----------------------+--------------+---------+---------+----+-----------+
  | commander-postgres-0 | 10.111.3.189 | Leader  | running |  6 |           |
  | commander-postgres-1 | 10.111.2.239 | Replica | stopped |    |   unknown |
  +----------------------+--------------+---------+---------+----+-----------+

  Make sure that both DB instances are in the running state:

  kubectl -n d8-commander exec -t commander-postgres-0 -- patronictl list
  + Cluster: commander-postgres --------+---------+---------+----+-----------+
  | Member               | Host         | Role    | State   | TL | Lag in MB |
  +----------------------+--------------+---------+---------+----+-----------+
  | commander-postgres-0 | 10.111.3.189 | Leader  | running |  6 |         0 |
  | commander-postgres-1 | 10.111.2.239 | Replica | running |  6 |           |
  +----------------------+--------------+---------+---------+----+-----------+
- Reduce the number of replicas of the PostgreSQL database (optional). Skip this step if HighAvailability mode is active and PostgreSQL has 2 replicas.

  kubectl -n d8-commander patch postgresqls.acid.zalan.do commander-postgres --type=merge -p '{"spec":{"numberOfInstances":1}}'
  postgresql.acid.zalan.do/commander-postgres patched

  Check the operator logs:

  kubectl -n d8-operator-postgres logs deployments/operator-postgres
  {"cluster-name":"d8-commander/commander-postgres","level":"info","msg":"cluster has been updated","pkg":"controller","time":"2024-05-19T20:50:22Z","worker":0}
- Delete the disk and pod of the second instance (if HighAvailability mode is active and PostgreSQL has 2 replicas). Skip this step if HighAvailability mode is not active.

  kubectl -n d8-commander delete pvc pgdata-commander-postgres-1 --wait=false
  kubectl -n d8-commander delete po commander-postgres-1

  Check the logs:

  kubectl -n d8-commander logs commander-postgres-1
  2024-05-19 20:53:33,293 INFO: Lock owner: commander-postgres-0; I am commander-postgres-1
  2024-05-19 20:53:33,293 INFO: establishing a new patroni connection to the postgres cluster
  2024-05-19 20:53:33,357 INFO: no action. I am (commander-postgres-1), a secondary, and following a leader (commander-postgres-0)

  Check that the disk was created with the required storageClass:

  kubectl -n d8-commander get pvc
  NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
  pgdata-commander-postgres-0   Bound    pvc-fd80fde4-d0e2-4b5f-9e3a-eac998191f11   2Gi        RWO            network-ssd    7m6s
  pgdata-commander-postgres-1   Bound    pvc-7af2f442-3097-4fe3-a795-5ad18bb11351   2Gi        RWO            network-ssd    1m11s

  Make sure that both DB instances are in the running state:

  kubectl -n d8-commander exec -t commander-postgres-0 -- patronictl list
  + Cluster: commander-postgres --------+---------+---------+----+-----------+
  | Member               | Host         | Role    | State   | TL | Lag in MB |
  +----------------------+--------------+---------+---------+----+-----------+
  | commander-postgres-0 | 10.111.3.189 | Leader  | running |  6 |         0 |
  | commander-postgres-1 | 10.111.2.239 | Replica | running |  6 |           |
  +----------------------+--------------+---------+---------+----+-----------+
- Delete the unused disk of the temporary database replica (if HighAvailability mode is not active). Skip this step if HighAvailability mode is active and PostgreSQL has 2 replicas.

  kubectl -n d8-commander delete pvc pgdata-commander-postgres-1
  persistentvolumeclaim "pgdata-commander-postgres-1" deleted
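Once the procedure is complete, every remaining PostgreSQL volume should use the new storage class. The check can be scripted roughly as follows. This is a sketch: the function names are hypothetical, and network-ssd in the usage comment is simply the class name from the sample output above.

```shell
#!/bin/sh
# Reads "<pvc-name> <storage-class>" lines on stdin and prints the PVCs whose
# class differs from the expected one. Returns non-zero if any are found.
pvcs_with_wrong_class() {
  expected="$1"
  awk -v want="$expected" '$2 != want { print $1; bad = 1 } END { exit bad }'
}

# Lists the PostgreSQL PVCs with their storage classes, one per line.
list_postgres_pvcs() {
  kubectl -n d8-commander get pvc --selector application=spilo \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.storageClassName}{"\n"}{end}'
}

# Usage:
#   list_postgres_pvcs | pvcs_with_wrong_class network-ssd
```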
Option 2
- Perform a backup of the database instance:

  kubectl -n d8-commander exec -t commander-postgres-0 -- su - postgres -c "pg_dump -Fc -b -v -d commander" > commander.dump
- Turn off the commander module:

  kubectl patch moduleconfig commander --type=merge -p '{"spec":{"enabled":false}}'
  moduleconfig.deckhouse.io/commander patched

  Wait until the Deckhouse queue is empty:

  kubectl -n d8-system exec -it $( (kubectl -n d8-system get leases.coordination.k8s.io deckhouse-leader-election -o jsonpath={.spec.holderIdentity} || printf "deployments/deckhouse") | cut -d. -f1) -c deckhouse -- deckhouse-controller queue main
  Queue 'main': length 0, status: 'waiting for task 5s'

  Check that the d8-commander namespace has been deleted:

  kubectl get namespace d8-commander
  Error from server (NotFound): namespaces "d8-commander" not found
- Set the required storage class and enable the commander module:

  kubectl patch moduleconfig commander --type=merge -p '{"spec":{"enabled":true,"settings":{"postgres":{"internal":{"storageClass":"<NEW_STORAGECLASS_NAME>"}}}}}'
  moduleconfig.deckhouse.io/commander patched

  Wait until the Deckhouse queue is empty:

  kubectl -n d8-system exec -it $( (kubectl -n d8-system get leases.coordination.k8s.io deckhouse-leader-election -o jsonpath={.spec.holderIdentity} || printf "deployments/deckhouse") | cut -d. -f1) -c deckhouse -- deckhouse-controller queue main
  Queue 'main': length 0, status: 'waiting for task 5s'

  Check that the DB instance has the Running status:

  kubectl -n d8-commander get po commander-postgres-0
  NAME                   READY   STATUS    RESTARTS   AGE
  commander-postgres-0   1/1     Running   0          2m4s
- Restore the previously saved backup of the database:

  kubectl -n d8-commander exec -it commander-postgres-0 -- su - postgres -c "pg_restore -v -c --if-exists -Fc -d commander" < commander.dump
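Both options depend on commander.dump, so it is worth checking the file right after the backup step and before deleting anything. The sketch below relies on the fact that archives written by pg_dump -Fc begin with the signature PGDMP; the function name is hypothetical. A failed check may indicate that something other than the archive ended up in the file, for example diagnostic messages or altered line endings if a terminal was allocated for the exec session. If pg_restore is installed locally, pg_restore -l commander.dump is a more thorough check, since it lists the archive contents.

```shell
#!/bin/sh
# Succeeds if the file exists, is non-empty, and starts with the "PGDMP"
# signature of a pg_dump custom-format archive.
# Argument: dump file (defaults to commander.dump).
looks_like_custom_dump() {
  dump_file="${1:-commander.dump}"
  [ -s "$dump_file" ] || return 1
  [ "$(dd if="$dump_file" bs=5 count=1 2>/dev/null)" = "PGDMP" ]
}

# Usage:
#   looks_like_custom_dump commander.dump || echo "commander.dump is not a valid archive" >&2
```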