Administration

The module lifecycle stage: General Availability
The module has requirements for installation

Deckhouse Commander internals

Deckhouse Commander components

Diagram

Deckhouse Commander has one external dependency: a PostgreSQL database.

The API server is the central component. Data is stored in PostgreSQL. Options for installing Deckhouse Commander with a DBMS are listed in the Installation section.

The API server provides external APIs, used by the web application and by external integrations, and internal APIs for working with clusters.

The web application uses the API to manage clusters and other Commander entities. It also provides seamless access to the application cluster’s admin web UI (DKP UI, the console module), including browsing Kubernetes resources and the web terminal. Requests to the application cluster’s Kubernetes API are routed through Commander under the same account the user signed in with.

Asynchronous operations — tasks — are used to manage clusters. The cluster manager is a service that monitors tasks and executes them. Tasks can be cluster installation, cluster deletion, or cluster state reconciliation with the specified configuration.

Creating a cluster creates an installation task, which a free cluster manager instance then picks up. The same happens for cluster update, deletion, and reconciliation operations.

After acquiring a task, the cluster manager determines which Deckhouse Kubernetes Platform (DKP) version the task requires, starts or reuses a running dhctl server of that version, and runs the required operation in it. One dhctl server performs only one operation at a time. The cluster manager automatically starts and stops dhctl servers and scales the number of their replicas based on the number of incoming tasks. This lets the system adapt to the current load.

Every application cluster runs the commander-agent module. Deckhouse Commander enables it automatically. After the initial cluster installation, the agent opens a reverse TLS tunnel to the management cluster and keeps it open. With the tunnel in place, Commander does not need inbound network reachability to the application cluster’s Kubernetes API. The tunnel carries traffic to the application cluster’s Kubernetes API: it is used during Check (check) and Change (converge) phases to reconcile the application cluster’s infrastructure configuration with the desired configuration (creating, deleting, and upgrading nodes, upgrading DKP components), and by the cluster admin web UI to retrieve objects from the application cluster’s Kubernetes API.

Cluster telemetry is sent through the same Commander HTTPS API endpoint that the agent uses to establish the tunnel. Telemetry includes basic metrics (total number of CPUs, total memory, number of nodes, and total storage), DKP and Kubernetes versions, and DKP component availability.

The resource conversion mechanism also uses this same Commander HTTPS API: the agent requests the desired configuration, applies it in the application cluster, and sends back the status and the applied configuration.

Interaction with the infrastructure provider API is separate and can originate from either the management cluster or the application cluster, without using the commander-agent reverse TLS tunnel.

Direct SSH to a master node (22/TCP) is used during the initial cluster installation and deletion, during attach and detach operations, and as a fallback when commander-agent cannot establish or hold the reverse tunnel. The agent itself runs on the application cluster’s master nodes by default.

When the billing feature flag is on, Commander also collects resource-consumption metrics from every application cluster to calculate cost. These metrics are delivered to a dedicated long-term Prometheus in the management cluster through Prometheus remote write. Remote write uses the same Commander HTTPS API endpoint where the agent opens the tunnel, but it is a separate push mechanism from the application cluster, not traffic over the reverse TLS tunnel. Commander renders billing dashboards and exports CSV cost reports from the stored data.

On the application-cluster side, commander-agent handles billing integration:

it configures the remote-write stream of resource-consumption metrics;
it reports node group, cloud instance-class, and StorageClass descriptions to Commander — these are used in the billing UI to bind compute classes and storage classes to real cluster objects;
it labels every node with billing.commander.deckhouse.io/name, so the node can be matched to its compute class and priced according to the active tariff.

Commander builds the binding contract (which nodes should carry which label) and delivers it to the agent. The agent directly patches Node objects, adding the label to every node that belongs to the bound node group.

Component placement and networking

Deckhouse Commander can be enabled in any DKP cluster. For the application clusters it manages, that cluster becomes the management cluster. Other Deckhouse system modules can run in the same cluster alongside Commander. In a shared setup, user workloads may also run in this cluster — on separate nodes.

By default, Commander runs on nodes with the commander role (label node-role.deckhouse.io/commander); if there are none, it falls back to system nodes (node-role.deckhouse.io/system). To pin Commander to specific nodes, set the nodeSelector parameter. If those nodes carry a taint, also set tolerations. Billing components inherit the same settings.

commander-agent runs in every application cluster and, by default, on master nodes. It keeps a persistent reverse TLS tunnel to the management cluster, so during normal operation Commander does not open inbound connections to the application cluster’s Kubernetes API. Direct SSH to a master node (22/TCP) is used during the initial cluster installation, deletion, attach and detach operations, and as a fallback when the reverse tunnel is unavailable.

Network connectivity requirements

The following network connectivity is required between the management and application clusters:

22/TCP from dhctl servers to all master nodes of application clusters — for the initial cluster installation, deletion, attach and detach operations, and as a fallback when the reverse tunnel is unavailable.
Access from dhctl servers to cloud provider APIs — for managing infrastructure resources of application clusters.
443/TCP from the application cluster to the management cluster — through ingress on the management cluster frontend nodes. The actual port depends on the ingress controller settings. Traffic goes to two domains:
- Commander domain (commander.<publicDomainTemplate>) — commander-agent uses it to establish the reverse TLS tunnel, send telemetry, and run the configuration conversion mechanism. Prometheus in the application cluster also sends billing remote write data to the /prometheus/api/v1/write path on this same domain. Through the reverse tunnel, Commander accesses the application cluster’s Kubernetes API for Check and Change phases and the cluster admin web UI.
- Dex domain (dex.<publicDomainTemplate>) — Dex in the application cluster connects to it for OIDC discovery and token validation. This is required for DexProvider/commander, which lets the application cluster trust accounts from the management cluster.
Access to the container registry from which the dhctl image is pulled — from the nodes running the cluster manager, because it accesses the registry directly, and from the nodes running dhctl servers, because those nodes pull the image of the required dhctl version.

The “agent → management cluster” channel only works over HTTPS. Running Deckhouse Commander without HTTPS is not supported.
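The paths listed above can be probed with a plain TCP check from each side before a cluster is installed or attached. The following is a minimal sketch, not part of Commander: `render_domain` and `check_tcp` are hypothetical helper names, the probe relies on bash's `/dev/tcp` and the coreutils `timeout` command, and the `%s.example.com` value is a placeholder that assumes publicDomainTemplate uses the `%s` form.

```shell
# render_domain TEMPLATE NAME: substitute NAME for "%s" in a domain template.
render_domain() {
  printf '%s\n' "$1" | sed "s/%s/$2/"
}

# check_tcp HOST PORT [SECONDS]: succeed if a TCP connection opens in time.
# Uses bash's /dev/tcp pseudo-device; the default wait is 5 seconds.
check_tcp() {
  timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage from a host that runs dhctl servers (the address is a placeholder):
#   check_tcp 192.0.2.10 22 && echo "SSH to master node is reachable"
# Usage from an application cluster node:
#   check_tcp "$(render_domain '%s.example.com' commander)" 443
#   check_tcp "$(render_domain '%s.example.com' dex)" 443
```

A successful probe only shows that the port is open. It does not validate TLS certificates or ingress routing.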

Network quality requirements and timeouts

Deckhouse Commander is designed to work over the public internet and tolerates unstable network links. There are no strict latency requirements — all interactions use retry logic and persistent connections with automatic reconnection.

Key timeouts that affect operation:

Component	Parameter	Value
SSH (bootstrap, destroy, attach, detach)	Connect timeout	10 sec
SSH	KeepAlive interval	15 sec
Agent → Commander API (all requests, including resource conversion)	HTTP timeout	30 sec
Agent → Commander API	Retry on error	up to 3 attempts
Agent: resource sync interval	Converge interval	30 sec
Reverse tunnel (AMPG)	TCP KeepAlive	enabled (OS default)
Reverse tunnel (AMPG): backend connection wait	Acquire timeout	60 sec
Dex: ID token lifetime	`idTokenTTL`	10 min (configurable)
Dex: auth request lifetime	`authRequests`	10 min
Prometheus remote write	Send timeout	30 sec (Prometheus default)

When connectivity between the application and management clusters is lost:

The reverse TLS tunnel (yamux) automatically reconnects on disconnect.
The agent continues trying to reach the Commander API every 30 seconds.
Prometheus buffers remote write metrics and will deliver them once connectivity is restored.
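The "bounded attempts with a per-request timeout" pattern from the table can be illustrated with a small shell helper. This is a sketch of the general technique only, not the agent's implementation: `retry` is a hypothetical function, and the URL in the usage comment is a placeholder.

```shell
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds or ATTEMPTS runs out.
# Returns 0 on the first success and 1 after the last failed attempt.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while :; do
    "$@" && return 0
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}

# Usage: up to 3 attempts, 30 seconds per request, 5 seconds between attempts.
#   retry 3 5 curl --silent --fail --max-time 30 https://commander.example.com/
```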

Application cluster configuration and resource synchronization

Each application cluster is synchronized through two independent channels — infrastructure configuration and Kubernetes resources have their own executors and sources of truth. These are two separate processes, even though both are shown on the cluster page.

Infrastructure configuration: Change and Check operations

Infrastructure configuration covers the tabs in the Infrastructure group on the cluster page: Kubernetes, Placement, Post-bootstrap script, and SSH Parameters. Deckhouse Commander applies and verifies this configuration directly from the management cluster:

Change brings the cluster infrastructure — nodes, control plane, initial DKP configuration, provider resources — to the desired state. It runs on cluster creation and when edits to the infrastructure tabs are saved in the web UI. In the Auto change application mode (see below) it runs automatically; in Manual mode it goes through a change request that requires approval.
Check compares the desired and the actual infrastructure state without making changes. It runs automatically at the configured frequency (the Reconciliation Interval in the workspace or per-cluster parameters).

If an operation fails, retry it manually with the Retry button on the cluster page.

The change application mode and the reconciliation interval are configured on the workspace Parameters → Settings tab and may be overridden per cluster.

These settings do not control Kubernetes resource group synchronization. When the change application mode is switched from Auto to Manual, commander-agent continues to synchronize resource groups inside the application cluster according to their control modes.

Kubernetes resource groups: agent-driven synchronization

The “Kubernetes” group on the cluster page contains numbered tabs — Kubernetes resource groups. Each group is a set of YAML manifests that commander-agent applies inside the application cluster. These groups are unrelated to the Change and Check operations: a separate agent loop reconciles them, independently of any infrastructure operations that may be running at the same time.

Each group has one of the following synchronization modes, set in the cluster template and switchable on the cluster page:

Force Creation — the agent applies the group’s manifests on every reconcile cycle and restores any modified or deleted resources;
Create on Install — the group’s resources are created only during cluster installation. The agent does not control them during the cluster lifecycle; subsequent in-cluster edits are not reverted;
Ignored — group control is disabled in Commander: previously applied resources stay in the application cluster, but the agent stops synchronizing them. After that the resources become ordinary Kubernetes objects — they can be edited or deleted manually.
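The three modes can be summarized as a small decision table. The sketch below only restates the list above in code form. `reconcile_action` and the phase names `install` and `runtime` are hypothetical, chosen for this illustration, and do not correspond to any Commander API.

```shell
# reconcile_action MODE PHASE: print "apply" or "skip" for a resource group.
# PHASE is "install" (cluster installation) or "runtime" (later agent cycles).
reconcile_action() {
  case "$1:$2" in
    "Force Creation:"*)          echo apply ;;  # applied and restored on every cycle
    "Create on Install:install") echo apply ;;  # created once, during installation
    "Create on Install:runtime") echo skip ;;   # later in-cluster edits are not reverted
    "Ignored:"*)                 echo skip ;;   # resources stay, synchronization stops
    *) echo "unknown mode or phase: $1 / $2" >&2; return 1 ;;
  esac
}

# Usage:
#   reconcile_action "Create on Install" runtime
```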

Issues with commander-agent (loss of connectivity, errors when applying manifests) surface on the cluster page: in the overall cluster status, the synchronization badge next to the cluster name, and the Kubernetes tab (per-group and per-resource details). See Cluster status in the user guide for the full list of cluster statuses.

Protection of controlled resources

To keep out-of-band changes from causing drift from the desired configuration, resources under active synchronization are protected from external modification. The protection covers resources from groups in “Force Creation” mode and Commander’s own service resources (DexProvider, billing, RBAC, projects, agent secrets). An attempt to edit or delete such a resource via kubectl is rejected by the application cluster’s API server. To exempt a group from this protection, switch it to “Ignored”; after that, its resources become ordinary Kubernetes objects.

A subset of Commander’s service resources (RBAC, projects, Dex and billing configurations) is protected more strictly: when the corresponding group leaves Commander control or its source becomes temporarily unavailable, such resources are deleted rather than left in the cluster without Commander supervision.

Enabling and disabling Deckhouse Commander capabilities

Certain Deckhouse Commander capabilities are enabled and disabled via the commander ModuleConfig, field spec.settings.featureFlags. Apply changes in the management cluster where Deckhouse Commander runs.
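All flags under featureFlags share the same merge-patch shape, so the patch body can be generated instead of typed by hand. The helper below is a convenience sketch: `feature_flag_patch` and `set_feature_flag` are hypothetical names, and the only command they rely on is the `d8 k patch mc commander` form used throughout this section.

```shell
# feature_flag_patch NAME VALUE: print the merge patch for one feature flag.
# VALUE is inserted verbatim, so pass a JSON literal such as true, false, or 90.
feature_flag_patch() {
  printf '{"spec":{"settings":{"featureFlags":{"%s":%s}}}}\n' "$1" "$2"
}

# set_feature_flag NAME VALUE: apply the patch to the commander ModuleConfig.
# Run it against the management cluster.
set_feature_flag() {
  d8 k patch mc commander --type merge --patch "$(feature_flag_patch "$1" "$2")"
}

# Usage:
#   feature_flag_patch projectsEnabled true   # print the patch only
#   set_feature_flag projectsEnabled true     # apply it
```

Printing the patch first makes it easy to review the exact JSON before it is applied.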

Billing and cost management — `billingEnabled`

Parameter	Description
Default	Off (`false`) until `billingEnabled: true` is set explicitly
When enabled	Billing components are deployed; the Billing section appears in the UI
When disabled	Set `billingEnabled: false`

Enable:

d8 k patch mc commander --type merge --patch '{"spec":{"settings":{"featureFlags":{"billingEnabled":true}}}}'

After billing is enabled, the Billing section becomes available in the top navigation bar on the workspace list screen and at the URL {COMMANDER_ADDRESS}/billing/.

In addition, the commander-agent component in every application cluster starts:

configuring the local Prometheus to send resource consumption metrics (CPU, memory, storage) to Prometheus in the management cluster — a PrometheusRemoteWrite resource is created in the application cluster for this purpose. See How Commander calculates cost for details;
sending Commander the descriptions of node groups, cloud-provider instance classes, and Kubernetes storage classes — these are used in the billing UI to bind compute classes and storage classes to real cluster objects;
adding a service label billing.commander.deckhouse.io/name to every node, with the name of the compute class the node belongs to. Commander uses this label to match the node with its compute class and apply the price defined in the current tariff to the node’s consumption.

Commander builds the binding contract (node group → label value) and delivers it to the agent. The agent directly patches Node objects, adding the label to every node of the bound node group.

Consumption metrics are stored in a dedicated billing Prometheus (billing-prometheus) in the management cluster. It stores only the data required for cost calculation: container CPU and memory consumption, pod and PVC resource requests, volume usage, pod phases, controller-to-pod relationships, namespace, pod, and node labels, and PVC information.

Prometheus in the application cluster sends this data to the management cluster through Prometheus remote write over the Commander HTTPS API. This is a separate Prometheus HTTP(S) request; commander-agent does not send metrics itself. The agent configures the PrometheusRemoteWrite resource and passes connection parameters to Prometheus.

Billing metadata is stored in the Commander database. This includes tariffs, compute classes, storage classes, resource bindings, and report schedules. Generated reports use separate billing-reports storage.

See the billing documentation for how this data is used to compute cost.

If the section does not appear, check access management in the billing documentation.

Disable:

d8 k patch mc commander --type merge --patch '{"spec":{"settings":{"featureFlags":{"billingEnabled":false}}}}'

After disabling, manually delete the remaining resources if needed:

d8 k -n d8-commander delete pvc -l app=billing-reports
d8 k -n d8-commander delete pvc -l app=billing-prometheus

User documentation: Billing and cost management.

Projects — `projectsEnabled`

Parameter	Description
Default	Off (`false`) until `projectsEnabled: true` is set explicitly
When enabled	The Projects section is available in the UI
When disabled	Set `projectsEnabled: false`

Enable:

d8 k patch mc commander --type merge --patch '{"spec":{"settings":{"featureFlags":{"projectsEnabled":true}}}}'

Disable:

d8 k patch mc commander --type merge --patch '{"spec":{"settings":{"featureFlags":{"projectsEnabled":false}}}}'

User documentation: User guide — Projects.

Audit log retention — `auditsRetentionDays`

Parameter	Description
Default	Off — the parameter is omitted; audit records in the Deckhouse Commander database are not deleted automatically
When a value is set	A daily CronJob removes audit records older than the specified number of days
Valid value	A whole number of days written as digits only, for example `30`, `90`, or `365`

Enable automatic cleanup:

d8 k patch mc commander --type merge --patch '{"spec":{"settings":{"featureFlags":{"auditsRetentionDays":90}}}}'

By default, Deckhouse Commander keeps the change history of entities in PostgreSQL indefinitely. When auditsRetentionDays is set, the audits-cleaner CronJob in the d8-commander namespace runs once a day and removes audit records older than the specified number of days. For example, with auditsRetentionDays: 30, records from 31 or more days ago are removed; records from exactly 30 days ago are kept.
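The value format and the retention boundary can be stated as two small checks. This is an illustration of the rules described above, not the audits-cleaner code: `valid_retention` and `is_removed` are hypothetical helpers, and record age is given in whole days.

```shell
# valid_retention VALUE: succeed if VALUE is a whole number written as digits only.
valid_retention() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# is_removed AGE_DAYS RETENTION_DAYS: succeed if a record of that age is removed.
# Records are removed only when they are strictly older than the retention period.
is_removed() {
  [ "$1" -gt "$2" ]
}

# Usage:
#   valid_retention 90 && echo "90 is a valid value"
#   is_removed 31 30 && echo "a 31-day-old record is removed"
#   is_removed 30 30 || echo "a 30-day-old record is kept"
```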

To turn off automatic cleanup, remove auditsRetentionDays from featureFlags in the commander ModuleConfig.

User documentation: User guide — Audit.

Authentication in application clusters via DexProvider

Every cluster attached to Deckhouse Commander automatically trusts users authenticated by the management cluster’s Dex. Once a user signs into Commander, they can open the application cluster’s admin web UI (DKP UI, the console module) without signing in again.

This does not require a separate account on the application cluster.

Commander reconciles the trust link automatically through a pair of resources:

In the application cluster — a DexProvider named commander, type OIDC. Its issuer points at the management cluster’s Dex URL. Its trust bundle carries the management cluster’s root CA, so TLS to the Dex endpoint validates without extra system-wide trust anchors. The provider requests the openid, profile, email, and groups scopes. This single DexProvider is enough for the application cluster to rely on management-cluster identities; clusters may still have additional DexProvider resources for local identities.
In the management cluster — a paired DexClient named commander-agent-<id> and a Secret with its OIDC client secret. The DexClient carries a redirect URI pointing to the application cluster’s Dex, so the management-cluster Dex correctly accepts login callbacks.

Both resources are created when a cluster is installed or attached, and removed on detach. No manual steps are required. If a resource is edited or removed by hand, the next reconciliation restores it.

When a user opens the application cluster’s admin web UI from Commander, the browser reaches the application cluster’s Dex. The application cluster’s Dex uses the commander provider and redirects the browser to the management cluster’s Dex.

The management cluster’s Dex authenticates the user and redirects the browser back to the application cluster’s Dex using a redirect URI that belongs to that application cluster. The application cluster’s Dex then completes sign-in for the cluster admin web UI. If the user already has an active session in the management cluster’s Dex after signing in to Commander, they usually do not need to enter their login and password again.

The allowedUserGroups parameter of the commander module only controls who can sign in to the Commander web UI. It does not grant any permissions inside Deckhouse Commander itself; roles are configured separately. See Access control for details.

Do not edit, disable, or delete the auto-managed DexProvider/commander in an application cluster or the paired DexClient and its Secret in the management cluster. Removing them breaks all Commander-mediated logins into the application cluster. The next reconciliation restores them, so manual tuning is lost in any case. If you need additional authentication settings for an application cluster, add extra DexProvider resources alongside the managed one — the Commander-managed provider is the minimum required to keep the cluster reachable from Commander and is not exclusive.

Data encryption

Deckhouse Commander encrypts sensitive data stored in the database using keys that are automatically generated when the module is enabled and stored in the commander-envs secret.

Save these keys to a secure location so that the database can be restored if something goes wrong. Without the keys, the encrypted data cannot be restored!

$ d8 k -n d8-commander get secret commander-envs -oyaml
apiVersion: v1
data:
  ACTIVE_RECORD_ENCRYPTION_DETERMINISTIC_KEY: YVBBNVh5QUxoZjc1Tk5uTXphc3BXN2FrVGZacDBsUFk=
  ACTIVE_RECORD_ENCRYPTION_KEY_DERIVATION_SALT: eEVZMGR0NlRaY0FNZzUySzdPODR3WXpranZiQTYySHo=
  ACTIVE_RECORD_ENCRYPTION_PRIMARY_KEY: RUdZOFdodWxVT1hpeHlib2Q3Wld3TUlMNjhSOW81a0M=
kind: Secret
metadata:
...
  name: commander-envs
  namespace: d8-commander
type: Opaque
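One way to keep a copy of the keys is to export the Secret to a file that only its owner can read and then confirm that all three keys are present. The sketch below assumes nothing beyond the command shown above. `backup_keys` and `verify_key_backup` are hypothetical helper names, and the output file name is a placeholder.

```shell
# backup_keys FILE: save the commander-envs Secret to FILE with owner-only access.
backup_keys() {
  ( umask 077 && d8 k -n d8-commander get secret commander-envs -o yaml > "$1" )
}

# verify_key_backup FILE: succeed if FILE contains a value for every encryption key.
verify_key_backup() {
  for key in \
    ACTIVE_RECORD_ENCRYPTION_DETERMINISTIC_KEY \
    ACTIVE_RECORD_ENCRYPTION_KEY_DERIVATION_SALT \
    ACTIVE_RECORD_ENCRYPTION_PRIMARY_KEY
  do
    grep -q "^ *$key: ." "$1" || { echo "missing $key in $1" >&2; return 1; }
  done
}

# Usage:
#   backup_keys commander-envs.backup.yaml
#   verify_key_backup commander-envs.backup.yaml && echo "all keys saved"
```

The exported file holds the keys in base64 only, which is not encryption, so move it to protected storage afterwards.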

Collecting change history logs

In Deckhouse Commander version 1.9 and later, change history events are printed to standard output and tagged with the ["audit"] label. These logs can be collected and shipped with the log-shipper module.

Log example:

{"level":"INFO","time":"2025-06-18 14:22:15 +0300","request_id":"ea09d409dc3c95dcf658fc2c2838084b","pid":19,"tags":["audit"],"auditable_type":"ClusterSettings","auditable_id":"8a0041ef-6c30-48bc-b3ca-e9db3e22be47","action":"create","user_type":"User","remote_address":"82.150.57.81","request_uuid":"ea09d409dc3c95dcf658fc2c2838084b","workspace_slug":"xcjtd","user_name":"admin@company.my","audited_changes":{"cluster_manager":{"sync":{"mode":"auto"},"check_interval":1}}}
{"level":"INFO","time":"2025-06-18 14:22:15 +0300","request_id":"ea09d409dc3c95dcf658fc2c2838084b","pid":19,"tags":["audit"],"auditable_type":"Cluster","auditable_id":"056f7fe5-7d22-4a76-b5e2-f225c0a99613","action":"create","user_type":"User","remote_address":"82.150.57.81","request_uuid":"ea09d409dc3c95dcf658fc2c2838084b","workspace_slug":"xcjtd","user_name":"admin@company.my","audited_changes":{"name":"mycluster","archived_at":null}}
{"level":"INFO","time":"2025-06-18 14:23:57 +0300","request_id":"a1eaf50bbc87a8cca4cd17d8be8fffdb","pid":12,"tags":["audit"],"auditable_type":"ClusterSettings","auditable_id":"707c46b1-b2c8-4fab-9392-8216a2058219","action":"create","user_type":"AuthToken","remote_address":"238.106.231.86","request_uuid":"a1eaf50bbc87a8cca4cd17d8be8fffdb","workspace_slug":"bfqcc","user_name":"api-user","audited_changes":{"cluster_manager":{"sync":{"mode":"auto"},"check_interval":1}}}
{"level":"INFO","time":"2025-06-18 14:23:57 +0300","request_id":"a1eaf50bbc87a8cca4cd17d8be8fffdb","pid":12,"tags":["audit"],"auditable_type":"Cluster","auditable_id":"42d432aa-8250-4ef0-b260-51639e1445d0","action":"create","user_type":"AuthToken","remote_address":"238.106.231.86","request_uuid":"a1eaf50bbc87a8cca4cd17d8be8fffdb","workspace_slug":"bfqcc","user_name":"api-user","audited_changes":{"name":"15731486914-1-con-1-30","archived_at":null}}
{"level":"INFO","time":"2025-06-18 14:28:56 +0300","request_id":"069566a46c004e53b686189587d484a9","pid":19,"tags":["audit"],"auditable_type":"ClusterSettings","auditable_id":"402a4d4d-5c14-4466-a1f3-3d990d7cf35a","action":"create","user_type":"User","remote_address":"30.231.184.26","request_uuid":"069566a46c004e53b686189587d484a9","workspace_slug":"xcjtd","user_name":"user@company.my","audited_changes":{"cluster_manager":{"sync":{"mode":"auto"},"check_interval":1}}}
{"level":"INFO","time":"2025-06-18 14:28:56 +0300","request_id":"069566a46c004e53b686189587d484a9","pid":19,"tags":["audit"],"auditable_type":"Cluster","auditable_id":"9ee687d4-18fe-423c-bbaa-e8e46ea47e67","action":"create","user_type":"User","remote_address":"30.231.184.26","request_uuid":"069566a46c004e53b686189587d484a9","workspace_slug":"xcjtd","user_name":"user@company.my","audited_changes":{"name":"mycluster2","archived_at":null}}
{"level":"INFO","time":"2025-06-18 14:29:06 +0300","request_id":"d29b248fbce414db8b71f821a3b1886e","pid":12,"tags":["audit"],"auditable_type":"Cluster","auditable_id":"e0f3c3de-2129-4b75-b927-72a8eb26902b","action":"update","user_type":"User","remote_address":"30.231.184.26","request_uuid":"d29b248fbce414db8b71f821a3b1886e","workspace_slug":"xcjtd","user_name":"user@company.my","audited_changes":{"archived_at":[null,"2025-06-18T14:29:05.943+03:00"]}}
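Before log shipping is set up, the same records can be inspected by hand with standard text tools. The sketch below assumes the line format shown above. `filter_audit` and `audit_summary` are hypothetical helpers, and the `app=backend` selector in the usage comment is taken from the configuration example in this section.

```shell
# filter_audit: keep only the log lines on stdin that carry the "audit" tag.
filter_audit() {
  grep -F '"tags":["audit"]'
}

# audit_summary: print "time user action type" for every audit line on stdin.
# Relies on the field order shown in the log example.
audit_summary() {
  filter_audit | sed -n 's/.*"time":"\([^"]*\)".*"auditable_type":"\([^"]*\)".*"action":"\([^"]*\)".*"user_name":"\([^"]*\)".*/\1 \4 \3 \2/p'
}

# Usage:
#   d8 k -n d8-commander logs -l app=backend --tail=-1 | audit_summary
```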

Configuration example:

apiVersion: deckhouse.io/v1alpha2
kind: ClusterLoggingConfig
metadata:
  name: commander-audit-logs
spec:
  destinationRefs:
  - loki-example
  kubernetesPods:
    labelSelector:
      matchLabels:
        app: backend
    namespaceSelector:
      labelSelector:
        matchLabels:
          kubernetes.io/metadata.name: d8-commander
  labelFilter:
  - field: message
    operator: Regex
    values:
    - .*\[\"audit\"\].*
  type: KubernetesPods
---
apiVersion: deckhouse.io/v1alpha1
kind: ClusterLogDestination
metadata:
  name: loki-example
spec:
  type: Loki
  loki:
    endpoint: http://loki-example.loki.svc:3100

For more detailed configuration information, see the documentation of the log-shipper module.

Changing the storage class

Option 1 (preferred)

Perform a backup of the database instance

d8 k -n d8-commander exec -t commander-postgres-0 -- su - postgres -c "pg_dump -Fc -b -v -d commander" > commander.dump
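It is worth confirming that the dump file is usable before changing anything. A minimal sanity check, assuming the file name from the command above: `check_dump` is a hypothetical helper that tests for a non-empty file beginning with the `PGDMP` signature that pg_dump writes at the start of custom-format archives.

```shell
# check_dump FILE: succeed if FILE is non-empty and starts with "PGDMP".
check_dump() {
  [ -s "$1" ] && [ "$(head -c 5 "$1")" = "PGDMP" ]
}

# Usage:
#   check_dump commander.dump && echo "dump looks like a custom-format archive"
# For a deeper check, `pg_restore --list commander.dump` prints the archive's
# table of contents without restoring anything.
```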

Change storageClass in the module settings, replacing <NEW_STORAGECLASS_NAME> with the name of the necessary storage class

The list of available storage classes can be found using the command d8 k get storageclasses

d8 k patch moduleconfig commander --type=merge -p '{"spec":{"settings":{"postgres":{"internal":{"storageClass":"<NEW_STORAGECLASS_NAME>"}}}}}'

moduleconfig.deckhouse.io/commander patched

Wait until the Deckhouse queue is empty

d8 system queue main

Queue 'main': length 0, status: 'waiting for task 5s'
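The wait can be scripted by polling the queue until it reports length 0. This sketch assumes the summary line always has the format shown above. `queue_length` and `wait_queue_empty` are hypothetical helpers, and the loop has no upper time limit.

```shell
# queue_length: read `d8 system queue main` output on stdin and print the length.
queue_length() {
  sed -n "s/^Queue 'main': length \([0-9][0-9]*\).*/\1/p"
}

# wait_queue_empty [INTERVAL]: poll until the main queue reports length 0.
# INTERVAL is the pause between polls in seconds (default 5).
wait_queue_empty() {
  while [ "$(d8 system queue main | queue_length)" != "0" ]; do
    sleep "${1:-5}"
  done
}

# Usage:
#   wait_queue_empty 10 && echo "queue is empty"
```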

Check the logs of the postgres operator

d8 k -n d8-operator-postgres logs deployments/operator-postgres

{"cluster-name":"d8-commander/commander-postgres","level":"info","msg":"cluster has been updated","pkg":"controller","time":"2024-05-19T20:36:22Z","worker":0}

Increase the number of replicas of the PostgreSQL database (optional)

This step must be skipped if the HighAvailability mode is active and PostgreSQL has 2 replicas

d8 k -n d8-commander patch postgresqls.acid.zalan.do commander-postgres --type=merge -p '{"spec":{"numberOfInstances":2}}'

postgresql.acid.zalan.do/commander-postgres patched

Check the logs of the operator and the postgres instance

d8 k -n d8-operator-postgres logs deployments/operator-postgres

{"cluster-name":"d8-commander/commander-postgres","level":"info","msg":"cluster has been updated","pkg":"controller","time":"2024-05-19T20:36:22Z","worker":0}

d8 k -n d8-commander logs commander-postgres-1

2024-05-19 20:38:15,648 INFO: no action. I am (commander-postgres-1), a secondary, and following a leader (commander-postgres-0)

Perform the master switch

d8 k -n d8-commander exec -it commander-postgres-0 -- patronictl failover

Current cluster topology
+ Cluster: commander-postgres --------+---------+---------+----+-----------+
| Member               | Host         | Role    | State   | TL | Lag in MB |
+----------------------+--------------+---------+---------+----+-----------+
| commander-postgres-0 | 10.111.3.167 | Leader  | running |  5 |           |
| commander-postgres-1 | 10.111.2.239 | Replica | running |  5 |         0 |
+----------------------+--------------+---------+---------+----+-----------+
Candidate ['commander-postgres-1'] []: commander-postgres-1
Are you sure you want to failover cluster commander-postgres, demoting current leader commander-postgres-0? [y/N]: y
2024-05-19 20:40:52.63041 Successfully failed over to "commander-postgres-1"
+ Cluster: commander-postgres --------+---------+---------+----+-----------+
| Member               | Host         | Role    | State   | TL | Lag in MB |
+----------------------+--------------+---------+---------+----+-----------+
| commander-postgres-0 | 10.111.3.167 | Replica | stopped |    |   unknown |
| commander-postgres-1 | 10.111.2.239 | Leader  | running |  5 |           |
+----------------------+--------------+---------+---------+----+-----------+

Make sure that both DB instances are in the running state

d8 k -n d8-commander exec -t commander-postgres-0 -- patronictl list
+ Cluster: commander-postgres --------+---------+---------+----+-----------+
| Member               | Host         | Role    | State   | TL | Lag in MB |
+----------------------+--------------+---------+---------+----+-----------+
| commander-postgres-0 | 10.111.3.167 | Replica | running |  6 |         0 |
| commander-postgres-1 | 10.111.2.239 | Leader  | running |  6 |           |
+----------------------+--------------+---------+---------+----+-----------+
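The same check can be automated by parsing the table. The sketch below assumes the column layout shown above, with State in the fourth column. `all_members_running` is a hypothetical helper.

```shell
# all_members_running: read `patronictl list` output on stdin and succeed only
# if there is at least one member row and every member is in the running state.
all_members_running() {
  awk -F'|' '
    # Data rows look like: | Member | Host | Role | State | TL | Lag in MB |
    NF >= 7 && $2 !~ /Member/ {
      rows++
      state = $5
      gsub(/ /, "", state)
      if (state != "running") bad++
    }
    END { exit !(rows > 0 && bad == 0) }
  '
}

# Usage:
#   d8 k -n d8-commander exec -t commander-postgres-0 -- patronictl list \
#     | all_members_running && echo "all instances are running"
```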

Check that the disk of the new DB replica was created with the necessary storageClass

d8 k -n d8-commander get pvc --selector application=spilo
NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pgdata-commander-postgres-0   Bound    pvc-fd80fde4-d0e2-4b5f-9e3a-eac998191f11   2Gi        RWO            network-hdd    36h
pgdata-commander-postgres-1   Bound    pvc-7af2f442-3097-4fe3-a795-5ad18bb11351   2Gi        RWO            network-ssd    2m54s

Delete the disk and pod of the first postgres instance

d8 k -n d8-commander delete pvc pgdata-commander-postgres-0 --wait=false
d8 k -n d8-commander delete po commander-postgres-0

Check logs

d8 k -n d8-commander logs commander-postgres-0

2024-05-19 20:43:33,293 INFO: Lock owner: commander-postgres-1; I am commander-postgres-0
2024-05-19 20:43:33,293 INFO: establishing a new patroni connection to the postgres cluster
2024-05-19 20:43:33,357 INFO: no action. I am (commander-postgres-0), a secondary, and following a leader (commander-postgres-1)

Check that the disk was created with the correct storageClass

d8 k -n d8-commander get pvc
NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pgdata-commander-postgres-0   Bound    pvc-fd80fde4-d0e2-4b5f-9e3a-eac998191f11   2Gi        RWO            network-ssd    2m6s
pgdata-commander-postgres-1   Bound    pvc-7af2f442-3097-4fe3-a795-5ad18bb11351   2Gi        RWO            network-ssd    7m11s
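To spot volumes that are still on the old storage class, the table output can be filtered by its STORAGECLASS column. This is a sketch under the assumption that the output has the default column layout shown above and that all PVCs are Bound. `pvcs_not_on_class` is a hypothetical helper, and `network-ssd` stands for the new storage class name.

```shell
# pvcs_not_on_class CLASS: read `d8 k get pvc` table output on stdin and print
# the names of PVCs whose STORAGECLASS column differs from CLASS.
pvcs_not_on_class() {
  # Field 6 is STORAGECLASS in data rows ("ACCESS MODES" is one value there).
  awk -v want="$1" 'NR > 1 && $6 != want { print $1 }'
}

# Usage:
#   d8 k -n d8-commander get pvc --selector application=spilo \
#     | pvcs_not_on_class network-ssd
```

Empty output means every listed PVC already uses the requested storage class.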

Perform the master switch one more time

d8 k -n d8-commander exec -it commander-postgres-0 -- patronictl failover

Current cluster topology
+ Cluster: commander-postgres --------+---------+---------+----+-----------+
| Member               | Host         | Role    | State   | TL | Lag in MB |
+----------------------+--------------+---------+---------+----+-----------+
| commander-postgres-0 | 10.111.3.189 | Replica | running |  6 |         0 |
| commander-postgres-1 | 10.111.2.239 | Leader  | running |  6 |           |
+----------------------+--------------+---------+---------+----+-----------+
Candidate ['commander-postgres-0'] []: commander-postgres-0
Are you sure you want to failover cluster commander-postgres, demoting current leader commander-postgres-1? [y/N]: y
2024-05-19 20:46:11.69855 Successfully failed over to "commander-postgres-0"
+ Cluster: commander-postgres --------+---------+---------+----+-----------+
| Member               | Host         | Role    | State   | TL | Lag in MB |
+----------------------+--------------+---------+---------+----+-----------+
| commander-postgres-0 | 10.111.3.189 | Leader  | running |  6 |           |
| commander-postgres-1 | 10.111.2.239 | Replica | stopped |    |   unknown |
+----------------------+--------------+---------+---------+----+-----------+

Make sure that both DB instances are in the running state

d8 k -n d8-commander exec -t commander-postgres-0 -- patronictl list
+ Cluster: commander-postgres --------+---------+---------+----+-----------+
| Member               | Host         | Role    | State   | TL | Lag in MB |
+----------------------+--------------+---------+---------+----+-----------+
| commander-postgres-0 | 10.111.3.189 | Leader  | running |  6 |         0 |
| commander-postgres-1 | 10.111.2.239 | Replica | running |  6 |           |
+----------------------+--------------+---------+---------+----+-----------+
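
Right after a failover the former leader may be shown as stopped for a while, as in the previous output, so it can help to check the member states programmatically. The sketch below assumes the `patronictl list` table layout shown above, with State as the fourth column; `all_members_running` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `patronictl list` table output from stdin and
# succeeds only if at least one member row exists and every member is "running".
# Border and title lines contain no "|" and are ignored; the header row is
# skipped by its "Member" label.
all_members_running() {
  awk -F'|' '
    NF >= 7 && $2 !~ /Member/ {
      state = $5
      gsub(/ /, "", state)
      members++
      if (state != "running") bad++
    }
    END {
      if (members > 0 && bad == 0) exit 0
      exit 1
    }
  '
}
```

For example: `d8 k -n d8-commander exec commander-postgres-0 -- patronictl list | all_members_running && echo "all members are running"`.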

Reducing the number of replicas of the PostgreSQL database (optional)

Skip this step if HighAvailability mode is active and PostgreSQL has 2 replicas

d8 k -n d8-commander patch postgresqls.acid.zalan.do commander-postgres --type=merge -p '{"spec":{"numberOfInstances":1}}'

postgresql.acid.zalan.do/commander-postgres patched

Check the operator logs

d8 k -n d8-operator-postgres logs deployments/operator-postgres

{"cluster-name":"d8-commander/commander-postgres","level":"info","msg":"cluster has been updated","pkg":"controller","time":"2024-05-19T20:50:22Z","worker":0}

Delete the disk and pod of the second instance (if HighAvailability mode is active and PostgreSQL has 2 replicas)

Skip this step if HighAvailability mode is not active

d8 k -n d8-commander delete pvc pgdata-commander-postgres-1 --wait=false
d8 k -n d8-commander delete po commander-postgres-1

Check logs

d8 k -n d8-commander logs commander-postgres-1

2024-05-19 20:53:33,293 INFO: Lock owner: commander-postgres-0; I am commander-postgres-1
2024-05-19 20:53:33,293 INFO: establishing a new patroni connection to the postgres cluster
2024-05-19 20:53:33,357 INFO: no action. I am (commander-postgres-1), a secondary, and following a leader (commander-postgres-0)

Check that the disk was created with the necessary storageClass

d8 k -n d8-commander get pvc
NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pgdata-commander-postgres-0   Bound    pvc-fd80fde4-d0e2-4b5f-9e3a-eac998191f11   2Gi        RWO            network-ssd    7m6s
pgdata-commander-postgres-1   Bound    pvc-7af2f442-3097-4fe3-a795-5ad18bb11351   2Gi        RWO            network-ssd    1m11s

Make sure that both DB instances are in the running state

d8 k -n d8-commander exec -t commander-postgres-0 -- patronictl list
+ Cluster: commander-postgres --------+---------+---------+----+-----------+
| Member               | Host         | Role    | State   | TL | Lag in MB |
+----------------------+--------------+---------+---------+----+-----------+
| commander-postgres-0 | 10.111.3.189 | Leader  | running |  6 |         0 |
| commander-postgres-1 | 10.111.2.239 | Replica | running |  6 |           |
+----------------------+--------------+---------+---------+----+-----------+

Delete the unused disk of the temporary database replica (if HighAvailability mode is not active)

Skip this step if HighAvailability mode is active and PostgreSQL has 2 replicas
```
d8 k -n d8-commander delete pvc pgdata-commander-postgres-1

persistentvolumeclaim "pgdata-commander-postgres-1" deleted
```

Option 2

Perform a backup of the database instance

d8 k -n d8-commander exec -t commander-postgres-0 -- su - postgres -c "pg_dump -Fc -b -v -d commander" > commander.dump
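
Because the following steps remove the d8-commander namespace, it is worth confirming that the dump file is usable before going further. A minimal sketch, assuming the dump was written in custom format (`-Fc`) as in the command above: such archives start with the `PGDMP` signature. `is_pg_custom_dump` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if the file is non-empty and starts with the
# "PGDMP" signature that pg_dump writes at the beginning of custom-format
# (-Fc) archives. This is a quick sanity check, not a full validation.
is_pg_custom_dump() {
  file="$1"
  [ -s "$file" ] || return 1
  [ "$(head -c 5 "$file")" = "PGDMP" ]
}
```

For example: `is_pg_custom_dump commander.dump || echo "commander.dump does not look like a pg_dump archive"`. If the PostgreSQL client tools are installed locally, `pg_restore -l commander.dump` gives a more thorough check by listing the archive contents.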

Turn off the commander module

d8 k patch moduleconfig commander --type=merge -p '{"spec":{"enabled":false}}'

moduleconfig.deckhouse.io/commander patched

Wait until the Deckhouse queue is empty

d8 system queue main

Queue 'main': length 0, status: 'waiting for task 5s'
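
Waiting for the queue can be automated with a simple polling loop. The sketch below assumes the output format shown above, where an empty queue is reported as `length 0`; `wait_queue_empty` is a hypothetical helper name, and the attempt count and pause are arbitrary defaults. The same helper can be reused wherever this procedure waits for the queue.

```shell
# Hypothetical helper: polls the Deckhouse main queue until it reports
# "length 0" or the attempts run out.
# $1 - maximum number of attempts (default 60)
# $2 - pause between attempts in seconds (default 5)
wait_queue_empty() {
  attempts="${1:-60}"
  pause="${2:-5}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # The trailing comma keeps "length 10," from matching.
    if d8 system queue main 2>/dev/null | grep -q "length 0,"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$pause"
  done
  return 1
}
```

For example: `wait_queue_empty 120 5 || echo "queue is still busy"`.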

Check that the d8-commander namespace has been deleted

d8 k get namespace d8-commander
Error from server (NotFound): namespaces "d8-commander" not found

Set the required storage class and enable the commander module

d8 k patch moduleconfig commander --type=merge -p '{"spec":{"enabled":true,"settings":{"postgres":{"internal":{"storageClass":"<NEW_STORAGECLASS_NAME>"}}}}}'

moduleconfig.deckhouse.io/commander patched

Wait until the Deckhouse queue is empty

d8 system queue main

Queue 'main': length 0, status: 'waiting for task 5s'

Check that the DB instance has the Running status

d8 k -n d8-commander get po commander-postgres-0

NAME                   READY   STATUS    RESTARTS   AGE
commander-postgres-0   1/1     Running   0          2m4s
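
Rather than re-running the command until the pod is Running, the standard `kubectl wait` subcommand can block until the pod is Ready. This is a sketch under the assumption that `d8 k` passes `wait` and its flags through to kubectl; `wait_pod_ready` is a hypothetical helper name and the default timeout is arbitrary.

```shell
# Hypothetical helper: blocks until the given pod in d8-commander reports
# the Ready condition or the timeout expires.
# $1 - pod name, $2 - timeout (default 300s)
wait_pod_ready() {
  pod="$1"
  timeout="${2:-300s}"
  d8 k -n d8-commander wait --for=condition=Ready "pod/${pod}" --timeout="$timeout"
}
```

For example: `wait_pod_ready commander-postgres-0 600s`.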

Restore the previously saved database backup

d8 k -n d8-commander exec -i commander-postgres-0 -- su - postgres -c "pg_restore -v -c --if-exists -Fc -d commander" < commander.dump
