The module is available only in Deckhouse Enterprise Edition.

The module is under active development and may change significantly in the future.

Address

If the public domain template in the cluster is %s.example.com, the web application can be accessed at https://commander.example.com.
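
The public domain template is part of the global DKP module settings. A minimal sketch of how it might be set (the settings schema version may differ between DKP releases):

apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: global
spec:
  version: 1                                   # The settings schema version depends on the DKP release.
  settings:
    modules:
      publicDomainTemplate: "%s.example.com"   # Commander then answers at https://commander.example.com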

Cluster Management

We recommend installing Commander in a management cluster, that is, a cluster dedicated to centralized management and to collecting information from the entire application infrastructure, including application clusters. Clusters managed by Commander are called application clusters. Commander is the source of truth for cluster configuration. Below we look at how this is implemented in practice.

Cluster Status

Infrastructure

Cluster management comes down to three types of operations: creation, deletion, and modification. At any given time, a cluster in Commander has one of the following “infrastructure statuses”:

  • New — a cluster configuration has been created in Commander, but the cluster itself has not been created yet.
  • Configuration Error — the cluster configuration created in Commander contains errors, so the cluster will not be created.
  • In Creation — Commander is deploying the cluster.
  • Ready — the cluster is created, and the state of the infrastructure matches the configuration specified in Commander.
  • Changing — Commander is bringing the cluster to the specified configuration.
  • Change Error, Creation Error, Deletion Error — internal or external errors occurred during cluster management.
  • Archived — the cluster is no longer tracked by Commander; it has been deleted or removed from Commander management.

Commander performs operations asynchronously: every operation on a cluster is carried out through a task.

A task, and therefore an operation, can be the installation, removal, or modification of a cluster, or the verification of its configuration against the actual state. Operations are shown on the cluster's “cloud” tab (including for static clusters). An execution log is available for each task, and the result of a task determines the cluster's infrastructure status.

Infrastructure operations are performed by the Cluster Manager component. How quickly the Cluster Manager picks up tasks for execution depends on the number of clusters and the number of Cluster Manager replicas. If the total number of tasks significantly exceeds the number of replicas, operations on clusters will be delayed.

Kubernetes

In addition to its infrastructure status, a cluster also has a Kubernetes configuration status. It indicates whether the cluster matches the Kubernetes manifests defined in its configuration. Resource manifests (simply “resources” hereafter) are part of the cluster configuration.

The Kubernetes configuration state can have one of three statuses:

  • Configured: complete compliance
  • Not Configured: discrepancy between configuration and cluster state
  • No Data: configuration state data is outdated

A component installed in the application cluster, the Commander agent or commander-agent (hereafter simply “agent”), is responsible for keeping the cluster's resources in line with the given configuration. The agent continuously tries to bring the cluster into compliance with the specified configuration.

The agent connects to the Commander API, downloads resource manifests, and applies them. If a resource created by the agent is deleted in the application cluster, the agent recreates it within a minute. If a resource is removed from the cluster configuration, the agent deletes that resource in the application cluster. If the agent cannot apply a resource for some reason, the Kubernetes status in Commander will be “Not Configured”.
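
As an illustration, any plain Kubernetes manifest placed in the cluster's resources is handled this way; for example, a namespace (the name and label below are arbitrary):

apiVersion: v1
kind: Namespace
metadata:
  name: team-apps                     # Example resource taken from the cluster configuration.
  labels:
    managed-by: commander             # If this namespace is deleted in the application cluster,
                                      # the agent recreates it within about a minute.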

In addition to synchronizing the resource configuration in Kubernetes, the agent provides Commander with telemetry data:

  • The current version of the Deckhouse Kubernetes Platform
  • Availability of an update to the latest version of the Deckhouse Kubernetes Platform
  • The Deckhouse Kubernetes Platform update channel
  • Kubernetes version
  • Availability of system components
  • Alerts that require user attention (alerts, manual confirmation of node reboot, etc.)
  • Key cluster metrics: total CPU count, memory size, disk storage size, and total number of nodes.

Creation

Clusters are created from cluster templates. To create a cluster, the user selects a template, fills in its input parameters, and clicks the “install” button. This gives the cluster a configuration and binds it to a specific version of the template. The template or the version can be changed later.

As the user fills in the inputs, the cluster configuration is rendered as YAML. If errors are found in the configuration, the Commander interface shows them. If the user saves a new cluster with errors, its installation will not begin until the errors are corrected. In other words, the cluster will have the “Configuration Error” status, and the installation task will not be created until the configuration becomes valid. Configuration errors can be caused both by template code and by incorrectly filled input parameters.

Once the configuration becomes valid, an installation task is created, and the Cluster Manager creates the cluster. If the cluster is being created on pre-provisioned machines, Commander configures the Deckhouse Kubernetes Platform components on them and then creates the specified Kubernetes resources. If a cloud or virtualization platform API is used, Commander first creates the infrastructure and then performs the steps above. The exact set of cloud resources depends on the cloud provider.

After a successful installation, Commander periodically checks the cluster's configuration. If the infrastructure configuration diverges from the one declared in Commander, Commander creates a task to bring the infrastructure back to its declared state. A discrepancy can appear on either the infrastructure side or the Commander side. In the first case, something was changed through the cloud API, for example, a cloud resource was modified manually. In the second case, the cluster configuration was changed in Commander, which we discuss in the next section.

Update

Changing the cluster configuration means that a new configuration has been saved to the cluster, different from the previous one. This may be due to changes in the input parameters of the current cluster template. It may also be due to moving the cluster to a new version of the template or even to a different template.

When the cluster configuration changes, Commander creates a task to change the cluster infrastructure. The agent brings the Kubernetes configuration to the desired state in parallel with the infrastructure change.

Cluster configuration changes can lead to destructive changes in the infrastructure. For example, a change to virtual machines may require them to be deleted or recreated; another example is a change in the set of cloud availability zones. When Commander detects destructive changes, it does not apply them until the user confirms them.

Deletion

Clusters can be deleted from Commander in two ways. Both methods are equally available for a cluster.

The first method is clearing the cluster's infrastructure. In this case, Commander creates a deletion task: static resources are cleared of Deckhouse Kubernetes Platform components, and cloud resources (for example, virtual machines) are removed. After deletion, the cluster configuration does not disappear; the cluster moves to the archive. Its configuration can be restored if needed, but the cluster is no longer listed among active clusters. This is what distinguishes an archived cluster from an active one.

The other way to delete a cluster is manual deletion. Commander moves the cluster to the archive but does not clear the infrastructure. This method can be useful if, for some reason, Commander cannot delete the cluster correctly using the first method; in that case, the cluster will have the “Deletion Error” status. The user will have to manually clean up the resources occupied by the Deckhouse Kubernetes Platform and then move the cluster to the archive manually.

Cluster Configuration

Cluster configuration consists of several sections:

  • Input Parameters (schema): the schema of the template input parameters.
  • Kubernetes (YAML template): Kubernetes configuration (ClusterConfiguration).
  • Placement (YAML template): infrastructure configuration (<Provider>ClusterConfiguration or StaticClusterConfiguration).
  • SSH Parameters (YAML template): SSH connection to the master nodes.
  • Resources (YAML template): cluster resources, including ModuleConfig except system ones; synchronized by Commander.
  • Primary Resources (YAML template): cluster resources, including ModuleConfig except system ones; not synchronized by Commander.
  • Start-up Configuration (YAML template): installation configuration (InitConfiguration and system ModuleConfig).

Cluster Parameters

These are the user-filled input parameters of the template. See Input Parameters.

Kubernetes

Settings for the Kubernetes version, pod and service subnets. See ClusterConfiguration.
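
A minimal sketch of this section (the values are illustrative; see the ClusterConfiguration reference for the full schema):

apiVersion: deckhouse.io/v1
kind: ClusterConfiguration
clusterType: Static                   # Or Cloud for clusters managed through a cloud or virtualization API.
kubernetesVersion: "Automatic"        # Or a specific version, for example "1.29".
podSubnetCIDR: 10.111.0.0/16
serviceSubnetCIDR: 10.222.0.0/16
clusterDomain: "cluster.local"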

Placement

This section describes how the cluster is placed in the infrastructure. For a static cluster, it may be left empty.

For cloud clusters, it specifies access to the cloud API, the nodes that will be created and tracked automatically (including master nodes), availability zone settings, and so on.
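
For example, for a static cluster this section may contain only the internal network ranges (a minimal sketch; the value is illustrative):

apiVersion: deckhouse.io/v1
kind: StaticClusterConfiguration
internalNetworkCIDRs:                 # Address ranges used for internal cluster traffic.
  - 172.16.0.0/24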

SSH Parameters

apiVersion: dhctl.deckhouse.io/v1
kind: SSHConfig

sshBastionHost: 10.1.2.3              # Bastion host is optional.
sshBastionPort: 2233
sshBastionUser: debian

sshUser: ubuntu
sshPort: 22
sshAgentPrivateKeys:                  # The list of private keys,
  - key: |                            # at least one key is required
      -----BEGIN RSA PRIVATE KEY-----
      .............................
      -----END RSA PRIVATE KEY-----
    passphrase: qwerty123             # Key password, optional

sshExtraArgs: -vvv                    # Extra arguments for SSH command

---

apiVersion: dhctl.deckhouse.io/v1     # Target hosts.
kind: SSHHost                         # Commonly there are 1 or 3 hosts
host: 172.16.0.1                      # to be used as control plane nodes
---
apiVersion: dhctl.deckhouse.io/v1
kind: SSHHost
host: 172.16.0.2
---
apiVersion: dhctl.deckhouse.io/v1
kind: SSHHost
host: 172.16.0.3

Resources

Arbitrary manifests of Kubernetes and Deckhouse resources, except for the settings of built-in modules of the Deckhouse Kubernetes Platform. The Commander will synchronize these resources.

Primary Resources

Arbitrary manifests of Kubernetes and Deckhouse resources, except for the settings of the built-in Deckhouse Kubernetes Platform modules. The Commander will not synchronize these resources.

Initial Configuration

This section specifies the container registry and access to it (see InitConfiguration). It also specifies the settings of the Deckhouse Kubernetes Platform (DKP) built-in modules, for example, the domain template for service web interfaces, TLS certificate settings, or the update channel.
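
A minimal sketch of this section, assuming the Enterprise Edition registry and the Stable update channel (the registryDockerCfg value is a placeholder):

apiVersion: deckhouse.io/v1
kind: InitConfiguration
deckhouse:
  imagesRepo: registry.deckhouse.io/deckhouse/ee
  registryDockerCfg: <base64-encoded registry credentials>
---
apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: deckhouse                     # A system module, so its settings belong in this section.
spec:
  version: 1
  settings:
    releaseChannel: Stable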

Templates

Commander is designed to manage standard clusters. Since all cluster configuration sections are in YAML format, templating a cluster means marking up the desired YAML configuration with parameters and describing the schema for those parameters. Go template syntax and the Sprig function set are used for the YAML markup. A custom syntax, similar to OpenAPI 3 but simpler, is used to describe the input parameter schema.

The cluster configuration is created by substituting the input parameters into the section templates. The input parameters are validated against the schema defined for them. The schema of the input parameters can be set in the Commander web application both as a text configuration and with a visual form designer. Read about the input parameters in the section on working with templates.
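
For illustration, a fragment of a section template might look like the sketch below; the parameter names and the way inputs are exposed to the template are assumptions here, so refer to the templates documentation for the actual context:

# An illustrative fragment of a Placement section template:
masterNodeGroup:
  replicas: {{ .masterReplicas | default 3 }}       # Input parameter with a fallback via the Sprig "default" function.
  instanceClass:
    flavorName: {{ .masterFlavor | quote }}         # Sprig "quote" wraps the value in quotes.
  {{- if .useExternalNetwork }}
    mainNetwork: public                             # Included only when the corresponding input is set.
  {{- end }}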

Templates have versions. When a template is updated, a new version of the template is created. The previous version of the template remains available for use in clusters. However, the template author can make the template versions unavailable for use.

Each cluster in Commander has a configuration that was produced from a template (unless the cluster was imported). The cluster also “remembers” which template and which version it is configured from. Thanks to this binding, the cluster's input parameters are displayed as a web form generated from that specific template version.

When a cluster is transferred to a new template or a new version of a template, the set of input parameters may change. This may include the appearance of mandatory parameters that were not filled initially and do not have default values. Then, when switching from one template (version) to another, it may be necessary to change or supplement the input parameters so that the new configuration is created correctly.

Inside the template interface, there is a list of the clusters whose configuration is currently based on this template. From this interface, you can switch many clusters to a new (or old) version of the template in just a few clicks. This operation will fail if the resulting cluster configuration contains errors, for example, when the new version has mandatory input parameters that are absent from the current version and therefore have no values.

Creating and maintaining a template can be a laborious engineering task that requires testing the installation and updating of clusters. Versions of templates may accumulate during this work.

To make it easier to navigate the versions, Commander lets you leave a comment on each version. There is also an option to hide template versions from template users, which can be useful to protect users from a version that is known to be broken.

Resources, Resource Catalogs

Note: This is not to be confused with Kubernetes resources.

Sometimes clusters require pre-prepared infrastructure: dedicated subnets, pre-existing load balancers, virtual machines, domain names, IP addresses, and so on. It is convenient to have this data prepared beforehand and to track which clusters, if any, use it.

For this purpose, the Commander has a “Resources” section. This section consists of resource catalogs. Resources themselves are structured data represented as JSON objects, and each catalog holds a collection of these resources. When creating a catalog, you specify its name, resource schema, and identifier. The resource schema is defined using the same syntax and visual builder as the cluster template’s input parameters. An example of a resource schema:

- key: hostname
  type: string
  title: Hostname
  unique: true
  pattern: ^[a-z0-9.-]+$
  identifier: true

- key: ip
  type: string
  title: IP Address
  format: ipv4
  unique: true
  identifier: true

To use one or more resources in a cluster, you need to specify in the cluster template that the field is a choice from a catalog. This is done using the catalog identifier. An example parameter for a cluster template:

- key: workerMachines
  title: Workers
  catalog: worker-nodes
  minItems: 1
  maxItems: 10
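
For illustration, the selected resources could then be used in a section template. The sketch below assumes the parameter is exposed to the template as a list of the selected resource objects; the kind, names, and fields are illustrative:

# An illustrative fragment of the Resources section template:
{{- range .workerMachines }}
---
apiVersion: deckhouse.io/v1alpha1
kind: StaticInstance
metadata:
  name: {{ .hostname }}
spec:
  address: {{ .ip }}                  # Values come from the catalog resources selected for the cluster.
  credentialsRef:
    kind: SSHCredentials
    name: static-nodes-credentials    # Assumed to exist in the cluster configuration.
{{- end }}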

If a resource is used in a cluster, it cannot be used in another cluster. If a cluster that uses the resource is deleted or stops using the resource due to a change, the resource becomes available for use by other clusters.

If the resource is invalid according to the schema, it cannot be used by clusters. The resource can be edited to make it valid. The catalog schema can only be edited when no resources in the catalog are being used by any cluster.

Resource catalogs can be imported using the API or through the interface by uploading a JSON file. If an existing catalog is identified in this file, the resources will be added to it during the import process. An example file of a catalog with resources that can be imported:

{
  "name": "Рабочие хосты",
  "slug": "worker-nodes",
  "params": [
    {
      "key": "hostname",
      "type": "string",
      "title": "Имя хоста",
      "unique": true,
      "pattern": "^[a-z0-9.-]+$",
      "identifier": true
    },
    {
      "key": "ip",
      "type": "string",
      "title": "IP-адрес",
      "format": "ipv4",
      "unique": true,
      "identifier": true
    }
  ],
  "resources": [
    { "values": { "ip": "10.128.0.39", "hostname": "worker-1" } },
    { "values": { "ip": "10.128.0.47", "hostname": "worker-2" } },
    { "values": { "ip": "10.128.0.24", "hostname": "worker-3" } },
    { "values": { "ip": "10.128.0.17", "hostname": "worker-4" } },
    { "values": { "ip": "10.128.0.55", "hostname": "worker-5" } },
    { "values": { "ip": "10.128.0.49", "hostname": "worker-6" } }
  ]
}

Integration API and Tokens

The Commander API provides a limited set of actions:

  1. Create, change, and delete clusters
  2. Create, change, and delete resources in catalogs
  3. Read templates
  4. Read resource catalogs

To access the API, you can issue a token in Commander. A token can be granted either full access to all API operations or read-only access.

Details of the API implementation are described in the Integration API section.

Audit

Commander keeps a history of changes for all entities: clusters, templates, resources, catalogs, and API access tokens. The recorded history of actions and changes can be used to track who did what in Commander and when.

Currently, this functionality covers only actions performed through the Commander API. In the future, an audit log from application clusters will also be available in Commander.