The page contains frequently asked questions about configuring and using the Deckhouse Kubernetes Platform.

Cluster and Infrastructure

What to do if the resources generated by an object are not created after adding the object?

If the required resources do not appear after creating an object in the system (for example, a DexAuthenticator), follow these steps:

  1. Check if there are any critical alerts in the cluster that may block the creation of the required objects. To do this, use the command:

    d8 k get clusteralerts.deckhouse.io
    

    Example output:

    NAME               ALERT                                           SEVERITY   AGE   LAST RECEIVED   STATUS
    012f602592aa7a91   K8SSchedulerTargetDown                          3          16h   54s             firing
    0836dc893d5ecc65   KubernetesDeploymentReplicasUnavailable         5          15h   62s             firing
    08742f87d62d0063   NTPDaemonOnNodeDoesNotSynchronizeTime           5          16h   46s             firing
    172cfd38d2f7fd19   D8DeckhouseQueueIsHung                          7          12h   66s             firing
    1c5705daf731f5cf   D8StrongholdNoActiveNodes                       3          16h   55s             firing
    1d2c2f7d69f69f4b   D8DeckhouseIsNotOnReleaseChannel                9          12h   53s             firing
    205a551243d795f3   D8LogShipperAgentNotScheduledInCluster          7          15h   63s             firing
    2e34039aa7a3018e   D8NodeIsNotUpdating                             9          12h   47s             firing
    31baf9a70d657275   D8StrongholdClusterNotHealthy                   7          16h   55s             firing
    

    For more information about alerts, see The list of alerts section.

  2. Check the Deckhouse task queue:

    d8 s queue list
    

    Example output (queues are empty):

    Summary:
    - 'main' queue: empty.
    - 88 other queues (0 active, 88 empty): 0 tasks.
    - no tasks to handle.
    

    If there are many unprocessed or long-running tasks in the queue, this may indicate problems.

  3. Analyze DKP logs and events:

    • To view logs in real time, use the command:

      d8 k -n d8-system logs -f -l app=deckhouse
      

      Example output:

      {"level":"info","logger":"addon-operator","msg":"ConvergeModules task for OperatorStartup in phase '', trigger is Operator-Startup","binding":"ConvergeModules","event.type":"OperatorStartup","queue":"main","task.flow":"start","task.id":"fde0eb3b-5c3e-4da6-a0d8-a52f8ae03428","time":"2025-11-26T08:29:33Z"}
      {"level":"warn","logger":"addon-operator.converge-modules","msg":"ConvergeModules: functional scheduler not finished","binding":"ConvergeModules","event.type":"OperatorStartup","queue":"main","task.id":"fde0eb3b-5c3e-4da6-a0d8-a52f8ae03428","time":"2025-11-26T08:29:33Z"}
      

      When analyzing logs, pay special attention to warnings (WARNING) and error messages (ERROR).

    • To view events, use the command:

      d8 k -n d8-system get events
      

      Example output:

      LAST SEEN   TYPE      REASON              OBJECT                                          MESSAGE
      11m         Warning   Unhealthy           pod/deckhouse-5886c9bd77-vgdbw                  Readiness probe failed: HTTP probe failed with statuscode: 500
      7m22s       Normal    SuccessfulDelete    replicaset/deckhouse-5886c9bd77                 Deleted pod: deckhouse-5886c9bd77-vgdbw
      7m20s       Normal    Scheduled           pod/deckhouse-6bc5c4494-fwx6z                   Successfully assigned d8-system/deckhouse-6bc5c4494-fwx6z to sandbox1-master-0
      7m20s       Normal    Pulling             pod/deckhouse-6bc5c4494-fwx6z                   Pulling image "dev-registry.deckhouse.io/sys/deckhouse-oss@sha256:17ac07634e17422df52720264cddec3916ed6985a77782dc8a24fe5352290e6e"
      

    When analyzing events, pay special attention to those with the Warning type.
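
If the event list is long, it can be narrowed down to warnings only. The field selector below is a standard kubectl option (wrapped by d8 k), not a DKP-specific one:

d8 k -n d8-system get events --field-selector type=Warning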

Deckhouse

How to check the job queue in Deckhouse?

How to check the status of all Deckhouse task queues?

To view the status of all Deckhouse job queues, run the following command:

d8 s queue list

Example output (queues are empty):

Summary:
- 'main' queue: empty.
- 88 other queues (0 active, 88 empty): 0 tasks.
- no tasks to handle.

How to view the status of the main task queue?

To view the status of the Deckhouse main task queue, run the following command:

d8 s queue main

Example output (38 tasks in the main queue):

Queue 'main': length 38, status: 'run first task'

Example output (the main queue is empty):

Queue 'main': length 0, status: 'waiting for task 0s'
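
If the main queue is not draining, you can re-run the command periodically to see whether its length changes. A minimal sketch using the standard watch utility (not part of d8):

watch -n 10 'd8 s queue main'

If the length stays the same for a long time, the queue is probably hung (see the D8DeckhouseQueueIsHung alert).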

What to do if there are problems updating DKP?

Deckhouse Kubernetes Platform update fails, one or more Deckhouse pods are in an unworkable state

If the Deckhouse Kubernetes Platform update fails and one or more Deckhouse pods in the d8-system namespace are in an unworkable state, perform the following steps:

  1. Check the Deckhouse logs using the command:

    d8 k -n d8-system logs -f -l app=deckhouse | jq -Rr 'fromjson? | .msg'
    

    If there are any problems, information about them will be included in the output. When analyzing logs, pay special attention to warnings (WARNING) and error messages (ERROR); a sketch for filtering out error messages only is shown after this list.

  2. Check Deckhouse events using the command:

    d8 k -n d8-system describe po -l app=deckhouse | awk '
    /^Name:/ { 
        pod = $2; 
        print "=== " pod " ==="; 
        in_events = 0 
    }
    /Events:/ { 
        in_events = 1; 
        next 
    }
    in_events && /^$/ { 
        in_events = 0; 
        print "---" 
    }
    in_events && !/^Events:/ { 
        print $0 
    }
    ' | sed '/^---$/N;/^\n$/D'
    

    Pod events contain key information about problems (e.g., planning errors, image loading errors, etc.). When analyzing events, pay special attention to those with the Warning type.

    Example output:

    Type     Reason     Age                      From     Message
    ----     ------     ----                     ----     -------
    Warning  Unhealthy  4m44s (x1918 over 154m)  kubelet  Readiness probe failed: HTTP probe failed with statuscode: 500
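
To focus on errors only in the Deckhouse logs from step 1, the level field of the JSON log entries can be used in a jq filter. A sketch, assuming error entries use the level value "error" (by analogy with the info and warn entries shown earlier):

d8 k -n d8-system logs -l app=deckhouse --tail=2000 | jq -Rr 'fromjson? | select(.level=="error") | .msg'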
    

DKP update is stuck in the Release is suspended status

The Release is suspended status indicates that the release has been postponed and is currently not recommended for installation. In this case, it is recommended to remain on the latest available release or on the one currently installed (it will have the Deployed status).

To view the list of releases, use the command:

d8 k get deckhousereleases.deckhouse.io

Example output:

NAME       PHASE        TRANSITIONTIME   MESSAGE
v1.69.13   Skipped      3h46m
v1.69.14   Skipped      3h46m
v1.69.15   Skipped      3h46m
v1.69.16   Superseded   160m
v1.70.12   Suspended    49d              Release is suspended
v1.70.13   Skipped      36d
v1.70.14   Skipped      34d
v1.70.15   Skipped      28d
v1.70.16   Skipped      19d
v1.70.17   Deployed     160m
v1.71.3    Suspended    14d              Release is suspended
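
To quickly find the currently installed release in a long list, you can filter the output by phase (a simple grep over the human-readable output shown above):

d8 k get deckhousereleases.deckhouse.io | grep Deployed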

IAM

What to do if there are problems applying DexProvider settings?

If you have changed the DexProvider settings in the user-authn module and one of the following problems occurs:

  • No changes are visible (settings are not applied).
  • When attempting to log in to the platform’s web interface with any type of authorization, a 500 Internal Server Error occurs without a detailed description.

Perform the following steps:

  1. Check the status of the dex deployment:

    d8 k -n d8-user-authn get pod
    

    Example output:

    NAME                                    READY   STATUS    RESTARTS   AGE
    dex-5ddb779b7d-6pbhs                    2/2     Running   0          20h
    kubeconfig-generator-7c46977b9f-5kdmc   1/1     Running   0          20h
    

    If the module is functioning properly and the correct configuration is specified in DexProvider, all pods will have the status Running. If there is a problem, one or more pods will have a status other than Running, and it will be impossible to log into the platform’s web interface with any type of authorization.

  2. Check the logs for the problematic pod:

    d8 k -n d8-user-authn logs dex-<pod-name>
    

    Based on the information from the logs, correct the configuration in the DexProvider resource and wait for the dex pods to restart. Within a few minutes, the pods will restart automatically, and the platform’s web interface (located at console.<CLUSTER_NAME_TEMPLATE>) will become available and will reflect the changes made to the DexProvider resource.
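
Instead of looking up the exact pod name, logs can also be streamed from all dex pods at once by label. The example below assumes the dex pods carry the app=dex label, following the usual labeling convention of Deckhouse components:

d8 k -n d8-user-authn logs -f -l app=dex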

Kubernetes and Scheduling

What to do when the API server is overloaded?

The following signs may indicate problems with API server load and memory consumption:

  • kubectl (d8 k) commands are executed slowly or do not complete at all.
  • Pods are recreated in the cluster for no apparent reason.

If these signs are present, perform the following actions:

  1. Check the resource consumption of API server pods. To do this, use the command:

    d8 k -n kube-system top po -l component=kube-apiserver
    

    Pay attention to CPU and MEMORY consumption (see the sorting example after this list).

    Example output:

    NAME                               CPU(cores)   MEMORY(bytes)
    kube-apiserver-sandbox1-master-0   251m         1476Mi
    
  2. Check the metrics in Grafana.

    To view the metrics, open the dashboard “Home” → “Dashboards” → “Kubernetes Cluster” → “Control Plane Status”. Review the graphs related to the API server (“Kube-apiserver CPU Usage”, “Kube-apiserver Memory Usage”, “Kube-apiserver latency”, etc.).

  3. Review the API server audit logs to identify the source of the load. A common cause of high memory consumption is a large number of requests to the API server.
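
In clusters with several API server replicas, sorting the output from step 1 helps to spot the busiest instance. The --sort-by flag below is a standard kubectl top option, not a DKP-specific one:

d8 k -n kube-system top po -l component=kube-apiserver --sort-by=memory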

How to check the Kubernetes version in use?

To check the Kubernetes version in use, run the command:

d8 k get nodes

Example output:

NAME                   STATUS   ROLES                  AGE    VERSION
frontend-0             Ready    frontend               118d   v1.31.9
master-0               Ready    control-plane,master   118d   v1.31.9
master-1               Ready    control-plane,master   118d   v1.31.9
master-2               Ready    control-plane,master   118d   v1.31.9
system-0               Ready    system                 118d   v1.31.9
system-1               Ready    system                 118d   v1.31.9
worker-0               Ready    worker                 37d    v1.31.9
worker-1               Ready    worker                 19d    v1.31.9
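
The client and API server versions can also be queried directly (a standard kubectl subcommand wrapped by d8 k):

d8 k version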

What to do if you encounter problems updating Kubernetes components on cluster nodes, synchronizing nodes, or applying NodeGroup Configuration?

If Kubernetes components are not updated on a cluster node, the NodeGroup configuration is not applied, or not all nodes in a NodeGroup are synchronized (have the UPTODATE status), perform the following steps:

  1. Check the bashible logs on the node where the problems are occurring. The bashible mechanism keeps cluster nodes up to date: it runs as a service on the cluster nodes and is started at regular intervals by the bashible.timer timer. Each run synchronizes the node configuration scripts and executes them if necessary.

    To check bashible logs, use the command:

    journalctl -u bashible
    

    If the response contains the message Configuration is in sync, nothing to do, the node is synchronized and there are no problems. The absence of this message or the presence of errors indicates a problem.

  2. Check the synchronization status of cluster nodes using the command:

    d8 k get ng
    

    The number of nodes in the UPTODATE state must match the total number of nodes in each group.

    Example output:

    NAME       TYPE     READY   NODES   UPTODATE   INSTANCES   DESIRED   MIN   MAX   STANDBY   STATUS   AGE    SYNCED
    frontend   Static   1       1       1                                                               118d   True
    master     Static   3       3       3                                                               118d   True
    system     Static   2       2       2                                                               118d   True
    worker     Static   2       2       2                                                               118d   True
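
If a group is not fully synchronized, details about the cause are usually reported in the NodeGroup status. You can inspect it with describe; the group name below (worker) is only an example:

d8 k describe ng worker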
    

What to do if there are problems adding a node to the cluster via Cluster API Provider Static?

If, when adding a node to the cluster via Cluster API Provider Static (CAPS), it remains in Pending or Bootstrapping status, perform the following steps:

  1. Verify that the access keys specified in the SSHCredentials resource are correct. Ensure that the username and SSH key specified in SSHCredentials are correct.

  2. On the node where the problem occurred, check that the public key corresponding to the private key from SSHCredentials is present in authorized_keys. Example command for checking:

    cat ~/.ssh/authorized_keys
    
  3. Check the NodeGroup that should include the node being added and make sure that its maximum number of nodes is not exceeded.

  4. Check the status of the bashible.service on the node that caused the problem:

    systemctl status bashible.service
    

    The service must have the status active (running). If it has the status inactive or failed, it has not started, which indicates a problem with the node configuration process.

  5. If the steps above did not resolve the issue, remove the problematic node and its StaticInstance resource from the cluster so that the system will attempt to recreate them. To do this:

    • Get a list of nodes and locate the problematic one:

      d8 k get nodes
      
    • Find the corresponding StaticInstance resource:

      kubectl get staticinstances -n <namespace-name>
      
    • Remove the problematic node:

      kubectl delete node <node-name>
      
    • Remove the corresponding StaticInstance resource:

      kubectl delete staticinstances -n <namespace-name> <static-instance-name>
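
After the node and its StaticInstance resource have been removed, you can watch the bootstrap being retried. The -w flag is a standard kubectl watch option; the namespace placeholder is kept as in the commands above:

kubectl get staticinstances -n <namespace-name> -w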
      

How to change the instance type for nodes with the CloudPermanent type?

To change the instance type for nodes with the CloudPermanent type, follow these steps:

  1. Make a backup of etcd and the /etc/kubernetes directory.
  2. Transfer the archive to a server outside the cluster (e.g., on a local machine).
  3. Ensure there are no alerts in the cluster that can prevent the update of the master nodes.
  4. Make sure that Deckhouse queue is empty. To view the status of all Deckhouse job queues, run the following command:

    d8 s queue list
    

    Example output (queues are empty):

    Summary:
    - 'main' queue: empty.
    - 88 other queues (0 active, 88 empty): 0 tasks.
    - no tasks to handle.
    
  5. Run the appropriate edition and version of the Deckhouse installer container on the local machine (change the container registry address if necessary):

    DH_VERSION=$(d8 k -n d8-system get deployment deckhouse -o jsonpath='{.metadata.annotations.core\.deckhouse\.io\/version}') 
    DH_EDITION=$(d8 k -n d8-system get deployment deckhouse -o jsonpath='{.metadata.annotations.core\.deckhouse\.io\/edition}' | tr '[:upper:]' '[:lower:]' ) 
    docker run --pull=always -it -v "$HOME/.ssh/:/tmp/.ssh/" \
      registry.deckhouse.ru/deckhouse/${DH_EDITION}/install:${DH_VERSION} bash
    
  6. In the installer container, run the following command to check the state before working:

    dhctl terraform check --ssh-agent-private-keys=/tmp/.ssh/<SSH_KEY_FILENAME> --ssh-user=<USERNAME> \
      --ssh-host <MASTER-NODE-0-HOST> --ssh-host <MASTER-NODE-1-HOST> --ssh-host <MASTER-NODE-2-HOST>
    

    The command output should indicate that Terraform found no inconsistencies and no changes are required.

  7. In the installer container, run the command to edit the cluster configuration (specify the addresses of all master nodes in the --ssh-host parameter):

    dhctl config edit provider-cluster-configuration --ssh-agent-private-keys=/tmp/.ssh/<SSH_KEY_FILENAME> --ssh-user=<USERNAME> \
      --ssh-host <MASTER-NODE-0-HOST> --ssh-host <MASTER-NODE-1-HOST> --ssh-host <MASTER-NODE-2-HOST>
    
  8. Edit the instanceClass parameter of the desired node group by changing the instance type and save the changes. Example settings for the masterNodeGroup of the Yandex Cloud provider:

    masterNodeGroup:
      replicas: 3  # required number of master nodes
      instanceClass:
        cores: 4      # change the number of CPUs
        memory: 8192  # change the memory size (in MB)
        # other instance parameters...
        externalIPAddresses:
        - "Auto"      # for each master node
        - "Auto"
        - "Auto"
    
  9. In the installer container, run the following command to upgrade the nodes:

    Read carefully what converge is going to do when it asks for approval.

    When the command is executed, the nodes will be replaced with new ones one by one, in reverse order (2, 1, 0), with confirmation requested for each node.

    dhctl converge --ssh-agent-private-keys=/tmp/.ssh/<SSH_KEY_FILENAME> --ssh-user=<USERNAME> \
      --ssh-host <MASTER-NODE-0-HOST> --ssh-host <MASTER-NODE-1-HOST> --ssh-host <MASTER-NODE-2-HOST>
    

    Repeat the steps below (steps 10-12) for each master node one by one, starting with the node with the highest number (suffix 2) and ending with the node with the lowest number (suffix 0).

  10. On the newly created node, check the systemd-unit log for the bashible.service. Wait until the node configuration is complete (you will see the message Configuration is in sync, nothing to do in the log):

    journalctl -fu bashible.service
    
  11. Make sure the node is listed as an etcd cluster member:

    for pod in $(d8 k -n kube-system get pod -l component=etcd,tier=control-plane -o name); do
      d8 k -n kube-system exec "$pod" -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt \
      --cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key \
      --endpoints https://127.0.0.1:2379/ member list -w table
      if [ $? -eq 0 ]; then
        break
      fi
    done
    
  12. Make sure control-plane-manager is running on the node:

    d8 k -n kube-system wait pod --timeout=10m --for=condition=ContainersReady \
      -l app=d8-control-plane-manager --field-selector spec.nodeName=<MASTER-NODE-N-NAME>
    
  13. Proceed to update the next node (repeat the steps above).
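
After the last master node has been replaced, it is worth verifying that all control-plane nodes are Ready and report the expected Kubernetes version. The example below assumes the master nodes carry the standard node-role.kubernetes.io/control-plane label, as indicated by the ROLES column earlier:

d8 k get nodes -l node-role.kubernetes.io/control-plane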