The page contains frequently asked questions about configuring and using the Deckhouse Kubernetes Platform.

Common

How can I apply an update for a specific module immediately?

To apply an update for a specific module immediately, set the modules.deckhouse.io/apply-now: "true" annotation on the corresponding ModuleRelease resource.

This annotation applies the release immediately without waiting for the update window. The requirements from spec.requirements still apply. If they are not met, the release will not be applied.

Example of setting the annotation for the console module:

d8 k annotate mr console-v1.43.3 modules.deckhouse.io/apply-now="true"

This can also be done using the d8 CLI for convenience (module names and versions are autocompleted):

d8 system module apply-now console v1.43.3

Example resource with the annotation set:

apiVersion: deckhouse.io/v1alpha1
kind: ModuleRelease
metadata:
  name: console-v1.43.3
  annotations:
    modules.deckhouse.io/apply-now: "true"
...
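After setting the annotation, you can verify that the release has been applied by checking its phase (reusing the hypothetical release name from the example above):

```shell
# The PHASE column should change to Deployed once the release is applied.
d8 k get mr console-v1.43.3
```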

How can I apply a DKP update immediately, bypassing update windows, canary releases, and manual update mode?

To apply a Deckhouse Kubernetes Platform (DKP) update immediately, add the annotation release.deckhouse.io/apply-now: "true" to the corresponding DeckhouseRelease resource.

This will bypass update windows, canary release settings, and the manual cluster update mode. The update will be applied immediately after the annotation is set.

Example command to set the annotation and skip update windows for version v1.56.2:

d8 k annotate deckhousereleases v1.56.2 release.deckhouse.io/apply-now="true"

Example of a resource with the annotation set:

apiVersion: deckhouse.io/v1alpha1
kind: DeckhouseRelease
metadata:
  name: v1.56.2
  annotations:
    release.deckhouse.io/apply-now: "true"
...

What to do if, with VPN enabled, the container with the installer cannot access the network?

If a VPN is installed on the computer from which the cluster bootstrap is performed, the installer container (including the graphical installer) may be unable to access the network. As a result, for example, the graphical installer may not open in the browser.

The problem can be solved in one of the following ways:

  • Disable VPN on your computer and restart the container with the installer.
  • If you cannot disable the VPN (for example, if the cluster bootstrap is running on a VPN network), use the --network host parameter when starting the container with the installer (for Docker Desktop on macOS, the parameter is available starting with version 4.34.0). This will allow the container to access the network.

    Example of launching a container with an installer with VPN enabled, using the --network host parameter:

    docker run --network host --pull=always -it -v "$PWD/config.yml:/config.yml" -v "$HOME/.ssh/:/tmp/.ssh/" -v "$PWD/dhctl-tmp:/tmp/dhctl" registry.deckhouse.ru/deckhouse/ce/install:early-access bash
    

    Example of running the graphical installer with VPN enabled, using the --network host parameter:

    docker run --network host --rm --pull always -v $HOME/.d8installer:$HOME/.d8installer -v /var/run/docker.sock:/var/run/docker.sock -p 127.0.0.1:8080:8080 registry.deckhouse.ru/deckhouse/installer:latest -r $HOME/.d8installer
    

How do I change the release channel for a module?

A module can be built-in to DKP or connected from a module source (defined using ModuleSource). Built-in modules have a common release cycle with DKP and are updated together with DKP. The release channel of a built-in module always matches the DKP release channel. A module connected from a source has its own release cycle, which is independent of the DKP release cycle. The release channel of a module connected from a source can be changed.

Below is the process of changing the release channel for a module connected from a source.

By default, the release channel for modules is inherited from the DKP release channel (specified in the releaseChannel parameter of the deckhouse ModuleConfig). For more information about release channels, see the Release Channels section.

For modules from a source, the release channel is specified using ModuleUpdatePolicy, which is then linked to the module via the updatePolicy parameter in ModuleConfig.

To change the release channel for a module from a source, follow these steps:

  1. Define the module update policy.

    Create a ModuleUpdatePolicy where you specify the release channel in the releaseChannel parameter.

    Example ModuleUpdatePolicy:

    apiVersion: deckhouse.io/v1alpha2
    kind: ModuleUpdatePolicy
    metadata:
      name: my-module-policy
    spec:
      releaseChannel: Alpha
      # If necessary, specify the update mode and update windows.
      # update:
      #   mode: AutoPatch
      #   windows: []
    

    Ensure that the policy has been created:

    d8 k get mup my-module-policy
    

    Output example:

    NAME               RELEASE CHANNEL   UPDATE MODE
    my-module-policy   Alpha             AutoPatch
    
  2. Link the update policy to the module.

    Specify the name of the created update policy in the updatePolicy parameter of the corresponding module’s ModuleConfig.

    To edit the ModuleConfig, use the command (specify the module name):

    d8 k edit mc my-module
    

    Example ModuleConfig:

    apiVersion: deckhouse.io/v1alpha1
    kind: ModuleConfig
    metadata:
      name: my-module
    spec:
      enabled: true
      # ModuleUpdatePolicy name
      updatePolicy: my-module-policy
    

When you change the module’s release channel, its version will change according to the configured update mode.

To view the current release channel of the module and other information about the module’s state in the cluster, use the corresponding Module object.

Example command to get information about the module:

d8 k get module my-module -o yaml

The update policy used will be specified in the properties.updatePolicy field, and the current release channel in the properties.releaseChannel field. Example output:

apiVersion: deckhouse.io/v1alpha1
kind: Module
metadata:
  name: my-module
  # ...
properties:
  # ...
  releaseChannel: Alpha # Module release channel.
  updatePolicy: my-module-policy # Module update policy.
  version: v1.16.10  # Module version.
  # ...

What should I do if the module image did not download and the module did not reinstall?

In some cases, there may be a problem with automatically downloading the image and reinstalling the module. These cases include:

  • Damage to the file system or other problems that have rendered the module image invalid.
  • Switching to a different registry.
  • Switching from one DKP edition to another.

In such cases, the module may still be in the Ready state, while the error appears in the module’s pods. To find the problematic pod, use the command:

d8 k -n d8-<module-name> get pods

A problematic pod will have a status other than Running.
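To show only problematic pods, you can filter the output by status. In a live cluster this would be piped from `d8 k -n d8-<module-name> get pods --no-headers`; the sketch below runs the same awk filter on made-up sample lines (note that a pod can also be Running yet not Ready, which this filter will not catch):

```shell
# Column 3 of 'get pods' output is STATUS; print the names of pods not in Running.
printf '%s\n' \
  'backend-567d6c6cdc-g5qgt   1/1   Running            0   2d2h' \
  'frontend-7c8b567759-h8jdf  0/1   ImagePullBackOff   0   5m' |
awk '$3 != "Running" { print $1 }'
# → frontend-7c8b567759-h8jdf
```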

To view information about a pod, use the command:

d8 k -n d8-<module-name> describe pod <pod-name>

Example of an error message in the pod when there is a problem with downloading the image and reinstalling the module:

Failed to pull image "registry.deckhouse.ru/deckhouse/ce/modules/console@sha256:a12b4f8de1d997005155d0ba0a7c968a015dd8d18bb5d54645ddb040ddab1ef4": rpc error: code = NotFound desc = failed to pull and unpack image "registry.deckhouse.ru/deckhouse/ce/modules/console@sha256:a12b4f8de1d997005155d0ba0a7c968a015dd8d18bb5d54645ddb040ddab1ef4": failed to resolve reference ...

To download the image and reinstall the module that caused the problem:

  1. Get a list of module releases:

    d8 k get mr -l module=my-module
    

    Output example:

    NAME               PHASE        UPDATE POLICY   TRANSITIONTIME   MESSAGE
    my-module-v3.7.4   Superseded                   5d23h
    my-module-v3.7.5   Deployed                     5d23h
    

    Find the module release deployed in the cluster in the list (it should have the status Deployed).

  2. Add the annotation modules.deckhouse.io/reinstall=true to the deployed release:

    d8 k annotate mr my-module-v3.7.5 modules.deckhouse.io/reinstall=true
    

After adding the annotation, the module image is re-downloaded from the registry, the module is validated with the current settings from ModuleConfig, and installed in the cluster. After successful reinstallation, the annotation is automatically removed from ModuleRelease.

To verify that the module has been successfully reinstalled and all module pods are working, use the command:

d8 k -n d8-<module-name> get pods

All pods in the module must have the status Running. Example:

NAME                                READY   STATUS    RESTARTS   AGE
backend-567d6c6cdc-g5qgt            1/1     Running   0          2d2h
frontend-7c8b567759-h8jdf           1/1     Running   0          2d2h
observability-gw-86cf75f5d6-7xljh   1/1     Running   0          2d2h

Cluster and Infrastructure

What to do if you encounter problems updating Kubernetes components on cluster nodes, synchronizing nodes, or applying NodeGroup Configuration?

If Kubernetes components are not being updated on cluster nodes, the NodeGroup configuration is not applied, or not all NodeGroup nodes are synchronized (have the UPTODATE status), perform the following steps:

  1. Check the bashible logs on the node where the problems occur. The bashible mechanism keeps cluster nodes up to date: it runs as a service on the cluster nodes and is triggered at regular intervals by the bashible.timer timer. Each run synchronizes the configuration scripts and executes them if necessary.

    To check bashible logs, use the command:

    journalctl -u bashible
    

    If the response contains the message Configuration is in sync, nothing to do, the node is synchronized and there are no problems. The absence of this message or the presence of errors indicates a problem.

  2. Check the synchronization status of cluster nodes using the command:

    d8 k get ng
    

    The number of nodes in the UPTODATE state must match the total number of nodes in each group.

    Example output:

    NAME       TYPE     READY   NODES   UPTODATE   INSTANCES   DESIRED   MIN   MAX   STANDBY   STATUS   AGE    SYNCED
    frontend   Static   1       1       1                                                               118d   True
    master     Static   3       3       3                                                               118d   True
    system     Static   2       2       2                                                               118d   True
    worker     Static   2       2       2                                                               118d   True
    
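Groups where UPTODATE lags behind NODES can be spotted with a simple column filter. In a live cluster this would be piped from `d8 k get ng --no-headers`; the sketch below applies the same filter to sample lines:

```shell
# Column 4 is NODES, column 5 is UPTODATE; print groups where they differ.
printf '%s\n' \
  'frontend   Static   1   1   1' \
  'worker     Static   2   2   1' |
awk '$4 != $5 { print $1 " is not fully synced" }'
# → worker is not fully synced
```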

What to do if there are problems adding a node to the cluster via Cluster API Provider Static?

If, when adding a node to the cluster via Cluster API Provider Static (CAPS), it remains in Pending or Bootstrapping status, perform the following steps:

  1. Verify that the access keys specified in the SSHCredentials resource are correct. Ensure that the username and SSH key specified in SSHCredentials are correct.

  2. On the node where the problem occurred, check that the public key corresponding to the private key from SSHCredentials is present in authorized_keys. Example command for checking:

    cat ~/.ssh/authorized_keys
    
  3. Check the number of nodes specified in NodeGroup, which should include the node being added. Make sure that the maximum number of nodes is not exceeded.

  4. Check the status of the bashible.service on the node that caused the problem:

    systemctl status bashible.service
    

    It must have the status active (running). If the service has the status inactive or failed, the service has not started. This indicates a problem with the configuration process.

  5. If the steps above did not resolve the issue, remove the problematic node and its StaticInstance resource from the cluster so that the system will attempt to recreate them. To do this:

    • Get a list of nodes and locate the problematic one:

      d8 k get nodes
      
    • Find the corresponding StaticInstance resource:

      d8 k get staticinstances -n <namespace-name>
      
    • Remove the problematic node:

      d8 k delete node <node-name>
      
    • Remove the corresponding StaticInstance resource:

      d8 k delete staticinstances -n <namespace-name> <static-instance-name>
      
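For step 3 above, the configured node count of a CAPS-managed NodeGroup can be read directly (a sketch; it assumes the group is defined via the spec.staticInstances section):

```shell
# Print the desired number of CAPS-managed nodes for a NodeGroup
# (replace <node-group-name>).
d8 k get ng <node-group-name> -o jsonpath='{.spec.staticInstances.count}'
```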

How to change the instance type for nodes with the CloudPermanent type?

To change the instance type for nodes with the CloudPermanent type, follow these steps:

  1. Make a backup of etcd and the /etc/kubernetes directory.
  2. Transfer the archive to a server outside the cluster (e.g., on a local machine).
  3. Ensure there are no alerts in the cluster that can prevent the update of the master nodes.
  4. Make sure that Deckhouse queue is empty. To view the status of all Deckhouse job queues, run the following command:

    d8 s queue list
    

    Example output (queues are empty):

    Summary:
    - 'main' queue: empty.
    - 88 other queues (0 active, 88 empty): 0 tasks.
    - no tasks to handle.
    
  5. Run the appropriate edition and version of the Deckhouse installer container on the local machine (change the container registry address if necessary):

    DH_VERSION=$(d8 k -n d8-system get deployment deckhouse -o jsonpath='{.metadata.annotations.core\.deckhouse\.io\/version}') 
    DH_EDITION=$(d8 k -n d8-system get deployment deckhouse -o jsonpath='{.metadata.annotations.core\.deckhouse\.io\/edition}' | tr '[:upper:]' '[:lower:]' ) 
    docker run --pull=always -it -v "$HOME/.ssh/:/tmp/.ssh/" \
      registry.deckhouse.ru/deckhouse/${DH_EDITION}/install:${DH_VERSION} bash
    
  6. In the installer container, run the following command to check the state before working:

    dhctl terraform check --ssh-agent-private-keys=/tmp/.ssh/<SSH_KEY_FILENAME> --ssh-user=<USERNAME> \
      --ssh-host <MASTER-NODE-0-HOST> --ssh-host <MASTER-NODE-1-HOST> --ssh-host <MASTER-NODE-2-HOST>
    

    The command output should indicate that Terraform found no inconsistencies and no changes are required.

  7. In the installer container, run the command to edit the cluster configuration (specify the addresses of all master nodes in the --ssh-host parameter):

    dhctl config edit provider-cluster-configuration --ssh-agent-private-keys=/tmp/.ssh/<SSH_KEY_FILENAME> --ssh-user=<USERNAME> \
      --ssh-host <MASTER-NODE-0-HOST> --ssh-host <MASTER-NODE-1-HOST> --ssh-host <MASTER-NODE-2-HOST>
    
  8. Edit the instanceClass parameter of the desired node group by changing the instance type and save the changes. Example settings for the masterNodeGroup of the Yandex Cloud provider:

    masterNodeGroup:
      replicas: 3     # Required number of master nodes.
      instanceClass:
        cores: 4      # Change the number of CPUs.
        memory: 8192  # Change the memory size (in MB).
        # Other instance parameters...
        externalIPAddresses:
        - "Auto"      # For each master node.
        - "Auto"
        - "Auto"
    
  9. In the installer container, run the following command to upgrade the nodes:

    Read carefully what converge is going to do when it asks for approval.

    When the command is executed, the nodes will be replaced with new ones, with confirmation for each node. The replacement is performed one node at a time, in reverse order (2, 1, 0).

    dhctl converge --ssh-agent-private-keys=/tmp/.ssh/<SSH_KEY_FILENAME> --ssh-user=<USERNAME> \
      --ssh-host <MASTER-NODE-0-HOST> --ssh-host <MASTER-NODE-1-HOST> --ssh-host <MASTER-NODE-2-HOST>
    

    Repeat steps 10–12 below for each master node, one at a time, starting with the node with the highest number (suffix 2) and ending with the node with the lowest number (suffix 0).

  10. On the newly created node, check the systemd-unit log for the bashible.service. Wait until the node configuration is complete (you will see a message nothing to do in the log):

    journalctl -fu bashible.service
    
  11. Make sure the node is listed as an etcd cluster member:

    for pod in $(d8 k -n kube-system get pod -l component=etcd,tier=control-plane -o name); do
      d8 k -n kube-system exec "$pod" -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt \
      --cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key \
      --endpoints https://127.0.0.1:2379/ member list -w table
      if [ $? -eq 0 ]; then
        break
      fi
    done
    
  12. Make sure control-plane-manager is running on the node:

    d8 k -n kube-system wait pod --timeout=10m --for=condition=ContainersReady \
      -l app=d8-control-plane-manager --field-selector spec.nodeName=<MASTER-NODE-N-NAME>
    
  13. Proceed to update the next node (repeat the steps above).

What to do if the resources generated by an object are not created after adding the object?

If the required resources do not appear after creating an object in the system (for example, dexAuthenticator), follow these steps:

  1. Check if there are any critical alerts in the cluster that may block the creation of the required objects. To do this, use the command:

    d8 k get clusteralerts.deckhouse.io
    

    Example output:

    NAME               ALERT                                           SEVERITY   AGE   LAST RECEIVED   STATUS
    012f602592aa7a91   K8SSchedulerTargetDown                          3          16h   54s             firing
    0836dc893d5ecc65   KubernetesDeploymentReplicasUnavailable         5          15h   62s             firing
    08742f87d62d0063   NTPDaemonOnNodeDoesNotSynchronizeTime           5          16h   46s             firing
    172cfd38d2f7fd19   D8DeckhouseQueueIsHung                          7          12h   66s             firing
    1c5705daf731f5cf   D8StrongholdNoActiveNodes                       3          16h   55s             firing
    1d2c2f7d69f69f4b   D8DeckhouseIsNotOnReleaseChannel                9          12h   53s             firing
    205a551243d795f3   D8LogShipperAgentNotScheduledInCluster          7          15h   63s             firing
    2e34039aa7a3018e   D8NodeIsNotUpdating                             9          12h   47s             firing
    31baf9a70d657275   D8StrongholdClusterNotHealthy                   7          16h   55s             firing
    

    For more information about alerts, see The list of alerts section.

  2. Check the Deckhouse task queue:

    d8 s queue list
    

    Example output (queues are empty):

    Summary:
    - 'main' queue: empty.
    - 88 other queues (0 active, 88 empty): 0 tasks.
    - no tasks to handle.
    

    If there are many unprocessed or long-running tasks in the queue, this may indicate problems.

  3. Analyze DKP logs and events:

    • To view logs in real time, use the command:

      d8 k -n d8-system logs -f -l app=deckhouse
      

      Example output:

      {"level":"info","logger":"addon-operator","msg":"ConvergeModules task for OperatorStartup in phase '', trigger is Operator-Startup","binding":"ConvergeModules","event.type":"OperatorStartup","queue":"main","task.flow":"start","task.id":"fde0eb3b-5c3e-4da6-a0d8-a52f8ae03428","time":"2025-11-26T08:29:33Z"}
      {"level":"warn","logger":"addon-operator.converge-modules","msg":"ConvergeModules: functional scheduler not finished","binding":"ConvergeModules","event.type":"OperatorStartup","queue":"main","task.id":"fde0eb3b-5c3e-4da6-a0d8-a52f8ae03428","time":"2025-11-26T08:29:33Z"}
      

      When analyzing logs, pay special attention to warnings (WARNING) and error messages (ERROR).

    • To view events, use the command:

      d8 k -n d8-system get events
      

      Example output:

      LAST SEEN   TYPE      REASON              OBJECT                                          MESSAGE
      11m         Warning   Unhealthy           pod/deckhouse-5886c9bd77-vgdbw                  Readiness probe failed: HTTP probe failed with statuscode: 500
      7m22s       Normal    SuccessfulDelete    replicaset/deckhouse-5886c9bd77                 Deleted pod: deckhouse-5886c9bd77-vgdbw
      7m20s       Normal    Scheduled           pod/deckhouse-6bc5c4494-fwx6z                   Successfully assigned d8-system/deckhouse-6bc5c4494-fwx6z to sandbox1-master-0
      7m20s       Normal    Pulling             pod/deckhouse-6bc5c4494-fwx6z                   Pulling image "dev-registry.deckhouse.io/sys/deckhouse-oss@sha256:17ac07634e17422df52720264cddec3916ed6985a77782dc8a24fe5352290e6e"
      

    When analyzing events, pay special attention to those with the Warning type.

Deckhouse

How can I check DKP versions available in different release channels?

For information about the current DKP versions across all release channels, visit releases.deckhouse.io.

How to check the job queue in Deckhouse?

How to check the status of all Deckhouse task queues?

To view the status of all Deckhouse job queues, run the following command:

d8 s queue list

Example output (queues are empty):

Summary:
- 'main' queue: empty.
- 88 other queues (0 active, 88 empty): 0 tasks.
- no tasks to handle.

How to view the status of the main task queue?

To view the status of the Deckhouse main task queue, run the following command:

d8 s queue main

Example output (38 tasks in the main queue):

Queue 'main': length 38, status: 'run first task'

Example output (the main queue is empty):

Queue 'main': length 0, status: 'waiting for task 0s'
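If you need the queue length in a script (for example, to wait until the queue is empty before maintenance), it can be extracted from the command output; a sketch against the sample line above:

```shell
# Extract the numeric length from the "d8 s queue main" output line.
echo "Queue 'main': length 38, status: 'run first task'" |
sed -E 's/.*length ([0-9]+).*/\1/'
# → 38
```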

What should I do if DKP is not receiving updates from the configured channel?

  • Ensure the correct release channel is configured.
  • Check that DNS resolution for the Deckhouse image registry is working correctly.

    Get and compare the IP addresses of registry.deckhouse.io from both a node and the deckhouse Pod. They must match.

    Example of obtaining an IP of registry.deckhouse.io from a node:

    getent ahosts registry.deckhouse.io
    

    Example output:

    185.193.90.38    STREAM registry.deckhouse.io
    185.193.90.38    DGRAM
    185.193.90.38    RAW
    

    Example of obtaining an IP of registry.deckhouse.io from the deckhouse Pod:

    d8 k -n d8-system exec -ti svc/deckhouse-leader -c deckhouse -- getent ahosts registry.deckhouse.io
    

    Example output:

    185.193.90.38    STREAM registry.deckhouse.io
    185.193.90.38    DGRAM  registry.deckhouse.io
    

    If the resulting IPs do not match, check the DNS settings on the node. Pay attention to the search domain list in /etc/resolv.conf, which affects name resolution in the deckhouse Pod. If the search parameter in /etc/resolv.conf specifies a domain with wildcard DNS resolution configured, this may lead to incorrect IP address resolution for the Deckhouse image registry (see the example below).

Example DNS settings that may cause issues resolving the DKP image registry IP address

Below is an example of how DNS settings may result in different resolution behavior on the node and in a Kubernetes Pod:

  • Example of /etc/resolv.conf on the node:

    nameserver 10.0.0.10
    search company.my
    

    On nodes, the default ndots setting is 1 (options ndots:1), while in Kubernetes Pods it is 5. This results in different resolution logic on the node and in the Pod for DNS names containing fewer than 5 dots.

  • The DNS zone company.my has a wildcard entry *.company.my that resolves to 10.0.0.100. This means any undefined DNS name in the company.my zone resolves to 10.0.0.100.

Taking into account the search parameter in /etc/resolv.conf, when accessing registry.deckhouse.io from a node, the system will attempt to resolve the IP address for registry.deckhouse.io (because it considers it fully qualified due to the default options ndots:1 setting).

However, when accessing registry.deckhouse.io from a Kubernetes Pod, considering the options ndots:5 setting used by default in Kubernetes and the search parameter, the system will first attempt to resolve the name registry.deckhouse.io.company.my. This name will resolve to the IP address 10.0.0.100 because, according to the company.my DNS zone’s wildcard configuration, *.company.my is resolved to 10.0.0.100. As a result, the Pod will fail to connect to the registry.deckhouse.io host and will be unable to download information about available Deckhouse updates.
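The resolution order described above can be sketched as a small shell function (an illustration of the glibc resolver's candidate ordering with a single search domain, not a real resolver):

```shell
# Print the order in which the resolver tries names, given ndots and one search domain.
# Names with fewer dots than ndots get the search suffix appended first.
candidates() {
  name=$1; ndots=$2; search=$3
  dots=$(printf '%s' "$name" | tr -cd '.' | wc -c)
  if [ "$dots" -lt "$ndots" ]; then
    printf '%s\n' "$name.$search" "$name"
  else
    printf '%s\n' "$name" "$name.$search"
  fi
}
candidates registry.deckhouse.io 1 company.my  # node: tried as-is first
candidates registry.deckhouse.io 5 company.my  # Pod: search suffix tried first
```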

How can I know when a new DKP version is available for the cluster?

As soon as a new version appears on the configured release channel:

  • The DeckhouseReleaseIsWaitingManualApproval alert will appear if the cluster is in manual update mode.
  • A new DeckhouseRelease custom resource will be created. To see the list of releases, run d8 k get deckhousereleases. If the new version is in Pending state, it means it hasn’t been installed yet. Possible reasons:
    • Manual update mode is enabled.
    • Automatic update mode is enabled and update windows are scheduled, but the window hasn’t started yet.
    • Automatic update mode is enabled and update windows are not scheduled, but the update is delayed by a random period to reduce load on the container image registry. The status.message field of the DeckhouseRelease resource will show a corresponding message.
    • The update.notification.minimalNotificationTime parameter is set, and the delay period hasn’t elapsed.

How can I receive information about upcoming updates in advance?

You can get information about upcoming minor DKP version updates on the release channel in advance, for example, by configuring update notifications via the update.notification parameters.

How can I tell that the update was successful?

If the DeckhouseUpdating alert is gone, the update has finished.

You can also check the status of DKP releases in the cluster with the following command:

d8 k get deckhouserelease

Example output:

NAME       PHASE        TRANSITIONTIME   MESSAGE
v1.46.8    Superseded   13d
v1.46.9    Superseded   11d
v1.47.0    Superseded   4h12m
v1.47.1    Deployed     4h12m

The Deployed status means the cluster has switched to the corresponding version, but it doesn’t guarantee that the update has been successful.

To ensure the update completed successfully, check the state of the deckhouse Pod with the following command:

d8 k -n d8-system get pods -l app=deckhouse

Example output:

NAME                         READY   STATUS    RESTARTS   AGE
deckhouse-7844b47bcd-qtbx9   1/1     Running   0          1d

  • If the Pod is Running and shows 1/1 under READY, it means the update completed successfully.
  • If the Pod is Running but shows 0/1 under READY, it means the update is still in progress. If it stays like this for more than 20–30 minutes, it may indicate a problem with DKP that requires investigation.
  • If the Pod is not Running, it may indicate a problem with DKP that requires investigation.

If something goes wrong

  • Check the logs using the following command:

    d8 k -n d8-system logs -f -l app=deckhouse | jq -Rr 'fromjson? | .msg'
    
  • Collect debug information and contact DKP technical support.
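The jq filter above parses each log line as JSON and prints only the msg field; fromjson? silently skips lines that are not valid JSON. A quick offline illustration with made-up sample lines:

```shell
printf '%s\n' \
  '{"level":"error","msg":"module hook failed"}' \
  'plain text line that is not JSON' |
jq -Rr 'fromjson? | .msg'
# → module hook failed
```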

How can I tell that an update is in progress?

During an update:

  • The DeckhouseUpdating alert is active.
  • The deckhouse Pod is not in the Ready state. If the Pod stays in a non-Ready state for a long time, it may indicate an issue with DKP that requires investigation.

What to do if there are problems updating DKP?

Deckhouse Kubernetes Platform update fails, and one or more Deckhouse pods are in a non-working state

If the Deckhouse Kubernetes Platform update fails and one or more Deckhouse pods in the d8-system namespace are in a non-working state, perform the following steps:

  1. Check the Deckhouse logs using the command:

    d8 k -n d8-system logs -f -l app=deckhouse | jq -Rr 'fromjson? | .msg'
    

    If there are any problems, information about them will be included in the output. When analyzing logs, pay special attention to warnings (WARNING) and error messages (ERROR).

  2. Check Deckhouse events using the command:

    d8 k -n d8-system describe po -l app=deckhouse | awk '
    /^Name:/ { 
        pod = $2; 
        print "=== " pod " ==="; 
        in_events = 0 
    }
    /Events:/ { 
        in_events = 1; 
        next 
    }
    in_events && /^$/ { 
        in_events = 0; 
        print "---" 
    }
    in_events && !/^Events:/ { 
        print $0 
    }
    ' | sed '/^---$/N;/^\n$/D'
    

    Pod events contain key information about problems (e.g., planning errors, image loading errors, etc.). When analyzing events, pay special attention to those with the Warning type.

    Example output:

    Type     Reason     Age                      From     Message
    ----     ------     ----                     ----     -------
    Warning  Unhealthy  4m44s (x1918 over 154m)  kubelet  Readiness probe failed: HTTP probe failed with statuscode: 500
    

DKP update is stuck in the Release is suspended status

The Release is suspended status indicates that the release has been postponed and is currently unavailable (not recommended) for installation. In this case, it is recommended to stay on the latest available release or on the one currently installed (it will have the Deployed status).

To view the list of releases, use the command:

d8 k get deckhousereleases.deckhouse.io

Example output:

NAME       PHASE        TRANSITIONTIME   MESSAGE
v1.69.13   Skipped      3h46m
v1.69.14   Skipped      3h46m
v1.69.15   Skipped      3h46m
v1.69.16   Superseded   160m
v1.70.12   Suspended    49d              Release is suspended
v1.70.13   Skipped      36d
v1.70.14   Skipped      34d
v1.70.15   Skipped      28d
v1.70.16   Skipped      19d
v1.70.17   Deployed     160m
v1.71.3    Suspended    14d              Release is suspended

IAM

What to do if there are problems applying DexProvider settings?

If you have changed the DexProvider settings in the user-authn module and one of the following problems occurs:

  • No changes are visible (settings are not applied).
  • When attempting to log in to the platform’s web interface using any authentication method, a 500 Internal Server Error occurs without a detailed description.

Perform the following steps:

  1. Check the status of the dex deployment:

    d8 k -n d8-user-authn get pod
    

    Example output:

    NAME                                    READY   STATUS    RESTARTS   AGE
    dex-5ddb779b7d-6pbhs                    2/2     Running   0          20h
    kubeconfig-generator-7c46977b9f-5kdmc   1/1     Running   0          20h
    

    If the module is functioning properly and the correct configuration is specified in DexProvider, all pods will have the status Running.

  2. Check the logs for the problematic pod:

    d8 k -n d8-user-authn logs dex-<pod-name>
    

    Based on the information from the logs, correct the configuration in the DexProvider resource and wait for the dex pods to restart. Within a few minutes, the pods will restart automatically, and the platform’s web interface (located at console.<CLUSTER_NAME_TEMPLATE>) will become available and will reflect the changes made to the DexProvider resource.

Kubernetes and Scheduling

What to do when the API server is overloaded?

The following signs may indicate problems with API server load and memory consumption:

  • kubectl (d8 k) commands respond slowly or not at all.
  • Pods are recreated in the cluster for no apparent reason.

If these signs are present, perform the following actions:

  1. Check the resource consumption of API server pods. To do this, use the command:

    d8 k -n kube-system top po -l component=kube-apiserver
    

    Pay attention to CPU and MEMORY consumption.

    Example output:

    NAME                               CPU(cores)   MEMORY(bytes)
    kube-apiserver-sandbox1-master-0   251m         1476Mi
    
  2. Check the metrics in Grafana.

    To view the metrics, open the dashboard “Home” → “Dashboards” → “Kubernetes Cluster” → “Control Plane Status”. Review the graphs related to the API server (“Kube-apiserver CPU Usage”, “Kube-apiserver Memory Usage”, “Kube-apiserver latency”, etc.).

  3. Review the API server audit logs to identify the source of high memory consumption. One common cause of high memory consumption is a large number of requests.

How to check the Kubernetes version in use?

To check the Kubernetes version in use, run the command:

d8 k get nodes

Example output:

NAME                   STATUS   ROLES                  AGE    VERSION
frontend-0             Ready    frontend               118d   v1.31.9
master-0               Ready    control-plane,master   118d   v1.31.9
master-1               Ready    control-plane,master   118d   v1.31.9
master-2               Ready    control-plane,master   118d   v1.31.9
system-0               Ready    system                 118d   v1.31.9
system-1               Ready    system                 118d   v1.31.9
worker-0               Ready    worker                 37d    v1.31.9
worker-1               Ready    worker                 19d    v1.31.9