The module lifecycle stage: General Availability
The module has requirements for installation

Triggers

Triggers (alerting rules) define the conditions for creating alerts when metric values deviate from expected thresholds.

Triggers are defined in rule groups as items in the spec.rules array. If a rule contains the alert field, it is treated as a trigger and is used to create alerts.
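
As an illustration, a project rule group with a single trigger might look like the sketch below. Only spec.rules and the alert field come from the description above. The apiVersion (borrowed from the other observability resources on this page), the Prometheus-style expr, for, and annotations fields, and all names and values are assumptions.

```yaml
# Hypothetical project rule group: the name, namespace, and expression are illustrative.
apiVersion: observability.deckhouse.io/v1alpha1   # assumed to match the other resources on this page
kind: ObservabilityMetricsRulesGroup
metadata:
  name: app-triggers            # hypothetical name
  namespace: my-project         # hypothetical project namespace
spec:
  rules:
    # The alert field marks this rule as a trigger.
    - alert: AppHighErrorRate   # hypothetical alert name
      # Assumed Prometheus-style fields: expr is the condition, for is how long it must hold.
      expr: 'sum(rate(http_requests_total{code=~"5.."}[5m])) > 1'
      for: 5m
      annotations:
        summary: The application returns too many 5xx responses
```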

Types of rule groups with triggers

Three types of rule groups can define triggers:

| Rule group type | Scope | Who has access |
|---|---|---|
| System rule groups (ClusterObservabilityMetricsRulesGroup) | Cluster level | DKP administrators |
| Project rule groups (ObservabilityMetricsRulesGroup) | Project (namespace) level | Users of the corresponding project |
| Standard propagated rule groups (ClusterObservabilityPropagatedMetricsRulesGroup) | Created at the cluster level and automatically available in all projects | Users of all projects |

Additional labels for alerts shipped with DKP

The observability module lets you apply additional labels to alerting rules shipped with DKP. To do this, use the ClusterObservabilityAlertAdditionalLabels resource.

Additional labels are applied only to alerts defined in ClusterObservabilityMetricsRulesGroup and ClusterObservabilityPropagatedMetricsRulesGroup resources that carry the heritage: deckhouse label.

To add labels to custom alerts created with ObservabilityMetricsRulesGroup or ClusterObservabilityMetricsRulesGroup, use the spec.rules.labels field.
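
For a custom alert, the labels go directly on the rule. The sketch below assumes the same apiVersion as the other resources on this page and a Prometheus-style expr field. The resource name, alert name, expression, and label values are hypothetical.

```yaml
# Hypothetical custom cluster-level rule group with labels set on the rule itself.
apiVersion: observability.deckhouse.io/v1alpha1   # assumed
kind: ClusterObservabilityMetricsRulesGroup
metadata:
  name: custom-node-triggers        # hypothetical name
spec:
  rules:
    - alert: CustomNodeDiskAlmostFull   # hypothetical alert name
      # Assumed Prometheus-style condition: less than 10% of filesystem space is free.
      expr: 'node_filesystem_avail_bytes / node_filesystem_size_bytes < 0.1'
      # spec.rules.labels: labels attached to alerts created by this rule.
      labels:
        team: platform              # illustrative label
        severity_level: "4"         # illustrative value, quoted because label values are strings
```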

Configuration examples:

  • Adding a label to all cluster alerts:

    apiVersion: observability.deckhouse.io/v1alpha1
    kind: ClusterObservabilityAlertAdditionalLabels
    metadata:
      name: all-alerts
    spec:
      alertSelector:
        matchExpressions:
          - key: alertname
            operator: Exists
      additionalLabels:
        example-label-name: example-label-value
  • Adding the severity=Info label to alerts with severity_level of 7, 8, and 9:

    apiVersion: observability.deckhouse.io/v1alpha1
    kind: ClusterObservabilityAlertAdditionalLabels
    metadata:
      name: severity-info-low-priority
    spec:
      alertSelector:
        matchExpressions:
          - key: severity_level
            operator: In
            values: ["7", "8", "9"]
      additionalLabels:
        severity: Info
  • Adding the team=custom label to alerts with the specified names:

    apiVersion: observability.deckhouse.io/v1alpha1
    kind: ClusterObservabilityAlertAdditionalLabels
    metadata:
      name: deckhouse-team-routing
    spec:
      alertSelector:
        matchExpressions:
          - key: alertname
            operator: In
            values:
              - D8CNIMisconfigured
              - D8DeckhouseIsNotOnReleaseChannel
      additionalLabels:
        team: custom

Trigger groups

Trigger groups are used to logically organize monitoring rules and manage their parameters at the rule set level.

Groups are convenient for combining triggers related to the same component, service, or project, as well as for applying a shared evaluation interval to all rules in the group.
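
The sketch below groups two triggers for one service under a shared evaluation interval. The spec.interval field name, the apiVersion, the Prometheus-style rule fields, and all names and metrics are assumptions for illustration. Check the resource reference for the exact schema.

```yaml
# Hypothetical group: two triggers for the same service, evaluated on one schedule.
apiVersion: observability.deckhouse.io/v1alpha1   # assumed
kind: ObservabilityMetricsRulesGroup
metadata:
  name: payments-api-triggers   # hypothetical name
  namespace: payments           # hypothetical project namespace
spec:
  interval: 1m                  # assumed field name for the shared evaluation interval
  rules:
    - alert: PaymentsApiDown
      expr: 'up{job="payments-api"} == 0'
      for: 2m
    - alert: PaymentsApiHighLatency
      # http_request_duration_seconds_bucket is a hypothetical application metric.
      expr: 'histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket{job="payments-api"}[5m]))) > 1'
      for: 10m
```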

Notifications

The observability module provides mechanisms for configuring alert notification delivery and controlling access to notification channels at both the cluster and project levels.

The following delivery channels are supported:

  • Email
  • Telegram
  • Slack
  • Webhook
  • ExpressMessenger

Connection parameters depend on the channel type and are configured through the corresponding Kubernetes resource.

Types of notification channels

Three types of notification channels are supported:

| Channel type | Scope | Who can create |
|---|---|---|
| System channels (ClusterObservabilityNotificationChannel) | Cluster level | DKP administrators |
| Project channels (ObservabilityNotificationChannel) | Project (namespace) level | Users of the corresponding project |
| Standard propagated channels (ClusterObservabilityPropagatedNotificationChannel) | Created at the cluster level and automatically available in all projects | DKP administrators |

Channel types:

  • System channels (ClusterObservabilityNotificationChannel): Used for cluster-level notification delivery. Available in the Deckhouse web UI under “System” → “System management” → “Monitoring” → “Notification settings” → “Notification channels”.

  • Project channels (ObservabilityNotificationChannel): Allow configuring notification delivery within a specific project. Available in the corresponding project in the Deckhouse web UI under “Monitoring” → “Notification settings” → “Notification channels”.

  • Standard propagated channels (ClusterObservabilityPropagatedNotificationChannel): Created at the cluster level and automatically become available in all projects for notification delivery. Use the ClusterObservabilityPropagatedNotificationChannel resource or the d8 CLI utility to create them.

Webhook channel HTTP client configuration

For channels with spec.type: Webhook, you can additionally configure outbound HTTP client settings in spec.webhook.httpConfig.

Three mutually exclusive authentication options are available:

  • basicAuth
  • authorization
  • oauth2

Required and optional fields

  • spec.webhook.url is required for a Webhook channel.
  • spec.webhook.httpConfig is optional.
  • If oauth2 is used, oauth2.clientId and oauth2.tokenUrl are required.

In addition to authentication, httpConfig supports:

  • transport options (enableHttp2, followRedirects)
  • proxy settings (proxyUrl, noProxy, proxyFromEnvironment, proxyConnectHeader)
  • TLS settings (tlsConfig)
  • custom headers (httpHeaders)
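
A sketch of a Webhook channel that sets transport and proxy options follows. The field names are the ones listed above. Their value types (booleans for enableHttp2 and followRedirects, a URL string for proxyUrl, a comma-separated string for noProxy) and the proxy address are assumptions.

```yaml
# Hypothetical channel that sends webhook requests through an HTTP proxy.
apiVersion: observability.deckhouse.io/v1alpha1
kind: ClusterObservabilityNotificationChannel
metadata:
  name: webhook-channel-via-proxy   # hypothetical name
spec:
  type: Webhook
  webhook:
    url: https://hooks.example/webhook
    httpConfig:
      enableHttp2: true                       # assumed boolean
      followRedirects: false                  # assumed boolean
      proxyUrl: http://proxy.example:3128     # hypothetical proxy address
      noProxy: localhost,127.0.0.1            # assumed comma-separated list of hosts
```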

For OAuth2, there are two configuration levels:

  • httpConfig.proxy* and httpConfig.tlsConfig apply to webhook delivery requests
  • httpConfig.oauth2.proxy* and httpConfig.oauth2.tlsConfig apply to OAuth2 token requests
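
The two levels can be combined in one channel, as in the sketch below. Both proxy addresses are hypothetical, and proxyUrl is assumed to be one of the proxy* fields available at each level.

```yaml
# Hypothetical channel with separate proxies for webhook delivery and OAuth2 token requests.
apiVersion: observability.deckhouse.io/v1alpha1
kind: ClusterObservabilityNotificationChannel
metadata:
  name: webhook-channel-oauth2-proxy   # hypothetical name
spec:
  type: Webhook
  webhook:
    url: https://hooks.example/webhook
    httpConfig:
      # Applies to webhook delivery requests.
      proxyUrl: http://egress-proxy.example:3128
      oauth2:
        clientId: my-client
        clientSecret: my-secret
        tokenUrl: https://idp.example/token
        # Applies only to requests sent to tokenUrl.
        proxyUrl: http://idp-proxy.example:3128
```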

When using file-based fields (passwordFile, credentialsFile, clientSecretFile), the referenced files must be available inside the Alertmanager container.
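
A file-based variant of the basicAuth configuration could look like the sketch below. It assumes that passwordFile sits next to username inside basicAuth. The file path is hypothetical, and how the file is delivered into the Alertmanager container is outside the scope of this example.

```yaml
# Hypothetical channel that reads the password from a file instead of an inline value.
apiVersion: observability.deckhouse.io/v1alpha1
kind: ClusterObservabilityNotificationChannel
metadata:
  name: webhook-channel-basic-auth-file   # hypothetical name
spec:
  type: Webhook
  webhook:
    url: https://hooks.example/webhook
    httpConfig:
      basicAuth:
        username: notify-user
        # Assumed placement of the field. The path must exist inside the Alertmanager container.
        passwordFile: /etc/alertmanager/secrets/webhook-password
```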

Example with basicAuth:

apiVersion: observability.deckhouse.io/v1alpha1
kind: ClusterObservabilityNotificationChannel
metadata:
  name: webhook-channel-basic-auth
spec:
  type: Webhook
  webhook:
    url: https://hooks.example/webhook
    httpConfig:
      basicAuth:
        username: notify-user
        password: notify-secret

Example with authorization:

apiVersion: observability.deckhouse.io/v1alpha1
kind: ClusterObservabilityNotificationChannel
metadata:
  name: webhook-channel-authorization
spec:
  type: Webhook
  webhook:
    url: https://hooks.example/webhook
    httpConfig:
      authorization:
        type: Bearer
        credentials: opaque-token

Example with oauth2:

apiVersion: observability.deckhouse.io/v1alpha1
kind: ClusterObservabilityNotificationChannel
metadata:
  name: webhook-channel-oauth2
spec:
  type: Webhook
  webhook:
    url: https://hooks.example/webhook
    httpConfig:
      oauth2:
        clientId: my-client
        clientSecret: my-secret
        tokenUrl: https://idp.example/token
        scopes:
          - read
          - write
        endpointParams:
          audience: myapp

Notification policies

Notification policies define which channel should be used to deliver notifications for an alert (or a group of alerts).

| Policy type | Description | How to configure |
|---|---|---|
| System notification policies | Used to configure delivery rules for system alerts. System policies can use only system notification channels. | Available in the Deckhouse web UI under “System” → “System management” → “Monitoring” → “Notification settings” → “Notification policies”. Use the ClusterObservabilityNotificationPolicy resource. |
| Project notification policies | Used to configure delivery rules for project alerts. Project policies can use project channels or standard propagated channels, but not system notification channels. | Available in the corresponding project under “Monitoring” → “Notification settings” → “Notification policies”. Use the ObservabilityNotificationPolicy resource. |

Notification silencing

When alerts are expected in advance (for example, during planned maintenance or testing), the observability module lets you suppress notification delivery for alerts that match specified conditions.

| Silence type | Description | How to configure |
|---|---|---|
| System notification silences | Used to configure silencing rules for system alert delivery. | Available in the Deckhouse web UI under “System” → “System management” → “Monitoring” → “Notification settings” → “Notification silencing”. Use the ClusterObservabilityNotificationSilence resource. |
| Project notification silences | Used to configure silencing rules for project alert delivery. | Available in the corresponding project under “Monitoring” → “Notification settings” → “Notification silencing”. Use the ObservabilityNotificationSilence resource. |

Alerts

The observability module separates access to cluster-level and project-level alerts and lets you view the list of active and resolved alerts.

Active alerts are grouped by severity level:

  • critical (S1–S3)
  • warning (S4–S6)
  • informational (info, S7–S9)

When viewing an alert, the user can see general information, labels, annotations, and a graph.

Types of alerts

Two types of alerts are supported:

| Alert type | Scope | Who has access |
|---|---|---|
| System alerts (ClusterObservabilityAlerts) | Cluster level | DKP administrators |
| Project alerts (ObservabilityAlerts) | Project (namespace) level | Users of the corresponding project |

Alert types:

  • System alerts (ClusterObservabilityAlerts): Relate to DKP cluster components. The full list of active and resolved system alerts is available in the Deckhouse web UI under “System” → “System management” → “Monitoring” → “Active alerts”.

  • Project alerts (ObservabilityAlerts): Relate to resources of a specific project (namespace). The full list of active and resolved project alerts is available in the Deckhouse web UI in the corresponding project under “Monitoring” → “Active alerts”.

DeadMansSwitch and PrometheusUnavailable alerts

DeadMansSwitch

DeadMansSwitch is a service alert that fires continuously, confirming the normal operation of Prometheus and the entire alert delivery pipeline.

If DeadMansSwitch stops arriving, the PrometheusUnavailable alert starts firing.

By default, the DeadMansSwitch alert is sent to all configured notification channels, unless label-based filtering is configured in notification policies.

To avoid cluttering the alert list, DeadMansSwitch is hidden from the output of the d8 k get clusterobservabilityalerts (list/watch) command. To retrieve it directly, use the following command:

d8 k get clusterobservabilityalert deadmansswitch

Disabling this alert is not recommended, but if necessary it can be disabled manually using the deadMansSwitch.enabled parameter.

If DeadMansSwitch is disabled manually, the PrometheusUnavailable alert is not created either.
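
If the parameter is a setting of the observability module, disabling the alert might look like the sketch below. The placement of deadMansSwitch.enabled under spec.settings of the module's ModuleConfig and the settings version number are assumptions. Verify both against the module's configuration reference before applying.

```yaml
# Hypothetical module configuration that turns DeadMansSwitch off (not recommended).
apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: observability
spec:
  enabled: true
  version: 1            # assumed settings schema version
  settings:
    deadMansSwitch:
      enabled: false    # assumed location of the deadMansSwitch.enabled parameter
```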

PrometheusUnavailable

PrometheusUnavailable (formerly MissingDeadMansSwitch) is an alert that fires if DeadMansSwitch is missing for more than 2 minutes.

This indicates a problem in the alert delivery pipeline. Possible reasons include:

  • Prometheus is unavailable.
  • Communication between Prometheus and Alertmanager is broken.
  • Another issue prevents alert delivery.

The PrometheusUnavailable alert is a system alert and is displayed both in the Deckhouse web UI and in the output of the d8 k get clusterobservabilityalerts command.