The module lifecycle stage: Preview
The module has requirements for installation

Triggers

Triggers (alerting rules) define the conditions for creating alerts when metric values deviate from expected thresholds.

Triggers are defined in rule groups as items in the spec.rules array. If a rule contains the alert field, it is treated as a trigger and is used to create alerts.
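The rule above (a rule with an alert field is a trigger) can be illustrated with a minimal Python sketch. It treats a rule group as a plain dictionary. Only the kind, the spec.rules array, and the alert field come from the text. The record and expr fields, the rule names, and the metric names are illustrative assumptions, and the helper split_rules is hypothetical, not part of the module.

```python
from typing import Any


def split_rules(group: dict[str, Any]) -> tuple[list[dict], list[dict]]:
    """Split spec.rules into triggers (rules with an 'alert' field) and other rules."""
    rules = group.get("spec", {}).get("rules", [])
    triggers = [rule for rule in rules if "alert" in rule]
    others = [rule for rule in rules if "alert" not in rule]
    return triggers, others


# Hypothetical rule group. Field names other than 'kind', 'spec.rules',
# and 'alert' are assumptions made for illustration only.
example_group = {
    "kind": "ObservabilityMetricsRulesGroup",
    "spec": {
        "rules": [
            {"alert": "HighErrorRate", "expr": "rate(http_errors_total[5m]) > 0.05"},
            {"record": "job:http_errors:rate5m", "expr": "rate(http_errors_total[5m])"},
        ]
    },
}

triggers, others = split_rules(example_group)
print([rule["alert"] for rule in triggers])
```

In this sketch only the first rule is a trigger; the second rule has no alert field and therefore does not create alerts.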

Types of rule groups with triggers

Three types of rule groups can define triggers:

Rule group type | Scope | Who has access
System rule groups (ClusterObservabilityMetricsRulesGroup) | Cluster level | DKP administrators
Project rule groups (ObservabilityMetricsRulesGroup) | Project (namespace) level | Users of the corresponding project
Standard propagated rule groups (ClusterObservabilityPropagatedMetricsRulesGroup) | Created at the cluster level and automatically available in all projects | Users of all projects

Trigger groups

Trigger groups are used to logically organize monitoring rules and manage their parameters at the rule set level.

Groups are convenient for combining triggers related to the same component, service, or project, as well as for applying a shared evaluation interval to all rules in the group.

Notifications

The observability module provides mechanisms for configuring alert notification delivery and controlling access to notification channels at both the cluster and project levels.

The following delivery channels are supported:

  • Email
  • Telegram
  • Slack
  • Webhook

Connection parameters depend on the channel type and are configured through the corresponding Kubernetes resource.

Types of notification channels

Three types of notification channels are supported:

Channel type | Scope | Who can create
System channels (ClusterObservabilityNotificationChannel) | Cluster level | DKP administrators
Project channels (ObservabilityNotificationChannel) | Project (namespace) level | Users of the corresponding project
Standard propagated channels (ClusterObservabilityPropagatedNotificationChannel) | Created at the cluster level and automatically available in all projects | DKP administrators

Channel types:

  • System channels (ClusterObservabilityNotificationChannel): Used for cluster-level notification delivery. Available in the Deckhouse web UI under “System” → “System management” → “Monitoring” → “Notification settings” → “Notification channels”.

  • Project channels (ObservabilityNotificationChannel): Allow configuring notification delivery within a specific project. Available in the corresponding project in the Deckhouse web UI under “Monitoring” → “Notification settings” → “Notification channels”.

  • Standard propagated channels (ClusterObservabilityPropagatedNotificationChannel): Created at the cluster level and automatically become available in all projects for notification delivery. Use the ClusterObservabilityPropagatedNotificationChannel resource or the d8 CLI utility to create them.

Notification policies

Notification policies define which channel should be used to deliver notifications for an alert (or a group of alerts).

Policy type | Description | How to configure
System notification policies | Used to configure delivery rules for system alerts. System policies can use only system notification channels. | Available in the Deckhouse web UI under “System” → “System management” → “Monitoring” → “Notification settings” → “Notification policies”. Use the ClusterObservabilityNotificationPolicy resource.
Project notification policies | Used to configure delivery rules for project alerts. Project policies can use project channels or standard propagated channels, but not system notification channels. | Available in the corresponding project under “Monitoring” → “Notification settings” → “Notification policies”. Use the ObservabilityNotificationPolicy resource.
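The channel restrictions in the table above can be summarized as a small lookup. This is a sketch of the stated rule only, not the module's own validation logic. The resource kind names are taken from this page, the helper can_use is hypothetical, and the sketch assumes that the channels a project policy may use besides project channels are the standard propagated ones.

```python
# Which channel kinds each policy kind may reference, as described in the table.
ALLOWED_CHANNEL_KINDS = {
    "ClusterObservabilityNotificationPolicy": {
        "ClusterObservabilityNotificationChannel",
    },
    "ObservabilityNotificationPolicy": {
        "ObservabilityNotificationChannel",
        "ClusterObservabilityPropagatedNotificationChannel",
    },
}


def can_use(policy_kind: str, channel_kind: str) -> bool:
    """Return True if a policy of policy_kind may deliver through channel_kind."""
    return channel_kind in ALLOWED_CHANNEL_KINDS.get(policy_kind, set())


print(can_use("ObservabilityNotificationPolicy",
              "ClusterObservabilityNotificationChannel"))
```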

Notification silencing

When alerts are expected in advance (for example, during planned maintenance or testing), the observability module can suppress notification delivery for alerts that match specified conditions.

Silence type | Description | How to configure
System notification silences | Used to configure silencing rules for system alert delivery. | Available in the Deckhouse web UI under “System” → “System management” → “Monitoring” → “Notification settings” → “Notification silencing”. Use the ClusterObservabilityNotificationSilence resource.
Project notification silences | Used to configure silencing rules for project alert delivery. | Available in the corresponding project under “Monitoring” → “Notification settings” → “Notification silencing”. Use the ObservabilityNotificationSilence resource.
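The idea of silencing alerts that match specified conditions can be sketched as follows. This page does not describe the format of those conditions, so the sketch assumes the simplest case: each silence is a set of label name/value pairs that must all be equal to the alert's labels. The function is_silenced and the label names are hypothetical.

```python
def is_silenced(alert_labels: dict[str, str], silences: list[dict[str, str]]) -> bool:
    """Return True if any silence matches the alert.

    Assumption: a silence matches when every label it lists has the same
    value on the alert. An empty silence is ignored so that it cannot
    mute every alert by accident.
    """
    for matchers in silences:
        if matchers and all(alert_labels.get(name) == value
                            for name, value in matchers.items()):
            return True
    return False


# Hypothetical maintenance window: mute everything from one namespace.
maintenance = [{"namespace": "billing"}]
print(is_silenced({"alertname": "HighErrorRate", "namespace": "billing"}, maintenance))
```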

Alerts

The observability module separates access to cluster-level and project-level alerts and lets users view the list of active and resolved alerts.

Active alerts are grouped by severity level:

  • critical (critical, S1–S3)
  • warning (warning, S4–S6)
  • informational (info, S7–S9)
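The grouping above maps nine numeric levels to three groups. A minimal sketch of that mapping, assuming the level is available as an integer from 1 to 9 (how the module actually stores the level is not described here, and the function name is hypothetical):

```python
def severity_group(level: int) -> str:
    """Map a numeric severity level (S1..S9) to its alert group."""
    if 1 <= level <= 3:
        return "critical"
    if 4 <= level <= 6:
        return "warning"
    if 7 <= level <= 9:
        return "info"
    raise ValueError(f"severity level must be between 1 and 9, got {level}")


print(severity_group(4))
```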

When viewing an alert, the user can see general information, labels, annotations, and a graph.

Types of alerts

Two types of alerts are supported:

Alert type Scope Who has access
System alerts (ClusterObservabilityAlerts) Cluster level DKP administrators
Project alerts (ObservabilityAlerts) Project (namespace) level Users of the corresponding project

Alert types:

  • System alerts (ClusterObservabilityAlerts): Relate to DKP cluster components. The full list of active and resolved system alerts is available in the Deckhouse web UI under “System” → “System management” → “Monitoring” → “Active alerts”.

  • Project alerts (ObservabilityAlerts): Relate to resources of a specific project (namespace). The full list of active and resolved project alerts is available in the Deckhouse web UI in the corresponding project under “Monitoring” → “Active alerts”.

DeadMansSwitch and PrometheusUnavailable alerts

DeadMansSwitch

DeadMansSwitch is a service alert that fires continuously, confirming the normal operation of Prometheus and the entire alert delivery pipeline.

If DeadMansSwitch stops arriving, the PrometheusUnavailable alert starts firing.

By default, the DeadMansSwitch alert is sent to all configured notification channels, unless label-based filtering is configured in notification policies.

To avoid cluttering the alert list, DeadMansSwitch is hidden from the list and watch output of the d8 k get clusterobservabilityalerts command. To retrieve it directly, use the following command:

d8 k get clusterobservabilityalert deadmansswitch

Disabling this alert is not recommended. If necessary, it can be disabled manually with the deadMansSwitch.enabled parameter.

If DeadMansSwitch is disabled manually, the PrometheusUnavailable alert is not created.

PrometheusUnavailable

PrometheusUnavailable (formerly MissingDeadMansSwitch) is an alert that fires if DeadMansSwitch is missing for more than 2 minutes.

This indicates a problem in the alert delivery pipeline. Possible reasons include:

  • Prometheus is unavailable.
  • Communication between Prometheus and Alertmanager is broken.
  • Another issue prevents alert delivery.
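The relationship between the two alerts can be sketched as a simple timeout check. This is an illustration of the behavior described on this page, not the module's implementation: the function name and its arguments are hypothetical, and the enabled flag stands in for the deadMansSwitch.enabled parameter.

```python
from datetime import datetime, timedelta

# DeadMansSwitch may be missing for this long before PrometheusUnavailable fires.
MISSING_THRESHOLD = timedelta(minutes=2)


def prometheus_unavailable(last_seen: datetime, now: datetime, enabled: bool = True) -> bool:
    """Return True if PrometheusUnavailable should fire.

    last_seen is the time DeadMansSwitch was last received. When
    DeadMansSwitch is disabled, PrometheusUnavailable is never created.
    """
    if not enabled:
        return False
    return now - last_seen > MISSING_THRESHOLD


now = datetime(2024, 1, 1, 12, 0, 0)
print(prometheus_unavailable(now - timedelta(minutes=3), now))
```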

The PrometheusUnavailable alert is a system alert and is displayed both in the Deckhouse web UI and in the output of the d8 k get clusterobservabilityalerts command.