The module lifecycle stage: Preview
The module has requirements for installation
Triggers
Triggers (alerting rules) define the conditions for creating alerts when metric values deviate from expected thresholds.
Triggers are defined in rule groups as items in the spec.rules array.
If a rule contains the alert field, it is treated as a trigger and is used to create alerts.
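As an illustration only, the check described above can be sketched in Python, assuming a rule group manifest has already been loaded into a dictionary. The `extract_triggers` helper, the rule names, and the expressions are hypothetical. The `record` and `expr` fields are borrowed from Prometheus rule conventions and are an assumption, not something this page confirms for the resource.

```python
def extract_triggers(rule_group: dict) -> list[dict]:
    """Return the rules from spec.rules that contain the alert field."""
    rules = rule_group.get("spec", {}).get("rules", [])
    return [rule for rule in rules if "alert" in rule]


# Hypothetical rule group: the rule names and expressions are made up,
# and the "record" entry assumes Prometheus-style recording rules.
example_group = {
    "kind": "ObservabilityMetricsRulesGroup",
    "spec": {
        "rules": [
            {"alert": "HighErrorRate", "expr": "job:http_errors:rate5m > 0.1"},
            {
                "record": "job:http_errors:rate5m",
                "expr": "sum by (job) (rate(http_errors_total[5m]))",
            },
        ]
    },
}

# Only the rule that carries the alert field counts as a trigger.
trigger_names = [rule["alert"] for rule in extract_triggers(example_group)]
print(trigger_names)  # prints ['HighErrorRate']
```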
Types of rule groups with triggers
Three types of rule groups can define triggers:
| Rule group type | Scope | Who has access |
|---|---|---|
| System rule groups (ClusterObservabilityMetricsRulesGroup) | Cluster level | DKP administrators |
| Project rule groups (ObservabilityMetricsRulesGroup) | Project (namespace) level | Users of the corresponding project |
| Standard propagated rule groups (ClusterObservabilityPropagatedMetricsRulesGroup) | Created at the cluster level and automatically available in all projects | Users of all projects |
Rule group types:
- System rule groups (ClusterObservabilityMetricsRulesGroup): Used to define triggers for platform-level and cluster component alerts. Created and managed by DKP administrators.
- Project rule groups (ObservabilityMetricsRulesGroup): Used to define triggers related to a specific project (namespace). Project users can create and edit them within the configured access permissions.
- Standard propagated rule groups (ClusterObservabilityPropagatedMetricsRulesGroup): Created at the cluster level and automatically become available in all projects.
Trigger groups
Trigger groups are used to logically organize monitoring rules and manage their parameters at the rule set level.
Groups are convenient for combining triggers related to the same component, service, or project, as well as for applying a shared evaluation interval to all rules in the group.
Notifications
The observability module provides mechanisms for configuring alert notification delivery
and controlling access to notification channels at both the cluster and project levels.
The following delivery channels are supported:
- Email
- Telegram
- Slack
- Webhook
Connection parameters depend on the channel type and are configured through the corresponding Kubernetes resource.
Types of notification channels
Three types of notification channels are supported:
| Channel type | Scope | Who can create |
|---|---|---|
| System channels (ClusterObservabilityNotificationChannel) | Cluster level | DKP administrators |
| Project channels (ObservabilityNotificationChannel) | Project (namespace) level | Users of the corresponding project |
| Standard propagated channels (ClusterObservabilityPropagatedNotificationChannel) | Created at the cluster level and automatically available in all projects | DKP administrators |
Channel types:
- System channels (ClusterObservabilityNotificationChannel): Used for cluster-level notification delivery. Available in the Deckhouse web UI under “System” → “System management” → “Monitoring” → “Notification settings” → “Notification channels”.
- Project channels (ObservabilityNotificationChannel): Allow configuring notification delivery within a specific project. Available in the corresponding project in the Deckhouse web UI under “Monitoring” → “Notification settings” → “Notification channels”.
- Standard propagated channels (ClusterObservabilityPropagatedNotificationChannel): Created at the cluster level and automatically become available in all projects for notification delivery. Use the ClusterObservabilityPropagatedNotificationChannel resource or the d8 CLI utility to create them.
Notification policies
Notification policies define which channel should be used to deliver notifications for an alert (or a group of alerts).
| Policy type | Description | How to configure |
|---|---|---|
| System notification policies | Used to configure delivery rules for system alerts. System policies can use only system notification channels. Available in the Deckhouse web UI under “System” → “System management” → “Monitoring” → “Notification settings” → “Notification policies”. | Use the ClusterObservabilityNotificationPolicy resource. |
| Project notification policies | Used to configure delivery rules for project alerts. Project policies can use project channels or standard propagated channels, but not system notification channels. Available in the corresponding project under “Monitoring” → “Notification settings” → “Notification policies”. | Use the ObservabilityNotificationPolicy resource. |
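The channel restrictions in the table can be expressed as a small lookup. This is a minimal Python sketch, not the module's actual validation logic: `ALLOWED_CHANNELS` and `channel_allowed` are hypothetical names, and the sketch assumes that the "standard" channels a project policy may use are the standard propagated channels (ClusterObservabilityPropagatedNotificationChannel).

```python
# Channel kinds that each policy kind may reference, per the table above.
# Assumption: project policies may use propagated channels in addition to
# project channels; system policies may use system channels only.
ALLOWED_CHANNELS = {
    "ClusterObservabilityNotificationPolicy": {
        "ClusterObservabilityNotificationChannel",
    },
    "ObservabilityNotificationPolicy": {
        "ObservabilityNotificationChannel",
        "ClusterObservabilityPropagatedNotificationChannel",
    },
}


def channel_allowed(policy_kind: str, channel_kind: str) -> bool:
    """Return True if a policy of policy_kind may deliver to channel_kind."""
    return channel_kind in ALLOWED_CHANNELS.get(policy_kind, set())


# A project policy cannot deliver to a system channel.
print(channel_allowed(
    "ObservabilityNotificationPolicy",
    "ClusterObservabilityNotificationChannel",
))  # prints False
```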
Notification silencing
When alerts are expected in advance (for example, during planned maintenance or testing),
the observability module can suppress notification delivery for alerts that match specified conditions.
| Silence type | Description | How to configure |
|---|---|---|
| System notification silences | Used to configure silencing rules for system alert delivery. Available in the Deckhouse web UI under “System” → “System management” → “Monitoring” → “Notification settings” → “Notification silencing”. | Use the ClusterObservabilityNotificationSilence resource. |
| Project notification silences | Used to configure silencing rules for project alert delivery. Available in the corresponding project under “Monitoring” → “Notification settings” → “Notification silencing”. | Use the ObservabilityNotificationSilence resource. |
Alerts
The observability module separates access to cluster-level and project-level alerts
and lets users view the list of active and resolved alerts.
Active alerts are grouped by severity level:
- critical (critical, S1–S3)
- warning (warning, S4–S6)
- informational (info, S7–S9)
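The grouping above can be written as a simple range check. The following Python sketch is illustrative only: `severity_group` is a hypothetical helper, and it assumes the severity is available as an integer from 1 (S1) to 9 (S9).

```python
def severity_group(level: int) -> str:
    """Map a numeric severity (1 for S1 ... 9 for S9) to its display group."""
    if 1 <= level <= 3:
        return "critical"
    if 4 <= level <= 6:
        return "warning"
    if 7 <= level <= 9:
        return "info"
    # Values outside S1-S9 are not described in this document.
    raise ValueError(f"severity must be between 1 and 9, got {level}")


print(severity_group(5))  # prints warning
```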
When viewing an alert, the user can see general information, labels, annotations, and a graph.
Types of alerts
Two types of alerts are supported:
| Alert type | Scope | Who has access |
|---|---|---|
| System alerts (ClusterObservabilityAlerts) | Cluster level | DKP administrators |
| Project alerts (ObservabilityAlerts) | Project (namespace) level | Users of the corresponding project |
Alert types:
- System alerts (ClusterObservabilityAlerts): Relate to DKP cluster components. The full list of active and resolved system alerts is available in the Deckhouse web UI under “System” → “System management” → “Monitoring” → “Active alerts”.
- Project alerts (ObservabilityAlerts): Relate to resources of a specific project (namespace). The full list of active and resolved project alerts is available in the Deckhouse web UI in the corresponding project under “Monitoring” → “Active alerts”.
DeadMansSwitch and PrometheusUnavailable alerts
DeadMansSwitch
DeadMansSwitch is a service alert that fires continuously, confirming the normal operation of Prometheus and the entire alert delivery pipeline.
If DeadMansSwitch stops arriving, the PrometheusUnavailable alert starts firing.
By default, the DeadMansSwitch alert is sent to all configured notification channels, unless label-based filtering is configured in notification policies.
To avoid cluttering the alert list, DeadMansSwitch is hidden from the output of the d8 k get clusterobservabilityalerts (list/watch) command.
To retrieve it directly, use the following command:
d8 k get clusterobservabilityalert deadmansswitch
Disabling this alert is not recommended, but if necessary it can be disabled manually
using the deadMansSwitch.enabled parameter.
If disabled manually, the PrometheusUnavailable alert is not created.
PrometheusUnavailable
PrometheusUnavailable (formerly MissingDeadMansSwitch) is an alert that fires if DeadMansSwitch is missing for more than 2 minutes.
This indicates a problem in the alert delivery pipeline. Possible reasons include:
- Prometheus is unavailable.
- Communication between Prometheus and Alertmanager is broken.
- Another issue prevents alert delivery.
The PrometheusUnavailable alert is a system alert and is displayed both in the Deckhouse web UI
and in the output of the d8 k get clusterobservabilityalerts command.
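The relationship between the two alerts follows the dead man's switch pattern, which can be sketched in Python as follows. This is a simplified model under stated assumptions, not the module's implementation: the `prometheus_unavailable` function, its parameters, and the idea of tracking a "last seen" timestamp are hypothetical. Only the 2-minute threshold and the effect of disabling DeadMansSwitch come from the text above.

```python
from datetime import datetime, timedelta

# PrometheusUnavailable fires when DeadMansSwitch has been missing
# for more than 2 minutes.
MISSING_THRESHOLD = timedelta(minutes=2)


def prometheus_unavailable(
    last_seen: datetime,
    now: datetime,
    dead_mans_switch_enabled: bool = True,
) -> bool:
    """Return True if PrometheusUnavailable should fire in this model."""
    # With DeadMansSwitch disabled, PrometheusUnavailable is not created.
    if not dead_mans_switch_enabled:
        return False
    return now - last_seen > MISSING_THRESHOLD


now = datetime(2024, 1, 1, 12, 0, 0)
print(prometheus_unavailable(now - timedelta(minutes=3), now))  # prints True
print(prometheus_unavailable(now - timedelta(minutes=1), now))  # prints False
```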