Available in editions: CE, BE, SE, SE+, EE
The module lifecycle stage: General Availability
The monitoring-deckhouse module provides comprehensive monitoring, alerting, and observability for the Deckhouse Kubernetes Platform itself. It monitors the health, performance, and proper operation of Deckhouse core components to ensure platform stability and reliability.
This module is a critical observability component that works in conjunction with the prometheus module to provide insights into Deckhouse’s operational state.
The module deploys monitoring resources that:
- Collect Deckhouse metrics. Scrapes metrics from the Deckhouse pod using PodMonitor resources, including:
  - Self metrics on port 4222 via the /metrics endpoint
  - Custom hook-generated metrics via the /metrics/hooks endpoint
  - Module execution metrics, hook performance, and system health indicators
- Define alerting rules. Provides comprehensive Prometheus alerting rules organized into several categories:
  - Deckhouse availability: monitors pod health, readiness, and uptime
  - Deckhouse malfunctioning: detects excessive restarts, registry access issues, and hung processes
  - Release management: tracks release channel subscriptions, pending updates, and manual approvals
  - Module management: monitors module state, validation errors, and deprecated configurations
  - CNI checks: detects multiple CNI configurations and misconfigurations
  - OS requirements: identifies nodes running deprecated operating system versions
- Provide Grafana dashboards. Includes pre-built Grafana dashboards for visualizing:
  - Deckhouse performance metrics
  - Module execution statistics
  - Hook run times and resource usage
  - Queue processing and convergence status
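As a rough illustration of what a scrape of the /metrics endpoint yields, the sketch below parses payloads in the Prometheus text exposition format and checks that a counter such as deckhouse_live_ticks (described under Metrics collection) advanced between two scrapes. The payloads are invented samples, the helper names parse_samples and ticks_advanced are hypothetical, and the sketch assumes the metric is exposed without labels. It is not how the module's alerting rules are implemented.

```python
from typing import Dict


def parse_samples(payload: str) -> Dict[str, float]:
    """Parse samples from a Prometheus text-format payload.

    Keys are the full series string (metric name plus any label set),
    values are floats. Comment lines (# HELP, # TYPE) and blank lines
    are skipped. Optional trailing timestamps are not handled.
    """
    samples: Dict[str, float] = {}
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The sample value is the last space-separated field.
        series, _, value = line.rpartition(" ")
        samples[series.strip()] = float(value)
    return samples


def ticks_advanced(earlier: str, later: str,
                   metric: str = "deckhouse_live_ticks") -> bool:
    """Return True if `metric` is present in both payloads and increased."""
    before = parse_samples(earlier).get(metric)
    after = parse_samples(later).get(metric)
    if before is None or after is None:
        return False
    return after > before


# Invented payloads for illustration only; real output will differ.
FIRST_SCRAPE = """\
# TYPE deckhouse_live_ticks counter
deckhouse_live_ticks 120
"""
SECOND_SCRAPE = """\
# TYPE deckhouse_live_ticks counter
deckhouse_live_ticks 123
"""

if __name__ == "__main__":
    print(ticks_advanced(FIRST_SCRAPE, SECOND_SCRAPE))
```

A value that stops advancing between scrapes is the kind of signal that can point to a hung process.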
Metrics collection
The module configures a PodMonitor that scrapes two endpoints from the Deckhouse pod:
- Deckhouse metrics (/metrics). Core Deckhouse operational metrics:
  - deckhouse_live_ticks: health indicator incrementing every 10 seconds
  - deckhouse_registry_errors: registry connectivity issues
  - deckhouse_module_hook_run_seconds: module hook execution duration
  - deckhouse_tasks_queue_action_duration_seconds: task queue processing times
  - And many more operational metrics
- Hook metrics (/metrics/hooks). Custom metrics generated by Deckhouse hooks, scraped with honorLabels: true to preserve hook-specific labels
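The effect of honorLabels: true can be pictured with a small model of how Prometheus documents label-conflict handling. When a scraped sample and the scrape target both carry a label with the same name, honoring labels keeps the scraped value. Otherwise the target's value wins and the scraped one is renamed with an exported_ prefix. The sketch below is a simplified model, not Prometheus code. The function name merge_labels and all label names and values are hypothetical.

```python
from typing import Dict


def merge_labels(scraped: Dict[str, str],
                 target: Dict[str, str],
                 honor_labels: bool) -> Dict[str, str]:
    """Simplified model of Prometheus label-conflict resolution.

    honor_labels=True:  on conflict, keep the scraped label value.
    honor_labels=False: on conflict, the target label wins and the
                        scraped value moves to "exported_<name>".
    """
    merged = dict(scraped)
    for name, value in target.items():
        if name in scraped:
            if honor_labels:
                continue  # keep the value set by the hook
            merged["exported_" + name] = scraped[name]
        merged[name] = value
    return merged


# Hypothetical labels: a hook sets its own "namespace" label, and the
# scrape target also attaches a "namespace" label.
hook_labels = {"hook": "example_hook", "namespace": "example-ns"}
target_labels = {"namespace": "target-ns", "pod": "example-pod"}

honored = merge_labels(hook_labels, target_labels, honor_labels=True)
renamed = merge_labels(hook_labels, target_labels, honor_labels=False)

if __name__ == "__main__":
    print(honored)
    print(renamed)
```

With labels honored, the hook's namespace value survives unchanged, which is what lets hook-specific labels reach queries and dashboards as the hook wrote them.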
Integration with Observability module
When the observability module is enabled, this module automatically creates:
- ClusterObservabilityMetricsRulesGroup resources for Prometheus rules
- ClusterObservabilityDashboard resources for Grafana dashboards
This enables centralized management and multi-tenancy support for monitoring resources.
Requirements
- The prometheus module must be enabled (automatic dependency)
- The operator-prometheus module should be enabled for PodMonitor support