Available in editions: CE, BE, SE, SE+, EE
The module lifecycle stage: General Availability
The monitoring-deckhouse module provides monitoring, alerting, and observability for the Deckhouse Kubernetes Platform (DKP) itself. It monitors the health, performance, and correct operation of DKP core components to keep the platform stable and reliable.
This module is a critical observability component that works together with the prometheus module to provide insight into the DKP operational state.
The module deploys monitoring resources that:
- Collect Deckhouse metrics: The module scrapes metrics from the Deckhouse pod using PodMonitor resources, including:
  - Self metrics on port 4222 via the /metrics endpoint.
  - Custom hook-generated metrics via the /metrics/hooks endpoint.
  - Module execution metrics, hook performance, and system health indicators.
- Define alerting rules: The module provides Prometheus alerting rules organized into several categories:
  - DKP availability: Monitors pod health, readiness, and uptime.
  - DKP malfunctioning: Detects excessive restarts, registry access issues, and hung processes.
  - Release management: Tracks release channel subscriptions, pending updates, and manual approvals.
  - Module management: Monitors module state, validation errors, and deprecated configurations.
  - CNI checks: Detects multiple CNI configurations and misconfigurations.
  - OS requirements: Identifies nodes running deprecated operating system versions.
- Provide Grafana dashboards: The module includes pre-built Grafana dashboards for visualizing:
  - DKP performance metrics.
  - Module execution statistics.
  - Hook run times and resource usage.
  - Queue processing and convergence status.
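To make the scraping setup more concrete, the following is a minimal sketch of what a PodMonitor with two endpoints might look like, built as a Python dictionary and printed as JSON. It is not the manifest the module actually ships. The resource name, namespaces, pod label, and the port name `self` (assumed here to map to container port 4222) are hypothetical. Only the two paths and the `honorLabels: true` setting on the hooks endpoint come from this page.

```python
import json


def build_deckhouse_podmonitor(port_name="self", pod_namespace="d8-system"):
    """Return an illustrative PodMonitor manifest with two scrape endpoints.

    All names, labels, and namespaces are assumptions for illustration.
    `port_name` is assumed to be the name of the container port 4222.
    """
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PodMonitor",
        "metadata": {"name": "deckhouse-example", "namespace": "d8-monitoring"},
        "spec": {
            # Hypothetical label selector for the Deckhouse pod.
            "selector": {"matchLabels": {"app": "deckhouse"}},
            "namespaceSelector": {"matchNames": [pod_namespace]},
            "podMetricsEndpoints": [
                # Self metrics of the Deckhouse process.
                {"port": port_name, "path": "/metrics"},
                # Hook-generated metrics; honorLabels keeps labels set by hooks
                # instead of letting scrape-time target labels override them.
                {"port": port_name, "path": "/metrics/hooks", "honorLabels": True},
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_deckhouse_podmonitor(), indent=2))
```

Both endpoints share one port and differ only in path, which is why a single PodMonitor with two `podMetricsEndpoints` entries is enough in this sketch.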
Metrics collection
The module configures a PodMonitor that scrapes two endpoints from the Deckhouse pod:
- DKP metrics (/metrics): Core DKP operational metrics:
  - deckhouse_live_ticks: Health indicator incrementing every 10 seconds.
  - deckhouse_registry_errors: Registry connectivity issues.
  - deckhouse_module_hook_run_seconds: Module hook execution duration.
  - deckhouse_tasks_queue_action_duration_seconds: Task queue processing times.
  - And many more operational metrics.
- Hook metrics (/metrics/hooks): Custom metrics generated by DKP hooks with honorLabels: true to preserve hook-specific labels.
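Because deckhouse_live_ticks increments every 10 seconds, a value that stops advancing suggests a hung process. The sketch below illustrates that idea by comparing two samples parsed from the Prometheus text exposition format. It is an illustration only, not the alerting rule the module ships. The helper names are hypothetical, and it assumes the first matching sample line is the one of interest.

```python
import re

# Matches "name value" or "name{labels} value" sample lines.
# Simplification: label values containing "}" are not handled.
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{[^}]*\})?\s+(?P<value>\S+)'
)


def read_metric(exposition_text, metric_name):
    """Return the first sample value of metric_name, or None if absent."""
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match and match.group("name") == metric_name:
            return float(match.group("value"))
    return None


def ticks_look_stalled(before, after, elapsed_seconds, tick_period=10.0):
    """Return True if the tick value did not change over a full tick period.

    A lower `after` value is treated as a restart (counter reset), not a stall.
    """
    if before is None or after is None:
        return True  # metric missing: treat as unhealthy
    if elapsed_seconds < tick_period:
        return False  # too early to expect an increment
    return after == before
```

For example, two scrapes taken 30 seconds apart that both report the same deckhouse_live_ticks value would be flagged, while any increase would not.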
Integration with Observability module
When the observability module is enabled, this module automatically creates:
- ClusterObservabilityMetricsRulesGroup resources for Prometheus rules.
- ClusterObservabilityDashboard resources for Grafana dashboards.
This enables centralized management and multi-tenancy support for monitoring resources.
Requirements
- prometheus module must be enabled (automatic dependency).
- operator-prometheus module should be enabled for PodMonitor support.