The module lifecycle stage: General Availability
v1.12.4
Bug Fixes
- Fixed module deployment validation failure on newer versions of Deckhouse Kubernetes Platform.
v1.12.3
Bug Fixes
- Fixed an issue where some components could fail to start.
v1.12.1
Bug Fixes
- Fixed web monitoring zone assignment not being saved correctly when creating or editing a monitored site.
- Fixed web monitoring API not accepting site names containing dots (e.g. vcenter.example.com).
- Fixed TLS certificate verification failing for sites monitored via explicit IP addresses instead of DNS resolution.
- Fixed web monitoring dashboard zone filter showing no values.
- Fixed check configuration inputs being too narrow to display values.
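The explicit-IP fix in this release touches a general TLS detail: when a client connects to an IP address directly, the certificate still has to be verified against the site's hostname. The following is a minimal Python sketch of that idea, not the module's actual implementation; the function name `open_verified_tls` and its parameters are hypothetical.

```python
import socket
import ssl


def open_verified_tls(ip: str, hostname: str, port: int = 443,
                      timeout: float = 5.0) -> ssl.SSLSocket:
    """Connect to an explicit IP, but verify the certificate against `hostname`.

    Hypothetical helper for illustration only.
    """
    # The default context verifies the certificate chain and the hostname.
    context = ssl.create_default_context()
    raw = socket.create_connection((ip, port), timeout=timeout)
    try:
        # server_hostname is sent as SNI and is the name the certificate
        # is checked against, even though the TCP connection targets an IP.
        return context.wrap_socket(raw, server_hostname=hostname)
    except Exception:
        raw.close()
        raise
```

Passing the IP as `server_hostname` instead would typically fail verification, because most certificates list DNS names rather than IP addresses in their SANs.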
v1.12.0
Features
- Web monitoring. Added a distributed monitoring system for checking website availability from external monitoring points. Monitoring agents run on servers outside the cluster, perform HTTP, TCP, DNS, and ICMP ping checks, and send results back to the platform.
- Configure monitored websites per project with customizable check intervals (10s–60s), response validation rules (status codes, response body, headers, SSL certificates), and monitoring zones.
- Manage monitoring zones and agents in the admin UI. Agents support one-liner installation, automatic binary updates, and status tracking (online/offline).
- Built-in Grafana dashboards: overall monitoring status, per-zone status, per-site metrics, and agent logs.
- Automatic alerting for site failures, DNS resolution errors, and monitoring zone health issues.
- Outage detection and availability reporting with 24-hour and 30-day availability metrics.
- Web monitoring API for programmatic site management.
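To make the response validation rules and availability metrics above more concrete, here is a small Python sketch. It is an illustration under stated assumptions, not the platform's implementation: the `Rule` fields, the `validate` and `availability` functions, and the "share of successful checks" formula are all hypothetical, since the changelog does not say how availability is computed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Rule:
    """Hypothetical validation rule set; field names are illustrative only."""
    allowed_statuses: Set[int] = field(default_factory=lambda: {200})
    body_contains: Optional[str] = None
    required_headers: Dict[str, str] = field(default_factory=dict)


def validate(rule: Rule, status: int, body: str, headers: Dict[str, str]) -> List[str]:
    """Return failure reasons; an empty list means the check passed."""
    failures = []
    if status not in rule.allowed_statuses:
        failures.append(f"unexpected status {status}")
    if rule.body_contains is not None and rule.body_contains not in body:
        failures.append("body text not found")
    # Header names are case-insensitive, so compare them lowercased.
    lowered = {name.lower(): value for name, value in headers.items()}
    for name, expected in rule.required_headers.items():
        if lowered.get(name.lower()) != expected:
            failures.append(f"header {name} mismatch")
    return failures


def availability(results: List[bool]) -> float:
    """Percentage of successful checks in a window (assumed definition)."""
    if not results:
        raise ValueError("no check results in window")
    return 100.0 * sum(results) / len(results)
```

With a 60-second interval, a 24-hour window would hold 1440 results per monitoring point; how results from several zones are combined is left open here.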
Enhancements
- Reduced scheduler memory consumption.
v1.11.2
Enhancements
- Improved stability of the managed-postgres migration process.
- Dependent components (backend, scheduler, alertgate) now automatically restart when database credentials change.
- Increased test coverage for built-in hooks with CI integration and coverage reporting.
v1.11.0
Enhancements
- Added alerting for log storage when log samples are being discarded due to exceeded limits, helping administrators identify projects that need limit adjustments.
Changes
- Built-in PostgreSQL is now powered by the Deckhouse Kubernetes Platform managed-postgres module. For installations using the built-in database (not an external PostgreSQL), the previous operator-postgres has been replaced with the DKP managed-postgres module. This provides faster security updates and CVE remediation, as well as a production-grade managed database with broader functionality. For backward compatibility, the previous operator-postgres scheme is still supported but will be removed in the next release. Before upgrading, please review the migration guide.
Bug Fixes
- Fixed audit log verbosity to reduce noise in production logs.
- Fixed a crash in the deploy configuration hook that occurred when trace or log storage was disabled.
v1.10.3
Bug Fixes
- Fixed a crash during module deployment when trace storage, log storage, or monitoring subsystems are not enabled.
v1.10.2
Bug Fixes
- Fixed metric storage caching that could cause increased latency and load on long-term storage.
- Fixed trace storage alerting rules that could produce false-positive alerts.
- Fixed project dashboards displaying internal tabs that should not be visible to users.
- Fixed insufficient resource limits for auxiliary components on large installations.
v1.10.1
Bug Fixes
- Fixed store-gateway caching.
v1.10.0
Features
- Added distributed trace storage. The platform now supports receiving, storing, and viewing traces:
  - Trace ingestion via OpenTelemetry (OTLP), Jaeger, and Zipkin protocols.
  - Trace viewing and search in the UI.
  - Per-project trace ingestion limits configurable via admin UI and API.
  - Trace storage usage statistics.
  - Architecture documentation and integration instructions.
- Space and project pages now use human-readable names in URLs instead of UUIDs. Previously bookmarked UUID-based URLs continue to work.
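One way to keep old UUID-based bookmarks working alongside human-readable names is to decide per URL segment whether it parses as a UUID. The Python sketch below shows that approach as an assumption; the changelog does not describe how the UI actually resolves URLs, and `classify_segment` is a hypothetical name.

```python
import uuid
from typing import Tuple


def classify_segment(segment: str) -> Tuple[str, str]:
    """Classify a URL path segment as a UUID or a human-readable name."""
    try:
        # uuid.UUID raises ValueError for anything that is not a valid UUID.
        return ("uuid", str(uuid.UUID(segment)))
    except ValueError:
        return ("slug", segment)
```

A name that happens to consist of exactly 32 hexadecimal characters would be treated as a UUID by this sketch, so a real implementation would need a rule for that edge case.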
Enhancements
- The monitoring agent can now be deployed on nodes using the containerdV2 container runtime.
- Eliminated false-positive alerts about cluster sync status in single-cluster deployments.
- Updated monitoring agent to v0.89.5:
  - Fixed OS version detection when the agent runs inside a container.
  - Fixed a crash that could occur on CentOS 7 during package version detection.
  - Fixed PostgreSQL plugin issues when monitoring Yandex Cloud RDS instances.
- Updated alert names, descriptions, and aggregation rules for better clarity and fewer false positives.
Bug Fixes
- Fixed a data migration issue where duplicate Prometheus recording rules could cause migration failures.
- Fixed special character handling in PromQL expressions, preventing query errors when alert rules or dashboards contain single quotes or other reserved characters.
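For context on the PromQL fix above: string literals in PromQL follow Go-style escaping, so a value embedded in a double-quoted literal needs its backslashes and double quotes escaped, while single quotes can pass through unchanged. The Python sketch below illustrates that general rule; it is not the module's actual fix, and `promql_quote` and `equality_matcher` are hypothetical helper names.

```python
def promql_quote(value: str) -> str:
    """Render `value` as a double-quoted PromQL string literal."""
    escaped = (
        value.replace("\\", "\\\\")  # escape backslashes first
        .replace('"', '\\"')         # then double quotes
        .replace("\n", "\\n")        # and literal newlines
    )
    return f'"{escaped}"'


def equality_matcher(label: str, value: str) -> str:
    """Build a `label="value"` matcher with the value safely quoted."""
    return f"{label}={promql_quote(value)}"
```

For example, `equality_matcher("owner", "o'brien")` yields `owner="o'brien"`: the single quote needs no escaping inside a double-quoted literal.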
Security
- Fixed Critical and High severity CVEs in trace storage components and base images.
- Updated base container images to the latest available versions.
- Monitoring agent Go runtime upgraded to 1.24.12 to address known CVEs.
v1.9.9
Security
- Switched all Ceph components to distroless images to reduce security risks associated with unused software in standard container images.
Enhancements
- Documentation updates: added instructions for configuring the Nginx plugin to monitor Nginx using custom log paths and log formats.
Bug Fixes
- Updated memory and disk usage units from SI to IEC on the Main dashboard to ensure consistency with other dashboards.
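The difference between SI and IEC units is the base: SI prefixes step by 1000 (kB, MB, GB), IEC prefixes by 1024 (KiB, MiB, GiB). The Python sketch below shows how the same byte count reads under each convention; `format_bytes` is a hypothetical helper and is unrelated to how the dashboards format values.

```python
def format_bytes(n: float, iec: bool = True) -> str:
    """Format a byte count with IEC (base 1024) or SI (base 1000) units."""
    base = 1024 if iec else 1000
    units = (["B", "KiB", "MiB", "GiB", "TiB"] if iec
             else ["B", "kB", "MB", "GB", "TB"])
    value = float(n)
    for unit in units[:-1]:
        if abs(value) < base:
            return f"{value:.2f} {unit}"
        value /= base
    # Anything larger stays in the biggest unit listed.
    return f"{value:.2f} {units[-1]}"
```

A node with 8 * 1024**3 bytes of memory is shown as `8.00 GiB` in IEC units but `8.59 GB` in SI units, which is why mixing the two across dashboards is confusing.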
v1.9.8
Security
- Fixed vulnerabilities related to CVE-2024-25621.
v1.9.5
Security
- Fixed vulnerabilities related to the following CVEs: CVE-2025-30153, CVE-2025-47912, CVE-2025-58183, CVE-2025-58185, CVE-2025-58186, CVE-2025-58187, CVE-2025-58188, CVE-2025-58189, CVE-2025-61723, CVE-2025-61724, CVE-2025-61725.
v1.9.4
Enhancements
- Added support for process discovery inside containers run by CRI-O.
- Improved user creation flow for internal authentication: superusers can now create users and set passwords without requiring email confirmation.
Bug Fixes
- Updated monitoring agent to v0.89.0:
  - Fixed MySQL replication metrics collection for MySQL 8.5+.
  - Fixed an issue where the PostgreSQL version could be incorrectly detected, leading to inaccurate metric collection.
v1.9.3
Security
- Fixed Critical and High severity CVEs.
v1.9.2
Security
- Fixed Critical and High severity CVEs.
v1.9.1
Bug Fixes
- Fixed an issue where the operator-ceph could get stuck in a Pending state.
v1.9.0
Enhancements
- Updated backup documentation: added an rclone example for backup and restore.
- Hid internal CRDs from public documentation.
Security
- Fixed all Critical and High severity CVEs.
v1.8.0
Changes
- Integrated Ceph operator into the module. External ceph-operator module is no longer required.
- Switched to DKP-provided base images for some components to ensure better compatibility and maintainability.
- Updated base container images with partial CVE fixes to improve security.
Bug Fixes
- Resolved issues in hooks that could lead to incomplete or faulty module installation.
v1.7.3
Bug Fixes
- Fixed an issue preventing correct exit from dashboard edit mode.
v1.7.2
Bug Fixes
- Fixed an issue where the service's internal port was incorrectly appended to the GeneratorURL field of an alert, breaking the link.
v1.7.1
Bug Fixes
- Fixed an issue that prevented the backend from establishing connections to external services — a regression introduced in version 1.7.0. This affected authorization workflows using third-party systems.
- Updated the backend application’s security context to comply with the “restricted” security policy.
v1.7
Changes
- Switched all ALT Linux-based component images to distroless images.
- Added livenessProbe and readinessProbe to pods where they were missing.
- Migrated all components to use the same security context as in DKP.
- Updated distroless images to use the distroless base image from DKP.
- Fixed multiple CVEs and build warnings.
Enhancements
- Migrated service ports from 80 and 443 to high ports.
Features
- Added the ability for users with the User role to create dashboard/rules API tokens.
- Included several widely used plugins in the Grafana image.
Bug Fixes
- Fixed certificate validation issues caused by missing CA bundle.
- Added missing registry subdomain to TLS certificate SANs.
- Fixed issue with disabling/overwriting triggers when deleting a parent trigger.
- Fixed missing category in the dashboard form.
- Fixed issue where the emergency brake (which stops metric and log collection when Ceph is near capacity) did not activate if monitoring components were disabled in ModuleConfig.
- Migrated dashboards using Loki data sources to use the default-logs data source.
- Fixed false-positive warnings about invalid data sources when using default-logs in dashboards.
- Renamed incorrect metric names in the Redis dashboard.