Release Notes | observability-platform

The module lifecycle stage: General Availability
The module has requirements for installation

v1.12.14

Features

Monitoring of QEMU/libvirt and VMware/vCenter virtual machines. The updated monitoring agent collects CPU, memory, disk, and network metrics per VM: for QEMU/libvirt — automatically on host hypervisors with no configuration; for VMware — via vCenter (credentials are provided in the agent configuration). Three new dashboards are added: VM Detail, QEMU/KVM Hypervisors, and VMware/ESXi Hypervisors.
“Managed by code” badge for web-monitoring sites. Sites whose configuration is owned by external tooling (webmon-cli, GitOps and similar) can now be marked with a “managed by code” badge — operators immediately see that edits made through the UI may be overwritten on the next sync from the source of truth.

Enhancements

Web-monitoring dashboards now support localization based on the platform locale.
Improved support for newer pgbouncer versions in the monitoring agent.

Bug Fixes

Fixed the PostgreSQL plugin-status alert: previously the alert could fail to fire for some types of agent plugin configuration errors. All PostgreSQL plugin configuration checks are now reported correctly.

v1.12.13

Enhancements

Aligned web-monitoring configuration validation across the server, the webmon-cli utility, and the documentation. The same field shape is now accepted in all three places: HTTP method TRACE is supported, disableHttp2ForHttps is rejected when it is not a quoted boolean string ("true" / "false"), the h duration unit is no longer accepted in check responseTime.max, headers are validated per element (each entry must have a non-empty name and a string value, no extra keys), and requestDefaults (scheme, timeout, headers) is checked against the same rules as the per-probe configuration. The deprecated dataBase64 field has been removed from webmon-cli (released as v0.5.0). Configurations that silently passed validation in earlier versions may start failing webmon-cli validate and the API; please review CI pipelines that use webmon-cli as a merge gate.
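
Two of the rules above (the quoted boolean for disableHttp2ForHttps and per-element header validation) can be sketched as follows. This is a minimal illustration and not the actual webmon-cli or server code: the function name validate_probe, the error strings, and the assumption that each header entry is a mapping with name and value keys are all hypothetical.

```python
def validate_probe(probe: dict) -> list[str]:
    """Return validation error messages for one probe mapping (empty if valid)."""
    errors = []

    # The flag must be the quoted string "true" or "false"; a bare YAML
    # boolean (parsed as Python True/False) is rejected.
    flag = probe.get("disableHttp2ForHttps")
    if flag is not None and not (isinstance(flag, str) and flag in ("true", "false")):
        errors.append('disableHttp2ForHttps must be "true" or "false" (quoted)')

    # Each header entry needs a non-empty name, a string value, and no other keys.
    # The name/value entry layout is an assumption made for this sketch.
    for i, header in enumerate(probe.get("headers", [])):
        if not isinstance(header, dict):
            errors.append(f"headers[{i}]: must be a mapping")
            continue
        extra = sorted(str(k) for k in header if k not in ("name", "value"))
        if extra:
            errors.append(f"headers[{i}]: unexpected keys {extra}")
        name = header.get("name")
        if not isinstance(name, str) or not name:
            errors.append(f"headers[{i}]: name must be a non-empty string")
        if not isinstance(header.get("value"), str):
            errors.append(f"headers[{i}]: value must be a string")
    return errors
```

A configuration that writes disableHttp2ForHttps: true without quotes is parsed by YAML as a boolean, which is why it would fail a check of this kind.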

Bug Fixes

Fixed variable templating in the web-monitoring drilldown dashboard so that drilldown navigation works correctly.

v1.12.10

Bug Fixes

Fixed the web-monitoring sites table, where per-row action buttons (open dashboard, open logs, edit, delete) could be clipped and hidden.

v1.12.9

Features

Support for internationalized (IDN) hostnames in web-monitoring. Site hosts may contain non-ASCII characters (for example, 例え.jp); monitoring agents probe them correctly and the UI displays the host in its original form.
New “Check IPs” button on the Web-monitoring page. Opens the list “IP addresses used for web-monitoring checks” — the source IPs used to perform checks, grouped by monitoring zone, with the current online/offline status of each agent. Useful for configuring allow-lists on the monitored sites.
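
For readers unfamiliar with IDN hostnames: a Unicode host such as 例え.jp is carried on the wire in an ASCII-compatible (punycode) form and converted back for display. The sketch below shows that round trip with Python's built-in idna codec, which implements the older IDNA 2003 rules. The helper names are hypothetical, and this does not describe how the monitoring agents or the UI implement the conversion.

```python
def to_ascii_host(host: str) -> str:
    """Convert a Unicode hostname to its ASCII-compatible (xn--) form."""
    return host.encode("idna").decode("ascii")


def to_display_host(ascii_host: str) -> str:
    """Convert an ASCII-compatible hostname back to its Unicode form."""
    return ascii_host.encode("ascii").decode("idna")


if __name__ == "__main__":
    wire = to_ascii_host("例え.jp")
    print(wire)                   # ASCII form used for DNS lookups and probes
    print(to_display_host(wire))  # original form, as a UI would show it
```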

Enhancements

Check rendering on the monitored site page: compound checks (for example, SSL validity together with certificate expiry) are shown as a single block with the status of each condition on its own row; labels are consistent and repeating names are disambiguated.
UX of web-monitoring forms and the sites table: added missing breadcrumbs, long site names and hosts are properly truncated in the sites table, and advanced settings are expanded automatically when they are set.
Default check interval for new monitored sites changed from 20 to 30 seconds.
The token create and edit form now shows only the fields applicable to the token’s scope — Project, Space, or Global scope; unused options are hidden.
Restored the ability to set the maximum response body size in web-monitoring HTTP checks. The minimum size, the maximum, or both can now be configured.
Improved configuration validation in the webmon-cli utility: rules are aligned with server-side checks, and error messages now include the project name and the line number in the YAML file.

Changes

Stricter configuration validation in webmon-cli. The utility now rejects YAML configs that the server would reject too: HTTP probes require an explicit scheme (http or https), SSL checks are blocked on scheme: http, host and hosts are IPv4-only, durations and response body sizes must carry a unit suffix (5s, 1500ms, 10Kb), the h unit (for example 1h) is not a supported duration unit, compound durations (for example 1m30s) are not supported, and siteLabels.zone must not contain duplicates. Configs that silently passed earlier versions may start failing webmon-cli validate; please review CI pipelines that use it as a merge-request gate.
Tokens bound to multiple scopes are no longer editable through the UI. On the edit page such tokens are rendered as a read-only summary with the hint “Token is bound to multiple scopes; scope cannot be edited here.”; for tokens whose edit route cannot be derived unambiguously, the edit link is replaced with “Token is managed via API.” Scope changes for these tokens must be performed through the Observability API; rotation and archiving remain available in the UI.
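
Some of the stricter webmon-cli rules listed above can be approximated with a few small checks, which may help when pre-screening configs in CI. This is a sketch under assumptions, not the utility's real implementation: the helper names are hypothetical, and the accepted unit set (ms, s, m) is inferred from the examples in the note rather than confirmed.

```python
import ipaddress
import re

# Assumed grammar: an integer followed by exactly one unit.
# "1h" (unsupported unit) and "1m30s" (compound) do not match.
_DURATION = re.compile(r"[0-9]+(ms|s|m)")


def check_duration(value) -> bool:
    """True if value is a single-unit duration such as 5s or 1500ms."""
    return isinstance(value, str) and _DURATION.fullmatch(value) is not None


def check_ipv4(value) -> bool:
    """True if value is a dotted-quad IPv4 address string."""
    if not isinstance(value, str):
        return False
    try:
        ipaddress.IPv4Address(value)
    except ValueError:
        return False
    return True


def duplicate_zones(zones: list[str]) -> list[str]:
    """Return zone names that appear more than once, in first-repeat order."""
    seen, dupes = set(), []
    for zone in zones:
        if zone in seen and zone not in dupes:
            dupes.append(zone)
        seen.add(zone)
    return dupes
```

Running checks like these locally does not replace webmon-cli validate; the utility remains the reference for what the server accepts.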

Bug Fixes

Fixed saving of the SSL certificate expiry check in web-monitoring — previously such a check could not be saved.
Fixed status panels on web-monitoring dashboards: resolved duplicated rows, incorrect labels, and wrong cell colors.
Fixed an error when opening a project dashboard for a site with a non-ASCII name.
Fixed the “Use a custom address list” setting when editing a monitored site: turning it off now actually clears the previously saved addresses.
Fixed saving of check group names in web-monitoring, and fixed indistinguishable labels shown for multiple checks of the same type on the monitored site page.

v1.12.8

Bug Fixes

Fixed an issue where exporting web-monitoring reports as CSV could fail.

v1.12.7

Enhancements

The webmon-cli utility now supports assigning multiple monitoring zones to a site using comma-separated values in the zone label.
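
As an illustration of the comma-separated form, a zone label value could be split into individual zones roughly like this. The function name and the whitespace-trimming behavior are assumptions; the release note does not specify how webmon-cli treats spaces or empty segments.

```python
def parse_zone_label(value: str) -> list[str]:
    """Split a comma-separated zone label into zone names.

    Surrounding whitespace is trimmed and empty segments are dropped;
    both behaviors are assumptions made for this sketch.
    """
    parts = (part.strip() for part in value.split(","))
    return [part for part in parts if part]
```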

Bug Fixes

Fixed an issue where auto-discovered hosts and available dashboards could stop updating in the UI.

v1.12.6

Bug Fixes

Fixed web-monitoring configuration not being updated on monitoring agents when probes are removed from a site.

v1.12.5

Enhancements

Redesigned web-monitoring site creation form. A new simple mode allows quick setup with just a URL and sensible defaults; advanced mode provides full access to all configuration options including method, headers, authentication, TLS settings, and custom hosts.
Added change history for web-monitoring site configurations, showing a timeline of modifications with details of what was changed.
New webmon-cli command-line utility for importing and validating web-monitoring configurations from YAML files, replacing the previously built-in converter.

v1.12.4

Bug Fixes

Fixed module deployment validation failure on newer versions of Deckhouse Kubernetes Platform.

v1.12.3

Bug Fixes

Fixed an issue where some components could fail to start.

v1.11.5

Bug Fixes

Fixed an issue where some components could fail to start.

v1.10.5

Bug Fixes

Fixed an issue where some components could fail to start.

v1.12.1

Bug Fixes

Fixed web-monitoring zone assignment not being saved correctly when creating or editing a monitored site.
Fixed web-monitoring API not accepting site names containing dots (e.g. vcenter.example.com).
Fixed TLS certificate verification failing for sites monitored via explicit IP addresses instead of DNS resolution.
Fixed web-monitoring dashboard zone filter showing no values.
Fixed check configuration inputs being too narrow to display values.

v1.12.0

Features

Web-monitoring. Added a distributed monitoring system for checking website availability from external monitoring points. Monitoring agents run on servers outside the cluster, perform HTTP, TCP, DNS, and ICMP ping checks, and send results back to the platform.
- Configure monitored websites per project with customizable check intervals (10s–60s), response validation rules (status codes, response body, headers, SSL certificates), and monitoring zones.
- Manage monitoring zones and agents in the admin UI. Agents support one-liner installation, automatic binary updates, and status tracking (online/offline).
- Built-in Grafana dashboards: overall monitoring status, per-zone status, per-site metrics, and agent logs.
- Automatic alerting for site failures, DNS resolution errors, and monitoring zone health issues.
- Outage detection and availability reporting with 24-hour and 30-day availability metrics.
- Web-monitoring API for programmatic site management.
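
To make the 24-hour and 30-day availability figures concrete, the sketch below computes availability as the share of a time window not covered by outages. It is an illustration only: the function name, the outage representation (start and end pairs), and the assumption that outages do not overlap are hypothetical and may differ from how the platform calculates its reports.

```python
from datetime import datetime, timedelta


def availability(outages, window_end: datetime, window: timedelta) -> float:
    """Return the percentage of the window during which the site was up.

    outages is an iterable of (start, end) datetime pairs, assumed not to
    overlap one another.
    """
    window_start = window_end - window
    downtime = timedelta()
    for start, end in outages:
        # Clip each outage to the reporting window before counting it.
        clipped_start = max(start, window_start)
        clipped_end = min(end, window_end)
        if clipped_end > clipped_start:
            downtime += clipped_end - clipped_start
    return 100.0 * (1 - downtime / window)
```

For example, a single 36-minute outage inside a 24-hour window leaves 1404 of 1440 minutes available, which is 97.5%.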

Enhancements

Reduced scheduler memory consumption.

v1.11.2

Enhancements

Improved stability of the managed-postgres migration process.
Dependent components (backend, scheduler, alertgate) now automatically restart when database credentials change.
Increased test coverage for built-in hooks with CI integration and coverage reporting.

v1.11.0

Enhancements

Added alerting for log storage when log samples are being discarded due to exceeded limits, helping administrators identify projects that need limit adjustments.

Changes

Built-in PostgreSQL is now powered by the Deckhouse Kubernetes Platform managed-postgres module. For installations using the built-in database (not an external PostgreSQL), the previous operator-postgres has been replaced with the DKP managed-postgres module. This provides faster security updates and CVE remediation, as well as a production-grade managed database with broader functionality. For backward compatibility, the previous operator-postgres scheme is still supported but will be removed in the next release. Before upgrading, please review the migration guide.

Bug Fixes

Fixed audit log verbosity to reduce noise in production logs.
Fixed a crash in the deploy configuration hook that occurred when trace or log storage was disabled.

v1.10.3

Bug Fixes

Fixed a crash during module deployment when trace storage, log storage, or monitoring subsystems are not enabled.

v1.10.2

Bug Fixes

Fixed metric storage caching that could cause increased latency and load on long-term storage.
Fixed trace storage alerting rules that could produce false-positive alerts.
Fixed project dashboards displaying internal tabs that should not be visible to users.
Fixed insufficient resource limits for auxiliary components on large installations.

v1.10.1

Bug Fixes

Fixed store-gateway caching.

v1.10.0

Features

Added distributed trace storage. The platform now supports receiving, storing, and viewing traces:
- Trace ingestion via OpenTelemetry (OTLP), Jaeger, and Zipkin protocols.
- Trace viewing and search in the UI.
- Per-project trace ingestion limits configurable via admin UI and API.
- Trace storage usage statistics.
- Architecture documentation and integration instructions.
Space and project pages now use human-readable names in URLs instead of UUIDs. Previously bookmarked UUID-based URLs continue to work.

Enhancements

The monitoring agent can now be deployed on nodes using the containerdV2 container runtime.
Eliminated false-positive alerts about cluster sync status in single-cluster deployments.
Updated monitoring agent to v0.89.5:
- Fixed OS version detection when the agent runs inside a container.
- Fixed a crash that could occur on CentOS 7 during package version detection.
- Fixed PostgreSQL plugin issues when monitoring Yandex Cloud RDS instances.
Updated alert names, descriptions, and aggregation rules for better clarity and fewer false positives.

Bug Fixes

Fixed a data migration issue where duplicate Prometheus recording rules could cause migration failures.
Fixed special character handling in PromQL expressions, preventing query errors when alert rules or dashboards contain single quotes or other reserved characters.
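
For context on the PromQL fix above: PromQL string literals follow Go-style escaping, so a value placed inside double quotes needs backslashes and double quotes escaped, while single quotes can appear as-is. The helpers below are a minimal sketch of that idea with hypothetical names; they are not the platform's actual code.

```python
def promql_quote(value: str) -> str:
    """Return value as a double-quoted PromQL string literal."""
    escaped = (
        value.replace("\\", "\\\\")  # backslashes first, so later escapes are not doubled
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


def build_selector(metric: str, labels: dict) -> str:
    """Build a selector such as up{job="api"} from a metric name and label values."""
    matchers = ", ".join(
        f"{name}={promql_quote(val)}" for name, val in labels.items()
    )
    return f"{metric}{{{matchers}}}"
```

Building selectors through a quoting helper, instead of plain string concatenation, is one way to keep values that contain quotes from breaking the query.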

Security

Fixed Critical and High severity CVEs in trace storage components and base images.
Updated base container images to the latest available versions.
Monitoring agent Go runtime upgraded to 1.24.12 to address known CVEs.

v1.9.9

Security

Switched all Ceph components to distroless images to reduce security risks associated with unused software in standard container images.

Enhancements

Documentation updates: added instructions for configuring the Nginx plugin to monitor Nginx using custom log paths and log formats.

Bug Fixes

Updated memory and disk usage units from SI to IEC on the Main dashboard to ensure consistency with other dashboards.
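
The difference between SI and IEC units is easy to see with a small formatter: SI prefixes step by 1000 (kB, MB, GB), while IEC prefixes step by 1024 (KiB, MiB, GiB). The function below is an illustrative sketch with a hypothetical name, not the dashboard's formatting code.

```python
def format_bytes(n: float, iec: bool = True) -> str:
    """Format a byte count using IEC (base 1024) or SI (base 1000) units."""
    base = 1024 if iec else 1000
    units = ["B", "KiB", "MiB", "GiB", "TiB"] if iec else ["B", "kB", "MB", "GB", "TB"]
    value = float(n)
    for unit in units:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < base or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= base


if __name__ == "__main__":
    ram = 8 * 1024**3  # a host with 8 GiB of memory
    print(format_bytes(ram, iec=True))   # 8.00 GiB
    print(format_bytes(ram, iec=False))  # 8.59 GB
```

The same amount of memory reads as 8.00 GiB in IEC units and 8.59 GB in SI units, which is why mixing the two conventions across dashboards is confusing.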

v1.9.8

Security

Fixed vulnerabilities related to CVE-2024-25621.

v1.9.5

Security

Fixed vulnerabilities related to the following CVEs: CVE-2025-30153, CVE-2025-47912, CVE-2025-58183, CVE-2025-58185, CVE-2025-58186, CVE-2025-58187, CVE-2025-58188, CVE-2025-58189, CVE-2025-61723, CVE-2025-61724, CVE-2025-61725.

v1.9.4

Enhancements

Added support for process discovery inside containers run by CRI-O.
Improved user creation flow for internal authentication: superusers can now create users and set passwords without requiring email confirmation.

Bug Fixes

Updated monitoring agent to v0.89.0:
- Fixed MySQL replication metrics collection for MySQL 8.5+.
- Fixed an issue where the PostgreSQL version could be incorrectly detected, leading to inaccurate metric collection.

v1.9.3

Security

Fixed Critical and High severity CVEs.

v1.9.2

Security

Fixed Critical and High severity CVEs.

v1.9.1

Bug Fixes

Fixed an issue where the operator-ceph could get stuck in a Pending state.

v1.9.0

Enhancements

Updated backup documentation: added an rclone example for backup and restore.
Hid internal CRDs from public documentation.

Security

Fixed all Critical and High severity CVEs.

v1.8.0

Changes

Integrated the Ceph operator into the module. The external ceph-operator module is no longer required.
Switched to DKP-provided base images for some components to ensure better compatibility and maintainability.
Updated base container images with partial CVE fixes to improve security.

Bug Fixes

Resolved issues in hooks that could lead to incomplete or faulty module installation.

v1.7.3

Bug Fixes

Fixed an issue preventing correct exit from dashboard edit mode.

v1.7.2

Bug Fixes

Fixed an issue where the GeneratorURL field of an alert was incorrectly appended with the service’s internal port, causing the link to malfunction.

v1.7.1

Bug Fixes

Fixed an issue that prevented the backend from establishing connections to external services — a regression introduced in version 1.7.0. This affected authorization workflows using third-party systems.
Updated the backend application’s security context to comply with the “restricted” security policy.

v1.7

Changes

Switched all components previously based on ALT Linux images to distroless images.
Added livenessProbe and readinessProbe to pods where they were missing.
Migrated all components to use the same security context as in DKP.
Updated distroless images to use the distroless base image from DKP.
Fixed multiple CVEs and build warnings.

Enhancements

Migrated service ports from 80 and 443 to high ports.

Features

Added the ability for users with the User role to create dashboard/rules API tokens.
Included several widely used plugins into the Grafana image.

Bug Fixes

Fixed certificate validation issues caused by a missing CA bundle.
Added the missing registry subdomain to TLS certificate SANs.
Fixed an issue with disabling/overwriting triggers when deleting a parent trigger.
Fixed missing category in the dashboard form.
Fixed an issue where the emergency brake (which stops metric and log collection when Ceph is near capacity) did not activate if monitoring components were disabled in ModuleConfig.
Migrated dashboards using Loki data sources to use the default-logs data source.
Fixed false-positive warnings about invalid data sources when using default-logs in dashboards.
Renamed incorrect metric names in the Redis dashboard.