Module lifecycle stage: General Availability
General Information
Main entities in the system:
- Project — an entity that stores monitoring data. Depending on the configuration, this can include metrics, logs, or traces. The configuration is determined by the type of license purchased. Within a project, trigger processing (Alerting Rules) and notification rules (Alertmanager configuration) are configured.
- Workspace — a logical grouping of projects.
- Entire System — a logical grouping of workspaces.
- API Token — an entity used to access the API and the monitoring data stored in the system. With an API token, you can perform various operations such as reading or writing metrics and managing Alerting Rules.
- User — an account in the system that has various access rights to different system entities.
Entire System
This section provides capabilities for managing all users, workspaces, and projects. Global dashboards, global triggers, and notification channels can also be configured in this section.
Workspaces
In this subsection, the user can view workspaces and their associated projects, as well as manage workspaces directly: create and delete them.
Important: a user can only see the workspaces and projects they have access to. If a user has access to only one project in a workspace that contains multiple projects, the user will only see the workspace itself and the project they have access to.
Dashboards
Dashboards in Deckhouse Observability Platform provide visual representations of system metrics and data that help users analyze the performance and health of their infrastructure. They display charts and panels based on metrics and logs, making it more convenient and efficient to track key indicators and perform detailed data analysis.
Dashboard Hierarchy
In Deckhouse Observability Platform, dashboards are organized according to the following hierarchy:
- Installation-level dashboards — pre-installed dashboards based on opAgent metrics and Deckhouse Kubernetes Platform metrics. They are an integral part of the distribution; neither users nor administrators can modify them or view their list. These dashboards are displayed contextually, when the key metric required by each dashboard is available.
- Entire System-level dashboards — dashboards created in the system and available in all projects. Users at the project level cannot modify or delete these dashboards, which is useful for distributing standard dashboards.
  Important: modifying an Entire System-level dashboard updates it across all projects. Be careful when making changes.
- Project-level dashboards — dashboards that users create on their own within their projects. They are used to build charts based on project-specific metrics, to visualize key aspects of service operation, incoming data, and the state of the project itself.
Dashboard Management
In this subsection, the user can:
- view, edit, and delete Entire System-level dashboards;
- view, edit, and delete directories for Entire System-level dashboards.
Dashboard Actions
- Creating a new dashboard:
- click the “Add” button;
- fill in the following fields:
- “Name” — specify the dashboard name;
- “Directory” — select a directory to place the dashboard in;
- “Edit JSON” — paste the dashboard JSON configuration in Grafana format;
- save the dashboard.
Important: before creating the first dashboard, you must create a directory, as placing dashboards outside of directories is not possible.
- Editing a dashboard:
- open the dashboard for editing;
- copy and paste the updated JSON configuration;
- save the changes.
- Deleting a dashboard:
- open the dashboard for editing;
- click the “Delete” button.
Important: be careful when making changes to Entire System-level dashboards and directories, as they will affect all projects in the system.
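The “Edit JSON” field expects a dashboard definition in Grafana's JSON format. A minimal hypothetical example (the panel title, metric, and schema version are placeholders; the exact set of supported fields depends on the dashboard schema version):

```json
{
  "title": "Example Dashboard",
  "schemaVersion": 39,
  "panels": [
    {
      "type": "timeseries",
      "title": "HTTP request rate",
      "gridPos": { "x": 0, "y": 0, "w": 12, "h": 8 },
      "targets": [
        { "expr": "sum(rate(http_requests_total[5m])) by (job)" }
      ]
    }
  ]
}
```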
Creating, Editing, and Deleting Directories
- Creating a new directory:
- click the “Add” button;
- enter the directory name;
- save the directory.
- Editing a directory:
- open the directory list;
- select the directory you want to modify;
- make the necessary changes;
- save the changes.
- Deleting a directory:
- open the directory list;
- select the directory you want to delete;
- click the “Delete” button;
- confirm the deletion.
Important: when a directory is deleted, all dashboards within it will be permanently deleted without the possibility of recovery.
Notification Center
The Notification Center provides the user with the ability to manage triggers and recording rules at the Entire System level, as well as configure notification channels.
Triggers
A trigger is a rule that defines conditions under which a specific event (alert) is fired. In Deckhouse Observability Platform, triggers do not directly deliver notifications. For notifications to be delivered, you need to configure notification channels and specify the relationship between triggers and those channels.
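Since notification rules follow the Alertmanager configuration format, the relationship between triggers and channels can be pictured as a routing tree. A hypothetical sketch (the receiver names and the label matcher are placeholders, not values from this installation):

```yaml
route:
  receiver: default-email        # channel used when no other route matches
  routes:
    - matchers:
        - severity = critical    # alerts whose trigger sets this label...
      receiver: oncall-telegram  # ...are delivered to the Telegram channel
receivers:
  - name: default-email
  - name: oncall-telegram
```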
Trigger Hierarchy
In Deckhouse Observability Platform, triggers are organized into a three-level model:
- Installation-level triggers — pre-installed triggers configured by Deckhouse Observability Platform developers. They are created based on the metrics collected by opAgent and are an integral part of the software.
- Entire System-level triggers — triggers created by Deckhouse Observability Platform administrators, available in all projects across the entire system.
- User triggers — triggers that users create in their projects.
Trigger Management
In this subsection, the user can:
- view, override, or disable installation-level triggers. Changes to these triggers affect all projects;
- create and manage Entire System-level triggers. These triggers are available in all projects, and users at the project level can override or disable them.
Important: be careful when making changes to Entire System-level triggers, as they will affect all projects in the system.
Viewing Triggers
Triggers in the Notification Center interface are displayed in two sections:
- installation-level triggers are displayed in the “Default” section;
- Entire System-level triggers are displayed in the “Custom” section.
Trigger Actions
The following actions are available to users for triggers:
- Disabling installation-level triggers:
- find the desired trigger in the “Default” list;
- click the “Disable Rule” button;
The trigger can later be re-enabled by clicking the “Enable Rule” button.
- Overriding installation-level triggers:
- find the desired trigger in the “Default” list;
- click the “Override Rule” button;
- make the necessary changes and save them;
- the overridden trigger will move from the “Default” list to the “Custom” list.
- Reverting an Entire System-level trigger to its original value:
- find the desired trigger in the “Custom” list;
- click the “Revert to Original” button;
- the trigger will move back to the “Default” list.
- Creating an Entire System-level trigger:
- click the “Add” button;
- fill in the corresponding fields:
- “Name” — enter a name to be used as the trigger name.
- “Expression” — write a PromQL expression that will be evaluated to check the trigger. The trigger is considered fired if the query returns at least one series. For example:
  sum(rate(http_requests_total[5m])) > 10
- “Delay before alert fires” — specify how long the alert will not be considered active after the firing condition is met. This helps avoid flapping alerts.
- “Duration of firing after resolution” — specify how long the alert will remain active after the condition is no longer met. This is useful for avoiding flapping alerts on borderline values.
- “Labels” — specify key-value pairs for labels that will be added to the alert.
- “Annotations” — specify additional metadata or information that should be part of the alert (e.g., troubleshooting instructions or contact information).
- Deleting a trigger:
- open the trigger for editing;
- click the “Delete” button.
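Taken together, the trigger fields map naturally onto a Prometheus-style alerting rule. A hypothetical sketch (the names, durations, and labels are placeholders; the platform's internal storage format may differ):

```yaml
groups:
  - name: example
    rules:
      - alert: HighRequestRate                        # “Name”
        expr: sum(rate(http_requests_total[5m])) > 10 # “Expression”
        for: 5m                 # “Delay before alert fires”
        keep_firing_for: 10m    # “Duration of firing after resolution”
        labels:
          severity: warning     # “Labels”
        annotations:            # “Annotations”
          summary: Request rate is above 10 rps
```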
Recording Rules
Recording rules allow you to compute new metrics based on existing data, thereby reducing the load on dashboards and enabling complex calculations to be performed in advance.
Users can create, modify, and manage recording rules, which improves system performance and enhances analytical capabilities.
Recording Rules Hierarchy
In Deckhouse Observability Platform, recording rules are organized into a three-level model:
- Installation-level recording rules — pre-installed rules configured by Deckhouse Observability Platform developers, based on data collected by opAgent.
- Entire System-level recording rules — created by Deckhouse Observability Platform administrators and available in all projects within the system.
- User recording rules — created by users within specific projects.
Recording Rules Management
In this subsection, the user can:
- view, override, and disable installation-level recording rules;
- create and manage Entire System-level recording rules.
Important: be careful when making changes to Entire System-level recording rules, as they will affect all projects in the system.
Viewing Recording Rules
Recording rules in the Notification Center interface are displayed in two sections:
- “Default” — displays installation-level recording rules.
- “Custom” — displays Entire System-level recording rules.
Recording Rules Actions
- Disabling an installation-level recording rule:
- click the “Disable Rule” button next to the desired rule;
- to re-enable the rule, click “Enable Rule”.
- Overriding an installation-level recording rule:
- click the “Override Rule” button next to the rule;
- make the necessary changes and save them;
- after saving, the rule will disappear from the “Default” list and appear in the “Custom” list;
- to revert to the original value, find the rule in the “Custom” list and click “Revert to Original”.
- Creating a new Entire System-level recording rule:
- click the “Add” button;
- fill in the required fields:
- “Name” — specify the rule name;
- “Expression” — write a PromQL expression to compute the rule. For example:
  avg_over_time(http_requests_total[5m])
- “Labels” — define labels (key-value pairs) for additional classification and filtering of metrics.
- Deleting a recording rule:
- open the rule for editing;
- click the “Delete” button.
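A recording rule combines the same kind of fields. A hypothetical Prometheus-style sketch (the rule name and label are placeholders; the platform's internal format may differ):

```yaml
groups:
  - name: example-recording
    rules:
      - record: job:http_requests:avg5m                # “Name” of the new series
        expr: avg_over_time(http_requests_total[5m])   # “Expression”
        labels:
          team: platform                               # “Labels” added to the result
```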
Notification Channels
Notification channels allow you to configure notification delivery methods through various channels. All configured channels will be available when setting up notification delivery at the project level. This allows centralized management of notification channels, hiding access to sensitive data (e.g., username and password for the SMTP server) and leaving it only to Entire System administrators. Users can quickly start sending notifications without worrying about access configuration.
The following delivery channels are currently supported:
- Email — delivery via email;
- Slack — delivery to messengers compatible with the Slack API;
- Telegram — delivery to the Telegram messenger;
- Webhook — configuration of custom webhooks.
Creating a Channel
To create a new delivery channel, follow these steps:
- click the “Add” button;
- select the type of channel you want to create.
Configuring an Email Channel
- select “Create Email Channel”;
- fill in the required fields:
- “Channel Name” — specify a convenient name for the channel;
- “Sender” — the email address that will be listed as the sender;
- “Host” — the SMTP server address;
- “Username” — the username for SMTP server authentication;
- “Password” — the password for SMTP server authentication;
- “Authorization Key” — an additional key for authentication (if required);
- “Authorization Secret” — an additional secret for authentication (if required);
- “Require TLS” — specify whether TLS is required for the connection to the SMTP server.
Configuring a Slack Channel
- select “Create Slack Channel”;
- fill in the required fields:
- “Channel Name” — specify a convenient name for the channel;
- “API URL” — specify the API URL for Slack integration.
Configuring a Telegram Channel
- select “Create Telegram Channel”;
- fill in the required fields:
- “Channel Name” — specify a convenient name for the channel;
- “API URL” — the Telegram API URL (default: https://api.telegram.org);
- “Telegram Bot Token” — the token obtained when creating a Telegram bot.
Configuring a Webhook Channel
- select “Create Webhook Channel”;
- fill in the required fields:
- “Channel Name” — specify a convenient name for the channel.
Creating this channel allows you to make arbitrary webhook requests in projects, providing flexibility in integrating the system with other services and applications.
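A webhook integration endpoint typically receives an Alertmanager-compatible JSON payload. A hypothetical example of what such an endpoint might be sent (all values are placeholders):

```json
{
  "version": "4",
  "status": "firing",
  "receiver": "my-webhook",
  "alerts": [
    {
      "status": "firing",
      "labels": { "alertname": "HighRequestRate", "severity": "warning" },
      "annotations": { "summary": "Request rate is above 10 rps" },
      "startsAt": "2024-01-01T00:00:00Z",
      "endsAt": "0001-01-01T00:00:00Z"
    }
  ]
}
```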
Editing a Channel
To edit an existing delivery channel, follow these steps:
- select the channel you want to edit and click the “Edit” button;
- make the necessary changes to the fields;
- save the changes.
Deleting a Channel
To delete an existing delivery channel, follow these steps:
- select the channel you want to delete;
- open the channel for editing;
- click the “Delete” button.
Entire System Usage Statistics
In this section, the user can view statistics on Deckhouse Observability Platform usage. The data is aggregated by workspaces and presented as various charts and metrics, allowing you to assess performance and data volumes.
Entire System Metrics
- “Metrics: Active Series” — the number of active series. An active series is a unique series that has received data within the last two hours.
- “Metrics: Raw incoming rate” — the incoming metric stream per second. This chart displays data after deduplication.
- “Metrics: Total count of samples” — the total number of samples stored in long-term storage. A sample is an individual metric value collected at a specific point in time.
- “Metrics: Discarded samples” — errors when writing series. Possible error causes include:
  - sample_out_of_bounds — the sample’s timestamp is outside the allowed time range;
  - sample_out_of_order — the sample arrived out of order;
  - sample_too_old — the sample is too old and is not accepted by the system;
  - sample_too_far_in_future — the sample’s timestamp is too far in the future;
  - new_value_for_timestamp — an attempt to write a different value for the same timestamp;
  - per_user_series_limit — the active series limit for the project has been exceeded;
  - metric_series_limit — the per-metric series limit has been exceeded.
- “Metrics: Storage usage” — the amount of disk space used by metrics in long-term storage.
Logs
- “Logs: Raw incoming rate” — the incoming log stream.
- “Logs: Discarded samples” — errors when writing logs. Possible error causes include:
  - rate_limited — the write rate limit has been exceeded;
  - stream_limit — the stream count limit has been exceeded;
  - label_name_too_long — the label name length exceeds the allowed limit;
  - label_value_too_long — the label value length exceeds the allowed limit;
  - line_too_long — the line length exceeds the allowed limit;
  - max_label_names_per_series — the limit on the number of label names per series has been exceeded;
  - per_stream_rate_limit — the per-stream write rate limit has been exceeded.
- “Logs: Storage usage” — the amount of disk space used by logs.
Users
This section displays the list of all users. User management capabilities depend on the type of authentication configured in Deckhouse Observability Platform.
External Authentication
If Deckhouse Observability Platform has authentication via external systems enabled (e.g., LDAP, OAuth, etc.), only viewing the user list is available in this section.
Internal Authentication
If Deckhouse Observability Platform uses internal authentication (email and password), additional user management capabilities are available in this section:
- “Delete Users” — removing users from the system.
- “Change User Role” — changing a user’s role. More details about the role model can be found in the corresponding documentation section.
Important: this section displays the user’s role at the Entire System level. A user may have different roles with different sets of privileges for other objects in the system.
API Tokens
In this section, the user can view, create, edit, and delete API tokens issued in all projects of the system. API tokens are used to access the API, monitoring data, and logs stored in the system, and allow various operations such as reading or writing metrics and logs and managing triggers. The available API can be reviewed in the corresponding documentation section.
Viewing the API Token List
The user can view the list of all API tokens issued for projects within the system. This allows monitoring current tokens and preventing unauthorized access.
Reissuing a Token
To reissue a token:
- find the desired token in the list;
- click the “Reissue Token” button.
Important: after reissuing a token, all integrations using that token will stop working until the old token is updated and replaced with the new one.
Creating a Token
To create a new API token, follow these steps:
- click the “Add” button;
- fill in the required fields:
- “Name” — specify a name for the token that describes its purpose (e.g., Token for Prometheus remote write);
- “Expiration” — determines how long the token remains valid:
- “Date” — specify a specific date and time until which the token will be valid. After this time, the token becomes invalid;
- “Unlimited” — if no date is specified, the token remains valid until it is deleted;
- “Permissions” — select a combination of access rights matching the required set of permissions. More details about the role model can be found in the corresponding documentation section;
- “Scope” — determines which system entities (Entire System, workspaces, and projects) the token will apply to.
Note on multi-tenant tokens: if the scope of the selected token covers multiple projects, a workspace, or the entire system, the token is classified as multi-tenant. These tokens have the following characteristics:
- they are limited to read-only access to monitoring data from the selected scope;
- they provide access to data from all projects within the specified scope;
- queries using a multi-tenant token are executed across the entire data volume of projects in the selected scope, providing comprehensive access to monitoring information.
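In practice, reading data with a token amounts to an authenticated HTTP request. A minimal Python sketch for illustration only: the base URL, the endpoint path, and the Bearer authentication scheme are all assumptions here, not documented values, so check your installation's API reference for the actual ones.

```python
# Hypothetical sketch of using an API token to read metrics.
# BASE_URL, the endpoint path, and the Bearer scheme are assumptions.
import urllib.parse
import urllib.request

API_TOKEN = "dop_example_token"                  # placeholder, not a real token
BASE_URL = "https://observability.example.com"   # hypothetical address

params = urllib.parse.urlencode(
    {"query": "sum(rate(http_requests_total[5m]))"}  # PromQL instant query
)
req = urllib.request.Request(
    f"{BASE_URL}/api/v1/query?{params}",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
)

# urllib.request.urlopen(req) would actually execute the query; here we
# only show how the authenticated request is constructed.
print(req.full_url)
print(req.get_header("Authorization"))
```

With a multi-tenant token, the same request would return data aggregated across every project in the token's scope.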
Editing a Token
To edit an existing token, follow these steps:
- find the desired token in the list;
- click the “Edit” button;
- make the necessary changes:
- change the token name;
- update the expiration date or set one if it was not previously set;
- change the scope or permissions.
Deleting a Token
To delete an existing token, follow these steps:
- find the desired token in the list;
- click the “Delete” button.
Important: after deleting a token, all integrations using that token will immediately stop working.
Workspace
This section provides capabilities for managing all users and projects belonging to the selected workspace.
Projects
In this subsection, the user can view projects belonging to the current workspace, as well as manage projects: create and delete them.
Important: a user can only see the projects they have access to.
Workspace Usage Statistics
In this section, the user can view statistics on Deckhouse Observability Platform usage. The data is aggregated by projects of the current workspace and presented as various charts and metrics, allowing you to assess performance and data volumes.
Workspace Metrics
- “Metrics: Active Series” — the number of active series. An active series is a unique series that has received data within the last two hours.
- “Metrics: Raw incoming rate” — the incoming metric stream per second. This chart displays data after deduplication.
- “Metrics: Total count of samples” — the total number of samples stored in long-term storage. A sample is an individual metric value collected at a specific point in time.
- “Metrics: Discarded samples” — errors when writing series. Possible error causes include:
  - sample_out_of_bounds — the sample’s timestamp is outside the allowed time range;
  - sample_out_of_order — the sample arrived out of order;
  - sample_too_old — the sample is too old and is not accepted by the system;
  - sample_too_far_in_future — the sample’s timestamp is too far in the future;
  - new_value_for_timestamp — an attempt to write a different value for the same timestamp;
  - per_user_series_limit — the active series limit for the project has been exceeded;
  - metric_series_limit — the per-metric series limit has been exceeded.
- “Metrics: Storage usage” — the amount of disk space used by metrics in long-term storage.
Workspace Logs
- “Logs: Raw incoming rate” — the incoming log stream.
- “Logs: Discarded samples” — errors when writing logs. Possible error causes include:
  - rate_limited — the write rate limit has been exceeded;
  - stream_limit — the stream count limit has been exceeded;
  - label_name_too_long — the label name length exceeds the allowed limit;
  - label_value_too_long — the label value length exceeds the allowed limit;
  - line_too_long — the line length exceeds the allowed limit;
  - max_label_names_per_series — the limit on the number of label names per series has been exceeded;
  - per_stream_rate_limit — the per-stream write rate limit has been exceeded.
- “Logs: Storage usage” — the amount of disk space used by logs.
Workspace Members
In this section, you can view the list of users who have access to the current workspace. User management capabilities depend on the type of authentication configured in Deckhouse Observability Platform.
Workspace External Authentication
If Deckhouse Observability Platform has authentication via external systems enabled (e.g., LDAP, OAuth, etc.), only viewing the user list is available in this section.
Workspace Internal Authentication
If Deckhouse Observability Platform uses internal authentication (email and password), additional user management capabilities are available in this section:
- “Delete Users” — the ability to remove users from the system;
- “Change User Role” — the ability to change a user’s role. More details about the role model can be found in the corresponding documentation section.
Important: this section displays the user’s role at the Entire System level. A user may have different roles with different sets of privileges for other objects in the system.
Workspace API Tokens
In this section, the user can view, create, edit, and delete API tokens issued in all projects of the current workspace. API tokens are used to access the API, monitoring data, and logs stored in the system, and allow various operations such as reading or writing metrics and logs, as well as managing triggers. The available API can be reviewed in the corresponding documentation section.
Viewing the Workspace API Token List
The user can view the list of all API tokens issued for projects within the current workspace. This allows monitoring current tokens and preventing unauthorized access.
Reissuing a Workspace Token
To reissue a token:
- find the desired token in the list;
- click the “Reissue Token” button.
Important: after reissuing a token, all integrations using that token will stop working until the old token is updated and replaced with the new one.
Creating a Workspace Token
To create a new API token, follow these steps:
- click the “Add” button;
- fill in the required fields:
- “Name” — specify a name for the token that describes its purpose (e.g., Token for Prometheus remote write);
- “Expiration” — determines how long the token remains valid:
- “Date” — specify a specific date and time until which the token will be valid. After this time, the token becomes invalid;
- “Unlimited” — if no date is specified, the token remains valid until it is deleted;
- “Permissions” — select a combination of access rights matching the required set of permissions. More details about the role model can be found in the corresponding documentation section;
- “Scope” — determines which system entities (Entire System, workspaces, and projects) the token will apply to.
Note on multi-tenant tokens: if the scope of the selected token covers multiple projects, a workspace, or the entire system, the token is classified as multi-tenant. These tokens have the following characteristics:
- they are limited to read-only access to monitoring data from the selected scope;
- they provide access to data from all projects within the specified scope;
- queries using a multi-tenant token are executed across the entire data volume of projects in the selected scope, providing comprehensive access to monitoring information.
Editing a Workspace Token
To edit an existing token, follow these steps:
- find the desired token in the list;
- click the “Edit” button;
- make the necessary changes:
- change the token name;
- update the expiration date or set one if it was not previously set;
- change the scope or permissions.
Deleting a Workspace Token
To delete an existing token, follow these steps:
- find the desired token in the list;
- click the “Delete” button.
Important: after deleting a token, all integrations using that token will immediately stop working.
Workspace Settings
In the “Settings” section, users can manage the basic parameters of a workspace, such as its name, etc. This provides administrators with the ability to keep data up to date and, if necessary, delete the workspace.
Changing the Workspace Name
To change the workspace name, follow these steps:
- go to the “Settings” section;
- find the “Workspace Name” field;
- enter a new name for the workspace;
- click the “Save” button.
Deleting a Workspace
To delete a workspace, follow these steps:
- go to the “Settings” section;
- click the “Delete” button;
- confirm the deletion by following the on-screen instructions.
Important: deleting a workspace will result in the deletion of all associated data, such as projects, dashboards, and triggers. All integrations associated with the projects of this workspace will stop working. Be careful and make sure you want to delete the workspace before confirming the action.
Project
Project-Level Dashboards
Project-level dashboards in Deckhouse Observability Platform allow users to create charts and panels based on project-specific metrics, as well as other metrics, to visualize key aspects of service operation, incoming data, and the state of the project itself. In addition, Entire System-level and installation-level dashboards are displayed here in read-only mode.
Project Dashboard Hierarchy
In Deckhouse Observability Platform, dashboards are organized into the following hierarchy:
- Installation-level dashboards — pre-installed dashboards based on opAgent metrics and Deckhouse Kubernetes Platform metrics. They are an integral part of the distribution; neither users nor administrators can modify them or view their list. These dashboards are displayed contextually, when the key metric required by each dashboard is available.
- Entire System-level dashboards — dashboards available in all projects. Users at the project level cannot modify or delete these dashboards, which is useful for distributing standard dashboards.
  Important: modifying an Entire System-level dashboard updates it across all projects. Be careful when making changes.
- Project-level dashboards — dashboards that users create on their own within their projects. They are used to build charts based on project-specific metrics, as well as other metrics, to visualize key aspects of service operation, incoming data, and the state of the project itself.
Project Dashboard Management
In this subsection, the user can:
- view, edit, and delete project-level dashboards and directories;
- view Entire System-level and installation-level dashboards (read-only).
Viewing Project Dashboards
Dashboards in the interface are displayed in three sections:
- “Servers” — displays a list of hosts where opAgent is installed:
  - “Active” — a list of hosts from which data has been received within the last hour;
  - “Stale” — a list of hosts from which data was previously received but is no longer being received. After data stops arriving, a host moves from “Active” to “Stale” within one hour.
- “Dashboards” — a list of user dashboards and Entire System-level and installation-level dashboards (available in read-only mode).
- “Integrations” — a list of dashboards grouped by integrations (databases, queue managers, etc.). This section displays dashboards based on services discovered by opAgent.
Project Server Actions
- Viewing dashboards for a server:
- click on the server name.
Available dashboards for all servers:
- “Common” — basic server information: CPU, LA, memory consumption;
- “Disk” — used and available disk space resources;
- “Network” — network interfaces and their activity;
- “Netstat” — network connection statistics;
- “Process” — the state of active processes on the server;
- one or more additional dashboards (depending on services discovered by opAgent).
- Deleting a host:
  - Deleting a host from the active list — if a host was accidentally added to a project, it can be deleted. If data sending is not stopped, the host will reappear in the active list after one hour. To delete a host, click the “Delete” button.
  - Deleting a host from the stale list — if the host data is no longer needed and data is no longer being received from it, the host can be hidden from the list. To do this, click the “Delete” button.
- Viewing active alerts:
- Next to each server, the number of active alerts is displayed. Clicking this icon navigates to the “Notification Center” section, where the output will be filtered by the selected server.
Project Integration Actions
- Viewing dashboards for an integration:
- find the desired service in the list and click on its name.
- Viewing active alerts:
- next to each service, the number of active alerts is displayed. Clicking this icon navigates to the “Notification Center” section, where the output will be filtered by the selected service.
Project Dashboard Actions
- Creating a new dashboard:
- click the “Add” button.
Important: before creating the first dashboard, you must create a directory, as placing dashboards outside of directories is not possible.
- fill in the following fields:
- “Name” — the dashboard name;
- “Directory” — the directory to place the dashboard in;
- “Edit JSON” — the dashboard JSON configuration in Grafana format, if you have one, or leave the default value;
- save the dashboard.
As a result, the dashboard composition interface will open.
- Viewing a dashboard:
- find the desired dashboard and click on its name.
- Editing a dashboard:
- open the dashboard for editing;
- copy and paste the updated JSON configuration;
- change the name (if needed);
- change the placement directory (if needed);
- save the changes.
Alternative: open the desired dashboard, edit or add the needed panels, and save the changes.
- Deleting a dashboard:
- open the dashboard for editing;
- click the “Delete” button.
Creating, Editing, and Deleting Project Directories
- Creating a new directory:
- click the “Create” button;
- enter the directory name;
- save the directory.
- Editing a directory:
- open the directory list;
- select the directory you want to modify;
- make the necessary changes;
- save the changes.
- Deleting a directory:
- open the directory list;
- select the directory you want to delete;
- click the “Delete” button;
- confirm the deletion.
Important: when a directory is deleted, all dashboards within it will be permanently deleted without the possibility of recovery.
Data Overview
In the “Data Overview” section, users can execute read queries for data such as metrics, logs, and traces, as well as queries to external storages provided that additional data sources are configured. More details about data sources can be found in the corresponding documentation section.
Capabilities
- querying metrics, logs, and traces — executing read queries for data stored in Deckhouse Observability Platform;
- querying external storages — if additional data sources are configured, users can execute queries to external systems.
Query Languages
The following query languages are used depending on the data source type:
- metrics — PromQL (Prometheus Query Language) is used to query metric data;
- logs — LogQL (Loki Query Language) is used to query log data;
- traces — TraceQL (Tempo Query Language) is used to query trace data.
Query examples:
- Querying metrics using PromQL:
sum(rate(http_requests_total[5m])) by (job)
- Querying logs using LogQL:
{job="varlogs"} |= "error"
- Querying traces using TraceQL:
{span.http.status_code >= 400}
Project Notification Center
The Notification Center provides the user with the ability to view currently active triggers, as well as manage project triggers and recording rules.
Project Notifications
In this section, users can view the list of active alerts. Alerts are grouped by severity level, which is determined based on the severity label. The following groups are available:
- “Active” — all active alerts, including those without a `severity` value set;
- “Critical” — alerts with `severity: critical`;
- “Warning” — alerts with `severity: warning`;
- “Informational” — alerts with `severity: info`;
- “Configuration” — alerts with `severity: info`; specialized alerts that signal incomplete opAgent configuration;
- “Resolved” — a list of all resolved alerts.
A search by alert labels is also available in this section. To use it, specify the required label and its value in the filter bar.
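The grouping logic above can be sketched in Python. This is an illustrative sketch only, not the platform's internal implementation; the alert dictionaries are hypothetical, while the group names and severity values follow this section:

```python
# Sketch: bucket alerts by their "severity" label into the documented groups.
# Every firing alert lands in "Active", even when no severity is set.

def group_alerts(alerts):
    groups = {"Active": [], "Critical": [], "Warning": [], "Informational": []}
    for alert in alerts:
        groups["Active"].append(alert)
        severity = alert.get("labels", {}).get("severity")
        if severity == "critical":
            groups["Critical"].append(alert)
        elif severity == "warning":
            groups["Warning"].append(alert)
        elif severity == "info":
            groups["Informational"].append(alert)
    return groups

alerts = [
    {"name": "HighCPU", "labels": {"severity": "critical"}},
    {"name": "DiskFilling", "labels": {"severity": "warning"}},
    {"name": "NoSeverity", "labels": {}},
]
grouped = group_alerts(alerts)
print(len(grouped["Active"]), len(grouped["Critical"]))  # 3 1
```

Note that an alert without a `severity` label appears only in the “Active” group, which is why that group is the superset of all others.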
Project Alert Actions
The following capabilities are provided for each alert:
- Viewing the alert graph — the graph displays changes in values and threshold values. To view the graph, find the desired alert and click the “Alert Graph” button.
- Disabling an alert — temporarily stop receiving the alert (e.g., during scheduled maintenance). To disable an alert, find it and click the “Disable” button. More details about disabling alerts can be found in the corresponding documentation section.
- Viewing the trigger — viewing the rule that caused the alert to fire. Click “View Notification Rule” to obtain this information.
- Force-resolving an alert — for triggers with double-check enabled, there is an option to force-resolve them. To do this, find the desired alert and click “Resolve”. More details about double-check can be found in the corresponding documentation section.
Project Triggers
A trigger is a rule that defines conditions under which a specific event (alert) is fired. In Deckhouse Observability Platform, triggers do not directly deliver notifications. For notifications to be delivered, you need to configure notification channels and specify the relationship between triggers and those channels.
Project Trigger Hierarchy
In Deckhouse Observability Platform, triggers are organized into a three-level model:
- Installation-level triggers — pre-installed triggers configured by Deckhouse Observability Platform developers. They are created based on the metrics collected by opAgent and are an integral part of the software.
- Entire System-level triggers — triggers created by Deckhouse Observability Platform administrators, available in all projects across the entire system.
- User triggers — triggers that users create in their projects.
Project Trigger Management
In this subsection, the user can:
- view, override, or disable installation-level and Entire System-level triggers;
- create and manage project-level triggers.
Viewing Project Triggers
Triggers in the Notification Center interface are displayed in two sections:
- installation-level and Entire System-level triggers are displayed in the “Default” section;
- project-level triggers are displayed in the “Custom” section.
Project Trigger Actions
The following actions are available to users for triggers:
- Disabling installation-level and Entire System-level triggers:
  - find the desired trigger in the “Default” list;
  - click the “Disable Rule” button.
  The trigger can later be re-enabled by clicking the “Enable Rule” button.
- Overriding installation-level and Entire System-level triggers:
  - find the desired trigger in the “Default” list;
  - click the “Override Rule” button;
  - make the necessary changes and save them.
  The overridden trigger will move from the “Default” list to the “Custom” list.
- Reverting an installation-level or Entire System-level trigger to its original value:
  - find the desired trigger in the “Custom” list;
  - click the “Revert to Original” button.
  The trigger will move back to the “Default” list.
- Creating a project-level trigger:
  - click the “Add” button;
  - fill in the corresponding fields:
    - “Name” — the name to be used as the trigger name;
    - “Expression” — a PromQL expression that will be evaluated to check the trigger. The trigger is considered fired if the query returns at least one series. Example: `sum(rate(http_requests_total[5m])) > 10`;
    - “Delay before alert fires” — the time during which the alert will not be considered active after the firing condition is met. This helps avoid flapping alerts;
    - “Duration of firing after resolution” — the time during which the alert will remain active after the condition is no longer met. This is useful for avoiding flapping alerts on borderline values;
    - “Labels” — key-value pairs for labels that will be added to the alert;
    - “Annotations” — additional metadata or information that should be part of the alert (e.g., troubleshooting instructions or contact information).
- Deleting a trigger:
  - open the trigger for editing;
  - click the “Delete” button.
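The two timing fields, “Delay before alert fires” and “Duration of firing after resolution”, can be illustrated with a small simulation. This is a sketch of the general idea, not the platform's actual evaluation engine; `for_delay` and `keep_firing` are hypothetical names standing in for the two fields:

```python
# Simulate trigger evaluation over discrete time steps.
# condition[t] is True when the PromQL expression returns at least one series.

def evaluate(condition, for_delay, keep_firing):
    """Return the alert state ("inactive"/"pending"/"firing") at each step."""
    states = []
    pending_since = None
    firing_until = -1
    for t, met in enumerate(condition):
        if met:
            if pending_since is None:
                pending_since = t
            if t - pending_since >= for_delay:
                states.append("firing")
                firing_until = t + keep_firing
                continue
            states.append("pending")
        else:
            pending_since = None
            states.append("firing" if t <= firing_until else "inactive")
    return states

# Condition holds for 4 steps, then clears: with for_delay=2 the alert stays
# pending for two steps before firing, and keep_firing=1 keeps it firing one
# extra step after the condition clears.
print(evaluate([True, True, True, True, False, False], for_delay=2, keep_firing=1))
# → ['pending', 'pending', 'firing', 'firing', 'firing', 'inactive']
```

Both fields dampen flapping: the delay suppresses short spikes, and the post-resolution duration suppresses rapid fire/resolve cycles on borderline values.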
Project Trigger Double-Check
In the standard trigger evaluation logic, if the query returns at least one series, the trigger is considered fired. In most cases, this is sufficient; however, if a metric was previously present and then disappeared, the trigger will be considered inactive. This approach may not be suitable for certain triggers based on metrics from opAgent, which monitors various services (e.g., databases) and collects metrics from them. If a service stops working, all its metrics disappear, and this can be incorrectly interpreted as trigger resolution.
For such situations, the trigger double-check feature has been implemented. The double-check ensures that the trigger has truly stopped being active, rather than the metric simply having disappeared.
Project Double-Check Configuration
To enable double-check in a trigger, add the following labels:
- `double_check` — indicates whether an additional check is needed (`true`|`false`);
- `resolve_double_check_metric` — the name of the metric that will be additionally checked;
- `resolve_double_check_labels` — the list of labels from the triggered series whose values will be taken and substituted into the query for the metric from `resolve_double_check_metric`.
How It Works
When double-check is enabled, the system performs an additional query to verify whether the trigger has truly stopped being active or the metric has disappeared. If the metric has disappeared, the trigger will remain active. The trigger stops being active in two cases:
- the metric has reappeared and does not satisfy the firing conditions;
- the user has force-resolved the alert (e.g., if the database was removed from the server).
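The decision described above can be sketched as a small function. This is an illustrative simplification of the documented behavior, not the platform's implementation:

```python
# Sketch of the double-check resolution decision: a trigger with
# double_check enabled only resolves when the checked metric is actually
# present again (or when the user force-resolves the alert).

def should_resolve(condition_met, metric_present, force_resolved=False):
    """Decide whether a double-checked trigger may stop being active."""
    if force_resolved:
        return True        # user force-resolved the alert (e.g., DB removed)
    if condition_met:
        return False       # firing condition still holds
    # Condition no longer met: resolve only if the metric reappeared;
    # a vanished metric keeps the trigger active.
    return metric_present

print(should_resolve(condition_met=False, metric_present=False))  # False: metric gone
print(should_resolve(condition_met=False, metric_present=True))   # True: truly recovered
print(should_resolve(condition_met=False, metric_present=False, force_resolved=True))  # True
```

The key difference from standard evaluation is the middle case: an empty query result alone is not treated as recovery.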
Project Recording Rules
Recording rules allow you to compute new metrics based on existing data, thereby reducing the load on dashboards and enabling complex calculations to be performed in advance.
Users can create, modify, and manage recording rules, which improves system performance and enhances analytical capabilities.
Project Recording Rules Hierarchy
In Deckhouse Observability Platform, recording rules are organized into a three-level model:
- Installation-level recording rules — pre-installed rules configured by Deckhouse Observability Platform developers, based on data collected by opAgent.
- Entire System-level recording rules — created by Deckhouse Observability Platform administrators and available in all projects.
- User recording rules — created by users within specific projects.
Project Recording Rules Management
In this subsection, the user can:
- view, override, and disable installation-level and Entire System-level recording rules;
- create and manage project-level recording rules.
Viewing Project Recording Rules
Recording rules in the interface are displayed in two sections:
- “Default” — displays installation-level and Entire System-level recording rules;
- “Custom” — displays project-level recording rules.
Project Recording Rules Actions
- Disabling an installation-level or Entire System-level recording rule:
  - click the “Disable Rule” button next to the desired rule;
  - to re-enable the rule, click “Enable Rule”.
- Overriding an installation-level or Entire System-level recording rule:
  - click the “Override Rule” button next to the rule;
  - make the necessary changes and save them.
  After saving, the rule will disappear from the “Default” section and appear in the “Custom” section.
  To revert to the original value, find the rule in the “Custom” list and click “Revert to Original”.
- Creating a new project-level recording rule:
  - click the “Add” button;
  - fill in the required fields:
    - “Name” — the rule name;
    - “Expression” — a PromQL expression to compute the rule. Example: `avg_over_time(http_requests_total[5m])`;
    - “Labels” — labels (key-value pairs) for additional classification and filtering of metrics.
- Deleting a recording rule:
  - open the rule for editing;
  - click the “Delete” button.
Silenced Notifications
In this section, users can manage current alert silences, create new ones, and view expired silences. This functionality is designed for temporarily stopping the receipt of notifications from Deckhouse Observability Platform, which can be useful during scheduled maintenance, system testing, or for excluding unnecessary alerts.
Creating a Notification Silence
To create a new notification silence, follow these steps:
- click the “Add” button;
- fill in the required fields:
- “Starts” — the date and time when the silence begins;
- “Ends” — the date and time when the silence ends;
- “Comment” — a comment describing the reason for the silence;
- “Filters” — filters to apply the silence to specific alerts:
- “Label Name” — the label names that will be used for filtering;
- “Match Type” — the filtering operator (e.g., `=`, `!=`, `=~`);
- “Label Value” — the label values for filtering.
- click “Save” to create the notification silence.
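The filter fields above follow the familiar Alertmanager matcher model. A minimal sketch of how such matchers could select alerts (illustrative only; the platform's matching code is not shown here):

```python
import re

# Sketch of silence filter matching: a silence applies to an alert only if
# every matcher (label name, operator, value) holds. The operators mirror
# the "Match Type" field: =, !=, =~ (regular expression).

def matches(alert_labels, matchers):
    """Return True if the alert satisfies every matcher of the silence."""
    for name, op, value in matchers:
        actual = alert_labels.get(name, "")
        if op == "=" and actual != value:
            return False
        if op == "!=" and actual == value:
            return False
        if op == "=~" and not re.fullmatch(value, actual):
            return False
    return True

silence = [("severity", "=", "warning"), ("instance", "=~", "db-.*")]
print(matches({"severity": "warning", "instance": "db-01"}, silence))   # True
print(matches({"severity": "critical", "instance": "db-01"}, silence))  # False
```

Because matchers are ANDed together, adding more filters always narrows the set of silenced alerts.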
Editing Notification Silences
To edit an existing silence, follow these steps:
- find the desired silence in the list;
- click the “Edit” button;
- make the necessary changes to the fields;
- click “Save” to save the changes.
Deleting Notification Silences
To delete an existing silence, follow these steps:
- find the desired silence in the list;
- click the “Delete” button;
- confirm the deletion by following the on-screen instructions.
Important: deleting a notification silence will immediately resume the delivery of corresponding alerts.
Recreating Notification Silences
To create a new silence based on an expired one, follow these steps:
- find the desired silence in the list of expired silences;
- click the “Recreate” button;
- edit the silence period (start and end), if necessary;
- click “Save” to create a new silence based on the expired one.
Notification Rules
In this section, users can view, create, edit, and delete notification rules. These rules allow configuring routing for delivering notifications about active triggers using the available notification channels. Information about notification channels and their configuration can be found in the corresponding documentation section.
Creating a Notification Rule
To create a new notification rule, follow these steps:
- click the “Add” button;
- select one of the available notification channels;
- fill in the required fields:
- “Name” — the name of the rule for identification;
- “Send on Resolution” — enable if you need to send notifications when an event ends;
- “Continue Processing” — enable if you need to send information about one alert to multiple channels or multiple recipients;
- “Default” — set this if the rule should apply to all alerts that do not match other rules. It is recommended to always have such a rule so as not to miss alerts;
- configure filters for the rule:
- “Label Name” — the label name;
- “Match Type” — the filtering operator (e.g., `=`, `!=`, `=~`);
- “Label Value” — the value for the label;
- configure routes for delivering notifications through channels:
- “Slack” — Slack channels for sending notifications;
- “Email” — email addresses to which notifications will be sent;
- “Telegram” — Telegram chat IDs for sending messages;
- “Webhook” — webhook URLs for sending data;
- click “Save” to save the created rule.
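The interplay of filters, “Continue Processing”, and the “Default” rule can be sketched as follows. This is a simplified illustration of the routing idea, not the platform's actual router; the rule and channel names are hypothetical:

```python
# Sketch of notification-rule routing: filters select alerts, a rule with
# "continue" set lets later rules also match, and a "default" rule catches
# alerts that matched nothing else.

def route(alert_labels, rules):
    """Return the list of channels an alert is delivered to."""
    channels = []
    for rule in rules:
        if rule.get("default"):
            continue  # the default rule is considered only as a fallback
        if all(alert_labels.get(k) == v for k, v in rule["filters"].items()):
            channels.extend(rule["channels"])
            if not rule.get("continue"):
                return channels  # stop at the first match unless "continue" is set
    if not channels:
        for rule in rules:
            if rule.get("default"):
                channels.extend(rule["channels"])
    return channels

rules = [
    {"filters": {"severity": "critical"}, "channels": ["slack:#oncall"], "continue": True},
    {"filters": {"team": "db"}, "channels": ["email:db@example.com"]},
    {"default": True, "filters": {}, "channels": ["email:ops@example.com"]},
]
print(route({"severity": "critical", "team": "db"}, rules))  # both matching channels
print(route({"severity": "info"}, rules))                    # default channel only
```

This illustrates why the documentation recommends always keeping a default rule: without it, an alert matching no filters would be routed nowhere.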
Editing Notification Rules
To edit an existing notification rule, follow these steps:
- find the desired rule in the list;
- click the “Edit” button;
- make the necessary changes to the fields;
- click “Save” to save the changes.
Deleting Notification Rules
To delete an existing notification rule, follow these steps:
- find the desired rule in the list;
- click the “Delete” button;
- confirm the deletion by following the on-screen instructions.
Important: deleting a notification rule immediately stops routing notifications for that rule. Make sure you actually want to delete this rule before confirming.
Project Usage Statistics
In this section, the user can view statistics on Deckhouse Observability Platform usage. Only data for the current project is displayed, presented as various charts and metrics, allowing you to assess performance and data volumes.
Project Metrics
- “Metrics: Active Series” — the number of active series. Active series are unique series that have received data within the last two hours.
- “Metrics: Raw incoming rate” — the incoming metric stream per second. This chart displays data after deduplication.
- “Metrics: Total count of samples” — the total number of samples stored in long-term storage. A sample is an individual metric value collected at a specific point in time.
- “Metrics: Discarded samples” — errors when writing series. Possible error causes include:
  - `sample_out_of_bounds` — the sample is outside the allowed value range;
  - `sample_out_of_order` — the sample arrived in the wrong order;
  - `sample_too_old` — the sample is too old and is not accepted by the system;
  - `sample_too_far_in_future` — the sample's timestamp is too far in the future;
  - `new_value_for_timestamp` — an attempt to write a different value for the same timestamp;
  - `per_user_series_limit` — the metric ingestion limit for the project has been exceeded;
  - `metric_series_limit` — the per-metric series limit has been exceeded.
- “Metrics: Storage usage” — the amount of disk space used by metrics in long-term storage.
Project Logs
- “Logs: Raw incoming rate” — the incoming log stream.
- “Logs: Discarded samples” — errors when writing logs. Possible error causes include:
  - `rate_limited` — the write rate limit has been exceeded;
  - `stream_limit` — the stream count limit has been exceeded;
  - `label_name_too_long` — the label name length exceeds the allowed limit;
  - `label_value_too_long` — the label value length exceeds the allowed limit;
  - `line_too_long` — the line length exceeds the allowed limit;
  - `max_label_names_per_series` — the limit on the number of label names per series has been exceeded;
  - `per_stream_rate_limit` — the per-stream write rate limit has been exceeded.
- “Logs: Storage usage” — the amount of disk space used by logs.
Project Traces
- “Traces: Raw incoming rate” — the incoming trace stream in bytes per second.
- “Traces: Live traces” — the number of active traces currently being processed.
- “Traces: Discarded spans” — errors during span ingestion. Possible error causes include:
  - `rate_limited` — the trace ingestion rate limit has been exceeded;
  - `trace_too_large` — the trace size exceeds the allowed `max_bytes_per_trace` limit;
  - `live_traces_exceeded` — the limit on the number of active traces has been exceeded.
- “Traces: Storage usage” — the amount of disk space used by traces in long-term storage.
Project Members
In this section, the user can view the list of users who have access to the current project. User management capabilities depend on the type of authentication configured in Deckhouse Observability Platform.
Project External Authentication
If Deckhouse Observability Platform has authentication via external systems enabled (e.g., LDAP, OAuth, etc.), only viewing the user list is available in this section.
Project Internal Authentication
If Deckhouse Observability Platform uses internal authentication (email and password), additional user management capabilities are available in this section:
- deleting users — the ability to remove users from the system;
- changing a user’s role — the ability to change a user’s role. More details about the role model can be found in the corresponding documentation section.
Project API Tokens
In this section, the user can view, create, edit, and delete API tokens issued for the current project. API tokens are used to access the API, monitoring data, logs, and traces stored in the system, and allow various operations such as reading or writing metrics, logs, and traces, as well as managing triggers. The available API can be reviewed in the corresponding documentation section.
Viewing the Project API Token List
The user can view the list of all API tokens issued for the current project. This allows monitoring tokens and preventing unauthorized access.
Reissuing a Project Token
To reissue a token:
- find the desired token in the list;
- click the “Reissue Token” button.
Important: after reissuing a token, all integrations using that token will stop working until the old token is updated and replaced with the new one.
Creating a Project Token
To create a new API token, follow these steps:
- click the “Add” button;
- fill in the required fields:
  - “Name” — specify a name for the token that describes its purpose (e.g., `Token for Prometheus remote write`);
  - “Expiration” — determines how long the token remains valid:
    - “Date” — specify a specific date and time until which the token will be valid. After this time, the token becomes invalid;
    - “Unlimited” — if no date is specified, the token remains valid until it is deleted;
  - “Permissions” — select a combination of access rights matching the required set of permissions. More details about the role model can be found in the corresponding documentation section;
  - “Scope” — determines which system entities (Entire System, workspaces, and projects) the token will apply to.
Note on multi-tenant tokens: if the scope of the selected token covers multiple projects, a workspace, or the entire system, the token is classified as multi-tenant. These tokens have the following characteristics:
- they are limited to read-only access to monitoring data from the selected scope;
- they provide access to data from all projects within the specified scope;
- queries using a multi-tenant token are executed across the entire data volume of projects in the selected scope, providing comprehensive access to monitoring information.
Editing a Project Token
To edit an existing token, follow these steps:
- find the desired token in the list;
- click the “Edit” button;
- make the necessary changes:
- change the token name;
- update the expiration date or set one if it was not previously set;
- change the scope or permissions.
Deleting a Project Token
To delete an existing token, follow these steps:
- find the desired token in the list;
- click the “Delete” button.
Important: after deleting a token, all integrations using that token will immediately stop working.
Data Source
In the “Data Source” section, the user can view, create, edit, and delete additional data sources. Data sources allow connecting data from external systems or retrieving data from multiple projects using multi-tenant API tokens. The following systems are currently supported:
- Prometheus;
- Loki;
- Elasticsearch;
- Postgres;
- MySQL.
Creating a Data Source
To add a new data source, follow these steps:
- click the “Add” button;
- fill in the required fields:
  - “Name” — the data source name. This name will be displayed in the “Data Overview” section and will be available as a data source when creating dashboards;
  - “Type” — the data source type from the available list (Prometheus, Loki, Elasticsearch, Postgres, MySQL);
  - “Basic Authorization” — whether basic authorization is required for connecting to the data source;
  - “Username” — the username for authorization (if applicable);
  - “URL” — the URL for connecting to the data source;
  - “Edit jsonData” — configuration parameters in JSON format. For example, for Prometheus: `{"timeInterval": "30s"}`;
  - “Edit secureJsonData” — security configuration parameters in JSON format. For example, for Prometheus: `{"basicAuthPassword": "password"}`.
- click the “Add” button to save the new data source.
Deleting a Data Source
To delete an existing data source, follow these steps:
- find the desired data source in the list;
- click the “Delete” button;
- confirm the deletion by following the on-screen instructions.
Editing a Data Source
To edit an existing data source, follow these steps:
- find the desired data source in the list;
- click the “Edit” button;
- make the necessary changes to the fields;
- click the “Save” button.
These features allow the user to effectively manage connections to various external data systems, providing flexibility and convenience when using additional data sources in the Deckhouse Observability Platform interface.
Data Source Configuration Examples
This section contains instructions for configuring various systems to send data (metrics or logs) to Deckhouse Observability Platform. The set of instructions may vary depending on the installation configuration, such as the availability of log storage or opAgent support.
Important: the instructions contain actual project tokens. Depending on which project the instructions are opened in, different tokens will be displayed. Using instructions from one project to configure another project is not allowed.
PostgreSQL
See the PostgreSQL data source documentation for detailed information.
- Type: postgres
jsonData:
{
"connMaxLifetime": 14400,
"database": "database_name",
"maxIdleConns": 100,
"maxIdleConnsAuto": true,
"maxOpenConns": 100,
"postgresVersion": 1400,
"sslmode": "disable"
}
secureJsonData:
{"basicAuthPassword": "password"}
Elasticsearch
See the Elasticsearch data source documentation for detailed information.
- Type: elasticsearch
jsonData:
{
"includeFrozen": false,
"index": "mindex",
"logLevelField": "",
"logMessageField": "",
"maxConcurrentShardRequests": 5,
"timeField": "date"
}
Basic authorization:
- check the “Use basic auth” checkbox;
- fill in the “Username” field.
secureJsonData:
{"basicAuthPassword": "password"}
Certificate:
Add to jsonData:
{
...
"oauthPassThru": false,
"serverName": "server_name",
"sigV4Auth": false,
"tlsAuth": true,
"tlsAuthWithCACert": false,
"tlsSkipVerify": true
}
Add the certificate and PEM file to secureJsonData:
{
...
"basicAuthPassword": "password",
"tlsClientCert": "certificate content",
"tlsClientKey": "PEM file content"
}
Prometheus
See the Grafana Prometheus data source documentation for detailed information.
- Type: prometheus
jsonData:
{
"httpMethod": "POST",
"manageAlerts": true,
"prometheusType": "Prometheus",
"prometheusVersion": "2.44.0",
"incrementalQueryOverlapWindow": "10m"
}
Example of Adding a Data Source for Accessing Data Across Multiple Deckhouse Observability Platform Projects
jsonData:
{
"manage_alerts": "false",
"time_interval": "30s",
"prometheus_type": "Mimir",
"http_header_name1": "X-Auth-Token",
"prometheus_version": "2.4.0"
}
secureJsonData:
{
"http_header_value1": "<TOKEN_FOR_READING_METRICS>"
}
An API token with access to the required projects must be created beforehand.
Project Settings
In the “Settings” section, users can manage the basic parameters of a project, such as changing its name or deleting it. This provides administrators with the ability to keep data up to date and, if necessary, delete projects.
Rules for metrics, logs, and traces for writing and reading can also be configured in this section.
Changing the Project Name
To change the project name, follow these steps:
- go to the “Settings” section;
- find the “Project Name” field;
- enter a new name for the project;
- click the “Save” button.
Deleting a Project
To delete a project, follow these steps:
- go to the “Settings” section;
- click the “Delete” button;
- confirm the deletion by following the on-screen instructions.
Important: deleting a project will result in the deletion of all associated data, metrics, logs and traces, dashboards, and triggers. All integrations associated with this project will stop working. Be careful and make sure you want to delete the project before confirming the action.
Limits Configuration
Each project in Deckhouse Observability Platform has resource limit settings, which enables quota management and resource planning for the entire system.
By default, a project inherits base limits, which can be managed in the “System Settings” menu (details can be found in the corresponding documentation section). Users can view the limits, and the superadministrator can modify them for a specific project.
Metrics Limits
Write Limits
- `ingestion_rate`:
  - Defines the maximum number of samples that can be written per second for a single series. When this limit is reached, the system checks the value set in the `ingestion_burst_size` parameter.
  - Default value: `1,000,000`.
- `ingestion_burst_size`:
  - Sets the maximum allowed burst for `ingestion_rate`. This value must be equal to or greater than `ingestion_rate`, as it represents an absolute limit. When this limit is exceeded, Okmeter Storage returns HTTP code 400 on a POST request to `/api/v1/push`.
  - Default value: `1,100,000`.
- `max_global_series_per_user`:
  - Sets the overall maximum limit of active series for a project. Series are considered active if they were added within the last 2 hours. When this limit is reached, Okmeter Storage returns HTTP code 400 on a POST request to `/api/v1/push`.
  - Default value: `1,000,000`.
- `max_global_series_per_metric`:
  - Sets the limit on the maximum number of active series by metric name at the cluster level before replication. When this limit is exceeded, Okmeter Storage returns HTTP code 400 on a POST request to `/api/v1/push`.
  - Default value: `20,000`.
- `max_global_exemplars_per_user`:
  - Defines the maximum number of unique exemplars that can be written for a single project.
  - Default value: Unlimited (0).
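The relationship between `ingestion_rate` and `ingestion_burst_size` can be illustrated with a token-bucket sketch. The token-bucket mechanism is an assumption made for illustration, not a description of the platform's actual enforcement code, and small numbers are used for readability:

```python
# Token bucket: tokens refill at `rate` per second and are capped at `burst`.
# A push of N samples is accepted only if N tokens are available; otherwise
# the request is rejected (analogous to HTTP 400 on POST /api/v1/push).

class TokenBucket:
    def __init__(self, rate, burst):
        self.rate = rate        # ingestion_rate analogue (samples/sec)
        self.burst = burst      # ingestion_burst_size analogue (absolute cap)
        self.tokens = burst
        self.last = 0.0

    def allow(self, now, samples):
        # Refill according to elapsed time, never exceeding the burst cap.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if samples <= self.tokens:
            self.tokens -= samples
            return True
        return False

bucket = TokenBucket(rate=100, burst=110)
print(bucket.allow(0.0, 110))  # True: a one-off burst up to 110 is accepted
print(bucket.allow(0.0, 1))    # False: bucket exhausted, push rejected
print(bucket.allow(1.0, 100))  # True: one second refills 100 tokens
```

This also shows why `ingestion_burst_size` must be greater than or equal to `ingestion_rate`: the burst is the absolute ceiling on what a single moment can accept.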
Read Limits
- `max_total_query_length`:
  - Limits the total time range of a query, determined by the difference between the end and start times. This limit is applied before time-based query sharding.
  - Default value: Unlimited (0s).
- `max_partial_query_length`:
  - Limits the time range for sub-queries resulting from sharding the original query. This parameter value must be less than or equal to the `max_total_query_length` value.
  - Default value: Unlimited (0s).
- `max_fetched_chunk_bytes_per_query`:
  - Sets the maximum size in bytes of all data chunks that can be fetched by a query from each ingester component and S3 storage.
  - Default value: Unlimited (0).
- `max_fetched_series_per_query`:
  - Defines the maximum number of unique series that can be returned in response to a query.
  - Default value: `100,000`.
Alert and Record Rules Limits
- `ruler_max_rule_groups_per_tenant`:
  - Defines the maximum number of rule groups available for each project. The actual number of rules that can be set for a project equals the product of `ruler_max_rule_groups_per_tenant` and `ruler_max_rules_per_rule_group`. It is recommended to adjust this value depending on the number of Ruler instances in your system.
  - Default value: `35`.
- `ruler_max_rules_per_rule_group`:
  - Defines the maximum number of rules in each rule group. Limiting the number of rules in each group helps evenly distribute the load across Ruler instances and ensures more efficient resource utilization.
  - Default value: `50`.
Notification Rate Limits
- `alertmanager_notification_rate_limit`:
  - Defines the maximum number of notifications (counting the total number of notifications across all delivery channels) in a project that can be sent per second.
  - Default value: `0` (unlimited).
Data Retention Period
- `compactor_blocks_retention_period`:
  - Defines the period during which data will be stored.
  - Default value: 400 days.
Logs Limits
Log Write Limits
- `ingestion_rate_mb`:
  - Limits the log write rate in megabytes per second.
- `ingestion_burst_size_mb`:
  - Sets the maximum allowed burst for `ingestion_rate_mb`.
- `per_stream_rate_limit`:
  - Limits the data write rate to a specific log stream.
- `per_stream_rate_limit_burst`:
  - Sets the maximum allowed burst for `per_stream_rate_limit`.
Log Read Limits
- `max_chunks_per_query`:
  - Defines the maximum number of data chunks that can be returned in a single query.
- `max_streams_per_user`:
  - Defines the maximum number of log streams that can be returned in a single query for one user.
- `max_entries_limit_per_query`:
  - Defines the maximum number of log entries that can be returned in a single query.
- `max_query_length`:
  - Limits the time range of a log query.
- `max_queriers_per_tenant`:
  - Limits the number of queries that can be executed simultaneously for one project.
Log Retention Period
- `retention_period`:
  - Defines the period during which log data will be stored.
  - Default value: 14 days.
Traces Limits
Trace Write Limits
- `ingestion_rate_limit_bytes`:
  - Limits the trace write rate in bytes per second.
  - Default value: `10,000,000` (10 MB/s).
- `ingestion_burst_size_bytes`:
  - Sets the maximum allowed burst for `ingestion_rate_limit_bytes`.
  - Default value: `15,000,000` (15 MB).
- `max_traces_per_user`:
  - Defines the maximum number of active traces for a project on a single ingester.
  - Default value: `40,000`.
- `max_global_traces_per_user`:
  - Sets the overall maximum limit of active traces for a project across all ingesters.
  - Default value: `80,000`.
- `max_bytes_per_trace`:
  - Limits the maximum size of a single trace in bytes.
  - Default value: `1,500,000` (1.5 MB).
Trace Read Limits
- `max_blocks_per_tag_values_query`:
  - Defines the maximum number of blocks that can be scanned when querying tag values.
  - Default value: `150`.
- `max_search_duration`:
  - Limits the maximum duration of a search query.
  - Default value: `30s`.
Trace Retention Period
- `block_retention`:
  - Defines the period during which trace data will be stored.
  - Default value: `7d` (7 days).
Administrator Web Interface
General Settings
Any changes in this menu item can only be made in coordination with Flant specialists.
This menu item contains various feature flag parameters for enabling and disabling different system functionalities. In most cases, no changes are required in this section.
Base Limits
Each project in Deckhouse Observability Platform has resource limit settings, which enables quota management and resource planning for the system.
By default, projects inherit base limits that are defined in this section. These limits help control resource usage and prevent exceeding allocated quotas.
Important: changing base limits may affect the operation of all projects using these settings. Make sure the changes are correct before saving.
Base Metrics Limits
Metrics Write Limits
- `ingestion_rate`: Defines the maximum number of samples that can be written per second for a single series. When this limit is reached, the system checks the value set in the `ingestion_burst_size` parameter.
  - Default value: `1,000,000`.
- `ingestion_burst_size`: Sets the maximum allowed burst for `ingestion_rate`. This value must be equal to or greater than `ingestion_rate`, as it represents an absolute limit. When this limit is exceeded, Okmeter Storage returns HTTP code 400 on a POST request to `/api/v1/push`.
  - Default value: `1,100,000`.
- `max_global_series_per_user`: Sets the overall maximum limit of active series for a project. Series are considered active if they were added within the last 2 hours. When this limit is reached, Okmeter Storage returns HTTP code 400 on a POST request to `/api/v1/push`.
  - Default value: `1,000,000`.
- `max_global_series_per_metric`: Sets the limit on the maximum number of active series by metric name at the cluster level before replication. When this limit is exceeded, Okmeter Storage returns HTTP code 400 on a POST request to `/api/v1/push`.
  - Default value: `20,000`.
- `max_global_exemplars_per_user`: Defines the maximum number of unique exemplars that can be written for a single project.
  - Default value: unlimited (`0`).
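The parameter names above follow the Grafana Mimir per-tenant limits format. As an illustrative sketch only, per-project write limits might look like this in a Mimir-style runtime overrides file (the tenant ID `project-42` is a hypothetical placeholder; in Deckhouse Observability Platform these values are managed through the administrator interface):

```yaml
# Hypothetical per-project overrides in a Mimir-style runtime configuration.
# Values shown are the documented defaults.
overrides:
  project-42:
    ingestion_rate: 1000000             # samples/s; exceeding it triggers the burst check
    ingestion_burst_size: 1100000       # must be >= ingestion_rate (absolute limit)
    max_global_series_per_user: 1000000 # active series per project (last 2 hours)
    max_global_series_per_metric: 20000 # active series per metric name, pre-replication
    max_global_exemplars_per_user: 0    # 0 = unlimited
```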
Metrics Read Limits
- `max_total_query_length`: Limits the total time range of a query, determined by the difference between the end and start times. This limit is applied before time-based query sharding.
  - Default value: unlimited (`0s`).
- `max_partial_query_length`: Limits the time range for sub-queries resulting from sharding the original query. This value must be less than or equal to `max_total_query_length`.
  - Default value: unlimited (`0s`).
- `max_fetched_chunk_bytes_per_query`: Sets the maximum size in bytes of all data chunks that can be fetched by a query from each ingester component and S3 storage.
  - Default value: unlimited (`0`).
- `max_fetched_series_per_query`: Defines the maximum number of unique series that can be returned in response to a query.
  - Default value: `100,000`.
Alert and Record Rules Limits
- `ruler_max_rule_groups_per_tenant`: Defines the maximum number of rule groups available for each project. The actual number of rules that can be set for a project equals the product of `ruler_max_rule_groups_per_tenant` and `ruler_max_rules_per_rule_group` (with the defaults, 35 × 50 = 1,750 rules). It is recommended to adjust this value depending on the number of Ruler instances in your system.
  - Default value: `35`.
- `ruler_max_rules_per_rule_group`: Defines the maximum number of rules in each rule group. Limiting the number of rules per group helps evenly distribute the load across Ruler instances and ensures more efficient resource utilization.
  - Default value: `50`.
Notification Rate Limits
- `alertmanager_notification_rate_limit`: Defines the maximum number of notifications (counting the total across all delivery channels) in a project that can be sent per second.
  - Default value: `0` (unlimited).
Metrics Retention Period
- `compactor_blocks_retention_period`: Defines the period during which metrics data will be stored.
  - Default value: 400 days.
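These rule, notification, and retention parameters also match the Mimir per-tenant limits format. A sketch of how their defaults might be expressed as overrides (the tenant ID `project-42` and the file itself are hypothetical; the platform manages these values through the administrator interface):

```yaml
# Hypothetical per-project overrides (Mimir-style runtime configuration);
# values shown are the documented defaults.
overrides:
  project-42:
    ruler_max_rule_groups_per_tenant: 35     # 35 groups x 50 rules = 1,750 rules max
    ruler_max_rules_per_rule_group: 50
    alertmanager_notification_rate_limit: 0  # 0 = unlimited notifications/s
    compactor_blocks_retention_period: 400d  # metrics retention
```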
Base Logs Limits
Log Write Limits
- `ingestion_rate_mb`: Limits the log write rate in megabytes per second.
- `ingestion_burst_size_mb`: Sets the maximum allowed burst for `ingestion_rate_mb`.
- `per_stream_rate_limit`: Limits the data write rate to a specific log stream.
- `per_stream_rate_limit_burst`: Sets the maximum allowed burst for `per_stream_rate_limit`.
Log Read Limits
- `max_chunks_per_query`: Defines the maximum number of data chunks that can be returned in a single query.
- `max_streams_per_user`: Defines the maximum number of log streams that can be returned in a single query for one user.
- `max_entries_limit_per_query`: Defines the maximum number of log entries that can be returned in a single query.
- `max_query_length`: Limits the time range of a log query.
- `max_queriers_per_tenant`: Limits the number of queries that can be executed simultaneously for one project.
Log Retention Period
- `retention_period`: Defines the period during which log data will be stored.
  - Default value: 14 days.
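The log parameters above follow the Grafana Loki `limits_config` format. A sketch of such a configuration, assuming a Loki-style backend: the numeric values are illustrative only (this section documents no defaults for them), except `retention_period`, whose documented default is 14 days (336h):

```yaml
# Hypothetical Loki-style limits; values other than retention_period are
# illustrative, not platform defaults.
limits_config:
  ingestion_rate_mb: 10               # MB/s write rate
  ingestion_burst_size_mb: 15         # burst for ingestion_rate_mb
  per_stream_rate_limit: 3MB          # per-stream write rate
  per_stream_rate_limit_burst: 15MB   # burst for per_stream_rate_limit
  max_chunks_per_query: 2000000
  max_streams_per_user: 0
  max_entries_limit_per_query: 5000
  max_query_length: 721h
  max_queriers_per_tenant: 0
  retention_period: 336h              # 14 days (documented default)
```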
Base Traces Limits
Trace Write Limits
- `ingestion_rate_limit_bytes`: Limits the trace write rate in bytes per second.
  - Default value: `10,000,000` (10 MB/s).
- `ingestion_burst_size_bytes`: Sets the maximum allowed burst for `ingestion_rate_limit_bytes`.
  - Default value: `15,000,000` (15 MB).
- `max_traces_per_user`: Defines the maximum number of active traces for a project on a single ingester.
  - Default value: `40,000`.
- `max_global_traces_per_user`: Sets the overall maximum limit of active traces for a project across all ingesters.
  - Default value: `80,000`.
- `max_bytes_per_trace`: Limits the maximum size of a single trace in bytes.
  - Default value: `1,500,000` (1.5 MB).
Trace Read Limits
- `max_blocks_per_tag_values_query`: Defines the maximum number of blocks that can be scanned when querying tag values.
  - Default value: `150`.
- `max_search_duration`: Limits the maximum duration of a search query.
  - Default value: `30s`.
Trace Retention Period
- `block_retention`: Defines the period during which trace data will be stored.
  - Default value: `7d` (7 days).
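The trace parameters above match the Grafana Tempo per-tenant overrides format. As an illustrative sketch, assuming a Tempo-style backend (the tenant ID `project-42` is a hypothetical placeholder; in the platform these limits are edited through the administrator interface):

```yaml
# Hypothetical per-project overrides in a Tempo-style configuration;
# values shown are the documented defaults.
overrides:
  project-42:
    ingestion_rate_limit_bytes: 10000000  # 10 MB/s write rate
    ingestion_burst_size_bytes: 15000000  # 15 MB burst
    max_traces_per_user: 40000            # active traces per ingester
    max_global_traces_per_user: 80000     # active traces across all ingesters
    max_bytes_per_trace: 1500000          # 1.5 MB per trace
    max_blocks_per_tag_values_query: 150
    max_search_duration: 30s
    block_retention: 168h                 # 7 days
```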
Setting Up Role Mapping
This section is available when using external authentication.
Role mapping allows associating external user groups with roles and entities in the system. When using external authentication, a list of the user’s groups is transmitted upon login. Using these groups, access rules (mappings) for users to system entities can be established. An unlimited number of mappings can be added.
Important: changes to user, role, and system entity associations take effect at the time of user authorization. Therefore, when changing a mapping, the user (or users) must log out and log back in to the system. More details about setting up role mapping can be found in the corresponding documentation section.
Cluster
Any changes in this menu item can only be made in coordination with Flant specialists.
This section lists the clusters that have been added to the configuration replication system. If the “disaster-resistant” installation option is selected, there may be more than one cluster here. Each cluster represents a separate system installation, most commonly hosted in a separate data center.
This section also displays the synchronization status of settings between various system components and the data synchronization status between installations, if a disaster-resistant installation is being used.
Users
This section lets you view all users who have logged into the system and track their associations with system entities. It is useful for debugging role mappings: you can see which roles a user has been assigned for each system entity.
Important: changes to user, role, and system entity associations take effect at the time of user authorization. Therefore, when changing a mapping, the user (or users) must log out and log back in to the system.
Audit Logs
The “Audit Logs” section allows viewing and analyzing the audit log, which contains records of all changes occurring in the system. This includes object changes, user actions, and change sources. Analyzing these logs helps track system security and integrity, as well as identify any suspicious activities.
Audit Log Filters
Various filters are provided for more convenient use of the section. You can use them to refine your search and display specific records.
Available filter parameters:
- “Object ID” — the unique identifier of the object;
- “Object Type” — the object type for filtering (e.g., project, user, role, etc.);
- “Created From” — the start date and time for the search period (format: dd.mm.yyyy hh:mm);
- “Created Until” — the end date and time for the search period (format: dd.mm.yyyy hh:mm);
- “User ID” — the unique identifier of the user;
- “Change Source” — the source of changes (if known);
- “Action” — the action type (e.g., create, edit, delete);
- “Changed Field” — the specific field whose changes you want to view;
- “Changed Value” — the specific value the field was changed to.