The module is available only in Deckhouse Enterprise Edition.

This module allows you to run and manage Deckhouse Observability Platform, a centralized monitoring platform.

Platform capabilities

The platform features the following capabilities:

  • Collection of monitoring data (metrics and logs) via a unified centralized interface with the option of multi-year storage.
  • User-friendly and intuitive interface for browsing monitoring data.
  • Monitoring agent for servers with built-in integrations for common software, such as databases, queue managers, and load balancers. Supported software is detected automatically and added to monitoring.
  • Ready-to-use dashboards and alerts for built-in integrations.
  • Option to set up your own dashboards and alerts, as well as alert delivery channels.
  • Notification center to browse through triggered alerts and their history.
  • Flexible permissions system with a role-based model and support for authentication via third-party systems.

Typical usage scenario

Figure: The typical usage scenario of Deckhouse Observability Platform.

Platform’s key features

  • Capable of storing hundreds of millions of unique, highly granular metrics.
  • Long-term data retention (5 years or more).
  • System for differentiating metric flows and access rights:
    • separation of metrics on a per-project basis — each project or service can push metrics to its own repository;
    • multi-project query mode — the option to make a query to display metrics from multiple projects at a time.
  • Flexible permissions system:
    • role-based model for controlling user access to various system components and data;
    • support for authentication using third-party systems such as LDAP, OAuth2, and others.
  • Storing metrics from any source that supports the Remote Write protocol (e.g., Prometheus, VictoriaMetrics, etc.) as well as logs using Loki.
  • Per-tenant statistics of storage resource utilization.
  • Unified web interface:
    • for working with graphs, with support for the Grafana dashboard format;
    • for working with alerts and triggers;
    • for managing and customizing the platform.
  • Notification center:
    • convenient handling of active alerts and their history;
    • filtering and sorting notifications based on multiple criteria;
    • option to temporarily disable alerts if necessary;
    • rapid response to changes in system state.
  • Flexible setup of dashboards and triggers: common company-wide settings let you manage dashboards and triggers of the same type in one place, while each project can still customize them individually.

Agent’s key features

Description

Deckhouse Observability Platform Agent (hereinafter referred to as opAgent) is a statically built binary written in Go. As such, it can run on any Linux-like operating system. opAgent supports the x86_64 and ARM architectures.

opAgent is deployed with a single command. It automatically starts collecting metrics from both the operating system and the application software installed on the servers. opAgent has more than 15 integrations with application software, allowing it to automatically discover that software, connect to it, and scrape its key metrics. In most cases, opAgent does not require any configuration. In some cases, however, it must be granted appropriate permissions before it can collect metrics (the system will notify you and show the commands to run).

What opAgent can do

  • opAgent can collect basic server utilization metrics such as:
    • CPU;
    • Disk;
    • RAM + SWAP;
    • Network / Network connections (netstat);
    • OS resources broken down by processes.
  • opAgent can automatically detect various applications installed on servers and set them to be monitored:
    • PostgreSQL;
    • PgBouncer;
    • MySQL;
    • MongoDB;
    • Redis;
    • Memcached;
    • Cassandra;
    • Elasticsearch;
    • ZooKeeper;
    • Kafka;
    • Sphinx;
    • RabbitMQ;
    • Nginx;
    • PHP-FPM;
    • Java/JMX;
    • Envoy.
  • opAgent can collect custom metrics in a wide range of ways:
    • SQL-query — running SQL queries;
    • Redis-query — running Redis queries;
    • Exec — running the executable file and retrieving metrics from it;
    • Logparser — parsing logs and converting log fields into metrics;
    • HTTP-query — making HTTP requests to retrieve metrics;
    • StatsD — StatsD server built into the agent for quick counter collection;
    • Prometheus exporters — scraping metrics from any OpenMetrics-compatible exporters.
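
As an illustration of the StatsD option in the list above, here is a minimal sketch of how an application might push a counter to a StatsD server. It uses the generic StatsD line format (`<name>:<value>|c`) sent over UDP. The listener address, the port (8125 is the conventional StatsD port), and the metric name are assumptions made for the example, not values taken from the opAgent documentation, so check the agent's configuration for the actual endpoint.

```python
import socket

# Hypothetical address of the agent's built-in StatsD listener.
# 8125 is the conventional StatsD port; the port opAgent actually
# listens on is an assumption here.
STATSD_ADDR = ("127.0.0.1", 8125)


def format_counter(name: str, value: int = 1) -> bytes:
    """Encode a counter increment in the StatsD line format: <name>:<value>|c"""
    return f"{name}:{value}|c".encode("ascii")


def send_counter(name: str, value: int = 1, addr=STATSD_ADDR) -> None:
    """Send one counter increment over UDP (fire-and-forget, no reply is read)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_counter(name, value), addr)
```

Calling `send_counter("orders.processed")` would emit the datagram `orders.processed:1|c`. Because UDP is connectionless, the call does not block the application or confirm delivery, which is why StatsD suits quick counter collection.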