The module collects availability statistics for cluster components and Deckhouse. It lets you evaluate how well these components comply with their SLAs, presents the availability data in a web interface, and provides a web page showing the operating status of the cluster components.
The module consists of the following components:
- agent — probes the availability of components and sends the results to upmeter; runs on the master nodes;
- upmeter — aggregates the results and serves them through an API server;
- status — shows the availability level over the last 10 minutes (requires authorization by default, but you can disable it);
- webui — a dashboard with statistics on probes and availability groups (requires authorization);
- smoke-mini — performs continuous smoke testing using a StatefulSet that mimics a real application.
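To illustrate how probe results can be turned into an availability figure and checked against an SLA target, here is a minimal sketch. It is not upmeter's implementation: the `Status`, `ProbeResult`, `availability`, and `meets_sla` names are hypothetical, and treating "unknown" results (for example, when a probe could not run) as excluded from the calculation is an assumption. The default 600-second window mirrors the 10-minute period shown by the status page.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class Status(Enum):
    UP = "up"
    DOWN = "down"
    UNKNOWN = "unknown"  # assumption: the probe result could not be determined


@dataclass
class ProbeResult:
    timestamp: float  # Unix time, seconds
    status: Status


def availability(results: Iterable[ProbeResult], now: float, window: float = 600.0) -> Optional[float]:
    """Share of UP results among known results in [now - window, now].

    UNKNOWN results are excluded from both numerator and denominator
    (an assumption of this sketch). Returns None if no known results
    fall inside the window.
    """
    up = known = 0
    for r in results:
        if not (now - window <= r.timestamp <= now):
            continue  # outside the window
        if r.status is Status.UP:
            up += 1
            known += 1
        elif r.status is Status.DOWN:
            known += 1
    return up / known if known else None


def meets_sla(value: float, target: float) -> bool:
    """True if the measured availability reaches the SLA target, e.g. 0.999."""
    return value >= target


if __name__ == "__main__":
    now = 1_700_000_000.0
    # One hypothetical result every 30 seconds for 10 minutes; one of 20 is DOWN.
    results = [ProbeResult(now - 30 * i, Status.DOWN if i == 7 else Status.UP) for i in range(20)]
    value = availability(results, now)
    print(value, meets_sla(value, 0.999))
```

With 19 successful checks out of 20, the sketch reports an availability of 0.95, which falls short of a hypothetical 99.9% target.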
The module sends about 100 metric readings every 5 minutes; the exact number depends on how many Deckhouse modules are enabled.
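The resulting data volume can be estimated with simple arithmetic. The sketch below assumes exactly 100 readings per fixed 5-minute interval. In practice the count varies with the enabled modules, so treat the figures as rough orders of magnitude. The function name `readings_per_period` is hypothetical.

```python
# Illustrative assumptions: a fixed 5-minute send interval and exactly
# 100 readings per interval (the real count depends on enabled modules).
READINGS_PER_INTERVAL = 100
INTERVAL_MINUTES = 5


def readings_per_period(readings_per_interval: int, interval_minutes: int, period_minutes: int) -> int:
    """Readings sent over a period, counting only complete send intervals."""
    return readings_per_interval * (period_minutes // interval_minutes)


if __name__ == "__main__":
    for label, minutes in [("hour", 60), ("day", 24 * 60), ("30 days", 30 * 24 * 60)]:
        print(label, readings_per_period(READINGS_PER_INTERVAL, INTERVAL_MINUTES, minutes))
```

Under these assumptions, that is 1,200 readings per hour, 28,800 per day, and 864,000 over 30 days.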
Example of the web interface:
Example of Grafana plots based on upmeter metrics: