Service labels and metrics

In Deckhouse Prom++ terminology, an endpoint from which metrics can be collected is called an instance. An instance usually corresponds to a single worker process.
A set of instances with the same purpose, for example replicas of one process run for scalability or fault tolerance, is called a job.

The following is an example of an API server job with four instance replicas:

- job: `api-server`
  - instance 1: `1.2.3.4:5670`
  - instance 2: `1.2.3.4:5671`
  - instance 3: `5.6.7.8:5670`
  - instance 4: `5.6.7.8:5671`
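The job/instance relationship can be pictured as a simple data structure. The following is a minimal Python sketch; the `Job` class is hypothetical and only illustrates that a job is a named group of `<host>:<port>` instance addresses. It says nothing about how Deckhouse Prom++ stores targets internally.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical model: a job is a named group of instances with the same purpose."""

    name: str
    # Each instance is identified by its "<host>:<port>" address.
    instances: list[str] = field(default_factory=list)

    def hosts(self) -> list[str]:
        """Return the distinct hosts that run replicas of this job."""
        return sorted({inst.rsplit(":", 1)[0] for inst in self.instances})


# The api-server example above: four replicas spread across two hosts.
api_server = Job(
    name="api-server",
    instances=["1.2.3.4:5670", "1.2.3.4:5671", "5.6.7.8:5670", "5.6.7.8:5671"],
)
print(len(api_server.instances), api_server.hosts())  # 4 ['1.2.3.4', '5.6.7.8']
```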

Automatically generated labels and time series

When Deckhouse Prom++ polls a target endpoint (the target), it automatically adds the following service labels to the collected time series to identify the data source:

- `job`: Name of the job the target belongs to, as set in the configuration.
- `instance`: The `<host>:<port>` part of the target URL from which the data was collected.
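As a rough illustration of where the two labels come from, the following Python sketch derives them from a job name and a target URL. The `target_labels` function is hypothetical; it assumes the URL carries an explicit port and no user info, and it does not model relabeling or any default-port handling Deckhouse Prom++ may perform.

```python
from urllib.parse import urlsplit


def target_labels(job_name: str, target_url: str) -> dict[str, str]:
    """Hypothetical helper: build the job/instance service labels for one target."""
    parts = urlsplit(target_url)
    # Sketch limitation: require an explicit port instead of guessing a default.
    if parts.port is None:
        raise ValueError(f"no explicit port in {target_url!r}")
    # With no user info in the URL, netloc is exactly "<host>:<port>".
    return {"job": job_name, "instance": parts.netloc}


print(target_labels("api-server", "http://1.2.3.4:5670/metrics"))
# {'job': 'api-server', 'instance': '1.2.3.4:5670'}
```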

If either of these labels (`job` or `instance`) is already present in the scraped data, the outcome is determined by the `honor_labels` setting. For details, refer to the data collection (scraping) configuration section.
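The exact rules are given in the scraping configuration section. As a minimal sketch, assuming Deckhouse Prom++ follows the upstream Prometheus semantics for `honor_labels`: when it is `true`, the scraped value wins and the server-side label is dropped; when it is `false`, the scraped label is kept as `exported_<name>` and the server-side value is attached. The `merge_labels` function is hypothetical and ignores edge cases such as an `exported_<name>` label that already exists.

```python
def merge_labels(
    scraped: dict[str, str], target: dict[str, str], honor_labels: bool
) -> dict[str, str]:
    """Hypothetical sketch of resolving service-label conflicts (Prometheus-style)."""
    result = dict(scraped)  # copy so the input is not mutated
    for name, value in target.items():
        if name in result:
            if honor_labels:
                continue  # keep the value exposed by the target, drop the server-side one
            # Preserve the scraped value under an "exported_" prefix.
            result["exported_" + name] = result[name]
        result[name] = value  # attach the server-side label
    return result


scraped = {"job": "batch-exporter", "env": "prod"}
target = {"job": "api-server", "instance": "1.2.3.4:5670"}
print(merge_labels(scraped, target, honor_labels=True))
# {'job': 'batch-exporter', 'env': 'prod', 'instance': '1.2.3.4:5670'}
print(merge_labels(scraped, target, honor_labels=False))
# {'job': 'api-server', 'env': 'prod', 'exported_job': 'batch-exporter', 'instance': '1.2.3.4:5670'}
```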

Each time Deckhouse Prom++ scrapes an instance, it also creates the following time series:

- `up{job="<job-name>", instance="<instance-id>"}`: `1` if the instance is available (the scrape succeeded), `0` if the scrape failed.
- `scrape_duration_seconds{job="<job-name>", instance="<instance-id>"}`: Scrape duration in seconds.
- `scrape_samples_post_metric_relabeling{job="<job-name>", instance="<instance-id>"}`: Number of samples remaining after metric relabeling rules were applied.
- `scrape_samples_scraped{job="<job-name>", instance="<instance-id>"}`: Total number of samples received from the target in this scrape.
- `scrape_series_added{job="<job-name>", instance="<instance-id>"}`: Approximate number of new time series added in this scrape.
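To make the per-scrape series listed above concrete, the following Python sketch renders all five for one instance in a Prometheus-like `name{labels} value` text form. The `ScrapeResult` class and `report_series` function are hypothetical and do not reflect Deckhouse Prom++ internals; the numbers in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class ScrapeResult:
    """Hypothetical summary of a single scrape of one instance."""

    success: bool
    duration_seconds: float
    samples_scraped: int  # samples received from the target
    samples_post_relabeling: int  # samples left after metric relabeling
    series_added: int  # approximate number of new series in this scrape


def report_series(job: str, instance: str, r: ScrapeResult) -> list[str]:
    """Render the per-scrape series as 'name{labels} value' lines."""
    values = {
        "up": 1 if r.success else 0,
        "scrape_duration_seconds": r.duration_seconds,
        "scrape_samples_post_metric_relabeling": r.samples_post_relabeling,
        "scrape_samples_scraped": r.samples_scraped,
        "scrape_series_added": r.series_added,
    }
    labels = f'job="{job}", instance="{instance}"'
    # "{{" and "}}" emit literal braces around the label set.
    return [f"{name}{{{labels}}} {value}" for name, value in values.items()]


result = ScrapeResult(True, 0.042, 1200, 1150, 3)
for line in report_series("api-server", "1.2.3.4:5670", result):
    print(line)
```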

The up time series is widely used to monitor instance availability.
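Assuming standard PromQL is available, this is typically done with expressions such as `up == 0` (instances whose last scrape failed) or `avg_over_time(up[1h])` (share of successful scrapes over an hour). The Python sketch below mimics both computations on plain data, purely to show the arithmetic; the function names and sample values are hypothetical.

```python
def down_instances(latest_up: dict[str, int]) -> list[str]:
    """Instances whose most recent up sample is 0 (compare: up == 0)."""
    return sorted(inst for inst, value in latest_up.items() if value == 0)


def availability(up_samples: list[int]) -> float:
    """Fraction of successful scrapes in a window (compare: avg_over_time(up[<window>]))."""
    if not up_samples:
        raise ValueError("no up samples in the window")
    return sum(up_samples) / len(up_samples)


latest = {"1.2.3.4:5670": 1, "1.2.3.4:5671": 0, "5.6.7.8:5670": 1, "5.6.7.8:5671": 1}
print(down_instances(latest))  # ['1.2.3.4:5671']
print(availability([1, 1, 0, 1]))  # 0.75
```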

If the `extra-scrape-metrics` flag is enabled, the following additional metrics are available:

- `scrape_timeout_seconds{job="<job-name>", instance="<instance-id>"}`: Configured `scrape_timeout` value for the target.
- `scrape_sample_limit{job="<job-name>", instance="<instance-id>"}`: Configured limit on the number of samples per scrape (`sample_limit`) for the target. If no limit is set, `0` is returned.
- `scrape_body_size_bytes{job="<job-name>", instance="<instance-id>"}`: Size of the most recent scrape response in bytes, if the scrape succeeded. If the scrape failed because the response exceeded `body_size_limit`, `-1` is returned; for any other failed scrape, `0` is returned.
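The value rules for these metrics, in particular the `0` and `-1` special cases, can be summarized in a short Python sketch. The `extra_scrape_metrics` function and its parameters are hypothetical; it only restates the rules described above and is not how Deckhouse Prom++ computes them.

```python
from typing import Optional


def extra_scrape_metrics(
    scrape_timeout_seconds: float,
    sample_limit: Optional[int],
    body_size_bytes: Optional[int],
    body_size_limit_exceeded: bool = False,
) -> dict[str, float]:
    """Hypothetical restatement of the extra-scrape-metrics value rules.

    body_size_bytes is None when the scrape failed.
    """
    if body_size_limit_exceeded:
        body_size = -1  # failed because the response exceeded body_size_limit
    elif body_size_bytes is None:
        body_size = 0  # any other failed scrape
    else:
        body_size = body_size_bytes  # successful scrape: size of the response
    return {
        "scrape_timeout_seconds": scrape_timeout_seconds,
        # No configured sample_limit is reported as 0.
        "scrape_sample_limit": sample_limit if sample_limit is not None else 0,
        "scrape_body_size_bytes": body_size,
    }


print(extra_scrape_metrics(10.0, None, 2048))
# {'scrape_timeout_seconds': 10.0, 'scrape_sample_limit': 0, 'scrape_body_size_bytes': 2048}
```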
