Prometheus monitoring: FAQ

How do I collect metrics from applications running outside of the cluster?

  1. Configure a Service similar to the one that collects metrics from your application (but do not set the spec.selector parameter).
  2. Create Endpoints for this Service and explicitly specify the IP:PORT pairs that your applications use to expose metrics.

     Port names in the Endpoints must match the port names in the Service.

An example:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: my-namespace
  labels:
    prometheus.deckhouse.io/custom-target: my-app
spec:
  ports:
  - name: http-metrics
    port: 9114
---
apiVersion: v1
kind: Endpoints
metadata:
  name: my-app
  namespace: my-namespace
subsets:
- addresses:
  - ip: 10.182.10.5
  ports:
  - name: http-metrics
    port: 9114
```

In this example, the application metrics are available without TLS at http://10.182.10.5:9114/metrics.

How do I create custom Grafana dashboards?

Custom Grafana dashboards can be added to the project using the Infrastructure as Code approach. To add your dashboard to Grafana, create the dedicated GrafanaDashboardDefinition custom resource in the cluster.

An example:

```yaml
apiVersion: deckhouse.io/v1
kind: GrafanaDashboardDefinition
metadata:
  name: my-dashboard
spec:
  folder: My folder # The folder where the custom dashboard will be located.
  definition: |
    {
      "annotations": {
        "list": [
          {
            "builtIn": 1,
            "datasource": "-- Grafana --",
            "enable": true,
            "hide": true,
            "iconColor": "rgba(0, 211, 255, 1)",
            "limit": 100,
    ...
```

Caution! System dashboards and dashboards added using GrafanaDashboardDefinition cannot be modified via the Grafana interface.

How do I add alerts and/or recording rules?

The CustomPrometheusRules resource allows you to add alerts.

Parameters:

  • groups is the only parameter; it defines the alert groups. The structure of the groups is the same as in prometheus-operator.

An example:

```yaml
apiVersion: deckhouse.io/v1
kind: CustomPrometheusRules
metadata:
  name: my-rules
spec:
  groups:
  - name: cluster-state-alert.rules
    rules:
    - alert: CephClusterErrorState
      annotations:
        description: Storage cluster is in error state for more than 10m.
        summary: Storage cluster is in error state
        plk_markup_format: markdown
      expr: |
        ceph_health_status{job="rook-ceph-mgr"} > 1
```

How do I provision additional Grafana data sources?

The GrafanaAdditionalDatasource resource allows you to provision additional Grafana data sources.

A detailed description of the resource parameters is available in the Grafana documentation.

See the datasource type in the documentation for the specific datasource.

An example:

```yaml
apiVersion: deckhouse.io/v1
kind: GrafanaAdditionalDatasource
metadata:
  name: another-prometheus
spec:
  type: prometheus
  access: Proxy
  url: https://another-prometheus.example.com/prometheus
  basicAuth: true
  basicAuthUser: foo
  jsonData:
    timeInterval: 30s
    httpMethod: POST
  secureJsonData:
    basicAuthPassword: bar
```

How do I enable secure access to metrics?

To enable secure access to metrics, we strongly recommend using kube-rbac-proxy.

An example of collecting metrics securely from an application inside a cluster

Do the following to protect your application's metrics with kube-rbac-proxy and then scrape them with Prometheus:

  1. Create a new ServiceAccount with the following permissions:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rbac-proxy-test
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: rbac-proxy-test
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: d8:rbac-proxy
subjects:
- kind: ServiceAccount
  name: rbac-proxy-test
  namespace: default
```

The example uses the d8:rbac-proxy built-in Deckhouse ClusterRole.

  2. Create a configuration for the kube-rbac-proxy:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: rbac-proxy-config-test
  namespace: rbac-proxy-test
data:
  config-file.yaml: |+
    authorization:
      resourceAttributes:
        namespace: default
        apiVersion: v1
        resource: services
        subresource: proxy
        name: rbac-proxy-test
```

Get more information on authorization attributes in the Kubernetes documentation.

  3. Create a Service and a Deployment for your application, with kube-rbac-proxy as a sidecar container:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: rbac-proxy-test
  labels:
    prometheus.deckhouse.io/custom-target: rbac-proxy-test
spec:
  ports:
  - name: https-metrics
    port: 8443
    targetPort: https-metrics
  selector:
    app: rbac-proxy-test
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rbac-proxy-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rbac-proxy-test
  template:
    metadata:
      labels:
        app: rbac-proxy-test
    spec:
      securityContext:
        runAsUser: 65532
      serviceAccountName: rbac-proxy-test
      containers:
      - name: kube-rbac-proxy
        image: quay.io/brancz/kube-rbac-proxy:v0.14.0
        args:
        - "--secure-listen-address=0.0.0.0:8443"
        - "--upstream=http://127.0.0.1:8081/"
        - "--config-file=/kube-rbac-proxy/config-file.yaml"
        - "--logtostderr=true"
        - "--v=10"
        ports:
        - containerPort: 8443
          name: https-metrics
        volumeMounts:
        - name: config
          mountPath: /kube-rbac-proxy
      - name: prometheus-example-app
        image: quay.io/brancz/prometheus-example-app:v0.1.0
        args:
        - "--bind=127.0.0.1:8081"
      volumes:
      - name: config
        configMap:
          name: rbac-proxy-config-test
```

  4. Add the necessary resource permissions to Prometheus:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: rbac-proxy-test-client
rules:
- apiGroups: [""]
  resources: ["services/proxy"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: rbac-proxy-test-client
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: rbac-proxy-test-client
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: d8-monitoring
```

After step 4, your application’s metrics should become available in Prometheus.

An example of collecting metrics securely from an application outside a cluster

Suppose there is a server exposed to the Internet with node-exporter running on it. By default, node-exporter listens on port 9100 on all interfaces. Access to node-exporter must be restricted so that its metrics can be collected securely. Below is an example of such a setup.

Requirements:

  • There must be network access from the cluster to the kube-rbac-proxy service running on the remote server.
  • The remote server must have access to the Kubernetes API server.

Follow these steps:

  1. Create a new ServiceAccount with the following permissions:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-external-endpoint-server-01
  namespace: d8-service-accounts
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-external-endpoint
rules:
- apiGroups: ["authentication.k8s.io"]
  resources:
  - tokenreviews
  verbs: ["create"]
- apiGroups: ["authorization.k8s.io"]
  resources:
  - subjectaccessreviews
  verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-external-endpoint-server-01
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-external-endpoint
subjects:
- kind: ServiceAccount
  name: prometheus-external-endpoint-server-01
  namespace: d8-service-accounts
```

  2. Generate a kubeconfig file for the created ServiceAccount (refer to the example on how to generate a kubeconfig for a ServiceAccount).
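A minimal sketch of how such a kubeconfig can be generated with plain kubectl, assuming kubectl 1.24 or newer (for kubectl create token); the API server address, output file name, and token lifetime are placeholders:

```shell
# Sketch only: issue a token for the ServiceAccount and wrap it in a standalone kubeconfig.
API_SERVER="https://<kube-apiserver-address>:6443"
TOKEN="$(kubectl -n d8-service-accounts create token prometheus-external-endpoint-server-01 --duration 8760h)"
kubectl config set-cluster cluster --server="${API_SERVER}" --insecure-skip-tls-verify=true --kubeconfig=server-01.kubeconfig
kubectl config set-credentials prometheus-external-endpoint-server-01 --token="${TOKEN}" --kubeconfig=server-01.kubeconfig
kubectl config set-context default --cluster=cluster --user=prometheus-external-endpoint-server-01 --kubeconfig=server-01.kubeconfig
kubectl config use-context default --kubeconfig=server-01.kubeconfig
```

For production use, prefer embedding the cluster CA certificate instead of --insecure-skip-tls-verify.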

  3. Copy the kubeconfig file to the remote server. You will also have to specify the kubeconfig path in the kube-rbac-proxy settings (this example uses ${PWD}/.kube/config).
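For example (the user and host are placeholders; adjust the destination path to match the path you mount into kube-rbac-proxy below):

```shell
scp server-01.kubeconfig user@server-01:.kube/config
```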
  4. Configure node-exporter on the remote server so that it is accessible only on the local interface (i.e., listening on 127.0.0.1:9100).
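A minimal sketch of that configuration, assuming node-exporter is started directly (adapt your systemd unit or service manager accordingly); --web.listen-address is a standard node_exporter flag:

```shell
# Bind node-exporter to the loopback interface only, so it is not reachable from outside.
node_exporter --web.listen-address=127.0.0.1:9100
```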
  5. Run kube-rbac-proxy on the remote server:

```shell
docker run --network host -d -v ${PWD}/.kube/config:/config quay.io/brancz/kube-rbac-proxy:v0.14.0 \
  --secure-listen-address=0.0.0.0:8443 --upstream=http://127.0.0.1:9100 --kubeconfig=/config --logtostderr=true --v=10
```

  6. Check that port 8443 is accessible at the remote server's external address.
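For example (the address is a placeholder; -k is needed because kube-rbac-proxy serves a self-signed certificate by default, and an unauthenticated request is expected to be rejected rather than return metrics):

```shell
curl -k https://<server_ip_address>:8443/metrics
```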

  7. Create a Service and Endpoints in the cluster, specifying the external address of the remote server as <server_ip_address>:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus-external-endpoint-server-01
  labels:
    prometheus.deckhouse.io/custom-target: prometheus-external-endpoint-server-01
spec:
  ports:
  - name: https-metrics
    port: 8443
---
apiVersion: v1
kind: Endpoints
metadata:
  name: prometheus-external-endpoint-server-01
subsets:
- addresses:
  - ip: <server_ip_address>
  ports:
  - name: https-metrics
    port: 8443
```

How do I add Alertmanager?

Create a custom resource CustomAlertmanager with type Internal.

Example:

```yaml
apiVersion: deckhouse.io/v1alpha1
kind: CustomAlertmanager
metadata:
  name: webhook
spec:
  type: Internal
  internal:
    route:
      groupBy: ['job']
      groupWait: 30s
      groupInterval: 5m
      repeatInterval: 12h
      receiver: 'webhook'
    receivers:
    - name: 'webhook'
      webhookConfigs:
      - url: 'http://webhookserver:8080/'
```

Refer to the description of the CustomAlertmanager custom resource for more information about the parameters.

How do I add an additional Alertmanager?

Create a custom resource CustomAlertmanager with the type External; it can point to an Alertmanager either by its FQDN or via a service in the Kubernetes cluster.

FQDN Alertmanager example:

```yaml
apiVersion: deckhouse.io/v1alpha1
kind: CustomAlertmanager
metadata:
  name: my-fqdn-alertmanager
spec:
  external:
    address: https://alertmanager.mycompany.com/myprefix
  type: External
```

Alertmanager with a Kubernetes service:

```yaml
apiVersion: deckhouse.io/v1alpha1
kind: CustomAlertmanager
metadata:
  name: my-service-alertmanager
spec:
  external:
    service:
      namespace: myns
      name: my-alertmanager
      path: /myprefix/
  type: External
```

Refer to the description of the CustomAlertmanager Custom Resource for more information about the parameters.

How do I ignore unnecessary alerts in Alertmanager?

The solution comes down to configuring alert routing in the Alertmanager.

You will need to:

  1. Create a parameterless receiver.
  2. Route unwanted alerts to this receiver.

Below are samples for configuring CustomAlertmanager.

Receive all alerts with labels service: foo|bar|baz:

```yaml
receivers:
# The parameterless receiver is similar to "/dev/null".
- name: blackhole
# Your valid receiver.
- name: some-other-receiver
# ...
route:
  # The default receiver.
  receiver: blackhole
  routes:
  # The child route.
  - matchers:
    - matchType: =~
      name: service
      value: ^(foo|bar|baz)$
    receiver: some-other-receiver
```

Receive all alerts except for DeadMansSwitch:

```yaml
receivers:
# The parameterless receiver is similar to "/dev/null".
- name: blackhole
# Your valid receiver.
- name: some-other-receiver
# ...
route:
  # The default receiver.
  receiver: some-other-receiver
  routes:
  # The child route.
  - matchers:
    - matchType: =
      name: alertname
      value: DeadMansSwitch
    receiver: blackhole
```

A detailed description of all parameters can be found in the official documentation.

Why can't different scrape intervals be set for individual targets?

The Prometheus developer Brian Brazil provides probably the most comprehensive answer to this question. In short, different scrape intervals are likely to cause the following complications:

  • Increasing configuration complexity;
  • Problems with writing queries and creating graphs;
  • Short intervals are more like profiling an app, and Prometheus isn’t the best tool to do this in most cases.

The most appropriate value for scrapeInterval is in the range of 10-60s.

How do I limit Prometheus resource consumption?

To avoid situations where VPA requests more resources for Prometheus or Longterm Prometheus than are available on the corresponding node, you can explicitly limit VPA using the following module parameters (a configuration sketch follows the list):

  • vpa.longtermMaxCPU
  • vpa.longtermMaxMemory
  • vpa.maxCPU
  • vpa.maxMemory
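A minimal sketch of applying these limits, assuming the prometheus module is configured through a ModuleConfig resource; the settings version and the values below are placeholders to adjust for your cluster:

```yaml
apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: prometheus
spec:
  version: 2  # settings schema version; check the one used in your cluster
  settings:
    vpa:
      maxCPU: "4"
      maxMemory: 16Gi
      longtermMaxCPU: "2"
      longtermMaxMemory: 8Gi
```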

How do I get access to Prometheus metrics from Lens?

Caution! Using this configuration creates a service in which Prometheus metrics are available without authorization.

To provide Lens access to Prometheus metrics, you need to create several resources in the cluster:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: lens-proxy
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-lens-proxy
  namespace: lens-proxy
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-lens-proxy:prometheus-access
rules:
- apiGroups: ["monitoring.coreos.com"]
  resources: ["prometheuses/http"]
  resourceNames: ["main", "longterm"]
  verbs: ["get", "create", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-lens-proxy:prometheus-access
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-lens-proxy:prometheus-access
subjects:
- kind: ServiceAccount
  name: prometheus-lens-proxy
  namespace: lens-proxy
---
apiVersion: v1
kind: Secret
metadata:
  name: prometheus-lens-proxy-sa
  namespace: lens-proxy
  annotations:
    kubernetes.io/service-account.name: prometheus-lens-proxy
type: kubernetes.io/service-account-token
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-lens-proxy-conf
  namespace: lens-proxy
data:
  "39-log-format.sh": |
    cat > /etc/nginx/conf.d/log-format.conf <<"EOF"
    log_format body '$remote_addr - $remote_user [$time_local] "$request" '
      '$status $body_bytes_sent "$http_referer" '
      '"$http_user_agent" "$http_x_forwarded_for"'
      ' req body: $request_body';
    EOF
  "40-prometheus-proxy-conf.sh": |
    #!/bin/sh
    prometheus_service="$(getent hosts prometheus.d8-monitoring | awk '{print $2}')"
    nameserver="$(awk '/nameserver/{print $2}' < /etc/resolv.conf)"
    cat > /etc/nginx/conf.d/prometheus.conf <<EOF
    server {
      listen 80 default_server;
      resolver ${nameserver} valid=30s;
      set $upstream ${prometheus_service};
      location / {
        proxy_http_version 1.1;
        proxy_set_header Authorization "Bearer ${BEARER_TOKEN}";
        proxy_pass https://$upstream:9090$request_uri;
      }
      access_log /dev/stdout body;
    }
    EOF
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-lens-proxy
  namespace: lens-proxy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-lens-proxy
  template:
    metadata:
      labels:
        app: prometheus-lens-proxy
    spec:
      containers:
      - name: nginx
        image: nginx:1.21.4-alpine
        env:
        - name: BEARER_TOKEN
          valueFrom:
            secretKeyRef:
              name: prometheus-lens-proxy-sa
              key: token
        ports:
        - containerPort: 80
        volumeMounts:
        - mountPath: /docker-entrypoint.d/40-prometheus-proxy-conf.sh
          subPath: "40-prometheus-proxy-conf.sh"
          name: prometheus-lens-proxy-conf
        - mountPath: /docker-entrypoint.d/39-log-format.sh
          name: prometheus-lens-proxy-conf
          subPath: 39-log-format.sh
      serviceAccountName: prometheus-lens-proxy
      volumes:
      - name: prometheus-lens-proxy-conf
        configMap:
          name: prometheus-lens-proxy-conf
          defaultMode: 0755
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-lens-proxy
  namespace: lens-proxy
spec:
  selector:
    app: prometheus-lens-proxy
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 80
```

After the resources are deployed, Prometheus metrics will be available at lens-proxy/prometheus-lens-proxy:8080. The Prometheus type in Lens is Prometheus Operator.

Starting with version 5.2.7, Lens requires the pod and namespace labels to be present on node-exporter metrics. Otherwise, node resource consumption will not be displayed on Lens charts.

To fix this, apply the following resource:

```yaml
apiVersion: deckhouse.io/v1
kind: CustomPrometheusRules
metadata:
  name: lens-hack
spec:
  groups:
  - name: lens-hack
    rules:
    - expr: node_cpu_seconds_total{mode=~"user|system", pod!~".+", namespace!~".+"} * on(node) group_left(namespace, pod) kube_pod_info{namespace="d8-monitoring", created_by_name="node-exporter"}
      record: node_cpu_seconds_total
    - expr: node_filesystem_size_bytes{mountpoint="/", pod!~".+", namespace!~".+"} * on(node) group_left(namespace, pod) kube_pod_info{namespace="d8-monitoring", created_by_name="node-exporter"}
      record: node_filesystem_size_bytes
    - expr: node_filesystem_avail_bytes{mountpoint="/", pod!~".+", namespace!~".+"} * on(node) group_left(namespace, pod) kube_pod_info{namespace="d8-monitoring", created_by_name="node-exporter"}
      record: node_filesystem_avail_bytes
    - expr: node_memory_MemTotal_bytes{pod!~".+", namespace!~".+"} * on(node) group_left(namespace, pod) kube_pod_info{namespace="d8-monitoring", created_by_name="node-exporter"}
      record: node_memory_MemTotal_bytes
    - expr: node_memory_MemFree_bytes{pod!~".+", namespace!~".+"} * on(node) group_left(namespace, pod) kube_pod_info{namespace="d8-monitoring", created_by_name="node-exporter"}
      record: node_memory_MemFree_bytes
    - expr: node_memory_Buffers_bytes{pod!~".+", namespace!~".+"} * on(node) group_left(namespace, pod) kube_pod_info{namespace="d8-monitoring", created_by_name="node-exporter"}
      record: node_memory_Buffers_bytes
    - expr: node_memory_Cached_bytes{pod!~".+", namespace!~".+"} * on(node) group_left(namespace, pod) kube_pod_info{namespace="d8-monitoring", created_by_name="node-exporter"}
      record: node_memory_Cached_bytes
```

How do I set up a ServiceMonitor or PodMonitor to work with Prometheus?

Add the prometheus: main label to the PodMonitor or ServiceMonitor. Add the label prometheus.deckhouse.io/monitor-watcher-enabled: "true" to the namespace where the PodMonitor or ServiceMonitor was created.

Example:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: frontend
  labels:
    prometheus.deckhouse.io/monitor-watcher-enabled: "true"
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: frontend
  labels:
    prometheus: main
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: web
```

How do I set up PrometheusRules to work with Prometheus?

Add the label prometheus.deckhouse.io/rules-watcher-enabled: "true" to the namespace where the PrometheusRules resources are created.

Example:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: frontend
  labels:
    prometheus.deckhouse.io/rules-watcher-enabled: "true"
```

How to expand disk size

  1. To request a larger volume for a PVC, edit the PVC object and specify a larger size in the spec.resources.requests.storage field (see the sketch after this list).
    • You can only expand a PVC if its StorageClass has the allowVolumeExpansion field set to true.
  2. If the storage does not support online resizing, the message Waiting for user to (re-)start a pod to finish file system resize of volume on node. will appear in the PersistentVolumeClaim status.
  3. Restart the Pod to complete the file system resizing.
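A minimal sketch of step 1 using kubectl patch; the namespace, PVC name, and size are placeholders, and the actual PVC names depend on your setup:

```shell
# Example only: substitute your own namespace, PVC name, and target size.
kubectl -n d8-monitoring patch pvc prometheus-main-db-prometheus-main-0 \
  --patch '{"spec": {"resources": {"requests": {"storage": "50Gi"}}}}'
```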

How to get information about alerts in a cluster?

You can get information about active alerts not only in the Grafana/Prometheus web interface but also in the CLI. This can be useful if you only have access to the cluster API server and cannot open the Grafana/Prometheus web interface.

Run the following command to get cluster alerts:

```shell
kubectl get clusteralerts
```

Example:

```shell
kubectl get clusteralerts
NAME               ALERT                                      SEVERITY   AGE     LAST RECEIVED   STATUS
086551aeee5b5b24   ExtendedMonitoringDeprecatatedAnnotation   4          3h25m   38s             firing
226d35c886464d6e   ExtendedMonitoringDeprecatatedAnnotation   4          3h25m   38s             firing
235d4efba7df6af4   D8SnapshotControllerPodIsNotReady          8          5d4h    44s             firing
27464763f0aa857c   D8PrometheusOperatorPodIsNotReady          7          5d4h    43s             firing
ab17837fffa5e440   DeadMansSwitch                             4          5d4h    41s             firing
```

Run the following command to view a specific alert:

```shell
kubectl get clusteralerts <alert-name> -o yaml
```

Example:

```shell
kubectl get clusteralerts 235d4efba7df6af4 -o yaml
alert:
  description: |
    The recommended course of action:
    1. Retrieve details of the Deployment: kubectl -n d8-snapshot-controller describe deploy snapshot-controller
    2. View the status of the Pod and try to figure out why it is not running: kubectl -n d8-snapshot-controller describe pod -l app=snapshot-controller
  labels:
    pod: snapshot-controller-75bd776d76-xhb2c
    prometheus: deckhouse
    tier: cluster
  name: D8SnapshotControllerPodIsNotReady
  severityLevel: "8"
  summary: The snapshot-controller Pod is NOT Ready.
apiVersion: deckhouse.io/v1alpha1
kind: ClusterAlert
metadata:
  creationTimestamp: "2023-05-15T14:24:08Z"
  generation: 1
  labels:
    app: prometheus
    heritage: deckhouse
  name: 235d4efba7df6af4
  resourceVersion: "36262598"
  uid: 817f83e4-d01a-4572-8659-0c0a7b6ca9e7
status:
  alertStatus: firing
  lastUpdateTime: "2023-05-15T18:10:09Z"
  startsAt: "2023-05-10T13:43:09Z"
```

Remember the special alert DeadMansSwitch — its presence in the cluster indicates that Prometheus is working.
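For example, a quick way to confirm that it is present:

```shell
kubectl get clusteralerts | grep DeadMansSwitch
```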

How do I add additional endpoints to a scrape config?

Add the label prometheus.deckhouse.io/scrape-configs-watcher-enabled: "true" to the namespace where the ScrapeConfig was created.

Example:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: frontend
  labels:
    prometheus.deckhouse.io/scrape-configs-watcher-enabled: "true"
```

Add the ScrapeConfig with the required label prometheus: main:

 

```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: example-scrape-config
  namespace: frontend
  labels:
    prometheus: main
spec:
  honorLabels: true
  staticConfigs:
  - targets: ['example-app.frontend.svc.{{ .Values.global.discovery.clusterDomain }}.:8080']
  relabelings:
  - regex: endpoint|namespace|pod|service
    action: labeldrop
  - targetLabel: scrape_endpoint
    replacement: main
  - targetLabel: job
    replacement: kube-state-metrics
  metricsPath: '/metrics'
```