How do I collect metrics from applications running outside of the cluster?

- Configure a Service similar to the one that collects metrics from your application, but do not set the spec.selector parameter.
- Create Endpoints for this Service, explicitly specifying the IP:PORT pairs at which your applications expose metrics.

Note that the port names in the Endpoints must match the port names in the Service.

An example:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: my-namespace
  labels:
    prometheus.deckhouse.io/custom-target: my-app
spec:
  ports:
  - name: http-metrics
    port: 9114
---
apiVersion: v1
kind: Endpoints
metadata:
  name: my-app
  namespace: my-namespace
subsets:
- addresses:
  - ip: 10.182.10.5
  ports:
  - name: http-metrics
    port: 9114
```
In this example, application metrics are available without TLS at http://10.182.10.5:9114/metrics.
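The port-name rule above can be checked mechanically. Below is a small illustrative sketch (a hypothetical helper, not part of Deckhouse) that verifies every port name in an Endpoints manifest is also declared in the Service:

```python
# Hypothetical helper for illustration: Prometheus discovers the scrape target
# by port name, so each Endpoints port name must exist among the Service ports.
def matching_port_names(service: dict, endpoints: dict) -> bool:
    service_ports = {p["name"] for p in service["spec"]["ports"]}
    endpoint_ports = {
        p["name"] for subset in endpoints["subsets"] for p in subset["ports"]
    }
    return endpoint_ports <= service_ports

# The manifests above, reduced to the fields the check needs.
service = {"spec": {"ports": [{"name": "http-metrics", "port": 9114}]}}
endpoints = {"subsets": [{"addresses": [{"ip": "10.182.10.5"}],
                          "ports": [{"name": "http-metrics", "port": 9114}]}]}
print(matching_port_names(service, endpoints))  # True
```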
How do I create custom Grafana dashboards?
Custom Grafana dashboards are added to Deckhouse using the Infrastructure as Code approach.
To have your dashboard appear in Grafana, create the dedicated GrafanaDashboardDefinition custom resource in the cluster.

An example:
```yaml
apiVersion: deckhouse.io/v1
kind: GrafanaDashboardDefinition
metadata:
  name: my-dashboard
spec:
  folder: My folder # The folder in which the dashboard will be displayed in Grafana.
  definition: |
    {
      "annotations": {
        "list": [
          {
            "builtIn": 1,
            "datasource": "-- Grafana --",
            "enable": true,
            "hide": true,
            "iconColor": "rgba(0, 211, 255, 1)",
            "limit": 100,
    ...
```
Caution! System dashboards and dashboards added via GrafanaDashboardDefinition cannot be modified through the Grafana interface.

How do I add alerts and/or recording rules?
The CustomPrometheusRules resource allows you to add alerts and recording rules.

Parameters:
- groups: the only parameter; it defines the alert groups. The structure of the groups is similar to that of prometheus-operator.

An example:
```yaml
apiVersion: deckhouse.io/v1
kind: CustomPrometheusRules
metadata:
  name: my-rules
spec:
  groups:
  - name: cluster-state-alert.rules
    rules:
    - alert: CephClusterErrorState
      annotations:
        description: Storage cluster is in error state for more than 10m.
        summary: Storage cluster is in error state
        plk_markup_format: markdown
      expr: |
        ceph_health_status{job="rook-ceph-mgr"} > 1
```
How do I provision additional Grafana data sources?
The GrafanaAdditionalDatasource resource allows you to provision additional Grafana data sources.

A detailed description of the resource parameters is available in the Grafana documentation. See the documentation for the specific datasource for its type.

An example:
```yaml
apiVersion: deckhouse.io/v1
kind: GrafanaAdditionalDatasource
metadata:
  name: another-prometheus
spec:
  type: prometheus
  access: Proxy
  url: https://another-prometheus.example.com/prometheus
  basicAuth: true
  basicAuthUser: foo
  jsonData:
    timeInterval: 30s
    httpMethod: POST
  secureJsonData:
    basicAuthPassword: bar
```
How do I enable secure access to metrics?

To enable secure access to metrics, we strongly recommend using kube-rbac-proxy.

An example of collecting metrics securely from an application inside a cluster

To protect the metrics of an application with kube-rbac-proxy and then scrape them with Prometheus, follow these steps:

- Create a ServiceAccount with the permissions listed below:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rbac-proxy-test
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: rbac-proxy-test
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: d8:rbac-proxy
subjects:
- kind: ServiceAccount
  name: rbac-proxy-test
  namespace: default
```
Note that the built-in Deckhouse ClusterRole d8:rbac-proxy is used here.

- Create a configuration for kube-rbac-proxy:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: rbac-proxy-config-test
  namespace: rbac-proxy-test
data:
  config-file.yaml: |+
    authorization:
      resourceAttributes:
        namespace: default
        apiVersion: v1
        resource: services
        subresource: proxy
        name: rbac-proxy-test
```
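Conceptually, kube-rbac-proxy authorizes each incoming request by submitting a SubjectAccessReview built from the configured resourceAttributes to the Kubernetes API server. The sketch below is a rough illustration of that request body (simplified: the verb is fixed to get), not the proxy's actual implementation:

```python
# Illustrative sketch only: a rough model of the SubjectAccessReview that
# kube-rbac-proxy constructs from the resourceAttributes in its config file.
# The real proxy derives the verb from the HTTP method; "get" is assumed here.
def subject_access_review(user: str, resource_attributes: dict) -> dict:
    return {
        "apiVersion": "authorization.k8s.io/v1",
        "kind": "SubjectAccessReview",
        "spec": {
            "user": user,
            "resourceAttributes": {"verb": "get", **resource_attributes},
        },
    }

# Attributes matching the ConfigMap above (apiVersion maps to "version" in SAR).
attrs = {
    "namespace": "default",
    "version": "v1",
    "resource": "services",
    "subresource": "proxy",
    "name": "rbac-proxy-test",
}
review = subject_access_review(
    "system:serviceaccount:d8-monitoring:prometheus", attrs)
print(review["spec"]["resourceAttributes"]["subresource"])  # proxy
```

This is why, in step 4 below, Prometheus's ServiceAccount is granted get on services/proxy.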
More information about the authorization attributes is available in the Kubernetes documentation.

- Create a Service and a Deployment for your application, with kube-rbac-proxy acting as a sidecar container:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: rbac-proxy-test
  labels:
    prometheus.deckhouse.io/custom-target: rbac-proxy-test
spec:
  ports:
  - name: https-metrics
    port: 8443
    targetPort: https-metrics
  selector:
    app: rbac-proxy-test
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rbac-proxy-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rbac-proxy-test
  template:
    metadata:
      labels:
        app: rbac-proxy-test
    spec:
      securityContext:
        runAsUser: 65532
      serviceAccountName: rbac-proxy-test
      containers:
      - name: kube-rbac-proxy
        image: quay.io/brancz/kube-rbac-proxy:v0.14.0
        args:
        - "--secure-listen-address=0.0.0.0:8443"
        - "--upstream=http://127.0.0.1:8081/"
        - "--config-file=/kube-rbac-proxy/config-file.yaml"
        - "--logtostderr=true"
        - "--v=10"
        ports:
        - containerPort: 8443
          name: https-metrics
        volumeMounts:
        - name: config
          mountPath: /kube-rbac-proxy
      - name: prometheus-example-app
        image: quay.io/brancz/prometheus-example-app:v0.1.0
        args:
        - "--bind=127.0.0.1:8081"
      volumes:
      - name: config
        configMap:
          name: rbac-proxy-config-test
```
- Assign the necessary permissions on the resource to Prometheus:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: rbac-proxy-test-client
rules:
- apiGroups: [""]
  resources: ["services/proxy"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: rbac-proxy-test-client
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: rbac-proxy-test-client
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: d8-monitoring
```
After step 4, your application's metrics should become available in Prometheus.

An example of collecting metrics securely from an application outside a cluster

Suppose there is a server exposed to the Internet on which node-exporter is running. By default, node-exporter listens on port 9100 and is available on all interfaces. Access to node-exporter must be controlled so that metrics can be collected securely. Below is an example of such a setup.

Requirements:
- The kube-rbac-proxy service running on the remote server must be reachable from the cluster.
- The remote server must have access to the cluster's API server.
Follow these steps:
- Create a ServiceAccount with the permissions listed below:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-external-endpoint-server-01
  namespace: d8-service-accounts
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-external-endpoint
rules:
- apiGroups: ["authentication.k8s.io"]
  resources:
  - tokenreviews
  verbs: ["create"]
- apiGroups: ["authorization.k8s.io"]
  resources:
  - subjectaccessreviews
  verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-external-endpoint-server-01
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-external-endpoint
subjects:
- kind: ServiceAccount
  name: prometheus-external-endpoint-server-01
  namespace: d8-service-accounts
```
- Generate a kubeconfig file for the created ServiceAccount (refer to the example of generating a kubeconfig for a ServiceAccount).
- Copy the resulting kubeconfig to the remote server. You will need to specify the path to this kubeconfig in the kube-rbac-proxy settings (this example uses ${PWD}/.kube/config).
- Configure node-exporter on the remote server so that it is only available on the local interface (listening on 127.0.0.1:9100).
- Run kube-rbac-proxy on the remote server:
```shell
docker run --network host -d -v ${PWD}/.kube/config:/config \
  quay.io/brancz/kube-rbac-proxy:v0.14.0 --secure-listen-address=0.0.0.0:8443 \
  --upstream=http://127.0.0.1:9100 --kubeconfig=/config --logtostderr=true --v=10
```
- Check that port 8443 is accessible at the remote server's external address.
- Create a Service and an Endpoints object in the cluster, specifying the external address of the remote server as <server_ip_address>:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus-external-endpoint-server-01
  labels:
    prometheus.deckhouse.io/custom-target: prometheus-external-endpoint-server-01
spec:
  ports:
  - name: https-metrics
    port: 8443
---
apiVersion: v1
kind: Endpoints
metadata:
  name: prometheus-external-endpoint-server-01
subsets:
- addresses:
  - ip: <server_ip_address>
  ports:
  - name: https-metrics
    port: 8443
```
How do I add Alertmanager?

Create a CustomAlertmanager custom resource with type Internal.

An example:
```yaml
apiVersion: deckhouse.io/v1alpha1
kind: CustomAlertmanager
metadata:
  name: webhook
spec:
  type: Internal
  internal:
    route:
      groupBy: ['job']
      groupWait: 30s
      groupInterval: 5m
      repeatInterval: 12h
      receiver: 'webhook'
    receivers:
    - name: 'webhook'
      webhookConfigs:
      - url: 'http://webhookserver:8080/'
```
Refer to the description of the CustomAlertmanager custom resource for details about all the parameters.

How do I add an additional external Alertmanager?
Create a CustomAlertmanager custom resource with type External; it can point to an Alertmanager either by FQDN or via a service in the Kubernetes cluster.

An FQDN Alertmanager example:
```yaml
apiVersion: deckhouse.io/v1alpha1
kind: CustomAlertmanager
metadata:
  name: my-fqdn-alertmanager
spec:
  external:
    address: https://alertmanager.mycompany.com/myprefix
  type: External
```
An example of Alertmanager with a Kubernetes service:
```yaml
apiVersion: deckhouse.io/v1alpha1
kind: CustomAlertmanager
metadata:
  name: my-service-alertmanager
spec:
  external:
    service:
      namespace: myns
      name: my-alertmanager
      path: /myprefix/
  type: External
```
Refer to the description of the CustomAlertmanager custom resource for details about all the parameters.

How do I ignore unnecessary alerts in Alertmanager?
The solution comes down to configuring alert routing in your Alertmanager.

You will need to:
- Create a receiver without parameters.
- Route the unwanted alerts to this receiver.

Below are sample CustomAlertmanager configurations.

To receive only alerts with the labels service: foo|bar|baz:
```yaml
receivers:
# The receiver defined without parameters works like "/dev/null".
- name: blackhole
# The actual receiver.
- name: some-other-receiver
# ...
route:
  # The default receiver.
  receiver: blackhole
  routes:
  # The child route.
  - matchers:
    - matchType: =~
      name: service
      value: ^(foo|bar|baz)$
    receiver: some-other-receiver
```
To receive all alerts except DeadMansSwitch:
```yaml
receivers:
# The receiver defined without parameters works like "/dev/null".
- name: blackhole
# The actual receiver.
- name: some-other-receiver
# ...
route:
  # The default receiver.
  receiver: some-other-receiver
  routes:
  # The child route.
  - matchers:
    - matchType: =
      name: alertname
      value: DeadMansSwitch
    receiver: blackhole
```
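The two configurations above can be read as simple routing functions. The sketch below (illustrative Python, not Alertmanager code) mimics how each example picks a receiver: matchType =~ is an anchored regular-expression match, matchType = is an exact match, and alerts that match no child route go to the default receiver:

```python
import re

# First example: default receiver "blackhole"; the child route keeps only
# alerts whose "service" label matches ^(foo|bar|baz)$.
def route_keep_services(labels: dict) -> str:
    if re.fullmatch(r"^(foo|bar|baz)$", labels.get("service", "")):
        return "some-other-receiver"
    return "blackhole"

# Second example: default receiver "some-other-receiver"; the child route
# sends DeadMansSwitch to the parameterless "blackhole" receiver.
def route_drop_dms(labels: dict) -> str:
    if labels.get("alertname") == "DeadMansSwitch":
        return "blackhole"
    return "some-other-receiver"

print(route_keep_services({"service": "foo"}))          # some-other-receiver
print(route_keep_services({"service": "qux"}))          # blackhole
print(route_drop_dms({"alertname": "DeadMansSwitch"}))  # blackhole
```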
A detailed description of all the parameters is available in the official Alertmanager documentation.

Why can't different scrape intervals be set for individual targets?
Prometheus developer Brian Brazil gives probably the most comprehensive answer to this question.
In short, different scrape intervals bring the following problems:
- increased configuration complexity;
- problems with writing queries and creating graphs;
- short intervals are more like profiling an application, and Prometheus is most likely not the right tool for that.

The most reasonable value for scrapeInterval is in the range of 10-60 seconds.
How do I limit Prometheus resource consumption?
To avoid situations when VPA requests more resources for Prometheus or Longterm Prometheus than are available on the corresponding node, you can explicitly limit VPA using the module parameters:
- vpa.longtermMaxCPU
- vpa.longtermMaxMemory
- vpa.maxCPU
- vpa.maxMemory
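For instance, assuming the module is configured via a ModuleConfig resource (check the configuration format and settings schema version for your Deckhouse release; the values below are placeholders, not recommendations):

```yaml
apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: prometheus
spec:
  version: 2  # Settings schema version; check the module documentation.
  settings:
    vpa:
      maxCPU: "4"
      maxMemory: 8Gi
      longtermMaxCPU: "2"
      longtermMaxMemory: 4Gi
```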
How do I get access to Prometheus metrics from Lens?

Caution! Using this configuration creates a service in which Prometheus metrics are available without authorization.

To provide Lens with access to Prometheus metrics, create the following resources in the cluster:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: lens-proxy
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-lens-proxy
  namespace: lens-proxy
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-lens-proxy:prometheus-access
rules:
- apiGroups: ["monitoring.coreos.com"]
  resources: ["prometheuses/http"]
  resourceNames: ["main", "longterm"]
  verbs: ["get", "create", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-lens-proxy:prometheus-access
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-lens-proxy:prometheus-access
subjects:
- kind: ServiceAccount
  name: prometheus-lens-proxy
  namespace: lens-proxy
---
apiVersion: v1
kind: Secret
metadata:
  name: prometheus-lens-proxy-sa
  namespace: lens-proxy
  annotations:
    kubernetes.io/service-account.name: prometheus-lens-proxy
type: kubernetes.io/service-account-token
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-lens-proxy-conf
  namespace: lens-proxy
data:
  "39-log-format.sh": |
    cat > /etc/nginx/conf.d/log-format.conf <<"EOF"
    log_format body '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"'
                    ' req body: $request_body';
    EOF
  "40-prometheus-proxy-conf.sh": |
    #!/bin/sh
    prometheus_service="$(getent hosts prometheus.d8-monitoring | awk '{print $2}')"
    nameserver="$(awk '/nameserver/{print $2}' < /etc/resolv.conf)"
    cat > /etc/nginx/conf.d/prometheus.conf <<EOF
    server {
      listen 80 default_server;
      resolver ${nameserver} valid=30s;
      set \$upstream ${prometheus_service};
      location / {
        proxy_http_version 1.1;
        proxy_set_header Authorization "Bearer ${BEARER_TOKEN}";
        proxy_pass https://\$upstream:9090\$request_uri;
      }
      access_log /dev/stdout body;
    }
    EOF
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-lens-proxy
  namespace: lens-proxy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-lens-proxy
  template:
    metadata:
      labels:
        app: prometheus-lens-proxy
    spec:
      containers:
      - name: nginx
        image: nginx:1.21.4-alpine
        env:
        - name: BEARER_TOKEN
          valueFrom:
            secretKeyRef:
              name: prometheus-lens-proxy-sa
              key: token
        ports:
        - containerPort: 80
        volumeMounts:
        - mountPath: /docker-entrypoint.d/40-prometheus-proxy-conf.sh
          subPath: "40-prometheus-proxy-conf.sh"
          name: prometheus-lens-proxy-conf
        - mountPath: /docker-entrypoint.d/39-log-format.sh
          name: prometheus-lens-proxy-conf
          subPath: 39-log-format.sh
      serviceAccountName: prometheus-lens-proxy
      volumes:
      - name: prometheus-lens-proxy-conf
        configMap:
          name: prometheus-lens-proxy-conf
          defaultMode: 0755
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-lens-proxy
  namespace: lens-proxy
spec:
  selector:
    app: prometheus-lens-proxy
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 80
```
After the resources are deployed, Prometheus metrics will be available at lens-proxy/prometheus-lens-proxy:8080.
The Prometheus type in Lens is Prometheus Operator.
Starting with version 5.2.7, Lens requires the pod and namespace labels to be present on node-exporter metrics.
Otherwise, node resource consumption will not be displayed on Lens charts.

To fix this, apply the following resource:
```yaml
apiVersion: deckhouse.io/v1
kind: CustomPrometheusRules
metadata:
  name: lens-hack
spec:
  groups:
  - name: lens-hack
    rules:
    - expr: node_cpu_seconds_total{mode=~"user|system", pod!~".+", namespace!~".+"}
        * on(node) group_left(namespace, pod) kube_pod_info{namespace="d8-monitoring",
        created_by_name="node-exporter"}
      record: node_cpu_seconds_total
    - expr: node_filesystem_size_bytes{mountpoint="/", pod!~".+", namespace!~".+"}
        * on(node) group_left(namespace, pod) kube_pod_info{namespace="d8-monitoring",
        created_by_name="node-exporter"}
      record: node_filesystem_size_bytes
    - expr: node_filesystem_avail_bytes{mountpoint="/", pod!~".+", namespace!~".+"}
        * on(node) group_left(namespace, pod) kube_pod_info{namespace="d8-monitoring",
        created_by_name="node-exporter"}
      record: node_filesystem_avail_bytes
    - expr: node_memory_MemTotal_bytes{pod!~".+", namespace!~".+"} * on(node) group_left(namespace,
        pod) kube_pod_info{namespace="d8-monitoring", created_by_name="node-exporter"}
      record: node_memory_MemTotal_bytes
    - expr: node_memory_MemFree_bytes{pod!~".+", namespace!~".+"} * on(node) group_left(namespace,
        pod) kube_pod_info{namespace="d8-monitoring", created_by_name="node-exporter"}
      record: node_memory_MemFree_bytes
    - expr: node_memory_Buffers_bytes{pod!~".+", namespace!~".+"} * on(node) group_left(namespace,
        pod) kube_pod_info{namespace="d8-monitoring", created_by_name="node-exporter"}
      record: node_memory_Buffers_bytes
    - expr: node_memory_Cached_bytes{pod!~".+", namespace!~".+"} * on(node) group_left(namespace,
        pod) kube_pod_info{namespace="d8-monitoring", created_by_name="node-exporter"}
      record: node_memory_Cached_bytes
```
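The recording rules above use a common PromQL pattern: multiplying a series by an "info" metric whose value is 1 (here kube_pod_info) leaves the value unchanged, while group_left copies the namespace and pod labels onto the result. A toy model of this one-to-one join (illustrative Python; it does not reproduce full PromQL semantics):

```python
# Toy model of: metric * on(node) group_left(namespace, pod) info_metric
# Joining on "node" and copying the listed labels from the info metric.
def join_info(metric: dict, info: dict, on: str, copy: tuple) -> dict:
    assert metric[on] == info[on], "series must share the join label"
    labels = {k: v for k, v in metric.items() if k != "value"}
    labels.update({k: info[k] for k in copy})
    # Multiplying by an info metric (value 1) keeps the value unchanged.
    return {**labels, "value": metric["value"] * info["value"]}

sample = {"node": "worker-1", "mode": "user", "value": 1234.5}
pod_info = {"node": "worker-1", "namespace": "d8-monitoring",
            "pod": "node-exporter-abcde", "value": 1}
joined = join_info(sample, pod_info, on="node", copy=("namespace", "pod"))
print(joined["value"], joined["pod"])  # 1234.5 node-exporter-abcde
```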
| To fix this, apply the following resource:
|
|
|
| yaml
apiVersion: deckhouse.io/v1
kind: CustomPrometheusRules
metadata:
  name: lens-hack
spec:
  groups:
  - name: lens-hack
    rules:
    - expr: node_cpu_seconds_total{mode=~"user|system", pod!~".+", namespace!~".+"} * on(node) group_left(namespace, pod) kube_pod_info{namespace="d8-monitoring", created_by_name="node-exporter"}
      record: node_cpu_seconds_total
    - expr: node_filesystem_size_bytes{mountpoint="/", pod!~".+", namespace!~".+"} * on(node) group_left(namespace, pod) kube_pod_info{namespace="d8-monitoring", created_by_name="node-exporter"}
      record: node_filesystem_size_bytes
    - expr: node_filesystem_avail_bytes{mountpoint="/", pod!~".+", namespace!~".+"} * on(node) group_left(namespace, pod) kube_pod_info{namespace="d8-monitoring", created_by_name="node-exporter"}
      record: node_filesystem_avail_bytes
    - expr: node_memory_MemTotal_bytes{pod!~".+", namespace!~".+"} * on(node) group_left(namespace, pod) kube_pod_info{namespace="d8-monitoring", created_by_name="node-exporter"}
      record: node_memory_MemTotal_bytes
    - expr: node_memory_MemFree_bytes{pod!~".+", namespace!~".+"} * on(node) group_left(namespace, pod) kube_pod_info{namespace="d8-monitoring", created_by_name="node-exporter"}
      record: node_memory_MemFree_bytes
    - expr: node_memory_Buffers_bytes{pod!~".+", namespace!~".+"} * on(node) group_left(namespace, pod) kube_pod_info{namespace="d8-monitoring", created_by_name="node-exporter"}
      record: node_memory_Buffers_bytes
    - expr: node_memory_Cached_bytes{pod!~".+", namespace!~".+"} * on(node) group_left(namespace, pod) kube_pod_info{namespace="d8-monitoring", created_by_name="node-exporter"}
      record: node_memory_Cached_bytes
|
| How do I set up a ServiceMonitor or PodMonitor to work with Prometheus?
|
| Add the prometheus: main label to the PodMonitor or ServiceMonitor.
Add the label prometheus.deckhouse.io/monitor-watcher-enabled: "true" to the namespace where the PodMonitor or ServiceMonitor was created.
|
| Example:
|
| yaml
apiVersion: v1
kind: Namespace
metadata:
  name: frontend
  labels:
    prometheus.deckhouse.io/monitor-watcher-enabled: "true"
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: frontend
  labels:
    prometheus: main
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
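The endpoints section above is truncated in the source. A typical entry might look like the following sketch (the port name here is a hypothetical one; it must match a named port defined in the Service):

```yaml
  endpoints:
  - port: http-metrics   # hypothetical named port of the Service
    interval: 30s        # optional: scrape interval
```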
|
| How do I set up PrometheusRules to work with Prometheus?
|
| Add the label prometheus.deckhouse.io/rules-watcher-enabled: "true" to the namespace where the PrometheusRules resource was created.
|
| Example:
|
| yaml
apiVersion: v1
kind: Namespace
metadata:
  name: frontend
  labels:
    prometheus.deckhouse.io/rules-watcher-enabled: "true"
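The example above only labels the namespace. For completeness, a minimal PrometheusRule resource that could live in it might look like this sketch (the resource name, alert name, and expression are illustrative assumptions, not taken from the Deckhouse documentation):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-rules        # hypothetical name
  namespace: frontend
spec:
  groups:
  - name: example.rules
    rules:
    - alert: ExampleAppDown  # illustrative alert
      expr: up{job="example-app"} == 0
      for: 5m
      annotations:
        summary: example-app target is down
```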
|
How do I get information about alerts in a cluster?
| How do I expand the disk size?
|
|
- To request a larger volume for a PVC, edit the PVC object and specify a larger size in the
spec.resources.requests.storage field.
- You can only expand a PVC if its storage class's
allowVolumeExpansion field is set to true .
- If the storage doesn't support online resizing, the message
Waiting for user to (re-)start a pod to finish file system resize of volume on node. will appear in the PersistentVolumeClaim status.
- Restart the Pod to complete the file system resizing.
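The resize steps above amount to changing a single field; a sketch of the edited PVC manifest (the PVC name, namespace, storage class, and size here are assumptions — substitute your own values):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-data          # hypothetical PVC name
  namespace: my-namespace    # hypothetical namespace
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 40Gi          # the new, larger size
  storageClassName: expandable-sc  # must have allowVolumeExpansion: true
```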
|
| You can get information about active alerts not only in the Grafana/Prometheus web interface but also in the CLI. This can be useful if you only have access to the cluster API server and cannot open the Grafana/Prometheus web interface.
|
| Run the following command to get cluster alerts:
|
| shell
kubectl get clusteralerts
|
| Example:
|
| shell
|
| kubectl get clusteralerts
NAME ALERT SEVERITY AGE LAST RECEIVED STATUS
086551aeee5b5b24 ExtendedMonitoringDeprecatatedAnnotation 4 3h25m 38s firing
226d35c886464d6e ExtendedMonitoringDeprecatatedAnnotation 4 3h25m 38s firing
235d4efba7df6af4 D8SnapshotControllerPodIsNotReady 8 5d4h 44s firing
27464763f0aa857c D8PrometheusOperatorPodIsNotReady 7 5d4h 43s firing
ab17837fffa5e440 DeadMansSwitch 4 5d4h 41s firing
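Because the listing is plain columnar output, it can be post-processed with standard tools. A sketch, with sample lines inlined so it runs standalone (normally you would pipe `kubectl get clusteralerts --no-headers` instead):

```shell
# Write a sample of the listing shown above to a file.
cat > /tmp/alerts.txt <<'EOF'
086551aeee5b5b24 ExtendedMonitoringDeprecatatedAnnotation 4 3h25m 38s firing
235d4efba7df6af4 D8SnapshotControllerPodIsNotReady 8 5d4h 44s firing
27464763f0aa857c D8PrometheusOperatorPodIsNotReady 7 5d4h 43s firing
EOF

# Column 3 is SEVERITY; print names of alerts with severity level >= 7.
awk '$3 >= 7 {print $2}' /tmp/alerts.txt
# -> D8SnapshotControllerPodIsNotReady
# -> D8PrometheusOperatorPodIsNotReady
```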
|
| Run the following command to view a specific alert:
|
| shell
kubectl get clusteralerts -o yaml
|
| Example:
|
| shell
|
| kubectl get clusteralerts 235d4efba7df6af4 -o yaml
alert:
  description: |
    The recommended course of action:
    - Retrieve details of the Deployment:
      kubectl -n d8-snapshot-controller describe deploy snapshot-controller
    - View the status of the Pod and try to figure out why it is not running:
      kubectl -n d8-snapshot-controller describe pod -l app=snapshot-controller
  labels:
    pod: snapshot-controller-75bd776d76-xhb2c
    prometheus: deckhouse
    tier: cluster
  name: D8SnapshotControllerPodIsNotReady
  severityLevel: "8"
  summary: The snapshot-controller Pod is NOT Ready.
apiVersion: deckhouse.io/v1alpha1
kind: ClusterAlert
metadata:
  creationTimestamp: "2023-05-15T14:24:08Z"
  generation: 1
  labels:
    app: prometheus
    heritage: deckhouse
  name: 235d4efba7df6af4
  resourceVersion: "36262598"
  uid: 817f83e4-d01a-4572-8659-0c0a7b6ca9e7
status:
  alertStatus: firing
  lastUpdateTime: "2023-05-15T18:10:09Z"
  startsAt: "2023-05-10T13:43:09Z"
|
| Remember the special DeadMansSwitch alert: its presence in the cluster indicates that Prometheus is working.
|
| How do I add additional endpoints to a scrape config?
|
| Add the label prometheus.deckhouse.io/scrape-configs-watcher-enabled: "true" to the namespace where the ScrapeConfig was created.
|
| Example:
|
| yaml
apiVersion: v1
kind: Namespace
metadata:
  name: frontend
  labels:
    prometheus.deckhouse.io/scrape-configs-watcher-enabled: "true"
|
| Add the ScrapeConfig with the required label prometheus: main :
|
| yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: example-scrape-config
  namespace: frontend
  labels:
    prometheus: main
spec:
  honorLabels: true
  staticConfigs:
  - targets: ['example-app.frontend.svc.{{ .Values.global.discovery.clusterDomain }}.:8080']
  relabelings:
  - regex: endpoint|namespace|pod|service
    action: labeldrop
  - targetLabel: scrape_endpoint
    replacement: main
  - targetLabel: job
    replacement: kube-state-metrics
  metricsPath: '/metrics'
|
|
|