The module has 58 alerts.
The module is enabled by default in the following bundles: Default, Managed.
The module is disabled by default in the Minimal bundle.

The module is configured using the ModuleConfig custom resource named extended-monitoring (learn more about setting up Deckhouse…).
Example of the ModuleConfig/extended-monitoring resource for configuring the module:

```yaml
apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: extended-monitoring
spec:
  version: 2
  enabled: true
  settings: # <-- Module parameters from the "Parameters" section below.
```
Parameters
Schema version: 2
- certificates
  Settings for monitoring the certificates in the Kubernetes cluster.
- certificates.exporterEnabled
  Enables certExporter.
  Default: false
- events
  Settings for monitoring the events in the Kubernetes cluster.
- events.exporterEnabled
  Enables eventsExporter.
  Default: false
- events.severityLevel
  Whether to expose only crucial events.
  Default: "OnlyWarnings"
  Allowed values: All, OnlyWarnings
- imageAvailability
  Settings for monitoring the availability of images in the cluster.
- imageAvailability.exporterEnabled
  Enables imageAvailabilityExporter.
  Default: true
- imageAvailability.forceCheckDisabledControllers
  A list of controller kinds for which image availability is forcibly checked, even when the workloads are disabled or suspended. Specify All to check all controller kinds.
  Example:

  ```yaml
  forceCheckDisabledControllers:
    - Deployment
    - StatefulSet
  ```

  Allowed values (for each element of the array): Deployment, StatefulSet, DaemonSet, CronJob, All
- imageAvailability.ignoredImages
  A list of images to ignore when checking their presence in the registry, e.g., alpine:3.12 or quay.io/test/test:v1.1.
  Example:

  ```yaml
  ignoredImages:
    - alpine:3.10
    - alpine:3.2
  ```
- imageAvailability.mirrors
  List of mirrors for container registries.
  Example:

  ```yaml
  mirrors:
    - original: docker.io
      mirror: mirror.gcr.io
    - original: internal-registry.com
      mirror: mirror.internal-registry.com
  ```

- imageAvailability.mirrors.mirror
  Required value.
- imageAvailability.mirrors.original
  Required value.
- imageAvailability.registry
  Connection settings for the container registry.
- imageAvailability.registry.scheme
  Container registry access scheme.
  Default: "HTTPS"
  Allowed values: HTTP, HTTPS
- imageAvailability.registry.tlsConfig
  TLS settings for connecting to the container registry.
- imageAvailability.registry.tlsConfig.ca
  Root CA certificate to validate the container registry’s HTTPS certificate (if self-signed certificates are used).
- imageAvailability.registry.tlsConfig.insecureSkipVerify
  Whether to skip verification of the container registry certificate.
  Default: false
- nodeSelector
  The same as in the pods’ spec.nodeSelector parameter in Kubernetes. If the parameter is omitted or false, it will be determined automatically.
- tolerations
  The same as in the pods’ spec.tolerations parameter in Kubernetes. If the parameter is omitted or false, it will be determined automatically.
- tolerations.effect
- tolerations.key
- tolerations.operator
- tolerations.tolerationSeconds
- tolerations.value
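
Putting it together, a ModuleConfig with several of the parameters above filled in might look like the following sketch; the values are illustrative, not recommendations:

```yaml
apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: extended-monitoring
spec:
  version: 2
  enabled: true
  settings:
    certificates:
      exporterEnabled: true
    events:
      exporterEnabled: true
      severityLevel: OnlyWarnings
    imageAvailability:
      exporterEnabled: true
      ignoredImages:
        - alpine:3.10
      registry:
        scheme: HTTPS
        tlsConfig:
          insecureSkipVerify: false
```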
How to use extended-monitoring-exporter

Attach the extended-monitoring.deckhouse.io/enabled label to a Namespace to enable the export of extended monitoring metrics. You can do it by:

- adding the appropriate helm chart to the project (recommended method; see the Namespace manifest example below);
- adding it to .gitlab-ci.yml (kubectl patch/create);
- attaching it manually (kubectl label namespace my-app-production extended-monitoring.deckhouse.io/enabled="");
- configuring it via the namespace-configurator module.
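
For example, the label can be set directly in a Namespace manifest; this is a minimal sketch reusing the my-app-production namespace name from the command above:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-app-production
  labels:
    # Enables export of extended monitoring metrics for objects in this namespace.
    extended-monitoring.deckhouse.io/enabled: ""
```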
Any of the methods above results in the default metrics (plus any custom metrics with the threshold.extended-monitoring.deckhouse.io/ prefix) appearing for all supported Kubernetes objects in the target namespace. Note that monitoring is enabled automatically for a number of non-namespaced Kubernetes objects described below.

You can also add custom threshold.extended-monitoring.deckhouse.io/something labels with the desired value to Kubernetes objects, e.g., kubectl label pod test threshold.extended-monitoring.deckhouse.io/disk-inodes-warning=30. In this case, the label value replaces the default one.

You can disable monitoring on a per-object basis by adding the extended-monitoring.deckhouse.io/enabled=false label to it. This also disables the default labels (as well as label-based alerts).
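
For illustration, here are the same two overrides as manifest fragments; only the relevant metadata is shown, the pod name test comes from the command above, and test-excluded is a hypothetical name:

```yaml
# Fragment: the "30" value overrides the default disk-inodes-warning threshold for this Pod.
apiVersion: v1
kind: Pod
metadata:
  name: test
  labels:
    threshold.extended-monitoring.deckhouse.io/disk-inodes-warning: "30"
# ... the rest of the Pod spec is omitted
---
# Fragment: this Pod is excluded from extended monitoring even in a labeled namespace.
apiVersion: v1
kind: Pod
metadata:
  name: test-excluded
  labels:
    extended-monitoring.deckhouse.io/enabled: "false"
# ... the rest of the Pod spec is omitted
```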
Standard labels and supported Kubernetes objects

Below is the list of labels used in Prometheus Rules and their default values. Note that all the labels start with the threshold.extended-monitoring.deckhouse.io/ prefix. The value specified in a label is a number that sets the alert trigger threshold.

For example, the threshold.extended-monitoring.deckhouse.io/5xx-warning: "5" label on an Ingress resource changes the alert threshold from 10% (the default) to 5%.
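
Expressed as a manifest fragment, that example could look like this (the Ingress name is hypothetical, and only the metadata is shown):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app  # hypothetical name
  labels:
    # Fire the 5xx warning alert at 5% of 5xx responses instead of the default 10%.
    threshold.extended-monitoring.deckhouse.io/5xx-warning: "5"
# ... the rest of the Ingress spec is omitted
```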
Non-namespaced Kubernetes objects
Non-namespaced Kubernetes objects do not need labels on the namespace; monitoring for them is enabled by default when the module is enabled.
Node
| Label | Type | Default value |
|---|---|---|
| disk-bytes-warning | int (percent) | 70 |
| disk-bytes-critical | int (percent) | 80 |
| disk-inodes-warning | int (percent) | 90 |
| disk-inodes-critical | int (percent) | 95 |
| load-average-per-core-warning | int | 3 |
| load-average-per-core-critical | int | 10 |
Caution! These labels do not apply to the imagefs (/var/lib/docker by default) and nodefs (/var/lib/kubelet by default) volumes. The thresholds for these volumes are configured completely automatically according to the kubelet’s eviction thresholds. The default values are available here; for more info, see the exporter.
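
As a sketch, lowering the load-average thresholds for a particular node could be done by labeling the Node object; the node name is hypothetical and only the metadata is shown:

```yaml
apiVersion: v1
kind: Node
metadata:
  name: worker-1  # hypothetical node name
  labels:
    # Warn at a load average of 2 per core instead of the default 3.
    threshold.extended-monitoring.deckhouse.io/load-average-per-core-warning: "2"
    # Critical at 5 per core instead of the default 10.
    threshold.extended-monitoring.deckhouse.io/load-average-per-core-critical: "5"
# ... the rest of the Node object is managed by Kubernetes
```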
Namespaced Kubernetes objects
Pod
| Label | Type | Default value |
|---|---|---|
| disk-bytes-warning | int (percent) | 85 |
| disk-bytes-critical | int (percent) | 95 |
| disk-inodes-warning | int (percent) | 85 |
| disk-inodes-critical | int (percent) | 90 |
Ingress
| Label | Type | Default value |
|---|---|---|
| 5xx-warning | int (percent) | 10 |
| 5xx-critical | int (percent) | 20 |
Deployment
| Label | Type | Default value |
|---|---|---|
| replicas-not-ready | int (count) | 0 |
The threshold sets the number of unavailable replicas allowed in addition to maxUnavailable. The alert is triggered if the number of unavailable replicas exceeds maxUnavailable by more than the specified amount. Suppose replicas-not-ready is 0: the alert fires if the number of unavailable replicas is greater than maxUnavailable. If replicas-not-ready is set to 1, the alert fires if the number of unavailable replicas is greater than maxUnavailable + 1. This way, you can fine-tune this parameter for specific Deployments (that may be unavailable) in a namespace with extended monitoring enabled and avoid excessive alerts.
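
For example, allowing one extra unavailable replica beyond maxUnavailable for a specific Deployment could look like this; the Deployment name is hypothetical and only the metadata is shown:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker  # hypothetical name
  labels:
    # Alert only when the number of unavailable replicas exceeds maxUnavailable + 1.
    threshold.extended-monitoring.deckhouse.io/replicas-not-ready: "1"
# ... the rest of the Deployment spec is omitted
```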
StatefulSet
| Label | Type | Default value |
|---|---|---|
| replicas-not-ready | int (count) | 0 |

The threshold sets the number of unavailable replicas allowed in addition to maxUnavailable (see the comments on Deployment).
DaemonSet
| Label | Type | Default value |
|---|---|---|
| replicas-not-ready | int (count) | 0 |

The threshold sets the number of unavailable replicas allowed in addition to maxUnavailable (see the comments on Deployment).
CronJob
Note that only deactivation via the extended-monitoring.deckhouse.io/enabled=false label is supported.
How does it work?
The module exports specific Kubernetes object labels to Prometheus. This allows you to improve Prometheus rules by adding per-object thresholds for triggering alerts. Using the metrics that this module exports, you can, e.g., replace the “magic” constants in rules.

Before:

```promql
(
  kube_statefulset_status_replicas - kube_statefulset_status_replicas_ready
)
> 1
```

After:

```promql
(
  kube_statefulset_status_replicas - kube_statefulset_status_replicas_ready
)
> on (namespace, statefulset)
(
  max by (namespace, statefulset) (extended_monitoring_statefulset_threshold{threshold="replicas-not-ready"})
)
```
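
If you want to ship such an expression as an alerting rule, it can be wrapped in a rule resource. The sketch below assumes a standard prometheus-operator PrometheusRule; Deckhouse may expect its own custom resource for user-defined rules, and the rule, group, and alert names here are hypothetical:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: statefulset-replicas-threshold  # hypothetical name
spec:
  groups:
    - name: extended-monitoring-example  # hypothetical group name
      rules:
        - alert: StatefulSetReplicasNotReady
          expr: |
            (
              kube_statefulset_status_replicas - kube_statefulset_status_replicas_ready
            )
            > on (namespace, statefulset)
            (
              max by (namespace, statefulset) (extended_monitoring_statefulset_threshold{threshold="replicas-not-ready"})
            )
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: StatefulSet has more not-ready replicas than its configured threshold.
```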