The module is enabled by default in the following bundles: Default, Managed. The module is disabled by default in the Minimal bundle.

How to explicitly enable the module…

Note that the configOverrides parameter of the InitConfiguration resource is used when installing Deckhouse, not ModuleConfig.

Set extendedMonitoringEnabled: true or extendedMonitoringEnabled: false in the configOverrides parameter to explicitly enable or disable the module when installing Deckhouse.

Use the extendedMonitoring section of the configOverrides parameter to configure the module when installing Deckhouse.
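As a rough sketch, the relevant part of an installation config might look like the fragment below. The apiVersion value and the top-level deckhouse key are assumptions (neither is stated above) and should be checked against the InitConfiguration reference for your Deckhouse version. The nested settings reuse parameter names from the Parameters section of this page.

```yaml
# Hypothetical InitConfiguration fragment.
# ASSUMPTION: the apiVersion and the `deckhouse` wrapper key are not given in this document.
apiVersion: deckhouse.io/v1
kind: InitConfiguration
deckhouse:
  configOverrides:
    # Explicitly enable the module at installation time.
    extendedMonitoringEnabled: true
    # Module settings; same keys as in the Parameters section.
    extendedMonitoring:
      events:
        exporterEnabled: true
        severityLevel: "OnlyWarnings"
```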

Set the spec.enabled module parameter to true or false to explicitly enable or disable the module.

Example of enabling the extended-monitoring module:

apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: extended-monitoring
spec:
  enabled: true

Example of disabling the extended-monitoring module:

apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: extended-monitoring
spec:
  enabled: false

The module is configured using the ModuleConfig custom resource named extended-monitoring (learn more about setting up Deckhouse…).

Example of the ModuleConfig/extended-monitoring resource for configuring the module:

apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: extended-monitoring
spec:
  version: 1
  enabled: true
  settings: # <-- Module parameters from the "Parameters" section below.

Parameters

Schema version: 1

  • certificates (object)

    Settings for monitoring the certificates in the Kubernetes cluster.

    • certificates.exporterEnabled (boolean)

      Enables certExporter.

      Default: false

  • events (object)

    Settings for monitoring the events in the Kubernetes cluster.

    • events.exporterEnabled (boolean)

      Enables eventsExporter.

      Default: false

    • events.severityLevel (string)

      Which events to expose: all of them or only the crucial ones (warnings).

      Default: "OnlyWarnings"

      Allowed values: All, OnlyWarnings

  • imageAvailability (object)

    Settings for monitoring the availability of images in the cluster.

    • imageAvailability.exporterEnabled (boolean)

      Enables imageAvailabilityExporter.

      Default: true

    • imageAvailability.ignoredImages (array of strings)

      A list of images to ignore when checking for their presence in the registry, e.g., alpine:3.12 or quay.io/test/test:v1.1.

      Example:

      ignoredImages:
      - alpine:3.10
      - alpine:3.2
      
    • imageAvailability.skipRegistryCertVerification (boolean)

      Whether to skip the verification of the container registry certificate.

      Default: false

  • nodeSelector (object)

    The same as in the pods’ spec.nodeSelector parameter in Kubernetes.

    If the parameter is omitted or false, it will be determined automatically.

  • tolerations (array of objects)

    The same as in the pods’ spec.tolerations parameter in Kubernetes.

    If the parameter is omitted or false, it will be determined automatically.

    • tolerations.effect (string)
    • tolerations.key (string)
    • tolerations.operator (string)
    • tolerations.tolerationSeconds (integer)
    • tolerations.value (string)
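To show how the parameters above fit together, here is an illustrative ModuleConfig that sets several of them at once. The parameter names come from the list above; the node label and the taint used in nodeSelector and tolerations are hypothetical and would need to match your own cluster.

```yaml
apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: extended-monitoring
spec:
  version: 1
  enabled: true
  settings:
    certificates:
      exporterEnabled: true          # default: false
    events:
      exporterEnabled: true          # default: false
      severityLevel: "All"           # allowed: All, OnlyWarnings
    imageAvailability:
      exporterEnabled: true          # default: true
      ignoredImages:
      - alpine:3.12
      - quay.io/test/test:v1.1
      skipRegistryCertVerification: false
    # Hypothetical placement: label and taint names are examples only.
    nodeSelector:
      example.com/role: monitoring
    tolerations:
    - key: dedicated
      operator: Equal
      value: monitoring
      effect: NoSchedule
```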

How to use extended-monitoring-exporter

Attach the extended-monitoring.flant.com/enabled annotation to the Namespace to enable the export of extended monitoring metrics. You can do it by:

  • adding the appropriate helm-chart to the project (recommended method);
  • adding it to .gitlab-ci.yml (kubectl patch/create);
  • attaching it manually (kubectl annotate namespace my-app-production extended-monitoring.flant.com/enabled="");
  • configuring it via the namespace-configurator module.
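For the Helm chart or CI-based approaches, the annotation can be declared directly in the Namespace manifest. The following is a minimal sketch that reuses the my-app-production name from the kubectl example above:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-app-production
  annotations:
    # The value may be left empty; the presence of the annotation is what matters.
    extended-monitoring.flant.com/enabled: ""
```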

Any of these methods causes the default metrics (plus any custom metrics with the threshold.extended-monitoring.flant.com/ prefix) to be exported for all supported Kubernetes objects in the target Namespace. Note that monitoring and the standard annotations are enabled automatically for a number of non-namespaced Kubernetes objects described below.

You can also add custom threshold.extended-monitoring.flant.com/something annotations with the desired value to Kubernetes objects, e.g., kubectl annotate pod test threshold.extended-monitoring.flant.com/disk-inodes-warning=30. In this case, the annotation value replaces the default one.
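The same override can be expressed declaratively. The sketch below assumes the annotation names listed in the tables of the next section; the container name, image, and command are hypothetical. Note that Kubernetes annotation values must be strings, so the integers are quoted.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test
  annotations:
    # Override the default Pod thresholds (85 and 90 percent) for this Pod only.
    threshold.extended-monitoring.flant.com/disk-inodes-warning: "30"
    threshold.extended-monitoring.flant.com/disk-inodes-critical: "40"
spec:
  containers:
  - name: app                      # hypothetical container
    image: alpine:3.12
    command: ["sleep", "3600"]
```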

To disable monitoring for an individual object, add the extended-monitoring.flant.com/enabled=false annotation to it. This also disables the default annotations for that object, along with the alerts based on them.

Standard annotations and supported Kubernetes objects

Below is the list of annotations used in Prometheus Rules and their default values.

Caution! All annotations:

  1. Start with a threshold.extended-monitoring.flant.com/ prefix;
  2. Have an integer value (except for the extended-monitoring.flant.com/enabled Namespace annotation, whose value can be omitted). The specified value defines the alert threshold.

Non-namespaced Kubernetes objects

These objects do not require a Namespace annotation; monitoring is enabled for them by default.

Node
Annotation                            Type            Default value
disk-bytes-warning                    int (percent)   70
disk-bytes-critical                   int (percent)   80
disk-inodes-warning                   int (percent)   85
disk-inodes-critical                  int (percent)   90
load-average-per-core-warning         int             3
load-average-per-core-critical        int             10

Caution! These annotations do not apply to imagefs (/var/lib/docker by default) and nodefs (/var/lib/kubelet by default) volumes. The thresholds for these volumes are configured completely automatically according to the kubelet’s eviction thresholds. The default values are available here; for more info, see the exporter.

Namespaced Kubernetes objects

Pod
Annotation                            Type            Default value
disk-bytes-warning                    int (percent)   85
disk-bytes-critical                   int (percent)   95
disk-inodes-warning                   int (percent)   85
disk-inodes-critical                  int (percent)   90
container-throttling-warning          int (percent)   25
container-throttling-critical         int (percent)   50
container-cores-throttling-warning    int (cores)
container-cores-throttling-critical   int (cores)

Ingress
Annotation                            Type            Default value
5xx-warning                           int (percent)   10
5xx-critical                          int (percent)   20

Deployment
Annotation                            Type            Default value
replicas-not-ready                    int (count)     0

The threshold implies the number of unavailable replicas in addition to maxUnavailable. This threshold will be triggered if the number of unavailable replicas is greater than maxUnavailable by the amount specified. Suppose replicas-not-ready is 0. In this case, the threshold will be triggered if the number of unavailable replicas is greater than maxUnavailable. If replicas-not-ready is set to 1, then the threshold will be triggered if the number of unavailable replicas is greater than maxUnavailable + 1. This way, you can fine-tune this parameter for specific Deployments (that may be unavailable) in the Namespace with the extended monitoring enabled to avoid getting excessive alerts.
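As a worked example of the arithmetic above, consider the illustrative Deployment below. Its name, labels, and image are hypothetical; only the annotation name and the maxUnavailable semantics come from this page.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                     # hypothetical name
  namespace: my-app-production
  annotations:
    # Tolerate one unavailable replica on top of maxUnavailable.
    threshold.extended-monitoring.flant.com/replicas-not-ready: "1"
spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: alpine:3.12
        command: ["sleep", "3600"]
# With maxUnavailable = 1 and replicas-not-ready = 1, the threshold is
# triggered only when more than 1 + 1 = 2 replicas are unavailable,
# i.e. at 3 or more unavailable replicas out of 5.
```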

StatefulSet
Annotation                            Type            Default value
replicas-not-ready                    int (count)     0

The threshold implies the number of unavailable replicas in addition to maxUnavailable (see the comments on Deployment).

DaemonSet
Annotation                            Type            Default value
replicas-not-ready                    int (count)     0

The threshold implies the number of unavailable replicas in addition to maxUnavailable (see the comments on Deployment).

CronJob

Note that for CronJobs, only disabling monitoring via the extended-monitoring.flant.com/enabled=false annotation is supported.
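For instance, a CronJob excluded from extended monitoring could be declared roughly as follows. The name, schedule, and image are hypothetical; the annotation is the one described above.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report             # hypothetical name
  namespace: my-app-production
  annotations:
    # Opt this CronJob out of extended monitoring.
    extended-monitoring.flant.com/enabled: "false"
spec:
  schedule: "0 3 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: report
            image: alpine:3.12
            command: ["sh", "-c", "echo done"]
```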

How does it work?

The module exports specific Kubernetes object annotations to Prometheus. It allows you to improve Prometheus rules by adding the thresholds for triggering alerts. Using metrics that this module exports, you can, e.g., replace the “magic” constants in rules.

Before:

max by (namespace, pod, container) (
  (
    rate(container_cpu_cfs_throttled_periods_total[5m])
    /
    rate(container_cpu_cfs_periods_total[5m])
  )
  > 0.85
)

After:

max by (namespace, pod, container) (
  (
    rate(container_cpu_cfs_throttled_periods_total[5m])
    /
    rate(container_cpu_cfs_periods_total[5m])
  )
  > on (namespace, pod) group_left
    max by (namespace, pod) (extended_monitoring_pod_threshold{threshold="container-throttling-critical"}) / 100
)
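To illustrate how the "After" expression might be used in practice, the sketch below wraps it in a rule group in the standard Prometheus rule-file format. The group name, alert name, for duration, and severity label are hypothetical, and how the rule is delivered to Prometheus in a Deckhouse cluster is not covered here.

```yaml
groups:
- name: example.extended-monitoring          # hypothetical group name
  rules:
  - alert: ExampleContainerThrottlingCritical   # hypothetical alert name
    # The threshold metric holds a percentage, hence the division by 100:
    # with the default container-throttling-critical value of 50, the
    # alert condition is a throttling ratio above 50 / 100 = 0.5.
    expr: |
      max by (namespace, pod, container) (
        (
          rate(container_cpu_cfs_throttled_periods_total[5m])
          /
          rate(container_cpu_cfs_periods_total[5m])
        )
        > on (namespace, pod) group_left
          max by (namespace, pod) (extended_monitoring_pod_threshold{threshold="container-throttling-critical"}) / 100
      )
    for: 5m
    labels:
      severity: critical
```

Because the threshold is read from a metric rather than hard-coded, annotating a single Pod changes its alerting behavior without editing the rule.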