The module does not require any configuration – it works right out-of-the-box.

The module has 25 alerts.

The module is enabled by default in the following bundles: Default, Managed. The module is disabled by default in the Minimal bundle.

How to explicitly enable the module…

Set the spec.enabled module parameter to true or false in the ModuleConfig/prometheus resource (create it, if necessary) to explicitly enable or disable the module, or use the deckhouse-controller module command in the d8-system/deckhouse pod.

Example of enabling the module:

  • by using the ModuleConfig resource:

    apiVersion: deckhouse.io/v1alpha1
    kind: ModuleConfig
    metadata:
      name: prometheus
    spec:
      enabled: true
    
  • by using the deckhouse-controller command (kubectl must be configured to work with the cluster):

    kubectl -ti -n d8-system exec svc/deckhouse-leader -c deckhouse -- deckhouse-controller module enable prometheus
    

Example of disabling the module:

  • by using the ModuleConfig resource:

    apiVersion: deckhouse.io/v1alpha1
    kind: ModuleConfig
    metadata:
      name: prometheus
    spec:
      enabled: false
    
  • by using the deckhouse-controller command (kubectl must be configured to work with the cluster):

    kubectl -ti -n d8-system exec svc/deckhouse-leader -c deckhouse -- deckhouse-controller module disable prometheus
    

The module is configured using the ModuleConfig custom resource named prometheus (learn more about setting up Deckhouse…).

Example of the ModuleConfig/prometheus resource for configuring the module:

apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: prometheus
spec:
  version: 2
  enabled: true
  settings: # <-- Module parameters from the "Parameters" section below.
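
As an illustration of how the settings section might be filled in, the sketch below sets a few of the parameters documented in the "Parameters" section. The specific values (30 and 365 days of retention) are assumptions chosen for the example, not recommendations.

```yaml
apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: prometheus
spec:
  version: 2
  enabled: true
  settings:
    retentionDays: 30            # assumed value; the documented default is 15
    longtermRetentionDays: 365   # assumed value; the documented default is 1095
    scrapeInterval: 30s          # same as the documented default
    https:
      mode: CertManager
      certManager:
        clusterIssuerName: letsencrypt
```

Parameters that are omitted keep their defaults, so only the values you want to change need to be listed.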

Parameters

Schema version: 2

  • auth
    object

    Options related to authentication or authorization in the application.

    • auth.allowedUserGroups
      array of strings

      An array of user groups that can access Grafana & Prometheus.

      This parameter is used if the user-authn module is enabled or the externalAuthentication parameter is set.

      Caution! Note that you must add those groups to the appropriate field in the DexProvider config if this module is used together with the user-authn one.

    • auth.externalAuthentication
      object

      Parameters to enable external authentication based on the NGINX Ingress external-auth mechanism, which uses the NGINX auth_request module.

      External authentication is enabled automatically if the user-authn module is enabled.

      • auth.externalAuthentication.authSignInURL
        string

        The URL to redirect the user for authentication (if the authentication service returned a non-200 HTTP response code).

        Example:

        authSignInURL: https://example.com/dex/sign_in
        
      • auth.externalAuthentication.authURL
        string

        The URL of the authentication service. If the user is authenticated, the service should return an HTTP 200 response code.

        Example:

        authURL: https://example.com/dex/auth
        
    • auth.satisfyAny
      boolean

      Enables single authentication.

      If used together with the whitelistSourceRanges parameter, it authorizes all users from the listed networks without requiring a username and password.

      Default: false

      Example:

      satisfyAny: true
      
    • auth.whitelistSourceRanges
      array of strings

      An array of CIDRs that are allowed to authenticate in Grafana & Prometheus.

      Example:

      whitelistSourceRanges:
      - 1.1.1.1/32
      
  • externalLabels
    object

    The set of external labels to add to the metrics.

    It’s possible to expand the environment variables of the config-reloader container in external labels such as:

    • HOSTNAME/POD_NAME - contains the name of the pod (for example prometheus-main-0, prometheus-main-1, etc.).
    • SHARD - contains the shard number.

    Example:

    externalLabels:
      prometheus_replica: "$(POD_NAME)"
      shard: "$(SHARD)"
      hostname: "$(HOSTNAME)"
    
  • grafana
    object

    Grafana installation-related settings.

    • grafana.customPlugins
      array of strings

      A list of custom Grafana plugins. Contains plugin names from the official repository.

      Here is how you can add custom plugins (in this case, clickhouse-datasource and flowcharting-panel plugins are used):

      grafana:
        customPlugins:
        - agenty-flowcharting-panel
        - vertamedia-clickhouse-datasource
      

      You can also install plugins from other sources by passing a link to the plugin zip archive in the format <url to plugin zip>;<plugin name>:

      grafana:
        customPlugins:
        - http://10.241.32.16:3000/netsage-bumpchart-panel-1.1.1.zip;netsage-bumpchart-panel
      

      Example:

      customPlugins:
      - agenty-flowcharting-panel
      - vertamedia-clickhouse-datasource
      
    • grafana.useDarkTheme
      boolean

      Whether Grafana uses the dark theme by default.

      Default: false

      Example:

      useDarkTheme: true
      
  • highAvailability
    boolean

    Manually enable the high availability mode.

    By default, Deckhouse automatically decides whether to enable the HA mode. See the Deckhouse documentation to learn more about the HA mode for modules.

    Example:

    highAvailability: true
    
  • https
    object

    What certificate type to use with Grafana/Prometheus.

    This parameter completely overrides the global.modules.https settings.

    Examples:

    https:
      mode: CustomCertificate
      customCertificate:
        secretName: foobar
    
    https:
      mode: CertManager
      certManager:
        clusterIssuerName: letsencrypt
    
    • https.certManager
      object
      • https.certManager.clusterIssuerName
        string

        What ClusterIssuer to use for Grafana/Prometheus.

        The letsencrypt, letsencrypt-staging, and selfsigned ClusterIssuers are currently available. You can also define your own.

        Default: "letsencrypt"

    • https.customCertificate
      object
      • https.customCertificate.secretName
        string

        The name of the secret in the d8-system namespace to use with Grafana/Prometheus.

        This secret must have the kubernetes.io/tls format.

        Default: "false"

    • https.mode
      string

      The HTTPS usage mode:

      • Disabled: Grafana/Prometheus will work over HTTP only.
      • CertManager: Grafana/Prometheus will use HTTPS and get a certificate from the ClusterIssuer defined in the certManager.clusterIssuerName parameter.
      • CustomCertificate: Grafana/Prometheus will use HTTPS with the certificate from the d8-system namespace.
      • OnlyInURI: Grafana/Prometheus will work over HTTP, assuming that an external HTTPS load balancer in front of it terminates HTTPS traffic. All the links in the user-authn will be generated using the HTTPS scheme. The load balancer should redirect HTTP to HTTPS.

      Default: "Disabled"

      Allowed values: Disabled, CertManager, CustomCertificate, OnlyInURI

  • ingressClass
    string

    The class of the Ingress controller used for Grafana/Prometheus.

    An optional parameter. By default, the modules.ingressClass global value is used.

    Pattern: ^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$

    Example:

    ingressClass: public
    
  • longtermMaxDiskSizeGigabytes
    Deprecated
    integer

    Deprecated and will be removed. Doesn’t affect anything. The maximum size (in GiB) to which the Longterm Prometheus’ disk can be automatically resized.

  • longtermNodeSelector
    object

    The same as in the Pods’ spec.nodeSelector parameter in Kubernetes.

    If the parameter is omitted or false, it will be determined automatically.

    Example:

    longtermNodeSelector:
      disktype: ssd
    
  • longtermRetentionDays
    integer

    How many days to keep the data in Longterm Prometheus.

    Setting this parameter to 0 will result in Longterm Prometheus not running in the cluster.

    Default: 1095

  • longtermScrapeInterval
    string

    Sets the interval for making “data snapshots” of the main Prometheus by the longterm Prometheus.

    Default: "5m"

  • longtermStorageClass
    string

    The name of the StorageClass to use for Longterm Prometheus.

    If omitted, the StorageClass of the existing Longterm Prometheus PVC is used. If there is no PVC yet, either the global StorageClass or global.discovery.defaultStorageClass is used, and if those are undefined, the emptyDir volume is used to store the data.

    global.discovery.defaultStorageClass is applied during module activation; changing the default StorageClass in the cluster won’t result in disk re-provisioning.

    CAUTION! Setting this value to one that differs from the current one (in the existing PVC) will result in Longterm Prometheus volume re-provisioning and data loss.

    Example:

    longtermStorageClass: ceph-ssd
    
  • longtermTolerations
    array of objects

    The same as in the Pods’ spec.tolerations parameter in Kubernetes.

    If the parameter is omitted or false, it will be determined automatically.

    Example:

    longtermTolerations:
    - key: key1
      operator: Equal
      value: value1
      effect: NoSchedule
    
    • longtermTolerations.effect
      string
    • longtermTolerations.key
      string
    • longtermTolerations.operator
      string
    • longtermTolerations.tolerationSeconds
      integer
    • longtermTolerations.value
      string
  • mainMaxDiskSizeGigabytes
    Deprecated
    integer

    Deprecated and will be removed. Doesn’t affect anything. The maximum size (in GiB) that the main Prometheus’ volume can automatically resize to.

  • nodeSelector
    object

    The same as in the Pods’ spec.nodeSelector parameter in Kubernetes.

    If the parameter is omitted or false, it will be determined automatically.

    Example:

    nodeSelector:
      disktype: ssd
    
  • retentionDays
    integer

    How many days to keep the data.

    Default: 15

  • scrapeInterval
    string

    Sets the interval for scraping metrics from targets.

    Evaluation Interval is always equal to scrapeInterval.

    Default: "30s"

    Pattern: ^([\d]*y)?([\d]*w)?([\d]*d)?([\d]*h)?([\d]*m)?([\d]*s)?$

  • storageClass
    string

    The name of the StorageClass to use.

    If omitted, the StorageClass of the existing Prometheus PVC is used. If there is no PVC yet, either the global StorageClass or global.discovery.defaultStorageClass is used, and if those are undefined, the emptyDir volume is used to store the data.

    global.discovery.defaultStorageClass is applied during module activation; changing the default StorageClass in the cluster won’t result in disk re-provisioning.

    storageClass: false forces the use of emptyDir. You will need to delete the old PVC and restart the Pod manually.

    CAUTION! Setting this value to one that differs from the current one (in the existing PVC) will result in Prometheus volume re-provisioning and data loss.

    Examples:

    storageClass: ceph-ssd
    
    storageClass: 'false'
    
  • tolerations
    array of objects

    The same as in the Pods’ spec.tolerations parameter in Kubernetes.

    If the parameter is omitted or false, it will be determined automatically.

    Example:

    tolerations:
    - key: key1
      operator: Equal
      value: value1
      effect: NoSchedule
    
    • tolerations.effect
      string
    • tolerations.key
      string
    • tolerations.operator
      string
    • tolerations.tolerationSeconds
      integer
    • tolerations.value
      string
  • vpa
    object

    VPA settings for pods.

    Default: {"updateMode":"Initial"}

    Examples:

    vpa:
      updateMode: Initial
      longtermMaxCPU: '1'
      longtermMaxMemory: 1500Mi
      maxCPU: 1000m
      maxMemory: 1500Mi
    
    vpa:
      updateMode: 'Off'
    
    • vpa.longtermMaxCPU

      The maximum value that the VPA can set for the CPU requests for the Longterm Prometheus Pods.

      The default value is chosen automatically based on the maximum number of Pods that can be created in the cluster considering the current number of nodes and their settings. For more information, see the detect_vpa_max hook of the module.

      Example:

      longtermMaxCPU: 0.1
      
    • vpa.longtermMaxMemory

      The maximum memory requests the VPA can set for the longterm Prometheus Pods.

      The default value is chosen automatically based on the maximum number of Pods that can be created in the cluster considering the current number of nodes and their settings. For more information, see the detect_vpa_max hook of the module.

      Example:

      longtermMaxMemory: 4Mi
      
    • vpa.maxCPU

      The maximum value that the VPA can set for the CPU requests for the main Prometheus Pods.

      The default value is chosen automatically based on the maximum number of Pods that can be created in the cluster considering the current number of nodes and their settings. For more information, see the detect_vpa_max hook of the module.

      Example:

      maxCPU: '3'
      
    • vpa.maxMemory

      The maximum memory requests the VPA can set for the main Prometheus Pods.

      The default value is chosen automatically based on the maximum number of Pods that can be created in the cluster considering the current number of nodes and their settings. For more information, see the detect_vpa_max hook of the module.

      Example:

      maxMemory: 3Mi
      
    • vpa.updateMode
      string

      The VPA usage mode.

      Default: "Initial"

      Allowed values: Initial, Auto, Off
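
The scheduling and VPA parameters above can be combined in a single ModuleConfig. The following is a minimal sketch that reuses the sample values from the parameter examples; the disktype label and the key1/value1 taint are placeholders and would need to match the labels and taints actually present on your nodes.

```yaml
apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: prometheus
spec:
  version: 2
  enabled: true
  settings:
    # Placement of the main Prometheus Pods (placeholder label and taint).
    nodeSelector:
      disktype: ssd
    tolerations:
    - key: key1
      operator: Equal
      value: value1
      effect: NoSchedule
    # Placement of the Longterm Prometheus Pods (same placeholders).
    longtermNodeSelector:
      disktype: ssd
    longtermTolerations:
    - key: key1
      operator: Equal
      value: value1
      effect: NoSchedule
    # Upper bounds for the requests that the VPA may set.
    vpa:
      updateMode: Initial
      maxCPU: 1000m
      maxMemory: 1500Mi
      longtermMaxCPU: '1'
      longtermMaxMemory: 1500Mi
```

If the VPA limits are left out, the defaults are chosen automatically, as described for each vpa parameter.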

Authentication

The user-authn module provides authentication by default. Alternatively, externalAuthentication can be configured (see the auth.externalAuthentication parameter). If neither option is enabled, the module uses basic auth with the user admin and an auto-generated password.

Use kubectl to view the password:

kubectl -n d8-system exec svc/deckhouse-leader -c deckhouse -- deckhouse-controller module values prometheus -o json | jq '.internal.auth.password'

Delete the Secret to regenerate the password:

kubectl -n d8-monitoring delete secret/basic-auth

Note! The auth.password parameter is deprecated.
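
For reference, external authentication might be configured along the following lines. This is a sketch only: the URLs are the sample values from the parameter descriptions, while the group name and the CIDR are hypothetical and must be replaced with values from your environment.

```yaml
apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: prometheus
spec:
  version: 2
  enabled: true
  settings:
    auth:
      externalAuthentication:
        authURL: https://example.com/dex/auth            # sample URL from the parameter description
        authSignInURL: https://example.com/dex/sign_in   # sample URL from the parameter description
      allowedUserGroups:
      - monitoring-admins   # hypothetical group name
      whitelistSourceRanges:
      - 10.0.0.0/8          # hypothetical CIDR
      satisfyAny: true      # users from the CIDR above are not asked for credentials
```

With satisfyAny left at its default of false, the whitelistSourceRanges list only restricts which networks are allowed to authenticate.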

Notes

  • retentionSize for the main and longterm Prometheus is calculated automatically; you cannot set this value manually!
    • The following calculation algorithm is used:
      • pvc_size * 0.85 — if the PVC exists;
      • 10 GiB — if there is no PVC and if the StorageClass supports resizing;
      • 25 GiB — if there is no PVC and if the StorageClass does not support resizing;
    • If local-storage is used and you need to change the retentionSize, manually change the size of the PV and PVC. Caution! The value from the .status.capacity.storage field of the PVC is used for the calculation, since it reflects the actual size of the PV in the case of manual resizing.
  • The PersistentVolumeClaim created by default is 40 GiB in size. With the formula above, this gives a retentionSize of 40 * 0.85 = 34 GiB.
  • You can change the size of Prometheus disks in the standard Kubernetes way (if the StorageClass permits this) by editing the .spec.resources.requests.storage field of the PersistentVolumeClaim resource.
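
As a sketch of the resize described in the last note, the fragment below shows only the field that changes in the PersistentVolumeClaim. The metadata is omitted because the PVC name is not given here, and the 60Gi value is an assumption for illustration.

```yaml
# Fragment of a PersistentVolumeClaim manifest (metadata omitted).
spec:
  resources:
    requests:
      storage: 60Gi   # assumed new size, increased from the 40 GiB default
```

Assuming the resize completes and .status.capacity.storage reports 60Gi, the pvc_size * 0.85 rule above would yield a retentionSize of 51 GiB.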