The module does not require any configuration; it works out of the box.

The module has 25 alerts.

The module is enabled by default in the following bundles: Default, Managed. The module is disabled by default in the Minimal bundle.

How to explicitly enable the module…

Set the spec.enabled module parameter to true or false in the ModuleConfig/prometheus resource (create it, if necessary) to explicitly enable or disable the module, or use the deckhouse-controller module command in the d8-system/deckhouse pod.

Example of enabling the module:

  • by using the ModuleConfig resource:

    apiVersion: deckhouse.io/v1alpha1
    kind: ModuleConfig
    metadata:
      name: prometheus
    spec:
      enabled: true
    
  • by using the deckhouse-controller command (you need kubectl configured to work with the cluster):

    kubectl -ti -n d8-system exec svc/deckhouse-leader -c deckhouse -- deckhouse-controller module enable prometheus
    

Example of disabling the module:

  • by using the ModuleConfig resource:

    apiVersion: deckhouse.io/v1alpha1
    kind: ModuleConfig
    metadata:
      name: prometheus
    spec:
      enabled: false
    
  • by using the deckhouse-controller command (you need kubectl configured to work with the cluster):

    kubectl -ti -n d8-system exec svc/deckhouse-leader -c deckhouse -- deckhouse-controller module disable prometheus
    

The module is configured using the ModuleConfig custom resource named prometheus (learn more about setting up Deckhouse…).

Example of the ModuleConfig/prometheus resource for configuring the module:

apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: prometheus
spec:
  version: 2
  enabled: true
  settings: # <-- Module parameters from the "Parameters" section below.
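
As an illustration of how the settings block might be filled in, the sketch below combines a few parameters described in the "Parameters" section. The concrete values (the ceph-ssd StorageClass name, the 30-day retention) are assumptions chosen for illustration, not recommendations.

```yaml
apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: prometheus
spec:
  version: 2
  enabled: true
  settings:
    retentionDays: 30        # illustrative value; the default is 15
    storageClass: ceph-ssd   # assumed StorageClass name
    https:
      mode: CertManager
      certManager:
        clusterIssuerName: letsencrypt
```

Keep in mind that, according to the storageClass description, specifying a StorageClass different from the one used by the existing PVC results in disk re-provisioning and data loss.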

Parameters

Schema version: 2

  • settings
    object
    • settings.auth
      object

      Options related to authentication or authorization in the application.

      • settings.auth.allowedUserGroups
        array of strings

        An array of user groups that can access Grafana & Prometheus.

        This parameter is used if the user-authn module is enabled or the externalAuthentication parameter is set.

        Caution! If this module is used together with the user-authn module, you must also add these groups to the appropriate field in the DexProvider config.

      • settings.auth.externalAuthentication
        object

        Parameters to enable external authentication based on the NGINX Ingress external-auth mechanism that uses the NGINX auth_request module.

        External authentication is enabled automatically if the user-authn module is enabled.

        • settings.auth.externalAuthentication.authSignInURL
          string

          The URL to redirect the user for authentication (if the authentication service returned a non-200 HTTP response code).

          Example:

          authSignInURL: https://example.com/dex/sign_in
          
        • settings.auth.externalAuthentication.authURL
          string

          The URL of the authentication service. If the user is authenticated, the service should return an HTTP 200 response code.

          Example:

          authURL: https://example.com/dex/auth
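
        The two externalAuthentication parameters are normally set together. A minimal sketch of a complete resource, reusing the example URLs from the parameter descriptions (they are placeholders, not real endpoints):

        ```yaml
        apiVersion: deckhouse.io/v1alpha1
        kind: ModuleConfig
        metadata:
          name: prometheus
        spec:
          version: 2
          enabled: true
          settings:
            auth:
              externalAuthentication:
                # Service that returns HTTP 200 for authenticated users.
                authURL: https://example.com/dex/auth
                # Where the user is redirected if authURL returns a non-200 code.
                authSignInURL: https://example.com/dex/sign_in
        ```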
          
      • settings.auth.satisfyAny
        boolean

        Enables single authentication.

        If used together with the whitelistSourceRanges parameter, it authorizes all users from the specified networks (no need to enter a username and password).

        Default: false

        Example:

        satisfyAny: true
        
      • settings.auth.whitelistSourceRanges
        array of strings

        An array of CIDRs that are allowed to authenticate in Grafana & Prometheus.

        Example:

        whitelistSourceRanges:
        - 1.1.1.1/32
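
      The auth parameters can be combined. The sketch below assumes a hypothetical user group (monitoring-admins) and a hypothetical internal network (192.168.0.0/24). With satisfyAny enabled, users coming from the listed network are authorized without entering credentials, as described for the satisfyAny parameter.

      ```yaml
      apiVersion: deckhouse.io/v1alpha1
      kind: ModuleConfig
      metadata:
        name: prometheus
      spec:
        version: 2
        enabled: true
        settings:
          auth:
            allowedUserGroups:
            - monitoring-admins      # hypothetical group name
            satisfyAny: true
            whitelistSourceRanges:
            - 192.168.0.0/24         # hypothetical internal network
      ```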
        
    • settings.externalLabels
      object

      The set of external labels to add to the metrics.

      Environment variables of the config-reloader container can be expanded in external labels, for example:

      • HOSTNAME/POD_NAME - contains the name of the pod (for example prometheus-main-0, prometheus-main-1, etc.).
      • SHARD - contains the shard number.

      Example:

      externalLabels:
        prometheus_replica: "$(POD_NAME)"
        shard: "$(SHARD)"
        hostname: "$(HOSTNAME)"
      
    • settings.grafana
      object

      Grafana installation-related settings.

      • settings.grafana.customPlugins
        array of strings

        A list of custom Grafana plugins. Contains plugin names from the official repository.

        Here is how you can add custom plugins (in this case, clickhouse-datasource and flowcharting-panel plugins are used):

        grafana:
          customPlugins:
          - agenty-flowcharting-panel
          - vertamedia-clickhouse-datasource
        

        You can also install plugins from other sources by passing a link to the plugin zip archive in the format <url to plugin zip>;<plugin name>:

        grafana:
          customPlugins:
          - http://10.241.32.16:3000/netsage-bumpchart-panel-1.1.1.zip;netsage-bumpchart-panel
        

        Example:

        customPlugins:
        - agenty-flowcharting-panel
        - vertamedia-clickhouse-datasource
        
      • settings.grafana.useDarkTheme
        boolean

        Makes the dark theme the default theme in Grafana.

        Default: false

        Example:

        useDarkTheme: true
        
    • settings.highAvailability
      boolean

      Manually enable the high availability mode.

      By default, Deckhouse automatically decides whether to enable the HA mode. See the Deckhouse documentation to learn more about the HA mode for modules.

      Example:

      highAvailability: true
      
    • settings.https
      object

      What certificate type to use with Grafana/Prometheus.

      This parameter completely overrides the global.modules.https settings.

      Examples:

      https:
        mode: CustomCertificate
        customCertificate:
          secretName: foobar
      
      https:
        mode: CertManager
        certManager:
          clusterIssuerName: letsencrypt
      
      • settings.https.certManager
        object
        • settings.https.certManager.clusterIssuerName
          string

          What ClusterIssuer to use for Grafana/Prometheus.

          Currently, letsencrypt, letsencrypt-staging, selfsigned are available. Also, you can define your own.

          Default: "letsencrypt"

      • settings.https.customCertificate
        object
        • settings.https.customCertificate.secretName
          string

          The name of the secret in the d8-system namespace to use with Grafana/Prometheus.

          This secret must have the kubernetes.io/tls format.

          Default: "false"

      • settings.https.mode
        string

        The HTTPS usage mode:

        • Disabled — Grafana/Prometheus will work over HTTP only.
        • CertManager — Grafana/Prometheus will use HTTPS and get a certificate from the ClusterIssuer defined in the certManager.clusterIssuerName parameter.
        • CustomCertificate — Grafana/Prometheus will use HTTPS with the certificate from the secret in the d8-system namespace defined in the customCertificate.secretName parameter.
        • OnlyInURI — Grafana/Prometheus will work over HTTP (thinking that there is an external HTTPS load balancer in front that terminates HTTPS traffic). All the links in the user-authn will be generated using the HTTPS scheme. Load balancer should provide a redirect from HTTP to HTTPS.

        Default: "Disabled"

        Allowed values: Disabled, CertManager, CustomCertificate, OnlyInURI
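
        The examples for the https parameter cover the CertManager and CustomCertificate modes. For a cluster where TLS is terminated on an external load balancer, a minimal sketch of the OnlyInURI mode could look like this (no certificate settings are needed, since Grafana/Prometheus keep working over HTTP):

        ```yaml
        apiVersion: deckhouse.io/v1alpha1
        kind: ModuleConfig
        metadata:
          name: prometheus
        spec:
          version: 2
          enabled: true
          settings:
            https:
              # Links are generated with the HTTPS scheme;
              # the external load balancer must redirect HTTP to HTTPS.
              mode: OnlyInURI
        ```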

    • settings.ingressClass
      string

      The class of the Ingress controller used for Grafana/Prometheus.

      An optional parameter. By default, the modules.ingressClass global value is used.

      Pattern: ^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$

      Example:

      ingressClass: public
      
    • settings.longtermMaxDiskSizeGigabytes
      Deprecated
      integer

      Deprecated and will be removed. Doesn’t affect anything. The maximum size (in GiB) to which the Longterm Prometheus’ disk can be automatically resized.

    • settings.longtermNodeSelector
      object

      The same as in the Pods’ spec.nodeSelector parameter in Kubernetes.

      If the parameter is omitted or false, it will be determined automatically.

      Example:

      longtermNodeSelector:
        disktype: ssd
      
    • settings.longtermPodAntiAffinity
      string

      Defines the podAntiAffinity configuration for the Prometheus longterm instance relative to the Prometheus main instance.

      • Preferred — allows scheduling Prometheus longterm instance alongside the Prometheus main instance if it is not possible to place them on different nodes.
      • Required — does not allow scheduling Prometheus longterm instance on the same node as the Prometheus main instance.

      Default: "Preferred"

      Allowed values: Preferred, Required
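
      A minimal sketch that forbids placing the longterm instance on the same node as the main instance. It assumes the cluster has enough suitable nodes to satisfy the restriction.

      ```yaml
      apiVersion: deckhouse.io/v1alpha1
      kind: ModuleConfig
      metadata:
        name: prometheus
      spec:
        version: 2
        enabled: true
        settings:
          longtermPodAntiAffinity: Required
      ```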

    • settings.longtermRetentionDays
      integer

      How long (in days) to keep the data in Longterm Prometheus.

      Setting this parameter to 0 will result in Longterm Prometheus not running in the cluster.

      Default: 1095

    • settings.longtermScrapeInterval
      string

      Sets the interval for making “data snapshots” of the main Prometheus by the longterm Prometheus.

      Default: "5m"
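
      The two longterm parameters above can be tuned together. The values in this sketch are illustrative assumptions only:

      ```yaml
      apiVersion: deckhouse.io/v1alpha1
      kind: ModuleConfig
      metadata:
        name: prometheus
      spec:
        version: 2
        enabled: true
        settings:
          # Keep longterm data for one year instead of the default 1095 days.
          # Setting this to 0 would stop Longterm Prometheus from running.
          longtermRetentionDays: 365
          # Take "data snapshots" of the main Prometheus every 10 minutes
          # instead of the default 5m.
          longtermScrapeInterval: 10m
      ```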

    • settings.longtermStorageClass
      string

      The name of the StorageClass to use for Longterm Prometheus.

      If omitted, the StorageClass of the existing Longterm Prometheus PVC is used. If there is no PVC yet, the StorageClass will be used according to the global storageClass parameter setting.

      The global storageClass parameter is only considered when the module is enabled. Changing the global storageClass parameter while the module is enabled will not trigger disk re-provisioning.

      Warning. Specifying a value different from the one currently used (in the existing PVC) will result in disk re-provisioning and all data will be deleted.

      Warning. When migrating Prometheus with local storage to other nodes, the pod will hang in the Pending state. In this case, it will be necessary to save the Prometheus database, delete the old PVC and restart the pod manually. Local storage refers to a StorageClass that is associated not with network storage, but with a local volume on a node (for example, StorageClass created by the local-path-provider module).

      If false is specified, the use of emptyDir is forced.

      Example:

      longtermStorageClass: ceph-ssd
      
    • settings.longtermTolerations
      array of objects

      The same as in the Pods’ spec.tolerations parameter in Kubernetes.

      If the parameter is omitted or false, it will be determined automatically.

      Example:

      longtermTolerations:
      - key: key1
        operator: Equal
        value: value1
        effect: NoSchedule
      
      • settings.longtermTolerations.effect
        string
      • settings.longtermTolerations.key
        string
      • settings.longtermTolerations.operator
        string
      • settings.longtermTolerations.tolerationSeconds
        integer
      • settings.longtermTolerations.value
        string
    • settings.mainMaxDiskSizeGigabytes
      Deprecated
      integer

      Deprecated and will be removed. Doesn’t affect anything. The maximum size (in GiB) that the main Prometheus’ volume can automatically resize to.

    • settings.nodeSelector
      object

      The same as in the Pods’ spec.nodeSelector parameter in Kubernetes.

      If the parameter is omitted or false, it will be determined automatically.

      Example:

      nodeSelector:
        disktype: ssd
      
    • settings.retentionDays
      integer

      How long (in days) to keep the data in the main Prometheus.

      Default: 15

    • settings.scrapeInterval
      string

      Sets the interval for scraping metrics from targets.

      Evaluation Interval is always equal to scrapeInterval.

      Default: "30s"

      Pattern: ^([\d]*y)?([\d]*w)?([\d]*d)?([\d]*h)?([\d]*m)?([\d]*s)?$
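
      A value such as 1m satisfies the pattern above. The sketch below uses it as an illustrative assumption; since the evaluation interval always equals scrapeInterval, rules would also be evaluated once a minute.

      ```yaml
      apiVersion: deckhouse.io/v1alpha1
      kind: ModuleConfig
      metadata:
        name: prometheus
      spec:
        version: 2
        enabled: true
        settings:
          scrapeInterval: 1m   # illustrative value; the default is 30s
      ```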

    • settings.storageClass
      string

      The name of the StorageClass to use for Prometheus.

      If omitted, the StorageClass of the existing Prometheus PVC is used. If there is no PVC yet, the StorageClass is determined by the global storageClass parameter.

      The global storageClass parameter is only considered when the module is enabled. Changing the global storageClass parameter while the module is enabled will not trigger disk re-provisioning.

      Warning. Specifying a value different from the one currently used (in the existing PVC) will result in disk re-provisioning and all data will be deleted.

      If false is specified, the use of emptyDir is forced.

      Examples:

      storageClass: ceph-ssd
      
      storageClass: 'false'
      
    • settings.tolerations
      array of objects

      The same as in the Pods’ spec.tolerations parameter in Kubernetes.

      If the parameter is omitted or false, it will be determined automatically.

      Example:

      tolerations:
      - key: key1
        operator: Equal
        value: value1
        effect: NoSchedule
      
      • settings.tolerations.effect
        string
      • settings.tolerations.key
        string
      • settings.tolerations.operator
        string
      • settings.tolerations.tolerationSeconds
        integer
      • settings.tolerations.value
        string
    • settings.vpa
      object

      VPA settings for pods.

      Default: {"updateMode":"Initial"}

      Examples:

      vpa:
        updateMode: Initial
        longtermMaxCPU: '1'
        longtermMaxMemory: 1500Mi
        maxCPU: 1000m
        maxMemory: 1500Mi
      
      vpa:
        updateMode: 'Off'
      
      • settings.vpa.longtermMaxCPU

        The maximum value that the VPA can set for the CPU requests for the Longterm Prometheus Pods.

        The default value is chosen automatically based on the maximum number of Pods that can be created in the cluster considering the current number of nodes and their settings. For more information, see the detect_vpa_max hook of the module.

        Example:

        longtermMaxCPU: 0.1
        
      • settings.vpa.longtermMaxMemory

        The maximum memory requests the VPA can set for the longterm Prometheus Pods.

        The default value is chosen automatically based on the maximum number of Pods that can be created in the cluster considering the current number of nodes and their settings. For more information, see the detect_vpa_max hook of the module.

        Example:

        longtermMaxMemory: 4Mi
        
      • settings.vpa.maxCPU

        The maximum value that the VPA can set for the CPU requests for the main Prometheus Pods.

        The default value is chosen automatically based on the maximum number of Pods that can be created in the cluster considering the current number of nodes and their settings. For more information, see the detect_vpa_max hook of the module.

        Example:

        maxCPU: '3'
        
      • settings.vpa.maxMemory

        The maximum memory requests the VPA can set for the main Prometheus Pods.

        The default value is chosen automatically based on the maximum number of Pods that can be created in the cluster considering the current number of nodes and their settings. For more information, see the detect_vpa_max hook of the module.

        Example:

        maxMemory: 3Mi
        
      • settings.vpa.updateMode
        string

        The VPA usage mode.

        Default: "Initial"

        Allowed values: Initial, Auto, Off

Authentication

The user-authn module provides authentication by default. Alternatively, externalAuthentication can be configured (see the settings.auth.externalAuthentication parameter). If neither option is enabled, the module uses basic auth with the user admin and an auto-generated password.

Use kubectl to view the password:

kubectl -n d8-system exec svc/deckhouse-leader -c deckhouse -- deckhouse-controller module values prometheus -o json | jq '.internal.auth.password'

Delete the Secret to regenerate the password:

kubectl -n d8-monitoring delete secret/basic-auth

Note! The auth.password parameter is deprecated.

Notes

  • retentionSize for the main and longterm Prometheus is calculated automatically; you cannot set this value manually!
    • The following calculation algorithm is used:
      • pvc_size * 0.85 — if the PVC exists;
      • 10 GiB — if there is no PVC and if the StorageClass supports resizing;
      • 25 GiB — if there is no PVC and if the StorageClass does not support resizing;
    • If local storage is used and you need to change retentionSize, manually change the size of the PV and PVC. Caution! The calculation uses the .status.capacity.storage value of the PVC, since it reflects the actual size of the PV after manual resizing.
  • 40 GiB — size of PersistentVolumeClaim created by default.
  • You can change the size of Prometheus disks in the standard Kubernetes way (if the StorageClass permits this) by editing the .spec.resources.requests.storage field of the PersistentVolumeClaim resource.
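
To illustrate the last note together with the retentionSize formula, the fragment below shows only the relevant fields of a PersistentVolumeClaim after its requested size has been edited. The PVC name and namespace are assumptions (check the actual names in your cluster), and 50Gi is an arbitrary value.

```yaml
# Partial view of a PVC, not a complete manifest to apply as is.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-main-db-prometheus-main-0   # assumed PVC name
  namespace: d8-monitoring                     # assumed namespace
spec:
  resources:
    requests:
      storage: 50Gi   # edited value; requires a StorageClass that permits resizing
```

Once the resize is complete and .status.capacity.storage reports 50Gi, the formula pvc_size * 0.85 gives a retentionSize of 42.5 GiB.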