Note that only HPA (Horizontal Pod Autoscaling) with apiVersion: autoscaling/v2, whose support has been available since Kubernetes v1.12, is discussed below.
Configuring HPA requires:
- defining what is being scaled (.spec.scaleTargetRef);
- defining the scaling range (.spec.minReplicas, .spec.maxReplicas);
- defining the metrics to be used as the basis for scaling (.spec.metrics) and registering them with the Kubernetes API.
Metrics in terms of HPA are of three types:
- classic — of type (.spec.metrics[].type) “Resource”; these are used for simple scaling based on CPU and memory consumption;
- custom — of type (.spec.metrics[].type) “Pods” or “Object”;
- external — of type (.spec.metrics[].type) “External”.
Caution! By default, HPA uses different approaches for scaling up and scaling down (see the sketch below):
- If the metrics indicate that scaling up is required, it happens immediately (spec.behavior.scaleUp.stabilizationWindowSeconds = 0). The only limitation is the rate of increase: the number of pods can double every 15 seconds, but if there are fewer than 4 pods, 4 new pods will be added.
- If the metrics indicate that scaling down is required, it happens within 5 minutes (spec.behavior.scaleDown.stabilizationWindowSeconds = 300): suggestions for a new number of replicas are calculated, and then the largest value is selected. There is no limit on the number of pods that can be removed at once.
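A sketch of the behavior fragment corresponding to these upstream Kubernetes defaults (you only need to set behavior when you want to override them):
behavior:
  scaleUp:
    # Scale up immediately, without a stabilization window.
    stabilizationWindowSeconds: 0
    # The least restrictive policy wins: double the replica count or add 4 pods every 15 seconds, whichever is larger.
    selectPolicy: Max
    policies:
    - type: Percent
      value: 100
      periodSeconds: 15
    - type: Pods
      value: 4
      periodSeconds: 15
  scaleDown:
    # Accumulate scale-down recommendations for 5 minutes, then apply the largest of them.
    stabilizationWindowSeconds: 300
    # Up to 100% of the replicas may be removed per period, i.e. effectively no limit.
    policies:
    - type: Percent
      value: 100
      periodSeconds: 15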
If metrics are subject to fluctuations that result in a surge of unnecessary application replicas, the following approaches can be used:
- Wrapping the metric in an aggregation function (e.g., avg_over_time()) if the metric is defined by a PromQL query. For more details, see the example.
- Increasing the stabilization window (the spec.behavior.scaleUp.stabilizationWindowSeconds parameter) in the HorizontalPodAutoscaler resource. During this period, requests to increase the number of replicas are accumulated, and then the most modest of them is selected. This method is identical to applying the min_over_time(<stabilizationWindowSeconds>) aggregation function, but only if the metric is increasing and scaling up is required. For scaling down, the default settings usually work well enough. For more details, see the example.
- Limiting the rate of increase of the new replica count with spec.behavior.scaleUp.policies (see the sketch below).
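A minimal sketch of such a rate limit; the controller name and the numbers are illustrative. With selectPolicy: Min, the most restrictive of the listed policies applies, so here the HPA adds at most 2 pods or 50% of the current replicas per minute, whichever is smaller.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
  namespace: app-prod
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  minReplicas: 1
  maxReplicas: 10
  behavior:
    scaleUp:
      # Apply the most restrictive of the policies below.
      selectPolicy: Min
      policies:
      # Add no more than 2 pods per minute...
      - type: Pods
        value: 2
        periodSeconds: 60
      # ...and no more than 50% of the current replica count per minute.
      - type: Percent
        value: 50
        periodSeconds: 60
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70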
Scaling types
The following metrics can be used to scale applications:
- Classic metrics.
- Custom namespace-scoped metrics. This type is suitable if you have a single application, the source of the metrics is in the same namespace, and it is tied to one of the objects in it.
- Custom cluster-wide metrics. This type is suitable if you have many applications using the same metric, whose source is in the application namespace, and it is associated with one of the objects. Such metrics let you put common infrastructure components into a separate deployment (“infra”).
- If the metric source is not tied to the application namespace, you can use external metrics. For example, metrics provided by a cloud provider or an external SaaS service.
Caution! We recommend using option 1 (classic metrics) or option 2 (custom namespace-scoped metrics). In this case, we suggest defining the application configuration (including its autoscaling) in the application repository. You should consider options 3 and 4 only if you have a large collection of identical microservices.
Classic resource consumption-based scaling
Below is an example HPA configuration for scaling based on the classic metrics from metrics.k8s.io: CPU and memory utilization of pods. The averageUtilization value reflects the target percentage of the resources that have been requested.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: app-hpa
namespace: app-prod
spec:
# Indicates the controller to be scaled (reference to a deployment or statefulset).
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: app
# Controller scaling limits.
minReplicas: 1
maxReplicas: 10
# If the application is prone to short-term spikes in CPU consumption,
# you can postpone the scaling decision to see if it is necessary.
# By default, scaling up occurs immediately.
behavior:
scaleUp:
stabilizationWindowSeconds: 300
metrics:
# CPU- and memory-based scaling.
- type: Resource
resource:
name: cpu
target:
# Scaling will occur when the average CPU utilization of all pods in scaleTargetRef exceeds the specified value.
# For a metric with type: Resource, only type: Utilization is available.
type: Utilization
# Scaling will occur if 1 core is requested for all Deployment pods and if over 700m is already in use on average.
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
# Example of a scaling rule when the average memory usage of all pods in scaleTargetRef exceeds the given value.
type: Utilization
# Scaling will occur if 1 GB of memory is requested for the pods and more than 800 MB is already in use on average.
averageUtilization: 80
Scaling based on custom metrics
Registering custom metrics with Kubernetes API
Custom metrics must be registered with the /apis/custom.metrics.k8s.io/ API; this registration is handled by prometheus-metrics-adapter (which also implements the API). The metrics can then be referenced in a HorizontalPodAutoscaler object. Customizing a vanilla prometheus-metrics-adapter is a time-consuming process, so we made it easier by defining a set of Custom Resources with different scopes:
- Namespaced:
  - ServiceMetric;
  - IngressMetric;
  - PodMetric;
  - DeploymentMetric;
  - StatefulsetMetric;
  - NamespaceMetric;
  - DaemonSetMetric (unavailable to users).
- Cluster:
  - ClusterServiceMetric (unavailable to users);
  - ClusterIngressMetric (unavailable to users);
  - ClusterPodMetric (unavailable to users);
  - ClusterDeploymentMetric (unavailable to users);
  - ClusterStatefulsetMetric (unavailable to users);
  - ClusterDaemonSetMetric (unavailable to users).
You can use a cluster-wide resource to define a metric globally and a namespaced one to redefine it locally. The format is the same for all custom resources.
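For illustration, a minimal sketch of a NamespaceMetric; the metric name and the PromQL query are made up, and the apiVersion is assumed to match the other namespaced resources shown below. Its value can later be checked via the API path shown in the Debugging section.
apiVersion: deckhouse.io/v1beta1
kind: NamespaceMetric
metadata:
  name: mymetric
  namespace: mynamespace
spec:
  # <<.LabelMatchers>> will be replaced with the labels identifying the namespace.
  query: sum(rate(http_requests_total{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)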
Using custom metrics in HPA
Once a custom metric is registered, it can be referenced. In terms of HPA, custom metrics are of two types — Pods and Object.
Object refers to an object in the cluster that has metrics in Prometheus with the corresponding labels (namespace=XXX, ingress=YYYYY). These labels will be substituted in place of <<.LabelMatchers>> in your custom query.
apiVersion: deckhouse.io/v1beta1
kind: IngressMetric
metadata:
name: mymetric
namespace: mynamespace
spec:
query: sum(rate(ingress_nginx_detail_requests_total{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>) OR on() vector(0)
---
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2
metadata:
name: myhpa
namespace: mynamespace
spec:
# Specifies the controller to be scaled (reference to a deployment or statefulset).
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: myapp
minReplicas: 1
maxReplicas: 2
# Metrics to use for scaling.
# An example of using custom metrics.
metrics:
- type: Object
object:
# An object that has metrics in Prometheus.
describedObject:
apiVersion: networking.k8s.io/v1
kind: Ingress
name: myingress
metric:
# A metric registered with the IngressMetric or ClusterIngressMetric custom resource.
# You can use rps_1m, rps_5m, or rps_15m that come with the prometheus-metrics-adapter module.
name: mymetric
target:
# You can use `Value` or `AverageValue` for metrics of type Object.
type: AverageValue
# Scaling occurs if the average value of the custom metric for all pods in the Deployment deviates significantly from 10.
averageValue: 10
Pods — all pods of the resource managed by the HPA will be selected, and metrics with the relevant labels (namespace=XXX, pod=YYYY-sadiq, namespace=XXX, pod=YYYY-e3adf, etc.) will be collected for each of them. The HPA then calculates the average of these metrics and uses it for scaling.
Using custom metrics with the RabbitMQ queue size
In the example below, scaling is performed based on the send_forum_message queue in RabbitMQ, for which the rmq service is registered. If the number of messages in this queue exceeds 42, scaling is carried out.
apiVersion: deckhouse.io/v1beta1
kind: ServiceMetric
metadata:
name: rmq-queue-forum-messages
namespace: mynamespace
spec:
query: sum (rabbitmq_queue_messages{<<.LabelMatchers>>,queue=~"send_forum_message",vhost="/"}) by (<<.GroupBy>>)
---
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2
metadata:
name: myhpa
namespace: mynamespace
spec:
# The controller to be scaled (reference to a deployment or statefulset).
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: myconsumer
minReplicas: 1
maxReplicas: 5
metrics:
- type: Object
object:
describedObject:
apiVersion: v1
kind: Service
name: rmq
metric:
name: rmq-queue-forum-messages
target:
type: Value
value: 42
Using volatile custom metrics
This example improves on the previous one.
In the example below, scaling is based on the send_forum_message queue in RabbitMQ, for which the rmq service is registered. If the number of messages in this queue exceeds 42, scaling takes place. The PromQL function avg_over_time() smooths (averages) the metric to avoid over-scaling due to short-term spikes in the number of messages.
apiVersion: deckhouse.io/v1beta1
kind: ServiceMetric
metadata:
name: rmq-queue-forum-messages
namespace: mynamespace
spec:
query: sum (avg_over_time(rabbitmq_queue_messages{<<.LabelMatchers>>,queue=~"send_forum_message",vhost="/"}[5m])) by (<<.GroupBy>>)
---
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2
metadata:
name: myhpa
namespace: mynamespace
spec:
# The controller to be scaled (reference to a deployment or statefulset).
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: myconsumer
minReplicas: 1
maxReplicas: 5
metrics:
- type: Object
object:
describedObject:
apiVersion: v1
kind: Service
name: rmq
metric:
name: rmq-queue-forum-messages
target:
type: Value
value: 42
Examples involving custom metrics of type Pods
In the examples below, scaling is based on php-fpm worker activity. In the first example, the trigger is the average number of active php-fpm workers in the mybackend Deployment, which should not exceed 5.
apiVersion: deckhouse.io/v1beta1
kind: PodMetric
metadata:
name: php-fpm-active-workers
spec:
query: sum (phpfpm_processes_total{state="active",<<.LabelMatchers>>}) by (<<.GroupBy>>)
---
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2
metadata:
name: myhpa
namespace: mynamespace
spec:
# The controller to be scaled (reference to a deployment or statefulset).
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: mybackend
minReplicas: 1
maxReplicas: 5
metrics:
# HPA has to loop through all the Deployment pods and collect metrics from them.
- type: Pods
# Unlike type: Object, you don't have to specify describedObject.
pods:
metric:
# A custom metric registered using the PodMetric custom resource.
name: php-fpm-active-workers
target:
# For metrics of type: Pods, only AverageValue can be used.
type: AverageValue
# The scaling will take place if the average metric value of all Deployment pods exceeds 5.
averageValue: 5
Scaling the Deployment based on the percentage of active php-fpm workers:
---
apiVersion: deckhouse.io/v1beta1
kind: PodMetric
metadata:
name: php-fpm-active-worker
spec:
# Percentage of active php-fpm-workers. The round() function gets rid of millipercentages in HPA.
query: round(sum by(<<.GroupBy>>) (phpfpm_processes_total{state="active",<<.LabelMatchers>>}) / sum by(<<.GroupBy>>) (phpfpm_processes_total{<<.LabelMatchers>>}) * 100)
---
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2
metadata:
name: {{ .Chart.Name }}-hpa
spec:
# The controller to be scaled (reference to a deployment or statefulset).
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: {{ .Chart.Name }}
minReplicas: 4
maxReplicas: 8
metrics:
- type: Pods
pods:
metric:
name: php-fpm-active-worker
target:
type: AverageValue
# Scaling will take place if, on average, the Deployment has 80% of the workers in use.
averageValue: 80
Registering external metrics with the Kubernetes API
The prometheus-metrics-adapter module supports the externalRules mechanism, which allows you to define custom PromQL queries and register them as metrics.
A universal rule that allows you to create your own metrics without customizing prometheus-metrics-adapter has been added in the installation examples: any metric in Prometheus with the name kube_adapter_metric_<name> will be registered in the API under the name <name>. Then you just need to write an exporter that exports such a metric, or create a recording rule in Prometheus that aggregates your metric based on other metrics.
Below is an example of a CustomPrometheusRules resource showcasing Prometheus custom rules for the mymetric metric.
apiVersion: deckhouse.io/v1
kind: CustomPrometheusRules
metadata:
# The recommended template for naming your CustomPrometheusRules.
name: prometheus-metrics-adapter-mymetric
spec:
groups:
# The recommended template
- name: prometheus-metrics-adapter.mymetric
rules:
# The name of your new metric
# Note that the 'kube_adapter_metric_' prefix is mandatory.
- record: kube_adapter_metric_mymetric
# A request with results that will end up in the final metric; there's no point in attaching extra labels to it.
expr: sum(ingress_nginx_detail_sent_bytes_sum) by (namespace,ingress)
Using external metrics in HPA
Once an external metric is registered, you can refer to it.
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2
metadata:
name: myhpa
namespace: mynamespace
spec:
# The controller to be scaled (reference to a deployment or statefulset).
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: myapp
minReplicas: 1
maxReplicas: 2
metrics:
# Scaling based on the external metrics
- type: External
external:
metric:
# The metric we registered by creating a metric in Prometheus (kube_adapter_metric_mymetric), but without the prefix 'kube_adapter_metric_'.
name: mymetric
selector:
# For external metrics, you can and should refine the request with labels.
matchLabels:
namespace: mynamespace
ingress: myingress
target:
# Only `type: Value` can be used for metrics of type External.
type: Value
# Scaling will take place if the value of our metric exceeds 10.
value: 10
Using the queue size in Amazon SQS
To install an exporter to integrate with SQS:
- Create a dedicated “service” Git repository (alternatively, you could use, e.g., an “infrastructure” repository).
- Copy the exporter installation and the script to it — these will be used to create the necessary CustomPrometheusRules.
That’s it, the cluster is now integrated with SQS. If you need to configure autoscaling for just one application (in a single namespace), we recommend installing the exporter together with that application and using NamespaceMetric.
The following is an example configuration for an exporter (e.g., sqs-exporter) to retrieve metrics from Amazon SQS if:
- a send_forum_message queue is running in Amazon SQS;
- scaling is done when the number of messages in that queue exceeds 42.
apiVersion: deckhouse.io/v1
kind: CustomPrometheusRules
metadata:
# The recommended name — prometheus-metrics-adapter-<metric name>.
name: prometheus-metrics-adapter-sqs-messages-visible
# Pay attention!
namespace: d8-monitoring
labels:
# Pay attention!
prometheus: main
# Pay attention!
component: rules
spec:
groups:
- name: prometheus-metrics-adapter.sqs_messages_visible # the recommended template
rules:
- record: kube_adapter_metric_sqs_messages_visible # Pay attention! The 'kube_adapter_metric_' prefix is required.
expr: sum (sqs_messages_visible) by (queue)
---
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2
metadata:
name: myhpa
namespace: mynamespace
spec:
# The controller to be scaled (reference to a deployment or statefulset).
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: myconsumer
minReplicas: 1
maxReplicas: 5
metrics:
- type: External
external:
metric:
# Must match CustomPrometheusRules record name without 'kube_adapter_metric_' prefix.
name: sqs_messages_visible
selector:
matchLabels:
queue: send_forum_message
target:
type: Value
value: 42
Debugging
How do I get a list of custom metrics?
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/
How do I get the value of a metric associated with an object?
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/my-namespace/services/*/my-service-metric
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/my-namespace/ingresses/*/rps_1m
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/my-namespace/ingresses/*/mymetric
How do I get the value of a metric created via NamespaceMetric?
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/my-namespace/metrics/my-ns-metric
How do I get external metrics?
kubectl get --raw /apis/external.metrics.k8s.io/v1beta1
kubectl get --raw /apis/external.metrics.k8s.io/v1beta1/namespaces/d8-ingress-nginx/d8_ingress_nginx_ds_cpu_utilization
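How do I get the value of an external metric filtered by labels? A sketch, assuming the mymetric example above (the '=' in the label selector is URL-encoded as %3D):
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/mynamespace/mymetric?labelSelector=ingress%3Dmyingress"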