Below, only HPAs of the apiVersion: autoscaling/v2 type (supported from Kubernetes v1.12 onward) are considered.
To configure an HPA, you need to:
- determine the scaling target (
.spec.scaleTargetRef
); - define the scaling range (
.spec.minReplicas
,.scale.maxReplicas
); - define the metrics that will be used for scaling and register them with the Kubernetes API (
.spec.metrics
).
There are three types of metrics in terms of an HPA:
- classic — these have the “Resource” type (
.spec.metrics[].type
) and are used to scale based on memory and CPU consumption; - custom — these have the “Pods” or “Object” type (
.spec.metrics[].type
); - external — these have the “External” type (
.spec.metrics[].type
).
Caution! During scale, HPA uses different approaches by default:
- If the metrics indicate that HPA must scale up the target, it happens immediately (
spec.behavior.scaleUp.stabilizationWindowSeconds
= 0). The only limitation — scale speed. During 15 seconds, the Pods can either double their number or if there are less than 4 Pods now, maximum four new Pods will be added. - If the metrics indicate that HPA must scale down the target, it happens smoothly. During 5 minutes (
spec.behavior.scaleUp.stabilizationWindowSeconds
= 300), HPA collects suggestions about scaling and finally chooses the largest value. There aren’t scale-down speed limitations.
If you have metric flapping problems which lead to unwanted scales, there are options:
- If your metric is based on a PromQL query, you can use an aggregation function like
avg_over_time()
to smooth out the fluctuations. Example… - You can increase
spec.behavior.scaleUp.stabilizationWindowSeconds
inHorisontalPodAutoscaler
resource. In this case, HPA collects scale suggestions during the period and finally chooses the minimal value. In other words, this solution is identical using themin_over_time(<stabilizationWindowSeconds>)
aggregating function only when the metric is growing up, and HPA decides to scale up. For scaling down, it is usually enough standard Stabilisation Window settings. Example… - You can also tighten the scale-up speed with
spec.behavior.scaleUp.policies
settings.
What scaling type should I prefer?
- The typical use-cases of a classic type are pretty obvious.
- Suppose you have a single application, the source of metrics is located inside the namespace, and it is associated with one of the objects. In this case, we recommend using the custom namespace-scoped metrics.
- Use custom Cluster-wide metrics if multiple applications use the same metric associated with one of the objects, and the metric’s source belongs to the application namespace. Such metrics can help you combine common infrastructure components into a separate (“infra”) Deployment.
- Use external metrics if the source of the metric does not belong to the application namespace. These can be, for example, cloud provider or SaaS-related metrics.
Caution! We strongly recommend using either Option 1. (classic metrics) or Option 2. (custom metrics defined in the namespace). In this case, you can define the entire configuration of the application (including the autoscaling logic) in the repository of the application. Options 3 and 4 should only be considered if you have a large collection of identical microservices.
Classic resource consumption-based scaling
Below is an example of the HPA for scaling based on standard metrics.k8s.io
metrics (CPU and memory of the Pods). Please, take special note of the averageUtulization
— this value reflects the target percentage of resources that have been requested.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: app-hpa
namespace: app-prod
spec:
# The targets of scaling (link to a Deployment or StatefulSet).
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: app
# Min and max values for replication.
minReplicas: 1
maxReplicas: 10
behavior:
# If short-term spikes of CPU usage are regular for the application,
# you can postpone the scaling decision to be sure if it is necessary.
# By default, scaling up occurs immediately.
scaleUp:
stabilizationWindowSeconds: 300
metrics:
# Scaling based on CPU and Memory consumption.
- type: Resource
resource:
name: cpu
target:
# Scale up if the average CPU utilization by all the Pods in scaleTargetRef exceeds the specified value.
# For type: Resource metrics only the type: Utilization parameter is available.
type: Utilization
# Scale up if all the Deployment's Pods have requested 1 CPU core and consumed more than 700m on average.
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
# Scale up if the average Memory utilization by all the Pods in scaleTargetRef exceeds the specified value.
type: Utilization
# Scale up if all the Deployment's Pods have requested 1GB and consumed more than 800MB on average.
averageUtilization: 80
Custom metrics-based scaling
Registering custom metrics with the Kubernetes API
Custom metrics must be registered with the /apis/custom.metrics.k8s.io/
API. In our case, prometheus-metrics-adapter
(it also implements the API) performs the registration. The HorizontalPodAutoscaler
object can refer to these metrics after the registration is complete. Setting up a vanilla prometheus-metrics-adapter
is a time-consuming process. Happily, we have somewhat simplified it by defining a set of Custom Resources with different Scopes:
- Namespaced:
ServiceMetric
IngressMetric
PodMetric
DeploymentMetric
StatefulsetMetric
NamespaceMetric
DaemonSetMetric
(not available to users)
- Cluster:
ClusterServiceMetric
(not available to users)ClusterIngressMetric
(not available to users)ClusterPodMetric
(not available to users)ClusterDeploymentMetric
(not available to users)ClusterStatefulsetMetric
(not available to users)ClusterDaemonSetMetric
(not available to users)
You can globally define a metric using the Cluster-scoped resource, while the namespaced resource allows you to redefine it locally. All CRs have the same format.
Using custom metrics with HPA
After a custom metric is registered, you can refer to it. For the HPA, custom metrics can be of two types — Pods
and Object
. Object
is a reference to a cluster object that has metrics with the appropriate labels (namespace=XXX,ingress=YYY
) in Prometheus. These labels will be substituted instead of <<.LabelMatchers>>
in your custom request.
apiVersion: deckhouse.io/v1beta1
kind: IngressMetric
metadata:
name: mymetric
namespace: mynamespace
spec:
query: sum(rate(ingress_nginx_detail_requests_total{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>) OR on() vector(0)
---
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2
metadata:
name: myhpa
namespace: mynamespace
spec:
# The targets of scaling (link to a deployment or statefulset).
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: myapp
minReplicas: 1
maxReplicas: 2
# What metrics to use for scaling. We use custom metrics of the Object type.
metrics:
- type: Object
object:
# Some object that has metrics in Prometheus.
describedObject:
apiVersion: networking.k8s.io/v1
kind: Ingress
name: myingress
metric:
# The metric registered using IngressMetric or ClusterIngressMetric CRs.
# Can be used rps_1m, rps_5m or rps_15m which come with the prometheus-metrics-adapter module.
name: mymetric
target:
# `Value` or `AverageValue` can be used for metrics of the Object type.
type: AverageValue
# Scaling occurs if the average value for all Pods in the Deployment of the custom metric is very different from 10.
averageValue: 10
In the case of the Pods
metric type, the process is more complex. First, metrics with the appropriate labels (namespace=XXX,pod=YYY-sadiq
,namespace=XXX,pod=YYY-e3adf
,…) will be collected for all the Pods of the resource to scale. Next, HPA will calculate the average value based on these metrics and will use it for scaling. Example…
Example of using RabbitMQ queue size-based custom metrics
Suppose there is a send_forum_message
queue in RabbitMQ, and this message broker is exposed as an rmq
service. Then, suppose, we want to scale up the cluster if there are more than 42 messages in the queue.
apiVersion: deckhouse.io/v1beta1
kind: ServiceMetric
metadata:
name: rmq-queue-forum-messages
namespace: mynamespace
spec:
query: sum (rabbitmq_queue_messages{<<.LabelMatchers>>,queue=~"send_forum_message",vhost="/"}) by (<<.GroupBy>>)
---
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2
metadata:
name: myhpa
namespace: mynamespace
spec:
# The targets of scaling (link to a deployment or statefulset).
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: myconsumer
minReplicas: 1
maxReplicas: 5
metrics:
- type: Object
object:
describedObject:
apiVersion: v1
kind: Service
name: rmq
metric:
name: rmq-queue-forum-messages
target:
type: Value
value: 42
Example of using unstable custom metric
Improvement for example above.
Suppose there is a send_forum_message
queue in RabbitMQ, and this message broker is exposed as an rmq
service. Then, suppose, we want to scale up the cluster if there are more than 42 messages in the queue. At the same time, we do not want to react to short-term spikes, for this we use MQL-function avg_over_time()
.
apiVersion: deckhouse.io/v1beta1
kind: ServiceMetric
metadata:
name: rmq-queue-forum-messages
namespace: mynamespace
spec:
query: sum (avg_over_time(rabbitmq_queue_messages{<<.LabelMatchers>>,queue=~"send_forum_message",vhost="/"}[5m])) by (<<.GroupBy>>)
---
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2
metadata:
name: myhpa
namespace: mynamespace
spec:
# The targets of scaling (link to a deployment or statefulset).
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: myconsumer
minReplicas: 1
maxReplicas: 5
metrics:
- type: Object
object:
describedObject:
apiVersion: v1
kind: Service
name: rmq
metric:
name: rmq-queue-forum-messages
target:
type: Value
value: 42
Examples of using custom metrics of the Pods
type
Suppose we want the average number of php-fpm workers in the mybackend
Deployment to be no more than 5.
apiVersion: deckhouse.io/v1beta1
kind: PodMetric
metadata:
name: php-fpm-active-workers
spec:
query: sum (phpfpm_processes_total{state="active",<<.LabelMatchers>>}) by (<<.GroupBy>>)
---
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2
metadata:
name: myhpa
namespace: mynamespace
spec:
# The targets of scaling (link to a deployment or statefulset).
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: mybackend
minReplicas: 1
maxReplicas: 5
metrics:
# HPA must go through all the Pods in the Deployment and collect metrics from them.
- type: Pods
# You do not need to specify descripedObject (in contrast to type: Object).
pods:
metric:
# Custom metric, registered using the PodMetric CR.
name: php-fpm-active-workers
target:
# For type: Pods metrics, the AverageValue can only be used.
type: AverageValue
# Scale up if the average metric value for all the Pods of the myworker Deployment is greater than 5.
averageValue: 5
The Deployment is scaled based on the percentage of active php-fpm workers.
---
apiVersion: deckhouse.io/v1beta1
kind: PodMetric
metadata:
name: php-fpm-active-worker
spec:
# Percentage of active php-fpm workers. The round() function rounds the percentage.
query: round(sum by(<<.GroupBy>>) (phpfpm_processes_total{state="active",<<.LabelMatchers>>}) / sum by(<<.GroupBy>>) (phpfpm_processes_total{<<.LabelMatchers>>}) * 100)
---
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2
metadata:
name: {{ .Chart.Name }}-hpa
spec:
# The targets of scaling (link to a deployment or statefulset).
scaleTargetRef:
apiVersion: apps/v1beta1
kind: Deployment
name: {{ .Chart.Name }}
minReplicas: 4
maxReplicas: 8
metrics:
- type: Pods
pods:
metric:
name: php-fpm-active-worker
target:
type: AverageValue
# Scale up if, on average, 80% of workers in the deployment are running at full capacity.
averageValue: 80
Registering external metrics with the Kubernetes API
The prometheus-metrics-adapter
module supports the externalRules
mechanism. Using it, you can create custom PromQL requests and register them as metrics.
In our installations, we have implemented a universal rule that allows you to create your metrics without using prometheus-metrics-adapter
— “any Prometheus metric called kube_adapter_metric_<name>
will be registered in the API under the <name>
”. In other words, all you need is to either write an exporter (to export the metric) or create a recording rule in Prometheus that will aggregate your metric based on other metrics.
An example of CustomPrometheusRules
:
apiVersion: deckhouse.io/v1
kind: CustomPrometheusRules
metadata:
# The recommended template for naming your CustomPrometheusRules.
name: prometheus-metrics-adapter-mymetric
spec:
groups:
# Recommended template for the name key.
- name: prometheus-metrics-adapter.mymetric
rules:
# The name of the new metric. Pay attention! The 'kube_adapter_metric_' prefix is required.
- record: kube_adapter_metric_mymetric
# The results of this request will be passed to the final metric; there is no reason to include excess labels into it.
expr: sum(ingress_nginx_detail_sent_bytes_sum) by (namespace,ingress)
Using external metrics with HPA
You can refer to a metric after it is registered.
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2
metadata:
name: myhpa
namespace: mynamespace
spec:
# The targets of scaling (link to a deployment or statefulset).
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: myapp
minReplicas: 1
maxReplicas: 2
# Use external metrics for scaling.
metrics:
- type: External
external:
metric:
# The metric that we registered by creating a metric in Prometheus's kube_adapter_metric_mymetric but without 'kube_adapter_metric_' prefix.
name: mymetric
selector:
# For external metrics, you can and should specify matching labels.
matchLabels:
namespace: mynamespace
ingress: myingress
target:
# Only `type: Value` can be used for metrics of the External type.
type: Value
# Scale up if the value of our metric is greater than 10.
value: 10
Example of scaling based on the Amazon SQS queue size
Note that an exporter is required to integrate with SQS. For this, create a separate “service” git repository (or you can use an “infrastructure” repository) and put the installation of this exporter as well as the script to create the necessary
CustomPrometheusRules
into this repository. If you need to configure autoscaling for a single application (especially if it runs in a single namespace), we recommend putting the exporter together with the application and usingNamespaceMetrics
.
Suppose there is a send_forum_message
queue in Amazon SQS. Then, suppose, we want to scale up the cluster if there are more than 42 messages in the queue. Also, you will need an exporter to collect Amazon SQS metrics (say, sqs-exporter).
apiVersion: deckhouse.io/v1
kind: CustomPrometheusRules
metadata:
# The recommended name — prometheus-metrics-adapter-<metric name>.
name: prometheus-metrics-adapter-sqs-messages-visible
# Pay attention!
namespace: d8-monitoring
labels:
# Pay attention!
prometheus: main
# Pay attention!
component: rules
spec:
groups:
- name: prometheus-metrics-adapter.sqs_messages_visible # the recommended template
rules:
- record: kube_adapter_metric_sqs_messages_visible # Pay attention! The 'kube_adapter_metric_' prefix is required.
expr: sum (sqs_messages_visible) by (queue)
---
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2
metadata:
name: myhpa
namespace: mynamespace
spec:
# The targets of scaling (link to a deployment or statefulset).
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: myconsumer
minReplicas: 1
maxReplicas: 5
metrics:
- type: External
external:
metric:
# Must match CustomPrometheusRules record name without 'kube_adapter_metric_' prefix.
name: sqs_messages_visible
selector:
matchLabels:
queue: send_forum_messages
target:
type: Value
value: 42
Debugging
How do I get a list of custom metrics?
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/
How do I get the value of a metric associated with an object?
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/my-namespace/services/*/my-service-metric
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/my-namespace/ingresses/*/rps_1m
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/my-namespace/ingresses/*/mymetric
How do I get the value of a metric created via NamespaceMetric
?
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/my-namespace/metrics/my-ns-metric
How do I get external metrics?
kubectl get --raw /apis/external.metrics.k8s.io/v1beta1
kubectl get --raw /apis/external.metrics.k8s.io/v1beta1/namespaces/d8-ingress-nginx/d8_ingress_nginx_ds_cpu_utilization