Scale workloads with Ingress traffic!
Introduction
The Horizontal Pod Autoscaler automatically scales the number of Pods in a replication controller, deployment, replica set or stateful set based on observed metrics.
By default, you can autoscale workloads based on metrics exposed by Metrics Server, a standard component of a Kubernetes cluster.
Metrics Server enables, for example, the kubectl top command.
-> kubectl top pods
NAME                       CPU(cores)   MEMORY(bytes)
apache2-7c444757d6-rwsdn   1m           6Mi
Usually, you scale via CPU usage, putting a target value on the average utilization. Once exceeded, the Deployment will scale accordingly.
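As a refresher, a plain CPU-based HPA looks something like the sketch below (the name, replica bounds and 70% utilization target are purely illustrative values, not part of the demo):

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: apache2-cpu            # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: apache2
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # illustrative threshold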
Can we scale our application using a more meaningful metric?
The usual way to scale a web application that is serving traffic is to use CPU usage.
However, CPU usage is only a side effect of traffic increase.
It would be far more meaningful to scale our application on HTTP metrics!
But wait, we don't have these metrics by default.
What do we do then?
The key: Prometheus Adapter
At SIGHUP, we take advantage of the metrics exposed by the NGINX Ingress Controller, a standard component on our Fury clusters.
The metrics exposed by NGINX Ingress Controller are scraped by Prometheus and parsed via Prometheus Adapter.
Prometheus Adapter is the key to unlocking custom HPA autoscaling, as it translates these metrics and exposes them as custom metrics, making them available to be consumed by HPA.
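To give an idea of what Prometheus sees, the controller exposes a request counter per Ingress; a query along these lines (the exact label names depend on how Prometheus scrapes the controller) returns the per-second request rate:

sum(rate(nginx_ingress_controller_requests{ingress="apache2"}[1m])) by (ingress)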
Implementation
We created a simple repository, based on a small slice of the Fury Distribution, containing all the infrastructural pieces. You can try a full demo here:
- https://github.com/sighupio/blog-posts-example/tree/main/scale-workloads-with-ingress-traffic
- Or a Katacoda scenario, here: https://www.katacoda.com/sighupio/scenarios/scale-workloads-with-ingress-traffic
We have all the infrastructural pieces we need after deploying NGINX Ingress Controller, Prometheus and Prometheus Adapter.
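A quick sanity check, assuming Prometheus Adapter is registered under its usual API service name, is to verify that the custom metrics API is actually being served:

kubectl get apiservices v1beta1.custom.metrics.k8s.io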
In our Prometheus Adapter config.yaml file, there is only one custom metric: nginx_ingress_controller_requests_rate.
This metric computes the request rate over a 1-minute window for each Ingress deployed.
rules:
  - seriesQuery: '{__name__=~"^nginx_ingress_controller_requests.*",namespace!=""}'
    seriesFilters: []
    resources:
      template: <<.Resource>>
      overrides:
        exported_namespace:
          resource: "namespace"
    name:
      matches: ""
      as: "nginx_ingress_controller_requests_rate"
    metricsQuery: round(sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>), 0.001)
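To make the templating a bit less abstract: for our apache2 Ingress, the metricsQuery above should expand to something along these lines (the exact label matchers depend on the adapter's resource mapping):

round(sum(rate(nginx_ingress_controller_requests{exported_namespace="autoscale-all-the-things",ingress="apache2"}[1m])) by (ingress), 0.001)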
Let's create a simple apache2 deployment with an Ingress:
---
apiVersion: v1
kind: Namespace
metadata:
  name: autoscale-all-the-things
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: autoscale-all-the-things
  name: apache2
  labels:
    app: apache2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: apache2
  template:
    metadata:
      labels:
        app: apache2
    spec:
      containers:
        - name: httpd
          image: httpd:2.4.46
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: apache2
  namespace: autoscale-all-the-things
  labels:
    app: apache2
spec:
  ports:
    - port: 80
      name: http
      targetPort: 80
  selector:
    app: apache2
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
  name: apache2
  namespace: autoscale-all-the-things
spec:
  rules:
    - host: apache2.yourdomain.com
      http:
        paths:
          - path: /
            backend:
              serviceName: apache2
              servicePort: 80
  tls:
    - hosts:
        - apache2.yourdomain.com
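Assuming we save the manifests above in a file called apache2.yaml (the file name is arbitrary), we can apply them and check that the Pod is up:

kubectl apply -f apache2.yaml
kubectl -n autoscale-all-the-things get pods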
We can now call the Ingress to simulate some traffic.
In this example, we will use hey.
hey -n 20000 -q 120 -c 1 https://apache2.yourdomain.com/
This command executes 20000 HTTP calls (-n), with one client (-c) and with a maximum throughput of 120 requests per second (-q).
It's essential to tune the requests per second because we will rely on this to check if our HPA with custom metrics is behaving correctly.
Before using these metrics in an HPA resource, we can inspect the API server to see if the value is populated:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/autoscale-all-the-things/ingress/apache2/nginx_ingress_controller_requests_rate" | jq .
The result of this command will be something like this:
{
"kind": "MetricValueList",
"apiVersion": "custom.metrics.k8s.io/v1beta1",
"metadata": {
"selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/autoscale-all-the-things/ingress.extensions/apache2/nginx_ingress_controller_requests_rate"
},
"items": [
{
"describedObject": {
"kind": "Ingress",
"namespace": "autoscale-all-the-things",
"name": "apache2",
"apiVersion": "extensions/v1beta1"
},
"metricName": "nginx_ingress_controller_requests_rate",
"timestamp": "2021-05-25T12:43:43Z",
"value": "119780m",
"selector": null
}
]
}
The value is expressed as a milli-request rate: 119780m means roughly 119.78 requests per second. It's ugly to look at, but it works just fine.
Hello HPA autoscaling/v2beta2
In this demo, we are leveraging the autoscaling/v2beta2 HorizontalPodAutoscaler. This version has been available since Kubernetes version 1.12, but it's still not widely known and used.
The definition of our HPA:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: apache2
  namespace: autoscale-all-the-things
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: apache2
  minReplicas: 1
  maxReplicas: 100
  metrics:
    - type: Object
      object:
        metric:
          name: nginx_ingress_controller_requests_rate
        describedObject:
          apiVersion: extensions/v1beta1
          kind: Ingress
          name: apache2
        target:
          type: AverageValue
          averageValue: "100"
Let's explain what we are creating.
First of all, we are defining that our Deployment apache2 will scale from 1 to 100 replicas, based on the custom metric nginx_ingress_controller_requests_rate measured on the object Ingress/apache2.
The fundamental piece of configuration is the target. With the target type AverageValue, the HPA takes the metric value and divides it by the current number of Pods in our Deployment. This way, we get a per-Pod average that we can use to scale the workload accordingly!
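To put some numbers on it, ignoring stabilization windows and rounding tolerances: a sustained rate of 120 requests per second divided by a target averageValue of 100 gives ceil(120 / 100) = 2 desired replicas, which is exactly what we expect to see in the next load test.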
Let's do some load testing again:
hey -n 20000 -q 120 -c 1 https://apache2.yourdomain.com/
Since we set an averageValue of 100 in our HPA and we are load testing at 120 requests per second, we should eventually see two Pods up and running!
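To watch the scale-up as it happens, we can keep an eye on the Pods while hey is running:

kubectl -n autoscale-all-the-things get pods -w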
As a side note, when interacting with HPA, if you get the status with:
kubectl get hpa apache2
via the command line, the targets are always 0/0. This is not a problem, because this is the output for the v1 version of the resource.
To see the current state of a v2beta2 version, you need to use the describe directive:
kubectl describe hpa apache2
The output will be something like this:
-> kubectl describe hpa apache2
Name:               apache2
Namespace:          autoscale-all-the-things
Labels:             <none>
Annotations:        <none>
CreationTimestamp:  Thu, 20 May 2021 18:19:19 +0200
Reference:          Deployment/apache2
Metrics:            ( current / target )
  "nginx_ingress_controller_requests_rate" on Ingress/apache2 (target average value):  0 / 100
Min replicas:       1
Max replicas:       100
Deployment pods:    2 current / 2 desired
Conditions:
  Type            Status  Reason               Message
  ----            ------  ------               -------
  AbleToScale     True    ScaleDownStabilized  recent recommendations were higher than current one, applying the highest recent recommendation
  ScalingActive   True    ValidMetricFound     the HPA was able to successfully calculate a replica count from external metric nginx_ingress_controller_requests_rate(nil)
  ScalingLimited  False   DesiredWithinRange   the desired count is within the acceptable range
Events:
  Type    Reason             Age                    From                       Message
  ----    ------             ----                   ----                       -------
  Normal  SuccessfulRescale  39m (x3 over 4d21h)    horizontal-pod-autoscaler  New size: 1; reason: All metrics below target
  Normal  SuccessfulRescale  5m45s (x4 over 4d21h)  horizontal-pod-autoscaler  New size: 2; reason: external metric nginx_ingress_controller_requests_rate(nil) above target
Conclusion
Thank you for reading! We hope this article has unlocked some nice autoscaling capabilities for your workloads.
If you want to know more about this topic: