Scale workloads with Ingress traffic!

Introduction

The Horizontal Pod Autoscaler (HPA) automatically scales the number of Pods in a replication controller, deployment, replica set, or stateful set based on observed metrics.

By default, you can autoscale workloads based on metrics exposed by Metrics Server, a standard component of a Kubernetes cluster.
Metrics Server enables, for example, the kubectl top command.

-> kubectl top pods                     
NAME                       CPU(cores)   MEMORY(bytes)   
apache2-7c444757d6-rwsdn   1m           6Mi     

Usually, you scale via CPU usage, setting a target value on the average utilization. Once the target is exceeded, the Deployment scales accordingly.
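
For reference, the classic CPU-based approach looks something like this (a minimal sketch; the apache2 name and the 80% threshold are purely illustrative):

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: apache2
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: apache2
  minReplicas: 1
  maxReplicas: 10
  # Scale out when average CPU utilization across Pods exceeds 80%
  targetCPUUtilizationPercentage: 80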

Can we scale our application using a more meaningful metric?

The usual way to scale a web application that is serving traffic is to use CPU usage.
However, CPU usage is only a side effect of traffic increase.
It would be more meaningful to scale our application directly on HTTP traffic metrics!

But wait, we don't have these metrics by default.

What do we do then?

The key: Prometheus Adapter

At SIGHUP, we take advantage of the metrics exposed by the NGINX Ingress Controller, a standard component on our Fury clusters.
The metrics exposed by the NGINX Ingress Controller are scraped by Prometheus and then processed by Prometheus Adapter.

Prometheus Adapter is the key to unlocking custom HPA autoscaling, as it translates these metrics and exposes them as custom metrics, making them available to be consumed by HPA.
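
For example, the raw counter behind these metrics can be queried directly in Prometheus with something like the following (a sketch; the exact label names, such as exported_namespace, depend on your scrape configuration):

# Per-second request rate over the last minute, per Ingress
sum(rate(nginx_ingress_controller_requests[1m])) by (exported_namespace, ingress)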

Implementation

We created a simple repository based on the Fury Distribution, containing all the infrastructural pieces. You can try the full demo here:

We have all the infrastructural pieces we need after deploying NGINX Ingress Controller, Prometheus and Prometheus Adapter.

In our Prometheus Adapter config.yaml file, there is only one custom metric:

  • nginx_ingress_controller_requests_rate

This metric computes the per-second rate of requests over a 1-minute window for each deployed Ingress.

rules:
  # Match the request counter exposed by the NGINX Ingress Controller
  - seriesQuery: '{__name__=~"^nginx_ingress_controller_requests.*",namespace!=""}'
    seriesFilters: []
    resources:
      template: <<.Resource>>
      overrides:
        # Map the exported_namespace label to the Kubernetes namespace resource
        exported_namespace:
          resource: "namespace"
    name:
      matches: ""
      as: "nginx_ingress_controller_requests_rate"
    # Per-second rate over a 1-minute window, summed and rounded to 3 decimals
    metricsQuery: round(sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>), 0.001)
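
To make the templating concrete: for a given Ingress, the metricsQuery above expands to something like the following (a sketch of the query the adapter runs on our behalf, with placeholder label values):

round(sum(rate(nginx_ingress_controller_requests{exported_namespace="<namespace>",ingress="<ingress>"}[1m])) by (ingress), 0.001)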

Let's create a simple apache2 deployment with an Ingress:

---
apiVersion: v1
kind: Namespace
metadata:
  name: autoscale-all-the-things
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: autoscale-all-the-things
  name: apache2
  labels:
    app: apache2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: apache2
  template:
    metadata:
      labels:
        app: apache2
    spec:
      containers:
        - name: httpd
          image: httpd:2.4.46
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: apache2
  namespace: autoscale-all-the-things
  labels:
    app: apache2
spec:
  ports:
    - port: 80
      name: http
      targetPort: 80
  selector:
    app: apache2
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
  name: apache2
  namespace: autoscale-all-the-things
spec:
  rules:
    - host: apache2.yourdomain.com
      http:
        paths:
          - path: /
            backend:
              serviceName: apache2
              servicePort: 80
  tls:
    - hosts:
        - apache2.yourdomain.com
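
Save the manifests above and apply them (the file name apache2.yaml is just an example):

kubectl apply -f apache2.yaml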

We can now call the Ingress to simulate some traffic.
In this example, we will use hey.

hey -n 20000 -q 120 -c 1 https://apache2.yourdomain.com/

This command executes 20000 HTTP calls (-n), with a single client (-c) and a maximum throughput of 120 requests per second (-q).
It's essential to tune the requests per second because we will rely on this to check if our HPA with custom metrics is behaving correctly.

Before using these metrics in an HPA resource, we can inspect the APIserver to see if the value is populated:

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/autoscale-all-the-things/ingress/apache2/nginx_ingress_controller_requests_rate" | jq .

The result of this command will be something like this:

{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/autoscale-all-the-things/ingress.extensions/apache2/nginx_ingress_controller_requests_rate"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Ingress",
        "namespace": "autoscale-all-the-things",
        "name": "apache2",
        "apiVersion": "extensions/v1beta1"
      },
      "metricName": "nginx_ingress_controller_requests_rate",
      "timestamp": "2021-05-25T12:43:43Z",
      "value": "119780m",
      "selector": null
    }
  ]
}

The value is expressed in milli-units: 119780m means roughly 119.78 requests per second, which matches our 120 rps load test. It's ugly to read, but it works fine.

Hello HPA autoscaling/v2beta2

In this demo, we are leveraging the autoscaling/v2beta2 HorizontalPodAutoscaler. This API version has been available since Kubernetes 1.12, but it's still not widely known or used.

The definition of our HPA:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: apache2
  namespace: autoscale-all-the-things
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: apache2
  minReplicas: 1
  maxReplicas: 100
  metrics:
    - type: Object
      object:
        metric:
          name: nginx_ingress_controller_requests_rate
        describedObject:
          apiVersion: extensions/v1beta1
          kind: Ingress
          name: apache2
        target:
          type: AverageValue
          averageValue: "100"

Let's explain what we are creating.

First of all, we are defining that our Deployment apache2 will scale from 1 to 100 replicas, based on the custom metric nginx_ingress_controller_requests_rate measured on the Ingress/apache2 object.

Then the fundamental piece of configuration is the target. With the target type AverageValue, the HPA takes the metric value and divides it by the current number of Pods in our Deployment. This way, we get an average we can use to scale the workload accordingly!
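
For example, with the roughly 119.78 requests per second we measured earlier and an averageValue of 100, the replica calculation sketches out as follows (mirroring the algorithm described in the Kubernetes documentation):

desiredReplicas = ceil(metricValue / averageValue)
                = ceil(119.78 / 100)
                = 2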

Let's do some load testing again:

hey -n 20000 -q 120 -c 1 https://apache2.yourdomain.com/

Since we set an averageValue of 100 in our HPA and we are load testing at 120 requests per second, we should eventually see two Pods up and running!
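
You can watch the Deployment scale up with, for example:

kubectl get deployment apache2 -n autoscale-all-the-things --watch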

As a side note: when interacting with the HPA, if you get its status from the command line with:

kubectl get hpa apache2

the targets column always shows 0/0. This is not a problem, since this is the output for the v1 version of the resource.

To see the current state of the v2beta2 version, you need to use kubectl describe:

kubectl describe hpa apache2

The output will be something like this:

-> kubectl describe hpa apache2                           
Name:                                                                                  apache2
Namespace:                                                                             autoscale-all-the-things
Labels:                                                                                <none>
Annotations:                                                                           <none>
CreationTimestamp:                                                                     Thu, 20 May 2021 18:19:19 +0200
Reference:                                                                             Deployment/apache2
Metrics:                                                                               ( current / target )
  "nginx_ingress_controller_requests_rate" on Ingress/apache2 (target average value):  0 / 100
Min replicas:                                                                          1
Max replicas:                                                                          100
Deployment pods:                                                                       2 current / 2 desired
Conditions:
  Type            Status  Reason               Message
  ----            ------  ------               -------
  AbleToScale     True    ScaleDownStabilized  recent recommendations were higher than current one, applying the highest recent recommendation
  ScalingActive   True    ValidMetricFound     the HPA was able to successfully calculate a replica count from external metric nginx_ingress_controller_requests_rate(nil)
  ScalingLimited  False   DesiredWithinRange   the desired count is within the acceptable range
Events:
  Type    Reason             Age                    From                       Message
  ----    ------             ----                   ----                       -------
  Normal  SuccessfulRescale  39m (x3 over 4d21h)    horizontal-pod-autoscaler  New size: 1; reason: All metrics below target
  Normal  SuccessfulRescale  5m45s (x4 over 4d21h)  horizontal-pod-autoscaler  New size: 2; reason: external metric nginx_ingress_controller_requests_rate(nil) above target

Conclusion

Thank you for reading. We hope this article has unlocked some nice autoscaling capabilities for your workloads.

If you want to know more about this topic: