Configuring HPA on On-Prem Kubernetes: A Practical Guide
TL;DR
Set up metrics-server in your Kubernetes cluster, define resource requests, apply an HPA, and let Kubernetes scale your workloads automatically. Visualize everything in Grafana.
Introduction
This article focuses on configuring the Horizontal Pod Autoscaler (HPA) in an on-premises Kubernetes environment using Resource Metrics, specifically CPU and memory usage collected by the metrics-server.
While Kubernetes also supports Custom Metrics (such as application-specific metrics exposed inside the cluster, often collected via Prometheus Adapter) and External Metrics (metrics from outside the cluster, like queue length, cloud services, or external APIs), this guide will walk you through the most common and foundational approach: autoscaling based on resource usage.
If you need to scale using custom or external metrics, additional setup is required. However, for most on-premises clusters, starting with resource metrics is the essential first step.
Types of Metrics for HPA Autoscaling
- Resource Metrics (CPU, memory): The focus of this article. Collected by metrics-server and used for most standard autoscaling needs.
- Custom Metrics: Application-specific metrics (e.g., requests per second), typically gathered via Prometheus Adapter.
- External Metrics: Metrics from outside the cluster (e.g., queue length from SQS, messages in Kafka, or other cloud services). These require additional configuration beyond resource or in-cluster metrics.
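For reference, here is a sketch of how each of the three metric types is declared in an `autoscaling/v2` HPA spec. The `packets_per_second` and `queue_messages_ready` metric names are hypothetical; the Pods and External entries would only work with a metrics adapter (such as Prometheus Adapter) installed behind them:

```yaml
# Illustrative only: each entry shows the shape of one metric type.
metrics:
  - type: Resource                  # served by metrics-server
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Pods                      # custom metric, needs e.g. Prometheus Adapter
    pods:
      metric:
        name: packets_per_second    # hypothetical metric name
      target:
        type: AverageValue
        averageValue: "1k"
  - type: External                  # metric originating outside the cluster
    external:
      metric:
        name: queue_messages_ready  # hypothetical metric name
      target:
        type: AverageValue
        averageValue: "30"
```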
Installing metrics-server
Before proceeding, this guide assumes you already have a working on-premises Kubernetes cluster and Prometheus properly installed and configured. Now, let’s deploy the metrics-server using the official manifest:
$ wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml -O metrics-server.yaml
$ kubectl apply -f metrics-server.yaml
The kubelet's serving certificate must be signed by the cluster's Certificate Authority. Otherwise, you need to disable certificate validation by passing the --kubelet-insecure-tls flag to metrics-server. In most on-premises scenarios, it's common to add this flag to the metrics-server deployment:
$ kubectl -n kube-system edit deployment metrics-server
containers:
- name: metrics-server
args:
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP
Make sure the metrics-server deployment is running correctly:
$ kubectl get deployment metrics-server -n kube-system
NAME READY UP-TO-DATE AVAILABLE AGE
metrics-server 1/1 1 1 18d
and
$ kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
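The raw API returns CPU usage as Kubernetes quantity strings, usually in nanocores (e.g., 156423731n), and memory in Ki. If you ever script against this endpoint, a small helper to normalize CPU quantities to millicores is handy. This is a sketch covering only the suffixes metrics-server commonly emits:

```python
# Convert a Kubernetes CPU quantity string to millicores.
# Covers only the suffixes metrics-server commonly emits (n, u, m, or plain cores).
def cpu_to_millicores(quantity: str) -> float:
    suffixes = {"n": 1e-6, "u": 1e-3, "m": 1.0}  # factor relative to millicores
    if quantity and quantity[-1] in suffixes:
        return float(quantity[:-1]) * suffixes[quantity[-1]]
    return float(quantity) * 1000.0  # no suffix means whole cores

print(cpu_to_millicores("156423731n"))  # ~156.4 millicores
print(cpu_to_millicores("250m"))        # 250.0
print(cpu_to_millicores("2"))           # 2000.0
```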
After confirming that metrics-server is working, you can check live resource usage across your cluster:
$ kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
kub-controlplane.localdomain 156m 3% 1828Mi 23%
kub-node1.localdomain 753m 4% 3113Mi 9%
kub-node2.localdomain 825m 5% 2103Mi 6%
kub-node3.localdomain 444m 2% 3578Mi 11%
and
$ kubectl top pods -A
NAMESPACE NAME CPU(cores) MEMORY(bytes)
appX-qa redis-c8878bc85-bvxm6 9m 14Mi
argocd argocd-application-controller-0 7m 149Mi
argocd argocd-applicationset-controller-64f6bd6456-xfj45 4m 23Mi
[...]
Configuring HPA for Your Deployment
With metrics-server installed and reporting usage correctly, you can now configure an HPA (Horizontal Pod Autoscaler) for your application.
Before configuring the HPA, your Deployment must define CPU and memory resource requests (limits are strongly recommended as well). Without requests, the HPA cannot calculate resource utilization.
Here’s an example of what to add or change in your deployment:
resources:
requests:
cpu: "1000m"
memory: "2048Mi"
limits:
cpu: "2000m"
memory: "2560Mi"
Below is an example HorizontalPodAutoscaler named image-processor-hpa that targets a Deployment named image-processor and scales it based on CPU utilization:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: image-processor-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: image-processor
minReplicas: 3
maxReplicas: 12
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 50
Note that the averageUtilization target (50%) is calculated as a percentage of the CPU request (1000m) defined in your deployment.
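The controller's core decision follows the formula documented for the HPA: desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), clamped to the min/max bounds. A rough sketch of the arithmetic (ignoring stabilization windows and the scaling tolerance):

```python
import math

def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     min_replicas: int,
                     max_replicas: int) -> int:
    # Core HPA formula: ceil(currentReplicas * currentMetric / targetMetric),
    # then clamp to the configured minReplicas/maxReplicas bounds.
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(desired, max_replicas))

# Example: 3 pods averaging 90% CPU against a 50% target -> scale to 6.
print(desired_replicas(3, 90, 50, min_replicas=3, max_replicas=12))  # 6
```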
⚠️ If your Deployment already specifies a fixed number of replicas, remove that field from the manifest. Otherwise, every re-apply of the manifest will reset the replica count and conflict with the Horizontal Pod Autoscaler, which takes control over it.
You can verify that the HPA is working correctly by checking the current resource usage of your pods:
$ kubectl top pods -n image-processor
NAME CPU(cores) MEMORY(bytes)
redis-7cd85f6cdf-85kf8 4m 3Mi
image-processor-86dfdc5945-5czbf 2m 520Mi
image-processor-86dfdc5945-6s8kz 1m 498Mi
image-processor-86dfdc5945-d69js 1m 530Mi
image-processor-86dfdc5945-gcf8p 1m 505Mi
image-processor-86dfdc5945-q4fzs 2m 490Mi
image-processor-86dfdc5945-q54kx 1m 518Mi
and
$ kubectl describe hpa -n image-processor image-processor-hpa
Name: image-processor-hpa
Namespace: image-processor
Labels: app.kubernetes.io/instance=image-processor
Annotations: <none>
CreationTimestamp: Mon, 26 May 2025 10:24:41 -0300
Reference: Deployment/image-processor
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): 0% (1m) / 50%
Min replicas: 3
Max replicas: 12
Deployment pods: 4 current / 4 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True ScaleDownStabilized recent recommendations were higher than current one, applying the highest recent recommendation
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulRescale 18m (x490 over 17d) horizontal-pod-autoscaler New size: 3; reason: All metrics below target
Normal SuccessfulRescale 7m31s (x330 over 18d) horizontal-pod-autoscaler New size: 4; reason: cpu resource utilization (percentage of request) above target
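One detail visible in these events: the controller does not react to every small deviation from the target. By default it skips rescaling when the current/target ratio is within a 10% tolerance (configurable on the kube-controller-manager via --horizontal-pod-autoscaler-tolerance). A sketch of that check:

```python
def within_tolerance(current_utilization: float,
                     target_utilization: float,
                     tolerance: float = 0.1) -> bool:
    # The HPA skips rescaling when current/target is close enough to 1.0.
    ratio = current_utilization / target_utilization
    return abs(ratio - 1.0) <= tolerance

print(within_tolerance(52, 50))  # True: only 4% above target, no rescale
print(within_tolerance(90, 50))  # False: 80% above target, scale up
```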
Grafana
As an extra step, you can visualize your HPA behavior and scaling history using Grafana, assuming you already have it connected to your Kubernetes cluster through Prometheus.
Grafana provides a ready-to-use dashboard for Horizontal Pod Autoscaler (HPA) metrics:
To use it:
- Open your Grafana interface.
- Go to Dashboards → Import.
- Enter the dashboard ID: 22128.
- Select your Prometheus data source.
- Click Import.
This dashboard provides visibility into replica count over time, metric thresholds, and scaling events for each autoscaled deployment.
Here’s an example of the Grafana dashboard in action:
The names in the screenshot were redacted for consistency with the example.
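If you prefer to build your own panels instead of importing a dashboard, the same information is exposed by kube-state-metrics. Assuming it is scraped by your Prometheus, queries along these lines (metric names from kube-state-metrics) track the HPA from this article:

```promql
# Current vs. desired replicas for the example HPA
kube_horizontalpodautoscaler_status_current_replicas{horizontalpodautoscaler="image-processor-hpa"}
kube_horizontalpodautoscaler_status_desired_replicas{horizontalpodautoscaler="image-processor-hpa"}

# Configured bounds, useful as dashboard thresholds
kube_horizontalpodautoscaler_spec_min_replicas{horizontalpodautoscaler="image-processor-hpa"}
kube_horizontalpodautoscaler_spec_max_replicas{horizontalpodautoscaler="image-processor-hpa"}
```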
Final Thoughts
Using metrics-server, you can enable basic autoscaling in Kubernetes based on CPU and memory utilization. This setup is straightforward and works well for many general workloads.
However, metrics-server only supports resource metrics (CPU and memory), as noted in the official Kubernetes documentation. If your use case requires scaling based on more advanced or custom metrics, there are alternative approaches.
These advanced options will be covered in a future article.
– dbaio