Kubernetes autoscaling is a dynamic mechanism that adjusts the number of running Pods based on workload demand. This feature helps achieve efficient resource utilization, cost-effectiveness, and resilient application performance. In this article, we will dive deep into the workings of Horizontal Pod Autoscaler (HPA), the most commonly used autoscaling mechanism in Kubernetes. We’ll break down its components, metrics collection, pod readiness handling, and how it ultimately makes scaling decisions.
1. Autoscaling Components Overview
Kubernetes supports three main types of autoscaling:
- Horizontal Pod Autoscaler (HPA): Scales the number of pod replicas.
- Vertical Pod Autoscaler (VPA): Adjusts resource requests/limits for containers.
- Cluster Autoscaler (CA): Adds/removes nodes based on pending pods.
This article focuses on HPA.
Key Components:
- HorizontalPodAutoscaler (HPA) resource: Declares desired behavior for scaling.
- Metrics Server: Collects CPU and memory usage from kubelets and exposes it through the `metrics.k8s.io` API.
- Custom Metrics Adapter (optional): For custom or external metrics.
- Controller Manager: Houses the HPA controller.
- Kubelet: Reports node and pod metrics.
2. Setting up the HPA Resource
An HPA is defined using a Kubernetes manifest (YAML or JSON). For example:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
```
This configuration aims to maintain 60% average CPU utilization across Pods.
3. Pod Initialization and Readiness Considerations
Before metrics are collected from Pods, Kubernetes needs to account for startup behavior:
- `horizontal-pod-autoscaler-initial-readiness-delay`: Default 30s. Time window after pod start during which rapid transitions between Ready and Unready are ignored when deciding whether a pod counts as ready.
- `horizontal-pod-autoscaler-cpu-initialization-period`: Default 5m. During this window after pod start, a pod's CPU samples are only used if the pod is Ready and the sample was taken after it became Ready.
- Readiness Probe: Used to determine if the pod is ready to receive traffic.
Best Practice: Delay `readinessProbe` success until the startup CPU/memory burst has subsided.
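The readiness handling above can be sketched as a simplified model in Python (a hypothetical helper for illustration, not the actual kube-controller-manager code):

```python
from datetime import datetime, timedelta

# Simplified stand-in for --horizontal-pod-autoscaler-cpu-initialization-period.
CPU_INITIALIZATION_PERIOD = timedelta(minutes=5)

def cpu_sample_usable(pod_start, became_ready, sample_time, is_ready):
    """Decide whether a pod's CPU sample should feed the HPA average.

    Simplified model: within the CPU initialization period after pod
    start, the sample only counts if the pod is Ready and the sample
    was taken after it became Ready; afterwards, Ready pods count.
    """
    if sample_time - pod_start < CPU_INITIALIZATION_PERIOD:
        return is_ready and became_ready is not None and sample_time >= became_ready
    return is_ready

start = datetime(2024, 1, 1, 12, 0, 0)
# 2 minutes in, pod never became Ready: sample ignored.
print(cpu_sample_usable(start, None, start + timedelta(minutes=2), False))  # False
# 2 minutes in, Ready since minute 1: sample used.
print(cpu_sample_usable(start, start + timedelta(minutes=1),
                        start + timedelta(minutes=2), True))                # True
```

This is why a readiness probe that flips to success too early can feed misleading startup-burst CPU samples into the average.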
4. Metrics Collection
Metrics are gathered every 15 seconds by default (the `--horizontal-pod-autoscaler-sync-period` flag).
Metric Types Supported:
- Resource Metrics: CPU, memory.
- Custom Metrics: Provided by Prometheus adapter.
- External Metrics: Cloud APIs, business KPIs, etc.
Metrics Flow:
- Kubelet exposes pod and node metrics, which the Metrics Server scrapes.
- Metrics Server serves those metrics via the `metrics.k8s.io` API.
- The HPA controller fetches per-pod metrics from the Metrics API.
- Average metrics (CPU/memory) are calculated.
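The averaging step can be sketched as follows, assuming per-pod CPU usage and requests in millicores (a simplified model; the real controller works from Metrics API objects):

```python
def average_utilization(usages_m, requests_m):
    """Average CPU utilization as a percentage of each pod's request,
    averaged across pods -- the quantity compared against
    `averageUtilization` in the HPA spec."""
    ratios = [use / req for use, req in zip(usages_m, requests_m)]
    return 100 * sum(ratios) / len(ratios)

# Three pods each requesting 500m CPU, currently using 400m, 300m, 500m:
print(round(average_utilization([400, 300, 500], [500, 500, 500])))  # 80
```

Note that utilization is relative to the pod's resource *request*, not its limit, which is why HPA-targeted workloads should always set CPU requests.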
5. Scaling Decision Logic
The HPA controller computes the desired number of replicas using this formula:

desiredReplicas = ceil[currentReplicas × (currentMetricValue / desiredMetricValue)]
Example:
- Current CPU usage: 80%
- Target usage: 60%
- Current replicas: 5

desiredReplicas = ceil[5 × (80 / 60)] = ceil[6.67] = 7
Tolerance: No scaling occurs if the current-to-target usage ratio is within 10% of 1.0 (the default `--horizontal-pod-autoscaler-tolerance` of 0.1).
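The formula and tolerance check can be sketched in Python (a simplified model of the controller's arithmetic):

```python
import math

TOLERANCE = 0.10  # default --horizontal-pod-autoscaler-tolerance

def desired_replicas(current_replicas, current_value, target_value):
    """desiredReplicas = ceil[currentReplicas * (current / target)],
    skipped when the ratio is within the tolerance band around 1.0."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= TOLERANCE:
        return current_replicas  # close enough to target: no scaling
    return math.ceil(current_replicas * ratio)

print(desired_replicas(5, 80, 60))  # 7  (ceil of 6.67, as in the example)
print(desired_replicas(5, 63, 60))  # 5  (ratio 1.05 is within tolerance)
```

The `ceil` rounds up, so the controller prefers slightly over-provisioning to running under target.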
6. Handling Unready or Initializing Pods
Pods in the following states are excluded from metric calculations:
- Still initializing.
- Readiness probe failed.
- Missing metrics.
- Just restarted.
When metrics are missing, Kubernetes assumes:
- 0% usage for scale up.
- 100% usage for scale down.
This conservative approach avoids premature scaling decisions.
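A minimal sketch of that conservative substitution (a hypothetical helper; the real controller recomputes the usage ratio in a more involved way, but the idea is the same):

```python
import math

def desired_with_missing(current_replicas, known_utilizations, missing, target):
    """First decide the direction from pods that do report metrics,
    then re-average with missing pods assumed at 0% (scale up) or
    100% (scale down), which dampens the scaling decision."""
    avg_known = sum(known_utilizations) / len(known_utilizations)
    assumed = 0 if avg_known > target else 100
    filled = known_utilizations + [assumed] * missing
    avg = sum(filled) / len(filled)
    return math.ceil(current_replicas * avg / target)

# 4 pods averaging 90% vs a 60% target, 1 pod missing metrics:
# the missing pod is assumed at 0%, so we scale to 6 instead of 8.
print(desired_with_missing(5, [90, 90, 90, 90], 1, 60))  # 6
```

Either way, the assumption pulls the average toward the target, so missing metrics can never amplify a scaling decision.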
7. Stabilization and Scaling Limits
Stabilization Window:
- Defined by `--horizontal-pod-autoscaler-downscale-stabilization` (default 5m).
- Prevents frequent downscaling: the controller acts on the highest replica recommendation seen within the window.
Scaling Policies (autoscaling/v2 only):
- Configure the maximum number of pods added or removed per period (`periodSeconds`).
- Can be percentage or absolute number.
```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Pods
      value: 4
      periodSeconds: 60
```
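The effect of a `type: Pods` scaleDown policy like the one above can be sketched as (simplified; real policy evaluation also tracks the change history across periods):

```python
def limit_scale_down(current_replicas, desired, max_removed=4):
    """Clamp a scale-down recommendation so that at most
    `max_removed` pods are removed per periodSeconds window."""
    return max(desired, current_replicas - max_removed)

# A recommendation to drop from 10 to 3 replicas is limited to 6
# this period; the remainder happens in later periods.
print(limit_scale_down(10, 3))  # 6
```

Percentage policies work the same way, except the floor is computed as a fraction of the current replica count.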
8. Triggering the Scaling Action
Once the controller has:
- Fetched metrics.
- Computed the average.
- Applied tolerance.
- Filtered out unready pods.
- Applied the stabilization policy.
it issues a PATCH request to the target's `scale` subresource, updating `.spec.replicas`.
Example:

```
PATCH /apis/apps/v1/namespaces/default/deployments/web-app/scale

{
  "spec": {
    "replicas": 7
  }
}
```
9. Monitoring and Observability
Use the following tools:
- `kubectl get hpa`
- Metrics dashboards (Prometheus + Grafana)
- Logs from `kube-controller-manager`
- Events on the HPA object (`kubectl describe hpa`)
Conclusion
Kubernetes Horizontal Pod Autoscaler is a powerful mechanism that intelligently scales your applications. It integrates with metric systems, considers pod lifecycle state, uses dynamic algorithms, and is highly configurable. By understanding how each component works — from metric collection to replica adjustment — you can optimize your scaling policies for performance and cost.