How Kubernetes Autoscaling Works: Step-by-Step Guide

Kubernetes autoscaling is a dynamic mechanism that adjusts the number of running Pods based on workload demand. This feature helps achieve efficient resource utilization, cost-effectiveness, and resilient application performance. In this article, we will dive deep into the workings of Horizontal Pod Autoscaler (HPA), the most commonly used autoscaling mechanism in Kubernetes. We’ll break down its components, metrics collection, pod readiness handling, and how it ultimately makes scaling decisions.

1. Autoscaling Components Overview

Kubernetes supports three main types of autoscaling:

  • Horizontal Pod Autoscaler (HPA): Scales the number of pod replicas.
  • Vertical Pod Autoscaler (VPA): Adjusts resource requests/limits for containers.
  • Cluster Autoscaler (CA): Adds/removes nodes based on pending pods.

This article focuses on HPA.

Key Components:

  • HorizontalPodAutoscaler (HPA) resource: Declares desired behavior for scaling.
  • Metrics Server: Collects CPU and memory usage from kubelets and exposes it through the metrics.k8s.io API.
  • Custom Metrics Adapter (optional): For custom or external metrics.
  • Controller Manager: Houses the HPA controller.
  • Kubelet: Reports node and pod metrics.
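
To confirm that the Metrics Server is installed and serving data, a quick check (assuming kubectl access to the cluster):

# Verify the metrics.k8s.io API is registered
kubectl get apiservice v1beta1.metrics.k8s.io

# Spot-check live CPU/memory readings
kubectl top nodes
kubectl top pods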

2. Setting up the HPA Resource

An HPA is defined using a Kubernetes manifest (YAML or JSON). For example:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60

This configuration aims to maintain 60% average CPU utilization across Pods.
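
The same autoscaler can be created imperatively; kubectl autoscale is a shortcut that supports only the CPU utilization target:

kubectl autoscale deployment web-app --cpu-percent=60 --min=2 --max=10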

3. Pod Initialization and Readiness Considerations

Before metrics are collected from Pods, Kubernetes needs to account for startup behavior:

  • `horizontal-pod-autoscaler-initial-readiness-delay`: Default 30s. Time window after pod start during which a pod that flaps between Ready and Unready may still be treated as not yet ready, so noisy startup transitions are ignored.
  • `horizontal-pod-autoscaler-cpu-initialization-period`: Default 5m. During this window after pod start, a pod’s CPU sample is only used if the pod is Ready and the sample was taken after it became Ready; this keeps startup CPU bursts out of the scaling average.
  • Readiness Probe: Used to determine if the pod is ready to receive traffic.

Best Practice: Delay readinessProbe success until the startup CPU/memory burst has subsided.
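
A minimal sketch of such a probe, assuming the application exposes a /healthz endpoint on port 8080 (both the path and the timings are illustrative):

readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 20  # hold off Ready until the startup burst has passed
  periodSeconds: 5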

4. Metrics Collection

The HPA controller evaluates metrics every 15 seconds by default (the --horizontal-pod-autoscaler-sync-period flag on kube-controller-manager).

Metric Types Supported:

  • Resource Metrics: CPU, memory.
  • Custom Metrics: Provided by Prometheus adapter.
  • External Metrics: Cloud APIs, business KPIs, etc.
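
For illustration, this is how a Pods-type custom metric would appear in the HPA spec; the metric name http_requests_per_second is hypothetical and would have to be exposed by an adapter such as the Prometheus adapter:

metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second  # hypothetical adapter-provided metric
    target:
      type: AverageValue
      averageValue: "100"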

Metrics Flow:

  1. Kubelet exposes metrics to the Metrics Server.
  2. Metrics Server serves those metrics via metrics.k8s.io API.
  3. HPA controller fetches per-pod metrics from Metrics API.
  4. Average metrics (CPU/memory) are calculated.
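
You can query the same API the HPA controller reads; for example, per-pod metrics for the default namespace:

kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods"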

5. Scaling Decision Logic

The HPA controller computes the desired number of replicas using this formula:

desiredReplicas = ceil[currentReplicas × (currentMetricValue / desiredMetricValue)]

Example:

  • Current CPU usage: 80%
  • Target usage: 60%
  • Current replicas: 5

desiredReplicas = ceil[5 × (80 / 60)] = ceil[6.67] = 7, so the Deployment is scaled from 5 to 7 replicas.

Tolerance: No scaling occurs if the current/target ratio stays within 10% of 1.0 (the default of --horizontal-pod-autoscaler-tolerance); this avoids thrashing on small fluctuations.

6. Handling Unready or Initializing Pods

Pods in the following states are excluded from metric calculations:

  • Still initializing.
  • Readiness probe failed.
  • Missing metrics.
  • Just restarted.

When metrics are missing, Kubernetes assumes:

  • 0% usage for scale up.
  • 100% usage for scale down.

This conservative approach avoids premature scaling decisions.

7. Stabilization and Scaling Limits

Stabilization Window:

  • Defined by --horizontal-pod-autoscaler-downscale-stabilization (default 5m).
  • Prevents frequent downscaling.

Scaling Policies (autoscaling/v2 only):

  • Limit how many pods may be added or removed per period (periodSeconds).
  • Limits can be a percentage or an absolute number of pods.

For example, the following behavior block allows at most 4 pods to be removed per 60-second period, after a 300-second stabilization window:

behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Pods
      value: 4
      periodSeconds: 60
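
A corresponding scaleUp block can rate-limit growth as well; the values below are illustrative, allowing the replica count to at most double every 15 seconds:

behavior:
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
    - type: Percent
      value: 100        # add at most 100% of current replicas...
      periodSeconds: 15 # ...per 15-second period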

8. Triggering the Scaling Action

Once the controller has:

  1. Fetched metrics.
  2. Computed the average.
  3. Applied the tolerance check.
  4. Filtered out unready pods.
  5. Applied the stabilization policy.

it then sends a PATCH request to the target’s scale subresource, updating .spec.replicas.

Example:

PATCH /apis/apps/v1/namespaces/default/deployments/web-app/scale
{
  "spec": {
    "replicas": 7
  }
}
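
The same subresource can be exercised manually with kubectl; note that the --subresource flag requires kubectl v1.24 or newer:

kubectl patch deployment web-app --subresource=scale --type=merge -p '{"spec":{"replicas":7}}'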

9. Monitoring and Observability

Use the following tools:

  • kubectl get hpa
  • Metrics dashboards (Prometheus + Grafana)
  • Logs from kube-controller-manager
  • Events on the HPA object (kubectl describe hpa)
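
For example, to watch the autoscaler converge and inspect the events behind its decisions:

kubectl get hpa web-hpa --watch
kubectl describe hpa web-hpa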

Conclusion

Kubernetes Horizontal Pod Autoscaler is a powerful mechanism that intelligently scales your applications. It integrates with metric systems, considers pod lifecycle state, uses dynamic algorithms, and is highly configurable. By understanding how each component works — from metric collection to replica adjustment — you can optimize your scaling policies for performance and cost.

Kubernetes: How traffic flows from internet to container via Istio

Let’s walk through the traffic flow from the internet to your application containers in GKE using Istio. This will include how the traffic passes through various components such as NodePort, Istio Gateway, VirtualService, kube-proxy, Kubernetes services, sidecar (Envoy proxy), and ultimately reaches the application.

Given the configuration provided for the Istio Gateway and VirtualServices, this flow applies to both frontend.mydomain.com and backend.mydomain.com.

Traffic Flow

  1. Client Request: The client sends an HTTPS request to either frontend.mydomain.com or backend.mydomain.com.
  2. WAF: The request passes through a Web Application Firewall, which screens for DDoS, SQL injection, cross-site scripting (XSS), and similar attacks.
  3. Cloud Load Balancer: Routes the traffic to the appropriate GKE node via a NodePort.
  4. Istio Ingress Gateway: Handles mutual TLS (mTLS) authentication and decrypts the traffic.
  5. VirtualService: Based on the host (frontend.mydomain.com or backend.mydomain.com), the VirtualService routes the traffic to the corresponding Kubernetes service.
  6. Kube-proxy and Kubernetes Service: The kube-proxy forwards the traffic from the ClusterIP service to the appropriate application pod.
  7. Envoy Sidecar: The Envoy proxy in the pod processes the request and forwards it to the application container.
  8. Application: The application processes the request and sends a response back, following the same path in reverse.

1. Traffic from Internet to Google Cloud Load Balancer

  1. A client (user or service) on the internet makes an HTTPS request to either frontend.mydomain.com or backend.mydomain.com.
  2. The request first reaches the Google Cloud Load Balancer (GCLB) associated with your GKE cluster. This load balancer is automatically provisioned by GKE when you define an Ingress or Gateway resource.
  3. The GCLB forwards the request to a NodePort on one of the GKE cluster nodes.

2. NodePort and Istio Ingress Gateway

  1. The NodePort on the GKE node receives the traffic and forwards it to the Istio Ingress Gateway pod, which is part of the istio-ingressgateway service running on the cluster nodes.
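
You can inspect the NodePort mapping the load balancer targets (service name and namespace here are Istio’s defaults):

kubectl -n istio-system get svc istio-ingressgateway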

The Istio Ingress Gateway is defined in the Gateway resource:

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: foocorp-gateway
  namespace: default
spec:
  selector:
    istio: ingressgateway # Uses Istio's default ingress gateway
  servers:
  - port:
      number: 443
      name: https-frontend
      protocol: HTTPS
    tls:
      mode: MUTUAL
      credentialName: "frontend-credential"
    hosts:
    - "frontend.mydomain.com"
  - port:
      number: 443
      name: https-backend
      protocol: HTTPS
    tls:
      mode: MUTUAL
      credentialName: "backend-credential"
    hosts:
    - "backend.mydomain.com"

The Istio Gateway handles mutual TLS (mTLS) based on the tls.mode: MUTUAL configuration. The client and the server authenticate each other using the certificates referenced by each server’s credentialName, stored as secrets in the cluster. This ensures secure communication between the client and the cluster.
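
For reference, a MUTUAL credential is a secret in istio-system that holds the server certificate and key plus the CA certificate used to verify clients; a sketch with placeholder file names:

kubectl create -n istio-system secret generic backend-credential \
  --from-file=tls.key=server.key \
  --from-file=tls.crt=server.crt \
  --from-file=ca.crt=ca.crt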

3. VirtualService Routing

  1. Once the Istio Gateway accepts the connection and decrypts the traffic, it uses the VirtualService configuration to route the request. The traffic is matched based on the host and URI.

For requests to frontend.mydomain.com, the VirtualService for the frontend service is:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: frontend
spec:
  hosts:
  - "frontend.mydomain.com"
  gateways:
  - foocorp-gateway
  http:
  - match:
    - uri:
        exact: /
    route:
    - destination:
        host: frontend.org-namespace.svc.cluster.local
        port:
          number: 80

For requests to backend.mydomain.com, the VirtualService for the backend service is:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: backend
spec:
  hosts:
  - "backend.mydomain.com"
  gateways:
  - foocorp-gateway
  http:
  - match:
    - uri:
        exact: /
    route:
    - destination:
        host: backend.org-namespace.svc.cluster.local
        port:
          number: 80

  2. The VirtualService directs the traffic to the corresponding Kubernetes service within the cluster (e.g., frontend.org-namespace.svc.cluster.local or backend.org-namespace.svc.cluster.local), forwarding the request to port 80.

4. Kubernetes Service (ClusterIP) and kube-proxy

  1. After the traffic is routed to the appropriate Kubernetes Service, the kube-proxy component comes into play.
  2. The ClusterIP service acts as an internal load balancer and directs traffic to the appropriate pods running your application (frontend or backend) by forwarding requests to the pod IPs.
  3. The kube-proxy manages the routing rules and forwards the traffic to one of the available pod instances.
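
A sketch of what the frontend ClusterIP Service could look like; the selector label and targetPort are assumptions, not taken from the configuration above:

apiVersion: v1
kind: Service
metadata:
  name: frontend
  namespace: org-namespace
spec:
  type: ClusterIP
  selector:
    app: frontend    # assumed pod label
  ports:
  - port: 80         # port the VirtualService routes to
    targetPort: 8080 # assumed container port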

5. Envoy Sidecar Proxy

  1. The request reaches the destination pod, but before entering the application container, it passes through the Envoy sidecar proxy. Istio injects this sidecar into every pod, and it is responsible for managing inbound and outbound traffic for the pod. The sidecar:
  2. Enforces security policies (like mTLS).
  3. Provides traffic observability.
  4. Routes traffic internally between services.
  5. Envoy then forwards the request to the actual application container running within the pod.
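
Sidecar injection is typically enabled by labeling the namespace; you can then confirm a pod actually carries the istio-proxy container (namespace name taken from the service FQDNs above):

kubectl label namespace org-namespace istio-injection=enabled
kubectl -n org-namespace get pods -o jsonpath='{.items[*].spec.containers[*].name}'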

6. Application Container

  1. Finally, the request is processed by the application container (either frontend or backend, depending on the route). The application responds to the request and sends the response back to the client following the reverse path:
  2. Application → Sidecar (Envoy) → kube-proxy → ClusterIP service → Istio VirtualService → Istio Gateway → Google Cloud Load Balancer → Client.

Benefits of Using Istio in This Setup

  • mTLS: Secure communication between clients and services via mutual TLS.
  • Routing Control: Fine-grained routing rules managed by Istio’s Gateway and VirtualService resources.
  • Service Discovery: Kubernetes services (frontend.org-namespace.svc.cluster.local, backend.org-namespace.svc.cluster.local) allow automatic service discovery and load balancing.
  • Sidecar Proxy: The Envoy sidecar provides enhanced observability, security, and control over traffic at the pod level.

This configuration ensures that traffic is securely and efficiently routed to the appropriate backend services in your GKE cluster.
