π Vertical Pod Autoscaler (VPA) β Full Documentation¶
1. Introduction¶
A Vertical Pod Autoscaler (VPA) automatically adjusts the CPU and memory requests (not limits) of your containers in a Deployment, StatefulSet, or DaemonSet.
-
Why? Many applications are over-provisioned (wasting resources) or under-provisioned (causing performance issues). VPA ensures pods get βjust enoughβ CPU and memory for optimal utilization.
-
Key Point: VPA evicts and recreates pods with new resource requests β unlike HPA, which changes replica count.
2. How VPA Works¶
- Recommender β Observes historical and real-time CPU/memory usage, calculates optimal requests.
- Updater β Decides whether to evict pods if current requests are far from recommendations.
- Admission Controller β Applies recommendations when new pods are created.
3. VPA Modes¶
VPA can operate in three modes via updatePolicy.updateMode:
Offβ Only generates recommendations (default). No updates applied.Initialβ Applies recommendations only at pod creation time, not later.Autoβ Actively evicts and recreates pods to apply updated requests (full automation).
4. VPA Components¶
- Recommender (required)
- Updater (optional but needed for
Automode) - Admission Controller (optional but recommended)
In clusters created with kubeadm, VPA is not installed by default. You usually deploy it from the official YAML manifests.
5. VPA vs HPA¶
| Feature | HPA (Horizontal) | VPA (Vertical) |
|---|---|---|
| Scales | Number of pod replicas | Pod resource requests (CPU, memory) |
| Resource Adjusted | CPU/Memory utilization triggers | Historical + observed CPU/memory usage |
| Eviction | No | Yes, pods may be restarted |
| Common Use Case | Web servers, stateless apps | Databases, batch jobs, memory-heavy apps |
π They can be combined, but avoid conflicts (HPA scales replicas, VPA scales resources).
6. Full YAML Manifest (With Detailed Comments)¶
# Vertical Pod Autoscaler definition
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
# Name of the VPA object (must be unique within namespace)
name: analytics-vpa
# Namespace where the target deployment resides
namespace: cka24456
spec:
# Target reference: this tells VPA which workload to monitor & adjust
targetRef:
apiVersion: "apps/v1" # API version of the workload
kind: "Deployment" # Supported kinds: Deployment, StatefulSet, DaemonSet
name: "analytics-deployment" # The exact name of the workload
# Policy controlling how VPA applies recommendations
updatePolicy:
updateMode: "Auto" # Options:
# - "Off" (default): only recommend, no updates
# - "Initial": apply only on pod creation
# - "Auto": actively evict pods & apply changes
# ResourcePolicy is OPTIONAL β only used if you want bounds or exclusions
resourcePolicy:
containerPolicies:
# Apply this policy to all containers in the workload
- containerName: "*"
# Set lower and upper bounds for CPU and memory requests
minAllowed:
cpu: "200m" # Minimum CPU request: 200 millicores (0.2 vCPU)
memory: "256Mi" # Minimum memory request: 256Mi
maxAllowed:
cpu: "2" # Maximum CPU request: 2 cores
memory: "4Gi" # Maximum memory request: 4 GiB
# Control what resources VPA is allowed to adjust
controlledResources: ["cpu", "memory"]
# By default, both are adjusted. You can restrict, e.g., ["cpu"] only.
# Mode can override the global updateMode for this specific container
mode: "Auto" # Options: "Off", "Initial", "Auto"
7. Useful Commands to Verify VPA¶
Check VPA object¶
kubectl get vpa -n cka24456
Describe VPA and see recommendations¶
kubectl describe vpa analytics-vpa -n cka24456
Look for:
Recommendation:
Container Recommendations:
Container Name: app
Target CPU: 500m
Target Memory: 1Gi
Watch pod restarts (because VPA evicts them)¶
kubectl get pods -n cka24456 -w
8. Best Practices & Gotchas¶
- VPA does not manage limits, only requests. Limits must be set manually if needed.
- Expect pod restarts when VPA evicts pods β not ideal for apps requiring 100% uptime.
- Use
minAllowedandmaxAllowedto prevent extreme recommendations. - Works well for stateful or resource-intensive apps (databases, ML workloads).
-
Combine HPA + VPA carefully:
-
HPA scales replicas.
- VPA scales per-pod resources.
- Avoid conflicts (e.g., if HPA scales on CPU% and VPA is adjusting CPU requests simultaneously).
- In production: start with
updateMode: Offto gather safe recommendations, then move toAuto.
Do you want me to format this in Markdown style (with headings, tables, code blocks) so you can directly drop it into your repoβs README.md?