AWS Auto Scaling

1. Why Auto Scaling Exists

Problem          | Without ASG                    | With ASG
Traffic spike    | Servers overload → downtime    | New instances launched automatically
Low traffic      | Servers idle → wasted cost     | Instances terminated, cost drops
Instance failure | Manual detection + replacement | Auto-detected, auto-replaced
Deployment       | Manual rollout, risky          | Instance Refresh = controlled rolling update

ASG simultaneously provides availability, elasticity, and cost efficiency — three goals that are normally in tension.


2. Scaling Dimensions: Horizontal vs Vertical

Type                      | What changes        | AWS approach                           | Example
Horizontal (Scale Out/In) | Number of instances | ✅ ASG handles automatically           | 2 → 10 → 2 instances
Vertical (Scale Up/Down)  | Size of instance    | ⚠️ Manual — requires stop/resize/start | t3.medium → t3.xlarge

Cloud architecture strongly prefers horizontal scaling — it's elastic, doesn't require downtime, and distributes failure risk. Vertical scaling has an upper ceiling (largest instance type) and requires downtime.
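The manual vertical path looks roughly like this with the CLI — a sketch assuming a hypothetical EBS-backed instance ID; the instance must be stopped before its type can change:

```shell
# Stop the instance and wait until it is fully stopped
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 wait instance-stopped --instance-ids i-0123456789abcdef0

# Change the instance type (vertical scale-up)
aws ec2 modify-instance-attribute \
  --instance-id i-0123456789abcdef0 \
  --instance-type Value=t3.xlarge

# Start it again — the workload was down for the whole window
aws ec2 start-instances --instance-ids i-0123456789abcdef0
```

The downtime between stop and start is exactly why horizontal scaling is preferred for serving traffic.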


3. ASG vs Kubernetes Scaling — Layer Separation

User Traffic
Kubernetes HPA (Horizontal Pod Autoscaler)
  → Scales PODS when CPU/memory high
  → But: pods need NODES (EC2) to run on
EC2 Node capacity fills up
Kubernetes Cluster Autoscaler (CA)
  → Signals AWS ASG: "I need more nodes"
AWS Auto Scaling Group
  → Launches EC2 instances (infrastructure layer)
Karpenter (alternative to CA)
  → Direct AWS API calls, faster node provisioning

Tool               | Layer          | Scales
HPA (Kubernetes)   | Application    | Pods and containers
VPA (Kubernetes)   | Application    | Pod resource requests
Cluster Autoscaler | Infrastructure | EC2 nodes via ASG
Karpenter          | Infrastructure | EC2 nodes directly (no ASG dependency)
AWS ASG            | Infrastructure | EC2 instances

4. Capacity — Three Numbers ⭐

Minimum ≤ Desired ≤ Maximum

Minimum:  Floor — ASG never goes below this (availability guarantee)
Desired:  Current target — ASG actively maintains this count
Maximum:  Ceiling — ASG never exceeds this (cost protection)

Setting | Recommended for Production
Minimum | ≥ 1 (prevents full outage); ≥ 2 for true HA
Desired | Start with baseline capacity based on normal load
Maximum | Cap to control cost; set high enough for peak + margin

Setting Minimum = 0 means ASG can scale to zero — valid for dev/test to save cost, but dangerous for production (zero instances = zero availability).
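All three numbers can be adjusted on a live group in one call — a sketch assuming a hypothetical group named my-asg:

```shell
# Set floor, ceiling, and current target in a single update.
# ASG immediately launches or terminates instances to reach desired-capacity,
# and will never drift below min-size or above max-size afterwards.
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-asg \
  --min-size 2 \
  --max-size 10 \
  --desired-capacity 4
```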


5. ASG Instance Distribution — Multi-AZ ⭐

ASG distributes instances across the subnets you configure (one per AZ).

Mode                           | Behavior                                                     | Use When
Balanced Best Effort (default) | Tries equal distribution across AZs; falls back if an AZ can't launch | High availability (recommended)
Balanced Only                  | Strict balance — launches wait rather than create imbalance  | Compliance / strict HA

ASG with 3 AZs, desired = 6:
  AZ-1a: 2 instances
  AZ-1b: 2 instances
  AZ-1c: 2 instances

If AZ-1a instances terminate:
  ASG rebalances: launches in AZ-1a to restore balance
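Multi-AZ distribution is driven by the subnets you attach at creation time — a sketch with hypothetical template and subnet IDs (one subnet per AZ):

```shell
# Create an ASG spread across three AZs; ASG balances desired=6 as 2/2/2.
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name my-asg \
  --launch-template "LaunchTemplateName=my-template,Version=\$Latest" \
  --min-size 3 --max-size 12 --desired-capacity 6 \
  --vpc-zone-identifier "subnet-aaa,subnet-bbb,subnet-ccc"
```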

6. Scaling Policies — Complete Breakdown ⭐

1. Manual Scaling

Change desired capacity directly — ASG launches/terminates to reach it. Health replacement and AZ balancing still managed by ASG.

aws autoscaling set-desired-capacity \
  --auto-scaling-group-name my-asg \
  --desired-capacity 5

2. Scheduled Scaling

Scale at predictable times — runs before the load arrives.

Recurrence: 0 8 * * 1-5   → business hours start (Mon–Fri) → desired=10
Recurrence: 0 20 * * 1-5  → business hours end (Mon–Fri)   → desired=2
Recurrence: 0 0 * * 0     → Sunday midnight                → desired=20

(ASG scheduled actions take a standard 5-field cron expression, evaluated in UTC by default.)

Use case: Known load patterns — office hours, end-of-month reports, weekly jobs.
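A sketch of the business-hours pair of actions, assuming a hypothetical group my-asg and UTC times:

```shell
# Scale up before the workday starts (08:00 UTC, Mon-Fri)
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name my-asg \
  --scheduled-action-name business-hours-start \
  --recurrence "0 8 * * 1-5" \
  --desired-capacity 10

# Scale back down after hours (20:00 UTC, Mon-Fri)
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name my-asg \
  --scheduled-action-name business-hours-end \
  --recurrence "0 20 * * 1-5" \
  --desired-capacity 2
```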


3. Dynamic Scaling

Scale in response to real-time CloudWatch metrics.

a) Simple Scaling (Legacy)

One threshold, one action, mandatory cooldown wait:

Alarm: CPU > 70% → Add 1 instance
       (then wait cooldown before re-evaluating)

Limitation                  | Detail
Cooldown blocks all scaling | Must wait even if load keeps climbing
Single threshold            | Cannot respond proportionally to severity
Deprecated in spirit        | Use Step Scaling or Target Tracking instead

b) Step Scaling ⭐

Multiple thresholds → proportional response — no mandatory cooldown wait:

CPU 60–70%:  Add 1 instance (mild spike)
CPU 70–85%:  Add 2 instances (moderate spike)
CPU 85–100%: Add 4 instances (severe spike)

CPU 40–30%:  Remove 1 instance
CPU 30–0%:   Remove 2 instances

Step scaling does not wait for cooldown before triggering again. New instances have individual warmup periods instead.
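The step tiers above map onto step adjustments whose bounds are offsets from the alarm threshold — a sketch assuming a hypothetical group my-asg and a CloudWatch alarm at CPU 60% wired to this policy's ARN:

```shell
# Bounds are relative to the alarm threshold (60%):
#   60-70% → +1, 70-85% → +2, 85%+ → +4
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name my-asg \
  --policy-name cpu-step-scale-out \
  --policy-type StepScaling \
  --adjustment-type ChangeInCapacity \
  --metric-aggregation-type Average \
  --step-adjustments \
      MetricIntervalLowerBound=0,MetricIntervalUpperBound=10,ScalingAdjustment=1 \
      MetricIntervalLowerBound=10,MetricIntervalUpperBound=25,ScalingAdjustment=2 \
      MetricIntervalLowerBound=25,ScalingAdjustment=4
```

The command returns a policy ARN, which you then attach as the alarm action of a `cloudwatch put-metric-alarm` on CPU > 60%.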


c) Target Tracking Scaling ⭐

You define the target — AWS controls the adjustment automatically:

Target: CPU = 50%
  Current CPU: 80% → ASG adds instances until CPU drops to ~50%
  Current CPU: 20% → ASG removes instances until CPU rises to ~50%
  ASG scales out aggressively and scales in gradually, to avoid oscillation

Built-in metrics for Target Tracking:

Metric                   | Example Target
ASGAverageCPUUtilization | 50%
ASGAverageNetworkIn      | 1 GB/min
ASGAverageNetworkOut     | 1 GB/min
ALBRequestCountPerTarget | 1000 req/target

Custom metric: Any CloudWatch metric (SQS queue depth, custom app metrics).

Target Tracking is the most widely used policy — it's self-tuning and handles both scale-out AND scale-in with one rule.
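A one-rule Target Tracking policy is a single call — sketch assuming a hypothetical group my-asg:

```shell
# Keep average CPU across the group at ~50%.
# ASG creates and manages the underlying CloudWatch alarms itself.
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name my-asg \
  --policy-name cpu-target-50 \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration '{
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "TargetValue": 50.0
  }'
```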


4. Predictive Scaling

Uses ML on historical data to pre-scale before load arrives:

Historical pattern: CPU spikes every Monday 9 AM
  → Sunday 11 PM: ASG pre-launches instances
  → Monday 9 AM: instances already running ← no launch latency ✅

Property | Detail
Requires | At least 24 hours of history (14 days for full accuracy)
Modes    | Forecast only (visibility) or Forecast + Scale (active)
Combined | Use with Target Tracking for best results
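Starting in forecast-only mode lets you inspect the ML forecast before letting it act — a sketch assuming a hypothetical group my-asg:

```shell
# ForecastOnly: generates forecasts you can review, but does not scale.
# Switch "Mode" to "ForecastAndScale" once the forecast looks right.
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name my-asg \
  --policy-name cpu-predictive \
  --policy-type PredictiveScaling \
  --predictive-scaling-configuration '{
    "MetricSpecifications": [{
      "TargetValue": 50,
      "PredefinedMetricPairSpecification": {
        "PredefinedMetricType": "ASGCPUUtilization"
      }
    }],
    "Mode": "ForecastOnly"
  }'
```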

Scaling Policy Comparison ⭐

Policy          | Cooldown             | Proportional       | Best For
Simple          | ✅ Required (blocks) | ❌ One step        | Legacy only
Step            | Warmup per instance  | ✅ Yes (steps)     | Variable spikes
Target Tracking | Warmup per instance  | ✅ Auto-calculated | Production default
Scheduled       | N/A                  | N/A                | Predictable patterns
Predictive      | N/A                  | ML-based           | Regular recurring patterns

7. Cooldown vs Warmup — Critical Distinction ⭐

These two timers are completely different — confusing them is the #1 ASG mistake.

Cooldown Period (Simple Scaling only)

Purpose: Prevents launching/terminating another batch immediately after one action.

Simple scaling triggers → +2 instances launched
Cooldown starts: 300s
  (during this time: no new scaling actions, even if CPU stays high)
Cooldown ends → ASG re-evaluates

Only applies to Simple Scaling. Target Tracking and Step Scaling do NOT use cooldown — they use warmup periods instead.

Instance Warmup Period (Step + Target Tracking)

Purpose: Prevents newly launched instances from being counted in metrics until they're actually ready, avoiding premature scale-in.

New instance launched (CPU reads 0% — it's still booting)
Warmup: 300s
  (during warmup: instance NOT counted in aggregate ASG CPU average)
  (scale-in is blocked while any instance is warming up)
Warmup ends → instance contributes to metrics → ASG can scale in if needed
Without warmup:           With warmup:
Launch 3 instances        Launch 3 instances
They show 0% CPU          They're excluded from average
Aggregate drops to 10%    Aggregate stays at true value
ASG scales IN again! ❌   No false scale-in ✅

Timer        | Applies To             | Blocks                | Purpose
Cooldown     | Simple scaling         | All scaling actions   | Prevent rapid successive actions
Warmup       | Step + Target Tracking | Scale-in only         | Prevent counting booting instances in metrics
Grace Period | Health checks          | Health check failures | Don't kill a booting instance
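The warmup duration can be set once at the group level instead of per policy — a sketch assuming a hypothetical group my-asg:

```shell
# Default instance warmup applies to all Step and Target Tracking policies:
# new instances are excluded from aggregate metrics for 300s after launch.
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-asg \
  --default-instance-warmup 300
```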

8. Health Checks ⭐

ASG can combine health signals from several sources — a failure reported by any active source marks the instance unhealthy:

Check Type          | What It Tests                             | Who Configures
EC2 status checks   | Instance OS/hardware reachable            | Always active
ELB health check    | Application returns healthy HTTP response | Enable in ASG settings
Custom health check | Your own health signal via API            | Optional

# Enable ELB health checks for ASG (recommended for all LB-attached ASGs)
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-asg \
  --health-check-type ELB \
  --health-check-grace-period 300

Unhealthy behavior:

Instance fails health check
  → ASG marks as Unhealthy
  → ASG terminates it (deregisters from LB first — connection draining)
  → ASG launches replacement
  → New instance registered to LB
  → Grace period starts (health checks ignored for grace period duration)


9. Lifecycle Hooks ⭐ (Critical Advanced Concept)

Lifecycle hooks let you pause an instance during launch or termination to run custom logic before it proceeds.

Normal launch:       Pending → InService
With launch hook:    Pending → Pending:Wait (PAUSED) → custom action → InService

Normal termination:  InService → Terminating → Terminated
With terminate hook: InService → Terminating:Wait (PAUSED) → custom action → Terminated

Lifecycle Hook Use Cases

Hook Type      | Use Case
Launch hook    | Install software, load config, run tests, register with service discovery
Terminate hook | Drain application state, backup data, delete secrets, close DB connections, deregister from service discovery

Example: Terminate hook for graceful shutdown
  Instance scaling in
    → Lifecycle hook fires
    → Sends SNS notification → Lambda runs
    → Lambda: drain DB connections, backup /var/app/cache to S3
    → Lambda calls complete-lifecycle-action → instance terminates

Hook timeout: Default 1 hour — instance stays paused up to 1 hour. You call complete-lifecycle-action to release it early.
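Creating the hook and releasing the pause are two separate calls — a sketch assuming a hypothetical group my-asg and instance ID:

```shell
# Pause instances for up to 600s during termination for cleanup work.
# default-result CONTINUE = proceed with termination if the timeout expires.
aws autoscaling put-lifecycle-hook \
  --lifecycle-hook-name graceful-shutdown \
  --auto-scaling-group-name my-asg \
  --lifecycle-transition autoscaling:EC2_INSTANCE_TERMINATING \
  --heartbeat-timeout 600 \
  --default-result CONTINUE

# Later, from your Lambda/worker once cleanup is done — release the instance:
aws autoscaling complete-lifecycle-action \
  --lifecycle-hook-name graceful-shutdown \
  --auto-scaling-group-name my-asg \
  --instance-id i-0123456789abcdef0 \
  --lifecycle-action-result CONTINUE
```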

January 2026 update: New instance lifecycle policy — if a termination hook times out or fails, you can now configure the instance to be retained for manual intervention instead of force-terminated.


10. Termination Policies ⭐

When ASG needs to scale in (remove instances), it picks which instance to terminate:

Policy                    | Logic
Default                   | Selects AZ with most instances → then: oldest LT version → closest to billing hour
OldestInstance            | Terminates the instance that has been running the longest
NewestInstance            | Terminates the most recently launched (useful for rollback)
OldestLaunchTemplate      | Terminates instances running old LT versions (force upgrade)
OldestLaunchConfiguration | Same but for LC (legacy)
ClosestToNextInstanceHour | Terminates instance closest to next billing hour (cost-efficient)
AllocationStrategy        | Maintains optimal On-Demand + Spot balance
Custom (Lambda)           | Fully custom logic via Lambda function

Best practice for rolling deployments: Use OldestLaunchTemplate so instances on old AMI/config are prioritized for termination as new ones launch.
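Termination policies are set as an ordered list on the group — a sketch assuming a hypothetical group my-asg:

```shell
# Policies are evaluated in order: prefer instances on old launch template
# versions first, then fall back to the default tie-breaking chain.
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-asg \
  --termination-policies "OldestLaunchTemplate" "Default"
```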


11. Scale-In Protection ⭐

Protect specific instances from being terminated during scale-in events:

Use case: An instance is processing a critical long-running job.
          Scale-in should skip it.

aws autoscaling set-instance-protection \
  --instance-ids i-xxxxxxxx \
  --auto-scaling-group-name my-asg \
  --protected-from-scale-in

Instance protection does NOT protect from health-check-based termination. If the instance fails a health check, ASG terminates it regardless.
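Protection can also be enabled group-wide so every new instance starts protected — a sketch assuming a hypothetical group my-asg:

```shell
# All instances launched from now on are protected from scale-in by default;
# you then selectively unprotect with set-instance-protection when safe.
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-asg \
  --new-instances-protected-from-scale-in
```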


12. Warm Pools ⭐ (Reduce Scale-Out Latency)

If your instances take a long time to boot (JVM warm-up, large model loading, complex init), a Warm Pool pre-initializes instances so they're ready instantly when needed.

Without Warm Pool:
  Traffic spike → ASG launches → 5 min boot → instance ready
  → Users experience 5 min degraded service ❌

With Warm Pool:
  Traffic spike → ASG moves pre-initialized instance to InService
  → 0-10 sec transition time ✅

Warm pool continuously keeps N pre-initialized instances:
  Stopped (cheapest — no compute, still EBS cost)
  Running (full cost, fastest to serve)
  Hibernated (memory preserved, faster resume than cold start)

State      | Cost                       | Resume Time
Stopped    | EBS only (~$0.10/GB/month) | ~60-90 sec (OS boot, init already done)
Running    | Full EC2 cost              | ~10 sec
Hibernated | EBS + small overhead       | ~30-60 sec (RAM restore)
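A warm pool is attached to an existing group in one call — a sketch assuming a hypothetical group my-asg:

```shell
# Keep at least 2 pre-initialized instances in Stopped state.
# They run the full launch lifecycle (user data, hooks) once, then stop;
# a scale-out event starts them instead of cold-launching new ones.
aws autoscaling put-warm-pool \
  --auto-scaling-group-name my-asg \
  --pool-state Stopped \
  --min-size 2
```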

13. Complete Instance Lifecycle ⭐

LAUNCH FLOW:
  Pending
    → (lifecycle hook fires if configured)
    → Pending:Wait  [custom action, up to 1 hr]
    → Pending:Proceed
  InService  ← instance receives traffic
    → (scale-in or failure)
  Terminating
    → (lifecycle hook fires if configured)
    → Terminating:Wait  [custom action]
    → Terminating:Proceed
  Terminated

WARM POOL STATES (if configured):
  Warmed:Pending → Warmed:Running/Stopped/Hibernated
    → Traffic spike → moves to Pending → InService

14. Complete Scaling Flow — End to End

1. CloudWatch alarm: ASGAverageCPUUtilization > 70%
2. Target Tracking policy triggers
3. ASG calculates: need 3 more instances to reach 50% target
4. ASG launches 3 EC2s via Launch Template (LT $Default v3)
5. Launch lifecycle hook (if configured): wait for custom action
6. Instance state: Pending → InService
7. ASG registers instance with ALB Target Group
8. Target Group health check runs
9. Instance health: healthy → starts receiving traffic
10. Warmup period: instance excluded from aggregate metrics
11. Warmup expires → instance contributes to CPU average
12. CPU drops to ~50% → scaling stops
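Each step of this flow is visible in the group's activity history — a sketch assuming a hypothetical group my-asg:

```shell
# Shows recent launches/terminations with cause
# (e.g. "an instance was started in response to a difference between
#  desired and actual capacity") — useful for auditing the flow above.
aws autoscaling describe-scaling-activities \
  --auto-scaling-group-name my-asg \
  --max-items 5
```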

15. Common Mistakes

❌ Wrong                                         | ✅ Correct
Cooldown applies to all scaling policies         | Cooldown applies to Simple Scaling only
Warmup and cooldown are the same thing           | Cooldown blocks all scaling; warmup excludes booting instances from metrics
Grace period = warmup period                     | Grace period = health check delay; warmup = metric exclusion for scaling math
Target Tracking waits for cooldown               | TT uses per-instance warmup, not a global cooldown
Simple scaling is fine for production            | Use Step Scaling or Target Tracking — Simple Scaling's cooldown causes dangerous delays
Scale-in protection prevents health termination  | Protection only blocks scale-in policy termination — health failures still terminate
Warm Pool instances are free                     | Stopped warm pool instances still pay for EBS storage
Lifecycle hooks default to 30 minutes            | Default timeout is 1 hour
Termination = immediate                          | ASG deregisters from LB first → connection draining → then terminates
ASG handles Kubernetes node scaling              | Cluster Autoscaler or Karpenter signals ASG; ASG doesn't watch Kubernetes

16. Interview Questions Checklist

  • What three problems does Auto Scaling solve?
  • Horizontal vs vertical scaling — which does AWS prefer and why?
  • What are the three capacity numbers? Relationship constraint?
  • Why is Minimum = 0 bad for production?
  • List all 5 scaling policy types with a use case for each
  • Simple vs Step vs Target Tracking — key differences?
  • What metric does Target Tracking use for ALB? (ALBRequestCountPerTarget)
  • Cooldown vs warmup period — what does each block and when does each apply?
  • What happens if a newly launched instance is counted in metrics during warmup?
  • What are lifecycle hooks? Give a real use case for terminate hook
  • What is the default lifecycle hook timeout? (1 hour)
  • What is the January 2026 lifecycle policy update? (retention on timeout)
  • What is a termination policy? Name 4 options
  • What is scale-in protection? What doesn't it protect against?
  • What is a Warm Pool? Three instance states and their costs?
  • When does ASG deregister from LB before termination? (always — connection draining)
  • ASG vs Kubernetes scaling — two separate layers, explain each
  • Walk through the complete lifecycle of an instance from launch to termination
