AWS Auto Scaling

1. Why Auto Scaling Exists

Problem          | Without ASG                    | With ASG
Traffic spike    | Servers overload → downtime    | New instances launched automatically
Low traffic      | Servers idle → wasted cost     | Instances terminated, cost drops
Instance failure | Manual detection + replacement | Auto-detected, auto-replaced
Deployment       | Manual rollout, risky          | Instance Refresh = controlled rolling update

ASG simultaneously provides availability, elasticity, and cost efficiency — three goals that are normally in tension.


2. Scaling Dimensions: Horizontal vs Vertical

Type                      | What changes        | AWS approach                           | Example
Horizontal (Scale Out/In) | Number of instances | ✅ ASG handles automatically           | 2 → 10 → 2 instances
Vertical (Scale Up/Down)  | Size of instance    | ⚠️ Manual — requires stop/resize/start | t3.medium → t3.xlarge

Cloud architecture strongly prefers horizontal scaling — it's elastic, doesn't require downtime, and distributes failure risk. Vertical scaling has an upper ceiling (largest instance type) and requires downtime.
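The manual vertical path looks roughly like this with the CLI — a sketch assuming a hypothetical EBS-backed instance ID; the instance must be stopped before its type can change:

```shell
# Stop the instance and wait until it is fully stopped
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 wait instance-stopped --instance-ids i-0123456789abcdef0

# Change the instance type (vertical scale-up)
aws ec2 modify-instance-attribute \
  --instance-id i-0123456789abcdef0 \
  --instance-type Value=t3.xlarge

# Start it again — the workload was down for the whole window
aws ec2 start-instances --instance-ids i-0123456789abcdef0
```

The downtime between stop and start is exactly why horizontal scaling is preferred for serving traffic.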


3. ASG vs Kubernetes Scaling — Layer Separation

User Traffic
Kubernetes HPA (Horizontal Pod Autoscaler)
  → Scales PODS when CPU/memory high
  → But: pods need NODES (EC2) to run on
EC2 Node capacity fills up
Kubernetes Cluster Autoscaler (CA)
  → Signals AWS ASG: "I need more nodes"
AWS Auto Scaling Group
  → Launches EC2 instances (infrastructure layer)
Karpenter (alternative to CA)
  → Direct AWS API calls, faster node provisioning

Tool               | Layer          | Scales
HPA (Kubernetes)   | Application    | Pods and containers
VPA (Kubernetes)   | Application    | Pod resource requests
Cluster Autoscaler | Infrastructure | EC2 nodes via ASG
Karpenter          | Infrastructure | EC2 nodes directly (no ASG dependency)
AWS ASG            | Infrastructure | EC2 instances

4. Capacity — Three Numbers ⭐

Minimum ≤ Desired ≤ Maximum

Minimum:  Floor — ASG never goes below this (availability guarantee)
Desired:  Current target — ASG actively maintains this count
Maximum:  Ceiling — ASG never exceeds this (cost protection)

Setting | Recommended for Production
Minimum | ≥ 1 (prevents full outage); ≥ 2 for true HA
Desired | Start with baseline capacity based on normal load
Maximum | Cap to control cost; set high enough for peak + margin

Setting Minimum = 0 means ASG can scale to zero — valid for dev/test to save cost, but dangerous for production (zero instances = zero availability).
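All three numbers can be adjusted on a live group in one call — a sketch assuming a hypothetical group named my-asg:

```shell
# Set floor, ceiling, and current target in a single update.
# ASG immediately launches or terminates instances to reach desired-capacity,
# and will never drift below min-size or above max-size afterwards.
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-asg \
  --min-size 2 \
  --max-size 10 \
  --desired-capacity 4
```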


5. ASG Instance Distribution — Multi-AZ ⭐

ASG distributes instances across the subnets you configure (one per AZ).

Mode                           | Behavior                                                     | Use When
Balanced Best Effort (default) | Tries equal distribution across AZs; falls back if an AZ can't launch | High availability (recommended)
Balanced Only                  | Strict balance — launches wait rather than create imbalance  | Compliance / strict HA

ASG with 3 AZs, desired = 6:
  AZ-1a: 2 instances
  AZ-1b: 2 instances
  AZ-1c: 2 instances

If AZ-1a instances terminate:
  ASG rebalances: launches in AZ-1a to restore balance
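Multi-AZ distribution is driven by the subnets you attach at creation time — a sketch with hypothetical template and subnet IDs (one subnet per AZ):

```shell
# Create an ASG spread across three AZs; ASG balances desired=6 as 2/2/2.
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name my-asg \
  --launch-template "LaunchTemplateName=my-template,Version=\$Latest" \
  --min-size 3 --max-size 12 --desired-capacity 6 \
  --vpc-zone-identifier "subnet-aaa,subnet-bbb,subnet-ccc"
```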

6. Scaling Policies — Complete Breakdown ⭐

1. Manual Scaling

Change desired capacity directly — ASG launches/terminates to reach it. Health replacement and AZ balancing still managed by ASG.

aws autoscaling set-desired-capacity \
  --auto-scaling-group-name my-asg \
  --desired-capacity 5

2. Scheduled Scaling

Scale at predictable times — runs before the load arrives.

Recurrence: 0 8 * * 1-5   → business hours start (Mon–Fri) → desired=10
Recurrence: 0 20 * * 1-5  → business hours end (Mon–Fri)   → desired=2
Recurrence: 0 0 * * 0     → Sunday midnight                → desired=20

(ASG scheduled actions take a standard 5-field cron expression, evaluated in UTC by default.)

Use case: Known load patterns — office hours, end-of-month reports, weekly jobs.
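A sketch of the business-hours pair of actions, assuming a hypothetical group my-asg and UTC times:

```shell
# Scale up before the workday starts (08:00 UTC, Mon-Fri)
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name my-asg \
  --scheduled-action-name business-hours-start \
  --recurrence "0 8 * * 1-5" \
  --desired-capacity 10

# Scale back down after hours (20:00 UTC, Mon-Fri)
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name my-asg \
  --scheduled-action-name business-hours-end \
  --recurrence "0 20 * * 1-5" \
  --desired-capacity 2
```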


3. Dynamic Scaling

Scale in response to real-time CloudWatch metrics.

a) Simple Scaling (Legacy)

One threshold, one action, mandatory cooldown wait:

Alarm: CPU > 70% → Add 1 instance
       (then wait cooldown before re-evaluating)

Limitation                  | Detail
Cooldown blocks all scaling | Must wait even if load keeps climbing
Single threshold            | Cannot respond proportionally to severity
Deprecated in spirit        | Use Step Scaling or Target Tracking instead

b) Step Scaling ⭐

Multiple thresholds → proportional response — no mandatory cooldown wait:

CPU 60–70%:  Add 1 instance (mild spike)
CPU 70–85%:  Add 2 instances (moderate spike)
CPU 85–100%: Add 4 instances (severe spike)

CPU 40–30%:  Remove 1 instance
CPU 30–0%:   Remove 2 instances

Step scaling does not wait for cooldown before triggering again. New instances have individual warmup periods instead.
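The step tiers above map onto step adjustments whose bounds are offsets from the alarm threshold — a sketch assuming a hypothetical group my-asg and a CloudWatch alarm at CPU 60% wired to this policy's ARN:

```shell
# Bounds are relative to the alarm threshold (60%):
#   60-70% → +1, 70-85% → +2, 85%+ → +4
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name my-asg \
  --policy-name cpu-step-scale-out \
  --policy-type StepScaling \
  --adjustment-type ChangeInCapacity \
  --metric-aggregation-type Average \
  --step-adjustments \
      MetricIntervalLowerBound=0,MetricIntervalUpperBound=10,ScalingAdjustment=1 \
      MetricIntervalLowerBound=10,MetricIntervalUpperBound=25,ScalingAdjustment=2 \
      MetricIntervalLowerBound=25,ScalingAdjustment=4
```

The command returns a policy ARN, which you then attach as the alarm action of a `cloudwatch put-metric-alarm` on CPU > 60%.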


c) Target Tracking Scaling ⭐

You define the target — AWS controls the adjustment automatically:

Target: CPU = 50%
  Current CPU: 80% → ASG adds instances until CPU drops to ~50%
  Current CPU: 20% → ASG removes instances until CPU rises to ~50%
  ASG scales out aggressively and scales in gradually, to avoid oscillation

Built-in metrics for Target Tracking:

Metric                   | Example Target
ASGAverageCPUUtilization | 50%
ASGAverageNetworkIn      | 1 GB/min
ASGAverageNetworkOut     | 1 GB/min
ALBRequestCountPerTarget | 1000 req/target

Custom metric: Any CloudWatch metric (SQS queue depth, custom app metrics).

Target Tracking is the most widely used policy — it's self-tuning and handles both scale-out AND scale-in with one rule.
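A one-rule Target Tracking policy is a single call — sketch assuming a hypothetical group my-asg:

```shell
# Keep average CPU across the group at ~50%.
# ASG creates and manages the underlying CloudWatch alarms itself.
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name my-asg \
  --policy-name cpu-target-50 \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration '{
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "TargetValue": 50.0
  }'
```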


4. Predictive Scaling

Uses ML on historical data to pre-scale before load arrives:

Historical pattern: CPU spikes every Monday 9 AM
  → Sunday 11 PM: ASG pre-launches instances
  → Monday 9 AM: instances already running ← no launch latency ✅

Property | Detail
Requires | At least 24 hours of history (14 days for full accuracy)
Modes    | Forecast only (visibility) or Forecast + Scale (active)
Combined | Use with Target Tracking for best results
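Starting in forecast-only mode lets you inspect the ML forecast before letting it act — a sketch assuming a hypothetical group my-asg:

```shell
# ForecastOnly: generates forecasts you can review, but does not scale.
# Switch "Mode" to "ForecastAndScale" once the forecast looks right.
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name my-asg \
  --policy-name cpu-predictive \
  --policy-type PredictiveScaling \
  --predictive-scaling-configuration '{
    "MetricSpecifications": [{
      "TargetValue": 50,
      "PredefinedMetricPairSpecification": {
        "PredefinedMetricType": "ASGCPUUtilization"
      }
    }],
    "Mode": "ForecastOnly"
  }'
```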

Scaling Policy Comparison ⭐

Policy          | Cooldown             | Proportional       | Best For
Simple          | ✅ Required (blocks) | ❌ One step        | Legacy only
Step            | Warmup per instance  | ✅ Yes (steps)     | Variable spikes
Target Tracking | Warmup per instance  | ✅ Auto-calculated | Production default
Scheduled       | N/A                  | N/A                | Predictable patterns
Predictive      | N/A                  | ML-based           | Regular recurring patterns

7. Cooldown vs Warmup — Critical Distinction ⭐

These two timers are completely different — confusing them is the #1 ASG mistake.

Cooldown Period (Simple Scaling only)

Purpose: Prevents launching/terminating another batch immediately after one action.

Simple scaling triggers → +2 instances launched
Cooldown starts: 300s
  (during this time: no new scaling actions, even if CPU stays high)
Cooldown ends → ASG re-evaluates

Only applies to Simple Scaling. Target Tracking and Step Scaling do NOT use cooldown — they use warmup periods instead.

Instance Warmup Period (Step + Target Tracking)

Purpose: Prevents newly launched instances from being counted in metrics until they're actually ready, avoiding premature scale-in.

New instance launched (CPU reads 0% — it's still booting)
Warmup: 300s
  (during warmup: instance NOT counted in aggregate ASG CPU average)
  (scale-in is blocked while any instance is warming up)
Warmup ends → instance contributes to metrics → ASG can scale in if needed
Without warmup:           With warmup:
Launch 3 instances        Launch 3 instances
They show 0% CPU          They're excluded from average
Aggregate drops to 10%    Aggregate stays at true value
ASG scales IN again! ❌   No false scale-in ✅

Timer        | Applies To             | Blocks                | Purpose
Cooldown     | Simple scaling         | All scaling actions   | Prevent rapid successive actions
Warmup       | Step + Target Tracking | Scale-in only         | Prevent counting booting instances in metrics
Grace Period | Health checks          | Health check failures | Don't kill a booting instance
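The warmup duration can be set once at the group level instead of per policy — a sketch assuming a hypothetical group my-asg:

```shell
# Default instance warmup applies to all Step and Target Tracking policies:
# new instances are excluded from aggregate metrics for 300s after launch.
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-asg \
  --default-instance-warmup 300
```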

8. Health Checks ⭐

ASG can combine health signals from several sources — a failure reported by any active source marks the instance unhealthy:

Check Type          | What It Tests                             | Who Configures
EC2 status checks   | Instance OS/hardware reachable            | Always active
ELB health check    | Application returns healthy HTTP response | Enable in ASG settings
Custom health check | Your own health signal via API            | Optional

# Enable ELB health checks for ASG (recommended for all LB-attached ASGs)
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-asg \
  --health-check-type ELB \
  --health-check-grace-period 300

Unhealthy behavior:

Instance fails health check
  → ASG marks as Unhealthy
  → ASG terminates it (deregisters from LB first — connection draining)
  → ASG launches replacement
  → New instance registered to LB
  → Grace period starts (health checks ignored for grace period duration)


9. Lifecycle Hooks ⭐ (Critical Advanced Concept)

Lifecycle hooks let you pause an instance during launch or termination to run custom logic before it proceeds.

Normal launch:       Pending → InService
With launch hook:    Pending → Pending:Wait (PAUSED) → custom action → InService

Normal termination:  InService → Terminating → Terminated
With terminate hook: InService → Terminating:Wait (PAUSED) → custom action → Terminated

Lifecycle Hook Use Cases

Hook Type      | Use Case
Launch hook    | Install software, load config, run tests, register with service discovery
Terminate hook | Drain application state, backup data, delete secrets, close DB connections, deregister from service discovery

Example: Terminate hook for graceful shutdown
  Instance scaling in
    → Lifecycle hook fires
    → Sends SNS notification → Lambda runs
    → Lambda: drain DB connections, backup /var/app/cache to S3
    → Lambda calls complete-lifecycle-action → instance terminates

Hook timeout: Default 1 hour — instance stays paused up to 1 hour. You call complete-lifecycle-action to release it early.
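Creating the hook and releasing the pause are two separate calls — a sketch assuming a hypothetical group my-asg and instance ID:

```shell
# Pause instances for up to 600s during termination for cleanup work.
# default-result CONTINUE = proceed with termination if the timeout expires.
aws autoscaling put-lifecycle-hook \
  --lifecycle-hook-name graceful-shutdown \
  --auto-scaling-group-name my-asg \
  --lifecycle-transition autoscaling:EC2_INSTANCE_TERMINATING \
  --heartbeat-timeout 600 \
  --default-result CONTINUE

# Later, from your Lambda/worker once cleanup is done — release the instance:
aws autoscaling complete-lifecycle-action \
  --lifecycle-hook-name graceful-shutdown \
  --auto-scaling-group-name my-asg \
  --instance-id i-0123456789abcdef0 \
  --lifecycle-action-result CONTINUE
```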

January 2026 update: New instance lifecycle policy — if a termination hook times out or fails, you can now configure the instance to be retained for manual intervention instead of force-terminated.


10. Termination Policies ⭐

When ASG needs to scale in (remove instances), it picks which instance to terminate:

Policy                    | Logic
Default                   | Selects AZ with most instances → then: oldest LT version → closest to billing hour
OldestInstance            | Terminates the instance that has been running the longest
NewestInstance            | Terminates the most recently launched (useful for rollback)
OldestLaunchTemplate      | Terminates instances running old LT versions (force upgrade)
OldestLaunchConfiguration | Same but for LC (legacy)
ClosestToNextInstanceHour | Terminates instance closest to next billing hour (cost-efficient)
AllocationStrategy        | Maintains optimal On-Demand + Spot balance
Custom (Lambda)           | Fully custom logic via Lambda function

Best practice for rolling deployments: Use OldestLaunchTemplate so instances on old AMI/config are prioritized for termination as new ones launch.
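Termination policies are set as an ordered list on the group — a sketch assuming a hypothetical group my-asg:

```shell
# Policies are evaluated in order: prefer instances on old launch template
# versions first, then fall back to the default tie-breaking chain.
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-asg \
  --termination-policies "OldestLaunchTemplate" "Default"
```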


11. Scale-In Protection ⭐

Protect specific instances from being terminated during scale-in events:

Use case: An instance is processing a critical long-running job.
          Scale-in should skip it.

aws autoscaling set-instance-protection \
  --instance-ids i-xxxxxxxx \
  --auto-scaling-group-name my-asg \
  --protected-from-scale-in

Instance protection does NOT protect from health-check-based termination. If the instance fails a health check, ASG terminates it regardless.
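Protection can also be enabled group-wide so every new instance starts protected — a sketch assuming a hypothetical group my-asg:

```shell
# All instances launched from now on are protected from scale-in by default;
# you then selectively unprotect with set-instance-protection when safe.
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-asg \
  --new-instances-protected-from-scale-in
```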


12. Warm Pools ⭐ (Reduce Scale-Out Latency)

If your instances take a long time to boot (JVM warm-up, large model loading, complex init), a Warm Pool pre-initializes instances so they're ready instantly when needed.

Without Warm Pool:
  Traffic spike → ASG launches → 5 min boot → instance ready
  → Users experience 5 min degraded service ❌

With Warm Pool:
  Traffic spike → ASG moves pre-initialized instance to InService
  → 0-10 sec transition time ✅

Warm pool continuously keeps N pre-initialized instances:
  Stopped (cheapest — no compute, still EBS cost)
  Running (full cost, fastest to serve)
  Hibernated (memory preserved, faster resume than cold start)

State      | Cost                       | Resume Time
Stopped    | EBS only (~$0.10/GB/month) | ~60-90 sec (OS boot, init already done)
Running    | Full EC2 cost              | ~10 sec
Hibernated | EBS + small overhead       | ~30-60 sec (RAM restore)
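A warm pool is attached to an existing group in one call — a sketch assuming a hypothetical group my-asg:

```shell
# Keep at least 2 pre-initialized instances in Stopped state.
# They run the full launch lifecycle (user data, hooks) once, then stop;
# a scale-out event starts them instead of cold-launching new ones.
aws autoscaling put-warm-pool \
  --auto-scaling-group-name my-asg \
  --pool-state Stopped \
  --min-size 2
```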

13. Complete Instance Lifecycle ⭐

LAUNCH FLOW:
  Pending
    → (lifecycle hook fires if configured)
    → Pending:Wait  [custom action, up to 1 hr]
    → Pending:Proceed
  InService  ← instance receives traffic
    → (scale-in or failure)
  Terminating
    → (lifecycle hook fires if configured)
    → Terminating:Wait  [custom action]
    → Terminating:Proceed
  Terminated

WARM POOL STATES (if configured):
  Warmed:Pending → Warmed:Running/Stopped/Hibernated
    → Traffic spike → moves to Pending → InService

14. Complete Scaling Flow — End to End

1. CloudWatch alarm: ASGAverageCPUUtilization > 70%
2. Target Tracking policy triggers
3. ASG calculates: need 3 more instances to reach 50% target
4. ASG launches 3 EC2s via Launch Template (LT $Default v3)
5. Launch lifecycle hook (if configured): wait for custom action
6. Instance state: Pending → InService
7. ASG registers instance with ALB Target Group
8. Target Group health check runs
9. Instance health: healthy → starts receiving traffic
10. Warmup period: instance excluded from aggregate metrics
11. Warmup expires → instance contributes to CPU average
12. CPU drops to ~50% → scaling stops
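Each step of this flow is visible in the group's activity history — a sketch assuming a hypothetical group my-asg:

```shell
# Shows recent launches/terminations with cause
# (e.g. "an instance was started in response to a difference between
#  desired and actual capacity") — useful for auditing the flow above.
aws autoscaling describe-scaling-activities \
  --auto-scaling-group-name my-asg \
  --max-items 5
```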

15. Common Mistakes

❌ Wrong                                         | ✅ Correct
Cooldown applies to all scaling policies         | Cooldown applies to Simple Scaling only
Warmup and cooldown are the same thing           | Cooldown blocks all scaling; warmup excludes booting instances from metrics
Grace period = warmup period                     | Grace period = health check delay; warmup = metric exclusion for scaling math
Target Tracking waits for cooldown               | TT uses per-instance warmup, not a global cooldown
Simple scaling is fine for production            | Use Step Scaling or Target Tracking — Simple Scaling's cooldown causes dangerous delays
Scale-in protection prevents health termination  | Protection only blocks scale-in policy termination — health failures still terminate
Warm Pool instances are free                     | Stopped warm pool instances still pay for EBS storage
Lifecycle hooks default to 30 minutes            | Default timeout is 1 hour
Termination = immediate                          | ASG deregisters from LB first → connection draining → then terminates
ASG handles Kubernetes node scaling              | Cluster Autoscaler or Karpenter signals ASG; ASG doesn't watch Kubernetes

16. Interview Questions Checklist

  • What three problems does Auto Scaling solve?
  • Horizontal vs vertical scaling — which does AWS prefer and why?
  • What are the three capacity numbers? Relationship constraint?
  • Why is Minimum = 0 bad for production?
  • List all 5 scaling policy types with a use case for each
  • Simple vs Step vs Target Tracking — key differences?
  • What metric does Target Tracking use for ALB? (ALBRequestCountPerTarget)
  • Cooldown vs warmup period — what does each block and when does each apply?
  • What happens if a newly launched instance is counted in metrics during warmup?
  • What are lifecycle hooks? Give a real use case for terminate hook
  • What is the default lifecycle hook timeout? (1 hour)
  • What is the January 2026 lifecycle policy update? (retention on timeout)
  • What is a termination policy? Name 4 options
  • What is scale-in protection? What doesn't it protect against?
  • What is a Warm Pool? Three instance states and their costs?
  • When does ASG deregister from LB before termination? (always — connection draining)
  • ASG vs Kubernetes scaling — two separate layers, explain each
  • Walk through the complete lifecycle of an instance from launch to termination
