AWS Databases — RDS
1. Database Types in AWS
| Category | Type | AWS Service | Engine |
| Relational | SQL | RDS | MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, IBM Db2 |
| Relational | SQL (Cloud-native) | Aurora | MySQL-compatible, PostgreSQL-compatible |
| NoSQL | Key-Value | DynamoDB | Proprietary |
| NoSQL | Document | DocumentDB | MongoDB-compatible |
| NoSQL | Columnar | Keyspaces | Apache Cassandra-compatible |
| NoSQL | Graph | Neptune | Gremlin, SPARQL, openCypher |
| NoSQL | Time Series | Timestream | Proprietary |
| Cache | In-Memory | ElastiCache | Redis, Memcached |
| Search | Full-text | OpenSearch | Elasticsearch-compatible |
| Ledger | Immutable | QLDB | Proprietary |
2. Why RDS Exists — Three Options Compared
| Aspect | On-Premises | EC2 + DB Engine | RDS |
| Hardware | You manage | AWS manages | AWS manages |
| OS patching | You manage | You manage | AWS manages |
| DB engine install | You manage | You manage | AWS manages |
| Backups | You manage | You manage | AWS manages |
| Replication | You configure | You configure | AWS manages |
| Failover | You configure | You configure | AWS manages |
| Scaling | You procure hardware | You resize EC2 | You adjust settings |
| You manage | Everything | Schema, queries, indexes | Schema, queries, indexes, tuning |
The key trade-off of RDS: you lose OS-level access (no SSH to the DB instance) in exchange for AWS managing all infrastructure operations. If you need OS access (custom plugins, special OS config), use RDS Custom (available for Oracle and SQL Server only).
3. RDS Architecture — Mental Model ⭐
RDS DB Instance = EC2 instance (hidden) + EBS volume + RDS management layer
┌─────────────────────────────────────────────┐
│ RDS DB Instance │
│ ┌─────────────────────────────────────┐ │
│ │ DB Engine (MySQL / PostgreSQL / …) │ │
│ └────────────────┬────────────────────┘ │
│ │ reads/writes │
│ ┌────────────────▼────────────────────┐ │
│ │ EBS Storage (gp3/io2) │ │
│ └─────────────────────────────────────┘ │
│ │
│ Security Group → controls port access │
│ VPC Subnet → network placement │
│ IAM Role → access to AWS services │
│ Parameter Group→ DB engine config │
└─────────────────────────────────────────────┘
4. Storage Types (EBS-backed)
| Type | IOPS | Use Case | Notes |
| gp2 | 3 IOPS/GB (burst to 3,000) | General purpose | Legacy — use gp3 for new |
| gp3 | 3,000 base (up to 16,000) | General purpose | Cheaper than gp2, decouple IOPS from size |
| io1 | Up to 64,000 IOPS | High performance | Legacy provisioned IOPS |
| io2 / io2 Block Express | Up to 256,000 IOPS | Critical production | Highest durability + IOPS |
Storage Auto Scaling:
Enable max storage threshold (e.g., 1 TB max)
→ RDS automatically increases storage when:
- Free storage < 10% of total
- Low storage lasts > 5 minutes
- 6 hours since last auto-scale
→ No downtime for storage increase
5. Instance Classes
| Class | Type | Use Case | Example |
| T | Burstable | Dev/test, low traffic | db.t3.medium, db.t4g.large |
| M | General Purpose | Balanced workloads | db.m6i.xlarge |
| R | Memory Optimized | High-performance DBs, large datasets | db.r6g.2xlarge |
| X | Memory Intensive | In-memory analytics | db.x2idn |
| Optimized Reads | NVMe SSD cache | Read-heavy workloads | db.r6gd (local NVMe) |
6. Networking ⭐
DB Subnet Group
A logical group of subnets across multiple AZs — RDS deploys into these subnets.
Requirements:
- Must span at least 2 AZs
- Use private subnets (no public access to DB)
- Separate from application subnets (best practice)
Example:
subnet-db-1a (10.0.10.0/24) AZ-1a
subnet-db-1b (10.0.11.0/24) AZ-1b
subnet-db-1c (10.0.12.0/24) AZ-1c
Security Group for RDS
RDS SG inbound rules:
Allow TCP 3306 (MySQL) from App-SG ← reference SG ID, not IP
Allow TCP 5432 (PostgreSQL) from App-SG
Never:
Allow TCP 3306 from 0.0.0.0/0 ← never expose DB to internet
7. Read Replicas vs Multi-AZ ⭐ (Most Critical Distinction)
This is the #1 most tested RDS concept — they are completely different:
| Dimension | Read Replicas | Multi-AZ |
| Purpose | Scale reads | High availability |
| Replication | Asynchronous (slight lag) | Synchronous (zero data loss) |
| Use standby for traffic? | ✅ Yes — queries go to replicas | ❌ No — standby is passive |
| Failover | Manual — you promote it | Automatic — DNS flips |
| Count | Up to 15 | 1 standby (Multi-AZ instance) |
| Cross-region | ✅ Yes | ❌ No — same region only |
| Cost | You pay for replica instance | +~20% instance cost for standby |
| Promotes to primary? | Yes — manual promotion breaks replication | Yes — automatic via DNS CNAME flip |
Read Replica use case:
App → write → Primary DB
App → read → Read Replica (offload reads)
Reporting tools → Read Replica (heavy queries don't affect primary)
Multi-AZ use case:
Primary fails (AZ down, hardware failure)
→ RDS flips DNS CNAME to standby
→ Application reconnects to same endpoint — no code change
→ Failover: ~1–2 minutes (Multi-AZ Instance) [aws.amazon](https://aws.amazon.com/blogs/database/choose-the-right-amazon-rds-deployment-option-single-az-instance-multi-az-instance-or-multi-az-database-cluster/)
Read Replica is NOT a backup. It has async replication — if primary data is corrupted or deleted, the corruption replicates. Use snapshots for point-in-time backup.
8. Deployment Options ⭐
Single-AZ
[Primary Instance] ← all reads + writes
One AZ — instance failure = downtime until manual recovery
Use: Dev/test only
Multi-AZ Instance (Classic HA)
[Primary Instance] ──sync replication──→ [Standby Instance]
Same Region, different AZ ← passive, no traffic
Failover time: ~1–2 minutes [aws.amazon](https://aws.amazon.com/blogs/database/choose-the-right-amazon-rds-deployment-option-single-az-instance-multi-az-instance-or-multi-az-database-cluster/)
RPO (data loss): ~0 (synchronous)
RTO (downtime): ~1–2 minutes
Multi-AZ Cluster (New — 2022)
[Writer Instance]
├── sync replication ──→ [Reader Instance 1] AZ-2 ← serves reads
└── sync replication ──→ [Reader Instance 2] AZ-3 ← serves reads
Failover time: ~25–75 seconds [aws.amazon](https://aws.amazon.com/blogs/database/choose-the-right-amazon-rds-deployment-option-single-az-instance-multi-az-instance-or-multi-az-database-cluster/)
Benefit over Multi-AZ Instance: readers serve traffic (not passive)
9. Backups ⭐
Automated Backups
Retention: 1–35 days (default: 7 days; set to 0 to disable)
Backup window: daily snapshot during configured time window
Transaction logs: backed up every 5 minutes → enables PITR to any second
Point-in-Time Recovery (PITR):
Restore DB to any second within retention window
Creates NEW DB instance (does not overwrite existing)
Manual Snapshots
User-triggered
Retained indefinitely (not affected by retention period)
Can copy to another region → cross-region DR
Can share with another AWS account
Restore → creates new DB instance with restored data
Key Backup Behaviors
| Property | Automated | Manual Snapshot |
| Retention | 1–35 days | Forever (until you delete) |
| Trigger | Automatic | Manual |
| PITR | ✅ | ❌ (point-in-time to snapshot date only) |
| Survives DB deletion | ❌ (deleted with DB unless final snapshot taken) | ✅ |
10. Security
Encryption at Rest
Enabled at creation with KMS key (cannot encrypt existing unencrypted instance)
Encrypts: DB instance + automated backups + read replicas + snapshots
To encrypt existing unencrypted DB:
1. Create snapshot
2. Copy snapshot with encryption enabled
3. Restore from encrypted snapshot
4. Point application to new instance
Encryption in Transit
SSL/TLS enforced by parameter group setting:
MySQL: rds.force_ssl = 1
PostgreSQL: rds.force_ssl = 1
Authentication
| Method | How | Best For |
| Password | Username + password in connection string | Basic |
| IAM Auth | Generate IAM auth token (15 min TTL), no long-lived password | EC2/Lambda connecting to RDS |
| Kerberos | Microsoft AD integration | Enterprise, Windows environments |
IAM Authentication flow:
1. EC2 calls aws rds generate-db-auth-token (requires IAM permission)
2. Token valid for 15 minutes
3. Use token as password in DB connection
4. RDS validates token against IAM — no password stored anywhere ✅
11. RDS Proxy ⭐
Solves the connection explosion problem — especially Lambda → RDS:
Problem:
Lambda scales to 1,000 concurrent functions
Each opens a DB connection
MySQL max connections: ~100–200 for small instances
→ Connection errors, DB overload ❌
With RDS Proxy:
Lambda → RDS Proxy (connection pool: 100 connections to DB)
→ Proxy multiplexes 1,000 Lambda connections onto 100 DB connections ✅
→ DB sees manageable connection count
Also:
- IAM authentication enforced at proxy level
- Secrets Manager integration (credentials never in Lambda code)
- Failover faster: ~66% reduction in failover time (proxy handles reconnection)
12. Monitoring ⭐
| Tool | Level | Key Metrics |
| CloudWatch Metrics | Instance level | CPU, FreeableMemory, DatabaseConnections, ReadIOPS, WriteIOPS |
| Enhanced Monitoring | OS level (50+ metrics) | Per-process CPU, memory breakdown, file system |
| Performance Insights | SQL/query level | Top SQL by wait type, load, execution time |
Performance Insights:
Free: 7 days retention
Paid: up to 2 years
Shows: which SQL queries are causing load, what they're waiting for
Use: identify slow queries without needing slow query log
13. Aurora — AWS Cloud-Native Database ⭐
Aurora is NOT the same as RDS with MySQL/PostgreSQL. It's a reimagined architecture:
Architecture Difference
Standard RDS (MySQL): Aurora:
[Compute] → [EBS Volume] [Compute (Primary + 15 Replicas)]
One EBS per instance ↓
Replication copies data [Shared Cluster Volume]
Replica lag: seconds 6 copies of data across 3 AZs
Replicas share storage — no copy needed
Replica lag: milliseconds
Aurora Key Numbers
| Property | Value |
| Copies of data | 6 copies across 3 AZs (2 per AZ) |
| Survives losing | Up to 2 copies without write impact; 3 copies without read impact |
| Max read replicas | 15 (vs 5 for standard RDS) |
| Storage auto-scale | Grows in 10 GB increments up to 128 TB automatically |
| Failover time | ~25–75 seconds (faster than RDS Multi-AZ 1–2 min) |
| Backtrack | Rewind DB to any point in last 72 hours — no restore from snapshot needed |
| Endpoints | Writer endpoint, Reader endpoint (load-balanced across all replicas) |
Aurora Endpoints:
Writer endpoint: my-cluster.cluster-xyz.us-east-1.rds.amazonaws.com
→ Always points to primary (even after failover)
Reader endpoint: my-cluster.cluster-ro-xyz.us-east-1.rds.amazonaws.com
→ Load-balances reads across all Aurora Replicas
Aurora Serverless v2
Automatically scales compute in fine-grained increments (0.5 ACU units)
Pay per ACU-second used — no idle cost
Scales from minimum to maximum ACU in ~seconds
Use case: unpredictable workloads, dev/test, multitenant apps
Aurora Global Database
One primary Region + up to 5 secondary read-only Regions
Replication lag: < 1 second (storage-based replication, no DB impact)
Use case:
Global apps needing low latency reads in multiple continents
DR: promote secondary to primary in < 1 minute if primary region fails
Failover types:
Managed failover: RDS orchestrates promotion (~1 min)
Manual failover: you control the switch
14. Multi-AZ vs Read Replicas vs Aurora Replicas ⭐
| Feature | RDS Multi-AZ | RDS Read Replicas | Aurora Replicas |
| Purpose | HA failover | Read scaling | Read scaling + HA |
| Replication | Synchronous | Asynchronous | Synchronous (shared storage) |
| Serves reads? | ❌ Standby passive | ✅ Yes | ✅ Yes |
| Failover | Automatic | Manual promotion | Automatic (~25–75s) |
| Max count | 1 standby | 15 | 15 |
| Cross-region | ❌ | ✅ | ✅ (Global DB) |
| Lag | 0 | Seconds | Milliseconds |
15. Common Mistakes
| ❌ Wrong | ✅ Correct |
| Multi-AZ standby serves reads | Multi-AZ standby is passive — zero traffic; use Read Replicas for reads |
| Read Replica = automatic failover | Read Replicas require manual promotion — no automatic failover |
| Multi-AZ replication is async | Multi-AZ uses synchronous replication — zero data loss |
| Read Replicas prevent data corruption | Replication is async — deletions and corruptions replicate to replicas |
| Aurora is just faster RDS | Aurora uses shared cluster storage — architecturally different, not just faster |
| Aurora failover same as RDS Multi-AZ | Aurora: 25–75s; RDS Multi-AZ Instance: ~1–2 min |
| Cannot encrypt existing RDS DB | True — but workaround: snapshot → encrypt copy → restore new instance |
| RDS Proxy needed for EC2 apps | RDS Proxy primarily solves Lambda → RDS connection pooling; EC2 with small connection count doesn't need it |
| Automated backups persist after DB deletion | Automated backups deleted with DB unless you take a final manual snapshot |
| Backup retention default is 0 (off) | Default retention is 7 days (set to 0 explicitly to disable) |
16. Interview Questions Checklist
Nectar