Amazon S3

1. What is S3?

Amazon S3 (Simple Storage Service) is an infinitely scalable, highly durable, object-based storage service accessible over HTTP/HTTPS. Every object gets a URL — that's why it's classified as a web service.

https://bucket-name.s3.region.amazonaws.com/folder/object.png

Durability: 99.999999999% (11 nines) — S3 automatically replicates data across a minimum of 3 AZs within a Region (General Purpose buckets).


2. Storage Model — Object vs Block vs File

Dimension       S3 (Object)                 EBS (Block)            EFS (File)
--------------  --------------------------  ---------------------  -----------------------
Structure       Objects + metadata          Fixed-size blocks      Hierarchical filesystem
Mount as drive  ❌                          ✅ (single EC2)        ✅ (multi-EC2 via NFS)
OS install      ❌                          ✅                     ❌
Access method   HTTP REST API               OS filesystem          NFS protocol
Max size        5 TB per object             Volume size            Unlimited
Use case        Files, backups, data lakes  Root volumes, DBs      Shared app data
Object = Data (bytes) + Metadata (key-value) + Key (unique identifier path)

Example:
  Key:      images/profile/user-123.jpg
  Data:     (binary image bytes)
  Metadata: Content-Type: image/jpeg
            ETag: "d8e8fca2dc0f896fd7cb4cb0031ba249"
            x-amz-meta-uploaded-by: user-456   ← custom metadata

Strong read-after-write consistency (since December 2020): After a PUT or DELETE, all subsequent GETs immediately see the new state. No eventual consistency lag — this was a major S3 upgrade.


3. Buckets ⭐

Core Properties

Property       Detail
-------------  ----------------------------------------------------------
Name           Globally unique across all AWS accounts + regions
Scope          Regional — each bucket belongs to one Region
Default limit  100 per account (soft limit; can request up to 1,000)
Nesting        ❌ No nested buckets — flat namespace only
Rename/move    ❌ Immutable — cannot change name or Region after creation
Size           Unlimited total; unlimited object count
Object size    Min: 0 bytes; Max: 5 TB

Flat Namespace + Prefixes

S3 has no real folders — the / in a key is just a character delimiter:

Bucket: my-project
Keys:
  src/app.js                ← prefix "src/"
  src/utils/helper.js       ← prefix "src/utils/"
  README.md                 ← no prefix

"Create folder" in console = AWS creates a zero-byte object: src/
This is a visual convenience — not a real directory object.

Bucket Naming Rules

  • Lowercase letters, numbers, hyphens only; 3–63 characters
  • Cannot start or end with a hyphen
  • Cannot look like an IP address (192.168.5.4)
  • Static website: bucket name must exactly match domain name
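A sketch of a validator for the rules listed above. It covers only these rules — the full AWS rule set has a few more edge cases (e.g. no `xn--` prefix, no consecutive periods), so treat it as illustrative:

```python
import re

# Sketch: validating the bucket naming rules listed above.

BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$")  # 3-63 chars, no leading/trailing hyphen
IP_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")                # reject IP-lookalikes

def valid_bucket_name(name: str) -> bool:
    return bool(BUCKET_RE.match(name)) and not IP_RE.match(name)

print(valid_bucket_name("my-project-logs"))  # True
print(valid_bucket_name("-bad-start"))       # False (leading hyphen)
print(valid_bucket_name("192.168.5.4"))      # False (IP-like, and contains dots)
```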

4. Bucket Types

Type                             Storage    AZ Count  Speed               Use Case
-------------------------------  ---------  --------  ------------------  -------------------------------------------------
General Purpose (default)        Multi-AZ   ≥3 AZs    Standard            All workloads
Directory (S3 Express One Zone)  Single AZ  1 AZ      ~10× lower latency  ML training, real-time analytics, high-freq reads

Directory bucket naming format:
  bucket-base-name--az-id--x-s3
  Example: my-data--use1-az4--x-s3

Directory Buckets trade availability for speed. If the AZ goes down, data is unavailable. Never use for data you cannot afford to lose without a backup elsewhere.


5. Uploading Objects — Multipart Upload ⭐

Object Size    Method                 Reason
-------------  ---------------------  -------------------------------------
< 100 MB       Single PUT             Simple, fast
100 MB – 5 GB  Multipart recommended  Better performance, resume on failure
> 5 GB         Multipart required     Single PUT limited to 5 GB

How Multipart Works

1. Initiate upload → get Upload ID
2. Split file into parts (min 5 MB each except the last part; max 10,000 parts)
3. Upload each part independently (parallelizable) → get ETag per part
4. Complete upload → S3 assembles parts into final object
(If a part fails → retry just that part, not the whole file)
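The steps above can be sketched as a part-planning function — pure arithmetic, no upload. The 100 MiB default part size is an arbitrary illustrative choice; the 5 MB minimum and 10,000-part maximum are the limits stated above:

```python
import math

# Sketch: planning multipart parts for a given object size.

MIN_PART = 5 * 1024 * 1024   # minimum part size (except the last part)
MAX_PARTS = 10_000           # maximum number of parts

def plan_parts(size_bytes: int, part_size: int = 100 * 1024 * 1024):
    """Return a list of (part_number, offset, length) tuples."""
    # Grow the part size if the requested one would exceed 10,000 parts
    part_size = max(part_size, MIN_PART, math.ceil(size_bytes / MAX_PARTS))
    parts, offset, n = [], 0, 1
    while offset < size_bytes:
        length = min(part_size, size_bytes - offset)
        parts.append((n, offset, length))
        offset += length
        n += 1
    return parts

# 1 GiB file with 100 MiB parts → 11 parts, the last one 24 MiB
parts = plan_parts(1 * 1024**3)
print(len(parts))
```

Each `(part_number, offset, length)` tuple could then be uploaded independently and retried in isolation, which is exactly why step 3 parallelizes well.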

CLI shortcut (handles multipart automatically):
  aws s3 cp large-file.iso s3://my-bucket/  --storage-class STANDARD

Lifecycle policy tip: Create a rule to abort incomplete multipart uploads after N days — orphaned parts cost storage without forming a complete object.


6. S3 Performance Limits ⭐

S3 scales per prefix (not per bucket):

Operation                   Rate per Prefix
--------------------------  ------------------
PUT / COPY / POST / DELETE  3,500 requests/sec
GET / HEAD                  5,500 requests/sec

Prefix = everything in the key before the last "/"

Keys with different prefixes = independent performance limits:
  2026/jan/file.jpg  → prefix "2026/jan/" → 5,500 GET/s
  2026/feb/file.jpg  → prefix "2026/feb/" → 5,500 GET/s  (separate limit)
  2026/mar/file.jpg  → prefix "2026/mar/" → 5,500 GET/s  (separate limit)

Total: 3 prefixes × 5,500 = 16,500 GET/s for this bucket
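The arithmetic above, as a small helper (the prefix rule follows the simplified "everything before the last `/`" definition used in this section):

```python
# Sketch: aggregate request ceilings across prefixes — limits are
# per prefix, not per bucket.

GET_PER_PREFIX = 5_500
PUT_PER_PREFIX = 3_500

def bucket_ceiling(keys):
    # Prefix = everything before the last "/", per the note above
    prefixes = {k.rsplit("/", 1)[0] + "/" if "/" in k else "" for k in keys}
    return len(prefixes) * GET_PER_PREFIX, len(prefixes) * PUT_PER_PREFIX

keys = ["2026/jan/file.jpg", "2026/feb/file.jpg", "2026/mar/file.jpg"]
print(bucket_ceiling(keys))  # (16500, 10500)
```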

Old advice (pre-2018): randomize key names to spread across partitions.
Current advice: just use logical prefixes — S3 auto-partitions them.

7. Access Control ⭐

S3 has three independent access control mechanisms — they stack:

Same-account request allowed? = no explicit Deny anywhere AND (IAM policy allows OR bucket policy allows) AND (Block Public Access doesn't block it)
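A toy model of that stacking logic. It deliberately ignores explicit Deny statements, session policies, and SCPs — it only captures the Allow/BPA interaction described in this section:

```python
# Toy model of the same-account access decision. Real IAM evaluation
# also handles explicit Deny, session policies, and SCPs.

def request_allowed(iam_allows, bucket_policy_allows, bpa_blocks,
                    is_public_request=False):
    # Block Public Access only gates *public* (unauthenticated) access
    if is_public_request and bpa_blocks:
        return False
    # Same-account: either an identity policy or the bucket policy suffices
    return iam_allows or bucket_policy_allows

print(request_allowed(True, False, bpa_blocks=True))   # True  — authenticated IAM access unaffected by BPA
print(request_allowed(False, True, bpa_blocks=True,
                      is_public_request=True))         # False — BPA wins for public requests
```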

IAM Policy

Controls what an IAM identity (user/role) can do to S3 resources.

{
  "Effect": "Allow",
  "Action": ["s3:GetObject"],
  "Resource": "arn:aws:s3:::my-bucket/*"
}

Bucket Policy

Resource-based policy attached directly to a bucket — controls access from any principal (same account, cross-account, public):

{
  "Effect": "Allow",
  "Principal": "*",
  "Action": "s3:GetObject",
  "Resource": "arn:aws:s3:::my-website-bucket/*"
}

Scenario                                Use
--------------------------------------  ------------------------------------------
EC2/Lambda reads bucket (same account)  IAM role + IAM policy
Cross-account access                    Bucket policy (specify other account ARN)
Public static website                   Bucket policy with "Principal": "*"
Restrict to VPC                         Bucket policy with aws:SourceVpc condition

Block Public Access ⭐

Four independent settings — default: ALL ON:

Setting                What it blocks
---------------------  -------------------------------------------
BlockPublicAcls        New public ACLs from being set
IgnorePublicAcls       Existing public ACLs (ignores them)
BlockPublicPolicy      New bucket policies granting public access
RestrictPublicBuckets  Existing policies that grant public access

Block Public Access can be set at the account level (applies to all buckets in the account) or bucket level (individual bucket). Account-level setting overrides bucket-level — you cannot make a bucket public if the account-level BPA is ON.

ACLs (Legacy — Avoid)

Per-object or per-bucket permissions. Disabled by default since 2023. AWS recommends keeping ACLs disabled — use bucket policies instead.


8. Encryption ⭐

Default encryption applies to ALL objects when no encryption is specified at upload. Since January 2023, SSE-S3 is applied by default to every bucket automatically.

SSE-S3 (Server-Side Encryption with S3 Managed Keys)

Upload → S3 encrypts with AES-256 → stores ciphertext + manages key
         You: do nothing — fully transparent
         Key rotation: automatic by AWS
         Cost: free

SSE-KMS (Server-Side Encryption with KMS Keys)

Upload → S3 calls KMS to generate Data Key → encrypts with Data Key
         KMS key (CMK) is audited in CloudTrail ← you can see who accessed what
         Cost: KMS API call charges apply
         Benefit: key rotation control, cross-account access, audit trail

Envelope encryption:

KMS CMK encrypts → Data Encryption Key (DEK)
DEK encrypts → actual S3 object data
(CMK never leaves KMS — only DEK is used by S3)
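A toy illustration of the envelope structure only — the XOR "cipher" stands in for AES, and the in-memory `master_key` stands in for the KMS CMK, which in reality never leaves KMS:

```python
import os

# Toy envelope encryption. XOR is NOT a real cipher — it only shows
# the key hierarchy: CMK wraps DEK, DEK encrypts data.

def xor(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

master_key = os.urandom(32)             # stands in for the KMS CMK
dek = os.urandom(32)                    # data encryption key, one per object

wrapped_dek = xor(dek, master_key)      # KMS hands S3 the DEK in wrapped form
ciphertext = xor(b"object bytes", dek)  # S3 encrypts the object with the DEK

# Decrypt path: unwrap the DEK with the CMK, then decrypt the object
plaintext = xor(ciphertext, xor(wrapped_dek, master_key))
print(plaintext)  # b'object bytes'
```

S3 stores the ciphertext alongside the wrapped DEK; only KMS (holding the CMK) can unwrap it, which is what produces the CloudTrail audit entry per decrypt.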

SSE-KMS is subject to KMS API request quotas — for very high throughput (>1,000 req/s), enable S3 Bucket Keys (a bucket-level key that reduces KMS calls), use SSE-S3, or request a KMS quota increase.

SSE-C (Server-Side Encryption with Customer Provided Keys)

Upload: you provide the encryption key in the request header
S3: encrypts using your key → stores ciphertext → DISCARDS your key
Download: you MUST provide the same key again — AWS does not store it

Property        SSE-S3     SSE-KMS           SSE-C
--------------  ---------  ----------------  ----------------------
Key managed by  AWS        AWS KMS (or you)  You
Audit trail     ❌         ✅ CloudTrail     ❌
Key rotation    Automatic  Configurable      Manual
HTTPS required  No         No                Yes (mandatory)
Cost            Free       KMS charges       Free (you manage keys)

Client-Side Encryption

You encrypt before sending to S3 — S3 stores already-encrypted bytes. AWS never sees unencrypted data.


9. Versioning ⭐

States

Unversioned (default) → Versioning Enabled → Versioning Suspended
                                              (cannot go back to Unversioned)

Key Behaviors

Event                        Behavior
---------------------------  --------------------------------------------------------------
Upload same key              New version created with new version ID; old version preserved
Upload before versioning     Version ID = null
Delete (no version ID)       Delete marker placed — object appears deleted, versions intact
Delete specific version ID   Permanently removes that version
Delete a delete marker       Restores the object (latest remaining version becomes current)

Timeline:
  v1: report.pdf (version: ABCD, null if pre-versioning)
  v2: report.pdf (version: EFGH) ← latest
  DELETE (no version specified) → adds delete marker (version: IJKL)
  → GET report.pdf → 404 (but v1 and v2 still exist!)
  DELETE the delete marker → v2 becomes current again ✅

Versioning charges for all stored versions — use lifecycle policies to expire old versions or you accumulate large storage costs over time.
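The timeline above can be simulated as a version stack. Version IDs here are made up; real IDs are opaque strings assigned by S3:

```python
# Sketch: a versioned key as a stack of (version_id, payload) entries,
# where payload=None represents a delete marker.

class VersionedKey:
    def __init__(self):
        self.versions = []

    def put(self, version_id, payload):
        self.versions.append((version_id, payload))

    def delete(self, version_id=None):
        if version_id is None:
            # Plain DELETE: add a delete marker, keep all versions
            self.versions.append((f"dm-{len(self.versions)}", None))
        else:
            # DELETE with a version ID: permanently remove that version
            self.versions = [v for v in self.versions if v[0] != version_id]

    def get(self):
        if not self.versions or self.versions[-1][1] is None:
            return None  # 404: no versions, or latest is a delete marker
        return self.versions[-1][1]

obj = VersionedKey()
obj.put("ABCD", "report v1")
obj.put("EFGH", "report v2")
obj.delete()                 # plain DELETE → delete marker "dm-2"
print(obj.get())             # None (404), but both versions still exist
obj.delete("dm-2")           # delete the delete marker
print(obj.get())             # report v2 — restored
```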

MFA Delete

Extra protection layer on top of versioning:
  • Requires an MFA code to permanently delete a version or suspend versioning
  • Must be enabled by the root user
  • Prevents accidental or malicious permanent deletion


10. Object Lock (WORM) ⭐

Object Lock implements WORM (Write Once Read Many) — object cannot be modified or deleted during the retention period.

Requirement: versioning must be enabled
Enable: at bucket creation (cannot enable after)

Retention Modes

Mode        Who Can Override                                    Use Case
----------  --------------------------------------------------  ---------------------------------------
Governance  Users with s3:BypassGovernanceRetention permission  Internal compliance, testing
Compliance  Nobody — not even root                              Regulatory mandates (SEC, FINRA, HIPAA)

Feature            Set by       Duration                    Override
-----------------  -----------  --------------------------  -------------------------------
Retain Until Date  Policy/user  Fixed date                  Governance: yes; Compliance: no
Legal Hold         User         Indefinite (until removed)  User with s3:PutObjectLegalHold

Legal hold = on/off switch independent of retention period
  Use case: litigation hold — freeze specific objects regardless of retention

11. Pre-Signed URLs ⭐

Generate a time-limited URL that grants temporary access to a private object:

Normal: private object → 403 Forbidden (no public access)
Pre-signed URL: private object → accessible for X hours via special URL

Format:
https://bucket.s3.amazonaws.com/file.pdf
  ?X-Amz-Algorithm=AWS4-HMAC-SHA256
  &X-Amz-Credential=...
  &X-Amz-Date=...
  &X-Amz-Expires=3600          ← valid for 1 hour
  &X-Amz-Signature=...

Generate via CLI:
  aws s3 presign s3://my-bucket/report.pdf --expires-in 3600

Property        Detail
--------------  ---------------------------------------------------------------------
Default expiry  1 hour (CLI default)
Max expiry      7 days with IAM user credentials (SigV4); URLs signed with temporary
                role credentials stop working when those credentials expire
Signed with     Credentials of whoever generates it
Permissions     Inherits the generator's permissions at time of use
Use case        Share private files, temporary download links, video streaming

If the generating identity's permissions are revoked (or its temporary credentials expire), the pre-signed URL stops working.
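Under the hood, `aws s3 presign` performs SigV4 query-string signing, which can be sketched with the standard library alone. The credentials, bucket, and key below are dummy values — a URL signed with them would not be accepted by AWS; this only shows where the `X-Amz-*` parameters in the format above come from:

```python
import hashlib, hmac, urllib.parse
from datetime import datetime, timezone

# Sketch of SigV4 query-string presigning for a GET (dummy credentials).

def presign_get(bucket, key, access_key, secret_key, region="us-east-1",
                expires=3600, now=None):
    now = now or datetime.now(timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    scope = f"{datestamp}/{region}/s3/aws4_request"
    host = f"{bucket}.s3.{region}.amazonaws.com"

    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    qs = "&".join(f"{k}={urllib.parse.quote(v, safe='')}"
                  for k, v in sorted(params.items()))
    canonical_request = "\n".join(
        ["GET", f"/{key}", qs, f"host:{host}\n", "host", "UNSIGNED-PAYLOAD"])
    string_to_sign = "\n".join(
        ["AWS4-HMAC-SHA256", amz_date, scope,
         hashlib.sha256(canonical_request.encode()).hexdigest()])

    def sign(k, msg):
        return hmac.new(k, msg.encode(), hashlib.sha256).digest()

    # Signing key derivation: date → region → service → "aws4_request"
    k = sign(("AWS4" + secret_key).encode(), datestamp)
    for part in (region, "s3", "aws4_request"):
        k = sign(k, part)
    signature = hmac.new(k, string_to_sign.encode(), hashlib.sha256).hexdigest()
    return f"https://{host}/{key}?{qs}&X-Amz-Signature={signature}"

url = presign_get("my-bucket", "report.pdf", "AKIAEXAMPLE", "secretexample")
print(url)
```

Note the signature is computed from the secret key — nothing is stored server-side, which is why anyone holding the URL can use it until it expires.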


12. ARN / URI / URL Reference

Format                     Example                                            Used For
-------------------------  -------------------------------------------------  -------------------------------------
ARN (bucket)               arn:aws:s3:::my-bucket                             IAM policies, resource identification
ARN (object)               arn:aws:s3:::my-bucket/key                         IAM policies scoped to objects
S3 URI                     s3://my-bucket/key                                 AWS CLI, SDK, internal AWS reference
HTTPS URL                  https://my-bucket.s3.amazonaws.com/key             Browser access, public URLs
Path-style URL (legacy)    https://s3.amazonaws.com/my-bucket/key             Legacy — being deprecated
Transfer Acceleration URL  https://my-bucket.s3-accelerate.amazonaws.com/key  Global uploads via edge

13. Storage Classes — Complete Reference ⭐

Class                       Retrieval     Min Duration  Durability         Use Case
--------------------------  ------------  ------------  -----------------  -------------------------------------
Standard                    Instant       None          11 nines (3+ AZs)  Active data, frequent access
Intelligent-Tiering         Instant       None          11 nines           Unknown/changing access patterns
Standard-IA                 Instant       30 days       11 nines           Backups, monthly access
One Zone-IA                 Instant       30 days       11 nines (1 AZ)    Re-creatable data, secondary backups
Glacier Instant Retrieval   Milliseconds  90 days       11 nines           Quarterly access, compliance archives
Glacier Flexible Retrieval  1 min–12 hrs  90 days       11 nines           Backups, disaster recovery
Glacier Deep Archive        12–48 hrs     180 days      11 nines           Legal holds, 7–10 yr retention

Glacier Flexible Retrieval Options

Speed      Time         Cost
---------  -----------  -------
Expedited  1–5 minutes  Highest
Standard   3–5 hours    Medium
Bulk       5–12 hours   Free

Bulk is free — for large planned restorations where timing doesn't matter, always use Bulk retrieval.

Intelligent-Tiering — How It Works

Automatically moves objects between tiers based on access:
  Frequent Access tier        ← default (Standard pricing)
  Infrequent Access tier      ← after 30 days of no access
  Archive Instant Access      ← after 90 days
  Archive Access              ← after 90–730 days (optional, activate it)
  Deep Archive Access         ← after 180–730 days (optional)

Monitoring fee: ~$0.0025 per 1,000 objects/month
  (Not worth it for objects < 128 KB — AWS does not charge monitoring for those)

Use Intelligent-Tiering for data whose access patterns are unknown or change over time — especially large datasets like data lakes.

Cost Dimension (us-east-1, approximate)

Class                 Storage $/GB/month  Retrieval fee
--------------------  ------------------  --------------------
Standard              $0.023              None
Standard-IA           $0.0125             Per-GB charge
Glacier Instant       $0.004              Per-GB charge
Glacier Deep Archive  $0.00099            Per-GB charge + time

14. Lifecycle Policies ⭐

Automate transitions and deletions — no manual intervention:

Rule example: "logs/" prefix
  Day 0:   Upload → Standard
  Day 30:  Transition → Standard-IA
  Day 90:  Transition → Glacier Flexible Retrieval
  Day 365: Permanently delete

Rule example: versioning cleanup
  Noncurrent versions older than 30 days → delete
  Incomplete multipart uploads older than 7 days → abort
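Both example rules above can be expressed in the JSON shape that the `PutBucketLifecycleConfiguration` API accepts. Field names follow the S3 API; the rule IDs are made up:

```python
# Sketch: the two lifecycle rule examples above as an API-shaped config.

lifecycle = {
    "Rules": [
        {   # "logs/" prefix: tier down, then expire
            "ID": "logs-tiering",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        },
        {   # bucket-wide versioning cleanup
            "ID": "version-cleanup",
            "Filter": {},
            "Status": "Enabled",
            "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        },
    ]
}

# With boto3 this would be applied as:
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="my-bucket", LifecycleConfiguration=lifecycle)
print(len(lifecycle["Rules"]))  # 2
```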

Transition Constraints

✅ Allowed transitions (downward only):
  Standard → IA → Glacier Instant → Glacier Flexible → Deep Archive

❌ NOT allowed:
  Any Glacier → Standard/IA  (need to restore first, then re-upload)
  IA → Standard               (manual action required)

Minimum object size for IA transitions: 128 KB
(Smaller objects are not transitioned — they're cheaper in Standard)
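The "downward only" rule reduces to an ordering check. Class names below are the API's `StorageClass` values for the chain shown above (One Zone-IA and Intelligent-Tiering are omitted for simplicity):

```python
# Sketch: lifecycle transitions are only allowed "downward" in this order.

ORDER = ["STANDARD", "STANDARD_IA", "GLACIER_IR", "GLACIER", "DEEP_ARCHIVE"]

def transition_allowed(src: str, dst: str) -> bool:
    return ORDER.index(dst) > ORDER.index(src)

print(transition_allowed("STANDARD", "GLACIER"))     # True
print(transition_allowed("GLACIER", "STANDARD_IA"))  # False — restore + re-upload instead
```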

15. Replication ⭐

Types

Type                            Scope                          Use Case
------------------------------  -----------------------------  -------------------------------
SRR (Same-Region Replication)   Same Region, different bucket  Log aggregation, prod→test copy
CRR (Cross-Region Replication)  Different Region               DR, latency, compliance

Requirements & Behavior

Requirements:
  - Versioning enabled on BOTH source and destination
  - IAM role with s3:ReplicateObject permission
  - Cross-account: destination bucket policy must allow source account

Behavior:
  - Only replicates NEW objects after replication is enabled
  - Existing objects: use S3 Batch Operations (Batch Replication) to replicate retroactively
  - Deletes NOT replicated by default (can enable delete marker replication)
  - Encrypted objects (SSE-KMS): must configure KMS key in destination region

Replication Time Control (RTC)

Optional paid add-on — guarantees 99.99% of objects replicated within 15 minutes (useful for compliance that requires near-real-time replication).


16. Event Notifications ⭐

Trigger downstream processing when objects change:

Destination  Use Case
-----------  ---------------------------------------------------------
Lambda       Process file immediately (thumbnail, transcode, validate)
SNS          Fan-out to multiple subscribers
SQS          Queue for async/batch processing
EventBridge  Advanced filtering, multiple targets, replay

Event types:
  s3:ObjectCreated:*     ← any upload method
  s3:ObjectRemoved:*     ← delete or delete marker
  s3:ObjectRestore:*     ← Glacier restore start/complete
  s3:Replication:*       ← replication events

EventBridge vs native S3 notifications:
  • Native (Lambda/SNS/SQS): simpler, lower latency
  • EventBridge: richer filtering, multiple targets, event archival and replay
  • For new architectures, prefer EventBridge.
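A minimal sketch of a Lambda handler consuming the native S3 notification shape. The sample event is hand-written for illustration (real events carry more fields); the bucket and key names are made up:

```python
import urllib.parse

# Sketch: parsing the native S3 event structure in a Lambda handler.

def handler(event, context=None):
    results = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (spaces become '+')
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        results.append((record["eventName"], bucket, key))
    return results

event = {"Records": [{
    "eventName": "ObjectCreated:Put",
    "s3": {"bucket": {"name": "my-bucket"},
           "object": {"key": "uploads/photo+1.jpg", "size": 1024}}}]}
print(handler(event))  # [('ObjectCreated:Put', 'my-bucket', 'uploads/photo 1.jpg')]
```

The `unquote_plus` step is a classic gotcha: forgetting it makes any key containing spaces fail a subsequent GetObject call.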


17. S3 Select / Glacier Select

Query data inside an object without downloading the whole file:

File: sales.csv (10 GB CSV in S3)

Without S3 Select:
  Download all 10 GB → filter in your app

With S3 Select:
  SELECT * FROM S3Object WHERE year = '2026'
  → S3 filters server-side → returns ~10 MB
  → 99% less data transferred → faster + cheaper

Supported formats: CSV, JSON, Parquet (Parquet with columnar pushdown)
Glacier Select: same concept for archived Glacier objects.
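The data-reduction idea can be illustrated locally: filter rows on the "server" side instead of shipping the whole file. The CSV content is made up and the function is a stand-in, not the S3 Select API:

```python
import csv, io

# Local illustration of server-side filtering: only matching rows
# leave the data store.

data = "year,region,amount\n2025,eu,10\n2026,us,42\n2026,eu,7\n"

def select_where_year(raw_csv: str, year: str):
    return [row for row in csv.DictReader(io.StringIO(raw_csv))
            if row["year"] == year]

print(select_where_year(data, "2026"))
# [{'year': '2026', 'region': 'us', 'amount': '42'},
#  {'year': '2026', 'region': 'eu', 'amount': '7'}]
```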


18. S3 Batch Operations

Run a single operation on billions of objects across buckets:

Operation             Example
--------------------  -------------------------------------------
Copy objects          Migrate between buckets or accounts
Replace ACLs          Set uniform ACL across all objects
Add/replace tags      Tag all objects in a bucket
Invoke Lambda         Custom processing on each object
Restore from Glacier  Bulk restore millions of archived objects
Encrypt objects       Apply SSE-KMS to all objects retroactively

Workflow:
  1. Generate S3 Inventory report (list of target objects)
  2. Create Batch Operations job (operation + manifest)
  3. Review job summary (count, cost estimate)
  4. Run job → S3 processes each object
  5. Completion report saved to S3
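Step 2's manifest is just a CSV of `bucket,key` rows (one accepted format; manifests can also carry version IDs). A sketch of building one, with an illustrative object list:

```python
import csv, io

# Sketch: building a bucket,key CSV manifest for a Batch Operations job.

def build_manifest(bucket: str, keys) -> str:
    buf = io.StringIO()
    writer = csv.writer(buf)
    for key in keys:
        writer.writerow([bucket, key])
    return buf.getvalue()

manifest = build_manifest("my-bucket", ["logs/a.gz", "logs/b.gz"])
print(manifest)
```

In practice the manifest usually comes from an S3 Inventory report rather than being hand-built, but the shape is the same.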

19. Transfer Acceleration

Routes uploads/downloads through the nearest CloudFront Edge Location instead of going directly to the S3 Region:

User (London) → upload file → London Edge Location (fast, nearby)
                → AWS private backbone → S3 bucket (us-east-1)
                (faster than: User → public internet all the way to us-east-1)

Transfer Acceleration URL:
  https://my-bucket.s3-accelerate.amazonaws.com/key

When It Helps                     When It Doesn't
--------------------------------  -----------------------------------------
Users far from bucket Region      Users in same Region as bucket
Large files over long distances   Small files
Upload performance is bottleneck  Download (CloudFront is better for reads)

20. Static Website Hosting

Requirements:
  1. Bucket name = domain name (mywebsite.com)
  2. Public access enabled (Block Public Access OFF)
  3. Bucket policy: allow s3:GetObject for Principal: "*"
  4. Index document: index.html
  5. Error document: error.html

Website endpoint format:
  http://my-bucket.s3-website-us-east-1.amazonaws.com

HTTPS: not supported natively on S3 website endpoint
  → Use CloudFront in front of S3 for HTTPS + custom domain

21. Requester Pays

Normally: bucket owner pays for storage AND data transfer. With Requester Pays enabled: the requester (downloader) pays for data transfer.

Use case: academic/research datasets, large public datasets
  → You make data available without paying for every download
  → Authenticated AWS users only (anonymous access not allowed)

22. VPC Endpoint for S3 (Gateway Endpoint)

Allows EC2/Lambda inside a VPC to access S3 without going through the internet:

Without VPC endpoint:
  EC2 (private subnet) → NAT Gateway → Internet → S3
  (NAT Gateway charges + internet traffic)

With VPC Gateway Endpoint:
  EC2 (private subnet) → VPC Endpoint → S3
  (free + stays on AWS backbone + no internet)

Configure via route table:
  Add route: pl-xxxxx (S3 prefix list) → vpce-xxxxx

Gateway Endpoints for S3 are free — no hourly charge, no data processing charge. Use them for any workload that reads/writes S3 from within a VPC.


23. Common Mistakes

❌ S3 has eventual consistency
✅ Strong read-after-write consistency since Dec 2020

❌ S3 "folders" are real directories
✅ Folders are just key prefixes — zero-byte visual placeholders

❌ Versioning can be disabled once enabled
✅ Once enabled, versioning can only be suspended — not disabled

❌ Delete = permanent in a versioned bucket
✅ Delete adds a delete marker — versions still exist

❌ Block Public Access off = public bucket
✅ BPA off + no bucket policy = still private

❌ SSE-C: AWS stores your key
✅ SSE-C: AWS uses your key to encrypt, then discards it immediately

❌ Replication copies existing objects
✅ Replication only copies new objects after enabling

❌ Glacier Bulk retrieval costs extra
✅ Bulk retrieval is free

❌ Min duration for Standard-IA = 90 days
✅ Standard-IA minimum = 30 days; Glacier = 90 days; Deep Archive = 180 days

❌ One performance limit per bucket
✅ Performance is per prefix — 5,500 GET/s per prefix

❌ Multipart upload optional for all sizes
✅ Required for objects > 5 GB; recommended for > 100 MB

❌ Object Lock can be enabled anytime
✅ Must be enabled at bucket creation — cannot enable on an existing bucket

24. Interview Questions Checklist

  • S3 vs EBS vs EFS — when do you use each?
  • What is an object? What three components make it up?
  • What does "globally unique bucket name" mean technically?
  • Are S3 folders real? What is a prefix?
  • Strong consistency — what changed in Dec 2020?
  • What is multipart upload? When is it required vs recommended?
  • How does S3 performance scale — what is the unit? (per prefix)
  • Three access control layers in S3 — how do they stack?
  • Block Public Access — account-level vs bucket-level?
  • SSE-S3 vs SSE-KMS vs SSE-C — key difference?
  • What is envelope encryption in SSE-KMS?
  • Versioning states — three states, transitions between them?
  • Delete in versioned bucket — what actually happens?
  • MFA Delete — who can enable it? What does it protect?
  • Object Lock — Governance vs Compliance mode?
  • What is a legal hold? How is it different from retention period?
  • Pre-signed URL — what permissions does it inherit?
  • S3 Select — what problem does it solve?
  • Storage classes — retrieval time and minimum duration for each
  • Lifecycle transitions — which directions are allowed?
  • Intelligent-Tiering — how does it work and what's the monitoring fee?
  • Glacier Bulk retrieval — is it free? (Yes)
  • Replication — what does it NOT replicate by default? (deletes, existing objects)
  • S3 Batch Operations — what can it do?
  • VPC Gateway Endpoint for S3 — what does it cost? (free)
  • Transfer Acceleration — when does it help and when doesn't it?
