Amazon S3¶
1. What is S3?¶
Amazon S3 (Simple Storage Service) is an infinitely scalable, highly durable, object-based storage service accessible over HTTP/HTTPS. Every object gets a URL — that's why it's classified as a web service.
https://bucket-name.s3.region.amazonaws.com/folder/object.png
Durability: 99.999999999% (11 nines) — S3 automatically replicates data across a minimum of 3 AZs within a Region (General Purpose buckets).
2. Storage Model — Object vs Block vs File¶
| Dimension | S3 (Object) | EBS (Block) | EFS (File) |
|---|---|---|---|
| Structure | Objects + metadata | Fixed-size blocks | Hierarchical filesystem |
| Mount as drive | ❌ | ✅ (single EC2) | ✅ (multi-EC2 via NFS) |
| OS install | ❌ | ✅ | ❌ |
| Access method | HTTP REST API | OS filesystem | NFS protocol |
| Max size | 5 TB per object | Volume size | Unlimited |
| Use case | Files, backups, data lakes | Root volumes, DBs | Shared app data |
Object = Data (bytes) + Metadata (key-value) + Key (unique identifier path)
Example:
Key: images/profile/user-123.jpg
Data: (binary image bytes)
Metadata: Content-Type: image/jpeg
ETag: "d8e8fca2dc0f896fd7cb4cb0031ba249"
x-amz-meta-uploaded-by: user-456 ← custom metadata
Strong read-after-write consistency (since December 2020): After a PUT or DELETE, all subsequent GETs immediately see the new state. No eventual consistency lag — this was a major S3 upgrade.
3. Buckets ⭐¶
Core Properties¶
| Property | Detail |
|---|---|
| Name | Globally unique across all AWS accounts + regions |
| Scope | Regional — each bucket belongs to one Region |
| Default limit | 100 per account (soft limit; can be raised via a Service Quotas request) |
| Nesting | ❌ No nested buckets — flat namespace only |
| Rename/move | ❌ Immutable — cannot change name or Region after creation |
| Size | Unlimited total; unlimited object count |
| Object size | Min: 0 bytes; Max: 5 TB |
Flat Namespace + Prefixes¶
S3 has no real folders — the / in a key is just a character delimiter:
Bucket: my-project
Keys:
src/app.js ← prefix "src/"
src/utils/helper.js ← prefix "src/utils/"
README.md ← no prefix
"Create folder" in console = AWS creates a zero-byte object: src/
This is a visual convenience — not a real directory object.
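A client-side sketch of that grouping (hypothetical keys; this mimics how ListObjectsV2 with a Delimiter returns CommonPrefixes, it is not the S3 API itself):

```python
# Sketch: how a delimiter turns flat keys into "folders" (client-side
# simulation of the CommonPrefixes grouping done by ListObjectsV2).
def list_with_delimiter(keys, prefix="", delimiter="/"):
    common_prefixes, contents = set(), []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the first delimiter looks like a folder.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            contents.append(key)
    return sorted(common_prefixes), contents

keys = ["src/app.js", "src/utils/helper.js", "README.md"]
folders, files = list_with_delimiter(keys)
print(folders)  # ['src/']
print(files)    # ['README.md']
print(list_with_delimiter(keys, prefix="src/")[0])  # ['src/utils/']
```

Note there is no directory object anywhere: "src/" exists only because some key starts with it.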
Bucket Naming Rules¶
- Lowercase letters, numbers, hyphens only; 3–63 characters
- Cannot start or end with a hyphen
- Cannot be formatted like an IP address (e.g. 192.168.5.4)
- Static website hosting: bucket name must exactly match the domain name
4. Bucket Types¶
| Type | Storage | AZ Count | Speed | Use Case |
|---|---|---|---|---|
| General Purpose (default) | Multi-AZ | ≥3 AZs | Standard | All workloads |
| Directory (S3 Express One Zone) | Single AZ | 1 AZ | ~10× lower latency | ML training, real-time analytics, high-freq reads |
Directory bucket naming format:
bucket-base-name--az-id--x-s3
Example: my-data--use1-az4--x-s3
Directory buckets trade availability for speed: if the AZ goes down, the data is unavailable (and could be lost). Only store data there that you can re-create or that has a copy elsewhere.
5. Uploading Objects — Multipart Upload ⭐¶
| Object Size | Method | Reason |
|---|---|---|
| < 100 MB | Single PUT | Simple, fast |
| 100 MB – 5 GB | Recommended multipart | Better performance, resume on failure |
| > 5 GB | Required multipart | Single PUT limited to 5 GB |
How Multipart Works¶
1. Initiate upload → get Upload ID
2. Split file into parts (5 MB–5 GB each, except the last part, which may be smaller; max 10,000 parts)
3. Upload each part independently (parallelizable) → get ETag per part
4. Complete upload → S3 assembles parts into final object
(If a part fails → retry just that part, not the whole file)
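The part-size constraints above can be turned into a small planner. A sketch under stated assumptions: the 8 MiB starting size mirrors the AWS CLI's default `multipart_chunksize`, and the doubling strategy is illustrative, not the SDK's exact algorithm:

```python
MIB = 1024 * 1024
MIN_PART = 5 * MIB        # minimum part size (except the last part)
MAX_PARTS = 10_000        # hard limit on parts per upload

def plan_parts(object_size, part_size=8 * MIB):
    """Return (part_size, part_count) satisfying S3 multipart limits."""
    part_size = max(part_size, MIN_PART)
    # Grow the part size until the object fits in 10,000 parts.
    while -(-object_size // part_size) > MAX_PARTS:
        part_size *= 2
    count = max(1, -(-object_size // part_size))  # ceiling division
    return part_size, count

size, count = plan_parts(5 * 1024 * MIB)  # a 5 GiB object
print(size // MIB, count)                 # 8 640  (8 MiB parts, 640 parts)
```

At the 5 TiB object maximum, the same logic forces parts of 1 GiB or larger, since 5 TiB / 10,000 parts exceeds the 5 MiB minimum many times over.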
CLI shortcut (handles multipart automatically):
aws s3 cp large-file.iso s3://my-bucket/ --storage-class STANDARD
Lifecycle policy tip: Create a rule to abort incomplete multipart uploads after N days — orphaned parts cost storage without forming a complete object.
6. S3 Performance Limits ⭐¶
S3 scales per prefix (not per bucket):
| Operation | Rate per Prefix |
|---|---|
| PUT / COPY / POST / DELETE | 3,500 requests/sec |
| GET / HEAD | 5,500 requests/sec |
Prefix = everything in the key up to the last "/" (a practical rule of thumb: S3 actually partitions on arbitrary key prefixes, not only on "/" boundaries)
Keys with different prefixes = independent performance limits:
2026/jan/file.jpg → prefix "2026/jan/" → 5,500 GET/s
2026/feb/file.jpg → prefix "2026/feb/" → 5,500 GET/s (separate limit)
2026/mar/file.jpg → prefix "2026/mar/" → 5,500 GET/s (separate limit)
Total: 3 prefixes × 5,500 = 16,500 GET/s for this bucket
Old advice (pre-2018): randomize key names to spread across partitions.
Current advice: just use logical prefixes — S3 auto-partitions them.
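The arithmetic above generalizes: count distinct prefixes, multiply by the per-prefix rate. A toy calculator using the up-to-last-slash rule of thumb (a simplification of how S3 actually partitions):

```python
# Toy calculator for aggregate request ceilings: S3 scales per prefix,
# so the bucket ceiling is (distinct prefixes) x (per-prefix rate).
GET_PER_PREFIX = 5_500
PUT_PER_PREFIX = 3_500

def bucket_ceiling(keys, rate=GET_PER_PREFIX):
    prefixes = {key.rsplit("/", 1)[0] + "/" if "/" in key else "" for key in keys}
    return len(prefixes) * rate

keys = ["2026/jan/file.jpg", "2026/feb/file.jpg", "2026/mar/file.jpg"]
print(bucket_ceiling(keys))  # 16500
```

This is why date- or tenant-based key layouts scale well: each partitioned prefix brings its own request budget.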
7. Access Control ⭐¶
S3 has three independent access control mechanisms — they stack:
Request allowed? = no explicit Deny anywhere AND (IAM policy allows OR bucket policy allows) AND (Block Public Access doesn't block)
IAM Policy¶
Controls what an IAM identity (user/role) can do to S3 resources.
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:GetObject"],
    "Resource": "arn:aws:s3:::my-bucket/*"
  }]
}
Bucket Policy¶
Resource-based policy attached directly to a bucket — controls access from any principal (same account, cross-account, public):
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::my-website-bucket/*"
  }]
}
| Scenario | Use |
|---|---|
| EC2/Lambda reads bucket (same account) | IAM role + IAM policy |
| Cross-account access | Bucket policy (specify other account ARN) |
| Public static website | Bucket policy with "Principal": "*" |
| Restrict to VPC | Bucket policy with aws:SourceVpc condition |
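The "Restrict to VPC" row can be sketched as a deny-unless-from-VPC bucket policy. Hedged example: the bucket name and VPC ID are placeholders, and a blanket Deny like this also locks out console and admin access from outside the VPC, so carve out exceptions before using it anywhere real:

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "DenyAccessFromOutsideTheVpc",
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:*",
    "Resource": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"],
    "Condition": {
      "StringNotEquals": { "aws:SourceVpc": "vpc-0123456789abcdef0" }
    }
  }]
}
```

The aws:SourceVpc key is only present when the request arrives through a VPC endpoint, which ties this policy to the Gateway Endpoint setup in section 22.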
Block Public Access ⭐¶
Four independent settings — default: ALL ON:
| Setting | What it blocks |
|---|---|
| BlockPublicAcls | Rejects new public ACLs at write time |
| IgnorePublicAcls | Ignores existing public ACLs when evaluating access |
| BlockPublicPolicy | Rejects new bucket policies that grant public access |
| RestrictPublicBuckets | Restricts access to buckets whose existing policy grants public access |
Block Public Access can be set at the account level (applies to every bucket in the account) or per bucket. The two combine restrictively: if account-level BPA is ON, you cannot make a bucket public no matter what the bucket-level setting says.
ACLs (Legacy — Avoid)¶
Per-object or per-bucket permissions. Disabled by default since 2023. AWS recommends keeping ACLs disabled — use bucket policies instead.
8. Encryption ⭐¶
Default encryption applies to ALL objects when no encryption is specified at upload. Since January 2023, SSE-S3 is applied by default to every bucket automatically.
SSE-S3 (Server-Side Encryption with S3 Managed Keys)¶
Upload → S3 encrypts with AES-256 → stores ciphertext + manages key
You: do nothing — fully transparent
Key rotation: automatic by AWS
Cost: free
SSE-KMS (Server-Side Encryption with KMS Keys)¶
Upload → S3 calls KMS to generate Data Key → encrypts with Data Key
KMS key (CMK) is audited in CloudTrail ← you can see who accessed what
Cost: KMS API call charges apply
Benefit: key rotation control, cross-account access, audit trail
Envelope encryption:
KMS CMK encrypts → Data Encryption Key (DEK)
DEK encrypts → actual S3 object data
(CMK never leaves KMS — only DEK is used by S3)
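The envelope layering can be sketched in a few lines. This is a TOY: the XOR keystream stands in for AES-256, and master_key stands in for the KMS key that never leaves KMS; it shows the structure only, never use it for real encryption:

```python
# TOY envelope-encryption sketch (structure only; real SSE-KMS uses
# AES-256 inside KMS, not this XOR keystream).
import hashlib, secrets

def toy_cipher(key: bytes, data: bytes) -> bytes:
    # Symmetric toy cipher: XOR with a SHA-256-derived keystream.
    stream, counter = b"", 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

master_key = secrets.token_bytes(32)           # stays inside "KMS"
dek = secrets.token_bytes(32)                  # per-object Data Encryption Key

ciphertext = toy_cipher(dek, b"object bytes")  # DEK encrypts the object
wrapped_dek = toy_cipher(master_key, dek)      # master key wraps the DEK
# S3 stores: ciphertext + wrapped_dek (never the plaintext DEK)

# Decrypt path: unwrap the DEK with the master key, then decrypt the object.
plaintext = toy_cipher(toy_cipher(master_key, wrapped_dek), ciphertext)
print(plaintext)  # b'object bytes'
```

The point of the layering: bulk data is encrypted locally with a cheap per-object key, and only the tiny DEK ever makes a round trip to the key service.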
SSE-KMS requests count against your KMS request quota (region-dependent, e.g. 5,500–50,000 req/s for symmetric keys). For high-throughput buckets, enable S3 Bucket Keys to reduce KMS calls, or request a quota increase.
SSE-C (Server-Side Encryption with Customer Provided Keys)¶
Upload: you provide the encryption key in the request header
S3: encrypts using your key → stores ciphertext → DISCARDS your key
Download: you MUST provide the same key again — AWS does not store it
| Property | SSE-S3 | SSE-KMS | SSE-C |
|---|---|---|---|
| Key managed by | AWS | AWS KMS (or you) | You |
| Audit trail | ❌ | ✅ CloudTrail | ❌ |
| Key rotation | Automatic | Configurable | Manual |
| HTTPS required | No | No | Yes (mandatory) |
| Cost | Free | KMS charges | Free (you manage keys) |
Client-Side Encryption¶
You encrypt before sending to S3 — S3 stores already-encrypted bytes. AWS never sees unencrypted data.
9. Versioning ⭐¶
States¶
Unversioned (default) → Versioning Enabled → Versioning Suspended
(cannot go back to Unversioned)
Key Behaviors¶
| Event | Behavior |
|---|---|
| Upload same key | New version created with new version ID; old version preserved |
| Upload before versioning | Version ID = null |
| Delete (no version ID specified) | Delete marker placed — object appears deleted but versions intact |
| Delete specific version ID | Permanently removes that version |
| Delete a delete marker | Restores the object (latest version becomes current) |
Timeline:
v1: report.pdf (version: ABCD, null if pre-versioning)
v2: report.pdf (version: EFGH) ← latest
DELETE (no version specified) → adds delete marker (version: IJKL)
→ GET report.pdf → 404 (but v1 and v2 still exist!)
DELETE the delete marker → v2 becomes current again ✅
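The timeline above can be simulated with a minimal in-memory model (a sketch; VersionedKey and its version-ID scheme are made up, not an AWS API):

```python
# Minimal simulation of versioned-bucket behavior: overwrite keeps old
# versions, plain DELETE adds a marker, deleting the marker restores.
import itertools

class VersionedKey:
    _ids = itertools.count(1)

    def __init__(self):
        self.versions = []  # newest last: (version_id, data_or_DELETE_MARKER)

    def put(self, data):
        vid = f"v{next(self._ids)}"
        self.versions.append((vid, data))
        return vid

    def delete(self, version_id=None):
        if version_id is None:                 # plain DELETE -> delete marker
            return self.put("DELETE_MARKER")
        self.versions = [v for v in self.versions if v[0] != version_id]

    def get(self):
        if not self.versions or self.versions[-1][1] == "DELETE_MARKER":
            return None                        # looks deleted (404)
        return self.versions[-1][1]

key = VersionedKey()
key.put("v1 bytes")
key.put("v2 bytes")
marker = key.delete()          # delete marker placed
print(key.get())               # None (404, but both versions intact)
key.delete(version_id=marker)  # remove the delete marker
print(key.get())               # 'v2 bytes' is current again
```

Deleting a specific version ID is the only operation here that actually destroys data, which matches the table above.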
Every stored version is billed as storage. Use lifecycle policies to expire old noncurrent versions, or storage costs quietly accumulate over time.
MFA Delete¶
Extra protection layer on top of versioning:
- Requires an MFA code to permanently delete a version or to suspend versioning
- Can only be enabled by the root user (via the CLI/API)
- Prevents accidental or malicious permanent deletion
10. Object Lock (WORM) ⭐¶
Object Lock implements WORM (Write Once Read Many) — object cannot be modified or deleted during the retention period.
Requirement: versioning must be enabled
Enable: at bucket creation (cannot enable after)
Retention Modes¶
| Mode | Who Can Override | Use Case |
|---|---|---|
| Governance | Users with s3:BypassGovernanceRetention permission | Internal compliance, testing |
| Compliance | Nobody — not even root | Regulatory mandates (SEC, FINRA, HIPAA) |
Retention Period vs Legal Hold¶
| Feature | Set by | Duration | Override |
|---|---|---|---|
| Retain Until Date | Policy/user | Fixed date | Governance: yes; Compliance: no |
| Legal Hold | User | Indefinite (until removed) | User with s3:PutObjectLegalHold |
Legal hold = on/off switch independent of retention period
Use case: litigation hold — freeze specific objects regardless of retention
11. Pre-Signed URLs ⭐¶
Generate a time-limited URL that grants temporary access to a private object:
Normal: private object → 403 Forbidden (no public access)
Pre-signed URL: private object → accessible for X hours via special URL
Format:
https://bucket.s3.amazonaws.com/file.pdf
?X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=...
&X-Amz-Date=...
&X-Amz-Expires=3600 ← valid for 1 hour
&X-Amz-Signature=...
Generate via CLI:
aws s3 presign s3://my-bucket/report.pdf --expires-in 3600
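For intuition, the signing steps can be sketched with stdlib hmac. Heavily hedged: this mirrors the shape of SigV4 query signing (credential scope, HMAC key-derivation chain, hex signature) but is NOT byte-exact AWS SigV4, and the string-to-sign here is simplified; in practice use `aws s3 presign` or boto3's generate_presigned_url:

```python
# Schematic of how a pre-signed URL is assembled (NOT real SigV4).
import hashlib, hmac
from urllib.parse import urlencode

def sketch_presign(bucket, key, access_key, secret_key, region,
                   expires=3600, timestamp="20260101T000000Z"):
    scope = f"{timestamp[:8]}/{region}/s3/aws4_request"
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": timestamp,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    # Signing key: a chain of HMACs over date / region / service / terminator.
    sig_key = f"AWS4{secret_key}".encode()
    for part in (timestamp[:8], region, "s3", "aws4_request"):
        sig_key = hmac.new(sig_key, part.encode(), hashlib.sha256).digest()
    # Real SigV4 signs a full canonical request; this hashes only the params.
    to_sign = "\n".join([
        "AWS4-HMAC-SHA256", timestamp, scope,
        hashlib.sha256(urlencode(sorted(params.items())).encode()).hexdigest(),
    ])
    params["X-Amz-Signature"] = hmac.new(sig_key, to_sign.encode(),
                                         hashlib.sha256).hexdigest()
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}?{urlencode(params)}"

url = sketch_presign("my-bucket", "report.pdf", "AKIDEXAMPLE", "wJalr...", "us-east-1")
print("X-Amz-Expires=3600" in url)  # True
```

Because the expiry and credential scope are inside the signed material, tampering with either in the URL invalidates the signature.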
| Property | Detail |
|---|---|
| Expiry | Default 1 hour; max 7 days with SigV4. A URL signed with temporary credentials (an assumed role) stops working when the session token expires, even if its stated expiry is later |
| Signed with | Credentials of whoever generates it |
| Permissions | Inherits the generator's permissions at time of use |
| Use case | Share private files, temporary download links, video streaming |
If the generating IAM role is revoked, the pre-signed URL immediately stops working.
12. ARN / URI / URL Reference¶
| Format | Example | Used For |
|---|---|---|
| ARN | arn:aws:s3:::my-bucket | IAM policies, resource identification |
| ARN (object) | arn:aws:s3:::my-bucket/key | IAM policies scoped to object |
| S3 URI | s3://my-bucket/key | AWS CLI, SDK, internal AWS reference |
| HTTPS URL | https://my-bucket.s3.amazonaws.com/key | Browser access, public URLs |
| Path-style URL (legacy) | https://s3.amazonaws.com/my-bucket/key | Legacy — being deprecated |
| Transfer Acceleration URL | https://my-bucket.s3-accelerate.amazonaws.com/key | Global uploads via edge |
13. Storage Classes — Complete Reference ⭐¶
| Class | Retrieval | Min Duration | Durability | Use Case |
|---|---|---|---|---|
| Standard | Instant | None | 11 nines (3+ AZs) | Active data, frequent access |
| Intelligent-Tiering | Instant | None | 11 nines | Unknown/changing access patterns |
| Standard-IA | Instant | 30 days | 11 nines | Backups, monthly access |
| One Zone-IA | Instant | 30 days | 11 nines (single AZ) | Re-creatable data, secondary backups |
| Glacier Instant Retrieval | Milliseconds | 90 days | 11 nines | Quarterly access, compliance archives |
| Glacier Flexible Retrieval | 1 min–12 hrs | 90 days | 11 nines | Backups, disaster recovery |
| Glacier Deep Archive | 12–48 hrs | 180 days | 11 nines | Legal holds, 7–10yr retention |
Glacier Flexible Retrieval Options¶
| Speed | Time | Cost |
|---|---|---|
| Expedited | 1–5 minutes | Highest |
| Standard | 3–5 hours | Medium |
| Bulk | 5–12 hours | Free |
Bulk is free — for large planned restorations where timing doesn't matter, always use Bulk retrieval.
Intelligent-Tiering — How It Works¶
Automatically moves objects between tiers based on access:
Frequent Access tier ← default (Standard pricing)
Infrequent Access tier ← after 30 days of no access
Archive Instant Access ← after 90 days
Archive Access ← after 90–730 days (optional, activate it)
Deep Archive Access ← after 180–730 days (optional)
Monitoring fee: ~$0.0025 per 1,000 objects/month
(Not worth it for objects < 128 KB — AWS does not charge monitoring for those)
Use Intelligent-Tiering for data whose access patterns are unknown or change over time — especially large datasets like data lakes.
Cost Dimension (us-east-1, approximate)¶
| Class | Storage $/GB/month | Retrieval fee |
|---|---|---|
| Standard | $0.023 | None |
| Standard-IA | $0.0125 | Per-GB charge |
| Glacier Instant | $0.004 | Per-GB charge |
| Glacier Deep Archive | $0.00099 | Per-GB charge + time |
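The prices in the table make the trade-off concrete. A sketch using the approximate figures above (retrieval and request fees excluded, and real prices change over time):

```python
# Monthly storage cost using the approximate us-east-1 prices above
# (retrieval/request fees excluded; they matter a lot for IA/Glacier).
PRICE_PER_GB = {
    "Standard": 0.023,
    "Standard-IA": 0.0125,
    "Glacier Instant": 0.004,
    "Glacier Deep Archive": 0.00099,
}

def monthly_cost(gb, storage_class):
    return round(gb * PRICE_PER_GB[storage_class], 2)

for cls in PRICE_PER_GB:
    print(f"{cls:22s} 10 TB -> ${monthly_cost(10_240, cls):,.2f}/month")
# Standard ~ $235.52 vs Deep Archive ~ $10.14: roughly a 23x difference
```

The gap is why lifecycle transitions (next section) matter: parking cold data in the wrong class is one of the most common avoidable S3 costs.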
14. Lifecycle Policies ⭐¶
Automate transitions and deletions — no manual intervention:
Rule example: "logs/" prefix
Day 0: Upload → Standard
Day 30: Transition → Standard-IA
Day 90: Transition → Glacier Flexible Retrieval
Day 365: Permanently delete
Rule example: versioning cleanup
Noncurrent versions older than 30 days → delete
Incomplete multipart uploads older than 7 days → abort
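The two rule examples above, written as the lifecycle JSON that `put-bucket-lifecycle-configuration` accepts (a sketch: the rule IDs are invented; GLACIER is the API name for Glacier Flexible Retrieval):

```json
{
  "Rules": [
    {
      "ID": "logs-archive-and-expire",
      "Filter": { "Prefix": "logs/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    },
    {
      "ID": "version-and-mpu-cleanup",
      "Filter": {},
      "Status": "Enabled",
      "NoncurrentVersionExpiration": { "NoncurrentDays": 30 },
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
    }
  ]
}
```

An empty Filter applies the cleanup rule to the whole bucket, which is usually what you want for abandoned multipart uploads.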
Transition Constraints¶
✅ Allowed transitions (downward only):
Standard → IA → Glacier Instant → Glacier Flexible → Deep Archive
❌ NOT allowed:
Any Glacier → Standard/IA (need to restore first, then re-upload)
IA → Standard (manual action required)
Minimum object size for IA transitions: 128 KB
(Smaller objects are not transitioned — they're cheaper in Standard)
15. Replication ⭐¶
Types¶
| Type | Scope | Use Case |
|---|---|---|
| SRR (Same-Region Replication) | Same Region, different bucket | Log aggregation, prod→test copy |
| CRR (Cross-Region Replication) | Different Region | DR, latency, compliance |
Requirements & Behavior¶
Requirements:
- Versioning enabled on BOTH source and destination
- IAM role with s3:ReplicateObject permission
- Cross-account: destination bucket policy must allow source account
Behavior:
- Only replicates NEW objects after replication is enabled
- Existing objects: use S3 Batch Replication to copy them retroactively ([AWS docs](https://docs.aws.amazon.com/AmazonS3/latest/userguide/batch-ops.html))
- Deletes NOT replicated by default (can enable delete marker replication)
- Encrypted objects (SSE-KMS): must configure KMS key in destination region
Replication Time Control (RTC)¶
Optional paid add-on — guarantees 99.99% of objects replicated within 15 minutes (useful for compliance that requires near-real-time replication).
16. Event Notifications ⭐¶
Trigger downstream processing when objects change:
| Destination | Use Case |
|---|---|
| Lambda | Process file immediately (thumbnail, transcode, validate) |
| SNS | Fan-out to multiple subscribers |
| SQS | Queue for async/batch processing |
| EventBridge | Advanced filtering, multiple targets, replay |
Event types:
s3:ObjectCreated:* ← any upload method
s3:ObjectRemoved:* ← delete or delete marker
s3:ObjectRestore:* ← Glacier restore start/complete
s3:Replication:* ← replication events
EventBridge vs native S3 notifications:
- Native (Lambda/SNS/SQS): simpler setup, lower latency
- EventBridge: richer filtering, multiple targets, event archival and replay
For new architectures, prefer EventBridge.
17. S3 Select / Glacier Select¶
Query data inside an object without downloading the whole file:
File: sales.csv (10 GB CSV in S3)
Without S3 Select:
Download all 10 GB → filter in your app
With S3 Select:
SELECT * FROM S3Object WHERE year = '2026'
→ S3 filters server-side → returns ~10 MB
→ 99% less data transferred → faster + cheaper
Supported formats: CSV, JSON, Parquet (Parquet with columnar pushdown). Glacier Select: the same concept for archived Glacier objects.
Note: AWS stopped onboarding new customers to S3 Select in mid-2024; for new workloads, Athena is the usual alternative.
18. S3 Batch Operations¶
Run a single operation on billions of objects across buckets:
| Operation | Example |
|---|---|
| Copy objects | Migrate between buckets or accounts |
| Replace ACLs | Set uniform ACL across all objects |
| Add/replace tags | Tag all objects in a bucket |
| Invoke Lambda | Custom processing on each object |
| Restore from Glacier | Bulk restore millions of archived objects |
| Encrypt objects | Apply SSE-KMS to all objects retroactively |
Workflow:
1. Generate S3 Inventory report (list of target objects)
2. Create Batch Operations job (operation + manifest)
3. Review job summary (count, cost estimate)
4. Run job → S3 processes each object
5. Completion report saved to S3
19. Transfer Acceleration¶
Routes uploads/downloads through the nearest CloudFront Edge Location instead of going directly to the S3 Region:
User (London) → upload file → London Edge Location (fast, nearby)
→ AWS private backbone → S3 bucket (us-east-1)
(faster than: User → public internet all the way to us-east-1)
Transfer Acceleration URL:
https://my-bucket.s3-accelerate.amazonaws.com/key
| When It Helps | When It Doesn't |
|---|---|
| Users far from bucket Region | Users in same Region as bucket |
| Large files over long distances | Small files |
| Upload performance is bottleneck | Download (CloudFront is better for reads) |
20. Static Website Hosting¶
Requirements:
1. Bucket name = domain name (mywebsite.com)
2. Public access enabled (Block Public Access OFF)
3. Bucket policy: allow s3:GetObject for Principal: "*"
4. Index document: index.html
5. Error document: error.html
Website endpoint format:
http://my-bucket.s3-website-us-east-1.amazonaws.com
HTTPS: not supported natively on S3 website endpoint
→ Use CloudFront in front of S3 for HTTPS + custom domain
21. Requester Pays¶
Normally: bucket owner pays for storage AND data transfer. With Requester Pays enabled: the requester (downloader) pays for data transfer.
Use case: academic/research datasets, large public datasets
→ You make data available without paying for every download
→ Authenticated AWS users only (anonymous access not allowed)
22. VPC Endpoint for S3 (Gateway Endpoint)¶
Allows EC2/Lambda inside a VPC to access S3 without going through the internet:
Without VPC endpoint:
EC2 (private subnet) → NAT Gateway → Internet → S3
(NAT Gateway charges + internet traffic)
With VPC Gateway Endpoint:
EC2 (private subnet) → VPC Endpoint → S3
(free + stays on AWS backbone + no internet)
Configure via route table:
Add route: pl-xxxxx (S3 prefix list) → vpce-xxxxx
Gateway Endpoints for S3 are free — no hourly charge, no data processing charge. Use them for any workload that reads/writes S3 from within a VPC.
23. Common Mistakes¶
| ❌ Wrong | ✅ Correct |
|---|---|
| S3 has eventual consistency | S3 has strong read-after-write consistency since Dec 2020 |
| S3 "folders" are real directories | Folders are just key prefixes — zero-byte visual placeholders |
| Versioning can be disabled once enabled | Once enabled, can only be suspended — not disabled |
| Delete = permanent in versioned bucket | Delete adds a delete marker — versions still exist |
| Block Public Access off = public bucket | BPA off + no bucket policy = still private |
| SSE-C AWS stores your key | SSE-C: AWS uses your key to encrypt then discards it immediately |
| Replication copies existing objects | Replication only copies new objects after enabling |
| Glacier Bulk retrieval costs extra | Bulk retrieval is free |
| Min duration for Standard-IA = 90 days | Standard-IA minimum = 30 days; Glacier = 90 days; Deep Archive = 180 days |
| One performance limit per bucket | Performance is per prefix — 5,500 GET/s per prefix |
| Multipart upload optional for all sizes | Required for objects > 5 GB; recommended for > 100 MB |
| Object Lock can be enabled anytime | Must enable at bucket creation — cannot enable on existing bucket |
24. Interview Questions Checklist¶
- S3 vs EBS vs EFS — when do you use each?
- What is an object? What three components make it up?
- What does "globally unique bucket name" mean technically?
- Are S3 folders real? What is a prefix?
- Strong consistency — what changed in Dec 2020?
- What is multipart upload? When is it required vs recommended?
- How does S3 performance scale — what is the unit? (per prefix)
- Three access control layers in S3 — how do they stack?
- Block Public Access — account-level vs bucket-level?
- SSE-S3 vs SSE-KMS vs SSE-C — key difference?
- What is envelope encryption in SSE-KMS?
- Versioning states — three states, transitions between them?
- Delete in versioned bucket — what actually happens?
- MFA Delete — who can enable it? What does it protect?
- Object Lock — Governance vs Compliance mode?
- What is a legal hold? How is it different from retention period?
- Pre-signed URL — what permissions does it inherit?
- S3 Select — what problem does it solve?
- Storage classes — retrieval time and minimum duration for each
- Lifecycle transitions — which directions are allowed?
- Intelligent-Tiering — how does it work and what's the monitoring fee?
- Glacier Bulk retrieval — is it free? (Yes)
- Replication — what does it NOT replicate by default? (deletes, existing objects)
- S3 Batch Operations — what can it do?
- VPC Gateway Endpoint for S3 — what does it cost? (free)
- Transfer Acceleration — when does it help and when doesn't it?