AWS EFS & FSx¶
1. Where These Fit — Full AWS Storage Picture¶
EBS → Block storage → 1 EC2 only (mostly) → hard disk attached to one server
S3 → Object storage → HTTP API, any client → unlimited cloud bucket
EFS → File storage → many Linux EC2s at once → NFS shared network drive
FSx → File storage → specialist file systems → NFS/SMB/Lustre/ZFS for specific stacks
The single question that determines which to use:
Do multiple servers need to READ AND WRITE the same files simultaneously?
NO → EBS (block) or S3 (objects)
YES → EFS or FSx (shared file system)
YES + Linux workloads, simple NFS → EFS
YES + Windows / HPC / enterprise NAS → FSx
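The decision tree above can be sketched as a tiny helper. The function name and workload labels are illustrative only, not any AWS API:

```python
def choose_storage(shared_rw: bool, workload: str) -> str:
    """Pick an AWS storage family for a workload (sketch).

    shared_rw: do multiple servers need to READ AND WRITE the same
               files simultaneously?
    workload:  "linux-nfs", "windows", "hpc", or "enterprise-nas"
    """
    if not shared_rw:
        return "EBS or S3"   # block device for one server, or object store
    if workload == "linux-nfs":
        return "EFS"         # simple shared NFS for Linux fleets
    return "FSx"             # Windows / HPC / enterprise NAS specialists
```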
Part 1 — Amazon EFS (Elastic File System)¶
2. What is EFS?¶
Amazon EFS is a fully managed, serverless, elastic NFS (Network File System) for Linux. It scales capacity automatically — you never provision storage size. Multiple EC2 instances, containers (ECS/EKS), and Lambda functions across multiple AZs can mount and access the same file system simultaneously.
```
    AZ-1              AZ-2              AZ-3
┌──────────┐      ┌──────────┐      ┌──────────┐
│  EC2 #1  │      │  EC2 #2  │      │  EC2 #3  │
└────┬─────┘      └────┬─────┘      └────┬─────┘
     │                 │                 │
┌────▼─────────────────▼─────────────────▼─────┐
│               EFS File System                │
│     (Mount Targets in each AZ's subnet)      │
└──────────────────────────────────────────────┘
```
| Property | Value |
|---|---|
| Protocol | NFSv4.0 / NFSv4.1 |
| OS support | Linux only (not Windows) |
| Capacity | Elastic — grows/shrinks automatically |
| Availability | Regional (3+ AZs) or One Zone (single AZ) |
| Durability | 99.999999999% (11 nines) — Regional |
| Concurrent access | Thousands of instances simultaneously |
3. EFS File System Types¶
Regional (Multi-AZ) — Default¶
- Data stored redundantly across 3+ AZs
- Mount target created in each AZ's subnet
- If one AZ fails → instances in other AZs continue unaffected
- Use: production workloads requiring high availability
One Zone (Single-AZ)¶
- Data stored in a single AZ
- Lower cost (~47% cheaper than Regional Standard)
- Mount target in one subnet only
- Use: dev/test, non-critical data, data you can recreate
- Risk: AZ failure = data unavailable (or lost if AZ is permanently destroyed)
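Creating the per-AZ mount targets described above comes down to one boto3 `create_mount_target` call per subnet. This helper only builds the request dictionaries; the IDs below are placeholders:

```python
def mount_target_requests(fs_id: str, subnet_ids: list, security_group: str) -> list:
    """Build one efs.create_mount_target request per AZ subnet (sketch)."""
    return [
        {
            "FileSystemId": fs_id,          # e.g. "fs-12345678"
            "SubnetId": subnet_id,          # one subnet per AZ
            "SecurityGroups": [security_group],  # must allow NFS port 2049
        }
        for subnet_id in subnet_ids
    ]

# Usage (requires AWS credentials; not run here):
# import boto3
# efs = boto3.client("efs")
# for req in mount_target_requests("fs-12345678",
#                                  ["subnet-a", "subnet-b", "subnet-c"],
#                                  "sg-0123456789abcdef0"):
#     efs.create_mount_target(**req)
```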
4. EFS Storage Classes ⭐¶
EFS automatically moves files between storage classes based on access patterns:
| Storage Class | Latency | Cost | For |
|---|---|---|---|
| EFS Standard | ~1ms read / ~2.7ms write | Highest | Frequently accessed files |
| EFS Standard-IA (Infrequent Access) | Tens of ms | ~92% lower storage cost | Rarely accessed files |
| EFS Archive | Tens of ms | Lowest | Files accessed a few times per year |
| One Zone | ~1ms read / ~1.6ms write | ~47% less than Regional | Single-AZ, frequently accessed |
| One Zone-IA | Tens of ms | Lowest overall | Single-AZ, infrequently accessed |
Intelligent Tiering — Lifecycle Management¶
```
Enable lifecycle policy → EFS automatically transitions files:

  After 30 days no access → Standard → Standard-IA
  After 90 days no access → Standard-IA → Archive

  First access after transition → file moves back to Standard (configurable)

Similar to S3 Intelligent-Tiering but for file systems.
```
A retrieval fee applies when reading from IA/Archive classes. For files accessed frequently, keep them in Standard to avoid per-read charges.
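The lifecycle transitions can be modeled as a lookup on days since last access. A minimal sketch assuming the 30-day IA and 90-day Archive policies above (any access resets the clock, moving the file back toward Standard):

```python
def efs_storage_class(days_since_last_access: int) -> str:
    """Which class a file sits in under a 30-day IA / 90-day Archive
    lifecycle policy (days counted from the file's last access)."""
    if days_since_last_access < 30:
        return "Standard"
    if days_since_last_access < 90:
        return "Standard-IA"
    return "Archive"
```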
5. Performance Modes¶
General Purpose (Default — Always Use This)¶
```
Lowest per-operation latency
Supports all throughput modes
One Zone file systems always use General Purpose

Recommended for: 99%+ of workloads
```
Max I/O (Legacy — Avoid)¶
```
Designed for highly parallelized workloads
BUT: higher per-operation latency than General Purpose
NOT supported for: One Zone file systems or Elastic throughput

AWS Recommendation: "Due to higher per-operation latencies with Max I/O,
we recommend using General Purpose performance mode for all file systems."

Monitor the PercentIOLimit CloudWatch metric — if consistently near 100%,
switch to Elastic throughput instead of Max I/O mode.
```
6. Throughput Modes ⭐¶
Throughput mode controls how much throughput your file system can drive:
Elastic Throughput (Recommended — Default)¶
```
Automatically scales throughput up and down with your workload
No capacity planning needed — you pay per GB read/written

Best for:
  Spiky or unpredictable workloads
  Average-to-peak ratio of 5% or less
  New file systems where patterns are unknown

Performance (Regional + Elastic + General Purpose):
  Read latency:  ~1 ms
  Write latency: ~2.7 ms
  Max read IOPS:  900,000–2,500,000
  Max write IOPS: 500,000
  Max read throughput (per file system):  20–60 GiBps
  Max write throughput (per file system): 1–5 GiBps
  Max per-client: 1,500 MiBps (with amazon-efs-utils v2.0+)
```
Provisioned Throughput¶
```
You specify a fixed throughput level regardless of file system size
You pay for the provisioned amount above baseline

Best for:
  Known, steady workloads
  Average-to-peak ratio of 5% or more

Performance (Regional + Provisioned):
  Max read IOPS:  55,000
  Max write IOPS: 25,000
  Max read throughput:  3–10 GiBps
  Max write throughput: 1–3.33 GiBps
  Max per-client: 500 MiBps

Note: after switching to Provisioned or changing the Provisioned amount,
you must wait 24 hours before switching back to Elastic/Bursting.
```
Bursting Throughput¶
```
Throughput scales proportionally to storage size in the Standard class
Accumulates burst credits when idle → spends credits when busy

Baseline: 50 KiBps per GiB of Standard storage
Burst:    100 MiBps per TiB of Standard storage

Example (100 GiB Standard storage):
  Baseline: 5 MiBps continuous write
  Burst:    100 MiBps write for 72 minutes/day (on full credit balance)

Example (1 TiB Standard storage):
  Baseline: 50 MiBps write
  Burst:    100 MiBps write for 12 hours/day

Performance (Regional + Bursting):
  Max read IOPS:  35,000
  Max write IOPS: 7,000
  Max read throughput:  3–5 GiBps
  Max write throughput: 1–3 GiBps

Best for: workloads with long quiet periods followed by bursts
Avoid:    if throughput is consistently high (credits will be exhausted)
```
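The burst budget follows from credit arithmetic: credits accrue at the baseline rate and drain at the burst rate, so the sustainable burst window is baseline/burst of each day. A minimal sketch (the helper name is illustrative; exact GiB arithmetic gives ~4.88 MiBps baseline for 100 GiB, which AWS's worked examples round to 5 MiBps):

```python
def bursting_profile(standard_gib: float) -> dict:
    """Baseline and daily burst budget for EFS Bursting mode (sketch).

    Baseline: 50 KiBps per GiB of Standard storage.
    Burst:    100 MiBps per TiB, with a 100 MiBps floor for small
              file systems.
    """
    baseline_mibps = standard_gib * 50 / 1024            # KiBps → MiBps
    burst_mibps = max(100.0, 100.0 * standard_gib / 1024)
    # Credits earned per day at baseline fund this fraction of burst time:
    burst_minutes_per_day = baseline_mibps / burst_mibps * 24 * 60
    return {
        "baseline_mibps": baseline_mibps,
        "burst_mibps": burst_mibps,
        "burst_minutes_per_day": burst_minutes_per_day,
    }
```

For 1 TiB this reproduces the example above exactly: 50 MiBps baseline, 100 MiBps burst for half the day (12 hours).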
Throughput Mode Comparison¶
| Mode | Scales With | Best For | Pricing |
|---|---|---|---|
| Elastic | Workload automatically | Spiky, unpredictable | Per GB read/written |
| Provisioned | Your specification | Steady, known patterns | Per MiBps provisioned |
| Bursting | Storage size + credits | Large files, infrequent bursts | Included in storage cost |
7. Mounting EFS on Linux¶
```bash
# Install EFS mount helper (amazon-efs-utils)
sudo yum install -y amazon-efs-utils       # Amazon Linux / RHEL
sudo apt-get install -y amazon-efs-utils   # Ubuntu / Debian

# Mount using EFS mount helper (recommended — handles TLS + retries)
sudo mount -t efs fs-12345678:/ /mnt/efs

# Mount with TLS encryption in transit
sudo mount -t efs -o tls fs-12345678:/ /mnt/efs

# Mount using NFS directly (alternative)
sudo mount -t nfs4 \
  -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
  fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs

# Auto-mount on boot — add to /etc/fstab:
# fs-12345678:/ /mnt/efs efs defaults,_netdev 0 0
```
For EKS (Kubernetes): Use the aws-efs-csi-driver — creates PersistentVolume backed by EFS; multiple pods read/write simultaneously using ReadWriteMany access mode (not possible with EBS).
8. EFS Access Points¶
Access points enforce a specific directory, POSIX user/group, and file permissions for application access:
```
EFS Root: /
├── /app1   ← Access Point A (uid:1001, gid:1001, root path /app1)
├── /app2   ← Access Point B (uid:1002, gid:1002, root path /app2)
└── /logs   ← Access Point C (uid:1000, gid:1000, root path /logs)

App1 mounts via Access Point A → sees only /app1, cannot access /app2
App2 mounts via Access Point B → sees only /app2

Benefit:  multi-tenant isolation on one EFS file system
Use case: Lambda functions, containerized apps needing scoped access
```
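A sketch of creating such an access point with boto3: the helper only builds the request for `efs.create_access_point`, and the IDs, uid/gid, and mode are placeholder values. `CreationInfo` tells EFS to create the root directory with that owner if it does not already exist:

```python
def access_point_request(fs_id: str, path: str, uid: int, gid: int) -> dict:
    """Build kwargs for boto3 efs.create_access_point (sketch)."""
    return {
        "FileSystemId": fs_id,
        "PosixUser": {"Uid": uid, "Gid": gid},   # identity enforced on NFS ops
        "RootDirectory": {
            "Path": path,                        # clients see this as "/"
            "CreationInfo": {
                "OwnerUid": uid,
                "OwnerGid": gid,
                "Permissions": "750",            # example directory mode
            },
        },
    }

# Usage (requires AWS credentials; not run here):
# import boto3
# boto3.client("efs").create_access_point(
#     **access_point_request("fs-12345678", "/app1", 1001, 1001))
```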
9. EFS Security¶
| Layer | Mechanism |
|---|---|
| Network access | Mount targets in VPC subnets; Security Groups control port 2049 (NFS) |
| Identity | IAM policies + EFS resource policy |
| Encryption at rest | KMS-managed keys (enable at creation — cannot change later) |
| Encryption in transit | TLS 1.2+ via EFS mount helper (-o tls) |
| POSIX permissions | Standard Linux file/directory permissions (uid/gid) |
| Access Points | Application-level isolation |
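In-transit encryption can be enforced at the file-system level with a resource policy that denies any non-TLS client. A sketch of the policy document (the ARN is a placeholder; this mirrors the "enforce in-transit encryption" pattern, which keys on the `aws:SecureTransport` condition):

```python
import json

def enforce_tls_policy(fs_arn: str) -> str:
    """EFS file-system resource policy denying non-TLS clients (sketch)."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Deny",
            "Principal": {"AWS": "*"},
            "Action": "*",
            "Resource": fs_arn,
            # Matches clients that did NOT mount with TLS (-o tls)
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }],
    }
    return json.dumps(policy)

# Usage (requires AWS credentials; not run here):
# import boto3
# boto3.client("efs").put_file_system_policy(
#     FileSystemId="fs-12345678",
#     Policy=enforce_tls_policy(
#         "arn:aws:elasticfilesystem:us-east-1:123456789012:file-system/fs-12345678"))
```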
10. EFS Use Cases¶
| Use Case | Why EFS |
|---|---|
| Kubernetes persistent storage | ReadWriteMany — multiple pods share same volume |
| WordPress / CMS media files | Multiple web servers need same uploaded images |
| CI/CD build artifacts | Multiple build agents share workspace |
| Machine learning training data | Multiple training instances read same dataset |
| Home directories | Each user gets their own directory on shared EFS |
| Container storage (ECS/EKS) | Tasks share a filesystem across AZs |
Part 2 — Amazon FSx¶
11. What is FSx?¶
Amazon FSx provides fully managed, third-party file systems — you get the exact file system you're familiar with (Lustre, ONTAP, ZFS, Windows), managed by AWS. Choose FSx when your workload requires a specific file system that EFS (NFS-only, Linux-only) cannot serve.
Four FSx file systems:
- FSx for Windows File Server → Windows SMB workloads
- FSx for Lustre → HPC, ML, high-throughput Linux
- FSx for NetApp ONTAP → Enterprise multi-protocol NAS
- FSx for OpenZFS → ZFS Linux workloads, low latency
12. FSx for Windows File Server ⭐¶
What It Is¶
Fully managed Windows file system backed by Windows Server with full SMB (Server Message Block) protocol support and Active Directory integration.
- Protocol: SMB 2.0, 2.1, 3.0, 3.1.1
- Clients: Windows, Linux (via CIFS), macOS
- Auth: Microsoft Active Directory (AWS Managed AD or self-managed)
Key Features¶
| Feature | Detail |
|---|---|
| Active Directory | Native integration — users log in with Windows credentials |
| NTFS permissions | Full Windows ACL support |
| DFS Namespaces | Distribute files across multiple FSx file systems |
| Shadow Copies | Previous versions — users self-restore files |
| SMB Multichannel | Multiple network connections for higher throughput |
| Deployment | Single-AZ or Multi-AZ (99.99% availability SLA) |
| Max throughput | 12–20 GB/s per file system |
| Max file system | 64 TiB |
| Latency | < 1 ms |
| Storage | SSD (low latency) or HDD (cost-optimized) |
When to Use¶
- Lift-and-shift Windows applications to AWS
- .NET apps needing Windows file shares
- SQL Server, home directories, user profiles
- Any workload requiring Windows ACLs or Active Directory
13. FSx for Lustre ⭐¶
What It Is¶
Fully managed Lustre — the world's most popular high-performance parallel file system, used in the largest supercomputers and ML clusters. Linux-only, extremely high throughput.
- Protocol: Custom POSIX-compliant (Lustre protocol)
- Clients: Linux only
- Auth: POSIX permissions
Performance¶
| Metric | Value |
|---|---|
| Max throughput per file system | 1,000 GB/s |
| Max per-client throughput | 150 GB/s |
| Max IOPS | Millions |
| Latency | < 1 ms |
FSx for Lustre throughput (1,000 GB/s) is the highest of any FSx file system — 10–70× higher than the others. Built specifically for data-intensive workloads.
Deployment Types¶
```
Scratch (Temporary):
  No replication within AZ
  Data NOT preserved if file server fails
  Higher burst throughput
  Use: short-term processing, cost-sensitive HPC

Persistent (Long-term):
  Data replicated within single AZ
  File server failures are auto-recovered
  Use: long-running workloads, ML training runs
```
S3 Integration ⭐¶
```
FSx for Lustre can be linked to an S3 bucket:
  Import: data in S3 automatically imported to Lustre on first access (lazy loading)
  Export: results written back to S3 automatically

Pattern for ML training:
  Training data in S3 (cheap, durable)
  → Link to FSx for Lustre (high-speed scratch during training)
  → Model output exported back to S3
  → Delete FSx after training (pay only during training job)
```
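The S3-linked pattern above can be sketched as the request for boto3 `fsx.create_file_system`: a Scratch deployment with import/export paths. Bucket and subnet names are placeholders, and `SCRATCH_2` requires at least 1,200 GiB:

```python
def lustre_s3_request(subnet_id: str, bucket: str) -> dict:
    """Build kwargs for boto3 fsx.create_file_system linking S3 (sketch).

    Scratch deployment: cheap, unreplicated — fits the
    "spin up for a training job, then delete" pattern.
    """
    return {
        "FileSystemType": "LUSTRE",
        "StorageCapacity": 1200,                     # GiB, SCRATCH_2 minimum
        "SubnetIds": [subnet_id],
        "LustreConfiguration": {
            "DeploymentType": "SCRATCH_2",
            "ImportPath": f"s3://{bucket}",          # lazy-load data from S3
            "ExportPath": f"s3://{bucket}/results",  # write results back
        },
    }

# Usage (requires AWS credentials; not run here):
# import boto3
# boto3.client("fsx").create_file_system(
#     **lustre_s3_request("subnet-0abc123", "my-training-data"))
```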
When to Use¶
- Machine learning training on large datasets
- High-performance computing (genomics, financial simulations, weather modeling)
- Video rendering and transcoding
- Seismic data processing
14. FSx for NetApp ONTAP ⭐¶
What It Is¶
Fully managed NetApp ONTAP — the most feature-rich FSx option, supporting multiple protocols simultaneously.
- Protocols: NFS (3, 4.0, 4.1, 4.2) + SMB (2.0–3.1.1) + iSCSI (block storage)
- Clients: Windows, Linux, macOS — simultaneously
Performance¶
| Metric | Value |
|---|---|
| Max throughput per file system | 72–80 GB/s |
| Max per-client throughput | 18 GB/s |
| Max IOPS | Millions |
| Max file system size | Virtually unlimited (10s of PBs) |
| Latency | < 1 ms |
Unique Capabilities¶
| Feature | Description |
|---|---|
| Multi-protocol | Same data accessed via NFS (Linux) AND SMB (Windows) simultaneously |
| FlexClone | Instant zero-copy clones of volumes (no data duplication) |
| SnapMirror | Cross-region replication to on-premises NetApp or another FSx |
| Auto-tiering | Hot data on SSD, cold data automatically moved to cheaper storage tier |
| Data deduplication | Removes duplicate blocks — reduces storage consumption |
| iSCSI | Block storage accessible as SAN (Storage Area Network) |
| Anti-virus integration | Native virus scanning support |
| Deployment | Single-AZ (99.9%) or Multi-AZ (99.99%) |
| On-prem caching | NetApp FlexCache — cache AWS data on-premises |
When to Use¶
- Lift-and-shift existing NetApp ONTAP NAS to AWS
- Multi-protocol workloads (Windows + Linux accessing same files)
- Enterprise NAS migration
- Complex data management (cloning, replication, tiering)
- Any workload where you're already using NetApp on-premises
15. FSx for OpenZFS ⭐¶
What It Is¶
Fully managed OpenZFS — an open-source ZFS file system known for data integrity, inline compression, and the lowest latency of any FSx option.
- Protocol: NFS (3, 4.0, 4.1, 4.2)
- Clients: Windows, Linux, macOS
Performance¶
| Metric | Value |
|---|---|
| Latency | < 0.5 ms (lowest of all FSx types) |
| Max throughput per file system | 10–21 GB/s |
| Max per-client throughput | 10 GB/s |
| Max IOPS | 1–2 million |
| Max file system size | 512 TiB |
Key Features¶
| Feature | Description |
|---|---|
| Instant snapshots | Point-in-time snapshots, space-efficient |
| FlexClone-equivalent | Instant zero-copy clones |
| Inline compression | Reduces storage cost automatically |
| Deployment | Single-AZ (99.5%) or Multi-AZ (99.99%) |
| Cross-region backups | ✅ Supported |
| End-user restore | Users can restore previous file versions |
When to Use¶
- Lift-and-shift ZFS workloads to AWS
- Linux-based file servers needing low latency
- Development/test environments needing instant cloning
- Any workload needing sub-millisecond NFS latency
16. FSx — Full Comparison Table¶
| Feature | Windows FS | Lustre | NetApp ONTAP | OpenZFS |
|---|---|---|---|---|
| Protocol | SMB | Lustre (custom) | NFS + SMB + iSCSI | NFS |
| OS clients | Win, Linux, Mac | Linux only | Win, Linux, Mac | Win, Linux, Mac |
| Max throughput | 12–20 GB/s | 1,000 GB/s | 72–80 GB/s | 10–21 GB/s |
| Latency | < 1 ms | < 1 ms | < 1 ms | < 0.5 ms |
| Max IOPS | Hundreds of thousands | Millions | Millions | 1–2 million |
| Max size | 64 TiB | Multiple PBs | Virtually unlimited | 512 TiB |
| Multi-AZ SLA | 99.99% | ❌ (Single-AZ) | 99.99% | 99.99% |
| Active Directory | ✅ | ❌ | ✅ | ❌ |
| S3 integration | ❌ | ✅ (auto import/export) | ❌ | ❌ |
| Data deduplication | ✅ | ❌ | ✅ | ❌ |
| Instant snapshots | ✅ | ❌ | ✅ | ✅ |
| Cross-region replication | ✅ | ✅ (via S3) | ✅ (SnapMirror) | ✅ |
| Use case | Windows apps | ML/HPC | Enterprise NAS | ZFS/Linux |
17. EFS vs FSx — When to Use Which ⭐¶
| If you need... | Use |
|---|---|
| Shared Linux filesystem, elastic, simple | EFS |
| Multiple pods in Kubernetes sharing storage | EFS (ReadWriteMany) |
| Windows applications, SMB, Active Directory | FSx for Windows |
| ML training, HPC, highest possible throughput | FSx for Lustre |
| S3 as dataset, fast processing, export results | FSx for Lustre |
| Migrate existing NetApp ONTAP NAS to AWS | FSx for NetApp ONTAP |
| Windows AND Linux accessing same files | FSx for NetApp ONTAP |
| Migrate ZFS workloads, sub-ms latency NFS | FSx for OpenZFS |
| Dev/test cloning, snapshot-heavy workflows | FSx for NetApp ONTAP or OpenZFS |
18. EFS vs EBS vs S3 — Complete Storage Comparison¶
| Feature | EBS | EFS | S3 |
|---|---|---|---|
| Type | Block | File (NFS) | Object |
| Access | 1 EC2 (mostly) | Many EC2s simultaneously | HTTP API |
| OS support | Linux + Windows | Linux only | Any |
| Mount | ✅ Block device | ✅ NFS mount | ❌ Not mountable |
| Elastic capacity | ❌ (fixed size) | ✅ Auto-grows/shrinks | ✅ Unlimited |
| Multi-AZ | ❌ (per AZ, unless io2 Multi-Attach) | ✅ Regional | ✅ (≥3 AZs) |
| Use case | Root volume, DB | Shared files, Kubernetes | Backups, web assets, data lake |
| Max size | 64 TiB per volume | Unlimited | Unlimited |
| Cost model | GB provisioned | GB stored + throughput | GB stored + requests |
19. Common Mistakes ✅¶
| ❌ Wrong | ✅ Correct |
|---|---|
| EFS works on Windows | EFS is Linux-only (NFS) — use FSx for Windows for Windows clients |
| EFS needs pre-provisioned storage size | EFS is elastic — capacity grows/shrinks automatically |
| Max I/O mode is always better for heavy workloads | AWS recommends General Purpose for everything — Max I/O has higher latency |
| Bursting throughput is the best mode | Elastic is the recommended default — Bursting can exhaust credits |
| FSx for Lustre supports Windows clients | FSx for Lustre is Linux-only — for Windows use FSx for Windows or ONTAP |
| FSx for Lustre data is always durable | Scratch deployment has no replication — data can be lost on failure |
| EFS and EBS can be used interchangeably | They solve different problems — EBS is block storage for one instance; EFS is a shared network file system for many |