AWS EFS & FSx

1. Where These Fit — Full AWS Storage Picture

EBS   → Block storage  → 1 EC2 only (mostly)     → hard disk attached to one server
S3    → Object storage → HTTP API, any client    → unlimited cloud bucket
EFS   → File storage   → many Linux EC2s at once → NFS shared network drive
FSx   → File storage   → specialist file systems → NFS/SMB/Lustre/ZFS for specific stacks

The single question that determines which to use:

Do multiple servers need to READ AND WRITE the same files simultaneously?
  NO  → EBS (block) or S3 (objects)
  YES → EFS or FSx (shared file system)

  YES + Linux workloads, simple NFS → EFS
  YES + Windows / HPC / enterprise NAS → FSx
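
The decision above can be encoded as a tiny helper (an illustrative sketch; the function name and flags are made up for this note, not an AWS API):

```python
def pick_storage(shared_rw: bool, windows_or_hpc: bool = False) -> str:
    """Encode the storage decision tree: shared read/write first, then stack."""
    if not shared_rw:
        return "EBS or S3"   # one writer: block device, or objects over HTTP
    # multiple servers read/write the same files simultaneously
    return "FSx" if windows_or_hpc else "EFS"

print(pick_storage(shared_rw=False))                      # EBS or S3
print(pick_storage(shared_rw=True))                       # EFS
print(pick_storage(shared_rw=True, windows_or_hpc=True))  # FSx
```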

Part 1 — Amazon EFS (Elastic File System)


2. What is EFS?

Amazon EFS is a fully managed, serverless, elastic NFS (Network File System) for Linux. It scales capacity automatically — you never provision storage size. Multiple EC2 instances, containers (ECS/EKS), and Lambda functions across multiple AZs can mount and access the same file system simultaneously.

```
    AZ-1            AZ-2            AZ-3
┌──────────┐   ┌──────────┐   ┌──────────┐
│  EC2 #1  │   │  EC2 #2  │   │  EC2 #3  │
└────┬─────┘   └────┬─────┘   └────┬─────┘
     │              │              │
┌────▼──────────────▼──────────────▼─────┐
│             EFS File System            │
│   (Mount Targets in each AZ's subnet)  │
└────────────────────────────────────────┘
```

| Property | Value |
|---|---|
| Protocol | NFSv4.0 / NFSv4.1 |
| OS support | Linux only (not Windows) |
| Capacity | Elastic — grows/shrinks automatically |
| Availability | Regional (3+ AZs) or One Zone (single AZ) |
| Durability | 99.999999999% (11 nines) — Regional |
| Concurrent access | Thousands of instances simultaneously |

3. EFS File System Types

Regional (Multi-AZ) — Default

  • Data stored redundantly across 3+ AZs
  • Mount target created in each AZ's subnet
  • If one AZ fails → instances in other AZs continue unaffected
  • Use: production workloads requiring high availability

One Zone (Single-AZ)

  • Data stored in a single AZ
  • Lower cost (~47% cheaper than Regional Standard)
  • Mount target in one subnet only
  • Use: dev/test, non-critical data, data you can recreate
  • Risk: AZ failure = data unavailable (or lost if the AZ is permanently destroyed)


4. EFS Storage Classes ⭐

EFS automatically moves files between storage classes based on access patterns:

| Storage Class | Latency | Cost | For |
|---|---|---|---|
| EFS Standard | ~1 ms read / ~2.7 ms write | Highest | Frequently accessed files |
| EFS Standard-IA (Infrequent Access) | Tens of ms | ~92% lower storage cost | Rarely accessed files |
| EFS Archive | Tens of ms | Lowest | Files accessed a few times per year |
| One Zone | ~1 ms read / ~1.6 ms write | ~47% less than Regional | Single-AZ, frequently accessed |
| One Zone-IA | Tens of ms | Lowest overall | Single-AZ, infrequently accessed |

Intelligent Tiering — Lifecycle Management

```
Enable lifecycle policy → EFS automatically transitions files:

  After 30 days no access → Standard → Standard-IA
  After 90 days no access → Standard-IA → Archive

  First access after transition → file moves back to Standard (configurable)

Similar to S3 Intelligent-Tiering but for file systems.
```

A retrieval fee applies when reading from IA/Archive classes. For files accessed frequently, keep them in Standard to avoid per-read charges.
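
To see why the retrieval fee matters, here is a back-of-envelope break-even check. The prices are hypothetical placeholders (not current AWS rates; check the EFS pricing page for your region), and the function names are invented for this sketch:

```python
# HYPOTHETICAL per-GB prices, for illustration only -- not AWS's actual rates.
STD_PER_GB   = 0.30    # $/GB-month in Standard
IA_PER_GB    = 0.025   # $/GB-month in Standard-IA
IA_RETRIEVAL = 0.03    # $ per GB read back from IA

def monthly_cost_standard(size_gb: float) -> float:
    return size_gb * STD_PER_GB

def monthly_cost_ia(size_gb: float, gb_read_per_month: float) -> float:
    # IA trades cheap storage for a per-read retrieval fee
    return size_gb * IA_PER_GB + gb_read_per_month * IA_RETRIEVAL

size = 100.0  # GB of data
print(monthly_cost_ia(size, gb_read_per_month=10) < monthly_cost_standard(size))    # True: rarely read, IA wins
print(monthly_cost_ia(size, gb_read_per_month=2000) < monthly_cost_standard(size))  # False: heavy re-reads erase the savings
```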


5. Performance Modes

General Purpose (Default — Always Use This)

```
Lowest per-operation latency
Supports all throughput modes
One Zone file systems always use General Purpose

Recommended for: 99%+ of workloads
```

Max I/O (Legacy — Avoid)

```
Designed for highly parallelized workloads
BUT: higher per-operation latency than General Purpose
NOT supported for: One Zone file systems or Elastic throughput

AWS recommendation: "Due to higher per-operation latencies with Max I/O,
we recommend using General Purpose performance mode for all file systems."

Monitor the PercentIOLimit CloudWatch metric — if it is consistently near 100%,
switch to Elastic throughput instead of Max I/O mode.
```


6. Throughput Modes ⭐

Throughput mode controls how much throughput your file system can drive:

Elastic Throughput (Recommended Default)

```
Automatically scales throughput up and down with your workload
No capacity planning needed — you pay per GB read/written

Best for:
  Spiky or unpredictable workloads
  Average-to-peak ratio of 5% or less
  New file systems where patterns are unknown

Performance (Regional + Elastic + General Purpose):
  Read latency:  ~1 ms
  Write latency: ~2.7 ms
  Max read IOPS:  900,000–2,500,000
  Max write IOPS: 500,000
  Max read throughput (per file system):  20–60 GiBps
  Max write throughput (per file system): 1–5 GiBps
  Max per-client: 1,500 MiBps (with amazon-efs-utils v2.0+)
```

Provisioned Throughput

```
You specify a fixed throughput level regardless of file system size
You pay for the provisioned amount above the baseline

Best for:
  Known, steady workloads
  Average-to-peak ratio of 5% or more

Performance (Regional + Provisioned):
  Max read IOPS:  55,000
  Max write IOPS: 25,000
  Max read throughput:  3–10 GiBps
  Max write throughput: 1–3.33 GiBps
  Max per-client: 500 MiBps

Note: after switching to Provisioned or changing the Provisioned amount,
you must wait 24 hours before switching back to Elastic/Bursting.
```

Bursting Throughput

```
Throughput scales proportionally to storage size in the Standard class
Accumulates burst credits when idle → spends credits when busy

Baseline: 50 KiBps per GiB of Standard storage
Burst:    100 MiBps per TiB of Standard storage

Example (100 GiB Standard storage):
  Baseline: 5 MiBps continuous write
  Burst:    100 MiBps write for 72 minutes/day (on a full credit balance)

Example (1 TiB Standard storage):
  Baseline: 50 MiBps write
  Burst:    100 MiBps write for 12 hours/day

Performance (Regional + Bursting):
  Max read IOPS:  35,000
  Max write IOPS: 7,000
  Max read throughput:  3–5 GiBps
  Max write throughput: 1–3 GiBps

Best for: workloads with long quiet periods followed by bursts
Avoid:    if throughput is consistently high (credits will be exhausted)
```
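
The bursting arithmetic can be checked with a few lines (a sketch based on the rates quoted in this section, not an official AWS formula; the "100 GiB" example uses the round figure of 0.1 TiB):

```python
def bursting_profile(storage_tib: float):
    """Baseline 50 MiBps per TiB; burst 100 MiBps per TiB (minimum 100 MiBps).
    Credits accrue at the baseline rate, so the sustainable burst fraction
    is baseline / burst; multiply by 1440 to get burst minutes per day."""
    baseline_mibps = 50.0 * storage_tib
    burst_mibps = max(100.0, 100.0 * storage_tib)
    burst_min_per_day = min(1440.0, baseline_mibps / burst_mibps * 1440.0)
    return baseline_mibps, burst_mibps, burst_min_per_day

print(bursting_profile(0.1))  # 5 MiBps baseline, 100 MiBps burst, 72 min/day
print(bursting_profile(1.0))  # 50 MiBps baseline, 100 MiBps burst, 720 min = 12 h/day
```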

Throughput Mode Comparison

| Mode | Scales With | Best For | Pricing |
|---|---|---|---|
| Elastic | Workload, automatically | Spiky, unpredictable | Per GB read/written |
| Provisioned | Your specification | Steady, known patterns | Per MiBps provisioned |
| Bursting | Storage size + credits | Large files, infrequent bursts | Included in storage cost |

7. Mounting EFS on Linux

```bash
# Install the EFS mount helper (amazon-efs-utils)
sudo yum install -y amazon-efs-utils       # Amazon Linux / RHEL
sudo apt-get install -y amazon-efs-utils   # Ubuntu / Debian

# Mount using the EFS mount helper (recommended — handles TLS + retries)
sudo mount -t efs fs-12345678:/ /mnt/efs

# Mount with TLS encryption in transit
sudo mount -t efs -o tls fs-12345678:/ /mnt/efs

# Mount using NFS directly (alternative)
sudo mount -t nfs4 \
  -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
  fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs

# Auto-mount on boot — add this line to /etc/fstab:
# fs-12345678:/ /mnt/efs efs defaults,_netdev 0 0
```

For EKS (Kubernetes): Use the aws-efs-csi-driver — creates PersistentVolume backed by EFS; multiple pods read/write simultaneously using ReadWriteMany access mode (not possible with EBS).


8. EFS Access Points

Access points enforce a specific directory, POSIX user/group, and file permissions for application access:

```
EFS Root: /
├── /app1  ← Access Point A (uid:1001, gid:1001, root path /app1)
├── /app2  ← Access Point B (uid:1002, gid:1002, root path /app2)
└── /logs  ← Access Point C (uid:1000, gid:1000, root path /logs)

App1 mounts via Access Point A → sees only /app1, cannot access /app2
App2 mounts via Access Point B → sees only /app2

Benefit:  multi-tenant isolation on one EFS file system
Use case: Lambda functions, containerized apps needing scoped access
```
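
The scoping behavior can be modeled in a few lines. This is a toy model of the semantics only (not AWS code; the class and method names are invented):

```python
from pathlib import PurePosixPath

class AccessPoint:
    """Toy model: every client path resolves beneath the access point's
    root directory, so a client can never reach a sibling app's files."""
    def __init__(self, root: str, uid: int, gid: int):
        self.root = PurePosixPath(root)
        self.uid, self.gid = uid, gid   # POSIX identity applied to all access

    def resolve(self, client_path: str) -> PurePosixPath:
        p = PurePosixPath(client_path.lstrip("/"))
        if ".." in p.parts:             # no escaping the scoped root
            raise PermissionError("path escapes access point root")
        return self.root / p

app1 = AccessPoint("/app1", uid=1001, gid=1001)
print(app1.resolve("/uploads/cat.png"))   # /app1/uploads/cat.png
```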


9. EFS Security

| Layer | Mechanism |
|---|---|
| Network access | Mount targets in VPC subnets; Security Groups control port 2049 (NFS) |
| Identity | IAM policies + EFS resource policy |
| Encryption at rest | KMS-managed keys (enable at creation — cannot change later) |
| Encryption in transit | TLS 1.2+ via the EFS mount helper (-o tls) |
| POSIX permissions | Standard Linux file/directory permissions (uid/gid) |
| Access Points | Application-level isolation |

10. EFS Use Cases

| Use Case | Why EFS |
|---|---|
| Kubernetes persistent storage | ReadWriteMany — multiple pods share the same volume |
| WordPress / CMS media files | Multiple web servers need the same uploaded images |
| CI/CD build artifacts | Multiple build agents share a workspace |
| Machine learning training data | Multiple training instances read the same dataset |
| Home directories | Each user gets their own directory on shared EFS |
| Container storage (ECS/EKS) | Tasks share a filesystem across AZs |

Part 2 — Amazon FSx


11. What is FSx?

Amazon FSx provides fully managed, third-party file systems — you get the exact file system you're familiar with (Lustre, ONTAP, ZFS, Windows), managed by AWS. Choose FSx when your workload requires a specific file system that EFS (NFS-only, Linux-only) cannot serve.

Four FSx file systems:
  • FSx for Windows File Server → Windows SMB workloads
  • FSx for Lustre → HPC, ML, high-throughput Linux
  • FSx for NetApp ONTAP → Enterprise multi-protocol NAS
  • FSx for OpenZFS → ZFS Linux workloads, low latency


12. FSx for Windows File Server ⭐

What It Is

Fully managed Windows file system backed by Windows Server with full SMB (Server Message Block) protocol support and Active Directory integration.

Protocol: SMB 2.0, 2.1, 3.0, 3.1.1
Clients: Windows, Linux (via CIFS), macOS
Auth: Microsoft Active Directory (AWS Managed AD or self-managed)

Key Features

| Feature | Detail |
|---|---|
| Active Directory | Native integration — users log in with Windows credentials |
| NTFS permissions | Full Windows ACL support |
| DFS Namespaces | Distribute files across multiple FSx file systems |
| Shadow Copies | Previous versions — users self-restore files |
| SMB Multichannel | Multiple network connections for higher throughput |
| Deployment | Single-AZ or Multi-AZ (99.99% availability SLA) |
| Max throughput | 12–20 GB/s per file system |
| Max file system | 64 TiB |
| Latency | < 1 ms |
| Storage | SSD (low latency) or HDD (cost-optimized) |

When to Use

  • Lift-and-shift Windows applications to AWS
  • .NET apps needing Windows file shares
  • SQL Server home directory, user profiles
  • Any workload requiring Windows ACLs or Active Directory

13. FSx for Lustre ⭐

What It Is

Fully managed Lustre — the world's most popular high-performance parallel file system, used in the largest supercomputers and ML clusters. Linux-only, extremely high throughput.

Protocol: Custom POSIX-compliant (Lustre protocol)
Clients: Linux only
Auth: POSIX permissions

Performance

| Metric | Value |
|---|---|
| Max throughput per file system | 1,000 GB/s |
| Max per-client throughput | 150 GB/s |
| Max IOPS | Millions |
| Latency | < 1 ms |

FSx for Lustre throughput (1,000 GB/s) is the highest of any FSx file system — 10–70× higher than the others. Built specifically for data-intensive workloads.

Deployment Types

```
Scratch (Temporary):
  No replication within the AZ
  Data NOT preserved if a file server fails
  Higher burst throughput
  Use: short-term processing, cost-sensitive HPC

Persistent (Long-term):
  Data replicated within a single AZ
  File server failures are auto-recovered
  Use: long-running workloads, ML training runs
```

S3 Integration ⭐

```
FSx for Lustre can be linked to an S3 bucket:
  Import: data in S3 is automatically imported to Lustre on first access (lazy loading)
  Export: results are written back to S3 automatically

Pattern for ML training:
  Training data in S3 (cheap, durable)
    → Link to FSx for Lustre (high-speed scratch during training)
    → Model output exported back to S3
    → Delete FSx after training (pay only during the training job)
```
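
The import-on-first-access behavior is essentially a lazy cache, which the toy model below illustrates (not the real Lustre implementation; the names are invented):

```python
class LazyS3Cache:
    """Toy model of FSx-for-Lustre lazy loading: contents are fetched from
    the backing store only on first access, then served from the fast tier."""
    def __init__(self, backing_store: dict):
        self.backing = backing_store   # stands in for the linked S3 bucket
        self.cache = {}                # stands in for Lustre SSD storage
        self.fetches = 0               # counts imports from the backing store

    def read(self, key: str) -> bytes:
        if key not in self.cache:      # first access: import from "S3"
            self.cache[key] = self.backing[key]
            self.fetches += 1
        return self.cache[key]

fs = LazyS3Cache({"train/shard-0001": b"example bytes"})
fs.read("train/shard-0001")
fs.read("train/shard-0001")
print(fs.fetches)   # 1: imported once, then served from cache
```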

When to Use

  • Machine learning training on large datasets
  • High-performance computing (genomics, financial simulations, weather modeling)
  • Video rendering and transcoding
  • Seismic data processing

14. FSx for NetApp ONTAP ⭐

What It Is

Fully managed NetApp ONTAP — the most feature-rich FSx option, supporting multiple protocols simultaneously.

Protocols: NFS (3, 4.0, 4.1, 4.2) + SMB (2.0–3.1.1) + iSCSI (block storage)
Clients: Windows, Linux, macOS — simultaneously

Performance

| Metric | Value |
|---|---|
| Max throughput per file system | 72–80 GB/s |
| Max per-client throughput | 18 GB/s |
| Max IOPS | Millions |
| Max file system size | Virtually unlimited (10s of PBs) |
| Latency | < 1 ms |

Unique Capabilities

| Feature | Description |
|---|---|
| Multi-protocol | Same data accessed via NFS (Linux) AND SMB (Windows) simultaneously |
| FlexClone | Instant zero-copy clones of volumes (no data duplication) |
| SnapMirror | Cross-region replication to on-premises NetApp or another FSx |
| Auto-tiering | Hot data on SSD, cold data automatically moved to a cheaper storage tier |
| Data deduplication | Removes duplicate blocks — reduces storage consumption |
| iSCSI | Block storage accessible as SAN (Storage Area Network) |
| Anti-virus integration | Native virus scanning support |
| Deployment | Single-AZ (99.9%) or Multi-AZ (99.99%) |
| On-prem caching | NetApp FlexCache — cache AWS data on-premises |

When to Use

  • Lift-and-shift existing NetApp ONTAP NAS to AWS
  • Multi-protocol workloads (Windows + Linux accessing same files)
  • Enterprise NAS migration
  • Complex data management (cloning, replication, tiering)
  • Any workload where you're already using NetApp on-premises

15. FSx for OpenZFS ⭐

What It Is

Fully managed OpenZFS — an open-source file system (with roots in Solaris ZFS) known for data integrity, inline compression, and the lowest latency of any FSx option.

Protocol: NFS (3, 4.0, 4.1, 4.2)
Clients: Windows, Linux, macOS

Performance

| Metric | Value |
|---|---|
| Latency | < 0.5 ms (lowest of all FSx types) |
| Max throughput per file system | 10–21 GB/s |
| Max per-client throughput | 10 GB/s |
| Max IOPS | 1–2 million |
| Max file system size | 512 TiB |

Key Features

| Feature | Description |
|---|---|
| Instant snapshots | Point-in-time snapshots, space-efficient |
| FlexClone-equivalent | Instant zero-copy clones |
| Inline compression | Reduces storage cost automatically |
| Deployment | Single-AZ (99.5%) or Multi-AZ (99.99%) |
| Cross-region backups | ✅ Supported |
| End-user restore | Users can restore previous file versions |

When to Use

  • Lift-and-shift ZFS workloads to AWS
  • Linux-based file servers needing low latency
  • Development/test environments needing instant cloning
  • Any workload needing sub-millisecond NFS latency

16. FSx — Full Comparison Table

| Feature | Windows FS | Lustre | NetApp ONTAP | OpenZFS |
|---|---|---|---|---|
| Protocol | SMB | Lustre (custom) | NFS + SMB + iSCSI | NFS |
| OS clients | Win, Linux, Mac | Linux only | Win, Linux, Mac | Win, Linux, Mac |
| Max throughput | 12–20 GB/s | 1,000 GB/s | 72–80 GB/s | 10–21 GB/s |
| Latency | < 1 ms | < 1 ms | < 1 ms | < 0.5 ms |
| Max IOPS | Hundreds of thousands | Millions | Millions | 1–2 million |
| Max size | 64 TiB | Multiple PBs | Virtually unlimited | 512 TiB |
| Multi-AZ SLA | 99.99% | ❌ (Single-AZ) | 99.99% | 99.99% |
| Active Directory | ✅ | ❌ | ✅ | ❌ |
| S3 integration | ❌ | ✅ (auto import/export) | ❌ | ❌ |
| Data deduplication | ✅ | ❌ | ✅ | ❌ |
| Instant snapshots | ✅ (Shadow Copies) | ❌ | ✅ | ✅ |
| Cross-region replication | ❌ | ✅ (via S3) | ✅ (SnapMirror) | ❌ |
| Use case | Windows apps | ML/HPC | Enterprise NAS | ZFS/Linux |

17. EFS vs FSx — When to Use Which ⭐

| If you need... | Use |
|---|---|
| Shared Linux filesystem, elastic, simple | EFS |
| Multiple pods in Kubernetes sharing storage | EFS (ReadWriteMany) |
| Windows applications, SMB, Active Directory | FSx for Windows |
| ML training, HPC, highest possible throughput | FSx for Lustre |
| S3 as dataset, fast processing, export results | FSx for Lustre |
| Migrate existing NetApp ONTAP NAS to AWS | FSx for NetApp ONTAP |
| Windows AND Linux accessing the same files | FSx for NetApp ONTAP |
| Migrate ZFS workloads, sub-ms latency NFS | FSx for OpenZFS |
| Dev/test cloning, snapshot-heavy workflows | FSx for NetApp ONTAP or OpenZFS |

18. EFS vs EBS vs S3 — Complete Storage Comparison

| Feature | EBS | EFS | S3 |
|---|---|---|---|
| Type | Block | File (NFS) | Object |
| Access | 1 EC2 (mostly) | Many EC2s simultaneously | HTTP API |
| OS support | Linux + Windows | Linux only | Any |
| Mount | ✅ Block device | ✅ NFS mount | ❌ Not mountable |
| Elastic capacity | ❌ (fixed size) | ✅ Auto-grows/shrinks | ✅ Unlimited |
| Multi-AZ | ❌ (a volume lives in one AZ; io2 Multi-Attach is same-AZ only) | ✅ Regional | ✅ (≥3 AZs) |
| Use case | Root volume, DB | Shared files, Kubernetes | Backups, web assets, data lake |
| Max size | 64 TiB per volume | Unlimited | Unlimited |
| Cost model | GB provisioned | GB stored + throughput | GB stored + requests |

19. Common Mistakes

| ❌ Wrong | ✅ Correct |
|---|---|
| EFS works on Windows | EFS is Linux-only (NFS) — use FSx for Windows for Windows clients |
| EFS needs a pre-provisioned storage size | EFS is elastic — capacity grows/shrinks automatically |
| Max I/O mode is always better for heavy workloads | AWS recommends General Purpose for everything — Max I/O has higher latency |
| Bursting throughput is the best mode | Elastic is the recommended default — Bursting can exhaust credits |
| FSx for Lustre supports Windows clients | FSx for Lustre is Linux-only — for Windows use FSx for Windows or ONTAP |
| FSx for Lustre data is always durable | Scratch deployment has no replication — data can be lost on failure |
| EFS and EBS can be used interchangeably | EBS is block storage for a single instance; EFS is a shared file system for many instances |
