Kubernetes Persistent Volumes and Claims: A Comprehensive Guide¶
Background: Understanding Storage in Kubernetes¶
Before diving into Persistent Volumes (PVs) and Persistent Volume Claims (PVCs), let’s establish the foundational concepts that underpin storage management in Kubernetes. These are critical for understanding how PVs and PVCs function and why they are necessary.
What is Storage in Kubernetes?¶
Kubernetes is a container orchestration platform that manages compute resources (e.g., CPU, memory) and storage resources separately. While containers are ephemeral by design—meaning their local files are lost upon restart or crash—many applications, such as databases or file servers, require persistent storage that survives container lifecycles. Kubernetes addresses this through its storage subsystem, which provides abstractions to manage storage independently of compute resources.
Key Concepts to Understand¶
- Pods: Pods are the smallest deployable units in Kubernetes, running one or more containers. Containers within a Pod share resources, including storage, which is often provided via volumes.
- Volumes: As covered previously, Kubernetes volumes are directories accessible to containers in a Pod, enabling data sharing and persistence. Volumes can be ephemeral (e.g.,
emptyDir, deleted with the Pod) or persistent (e.g., backed by network storage, surviving Pod deletion). PVs and PVCs build on the volume concept to provide durable, cluster-wide storage. - StorageClasses: A StorageClass defines a storage profile (e.g., performance, provisioning method) that administrators configure. It allows dynamic provisioning of PVs, abstracting the underlying storage details from users.
- VolumeAttributesClasses (emerging concept): These extend StorageClasses to allow modification of volume attributes (e.g., performance) after creation, though they are less critical for this discussion.
Why Persistent Storage Matters¶
Unlike ephemeral volumes, which are tied to a Pod’s lifecycle, persistent storage ensures data durability across Pod restarts, deletions, or node failures. This is essential for stateful applications like databases (e.g., MySQL, PostgreSQL) or file storage systems. PVs and PVCs provide a robust framework to manage this persistent storage, decoupling storage provisioning from consumption.
With this background, students should understand that PVs and PVCs are Kubernetes resources designed to handle persistent storage needs, offering a user-friendly abstraction over complex storage backends. Now, let’s explore these resources in detail.
Introduction to Persistent Volumes and Claims¶
Persistent Volumes (PVs) and Persistent Volume Claims (PVCs) form the core of Kubernetes’ persistent storage subsystem. They abstract the details of storage provisioning (how storage is provided) from storage consumption (how it is used), enabling users to request storage without needing to understand the underlying infrastructure.
- PersistentVolume (PV): A cluster-wide resource representing a piece of storage, provisioned either manually by an administrator (static provisioning) or automatically via a StorageClass (dynamic provisioning). PVs are independent of Pods, with lifecycles that persist beyond any individual Pod.
- PersistentVolumeClaim (PVC): A user’s request for storage, specifying requirements like size and access modes. PVCs act as a “claim” on PV resources, similar to how Pods consume node resources (e.g., CPU, memory).
Analogy: Think of PVs as available storage units in a warehouse (e.g., shelves with specific capacities and features). PVCs are like purchase orders from users, requesting a storage unit that meets their needs. Kubernetes matches the order (PVC) to an available unit (PV) or creates a new one if dynamic provisioning is enabled.
Lifecycle of Persistent Volumes and Claims¶
The interaction between PVs and PVCs follows a well-defined lifecycle, which is crucial for understanding their behavior:
- Provisioning:
- Static Provisioning: An administrator manually creates PVs with specific storage details (e.g., NFS share, iSCSI volume). These PVs are available in the cluster for users to claim.
-
Dynamic Provisioning: When a PVC requests a StorageClass and no matching PV exists, Kubernetes dynamically provisions a PV using the StorageClass’s provisioner. This requires the
DefaultStorageClassadmission controller to be enabled on the API server (configured via--enable-admission-plugins=DefaultStorageClass).- Explanation: Dynamic provisioning automates storage creation, reducing administrative overhead. For example, a cloud provider’s provisioner might create a new disk in AWS EBS or Google Cloud Persistent Disk.
- Note: PVCs requesting the empty string (
"") as their StorageClass disable dynamic provisioning, relying on static PVs.
-
Binding:
- A control loop in the Kubernetes control plane monitors new PVCs and attempts to bind them to a matching PV based on size, access modes, and StorageClass.
- If a PV is dynamically provisioned, it is automatically bound to the PVC.
- For static PVs, the PVC binds to a PV that meets or exceeds the request. If no matching PV exists, the PVC remains unbound until a suitable PV is available.
- Explanation: Binding is a one-to-one, exclusive mapping. The
ClaimReffield in the PV links it to the PVC, ensuring other PVCs cannot claim it. -
Example: A PVC requesting 10Gi with
ReadWriteOncewill not bind to a 5Gi PV but will bind to a 15Gi PV if available. -
Using:
- Once bound, a PVC can be used as a volume in a Pod. The Pod’s volume specification references the PVC, and Kubernetes mounts the underlying PV into the Pod’s containers.
- Users specify the desired access mode (e.g.,
ReadWriteOnce) in the Pod’s volume configuration. - Example:
apiVersion: v1 kind: Pod metadata: name: mypod spec: containers: - name: myfrontend image: nginx volumeMounts: - mountPath: "/var/www/html" name: mypd volumes: - name: mypd persistentVolumeClaim: claimName: myclaim -
Note: PVCs must be in the same namespace as the Pod using them.
-
Storage Object in Use Protection:
- Kubernetes prevents deletion of PVCs actively used by Pods and PVs bound to PVCs to avoid data loss.
- When a PVC or PV is marked for deletion (status:
Terminating), finalizers (e.g.,kubernetes.io/pvc-protection,kubernetes.io/pv-protection) delay removal until the resource is no longer in use. - Example:
kubectl describe pvc hostpath Name: hostpath Status: Terminating Finalizers: [kubernetes.io/pvc-protection] -
Explanation: This feature ensures data integrity by preventing accidental deletion during active use.
-
Reclaiming:
- When a PVC is deleted, the PV’s reclaim policy determines what happens to the storage:
- Retain: The PV persists, marked as “released,” with data intact. Administrators must manually delete the PV, clean the storage, and optionally reuse it.
- Steps:
- Delete the PV object.
- Clean the storage backend (e.g., remove files on an NFS share).
- Delete the storage asset or create a new PV to reuse it.
- Delete: Both the PV object and the underlying storage asset are deleted. This is the default for dynamically provisioned PVs unless the StorageClass specifies otherwise.
- Recycle (Deprecated): The PV is scrubbed (e.g.,
rm -rf /thevolume/*) and made available for a new claim. Not recommended; use dynamic provisioning instead.
- Example (Custom Recycler Pod for Recycle, deprecated):
apiVersion: v1 kind: Pod metadata: name: pv-recycler spec: restartPolicy: Never volumes: - name: vol hostPath: path: /any/path containers: - name: pv-recycler image: registry.k8s.io/busybox command: ["/bin/sh", "-c", "rm -rf /scrub/* && test -z \"$(ls -A /scrub)\" || exit 1"] volumeMounts: - name: vol mountPath: /scrub -
Note: The
Recyclepolicy requires a custom recycler Pod, but dynamic provisioning is preferred for modern clusters. -
PersistentVolume Deletion Protection (Stable, v1.33):
- For PVs with a
Deletereclaim policy, finalizers ensure the underlying storage is deleted before the PV object is removed:external-provisioner.volume.kubernetes.io/finalizer: Added to CSI volumes (dynamic and static) and dynamically provisioned in-tree volumes since v1.31.kubernetes.io/pv-controller: Added to dynamically provisioned in-tree volumes since v1.31.
- Example (CSI Volume):
kubectl describe pv pvc-2f0bab97 Name: pvc-2f0bab97-85a8-4552-8044-eb8be45cf48d Finalizers: [kubernetes.io/pv-protection external-provisioner.volume.kubernetes.io/finalizer] Reclaim Policy: Delete - Explanation: This ensures storage cleanup, preventing orphaned resources, especially in cloud environments.
PersistentVolume (PV) Specification¶
A PV is defined by its spec and status, capturing the storage’s configuration and current state.
Key Fields¶
- capacity: Specifies the storage size (e.g.,
storage: 5Gi). Currently, only storage size is supported, but future attributes may include IOPS or throughput. - volumeMode (Stable, v1.18):
Filesystem(default): Mounts the volume as a directory with a filesystem (e.g., ext4, XFS).Block: Presents the volume as a raw block device without a filesystem, ideal for applications that manage their own storage (e.g., databases).- accessModes: Defines how the volume can be mounted:
ReadWriteOnce(RWO): Read-write by a single node; multiple Pods on the same node can access.ReadOnlyMany(ROX): Read-only by multiple nodes.ReadWriteMany(RWX): Read-write by multiple nodes.ReadWriteOncePod(RWOP, Stable, v1.29): Read-write by a single Pod, ensuring exclusive access cluster-wide. Requires CSI volumes and specific CSI sidecar versions (e.g.,csi-provisioner:v3.0.0+).- Note: Access modes describe capabilities, not enforced restrictions. For example, a
ReadOnlyManyPV is not guaranteed to be read-only unless configured as such by the storage backend. - storageClassName: Links the PV to a StorageClass. An empty or absent
storageClassNameindicates no class, limiting binding to PVCs without a class. - persistentVolumeReclaimPolicy: Defines reclamation behavior (
Retain,Delete,Recycle). - mountOptions: Specifies mount options (e.g.,
nfsvers=4.1) for supported volume types (e.g.,nfs,iscsi,csi). - nodeAffinity: Restricts the PV to specific nodes, mandatory for
localvolumes. - Source: Specifies the storage backend (e.g.,
nfs,csi,hostPath).
Example PV¶
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv0003
spec:
capacity:
storage: 5Gi
volumeMode: Filesystem
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Delete
storageClassName: slow
mountOptions:
- hard
- nfsvers=4.1
nfs:
path: /tmp
server: 172.17.0.2
Explanation: This PV provides 5Gi of NFS storage, mountable as read-write by one node, with a Delete reclaim policy. It requires the /sbin/mount.nfs helper program on nodes to mount the NFS share.
Phase¶
A PV’s status reflects its lifecycle phase: - Available: Unbound and ready for a PVC. - Bound: Bound to a PVC, with the PVC name visible via kubectl describe pv <name>. - Released: PVC deleted, but the PV is not yet reclaimed. - Failed: Reclamation failed. - Phase Transition Timestamp (Stable, v1.31): The status.lastPhaseTransitionTime field records when the PV last changed phases, aiding in debugging.
PersistentVolumeClaim (PVC) Specification¶
A PVC defines a user’s storage request, with spec and status fields.
Key Fields¶
- accessModes: Matches the PV’s access modes (e.g.,
ReadWriteOnce). - volumeMode:
FilesystemorBlock, matching the PV. - resources.requests.storage: Specifies the desired storage size (e.g.,
8Gi). - Note: For
Filesystemvolumes, this is the allocated size, which may be slightly reduced due to filesystem overhead (e.g., XFS metadata). - storageClassName: Requests a specific StorageClass or
""for no class. If unset, behavior depends on theDefaultStorageClassadmission plugin. - selector: Filters PVs by labels (e.g.,
matchLabels,matchExpressions). Cannot be used with dynamic provisioning. - volumeName: Explicitly binds to a named PV, bypassing matching criteria except for validation.
Example PVC¶
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: myclaim
spec:
accessModes:
- ReadWriteOnce
volumeMode: Filesystem
resources:
requests:
storage: 8Gi
storageClassName: slow
selector:
matchLabels:
release: "stable"
Explanation: This PVC requests 8Gi of storage with ReadWriteOnce access, preferring PVs from the slow StorageClass with a release: stable label.
StorageClass and Default Behavior¶
- With
DefaultStorageClassAdmission Plugin: - A default StorageClass (marked with
storageclass.kubernetes.io/is-default-class: true) is applied to PVCs withoutstorageClassName. - PVCs with
storageClassName: ""only bind to PVs with no class. - Without
DefaultStorageClass: - PVCs without
storageClassNameor withstorageClassName: ""bind to PVs with no class. - Retroactive assignment (Stable, v1.28) updates PVCs without
storageClassNameto the default StorageClass when one becomes available, unlessstorageClassName: "". - Note: Avoid using the deprecated
volume.beta.kubernetes.io/storage-classannotation.
Reserving a PersistentVolume¶
To ensure a PVC binds to a specific PV, you can pre-bind them:
- PVC Specifies PV:
apiVersion: v1 kind: PersistentVolumeClaim metadata: name: foo-pvc namespace: foo spec: storageClassName: "" volumeName: foo-pv -
Binds to
foo-pvif it exists and is unbound, ignoring some matching criteria (e.g., node affinity). -
PV Reserves PVC:
apiVersion: v1 kind: PersistentVolume metadata: name: foo-pv spec: storageClassName: "" claimRef: name: foo-pvc namespace: foo - Reserves
foo-pvforfoo-pvc, preventing other PVCs from binding. - Useful for
Retainpolicy PVs or reusing existing storage.
Explanation: Pre-binding ensures predictable storage allocation, especially in environments with limited PVs.
Expanding PersistentVolumeClaims (Stable, v1.24)¶
PVCs can be expanded to request more storage, provided the StorageClass’s allowVolumeExpansion is true. Supported volume types include csi, flexVolume (deprecated), and portworxVolume (deprecated).
Process¶
- Edit the PVC’s
spec.resources.requests.storageto a larger size. - The underlying PV is resized, without creating a new PV.
- For
Filesystemvolumes (e.g., XFS, ext4), expansion occurs when a Pod mounts the PVC inReadWritemode, either at Pod startup or if the filesystem supports online expansion. - For
Blockvolumes, resizing is immediate if supported by the driver.
Example StorageClass¶
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: example-vol-default
provisioner: vendor-name.example/magicstorage
allowVolumeExpansion: true
Example PVC Expansion¶
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: myclaim
spec:
resources:
requests:
storage: 10Gi # Increased from 8Gi
Warnings: - Directly editing a PV’s capacity can prevent automatic resizing, as Kubernetes assumes the size is manually set. - Expansion requires CSI driver support for csi volumes.
Resizing In-Use PVCs¶
- In-use PVCs expand automatically when the filesystem is resized, without needing to recreate the Pod.
- Note: Unbound PVCs require a Pod to be created to trigger expansion.
Recovering from Expansion Failures¶
If expansion fails (e.g., requesting too much storage), the controller retries indefinitely. To recover: 1. Mark the PV as Retain. 2. Delete and recreate the PVC with a smaller size, setting volumeName to the PV’s name. 3. Restore the PV’s original reclaim policy.
Explanation: This manual intervention prevents infinite retries, allowing administrators to adjust storage requests.
Types of Persistent Volumes¶
PVs are implemented as plugins, with support for various storage backends. As of Kubernetes v1.33, supported types include:
- csi: Container Storage Interface, the standard for external storage integration.
- fc: Fibre Channel block storage.
- hostPath: Node-local storage (for testing; not suitable for multi-node clusters).
- iscsi: iSCSI storage.
- local: Local storage devices, requiring
nodeAffinity. - nfs: Network File System, supporting multiple writers.
Deprecated Types¶
These types are redirected to CSI drivers or unsupported: - awsElasticBlockStore: Use ebs.csi.aws.com (migration default since v1.23). - azureDisk: Use disk.csi.azure.com (migration default since v1.23). - azureFile: Use file.csi.azure.com (migration default since v1.24). - cinder: Use cinder.csi.openstack.org (migration default since v1.21). - flexVolume: Deprecated since v1.23; migrate to CSI. - gcePersistentDisk: Use pd.csi.storage.gke.io (migration default since v1.23). - portworxVolume: Use pxd.portworx.com (migration default since v1.31). - vsphereVolume: Use csi.vsphere.vmware.com (migration default since v1.25). - cephfs, rbd: Removed in v1.31. - glusterfs, flocker, quobyte, storageos: Removed in v1.25 or earlier.
Recommendation: Use CSI drivers for new deployments to ensure future compatibility.
Access Mode Support¶
| Volume Plugin | ReadWriteOnce | ReadOnlyMany | ReadWriteMany | ReadWriteOncePod |
|---|---|---|---|---|
| csi | Driver-dependent | Driver-dependent | Driver-dependent | Driver-dependent |
| fc | ✓ | ✓ | - | - |
| hostPath | ✓ | - | - | - |
| iscsi | ✓ | ✓ | - | - |
| local | ✓ | - | - | - |
| nfs | ✓ | ✓ | ✓ | - |
Note: ReadWriteOncePod requires CSI volumes and Kubernetes v1.22+ with updated CSI sidecars.
Advanced Features¶
Raw Block Volume Support (Stable, v1.18)¶
- Purpose: Provides raw block devices without a filesystem, ideal for applications like databases.
- Supported Plugins:
csi,fc,iscsi,local. - Example:
apiVersion: v1 kind: PersistentVolume metadata: name: block-pv spec: capacity: storage: 10Gi accessModes: - ReadWriteOnce volumeMode: Block persistentVolumeReclaimPolicy: Retain fc: targetWWNs: ["50060e801049cfd1"] lun: 0 --- apiVersion: v1 kind: PersistentVolumeClaim metadata: name: block-pvc spec: accessModes: - ReadWriteOnce volumeMode: Block resources: requests: storage: 10Gi --- apiVersion: v1 kind: Pod metadata: name: pod-with-block-volume spec: containers: - name: fc-container image: fedora:26 volumeDevices: - name: data devicePath: /dev/xvda volumes: - name: data persistentVolumeClaim: claimName: block-pvc - Note: Use
volumeDevicesinstead ofvolumeMountsfor block volumes, specifying a device path.
Volume Snapshot and Restore (Stable, v1.20)¶
- Purpose: Creates snapshots of CSI volumes for backup or restoration.
- Example:
apiVersion: v1 kind: PersistentVolumeClaim metadata: name: restore-pvc spec: storageClassName: csi-hostpath-sc dataSource: name: new-snapshot-test kind: VolumeSnapshot apiGroup: snapshot.storage.k8s.io accessModes: - ReadWriteOnce resources: requests: storage: 10Gi - Note: Requires CSI drivers with snapshot support.
Volume Cloning (CSI Only)¶
- Purpose: Creates a new PVC from an existing PVC’s data.
- Example:
apiVersion: v1 kind: PersistentVolumeClaim metadata: name: cloned-pvc spec: storageClassName: my-csi-plugin dataSource: name: existing-src-pvc-name kind: PersistentVolumeClaim accessModes: - ReadWriteOnce resources: requests: storage: 10Gi
Volume Populators and Data Sources (Beta, v1.24)¶
- Purpose: Allows custom controllers to populate PVCs with data from arbitrary sources using the
dataSourceReffield. - Example:
apiVersion: v1 kind: PersistentVolumeClaim metadata: name: populated-pvc spec: dataSourceRef: name: example-name kind: ExampleDataSource apiGroup: example.storage.k8s.io accessModes: - ReadWriteOnce resources: requests: storage: 10Gi - Note: Requires the
AnyVolumeDataSourcefeature gate and external populator controllers.
Cross-Namespace Data Sources (Alpha, v1.26)¶
- Purpose: Allows referencing data sources (e.g., VolumeSnapshots) in another namespace.
- Requirements:
- Enable
AnyVolumeDataSourceandCrossNamespaceVolumeDataSourcefeature gates. - Create a
ReferenceGrantin the source namespace. - Example:
apiVersion: gateway.networking.k8s.io/v1beta1 kind: ReferenceGrant metadata: name: allow-ns1-pvc namespace: default spec: from: - group: "" kind: PersistentVolumeClaim namespace: ns1 to: - group: snapshot.storage.k8s.io kind: VolumeSnapshot name: new-snapshot-demo --- apiVersion: v1 kind: PersistentVolumeClaim metadata: name: foo-pvc namespace: ns1 spec: dataSourceRef: apiGroup: snapshot.storage.k8s.io kind: VolumeSnapshot name: new-snapshot-demo namespace: default accessModes: - ReadWriteOnce resources: requests: storage: 1Gi - Note: Requires the Gateway API’s
ReferenceGrantresource.
Writing Portable Configuration¶
To create portable storage configurations for diverse Kubernetes clusters: 1. Include PVCs in your configuration (e.g., with Deployments, ConfigMaps). 2. Avoid including PVs, as users may lack permission to create them. 3. Allow users to specify a storageClassName for the PVC: - If provided, set persistentVolumeClaim.storageClassName to match. - If not provided, leave storageClassName unset to use the cluster’s default StorageClass. 4. Monitor PVCs for binding delays, indicating missing dynamic provisioning or storage systems, and alert users.
Example:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: myclaim
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
storageClassName: "" # Set by user or left unset
Best Practices¶
- Use Dynamic Provisioning: Prefer StorageClasses for automated PV creation, reducing manual overhead.
- Set Appropriate Reclaim Policies: Use
Deletefor cloud-backed storage andRetainfor manual management or critical data. - Leverage CSI Drivers: Transition from deprecated in-tree plugins to CSI for better support.
- Monitor Binding: Ensure PVCs bind promptly; unbound PVCs may indicate missing PVs or misconfigured StorageClasses.
- Secure Storage: Use
ReadWriteOncePodfor sensitive data requiring exclusive access. - Test Expansion: Validate volume expansion in a staging environment, especially for in-use PVCs.
What’s Next¶
- Explore StorageClasses for dynamic provisioning and advanced storage configuration.
- Experiment with PVs and PVCs in a lab environment, such as deploying a stateful application like WordPress with MySQL.
- Review CSI driver documentation for specific storage backend integrations.