Kubernetes Persistent Volumes and Claims: A Comprehensive Guide

Background: Understanding Storage in Kubernetes

Before diving into Persistent Volumes (PVs) and Persistent Volume Claims (PVCs), let’s establish the foundational concepts that underpin storage management in Kubernetes. These are critical for understanding how PVs and PVCs function and why they are necessary.

What is Storage in Kubernetes?

Kubernetes is a container orchestration platform that manages compute resources (e.g., CPU, memory) and storage resources separately. While containers are ephemeral by design—meaning their local files are lost upon restart or crash—many applications, such as databases or file servers, require persistent storage that survives container lifecycles. Kubernetes addresses this through its storage subsystem, which provides abstractions to manage storage independently of compute resources.

Key Concepts to Understand

  1. Pods: Pods are the smallest deployable units in Kubernetes, running one or more containers. Containers within a Pod share resources, including storage, which is often provided via volumes.
  2. Volumes: As covered previously, Kubernetes volumes are directories accessible to containers in a Pod, enabling data sharing and persistence. Volumes can be ephemeral (e.g., emptyDir, deleted with the Pod) or persistent (e.g., backed by network storage, surviving Pod deletion). PVs and PVCs build on the volume concept to provide durable, cluster-wide storage.
  3. StorageClasses: A StorageClass defines a storage profile (e.g., performance, provisioning method) that administrators configure. It allows dynamic provisioning of PVs, abstracting the underlying storage details from users.
  4. VolumeAttributesClasses (emerging concept): These extend StorageClasses to allow modification of volume attributes (e.g., performance) after creation, though they are less critical for this discussion.

Why Persistent Storage Matters

Unlike ephemeral volumes, which are tied to a Pod’s lifecycle, persistent storage ensures data durability across Pod restarts, deletions, or node failures. This is essential for stateful applications like databases (e.g., MySQL, PostgreSQL) or file storage systems. PVs and PVCs provide a robust framework to manage this persistent storage, decoupling storage provisioning from consumption.

With this background, students should understand that PVs and PVCs are Kubernetes resources designed to handle persistent storage needs, offering a user-friendly abstraction over complex storage backends. Now, let’s explore these resources in detail.

Introduction to Persistent Volumes and Claims

Persistent Volumes (PVs) and Persistent Volume Claims (PVCs) form the core of Kubernetes’ persistent storage subsystem. They abstract the details of storage provisioning (how storage is provided) from storage consumption (how it is used), enabling users to request storage without needing to understand the underlying infrastructure.

  • PersistentVolume (PV): A cluster-wide resource representing a piece of storage, provisioned either manually by an administrator (static provisioning) or automatically via a StorageClass (dynamic provisioning). PVs are independent of Pods, with lifecycles that persist beyond any individual Pod.
  • PersistentVolumeClaim (PVC): A user’s request for storage, specifying requirements like size and access modes. PVCs act as a “claim” on PV resources, similar to how Pods consume node resources (e.g., CPU, memory).

Analogy: Think of PVs as available storage units in a warehouse (e.g., shelves with specific capacities and features). PVCs are like purchase orders from users, requesting a storage unit that meets their needs. Kubernetes matches the order (PVC) to an available unit (PV) or creates a new one if dynamic provisioning is enabled.

Lifecycle of Persistent Volumes and Claims

The interaction between PVs and PVCs follows a well-defined lifecycle, which is crucial for understanding their behavior:

  1. Provisioning:
  2. Static Provisioning: An administrator manually creates PVs with specific storage details (e.g., NFS share, iSCSI volume). These PVs are available in the cluster for users to claim.
  3. Dynamic Provisioning: When a PVC requests a StorageClass and no matching PV exists, Kubernetes dynamically provisions a PV using the StorageClass’s provisioner. This requires the DefaultStorageClass admission controller to be enabled on the API server (configured via --enable-admission-plugins=DefaultStorageClass).

    • Explanation: Dynamic provisioning automates storage creation, reducing administrative overhead. For example, a cloud provider’s provisioner might create a new disk in AWS EBS or Google Cloud Persistent Disk.
    • Note: PVCs requesting the empty string ("") as their StorageClass disable dynamic provisioning, relying on static PVs.
  4. Binding:

  5. A control loop in the Kubernetes control plane monitors new PVCs and attempts to bind them to a matching PV based on size, access modes, and StorageClass.
  6. If a PV is dynamically provisioned, it is automatically bound to the PVC.
  7. For static PVs, the PVC binds to a PV that meets or exceeds the request. If no matching PV exists, the PVC remains unbound until a suitable PV is available.
  8. Explanation: Binding is a one-to-one, exclusive mapping. The ClaimRef field in the PV links it to the PVC, ensuring other PVCs cannot claim it.
  9. Example: A PVC requesting 10Gi with ReadWriteOnce will not bind to a 5Gi PV but will bind to a 15Gi PV if available.

  10. Using:

  11. Once bound, a PVC can be used as a volume in a Pod. The Pod’s volume specification references the PVC, and Kubernetes mounts the underlying PV into the Pod’s containers.
  12. Users specify the desired access mode (e.g., ReadWriteOnce) in the Pod’s volume configuration.
  13. Example:
    apiVersion: v1
    kind: Pod
    metadata:
      name: mypod
    spec:
      containers:
      - name: myfrontend
        image: nginx
        volumeMounts:
        - mountPath: "/var/www/html"
          name: mypd
      volumes:
      - name: mypd
        persistentVolumeClaim:
          claimName: myclaim
    
  14. Note: PVCs must be in the same namespace as the Pod using them.

  15. Storage Object in Use Protection:

  16. Kubernetes prevents deletion of PVCs actively used by Pods and PVs bound to PVCs to avoid data loss.
  17. When a PVC or PV is marked for deletion (status: Terminating), finalizers (e.g., kubernetes.io/pvc-protection, kubernetes.io/pv-protection) delay removal until the resource is no longer in use.
  18. Example:
    kubectl describe pvc hostpath
    Name:          hostpath
    Status:        Terminating
    Finalizers:    [kubernetes.io/pvc-protection]
    
  19. Explanation: This feature ensures data integrity by preventing accidental deletion during active use.

  20. Reclaiming:

  21. When a PVC is deleted, the PV’s reclaim policy determines what happens to the storage:
    • Retain: The PV persists, marked as “released,” with data intact. Administrators must manually delete the PV, clean the storage, and optionally reuse it.
    • Steps:
      1. Delete the PV object.
      2. Clean the storage backend (e.g., remove files on an NFS share).
      3. Delete the storage asset or create a new PV to reuse it.
    • Delete: Both the PV object and the underlying storage asset are deleted. This is the default for dynamically provisioned PVs unless the StorageClass specifies otherwise.
    • Recycle (Deprecated): The PV is scrubbed (e.g., rm -rf /thevolume/*) and made available for a new claim. Not recommended; use dynamic provisioning instead.
  22. Example (Custom Recycler Pod for Recycle, deprecated):
    apiVersion: v1
    kind: Pod
    metadata:
      name: pv-recycler
    spec:
      restartPolicy: Never
      volumes:
      - name: vol
        hostPath:
          path: /any/path
      containers:
      - name: pv-recycler
        image: registry.k8s.io/busybox
        command: ["/bin/sh", "-c", "rm -rf /scrub/* && test -z \"$(ls -A /scrub)\" || exit 1"]
        volumeMounts:
        - name: vol
          mountPath: /scrub
    
  23. Note: The Recycle policy requires a custom recycler Pod, but dynamic provisioning is preferred for modern clusters.

  24. PersistentVolume Deletion Protection (Stable, v1.33):

  25. For PVs with a Delete reclaim policy, finalizers ensure the underlying storage is deleted before the PV object is removed:
    • external-provisioner.volume.kubernetes.io/finalizer: Added to CSI volumes (dynamic and static) and dynamically provisioned in-tree volumes since v1.31.
    • kubernetes.io/pv-controller: Added to dynamically provisioned in-tree volumes since v1.31.
  26. Example (CSI Volume):
    kubectl describe pv pvc-2f0bab97
    Name:            pvc-2f0bab97-85a8-4552-8044-eb8be45cf48d
    Finalizers:      [kubernetes.io/pv-protection external-provisioner.volume.kubernetes.io/finalizer]
    Reclaim Policy:  Delete
    
  27. Explanation: This ensures storage cleanup, preventing orphaned resources, especially in cloud environments.

PersistentVolume (PV) Specification

A PV is defined by its spec and status, capturing the storage’s configuration and current state.

Key Fields

  • capacity: Specifies the storage size (e.g., storage: 5Gi). Currently, only storage size is supported, but future attributes may include IOPS or throughput.
  • volumeMode (Stable, v1.18):
  • Filesystem (default): Mounts the volume as a directory with a filesystem (e.g., ext4, XFS).
  • Block: Presents the volume as a raw block device without a filesystem, ideal for applications that manage their own storage (e.g., databases).
  • accessModes: Defines how the volume can be mounted:
  • ReadWriteOnce (RWO): Read-write by a single node; multiple Pods on the same node can access.
  • ReadOnlyMany (ROX): Read-only by multiple nodes.
  • ReadWriteMany (RWX): Read-write by multiple nodes.
  • ReadWriteOncePod (RWOP, Stable, v1.29): Read-write by a single Pod, ensuring exclusive access cluster-wide. Requires CSI volumes and specific CSI sidecar versions (e.g., csi-provisioner:v3.0.0+).
  • Note: Access modes describe capabilities, not enforced restrictions. For example, a ReadOnlyMany PV is not guaranteed to be read-only unless configured as such by the storage backend.
  • storageClassName: Links the PV to a StorageClass. An empty or absent storageClassName indicates no class, limiting binding to PVCs without a class.
  • persistentVolumeReclaimPolicy: Defines reclamation behavior (Retain, Delete, Recycle).
  • mountOptions: Specifies mount options (e.g., nfsvers=4.1) for supported volume types (e.g., nfs, iscsi, csi).
  • nodeAffinity: Restricts the PV to specific nodes, mandatory for local volumes.
  • Source: Specifies the storage backend (e.g., nfs, csi, hostPath).

Example PV

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0003
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: slow
  mountOptions:
  - hard
  - nfsvers=4.1
  nfs:
    path: /tmp
    server: 172.17.0.2

Explanation: This PV provides 5Gi of NFS storage, mountable as read-write by one node, with a Delete reclaim policy. It requires the /sbin/mount.nfs helper program on nodes to mount the NFS share.

Phase

A PV’s status reflects its lifecycle phase: - Available: Unbound and ready for a PVC. - Bound: Bound to a PVC, with the PVC name visible via kubectl describe pv <name>. - Released: PVC deleted, but the PV is not yet reclaimed. - Failed: Reclamation failed. - Phase Transition Timestamp (Stable, v1.31): The status.lastPhaseTransitionTime field records when the PV last changed phases, aiding in debugging.

PersistentVolumeClaim (PVC) Specification

A PVC defines a user’s storage request, with spec and status fields.

Key Fields

  • accessModes: Matches the PV’s access modes (e.g., ReadWriteOnce).
  • volumeMode: Filesystem or Block, matching the PV.
  • resources.requests.storage: Specifies the desired storage size (e.g., 8Gi).
  • Note: For Filesystem volumes, this is the allocated size, which may be slightly reduced due to filesystem overhead (e.g., XFS metadata).
  • storageClassName: Requests a specific StorageClass or "" for no class. If unset, behavior depends on the DefaultStorageClass admission plugin.
  • selector: Filters PVs by labels (e.g., matchLabels, matchExpressions). Cannot be used with dynamic provisioning.
  • volumeName: Explicitly binds to a named PV, bypassing matching criteria except for validation.

Example PVC

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myclaim
spec:
  accessModes:
  - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 8Gi
  storageClassName: slow
  selector:
    matchLabels:
      release: "stable"

Explanation: This PVC requests 8Gi of storage with ReadWriteOnce access, preferring PVs from the slow StorageClass with a release: stable label.

StorageClass and Default Behavior

  • With DefaultStorageClass Admission Plugin:
  • A default StorageClass (marked with storageclass.kubernetes.io/is-default-class: true) is applied to PVCs without storageClassName.
  • PVCs with storageClassName: "" only bind to PVs with no class.
  • Without DefaultStorageClass:
  • PVCs without storageClassName or with storageClassName: "" bind to PVs with no class.
  • Retroactive assignment (Stable, v1.28) updates PVCs without storageClassName to the default StorageClass when one becomes available, unless storageClassName: "".
  • Note: Avoid using the deprecated volume.beta.kubernetes.io/storage-class annotation.

Reserving a PersistentVolume

To ensure a PVC binds to a specific PV, you can pre-bind them:

  1. PVC Specifies PV:
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: foo-pvc
      namespace: foo
    spec:
      storageClassName: ""
      volumeName: foo-pv
    
  2. Binds to foo-pv if it exists and is unbound, ignoring some matching criteria (e.g., node affinity).

  3. PV Reserves PVC:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: foo-pv
    spec:
      storageClassName: ""
      claimRef:
        name: foo-pvc
        namespace: foo
    

  4. Reserves foo-pv for foo-pvc, preventing other PVCs from binding.
  5. Useful for Retain policy PVs or reusing existing storage.

Explanation: Pre-binding ensures predictable storage allocation, especially in environments with limited PVs.

Expanding PersistentVolumeClaims (Stable, v1.24)

PVCs can be expanded to request more storage, provided the StorageClass’s allowVolumeExpansion is true. Supported volume types include csi, flexVolume (deprecated), and portworxVolume (deprecated).

Process

  1. Edit the PVC’s spec.resources.requests.storage to a larger size.
  2. The underlying PV is resized, without creating a new PV.
  3. For Filesystem volumes (e.g., XFS, ext4), expansion occurs when a Pod mounts the PVC in ReadWrite mode, either at Pod startup or if the filesystem supports online expansion.
  4. For Block volumes, resizing is immediate if supported by the driver.

Example StorageClass

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-vol-default
provisioner: vendor-name.example/magicstorage
allowVolumeExpansion: true

Example PVC Expansion

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myclaim
spec:
  resources:
    requests:
      storage: 10Gi  # Increased from 8Gi

Warnings: - Directly editing a PV’s capacity can prevent automatic resizing, as Kubernetes assumes the size is manually set. - Expansion requires CSI driver support for csi volumes.

Resizing In-Use PVCs

  • In-use PVCs expand automatically when the filesystem is resized, without needing to recreate the Pod.
  • Note: Unbound PVCs require a Pod to be created to trigger expansion.

Recovering from Expansion Failures

If expansion fails (e.g., requesting too much storage), the controller retries indefinitely. To recover: 1. Mark the PV as Retain. 2. Delete and recreate the PVC with a smaller size, setting volumeName to the PV’s name. 3. Restore the PV’s original reclaim policy.

Explanation: This manual intervention prevents infinite retries, allowing administrators to adjust storage requests.

Types of Persistent Volumes

PVs are implemented as plugins, with support for various storage backends. As of Kubernetes v1.33, supported types include:

  • csi: Container Storage Interface, the standard for external storage integration.
  • fc: Fibre Channel block storage.
  • hostPath: Node-local storage (for testing; not suitable for multi-node clusters).
  • iscsi: iSCSI storage.
  • local: Local storage devices, requiring nodeAffinity.
  • nfs: Network File System, supporting multiple writers.

Deprecated Types

These types are redirected to CSI drivers or unsupported: - awsElasticBlockStore: Use ebs.csi.aws.com (migration default since v1.23). - azureDisk: Use disk.csi.azure.com (migration default since v1.23). - azureFile: Use file.csi.azure.com (migration default since v1.24). - cinder: Use cinder.csi.openstack.org (migration default since v1.21). - flexVolume: Deprecated since v1.23; migrate to CSI. - gcePersistentDisk: Use pd.csi.storage.gke.io (migration default since v1.23). - portworxVolume: Use pxd.portworx.com (migration default since v1.31). - vsphereVolume: Use csi.vsphere.vmware.com (migration default since v1.25). - cephfs, rbd: Removed in v1.31. - glusterfs, flocker, quobyte, storageos: Removed in v1.25 or earlier.

Recommendation: Use CSI drivers for new deployments to ensure future compatibility.

Access Mode Support

Volume Plugin ReadWriteOnce ReadOnlyMany ReadWriteMany ReadWriteOncePod
csi Driver-dependent Driver-dependent Driver-dependent Driver-dependent
fc - -
hostPath - - -
iscsi - -
local - - -
nfs -

Note: ReadWriteOncePod requires CSI volumes and Kubernetes v1.22+ with updated CSI sidecars.

Advanced Features

Raw Block Volume Support (Stable, v1.18)

  • Purpose: Provides raw block devices without a filesystem, ideal for applications like databases.
  • Supported Plugins: csi, fc, iscsi, local.
  • Example:
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: block-pv
    spec:
      capacity:
        storage: 10Gi
      accessModes:
      - ReadWriteOnce
      volumeMode: Block
      persistentVolumeReclaimPolicy: Retain
      fc:
        targetWWNs: ["50060e801049cfd1"]
        lun: 0
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: block-pvc
    spec:
      accessModes:
      - ReadWriteOnce
      volumeMode: Block
      resources:
        requests:
          storage: 10Gi
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-with-block-volume
    spec:
      containers:
      - name: fc-container
        image: fedora:26
        volumeDevices:
        - name: data
          devicePath: /dev/xvda
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: block-pvc
    
  • Note: Use volumeDevices instead of volumeMounts for block volumes, specifying a device path.

Volume Snapshot and Restore (Stable, v1.20)

  • Purpose: Creates snapshots of CSI volumes for backup or restoration.
  • Example:
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: restore-pvc
    spec:
      storageClassName: csi-hostpath-sc
      dataSource:
        name: new-snapshot-test
        kind: VolumeSnapshot
        apiGroup: snapshot.storage.k8s.io
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
    
  • Note: Requires CSI drivers with snapshot support.

Volume Cloning (CSI Only)

  • Purpose: Creates a new PVC from an existing PVC’s data.
  • Example:
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: cloned-pvc
    spec:
      storageClassName: my-csi-plugin
      dataSource:
        name: existing-src-pvc-name
        kind: PersistentVolumeClaim
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
    

Volume Populators and Data Sources (Beta, v1.24)

  • Purpose: Allows custom controllers to populate PVCs with data from arbitrary sources using the dataSourceRef field.
  • Example:
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: populated-pvc
    spec:
      dataSourceRef:
        name: example-name
        kind: ExampleDataSource
        apiGroup: example.storage.k8s.io
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
    
  • Note: Requires the AnyVolumeDataSource feature gate and external populator controllers.

Cross-Namespace Data Sources (Alpha, v1.26)

  • Purpose: Allows referencing data sources (e.g., VolumeSnapshots) in another namespace.
  • Requirements:
  • Enable AnyVolumeDataSource and CrossNamespaceVolumeDataSource feature gates.
  • Create a ReferenceGrant in the source namespace.
  • Example:
    apiVersion: gateway.networking.k8s.io/v1beta1
    kind: ReferenceGrant
    metadata:
      name: allow-ns1-pvc
      namespace: default
    spec:
      from:
      - group: ""
        kind: PersistentVolumeClaim
        namespace: ns1
      to:
      - group: snapshot.storage.k8s.io
        kind: VolumeSnapshot
        name: new-snapshot-demo
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: foo-pvc
      namespace: ns1
    spec:
      dataSourceRef:
        apiGroup: snapshot.storage.k8s.io
        kind: VolumeSnapshot
        name: new-snapshot-demo
        namespace: default
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
    
  • Note: Requires the Gateway API’s ReferenceGrant resource.

Writing Portable Configuration

To create portable storage configurations for diverse Kubernetes clusters: 1. Include PVCs in your configuration (e.g., with Deployments, ConfigMaps). 2. Avoid including PVs, as users may lack permission to create them. 3. Allow users to specify a storageClassName for the PVC: - If provided, set persistentVolumeClaim.storageClassName to match. - If not provided, leave storageClassName unset to use the cluster’s default StorageClass. 4. Monitor PVCs for binding delays, indicating missing dynamic provisioning or storage systems, and alert users.

Example:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myclaim
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: ""  # Set by user or left unset

Best Practices

  1. Use Dynamic Provisioning: Prefer StorageClasses for automated PV creation, reducing manual overhead.
  2. Set Appropriate Reclaim Policies: Use Delete for cloud-backed storage and Retain for manual management or critical data.
  3. Leverage CSI Drivers: Transition from deprecated in-tree plugins to CSI for better support.
  4. Monitor Binding: Ensure PVCs bind promptly; unbound PVCs may indicate missing PVs or misconfigured StorageClasses.
  5. Secure Storage: Use ReadWriteOncePod for sensitive data requiring exclusive access.
  6. Test Expansion: Validate volume expansion in a staging environment, especially for in-use PVCs.

What’s Next

  • Explore StorageClasses for dynamic provisioning and advanced storage configuration.
  • Experiment with PVs and PVCs in a lab environment, such as deploying a stateful application like WordPress with MySQL.
  • Review CSI driver documentation for specific storage backend integrations.