UPDATE: A reader notified me that in ReadWriteOnce mode it is still possible for multiple Pods to mount the same volume, as long as they are running on the same node. The article has been updated accordingly. Thanks Dean!
Recently I was challenged with providing a PersistentVolume which supports ReadWriteMany as access mode to an application I wanted to install in my local Kubernetes cluster. While this sounds like just setting a simple property in the depths of a YAML file, the ReadWriteMany mode is fundamentally different from “normal” storage. If you are new to Linux and storage technologies in general, it can be confusing trying to navigate the possible solutions to this problem. If you have just embarked on this journey, I hope to give you some helpful insights with this post. Follow me along as I deploy Rook NFS in a Kubernetes cluster and use it to dynamically provision a ReadWriteMany volume.
Introduction
What is ReadWriteMany Used For?
In the Kubernetes world there are two main types of volumes you will meet most of the time: ReadWriteOnce (RWO) and ReadWriteMany (RWX) volumes.
RWO volumes are the most common type of PersistentVolume, and I would consider them the default if nothing else is mentioned. An RWO volume can only be mounted by a single Pod, unless the Pods are running on the same node. Like a hard disk that is directly connected to a notebook, server or Virtual Machine, the processes in this Pod access this volume exclusively. For example on AWS, you’ll get an Elastic Block Store (EBS) volume when an RWO volume is provisioned. As mentioned, you could use the same RWO volume for multiple Pods if they are scheduled to the same node, but for many use cases this will not work, because you want to deliberately schedule Pods to different nodes to make your workload highly available.
RWX volumes on the other hand are intended to be used by multiple Pods simultaneously. A typical use case would be a set of Pods all running an instance of the same application (let’s say a gallery app to share photos), where all these app instances of course want to be able to deliver the same set of images to the end user. Also, all of these Pods may want to write newly uploaded images to the same storage. To make this work, in contrast to an RWO volume, we need an additional control layer that manages concurrent read and write requests to the shared volume. This is commonly provided by a Network File System (NFS). On AWS, you would need Elastic File System (EFS) to accomplish this.
Besides RWO and RWX there is also a ReadOnlyMany type, which places essentially the same requirements on the storage service as a ReadWriteMany volume does.
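To make this more concrete, the access mode is just one field of a PersistentVolumeClaim. The following snippet is only a sketch with made-up names to show where the RWX setting lives; it requires a StorageClass that actually supports this mode, which is exactly what the rest of this post is about:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-images              # hypothetical name, for illustration only
spec:
  accessModes:
    - ReadWriteMany                # the mode this post is all about
  resources:
    requests:
      storage: 5Gi
  storageClassName: some-rwx-class # placeholder; must be a class that supports RWX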
The Solution
It’s easy if you run on a managed Kubernetes cluster on the cloud. The cloud vendors already solved this problem for you and offer compatible storage services. On your local box, it’s not that easy. This is where a Network Filesystem comes into play:
NFS allows remote hosts to mount file systems over a network and interact with those file systems as though they are mounted locally. This enables system administrators to consolidate resources onto centralized servers on the network.
Rook NFS Guide
A possible solution could be running an NFS Server in parallel to your Kubernetes cluster and integrating it using the nfs-client-provisioner. I don’t like this solution, because:
- … the NFS volumes need to be prepared manually by yourself before you can use them in your cluster.
- … it requires interaction with an external component (the NFS Server) you have to handle yourself.
- … using an external component outside the cluster makes this solution not very portable.
This is where Rook NFS comes into play. It consists of the following pieces:
- It comes with a NFS Server which takes care of creating, deleting and modifying the volumes.
- There are capabilities for Dynamic Volume Provisioning, which I don’t want to miss.
- The NFS Server part is managed by an Operator.
- Apart from a few requirements on the nodes, all related parts run as workloads inside the Kubernetes cluster.
The Rook project provides many solutions to work with containerized storage. For our purpose we only need Rook NFS.
Getting Rook NFS to Work
There is a detailed guide in the official docs. However, I found the guide a little confusing, which is why I took some shortcuts setting up Rook NFS. Find the latest and greatest official guide here.
Install NFS Utilities on the Actual Node(s)
On the operating system level you need to install the NFS client packages. This is easy for the major Linux distributions:
# Ubuntu, Debian
sudo apt update
sudo apt install -y nfs-common

# RedHat, CentOS
sudo yum check-update
sudo yum -y install nfs-utils
The NFS client packages are not an NFS Server, but just a set of tools required by the cluster nodes to communicate with remote NFS storage.
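If you want to double-check that the client tools actually landed on a node, looking for the NFS mount helper is enough (the exact path may vary slightly between distributions):

# the mount helper ships with nfs-common (Debian/Ubuntu) or nfs-utils (RedHat/CentOS)
ls /sbin/mount.nfs /usr/sbin/mount.nfs 2>/dev/null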
Clone Sample Manifests from GitHub
The Rook project offers sample YAML manifests for the several flavors of Rook deployments. At the time I’m writing this post, 1.5.4 is the latest version of Rook NFS available. Thus we grab the corresponding sample manifests from the Rook NFS repository on GitHub:
git clone --branch v1.5.4 https://github.com/rook/rook
From now on all further steps will happen in the cluster.
Install a ReadWriteOnce Provisioner
You’ve read correctly. Rook NFS will take care of providing ReadWriteMany volumes, but eventually Rook NFS itself also needs some storage where it can put all the data that it manages as ReadWriteMany volumes. You could of course manually create a hostPath volume for this once, but it is way easier to use a provisioner. Here I want to recommend the Local Path Provisioner built by the Rancher team. Install its latest release:
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml
Wait for the local-path-provisioner Pod to become ready:
kubectl get pod -A -l app=local-path-provisioner
Shows:
NAMESPACE     NAME                                     READY
kube-system   local-path-provisioner-6d59f47c7-wmmss   1/1
Now you can get hostPath volumes automatically provisioned, including the volumes we now need for Rook NFS.
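Just to illustrate the mechanism: a claim only needs to reference the local-path StorageClass and the provisioner will create a hostPath-backed volume on demand. The claim below is a sketch with a made-up name; we won’t actually apply it, because Rook NFS will create a similar claim for us later:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-local-claim     # hypothetical name, not used later in this post
spec:
  storageClassName: local-path
  accessModes:
    - ReadWriteOnce          # hostPath-backed volumes are plain RWO storage
  resources:
    requests:
      storage: 1Gi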
Install the NFS Operator
The main component of Rook NFS is the NFS Operator. This operator will take care of creating and managing NFS volumes, which can be used in ReadWriteMany mode.
Following the steps in the official guidelines, deploy the Operator by cloning and applying the YAML manifests:
kubectl apply -f rook/cluster/examples/kubernetes/nfs/common.yaml
kubectl apply -f rook/cluster/examples/kubernetes/nfs/operator.yaml
Let’s check if the Rook NFS Operator has been installed:
kubectl get pods -n rook-nfs-system
This will show us a single Pod:
NAME                                READY   STATUS    RESTARTS   AGE
rook-nfs-operator-f45cfcd7d-cgx5l   1/1     Running   0          4m34s
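The operator also registers a CustomResourceDefinition for NFS servers, which we will use in a moment. If you like, verify that it is present:

kubectl get crd nfsservers.nfs.rook.io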
Create Pod Security Policies
The official docs recommend configuring a set of Pod Security Policies (PSPs). If you are using a shared Kubernetes cluster, this makes sure that Rook NFS runs with a sufficient set of permissions in the cluster. For example, my local k3s cluster does not have any default PSPs in place, which is why I don’t have to add the recommended PSPs. If the command kubectl get podsecuritypolicies.policy -A returns results, you should definitely think about adding the policies like this:
kubectl apply -f rook/cluster/examples/kubernetes/nfs/psp.yaml
Create RBAC Resources for the NFS Server
The Rook NFS Server is not only a simple NFS Server, but also includes a Kubernetes Volume Provisioner. This enables the NFS Server to provision NFS volumes dynamically when requested by a workload. For Dynamic Volume Provisioning a set of specific permissions is required, which has to be created before we can install the NFS Server:
- A separate Namespace (in addition to the rook-nfs-system namespace used by the operator)
- A dedicated ServiceAccount including the required ClusterRole and ClusterRoleBinding
For this purpose there is another manifest we can use:
kubectl apply -f rook/cluster/examples/kubernetes/nfs/rbac.yaml
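To confirm the RBAC manifest did its job, you can check for the new namespace and the ServiceAccount it contains (resource names may differ slightly between Rook versions):

kubectl get namespace rook-nfs
kubectl get serviceaccounts -n rook-nfs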
At this point there are some optional steps you could prepare, but they are not required for our simple setup.
Install the NFS Server
Finally it’s time to launch the actual NFS Server. Rook NFS offers different types of NFS Servers depending on the backend storage to be used for the NFS volumes (e.g. Ceph, XFS). For a simple way to get our hands on ReadWriteMany volumes, we can ignore these. To install the NFS Server, we will create two resources:
- a PersistentVolumeClaim as backend for the NFS Server
- the NFS Server itself
The PersistentVolumeClaim will be used by the NFS Server to store all the ReadWriteMany volumes. Therefore, it needs to be big enough to hold all the volumes you plan to provision with Rook NFS. For this purpose we can use a hostPath volume in ReadWriteOnce mode, provisioned by the Local Path Provisioner we installed earlier.
At first, set the total size we want to allocate for the NFS Server’s backend storage:
NFS_SIZE=10Gi
Also, we need to define that we want to use the local-path StorageClass for the backend storage:
RWO_STORAGE_CLASS=local-path
With the next command, we take the sample manifest and use the previously set variables to modify it before it gets applied to the cluster:
cat rook/cluster/examples/kubernetes/nfs/nfs.yaml \
  | sed "s/ReadWriteMany/ReadWriteOnce/g" \
  | sed "s/storage: 1Gi/storage: $NFS_SIZE/g" \
  | sed "/accessModes/i \ \ storageClassName: $RWO_STORAGE_CLASS" \
  | kubectl apply -f -
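For transparency, this is roughly what the manifest looks like after the sed substitutions. The resource names are taken from the Rook sample (the claim name and API version may differ slightly in other releases), so treat this as a sketch rather than something to apply directly:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-default-claim          # name as used in the Rook sample manifest
  namespace: rook-nfs
spec:
  storageClassName: local-path     # inserted by the sed command
  accessModes:
    - ReadWriteOnce                # changed from ReadWriteMany
  resources:
    requests:
      storage: 10Gi                # changed from 1Gi via NFS_SIZE
---
apiVersion: nfs.rook.io/v1alpha1
kind: NFSServer
metadata:
  name: rook-nfs
  namespace: rook-nfs
spec:
  replicas: 1
  exports:
    - name: share1                 # the export our StorageClass will point to
      server:
        accessMode: ReadWrite
        squash: "none"
      persistentVolumeClaim:
        claimName: nfs-default-claim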
Let’s see how the NFS Server is doing:
kubectl get nfsservers.nfs.rook.io -n rook-nfs
This will list a single CustomResource of kind nfsservers.nfs.rook.io:
NAME       AGE   STATE
rook-nfs   18s   Running
To display the related Pods run:
kubectl get pod -n rook-nfs
The output shows one NFS Server Pod named rook-nfs-0 running two containers:
NAME         READY   STATUS    RESTARTS   AGE
rook-nfs-0   2/2     Running   0          112s
The NFS Server comes along with the NFS Volume Provisioner, which allows workloads to request NFS volumes via Dynamic Volume Provisioning:
kubectl get pod -n rook-nfs \
  -o=custom-columns="POD:.metadata.name,CONTAINER:.status.containerStatuses[*].name,READY:.status.containerStatuses[*].ready"
Shows:
POD          CONTAINER                    READY
rook-nfs-0   nfs-provisioner,nfs-server   true,true
Create a StorageClass
Now that Rook NFS is running, as a last step we need to define a StorageClass that we can refer to when we request volumes for workloads:
kubectl apply -f rook/cluster/examples/kubernetes/nfs/sc.yaml
When we describe the created StorageClass, we can see that it is tied to the NFS Server we just deployed and will export all volumes to a share called share1:
kubectl describe sc rook-nfs-share1
Shows:
Name:                  rook-nfs-share1
[...]
Provisioner:           nfs.rook.io/rook-nfs-provisioner
Parameters:            exportName=share1,nfsServerName=rook-nfs,nfsServerNamespace=rook-nfs
AllowVolumeExpansion:  <unset>
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     Immediate
Events:                <none>
Notice: On some Kubernetes platforms I had to force usage of NFS v4 to make it possible for Pods to mount an NFS volume. This is possible by adding a mount option to the StorageClass:
apiVersion: storage.k8s.io/v1
kind: StorageClass
[...]
mountOptions: ["vers=4"]
[...]
Find more details on GitHub in two specific issues.
Test the Installation With an Example Use Case
With Rook NFS ready to go, we can run a small demo scenario. The scenario will consist of two pieces:
- a PersistentVolumeClaim (PVC) using Rook NFS supporting ReadWriteMany access
- a BusyBox Deployment of five replicas, all writing data to the PVC
This means that we will have five Pods in total, all of which will be able to read from and write to the same volume.
Create a New Namespace
Deploying our test workloads into a separate namespace makes it easier to filter for Pods with kubectl later on:
kubectl create namespace test
Create a ReadWriteMany Volume
We will create a very small NFS volume (1 MB) for testing purposes in the new namespace:
kubectl apply -n test -f rook/cluster/examples/kubernetes/nfs/pvc.yaml
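For reference, the sample pvc.yaml essentially boils down to a claim like this (the names match what we will see in the output below; the exact file contents may differ slightly between Rook versions):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rook-nfs-pv-claim
spec:
  storageClassName: "rook-nfs-share1"   # the StorageClass we created above
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi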
The command kubectl describe pvc -n test rook-nfs-pv-claim shows the successful provisioning of the volume:
[...] Successfully provisioned volume pvc-a19b975c-0f9b-4b10-afba-147bcbd58ab3
And – what we’ve been waiting for all along: kubectl get pvc -n test rook-nfs-pv-claim shows the volume with access mode RWX!
NAME                [...]   VOLUME                                     [...]   ACCESS MODES
rook-nfs-pv-claim   [...]   pvc-a19b975c-0f9b-4b10-afba-147bcbd58ab3   [...]   RWX
Consume the Volume With a Workload
Now that we have an empty volume ready to be mounted to some Pods, we will create our workload. The workload consists of the below Deployment resource. The entire snippet can be executed as one single command:
cat <<EOF | kubectl apply -n test -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nfs-demo
  name: nfs-busybox
spec:
  replicas: 5
  selector:
    matchLabels:
      app: nfs-demo
  template:
    metadata:
      labels:
        app: nfs-demo
    spec:
      volumes:
      - name: rook-nfs-vol
        persistentVolumeClaim:
          claimName: rook-nfs-pv-claim
      terminationGracePeriodSeconds: 5
      containers:
      - image: busybox
        imagePullPolicy: IfNotPresent
        name: busybox
        volumeMounts:
        - name: rook-nfs-vol
          mountPath: "/mnt"
        command:
        - sh
        args:
        - -c
        - |
          while true; do
            echo "\$(date) \$(hostname)" > /mnt/shared.log
            sleep $(($RANDOM % 5 + 5))
          done
EOF
Listing all Pods in our test namespace with kubectl get pods -n test, we see five Pods running as expected:
NAME                           READY   STATUS    RESTARTS   AGE
nfs-busybox-6b9fd44b95-qk28n   1/1     Running   0          103s
nfs-busybox-6b9fd44b95-j7h27   1/1     Running   0          103s
nfs-busybox-6b9fd44b95-hds8m   1/1     Running   0          103s
nfs-busybox-6b9fd44b95-qx7xj   1/1     Running   0          100s
nfs-busybox-6b9fd44b95-wmhss   1/1     Running   0          99s
All of the nfs-busybox Pods frequently write their own name into the same text file on the shared RWX volume. To verify this behavior, let’s have a look into the file using kubectl exec on one of the nfs-busybox Pods:
kubectl exec \
  -n test $(kubectl get pods -n test -o=jsonpath='{.items[0].metadata.name}') \
  -- cat /mnt/shared.log
The first run will yield:
Sun Dec 27 21:06:55 UTC 2020 nfs-busybox-6b9fd44b95-j7h27
If we run this command once again a little later (approximately 10 seconds), another Pod will have written its name into the file:
Sun Dec 27 21:08:08 UTC 2020 nfs-busybox-6b9fd44b95-hds8m
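If you want to watch the handover between Pods without retyping the command, a small shell loop works just as well (this is merely a convenience wrapper around the same kubectl exec call):

for i in 1 2 3 4 5; do
  kubectl exec \
    -n test $(kubectl get pods -n test -o=jsonpath='{.items[0].metadata.name}') \
    -- cat /mnt/shared.log
  sleep 10
done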
Conclusion
For production usage, there are many more steps to be taken. In this post we haven’t considered:
- a backup strategy.
- high-availability for Rook NFS components.
- scalability for use cases with a high number of Pods accessing the same volume.
- performance tuning for workloads with high I/O.
- using re-sizable volumes for the NFS Server’s backend storage.
We have to keep in mind that most cloud providers already offer RWX StorageClasses “straight from the factory” for their managed Kubernetes clusters. For working locally (k3s, minikube, microk8s, …) on Linux machines, the Rook NFS Provisioner is a good choice. I found that other storage solutions for Kubernetes (such as Longhorn or OpenEBS) also provide RWX StorageClasses, but they are most likely more resource intensive.
I think it’s worth automating the steps described in this blog post for your local Kubernetes development box. It comes in very handy for scenarios in which an application demands RWX volumes.
This is it for today. Now go enjoy your ReadWriteMany volumes! 🙂
2 Comments
Hi,
have you tried whether the PVC can be provisioned by the local-path provisioner directly, without going through Rook NFS?
To my understanding, this is not supported by the local-path-provisioner (see this issue in the repo: https://github.com/rancher/local-path-provisioner/issues/70), which is why I had to resort to Rook NFS.