Your Very Own Kubernetes ReadWriteMany Storage

UPDATE: A reader notified me that in ReadWriteOnce mode it is still possible for multiple Pods to mount the same volume, as long as they are running on the same node. The article has been updated accordingly. Thanks Dean!

Recently I was challenged with providing a PersistentVolume that supports the ReadWriteMany access mode to an application I wanted to install in my local Kubernetes cluster. While this sounds like just setting a simple property in the depths of a YAML file, the ReadWriteMany mode is fundamentally different from “normal” storage. If you are new to Linux and storage technologies in general, it can be confusing to navigate possible solutions to this problem. If you have just embarked on this journey, I hope to give you some helpful insights with this post. Follow along as I deploy Rook NFS in a Kubernetes cluster and use it to dynamically provision a ReadWriteMany volume.

Introduction

What is ReadWriteMany Used For?

In the Kubernetes world there are two access modes for volumes you will meet most of the time: ReadWriteOnce (RWO) and ReadWriteMany (RWX).

RWO volumes are the most common type of PersistentVolumes, which I would consider the default if nothing else is mentioned. A RWO volume can only be mounted by Pods running on a single node. Like a hard disk that is directly connected to a notebook, server or Virtual Machine, the processes in these Pods access the volume exclusively. For example on AWS, you’ll get an Elastic Block Store (EBS) volume when a RWO volume is provisioned. As mentioned, you could use the same RWO volume for multiple Pods if they are scheduled to the same node, but for many use cases this will not work, because you want to deliberately schedule Pods to different nodes to make your workload highly available.

RWX volumes on the other hand are intended to be used by multiple Pods simultaneously. A typical use case would be a set of Pods all running an instance of the same application (let’s say a gallery app to share photos), where all these app instances of course want to be able to deliver the same set of images to the end user. Also, all of these Pods may want to write newly uploaded images to the same storage. To make this work, in contrast to a RWO volume, we need an additional control layer that manages concurrent read and write requests to the shared volume. This is commonly implemented with a network file system such as NFS. On AWS, you would use the Elastic File System (EFS) to accomplish this.

Besides RWO and RWX there is also a ReadOnlyMany mode, which essentially places the same requirements on the storage service as ReadWriteMany does.
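To make this concrete: the access mode is requested right on the PersistentVolumeClaim. Here is a minimal sketch (the name and size are purely illustrative, not part of this post’s setup):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-images            # illustrative name
spec:
  accessModes:
    - ReadWriteMany              # could also be ReadWriteOnce or ReadOnlyMany
  resources:
    requests:
      storage: 5Gi               # illustrative size

Whether such a claim can actually be fulfilled depends on a StorageClass in the cluster that backs RWX volumes – which is exactly the problem this post is about.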

[Figure: A Rough Comparison of ReadWriteOnce & ReadWriteMany Volumes]

The Solution

It’s easy if you run a managed Kubernetes cluster in the cloud. The cloud vendors have already solved this problem for you and offer compatible storage services. On your local box, it’s not that easy. This is where a network file system comes into play:

NFS allows remote hosts to mount file systems over a network and interact with those file systems as though they are mounted locally. This enables system administrators to consolidate resources onto centralized servers on the network.

Rook NFS Guide

A possible solution could be running an NFS Server in parallel to your Kubernetes cluster and integrating it using the nfs-client-provisioner. I don’t like this solution, because:

  • … the NFS volumes need to be prepared manually before you can use them in your cluster.
  • … it requires interaction with an external component (the NFS Server) you have to handle yourself.
  • … using an external component outside the cluster makes this solution not very portable.

This is where Rook NFS comes into play. It consists of the following pieces:

  • It comes with an NFS Server which takes care of creating, deleting and modifying the volumes.
  • There are capabilities for Dynamic Volume Provisioning, which I don’t want to miss.
  • The NFS Server part is managed by an Operator.
  • Apart from a few requirements on the nodes, all related parts run as Kubernetes cluster workloads.

The Rook project provides many solutions to work with containerized storage. For our purpose we only need Rook NFS.

Getting Rook NFS to Work

There is a detailed guide in the official docs. However, I found the guide a little confusing, which is why I took some shortcuts setting up Rook NFS. Find the latest and greatest official guide here.

Install NFS Utilities on the Actual Node(s)

On the operating system level you need to install the NFS client packages. This is easy for the major Linux distributions:

# Ubuntu, Debian
sudo apt update
sudo apt install -y nfs-common

# RedHat, CentOS
sudo yum check-update
sudo yum -y install nfs-utils

The NFS client packages are not an NFS Server, but just a set of tools the cluster nodes need to communicate with remote NFS storage.

Clone Sample Manifests from GitHub

The Rook project offers sample YAML manifests for the several flavors of Rook deployments. At the time I’m writing this post, 1.5.4 was the latest version of Rook NFS available. Thus we grab the corresponding sample manifests from the Rook NFS repository on GitHub:

git clone --branch v1.5.4 https://github.com/rook/rook

From now on all further steps will happen in the cluster.

Install a ReadWriteOnce Provisioner

You’ve read correctly. Rook NFS will take care of providing ReadWriteMany volumes, but eventually Rook NFS itself also needs some storage where it can put all the data that it manages as ReadWriteMany volumes. For this purpose you could of course create a hostPath volume manually once, but it is way easier to use a provisioner. Here I want to recommend the Local Path Provisioner built by the Rancher team. Install its latest release:

kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml

Wait for the local-path-provisioner Pod to become ready:

kubectl get pod -A -l app=local-path-provisioner

Shows:

NAMESPACE     NAME                                     READY
kube-system   local-path-provisioner-6d59f47c7-wmmss   1/1

Now you can get hostPath volumes automatically provisioned, including the volumes we now need for Rook NFS.
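If you want to double-check, the installation should also have registered a StorageClass named local-path, which we will reference later for the NFS Server’s backend storage:

kubectl get storageclass local-path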

Install the NFS Operator

The main component of Rook NFS is the NFS Operator. This operator will take care of creating and managing NFS volumes, which can be used in ReadWriteMany mode.

Following the steps in the official guidelines, deploy the Operator by cloning and applying the YAML manifests:

kubectl apply -f rook/cluster/examples/kubernetes/nfs/common.yaml
kubectl apply -f rook/cluster/examples/kubernetes/nfs/operator.yaml

Let’s check if the Rook NFS Operator has been installed:

kubectl get pods -n rook-nfs-system 

This will show us a single Pod:

NAME                                READY   STATUS    RESTARTS   AGE
rook-nfs-operator-f45cfcd7d-cgx5l   1/1     Running   0          4m34s

Create Pod Security Policies

The official docs recommend configuring a set of Pod Security Policies (PSPs). If you are using a shared Kubernetes cluster, this makes sure that Rook NFS runs with a sufficient set of permissions in the cluster. For example, my local k3s cluster does not have any default PSPs in place, which is why I don’t have to add the recommended PSPs. If the command kubectl get podsecuritypolicies.policy -A returns results, you should definitely think about adding the policies like this:

kubectl apply -f rook/cluster/examples/kubernetes/nfs/psp.yaml

Create RBAC Resources for the NFS Server

The Rook NFS Server is not only a simple NFS Server, but also includes a Kubernetes Volume Provisioner. This enables the NFS Server to provision NFS volumes dynamically when requested by a workload. Dynamic Volume Provisioning requires a set of specific permissions, which has to be created before we can install the NFS Server:

  • A separate Namespace (in addition to the rook-nfs-system used by the operator)
  • A dedicated ServiceAccount including the required ClusterRole and ClusterRoleBinding

For this purpose there is another manifest we can use:

kubectl apply -f rook/cluster/examples/kubernetes/nfs/rbac.yaml
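Roughly speaking, this manifest creates the rook-nfs namespace, a ServiceAccount and the cluster-wide permissions a dynamic provisioner typically needs. The following is only a simplified sketch to illustrate the idea – the resource names and the exact rules are assumptions, the actual rbac.yaml in the repository is authoritative:

apiVersion: v1
kind: Namespace
metadata:
  name: rook-nfs
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rook-nfs-server               # name assumed for illustration
  namespace: rook-nfs
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: rook-nfs-provisioner-runner   # name assumed for illustration
rules:
  # a dynamic provisioner typically manages PersistentVolumes ...
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  # ... watches the claims it has to fulfill ...
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  # ... and reads the StorageClasses it is responsible for
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: rook-nfs-provisioner-runner   # name assumed for illustration
subjects:
  - kind: ServiceAccount
    name: rook-nfs-server
    namespace: rook-nfs
roleRef:
  kind: ClusterRole
  name: rook-nfs-provisioner-runner
  apiGroup: rbac.authorization.k8s.io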

At this point there are some optional steps you could prepare, but they are not required for our simple setup.

Install the NFS Server

Finally it’s time to launch the actual NFS Server. Rook NFS offers different types of NFS Servers depending on the backend storage to be used for the NFS volumes (e.g. Ceph, XFS). For a simple way to get our hands on ReadWriteMany volumes, we can ignore these. To install the NFS Server, we will create two resources:

  • a PersistentVolumeClaim as backend for the NFS Server
  • the NFS Server itself

The PersistentVolumeClaim will be used by the NFS Server to store all the ReadWriteMany volumes. Therefore, it needs to be big enough in size to be able to persist all the volumes you plan to provision with Rook NFS. For this purpose we can use a hostPath volume in ReadWriteOnce mode provisioned by the Local Path Provisioner we installed earlier.

At first, set the total size we want to allocate for the NFS Server’s backend storage:

NFS_SIZE=10Gi

Also, we need to define that we want to use the local-path StorageClass for the backend storage:

RWO_STORAGE_CLASS=local-path

With the next command, we take the sample manifest and use the previously set variables to modify it before it gets applied to the cluster:

cat rook/cluster/examples/kubernetes/nfs/nfs.yaml \
    | sed "s/ReadWriteMany/ReadWriteOnce/g" \
    | sed "s/storage: 1Gi/storage: $NFS_SIZE/g" \
    | sed "/accessModes/i \ \ storageClassName: $RWO_STORAGE_CLASS" \
    | kubectl apply -f -
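For orientation, the manifest that ends up being applied looks roughly like this. Treat it as a sketch: the storage class, access mode and size reflect the sed replacements above, while the remaining names and fields are assumed from the upstream sample:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-default-claim        # name assumed from the upstream sample
  namespace: rook-nfs
spec:
  storageClassName: local-path   # inserted by the sed command above
  accessModes:
    - ReadWriteOnce              # rewritten from ReadWriteMany
  resources:
    requests:
      storage: 10Gi              # rewritten from 1Gi to $NFS_SIZE
---
apiVersion: nfs.rook.io/v1alpha1
kind: NFSServer
metadata:
  name: rook-nfs
  namespace: rook-nfs
spec:
  replicas: 1
  exports:
    - name: share1               # the export the StorageClass will refer to
      server:
        accessMode: ReadWrite
        squash: "none"
      persistentVolumeClaim:
        claimName: nfs-default-claim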

Let’s see how the NFS Server is doing:

kubectl get nfsservers.nfs.rook.io -n rook-nfs

This will list a single CustomResource of kind nfsservers.nfs.rook.io:

NAME       AGE   STATE
rook-nfs   18s   Running

To display the related Pods run:

kubectl get pod -n rook-nfs

The output shows one NFS Server Pod named rook-nfs-0 running two containers:

NAME         READY   STATUS    RESTARTS   AGE
rook-nfs-0   2/2     Running   0          112s

The NFS Server comes along with the NFS Volume Provisioner, which allows workloads to request NFS volumes via Dynamic Volume Provisioning:

kubectl get pod -n rook-nfs \
    -o=custom-columns="POD:.metadata.name,CONTAINER:.status.containerStatuses[*].name,READY:.status.containerStatuses[*].ready"

Shows:

POD          CONTAINER                   READY
rook-nfs-0   nfs-provisioner,nfs-server  true,true

Create a StorageClass

Now that Rook NFS is running, as a last step we need to define a StorageClass which we can refer to when we request PersistentVolumes for workloads:

kubectl apply -f rook/cluster/examples/kubernetes/nfs/sc.yaml

When we describe the created StorageClass, we can see that it is tied to the NFS Server we just deployed and provisions all volumes on an export called share1:

kubectl describe sc rook-nfs-share1

Shows:

Name:            rook-nfs-share1
[...]
Provisioner:           nfs.rook.io/rook-nfs-provisioner
Parameters:            exportName=share1,nfsServerName=rook-nfs,nfsServerNamespace=rook-nfs
AllowVolumeExpansion:  <unset>
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     Immediate
Events:                <none>
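Reconstructed from this output, the StorageClass we just applied looks roughly like the following (a sketch based on the describe output, not a verbatim copy of sc.yaml):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-nfs-share1
provisioner: nfs.rook.io/rook-nfs-provisioner
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  exportName: share1             # the export defined by the NFSServer resource
  nfsServerName: rook-nfs
  nfsServerNamespace: rook-nfs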

Notice: On some Kubernetes platforms I had to force usage of NFS v4 to make it possible for Pods to mount an NFS volume. This is possible by adding a mount option to the StorageClass:

apiVersion: storage.k8s.io/v1
kind: StorageClass
[...]
mountOptions: ["vers=4"]
[...]

Find more details on GitHub in two specific issues.

Test the Installation With an Example Use Case

With Rook NFS ready to go, we can run a small demo scenario. The scenario will consist of two pieces:

  • a PersistentVolumeClaim (PVC) using Rook NFS supporting ReadWriteMany access
  • a BusyBox Deployment of five replicas, all writing data to the PVC

This means that we will have five Pods in total, all of which will be able to read from and write to the same volume.

Create a New Namespace

Deploying our test workloads into a separate namespace makes it easier to filter for Pods with kubectl later on:

kubectl create namespace test

Create a ReadWriteMany Volume

We will create a very small NFS volume (1 MB) for testing purposes in the new namespace:

kubectl apply -n test -f rook/cluster/examples/kubernetes/nfs/pvc.yaml
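For reference, the claim we just applied looks roughly like this (a sketch; the claim name, the StorageClass and the 1 MB size correspond to what we see in the following outputs):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rook-nfs-pv-claim
spec:
  storageClassName: rook-nfs-share1   # the StorageClass created in the previous step
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi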

The command kubectl describe pvc -n test rook-nfs-pv-claim shows the successful provisioning of the volume:

[...]
Successfully provisioned volume pvc-a19b975c-0f9b-4b10-afba-147bcbd58ab3

And – what we’ve been waiting for the whole time: kubectl get pvc -n test rook-nfs-pv-claim shows the volume with access mode RWX!

NAME                [...]   VOLUME                                    [...]   ACCESS MODES
rook-nfs-pv-claim   [...]   pvc-a19b975c-0f9b-4b10-afba-147bcbd58ab3  [...]   RWX         

Consume the Volume With a Workload

Now that we have an empty volume ready to be mounted by some Pods, we will create our workload. The workload consists of the Deployment resource below. The entire snippet can be executed as one single command:

cat <<EOF | kubectl apply -n test -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nfs-demo
  name: nfs-busybox
spec:
  replicas: 5
  selector:
    matchLabels:
      app: nfs-demo
  template:
    metadata:
      labels:
        app: nfs-demo
    spec:
      volumes:
      - name: rook-nfs-vol
        persistentVolumeClaim:
          claimName: rook-nfs-pv-claim
      terminationGracePeriodSeconds: 5
      containers:
      - image: busybox
        imagePullPolicy: IfNotPresent
        name: busybox
        volumeMounts:
        - name: rook-nfs-vol
          mountPath: "/mnt"
        command:
          - sh 
        args:
          - -c
          - |
            while true; do
              echo "\$(date) \$(hostname)" > /mnt/shared.log
              sleep $(($RANDOM % 5 + 5))
            done
EOF

Listing all Pods in our test namespace with kubectl get pods -n test, we see five Pods running as expected:

NAME                           READY   STATUS    RESTARTS   AGE
nfs-busybox-6b9fd44b95-qk28n   1/1     Running   0          103s
nfs-busybox-6b9fd44b95-j7h27   1/1     Running   0          103s
nfs-busybox-6b9fd44b95-hds8m   1/1     Running   0          103s
nfs-busybox-6b9fd44b95-qx7xj   1/1     Running   0          100s
nfs-busybox-6b9fd44b95-wmhss   1/1     Running   0          99s

All of the nfs-busybox Pods frequently write their own name into the same text file on the shared RWX volume. To verify this behavior, let’s have a look into the file using kubectl exec on one of the nfs-busybox Pods.

kubectl exec \
    -n test $(kubectl get pods -n test -o=jsonpath='{.items[0].metadata.name}') \
    -- cat /mnt/shared.log

The first run will yield:

Sun Dec 27 21:06:55 UTC 2020 nfs-busybox-6b9fd44b95-j7h27

If we run this command once again a little later (approximately 10 seconds), another Pod will have written its name into the file:

Sun Dec 27 21:08:08 UTC 2020 nfs-busybox-6b9fd44b95-hds8m

Conclusion

For production usage, there are many more steps to be taken. In this post we haven’t considered:

  • a backup strategy.
  • high-availability for Rook NFS components.
  • scalability for use cases with a high number of Pods accessing the same volume.
  • performance tuning for workloads with high I/O.
  • using re-sizable volumes for the NFS Server’s backend storage.

We have to keep in mind that most of the cloud providers already offer RWX StorageClasses “straight from the factory” for their managed Kubernetes clusters. For working locally (k3s, minikube, microk8s, …) on Linux machines, the Rook NFS Provisioner is a good choice. I found that other storage solutions for Kubernetes (such as Longhorn or OpenEBS) also provide RWX storage classes, but they are most likely more resource intensive.

I think it’s worth automating the steps described in this blog post for your local Kubernetes development box. It comes in very handy for scenarios in which an application demands RWX volumes.
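A minimal sketch of such an automation script, simply stringing together the commands used in this post (it does not wait for the individual components to become ready, so you may want to add checks between the steps):

#!/usr/bin/env bash
set -euo pipefail

ROOK_VERSION=v1.5.4
NFS_SIZE=10Gi
RWO_STORAGE_CLASS=local-path

# RWO provisioner for the NFS Server's backend storage
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml

# Rook NFS operator and RBAC resources
git clone --branch "$ROOK_VERSION" https://github.com/rook/rook
kubectl apply -f rook/cluster/examples/kubernetes/nfs/common.yaml
kubectl apply -f rook/cluster/examples/kubernetes/nfs/operator.yaml
kubectl apply -f rook/cluster/examples/kubernetes/nfs/rbac.yaml

# NFS Server backed by a local-path RWO volume
cat rook/cluster/examples/kubernetes/nfs/nfs.yaml \
    | sed "s/ReadWriteMany/ReadWriteOnce/g" \
    | sed "s/storage: 1Gi/storage: $NFS_SIZE/g" \
    | sed "/accessModes/i \ \ storageClassName: $RWO_STORAGE_CLASS" \
    | kubectl apply -f -

# StorageClass for ReadWriteMany volumes
kubectl apply -f rook/cluster/examples/kubernetes/nfs/sc.yaml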


This is it for today. Now go enjoy your ReadWriteMany volumes! 🙂
