Preparing a K8s Persistent Volume Before the Pod Starts

Wade Rossmann
3 min read · Mar 21, 2023

By the time you’ve landed at this article you’ve probably seen a dozen variations on “just kubectl cp ...” which all seem to assume that the pod is already up and running. But what if you want to pre-load some data or otherwise prep the volume? What if the pod won’t start without this prep? What if the data you want to modify is locked by a process in the running pod? This article seeks to address those cases.

There are 3 cases that I suppose your needs will fall into:

1. Prepping a volume with basic permissions

If the container is starting up with a non-root user, then you’re going to want to set the securityContext in the Pod spec, which defines the UID and GID under which the Pod processes will be executing. Eg:

spec.securityContext:
  runAsUser: 1234
  runAsGroup: 1234
  fsGroup: 1234

As I understand it, you do not necessarily need to specify these values unless you want to override what is specified in the image/Dockerfile, but k8s does not necessarily have knowledge about these values at the time it is mounting storage to the Pod, so we must now be explicit.

Aka: “This worked just fine until I attached the PVC!”

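In order to determine what these values should be you will need to consult the particular application’s docs, or inspect the image and look for the USER directive, which can name both a user and a group in the form user:group. Note that the securityContext fields only accept numeric IDs, so a named user has to be resolved to its UID/GID first.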
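If you have Docker handy, the image's configured user can be read straight from its metadata and turned into the fields shown earlier. This is only a minimal sketch: the function name user_to_security_context is made up for illustration, it assumes the image declares a numeric USER, and when no group is given it falls back to GID == UID, which is a guess that may not match the image.

```shell
#!/bin/sh
# Convert an image's configured user ("uid" or "uid:gid") into
# securityContext fields. Hypothetical helper, not part of any tool.
user_to_security_context() {
  user_spec="$1"
  uid="${user_spec%%:*}"
  case "$user_spec" in
    *:*) gid="${user_spec#*:}" ;;
    *)   gid="$uid" ;;   # ASSUMPTION: no group given, so reuse the UID
  esac

  # runAsUser/runAsGroup/fsGroup only accept numbers, so reject names.
  for id in "$uid" "$gid"; do
    case "$id" in
      ''|*[!0-9]*)
        echo "non-numeric or empty user '$user_spec'; look up the UID/GID manually" >&2
        return 1 ;;
    esac
  done

  printf 'securityContext:\n  runAsUser: %s\n  runAsGroup: %s\n  fsGroup: %s\n' \
    "$uid" "$gid" "$gid"
}

# Usage (needs Docker and a pulled image; "myimage:tag" is a placeholder):
#   user_to_security_context "$(docker inspect --format '{{.Config.User}}' myimage:tag)"
```

If the image reports a name such as "node" or nothing at all, the helper bails out, and you are back to checking the application's docs or the image's /etc/passwd.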

2. Prepping a volume with programmatic data

You might want to create a certain folder structure, or populate some basic data. The best option in this case would be to have the main container take care of this at startup, but I also realize that we cannot always be in control of the image we are using.

The second-best option is to use an initContainer, but you don’t necessarily need to build a whole prep image if your tasks are simple/succinct enough to fit in a command script. Eg:

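spec.template.spec:
  volumes:
    - name: my-pvc
      persistentVolumeClaim:
        claimName: my-pvc
  initContainers:
    - name: prep-volume  # required field; the name itself is illustrative
      image: alpine:3
      env:
        - name: VOLUME_MOUNT
          value: /mnt/my_volume
        - name: TARGET_DIR
          value: path/to/somewhere
        - name: TARGET_FILE
          value: https://somesite.com/somefile.ext
      volumeMounts:
        - name: my-pvc
          mountPath: /mnt/my_volume
      command:
        - /bin/sh
        - -c
        - |
          # Skip the work if a previous run already completed.
          if [ -f "${VOLUME_MOUNT}/.init_complete" ]; then
            exit 0
          fi

          # Install curl, fetch the file into the target directory,
          # and only then write the marker file.
          apk add --no-cache curl && \
          mkdir -p "${VOLUME_MOUNT}/${TARGET_DIR}" && \
          curl -fsSL -o "${VOLUME_MOUNT}/${TARGET_DIR}/$(basename "${TARGET_FILE}")" "${TARGET_FILE}" && \
          touch "${VOLUME_MOUNT}/.init_complete"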

Note: If you wind up with permissions issues, you may want to use the securityContext settings from #1 above.
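The part of the init script that matters most is the marker file: the prep runs once, and every later Pod restart skips it. Here is that pattern on its own as a rough sketch you can try against a local directory before wiring it into a manifest. The function name prep_volume and the "create a directory" prep step are stand-ins of mine, not anything Kubernetes provides.

```shell
#!/bin/sh
# Run a one-time setup step against a mounted volume, guarded by a marker file.
# Usage: prep_volume <mount_dir> <relative_target_dir>
prep_volume() {
  mount_dir="$1"
  target_dir="$2"
  marker="$mount_dir/.init_complete"

  # A previous run already finished: do nothing and report success.
  if [ -f "$marker" ]; then
    echo "already initialised"
    return 0
  fi

  # Illustrative prep work: create the directory tree the app expects.
  mkdir -p "$mount_dir/$target_dir" || return 1

  # Write the marker last, so a failed or interrupted run is retried next time.
  touch "$marker" || return 1
  echo "initialised"
}

# Example against a scratch directory:
#   prep_volume /tmp/fake_volume path/to/somewhere
```

Writing the marker as the very last step is the design choice worth keeping: if the download or copy dies halfway, the next start simply tries again.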

3. Pre-loading a volume with arbitrary data, such as a backup restore

This is a bit stickier of a use case, as it’s likely a one-time task for which you may not necessarily want to introduce an initContainer into the mix. In this case we can create the PVC first in isolation, then run a temporary, “no-op” Pod on top of it so we can leverage kubectl cp. Eg:

metadata:
  name: myapp-temp
  namespace: mynamespace
spec.template.spec:
  volumes:
    - name: app-data
      persistentVolumeClaim:
        claimName: app-data
  containers:
    - name: noop
      image: alpine:3
      command:
        - /bin/sh
        - -c
        - |
          while true; do
            sleep 5
          done
      volumeMounts:
        - mountPath: /mnt/myapp
          name: app-data

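Once this Deployment is running you can copy the data to the PV. Note that kubectl cp addresses a Pod rather than a Deployment, so look up the generated Pod name first (kubectl get pods -n mynamespace), and make sure the destination sits under the mountPath, otherwise the data lands in the container’s throwaway filesystem instead of the volume. The command then looks like:

kubectl cp ./somefile.ext mynamespace/<pod-name>:/mnt/myapp/somefile.ext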

Once you’ve finished copying the data you can remove the temporary Deployment and stand up the regular one.
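The whole copy step can be scripted if you expect to do it more than once. Treat the following as a sketch under assumptions rather than a recipe: it assumes the temporary Deployment's Pods carry a label app=myapp-temp (the stripped-down manifest above shows no labels, so adjust the selector to yours), the function name preload_volume is invented, and the KUBECTL variable exists only so you can swap in a stub or a different binary.

```shell
#!/bin/sh
# Copy a local file onto the PV through the temporary Deployment's Pod.
# ASSUMPTION: Pods are labelled app=<deployment name>; change the selector
# below if your labels differ.
KUBECTL="${KUBECTL:-kubectl}"

# Usage: preload_volume <namespace> <deployment> <local_src> <dest_in_pod>
preload_volume() {
  ns="$1"; deploy="$2"; src="$3"; dest="$4"

  # Block until the Deployment has rolled out (gives up after the timeout).
  $KUBECTL -n "$ns" rollout status "deployment/$deploy" --timeout=120s >&2 || return 1

  # kubectl cp needs a Pod name, so resolve the first matching Pod.
  pod="$($KUBECTL -n "$ns" get pods -l "app=$deploy" \
        -o 'jsonpath={.items[0].metadata.name}')" || return 1
  [ -n "$pod" ] || { echo "no pod found for $deploy" >&2; return 1; }

  # The destination must sit under the container's mountPath (/mnt/myapp).
  $KUBECTL cp "$src" "$ns/$pod:$dest"
}

# Usage:
#   preload_volume mynamespace myapp-temp ./somefile.ext /mnt/myapp/somefile.ext
#   kubectl -n mynamespace delete deployment myapp-temp
```

Keep in mind that kubectl cp relies on a tar binary inside the container; the alpine image used here ships one via BusyBox.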

Caveat: When the PVC is defined k8s does not necessarily allocate the Volume at that time. With a storage class whose volumeBindingMode is WaitForFirstConsumer, for example, the Volume is only provisioned once the first Pod using the claim is scheduled. You may want to copy over any pod scheduling directives, tolerations, etc. from the “regular” Deployment’s configuration. This will ensure that the temporary Pod will be scheduled somewhere that the regular pod can also run, avoiding something like accidentally allocating a Volume in an availability zone in which the regular deployment can’t or won’t run.

Note on Config in this Article

All of the above config manifests are Deployments stripped down to the absolute bare bones of likely changes that will need to be made to your existing workload, or otherwise illustrative example values.

Personally, I am not a fan of trying to mentally parse huge example configs, mentally diff them with my own, and trying to figure out what the relevant changes should be. So I’ve boiled things down to bare essentials.
