Preparing a K8s Persistent Volume Before the Pod Starts

Mar 21, 2023

By the time you’ve landed on this article you’ve probably seen a dozen variations on “just kubectl cp ...”, all of which seem to assume that the pod is already up and running. But what if you want to pre-load some data or otherwise prep the volume? What if the pod won’t start without this prep? What if the data you want to modify is locked by a process in the running pod? This article seeks to address those cases.

There are three cases that I expect your needs will fall into:

1. Prepping a volume with basic permissions

If the container starts up as a non-root user, you’re going to want to set the securityContext in the Pod spec. It defines the UID and GID under which the Pod’s processes run, and fsGroup sets the group that is applied to mounted volumes. E.g.:

spec.template.spec.securityContext:
  runAsUser: 1234
  runAsGroup: 1234
  fsGroup: 1234

As I understand it, you do not strictly need to specify runAsUser and runAsGroup unless you want to override what the image/Dockerfile specifies. However, k8s does not necessarily know those values at the time it mounts storage into the Pod, so a freshly mounted volume may not be writable by the image’s user. Being explicit, with fsGroup in particular, avoids that.

Aka: “This worked just fine until I attached the PVC!”

To determine what these values should be, consult the particular application’s docs, or inspect the image and look for the USER directive. A Dockerfile has no separate GROUP directive; the group, if set, is part of USER, as in USER <user>[:<group>].
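To show where these settings sit in a full manifest, here is a minimal sketch of the relevant part of a Deployment. It assumes the application runs as UID/GID 1234 and mounts a claim named my-pvc. The container name, image, mount path, and the fsGroupChangePolicy line are my own illustrative additions, so treat them as placeholders rather than required settings:

```yaml
spec:
  template:
    spec:
      securityContext:
        runAsUser: 1234     # UID the container processes run as
        runAsGroup: 1234    # primary GID of those processes
        fsGroup: 1234       # group applied to the mounted volume's files
        # Optional (assumption): skip the recursive ownership change when the
        # volume root already has the expected group, which helps on large volumes.
        fsGroupChangePolicy: OnRootMismatch
      volumes:
        - name: my-pvc
          persistentVolumeClaim:
            claimName: my-pvc
      containers:
        - name: myapp            # hypothetical container name
          image: myapp:latest    # hypothetical image
          volumeMounts:
            - name: my-pvc
              mountPath: /data   # hypothetical mount path
```

Keep in mind that not every volume type honors fsGroup, so if ownership does not change, the storage backend is worth checking before anything else.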

2. Prepping a volume with programmatic data

You might want to create a certain folder structure, or populate some basic data. The best option in this case would be to have the main container take care of this at startup, but I also realize that we cannot always be in control of the image we are using.

The second-best option is to use an initContainer, but you don’t necessarily need to build a whole prep image if your tasks are simple and succinct enough to fit in a command script. E.g.:

spec.template.spec:
  volumes:
    - name: my-pvc
      persistentVolumeClaim:
        claimName: my-pvc
  initContainers:
    - name: volume-prep
      image: alpine:3
      env:
        - name: VOLUME_MOUNT
          value: /mnt/my_volume
        - name: TARGET_DIR
          value: path/to/somewhere
        - name: TARGET_FILE
          value: https://somesite.com/somefile.ext
      volumeMounts:
        - name: my-pvc
          mountPath: /mnt/my_volume
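      command:
        - /bin/sh
        - -c
        - |
          # Skip the prep if a previous run already completed it.
          if [ -f "${VOLUME_MOUNT}/.init_complete" ]; then
            exit 0
          fi

          # curl's -o takes the output path (-O takes no argument), and
          # -f makes curl fail on HTTP errors so the marker is not written.
          apk add --no-cache curl && \
          mkdir -p "${VOLUME_MOUNT}/${TARGET_DIR}" && \
          curl -fL -o "${VOLUME_MOUNT}/${TARGET_DIR}/$(basename "${TARGET_FILE}")" "${TARGET_FILE}" && \
          touch "${VOLUME_MOUNT}/.init_complete"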

Note: If you wind up with permissions issues, you may want to use the securityContext settings from #1 above.
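One wrinkle when combining the two: a Pod-level runAsUser also applies to the initContainer, and a non-root user cannot run apk add. A minimal sketch of one way around that is to override the user on the initContainer only and hand ownership back at the end. It assumes the UID/GID 1234 from #1 and that running the prep step as root is acceptable in your cluster. The container name and the mkdir step are illustrative:

```yaml
spec:
  template:
    spec:
      securityContext:
        runAsUser: 1234
        runAsGroup: 1234
        fsGroup: 1234
      volumes:
        - name: my-pvc
          persistentVolumeClaim:
            claimName: my-pvc
      initContainers:
        - name: volume-prep        # hypothetical name
          image: alpine:3
          securityContext:
            runAsUser: 0           # container-level value overrides the Pod-level one
          volumeMounts:
            - name: my-pvc
              mountPath: /mnt/my_volume
          command:
            - /bin/sh
            - -c
            - |
              # Illustrative prep step; substitute your own.
              mkdir -p /mnt/my_volume/path/to/somewhere && \
              # Hand everything to the application user so the main container can write to it.
              chown -R 1234:1234 /mnt/my_volume
```

If your cluster enforces a policy that forbids root containers, this override will be rejected, and a prep image that already contains the tools you need is probably the better route.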

3. Pre-loading a volume with arbitrary data, such as a backup restore

This is a stickier use case, as it’s likely a one-time task for which you may not want to introduce an initContainer. In this case we can create the PVC first in isolation, then run a temporary “no-op” Deployment on top of it so we can leverage kubectl cp. E.g.:

metadata:
  name: myapp-temp
  namespace: mynamespace
spec.template.spec:
  volumes:
    - name: app-data
      persistentVolumeClaim:
        claimName: app-data
  containers:
    - name: noop
      image: alpine:3
      command:
        - /bin/sh
        - -c
        - |
          while true; do
            sleep 5
          done
      volumeMounts:
        - mountPath: /mnt/myapp
          name: app-data

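Once this Deployment is running, look up the name of the Pod it created (for example with kubectl get pods -n mynamespace), since kubectl cp takes a Pod name rather than a Deployment name. Then copy the data to the PV with a command like the one below. Note that the destination has to sit under the mountPath (/mnt/myapp here), otherwise the file lands on the container’s own filesystem rather than on the volume:

kubectl cp ./somefile.ext mynamespace/<pod-name>:/mnt/myapp/path/to/somewhere/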

Once you’ve finished copying the data you can remove the temporary Deployment and stand up the regular one.
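If you would rather address the Pod by a fixed name, a bare Pod is a reasonable alternative to the temporary Deployment, since kubectl cp takes a Pod name. A minimal sketch, reusing the names from the example above (the restartPolicy is my own choice for a throwaway Pod, not a requirement):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-temp
  namespace: mynamespace
spec:
  restartPolicy: Never          # assumption: a one-off Pod need not be restarted
  volumes:
    - name: app-data
      persistentVolumeClaim:
        claimName: app-data
  containers:
    - name: noop
      image: alpine:3
      # Keep the container alive so there is something to copy into.
      command: ["/bin/sh", "-c", "while true; do sleep 5; done"]
      volumeMounts:
        - name: app-data
          mountPath: /mnt/myapp
```

With this in place the copy target is simply mynamespace/myapp-temp:/mnt/myapp/, and the Pod can be deleted once the data is in place.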

Caveat: Defining the PVC does not necessarily allocate the Volume right away. Depending on the StorageClass (for example, one with volumeBindingMode: WaitForFirstConsumer), allocation can wait until the first Pod using the claim is scheduled. You may therefore want to copy any Pod scheduling directives (nodeSelector, affinity, tolerations, etc.) from the “regular” Deployment’s configuration. This ensures that the temporary Pod is scheduled somewhere the regular Pod can also run, and avoids accidentally allocating a Volume in an availability zone in which the regular Deployment can’t or won’t run.
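As a rough illustration of that caveat, the temporary Deployment might carry the same scheduling fields as the regular one. The zone and the toleration key/value below are hypothetical placeholders; copy whatever your real Deployment actually uses:

```yaml
metadata:
  name: myapp-temp
  namespace: mynamespace
spec:
  template:
    spec:
      # Copied from the "regular" Deployment; values here are hypothetical.
      nodeSelector:
        topology.kubernetes.io/zone: us-east-1a
      tolerations:
        - key: dedicated
          operator: Equal
          value: myapp
          effect: NoSchedule
      # volumes and containers as in the temporary Deployment above
```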

Note on Config in this Article

All of the above config manifests are Deployments stripped down to the bare bones: only the changes you will likely need to make to your existing workload, plus illustrative example values.

Personally, I am not a fan of mentally parsing huge example configs, diffing them against my own, and trying to figure out what the relevant changes should be, so I’ve boiled things down to the bare essentials.