Transporting Container Images With Skopeo

TL;DR: Skopeo is a great tool not only for gathering information from remote registries, but also for syncing images between registries. In this post we will demonstrate how to copy individual tags as well as entire repositories between registries. We will run the whole job on a Kubernetes cluster, which makes it possible to execute the synchronization anywhere and unattended.

Container registries are an integral component of the container world. For modern applications, there is hardly a deployment workflow that does not involve a container registry at some point. A regular task when working with Kubernetes is moving container images between different private registries, especially if you are working in private cloud environments or in on-premises data centers. In today’s blog post I want to show how to automate this process easily.

Why would you transfer images between registries?

  • Your Kubernetes clusters run on different cloud platforms, and each platform provides its own container registry.
  • Fresh releases of your app are pushed into a private registry running on a build server.
  • Instead of pulling publicly available container images from DockerHub, quay.io or elsewhere, you want to make your deployment workflows faster and more resilient by providing these images in your own private registry close to the actual clusters.
  • Your Kubernetes cluster is air-gapped and thus no internet connections are possible.

What’s the typical way of doing this?

The intuitive way of moving an image from one remote registry to another remote registry would be:

  • Pull the image from registry A.
  • Tag the now locally available image with the address of registry B.
  • Push the image to registry B.
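
With the Docker CLI, the three steps above could be sketched as a small shell function. This is only an illustration of the classic workflow; the function name and the image references in the usage line are hypothetical placeholders and not part of the setup used later in this post.

```shell
# Copy an image between two registries the "classic" way.
# Needs a running Docker daemon and local disk space for all layers.
# $1: source image reference, $2: target image reference
copy_via_docker() {
    docker pull "$1" &&         # 1. download all layers to the local machine
    docker tag "$1" "$2" &&     # 2. give the local image a second name
    docker push "$2"            # 3. upload all layers to the target registry
}
```

A call could look like this: copy_via_docker registry-a.example.com/myapp:1.0 registry-b.example.com/myapp:1.0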

What are the drawbacks?

  • A Docker daemon needs to be running on the executing system. This is a problem if you want to perform the task securely inside a container.
  • The images at hand can be large. This is a problem if you execute the sync on your local machine: you need downstream bandwidth to download the image layers as well as upstream bandwidth to push them to the other registry. Most connections do not provide symmetric bandwidth, which leaves you with long waiting times.

What I also haven’t figured out yet for registry services on public cloud platforms: most of them do not seem to offer a way to put additional tags on existing images in a repository. In that case, your only option is pulling, tagging and pushing the image back again, which is not very efficient.
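
One option that may help with the tagging problem is Skopeo itself, the tool this post is about: copying an image onto a new tag in the same repository does not need a local pull, and layers that already exist in the target are normally skipped. The following is a minimal sketch under that assumption; the function name and the repository in the usage line are hypothetical, and I have not verified it against every cloud registry.

```shell
# Add a second tag to an image that already lives in a registry,
# without pulling it to the local machine first.
# $1: repository (registry/path), $2: existing tag, $3: additional tag
retag_image() {
    skopeo copy "docker://$1:$2" "docker://$1:$3"
}
```

A call could look like this: retag_image registry.example.com/myapp 1.0 stable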

Solution Approach

While the described approach is fine for occasionally moving images around, there is room for improvement if you need to do this regularly and with many large images.

At this point Skopeo comes into play. In a recent blog post, I already covered some use cases of this neat tool. To expand on its capabilities, we want to achieve the following in today’s post:

  • Deploy two image registries in a Kubernetes cluster. They mimic two remote registries.
  • Create a Skopeo container which runs in a Pod on the same Kubernetes cluster.
  • Use Skopeo to transport images between those two registries.

Of course, Skopeo’s usage is not limited to Kubernetes clusters. You can also use it from the command line on any Linux system instead of using it in a container. But if we can use Skopeo within a container, we can easily move the data transfer from our local machine into a Kubernetes cluster, which may already be there to host other workloads in a DevOps pipeline.

Deploy Container Registries

In order to provide a proper environment for demonstration purposes, we will install two separate private container registries in the cluster and look at them as remote registries. We can use one of the many Helm charts available to deploy two separate Docker Registries. First we will add the required repository:

helm repo add phntom https://phntom.kix.co.il/charts/
helm repo update

We want to set up the two registries with some basic security. For our convenience, the Helm chart at hand offers an installation parameter secrets.htpasswd. We only need to pass a proper htpasswd string into this configuration variable:

# Install the first registry (the htpasswd output is quoted so the shell passes it on unchanged)
helm install registry-a phntom/docker-registry \
    --set secrets.htpasswd="$(htpasswd -Bbn user-a mypass123)" \
    --set configData.compatibility.schema1.enabled=true

# Install the second registry
helm install registry-b phntom/docker-registry \
    --set secrets.htpasswd="$(htpasswd -Bbn user-b mypass456)" \
    --set configData.compatibility.schema1.enabled=true

A short note on configData.compatibility.schema1.enabled=true: in recent Docker Registry versions, support for the v1 manifest schema is disabled by default. Enabling it avoids problems when pushing older images that still use the v1 schema to our two private registries.

Test Skopeo in a Container

Next we will demonstrate how Skopeo can be used within a Kubernetes cluster. One of the key ideas of this blog post is to show how to transport images from and to registries without taking a detour via your local system and without a running Docker daemon.

Let’s start by using Skopeo to check the available tags for the busybox image on DockerHub. The following command starts a temporary Skopeo Pod. Note: have some patience when running this for the first time, as the Skopeo image has to be pulled from quay.io. At the time of writing this post, the pulled container image was 140 MB in size.

kubectl run skopeo --image quay.io/skopeo/stable --command -- sleep inf

Once the container is running we can enter it using:

kubectl exec -it skopeo -- bash

Now working in the container’s shell ([root@skopeo /]#), we will run a command that remotely lists the tags of the busybox image on DockerHub:

skopeo list-tags docker://docker.io/library/busybox

Listing tags may take a few seconds. The command will produce output similar to this (the output below is shortened):

{
    "Repository": "docker.io/library/busybox",
    "Tags": [
[...]
        "1.33-glibc",
        "1.33-musl",
        "1.33-uclibc",
        "1.33.0-glibc",
        "1.33.0-musl",
        "1.33.0-uclibc",
        "1.33.0",
        "1.33",
        "1",
        "buildroot-2013.08.1",
        "buildroot-2014.02",
        "glibc",
        "latest",
        "musl",
        "stable-glibc",
        "stable-musl",
        "stable-uclibc",
        "stable",
[...]
    ]
}

Looks good. We can use Skopeo to communicate with a registry from within a Kubernetes cluster without having to use any of the cluster’s underlying container runtime.
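
Since the tag list of a popular image can get long, it may help to filter it. The helper below is a minimal sketch that relies only on sed and grep, assuming both are available in the container and that each tag is printed on its own line as in the output above. The function name filter_tags is made up for this example.

```shell
# Read `skopeo list-tags` JSON on stdin and print only the tags that
# match the extended regular expression given as $1.
# Lines such as "Repository": ... are skipped because they contain a colon
# after the first quoted string.
filter_tags() {
    sed -n 's/^[[:space:]]*"\([^"]*\)",\{0,1\}[[:space:]]*$/\1/p' | grep -E "$1"
}
```

Used in a pipe, this lists only the glibc variants: skopeo list-tags docker://docker.io/library/busybox | filter_tags 'glibc$'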

But … wait! This isn’t working for me!

At this point I want to mention a piece of information relevant for people sitting in corporate networks: if you try to access the internet from within a container in a Kubernetes cluster, the domain name used (in our case docker.io) is resolved by the cluster-internal DNS, which is most likely a CoreDNS Pod. Depending on the distribution and the node configuration, CoreDNS may forward such queries to Google’s public DNS servers (i.e. 8.8.8.8), as it did in my cluster. A typical security measure in corporate networks is to block the usage of these DNS servers. So in case you get the following error message …

FATA[0008] Error listing repository tags: error pinging docker registry registry-1.docker.io: Get "https://registry-1.docker.io/v2/": dial tcp: lookup registry-1.docker.io: Temporary failure in name resolution

… it is quite possible that you ran into this problem. Checking the CoreDNS logs with kubectl -n kube-system logs -l k8s-app=kube-dns showed the following in my case:

[ERROR] plugin/errors: 2 docker.io. AAAA: read udp 10.42.0.2:57985->8.8.8.8:53: i/o timeout
[ERROR] plugin/errors: 2 docker.io. A: read udp 10.42.0.2:40477->8.8.8.8:53: i/o timeout

Describing solutions to this issue is beyond the scope of this post. Have a look at the K8s docs for more information.
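
As a starting point for your own investigation, you can at least check where CoreDNS sends external lookups. The sketch below assumes that the CoreDNS configuration is stored in a ConfigMap named coredns in the kube-system namespace, which is common but not guaranteed for every distribution. The function name is made up for this example.

```shell
# Show the "forward" line(s) of the CoreDNS Corefile, i.e. the upstream
# resolvers that are asked for names outside the cluster.
coredns_forwarders() {
    kubectl -n kube-system get configmap coredns -o 'jsonpath={.data.Corefile}' |
        grep -E '^[[:space:]]*forward'
}
```

A line like forward . /etc/resolv.conf means that CoreDNS uses the resolvers from its own /etc/resolv.conf, which is usually inherited from the node. Explicit IP addresses on that line show the upstream servers directly.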

But now, let’s proceed with our example. Make sure that you are in the skopeo Pod’s shell (kubectl exec -it skopeo -- bash).

Pull an Image Into Registry A

The registries we previously installed in the cluster using Helm are still empty. Let’s pull an image from DockerHub and push it into registry A in one go:

skopeo copy \
    --dest-tls-verify=false \
    --dest-creds=user-a:mypass123 \
    docker://docker.io/library/busybox:1.33-glibc \
    docker://registry-a-docker-registry:5000/mybusybox:1.33-glibc

We need --dest-tls-verify=false because our test registries A and B are not equipped with a signed TLS certificate. The argument --dest-creds=user-a:mypass123 passes the required login credentials to our self-installed target registry. The above command should yield:

Getting image source signatures
Copying blob 7933913056cb done
Copying config e7aa429352 done
Writing manifest to image destination
Storing signatures

According to the shell output, Skopeo has successfully copied the image into our private registry A. We want to verify this:

skopeo list-tags \
    --tls-verify=false \
    --creds=user-a:mypass123 \
    docker://registry-a-docker-registry:5000/mybusybox

If all went well, you will see the tag of the image we just stored in our private registry:

{
    "Repository": "registry-a-docker-registry:5000/mybusybox",
    "Tags": [
        "1.33-glibc"
    ]
}
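
Both commands above pass the credentials on the command line, where they end up in the shell history. As an alternative, skopeo login can store them once so that later calls against the same registry do not need --creds or --dest-creds. The wrapper below is a minimal sketch; the function name is made up, and the TLS flag is still required on later commands because our test registries have no signed certificate.

```shell
# Log in to a registry once and let Skopeo remember the credentials.
# $1: registry host:port, $2: user name; the password is read from stdin
# so that it does not show up in the process list.
registry_login() {
    skopeo login --tls-verify=false --username "$2" --password-stdin "$1"
}
```

A call could look like this: printf '%s' 'mypass123' | registry_login registry-a-docker-registry:5000 user-a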

Copy an Image From Registry A to Registry B

As we are still on the shell of the Skopeo container, we will now try to copy our image directly from registry A to registry B:

skopeo copy \
    --src-tls-verify=false \
    --src-creds=user-a:mypass123 \
    --dest-tls-verify=false \
    --dest-creds=user-b:mypass456 \
    docker://registry-a-docker-registry:5000/mybusybox:1.33-glibc \
    docker://registry-b-docker-registry:5000/mybusybox:1.33-glibc

Checking registry B using:

skopeo list-tags \
    --tls-verify=false \
    --creds=user-b:mypass456 \
    docker://registry-b-docker-registry:5000/mybusybox

… will show us the image was successfully copied to registry B:

{
    "Repository": "registry-b-docker-registry:5000/mybusybox",
    "Tags": [
        "1.33-glibc"
    ]
}
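
Listing the tags only tells us that a tag with the same name exists in registry B. To check that both tags point to the same image, we can compare the manifest digests reported by skopeo inspect. This is a minimal sketch with made-up function names; the digests should normally match as long as Skopeo did not have to convert the manifest during the copy.

```shell
# Print the manifest digest of an image.
# $1: image reference without transport prefix, $2: credentials as user:password
image_digest() {
    skopeo inspect --tls-verify=false --creds "$2" --format '{{.Digest}}' "docker://$1"
}

# Succeed only if both images report the same digest.
# $1/$2: first image and its credentials, $3/$4: second image and its credentials
same_image() {
    [ "$(image_digest "$1" "$2")" = "$(image_digest "$3" "$4")" ]
}
```

A call could look like this: same_image registry-a-docker-registry:5000/mybusybox:1.33-glibc user-a:mypass123 registry-b-docker-registry:5000/mybusybox:1.33-glibc user-b:mypass456 && echo "identical"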

Schedule a Job to Sync a Repository Between Registries

Daily operations may require a regular synchronization of repositories between registries. Using skopeo sync this can be achieved in a declarative manner. To execute the sync operation, we will create a CronJob in our Kubernetes cluster.

Create a Pod to Test the Sync Command

Before we build the CronJob, we will assemble the actual sync command. We could do this by putting all the configuration into a single command line call, but Skopeo also offers an option to define all the parameters in YAML files. This also makes it easier to define more complex sync configurations. In our case, we want to test syncing:

  • The linux tag of repository docker.io/hello-world
  • All tags starting with “1.3” and ending with “glibc” from repository docker.io/busybox
  • The latest tag from repository quay.io/coreos/etcd

Skopeo needs these declarative instructions in a YAML file. To make the YAML available in the Pod, we will provide it as a ConfigMap.

Create a file called configmap-skopeo-sync-source.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: skopeo-sync-source
data:
  source.yaml: |
    docker.io:
      images:
        hello-world:
        - linux
      images-by-tag-regex:
        busybox: ^1.3.*glibc$
    quay.io:
      images:
        coreos/etcd:
        - latest

Both registries we want to pull from are public. Therefore we don’t need any further configuration related to credentials or certificates. Create the ConfigMap using:

kubectl apply -f configmap-skopeo-sync-source.yaml
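
If one of the source registries were private, the sync source file could carry the credentials and TLS settings per registry. The snippet below writes such a file as a minimal sketch. The registry registry.example.com, the repository and the credentials are hypothetical placeholders; the keys credentials and tls-verify are the ones documented for the skopeo sync YAML format.

```shell
# Write a sync source file for a private registry.
# Host name, repository and credentials are placeholders for illustration.
cat > source-private.yaml <<'EOF'
registry.example.com:
  images:
    myteam/myapp:
    - "1.0"
  credentials:
    username: sync-user
    password: changeme
  tls-verify: false
EOF
```

Because this file contains a password, it would be better kept in a Kubernetes Secret than in a ConfigMap.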

To use this YAML configuration file in a container, we mount the ConfigMap into a test Pod. Create a file called pod-skopeo-sync-test.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: skopeo-sync-test
spec:
  terminationGracePeriodSeconds: 0
  containers:
  - name: skopeo-sync-test
    command:
    - sleep
    - inf
    image: quay.io/skopeo/stable
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - name: source-yaml
      mountPath: /etc/skopeo/source.yaml
      subPath: source.yaml
  volumes:
  - name: source-yaml
    configMap:
      name: skopeo-sync-source

Apply the YAML to create the new Pod with the ConfigMap mounted and enter the Pod’s shell:

# Create the Pod
kubectl apply -f pod-skopeo-sync-test.yaml

# Enter the Pod
kubectl exec -it skopeo-sync-test -- bash

This is what the corresponding Skopeo command should look like if we want to sync the defined images to our private registry A:

skopeo sync \
    --src yaml \
    --dest docker \
    --dest-tls-verify=false \
    --dest-creds=user-a:mypass123 \
    /etc/skopeo/source.yaml \
    registry-a-docker-registry:5000/my-synced-repos

The output will look similar to this:

INFO[0000] Processing repo                               registry=quay.io repo=coreos/etcd
INFO[0000] Processing repo                               registry=docker.io repo=hello-world
INFO[0000] Processing repo                               registry=docker.io repo=busybox
INFO[0000] Querying registry for image tags              registry=docker.io repo=busybox
INFO[0000] Getting tags                                  image=docker.io/library/busybox
INFO[0002] Start filtering using the regular expression: ^1.3.*glibc$  registry=docker.io repo=busybox
INFO[0002] Copying image ref 1/1                         from="docker://quay.io/coreos/etcd:latest" to="docker://registry-a-docker-registry:5000/my-synced-repos/etcd:latest"
Getting image source signatures
Copying blob d1eca4d01894 done
Copying blob 8bc526247b5c done
[...]
Writing manifest to image destination
Storing signatures
INFO[0007] Copying image ref 1/1                         from="docker://hello-world:linux" to="docker://registry-a-docker-registry:5000/my-synced-repos/hello-world:linux"
Getting image source signatures
Copying blob b8dfde127a29 [--------------------------------------] 0.0b / 0.0b
Copying config d1165f2212 [======================================] 1.4KiB / 1.4KiB
Writing manifest to image destination
Storing signatures
INFO[0010] Copying image ref 1/11                        from="docker://busybox:1.30-glibc" to="docker://registry-a-docker-registry:5000/my-synced-repos/busybox:1.30.glibc[...]
[...]
INFO[0057] Copying image ref 11/11                       from="docker://busybox:1.33.0-glibc" to="docker://registry-a-docker-registry:5000/my-synced-repos/busybox:1.33.0-glibc"
[...]
INFO[0061] Synced 13 images from 3 sources

Now, let’s see if the images were properly synced to registry A:

skopeo list-tags \
    --tls-verify=false \
    --creds=user-a:mypass123 \
    docker://registry-a-docker-registry:5000/my-synced-repos/busybox

This gives us:

{
    "Repository": "registry-a-docker-registry:5000/my-synced-repos/busybox",
    "Tags": [
        "1.32.1-glibc",
        "1-glibc",
        "1.30-glibc",
        "1.33-glibc",
        "1.30.1-glibc",
        "1.24.2-glibc",
        "1.31-glibc",
        "1.24-glibc",
        "1.30.0-glibc",
        "1.24.1-glibc",
        "1.32.0-glibc",
        "1.32-glibc",
        "1.33.0-glibc",
        "1.31.0-glibc",
        "1.31.1-glibc"
    ]
}
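
When experimenting with larger sync definitions, it can be useful to see what would be copied before any data is transferred. Newer Skopeo releases offer a --dry-run flag for skopeo sync for this purpose. The sketch below assumes that the installed version supports it; the function name is made up.

```shell
# Report what would be synced without copying anything.
# --dry-run is assumed to be supported by the installed Skopeo version.
# $1: path to the sync source file, $2: destination registry/namespace,
# $3: destination credentials as user:password
preview_sync() {
    skopeo sync --dry-run --src yaml --dest docker \
        --dest-tls-verify=false --dest-creds="$3" "$1" "$2"
}
```

A call could look like this: preview_sync /etc/skopeo/source.yaml registry-a-docker-registry:5000/my-synced-repos user-a:mypass123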

Schedule a Job for the Sync Command

Syncing old images on a regular basis doesn’t make much sense, but you could modify your source.yaml so that it always syncs the latest images available once per day. For testing purposes, we will create a CronJob that runs our existing sync definition every 5 minutes.

In order to follow good practices, we will not hardcode the registry credentials in the container command, but use environment variables sourced from a Kubernetes Secret. Create the credentials of registry A as a Secret:

kubectl create secret generic registry-a \
  --from-literal=username=user-a \
  --from-literal=password=mypass123
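
To double-check what ended up in the Secret, its values can be read back and decoded, since Kubernetes stores Secret data base64-encoded. This is a minimal sketch with a made-up function name; it assumes a base64 tool that understands -d.

```shell
# Print one key of a Secret in clear text.
# $1: name of the Secret, $2: key inside the Secret
secret_value() {
    kubectl get secret "$1" -o "jsonpath={.data.$2}" | base64 -d
}
```

For the Secret created above, secret_value registry-a username should print user-a.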

Now we will assemble a YAML file called cronjob-skopeo-5m-sync.yaml containing the definition of the Kubernetes CronJob, including the source.yaml mounted from the ConfigMap as well as registry A’s credentials passed in as environment variables:

apiVersion: batch/v1  # use batch/v1beta1 on clusters older than Kubernetes 1.21
kind: CronJob
metadata:
  name: skopeo-5m-sync
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      backoffLimit: 3
      template:
        metadata:
          labels:
            job: skopeo-5m-sync
        spec:
          terminationGracePeriodSeconds: 0
          restartPolicy: OnFailure
          volumes:
          - name: source-yaml
            configMap:
              name: skopeo-sync-source
          containers:
          - name: skopeo-sync-test
            image: quay.io/skopeo/stable
            volumeMounts:
            - name: source-yaml
              mountPath: /etc/skopeo/source.yaml
              subPath: source.yaml
            env:
            - name: REG_A_USERNAME
              valueFrom:
                secretKeyRef:
                  name: registry-a
                  key: username
            - name: REG_A_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: registry-a
                  key: password
            command:
            - bash
            - -c
            - |
              skopeo sync \
              --src yaml \
              --dest docker \
              --dest-tls-verify=false \
              --dest-creds="$REG_A_USERNAME:$REG_A_PASSWORD" \
              /etc/skopeo/source.yaml \
              registry-a-docker-registry:5000/my-synced-repos

Then schedule the CronJob by applying the manifest:

kubectl apply -f cronjob-skopeo-5m-sync.yaml

The job is triggered every fifth minute of the hour (:00, :05, :10 …). Monitor its logs by referring to the label we pinned on the Pod’s template:

kubectl logs -f -l job=skopeo-5m-sync

This should now show you the output of skopeo sync as we have seen it before. To get rid of the CronJob, delete it with:

kubectl delete cronjob skopeo-5m-sync

Please note: Some public registries like DockerHub enforce an IP-based pull quota. If you leave the CronJob running for too long, this quota will be exhausted quickly.
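
If you want to keep the CronJob around without using up the pull quota, pausing it is an alternative to deleting it, and a single run can still be started by hand. The two helpers below are a minimal sketch with made-up function names; they rely on the suspend field of the CronJob spec and on kubectl create job --from.

```shell
# Pause or resume a CronJob without deleting it.
# $1: CronJob name, $2: true (pause) or false (resume)
set_cronjob_suspended() {
    kubectl patch cronjob "$1" -p "{\"spec\":{\"suspend\":$2}}"
}

# Start a single run right away from the CronJob's Job template.
# $1: CronJob name, $2: name for the new Job
trigger_cronjob() {
    kubectl create job --from="cronjob/$1" "$2"
}
```

For example, set_cronjob_suspended skopeo-5m-sync true pauses the schedule, and trigger_cronjob skopeo-5m-sync skopeo-manual-sync starts one sync on demand. The Pod of that Job carries the same job=skopeo-5m-sync label, so the logs command shown above works for it as well.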

Conclusion

I found Skopeo to be an extremely useful tool in the container space. While podman or docker are sufficient for typical tasks, Skopeo can be a big help in automating DevOps pipelines or migrating images between registries.

That’s it for today!
