Debian Mirror in Kubernetes

Now that I put debmirror in a container, it was time to run the entire thing in Kubernetes. This includes running the site to serve the mirror, and a cron job that will run the container to sync the repositories.

To see how the container is built, check out my last blog post, https://www.frakkingsweet.com/debmirror-docker-container/. It goes over everything needed to build and run it. This post is about tying it all together.

Throughout this guide, I use debmirror.example.com as the hostname for the site. Change it to the hostname you want for your mirror. Since some Kubernetes resource names (Service names, for example) do not allow periods, I replaced the periods with dashes in the resource names, so you'll need to update those as well.
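As a small illustration of that renaming, here is a minimal shell sketch that derives the resource name from the hostname. The function name k8s_name is a hypothetical helper made up for this example, not part of kubectl or any other tool.

```shell
#!/bin/sh
# Hypothetical helper: turn a hostname into a name without periods
# by replacing each period with a dash, as done throughout this guide.
k8s_name() {
  printf '%s\n' "$1" | tr '.' '-'
}

k8s_name debmirror.example.com   # prints debmirror-example-com
```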

Mirror Site

The first step is to build a simple set of configs that will run an nginx container with a shareable persistent volume. The PV will be shared between the sync containers and the site.

The 4 pieces of the site are the persistent volume claim, the deployment, the service and the ingress. I named the files persistentvolumeclaim-data.yml, deployment.yml, service.yml and ingress.yml respectively. Each one is relatively simple. The PVC is set to ReadWriteMany so the claim can be mounted read/write on multiple nodes.

Kubernetes YAML files

The persistentvolumeclaim-data.yml is as follows. Be sure to change the storageClassName to the correct storage class for your environment.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: debmirror-example-com-data
  labels:
    app.kubernetes.io/name: nginx
    app.kubernetes.io/instance: nginx-debmirror.example.com
    app.kubernetes.io/component: site
    app.kubernetes.io/part-of: debmirror.example.com
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Filesystem
  resources:
    requests:
      storage: 200Gi
  storageClassName: nfs-kube

The contents of deployment.yml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: debmirror-example-com
spec:
  selector:
    matchLabels:
      app.kubernetes.io/component: site
      app.kubernetes.io/part-of: debmirror.example.com
  replicas: 1
  template:
    metadata:
      labels:
        app.kubernetes.io/name: nginx
        app.kubernetes.io/instance: nginx-debmirror.example.com
        app.kubernetes.io/component: site
        app.kubernetes.io/part-of: debmirror.example.com
    spec:
      containers:
      - name: debmirror-example-com
        image: nginx:1.19.1-alpine
        livenessProbe:
          httpGet:
            path: /?liveness
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
        readinessProbe:
          httpGet:
            path: /?readiness
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 10
          failureThreshold: 20
        resources:
          limits:
            memory: 200Mi
            cpu: 500m
          requests:
            memory: 10Mi
            cpu: 50m
        volumeMounts:
          - mountPath: "/usr/share/nginx/html"
            name: debmirror-example-com-data
      volumes:
        - name: debmirror-example-com-data
          persistentVolumeClaim:
            claimName: debmirror-example-com-data

My service.yml contains this

apiVersion: v1
kind: Service
metadata:
  name: debmirror-example-com
  labels:
    app.kubernetes.io/name: nginx
    app.kubernetes.io/instance: nginx-debmirror.example.com
    app.kubernetes.io/component: site
    app.kubernetes.io/part-of: debmirror.example.com
spec:
  selector:
    app.kubernetes.io/component: site
    app.kubernetes.io/part-of: debmirror.example.com
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  type: ClusterIP

And finally, my ingress.yml

# extensions/v1beta1 was removed in Kubernetes 1.22; networking.k8s.io/v1 is available from 1.19
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: debmirror-example-com
  labels:
    app.kubernetes.io/name: nginx
    app.kubernetes.io/instance: nginx-debmirror.example.com
    app.kubernetes.io/component: site
    app.kubernetes.io/part-of: debmirror.example.com
spec:
  rules:
  - host: debmirror.example.com
    http:
      paths:
      - backend:
          service:
            name: debmirror-example-com
            port:
              number: 80
        path: /
        pathType: ImplementationSpecific

Apply YAML files

With those files created, let's create the namespace we'll use for our mirror (named debmirror in this case) and apply the files to it.

kubectl create namespace debmirror
kubectl apply -n debmirror -f .

Health Checks

The health checks will initially fail on the nginx container because the empty persistent volume is mounted over nginx's document root, so there is no page to serve at /. Until they pass, the pod won't receive any traffic. To fix this, create an empty index.html file in the root of the persistent volume.

One way to create the file is with kubectl exec. You'll need to do this fairly quickly, before the failing liveness probe restarts the container. First, get the name of the pod using kubectl get pods -n debmirror. It should output something like this:

NAME                                     READY   STATUS      RESTARTS   AGE
debmirror-example-com-6fbbc88489-z6429   1/1     Running     6          16h

Now that we have the name, we will execute a touch command to create that file.

kubectl exec -n debmirror debmirror-example-com-6fbbc88489-z6429 -- touch /usr/share/nginx/html/index.html

Now the health checks should succeed and the container should enter a ready state. It may take a few seconds to register as healthy, and it may not happen until the next time the container restarts.
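If copying the generated pod name by hand is too slow, kubectl exec also accepts a deployment reference and picks one of its pods. The following is a minimal sketch of that approach; seed_index is a hypothetical helper name, and the namespace and deployment name are passed in as arguments rather than assumed.

```shell
#!/bin/sh
# Sketch: create the empty index.html without looking up the pod name.
# "deploy/NAME" lets kubectl pick a pod that belongs to the deployment.
# seed_index is a hypothetical helper; $1 = namespace, $2 = deployment name.
seed_index() {
  kubectl exec -n "$1" "deploy/$2" -- touch /usr/share/nginx/html/index.html
}
```

With the names used in this guide, the call would be seed_index debmirror debmirror-example-com. The container still has to be running at that moment for the exec to succeed.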

Test the site

Once the container is up and running, you should be able to reach the mirror by going to http://debmirror.example.com in your browser. You should get the contents of that index.html file, which is an empty page for now.

Debmirror CronJob

With the site created and working, we now need to create the jobs to populate it. I am going to describe the Debian mirror; however, the process is the same for any other apt repository. You just need to update the appropriate environment variables.

The file name I'm using is cronjob-debmirror-debian.yml.

In the file below you'll need to update the image to point to the debmirror sync image you created. The schedule runs the job at midnight and noon UTC.

If you want to use a source mirror other than Oregon State, you'll need to update the HOST environment variable. To sync other repositories, like Ubuntu or Docker, update the env section accordingly. I have example values in the post I linked to above.

The file contents are:

# batch/v1beta1 was removed in Kubernetes 1.25; batch/v1 CronJob is available from 1.21
apiVersion: batch/v1
kind: CronJob
metadata:
  name: debmirror-debian
spec:
  schedule: "0 0,12 * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app.kubernetes.io/name: debmirror
            app.kubernetes.io/instance: debmirror-debmirror.example.com
            app.kubernetes.io/component: sync-debian
            app.kubernetes.io/part-of: debmirror.example.com
        spec:
          restartPolicy: Never
          imagePullSecrets:
          - name: exampleregistry
          volumes:
            - name: debmirror-example-com-data
              persistentVolumeClaim:
                claimName: debmirror-example-com-data
          containers:
          - name: debmirror-debian
            image: harbor.example.com/images/debmirror-sync:latest
            imagePullPolicy: Always
            resources:
              limits:
                memory: 1000Mi
                cpu: 1000m
              requests:
                memory: 1000Mi
                cpu: 100m
            volumeMounts:
              - mountPath: /mnt/debmirror
                name: debmirror-example-com-data
            env:
              - name: DEST
                value: /mnt/debmirror/debian
              - name: HOST
                value: debian.oregonstate.edu
              - name: ROOT
                value: debian/
              - name: DIST
                value: buster,buster-updates
              - name: SECTION
                value: main,contrib,non-free,main/debian-installer
              - name: ARCH
                value: amd64,arm64,armhf
              - name: METHOD
                value: http

Once the CronJob is configured the way you want it, apply it using kubectl apply -n debmirror -f .
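Rather than waiting for the next scheduled run, a one-off Job can be started from the CronJob with kubectl create job --from. The sketch below wraps that command; run_sync_now is a hypothetical helper name and the "-manual-" suffix with a timestamp is just an arbitrary way to get a unique Job name.

```shell
#!/bin/sh
# Sketch: start a one-off Job from the CronJob instead of waiting for the schedule.
# run_sync_now is a hypothetical helper; $1 = namespace, $2 = CronJob name.
# The "-manual-<timestamp>" suffix is an arbitrary naming choice.
run_sync_now() {
  kubectl create job -n "$1" --from="cronjob/$2" "$2-manual-$(date +%s)"
}
```

For the CronJob above, the call would be run_sync_now debmirror debmirror-debian.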

The first run will take a long time. Mine, with the 3 architectures, downloaded 105 GB of data.

Once the job finishes, check the container result using kubectl get pods -n debmirror. The result should look like this:

NAME                                           READY   STATUS      RESTARTS   AGE
debmirror-debian-1601640000-42njw              0/1     Completed   0          4h21m

To check it, browse to http://debmirror.example.com/debian/dists/buster/Release.gpg. You should get a file.

Configure Debian

With the mirror created, we need to configure Debian to use it. This is relatively easy to do: edit /etc/apt/sources.list.

For Debian 10 (Buster) my file started out like this:

#

# deb cdrom:[Debian GNU/Linux 10.0.0 _Buster_ - Official amd64 NETINST 20190706-10:23]/ buster main

#deb cdrom:[Debian GNU/Linux 10.0.0 _Buster_ - Official amd64 NETINST 20190706-10:23]/ buster main

deb http://deb.debian.org/debian/ buster main
deb-src http://deb.debian.org/debian/ buster main

deb http://security.debian.org/debian-security buster/updates main
deb-src http://security.debian.org/debian-security buster/updates main

# buster-updates, previously known as 'volatile'
deb http://deb.debian.org/debian/ buster-updates main
deb-src http://deb.debian.org/debian/ buster-updates main

# This system was installed using small removable media
# (e.g. netinst, live or single CD). The matching "deb cdrom"
# entries were disabled at the end of the installation process.
# For information about how to configure apt package sources,
# see the sources.list(5) manual.

We need to change the hostname on every entry except debian-security to point to the new mirror. In my case, I also commented out the deb-src entries for the mirror, since I don't need source code for the packages. Only the hostnames change because the sync job placed the repository at the same web path. After modification, my file looked like this:

#

# deb cdrom:[Debian GNU/Linux 10.0.0 _Buster_ - Official amd64 NETINST 20190706-10:23]/ buster main

#deb cdrom:[Debian GNU/Linux 10.0.0 _Buster_ - Official amd64 NETINST 20190706-10:23]/ buster main

deb http://debmirror.example.com/debian/ buster main
#deb-src http://debmirror.example.com/debian/ buster main

deb http://security.debian.org/debian-security buster/updates main
deb-src http://security.debian.org/debian-security buster/updates main

# buster-updates, previously known as 'volatile'
deb http://debmirror.example.com/debian/ buster-updates main
#deb-src http://debmirror.example.com/debian/ buster-updates main

# This system was installed using small removable media
# (e.g. netinst, live or single CD). The matching "deb cdrom"
# entries were disabled at the end of the installation process.
# For information about how to configure apt package sources,
# see the sources.list(5) manual.
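The same edit can be scripted with sed when several machines need it. This is a minimal sketch, assuming the file uses the http://deb.debian.org/debian/ URLs shown above; use_mirror is a hypothetical helper name. It prints the rewritten file to standard output and leaves the original untouched.

```shell
#!/bin/sh
# Hypothetical helper: print a sources.list rewritten to use the mirror.
# $1 = path to the sources.list, $2 = mirror hostname (assumed to serve /debian/).
use_mirror() {
  # First comment out deb-src lines for the main archive,
  # then point every deb.debian.org URL at the mirror.
  # The debian-security entries do not match and are left alone.
  sed \
    -e '\|^deb-src http://deb\.debian\.org/debian/| s|^|#|' \
    -e "s|http://deb\.debian\.org/debian/|http://$2/debian/|" \
    "$1"
}
```

For example, use_mirror /etc/apt/sources.list debmirror.example.com > /tmp/sources.list writes the result to a temporary file that can be reviewed before it replaces the original.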

Once the file is updated, run sudo apt update to refresh using your mirror. If all goes well, you should get something similar to this:

Get:1 http://debmirror.example.com/debian buster InRelease [121 kB]
Get:2 http://debmirror.example.com/debian buster-updates InRelease [51.9 kB]
Hit:3 http://security.debian.org/debian-security buster/updates InRelease
Get:4 http://debmirror.example.com/debian buster/main amd64 Packages [7,906 kB]
Get:5 http://debmirror.example.com/debian buster/main Translation-en [5,968 kB]
Get:6 http://debmirror.example.com/debian buster-updates/main amd64 Packages [7,868 B]
Get:7 http://debmirror.example.com/debian buster-updates/main Translation-en [5,672 B]
Fetched 173 kB in 3s (64.9 kB/s)
Reading package lists... Done
Building dependency tree
Reading state information... Done

Conclusion

When doing the Ubuntu repository, the job failed a few times but eventually succeeded. It took nearly four and a half hours to complete the download of about 90 GB of data.

The part where the CronJob ran on UTC time was unexpected. I will be looking into how to get it to run at local time and not UTC. Probably something dumb.
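One workaround is to shift the hours in the schedule by hand. The sketch below does that arithmetic for a fixed UTC offset; utc_hour is a hypothetical helper, the offset of -6 is only an example value, and daylight saving changes are not handled. Newer Kubernetes releases also have a timeZone field in the CronJob spec, which may be the simpler route where it is available.

```shell
#!/bin/sh
# Hypothetical helper: convert a local hour to the UTC hour for a cron schedule.
# $1 = local hour (0-23), $2 = local UTC offset in whole hours (e.g. -6).
# Assumes a fixed offset; daylight saving changes are not handled.
utc_hour() {
  echo $(( ($1 - ($2) + 24) % 24 ))
}

# Midnight and noon at UTC-6 become 6 and 18 UTC.
echo "0 $(utc_hour 0 -6),$(utc_hour 12 -6) * * *"   # prints 0 6,18 * * *
```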

So far, this has been my most complicated application in Kubernetes, primarily because of all the moving parts and shared persistent volume. I am glad I did it and am happy with it so far.

I am excited to move more things from Swarm and other dedicated systems into Kubernetes.