COVID, my year in review

I went into a store without a mask for the first time yesterday, and a lot has changed with my home network as well.

With the COVID pandemic finally getting under control where I live, the mandates requiring masks in public have been lifted. I am fully vaccinated, so this is good news for me.

I wanted to go over some of the technological changes I went through this past year and give a very high-level overview of how I accomplished them.

Home use and personal network

First, I will cover my home network, which is primarily used for learning. It consisted of a number of Windows Server VMs, running everything in high availability. My job requires as close to 100% uptime as possible, so I practice achieving that at home. I also had a Docker Swarm cluster that I was in the process of converting to Kubernetes and a few other Linux VMs running Debian. These all ran on a single server with Hyper-V as my hypervisor; it is a home network, so I did not need high availability at the hardware level.

My first change was completing my migration from Docker Swarm to a cluster based on Rancher. I was creating and applying my manifests manually: no Helm, no CI/CD, everything by hand. I learned a lot this way and am happy that I did it.

My second change was migrating from the Rancher-based cluster to a native Kubernetes cluster. Rancher was taking a lot of resources that I did not have to spare; after all, this is a single server with only 128 GB of RAM. At the same time, I did not want to build and apply my manifests manually, so I chose a mix of Kustomize, Helm, and Argo CD, with my cluster configuration stored in a Git repository in Azure DevOps. I chose Argo CD over Flux because, at the time, Flux did not support Azure DevOps. This turned out to be a fantastic solution: the Argo CD UI was easy to use, the configuration was simple, everything was automated, and it was fun to work with.
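
As a rough sketch of what that looks like, an Argo CD Application pointing at a Kustomize overlay in an Azure DevOps repo is shaped roughly like this; the repo URL, path, and namespaces below are placeholders, not my actual configuration.

```yaml
# Sketch only: repo URL, path, and namespaces are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cluster-services
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://dev.azure.com/myorg/homelab/_git/cluster-config
    targetRevision: HEAD
    path: overlays/home              # Kustomize overlay to render
  destination:
    server: https://kubernetes.default.svc
    namespace: cluster-services
  syncPolicy:
    automated:
      prune: true                    # remove resources deleted from Git
      selfHeal: true                 # revert manual changes made in the cluster
```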

My third change, and probably one of the biggest, was migrating my network to be entirely Linux based, with Debian as my distribution of choice. I started by setting up a Bind DNS server and forwarding requests for my old network name to the domain controllers. I then set up a new domain name that allowed dynamic updates from my new DHCP server. Next up was migrating DHCP; I chose isc-dhcp-server because it allows easy dynamic DNS updates with Bind. After disabling DHCP on my old server, everything quickly picked up a new address from the new DHCP server. I did have to reboot a few things, but whatever.
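
For reference, the glue between isc-dhcp-server and Bind for dynamic DNS updates looks roughly like the snippet below. The domain, subnet, addresses, and TSIG key name are placeholders rather than my real values.

```
# /etc/dhcp/dhcpd.conf (sketch; domain, subnet, and key are placeholders)
ddns-update-style standard;
ddns-domainname "home.example.lan";
include "/etc/dhcp/ddns-update.key";     # TSIG key shared with Bind

subnet 192.168.10.0 netmask 255.255.255.0 {
  range 192.168.10.100 192.168.10.200;
  option domain-name-servers 192.168.10.5;
}

zone home.example.lan. {
  primary 192.168.10.5;
  key ddns-update;
}

# On the Bind side, the matching zone must allow updates signed with that key:
#   zone "home.example.lan" {
#     type master;
#     file "/var/lib/bind/db.home.example.lan";
#     allow-update { key ddns-update; };
#   };
```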

My fourth big change was kind of fun. We are starting to use Azure at work, so I wanted to get better at it, and what better place to learn than at home, where it does not impact anybody other than myself? I migrated my site to Azure and used Terraform for everything related to it. I also had to turn my site into a static site, which was interesting. I ended up using a mix of a Logic App, an Azure Function, a Storage Queue, and a pipeline in Azure DevOps to keep the costs to a minimum. That automation costs about 2 cents a month, and hosting the site itself is about 20 cents; woopty do, considering running my server is 60-70 dollars a month. I have a couple of other sites that I migrated as well.
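
A heavily trimmed sketch of the Terraform behind the static site looks something like this; the resource names, storage account name, and region are made up for illustration, and the Logic App, Function, and queue pieces are omitted.

```hcl
# Sketch only: names and location are placeholders, not my actual configuration.
resource "azurerm_resource_group" "site" {
  name     = "rg-static-site"
  location = "westus2"
}

resource "azurerm_storage_account" "site" {
  name                     = "mystaticsitestore"
  resource_group_name      = azurerm_resource_group.site.name
  location                 = azurerm_resource_group.site.location
  account_tier             = "Standard"
  account_replication_type = "LRS"

  # Serve the site directly out of the $web container.
  static_website {
    index_document     = "index.html"
    error_404_document = "404.html"
  }
}
```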

I also started backing up my fileserver to Azure Blob Storage. I use azcopy to sync the data, and it runs nightly. It costs me about 10 dollars. Cheap, secure, easy.
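
The nightly job boils down to a single azcopy sync call along these lines; the local path, storage account, container, and SAS token are placeholders.

```sh
# Placeholder path, storage account, container, and SAS token.
azcopy sync "/srv/fileserver" \
  "https://mystorageaccount.blob.core.windows.net/fileserver-backup?<SAS-token>" \
  --recursive \
  --delete-destination=true   # mirror deletions so the container matches the source
```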

My next change was complex and interesting: I converted everything from running on a single server to 14 Raspberry Pis. I could not just clone the VMs onto them, so it was all completely fresh. When I originally built my network, I had no real documentation of what I did. I changed that this time around; every install and configuration change was documented in a wiki in Azure DevOps.

My firewall was already on a Pi, so that was simple to document. It was a single-purpose host, so I just added the necessary services, like the VPN and NGINX. After I migrated my public services onto that Pi, I proceeded to migrate my DNS and DHCP servers. That was relatively easy as well; I was able to do most of the configuration from memory since I had just recently set it up, and I used some of my other blog posts to help as well.

Finally, I started migrating Kubernetes to the Pis. I tried to do a simple install on them, and it proved problematic. I had used Canal as my CNI, and that was a mistake: Canal is really old and does not work on the ARM architecture that Raspberry Pis use. I decided to switch to Flannel and thought it would be easy. I installed Flannel, then uninstalled Canal. That was my mistake. After uninstalling Canal, nothing could talk to anything else, and my cluster was dead and unrecoverable. In hindsight, I should have rebooted all the nodes in the cluster after installing Flannel and before removing Canal. Lesson learned.

Since I now needed to rebuild my cluster, I redesigned it a bit and decided to set up an HAProxy cluster in front of it. Previously I had used Keepalived to handle HA, with a floating IP on the control plane nodes and another on the worker nodes. That worked OK until I needed to add a new service: NGINX would eventually stop on the host holding the IP and cause about 7 seconds of downtime. I also found the control plane was a little unreliable, with components randomly failing to talk to one another. I need as close to 100% uptime as possible, so that was not acceptable to me.

For the new cluster, I set up HAProxy as a load-balanced cluster in front of both the control plane nodes and the worker nodes. I still have a floating IP on the worker nodes; more on that in a bit. HAProxy needed to do a TCP passthrough to the control planes. This configuration worked fantastically, and I still use it today. I also chose Calico as the CNI for the new cluster. It was relatively easy to implement, with only one configuration change needed to make it work.
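
The TCP passthrough to the Kubernetes API servers is only a few lines of haproxy.cfg, roughly like the snippet below; the node names and addresses are placeholders and the rest of my configuration is omitted.

```
# haproxy.cfg sketch: control plane names and IPs are placeholders.
frontend k8s-api
    bind *:6443
    mode tcp
    option tcplog
    default_backend k8s-api-nodes

backend k8s-api-nodes
    mode tcp
    balance roundrobin
    option tcp-check
    server cp1 192.168.10.21:6443 check
    server cp2 192.168.10.22:6443 check
    server cp3 192.168.10.23:6443 check
```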

Now that I had my Kubernetes cluster running, it was time to set up my shared storage. I had been running 8x10 TB drives in a RAID 10 array, effectively about 40 TB. This was cool, but I was only using about 3 TB, and a good chunk of that was OS disks and allocated-but-empty space in the virtual disks; actual data was only about 1 TB. I set up a Pi as an NFS server with 2x4 TB USB drives in a mirror, and this is now the shared storage for my Kubernetes cluster. I then tried to rebuild my cluster with Argo and some updated configs for the NFS provisioner. It turns out Argo does not work on ARM processors either, and they had no plans of making it work. So I hunted for a new solution. Flux had a new version that now supported Azure DevOps. I am not the biggest fan of Flux, but it works. After getting Flux set up and managing itself, which is part of the installation process, I started migrating my configuration from the old cluster to the new one.
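
Getting Flux to manage itself out of an Azure DevOps repo is essentially one bootstrap command along these lines; the organization, project, repo, and path are placeholders, and it expects an Azure DevOps PAT for authentication.

```sh
# Placeholder org/project/repo and path; requires an Azure DevOps PAT with code read/write.
flux bootstrap git \
  --url=https://dev.azure.com/myorg/homelab/_git/cluster-config \
  --branch=main \
  --path=clusters/home \
  --token-auth
```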

During the migration, I started with Harbor, which is what I had used to house all my container images. Another pitfall: Harbor also does not work on ARM. I decided to use a cloud provider for my registry so I would not have a chicken-and-egg problem; how can you start a container when the registry that holds its image is not running yet? I spent about a week hunting for a cloud-hosted registry and decided on Azure Container Registry. It was cheap at about 5 dollars a month, and since I already used Azure for my sites it made sense. I also did a POC of GitHub and GitHub Actions. Ugh, that was a pain in the ass. Not a fan at all, but I am glad that I did it.
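
Standing up the registry itself is only a couple of az CLI calls, roughly like this; the resource group and registry name are placeholders.

```sh
# Placeholder resource group and registry name.
az acr create \
  --resource-group rg-containers \
  --name myhomelabregistry \
  --sku Basic

# Log Docker in to the new registry so local builds and buildx can push to it.
az acr login --name myhomelabregistry
```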

Since I did not have my old Harbor and its containers, I had to start rebuilding the images. During the GitHub POC, I learned about docker buildx and QEMU for building multi-architecture images. This was important since I needed images that work on ARM processors. I set out to build my images and house them in my new Azure Container Registry. As I built each image, I deployed it to my new Pi cluster and migrated its storage.
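
The core of a multi-architecture build with buildx and QEMU is something like the commands below; the registry and image names are placeholders, and in my case the equivalent steps run inside an Azure DevOps pipeline.

```sh
# Placeholder registry and image names.
# Register QEMU emulators so an amd64 agent can build arm64 layers.
docker run --privileged --rm tonistiigi/binfmt --install all

# Create and select a buildx builder, then build and push both architectures in one go.
docker buildx create --name multiarch --use
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t myhomelabregistry.azurecr.io/postfix-relay:latest \
  --push .
```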

I set up a new Postfix pod and ingress so I could send email from other sources on my network. This taught me how to set up TCP port forwarding in the NGINX ingress, which I later used for my SAMBA server. Since this traffic is TCP, it goes through my HAProxy. Remember, I have to have my 100% uptime. :)
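
With the community ingress-nginx controller, exposing a raw TCP port is done through a ConfigMap that the controller reads via its --tcp-services-configmap flag; here is a sketch with placeholder namespaces and service names.

```yaml
# Sketch: namespace and service names are placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-services
  namespace: ingress-nginx
data:
  # external port: "namespace/service:port"
  "25": "mail/postfix:25"
```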

Once I had everything up and running in the new cluster, I had one thing remaining on the server: my SAMBA file server. Since I had a Kubernetes cluster, and at the time no additional network ports, I decided to move my fileserver into Kubernetes. HAProxy does not support forwarding UDP packets (sad), so I set up the TCP and UDP ingress for NGINX to forward the necessary ports to the SAMBA service and pod. Building the image for SAMBA was a little complex because the users that SAMBA knows about must exist in the container. I handled that with a script that creates the users from environment variables at startup. It is a little messy, but it works. I then copied all the files from the old SAMBA server into the directory on the NFS shared storage, updated the DNS entry for my fileserver, and it all worked marvelously. Next, I turned off the server.
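
Conceptually, the user-creation part of the container entrypoint is a small loop like the sketch below. The SAMBA_USERS format, the BusyBox-style adduser flags, and the paths are assumptions for illustration, not the exact script I run.

```sh
#!/bin/sh
# Sketch of an entrypoint that creates Samba users from an environment variable.
# SAMBA_USERS is assumed to look like "alice:secret1,bob:secret2" (hypothetical format).
set -e

IFS=','
for entry in $SAMBA_USERS; do
  user="${entry%%:*}"
  pass="${entry#*:}"

  # Create a matching Linux account with no home directory or login shell.
  adduser -D -H -s /sbin/nologin "$user" 2>/dev/null || true

  # smbpasswd reads the password twice from stdin in silent (-s) mode.
  printf '%s\n%s\n' "$pass" "$pass" | smbpasswd -a -s "$user"
done
unset IFS

# Hand off to the Samba daemon in the foreground.
exec smbd --foreground --no-process-group
```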

After turning off the server, I set out to get the backups to Azure working. This proved problematic: there is no azcopy build for ARM. I set up a pipeline and built it from source in a Docker container using buildx. The application started, and I was excited, then very quickly disappointed when azcopy sync would not run. It turns out it really does not work on ARM, and there is an open issue on GitHub about it; it has something to do with Go and the ARM CPU, some pretty low-level stuff. Sad. This led me to create my own sync application in .NET Core. I am a huge C# guy and love that .NET Core is cross platform, and it works on ARM! I run this application as a cron job on my shared storage host to sync some other NFS shares and as a Kubernetes CronJob to sync the fileserver to the blob store. Exciting.
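
On the cluster side, the fileserver sync is just a CronJob wrapping that .NET container; the image, schedule, secret, and PVC names below are placeholders rather than my actual manifest.

```yaml
# Sketch: image, schedule, secret, and PVC names are placeholders.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: fileserver-blob-sync
spec:
  schedule: "0 2 * * *"              # nightly
  concurrencyPolicy: Forbid          # never let runs overlap
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: blob-sync
              image: myhomelabregistry.azurecr.io/blob-sync:latest
              envFrom:
                - secretRef:
                    name: blob-sync-credentials   # storage account name and key/SAS
              volumeMounts:
                - name: fileserver
                  mountPath: /data
                  readOnly: true
          volumes:
            - name: fileserver
              persistentVolumeClaim:
                claimName: fileserver-data
```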

This is the first time in 17 years I have not had an actual x86 server to play on. Literally, 17 years. That is a long time. The server drew about 220 watts (roughly four 60 W light bulbs) running 24 hours a day, 7 days a week, which cost about 60-70 dollars a month. Everything now draws a whopping 30-40 watts, or about 8 dollars a month. That is a huge savings for a home network. On top of that, with everything public now hosted in the cloud, I can get rid of my business-class Comcast connection, which will save another 200 dollars a month. Moving to the cloud is costing a total of about 3 dollars a month. All in all, everything I have done over the past year on my home network has been an adventure. I have learned a ton that I still use professionally and personally, and I am excited to see what the next year will bring.

In conclusion, my home network and outside-of-work learning have been exciting, difficult, fun, and entertaining. Tons of learning, tons of struggles, but I now know far more than I did.

Professional/work changes

I have done a lot at work as well.

The first thing, which spanned the whole past year, was implementing a Kubernetes cluster using Rancher. We struggled with some of the pieces at first since none of us had any real experience, which is why I started doing it at home. I took the knowledge I gained personally and applied it at a larger scale, and that allowed us to move faster.

Around September, I did my first real Azure work: I set up an Azure Function and a Service Bus using PowerShell Core. That was my very first real attempt at using Azure. Not the most complicated setup, and a little late to the game, but it was exciting. I was under a tight deadline and did not get a chance to research the best way of doing things, like ARM templates or Terraform (that came later).

I was also working on migrating from our on-premises Azure DevOps Server to the cloud-hosted Azure DevOps. This was problematic: somewhere along our upgrade path from TFS to Azure DevOps Server, at least one upgrade ran into unknown, hidden issues, and those issues surfaced during the imports. What was expected to take a month ended up taking almost 7 months. It was a lot of back and forth with Microsoft support engineers, and I even got to talk to a few developers on the Azure DevOps team.

Another thing I am working on is revamping our network infrastructure and making things amazing. I cannot go into details, but it is going to be absolutely fantastic when it is done.

That was it for 2020; in the first part of 2021, things got really interesting. I got to hire a couple of DevOps engineers. My first engineer was a Kubernetes guru, so I was able to pass off most of the Kubernetes project to him. That was good, because I was now neck deep in learning Azure and API Management. What a great product for API management; we like it so far. It was a steep learning curve, both in learning APIM and its deployments and in learning Azure as a whole. There was a lot of backend work involved since we were connecting it to a couple of our on-prem systems, and of course, it had to be as secure as we could possibly make it. We had a little more time on this project and were able to use Terraform for it; we now use Terraform for all things Azure, and I use it at home as well. Right as I finished the initial parts of the APIM project, I was able to hand most of the Azure work to the newest addition to the team.

Now we are working through a lot of housekeeping items and finally moving forward to some fun projects.

Conclusion

The professional/work section is kind of short compared to the personal one. That is primarily because I brought my personal learning and experience from my own time into the workplace. It has been a hectic, fast-paced, and exciting year.

For most people, COVID was not a good time, and it is hard to see positives in such a hard period. I forced myself to look at the positives it brought, and there were actually a large number of them. I am not thankful that COVID happened, but I am thankful that it forced some changes on the world that would not have happened otherwise.

Here are some of my relevant posts

Running pi-gen on WSL 2
This post is all about building and using a custom kernel for WSL2 and getting pi-gen to work.
Build multi-architecture images in Azure DevOps
I am migrating everything to a new Raspberry Pi cluster and need to build multi-architecture images through an Azure DevOps pipeline. Here is how I did it.
Postfix in a container
I want to run postfix in a container as a mail relay for my network. Here are the problems I ran in to and how I solved them.
Installing NGINX through Argo CD
This guide can be used for deploying any Helm chart from a 3rd-party repository through Argo. I am basing it around NGINX though.
Letting Argo CD manage itself
One of the things I’m implementing in my new Kubernetes cluster is Argo. First step is getting it to manage itself.
NFSv4, Kubernetes, nfs-client-provisioner
Getting a Storage Class in Kubernetes with NFSv4 turned out to be relatively simple. I’m using Rancher, Kubernetes and my shared storage is using NFS.