Trading the Monolith for the Swarm: A Migration Story

For years, my little corner of the internet—a handful of personal and professional sites—lived a quiet, predictable life. First, they were hosted in my basement on an ancient Linux box. Bizarre VHOST rules, weird apache2 proxies, all sorts of networking shenanigans to even get DynamicDNS to resolve. It was often down for weeks at a time; God forbid the power went out!

So then I migrated to a VM. A single Google Cloud VM. It was a well-tended garden, a digital pet. An Apache instance for a static site, a suite of Ghost blogs each paired with its own MySQL container, all neatly managed by Docker Compose and fronted by Traefik. It worked perfectly.

But then I retired from the military and went to work with many of the world's best Kubernetes engineers.

So I had to modernize.

Sure, the VM architecture carried risks. A crashed Docker daemon or a botched OS update was a single point of failure that could take everything offline until I manually intervened. The infrastructure was functional, but it was brittle. Across the valley lay the promised land: Kubernetes, and the allure of GKE Autopilot with its promise of resilience and self-healing.

More importantly, I'd only be paying for what I used, and the chance to expand my skill set while improving both reliability and security was too strong to ignore.

My own software development skills are admittedly antique, a relic of the 1990s era of Perl scripts, GNU C, and monolithic applications. But with access to some world-class Kubernetes engineers for advice, I figured the mindset was more important than the syntax. It was time to trade my digital pet for a herd of cattle.

It was time for the great migration.

Phase 1: The Illusion of Simplicity

The initial plan seemed straightforward, almost trivial. First, migrate the simplest piece of the puzzle: a static Apache site. The playbook looked clean (roughly the commands sketched after the list):

  1. Spin up a GKE Autopilot cluster.
  2. Bake the website's flat files into a new Docker image.
  3. Write a few simple YAML files for a Deployment, a Service, and an Ingress.
  4. Apply, update DNS, and declare victory.
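In rough commands, with placeholder project, region, and image names standing in for my real ones, the playbook amounted to this:

    # 1. The cluster. Autopilot manages the nodes; you just name the region.
    gcloud container clusters create-auto my-cluster --region=us-central1

    # 2. Bake the flat files into an image and push it to the registry.
    docker build -t us-central1-docker.pkg.dev/my-project/web/static-site:v1 .
    docker push us-central1-docker.pkg.dev/my-project/web/static-site:v1

    # 3. The Deployment, Service, and Ingress manifests.
    kubectl apply -f deployment.yaml -f service.yaml -f ingress.yaml

    # 4. Point DNS at the Ingress IP and wait for the managed certificate.
    kubectl get ingress static-site --watch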

The universe, however, has a wonderful sense of humor. The first few steps were a death by a thousand paper cuts, a perfect introduction to the difference between a tutorial and a real-world project. My GCP project’s network, it turned out, was in "manual subnet mode," requiring a level of specificity the tutorials never mention. My first attempt to create the cluster failed. A quick diagnostic, a corrected gcloud command, and I was back on track. Then came the container push, a battle against IAM roles and the subtle but critical difference between a VM's "access scopes" and a service account's "permissions." It was a frustrating lesson in GCP security, like having the keys to the car but not the key to the building the car is parked in.
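For the record, the two fixes looked roughly like the sketch below. Every name is a placeholder, and whether the push needs roles/artifactregistry.writer or the older Container Registry storage role depends on which registry you're using:

    # Manual subnet mode: the cluster create has to name the VPC and subnet
    # explicitly instead of relying on an auto-created network.
    gcloud container clusters create-auto my-cluster \
      --region=us-central1 \
      --network=my-vpc \
      --subnetwork=my-subnet

    # The push only works once the pushing identity holds a registry-writer
    # role, and, when pushing from a VM, the VM's access scopes allow it too.
    gcloud projects add-iam-policy-binding my-project \
      --member="serviceAccount:pusher@my-project.iam.gserviceaccount.com" \
      --role="roles/artifactregistry.writer"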

Yet, after a few hours of troubleshooting, the site was live. The foundation was laid. The load balancer was provisioned, the automated SSL certificate was active, and a single, stateless site was humming along on a resilient, multi-node backend. The hardest part was over.

Or so I thought.

Phase 2: The Stateful Beast and the Hungry Void

Next up were the Ghost blogs, starting with one that currently has no traffic. This was a different beast entirely. It had two stateful components: the MySQL database and the persistent content directory with themes and images.

The database migration was surprisingly clean. The plan to offload the databases from individual containers into a single, managed Cloud SQL instance was the first major strategic victory. It simplified the entire architecture, trading a bit of extra cost for a massive reduction in operational burden. A few mysqldump commands, a gsutil cp to a staging bucket, and a final gcloud sql import, and the data was safe in its new, professionally managed home.
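The whole move fits in three commands; something like this, with hypothetical database, bucket, and instance names:

    # Dump, stage, import. (The Cloud SQL service account also needs read
    # access to the staging bucket, or the import will fail.)
    mysqldump -u root -p --single-transaction ghost_blog > ghost_blog.sql
    gsutil cp ghost_blog.sql gs://my-staging-bucket/
    gcloud sql import sql my-cloudsql-instance \
      gs://my-staging-bucket/ghost_blog.sql --database=ghost_blog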

I knew I needed a Persistent Volume Claim (PVC), a request for a network-attached hard drive. The process seemed simple enough: create the PVC, launch a temporary "helper pod" to attach the empty disk, use gsutil to pull down a tar.gz of my content, and unpack it. But in the chaos of a "controlled demolition" to reset a broken Ingress, I learned a brutal lesson about the default Reclaim Policy. When I deleted the PVC to get a clean slate, I wasn't just handing back my coat check ticket; I was handing back a ticket that had "Burn After Reading" stamped on the back. The underlying disk, and all the data I had just migrated, was instantly and irrevocably destroyed.
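For the record, the doomed setup really was that simple. Roughly this, with illustrative names and sizes, followed by a kubectl exec into the loader pod to gsutil the tarball down and unpack it onto the mount:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: ghost-content
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
      # No storageClassName, so the cluster default applies, and the default
      # class hands back a disk whose reclaim policy is Delete.
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: content-loader
    spec:
      containers:
      - name: loader
        image: google/cloud-sdk:slim   # has gsutil and tar on board
        command: ["sleep", "3600"]
        volumeMounts:
        - name: content
          mountPath: /var/lib/ghost/content
      volumes:
      - name: content
        persistentVolumeClaim:
          claimName: ghost-content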

It was a gut punch, but a critical lesson in the literal and unforgiving nature of declarative infrastructure. The blueprints don't care about your intent; they execute exactly what you write. I swallowed my pride, re-ran the data transfer, and this time, the data stuck.
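The habit I've adopted since: before deleting any claim that holds data worth keeping, find the bound PersistentVolume and flip its reclaim policy to Retain first. A sketch, with a placeholder PV name:

    # See which PV is bound to the claim and what its reclaim policy is.
    kubectl get pv

    # Flip it to Retain before doing anything destructive; now deleting the
    # PVC releases the volume instead of destroying the underlying disk.
    kubectl patch pv pvc-1234abcd \
      -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'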

Phase 3: Wrestling the Control Plane Hydra

With the data migrated, I deployed the pod. And then, the real boss battle began. The pod was healthy… “2/2 Running.” The logs were clean. And yet, the website returned a 502 or an ERR_CONNECTION_CLOSED.

This was my introduction to the concept of the "Two Inspectors." My internal inspector—the Kubernetes readinessProbe—saw the pod running and marked it healthy. But a second, stricter inspector—the Google Cloud Load Balancer's own health check—was getting a 301 from Ghost and marking the backend as UNHEALTHY. Because the backend was unhealthy, the certificate validation couldn't complete, leaving it stuck in a FailedNotVisible state. Because the certificate was invalid, the HTTPS listener wouldn't work, causing the connection errors. It was a perfect, maddening deadlock.

The solution was a series of discoveries that felt like peeling an onion. First, building a health-check "sidecar," a tiny, separate container in the pod whose only job is to answer the phone and say, "Yes, we're open." Then, creating a BackendConfig to tell the external inspector to call that specific phone number.
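Stripped to its shape, the fix looked something like the sketch below. The sidecar image and port are stand-ins (any tiny always-200 HTTP responder will do), and the Ghost container's Cloud SQL and mail configuration are omitted:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: ghost
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: ghost
      template:
        metadata:
          labels:
            app: ghost
        spec:
          containers:
          - name: ghost
            image: ghost:5
            ports:
            - containerPort: 2368        # Ghost's default HTTP port
          - name: healthz                # the "yes, we're open" sidecar
            image: hashicorp/http-echo
            args: ["-text=ok", "-listen=:8081"]
            ports:
            - containerPort: 8081
    ---
    apiVersion: cloud.google.com/v1
    kind: BackendConfig
    metadata:
      name: ghost-healthcheck
    spec:
      healthCheck:
        type: HTTP
        port: 8081                       # the sidecar, not Ghost's 301-happy port
        requestPath: /
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: ghost
      annotations:
        cloud.google.com/backend-config: '{"default": "ghost-healthcheck"}'
    spec:
      selector:
        app: ghost
      ports:
      - port: 80
        targetPort: 2368

The sidecar exists purely so the two inspectors stop disagreeing: Ghost keeps its redirects, and the load balancer gets an endpoint that never argues.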

Even then, the system was stuck. Stale errors in the Ingress and Certificate controllers refused to clear. The solution was the "Big Hammer": a series of kubectl patch commands to forcibly remove the finalizers from the deadlocked objects, followed by a full demolition and rebuild of the networking layer. And in the final hours came the definitive smoking gun for the last stubborn domain (this one, z3r0.us): a rogue AAAA (IPv6) record in its DNS, a ghost from an old configuration that was poisoning the entire certificate validation process.
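The hammer itself, for anyone who lands in the same deadlock. The object names are placeholders, ManagedCertificate is GKE's own kind, and stripping finalizers is very much a last resort:

    # Strip the finalizers so the stuck objects can actually be deleted and
    # recreated. Finalizers exist for a reason; reach for this last.
    kubectl patch ingress ghost-ingress --type=merge \
      -p '{"metadata":{"finalizers":[]}}'
    kubectl patch managedcertificate ghost-cert --type=merge \
      -p '{"metadata":{"finalizers":[]}}'

    # And the check that exposed the smoking gun: an IPv6 record left over
    # from a configuration that no longer existed.
    dig +short AAAA z3r0.us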

The View from the Other Side

Today, all the sites are operational on GKE Autopilot. The VM has been decommissioned, and the Linux box in my basement is now just home to a dev environment (and my son's Minecraft server). The final architecture is clean, resilient, and far more powerful than the old monolith. Was it worth the pain? Absolutely.

The real lesson wasn't learning YAML syntax or gcloud commands. The real lesson was in learning how to debug a distributed system. It was about forming a hypothesis, running a diagnostic to test it, analyzing the output, and methodically eliminating variables until only the truth remains. It’s a process of peeling back layers, from the browser, to DNS, to the load balancer, to the Ingress, to the service, to the pod, and finally, to the application log itself.
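In command form, that peeling looks roughly like this. The names are placeholders, and more often than not the answer falls out a couple of layers earlier than expected:

    dig +short example.com                     # DNS: where does it really point?
    curl -sv -o /dev/null https://example.com  # the browser's-eye view, TLS included
    kubectl describe ingress my-ingress        # load balancer state and backend health
    kubectl get managedcertificate             # Active, or stuck in FailedNotVisible?
    kubectl describe service my-service        # are any endpoints actually wired up?
    kubectl get pods -o wide                   # running, restarting, or pending?
    kubectl logs deploy/my-app                 # and finally, the application itself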

The journey was a brutal but invaluable reminder that in the world of modern infrastructure, the map is not the territory. The blueprint is not the building. The real skill is navigating the messy, unpredictable space between the two. And for an old dog from the ‘90s, that’s a new trick worth learning.