m@ksim.pro

Kubernetes: what the hype omits about the operational side

Kubernetes solves real problems at scale. It also introduces a new operational surface that most teams are not ready for. A realistic look before you commit.

Kubernetes has become the default answer to "how do we run containers" so quickly that the question rarely gets examined. Every conference talk shows the benefits. The operational challenges show up later, quietly, as incidents and as engineers leaving because the on-call rotation became unsustainable.

I am not arguing against Kubernetes. I use it. I recommend it in the right contexts. But "the right contexts" is a phrase worth taking seriously.

What Kubernetes actually solves well

The core value proposition is genuine: declarative configuration of how your workloads should run, automated scheduling across a cluster, self-healing when containers crash, and a standard model for networking and storage that does not depend on any particular cloud provider.
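To make "declarative" and "self-healing" concrete, here is a toy sketch of the reconcile idea: you state how many replicas you want, and a loop keeps comparing that with what is actually running. Everything in it (ToyCluster, reconcile, the web-N names) is hypothetical and exists only for illustration; it is not how Kubernetes controllers are implemented, only the shape of the idea.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical stand-in for a cluster: it only tracks replica names."""
    running: set = field(default_factory=set)
    counter: int = 0

    def start_replica(self) -> str:
        # Give every new replica a fresh, never-reused name.
        self.counter += 1
        name = f"web-{self.counter}"
        self.running.add(name)
        return name

    def remove(self, name: str) -> None:
        # Used both for a simulated crash and for a deliberate stop.
        self.running.discard(name)


def reconcile(cluster: ToyCluster, desired_replicas: int) -> list:
    """Compare desired state with observed state and return the actions taken."""
    actions = []
    while len(cluster.running) < desired_replicas:
        actions.append(f"start {cluster.start_replica()}")
    while len(cluster.running) > desired_replicas:
        name = sorted(cluster.running)[-1]
        cluster.remove(name)
        actions.append(f"stop {name}")
    return actions


cluster = ToyCluster()
first = reconcile(cluster, desired_replicas=3)   # nothing runs yet: three starts
cluster.remove("web-2")                          # simulate a container crash
healed = reconcile(cluster, desired_replicas=3)  # one replacement is started
```

Nobody told the loop that web-2 crashed. It only noticed that observed state no longer matched desired state, and that is the whole trick.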

At a certain scale - many services, many teams, significant deployment frequency - this pays off substantially. The alternative, managing a fleet of servers with bespoke scripts, does not scale and does not compose.

The second benefit is portability. A cluster running on your own hardware and a cluster running on a managed cloud service share the same API and the same mental model. That reduces lock-in, at least at the workload layer.

What the talks leave out

The complexity of Kubernetes is not just in the initial setup. It lives in the ongoing operations.

Networking in Kubernetes is a layer of abstraction on top of the host networking, and when something breaks - and something will break - debugging requires understanding both layers simultaneously. CNI plugins, service mesh if you have one, network policies, ingress controllers: each adds configuration surface and failure modes.

Storage is harder than compute. Stateful workloads in Kubernetes require persistent volumes, storage classes, and an understanding of how the storage backend behaves when a pod is evicted or a node fails. Teams that migrate databases into Kubernetes without experience in this area tend to discover the edge cases during incidents.

Certificate management, RBAC, secrets handling, version upgrades - each is a discipline in itself. A Kubernetes upgrade is not a one-command operation; it requires careful sequencing of control plane and node updates with a tested rollback plan.
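A rough sketch of what "careful sequencing" can look like when written down as a plan. The function name, the node pool names, and the step wording are all hypothetical. The ordering rests on two conservative assumptions that you should check against the version skew policy for your distribution: move one minor version at a time, and upgrade the control plane before the nodes.

```python
def upgrade_plan(current_minor: int, target_minor: int, node_pools: list) -> list:
    """Return an ordered list of human-readable steps for a 1.x cluster.

    Assumptions (illustrative, not taken from any specific tool):
    - one minor version per hop, no skipping
    - control plane first, node pools afterwards
    - a backup before every hop, so a rollback has something to restore
    """
    if target_minor < current_minor:
        raise ValueError("downgrades are not planned by this sketch")

    steps = []
    for minor in range(current_minor + 1, target_minor + 1):
        steps.append(f"back up cluster state before 1.{minor}")
        steps.append(f"upgrade control plane to 1.{minor}")
        for pool in node_pools:
            steps.append(f"drain and upgrade node pool {pool} to 1.{minor}")
    return steps


# Illustrative version numbers and pool names only.
plan = upgrade_plan(29, 31, ["general", "batch"])
for number, step in enumerate(plan, start=1):
    print(f"{number}. {step}")
```

Even this toy version produces eight steps for a two-version jump with two node pools, and each of those steps can fail in its own way.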

The staffing question

The honest question before adopting Kubernetes is: who on the team has operated it in production before?

"Operated" means: responded to an incident at 2 AM, diagnosed a node that stopped scheduling pods without losing running workloads, upgraded a cluster without downtime, and tuned resource requests and limits across dozens of services so that autoscaling actually works.
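Why do resource requests matter for autoscaling? The Horizontal Pod Autoscaler documentation describes the core calculation as desired replicas = ceil(current replicas x current metric / target metric), and CPU utilization is measured as a percentage of the pod's CPU request. The sketch below reproduces only that arithmetic. The function names are mine, the 10% tolerance reflects the documented default, and the real autoscaler also handles min/max bounds, missing metrics, unready pods, and stabilization windows, all of which are left out here.

```python
import math


def cpu_utilization(usage_millicores: float, request_millicores: float) -> float:
    """Utilization as a percentage of the CPU request, not of the node."""
    return 100.0 * usage_millicores / request_millicores


def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     tolerance: float = 0.1) -> int:
    """Core scaling arithmetic only; see the lead-in for what is omitted."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: do nothing
    return math.ceil(current_replicas * ratio)


# Same real load (400m CPU per pod), same 50% target, two different requests.
tight = desired_replicas(4, cpu_utilization(400, 500), 50)    # 80% of request
loose = desired_replicas(4, cpu_utilization(400, 1000), 50)   # 40% of request
print(tight, loose)
```

With a 500m request the autoscaler wants 7 replicas; with a 1000m request and the identical load it stays at 4. The workload did not change, only the number someone typed into the manifest, which is why tuning requests across dozens of services is real work.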

If the answer is "no one, but we will learn", that is a legitimate choice - but it should be a deliberate one, with a realistic timeline and the acknowledgment that the first year will have painful incidents that a more experienced team would have avoided.

When a simpler model is the right model

A single well-configured virtual machine with a process manager can handle many workloads at up to several hundred requests per second, depending on how expensive each request is. A managed container service - something that abstracts the cluster entirely - handles medium-complexity deployments without the full Kubernetes operational surface.

Kubernetes becomes the right choice when you have multiple services with different scaling profiles, when you need fine-grained deployment control, when your team has or is ready to build the operational expertise, and when the cost of the learning curve is justified by the scale you are operating at.

A practical starting point

If you are evaluating Kubernetes for the first time, use a managed service rather than self-hosted. Let someone else operate the control plane. Focus your team's learning on workload configuration and observability. Avoid putting stateful workloads in the cluster until you have experience with stateless ones.

The goal is to match the operational complexity you take on to the problem you are actually solving. Kubernetes is a powerful tool. It is also one of the most complex pieces of infrastructure many teams will ever run.
