
Why We Killed Our Kubernetes Cluster
(and What We Replaced It With)

600 MB of control plane binaries to run three microservices. etcd consensus on a five-node cluster. A YAML file longer than the service it deployed. We tore it out and replaced it with 3.3 MB of C23 and Rust. Here is exactly what we did and what we measured.

Scott Baker
Systems engineer — C23, Rust, NixOS, post-quantum security

I want to be precise about something before we start. This post is not a Kubernetes hate piece. Kubernetes is an impressive piece of engineering built by serious people solving a real problem at Google scale in 2014. The problem is that most teams are not Google in 2014, and they are paying the full complexity tax anyway.

We were running three services: a Node.js API, a React frontend served as static files, and a Postgres-backed worker process. Five nodes in a bare-metal cluster at a datacenter. Here is what Kubernetes cost us to run those three services.

The Inventory

Before we touched anything, we ran ps aux on the control plane node and inventoried every process that existed purely to serve Kubernetes, not our application:

PROCESS                    RESIDENT MEMORY   PURPOSE
etcd                       ~180 MB           Distributed consensus store. Stores pod specs, ConfigMaps, Secrets.
kube-apiserver             ~200 MB           REST frontend to etcd. Every cluster operation goes through this.
kube-scheduler             ~50 MB            Watches the apiserver for unscheduled pods. Assigns them to nodes.
kube-controller-manager    ~60 MB            Runs reconciliation loops for deployments, replica sets, endpoints.
kubelet (×5 nodes)         ~40 MB each       Node agent. Talks to the apiserver, manages the container runtime.
kube-proxy (×5 nodes)      ~20 MB each       iptables rules for service routing. Reprograms netfilter on every change.
containerd (×5 nodes)      ~30 MB each       Container runtime daemon. Pulls images, manages overlay filesystems.
CoreDNS                    ~30 MB            In-cluster DNS. Required for service name resolution.
nginx-ingress-controller   ~90 MB            Routes external HTTP to services. Watches the apiserver for Ingress objects.
Total overhead             ~1,060 MB         None of this runs our application.

Just over a gigabyte of resident memory, cluster-wide, to run a 12 MB Node.js API, a 3 MB static site, and an 8 MB worker. The orchestration overhead outweighs the application by roughly 46:1.

This is not a memory argument. Memory is cheap. This is a complexity argument. Every one of those processes is a failure domain, a configuration surface, a security attack surface, and an upgrade risk. We were spending more time managing the orchestrator than managing our application.

The Breaking Points

1. The YAML Surface Area

To deploy that Node.js API with three replicas, health checks, and an HTTP route, we needed: a Deployment, a Service, an Ingress, a HorizontalPodAutoscaler, a PodDisruptionBudget, and a ConfigMap for the nginx-ingress annotations. Six resource types. 214 lines of YAML. Here is a sample of what the ingress alone looked like:

api-ingress.yaml — 47 lines to route HTTP
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "30"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "30"
    nginx.ingress.kubernetes.io/use-regex: "true"
    nginx.ingress.kubernetes.io/rewrite-target: /$1
spec:
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /api(/|$)(.*)
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 8080

The equivalent in Skr8tr — start the ingress binary with a flag:

one flag, same result
skr8tr_ingress --listen 80 --tower 127.0.0.1 \
  --route /api:api-service \
  --route /:frontend

2. The Auth Model

Kubernetes authentication is credential-file based. Your kubeconfig contains a token field that is base64-encoded. Base64 is not encryption. It is not hashing. It is a reversible encoding scheme that anyone who has the file can trivially decode with base64 -d. The token is effectively a plaintext password stored in a YAML file that gets copied to every developer's laptop.

There is more. ServiceAccount tokens for in-cluster workloads expire by default but are mounted as files into every pod. Anyone who can exec into a pod in a default RBAC config can read those tokens. This is not hypothetical — it is a documented attack surface with CVE records.

Skr8tr's auth model is different in kind, not degree. Every mutating command is signed with an ML-DSA-65 key (CRYSTALS-Dilithium Level 3, NIST post-quantum standard). The signing key is a 4032-byte file that lives on the operator's machine with chmod 600. It never goes to the server. The server only sees the public key (1952 bytes). The signature on the wire is a 3309-byte binary blob, hex-encoded, appended to the command.

what the conductor receives for a signed SUBMIT
# wire payload (truncated for readability)
SUBMIT|/opt/apps/api.skr8tr|1743890400|3a4f8b2c...6618 hex chars...e9d1a0
# ^cmd  ^manifest path      ^unix_ts   ^ML-DSA-65 signature
#
# The conductor verifies: OQS_SIG_verify(payload, sig, pubkey)
# If the timestamp is outside ±30s: replay attack rejected
# If the signature is invalid: ERR|UNAUTHORIZED
# The signing key never left the operator's laptop

3. The Rollout Ceremony

A zero-downtime rolling update in Kubernetes requires you to understand and configure at minimum: strategy.rollingUpdate.maxSurge, strategy.rollingUpdate.maxUnavailable, readinessProbe (correctly — a wrong probe causes the rollout to stall forever), and PodDisruptionBudget (if you want to survive a node drain during rollout). Get any of these wrong and you get either downtime or a stuck rollout that requires manual intervention.

In Skr8tr:

rolling update
skr8tr --key ~/.skr8tr/signing.sec rollout api-v2.skr8tr
# rolling out /opt/apps/api-v2.skr8tr... ok
# app     api-server
# status  new replicas launching, old replicas draining (8s settle)

The rollout thread in the Conductor launches a new-generation replica, waits 8 seconds for it to settle, then sends SIGTERM to the old-generation replica followed by SIGKILL after a 2-second grace window. One at a time. No probe YAML. No PodDisruptionBudget. At any point during the rollout, N−1 replicas are live.

What We Built Instead

Skr8tr is three C23 daemons and a Rust CLI. Here is the full component inventory:

BINARY           SIZE      PURPOSE
skr8tr_reg       ~40 KB    Service registry. UDP. Register, lookup, round-robin across replicas.
skr8tr_sched     ~80 KB    Conductor. Schedules workloads, tracks placements, handles auth, rolling updates.
skr8tr_node      ~60 KB    Fleet node. Runs workloads via fork+exec. Health checks. Log ring buffer.
skr8tr_ingress   ~45 KB    HTTP reverse proxy. Longest-prefix routing. Dynamic backend via Tower.
skr8tr (CLI)     ~3 MB     Operator interface. Rust. PQC signing built in.
Total            ~3.3 MB   Everything. Including auth. Including ingress.

The Manifest Format

We did not want YAML. YAML is a data serialization format that was pressed into service as a configuration language. It has significant whitespace, implicit type coercion (in YAML 1.1 parsers, NO parses as boolean false, which is how Norway's country code famously becomes a boolean), and no native schema. We built our own format.

api-server.skr8tr
app api-server
exec /usr/local/bin/myapi
args --port 8080 --db postgres.internal:5432
port 8080
replicas 3

health {
  check GET /healthz 200
  interval 10s
  retries 3
}

scale {
  min 1
  max 8
  cpu-above 80
  cpu-below 20
}

That is the complete deployment manifest for our API server with health checks and auto-scaling. 18 lines. No anchors. No indentation ambiguity. No implicit type coercion. The parser is 200 lines of C23.

The Numbers

We ran both stacks side by side on identical hardware for two weeks. Here is what we measured:

METRIC                                            KUBERNETES                                       SKR8TR
Orchestrator resident memory (cluster-wide)       ~1,060 MB                                        ~12 MB
git push to new replica serving traffic           ~45 s (image pull + scheduling + readiness)      ~1.2 s (fork + exec, no image)
Rolling update, 3 replicas                        ~90 s                                            ~26 s (3 × 8 s settle)
New node joins cluster                            ~3 min (kubelet registration, cert approval)     <6 s (first heartbeat)
Config lines to deploy one service with ingress   214 lines (6 resource types)                     18 lines (1 manifest)
Auth model                                        base64 token (plaintext equivalent)              ML-DSA-65 post-quantum signature
Binary size of control plane                      ~620 MB (all binaries)                           ~3.3 MB

The 1.2-second deploy time is not a trick. Skr8tr does not pull a container image. It does not set up an overlay filesystem. It does not configure network namespaces. It calls fork() and execve() with the binary path from the manifest. The binary was already on disk. That is the entire deployment step.

What Skr8tr Does Not Do (Yet)

Honest accounting. These are genuine gaps relative to a mature Kubernetes installation: no multi-tenant container isolation, no network policies, no distributed block storage. If you need those, Kubernetes is a reasonable answer. If you are running your own services on nodes you control, it is likely overkill.

The Source

Skr8tr is Apache 2.0. The full source is on GitHub. The control plane is ~2000 lines of C23 across four files. The CLI is ~500 lines of Rust. The parser for .skr8tr manifests is 200 lines. It is small enough to read in an afternoon.

If you are running a Kubernetes cluster for three services, I would invite you to spend that afternoon reading Skr8tr's source and considering whether the complexity you are carrying is load-bearing.


Questions, corrections, or war stories from your own k8s migration: open an issue or email me directly.
