Kubernetes In Anger
In a detailed blog post on Lobsters, a developer describes migrating a production application to Kubernetes, a move that turned into a three-week ordeal. The cause was an NGINX Ingress controller configured without the `nginx.ingress.kubernetes.io/service-upstream` annotation. Without it, the controller sent traffic directly to pod IPs rather than through the Service, so during rolling updates some requests reached pods that were already terminating and came back as intermittent 502 errors. The team first suspected application bugs, and only after extensive logging and tracing did they trace the problem to the missing annotation. The fix was a single line of YAML. The author stresses that subtle misconfigurations like this are common and recommends testing Ingress behavior thoroughly with canary deployments. The post is a cautionary tale about the complexity of Kubernetes networking and the importance of understanding the components underneath it.
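For readers who have not seen the annotation, the one-line fix might look roughly like the sketch below. The post's actual manifest is not reproduced here, so the Ingress name, host, Service name, port, and ingress class are all hypothetical placeholders; only the annotation key comes from the post.

```yaml
# Minimal sketch of an Ingress with the annotation set.
# All names below (web-ingress, app.example.com, web-svc, port 80,
# ingress class "nginx") are hypothetical placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
  annotations:
    # Ask ingress-nginx to proxy to the Service's cluster IP and port
    # instead of the individual pod endpoints.
    nginx.ingress.kubernetes.io/service-upstream: "true"
spec:
  ingressClassName: nginx
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-svc
                port:
                  number: 80
```

The value must be the quoted string `"true"`, since annotation values are strings. Routing through the Service hands load balancing to the cluster's Service layer, so NGINX-level features such as sticky sessions may no longer behave as expected; that trade-off is worth checking against the controller's documentation before applying it.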
A single missing annotation caused three weeks of intermittent 502 errors, highlighting Kubernetes' hidden complexity.