Posts

Bootstrapping Kubernetes Clusters with Terraform and Argo CD: A Durable Two-Layer Approach

Robust cluster bootstrap separates infrastructure provisioning from continuous reconciliation. This guide details a production-grade Terraform plus Argo CD model with explicit governance.

TL;DR
A production-ready Kubernetes bootstrap is more reliable when Terraform and Argo CD have explicit responsibilities. Terraform should provision and manage infrastructure primitives, cluster lifecycle resources, and state safety controls. Argo CD should continuously reconcile platform and workload resources from Git using declarative application definitions. This model reduces drift and clarifies incident ownership. Teams should harden Terraform workflows with plan review and state management controls, and treat Argo CD app-of-apps repositories as privileged automation surfaces with strict access and project boundaries.

App-of-apps accelerates bootstrap, but should be managed as privileged automation.

Bo...

Progressive Delivery on Kubernetes with Argo CD and Argo Rollouts

Argo CD and Argo Rollouts solve different problems in the release path. This guide shows how to use them together for safer canary and blue-green delivery on Kubernetes.

TL;DR
Progressive delivery on Kubernetes is not just a nicer rolling update. Argo CD reconciles the live cluster against the desired state in Git and keeps the two from drifting apart, while Argo Rollouts adds first-class release strategies such as canary and blue-green, with analysis gates and traffic-aware promotion. When you combine them, you get a clear control boundary: Git defines intent, Argo CD applies it, and Argo Rollouts manages staged exposure and rollback decisions. That split makes release behavior more predictable, especially when you need metric-based promotion instead of blind full-cluster cutovers.

Argo Rollouts is the control plane that adds staged promotion and analysis on top of GitOps-driven delivery.

Rolling Updates Are Not Progressive Delivery
Kubernetes...

Autoscaling EKS Clusters with Karpenter: A Policy-First Model That Holds in Production

Karpenter can improve EKS scaling speed and flexibility, but reliable outcomes depend on NodePool policy, EC2NodeClass boundaries, and disruption controls.

TL;DR
Karpenter works best in production when autoscaling is treated as policy, not only as capacity automation. Modern Karpenter workflows are built around NodePool, EC2NodeClass, and NodeClaim resources. Teams should enforce explicit requirements, limits, and disruption budgets, and run the Karpenter controller outside Karpenter-managed capacity. Cost and reliability improvements come from combining scaling policy with workload resource discipline and clear observability through the NodeClaim lifecycle and metrics.

Production autoscaling starts with explicit NodePool and EC2NodeClass policy.

Karpenter Succeeds in Production Only When Scaling Policy Is Explicit
Karpenter can scale EKS clusters faster and with wider instance selection than static-no...

Kubernetes Gateway API vs Ingress: A Practical Production Model for Platform Teams

Ingress still works, but new routing requirements in shared clusters are better served by Gateway API. This guide explains what changes operationally, what to migrate first, and how to validate support safely.

TL;DR
Ingress is not being removed from Kubernetes, but its API is frozen, and the Kubernetes project recommends Gateway API for future evolution. The practical win for platform teams is governance: GatewayClass and Gateway can be owned by infrastructure teams, while HTTPRoute and related route objects can be owned by application teams. Migration should be incremental, with Ingress and Gateway resources coexisting while behavior is validated against your specific implementation. Production success depends on conformance checks, verification of supported features, and explicit cross-namespace attachment policy rather than direct YAML translation.

Gateway API maps platform ownership and application ownership more cleanl...
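The cross-namespace attachment policy mentioned in this summary is expressed on each Gateway listener through `allowedRoutes.namespaces.from`, which accepts `Same` (the default), `All`, or `Selector`. The Python sketch below models only that decision; the `Listener` class, `route_may_attach` function, namespace names, and the `gateway-access` label are hypothetical, and the selector is simplified to `matchLabels` only.

```python
from dataclasses import dataclass, field


@dataclass
class Listener:
    """Hypothetical model of a Gateway listener's allowedRoutes.namespaces policy."""
    gateway_namespace: str
    # Mirrors allowedRoutes.namespaces.from: "Same" (the API default), "All", or "Selector".
    allowed_from: str = "Same"
    # Simplified label selector: matchLabels only, no matchExpressions.
    match_labels: dict = field(default_factory=dict)


def route_may_attach(listener: Listener, route_namespace: str,
                     namespace_labels: dict) -> bool:
    """Return True if a route in route_namespace may attach to this listener."""
    if listener.allowed_from == "All":
        return True
    if listener.allowed_from == "Same":
        return route_namespace == listener.gateway_namespace
    if listener.allowed_from == "Selector":
        labels = namespace_labels.get(route_namespace, {})
        # Every matchLabels entry must be present on the route's namespace.
        return all(labels.get(k) == v for k, v in listener.match_labels.items())
    raise ValueError(f"unknown allowedRoutes.namespaces.from value: {listener.allowed_from}")


# Usage with hypothetical namespaces and a hypothetical "gateway-access" label.
shared = Listener(gateway_namespace="infra", allowed_from="Selector",
                  match_labels={"gateway-access": "shared"})
labels = {"team-a": {"gateway-access": "shared"}, "team-b": {}}
print(route_may_attach(shared, "team-a", labels))  # True
print(route_may_attach(shared, "team-b", labels))  # False
```

Using `Selector` keeps the decision with the platform team: an application namespace gains access only when its namespace carries the agreed label. Real implementations also check the route's `parentRefs`, and cross-namespace backend references additionally require a ReferenceGrant.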

Improving Kubernetes Cost Visibility with OpenCost

OpenCost gives Kubernetes teams a practical way to see allocation, idle cost, and cloud billing in one place. This guide shows how to install it and read the numbers correctly.

TL;DR
OpenCost is useful when you need more than a cloud bill but less than a full financial model. It turns Kubernetes telemetry, Prometheus data, and cloud pricing inputs into allocation views that help teams understand who is using what and how much of the cluster is idle or shared. The important caveat is that the numbers are only as good as the telemetry and pricing data behind them, so the right goal is trustworthy cost visibility, not magical accounting precision.

Cost Visibility Is Not Cost Guessing
Most Kubernetes cost discussions start with the cloud bill and end with spreadsheet politics. That works until you need to answer a more useful question: which namespace, workload, team, or service is actually consuming the cluster, and how much of the pl...
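To make the allocation and idle-cost idea in this summary concrete, here is a minimal Python sketch for one resource (CPU) on one node. It assumes each workload is charged for max(request, usage), which, as we understand it, is how OpenCost attributes CPU and memory; the function name, price, and workload figures are hypothetical, and real OpenCost output also covers memory, storage, and other cost sources.

```python
def allocate_cpu_cost(workloads, node_cores, price_per_core_hour, hours):
    """Split one node's CPU cost into per-workload allocation and idle cost.

    workloads maps a name to (requested_cores, used_cores). Each workload is
    charged for max(request, usage), so over-requesting and bursting both count.
    """
    allocated = {
        name: max(request, usage) * price_per_core_hour * hours
        for name, (request, usage) in workloads.items()
    }
    total = node_cores * price_per_core_hour * hours
    # Idle cost is whatever capacity was paid for but not allocated to anyone.
    idle = total - sum(allocated.values())
    return allocated, idle


# Hypothetical 8-core node at a hypothetical $0.25 per core-hour, over 4 hours.
allocated, idle = allocate_cpu_cost(
    {"api": (2.0, 1.5), "batch": (1.0, 3.0)},
    node_cores=8, price_per_core_hour=0.25, hours=4,
)
print(allocated)  # {'api': 2.0, 'batch': 3.0}
print(idle)       # 3.0
```

The max(request, usage) rule is why both over-requesting (`api`) and bursting above requests (`batch`) show up as cost. In this naive sketch, heavy bursting can push the allocated total above node capacity and make the idle figure negative, which is one reason the numbers are only as trustworthy as the telemetry behind them.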

Building an Internal Developer Platform on EKS

An internal developer platform is not just a cluster plus CI/CD. This guide shows how Backstage, GitOps, and EKS fit together as a product layer for self-service delivery.

TL;DR
An internal developer platform on EKS works best when you treat it as a product, not a cluster project. EKS provides the runtime substrate, but the platform is the contract layer that turns infrastructure into self-service capabilities: catalog, templates, deployment paths, health visibility, and guardrails. Backstage is useful because its catalog and software templates expose those capabilities in a developer-facing interface, while GitOps keeps the actual platform state declarative and auditable. If you want adoption, focus on what developers can request and understand, not only on what the cluster can run.

An IDP Is a Product Layer, Not a Cluster Project
The easiest way to build the wrong internal developer platform is to treat it like an infrastructure chec...