Posts

Showing posts from May, 2026

Bootstrapping Kubernetes Clusters with Terraform and Argo CD: A Durable Two-Layer Approach

Robust cluster bootstrap separates infrastructure provisioning from continuous reconciliation. This guide details a production-grade Terraform plus Argo CD model with explicit governance.

TL;DR: A production-ready Kubernetes bootstrap is more reliable when Terraform and Argo CD have explicit responsibilities. Terraform should provision and manage infrastructure primitives, cluster lifecycle resources, and state safety controls. Argo CD should continuously reconcile platform and workload resources from Git using declarative application definitions. This model reduces drift and clarifies incident ownership. Teams should harden Terraform workflows with plan review and state management controls, and treat Argo CD app-of-apps repositories as privileged automation surfaces with strict access and project boundaries.

App-of-apps accelerates bootstrap, but should be managed as privileged automation.

Bo...

Progressive Delivery on Kubernetes with Argo CD and Argo Rollouts

Argo CD and Argo Rollouts solve different problems in the release path. This guide shows how to use them together for safer canary and blue-green delivery on Kubernetes.

TL;DR: Progressive delivery on Kubernetes is not just a nicer rolling update. Argo CD reconciles Git against the cluster and keeps the desired state honest, while Argo Rollouts adds first-class release strategies such as canary and blue-green, with analysis gates and traffic-aware promotion. When you combine them, you get a clear control boundary: Git defines intent, Argo CD applies it, and Argo Rollouts manages staged exposure and rollback decisions. That split makes release behavior more predictable, especially when you need metric-based promotion instead of blind full-cluster cutovers.

Argo Rollouts is the control plane that adds staged promotion and analysis on top of GitOps-driven delivery.

Rolling Updates Are Not Progressive Delivery

Kubernetes...

Autoscaling EKS Clusters with Karpenter: A Policy-First Model That Holds in Production

Karpenter can improve EKS scaling speed and flexibility, but reliable outcomes depend on NodePool policy, EC2NodeClass boundaries, and disruption controls.

TL;DR: Karpenter works best in production when autoscaling is treated as policy, not only as capacity automation. Modern Karpenter workflows are built around NodePool, EC2NodeClass, and NodeClaim resources. Teams should enforce explicit requirements, limits, and disruption budgets, and run the Karpenter controller outside Karpenter-managed capacity. Cost and reliability improvements come from combining scaling policy with workload resource discipline and clear observability through NodeClaim lifecycle and metrics.

Production autoscaling starts with explicit NodePool and EC2NodeClass policy.

Karpenter Succeeds in Production Only When Scaling Policy Is Explicit

Karpenter can scale EKS clusters faster and with wider instance selection than static-no...