Posts

Showing posts with the label eks

Environment Promotion Strategies for GitOps Pipelines: Branches, Paths, Tags, and Digests

GitOps promotion is a data-model problem before it is a tooling problem. This guide compares branches, directories, tags, image digests, Flux automation, and Argo CD Image Updater trade-offs.

TL;DR: A reliable GitOps promotion strategy makes the promoted artifact, environment-specific configuration, approval record, and rollback target explicit. Directory-per-environment models are simple and auditable; branch-per-environment models isolate change history but create merge drift; tag or SHA promotion improves reproducibility; and image-digest promotion closes supply-chain gaps. Flux Image Automation and Argo CD Image Updater can reduce toil, but production promotion still needs protected branches, signed commits or tags, policy gates, drift detection, and a clear, safe handoff to progressive delivery across clusters.

Promotion is the movement of a reviewed artifact through explicit environment s...

Platform Engineering on AWS with EKS Blueprints and GitOps

Platform engineering on AWS gets much clearer when Terraform owns day-0 infrastructure and Argo CD owns day-2 reconciliation. This guide shows how EKS Blueprints and the GitOps Bridge pattern create that boundary.

TL;DR: Platform engineering on AWS is easier to reason about when you separate responsibilities: Terraform provisions the EKS cluster, networking, IAM, and add-on metadata, while Argo CD continuously reconciles in-cluster applications and platform add-ons from Git. EKS Blueprints and the GitOps Bridge pattern make that handoff explicit by passing cluster context into Argo CD instead of letting Terraform and GitOps compete for the same resources. The result is a cleaner bootstrap flow, fewer ownership collisions, and a platform model that scales better across teams and environments.

Platform Engineering Starts With Ownership, Not Tools

The most common mistake in platform engineering is treating Terraform, EKS Bluepr...

A Modern Terraform Reference Architecture for Amazon EKS

Most EKS failures are not Kubernetes failures. They are boundary failures between Terraform state, VPC capacity, node provisioning, and workload identity. This guide lays out a production-ready reference architecture that keeps those seams explicit.

TL;DR: A modern Terraform reference architecture for Amazon EKS should separate network, cluster, and add-on state; reserve private subnet capacity for control-plane ENIs and pods; keep a small stable baseline of managed nodes; use Karpenter for bursty or heterogeneous workloads; and choose workload identity deliberately instead of treating IRSA and EKS Pod Identity as interchangeable. The goal is not just to create a cluster, but to make upgrades, add-on lifecycle, IAM boundaries, and node replacement predictable. If you design those boundaries early, EKS gets much easier to operate.

A production EKS architecture works better when Terraform state, networking, compute, identity...

Autoscaling Amazon EKS with Karpenter: NodePools, EC2NodeClasses, and Practical Guardrails

Karpenter changes EKS autoscaling from static node-group math to Pod-driven provisioning. This guide shows how NodePools, EC2NodeClasses, and disruption controls fit together so you can scale faster without creating a cost or reliability mess.

TL;DR: Karpenter reacts to unschedulable Pods, not just node-group size. NodePool resources define scheduling intent; EC2NodeClass resources define AWS launch settings. Keep instance-family choices broad enough for Karpenter to find real capacity. Keep a small baseline of stable capacity for system workloads while Karpenter handles bursty or specialized demand. Validate rollout by forcing real pending Pods and watching controller decisions, not just by checking whether the controller Pod is running. Use disruption and consolidation conservatively until you understand the effect on workload churn and startup latency.

Karpenter Helps...