Platform Engineering on AWS with EKS Blueprints and GitOps
Platform engineering on AWS gets much clearer when Terraform owns day-0 infrastructure and Argo CD owns day-2 reconciliation. This guide shows how EKS Blueprints and the GitOps Bridge pattern create that boundary.
TL;DR
Platform engineering on AWS is easier to reason about when you separate responsibilities: Terraform provisions the EKS cluster, networking, IAM, and add-on metadata, while Argo CD continuously reconciles in-cluster applications and platform add-ons from Git. EKS Blueprints and the GitOps Bridge pattern make that handoff explicit by passing cluster context into Argo CD instead of letting Terraform and GitOps compete for the same resources. The result is a cleaner bootstrap flow, fewer ownership collisions, and a platform model that scales better across teams and environments.
Platform Engineering Starts With Ownership, Not Tools
The most common mistake in platform engineering is treating Terraform, EKS Blueprints, and Argo CD as if they are interchangeable layers in one giant deployment system. They are not.
Terraform is excellent at creating the cloud foundation: the VPC, the EKS cluster, IAM roles, and the external services your platform needs before Kubernetes can even start doing useful work. Argo CD is excellent at continuously reconciling Kubernetes resources from Git once that platform exists. The GitOps Bridge pattern exists because those two responsibilities should meet at a contract, not blur into each other.
That contract matters more than the brand names. If you do not define day-0 and day-2 ownership clearly, you get a platform that is hard to explain, hard to audit, and hard to upgrade. The better model is simple:
- Terraform owns external infrastructure and bootstrap metadata.
- Argo CD owns in-cluster desired state and reconciliation.
- Platform teams own the boundary between the two.
Why This Is a Platform Problem
Platform engineering is not just about standardizing a cluster build. It is about making the path from “new environment” to “safe production workload” repeatable enough that application teams do not need to rediscover it every quarter.
The OpenGitOps principles describe Git as the source of truth and automated reconciliation as the operational model. That sounds simple until you need cloud-side metadata that lives outside the cluster. Add-ons like ingress controllers, external DNS, secret integrations, and workload bootstrap flows often need values that only Terraform knows: VPC IDs, IAM role ARNs, account IDs, cluster names, or repository locations.
That is where EKS Blueprints and GitOps Bridge fit together. AWS’s GitOps Bridge guide describes the pattern directly: IaC creates the external resources and stores the metadata GitOps needs; GitOps then reads that metadata and passes it into the Helm chart or application install.
The Contract Between IaC and GitOps
The old debate is usually framed as “Terraform vs GitOps.” That framing hides the real issue, because both tools belong in the platform.
The real question is: which system owns which layer of intent?
Terraform owns:
- VPC, subnets, routing, and EKS cluster creation
- IAM roles and trust relationships
- cluster bootstrap metadata
- prerequisite AWS services for add-ons and controllers
Argo CD owns:
- Applications, ApplicationSets, and environment overlays
- Helm chart installation and reconciliation
- sync policy, health reporting, and rollback behavior
- promotion through Git commits instead of manual API calls
This matches the Argo CD model described in the official docs: applications are defined in Git and Argo CD automatically syncs them to Kubernetes clusters, with health surfaced separately from sync state.
The key design rule is simple: do not let both tools think they are the source of truth for the same object.
A Practical Bootstrap Flow on EKS
The AWS IA EKS Blueprints GitOps getting-started guide shows a clean bootstrap sequence:
- Terraform creates the cluster and foundational AWS resources.
- Terraform writes GitOps Bridge metadata into the Argo CD cluster secret.
- Argo CD reads that metadata to install platform add-ons and workloads.
- Subsequent changes flow through Git, not through ad hoc cluster access.
That is the platform-engineering version of day-0, day-1, and day-2:
- Day 0: provision the substrate.
- Day 1: register the cluster and seed the GitOps control plane.
- Day 2: let Git-driven reconciliation handle ongoing change.
Here is a minimal Terraform-shaped example of how teams usually think about that boundary:
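terraform {
  required_version = ">= 1.6.0"
}

variable "aws_region" {
  type        = string
  description = "AWS region the cluster runs in."
}

# module.eks and module.vpc are assumed to be declared elsewhere in this configuration.
locals {
  gitops_bridge_metadata = {
    addons_repo_url      = "https://github.com/gitops-bridge-dev/gitops-bridge"
    # The base path is a prefix prepended to addons_repo_path, so it is left
    # empty here to avoid repeating the same directory twice.
    addons_repo_basepath = ""
    addons_repo_path     = "bootstrap/control-plane/addons"
    addons_repo_revision = "main"
    aws_region           = var.aws_region
    aws_cluster_name     = module.eks.cluster_name
    aws_vpc_id           = module.vpc.vpc_id
  }
}

# Expose the metadata so the bootstrap step can attach it to the Argo CD cluster secret.
output "gitops_bridge_metadata" {
  value = local.gitops_bridge_metadata
}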
The point is not the exact syntax. The point is to make cluster context explicit and machine-readable so Argo CD can use it without hard-coded values scattered across manifests.
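To make the handoff concrete, here is a rough sketch of what the Argo CD cluster secret might look like once that metadata has been written into it. Argo CD recognizes cluster secrets by the argocd.argoproj.io/secret-type: cluster label. The secret name, the environment and enable_* labels, and every ID value are placeholders chosen for illustration, not values from a real environment.

```yaml
# Illustrative Argo CD cluster secret carrying bridge metadata.
# All names and IDs below are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: in-cluster-prod                      # hypothetical secret name
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster  # marks this secret as a cluster definition
    environment: prod                        # assumed label used for selection
    enable_aws_load_balancer_controller: "true"
  annotations:
    addons_repo_url: https://github.com/gitops-bridge-dev/gitops-bridge
    addons_repo_path: bootstrap/control-plane/addons
    addons_repo_revision: main
    aws_region: us-west-2
    aws_cluster_name: platform-prod
    aws_vpc_id: vpc-0123456789abcdef0
type: Opaque
stringData:
  name: in-cluster-prod
  server: https://kubernetes.default.svc
  config: |
    {"tlsClientConfig": {"insecure": false}}
```

Labels are convenient for selecting clusters, while annotations can hold longer values such as ARNs and URLs that would not pass label validation.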
What Add-Ons Need From Cloud Metadata
Add-ons are where platform teams feel the pain first.
Controllers like AWS Load Balancer Controller, ExternalDNS, External Secrets, or other cluster-integrated add-ons often need cloud identifiers and permissions that do not belong in application manifests. A Helm chart may need:
- the AWS region
- the cluster name
- IAM role references
- VPC or subnet context
- repository paths for bootstrap apps
If you put those values directly into every workload repo, you create copy-paste drift. If you try to make Terraform own every in-cluster controller forever, you lose GitOps reconciliation.
The GitOps Bridge pattern solves this by carrying environment-specific values across the boundary once, then letting Argo CD use them repeatedly during install and reconciliation.
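One way to picture the consuming side is an ApplicationSet with a cluster generator, which exposes the labels and annotations of each registered cluster to the template. The sketch below assumes the cluster secret carries aws_cluster_name, aws_region, and aws_vpc_id annotations; the enable_aws_load_balancer_controller label and the aws_load_balancer_controller_iam_role_arn annotation are hypothetical keys, and the chart version is a placeholder.

```yaml
# Sketch: install an add-on using metadata read from the cluster secret.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: addons-aws-load-balancer-controller
  namespace: argocd
spec:
  generators:
    - clusters:
        selector:
          matchLabels:
            enable_aws_load_balancer_controller: "true"   # hypothetical opt-in label
  template:
    metadata:
      name: 'aws-load-balancer-controller-{{name}}'
    spec:
      project: platform
      source:
        repoURL: https://aws.github.io/eks-charts
        chart: aws-load-balancer-controller
        targetRevision: 1.8.1          # placeholder; pin the version you have validated
        helm:
          releaseName: aws-load-balancer-controller
          values: |
            clusterName: {{metadata.annotations.aws_cluster_name}}
            region: {{metadata.annotations.aws_region}}
            vpcId: {{metadata.annotations.aws_vpc_id}}
            serviceAccount:
              name: aws-load-balancer-controller
              annotations:
                eks.amazonaws.com/role-arn: {{metadata.annotations.aws_load_balancer_controller_iam_role_arn}}
      destination:
        name: '{{name}}'
        namespace: kube-system
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```

Nothing environment-specific is hard-coded in the template, so the same file can serve every cluster that carries the opt-in label.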
A Bridge-Oriented Repo Shape
One workable structure looks like this:
platform/
  bootstrap/
    argocd/
    addons/
  clusters/
    dev/
    stage/
    prod/
  apps/
    shared/
    team-a/
    team-b/
That layout is not mandatory, but it makes ownership visible. The platform team owns bootstrap/ and clusters/, while product teams can own selected paths under apps/ without getting access to the whole substrate.
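On the Argo CD side, an AppProject is one way to back that ownership split with guardrails. The sketch below is an assumption about how a team project could look; the project name, repository URL, and namespace pattern are illustrative. Note that sourceRepos restricts repositories, not paths inside them, so path-level ownership still depends on repository controls such as code owners and branch protection.

```yaml
# Sketch: a project that limits where team-a Applications may deploy.
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: team-a                      # hypothetical project name
  namespace: argocd
spec:
  description: Workloads owned by team-a
  sourceRepos:
    - https://github.com/acme/platform-gitops.git
  destinations:
    - server: https://kubernetes.default.svc
      namespace: 'team-a-*'         # only namespaces matching this pattern
  # No clusterResourceWhitelist is set, so cluster-scoped resources are denied.
  namespaceResourceBlacklist:
    - group: ""
      kind: ResourceQuota           # quotas stay under platform-team control
```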
How Argo CD Fits Into the Model
Argo CD’s Application spec is the contract for reconciliation. A minimal Application declares a project, a source, and a destination; the sync policy is optional, and it is what turns on automated reconciliation.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: platform-addons
  namespace: argocd
spec:
  project: platform
  source:
    repoURL: https://github.com/acme/platform-gitops.git
    targetRevision: main
    path: clusters/prod/platform-addons
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
Two details matter here.
First, automated sync means Git changes are applied without a human clicking the sync button. The Argo CD docs note that this also removes the need for CI/CD pipelines to talk directly to the Argo CD API server.
Second, prune and selfHeal are not decoration. prune removes resources that disappeared from Git. selfHeal tells Argo CD to correct drift when live state diverges from desired state.
If your platform team only remembers one thing, it should be this: sync policy defines how reconciliation happens, while health defines how well the live resources are doing.
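Because prune deletes anything that disappears from Git, it can help to exempt resources that hold state. A minimal sketch, assuming a hypothetical PersistentVolumeClaim in the platform namespace, uses the per-resource sync-options annotation to opt out of pruning:

```yaml
# Sketch: keep this claim even if its manifest is later removed from Git.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: platform-data               # hypothetical name
  namespace: platform-system
  annotations:
    argocd.argoproj.io/sync-options: Prune=false
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```

With this annotation in place, removing the manifest from Git leaves the Application out of sync instead of deleting the claim, which makes the decision to delete data an explicit one.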
Sync Policy, Health, and Ordering Are Different Problems
People often collapse these into one concept. Argo CD does not.
Sync policy controls behavior
Auto-sync decides when and how Argo CD applies changes from Git. The automated sync docs also cover retry refresh, self-heal, and reconciliation timing.
Health controls status
Argo CD provides built-in health assessment for standard Kubernetes resources and allows custom Lua health checks when you need deeper logic for custom resources or known controller edge cases.
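As a rough illustration, a custom health check is registered in the argocd-cm ConfigMap under a resource.customizations.health key named after the resource group and kind. The platform.example.com group and Tenant kind below are hypothetical, and the script assumes the resource reports a Ready condition in its status.

```yaml
# Sketch: Lua health check for a hypothetical Tenant custom resource.
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
  labels:
    app.kubernetes.io/part-of: argocd
data:
  resource.customizations.health.platform.example.com_Tenant: |
    hs = {}
    -- Default to Progressing until a Ready condition says otherwise.
    hs.status = "Progressing"
    hs.message = "Waiting for Ready condition"
    if obj.status ~= nil and obj.status.conditions ~= nil then
      for _, condition in ipairs(obj.status.conditions) do
        if condition.type == "Ready" then
          if condition.status == "True" then
            hs.status = "Healthy"
          elseif condition.status == "False" then
            hs.status = "Degraded"
          end
          hs.message = condition.message
        end
      end
    end
    return hs
```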
Sync waves control ordering
When one resource must settle before another starts, sync waves give you ordering control. The sync-waves docs explain that Argo CD runs sync in phases and waves, and that a later wave waits for earlier resources to become healthy.
Here is a compact example that combines those ideas:
# Application: defines sync behavior, including retries with backoff.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: bootstrap-platform
  namespace: argocd
spec:
  project: platform
  source:
    repoURL: https://github.com/acme/platform-gitops.git
    targetRevision: main
    path: bootstrap/platform
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
---
# The manifests below live under bootstrap/platform in Git.
# Wave -1: the namespace is created first. Namespace is a core v1 resource.
apiVersion: v1
kind: Namespace
metadata:
  name: platform-system
  annotations:
    argocd.argoproj.io/sync-wave: "-1"
---
# Wave 1: the controller is applied after earlier waves are healthy.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: platform-controller
  namespace: platform-system
  annotations:
    argocd.argoproj.io/sync-wave: "1"
spec:
  replicas: 2
  selector:
    matchLabels:
      app: platform-controller
  template:
    metadata:
      labels:
        app: platform-controller
    spec:
      containers:
        - name: controller
          # Placeholder image standing in for a real controller.
          image: public.ecr.aws/docker/library/nginx:1.27
In practice, you would use waves for bootstrap sequencing, not as a replacement for good dependency design. Keep the chain short and obvious.
What Managed Argo CD on EKS Changes
AWS now documents a managed Argo CD capability in EKS. That changes the operational discussion, but not the platform boundary.
The benefit is clear: AWS handles the install, maintenance, and scaling burden for the Argo CD control plane itself. You no longer need to treat Argo CD as another self-hosted workload that your platform team must patch and keep alive.
What does not change is the GitOps contract:
- You still define applications in Git.
- You still need repository access and permissions configured.
- You still need a clean split between infrastructure provisioning and in-cluster reconciliation.
- You still need a repository structure that maps to teams and environments.
So the managed service reduces toil, but it does not remove architectural responsibility. It just lets the platform team spend less time on Argo CD plumbing and more time on ownership boundaries and developer experience.
Why This Pattern Is Better Than “Terraform All The Things”
Terraform can install Helm charts, and many teams start there because it feels convenient. That convenience decays fast.
The problem is not that Terraform cannot create Kubernetes resources. The problem is that if Terraform owns every add-on, every retry, and every runtime drift correction, then a deployment becomes a Terraform operation instead of a GitOps operation. That pushes you toward larger state files, more imperative workflows, and more operator-only change windows.
The GitOps Bridge pattern gives you a cleaner failure domain:
- Terraform handles cloud prerequisites once.
- Argo CD handles repeated reconciliation many times.
- Git records the desired state and the history of every platform change.
That is a more durable operating model for platform engineering because the control boundary is visible.
When Monorepo Wins, and When It Doesn’t
The monorepo vs multirepo question is often asked as if it were a universal rule. It is not.
Use a monorepo when:
- one platform team owns most of the GitOps surface
- environment promotion is standardized
- cluster bootstrap and shared add-ons live beside app manifests
- you want shared policy, shared review, and simpler discovery
Use separate repos or repo-per-team when:
- teams have clearly different release cadences
- access control needs stronger boundaries
- product teams should not see platform internals
- you need independent ownership for clusters or workloads
The more important modern distinction is not monorepo versus multirepo by itself. It is whether your reconciliation boundaries match ownership boundaries.
Argo CD Applications, ApplicationSets, and path-scoped source layouts make it possible to decompose responsibility even when some repositories stay centralized. That is why the old debate has softened: repository structure matters, but it is no longer the only lever.
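For example, a Git directory generator can turn each folder under a team path into its own Application while the repository stays shared. This is a sketch under assumptions: the repository URL, the apps/team-a layout, the namespace naming, and the team-a project (assumed to exist already) are all illustrative.

```yaml
# Sketch: one Application per directory under apps/team-a in a shared repo.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: team-a-apps
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://github.com/acme/platform-gitops.git
        revision: main
        directories:
          - path: apps/team-a/*
  template:
    metadata:
      name: 'team-a-{{path.basename}}'
    spec:
      project: team-a               # assumed to exist already
      source:
        repoURL: https://github.com/acme/platform-gitops.git
        targetRevision: main
        path: '{{path}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: 'team-a-{{path.basename}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
```

Adding a new workload then becomes a matter of adding a directory, and the reconciliation boundary follows the path the team already owns.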
Why We Built It This Way
This pattern is intentionally conservative.
It keeps Terraform focused on cloud and bootstrap intent, and it keeps Argo CD focused on continuous delivery and runtime convergence. That separation helps with audits, change reviews, and on-call debugging because you can answer three questions quickly:
- Who created the environment?
- Who owns the live workload state?
- Where is the contract between them?
If you cannot answer those questions, your platform is already too coupled.
Frequently Asked Questions
Q: What is the GitOps Bridge pattern in one sentence? A: It is the handoff mechanism between IaC and GitOps. Terraform creates the cluster and external dependencies, then passes the metadata GitOps needs so Argo CD can install and reconcile add-ons and workloads.
Q: Should I bootstrap Argo CD with Terraform or Git? A: Use Terraform for the initial cloud-side bootstrap and registration, then let Git and Argo CD own ongoing reconciliation. That keeps day-0 and day-2 responsibilities separate.
Q: How do health checks differ from auto-sync? A: Auto-sync decides when Argo CD applies desired state. Health checks decide whether resources are healthy after they are applied. You need both, but they solve different problems.
Q: Does managed EKS Argo CD eliminate the need for GitOps Bridge? A: No. It removes operational overhead for the Argo CD control plane, but you still need a clean contract between Terraform, cloud metadata, and in-cluster reconciliation.
Q: What is the safest way to model platform add-ons? A: Keep their cloud prerequisites in Terraform, keep their install and lifecycle in Argo CD, and pass only the metadata needed for those add-ons to work in the target cluster.
Resources
- ArgoCD on Amazon EKS Blueprints for Terraform
- Working with Argo CD in Amazon EKS
- Continuous deployment with Argo CD in Amazon EKS
- Configure repository access for Argo CD on Amazon EKS
- GitOps Bridge project
- GitOps Bridge Argo CD bootstrap module
- AWS IA EKS Blueprints add-ons module
- Argo CD Application specification
- Argo CD automated sync policy
- Argo CD resource health
- Argo CD sync waves
- OpenGitOps principles