Bootstrapping Kubernetes Clusters with Terraform and Argo CD: A Durable Two-Layer Approach
Robust cluster bootstrap separates infrastructure provisioning from continuous reconciliation. This guide details a production-grade Terraform plus Argo CD model with explicit governance.
TL;DR
A production-ready Kubernetes bootstrap is more reliable when Terraform and Argo CD have explicit responsibilities. Terraform should provision and manage infrastructure primitives, cluster lifecycle resources, and state safety controls. Argo CD should continuously reconcile platform and workload resources from Git using declarative application definitions. This model reduces drift and clarifies incident ownership. Teams should harden Terraform workflows with plan review and state management controls, and treat Argo CD app-of-apps repositories as privileged automation surfaces with strict access and project boundaries.

Bootstrapping Fails Most Often at Ownership Boundaries, Not Tooling
Teams rarely fail because Terraform or Argo CD is incapable. They fail because lifecycle ownership is mixed: infrastructure and runtime reconciliation are managed by overlapping processes with unclear authority.
A reliable model is to enforce two layers:
- Terraform manages infrastructure lifecycle and state-critical primitives.
- Argo CD manages continuous reconciliation of Kubernetes resources from Git.
This aligns with Argo CD’s cluster bootstrapping guidance and reduces operational ambiguity during upgrades and incidents.
1. Use Terraform for Infrastructure Lifecycle and State Discipline
Terraform should own cluster-adjacent infrastructure and provider-backed lifecycle resources.
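```hcl
terraform {
  required_version = ">= 1.6.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  # Remote state lives in S3 so that plans and applies share one source of truth.
  backend "s3" {
    bucket = "platform-terraform-state-prod"
    key    = "eks/bootstrap/terraform.tfstate"
    region = "us-east-1"
  }
}

# The IAM roles and var.private_subnet_ids are assumed to be defined elsewhere
# in this configuration.
resource "aws_eks_cluster" "main" {
  name     = "platform-prod"
  role_arn = aws_iam_role.eks_cluster.arn
  version  = "1.31"

  vpc_config {
    subnet_ids = var.private_subnet_ids
  }
}

resource "aws_eks_node_group" "baseline" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "baseline"
  node_role_arn   = aws_iam_role.eks_nodes.arn
  subnet_ids      = var.private_subnet_ids

  # scaling_config is a required block; the sizes below are illustrative
  # placeholders, not recommendations.
  scaling_config {
    desired_size = 2
    min_size     = 2
    max_size     = 4
  }
}
```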
Review the output of terraform plan before every apply, and commit the dependency lock file (.terraform.lock.hcl) so that provider versions are reproducible across machines and CI runs.
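One way to enforce plan review is to generate the plan in CI on every pull request. The following is a minimal sketch, assuming GitHub Actions as the CI system; the workflow name, the `infra/eks-bootstrap` directory, and the pinned Terraform version are hypothetical, and cloud credential setup is deliberately left out.

```yaml
# Sketch of a pull-request plan job. Paths, names, and the pinned version
# are assumptions for illustration. AWS credential configuration is omitted.
name: terraform-plan
on:
  pull_request:
    paths:
      - "infra/**"            # hypothetical location of the Terraform code
permissions:
  contents: read
jobs:
  plan:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: infra/eks-bootstrap   # hypothetical
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.6.6"
          terraform_wrapper: false   # keep stdout clean for redirection
      # Fail if the committed lock file would need to change.
      - name: Init against the committed lock file
        run: terraform init -input=false -lockfile=readonly
      - name: Validate
        run: terraform validate
      # Save the plan so the reviewed plan can be the one that is applied.
      - name: Plan
        run: terraform plan -input=false -out=tfplan
      - name: Render plan for reviewers
        run: terraform show -no-color tfplan > plan.txt
      - uses: actions/upload-artifact@v4
        with:
          name: terraform-plan
          path: infra/eks-bootstrap/plan.txt
```

The `-lockfile=readonly` flag makes `terraform init` fail instead of silently updating provider selections, which keeps the lock file a reviewed artifact rather than a side effect of CI.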
2. Use Argo CD for Continuous Platform and Workload Reconciliation
After control plane and baseline nodes are ready, Argo CD should take over in-cluster reconciliation from Git.
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cluster-bootstrap
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: platform-admin
  source:
    repoURL: https://github.com/example/platform-config.git
    targetRevision: main
    path: clusters/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```
This keeps application drift control where it belongs and avoids re-running infrastructure provisioning for routine app-level changes.
3. Treat App-of-Apps as Privileged Infrastructure
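The Argo CD documentation describes the ability to create Applications in arbitrary projects as an admin-level capability, and app-of-apps grants exactly that to anyone who can push to the parent repository. Treat this as a governance requirement, not a suggestion.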
Practical controls:
- restrict write access to parent app repositories
- enforce review gates on changes to each child Application's project field and destination scope
- define clear project boundaries for child applications
- use sync waves where ordering dependencies exist
Without this, app-of-apps can become an uncontrolled cross-namespace mutation channel.
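Project boundaries like these can be expressed with AppProject resources. The sketch below assumes two projects: the `platform-admin` project used by the parent Application, and a hypothetical `team-workloads` project for child applications. The second repository URL and the `team-*` namespace pattern are illustrative assumptions.

```yaml
# Parent project: may only create Application objects in the argocd namespace.
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: platform-admin
  namespace: argocd
spec:
  description: App-of-apps parent; repository writes limited to platform admins
  sourceRepos:
    - https://github.com/example/platform-config.git
  destinations:
    - server: https://kubernetes.default.svc
      namespace: argocd
  # No clusterResourceWhitelist is set, so cluster-scoped resources are denied.
  namespaceResourceWhitelist:
    - group: argoproj.io
      kind: Application
---
# Child project (hypothetical): workloads confined to team namespaces.
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: team-workloads
  namespace: argocd
spec:
  description: Team applications; cannot target platform namespaces
  sourceRepos:
    - https://github.com/example/team-apps.git   # hypothetical repository
  destinations:
    - server: https://kubernetes.default.svc
      namespace: "team-*"                        # hypothetical naming scheme
```

With this split, a pull request that changes a child Application's `project` to `platform-admin` stands out in review as a privilege change.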
4. Keep Rollout and Recovery Paths Explicit
This model improves recoverability only when each layer has a defined and rehearsed rollback path:
- Terraform rollback path for infrastructure-level failures
- Argo CD sync and app-level rollback path for runtime config failures
If rollback paths are mixed, incident response slows and attribution becomes unclear.
5. Why This Model Holds Up Better Than Mixed Bootstrap Flows
| Dimension | Mixed Ownership Bootstrap | Two-Layer Terraform + Argo CD | Operational Benefit |
|---|---|---|---|
| Source of truth | Ambiguous across tools and scripts | Clear split: infra in Terraform, runtime in Argo CD | Faster root-cause isolation |
| Drift control | Partial and inconsistent | Layered and scoped | Better change confidence |
| Access governance | Broad write permissions accumulate | Privileged app-of-apps repos controlled explicitly | Lower privilege-escalation risk |
| Rollback design | Ad hoc and incident-specific | Defined per layer | Shorter recovery cycles |
| Team collaboration | Ownership conflicts during change windows | Stable boundaries for platform and app teams | Reduced coordination overhead |
What To Do Next
- Document ownership boundaries before changing bootstrap code.
- Enforce Terraform plan-review-apply workflow with remote state controls.
- Restrict app-of-apps parent repository writes and validate project scope in PRs.
- Apply sync-wave ordering for foundational components and dependent apps.
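As an illustration of sync-wave ordering, the sketch below shows two child Applications that a parent app might manage, with a foundational component in wave 0 and a dependent one in wave 1. The application names, the `platform-components` project, and the repository paths are hypothetical.

```yaml
# Wave 0: foundational component, synced first.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cert-manager                      # hypothetical component
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "0"
spec:
  project: platform-components            # hypothetical project
  source:
    repoURL: https://github.com/example/platform-config.git
    targetRevision: main
    path: platform/cert-manager           # hypothetical path
  destination:
    server: https://kubernetes.default.svc
    namespace: cert-manager
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
---
# Wave 1: depends on wave 0, so it is applied afterwards.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ingress-controller                # hypothetical component
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "1"
spec:
  project: platform-components
  source:
    repoURL: https://github.com/example/platform-config.git
    targetRevision: main
    path: platform/ingress-controller     # hypothetical path
  destination:
    server: https://kubernetes.default.svc
    namespace: ingress
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
```

Lower waves are applied first. Depending on the Argo CD version, the parent may need a health check configured for Application resources before it actually waits for wave 0 to become healthy, so verify the behavior in a non-production cluster.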
Frequently Asked Questions
Q: Can Argo CD replace Terraform for full bootstrap?
Argo CD is excellent for Kubernetes resource reconciliation, but provider-backed infrastructure lifecycle and state concerns are better handled by Terraform in most environments.
Q: Is app-of-apps required for Argo CD bootstrap?
No, but it is common. If adopted, it should be treated as privileged automation.
Q: Should auto-sync be enabled on everything immediately?
Not always. Use automated sync intentionally with clear prune behavior, ordering strategy, and rollback expectations.
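For an application that is not yet ready for full automation, a more conservative sync policy might look like the sketch below. The application name, project, repository, and retry values are hypothetical; the point is that pruning and self-healing are explicit choices rather than defaults copied from the bootstrap app.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-api                      # hypothetical workload
  namespace: argocd
spec:
  project: team-workloads                 # hypothetical project
  source:
    repoURL: https://github.com/example/team-apps.git   # hypothetical
    targetRevision: main
    path: payments-api
  destination:
    server: https://kubernetes.default.svc
    namespace: team-payments
  syncPolicy:
    automated:
      prune: false      # never delete resources without a manual sync
      selfHeal: false   # do not revert manual changes made in the cluster
    retry:
      limit: 3          # illustrative values
      backoff:
        duration: 10s
        factor: 2
        maxDuration: 3m
```

Removing the `automated` block entirely leaves the application on manual sync, which can be a reasonable starting point until rollback expectations are agreed.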
Resources
- Argo CD Cluster Bootstrapping
- Argo CD Declarative Setup
- Argo CD Getting Started
- Argo CD Application Specification
- Argo CD Automated Sync
- Argo CD Sync Waves
- Argo CD App Create Command
- Terraform Apply
- Terraform S3 Backend
- Terraform Provider Requirements
- Terraform Dependency Lock
- Terraform AWS EKS Cluster
- Terraform AWS EKS Node Group