Bootstrapping Kubernetes Clusters with Terraform and Argo CD: A Durable Two-Layer Approach

Robust cluster bootstrap separates infrastructure provisioning from continuous reconciliation. This guide details a production-grade Terraform plus Argo CD model with explicit governance.

TL;DR

A production-ready Kubernetes bootstrap is more reliable when Terraform and Argo CD have explicit responsibilities. Terraform should provision and manage infrastructure primitives, cluster lifecycle resources, and state safety controls. Argo CD should continuously reconcile platform and workload resources from Git using declarative application definitions. This model reduces drift and clarifies incident ownership. Teams should harden Terraform workflows with plan review and state management controls, and treat Argo CD app-of-apps repositories as privileged automation surfaces with strict access and project boundaries.

Figure: Argo CD app-of-apps architecture diagram. App-of-apps accelerates bootstrap, but it should be managed as privileged automation.

Bootstrapping Fails Most Often at Ownership Boundaries, Not Tooling

Teams rarely fail because Terraform or Argo CD is incapable. They fail because lifecycle ownership is mixed: infrastructure and runtime reconciliation are managed by overlapping processes with unclear authority.

A reliable model is to enforce two layers:

Terraform manages infrastructure lifecycle and state-critical primitives.
Argo CD manages continuous reconciliation of Kubernetes resources from Git.

This aligns with Argo CD’s cluster bootstrapping guidance and reduces operational ambiguity during upgrades and incidents.

1. Use Terraform for Infrastructure Lifecycle and State Discipline

Terraform should own cluster-adjacent infrastructure and provider-backed lifecycle resources.

terraform {
  required_version = ">= 1.6.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
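  backend "s3" {
    bucket  = "platform-terraform-state-prod"
    key     = "eks/bootstrap/terraform.tfstate"
    region  = "us-east-1"
    encrypt = true # encrypt the state object at rest
    # State locking (dynamodb_table, or use_lockfile on newer Terraform releases) is also worth configuring.
  }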
}

resource "aws_eks_cluster" "main" {
  name     = "platform-prod"
  role_arn = aws_iam_role.eks_cluster.arn
  version  = "1.31"

  vpc_config {
    subnet_ids = var.private_subnet_ids
  }
}

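# The IAM roles and var.private_subnet_ids are assumed to be defined elsewhere in the module.
resource "aws_eks_node_group" "baseline" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "baseline"
  node_role_arn   = aws_iam_role.eks_nodes.arn
  subnet_ids      = var.private_subnet_ids
  # scaling_config is a required block; the sizes here are illustrative placeholders.
  scaling_config {
    desired_size = 2
    min_size     = 2
    max_size     = 4
  }
}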

Review the output of terraform plan before every apply, and commit the dependency lock file (.terraform.lock.hcl) so provider selections are reproducible across workstations and CI runs.
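One way to make plan review routine is to generate the plan in CI on every pull request. The workflow below is a minimal sketch, not a prescribed setup: it assumes GitHub Actions, a hypothetical infra/ directory holding the Terraform configuration, and cloud credentials configured separately (omitted here).

```yaml
# Hypothetical GitHub Actions workflow: produce a reviewable plan on each pull request.
# The infra/ path is an assumption, and cloud credential setup is omitted.
name: terraform-plan
on:
  pull_request:
    paths:
      - "infra/**"

jobs:
  plan:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: infra
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      # Use the committed .terraform.lock.hcl without modifying it.
      - run: terraform init -input=false -lockfile=readonly
      # Write the plan to a file; applying that same file later (terraform apply tfplan)
      # keeps the applied change identical to the reviewed one.
      - run: terraform plan -input=false -out=tfplan
      # Print the saved plan in human-readable form for reviewers.
      - run: terraform show -no-color tfplan
```

Saving the plan with -out and applying that exact file is the design choice that matters here. It prevents the apply step from picking up changes that landed after the review.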

2. Use Argo CD for Continuous Platform and Workload Reconciliation

After the control plane and baseline nodes are ready, Argo CD should take over in-cluster reconciliation from Git.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cluster-bootstrap
  namespace: argocd
  finalizers:
    # Cascade deletion: removing this Application also removes the resources it manages.
    - resources-finalizer.argocd.argoproj.io
spec:
  project: platform-admin
  source:
    repoURL: https://github.com/example/platform-config.git
    targetRevision: main
    path: clusters/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true    # delete resources that were removed from Git
      selfHeal: true # revert manual changes made directly in the cluster

This keeps application drift control where it belongs and avoids re-running infrastructure provisioning for routine app-level changes.

3. Treat App-of-Apps as Privileged Infrastructure

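Argo CD's documentation describes app-of-apps as an admin-only pattern, because the ability to create Applications in arbitrary projects is an admin-level capability. Anyone who can push to the parent application's repository can effectively choose which project, and therefore which permissions, a child application runs with. That makes the controls below a governance requirement, not a suggestion.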

Practical controls:

Restrict write access to parent app repositories.
Require review of the project field and destination scope in every Application change.
Define clear project boundaries for child applications.
Use sync waves where ordering dependencies exist.

Without this, app-of-apps can become an uncontrolled cross-namespace mutation channel.
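The project-boundary and sync-wave controls can be sketched as a scoped AppProject plus a child Application that references it. This is an illustration under stated assumptions: the project name team-payments, the payments-config repository, the payments namespace, and the wave number are all hypothetical.

```yaml
# Sketch only: project, repository, and namespace names below are hypothetical.
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: team-payments
  namespace: argocd
spec:
  description: Scoped project for one workload team
  # Applications in this project may only pull from this repository...
  sourceRepos:
    - https://github.com/example/payments-config.git
  # ...and may only deploy into this namespace on the local cluster.
  destinations:
    - server: https://kubernetes.default.svc
      namespace: payments
  # An empty cluster-scoped allow list denies kinds such as ClusterRole
  # or CustomResourceDefinition.
  clusterResourceWhitelist: []
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments
  namespace: argocd
  annotations:
    # Higher waves sync after lower ones; foundational apps would use a lower number.
    argocd.argoproj.io/sync-wave: "10"
spec:
  project: team-payments # the field reviewers should check in every PR
  source:
    repoURL: https://github.com/example/payments-config.git
    targetRevision: main
    path: deploy/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: payments
```

If a pull request changed project to platform-admin, the child application would no longer be confined to the payments namespace, which is why that field deserves explicit review. One caveat worth verifying for your Argo CD version: waiting on child Application health between waves may require a custom health check for the Application resource, since recent releases do not assess it by default.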

4. Keep Rollout and Recovery Paths Explicit

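This model improves recoverability only when each layer has a defined rollback path that the owning team has rehearsed:

Terraform: revert the configuration change and apply a reviewed plan for infrastructure-level failures.
Argo CD: revert the Git change and sync, or use an app-level rollback, for runtime configuration failures.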

If rollback paths are mixed, incident response slows and attribution becomes unclear.
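For the Argo CD layer, one Git-native way to express a rollback is to pin the Application to a known-good revision until the faulty change is reverted. The sketch below reuses the cluster-bootstrap Application from earlier; the tag name bootstrap-known-good is hypothetical.

```yaml
# Sketch: temporarily pin the bootstrap Application to a known-good Git tag.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cluster-bootstrap
  namespace: argocd
spec:
  project: platform-admin
  source:
    repoURL: https://github.com/example/platform-config.git
    targetRevision: bootstrap-known-good # hypothetical tag; normally main
    path: clusters/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```

Pinning through Git keeps automated sync in place, which matters because Argo CD does not allow its built-in rollback on applications with automated sync enabled. Infrastructure failures stay out of this path and are handled through a reverted and re-reviewed Terraform plan.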

5. Why This Model Holds Up Better Than Mixed Bootstrap Flows

Dimension	Mixed Ownership Bootstrap	Two-Layer Terraform + Argo CD	Operational Benefit
Source of truth	Ambiguous across tools and scripts	Clear split: infra in Terraform, runtime in Argo CD	Faster root-cause isolation
Drift control	Partial and inconsistent	Layered and scoped	Better change confidence
Access governance	Broad write permissions accumulate	Privileged app-of-apps repos controlled explicitly	Lower privilege-escalation risk
Rollback design	Ad hoc and incident-specific	Defined per layer	Shorter recovery cycles
Team collaboration	Ownership conflicts during change windows	Stable boundaries for platform and app teams	Reduced coordination overhead

What To Do Next

Document ownership boundaries before changing bootstrap code.
Enforce Terraform plan-review-apply workflow with remote state controls.
Restrict app-of-apps parent repository writes and validate project scope in PRs.
Apply sync-wave ordering for foundational components and dependent apps.

Frequently Asked Questions

Q: Can Argo CD replace Terraform for full bootstrap? Argo CD is excellent for Kubernetes resource reconciliation, but provider-backed infrastructure lifecycle and state concerns are better handled by Terraform in most environments.

Q: Is app-of-apps required for Argo CD bootstrap? No, but it is common. If adopted, it should be treated as privileged automation.

Q: Should auto-sync be enabled on everything immediately? Not always. Use automated sync intentionally with clear prune behavior, ordering strategy, and rollback expectations.
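As an illustration of enabling automated sync intentionally, the sketch below keeps self-healing on but turns automatic pruning off and bounds retries. The application name, path, and namespace are hypothetical, and the retry values are placeholders rather than recommendations.

```yaml
# Sketch of a more conservative sync policy; names and retry values are hypothetical.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ingress-controller
  namespace: argocd
spec:
  project: platform-admin
  source:
    repoURL: https://github.com/example/platform-config.git
    targetRevision: main
    path: platform/ingress
  destination:
    server: https://kubernetes.default.svc
    namespace: ingress
  syncPolicy:
    automated:
      prune: false   # resources removed from Git are reported, not deleted
      selfHeal: true # manual changes to tracked resources are still reverted
    syncOptions:
      - CreateNamespace=true # create the destination namespace if it is missing
      - PruneLast=true       # when pruning does run, do it after other resources sync
    retry:
      limit: 3
      backoff:
        duration: 10s
        factor: 2
        maxDuration: 2m
```

Omitting the automated block entirely leaves the application on manual sync, which can be a reasonable starting point for components where an unexpected deletion would be costly.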
