Bootstrapping Kubernetes Clusters with Terraform and Argo CD: A Durable Two-Layer Approach

Robust cluster bootstrap separates infrastructure provisioning from continuous reconciliation. This guide details a production-grade Terraform plus Argo CD model with explicit governance.

TL;DR

A production-ready Kubernetes bootstrap is more reliable when Terraform and Argo CD have explicit responsibilities. Terraform should provision and manage infrastructure primitives, cluster lifecycle resources, and state safety controls. Argo CD should continuously reconcile platform and workload resources from Git using declarative application definitions. This model reduces drift and clarifies incident ownership. Teams should harden Terraform workflows with plan review and state management controls, and treat Argo CD app-of-apps repositories as privileged automation surfaces with strict access and project boundaries.

[Figure: Argo CD app-of-apps architecture diagram. App-of-apps accelerates bootstrap but should be managed as privileged automation.]

Bootstrapping Fails Most Often at Ownership Boundaries, Not Tooling

Teams rarely fail because Terraform or Argo CD is incapable. They fail because lifecycle ownership is mixed: infrastructure and runtime reconciliation are managed by overlapping processes with unclear authority.

A reliable model is to enforce two layers:

  • Terraform manages infrastructure lifecycle and state-critical primitives.
  • Argo CD manages continuous reconciliation of Kubernetes resources from Git.

This aligns with Argo CD’s cluster bootstrapping guidance and reduces operational ambiguity during upgrades and incidents.

1. Use Terraform for Infrastructure Lifecycle and State Discipline

Terraform should own cluster-adjacent infrastructure and provider-backed lifecycle resources.

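terraform {
  required_version = ">= 1.6.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
  backend "s3" {
    bucket         = "platform-terraform-state-prod"
    key            = "eks/bootstrap/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true                       # server-side encryption for the state object
    dynamodb_table = "platform-terraform-locks" # hypothetical lock table name; enables state locking
  }
}

# var.private_subnet_ids, aws_iam_role.eks_cluster, and aws_iam_role.eks_nodes
# are assumed to be declared elsewhere in this configuration.
resource "aws_eks_cluster" "main" {
  name     = "platform-prod"
  role_arn = aws_iam_role.eks_cluster.arn
  version  = "1.31"

  vpc_config {
    subnet_ids = var.private_subnet_ids
  }
}

resource "aws_eks_node_group" "baseline" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "baseline"
  node_role_arn   = aws_iam_role.eks_nodes.arn
  subnet_ids      = var.private_subnet_ids

  # scaling_config is a required block for aws_eks_node_group; sizes are illustrative.
  scaling_config {
    desired_size = 3
    min_size     = 2
    max_size     = 5
  }
}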

Review a saved plan (terraform plan -out) before every apply, and commit the dependency lock file (.terraform.lock.hcl) so every run resolves the same provider versions.
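Part of the plan-review gate can be automated. The sketch below assumes the plan was saved with terraform plan -out=tfplan and rendered with terraform show -json tfplan > plan.json; it reads the resource_changes array from that JSON and flags any change whose actions include "delete", which also covers replacements. The blocking policy and the script name check_plan.py are hypothetical choices for illustration, not a built-in Terraform feature.

```python
import json
import sys


def destructive_changes(plan: dict) -> list[str]:
    """Return addresses of resources the plan would delete or replace.

    `plan` is the parsed output of `terraform show -json <planfile>`.
    Replacements appear as ["delete", "create"] or ["create", "delete"].
    """
    flagged = []
    for change in plan.get("resource_changes", []):
        actions = change.get("change", {}).get("actions", [])
        if "delete" in actions:
            flagged.append(change["address"])
    return flagged


def main(path: str) -> int:
    with open(path, encoding="utf-8") as fh:
        plan = json.load(fh)
    flagged = destructive_changes(plan)
    for address in flagged:
        print(f"destructive change requires explicit approval: {address}")
    # A non-zero exit lets a CI job hold the apply step until a reviewer signs off.
    return 1 if flagged else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1]))
```

In CI, a non-zero exit from python check_plan.py plan.json would hold the apply job until someone explicitly approves the destructive changes; a human still reviews the full plan output.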

2. Use Argo CD for Continuous Platform and Workload Reconciliation

After the control plane and baseline nodes are ready, Argo CD should take over in-cluster reconciliation from Git.

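apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cluster-bootstrap
  namespace: argocd
  # Deleting this Application also deletes the resources it manages (cascading delete).
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  # The project limits which repositories and destinations this Application may use.
  project: platform-admin
  source:
    repoURL: https://github.com/example/platform-config.git
    targetRevision: main
    path: clusters/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true     # delete cluster resources that were removed from Git
      selfHeal: true  # revert manual changes made directly in the cluster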

This keeps drift control for in-cluster resources in Argo CD, where it belongs, so routine application changes never require a Terraform run.
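To make prune and selfHeal concrete, here is a deliberately simplified Python model of one reconciliation pass. Desired state (from Git) and live state (from the cluster) are dictionaries keyed by a hypothetical resource identifier such as "Deployment/web". This is not Argo CD's implementation, which also normalizes manifests, honors ignoreDifferences, and orders resources; it only shows which resources a sync would apply and which it would prune.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncPlan:
    to_apply: frozenset[str]  # missing from the cluster, or differing from Git
    to_prune: frozenset[str]  # live in the cluster but no longer declared in Git


def plan_sync(desired: dict[str, dict], live: dict[str, dict], prune: bool) -> SyncPlan:
    """Compare desired (Git) and live (cluster) state for one Application.

    Simplified model: specs are compared with plain equality.
    """
    to_apply = frozenset(key for key, spec in desired.items() if live.get(key) != spec)
    orphans = frozenset(key for key in live if key not in desired)
    # With prune disabled, orphans stay in the cluster and are only reported as out of sync.
    return SyncPlan(to_apply=to_apply, to_prune=orphans if prune else frozenset())
```

In this model, selfHeal decides when drift in to_apply is acted on: with selfHeal: true a manual change in the cluster is reverted automatically, while without it the Application only reports OutOfSync until the next sync.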

3. Treat App-of-Apps as Privileged Infrastructure

Argo CD documentation frames app-of-apps as an admin-level capability. That is a governance requirement, not a suggestion.

Practical controls:

  • restrict write access to parent app repositories
  • enforce review gates on each child Application's project field and destination scope
  • define clear project boundaries for child applications
  • use sync waves where ordering dependencies exist

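Without these controls, anyone who can merge to the parent repository can define child Applications in any project, turning app-of-apps into an uncontrolled cross-namespace mutation channel.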
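Argo CD enforces scope at runtime through the AppProject resource (its sourceRepos and destinations fields), but a pre-merge check surfaces violations during review instead of at sync time. The sketch below assumes child Application manifests have already been parsed into dictionaries (for example with a YAML library) and uses a hypothetical allow-list mapping each project to its permitted (server, namespace) pairs.

```python
def scope_violations(app: dict, allowed: dict[str, set[tuple[str, str]]]) -> list[str]:
    """Return human-readable scope violations for one child Application manifest."""
    name = app.get("metadata", {}).get("name", "<unnamed>")
    spec = app.get("spec", {})
    project = spec.get("project")
    if project not in allowed:
        return [f"{name}: project {project!r} is not on the allow-list"]
    destination = spec.get("destination", {})
    # This sketch only handles destination.server; Argo CD also accepts destination.name.
    target = (destination.get("server"), destination.get("namespace"))
    if target not in allowed[project]:
        return [f"{name}: destination {target} is outside the scope of project {project!r}"]
    return []


# Hypothetical allow-list: project name -> permitted (cluster server, namespace) pairs.
ALLOWED = {
    "platform-admin": {("https://kubernetes.default.svc", "argocd")},
    "payments": {("https://kubernetes.default.svc", "payments")},
}
```

Running this over every changed Application manifest in a pull request, and failing the check when any list is non-empty, keeps the review gate on project and destination scope mechanical rather than reliant on reviewers spotting a one-line change.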

4. Keep Rollout and Recovery Paths Explicit

This model improves recoverability only when rollback and ownership boundaries are practiced:

  • Terraform rollback path for infrastructure-level failures
  • Argo CD sync and app-level rollback path for runtime config failures

If rollback paths are mixed, incident response slows and attribution becomes unclear.

5. Why This Model Holds Up Better Than Mixed Bootstrap Flows

Dimension | Mixed Ownership Bootstrap | Two-Layer Terraform + Argo CD | Operational Benefit
Source of truth | Ambiguous across tools and scripts | Clear split: infra in Terraform, runtime in Argo CD | Faster root-cause isolation
Drift control | Partial and inconsistent | Layered and scoped | Better change confidence
Access governance | Broad write permissions accumulate | Privileged app-of-apps repos controlled explicitly | Lower privilege-escalation risk
Rollback design | Ad hoc and incident-specific | Defined per layer | Shorter recovery cycles
Team collaboration | Ownership conflicts during change windows | Stable boundaries for platform and app teams | Reduced coordination overhead

What To Do Next

  1. Document ownership boundaries before changing bootstrap code.
  2. Enforce a Terraform plan-review-apply workflow with remote state controls.
  3. Restrict app-of-apps parent repository writes and validate project scope in PRs.
  4. Apply sync-wave ordering for foundational components and dependent apps.
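For the sync-wave step in this list: Argo CD orders resources within a sync using the argocd.argoproj.io/sync-wave annotation. Waves are integers (default 0, negatives allowed), lower waves are applied first, and Argo CD waits for a wave to become healthy before starting the next. The Python sketch below models only the grouping step, which is useful for reviewing ordering in a pull request; it ignores sync phases and the kind-based ordering Argo CD applies within a wave, and the component names in any input are hypothetical.

```python
from collections import defaultdict

WAVE_ANNOTATION = "argocd.argoproj.io/sync-wave"


def sync_waves(resources: list[dict]) -> list[list[str]]:
    """Group manifests by sync wave, lowest wave first.

    Simplified model: ignores sync phases (PreSync/Sync/PostSync) and
    Argo CD's kind-based ordering inside a wave; sorts by name only.
    """
    waves: dict[int, list[str]] = defaultdict(list)
    for res in resources:
        annotations = res.get("metadata", {}).get("annotations", {})
        # Annotation values are strings; a missing annotation means wave 0.
        wave = int(annotations.get(WAVE_ANNOTATION, "0"))
        waves[wave].append(f'{res["kind"]}/{res["metadata"]["name"]}')
    return [sorted(waves[w]) for w in sorted(waves)]
```

One caveat for app-of-apps: since v1.8, Argo CD no longer assesses the health of child Application resources by default, so waves on child Applications only wait for their health if a custom health check for argoproj.io/Application is configured.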

Frequently Asked Questions

Q: Can Argo CD replace Terraform for full bootstrap? Argo CD is excellent for Kubernetes resource reconciliation, but provider-backed infrastructure lifecycle and state concerns are better handled by Terraform in most environments.

Q: Is app-of-apps required for Argo CD bootstrap? No, but it is common. If adopted, it should be treated as privileged automation.

Q: Should auto-sync be enabled on everything immediately? Not always. Use automated sync intentionally with clear prune behavior, ordering strategy, and rollback expectations.
