A Modern Terraform Reference Architecture for Amazon EKS

Most EKS failures are not Kubernetes failures. They are boundary failures between Terraform state, VPC capacity, node provisioning, and workload identity. This guide lays out a production-ready reference architecture that keeps those seams explicit.

TL;DR

A modern Terraform reference architecture for Amazon EKS should separate network, cluster, and add-on state; reserve private subnet capacity for control-plane ENIs and pods; keep a small stable baseline of managed nodes; use Karpenter for bursty or heterogeneous workloads; and choose workload identity deliberately instead of treating IRSA and EKS Pod Identity as interchangeable. The goal is not just to create a cluster, but to make upgrades, add-on lifecycle, IAM boundaries, and node replacement predictable. If you design those boundaries early, EKS gets much easier to operate.

Figure: Network, cluster, add-on, and compute boundaries in a modern Terraform reference architecture for Amazon EKS. A production EKS architecture works better when Terraform state, networking, compute, identity, and add-on ownership are treated as explicit seams.

The Real Design Problem Is Not Creating EKS. It Is Owning the Seams.

You can stand up Amazon EKS with Terraform in a weekend. The harder part is keeping the platform understandable six months later when you are upgrading Kubernetes, rotating AMIs, adding a storage driver, or trying to explain why a workload can reach S3 from one namespace but not another.

A modern reference architecture has to make those boundaries explicit:

  • Terraform state boundaries
  • VPC and subnet capacity boundaries
  • compute ownership boundaries between managed node groups and Karpenter
  • identity boundaries between node roles, IRSA, and EKS Pod Identity
  • add-on ownership boundaries between Amazon EKS and your own day-2 tooling

If those seams are vague, EKS feels random. If those seams are designed up front, the platform behaves like an engineered system.

Three AWS details should shape the design immediately:

  • An EKS cluster must be created with at least two subnets in different Availability Zones, and EKS creates 2-4 control-plane ENIs in the subnets you provide. Those subnets need spare IP capacity, not just enough room for nodes. AWS docs
  • Cluster subnets need at least six available IPs each, and AWS recommends at least sixteen. That recommendation matters during upgrades, when control-plane ENIs are replaced. AWS docs
  • The S3 backend supports native lockfiles with use_lockfile = true, while DynamoDB locking is now documented as deprecated for that backend. Bucket versioning is strongly recommended for state recovery. HashiCorp docs

That is the baseline for the architecture below.
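The subnet numbers above can be turned into a guardrail rather than a review comment. The following is a minimal sketch, assuming `var.cluster_subnet_ids` is the same list passed to the cluster's `vpc_config` and that the Terraform version supports `check` blocks (1.5 or later). The threshold of sixteen mirrors the AWS recommendation; the check name is illustrative.

```hcl
# Look up each subnet handed to the EKS control plane.
data "aws_subnet" "cluster" {
  for_each = toset(var.cluster_subnet_ids)
  id       = each.value
}

# Hypothetical check name; failures surface as warnings during plan and apply.
check "cluster_subnet_headroom" {
  assert {
    condition = length(distinct([
      for s in data.aws_subnet.cluster : s.availability_zone
    ])) >= 2
    error_message = "EKS cluster subnets must span at least two Availability Zones."
  }

  assert {
    condition = alltrue([
      for s in data.aws_subnet.cluster : s.available_ip_address_count >= 16
    ])
    error_message = "Each cluster subnet should keep at least 16 free IPs for control-plane ENIs."
  }
}
```

Because a failed `check` reports a warning instead of blocking the run, it works as an early signal before an upgrade rather than as a hard gate.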

1. Start With State Boundaries, Not Cluster Resources

Terraform state is part of the architecture. If you collapse VPCs, clusters, add-ons, and application extras into one state file, every routine change gets a wider blast radius than it needs.

The cleaner split for EKS platforms is usually:

| State | Owns | Why it should stay separate |
|---|---|---|
| network | VPC, subnets, route tables, NAT, endpoints | Long-lived and shared by more than one cluster or stack |
| cluster | EKS control plane, access config, encryption, baseline node groups | Cluster lifecycle changes are high-impact and deserve narrow plans |
| addons | EKS managed add-ons, workload IAM roles, Pod Identity associations | Changes often, but should not re-plan the VPC or cluster role chain |
| platform-services | ingress, external-dns, cert-manager, GitOps bootstrap | Often shifts toward Helm or GitOps ownership later |

Use S3 remote state with versioning and lockfiles. If you operate in multiple AWS accounts, use an administrative execution role pattern and keep state keys environment-specific rather than stuffing everything into workspaces and hoping naming discipline will save you.

terraform {
  # use_lockfile on the S3 backend was introduced in Terraform 1.10.
  required_version = ">= 1.10.0"

  backend "s3" {
    bucket       = "company-terraform-state"
    key          = "prod/eu-west-1/eks-platform/cluster.tfstate"
    region       = "eu-west-1"
    use_lockfile = true
  }

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 6.36"
    }
  }
}

provider "aws" {
  region = var.region

  assume_role {
    role_arn = var.terraform_execution_role_arn
  }

  default_tags {
    tags = {
      Environment = var.environment
      Platform    = "eks"
      ManagedBy   = "terraform"
    }
  }
}

Why this holds up better:

  • network changes stay rare and deliberate.
  • cluster plans stay readable during Kubernetes upgrades.
  • addons can move faster without re-planning the whole platform.
  • platform-services can later hand off to GitOps without a state migration crisis.

This is one of the biggest differences between a demo and a reference architecture.
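Splitting state only works if downstream stacks can read what upstream stacks publish. One common way to do that is the `terraform_remote_state` data source. The sketch below assumes the network stack stores its state under a key that mirrors the cluster key shown earlier and exposes an output named `private_subnet_ids`; both names are illustrative, not taken from a real configuration.

```hcl
# Read-only view of the network stack's outputs from the cluster stack.
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "company-terraform-state"
    key    = "prod/eu-west-1/eks-platform/network.tfstate" # assumed key
    region = "eu-west-1"
  }
}

locals {
  # Assumes the network stack declares: output "private_subnet_ids" { ... }
  cluster_subnet_ids = data.terraform_remote_state.network.outputs.private_subnet_ids
}
```

The cluster stack then depends on a small, explicit output contract instead of on the internals of the network configuration.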

2. Design the VPC for Control-Plane ENIs and Pod Density Before You Design Nodes

Most EKS networking mistakes come from thinking only about worker nodes. AWS is explicit that the cluster subnets are also where the control plane creates ENIs, and those ENIs are replaced during cluster version updates. That is why tiny, overused subnets create upgrade pain before you ever hit an application outage. AWS docs

The baseline I would use for a production EKS VPC is:

  • private subnets for nodes and control-plane ENIs
  • tightly scoped public endpoint CIDRs, or private-only API access if your operator path supports it
  • dedicated endpoint strategy for private clusters
  • conscious pod-density tuning on the VPC CNI instead of accepting default IPv4 exhaustion behavior

For EC2-backed clusters, Amazon EKS add-ons are the clean default for networking components. AWS now recommends EKS add-ons over self-managed versions because updates and centralized configuration are simpler through AWS APIs. AWS docs

resource "aws_eks_cluster" "this" {
  name     = var.cluster_name
  role_arn = aws_iam_role.eks_cluster.arn
  version  = var.kubernetes_version

  access_config {
    authentication_mode = "API_AND_CONFIG_MAP"
  }

  enabled_cluster_log_types = [
    "api",
    "audit",
    "authenticator",
    "controllerManager",
    "scheduler",
  ]

  encryption_config {
    resources = ["secrets"]

    provider {
      key_arn = aws_kms_key.eks_secrets.arn
    }
  }

  kubernetes_network_config {
    service_ipv4_cidr = "172.20.0.0/16"
  }

  vpc_config {
    subnet_ids              = var.cluster_subnet_ids
    endpoint_private_access = true
    endpoint_public_access  = true
    public_access_cidrs     = var.admin_cidrs
  }
}

A few practical rules matter more than they look:

  • If you put managed node groups in public subnets, MapPublicIpOnLaunch must be enabled or nodes will fail to join. AWS docs
  • If you keep nodes in private subnets, they still need image and API reachability. That usually means NAT plus selected endpoints, or a more complete PrivateLink design. AWS docs
  • If you are tight on IPv4, enable prefix delegation on the VPC CNI and design for contiguous /28 availability in the subnet. AWS notes that prefix attachment is typically faster than attaching a new ENI and is the main lever for higher pod density on EC2 nodes. AWS docs

The trap is easy to spot in hindsight: teams scale nodes to fix pending Pods when the real bottleneck is subnet address economics.
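Prefix delegation is a setting on the VPC CNI, so with the managed add-on it can live in Terraform next to the add-on itself. The sketch below assumes the `vpc-cni` add-on accepts an `env` map in its configuration schema and that `WARM_PREFIX_TARGET = "1"` suits your workloads; both should be verified against the add-on version you run. Each delegated /28 prefix carries 16 addresses, which is why contiguous /28 space in the subnet matters.

```hcl
locals {
  # JSON passed to the vpc-cni managed add-on. Values are strings because
  # they become environment variables on the aws-node DaemonSet.
  vpc_cni_configuration = jsonencode({
    env = {
      ENABLE_PREFIX_DELEGATION = "true"
      WARM_PREFIX_TARGET       = "1" # assumption: keep one spare /28 per node
    }
  })
}

# Wiring, shown as a comment to avoid redefining the add-on resource:
#   resource "aws_eks_addon" "vpc_cni" {
#     ...
#     configuration_values = local.vpc_cni_configuration
#   }
```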

3. Keep a Stable Baseline With Managed Node Groups, Then Add Karpenter Where It Actually Helps

The cleanest modern EKS compute pattern is not "managed node groups or Karpenter." It is "managed node groups first, Karpenter where the workload shape justifies it."

Managed node groups are still the right default for:

  • system-critical baseline capacity
  • ingress and DNS paths you do not want tied to burst autoscaling
  • clusters that need predictable rolling update semantics
  • launch-template-heavy customization

EKS managed node groups automatically drain nodes during updates and terminations, support node auto repair, and let you keep the EC2 mechanics inside your AWS account without manually wiring ASGs. AWS docs

resource "aws_eks_node_group" "system" {
  cluster_name    = aws_eks_cluster.this.name
  node_group_name = "${var.cluster_name}-system"
  node_role_arn   = aws_iam_role.node.arn
  subnet_ids      = var.private_node_subnet_ids
  capacity_type   = "ON_DEMAND"
  instance_types  = ["m7i.large"]
  ami_type        = "AL2023_x86_64_STANDARD"

  scaling_config {
    min_size     = 2
    desired_size = 3
    max_size     = 6
  }

  update_config {
    max_unavailable_percentage = 25
  }

  node_repair_config {
    enabled = true
  }

  labels = {
    "node-role" = "system"
  }

  taint {
    key    = "CriticalAddonsOnly"
    value  = "true"
    effect = "NO_SCHEDULE"
  }
}

Karpenter is strongest when capacity is variable, instance diversity matters, or Pod scheduling constraints are too awkward to model with a pile of node groups. AWS says it best: Karpenter fits workloads with changing capacity needs, while managed node groups and ASGs remain a good fit for more static, consistent workloads. AWS docs

That architecture choice has two important implications:

  • keep at least one small non-Karpenter baseline pool so Karpenter itself is not responsible for hosting its own controller
  • let Karpenter handle bursty, specialized, or Spot-heavy workloads rather than replacing every node group on day one
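Keeping the Karpenter controller on the baseline pool means it has to select the system node group and tolerate its taint. The following is a hedged sketch using the Helm provider's `helm_release` resource. It assumes the Helm provider is configured for this cluster, that `var.karpenter_chart_version` is a variable you define, and that the chart exposes `settings.clusterName`, `nodeSelector`, and `tolerations` values; check those names against the chart version you deploy. Controller IAM and interruption handling are omitted.

```hcl
resource "helm_release" "karpenter" {
  name       = "karpenter"
  namespace  = "kube-system"
  repository = "oci://public.ecr.aws/karpenter"
  chart      = "karpenter"
  version    = var.karpenter_chart_version # hypothetical variable; pin explicitly

  values = [yamlencode({
    settings = {
      clusterName = aws_eks_cluster.this.name
    }

    # Schedule the controller onto the managed "system" node group,
    # using the label set on aws_eks_node_group.system.
    nodeSelector = {
      "node-role" = "system"
    }

    # Tolerate the CriticalAddonsOnly taint applied to that node group.
    tolerations = [{
      key      = "CriticalAddonsOnly"
      operator = "Exists"
    }]
  })]
}
```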

Karpenter’s modern APIs also change how you think about compute. NodePool defines the scheduling envelope and disruption policy; EC2NodeClass defines AWS launch details such as subnet selectors, security-group selectors, role, and AMI selection. Karpenter docs

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: apps
spec:
  role: eks-node-role-prod
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: prod-eks
        tier: private
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: prod-eks
  amiSelectorTerms:
    - alias: al2023@v20240807
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: apps
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: apps
      expireAfter: 336h
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 5m
    budgets:
      - nodes: "10%"

Two trade-offs here are easy to miss:

  • Pin the AMI in production. Karpenter explicitly warns against @latest in production because a new AMI can drift nodes into an untested image. Karpenter docs
  • Treat expireAfter as an upper bound, not a promise. Karpenter can still replace nodes earlier due to consolidation or drift if budgets allow it. Karpenter docs

That is why managed baseline plus Karpenter burst capacity is such a durable design: you get fast Pod-driven scaling without making every system component depend on Karpenter’s disruption logic.

4. EKS Pod Identity and IRSA Solve Similar Problems, but They Change Different Parts of the System

Teams often talk about IRSA and EKS Pod Identity as if one is just a newer spelling of the other. They are not the same operational model.

IRSA

  • requires a cluster-specific OIDC provider
  • uses AssumeRoleWithWebIdentity
  • stays useful where OIDC-based service account federation is already part of the pattern
  • creates extra setup friction in private environments because the cluster OIDC issuer is outside the VPC endpoint path

EKS Pod Identity

  • does not require an IAM OIDC provider
  • uses a simpler trust principal, pods.eks.amazonaws.com
  • relies on the EKS Pod Identity Agent running on nodes
  • is easier to reuse across clusters because the role trust is not tied to a cluster-specific OIDC issuer

AWS explicitly recommends Pod Identity as the preferred way to grant IAM permissions to EKS add-ons and positions it as the simpler method for workload credentials on supported EC2-backed EKS clusters. AWS docs

The catch is that support boundaries still matter:

  • Pod Identity requires the Pod Identity Agent. AWS docs
  • The agent needs node-role permission for eks-auth:AssumeRoleForPodIdentity. AWS docs
  • In private clusters, nodes need the eks-auth PrivateLink endpoint for Pod Identity to work. AWS docs
  • Pod Identity is not available on every compute type. Fargate and Windows pods are not supported. AWS docs
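Two of those requirements translate directly into Terraform. The sketch below assumes a private cluster: `var.vpc_id` and `var.endpoint_security_group_id` are hypothetical inputs, while `aws_iam_role.node`, `var.region`, `var.cluster_name`, and `var.private_node_subnet_ids` are the names used elsewhere in this article. If your node role already attaches the AWS managed worker node policy, confirm whether it grants the action before adding an inline policy.

```hcl
# Let the Pod Identity Agent on each node exchange tokens for credentials.
data "aws_iam_policy_document" "pod_identity_agent" {
  statement {
    effect    = "Allow"
    actions   = ["eks-auth:AssumeRoleForPodIdentity"]
    resources = ["*"]
  }
}

resource "aws_iam_role_policy" "node_pod_identity" {
  name   = "${var.cluster_name}-pod-identity-agent"
  role   = aws_iam_role.node.name
  policy = data.aws_iam_policy_document.pod_identity_agent.json
}

# Interface endpoint so nodes without internet egress can reach eks-auth.
resource "aws_vpc_endpoint" "eks_auth" {
  vpc_id              = var.vpc_id # hypothetical input
  service_name        = "com.amazonaws.${var.region}.eks-auth"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_node_subnet_ids
  security_group_ids  = [var.endpoint_security_group_id] # hypothetical input
  private_dns_enabled = true
}
```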

That means the reference architecture decision is usually:

  • default to Pod Identity for new EC2-backed workloads and supported add-ons
  • keep IRSA for controllers or environments where Pod Identity is not the right fit yet

data "aws_iam_policy_document" "pod_identity_trust" {
  statement {
    effect = "Allow"

    principals {
      type        = "Service"
      identifiers = ["pods.eks.amazonaws.com"]
    }

    actions = [
      "sts:AssumeRole",
      "sts:TagSession",
    ]
  }
}

resource "aws_iam_role" "external_dns" {
  name               = "${var.cluster_name}-external-dns"
  assume_role_policy = data.aws_iam_policy_document.pod_identity_trust.json
}

resource "aws_eks_pod_identity_association" "external_dns" {
  cluster_name    = aws_eks_cluster.this.name
  namespace       = "dns"
  service_account = "external-dns"
  role_arn        = aws_iam_role.external_dns.arn
}
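For contrast, an IRSA trust policy for the same service account looks like the sketch below. It is illustrative only: the resource names are hypothetical, and it assumes an AWS provider version in which `thumbprint_list` is optional on the OIDC provider resource. The trust is bound to this cluster's issuer URL, which is what makes the role harder to reuse across clusters.

```hcl
# Cluster-specific OIDC provider that IRSA depends on.
resource "aws_iam_openid_connect_provider" "cluster" {
  url            = aws_eks_cluster.this.identity[0].oidc[0].issuer
  client_id_list = ["sts.amazonaws.com"]
}

locals {
  # Condition keys use the issuer host and path without the scheme.
  oidc_issuer = replace(aws_eks_cluster.this.identity[0].oidc[0].issuer, "https://", "")
}

data "aws_iam_policy_document" "irsa_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.cluster.arn]
    }

    # Restrict the role to one namespace and service account.
    condition {
      test     = "StringEquals"
      variable = "${local.oidc_issuer}:sub"
      values   = ["system:serviceaccount:dns:external-dns"]
    }

    condition {
      test     = "StringEquals"
      variable = "${local.oidc_issuer}:aud"
      values   = ["sts.amazonaws.com"]
    }
  }
}
```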

The most important architectural payoff is not fewer lines of IAM. It is cleaner separation of responsibilities between the platform team managing EKS associations and the IAM team managing reusable role trust and policies.

5. Prefer Managed Add-ons for the Baseline, and Make IAM Explicit for the Ones That Need It

If your cluster baseline still installs core components as random Helm releases because "that is how we always did it," you are leaving operational clarity on the table.

AWS recommends EKS add-ons over self-managed networking add-ons, and the same general logic applies to baseline platform components that AWS actively manages through the EKS add-on API. AWS docs

For a conventional EC2-backed baseline, I would usually manage these in Terraform:

  • vpc-cni
  • coredns
  • kube-proxy
  • eks-pod-identity-agent
  • aws-ebs-csi-driver

Whether every one of those belongs in your exact baseline depends on your storage and compute model, but the general principle is stable: if the add-on is foundational, versioned by AWS, and tied to cluster compatibility, treat it as infrastructure.

AWS also notes that some add-ons need IAM permissions, and add-ons can now manage Pod Identity associations through the add-on APIs themselves. If both IRSA and Pod Identity settings are supplied and the Pod Identity Agent is installed, EKS prefers Pod Identity and ignores the IRSA role for that add-on. AWS docs

resource "aws_eks_addon" "pod_identity_agent" {
  cluster_name = aws_eks_cluster.this.name
  addon_name   = "eks-pod-identity-agent"
}

resource "aws_eks_addon" "vpc_cni" {
  cluster_name                = aws_eks_cluster.this.name
  addon_name                  = "vpc-cni"
  resolve_conflicts_on_create = "OVERWRITE"
  resolve_conflicts_on_update = "PRESERVE"

  pod_identity_association {
    service_account = "aws-node"
    role_arn        = aws_iam_role.vpc_cni.arn
  }

  depends_on = [aws_eks_addon.pod_identity_agent]
}

resource "aws_eks_addon" "coredns" {
  cluster_name                = aws_eks_cluster.this.name
  addon_name                  = "coredns"
  resolve_conflicts_on_create = "OVERWRITE"
  resolve_conflicts_on_update = "PRESERVE"
}

This is the operational trade-off:

  • managed add-ons reduce upgrade ambiguity and centralize compatibility
  • self-managed installs can still be justified when you need a configuration surface or release cadence that AWS does not expose yet

That is a real trade-off, not a purity test. But for the reference architecture, managed first is the cleaner default.
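One way to make the compatibility link visible in a plan is to resolve add-on versions from the cluster's Kubernetes version instead of leaving them implicit. This is a minimal sketch using the `aws_eks_addon_version` data source; the output name is illustrative, and whether to track the default or the most recent build is a policy choice for your team.

```hcl
# Resolve the CoreDNS add-on version that EKS considers the default
# for this cluster's Kubernetes version.
data "aws_eks_addon_version" "coredns" {
  addon_name         = "coredns"
  kubernetes_version = aws_eks_cluster.this.version
  # most_recent = true would select the newest compatible build instead.
}

output "coredns_default_addon_version" {
  value = data.aws_eks_addon_version.coredns.version
}

# Wiring, shown as a comment to avoid redefining the add-on resource:
#   resource "aws_eks_addon" "coredns" {
#     ...
#     addon_version = data.aws_eks_addon_version.coredns.version
#   }
```

With the version resolved this way, a Kubernetes upgrade shows the add-on version change in the same plan review instead of surfacing later as drift.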

Why This Architecture Holds Up Better in Production

The point of a reference architecture is not that it is the only correct design. The point is that it remains legible under stress.

This one does because each major concern has a clear owner:

| Concern | Default owner | Why |
|---|---|---|
| VPC shape, subnet math, endpoints | Terraform network state | Slow-changing and shared |
| Cluster version, access mode, encryption, logging | Terraform cluster state | Needs explicit plans and review |
| Stable system capacity | managed node groups | Predictable lifecycle and repair |
| Bursty or mixed instance capacity | Karpenter | Pod-driven placement and diversity |
| Workload AWS credentials | Pod Identity first, IRSA where needed | Least privilege without overloading node roles |
| Baseline compatibility add-ons | Terraform addons state | Tied to cluster lifecycle, not app deploys |

That separation is what keeps common EKS incidents boring:

  • cluster upgrades do not become subnet surprises
  • add-on updates do not become IAM archaeology
  • AMI rotation does not become hidden drift
  • scaling does not collapse into "just add another node group"

The architecture is modern because it respects what the platform has become: EKS is no longer only a Kubernetes API endpoint. It is a set of interacting control planes for networking, identity, storage, and compute, and Terraform is useful only when it makes those interactions easier to reason about.

Frequently Asked Questions

Q: What is the smallest production-safe compute pattern for EKS?
A: A small On-Demand managed node group for system-critical components plus Karpenter for bursty application capacity is a strong default. It gives you a stable place for baseline add-ons and a safe rollback path if Karpenter policy needs adjustment.

Q: Should I use only private endpoints for the Kubernetes API?
A: Use private-only access if your operator path, CI runners, and emergency access model are already inside the VPC or connected network. If not, keep private access enabled and restrict public access to known admin CIDRs until you can remove the public endpoint safely.

Q: Is EKS Pod Identity a full replacement for IRSA?
A: Not universally. Pod Identity is simpler for many EC2-backed workloads and is the AWS-recommended path for add-on IAM, but IRSA still matters for support gaps and existing OIDC-based patterns. The right choice depends on workload support and network constraints, not on fashion.

Q: When is Karpenter the wrong first move?
A: Karpenter is the wrong first move when your real problem is weak subnet design, poor resource requests, or add-on sprawl. It improves node provisioning, but it does not rescue a cluster with bad IP economics, unclear ownership, or missing workload constraints.

Q: What should I review before every EKS minor upgrade?
A: Review subnet headroom, add-on compatibility, node AMI strategy, Pod Identity or IRSA dependencies, and disruption policies for both managed node groups and Karpenter. Most upgrade pain comes from those dependencies, not from the version bump itself.
