A Modern Terraform Reference Architecture for Amazon EKS
Most EKS failures are not Kubernetes failures. They are boundary failures between Terraform state, VPC capacity, node provisioning, and workload identity. This guide lays out a production-ready reference architecture that keeps those seams explicit.
TL;DR
A modern Terraform reference architecture for Amazon EKS should separate network, cluster, and add-on state; reserve private subnet capacity for control-plane ENIs and pods; keep a small stable baseline of managed nodes; use Karpenter for bursty or heterogeneous workloads; and choose workload identity deliberately instead of treating IRSA and EKS Pod Identity as interchangeable. The goal is not just to create a cluster, but to make upgrades, add-on lifecycle, IAM boundaries, and node replacement predictable. If you design those boundaries early, EKS gets much easier to operate.
The Real Design Problem Is Not Creating EKS. It Is Owning the Seams.
You can stand up Amazon EKS with Terraform in a weekend. The harder part is keeping the platform understandable six months later when you are upgrading Kubernetes, rotating AMIs, adding a storage driver, or trying to explain why a workload can reach S3 from one namespace but not another.
A modern reference architecture has to make those boundaries explicit:
- Terraform state boundaries
- VPC and subnet capacity boundaries
- compute ownership boundaries between managed node groups and Karpenter
- identity boundaries between node roles, IRSA, and EKS Pod Identity
- add-on ownership boundaries between Amazon EKS and your own day-2 tooling
If those seams are vague, EKS feels random. If those seams are designed up front, the platform behaves like an engineered system.
Three AWS details should shape the design immediately:
- An EKS cluster must be created with at least two subnets in different Availability Zones, and EKS creates 2-4 control-plane ENIs in the subnets you provide. Those subnets need spare IP capacity, not just enough room for nodes. AWS docs
- Cluster subnets need at least six available IPs each, and AWS recommends at least sixteen. That recommendation matters during upgrades, when control-plane ENIs are replaced. AWS docs
- The S3 backend supports native lockfiles with `use_lockfile = true`, while DynamoDB locking is now documented as deprecated for that backend. Bucket versioning is strongly recommended for state recovery. HashiCorp docs
That is the baseline for the architecture below.
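As a small illustration of the third point, the state bucket itself might be declared roughly as follows. This is a minimal sketch that assumes the bucket lives in a separate bootstrap configuration; the resource label `state` and the choice of KMS encryption are illustrative assumptions, not part of any existing setup.

```hcl
# Bootstrap configuration (assumed): owns only the Terraform state bucket.
resource "aws_s3_bucket" "state" {
  bucket = "company-terraform-state"
}

# Versioning keeps earlier state objects so a bad write can be rolled back.
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# State can contain sensitive values, so encrypt it at rest (assumed: KMS).
resource "aws_s3_bucket_server_side_encryption_configuration" "state" {
  bucket = aws_s3_bucket.state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

# Block every form of public access to the state bucket.
resource "aws_s3_bucket_public_access_block" "state" {
  bucket = aws_s3_bucket.state.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```

Keeping this in its own small configuration avoids the circular problem of a stack trying to manage the bucket that stores its own state.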
1. Start With State Boundaries, Not Cluster Resources
Terraform state is part of the architecture. If you collapse VPCs, clusters, add-ons, and application extras into one state file, every routine change gets a wider blast radius than it needs.
The cleaner split for EKS platforms is usually:
| State | Owns | Why it should stay separate |
|---|---|---|
| `network` | VPC, subnets, route tables, NAT, endpoints | Long-lived and shared by more than one cluster or stack |
| `cluster` | EKS control plane, access config, encryption, baseline node groups | Cluster lifecycle changes are high-impact and deserve narrow plans |
| `addons` | EKS managed add-ons, workload IAM roles, Pod Identity associations | Changes often, but should not re-plan the VPC or cluster role chain |
| `platform-services` | ingress, external-dns, cert-manager, GitOps bootstrap | Often shifts toward Helm or GitOps ownership later |
Use S3 remote state with versioning and lockfiles. If you operate in multiple AWS accounts, use an administrative execution role pattern and keep state keys environment-specific rather than stuffing everything into workspaces and hoping naming discipline will save you.
```hcl
terraform {
  # use_lockfile (S3-native state locking) needs Terraform 1.10 or later.
  required_version = ">= 1.10.0"

  backend "s3" {
    bucket       = "company-terraform-state"
    key          = "prod/eu-west-1/eks-platform/cluster.tfstate"
    region       = "eu-west-1"
    use_lockfile = true
  }

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 6.36"
    }
  }
}

provider "aws" {
  region = var.region

  # Terraform runs as an administrative execution role in the target account.
  assume_role {
    role_arn = var.terraform_execution_role_arn
  }

  default_tags {
    tags = {
      Environment = var.environment
      Platform    = "eks"
      ManagedBy   = "terraform"
    }
  }
}
```
Why this holds up better:
- `network` changes stay rare and deliberate.
- `cluster` plans stay readable during Kubernetes upgrades.
- `addons` can move faster without re-planning the whole platform.
- `platform-services` can later hand off to GitOps without a state migration crisis.
This is one of the biggest differences between a demo and a reference architecture.
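Splitting state means the cluster configuration has to learn subnet IDs from somewhere. One common option is the `terraform_remote_state` data source, sketched below. The state key and both output names are assumptions for illustration; the network configuration would need to export outputs with matching names.

```hcl
# In the cluster configuration: read outputs published by the network state.
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "company-terraform-state"
    key    = "prod/eu-west-1/eks-platform/network.tfstate" # assumed key
    region = "eu-west-1"
  }
}

locals {
  # Hypothetical output names; they must exist in the network configuration.
  cluster_subnet_ids      = data.terraform_remote_state.network.outputs.cluster_subnet_ids
  private_node_subnet_ids = data.terraform_remote_state.network.outputs.private_node_subnet_ids
}
```

The `var.cluster_subnet_ids` input used later in this guide could equally be supplied by a pipeline that passes values between stacks. The trade-off is that `terraform_remote_state` needs read access to the whole upstream state object, while explicit variables keep the coupling narrower but add plumbing.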
2. Design the VPC for Control-Plane ENIs and Pod Density Before You Design Nodes
Most EKS networking mistakes come from thinking only about worker nodes. AWS is explicit that the cluster subnets are also where the control plane creates ENIs, and those ENIs are replaced during cluster version updates. That is why tiny, overused subnets create upgrade pain before you ever hit an application outage. AWS docs
The baseline I would use for a production EKS VPC is:
- private subnets for nodes and control-plane ENIs
- tightly scoped public endpoint CIDRs, or private-only API access if your operator path supports it
- dedicated endpoint strategy for private clusters
- conscious pod-density tuning on the VPC CNI instead of accepting default IPv4 exhaustion behavior
For EC2-backed clusters, Amazon EKS add-ons are the clean default for networking components. AWS now recommends EKS add-ons over self-managed versions because updates and centralized configuration are simpler through AWS APIs. AWS docs
```hcl
resource "aws_eks_cluster" "this" {
  name     = var.cluster_name
  role_arn = aws_iam_role.eks_cluster.arn
  version  = var.kubernetes_version

  access_config {
    authentication_mode = "API_AND_CONFIG_MAP"
  }

  enabled_cluster_log_types = [
    "api",
    "audit",
    "authenticator",
    "controllerManager",
    "scheduler",
  ]

  encryption_config {
    resources = ["secrets"]

    provider {
      key_arn = aws_kms_key.eks_secrets.arn
    }
  }

  kubernetes_network_config {
    service_ipv4_cidr = "172.20.0.0/16"
  }

  vpc_config {
    subnet_ids              = var.cluster_subnet_ids
    endpoint_private_access = true
    endpoint_public_access  = true
    public_access_cidrs     = var.admin_cidrs
  }
}
```
A few practical rules matter more than they look:
- If you put managed node groups in public subnets, `MapPublicIpOnLaunch` must be enabled or nodes will fail to join. AWS docs
- If you keep nodes in private subnets, they still need image and API reachability. That usually means NAT plus selected endpoints, or a more complete PrivateLink design. AWS docs
- If you are tight on IPv4, enable prefix delegation on the VPC CNI and design for contiguous `/28` availability in the subnet. AWS notes that prefix attachment is typically faster than attaching a new ENI and is the main lever for higher pod density on EC2 nodes. AWS docs
The trap is easy to spot in hindsight: teams scale nodes to fix pending Pods when the real bottleneck is subnet address economics.
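For the prefix delegation rule, a minimal sketch of how the setting might be passed to the VPC CNI when it is installed as an EKS managed add-on. It assumes Nitro-based instance types, and the warm-prefix value of `1` is an illustrative starting point rather than a tuned recommendation. In a real configuration these values would be merged into the single `vpc-cni` add-on resource, not declared as a second one.

```hcl
resource "aws_eks_addon" "vpc_cni" {
  cluster_name = aws_eks_cluster.this.name
  addon_name   = "vpc-cni"

  # Passed through to the aws-node DaemonSet as environment variables.
  configuration_values = jsonencode({
    env = {
      # Assign /28 prefixes to ENIs instead of individual secondary IPs.
      ENABLE_PREFIX_DELEGATION = "true"
      # Keep one spare prefix attached per node (assumed value).
      WARM_PREFIX_TARGET = "1"
    }
  })
}
```

Each prefix reserves sixteen addresses at once, so a fragmented subnet can fail to allocate a prefix even when the raw free-IP count looks healthy.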
3. Keep a Stable Baseline With Managed Node Groups, Then Add Karpenter Where It Actually Helps
The cleanest modern EKS compute pattern is not "managed node groups or Karpenter." It is "managed node groups first, Karpenter where the workload shape justifies it."
Managed node groups are still the right default for:
- system-critical baseline capacity
- ingress and DNS paths you do not want tied to burst autoscaling
- clusters that need predictable rolling update semantics
- launch-template-heavy customization
AWS-managed node groups automatically drain nodes during updates and terminations, support node auto repair, and let you keep the EC2 mechanics inside your AWS account without manually wiring ASGs. AWS docs
```hcl
resource "aws_eks_node_group" "system" {
  cluster_name    = aws_eks_cluster.this.name
  node_group_name = "${var.cluster_name}-system"
  node_role_arn   = aws_iam_role.node.arn
  subnet_ids      = var.private_node_subnet_ids
  capacity_type   = "ON_DEMAND"
  instance_types  = ["m7i.large"]
  ami_type        = "AL2023_x86_64_STANDARD"

  scaling_config {
    min_size     = 2
    desired_size = 3
    max_size     = 6
  }

  update_config {
    max_unavailable_percentage = 25
  }

  node_repair_config {
    enabled = true
  }

  labels = {
    "node-role" = "system"
  }

  taint {
    key    = "CriticalAddonsOnly"
    value  = "true"
    effect = "NO_SCHEDULE"
  }
}
```
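The node group references `aws_iam_role.node` without showing it. A minimal sketch of that role follows, assuming the commercial `aws` partition and assuming the VPC CNI gets its own role through Pod Identity, which is why `AmazonEKS_CNI_Policy` is deliberately left off. The role name suffix is hypothetical.

```hcl
data "aws_iam_policy_document" "node_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "node" {
  name               = "${var.cluster_name}-node" # hypothetical naming
  assume_role_policy = data.aws_iam_policy_document.node_trust.json
}

# Keep the node role to what the kubelet and image pulls need.
# Workload and add-on permissions belong on separate roles.
resource "aws_iam_role_policy_attachment" "node" {
  for_each = toset([
    "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
  ])

  role       = aws_iam_role.node.name
  policy_arn = each.value
}
```

A narrow node role is what makes per-workload identity meaningful: anything attached here is available to every Pod that can reach the instance metadata service.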
Karpenter is strongest when capacity is variable, instance diversity matters, or Pod scheduling constraints are too awkward to model with a pile of node groups. AWS says it best: Karpenter fits workloads with changing capacity needs, while managed node groups and ASGs remain a good fit for more static, consistent workloads. AWS docs
That architecture choice has two important implications:
- keep at least one small non-Karpenter baseline pool so Karpenter itself is not responsible for hosting its own controller
- let Karpenter handle bursty, specialized, or Spot-heavy workloads rather than replacing every node group on day one
Karpenter’s modern APIs also change how you think about compute. `NodePool` defines the scheduling envelope and disruption policy; `EC2NodeClass` defines AWS launch details such as subnet selectors, security-group selectors, role, and AMI selection. Karpenter docs
```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: apps
spec:
  role: eks-node-role-prod
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: prod-eks
        tier: private
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: prod-eks
  amiSelectorTerms:
    - alias: al2023@v20240807
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: apps
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: apps
      expireAfter: 336h
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 5m
    budgets:
      - nodes: "10%"
```
Two trade-offs here are easy to miss:
- Pin the AMI in production. Karpenter explicitly warns against `@latest` in production because a new AMI can drift nodes into an untested image. Karpenter docs
- Treat `expireAfter` as an upper bound, not a promise. Karpenter can still replace nodes earlier due to consolidation or drift if budgets allow it. Karpenter docs
That is why managed baseline plus Karpenter burst capacity is such a durable design: you get fast Pod-driven scaling without making every system component depend on Karpenter’s disruption logic.
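To make the "Karpenter does not host its own controller" rule concrete, here is a hedged sketch of installing the controller with the Terraform Helm provider and pinning it to the managed system pool. It assumes the Helm provider is already configured against the cluster, that the controller's IAM role is handled elsewhere, and that installing into `kube-system` suits your setup. `var.karpenter_chart_version` is a hypothetical variable.

```hcl
resource "helm_release" "karpenter" {
  name       = "karpenter"
  namespace  = "kube-system" # assumed namespace
  repository = "oci://public.ecr.aws/karpenter"
  chart      = "karpenter"
  version    = var.karpenter_chart_version # hypothetical; pin explicitly

  values = [yamlencode({
    settings = {
      clusterName = var.cluster_name
    }

    # Schedule the controller only onto the managed "system" node group,
    # matching the label and taint defined on that group.
    nodeSelector = {
      "node-role" = "system"
    }

    tolerations = [{
      key      = "CriticalAddonsOnly"
      operator = "Exists"
    }]
  })]
}
```

With this placement, a misconfigured `NodePool` or an aggressive disruption budget cannot evict the component responsible for provisioning replacement capacity.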
4. EKS Pod Identity and IRSA Solve Similar Problems, but They Change Different Parts of the System
Teams often talk about IRSA and EKS Pod Identity as if one is just a newer spelling of the other. They are not the same operational model.
IRSA
- requires a cluster-specific OIDC provider
- uses `AssumeRoleWithWebIdentity`
- stays useful where OIDC-based service account federation is already part of the pattern
- creates extra setup friction in private environments because the cluster OIDC issuer is outside the VPC endpoint path
EKS Pod Identity
- does not require an IAM OIDC provider
- uses a simpler trust principal, `pods.eks.amazonaws.com`
- relies on the EKS Pod Identity Agent running on nodes
- is easier to reuse across clusters because the role trust is not tied to a cluster-specific OIDC issuer
AWS explicitly recommends Pod Identity as the preferred way to grant IAM permissions to EKS add-ons and positions it as the simpler method for workload credentials on supported EC2-backed EKS clusters. AWS docs
The catch is that support boundaries still matter:
- Pod Identity requires the Pod Identity Agent. AWS docs
- The agent needs node-role permission for `eks-auth:AssumeRoleForPodIdentity`. AWS docs
- In private clusters, nodes need the `eks-auth` PrivateLink endpoint for Pod Identity to work. AWS docs
- Pod Identity is not available for all compute combinations, including Fargate and Windows pods. AWS docs
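For the private-cluster case in that list, the interface endpoint could be declared along these lines. This is a sketch only: `var.vpc_id` and `var.endpoint_security_group_id` are hypothetical inputs, and the security group is assumed to allow HTTPS (443) from the node subnets.

```hcl
# Interface endpoint so the Pod Identity Agent can reach the EKS Auth API
# without leaving the VPC.
resource "aws_vpc_endpoint" "eks_auth" {
  vpc_id              = var.vpc_id # hypothetical input
  service_name        = "com.amazonaws.${var.region}.eks-auth"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_node_subnet_ids
  security_group_ids  = [var.endpoint_security_group_id] # hypothetical input
  private_dns_enabled = true
}
```

Because it is VPC-scoped, this endpoint fits most naturally in the network state alongside the other endpoints.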
That means the reference architecture decision is usually:
- default to Pod Identity for new EC2-backed workloads and supported add-ons
- keep IRSA for controllers or environments where Pod Identity is not the right fit yet
```hcl
data "aws_iam_policy_document" "pod_identity_trust" {
  statement {
    effect = "Allow"

    principals {
      type        = "Service"
      identifiers = ["pods.eks.amazonaws.com"]
    }

    actions = [
      "sts:AssumeRole",
      "sts:TagSession",
    ]
  }
}

resource "aws_iam_role" "external_dns" {
  name               = "${var.cluster_name}-external-dns"
  assume_role_policy = data.aws_iam_policy_document.pod_identity_trust.json
}

resource "aws_eks_pod_identity_association" "external_dns" {
  cluster_name    = aws_eks_cluster.this.name
  namespace       = "dns"
  service_account = "external-dns"
  role_arn        = aws_iam_role.external_dns.arn
}
```
The most important architectural payoff is not fewer lines of IAM. It is cleaner separation of responsibilities between the platform team managing EKS associations and the IAM team managing reusable role trust and policies.
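For contrast, an IRSA trust for the same `external-dns` service account might look like the sketch below. It assumes an AWS provider version in which `thumbprint_list` is optional on the OIDC provider resource; the resource labels and the `oidc_issuer` local are illustrative names.

```hcl
# IRSA needs an IAM OIDC provider registered for this specific cluster.
resource "aws_iam_openid_connect_provider" "cluster" {
  url            = aws_eks_cluster.this.identity[0].oidc[0].issuer
  client_id_list = ["sts.amazonaws.com"]
}

locals {
  # Condition keys use the issuer URL without the scheme.
  oidc_issuer = replace(aws_eks_cluster.this.identity[0].oidc[0].issuer, "https://", "")
}

data "aws_iam_policy_document" "irsa_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.cluster.arn]
    }

    # Restrict the role to one namespace and service account.
    condition {
      test     = "StringEquals"
      variable = "${local.oidc_issuer}:sub"
      values   = ["system:serviceaccount:dns:external-dns"]
    }

    condition {
      test     = "StringEquals"
      variable = "${local.oidc_issuer}:aud"
      values   = ["sts.amazonaws.com"]
    }
  }
}
```

Both the provider ARN and the condition keys embed the cluster's issuer, so this trust policy has to be regenerated per cluster. The Pod Identity trust has no such dependency, because the namespace and service account binding lives in the EKS association instead of the IAM trust document.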
5. Prefer Managed Add-ons for the Baseline, and Make IAM Explicit for the Ones That Need It
If your cluster baseline still installs core components as random Helm releases because "that is how we always did it," you are leaving operational clarity on the table.
AWS recommends EKS add-ons over self-managed networking add-ons, and the same general logic applies to baseline platform components that AWS actively manages through the EKS add-on API. AWS docs
For a conventional EC2-backed baseline, I would usually manage these in Terraform:
- `vpc-cni`
- `coredns`
- `kube-proxy`
- `eks-pod-identity-agent`
- `aws-ebs-csi-driver`
Whether every one of those belongs in your exact baseline depends on your storage and compute model, but the general principle is stable: if the add-on is foundational, versioned by AWS, and tied to cluster compatibility, treat it as infrastructure.
AWS also notes that some add-ons need IAM permissions, and add-ons can now manage Pod Identity associations through the add-on APIs themselves. If both IRSA and Pod Identity settings are supplied and the Pod Identity Agent is installed, EKS prefers Pod Identity and ignores the IRSA role for that add-on. AWS docs
```hcl
resource "aws_eks_addon" "pod_identity_agent" {
  cluster_name = aws_eks_cluster.this.name
  addon_name   = "eks-pod-identity-agent"
}

resource "aws_eks_addon" "vpc_cni" {
  cluster_name                = aws_eks_cluster.this.name
  addon_name                  = "vpc-cni"
  resolve_conflicts_on_create = "OVERWRITE"
  resolve_conflicts_on_update = "PRESERVE"

  pod_identity_association {
    service_account = "aws-node"
    role_arn        = aws_iam_role.vpc_cni.arn
  }

  depends_on = [aws_eks_addon.pod_identity_agent]
}

resource "aws_eks_addon" "coredns" {
  cluster_name                = aws_eks_cluster.this.name
  addon_name                  = "coredns"
  resolve_conflicts_on_create = "OVERWRITE"
  resolve_conflicts_on_update = "PRESERVE"
}
```
This is the operational trade-off:
- managed add-ons reduce upgrade ambiguity and centralize compatibility
- self-managed installs can still be justified when you need a configuration surface or release cadence that AWS does not expose yet
That is a real trade-off, not a purity test. But for the reference architecture, managed first is the cleaner default.
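The same pattern extends to add-ons that need their own IAM role. Below is a sketch for the EBS CSI driver that reuses the `pod_identity_trust` policy document from the Pod Identity section and resolves a version compatible with the cluster. The role name suffix and resource labels are illustrative, and the managed policy ARN assumes the commercial `aws` partition.

```hcl
# Look up an add-on version that is compatible with the cluster version.
data "aws_eks_addon_version" "ebs_csi" {
  addon_name         = "aws-ebs-csi-driver"
  kubernetes_version = aws_eks_cluster.this.version
  most_recent        = true
}

resource "aws_iam_role" "ebs_csi" {
  name               = "${var.cluster_name}-ebs-csi" # hypothetical naming
  assume_role_policy = data.aws_iam_policy_document.pod_identity_trust.json
}

resource "aws_iam_role_policy_attachment" "ebs_csi" {
  role       = aws_iam_role.ebs_csi.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy"
}

resource "aws_eks_addon" "ebs_csi" {
  cluster_name  = aws_eks_cluster.this.name
  addon_name    = "aws-ebs-csi-driver"
  addon_version = data.aws_eks_addon_version.ebs_csi.version

  # The add-on API creates and owns the Pod Identity association.
  pod_identity_association {
    service_account = "ebs-csi-controller-sa"
    role_arn        = aws_iam_role.ebs_csi.arn
  }

  depends_on = [aws_eks_addon.pod_identity_agent]
}
```

Note that `most_recent = true` lets the resolved version move between plans. If add-on upgrades should only happen through reviewed changes, replacing the data source with a literal version string is the more conservative choice.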
Why This Architecture Holds Up Better in Production
The point of a reference architecture is not that it is the only correct design. The point is that it remains legible under stress.
This one does because each major concern has a clear owner:
| Concern | Default owner | Why |
|---|---|---|
| VPC shape, subnet math, endpoints | Terraform network state | Slow-changing and shared |
| Cluster version, access mode, encryption, logging | Terraform cluster state | Needs explicit plans and review |
| Stable system capacity | managed node groups | Predictable lifecycle and repair |
| Bursty or mixed instance capacity | Karpenter | Pod-driven placement and diversity |
| Workload AWS credentials | Pod Identity first, IRSA where needed | Least privilege without overloading node roles |
| Baseline compatibility add-ons | Terraform addons state | Tied to cluster lifecycle, not app deploys |
That separation is what keeps common EKS incidents boring:
- cluster upgrades do not become subnet surprises
- add-on updates do not become IAM archaeology
- AMI rotation does not become hidden drift
- scaling does not collapse into "just add another node group"
The architecture is modern because it respects what the platform has become: EKS is no longer only a Kubernetes API endpoint. It is a set of interacting control planes for networking, identity, storage, and compute, and Terraform is useful only when it makes those interactions easier to reason about.
Frequently Asked Questions
Q: What is the smallest production-safe compute pattern for EKS? A: A small On-Demand managed node group for system-critical components plus Karpenter for bursty application capacity is a strong default. It gives you a stable place for baseline add-ons and a safe rollback path if Karpenter policy needs adjustment.
Q: Should I use only private endpoints for the Kubernetes API? A: Use private-only access if your operator path, CI runners, and emergency access model are already inside the VPC or connected network. If not, keep private access enabled and restrict public access to known admin CIDRs until you can remove the public endpoint safely.
Q: Is EKS Pod Identity a full replacement for IRSA? A: Not universally. Pod Identity is simpler for many EC2-backed workloads and is the AWS-recommended path for add-on IAM, but IRSA still matters for support gaps and existing OIDC-based patterns. The right choice depends on workload support and network constraints, not on fashion.
Q: When is Karpenter the wrong first move? A: Karpenter is the wrong first move when your real problem is weak subnet design, poor resource requests, or add-on sprawl. It improves node provisioning, but it does not rescue a cluster with bad IP economics, unclear ownership, or missing workload constraints.
Q: What should I review before every EKS minor upgrade? A: Review subnet headroom, add-on compatibility, node AMI strategy, Pod Identity or IRSA dependencies, and disruption policies for both managed node groups and Karpenter. Most upgrade pain comes from those dependencies, not from the version bump itself.
Resources
- Terraform S3 backend
- Terraform remote state backends
- Amazon EKS networking requirements
- Amazon EKS managed node groups
- Amazon EKS add-ons
- IAM roles for Amazon EKS add-ons
- EKS Pod Identity
- IAM roles for service accounts
- Amazon EKS Pod Identity Agent
- Amazon EKS prefix mode for Linux
- Amazon EKS best practices for Karpenter
- Karpenter NodePools
- Karpenter EC2NodeClasses
- Karpenter Managing AMIs