Terraform Security Best Practices for AWS: IAM Is Only One Layer

Terraform security on AWS is not just a least-privilege IAM exercise. This guide shows how to harden runner identities, trust policies, state backends, validation gates, and provider dependencies so infrastructure changes fail safely instead of failing live.

TL;DR

Terraform security on AWS breaks in four places: the identity that runs Terraform, the policies and trust relationships that govern what it can assume, the state and plan artifacts it writes, and the provider or module dependencies it downloads. AWS and HashiCorp documentation point to a safer pattern: use temporary credentials and scoped roles, cap delegated access with permissions boundaries and organization guardrails, store state remotely with locking and recovery controls, keep secrets out of plans when possible, and constrain and lock providers while pinning module versions, so that every change arrives through reviewable code. IAM matters, but it is not the whole attack surface.

[Figure: Terraform runner identity, IAM boundaries, remote state protection, and provider lock controls. Caption: Terraform security is layered: execution identity, trust and policy boundaries, protected state, and reproducible provider dependencies.]

Terraform Security Usually Breaks Long Before an IAM Policy Looks Obviously Wrong

The easy version of Terraform security advice is "use least privilege." The harder and more useful version is this: a Terraform run is a supply chain with credentials. If the runner can assume too much, if the trust policy is loose, if the state bucket is readable, or if a provider upgrade lands without review, you can get a clean terraform apply and still have a poor security posture.

AWS and HashiCorp documentation point to a layered model that is more reliable in production. The effective permissions of a Terraform run can be shaped by identity-based policies, permissions boundaries, service control policies, session policies, and resource-based policies. Terraform then adds its own surfaces: provider authentication, state, plan files, dependency selection, and module sourcing.

Key control surfaces:

Four places where Terraform security usually fails: runner identity, AWS trust and policy evaluation, state and plan artifacts, and provider or module dependency selection.
Three AWS permission layers commonly involved in execution: identity policies, permissions boundaries, and SCPs. Session policies and resource-based policies can further change the final result.

The practical question is not "did we write an IAM policy?" It is "what is the smallest blast radius for the thing that runs Terraform, and how do we prove it before apply?"

1. Secure the Execution Identity First

If the Terraform runner starts with the wrong kind of credentials, every later control is already weaker than it should be.

AWS recommends temporary credentials over long-term access keys, and HashiCorp's AWS provider guidance says provider credentials in configuration are not recommended because configuration is routinely shared through version control. In practice, that leads to a better default:

human operators authenticate through federation and receive temporary credentials
CI or automation assumes a purpose-built role instead of carrying static keys
cross-account runners use explicit trust policies instead of ambient admin credentials

For multi-account AWS, HashiCorp's AssumeRole tutorial remains a clean pattern because it keeps the execution identity separate from the target account. If Terraform is run by a third party or a shared SaaS runner, AWS also recommends requiring an ExternalId in the trust policy. This guards against the confused deputy problem, where a shared service is tricked into assuming the role on behalf of the wrong tenant. The caller must then pass the same ExternalId value when it assumes the role.

terraform {
  required_version = "~> 1.13.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.region

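  assume_role {
    role_arn     = var.terraform_execution_role_arn
    session_name = "terraform-ci"
    # Must match the sts:ExternalId condition in the role's trust policy,
    # otherwise the AssumeRole call is denied.
    external_id  = var.external_id
  }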
}

data "aws_iam_policy_document" "terraform_trust" {
  statement {
    sid     = "AllowTerraformRunner"
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "AWS"
      identifiers = [var.ci_principal_arn]
    }

    condition {
      test     = "StringEquals"
      variable = "sts:ExternalId"
      values   = [var.external_id]
    }
  }
}

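# Typically created from a separate bootstrap configuration, since the runner
# cannot assume this role before it exists.
resource "aws_iam_role" "terraform_execution" {
  name                 = "terraform-execution"
  assume_role_policy   = data.aws_iam_policy_document.terraform_trust.json
  # The boundary policy is assumed to be defined elsewhere in the configuration.
  permissions_boundary = aws_iam_policy.terraform_boundary.arn
}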

The permissions boundary is the part many teams skip. AWS is explicit that a permissions boundary sets the maximum permissions identity-based policies can grant; it does not grant permissions on its own. That makes it valuable when you want developers or platform automation to create roles without letting them silently mint administrator-level access inside an account.
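The role definition references aws_iam_policy.terraform_boundary without showing it. A minimal sketch of what such a boundary could look like follows, loosely modeled on the delegation pattern in the AWS permissions boundary documentation. The policy name, the service list in the ceiling statement, and the hardcoded aws partition are illustrative assumptions, not a recommended production policy.

```hcl
# Sketch only: names, the service list, and the "aws" partition are assumptions.
data "aws_caller_identity" "current" {}

locals {
  # Built from the policy name rather than the resource to avoid a dependency cycle.
  boundary_arn = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:policy/terraform-boundary"
}

data "aws_iam_policy_document" "terraform_boundary" {
  # Ceiling: identity policies can never grant more than this.
  statement {
    sid       = "ServiceCeiling"
    effect    = "Allow"
    actions   = ["ec2:*", "s3:*", "logs:*", "iam:Get*", "iam:List*"]
    resources = ["*"]
  }

  # Role creation and policy attachment are allowed only for roles
  # that carry this same boundary.
  statement {
    sid       = "RolesMustCarryBoundary"
    effect    = "Allow"
    actions   = ["iam:CreateRole", "iam:AttachRolePolicy", "iam:PutRolePolicy", "iam:PutRolePermissionsBoundary"]
    resources = ["*"]

    condition {
      test     = "StringEquals"
      variable = "iam:PermissionsBoundary"
      values   = [local.boundary_arn]
    }
  }

  # The boundary policy itself cannot be edited.
  statement {
    sid       = "NoBoundaryPolicyEdit"
    effect    = "Deny"
    actions   = ["iam:CreatePolicyVersion", "iam:DeletePolicy", "iam:DeletePolicyVersion", "iam:SetDefaultPolicyVersion"]
    resources = [local.boundary_arn]
  }

  # The boundary cannot be stripped from a role once attached.
  statement {
    sid       = "NoBoundaryRemoval"
    effect    = "Deny"
    actions   = ["iam:DeleteRolePermissionsBoundary"]
    resources = ["*"]
  }
}

resource "aws_iam_policy" "terraform_boundary" {
  name   = "terraform-boundary"
  policy = data.aws_iam_policy_document.terraform_boundary.json
}
```

Because the boundary grants nothing by itself, the execution role still needs an identity policy. Its effective permissions are the intersection of that policy and the boundary.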

Trade-off: permissions boundaries improve delegation safety, but they also make troubleshooting more complex because the effective result is an intersection of multiple policy types. That is the right trade for production accounts. The wrong trade is letting a Terraform runner operate with broad credentials because debugging boundary math feels inconvenient.

One more nuance from the AWS docs matters here: a permissions boundary is not the entire access model. You still need to review resource-based policies, trust policies, session policies, and organization guardrails. A role that looks constrained from the identity side can still be part of a larger access path if the surrounding policies are sloppy.

2. Treat Terraform State and Plan Artifacts as Secret Material

Terraform state is not a harmless implementation detail. HashiCorp documents that marking a variable as sensitive redacts it from normal CLI output, but Terraform still records sensitive values in state. HashiCorp also documents that terraform output -json, terraform show -json, and local state files can reveal sensitive values in plain text.

That means state deserves the same handling you would give a production credential store:

keep state remote instead of leaving it on developer laptops
encrypt it at rest
lock it for writes
restrict who can read it
plan for recovery when someone deletes or corrupts it

The S3 backend docs are unusually direct on this point. HashiCorp recommends enabling bucket versioning for recovery, documents use_lockfile = true for S3 state locking, and warns against hardcoding sensitive backend credentials or passing them casually through backend config because those values can end up in the .terraform directory and plan files.

terraform {
  backend "s3" {
    bucket       = "org-terraform-state-prod"
    key          = "network/eu-west-1/terraform.tfstate"
    region       = "eu-west-1"
    encrypt      = true
    use_lockfile = true
  }
}

variable "db_password" {
  type        = string
  description = "Injected at runtime; omitted from plan and state."
  sensitive   = true
  ephemeral   = true
}

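The ephemeral = true line deserves a closer look. HashiCorp documents that ephemeral values are omitted from state and plan files entirely, while sensitive = true only redacts display. If you are on Terraform 1.10 or later, ephemeral values are a better fit for runtime-only data that should never persist after the run. The constraint is that Terraform only accepts ephemeral values in contexts that do not persist them, such as provider configuration or write-only resource arguments, so an ephemeral variable cannot simply be assigned to an ordinary resource argument.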
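One way an ephemeral variable can reach a resource is through a write-only argument, which Terraform sends to the provider but does not store. The sketch below assumes Terraform 1.11 or later and an AWS provider release that exposes password_wo on aws_db_instance (check the provider changelog for the minimum version). The resource name and sizing values are hypothetical.

```hcl
# Assumes Terraform >= 1.11 and an AWS provider version with write-only arguments.
# var.db_password is the ephemeral variable declared in the example above.
resource "aws_db_instance" "app" {
  identifier        = "app-db" # hypothetical name
  engine            = "postgres"
  instance_class    = "db.t4g.micro"
  allocated_storage = 20
  storage_encrypted = true
  username          = "app_admin"

  # Write-only: sent to AWS during apply, not written to plan or state.
  password_wo = var.db_password

  # Terraform cannot diff a value it never stores, so a rotation is
  # signalled by incrementing this version number.
  password_wo_version = 1
}
```

The design trade-off is that Terraform can no longer detect drift in the password itself. Rotation becomes an explicit, reviewable change to password_wo_version.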

Trade-off: remote state is safer than local state, but the backend becomes a privileged system in its own right. Anyone who can read the backend often has the ability to recover secrets, inspect topology, and understand how your environment is wired. Treat access to the state bucket, lock files, audit logs, and backup copies as production access, not as harmless tooling access.
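Treating the backend as a privileged system can start with the bucket itself. The following is a minimal sketch of a hardened state bucket, intended for a separate bootstrap configuration. The bucket name reuses the one from the backend example, and var.state_kms_key_arn is a hypothetical input.

```hcl
variable "state_kms_key_arn" {
  type        = string
  description = "Hypothetical: ARN of the KMS key used to encrypt state objects."
}

resource "aws_s3_bucket" "tf_state" {
  bucket = "org-terraform-state-prod"
}

# Versioning allows recovery after accidental deletion or corruption.
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = var.state_kms_key_arn
    }
  }
}

resource "aws_s3_bucket_public_access_block" "tf_state" {
  bucket                  = aws_s3_bucket.tf_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Reject any request that does not arrive over TLS.
data "aws_iam_policy_document" "tf_state_tls_only" {
  statement {
    sid       = "DenyInsecureTransport"
    effect    = "Deny"
    actions   = ["s3:*"]
    resources = [aws_s3_bucket.tf_state.arn, "${aws_s3_bucket.tf_state.arn}/*"]

    principals {
      type        = "*"
      identifiers = ["*"]
    }

    condition {
      test     = "Bool"
      variable = "aws:SecureTransport"
      values   = ["false"]
    }
  }
}

resource "aws_s3_bucket_policy" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  policy = data.aws_iam_policy_document.tf_state_tls_only.json
}
```

This covers encryption, recovery, and transport. Read access still has to be restricted separately through the IAM policies of the roles allowed to reach the bucket and the KMS key.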

3. Validate Policies and Trust Documents Before `apply`

Most teams validate Terraform syntax. Fewer validate the IAM documents Terraform is about to deploy.

AWS gives you a better path with IAM Access Analyzer policy validation. The IAM docs and AWS CLI reference show that you can validate identity policies, resource policies, and even role trust policies before attachment. That matters because some failures are not syntax errors. They are overly permissive patterns, malformed ARNs, incorrect condition keys, or the wrong STS action in a trust policy.

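A minimal CI gate can validate both the execution policy and the trust policy generated by your Terraform code. validate-policy returns its results as a list of findings in the JSON response, so the pipeline should parse that output and fail the build on the finding types you care about, for example ERROR and SECURITY_WARNING: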

terraform fmt -check
terraform validate
terraform plan -out=tfplan

aws accessanalyzer validate-policy \
  --policy-type IDENTITY_POLICY \
  --policy-document file://iam/terraform-execution-policy.json

aws accessanalyzer validate-policy \
  --policy-type RESOURCE_POLICY \
  --validate-policy-resource-type AWS::IAM::AssumeRolePolicyDocument \
  --policy-document file://iam/terraform-trust-policy.json

This is also where organizations should connect AWS-native guardrails to Terraform review. If the account sits under AWS Organizations, remember that SCPs define maximum permissions at the account or OU level. A Terraform plan can look correct in isolation and still fail at runtime because an SCP blocks the action. That is not a reason to weaken the SCP. It is a sign that your pipeline needs better pre-apply validation and a clearer mental model of which layer owns what.
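As an illustration of a guardrail that sits above any single Terraform role, here is a sketch of an SCP that blocks creation of IAM users and long-term access keys in an OU. It would be applied from the organization's management account. The policy name and var.workload_ou_id are hypothetical.

```hcl
variable "workload_ou_id" {
  type        = string
  description = "Hypothetical: ID of the OU that holds workload accounts."
}

data "aws_iam_policy_document" "deny_long_term_keys" {
  statement {
    sid       = "DenyIamUsersAndAccessKeys"
    effect    = "Deny"
    actions   = ["iam:CreateUser", "iam:CreateAccessKey"]
    resources = ["*"]
  }
}

resource "aws_organizations_policy" "deny_long_term_keys" {
  name    = "deny-long-term-keys"
  type    = "SERVICE_CONTROL_POLICY"
  content = data.aws_iam_policy_document.deny_long_term_keys.json
}

# Attaching to the OU applies the guardrail to every account beneath it.
resource "aws_organizations_policy_attachment" "workloads" {
  policy_id = aws_organizations_policy.deny_long_term_keys.id
  target_id = var.workload_ou_id
}
```

An SCP like this never grants access. A Terraform role in that OU whose identity policy allows iam:CreateAccessKey would still be denied at apply time, which is exactly the kind of runtime failure a plan alone does not reveal.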

Trade-off: validating policies adds friction to the pipeline and forces you to version generated JSON or surface it explicitly in CI. The payoff is high. You catch bad trust relationships and over-broad permissions before they become active identities in AWS, which is much cheaper than discovering them from CloudTrail after a broad role has already been used.

If you use an external OIDC identity provider for CI, AWS also recommends OIDC federation over storing long-term credentials in the application. The trust policy then becomes the real control surface. Restrict the issuer, audience, and subject claims tightly enough that only the intended workload can assume the role.
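A sketch of such a trust policy follows, assuming GitHub Actions as the OIDC provider (the choice of provider is an assumption made for illustration). The repository, the branch, and var.github_oidc_provider_arn are hypothetical. The claim keys follow GitHub's token.actions.githubusercontent.com naming.

```hcl
variable "github_oidc_provider_arn" {
  type        = string
  description = "Hypothetical: ARN of the IAM OIDC provider for token.actions.githubusercontent.com."
}

data "aws_iam_policy_document" "ci_oidc_trust" {
  statement {
    sid     = "AllowGitHubActionsMainBranch"
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    # Issuer: only tokens from this registered OIDC provider are accepted.
    principals {
      type        = "Federated"
      identifiers = [var.github_oidc_provider_arn]
    }

    # Audience: the token must have been minted for AWS STS.
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # Subject: one repository and one branch, with no wildcards.
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:example-org/infrastructure:ref:refs/heads/main"]
    }
  }
}
```

Using StringEquals on the subject claim is a deliberate choice. A StringLike pattern such as repo:example-org/* would let any repository in the organization assume the role.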

4. Reduce Supply-Chain Surprise: Pin Providers, Lock Checksums, Pin Modules

Terraform security is not only about what your code says. It is also about which binary and provider package interprets that code.

HashiCorp's provider requirements and dependency lock file docs make the model explicit:

required_providers defines the acceptable provider source and version range
.terraform.lock.hcl records the exact selected provider version and checksums
terraform providers lock can pre-populate checksums across platforms
registry modules should use the version argument if you want reviewable, intentional upgrades

That is not bureaucracy. It is your change-control boundary.

terraform {
  required_version = "~> 1.13.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.51"
    }
  }
}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.8.1"

  name = "prod-core"
  cidr = "10.40.0.0/16"
}

terraform init
terraform providers lock \
  -platform=linux_amd64 \
  -platform=windows_amd64

HashiCorp recommends version constraints for providers, and the lock file gives you checksum-backed reproducibility. If you skip the lock file, terraform init can select a newer acceptable provider than the one your colleague or CI used previously. If you skip module version pinning, a fresh init, such as the one in a clean CI workspace, can pull a newer module release without a code review that clearly shows the change.
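The version argument only applies to modules installed from a registry. For modules pulled directly from Git, the equivalent pin is a ref in the source address. The repository URL and tag below are hypothetical.

```hcl
module "vpc_internal" {
  # Git sources do not accept a version argument; the pin is the ?ref= suffix.
  source = "git::https://example.com/example-org/terraform-modules.git//vpc?ref=v1.4.2"

  name = "prod-core"
  cidr = "10.40.0.0/16"
}
```

A tag can be moved after the fact, so a full commit SHA in ref is the stricter option when the module repository is outside your control.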

Trade-off: strict pinning slows upgrades and forces deliberate maintenance. That is a feature, not a bug, in production estates. Fast-moving provider upgrades are appropriate only when you already have a review path for schema changes, deprecations, and behavioral drift.

Why We Built It This Way

The stronger operating model is to treat Terraform as a chain of trust rather than as a single binary that happens to create infrastructure.

That leads to a design with explicit boundaries:

Layer	Security objective	Concrete control
Runner identity	No standing admin keys	Federation, OIDC, or `AssumeRole` with temporary credentials
AWS authorization	Smallest possible blast radius	Least privilege, permissions boundaries, SCPs, narrow trust policies
State and plans	No casual secret exposure	Remote backend, encryption, locking, versioning, ephemeral values
Dependencies	No surprise interpreter changes	Provider constraints, committed lock file, pinned registry modules
CI review	Catch bad auth before it is live	Access Analyzer validation, `terraform validate`, reviewed plans

The reason this holds up better is simple: every layer can fail in a different way. IAM alone cannot save a leaked state file. State encryption alone cannot save an over-broad trust policy. Provider pinning alone cannot save a runner that still has long-term access keys in a repo secret. You need all of them because the attack surface is distributed.

What To Do Next

If you want the shortest path from "we have Terraform" to "we have a defensible Terraform security model," do these in order:

1. Remove long-term AWS keys from human and CI Terraform workflows where federation, OIDC, or AssumeRole is possible.
2. Create purpose-built execution roles and cap delegated role creation with permissions boundaries.
3. Move state to a remote backend you can encrypt, lock, audit, and recover with versioning.
4. Stop assuming sensitive = true is enough for secrets; use ephemeral values where the workflow supports them.
5. Add IAM Access Analyzer policy validation to CI for execution and trust policies.
6. Commit .terraform.lock.hcl, constrain providers, and pin registry modules so upgrades are intentional.

That sequence gives you meaningful risk reduction quickly without pretending a single IAM policy solved the problem.

Frequently Asked Questions

Q: Does sensitive = true keep secrets out of Terraform state? A: No. HashiCorp documents that sensitive redacts values in normal CLI output, but Terraform still stores those values in state. If you need the value omitted from state and plan files entirely, use ephemeral values where your Terraform version and workflow support them.

Q: Should my Terraform runner use OIDC or AssumeRole? A: Use whichever gives you short-lived credentials with the narrowest trust boundary for that environment. For external CI systems, AWS explicitly recommends OIDC federation over storing long-term credentials in the application, while AssumeRole remains a strong pattern for scoped cross-account access.

Q: Do permissions boundaries replace SCPs? A: No. A permissions boundary caps what identity-based policies can grant to a specific role or user in an account. An SCP defines maximum permissions at the AWS Organizations account or OU boundary. They are complementary, not interchangeable.

Q: Why do I need both provider version constraints and .terraform.lock.hcl? A: Constraints describe the allowed range, but the lock file records the exact provider package and checksums chosen for the configuration. Together they give you controlled upgrades and reproducible installs.
