AWS VPC Lattice: The Missing Service Layer Between VPC Connectivity and Application Routing

AWS VPC Lattice is easiest to misunderstand when you treat it like another load balancer. The real value is a service-layer boundary for discovery, auth, routing, and observability across VPCs, accounts, and even on-prem entry paths.

TL;DR

AWS VPC Lattice is most useful when your problem is not raw network connectivity but service-to-service access control and routing across many boundaries. It gives you a service network abstraction, per-service listeners and target groups, IAM-backed auth policies, and request-level observability without forcing every team to hand-build PrivateLink, Route 53, and load balancer patterns from scratch. The important caveat is that it does not replace your VPC underlay, and some protocol choices, especially TLS passthrough, gRPC, and health checks, carry sharp constraints that you need to design for early.

Figure: AWS VPC Lattice as a service layer above VPC connectivity and below application teams. VPC Lattice works best when you treat it as a service access layer, not as a substitute for the network underlay.

AWS VPC Lattice Is Not “Another Load Balancer”

That is the wrong mental model, and it leads to the wrong design choices.

If your platform problem is simply "connect VPC A to VPC B," then VPC peering, Transit Gateway, VPN, Direct Connect, or plain AWS PrivateLink are still the primary tools. The AWS PrivateLink concepts guide is clear that PrivateLink is about private connectivity through VPC endpoints. The Amazon VPC Lattice overview describes something different: a managed service layer for connecting, securing, and monitoring services and certain resource endpoints across VPC and account boundaries.

That distinction matters because most internal platform pain is not caused by missing packets. It is caused by missing service boundaries:

  • Who can call which internal service?
  • How do clients discover the right DNS name?
  • Where do path-based or weighted routing rules live?
  • How do you carry caller identity to the backend?
  • How do you get request-level logs without stitching three products together?

VPC Lattice answers those questions with a set of first-class constructs: service networks, services, listeners, target groups, auth policies, and access logs. Current AWS documentation also covers resource configurations and resource gateways, which expand the model beyond the earlier "internal HTTP microservices only" framing. If you learned VPC Lattice when it first launched, that is the first thing to update in your mental model. See the AWS guide to VPC Lattice resource configurations.

Three details from the docs should shape how you evaluate it:

That is why VPC Lattice is valuable. It is not just forwarding traffic. It is defining a consistent control plane for internal service access.

The Real Primitive Is the Service Network

The most important VPC Lattice object is not the service. It is the service network.

AWS defines a service network as a logical boundary for a collection of services and resource configurations. VPCs associated with that service network can reach those services and resources if access rules allow it. The VPC Lattice overview page describes the model, but architecturally the point is simple: the service network gives you an application-facing connectivity boundary that sits above raw VPC wiring.

That makes the service network a good place to model platform intent:

  • payments-core for PCI-sensitive internal services
  • shared-platform for observability, secrets brokers, and internal APIs
  • partner-access for explicitly shared cross-account services

In Terraform, the baseline shape is straightforward:

# Service network: the policy and reachability boundary.
resource "aws_vpclattice_service_network" "shared_platform" {
  name      = "shared-platform"
  auth_type = "AWS_IAM"
}

# A service that will be published into the service network.
resource "aws_vpclattice_service" "checkout" {
  name      = "checkout"
  auth_type = "AWS_IAM"
}

# Publish the service into the service network.
resource "aws_vpclattice_service_network_service_association" "checkout" {
  service_identifier         = aws_vpclattice_service.checkout.id
  service_network_identifier = aws_vpclattice_service_network.shared_platform.id
}

# Let clients in this VPC reach services in the service network.
resource "aws_vpclattice_service_network_vpc_association" "client_vpc" {
  service_network_identifier = aws_vpclattice_service_network.shared_platform.id
  vpc_identifier             = aws_vpc.client.id
  private_dns_enabled        = true
  security_group_ids         = [aws_security_group.lattice_entry.id]
}

This is the first design judgment I would make: create service networks around policy and ownership, not around teams alone. If you make one giant organization-wide service network, you have rebuilt the same broad east-west trust zone that you were probably trying to escape.
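As a rough sketch of that judgment, the three example boundaries listed earlier could each become their own service network instead of one shared one. The local map and the auth type chosen for each network are assumptions for illustration only.

```hcl
# Hypothetical mapping of trust boundary -> auth type.
# Names come from the example list earlier; the auth types are assumptions.
locals {
  service_networks = {
    "payments-core"   = "AWS_IAM"
    "shared-platform" = "AWS_IAM"
    "partner-access"  = "AWS_IAM"
  }
}

# One service network per policy boundary rather than one organization-wide network.
resource "aws_vpclattice_service_network" "boundary" {
  for_each = local.service_networks

  name      = each.key
  auth_type = each.value
}
```

Keeping the boundaries in one map makes it obvious when someone proposes adding a new trust zone, which is usually a conversation worth having explicitly.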

There is also a network nuance that is easy to miss. A direct VPC association is enough for clients inside the associated VPC. But AWS states that traffic arriving through peering, Transit Gateway, Direct Connect, or VPN reaches the service network only through a service-network endpoint. In other words, VPC Lattice does not make the underlay disappear. It composes with it. See AWS documentation on service-network associations and the AWS PrivateLink guide for service-network endpoints.
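A minimal sketch of such a service-network endpoint, assuming a recent AWS provider in which aws_vpc_endpoint accepts the ServiceNetwork endpoint type and a service_network_arn argument. The three variables are hypothetical placeholders for an ingress VPC that Transit Gateway, VPN, or Direct Connect traffic can already route into.

```hcl
# Hypothetical inputs describing the VPC where routed traffic lands.
variable "ingress_vpc_id" {
  type = string
}

variable "ingress_subnet_ids" {
  type = list(string)
}

variable "ingress_endpoint_security_group_ids" {
  type = list(string)
}

# Service-network endpoint: the entry point for traffic that does not
# originate inside a directly associated VPC.
resource "aws_vpc_endpoint" "shared_platform" {
  vpc_id              = var.ingress_vpc_id
  vpc_endpoint_type   = "ServiceNetwork"
  service_network_arn = aws_vpclattice_service_network.shared_platform.arn
  subnet_ids          = var.ingress_subnet_ids
  security_group_ids  = var.ingress_endpoint_security_group_ids
}
```

Verify the argument names against the provider version you run before relying on this shape.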

That is an important platform boundary:

  • use the network underlay to move packets between environments
  • use VPC Lattice to express which services should be callable and under what policy

Treating those as separate layers keeps the architecture understandable.

Security in VPC Lattice Is Layered, Not Singular

One of the stronger ideas in the VPC Lattice docs is that security is intentionally layered:

  1. Association to the service network or service-network endpoint establishes reachability.
  2. Security groups on the VPC association can filter which clients may talk to the service network.
  3. Target security groups can restrict inbound traffic from the VPC Lattice managed prefix list.
  4. Auth policies at the service network or service level can make request authorization identity-aware.

See the AWS guide to VPC Lattice security groups and the AWS guide to VPC Lattice auth policies.

That is much better than the usual "hope DNS names stay private and trust the subnet" model.

The security group side is especially practical. AWS publishes managed prefix lists for VPC Lattice, and the documentation explicitly recommends referencing those prefix lists in target security groups because traffic arrives from VPC Lattice, not directly from the client security group. See the AWS security group guidance for VPC Lattice.
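Here is a hedged sketch of that target-side rule. It assumes the IPv4 managed prefix list follows the com.amazonaws.<region>.vpc-lattice naming pattern and that targets listen on port 8080. The aws_region variable and the checkout_targets security group are names introduced for illustration.

```hcl
# Assumption: set this to the Region you deploy into.
variable "aws_region" {
  type    = string
  default = "us-east-1"
}

# Look up the AWS-managed prefix list that VPC Lattice traffic originates from.
data "aws_ec2_managed_prefix_list" "vpc_lattice" {
  name = "com.amazonaws.${var.aws_region}.vpc-lattice"
}

# Hypothetical security group attached to the checkout targets.
resource "aws_security_group" "checkout_targets" {
  name        = "checkout-targets"
  description = "Inbound to checkout targets only from VPC Lattice"
  vpc_id      = aws_vpc.app.id
}

# Allow the application port from the VPC Lattice prefix list only.
resource "aws_vpc_security_group_ingress_rule" "from_lattice" {
  security_group_id = aws_security_group.checkout_targets.id
  description       = "Application port from the VPC Lattice managed prefix list"
  prefix_list_id    = data.aws_ec2_managed_prefix_list.vpc_lattice.id
  ip_protocol       = "tcp"
  from_port         = 8080
  to_port           = 8080
}
```

If you also serve IPv6 clients, the IPv6 prefix list needs its own rule.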

The auth side is where many teams will either love or hate the product. When auth_type = "AWS_IAM", the authorization flow is not "one policy somewhere allows it." AWS evaluates identity-based IAM permissions plus any relevant VPC Lattice auth policies. No explicit allow means no access. Explicit deny anywhere still wins. See the AWS guide to VPC Lattice auth policies.

That makes auth policies powerful, because you can scope access by attributes such as the caller's AWS Organization, the source VPC, and the principal type.

Here is a realistic network-level policy that restricts callers to a specific AWS Organization and a specific source VPC:

resource "aws_vpclattice_auth_policy" "shared_platform" {
  resource_identifier = aws_vpclattice_service_network.shared_platform.arn

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid       = "AllowOrgFromSharedServicesVpc"
        Effect    = "Allow"
        Principal = "*"
        Action    = "vpc-lattice-svcs:Invoke"
        Resource  = "*"
        Condition = {
          StringEquals = {
            "aws:PrincipalOrgID"         = "o-123456example"
            "vpc-lattice-svcs:SourceVpc" = aws_vpc.shared_services.id
          }
          StringNotEquals = {
            "aws:PrincipalType" = "Anonymous"
          }
        }
      }
    ]
  })
}
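The auth policy above covers only the resource side of the evaluation. A sketch of the identity-based side follows, assuming the caller is an IAM role whose name is passed in through a hypothetical caller_role_name variable. It also assumes that appending a wildcard path to the service ARN is the right way to scope the Invoke permission.

```hcl
# Hypothetical: name of the IAM role that client workloads assume.
variable "caller_role_name" {
  type = string
}

# Identity-based permission to invoke the checkout service on any path.
data "aws_iam_policy_document" "invoke_checkout" {
  statement {
    sid       = "InvokeCheckout"
    effect    = "Allow"
    actions   = ["vpc-lattice-svcs:Invoke"]
    resources = ["${aws_vpclattice_service.checkout.arn}/*"]
  }
}

# Attach the permission inline to the caller role.
resource "aws_iam_role_policy" "caller_invoke_checkout" {
  name   = "invoke-checkout"
  role   = var.caller_role_name
  policy = data.aws_iam_policy_document.invoke_checkout.json
}
```

The client still has to sign its requests with SigV4 for VPC Lattice to see an authenticated principal. Unsigned requests arrive as anonymous, and the auth policy above rejects them.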

Two production implications are easy to miss:

  • Anonymous calls are possible if the policy allows them, so "private network" does not automatically mean "authenticated identity." See the AWS guide to VPC Lattice auth policies.
  • TLS passthrough changes the auth story sharply because VPC Lattice cannot inspect HTTP headers when traffic stays encrypted end to end. AWS documents that auth policies for TLS listeners are limited to anonymous principals. See the AWS documentation on VPC Lattice TLS listeners.
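To make the TLS passthrough trade-off concrete, here is a minimal sketch of a passthrough service. The ledger service name, port 8443, and the choice of auth_type = "NONE" are assumptions, and domain name configuration is deliberately left out. Check the TLS listener documentation for the requirements that apply to your setup.

```hcl
# TLS passthrough forwards encrypted bytes, so the target group speaks TCP.
resource "aws_vpclattice_target_group" "ledger_tcp" {
  name = "ledger-tcp"
  type = "IP"

  config {
    vpc_identifier  = aws_vpc.app.id
    ip_address_type = "IPV4"
    port            = 8443
    protocol        = "TCP"
  }
}

# Hypothetical service. VPC Lattice cannot read SigV4 headers inside the
# encrypted stream, so callers are treated as anonymous.
resource "aws_vpclattice_service" "ledger" {
  name      = "ledger"
  auth_type = "NONE"
}

resource "aws_vpclattice_listener" "ledger_tls" {
  name               = "tls"
  protocol           = "TLS_PASSTHROUGH"
  port               = 443
  service_identifier = aws_vpclattice_service.ledger.id

  default_action {
    forward {
      target_groups {
        target_group_identifier = aws_vpclattice_target_group.ledger_tcp.id
      }
    }
  }
}
```

Note the interaction with the network-level policy shown earlier: it denies anonymous principals, so a passthrough service like this one would need a service network whose policy admits them, with the targets doing their own authentication (for example, mutual TLS).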

There is also a strong observability benefit here. For HTTP targets, VPC Lattice injects caller identity headers such as x-amzn-lattice-identity, x-amzn-lattice-network, and x-amzn-lattice-target, and it strips spoofed versions of those headers from inbound requests. See the AWS documentation on VPC Lattice HTTP targets. That gives backends trustworthy context about who called them and from where, without inventing another side-channel.

This is the kind of security model platform teams tend to want: network controls, identity controls, and request context all in the same system.

Routing Is Powerful, but the Protocol Matrix Has Teeth

VPC Lattice listeners support HTTP, HTTPS, and TLS, while the Terraform provider exposes the TLS mode as TLS_PASSTHROUGH. See the AWS listener documentation for VPC Lattice and the Terraform aws_vpclattice_listener resource reference.

That sounds flexible, but the protocol matrix matters a lot:

That means protocol choice is not a cosmetic field. It shapes what kinds of targets, routes, and auth behavior you can use.

A clean example, with an HTTPS listener in front of HTTP targets, looks like this:

resource "aws_vpclattice_target_group" "checkout" {
  name = "checkout-ip"
  type = "IP"

  config {
    vpc_identifier    = aws_vpc.app.id
    ip_address_type   = "IPV4"
    port              = 8080
    protocol          = "HTTP"
    protocol_version  = "HTTP1"

    health_check {
      enabled                       = true
      health_check_interval_seconds = 20
      health_check_timeout_seconds  = 5
      healthy_threshold_count       = 3
      unhealthy_threshold_count     = 2
      path                          = "/healthz"
      protocol                      = "HTTP"
      protocol_version              = "HTTP1"

      matcher {
        value = "200-399"
      }
    }
  }
}

resource "aws_vpclattice_target_group_attachment" "checkout_1" {
  target_group_identifier = aws_vpclattice_target_group.checkout.id

  target {
    id   = "10.20.3.15"
    port = 8080
  }
}

resource "aws_vpclattice_listener" "checkout_https" {
  name               = "https"
  protocol           = "HTTPS"
  service_identifier = aws_vpclattice_service.checkout.id

  default_action {
    forward {
      target_groups {
        target_group_identifier = aws_vpclattice_target_group.checkout.id
      }
    }
  }
}
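The listener above only has a default action. Path-based and weighted routing live in listener rules, and the following sketch shows one possible shape. The canary target group, the /v2 prefix, and the 90/10 split are all assumptions for illustration.

```hcl
# Hypothetical second target group that receives a small share of traffic.
resource "aws_vpclattice_target_group" "checkout_canary" {
  name = "checkout-canary-ip"
  type = "IP"

  config {
    vpc_identifier   = aws_vpc.app.id
    ip_address_type  = "IPV4"
    port             = 8080
    protocol         = "HTTP"
    protocol_version = "HTTP1"
  }
}

# Requests whose path starts with /v2 are split 90/10 between the stable
# and canary target groups. Everything else falls through to the listener's
# default action.
resource "aws_vpclattice_listener_rule" "checkout_v2_canary" {
  name                = "checkout-v2-canary"
  listener_identifier = aws_vpclattice_listener.checkout_https.listener_id
  service_identifier  = aws_vpclattice_service.checkout.id
  priority            = 10

  match {
    http_match {
      path_match {
        case_sensitive = false

        match {
          prefix = "/v2"
        }
      }
    }
  }

  action {
    forward {
      target_groups {
        target_group_identifier = aws_vpclattice_target_group.checkout.id
        weight                  = 90
      }

      target_groups {
        target_group_identifier = aws_vpclattice_target_group.checkout_canary.id
        weight                  = 10
      }
    }
  }
}
```

Rules like this only apply to HTTP and HTTPS listeners, which is one reason the protocol decision has to come first.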

There is one more sharp edge that deserves much more attention than it usually gets: fail-open behavior. AWS documents that if a target group contains only unhealthy targets, VPC Lattice routes requests to all targets anyway, regardless of health status. See the AWS target group documentation for VPC Lattice.

That is not automatically wrong. It is sometimes preferable to total black-hole behavior. But you need to decide whether your application wants fail-open semantics. If the answer is no, VPC Lattice health checks are not enough by themselves. You also need application-level readiness and maybe upstream circuit breaking.

This is where VPC Lattice feels like a real platform primitive rather than a wizard-driven convenience feature. It has meaningful behavior. You need to own that behavior.

Observability Is Better Than Most Teams Realize

VPC Lattice access logs are one of the most underrated reasons to adopt it.

AWS documents support for CloudWatch Logs, Amazon S3, and Amazon Data Firehose as destinations. Delivery is best effort, but AWS states that CloudWatch Logs and Firehose delivery is usually within about 2 minutes, while S3 delivery is usually within about 6 minutes. See the AWS access log documentation for VPC Lattice.

The log schema is useful, not ornamental. It includes:

  • source and destination VPC IDs
  • target group ARN
  • request path and method
  • response code
  • total request duration
  • request-to-target and response-from-target durations
  • authenticated principal information
  • auth denial reason
  • failure reason
  • request ID correlation fields, described in the AWS access log reference for VPC Lattice

The request-tracking story is also solid. AWS says VPC Lattice automatically generates an x-amzn-requestid header if the client does not provide one, then propagates it to targets, includes it in the response, and writes it to access logs. Clients can also set their own value, up to 512 bytes. See the AWS request-tracking guidance for VPC Lattice access logs.

That gives you a clean cross-layer trail: client, service layer, target, logs.

Turning logs on in Terraform is minimal:

resource "aws_cloudwatch_log_group" "vpclattice" {
  name              = "/aws/vpclattice/shared-platform"
  retention_in_days = 30
}

resource "aws_vpclattice_access_log_subscription" "shared_platform" {
  resource_identifier = aws_vpclattice_service_network.shared_platform.id
  destination_arn     = aws_cloudwatch_log_group.vpclattice.arn
}
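Subscriptions can also be attached to an individual service and pointed at a different destination. A short sketch, assuming a hypothetical S3 bucket whose name is supplied through a variable:

```hcl
# Hypothetical bucket name for longer-term log retention.
variable "lattice_log_bucket_name" {
  type = string
}

resource "aws_s3_bucket" "lattice_logs" {
  bucket = var.lattice_log_bucket_name
}

# Per-service subscription: only checkout's access logs go to this bucket.
resource "aws_vpclattice_access_log_subscription" "checkout_s3" {
  resource_identifier = aws_vpclattice_service.checkout.id
  destination_arn     = aws_s3_bucket.lattice_logs.arn
}
```

One reasonable split is CloudWatch Logs at the service-network level for fast triage and S3 per service for cheaper retention, but that is a design preference rather than anything AWS prescribes.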

For internal platform services, that is often the observability gap teams have been patching with NGINX logs, ALB logs, app middleware, and custom tracing headers. VPC Lattice does not replace distributed tracing, but it does give you a trustworthy service-access layer that was previously missing.

Where VPC Lattice Fits, and Where It Does Not

The cleanest way to avoid overusing VPC Lattice is to compare it by responsibility.

Tool | Best at | Not the same as VPC Lattice
VPC peering / Transit Gateway | Network connectivity between environments | No service-level discovery, request auth policy, or per-service routing
AWS PrivateLink | Private endpoint transport between consumers and services or resources | Does not by itself give you service networks, listener rules, or service-level auth policies
Amazon ECS Service Connect | Simpler service connectivity inside ECS namespaces and services | My architectural read from the ECS docs is that it is narrower in scope than VPC Lattice and focused on ECS-native service connectivity
VPC Lattice | Service access layer across VPCs, accounts, and selected resource endpoints | Not a replacement for the network underlay

That table includes one important inference from the docs. AWS PrivateLink and VPC Lattice are not competitors in a clean one-versus-one sense. AWS documentation shows that VPC Lattice uses service-network endpoints powered by PrivateLink when you need access from outside the local VPC path. See the AWS documentation on service-network associations and the AWS PrivateLink concepts guide.

So the practical question is not "PrivateLink or VPC Lattice?" It is usually:

  • Do I only need a private endpoint to a specific service or resource?
  • Or do I need a reusable service-access fabric with routing, auth, and logs?

That is also why the newer resource configuration model matters. AWS now documents that consumers can access supported resources directly through resource endpoints or via a service network, and resource providers can use resource gateways as ingress points into the VPC where the resource lives. See the AWS guide to VPC Lattice resource configurations. My read of that change is that AWS is positioning VPC Lattice as a more general internal access layer, not just a thin HTTP router for microservices.
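A rough sketch of the provider side of that model is below. I am assuming a recent AWS provider, and the resource and argument names are written as I understand them from the provider documentation, so verify them before use. The variables, the database hostname, and port 5432 are hypothetical.

```hcl
# Hypothetical inputs for the VPC where the resource lives.
variable "resource_vpc_id" {
  type = string
}

variable "resource_subnet_ids" {
  type = list(string)
}

variable "resource_gateway_security_group_ids" {
  type = list(string)
}

# Resource gateway: the ingress point into the resource owner's VPC.
resource "aws_vpclattice_resource_gateway" "data" {
  name               = "data-gateway"
  vpc_id             = var.resource_vpc_id
  subnet_ids         = var.resource_subnet_ids
  security_group_ids = var.resource_gateway_security_group_ids
}

# Resource configuration: describes the TCP resource behind the gateway.
resource "aws_vpclattice_resource_configuration" "orders_db" {
  name                        = "orders-db"
  resource_gateway_identifier = aws_vpclattice_resource_gateway.data.id
  port_ranges                 = ["5432"]
  protocol                    = "TCP"

  resource_configuration_definition {
    dns_resource {
      domain_name     = "orders-db.internal.example.com" # hypothetical hostname
      ip_address_type = "IPV4"
    }
  }
}

# Publish the resource into the same service network as the services.
resource "aws_vpclattice_service_network_resource_association" "orders_db" {
  resource_configuration_identifier = aws_vpclattice_resource_configuration.orders_db.id
  service_network_identifier        = aws_vpclattice_service_network.shared_platform.id
}
```

The point of the sketch is the shape: a TCP resource joins the same service network as the HTTP services, which is what makes the "general internal access layer" reading plausible.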

My practical rule would be:

  • Use VPC Lattice when multiple teams need a common service-access contract across network boundaries.
  • Use raw PrivateLink when you need point-to-point private access and little else.
  • Keep Transit Gateway, peering, VPN, and Direct Connect focused on transport, not service policy.

That division keeps each product doing the job it is actually designed to do.

Why I Would Adopt It This Way

If I were introducing VPC Lattice into an existing AWS platform, I would not start by migrating every internal service.

I would start with one service network that has clear value:

  • shared platform APIs
  • security-sensitive internal services
  • cross-account service consumers
  • workloads where request attribution and auth policy matter

Then I would standardize four operating rules:

  1. Service networks model trust boundaries, not org charts.
  2. Default to AWS_IAM unless you can defend anonymous access.
  3. Make protocol choice explicit early, especially for gRPC and TLS passthrough.
  4. Turn on access logs from day one.

That approach avoids the most common failure mode with new AWS networking features: adopting them first as a connectivity shortcut, then discovering later that nobody owns the policy model or the operational semantics.

VPC Lattice is worth using when your platform has outgrown ad hoc east-west access patterns and you want something more structured than "more DNS plus more security groups plus another internal load balancer." It is less compelling when your environment is simple, single-VPC, or locked to one orchestration system with a smaller problem surface.

The strongest takeaway from the current docs is this: AWS VPC Lattice is a service-layer control plane. Once you view it that way, the product makes much more sense, and its constraints become easier to reason about.

Frequently Asked Questions

Q: Is AWS VPC Lattice only for HTTP microservices? A: No. AWS documentation now includes resource configurations, resource gateways, and resource endpoints in addition to services. It still has strong HTTP and HTTPS routing features, but the product scope is now broader than the original "internal HTTP microservice access" narrative. See the AWS guide to VPC Lattice resource configurations.

Q: Can I use AWS VPC Lattice with EKS, ECS, EC2, and Lambda? A: Yes, but with different attachment models. AWS documents instance, IP, Lambda, and ALB target types, and notes that EKS Pod targets are registered through the AWS Gateway API Controller while ECS can automatically register tasks with IP target groups. See the AWS target group documentation for VPC Lattice.

Q: What is the biggest production gotcha in AWS VPC Lattice? A: The biggest gotcha is usually assuming the routing and auth model behaves like a generic load balancer. In reality, AWS_IAM is default-deny, TLS passthrough has major limitations, and target groups fail open when every target is unhealthy. Those are platform behaviors, not small implementation details. See the AWS auth policy guide, the AWS TLS listener guide, and the AWS target group behavior guide.

Q: Does AWS VPC Lattice remove the need for Transit Gateway or VPN? A: No. AWS documentation explicitly states that traffic coming from peering, Transit Gateway, Direct Connect, or VPN reaches a service network through a service-network endpoint. The transport layer still matters; VPC Lattice adds the service-access layer above it. See the AWS documentation on service-network associations.

Q: Should I put every internal service into one giant service network? A: Usually no. Service networks are logical trust and policy boundaries. If you collapse everything into one service network, you lose much of the isolation and policy clarity that makes VPC Lattice useful in the first place.
