Multi-Tenant GitOps with Argo CD: Isolation Patterns That Survive Production
Multi-tenant Argo CD is not just RBAC. This deep dive shows how to combine AppProjects, namespaces, OIDC groups, sync controls, and policy gates into a safer GitOps platform.
TL;DR
Multi-tenant Argo CD works only when tenancy is enforced at several layers at once: Git repository boundaries, AppProject source and destination rules, Kubernetes namespaces, OIDC-backed RBAC, admission policy, and deletion controls. The practical model is to give each tenant a narrow AppProject, generate applications from approved repository paths, constrain sync and prune behavior, and let Kubernetes enforce runtime quotas and policy. This keeps GitOps self-service useful in production without turning the Argo CD control plane into a shared cluster-admin escape hatch.
The Real Multi-Tenant Argo CD Problem
The easiest way to make Argo CD "multi-tenant" is to create accounts and call it done. That is also the easiest way to create a shared deployment plane where every team can accidentally route around the platform's guardrails.
Argo CD is powerful because it reconciles desired state from Git into Kubernetes. In a single-team cluster, that usually feels straightforward: one Git repository, one Argo CD instance, one cluster, and a small group of trusted operators. In a platform environment, the trust model changes. Multiple teams want self-service deployment, each team owns different repositories, some workloads are production, and some namespaces contain shared infrastructure. A mistake in Git can become a cluster mutation minutes later.
That means Argo CD tenancy cannot live in one control. It needs layered enforcement:
- Git controls which manifests a tenant can submit.
- AppProjects control which source repositories, clusters, namespaces, and resource kinds Argo CD can deploy.
- Argo CD RBAC controls which users and groups can create, sync, override, or delete applications.
- Kubernetes namespaces, quotas, service accounts, and admission policies control what the applied workload can do at runtime.
- Sync windows, prune rules, finalizers, and progressive rollout gates control change timing and blast radius.
The architecture goal is not "every tenant gets cluster admin through Git." The goal is narrower: every tenant gets enough autonomy to ship their own workloads, while the platform team keeps strong control over cluster-scoped resources, shared ingress, policy, identity, and deletion paths.
Start With The Isolation Boundary
A practical tenant boundary usually maps to one of these units:
| Boundary | Use When | Risk If Too Broad |
|---|---|---|
| Team | One engineering team owns several services | Shared permissions can hide service-level ownership gaps |
| Product | Multiple teams operate one product surface | Emergency access can sprawl across unrelated services |
| Environment | Staging and production need different controls | Lower environments may inherit production friction |
| Compliance zone | PCI, regulated, or data-sensitive workloads | Too many exceptions make policy hard to reason about |
For most platform teams, the clean starting point is one AppProject per team per environment class. For example, payments-nonprod and payments-prod are easier to reason about than one global payments project. The production project can block manual sync overrides during deny windows, restrict sync timing, and require tighter admission policy while the non-production project stays more permissive.
Argo CD's project model supports this shape. An AppProject can restrict allowed source repositories, destinations, namespace-scoped resource kinds, cluster-scoped resource kinds, orphaned resource monitoring, sync windows, and project-local roles. The important design choice is to treat AppProjects as security boundaries, not folder labels.
AppProject As The First Hard Gate
The default project is convenient for demos, but it is too permissive for a shared platform. A tenant project should pin all of the following:
- Source repositories the tenant may deploy from.
- Destination clusters and namespaces the tenant may deploy into.
- Namespace-scoped Kubernetes kinds the tenant may manage.
- Cluster-scoped kinds the tenant may manage, usually none by default.
- Project roles mapped to identity-provider groups.
- Sync windows for sensitive environments.
A production tenant project can look like this:
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: payments-prod
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  description: Production GitOps boundary for the payments team
  sourceRepos:
    - https://github.com/example-org/payments-platform.git
    - https://github.com/example-org/payments-services.git
  destinations:
    - server: https://kubernetes.default.svc
      namespace: payments-prod
  clusterResourceWhitelist: []
  namespaceResourceWhitelist:
    - group: ""
      kind: ConfigMap
    - group: ""
      kind: Service
    - group: apps
      kind: Deployment
    - group: autoscaling
      kind: HorizontalPodAutoscaler
    - group: networking.k8s.io
      kind: NetworkPolicy
  orphanedResources:
    warn: true
  syncWindows:
    - kind: deny
      schedule: "0 9 * * 1-5"
      duration: 8h
      applications:
        - "*"
      manualSync: false
  roles:
    - name: deployer
      description: Payments production deploy access
      groups:
        - oidc:payments-prod-deployers
      policies:
        - p, proj:payments-prod:deployer, applications, get, payments-prod/*, allow
        - p, proj:payments-prod:deployer, applications, sync, payments-prod/*, allow
This example is intentionally restrictive. It does not allow tenants to create Namespace, ClusterRole, CustomResourceDefinition, MutatingWebhookConfiguration, or storage classes. Those resources are platform-owned because one tenant's cluster-scoped object can change behavior for another tenant.
The key implementation detail is that the destination namespace and source repository restrictions must match the repository structure. If payments-prod can deploy from a shared repository path where another team can write manifests, the AppProject boundary is weaker than it looks.
RBAC: Bind Groups To Capabilities, Not To Good Intentions
Argo CD RBAC uses policy rules and identity information from local accounts or SSO/OIDC. In SSO setups, groups or other configured scopes are read from the identity token and matched to Argo CD policies. That gives you a clean way to map platform groups to project capabilities.
Keep global Argo CD roles small. Most tenant permissions should live in project roles or project-scoped policies. A common pattern is:
- role:readonly: view-only access across approved applications.
- proj:<tenant>:developer: view and sync non-production apps.
- proj:<tenant>:prod-deployer: sync production apps during approved windows.
- role:platform-admin: tightly held platform team role for cluster-scoped controls.
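As a rough sketch of the developer entry in that pattern, a project-local role could look like the following. The payments-nonprod project and namespace names and the oidc:payments-developers group are assumptions made for illustration, not values from a real environment.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: payments-nonprod              # hypothetical non-production counterpart
  namespace: argocd
spec:
  description: Non-production GitOps boundary for the payments team
  sourceRepos:
    - https://github.com/example-org/payments-services.git
  destinations:
    - server: https://kubernetes.default.svc
      namespace: payments-nonprod     # assumed namespace name
  clusterResourceWhitelist: []        # still no cluster-scoped kinds
  roles:
    - name: developer
      description: View and sync non-production payments apps
      groups:
        - oidc:payments-developers    # assumed group claim value
      policies:
        - p, proj:payments-nonprod:developer, applications, get, payments-nonprod/*, allow
        - p, proj:payments-nonprod:developer, applications, sync, payments-nonprod/*, allow
```

This sketch leaves namespaceResourceWhitelist unset, which should permit all namespace-scoped kinds in the project. That is the "more permissive" end of the spectrum; add an explicit allow-list if non-production should mirror production more closely.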
Example argocd-rbac-cm fragment:
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-rbac-cm
  namespace: argocd
data:
  policy.default: role:readonly
  scopes: '[groups, email]'
  policy.csv: |
    p, role:readonly, applications, get, */*, allow
    p, role:readonly, projects, get, *, allow
    g, oidc:platform-admins, role:admin
    g, oidc:payments-prod-deployers, proj:payments-prod:deployer
    p, role:tenant-app-admin, applications, create, payments-nonprod/*, allow
    p, role:tenant-app-admin, applications, update, payments-nonprod/*, allow
    p, role:tenant-app-admin, applications, delete, payments-nonprod/*, deny
The deny on delete is not paranoia. In GitOps, delete can mean removing the Argo CD Application object, deleting tracked Kubernetes objects, or triggering finalizer behavior depending on how the application is configured. Production deletion should usually be a separate break-glass or platform-mediated operation.
Also check the identity token, not just the Argo CD policy. If your OIDC provider does not emit the expected group claim, the policy may look correct while users fall back to a default role. Treat group-claim verification as part of platform onboarding.
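One way to make the group claim explicit is to request it in the OIDC client configuration. The fragment below is a minimal sketch, assuming a generic OIDC provider at a placeholder issuer URL, a client named argocd, and a claim literally called groups; your provider may use a different scope or claim name.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  url: https://argocd.example.com             # assumed external Argo CD URL
  oidc.config: |
    name: Example SSO
    issuer: https://idp.example.com           # assumed issuer
    clientID: argocd                          # assumed client ID
    clientSecret: $oidc.clientSecret          # looked up by key in argocd-secret
    requestedScopes: ["openid", "profile", "email", "groups"]
    requestedIDTokenClaims:
      groups:
        essential: true                       # ask the provider to always send the claim
```

The claim names listed under scopes in argocd-rbac-cm have to match what the token actually carries. After logging in as a test user, `argocd account get-user-info` is a quick way to see which groups Argo CD resolved.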
Kubernetes Still Enforces Runtime Reality
Argo CD controls desired state application. Kubernetes controls workload execution. You need both.
At minimum, every tenant namespace should have:
- ResourceQuota to cap aggregate CPU, memory, object counts, and persistent volume claims.
- LimitRange to force default requests and limits.
- NetworkPolicy to block default east-west traffic when the CNI supports it.
- Dedicated service accounts with minimal RBAC.
- Admission policy to block privileged pods, hostPath mounts, unsafe capabilities, mutable tags, and unapproved registries.
Example namespace baseline:
apiVersion: v1
kind: Namespace
metadata:
  name: payments-prod
  labels:
    tenant.platform.example.com/name: payments
    pod-security.kubernetes.io/enforce: restricted
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: payments-prod-quota
  namespace: payments-prod
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 80Gi
    limits.cpu: "40"
    limits.memory: 160Gi
    pods: "80"
    services.loadbalancers: "0"
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: payments-prod
spec:
  podSelector: {}
  policyTypes:
    - Ingress
This namespace baseline should be platform-owned. Tenants can deploy workloads into the namespace, but they should not be able to remove the guardrails that make the namespace safe to share on the same cluster.
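The baseline above does not show the LimitRange or an egress rule. A possible companion, sketched under the assumption that cluster DNS runs in kube-system and that the numbers are placeholders to be tuned per tenant, might be:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: payments-prod-defaults        # hypothetical name
  namespace: payments-prod
spec:
  limits:
    - type: Container
      defaultRequest:                 # applied when a container omits requests
        cpu: 100m
        memory: 128Mi
      default:                        # applied when a container omits limits
        cpu: 500m
        memory: 512Mi
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress-allow-dns # hypothetical name
  namespace: payments-prod
spec:
  podSelector: {}                     # every pod in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system   # assumes DNS lives here
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

With defaults in place, a pod that omits requests and limits still counts against the ResourceQuota instead of being rejected outright. The egress policy denies everything except DNS until the tenant adds explicit allow rules.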
ApplicationSet Tenancy: Useful, But Easy To Over-Permission
ApplicationSet is attractive in platform environments because it can generate many Argo CD Applications from Git directories, cluster lists, pull requests, or files. For tenancy, the Git generator and matrix generator are especially useful. They let platform teams create an application factory where each service gets a predictable Application object.
The dangerous pattern is templating the project field from tenant-controlled data. Argo CD's own ApplicationSet documentation warns that Git generators can be used by non-admin developers to create Applications, and that templated project fields can let users create Applications under Projects with excessive permissions.
A safer pattern is to fix the project in the template and let repository data control only low-risk fields such as app name, path, and namespace within an approved prefix.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: payments-services-prod
  namespace: argocd
spec:
  syncPolicy:
    preserveResourcesOnDeletion: true
  generators:
    - git:
        repoURL: https://github.com/example-org/payments-services.git
        revision: main
        files:
          - path: services/*/service.yaml
  template:
    metadata:
      name: 'payments-{{name}}-prod'
      labels:
        tenant.platform.example.com/name: payments
    spec:
      project: payments-prod
      source:
        repoURL: https://github.com/example-org/payments-services.git
        targetRevision: main
        path: 'services/{{name}}/overlays/prod'
      destination:
        server: https://kubernetes.default.svc
        namespace: payments-prod
      syncPolicy:
        automated:
          prune: false
          selfHeal: true
        syncOptions:
          - CreateNamespace=false
          - PruneLast=true
Notice the guardrails:
- project is fixed to payments-prod.
- repoURL is fixed to the tenant's approved repository.
- destination.namespace is fixed to the tenant namespace.
- CreateNamespace=false prevents the application from creating arbitrary namespaces.
- prune=false avoids automatic deletion until the team has stronger operational maturity.
- preserveResourcesOnDeletion=true prevents generated Applications from receiving the Argo CD resources finalizer, which is useful when deleting the generator should not delete production workloads.
You can later enable automated prune, but do it per project and after you have tested orphaned resource reporting, finalizer behavior, and recovery paths.
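For reference, the Git files generator in the ApplicationSet above reads each matching service.yaml and exposes its top-level keys as template parameters. A hypothetical file for a service called api could be as small as this; the owner key is purely illustrative.

```yaml
# services/api/service.yaml (hypothetical file read by the Git files generator)
name: api             # becomes {{name}} in the ApplicationSet template
owner: payments-api   # illustrative extra key; not used by the template
```

Because this file is tenant-controlled, the template only lets it influence the Application name and the source path. The project, repository, and destination stay fixed regardless of what a tenant writes here.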
Resource Exclusions And Allow-Lists
AppProject resource allow-lists decide what a tenant Application is allowed to manage. Argo CD also has controller-level resource.exclusions and resource.inclusions settings in argocd-cm. Use them sparingly: they change what Argo CD discovers and syncs, so they are platform-wide behavior rather than tenant-specific authorization.
A shared-control-plane configuration can exclude noisy or controller-owned resources while keeping the inclusion list limited to resource families the platform expects Argo CD to reconcile:
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  resource.exclusions: |
    - apiGroups:
        - events.k8s.io
      kinds:
        - Event
      clusters:
        - "*"
    - apiGroups:
        - coordination.k8s.io
      kinds:
        - Lease
      clusters:
        - "*"
  resource.inclusions: |
    - apiGroups:
        - ""
        - apps
        - autoscaling
        - batch
        - networking.k8s.io
        - external-secrets.io
        - argoproj.io
      kinds:
        - "*"
      clusters:
        - "*"
Do not use exclusions to hide a security problem. If tenants must not create ClusterRole, MutatingWebhookConfiguration, or raw Secret objects, enforce that with AppProject allow-lists, Kubernetes RBAC, and admission policy. Resource exclusions are most useful when Argo CD should ignore resources that are high-volume, generated by another controller, or intentionally outside the GitOps ownership model.
Sync, Prune, And Deletion Blast Radius
Automated sync is not a binary good or bad choice. It is a control loop. The risk depends on what the loop is allowed to change.
For non-production tenant workloads, automated sync with self-heal often gives useful feedback. For production, the platform should decide whether changes deploy immediately, during windows, or after a separate progressive-delivery controller has evaluated metrics. Argo CD supports sync windows at the project level, and sync waves/hooks can order resources inside a sync operation.
Prune deserves special handling. If a manifest disappears from Git and prune is enabled, Argo CD may delete the corresponding Kubernetes resource. That is desirable for cleaning up stale objects, but dangerous for shared resources, stateful workloads, and resources that were moved between applications.
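For individual objects that should survive a manifest being removed from Git, Argo CD supports a per-resource sync option annotation. The ConfigMap below is a made-up example of a resource opting out of pruning; its name and data are placeholders.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: payments-shared-settings      # hypothetical resource
  namespace: payments-prod
  annotations:
    # Argo CD skips this object when pruning, even if prune is enabled
    argocd.argoproj.io/sync-options: Prune=false
data:
  ledger-endpoint: https://ledger.internal.example.com   # placeholder value
```

The trade-off is that the Application reports OutOfSync once the manifest is gone from Git, so someone has to clean up deliberately. Argo CD also documents a Delete=false option for retaining a resource when its Application is deleted.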
A safer production rollout often uses this progression:
- Enable automated sync without prune.
- Enable orphaned resource warnings.
- Add policy that blocks tenants from managing platform-owned kinds.
- Test deletion behavior in staging with the same AppProject rules.
- Enable prune only for specific application classes.
Example application sync policy:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-api-prod
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: payments-prod
  source:
    repoURL: https://github.com/example-org/payments-services.git
    targetRevision: main
    path: services/api/overlays/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: payments-prod
  syncPolicy:
    automated:
      prune: false
      selfHeal: true
    syncOptions:
      - ApplyOutOfSyncOnly=true
      - PruneLast=true
PruneLast=true is not a substitute for review, but it can reduce ordering surprises by pruning after other resources are applied. For workloads with database migrations, use sync waves or hooks deliberately rather than hoping lexical manifest order will save you.
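A migration expressed as a PreSync hook might look like the sketch below. The Job name, image, and arguments are hypothetical, and the payments-prod AppProject shown earlier would need batch/Job added to its namespaceResourceWhitelist before this could sync.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: payments-api-migrate          # hypothetical
  namespace: payments-prod
  annotations:
    argocd.argoproj.io/hook: PreSync                          # run before the main sync phase
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation # replace the previous run's Job
spec:
  backoffLimit: 1
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          # placeholder image reference; pin a real digest
          image: registry.example.com/payments/api-migrate@sha256:REPLACE_WITH_REAL_DIGEST
          args: ["migrate", "up"]     # assumed CLI of the migration tool
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
```

If the hook Job fails, the sync stops before the Deployment is updated. For ordering among ordinary resources, the argocd.argoproj.io/sync-wave annotation (lower numbers first) plays a similar role within the sync phase.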
The finalizer is the sharper control. Argo CD's deletion docs state that resources-finalizer.argocd.argoproj.io makes the application controller cascade-delete the Application's managed resources when the Application is deleted. ApplicationSet-generated Applications add another layer: if preserveResourcesOnDeletion is false, generated Applications can receive that finalizer, so deleting an ApplicationSet can delete generated Applications and then the workloads those Applications manage.
That behavior is useful for preview environments. It is dangerous for production if teams do not understand ownership. A good rule is: preview ApplicationSets may cascade, production ApplicationSets should preserve by default, and production decommissioning should be a reviewed workflow rather than a side effect of removing a generator file.
Secrets And Workload Identity
Tenant Git repositories should not contain plaintext credentials. Even if the repository is private, Git history, forks, local clones, CI logs, and code search can turn a leaked secret into a long-lived incident.
Argo CD's secret-management guidance is intentionally unopinionated: teams use External Secrets Operator, Sealed Secrets, Vault-backed workflows, SOPS, and plugins. In a multi-tenant platform, the safest default is to keep secret values outside tenant Git and let tenant manifests reference a namespace-scoped secret source.
Common production patterns include:
- External Secrets Operator or a similar controller that syncs secrets from a cloud secret manager into tenant namespaces.
- Sealed or encrypted secrets when the team explicitly accepts Git-stored encrypted values and key rotation processes.
- Cloud workload identity where pods get short-lived access to cloud APIs without static keys.
- Admission policy that blocks raw Secret manifests unless they match an approved exception.
The platform-owned secret controller should run with cloud permissions. Tenant workloads should get only the runtime secret values or workload identity bindings they need. Avoid handing each tenant a broad cloud credential and calling it GitOps.
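With External Secrets Operator, the tenant-side manifest can stay free of secret values. The following is a sketch only: the store name, remote key path, and secret names are invented, and the apiVersion should match whatever version your installed operator serves.

```yaml
apiVersion: external-secrets.io/v1beta1   # check the version your operator serves
kind: ExternalSecret
metadata:
  name: payments-api-db               # hypothetical
  namespace: payments-prod
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: payments-secret-store       # assumed platform-owned SecretStore
    kind: SecretStore
  target:
    name: payments-api-db             # Kubernetes Secret the operator creates
    creationPolicy: Owner
  data:
    - secretKey: password
      remoteRef:
        key: prod/payments/api/db     # assumed path in the external secret manager
        property: password
```

For a tenant to ship this through Argo CD, the AppProject would need ExternalSecret from the external-secrets.io group in its namespace allow-list. The SecretStore and its cloud credentials stay in platform-owned manifests.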
Policy Gates Close The Gap Between Git And Cluster
Argo CD can say "this Application is allowed to deploy a Deployment into payments-prod." It cannot by itself prove that the Deployment uses a signed image, has resource requests, avoids privileged mode, and references only approved registries. That is the job of admission policy.
Kyverno, Gatekeeper, Kubernetes ValidatingAdmissionPolicy, and Sigstore policy-controller all fit into this layer depending on your standardization. The important point is placement: policy should run in the cluster admission path, not only in a CI job. CI catches issues earlier; admission prevents bypass.
Example policy intent:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-digest-and-resources
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: require-image-digest
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Container images must be pinned by digest in production."
        pattern:
          spec:
            containers:
              - image: "*@sha256:*"
    - name: require-requests-and-limits
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "CPU and memory requests and limits are required."
        pattern:
          spec:
            containers:
              - resources:
                  requests:
                    cpu: "?*"
                    memory: "?*"
                  limits:
                    cpu: "?*"
                    memory: "?*"
Use this as a policy shape, not a universal copy-paste. Real production policies need exceptions, test coverage, and compatibility checks for init containers, ephemeral containers, and generated pods.
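If you standardize on the built-in ValidatingAdmissionPolicy instead, a registry restriction could be sketched as follows. It assumes a cluster that serves this kind under admissionregistration.k8s.io/v1, a placeholder registry host, and the tenant label from the namespace baseline; like the Kyverno example, it only inspects regular containers.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: approved-registries-only      # hypothetical
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["pods"]
  validations:
    # CEL: every container image must start with the approved registry prefix
    - expression: "object.spec.containers.all(c, c.image.startsWith('registry.example.com/'))"
      message: "Images must come from registry.example.com."
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: approved-registries-only-tenants
spec:
  policyName: approved-registries-only
  validationActions: ["Deny"]
  matchResources:
    namespaceSelector:
      matchExpressions:
        - key: tenant.platform.example.com/name   # only namespaces labeled as tenant-owned
          operator: Exists
```

Binding by namespace label keeps platform namespaces out of scope and means a newly onboarded tenant namespace picks up the policy as soon as it carries the label.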
Why We Built It This Way
The model holds up because no single control has to be perfect.
If a developer pushes a risky manifest, AppProject rules can block forbidden destinations or resource kinds. If the AppProject is too broad, Kubernetes admission policy can still reject privileged pods or unsigned images. If a user has sync permissions but not delete permissions, they can reconcile approved applications but cannot delete them. If a repository path changes unexpectedly, the fixed project name in the ApplicationSet template prevents a tenant from silently moving into a more privileged project.
This is the same reason production Kubernetes platforms separate control-plane operations from tenant workload operations. Argo CD should be a deployment interface, not a universal privilege escalator.
Tenant Onboarding Flow
A repeatable onboarding process keeps the platform from accumulating one-off exceptions:
- Create the tenant namespace, quota, limit range, default network policy, and service accounts.
- Create the tenant AppProject with explicit source repositories and destinations.
- Map identity-provider groups to project roles and verify token claims.
- Create an ApplicationSet from approved repository paths.
- Add admission policy labels or namespace selectors.
- Run a dry-run sync in non-production.
- Enable production sync windows and decide prune behavior.
- Document break-glass access and deletion approvals.
If this flow takes days, automate it. A platform repository can own tenant bootstrap manifests, while application repositories own service manifests. That separation keeps tenants productive without letting application teams rewrite the platform boundary.
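One way to wire up that separation is a platform-owned Application that syncs each tenant's bootstrap directory. Everything named here (the platform-tenants project, the repository URL, and the tenants/payments path) is a hypothetical layout rather than a prescribed one.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tenant-bootstrap-payments     # hypothetical
  namespace: argocd
spec:
  project: platform-tenants           # assumed platform-owned AppProject
  source:
    repoURL: https://github.com/example-org/platform-tenants.git   # assumed repo
    targetRevision: main
    path: tenants/payments            # namespace, quota, limits, policies, AppProject
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd                 # used only for manifests that omit a namespace
  syncPolicy:
    automated:
      prune: false                    # removing a tenant stays a reviewed action
      selfHeal: true                  # revert manual drift in the guardrails
```

The platform project behind this Application has to allow the Namespace kind and the tenant namespaces as destinations. Tenant groups should have no write access to the repository or the project.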
Common Failure Modes
The most common failure is a broad AppProject with sourceRepos: ["*"] and destinations that include every namespace. That is not multi-tenancy; it is shared admin through a nicer UI.
The second failure is trusting repository structure without repository permissions. If a tenant can open a pull request that changes another tenant's production path, Argo CD will not know that the organizational boundary was violated. Git ownership rules, branch protection, and code owners matter.
The third failure is enabling automated prune before understanding ownership. Prune is safe only when the Application is the clear owner of the resources it tracks.
The fourth failure is letting tenants deploy cluster-scoped resources because one application needed a CRD. CRDs, webhooks, cluster roles, and storage classes are platform lifecycle concerns. Put them in a platform project with stricter review.
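A platform project for those cluster-scoped kinds could be sketched like this. The project name, repository, and platform-system namespace are assumptions, and the allow-list should be trimmed to what the platform actually manages.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: platform-cluster              # hypothetical
  namespace: argocd
spec:
  description: Platform-owned cluster-scoped resources
  sourceRepos:
    - https://github.com/example-org/platform-cluster.git   # assumed repo
  destinations:
    - server: https://kubernetes.default.svc
      namespace: platform-system      # assumed platform namespace
  clusterResourceWhitelist:           # explicit list instead of a wildcard
    - group: apiextensions.k8s.io
      kind: CustomResourceDefinition
    - group: rbac.authorization.k8s.io
      kind: ClusterRole
    - group: rbac.authorization.k8s.io
      kind: ClusterRoleBinding
    - group: storage.k8s.io
      kind: StorageClass
  roles:
    - name: admin
      groups:
        - oidc:platform-admins
      policies:
        - p, proj:platform-cluster:admin, applications, *, platform-cluster/*, allow
```

When a tenant needs a CRD, the request becomes a pull request against the platform repository, which keeps review and lifecycle with the team that owns the cluster.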
Frequently Asked Questions
Q: Should every team get its own Argo CD instance? A: Not always. Separate Argo CD instances provide stronger operational isolation, but they increase controller, upgrade, and repository-management overhead. A shared instance with strict AppProjects, RBAC, and Kubernetes policy can work for many internal platforms; regulated or high-risk tenants may still justify dedicated instances.
Q: Can AppProjects stop a tenant from deploying privileged pods? A: AppProjects can limit destinations and resource kinds, but they do not express every pod-level security rule. Use Kubernetes admission policy and Pod Security Admission labels to enforce runtime restrictions such as privileged mode, host networking, hostPath mounts, and image provenance.
Q: Should tenants be allowed to create namespaces? A: Usually no in production. Namespace creation should be part of platform onboarding because quotas, network policy, labels, service accounts, and policy selectors need to be created together. Letting applications create arbitrary namespaces weakens the tenancy model.
Q: How do sync windows help multi-tenancy? A: Sync windows let a project allow or deny sync operations during scheduled periods. They are useful when production changes need change-freeze windows, business-hour controls, or coordinated release periods without disabling GitOps entirely.
Q: What is the fastest way to audit an existing shared Argo CD instance? A: List every AppProject, find projects using wildcard source repositories or wildcard destinations, map RBAC groups to project roles, identify applications with automated prune, and check which projects can deploy cluster-scoped resource kinds. Those checks expose most tenancy risks quickly.
Resources
Related internal guides
- Argo CD Auto-Sync and Health Checks
- Kubernetes Multi-Tenancy with Namespaces and Network Policies
- Progressive Delivery on Kubernetes with Argo CD and Argo Rollouts
External references
- Argo CD Projects
- Argo CD Project Specification
- Argo CD RBAC Configuration
- Argo CD ApplicationSet
- Argo CD ApplicationSet Security
- Argo CD Resource Inclusion and Exclusion
- Argo CD Application Deletion
- Argo CD Secret Management
- Kubernetes Resource Quotas
- Kubernetes Limit Ranges
- Kubernetes ValidatingAdmissionPolicy