Backstage Software Catalog Lifecycle: Discover, Register, and Govern Your Software Assets

The Backstage catalog is not just a directory of services. It is a lifecycle that starts with entity descriptors, moves through registration and backend processing, and only stays useful when ownership, relations, and cleanup are governed deliberately.

TL;DR

  • A healthy Backstage catalog is a lifecycle, not a one-time YAML import.
  • Teams define entities in catalog-info.yaml, register them through locations or discovery, and let the catalog ingest, process, and stitch the final entity view.
  • The fields that make the catalog operationally useful are usually owner, system, lifecycle, dependency relations, and API relations.
  • Backstage provides governance controls such as catalog rules, readonly mode, and orphan handling.
  • If you only optimize for registration, the catalog becomes stale inventory. If you optimize for source-of-truth ownership and cleanup, it becomes a useful platform map.

Your Catalog Is Not Healthy Just Because catalog-info.yaml Exists

Many Backstage rollouts stall at the same point. A few teams add catalog-info.yaml, the catalog page starts filling up, and everyone assumes the software inventory problem is solved. A few months later, ownership is wrong, systems are inconsistent, dependencies are half-modeled, and the catalog is drifting away from reality.

That is not a Backstage failure. It is a lifecycle failure. The official Backstage software catalog docs describe a flow where entities are defined, registered, ingested, processed, stitched into final catalog output, and then continuously maintained. If you only optimize for getting entries into the catalog, you get inventory. If you optimize for the full lifecycle, you get a usable graph of your platform.

Start With a Descriptor That Models Ownership and Boundaries

Backstage recommends naming descriptor files catalog-info.yaml, and the descriptor format docs explain that the same entity shape is used in both YAML descriptors and the catalog API. For most teams, the anchor entity kind is still Component.

The important thing is not just having a descriptor. It is having a descriptor that carries the metadata you actually need to govern software. The official docs show a component example like this:

apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: artist-web
  description: The place to be, for great artists
spec:
  type: website
  lifecycle: production
  owner: artist-relations-team
  system: artist-engagement-portal
  dependsOn:
    - resource:default/artists-db
  dependencyOf:
    - component:default/artist-web-lookup
  providesApis:
    - artist-api

This is where weak catalogs usually go wrong. Teams often stop at name and description, but the fields that make the catalog operationally valuable are owner, system, lifecycle, dependsOn, and providesApis.

Backstage also makes an important point about ownership. In the descriptor and relation docs, spec.owner is the singular entity that ultimately owns the component. That is good for accountability and navigation. It is not the same thing as authorization logic for runtime access control, and treating it as such usually creates governance confusion.
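For the owner and system references in the component above to resolve to real pages, the referenced entities need descriptors of their own. The following is a minimal sketch, not taken from the official docs: the names simply mirror the example above, and the description text and the empty children list are illustrative assumptions.

```yaml
# Hypothetical descriptors for the owner and system that artist-web references.
apiVersion: backstage.io/v1alpha1
kind: Group
metadata:
  name: artist-relations-team
  description: Team accountable for artist-facing software  # illustrative text
spec:
  type: team
  children: []   # no child groups in this sketch
---
apiVersion: backstage.io/v1alpha1
kind: System
metadata:
  name: artist-engagement-portal
spec:
  owner: artist-relations-team
```

Group and System are not among the kinds the catalog allows by default, so a catalog rule has to permit them wherever these files are registered.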

Registration Is a Source-of-Truth Decision, Not Just a UI Step

The catalog configuration docs describe several ways to seed the catalog. The simplest path is to register a location directly in configuration:

catalog:
  locations:
    - type: url
      target: https://github.com/backstage/backstage/blob/master/packages/catalog-model/examples/components/artist-lookup-component.yaml

That is enough to prove the catalog works. It is not enough to scale governance by itself.

The same docs explain that Backstage can also use integration processors, including discovery processors such as GitHub discovery, and can be extended with custom processors or providers for systems that already hold authoritative software metadata. That is the real operating-model choice:

  • If Git is the source of truth, keep descriptors near the code and let normal repository workflows update them.
  • If another system is authoritative, mirror that source into Backstage through providers or processors instead of forcing humans to maintain duplicate records.
  • If you want to stop opportunistic manual registration, set catalog.readonly: true so catalog locations cannot be registered or deleted through the catalog APIs.
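A configuration that combines the first and third options might look like the sketch below. It assumes the GitHub catalog module and a GitHub integration are already installed and configured; the provider id exampleOrg, the organization name, and the schedule values are hypothetical placeholders rather than recommendations.

```yaml
catalog:
  # Disable registering and deleting locations through the catalog APIs.
  readonly: true
  providers:
    github:
      # "exampleOrg" is a hypothetical provider id; any stable name works.
      exampleOrg:
        organization: example-org        # hypothetical GitHub organization
        catalogPath: /catalog-info.yaml  # where descriptors live in each repo
        filters:
          branch: main
        schedule:
          frequency: { minutes: 30 }     # how often discovery runs (assumed value)
          timeout: { minutes: 3 }        # upper bound for one run (assumed value)
```

With this shape, descriptors enter the catalog only through repository discovery, so Git stays the single place where entity metadata is edited.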

That last control matters more than it seems. A lot of catalogs become noisy because they allow both curated ingestion and ad hoc manual registration. Once that happens, it becomes difficult to answer a simple question: which source is authoritative?

The Backstage Catalog Lifecycle Has Three Real Backend Stages

The official Life of an Entity documentation is the most useful source for understanding how the catalog actually behaves in production. It breaks the backend lifecycle into three stages: ingestion, processing, and stitching.

1. Ingestion

Entity providers fetch raw entity data from authoritative sources and seed it into the database as unprocessed entities. Backstage includes default providers for user-registered locations and static app-config locations, and the docs note that you can add your own providers.

The operationally important detail is origin ownership. The catalog tracks which provider owns which unprocessed entity, and no two providers are allowed to output the same entity. When a provider signals that an entity should be removed, the catalog eagerly purges that entity and its auxiliary data from the database.

That means provider design is not just plumbing. It is part of your governance model.

2. Processing

Processing is where the catalog stops being a document store and starts becoming a graph. The processing loop picks up unprocessed entities, runs them through policies and processors, and can mutate the entity or emit other entities, errors, and relations.

This is also where many catalog relationships become real. The lifecycle docs explicitly describe processors inspecting spec fields and emitting relations from those declarations. That is how a simple descriptor becomes a navigable graph of ownership, dependencies, and system membership.

The same docs call out two details that matter when you extend the catalog:

  • Entities are processed one by one, even when multiple catalog service hosts collaborate on the workload.
  • Processor order matters, and you can tune it with catalog.processorOptions.<processorName>.priority.

If you ever add custom processors, this is where bad assumptions become expensive. A processor that emits inconsistent relations or mutates entities unpredictably can damage catalog quality more quietly than a broken YAML file.

3. Stitching

Stitching assembles the final entity that the catalog API returns. The lifecycle docs say the stitcher merges the processed entity, emitted errors, and all incoming and outgoing relations.

This is the version users actually experience as “the catalog”. They do not care that a raw entity existed in a database. They care that the final page shows the correct owner, the correct system, the right dependencies, and the right API links.

Backstage is also explicit that stitching is a fixed process. If you want to change the final result, that change needs to happen earlier during ingestion or processing.
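To make the stitched result concrete, here is an illustrative excerpt of what the final artist-web entity could look like when read back from the catalog API. It is a sketch that assumes every reference resolves in the default namespace; real output carries additional metadata (for example a uid and etag) that is omitted here.

```yaml
# Illustrative excerpt of a stitched entity; not complete API output.
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: artist-web
  namespace: default
spec:
  type: website
  lifecycle: production
  owner: artist-relations-team
  system: artist-engagement-portal
  dependsOn:
    - resource:default/artists-db
  providesApis:
    - artist-api
# Relations are not written by hand; they are emitted during processing
# and merged into the entity by the stitcher.
relations:
  - type: ownedBy
    targetRef: group:default/artist-relations-team
  - type: partOf
    targetRef: system:default/artist-engagement-portal
  - type: dependsOn
    targetRef: resource:default/artists-db
  - type: providesApi
    targetRef: api:default/artist-api
```

Each relation traces back to a spec field, which is why a wrong relation on the page has to be fixed in the descriptor or in a processor rather than in the stitched output.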

Relations Are What Turn Inventory Into Architecture

The Well-known Relations docs are where the catalog becomes genuinely useful for platform engineering. Without relations, the catalog is mostly metadata search. With relations, it becomes a map of how your platform fits together.

The core relations to model early are:

  • ownedBy / ownerOf
  • partOf / hasPart
  • dependsOn / dependencyOf
  • providesApi / apiProvidedBy
  • consumesApi / apiConsumedBy

Backstage documents that these are directional, and several are derived directly from spec fields such as spec.owner, spec.system, and spec.dependsOn.

That leads to two practical rules. First, consistency beats completeness. A catalog where every production component has a trustworthy owner and system is more useful than one where only a few services model every possible relation. Second, entity references need discipline. The references docs define the canonical entity reference format and explain how kind and namespace defaulting works. If teams alternate between bare names, fully-qualified references, and homegrown aliases, relation quality degrades quickly even when the YAML still parses.
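One way to enforce that discipline is to write every reference in the full [<kind>:][<namespace>/]<name> form instead of relying on defaults. The fragment below is a sketch of that convention applied to the earlier component; it assumes all referenced entities live in the default namespace.

```yaml
spec:
  # Fully qualified references: kind and namespace are always spelled out,
  # so the meaning does not depend on per-field defaulting rules.
  owner: group:default/artist-relations-team
  system: system:default/artist-engagement-portal
  dependsOn:
    - resource:default/artists-db
  providesApis:
    - api:default/artist-api
```

Shorthand such as owner: artist-relations-team is still valid. The point is to pick one convention and apply it everywhere, so a reviewer can tell at a glance which entity a reference targets.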

Governance Is Mostly About Resisting Drift

Strong catalogs usually do not fail because teams forgot how YAML works. They fail because drift is allowed to accumulate.

Backstage gives you concrete governance controls.

Catalog Rules

The configuration docs note that the catalog only allows Component, API, and Location by default. If you want more kinds, you need rules:

catalog:
  rules:
    - allow: [Component, API, Location, Template]
  locations:
    - type: url
      target: https://github.com/org/example/blob/master/org-data.yaml
      rules:
        - allow: [Group]

This matters because kind sprawl is real. If you allow every kind everywhere without design, governance weakens fast.

Orphan Handling

The lifecycle docs describe orphaning in concrete terms. When a parent entity stops emitting a child entity and no other parent emits that child either, the stitcher adds the annotation backstage.io/orphan: 'true'. Depending on configuration, the orphan can remain, be reclaimed, or be deleted automatically if orphanStrategy: delete is enabled.

That is not cosmetic metadata. Orphans are one of the earliest signals that your source systems and your catalog are drifting apart.
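If you decide that orphans should be cleaned up automatically rather than reviewed by hand, the setting mentioned above sits under the catalog key. This is a minimal sketch; check the default for your Backstage release before relying on it, and consider reviewing existing orphans first so the cleanup does not hide a broken source.

```yaml
catalog:
  # Remove orphaned entities automatically instead of only annotating them.
  orphanStrategy: delete
```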

Readonly Mode

If your operating model says Backstage should mirror an authoritative external source, use catalog.readonly: true. The docs are direct about the effect: registering and deleting locations through the catalog APIs is disabled. That is exactly what you want when self-service catalog changes would undermine the source-of-truth model.

Common Pitfalls

  • Treating registration as the finish line: Getting entities into the catalog is only the beginning. If ownership, systems, and dependencies are not maintained, the catalog becomes stale inventory.
  • Keeping metadata away from the source of truth: If Git owns the component but metadata lives somewhere else, drift becomes much more likely.
  • Using inconsistent entity references: Teams that mix bare names and fully qualified references create broken or misleading relations.
  • Skipping governance controls: Ignoring rules, readonly mode, or orphan handling usually leads to catalog sprawl and duplicate truth.
  • Over-customizing processors too early: Custom processors can be powerful, but they can also introduce hard-to-debug relation or mutation problems if the basic model is still inconsistent.

Key Takeaways

  • The Backstage software catalog is a lifecycle built from ingestion, processing, and stitching.
  • catalog-info.yaml is necessary, but useful catalogs also require strong ownership, relation modeling, and source-of-truth discipline.
  • Relations such as ownership, system membership, dependencies, and APIs are what turn a catalog into a platform map.
  • Catalog rules, orphan handling, and readonly mode are governance tools, not optional extras.
  • The safest way to scale Backstage is to automate ingestion from authoritative sources and keep metadata close to the code when Git is that source.

What To Do Next

  1. Audit your current catalog and identify which production components are missing owner, lifecycle, or system.
  2. Decide what your real source of truth is for each entity type: Git, an identity system, cloud inventory, or another platform registry.
  3. Standardize entity reference patterns and relation fields before adding more custom kinds or processors.
  4. Enable governance deliberately with catalog rules, orphan review, and readonly mode where appropriate.
  5. Measure catalog quality over time by watching orphaned entities, missing ownership, and relation gaps instead of only counting registered components.

If you run the Backstage catalog as a lifecycle instead of a one-time import, it becomes much more than a service list. It becomes a trustworthy operating view of your software estate, which is exactly what platform teams need when they are trying to reduce ambiguity instead of just documenting it.
