
Data governance for Kafka: introducing lineage support in Factor Platform
Table of contents
At Factor House, many of the teams we work with operate in highly regulated industries. NORD/LB, a German commercial bank, is a good illustration of why banking tends to be the most demanding of these. As an institution subject to both BCBS 239 and DORA, they are required to maintain clear, demonstrable data lineage across their environment: knowing where data originates, who owns it, and how it flows through their systems. Running Kafka at scale, they needed that lineage to be visible and actionable within the tools used to operate their streaming infrastructure, not isolated in a separate catalog that engineers rarely consult during day-to-day work.
This gap prompted us to build data governance support into Factor Platform, our unified management layer for streaming infrastructure. This article explains what we've built, why it matters for regulated organisations, and the approach we took to make it practical.
The regulatory context
European banks are subject to two frameworks that place direct requirements on data governance: BCBS 239 and DORA.
BCBS 239, the Basel Committee's principles for effective risk data aggregation and risk reporting, requires systemically important financial institutions to demonstrate clear data lineage, defined data ownership, and the ability to trace data from source to consumption. For banks running event-driven architectures, this means being able to answer questions like: where does this data come from, who owns it, and does it contain regulated information?
DORA, the EU's Digital Operational Resilience Act, adds further requirements around understanding ICT systems and their data dependencies as part of broader operational resilience obligations.
Both frameworks assume that an organisation has a coherent view of its data. In practice, for teams running Kafka at scale, that view has historically been difficult to maintain.
The problem with existing approaches
Most organisations with a data governance programme have invested in a data catalog. These catalogs accumulate metadata about datasets: ownership, classification, documentation, PII flags, and so on. That metadata is typically described using the OpenLineage standard, a vendor-neutral specification for representing datasets, jobs, and their relationships.
The challenge is that this metadata tends to live in a separate system from the Kafka tooling. Engineers operating Kafka can browse topics and schemas, but the governance context (who owns this schema, whether it contains PII, whether it meets the organisation's data quality standards) isn't visible alongside it. The two systems are not integrated, and bridging them has typically required additional infrastructure and dedicated integration work.
Our approach: the schema as the source of truth
Factor Platform's lineage support is built around a straightforward observation: Kafka schemas already exist and are version-controlled. Rather than requiring a separate metadata store or additional infrastructure, we treat the schema itself as the place where governance metadata lives.
You annotate your existing Avro or JSON Schema definitions with lineage metadata - ownership, tags, documentation, PII classification, domain, and any custom attributes your organisation requires - using whatever field structure makes sense for your environment. Factor Platform then uses kJQ expressions to map that embedded metadata to valid OpenLineage Dataset facets, producing a structured, OpenLineage-conforming description of each dataset from your Schema Registry.
This means there is no additional system to run, no separate pipeline to maintain, and no duplication of your schema definitions. The metadata travels with the schema, which is where it belongs.
Mappings can be configured through the Factor Platform UI using a guided wizard, or through the API. Once in place, Factor Platform evaluates them on each schema observation cycle and surfaces the results across the interface. Datasets are versioned, so you have a full history of changes over time, including when an owner was added, when a tag changed, and when a new schema version introduced a new PII field.
What this makes possible
Bringing OpenLineage metadata into your Kafka operational environment changes what you can see and act on.
Identifying PII-tagged data. Column-level mappings allow individual fields within a schema to carry their own lineage metadata. A field like email or date_of_birth can be annotated with a PII flag and a classification. Factor Platform surfaces this when you browse schemas, so engineers and compliance teams can see at a glance which topics carry regulated fields.
Finding ownership gaps. Ownership is a required facet in many governance frameworks. When a schema has no owner mapping, or when the owner field is null or malformed, Factor Platform reports it as a dataset quality issue. This gives you a filtered view of all schemas in your environment that are missing the governance properties your organisation requires, surfacing them before an auditor does.
Catching schema quality issues early. Dataset mappings can mark fields as required. If an expected metadata field is absent or returns an unexpected value, Factor Platform flags it. This shifts schema governance from a periodic manual review process to a continuous, automated check that runs on every observation cycle.
Filtering and operating with context. Once lineage metadata is extracted, it becomes a first-class filter in the Factor Platform UI. You can browse schemas by domain, by owner, by tag, or by any custom attribute your mapping configuration exposes. For a team managing hundreds of schemas across multiple clusters, this makes governance questions significantly more practical to answer quickly.

What's coming next
The current release focuses on Schema Registry as the lineage source. In future releases, Factor Platform will also consume OpenLineage events from other producers in your stack, including Flink jobs and Iceberg tables, and will produce OpenLineage events that you can feed into other systems in your governance toolchain.
Speak to our team
Factor Platform is currently in early access. If you're working in a regulated environment and want to understand how lineage support could work for your Kafka setup, book a time with our team.
