Data lineage: where the number in the report came from, and who owns it
Tracing metrics back to their source is not a technical nicety - it is the foundation of trust in analytics.
In a meeting, someone presents a report. One of the participants asks: "Why does it say 4.7 million here, when the table I looked at yesterday showed 5.1?" A pause. People glance at each other. Someone says "probably different periods." Someone else says "or different filters." The outcome: nobody trusts the numbers, and the decision is deferred.
This is not a technical failure. It is the absence of what is called data lineage - the ability to trace where a number came from, what transformations it went through, and who is responsible for each step.
What lineage is and why it matters
Lineage is a documented chain from the data source to the number in the report. At its simplest: which table the data came from, how it was filtered, how it was aggregated, when it was last updated.
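To make this concrete, here is a minimal sketch of a lineage record as a plain data structure, with a helper that compares two records field by field. The field names (`source_table`, `period`, `filters`, `aggregation`, `last_updated`), the table name `sales.orders` and all values are hypothetical, chosen to mirror the meeting above; they do not come from any particular lineage tool.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical record of how one reported number was produced."""
    metric: str
    source_table: str
    period: tuple[date, date]   # inclusive start and end of the reporting period
    filters: tuple[str, ...]    # conditions applied before aggregation
    aggregation: str            # how rows were rolled up into the number
    last_updated: date          # when the underlying data was last refreshed


def explain_difference(a: LineageRecord, b: LineageRecord) -> dict[str, tuple]:
    """Return every field whose value differs between two lineage records."""
    return {
        f.name: (getattr(a, f.name), getattr(b, f.name))
        for f in fields(LineageRecord)
        if getattr(a, f.name) != getattr(b, f.name)
    }


# Illustrative values only: the "report" and the "table" from the meeting.
report = LineageRecord(
    metric="revenue",
    source_table="sales.orders",
    period=(date(2024, 1, 1), date(2024, 3, 31)),
    filters=("status = 'paid'",),
    aggregation="sum(amount)",
    last_updated=date(2024, 4, 2),
)
table = LineageRecord(
    metric="revenue",
    source_table="sales.orders",
    period=(date(2024, 1, 1), date(2024, 3, 31)),
    filters=("status in ('paid', 'pending')",),
    aggregation="sum(amount)",
    last_updated=date(2024, 4, 1),
)

for name, (left, right) in explain_difference(report, table).items():
    print(f"{name}: report={left!r} table={right!r}")
```

With records like these, "probably different periods" is replaced by a concrete answer: in this made-up case the two numbers share a source and a period but differ in their filters and refresh date.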
Why does a manager - not an analyst, but a manager - need this? Because without it there is no principled way to trust the numbers. An analyst says "it's all correct", but nobody else can verify that claim. When something does not add up, the investigation takes hours or days, and in complex cases it never reaches a definitive answer. An earlier layer of the problem, dirty master data that corrupts a report before lineage even comes into play, is a separate concern.
Lineage turns "I trust it because a person says so" into "I trust it because I can check."
What the world looks like without lineage
In most companies I have worked with, the picture is roughly the same:
- several reports on the same metric produce different numbers;
- nobody knows for certain which one is right;
- there is "the person who counts this" - and as long as they are around, things work;
- when they leave or go on holiday, things fall apart;
- when a new system is connected, it turns out the calculation logic is written down nowhere.
This is what a bus factor of one looks like when applied to data: the calculation logic lives in a single person's head. Lineage is the way to pull that knowledge out.
How lineage works in practice
Lineage does not require a specialised tool. It is enough to start simple:
- every key metric should have a definition: what exactly is counted, for which period, which cases are excluded;
- every data source for a report should be named explicitly;
- if data goes through a transformation, that transformation should be documented or, better, written in code;
- every report should carry the date of its last update and the name of its owner.
This sounds straightforward. In practice companies go years without even this.
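The four rules above can be turned into a simple completeness check over a metric registry. The sketch below assumes a hypothetical `MetricCard` entry per key metric and a hypothetical `find_gaps` function; the seven-day staleness threshold, the file path `metrics/revenue.sql` and the sample entries are illustrative assumptions, not a standard.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class MetricCard:
    """Hypothetical registry entry describing one key metric."""
    name: str
    definition: str               # what is counted, for which period, what is excluded
    sources: list[str]            # every source table, named explicitly
    transformation: str           # where the calculation code lives (hypothetical path)
    last_updated: Optional[date]  # date of the last refresh
    owner: Optional[str]          # the specific person responsible


def find_gaps(card: MetricCard, today: date, max_age_days: int = 7) -> list[str]:
    """List which of the basic lineage rules this metric card violates."""
    gaps = []
    if not card.definition.strip():
        gaps.append("no definition")
    if not card.sources:
        gaps.append("no named source")
    if not card.transformation.strip():
        gaps.append("transformation not documented")
    if card.owner is None or not card.owner.strip():
        gaps.append("no owner")
    if card.last_updated is None:
        gaps.append("no last-update date")
    elif today - card.last_updated > timedelta(days=max_age_days):
        # The staleness threshold is an assumption; tune it per metric.
        gaps.append(f"stale: last updated {card.last_updated.isoformat()}")
    return gaps


# Illustrative entries only.
registry = [
    MetricCard(
        name="revenue",
        definition="Sum of paid order amounts in the calendar month, excluding refunds",
        sources=["sales.orders", "sales.refunds"],
        transformation="metrics/revenue.sql",
        last_updated=date(2024, 4, 2),
        owner="Head of Commercial",
    ),
    MetricCard(
        name="downtime",
        definition="",
        sources=["plant.events"],
        transformation="",
        last_updated=date(2024, 3, 1),
        owner=None,
    ),
]

today = date(2024, 4, 3)
for card in registry:
    print(card.name, find_gaps(card, today) or "ok")
```

A check like this does not make the definitions correct, but it makes missing definitions, undocumented transformations and ownerless metrics visible instead of discovered in a meeting.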
Who owns the number
Lineage without ownership is a dictionary without an author. The second element is data ownership: every key metric needs a specific person who:
- knows how it is calculated;
- ensures the calculation does not break when systems change;
- explains discrepancies when they appear.
This does not have to be an analyst. It can be the head of the commercial team who owns revenue, or the production manager who owns downtime. What matters is that there is a specific name.
Signs that lineage needs attention
A few questions that help identify where to start:
- If the person who builds the main report left tomorrow, could someone else reproduce it from scratch in a week?
- If an error appeared in a report, how quickly would it be found and fixed?
- Can you explain to the CEO where each key number comes from in under five minutes?
- When two reports disagree, is there a procedure that gives a definitive answer?
If the answer to even half of these is "no" - that is not a technology problem. It is a data discipline problem, and it is not solved by rolling out a new system. It is solved by assigning ownership and documenting what is already there.