Data January 22, 2015 3 min read

Metadata is not an appendix to data

Why a data catalog and metadata ownership are an infrastructure question, not bureaucracy.

When a company starts talking about data governance, metadata almost always ends up at the bottom of the list. First the warehouse, then ETL, then dashboards - and the data catalog and field descriptions are "something we will do later". That later, in most cases, never arrives.

The result is predictable: analysts spend several hours a week figuring out what a specific field in the database actually means, or which calculation sits behind a metric in a report. People ask people instead of reading documentation.

What metadata means in practice

Metadata is the answer to the question "what is this and where did it come from". For a database table, that means column descriptions, the data source, the refresh frequency, and the owner. For a metric in a report, it means the calculation formula, the time period, and the exclusions.

Without metadata, the data exists, but using it is hard. Every person who encounters the field "sum_net" for the first time has to go ask someone. Or guess. Both options are bad.
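
The table-level metadata listed above can be sketched as a small data structure. This is only an illustration, assuming plain Python dataclasses rather than any particular catalog tool; the class names, the "orders" table, its source, and its owner are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ColumnMetadata:
    name: str
    description: str  # business meaning of the column; empty if nobody wrote it down


@dataclass
class TableMetadata:
    name: str
    source: str             # where the data comes from
    refresh_frequency: str  # how often the table is reloaded, e.g. "daily"
    owner: str              # person or team answerable for the definitions
    columns: List[ColumnMetadata] = field(default_factory=list)

    def undocumented_columns(self) -> List[str]:
        """Return the names of columns whose description is empty."""
        return [c.name for c in self.columns if not c.description.strip()]


# Hypothetical example: every value below is invented for illustration.
orders = TableMetadata(
    name="orders",
    source="billing system export",
    refresh_frequency="daily",
    owner="finance team",
    columns=[
        ColumnMetadata("order_id", "Unique identifier of the order"),
        ColumnMetadata("sum_net", ""),  # the field everyone has to ask about
    ],
)

print(orders.undocumented_columns())  # → ['sum_net']
```

Even a structure this small makes the gap visible: the list of undocumented columns is exactly the list of questions people would otherwise have to ask each other.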

Why this matters beyond the analytics team

When only analysts use the data, the problem is tolerable. Knowledge can live in two or three heads and people ask each other.

But as soon as data is used more broadly - in machine learning models, in integrations with other systems, in automated reports for management - missing metadata becomes a systemic risk. A model trains on a field whose meaning nobody documented. An integration moves data assuming one thing, while the source intended another.

Errors of this kind are hard to spot immediately. They surface later - when the output has already been used in a decision.

Who should own metadata

The technical recording of metadata is an engineering task. But the content (what the metric means, how it is calculated, what its limitations are) is knowledge that lives in the business.

This means metadata management cannot be a purely IT project. It is shared work: the business supplies definitions and rules, engineers record them in the system and keep them current.

When that shared work does not happen, IT writes technical descriptions that the business does not understand - or writes nothing at all.

When a data catalog is worth it

A small team with a few dozen tables can get by with a structured wiki or a simple document. A data catalog as a tool becomes useful when there are many sources, when the same data is used across multiple teams, or when you need to trace where a specific value in a specific table came from.

The tool solves only part of the problem. The main thing is an agreement on who maintains the descriptions and what happens when a field changes its meaning.
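
One way to make "what happens when a field changes its meaning" concrete is to compare two snapshots of the field descriptions and flag every difference for the owner. The sketch below assumes descriptions are stored as a simple name-to-text mapping; the function name, the snapshots, and the wording of the descriptions are hypothetical.

```python
from typing import Dict, List


def changed_definitions(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Return sorted names of fields whose description was changed, added, or removed."""
    names = set(old) | set(new)
    return sorted(n for n in names if old.get(n) != new.get(n))


# Hypothetical snapshots of field descriptions, before and after a change.
before = {
    "order_id": "Unique identifier of the order",
    "sum_net": "Order total without VAT",
}
after = {
    "order_id": "Unique identifier of the order",
    "sum_net": "Order total without VAT and discounts",
}

for name in changed_definitions(before, after):
    print(f"Definition changed: {name} - notify the owner and downstream users")
```

The comparison itself is trivial. The agreement behind it is the hard part: someone has to be named as the recipient of that notification.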

A few practical questions

If you are unsure how serious the problem is in your company, try answering these:

If a new analyst starts tomorrow - is there a document that explains the key metrics and data sources?
If a field in the database changes its logic - who will know, and how quickly?
Is there a metric in the company that different people calculate differently - and each thinks they are right?
When did IT and the business last sit down together and work out what a specific metric actually means?

If the third question describes a familiar situation, the metadata conversation is overdue. This is not bureaucracy. It is infrastructure - without which data stays raw material rather than a usable resource.
