Data September 25, 2019 3 min read

Data as a product: why you cannot put one team in charge of all the data

When analytics stops working, the problem is usually not the tools. How to distribute data responsibility across teams.

A standard situation: a company has an analytics department or data team that has been put in charge of "all the data". The team builds pipelines, answers questions, and prepares reports. Meanwhile the actual data lives in systems owned by other teams: CRM, ERP, operational systems. The data team pulls data from those systems but cannot fix quality issues at the source, because the systems are not theirs to control.

This contradiction accumulates slowly, but at some point it surfaces. Analytical reports diverge from each other. Nobody knows whose numbers are correct. The data team spends its time fighting fires rather than building infrastructure.

Where the problem comes from

Historically, data was treated as a by-product of systems. The CRM records data because that is how a CRM works. The ERP stores transactions because accounting cannot function otherwise. Nobody deliberately thought of data as something that needed to be owned separately from the system that produces it.

When there is not much data, this works. When there is a lot of it and decisions depend on it, the problems begin. No single team has a complete picture, and no single team has the mandate or the resources to fix quality across all the systems involved.

In 2019 the technology community is actively discussing a concept called data mesh: an approach to organising data work in which responsibility for data is distributed to the teams that own each domain. The idea is not new, but its current formulation makes it easier to apply.

What "data as a product" means

If the team that owns a domain - say, the orders team - is responsible not only for the functionality of its system but also for the quality and availability of the data it produces, that changes how work is done.

Data from that domain starts to be treated as a product with consumers: other teams, analytics, ML. This product needs to be described: what data it contains, how often it is updated, what each field means. Its quality needs to be guaranteed: the data is complete, correct, and not lost when the system changes. And it needs to be maintained, which means responding to the issues that consumers discover.
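One way to make such a description concrete is a small, machine-readable descriptor kept next to the domain's code. The sketch below is only an illustration in Python: the names `DataProduct` and `FieldSpec`, the `orders` dataset, its fields and its one-hour update interval are all assumptions made for the example, not a standard format or part of any particular tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class FieldSpec:
    name: str
    description: str  # what the field means, written for consumers


@dataclass
class DataProduct:
    name: str
    owner_team: str             # the domain team accountable for quality
    update_interval: timedelta  # how often consumers can expect fresh data
    fields: list[FieldSpec] = field(default_factory=list)

    def undescribed_fields(self) -> list[str]:
        """Names of fields whose description is empty."""
        return [f.name for f in self.fields if not f.description.strip()]

    def is_stale(self, last_updated: datetime, now: datetime) -> bool:
        """True if the data is older than the promised update interval."""
        return now - last_updated > self.update_interval


# Hypothetical example: the orders team describes its own data.
orders = DataProduct(
    name="orders",
    owner_team="orders-team",
    update_interval=timedelta(hours=1),
    fields=[
        FieldSpec("order_id", "Unique identifier of the order"),
        FieldSpec("status", ""),  # not yet described: a gap to close
    ],
)

print(orders.undescribed_fields())
```

The point of the design is that the descriptor names an owning team explicitly, so a missing description or a missed update has someone to go to.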

This requires a different mindset from team leads. You cannot simply say "we shipped the feature, the data will be whatever it is."

Why this is not solved by tools

You can buy or build a data catalogue, configure quality monitoring, implement data lineage. All of that is useful. But none of those tools will work if the data has no owner with a stake in its quality.

Tools make problems visible: who is responsible for a specific dataset, where freshness is broken, which field is undescribed. But someone with the mandate and the motivation has to act on what the tools show.

This is why data quality work is primarily an organisational task. Technology is secondary.

Where to start

If you recognise this situation in your company, here are a few steps that produce results faster than buying a new tool:

1. List the key datasets that underpin both operational and analytical work.
2. For each dataset, ask: who is de-facto responsible for its quality right now? If there is no answer, that is a risk point.
3. Agree that the teams who own the systems are responsible for describing and ensuring basic quality of the data those systems produce.
4. Make this mandate part of what is expected from team leads, rather than an additional burden on the data team.
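The first two steps need nothing more than a shared list. As a minimal sketch, assuming the inventory is kept as a simple mapping from dataset to de-facto owner (the dataset and team names below are hypothetical), the risk points fall out directly:

```python
from __future__ import annotations

from typing import Optional

# Hypothetical inventory: dataset name -> team that answers for its quality today.
# None means nobody could name a de-facto owner.
inventory: dict[str, Optional[str]] = {
    "crm_customers": "sales-systems",
    "erp_transactions": "finance-systems",
    "orders": None,
    "web_events": None,
}


def risk_points(owners: dict[str, Optional[str]]) -> list[str]:
    """Datasets with no de-facto owner, sorted by name."""
    return sorted(name for name, owner in owners.items() if not owner)


def datasets_by_team(owners: dict[str, Optional[str]]) -> dict[str, list[str]]:
    """Group owned datasets by team, so each team lead sees their mandate."""
    grouped: dict[str, list[str]] = {}
    for name, owner in sorted(owners.items()):
        if owner:
            grouped.setdefault(owner, []).append(name)
    return grouped


print("Risk points:", risk_points(inventory))
print("By team:", datasets_by_team(inventory))
```

A spreadsheet does the same job. What matters is that every key dataset ends up with a named team, and that the unowned ones are visible as a list someone has to resolve.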

This is not a fast process. But it is what changes the situation structurally rather than temporarily.

Contact

If this resonated, write to me. I reply personally.