Data September 7, 2022 3 min read

Data contracts between teams: why they matter and how they work

When data crosses team boundaries without explicit agreements, things break. A breakdown of what a data contract is and when you need one.

Picture this: the analytics team builds a report that depends on a table maintained by the product development team. The developers rename a field. They had a reason, and it was an internal decision. The report breaks, and nobody notices until Friday evening, just before the quarterly board meeting.

Neither team did anything wrong within their own area of responsibility. The problem is that the boundaries of responsibility were never explicitly defined.

That is the problem data contracts solve.

What a data contract is

A data contract is an explicit agreement between the team that produces data and the team that consumes it. It describes:

which fields and data types are available;
what freshness guarantees exist (how often it updates, with what latency);
what constitutes a contract violation (schema change, missed update);
who is notified when something changes;
what the change approval process looks like.

This is not a legal document. It is an operational agreement - often just a file in a repository that both teams can read.
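To make "a file in a repository" concrete, here is a minimal sketch in Python. It is one possible shape, not a standard: the names `DataContract` and `check_batch`, the dataset `product.orders`, and its fields and thresholds are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class DataContract:
    """Hypothetical minimal contract; every name here is illustrative."""
    dataset: str
    owner: str                  # producing team
    notify: tuple[str, ...]     # teams told about changes
    fields: dict[str, type]     # field name -> expected Python type
    max_staleness: timedelta    # freshness guarantee
    change_notice_days: int     # notice required before a schema change


def check_batch(contract, rows, last_updated, now):
    """Return a list of human-readable violations (empty if none)."""
    violations = []
    # Missed update: data is older than the agreed freshness window.
    if now - last_updated > contract.max_staleness:
        violations.append(f"stale: last update {last_updated.isoformat()}")
    # Schema change: a promised field is missing or has the wrong type.
    for i, row in enumerate(rows):
        for name, expected in contract.fields.items():
            if name not in row:
                violations.append(f"row {i}: missing field '{name}'")
            elif not isinstance(row[name], expected):
                violations.append(f"row {i}: field '{name}' is not {expected.__name__}")
    return violations


ORDERS_CONTRACT = DataContract(
    dataset="product.orders",
    owner="product-dev",
    notify=("analytics",),
    fields={"order_id": int, "amount": float, "created_at": str},
    max_staleness=timedelta(hours=24),
    change_notice_days=14,
)

now = datetime(2022, 9, 7, 12, 0, tzinfo=timezone.utc)
# The producer renamed order_id to id without telling anyone.
renamed = [{"id": 1, "amount": 9.5, "created_at": "2022-09-07"}]
problems = check_batch(ORDERS_CONTRACT, renamed, last_updated=now - timedelta(hours=1), now=now)
print(problems)  # prints ["row 0: missing field 'order_id'"]
```

The point of the sketch is that the renamed field from the opening story surfaces as a named violation at check time rather than as a broken report.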

When this becomes necessary

In a small team where everyone is in the same Slack channel and talks every day, formal contracts are overkill. One chat message resolves everything.

The situation changes when:

data is used by multiple teams that do not communicate directly;
producers and consumers of data work in different rhythms (product ships a release every two weeks, analytics builds models every quarter);
data feeds automated processes that cannot call anyone to clarify;
the company grows and teams specialise.

The moment a contract becomes necessary is the moment "hang on, I'll ask Alex" stops working as a coordination mechanism.

What happens without contracts

Without explicit agreements, several recurring patterns emerge.

The consuming team fears changes. Any schema or structure update is a potential breakage. The team builds defensive mechanisms, duplicates data, creates buffer layers - all of which accumulates as technical debt.

The producing team does not know what they cannot change. For them it is an "internal table". For consumers it is a production dependency.

Debugging takes a long time. When something breaks, there is no shared understanding of where the problem is or whose responsibility it is. Teams blame each other or spend hours finding the source.

A practical minimum

A full data contract management system is something mature data organisations build. Most companies can start much smaller.

Document current dependencies. Which teams use which data? Is this written down anywhere? Writing it down is the first step toward visibility.
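A dependency inventory does not need tooling to be useful. The sketch below, in Python, assumes a hand-maintained list; the team names, dataset names, and the helper `consumers_by_dataset` are hypothetical.

```python
from collections import defaultdict

# Hypothetical inventory: (consuming team, dataset, what it is used for).
DEPENDENCIES = [
    ("analytics", "product.orders", "quarterly board report"),
    ("finance", "product.orders", "revenue reconciliation job"),
    ("analytics", "product.users", "retention model"),
]


def consumers_by_dataset(dependencies):
    """Group the inventory so each dataset lists who depends on it and why."""
    index = defaultdict(list)
    for team, dataset, purpose in dependencies:
        index[dataset].append((team, purpose))
    return dict(index)


index = consumers_by_dataset(DEPENDENCIES)
# Who is affected if product.orders changes?
for team, purpose in index["product.orders"]:
    print(f"{team}: {purpose}")
```

Even a list this small lets the producing team see that its "internal table" has two consumers before it changes anything.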

Introduce schema change notifications. Before thinking about contracts, simply agree: the team that changes a data structure notifies consumers N days in advance.
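One way to decide when a notification is due is to compare the schema before and after a proposed change. The following Python sketch assumes schemas are stored as plain field-to-type mappings and that only removed, renamed, or retyped fields count as breaking; the function names, field names, and the 14-day notice period are illustrative assumptions.

```python
from datetime import date, timedelta


def breaking_changes(old_schema, new_schema):
    """Compare two {field: type name} mappings and list changes that can break consumers.

    Newly added fields are treated as non-breaking and are not reported.
    """
    changes = []
    for name, old_type in old_schema.items():
        if name not in new_schema:
            changes.append(f"field '{name}' removed or renamed")
        elif new_schema[name] != old_type:
            changes.append(f"field '{name}' changed type: {old_type} -> {new_schema[name]}")
    return changes


def earliest_release(announced_on, notice_days):
    """First date the change may ship if consumers get `notice_days` of warning."""
    return announced_on + timedelta(days=notice_days)


old = {"order_id": "int", "amount": "float", "created_at": "timestamp"}
new = {"id": "int", "amount": "decimal", "created_at": "timestamp", "channel": "string"}

changes = breaking_changes(old, new)
ship_date = earliest_release(date(2022, 9, 7), notice_days=14)

for change in changes:
    print(change)
print("may ship on or after", ship_date.isoformat())
```

A check like this could run in the producing team's review process, so the notification is triggered by the change itself and not by someone remembering to send it.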

Define which data is "public". Not all internal data needs to be stable for external consumers. Distinguishing "internal data" from "data for other teams" is already a significant improvement.
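The public/internal distinction can also be recorded and checked mechanically. This is a small Python sketch under the assumption that the producing team keeps a visibility label per dataset and that anything unlabelled counts as internal; the labels, dataset names, and `unsupported_dependencies` are hypothetical.

```python
# Hypothetical visibility labels maintained by the producing team.
VISIBILITY = {
    "product.orders": "public",
    "product.orders_staging": "internal",
}


def unsupported_dependencies(visibility, used_datasets):
    """Return the datasets a consumer uses that are not declared public.

    Datasets with no label are treated as internal.
    """
    return sorted(d for d in used_datasets if visibility.get(d) != "public")


used_by_analytics = {"product.orders", "product.orders_staging"}
print(unsupported_dependencies(VISIBILITY, used_by_analytics))
# prints ['product.orders_staging']
```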

Three questions for a leader

If you want to understand whether your company needs data contracts, ask these questions:

If team A changes the structure of its database, how does team B, which uses it, find out?
Have you had incidents where something broke because of data changes nobody warned about?
Do you have data that automated processes or management reports depend on?

If the answer to the second question is "yes" - data contracts address exactly that problem.

