m@ksim.pro
Data 3 min read

Data contracts: how teams agree on integration

Why most data integration problems between teams are problems of agreements, not technology.

In companies with more than one team or system, the same scenario appears sooner or later. Team A passes data to Team B. After a while, Team B complains: the data arrived in a different format, fields were renamed, some records are missing. Team A says they changed nothing - or that the changes were necessary. Untangling it takes days.

I have seen this scenario dozens of times, and every time the root cause was not the technology. It was the absence of an explicit agreement between the teams about what is being passed, in what format, at what frequency, and what happens when things change.

That is exactly what the concept of data contracts addresses.

What a data contract is

A data contract is an explicit agreement between a data producer and its consumers. It includes:

  • data structure: fields, types, which are required;
  • semantics: what each field means and how it is populated;
  • update frequency and conditions;
  • SLA on availability and latency;
  • change procedure: how the producer notifies consumers of planned changes and what the notice period is.
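
As a rough illustration, the elements listed above could be written down in a small machine-readable structure. The sketch below is one possible shape, not a standard: the names `FieldSpec`, `DataContract`, `required_fields`, and the example `orders_contract` with its values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FieldSpec:
    name: str
    dtype: type        # structure: expected Python type of the value
    required: bool     # structure: must the field always be present?
    description: str   # semantics: what the field means, how it is populated


@dataclass(frozen=True)
class DataContract:
    producer: str
    consumers: Tuple[str, ...]
    fields: Tuple[FieldSpec, ...]
    update_frequency: str       # e.g. "hourly"
    max_latency_minutes: int    # SLA on latency
    change_notice_days: int     # change procedure: agreed notice period


# Hypothetical contract for an "orders" flow from Team A to Team B.
orders_contract = DataContract(
    producer="team-a",
    consumers=("team-b",),
    fields=(
        FieldSpec("order_id", str, True, "Unique order identifier, set at creation"),
        FieldSpec("amount", float, True, "Order total, including tax"),
        FieldSpec("coupon", str, False, "Coupon code, if one was applied"),
    ),
    update_frequency="hourly",
    max_latency_minutes=90,
    change_notice_days=14,
)


def required_fields(contract: DataContract) -> list:
    """Names of the fields the producer promises to always populate."""
    return [f.name for f in contract.fields if f.required]
```

The same information fits just as well in a wiki table; the point of the structure is only that every item in the list has an explicit place.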

This can take the form of a technical specification, a wiki document, or a schema in a data catalogue system - the form depends on infrastructure maturity. What matters is that it is an explicit, living agreement, not something oral or assumed.

Why this usually does not happen

The most common reason is that a contract seems like unnecessary bureaucracy while everything is working. "We already know what goes in there" is the typical answer from a team that produces data.

The problem is that "we know" usually means knowing informally, incompletely, and differently from person to person. As soon as someone leaves or a second consumer appears, the implicit knowledge starts to diverge from reality.

The second reason is that contracts are seen as limiting the producer's flexibility. "If I have to give two weeks' notice, I cannot iterate quickly." This is a real tension, but it does not eliminate the need for an agreement - it requires choosing a reasonable balance.
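
One way to reason about that balance, offered here as an assumption rather than a rule: additive changes such as a new optional field rarely hurt consumers, while removals and type changes usually do, so the notice period might apply only to the latter. A minimal sketch that separates the two, using hypothetical schemas expressed as field-to-type dictionaries:

```python
from typing import Dict, List


def diff_schemas(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two {field: type-name} schemas and split changes by impact."""
    breaking: List[str] = []
    additive: List[str] = []
    for name, old_type in old.items():
        if name not in new:
            breaking.append(f"removed: {name}")
        elif new[name] != old_type:
            breaking.append(f"type changed: {name} {old_type} -> {new[name]}")
    for name in new:
        if name not in old:
            additive.append(f"added: {name}")
    return {"breaking": breaking, "additive": additive}


# Hypothetical versions of the same flow: "amount" was renamed to "total".
v1 = {"order_id": "string", "amount": "float"}
v2 = {"order_id": "string", "total": "float", "coupon": "string"}
changes = diff_schemas(v1, v2)
```

Note that a rename shows up as a removal plus an addition, which is exactly how it looks from the consumer's side.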

What this looks like in practice

You do not need to start with a formal system. A simple document for each critical data flow is enough:

  • source and destination;
  • list of fields with types and descriptions;
  • what the producer guarantees;
  • how the producer notifies consumers of changes.

Once this exists explicitly, several things change. New consumers can understand the data independently. Changes are planned ahead rather than made suddenly. Incidents are resolved faster because there is a contract to compare against.
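
To make "a contract to compare against" concrete, here is a minimal sketch of checking one incoming record against a written-down field list. The `CONTRACT` dictionary and the `check_record` function are hypothetical; a real setup might rely on a schema or catalogue tool instead.

```python
from typing import Dict, List, Tuple

# Hypothetical contract: field name -> (expected type, required?)
CONTRACT: Dict[str, Tuple[type, bool]] = {
    "order_id": (str, True),
    "amount": (float, True),
    "coupon": (str, False),
}


def check_record(record: dict, contract: Dict[str, Tuple[type, bool]]) -> List[str]:
    """Return human-readable contract violations for a single record."""
    problems: List[str] = []
    for name, (expected_type, required) in contract.items():
        if name not in record or record[name] is None:
            if required:
                problems.append(f"missing required field: {name}")
            continue
        if not isinstance(record[name], expected_type):
            problems.append(
                f"wrong type for {name}: expected {expected_type.__name__}, "
                f"got {type(record[name]).__name__}"
            )
    # Fields the contract does not mention often point to a rename.
    for name in record:
        if name not in contract:
            problems.append(f"unexpected field: {name}")
    return problems
```

During an incident, running a few suspicious records through a check like this turns "the data looks different" into a specific list of deviations that both teams can discuss.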

Where to start

A few questions to assess the current state:

  1. Does your company have documentation for the key data flows between systems or teams?
  2. Who is notified when a data structure changes? How quickly?
  3. Have there been cases in the past year where a change in one system unexpectedly broke something in another?
  4. When a new developer or analyst joins a project, is there somewhere the data structure is described, or is that knowledge passed verbally?

If most of the answers cause discomfort, now is a good time to start by documenting two or three of the most critical flows. Not perfectly - just explicitly.
