m@ksim.pro
AI 2 min read

AI does not fix bad data

A short note on why an AI rollout in a company starts not with the model, but with the quality of the data underneath.

Almost every "let's roll out AI" conversation with a founder starts the same way. There is a problem, there is hope for fast automation, and there is a feeling that the technology will solve what the team has been failing to solve manually for years.

I get it. But in practice the wall is rarely the model. The wall is the data.

What "bad data" actually means

Bad data is not always wrong numbers. More often it looks like this:

  • the same entity is named differently in different systems;
  • chunks of context only live in people's heads;
  • dates, currencies, and units are normalised, but each system does it differently;
  • there is one "source of truth" on paper and three in practice;
  • between ERP, CRM, and Excel, data moves by hand, with losses.
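To make the date point concrete, here is a minimal sketch of what "not guessing" looks like in code. It assumes, hypothetically, that the source systems use one of four known date formats (`KNOWN_FORMATS` is an illustrative list, not taken from any real system). A string that parses to different dates under different formats is flagged as ambiguous instead of being silently resolved, which is exactly the decision a model would otherwise make for you.

```python
from datetime import date, datetime
from typing import Optional, Tuple

# Hypothetical: the date formats we assume our ERP, CRM, and Excel exports use.
KNOWN_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d/%m/%Y", "%d.%m.%Y"]


def parse_date(raw: str) -> Tuple[Optional[date], str]:
    """Return (date, status), where status is 'ok', 'ambiguous', or 'unparseable'."""
    candidates = set()
    for fmt in KNOWN_FORMATS:
        try:
            candidates.add(datetime.strptime(raw.strip(), fmt).date())
        except ValueError:
            continue  # this format does not fit the string; try the next one
    if len(candidates) == 1:
        return candidates.pop(), "ok"
    if len(candidates) > 1:
        # e.g. "03/04/2024" is 4 March (US) or 3 April (EU): a person must decide
        return None, "ambiguous"
    return None, "unparseable"
```

For example, `parse_date("03/04/2024")` comes back as ambiguous, while `parse_date("05/05/2024")` is fine because every matching format agrees on the same day. The useful output is the list of ambiguous rows, not the parsed dates.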

AI cannot fix any of that. It will average it, guess, generate a confident-sounding answer, and lock the error into a polished wrapper.

Why the model does not heal the chaos

Modern models are good at generalising patterns and bad at guessing what the data does not contain. If the company has historically conflated "revenue" with "turnover" in reports, the model will keep conflating them. If a vendor is recorded as both "Acme Inc." and "Inc. Acme", it stays two different counterparties until a person or a process links them.
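The "process" that links counterparties can start very small. Below is a minimal sketch, assuming a crude matching key (lowercase, drop punctuation, unify a few legal-form spellings, sort the words); the alias table and function names are hypothetical and only illustrate the idea. It does not decide that two names are the same company; it only surfaces candidate groups for a person to confirm.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Hypothetical alias table: legal-form spellings we choose to treat as one token.
LEGAL_FORM_ALIASES = {"incorporated": "inc", "corporation": "corp", "limited": "ltd"}


def match_key(name: str) -> str:
    """Crude matching key: lowercase, strip punctuation, unify legal forms, sort words."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    tokens = [LEGAL_FORM_ALIASES.get(t, t) for t in tokens]
    return " ".join(sorted(tokens))


def group_counterparties(names: List[str]) -> Dict[str, List[str]]:
    """Group raw vendor names by matching key; groups with more than one name need review."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        groups[match_key(name)].append(name)
    return dict(groups)
```

With this key, "Acme Inc." and "Inc. Acme" land in the same group. The key is deliberately naive and could also merge genuinely different companies that share words, which is why the final link stays with a person rather than with the script.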

AI does not replace data discipline. It amplifies whatever is already there, good or bad.

What a real rollout starts with

When I look at a new AI project, I almost always start with the same questions:

  1. What data does the task need, and where does it physically live?
  2. Who is responsible for its correctness today?
  3. What hidden quirks does this data have?
  4. What part of it can be turned into a managed process, instead of a one-off export?
  5. What is the actual usage scenario for the result: who works with it, when, and how?

Only after that does it make sense to talk about the model, the agent, or the LLM integration.

A simple test

If the question "where will the data for this AI feature come from?" makes everyone in the room look at each other, the project does not start with the model. It starts with data engineering.

That rarely looks impressive in a presentation. But it is exactly what decides whether the AI project will still be running a year from now, or whether it has quietly become an expensive toy.

Contact

If this resonated, write to me. I reply personally.

m@ksim.pro