MLOps: the gap between experiment and production
Why most ML experiments never reach production, and what to do about it at the organisational level.
There is a pattern I keep seeing in companies that have started working with machine learning. A data scientist trains a good model. The metrics on the test set look convincing. The demo at the meeting is impressive. Then "handover to engineering" begins - and something goes wrong.
Sometimes the model never appears in production. Sometimes it does, but behaves differently from the demo. Sometimes it works for the first month and then quietly degrades because the data changed and nobody was watching.
This is called the ML production gap - the gap between an experiment and a working system. And it is as much an organisational problem as a technical one.
Why experiment and production live in different worlds
An experiment is optimised for speed and flexibility: a notebook, version history kept in local files, data loaded once, a model trained and evaluated once. The goal is to find out whether the idea works.
Production is optimised for reliability and reproducibility. Data needs to arrive regularly and in a standard format. The model needs to behave consistently at any hour. There needs to be monitoring. There needs to be a process for updating when quality degrades.
These are fundamentally different requirements placed on the same artefact - the model. And the gap between them does not close on its own.
What MLOps involves, and why a manager needs to know
MLOps is a set of practices and tools for turning ML experiments into systems that work reliably. A manager does not need to know the implementation details, but it is useful to understand what this work involves.
- Data and code versioning - so that any result can be reproduced.
- Training pipelines - so that a model can be retrained without manual work when data changes.
- Model deployment and versioning - so you can roll back if a new version is worse than the old one.
- Quality monitoring in production - so you know when a model starts to degrade.
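To make "versioning so you can roll back" concrete, here is a minimal sketch in Python. It is an illustration only: the `ModelRegistry` class, the version labels and the file paths are all hypothetical, and a real setup would store this in a database or a dedicated model registry rather than in memory. The point is only that old versions are kept and that the live version is a pointer that can be moved back.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry: keeps every version, tracks which is live."""
    versions: dict = field(default_factory=dict)  # version label -> artefact location
    history: list = field(default_factory=list)   # version labels in order of promotion

    def register(self, version: str, artefact: str) -> None:
        # Refuse to overwrite: an existing version must stay reproducible.
        if version in self.versions:
            raise ValueError(f"version {version!r} is already registered")
        self.versions[version] = artefact

    def promote(self, version: str) -> None:
        # Only a registered version can become the live one.
        if version not in self.versions:
            raise KeyError(version)
        self.history.append(version)

    @property
    def live(self):
        return self.history[-1] if self.history else None

    def rollback(self) -> str:
        # Drop the latest promotion and return to the one before it.
        if len(self.history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self.history.pop()
        return self.history[-1]


registry = ModelRegistry()
registry.register("v1", "models/churn-v1.pkl")  # illustrative paths, not real files
registry.register("v2", "models/churn-v2.pkl")
registry.promote("v1")
registry.promote("v2")
registry.rollback()  # v2 turned out worse than v1
print(registry.live)
```

Note that rolling back does not delete `v2`: it stays registered, so the team can still investigate why it underperformed.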
Without this layer, an ML project will either stall at the experiment stage, or create an opaque system that nobody dares touch.
Signs the problem is there
A few things I hear that are symptoms of the gap:
"We have good models but we can't ship them quickly" - there is no deployment process.
"The model worked well and then started giving strange results" - there is no monitoring for data drift or quality degradation.
"Only Ivan knows how to retrain this model" - there are no reproducible pipelines.
"We deployed the new version and the old one is gone" - there is no versioning.
Each of these symptoms points to a separate piece of organisational and technical debt.
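The second symptom, a model that "started giving strange results", can often be caught early with a crude drift check. The sketch below computes the Population Stability Index (PSI) for a single numeric feature, comparing the values seen in training with the values seen in production. The function name, the bin count and the alert threshold are assumptions for illustration. A commonly quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but the threshold should be tuned per feature.

```python
import math


def psi(reference, current, bins=10):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all reference values are equal

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range are clipped into the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Synthetic data for illustration only: a feature that shifts upwards in production.
training = [i / 100 for i in range(1000)]
production = [x + 3 for x in training]

ALERT_THRESHOLD = 0.25  # assumed rule-of-thumb value, tune per feature
drifted = psi(training, production) > ALERT_THRESHOLD
print("drift alert" if drifted else "ok")
```

A check like this does not say whether the model is still accurate. It only says that the inputs no longer look like the training data, which is usually the earliest available warning.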
Practical steps
Most small ML projects do not need a complex MLOps platform. They need something minimal but functional:
- Every experiment is recorded - data, parameters, metrics. At a minimum, a structured log; better, a dedicated experiment tracker.
- Model deployment is a repeatable process, not a manual operation. Even a script with documentation counts.
- In production, model quality is measured (for example, accuracy on a labelled sample), not just system performance such as latency and uptime.
- There is an owner responsible for the model's health in production - a person who receives alerts and makes decisions.
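The first item in the list above, "at a minimum, a structured log", can be as small as the following sketch. It appends one JSON record per experiment to a file, including a hash of the training data so that a result can later be tied to the exact data it came from. The function name, the field names and the file names are assumptions for illustration, and the parameter and metric values in the usage part are made up.

```python
import hashlib
import json
import tempfile
import time
from pathlib import Path


def log_experiment(log_path, data_path, params, metrics):
    """Append one experiment record as a JSON line and return the record."""
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        # Hash of the data file: identical data always yields the same value.
        "data_sha256": hashlib.sha256(Path(data_path).read_bytes()).hexdigest(),
        "params": params,
        "metrics": metrics,
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    return record


# Usage with a throwaway directory; all values below are invented for the example.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "train.csv"
    data.write_text("x,y\n1,0\n2,1\n", encoding="utf-8")
    log = Path(tmp) / "experiments.jsonl"

    log_experiment(log, data, {"max_depth": 4}, {"auc": 0.81})
    log_experiment(log, data, {"max_depth": 6}, {"auc": 0.83})

    records = [json.loads(line) for line in log.read_text(encoding="utf-8").splitlines()]

print(len(records), "experiments logged")
```

An append-only file like this is enough to answer "which data and parameters produced that number?" months later. A dedicated experiment tracker adds search, comparison and a UI on top of the same idea.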
This is not full-scale MLOps. But it is the difference between a system that stays alive and one that degrades quietly.