Model drift: why an ML system degrades without visible failures
Machine learning models in production lose accuracy over time - quietly, with no errors and no alerts. What drift is and how to monitor for it.
One of the most unpleasant properties of ML systems in production is that they can degrade without a single incident in the monitoring dashboard. The service responds, latency is fine, no errors. Recommendations just quietly get worse, scores become less accurate, forecasts less relevant. Until it starts showing up in business metrics, nobody may notice.
This is called model drift. It is a systemic problem, and most teams run into it when they move ML from a pilot into real operations.
Where drift comes from
A model is trained on historical data. When the world changes, so does the distribution of data the model is expected to work with. There are two main types:
Data drift - the input data has changed. Users behave differently, the product range has shifted, the season has changed, external conditions are different. The model sees data that differs from its training distribution and starts performing worse.
Concept drift - the relationship between the features and the target variable has itself changed. A credit scoring model trained before a period of rising interest rates is a classic example. After rates change, the features that previously predicted creditworthiness no longer work the same way. The inputs can look exactly as they did before, yet the same inputs now lead to different outcomes.
Both types usually happen gradually and without obvious markers, which is why they slip past checks that only look for sudden failures.
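To make the distinction concrete, here is a toy sketch in Python. Everything in it is invented for illustration: the frozen "model" is a straight line that approximates y = x**2 on the range [0, 1], and the two "worlds" are simple functions standing in for reality. In the data drift case only the inputs move; in the concept drift case only the relationship moves.

```python
def model(x):
    # Hypothetical frozen model: a linear fit to y = x**2 on the training range [0, 1].
    return x - 1 / 6


def grid(lo, hi, n=101):
    # Evenly spaced inputs standing in for a sample of traffic.
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]


def mean_abs_error(xs, truth):
    return sum(abs(model(x) - truth(x)) for x in xs) / len(xs)


def old_world(x):
    # The relationship the model was trained on.
    return x ** 2


def new_world(x):
    # Same inputs, different relationship (assumed shift of +0.5).
    return x ** 2 + 0.5


baseline = mean_abs_error(grid(0, 1), old_world)       # training conditions
data_drift = mean_abs_error(grid(2, 3), old_world)     # inputs moved, relationship intact
concept_drift = mean_abs_error(grid(0, 1), new_world)  # inputs intact, relationship moved

print(f"baseline={baseline:.3f} data_drift={data_drift:.3f} concept_drift={concept_drift:.3f}")
```

In both drifted cases the model code is unchanged and raises no errors; only the error against reality grows.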
Why standard monitoring does not help
Classic application monitoring tracks system metrics: uptime, response time, error rate. For ML that is not enough.
Model quality is not a system metric - it is a statistical property. To track it you need:
- to know the "right answer" for some portion of predictions (ground truth);
- to compare the distribution of incoming data against the training distribution;
- to calculate quality metrics on live traffic, not just on a held-out test set.
Many teams compute quality metrics only at the moment a new model version is deployed - and miss everything that happens in between.
What actually helps detect drift earlier
Practical approaches I have seen working in real systems:
Shadow scoring. A new model version runs in parallel with the old one on real data without affecting the output. Divergence between predictions is a signal to investigate.
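A minimal sketch of how divergence between the two models could be measured, assuming both return a score between 0 and 1 for the same requests. The function name, the report fields, and the 0.5 decision threshold are assumptions for illustration, not part of any particular framework.

```python
from dataclasses import dataclass


@dataclass
class ShadowReport:
    n: int
    disagreement_rate: float   # share of requests where the decisions differ
    mean_abs_score_gap: float  # average absolute difference in raw scores


def compare_shadow(primary_scores, shadow_scores, threshold=0.5):
    """Compare live and shadow scores computed on the same requests."""
    if not primary_scores or len(primary_scores) != len(shadow_scores):
        raise ValueError("need two non-empty score lists of equal length")
    pairs = list(zip(primary_scores, shadow_scores))
    # A disagreement is a request where the two models land on different sides
    # of the decision threshold.
    disagreements = sum((p >= threshold) != (s >= threshold) for p, s in pairs)
    gap = sum(abs(p - s) for p, s in pairs) / len(pairs)
    return ShadowReport(len(pairs), disagreements / len(pairs), gap)


report = compare_shadow([0.9, 0.2, 0.6, 0.4], [0.8, 0.3, 0.4, 0.45])
print(report)
```

Tracking both numbers is a deliberate choice in this sketch: the score gap can start growing well before the final decisions begin to flip.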
Feature distribution monitoring. Watch the statistical characteristics of incoming data - means, quantiles, category proportions. A meaningful shift away from the training distribution is a reason to review the model.
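One common way to turn "a meaningful shift" into a number is the Population Stability Index (PSI), which compares how a feature's values fall into bins in a reference sample and in a current sample. The sketch below is a plain-Python version under stated assumptions: a single numeric feature, bin edges taken from quantiles of the reference sample, and a small floor (eps) to avoid taking the logarithm of zero. The function name and defaults are illustrative.

```python
import math
from bisect import bisect_right


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index of `current` relative to `reference`."""
    ref_sorted = sorted(reference)
    # Interior bin edges at the reference quantiles.
    edges = [ref_sorted[int(len(ref_sorted) * i / bins)] for i in range(1, bins)]

    def proportions(values):
        counts = [0] * bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # Floor empty bins at eps so the log below stays defined.
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


training_sample = [i / 1000 for i in range(1000)]
shifted_sample = [0.3 + i / 1000 for i in range(1000)]

psi_same = psi(training_sample, training_sample)
psi_shifted = psi(training_sample, shifted_sample)
print(psi_same, psi_shifted)
```

A commonly quoted rule of thumb treats values above roughly 0.25 as a significant shift. Treat that as a starting point to calibrate on your own data rather than as a standard.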
Delayed labelling. For some fraction of predictions, collect the actual outcome after the fact and compute accuracy on that sample. It requires discipline in data collection, but gives a direct measure of quality.
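A sketch of the bookkeeping this implies, assuming predictions are logged under an identifier and the outcomes arrive later under the same identifier. The dictionary layout and the function name are hypothetical; in practice both sides would typically live in a database or a feature store.

```python
def accuracy_on_labelled(predictions, outcomes):
    """Return (accuracy, coverage) for predictions that have a known outcome.

    predictions: dict mapping prediction id -> predicted label
    outcomes:    dict mapping prediction id -> actual label, usually a subset
    """
    labelled = [pid for pid in outcomes if pid in predictions]
    if not labelled:
        return None, 0.0
    correct = sum(predictions[pid] == outcomes[pid] for pid in labelled)
    accuracy = correct / len(labelled)
    coverage = len(labelled) / len(predictions)
    return accuracy, coverage


logged = {"a": 1, "b": 0, "c": 1, "d": 0}
arrived_later = {"a": 1, "c": 0, "zzz": 1}  # "zzz" has no logged prediction and is ignored

accuracy, coverage = accuracy_on_labelled(logged, arrived_later)
print(accuracy, coverage)
```

Reporting coverage next to accuracy matters: an accuracy figure computed on a tiny or unrepresentative labelled fraction says little about live quality.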
Business metrics as an indirect signal. Conversion rates, returns, complaint rates - all of these can reflect model degradation before it becomes visible in technical metrics.
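As a rough illustration, a business metric can be checked against its own recent history. The sketch below flags a value that falls outside a band of k standard deviations around the trailing mean. The weekly conversion figures and the choice of k are made up, and a real check would also need to account for seasonality and for changes unrelated to the model.

```python
from statistics import mean, stdev


def out_of_band(history, latest, k=3.0):
    """True if `latest` is more than k standard deviations from the trailing mean."""
    mu = mean(history)
    sigma = stdev(history)
    return abs(latest - mu) > k * sigma


# Hypothetical weekly conversion rates.
weekly_conversion = [0.050, 0.052, 0.049, 0.051, 0.050, 0.048]

alarm = out_of_band(weekly_conversion, 0.040)
quiet = out_of_band(weekly_conversion, 0.0505)
print(alarm, quiet)
```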
The organisational question
Model drift is not a problem you solve once at deployment. It is an ongoing operational task. Whether an ML engineer, a data analyst, or a product team owns it depends on how a given company is structured.
The important thing is that the question is answered before the model goes to production. "We will retrain when we notice degradation" is a plan with no metrics and no triggers. That plan tends to fire several months too late.
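By contrast, a plan with metrics and triggers can be written down before launch. The sketch below shows one possible shape for it: named thresholds, a named owner, and a function that reports which triggers have fired. All field names and threshold values are placeholders to be agreed per model, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrainPolicy:
    owner: str                            # who is paged when a trigger fires
    max_feature_drift: float = 0.25       # limit on the feature drift score
    min_live_accuracy: float = 0.80       # floor for accuracy on delayed labels
    max_shadow_disagreement: float = 0.10


def fired_triggers(policy, feature_drift, live_accuracy, shadow_disagreement):
    """Return the names of all triggers that the current measurements set off."""
    fired = []
    if feature_drift > policy.max_feature_drift:
        fired.append("feature_drift")
    # live_accuracy may be None when no labelled outcomes have arrived yet.
    if live_accuracy is not None and live_accuracy < policy.min_live_accuracy:
        fired.append("accuracy_drop")
    if shadow_disagreement > policy.max_shadow_disagreement:
        fired.append("shadow_divergence")
    return fired


policy = RetrainPolicy(owner="ml-platform-team")
healthy = fired_triggers(policy, feature_drift=0.05, live_accuracy=0.91, shadow_disagreement=0.02)
degraded = fired_triggers(policy, feature_drift=0.40, live_accuracy=0.72, shadow_disagreement=0.02)
print(healthy, degraded)
```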