m@ksim.pro

ML models decay silently - and most companies do not notice

A model that was accurate at launch will gradually stop being accurate as the world changes. Why monitoring for model decay is not optional, and how to set it up before it becomes an incident.

Deploying a machine learning model is often treated as the finish line. The model was validated, the results looked good, and it is now running in production. What happens after that gets much less attention.

Models decay. The world they were trained on changes. Customer behaviour shifts. Product mix evolves. Seasonal patterns drift. The model continues to produce predictions with the same apparent confidence while its accuracy quietly degrades. Because the output format does not change, nobody notices - until a business outcome makes it obvious.

Why this is different from software bugs

A traditional software bug has a clear symptom: the system crashes, returns an error, or produces a visibly wrong result. Model decay is slower and quieter. The predictions are still numerically valid. The API still responds. The only signal is that business outcomes tied to the model's outputs start getting worse, and that connection is rarely obvious to the people noticing the outcomes.

This makes model decay much easier to ignore than a service failure. There is no alert. There is just a gradual drift that takes months to surface in business metrics, by which time the model's accuracy may have fallen well below where it was at launch.

What causes it

The most common causes I see in production systems are distribution shift and concept drift.

Distribution shift means the statistical properties of the input data have changed. If a model was trained on customer data from 2015-2017 and is now scoring customers who behave differently because the product or market has evolved, the inputs no longer match the training distribution.
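One common way to put a number on "the inputs no longer match the training distribution" is the population stability index (PSI), widely used in credit scoring. The sketch below is a minimal, stdlib-only version for a single numeric feature. It is not taken from any particular monitoring tool, and the bin count, the `eps` floor and the synthetic data are illustrative assumptions.

```python
import bisect
import math
import random
from typing import Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a reference sample (e.g. training data) and a recent sample.

    Bin edges come from quantiles of the reference sample, so each bin holds
    roughly the same share of the training data.
    """
    ref = sorted(expected)
    # n_bins - 1 interior cut points at the reference quantiles.
    edges = [ref[len(ref) * k // n_bins] for k in range(1, n_bins)]

    def bin_shares(values: Sequence[float]) -> list[float]:
        counts = [0] * n_bins
        for v in values:
            # bisect_right returns the number of edges <= v, i.e. the bin index.
            counts[bisect.bisect_right(edges, v)] += 1
        # The eps floor keeps log() finite when a bin is empty.
        return [max(c / len(values), eps) for c in counts]

    e = bin_shares(expected)
    a = bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


# Demo on synthetic data: same distribution vs. a mean shift of one standard deviation.
rng = random.Random(0)
train = [rng.gauss(50, 10) for _ in range(5000)]
recent_same = [rng.gauss(50, 10) for _ in range(5000)]
recent_shifted = [rng.gauss(60, 10) for _ in range(5000)]
print(f"PSI, no shift:       {population_stability_index(train, recent_same):.3f}")
print(f"PSI, shifted by 1sd: {population_stability_index(train, recent_shifted):.3f}")
```

A frequently quoted rule of thumb reads PSI below 0.1 as stable, 0.1 to 0.25 as a moderate shift and above 0.25 as a significant one. These cut-offs are a convention rather than a law, and are worth calibrating against your own history.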

Concept drift is subtler: the relationship between the inputs and the thing being predicted has itself changed, even if the inputs look the same. A churn model trained on data from before a major competitor entered the market may have learned associations that no longer hold once that competitor arrived.
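A toy simulation makes concept drift concrete. Everything here is hypothetical: a single feature (monthly price paid), deterministic churn rules, and a "model" that is just the pre-competition threshold. The input distribution is identical in both periods, so an input-drift check would see nothing, yet accuracy falls because the rule linking price to churn has changed.

```python
import random

rng = random.Random(42)


def sample_customers(n: int, after_competitor: bool) -> list[tuple[float, bool]]:
    """Toy data with one hypothetical feature: monthly price paid (0 to 100).

    The price distribution is the same in both periods; only the churn rule changes.
    """
    rows = []
    for _ in range(n):
        price = rng.uniform(0, 100)
        if after_competitor:
            # A cheaper rival now also attracts low-paying customers.
            churned = price > 80 or price < 30
        else:
            churned = price > 80
        rows.append((price, churned))
    return rows


def model(price: float) -> bool:
    """'Trained' before the competitor arrived: learned that high payers churn."""
    return price > 80


def accuracy(rows: list[tuple[float, bool]]) -> float:
    return sum(model(price) == churned for price, churned in rows) / len(rows)


def mean_price(rows: list[tuple[float, bool]]) -> float:
    return sum(price for price, _ in rows) / len(rows)


before = sample_customers(10_000, after_competitor=False)
after = sample_customers(10_000, after_competitor=True)
# Inputs look the same in both periods.
print(f"mean price: {mean_price(before):.1f} vs {mean_price(after):.1f}")
# Before is 1.00 by construction; after drops because prices under 30 are now misclassified.
print(f"accuracy:   {accuracy(before):.2f} vs {accuracy(after):.2f}")
```

This is why input monitoring alone is not enough: concept drift only becomes visible once predictions are compared with real outcomes.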

Both are normal. They are not failures of the original model. They are facts about how the world changes.

What monitoring actually looks like

The minimum viable setup for a production model involves three things.

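First, track the distribution of input data over time. If the features the model consumes start looking statistically different from what the model was trained on, that is an early warning. Simple statistical tests are enough to start: for example, a two-sample Kolmogorov-Smirnov test per numeric feature or a chi-squared test per categorical feature, comparing a sample of recent inputs against a reference sample from the training data.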
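As one example of such a test, here is a stdlib-only sketch of the two-sample Kolmogorov-Smirnov test for a single numeric feature. It compares a reference sample from the training data with a sample of recent inputs and uses the large-sample critical value at roughly the 5% level (coefficient 1.358). The function names are illustrative; in practice `scipy.stats.ks_2samp` computes the same statistic and also returns a p-value.

```python
import bisect
import math
from typing import Sequence


def ks_statistic(reference: Sequence[float], recent: Sequence[float]) -> float:
    """Two-sample KS statistic D: the largest gap between the two empirical CDFs."""
    ref = sorted(reference)
    rec = sorted(recent)
    n, m = len(ref), len(rec)
    d = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # always occurs at one of the observed values.
    for x in ref + rec:
        cdf_ref = bisect.bisect_right(ref, x) / n
        cdf_rec = bisect.bisect_right(rec, x) / m
        d = max(d, abs(cdf_ref - cdf_rec))
    return d


def drifted(reference: Sequence[float], recent: Sequence[float], c_alpha: float = 1.358) -> bool:
    """True if 'same distribution' is rejected at about the 5% level.

    Uses the asymptotic critical value c_alpha * sqrt((n + m) / (n * m));
    c_alpha = 1.358 corresponds to alpha = 0.05.
    """
    n, m = len(reference), len(recent)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, recent) > critical
```

Run it per feature on, say, the last week of scored inputs. Two caveats: with large samples even trivial shifts become statistically significant, so look at the size of `D` as well as the verdict; and testing dozens of features at 5% will produce some false alarms by chance alone.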

Second, capture ground truth when it becomes available. If the model predicts churn, then 30 days later you know whether the customer actually churned, and that is a labelled data point. Log every prediction with a timestamp and link it to its outcome when the outcome comes in. This gives you a real accuracy curve over time.
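A minimal sketch of the logging-and-joining step, assuming predictions are logged with a customer ID and timestamp, and that churn events arrive separately as a mapping from customer ID to churn date. The `Prediction` class, the 30-day horizon and the weekly bucketing are illustrative choices, not a prescribed schema; a production version would more likely be a SQL join between a predictions table and an outcomes table.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Prediction:
    customer_id: str
    scored_at: datetime
    predicted_churn: bool


def weekly_accuracy(
    predictions: list[Prediction],
    churn_dates: dict[str, datetime],
    horizon: timedelta = timedelta(days=30),
    now: Optional[datetime] = None,
) -> dict[tuple[int, int], float]:
    """Join logged predictions to observed churn and return accuracy per ISO week.

    churn_dates maps customer_id -> when the customer churned (absent if they have not).
    A prediction counts as a true churn if the churn happened within `horizon` of scoring.
    """
    now = now or datetime.now()
    hits: dict[tuple[int, int], int] = defaultdict(int)
    totals: dict[tuple[int, int], int] = defaultdict(int)
    for p in predictions:
        if p.scored_at + horizon > now:
            continue  # outcome window still open: ground truth not known yet
        churn_at = churn_dates.get(p.customer_id)
        churned = churn_at is not None and p.scored_at <= churn_at <= p.scored_at + horizon
        year, week, _ = p.scored_at.isocalendar()
        totals[(year, week)] += 1
        hits[(year, week)] += int(p.predicted_churn == churned)
    return {wk: hits[wk] / totals[wk] for wk in sorted(totals)}
```

Skipping predictions whose outcome window has not closed matters: counting them as "did not churn" would silently bias the most recent weeks, which are exactly the ones you are watching.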

Third, set a retraining trigger. Not a calendar date, but a threshold: when measured accuracy drops below X, a retraining cycle is initiated. What X is depends on the business cost of degraded predictions.
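A sketch of a threshold-based trigger, assuming labelled outcomes arrive one at a time as each prediction's ground truth comes in. `threshold` plays the role of X; the class name, the window of 500 labelled predictions and the 200-label minimum are hypothetical defaults to tune against your own label volume and cost of errors.

```python
import math
from collections import deque


class RetrainingTrigger:
    """Signals retraining when rolling accuracy drops below a threshold (the X in the text).

    window and min_samples are hypothetical defaults; tune them to label volume.
    """

    def __init__(self, threshold: float, window: int = 500, min_samples: int = 200):
        if min_samples > window:
            raise ValueError("min_samples cannot exceed window")
        self.threshold = threshold
        self.min_samples = min_samples
        self.results = deque(maxlen=window)  # True = prediction was correct
        self.fired = False

    def accuracy(self) -> float:
        return sum(self.results) / len(self.results) if self.results else math.nan

    def record(self, correct: bool) -> bool:
        """Add one labelled outcome; return True only on the call that crosses the threshold."""
        self.results.append(correct)
        if self.fired or len(self.results) < self.min_samples:
            return False  # already signalled, or too few labels to trust the estimate
        if self.accuracy() < self.threshold:
            self.fired = True
            return True
        return False

    def reset(self) -> None:
        """Call once the retrained model is deployed, so its accuracy is measured afresh."""
        self.results.clear()
        self.fired = False
```

The trigger fires once and then stays quiet until `reset()` is called, so one bad stretch produces one retraining request rather than an alert on every new label. The minimum sample size keeps a handful of early misses from starting a retraining cycle on noise.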

The organisational part

None of this is technically complex. What makes it hard is organisational. The team that built the model has often moved on to the next project. The team that operates the system may not know the model exists as a distinct component that needs monitoring. And the business team watching outcomes may not connect degrading results to an ML component they do not know about.

The answer is that every model deployed to production needs an owner who is responsible for its accuracy over time, not just its initial deployment. That owner gets the alerts and is responsible for initiating retraining when the signal requires it.

Treating model deployment as the end of the project is how expensive AI investments quietly stop delivering value.
