m@ksim.pro

Data freshness and operational decisions

When stale data costs money, and how to understand which data in your company needs to be current and which does not.

Most discussions about data freshness start from the technical side: how often the data mart refreshes, whether the warehouse supports stream processing, what the pipeline latency is. These are important questions, but they come second.

The first step is to understand which decision is being made on that data and how much that decision costs when made on stale data. Until that question is answered, talking about refresh frequency is premature.

I have seen companies build expensive real-time infrastructure for reports that were looked at once a week. And companies where critical operational decisions were made on two-day-old data - not by design, but because nobody thought about the cost of the delay.

Where the cost of stale data comes from

Stale data does not create a problem by itself - it creates a problem through the decisions made on it.

If a procurement manager sees warehouse stock with a one-day lag, they make a restocking decision based on yesterday's picture. If sales surged during that day, they will either order too little or order too late. The cost of the error is lost sales or excess inventory.

If a dynamic pricing system operates on data with a few-hour lag in a fast-moving market, its prices keep reflecting demand as it was a few hours ago, and the company systematically loses margin during periods of high demand.

If the CFO looks at a cash position that is updated once a day and makes a short-term financing decision on it, the risk of a wrong call is higher than if the data were refreshed every few hours.
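The cost side of these examples can be put into rough numbers. Below is a back-of-envelope sketch built on a deliberately simple model, which is an assumption rather than a standard formula: it multiplies the number of decisions per day that rely on the data by the share of those decisions the lag turns wrong and by the average cost of one wrong decision. The function name `expected_staleness_cost` and all figures are hypothetical.

```python
def expected_staleness_cost(
    decisions_per_day: float,
    share_affected_by_lag: float,
    cost_per_wrong_decision: float,
) -> float:
    """Rough daily cost of making a decision on stale data.

    Assumed model (a simplification): decisions per day x share of those
    decisions that the lag turns wrong x average cost of one wrong decision.
    """
    if not 0.0 <= share_affected_by_lag <= 1.0:
        raise ValueError("share_affected_by_lag must be between 0 and 1")
    return decisions_per_day * share_affected_by_lag * cost_per_wrong_decision


# Hypothetical figures for restocking decisions made on one-day-old stock data.
daily_cost = expected_staleness_cost(
    decisions_per_day=20,
    share_affected_by_lag=0.05,
    cost_per_wrong_decision=400.0,
)
print(f"Estimated cost of the lag: {daily_cost:.0f} per day")
```

Comparing a figure like this with the cost of refreshing the data more often gives a first answer to whether a faster pipeline is worth building.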

How to determine which data requires freshness

A simple method is to ask three questions for each dataset or report.

First: how often are decisions made using this data? If the decision is made once a month, day-old data creates no problem. If the decision is made several times a day, a different conversation is needed.

Second: what is the cost of the error when data is stale? Not every error costs the same. In some processes a day's delay is irrelevant. In others it costs money right now.

Third: does the person or system making the decision have a way to know the data is stale? Sometimes the problem is not the delay itself but the absence of any indication of the delay. If a report shows numbers without a timestamp, the user does not know how current they are.
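The three questions can be written down as a small checklist per dataset. This is a minimal sketch under stated assumptions: `DataUse`, `review` and the cost threshold are hypothetical names, and comparing the refresh interval with the decision interval is one simple way to turn the first question into a check, not the only one.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class DataUse:
    """A dataset or report and the decision made on it (illustrative fields)."""

    name: str
    decision_interval: timedelta  # Q1: how often the decision is made
    refresh_interval: timedelta   # how often the data is refreshed today
    cost_of_stale_error: float    # Q2: rough cost of one decision made on stale data
    shows_as_of_time: bool        # Q3: can the user see how current the data is?


def review(use: DataUse, cost_threshold: float) -> list[str]:
    """Return the concerns the three questions raise for one data use."""
    concerns = []
    # Q1: if the data refreshes less often than the decision is made,
    # several decisions in a row are taken on the same, ageing snapshot.
    lag_matters = use.refresh_interval > use.decision_interval
    # Q2: that lag is worth fixing only when a wrong decision is expensive.
    if lag_matters and use.cost_of_stale_error >= cost_threshold:
        concerns.append("data refreshes slower than decisions are made, and errors are costly")
    # Q3: regardless of the lag, users should see how old the numbers are.
    if not use.shows_as_of_time:
        concerns.append("no timestamp: users cannot tell how current the numbers are")
    return concerns


# Hypothetical example: restocking decided every few hours on a daily snapshot.
restocking = DataUse(
    name="warehouse stock",
    decision_interval=timedelta(hours=4),
    refresh_interval=timedelta(days=1),
    cost_of_stale_error=5_000.0,
    shows_as_of_time=False,
)
for concern in review(restocking, cost_threshold=1_000.0):
    print(f"{restocking.name}: {concern}")
```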

Typical risk zones

In my experience, stale data most often creates operational risk in a few areas:

Inventory management. Restocking decisions made on data delayed beyond one operational cycle systematically err during periods of unusual demand.

Financial monitoring. Cash position and accounts receivable are data where delay directly affects management decisions.

Operational process monitoring. If a problem in a production process appears in a dashboard with several hours of delay, it has time to become expensive before anyone responds.

AI systems operating in real time. Models making decisions on data - scoring, recommendations, pricing - are especially sensitive to freshness because data degradation is not always immediately visible in the system's own metrics.
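For model-driven decisions, one way to make staleness visible is to check the age of each input before scoring and report it next to the result. The sketch below is hypothetical, not any specific library's API: `stale_features`, `score_with_freshness_guard`, the per-feature age limits and the fixed fallback score are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, List, Optional, Tuple


def stale_features(
    feature_timestamps: Dict[str, datetime],
    max_age: Dict[str, timedelta],
    now: datetime,
) -> List[str]:
    """Names of features whose last update is older than their allowed age.

    Assumes timestamps are timezone-aware and record when each value was
    last updated at the source, not when it was loaded.
    """
    return [
        name
        for name, updated_at in feature_timestamps.items()
        if name in max_age and now - updated_at > max_age[name]
    ]


def score_with_freshness_guard(
    features: Dict[str, float],
    feature_timestamps: Dict[str, datetime],
    max_age: Dict[str, timedelta],
    model: Callable[[Dict[str, float]], float],
    fallback: float,
    now: Optional[datetime] = None,
) -> Tuple[float, List[str]]:
    """Score with the model only if every guarded input is fresh enough.

    Returns the score together with the stale feature names, so staleness
    is reported next to the result instead of silently degrading it.
    """
    now = now or datetime.now(timezone.utc)
    stale = stale_features(feature_timestamps, max_age, now)
    if stale:
        return fallback, stale
    return model(features), stale


# Hypothetical pricing inputs: one feature is six hours old.
as_of = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
result, stale_names = score_with_freshness_guard(
    features={"demand_last_hour": 120.0, "competitor_price": 9.5},
    feature_timestamps={
        "demand_last_hour": as_of - timedelta(minutes=10),
        "competitor_price": as_of - timedelta(hours=6),
    },
    max_age={
        "demand_last_hour": timedelta(hours=1),
        "competitor_price": timedelta(hours=2),
    },
    model=lambda f: 0.1 * f["demand_last_hour"],  # stand-in for a real model
    fallback=0.0,
    now=as_of,
)
print(result, stale_names)
```

Whether to fall back, block the decision or just raise an alert is a business choice; the point of the guard is that the age of the inputs stops being invisible.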

How to set priorities without rebuilding everything

A practical approach: list the 10-15 key operational reports or data flows and score each on two axes - frequency of decisions and cost of error.

Those that fall into the high-frequency, high-error-cost zone are the priority for improving freshness. For those in the low-frequency or low-error-cost zone, the existing schedule is sufficient.

This is not a technical analysis; a business team can run it together with an analyst in a few hours. The output is a priority map that lets the technical team close the most expensive gaps first.
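The scoring itself fits in a spreadsheet, but a few lines of code make the logic explicit. A minimal sketch, assuming a hypothetical 1-5 scale on both axes and a cutoff of 4 for "high"; the names `Flow` and `priority_map` and the example scores are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Flow:
    """A report or data flow scored by the business team (assumed 1-5 scales)."""

    name: str
    decision_frequency: int  # 1 = monthly or rarer, 5 = several times a day
    error_cost: int          # 1 = delay is irrelevant, 5 = costs money right now


def priority_map(flows: List[Flow], high: int = 4) -> Dict[str, List[str]]:
    """Split flows into 'improve freshness first' and 'keep current schedule'.

    Only flows that score high on both axes become priorities; they are
    ordered by the product of the two scores, most expensive gap first.
    """

    def is_priority(flow: Flow) -> bool:
        return flow.decision_frequency >= high and flow.error_cost >= high

    priority = sorted(
        (f for f in flows if is_priority(f)),
        key=lambda f: f.decision_frequency * f.error_cost,
        reverse=True,
    )
    return {
        "improve freshness first": [f.name for f in priority],
        "keep current schedule": [f.name for f in flows if not is_priority(f)],
    }


# Hypothetical scores, as a business team and an analyst might assign them.
flows = [
    Flow("warehouse stock", decision_frequency=5, error_cost=4),
    Flow("cash position", decision_frequency=4, error_cost=5),
    Flow("monthly sales review", decision_frequency=1, error_cost=3),
    Flow("production line dashboard", decision_frequency=5, error_cost=5),
]
for group, names in priority_map(flows).items():
    print(f"{group}: {names}")
```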


If this resonated, write to me. I reply personally.
