Data April 28, 2015 3 min read

\"We need a real-time dashboard\" is not a question

Why a request for real-time data usually means something else - and how to get to the actual question.

"We need real-time data" is one of the most common requests I hear from managers. Sometimes it is a specific technical requirement. But more often something else is behind it, and before building a stream-processing system, it is worth understanding what that something is.

Real-time is expensive and complex. It means data is updated and available within seconds, or fractions of a second, after an event. That requires a different architecture, different tools, and significantly more operational attention than an hourly batch job.

What usually sits behind this request

When I start unpacking what a manager means when they ask for "real-time data", one of several things typically emerges.

The first is frustration with delay. Data arrives too late - not seconds late, but a day or two late. This is not a real-time problem; it is a refresh-frequency problem. It is usually solved not by stream processing, but by more frequent batch loads.
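To make the gap between "a day late" and "seconds late" concrete, here is a minimal sketch that estimates how old batch-loaded data can get. It assumes a deliberately simple model: worst-case age equals the interval between loads plus the time one load takes. The function name and the example numbers are hypothetical, not taken from a real system.

```python
from datetime import timedelta


def worst_case_staleness(refresh_interval: timedelta, load_duration: timedelta) -> timedelta:
    """Upper bound on data age just before the next batch load lands.

    Simplified model (an assumption): an event that occurs right after a
    batch snapshot is taken waits one full interval for the next run,
    plus the time that run needs to finish.
    """
    return refresh_interval + load_duration


# Hypothetical numbers, for illustration only.
daily = worst_case_staleness(timedelta(days=1), timedelta(hours=2))
hourly = worst_case_staleness(timedelta(hours=1), timedelta(minutes=10))

print(f"daily load:  data can be up to {daily} old")
print(f"hourly load: data can be up to {hourly} old")
```

Under this model, moving from a daily to an hourly load removes most of the delay without touching the architecture.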

The second is a lack of trust in the data. "Real-time" in this context means "I want to see fresh data because I am not confident that what I am looking at is current". That is a question of trust and transparency, not of technical architecture.

The third is a genuine operational need. Sometimes real-time is truly required: monitoring a production line, processing transactions, detecting anomalies in a stream of events. In these cases the request is specific and justified.

Why it matters to separate these

Stream processing is not an improved version of batch processing - it is a different system with different trade-offs. It is more complex to develop, more expensive to operate, and harder to debug. Delays of a few seconds that nobody notices in a well-run batch system become visible in a streaming system and have to be handled explicitly.

If the request for real-time means "update data at least once an hour instead of once a day" - that is solved much more simply and cheaply.

If the request means "we need to react to events within a few seconds" - that is an honest request for stream processing, and it deserves a serious approach.
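The distinction between these two kinds of request can be sketched as a small decision helper. This is only an illustration: the one-minute threshold and the names `STREAMING_THRESHOLD` and `suggest_approach` are assumptions made for the example, not a rule from any particular tool or team.

```python
from datetime import timedelta

# Illustrative cut-off (an assumption): latency requirements tighter than
# this are treated as genuine streaming needs; anything looser is treated
# as a batch refresh-frequency question.
STREAMING_THRESHOLD = timedelta(minutes=1)


def suggest_approach(required_latency: timedelta) -> str:
    """Map the latency the business actually needs to a starting point."""
    if required_latency <= timedelta(0):
        raise ValueError("required_latency must be positive")
    if required_latency < STREAMING_THRESHOLD:
        return "stream processing"
    return "batch refresh"


# Usage: the two requests described above, plus a 15-minute refresh.
for need in (timedelta(seconds=5), timedelta(minutes=15), timedelta(hours=1)):
    print(f"{need} -> {suggest_approach(need)}")
```

The exact threshold matters less than the habit of asking for a number before choosing an architecture.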

How to get to the real question

A few questions that help clarify this:

What exactly happens when data is "not real-time"? What decision is delayed or made incorrectly?

What latency is actually needed - seconds, minutes, hours? Where did that number come from?

What will change in people's actions if data is refreshed more often? Who will look at the dashboard, and what will they do with what they see?

If these questions reveal that a 15-minute refresh solves 90% of the problem, that is the right answer, not a streaming platform that takes several months to build.

A practical conclusion

Real-time is not a goal in itself. It is an architectural decision with a price. Before making that decision, it is worth confirming that the problem it solves actually requires this particular solution.

A good order of work: first understand which decision is being delayed because data is stale, then discuss the architecture. That saves both money and time.


If this resonated, write to me. I reply personally.