Real-time analytics: when batch is actually enough
Why most companies overpay for streaming analytics where batch processing would be cheaper and more reliable.
"We need real-time analytics" is one of the phrases I hear most often in meetings with teams planning or rebuilding their analytics infrastructure. Almost always, what lies behind that phrase is not a genuine need for data freshness measured in seconds - it is a desire for information that is "more current" than what they have today.
This distinction matters because true real-time is expensive, complex, and carries a specific operational burden. What most companies actually need is simply not to wait a day or a week until the next update.
What true real-time is and when it is needed
Analytics in real time, strictly speaking, means data is processed and becomes available within seconds or fractions of a second of an event. This is needed in a narrow class of tasks: fraud monitoring at the moment of a transaction, management of industrial processes, trading systems, operational monitoring with automated response.
In all these cases the delay is measurable and costly. If a fraud system updates hourly, fraudsters can complete hundreds of transactions in the window. If a production line monitor has a five-minute lag, an incident can be missed.
Where companies overpay
Now look at the typical request: "we want our sales dashboard to update every hour instead of once a day." That is a reasonable need. But it is not a requirement for real-time streaming - it is a requirement for more frequent batch updates.
Another common request: "we need to see what is happening with orders right now." Usually this means seeing the data within fifteen to thirty minutes of the event rather than twenty-four hours later. That is also solvable with more frequent batch, not streaming.
Streaming architecture costs more to build, requires different expertise on the team, is harder to debug and monitor, and creates a different class of problems - event ordering, deduplication, delivery guarantees.
If you can solve the task with a batch job running every fifteen to thirty minutes, you do not need streaming.
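To make "a batch job running every fifteen to thirty minutes" concrete, here is a minimal sketch of how such a job might decide which slice of data to process on each run. The function name `batch_window` and the choice to align windows to the Unix epoch are my assumptions for illustration, not part of any particular scheduler or tool.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helper: alignment to the Unix epoch is an assumption of this sketch.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def batch_window(now: datetime, interval: timedelta) -> tuple[datetime, datetime]:
    """Return the most recent fully closed [start, end) window.

    `now` must be timezone-aware. Windows are aligned to multiples of
    `interval` counted from the Unix epoch, so re-running the job for the
    same moment always yields the same window.
    """
    completed_intervals = (now - EPOCH) // interval  # whole intervals elapsed
    end = EPOCH + completed_intervals * interval
    return end - interval, end


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 10, 37, 12, tzinfo=timezone.utc)
    start, end = batch_window(now, timedelta(minutes=15))
    print(start.isoformat(), end.isoformat())
```

Under these assumptions each run processes only events with a timestamp in the returned window. An event is at worst roughly one interval old when its run starts, plus however long the job itself takes, which is the freshness most "right now" requests are really asking for.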
How to choose
A simple framework for the decision:
First question: "what specifically breaks or gets worse if the data is thirty minutes stale instead of five seconds?" If the answer is hard to come up with - you probably do not need second-level freshness. If the answer is specific and operationally significant - real-time is justified.
Second question: "who will act on this data and on what time horizon?" If the sales manager looks at the dashboard once an hour - data refreshed every thirty minutes fully covers their need. If the system is making automated decisions in a millisecond cycle - real-time is needed.
Third question: "what is the cost of the delay?" Not in the abstract - what specifically is lost if data is X minutes late.
Four latency zones
It is useful to think in four zones:
- Seconds and below: stream processing, Kafka or equivalents, specialised team skills.
- Minutes (5-30 min): frequent batch or micro-batch, significantly simpler.
- Hours: standard batch, well-understood tools, simple operational model.
- Daily: classic nightly ETL, the simplest and most reliable.
Most analytics tasks in mid-sized businesses live in the "hours" or "minutes" zone. Streaming architecture is not needed there.
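The four zones can be sketched as a small lookup from acceptable delay to approach. The exact cut-offs below (one minute, one hour, twenty-four hours) are my assumptions: the list above only gives 5-30 minutes as the typical range for the "minutes" zone, so adjust the boundaries to your own situation.

```python
from datetime import timedelta

# Approach per zone, taken from the list above.
APPROACH = {
    "seconds": "streaming (Kafka or equivalents)",
    "minutes": "frequent batch or micro-batch",
    "hours": "standard batch",
    "daily": "nightly ETL",
}


def latency_zone(acceptable_delay: timedelta) -> str:
    """Map an acceptable delay to one of the four zones.

    The boundaries are illustrative assumptions, not fixed rules.
    """
    if acceptable_delay < timedelta(minutes=1):
        return "seconds"
    if acceptable_delay < timedelta(hours=1):
        return "minutes"
    if acceptable_delay < timedelta(hours=24):
        return "hours"
    return "daily"


if __name__ == "__main__":
    for delay in (timedelta(seconds=5), timedelta(minutes=30), timedelta(hours=6)):
        zone = latency_zone(delay)
        print(delay, "->", zone, "->", APPROACH[zone])
```

The point of writing it down this way is that the input is the acceptable delay for a specific scenario, not the word "real-time". Only the first branch leads to streaming.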
A practical checklist
Before deciding "we need real-time":
- Articulate a specific scenario with specific actions that depend on data freshness.
- Define the acceptable delay for that scenario - in seconds, minutes, or hours.
- Calculate the cost of delay beyond that threshold - it should justify the investment in streaming.
- Ask the team whether it has experience running streaming systems in production - it is a different discipline.
- Compare the cost of streaming architecture with simply running batch more frequently.
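The last two cost items on the checklist come down to simple arithmetic, sketched below. The function name and all figures in the example are hypothetical. The only rule it encodes is that the loss avoided by second-level freshness should exceed the extra cost of streaming over more frequent batch, with all three amounts expressed in the same currency and for the same period.

```python
def streaming_is_justified(
    delay_cost_avoided: float,
    streaming_cost: float,
    frequent_batch_cost: float,
) -> bool:
    """Return True if streaming pays for itself relative to frequent batch.

    delay_cost_avoided:  loss that streaming would prevent compared with
                         frequent batch (same period as the other two).
    streaming_cost:      total cost of building and running streaming.
    frequent_batch_cost: total cost of running batch more often.
    """
    extra_cost = streaming_cost - frequent_batch_cost
    return delay_cost_avoided > extra_cost


if __name__ == "__main__":
    # Hypothetical yearly figures, for illustration only.
    print(streaming_is_justified(50_000, 300_000, 80_000))   # False
    print(streaming_is_justified(500_000, 300_000, 80_000))  # True
```

If you cannot put even a rough number on `delay_cost_avoided`, that is usually a sign the first item on the checklist, a specific scenario, has not been articulated yet.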