Data May 29, 2017 3 min read

Real-time data: when the business actually needs it, and when it is over-engineered

How to tell which business tasks genuinely need stream processing, and which ones work perfectly well with regular batch updates.

The phrase "in real time" has become a standard decoration in technical proposals and IT strategies. It sounds current and persuasive. But when I dig into why a specific project needs real-time processing, I usually find that the requirement came from nowhere in particular: not from a real business need, but from a general feeling that it is the right way to do things.

That is a problem. Stream processing is not just "faster batch". It is a fundamentally different architecture with different complexity, different infrastructure requirements, and different operational costs. Sometimes it is justified. Often it is not.

The difference between stream and batch processing

Batch processing means processing accumulated data on a schedule. Once an hour, once a day, once a week - take a chunk of data, process it, put the result in storage. This is simple, predictable, and cheap to operate.

Stream processing means processing data as it arrives, with minimal delay. Each event is handled as soon as it comes in. This requires different infrastructure: message queues, stream processors, and a different model for storing intermediate results.
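To make the contrast concrete, here is a minimal sketch in Python. It is not tied to any real queue or stream processor; the `Sale` type and both function names are hypothetical, chosen only for illustration. The same aggregation is written twice: once as a job that runs over accumulated data, and once as a loop that updates its state on every event.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Sale:
    """Hypothetical event type used only for this illustration."""
    region: str
    amount: float


def batch_totals(accumulated: list[Sale]) -> dict[str, float]:
    """Batch: run once per schedule tick over everything accumulated so far."""
    totals: dict[str, float] = {}
    for sale in accumulated:
        totals[sale.region] = totals.get(sale.region, 0.0) + sale.amount
    return totals


def stream_totals(events: Iterable[Sale]) -> Iterator[dict[str, float]]:
    """Stream: update state on each event and emit a fresh result every time."""
    totals: dict[str, float] = {}
    for sale in events:
        totals[sale.region] = totals.get(sale.region, 0.0) + sale.amount
        yield dict(totals)  # copy, so callers cannot mutate the running state


sales = [Sale("north", 100.0), Sale("south", 40.0), Sale("north", 60.0)]

print("batch:", batch_totals(sales))
for snapshot in stream_totals(sales):
    print("stream:", snapshot)
```

Both versions end with the same totals. The difference is that the streaming version has to keep its intermediate state alive between events, and in a real system that state must survive restarts, which is where much of the extra operational cost comes from.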

Technologies for stream processing exist and work well. The question is not whether this is technically possible, but whether your business needs it.

When real time is actually needed

There are tasks where delay is measured in money or risk. A few examples:

Fraud monitoring in payment systems. A decision to block a transaction must be made in seconds - before the transaction completes. A delay of several hours makes the system pointless.

Industrial equipment monitoring. If a sensor shows an anomalous reading that may precede a failure, an operator needs the signal immediately - not the following morning.

Operational dashboards with short reaction cycles. If a shift manager makes decisions based on data and their reaction horizon is a few minutes, an hourly delay makes the dashboard useless.

What these cases share: the delay in data directly limits the ability to take the necessary action. If the action loses its meaning once the data is a few hours old, stream processing is needed.
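That rule of thumb can be written down as a small check. This is only a sketch: the function name `batch_is_too_slow`, its parameters, and the numbers in the usage lines are all assumptions made for illustration, not measurements from real systems.

```python
from datetime import timedelta


def batch_is_too_slow(
    reaction_horizon: timedelta,
    batch_interval: timedelta,
    job_runtime: timedelta = timedelta(0),
) -> bool:
    """Return True when batch data can arrive after the action has lost its meaning.

    Worst case, an event lands just after a run has picked up its input, so it
    waits a full interval and then the duration of the next run.
    """
    worst_case_delay = batch_interval + job_runtime
    return worst_case_delay > reaction_horizon


# Illustrative horizons only; they are assumptions, not measurements.
hourly = timedelta(hours=1)
print(batch_is_too_slow(timedelta(seconds=2), hourly))  # fraud check
print(batch_is_too_slow(timedelta(minutes=5), hourly))  # shift dashboard
print(batch_is_too_slow(timedelta(days=1), hourly))     # daily sales report
```

With these assumed horizons, the first two cases come out as too slow for an hourly batch and the third does not. The useful part is less the function than the input it forces you to supply: a concrete reaction horizon for each consumer of the data.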

When regular batch is enough

Most corporate analytics tasks do not require data to be updated more often than once an hour or once a day. A few examples:

a daily sales report by region;
a weekly warehouse stock calculation;
a monthly P&L for management;
a dashboard of key metrics that someone checks once each morning.

Batch processing is sufficient for all of this. Adding a streaming architecture "for robustness" or "because it is more modern" means adding complexity without adding value.

The question that clarifies things

Before discussing architecture, I ask one question: what specifically will change in a person's or system's actions if data is updated in real time rather than once an hour?

If the answer is concrete - someone will make a different decision or a system will take a different action - then stream processing is justified. If the answer is vague - "it will be more modern" or "just in case" - then batch architecture is sufficient, and adding complexity is not warranted.

Getting this question right saves significant money on infrastructure and team time on maintenance.


If this resonated, write to me. I reply personally.