Streaming data: when operational decisions cannot wait for a batch
When a business needs streaming instead of batch processing, and what needs to be decided before adopting Kafka or similar tools.
Most analytics systems in companies operate on one principle: data accumulates during the day, a batch job runs overnight, the report is ready in the morning. This works for reporting, planning, and analysis of the past.
But there is a class of tasks where a decision needs to be made while the event is still happening - or at least within minutes, not hours. A fraudulent transaction. A technical fault on a production line. An abnormal spike in returns. For these tasks, batch processing is unsuitable by definition.
This is the territory of streaming data.
How streams differ from batches
In the batch model, data accumulates first and is processed afterwards. The boundary is clear: here is the dataset, here is the computation over it.
In the streaming model, data arrives continuously and processing happens as each event or small batch arrives. There is no final dataset - there is an infinite stream.
This requires a different architecture. You cannot load everything into memory, compute, and return an answer. You need to maintain state between events, handle late-arriving events, and manage time windows, which means deciding what counts as "simultaneous" when events are spread across time.
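To make state, windows, and late events concrete, here is a minimal sketch in plain Python. It is not tied to any streaming framework. The class name, the 60-second window, and the 30-second allowed lateness are illustrative assumptions. Events are counted by the time they happened, and a window is closed only once the watermark (the highest event time seen, minus the allowed lateness) has passed its end.

```python
from collections import defaultdict

WINDOW_SECONDS = 60     # assumed tumbling window size
ALLOWED_LATENESS = 30   # assumed grace period for late events


class TumblingWindowCounter:
    """Counts events per event-time window and closes windows by watermark."""

    def __init__(self, window=WINDOW_SECONDS, lateness=ALLOWED_LATENESS):
        self.window = window
        self.lateness = lateness
        self.open_windows = defaultdict(int)  # window start -> count (the state)
        self.watermark = float("-inf")        # max event time seen minus lateness
        self.dropped = 0                      # events that arrived after their window closed

    def add(self, event_time):
        """Add one event; return the (window_start, count) pairs that just closed."""
        start = event_time - event_time % self.window
        if start + self.window <= self.watermark:
            # The window was already emitted, so this event is too late to count.
            self.dropped += 1
        else:
            self.open_windows[start] += 1
        self.watermark = max(self.watermark, event_time - self.lateness)
        closed = sorted(s for s in self.open_windows
                        if s + self.window <= self.watermark)
        return [(s, self.open_windows.pop(s)) for s in closed]


counter = TumblingWindowCounter()
for t in [5, 50, 70, 40, 95, 10]:  # event times in seconds, arriving out of order
    for window_start, count in counter.add(t):
        print(f"window starting at {window_start}s closed with {count} events")
```

The event at 40 seconds arrives after the one at 70 seconds but is still counted, because its window is open. The event at 10 seconds arrives after that window has closed and is dropped. Where to draw that line is a business decision, not a technical one.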
Apache Kafka, which has become the de facto standard for the transport layer of data streams, solves the problem of reliable event delivery. But using Kafka does not by itself mean the streaming problem is solved: the processing on top of the transport still has to be designed.
Where streams are actually needed
There are three classes of task where the latency of batch processing is not an inconvenience but a loss of value.
Operational alerts. An anomaly in production metrics, a threshold breach, a deviation from normal equipment behaviour. If detection happens 12 hours after the event, most of the damage is already done.
Reaction to user behaviour. A recommendation at the moment of interaction, fraud prevention before a transaction completes, a personalised offer while the customer is still on the site. Here latency is measured in seconds, not hours.
Real-time integration between systems. When a change in one system must be reflected in another immediately, not through a nightly ETL. This is common in operational systems: order status changes, inventory updates, synchronisation across platforms.
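To illustrate the first of these classes, operational alerts, here is a minimal sketch of checking each reading as it arrives rather than in a nightly job. The class name, window size, and three-sigma threshold are illustrative assumptions; a real detector would be tuned to the equipment and metric in question.

```python
from collections import deque
from statistics import mean, pstdev


class RollingAnomalyDetector:
    """Flags a reading that deviates strongly from a rolling baseline."""

    def __init__(self, size=20, threshold=3.0, min_samples=5):
        self.values = deque(maxlen=size)  # recent normal readings (the baseline)
        self.threshold = threshold        # allowed deviation, in standard deviations
        self.min_samples = min_samples    # do not alert before a baseline exists

    def observe(self, value):
        """Return True if the reading looks anomalous against recent history."""
        is_anomaly = False
        if len(self.values) >= self.min_samples:
            mu = mean(self.values)
            sigma = pstdev(self.values)
            if sigma > 0 and abs(value - mu) > self.threshold * sigma:
                is_anomaly = True
        if not is_anomaly:
            self.values.append(value)  # keep outliers out of the baseline
        return is_anomaly


detector = RollingAnomalyDetector()
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 10.1, 25.0, 10.0]
alerts = [detector.observe(r) for r in readings]
print(alerts)
```

In this sample only the reading of 25.0 is flagged, and it is flagged at the moment it arrives.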
What needs to be decided before implementation
The most common mistake when adopting streaming architecture is starting with the tool rather than the task. Kafka is convenient and easy to deploy. But Kafka is a transport. The real question is what the system does with the data flowing through it.
Delivery guarantees. What happens if the consumer is unavailable? Is the data stored and processed later, or lost? Kafka provides the tools for this, but behaviour during failures has to be configured explicitly and tested.
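The effect of one such choice can be sketched without a broker. The example below is a simulation in plain Python, not Kafka client code: `InMemoryLog` and `consume` are hypothetical stand-ins for a retained partition and a consumer loop. The offset is committed only after the handler succeeds, which in Kafka terms roughly corresponds to disabling automatic commits (`enable.auto.commit=false`) and committing manually after processing.

```python
class InMemoryLog:
    """Stand-in for a retained partition: messages stay available after being read."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.committed = 0  # offset of the next message the consumer should read


def consume(log, handler):
    """Process from the last committed offset; commit only after the handler succeeds."""
    while log.committed < len(log.messages):
        handler(log.messages[log.committed])  # may raise if the consumer fails
        log.committed += 1                    # commit after processing: at-least-once


processed = []
failures = {"b": 1}  # simulate one crash while handling message "b"


def flaky_handler(message):
    processed.append(message)  # the side effect happens before the crash
    if failures.get(message, 0) > 0:
        failures[message] -= 1
        raise RuntimeError("consumer crashed")


log = InMemoryLog(["a", "b", "c"])
try:
    consume(log, flaky_handler)
except RuntimeError:
    pass                     # the consumer goes down here
consume(log, flaky_handler)  # after restart it resumes from the committed offset
print(processed)
```

Nothing is lost, but "b" is handled twice. Committing before processing would reverse the trade-off: no duplicates, but a crash could lose a message.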
Order and duplicates. In distributed systems, events may arrive out of order or arrive twice. The processing logic must account for this - or infrastructure-level guarantees must prevent these situations.
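One way to handle this in the processing logic is to make the handler idempotent. The sketch below assumes that every event carries a unique `id` and a per-key sequence number `seq`. Both field names are illustrative, and real event schemas may offer neither.

```python
class IdempotentApplier:
    """Applies events to per-key state, ignoring duplicates and stale updates."""

    def __init__(self):
        self.seen_ids = set()  # ids of events already handled
        self.latest = {}       # key -> (sequence number, value)

    def apply(self, event):
        """Return True if the event changed the state, False if it was ignored."""
        if event["id"] in self.seen_ids:
            return False       # duplicate delivery
        self.seen_ids.add(event["id"])
        current = self.latest.get(event["key"])
        if current is not None and event["seq"] <= current[0]:
            return False       # arrived out of order, a newer update was already applied
        self.latest[event["key"]] = (event["seq"], event["value"])
        return True


events = [
    {"id": "e1", "key": "order-1", "seq": 1, "value": "created"},
    {"id": "e3", "key": "order-1", "seq": 3, "value": "shipped"},
    {"id": "e2", "key": "order-1", "seq": 2, "value": "paid"},     # late
    {"id": "e3", "key": "order-1", "seq": 3, "value": "shipped"},  # duplicate
]
applier = IdempotentApplier()
results = [applier.apply(e) for e in events]
print(results, applier.latest)
```

The late "paid" event and the repeated "shipped" event are both ignored, so the order stays in its latest known status. The set of seen ids grows without bound here; a production version would need to expire old entries.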
Latency monitoring. If the system is meant to respond in real time, you need to know how long processing actually takes from event to action. This is a separate metric that must be tracked.
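A minimal sketch of such a metric follows, assuming each event carries the time it occurred and the system records the time it acted on it. The class and function names are illustrative. Percentiles use the nearest-rank method, since an average hides the slow tail that matters most here.

```python
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile of a sorted, non-empty list (0 < p <= 100)."""
    rank = math.ceil(p / 100 * len(sorted_values))
    return sorted_values[rank - 1]


class LatencyTracker:
    """Collects event-to-action latencies and summarises them."""

    def __init__(self):
        self.latencies = []

    def record(self, event_time, action_time):
        # Both timestamps are assumed to come from comparable clocks, in seconds.
        self.latencies.append(action_time - event_time)

    def summary(self):
        values = sorted(self.latencies)
        return {
            "p50": percentile(values, 50),
            "p95": percentile(values, 95),
            "max": values[-1],
        }


tracker = LatencyTracker()
for delay in range(1, 11):  # ten events, acted on 1 to 10 seconds after they occurred
    tracker.record(event_time=0, action_time=delay)
print(tracker.summary())
```

Measuring from the event's own timestamp, not from the moment the consumer received it, includes time spent waiting in the transport, which is often where delays build up.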
A practical starting point
Before moving toward streaming architecture, a few questions are worth answering:
- Is there a specific task where hour-long latency is a real problem, not just an inconvenience?
- What exactly happens to data after it enters the stream - is there processing logic, or just transport?
- Who will maintain the streaming infrastructure, and what are the team's competencies in this area?
- How does the system behave when a consumer or broker fails?
Streaming solves real problems. But it adds operational complexity. That complexity is justified where latency is expensive - and unnecessary where batch handles the job.