Data April 19, 2016 3 min read

Event streaming: when should a business actually consider it?

Apache Kafka and streaming architecture are not only for internet giants. I look at which business problems justify this approach and where it is overkill.

Apache Kafka started as LinkedIn's internal tool for processing user activity streams. Today it is one of the most discussed technologies in data architecture. I regularly see two opposite mistakes: "we are too small, this is not for us" and "let's deploy Kafka because everyone else does".

I want to offer a more grounded view: when does event streaming actually solve a business problem, and when is it premature complexity?

What event streaming means in non-technical terms

Traditional data architecture works in batches: once an hour or once a day, data is collected, processed, loaded into storage, and turned into reports. This is fine for most analytical tasks.

Streaming architecture works differently: every event (a sale, a user action, a sensor signal, a transaction) is processed as it arrives, not an hour later but within seconds or minutes.

The difference matters where delay has a cost. If you learn about a fraudulent transaction 24 hours later, the money is already gone. If you react within a minute, you can block it.
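To make the cost of delay concrete, here is a small Python sketch that compares when the same event becomes visible to a daily batch job and to a streaming consumer. The function names, the 02:00 batch schedule and the five-second streaming latency are assumptions chosen for illustration, not measurements from a real system.

```python
from datetime import datetime, timedelta

def batch_visible_at(event_time: datetime, run_hour: int = 2) -> datetime:
    """Start of the first daily batch run after the event (assumed schedule)."""
    run = event_time.replace(hour=run_hour, minute=0, second=0, microsecond=0)
    if run <= event_time:
        # Today's run has already happened, so the event waits for tomorrow's.
        run += timedelta(days=1)
    return run

def stream_visible_at(event_time: datetime, latency_s: float = 5.0) -> datetime:
    """Moment a streaming consumer sees the event (assumed end-to-end latency)."""
    return event_time + timedelta(seconds=latency_s)

# A hypothetical fraudulent transaction at 09:30.
fraud = datetime(2016, 4, 19, 9, 30)
batch_delay = batch_visible_at(fraud) - fraud
stream_delay = stream_visible_at(fraud) - fraud
print("batch delay: ", batch_delay)
print("stream delay:", stream_delay)
```

Under these assumptions the batch job first sees the transaction sixteen and a half hours after it happened. Whether that gap matters depends entirely on what the delay costs the business.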

Business problems that justify this approach

Real-time anomaly detection is the classic case: fraud in financial services, security violations, production failures. Where the reaction has to come within seconds or minutes, a job that runs once an hour or once a day cannot deliver it.
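As a rough illustration of per-event anomaly detection, the sketch below flags a card that makes too many transactions inside a short time window. The class name `VelocityDetector`, the thresholds and the sample events are all hypothetical, and the code assumes events arrive in timestamp order; a production rule set would be considerably richer.

```python
from collections import defaultdict, deque

class VelocityDetector:
    """Flags a card with more than max_events transactions within window_s seconds."""

    def __init__(self, max_events: int = 3, window_s: float = 60.0):
        self.max_events = max_events
        self.window_s = window_s
        self._recent = defaultdict(deque)  # card_id -> timestamps still in the window

    def observe(self, card_id: str, ts: float) -> bool:
        """Process one event as it arrives; return True if the card looks suspicious."""
        window = self._recent[card_id]
        window.append(ts)
        # Drop timestamps that have fallen out of the time window.
        while window and ts - window[0] > self.window_s:
            window.popleft()
        return len(window) > self.max_events

detector = VelocityDetector(max_events=3, window_s=60.0)
events = [
    ("card-1", 0.0), ("card-1", 10.0), ("card-2", 12.0),
    ("card-1", 20.0), ("card-1", 25.0), ("card-1", 200.0),
]
flags = [detector.observe(card, ts) for card, ts in events]
print(flags)
```

The decision is made at the moment the fourth transaction on `card-1` arrives, which is the point of the streaming model: the check runs per event rather than per batch.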

Operational dashboards with current data. If a manager looks at "today's sales" and sees data that is three hours old, that is not an operational picture; it is history. For some tasks this gap is significant.

Integrating many sources in real time. When data comes from dozens of systems and you need to react to combinations of events across them, streaming architecture provides the right model.
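One way to picture reacting to a combination of events: raise an alert when a password change reported by one system is followed by a large transfer reported by another, for the same user and within a set time. The sketch below is a minimal Python version. The event names, the source labels, the ten-minute window and the `PairCorrelator` class are assumptions made up for this example.

```python
from typing import Dict

class PairCorrelator:
    """Detects a `second` event that follows a `first` event for the same user."""

    def __init__(self, first: str, second: str, window_s: float):
        self.first = first
        self.second = second
        self.window_s = window_s
        self._last_first: Dict[str, float] = {}  # user -> time of latest `first` event

    def observe(self, user: str, kind: str, ts: float) -> bool:
        """Process one event from any source; return True when the pair is complete."""
        if kind == self.first:
            self._last_first[user] = ts
            return False
        if kind == self.second:
            seen = self._last_first.get(user)
            return seen is not None and 0 <= ts - seen <= self.window_s
        return False

correlator = PairCorrelator("password_changed", "large_transfer", window_s=600.0)

# Events from two hypothetical systems, already merged into one time-ordered stream.
stream = [
    ("auth", "u1", "password_changed", 0.0),
    ("payments", "u2", "large_transfer", 30.0),
    ("payments", "u1", "large_transfer", 120.0),
    ("payments", "u1", "large_transfer", 900.0),
]
alerts = [(user, ts) for _, user, kind, ts in stream
          if correlator.observe(user, kind, ts)]
print(alerts)
```

Only the transfer by `u1` at 120 seconds raises an alert: `u2` never changed a password, and the second transfer by `u1` falls outside the window.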

Logging and audit with delivery guarantees. Kafka acts as a durable log: messages are written to disk and retained whether or not they have been read, so a consumer that fails can resume from its last committed position instead of losing data.
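The idea of resuming after a failure can be shown with a toy in-memory model. This is not Kafka's API, only a sketch of the principle: records stay in an append-only log, and each consumer separately records how far it has processed. The `EventLog` class and its method names are invented for this illustration.

```python
class EventLog:
    """Toy append-only log; reading a record does not remove it."""

    def __init__(self):
        self._records = []
        self._committed = {}  # consumer name -> next offset to read

    def append(self, record) -> int:
        """Store a record and return its offset in the log."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, consumer: str, max_records: int = 10):
        """Return (offset, record) pairs starting at the consumer's committed offset."""
        start = self._committed.get(consumer, 0)
        return list(enumerate(self._records[start:start + max_records], start=start))

    def commit(self, consumer: str, next_offset: int) -> None:
        """Remember that everything before next_offset has been processed."""
        self._committed[consumer] = next_offset

log = EventLog()
for rec in ["login", "transfer", "logout"]:
    log.append(rec)

batch = log.poll("audit")
processed = [batch[0][1]]             # handle only the first record...
log.commit("audit", batch[0][0] + 1)  # ...and commit it
# The consumer "crashes" here, before handling the rest of the batch.

resumed = log.poll("audit")           # after restart: continue from the committed offset
processed += [rec for _, rec in resumed]
print(processed)
```

Because the commit happens after processing, a crash between the two steps means a record may be handled twice after restart, but it is not skipped. Consumers built this way need to tolerate duplicates.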

Where it is overkill

If your analytical tasks are well served by daily or hourly updates, Kafka adds complexity without corresponding value. A well-configured ETL pipeline and a relational store will solve the problem more cheaply and more reliably.

If you do not have a team experienced in operating distributed systems, you will spend significant time maintaining infrastructure rather than working on business problems.

If data volumes are modest, a specialised system is not needed. PostgreSQL with a good index will handle the volume that many companies call "big data".

How to assess readiness

A few self-diagnostic questions:

Do we have tasks where a data processing delay of several hours costs money or creates operational risk?
Do we have data sources that generate events continuously and that we are currently forced to buffer awkwardly?
Do we have engineers with experience in these kinds of systems, or the ability to bring them in?
Are we prepared for the infrastructure burden to grow before we see business value?

Event streaming is a powerful tool for a specific class of problems. Like any specialised tool, it earns its cost only when the problem actually fits it.

If this resonated, write to me. I reply personally.