Kafka and event streaming: what a manager needs to understand
A plain explanation of why companies are switching to event-driven data flows, what Kafka actually does, and when it is worth the complexity.
A year ago I could explain Kafka to a CTO and skip it with everyone else. That has changed. Event streaming keeps coming up in vendor proposals, architecture reviews, and hiring discussions. If your team is building anything that moves data between systems in near-real time, you will hear about Kafka or one of its relatives sooner or later.
This is not a technical tutorial. It is a framing for decision-makers who need to ask the right questions.
What problem event streaming solves
Classic data integration looks like this: system A writes to a database, a scheduled job reads from that database and copies records to system B, and system B processes them on its next run. It works. It also means the data in B is always stale by up to one job interval, the systems are tightly coupled, and adding a third consumer means touching A or B.
Event streaming flips this. System A publishes events - "order created", "payment confirmed", "stock level changed" - to a shared log. Any system that cares about those events subscribes and reads them in its own time. A is not aware of B or C. The log is the contract.
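For readers who want to see the idea in miniature, here is a toy sketch in Python. It is not Kafka and uses no Kafka API; the names `EventLog`, `Consumer`, `billing`, and `analytics` are invented for illustration. It only shows the shape of the arrangement: the producer appends to a log, and each consumer keeps its own position in it.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """A toy append-only log; a stand-in for a topic, not Kafka itself."""
    events: list[str] = field(default_factory=list)

    def publish(self, event: str) -> int:
        """Append an event and return its position (offset) in the log."""
        self.events.append(event)
        return len(self.events) - 1


@dataclass
class Consumer:
    """Each consumer remembers its own position; the producer never sees it."""
    name: str
    offset: int = 0

    def poll(self, log: EventLog, max_events: int) -> list[str]:
        """Read up to max_events new events and advance this consumer only."""
        batch = log.events[self.offset:self.offset + max_events]
        self.offset += len(batch)
        return batch


log = EventLog()
for event in ["order created", "payment confirmed", "stock level changed"]:
    log.publish(event)

billing = Consumer("billing")      # hypothetical system B
analytics = Consumer("analytics")  # hypothetical system C

fast_batch = billing.poll(log, max_events=10)   # reads everything available
slow_batch = analytics.poll(log, max_events=1)  # reads at its own pace
```

The point to notice is that `publish` knows nothing about who reads the log, and the slow consumer does not hold back the fast one.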
Kafka is the most widely deployed system for maintaining that shared log at scale. It is durable, fast, and built to spread load across a cluster of servers, so a well-sized deployment can handle millions of events per second and many independent consumers.
What Kafka is not
Kafka is not a database. It does not replace your operational systems or your data warehouse. It is a transport and a short-to-medium term buffer, not a system of record.
Kafka is also not a message queue in the classical sense. A queue typically deletes a message once it is consumed. Kafka keeps the event log for a configurable retention period - hours, days, weeks - and any consumer can replay from any point that is still within that period. This is what makes it useful for rebuilding derived data stores, debugging integration failures, or adding new consumers without touching producers.
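The difference between a queue and a retained log can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not how Kafka is implemented: the seven-day retention, the timestamps, and the `replay` helper are all made up for the example, and retention is applied per event here purely to keep the sketch short.

```python
from collections import deque

# Classical queue: consuming a message removes it.
queue = deque(["order created", "payment confirmed"])
first = queue.popleft()
remaining_in_queue = len(queue)  # the consumed message is gone for good

# Retained log: reading removes nothing; only age removes events.
DAY = 24 * 3600
RETENTION_SECONDS = 7 * DAY  # assumed seven-day retention

# Each entry is (offset, timestamp_seconds, event); values are illustrative.
log = [
    (0, 0 * DAY, "order created"),
    (1, 3 * DAY, "payment confirmed"),
    (2, 9 * DAY, "stock level changed"),
]


def replay(log, now, from_offset=0):
    """Return retained events at or after from_offset, without deleting any."""
    return [
        event
        for offset, timestamp, event in log
        if now - timestamp <= RETENTION_SECONDS and offset >= from_offset
    ]


now = 9 * DAY
first_pass = replay(log, now)                  # the day-0 event has aged out
second_pass = replay(log, now)                 # a new consumer sees the same events
from_latest = replay(log, now, from_offset=2)  # or start from a later position
```

Reading the log twice gives the same result, which is what allows a new consumer or a rebuild to start from history rather than from "now".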
When the complexity is justified
Event streaming adds real operational complexity: cluster management, schema evolution, monitoring consumer lag, handling replay and ordering guarantees. None of that is free. I have seen teams reach for Kafka when a simple webhook or a shared database table would have served them fine for years.
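Of the items above, consumer lag is the one most worth being able to picture, because it is the number your team will put on a dashboard. In Kafka it is usually measured per partition as the newest offset written minus the offset the consumer has committed. A minimal sketch of that arithmetic, with invented partition numbers and offsets and a hypothetical `consumer_lag` helper:

```python
def consumer_lag(
    latest_offsets: dict[int, int],
    committed_offsets: dict[int, int],
) -> dict[int, int]:
    """Per-partition lag: events written minus events the consumer has processed."""
    return {
        partition: latest_offsets[partition] - committed_offsets.get(partition, 0)
        for partition in latest_offsets
    }


# Illustrative numbers only: partition id -> offset.
latest = {0: 1200, 1: 950, 2: 1010}
committed = {0: 1200, 1: 700, 2: 1000}

lag = consumer_lag(latest, committed)
total_lag = sum(lag.values())
```

A lag that stays flat or shrinks is healthy; one that keeps growing means the consumer cannot keep up with the producers.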
The investment is genuinely justified when all of the following hold:
- three or more systems need the same data as it changes;
- latency matters - minutes are too slow;
- consumers have different processing speeds and must not block each other;
- audit trail and replay are a business requirement, not a nice-to-have.
If only one or two of those are true, think twice. If none are true, do not introduce the complexity at all.
What to ask your team
When an architect or a vendor proposes Kafka, the useful questions are:
- How many consumers will read these events on day one, and how many realistically in a year?
- What is the acceptable latency for the downstream systems?
- Who owns schema evolution when producers change their events?
- What happens if a consumer falls behind - does the business care about the lag?
- Is there a managed cloud offering we can use, or are we committing to running the cluster ourselves?
The last question matters more than people admit. Self-managed Kafka is a serious operational responsibility. Managed offerings - Confluent Cloud, Amazon MSK, Aiven - take on that burden at a cost. For most mid-sized companies the managed path is the right one.
The practical framing
Event streaming is infrastructure, not a feature. It changes how teams integrate with each other more than it changes what any single service does. That means the decision belongs at the architecture level, not in a sprint backlog. If your team is proposing to add it, ask for a diagram of who produces events, who consumes them, and what schema governance looks like. If that diagram is clear, the proposal is serious. If it is vague, the team is still figuring it out.