Data September 7, 2016 3 min read

Kafka as a data backbone: what it means for a company

Apache Kafka is no longer only a tool for large tech companies. Here is how to explain its role without technical jargon.

Apache Kafka was born inside LinkedIn to solve a specific problem - transmitting enormous volumes of events between internal systems in real time. LinkedIn then open-sourced it, and for the past few years Kafka has been used by large technology companies for similar tasks.

Now I am starting to see it in projects where the client is not a technology company but a manufacturing, logistics, or financial one. This is a good moment to explain what Kafka does and for which tasks it makes sense - without assuming the reader knows what a "message broker" is.

What a data backbone is in the first place

Imagine a company with a dozen systems: ERP, CRM, a manufacturing MES, warehouse management, delivery, an analytics warehouse. Each of them produces events: an order is created, a batch is shipped, a piece of equipment sends a signal, a customer submits a request.

Without a centralised backbone, every system talks directly to every other one. This is called point-to-point integration or, in the worst case, "spaghetti": N systems can need up to N times (N minus 1) one-way integration connections, so a dozen systems can mean up to 132 of them. Each connection is brittle, and a change in one system can break the others.

A data backbone is an intermediary. Systems do not talk directly to each other. They send events to the backbone, and subscribers receive what they need. Each system then maintains one connection, to the backbone, instead of one per partner system, which simplifies the architecture and makes integrations less fragile.
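To put numbers on the difference, here is a minimal Python sketch. The function names are mine and purely illustrative, and it assumes the worst case in which every system sends data directly to every other one.

```python
def point_to_point_links(n: int) -> int:
    """One-way integrations when each of n systems sends directly to every other."""
    return n * (n - 1)


def backbone_links(n: int) -> int:
    """Connections when each of n systems talks only to a shared backbone."""
    return n


for n in (3, 6, 12):
    print(n, point_to_point_links(n), backbone_links(n))
# prints:
# 3 6 3
# 6 30 6
# 12 132 12
```

The gap widens quickly: going from 6 to 12 systems doubles the backbone connections but more than quadruples the direct ones.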

How Kafka differs from classic integration buses

Classic ESBs (enterprise service buses) were also backbones. Kafka differs in several properties that matter in practice.

First - Kafka stores event history. In a standard message queue, a message is deleted once it has been received. Kafka keeps the event stream for a configurable retention period. This means a new system connected a month later can "read the past" and catch up on everything that happened, provided the retention period covers that month.

Second - Kafka scales horizontally. When data volume grows, you add servers instead of rearchitecting.
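For readers who want to see the idea behind this, a rough Python sketch follows. Kafka splits an event stream into partitions that can live on different servers, and events with the same key go to the same partition. The function names and machine ids below are hypothetical, and `zlib.crc32` is only a stand-in for illustration: Kafka's own client uses a different hash function.

```python
import zlib
from collections import defaultdict
from typing import Dict, List


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (stand-in for Kafka's own)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def spread(keys: List[str], num_partitions: int) -> Dict[int, List[str]]:
    """Group event keys by the partition they would be written to."""
    partitions = defaultdict(list)
    for key in keys:
        partitions[partition_for(key, num_partitions)].append(key)
    return dict(partitions)


# Hypothetical equipment ids used as event keys.
machine_ids = [f"machine-{i}" for i in range(1, 9)]
print(spread(machine_ids, num_partitions=3))
```

Because the mapping depends only on the key, all events from one machine stay in order within one partition, while different machines can be handled by different servers.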

Third - consumers are independent. An analytics system, a monitoring system, and an operational system can read the same event stream independently, at their own pace.
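The first and third properties can be illustrated together with a toy in-memory log. This is a sketch of the idea only, not Kafka's API: the `EventLog` class and the consumer names are invented for the example, and retention (deleting events older than the configured period) is left out.

```python
from typing import Dict, List


class EventLog:
    """Toy append-only log: events stay in the log after they are read."""

    def __init__(self) -> None:
        self._events: List[str] = []
        self._offsets: Dict[str, int] = {}  # consumer name -> next position to read

    def append(self, event: str) -> None:
        self._events.append(event)

    def poll(self, consumer: str, max_events: int = 10) -> List[str]:
        """Return the next events for this consumer and advance only its position."""
        start = self._offsets.get(consumer, 0)
        batch = self._events[start:start + max_events]
        self._offsets[consumer] = start + len(batch)
        return batch


log = EventLog()
for event in ["order created", "batch shipped", "equipment signalled"]:
    log.append(event)

analytics_first = log.poll("analytics", max_events=2)  # slow reader, two at a time
monitoring_all = log.poll("monitoring")                # fast reader, takes all three
log.append("customer submitted a request")
analytics_rest = log.poll("analytics")                 # continues where it stopped
late_joiner = log.poll("new-system")                   # connected later, reads from the start
```

Each consumer has its own position in the stream, so a slow reader does not hold back a fast one, and a system connected later still sees the earlier events.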

Where this makes sense

Kafka is justified where there are multiple systems with high event frequency and where consistency of data across them matters.

Manufacturing: equipment signals, MES events, quality data - all are streams that need to be available to several consumers: operational control, analytics, predictive maintenance.

Logistics: shipment events, delivery confirmations, GPS points - an event stream that several systems need simultaneously.

Finance: transactions that need to reach analytics, fraud detection, and operational accounting without intermediate copies or data loss.

What to understand before making a decision

Kafka is a tool with operational requirements. A Kafka cluster needs to be maintained, monitored, and backed up. For a small company with simple integrations it is unnecessary overhead.

A few questions to help assess whether this is needed:

How many systems in the company are currently integrated with each other, and how complex are those integrations?
Are there tasks requiring event processing in real or near-real time?
Are there multiple systems needing the same data - currently solved by copying or synchronisation?
Does the company have the engineering resources to support an infrastructure component at this level?

If most of the answers are yes, Kafka can simplify the architecture. If not, simpler solutions will serve better.
