Event-driven architecture: what managers need to know before committing
Events and message queues solve real coordination problems between services. They also introduce complexity that is easy to underestimate from a project plan.
If you have read architecture content over the past few years, you have seen event-driven architecture positioned as the natural successor to REST APIs and synchronous service calls. The idea is that services communicate by publishing events to a message broker (Kafka is a common choice), and other services consume those events independently, without the producer needing to know who is listening.
This solves some real problems. It also introduces a different set of problems that are less visible in the early stages of a project.
What it actually solves
The problem event-driven architecture addresses most cleanly is tight coupling between services. When service A calls service B synchronously, A has to wait for B to respond, and if B is slow or down, A is affected directly. With events, A publishes and moves on. B processes when it is ready.
This matters at scale, and it matters when services are owned by different teams with different release cycles. A logistics service and an invoicing service can evolve independently if they communicate through a shared event log rather than direct API calls.
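To make the difference concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the service class, the function names, and the plain list standing in for the broker are illustrations, not any real broker's API. It assumes the broker itself stays available, which in practice is its own operational requirement.

```python
class InvoicingDown(Exception):
    """Raised by the hypothetical invoicing service when it is unavailable."""


class InvoicingService:
    def __init__(self) -> None:
        self.available = True
        self.invoiced: list[str] = []

    def create_invoice(self, order_id: str) -> None:
        if not self.available:
            raise InvoicingDown(f"cannot invoice {order_id}")
        self.invoiced.append(order_id)


def ship_order_sync(order_id: str, invoicing: InvoicingService) -> str:
    # Synchronous integration: logistics calls invoicing directly,
    # so an invoicing outage becomes a logistics failure.
    invoicing.create_invoice(order_id)
    return "shipped"


def ship_order_with_event(order_id: str, log: list[dict]) -> str:
    # Event-driven integration: logistics records what happened and moves on.
    log.append({"type": "OrderShipped", "order_id": order_id})
    return "shipped"


def drain(log: list[dict], invoicing: InvoicingService, start: int) -> int:
    """Process events from offset `start` onward; return the next offset."""
    offset = start
    while offset < len(log):
        invoicing.create_invoice(log[offset]["order_id"])
        offset += 1
    return offset


invoicing = InvoicingService()
invoicing.available = False  # simulate an invoicing outage
event_log: list[dict] = []   # stand-in for a broker topic

try:
    sync_result = ship_order_sync("order-1", invoicing)
except InvoicingDown:
    sync_result = "failed"

# Publishing succeeds even though invoicing is still down.
async_result = ship_order_with_event("order-1", event_log)

# Later, invoicing recovers and catches up from where it left off.
invoicing.available = True
next_offset = drain(event_log, invoicing, 0)
```

The synchronous path fails while invoicing is down; the event path succeeds, and the invoice is created once the consumer recovers. The work has not disappeared, it has been deferred.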
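The other benefit is auditability. An event log is a record of what happened and when. For some domains - financial transactions, inventory movements, order state changes - this is genuinely valuable. Because the log is retained, a new or repaired service can also replay past events to rebuild its own state.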
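As a rough illustration of why a retained log helps with audits, the sketch below answers "what was the stock level at a given moment?" by summing events up to that moment. The event shape, the field names, and the sample data are all made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class InventoryEvent:
    at: datetime   # when the movement happened
    sku: str
    delta: int     # positive = stock added, negative = stock removed
    reason: str


# Hypothetical sample log, oldest event first.
LOG = [
    InventoryEvent(datetime(2024, 1, 1, 9, 0), "widget", 100, "received"),
    InventoryEvent(datetime(2024, 1, 2, 14, 30), "widget", -30, "shipped"),
    InventoryEvent(datetime(2024, 1, 3, 11, 15), "widget", -5, "damaged"),
]


def stock_as_of(log: list[InventoryEvent], sku: str, moment: datetime) -> int:
    """Recompute the stock level for `sku` from every event up to `moment`."""
    return sum(e.delta for e in log if e.sku == sku and e.at <= moment)


end_of_jan_2 = stock_as_of(LOG, "widget", datetime(2024, 1, 2, 23, 59))
end_of_jan_3 = stock_as_of(LOG, "widget", datetime(2024, 1, 3, 23, 59))
```

A table that stores only the current stock level cannot answer the first question. The log can, and it also records why each change happened.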
What it makes harder
The first thing that gets harder is debugging. When a request fails in a synchronous system, you can follow it through a single chain of calls. When a message goes wrong in an event-driven system, you have to reconstruct its path from the event log and from the logs of every service that consumed it, each with its own clock and its own storage. This is possible, but it requires tooling and discipline that many teams underestimate.
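One common piece of that discipline is a correlation identifier: every event caused by the same original request carries the same ID, so the path can be reassembled without relying on timestamps. The sketch below shows the idea with plain dictionaries; the field names and helper functions are assumptions for illustration, not a standard.

```python
import uuid
from typing import Optional


def new_event(event_type: str, payload: dict,
              correlation_id: Optional[str] = None) -> dict:
    """Build an event; start a new correlation ID unless one is passed in."""
    return {
        "event_id": str(uuid.uuid4()),
        "correlation_id": correlation_id or str(uuid.uuid4()),
        "type": event_type,
        "payload": payload,
    }


def follow_up(cause: dict, event_type: str, payload: dict) -> dict:
    """Build an event caused by `cause`, inheriting its correlation ID."""
    return new_event(event_type, payload, cause["correlation_id"])


def trace(correlation_id: str, *service_logs: tuple) -> list:
    """Collect (service, event type) pairs for one request across services."""
    return [
        (service, event["type"])
        for service, log in service_logs
        for event in log
        if event["correlation_id"] == correlation_id
    ]


placed = new_event("OrderPlaced", {"order_id": "order-1"})
reserved = follow_up(placed, "StockReserved", {"order_id": "order-1"})
unrelated = new_event("OrderPlaced", {"order_id": "order-2"})

order_log = [placed, unrelated]   # what the order service recorded
inventory_log = [reserved]        # what the inventory service recorded

steps = trace(
    placed["correlation_id"],
    ("orders", order_log),
    ("inventory", inventory_log),
)
```

This only works if every service propagates the ID on every event it emits. That is the discipline part, and it has to be in place before the first incident, not after.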
The second thing is consistency. With a synchronous call, the caller gets a response that tells it whether the change was committed. With events, you get eventual consistency. The order service has published the event. Has the inventory service processed it yet? The warehouse management system? If a user checks their order status thirty seconds after placing it, the answer may depend on which service has caught up.
For some domains this is fine. For others it is a significant complication.
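A small simulation shows what "depends on which service has caught up" means in practice. Two hypothetical consumers read the same log at different speeds; the class, its methods, and the event fields are illustrative assumptions rather than any broker's client API.

```python
class Consumer:
    """A service that applies events from a shared log at its own pace."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.offset = 0                   # index of the next unread event
        self.status: dict[str, str] = {}  # this service's view of each order

    def poll(self, log: list[dict], max_events: int = 1) -> None:
        """Apply up to `max_events` unread events to the local view."""
        for event in log[self.offset:self.offset + max_events]:
            self.status[event["order_id"]] = event["status"]
            self.offset += 1

    def lag(self, log: list[dict]) -> int:
        """Number of published events this service has not processed yet."""
        return len(log) - self.offset


log = [
    {"order_id": "order-1", "status": "placed"},
    {"order_id": "order-1", "status": "paid"},
]

inventory = Consumer("inventory")
warehouse = Consumer("warehouse")

inventory.poll(log, max_events=2)  # fully caught up
warehouse.poll(log, max_events=1)  # one event behind

inventory_view = inventory.status["order-1"]
warehouse_view = warehouse.status["order-1"]
warehouse_lag = warehouse.lag(log)

# Once the warehouse consumer polls again, the two views converge.
warehouse.poll(log, max_events=1)
converged_view = warehouse.status["order-1"]
```

Until the second poll, the two services give different answers about the same order. Whether that window is acceptable is a business question as much as a technical one.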
The third thing is operational complexity. A message broker is a system you have to run, monitor, scale, and back up. Kafka is powerful, but it is not simple to operate. The teams I have seen struggle with event-driven architecture are often the ones that chose it before they had the operational capacity to run the infrastructure reliably.
The questions worth asking before choosing it
Before adopting an event-driven approach, I ask:
- What specific coupling problem are you solving? Is it a real problem at your current scale, or a projected problem?
- Does your team have experience operating a message broker in production?
- Are the domains involved tolerant of eventual consistency, or do they require synchronous confirmation?
- What is the cost of a delayed or dropped message in your specific context?
- Are the auditability and replay capability of an event log worth the added complexity, or would a simpler solution meet the requirement?
A practical framing
Event-driven architecture is not better or worse than synchronous integration in the abstract. It is the right tool for specific problems: high-throughput decoupled processing, audit log requirements, integration between systems with very different operational characteristics.
If you are a company of fifty people with two backend engineers, the operational overhead of running Kafka may not be justified for the coordination problem you actually have. A well-designed REST API with clear ownership and a documented contract solves most of what small and mid-size companies need, and is much cheaper to operate.
The architecture decision should follow the problem, not the trend.