m@ksim.pro
IT 4 min read

Event-driven architecture: what decoupling services actually buys you

A plain explanation of event-driven patterns for owners and managers - what problems they solve, what new problems they introduce, and when the tradeoff makes sense.

When a company's software has been growing for a few years, it usually reaches a point where changing anything in one part of the system requires knowing what it will break somewhere else. Teams start coordinating releases because service A calls service B directly, and B needs to be updated before A can be deployed. Testing becomes expensive because the whole chain needs to be up.

Event-driven architecture (EDA) is one of the main tools for addressing this. It is also one of the main sources of accidental complexity in systems that adopted it without a clear understanding of the tradeoffs. This post is my attempt at an honest inventory.

The core idea

In a synchronous, request-response system, service A calls service B and waits for an answer. A knows about B; A is coupled to B's interface and its availability.

In an event-driven system, service A publishes an event - "an order was placed," "a payment was received," "a user was deleted" - to a message broker. Service B (and service C, and service D) subscribe to the events they care about and react to them asynchronously. A does not know who is listening. A does not wait.

The decoupling is real and significant: adding a new subscriber requires no change to the publisher. Removing a subscriber requires no coordination. Services can be deployed independently because they are not calling each other at deploy time.
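To make the publish/subscribe shape concrete, here is a minimal sketch in Python. The `EventBus` class, the `order_placed` event name, and the two subscribers are all hypothetical, and the bus delivers in-process and synchronously, which a real broker would not. The point is only that `place_order` never names its consumers.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical in-memory stand-in for a message broker."""

    def __init__(self):
        # Maps an event type to the handlers subscribed to it.
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # The publisher hands the event to the bus and knows nothing
        # about who (if anyone) receives it.
        for handler in self._subscribers[event_type]:
            handler(payload)


bus = EventBus()
emails_sent = []
analytics_rows = []

# Two independent consumers of the same event.
bus.subscribe("order_placed", lambda event: emails_sent.append(event["order_id"]))
bus.subscribe("order_placed", lambda event: analytics_rows.append(event))


def place_order(order_id):
    # "Service A": publishes a business fact, references no subscriber.
    bus.publish("order_placed", {"order_id": order_id})


place_order("A-1")
```

Adding a third consumer here is one more `subscribe` call; `place_order` does not change.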

What this enables

Independent deployability. Teams can ship their services on their own schedule without coordinating with every other team that their service interacts with.

Resilience to downstream failures. If service B is down, events accumulate in the broker, provided the broker stores them durably. When B comes back, it processes the backlog. Service A never sees a failure.
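The backlog behaviour can be sketched with a plain queue. `QueueBroker` below is a hypothetical in-memory stand-in; a real broker would persist events to disk, which this sketch does not attempt. Publishing succeeds while nobody is consuming, and the consumer catches up later.

```python
from collections import deque


class QueueBroker:
    """Hypothetical stand-in for a broker queue (in memory, not durable)."""

    def __init__(self):
        self._queue = deque()

    def publish(self, event):
        # Succeeds whether or not any consumer is currently running.
        self._queue.append(event)

    def backlog(self):
        return len(self._queue)

    def drain(self, handler):
        processed = 0
        while self._queue:
            # Remove the event only after the handler succeeded, so a
            # crashing handler leaves it in the queue for a retry.
            handler(self._queue[0])
            self._queue.popleft()
            processed += 1
        return processed


broker = QueueBroker()

# Service B is "down": events are published but nothing drains the queue.
for order_id in range(3):
    broker.publish({"order_id": order_id})
backlog_while_down = broker.backlog()

# Service B comes back and works through the backlog in order.
handled = []
processed = broker.drain(handled.append)
```

Removing the event only after the handler returns is what makes redelivery possible, and redelivery is why consumers have to tolerate seeing the same event twice.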

Audit trails by default. An event log is a natural audit trail of what happened in the system and when. For regulated industries or complex debugging, this is significant.

New consumers without coordination. A new analytics service, a compliance reporter, a notification system can start consuming existing events without any change to the systems that produce them.

What this does not solve

EDA introduces its own category of complexity that teams often underestimate:

Eventual consistency is harder to reason about. In a synchronous system, after a successful call, you know the downstream state was updated. In an event-driven system, you know the event was published - you do not know when (or, in failure cases, whether) it was processed. UIs that need to reflect current state have to handle this explicitly.

Debugging is harder. A request trace that crosses three synchronous services is already complex. An event-driven flow where the same business action produces events that trigger five different handlers in five different services requires structured tracing to debug. Without investment in observability, you will not know what happened.
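One common building block for that tracing is a correlation ID: an identifier created with the first event of a business action and copied onto every event it causes. The sketch below assumes a hypothetical event envelope (`event_id`, `type`, `correlation_id`, `payload`) and two made-up handlers; it is not tied to any particular tracing library.

```python
import uuid

# Hypothetical shared trace log; in practice this would be a log
# aggregator or tracing backend.
trace_log = []


def new_event(event_type, payload, correlation_id=None):
    return {
        "event_id": str(uuid.uuid4()),
        "type": event_type,
        # Start a new correlation ID only for the first event of a flow.
        "correlation_id": correlation_id or str(uuid.uuid4()),
        "payload": payload,
    }


def handle_order_placed(event):
    trace_log.append((event["correlation_id"], "billing", event["type"]))
    # The follow-up event inherits the correlation ID of its cause.
    return new_event(
        "payment_requested",
        {"order_id": event["payload"]["order_id"]},
        correlation_id=event["correlation_id"],
    )


def handle_payment_requested(event):
    trace_log.append((event["correlation_id"], "payments", event["type"]))


def trace(correlation_id):
    """Return every (service, event type) step recorded for one business action."""
    return [(service, etype) for cid, service, etype in trace_log if cid == correlation_id]


root = new_event("order_placed", {"order_id": "A-1"})
follow_up = handle_order_placed(root)
handle_payment_requested(follow_up)
steps = trace(root["correlation_id"])
```

With the ID on every event and every log line, "what happened to order A-1" becomes a single filter instead of a search across five services.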

Message ordering and deduplication. In most real systems, events can arrive out of order and can be delivered more than once. Services consuming events need to be designed to handle this. "Idempotency" becomes a word your engineers need to think about for every event handler.
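A minimal sketch of an idempotent handler, assuming every event carries a unique `event_id` (an assumption, not something every broker provides). The `InventoryHandler` class is hypothetical and keeps its list of seen IDs in memory; a real service would store them alongside the data they protect, ideally in the same transaction.

```python
class InventoryHandler:
    """Hypothetical consumer that reduces stock when an order is placed."""

    def __init__(self, stock):
        self.stock = stock
        self._seen_event_ids = set()

    def handle(self, event):
        # A redelivered event is recognised by its ID and skipped,
        # so processing it twice has the same effect as processing it once.
        if event["event_id"] in self._seen_event_ids:
            return False
        self.stock -= event["quantity"]
        self._seen_event_ids.add(event["event_id"])
        return True


handler = InventoryHandler(stock=10)
event = {"event_id": "evt-1", "quantity": 2}

first = handler.handle(event)   # applied
second = handler.handle(event)  # duplicate delivery, ignored
```

This covers duplicates only. Out-of-order delivery needs a separate answer, such as a version or sequence number on the event.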

Schema evolution. Events have schemas. When a schema changes in a way that breaks existing readers, every consumer needs to be updated - or you need a versioning strategy. This is a real coordination cost.
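One possible versioning strategy is to put a version number on each event and have consumers convert older versions to the newest shape before handling them (sometimes called upcasting). The sketch below is illustrative only: the `schema_version` field, both event shapes, and the default currency are invented for the example.

```python
def upcast_order_placed(event):
    """Convert any known version of a hypothetical 'order_placed' event to v2."""
    # Events published before versioning existed are treated as v1.
    version = event.get("schema_version", 1)
    if version == 1:
        # Assumed v1 shape: a bare "amount" in cents with no currency.
        return {
            "schema_version": 2,
            "order_id": event["order_id"],
            "total": {"amount_cents": event["amount"], "currency": "USD"},
        }
    if version == 2:
        return event
    # Fail loudly on versions this consumer has never seen.
    raise ValueError(f"unsupported schema_version: {version}")


old_event = {"order_id": "A-1", "amount": 1999}
new_event_v2 = {
    "schema_version": 2,
    "order_id": "A-2",
    "total": {"amount_cents": 500, "currency": "EUR"},
}

upcast_old = upcast_order_placed(old_event)
upcast_new = upcast_order_placed(new_event_v2)
```

The handler itself then only has to understand one shape, and the coordination cost shrinks to keeping the conversion function current.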

When the tradeoff makes sense

EDA is worth the complexity when:

  • You have multiple independent teams that would otherwise be blocked by tight coupling.
  • You have clear, stable event semantics - business facts that are meaningful independently of who consumes them.
  • You have the operational maturity to manage a message broker, monitor consumer lag, and trace event flows.
  • The processes being modelled are genuinely asynchronous - notifications, reporting, audit, background enrichment.

It is not worth it when:

  • The system is small enough that direct calls are maintainable.
  • The "decoupling" is theoretical - in practice one team owns all the services.
  • The response to the event is always needed before returning to the user (this is a synchronous interaction dressed up as events).

The decision is not about whether EDA is good or bad. It is about whether the coupling problem you have is severe enough to justify the operational overhead you are taking on.
