Breaking up a monolith: why sequence matters more than speed
How to plan a migration away from a monolithic architecture without halting operations: sequencing, risks, and rollback points.
The conversation about splitting a monolith usually starts the same way. The system has grown and become hard to change, and every release is a high-stakes event. Teams get in each other's way. Deployment has turned into a ritual performed with fingers crossed. The solution seems obvious: break the monolith into services.
I have seen several of these projects. The most painful ones moved fast and worked on everything in parallel. The most successful ones thought through the sequence first.
Why sequence matters
A monolith is a live system. While you are rebuilding it, it continues to serve the business. This is not a refactoring in a quiet room - it is replacing the engine while the car is moving.
If you start on several modules simultaneously, you have several unfinished changes at once. Each is a source of instability. Stack them together and you stack the risks, and when something breaks you can no longer tell which change caused it.
A good migration moves like a surgeon: one incision, verify the patient is stable, then the next step.
How to choose where to start
You do not need to start with the largest module or with the one that irritates the team most. You need to start with the module that has the fewest dependencies and the clearest business value if the extraction succeeds.
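One way to make "fewest dependencies" concrete is to count call edges in a dependency map. The sketch below is only an illustration: the module names and the `DEPENDENCIES` map are hypothetical, and it assumes every module appears as a key in the map. It measures code-level coupling only, so business value, ownership, and shared-database access still need human judgment.

```python
from collections import defaultdict

# Hypothetical dependency map: module -> modules it calls directly.
# Assumption: every module appears as a key, even if it calls nothing.
DEPENDENCIES = {
    "billing": {"accounts", "notifications", "reporting"},
    "notifications": set(),
    "reporting": {"accounts", "billing"},
    "accounts": {"notifications"},
}


def coupling_scores(deps):
    """Return {module: (outgoing, incoming)} counts of direct call edges."""
    incoming = defaultdict(int)
    for targets in deps.values():
        for target in targets:
            incoming[target] += 1
    return {
        module: (len(targets), incoming[module])
        for module, targets in deps.items()
    }


def rank_candidates(deps):
    """Order modules from least to most coupled (outgoing + incoming edges).

    Ties are broken alphabetically so the ranking is deterministic.
    """
    scores = coupling_scores(deps)
    return sorted(scores, key=lambda module: (sum(scores[module]), module))


if __name__ == "__main__":
    for name in rank_candidates(DEPENDENCIES):
        outgoing, incoming = coupling_scores(DEPENDENCIES)[name]
        print(f"{name}: calls {outgoing}, called by {incoming}")
```

With this sample map, `notifications` ranks first because it calls nothing. Incoming edges still matter: every caller has to be redirected to the new service once the module moves out.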
Good candidates for the first step are modules with:
- clearly defined inputs and outputs;
- few internal calls to other parts of the system;
- a clear business owner;
- the ability to run in parallel: the new service and the old monolith both operate while traffic switches gradually.
That last point matters most. The strangler-fig pattern, in which a routing layer in front of the monolith hands traffic to the new service incrementally, keeps risk low at every step. You are never in an all-or-nothing situation.
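A minimal sketch of gradual switching, under stated assumptions: `StranglerRouter`, `bucket`, and the handler callables are hypothetical names, requests are routed by a customer identifier, and a failed call to the new service is safe to retry against the monolith (read-only or idempotent operations). A real setup would more likely do this in a gateway or proxy than in application code.

```python
import hashlib


def bucket(key: str) -> int:
    """Map a routing key to a stable bucket in the range [0, 100)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


class StranglerRouter:
    """Hypothetical facade in front of the monolith and the new service."""

    def __init__(self, legacy_handler, new_handler, rollout_percent=0):
        self.legacy_handler = legacy_handler
        self.new_handler = new_handler
        # 0 sends all traffic to the monolith, 100 sends all to the new service.
        self.rollout_percent = rollout_percent

    def handle(self, customer_id: str, request):
        # The same customer always lands in the same bucket, so raising the
        # percentage only ever moves customers from old to new.
        if bucket(customer_id) < self.rollout_percent:
            try:
                return self.new_handler(request)
            except Exception:
                # Assumption: the monolith can still serve this request and
                # retrying it there has no harmful side effects.
                return self.legacy_handler(request)
        return self.legacy_handler(request)
```

Hashing the customer identifier instead of picking randomly per request keeps each customer's experience consistent during the rollout. Setting `rollout_percent` back to 0 returns all traffic to the monolith without a deployment.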
What must be decided before you start
Before the first line of code is written for the new service, there need to be clear answers to a few questions.
What will the boundary between the new service and the remaining monolith look like? What is the API? Who owns the data that this module touches? How will traffic be switched, instantly or gradually? What happens if the new service fails: does traffic roll back to the monolith or not?
These are not purely technical questions. They are questions about how you manage risk and how you make decisions in the middle of the process.
Common traps
First: overestimating a module's independence. It looks isolated, but in reality it quietly reads from a shared database or calls a dozen internal functions. When this surfaces mid-migration, everything stops.
Second: migrating the code but not the data. The new service runs, but the data still lives in the monolith. Now you have a service that depends on the monolith. The architecture has not improved; it has just become more complex.
Third: no rollback point. If you cannot quickly go back, every step becomes irreversible. Pressure on the team grows, and mistakes become harder to fix.
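A rollback point is easier to honor when the criteria are written down before the switch. The sketch below is one possible shape, not a recommendation of specific numbers: the step list, the thresholds, and the `WindowStats` fields are all hypothetical. It also assumes the first move from 0 to a non-zero percentage is a deliberate human decision, since at 0 there is no traffic to measure.

```python
from dataclasses import dataclass

# Hypothetical rollout steps, as percentages of traffic on the new service.
ROLLOUT_STEPS = [0, 1, 5, 25, 50, 100]


@dataclass
class WindowStats:
    """Metrics for the new service over one observation window."""
    requests: int
    errors: int
    p95_latency_ms: float


def next_rollout(current, stats, max_error_rate=0.01,
                 max_p95_ms=300.0, min_requests=500):
    """Return the rollout percentage to use for the next window.

    All thresholds are illustrative assumptions and should be agreed on
    before any traffic is switched.
    """
    if stats.requests and stats.errors / stats.requests > max_error_rate:
        return 0  # rollback point: all traffic returns to the monolith
    if stats.p95_latency_ms > max_p95_ms:
        return 0  # too slow: also roll back
    if stats.requests < min_requests:
        return current  # not enough evidence yet, hold the current step
    higher = [step for step in ROLLOUT_STEPS if step > current]
    return higher[0] if higher else current


if __name__ == "__main__":
    healthy = WindowStats(requests=1000, errors=2, p95_latency_ms=120.0)
    failing = WindowStats(requests=1000, errors=50, p95_latency_ms=120.0)
    print(next_rollout(5, healthy))   # 25
    print(next_rollout(25, failing))  # 0
```

The point of the design is that rolling back is an ordinary return value of the same function that advances the rollout, so going back costs no more effort or debate than going forward.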
Questions before you start
If you are deciding whether to begin such a migration, ask:
- Is there a dependency map between modules in the monolith - not in theory, but a real, current one?
- Is the first extraction candidate chosen, and is it clear why that one?
- Is there a plan for the new service and the monolith to run in parallel?
- Is there a rollback point for the first step?
- Who decides to switch traffic - and based on what criteria?
If the answers exist, move forward. If not, get the answers first.