Feature flags in production: the promise and the maintenance debt
Feature flags let you ship safely and experiment cheaply, but every flag you add is a piece of logic you eventually have to remove.
A few years ago feature flags were a pattern used mainly by large tech companies. Today they are part of the standard toolkit at companies of almost any size. The pitch is straightforward: decouple deployment from release, control who sees a feature, roll back without a full deploy, run A/B tests in production.
All of that is real. The trouble is the second half of the story, which the pitch usually skips.
What feature flags actually give you
The concrete benefits I see in practice:
- You can deploy unfinished code without it being visible to users. The new checkout flow ships on Tuesday, goes live for ten percent of users on Thursday, and rolls out to everyone the following Monday after you confirm the numbers.
- You can test in production with real traffic, which behaves differently from anything a staging environment can simulate.
- Rollback is a config change, not a revert and redeploy. When something goes wrong at 11 pm, this matters.
- You can give internal users early access without a separate build.
These are meaningful benefits, especially for teams that deploy frequently and cannot afford long freeze windows.
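The percentage rollout described above is usually implemented by hashing each user into a stable bucket. The following is a minimal sketch of that idea, not any particular vendor's implementation; the names `bucket`, `is_enabled`, `new_checkout`, and the `internal_users` parameter are hypothetical and chosen only for illustration.

```python
import hashlib


def bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    # Use the first 8 bytes as an integer; the same input always gives the same bucket.
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag_name: str, user_id: str, rollout_percent: int,
               internal_users: frozenset = frozenset()) -> bool:
    """Return True if this user should see the flagged feature."""
    # Assumption: internal users get early access regardless of the percentage.
    if user_id in internal_users:
        return True
    return bucket(flag_name, user_id) < rollout_percent


# Thursday: ten percent of users. Monday: everyone.
print(is_enabled("new_checkout", "user-42", 10))
print(is_enabled("new_checkout", "user-42", 100))
```

Because the bucket depends only on the flag name and the user ID, raising the percentage only adds users: nobody who already had the feature loses it halfway through the rollout.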
The debt that accumulates
Every flag is a branch in your code. One flag means two paths. Ten independent boolean flags mean up to 2^10 = 1,024 theoretical combinations. In practice you test only a handful of them.
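To make the arithmetic concrete, here is a small sketch that enumerates every on/off assignment for a set of boolean flags. It assumes the flags are independent; the flag names are placeholders.

```python
from itertools import product


def flag_combinations(flag_names):
    """Yield every on/off assignment for the given flags as a dict."""
    for values in product([False, True], repeat=len(flag_names)):
        yield dict(zip(flag_names, values))


flags = [f"flag_{i}" for i in range(10)]  # placeholder names
print(2 ** len(flags))                           # 1024
print(sum(1 for _ in flag_combinations(flags)))  # 1024
```

Each additional flag doubles the count, which is why exhaustively testing every combination stops being realistic after a handful of flags.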
The flags that cause trouble are the ones that outlive their purpose. The experiment ended six months ago, the winner was obvious, but nobody removed the losing path. The codebase now has a flag check that still runs on every request, guarding a branch no user will ever reach. That branch is still covered by tests that still pass, and it will surprise the next engineer who touches that module.
I have seen teams with thirty to forty active flags in a production codebase where fewer than ten were intentionally active. The rest were "we should clean that up" items on a list that nobody maintained.
When flags create the opposite of safety
A flag added to mitigate a risky deployment can hide a deeper problem. If a team is adding flags because they are afraid to deploy directly, the flag is a symptom, not a solution. The underlying issue is usually that the deployment process itself is fragile or that the test coverage is too thin to give confidence.
Flags also introduce their own failure modes. The flag evaluation service goes down. The configuration file is malformed. The flag is set correctly in production but not in staging, so staging exercises the wrong path and the bug only surfaces in production at 2 pm on a Tuesday.
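One common way to limit the first two failure modes is to make flag evaluation fail safe: if the service or the config cannot be read, fall back to a known default instead of raising. The sketch below assumes a JSON config and an optional remote lookup callable; `SAFE_DEFAULTS`, `load_flags`, `evaluate`, and `remote_lookup` are hypothetical names, not part of any real flag library.

```python
import json
import logging

logger = logging.getLogger("flags")

# Hypothetical defaults: the value each flag takes when evaluation fails.
SAFE_DEFAULTS = {"new_checkout": False}


def load_flags(raw_config: str) -> dict:
    """Parse a JSON flag config; return an empty dict if it is malformed."""
    try:
        parsed = json.loads(raw_config)
    except json.JSONDecodeError:
        logger.warning("flag config is malformed; using safe defaults")
        return {}
    if not isinstance(parsed, dict):
        logger.warning("flag config is not a JSON object; using safe defaults")
        return {}
    return parsed


def evaluate(flag_name: str, config: dict, remote_lookup=None) -> bool:
    """Resolve a flag: remote service first, then local config, then safe default."""
    default = SAFE_DEFAULTS.get(flag_name, False)
    if remote_lookup is not None:
        try:
            return bool(remote_lookup(flag_name))
        except Exception:
            # The evaluation service is down or misbehaving; do not fail the request.
            logger.warning("flag service unavailable for %s; falling back", flag_name)
    value = config.get(flag_name, default)
    # Ignore values that are not booleans rather than guessing their meaning.
    return value if isinstance(value, bool) else default
```

Defaulting unknown or unreadable flags to off is a design choice, not a rule: for a kill switch that protects a dependency, the safer default may be the opposite.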
How to keep it manageable
The teams I have seen handle flags well share a few habits:
- Every flag has an owner and a planned expiry date set at creation.
- Removing a flag is treated as a first-class task, not optional cleanup. It goes into the sprint like any other ticket.
- The number of active flags is tracked as a metric. When it crosses a threshold, someone is assigned to reduce it.
- Flags are categorised: short-lived experiment flags have different lifecycle rules than long-lived operational flags.
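These habits can be encoded in a small flag registry. The following is a sketch under stated assumptions: the `Flag` and `FlagKind` types, the 30-day and 365-day lifetime limits, and the helper names are all hypothetical, and a real team would pick its own limits and threshold.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class FlagKind(Enum):
    EXPERIMENT = "experiment"    # short-lived
    OPERATIONAL = "operational"  # long-lived


# Assumed lifecycle rules; the specific limits are illustrative only.
MAX_LIFETIME = {
    FlagKind.EXPERIMENT: timedelta(days=30),
    FlagKind.OPERATIONAL: timedelta(days=365),
}


@dataclass(frozen=True)
class Flag:
    name: str
    owner: str
    kind: FlagKind
    created: date
    expires: date

    def __post_init__(self):
        # Enforce the rules at creation time, not during cleanup.
        if not self.owner:
            raise ValueError(f"{self.name}: every flag needs an owner")
        if self.expires - self.created > MAX_LIFETIME[self.kind]:
            raise ValueError(
                f"{self.name}: expiry exceeds the limit for {self.kind.value} flags"
            )


def overdue(flags, today):
    """Return the flags whose planned expiry date has passed."""
    return [f for f in flags if f.expires < today]


def over_budget(flags, threshold):
    """Return True when the active flag count crosses the agreed threshold."""
    return len(flags) > threshold
```

Running `overdue` in CI or a weekly report turns "we should clean that up" into a list with names and owners attached, which is what makes removal schedulable as a ticket.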
The honest framing for a manager
Feature flags are a useful tool that requires active maintenance to stay useful. If your team adds them without a removal process, you will eventually have a codebase where no one is confident about which code path is actually running in production.
The decision to adopt flags is also a decision to staff the ongoing work of keeping them clean. That is not a large burden. But it is a real one.