Why ML teams keep rebuilding the same data pipelines
The hidden cost of ML at scale is not the models - it is the duplicated feature engineering work every team does independently. What a feature store is and whether you actually need one.
When a company runs more than one or two machine learning models in production, a pattern emerges that nobody planned for. Each team that builds a model also builds a pipeline to compute the inputs - the features - that the model needs. A churn model computes days since last purchase. A pricing model computes the same thing. So does a recommendations model. Three teams, three pipelines, three maintenance obligations, three places where a bug in the underlying data produces three different wrong answers.
This is not an edge case. It is one of the most consistent patterns I see in companies that are past the pilot phase and running ML at any real scale.
Why it happens
Feature engineering - transforming raw data into the inputs a model uses - is where most of the actual work in ML lives. It is also the part that looks unique to each project and therefore easy to justify rebuilding.
In practice, many features are shared across models. Customer behaviour signals - recency, frequency, average order value - appear in almost every customer-facing model. Product availability and pricing signals appear in both demand forecasting and pricing models. Shared signals built independently will eventually diverge, because upstream data changes and each team fixes bugs on its own schedule.
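One way to avoid rebuilding these signals is to compute them once, in a single function every model calls. The sketch below is a minimal illustration, assuming orders arrive as simple records; the `Order` type, the field names and the `as_of` parameter are hypothetical, not part of any particular system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Order:
    # Hypothetical raw order record.
    customer_id: str
    order_date: date
    amount: float


def customer_behaviour_features(orders, as_of):
    """Compute recency, frequency and average order value per customer.

    Only orders on or before `as_of` are counted, so the same function can
    produce historical values (for training) and current values (for serving).
    """
    by_customer = {}
    for o in orders:
        if o.order_date <= as_of:
            by_customer.setdefault(o.customer_id, []).append(o)

    features = {}
    for cid, cust_orders in by_customer.items():
        last = max(o.order_date for o in cust_orders)
        total = sum(o.amount for o in cust_orders)
        features[cid] = {
            "recency_days": (as_of - last).days,
            "frequency": len(cust_orders),
            "avg_order_value": total / len(cust_orders),
        }
    return features


orders = [
    Order("c1", date(2017, 5, 1), 40.0),
    Order("c1", date(2017, 6, 10), 60.0),
    Order("c2", date(2017, 6, 20), 25.0),
]
feats = customer_behaviour_features(orders, as_of=date(2017, 7, 1))
```

Here `feats["c1"]` is 21 days since the last order, 2 orders and an average of 50.0. A churn model and a pricing model that both call this function cannot drift apart on these three values.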
The cost of divergence
The most obvious cost is maintenance. Three pipelines for the same feature mean three places to update when the source schema changes.
The less obvious cost is inconsistency between training and serving. A model is trained on historical features computed one way. In production, the same feature is computed by a different team's pipeline. The values are similar but not identical. The model performs worse than it did in evaluation, and no one can explain why.
This is called training-serving skew, and it is one of the harder problems to diagnose in production ML because it does not produce errors - it produces quietly degraded predictions.
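To make the skew concrete, here is a hypothetical pair of implementations of "days since last purchase", one per team. Both functions and the use of UTC timestamps are assumptions for illustration. One counts calendar-date boundaries, the other counts whole 24-hour periods. Both look reasonable, neither raises an error, and they disagree by a day for many customers.

```python
from datetime import datetime, timezone


def days_since_last_purchase_training(last_purchase: datetime, now: datetime) -> int:
    # Hypothetical team A (used to build training data): difference in calendar dates.
    return (now.date() - last_purchase.date()).days


def days_since_last_purchase_serving(last_purchase: datetime, now: datetime) -> int:
    # Hypothetical team B (used in production): whole 24-hour periods elapsed.
    return int((now - last_purchase).total_seconds() // 86400)


# Score every customer at noon on 1 July; each bought something on 30 June,
# one customer per hour of the day.
now = datetime(2017, 7, 1, 12, 0, tzinfo=timezone.utc)
purchases = [datetime(2017, 6, 30, hour, 0, tzinfo=timezone.utc) for hour in range(24)]

mismatches = sum(
    days_since_last_purchase_training(p, now) != days_since_last_purchase_serving(p, now)
    for p in purchases
)
```

`mismatches` comes out as 11: every purchase made after 12:00 on 30 June is one day old according to the training definition and zero days old according to the serving one. The model learned from one set of values and is scored on the other.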
What a feature store is
A feature store is a system that separates feature computation from model training and serving. Teams define features once in a central registry. The computation runs on a schedule and stores results in two places: a historical store for training, and a low-latency store for serving.
Models from different teams can use the same features. When a feature definition is updated, every model using it picks up the change. Training and serving read from the same store.
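A minimal in-memory sketch of the registry idea follows, assuming each feature is a plain Python function over a raw record. `FeatureDefinition`, `FeatureRegistry` and the version check are hypothetical names, not the API of any real product. Two models reference the same feature by name, so a new version of the definition reaches both without either team changing its code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class FeatureDefinition:
    name: str
    entity: str                       # the key the feature is computed for, e.g. "customer_id"
    version: int
    compute: Callable[[dict], float]  # raw record -> feature value


class FeatureRegistry:
    def __init__(self) -> None:
        self._features: Dict[str, FeatureDefinition] = {}

    def register(self, definition: FeatureDefinition) -> None:
        # Registering a newer version replaces the definition for every consumer.
        current = self._features.get(definition.name)
        if current is not None and definition.version <= current.version:
            raise ValueError(f"{definition.name}: version must increase")
        self._features[definition.name] = definition

    def vector(self, feature_names: List[str], record: dict) -> Dict[str, float]:
        # Models ask for features by name; they never hold their own copy of the logic.
        return {n: self._features[n].compute(record) for n in feature_names}


registry = FeatureRegistry()
registry.register(FeatureDefinition(
    "avg_order_value", "customer_id", 1,
    lambda r: r["total_spend"] / r["order_count"]))

churn_model_features = ["avg_order_value"]
pricing_model_features = ["avg_order_value"]
record = {"total_spend": 300.0, "order_count": 4, "refunded": 100.0}

before = registry.vector(churn_model_features, record)

# Version 2 excludes refunded spend; both models pick up the change.
registry.register(FeatureDefinition(
    "avg_order_value", "customer_id", 2,
    lambda r: (r["total_spend"] - r["refunded"]) / r["order_count"]))

churn_after = registry.vector(churn_model_features, record)
pricing_after = registry.vector(pricing_model_features, record)
```

The version check is one possible guard against a team accidentally overwriting a definition with an older copy; a real registry would also need to decide how models pinned to an old version are handled.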
This is not a new idea - some large technology companies built internal versions of this years ago. As of 2017, the ecosystem of open-source tools is limited, but the concept can be implemented with straightforward infrastructure: a feature computation layer, a table in a data warehouse for historical lookups, and a key-value store for online serving.
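The three components described above can be sketched with the Python standard library, assuming an in-memory SQLite table stands in for the warehouse and a plain dict stands in for the key-value store. The table and function names are hypothetical. Each call to `materialize` represents one scheduled run that writes the same computed values to both stores, so training and serving read identical numbers.

```python
import sqlite3
from datetime import date


def create_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")  # stand-in for a warehouse table
    conn.execute(
        "CREATE TABLE feature_history ("
        " feature TEXT, entity_id TEXT, as_of TEXT, value REAL,"
        " PRIMARY KEY (feature, entity_id, as_of))"
    )
    return conn


def materialize(conn, online, feature, values, as_of):
    """One scheduled run: append to history, overwrite the online copy."""
    rows = [(feature, eid, as_of.isoformat(), v) for eid, v in values.items()]
    conn.executemany("INSERT INTO feature_history VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    for eid, v in values.items():
        online[(feature, eid)] = v  # dict stands in for a key-value store


def training_value(conn, feature, entity_id, as_of):
    """Latest value computed on or before `as_of`, for building training sets."""
    row = conn.execute(
        "SELECT value FROM feature_history"
        " WHERE feature = ? AND entity_id = ? AND as_of <= ?"
        " ORDER BY as_of DESC LIMIT 1",
        (feature, entity_id, as_of.isoformat()),
    ).fetchone()
    return None if row is None else row[0]


def serving_value(online, feature, entity_id):
    """Current value, as a low-latency lookup at prediction time."""
    return online.get((feature, entity_id))


conn = create_store()
online = {}
materialize(conn, online, "recency_days", {"c1": 3.0, "c2": 10.0}, date(2017, 7, 1))
materialize(conn, online, "recency_days", {"c1": 4.0, "c2": 0.0}, date(2017, 7, 2))
```

The `as_of <= ?` filter returns the value that was available on a given date, which is what a training example for that date should see; ISO-formatted date strings sort correctly as text, so the comparison works in SQLite. After the second run, `training_value(conn, "recency_days", "c2", date(2017, 7, 2))` and `serving_value(online, "recency_days", "c2")` both return 0.0.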
When you actually need it
A feature store is justified when you have multiple teams building models independently, when the same features appear in more than one model, and when you have training-serving skew incidents that you cannot explain.
It is not justified when you have one or two models, a single data team, and relatively stable features. The overhead of operating a shared registry is real, and it adds coordination cost that a small team does not need.
A practical minimum
If you are not ready for a full feature store, a simpler version of the discipline is still useful: maintain a central notebook or document listing the canonical definitions of shared features - how each is computed, from which source, at what grain. When two teams are about to compute the same feature independently, point them to the shared definition and make one team's pipeline the single source for both.
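If the shared list is kept as structured data rather than free text, it can be checked before new pipeline work starts. A sketch, assuming the list lives in a Python module in a shared repository; the entry shown, the `orders` source and the `churn_features_daily` pipeline name are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class CanonicalFeature:
    name: str
    computation: str     # plain-language or SQL description of the logic
    source: str          # upstream table or system (hypothetical here)
    grain: str           # what one row represents
    owner_pipeline: str  # the single pipeline other teams should read from


# Hypothetical catalog entry; real entries would be written by the owning team.
CATALOG: Dict[str, CanonicalFeature] = {
    "days_since_last_purchase": CanonicalFeature(
        name="days_since_last_purchase",
        computation="calendar days between the run date and the latest completed order",
        source="orders",
        grain="one row per customer per day",
        owner_pipeline="churn_features_daily",
    ),
}


def existing_source(feature_name: str) -> Optional[str]:
    """Return the owning pipeline if the feature already has a canonical definition."""
    entry = CATALOG.get(feature_name)
    return None if entry is None else entry.owner_pipeline
```

A team about to build `days_since_last_purchase` gets back `"churn_features_daily"` and reads from that pipeline instead; a feature with no entry returns `None`, which is the signal to add a definition before building.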
This does not eliminate all the problems, but it catches the most expensive ones: divergent definitions and undiscovered duplication. The formal tooling can come later, once you know which features are actually shared and how much operational pain the current approach is causing.