Gradient boosting: the machine learning that already works in production
Why ensemble methods - random forests and gradient boosting - became the first real ML for business, and how a manager should think about them.
In conversations about machine learning, neural networks and deep learning tend to dominate. That is understandable - image recognition and language translation are compelling stories. But if you look at what actually works inside companies right now, in 2015, the picture is different.
Most tasks that businesses are solving with ML today - predicting customer churn, credit scoring, lead prioritisation, demand forecasting - are solved by ensemble methods: random forests and gradient boosting. These, not neural networks, are the workhorses of practical machine learning.
Why these methods work
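Ensemble methods build many simple models and combine their results. A random forest trains many decision trees, each on a random sample of the rows and with random subsets of the features, and then averages their answers (or takes a majority vote for classification). Gradient boosting builds trees sequentially: each new tree is fitted to the errors that the trees built so far still make, and its correction is added to the running prediction.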
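To make the sequential idea concrete, here is a minimal sketch of gradient boosting for a numeric target, written in plain Python. It is an illustration under simplifying assumptions, not how a production library does it: there is one input feature, each "tree" is a single split (a stump), and the error measure is squared error. The data and all function names are made up for the example; in practice you would use an existing library rather than code like this.

```python
# Minimal gradient boosting for regression with squared error.
# Each "tree" is a stump: one threshold on a single numeric feature.

def fit_stump(xs, residuals):
    """Find the threshold that best splits the residuals into two group means."""
    best = None
    # Skip the largest x: splitting there would leave the right side empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosting(xs, ys, n_trees=50, learning_rate=0.1):
    """Start from the average, then repeatedly fit a stump to the remaining errors."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for x, p in zip(xs, predictions)]
    return base, stumps, learning_rate


def predict_boosting(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mean_squared_error(model, xs, ys):
    return sum((predict_boosting(model, x) - y) ** 2
               for x, y in zip(xs, ys)) / len(ys)


# Made-up data: one feature, one numeric target.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [10, 11, 13, 30, 31, 33, 60, 62]

one_tree = fit_boosting(xs, ys, n_trees=1)
many_trees = fit_boosting(xs, ys, n_trees=100)

print("training error, 1 tree:   ", mean_squared_error(one_tree, xs, ys))
print("training error, 100 trees:", mean_squared_error(many_trees, xs, ys))
```

The error on the training data falls as trees are added, because every new stump is fitted to what the previous ones got wrong. Note that this only shows the mechanism: a falling training error says nothing yet about how the model will behave on new data.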
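For business data this works well for several reasons. Tree-based methods are largely insensitive to outliers in the input features, because a split depends only on the order of the values, not on their size. For the same reason they do not require feature normalisation. Missing values usually need less work than with other methods, although this depends on the implementation: some handle them directly, others require them to be filled in first. These methods often generalise well even on relatively small datasets. Their outputs can be inspected and explained, for example by looking at which features the trees rely on most - important for decisions that need to be justified.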
Neural networks can be more powerful, especially on images, audio, and text - but they require far more data, compute, and tuning time. For most business problems in 2015, ensemble methods deliver most of the achievable result at a fraction of the cost.
Which tasks fit
There are several task classes where gradient boosting already delivers measurable results today.
Event prediction - customer churn, loan default, purchase probability. The formulation is straightforward: we have historical data about customers and what happened to them. We want to predict what will happen with a new customer.
Ranking - prioritising leads for a sales team, sorting applications by probability of successful close. Here the model learns to order objects rather than simply classify them.
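As a rough illustration of what ranking means in practice, the sketch below sorts leads by a model score and checks how many of the top few actually closed. The lead IDs, scores, and outcomes are invented for the example, and the scores stand in for the output of a trained model.

```python
# Hypothetical leads: (lead_id, model_score, actually_closed).
# The scores stand in for probabilities produced by a trained model.
leads = [
    ("A", 0.91, True),
    ("B", 0.15, False),
    ("C", 0.78, True),
    ("D", 0.40, False),
    ("E", 0.66, False),
    ("F", 0.08, False),
    ("G", 0.55, True),
    ("H", 0.23, False),
]


def rank_by_score(items):
    """Order leads from the highest model score to the lowest."""
    return sorted(items, key=lambda item: item[1], reverse=True)


def precision_at_k(ranked, k):
    """Share of the top k leads that actually closed."""
    top = ranked[:k]
    return sum(1 for _, _, closed in top if closed) / k


ranked = rank_by_score(leads)
base_rate = sum(1 for _, _, closed in leads if closed) / len(leads)

print("call order:", [lead_id for lead_id, _, _ in ranked])
print("close rate in top 3:", precision_at_k(ranked, 3))
print("close rate overall: ", base_rate)
```

The comparison that matters to a sales team is the last two lines: if the close rate among the top-ranked leads is clearly higher than the overall rate, the ordering is worth following.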
Numeric forecasting - demand for the next period, cost estimation, load prediction. This is a regression task, and ensemble methods work reliably here too.
What stands between the idea and a working model
The most common misconception is that an ML project starts with choosing an algorithm. In practice, the algorithm takes a small fraction of the time. Most time goes elsewhere.
Data collection and labelling. You need a historical set of examples with known outcomes. If this data does not exist or was not stored in a usable form, collecting it is the first task.
Feature engineering. The algorithm works with numbers. Translating business data into numeric features, deciding what matters, removing noise - this is separate work that often requires a deep understanding of the business context.
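A small sketch of what "translating business data into numeric features" can look like. The customer record, its field names, and the plan names are all hypothetical; which features actually matter is exactly the business question described above.

```python
from datetime import date

PLANS = ["basic", "standard", "premium"]  # hypothetical plan names


def build_features(customer, as_of):
    """Turn one raw customer record into a list of numbers, as of a given date."""
    days_since_last_order = (as_of - customer["last_order"]).days
    tenure_days = (as_of - customer["signup"]).days
    # Order frequency, treating very new customers as having one month of tenure.
    orders_per_month = customer["orders"] / max(tenure_days / 30.0, 1.0)
    # A text category becomes one 0/1 column per possible value.
    plan_flags = [1.0 if customer["plan"] == plan else 0.0 for plan in PLANS]
    return [float(days_since_last_order), float(tenure_days), orders_per_month] + plan_flags


# Hypothetical raw record as it might come out of a CRM export.
customer = {
    "signup": date(2014, 1, 1),
    "last_order": date(2015, 3, 1),
    "orders": 12,
    "plan": "standard",
}

features = build_features(customer, as_of=date(2015, 3, 31))
print(features)
```

The `as_of` date is there on purpose: features for a historical example should use only what was known at that moment, otherwise the model learns from information it will not have when it is used.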
Evaluation and validation. How good is the model? How do we measure that? What is the cost of each error type in this specific context - a false alarm or a missed event? These questions need to be answered before training begins.
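The question about error costs can be turned into a simple calculation. The sketch below assumes a churn model that outputs a probability, and picks the cut-off above which a customer is flagged by comparing total cost. The validation data and both cost figures are invented for the example; the real numbers have to come from the business.

```python
# Hypothetical validation set: (predicted churn probability, customer actually churned).
validation = [
    (0.95, True), (0.80, True), (0.70, False), (0.60, True),
    (0.45, False), (0.35, True), (0.30, False), (0.20, False),
    (0.10, False), (0.05, False),
]

COST_FALSE_ALARM = 10.0    # assumed cost of a retention offer sent needlessly
COST_MISSED_EVENT = 100.0  # assumed cost of losing a customer we did not flag


def confusion_counts(scored, threshold):
    """Count hits, false alarms, missed events, and correct passes at a threshold."""
    tp = fp = fn = tn = 0
    for score, actual in scored:
        flagged = score >= threshold
        if flagged and actual:
            tp += 1
        elif flagged and not actual:
            fp += 1
        elif not flagged and actual:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def total_cost(scored, threshold):
    _, fp, fn, _ = confusion_counts(scored, threshold)
    return fp * COST_FALSE_ALARM + fn * COST_MISSED_EVENT


thresholds = [0.1, 0.3, 0.5, 0.7, 0.9]
for t in thresholds:
    print(t, confusion_counts(validation, t), total_cost(validation, t))

best_threshold = min(thresholds, key=lambda t: total_cost(validation, t))
print("cheapest threshold:", best_threshold)
```

With a missed event assumed to cost ten times more than a false alarm, the cheapest cut-off in this toy data is a low one: it is worth tolerating several false alarms to avoid missing a churning customer. Change the two cost figures and the answer changes, which is why they need to be agreed before training.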
Integration into the process. A model that works in an analyst's notebook is not yet working in the business. You need to decide how it receives data, how it delivers results, and who is monitoring its quality three months later.
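Monitoring does not have to be elaborate to be useful. As a sketch, assume the predictions and the real outcomes are logged each month; the check below flags months where accuracy dropped noticeably below the level measured at launch. The log, the baseline, and the tolerance are all hypothetical, and accuracy is used only for simplicity - the metric should be the one agreed for the business goal.

```python
def accuracy(pairs):
    """Share of cases where the prediction matched what actually happened."""
    return sum(1 for predicted, actual in pairs if predicted == actual) / len(pairs)


def months_needing_review(monthly_outcomes, baseline_accuracy, tolerance=0.05):
    """Return the months where accuracy fell more than `tolerance` below the baseline."""
    return [
        month
        for month, pairs in sorted(monthly_outcomes.items())
        if accuracy(pairs) < baseline_accuracy - tolerance
    ]


# Hypothetical log: month -> list of (predicted_event, actual_event).
monthly_outcomes = {
    "2015-01": [(True, True), (False, False), (True, True), (False, False), (True, False)],
    "2015-02": [(True, True), (False, False), (False, True), (False, False), (True, True)],
    "2015-03": [(True, False), (False, True), (True, True), (False, False), (True, False)],
}

flagged = months_needing_review(monthly_outcomes, baseline_accuracy=0.8)
print("months to review:", flagged)
```

A flagged month is a prompt for a person to look, not an automatic verdict: the drop may come from a change in the data, in customer behaviour, or in how outcomes are recorded.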
Questions to check readiness
When a company is considering an ML project for a specific task, I recommend answering a few questions first:
- Is there historical data with known outcomes - at least a year's worth?
- Who will use the model's output and how - what will change in that person's daily work?
- How will we know the model is performing well? Which quality metric matches the business goal?
- What happens when the model is wrong? How critical is each type of error?
- Who owns the model in six months - who monitors its quality and retrains it when needed?
If there are answers to these questions, a conversation about choosing a method makes sense. If not - start with those questions.