Recommendation systems: what they need before they work
What a recommendation system actually requires to function, and why most projects stumble before they ever reach the algorithm.
Recommendation systems are one of the most requested categories of AI projects. Almost every e-commerce site, content service, or platform eventually asks: "why aren't we doing recommendations the way Amazon or Netflix does?"
It is a reasonable question. But it hides the hardest part: Amazon and Netflix spent years collecting and structuring data before recommendations started working. Most companies look at the result and do not see the foundation.
What a recommendation is made of
A recommendation system in its simplest form says: "user A is similar to user B, and user B bought X - let's suggest X to user A." Or: "this item is often bought together with that one - let's show both."
For this you need three things: data about user actions, data about products or content, and enough history to detect patterns.
That sounds simple. In practice, each of those three elements turns out to be its own separate project.
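To make the "often bought together" idea concrete, here is a minimal sketch that counts how often pairs of items share a basket. The order data and the function names are invented for illustration; a production system would read from real purchase history and handle far larger volumes.

```python
from collections import Counter
from itertools import combinations

# Hypothetical order history: each inner list is one basket of item IDs.
orders = [
    ["kettle", "teapot", "mug"],
    ["kettle", "mug"],
    ["teapot", "mug"],
    ["kettle", "filter"],
]

def co_purchase_counts(baskets):
    """Count how often each unordered pair of items appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) removes repeats and gives each pair a stable order.
        for a, b in combinations(sorted(set(basket)), 2):
            counts[(a, b)] += 1
    return counts

def bought_together(item, counts, top_n=3):
    """Return up to top_n items most often bought in the same basket as `item`."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [other for other, _ in scores.most_common(top_n)]

pair_counts = co_purchase_counts(orders)
suggestions = bought_together("kettle", pair_counts)
```

Even this toy version depends on all three ingredients: the baskets are behavioural data, the item IDs must be consistent, and four orders are nowhere near enough history to trust the result.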
Problem one: events are not being collected
Recommendations are based on behaviour. You need data about what a user viewed, clicked, added to cart, bought, returned, and rated.
In most companies, some of this data exists in transactional systems but not in a form suitable for analysis. Purchases are in the ERP. Page views, if logged at all, sit in raw server logs. Clicks are often not captured anywhere.
Before building recommendations, you need to build event collection and accumulate history. That takes several months at minimum.
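As a rough illustration of what "structured" event collection can mean, the sketch below defines one record shape for all behaviour events and serialises it as a JSON line. The field names, the `UserEvent` class, and the list of event types are assumptions made for this example, not a standard schema.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional

# Assumed event vocabulary, taken from the behaviours listed above.
EVENT_TYPES = {"view", "click", "add_to_cart", "purchase", "return", "rating"}

@dataclass(frozen=True)
class UserEvent:
    user_id: str
    item_id: str
    event_type: str
    timestamp: str                 # ISO 8601 string, UTC
    value: Optional[float] = None  # e.g. a rating score; None for other events

    def __post_init__(self):
        # Reject unknown event types at write time, not at analysis time.
        if self.event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.event_type}")

def make_event(user_id, item_id, event_type, value=None, now=None):
    """Build an event, stamping it with the current UTC time unless `now` is given."""
    now = now or datetime.now(timezone.utc)
    return UserEvent(user_id, item_id, event_type, now.isoformat(), value)

def to_log_line(event):
    """Serialise one event as a single JSON line for an append-only log."""
    return json.dumps(asdict(event), sort_keys=True)
```

The point of a single schema is that views, clicks, and purchases land in one place with the same user and item identifiers, so history starts accumulating in a usable form from day one.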
Problem two: product data is not structured
Product recommendations require understanding what things are. Category, attributes, relationships between items - all of this needs to be in the system in a usable form.
In practice, the catalogue often looks different: incomplete descriptions, inconsistent categories, duplicates under different identifiers, no relationships between similar items. An algorithm will not recommend "similar products" if the system does not know they are similar. Dirty master data does not just slow down BI - it breaks any algorithm that depends on clean product or customer records.
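A first pass at catalogue quality does not need machine learning. The sketch below, using made-up rows and hypothetical field names (`sku`, `name`, `category`), flags items with a missing category and items whose normalised names collide under different identifiers.

```python
import re
from collections import defaultdict

# Hypothetical catalogue rows; real data would come from the product master.
catalogue = [
    {"sku": "A-100", "name": "Steel Kettle 1.7L", "category": "Kitchen"},
    {"sku": "B-220", "name": "steel kettle  1.7l", "category": "kitchen "},
    {"sku": "C-301", "name": "Ceramic Teapot", "category": ""},
    {"sku": "D-412", "name": "Travel Mug", "category": "Kitchen"},
]

def normalise(text):
    """Collapse whitespace, trim, and lowercase so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()

def audit_catalogue(rows):
    """Report possible duplicates (same normalised name) and empty categories."""
    by_name = defaultdict(list)
    missing_category = []
    for row in rows:
        by_name[normalise(row["name"])].append(row["sku"])
        if not normalise(row.get("category") or ""):
            missing_category.append(row["sku"])
    duplicates = {name: skus for name, skus in by_name.items() if len(skus) > 1}
    return {"possible_duplicates": duplicates, "missing_category": missing_category}

report = audit_catalogue(catalogue)
```

Exact matching on normalised names only catches the easiest duplicates, but the size of this report is already a useful signal of how much cleanup stands between the catalogue and a "similar products" feature.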
Problem three: not enough data for new users and new items
Recommendations work from history. A new user with no history, or a new item with no interactions, gives the system nothing to learn from: this is the "cold start" problem. For companies with a high proportion of new users or a rapidly changing catalogue, this can mean recommendations work worst exactly where they are needed most.
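Cold start strategies exist, such as falling back to overall popularity or to item attributes, but they have to be designed separately from the main algorithm.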
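One common cold start strategy is a popularity fallback: users with too little history get the most popular items they have not yet interacted with. The sketch below assumes events are `(user_id, item_id)` pairs and that `personalised` is whatever model serves users with enough history; the threshold `MIN_EVENTS` is an arbitrary illustrative value.

```python
from collections import Counter

MIN_EVENTS = 3  # assumed threshold below which a user counts as "cold"

def popular_items(events, top_n):
    """Return the top_n item IDs by interaction count across all users."""
    counts = Counter(item for _, item in events)
    return [item for item, _ in counts.most_common(top_n)]

def recommend(user_id, events, personalised, top_n=2):
    """Use the personalised model when there is enough history, else popularity."""
    history = [item for user, item in events if user == user_id]
    if len(history) < MIN_EVENTS:
        seen = set(history)
        # Fetch extra candidates so that filtering out seen items still leaves top_n.
        candidates = popular_items(events, top_n + len(seen))
        return [item for item in candidates if item not in seen][:top_n]
    return personalised(user_id, history)[:top_n]
```

The same pattern applies to new items in reverse: until an item has interactions, it can only be surfaced through its attributes, which is one more reason the catalogue data has to be clean.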
How to assess readiness honestly
Before moving toward the algorithm, a few questions deserve answers:
- Are we collecting user behaviour events in a structured form - and for how long?
- How complete and consistent is data about our product or content catalogue?
- What proportion of users are "new" and how often does the catalogue change?
- What will we count as success for the recommendation system - and how is that measured?
- Do we have A/B testing infrastructure to verify that recommendations actually work?
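For the last two questions, one simple way to check whether recommendations "actually work" is to compare conversion rates between a control group and a group that sees recommendations. The sketch below uses a standard two-proportion z-test; the function name is invented for this example, and choosing conversion as the success metric is itself an assumption that each project has to make for itself.

```python
from math import sqrt, erfc

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (a) and treatment (b).

    Returns the z statistic and the two-sided p-value under the
    normal approximation with a pooled standard error.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Illustrative numbers only: 5,000 users per arm.
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
```

A test like this is only meaningful if users were randomly assigned to the two groups and the sample size was fixed in advance, which is exactly what A/B testing infrastructure is for.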
A recommendation system is not a product you buy and switch on. It is a process built on top of data. The data comes first. "What machine learning for mid-size business actually requires - and what is still not real" is a useful reference before committing to any recommendation project.