m@ksim.pro

ML in fraud detection: where AI saves money and where it only complicates the investigation

A look at the decision loop and the explainability problem in machine-learning-based anti-fraud systems.

Anti-fraud systems are one of the earliest and most mature application areas for machine learning in business. Banks and payment systems have been using statistical models to detect fraud for years. You might think this problem is largely solved.

In practice I see a different picture. Companies roll out ML models and see a real effect: fewer fraudulent transactions slip through. But the rollout also brings new problems that were invisible before it.

Where the model genuinely helps

Machine learning in fraud detection works where a human analyst physically cannot keep up. Real-time transaction streams, thousands of operations per minute, multi-dimensional behavioural patterns - that is a job for a model, not a person.

A well-trained model catches things a human would miss: an unusual transaction time, an atypical geolocation, an abnormal sequence of actions before the transaction. More importantly, it catches combinations of weak signals, each innocent on its own, that together produce a high risk score.
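
To make the point about weak signals concrete, here is a minimal sketch, not a production model. It assumes each signal has a known likelihood ratio (how much more often it appears in fraudulent than in legitimate transactions) and that the signals are independent, which real data rarely satisfies. The signal names, ratios and base rate are invented for illustration.

```python
import math

# Hypothetical likelihood ratios: how much more often each signal is seen
# in fraudulent than in legitimate transactions. Values are illustrative.
LIKELIHOOD_RATIOS = {
    "unusual_hour": 8.0,
    "atypical_geolocation": 10.0,
    "abnormal_action_sequence": 12.0,
}

BASE_FRAUD_RATE = 0.001  # assumed prior: 1 in 1000 transactions is fraud


def risk_score(active_signals: set[str]) -> float:
    """Combine signals in log-odds space (naive independence assumption)."""
    log_odds = math.log(BASE_FRAUD_RATE / (1 - BASE_FRAUD_RATE))
    for name in active_signals:
        log_odds += math.log(LIKELIHOOD_RATIOS[name])
    # Convert log-odds back to a probability-like score in (0, 1).
    return 1 / (1 + math.exp(-log_odds))


print(risk_score({"unusual_hour"}))          # one weak signal
print(risk_score(set(LIKELIHOOD_RATIOS)))    # all three together
```

With these made-up numbers, any single signal leaves the score at around 1%, while all three together push it to roughly 49%. No individual signal would justify a block, but the combination clearly deserves attention.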

That is real value. Speed and coverage that manual analysis cannot match.

Where problems start

The problem appears not in detection, but in the next step: what to do with the result.

When a model flags a transaction as suspicious, someone must make a decision - block it, pass it, request confirmation. In an automated loop the system decides. In a manual loop an analyst decides. In a hybrid loop an analyst decides using the model score as input.
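
The three loops can be sketched as a simple routing function. This is an illustration under assumptions: the thresholds and the `Action` names are hypothetical, and a real system would tune them against fraud losses and the cost of blocking legitimate customers.

```python
from enum import Enum


class Action(Enum):
    PASS = "pass"
    REQUEST_CONFIRMATION = "request_confirmation"
    ANALYST_REVIEW = "analyst_review"
    BLOCK = "block"


# Illustrative thresholds, not recommendations.
CONFIRM_FROM = 0.30
REVIEW_FROM = 0.70
BLOCK_FROM = 0.95


def route(score: float) -> Action:
    """Map a model score to the next step in a hybrid decision loop."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score >= BLOCK_FROM:
        return Action.BLOCK                 # automated loop: the system decides
    if score >= REVIEW_FROM:
        return Action.ANALYST_REVIEW        # hybrid loop: a person decides, score is input
    if score >= CONFIRM_FROM:
        return Action.REQUEST_CONFIRMATION  # ask the customer to confirm
    return Action.PASS
```

Writing the thresholds down explicitly is the useful part: it forces a decision about which score ranges the system handles alone and which ones go to a person.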

This is where the first question appears: can the analyst understand why the model assigned that particular score? The models most often used here, ensembles of decision trees and neural networks, are black boxes from the analyst's point of view. They produce a score but not the reasoning behind it.

The explainability problem in investigations

When a fraud case reaches an investigation - internal or involving law enforcement - explainability becomes critical. "The model assigned a score of 0.94" is not evidence and is not an explanation. It is a number.
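
One common way to turn a bare number into something an analyst can discuss is to attach per-feature contributions, often called reason codes. The sketch below assumes a simple logistic model, where each contribution is just weight times feature value; tree ensembles and neural networks need a dedicated attribution method (SHAP is one widely used option). The weights, bias and feature names are hypothetical.

```python
import math

# Hypothetical weights of a logistic model; not taken from any real system.
WEIGHTS = {
    "night_time": 1.2,
    "new_device": 1.8,
    "country_mismatch": 2.1,
    "amount_zscore": 0.9,
}
BIAS = -4.0


def score_with_reasons(features: dict[str, float], top_n: int = 3):
    """Return the score plus the features that pushed it up the most."""
    contributions = {
        name: WEIGHTS[name] * value for name, value in features.items()
    }
    logit = BIAS + sum(contributions.values())
    score = 1 / (1 + math.exp(-logit))
    # Keep only features that raised the score, largest contribution first.
    reasons = sorted(
        ((n, c) for n, c in contributions.items() if c > 0),
        key=lambda item: item[1],
        reverse=True,
    )[:top_n]
    return score, reasons


score, reasons = score_with_reasons(
    {"night_time": 1, "new_device": 1, "country_mismatch": 1, "amount_zscore": 1.5}
)
print(round(score, 2), [name for name, _ in reasons])
```

Here the analyst sees the score together with its main drivers (country mismatch, new device, unusual amount) and can check each of them against the case, which a number alone does not allow.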

An analyst who cannot explain the model's decision ends up in an uncomfortable position. They either make a decision based on a number they do not understand, or they effectively ignore the model and reach their own conclusion independently.

In either case the value of ML diminishes - either because decisions are accepted uncritically, or because the model has become decoration rather than a tool.

The decision loop matters more than model accuracy

I have noticed that companies often optimise model metrics - precision, recall, AUC - and spend much less time thinking about how the model fits into the actual decision-making process.

The key questions for the decision loop:

  • Who makes the final decision - the system or a person, and at what score threshold?
  • How can an analyst challenge a model decision?
  • How are decisions and their rationale recorded?
  • How is decision quality measured - not model metrics, but outcomes on actual cases?
  • How is the model retrained on new fraud patterns?

If there are no clear answers to these questions, the quality of the model itself is secondary.
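
Two of these questions, how decisions are recorded and how decision quality is measured, can be sketched as a small data structure. This is one possible shape, not a standard: all field names and the two metrics are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DecisionRecord:
    transaction_id: str
    model_score: float
    decided_by: str      # "system" or an analyst identifier
    action: str          # "block", "pass" or "request_confirmation"
    rationale: str       # free text or reason codes explaining the action
    confirmed_fraud: Optional[bool] = None  # filled in once the case closes


def decision_quality(records: list[DecisionRecord]) -> dict[str, float]:
    """Outcome-based metrics over closed cases, separate from model metrics."""
    closed = [r for r in records if r.confirmed_fraud is not None]
    blocked = [r for r in closed if r.action == "block"]
    passed = [r for r in closed if r.action == "pass"]
    wrongly_blocked = sum(1 for r in blocked if not r.confirmed_fraud)
    missed_fraud = sum(1 for r in passed if r.confirmed_fraud)
    return {
        "false_block_rate": wrongly_blocked / len(blocked) if blocked else 0.0,
        "missed_fraud_rate": missed_fraud / len(passed) if passed else 0.0,
    }
```

The metrics are computed on what was actually done with each transaction, so they reflect the whole loop (model, thresholds and analysts), not the model in isolation.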

A practical test

Take the last ten cases where the system flagged a transaction and a human made the call. Ask the analyst to explain each decision. If the answer comes down to "well, the score was high", the decision loop needs rebuilding, regardless of model accuracy.

ML in fraud detection is a mature technology. But a mature technology does not mean that the process and infrastructure built around it in payment systems are mature too.
