m@ksim.pro
Security 4 min read

When AI exposes the debt sitting in your codebase

The first numbers from Anthropic's Glasswing project are not a story about a smart model. They are a story about how many old vulnerabilities live in the code we use every day.

Anthropic has published the first results from Glasswing - a program in which their Claude Mythos model looks for vulnerabilities in partner code. In six weeks of work with about fifty companies, the model surfaced more than ten thousand critical bugs. Cloudflare received two thousand findings, of which four hundred were rated high or critical. Mozilla fixed two hundred and seventy-one vulnerabilities in Firefox 150. Anthropic themselves scanned more than a thousand open-source projects and got twenty-three thousand potential issues, of which roughly ninety percent were confirmed as real after manual review.

Headlines of the form "AI found X bugs" are technically correct, but they hide what I think is the more important observation.

These bugs were already there

The model did not create these vulnerabilities. It saw them. The wolfSSL code in which Mythos found a critical certificate-forgery flaw has been around for years. Traffic from banks, IoT devices and corporate VPNs flows through it. The vulnerability lived quietly in production all that time, until someone finally looked at it from the right angle.

The Heartbleed story followed the same pattern: the OpenSSL flaw existed for years, but it took focused researcher attention to surface it. Glasswing suggests that such dormant problems do not number in the dozens or hundreds across the industry - they can number in the thousands inside a single large project.

That is the real news. Not "AI got better at finding bugs", but "the entire stack we run on is carrying a large reserve of unread technical debt". And that debt is starting to become visible.

The bottleneck has shifted

Anthropic report that detection speed has improved by roughly an order of magnitude. But open-source maintainers are now asking them to delay disclosure because they cannot ship patches fast enough. That is the interesting part. The bottleneck used to be finding problems. Now it is fixing them.

For a business leader this means something simple. When tools like this become widely available - and Anthropic say they are working on it - you will not get "a tool that finds all your bugs". You will get a long report that someone has to read, prioritise and resolve. A team without an established vulnerability-handling process will end up with a queue it cannot drain.
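The "queue it cannot drain" is simple arithmetic, and a short Python sketch makes it concrete. The function name and all numbers below are hypothetical, chosen only for illustration: the same team, fixing six findings a week, first with five new findings arriving weekly and then with fifty.

```python
def backlog_over_time(initial, arrivals_per_week, fixes_per_week, weeks):
    """Return the number of open findings at the end of each week."""
    backlog = initial
    history = []
    for _ in range(weeks):
        # New findings arrive, the team closes what it can; never below zero.
        backlog = max(0, backlog + arrivals_per_week - fixes_per_week)
        history.append(backlog)
    return history


# Hypothetical numbers: detection speeds up tenfold, fixing capacity does not.
before = backlog_over_time(initial=20, arrivals_per_week=5, fixes_per_week=6, weeks=8)
after = backlog_over_time(initial=20, arrivals_per_week=50, fixes_per_week=6, weeks=8)

print(before)  # [19, 18, 17, 16, 15, 14, 13, 12]
print(after)   # [64, 108, 152, 196, 240, 284, 328, 372]
```

In the first case the backlog shrinks slowly. In the second it grows by forty-four findings every week, and no amount of reading the report faster changes that. Only fixing capacity or stricter prioritisation does.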

What this means for the business

Accumulated debt in code is not an abstraction. It is real money and real risk, and we have already discussed how to translate it into business decisions. Until now, most of that debt was invisible and therefore absent from management conversations. That is changing.

For owners and executives this means three things.

First, prepare for the wave. If your business depends on open-source components, expect a steady flow of new CVEs over the next year in libraries that have been quietly working for years. That does not mean they have become worse. It means someone has finally looked at them carefully. The question "do we know which of these components are in our production" becomes a practical one.
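For the technical team, answering that question can start with something as small as the sketch below. It assumes you already have an inventory of deployed components and a list of advisories. The component names, advisory identifiers and the plain dotted-number version format are all invented for illustration, and real version schemes are often messier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    identifier: str   # hypothetical advisory ID
    component: str    # name of the affected library
    fixed_in: tuple   # first version containing the fix, e.g. (1, 4, 2)


def parse_version(text):
    """Turn '1.4.1' into (1, 4, 1). Assumes purely numeric dotted versions."""
    return tuple(int(part) for part in text.split("."))


def affected(inventory, advisories):
    """List advisories that apply to a component version we actually run."""
    hits = []
    for adv in advisories:
        deployed = inventory.get(adv.component)
        if deployed is not None and parse_version(deployed) < adv.fixed_in:
            hits.append((adv.identifier, adv.component, deployed))
    return hits


# Hypothetical production inventory: component name -> deployed version.
inventory = {"libexample-tls": "1.4.1", "libexample-json": "2.0.0"}

advisories = [
    Advisory("EXAMPLE-0001", "libexample-tls", (1, 4, 2)),
    Advisory("EXAMPLE-0002", "libexample-json", (1, 9, 0)),
    Advisory("EXAMPLE-0003", "libexample-xml", (3, 0, 0)),
]

print(affected(inventory, advisories))
# [('EXAMPLE-0001', 'libexample-tls', '1.4.1')]
```

The code is the easy part. The hard part is the inventory dictionary at the top: if nobody can produce it for your production systems, that is the gap to close first.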

Second, take an honest look at your own code. If third-party libraries have been accumulating debt for years, your own code most likely has too. Running these tools on your own code will show a picture that few want to see. Better to see it before someone outside the company does.

Third, rebuild the response process. A tenfold increase in detection speed demands a comparable increase in your capacity to patch. That is not only a matter for developers. It covers prioritisation, regression testing, the release cycle and customer communication during downtime. The list of questions worth asking after any major disclosure is becoming a regular operating practice rather than a reaction to a single incident.

Questions to bring to the team

A few questions worth raising with IT and security teams before the next such report lands as a CVE:

  1. What third-party components and libraries are currently in our production, and who maintains that list?
  2. How much time passes between an upstream patch and its rollout in our environment?
  3. If a report with a hundred vulnerabilities arrived tomorrow, who would prioritise them, and on what criteria?
  4. Which of our own modules have not been reviewed for the longest time - and which of them handle sensitive data?
  5. What part of our release process limits the speed of security patches most?
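
Question 3 is the one most teams cannot answer on the spot, so here is a minimal sketch of what "criteria" could look like once written down. The weights, field names and finding identifiers are assumptions made up for this example, not a standard scoring scheme. The point is only that the rule is explicit and can be argued about.

```python
from dataclasses import dataclass

# Hypothetical weights; a real team would agree on its own.
SEVERITY_WEIGHT = {"critical": 4, "high": 3, "medium": 2, "low": 1}


@dataclass
class Finding:
    identifier: str
    severity: str                 # one of the SEVERITY_WEIGHT keys
    internet_facing: bool         # reachable from outside the company
    handles_sensitive_data: bool  # touches personal or financial data


def priority(finding):
    """Higher score means fix sooner."""
    score = SEVERITY_WEIGHT[finding.severity]
    if finding.internet_facing:
        score += 2
    if finding.handles_sensitive_data:
        score += 2
    return score


def triage(findings):
    """Return findings ordered from most to least urgent."""
    return sorted(findings, key=priority, reverse=True)


findings = [
    Finding("F-1", "medium", internet_facing=True, handles_sensitive_data=True),
    Finding("F-2", "critical", internet_facing=False, handles_sensitive_data=False),
    Finding("F-3", "low", internet_facing=False, handles_sensitive_data=False),
    Finding("F-4", "high", internet_facing=True, handles_sensitive_data=False),
]

ranked = triage(findings)
print([f.identifier for f in ranked])  # ['F-1', 'F-4', 'F-2', 'F-3']
```

Note that under these assumed weights a medium-severity issue in an exposed module that handles sensitive data outranks a critical one in an isolated internal tool. Whether that is the right call is exactly the conversation worth having before the report arrives.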

It used to be possible to say "we have no known issues". That will stop being a defensible answer. Known issues will be common to everyone. The difference between companies will be in who knows how to work with them.
