
Before being the cofounder of Secure Agentics I was a freelance cybersecurity consultant, and before that I worked as Head of Adversary Simulation, which is a fancy title for head of a red team. Those days were filled with red/purple team engagements, research, exploit development, attack path mapping, defence evasion, social engineering, and everything in between. Side note: there is a never-ending stream of war stories that come from these engagements, so if that is something of interest, we can cover them in a bit more detail another time.
Anyway, one of the things that I used to spend a lot more time doing is looking for what we call 0-days. A 0-day is a vulnerability in a system that no one is aware of until you find it - hence the reference to day zero. 0-days are so feared in the world of cybersecurity because they are exploitable, sometimes with the exploit shared far and wide, before the vendor has had time to issue a fix. As such, there are usually a few days in which anyone using the product or system is massively exposed to risk until the vendor issues a patch.
Finding 0-days isn’t easy for the most part. With the proliferation of things like DevSecOps, pentesting, bug bounty programs, and VDPs, and with the security bar being raised every year (apart from in this AI era), it can be very tricky - some would say impossible - to find brand-new vulnerabilities in very secure systems that hundreds of previous researchers have somehow missed.
Over a six-year career in ethical hacking I found five, many of them entirely by chance, by being in the right place at the right time. The impact of these bugs can be enormous, and so finding them has long been considered the prime accolade in this world. As you can imagine, then, I’ve been particularly interested in hearing how the advancement of AI has changed this space.
It goes without saying that AI has come on leaps and bounds in its ability to understand and work with codebases. That is, in fact, the single most advanced area of AI development. So the natural next question is just how easy it is for AI to find bugs that humans have missed for years. For this we will look at two articles I found recently on the topic.
The first article is one I read a week or so ago titled ‘On the Coming Industrialisation of Exploit Generation with LLMs’. Researcher Sean Heelan ran a series of practical tests in which he built agents on top of GPT 5.2 and Opus 4.5 (both of which have recently been replaced by newer counterparts, btw) and tasked them with writing exploits for a 0-day vulnerability in the QuickJS JavaScript engine.
To clarify, the 0-day was already found, and now Sean wanted AI to weaponise it. This is a very real-world scenario, as public disclosure of a vulnerability often comes days before a fix has been issued. Usually in this gap researchers and attackers are frantically trying to reverse engineer the vulnerability so that they can protect themselves against it - or, in my case, use it on a red team engagement. Sean explicitly calls this out, saying ‘as the vulnerability is a zeroday with no public exploits for it, this capability had to be developed by the agents through reading source code, debugging and trial and error’.
So, with an understanding of what the vulnerability was in theory, Sean tasked the agents with 40 exploit attempts across 6 scenarios to actually weaponise this information. The results? GPT 5.2 solved every scenario in less than an hour for dirt cheap. Opus solved all but two. That is terrifying, as this test recreated exactly a scenario which is playing out on a daily basis:
1. A 0-day vulnerability is discovered in a product.
2. The vulnerability is publicly disclosed before the vendor has issued a patch.
3. The vendor eventually releases a fix and the window closes.
What Sean recreated is someone just after step 2 saying ‘well, why don’t we give the codebase of the product to AI, tell it there is a vulnerability, and see what it can do’ - and the AI found ways of exploiting it, in GPT 5.2’s case with a 100% success rate. Now, obviously, there are a ton of caveats with such a small sample size, but this has huge repercussions.
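To make the ‘trial and error’ concrete, here is a minimal sketch of the kind of loop such an agent runs - entirely my own illustration, not Sean’s actual harness. The `propose` and `execute` callables are hypothetical stand-ins for the LLM call and a sandboxed QuickJS run respectively: propose a candidate exploit, run it, and feed any failure output back as context for the next attempt.

```python
from typing import Callable, Optional, Tuple

def exploit_loop(
    propose: Callable[[str], str],               # stand-in for the LLM: feedback -> candidate
    execute: Callable[[str], Tuple[bool, str]],  # stand-in for a sandboxed run: candidate -> (crashed?, output)
    max_attempts: int = 40,
) -> Tuple[Optional[int], Optional[str]]:
    """Trial-and-error exploit development: repeatedly ask the model for a
    candidate, run it against the target, and feed failures back as context."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        candidate = propose(feedback)
        crashed, output = execute(candidate)
        if crashed:
            return attempt, candidate  # working trigger found
        feedback = output              # error output becomes the next prompt's context
    return None, None

# Toy usage with stubs: the "model" only fixes its payload once it has
# seen the error message from the first failed run.
def fake_model(feedback: str) -> str:
    return "payload_v2" if "bad offset" in feedback else "payload_v1"

def fake_sandbox(candidate: str) -> Tuple[bool, str]:
    return (True, "segfault") if candidate == "payload_v2" else (False, "bad offset")

attempt, winner = exploit_loop(fake_model, fake_sandbox)
print(attempt, winner)  # → 2 payload_v2
```

The key design point is that the loop accumulates evidence: each failed run narrows the model’s search, which is exactly why this works so much better than blind retries.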
The second article is titled ‘Evaluating and mitigating the growing risk of LLM-discovered 0-days’ and comes from the red team within Anthropic. What they’ve found is that, unsurprisingly, Opus 4.6 (the latest and greatest) is able to discover high-severity 0-days in open-source software packages that have previously faced a high degree of research scrutiny - to the order of 500+ vulnerabilities.
It does this by going far beyond most historic automated approaches, which largely amount to throwing huge amounts of data at what you are trying to hack and seeing what sticks. Here the model brings a deep understanding of the code, the way a human researcher would: looking at past fixes to find similar bugs that weren’t addressed, and so on. A notably different approach, which explains its efficacy. Specifically: ‘when we pointed Opus 4.6 at some of the most well-tested codebases (projects that have had fuzzers running against them for years, accumulating millions of hours of CPU time), Opus 4.6 found high-severity vulnerabilities, some that had gone undetected for decades.’
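As an illustration of the ‘past fixes’ idea - my own crude sketch, not Anthropic’s tooling - suppose a historical patch replaced an unchecked `strcpy` with a bounded copy. The simplest form of variant analysis just scans the rest of the tree for the same unchecked pattern that the fix removed elsewhere. The model does something far richer than a regex, but the shape of the hunt is the same:

```python
import re
import tempfile
from pathlib import Path

# Hypothetical pattern lifted from a past fix: unchecked strcpy calls.
VULN_PATTERN = re.compile(r"\bstrcpy\s*\(")

def find_variants(root: str, pattern: re.Pattern = VULN_PATTERN) -> list:
    """Scan a C source tree for occurrences of a pattern that a past fix
    removed elsewhere - the crudest form of variant analysis."""
    hits = []
    for path in sorted(Path(root).rglob("*.c")):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if pattern.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits

# Toy usage: one file still carries the unchecked call, one was fixed.
tmp = tempfile.mkdtemp()
Path(tmp, "parser.c").write_text("void f(char *d, char *s) { strcpy(d, s); }\n")
Path(tmp, "safe.c").write_text("void g(char *d, char *s) { strncpy(d, s, 8); }\n")
for path, lineno, line in find_variants(tmp):
    print(lineno, line)  # → 1 void f(char *d, char *s) { strcpy(d, s); }
```

Where a human (or a model) improves on this is in understanding *why* the original code was wrong, so it can spot semantically similar bugs that no textual pattern would catch.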
If you are interested, you can look at the details of what types of bugs it found; a few examples of the affected repos include Ghostscript, OpenSC and GGIF. Personally, I am more interested in the takeaways from the above articles.
This should set off alarm bells. What both of these pieces show very clearly is that we are blowing open the barrier to entry for exploit development…yay. This has historically been one of the most complex, specialised parts of offensive security, requiring deep, hard-won expertise (often earned by getting this stuff wrong and burning a red team engagement) and usually a dedicated specialist within an already-specialised team. That requirement is disappearing. Exploit discovery and development is becoming repeatable, parallelisable, and cheap.
Then there is the speed of this. Work that would previously take weeks or months of human effort can now be compressed into hours. If that capability is pointed deliberately at large cohorts of packages, core open-source components, or entire technology stacks, the impact of AI being weaponised by those who know how to get the most out of it could be enormous.
Crucially, this also closes one of the last meaningful gaps in end-to-end AI-driven attack paths. We already know AI works well for reconnaissance, web vulnerability discovery, social engineering, and large parts of post-exploitation. Exploit development has now been shown to be viable too. Already with today’s technology we have what we need for huge AI-driven cyber operations; it’s just about joining the dots. Once those pieces are orchestrated effectively, large-scale offensive cyber operations become cheaper, faster, and accessible to actors who would never previously have had the capability. Many of the defensive assumptions the industry still relies on simply do not hold anymore.
