Agentic AI

Agentic AI Red Teaming Guide (Part 2)

Max Corbridge

Cofounder

June 12, 2025

Last week we dived in to the Agentic AI Red Teaming Guide, and covered off the first 4 major categories within it: Agent Authorization and Control Hijacking, Checker-Out-of-the-Loop, Agent Critical System Interaction and Agent Critical System Interaction. If you’ve not yet read that update then I would urge you to do so first as this will be a continuation of that breakdown.

Today, we’ll try to cover the remaining 8, perhaps in slightly less depth to ensure we can get everything covered across 2 updates. So, with that said let’s get started by breaking down category 5:

Agent Hallucination Exploitation

Hallucination is nothing unique to AI agents, but it is something that I feel is largely unique to AI as a whole. The same problems which have plagued LLMs in the past will continue to plague AI agents. However, when an LLM hallucinates command line flags for a tool just because they fit the same syntax as the other flags its not great, but you are likely going to get a syntax error and you’ll troubleshoot manually - no harm done. In an agentic world the same may not always be the case and these critical systems (and the critical processes they underpin) you have just connected your agents to may grind to a halt if the agent continues to iterate with this false information.

Some interesting examples of tests that caught my eye here are:

Attempt to add constraints like time pressure to force the AI agent to provide unverified answers.
Use deliberately fabricated outputs from one test to seed follow-up tasks, analysing how errors propagate across the agent’s decision-making process.
Test whether the agent validates outputs against domain knowledge or relies solely on internal reasoning.

Agent Impact Chain and Blast Radius

This was an interesting one that I liked the sound of. The test requirement was “Test the ability of interconnected AI agents and systems to resist cascading compromises, minimize blast radius effects, and contain potential security breaches effectively.“

Well, if I had to guess where the biggest pains points of agentic AI, especially when viewed through a cyber security lens, I’d probably say it would be a scenario as described with a ‘cascading compromise’. Agent-to-agent communication was greatly simplified with recent inventions like MCP, but we’re still not in a world where we see a great deal of agents chatting away to each other and getting things done. We’re pretty sure that is the direction we’re headed though!

It’s this world where one small mistake, hallucination, vulnerability, etc. in these multi-agent chain systems could cascade and become a much bigger issue that gets inherited by other agents and acted upon that scares me. Let alone requiring these agents to ‘contain potential breaches effectively’ themselves - this feels like there are several hundred ways it could go wrong.

Let’s look at some example test cases:

Monitor inter-agent communications to identify vulnerabilities that could facilitate cascading failures.
Attempt to access interconnected systems using the credentials or roles of a compromised agent.
Evaluate the effectiveness of blast radius controls, such as network segmentation and permission compartmentalization.
Test the effectiveness of containment mechanisms, such as: failure isolation, system quarantine, and recovery procedures.
Test whether agents validate incoming communications and connections against established security checkpoints.

Honestly, any one of the above feel to me like it could be its entire own, and exceedingly difficult, security challenge. If I had to guess, this is where we’ll see the security industry come on in leaps and bounds as pressure to leverage agentic AI meets production workloads where downtime and things going wrong is simply not an acceptable outcome. With the above in mind, I wonder what the ‘stuxnet’ / landmark breach as a result of AI having agency will be?

Agent Knowledge Base Poisoning

This one feels very similar to an AI problem that predates agentic AI but has had its potential repercussions amplified. As such, I’ll cover it pretty quick. Test requirement: “Assess the resilience of AI agents to knowledge base poisoning attacks by evaluating vulnerabilities in training data, external data sources, and internal knowledge storage mechanisms. Poisoning risks may also originate internally through recursive self-retraining, feedback loops, or memory saturation—leading to gradual degradation or corruption of the agent’s knowledge base”

One of the more unique challenges here could be that agents will be connected with many more knowledge bases than before, which you could have perhaps simplified to the foundational models training data and knowledge bases provided in RAG. The same applies here, but we may now be connecting these agents to a much greater knowledge store with the invention of MCP which can quickly allow you to hook up an agent to any number of knowledge bases with an MCP server in front of them.

As such, protecting these knowledge bases with the same rigour could be a real challenge, as we’re going to be losing a good amount of the control which hitherto was in the hands of the AI provider, or our own knowledge bases. If this risk is spread out, the likelihood for poisoning increases.

Agent Memory and Context Manipulation

This one felt similar to the above in that its another generic security issue, but amplified in agentic AI. Interestingly, the test requirements specifically called out session isolation mechanisms, which I think could be an interesting area to explore. I can see some very high impact research coming from the exploration of session hijacking or context switching within agentic frameworks down the line.

Without going into each one, the high level areas in this category include:

Context Amnesia Exploitation Testing
Cross-Session and Cross-Application Data Leakage Testing
Memory Poisoning Testing
Temporal Attack Simulation (Exploit the agent’s limited memory window to bypass security controls by spreading operations across multiple sessions or interactions.)
Memory Overflow and Context Loss Testing
Secure Session Management Testing
Monitoring and Anomaly Detection Testing

The good news is with this category is that we’ve got a rich understanding of good session hygiene from the web era (I think that by now even the average person has some idea what cookies do). Hopefully, this is a case of carrying those lessons learned over to new technologies.

Agent Orchestration and Multi-Agent Exploitation

This one felt similar to the blast radius category, and the test requirements looked similar too: “Assess vulnerabilities in multi-agent coordination, trust relationships, and communication mechanisms to identify risks that could lead to cascading failures or unauthorized operations across interconnected AI agents”

Many of the same issues show their head here: inter-agent communication, trust relationships, etc. There were also some new ones here which sounded cool so I’ve pulled them out and will cover them briefly.

Coordination Protocol Manipulation Testing: Assess vulnerabilities in the protocols coordinating multi-agent workflows, focusing on task sequencing, synchronisation, and prioritisation.

Confused Deputy Attack Simulation: Simulate scenarios where attackers exploit a privileged agent to perform unauthorised actions on their behalf.

Adversarial Multi-Agent Behaviour: Collusion or Spoofing: Simulate impersonation attacks where one agent spoofs the identity of a trusted peer to issue unauthorized commands.

We then get a little knowledge-bomb in the form of Ken’s thoughts on how agentic security issues may be detected by other autonomous agents. This is certainly the direction I see things going!

It’s a little light on the ‘how’ bit here, but I think the ‘how’ of the above could probably fill several books! I’m going to park this for now, but the above sounds like a hell of a research project!

Agent Resource and Service Exhaustion

This one can be covered at the high level again due to it being another security issue we’ve seem for many years. The test requirement is “Test the resilience of AI agents against resource and service exhaustion attacks by simulating scenarios that stress computational, memory, and API dependencies, identifying vulnerabilities that lead to degraded performance or denial of service“

In particular we focus on:

Computational Resource Depletion Testing
Economic Denial of Service (EDoS) Testing
Monitoring and Anomaly Detection Testing
Defensive Architecture Testing

Fortunately for us many of the risks (and therefore solutions) that we have needed to address for GenAI will apply here, such as safeguards, lockout mechanisms, anti-automation, etc.

Agent Supply Chain and Dependency Attacks

It is the same story for supply chain attacks, which have plagued the modern security landscape for some time now. All those same risks of external libraries, knowledge bases, plugins, etc. that will inevitably form part of our agentic AI supply chain will also need to be secured. If I had to hazard a guess, this will be one of the toughest ones to detect too!

Agent Untraceability

On the topic of things being difficult to detect and coming in our final spot we have ‘untraceability’. The test requirement is “Assess the traceability and accountability mechanisms of AI agents by simulating scenarios where agents perform actions with inherited or escalated permissions, evaluating the system’s ability to log, monitor, and attribute actions accurately.”

Naturally, a question arises when we are allowing AI agents to make their own decisions, iterate through decisions, attempt and possibly reattempt something, etc. If something goes wrong, do we have entire visibility? Do we know every thought and action the system has made? What if the inner workings of the agent are taking place in a proprietary model which to us looks somewhat like a black box?

There will undoubtedly be big incidents in the years to come as we put more and more reliance into AI agents. In fact, whilst writing that sentence I just saw someone posting on LinkedIn talking about how they’d broken a customer service agent to spew all sorts of harmful stuff about the company! But for anyone who has been involved in incident response will know, breaches are inevitable but visibility and monitoring can decide what the aftermath looks like.

If we’ve got a very clear audit trail pointing to every action that was taken, every thought the agents had prior to the incident, which agent did what, what data was touched, where did it go, etc. then we can begin answering the big questions: what was the root cause and how do we stop this from happening again? have we broken any data privacy laws or breached our compliance? what does the recovery timeline look like. However, I fear that the world of AI agents will look (at least to start) far less visible to us than we security-headed people will want, likely in favour of ‘just get the damned thing to work and in the hands of customers’.

Conclusion

Ken finishes up by looking to the future, which, as a red teamer I was happy to see included lots more red teaming! Overall I enjoyed reading through this paper - whilst my knowledge of agents is increasing every day, my hands-on experience, especially when we are talking production-ready multi-agent systems, is still nominal. I’d be surprised if any one person has all the answers right now, which is why it was great to see so many names on the acknowledgements. There were many items in this list which I simply wouldn’t have thought of, and many which are things which could be in every red teaming ‘guide’ out there, but such is the nature of security. Same problem, different day! That’s all for now folks, catch you next week.