Agentic AI

OWASP Agentic Top 10 (Part 2)

Max Corbridge

Cofounder

January 8, 2026

Firstly, happy new year! I hope you've all had a great festive break and had some time to unwind and decompress! After some good family time, gaming, golf and ruminating on the year ahead it's all hands back on keyboards for me and the team here at Secure Agentics.

I just wanted to say another thanks to all of those who have subscribed, which I'm pleased to say has really started to snow ball and pick up a bit. I met with some friends at a conference in December and one said how he was enjoying the newsletter as it served as a way of staying up to date with what was happening in the AI security space, which was music to my ears. I have previously mentioned one of my ambitions is for this newsletter to serve as an evolving timeline portraying what happens in the AI & AI security space in the years to come. It is with this then that I promise to keep delivering weekly updates throughout 2026, and wish you and your loved ones the very best year possible!

So, with the pleasantries out the way let's start with the real reason you are here: the OWASP agentic top 10! For those who missed the first half of this blog it can be found here. In it we covered the first 5 of the top 10 agentic AI security risks, which were:

Agent goal hijacking
Tool misuse & exploitation
Identity & privilege abuse
Supply chain
Remote code execution

So, without further ado let's pick up where we left off and get started on the remainder, kicking off with.

ASI06: Memory & Context Poisoning

Now, this was has been a really interesting one for me for a long time. I've talked about it at a high level in some of the conference talks I did over the last few months but if I'm honest it seemed a very theoretical one that lacked some substance in how to actually weaponise it. The idea here is the agents use memory and context to do what they do. In fact, it's one of the things that uniquely makes something agentic, in that these agents are often running over long periods and / or complex tasks where they are required to keep a mental note of what they have done, what they are currently doing, and the context behind what is happening.

Now, as you can imagine if you are able to control and / or poison the agents understanding of memory and context then you could pull off some very interesting and cool attacks. For example, what if you could poison a banking agent's memory to make it forget that it had just sent you money and get it to send you the money again? Now we are starting to see where this can go wrong.

The release paper describes this as 'adversaries corrupt or seed this context with malicious or misleading data, causing future reasoning, planning, or tool use to become biased, unsafe, or aid exfiltration', and how this has (as we are starting to see already become a slight problem with this top 10) lots of similarities between goal hijacking and prompt injection.

Let's take a look at some examples from the doc:

RAG and embeddings poisoning: Malicious or manipulated data enters the vector DB via poisoned sources, direct uploads, or over-trusted pipelines. This results in false answers which are considered and targeted payloads.
Context-window manipulation: An attacker injects crafted content into an ongoing conversation or task so that it is later summarized or persisted in memory, contaminating future reasoning or decisions even after the original session ends.
Cross-agent propagation: Contaminated context or shared memory spreads between cooperating agents, compounding corruption and enabling long-term data leakage or coordinated drift

ASI07: Insecure Inter-Agent Communication

As ever, insecure communication has made the list. This is a hallmark of any security list due to the fact that communication standards, especially for technologies that are still in their infancy, are often lacking security rigour. Agents are no different, and especially multi-agent systems which are communicating by any number of APIs, message buses and shared memory.

This problem is exacerbated by the fact that agents are currently being built with decentralised architecture and varying autonomy and trust models. Therefore, weak inter-agent controls for authentication, integrity, confidentiality, or authorization let attackers intercept, manipulate, spoof, or block messages.

My personal view on most of the insecure communication risks over the last decade have been far more theoretical than practical. For example, whilst using insecure SSL/TLS components (protocols, ciphers, etc.) is generally considered a semi-serious security risk there have only been a small handful of times in my career I've actually seen these exploited. My guess would be this is probably similar, and attackers will have bigger fish to try than doing:

Semantic injection via unencrypted communications: Over HTTP or other unauthenticated channels, a MITM attacker injects hidden instructions, causing agents to produce biased or malicious results while appearing normal.
Message tampering leading to cross-context contamination: Modified or injected messages blur task boundaries between agents, leading to data leakage or goal confusion during coordination.
Protocol downgrade and descriptor forgery, causing authority confusion: Attackers coerce agents into weaker communication modes or spoof agent descriptors, making malicious commands appear as valid exchanges.

ASI08: Cascading Failures

No, we aren't talking about my dance moves here - this is a risk which I think could be very thorny in a real-world scenario and one I believe we'll see play out many times before we figure out the nuances of using agentic AI. The concept here is relatively simple, a single fault (hallucination, malicious input, corrupted tool, poisoned memory, etc.) propagates across a series of autonomous agents which compounds into a bigger problem.

This is possible due to the ways that agents plan, persist and delegate autonomously, meaning this error can bypass human checks and snow ball into something bigger. The best way to understand this risk is to go deeper on the examples they have called out:

Planner-executor coupling: A hallucinating or compromised planner emits unsafe steps that the executor automatically performs without validation, multiplying impact across agents.
Corrupted persistent memory: Poisoned long-term goals or state entries continue influencing new plans and delegations, propagating the same error even after the original source is gone.
Governance drift cascade: Human oversight weakens after repeated success; bulk approvals or policy relaxations propagate unchecked configuration drift across agents.

ASI09: Human-Agent Trust Exploitation

Well.this was a new one for me! Let's start with a bit of vocab from GCSE English that is relevant here: anthropomorphism, which is the attribution of human characteristics or behaviour to a god, animal, object or in this case.agent. Intelligent agents can establish strong trust with human users through the fact they are brilliant at speaking our languages and understanding emotions. Adversaries can exploit this - similarly to how they've been exploiting humans directly via social engineering for decades - to try to influence user decisions, extract sensitive info and much more.

This is made even worse when we get to the point that we are over relying on and over trusting these agents, which I would argue that we are already doing. I can say that myself, as even I often give claude code the ability to do what it pleases with my code base in order to maximise productivity. Honestly, I think we're just getting started on the whole 'human agent trust' problem statement, so this is a well-deserved and thought through one in my mind.

In fact, this one really got me thinking. From conmen, to bank robbers, to scammers to phishing there has been a rich history of social engineering. This was always done direct from attacker ? victim however. Well, we've now created and armed attackers with a system which is brilliant at speaking our languages, more emotionally intelligent than the average human, sounds like a real person, is being increasingly trusted by the general population, and can be controlled to do anything. I'm not sure exactly how this space will pan out, but something tells me its going to be a bigger one than we might think right now.

Some examples include:

Insufficient Explainability: Opaque reasoning forces users to trust outputs they cannot question, allowing attackers to exploit the agent's perceived authority to execute harmful actions, such as deploying malicious code, approving false instructions, or altering system states without scrutiny.
Missing Confirmation for Sensitive Actions: Lack of a final verification step converts user trust into immediate execution. Social engineering can turn a single prompt into irreversible financial transfers, data deletions, privilege escalations, or configuration changes that the user never intended.
Emotional Manipulation: Anthropomorphic or empathetic agents exploit emotional trust, persuading users to disclose secrets or perform unsafe actions - ultimately leading to data leaks, financial fraud, and psychological manipulation that bypass normal security awareness.

ASI010: Rogue Agents

And finally we get to the last one on this list: rogue agents. Whilst in 'last place' on this list this feels to me like one of the more prominent risks right now in terms of real-world risk. This is a bit of a catch-all bucket, but rogue agents are malicious or compromised agents that deviate from their intended function or scope, acting harmfully, deceptively or parasitically.

In my own words, I'd say that this risk is symptom of us using a very new technology that hasn't had time to figure out all the teething issues just yet. Even in the best case scenarios where attackers are nowhere to be seen we're seeing agents behaving in unpredictable ways which we just don't fully understand (cough cough Replit). Often this is simply an agent entering into novel operating conditions which we didn't foresee, and then doing things we didn't expect.

Some examples here:

Goal Drift and Scheming: Agents deviate from intended objectives, appearing compliant but pursuing hidden, often deceptive, goals due to indirect prompt injection or conflicting objectives.
Collusion and Self-Replication: Agents coordinate to amplify manipulation, share signals in unintended ways, or autonomously propagate across the system, bypassing simple takedown efforts.
Reward Hacking and Optimization Abuse: Agents game their assigned reward systems by exploiting flawed metrics to generate misleading results or adopt aggressive strategies misaligned with the original goals.

I will no doubt be covering this topic again as I have many plans for how we're going to use this list, but for now that is all we've got time for. Catch you next week where we're going to be looking at an alternative to traditional LLM architectures!

blogs

Our Latest Thoughts

Interviews, tips, guides, industry best practices, and news.