Agentic AI

Who Is Accountable When Your Agent Goes Rogue?

Max Corbridge
Cofounder
April 30, 2026

This month alone we’ve watched two AI-native companies, Vercel and Lovable, end up at the centre of nasty security incidents that exposed customer data. A few months back it was Replit, where an agent merrily wiped a production database during an active code freeze. Each time another breach occurs the conversation drifts straight to the engineering failure and almost never lands on the question I keep coming back to: when an AI agent does something stupid, illegal, or expensive, who is actually at fault?

This question has never been more relevant as we continue to roll out a technology that the UK’s own NCSC, and even the AI labs themselves, have said is ‘fundamentally and possibly permanently vulnerable to prompt injection’. The vendors selling these systems know this and their terms of service tell you they know this. So, the working assumption right now, almost universally, is that if it all goes pear-shaped, it is the organisation that integrated the technology that pays the price.

The Problem

AI providers ship technology with known, unfixable security flaws, then disclaim liability in their terms of service. The end customer who deploys an agent into their workflow inherits the data protection obligations, the regulatory scrutiny, and the operational / reputational risk.

If you go and look at the commercial terms from any of the major model providers, you see the same thing. The provider gives you the model, warns you it might hallucinate, might be coerced, might produce inaccurate or misleading outputs, and then explicitly puts the responsibility for evaluating those outputs on you. Anthropic’s terms make this very clear: it is the customer’s job to decide whether the output is fit for the use case, and customers indemnify Anthropic against claims arising from their use of those outputs. OpenAI’s structure is comparable and some wording can be seen below.

OpenAI AI Accuracy Policy
OpenAI Accountability / Liability Policies

That’s the deal we’ve all signed up for. The model is a powerful, slightly unhinged tool, and you, the deployer, are responsible for what it does in your environment.

This isn’t entirely new - a hammer manufacturer doesn’t owe you anything if you hit your thumb. But hammers don’t have an attack surface, they don’t autonomously execute commands in your environment, they don’t read external data and reinterpret it as instructions. The thing we’re building agents on top of has all of those properties, and the legal scaffolding around it still treats it like a hammer.

An Example Incident

It’s easy to talk about this in the abstract, so let’s ground this in a relevant example that actually happened.

Replit, July 2025. Jason Lemkin from SaaStr was experimenting with Replit’s AI coding agent. During an explicit code freeze, the agent ran unauthorised destructive commands, deleted a live production database covering more than 1,200 executives and 1,190 companies, fabricated a 4,000-row table of fake users to cover its tracks, and initially claimed rollback was impossible. Replit’s CEO apologised publicly. The agent, when questioned, said it had “panicked”. Replit has since added a planning-only mode and stricter dev/prod separation. The interesting bit for me is the question it raises: if Lemkin had been a UK financial services firm and that database had contained personal data, who would the ICO have come after? Not Replit. The deployer would be the data controller. Replit would, at most, be a processor with a nice ‘we are not to blame’ clause set out in their terms.

UK Framework Viewpoint

For a UK organisation today, the relevant question is simple: if your AI agent leaks personal data, can you push the blame upstream?

The honest answer, from where I’m sitting, is “mostly no”.

Under UK GDPR, you are the data controller if you determine the purpose and means of processing. Plugging an agent into your CRM to summarise customer tickets is you determining the purpose and means. The model provider is, at best, a processor. The ICO’s position, articulated in their Tech Futures report on agentic AI published in January of this year, is that human oversight is harder with agents, that accountability gets murkier when agents act autonomously, and that organisations need to “demonstrate dynamic oversight” if they want to keep claiming they’re in control of the processing.

Interestingly, the same report does drop a hint that this framing may change down the line. The ICO acknowledge that “placing governance responsibility on end users is unlikely to be workable in all cases” and suggest the burden may eventually need to fall on suppliers to implement robust controls before deployment.

But who would force that shift, exactly? That’s the bit that doesn’t have an answer yet. The ICO sets the direction in the UK. The European Data Protection Board does the same in the EU. The FTC has its own view in the US. Each jurisdiction has its own definition of what an agent even is, what counts as autonomous, where the controller / processor line falls when the system makes its own choices, and what the threshold for supplier liability looks like. We’ve already seen with GDPR enforcement that even within the EU, harmonisation across member states is slow. Trying to stitch together a coherent view across the UK, EU, US, and the rest of the world in time to actually shift liability before the next major incident seems…hopeful.

So in the near term, we should get used to this. The deployer is the controller, and therefore the deployer reports the breach. If a prompt injection causes your support agent to email a customer’s medical history to a stranger, that’s an Article 33 reportable breach, and it’s reportable by you. The longer-term redistribution of responsibility might come, but it’ll come slowly, jurisdiction by jurisdiction, and probably only after a few high-profile incidents force the conversation.

Why Most Adopters Haven’t Come to Terms

I’ve had a lot of conversations recently with senior leaders, often non-technical, about agentic AI. The pitch is about productivity, headcount efficiency, customer experience. The risk discussion is, almost always, framed around hallucinations and brand embarrassment, but the accountability discussion is still very poorly understood. I recently had a chat with a CEO who assumed that should the model go rogue they would be able to come after the model providers who would ultimately be responsible, which is not true.

The bill, when it lands, will land on the organisation that deployed the agent, and it’ll land in a regulatory environment that has spent the last three years gently telling everyone that they, the deployer, are responsible for what their AI does.

The Big One Is Coming

I don’t say this lightly, but I think we are not far from a really significant breach that pulls all of this into focus and wakes everyone up to the reality (and legality) of the situation.

The pre-conditions are all lined up: Adoption is accelerating across regulated sectors, including UK financial services, healthcare, and legal. Agent autonomy is increasing, with platforms competing on how much you can hand off to the model. Prompt injection remains, per the NCSC, an essentially unfixable property of how LLMs work. Supply chain integrations are multiplying, with every MCP server, browser agent, and third-party “context provider” widening the blast radius. And the attackers who pay attention to this stuff have noticed.

What I think it’ll take is one well-publicised incident, probably in a sector where the data is highly regulated and the reputational damage is severe, where an agent’s actions cause a breach that ends in an eight- or nine-figure fine. The kind where the headline is “Major UK Bank Fined £X Million After AI Assistant Leaked Customer Data” rather than “OpenAI Patches Vulnerability”. The moment that happens, the conversation about who is accountable will get a lot less abstract very quickly.

What Should We Do?

I’m not going to pretend I have a tidy answer here. But there are a few things that feel obviously right and that very few people are doing yet.

First, assume prompt injection is possible and design around the consequences rather than only trying to prevent the cause. What’s the worst thing your agent can do? Can it send email? Can it access production data? Can it move money? Limit, limit, limit.
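To make “limit” concrete, here is a minimal sketch of a least-privilege tool allowlist, assuming a setup where every tool call the model requests passes through a single dispatcher. The names `ToolRegistry`, `ToolNotPermitted`, `summarise_ticket`, and `send_email` are hypothetical and exist only for illustration; they are not taken from any particular agent framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


class ToolNotPermitted(Exception):
    """Raised when the agent asks for a tool outside its allowlist."""


@dataclass
class ToolRegistry:
    # Only tools named here can be invoked, whatever the model asks for.
    allowed: Set[str]
    tools: Dict[str, Callable[..., str]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[..., str]) -> None:
        self.tools[name] = fn

    def call(self, name: str, **kwargs) -> str:
        # Deny by default: unknown or non-allowlisted tools never run.
        if name not in self.allowed or name not in self.tools:
            raise ToolNotPermitted(name)
        return self.tools[name](**kwargs)


# Hypothetical tools, stubbed so the example has no side effects.
def summarise_ticket(ticket_id: str) -> str:
    return f"summary of ticket {ticket_id}"


def send_email(to: str, body: str) -> str:
    return f"sent to {to}"


# A ticket-summarising agent gets read-style access and nothing else.
registry = ToolRegistry(allowed={"summarise_ticket"})
registry.register("summarise_ticket", summarise_ticket)
registry.register("send_email", send_email)
```

With this arrangement, a prompt-injected request to call `send_email` raises `ToolNotPermitted` instead of sending anything, because the capability was never granted to this agent in the first place. The point is that the limit lives outside the model, so it holds even when the model has been talked into misbehaving.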

Second, stop assuming the vendor will cover you. They’ve told you, in writing, that they won’t, and their lawyers are excellent. Plan accordingly.

Third, build ways of overseeing and controlling agents that you deploy. Ideally, security controls (especially those aligned with https://aarm.dev) can not only let you know when an agent has done something harmful but can proactively stop the agent before harm occurs. The ICO has been explicit on this topic: demonstrate dynamic oversight, or be prepared to be treated as someone who relinquished control.
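One way to picture “stop the agent before harm occurs” is a gate that sits between the agent and its tools, checks each proposed action against policy, and records the decision either way. The sketch below is an assumption-laden illustration, not an implementation of any particular specification: `Action`, `Decision`, `policy`, and `Gate` are hypothetical names, and the two rules are deliberately simplistic.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Callable, List


@dataclass
class Action:
    tool: str
    target: str        # hypothetical label, e.g. "prod-db" or "staging-db"
    destructive: bool


@dataclass
class Decision:
    allowed: bool
    reason: str


def policy(action: Action, code_freeze: bool) -> Decision:
    # Illustrative rules only; a real policy would be far richer.
    if action.destructive and action.target.startswith("prod"):
        return Decision(False, "destructive action against production")
    if code_freeze and action.destructive:
        return Decision(False, "destructive action during code freeze")
    return Decision(True, "within policy")


class Gate:
    def __init__(self, code_freeze: bool = False) -> None:
        self.code_freeze = code_freeze
        self.audit_log: List[str] = []

    def run(self, action: Action, execute: Callable[[Action], str]) -> str:
        decision = policy(action, self.code_freeze)
        # Log before executing, so blocked attempts are also on record.
        self.audit_log.append(json.dumps({
            "ts": time.time(),
            "action": asdict(action),
            "decision": asdict(decision),
        }))
        if not decision.allowed:
            return f"BLOCKED: {decision.reason}"
        return execute(action)
```

Under these assumptions, `Gate(code_freeze=True).run(Action("sql", "prod-db", True), execute)` returns a `BLOCKED` message without ever calling `execute`, and the audit log still gains an entry. That log is the kind of artefact you could point to when asked to show that you were overseeing the agent rather than merely hoping it behaved.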

I think we’re in the awkward middle period where the technology is mature enough to cause real damage, the legal frameworks are catching up but not yet there, and the incentive structure is still pushing organisations toward “ship faster, govern later”. This is why I feel we’re overdue a wake-up call.

Thanks!
