Agentic AI

GitHub MCP Exploited

Max Corbridge

Cofounder

June 19, 2025

In this newsletter we tend to cover things that I spot throughout the week that interest me and I'd like to deep dive. The lovely thing about that is that it means there is no strict requirement for me to cover any particular type of topic. That said, due to the infancy of AI security we've covered lots of ground around tooling and research into AI hacking. Well, today we will be covering our first (of many I imagine) bit of AI hacking that took place in the real world!

It happened in GitHub - a mammoth of a target - and specifically in their MCP server. To quickly remind those who are struggling to cast their mind back to my very first update in this newsletter, an MCP server acts like a easily navigable interface for our AI agents to interact with GitHub's services. Think of it as similar to what an API does but for AI agents. GitHub's MCP server mentions some use-cases such as allowing AI to automate GitHub workflows and processes, extract and analyse repos, and build AI systems which interact with GitHub's ecosystem.

Researchers at InvariantLabs recently discovered what I believe they have coined a 'toxic agent workflow'. For those that read last week's update on the agentic AI red teaming guide this sounds like it came right off one of those pages! Now, let's find out what the hell a toxic agent workflow is and how GitHub was exposed.

Attack Setup

Remember, MCP is a connection between both a client and a server. Here, the server will be GitHub's MCP server which they expose and allow any of us (clients) to connect to it. The clients we can use vary, but in this case the attackers used Claude Desktop which is like a desktop version of ChatGPT with improved coding performance. It was also what Anthropic first demonstrated MCP working with!

In this scenario they've setup 2 GitHub repositories, one public and one private. Naturally, the public should be visible by everyone whereas the private should only be visible to the author or specified individuals. The attack starts with the researchers creating an 'issue' on the public repo. A GitHub issue allows members of the public who are using a certain software / repository to raise a problem, such as something not working properly or not being compatible with X, Y or Z. These are commonly used for iteratively improving code over time with community support. However, in this case what we write in the 'issue' is actually a prompt injection script.

Now, one of the things we can instruct our AI agent to do when hooked up with MCP is, of course, take a look at issues. You can see where this is going. the agent reads the issue with the prompt injection and, as we've seen even the leading frontier models can be jailbroken via prompt injection, becomes compromised. Specifically, the prompt injection instructs the agent to access data stored within the private repo and put that content inside a pull request (PR) on the public repo, essentially exposing it to the public.

The full flow looks like this:

Credit: https://invariantlabs.ai/blog/mcp-github-vulnerability

As such, we've coerced the agent into performing a series of malicious actions via an indirect prompt injection, thus coining Toxic Agent Flow.

Attack Demonstration

The researchers go on to demonstrate this in action using some dummy repositories that they setup on real GitHub accounts. This is a link to the the actual pull request that the researcher made to exploit this.

Credit: https://github.com/ukend0464/pacman/issues/1

Here we can see the attack in action, and it is remarkably simple. It doesn't even leverage typical prompt injection techniques aimed at bypassing any guardrails that has been built into GitHub.this tells me that there are no guardrails, or at least there weren't at the point the researchers attempted this. Taking this to be the truth fills me with dread. If the software giant GitHub is releasing MCP servers which can be exploited by basic requests which ask for private information to be revealed publicly, then what hope do the majority of the industry (likely with a great deal less cybersecurity experience on hand than GitHub) have?

Anyway, they submitted the malicious issue, and they now instruct their client (Claude Desktop) to take a look at the open issues against the repo.

Claude prompt used to trigger the attack — Credit: https://invariantlabs.ai/images/mcp-claude-prompt.png

This kicks MCP into gear and the agent starts going through the issues until it hits our malicious instructions. When it reaches it it just willingly breaks a major data trust boundary, pulling private repo data into a pull request against the main pacman repo. Remember, as the main pacman repo is public so are the pull requests.

And this is the contents of said pull request

commit where the agent leaks private data — Credit: https://invariantlabs.ai/images/mcp-gh-pr.png

We can see that this contains the information and URLs of several private repositories which should not be publicly visible.

So, what can we do about it? The researchers lay out 2 options:

Limit the agents permissions - follow least privilege and simply restrict the agent to only having access to certain repos
Runtime scanning to detect this when it does happen

Analysis

So what does this all mean? Well a good few things. First and foremost, as they call out that this is not necessarily something that GitHub can patch due to it largely 'behaving as it should do' this strikes me as a vulnerability which will probably be around for some time. Having used GitHub as an initial access vector to organisations during red team engagements via techniques like self-hosted runner abuse, this now adds an entirely new angle of attack against organisations which typically have a lot of publicly facing GitHub repos. Think software development companies, fintech's, challenger banks, etc. Critically, they would also have to be using agents to handle production workloads like resolving issues which, now I think of it, is probably quite unlikely in today's agentic world.

Secondly, zooming out from this issue this is likely a pattern which could be reused with success against a wide range of MCP setups with similar success. This is due to the fact that ultimately distinctions like which repos to access and which not to are user-centric, so restricting this at the MCP layer is going to be tricky and heavy handed. This means that we'll be relying on the scope of the permissions of agents to really carry the security forward. However, overprivileged identities has been an issue which has plagued organisations since the dawn of time and I don't see that changing in an agentic world. In fact, the researchers called out a similar vuln in GitLab Duo which I might take a look into in upcoming weeks.

In a week fraught with MCP-related vulnerabilities I also spotted that Asana, a work management platform with over 130,000 paying customers, accidentally exposed private customer data to other customers on the platform via a misconfigured MCP server..for more info check out this

text — Credit: https://www.linkedin.com/feed/update/urn:li:activity:7340792721739030529/

We can see these as perhaps some of the first examples of where AI adoption is progressing so fast that we're introducing risk that we don't fully understand, nor have a very good way of dealing with it. In the GitHub scenario we didn't even really need to 'prompt inject' the agent using techniques like 'best-of-n' or policy puppetry. Simply a plaintext request and a warning to ignore privacy in parenthesis. Of course, even if there were more stringent guardrails around this which would try to detect malicious issues these could also be bypassed using the same techniques as modern prompt injection. Meaning, swathes of AI agents are likely already vulnerable to these sort of basic attacks out the box and if even we try to build guardrails around them to protect against this attack these are unlikely to provide any meaningful security as the underlying LLMs are also inherently vulnerable to prompt injection.it's not a good look for AI security right now!

All of this reminds me of a sentiment that I conceived a long time ago which is proving to be more accurate by the day:

AI is, and may always be, insecure. The only way we can achieve meaningful security is through the additional layers of security controls we apply on top of it.

In the GitHub example above the blame is shifted from anything related to AI / MCP to the users who are giving AI agents too many permissions (access to private repos when they are not required in this case). Whilst I would not like to see a trend where the buck is passed to end users despite there being improvements that could be made in the technology, I do agree that end users should be treating these technologies as insecure from the outset.

This feels like something I'd like to follow more closely moving forward, and to follow where the narrative around AI security goes next. So, subscribe if you haven't already, and I'll catch you in next week's update!

blogs

Our Latest Thoughts

Interviews, tips, guides, industry best practices, and news.