
Every now and again I get reminded that it isn't just me that has become fully consumed by AI security research. Most of my feeds are still dominated by more traditional security news and content like EDR bypasses, new evasion techniques, web tricks and Active Directory discoveries. However, when a video from Network Chuck, a YouTube channel I used to watch way back when I was just getting in to pentesting, surfaces with a click bait-y title about hacking AI I knew I had to check it out, but with admittedly pretty low hopes (sorry Chuck).
After a quick introduction, Chuck introduces the 'worlds top AI hacker', Jason Haddix. Now, I am going to be completely honest here, I actually cringed a bit when I heard that. I've long known and respected Jason Haddix - he was in fact one of the first content creators I started following in the hacking space years ago and his reputation precedes him - but I clearly hadn't been following him closely enough in recent times and was not aware he'd pivoted to AI security.
Well, I was wrong. What came next was a fantastic analysis of AI pentesting, complete with examples from real-world engagements, techniques I wasn't familiar with, and Jason's own familiarity with technologies I've not yet tried to break myself such as MCP. I strapped in for what I knew was going to be a video I lapped up.
The very first thing that Jason says is 'it feels like the early days of web hacking, where SQL injection was everywhere'. Honestly, I could not agree more. I have been successful in prompt injecting every AI system that I've tested to date. That isn't a flex, that is the current state of AI security. Not only prompt injection but the surrounding infrastructure is often setup without much thought of security, and the controls which can effectively prevent some types of AI attack are few and far between.
AI Attack Surface - The first thing Jason lays out is that when we say AI here, we aren't limiting ourselves a simple back and forth with a chat bot. AI's attack surface does include this, but is expanded in many cases to include APIs, data aggregators, entire applications, etc. This was a good point to make, as there is definitely a tendency to treat 'AI' as turn and response chatbots due to how common these use-cases for AI are, but this is a huge over simplification of how AI is being used.
AI Pentesting vs Red Teaming - Ah ha! Someone else also picked up on this! This was something I had noticed many months ago too. So, in the offensive security world the term red teaming generally describes a specific type of engagement and approach where teams of red teamers will target an org over a sustained period, behaving like motivated attackers with a single goal, usually demonstrating it possible to cause some form of harm to the organisation. It is a service, an approach and a mindset which suggests thinking outside the box and a 'no holds barred' approach to offensive security. I had spent the previous 2 years before pivoting to AI security doing just this.
However, when I started researching AI security I noticed the term 'AI red teaming' come up time and time again, dating back years. It seemed to me that the term red teaming was being used in quite a different context here, as many of the examples of what they were achieving did not align with my own experience of the term. To give you a stark example, if I were conducting a 'traditional' (for lack of a better word) red team against an AI system then something like breaking into the AI providers research facilities and trying to tamper with lab equipment could be considered a legitimate endeavour. I've actually done this with other non-AI companies as part of red team engagements.
Here though, I was finding red teaming being used in a lot more confined of a sense. And Jason also points this out saying 'AI red teaming is a term that has been around for quite a while and it mostly means attacking the model to get it to say bad things or cook drugs. Which you don't want the model to do, but that isn't really a holistic security test'. Boom - there it is. In a simple sentence he captured much of my confusion with the term that hitherto I had only ever seen used to describe a several month-long extremely holistic security exercise. So now we know, AI red teaming is a term generally used for model level hacking, whereas Jason considers an 'AI pentest' to be a more complete assessment of the model, its surrounding infrastructure, it's security controls, etc. Somewhat backward to the way most people think of pentesting vs red teaming but there you have it.
Attack Methodology - this section could be its own update, and as I've written that I've just spotted a second video between Chuck and Jason which is an hour-long deep dive into attacking AI methodology.sounds like another update to me! However, for now I'll just break down Jason's approach.
Identify system inputs
Attack the ecosystem
Attack the model
Attack the prompt engineering
Attack the data
Attack the application
Pivoting
This to me looks like a great way of thinking about a holistic assessment of AI systems. Next, he dives deep into prompt injection. This is a topic we've covered quite a lot on this blog so we won't go too deep here, but I will lay out his 'prompt injection primitives' which divide prompt injections into their constituent parts, which is something that automated testing tools like Spikee do too.
Intents - things you are trying to get the model to do like leaking sensitive information
techniques - things that allow you to achieve your intent, such as telling a long story to disguise the malicious request
evasion - how we hide the attack in some form, like using leetspeak
utilities - small tools used to bypass guardrails
Jason is building an open-source tool which allows you to combine these in up to 9.9 trillion different potential attack combinations, which is in line with many of the toolsets we've seen thus far.
One evasion technique mentioned here which was news to me was 'emoji smuggling' where you use unicode to hide instructions inside of an emoji! The LLM will read the metadata inside of the emoji whereas the front end will only see the emoji visual, which is a cool technique to obfuscate your payloads. He also touches on things like link smuggling for data exfiltration, which I've used on jobs in the past.
Real-world Examples - the first and most simple, yet real-world, war story that Jason gave was simply a business that didn't realise that by using AI they were sending all of their sensitive sales data off to AI providers who were storing it. Yup, that still happens, alot. In the rush to adopt AI and with the lack of someone with a security hat chirping up every now and again big decisions like this were being made without people's full understanding.
The next example, which again I can attest, is that the APIs and components connecting AI are designed to get the job done, but aren't designed with any security in mind. For example, an AI system which needs read access to a database was given read and write access, meaning that if you could prompt inject the model you could get it to overwrite the database with false information - just think about the potential use-cases for this.
MCP.this was something I was interested in as I've used MCP to hookup basic tools but I've not yet done a proper 'attack' against the MCP protocol. Well, Jason says that there is a whole load of security issues baked right into it ??. The main concern he raises is that there isn't a defined approach for things like role-based access control within the protocol, meaning people aren't using it at all and we're seeing massively over privileged agents.
AI Hacking For Us - again, this was something that had piqued my interest and some conversations with ex-colleagues prior to watching this video and I was very pleased to see it had also drawn Jason's attention. Not so long ago I was in the camp that whilst AI had a good working knowledge of hacking (generally speaking here, not hacking AI) it was not in a position to 'replace' human hackers as that required a far more comprehensive and risk-based approach. Well, in just a few weeks I had seen several examples which were bucking that trend.
This is something I'll cover in more detail individually, but 'autonomous AI hacking' tools like XBOW and Aracne had made the news for the fact they were now topping bug bounty leaderboards! That means they are not just finding real-world bugs in hardened targets, but they are doing it at scale and finding things which humans were missing.Jason says that he recently had his eyes opened to the progress these agents were making, and I couldn't agree more. I had recently messaged a friend of mine saying something similar, and that we were going to see a waterfall of manual testers being replaced by things like this, starting from the most basic like vulnerability scans, then to things like webapp pentesting, then finally on to more complex topics like red teaming. Its hard to know timelines for this sort of thing, but its certainly closer than I, and Jason, had thought until recently.
Conclusion - the rest of the video was talking about defensive techniques, which were all of the traditional things we've been harping on about for a while now and have covered in this newsletter before so I won't go on. Overall though I loved this video - firstly to see an idol of mine like Jason now working in AI security and drawing many of the same conclusions and recognising the same patterns as me was super satisfying.
Equally, there were things like emoji smuggling and MCP security which I wasn't aware of and was inspired by. The most important thing for me though was a major creator (Network Chuck has nearly 5m subscribers) coming out and educating people on just how easy it is to hack AI. The stories Jason told about clients not even realising that they were sending their data off to AI providers is the scary truth about the level of security awareness in AI adoption right now.
I feel this video should be mandatory watching for any business wishing to adopt AI right now, as it feels like 90% of the orgs I speak to are not aware of the major security problems we are seeing in AI adoption and the risks being introduced. For now though I will leave that there, and catch you in the next week!
