
A few days ago a red team friend of mine shared with me a screenshot of OpenAI’s notice about how their data analytics provider, Mixpanel, had suffered a breach and lost some of OpenAI’s customers’ “limited customer identifiable information”. This was followed by a screenshot showing that this “limited” information actually included (and I quote):
Name that was provided to us on the API account
Email address associated with the API account
Approximate coarse location based on API user browser (city, state, country)
Operating system and browser used to access the API account
Referring websites
Organization or User IDs associated with the API account
I made a joke and went about my day, busy as ever. I then saw this story pop up briefly on my timeline again and didn’t pay it much attention. Then I was chatting with a customer and mentioned how dangerous it was that so many organisations were sending vast quantities of their data (commercial and otherwise) to a handful of AI provider companies, like OpenAI and Anthropic.
This is when I suddenly remembered the breach, and the reality of the situation actually hit me. Why on earth wasn’t this bigger news? Why did it seem like no one was talking about it? Data privacy, and more specifically AI providers hoarding our data, has been the talk of the town in AI security for years now, so why was the fact that OpenAI had lost a relatively sensitive set of customer PII not being heralded as the ‘I told you so!’ moment we have long thought it might be?
To understand the reasons, we first need to understand what happened. As ever, I’ll break it down nice and simply here to save you the hassle.
On Nov 9 2025 one of OpenAI’s suppliers, Mixpanel, suffered a cyber breach. Mixpanel is a third-party web analytics provider that essentially allows OpenAI to better understand how many people are using their API product (platform.openai.com) and what they are using it for. This is a common thing for companies to outsource, and anecdotally I’ve known of Mixpanel for many years; they have over 8k enterprise customers.
The details of what happened Mixpanel-side are less talked about, but what we know is that attackers conducted a smishing attack (literally SMS-based phishing) which resulted in them gaining unauthorised access to part of Mixpanel’s systems. From here they exported a dataset which contained “limited customer identifiable information and analytics information”.
Now, it is important to note that this concerned OpenAI’s API product, not their flagship consumer product ChatGPT. However, it is the API product (platform.openai.com) which most developers and enterprises will be using, so that isn’t necessarily good news.
Despite OpenAI trying to dress this up as ‘not a big deal’, this was important and sensitive data: names, locations, emails, recent sites visited, user IDs and even what type of computer each user has. Whilst they were very clear that this did not include chat data, passwords, API keys or payment details, this is still a worrying collection of data, and companies have made headlines for far less.
Remember that Personally Identifiable Information (PII) - which is what things like GDPR were created to protect - is defined as any data which can be used to identify a living person, directly or indirectly. It does not have to be sensitive; it simply has to allow someone to identify who the data is about, and potentially to target that person. Well, with a person’s name, location, email address, user ID and device details you can do a lot more than just identify them: you could launch a very specialised phishing attack against them personally, with a worrying amount of information at your disposal.
This is the main takeaway from OpenAI’s notice: they are contacting everyone affected, but users of the platform should remain vigilant against targeted social engineering attacks. OpenAI also mentioned that they have ceased their relationship with Mixpanel and removed it from all products, which I guess they have to do to be seen to be responsible and taking action.
Well, if you were one of those affected by this then you need to stay extra attentive to further cyber attacks, as the cybercriminals now have a lot of juicy data about you. However, one of the key things omitted from OpenAI’s write-up is how many customers were affected. We’ve seen this trend from big companies in the past - they notify everyone about the breach (as they must do lawfully in the EU) but make it sound like it wasn’t a big deal. Months later they quietly release more information, like how many records were stolen, only for us to find out in retrospect that it was an enormous loss of customer data - I wouldn’t be surprised if that is what is happening here.
More widely though, this shows that these large AI companies are still subject to all of the same security challenges that the rest of us face. OpenAI’s entire supply chain (all of the different third parties, technologies and products that allow them to do what they do) must be enormous, and through any one of those avenues we could see a breach. This is no different from any large org; what sets OpenAI apart is just how juicy a target they are.
Think about it - millions of users and countless organisations have been using OpenAI’s services for years now, many of whom do not understand that they should not be sharing sensitive or personal information with OpenAI, who will store it unless you ask them not to. That means that somewhere locked away in their data centres OpenAI are hoarding probably the largest collection of personal and corporate data that has ever been seen. This would be quite literally the holy grail of data breaches if attackers could get their hands on it, and I do not think it was an accident that the Mixpanel attackers just so happened to steal OpenAI’s data out of the 8k customers they have.
We’ve long said that the centralisation of data in just a handful of companies that is currently happening with AI is creating a scary reality and concentration of power. It is also putting an enormous target on the backs of OpenAI, Anthropic, and the thousands of companies that support them.
