Agentic AI

Small Language Models

Max Corbridge
Cofounder
November 6, 2025
This week I thought we would take a short departure from the security side of things and cover something I have seen cropping up a few times in the AI space recently: Small Language Models. One of the things I love about this newsletter is that I can follow whatever trend is hot right now, rather than being beholden to particular topics. In fact, one of my goals when setting out to create this newsletter was that I wanted to one day look back at it and view, through it, a timeline of AI evolution.

Anyway, back to the topic. By this point I am sure you are very familiar with large language models, or LLMs. They have become almost synonymous with AI itself, and for those following the industry more closely, you'll see these models getting larger every few weeks. Some of the largest models are now around 1 trillion parameters! And those are just the ones disclosed publicly; many of the frontier commercial models keep this information private and may have far surpassed this landmark already. But what is a parameter? Why do bigger models mean better models? And why is there increasing talk of 'small' language models? Let's cover some of that now.

Parameters

First and foremost, let's clear up what we actually mean when we describe the size of a model. What we are describing is the number of 'parameters': the internal numerical weights that the model learns during training and then uses to make its predictions. A common misconception is that parameters measure the amount of training data; they are part of the model itself, although models with more parameters do generally need more data to train well. So, now that we know what we mean by the 'size' of a model, let's talk about the general idea that bigger is better.

Historically, when we were training the initial models, the number of parameters an LLM had went pretty much hand-in-hand with its performance. This makes sense, as larger models were trained on more data, and perhaps higher quality data, meaning they were better at doing lots of things. This is why, for many years, the AI race was simply about who could get their hands on the most data and how big they could train their model.

However, in that journey we’ve now discovered new and better ways of training the models on the data and, crucially, we are getting better at making more performant models with less data. One example of how we are doing this is using something called Mixture of Experts (MoE). MoE means the model has many “expert” subnetworks, and for each input only a few of those experts get used, so you have large capacity but only activate part of it at inference, making the model more efficient. Essentially, we are now in a world where actually bigger does not always mean better, and lots of attention is being paid to how we might use smaller, cheaper and more efficient models to get the job done.
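To make the MoE idea concrete, here is a toy sketch in NumPy. It is illustrative only: real MoE layers live inside transformer blocks and route per token, and the names (`moe_layer`, one dense matrix per expert, a linear gate) are simplifying assumptions, not any particular model's implementation.

```python
import numpy as np

def moe_layer(x, expert_weights, gate_weights, top_k=2):
    """Route an input vector through only the top-k experts.

    x: (d,) input vector
    expert_weights: list of (d, d) matrices, one per expert
    gate_weights: (n_experts, d) gating matrix
    """
    # The gate scores decide which experts handle this input
    scores = gate_weights @ x                 # (n_experts,)
    top = np.argsort(scores)[-top_k:]         # indices of the best-scoring experts
    # Softmax over the selected experts' scores only
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()
    # Only top_k expert networks are evaluated; the rest stay idle,
    # which is where the inference-time savings come from
    return sum(wi * (expert_weights[i] @ x) for wi, i in zip(w, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 4
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
gate = rng.normal(size=(n_experts, d))
y = moe_layer(rng.normal(size=d), experts, gate, top_k=2)
```

The key point is the ratio: the layer stores four experts' worth of weights but only multiplies through two of them per input, so capacity grows faster than per-token compute.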

How small is small?

Typically small here is considered to be just a few billion parameters, whereas large language models are typically in the hundreds of billions to trillions. One key difference is that SLMs are usually trained or fine-tuned on domain-specific data rather than broad internet text. They focus on a narrower field (such as medical texts, legal documents, or company knowledge bases), which gives them deep expertise in that area.

How do they perform?

It is worth mentioning up front that, generally speaking, when you make a model smaller you sacrifice something, whether that is breadth (large models handle a wider variety of tasks) or raw capability on complex tasks like code generation. However, we have seen in recent times that the models able to compete with larger ones are getting smaller and smaller, and for specialised use-cases small models can be just as performant as their large counterparts.

Before looking at the specialised use-cases, let's take a more generic look at small language models' performance versus large. The standard approach to measuring a language model's performance is benchmarks, where we assess the model against a repeatable test. One of the most enduring benchmarks is MMLU, or Massive Multitask Language Understanding. This benchmark contains more than 15,000 multiple-choice questions across all sorts of domains (maths, history, law, medicine). The average human scores around 35%, and a domain expert scores more like 90% on questions within their speciality.
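The mechanics of this kind of benchmark are simple: ask each multiple-choice question, compare the model's pick to the answer key, and report the fraction correct. A minimal sketch, with a hard-coded stand-in where a real model call would go:

```python
# Toy benchmark scoring: accuracy over multiple-choice questions,
# the same idea MMLU applies at far larger scale.
questions = [
    {"q": "2 + 2 = ?", "choices": ["3", "4", "5", "6"], "answer": 1},
    {"q": "H2O is commonly called?", "choices": ["salt", "air", "water", "gold"], "answer": 2},
]

def model_answer(item):
    # Stand-in for a real model call; always picks choice index 1 here
    return 1

def benchmark_accuracy(items, answer_fn):
    correct = sum(answer_fn(item) == item["answer"] for item in items)
    return correct / len(items)

acc = benchmark_accuracy(questions, model_answer)
print(f"accuracy: {acc:.0%}")  # prints "accuracy: 50%"
```

Because every question is scored the same way, the number is repeatable across models, which is what lets us compare a 3b model against a 175b one on equal footing.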

When GPT-3 came out (175b parameters) it scored 44%; today's frontier models are more like 80-90%. If you take a pretty well-rounded score of something like 60% (not a domain expert, but a very competent generalist), let's see how the size of the models capable of achieving that score has changed. Below is the smallest model that could score above 60% on MMLU at the given dates:

  • Feb 2023 = Llama1-65b (65b parameters)

  • July 2023 = Llama2-34b (34b parameters)

  • Sep 2023 = Mistral-7b (7b parameters)

  • Mar 2024 = Qwen1.5 MoE (<3b parameters)

Look at the staggering pace of AI advancement, literally on a month-by-month basis. So not only are these smaller models becoming just as performant, but they bring other advantages by the nature of being smaller: lower computational requirements, faster training times, easier deployment, and more efficient performance in specific scenarios.
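A rough back-of-the-envelope calculation shows why the parameter count above matters so much for deployment. Weight storage is roughly parameters times bytes per parameter (2 bytes for 16-bit weights, a common choice); this ignores activations and KV cache, so it is a lower bound, not a precise figure.

```python
def model_memory_gb(n_params, bytes_per_param=2):
    """Rough weight-storage estimate: parameters x bytes each (fp16 = 2 bytes)."""
    return n_params * bytes_per_param / 1e9

# A GPT-3-sized model vs a 3b-parameter SLM, weights only
large = model_memory_gb(175e9)   # ~350 GB: needs a multi-GPU server
small = model_memory_gb(3e9)     # ~6 GB: fits on a laptop or high-end phone
print(round(large), round(small))  # prints "350 6"
```

That two-orders-of-magnitude gap in memory is the difference between a cloud deployment and something you can run on-device.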

SLMs are lightweight enough to be deployed locally on edge devices or local servers (e.g., smartphones, wearables, factory sensors, or even home appliances), and can offer faster response times. Unlike LLMs, which typically require cloud-based inference and massive computation, SLMs can deliver real-time decision-making due to their smaller size.

Another benefit they bring, by nature of being smaller and more efficient, is that they are also more eco-friendly. Energy consumption is a concern that is rapidly growing, as we turn on AI features in just about every aspect of our lives, yet LLMs consume huge amounts of energy to run.

Use-cases

Despite the leaps forward in SLM performance, sometimes bigger really does mean better. One case where this is very true is general-purpose coding, where intimate knowledge of many programming languages may be required. This is a resource-intensive task and, as such, still favours larger models. Document-heavy work is another, as it requires the larger context window these bigger models can provide. Multi-lingual translation is another.

Smaller models, however, are irreplaceable in scenarios where you want a tiny model running on-device (keyboard prediction, voice commands, offline search, etc.). Small models running on-device can give you the latency and privacy these use-cases need. Summarisation is another area where smaller models can compete with larger ones, as can models that only need to be experts in niche areas.

Conclusion

I personally am excited about the journey we are on with LLMs and SLMs. In fact, here at Secure Agentics we are currently looking to replace some LLMs in our agentic workflows with more focused SLMs. This is good news for speed, cost, efficiency and the environment! Win win.
