When it comes to generative AI and the models that drive it, sometimes less is more. Many businesses find that small language models tailored to very specific tasks can be more effective and efficient than large language models (LLMs). Small models are less expensive to train and maintain, and at the tasks they're built for, they often outperform their gigantic, kitchen-sink, multipurpose counterparts.
Here we’ll explain the appeal of small models, how they work, and how they can benefit your business. You will get answers to these questions:
- What is a small language model?
- Why are large language models so expensive?
- How are small language models different?
- How do small language models enable on-device AI?
What is a small language model?
A small language model is a machine-learning algorithm that's been trained on a dataset that is much smaller, more specific, and often of higher quality than an LLM's. It has far fewer parameters (the configurations the algorithm learns from data during training) and a simpler architecture. Like LLMs, the advanced AI systems trained on vast amounts of data, small language models can understand and generate human-sounding text.
Small models are typically deployed for a single specific task (such as answering customer questions about a certain product, summarizing sales calls, or drafting marketing emails) and can be more computationally efficient and faster than LLMs thanks to their smaller size and higher-quality, more targeted training data. This means you can save money and time, and improve accuracy, by building topic-specific small language models into your architecture.
Small language models are not designed, for example, to help you research trends in the healthcare industry. They can, however, help a healthcare company answer customer questions about, say, a new health program for diabetes prevention.
Cost, relevance, and complexity are three important ways small models differ from LLMs.
Why are large language models so expensive?
An LLM is a type of AI that generates human-like responses by processing natural language inputs, or prompts. This is possible because it's trained on massive datasets, which give it an understanding of an expansive range of information.
All this information processing requires enormous computational resources. The larger the AI model, the higher the cost of training, compute power, and energy, to say nothing of the downstream maintenance costs. OpenAI's GPT-4, for example, reportedly cost more than $100 million to train. Each parameter adds to the price tag, and that cost is multiplied across every piece of input data the model processes, known as a token. That's why even seemingly straightforward tasks, like asking an AI "What is the capital of Germany?", are resource-intensive and expensive.
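To see why scale is so expensive, consider a rough back-of-the-envelope calculation. A common approximation is that a dense transformer's forward pass costs about 2 floating-point operations per parameter per token, so compute grows with both model size and input length. The sketch below uses illustrative model sizes (7 billion parameters for a small model, 175 billion for a GPT-3-class one) to compare the work done just to answer that one question:

```python
# Back-of-the-envelope inference cost, using the standard approximation that a
# dense transformer's forward pass takes roughly 2 * parameters * tokens FLOPs.
def inference_flops(num_parameters: float, num_tokens: int) -> float:
    return 2 * num_parameters * num_tokens

PROMPT_TOKENS = 20    # "What is the capital of Germany?" plus formatting
small_model = 7e9     # a 7-billion-parameter small model (illustrative)
large_model = 175e9   # a GPT-3-class large model (illustrative)

print(f"Small model: {inference_flops(small_model, PROMPT_TOKENS):.1e} FLOPs")
print(f"Large model: {inference_flops(large_model, PROMPT_TOKENS):.1e} FLOPs")
print(f"The large model does {large_model / small_model:.0f}x the work per token")
```

That's roughly 25 times the compute for the same one-line answer, before even counting the memory and hardware needed just to hold the larger model.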
To put it simply, in many cases, general-purpose LLMs with hundreds of billions of parameters are overkill for business users who need help with specific tasks.
“Parameter count is just one of many variables that determine how well an AI deployment can solve problems in the real world,” said Silvio Savarese, executive vice president and chief scientist at Salesforce.
(For a deeper look at when LLM scale is, and isn't, necessary, check out this Q&A with Savarese.)
Further, LLMs require huge, high-quality datasets, and acquiring and preprocessing them can be time-consuming and very expensive. Training adds even more effort and expense: You have to make sure the data is diverse, and that it represents the population the model will affect. The cost of setting up and maintaining the required infrastructure (such as cloud computing and specialized hardware) can also be extremely high.
How are small language models different?
Small, highly trained, task-specific models may be a better option for many companies, regardless of their size. Here’s why:
Lower cost to serve
LLMs are power-hungry and resource-intensive. Small language models require power and resources, too, but because the pool of data they draw from is much smaller and more task-specific, the system requirements (and the ultimate costs) are far lower. And because small models require far fewer compute resources, they consume less power and water than general-purpose models, which lowers costs and reduces their environmental impact.
Better performance
Generative AI relevance — or the degree to which AI outputs are useful, applicable, and aligned to specific business needs — is a vexing business challenge. Business users need clear solutions to specific queries, not the kitchen sink.
As Savarese wrote in this article, “There’s no substitute for hundreds of billions of parameters when you want to be everything to everyone. But in the enterprise, this ability is almost entirely moot.”
With the right strategy, small language models designed for individual, well-defined tasks, like knowledge retrieval or tech support, can easily outperform larger models.
Small, open-source models like Salesforce’s xGen consistently exceed the performance of larger models by leveraging better pre-training and data curation strategies. xGen, for example, is trained on longer sequences of data, helping it summarize large volumes of text, write code, and more.
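As a concrete illustration, here's a minimal sketch of using an xGen checkpoint for long-document summarization via the open-source Hugging Face transformers library. The checkpoint name matches the publicly released xGen-7B base model, but the input file, prompt format, and generation settings are illustrative assumptions, not Salesforce's official recipe:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Salesforce/xgen-7b-8k-base"  # publicly released 8k-context base model

# xGen ships a custom tokenizer, so trust_remote_code is required to load it.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)  # ~7B params; needs substantial RAM/GPU

transcript = open("sales_call.txt").read()  # hypothetical long transcript, up to ~8k tokens
prompt = f"{transcript}\n\nSummarize the call above in three bullet points:\n"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The 8k-token context window is what lets a model this small take in an entire call transcript at once, rather than summarizing it in fragments.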
Greater accuracy
A model's accuracy depends on the quality and quantity of the data it's trained on. Since LLMs are trained on oceans of data pulled from all over the internet, much of it is irrelevant to the business user's task at hand. By contrast, small language models like xGen are trained on business data that looks similar to the customer relationship management (CRM) data a customer might have.
“xGen is narrowly focused on these specific tasks, and it’s very good at it,” said Kathy Baxter, principal architect, ethical AI practice at Salesforce.
The models’ small size results in a more focused learning process: They adapt faster to the nuances of particular datasets or applications. This is important for companies looking for specialized AI capabilities, because a model tuned on their own data is better at handling their specific tasks.
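What does that focused learning look like in practice? Below is a minimal fine-tuning sketch using Hugging Face transformers and datasets. The checkpoint name, data file, and training settings are all illustrative assumptions, not a production recipe:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_ID = "Salesforce/xgen-7b-8k-base"  # assumed checkpoint; pick one that fits your hardware

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
if tokenizer.pad_token is None:          # causal-LM tokenizers often lack a pad token
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Hypothetical JSONL file of anonymized support transcripts, one {"text": ...} per line.
data = load_dataset("json", data_files="support_transcripts.jsonl")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = data["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="xgen-support", num_train_epochs=1),
    train_dataset=tokenized,
    # mlm=False makes the collator build next-token-prediction labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Because the model is small and the dataset is narrow, a pass like this is measured in hours on commodity GPUs, not the weeks and clusters that pre-training an LLM requires.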
How do small language models enable on-device AI?
Business users on the go can use their phones to access LLMs that live in the cloud. But there are key drawbacks: You need an internet connection, and performance is only as good as that connection.
What if you had a small language model that lives on your phone, and works even when you’re offline? Salesforce Research is working on this very thing, with xGen-Mobile, which is tiny enough to fit on a phone but powerful enough to perform tasks accurately and quickly.
The first iterations will be geared toward field service and field sales. In field service, picture a technician diagnosing a washing machine problem onsite. Internet connectivity may be spotty or non-existent in, say, a basement, but that’s not a problem. The technician could access the small language model stored on their device, and instantly get answers to repair questions.
Future iterations of xGen-Mobile will support multimodal capabilities. For example, if the technician takes a picture of a greasy, broken part, the model would recognize it, making it easy to order a new part. By snapping a picture or even recording sound, the tech could get recommendations for the most likely issues, and ways to address them.
Another benefit? Keeping the computation on the device saves money, because data doesn’t have to be sent to the cloud for processing. Further, you can ground the model in the data that’s on your device and personalize it to your needs.
“These models can be grounded in information on an individual’s device,” Baxter said. “That means they will eventually be highly personalized, which will make them even more valuable.”
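xGen-Mobile itself isn't publicly available, but the on-device pattern is easy to sketch with an open-source local runtime such as llama-cpp-python. Everything in the sketch (the quantized model file, the manual excerpt, the prompt) is a hypothetical stand-in:

```python
# A minimal sketch of on-device inference: a quantized small model runs locally,
# so prompts and any grounding data never leave the device. xGen-Mobile is not
# publicly available, so an open-source runtime and placeholder files are used.
from llama_cpp import Llama

llm = Llama(model_path="small-model-q4.gguf", n_ctx=2048)  # hypothetical local model file

# Ground the prompt in data already on the device, e.g., a repair-manual excerpt.
local_context = open("washer_manual_excerpt.txt").read()   # hypothetical file
prompt = (
    f"{local_context}\n\n"
    "Q: The drum won't spin and the motor hums. What are the likely causes?\nA:"
)

result = llm(prompt, max_tokens=150)  # runs entirely offline
print(result["choices"][0]["text"])
```

Note that nothing in this flow touches a network: the model weights, the grounding document, and the question all stay on the device, which is exactly what makes the field-service scenario above work in a basement.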
Stronger data privacy
Unlike some external API-based services, small language models like xGen adhere to stringent data privacy controls. This aligns with Salesforce’s requirement that customer data stay inside its own secured platform. xGen preserves privacy better because the model runs on the mobile device, where the data lives. This makes it a good fit for sensitive, regulated industries like banking and healthcare, which are restricted in how, and with whom, they can share information.
Small but mighty
Small models can be fine-tuned to specific tasks or industries, giving you more relevant and precise outputs without the overhead of processing unnecessary information. This makes them perfect for applications where speed, cost, and accuracy are crucial, delivering specific solutions without the heavyweight footprint.