Since ChatGPT debuted in the fall of 2022, much of the interest in generative AI has centered on large language models. Large language models, or LLMs, are the giant, compute-intensive models powering the chatbots and image generators that seemingly everyone is using and talking about nowadays.
While there’s no doubt that LLMs produce impressive, human-like responses to most prompts, the reality is that most general-purpose LLMs suffer when it comes to deep domain knowledge in areas like, say, health, nutrition, or the culinary arts. Not that this has stopped folks from using them, with occasionally bad or even laughable results when we ask for a personalized nutrition plan or a recipe.
LLMs’ shortcomings in producing credible, trusted results in these specific domains have led to growing interest in what the AI community is calling small language models (SLMs). What are SLMs? Essentially, they are smaller and simpler language models that require less computational power and less code, and they are often specialized in their focus.
From The New Stack:
Small language models are essentially more streamlined versions of LLMs, in regards to the size of their neural networks, and simpler architectures. Compared to LLMs, SLMs have fewer parameters and don’t need as much data and time to be trained — think minutes or a few hours of training time, versus many hours to even days to train a LLM. Because of their smaller size, SLMs are therefore generally more efficient and more straightforward to implement on-site, or on smaller devices.
The shorter development and training time, the domain-specific focus, and the ability to run on-device are all benefits that could ultimately be important in all sorts of food, nutrition, and agriculture-specific applications.
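To make the on-device point concrete, here is a minimal sketch of running a small model locally with the Hugging Face transformers library. The model shown, distilgpt2 (roughly 82 million parameters), is just a convenient stand-in for a domain-tuned SLM, which would be fine-tuned on food and nutrition data rather than general web text:

```python
# A minimal sketch of on-device inference with a small language model,
# using the Hugging Face transformers library. distilgpt2 (~82M params)
# is a stand-in; a real food/nutrition SLM would be fine-tuned on
# domain-specific data.
from transformers import pipeline

# A model this size loads in seconds and runs on a laptop CPU;
# no GPU cluster required, which is what makes on-device use plausible.
generator = pipeline("text-generation", model="distilgpt2")

result = generator(
    "A high-protein breakfast option is",
    max_new_tokens=30,
    do_sample=True,
)
print(result[0]["generated_text"])
```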
Imagine, for example, a startup that wants to create an AI-powered personalized nutrition coach. Key features of such an application would include an understanding of the nutritional building blocks of food, knowledge of personal dietary preferences and restrictions, and instant on-demand access at all times of the day. A cloud-based LLM would likely fall short here, partly because it wouldn’t have up-to-date information about the various food and nutrition building blocks, and partly because it tends to be more susceptible to hallucination (as anyone who’s prompted an AI chatbot for recipe suggestions knows).
A number of startups in this space are creating focused SLMs for food and nutrition, such as Spoon Guru, whose models are trained on specific nutrition and food data. Others, like Innit, are building their food- and nutrition-specific data sets and associated AI engine into what they are terming Innit LLM validator models, which essentially put food and nutrition intelligence guardrails around an LLM to make sure its output is good information and doesn’t, as Innit CEO Kevin Brown has suggested is possible, recommend “Thai noodles with peanut sauce when asking for food options for someone with a nut allergy.”
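Innit hasn’t published its implementation details, but the validator idea — a domain-aware layer that screens LLM output before it reaches the user — can be sketched in miniature. Everything below (the allergen keyword list, the function names, the simple string matching) is an illustrative stand-in, not Innit’s actual system:

```python
# A hypothetical sketch of the validator-model pattern: a domain-specific
# checker screens LLM output before it reaches the user. The allergen
# data and keyword matching here are illustrative stand-ins only.

# Illustrative mapping from a dietary restriction to ingredients to flag.
ALLERGEN_KEYWORDS = {
    "nut allergy": ["peanut", "almond", "cashew", "walnut", "hazelnut"],
}

def violates_restriction(recipe_text: str, restriction: str) -> bool:
    """Return True if the recipe text mentions a flagged ingredient."""
    flagged = ALLERGEN_KEYWORDS.get(restriction, [])
    text = recipe_text.lower()
    return any(ingredient in text for ingredient in flagged)

def validated_suggestion(llm_response: str, restriction: str) -> str:
    # The guardrail: reject output that conflicts with the user's
    # restriction instead of passing it through; a real system would
    # re-prompt the LLM or fall back to a vetted suggestion.
    if violates_restriction(llm_response, restriction):
        return "Suggestion rejected: conflicts with " + restriction
    return llm_response

# Kevin Brown's example: peanut sauce offered to someone with a nut allergy.
print(validated_suggestion(
    "Try Thai noodles with peanut sauce.", "nut allergy"))
```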
The combination of LLMs for general conversational competency with SLMs for domain-specific knowledge around a subject like food is the best of both worlds: it pairs the seemingly realistic interaction capability of an LLM trained on vast swaths of data with the savant-y, nerdish specificity of a language model focused on the specific domain you care about.
Academic researchers have created a framework for fusing an LLM with SLMs to deliver this peanut-butter-and-chocolate combination. They call it BLADE, which “enhances Black-box LArge language models with small Domain-spEcific models. BLADE consists of a black-box LLM and a small domain-specific LM.”
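The paper’s full method involves training the small domain model and adapting the two models to work together, but the rough division of labor at inference time can be sketched simply: the SLM supplies domain knowledge, and the black-box LLM folds that knowledge into a fluent answer. In the sketch below, both query functions are hypothetical stand-ins for real model calls:

```python
# A rough sketch of the BLADE-style division of labor at inference time.
# Both query_* functions are hypothetical stand-ins; the actual BLADE
# training and adaptation procedure is considerably more involved.

def query_domain_slm(question: str) -> str:
    """Hypothetical call to a small food/nutrition-specialized model."""
    # In practice: a fine-tuned SLM returning domain facts or rationale.
    return "Peanut sauce contains peanuts; unsafe for nut allergies."

def query_blackbox_llm(prompt: str) -> str:
    """Hypothetical call to a general-purpose LLM API."""
    return "(fluent, conversational answer grounded in the prompt)"

def blade_style_answer(question: str) -> str:
    # The SLM supplies the specifics; the LLM handles fluency and reasoning.
    domain_knowledge = query_domain_slm(question)
    prompt = (
        f"Domain knowledge: {domain_knowledge}\n"
        f"Question: {question}\n"
        "Answer using the domain knowledge above."
    )
    return query_blackbox_llm(prompt)

print(blade_style_answer("Safe Thai dishes for someone with a nut allergy?"))
```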
As we envision a food future of highly specialized AIs helping us navigate our personal and professional worlds, my guess is that the combination of LLM and SLM will become more common in building helpful services. Having SLM access on-device, such as on a smartwatch or phone, will be critical for speed of action and access to vital information. Most on-device SLM agents will benefit from persistent access to LLMs, but hopefully they will be designed to operate independently – even with temporarily limited functionality – when their human users disconnect by choice or through limited connectivity.
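What might that graceful degradation look like in practice? Here is a hypothetical sketch of the routing logic: prefer the cloud LLM when it’s reachable, and fall back to the on-device SLM when it isn’t. The connectivity check, host name, and both model calls are illustrative stand-ins, not any real product’s API:

```python
# A hypothetical sketch of connected/disconnected routing: use the cloud
# LLM when reachable, fall back to the on-device SLM otherwise. All names
# here are illustrative stand-ins.
import socket

def cloud_available(host: str = "api.example.com", timeout: float = 1.0) -> bool:
    """Crude connectivity check: can we reach the (hypothetical) cloud host?"""
    try:
        with socket.create_connection((host, 443), timeout=timeout):
            return True
    except OSError:
        return False

def query_cloud_llm(question: str) -> str:
    """Stand-in for a full cloud LLM call."""
    return "(cloud LLM answer: fluent, broad, up to date)"

def query_on_device_slm(question: str) -> str:
    """Stand-in for the local, domain-specific SLM."""
    return "(on-device SLM answer: narrower, but always available)"

def answer(question: str) -> str:
    # Prefer the cloud LLM when reachable; degrade gracefully to the
    # on-device SLM when the user disconnects by choice or necessity.
    if cloud_available():
        return query_cloud_llm(question)
    return query_on_device_slm(question)

print(answer("What's a quick iron-rich dinner?"))
```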