The generative AI landscape is growing bigger day by day.
Today, Meta announced a new family of AI models, Llama 2, designed to power applications like OpenAI’s ChatGPT, Bing Chat, and other modern chatbots. Trained on a mix of publicly available data, Llama 2 performs significantly better than the previous generation of Llama models, Meta claims.
Llama 2 is the follow-up to Llama, a collection of models that can generate text and code in response to prompts, comparable to other chatbot-like systems. But Llama was only available by request; Meta decided to gate access to the models for fear of misuse. (Despite these precautions, Llama later leaked online and spread across various AI communities.)
By contrast, Llama 2, which is free for research and commercial use, will be available for fine-tuning on AWS, Azure, and Hugging Face’s AI model hosting platform in pretrained form. And it’ll be easier to run, Meta says: it is optimized for Windows thanks to an expanded partnership with Microsoft, as well as for smartphones and PCs packing Qualcomm’s Snapdragon system-on-chip. (Qualcomm says it’s working to bring Llama 2 to Snapdragon devices in 2024.)
So how is Llama 2 different from Llama? In a number of ways, all of which Meta highlights in a lengthy whitepaper.
Llama 2 comes in two versions, Llama 2 and Llama 2-Chat, the latter fine-tuned for two-way conversations. Llama 2 and Llama 2-Chat are further divided into versions of varying sophistication: 7 billion parameters, 13 billion parameters and 70 billion parameters. (“Parameters” are the parts of a model that are learned from training data and essentially determine the model’s skill at a problem, in this case generating text.)
Llama 2 was trained on two trillion tokens, where “tokens” represent pieces of raw text, for example “fan,” “tas,” and “tic” for the word “fantastic.” That’s roughly 40% more than Llama was trained on (1.4 trillion), and, generally speaking, the more tokens, the better when it comes to generative AI. Google’s current flagship large language model (LLM), PaLM 2, was reportedly trained on 3.6 trillion tokens, and it is speculated that GPT-4 was also trained on trillions of tokens.
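To make the idea of tokens concrete, here is a minimal Python sketch that splits a word into pieces by greedy longest-match against a tiny vocabulary. The vocabulary `TOY_VOCAB` and the `tokenize` helper are hypothetical and hand-picked for illustration; Llama 2’s actual tokenizer learns its vocabulary from data and may split words differently.

```python
# Toy illustration only: real tokenizers learn their vocabulary from data.
# TOY_VOCAB is a hypothetical, hand-picked vocabulary, not Llama 2's.
TOY_VOCAB = {"fan", "tas", "tic", "f", "a", "n", "t", "s", "i", "c"}


def tokenize(word: str, vocab: set[str]) -> list[str]:
    """Split a word into tokens by greedy longest-match against vocab."""
    tokens = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                tokens.append(piece)
                start = end
                break
        else:
            # No vocabulary entry matches: fall back to a single character.
            tokens.append(word[start])
            start += 1
    return tokens


print(tokenize("fantastic", TOY_VOCAB))
```

With this toy vocabulary, “fantastic” comes out as the three pieces mentioned above. A model’s training set size is counted in such pieces rather than in whole words.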
Meta does not disclose the specific sources of the training data in the whitepaper, beyond saying that it comes from the web, is mostly in English, does not come from the company’s own products or services, and emphasizes text of a “factual” nature.
I’d venture to guess that the reluctance to reveal training details is rooted not only in competitive reasons, but also in the legal controversies surrounding generative AI. Just today, thousands of authors signed a letter urging tech companies to stop using their writing for AI model training without permission or compensation.
But I digress. Meta says that in a range of benchmarks, Llama 2 models perform slightly worse than their highest-profile closed-source rivals, GPT-4 and PaLM 2, with Llama 2 falling significantly behind GPT-4 in computer programming. But human evaluators found Llama 2 to be about as “helpful” as ChatGPT, Meta claims; Llama 2 answered on par across a set of roughly 4,000 prompts designed to probe for “helpfulness” and “safety.”
Take the results with a grain of salt, though. Meta acknowledges that its tests can’t possibly capture every real-world scenario and that its benchmarks could be lacking in diversity; in other words, they may not sufficiently cover areas such as coding and human reasoning.
Meta also acknowledges that Llama 2, like all generative AI models, is biased along certain axes. For example, it is prone to generating “he” pronouns at a higher rate than “she” pronouns, due to imbalances in the training data. As a result of toxic text in the training data, it does not outperform other models on toxicity benchmarks. And Llama 2 has a Western skew, again thanks to data imbalances, including an abundance of the words “Christian,” “Catholic,” and “Jewish.”
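As a rough illustration of what a pronoun imbalance in text looks like, the sketch below counts whole-word occurrences of “he” and “she” in a short sample. The sample sentence and the `pronoun_counts` helper are hypothetical; this is not Meta’s methodology, only a simple way to see how such a skew could be measured.

```python
import re
from collections import Counter

# Hypothetical sample text standing in for a slice of training data.
SAMPLE = "He said he would call. She agreed, and he thanked her."


def pronoun_counts(text: str, pronouns=("he", "she")) -> Counter:
    """Count whole-word, case-insensitive occurrences of each pronoun."""
    # Lowercase first, then extract words so "He" and "he" are counted together.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w in pronouns)


counts = pronoun_counts(SAMPLE)
print(counts["he"], counts["she"])
```

A model trained on text where one pronoun appears far more often than the other can be expected to reproduce that skew in what it generates.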
The Llama 2-Chat models perform better than the Llama 2 models on Meta’s internal “helpfulness” and toxicity benchmarks. But they also tend to be overly cautious, erring on the side of declining certain requests or responding with too many safety details.
To be fair, the benchmarks don’t account for additional safety layers that might be applied to hosted Llama 2 models. As part of its collaboration with Microsoft, for example, Meta is using Azure AI Content Safety, a service designed to detect “inappropriate” content across AI-generated images and text, to reduce toxic Llama 2 outputs on Azure.
Even so, Meta still makes every effort to distance itself from potentially harmful outcomes involving Llama 2, emphasizing in the whitepaper that Llama 2 users must comply with the terms of Meta’s license and acceptable use policy, in addition to guidelines regarding “safe development and deployment.”
“We believe that openly sharing today’s large language models will support the development of helpful and safer generative AI too,” Meta wrote in a blog post. “We’re looking forward to seeing what the world builds with Llama 2.”
Given the open source nature of the models, there’s no telling exactly how, or where, they might be used. At the lightning speed at which the internet moves, it won’t be long before we find out.