
What Makes Llama 3.1 the Largest Open-Source AI Model Ever?

Along with: What's behind Mistral's back-to-back breakthroughs with Large 2 and NeMo?

Hey there, 

What a week! I can’t contain my excitement about what Meta did this week. They not only open-sourced their biggest model but also shared the lessons they learned while building it. The model beats GPT-4o and Claude 3.5 Sonnet on several benchmarks and puts open source at the forefront.

Also, if you haven't secured your ticket to the DataHack Summit 2024 yet, now is the time to act. Most tickets have already sold out. Don't miss your chance to be part of this groundbreaking event showcasing the latest advancements in GenAI.

Let’s dive into this week's developments.

What would be the format? Every week, we will break the newsletter into the following sections:

  • The Input - All about recent developments in AI

  • The Tools - Interesting finds and launches

  • The Algorithm - Resources for learning

  • The Output - Our reflection 


Earlier this year, Meta hinted at a major AI development, and now it has finally launched Llama 3.1, an open-source model. What’s special? It’s not just another model; it’s a frontier model that beats top competitors like GPT-4o and Claude 3.5 Sonnet on various benchmarks.

Llama 3.1, the largest open-source AI model so far, has 405 billion parameters and a 128k-token context window, and it shows superior performance.

Trained on over 16,000 of Nvidia’s H100 GPUs, Llama 3.1 is a major leap from its predecessors. Despite the high training costs, Meta is releasing Llama 3.1 as open source. Zuckerberg believes open-source AI models will eventually surpass proprietary ones, much as Linux became the dominant open-source operating system.

Additionally, Meta has open-sourced a wealth of lessons in a paper containing 100 pages of pure wisdom. (source)

Link to paper

Mistral has launched its new AI model, Mistral Large 2, with a 128k context window and support for dozens of natural and programming languages. 

It boasts 123 billion parameters, is optimized for high-performance single-node inference, and excels in tasks requiring long-context processing. 

The model, which requires a commercial license for business use, also shows advanced capabilities in reasoning, code generation, and multilingual performance. Mistral Large 2 is designed to be more accurate and less prone to errors, enhancing its utility in complex AI applications. (source)

A few days ago Mistral, in collaboration with NVIDIA, also released Mistral NeMo, a 12B AI model. This model, licensed under Apache 2.0 for broad adoption, features a vast 128k token context window and stands out in reasoning, world knowledge, and coding accuracy.

It incorporates the Tekken tokenizer to efficiently support numerous major languages and is finely tuned for precise performance in a variety of AI tasks. Designed for easy integration, Mistral NeMo seamlessly replaces the previous Mistral 7B systems. (source)

Elon Musk's xAI is actively developing the Grok series of AI models. Grok 2, slated for an August release after a delay due to data cleaning, aims to surpass OpenAI's GPT-4 using 24,000 H100 GPUs leased from Oracle. 

Musk has also initiated the development of Grok 3, promising exceptional capabilities through training on a self-built cluster of 100,000 H100 GPUs, signaling a strategic shift away from Oracle to speed up development and enhance control. (source)

Google DeepMind has released Foundational Large Autorater Models (FLAMe), setting new standards in the AI industry by surpassing existing proprietary models at evaluating large language model (LLM) outputs.

FLAMe, trained on a dataset of 5 million human judgments covering 100 tasks, demonstrates exceptional adaptability and accuracy. The FLAMe-RM variant, optimized for reward modeling, excels in the RewardBench benchmark, highlighting its superior performance. (source)

Fujitsu and Cohere Inc. partnered to develop and distribute advanced Large Language Models (LLMs) with enhanced Japanese language capabilities for enterprise use. 

This partnership involves significant investments and the launch of "Takane," a new AI model designed for high-security and industry-specific applications within private clouds.

Fujitsu will utilize Cohere’s technology to deliver this model through its platforms, aiming to boost productivity and security in sectors such as finance and government. (source)

DeepL has launched a new language model featuring three innovations:

  • It optimizes language tasks to ensure accurate, error-minimized translations;

  • It trains on over seven years of proprietary content creation and translation data; and

  • It integrates insights from thousands of selected language experts. 

The model outperforms competitors like Google Translate and ChatGPT-4, especially in English-to-Japanese and English-to-Chinese translation. DeepL, which aims to add more languages, now offers this model to its Pro customers. (source)

Recently, McDonald's aired its AI-generated commercial titled 'A Taste of Tomorrow,' showcasing a groundbreaking approach to advertising. This innovative commercial uses advanced AI technology to depict a scenario where a computer uses ChatGPT to express a desire for a byte of the burger. (video)

Fei-Fei Li, the renowned computer scientist known as the “godmother of AI,” has founded World Labs, now valued at over $1 billion, developing AI models that mimic human visual processing for enhancing robotics, VR, and AR. 

Leveraging her ImageNet experience and Google Cloud expertise, Li focuses on spatial intelligence to improve AI's three-dimensional understanding. (source)

SmolLM, a new series of small language models, offers three sizes (135M, 360M, 1.7B) tailored for efficient local operation on devices like smartphones and laptops. 

These models, developed to perform robustly on diverse datasets including Cosmopedia and Python-Edu, excel in reasoning and coding tasks, thanks to targeted training and sophisticated data curation techniques. 

They combine high-quality, diverse training data and advanced model training techniques to ensure top performance, emphasizing privacy and reduced inference costs. (source)

Tweet of the Week

In today’s digital landscape, professionals across industries struggle to efficiently generate realistic 3D models for various applications, from product design to virtual reality experiences. Bing AI for 3D images provides an advanced tool that simplifies the creation of detailed 3D visuals, making it accessible even to those without specialized graphic design skills.

Problem statement: Suppose you are an interior designer who needs to visualize and present room layouts and decor schemes to clients. Use Bing / Copilot Designer to generate high-quality 3D images of interior spaces, allowing you to showcase your design ideas to clients effectively.

How to access the tool:

Here is my prompt: Generate a 3D image of a modern living room with Scandinavian design elements.

Results: Link

Do you want to learn more about this tool? Check out the Analytics Vidhya blog for further details.

  • In the recent episode of Leading with Data, I had an amazing conversation with Dr. Geeta Manjunath about her contribution to revolutionizing breast cancer detection through her innovative, AI-driven solution at Niramai Health Analytix. She calls herself an accidental entrepreneur, but to me it is a story of courage and immense passion, and one that needs to be told to every data science professional.

  • As a resource, you cannot miss the Llama 3 paper published by Meta. In particular, look at the scaling-laws section and the lessons Meta learned in building the largest open-source model on Earth.

  • DeepLearning.AI launched a new course with Flower Labs called Intro to Federated Learning. There is also a more detailed course on the topic. I would recommend taking the introductory course first and then deciding whether this area interests you.

What was your highlight for the week? What did you learn this week? Would love to hear it from you.

Keep learning!

Kunal

How do you rate this issue of AI Emergence?

Would love to hear your thoughts

