
Enter Gemma: Google’s latest open-source LLM

Along with: Experience the future of video production with SORA

Hey there, 

Have you ever heard of "Internet time," the idea that things happen faster online than offline? Now think bigger: "AI time," where things move faster still (a fraction of Internet time!). The same applies to AI progress: just this week, we've seen OpenAI's cutting-edge text-to-video tool, SORA; enhancements to Gemini; and Google diving into open-source LLMs.

Let’s dive into the most significant developments this week!

What's the format? Every week, we break the newsletter into the following sections:

  • The Input - All about recent developments in AI

  • The Tools - Interesting finds and launches

  • The Algorithm - Resources for learning

  • The Output - Our reflection 


The Input

Google launches Gemma 2B and 7B

Google has launched Gemma 2B and 7B, lightweight open-source LLMs built from the same research and technology behind its flagship Gemini models, and licensed for responsible commercial use.

Despite their compact size, the Gemma models outperform much larger models in crucial benchmarks and are designed to operate on a developer's laptop or desktop.

These models can be accessed via Kaggle, Hugging Face, Nvidia’s NeMo, and Google’s Vertex AI.

These models handle simpler tasks like summarization and chatbots efficiently, winning on both speed and cost.

While Google aims to attract the developer community with these models, I think this is a great development. Having multiple LLM options and the ability to select the best one will go a long way in building new applications. (source)
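If you want to poke at Gemma yourself, here is a minimal sketch using the Hugging Face `transformers` library. This is a sketch under assumptions, not official quick-start code: it assumes you have accepted the Gemma license on the Hub and authenticated, and uses the `google/gemma-7b` checkpoint id.

```python
# Minimal sketch: running Gemma 7B via Hugging Face transformers.
# Assumes `pip install transformers torch` and that you have accepted
# the Gemma license on the Hub and logged in (e.g. `huggingface-cli login`).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Summarize in one sentence: open-source LLMs give developers more choice."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```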

SORA: Experience the future of video production

Meet SORA, OpenAI's newest model, which turns text prompts into high-quality videos up to one minute long.

SORA can create intricate scenes featuring various characters, diverse motions and speeds, and highly detailed environments and backgrounds.

OpenAI states, “The model not only grasps the user's prompt but also the real-world 'physics' behind it,” showing its sophisticated understanding of real-world dynamics.

There is a heated debate on X about whether the model has actually learned physics or is just reproducing physical patterns from vast amounts of training data. Either way, I think this is a GPT-3 moment for videos.

The videos produced are realistic, cinematic, and often humorous. Watch closely, though, and you can spot problems. Common ones in the videos shared so far include objects appearing out of nowhere (dogs, for instance) and birthday-cake candles blowing in different directions. But you have to admit: you notice them only when you are specifically looking for them.

In six months, I expect a ChatGPT moment to arrive, where the model is further fine-tuned and starts creating extremely useful, near-flawless videos (with the occasional hallucination). (source)

The Gemini 1.5 entry: Extended Context Window

Just two months after unveiling Gemini, Google has rolled out an advanced version, Gemini 1.5. This update brings an expanded context window and introduces a 'mixture of experts' architecture, enhancing the model’s speed and efficiency.

Google claims this model can handle up to 1 million tokens, a significant leap. For perspective, OpenAI's GPT-4 Turbo, released in November '23, supports a context length of 128k tokens.

That is a 7.8x longer context!

Why is this important? Context length determines how much information you can feed a model in your prompt, on top of its built-in knowledge. Gemini 1.5 Pro can process an hour-long video as part of its prompt context, and it does it well!

Check out these two experiments from none other than Ethan Mollick to see the improvement in capabilities: Experiment 1 and Experiment 2.

Historically, you would either need to trim the context or turn to techniques like retrieval-augmented generation (RAG) to accomplish the same. (source)
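To make the context-window point concrete, here is a back-of-the-envelope sketch. The 4-characters-per-token ratio is a rough rule of thumb, not a real tokenizer count:

```python
# Rough sketch: will a document fit in a model's context window?
# Uses the common ~4 characters-per-token rule of thumb, not a real tokenizer.
CHARS_PER_TOKEN = 4

def approx_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(text: str, context_limit: int, reserve_for_output: int = 1_000) -> bool:
    return approx_tokens(text) + reserve_for_output <= context_limit

document = "x" * 2_000_000                    # ~500k tokens, a few novels' worth
print(fits_in_context(document, 128_000))     # False: too big for a 128k window
print(fits_in_context(document, 1_000_000))   # True: fits Gemini 1.5's window
```

When a document fails that check, that is where RAG comes in: retrieve only the chunks relevant to the question and feed those instead of the whole document.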

V-JEPA: Meta's new self-supervised architecture

This week, Meta launched V-JEPA, a new self-supervised learning architecture. Pre-trained on video data, this non-generative model learns by predicting the missing or obscured parts of a video in an abstract representation space.

Meta aims to build machines with human-like learning efficiency through V-JEPA. This approach is key for machines to develop broad reasoning and strategic skills, allowing them to construct internal representations of their environment to learn, adapt, and strategize effectively for complex challenges. (source)
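Meta's release has the full details, but the core trick, predicting masked regions in representation space rather than in pixels, can be sketched roughly as follows. All module names and shapes here are illustrative stand-ins, not Meta's actual code:

```python
import torch
import torch.nn.functional as F

# Illustrative sketch of the JEPA idea, NOT Meta's implementation:
# encode the visible video patches, then regress the *representations*
# (not the pixels) of the masked patches.
encoder = torch.nn.Linear(768, 256)         # stand-in for a video patch encoder
target_encoder = torch.nn.Linear(768, 256)  # in practice an EMA copy of encoder
predictor = torch.nn.Linear(256, 256)       # predicts masked-patch embeddings

patches = torch.randn(16, 768)              # 16 video patches (dummy features)
mask = torch.zeros(16, dtype=torch.bool)
mask[8:] = True                             # hide the second half

visible = encoder(patches[~mask])           # encode what the model can see
with torch.no_grad():                       # targets come from the frozen branch
    targets = target_encoder(patches[mask])

preds = predictor(visible.mean(0, keepdim=True)).expand_as(targets)
loss = F.l1_loss(preds, targets)            # regression in representation space
loss.backward()
```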

Reddit: A $60 million data licensing deal

Fresh off announcing an IPO set for this coming March, Reddit has reportedly signed a whopping $60 million contract with an undisclosed AI company: a content licensing deal that greenlights the use of its data for training AI models.

Traditionally, AI firms have used data from the open web without explicit consent, a practice now facing legal scrutiny. This move by Reddit may suggest a shift towards securing data usage rights more solidly. (source)

Additionally, Reddit emailed all its users about updates to the Privacy Policy and User Agreement to make data usage clearer, improve the documents' navigation, update for current products, and comply with the EU Digital Services Act.

OpenAI Forum: An invite-only community for AI discussion

Recently, OpenAI introduced the OpenAI Forum, an invitation-only online community designed to bring together domain experts and students for discussions and collaborations on AI. Anyone can apply and membership is free of charge, but OpenAI will screen applicants to ensure a good fit; the forum will feature both online and in-person events.

Additionally, OpenAI is rolling out paid opportunities for community members to contribute to OpenAI research projects. This includes tasks like model evaluations, creating evaluation sets, and supporting the Preparedness team's efforts to ensure the safety of cutting-edge models.

So, if you're keen, why not go ahead and sign up? (source)

Groq: Lightning-fast inference on custom chips

Groq grabbed everyone's attention after its public benchmark tests went viral on X, showing computation and response speeds that outperform the popular AI chatbot ChatGPT.

Groq's custom LPU (Language Processing Unit) chips allow it to generate roughly 500 tokens per second; by comparison, GPT-3.5 generates around 40 tokens per second.

The generation speed is truly mind-boggling and has reached the point where text is generated faster than people speak in conversation. That means services like Groq can deliver agents that converse the way humans normally do with each other! (source)
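Here is the back-of-the-envelope math behind that claim. The speech rate and words-per-token figures are rough averages, not measurements:

```python
# Rough math: generation speed vs. human speech rate.
# ~150 spoken words per minute and ~0.75 words per token are rules of thumb.
GROQ_TOKENS_PER_S = 500
GPT35_TOKENS_PER_S = 40
SPEECH_WORDS_PER_S = 150 / 60          # ~2.5 words/second
WORDS_PER_TOKEN = 0.75

reply_tokens = 200                     # a few sentences of response
speak_time = reply_tokens * WORDS_PER_TOKEN / SPEECH_WORDS_PER_S

print(f"Groq generates it in    {reply_tokens / GROQ_TOKENS_PER_S:.1f}s")   # 0.4s
print(f"GPT-3.5 generates it in {reply_tokens / GPT35_TOKENS_PER_S:.1f}s")  # 5.0s
print(f"Speaking it aloud takes ~{speak_time:.0f}s")                        # ~60s
```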

Adobe: An AI Assistant for your PDFs

Adobe this week unveiled its latest generative AI initiative: an AI Assistant designed to make it easier to work with PDF documents.

The beta version of this AI Assistant, launched in Acrobat and Reader, features an AI-powered conversational engine capable of answering document-related questions, generating summaries, and more. Emphasizing security, Adobe does not store or use any customer document content to further train the technology.

This matters because PDFs are where individuals and organizations alike store much of their vital information, and Adobe Acrobat and Reader are the standard tools for working with them. (source)

The Tools

  • Magicslides.app is an AI presentation tool that's here to spark your creativity so you'll never have to stare at a blank slide again. Imagine effortlessly creating presentations from text, YouTube, or even PDFs without the hassle of learning a new tool. Sounds handy, right?

  • GPT Excel is an AI tool designed to give your spreadsheet tasks a boost. It offers features like AI-powered formula generation, script creation, SQL queries, template generation, and more. These features aim to automate complex calculations, streamline your workflows, and make data validation and filtering a breeze. Indeed a game-changer!

  • Slack has introduced Slack AI, a secure, reliable, and intuitive AI experience designed to give you an edge during your workday. They've rolled out new features such as Search Answers, offering personalized, intelligent responses to your queries; Channel Recaps, creating key highlights from accessible channels; and Thread Summaries, helping you catch up on lengthy conversations with just one click.

The Algorithm

  • In this episode of Leading with Data, I chatted with Srikanth Velamakanni about his vision for Fractal: powering every human decision in the enterprise by bringing AI, engineering, and design to the world's most admired companies. He also shared his insights on AI's transformative potential to reshape how we work.

  • Andrej Karpathy has shared an interesting tutorial in which he builds the tokenizer used in OpenAI's GPT series from scratch; see the minimal BPE sketch below for a taste.
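For flavor, here is one byte-pair-encoding (BPE) merge step, heavily simplified relative to Karpathy's full tutorial (the toy string and the 256 starting vocabulary size mirror the byte-level setup he works from):

```python
# Minimal sketch of one byte-pair-encoding (BPE) merge step: find the most
# frequent adjacent token pair and replace it everywhere with a new token.
def pair_counts(ids):
    counts = {}
    for a, b in zip(ids, ids[1:]):
        counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts

def merge(ids, pair, new_id):
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)   # replace the pair with the new token
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

ids = list("aaabdaaabac".encode("utf-8"))  # start from raw bytes
counts = pair_counts(ids)
top = max(counts, key=counts.get)          # (97, 97), i.e. "aa"
ids = merge(ids, top, 256)                 # 256 = first id beyond the byte range
print(top, ids)
```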

The Output

Fascinating times to be part of! This week we have seen not just multiple new models but immense new capabilities: Google gives us a million-token context window as well as open-source models, SORA generates lifelike videos, and Groq delivers a breakthrough in generation speed.

I am very confident that in a few years we will look back at today the way we now look at the pre-smartphone world!

Just run this thought experiment to see what I mean: imagine a person who went to sleep the week before ChatGPT launched and woke up two years later. How would they feel about the world they land in?

What if that was 3 years?

How do you rate this issue of AI Emergence?

Would love to hear your thoughts
