
The most powerful video segmentation model from Meta!

Along with: OpenAI's bold move to rethink internet searching by Search GPT

Hey there, 

Meta is ruling the roost now and Zuckerberg is the new favourite of the industry! From releasing the biggest models and open-sourcing them, to appearing on the community's most popular podcasts and exchanging jackets with Jensen Huang - he seems to be getting everything right.

I am now waiting for the next update to the very successful Meta Ray-Ban glasses and the impending Quest update!

Please note: We will not be publishing our newsletter next week as we will be busy hosting the most-awaited GenAI event of the year - DataHack Summit in Bengaluru. We look forward to reconnecting with you through our newsletter on August 14th. If you are attending the Summit, we hope to see you there.

It is going to be a lot of fun!

So let’s get started!

What would be the format? Every week, we will break the newsletter into the following sections:

  • The Input - All about recent developments in AI

  • The Tools - Interesting finds and launches

  • The Algorithm - Resources for learning

  • The Output - Our reflection 


The original Segment Anything Model (SAM) has evolved into SAM 2, which now applies its object segmentation capabilities to videos, enabling real-time tracking of objects across frames. 

This addresses the complexities of video segmentation where objects may move quickly, change appearance, or be obscured. 

SAM 2 enhances video editing, supports quicker visual data annotation for AI training, and enables interactive experiences in mixed reality. 

The best part is it’s open-sourced (in the true Meta way). (source)

Google has unveiled Gemma 2 2B, a compact AI model with 2.6 billion parameters that reportedly rivals industry giants like OpenAI’s GPT-3.5 and Mistral AI’s Mixtral 8x7B in performance.

Alongside Gemma 2 2B, Google has launched Gemma Scope, a groundbreaking tool for interpreting LLM predictions, marking a significant advancement in explainable AI for generative models. 

Gemma Scope, built with state-of-the-art JumpReLU sparse autoencoders (SAEs), consists of over 400 sparse autoencoders that analyze layer and sublayer outputs, identifying which features activate in response to specific words and concepts. 

This tool acts as a "microscope" for language model activations, enhancing transparency and understanding of AI processes. Users can explore its functionalities through a hands-on Colab notebook and an interactive demo. (source)
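To make the "microscope" idea concrete, here is a minimal sketch of what a JumpReLU sparse autoencoder does: it projects a model activation into a much wider, sparse feature space and reconstructs the activation from the few features that fire. All the shapes, weights, and thresholds below are toy illustrative assumptions, not Gemma Scope's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 8, 32   # activation width and SAE dictionary size (toy numbers)

# Randomly initialized encoder/decoder; in a real SAE these are trained.
W_enc = rng.normal(scale=0.1, size=(d_sae, d_model))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(scale=0.1, size=(d_model, d_sae))
b_dec = np.zeros(d_model)
theta = np.full(d_sae, 0.05)  # per-feature thresholds (learned in the real SAE)

def jumprelu(x, threshold):
    # JumpReLU zeroes any pre-activation at or below its threshold and passes
    # the rest through unchanged, which yields sparse feature activations.
    return np.where(x > threshold, x, 0.0)

def encode(activation):
    return jumprelu(W_enc @ activation + b_enc, theta)

def decode(features):
    return W_dec @ features + b_dec

activation = rng.normal(size=d_model)   # stand-in for a layer's output vector
features = encode(activation)           # sparse: most entries are exactly zero
reconstruction = decode(features)
print("active features:", int((features > 0).sum()), "of", d_sae)
```

The few nonzero entries of `features` are the "activated features" the article mentions; inspecting which ones fire on which inputs is what makes the tool useful for interpretability.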

Apple plans to release its new "Apple Intelligence" AI features with its upcoming iOS 18 and iPadOS 18 software updates. 

However, these features won't be ready for the initial release in September but will instead be included in updates around October. 

This delay allows Apple to offer early access to developers and ensures that the new AI capabilities, which can create text and images, work seamlessly on the latest devices.

This rollout follows Apple's enhanced focus on AI to boost device functionality and sales, which was emphasized in June. 

To enroll for early access, first create an Apple account and then request Apple developer access here. (source)

OpenAI recently announced SearchGPT, a search engine prototype, to revolutionize online information searches by making them more intuitive and conversational.

This initiative is a direct challenge to Google's search engine dominance and follows closely after the launch of OpenAI's GPT-4o mini, highlighting continuous innovation in AI. (source)

Apple has recently launched and fully open-sourced its DCLM 7B model, a move that marks a significant shift in its AI strategy towards open-sourcing all its models. 

This development aims to extend Apple's reach across its billion devices, breaking from its long-standing closed-source tradition. 

DCLM-7B excels in various benchmarks against competitors like Mistral, Qwen2, and Gemma, and is trained primarily on English data with a 2,048-token context window. 

Notably, Apple's release includes a detailed explanation of its data curation process, which is crucial for understanding LLM training. DCLM-7B is trained on 2.5T tokens under the DataComp-LM framework, informing future data curation experiments. (source)

Just when we thought OpenAI had already impressed everyone with its GPT-4o mini, the company introduced the GPT-4o Long Output. 

This update expands the model's output capability to 64,000 tokens from the previous 4,000, enabling users to receive much longer responses, equivalent to a 200-page novel. 

This adjustment, driven by user demand for more comprehensive outputs, enhances applications requiring detailed responses, such as coding and writing, thereby improving the overall user experience. (source)
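The "200-page novel" comparison checks out as a back-of-envelope estimate, assuming the common rules of thumb of roughly 0.75 English words per token and 250 words per printed page (illustrative assumptions, not OpenAI's figures):

```python
# Rough sanity check of the "200-page novel" claim. The conversion factors
# below are common rules of thumb, not official OpenAI numbers.
max_output_tokens = 64_000
words = max_output_tokens * 0.75   # ~0.75 English words per token
pages = words / 250                # ~250 words per printed page
print(round(words), "words ≈", round(pages), "pages")  # 48000 words ≈ 192 pages
```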

OpenAI has released ChatGPT’s Advanced Voice Mode to select ChatGPT Plus users, integrating enhanced voice capabilities of GPT-4o for hyperrealistic audio responses. 

The new mode detects emotional nuances and merges voice-to-text and response generation into one process, reducing conversation latency. 

OpenAI modified the feature to avoid legal and ethical issues, introducing four standard voices and excluding the controversial "Sky" voice. (source)

Jim Fan has announced breakthroughs in Project GR00T, which addresses the critical issue of data scarcity in robotics. 

The project uses human demonstrations captured with Apple Vision Pro headsets to generate initial data: an operator wears the headset to control a robot directly, translating human actions into robot movements. RoboCasa then multiplies and replicates this data across numerous virtual environments, increasing the variety of situations the robot data encompasses and enhancing the robots' adaptability. 

Lastly, MimicGen expands these datasets by generating new action trajectories and also filters out any unsuccessful ones. This innovative method significantly increases both the volume and diversity of training data, promising to accelerate the evolution of robotics across various fields. (source)

Elon Musk has announced the Tesla Bot Optimus Gen 3, aiming to integrate robots into the workforce without replacing human jobs. 

Optimus 2.0, currently in development, is designed to handle tedious, dangerous, or physically demanding tasks. 

This innovation aims to allow humans to concentrate on roles requiring creativity, critical thinking, and communication. 

Musk envisions these robots as enhancing human capabilities, not displacing them, posing questions about their future impact on labor market integration and automation. (source)

Google DeepMind's AlphaProof and AlphaGeometry 2 have made significant advances in solving complex mathematical problems. 

AlphaProof, leveraging reinforcement learning, tackles formal mathematical reasoning, while AlphaGeometry 2 improves upon its predecessor by utilizing more extensive training data for complex geometry problems. 

Both systems demonstrated their capabilities by solving problems from this year's International Mathematical Olympiad, with AlphaProof handling the algebra and number theory problems and AlphaGeometry 2 solving the geometry problem. (source)

Gemini's latest update has made significant improvements to help users complete tasks more efficiently and creatively. 

Users can now access the improved Gemini 1.5 Flash for faster, high-quality responses free of charge, including a larger context window of 32K tokens and forthcoming file upload capabilities. 

The update also introduces new features to address hallucinations and expands the availability of both Gemini for Teens and the mobile app. 

Enhancements in related content links and double-check features improve information accuracy, while Gemini's integration into Google Messages now extends to the European Economic Area, the UK, and Switzerland in additional languages. (source)

Runway has advanced its generative AI video platform despite controversies over data practices. 

They've enhanced their Gen-3 Alpha model, launched in June 2024, to create realistic videos from still images as well as text prompts. 

Users can generate 5- or 10-second videos on Runway’s platform, using credits. The update also includes safety features to prevent the generation of inappropriate content. (source)

Google shared a message encouraging those inspired by #TeamUSA athletes to pursue their dreams, highlighting the support from Gemini. 

It thanks Sydney McLaughlin-Levrone, a world record holder, for her involvement and wishes luck to all aspiring to achieve big dreams this summer. (source)

Have you ever wondered about the potential of AI to revolutionize digital marketing through virtual influencers? Creating an AI influencer involves platforms like RenderNet, a tool that generates images from text prompts, which can then be used to establish a social media presence. 

Problem Statement: Suppose you are working in marketing and are looking for innovative ways to engage your audience without the limitations of human influencers.

How to Access the tool:

  • Visit RenderNet and create a free account.

  • Use the studio to select a model and input your desired traits and settings to generate images.

  • Download the photos and use them to establish a social media presence for your AI influencer.

This tool is best utilized by brands and individuals aiming to enhance their digital presence and interaction, offering a fresh, consistent, cost-effective approach to influencer marketing.

Check what I generated by clicking here.

For more detailed guidance on the tool check out the Analytics Vidhya’s blog here.

  • In the recent episode of Leading with Data, I had a great conversation with Dr. Rodolphe Katra, who leads the global AI Strategy at Medtronic and heads its Enterprise AI Centre of Excellence. He discussed his experience in the medical industry and how he managed to integrate advanced AI technologies into it.

  • For anyone eager to advance their knowledge of AI-driven search solutions, the course "Embedding Models: From Architecture to Implementation" by DeepLearning.AI covers the technical foundations and practical applications of semantic search technologies. It offers a deep dive into how embedding models enhance search functionality, particularly through the use of dual-encoder architectures to refine search relevance in large language model applications.
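The dual-encoder idea behind such courses is simple: one encoder maps queries and another maps documents into the same vector space, and retrieval ranks documents by similarity to the query vector. Here is a toy, self-contained sketch using bag-of-words counts and nearly aligned random projections as stand-ins for the two trained transformer towers (all names and numbers are illustrative assumptions):

```python
import numpy as np

VOCAB = ["segment", "video", "search", "engine", "robot", "training"]

def embed(text: str, weights: np.ndarray) -> np.ndarray:
    """Bag-of-words counts projected by an encoder-specific weight matrix."""
    counts = np.array([text.lower().split().count(w) for w in VOCAB], dtype=float)
    return weights @ counts

rng = np.random.default_rng(0)
# Two separate "towers": in a real system these are trained so that matching
# query/document pairs land close together; here they are nearly identical.
W_query = rng.normal(size=(4, len(VOCAB)))
W_doc = W_query + 0.01 * rng.normal(size=(4, len(VOCAB)))

def rank(query: str, docs: list) -> list:
    """Return docs sorted by cosine similarity to the encoded query."""
    q = embed(query, W_query)
    scored = []
    for d in docs:
        v = embed(d, W_doc)
        sim = float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-9))
        scored.append((sim, d))
    return [d for _, d in sorted(scored, reverse=True)]

docs = ["video segment model", "search engine index", "robot training data"]
print(rank("segment a video", docs)[0])
```

Because both towers produce comparable vectors, documents can be embedded and indexed offline, and only the query is encoded at search time - the design choice that makes dual encoders practical for large-scale semantic search.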

That’s it for the week - see you after the DataHack Summit. I’ll share some updates, snippets, and learnings from the conference in the next edition.

How do you rate this issue of AI Emergence?

Would love to hear your thoughts
