• AI Emergence
  • Posts
  • “ChatGPT 4o” world vs. Google Gemini Era

“ChatGPT 4o” world vs. Google Gemini Era

Along with: A non-aligned OpenAI?

Hey there, 

Who would have thought that there would be 2 potentially world-changing product announcements in 2 days, which will change the world forever?

With the launch of ChatGPT 4o and the latest updates from Google I/O, it's clear that Generative AI is defining the new world order. To reflect this, we have dedicated Generative AI sessions at the DataHack Summit 2024, where experts will tackle GenAI problems in a live, interactive manner. Get more details here.

Now coming back to the newsletter.

What would be the format? Every week, we will break the newsletter into the following sections:

  • The Input - All about recent developments in AI

  • The Tools - Interesting finds and launches

  • The Algorithm - Resources for learning

  • The Output - Our reflection 

Table of Contents

This week, OpenAI created the “ChatGPT” moment by announcing ChatGPT 4o (Omni) - the flagship model capable of reasoning across audio, vision, and text in real-time. Key highlights include:

  • Free Access: GPT-4o is freely accessible to all users, making advanced AI technology widely available.

  • Unified Model: A single neural network processes text, vision, and audio inputs and outputs, enhancing integration and functionality.

  • Enhanced Performance: GPT-4o is twice as fast, half the price, and offers five times the rate limits of GPT-4 Turbo. (source)

Some of the amazing videos include - Two GPT-4os interacting and singing, Sal Khan’s son getting tutored by AI, and so much more!

At the Google I/O 2024 developer conference, Google unveiled several artificial intelligence innovations designed to enhance its AI services amid fierce competition. Here’s a breakdown of the updates and launches:

  • Gemini 1.5 Pro Updates: Google has made significant upgrades across its Gemini models. The Gemini 1.5 Pro will now have a context window of 2k tokens.

  • Gemini 1.5 Flash: A new multimodal model is just as powerful as Gemini 1.5 Pro but optimized for “narrow, high-frequency, low-latency tasks.” That makes it better at generating fast responses.

  • Google Veo, Imagen 3, and Audio Overviews: Google introduced Veo, the SORA competitor that generates high-definition videos, and Imagen 3, which creates lifelike images with minimal visual artifacts. Alongside these, the Audio Overviews feature can generate audio narratives from text inputs. 

  • Ask Photos: It lets Gemini pore over your Google Photos library in response to your questions, and the feature goes beyond just pulling up pictures of dogs and cats.

  • Project Astra: Developed by Google's DeepMind, this prototype AI assistant assists with daily tasks and complex queries through a video-audio interface, with plans for future integration into the Gemini platform.

  • AI Sandbox: This new feature allows for the creation of music and sounds from textual prompts, addressing the ongoing challenges with generative AI accuracy.

  • AI Overviews: This feature in Google Search delivers concise summaries for complex queries, simplifying information retrieval.

  • AI Teammate: A new feature in Google Workspace that builds searchable collections of work from communications and analyzes readiness for project launches, enhancing workplace efficiency. (source)

Ilya Sutskever, Co-founder and Chief Scientist of OpenAI is leaving OpenAI after a decade.

He plans to focus on a new, personally meaningful project, though he has not provided details.

His successor as head of research will be Jakub Pachocki, a seven-year veteran of OpenAI.

Along with him Jan Leike Key figure in super alignment group has also announced his departure from the company. (source)

Google DeepMind has introduced AlphaFold 3, a groundbreaking AI model that significantly enhances our ability to understand the structures and interactions of all molecular components within living organisms.

AlphaFold 3 offers at least a 50% improvement in predicting protein interactions with other molecules over existing methods and even doubles the accuracy for key interaction types.

Additionally, Isomorphic Labs is collaborating with pharmaceutical companies to leverage AlphaFold 3 in drug design, aiming to develop new treatments.  (source)

Meta is reportedly developing AI-powered earbuds equipped with cameras under a "Camera Buds" project.

These innovative earbuds are designed to identify objects and translate foreign languages, indicating Meta's continued interest in expanding its portfolio of smart wearable technology.

This development follows the introduction of their AI-enabled Ray-Ban smart glasses last September, priced at $299, which can provide information about viewed objects. (source)

The Technology Innovation Institute of Abu Dhabi has launched Falcon 2 LLM, which includes two versions: the Falcon 2 11B and the Falcon 2 11B VLM (Vision to Language Model), capable of transforming visual inputs into textual outputs and marking the first foray into multimodal models.

These models have been open-sourced, allowing developers worldwide to use and adapt them freely.

The Falcon 2 11B has demonstrated superior performance compared to similar models from competitors like Meta and Google, achieving high scores in independent tests by Hugging Face. (source)

AI superpowers author, Kai-Fu Lee, has recently launched an innovative bilingual (English and Chinese) open-source AI model named Yi-34B through his AI startup, 01.AI.

Despite being much smaller, this model has outperformed leading AI models like Llama 2’s 70 billion-parameter model and GPT-3.5 in certain benchmarks.

Lee announced Yi-34B after revealing that the model ranked first among pre-trained LLMs on the AI platform Hugging Face. (source)

At the France summit, Microsoft recently announced its largest investment in France, aiming to accelerate the country's AI and cloud technology adoption.

This investment, totaling €4 billion, includes expanding Microsoft's cloud and AI infrastructure with new data centers in Paris, Marseille, and a planned campus in Mulhouse Alsace Agglomération.

The investment is set to bring 25,000 advanced GPUs to France by 2025, significantly boosting the local AI capabilities and aligning with the EU and French digital sovereignty principles. (source)

Apple is reportedly finalizing a partnership with OpenAI to integrate ChatGPT technology into iPhones, coinciding with the upcoming iOS 18 release.

This collaboration aims to enhance iPhone capabilities by incorporating AI-driven chatbot features, prioritizing security and privacy.

Unlike traditional cloud processing, Apple plans to process as much data as possible on-device to safeguard user privacy, aligning with its longstanding commitment to keeping user data secure and private. (source)

SoftBank Group's subsidiary, Arm, is gearing up to launch AI chips by the fall of 2025, as it enters the competitive AI chip market.

Arm, primarily a U.K.-based chip designer, is preparing to build a prototype by spring 2025 and has begun discussions with contract manufacturers including Taiwan's TSMC for production. (source)

Fujitsu and its consortium have launched "Fugaku-LLM," a highly advanced LLM trained on the supercomputer Fugaku, featuring 13 billion parameters.

This model excels in Japanese language tasks, setting a new benchmark with a high score, particularly in humanities and social sciences.

Fugaku-LLM is optimized for performance with innovations in distributed training techniques and the utilization of Fugaku's computational resources, which allows it to process data using CPUs instead of GPUs, a strategic move given the global GPU shortage. (source)

Tools : ChatGPT-4o

This new update in GPT can reason across audio, vision, and text in real time.

Problem Statement: Make the use case of real-time translation. (Reference)

  1. Sign in/ Sign up to GPT Omni

  2. Click on the conversation tap

  3. Mention the language you want to start the conversation in.

  4. Generate it.

For more information about the tool, check out our guide on using GPT-Omni on the Analytics Vidhya blog.

  • In the recent episode of "Leading with Data," I had a chat with Joshua Starmer about his inspiring professional journey, which led him to establish StatsQuest. We explored the pivotal moments on YouTube that propelled him into entrepreneurship and discussed his vision for StatsQuest's future, including upcoming books.

  • If you are interested in enhancing your skills in AI agent systems and applying them to real-world business scenarios, the course 'Multi AI Agent Systems with crewAI' from DeepLearning.AI focuses on teaching the principles of designing effective AI agents and organizing a team of AI agents to perform complex, multi-step tasks.

  • In this video by Khan Academy Salman Khan, the CEO discusses his published book “Brave New Word” and tells about his experience in building KhanMigo and the future of Education.

What a week! Exciting product launches - Astra and GPT 4o both feel like a lot more mature Alexa without the limitation of the hardware! But, the irony is that Amazon is not even in the race currently.

Also - Project Astra showcased that Google will bring back the glasses. It would be a direct competition to Meta Ray-Ban glasses but could have a lot more features and probably a more advanced Gemini-Flash - only time will tell.

What were your reactions to the launches? Reaction to the exit of Ilya from a company he had built? Do let me know!

How do you rate this issue of AI Emergence?

Would love to hear your thoughts

Login or Subscribe to participate in polls.

Reply

or to participate.