What's New with OpenAI's GPT-4o Mini?

Along with: Andrej Karpathy’s new “AI Native” Education Platform

Hey there, 

It has been a tremendous week!

I finally downloaded the iOS 18 public beta and gave Math Notes a try, and I must say: Apple has nailed what the future of math education looks like. In addition, one of the best teachers in AI started his own education company.

I feel super excited about the future of education. And for all those who think reliance on AI will erode people's critical thinking: the use of AI in education might actually deepen it, because now you get personalized feedback on what you are learning!

Let’s run through updates for the week.

What's the format? Every week, we break the newsletter into the following sections:

  • The Input - All about recent developments in AI

  • The Tools - Interesting finds and launches

  • The Algorithm - Resources for learning

  • The Output - Our reflection 

Please note: This is an abbreviated version of our newsletter due to email length restrictions. For the complete experience, visit our website and enjoy the full, unabridged edition.


OpenAI launched GPT-4o mini, a smaller, more cost-effective member of its model family, designed for tasks such as chatbots, coding, and math reasoning.

GPT-4o mini is priced at 15 cents per million input tokens and 60 cents per million output tokens, a significant cost reduction compared to earlier models.

The model currently supports text and vision inputs, with plans to add image, video, and audio inputs and outputs in the future. It integrates rigorous safety measures and is accessible through multiple API interfaces.
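To make the pricing concrete, here is a small sketch that estimates the cost of a single request at the published rates. The token counts are made-up example values, not figures from the announcement:

```python
# Estimate the cost of one GPT-4o mini request from the published rates:
# $0.15 per 1M input tokens, $0.60 per 1M output tokens.

INPUT_RATE = 0.15 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.60 / 1_000_000  # dollars per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars for a single request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a request with 10,000 input tokens and 2,000 output tokens.
cost = estimate_cost(10_000, 2_000)
print(f"${cost:.4f}")  # prints $0.0027
```

At these rates, even a fairly long request costs a fraction of a cent, which is the point of the "mini" positioning.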

Its features include:

  • Offers a cost-effective pricing structure.

  • Supports text and vision inputs.

  • Aims to expand support to include text, images, video, and audio.

  • Delivers high performance in chat preferences, math reasoning, and coding.

  • Integrates rigorous safety measures. (source)

A recent investigation by Proof News has unveiled that some of the world's leading AI companies, including Anthropic, Nvidia, Apple, and Salesforce, have utilized content from thousands of YouTube videos to train their AI models.

The investigation revealed that subtitles from 173,536 YouTube videos, sourced from over 48,000 channels, were included in the dataset known as "YouTube Subtitles". The dataset features transcripts from YouTube megastars including MrBeast, Marques Brownlee, PewDiePie, and more! (source)

Mistral launched two models: a math-focused model and a code-generation model for programmers and developers, the latter based on the Mamba architecture developed by outside researchers late last year.

  • Codestral Mamba builds on the newly developed Mamba architecture, offering linear-time inference and the ability to model sequences of arbitrary length, making it efficient and quick for code-productivity use cases.

  • Mathstral is a specialized model designed to tackle advanced mathematical problems requiring complex, multi-step logical reasoning. (source)

Andrej Karpathy, former head of AI at Tesla and researcher at OpenAI, is launching Eureka Labs, an “AI native” education platform.

Eureka Labs will develop AI assistants, or personalities, that work alongside a human teacher to allow “anyone to learn anything.” Teachers would still design the course material, but they’d be supported by this AI assistant.

Eureka Labs’ first product will be LLM101n, an undergraduate-level AI course that guides students through training their own AI, similar to a smaller version of the AI teaching assistant. The course materials will be available online, with plans for both digital and physical cohorts. (source)

Microsoft researchers have introduced "SpreadsheetLLM," an AI model crafted to comprehend and interact with spreadsheets, marking a significant leap in enterprise AI. 

The model encodes spreadsheet data for use with large language models (LLMs), enabling these models to interpret and reason over spreadsheet contents. 

Leveraging natural language processing, users can query and manipulate spreadsheet data using plain English instead of complex formulas, democratizing data access and empowering data-driven decision-making. Additionally, SpreadsheetLLM can automate tasks like data cleaning, formatting, and aggregation, enhancing efficiency and productivity. (source)

OpenAI is developing a project named 'Strawberry' to enhance AI models' reasoning capabilities.

It utilizes a new method to enable AI to autonomously navigate and research the internet, perform complex reasoning, and manage long-horizon tasks that require planning over extended periods.

This initiative pushes AI toward more sophisticated reasoning and problem-solving abilities, distinguishing it from existing models that often lack common sense and produce unreliable outputs when faced with complex tasks. (source)

Recently, OpenAI shared a structured five-level framework to show its advancements in developing Artificial General Intelligence (AGI)  aiming to create AI that surpasses human capabilities across most tasks. 

Each level signifies a deeper integration of AI into human-like capabilities and autonomous functions.

  • Conversational AI (Level One): At this stage, AI systems handle basic conversational interactions and assist with routine communication and content creation, akin to customer service agents or AI coaches.

  • Reasoning AI (Level Two): These AI systems can solve complex problems at a level comparable to a Ph.D. holder, without external tools, transitioning from simple conversations to advanced reasoning skills.

  • Autonomous AI (Level Three): Referred to as "agents," these systems operate independently for extended periods, managing tasks without ongoing oversight, much like reliable team members.

  • Innovating AI (Level Four): Known as "Innovators," these AIs autonomously develop new methods and improvements, enhancing processes and increasing efficiency beyond basic task execution.

  • Organizational AI (Level Five): At this pinnacle stage, AIs, called "organizations," can perform all functions of an entire entity, managing operations comprehensively with little to no human input. (source)

Google has launched Google Vids, a beta AI-powered video creation tool integrated within its Workspace suite for business users. 

This new tool utilizes Gemini AI and Vertex AI technologies to enable users to generate professional presentations quickly. It supports text, audio, and video elements, offering features like royalty-free media access and collaborative project capabilities. 

Google Vids includes diverse voice options and is currently being tested by a select group of users to gather feedback for further improvements, reflecting Google's ongoing integration of AI across its product ecosystem. (source)

Samsung is set to enhance its voice assistant, Bixby, with its own generative-AI LLM. Bixby, which launched in 2017 alongside the Galaxy S8, will gain advanced functions like live translation and interactive camera features through Bixby Vision. The upgrade aligns with Samsung's strategy of keeping multiple voice assistants on its devices, letting users choose among options like Google's AI assistant. (source)

Google's DeepMind Robotics team has integrated the Gemini 1.5 Pro AI to enhance robot navigation and interaction within an office environment, showcasing this advancement through a series of demonstrations. 

Robots equipped with this technology respond to verbal instructions and navigate complex environments using a combination of long-context vision-language models and topological graphs. 

This method, known as Multimodal Instruction Navigation with Demonstration Tours (MINT), combines environmental understanding with gesture recognition, enabling robots to perform tasks with high levels of autonomy and context awareness. (source)

OpenAI faces significant safety concerns despite being a leader in AI technology. Internal criticisms have emerged, with employees and former staff voicing worries about the company's rush to launch products without adequate safety checks, especially after the departure of key safety team members. 

These concerns are intensified by broader anxieties about the potential national security risks posed by advanced AI technologies. OpenAI has attempted to address these issues through partnerships and internal measures aimed at improving safety protocols, but doubts remain about the adequacy of these efforts. (source)

Haiper has just launched Haiper 1.5, described as an enhanced visual foundation model. It lets users generate 8-second video clips from various prompts, doubling the length supported by the previous model. It also introduces a new upscaler for enhancing video quality, with plans to integrate image generation. (source)

Meta Platforms is set to launch the largest version of its Llama 3 model on July 23, 2024, marking a significant advancement in AI and large language models.

The Llama 3 model, known for its versatility, will be released for free, in contrast with competitors' paid models, potentially disrupting the AI market. (source)

Google's DeepMind team has enhanced robot capabilities using Gemini AI, which trains RT-2 robots to navigate and perform tasks using natural language instructions. 

Robots learn their environment by "watching" a video tour, then execute commands based on observed elements, demonstrating a high success rate in responding to over 50 instructions within a large operating space. 

The use of Gemini 1.5 Pro allows the robots to plan actions like retrieving specific items based on user requests, indicating a promising future in environment-aware robotic assistance. (source)

The Mixture-of-Experts (MoE) technique, widely used to scale LLMs efficiently, is running into scaling limits that stand in the way of further performance gains and new capabilities.

Google's PEER architecture addresses this challenge by replacing the fixed router with a learned index that efficiently routes input data to a vast pool of experts, enhancing performance without increasing costs. This novel approach significantly improves the model's computational efficiency by using tiny experts and a multi-head retrieval system, promising better scaling and performance for large AI models. (source)

Nvidia and Georgia Tech introduced RankRAG, which advances LLMs' capabilities on retrieval-augmented generation (RAG) tasks by instruction-tuning a single LLM to efficiently perform both context ranking and answer generation.

RankRAG is said to improve the precision of context selection during retrieval and enhance answer accuracy, thereby addressing the limitations of previous RAG systems.

The process starts by using a comprehensive dataset to fine-tune the LLM on tasks that improve its search capabilities and RAG performance. 

Then, during inference, the LLM re-ranks the retrieved contexts before generating answers, optimizing the relevance of the information used in responses. The result is better integration of complex knowledge and effective adaptation to a variety of NLP tasks. (source)
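The rerank-then-generate pattern described above can be sketched in a few lines. The real system uses one instruction-tuned LLM for both the ranking and the generation steps; in this toy version, a keyword-overlap score and a string-building "generator" stand in for the model, purely to illustrate the flow:

```python
# Toy sketch of RankRAG-style inference: score retrieved contexts,
# keep the top-k, then generate an answer conditioned on them.
# score_context and the final f-string are stand-ins for a single LLM.

def score_context(question: str, context: str) -> int:
    """Toy relevance score: count of question words appearing in the context."""
    return len(set(question.lower().split()) & set(context.lower().split()))

def rerank_then_generate(question: str, retrieved_contexts: list[str], top_k: int = 2) -> str:
    # Step 1: rank each retrieved context by relevance to the question.
    ranked = sorted(retrieved_contexts,
                    key=lambda c: score_context(question, c),
                    reverse=True)
    # Step 2: keep only the top-k contexts, then "generate" from them.
    best = ranked[:top_k]
    return f"Answer based on: {best}"

contexts = ["Paris is the capital of France.",
            "Bananas are yellow.",
            "France is in Europe."]
print(rerank_then_generate("What is the capital of France?", contexts))
```

The key design point RankRAG makes is that ranking and generation share one set of model weights, so improving one task through instruction tuning can benefit the other.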

AWS App Studio democratizes enterprise app development, enabling non-developers like IT project managers and data engineers to build applications using natural language prompts. 

This AI-powered service simplifies the creation of complex applications with a user-friendly interface that handles design, logic, and data integration, streamlining deployment and maintenance while connecting with AWS and third-party services.(source)

Engineers at Northwestern University have created an innovative low-cost artificial actuator designed like human muscle, significantly advancing robotics technology.

This new actuator, made from cheap materials like standard rubber and accessible via 3D printing, allows robots to interact safely and effectively in human-centric environments.

The device successfully powered a soft robot through complex maneuvers and an artificial bicep that could lift weights repeatedly without failure. (source)

Anthropic has doubled the maximum output token limit for Claude 3.5 Sonnet from 4,096 to 8,192 tokens in its API. To use the new limit, include the header "anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15" in your API requests. (source)
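Here is a rough sketch of how that header would be attached to a Claude Messages API request, using only the Python standard library. The header value and limit come from the announcement; treat the rest of the request shape (endpoint, model name, prompt) as illustrative, and note that no network call is actually made here:

```python
# Build (but do not send) a Claude API request that opts into the
# 8,192-token output limit via the beta header from the announcement.
import json
import urllib.request

headers = {
    "x-api-key": "YOUR_API_KEY",  # placeholder, not a real key
    "anthropic-version": "2023-06-01",
    "anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15",  # opts into 8192 output tokens
    "content-type": "application/json",
}
body = {
    "model": "claude-3-5-sonnet-20240620",
    "max_tokens": 8192,  # allowed once the beta header is present
    "messages": [{"role": "user", "content": "Write a long-form essay."}],
}
request = urllib.request.Request(
    "https://api.anthropic.com/v1/messages",
    data=json.dumps(body).encode(),
    headers=headers,
)
# urllib.request.urlopen(request) would send it; omitted here.
```

If you use Anthropic's official SDK instead of raw HTTP, the same header can typically be passed through its extra-headers mechanism.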

Have you ever wished you could turn your imaginative ideas or maybe dreams into visually stunning digital art with just a few clicks? The Luma Dream Machine allows you to do just that. 

This AI-powered tool transforms your text prompts into detailed digital artwork, offering limitless creative possibilities with the ease of adjusting artistic parameters. 

Problem statement: Challenge yourself to capture the essence of your dream in a prompt and explore the artistic interpretation AI can provide.

Here's how to use the tool:

  1. Sign up on the platform.

  2. Enter a text prompt or upload an image describing the video you wish to create.

  3. Click "Generate" to create your video.

Review the pricing plans, which vary based on monthly or yearly subscriptions.

For more detailed information about the tool, visit the Analytics Vidhya blog here.

  • In this video, you will get to know Devin, a cutting-edge AI software agent developed by Cognition AI, as Scott Wu, co-founder and CEO, demonstrates its capabilities and shares valuable insights from the experience of creating this tool.

  • In the recent episode of Leading with Data, I had a chat with Ines Montani about her experiences and her expertise as a leader in the field of generative AI, focusing on the development of spaCy and Prodigy. 

  • If you are interested in mastering the intricacies of pretraining LLMs for specialized domains, this course offered by deeplearning.ai covers everything from data preparation using HuggingFace to advanced training techniques like depth up-scaling.

I can’t close out the week without knowing what you all think about the impact of AI on education over a five-year horizon, so here is the poll.

How would AI impact the future of education and logical reasoning?

Login or Subscribe to participate in polls.

That’s it for the week! Keep learning and Emerging!

How do you rate this issue of AI Emergence?

Would love to hear your thoughts

Login or Subscribe to participate in polls.
