Apple’s study reveals limitations in AI's mathematical reasoning abilities

Along with: OpenAI’s Agentic Framework and Musk’s Driverless Car

Hi there,

A lot happened this week - Elon went all out with demos of the Robotaxi, Robovan, and Optimus, only for it to come out later that some parts were assisted by humans behind the scenes!

More importantly, Apple released an awesome piece of research calling out the lack of thinking in current LLMs! The timing of the research release and Apple's last-minute withdrawal from OpenAI's funding round could be linked! And to be honest, the research raises some very valid and important questions.

Let’s see how quickly AI can address some of the shortcomings raised by Apple! For now, let’s cover this week's developments.

What would be the format? Every week, we will break the newsletter into the following sections:

  • The Input - All about recent developments in AI

  • The Tools - Interesting finds and launches

  • The Algorithm - Resources for learning

  • The Output - Our reflection 

A study by Apple engineers has revealed the fragility of advanced AI models, particularly in mathematical reasoning tasks. Using a modified benchmark called GSM-Symbolic, they found that even trivial changes to names or numbers in math problems caused significant drops in the accuracy of large language models (LLMs).

This indicates that LLMs rely heavily on probabilistic pattern matching, rather than genuine reasoning.

Performance worsened further when irrelevant information was added, causing models to make flawed assumptions. These findings highlight the reasoning limitations of current AI models, which lack a formal understanding of the underlying concepts. (source)
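To make the idea concrete, here is a minimal sketch of how a templated benchmark of this kind could be built. It is not Apple's code: the template, the name list, the distractor sentence, and the helpers `make_variant` and `accuracy` are all made up for illustration. The point is only that names and numbers become placeholders, the answer is recomputed for each variant, and an irrelevant clause can be switched on.

```python
import random

# Hypothetical template in the spirit of GSM-Symbolic: names and numbers are
# placeholders, and the answer is computed from the sampled values.
TEMPLATE = (
    "{name} picks {a} apples on Monday and {b} apples on Tuesday. "
    "{name} gives away {c} apples. How many apples does {name} have left?"
)
# Made-up irrelevant clause, the kind of addition the study says hurts accuracy.
DISTRACTOR = " Five of the apples are slightly smaller than average."
NAMES = ["Sophie", "Liam", "Priya", "Mateo"]


def make_variant(rng, with_distractor=False):
    """Return (question, answer) for one randomly instantiated problem."""
    a, b = rng.randint(5, 50), rng.randint(5, 50)
    c = rng.randint(1, a + b)  # never give away more than was picked
    question = TEMPLATE.format(name=rng.choice(NAMES), a=a, b=b, c=c)
    if with_distractor:
        # Place the distractor just before the final question sentence.
        body, final_q = question.rsplit(". ", 1)
        question = body + "." + DISTRACTOR + " " + final_q
    return question, a + b - c


def accuracy(solver, variants):
    """Fraction of variants for which solver(question) equals the answer."""
    return sum(solver(q) == ans for q, ans in variants) / len(variants)
```

In practice, `solver` would wrap a call to the model under test. Comparing `accuracy` on the plain variants against the distractor variants would then show how much the irrelevant sentence costs.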

Perplexity is making headlines this week with its new finance platform, "Perplexity for Finance," enabling users to access real-time financial data, compare companies, and analyze earnings with an engaging UI. 

However, the company is also facing scrutiny from The New York Times, which issued a cease-and-desist letter, alleging unauthorized use of its copyrighted content in AI-generated summaries. 

Perplexity says it aims to collaborate with content creators and has initiated an ad-revenue-sharing program to address publisher concerns, while contending with allegations of unauthorized web scraping from multiple sources. (source)

Google has signed a groundbreaking deal to purchase energy from six or seven small modular nuclear reactors (SMRs) from California’s Kairos Power to power its AI data centers.

Set to be operational by 2030, the SMRs aim to provide a clean and reliable energy source amid rising electricity demands driven by generative AI. 

The agreement, which involves a total purchase of 500 megawatts of power, underscores Google's commitment to sustainable energy while raising questions about the viability and cost-effectiveness of SMR technology. (source)

OpenAI has launched the Swarm framework as an experimental tool designed to make it easier for AI agents to work together on tasks. Swarm is lightweight and stateless, offering developers flexibility and control, but it requires external memory solutions for more complex tasks.

Key Features of Swarm include:

  • Lightweight, stateless design: Easier for developers to understand and implement.

  • Flexibility and control: Developers can customize and extend functionalities based on their needs.

  • "Routines" and "handoffs": These features help organize tasks across different specialized agents for smoother collaboration.

  • Best suited for enterprise automation: Helps in automating tasks where multiple agents are involved.

  • Not yet production-ready: Still experimental, but provides valuable insights into multi-agent systems.

  • Open-source: Available for the community to contribute and build upon.

  • Ethical discussions: The release raises questions about the impact of AI automation on jobs and the workforce. (source)
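The "routines" and "handoffs" idea from the list above can be illustrated in plain Python. This is a rough sketch only: it does not use the Swarm package or call any model, and the `Agent` class, the `run` function, and the keyword-based routing rule are all hypothetical stand-ins. In a real multi-agent setup, a language model would decide when to hand off.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Agent:
    """Hypothetical stand-in for an agent: a name, a routine, a handoff rule."""
    name: str
    instructions: str  # the "routine": what this agent is meant to do
    # Returns the agent to hand off to, or None to handle the message here.
    route: Callable[[str], Optional["Agent"]] = field(default=lambda message: None)


def run(agent: Agent, message: str, max_handoffs: int = 5) -> List[str]:
    """Follow handoffs until an agent keeps the message; return the agent path."""
    path = [agent.name]
    for _ in range(max_handoffs):
        next_agent = agent.route(message)
        if next_agent is None:
            break
        agent = next_agent
        path.append(agent.name)
    return path


# Two specialists and a triage agent that hands off based on a keyword rule.
refunds = Agent("Refunds", "Process refund requests.")
sales = Agent("Sales", "Answer pricing questions.")
triage = Agent(
    "Triage",
    "Send each request to the right specialist.",
    route=lambda m: refunds if "refund" in m.lower() else sales,
)

if __name__ == "__main__":
    print(run(triage, "I would like a refund for my order"))
```

Note that `run` stores nothing between calls, which mirrors the stateless design mentioned above. Anything that needs to persist, such as conversation history, would have to be kept by the caller.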

Since its release, Swarm has also reignited concerns about the ethical risks of AI, particularly regarding misuse, fairness, and potential job displacement. While promising, the development highlights the need for effective governance and societal safeguards as AI adoption accelerates.

At Analytics Vidhya, we are also focusing on Agentic AI and have launched a course to help professionals and enterprises understand and implement this cutting-edge technology to streamline their workflows.

Adobe is crafting your ultimate creative companion, pushing the boundaries of creativity to the next level.

Adobe launched over 100 new features in Creative Cloud at Adobe MAX 2024, enhancing tools like Photoshop, Illustrator, Premiere Pro, Firefly, and Lightroom. These updates make design and editing faster and more creative, with a big focus on integrating AI. Here’s what’s new:

  • Photoshop: Improved Distraction Removal tool to erase unwanted elements, new Generative Fill/Expand for detailed AI-generated images.

  • Firefly: Combines advanced features like scene generation and intelligent editing to streamline the video production process.

  • Illustrator: New "Objects on Path" to quickly arrange objects, better Image Trace for converting hand-drawn art to editable vectors, and a Mockup tool for product designs.

  • Premiere Pro: "Generative Extend" adds frames to video clips with AI for smoother edits.

  • Lightroom: Easier photo editing with Generative Remove and new Quick Actions for fast adjustments.

  • Adobe Express: New features for creating campaigns, rewriting text, animating designs, and setting up brand colors with one click. (source)

“As you can see, I just arrived in Robotaxi - the CyberCab - there are no people in them as you can see” - Elon Musk, making his usual heroic entry at the Tesla Cybercab event.

Elon Musk unveiled Tesla’s first fully driverless vehicle, the Cybercab, at the "We, Robot" event in California.

Attendees rode in the vehicle, which has no steering wheel or pedals and charges wirelessly.

Musk surprised the audience with the Robovan, a new transport vehicle designed for passengers or cargo.

The event showcased these futuristic vehicles but raised skepticism among investors, particularly about Tesla’s delayed promises of self-driving technology.

Tesla also displayed humanoid Optimus robots, which interacted with guests but mainly served as entertainment. Tesla plans to begin Cybercab production by 2026. (source)

Tool: NoteGPT

Introducing NoteGPT, an AI tool that quickly generates video transcripts and summaries from any link. Perfect for saving time, it provides key insights and lets users ask questions about the content without watching the full video. Ideal for students, professionals, or anyone seeking efficient information access.

How to Access: 

  1. Copy the link 

  2. Click on "Generate" 

  3. Access the features.

  • For those curious about the potential upsides of powerful AI, the blog post "Machines of Loving Grace" by Dario Amodei offers an insightful exploration of what a positive future could look like if we navigate the associated risks effectively.

  • For those interested in learning how to choose the right Large Language Model (LLM) for business, the "Choosing the Right LLM for Your Business" course by Analytics Vidhya offers a deep dive into evaluating and selecting LLMs to maximize business efficiency and outcomes.

  • In a recent episode of Leading with Data, I had the pleasure of engaging in a fascinating conversation with Bob van Luijt, the CEO & Co-Founder of Weaviate. Bob shared his insightful journey into AI, the birth and growth of Weaviate, and how their innovative AI-native databases differ from traditional ones.

What do you think about the research from Apple?

Would love to know what you think.

How do you rate this issue of AI Emergence?

Would love to hear your thoughts.
