
Grok-3: Is it really the smartest AI or Musk hype?

Plus: Mira Murati is assembling an A-team for her new venture

Hi there 👋

It’s been an exciting week for research - Grok 3 and Perplexity’s Deep Research just dropped, bringing two powerful deep research tools into the mix. I’ll be using them and sharing my thoughts next week.

On a different note, I tried some “vibe coding” on Replit this week. In just 30 minutes, I built and deployed a personal daily task planner - no project setup hassle, no deep coding required. It’s crazy how fast and easy building software has become. As “vibe coding” turns into the norm, it’ll be interesting to see how it reshapes software development.
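For flavor, the core of such a planner is tiny. Here's a minimal in-memory sketch in Python — purely illustrative, not the actual Replit app; names like `TaskPlanner` are made up for this example:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Task:
    title: str
    done: bool = False

@dataclass
class TaskPlanner:
    """A minimal in-memory daily task planner."""
    tasks: dict = field(default_factory=dict)  # maps a date -> list of Tasks

    def add(self, day: date, title: str) -> Task:
        task = Task(title)
        self.tasks.setdefault(day, []).append(task)
        return task

    def complete(self, day: date, title: str) -> None:
        for task in self.tasks.get(day, []):
            if task.title == title:
                task.done = True

    def pending(self, day: date) -> list:
        return [t.title for t in self.tasks.get(day, []) if not t.done]

planner = TaskPlanner()
today = date.today()
planner.add(today, "Write newsletter")
planner.add(today, "Review Grok-3 benchmarks")
planner.complete(today, "Write newsletter")
print(planner.pending(today))  # → ['Review Grok-3 benchmarks']
```

The point of "vibe coding" is that an AI assistant scaffolds, wires up, and deploys something like this for you in minutes.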

With this thought, let’s look at the news for this week.

What's the format? Every week, we break the newsletter into the following sections:

  • The Input - All about recent developments in AI

  • The Tools - Interesting finds and launches

  • The Algorithm - Resources for learning

  • The Output - Our reflection


Elon Musk’s xAI has officially launched Grok-3, the latest evolution of its AI chatbot family, claiming it to be "the smartest AI on Earth." With 10x the computing power of its predecessor and a suite of new capabilities, xAI is positioning Grok-3 as a serious competitor to OpenAI, Google, and Anthropic. But does it truly live up to the hype?

Grok-3: Key Upgrades & Capabilities

  • Bigger, Faster, More Powerful – Trained at xAI’s Memphis data center on 200,000 GPUs, Grok-3 packs way more compute than Grok-2.

  • Better Reasoning Skills – Uses reinforcement learning to improve logic and problem-solving, similar to OpenAI’s o-series and DeepSeek R1.

  • DeepSearch Tool – A new AI-powered research assistant, taking on OpenAI’s Deep Research and Perplexity AI’s research tools.

  • Multiple Versions – Grok-3 Mini, Grok-3 Advanced Reasoning, and Grok-3 DeepSearch are all rolling out, with voice mode expected soon.

How Does Grok-3 Compare to the Competition?

Early reports claim Grok-3 beats GPT-4o, Gemini 2 Pro, and DeepSeek V3 on coding, math, science, and reasoning benchmarks. It also topped the Chatbot Arena leaderboard with a score of 1400, supposedly outperforming OpenAI, Google, and Anthropic.

But here’s the catch - none of these claims have been independently verified.

The AI Arms Race: Scaling vs. Innovation

Despite its impressive infrastructure and rapid development, Grok-3 follows the same industry trends as its rivals - research "agents," Chain-of-Thought reasoning, reinforcement learning, and massive models trained on internet data. The AI landscape is increasingly homogenized, with companies essentially releasing variations of the same chatbot technology under different branding.

  • The AI race remains focused on benchmarks and scaling, with no game-changing application yet.

  • Key details like Grok-3’s training data, energy use, and environmental impact remain undisclosed.

  • xAI’s Memphis data center has raised air pollution issues, highlighting the environmental costs of large-scale AI.

What’s Next for xAI?

Musk says Grok-2 will be open-sourced once Grok-3 stabilizes. Meanwhile, xAI is scaling up fast - its next data center is expected to have 5x the power requirements of its current setup.

Andrej Karpathy compared Grok-3 to OpenAI’s o1-pro, but noted that "real evaluations over time will tell us how it actually performs." So while Grok-3 looks strong on paper, the real test is yet to come. (source)


Microsoft has introduced Majorana 1, a quantum chip designed to address one of the biggest challenges in quantum computing - scalability and error resistance. Built on Topological Core architecture, this chip integrates qubits and control electronics into a compact form that fits in the palm of your hand.

Key Innovations

  • Topological Qubits for Stability - Majorana 1 uses a new class of qubits based on topoconductors, enabling the creation and control of Majorana particles for more stable and scalable quantum operations.

  • Scaling to One Million Qubits - The chip is designed to support up to a million qubits, a critical threshold for tackling complex problems like breaking down microplastics or developing self-healing materials.

  • Built-in Error Resistance - Topological qubits naturally reduce computational errors, addressing a key limitation in quantum computing.

  • New Measurement Techniques - A voltage-pulse-based measurement system simplifies qubit state detection, improving accuracy while reducing complexity.

Why It Matters

  • Bringing Quantum Computing Closer to Reality - Majorana 1's advancements could accelerate fault-tolerant quantum computing, making large-scale applications more feasible.

  • Potential Industrial Impact - With significantly improved computing power, quantum breakthroughs in materials science, drug discovery, and clean energy could be on the horizon. (source)

Former OpenAI CTO Mira Murati has announced Thinking Machines Lab, an AI startup focused on making AI systems more flexible, adaptable, and personalized. Murati is leading the company as CEO, joined by OpenAI co-founder John Schulman as Chief Scientist and ex-OpenAI Chief Research Officer Barret Zoph as CTO.

Star-Studded Team

The startup brings together top AI researchers from OpenAI, Meta, Google DeepMind, CharacterAI, and Mistral, assembling a high-caliber team to push AI innovation forward.

Next-Gen AI Systems

Thinking Machines Lab aims to develop more responsive and customizable AI, moving beyond rigid models to create systems that adapt better to user needs across diverse applications. (source)

OpenAI just dropped SWE-Lancer, a new benchmark designed to measure how well AI handles real-world freelance software engineering tasks. Pulled from 1,400 Upwork jobs worth a total of $1M, this benchmark is testing whether AI can actually earn like a human freelancer.

How SWE-Lancer Works

The benchmark includes:

  • Coding Tasks: Ranging from $50 bug fixes to $32,000 full feature builds, graded with end-to-end tests.

  • Managerial Tasks: AI models evaluate technical proposals, with their decisions compared against real engineering managers.
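The grading scheme described above can be sketched roughly as follows. This is a simplified illustration, not OpenAI's actual harness; the function names and task format are assumptions:

```python
from dataclasses import dataclass

@dataclass
class CodingTask:
    description: str
    payout_usd: int
    e2e_tests: list  # callables taking a patch string, returning bool

def grade_coding_task(task: CodingTask, model_patch: str) -> int:
    """All end-to-end tests must pass for the model to 'earn' the payout."""
    passed = all(test(model_patch) for test in task.e2e_tests)
    return task.payout_usd if passed else 0

def grade_managerial_task(model_choice: str, manager_choice: str) -> bool:
    """Managerial tasks are scored against the real engineering manager's pick."""
    return model_choice == manager_choice

# Toy example: a $50 bug-fix task with one trivial end-to-end check.
task = CodingTask(
    description="Fix the off-by-one bug in pagination",
    payout_usd=50,
    e2e_tests=[lambda patch: "range(1, pages + 1)" in patch],
)
earnings = grade_coding_task(task, "for p in range(1, pages + 1): ...")
print(earnings)  # → 50
```

Summing earned payouts across all 1,400 tasks is what lets the benchmark express model performance in dollars rather than abstract scores.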

Why This Matters

  • A More Realistic AI Test – Unlike traditional benchmarks, SWE-Lancer challenges AI to debug, verify patches, and complete entire workflows—just like a human freelancer.

  • Coding & Decision-Making Skills – The test doesn’t just measure how well AI writes code, but also how well it manages projects, reflecting what real engineers do.

  • AI Still Has a Long Way to Go – Even the best models (GPT-4o, Claude 3.5 Sonnet) struggled, with coding pass rates between 8.0% and 26.2% and managerial accuracy at 44.9%.

  • Connecting AI to Real-World Value – OpenAI open-sourced the dataset, helping researchers explore how AI affects productivity and earnings in freelance work.

What’s Next?

Findings show that frontier models still struggle with most tasks, revealing key limitations in AI-driven software engineering. To push further research, OpenAI has open-sourced a Docker image and the SWE-Lancer Diamond evaluation split, allowing researchers to dive deeper into AI’s economic impact on the field. (source)

French AI startup Mistral has launched Mistral Saba, a 24B parameter AI model designed specifically for Arabic-speaking countries and South Asian languages like Tamil and Malayalam. This marks Mistral’s first localized AI model, built to capture linguistic nuances and cultural context often missed by general-purpose AI.

What Makes Mistral Saba Different?

  • Region-Specific & More Accurate - Designed for Arabic, Tamil, and Malayalam, Mistral Saba ensures better cultural and linguistic understanding than general AI models, reducing errors and improving contextual accuracy.

  • Cost-Effective & Efficient - With 24B parameters and Mixture of Experts (MoE) architecture, it outperforms models 5x its size while running on a single GPU, making AI more accessible and affordable.

  • Fast & Scalable - Generates 150+ tokens per second for low-latency, high-accuracy responses, ideal for chatbots, domain-specific AI, and content creation, even in resource-limited environments.

  • Secure & Deployable Anywhere - Unlike cloud-based models, Saba can be deployed on-premise, making it a privacy-first choice for industries like finance, healthcare, and government. (source)
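The Mixture of Experts idea behind Saba's efficiency — routing each token to only a few expert subnetworks so most parameters stay inactive — can be sketched in plain Python. This is a toy top-k router, not Mistral's implementation:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(token, experts, router_scores, top_k=2):
    """Route a token to its top-k experts and mix their outputs by
    renormalized router weights; the remaining experts stay inactive."""
    ranked = sorted(range(len(experts)), key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([router_scores[i] for i in chosen])
    return sum(w * experts[i](token) for w, i in zip(weights, chosen))

# Toy experts: each just scales its input by a constant.
experts = [lambda x, k=k: k * x for k in range(1, 5)]
out = moe_forward(10.0, experts, router_scores=[0.1, 2.0, 0.5, 1.5], top_k=2)
# Experts with scores 2.0 and 1.5 are chosen; output blends their results.
print(round(out, 2))
```

Because only `top_k` experts run per token, compute cost scales with the active experts rather than the full parameter count — which is how a 24B MoE model can compete with much larger dense models on a single GPU.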

Over the last month, AI-powered deep research tools have been making waves, offering ways to cut research time from days or months to just minutes. Perplexity’s Deep Research is the latest addition, standing out for its real-time web access - a capability that many AI tools still lack.

While ChatGPT’s Deep Research can take 20-30 minutes to compile findings, Perplexity generally delivers results within minutes, running multiple searches, analyzing sources, and structuring insights into clear, shareable reports.

How It Works

  • Searches & Analyzes in Real Time - Iteratively refines research plans and gathers insights across domains like finance, tech, and marketing.

  • Generates Comprehensive Reports - Findings are structured into organized reports, which can be exported as PDFs or shared via Perplexity Pages.

  • Available to All Users - Open to everyone, with unlimited access for Pro subscribers and a daily cap for free users.

  • Fast Turnaround - Completes research tasks in under 3 minutes, with performance benchmarks like Humanity’s Last Exam (21.1%) and SimpleQA (93.9%) highlighting its accuracy.
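The iterative search-analyze-report loop described above can be sketched as a simple agent loop. This is purely illustrative; `search` and `summarize` stand in for Perplexity's internal tools:

```python
def deep_research(question, search, summarize, max_rounds=3):
    """Iteratively search, collect findings, and refine the query,
    then structure the accumulated findings into a report."""
    findings, query = [], question
    for round_no in range(max_rounds):
        results = search(query)           # run a web search for this round
        findings.extend(results)          # accumulate sources across rounds
        query = f"{question} (refined after round {round_no + 1})"
    return {
        "question": question,
        "sources": len(findings),
        "report": summarize(findings),
    }

# Stub tools for illustration.
fake_search = lambda q: [f"result for: {q}"]
fake_summarize = lambda xs: " | ".join(xs)

report = deep_research("Is Grok-3 really the smartest AI?", fake_search, fake_summarize)
print(report["sources"])  # → 3
```

Real systems also rank sources, cross-check claims, and decide when to stop searching, but the refine-search-accumulate loop is the core pattern.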

With AI increasingly handling research-heavy tasks, tools like Perplexity’s Deep Research show how automation is reshaping the way we find and process information. (source)

As per a report by Bloomberg, Meta is forming a robotics division within Reality Labs to develop humanoid robots capable of assisting with physical tasks, including household chores.

Key Details:

  • Led by Marc Whitten - Former CEO of Cruise, with experience at Amazon, Microsoft, and Sonos.

  • Focus on Robotics Hardware & AI - Developing software and hardware for the broader robotics industry.

  • Not a Meta-Branded Robot (Yet) - Meta aims to build foundational tech for the robotics market, much like Android did for smartphones.

  • Potential Partnerships - Meta is in talks with Unitree Robotics and Figure AI for prototype collaborations. (source)

Anthropic is preparing to release its next major AI model within weeks, according to The Information. Described as a hybrid system, the model can switch between deep reasoning and fast responses, with a sliding scale to help developers balance costs. 

It reportedly outperforms OpenAI’s o3-mini-high on some programming tasks and excels at analyzing large codebases and business benchmarks. Anthropic CEO Dario Amodei hinted at the launch, emphasizing the company’s focus on better-integrated reasoning models rather than separating them from standard AI models. (source)

Sam Altman’s latest X post hints at a major leap in AI capabilities with GPT-4.5. He describes testing it as a "feel the AGI" moment, suggesting that even AI experts and discerning users are experiencing a notable shift toward more advanced intelligence. This fuels speculation that GPT-4.5 could be a major step closer to AGI, bringing improvements in reasoning, adaptability, and overall performance. (source)

Meta’s FAIR lab and the Basque Center on Cognition, Brain and Language (BCBL) have achieved a major milestone in AI-driven brain research. Their new studies demonstrate how AI can decode sentences from non-invasive brain recordings, accurately reconstructing up to 80% of characters from brain activity.

Key Research Findings

  • Non-Invasive Language Decoding - AI models trained on MEG and EEG data can predict sentences as participants type, offering a potential path for brain-computer interfaces to aid people with communication impairments.

  • Understanding Language Formation - AI analysis of MEG signals reveals how the brain transforms thoughts into words, uncovering a dynamic neural code that chains successive representations over time.

  • Clinical Potential & Challenges - While promising, challenges remain, including improving accuracy, practical limitations of MEG, and application to patients with brain injuries. (source)

Writesonic: This AI-powered SEO and marketing assistant connects with tools like Ahrefs, Search Console, and HubSpot to help with competitor analysis, content creation, and workflow automation - offering a more efficient way to manage marketing tasks.

How to Access:

Step 1: Sign up with your email.

Step 2: Choose your area of work (e.g., Content Marketing, SEO, or Lead Generation).

Step 3: Select the specific tasks you need help with, such as content strategy, keyword analysis, or campaign optimization.

Step 4: The AI agent processes your inputs, provides insights, and automates certain tasks based on your selection.

  • The Ultra-Scale Playbook, developed by nanotron and hosted on Hugging Face, serves as a comprehensive guide for training large language models (LLMs) on extensive GPU clusters. It offers in-depth insights into the architecture and scaling of LLMs, including detailed discussions on memory management, compute requirements, and distributed training strategies. The playbook also provides practical resources, such as datasets and code examples, to facilitate efficient model training and deployment.

  • Last week, we launched more free AI & ML courses to help you build smarter AI systems. These free courses offer practical, hands-on insights into the latest AI tools and techniques:

    • xAI Grok 3: Smartest AI on Earth - With this course, gain insights into xAI Grok 3’s evolution, key features, and performance through comparative analysis and benchmarks.

    • Introduction to Transformers and Attention Mechanisms - Learn the fundamentals of attention mechanisms in transformers. Explore RNNs, Seq2Seq models, and pre-trained transformers like BERT & T5 through hands-on projects and real-world NLP applications.
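One core idea the Ultra-Scale Playbook mentioned above covers — data-parallel training, where each GPU holds a model copy, computes gradients on its own data shard, and gradients are averaged (all-reduced) before a synchronized update — can be sketched without any GPU at all. This is a conceptual illustration, not code from the playbook:

```python
def data_parallel_step(shards, grad_fn, params, lr=0.1):
    """One data-parallel step: each 'worker' computes gradients on its
    shard, gradients are averaged (the all-reduce), then params update."""
    grads = [grad_fn(params, shard) for shard in shards]  # per-worker backward pass
    avg = [sum(g[i] for g in grads) / len(grads) for i in range(len(params))]
    return [p - lr * g for p, g in zip(params, avg)]      # synchronized update

# Toy problem: minimize (p - x)^2 over the data; grad = 2(p - mean of shard).
grad_fn = lambda params, shard: [2 * (params[0] - sum(shard) / len(shard))]
shards = [[1.0, 3.0], [5.0, 7.0]]   # two "GPUs", each with its own data shard
params = [0.0]
for _ in range(50):
    params = data_parallel_step(shards, grad_fn, params)
print(round(params[0], 2))  # converges toward the global data mean, 4.0
```

At real scale the averaging step becomes a bandwidth-bound collective across thousands of GPUs, which is why the playbook spends so much time on memory management and communication strategy.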

Curious to know your take on “vibe coding.” Have you built anything recently using AI tools? A website, an app or a side project?

What do you think - game-changer or just another trend? Drop your thoughts!

Until next week,
