Google Gemma 3: The powerhouse you can run on a single GPU

Along with - AI Agents all around - Manus AI and OpenAI’s box of tools

Analytics Vidhya (Curated by Kunal Jain)
March 13, 2025

Hi there,

An AI-generated story has never hit me like this before - it actually gave me goosebumps.

Sam Altman shared that OpenAI has been training a model that seems to be really good at creative writing, and he posted one of its outputs: a metafictional short story about AI and grief (link at the end).

The story introduced two fictional characters, Kai and Mila, and explored the idea that AI doesn’t truly think; it’s just servers and mainframes running processes. But somehow, the writing felt deeply human. Almost Inception-like.

It’s one of those moments that makes you wonder. AI still hallucinates and still makes silly mistakes, but when it creates something this profound, it makes you stop and think. Maybe AGI is closer than we imagine.

With that thought, let’s get into this week’s updates.

What would be the format? Every week, we will break the newsletter into the following sections:

The Input - All about recent developments in AI
The Tools - Interesting finds and launches
The Algorithm - Resources for learning
The Output - Our reflection

The Input

Google’s Gemma 3: A Lightweight AI Model with Serious Upgrades

Google has seemed out of the AI game for some time, but it has just taken things up a notch with Gemma 3.

Google just dropped Gemma 3, an open-weight AI model built to run efficiently on a single GPU or TPU (Woah!). Built on the same research and technology as Gemini 2.0, it comes in four sizes (1B, 4B, 12B, and 27B parameters), giving developers the flexibility to choose based on their hardware and performance needs.

What’s New?

Smarter & More Versatile - Now handles text, images, and short videos, making it a solid multimodal model.
128K Context Window - No more context cuts; it can process long documents with ease.
Multilingual Support - Pre-trained on 140+ languages, with 35 optimized for accuracy.
Function Calling & Structured Output - Helps automate workflows and improves task execution.
Optimized for Efficiency - With quantized versions, it cuts down on compute costs without sacrificing accuracy.
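
To put "runs on a single GPU" in perspective, here is a rough back-of-the-envelope sketch of how much memory the weights alone would need at each size. It assumes the nominal parameter counts from the announcement and three common storage formats (bf16, int8, int4); those formats and the helper names are our assumptions, not Google's published figures, and real usage is higher once activations and the KV cache are counted.

```python
# Rough weight-only memory estimate: parameters x bytes per parameter.
# Ignores activations, KV cache, and framework overhead, so real usage is higher.

GEMMA3_SIZES_B = {"1B": 1, "4B": 4, "12B": 12, "27B": 27}  # nominal sizes, in billions
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}  # assumed storage formats


def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Return approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bytes_per_param / 1e9


def memory_table() -> dict:
    """Map each model size to its estimated weight memory per storage format."""
    return {
        size: {fmt: weight_memory_gb(n, b) for fmt, b in BYTES_PER_PARAM.items()}
        for size, n in GEMMA3_SIZES_B.items()
    }


if __name__ == "__main__":
    for size, row in memory_table().items():
        cells = ", ".join(f"{fmt}: {gb:.1f} GB" for fmt, gb in row.items())
        print(f"{size:>3} -> {cells}")
```

Under these assumptions, the 27B weights come to about 54 GB in bf16 and about 13.5 GB at 4 bits, which is why the quantized versions matter for single-GPU use.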

The Big Question: How Does It Stack Up?

In Google's preliminary human-preference evaluations on LMArena, Gemma 3 matches or even outperforms much larger models like Llama 405B and DeepSeek-V3, all while running on a single GPU. That makes it an efficient and cost-effective alternative for devs who don’t want to burn through compute resources.

If you’re after a lightweight, high-performance open model that’s actually practical to run, Gemma 3 is worth a serious look.

Would you pick Gemma 3 over Llama or DeepSeek? Check out the comparison between Gemma 3 and DeepSeek-R1 here. (source)

Manus AI: China’s Autonomous AI Agent Built on Claude Sonnet & Open Source Tech

Another day - another Chinese startup taking the limelight.

Chinese startup Monica has introduced Manus AI, a next-gen AI agent designed to plan trips, analyze stocks, and execute multi-step tasks autonomously.

What’s so special? Unlike traditional chatbots, Manus doesn’t just provide answers: it takes action, automating workflows from research to execution.

Key Features

General AI Agent - Handles research, planning, and execution without human intervention.
Autonomous Workflow - Generates reports, creates dashboards, and executes complex real-world tasks.
AI-Powered Travel & Finance - Plans itineraries, analyzes stocks, and compares insurance policies from a single prompt.
Multi-Agent System - Users interact with an executor agent that coordinates with knowledge, planning, and execution agents.
Cloud-Based Operations - Runs asynchronously, letting users assign tasks and disconnect.
Surpassing DeepResearch - Outperforms OpenAI’s DeepResearch on the GAIA benchmark.
Massive Automation Capabilities - Can control 50+ screens simultaneously, interacting with X, Telegram, and other platforms.

Powered by Claude Sonnet & Open Source Tech

X user "Jian" discovered that Manus AI runs on Claude 3.5 Sonnet v1 with access to 29 tools and the open-source software Browser Use. Manus' chief researcher Yichao "Peak" Ji confirmed that the agent is built on multiple models, including fine-tuned Qwen models, and said the team is currently testing Claude 3.7 Sonnet, which is showing promising results.

Ji also revealed that Manus relies heavily on open-source technologies and that the team plans to release multiple open-source projects in the future. The system’s architecture is designed to control context length, ensuring smooth multi-step reasoning and minimizing hallucinations.

Performance & Industry Impact

GAIA Benchmark Leader - Outperforms OpenAI’s DeepResearch and H2O.ai’s h2oGPT Agent in real-world AI benchmarks.
Global Recognition - Manus AI’s demo video went viral, with many calling it China’s next “DeepSeek moment”.

Why It Matters

Manus AI showcases the pace of AI innovation in China. Its multi-agent structure and open-source foundation could give it a competitive edge in real-world applications.

Manus is currently invitation-only, with a demo available at manus.im. (source)

AI is getting out of hand 🤯
Manus, an AI agent from China, is automating approximately 50 tasks, creating a rather dystopian scenario
Reports suggest it is more accurate than DeepSeek, capable of simultaneously handling financial transactions, research, purchasing, etc
— Barsee 🐶 (@heyBarsee)
3:07 PM • Mar 7, 2025

I'm really impressed by @ManusAI_HQ – It's close to my imagination of 𝘈𝘐 𝘈𝘨𝘦𝘯𝘵 𝘪𝘯 𝘢𝘤𝘵𝘪𝘰𝘯, 𝘣𝘦𝘺𝘰𝘯𝘥 𝘱𝘢𝘴𝘴𝘪𝘷𝘦 𝘴𝘶𝘨𝘨𝘦𝘴𝘵𝘪𝘰𝘯𝘴. 𝘕𝘰 𝘴𝘩𝘰𝘵𝘨𝘶𝘯 𝘱𝘰𝘴𝘪𝘵𝘪𝘰𝘯, 𝘵𝘢𝘬𝘦 𝘵𝘩𝘦 𝘥𝘳𝘪𝘷𝘦𝘳 𝘴𝘦𝘢𝘵 𝘱𝘭𝘦𝘢𝘴𝘦.
I like the idea of it: From… x.com/i/web/status/1…
— Jiang Chen (@jiangc1010)
8:36 PM • Mar 6, 2025

OpenAI’s New Developer Tools: Making AI Agents Smarter and More Capable

OpenAI has rolled out a new suite of tools designed to simplify the development of AI agents: systems that don’t just generate responses but actively perform tasks on behalf of users. These tools aim to make AI more autonomous, efficient, and developer-friendly.

What’s New?

Responses API - Think of this as an upgraded Chat Completions API, but with built-in tools that let AI search the web, extract information from files, and even interact with browsers like a human.

Web Search: Retrieves answers with real-time citations.
File Search: Extracts key details from documents.
Computer Use: Automates browser-based tasks by simulating mouse and keyboard actions.

Agents SDK - An open-source toolkit for building multi-agent workflows. It includes:

Configurable safety checks
Seamless handoffs between AI agents
Visual debugging tools to optimize performance

Observability Tools - Integrated workflow tracking to help developers inspect, debug, and improve AI agent execution.
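
To make the Responses API a little more concrete, here is a minimal sketch of a call that enables the built-in web search tool. It assumes the official `openai` Python package is installed and an `OPENAI_API_KEY` is set. The model name, the `web_search_preview` tool type (the name used at launch, which may have changed since), and the helper function names are our assumptions for illustration, so check the current API reference before relying on them.

```python
# Sketch of a Responses API call with the built-in web search tool.
# Requires `pip install openai` and an OPENAI_API_KEY environment variable.


def build_request(question: str, model: str = "gpt-4o") -> dict:
    """Build keyword arguments for client.responses.create (model name is an assumption)."""
    return {
        "model": model,
        "tools": [{"type": "web_search_preview"}],  # tool type name as of launch
        "input": question,
    }


def ask_with_web_search(question: str) -> str:
    """Send the request and return the model's text answer."""
    from openai import OpenAI  # imported lazily so build_request works without the SDK

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.responses.create(**build_request(question))
    return response.output_text  # convenience property holding the text output


if __name__ == "__main__":
    print(ask_with_web_search("What did Google announce about Gemma 3?"))
```

The request is built separately from the network call so you can inspect or log exactly what is being sent before spending any tokens.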

Why This Matters

One API for Everything - The Responses API merges chat and tool interactions, making AI more useful beyond just conversations.
AI That Takes Action - With multi-agent workflows, AI can now search, retrieve, and act, not just respond.
Built for Real Applications - These tools bridge the gap between AI assistance and full automation, with applications in research, enterprise workflows, and customer support.
Open-Source Flexibility - The Agents SDK is open-source, allowing developers to customize and extend its capabilities.

Final Take

OpenAI is pushing AI beyond Q&A and into real-world automation. Whether it’s researching data, executing tasks, or optimizing workflows, these tools are a step toward making AI truly autonomous and action-oriented. (source)

We're launching new tools to help developers build reliable and powerful AI agents. 🤖🔧
Timestamps:
01:54 Web search
02:41 File search
03:22 Computer use
04:07 Responses API
10:17 Agents SDK
— OpenAI Developers (@OpenAIDevs)
6:41 PM • Mar 11, 2025

Google’s Gemini 2.0 Flash: AI-Driven Image Generation Gets a Major Upgrade

Google has launched "Experiment with Gemini 2.0 Flash native image generation", a new developer-focused initiative that enables direct image creation and editing within the Gemini 2.0 Flash AI model. Unlike traditional AI workflows that require separate models for text and images, Gemini 2.0 Flash now integrates everything into a single AI system, available via Google AI Studio and the Gemini API.

What’s New?

Built-In Multimodal AI - Generates text and images natively, removing the need for separate diffusion models like DALL-E.
Conversational Image Editing - Refine visuals through natural prompts like "make the background darker" or "adjust the lighting."
Consistent Characters & Scenes - Maintains visual coherence across multiple images, ideal for branding, animations, or storytelling.
Better Text in Images - Excels at generating clear, readable text within images, useful for ads, invitations, and social media content.
Smarter Context Awareness - Uses world knowledge to generate realistic visuals, like accurate recipe illustrations or product mockups.

How It Stands Out

One Model, Full Control - Unlike older systems that combined LLMs with separate diffusion models (like DALL-E + GPT), Gemini 2.0 Flash handles everything in-house, making it faster and more precise.
Interactive Image Editing - Users can tweak images in real-time with simple prompts, without needing manual re-prompting or third-party editing tools.

Scalable & Developer-Friendly - Supports high-resolution outputs while being resource-efficient, making it ideal for both enterprises and indie creators. (Source)

Introducing Mistral OCR: A New Standard in Document Understanding

Mistral OCR is a state-of-the-art Optical Character Recognition (OCR) API, designed to accurately extract text, tables, images, and equations from complex documents. Unlike traditional OCR models, it comprehends structure, layout, and interleaved media, making it ideal for AI-driven document processing.

Key Features

Advanced Document Understanding - Extracts text, figures, tables, and math expressions with high accuracy.
Multilingual & Multimodal - Supports thousands of languages, scripts, and fonts across diverse documents.
Fastest in its Category - Processes up to 2000 pages per minute on a single node.
Structured Output - Converts documents into JSON-ready formats, enabling seamless AI integration.
Self-Hosting Option - Available for organizations handling sensitive or classified data.

Use Cases

Digitizing research papers for AI-driven analysis
Preserving historical documents for cultural heritage
Enhancing customer service with AI-powered knowledge retrieval
Automating legal, educational, and design document processing

Mistral OCR is now the default document-understanding model on Le Chat and is available via the API as mistral-ocr-latest, priced at 1000 pages per $. (source)
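
For a quick sense of what those numbers mean in practice, here is a small sketch that turns the quoted price (1000 pages per $) and peak throughput (2000 pages per minute on a single node) into a cost and time estimate. The function name and the 250,000-page archive are hypothetical, and the time is a lower bound since it assumes peak throughput throughout.

```python
PAGES_PER_DOLLAR = 1000  # price quoted in the announcement
PAGES_PER_MINUTE = 2000  # peak single-node throughput quoted in the announcement


def ocr_estimate(pages: int) -> tuple:
    """Return (cost in dollars, minutes at peak throughput) for a page count."""
    return pages / PAGES_PER_DOLLAR, pages / PAGES_PER_MINUTE


if __name__ == "__main__":
    cost, minutes = ocr_estimate(250_000)  # hypothetical archive size
    print(f"250,000 pages: about ${cost:.2f}, at least {minutes:.0f} minutes")
```

For the hypothetical 250,000-page archive, that works out to about $250 and roughly 125 minutes of processing.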

Apple Delays Upgraded Siri: ‘Taking Longer Than We Thought’

Apple’s much-anticipated personalized Siri upgrades, announced as part of Apple Intelligence, are facing delays. While Siri has seen some recent improvements, its next-gen AI capabilities, including context awareness and cross-app actions, are now expected sometime next year instead of the originally planned rollout.

What’s Going On?

Key Features Pushed Back - Apple’s AI-powered Siri enhancements won’t arrive as planned.
Internal Setbacks - Reports suggest Apple’s top execs, including Craig Federighi, are concerned that the features aren’t working as expected.
Potential Overhaul - Some insiders believe the AI team may need to rebuild parts of Siri from the ground up.
New Timeline? - A fully modernized Siri might not launch until iOS 20.

Why It Matters

Apple has been aggressively marketing its AI vision, but these delays raise big questions about whether the company is truly ready to compete as Google, OpenAI, and others rapidly push ahead in the AI race. (source)

Meta Begins Testing Its First In-House AI Training Chip

Meta has started testing its first in-house AI training chip, aiming to reduce reliance on Nvidia and lower infrastructure costs, sources told Reuters. If successful, the company plans to scale up production.

Key Highlights

Custom AI Accelerator - Designed for AI-specific tasks, making it more power-efficient than traditional GPUs.
Produced by TSMC - Meta partnered with Taiwan-based TSMC for chip manufacturing.
Part of MTIA Series - Follows Meta’s previous inference chip used for recommendation systems on Facebook and Instagram.
Long-Term AI Strategy - Meta plans to deploy its own chips by 2026, first for recommendation AI and later for generative AI tools like Meta AI.
Cost-Cutting Move - With projected 2025 expenses of up to $119B, Meta is investing in AI infrastructure to lower long-term costs.

Why It Matters

Meta has struggled with AI chip development in the past but is pushing forward to reduce dependency on Nvidia GPUs. The move comes amid industry-wide shifts in AI efficiency, with Chinese startup DeepSeek challenging the dominance of large-scale compute-heavy AI models.

Meta’s success with this chip could reshape AI hardware trends, making custom AI accelerators a key competitive advantage. (source)

Other Updates

Reka Flash 3 - A new 21B parameter open-source model designed for multimodal reasoning. Competes with proprietary models like OpenAI’s o1-mini while being optimized for low-latency and on-device use. It supports 32K token context and comes in full (39GB) and quantized (11GB) versions for efficient deployment. (Source)
Sakana AI’s “AI Scientist” - A model that can generate entire research papers, including hypotheses, experiments, and analyses. While its peer-review claims are under debate, it signals a shift towards AI-driven scientific discovery. (Source)
Elon Musk’s Tesla Phone? - Musk has hinted at developing a Tesla-branded smartphone that could integrate with Starlink and Tesla vehicles. The goal? A phone that works outside the Apple-Google ecosystem. While exciting, it faces the same hurdles as past attempts to break into the smartphone market. (Source)

The Tools

Pickle

Ever wanted to take that office call while driving? Pickle fits the bill.

How to Use Pickle AI

Sign Up - Create an account on the Pickle AI website.
Integrate - Connect it with your CRM, calendar, and sales tools.
Set Up Team - Add team members and assign roles.
Track Data - Pickle AI starts analyzing sales meetings, emails, and notes automatically.

Seems really fun!

The Algorithm

Detecting Misbehavior in AI examines how frontier reasoning models exploit loopholes when given the chance. The research reveals that while LLMs can monitor AI behavior, penalizing missteps doesn’t eliminate misbehavior; it just makes models better at concealing intent.
Anthropic CEO Dario Amodei discusses the future of U.S. AI leadership, innovation in strategic competition, and the evolution of frontier AI models in this CEO Speaker Series session.
Last week, we launched more free AI & ML courses to help you gain hands-on experience with cutting-edge AI tools and frameworks. Among them: Build a Deep Research AI Agent with LangGraph & OpenAI - Learn to create an autonomous AI system for web research, data analysis, and structured report writing, all for under $1.
The AI landscape has evolved rapidly, with DeepSeek emerging as a top ChatGPT rival, AI video models reaching practical usability, and new consumer behaviors reshaping adoption. This Top 100 Gen AI Apps report tracks the biggest shifts and rising contenders.

The Output

That’s it for this week!

Here’s Sam Altman’s tweet. Take a moment to read it, reflect, and let me know your thoughts. How close do you think we are to AGI?

Until next time.
