💰Cloudflare Makes AI Pay to Scrape

Along with: Xiaomi’s AI Glasses Are Here to Flex

Hey there 👋

The bots are eating the internet, and for the first time, the internet is starting to send them a bill. This week, Cloudflare launched "Pay per Crawl", a new system (in private beta) that flips the script on AI data scraping. Instead of letting bots quietly slurp up your content for free, site owners can now charge per crawl or block AI access entirely. Think of it as a firewall for large language models.

Why now? Because AI companies are taking way more than they give. Google crawls your site about 14 times for every visitor it sends. OpenAI? Around 1,700 times. Anthropic? A staggering 73,000 times. Cloudflare’s message is simple: if your content is helping train billion-dollar models, you should have control over it, and maybe even get paid.

What’s the format? Every week, we break the newsletter into the following sections:

  • The Input - All about recent developments in AI

  • The Algorithm - Resources for learning

  • The Output - Our reflection


Google just dropped Gemma 3n, and it’s a big step forward for AI that doesn’t live in the cloud. This model runs directly on your device, no server, no connection, no problem. It handles text, images, audio, and video as input, and somehow does it all on just 2–3GB of RAM. Seriously.

What’s New:

  • Multimodal-native: Supports image, audio, video, and text inputs, with text outputs. Built to handle complex input types directly on-device.

  • Two efficient sizes: E2B (5B params, ~2GB RAM) and E4B (8B params, ~3GB RAM), but thanks to architectural tricks, they run like 2B/4B-class models.

  • MatFormer architecture: Think of it like nesting dolls - the E4B model includes a fully functional E2B inside it. You can even slice models into custom sizes to fit your device’s memory.

  • Per-Layer Embeddings (PLE): A new embedding trick that offloads the per-layer embedding parameters to the CPU, so less GPU memory (VRAM) is needed to run the model - a big win for edge deployment.

  • KV Cache Sharing: Speeds up long-sequence inputs by 2×. Useful for streaming video/audio tasks where response time matters.

  • MobileNet-V5 vision encoder: New 300M-param model optimized for edge. 13× faster than SoViT on Pixel TPU, with higher accuracy and 4× lower memory use.

  • Audio understanding built-in: ASR and AST support via a USM-based encoder (~6 audio tokens per second). 30-second clips at launch, streaming coming soon.

  • Multilingual + benchmark leader: Understands 140 languages, handles multimodal input in 35. E4B crosses 1300 on LMArena - first <10B param model to do it.

  • Get Started with Gemma 3n: Try it instantly on Google AI Studio (runs in-browser)

Models like Phi-3, Qwen2, and Mistral have made waves in the small-model space, but most focus on text-only tasks and still rely heavily on GPU or server-side power. Gemma 3n stands out by going multimodal, CPU-friendly, and fully offline, all while keeping its size small enough to run on real-world devices. (source)
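If you want to poke at Gemma 3n beyond AI Studio, a local run can be just a few lines. The sketch below is a minimal example, assuming the Hugging Face model id shown, a transformers release recent enough to include Gemma 3n, and a placeholder image URL - swap in your own.

```python
# Minimal sketch: trying Gemma 3n locally via Hugging Face transformers.
# Assumptions: the model id below is the E2B instruction-tuned checkpoint,
# your installed transformers version supports Gemma 3n, and the image URL
# is just a placeholder.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",            # multimodal in, text out
    model="google/gemma-3n-E2B-it",  # assumed id for the ~2 GB "E2B" variant
    device_map="auto",               # uses a GPU if present, otherwise CPU
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/receipt.jpg"},  # placeholder
        {"type": "text", "text": "What is the total amount on this receipt?"},
    ],
}]

out = pipe(text=messages, max_new_tokens=64)
print(out[0]["generated_text"])  # holds the model's reply
```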

Xiaomi just launched a sleek pair of AI-powered smart glasses aimed straight at Meta’s Ray-Ban line - and on paper, they’re winning. Announced at the Human x Car x Home event in Beijing, these glasses bring better battery, smarter AI, and flashier features, all starting at just $279.

What’s New:

  • Built-in AI assistant + camera: 12MP ultra-wide lens lets you take photos, record POV videos, ID objects, and translate text - all hands-free.

  • Real-time translation + pay-by-glance: Integrated with Alipay. Scan QR codes, confirm purchases by voice.

  • 8.6-hour battery life: 2× longer than Ray-Ban Meta, powered by a 263mAh cell and Qualcomm’s Snapdragon AR1 platform.

  • Stereo audio + voice control: 5 mics, open-ear speakers, and voice commands baked in.

  • Electrochromic lenses: Color-shifting tint changes on tap in just 0.2s. Optional upgrade.

  • Lightweight + tuned fit: 40g frame with adjustable arms built for Asian facial features.

  • Splash-resistant (IP54): Ready for real-world use - rain, sweat, and daily wear.

  • Pricing: Starts at $279. Monochrome tint model is $375, color version hits $420.

  • Frame styles: Classic Black, Parrot Green, Translucent Tortoiseshell Brown.

More battery, more features, more style - Xiaomi’s AI Glasses might not have Meta’s brand heat, but on specs? They’re already ahead. (source)

Google Labs just dropped Doppl, an experimental app that turns any outfit photo into a virtual try-on. Snap it, upload it, and see how it looks on a digital, animated version of you - complete with AI-generated motion to show how the clothes feel in action.

What’s New:

  • Virtual try-on made easy: Upload outfit photos or screenshots - Doppl shows how it’d look on you.

  • AI-generated videos: See clothing in motion, not just still images.

  • Style from anywhere: Try looks you spot on friends, social, or in-store - just take a pic.

  • Save + share: Store your favorite looks or post them to get feedback.

  • Experimental by design: Powered by Google Shopping’s tech, now enhanced with new Labs features.

  • Available now: Free on iOS and Android (U.S. only for now).

It’s early, so expect some rough edges - but Doppl makes style exploration fun, fast, and kind of futuristic. (source)

Baidu just open-sourced ERNIE 4.5, a family of large-scale multimodal models built on a novel heterogeneous Mixture-of-Experts (MoE) design - with variants ranging from 0.3B to a massive 424B total parameters. It’s trained on both text and vision, optimized for real-world deployment, and hits state-of-the-art scores across language, reasoning, and multimodal benchmarks.

What’s New:

  • Model lineup: 10 variants, from a 0.3B dense model up to MoE models with 3B or 47B active params; the largest totals 424B params.

  • Multimodal heterogeneous MoE: New architecture shares parameters across modalities but allows dedicated ones too - boosting cross-modal performance without sacrificing text quality.

  • State-of-the-art scores: Beats DeepSeek-V3-671B on 22 of 28 benchmarks. The 21B-A3B model outperforms Qwen3-30B with 30% fewer params.

  • Vision-Language models: ERNIE-4.5-VL supports both thinking and non-thinking modes - performs strongly on benchmarks like MathVista, MMMU, CV-Bench, RealWorldQA.

  • Efficient training stack: Built on PaddlePaddle with FP8, hybrid parallelism, MoE-aware load balancing, and 4/2-bit quantization support.

  • Deployment ready: Comes with ERNIEKit (for fine-tuning: SFT, LoRA, DPO, QAT, PTQ) and FastDeploy (multi-hardware, OpenAI/vLLM-compatible, speculative decoding, quantized inference) - see the client sketch below.

  • PyTorch support: Models also available in PyTorch format for broader developer access.

  • Fully open-source: Released under the Apache 2.0 License for research and commercial use.

Available now via ERNIEKit | FastDeploy (source)
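Because FastDeploy exposes an OpenAI/vLLM-compatible server, querying a deployed ERNIE 4.5 model looks like any other chat-completions call. A rough sketch, assuming a local FastDeploy server; the base URL, port, and model name are placeholders for whatever your deployment reports.

```python
# Hedged sketch: chatting with an ERNIE 4.5 model behind FastDeploy's
# OpenAI-compatible endpoint. The base_url, port, and model name below are
# placeholders, not documented defaults.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8180/v1",  # placeholder local server address
    api_key="not-needed-locally",         # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="ERNIE-4.5-21B-A3B",  # placeholder model name
    messages=[
        {"role": "user", "content": "Explain Mixture-of-Experts in one short paragraph."}
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```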

Tencent just released Hunyuan-A13B, a fine-grained Mixture-of-Experts model with 80B total / 13B active parameters, rivaling o1 and DeepSeek in benchmark performance - but with much lower compute overhead. It's optimized for long-context tasks, fast/slow reasoning, and agentic tool use.

What’s New:

  • 80B → 13B MoE setup: Big-model quality with small-model efficiency.

  • Hybrid reasoning: Supports both fast and slow thinking modes.

  • Ultra-long context: Native 256K context window for stable long-text processing.

  • Agent optimized: Excels at tool-calling, with strong results on BFCL-v3, τ-Bench, and C3-Bench.

  • Efficient inference: Uses GQA, supports quantization (FP8, GPTQ-Int4).

  • Open-source datasets:

    • ArtifactsBench - visual + interactive code evaluation

    • C3-Bench - stress-tests agents, interpretability focus

Available now on Hugging Face | GitHub | API Address
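If you want to kick the tires on the Hugging Face release, a basic chat turn is a few lines of transformers. This is a sketch only: the model id and the trust_remote_code flag are assumptions, and 13B active params still wants a serious GPU (or a quantized build).

```python
# Hedged sketch: loading Hunyuan-A13B from Hugging Face for a single chat turn.
# Assumptions: the model id, that the repo ships its own modeling code
# (hence trust_remote_code=True), and enough GPU memory for the checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tencent/Hunyuan-A13B-Instruct"  # assumed model id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", trust_remote_code=True
)

messages = [{"role": "user", "content": "Outline a three-step plan to test a tool-calling agent."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```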

Alibaba just unveiled Qwen VLo, a new multimodal model that doesn’t just perceive images - it edits and generates them with precision. Available now as a preview in Qwen Chat, you can prompt it to generate or edit visuals using simple natural language (in English or Chinese).

What’s New:

  • Understand + generate: Upload an image (“a cat”), ask for changes (“add a cap”), or start from scratch (“draw a cute cat”).

  • Progressive generation: Creates visuals step-by-step - smoother, sharper, and more controllable results.

  • Instruction-based editing: Works with creative commands like “make this look 19th century” or “add a sunny sky.” Supports object edits, background swaps, and even segmentation tasks.

  • High semantic accuracy: Maintains structure, detail, and realism in edits - no weird distortions or misreads.

  • Multilingual support: Use Chinese, English, or both - it understands and responds fluently.

From photo touch-ups to artistic remixes, Qwen VLo turns ideas into polished visuals - and it’s just getting started. (source)

Cloudflare just launched Pay per Crawl, a new marketplace (now in private beta) that lets website owners charge AI bots for scraping content - or block them entirely. It’s an early attempt to give publishers real control and compensation in the AI era.

What’s New:

  • Micropayments for crawls: Site owners can set per-crawl rates for AI bots or block them outright.

  • Transparency tools: See which AI crawlers are accessing your site, and why (training, search, etc.).

  • Default block for new sites: All new Cloudflare domains block AI crawlers unless explicitly allowed.

  • Backed by major publishers: TIME, AP, The Atlantic, Condé Nast, and others have signed on.

  • Built for the agentic web: Envisions a future where AI agents pay on your behalf to access premium web content - programmatically and at scale.

  • No crypto (yet): Runs on fiat payments for now, but Cloudflare is exploring its own stablecoin for future transactions.

  • Cloudflare says bots are scraping far more than they give back:

    • Google: 14 scrapes per referral

    • OpenAI: 1,700 per referral

    • Anthropic: 73,000 per referral

Pay per Crawl may not solve everything, but it could reshape how publishers monetize in an AI-dominated web. (source)
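For crawler operators, the interesting question is what a paid-crawl response looks like in practice. Cloudflare hasn’t published wire-level details we can vouch for here, so the sketch below is hypothetical: it assumes a 402 Payment Required response and a made-up "crawler-price" header, and simply decides whether a page is worth the asking price.

```python
# Hypothetical sketch of a budget-aware crawler meeting a pay-per-crawl site.
# The 402 status is a natural fit, but the "crawler-price" header and the
# payment flow are assumptions, not Cloudflare's documented protocol.
import requests

MAX_PRICE_USD = 0.01  # the most this crawler will pay for a single page

def fetch(url: str) -> str | None:
    resp = requests.get(url, headers={"User-Agent": "ExampleBot/1.0"}, timeout=10)

    if resp.status_code == 402:  # Payment Required
        price = float(resp.headers.get("crawler-price", "inf"))  # hypothetical header
        if price <= MAX_PRICE_USD:
            # A real integration would authenticate as a registered crawler,
            # settle the charge, then retry the request with proof of payment.
            print(f"Would pay ${price:.4f} to crawl {url}")
        else:
            print(f"Skipping {url}: ${price} is over budget")
        return None

    if resp.status_code == 403:  # outright blocked
        print(f"Blocked from crawling {url}")
        return None

    return resp.text

html = fetch("https://example.com/article")
```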

There are now dozens of voice generation models across open-source and commercial ecosystems. From real-time multilingual speech to expressive voice cloning, tools like Bark, Tortoise, and OpenVoice are pushing boundaries. ElevenLabs just dropped Voice Design v3, letting you generate custom AI voices from pure text prompts - with control over tone, age, accent, pacing, and delivery. It’s live now on ElevenLabs, free to start.

What’s New:

  • Prompt-to-voice in seconds: Describe any voice - “a calm, husky warrior with a Japanese accent” - and hear it instantly.

  • Full control: Adjust prosody, emotion, tone, pacing, gender, and delivery style.

  • Global coverage: Supports 70+ languages with hundreds of localized accents.

  • Production-ready audio: High-quality output, compatible with v3 expressive tags.

  • API (coming soon): Endpoints for voice preview + saving to your library.

  • Voice Prompting Guide: Tips, samples, and best practices to fine-tune your results.
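The Voice Design endpoints themselves are still listed as coming soon, but a voice you design and save in the UI can already be used through ElevenLabs’ existing text-to-speech REST endpoint. A small sketch; the voice_id and model_id values are placeholders you’d swap for your own.

```python
# Sketch: speaking a line with a voice designed in Voice Design v3, via the
# standard ElevenLabs text-to-speech endpoint. voice_id and model_id are
# placeholders; set ELEVENLABS_API_KEY in your environment first.
import os
import requests

VOICE_ID = "your-designed-voice-id"  # placeholder: a voice saved to your library
url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"

resp = requests.post(
    url,
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    json={
        "text": "A calm, husky warrior greets you at the gate.",
        "model_id": "eleven_multilingual_v2",  # placeholder model id
    },
    timeout=60,
)
resp.raise_for_status()

with open("line.mp3", "wb") as f:
    f.write(resp.content)  # the endpoint returns raw audio bytes
```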

The Algorithm

  • Vibe Coding with Replit: Just getting started with code? We made a free course that walks you through real projects right inside Replit. No setup, no stress. It’s super beginner-friendly and perfect if you want to dip your toes into building with AI. Try it here

  • Analyzing Data with Power BI: If you’ve been meaning to learn Power BI but didn’t know where to start, this one’s for you. We’ll show you how to clean, explore, and visualize data step by step. No prior experience needed. Check it out

The Output

This week, AI stopped being just code and started becoming something you can wear, hear, and feel.

Whether it’s Xiaomi’s glasses translating the world in real time, Gemma 3n turning your phone into a frontier-class model hub, or Doppl helping you try on outfits from a screenshot, the message is clear: AI is stepping out of the browser and into your life, quietly embedding itself in the things you use, wear, and touch. It’s no longer just a backend tool; it’s a front-row feature.

Until next time 👋
