
Could Apple's ReALM revolutionize how we use Siri?

Along with: Stable Audio 2.0 Turns Your Text Into Chart-Toppers!

Hey there, 

Do you think we use the word ‘Delve’ a lot in our newsletters?

In a post on X, Jeremy Nguyen shared a chart of how often the word “delve” (one of ChatGPT’s favorite words) appears in PubMed papers. The jump is exponential.

What are your thoughts on finding originality amid the flood of AI-generated content?

(P.S. - If you find the word ‘Delve’ in our newsletter - yes, we do use ChatGPT and Gemini to help create the content. 🙂)

What would be the format? Every week, we will break the newsletter into the following sections:

  • The Input - All about recent developments in AI

  • The Tools - Interesting finds and launches

  • The Algorithm - Resources for learning

  • The Output - Our reflection 

  • Question to ponder before we meet next!


As revealed in a recent paper, Apple has introduced ReALM, an AI model that the paper's benchmarks show outperforming OpenAI's GPT-4 at reference resolution. ReALM excels at understanding various kinds of context, allowing it to respond precisely to queries about on-screen content or background activities.

This development reflects Apple's ongoing commitment to AI, evidenced by its latest publications and plans to incorporate advanced AI into iOS 18 and macOS 15. 

Unlike GPT-4, which handles text and images, ReALM enhances interactions by analyzing on-device data to resolve references related to the conversation, screen displays, or background happenings. This innovation aims to transform Siri into a more context-aware and efficient virtual assistant. (source)
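
To make that concrete, here is a minimal sketch of the paper's core idea, reconstructed in Python: on-screen entities are flattened into plain text so a language model can resolve a reference like “call that number.” The entity fields and prompt wording below are illustrative, not Apple's actual format.

```python
# Minimal sketch of ReALM-style reference resolution (illustrative format,
# not Apple's actual implementation).
def screen_to_text(entities: list[dict]) -> str:
    """Render on-screen entities top-to-bottom, left-to-right, tagged with ids."""
    ordered = sorted(entities, key=lambda e: (e["y"], e["x"]))
    return "\n".join(f"[{i}] {e['type']}: {e['text']}" for i, e in enumerate(ordered))

entities = [
    {"type": "business", "text": "Joe's Pizza", "x": 10, "y": 40},
    {"type": "phone",    "text": "555-0123",    "x": 10, "y": 80},
]

prompt = (
    "Screen:\n" + screen_to_text(entities) +
    "\n\nUser: call the number on screen."
    "\nWhich entity id does the user mean?"
)
# A reference-resolution model would be expected to answer "[1]".
print(prompt)
```

Because the screen becomes ordinary text, any language model can attempt the resolution, which is presumably what lets a comparatively small on-device model compete with GPT-4 on this narrow task.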

Stability AI introduces Stable Audio 2.0, a groundbreaking model that creates high-quality, full tracks up to three minutes long from text prompts, offering 44.1 kHz stereo sound.

This version extends its predecessor's capabilities with audio-to-audio transformation, allowing users to modify uploaded samples into various sounds and effects.

Highlighting a significant leap in AI-generated music, Stable Audio 2.0 adds sound-effect generation and style transfer, giving artists more creativity and control. It is also available for free on the Stable Audio website. (source)

Grok 1.5 is going to be released soon! 

It introduces a significant enhancement in long-context understanding: the model can now process up to 128K tokens in its context window, a 16-fold increase in memory capacity over its predecessor, letting it handle longer documents and more complex prompts.
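
A quick sanity check on that multiplier, assuming (as the 16x figure implies) the predecessor's 8,192-token window:

```python
# 16x the assumed 8,192-token window of Grok-1 gives the advertised "128K".
prev_window = 8_192
new_window = prev_window * 16
print(new_window)  # 131072 tokens, i.e. 128K
```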

Grok-1.5 is available only for early testers right now. (source)

Shortly after unveiling Grok 1.5, Elon Musk announced on X that Grok 2 is on the horizon. He claims it will outperform leading AI models such as OpenAI's GPT-4, Meta's Llama, and Anthropic's Claude 3 across all metrics, with a rollout to X users scheduled for next week. (source)

OpenAI has revised the governance of its venture capital fund. CEO Sam Altman is no longer in direct control of the fund, addressing concerns raised about its prior structure. 

Control has been transferred to Ian Hathaway, a partner at the fund since 2021, who has overseen its accelerator program and led investments in startups like Harvey, Cursor, and Ambience Healthcare. 

The OpenAI Startup Fund, initially backed with $175 million and now holding $325 million, invests in early-stage AI companies with funding support from partners like Microsoft. (source)

The US House of Representatives has banned the use of Microsoft's Copilot generative AI assistant by congressional staff, citing cybersecurity concerns.

According to the Axios report, the House's Office of Cybersecurity found that Copilot poses a risk of leaking House data to unauthorized cloud services, which prompted the ban.

In response, a Microsoft spokesperson pointed to plans to introduce a range of AI tools later this year, including a version of Copilot designed to meet the federal government's stringent security and compliance standards. (source)

Replit is enhancing its developer environment by integrating AI tools, particularly focusing on a Replit-native model for code repair.

This initiative, motivated by the aim to significantly reduce the time developers spend on debugging, leverages the Language Server Protocol (LSP) diagnostics to identify errors in code.

The model synthesizes fixes from a dataset of code and diagnostic pairs, undergoing supervised finetuning to accurately predict corrections for coding errors.

Despite its smaller size, the model is proving competitive with much larger ones, a meaningful efficiency win for coding. It's a stride towards embedding AI deeply into development tools, with plans to broaden its capabilities and language support. (source)
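
To make the pipeline concrete, here is a minimal sketch of the pairing step. The diagnostic shape follows the LSP specification, but query_repair_model is a hypothetical stand-in for Replit's fine-tuned model, which is not publicly callable.

```python
# Sketch of pairing broken code with its LSP diagnostics, as described above.
# query_repair_model is hypothetical; the diagnostic format follows the LSP spec.
def build_repair_prompt(source: str, diagnostics: list[dict]) -> str:
    """Combine the broken code and its diagnostics into one repair prompt."""
    rendered = "\n".join(
        f"- line {d['range']['start']['line'] + 1}: {d['message']}"
        for d in diagnostics
    )
    return f"Fix the following code.\n\nDiagnostics:\n{rendered}\n\nCode:\n{source}"

# Example diagnostic, shaped per the Language Server Protocol (0-indexed lines).
diagnostics = [{
    "range": {"start": {"line": 1, "character": 4},
              "end": {"line": 1, "character": 9}},
    "message": "name 'pirnt' is not defined",
}]
source = "def greet(name):\n    pirnt('hello ' + name)\n"

prompt = build_repair_prompt(source, diagnostics)
# fixed_code = query_repair_model(prompt)  # hypothetical model call
print(prompt)
```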

Google DeepMind's AI, SAFE, outperforms human fact-checkers by verifying information accuracy through Google Search, proving both more accurate and cost-effective.

SAFE matched human raters' verdicts 72% of the time, and in cases where they disagreed, SAFE was judged correct 76% of the time.

Critics, however, question the "superhuman" label, calling for comparisons with expert fact-checkers instead of crowd workers.

SAFE offers a scalable, economical solution for fact-checking in the age of extensive AI-generated content, but experts emphasize the need for transparent benchmarks against professional standards to truly assess its efficacy. (source)
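
For the curious, here is a minimal sketch of a SAFE-style verification loop. The llm and web_search callables are hypothetical stand-ins for the model and search-API calls, so treat this as the shape of the method rather than DeepMind's implementation.

```python
# Sketch of a SAFE-style pipeline: decompose a response into atomic claims,
# search for evidence per claim, and rate each claim as supported or not.
# `llm` and `web_search` are hypothetical callables, not DeepMind's API.
def safe_check(response: str, llm, web_search) -> list[tuple[str, bool]]:
    # 1. Split the long-form response into individual, self-contained facts.
    facts = llm(f"List each factual claim in:\n{response}").splitlines()
    verdicts = []
    for fact in facts:
        # 2. Have the model write a search query targeted at this claim.
        query = llm(f"Write a Google search query to verify: {fact}")
        evidence = web_search(query)
        # 3. Rate the claim against the retrieved evidence.
        answer = llm(f"Claim: {fact}\nEvidence: {evidence}\nSupported? yes/no")
        verdicts.append((fact, answer.strip().lower().startswith("yes")))
    return verdicts
```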

OpenAI has developed a new text-to-speech AI model called Voice Engine, capable of creating synthetic voices from just a 15-second audio sample!

It can accurately clone voices for various uses, such as providing reading assistance, enabling creators to reach global audiences while maintaining native accents, and helping individuals recover their voices after speech-impairing conditions.

OpenAI has opted not to broadly release its Voice Engine technology, due to ethical concerns surrounding potential misuse, such as voice cloning for scams, election interference, and unauthorized access to voice-protected accounts. (source)

Microsoft and OpenAI are planning a $100 billion Stargate project in the U.S. to build an AI supercomputer data center, expected to take 5-6 years.

This venture, far more costly than Microsoft's past infrastructure investments, may require significant energy, potentially sourced from nuclear power, raising environmental and safety considerations.

The project highlights the evolving generative AI market and underscores the urgent need for enhanced AI-related policies and legislation, amidst a competitive landscape with other tech giants advancing their AI technologies. (source)

Databricks introduces DBRX, a groundbreaking open large language model (LLM) that sets new performance benchmarks across a variety of metrics, outpacing GPT-3.5 and rivaling Gemini 1.0 Pro.

With its fine-grained mixture-of-experts architecture, DBRX excels in coding and general-purpose tasks, offering significant improvements in training efficiency and inference speed.

Available for Databricks customers through APIs, the model is also open-sourced, providing a robust foundation for future AI innovations and applications. (source)
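
If you want to poke at the open weights yourself, here is a minimal sketch using Hugging Face transformers, assuming access to the databricks/dbrx-instruct checkpoint (the full model is very large and needs multiple high-memory GPUs; check the model card for exact requirements):

```python
# Minimal sketch: generating with DBRX Instruct via transformers.
# Assumes the databricks/dbrx-instruct weights are accessible and that
# enough GPU memory is available; settings here are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "databricks/dbrx-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # DBRX ships custom modeling code
    device_map="auto",       # shard across available GPUs (needs accelerate)
    torch_dtype="auto",      # use the checkpoint's native precision
)

messages = [{"role": "user", "content": "Write a SQL query to find duplicate rows."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```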

Do you have a skill you believe you could teach others? How about launching the course within 5 minutes? EverLearns makes that possible.

Problem Statement - Create a course on “GenAI tools for marketers” in 5 minutes

Solution - 

  1. Visit https://everlearns.com/

  2. Sign up

  3. Click on ‘Create your first course’

  4. Add course details such as the topic, target audience, learning outcomes, and any other information about the course you want to generate.

  5. Add course references, if any

  6. Your course outline is ready!

  7. You can then explore the detailed curriculum

  8. Your text-based course is ready!

Here is the video you can refer to for the output - Click Here

For more information about the tool, check out this blog.

  • The video "What's next for AI agentic workflows" showcases Andrew Ng discussing the future of AI agentic workflows at Sequoia Capital's AI Ascent. He delves into how these workflows could drive AI progress, potentially outstripping the advancements brought by the next wave of foundational models.

  • In a recent Leading with Data episode, I had an enlightening conversation with Dr. Anand Rao, Professor of Applied Data Science and AI at Carnegie Mellon University and former Global AI Lead at PWC, about AI's evolution, his journey from finding joy in nature's call to academia, and his AI funding venture, Golden Sparrow.

  • A good read from Anthropic, which has identified a "many-shot jailbreaking" method that can circumvent safety features in AI models by exploiting large context windows. This vulnerability highlights the double-edged sword of advanced AI capabilities, posing risks as models grow more powerful.

  • In continuation of the previous thread, Andrew Ng launched a new course on Red Teaming LLMs. If you are building or launching your own LLMs, this is a good course to get an overview of Red Teaming.

  • Ethan Mollick’s book Co-Intelligence was released today. My copy was just delivered, and I am looking forward to reading it. Have you ordered yours?

I feel that we are now in a space where recommending a single large model is becoming increasingly difficult. Until a few months ago, it was GPT-4 all the way.

I feel Claude is the most self-aware and the best for having conversations. Gemini is great with context length, and Replit Code Repair is showing promise for coding.

All of this could change as soon as OpenAI releases its next big model this summer.

Which of these models do you use currently?


How do you rate this issue of AI Emergence?

Would love to hear your thoughts

