Google Ships Free On-Device AI Apps for iPhone via Gemma Models
TECH

Strategic Overview

  • 01.
    Google quietly released AI Edge Eloquent, a free offline-first dictation app for iOS that transcribes speech in real time using on-device Gemma models, with no subscription and no usage caps.
  • 02.
    The companion AI Edge Gallery app lets users download and run open-source LLMs including Gemma 4 entirely on-device for private AI chat, image queries, and transcription on both Android and iOS.
  • 03.
    Gemma 4, released April 2, 2026 under Apache 2.0, powers both apps with four model sizes including an E2B variant that runs in under 1.5GB of memory, making on-device generative AI practical on smartphones.
  • 04.
    Google also introduced FunctionGemma, a 270M-parameter model enabling on-device function calling at 1,916 tokens/second prefill, opening the door to autonomous mobile AI agents that work without any server connection.

Google's Trojan Horse: Planting an AI Ecosystem Inside Apple's Walled Garden

Google releasing two AI-powered apps on the iOS App Store is not merely a product launch -- it is a strategic land grab inside Apple's ecosystem. While Apple and Google signed a deal in January 2026 for Gemini to power Apple Intelligence features, Google is simultaneously building a parallel AI stack that runs entirely outside Apple's control. AI Edge Gallery lets iPhone users download and interact with open-source Gemma models directly, bypassing Apple's own on-device intelligence layer entirely. This is Google establishing developer and user mindshare for its AI models on a competitor's hardware.

The move is especially significant because it arrives with zero marketing fanfare. There was no keynote, no press release, no blog post for Eloquent specifically. Google appears to be testing whether a genuinely superior free product can achieve organic adoption on iOS. If Eloquent gains traction as the default dictation tool for iPhone power users, Google will have inserted its AI models into millions of Apple devices -- creating a distribution channel for Gemma that Apple cannot easily shut down without appearing anti-competitive. The quiet launch also sidesteps the kind of scrutiny that a splashy announcement would invite from regulators already watching Big Tech AI moves closely.

The $15/Month Dictation App Is Dead: How Free Offline AI Rewrites the Economics

Eloquent's business model -- or rather, its deliberate lack of one -- poses an existential threat to the subscription dictation market. Competing products like Wispr Flow and Willow charge approximately $15 per month, while SuperWhisper costs $85 per year. These apps rely on cloud processing, meaning they require internet connectivity and send voice recordings to external servers. Eloquent does everything these apps do, adds automatic filler word removal and text polishing, charges nothing, has no usage caps, and keeps all data on-device. The value proposition gap is not incremental; it is categorical.

This follows a familiar Google pattern: subsidize a high-quality free product to establish platform dominance, then monetize adjacent services. Eloquent includes an optional cloud mode powered by Gemini models for enhanced text polishing, which likely serves as the on-ramp to Google's broader paid AI ecosystem. For the dictation app market specifically, the implication is stark. Companies charging subscription fees for capabilities that a free, offline alternative now matches will need to either dramatically differentiate on features Google cannot easily replicate, or accept that the standalone dictation app category is being commoditized. The broader signal is that any AI application layer that merely wraps a model capability -- without unique data, workflow integration, or network effects -- is vulnerable to being zeroed out by foundation model providers releasing free reference apps.

FunctionGemma and the Birth of Truly Autonomous Mobile Agents

While Eloquent grabbed headlines, the more consequential technical development may be FunctionGemma -- a 270-million-parameter model purpose-built for on-device function calling. Running at 1,916 tokens per second prefill and 142 tokens per second decode on a Pixel 7 Pro, this model can parse user intent and trigger real device actions without any server communication. Google demonstrated this through 'Mobile Actions' and 'Tiny Garden' features in AI Edge Gallery, but the implications extend far beyond demos.
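Those throughput figures translate into sub-second end-to-end latency for a typical tool call. A back-of-envelope check, where the 400-token prompt and 30-token structured call are illustrative assumptions, not measured workloads:

```python
# Back-of-envelope latency for FunctionGemma on a Pixel 7 Pro, using the
# reported 1,916 tok/s prefill and 142 tok/s decode rates. The prompt
# and output sizes below are assumptions for illustration only.
PREFILL_TPS = 1916   # tokens/second while ingesting the prompt
DECODE_TPS = 142     # tokens/second while generating output

prompt_tokens = 400  # assumed: system prompt + tool schemas + user utterance
output_tokens = 30   # assumed: a short structured function call

prefill_s = prompt_tokens / PREFILL_TPS
decode_s = output_tokens / DECODE_TPS
total_s = prefill_s + decode_s

print(f"prefill: {prefill_s:.2f}s, decode: {decode_s:.2f}s, total: {total_s:.2f}s")
# prefill: 0.21s, decode: 0.21s, total: 0.42s
```

Even with generous tool schemas in the prompt, the whole round trip stays well under half a second, which is why local inference feels instantaneous compared with a cloud hop.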

On-device function calling solves two problems that have kept mobile AI agents theoretical. First, latency: cloud round-trips introduce delays that make real-time agentic interactions feel sluggish, whereas FunctionGemma responds locally in milliseconds. Second, reliability: an agent that depends on connectivity fails in elevators, airplanes, tunnels, and rural areas -- precisely the contexts where hands-free AI assistance is most valuable. As Google product manager Alice Zheng stated, shifting tool-use on-device allows developers to build interactions that respond instantly while remaining fully functional regardless of connectivity. Combined with Gemma 4 E2B's ability to run in under 1.5GB of memory, the hardware floor for running agentic AI is now a mid-range smartphone from 2022. This positions Google to define the on-device agent runtime before Apple, Meta, or anyone else establishes an alternative standard.
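The function-calling loop itself is conceptually simple: the model maps an utterance to a structured call, and a local dispatcher executes it. A minimal sketch of that pattern, with the model stubbed out (a real app would run FunctionGemma through Google's AI Edge runtime; the function names and schemas here are illustrative, not FunctionGemma's actual contract):

```python
import json

def run_function_model(user_text: str) -> str:
    """Stub standing in for local FunctionGemma inference: returns a
    JSON function call matching the user's intent. Purely illustrative."""
    if "timer" in user_text:
        return json.dumps({"name": "set_timer", "args": {"minutes": 5}})
    return json.dumps({"name": "no_op", "args": {}})

# Local handlers: executed entirely on-device, no server round trip.
HANDLERS = {
    "set_timer": lambda minutes: f"timer set for {minutes} min",
    "no_op": lambda: "nothing to do",
}

def dispatch(user_text: str) -> str:
    """Parse the model's structured call and invoke the matching handler."""
    call = json.loads(run_function_model(user_text))
    return HANDLERS[call["name"]](**call["args"])

print(dispatch("set a timer for five minutes"))  # timer set for 5 min
```

Because both the intent parsing and the action execution are local, the loop degrades gracefully: no connectivity check, no retry logic for a failed cloud call.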

Gemma 4 by the Numbers: Why On-Device AI Crossed the Practicality Threshold

The Gemma model family has been downloaded over 400 million times with more than 100,000 community variants, but Gemma 4 represents a step change that makes consumer on-device deployment genuinely viable. The E2B model (2.3 billion effective parameters) runs in under 1.5GB of memory with 2-bit and 4-bit quantization, can process 4,000 tokens across two skills in under 3 seconds, and supports a 128K context window. The larger 31B dense model scores 85.2% on MMLU Pro -- a 26% improvement over Gemma 3's 67.6% -- and 89.2% on AIME 2026, placing it in territory previously reserved for models requiring data center hardware.
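The "under 1.5GB" figure is consistent with simple quantization arithmetic: weight footprint is roughly parameters times bits-per-weight divided by eight. Note this is a floor, not a total, since KV cache and activations add overhead on top:

```python
# Sanity check on the "under 1.5GB" claim for Gemma 4 E2B:
# weight footprint ≈ parameters * bits-per-weight / 8 bytes.
params = 2.3e9  # E2B effective parameters, per the article

for bits in (4, 2):
    gb = params * bits / 8 / 1e9
    print(f"{bits}-bit weights: {gb:.2f} GB")
```

At 4-bit quantization the weights alone come to about 1.15GB, leaving a few hundred megabytes of the stated budget for cache and runtime, and the 2-bit variant roughly halves that again.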

The collaboration with chip partners underscores that this is not just a model release but an infrastructure play. Qualcomm optimized Gemma 4 E2B and E4B for Dragonwing IQ8 NPU support, while MediaTek is working on mobile chipset optimization and NVIDIA is accelerating deployment through RTX AI Garage. All four Gemma 4 variants ship under Apache 2.0, meaning any device manufacturer or app developer can integrate them without licensing fees. The combination of dramatically reduced memory requirements, hardware-specific optimization from major chip vendors, and permissive licensing removes the three biggest barriers that previously kept generative AI tethered to the cloud. For developers, the practical takeaway is that building offline-first AI features is no longer a compromise -- it is now a competitive advantage.

Historical Context

2024-02
Released original Gemma models, launching the open-weight on-device AI model family.
2025-03
Released Gemma 3 with 27B parameters, achieving 67.6% on MMLU Pro.
2025-06
Released Gemma 3n with Per-Layer Embeddings and audio support, pushing toward multimodal on-device capabilities.
2026-01
Apple and Google announced a deal for Gemini AI to power Apple Intelligence features including an upgraded Siri.
2026-04-02
Released Gemma 4 family with four model sizes under Apache 2.0, scoring 85.2% on MMLU Pro versus Gemma 3 67.6%.
2026-04-06
Quietly launched AI Edge Eloquent on iOS and updated AI Edge Gallery with Gemma 4 support, with no formal press release.

Power Map

Key Players
Subject

Google Ships Free On-Device AI Apps for iPhone via Gemma Models

Google DeepMind

Developer of the Gemma 4 open-weight model family and the AI Edge platform powering both Eloquent and Gallery apps.

Apple

Platform host for both apps via the iOS App Store and simultaneous competitor through Apple Dictation and Apple Intelligence; separately partnered with Google for Gemini-powered Siri features.

Wispr Flow / Willow

Subscription-based dictation apps charging approximately $15/month with cloud-dependent processing, directly threatened by Eloquent's free offline alternative.

Qualcomm

Hardware partner collaborating on Gemma 4 E2B/E4B optimization for Dragonwing IQ8 NPU support on mobile devices.

NVIDIA

Accelerating Gemma 4 deployment for local agentic AI through RTX AI Garage and edge computing support.

THE SIGNAL.

Analysts

"In our tests with pre-release checkpoints we have been impressed by their capabilities, to the extent that we struggled to find good fine-tuning examples because they are _so good_ out of the box."

Hugging Face Team
AI Model Platform

"Shifting tool-use on-device allows developers to build interactions that respond instantly while remaining fully functional regardless of connectivity."

Alice Zheng
Product Manager, Google

"Eloquent fundamentally challenges the subscription model of competing dictation apps and positions voice interfaces as finally practical for enterprise use, given that recordings never leave the device."

The Next Web Analysis
Technology Publication

The Crowd

"Friendly reminder that Google has an official app to run Gemma 4 on your phone. 100% open source, Fully offline and private, Multimodal with text/audio/image, Works with Gemma E4B and E2B. And the app is available on both iOS and Android."

Paul Couvert (@itsPaulAi) · 5,300 likes

"Google just released AI Edge Eloquent, a free offline-first subscription-less dictation app that cleans speech into usable text on the phone. Standard speech-to-text records every stumble, but Gemma-based ASR tries to recover intent, then removes filler words."

@rohanpaul_ai · 2,600 likes

"Run Gemma 4 AI Offline on Your Phone – Quick Guide. Google just released AI Edge Gallery: a free app that runs open-source Gemma 4 models (E2B and E4B) 100% on your iPhone or Android. Private (nothing leaves your phone), Fully offline."

@BrianRoemmele · 1,100 likes
Broadcast
Gemma 4 Is INCREDIBLE! Google's Open Model IS POWERFUL! (Fully Tested)

Gemma 4 on Your Phone?! Google AI Edge Gallery (Offline AI, Gemma Chat, Ollama Guide)

Google's Gemma 4 Runs Offline on iPhone — Even Google Can't See What You Ask It