Google Releases Gemma 4 Open AI Models Under Apache 2.0

Strategic Overview

  • 01.
    Google DeepMind released Gemma 4 on April 2, 2026 — a family of four open-weight AI models built on Gemini 3 research, available under the permissive Apache 2.0 license for the first time in the Gemma series.
  • 02.
    The lineup includes four variants — E2B and E4B edge models, a 26B Mixture-of-Experts model activating only 3.8B parameters, and a 31B Dense model — supporting up to 256K token context windows, native vision, audio, and 140+ languages.
  • 03.
    Gemma 4 models rank among the top open models on the LMSYS Arena AI leaderboard, with the 31B Dense at #3 and the 26B MoE at #6, while scoring 89.2% on AIME math benchmarks compared to Qwen 3.5-27B's 48.7%.
  • 04.
    All four models feature native function calling trained from the ground up for multi-turn agentic workflows, and can run locally on devices ranging from Android phones and Raspberry Pi to NVIDIA RTX GPUs.

Why This Matters

Gemma 4 represents a strategic inflection point for Google's open-source AI strategy. The shift to Apache 2.0 licensing — dropping all custom clauses and restrictive terms that characterized previous Gemma releases — signals that Google is now willing to compete on fully open terms with Chinese open-weight competitors like DeepSeek, Alibaba's Qwen, Moonshot AI, and Z.AI. This is not merely a licensing technicality: Apache 2.0 removes friction for enterprise adoption, allows derivative works without restriction, and positions Gemma 4 as a genuine open-source offering rather than an "open-weight" model with strings attached.

The timing is equally significant. With over 400 million downloads and 100,000+ community variants across the Gemma series, Google has already built substantial developer ecosystem momentum. By releasing models that rank #3 and #6 on the Arena AI leaderboard — and that dramatically outperform competitors on math benchmarks (89.2% vs Qwen 3.5-27B's 48.7% on AIME) — Google is making a case that open models from Western labs can match or exceed the Chinese open-weight models that have dominated recent discourse. As Holger Mueller of Constellation Research noted, this is about building an ecosystem of AI developers across diverse device form factors, not just winning benchmarks.

How It Works

Gemma 4 ships in four variants designed to cover the full compute spectrum. At the high end, the 31B Dense model provides maximum quality with 256K token context windows, native vision and audio processing, and support for 140+ languages. The 26B Mixture-of-Experts (MoE) model is the efficiency standout: it activates only 3.8 billion parameters per inference pass while retaining 97% of the dense model's quality at roughly one-eighth the compute cost. This MoE design makes it practical for deployment scenarios where throughput and cost matter more than squeezing out the last percentage point of quality.
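The routing idea behind that efficiency can be sketched in a few lines. The toy example below is a generic top-2 Mixture-of-Experts gate over eight dummy "experts" — the expert count, scoring, and mixing here are illustrative assumptions, not Gemma 4's actual architecture — but it shows the key property: compute is spent only on the selected experts, which is how a 26B-parameter model can run a forward pass touching just 3.8B of them.

```python
# Toy top-k Mixture-of-Experts gating (illustrative only, not Gemma 4's design).
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(router_logits, expert_fns, k=2):
    """Route one token: score all experts, execute only the top-k,
    and mix their outputs by renormalized gate weights."""
    gates = softmax(router_logits)
    topk = sorted(range(len(gates)), key=lambda i: gates[i], reverse=True)[:k]
    norm = sum(gates[i] for i in topk)
    # Only k expert networks actually run -- the source of the compute saving.
    return sum(gates[i] / norm * expert_fns[i](1.0) for i in topk)

# Eight toy "experts", each just scaling its input.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
logits = [0.1, 2.0, 0.3, 1.5, 0.0, 0.2, 0.1, 0.4]  # router scores for one token
out = moe_forward(logits, experts, k=2)
```

Note that the article's "roughly one-eighth the compute" claim lines up with simple arithmetic against the dense flagship: 31B / 3.8B active parameters ≈ 8.2.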

At the edge, the E2B and E4B models target on-device deployment with 128K context windows. The E4B model notably includes a native audio encoder compressed to just 305 million parameters, enabling voice-based applications directly on mobile hardware. All four models feature native function calling — not bolted on as a fine-tuning afterthought, but trained from the ground up for multi-turn agentic workflows. According to researcher Sebastian Raschka, the performance leap likely stems from training recipe and data improvements rather than radical architectural changes, suggesting Google DeepMind found ways to extract more intelligence per parameter through better training methodology.
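A multi-turn function-calling exchange of the kind described above can be sketched with the JSON-schema tool convention that most open-model runtimes share. Gemma 4's exact wire format is not documented here, so the `get_weather` tool, the payloads, and the message shapes below are hypothetical; the sketch only shows the three-step loop a natively tool-trained model participates in.

```python
# Hypothetical multi-turn tool-calling exchange in the common JSON-schema
# tool format; tool name, payloads, and message shapes are illustrative.
import json

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Turn 1: the user asks; a runtime would pass `tools` alongside `messages`.
messages = [{"role": "user", "content": "What's the weather in Oslo?"}]

# Turn 2: a tool-trained model emits a structured call instead of prose.
tool_call = {"name": "get_weather", "arguments": json.dumps({"city": "Oslo"})}
messages.append(
    {"role": "assistant",
     "tool_calls": [{"type": "function", "function": tool_call}]}
)

# Turn 3: the application runs the function and returns the result,
# which the model folds into its final answer on the next pass.
messages.append({"role": "tool", "name": "get_weather", "content": '{"temp_c": 4}'})
```

The point of "trained from the ground up" is that step 2 is a reliable model behavior rather than something coaxed out with prompt engineering after the fact.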

By The Numbers

The benchmark numbers paint a picture of models punching well above their parameter weight class. The 31B Dense model sits at #3 on the Arena AI text leaderboard, while the 26B MoE holds #6 — both remarkable positions for models small enough to run on consumer hardware. On the AIME math benchmark, Gemma 4 scores 89.2% compared to Qwen 3.5-27B's 48.7%, nearly double the rival's score. The Codeforces Elo rating jumped from 110 to 2,150, and MMLU Pro hits 85.2%, all pointing to substantial gains in reasoning and coding ability.

The efficiency metrics are equally compelling. The 26B MoE model delivers 97% of the dense model's quality while using 8x less compute through its 3.8B active parameter design. For on-device deployment, the E4B model runs in just 5GB of RAM at 4-bit quantization, making it viable on mainstream smartphones. The 31B dense model fits on a 24GB GPU at 4-bit quantization, meaning a single NVIDIA RTX 4090 can host it. Google also reports that Gemini Nano 4, the closely related on-device variant for Android, runs 4x faster and uses 60% less battery than its predecessor. Day-one support spans 12+ inference frameworks, ensuring broad compatibility from launch.
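The memory figures above can be sanity-checked with back-of-the-envelope arithmetic: at 4-bit quantization each parameter occupies half a byte. The sketch below assumes "E4B" denotes roughly 4 billion parameters and counts weights only, ignoring runtime overhead (KV cache, activations, framework buffers) that the 5GB and 24GB budgets also have to absorb.

```python
# Weights-only memory at 4-bit quantization; parameter counts are taken
# from the model names and E4B ~= 4B parameters is an assumption.
GIB = 1024**3

def weights_gib(params: float, bits: int = 4) -> float:
    """Bytes of quantized weights, expressed in GiB."""
    return params * bits / 8 / GIB

dense_31b = weights_gib(31e9)  # ~14.4 GiB of weights -> headroom on a 24 GB GPU
e4b = weights_gib(4e9)         # ~1.9 GiB of weights -> well under a 5 GB budget
```

The gap between weights and the quoted budgets is what's left over for context: a 256K-token KV cache is itself a multi-gigabyte consumer, which is why the edge variants cap out at 128K.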

Impacts and What's Next

The immediate impact of Gemma 4 is felt across three dimensions: developer accessibility, enterprise adoption, and the competitive landscape. On the developer side, availability through Hugging Face, Kaggle, Ollama, and 12+ inference frameworks means the models are immediately usable across nearly every major deployment pathway. The Apache 2.0 license removes the legal ambiguity that kept some enterprises and startups from building on previous Gemma versions, potentially accelerating commercial adoption.

For the broader competitive landscape, Gemma 4 reasserts Google's position in the open-weight race against Chinese competitors. The partnership network — NVIDIA for GPU optimization, Qualcomm and MediaTek for mobile deployment — creates a hardware ecosystem advantage that purely model-focused Chinese competitors may struggle to match. The Hacker News community response, with 1,572 points and 423 comments, showed mixed-to-positive sentiment focused on practical deployment concerns like quantization strategies and local inference performance, suggesting developers are already evaluating Gemma 4 for real workloads rather than treating it as a benchmarking curiosity. On X, sentiment was overwhelmingly positive, with Google's announcement tweet receiving 17,000 likes and commentary highlighting the Apache 2.0 shift as a watershed moment for open AI.

The Bigger Picture

Gemma 4 arrives at a moment when the open-weight AI model landscape is rapidly bifurcating along geopolitical lines. Chinese labs — DeepSeek, Alibaba's Qwen team, Moonshot AI, and Z.AI — have set an aggressive pace in releasing capable open models, often under permissive licenses. Google's decision to match that permissiveness with Apache 2.0 suggests the company views licensing openness as a competitive necessity, not a strategic risk. The message is clear: if developers are choosing between similarly capable models, the one with fewer legal restrictions wins.

The emphasis on on-device deployment is equally telling about where Google sees the AI market heading. By optimizing Gemma 4 to run on Android phones, Raspberry Pi, NVIDIA Jetson Orin Nano, and consumer RTX GPUs, Google is betting that the next wave of AI adoption will be defined not by cloud API calls but by local inference. Native function calling across all four variants, combined with native audio and vision capabilities, positions these models specifically for agentic applications — AI systems that can take actions, call tools, and process real-world inputs without round-tripping to the cloud. If this bet pays off, Gemma 4 could become the default foundation for a generation of edge AI applications, from on-device assistants to autonomous embedded systems.

Historical Context

2024-02-21
Google released the original Gemma models, marking its first foray into open-weight language models for the developer community.
2025-03-01
Google released Gemma 3 with 128K context window support, but under a proprietary license with custom restrictions on usage.
2026-04-02
Gemma 4 launched with four model variants, 256K context windows, native multimodal support, and a landmark shift to Apache 2.0 licensing.

Power Map

Key Players

Google DeepMind

Developer of the Gemma 4 model family, competing directly with Chinese open-weight models from DeepSeek, Alibaba, and others

NVIDIA

Hardware partner providing optimized deployment on RTX GPUs, DGX Spark, and Jetson platforms with day-one support

Qualcomm and MediaTek

Mobile chipset partners enabling on-device Gemma 4 deployment on Android phones and other mobile hardware

Chinese AI Competitors (DeepSeek, Alibaba/Qwen, Moonshot AI, Z.AI)

Key open-weight competitors whose rapid advances in model quality motivated the timing and ambition of the Gemma 4 release

Hugging Face, Kaggle, and Ollama

Distribution platforms providing immediate access to Gemma 4 model weights for the developer community

Analysts

"Google is building its lead in AI, not only by pushing Gemini, but also open models with the Gemma 4 family. These are important for building an ecosystem of AI developers, and will help the company to tap into functional and vertical use cases on different device form factors."

Holger Mueller
Analyst, Constellation Research

"The leap in Gemma 4 performance is likely attributable to training recipe and data improvements rather than radical architectural changes."

Sebastian Raschka
AI Researcher

"The team managed to squeeze out more intelligence per parameter, allowing the Gemma 4 models to punch significantly above their weight class."

Clement Farabet and Olivier Lacombe
Researchers, Google DeepMind

The Crowd

"We just released Gemma — our most intelligent open models to date. Built from the same world-class research as Gemini 3, Gemma brings breakthrough intelligence directly to your own hardware for advanced reasoning and agentic workflows. Released under a commercially permissive Apache 2.0 license."

@Google · 17,000

"Meet Gemma: our new family of open models you can run on your own hardware. Built for advanced reasoning and agentic workflows, we're releasing them under an Apache 2.0 license."

@GoogleDeepMind · 7,400

"Here we go: Gamma released: Outperforms models 20x its size. Google dropped Gemma under Apache 2.0, full open-source, big licensing shift. Built on Gemini 3 tech, four sizes: E2B, E4B, 26B MoE, 31B Dense. Price-performance: 31B is #3 open model on Arena AI, 26B MoE is #6."

@kimmonismus · 461