Sakana AI launches Fugu and Fugu Ultra
TECH

Sakana AI launches Fugu and Fugu Ultra

47+
Signals

Strategic Overview

  • 01.
    On June 22, 2026, Tokyo-based Sakana AI launched Fugu and Fugu Ultra, a multi-agent orchestration system delivered as a single model that routes tasks across a swappable pool of frontier LLMs through one OpenAI-compatible API.
  • 02.
    Fugu is itself an LLM trained to call other models recursively, assigning Thinker, Worker, and Verifier roles to plan, execute, and validate work before returning a single answer.
  • 03.
    Sakana reports Fugu Ultra matching or beating Claude Opus 4.8, GPT-5.5, and Gemini 3.1 Pro on coding and agentic benchmarks, though every figure is self-reported and not yet independently verified.
  • 04.
    Fugu ships in two tiers with subscription and pay-as-you-go pricing but is unavailable in the EU/EEA at launch while Sakana works toward GDPR compliance.

An LLM That Hires Other LLMs

Fugu inverts the dominant scaling playbook. Rather than train one ever-larger model, Sakana built a comparatively small router LLM that calls other LLMs, including instances of itself recursively, and coordinates them behind a single OpenAI-compatible endpoint [1]. When a request arrives, Fugu doesn't necessarily answer it directly; it decides whether to solve the task itself or assemble a team of expert models, then delegates, verifies, and synthesizes a single reply. The orchestration leans on three adaptive roles: a Thinker that plans, a Worker that executes coding and applied tasks, and a Verifier that checks outputs for errors, hallucinations, and requirement compliance [1].

MarkTechPost describes the result as an orchestration model that routes tasks across a swappable pool of frontier LLMs [6]. Two variants ship through one API: a low-latency base Fugu for everyday work and Fugu Ultra, which marshals a deeper pool of agents for hard, multi-step problems like research, cybersecurity, and patent search. The architecture is grounded in two ICLR 2026 papers, TRINITY and Conductor [1]. The practical upshot for a developer is that the complexity of a multi-agent system never reaches their code; it looks, and bills, like a single model.

The Export-Control Loophole

The timing is the tell. Sakana wrapped the launch in the language of AI sovereignty, and the pitch lands because of recent events: after export controls restricted access to Anthropic's Fable and Mythos models, Sakana argues that relying on a single vendor's API for critical infrastructure has become a material vulnerability rather than a hypothetical risk [4]. Because Fugu's agent pool is swappable, the company frames it as a hedge against sudden supply-chain disruption, where the orchestrator simply reroutes to whatever models remain available if a provider is cut off [4].

Underneath the messaging sits a sharper strategic logic. Japan lacks the compute scale of the United States and the data scale of China, so Sakana is making an asymmetric bet: a roughly 7-billion-parameter foreman that accesses the world's best models rather than spending billions to train a frontier model of its own [5]. That is how a Tokyo lab can claim Fable-and-Mythos-class results without ever training a frontier model itself [2]. Whether sovereignty built on top of other companies' closed APIs is a genuine hedge or a relabeled dependency is exactly where the skeptics dig in.

By The Numbers: Strong Scores, Big Asterisk

By The Numbers: Strong Scores, Big Asterisk
Fugu Ultra's self-reported scores versus leading frontier models on three coding and agentic benchmarks.

By Sakana's own benchmarks, Fugu Ultra doesn't merely keep pace with the frontier, it edges ahead on the hardest coding and agentic tests. On SWE-Bench Pro it posts 73.7 against Claude Opus 4.8's 69.2 and GPT-5.5's 58.6; on TerminalBench 2.1 it leads at 82.1; and on LiveCodeBench it tops the field at 93.2 [2]. It also reports 50.0 on Humanity's Last Exam and 95.5 on GPQA-Diamond [1].

The catch sits in plain sight: every one of these figures is self-reported, no independent evaluation has landed yet, and Sakana has not disclosed the ratio of open to closed models behind the scores [4]. There is a subtler measurement problem too. Because Fugu is orchestrating models like Opus, GPT, and Gemini under the hood, a benchmark partly credits those underlying models plus the router rather than a capability Sakana trained from scratch. The numbers are real and impressive; the open question is what exactly they are crediting.

What The Skeptics Are Hammering

The sharpest pushback isn't that Fugu is fake, it's that it may simply relocate the lock-in. Critics argue Fugu trades single-vendor dependency for orchestrator dependency: the orchestrator is closed, the routing is hidden, and a user cannot see or control which model answered a given query or how much it cost [4]. Hands-on testing cooled the launch hype, too. One widely cited researcher reported the system was incredibly slow and that results fell short of Fable in practice [3].

That skepticism dominated developer reaction. Across developer YouTube and Reddit, the recurring frame was that Fugu is an orchestrator, not a model, with testers flagging steep latency, opaque token consumption, and base-plan usage limits that one user exhausted on a single prompt. The most interesting counter, tellingly, came from inside the skeptical camp: several practitioners argued the lazy dismissal misses the real story, because reaching next-generation performance through a better harness rather than a bigger model is itself the result worth paying attention to. That tension, more than the benchmark table, is what makes Fugu worth watching.

Historical Context

2017
Co-authored 'Attention Is All You Need,' introducing the Transformer architecture that underpins modern LLMs, years before co-founding Sakana AI.
2023
Founded in Tokyo by David Ha and Llion Jones to build nature-inspired AI from combinations of smaller models rather than single monolithic ones.
2024
Raised a major Series A and became Japan's first AI unicorn, later valued at over $2.5 billion.
2026-06-22
Launched Fugu and Fugu Ultra, its orchestration models built on two ICLR 2026 papers, TRINITY and Conductor.

Power Map

Key Players
Subject

Sakana AI launches Fugu and Fugu Ultra

SA

Sakana AI

Tokyo-based developer of Fugu, valued at over $2.5 billion; champions an asymmetric strategy of a small router model that accesses the world's best models instead of training a frontier model, making orchestration its core bet.

LL

Llion Jones & David Ha

Sakana co-founders and ex-Google researchers; Jones co-authored the 2017 Transformer paper, lending the orchestration approach unusual research credibility.

AN

Anthropic

Maker of Fable 5 and Mythos, which serve as both the benchmark target Fugu claims to match and the concrete example of export-control risk that Fugu's swappable pool is pitched to route around.

EN

Enterprises and Japanese institutions

Target customers seeking AI sovereignty and reduced single-vendor dependency; nearly 500 early beta testers focused on multi-step workflows.

Fact Check

6 cited
  1. [1] Sakana Fugu: Beyond Bigger Models
  2. [2] No Claude Fable 5, no problem: Sakana achieves frontier performance with new Fugu multi-model auto-synthesis system
  3. [3] Sakana AI's Fugu orchestrates multiple LLMs to match Anthropic's Fable and Mythos benchmarks
  4. [4] Mitigating vendor lock-in: Sakana AI's Fugu multi-agent models
  5. [5] Japan's AI dark horse emerges: how a 7B small model challenges the frontier
  6. [6] Sakana AI Launches Sakana Fugu: An Orchestration Model That Routes Tasks Across a Swappable Pool of Frontier LLMs

Source Articles

Top 5

THE SIGNAL.

Analysts

"Reported that hands-on testing was incredibly slow and that results, while acceptable, fell short of Anthropic's Fable in practice."

Ethan Mollick
AI researcher, Wharton

"Frames Fugu as a hedge against sudden supply-chain disruption via its swappable agent pool, while cautioning that the platform's resilience depends entirely on which models remain available in that pool."

Ryan Daws
Author, AI News

"The sharpest criticism is that Fugu is a closed orchestrator partly leaning on closed model APIs, trading single-vendor lock-in for orchestrator lock-in, with benchmark scores that remain unverified self-reports and an undisclosed open-to-closed model ratio."

Developer community (aggregated)
Early reviewers and commentators
The Crowd

"Introducing Sakana Fugu: A full multi-agent orchestration system accessible via a single model API. Our 'Fugu Ultra' model matches the performance of Fable and Mythos, delivering frontier capability without the risk of export controls. Try it: https://t.co/aDEFyySWlS 🐡"

@@SakanaAILabs37888

"Guess which is Fugu Ultra? This is how recent models compare when generating endless procedural terrain (using Three.js). All of these are one-shotted! Just wild! Trying a few more examples. Will share soon!"

@@omarsar0267

"Japan just dropped a Mythos class model. Sakana ai, Tokyo based nature inspired AI, Recently, they launched Fugu Ultra, an orchestration model, And apparently, many people are already calling Fugu a potential competitor to Mythos, But unfortunately it's not available in EU."

@@0x_sakata78

"Sakana in Japan just dropped a mythos competitor and it looks great"

@u/thomas_unise429
Broadcast
I Battle Tested Sakana Fugu's Fable Killer

I Battle Tested Sakana Fugu's Fable Killer

Sakana Fugu Ultra BEATS Fable 5 & GPT-5.5? (Fully Tested)

Sakana Fugu Ultra BEATS Fable 5 & GPT-5.5? (Fully Tested)

Fugu Ultra: A Model That Beats Mythos and Fable? This Can't Be True...

Fugu Ultra: A Model That Beats Mythos and Fable? This Can't Be True...