TECH

Embodied AI foundation models for humanoid robots

Strategic Overview

01. Wayve closed a $1.2B Series D at an $8.6B post-money valuation on February 25, 2026, and launched Wayve Labs as a frontier research unit explicitly dedicated to embodied AI beyond autonomous driving.
02. On April 9, 2026, AGIBOT released GO-2 (Genie Operator-2), a unified vision-language-latent-action (ViLLA) foundation model that fuses perception, balance, and motion planning into a single end-to-end network for whole-body humanoid control.
03. At its April 17, 2026 Partner Conference in Shanghai, AGIBOT codified a 'One Robotic Body, Three Intelligences' framework spanning Locomotion, Manipulation and Interactive Intelligence, built on top of a stack of embodied foundation models.
04. London, Shanghai, and San Francisco-anchored labs are converging on the same architectural bet: replace hand-engineered modular pipelines with unified foundation models that map pixels and language directly to actions.

Under the hood: ViLLA, dual-system, and the death of the modular stack

AGIBOT's GO-2 is the cleanest statement yet of where humanoid software is headed. It is a unified vision-language-latent-action (ViLLA) model: perception, balance, and motion planning collapse into one network, replacing the classic perception → planner → controller pipeline that robotics has relied on for two decades ^[1]. The company frames it as 'the first system to bridge the last mile between logical reasoning and precise execution within a unified architecture' ^[1]. Marketing aside, that claim is consistent with what Air Street Capital described as the foundation-model playbook 'infusing new life into robotics' ^[2].

The architectural trick is an asynchronous dual-system design: a slow, language-conditioned planner decides what to do at low frequency, while a fast action expert streams motor commands at high frequency ^[3]. Figure AI ships the same pattern in Helix, where System 1 runs at 200 Hz while System 2 deliberates at 7-9 Hz ^[4]. The convergence is not a coincidence. Once you accept that a single learned model has to output joint torques, you have to solve the latency problem, and dual-rate inference is the obvious answer. GO-2 reports a 98.5% success rate on the LIBERO benchmark and 86.6% on LIBERO-Plus zero-shot, with 82.9% real-world success transferred from the Genie Sim 3.0 simulator ^[1].
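The dual-rate pattern is easier to see in a toy loop. The sketch below is not AGIBOT's or Figure's implementation. It is a single-threaded, tick-based simulation standing in for true asynchronous execution, and every name in it (`Plan`, `slow_planner`, `fast_action_expert`, `run`) is hypothetical. The 200 Hz fast rate comes from the Helix figures above, and the 8 Hz slow rate is an assumed midpoint of the quoted 7-9 Hz range.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

FAST_HZ = 200  # System 1 rate quoted for Helix in the text
SLOW_HZ = 8    # assumption: midpoint of the quoted 7-9 Hz System 2 range


@dataclass
class Plan:
    """Cached output of the slow planner (hypothetical structure)."""
    step_created: int
    goal: float


def slow_planner(step: int, target: float) -> Plan:
    # Stand-in for a language-conditioned planner: it simply restates the target.
    return Plan(step_created=step, goal=target)


def fast_action_expert(plan: Plan, state: float) -> float:
    # Stand-in for the action expert: a proportional step toward the cached goal.
    return 0.1 * (plan.goal - state)


def run(duration_s: float, target: float = 1.0) -> Tuple[int, int, float]:
    """Simulate the loop; return (plans made, commands sent, final state)."""
    ticks = int(duration_s * FAST_HZ)
    replan_every = FAST_HZ // SLOW_HZ  # 25 fast ticks per plan refresh
    state = 0.0
    plan: Optional[Plan] = None
    plans_made = 0
    commands_sent = 0
    for step in range(ticks):
        if step % replan_every == 0:
            # The slow system refreshes the plan only occasionally.
            plan = slow_planner(step, target)
            plans_made += 1
        # The fast system emits a command on every tick, reusing the cached plan.
        state += fast_action_expert(plan, state)
        commands_sent += 1
    return plans_made, commands_sent, state


if __name__ == "__main__":
    plans, commands, final_state = run(1.0)
    print(f"plans={plans} commands={commands} final_state={final_state:.6f}")
```

The point of the design is visible in the counts: over one simulated second the fast loop issues 200 commands while the planner is consulted only 8 times, so a slow model never sits on the motor-control path.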

Follow the money: a $1.5B London round and a 34.5B yuan Chinese flood

Wayve's Series D is the headline number: $1.2B raised, $1.5B total secured, and an $8.6B post-money valuation, closed February 25, 2026 ^[5]. But the more telling figure is geographic balance. Chinese embodied-intelligence companies pulled in 34.5 billion yuan of financing across 2026, and 23 of them are now in the '10-billion-yuan club' ^[6]. Figure AI adds a third pool in San Francisco, at a $39B valuation as of September 2025 ^[4]. Three distinct capital pools are now sized to underwrite multi-year foundation-model R&D in parallel.

The investor lineup tells you what kind of company is being financed. Microsoft and Uber both publicly endorsed Wayve's end-to-end thesis at the Series D, with Satya Nadella saying Azure 'supports the scale, reliability, and safety needed to bring that innovation into the real world' and Dara Khosrowshahi calling the approach 'purpose-built for scale, safety, and effectiveness' ^[5]. That is a cloud hyperscaler and a global ride-hailing platform betting that the same learned stack Wayve drove zero-shot through more than 500 cities in Europe, North America and Japan generalises beyond the car ^[5]. The market sizing under all of this, $4.44B in 2025 growing to $23B by 2030 at a 39% CAGR ^[6], is what justifies writing billion-dollar checks before unit economics are proven.
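The market-sizing figures can be checked against each other with a few lines of arithmetic. This is a minimal sketch, assuming 2025 to 2030 counts as five annual compounding periods. The helper names `project` and `implied_cagr` are illustrative, not from any source.

```python
def project(start: float, rate: float, years: int) -> float:
    """Compound `start` forward at a constant annual growth `rate`."""
    return start * (1 + rate) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Constant annual growth rate that takes `start` to `end` in `years`."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    # Figures quoted in the text: $4.44B (2025), $23B (2030), 39% CAGR.
    projected_2030 = project(4.44, 0.39, 5)
    rate = implied_cagr(4.44, 23.0, 5)
    print(f"4.44B compounded at 39% for 5 years: {projected_2030:.2f}B")
    print(f"CAGR implied by 4.44B -> 23B: {rate:.1%}")
```

Under that assumption the three numbers are internally consistent: compounding $4.44B at 39% for five years lands within rounding of $23B.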

The three poles: London, Shanghai, San Francisco

Embodied AI now has three distinct geographic centres of gravity, each with a different operating model. London is the Wayve pole, where the bet is that a single end-to-end driving model trained on European, North American and Japanese streets generalises to any embodied platform, and Wayve Labs is explicitly the unit chartered to push that frontier beyond cars ^[7]. The framing is unambiguous: 'The future of AI won't happen behind a screen. We need embodied AI' ^[7].

Shanghai is the AGIBOT pole, where the bet is that vertically integrated humanoid manufacturing plus a ViLLA foundation model creates a flywheel. The April 2026 Partner Conference codified this as 'One Robotic Body, Three Intelligences' (locomotion, manipulation, and interactive) running on shared embodied-AI infrastructure ^[8]. Hitting 10,000 humanoids off the line in March 2026 ^[9] matters because every shipped robot is a data-collection node.

San Francisco is the Figure pole, where the dual-system Helix architecture anchors a Western humanoid stack ^[4]. The three poles are not competing on the same axis; they are competing on whose data and capital flywheel compounds fastest.

The contrarian read: GPT-2 stage, sped-up demos, and the hardware-software gap

The skeptic case has muscle, and some of the loudest doubts come from inside the field. On Reddit, the most upvoted strategic framing of the moment is X Square Robot founder Wang Qian's public characterisation of embodied AI as roughly at the GPT-2 stage, which is useful precisely because it sets expectations for a long, expensive trajectory before any consumer breakthrough. The community reaction to recent open-weights humanoid releases has skewed the same way: enthusiasm for the architecture, sharp pushback on launch reels rendered at 10x playback, and a steady drumbeat of questions about how to train these models on real embodiments rather than benchmark suites. Even Air Street Capital, broadly bullish on the architecture shift, frames VLAMs as the 'clearest expression' of the foundation-model playbook arriving in robotics ^[2]. That language signals optimism about trajectory, not present capability.

The quantitative picture cuts both ways. GO-2's 98.5% LIBERO and 86.6% LIBERO-Plus zero-shot numbers ^[1] are strong on academic benchmarks but a long distance from the messiness of a household kitchen. Morgan Stanley and Bank of America estimates put the humanoid market at roughly 90,000 units in 2026 and 1.2M by 2030 ^[10], which is meaningful at the factory level but trivial at the 'general-purpose helper' level the marketing implies. The honest reading: the architecture race is real, the capital is real, and the benchmark wins are real, yet the gap between a 200 Hz Helix loop and a humanoid that can reliably load a dishwasher unsupervised is still enormous.
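To put the unit estimates in perspective, the sketch below computes the growth multiple and the constant annual growth rate they imply. It assumes four annual periods between 2026 and 2030 and a smooth growth path, neither of which the cited estimates state. The function name `unit_growth` is illustrative.

```python
from typing import Tuple


def unit_growth(start_units: int, end_units: int, years: int) -> Tuple[float, float]:
    """Return (growth multiple, implied constant annual growth rate)."""
    multiple = end_units / start_units
    annual_rate = multiple ** (1 / years) - 1
    return multiple, annual_rate


if __name__ == "__main__":
    # Estimates quoted in the text: ~90,000 units (2026), 1.2M units (2030).
    multiple, annual_rate = unit_growth(90_000, 1_200_000, 4)
    print(f"growth multiple: {multiple:.1f}x")
    print(f"implied annual growth: {annual_rate:.0%}")
```

On these assumptions shipments would need to nearly double every year for four years running, which is one way to read how much of the forecast is trajectory and how little is present volume.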

Where the field's loudest voices are pointing

Across platforms, the conversation has converged on a shared framing more than on shared sentiment. The single highest-engagement voice on humanoid foundation models on X is NVIDIA's Jim Fan, who positions Project GR00T as 'a cornerstone for Foundation Agent' — the generalist embodied-AI roadmap whose vocabulary the rest of the field has quietly adopted. Wayve's own announcement of its embodied-AI round and Beijing X-Humanoid's open-sourcing of its XR-1 VLA model are the other anchor signals, and together they sketch a Western-vs-Chinese pattern: hyperscaler-backed proprietary stacks in London and SF, open-weights releases out of Beijing and Shanghai.

YouTube tells a curriculum story rather than a hype story. The most-watched panels are the AI House Davos 2026 embodied-AI conversation (a Yann LeCun and Marc Pollefeys discussion on transferring AI advances to robotics) and Stanford CS25's lecture by Google DeepMind's Fei Xia on low-level embodied intelligence with foundation models, alongside AGIBOT's official GO-1 architecture walkthrough showing the VLM → Latent Planner → Action Expert decomposition that GO-2 inherits. The fact that academic and industrial talks are the high-engagement category — not flashy demos — is itself a signal: this is being absorbed as a graduate-level architectural shift, not a consumer product yet.

Historical Context

2017-08-21

Wayve is founded in Cambridge by Alex Kendall and Amar Shah to pursue end-to-end learned driving without HD maps or hand-coded rules.

2023-02

AGIBOT is founded in Shanghai by ex-Huawei engineers Deng Taihua and Peng Zhihui.

2024-05-07

Alex Kendall publishes 'The Road to Embodied AI' thesis, arguing cognitive AI captures only a fraction of the prize.

2025-01

AGIBOT produces its 1,000th general-purpose embodied robot, marking the start of mass production.

2026-02-25

Wayve closes its $1.2B Series D ($1.5B total secured) at an $8.6B post-money valuation; Wayve Labs launches.

2026-03

AGIBOT rolls its 10,000th humanoid off the line.

2026-04-09

AGIBOT releases the GO-2 ViLLA foundation model with asynchronous dual-system planning/execution.

2026-04-17

AGIBOT's 2026 Partner Conference in Shanghai unveils the 'One Robotic Body, Three Intelligences' (locomotion / manipulation / interactive) framework.

Power Map

Key Players

Subject

Embodied AI foundation models for humanoid robots

Wayve

London-based autonomy lab; raised $1.2B Series D at $8.6B post-money valuation and launched Wayve Labs as a frontier embodied-AI research unit explicitly extending beyond driving to warehouse robots and humanoids.

AGIBOT

Shanghai humanoid maker; released GO-2 ViLLA foundation model on April 9, 2026 and rolled out its 10,000th humanoid in March 2026 — using shipped robots as a data-collection flywheel.

Figure AI

SF humanoid company valued at $39B as of September 2025; ships the Helix dual-system model running System 1 at 200 Hz and System 2 at 7-9 Hz — the same dual-rate pattern AGIBOT's GO-2 adopted.

Microsoft and Uber

Strategic backers of Wayve's Series D; CEOs publicly endorsed the end-to-end embodied-AI thesis at the funding announcement.

Mercedes-Benz, Nissan, Stellantis

OEM investors in Wayve's Series D, committing to a single embodied-AI foundation model as the software layer across vehicle programs.

Source Articles

Embodied AI + humanoid robot foundation models surge: Wayve Labs, AGIBOT AGILE, London robot-brain startups

THE SIGNAL.

Analysts

"Autonomy will not scale through city-by-city robotaxi deployments alone. It will scale through a trusted platform that automakers and fleets can deploy globally and improve continuously."

Alex Kendall

Co-Founder & CEO, Wayve

"Cognitive AI only unlocks a fraction of the ultimate potential of AI — the next frontier is embodied."

Alex Kendall

Co-Founder & CEO, Wayve

"Embodied intelligence is no longer a concept, it is becoming a new form of productive infrastructure."

Peng Zhihui

Co-founder, President & CTO, AGIBOT

"Wayve is pushing the frontier of embodied AI for autonomous driving, and Azure supports the scale, reliability, and safety needed to bring that innovation into the real world."

Satya Nadella

Chairman & CEO, Microsoft

"Wayve's powerful end-to-end approach is purpose-built for scale, safety, and effectiveness."

Dara Khosrowshahi

CEO, Uber

"VLAMs have become the clearest expression of how the foundation-model playbook is infusing new life into robotics."

Air Street Capital

London VC, State of AI Report authors

The Crowd

"Foundation Agent: a roadmap to build generally capable embodied AI that acts skillfully across many worlds, virtual or real. Project GR00T, the Humanoid robot foundation model, is a cornerstone for Foundation Agent. It's the North Star, the next grand challenge in our quest for"

@DrJimFan

"We've got some exciting news to share 🚨 We are thrilled to announce that we've raised $1.05bn in our latest fundraising round led by @SoftBank_Group with contributions from @nvidia and @Microsoft. This milestone will propel us in developing and launching our first Embodied AI"

@wayve_ai

"Beijing Humanoid Open-Sources XR-1 Ecosystem: Making Robots "Work"🤖 Beijing Innovation Center of Humanoid Robotics (X-Humanoid) has officially open-sourced XR-1, the first VLA model to pass China's national embodied AI standards, alongside the RoboMIND 2.0 dataset and ArtVIP"

@XRoboHub

"Sunday Robotics just introduced ACT-1, a frontier foundation model trained on zero robot data, behind their home wheeled-humanoid Memo From Sunday on X (thread with multiple videos)"

u/OpenSourceDroid4Life94

Broadcast

Embodied AI: Systems that See, Hear, and Act in the World Alongside Humans | AI House Davos 2026

Stanford CS25: V3 I Low-level Embodied Intelligence w/ Foundation Models

AgiBot GO-1: The Evolution of Generalist Embodied Foundation Model from VLA to ViLLA