AI Agents and Computer Use Capabilities
TECH

175+
Signals

Strategic Overview

  • 01.
    Anthropic launched Claude computer use on March 24, 2026 as a research preview, enabling Claude to open apps, navigate browsers, fill spreadsheets, and complete tasks autonomously via the new Dispatch feature. The system checks for available integrations first and falls back to screen-level computer control when no connector exists.
  • 02.
    OpenAI's GPT-5.4, released March 5, 2026, became the first AI model to surpass the human baseline on the OSWorld-Verified benchmark, achieving 75.0% versus the 72.4% human score -- a dramatic jump from GPT-5.2's 47.3%. It also supports a 1 million token context window.
  • 03.
    The Linux Foundation formed the Agentic AI Foundation (AAIF) in December 2025 with founding contributions from Anthropic's Model Context Protocol (MCP), Block's goose, and OpenAI's AGENTS.md. Over 10,000 MCP servers now exist, and the protocol has been adopted by Claude, Cursor, Microsoft Copilot, Gemini, VS Code, and ChatGPT.
  • 04.
    The agentic AI market grew from $5.25B in 2024 to $7.84B in 2025 and is projected to reach $52.62B by 2030, with vertical AI agent startups alone capturing over $15B in funding in 2025. However, 88% of organizations have reported AI agent security incidents, and only 14% have agentic solutions ready for deployment.

Why This Matters

The arrival of computer use capabilities marks a fundamental inflection point in AI: the shift from models that talk to models that act. For years, large language models have been conversational partners. They could draft emails, summarize documents, and answer questions, but they could not click a button, open a spreadsheet, or navigate a website on your behalf. That changed decisively in March 2026, when Anthropic and OpenAI shipped computer control features within weeks of each other: OpenAI built native computer use into GPT-5.4, and Anthropic released Claude computer use as a research preview. This is not an incremental improvement; it is a category change in what AI systems can do.

The stakes are enormous. Enterprise demand has been the primary accelerant: 80% of enterprise applications are expected to embed agents by late 2026, and vertical AI agent startups captured over $15 billion in funding in 2025 alone. The agentic AI market is projected to grow tenfold from $5.25 billion in 2024 to $52.62 billion by 2030. On social media, excitement is palpable but measured. A viral Reddit thread on r/artificial about Manus AI sparked debate over 'agent washing' -- the practice of rebranding simple automation as AI agents -- suggesting the community is developing a healthy skepticism that will pressure vendors to deliver genuine autonomy rather than marketing buzzwords. Meanwhile, on X.com, posts about Claude's Dispatch feature and MCP integrations with tools like Figma have garnered thousands of engagements, reflecting practitioner-level enthusiasm for real, usable agentic workflows.

How It Works

Modern computer use agents operate through a layered architecture that balances efficiency with generality. Anthropic's Claude computer use exemplifies this approach. Given a task, it first checks for available structured integrations such as Google Calendar, Slack, email connectors, and other API-based tools. If the right connector exists, it uses that connector directly for speed and reliability. When no structured integration is available, Claude falls back to controlling the computer the way a human would: taking screenshots to read the screen, moving the mouse, clicking buttons, and typing text. This dual-path design is pragmatic: API calls are faster and more reliable, while screen-level control provides coverage for any application with a graphical interface.
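The dual-path design described above amounts to a lookup with a fallback. The following is a minimal Python sketch under stated assumptions: `Task`, `route`, `screen_control`, and the connector table are hypothetical names invented for illustration, not part of Anthropic's API, and the screen-control path is a stub standing in for the real screenshot-and-act loop.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Task:
    """Hypothetical task description; not an Anthropic type."""
    app: str                      # e.g. "google_calendar", "legacy_erp"
    action: str                   # e.g. "create_event"
    payload: dict = field(default_factory=dict)


Connector = Callable[[Task], str]


def screen_control(task: Task) -> str:
    """Stub for the slow, general path (screenshot -> model -> mouse/keyboard)."""
    return f"screen:{task.app}:{task.action}"


def route(task: Task, connectors: Dict[str, Connector]) -> str:
    """Prefer a structured connector for the target app; otherwise fall back to the screen."""
    connector = connectors.get(task.app)
    if connector is not None:
        return connector(task)     # fast, reliable API path
    return screen_control(task)    # universal but slower path


# Hypothetical connector table: app name -> function that performs the action via an API.
connectors: Dict[str, Connector] = {
    "google_calendar": lambda t: f"api:calendar.{t.action}",
    "slack": lambda t: f"api:slack.{t.action}",
}

print(route(Task("google_calendar", "create_event"), connectors))  # api:calendar.create_event
print(route(Task("legacy_erp", "export_report"), connectors))      # screen:legacy_erp:export_report
```

Keeping the fallback in one place means adding a new connector automatically moves that app onto the faster path without touching the rest of the agent.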

The Model Context Protocol (MCP) is the connective tissue that makes this ecosystem work at scale. Described as "USB-C for AI," MCP is an open standard for connecting AI models to tools, data sources, and services. With over 10,000 MCP servers deployed and adoption across the major AI platforms (Claude, ChatGPT, Gemini, Cursor, VS Code, and Microsoft Copilot), it has achieved the network effects needed to become a de facto standard. As demonstrated in Anthropic's popular YouTube session "Building Agents with MCP" (327K views), MCP lets developers expose any tool or data source to an AI agent through a simple server interface. OpenAI's AGENTS.md complements this by providing project-level context: adopted by 60,000+ open source repositories, it gives agents the situational awareness they need to work effectively within specific codebases and workflows. Together, MCP and AGENTS.md form the protocol layer of the emerging agent stack, now governed under the neutral Agentic AI Foundation at the Linux Foundation.
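MCP messages are JSON-RPC 2.0, and servers expose tools through the `tools/list` and `tools/call` methods. The sketch below shows the shape of that exchange with an in-process dispatcher. It is illustrative only: it omits the `initialize` handshake, transports (stdio or HTTP), and capability negotiation, and real servers would normally be built with an official MCP SDK. `TOOLS`, `handle`, and the `add` tool are hypothetical names.

```python
import json
from typing import Any, Dict

# Hypothetical tool registry: name -> description, JSON Schema for inputs, and a handler.
TOOLS: Dict[str, Dict[str, Any]] = {
    "add": {
        "description": "Add two integers",
        "inputSchema": {
            "type": "object",
            "properties": {"a": {"type": "integer"}, "b": {"type": "integer"}},
            "required": ["a", "b"],
        },
        "handler": lambda args: str(args["a"] + args["b"]),
    }
}


def error(req_id: Any, code: int, message: str) -> str:
    """Build a JSON-RPC 2.0 error response."""
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}})


def handle(raw: str) -> str:
    """Handle one JSON-RPC 2.0 request for the tools/list and tools/call methods."""
    req = json.loads(raw)
    method, req_id = req.get("method"), req.get("id")
    if method == "tools/list":
        # Advertise each tool's name, description, and input schema (handler stays private).
        tools = [
            {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
            for name, t in TOOLS.items()
        ]
        result: Dict[str, Any] = {"tools": tools}
    elif method == "tools/call":
        params = req["params"]
        tool = TOOLS.get(params["name"])
        if tool is None:
            return error(req_id, -32602, "Unknown tool")   # invalid params
        text = tool["handler"](params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return error(req_id, -32601, "Method not found")   # standard JSON-RPC code
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": result})


request = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
           "params": {"name": "add", "arguments": {"a": 2, "b": 3}}}
print(handle(json.dumps(request)))
```

Because every tool advertises a JSON Schema for its inputs, a client can discover and call tools it has never seen before, which is what lets one protocol serve many platforms.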

By The Numbers

AI agent benchmark scores across OSWorld, GAIA L3, and CUB (2025-2026)

The benchmarks tell a story of rapid capability gains. OpenAI's GPT-5.4 achieved 75.0% on the OSWorld-Verified benchmark, surpassing the 72.4% human baseline and jumping 27.7 points from GPT-5.2's 47.3% just months earlier. It is the first AI model reported to exceed the human baseline at general computer tasks. On the more demanding Computer Use Benchmark (CUB), which tests 106 real-world workflows end to end, most agents still score in the single digits (Writer's Action Agent leads at 10.4%), indicating that fully autonomous multi-step computer operation remains an unsolved challenge. On the GAIA Level 3 benchmark for general agent capabilities, Writer leads at 61%, with Manus AI at approximately 57.7%.
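The OSWorld-Verified figures quoted above can be checked with a few lines of arithmetic. The sketch uses only the three scores given in the text; it separates the absolute gain (in percentage points) from the relative gain (as a fraction of the earlier score), since the two are easy to conflate.

```python
# OSWorld-Verified scores quoted in the text, in percent.
gpt_5_2 = 47.3
gpt_5_4 = 75.0
human_baseline = 72.4

absolute_gain = gpt_5_4 - gpt_5_2             # percentage points
relative_gain = gpt_5_4 / gpt_5_2 - 1         # fraction of the earlier score
margin_over_human = gpt_5_4 - human_baseline  # percentage points

print(f"absolute gain: {absolute_gain:.1f} points")                  # 27.7 points
print(f"relative gain: {relative_gain:.1%}")                         # 58.6%
print(f"margin over human baseline: {margin_over_human:.1f} points") # 2.6 points
```

So the jump is roughly 1.6x the earlier score, and the margin over the human baseline is a narrow 2.6 points.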

The market and adoption numbers are equally striking. The agentic AI market grew from $5.25 billion in 2024 to $7.84 billion in 2025. Gartner recorded a 1,445% surge in enterprise inquiries about multi-agent systems from Q1 2024 to Q2 2025. Over 10,000 MCP servers have been deployed, and AGENTS.md has been adopted by 60,000+ open source projects. The security picture, however, is sobering: 88% of organizations have reported confirmed or suspected AI agent security incidents, rising to 92.7% in healthcare. Only 14.4% of deployed agents went live with full security and IT approval. Prompt injection remains the top vulnerability in the OWASP Top 10 for LLM applications, and separately, fine-tuning attacks have bypassed safeguards in Claude Haiku in 72% of cases and in GPT-4o in 57%. Shadow AI breaches cost an average of $670,000 more than standard security incidents. Gartner predicts that 40% of agentic AI projects will fail by 2027 due to legacy system incompatibility, and only 14% of organizations currently have agentic solutions ready for deployment.
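The market figures quoted in this section ($5.25B in 2024, $7.84B in 2025, and a projected $52.62B in 2030) imply a compound annual growth rate that can be computed directly. This is a sketch that only rearranges the quoted numbers; it is not an independent forecast.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by growing from `start` to `end` over `years` years."""
    return (end / start) ** (1 / years) - 1


# Market sizes quoted in the text, in billions of US dollars.
market = {2024: 5.25, 2025: 7.84, 2030: 52.62}

print(f"2024->2025 growth: {market[2025] / market[2024] - 1:.1%}")
print(f"2024->2030 CAGR:   {cagr(market[2024], market[2030], 2030 - 2024):.1%}")
print(f"2025->2030 CAGR:   {cagr(market[2025], market[2030], 2030 - 2025):.1%}")
```

The projection amounts to roughly 47% compound growth per year for six years, close to the 49% growth already observed between 2024 and 2025.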

Impacts & What's Next

The workforce implications are profound but nuanced. Projections suggest AI agents will outnumber humans 82:1 in the enterprise by end of 2026, and Gartner predicts 15% of day-to-day work decisions will be made autonomously by 2028. Yet the International AI Safety Report from early 2026 provides a grounding counterpoint: while agents can complete many software engineering tasks with limited oversight, they cannot yet handle the full range of complex tasks and long-term planning required to automate entire jobs. The current reality is augmentation, not replacement. As Wharton's Ethan Mollick observes, true agents are already here -- companies just are not using them yet at scale.

The competitive landscape is intensifying rapidly. Anthropic and OpenAI are in a direct sprint on computer use. Google's Project Mariner targets web and mobile automation through Gemini, while Microsoft integrates Copilot Studio and Fara, a 7B-parameter model for PC automation, across the Windows and Office ecosystem at $30/user/month. NVIDIA is building open agent development platforms for enterprise knowledge work and showcased local agent execution on RTX PCs at GTC 2026. Startups like Manus AI have attracted millions of users with consumer-friendly agent interfaces, while Writer has quietly taken the lead on rigorous benchmarks with its enterprise-focused Action Agent. Reddit discussions of Claude's computer use have praised its permission-first approach but flagged the macOS-only limitation and usage caps as practical pain points. Looking ahead, the AAIF's governance framework, combined with continued benchmark improvements and security hardening, will determine whether agentic AI fulfills its transformative promise or collapses under the weight of premature deployment and unresolved trust gaps.
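A "permission-first" approach means the agent asks before it acts on an application the user has not yet approved. The sketch below illustrates that idea only; `PermissionGate` and `ask_user` are hypothetical names, and this is not Anthropic's implementation. The user prompt is simulated with a callback so the example runs without any UI.

```python
from typing import Callable, List, Set


class PermissionGate:
    """Hypothetical permission-first gate: the agent may act on an app only after
    the user has approved that app. Decisions are remembered to avoid re-prompting."""

    def __init__(self, ask_user: Callable[[str], bool]):
        self.ask_user = ask_user          # prompts a human; returns True to allow
        self.approved: Set[str] = set()   # apps the user has allowed
        self.denied: Set[str] = set()     # apps the user has refused

    def allow(self, app: str) -> bool:
        if app in self.approved:
            return True
        if app in self.denied:
            return False
        if self.ask_user(app):            # first contact with this app: ask
            self.approved.add(app)
            return True
        self.denied.add(app)
        return False


prompts: List[str] = []


def fake_prompt(app: str) -> bool:
    """Simulated user: records each prompt and refuses the banking app."""
    prompts.append(app)
    return app != "Banking"


gate = PermissionGate(fake_prompt)
print(gate.allow("Spreadsheet"), gate.allow("Spreadsheet"), gate.allow("Banking"))  # True True False
print(prompts)  # ['Spreadsheet', 'Banking']
```

Caching decisions keeps prompts to one per app; a production system would likely also scope approvals per session and let users revoke them.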

The Bigger Picture

We are witnessing the emergence of a new computing paradigm, and the shift is architectural rather than incremental. Just as the web browser created a universal interface for information, and smartphones created a universal interface for connectivity, computer use agents are creating a universal interface for action. The key enabling insight is that by controlling screens and keyboards, the same interfaces humans use, AI agents gain access to every application ever built without requiring any application to be modified. MCP and AGENTS.md then provide structured shortcuts where they exist, creating a graceful capability spectrum from API-native to screen-native operation.

The standardization happening through the AAIF is historically significant. Having Anthropic, OpenAI, Google, and Microsoft collaborate on shared agent protocols under Linux Foundation governance mirrors the early days of web standards. As Dell CTO John Roese notes, the real breakthrough is agents that can "pass context between each other, reason across boundaries, and interact over protocols like agent-to-agent." Stanford's popular webinar on agentic AI (622K views) frames this progression clearly: from simple tool use, through reflection and planning, to fully iterative autonomous operation. Educational content from Stanford and Anthropic, including Barry Zhang's "How We Build Effective Agents" (414K views), which emphasizes simplicity-first design, suggests the field is maturing past hype into principled engineering. The spread of the term "agent washing" on Reddit indicates a community that demands substance over spectacle, a healthy immune response that will ultimately strengthen the ecosystem by rewarding genuine capability over marketing claims.
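The progression from tool use through reflection to iterative operation can be pictured as a simple loop: propose an action, execute it with a tool, record the observation, check whether the goal is met, and repeat within a step budget. The sketch below is a toy under loud assumptions. `run_agent`, `propose`, and the `add` tool are hypothetical, the "model" is a deterministic stub, and the "reflection" step is reduced to a goal check so the loop's structure stays visible.

```python
from typing import Callable, Dict, List, Optional, Tuple

Step = Tuple[str, int]  # (tool name, argument) chosen by the planner


def run_agent(
    goal: int,
    propose: Callable[[int, List[str]], Step],
    tools: Dict[str, Callable[[int, int], int]],
    max_steps: int = 10,
) -> Tuple[Optional[int], List[str]]:
    """Iterate plan -> act -> observe -> check until the goal is met or the budget runs out."""
    state = 0
    history: List[str] = []
    for _ in range(max_steps):
        tool_name, arg = propose(goal - state, history)   # plan (a model call in a real agent)
        state = tools[tool_name](state, arg)               # act through a tool
        history.append(f"{tool_name}({arg}) -> {state}")   # observe and record
        if state == goal:                                  # check: is the goal reached?
            return state, history
    return None, history                                   # step budget exhausted


def propose(remaining: int, history: List[str]) -> Step:
    """Toy planner standing in for a model: move toward the goal in steps of at most 5."""
    return ("add", min(5, remaining))


tools = {"add": lambda state, arg: state + arg}
final, trace = run_agent(12, propose, tools)
print(final, trace)
```

The explicit step budget is the design choice worth noting: it bounds cost and keeps a misbehaving planner from looping forever.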

Historical Context

2023-11-01
The GAIA benchmark was introduced as a standard for evaluating AI agent general-purpose capabilities, establishing a common yardstick for the emerging field.
2025-03-01
Manus AI launched as a consumer-friendly general-purpose AI agent, rapidly gaining millions of users and scoring approximately 57.7% on the GAIA Level 3 benchmark.
2025-06-01
The Computer Use Benchmark (CUB) was released as the first comprehensive 106-workflow benchmark for end-to-end computer use tasks, providing a standardized evaluation framework.
2025-08-01
OpenAI released AGENTS.md, a markdown-based standard for providing project-specific guidance to AI coding agents, subsequently adopted by over 60,000 open source projects.
2025-12-09
The Linux Foundation formed the Agentic AI Foundation (AAIF) with founding contributions from Anthropic (MCP), Block (goose), and OpenAI (AGENTS.md), establishing neutral governance for agentic AI standards.
2026-03-05
OpenAI released GPT-5.4 with native computer use capabilities, achieving 75% on OSWorld-Verified and becoming the first AI model to surpass the 72.4% human baseline.
2026-03-24
Anthropic launched Claude computer use as a research preview for Pro and Max subscribers, enabling Claude to open apps, navigate browsers, and complete tasks on users' computers via the Dispatch feature.

Power Map

Key Players

Anthropic

Created Claude computer use and the Model Context Protocol (MCP), which has become the de facto standard for AI-tool integration with 10,000+ servers. Platinum member of the Agentic AI Foundation.

OpenAI

Released GPT-5.4 with native computer use that surpassed the human baseline on OSWorld. Contributed AGENTS.md (adopted by 60,000+ open source projects) to the AAIF. Platinum member.

Google

Developing Project Mariner for web and mobile agent automation using Gemini 2.5 Pro. Announced official MCP support for Google services. Platinum member of AAIF.

Microsoft

Integrates Copilot Studio and Fara (a 7B-parameter model for PC automation) into the Windows and Office ecosystem at $30/user/month. Platinum member of AAIF.

Linux Foundation / AAIF

Established the Agentic AI Foundation as a neutral governance structure for agentic AI standards, with platinum members including AWS, Anthropic, Block, Bloomberg, Cloudflare, Google, Microsoft, and OpenAI.

Writer

Created Action Agent, which leads both the GAIA Level 3 (61%) and CUB (10.4%) benchmarks. Enterprise-focused, with 600+ app connectors and a 1M-token context window via the Palmyra X5 LLM.

THE SIGNAL.

Analysts

"True agents are already here. You're just not using them. Companies are building agentic workflows that do a lot of work autonomously at high accuracy levels."

Ethan Mollick
Professor, Wharton School

"Don't simply pave the cow path. Instead, take advantage of this AI evolution to reimagine how agents can best collaborate."

Brent Collins
VP of AI Strategy, Intel

"Agents have the ability to pass context between each other, to reason across boundaries, and to interact over protocols like agent-to-agent."

John Roese
CTO & Chief AI Officer, Dell

"Although agents have demonstrated the ability to complete a variety of software engineering tasks with limited human oversight, they cannot yet complete the range of complex tasks and long-term planning required to fully automate many jobs; agents complement rather than replace humans."

International AI Safety Report authors
International AI Safety Report, 2026
The Crowd

"you paying attention? Anthropic is closing the gap on the exact infrastructure that made OpenClaw so valuable. Claude's recently shipped: dispatch (text your agent from your phone, it works on your machine), scheduled tasks (recurring autonomous workflows), remote..."

@kloss_xyz (552)

"You can now use AI agents / Claude Code to design directly in Figma with full access to your design system context. Figma just launched new use_figma MCP tool and skills to teach them."

@rohanpaul_ai (8500)

"AI Agents vs. Agentic AI — AI Agents react to prompts; Agentic AI initiates and coordinates tasks. Agentic AI includes orchestrators and meta-agents to assign and oversee sub-agents."

@rohanpaul_ai (1000)

"Manus: A Fully Autonomous AI Agent That Can Browse, Code, and Execute Tasks"

u/unknown (15000)
Broadcast
Stanford Webinar - Agentic AI: A Progression of Language Model Usage

How We Build Effective Agents: Barry Zhang, Anthropic

Building Agents with Model Context Protocol - Full Workshop with Mahesh Murag of Anthropic
