OpenAI Agents SDK update with sandboxing and open-source harness
Strategic Overview

  • 01.
    OpenAI released a major update to its Agents SDK on April 15, 2026, introducing native sandbox execution and an enhanced model-native harness aimed at enterprise agent development.
  • 02.
    The SDK separates the control harness from the compute layer so credentials stay out of the environments where model-generated code actually runs.
  • 03.
    Sandboxes give agents an isolated, Unix-like workspace with a filesystem, shell, installed packages, mounted data, exposed ports, snapshots, and controlled external access, described via a portable Manifest abstraction.
  • 04.
    The new harness and sandbox capabilities launch first in Python, with TypeScript support planned for a later release, and are offered to all customers via the API at standard pricing.

The Split That Matters: Harness Above, Sandbox Below

The single most consequential shift in this release is architectural, not featural. Previous Agents SDK deployments tended to collapse the orchestration logic and the execution environment into the same process — the code that decided what an agent should do next also ran whatever the model wrote. OpenAI has now pulled those apart. The harness, which holds plans, tool definitions, memory, and credentials, stays in the developer's trusted environment. The sandbox, which runs model-generated shell commands and code, lives on a separate compute backend — Cloudflare, Vercel, Modal, E2B, Blaxel, Daytona, Runloop, Docker, or a local Unix option.

What falls out of that split is a security posture that was awkward to achieve before: 'No API keys, no secrets in that sandbox. You want it to be totally isolated — probably isolated from the network in a lot of cases,' as an OpenAI representative told The New Stack. If the sandbox gets compromised by a prompt injection or a misbehaving tool call, it doesn't have the credentials to do anything interesting. The harness is where auth lives; the sandbox is where risk is contained. The Manifest abstraction — a portable description of a fresh sandbox's starting contents — is the mechanism that lets the same agent code target any of the eight integrated providers without rewriting infrastructure glue.
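The credential boundary described above can be sketched in a few lines. This is an illustrative stand-in, not the SDK's actual API: the harness process holds the secrets and launches model-generated commands in a child process with an empty environment, so there is nothing in the sandbox worth stealing.

```python
import subprocess

# Secrets live only in the harness process (placeholder value).
HARNESS_SECRETS = {"OPENAI_API_KEY": "sk-..."}

def run_in_sandbox(command: list[str]) -> str:
    """Execute a model-generated command with no inherited credentials."""
    result = subprocess.run(
        command,
        env={},                # empty environment: no API keys, no secrets
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.stdout

# The sandboxed process cannot see the harness's secrets:
output = run_in_sandbox(["env"])
assert "OPENAI_API_KEY" not in output
```

A real deployment would run the child on a separate compute backend rather than a local subprocess, but the invariant is the same: auth stays above the boundary, risk stays below it.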

Why The Timing: Agents That Run For Days, Not Steps

The release reads like a routine SDK update until you stack it against how model capability has shifted. OpenAI frames the change in blunt terms: earlier-generation agents 'could take five, six, seven steps maybe in a workflow, but not really go beyond that,' while current models 'can kind of work for hours at a time or days or weeks.' That is a two- to three-order-of-magnitude jump in runtime horizon, and it breaks the assumptions of an in-process harness. A five-step agent can live inside a single function call. A five-day agent cannot — it needs durable state that survives process restarts, human approvals, and weekend outages.

That is what the new RunState, session_state, and snapshot primitives are for. RunState snapshots serialize to JSON, so a long-horizon workflow can pause for a compliance review, resume on a different worker, or replay after a crash. Before this, the standard workaround was to bolt on an external durable-execution system — the AI Engineer talk 'OpenAI + Temporalio: Building Durable, Production Ready Agents' illustrates the pattern, with Temporal framing itself as the orchestration brain agents were missing. Bringing snapshotting in-house changes the calculus: durability is no longer an integration, it is a primitive. Independent orchestration vendors now have to argue why an external system is still worth the wire-up.
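The pause/resume pattern those primitives enable is easy to see in miniature. The class and field names below are assumptions for illustration, not the SDK's published `RunState` interface; the point is that serializing the full run state to JSON is what lets a workflow outlive any single process.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class RunState:
    step: int = 0
    pending_approval: bool = False
    memory: dict = field(default_factory=dict)

    def snapshot(self) -> str:
        """Serialize the full run state to JSON for durable storage."""
        return json.dumps(asdict(self))

    @classmethod
    def restore(cls, blob: str) -> "RunState":
        """Rehydrate a run on a different worker, or after a crash."""
        return cls(**json.loads(blob))

# Pause for a compliance review, then resume elsewhere:
state = RunState(step=42, pending_approval=True, memory={"goal": "migrate DB"})
blob = state.snapshot()           # persist to a database or object store
resumed = RunState.restore(blob)  # possibly days later, on another machine
assert resumed == state
```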

Three Theories Of Where Agent Value Lives

The most interesting reaction on X is not cheerleading but a live strategic disagreement about what an agent platform even is. One widely circulated framing contrasts three bets: Anthropic's (the model and the harness are one integrated product, as with Claude Code); OpenAI's (the model is a commodity, the harness is open, and the sandbox is bring-your-own); and Cursor's (any model, any harness, value accrues to the IDE layer). The OpenAI release is the clearest statement yet of the middle bet — it open-sources the orchestration scaffolding and invites every sandbox vendor onto an equal footing.

One widely shared developer-side framing on X pushed the point further: 'the real product is no longer the model. It's the control plane around the model.' If that read is right, the winners of this release are infrastructure players who can plug into the Manifest interface, and the losers are single-purpose 'agent framework' startups whose moat was orchestration glue. The partners OpenAI lists — Cloudflare, Vercel, Modal, E2B, and peers — benefit today from distribution while accepting that the control plane itself is now owned by OpenAI, not them.

What The Skeptics Are Saying

The update is not receiving uniform applause. On r/AI_Agents, one developer pushed back that 'there's a whole class of agent work that fundamentally can't be sandboxed: anything that needs to interact with native desktop apps through accessibility APIs' — a reminder that Unix-like isolation is a poor fit for Windows automation, operator-style screen control, or anything else whose value is specifically that it touches the real machine. A Vercel-adjacent comment flagged the other structural objection: Claude's and OpenAI's SDKs 'lock you to one provider,' which is why multi-provider wrappers like the Vercel AI SDK keep pulling interest even as OpenAI expands its own stack. And a running thread of smaller-scale developers argued that 'for small projects, a simple loop around API calls still works fine' — the full harness/sandbox machinery is overhead nobody asked for below a certain complexity threshold.

There are narrower but concrete gaps too. TypeScript developers do not get the new harness and sandbox APIs on day one; Python ships first, with TypeScript 'planned for a later release.' For teams whose agent code already lives in Node, that is a non-trivial wait. And the sandbox-provider comparison work being done on Reddit — ranking systems like SmolVM, Microsandbox, OpenSandbox, and E2B on snapshotting, fork/clone, and pause/resume — suggests the eight-provider roster is less commoditized than the uniform Manifest interface implies. Snapshot semantics in particular vary, and 'portable' will only be as real as the lowest-common-denominator feature set.

By The Numbers

Day-one integrations and compatibility in OpenAI's April 15, 2026 Agents SDK update

The quantitative shape of the release tells the consolidation story clearly. Eight integrated sandbox providers ship on day one — Blaxel, Cloudflare, Daytona, Docker, E2B, Modal, Runloop, and Vercel — plus a Unix-local option for development. Four enterprise cloud storage backends are mountable directly into sandboxes: AWS S3, Google Cloud Storage, Azure Blob Storage, and Cloudflare R2, which covers essentially the full set of object stores a Fortune-500 buyer is likely to already run. The SDK is also advertised as compatible with 100+ non-OpenAI LLMs via Chat Completions-compatible endpoints, which positions the harness as deliberately model-agnostic even as it is published by a model vendor.
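The Manifest idea in that roster can be sketched as a single portable spec rendered for any backend. Everything below is a hypothetical shape for illustration — the class, field, and function names are assumptions, not the SDK's published interface — but it captures why one description can target nine environments without provider-specific glue.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Manifest:
    """Portable description of a fresh sandbox's starting contents."""
    base_image: str
    packages: tuple[str, ...] = ()
    mounts: tuple[str, ...] = ()      # e.g. object-store buckets
    exposed_ports: tuple[int, ...] = ()

# The eight integrated providers, plus the local development option.
SUPPORTED_PROVIDERS = {
    "blaxel", "cloudflare", "daytona", "docker",
    "e2b", "modal", "runloop", "vercel", "unix-local",
}

def provision(manifest: Manifest, provider: str) -> dict:
    """Translate one manifest into a provider-specific request (stub)."""
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unknown sandbox provider: {provider}")
    return {"provider": provider, "spec": manifest}

spec = Manifest(
    base_image="python:3.12",
    packages=("pandas",),
    mounts=("s3://analytics-raw",),   # S3, GCS, Azure Blob, or R2
    exposed_ports=(8080,),
)
# The same manifest targets any integrated backend without new glue:
requests = [provision(spec, p) for p in ("docker", "modal", "cloudflare")]
assert all(r["spec"] == spec for r in requests)
```

The caveat from the skeptics applies here: portability holds only for features every backend implements the same way, with snapshot semantics the most likely divergence point.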

The most striking number, though, is the one that isn't a unit count: the agent-workflow horizon has moved from roughly five to seven steps under previous-generation models to hours, days, or weeks of continuous autonomous work under current frontier models. Every other design decision in this release — snapshots, configurable memory, isolated compute, mountable cloud storage — follows from that single shift. If you only read one data point out of this launch, read that one.

Historical Context

2024-10
Released Swarm, an experimental multi-agent framework that became the precursor to the Agents SDK.
2025-03-11
Launched the Responses API and the first production Agents SDK, replacing Swarm with built-in tracing, handoffs, and tool use.
2026-04-15
Published the 'next evolution' update adding sandboxed execution, configurable memory, snapshotting, and multi-provider support.

Power Map

Key Players
Subject

OpenAI Agents SDK update with sandboxing and open-source harness

OP

OpenAI

Vendor releasing the updated Agents SDK and positioning it as the orchestration layer for enterprise agents on top of the Responses API. Its decision to keep the harness open-source and provider-agnostic sets the default shape of the ecosystem.

CL

Cloudflare, Vercel, Modal, E2B

First-wave integrated sandbox providers whose runtimes are wired in via dedicated clients; they gain distribution from the SDK while ceding the orchestration layer to OpenAI. Cloudflare also supplies R2 as a mountable data backend.

BL

Blaxel, Daytona, Runloop, Docker

Additional integrated sandbox providers supported out of the box alongside a Unix-local option for development, broadening portability of the Manifest abstraction across infrastructure choices.

EN

Enterprise developers

Primary audience demanding isolated, network-restricted environments with no secrets in the sandbox; their compliance posture is the reason the harness/compute split exists in this shape.

TH

Third-party agent framework vendors

Indirect losers whose differentiation narrows as OpenAI ships first-party orchestration scaffolding, persistent state, and provider-agnostic sandboxing in a single supported stack.

THE SIGNAL.

Analysts

"Frames sandboxing as an enterprise-deployment requirement: 'No API keys, no secrets in that sandbox. You want it to be totally isolated — probably isolated from the network in a lot of cases.'"

OpenAI product/engineering representative
OpenAI, interviewed by The New Stack

"Argues the model step-count has expanded dramatically: earlier agents 'could take five, six, seven steps' in a workflow, but current frontier models 'can kind of work for hours at a time or days or weeks,' which is what motivates the new harness and durable state."

OpenAI product/engineering representative
OpenAI, interviewed by The New Stack

"Identifies unsupervised agent behavior as the central risk the sandbox mitigates: 'running agents in a totally unsupervised fashion can be risky due to their occasionally unpredictable nature.'"

TechCrunch
Technology publication

"Characterizes the update as 'orchestration scaffolding for complex, multi-step tasks. Developers bring their own infrastructure; the harness provides persistent state' — framing the release as consolidation of previously fragmented open-source agent plumbing."

OpenLink Software
Industry commentator

"Reads the SDK as a strategic play to collapse the multi-framework, multi-vector-DB agent tooling stack into a single standardized platform controlled by OpenAI."

VentureBeat analysis
Technology publication
The Crowd

"Build long-running agents with more control over agent execution. New capabilities in the Agents SDK: • Run agents in controlled sandboxes • Inspect and customize the open-source harness • Control when memories are created and where they're stored"

@OpenAIDevs

"Bring your own environment. The Agents SDK now supports sandbox execution with providers including @Cloudflare @Vercel @modal @e2b, and more. Keep files, credentials, and execution state in your environment while passing approved context to the model."

@OpenAIDevs

"OpenAI just turned the Agents SDK into a long-running agent runtime with sandbox execution and direct control over memory and state. Before this, developers often had to stitch together 3 separate pieces themselves: the model loop, the machine where code runs, and the memory or state that lets a task continue later. That sounds small, but it is usually where agents fail, because a model may know what to do yet still lose files, lose progress, leak secrets, crash mid-task, or restart from scratch. The new SDK gives one standard setup for that missing layer: a harness that manages the agent loop, plus a sandbox where the risky work happens, plus control over when memory is created and where it is stored. Runs can now survive pauses and failures through snapshotting and rehydration. Developers can inspect the open-source harness, choose when memories are written and where they live, and use Cloudflare, Vercel, Modal, E2B, or their own environment."

@rohanpaul_ai

"I compared sandbox options for AI agents. Here's my ranking."

u/aniketmaurya
Broadcast
OpenAI + @Temporalio : Building Durable, Production Ready Agents - Cornelia Davis, Temporal

OpenAI's BRAND NEW Agents SDK (Crash Course)

Temporal + OpenAI Agents SDK Demo: Build Production-Ready Agents, Fast