Sandboxes

Agents generate code, interact with filesystems, and run shell commands. Because we can’t predict what an agent might do, it’s important that its environment is isolated so it can’t access credentials, files, or the network. Sandboxes provide this isolation by creating a boundary between the agent’s execution environment and your host system.

In deep agents, sandboxes are backends that define the environment where the agent operates. Unlike other backends (State, Filesystem, Store) which only expose file operations, sandbox backends also give the agent an execute tool for running shell commands.

When you configure a sandbox backend, the agent gets:

All standard filesystem tools (ls, read_file, write_file, edit_file, glob, grep)
The execute tool for running arbitrary shell commands in the sandbox
A secure boundary that protects your host system

Why use sandboxes?

A sandbox provides:

Security — Code runs in isolation; it can’t access your credentials, files, or network
Clean environments — Use specific dependencies or OS configurations without local setup
Reproducibility — Consistent execution environments across teams
Flexibility — Switch between cloud providers or use local VFS for development

Unlike an in-process Python interpreter, sandboxes also provide the agent with the following abilities:

Shell and git — Run arbitrary commands, use version control, and work with different languages in the same session
Clone repositories — Many providers offer native git APIs (e.g., Daytona’s git operations) so the agent can clone and work with repos
Docker-in-Docker — Run containers inside the sandbox for build and test pipelines

Available providers

For provider-specific setup, authentication, and lifecycle details, see the provider integration pages:

Modal

ML/AI workloads, GPU access, Python.

Daytona

TypeScript/Python development, fast cold starts.

Deno

Deno/JavaScript workloads, microVMs.

Node VFS

Local development, testing, no cloud required.

Basic usage

import { createDeepAgent } from "deepagents";
import { ChatAnthropic } from "@langchain/anthropic";
import { DenoSandbox } from "@langchain/deno";

// Create and initialize the sandbox
const sandbox = await DenoSandbox.create({
  memoryMb: 1024,
  lifetime: "10m",
});

try {
  const agent = createDeepAgent({
    model: new ChatAnthropic({ model: "claude-opus-4-6" }),
    systemPrompt: "You are a JavaScript coding assistant with sandbox access.",
    backend: sandbox,
  });

  const result = await agent.invoke({
    messages: [
      {
        role: "user",
        content:
          "Create a simple HTTP server using Deno.serve and test it with curl",
      },
    ],
  });
} finally {
  await sandbox.close();
}

How sandboxes work

Isolation boundaries

All sandbox providers protect your host system from the agent’s filesystem and shell operations. The agent cannot read your local files, access environment variables on your machine, or interfere with other processes. However, sandboxes alone do not protect against:

Context injection — An attacker who controls part of the agent’s input can instruct it to run arbitrary commands inside the sandbox. The sandbox is isolated, but the agent has full control within it.
Network exfiltration — Unless network access is blocked, a context-injected agent can send data out of the sandbox over HTTP or DNS. Some providers support blocking network access (e.g., blockNetwork: true on Modal).

See security considerations for how to handle secrets and mitigate these risks.

The `execute` method

Sandbox backends have a simple architecture: the only method a provider must implement is execute(), which runs a shell command and returns its output. Every other filesystem operation — read, write, edit, ls, glob, grep — is built on top of execute() by the BaseSandbox base class, which constructs scripts and runs them inside the sandbox via execute(). This design means:

Adding a new provider is straightforward. Implement execute() — the base class handles everything else.
The execute tool is conditionally available. On every model call, the harness checks whether the backend implements SandboxBackendProtocol. If not, the tool is filtered out and the agent never sees it.
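
To make the layering concrete, here is a minimal sketch of the idea, not the actual BaseSandbox source. The names SketchSandbox, RecordingSandbox, ExecResult, and shellQuote are hypothetical, and the shell commands (cat, ls -1) are assumptions about how such a base class might build its scripts. The fake provider records commands instead of running them, so the sketch runs without any sandbox.

```typescript
// Hypothetical sketch: illustrative names only, not the deepagents API.
interface ExecResult {
  output: string;
  exitCode: number;
}

// Quote a string so it is passed as a single POSIX shell argument.
function shellQuote(arg: string): string {
  return "'" + arg.replace(/'/g, "'\\''") + "'";
}

abstract class SketchSandbox {
  // The only method a provider has to supply.
  abstract execute(command: string): Promise<ExecResult>;

  // File operations build a command string and delegate to execute().
  async readFile(path: string): Promise<string> {
    const result = await this.execute(`cat ${shellQuote(path)}`);
    if (result.exitCode !== 0) {
      throw new Error(`read failed: ${result.output}`);
    }
    return result.output;
  }

  async ls(dir: string): Promise<string[]> {
    const result = await this.execute(`ls -1 ${shellQuote(dir)}`);
    if (result.exitCode !== 0) {
      throw new Error(`ls failed: ${result.output}`);
    }
    return result.output.split("\n").filter((line) => line.length > 0);
  }
}

// A fake provider: it records each command and returns canned output.
class RecordingSandbox extends SketchSandbox {
  readonly commands: string[] = [];

  async execute(command: string): Promise<ExecResult> {
    this.commands.push(command);
    return { output: "a.txt\nb.txt\n", exitCode: 0 };
  }
}
```

A real provider would replace RecordingSandbox.execute() with a call into its own runtime; readFile() and ls() would keep working unchanged, which is the point of the design.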

When the agent calls the execute tool, it provides a command string and gets back the combined stdout/stderr, the exit code, and a truncation notice if the output was too large. You can also call the backend’s execute() method directly in your application code. For example, a command that succeeds and a call to a nonexistent command (foobar) return results like these:

4
[Command succeeded with exit code 0]

bash: foobar: command not found
[Command failed with exit code 127]
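
The result text shown above could be assembled by a small formatter like the following. This is a sketch under assumptions: the ExecuteResult shape, the function name formatExecuteResult, and the wording of the truncation notice are hypothetical; only the two status-line formats are taken from the examples above.

```typescript
// Hypothetical shape of a raw execute() result.
interface ExecuteResult {
  output: string; // combined stdout/stderr
  exitCode: number;
  truncated: boolean;
}

// Turn a raw result into the text the agent reads.
function formatExecuteResult(result: ExecuteResult): string {
  const status =
    result.exitCode === 0
      ? `[Command succeeded with exit code ${result.exitCode}]`
      : `[Command failed with exit code ${result.exitCode}]`;

  const lines: string[] = [];
  const body = result.output.trimEnd();
  if (body.length > 0) {
    lines.push(body);
  }
  lines.push(status);
  if (result.truncated) {
    // Assumed wording; the real notice may differ.
    lines.push("[Output was truncated]");
  }
  return lines.join("\n");
}
```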

If a command produces very large output, the result is automatically saved to a file and the agent is instructed to use read_file to access it incrementally. This prevents context window overflow.
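
One way to picture this offloading step is the sketch below. It is an illustration, not the library’s implementation: the function name offloadLargeOutput, the character threshold, the file path, and the wording of the notice are all assumptions, and the saveFile callback stands in for whatever storage the harness uses.

```typescript
interface ToolReply {
  content: string; // what the agent sees
  savedTo?: string; // set only when the output was offloaded
}

// Return small outputs inline; save large ones and point the agent at the file.
function offloadLargeOutput(
  output: string,
  maxChars: number, // hypothetical threshold
  path: string, // hypothetical destination path
  saveFile: (path: string, content: string) => void,
): ToolReply {
  if (output.length <= maxChars) {
    return { content: output };
  }
  saveFile(path, output);
  return {
    content:
      `Output was ${output.length} characters, too large to show inline. ` +
      `It was saved to ${path}. Use read_file to read it incrementally.`,
    savedTo: path,
  };
}
```

Keeping the full output in a file means nothing is lost: the agent can still page through it, but only the parts it asks for enter the context window.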

Two planes of file access

There are two distinct ways files move in and out of a sandbox, and it’s important to understand when to use each:

Agent filesystem tools: read_file, write_file, edit_file, ls, glob, grep, and execute are the tools the LLM calls during its execution. These go through execute() inside the sandbox. The agent uses them to read code, write files, and run commands as part of its task.
File transfer APIs: the uploadFiles() and downloadFiles() methods that your application code calls. These use the provider’s native file transfer APIs (not shell commands) and are designed for moving files between your host environment and the sandbox.

Use the file transfer APIs to:

Seed the sandbox with source code, configuration, or data before the agent runs
Retrieve artifacts (generated code, build outputs, reports) after the agent finishes
Pre-populate dependencies that the agent will need

Working with files

Seeding the sandbox

Use uploadFiles() to populate the sandbox before the agent runs. File contents are provided as Uint8Array:

const encoder = new TextEncoder();
const responses = await sandbox.uploadFiles([
  ["src/index.js", encoder.encode("console.log('Hello')")],
  ["package.json", encoder.encode('{"name": "my-app"}')],
]);

// Each response indicates success or failure
for (const res of responses) {
  if (res.error) {
    console.error(`Failed to upload ${res.path}: ${res.error}`);
  }
}

Retrieving artifacts

Use downloadFiles() to retrieve files from the sandbox after the agent finishes:

const results = await sandbox.downloadFiles(["src/index.js", "output.txt"]);

const decoder = new TextDecoder();
for (const result of results) {
  if (result.content) {
    console.log(`${result.path}: ${decoder.decode(result.content)}`);
  } else {
    console.error(`Failed to download ${result.path}: ${result.error}`);
  }
}

Inside the sandbox, the agent uses its own filesystem tools (read_file, write_file) — not uploadFiles or downloadFiles. Those methods are for your application code to move files across the boundary between your host and the sandbox.

Lifecycle and cleanup

Sandboxes consume resources (and cost money for cloud providers) until they’re shut down. You must clean them up when done. Use try/finally to ensure cleanup even when the agent throws an error.

Use a TTL for chat applications. When users can re-engage after idle time, you often don’t know if or when they’ll return. Configure a time-to-live (TTL) on the sandbox (for example, a TTL to archive or a TTL to delete) so the provider automatically cleans up idle sandboxes. Many sandbox providers support this.
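
Providers implement TTLs on their side, and the option names differ per provider, so the sketch below does not model any provider API. It only illustrates what an idle TTL amounts to: track the last activity per sandbox and flag the ones that have been idle too long. The IdleTtlTracker class and its method names are hypothetical, and timestamps are passed in explicitly so the logic is easy to test.

```typescript
// Hypothetical idle-TTL bookkeeping; a provider-side TTL does this for you.
class IdleTtlTracker {
  private readonly lastActiveMs = new Map<string, number>();

  constructor(private readonly ttlMs: number) {}

  // Record activity (for example, a new user message) for a sandbox.
  touch(sandboxId: string, nowMs: number): void {
    this.lastActiveMs.set(sandboxId, nowMs);
  }

  // Return the IDs of sandboxes idle for at least ttlMs.
  expired(nowMs: number): string[] {
    const ids: string[] = [];
    this.lastActiveMs.forEach((last, id) => {
      if (nowMs - last >= this.ttlMs) {
        ids.push(id);
      }
    });
    return ids;
  }

  // Stop tracking a sandbox after it has been closed.
  forget(sandboxId: string): void {
    this.lastActiveMs.delete(sandboxId);
  }
}
```

An application without provider-side TTL support could run expired() on a timer and close each returned sandbox, though a provider TTL is more robust because it still fires if your own process crashes.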

Basic lifecycle

All sandbox implementations follow a create → use → close pattern:

const sandbox = await ModalSandbox.create(options);

try {
  const result = await sandbox.execute("echo hello");
  // ... use sandbox
} finally {
  await sandbox.close();
}
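
If you repeat this pattern in many places, it can be wrapped in a small helper. The following is a sketch, not part of the library: withSandbox and the Closable interface are hypothetical names, and the only assumption about a sandbox is that it exposes an async close() method, as in the examples above.

```typescript
// Anything with an async close() method, such as the sandboxes shown above.
interface Closable {
  close(): Promise<void>;
}

// Create a sandbox, run a task with it, and always close it afterwards.
async function withSandbox<S extends Closable, T>(
  create: () => Promise<S>,
  use: (sandbox: S) => Promise<T>,
): Promise<T> {
  const sandbox = await create();
  try {
    return await use(sandbox);
  } finally {
    // Runs on success and on error, so the sandbox is never leaked.
    await sandbox.close();
  }
}
```

A call might look like withSandbox(() => ModalSandbox.create(options), (sandbox) => sandbox.execute("echo hello")), which keeps creation and cleanup in one place.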

Sandbox patterns

When integrating sandboxes, you have two patterns:

Reuse a sandbox across multiple invocations when you want continuity (for example, to persist installed dependencies):

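const model = new ChatAnthropic({ model: "claude-opus-4-6" });
const sandbox = await DenoSandbox.create({ memoryMb: 1024 });

try {
  const agent = createDeepAgent({
    model,
    systemPrompt: "You are a coding assistant.",
    backend: sandbox,
  });

  // Multiple invocations share the same sandbox (the prompts are illustrative)
  await agent.invoke({
    messages: [{ role: "user", content: "Install the project dependencies" }],
  });
  // Same sandbox, same files
  await agent.invoke({
    messages: [{ role: "user", content: "Run the test suite" }],
  });
} finally {
  await sandbox.close(); // You manage the lifecycle
}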

Create a fresh sandbox per invocation for clean isolation between tasks:

const agent = createDeepAgent({
  model: new ChatAnthropic({ model: "claude-opus-4-6" }),
  systemPrompt: "You are a coding assistant.",
  backend: () => DenoSandbox.create({ memoryMb: 1024 }),
});

Sandboxes in middleware

If you provision sandboxes in middleware, there is no on_error hook for cleanup when exceptions occur. As a workaround:

Set a TTL for the sandbox—most providers support this. Use it even on the happy path; it’s especially important when cleanup might not run.
Use wrap_model_call and wrap_tool_call hooks to catch unexpected exceptions and clean up resources. Note that this won’t help for node-style hooks (before_agent, after_agent, before_model, after_model) if the exception arises from other middleware.
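
The second workaround boils down to wrapping a handler so that cleanup runs when it throws. The sketch below shows that shape in isolation. It does not use the real middleware API: the Handler type and the cleanupOnError function are hypothetical, and how you register such a wrapper in wrap_model_call or wrap_tool_call depends on the middleware interface you are using.

```typescript
// Hypothetical handler type standing in for a model or tool call.
type Handler<Req, Res> = (request: Req) => Promise<Res>;

// Wrap a handler so that cleanup runs if it throws, then rethrow the error.
function cleanupOnError<Req, Res>(
  handler: Handler<Req, Res>,
  cleanup: () => Promise<void>,
): Handler<Req, Res> {
  return async (request: Req): Promise<Res> => {
    try {
      return await handler(request);
    } catch (error) {
      try {
        await cleanup();
      } catch (cleanupError) {
        // Ignore cleanup failures so the original error is not masked.
      }
      throw error;
    }
  };
}
```

Because cleanup here runs only on failure, the happy path still needs its own close() call, and a TTL remains the backstop for errors raised outside the wrapped call.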

Security considerations

Sandboxes isolate code execution from your host system, but they don’t protect against context injection. An attacker who controls part of the agent’s input can instruct it to read files, run commands, or exfiltrate data from within the sandbox. This makes credentials inside the sandbox especially dangerous.

Never put secrets inside a sandbox. API keys, tokens, database credentials, and other secrets injected into a sandbox (via environment variables, mounted files, or the secrets option) can be read and exfiltrated by a context-injected agent. This applies even to short-lived or scoped credentials — if an agent can access them, so can an attacker.

Handling secrets safely

If your agent needs to call authenticated APIs or access protected resources, you have two options:

Keep secrets in tools outside the sandbox. Define tools that run in your host environment (not inside the sandbox) and handle authentication there. The agent calls these tools by name, but never sees the credentials. This is the recommended approach.
Use a network proxy that injects credentials. Some sandbox providers support proxies that intercept outgoing HTTP requests from the sandbox and attach credentials (e.g., Authorization headers) before forwarding them. The agent never sees the secret — it just makes plain requests to a URL. This approach is not yet widely available across providers.
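
The first option can be sketched as a host-side function that closes over the credential, so the agent supplies only arguments and receives only the response. Everything named here is hypothetical: makeAuthenticatedTool, the Transport type, and the api.example.com URL are placeholders, and wiring the returned function into the agent as a tool is left out because it depends on your tool-definition API.

```typescript
// Hypothetical transport: performs the HTTP request on the host.
type Transport = (
  url: string,
  headers: Record<string, string>,
) => Promise<string>;

// Build a tool function that holds the API key on the host side.
function makeAuthenticatedTool(apiKey: string, transport: Transport) {
  // The agent only ever supplies reportId and only ever sees the body.
  return async function fetchReport(reportId: string): Promise<string> {
    const url = `https://api.example.com/reports/${encodeURIComponent(reportId)}`;
    return transport(url, { Authorization: `Bearer ${apiKey}` });
  };
}
```

The key never enters the sandbox or the model’s context. The response body is still untrusted data, so it is worth checking that the upstream API does not echo credentials back.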

If you must inject secrets into a sandbox (not recommended), take these precautions:

Enable human-in-the-loop approval for all tool calls, not just sensitive ones
Block or restrict network access from the sandbox to limit exfiltration paths
Use the narrowest possible credential scope and shortest possible lifetime
Monitor sandbox network traffic for unexpected outbound requests

Even with these safeguards, this remains an unsafe workaround. A sufficiently creative context injection attack can bypass output filtering and HITL review.

General best practices

Review sandbox outputs before acting on them in your application
Block sandbox network access when not needed
Use middleware to filter or redact sensitive patterns in tool outputs
Treat everything produced inside the sandbox as untrusted input
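
As an illustration of the redaction practice, the sketch below masks a few common credential shapes in tool output before it reaches the model or your logs. The patterns and the redactSecrets name are assumptions chosen for the example; a real deployment would tune them to the credential formats it actually uses, and pattern matching alone will not catch encoded or split secrets.

```typescript
// Example patterns only; adjust to the credential formats you actually use.
const SECRET_PATTERNS: RegExp[] = [
  /sk-[A-Za-z0-9_-]{16,}/g, // API keys with an "sk-" prefix
  /Bearer\s+[A-Za-z0-9._~+\/-]+=*/g, // bearer tokens in headers
  /AKIA[0-9A-Z]{16}/g, // AWS-style access key IDs
];

// Replace anything matching a known secret pattern with a placeholder.
function redactSecrets(text: string): string {
  return SECRET_PATTERNS.reduce(
    (current, pattern) => current.replace(pattern, "[REDACTED]"),
    text,
  );
}
```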
