execute tool for running shell commands. When you configure a sandbox backend, the agent gets:
- All standard filesystem tools (ls, read_file, write_file, edit_file, glob, grep)
- The execute tool for running arbitrary shell commands in the sandbox
- A secure boundary that protects your host system
Why use sandboxes?
A sandbox provides:
- Security: Code runs in isolation; it can’t access your credentials, files, or network
- Clean environments: Use specific dependencies or OS configurations without local setup
- Reproducibility: Consistent execution environments across teams
- Flexibility: Switch between cloud providers or use local VFS for development
- Shell and git: Run arbitrary commands, use version control, and work with different languages in the same session
- Clone repositories: Many providers offer native git APIs (e.g., Daytona’s git operations) so the agent can clone and work with repos
- Docker-in-Docker: Run containers inside the sandbox for build and test pipelines
Available providers
For provider-specific setup, authentication, and lifecycle details, see the provider integration pages:
- Modal: ML/AI workloads, GPU access, Python.
- Daytona: TypeScript/Python development, fast cold starts.
- Deno: Deno/JavaScript workloads, microVMs.
- Node VFS: Local development, testing, no cloud required.
Basic usage
How sandboxes work
Isolation boundaries
All sandbox providers protect your host system from the agent’s filesystem and shell operations. The agent cannot read your local files, access environment variables on your machine, or interfere with other processes. However, sandboxes alone do not protect against:
- Context injection: An attacker who controls part of the agent’s input can instruct it to run arbitrary commands inside the sandbox. The sandbox is isolated, but the agent has full control within it.
- Network exfiltration: Unless network access is blocked, a context-injected agent can send data out of the sandbox over HTTP or DNS. Some providers support blocking network access (e.g., blockNetwork: true on Modal).
The execute method
Sandbox backends have a simple architecture: the only method a provider must implement is execute(), which runs a shell command and returns its output. Every other filesystem operation — read, write, edit, ls, glob, grep — is built on top of execute() by the BaseSandbox base class, which constructs scripts and runs them inside the sandbox via execute().
This design means:
- Adding a new provider is straightforward. Implement execute(), and the base class handles everything else.
- The execute tool is conditionally available. On every model call, the harness checks whether the backend implements SandboxBackendProtocol. If not, the tool is filtered out and the agent never sees it.
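The layering described above can be sketched roughly as follows. This is an illustrative stand-in, not the library's BaseSandbox: the names SketchSandbox and FakeProvider, the ExecResult shape, and the exact shell commands are all assumptions made for the example.

```typescript
// Result shape assumed for this sketch: combined output plus an exit code.
interface ExecResult {
  output: string;
  exitCode: number;
}

// Quote a value so the shell treats it as a single literal argument.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Hypothetical base class: every operation is derived from execute().
abstract class SketchSandbox {
  // The one method a provider has to supply.
  abstract execute(command: string): Promise<ExecResult>;

  async readFile(path: string): Promise<string> {
    const result = await this.execute(`cat -- ${shellQuote(path)}`);
    if (result.exitCode !== 0) {
      throw new Error(`read failed (${result.exitCode}): ${result.output}`);
    }
    return result.output;
  }

  async ls(dir: string): Promise<string[]> {
    const result = await this.execute(`ls -1 -- ${shellQuote(dir)}`);
    if (result.exitCode !== 0) {
      throw new Error(`ls failed (${result.exitCode}): ${result.output}`);
    }
    return result.output.split("\n").filter((name) => name.length > 0);
  }
}

// A fake provider that records commands instead of running them.
class FakeProvider extends SketchSandbox {
  readonly commands: string[] = [];

  async execute(command: string): Promise<ExecResult> {
    this.commands.push(command);
    return { output: "a.txt\nb.txt\n", exitCode: 0 };
  }
}
```

Because the fake provider only overrides execute(), it inherits readFile() and ls() unchanged, which is the property that keeps new providers small.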
When the agent calls the execute tool, it provides a command string and gets back the combined stdout/stderr, exit code, and a truncation notice if the output was too large.
You can also call the backend execute() method directly in your application code.
For example:
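A minimal sketch of a direct call, assuming the result exposes fields named output, exitCode, and truncated (these names mirror the description above but are not confirmed here). The ExecutableBackend interface, the runOrThrow helper, and stubBackend are hypothetical and exist only to keep the example self-contained.

```typescript
// Assumed result shape, mirroring the description above.
interface ExecuteResult {
  output: string; // combined stdout/stderr
  exitCode: number;
  truncated: boolean; // true when the output was cut short
}

// Minimal view of a backend: only execute() matters here.
interface ExecutableBackend {
  execute(command: string): Promise<ExecuteResult>;
}

// Run a command from application code and fail loudly on a non-zero exit.
async function runOrThrow(
  backend: ExecutableBackend,
  command: string,
): Promise<string> {
  const result = await backend.execute(command);
  if (result.exitCode !== 0) {
    throw new Error(`"${command}" exited with ${result.exitCode}: ${result.output}`);
  }
  if (result.truncated) {
    console.warn(`Output of "${command}" was truncated`);
  }
  return result.output;
}

// Hypothetical stub standing in for a real provider backend.
const stubBackend: ExecutableBackend = {
  async execute(command) {
    return command === "node --version"
      ? { output: "v20.0.0\n", exitCode: 0, truncated: false }
      : { output: "command not found\n", exitCode: 127, truncated: false };
  },
};
```

With a real backend in place of the stub, application code could call runOrThrow(backend, "npm test") before or after the agent runs.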
When the output is too large to return in full, the agent can use read_file to access it incrementally. This prevents context window overflow.
Two planes of file access
There are two distinct ways files move in and out of a sandbox, and it’s important to understand when to use each:
Agent filesystem tools: read_file, write_file, edit_file, ls, glob, grep, and execute are the tools the LLM calls during its execution. These go through execute() inside the sandbox. The agent uses them to read code, write files, and run commands as part of its task.
File transfer APIs: the uploadFiles() and downloadFiles() methods that your application code calls. These use the provider’s native file transfer APIs (not shell commands) and are designed for moving files between your host environment and the sandbox. Use these to:
- Seed the sandbox with source code, configuration, or data before the agent runs
- Retrieve artifacts (generated code, build outputs, reports) after the agent finishes
- Pre-populate dependencies that the agent will need
Working with files
Seeding the sandbox
Use uploadFiles() to populate the sandbox before the agent runs. File contents are provided as Uint8Array:
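A minimal sketch of seeding, assuming uploadFiles() accepts a list of [path, bytes] pairs and returns one result per file. That signature, the UploadResult shape, and the InMemorySandbox class are assumptions for illustration; check your provider's types for the real ones.

```typescript
// Assumed shapes for this sketch; check your provider's types.
type FileEntry = [path: string, content: Uint8Array];

interface UploadResult {
  path: string;
  error: string | null;
}

// Hypothetical in-memory stand-in for a sandbox's file transfer API.
class InMemorySandbox {
  readonly files = new Map<string, Uint8Array>();

  async uploadFiles(entries: FileEntry[]): Promise<UploadResult[]> {
    return entries.map(([path, content]) => {
      if (!path.startsWith("/")) {
        return { path, error: "path must be absolute" };
      }
      this.files.set(path, content);
      return { path, error: null };
    });
  }
}

// Seed the sandbox before the agent runs; text is encoded to bytes first.
async function seed(sandbox: InMemorySandbox): Promise<UploadResult[]> {
  const encoder = new TextEncoder();
  return sandbox.uploadFiles([
    ["/workspace/package.json", encoder.encode('{"name":"demo"}')],
    ["/workspace/src/index.ts", encoder.encode('console.log("hello");\n')],
  ]);
}
```

Checking the per-file error field after seeding catches a bad path before the agent starts working against an incomplete workspace.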
Retrieving artifacts
Use downloadFiles() to retrieve files from the sandbox after the agent finishes:
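A minimal sketch of retrieval, assuming downloadFiles() takes a list of paths and returns one result per path with either bytes or an error. The DownloadResult shape, the collectArtifacts helper, and stubSandbox are hypothetical names used only for this example.

```typescript
// Assumed result shape: bytes on success, an error string otherwise.
interface DownloadResult {
  path: string;
  content: Uint8Array | null;
  error: string | null;
}

// Minimal view of the transfer API used here (signature is an assumption).
interface Downloadable {
  downloadFiles(paths: string[]): Promise<DownloadResult[]>;
}

// Fetch artifacts and decode them as UTF-8 text, skipping failed paths.
async function collectArtifacts(
  sandbox: Downloadable,
  paths: string[],
): Promise<Map<string, string>> {
  const decoder = new TextDecoder();
  const artifacts = new Map<string, string>();
  for (const result of await sandbox.downloadFiles(paths)) {
    if (result.error !== null || result.content === null) {
      console.warn(`Could not download ${result.path}: ${result.error}`);
      continue;
    }
    artifacts.set(result.path, decoder.decode(result.content));
  }
  return artifacts;
}

// Hypothetical stub holding one file, standing in for a real sandbox.
const stubSandbox: Downloadable = {
  async downloadFiles(paths) {
    const report = new TextEncoder().encode("all tests passed\n");
    return paths.map((path) =>
      path === "/workspace/report.txt"
        ? { path, content: report, error: null }
        : { path, content: null, error: "file_not_found" },
    );
  },
};
```

Decoding is only appropriate for text artifacts; binary outputs such as archives should be kept as the raw Uint8Array.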
Inside the sandbox, the agent uses its own filesystem tools (read_file, write_file), not uploadFiles or downloadFiles. Those methods are for your application code to move files across the boundary between your host and the sandbox.
Lifecycle and cleanup
Sandboxes consume resources (and cost money for cloud providers) until they’re shut down. You must clean them up when done. Use try/finally to ensure cleanup even when the agent throws an error.
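One way to make the try/finally discipline hard to forget is to put it in a small helper. The sketch below assumes only that a sandbox exposes an async close() method; the ClosableSandbox interface and the withSandbox helper are hypothetical names, not part of any provider's API.

```typescript
// Minimal lifecycle surface assumed for this sketch.
interface ClosableSandbox {
  close(): Promise<void>;
}

// Create a sandbox, run the work, and always close it afterwards.
async function withSandbox<S extends ClosableSandbox, T>(
  create: () => Promise<S>,
  use: (sandbox: S) => Promise<T>,
): Promise<T> {
  const sandbox = await create();
  try {
    return await use(sandbox);
  } finally {
    // Runs on success and on error, so the sandbox never leaks.
    await sandbox.close();
  }
}
```

The caller passes a function that creates the provider's sandbox and a function that runs the agent against it; cleanup then happens in one place regardless of how the agent run ends.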
Basic lifecycle
All sandbox implementations follow a create → use → close pattern:
Sandbox patterns
When integrating sandboxes, you have two patterns: Reuse a sandbox across multiple invocations when you want continuity (for example, to persist installed dependencies):
Sandboxes in middleware
If you provision sandboxes in middleware, there is no on_error hook for cleanup when exceptions occur. As a workaround:
- Set a TTL for the sandbox—most providers support this. Use it even on the happy path; it’s especially important when cleanup might not run.
- Use wrap_model_call and wrap_tool_call hooks to catch unexpected exceptions and clean up resources. Note that this won’t help for node-style hooks (before_agent, after_agent, before_model, after_model) if the exception arises from other middleware.
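The wrap-style workaround amounts to catching the exception, cleaning up, and rethrowing. The sketch below shows that shape with a generic handler type; Handler and cleanupOnError are hypothetical names, and the real hook signatures in your middleware framework will differ.

```typescript
// A handler in the "wrap" style: receives a request, returns a response.
type Handler<Req, Res> = (request: Req) => Promise<Res>;

// Wrap a handler so the cleanup callback runs when it throws.
// The original error is rethrown so callers still see the failure.
function cleanupOnError<Req, Res>(
  handler: Handler<Req, Res>,
  cleanup: () => Promise<void>,
): Handler<Req, Res> {
  return async (request) => {
    try {
      return await handler(request);
    } catch (error) {
      try {
        await cleanup();
      } catch (cleanupError) {
        // Do not let a failed cleanup mask the original error.
        console.error("sandbox cleanup failed", cleanupError);
      }
      throw error;
    }
  };
}
```

A TTL remains the backstop: this wrapper only covers exceptions that pass through the wrapped handler.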
Security considerations
Sandboxes isolate code execution from your host system, but they don’t protect against context injection. An attacker who controls part of the agent’s input can instruct it to read files, run commands, or exfiltrate data from within the sandbox. This makes credentials inside the sandbox especially dangerous.
Handling secrets safely
If your agent needs to call authenticated APIs or access protected resources, you have two options:
- Keep secrets in tools outside the sandbox. Define tools that run in your host environment (not inside the sandbox) and handle authentication there. The agent calls these tools by name, but never sees the credentials. This is the recommended approach.
- Use a network proxy that injects credentials. Some sandbox providers support proxies that intercept outgoing HTTP requests from the sandbox and attach credentials (e.g., Authorization headers) before forwarding them. The agent never sees the secret; it just makes plain requests to a URL. This approach is not yet widely available across providers.
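The first option can be sketched as a host-side tool that holds the credential in a closure. Everything here is hypothetical: the makeLookupTool factory, the lookup_order tool, the api.example.com URL, and the HttpGet type are placeholders, and a real agent framework will have its own way of registering tools.

```typescript
// Minimal HTTP function type so the sketch stays independent of any client.
type HttpGet = (
  url: string,
  headers: Record<string, string>,
) => Promise<string>;

// Build a tool that runs on the host. The API key lives in this closure
// and is never written into the sandbox or returned to the agent.
function makeLookupTool(apiKey: string, httpGet: HttpGet) {
  if (apiKey.length === 0) {
    throw new Error("apiKey is required");
  }
  return {
    name: "lookup_order",
    description: "Look up an order by ID.",
    async invoke(orderId: string): Promise<string> {
      const url = `https://api.example.com/orders/${encodeURIComponent(orderId)}`;
      const body = await httpGet(url, { Authorization: `Bearer ${apiKey}` });
      // Defensive: strip the key if the upstream service echoes it back.
      return body.split(apiKey).join("[REDACTED]");
    },
  };
}
```

The agent only ever supplies an order ID and sees the response body, so a context-injected instruction to print credentials has nothing to read.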
General best practices
- Review sandbox outputs before acting on them in your application
- Block sandbox network access when not needed
- Use middleware to filter or redact sensitive patterns in tool outputs
- Treat everything produced inside the sandbox as untrusted input
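As a rough illustration of the redaction practice above, a filter can be a plain function that middleware applies to each tool output. The pattern list and the redactSecrets name are assumptions for this sketch; the patterns are examples, not a complete set.

```typescript
// Example patterns only; extend them for the secrets your system uses.
const SECRET_PATTERNS: RegExp[] = [
  // AWS access key ID format
  /AKIA[0-9A-Z]{16}/g,
  // Bearer tokens as they appear in HTTP headers
  /Bearer\s+[A-Za-z0-9._~+\/-]+=*/g,
  // PEM-encoded private key blocks
  /-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----/g,
];

// Replace every match of every pattern with a fixed placeholder.
function redactSecrets(text: string): string {
  return SECRET_PATTERNS.reduce(
    (current, pattern) => current.replace(pattern, "[REDACTED]"),
    text,
  );
}
```

Pattern-based redaction is a safety net rather than a guarantee, which is why keeping secrets out of the sandbox in the first place remains the primary control.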