By Hidde Kehrer · April 8, 2026 · 5 min read

The agent lives inside the machine

boxd is built on a single architectural bet: the agent belongs inside the machine, not outside it.

That one sentence decides everything else — how we built the VMM, why fork is a first-class operation, why a box persists across sessions, why there's no SDK, why SSH is the interface. This post is about why we made that call.

The default model

Most agent platforms today separate the agent from the execution environment. The agent process runs on one machine and reaches into a sandbox on another through a defined tool interface: file.read, file.edit, shell.exec, search. The sandbox is passive. The brain lives elsewhere.

It's a reasonable starting point. It ships today. It gives you clean observability (every action is a typed call), easy orchestration (one agent can drive many sandboxes), and clean failure modes (a dead sandbox doesn't kill the agent). For controlled enterprise workflows where predictability is the primary constraint, it's a fine architecture.

We think it's the wrong primitive to build on.

What the separation costs you

Every interaction goes through a keyhole. Yes, shell.exec() exists. An agent outside the VM can technically install packages, write scripts, set up databases — but only by serializing intent into command strings, sending them over the network, and parsing text back. An agent inside the VM is just a process on an OS. It reads files at disk speed. It tails logs as they stream. It talks to running services over localhost. Nothing gets serialized. Nothing gets parsed. It just does the thing.
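A rough sketch of the difference, with a stand-in function playing the remote sandbox. The names sandbox_handle, count_errors_outside, and count_errors_inside are illustrative only; they are not part of boxd or of any real tool API, and a real separated setup would add a network hop where the direct function call sits.

```python
import json
import tempfile
from pathlib import Path


def sandbox_handle(request: bytes) -> bytes:
    """Stand-in for a remote sandbox endpoint (hypothetical, not a boxd API)."""
    call = json.loads(request)
    if call["tool"] != "file.read":
        return json.dumps({"error": "unsupported tool"}).encode()
    return json.dumps({"content": Path(call["path"]).read_text()}).encode()


def count_errors_outside(path: str) -> int:
    """Agent outside the VM: serialize the call, send it, parse the reply."""
    request = json.dumps({"tool": "file.read", "path": path}).encode()
    reply = json.loads(sandbox_handle(request))  # a network round trip in practice
    return sum("ERROR" in line for line in reply["content"].splitlines())


def count_errors_inside(path: str) -> int:
    """Agent inside the VM: an ordinary process reading an ordinary file."""
    with open(path) as log:
        return sum("ERROR" in line for line in log)


with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "app.log"
    log_path.write_text("INFO start\nERROR disk full\nINFO retry\nERROR timeout\n")
    outside = count_errors_outside(str(log_path))
    inside = count_errors_inside(str(log_path))

print(outside, inside)
```

Both paths reach the same answer. The outside path gets there through two rounds of encoding and decoding; the inside path is a plain file read.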

The latency tax grows over time. Today, most of an agent's wall-clock time is spent on inference, so per-tool-call overhead is a rounding error. But inference is getting faster and agents are making more calls per task, so tool-call overhead becomes a larger share of total runtime. An agent inside the VM doesn't pay that tax at all.
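To make the trend concrete, here is a back-of-the-envelope calculation. Every number in it is hypothetical and chosen only to show the shape of the curve; none of them is a boxd measurement.

```python
def overhead_share(inference_s: float, calls: int, per_call_overhead_s: float) -> float:
    """Fraction of wall-clock time spent on tool-call overhead."""
    overhead = calls * per_call_overhead_s
    return overhead / (inference_s + overhead)


# Hypothetical "today": slow inference, few tool calls per task.
today = overhead_share(inference_s=60.0, calls=20, per_call_overhead_s=0.05)

# Hypothetical "later": inference 10x faster, 10x more calls, same per-call overhead.
later = overhead_share(inference_s=6.0, calls=200, per_call_overhead_s=0.05)

print(f"today: {today:.1%}, later: {later:.1%}")  # today: 1.6%, later: 62.5%
```

With the per-call overhead held constant, the share moves from a rounding error to most of the runtime purely because the other two inputs changed.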

Fork stops meaning anything. Forking is the most powerful primitive we offer: clone a full environment in milliseconds to explore two approaches at once. If the agent lives outside the VM, forking only clones the environment — not what the agent was thinking, what it tried, what it learned. You have to re-inject context into the new sandbox, spending tokens and time to rebuild what the agent already knew. With the agent inside the VM, fork captures everything: environment, agent state, in-progress work. Ten forks, ten parallel approaches, each with full context. Keep the one that works.
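A process-level analogy may help here. The sketch below uses the POSIX fork call through Python's os.fork, which is not how boxd forks a VM; it only illustrates that when the "agent" is inside the thing being cloned, its in-memory notes come along for free. The notes list is a hypothetical stand-in for agent state. It runs on Linux and other Unix-like systems.

```python
import os

# Hypothetical agent state held in memory, not on disk.
notes = ["tried approach A: failed on migration step"]

read_fd, write_fd = os.pipe()
pid = os.fork()

if pid == 0:
    # Child: starts with a copy of everything the parent had in memory.
    os.close(read_fd)
    notes.append("trying approach B")
    os.write(write_fd, "\n".join(notes).encode())
    os.close(write_fd)
    os._exit(0)  # leave immediately without running parent cleanup code

# Parent: read back what the child knew, then reap it.
os.close(write_fd)
with os.fdopen(read_fd) as pipe:
    child_notes = pipe.read().splitlines()
os.waitpid(pid, 0)

print(child_notes)  # the child kept the parent's note and added its own
print(notes)        # the parent's copy is unchanged
```

Nothing had to be re-injected into the child. It already knew what the parent had tried, and its own changes did not leak back.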

What we built instead

Every agent on boxd gets its own computer. Not a sandbox it reaches into through a tool API — a real Linux machine it runs inside. Everything about boxd flows from that decision:

Persistence is the architecture, not a feature. A box doesn't die between requests. The agent's state, scratch files, running processes, installed packages — they're still there tomorrow. The VM is the durable context. No external orchestrator has to reconstruct anything.

Hibernation is free. A box suspends in ~10ms and resumes in ~10ms, agent and all. No state serialization, no rehydration, no context re-injection. You pay nothing while it sleeps.

Fork is a real primitive. Clone the whole machine — kernel, filesystem, processes, agent — in under 60ms. Every fork inherits everything. This is only possible because the agent is part of the VM snapshot.

Model choice stays decoupled. The agent inside the VM calls whatever model it wants over the network. Swap providers with a config change. "Agent inside the VM" doesn't mean "agent welded to one model."
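One way this could look from the agent's side is sketched below. The config fields, the /generate path, and the example.invalid hosts are all made up for illustration; this is not a boxd config format or any provider's real API.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    provider: str
    base_url: str
    model: str


def load_model_config(raw: str) -> ModelConfig:
    """Parse a (hypothetical) JSON config that lives inside the box."""
    data = json.loads(raw)
    return ModelConfig(data["provider"], data["base_url"], data["model"])


def build_request(config: ModelConfig, prompt: str) -> dict:
    """Describe the outbound call; the agent code never names a provider."""
    return {
        "url": f"{config.base_url}/generate",
        "json": {"model": config.model, "prompt": prompt},
    }


config_a = load_model_config(
    '{"provider": "provider-a", "base_url": "https://a.example.invalid", "model": "model-a"}'
)
config_b = load_model_config(
    '{"provider": "provider-b", "base_url": "https://b.example.invalid", "model": "model-b"}'
)

request_a = build_request(config_a, "summarize the build log")
request_b = build_request(config_b, "summarize the build log")
print(request_a["url"])
print(request_b["url"])
```

The agent logic in build_request is identical in both cases. Only the config string differs.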

Isolation comes from the hardware, not the tool surface. Each box is a KVM virtual machine with its own kernel, its own memory, its own network stack. The agent inside has full root, full filesystem, full network — and can still never affect anything outside its box. Security without guardrails. Flexibility without compromise.

The interface is the computer. SSH in, and you have it. No SDK, no dashboard, no permission model for reaching into your own machine. The agent has the same interface a developer does, because the agent is doing the same kind of work.

The training wheels argument

We think the "agent outside the sandbox" architecture is a crutch for a temporary problem.

Current models need structured tool interfaces because they're not reliable enough to operate a raw computer cleanly. They misread filesystems. They run commands that don't make sense. They need guardrails. The tool abstraction is the guardrail.

But models are getting better fast. Every generation is more autonomous, more capable, more trustworthy. The guardrails that make today's models functional will constrain tomorrow's.

You don't give a junior engineer a five-command terminal and an approval workflow for every action forever. You start there, maybe. But the goal is to give them a real machine and trust them with it. The same trajectory applies to agents.

If the guardrails are structural (baked into the architecture rather than sitting at a policy layer), they can't simply be loosened when models outgrow them; they have to be replaced. Building on them is a bet that models stand still. We're not taking it.

We might be wrong

The honest version: we're making a bet on where model capability is headed. If models plateau and continue to need heavy scaffolding, the separated architecture wins — it's more controllable, more auditable, more predictable.

If they keep improving, the companies whose agents have full computers will outperform the ones whose agents work through keyholes. We're betting on the second world. The last three years make that look like the right side of the bet.

The bigger picture

Start with one agent, one computer. That's today.

Tomorrow: one agent coordinates ten computers. Then a hundred. Each one a full, isolated machine the agent can fork, specialize, and orchestrate. An agent hits a complex problem, forks ten copies of its environment, explores in parallel, and merges the best result. The limiting factor stops being architecture and becomes energy.

You don't get there by routing every action through a tool API on a remote orchestrator. You get there by giving agents computers and getting out of the way.

That's what boxd is.

For the operational version of this thesis (what a substrate that holds the agent actually needs to provide), see "Where to run an agent harness in production". For the architectural sister thesis, see "Persistence beats ephemeral".

Try it now

No signup. No install. Just SSH.

$ ssh boxd.sh

Built by Azin Tech in Amsterdam. Open for early access.
