Self-hosted agent execution for engineering teams
Your engineers are running Claude Code, Cursor, Devin, OpenCode, or all four against your codebase right now. The question worth answering before scaling that further: where does the agent actually execute?
For most teams, the honest answer is somewhere you don't fully control. The agent's execution environment is the vendor's hosted compute. Your code is checked out there. Your environment variables are loaded there. Tool calls fire from there. The model's responses, the file edits, the terminal output — all of it transits the vendor's network.
For a small team building a side project, that's fine. For a team in healthcare, finance, defence, or anywhere with serious IP concerns, it isn't. For an EU-headquartered company subject to the DSGVO (the GDPR) and Schrems II case law, it's a slow legal headache. This piece is about the alternative: self-hosted agent execution as an actual platform decision, not a security checkbox.
What "self-hosted agent execution" actually means
The phrase gets stretched. Concretely:
- The execution environment — the VM, container, or sandbox the agent operates in — runs on infrastructure you control.
- The code, files, env vars, and tool calls stay on your network. They never transit a vendor's compute.
- The model can be whatever you want: Anthropic's API, an Azure-hosted Claude, an in-VPC self-hosted model. The model is a separate concern from the execution substrate.
- The audit logs belong to you. Every shell command, every file edit, every MCP tool call lands somewhere you can query.
That's the full picture. Self-hosting just the model is not what teams mean when compliance asks the question. Self-hosting just the editor is not it either. Self-hosted agent execution means the loop — read code, write code, run command, read output — happens on your infrastructure end to end.
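That end-to-end loop is worth making concrete. Here is a toy shell sketch of the four steps (illustration only, not boxd's actual mechanics): each one is a plain local operation on the execution host, and only the model call, which isn't shown, would ever leave the network.

```shell
# Toy version of the agent loop, entirely local to the execution host.
workdir=$(mktemp -d)
printf 'print("hello from the sandbox")\n' > "$workdir/main.py"  # write code
cat "$workdir/main.py"                                           # read code
python3 "$workdir/main.py"                                       # run command, read output
```

When all four steps run on a host you operate, "where does the agent execute" has a one-word answer for the audit.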
The compliance dimensions
Five of them in scope for most teams that ask about this:
Data residency. Where does the code physically execute? For DSGVO and DSGVO-derived contracts, "in the EU" is not the same as "in a US company's EU region." Schrems II invalidated Privacy Shield and required supplementary measures, which has made that distinction operationally consequential — the kind of thing legal teams now flag in due diligence.
Audit trail. SOC 2, ISO 27001, internal review processes — they all want a log of what the agent did, not just what the developer asked it to do. That log needs to live on infrastructure you control.
Network isolation. Production credentials, internal API keys, customer data — when an agent executes on a VM in your VPC, exfiltration paths reduce to whatever the VM is allowed to call. When it executes on a vendor's shared compute, the boundary is the vendor's policy.
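"Whatever the VM is allowed to call" can be made literal at the firewall. A hedged nftables sketch of default-deny egress for an agent VM — the tap interface name is hypothetical, and a documentation-range address (198.51.100.7) stands in for the one endpoint (say, the model API) the VM is permitted to reach:

```
# Sketch only: drop all forwarded traffic from the agent VM's interface
# except the single allowed HTTPS endpoint. Names and addresses are
# placeholders, not boxd defaults.
table inet agent_egress {
    chain fwd {
        type filter hook forward priority 0; policy drop;
        ct state established,related accept
        iifname "tap-agent0" ip daddr 198.51.100.7 tcp dport 443 accept
    }
}
```

The point is the shape, not the rules: on your own host, the exfiltration surface is a config file you can diff, not a vendor policy you have to trust.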
Identity. Most engineering orgs already manage SSH keys, SSO identities, hardware tokens. The agent platform should plug into that, not introduce a parallel auth surface.
Blast radius. When an agent does something it shouldn't — leaks a secret, deletes a file, makes the wrong API call — you want it isolated to that one developer's workspace, not running in a shared multi-tenant container.
These are not premium concerns. They are the default for a non-trivial number of orgs the moment legal or security looks at how AI tooling is being deployed.
The engineering dimensions
Compliance only matters if the platform is usable. Five engineering requirements that tend to come up:
- Per-developer workspaces. Not one shared box. Each engineer gets their own environment with their own state.
- Fork from a golden template. A new engineer gets a working environment in minutes, not days. Setup drift between team members goes to zero.
- Persistent across sessions. The agent doesn't lose context when the developer logs off for the night. Long-running tasks survive.
- Reproducible. What ran on Dev A's machine runs identically on Dev B's. The environment is not a snowflake.
- Cheap when idle. A platform where every engineer pays for a 24/7 VM whether they use it or not is a platform finance pushes back on. Idle should bill near zero.
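"Cheap when idle" is worth quantifying. A back-of-envelope comparison with hypothetical numbers — 20 engineers, $0.05/hr per VM, roughly 6 active hours across 22 weekdays a month:

```shell
# Always-on: every VM billed 24h x 30 days, used or not.
always_on=$(awk 'BEGIN { printf "%.0f", 20*24*30*0.05 }')
# Idle-aware: billed only for the hours someone is actually working.
idle_aware=$(awk 'BEGIN { printf "%.0f", 20*6*22*0.05 }')
echo "always-on: \$${always_on}/mo   idle-aware: \$${idle_aware}/mo"
# prints: always-on: $720/mo   idle-aware: $132/mo
```

At these assumed rates, sleep-when-idle billing lands under a fifth of always-on. The exact numbers will differ per team; the ratio is what finance notices.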
The compliance and engineering requirements pull in different directions only if you let them. A well-shaped substrate satisfies both: persistent VMs that fork from templates, run on infrastructure you own, sleep when no one's using them, and log everything they do.
What an internal developer platform for agents needs from its substrate
The platform is the thing engineers see — golden templates, onboarding flow, observability dashboards. The substrate is what holds it up. Five properties the substrate has to provide:
- Persistent VMs per identity. One VM per developer, surviving across sessions. No shared multi-tenancy where one engineer's workload affects another.
- Fork-from-template in seconds. Branch the golden image to onboard a new engineer; branch a workspace to try a refactor; branch a CI run to debug a failure. State copy, not image build.
- Self-hostable as a single artifact. A platform you can install means a platform that fits your existing operational model — config management, monitoring, backup. A platform that requires an external Postgres, Redis, queue, and scheduler does not.
- Audit-friendly by construction. Every action through a logged channel. SSH session recording is a solved problem; the substrate should make it the default, not the bolt-on.
- Run where you need it. Your cloud, your datacenter, your country. The decision of where the agent executes should be your call, not a feature gate.
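Audit-friendly by construction can lean on stock OpenSSH. A sketch of session recording via sshd's ForceCommand — the group name and paths are hypothetical, and a real deployment would also ship the transcripts off-host:

```
# /etc/ssh/sshd_config fragment (sketch): route every session for the
# "agents" group through a recording wrapper.
Match Group agents
    ForceCommand /usr/local/bin/record-session

# record-session itself (shell sketch): script(1) captures the full
# terminal transcript, then runs whatever the client actually asked for.
#   exec script -q -c "${SSH_ORIGINAL_COMMAND:-$SHELL -i}" \
#       "/var/log/sessions/$(id -un)-$(date +%s).log"
```

sshd sets SSH_ORIGINAL_COMMAND when ForceCommand is in effect, which is what lets the wrapper record and still pass the real command through.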
The closer the substrate gets to "one binary, plugs into KVM, runs anywhere," the easier every other decision becomes.
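Fork-from-template above is, at heart, copy-on-write state duplication. A filesystem-level illustration with GNU coreutils — `cp --reflink=auto` does a CoW clone where the filesystem supports it and silently falls back to a plain copy elsewhere (boxd's actual fork operates on VM state, not files; this is just the analogy):

```shell
cd "$(mktemp -d)"
# An 8 MiB "golden template" disk image.
dd if=/dev/zero of=golden.img bs=1M count=8 status=none
# Instant CoW fork of the template.
cp --reflink=auto golden.img fork.img
# Diverge the fork; the template is untouched.
echo tweak >> fork.img
cmp -s golden.img fork.img || echo "fork diverged, template untouched"
```

Because writes land only on the fork, a hundred workspaces branched from one template cost close to one template, plus each workspace's own changes.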
What we built
boxd is a self-hosted agent execution platform shaped around exactly this set of requirements. The design choices that fall out:
- One Rust binary. No external Postgres, Redis, queue, or scheduler. Drop the binary on a host with KVM and a config file. That's the install.
- KVM-isolated VMs per developer. Real hardware-level isolation. Real per-VM cgroups. A bug in one engineer's workspace cannot reach another's.
- SSH-key identity. Each engineer's SSH public key fingerprint is their identity. Plugs into whatever SSH key management you already do.
- Persistent VMs that fork from a golden template. Build the team's image once. Every onboard is a 60ms fork from that image. Every workspace is a fork. Every experiment is a fork.
- Sleeps when idle. A workspace nobody's touched for an hour hibernates. Sub-millisecond resume when the engineer comes back.
- Per-VM public IPv4. Real DNS, real TLS, real network identity. Webhooks, MCP servers, and tool callbacks land on a stable address per workspace.
- Runs where you need it. Your VPC. Your bare-metal rack. A datacenter in your country. The single-binary design means self-hosted dev environment and self-hosted in EU jurisdiction are the same operation.
The architecture maps to the requirements directly because we built it for the requirements, not for a SaaS sales motion with "and you can self-host this" bolted on afterwards.
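Identity-by-fingerprint can be demonstrated with nothing but stock OpenSSH tooling. The sketch below generates a throwaway key and extracts its fingerprint; boxd's exact enrollment flow isn't shown, only the primitive it builds on:

```shell
# Generate a throwaway ed25519 keypair (no passphrase, for illustration).
keyfile=$(mktemp -u)
ssh-keygen -q -t ed25519 -N '' -f "$keyfile"
# The public key's SHA256 fingerprint is the stable identity string.
fp=$(ssh-keygen -lf "$keyfile.pub" | awk '{print $2}')
echo "workspace identity: $fp"   # e.g. SHA256:…
```

Because the fingerprint is derived from the key the engineer already uses, there's no second credential to provision, rotate, or leak.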
The EU sovereign cloud angle, honestly
There's a lot of theatre around EU cloud and sovereign AI right now. Most of what's marketed as sovereign cloud is a US-headquartered company with an EU region. Schrems II made clear that region is not the same as jurisdiction. The CLOUD Act doesn't stop at a customer-facing label.
Real sovereignty for agent execution looks like this:
- The execution binary runs in your jurisdiction.
- It does not call out to a vendor control plane.
- The data — code, env, logs — never leaves the jurisdiction.
- The vendor's failure or US government order does not affect your operation.
A single-binary self-hosted platform with no managed control plane is one of the few architectures that meets that bar. boxd is shaped that way because the EU customers we talked to early kept asking for it. Sovereign is not a SKU we add; it's how the binary works by default.
For teams who'd otherwise look at hosted ephemeral sandboxes — E2B, Daytona, code interpreters — self-hosted boxd is the closest thing to a self-hosted E2B in the agent-execution category as of mid-2026, since E2B itself does not currently offer self-host. The question for your security review isn't which sandbox API; it's whose binary is in the data path.
For DSGVO-bound teams, that's the difference between "theoretically compliant, if we accept the contractual hand-waving" and "the audit fits on one page."
When self-hosting isn't worth it
Self-hosting has an operational cost. Be honest about when it doesn't pay for itself.
- Small teams without compliance pressure. A five-engineer SaaS startup running open-source code through Cursor is paying ops cost for control they don't need. Use the SaaS option.
- Teams without anyone to operate the platform. Self-host requires someone whose job description includes the platform. If nobody on the team will own it, it will rot.
- Workloads that aren't sensitive. If the agent is editing your blog and reading your docs, self-host is overkill.
If your code is sensitive, your jurisdiction is regulated, or your security review will block the SaaS option anyway — self-host. Otherwise, don't.
What to take from this
The agent vendors are building good products. Their hosted execution is fast, polished, and improving every quarter. For most teams it's the right answer.
For the teams it isn't, self-hosted agent execution is now a real category, not a fringe option. The platform shape that fits — single-binary, KVM-isolated, fork-from-template, runs in your jurisdiction — exists. Building it on top of an internal developer platform you already operate is a smaller project than most engineering teams expect.
That's what boxd is for. One binary. Drop it on KVM. Your engineers SSH in. The code runs where you need it to run. For the broader category context, the cloud dev environment in 2026 covers how CDE platforms compare. For agent-specific workloads, running Claude computer use and hosting MCP servers follow the same self-host pattern.
Last verified: 2026-05-04. This article is informational, not legal advice. Send corrections to hello@boxd.sh.
Frequently asked
- What does self-hosted agent execution actually mean?
- The execution environment — the VM, container, or sandbox the agent operates in — runs on infrastructure you control. The code, files, env vars, and tool calls stay on your network. The model can still be a hosted API (Anthropic, OpenAI, Azure-hosted Claude); only the execution substrate is yours.
- Is self-hosted E2B a real option?
- Not currently. E2B is managed-only as of 2026. If you need self-host on the agent-sandbox axis, the working options are boxd (single Rust binary on KVM) and a few others. Sprites also lacks self-host today.
- Do I need to self-host the model too?
- No. The execution substrate and the model are separate concerns. You can self-host the substrate (where the agent runs) while still calling the Anthropic API for the model. For most teams that's the right split — model self-hosting is expensive and rarely needed for compliance.
- Does self-hosting on an EU cloud actually count as EU sovereign?
- Depends on the vendor's parent company. Schrems II made clear that 'EU region' is not the same as 'EU jurisdiction' when the operator is US-headquartered. A single-binary platform you run yourself, on EU infrastructure, with no upstream control plane, is the architecture that meets the bar.
- How is this different from running my engineers' workloads on EC2?
- EC2 is a VM you operate. Self-hosted agent execution is a platform that gives every engineer their own VM, persistent across sessions, forkable from a team template, audit-logged, and SSO-friendly. You can build this on top of EC2 — boxd does — but EC2 alone is not the platform.
- Is this only for regulated industries?
- No. Compliance is the strongest driver, but engineering teams without compliance pressure also adopt self-host for blast-radius control, IP isolation, predictable cost, and avoiding lock-in to a vendor's policy roadmap.
Read next
The cloud dev environment: a practical guide for 2026
Why CDE is back in the conversation, the four shapes the category comes in this year, and what changed when AI agents joined the workload.
boxd vs sprites.dev: two bets on persistent agent compute
Sprites and boxd agree on the architecture: persistent microVMs are the right primitive for AI agents. Where the two implementations diverge, honestly.
A single Rust binary
boxd is one Rust binary. No Postgres, no Redis, no Kubernetes. Why simplicity is the design.
Try it now
No signup. No install. Just SSH.
Built by Azin Tech in Amsterdam. Open for early access.