How boxd Works
Your SSH key is your identity. Your terminal is your dashboard.
boxd is a cloud platform where the interface is SSH. There's no web console to click through, no YAML to write, no container orchestrator to learn. You create a VM with one command, SSH into it with another, and it's reachable over HTTPS immediately.
ssh boxd.sh new --name=myapp
ssh myapp.boxd.sh
That's it. Your public key is your account. Every VM gets its own public IPv4 address. Every VM gets a DNS name that just works.
We built boxd because we kept seeing the same problem: developers and AI agents need remote compute, and the options are either too complex (Kubernetes, Terraform, cloud consoles) or too constrained (managed containers with no root access, no persistent storage, no real networking). We wanted something in between — real VMs, real isolation, zero ceremony.
This post explains how the system works under the hood.
Architecture at a Glance
boxd is a distributed system. A cluster is made up of four types of nodes, each with a single responsibility:
- Control nodes make decisions — where to place a VM, which IP to assign, whether to admit a request.
- Worker nodes run VMs — each VM is a hardware-virtualized machine using KVM, not a container.
- Proxy nodes are the front door — they terminate SSH and TLS, route traffic to the right VM, and serve the public API.
- DNS nodes answer queries — myapp.boxd.sh resolves to the VM's public IP.
All four node types participate in a single consensus cluster. Every node maintains a local replica of the system's state. This is key to how boxd achieves simplicity: there's no service mesh, no message queue, no external database. The consensus log is the coordination layer.
When the control plane decides to create a VM, it writes an entry to the log. A worker is picked. The worker responsible for the VM sees the entry appear in its local replica and acts on it — pulls the image if needed, sets up networking, boots the VM. There's no RPC call from control to worker. No queue. The log is the API between components.
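The pattern can be sketched as a toy log with a polling worker. All names here are hypothetical, and the real system watches its Raft replica rather than a Python list, but the shape is the same:

```python
# Toy model of "the log is the API": control appends an entry, and the
# assigned worker reacts when the entry appears in its local replica.
log = []      # stand-in for the replicated consensus log
booted = []   # records which VMs this sketch has "booted"

def control_create_vm(name: str, worker: str) -> None:
    """Control plane: decide placement, then just append to the log."""
    log.append({"type": "vm-create", "name": name, "worker": worker})

def worker_tick(worker: str, cursor: int) -> int:
    """Worker: scan new entries and act on the ones assigned to it."""
    for entry in log[cursor:]:
        if entry["type"] == "vm-create" and entry["worker"] == worker:
            booted.append(entry["name"])  # pull image, set up net, boot
    return len(log)  # new cursor position

control_create_vm("myapp", "worker-1")
cursor = worker_tick("worker-1", 0)
print(booted)  # ['myapp']
```

Note that control never calls the worker: the only shared surface is the log entry and its schema.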
One Process Per VM
Each VM runs as its own OS process. It's not a thread inside a larger runtime or a container inside a pod. It's a standalone process that manages a KVM virtual machine.
This has a practical consequence that matters: if the supervisor process crashes, the VMs keep running. When the supervisor comes back, it reconnects to the running VMs through their Unix sockets. There's no cold restart, no boot storm, no downtime for the user. The VM process is the unit of reliability, not the host daemon.
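The recovery idea can be sketched with plain files standing in for the real Unix control sockets (directory layout and names are hypothetical, not boxd's actual paths):

```python
# Sketch: a supervisor that reattaches to running VM processes after a
# restart instead of rebooting them. Files stand in for control sockets.
import os
import tempfile

RUNTIME_DIR = tempfile.mkdtemp()  # stand-in for a runtime dir of sockets

def start_vm(vm_id: str) -> str:
    """Boot a VM process and record its control socket path."""
    sock = os.path.join(RUNTIME_DIR, f"{vm_id}.sock")
    open(sock, "w").close()  # stand-in for binding a Unix socket
    return sock

def recover() -> list[str]:
    """On supervisor restart, reattach to sockets of VMs still running."""
    return sorted(
        os.path.join(RUNTIME_DIR, f)
        for f in os.listdir(RUNTIME_DIR)
        if f.endswith(".sock")
    )

start_vm("vm-a")
start_vm("vm-b")
# Supervisor "crashes" here; the VM processes (and sockets) survive.
recovered = recover()
print(len(recovered))  # 2 — both VMs reattached, neither rebooted
```

The durable artifact is the socket, not the supervisor's in-memory state, which is what makes the restart invisible to users.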
Networking Without Complexity
Every VM gets a dedicated public IPv4 address from a pool attached to the proxy. When you SSH to myapp.boxd.sh, DNS resolves to that IP, and the proxy uses a compound routing key — your public key fingerprint plus the destination IP — to identify which VM you're trying to reach. IPs are shared across users, but the key fingerprint disambiguates.
Worker nodes have no public IP at all. They communicate exclusively over a private L2 fabric, which means the attack surface is minimal: the only internet-facing components are the proxy and DNS nodes.
Identity Is Just an SSH Key
There are no usernames, no passwords, no API keys to manage (unless you want them). Your SSH public key fingerprint is your identity. Multiple keys can be linked to one account.
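For reference, an OpenSSH-style SHA256 fingerprint is just a hash of the key blob. A minimal sketch, assuming boxd uses the standard OpenSSH format (the sample key below is made up):

```python
import base64
import hashlib

def ssh_fingerprint(pubkey_line: str) -> str:
    """OpenSSH-style fingerprint: SHA-256 the decoded key blob, then
    base64-encode the digest without '=' padding."""
    blob = base64.b64decode(pubkey_line.split()[1])
    digest = hashlib.sha256(blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode().rstrip("=")

# A made-up key blob, just to exercise the function.
key_line = "ssh-ed25519 " + base64.b64encode(b"\x00" * 51).decode() + " demo"
fp = ssh_fingerprint(key_line)
print(fp)  # e.g. SHA256:...
```

This is the same value `ssh-keygen -lf` prints, so you can always check which identity you'll present before connecting.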
The registration flow is minimal:
- Connect to boxd.sh with an unknown key
- You get a URL
- Authenticate via OAuth in your browser
- Your key is linked to your account
- Reconnect — you're in
For programmatic access, you can issue JWT tokens. These tokens authenticate against the gRPC API, which exposes the same operations as the SSH interface.
Why Consensus?
People sometimes ask why we use Raft consensus for a VM platform. The answer is that it eliminates an entire class of infrastructure.
With Raft, we don't need:
- A separate database cluster
- A message broker for inter-service communication
- A service discovery system
- A distributed lock manager
- An eventually-consistent cache with invalidation logic
Every node has a complete, consistent view of the world. The proxy knows which VMs exist, which IPs they have, and which users own them — because it reads directly from its local replica. DNS does the same. Workers do the same. There's no cache miss, no stale read, no split brain for reads.
Writes go through the leader, which means they're linearizable. When a VM creation request returns, the entry is committed to the log and replicating to every node. The worker acts on it as soon as it appears in its replica.
The tradeoff is that the cluster needs a quorum to make progress on writes (3 control nodes, so it tolerates 1 failure). But reads — which are the vast majority of operations — are always local and always fast.
What "Self-Hostable" Means
boxd isn't a SaaS-only product. The entire system is designed to run on your own hardware. The provisioning pipeline takes a set of bare metal or VPS instances and turns them into a boxd cluster. You bring the machines; boxd handles the rest.
The minimum viable cluster is a single node running all roles in one process — useful for development or personal use. A production deployment is typically 3 control nodes, 2+ workers, 2+ proxies, and a DNS node.
Since all coordination happens through the consensus log, adding capacity is straightforward: bring up a new worker, point it at the cluster, and it starts accepting VM placements. No reconfiguration of existing nodes required.
What's Next
We're early. The architecture is solid, but we're still building out the product surface — better image support, snapshot and restore, GPU passthrough, and a richer CLI experience.
If what we're building sounds interesting and you have a cool use case in mind, we'd love to hear from you.