By Hidde Kehrer · 9 min read

MCP server hosting: a complete guide


The Model Context Protocol went from Anthropic spec (November 2024) to default agent vocabulary in under eighteen months. Claude has it. Cursor has it. OpenAI added support. Search interest in "claude mcp" is up roughly 50% year-over-year and still climbing. Every team building agents now writes or runs MCP servers; the question is no longer "should we use MCP" but "where do those servers actually live in production".

This guide is about that question. It's not a "what is MCP" explainer (Anthropic's docs are better than ours would be) and it's not a "how to build an MCP server" tutorial (the SDKs walk you through it). It's a hosting guide for people who already have MCP servers and need to put them somewhere real.

The shape of an MCP server in production

An MCP server is a process that exposes tools, resources, and prompts to a model client. The protocol is transport-agnostic — stdio, HTTP, SSE — but the shape of the workload tends to look the same as a long-running agent harness:

  • Long-lived. Once an agent connects, the session persists across many tool calls.
  • Stateful. Auth tokens, DB connections, caches, file-system state — most MCP servers carry context between calls.
  • Bursty. Idle for hours, then 50 calls in 20 seconds when an agent picks it up.
  • Identity-bound. The server usually needs credentials for whatever it integrates with — GitHub, Linear, Notion, your internal API. Those credentials need a stable home.
  • Reachable. Remote MCP servers need a real DNS name, real TLS, and a stable address agents can return to.
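To make the "exposes tools" part concrete, here is a toy dispatcher: a pure function from a JSON-RPC-style request to a response. Real servers use the official SDK and a transport; this only illustrates the shape. The method names follow the MCP spec's tools/list and tools/call; the "echo" tool and the type names are made up for the example.

```typescript
// Toy MCP-style dispatcher. Not the SDK -- just the tool-exposing shape.
type Request = { id: number; method: string; params?: any };
type Response = { id: number; result?: any; error?: { code: number; message: string } };

// One illustrative tool; a real server registers many, each with a JSON schema.
const tools = {
  echo: (args: { text: string }) => ({ content: [{ type: "text", text: args.text }] }),
};

function handle(req: Request): Response {
  switch (req.method) {
    case "tools/list":
      // Advertise available tools to the connecting agent.
      return { id: req.id, result: { tools: Object.keys(tools).map((name) => ({ name })) } };
    case "tools/call": {
      const tool = tools[req.params?.name as keyof typeof tools];
      if (!tool) return { id: req.id, error: { code: -32602, message: "unknown tool" } };
      return { id: req.id, result: tool(req.params.arguments) };
    }
    default:
      return { id: req.id, error: { code: -32601, message: "method not found" } };
  }
}
```

A session is just many such requests over one long-lived connection, which is exactly why the workload ends up long-lived and stateful.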

Most cloud primitives weren't shaped for that mix. A serverless function is great for the bursty part and bad for the stateful part. A long-lived container is great for state and pricey when idle. A laptop is fine for development and unreachable from production agents.

The four hosting patterns

Pick by who calls the server, how stateful it is, and whether you can afford to share infrastructure with anyone outside your trust boundary.

1. Local (stdio transport)

The MCP server runs as a child process on the same machine as the agent client. Claude Desktop's filesystem and shell servers work this way; many Claude Code MCP integrations do too.

Good for: developer-local tools, anything that should never leave your laptop, prototyping.

Bad for: anything multi-user. Anything an agent in a CI runner needs to reach. Anything that needs persistent identity across sessions.
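For concreteness, a stdio server is wired up entirely in the client's config. This is the shape Claude Desktop uses in claude_desktop_config.json; the allowed-directory path is a placeholder for your own:

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/you/projects"]
    }
  }
}
```

The client spawns the process itself, which is why this pattern never needs DNS, TLS, or a host.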

2. Bundled with the client

The MCP server ships inside the agent — same binary, same process, same memory. Some Claude Code extensions work this way.

Good for: zero-config integrations where the server has no state of its own.

Bad for: servers that need their own credentials, persistent caches, or independent deploy cycles.

3. Edge / serverless

Deploy the server as a Cloudflare Worker, Vercel Function, AWS Lambda, or similar. HTTP transport. Scales to zero.

Good for: stateless MCP servers — wrappers around a public API where each call is independent. Cloudflare's MCP gateway pattern fits here.

Bad for: anything stateful. Cold starts hurt agent latency. Per-instance state is gone the moment the container recycles, so you end up rebuilding session context on every call. The platform's request-timeout limits collide with long tool calls.

4. Persistent server on a VM

Long-running process on a host you control. SSH in to deploy. Real DNS, real TLS, real disk. Sleeps when idle if your substrate supports it; otherwise runs 24/7.

Good for: production. Multi-agent. Stateful. Anything that touches credentials you care about.

Bad for: trivially stateless single-call wrappers where serverless cold starts don't matter. Use what fits.

Most teams start with pattern 1 (local), reach for pattern 4 (persistent VM) the moment they need to share, and adopt pattern 3 (serverless) for the specific subset of servers that fit it.

What changes when MCP goes to production

Local-stdio is forgiving. Production is not. Five things stop being free:

Identity and addressing. Multiple agents need to hit the same server. The server needs a DNS name they all know. That name needs TLS. The server needs to be running when they call.

State that survives restarts. Auth tokens you negotiated with GitHub. Cached results from expensive API calls. Vector indexes you spent ten minutes building. Any of those gone after a redeploy is a regression.

Observability. When an agent does something surprising, you need the tool-call history. Stdout into the void is fine for prototypes; production needs structured logs you can query.

Concurrency. A single agent burst can hit a server with dozens of parallel tool calls. The server has to either handle it or queue it without dropping requests.
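That queue-or-handle behaviour can be sketched with a small concurrency limiter. The Limiter class and the limit of 8 below are illustrative, not part of any MCP SDK:

```typescript
// Sketch of queue-don't-drop: cap in-flight tool calls at a fixed limit
// and park the rest until a slot frees up.
class Limiter {
  private active = 0;
  private waiters: Array<() => void> = [];

  constructor(private readonly limit: number) {}

  async run<T>(task: () => Promise<T>): Promise<T> {
    // Park until a slot is free; re-check after waking in case
    // another caller grabbed the slot first.
    while (this.active >= this.limit) {
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    }
    this.active++;
    try {
      return await task();
    } finally {
      this.active--;
      this.waiters.shift()?.(); // wake one queued caller, if any
    }
  }
}

// Usage: wrap every tool-call handler so a burst queues instead of failing.
const limiter = new Limiter(8);
const runTool = (fn: () => Promise<string>) => limiter.run(fn);
```

A burst of 50 calls then runs 8 at a time instead of overwhelming the process or returning errors the agent has to reason about.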

Auth and isolation. A shared MCP server with credentials for production systems is a credential vault. Treat it that way: authenticated transport, per-client identity, blast radius limits.
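A minimal version of "authenticated transport" on HTTP is a bearer token checked in constant time. A hedged sketch; the function name is made up, and a real deployment would load the expected token from the environment rather than pass it around:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Check an Authorization header against an expected bearer token.
// HMAC both sides first so the comparison is fixed-length and
// timingSafeEqual never throws on mismatched lengths.
function authorized(header: string | undefined, expected: string): boolean {
  if (!header?.startsWith("Bearer ")) return false;
  const presented = header.slice("Bearer ".length);
  const key = "compare-key"; // any fixed key works; it only normalizes length
  const a = createHmac("sha256", key).update(presented).digest();
  const b = createHmac("sha256", key).update(expected).digest();
  return timingSafeEqual(a, b);
}
```

Per-client identity then falls out of issuing each client its own token, which also gives you the blast-radius limit: revoke one token, not all of them.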

What a production MCP server host actually needs

Five properties. They map to the five production concerns above.

  1. Persistent process and disk. The server's state lives across restarts. Caches survive. Negotiated tokens survive. Indexes don't get rebuilt on every deploy.
  2. Reachable on a stable address. Real DNS, real TLS, no NAT punching. Agents come back to the same hostname every time.
  3. Cheap when idle. Most MCP servers spend 90% of their time waiting. The bill should reflect that.
  4. Single-tenant by default. When the server holds your GitHub PAT, your Notion key, and your DB password, single-tenant is not a premium feature. It's the default.
  5. Self-hostable. For sensitive integrations, running the server on infrastructure you own needs to be a real option, not a sales conversation.

What we built

boxd is a substrate that has all five properties. It's a self-hostable persistent VM platform shaped around long-lived agent workloads, and MCP servers are exactly that.

  • Persistent VMs. The MCP server process runs and keeps running. Disk persists. Caches persist. Tokens persist.
  • Per-VM public IPv4 + DNS + TLS. Every VM gets a routable address. Point your mcp.example.com at it. Agents reach it the same way every time.
  • Costs nothing while it sleeps. Idle VMs hibernate. Sub-millisecond wake when an agent connects. See pricing — you're not paying for compute that isn't doing anything.
  • Single-tenant by design. Your VM is your VM. Per-VM cgroups, per-VM network, per-VM disk. No noisy neighbours sharing your credential vault.
  • Self-hostable. boxd is one Rust binary. Drop it on any host with KVM. Run it in your cloud, your datacenter, your country.

A working remote MCP server on boxd looks roughly like this:

ssh mcp.example.boxd.sh
# install your MCP server
npm install -g @your-org/mcp-server-internal
# run it under systemd or your supervisor of choice, bound to localhost
# expose via boxd's reverse proxy at mcp.example.boxd.sh
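The "run it under systemd" step might look like the unit below. The unit name, binary path, and flags are placeholders for whatever your package actually installs:

```ini
# /etc/systemd/system/mcp-internal.service -- illustrative unit file
[Unit]
Description=Internal MCP server
After=network-online.target
Wants=network-online.target

[Service]
# Bind to localhost only; the reverse proxy in front terminates TLS.
ExecStart=/usr/local/bin/mcp-server-internal --host 127.0.0.1 --port 8080
Restart=on-failure
User=mcp
Environment=NODE_ENV=production

[Install]
WantedBy=multi-user.target
```

systemd gives you the restart-on-crash and boot-time-start behaviour that a bare `npm start` in a tmux session doesn't.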

Your Claude Code or Claude Desktop config points at https://mcp.example.boxd.sh. The server holds its credentials, caches its state, and stays reachable. When no one's connected, the VM sleeps. When the next call arrives, it wakes in under a millisecond and serves the request.
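Client-side, the wiring is one config entry. The exact key names vary by client and version (check your client's docs); for Claude Code, a project-level .mcp.json looks roughly like this:

```json
{
  "mcpServers": {
    "internal": {
      "type": "http",
      "url": "https://mcp.example.boxd.sh"
    }
  }
}
```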

If you need to test a new version of the server without breaking the live one, fork the VM in 60 milliseconds. Run the new build on the fork, point a staging client at it, validate, then promote.

Patterns by team size

Solo developer. One VM. Run all your personal MCP servers on it. Reverse-proxy by subpath. Costs roughly nothing while you're not using it.

Small team. One VM per shared MCP server, or one shared VM with per-server systemd units. Either way you keep state, you keep auth, you keep observability. Add a second VM for staging when you need to test changes against real agents.

Engineering org / regulated team. Self-hosted boxd in your VPC or your own datacenter. Per-team VMs. Per-environment VMs. Identity tied to SSH keys you already manage. The MCP servers that touch internal systems run on infrastructure you fully own — no vendor in the request path.

The architecture is the same across all three. What changes is how many VMs and where the substrate runs.

When not to use a persistent VM for MCP

Persistence is the right default for most MCP servers, not all of them.

  • Truly stateless wrapper around a public API. No credentials to hold, no cache to warm, no cost to a cold start. Cloudflare Workers or a Vercel Function fits. Lower ops surface, near-zero cost.
  • One-off scratch servers for experimentation. Run them locally over stdio. No need for a host.
  • Servers your laptop can answer. If the only client is your local Claude Code session, local stdio is the simplest answer. Reach for remote hosting when you have more than one client or you want it reachable when the laptop's closed.

Match the host to the workload. Use the right pattern for each server, not one pattern for all of them.

What to take from this

MCP server hosting is not one problem. It's four overlapping ones — local convenience, edge scale, persistent state, self-host control. Most teams need a mix. The ones running real production agents end up with at least one server in pattern 4, the persistent VM, because that's where stateful, credentialed, multi-agent workloads belong.

When you reach for that pattern, the substrate matters. Persistence, reachability, cheap idle, single-tenancy, self-host — those are the five things the host has to give you for free, so you can spend your time on the server itself.

That's what boxd is for. SSH in. Run your MCP server. It stays. If you're shopping persistent-microVM substrates, see boxd vs sprites.dev. If your team has compliance pressure, self-hosted agent execution for engineering teams covers the GDPR / DSGVO angle.


Last verified: 2026-05-04. This article is informational, not legal advice. Send corrections to hello@boxd.sh.

Frequently asked

What is the Model Context Protocol (MCP)?
An open spec from Anthropic that defines how AI agents talk to external tools, resources, and prompts. The MCP server is a process that exposes those capabilities to a model client. Think of it as a uniform protocol for the model-to-tool interface.
Can I run an MCP server on Cloudflare Workers or Vercel?
For stateless servers (a wrapper around a public API where every call is independent), yes — serverless is fine and scales to zero. For stateful servers (caches, persistent auth tokens, indexes, DB connections), serverless cold starts and per-instance state loss make this painful. Use a persistent VM instead.
How is hosting an MCP server different from hosting a regular API?
Three things: it's long-lived (sessions persist across many tool calls), it's bursty (idle for hours, then 50 calls in 20 seconds), and it usually holds credentials for the systems it integrates with — which makes single-tenancy a baseline, not a premium feature.
Can I self-host an MCP server?
Yes. Self-host the MCP server on infrastructure you control. boxd is one Rust binary on KVM, so you can run an MCP server inside your VPC or in your own datacenter. For regulated teams, this is often the only acceptable path.
What's the cheapest way to host an MCP server?
Local stdio is free for development. For production, the cost depends on duty cycle. A persistent VM that sleeps when idle (boxd, sprites.dev) bills near zero when no agent is connected, which fits the bursty pattern of most MCP workloads. A flat-rate VPS works but pays for compute the server isn't using.
Can multiple agents share an MCP server?
Yes — that's the production pattern. The server holds state (caches, tokens) across agents. Authenticated transport (OAuth or bearer tokens) is required when more than one client connects.


Try it now

No signup. No install. Just SSH.

$ ssh boxd.sh

Built by Azin Tech in Amsterdam. Open for early access.