AI · MCP · Anthropic · Architecture · API Design

MCP Servers in Production: Patterns From Real Deployments

Umut Korkmaz · 2026-04-15 · 8 min read

Model Context Protocol servers are one of the most practical ways to give an AI assistant structured access to the systems it needs. The spec is small, the surface is clean, and the payoff when done right is significant. The failure modes, however, are also specific. Most of the MCP servers I have seen in the wild either over-scope, under-document, or handle auth in a way that will eventually leak. The patterns below are what I have landed on after building and reviewing enough of them to know where the sharp edges live.

Transport Is a Design Decision, Not a Default

The protocol supports multiple transports, but teams default to stdio and rarely reconsider. Stdio is great for local-only servers a user owns end to end. It is the wrong choice the moment multiple users, shared hosting, or network boundaries enter the picture. HTTP with streaming makes sense when the server is hosted and the assistant is elsewhere, and SSE is the right answer when the server emits events the client should see immediately without polling.

The transport you pick shapes everything downstream. Stdio assumes trust. HTTP forces you to think about auth, rate limits, and connection lifecycle. Choose the transport to match the deployment model, not the other way around.
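As a minimal sketch of that decision, the rules above can be written down as an explicit mapping from deployment model to transport (the helper and its parameter names are illustrative, not part of any MCP SDK):

```python
from enum import Enum

class Transport(Enum):
    STDIO = "stdio"           # local process, single trusted user
    STREAMABLE_HTTP = "http"  # hosted server, assistant elsewhere
    SSE = "sse"               # server pushes events, no polling

def pick_transport(*, local_only: bool, single_user: bool,
                   server_push: bool) -> Transport:
    """Map the deployment model onto a transport, per the rules above."""
    if local_only and single_user:
        return Transport.STDIO
    if server_push:
        return Transport.SSE
    return Transport.STREAMABLE_HTTP
```

Making the choice a function rather than a default is the point: the moment `local_only` or `single_user` flips to false, stdio stops being an option.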

Scope the Tool Surface Deliberately

A tempting first version of an MCP server exposes everything the underlying system can do. "Here are all fifty endpoints" sounds useful, and then the assistant spends half its context window deciding which one to call. The servers that work in production expose a smaller surface tuned to real workflows, with each tool doing a meaningful unit of work.

The discipline is to ask, for every tool, whether the assistant can use it correctly without asking three clarifying questions first. If the tool requires filling out ten parameters the assistant cannot reasonably know, split it into a workflow of smaller steps, or accept a higher-level object that the server parses internally.
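That question can even be enforced mechanically. A rough lint, assuming a hypothetical registry mapping tool names to their required parameters, flags any tool the assistant likely cannot call without clarifying questions:

```python
def lint_tool_surface(tools: dict[str, list[str]],
                      max_required: int = 4) -> list[str]:
    """Flag tools whose required-parameter count suggests the assistant
    will need clarifying questions before it can call them correctly."""
    return [name for name, required in tools.items()
            if len(required) > max_required]

# Hypothetical surface: one over-scoped tool, one well-scoped one.
surface = {
    "create_report": ["region", "format", "period", "filters",
                      "grouping", "currency", "timezone"],
    "list_regions": [],
}
```

A flagged tool is a candidate for splitting into a workflow of smaller steps, or for accepting one higher-level object the server parses internally.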

Error Messages Are Part of the Protocol

When a tool fails, the assistant reads the error message and decides what to do next. Vague errors produce vague recoveries. A production-grade MCP server returns errors that are specific about what went wrong and what the caller can do about it. "Invalid input" is a dead end. "Missing field 'region'. Accepted values: us-east-1, eu-west-1, ap-south-1" lets the assistant recover without a round trip.

This sounds obvious, but it is consistently where I see the biggest improvement between first drafts and real deployments. The server authors know the system well, so they write terse errors. The assistant does not know the system, so it needs explicit guidance.
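One way to make this a habit is to build errors through a helper that refuses to be vague. A minimal sketch (the field and value names are the ones from the example above, not a real API):

```python
import json

def tool_error(field: str, accepted: list[str]) -> str:
    """Return an error the assistant can act on: name the missing
    field and enumerate the values it may supply on retry."""
    return json.dumps({
        "error": f"Missing field '{field}'.",
        "accepted_values": accepted,
        "hint": f"Retry with '{field}' set to one of the accepted values.",
    })
```

Because the accepted values travel with the error, the assistant can correct itself in the next call instead of bouncing the failure back to the user.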

Auth That Survives Contact With Users

If the MCP server reads or writes anything sensitive, auth has to exist on day one. The common mistake is embedding a single long-lived token in the configuration and trusting that nobody will copy it around. A cleaner pattern is per-user credentials, scoped tightly, rotated on a schedule, and logged at call sites so compromises are detectable.

If the server fronts a user-owned API, pass the auth through from the client. Do not collect and cache it on the server side. The fewer places credentials live, the fewer places they can leak.
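The pass-through pattern is small enough to show in full. A sketch, assuming the client's credential arrives as a standard `Authorization` header (the function name is illustrative):

```python
def forward_auth(client_headers: dict[str, str]) -> dict[str, str]:
    """Forward the caller's credential to the upstream API without
    storing it server-side; fail loudly if the caller sent none."""
    token = client_headers.get("Authorization")
    if not token:
        raise PermissionError(
            "Missing Authorization header; this server holds no "
            "fallback credential of its own.")
    return {"Authorization": token}
```

The deliberate absence of a fallback is the security property: if the server has no credential to leak, a compromised server leaks nothing durable.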

Observability Is Not Optional

MCP servers that live in production need the same observability story as any other service: structured logs, per-call timing, error-rate dashboards, and a way to replay failed calls without rebuilding state. The fact that the client is an AI assistant does not change this. If anything, it raises the bar, because the caller cannot tell you what went wrong except by telling the user, who will not forward you useful details.

I keep a simple rule: if a call fails and I cannot reconstruct what was attempted from logs alone, the observability is not done.
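That rule translates directly into a wrapper: every call emits one structured record carrying enough to reconstruct the attempt. A minimal sketch (log destination and field names are assumptions; a real deployment would ship these to its logging pipeline):

```python
import json
import time
import uuid

def log_tool_call(tool: str, args: dict, fn):
    """Run a tool call and emit one structured log line sufficient to
    reconstruct the attempt: call id, tool, args, duration, outcome."""
    record = {"call_id": str(uuid.uuid4()), "tool": tool, "args": args}
    start = time.monotonic()
    try:
        result = fn(**args)
        record["outcome"] = "ok"
        return result
    except Exception as exc:
        record["outcome"] = f"error: {exc}"
        raise
    finally:
        record["duration_ms"] = round((time.monotonic() - start) * 1000, 2)
        print(json.dumps(record))
```

Note that the arguments are logged before the call runs, so even a crash mid-call leaves a record of what was attempted.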

Rate Limits and Backpressure

Assistants are aggressive callers. They retry, they parallelize, they loop when a response looks ambiguous. A server that does not defend itself against this pattern will fall over on its first real day of use. Simple per-call rate limits at the tool boundary, with clear retry-after hints, prevent most of the damage.

Backpressure matters for long-running tools. If a tool normally takes 30 seconds, the server should support cancellation and progress reporting, and it should refuse to queue up infinite parallel runs from the same caller.
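A per-caller token bucket at the tool boundary covers the rate-limit half of this. A sketch (the class and its retry-after shape are illustrative, not from any MCP SDK):

```python
import time

class ToolRateLimiter:
    """Per-caller token bucket; a denied call returns a retry-after
    hint in seconds instead of a bare failure."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate          # tokens refilled per second
        self.burst = burst        # maximum bucket size
        self.tokens = float(burst)
        self.last = time.monotonic()

    def acquire(self) -> tuple[bool, float]:
        now = time.monotonic()
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True, 0.0
        return False, (1 - self.tokens) / self.rate
```

The retry-after value matters as much as the denial: it is the clear hint that lets the assistant wait instead of retrying in a tight loop.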

Documentation Lives in the Tool Descriptions

The assistant reads your tool descriptions the way a human reads a cheat sheet. Treat them as the primary documentation, not an afterthought. A well-described tool has one job, a clear input schema, an example or two inline, and a pointer toward the related tools the assistant might reach for next.

Poor descriptions are the single most common reason assistants misuse MCP servers. Fixing the descriptions is usually a bigger quality lever than changing the underlying behavior.
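Putting the pieces together, a description that follows these rules looks something like this (the tool, fields, and sibling tool `get_order` are hypothetical; the `inputSchema` key follows the MCP tool definition shape):

```python
SEARCH_ORDERS_TOOL = {
    "name": "search_orders",
    "description": (
        "Search customer orders by status and date range. "
        "Example: status='shipped', since='2026-01-01'. "
        "For a single order you already have the id of, "
        "prefer get_order instead."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "status": {"type": "string",
                       "enum": ["pending", "shipped", "delivered"]},
            "since": {"type": "string", "format": "date"},
        },
        "required": ["status"],
    },
}
```

One job, a tight schema, an inline example, and a pointer to the neighboring tool: everything the assistant needs is in the description itself.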

Know When Not to Use MCP

An MCP server is not always the right answer. For bulk data ingestion, a one-off HTTP call from the assistant is usually fine. For local-only utilities the user can script themselves, a shell wrapper is simpler. The right moment for an MCP server is when the assistant needs structured, authenticated, repeatable access to a system it will use across many conversations.

The servers I keep in production share that shape: narrow scope, clear auth, real observability, descriptions that teach. Everything else is either still an experiment or on its way out.