
MCP Server Architecture Explained

Building a production MCP server for Memory Spine taught me the intricacies of JSON-RPC over stdio, tool registries, and resource management. Here's what 32 MCP tools and 50K+ daily requests revealed about scalable server architecture.


1. What is MCP? (And Why It Matters)

The Model Context Protocol (MCP) is Anthropic's standardization of how AI clients communicate with external tools and data sources. Instead of every AI company inventing their own tool integration format, MCP provides a unified protocol that works across Claude, GitHub Copilot CLI, and any other MCP-compatible client.

At its core, MCP is JSON-RPC 2.0 over stdio or Server-Sent Events (SSE). That's it. No custom transport, no proprietary serialization. Just structured JSON messages flowing between a client (like Claude Desktop) and a server (like Memory Spine's MCP endpoint).

Why MCP exists: The tool integration mess

Before MCP, every AI platform had its own way to define tools: OpenAI function calling, plugin manifests, and assorted one-off JSON formats.

If you built a tool for one platform, porting it to another meant rewriting the interface layer. MCP solves this by standardizing the protocol layer, not just the schema format.

MCP's key insight: The transport layer matters more than the schema. JSON-RPC over stdio means any language can implement an MCP server with ~50 lines of code.

2. MCP Server Anatomy: Transport > Handler > Registry

Every MCP server has the same three-layer architecture, regardless of implementation language:

Transport Layer: JSON-RPC over stdio

The transport layer handles the physical communication. For stdio-based servers (the most common), this means:

# Client sends to server's stdin:
{"jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": {...}}

# Server responds via stdout:
{"jsonrpc": "2.0", "id": 1, "result": {...}}

The beauty is simplicity. No HTTP servers, no WebSocket management, no connection pooling. The client launches your server as a subprocess, communicates over pipes, and kills it when done. Stateless by design.
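That round trip can be driven entirely from the standard library. Here is a minimal client-side sketch; the inline SERVER_CODE is a toy stand-in for a real server script so the snippet runs on its own:

```python
import json
import subprocess
import sys

# Toy stand-in for a real MCP server script: reads one JSON-RPC
# request from stdin and answers an empty tool list on stdout.
SERVER_CODE = """
import json, sys
req = json.loads(sys.stdin.readline())
print(json.dumps({"jsonrpc": "2.0", "id": req["id"],
                  "result": {"tools": []}}), flush=True)
"""

# Launch the server as a subprocess with pipes for stdin/stdout
proc = subprocess.Popen(
    [sys.executable, "-c", SERVER_CODE],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)

# Send one JSON-RPC request per line on the server's stdin
request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
proc.stdin.write(json.dumps(request) + "\n")
proc.stdin.flush()

# Read the matching response line from the server's stdout
response = json.loads(proc.stdout.readline())
print(response["result"])  # {'tools': []}

proc.stdin.close()
proc.wait()
```

This is exactly what Claude Desktop does on your behalf: spawn the process, write request lines, read response lines, and terminate the subprocess when the session ends.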

Handler Layer: Message parsing and validation

The handler layer parses JSON-RPC messages, validates schemas, and routes to appropriate methods:

import json

class MCPHandler:
    def handle_message(self, message: str) -> dict:
        # Parse the raw JSON-RPC message
        request = json.loads(message)
        method = request['method']
        params = request.get('params', {})

        # Route to the matching handler
        if method == 'tools/call':
            return self.call_tool(params)
        elif method == 'tools/list':
            return self.list_tools()
        elif method == 'resources/list':
            return self.list_resources()
        # ...etc
        raise ValueError(f"Unknown method: {method}")

Registry Layer: Tool and resource management

The registry maintains your server's capabilities — tools, resources, and prompts. This is where the business logic lives:

class ToolRegistry:
    def __init__(self):
        self.tools = {}
        self.resources = {}

    def register_tool(self, name, func, schema):
        self.tools[name] = {
            'function': func,
            'schema': schema
        }

    def call_tool(self, name, args):
        if name not in self.tools:
            raise KeyError(f"Unknown tool: {name}")
        tool = self.tools[name]
        return tool['function'](**args)
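A quick usage sketch of such a registry (the class is repeated so the snippet runs standalone, and the echo tool is a hypothetical example):

```python
class ToolRegistry:
    def __init__(self):
        self.tools = {}

    def register_tool(self, name, func, schema):
        self.tools[name] = {'function': func, 'schema': schema}

    def call_tool(self, name, args):
        if name not in self.tools:
            raise KeyError(f"Unknown tool: {name}")
        return self.tools[name]['function'](**args)

registry = ToolRegistry()

# Register a hypothetical echo tool along with its JSON Schema
registry.register_tool(
    'echo',
    lambda text: {'content': [{'type': 'text', 'text': text}]},
    {'type': 'object',
     'properties': {'text': {'type': 'string'}},
     'required': ['text']},
)

result = registry.call_tool('echo', {'text': 'hello'})
print(result['content'][0]['text'])  # hello
```

Keeping the function and its schema side by side in one entry means `tools/list` and `tools/call` can both be served from the same data structure.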

3. Tool Definition Format & Schema Validation

MCP tools use JSON Schema Draft 2020-12 for parameter validation. The format is similar to OpenAI function calling but with MCP-specific extensions:

{
  "name": "memory_search",
  "description": "Search stored memories by semantic similarity",
  "inputSchema": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "Search query"
      },
      "limit": {
        "type": "integer", 
        "default": 10,
        "minimum": 1,
        "maximum": 100
      }
    },
    "required": ["query"],
    "additionalProperties": false
  }
}

Key differences from OpenAI function calling:

- The parameter schema lives under inputSchema rather than parameters
- Tool results are structured content blocks ({"type": "text", ...}) rather than plain strings
- Tools are discovered at runtime via tools/list instead of being declared with every request

Advanced schema patterns

For complex tools, MCP supports advanced JSON Schema features:

{
  "name": "memory_batch_store",
  "inputSchema": {
    "type": "object", 
    "properties": {
      "memories": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "content": {"type": "string"},
            "tags": {"type": "array", "items": {"type": "string"}},
            "metadata": {"type": "object"}
          },
          "required": ["content"]
        }
      }
    },
    "required": ["memories"]
  }
}

🔧 Schema Validation Performance

Memory Spine processes 50K+ tool calls daily. JSON Schema validation adds ~0.2ms per call — negligible overhead for the safety it provides. We've caught 847 malformed tool calls this month that would have caused runtime errors.
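To make that validation step concrete, here is a hand-rolled checker covering just the required, type, minimum, and maximum keywords. It's a simplified stand-in; a production server would delegate this to a full JSON Schema validator:

```python
def validate_args(schema: dict, args: dict) -> None:
    """Check tool arguments against a small subset of JSON Schema:
    required keys, basic types, integer bounds, additionalProperties."""
    type_map = {'string': str, 'integer': int, 'object': dict, 'array': list}

    # Every required key must be present
    for key in schema.get('required', []):
        if key not in args:
            raise ValueError(f"Missing required argument: {key}")

    for key, value in args.items():
        prop = schema.get('properties', {}).get(key)
        if prop is None:
            # Reject unknown keys when additionalProperties is false
            if not schema.get('additionalProperties', True):
                raise ValueError(f"Unexpected argument: {key}")
            continue
        expected = type_map.get(prop.get('type'))
        if expected and not isinstance(value, expected):
            raise ValueError(f"{key} must be {prop['type']}")
        if 'minimum' in prop and value < prop['minimum']:
            raise ValueError(f"{key} below minimum {prop['minimum']}")
        if 'maximum' in prop and value > prop['maximum']:
            raise ValueError(f"{key} above maximum {prop['maximum']}")

# Validate a call against the memory_search schema from above
schema = {
    'type': 'object',
    'properties': {
        'query': {'type': 'string'},
        'limit': {'type': 'integer', 'minimum': 1, 'maximum': 100},
    },
    'required': ['query'],
    'additionalProperties': False,
}
validate_args(schema, {'query': 'deploy notes', 'limit': 10})  # passes
```

Running this before dispatching to the tool function is what turns a malformed call into a clean JSON-RPC error instead of a runtime exception.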

4. Resource & Prompt Primitives

Beyond tools, MCP defines two other primitives:

Resources: Static data access

Resources provide read-only access to data sources. Unlike tools (which are function calls), resources are addressed by URI and simply read. Since MCP is JSON-RPC, both operations are ordinary method calls:

# Client requests the resource list:
{"jsonrpc": "2.0", "id": 2, "method": "resources/list"}
> {"resources": [{"uri": "memory://search/recent", "name": "Recent Memories"}]}

# Client reads a specific resource:
{"jsonrpc": "2.0", "id": 3, "method": "resources/read", "params": {"uri": "memory://search/recent"}}
> {"contents": [{"uri": "memory://search/recent", "text": "Last 10 memories..."}]}

Think of resources as MCP's equivalent of file paths or URLs — ways for clients to reference static data without making function calls.
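Server-side, the two resource methods can be sketched like this, with the memory:// URI from above serving as a hypothetical example backed by an in-memory table:

```python
# Hypothetical in-memory resource table keyed by URI
RESOURCES = {
    'memory://search/recent': {
        'name': 'Recent Memories',
        'text': 'Last 10 memories...',
    }
}

def list_resources() -> dict:
    """Handle resources/list: advertise the available URIs."""
    return {
        'resources': [
            {'uri': uri, 'name': entry['name']}
            for uri, entry in RESOURCES.items()
        ]
    }

def read_resource(params: dict) -> dict:
    """Handle resources/read: return the content for one URI."""
    uri = params['uri']
    if uri not in RESOURCES:
        raise ValueError(f"Unknown resource: {uri}")
    return {'contents': [{'uri': uri, 'text': RESOURCES[uri]['text']}]}

print(read_resource({'uri': 'memory://search/recent'}))
```

In a real server the table would be populated dynamically, but the shape of the two handlers stays the same.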

Prompts: Reusable prompt templates

Prompts are parameterized templates that clients can invoke:

{
  "name": "analyze_codebase",
  "description": "Generate codebase analysis prompt with context",
  "arguments": [
    {"name": "repo_path", "description": "Path to repository"}
  ]
}

# Client invokes the prompt:
{"jsonrpc": "2.0", "id": 4, "method": "prompts/get", "params": {"name": "analyze_codebase", "arguments": {"repo_path": "/app"}}}
> {
  "description": "Analyze codebase structure and patterns", 
  "messages": [
    {"role": "user", "content": "Analyze the codebase at /app..."}
  ]
}

Prompts are useful for complex, reusable prompt engineering that you want to centralize server-side rather than hardcode in clients.
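A sketch of the matching prompts/get handler, assuming a hypothetical PROMPT_TEMPLATES table mapping prompt names to (description, format string) pairs:

```python
# Hypothetical template registry: prompt name -> (description, template)
PROMPT_TEMPLATES = {
    'analyze_codebase': (
        'Analyze codebase structure and patterns',
        'Analyze the codebase at {repo_path}. '
        'Summarize its structure and patterns.',
    )
}

def get_prompt(params: dict) -> dict:
    """Handle prompts/get: expand a template with the given arguments."""
    name = params['name']
    args = params.get('arguments', {})
    description, template = PROMPT_TEMPLATES[name]
    return {
        'description': description,
        'messages': [
            {'role': 'user', 'content': template.format(**args)}
        ]
    }

result = get_prompt({'name': 'analyze_codebase',
                     'arguments': {'repo_path': '/app'}})
print(result['messages'][0]['content'])
```

Because the template lives server-side, updating the prompt text is a server deploy, not a change to every client.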

5. Building a Custom MCP Server in Python

Let's build a minimal but complete MCP server. This example implements a simple key-value store with three tools:

#!/usr/bin/env python3
import json
import sys
from typing import Dict, Any

class SimpleMCPServer:
    def __init__(self):
        self.storage: Dict[str, Any] = {}
        
    def handle_stdin(self):
        """Main server loop: one JSON-RPC message per line on stdin"""
        for line in sys.stdin:
            request = {}
            try:
                request = json.loads(line.strip())
                response = self.handle_request(request)
                if response is not None:
                    print(json.dumps(response), flush=True)
            except Exception as e:
                error_response = {
                    "jsonrpc": "2.0",
                    "id": request.get("id"),
                    "error": {"code": -32603, "message": str(e)}
                }
                print(json.dumps(error_response), flush=True)

    def handle_request(self, request: dict) -> dict:
        """Route JSON-RPC requests to handlers (returns None for notifications)"""
        method = request["method"]
        params = request.get("params", {})
        request_id = request.get("id")

        # Notifications carry no id and must not receive a response
        if request_id is None:
            return None

        if method == "initialize":
            result = self.initialize(params)
        elif method == "tools/list":
            result = self.list_tools()
        elif method == "tools/call":
            result = self.call_tool(params)
        else:
            raise Exception(f"Unknown method: {method}")

        return {
            "jsonrpc": "2.0",
            "id": request_id,
            "result": result
        }
        
    def initialize(self, params: dict) -> dict:
        """MCP initialization handshake"""
        return {
            "protocolVersion": "2024-11-05",
            "capabilities": {
                "tools": {}
            },
            "serverInfo": {
                "name": "simple-kv-server",
                "version": "1.0.0"
            }
        }
        
    def list_tools(self) -> dict:
        """Return available tools"""
        return {
            "tools": [
                {
                    "name": "kv_set",
                    "description": "Store a key-value pair",
                    "inputSchema": {
                        "type": "object",
                        "properties": {
                            "key": {"type": "string"},
                            "value": {"type": "string"}
                        },
                        "required": ["key", "value"]
                    }
                },
                {
                    "name": "kv_get",
                    "description": "Retrieve value for a key",
                    "inputSchema": {
                        "type": "object",
                        "properties": {
                            "key": {"type": "string"}
                        },
                        "required": ["key"]
                    }
                },
                {
                    "name": "kv_list",
                    "description": "List all stored keys",
                    "inputSchema": {
                        "type": "object",
                        "properties": {}
                    }
                }
            ]
        }
        
    def call_tool(self, params: dict) -> dict:
        """Execute a tool call"""
        name = params["name"]
        args = params.get("arguments", {})
        
        if name == "kv_set":
            self.storage[args["key"]] = args["value"]
            return {
                "content": [
                    {
                        "type": "text", 
                        "text": f"Stored {args['key']} = {args['value']}"
                    }
                ]
            }
        elif name == "kv_get":
            key = args["key"]
            if key in self.storage:
                return {
                    "content": [
                        {
                            "type": "text",
                            "text": f"{key} = {self.storage[key]}"
                        }
                    ]
                }
            else:
                return {
                    "content": [
                        {
                            "type": "text",
                            "text": f"Key '{key}' not found"
                        }
                    ]
                }
        elif name == "kv_list":
            keys = list(self.storage.keys())
            return {
                "content": [
                    {
                        "type": "text",
                        "text": f"Stored keys: {', '.join(keys) if keys else 'none'}"
                    }
                ]
            }
        else:
            raise Exception(f"Unknown tool: {name}")

if __name__ == "__main__":
    server = SimpleMCPServer()
    server.handle_stdin()

Save this as simple-mcp-server.py, make it executable, and you have a working MCP server. The key insights: stdout carries only JSON-RPC (anything else belongs on stderr), every response is flushed immediately so the client isn't left waiting on a buffered pipe, notifications (messages without an id) get no reply, and all state lives in process memory for the lifetime of the subprocess.

6. Connecting to Claude, Copilot & Other Clients

To connect your MCP server to AI clients, you need to register it in the client's configuration:

Claude Desktop configuration

Edit %APPDATA%\Claude\claude_desktop_config.json (Windows) or ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):

{
  "mcpServers": {
    "simple-kv": {
      "command": "python",
      "args": ["/path/to/simple-mcp-server.py"]
    }
  }
}

GitHub Copilot CLI configuration

For the GitHub Copilot CLI, edit ~/.copilot/mcp-config.json:

{
  "mcpServers": {
    "simple-kv": {
      "url": "stdio://python /path/to/simple-mcp-server.py"
    }
  }
}

Environment and security considerations

When deploying MCP servers in production, use absolute paths for both the interpreter and the script, inject secrets through environment variables rather than command-line arguments (arguments are visible in the process list), and keep diagnostics on stderr:

# Production MCP server configuration
{
  "mcpServers": {
    "memory-spine": {
      "command": "/usr/bin/python3",
      "args": ["/opt/memory-spine/mcp-server.py"],
      "env": {
        "MEMORY_SPINE_API_KEY": "${MEMORY_SPINE_API_KEY}",
        "LOG_LEVEL": "INFO"
      }
    }
  }
}
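On the server side, consuming that configuration might look like the sketch below, which fails fast when the API key is missing. The variable names match the config above; load_config itself is a hypothetical helper:

```python
import os
import sys

def load_config() -> dict:
    """Read server configuration from the environment, failing fast
    on missing secrets rather than erroring mid-request."""
    api_key = os.environ.get("MEMORY_SPINE_API_KEY")
    if not api_key:
        # Report on stderr, never stdout: stdout is reserved for JSON-RPC
        print("MEMORY_SPINE_API_KEY is not set", file=sys.stderr)
        sys.exit(1)
    return {
        "api_key": api_key,
        "log_level": os.environ.get("LOG_LEVEL", "INFO"),
    }
```

Failing at startup means the client sees a clean launch error instead of a server that dies halfway through its first tool call.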

7. Debugging with MCP Inspector

Anthropic provides the MCP Inspector — a web-based debugging tool for MCP servers. It's invaluable during development:

# Run the MCP Inspector against your server (requires Node.js)
npx @modelcontextprotocol/inspector python /path/to/simple-mcp-server.py

The inspector opens a web UI at localhost:5173 where you can browse the tool list, invoke tools with hand-crafted arguments, read resources and prompts, and watch the raw JSON-RPC traffic in both directions.

⚠️ Inspector Security

MCP Inspector runs your server with full permissions in the current environment. Don't use it with production credentials or in shared environments — it's a development tool only.

Advanced debugging techniques

For complex servers, add structured logging to stderr:

import logging
import sys
import time

# Configure logging to stderr (stdout is for JSON-RPC)
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    stream=sys.stderr
)

logger = logging.getLogger(__name__)

def call_tool(self, params: dict) -> dict:
    logger.info(f"Tool call: {params['name']} with args {params.get('arguments', {})}")
    start_time = time.time()
    
    try:
        result = self._execute_tool(params)
        duration = time.time() - start_time
        logger.info(f"Tool {params['name']} completed in {duration:.3f}s")
        return result
    except Exception as e:
        logger.error(f"Tool {params['name']} failed: {e}")
        raise

8. Real Example: Memory Spine's 32-Tool MCP Server

Memory Spine's MCP server is a production example handling 50K+ daily requests across 32 tools. Here's the high-level architecture:

📊 Memory Spine MCP Stats

32 tools across 6 categories: memory operations, search, analytics, knowledge graphs, conversation tracking, and agent handoff. Average response time: 47ms. 99.97% uptime over the last 6 months.

Tool categories in production

Category          Tools   Usage %   Avg Latency
Memory Ops          8       45%        23ms
Search              6       32%        78ms
Analytics           5       12%       156ms
Knowledge Graph     7        8%        91ms
Conversation        4        2%        34ms
Agent Handoff       2        1%        67ms

Key architectural decisions

Async tool execution — Search and analytics tools run async to prevent blocking:

async def call_tool_async(self, name: str, args: dict) -> dict:
    if name in self.async_tools:
        return await self.execute_async_tool(name, args)
    else:
        return self.execute_sync_tool(name, args)

Connection pooling — Database connections are pooled and reused across requests:

class MemorySpineMCPServer:
    def __init__(self):
        self.db_pool = None

    async def start(self):
        # create_pool must be awaited inside a running event loop
        self.db_pool = await asyncpg.create_pool(
            DATABASE_URL, min_size=5, max_size=20
        )

Tool result caching — Expensive operations like analytics are cached for 5 minutes:

def memory_analytics(self, cache_key: str) -> dict:
    cached = self._analytics_cache.get(cache_key)
    if cached and time.time() < cached[0]:
        return cached[1]
    result = self._compute_analytics()  # expensive analytics computation
    self._analytics_cache[cache_key] = (time.time() + 300, result)  # 5-min TTL
    return result

Graceful degradation — When dependencies fail, tools return partial results rather than errors:

def memory_search(self, query: str, limit: int = 10) -> dict:
    fallback_used = False
    try:
        results = self.vector_search(query, limit)
    except VectorDBException:
        # Fallback to text search
        results = self.text_search(query, limit)
        fallback_used = True

    return {"memories": results, "fallback_used": fallback_used}

Build Your Own MCP Server

Ready to integrate your tools with the MCP ecosystem? Our starter template gets you running in minutes.

Download MCP Template →