mcp-builder — Skillopedia

MCP Builder

Purpose

Use this skill to build high-quality MCP servers that let agents interact with external APIs and services through well-designed tools, resources, and prompts.

Development Phases

1. Research MCP design and the target API. Understand authentication, objects, rate limits, pagination, common workflows, and destructive operations.
2. Plan the tool surface. Balance broad API coverage with higher-level workflow tools when those workflows are common and valuable.
3. Implement infrastructure: API client, auth, error handling, response formatting, pagination, and transport.
4. Imple…

)\n age: int = Field(..., description=\"User's age\", ge=0, le=150)\n\n @field_validator('email')\n @classmethod\n def validate_email(cls, v: str) -> str:\n if not v.strip():\n raise ValueError(\"Email cannot be empty\")\n return v.lower()\n```\n\n## Response Format Options\n\nSupport multiple output formats for flexibility:\n\n```python\nfrom enum import Enum\n\nclass ResponseFormat(str, Enum):\n '''Output format for tool responses.'''\n MARKDOWN = \"markdown\"\n JSON = \"json\"\n\nclass UserSearchInput(BaseModel):\n query: str = Field(..., description=\"Search query\")\n response_format: ResponseFormat = Field(\n default=ResponseFormat.MARKDOWN,\n description=\"Output format: 'markdown' for human-readable or 'json' for machine-readable\"\n )\n```\n\n**Markdown format**:\n- Use headers, lists, and formatting for clarity\n- Convert timestamps to human-readable format (e.g., \"2024-01-15 10:30:00 UTC\" instead of epoch)\n- Show display names with IDs in parentheses (e.g., \"@john.doe (U123456)\")\n- Omit verbose metadata (e.g., show only one profile image URL, not all sizes)\n- Group related information logically\n\n**JSON format**:\n- Return complete, structured data suitable for programmatic processing\n- Include all available fields and metadata\n- Use consistent field names and types\n\n## Pagination Implementation\n\nFor tools that list resources:\n\n```python\nclass ListInput(BaseModel):\n limit: Optional[int] = Field(default=20, description=\"Maximum results to return\", ge=1, le=100)\n offset: Optional[int] = Field(default=0, description=\"Number of results to skip for pagination\", ge=0)\n\nasync def list_items(params: ListInput) -> str:\n # Make API request with pagination\n data = await api_request(limit=params.limit, offset=params.offset)\n\n # Return pagination info\n response = {\n \"total\": data[\"total\"],\n \"count\": len(data[\"items\"]),\n \"offset\": params.offset,\n \"items\": data[\"items\"],\n \"has_more\": data[\"total\"] > params.offset + 
len(data[\"items\"]),\n \"next_offset\": params.offset + len(data[\"items\"]) if data[\"total\"] > params.offset + len(data[\"items\"]) else None\n }\n return json.dumps(response, indent=2)\n```\n\n## Error Handling\n\nProvide clear, actionable error messages:\n\n```python\ndef _handle_api_error(e: Exception) -> str:\n '''Consistent error formatting across all tools.'''\n if isinstance(e, httpx.HTTPStatusError):\n if e.response.status_code == 404:\n return \"Error: Resource not found. Please check the ID is correct.\"\n elif e.response.status_code == 403:\n return \"Error: Permission denied. You don't have access to this resource.\"\n elif e.response.status_code == 429:\n return \"Error: Rate limit exceeded. Please wait before making more requests.\"\n return f\"Error: API request failed with status {e.response.status_code}\"\n elif isinstance(e, httpx.TimeoutException):\n return \"Error: Request timed out. Please try again.\"\n return f\"Error: Unexpected error occurred: {type(e).__name__}\"\n```\n\n## Shared Utilities\n\nExtract common functionality into reusable functions:\n\n```python\n# Shared API request function\nasync def _make_api_request(endpoint: str, method: str = \"GET\", **kwargs) -> dict:\n '''Reusable function for all API calls.'''\n async with httpx.AsyncClient() as client:\n response = await client.request(\n method,\n f\"{API_BASE_URL}/{endpoint}\",\n timeout=30.0,\n **kwargs\n )\n response.raise_for_status()\n return response.json()\n```\n\n## Async/Await Best Practices\n\nAlways use async/await for network requests and I/O operations:\n\n```python\n# Good: Async network request\nasync def fetch_data(resource_id: str) -> dict:\n async with httpx.AsyncClient() as client:\n response = await client.get(f\"{API_URL}/resource/{resource_id}\")\n response.raise_for_status()\n return response.json()\n\n# Bad: Synchronous request\ndef fetch_data(resource_id: str) -> dict:\n response = requests.get(f\"{API_URL}/resource/{resource_id}\") # Blocks\n return 
response.json()\n```\n\n## Type Hints\n\nUse type hints throughout:\n\n```python\nfrom typing import Optional, List, Dict, Any\n\nasync def get_user(user_id: str) -> Dict[str, Any]:\n data = await fetch_user(user_id)\n return {\"id\": data[\"id\"], \"name\": data[\"name\"]}\n```\n\n## Tool Docstrings\n\nEvery tool must have comprehensive docstrings with explicit type information:\n\n```python\nasync def search_users(params: UserSearchInput) -> str:\n '''\n Search for users in the Example system by name, email, or team.\n\n This tool searches across all user profiles in the Example platform,\n supporting partial matches and various search filters. It does NOT\n create or modify users, only searches existing ones.\n\n Args:\n params (UserSearchInput): Validated input parameters containing:\n - query (str): Search string to match against names/emails (e.g., \"john\", \"@example.com\", \"team:marketing\")\n - limit (Optional[int]): Maximum results to return, between 1-100 (default: 20)\n - offset (Optional[int]): Number of results to skip for pagination (default: 0)\n\n Returns:\n str: JSON-formatted string containing search results with the following schema:\n\n Success response:\n {\n \"total\": int, # Total number of matches found\n \"count\": int, # Number of results in this response\n \"offset\": int, # Current pagination offset\n \"users\": [\n {\n \"id\": str, # User ID (e.g., \"U123456789\")\n \"name\": str, # Full name (e.g., \"John Doe\")\n \"email\": str, # Email address (e.g., \"[email protected]\")\n \"team\": str # Team name (e.g., \"Marketing\") - optional\n }\n ]\n }\n\n Error response:\n \"Error: \u003cerror message>\" or \"No users found matching '\u003cquery>'\"\n\n Examples:\n - Use when: \"Find all marketing team members\" -> params with query=\"team:marketing\"\n - Use when: \"Search for John's account\" -> params with query=\"john\"\n - Don't use when: You need to create a user (use example_create_user instead)\n - Don't use when: You have a user 
ID and need full details (use example_get_user instead)\n\n Error Handling:\n - Input validation errors are handled by Pydantic model\n - Returns \"Error: Rate limit exceeded\" if too many requests (429 status)\n - Returns \"Error: Invalid API authentication\" if API key is invalid (401 status)\n - Returns formatted list of results or \"No users found matching 'query'\"\n '''\n```\n\n## Complete Example\n\nSee below for a complete Python MCP server example:\n\n```python\n#!/usr/bin/env python3\n'''\nMCP Server for Example Service.\n\nThis server provides tools to interact with Example API, including user search,\nproject management, and data export capabilities.\n'''\n\nfrom typing import Optional, List, Dict, Any\nfrom enum import Enum\nimport httpx\nfrom pydantic import BaseModel, Field, field_validator, ConfigDict\nfrom mcp.server.fastmcp import FastMCP\n\n# Initialize the MCP server\nmcp = FastMCP(\"example_mcp\")\n\n# Constants\nAPI_BASE_URL = \"https://api.example.com/v1\"\n\n# Enums\nclass ResponseFormat(str, Enum):\n '''Output format for tool responses.'''\n MARKDOWN = \"markdown\"\n JSON = \"json\"\n\n# Pydantic Models for Input Validation\nclass UserSearchInput(BaseModel):\n '''Input model for user search operations.'''\n model_config = ConfigDict(\n str_strip_whitespace=True,\n validate_assignment=True\n )\n\n query: str = Field(..., description=\"Search string to match against names/emails\", min_length=2, max_length=200)\n limit: Optional[int] = Field(default=20, description=\"Maximum results to return\", ge=1, le=100)\n offset: Optional[int] = Field(default=0, description=\"Number of results to skip for pagination\", ge=0)\n response_format: ResponseFormat = Field(default=ResponseFormat.MARKDOWN, description=\"Output format\")\n\n @field_validator('query')\n @classmethod\n def validate_query(cls, v: str) -> str:\n if not v.strip():\n raise ValueError(\"Query cannot be empty or whitespace only\")\n return v.strip()\n\n# Shared utility functions\nasync 
def _make_api_request(endpoint: str, method: str = \"GET\", **kwargs) -> dict:\n    '''Reusable function for all API calls.'''\n    async with httpx.AsyncClient() as client:\n        response = await client.request(\n            method,\n            f\"{API_BASE_URL}/{endpoint}\",\n            timeout=30.0,\n            **kwargs\n        )\n        response.raise_for_status()\n        return response.json()\n\ndef _handle_api_error(e: Exception) -> str:\n    '''Consistent error formatting across all tools.'''\n    if isinstance(e, httpx.HTTPStatusError):\n        if e.response.status_code == 404:\n            return \"Error: Resource not found. Please check the ID is correct.\"\n        elif e.response.status_code == 403:\n            return \"Error: Permission denied. You don't have access to this resource.\"\n        elif e.response.status_code == 429:\n            return \"Error: Rate limit exceeded. Please wait before making more requests.\"\n        return f\"Error: API request failed with status {e.response.status_code}\"\n    elif isinstance(e, httpx.TimeoutException):\n        return \"Error: Request timed out. Please try again.\"\n    return f\"Error: Unexpected error occurred: {type(e).__name__}\"\n\n# Tool definitions\n@mcp.tool(\n    name=\"example_search_users\",\n    annotations={\n        \"title\": \"Search Example Users\",\n        \"readOnlyHint\": True,\n        \"destructiveHint\": False,\n        \"idempotentHint\": True,\n        \"openWorldHint\": True\n    }\n)\nasync def example_search_users(params: UserSearchInput) -> str:\n    '''Search for users in the Example system by name, email, or team.\n\n    [Full docstring as shown above]\n    '''\n    try:\n        # Make API request using validated parameters\n        data = await _make_api_request(\n            \"users/search\",\n            params={\n                \"q\": params.query,\n                \"limit\": params.limit,\n                \"offset\": params.offset\n            }\n        )\n\n        users = data.get(\"users\", [])\n        total = data.get(\"total\", 0)\n\n        if not users:\n            return f\"No users found matching '{params.query}'\"\n\n        # Format response based on requested format\n        if params.response_format == ResponseFormat.MARKDOWN:\n            lines = [f\"# User Search Results: '{params.query}'\", 
\"\"]\n            lines.append(f\"Found {total} users (showing {len(users)})\")\n            lines.append(\"\")\n\n            for user in users:\n                lines.append(f\"## {user['name']} ({user['id']})\")\n                lines.append(f\"- **Email**: {user['email']}\")\n                if user.get('team'):\n                    lines.append(f\"- **Team**: {user['team']}\")\n                lines.append(\"\")\n\n            return \"\\n\".join(lines)\n\n        else:\n            # Machine-readable JSON format\n            import json\n            response = {\n                \"total\": total,\n                \"count\": len(users),\n                \"offset\": params.offset,\n                \"users\": users\n            }\n            return json.dumps(response, indent=2)\n\n    except Exception as e:\n        return _handle_api_error(e)\n\nif __name__ == \"__main__\":\n    mcp.run()\n```\n\n---\n\n## Advanced FastMCP Features\n\n### Context Parameter Injection\n\nFastMCP can automatically inject a `Context` parameter into tools for advanced capabilities like logging, progress reporting, resource reading, and user interaction:\n\n```python\nfrom pydantic import BaseModel, Field\nfrom mcp.server.fastmcp import FastMCP, Context\n\nmcp = FastMCP(\"example_mcp\")\n\n@mcp.tool()\nasync def advanced_search(query: str, ctx: Context) -> str:\n    '''Advanced tool with context access for logging and progress.'''\n\n    # Report progress for long operations\n    await ctx.report_progress(progress=0.25, total=1.0, message=\"Starting search...\")\n\n    # Log information for debugging\n    await ctx.info(f\"Processing query: {query}\")\n\n    # Perform search\n    results = await search_api(query)\n    await ctx.report_progress(progress=0.75, total=1.0, message=\"Formatting results...\")\n\n    # Access server configuration\n    server_name = ctx.fastmcp.name\n\n    return format_results(results)\n\nclass ApiKeyInput(BaseModel):\n    '''Schema for the value requested from the user.'''\n    api_key: str = Field(description=\"API key for the Example service\")\n\n@mcp.tool()\nasync def interactive_tool(resource_id: str, ctx: Context) -> str:\n    '''Tool that can request additional input from users.'''\n\n    # Request additional input when needed; elicit() takes a message and a Pydantic schema\n    result = await ctx.elicit(\n        message=\"Please provide your API key:\",\n        schema=ApiKeyInput\n    )\n    if result.action != \"accept\" or result.data is None:\n        return \"Error: No API key provided. The request was declined or cancelled.\"\n\n    # Use the provided key\n    return await api_call(resource_id, result.data.api_key)\n```\n\n**Context 
capabilities:**\n- `ctx.report_progress(progress, total, message)` - Report progress for long operations\n- `ctx.info(message)` / `ctx.error(message)` / `ctx.debug(message)` - Logging\n- `ctx.elicit(message, schema)` - Request input from users\n- `ctx.fastmcp.name` - Access server configuration\n- `ctx.read_resource(uri)` - Read MCP resources\n\n### Resource Registration\n\nExpose data as resources for efficient, template-based access:\n\n```python\n@mcp.resource(\"file://documents/{name}\")\nasync def get_document(name: str) -> str:\n    '''Expose documents as MCP resources.\n\n    Resources are useful for static or semi-static data that doesn't\n    require complex parameters. They use URI templates for flexible access.\n    '''\n    document_path = f\"./docs/{name}\"\n    with open(document_path, \"r\") as f:\n        return f.read()\n\n@mcp.resource(\"config://settings/{key}\")\nasync def get_setting(key: str, ctx: Context) -> str:\n    '''Expose configuration as resources with context.'''\n    settings = await load_settings()\n    return json.dumps(settings.get(key, {}))\n```\n\n**When to use Resources vs Tools:**\n- **Resources**: For data access with simple parameters (URI templates)\n- **Tools**: For complex operations with validation and business logic\n\n### Structured Output Types\n\nFastMCP supports multiple return types beyond strings:\n\n```python\nfrom datetime import datetime\nfrom typing import Any, Dict, TypedDict\nfrom pydantic import BaseModel\n\n# TypedDict for structured returns\nclass UserData(TypedDict):\n    id: str\n    name: str\n    email: str\n\n@mcp.tool()\nasync def get_user_typed(user_id: str) -> UserData:\n    '''Returns structured data - FastMCP handles serialization.'''\n    return {\"id\": user_id, \"name\": \"John Doe\", \"email\": \"[email protected]\"}\n\n# Pydantic models for complex validation\nclass DetailedUser(BaseModel):\n    id: str\n    name: str\n    email: str\n    created_at: datetime\n    metadata: Dict[str, Any]\n\n@mcp.tool()\nasync def get_user_detailed(user_id: 
str) -> DetailedUser:\n    '''Returns Pydantic model - automatically generates schema.'''\n    user = await fetch_user(user_id)\n    return DetailedUser(**user)\n```\n\n### Lifespan Management\n\nInitialize resources that persist across requests:\n\n```python\nfrom contextlib import asynccontextmanager\n\n@asynccontextmanager\nasync def app_lifespan(server: FastMCP):\n    '''Manage resources that live for the server's lifetime.'''\n    # Initialize connections, load config, etc.\n    db = await connect_to_database()\n    config = load_configuration()\n\n    try:\n        # Make available to all tools\n        yield {\"db\": db, \"config\": config}\n    finally:\n        # Cleanup on shutdown\n        await db.close()\n\nmcp = FastMCP(\"example_mcp\", lifespan=app_lifespan)\n\n@mcp.tool()\nasync def query_data(query: str, ctx: Context) -> str:\n    '''Access lifespan resources through context.'''\n    db = ctx.request_context.lifespan_context[\"db\"]\n    results = await db.query(query)\n    return format_results(results)\n```\n\n### Transport Options\n\nFastMCP supports two main transport mechanisms:\n\n```python\n# stdio transport (for local tools) - default\nif __name__ == \"__main__\":\n    mcp.run()\n\n# Streamable HTTP transport (for remote servers)\n# The port is a server setting: FastMCP(\"example_mcp\", port=8000)\nif __name__ == \"__main__\":\n    mcp.run(transport=\"streamable-http\")\n```\n\n**Transport selection:**\n- **stdio**: Command-line tools, local integrations, subprocess execution\n- **Streamable HTTP**: Web services, remote access, multiple clients\n\n---\n\n## Code Best Practices\n\n### Code Composability and Reusability\n\nYour implementation MUST prioritize composability and code reuse:\n\n1. **Extract Common Functionality**:\n   - Create reusable helper functions for operations used across multiple tools\n   - Build shared API clients for HTTP requests instead of duplicating code\n   - Centralize error handling logic in utility functions\n   - Extract business logic into dedicated functions that can be composed\n   - Extract shared markdown or JSON field selection & formatting functionality\n\n2. 
**Avoid Duplication**:\n - NEVER copy-paste similar code between tools\n - If you find yourself writing similar logic twice, extract it into a function\n - Common operations like pagination, filtering, field selection, and formatting should be shared\n - Authentication/authorization logic should be centralized\n\n### Python-Specific Best Practices\n\n1. **Use Type Hints**: Always include type annotations for function parameters and return values\n2. **Pydantic Models**: Define clear Pydantic models for all input validation\n3. **Avoid Manual Validation**: Let Pydantic handle input validation with constraints\n4. **Proper Imports**: Group imports (standard library, third-party, local)\n5. **Error Handling**: Use specific exception types (httpx.HTTPStatusError, not generic Exception)\n6. **Async Context Managers**: Use `async with` for resources that need cleanup\n7. **Constants**: Define module-level constants in UPPER_CASE\n\n## Quality Checklist\n\nBefore finalizing your Python MCP server implementation, ensure:\n\n### Strategic Design\n- [ ] Tools enable complete workflows, not just API endpoint wrappers\n- [ ] Tool names reflect natural task subdivisions\n- [ ] Response formats optimize for agent context efficiency\n- [ ] Human-readable identifiers used where appropriate\n- [ ] Error messages guide agents toward correct usage\n\n### Implementation Quality\n- [ ] FOCUSED IMPLEMENTATION: Most important and valuable tools implemented\n- [ ] All tools have descriptive names and documentation\n- [ ] Return types are consistent across similar operations\n- [ ] Error handling is implemented for all external calls\n- [ ] Server name follows format: `{service}_mcp`\n- [ ] All network operations use async/await\n- [ ] Common functionality is extracted into reusable functions\n- [ ] Error messages are clear, actionable, and educational\n- [ ] Outputs are properly validated and formatted\n\n### Tool Configuration\n- [ ] All tools implement 'name' and 'annotations' in the 
decorator\n- [ ] Annotations correctly set (readOnlyHint, destructiveHint, idempotentHint, openWorldHint)\n- [ ] All tools use Pydantic BaseModel for input validation with Field() definitions\n- [ ] All Pydantic Fields have explicit types and descriptions with constraints\n- [ ] All tools have comprehensive docstrings with explicit input/output types\n- [ ] Docstrings include complete schema structure for dict/JSON returns\n- [ ] Pydantic models handle input validation (no manual validation needed)\n\n### Advanced Features (where applicable)\n- [ ] Context injection used for logging, progress, or elicitation\n- [ ] Resources registered for appropriate data endpoints\n- [ ] Lifespan management implemented for persistent connections\n- [ ] Structured output types used (TypedDict, Pydantic models)\n- [ ] Appropriate transport configured (stdio or streamable HTTP)\n\n### Code Quality\n- [ ] File includes proper imports including Pydantic imports\n- [ ] Pagination is properly implemented where applicable\n- [ ] Filtering options are provided for potentially large result sets\n- [ ] All async functions are properly defined with `async def`\n- [ ] HTTP client usage follows async patterns with proper context managers\n- [ ] Type hints are used throughout the code\n- [ ] Constants are defined at module level in UPPER_CASE\n\n### Testing\n- [ ] Server runs successfully: `python your_server.py --help`\n- [ ] All imports resolve correctly\n- [ ] Sample tool calls work as expected\n- [ ] Error scenarios handled gracefully","content_type":"text/markdown; charset=utf-8","language":"markdown","size":25099,"content_sha256":"2da52f77e675191014ca2e146a4b95aa04d0ca7dd7e2b100322df15ade685e80"},{"filename":"scripts/connections.py","content":"\"\"\"Lightweight connection handling for MCP servers.\"\"\"\n\nfrom abc import ABC, abstractmethod\nfrom contextlib import AsyncExitStack\nfrom typing import Any\n\nfrom mcp import ClientSession, StdioServerParameters\nfrom mcp.client.sse import 
sse_client\nfrom mcp.client.stdio import stdio_client\nfrom mcp.client.streamable_http import streamablehttp_client\n\n\nclass MCPConnection(ABC):\n \"\"\"Base class for MCP server connections.\"\"\"\n\n def __init__(self):\n self.session = None\n self._stack = None\n\n @abstractmethod\n def _create_context(self):\n \"\"\"Create the connection context based on connection type.\"\"\"\n\n async def __aenter__(self):\n \"\"\"Initialize MCP server connection.\"\"\"\n self._stack = AsyncExitStack()\n await self._stack.__aenter__()\n\n try:\n ctx = self._create_context()\n result = await self._stack.enter_async_context(ctx)\n\n if len(result) == 2:\n read, write = result\n elif len(result) == 3:\n read, write, _ = result\n else:\n raise ValueError(f\"Unexpected context result: {result}\")\n\n session_ctx = ClientSession(read, write)\n self.session = await self._stack.enter_async_context(session_ctx)\n await self.session.initialize()\n return self\n except BaseException:\n await self._stack.__aexit__(None, None, None)\n raise\n\n async def __aexit__(self, exc_type, exc_val, exc_tb):\n \"\"\"Clean up MCP server connection resources.\"\"\"\n if self._stack:\n await self._stack.__aexit__(exc_type, exc_val, exc_tb)\n self.session = None\n self._stack = None\n\n async def list_tools(self) -> list[dict[str, Any]]:\n \"\"\"Retrieve available tools from the MCP server.\"\"\"\n response = await self.session.list_tools()\n return [\n {\n \"name\": tool.name,\n \"description\": tool.description,\n \"input_schema\": tool.inputSchema,\n }\n for tool in response.tools\n ]\n\n async def call_tool(self, tool_name: str, arguments: dict[str, Any]) -> Any:\n \"\"\"Call a tool on the MCP server with provided arguments.\"\"\"\n result = await self.session.call_tool(tool_name, arguments=arguments)\n return result.content\n\n\nclass MCPConnectionStdio(MCPConnection):\n \"\"\"MCP connection using standard input/output.\"\"\"\n\n def __init__(self, command: str, args: list[str] = None, env: 
dict[str, str] = None):\n super().__init__()\n self.command = command\n self.args = args or []\n self.env = env\n\n def _create_context(self):\n return stdio_client(\n StdioServerParameters(command=self.command, args=self.args, env=self.env)\n )\n\n\nclass MCPConnectionSSE(MCPConnection):\n \"\"\"MCP connection using Server-Sent Events.\"\"\"\n\n def __init__(self, url: str, headers: dict[str, str] = None):\n super().__init__()\n self.url = url\n self.headers = headers or {}\n\n def _create_context(self):\n return sse_client(url=self.url, headers=self.headers)\n\n\nclass MCPConnectionHTTP(MCPConnection):\n \"\"\"MCP connection using Streamable HTTP.\"\"\"\n\n def __init__(self, url: str, headers: dict[str, str] = None):\n super().__init__()\n self.url = url\n self.headers = headers or {}\n\n def _create_context(self):\n return streamablehttp_client(url=self.url, headers=self.headers)\n\n\ndef create_connection(\n transport: str,\n command: str = None,\n args: list[str] = None,\n env: dict[str, str] = None,\n url: str = None,\n headers: dict[str, str] = None,\n) -> MCPConnection:\n \"\"\"Factory function to create the appropriate MCP connection.\n\n Args:\n transport: Connection type (\"stdio\", \"sse\", or \"http\")\n command: Command to run (stdio only)\n args: Command arguments (stdio only)\n env: Environment variables (stdio only)\n url: Server URL (sse and http only)\n headers: HTTP headers (sse and http only)\n\n Returns:\n MCPConnection instance\n \"\"\"\n transport = transport.lower()\n\n if transport == \"stdio\":\n if not command:\n raise ValueError(\"Command is required for stdio transport\")\n return MCPConnectionStdio(command=command, args=args, env=env)\n\n elif transport == \"sse\":\n if not url:\n raise ValueError(\"URL is required for sse transport\")\n return MCPConnectionSSE(url=url, headers=headers)\n\n elif transport in [\"http\", \"streamable_http\", \"streamable-http\"]:\n if not url:\n raise ValueError(\"URL is required for http 
transport\")\n return MCPConnectionHTTP(url=url, headers=headers)\n\n else:\n raise ValueError(f\"Unsupported transport type: {transport}. Use 'stdio', 'sse', or 'http'\")\n","content_type":"text/x-python; charset=utf-8","language":"python","size":4875,"content_sha256":"9403668a2041568772082a8b334122c1f88daf0541fb393af4522d0094a47a6e"},{"filename":"scripts/evaluation.py","content":"\"\"\"MCP Server Evaluation Harness\n\nThis script evaluates MCP servers by running test questions against them using Claude.\n\"\"\"\n\nimport argparse\nimport asyncio\nimport json\nimport re\nimport sys\nimport time\nimport traceback\nimport xml.etree.ElementTree as ET\nfrom pathlib import Path\nfrom typing import Any\n\nfrom anthropic import Anthropic\n\nfrom connections import create_connection\n\nEVALUATION_PROMPT = \"\"\"You are an AI assistant with access to tools.\n\nWhen given a task, you MUST:\n1. Use the available tools to complete the task\n2. Provide summary of each step in your approach, wrapped in \u003csummary> tags\n3. Provide feedback on the tools provided, wrapped in \u003cfeedback> tags\n4. Provide your final response, wrapped in \u003cresponse> tags\n\nSummary Requirements:\n- In your \u003csummary> tags, you must explain:\n - The steps you took to complete the task\n - Which tools you used, in what order, and why\n - The inputs you provided to each tool\n - The outputs you received from each tool\n - A summary for how you arrived at the response\n\nFeedback Requirements:\n- In your \u003cfeedback> tags, provide constructive feedback on the tools:\n - Comment on tool names: Are they clear and descriptive?\n - Comment on input parameters: Are they well-documented? Are required vs optional parameters clear?\n - Comment on descriptions: Do they accurately describe what the tool does?\n - Comment on any errors encountered during tool usage: Did the tool fail to execute? 
Did the tool return too many tokens?\n - Identify specific areas for improvement and explain WHY they would help\n - Be specific and actionable in your suggestions\n\nResponse Requirements:\n- Your response should be concise and directly address what was asked\n- Always wrap your final response in \u003cresponse> tags\n- If you cannot solve the task return \u003cresponse>NOT_FOUND\u003c/response>\n- For numeric responses, provide just the number\n- For IDs, provide just the ID\n- For names or text, provide the exact text requested\n- Your response should go last\"\"\"\n\n\ndef parse_evaluation_file(file_path: Path) -> list[dict[str, Any]]:\n \"\"\"Parse XML evaluation file with qa_pair elements.\"\"\"\n try:\n tree = ET.parse(file_path)\n root = tree.getroot()\n evaluations = []\n\n for qa_pair in root.findall(\".//qa_pair\"):\n question_elem = qa_pair.find(\"question\")\n answer_elem = qa_pair.find(\"answer\")\n\n if question_elem is not None and answer_elem is not None:\n evaluations.append({\n \"question\": (question_elem.text or \"\").strip(),\n \"answer\": (answer_elem.text or \"\").strip(),\n })\n\n return evaluations\n except Exception as e:\n print(f\"Error parsing evaluation file {file_path}: {e}\")\n return []\n\n\ndef extract_xml_content(text: str, tag: str) -> str | None:\n \"\"\"Extract content from XML tags.\"\"\"\n pattern = rf\"\u003c{tag}>(.*?)\u003c/{tag}>\"\n matches = re.findall(pattern, text, re.DOTALL)\n return matches[-1].strip() if matches else None\n\n\nasync def agent_loop(\n client: Anthropic,\n model: str,\n question: str,\n tools: list[dict[str, Any]],\n connection: Any,\n) -> tuple[str, dict[str, Any]]:\n \"\"\"Run the agent loop with MCP tools.\"\"\"\n messages = [{\"role\": \"user\", \"content\": question}]\n\n response = await asyncio.to_thread(\n client.messages.create,\n model=model,\n max_tokens=4096,\n system=EVALUATION_PROMPT,\n messages=messages,\n tools=tools,\n )\n\n messages.append({\"role\": \"assistant\", \"content\": 
response.content})\n\n    tool_metrics = {}\n\n    while response.stop_reason == \"tool_use\":\n        # One assistant turn can contain several tool_use blocks; each needs a tool_result\n        tool_results = []\n        for tool_use in [block for block in response.content if block.type == \"tool_use\"]:\n            tool_name = tool_use.name\n            tool_input = tool_use.input\n\n            tool_start_ts = time.time()\n            try:\n                tool_result = await connection.call_tool(tool_name, tool_input)\n                # default=str keeps MCP content objects from raising during serialization\n                tool_response = json.dumps(tool_result, default=str) if isinstance(tool_result, (dict, list)) else str(tool_result)\n            except Exception as e:\n                tool_response = f\"Error executing tool {tool_name}: {str(e)}\\n\"\n                tool_response += traceback.format_exc()\n            tool_duration = time.time() - tool_start_ts\n\n            if tool_name not in tool_metrics:\n                tool_metrics[tool_name] = {\"count\": 0, \"durations\": []}\n            tool_metrics[tool_name][\"count\"] += 1\n            tool_metrics[tool_name][\"durations\"].append(tool_duration)\n\n            tool_results.append({\n                \"type\": \"tool_result\",\n                \"tool_use_id\": tool_use.id,\n                \"content\": tool_response,\n            })\n\n        messages.append({\"role\": \"user\", \"content\": tool_results})\n\n        response = await asyncio.to_thread(\n            client.messages.create,\n            model=model,\n            max_tokens=4096,\n            system=EVALUATION_PROMPT,\n            messages=messages,\n            tools=tools,\n        )\n        messages.append({\"role\": \"assistant\", \"content\": response.content})\n\n    # Fall back to an empty string so the XML extraction in evaluate_single_task never receives None\n    response_text = next(\n        (block.text for block in response.content if hasattr(block, \"text\")),\n        \"\",\n    )\n    return response_text, tool_metrics\n\n\nasync def evaluate_single_task(\n    client: Anthropic,\n    model: str,\n    qa_pair: dict[str, Any],\n    tools: list[dict[str, Any]],\n    connection: Any,\n    task_index: int,\n) -> dict[str, Any]:\n    \"\"\"Evaluate a single QA pair with the given tools.\"\"\"\n    start_time = time.time()\n\n    print(f\"Task {task_index + 1}: Running task with question: {qa_pair['question']}\")\n    response, tool_metrics = await agent_loop(client, model, qa_pair[\"question\"], tools, connection)\n\n    response_value = extract_xml_content(response, \"response\")\n    summary = extract_xml_content(response, \"summary\")\n    feedback = 
extract_xml_content(response, \"feedback\")\n\n duration_seconds = time.time() - start_time\n\n return {\n \"question\": qa_pair[\"question\"],\n \"expected\": qa_pair[\"answer\"],\n \"actual\": response_value,\n \"score\": int(response_value == qa_pair[\"answer\"]) if response_value else 0,\n \"total_duration\": duration_seconds,\n \"tool_calls\": tool_metrics,\n \"num_tool_calls\": sum(len(metrics[\"durations\"]) for metrics in tool_metrics.values()),\n \"summary\": summary,\n \"feedback\": feedback,\n }\n\n\nREPORT_HEADER = \"\"\"\n# Evaluation Report\n\n## Summary\n\n- **Accuracy**: {correct}/{total} ({accuracy:.1f}%)\n- **Average Task Duration**: {average_duration_s:.2f}s\n- **Average Tool Calls per Task**: {average_tool_calls:.2f}\n- **Total Tool Calls**: {total_tool_calls}\n\n---\n\"\"\"\n\nTASK_TEMPLATE = \"\"\"\n### Task {task_num}\n\n**Question**: {question}\n**Ground Truth Answer**: `{expected_answer}`\n**Actual Answer**: `{actual_answer}`\n**Correct**: {correct_indicator}\n**Duration**: {total_duration:.2f}s\n**Tool Calls**: {tool_calls}\n\n**Summary**\n{summary}\n\n**Feedback**\n{feedback}\n\n---\n\"\"\"\n\n\nasync def run_evaluation(\n eval_path: Path,\n connection: Any,\n model: str = \"claude-3-7-sonnet-20250219\",\n) -> str:\n \"\"\"Run evaluation with MCP server tools.\"\"\"\n print(\"🚀 Starting Evaluation\")\n\n client = Anthropic()\n\n tools = await connection.list_tools()\n print(f\"📋 Loaded {len(tools)} tools from MCP server\")\n\n qa_pairs = parse_evaluation_file(eval_path)\n print(f\"📋 Loaded {len(qa_pairs)} evaluation tasks\")\n\n results = []\n for i, qa_pair in enumerate(qa_pairs):\n print(f\"Processing task {i + 1}/{len(qa_pairs)}\")\n result = await evaluate_single_task(client, model, qa_pair, tools, connection, i)\n results.append(result)\n\n correct = sum(r[\"score\"] for r in results)\n accuracy = (correct / len(results)) * 100 if results else 0\n average_duration_s = sum(r[\"total_duration\"] for r in results) / len(results) if 
results else 0\n average_tool_calls = sum(r[\"num_tool_calls\"] for r in results) / len(results) if results else 0\n total_tool_calls = sum(r[\"num_tool_calls\"] for r in results)\n\n report = REPORT_HEADER.format(\n correct=correct,\n total=len(results),\n accuracy=accuracy,\n average_duration_s=average_duration_s,\n average_tool_calls=average_tool_calls,\n total_tool_calls=total_tool_calls,\n )\n\n report += \"\".join([\n TASK_TEMPLATE.format(\n task_num=i + 1,\n question=qa_pair[\"question\"],\n expected_answer=qa_pair[\"answer\"],\n actual_answer=result[\"actual\"] or \"N/A\",\n correct_indicator=\"✅\" if result[\"score\"] else \"❌\",\n total_duration=result[\"total_duration\"],\n tool_calls=json.dumps(result[\"tool_calls\"], indent=2),\n summary=result[\"summary\"] or \"N/A\",\n feedback=result[\"feedback\"] or \"N/A\",\n )\n for i, (qa_pair, result) in enumerate(zip(qa_pairs, results))\n ])\n\n return report\n\n\ndef parse_headers(header_list: list[str]) -> dict[str, str]:\n \"\"\"Parse header strings in format 'Key: Value' into a dictionary.\"\"\"\n headers = {}\n if not header_list:\n return headers\n\n for header in header_list:\n if \":\" in header:\n key, value = header.split(\":\", 1)\n headers[key.strip()] = value.strip()\n else:\n print(f\"Warning: Ignoring malformed header: {header}\")\n return headers\n\n\ndef parse_env_vars(env_list: list[str]) -> dict[str, str]:\n \"\"\"Parse environment variable strings in format 'KEY=VALUE' into a dictionary.\"\"\"\n env = {}\n if not env_list:\n return env\n\n for env_var in env_list:\n if \"=\" in env_var:\n key, value = env_var.split(\"=\", 1)\n env[key.strip()] = value.strip()\n else:\n print(f\"Warning: Ignoring malformed environment variable: {env_var}\")\n return env\n\n\nasync def main():\n parser = argparse.ArgumentParser(\n description=\"Evaluate MCP servers using test questions\",\n formatter_class=argparse.RawDescriptionHelpFormatter,\n epilog=\"\"\"\nExamples:\n # Evaluate a local stdio MCP server\n 
python evaluation.py -t stdio -c python -a my_server.py eval.xml\n\n # Evaluate an SSE MCP server\n python evaluation.py -t sse -u https://example.com/mcp -H \"Authorization: Bearer token\" eval.xml\n\n # Evaluate an HTTP MCP server with custom model\n python evaluation.py -t http -u https://example.com/mcp -m claude-3-5-sonnet-20241022 eval.xml\n \"\"\",\n )\n\n parser.add_argument(\"eval_file\", type=Path, help=\"Path to evaluation XML file\")\n parser.add_argument(\"-t\", \"--transport\", choices=[\"stdio\", \"sse\", \"http\"], default=\"stdio\", help=\"Transport type (default: stdio)\")\n parser.add_argument(\"-m\", \"--model\", default=\"claude-3-7-sonnet-20250219\", help=\"Claude model to use (default: claude-3-7-sonnet-20250219)\")\n\n stdio_group = parser.add_argument_group(\"stdio options\")\n stdio_group.add_argument(\"-c\", \"--command\", help=\"Command to run MCP server (stdio only)\")\n stdio_group.add_argument(\"-a\", \"--args\", nargs=\"+\", help=\"Arguments for the command (stdio only)\")\n stdio_group.add_argument(\"-e\", \"--env\", nargs=\"+\", help=\"Environment variables in KEY=VALUE format (stdio only)\")\n\n remote_group = parser.add_argument_group(\"sse/http options\")\n remote_group.add_argument(\"-u\", \"--url\", help=\"MCP server URL (sse/http only)\")\n remote_group.add_argument(\"-H\", \"--header\", nargs=\"+\", dest=\"headers\", help=\"HTTP headers in 'Key: Value' format (sse/http only)\")\n\n parser.add_argument(\"-o\", \"--output\", type=Path, help=\"Output file for evaluation report (default: stdout)\")\n\n args = parser.parse_args()\n\n if not args.eval_file.exists():\n print(f\"Error: Evaluation file not found: {args.eval_file}\")\n sys.exit(1)\n\n headers = parse_headers(args.headers) if args.headers else None\n env_vars = parse_env_vars(args.env) if args.env else None\n\n try:\n connection = create_connection(\n transport=args.transport,\n command=args.command,\n args=args.args,\n env=env_vars,\n url=args.url,\n 
headers=headers,\n )\n except ValueError as e:\n print(f\"Error: {e}\")\n sys.exit(1)\n\n print(f\"🔗 Connecting to MCP server via {args.transport}...\")\n\n async with connection:\n print(\"✅ Connected successfully\")\n report = await run_evaluation(args.eval_file, connection, args.model)\n\n if args.output:\n args.output.write_text(report)\n print(f\"\\n✅ Report saved to {args.output}\")\n else:\n print(\"\\n\" + report)\n\n\nif __name__ == \"__main__\":\n asyncio.run(main())\n","content_type":"text/x-python; charset=utf-8","language":"python","size":12579,"content_sha256":"49ed1d17cdce5da101b210197740713f49b935c29d4f339542a14b132658e6f7"},{"filename":"scripts/example_evaluation.xml","content":"\u003cevaluation>\n \u003cqa_pair>\n \u003cquestion>Calculate the compound interest on $10,000 invested at 5% annual interest rate, compounded monthly for 3 years. What is the final amount in dollars (rounded to 2 decimal places)?\u003c/question>\n \u003canswer>11614.72\u003c/answer>\n \u003c/qa_pair>\n \u003cqa_pair>\n \u003cquestion>A projectile is launched at a 45-degree angle with an initial velocity of 50 m/s. Calculate the total distance (in meters) it has traveled from the launch point after 2 seconds, assuming g=9.8 m/s². Round to 2 decimal places.\u003c/question>\n \u003canswer>87.25\u003c/answer>\n \u003c/qa_pair>\n \u003cqa_pair>\n \u003cquestion>A sphere has a volume of 500 cubic meters. Calculate its surface area in square meters. Round to 2 decimal places.\u003c/question>\n \u003canswer>304.65\u003c/answer>\n \u003c/qa_pair>\n \u003cqa_pair>\n \u003cquestion>Calculate the population standard deviation of this dataset: [12, 15, 18, 22, 25, 30, 35]. Round to 2 decimal places.\u003c/question>\n \u003canswer>7.61\u003c/answer>\n \u003c/qa_pair>\n \u003cqa_pair>\n \u003cquestion>Calculate the pH of a solution with a hydrogen ion concentration of 3.5 × 10^-5 M. 
Round to 2 decimal places.\u003c/question>\n \u003canswer>4.46\u003c/answer>\n \u003c/qa_pair>\n\u003c/evaluation>\n","content_type":"application/xml","language":"xml","size":1194,"content_sha256":"9272b348ddcc4b06ba562367ccd0770e018158c0068ac5116d5e34aaeff8777a"},{"filename":"scripts/requirements.txt","content":"anthropic>=0.39.0\nmcp>=1.1.0\n","content_type":"text/plain; charset=utf-8","language":null,"size":29,"content_sha256":"d5d7558b2368ecea9dfeed7d1fbc71ee9e0750bebd1282faa527d528a344c3c7"}],"content_json":{"type":"doc","content":[{"type":"heading","attrs":{"level":1},"content":[{"text":"MCP Builder","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Purpose","type":"text"}]},{"type":"paragraph","content":[{"text":"Use this skill to build high-quality MCP servers that let agents interact with external APIs and services through well-designed tools, resources, and prompts.","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Development Phases","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Research MCP design and the target API. Understand authentication, objects, rate limits, pagination, common workflows, and destructive operations.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Plan the tool surface. 
Balance broad API coverage with higher-level workflow tools when those workflows are common and valuable.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Implement infrastructure: API client, auth, error handling, response formatting, pagination, and transport.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Implement tools with clear names, schemas, descriptions, annotations, and structured outputs when supported.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Build and test with the MCP Inspector or equivalent client.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Create realistic read-only evaluations that prove agents can answer useful questions with the server.","type":"text"}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Design Rules","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Use descriptive, action-oriented tool names with consistent service prefixes.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Make input schemas strict and self-explanatory.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Return focused data; include pagination and filters for large collections.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Write actionable errors that tell the agent what to do next.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Mark destructive, read-only, idempotent, and open-world behavior with annotations.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Prefer TypeScript for broad SDK compatibility unless Python is better for the 
environment.","type":"text"}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Supporting Material","type":"text"}]},{"type":"paragraph","content":[{"text":"Read bundled MCP best practices first, then the TypeScript or Python implementation guide. Use the evaluation reference before declaring an MCP server production-ready.","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}}]},"metadata":{"date":"2026-06-05","name":"mcp-builder","tags":["mcp","model-context-protocol","tools","typescript","python"],"author":"@skillopedia","source":{"stars":145114,"repo_name":"skills","origin_url":"https://github.com/anthropics/skills/blob/HEAD/skills/mcp-builder/SKILL.md","repo_owner":"anthropics","body_sha256":"007b09beb3d9d01c7247df9eaa4709b4212f6751e2fc803c8bd468ac1de3becc","cluster_key":"9c749e86e79ce0704f1cec38c77f1999907d22abccc4f98b68b021fa3e0a79dd","clean_bundle":{"format":"clean-skill-bundle-v1","source":"anthropics/skills/skills/mcp-builder/SKILL.md","attachments":[{"id":"f2c2988f-6edc-5485-b2c5-7ceccf4e4dd4","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/f2c2988f-6edc-5485-b2c5-7ceccf4e4dd4/attachment.md","path":"reference/evaluation.md","size":21663,"sha256":"8c99479f8a2d22a636c38e274537aac3610879e26f34e0709825077c4576f427","contentType":"text/markdown; charset=utf-8"},{"id":"060ddb8a-f21e-528a-acc6-d749a41ed45a","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/060ddb8a-f21e-528a-acc6-d749a41ed45a/attachment.md","path":"reference/mcp_best_practices.md","size":7330,"sha256":"80fb4369a349447cf18ecdd7494fe7938b6065377e9f08c077cec411093a3007","contentType":"text/markdown; charset=utf-8"},{"id":"5a846d2c-0000-5e5c-b19a-9aa9201a85a5","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/5a846d2c-0000-5e5c-b19a-9aa9201a85a5/attachment.md","path":"reference/node_mcp_server.md","size":28550,"sha256":"c3ba35a4f599dd53be9c6555ae72c19a7bf412cd5426576c2c08d42755482c66","contentType":"text/markdown; 
charset=utf-8"},{"id":"3c1002de-47c4-5095-be3f-ff73cea62e90","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/3c1002de-47c4-5095-be3f-ff73cea62e90/attachment.md","path":"reference/python_mcp_server.md","size":25099,"sha256":"2da52f77e675191014ca2e146a4b95aa04d0ca7dd7e2b100322df15ade685e80","contentType":"text/markdown; charset=utf-8"},{"id":"84584a99-4a54-50c5-8642-e9e490aad96b","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/84584a99-4a54-50c5-8642-e9e490aad96b/attachment.py","path":"scripts/connections.py","size":4875,"sha256":"9403668a2041568772082a8b334122c1f88daf0541fb393af4522d0094a47a6e","contentType":"text/x-python; charset=utf-8"},{"id":"2bf7e2d1-8085-5e03-826e-9b55d1834abd","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/2bf7e2d1-8085-5e03-826e-9b55d1834abd/attachment.py","path":"scripts/evaluation.py","size":12579,"sha256":"49ed1d17cdce5da101b210197740713f49b935c29d4f339542a14b132658e6f7","contentType":"text/x-python; charset=utf-8"},{"id":"87e1e63b-588b-52a3-9a3a-0d802ab30953","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/87e1e63b-588b-52a3-9a3a-0d802ab30953/attachment.xml","path":"scripts/example_evaluation.xml","size":1194,"sha256":"9272b348ddcc4b06ba562367ccd0770e018158c0068ac5116d5e34aaeff8777a","contentType":"application/xml"},{"id":"a8dbd690-3d23-5f1c-a028-dd63cd2984d1","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/a8dbd690-3d23-5f1c-a028-dd63cd2984d1/attachment.txt","path":"scripts/requirements.txt","size":29,"sha256":"d5d7558b2368ecea9dfeed7d1fbc71ee9e0750bebd1282faa527d528a344c3c7","contentType":"text/plain; 
charset=utf-8"}],"bundle_sha256":"eb3e3140445d5a9e25e6e1e1c45c574b09ffda492e0564a6473ba22111ef071f","attachment_count":8,"text_attachments":8,"attachment_storage":"skillopedia-attachments-v1","binary_attachments":0,"excluded_attachments":[]},"cluster_size":22,"skill_md_path":"skills/mcp-builder/SKILL.md","import_metadata":{"date":"2026-06-05","author":"@skillopedia","version":"v1","category":"testing-qa","category_label":"Testing"},"exact_dupes_collapsed_into_this":21},"license":"See bundled LICENSE.txt.","version":"v1","category":"testing-qa","import_tag":"clean-skills-v1","description":"Design, implement, test, and evaluate Model Context Protocol servers for external APIs and services. Use when building MCP servers in TypeScript, Python, or another supported runtime."}},"renderedAt":1782980213817}
