# MXCP - Enterprise MCP Server for AI Data Integration

> MXCP is a production-ready MCP (Model Context Protocol) framework that transforms any data source into AI-ready endpoints in minutes. Build enterprise-grade MCP servers using YAML, SQL, and Python with built-in authentication, monitoring, ETL, policy enforcement, evaluations, and guardrails. Designed for enterprises to integrate databases, APIs, and files with Claude Desktop and other MCP clients while maintaining security, governance, and audit compliance.

MXCP is developed by RAW Labs and provides enterprise-grade infrastructure for AI data integration. It enables organizations to connect their data sources to AI systems with production-ready features including data governance, security policies, audit logging, and quality controls. The framework supports dbt integration and offers comprehensive monitoring and drift detection capabilities.

Site Type: Product Documentation, Technical Documentation
Purpose: AI Data Integration Platform Documentation
Technology Stack: Python, MCP Protocol, Enterprise Data Infrastructure
Installation: `pip install mxcp`
License: Business Source License (BSL 1.1)

---

# MXCP Overview

MXCP is an enterprise-grade MCP (Model Context Protocol) framework that provides a **complete methodology** for building production-ready AI tools. More than just supporting SQL and Python, MXCP offers a structured approach to creating secure, testable, and governable AI applications.

## Why MXCP?

While other MCP servers focus on quick integrations, MXCP provides the **right way** to build production AI tools:

### The Structured Approach

1. **Data Quality First**: Start with proper data modeling using dbt
   - Create data models with clear schemas
   - Implement data quality tests
   - Build performance-optimized views
   - Document your data contracts
2. **Service Design**: Plan before you build
   - Define comprehensive type systems
   - Design security policies upfront
   - Create clear API contracts
   - Structure your endpoints logically
3. **Smart Implementation**: Use the right tool for the job
   - SQL for data queries and aggregations
   - Python for complex logic and integrations
   - Combine both for complete solutions
4. **Quality Assurance**: Test at every level (see the CLI sketch after this list)
   - Validate structure and schemas
   - Test functionality with real data
   - Lint for LLM comprehension
   - Evaluate AI behavior safety
5. **Production Operations**: Deploy with confidence
   - Monitor schema drift
   - Track every operation
   - Analyze performance
   - Scale securely
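Steps 4 and 5 map directly onto the MXCP CLI. A minimal sketch of a pre-deployment quality gate, using the subcommand names listed under CLI Tools later in this document (exact flags may vary by version):

```bash
# Run from the project root before deploying
mxcp validate   # check endpoint schemas and file references
mxcp test       # run endpoint tests
mxcp lint       # flag metadata that LLMs may misunderstand
mxcp evals      # evaluate AI behavior and safety
```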
### Enterprise Features

- **Security**: OAuth authentication, RBAC, policy enforcement
- **Audit Trails**: Complete tracking for compliance
- **Type Safety**: Validation across SQL and Python
- **Testing**: Comprehensive quality assurance
- **Monitoring**: Drift detection and performance tracking

## Core Architecture

MXCP provides a flexible framework that supports multiple implementation approaches:

```
┌─────────────────┐      ┌────────────────────────────┐      ┌─────────────────┐
│   LLM Client    │ MCP  │       MXCP Framework       │      │ Implementations │
│  (Claude, etc)  │◄────►│  ┌──────────────────────┐  │◄────►│                 │
│                 │      │  │ Security & Policies  │  │      │  SQL Endpoints  │
└─────────────────┘      │  ├──────────────────────┤  │      │  Python Tools   │
                         │  │     Type System      │  │      │ Async Handlers  │
                         │  ├──────────────────────┤  │      └─────────────────┘
                         │  │     Audit Engine     │  │               │
                         │  ├──────────────────────┤  │               ▼
                         │  │  Validation & Tests  │  │      ┌─────────────────┐
                         │  └──────────────────────┘  │      │  Data Sources   │
                         └────────────────────────────┘      ├─────────────────┤
                                       │                     │   Databases     │
                                       ▼                     │   APIs          │
                               ┌──────────────┐              │   Files         │
                               │  Audit Logs  │              │   dbt Models    │
                               │  (JSONL/DB)  │              └─────────────────┘
                               └──────────────┘
```

### Framework Components

#### 1. Implementation Layer

Choose the right tool for each endpoint:

- **SQL Endpoints**: Best for data queries, aggregations, and transformations
  - Powered by DuckDB's analytical engine
  - Support for dbt models and transformations
  - Native handling of various data formats (Parquet, CSV, JSON)
- **Python Tools**: Best for complex logic and integrations
  - Full Python ecosystem access
  - Async/await support for concurrent operations
  - Runtime services for database access and secrets
  - Lifecycle hooks for initialization and cleanup

#### 2. Framework Services

Every endpoint gets these enterprise features automatically:

- **Security & Policies**: OAuth, RBAC, fine-grained access control
- **Type System**: Comprehensive validation across SQL and Python
- **Audit Engine**: Track every operation for compliance
- **Validation & Tests**: Ensure quality before deployment

#### 3. Runtime Environment

MXCP provides a consistent runtime for all implementations:

- **Database Access**: `mxcp.runtime.db` for Python endpoints
- **Configuration**: Access to secrets and settings
- **Plugin System**: Extend with custom Python functions
- **Session Management**: Thread-safe execution for concurrent requests
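As a minimal sketch of these runtime services in action (the `request_log` table and `recent_requests` function are illustrative, not part of MXCP; the imports and `$`-style parameter binding match the examples later in this document):

```python
# python/runtime_demo.py - illustrative use of the runtime services
from mxcp.runtime import db, on_init

@on_init
def setup():
    """Lifecycle hook: runs once when the server starts."""
    db.execute("CREATE TABLE IF NOT EXISTS request_log (tool TEXT, at TIMESTAMP)")

def recent_requests(limit: int) -> list[dict]:
    """A Python endpoint using the shared DuckDB session with the same
    safe, named parameter binding used by SQL endpoints."""
    return db.execute(
        "SELECT tool, at FROM request_log ORDER BY at DESC LIMIT $limit",
        {"limit": limit},
    )
```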
## Key Features

### 1. Choose Your Implementation
- **SQL for Data**: Query databases, aggregate data, join tables
- **Python for Logic**: Call APIs, run ML models, process files
- **Mix & Match**: Use both in the same project for maximum flexibility

### 2. Enterprise-Ready
- **Authentication**: OAuth support for GitHub, Google, Microsoft, and more
- **Policy Engine**: Fine-grained access control with CEL expressions
- **Audit Trails**: Track every operation for compliance
- **Type Safety**: Comprehensive validation across all languages

### 3. Developer Experience
- **Local-First**: Develop and test locally before deployment
- **Hot Reload**: Changes take effect immediately
- **Comprehensive Testing**: Unit tests, integration tests, and LLM evaluations
- **Rich Documentation**: Auto-generated from your YAML definitions

### 4. Production Features
- **Drift Detection**: Monitor schema and API changes
- **Performance**: Async support, connection pooling, caching
- **Monitoring**: Built-in metrics and logging
- **Scalability**: From local development to production deployment

## Getting Started

1. Install MXCP:

   ```bash
   pip install mxcp
   ```

2. Create a new project:

   ```bash
   mxcp init
   ```

3. Define your endpoints using either SQL or Python:

   **SQL Example** (for data queries):

   ```yaml
   # tools/sales_report.yml
   mxcp: 1
   tool:
     name: sales_report
     description: Get sales by region
     parameters:
       - name: region
         type: string
     return:
       type: object
     source:
       code: |
         SELECT SUM(amount) as total,
                COUNT(*) as transactions
         FROM sales
         WHERE region = $region
   ```

   **Python Example** (for complex logic):

   ```yaml
   # tools/analyze_sentiment.yml
   mxcp: 1
   tool:
     name: analyze_sentiment
     description: Analyze text sentiment
     language: python
     parameters:
       - name: text
         type: string
     return:
       type: object
     source:
       file: ../python/text_analysis.py
   ```

   ```python
   # python/text_analysis.py
   from mxcp.runtime import db, config

   def analyze_sentiment(text: str) -> dict:
       # Use any Python library or API; calculate_sentiment and get_label
       # are placeholders for your own implementation
       sentiment_score = calculate_sentiment(text)

       # Access the database if needed
       similar_texts = db.execute(
           "SELECT * FROM texts WHERE sentiment_score BETWEEN $min AND $max",
           {"min": sentiment_score - 0.1, "max": sentiment_score + 0.1}
       )

       return {
           "text": text,
           "sentiment_score": sentiment_score,
           "sentiment_label": get_label(sentiment_score),
           "similar_count": len(similar_texts)
       }
   ```

4. Start the server:

   ```bash
   mxcp serve
   ```

Your AI tools are now available with full security, audit trails, and policy enforcement!
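You can also invoke an endpoint directly from the CLI, which is handy for sanity checks before wiring up an LLM client. This uses the same `mxcp run` form shown in the policy examples later in this document (the region value is illustrative):

```bash
mxcp run tool sales_report --param region=EMEA
```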
---

# MXCP Quickstart Guide

This guide demonstrates MXCP's structured approach to building production-ready MCP servers. You'll learn not just how to create AI tools, but how to build them **the right way** - with proper data modeling, comprehensive testing, and enterprise-grade security.

## Installation

First, install MXCP:

```bash
pip install mxcp

# For advanced features (optional)
pip install "mxcp[vault]"  # HashiCorp Vault integration
```

## Path 1: Hello World (2 minutes)

Perfect for understanding the basics with both SQL and Python examples:

### 1. Initialize a Project

Create a new project with hello world examples:

```bash
# Create a new directory and initialize MXCP
mkdir my-mxcp-project
cd my-mxcp-project
mxcp init --bootstrap
```

This creates an organized project structure:

```
my-mxcp-project/
├── mxcp-site.yml          # Project configuration
├── tools/                 # Tool definitions (MCP tools)
│   └── hello-world.yml    # Example tool definition
├── resources/             # Resource definitions (MCP resources)
├── prompts/               # Prompt definitions (MCP prompts)
├── evals/                 # Evaluation definitions
├── python/                # Python endpoints and shared code
├── plugins/               # MXCP plugins for DuckDB
├── sql/                   # SQL implementations
│   └── hello-world.sql    # SQL implementation for tools
├── drift/                 # Drift detection snapshots
├── audit/                 # Audit logs
└── server_config.json     # Claude Desktop config (auto-generated)
```

**🏗️ Organized by Design**: MXCP enforces a structured approach where each endpoint type has its own directory:

- **`tools/`** - MCP tool definitions (`.yml` files that define callable functions)
- **`resources/`** - MCP resource definitions (`.yml` files that define data resources)
- **`prompts/`** - MCP prompt definitions (`.yml` files that define reusable prompts)
- **`evals/`** - Evaluation definitions for testing your endpoints
- **`python/`** - Python endpoints and shared code
- **`plugins/`** - MXCP plugins for DuckDB (User Defined Functions)
- **`sql/`** - SQL implementations for data queries
- **`drift/`** - Schema drift detection snapshots (auto-generated)
- **`audit/`** - Audit logs (auto-generated when enabled)

### 2. Explore the Generated Files

The bootstrap creates examples in both languages:

**SQL Example** (for data queries):

```yaml
# tools/hello-world.yml
mxcp: 1
tool:
  name: "hello_world"
  description: "A simple hello world tool"
  enabled: true
  parameters:
    - name: "name"
      type: "string"
      description: "Name to greet"
      examples: ["World"]
  return:
    type: "string"
    description: "Greeting message"
  source:
    file: "../sql/hello-world.sql"
```

```sql
-- sql/hello-world.sql
SELECT 'Hello, ' || $name || '!' as greeting
```

**Python Example** (for complex logic):

```yaml
# tools/calculate-fibonacci.yml
mxcp: 1
tool:
  name: "calculate_fibonacci"
  description: "Calculate Fibonacci number"
  language: python
  parameters:
    - name: "n"
      type: "integer"
      description: "Position in Fibonacci sequence"
      minimum: 0
      maximum: 100
  return:
    type: "object"
    properties:
      position: { type: "integer" }
      value: { type: "integer" }
      calculation_time: { type: "number" }
  source:
    file: "../python/math_tools.py"
```

```python
# python/math_tools.py
import time
from mxcp.runtime import db, config

def calculate_fibonacci(n: int) -> dict:
    """Calculate the nth Fibonacci number"""
    start_time = time.time()

    if n <= 1:
        value = n
    else:
        a, b = 0, 1
        for _ in range(2, n + 1):
            a, b = b, a + b
        value = b

    # Optional: Store in database for caching
    db.execute(
        "INSERT OR REPLACE INTO fibonacci_cache (n, value) VALUES ($n, $value)",
        {"n": n, "value": value}
    )

    return {
        "position": n,
        "value": value,
        "calculation_time": time.time() - start_time
    }
```

### 3. Start the MCP Server

```bash
mxcp serve
```

The server starts in stdio mode, ready for LLM integration. If you used `--bootstrap`, the generated `server_config.json` is already configured correctly for your environment (virtualenv, poetry, or system-wide installation).
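To connect this project to Claude Desktop, the entry has the same shape as the generated `server_config.json` and the Path 2 example below (the path is illustrative - use your project's absolute path):

```json
{
  "mcpServers": {
    "my-mxcp-project": {
      "command": "mxcp",
      "args": ["serve", "--transport", "stdio"],
      "cwd": "/absolute/path/to/my-mxcp-project"
    }
  }
}
```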
## Path 2: Real-World Data Pipeline (10 minutes)

Experience MXCP's production capabilities with the COVID-19 + dbt example:

### 1. Get the COVID Example

```bash
git clone https://github.com/raw-labs/mxcp.git
cd mxcp/examples/covid_owid
```

### 2. Understand the dbt Integration

This example showcases MXCP's killer feature: **dbt-native data caching**

```yaml
# dbt_project.yml - Standard dbt project
name: 'covid_owid'
version: '1'
profile: 'covid_owid'
model-paths: ["models"]
target-path: "target"
```

```sql
-- models/covid_data.sql - dbt model that creates the covid_data table
{{ config(materialized='table') }}

select * from read_csv_auto('https://github.com/owid/covid-19-data/raw/master/public/data/owid-covid-data.csv')
```

**The magic**: This dbt model fetches COVID data from the web and creates a `covid_data` table in DuckDB. MXCP endpoints then query this table directly using standard SQL.

### 3. Run the Data Pipeline

```bash
# Install dbt dependencies
dbt deps

# Run dbt transformations (this caches the data locally)
dbt run

# Start MXCP server
mxcp serve
```

**What just happened?**

1. `dbt run` fetched COVID data from OWID and created a `covid_data` table in DuckDB
2. The MXCP server exposes SQL query tools that can query this table directly
3. LLMs can analyze months of COVID data instantly (no API calls!)

### 4. Connect to Claude Desktop

Add this to your Claude Desktop config:

```json
{
  "mcpServers": {
    "covid": {
      "command": "mxcp",
      "args": ["serve", "--transport", "stdio"],
      "cwd": "/absolute/path/to/mxcp/examples/covid_owid"
    }
  }
}
```

### 5. Test the Integration

Ask Claude:

- *"Show me COVID cases in Germany vs France during 2021"*
- *"What were the vaccination rates in the UK by month?"*
- *"Compare hospitalization data between Italy and Spain"*

The responses are instant because the data is cached locally!

**Built-in SQL Tools**: MXCP provides optional SQL query tools (`execute_sql_query`, `list_tables`, `get_table_schema`) that let Claude explore and query your data directly. These tools are disabled by default but can be enabled in your configuration.
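Beyond the built-in tools, you can also define your own endpoint over the cached table. A sketch (the `location`, `date`, and `new_cases` column names follow the OWID CSV and should be verified against your local copy):

```yaml
# tools/cases-by-country.yml - illustrative custom endpoint over covid_data
mxcp: 1
tool:
  name: cases_by_country
  description: "Daily new COVID cases for one country"
  parameters:
    - name: country
      type: string
      description: "Country name as it appears in the OWID data"
  return:
    type: array
  source:
    code: |
      SELECT date, new_cases
      FROM covid_data
      WHERE location = $country
      ORDER BY date
```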
## Path 3: Python-Powered Tools (5 minutes)

Build complex AI tools using Python's full ecosystem:

### 1. Create a Python Analysis Tool

```yaml
# tools/sentiment-analyzer.yml
mxcp: 1
tool:
  name: sentiment_analyzer
  description: "Analyze text sentiment with ML"
  language: python
  parameters:
    - name: texts
      type: array
      items:
        type: string
      description: "Texts to analyze"
      maxItems: 100
  return:
    type: array
    items:
      type: object
      properties:
        text: { type: string }
        sentiment: { type: string, enum: ["positive", "negative", "neutral"] }
        confidence: { type: number, minimum: 0, maximum: 1 }
  source:
    file: "../python/ml_tools.py"
```

```python
# python/ml_tools.py
from mxcp.runtime import db, config, on_init
import asyncio

# Simulate ML model loading (replace with a real model)
model = None

@on_init
def load_model():
    """Load ML model on startup"""
    global model
    print("Loading sentiment model...")
    # model = load_your_model_here()
    model = {"loaded": True}  # Placeholder

async def sentiment_analyzer(texts: list[str]) -> list[dict]:
    """Analyze sentiment for multiple texts concurrently"""

    async def analyze_one(text: str) -> dict:
        # Simulate an async API call or model inference
        await asyncio.sleep(0.1)

        # Simple rule-based sentiment (replace with real ML)
        if "good" in text.lower():
            sentiment = "positive"
        elif "bad" in text.lower():
            sentiment = "negative"
        else:
            sentiment = "neutral"
        confidence = 0.95 if sentiment != "neutral" else 0.6

        # Optional: Store results for analytics
        db.execute(
            """
            INSERT INTO sentiment_history (text, sentiment, confidence, analyzed_at)
            VALUES ($text, $sentiment, $confidence, CURRENT_TIMESTAMP)
            """,
            {"text": text[:200], "sentiment": sentiment, "confidence": confidence}
        )

        return {
            "text": text,
            "sentiment": sentiment,
            "confidence": confidence
        }

    # Process all texts concurrently
    results = await asyncio.gather(*[analyze_one(text) for text in texts])
    return results
```

## Path 4: Enterprise Features (15 minutes)

Experience MXCP's production-grade security and governance:

### 1. Policy Enforcement

Create a new endpoint with access control:

```yaml
# tools/employee-data.yml
mxcp: 1
tool:
  name: employee_data
  description: "Query employee information"
  parameters:
    - name: employee_id
      type: string
      description: "Employee ID to query"
  return:
    type: object
    properties:
      name: { type: string }
      department: { type: string }
      salary: { type: number }
      ssn: { type: string, sensitive: true }
  source:
    code: |
      SELECT
        'John Doe' as name,
        'Engineering' as department,
        95000 as salary,
        '123-45-6789' as ssn

# Add enterprise-grade policies
policies:
  input:
    - condition: "!('hr.read' in user.permissions)"
      action: deny
      reason: "Missing HR read permission"
  output:
    - condition: "user.role != 'hr_manager'"
      action: filter_fields
      fields: ["salary", "ssn"]
      reason: "Sensitive data restricted to HR managers"
```

### 2. Enable Audit Logging

```yaml
# mxcp-site.yml - Add audit configuration
profiles:
  production:
    audit:
      enabled: true
      path: audit-logs.jsonl
```

### 3. Test with User Context

```bash
# Test as regular user (will filter sensitive data)
mxcp run tool employee_data \
  --param employee_id=123 \
  --user-context '{"role": "user", "permissions": ["hr.read"]}'

# Test as HR manager (will see all data)
mxcp run tool employee_data \
  --param employee_id=123 \
  --user-context '{"role": "hr_manager", "permissions": ["hr.read", "pii.view"]}'

# View audit logs
mxcp log --since 10m
```
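Illustratively (this is not literal CLI output), the two `mxcp run` calls should differ like this: the regular user's response has `salary` and `ssn` removed by the output policy, while the HR manager sees the full record.

```json
{
  "as_regular_user": { "name": "John Doe", "department": "Engineering" },
  "as_hr_manager": {
    "name": "John Doe",
    "department": "Engineering",
    "salary": 95000,
    "ssn": "123-45-6789"
  }
}
```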
---

# MXCP Features Overview

MXCP provides a comprehensive set of enterprise features designed for production data-to-AI workflows. Unlike simple data connectors, MXCP offers security, governance, quality assurance, and operational excellence.

## 🔒 Security & Governance

### Authentication & Authorization
- **OAuth 2.0 Integration**: GitHub, Atlassian, Salesforce, and custom providers
- **Session Management**: Secure token handling with persistence
- **Role-Based Access Control**: Fine-grained permissions and scopes
- **API Key Support**: For programmatic access
- **Stateless Mode**: For serverless deployments

### Policy Enforcement
- **Input Policies**: Control who can execute endpoints
- **Output Policies**: Filter sensitive data dynamically
- **CEL Expressions**: Flexible condition evaluation
- **User Context**: Rich context for policy decisions
- **Field-Level Security**: Mask or remove specific fields

### Audit Logging
- **Complete Trail**: Every query, result, and error logged
- **User Attribution**: Track who did what and when
- **Flexible Storage**: JSONL files or DuckDB
- **Query Interface**: Search and analyze audit logs
- **Compliance Ready**: Export for regulatory requirements

## ✅ Quality Assurance

### Validation
- **Schema Validation**: Ensure endpoints meet specifications
- **Type Checking**: Validate parameter and return types
- **SQL Verification**: Check query syntax
- **Reference Validation**: Verify file and resource references

### Testing
- **Unit Tests**: Test endpoints with various inputs (see the sketch at the end of this section)
- **Assertion Types**: Exact match, partial match, exclusions
- **Policy Testing**: Verify access controls work correctly
- **CI/CD Integration**: JSON output for automation

### Linting
- **Metadata Quality**: Improve LLM understanding
- **Best Practices**: Suggest descriptions, examples, tags
- **Severity Levels**: Warnings and suggestions
- **Bulk Analysis**: Check entire codebase at once

### LLM Evaluation
- **AI Behavior Testing**: Verify LLMs use tools correctly
- **Safety Checks**: Ensure destructive operations are avoided
- **Context Testing**: Validate permission-based access
- **Model Support**: Test with multiple AI models
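As a sketch of the unit-testing idea, an inline test for the quickstart's `hello_world` tool might look like the following. The exact `tests` schema (field names such as `arguments` and `result`) is an assumption here and should be checked against the testing reference; tests run via `mxcp test`.

```yaml
# tools/hello-world.yml (excerpt) - illustrative test block; verify the
# exact schema against the MXCP testing reference
tool:
  name: "hello_world"
  # ...parameters and source as shown in the quickstart...
  tests:
    - name: greets_by_name
      description: "Exact-match assertion on the greeting"
      arguments:
        - key: name
          value: "World"
      result: "Hello, World!"
```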
## 🔄 Data & Operations

### Drift Detection
- **Schema Monitoring**: Track changes across environments
- **Baseline Snapshots**: Compare against known good state
- **Change Detection**: Identify added, modified, removed endpoints
- **CI/CD Integration**: Prevent breaking changes

### dbt Integration
- **Native Support**: Run dbt models directly
- **Local Caching**: Use dbt to populate DuckDB
- **Model Discovery**: Automatic model detection
- **Transformation Pipeline**: ETL/ELT workflows

### Monitoring & Operations
- **Health Checks**: Endpoint availability monitoring
- **Performance Metrics**: Query execution times
- **Error Tracking**: Detailed error logs and traces
- **Operational Commands**: Direct endpoint execution

## 🚀 Developer Experience

### Type System
- **Rich Types**: Primitives, objects, arrays, dates
- **Validation**: Automatic input/output validation
- **Constraints**: Min/max, patterns, enums
- **LLM Hints**: Help AI understand data types

### SQL Reference
- **DuckDB Syntax**: PostgreSQL-compatible analytical SQL
- **Built-in Functions**: User authentication functions
- **Named Parameters**: Safe parameter binding
- **Extensions**: httpfs, json, parquet, and more

### Python Reference
- **Runtime APIs**: Database, config, secrets access
- **Lifecycle Hooks**: Server initialization/shutdown
- **Thread Safety**: Concurrent execution support
- **Type Compatibility**: Seamless SQL/Python integration

### Plugin System
- **Python Extensions**: Custom functions and UDFs
- **Provider Plugins**: OAuth and authentication
- **Shared Libraries**: Reusable components
- **Hot Reloading**: Development productivity

### CLI Tools
- **Project Management**: Init, serve, list
- **Quality Tools**: Validate, test, lint, evals
- **Operations**: Log queries, drift checks
- **Development**: Live reload, debug mode

## 🔌 Integrations

### LLM Platforms
- **Claude Desktop**: Native MCP support
- **OpenAI Tools**: Via adapters
- **Custom Clients**: MCP protocol implementation
- **Multi-Model**: Support various AI providers

### Data Sources
- **DuckDB**: Built-in analytical database
- **SQL Databases**: Via DuckDB extensions
- **APIs**: HTTP/REST endpoints
- **Files**: CSV, Parquet, JSON

### Secret Management
- **HashiCorp Vault**: Enterprise secret storage
- **Environment Variables**: Simple secret injection
- **Encrypted Storage**: Secure local secrets
- **Runtime Injection**: No secrets in code

## 📊 Use Cases

MXCP's features enable powerful use cases:

- **Secure AI Analytics**: Give LLMs data access with governance
- **Compliant Automation**: Track all AI actions for audit
- **Multi-Tenant SaaS**: Isolate customer data with policies
- **Data Products**: Package data as AI-ready interfaces
- **DevOps Automation**: Monitor and control infrastructure

---

# Advanced Configuration

## Production Environment Configuration

```yaml
# mxcp-site.yml
profiles:
  development:
    database: dev.duckdb
    auth:
      enabled: false
  staging:
    database: staging.duckdb
    auth:
      enabled: true
      provider: github
    audit:
      enabled: true
      path: staging-audit.jsonl
  production:
    database: production.duckdb
    auth:
      enabled: true
      provider: atlassian
    audit:
      enabled: true
      path: /var/log/mxcp/audit.jsonl
    policies:
      strict_mode: true
```

## Monitoring and Alerting

```bash
# Monitor error rates
mxcp log --since 1h --status error --export-duckdb errors.db

# Set up alerts for policy violations
mxcp log --policy deny --since 10m --export-csv violations.csv

# Track performance
mxcp log --since 1d | jq '.duration_ms' | awk '{sum+=$1; count++} END {print "Avg:", sum/count "ms"}'
```

## Schema Drift Detection

```bash
# Create a baseline snapshot
mxcp drift-snapshot --profile production

# Check for changes (run in CI/CD)
mxcp drift-check --profile production
```

---

This comprehensive documentation provides everything needed to understand, implement, and operate MXCP in production environments. From basic setup to enterprise features, MXCP offers the complete solution for AI data integration with security, governance, and quality built-in.