# MXCP - Enterprise MCP Server for AI Data Integration

> MXCP is a production-ready MCP (Model Context Protocol) framework that transforms any data source into AI-ready endpoints in minutes. Build enterprise-grade MCP servers using YAML, SQL, and Python with built-in authentication, monitoring, ETL, policy enforcement, evaluations, and guardrails. Designed for enterprises to integrate databases, APIs, and files with Claude Desktop and other MCP clients while maintaining security, governance, and audit compliance.

MXCP is developed by RAW Labs and provides enterprise-grade infrastructure for AI data integration. It enables organizations to connect their data sources to AI systems with production-ready features including data governance, security policies, audit logging, and quality controls. The framework supports dbt integration and offers comprehensive monitoring and drift detection capabilities.

Site Type: Product Documentation, Technical Documentation
Purpose: AI Data Integration Platform Documentation
Technology Stack: Python, MCP Protocol, Enterprise Data Infrastructure
Installation: `pip install mxcp`
License: Business Source License (BSL 1.1)

---

# MXCP Overview

MXCP is an enterprise-grade MCP (Model Context Protocol) framework that provides a **complete methodology** for building production-ready AI tools. More than just supporting SQL and Python, MXCP offers a structured approach to creating secure, testable, and governable AI applications.

## Why MXCP?

While other MCP servers focus on quick integrations, MXCP provides the **right way** to build production AI tools:

### The Structured Approach

1. **Data Quality First**: Start with proper data modeling using dbt
   - Create data models with clear schemas
   - Implement data quality tests
   - Build performance-optimized views
   - Document your data contracts
2. **Service Design**: Plan before you build
   - Define comprehensive type systems
   - Design security policies upfront
   - Create clear API contracts
   - Structure your endpoints logically
3. **Smart Implementation**: Use the right tool for the job
   - SQL for data queries and aggregations
   - Python for complex logic and integrations
   - Combine both for complete solutions
4. **Quality Assurance**: Test at every level (see the CLI sketch after this list)
   - Validate structure and schemas
   - Test functionality with real data
   - Lint for LLM comprehension
   - Evaluate AI behavior safety
5. **Production Operations**: Deploy with confidence
   - Monitor schema drift
   - Track every operation
   - Analyze performance
   - Scale securely
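Steps 4 and 5 map directly onto the MXCP CLI. A minimal sketch of a pre-deployment quality gate, using the subcommand names listed under CLI Tools later in this document (exact flags may vary by version):

```bash
# Run from the project root before deploying
mxcp validate   # check endpoint schemas and file references
mxcp test       # run endpoint tests
mxcp lint       # flag metadata that LLMs may misunderstand
mxcp evals      # evaluate AI behavior and safety
```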
### Enterprise Features

- **Security**: OAuth authentication, RBAC, policy enforcement
- **Audit Trails**: Complete tracking for compliance
- **Type Safety**: Validation across SQL and Python
- **Testing**: Comprehensive quality assurance
- **Monitoring**: Drift detection and performance tracking

## Core Architecture

MXCP provides a flexible framework that supports multiple implementation approaches:

```
┌─────────────────┐      ┌────────────────────────────┐      ┌─────────────────┐
│   LLM Client    │ MCP  │       MXCP Framework       │      │ Implementations │
│  (Claude, etc)  │◄────►│  ┌──────────────────────┐  │◄────►│                 │
│                 │      │  │ Security & Policies  │  │      │  SQL Endpoints  │
└─────────────────┘      │  ├──────────────────────┤  │      │  Python Tools   │
                         │  │     Type System      │  │      │ Async Handlers  │
                         │  ├──────────────────────┤  │      └─────────────────┘
                         │  │     Audit Engine     │  │               │
                         │  ├──────────────────────┤  │               ▼
                         │  │  Validation & Tests  │  │      ┌─────────────────┐
                         │  └──────────────────────┘  │      │  Data Sources   │
                         └────────────────────────────┘      ├─────────────────┤
                                       │                     │   Databases     │
                                       ▼                     │   APIs          │
                               ┌──────────────┐              │   Files         │
                               │  Audit Logs  │              │   dbt Models    │
                               │  (JSONL/DB)  │              └─────────────────┘
                               └──────────────┘
```

### Framework Components

#### 1. Implementation Layer

Choose the right tool for each endpoint:

- **SQL Endpoints**: Best for data queries, aggregations, and transformations
  - Powered by DuckDB's analytical engine
  - Support for dbt models and transformations
  - Native handling of various data formats (Parquet, CSV, JSON)
- **Python Tools**: Best for complex logic and integrations
  - Full Python ecosystem access
  - Async/await support for concurrent operations
  - Runtime services for database access and secrets
  - Lifecycle hooks for initialization and cleanup

#### 2. Framework Services

Every endpoint gets these enterprise features automatically:

- **Security & Policies**: OAuth, RBAC, fine-grained access control
- **Type System**: Comprehensive validation across SQL and Python
- **Audit Engine**: Track every operation for compliance
- **Validation & Tests**: Ensure quality before deployment

#### 3. Runtime Environment

MXCP provides a consistent runtime for all implementations:

- **Database Access**: `mxcp.runtime.db` for Python endpoints
- **Configuration**: Access to secrets and settings
- **Plugin System**: Extend with custom Python functions
- **Session Management**: Thread-safe execution for concurrent requests
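As a minimal sketch of these runtime services in action (the `request_log` table and `recent_requests` function are illustrative, not part of MXCP; the imports and `$`-style parameter binding match the examples later in this document):

```python
# python/runtime_demo.py - illustrative use of the runtime services
from mxcp.runtime import db, on_init

@on_init
def setup():
    """Lifecycle hook: runs once when the server starts."""
    db.execute("CREATE TABLE IF NOT EXISTS request_log (tool TEXT, at TIMESTAMP)")

def recent_requests(limit: int) -> list[dict]:
    """A Python endpoint using the shared DuckDB session with the same
    safe, named parameter binding used by SQL endpoints."""
    return db.execute(
        "SELECT tool, at FROM request_log ORDER BY at DESC LIMIT $limit",
        {"limit": limit},
    )
```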
## Key Features

### 1. Choose Your Implementation
- **SQL for Data**: Query databases, aggregate data, join tables
- **Python for Logic**: Call APIs, run ML models, process files
- **Mix & Match**: Use both in the same project for maximum flexibility

### 2. Enterprise-Ready
- **Authentication**: OAuth support for GitHub, Google, Microsoft, and more
- **Policy Engine**: Fine-grained access control with CEL expressions
- **Audit Trails**: Track every operation for compliance
- **Type Safety**: Comprehensive validation across all languages

### 3. Developer Experience
- **Local-First**: Develop and test locally before deployment
- **Hot Reload**: Changes take effect immediately
- **Comprehensive Testing**: Unit tests, integration tests, and LLM evaluations
- **Rich Documentation**: Auto-generated from your YAML definitions

### 4. Production Features
- **Drift Detection**: Monitor schema and API changes
- **Performance**: Async support, connection pooling, caching
- **Monitoring**: Built-in metrics and logging
- **Scalability**: From local development to production deployment

## Getting Started

1. Install MXCP:

   ```bash
   pip install mxcp
   ```

2. Create a new project:

   ```bash
   mxcp init
   ```

3. Define your endpoints using either SQL or Python:

   **SQL Example** (for data queries):

   ```yaml
   # tools/sales_report.yml
   mxcp: 1
   tool:
     name: sales_report
     description: Get sales by region
     parameters:
       - name: region
         type: string
     return:
       type: object
     source:
       code: |
         SELECT SUM(amount) as total,
                COUNT(*) as transactions
         FROM sales
         WHERE region = $region
   ```

   **Python Example** (for complex logic):

   ```yaml
   # tools/analyze_sentiment.yml
   mxcp: 1
   tool:
     name: analyze_sentiment
     description: Analyze text sentiment
     language: python
     parameters:
       - name: text
         type: string
     return:
       type: object
     source:
       file: ../python/text_analysis.py
   ```

   ```python
   # python/text_analysis.py
   from mxcp.runtime import db, config

   def analyze_sentiment(text: str) -> dict:
       # Use any Python library or API; calculate_sentiment and get_label
       # are placeholders for your own implementation
       sentiment_score = calculate_sentiment(text)

       # Access the database if needed
       similar_texts = db.execute(
           "SELECT * FROM texts WHERE sentiment_score BETWEEN $min AND $max",
           {"min": sentiment_score - 0.1, "max": sentiment_score + 0.1}
       )

       return {
           "text": text,
           "sentiment_score": sentiment_score,
           "sentiment_label": get_label(sentiment_score),
           "similar_count": len(similar_texts)
       }
   ```

4. Start the server:

   ```bash
   mxcp serve
   ```

Your AI tools are now available with full security, audit trails, and policy enforcement!
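You can also invoke an endpoint directly from the CLI, which is handy for sanity checks before wiring up an LLM client. This uses the same `mxcp run` form shown in the policy examples later in this document (the region value is illustrative):

```bash
mxcp run tool sales_report --param region=EMEA
```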
---

# MXCP Quickstart Guide

This guide demonstrates MXCP's structured approach to building production-ready MCP servers. You'll learn not just how to create AI tools, but how to build them **the right way** - with proper data modeling, comprehensive testing, and enterprise-grade security.

## Installation

First, install MXCP:

```bash
pip install mxcp

# For advanced features (optional)
pip install "mxcp[vault]"  # HashiCorp Vault integration
```

## Path 1: Hello World (2 minutes)

Perfect for understanding the basics with both SQL and Python examples:

### 1. Initialize a Project

Create a new project with hello world examples:

```bash
# Create a new directory and initialize MXCP
mkdir my-mxcp-project
cd my-mxcp-project
mxcp init --bootstrap
```

This creates an organized project structure:

```
my-mxcp-project/
├── mxcp-site.yml          # Project configuration
├── tools/                 # Tool definitions (MCP tools)
│   └── hello-world.yml    # Example tool definition
├── resources/             # Resource definitions (MCP resources)
├── prompts/               # Prompt definitions (MCP prompts)
├── evals/                 # Evaluation definitions
├── python/                # Python endpoints and shared code
├── plugins/               # MXCP plugins for DuckDB
├── sql/                   # SQL implementations
│   └── hello-world.sql    # SQL implementation for tools
├── drift/                 # Drift detection snapshots
├── audit/                 # Audit logs
└── server_config.json     # Claude Desktop config (auto-generated)
```

**🏗️ Organized by Design**: MXCP enforces a structured approach where each endpoint type has its own directory:

- **`tools/`** - MCP tool definitions (`.yml` files that define callable functions)
- **`resources/`** - MCP resource definitions (`.yml` files that define data resources)
- **`prompts/`** - MCP prompt definitions (`.yml` files that define reusable prompts)
- **`evals/`** - Evaluation definitions for testing your endpoints
- **`python/`** - Python endpoints and shared code
- **`plugins/`** - MXCP plugins for DuckDB (User Defined Functions)
- **`sql/`** - SQL implementations for data queries
- **`drift/`** - Schema drift detection snapshots (auto-generated)
- **`audit/`** - Audit logs (auto-generated when enabled)

### 2. Explore the Generated Files

The bootstrap creates examples in both languages:

**SQL Example** (for data queries):

```yaml
# tools/hello-world.yml
mxcp: 1
tool:
  name: "hello_world"
  description: "A simple hello world tool"
  enabled: true
  parameters:
    - name: "name"
      type: "string"
      description: "Name to greet"
      examples: ["World"]
  return:
    type: "string"
    description: "Greeting message"
  source:
    file: "../sql/hello-world.sql"
```

```sql
-- sql/hello-world.sql
SELECT 'Hello, ' || $name || '!' as greeting
```

**Python Example** (for complex logic):

```yaml
# tools/calculate-fibonacci.yml
mxcp: 1
tool:
  name: "calculate_fibonacci"
  description: "Calculate Fibonacci number"
  language: python
  parameters:
    - name: "n"
      type: "integer"
      description: "Position in Fibonacci sequence"
      minimum: 0
      maximum: 100
  return:
    type: "object"
    properties:
      position: { type: "integer" }
      value: { type: "integer" }
      calculation_time: { type: "number" }
  source:
    file: "../python/math_tools.py"
```

```python
# python/math_tools.py
import time
from mxcp.runtime import db, config

def calculate_fibonacci(n: int) -> dict:
    """Calculate the nth Fibonacci number"""
    start_time = time.time()

    if n <= 1:
        value = n
    else:
        a, b = 0, 1
        for _ in range(2, n + 1):
            a, b = b, a + b
        value = b

    # Optional: Store in database for caching
    db.execute(
        "INSERT OR REPLACE INTO fibonacci_cache (n, value) VALUES ($n, $value)",
        {"n": n, "value": value}
    )

    return {
        "position": n,
        "value": value,
        "calculation_time": time.time() - start_time
    }
```

### 3. Start the MCP Server

```bash
mxcp serve
```

The server starts in stdio mode, ready for LLM integration. If you used `--bootstrap`, the generated `server_config.json` is already configured correctly for your environment (virtualenv, poetry, or system-wide installation).
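To connect this project to Claude Desktop, the entry has the same shape as the generated `server_config.json` and the Path 2 example below (the path is illustrative - use your project's absolute path):

```json
{
  "mcpServers": {
    "my-mxcp-project": {
      "command": "mxcp",
      "args": ["serve", "--transport", "stdio"],
      "cwd": "/absolute/path/to/my-mxcp-project"
    }
  }
}
```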
## Path 2: Real-World Data Pipeline (10 minutes)

Experience MXCP's production capabilities with the COVID-19 + dbt example:

### 1. Get the COVID Example

```bash
git clone https://github.com/raw-labs/mxcp.git
cd mxcp/examples/covid_owid
```

### 2. Understand the dbt Integration

This example showcases MXCP's killer feature: **dbt-native data caching**

```yaml
# dbt_project.yml - Standard dbt project
name: 'covid_owid'
version: '1'
profile: 'covid_owid'
model-paths: ["models"]
target-path: "target"
```

```sql
-- models/covid_data.sql - dbt model that creates the covid_data table
{{ config(materialized='table') }}

select * from read_csv_auto('https://github.com/owid/covid-19-data/raw/master/public/data/owid-covid-data.csv')
```

**The magic**: This dbt model fetches COVID data from the web and creates a `covid_data` table in DuckDB. MXCP endpoints then query this table directly using standard SQL.

### 3. Run the Data Pipeline

```bash
# Install dbt dependencies
dbt deps

# Run dbt transformations (this caches the data locally)
dbt run

# Start MXCP server
mxcp serve
```

**What just happened?**

1. `dbt run` fetched COVID data from OWID and created a `covid_data` table in DuckDB
2. The MXCP server exposes SQL query tools that can query this table directly
3. LLMs can analyze months of COVID data instantly (no API calls!)

### 4. Connect to Claude Desktop

Add this to your Claude Desktop config:

```json
{
  "mcpServers": {
    "covid": {
      "command": "mxcp",
      "args": ["serve", "--transport", "stdio"],
      "cwd": "/absolute/path/to/mxcp/examples/covid_owid"
    }
  }
}
```

### 5. Test the Integration

Ask Claude:

- *"Show me COVID cases in Germany vs France during 2021"*
- *"What were the vaccination rates in the UK by month?"*
- *"Compare hospitalization data between Italy and Spain"*

The responses are instant because the data is cached locally!

**Built-in SQL Tools**: MXCP provides optional SQL query tools (`execute_sql_query`, `list_tables`, `get_table_schema`) that let Claude explore and query your data directly. These tools are disabled by default but can be enabled in your configuration.
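Beyond the built-in tools, you can also define your own endpoint over the cached table. A sketch (the `location`, `date`, and `new_cases` column names follow the OWID CSV and should be verified against your local copy):

```yaml
# tools/cases-by-country.yml - illustrative custom endpoint over covid_data
mxcp: 1
tool:
  name: cases_by_country
  description: "Daily new COVID cases for one country"
  parameters:
    - name: country
      type: string
      description: "Country name as it appears in the OWID data"
  return:
    type: array
  source:
    code: |
      SELECT date, new_cases
      FROM covid_data
      WHERE location = $country
      ORDER BY date
```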
## Path 3: Python-Powered Tools (5 minutes)

Build complex AI tools using Python's full ecosystem:

### 1. Create a Python Analysis Tool

```yaml
# tools/sentiment-analyzer.yml
mxcp: 1
tool:
  name: sentiment_analyzer
  description: "Analyze text sentiment with ML"
  language: python
  parameters:
    - name: texts
      type: array
      items:
        type: string
      description: "Texts to analyze"
      maxItems: 100
  return:
    type: array
    items:
      type: object
      properties:
        text: { type: string }
        sentiment: { type: string, enum: ["positive", "negative", "neutral"] }
        confidence: { type: number, minimum: 0, maximum: 1 }
  source:
    file: "../python/ml_tools.py"
```

```python
# python/ml_tools.py
from mxcp.runtime import db, config, on_init
import asyncio

# Simulate ML model loading (replace with a real model)
model = None

@on_init
def load_model():
    """Load ML model on startup"""
    global model
    print("Loading sentiment model...")
    # model = load_your_model_here()
    model = {"loaded": True}  # Placeholder

async def sentiment_analyzer(texts: list[str]) -> list[dict]:
    """Analyze sentiment for multiple texts concurrently"""

    async def analyze_one(text: str) -> dict:
        # Simulate an async API call or model inference
        await asyncio.sleep(0.1)

        # Simple rule-based sentiment (replace with real ML)
        if "good" in text.lower():
            sentiment = "positive"
        elif "bad" in text.lower():
            sentiment = "negative"
        else:
            sentiment = "neutral"
        confidence = 0.95 if sentiment != "neutral" else 0.6

        # Optional: Store results for analytics
        db.execute(
            """
            INSERT INTO sentiment_history (text, sentiment, confidence, analyzed_at)
            VALUES ($text, $sentiment, $confidence, CURRENT_TIMESTAMP)
            """,
            {"text": text[:200], "sentiment": sentiment, "confidence": confidence}
        )

        return {
            "text": text,
            "sentiment": sentiment,
            "confidence": confidence
        }

    # Process all texts concurrently
    results = await asyncio.gather(*[analyze_one(text) for text in texts])
    return results
```

## Path 4: Enterprise Features (15 minutes)

Experience MXCP's production-grade security and governance:

### 1. Policy Enforcement

Create a new endpoint with access control:

```yaml
# tools/employee-data.yml
mxcp: 1
tool:
  name: employee_data
  description: "Query employee information"
  parameters:
    - name: employee_id
      type: string
      description: "Employee ID to query"
  return:
    type: object
    properties:
      name: { type: string }
      department: { type: string }
      salary: { type: number }
      ssn: { type: string, sensitive: true }
  source:
    code: |
      SELECT
        'John Doe' as name,
        'Engineering' as department,
        95000 as salary,
        '123-45-6789' as ssn

# Add enterprise-grade policies
policies:
  input:
    - condition: "!('hr.read' in user.permissions)"
      action: deny
      reason: "Missing HR read permission"
  output:
    - condition: "user.role != 'hr_manager'"
      action: filter_fields
      fields: ["salary", "ssn"]
      reason: "Sensitive data restricted to HR managers"
```

### 2. Enable Audit Logging

```yaml
# mxcp-site.yml - Add audit configuration
profiles:
  production:
    audit:
      enabled: true
      path: audit-logs.jsonl
```

### 3. Test with User Context

```bash
# Test as regular user (will filter sensitive data)
mxcp run tool employee_data \
  --param employee_id=123 \
  --user-context '{"role": "user", "permissions": ["hr.read"]}'

# Test as HR manager (will see all data)
mxcp run tool employee_data \
  --param employee_id=123 \
  --user-context '{"role": "hr_manager", "permissions": ["hr.read", "pii.view"]}'

# View audit logs
mxcp log --since 10m
```
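Illustratively (this is not literal CLI output), the two `mxcp run` calls should differ like this: the regular user's response has `salary` and `ssn` removed by the output policy, while the HR manager sees the full record.

```json
{
  "as_regular_user": { "name": "John Doe", "department": "Engineering" },
  "as_hr_manager": {
    "name": "John Doe",
    "department": "Engineering",
    "salary": 95000,
    "ssn": "123-45-6789"
  }
}
```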
---

# MXCP Features Overview

MXCP provides a comprehensive set of enterprise features designed for production data-to-AI workflows. Unlike simple data connectors, MXCP offers security, governance, quality assurance, and operational excellence.

## 🔒 Security & Governance

### Authentication & Authorization
- **OAuth 2.0 Integration**: GitHub, Atlassian, Salesforce, and custom providers
- **Session Management**: Secure token handling with persistence
- **Role-Based Access Control**: Fine-grained permissions and scopes
- **API Key Support**: For programmatic access
- **Stateless Mode**: For serverless deployments

### Policy Enforcement
- **Input Policies**: Control who can execute endpoints
- **Output Policies**: Filter sensitive data dynamically
- **CEL Expressions**: Flexible condition evaluation
- **User Context**: Rich context for policy decisions
- **Field-Level Security**: Mask or remove specific fields

### Audit Logging
- **Complete Trail**: Every query, result, and error logged
- **User Attribution**: Track who did what and when
- **Flexible Storage**: JSONL files or DuckDB
- **Query Interface**: Search and analyze audit logs
- **Compliance Ready**: Export for regulatory requirements

## ✅ Quality Assurance

### Validation
- **Schema Validation**: Ensure endpoints meet specifications
- **Type Checking**: Validate parameter and return types
- **SQL Verification**: Check query syntax
- **Reference Validation**: Verify file and resource references

### Testing
- **Unit Tests**: Test endpoints with various inputs (see the sketch at the end of this section)
- **Assertion Types**: Exact match, partial match, exclusions
- **Policy Testing**: Verify access controls work correctly
- **CI/CD Integration**: JSON output for automation

### Linting
- **Metadata Quality**: Improve LLM understanding
- **Best Practices**: Suggest descriptions, examples, tags
- **Severity Levels**: Warnings and suggestions
- **Bulk Analysis**: Check entire codebase at once

### LLM Evaluation
- **AI Behavior Testing**: Verify LLMs use tools correctly
- **Safety Checks**: Ensure destructive operations are avoided
- **Context Testing**: Validate permission-based access
- **Model Support**: Test with multiple AI models
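As a sketch of the unit-testing idea, an inline test for the quickstart's `hello_world` tool might look like the following. The exact `tests` schema (field names such as `arguments` and `result`) is an assumption here and should be checked against the testing reference; tests run via `mxcp test`.

```yaml
# tools/hello-world.yml (excerpt) - illustrative test block; verify the
# exact schema against the MXCP testing reference
tool:
  name: "hello_world"
  # ...parameters and source as shown in the quickstart...
  tests:
    - name: greets_by_name
      description: "Exact-match assertion on the greeting"
      arguments:
        - key: name
          value: "World"
      result: "Hello, World!"
```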
## 🔄 Data & Operations

### Drift Detection
- **Schema Monitoring**: Track changes across environments
- **Baseline Snapshots**: Compare against known good state
- **Change Detection**: Identify added, modified, removed endpoints
- **CI/CD Integration**: Prevent breaking changes

### dbt Integration
- **Native Support**: Run dbt models directly
- **Local Caching**: Use dbt to populate DuckDB
- **Model Discovery**: Automatic model detection
- **Transformation Pipeline**: ETL/ELT workflows

### Monitoring & Operations
- **Health Checks**: Endpoint availability monitoring
- **Performance Metrics**: Query execution times
- **Error Tracking**: Detailed error logs and traces
- **Operational Commands**: Direct endpoint execution

## 🚀 Developer Experience

### Type System
- **Rich Types**: Primitives, objects, arrays, dates
- **Validation**: Automatic input/output validation
- **Constraints**: Min/max, patterns, enums
- **LLM Hints**: Help AI understand data types

### SQL Reference
- **DuckDB Syntax**: PostgreSQL-compatible analytical SQL
- **Built-in Functions**: User authentication functions
- **Named Parameters**: Safe parameter binding
- **Extensions**: httpfs, json, parquet, and more

### Python Reference
- **Runtime APIs**: Database, config, secrets access
- **Lifecycle Hooks**: Server initialization/shutdown
- **Thread Safety**: Concurrent execution support
- **Type Compatibility**: Seamless SQL/Python integration

### Plugin System
- **Python Extensions**: Custom functions and UDFs
- **Provider Plugins**: OAuth and authentication
- **Shared Libraries**: Reusable components
- **Hot Reloading**: Development productivity

### CLI Tools
- **Project Management**: Init, serve, list
- **Quality Tools**: Validate, test, lint, evals
- **Operations**: Log queries, drift checks
- **Development**: Live reload, debug mode

## 🔌 Integrations

### LLM Platforms
- **Claude Desktop**: Native MCP support
- **OpenAI Tools**: Via adapters
- **Custom Clients**: MCP protocol implementation
- **Multi-Model**: Support various AI providers

### Data Sources
- **DuckDB**: Built-in analytical database
- **SQL Databases**: Via DuckDB extensions
- **APIs**: HTTP/REST endpoints
- **Files**: CSV, Parquet, JSON

### Secret Management
- **HashiCorp Vault**: Enterprise secret storage
- **Environment Variables**: Simple secret injection
- **Encrypted Storage**: Secure local secrets
- **Runtime Injection**: No secrets in code

## 📊 Use Cases

MXCP's features enable powerful use cases:

- **Secure AI Analytics**: Give LLMs data access with governance
- **Compliant Automation**: Track all AI actions for audit
- **Multi-Tenant SaaS**: Isolate customer data with policies
- **Data Products**: Package data as AI-ready interfaces
- **DevOps Automation**: Monitor and control infrastructure

---

# Advanced Configuration

## Production Environment Configuration

```yaml
# mxcp-site.yml
profiles:
  development:
    database: dev.duckdb
    auth:
      enabled: false
  staging:
    database: staging.duckdb
    auth:
      enabled: true
      provider: github
    audit:
      enabled: true
      path: staging-audit.jsonl
  production:
    database: production.duckdb
    auth:
      enabled: true
      provider: atlassian
    audit:
      enabled: true
      path: /var/log/mxcp/audit.jsonl
    policies:
      strict_mode: true
```

## Monitoring and Alerting

```bash
# Monitor error rates
mxcp log --since 1h --status error --export-duckdb errors.db

# Set up alerts for policy violations
mxcp log --policy deny --since 10m --export-csv violations.csv

# Track performance
mxcp log --since 1d | jq '.duration_ms' | awk '{sum+=$1; count++} END {print "Avg:", sum/count "ms"}'
```

## Schema Drift Detection

```bash
# Create a baseline snapshot
mxcp drift-snapshot --profile production

# Check for changes (run in CI/CD)
mxcp drift-check --profile production
```

---

This comprehensive documentation provides everything needed to understand, implement, and operate MXCP in production environments. From basic setup to enterprise features, MXCP offers the complete solution for AI data integration with security, governance, and quality built-in.