MXCP Operational Guide

This guide covers deploying and operating MXCP in production. It consolidates operational information from across the documentation and adds deployment patterns for Docker, Kubernetes, and systemd.

Architecture Overview
Deployment Requirements
Containerization with Docker
Systemd Service Management
Configuration Management
Authentication Setup
Signal Handling & Hot Reload
Monitoring & Observability
Security Hardening
High Availability & Scaling
Backup & Recovery
Troubleshooting
Production Checklist

Architecture Overview

Core Components

┌─────────────────┐      ┌────────────────────────────┐      ┌─────────────────┐
│   LLM Clients   │      │      MXCP Server           │      │   Data Layer    │
│  (Claude, etc)  │◄────►│  ┌─────────────────────┐   │◄────►│                 │
│                 │ MCP  │  │ OAuth Provider      │   │      │  DuckDB         │
│                 │      │  ├─────────────────────┤   │      │  dbt Models     │
└─────────────────┘      │  │ Policy Engine       │   │      │  External APIs  │
                         │  ├─────────────────────┤   │      └─────────────────┘
                         │  │ Endpoint Executor   │   │              │
                         │  ├─────────────────────┤   │              ▼
                         │  │ Audit Logger        │   │      ┌─────────────────┐
                         │  └─────────────────────┘   │      │  File System    │
                         └────────────────────────────┘      │  - mxcp-site.yml│
                                      │                       │  - Python deps  │
                                      ▼                       │  - SQL/Python   │
                              ┌──────────────┐                └─────────────────┘
                              │ Audit Logs   │
                              │ (JSONL)      │
                              └──────────────┘

File System Requirements

MXCP requires access to:

Project files: mxcp-site.yml and endpoint definitions
User config: ~/.mxcp/config.yml (or custom path)
Database: DuckDB file (configured per profile)
Python modules: For Python endpoints and plugins
Audit logs: JSONL files (when enabled)
Drift snapshots: JSON files for schema monitoring
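
A deployment script can verify these paths before the server starts. The following is a minimal sketch, not part of MXCP: the helper name `check_paths` is hypothetical, and the caller supplies the paths that match its own profile.

```python
import os
from pathlib import Path


def check_paths(readable, writable):
    """Hypothetical preflight helper: return a list of problems (empty means OK)."""
    problems = []
    for path in map(Path, readable):
        # Project files and user config only need to be readable.
        if not path.exists():
            problems.append(f"missing: {path}")
        elif not os.access(path, os.R_OK):
            problems.append(f"not readable: {path}")
    for path in map(Path, writable):
        # Database, audit, and drift directories must accept writes.
        if not path.is_dir():
            problems.append(f"missing directory: {path}")
        elif not os.access(path, os.W_OK):
            problems.append(f"not writable: {path}")
    return problems
```

Passing the site file and user config as `readable` and the data, audit, and drift directories as `writable` gives a quick pass/fail check before `mxcp serve` is launched.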

Deployment Requirements

System Requirements

Python: 3.11 or higher
Memory: Minimum 2GB RAM (4GB+ recommended for production)
Disk:
- 1GB for base installation
- Additional space for DuckDB databases
- Space for audit logs (grows over time)
Network: Outbound HTTPS for OAuth and external APIs
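
The Python and disk requirements above can be checked programmatically. This is an illustrative sketch only; the function names and the 1GB threshold constant are assumptions taken from the list above, not an MXCP API.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)          # minimum version from the requirements above
MIN_FREE_BYTES = 1 * 1024**3  # 1GB for the base installation


def python_ok(version_info=sys.version_info):
    """True when the interpreter meets the minimum Python version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def disk_ok(path="/", min_free=MIN_FREE_BYTES):
    """True when the filesystem holding `path` has at least `min_free` bytes free."""
    return shutil.disk_usage(path).free >= min_free
```

The disk check covers only the base installation; DuckDB files and audit logs need their own headroom.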

Python Dependencies

MXCP and your endpoints may require additional Python packages:

# Core MXCP dependencies (automatically installed)
mcp>=1.9.0
click>=8.1.7
pyyaml>=6.0.1
duckdb>=0.9.2
pandas>=2.0.0
dbt-core>=1.6.0
dbt-duckdb>=1.6.0

# Optional features
hvac>=2.0.0          # For Vault integration
onepassword-sdk>=0.3.0  # For 1Password integration

# Your endpoint dependencies
# Add these to your requirements.txt
requests
numpy
scikit-learn
# ... any other packages your Python endpoints use

Network Ports

HTTP API: Default 8000 (configurable)
OAuth callbacks: Must be accessible from client browsers
External services: Varies by integration

Containerization with Docker

Basic Dockerfile

Here's a production-ready Dockerfile for MXCP:

# Use official Python runtime as base
FROM python:3.11-slim

# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    g++ \
    git \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Create non-root user
RUN useradd -m -u 1000 mxcp

# Set working directory
WORKDIR /app

# Copy requirements first for better caching
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Install MXCP
RUN pip install --no-cache-dir mxcp

# Copy project files
COPY --chown=mxcp:mxcp . .

# Create directories for runtime
RUN mkdir -p /app/data /app/logs /app/drift /app/audit && \
    chown -R mxcp:mxcp /app

# Switch to non-root user
USER mxcp

# Environment variables
ENV PYTHONUNBUFFERED=1
ENV MXCP_CONFIG_PATH=/app/config/config.yml

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1

# Default command
CMD ["mxcp", "serve", "--transport", "http", "--host", "0.0.0.0", "--port", "8000"]
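
The health check above depends on curl being present in the image. Where that is undesirable, the same probe can be written with the Python standard library. The sketch below assumes the `/health` endpoint used in the Dockerfile; the helper name `is_healthy` is hypothetical.

```python
import http.client
import urllib.error
import urllib.request

# Ignore proxy environment variables: the probe targets the local server.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_healthy(url="http://localhost:8000/health", timeout=5.0):
    """Hypothetical helper: True if the URL answers with a 2xx status in time."""
    try:
        with _opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, malformed reply, or an HTTP error status.
        return False
```

A health-check script would call `is_healthy()` and exit with a non-zero status when it returns False.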

Multi-Stage Build for Optimization

# Build stage
FROM python:3.11-slim AS builder

RUN apt-get update && apt-get install -y gcc g++ git

WORKDIR /build

# Copy and install dependencies
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt
RUN pip install --user --no-cache-dir mxcp

# Runtime stage
FROM python:3.11-slim

RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*

# Copy Python packages from builder
COPY --from=builder /root/.local /root/.local

# Make sure scripts in .local are usable
ENV PATH=/root/.local/bin:$PATH

WORKDIR /app

# Copy application files
COPY . .

# Create runtime directories
RUN mkdir -p /app/data /app/logs /app/drift /app/audit

EXPOSE 8000

CMD ["mxcp", "serve", "--transport", "http", "--host", "0.0.0.0", "--port", "8000"]

Docker Compose Example

version: '3.8'

services:
  mxcp:
    build: .
    ports:
      - "8000:8000"
    environment:
      - MXCP_CONFIG_PATH=/config/config.yml
      - GITHUB_CLIENT_ID=${GITHUB_CLIENT_ID}
      - GITHUB_CLIENT_SECRET=${GITHUB_CLIENT_SECRET}
      - DATABASE_URL=${DATABASE_URL}
    volumes:
      # Mount project files
      - ./:/app:ro
      # Mount config separately for security
      - ./config:/config:ro
      # Persistent data volumes
      - mxcp-data:/app/data
      - mxcp-logs:/app/logs
      - mxcp-audit:/app/audit
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    restart: unless-stopped

  # Optional: Keycloak for authentication
  keycloak:
    image: quay.io/keycloak/keycloak:latest
    environment:
      - KC_BOOTSTRAP_ADMIN_USERNAME=admin
      - KC_BOOTSTRAP_ADMIN_PASSWORD=admin
    ports:
      - "8080:8080"
    command: start-dev

volumes:
  mxcp-data:
  mxcp-logs:
  mxcp-audit:

Kubernetes Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mxcp
  labels:
    app: mxcp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mxcp
  template:
    metadata:
      labels:
        app: mxcp
    spec:
      containers:
      - name: mxcp
        image: your-registry/mxcp:latest
        ports:
        - containerPort: 8000
        env:
        - name: MXCP_CONFIG_PATH
          value: /config/config.yml
        - name: GITHUB_CLIENT_ID
          valueFrom:
            secretKeyRef:
              name: mxcp-secrets
              key: github-client-id
        - name: GITHUB_CLIENT_SECRET
          valueFrom:
            secretKeyRef:
              name: mxcp-secrets
              key: github-client-secret
        volumeMounts:
        - name: config
          mountPath: /config
          readOnly: true
        - name: project
          mountPath: /app
          readOnly: true
        - name: data
          mountPath: /app/data
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 5
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
      volumes:
      - name: config
        configMap:
          name: mxcp-config
      - name: project
        configMap:
          name: mxcp-project
      - name: data
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: mxcp
spec:
  selector:
    app: mxcp
  ports:
  - port: 80
    targetPort: 8000
  type: LoadBalancer

Systemd Service Management

For systems using systemd (most modern Linux distributions), MXCP can be managed as a system service. This provides automatic startup, restart on failure, and integration with system logging.

Systemd Service Files

Basic Service Configuration

Create /etc/systemd/system/mxcp.service:

[Unit]
Description=MXCP Model Context Protocol Server
After=network.target
Documentation=https://github.com/your-org/mxcp

[Service]
Type=simple
User=mxcp
Group=mxcp
WorkingDirectory=/opt/mxcp
ExecStart=/usr/local/bin/mxcp serve --transport http --host 0.0.0.0 --port 8000
Restart=on-failure
RestartSec=10

# Environment
Environment="PATH=/usr/local/bin:/usr/bin:/bin"
Environment="MXCP_CONFIG_PATH=/etc/mxcp/config.yml"
EnvironmentFile=-/etc/mxcp/environment

# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=read-only
ReadWritePaths=/opt/mxcp/data /opt/mxcp/audit /opt/mxcp/drift

# Resource limits
MemoryMax=2G
CPUQuota=80%

# Logging
StandardOutput=journal
StandardError=journal
SyslogIdentifier=mxcp

[Install]
WantedBy=multi-user.target

Environment File

Create /etc/mxcp/environment for sensitive variables:

# OAuth credentials
GITHUB_CLIENT_ID=your-client-id
GITHUB_CLIENT_SECRET=your-client-secret

# Database credentials
DB_HOST=localhost
DB_USER=mxcp_user
DB_PASSWORD=secure_password

# Vault integration
VAULT_ADDR=https://vault.example.com
VAULT_TOKEN=your-vault-token

Audit Cleanup Timer

To clean up old audit logs automatically, create a oneshot service and a timer that triggers it:

Service File

Create /etc/systemd/system/mxcp-log-cleanup.service:

[Unit]
Description=MXCP Audit Log Cleanup
After=network.target

[Service]
Type=oneshot
User=mxcp
Group=mxcp
WorkingDirectory=/opt/mxcp
ExecStart=/usr/local/bin/mxcp log-cleanup
StandardOutput=journal
StandardError=journal

# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=read-only
ReadWritePaths=/opt/mxcp/audit

# Resource limits
MemoryMax=1G
CPUQuota=50%

Timer File

Create /etc/systemd/system/mxcp-log-cleanup.timer:

[Unit]
Description=Run MXCP Audit Cleanup daily at 2 AM
Requires=mxcp-log-cleanup.service

[Timer]
# Run daily at 2:00 AM ("daily" alone would mean midnight)
OnCalendar=*-*-* 02:00:00
AccuracySec=1h
Persistent=true

# Randomize start time by up to 30 minutes to avoid thundering herd
RandomizedDelaySec=30min

[Install]
WantedBy=timers.target

Installation and Management

Initial Setup

# Create system user
sudo useradd -r -s /bin/false -d /opt/mxcp mxcp

# Create directories
sudo mkdir -p /opt/mxcp/{data,audit,drift,logs}
sudo mkdir -p /etc/mxcp
sudo chown -R mxcp:mxcp /opt/mxcp

# Install MXCP
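# (a plain install puts the mxcp script in /usr/local/bin on most distributions;
#  --target would place the packages outside Python's import path)
sudo pip install mxcp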

# Copy configuration files
sudo cp mxcp-site.yml /opt/mxcp/
sudo cp -r tools resources prompts /opt/mxcp/
sudo cp config.yml /etc/mxcp/
sudo chown mxcp:mxcp /etc/mxcp/config.yml
sudo chmod 600 /etc/mxcp/config.yml

# Install systemd files
sudo cp mxcp.service /etc/systemd/system/
sudo cp mxcp-log-cleanup.* /etc/systemd/system/

# Reload systemd
sudo systemctl daemon-reload

Service Management

# Enable and start MXCP
sudo systemctl enable mxcp.service
sudo systemctl start mxcp.service

# Enable audit cleanup timer
sudo systemctl enable mxcp-log-cleanup.timer
sudo systemctl start mxcp-log-cleanup.timer

# Check status
sudo systemctl status mxcp.service
sudo systemctl list-timers mxcp-log-cleanup.timer

# View logs
sudo journalctl -u mxcp.service -f
sudo journalctl -u mxcp-log-cleanup.service --since "1 hour ago"

# Restart service (e.g., after configuration change)
sudo systemctl restart mxcp.service

# Stop service
sudo systemctl stop mxcp.service

Multiple Profiles

To run multiple MXCP instances with different profiles:

# Create profile-specific service files
sudo cp mxcp.service /etc/systemd/system/mxcp-prod.service
sudo cp mxcp.service /etc/systemd/system/mxcp-dev.service

# Edit each service file
# mxcp-prod.service:
ExecStart=/usr/local/bin/mxcp serve --profile prod --transport http --port 8000

# mxcp-dev.service:
ExecStart=/usr/local/bin/mxcp serve --profile dev --transport http --port 8001

# Create separate audit cleanup services
# mxcp-log-cleanup-prod.service:
ExecStart=/usr/local/bin/mxcp log-cleanup --profile prod

# mxcp-log-cleanup-dev.service:
ExecStart=/usr/local/bin/mxcp log-cleanup --profile dev

Integration with System Monitoring

Systemd integrates with various monitoring tools:

# Prometheus node exporter exposes systemd unit state when started with --collector.systemd
# Access via: node_systemd_unit_state{name="mxcp.service"}

# To write a custom marker line to the journal, use systemd-cat
echo "mxcp_custom_metric{type=\"startup\"} 1" | systemd-cat -t mxcp-metrics

# Set up systemd journal forwarding to syslog
sudo mkdir -p /etc/systemd/journald.conf.d/
cat <<EOF | sudo tee /etc/systemd/journald.conf.d/forward-to-syslog.conf
[Journal]
ForwardToSyslog=yes
EOF

Systemd Security Features

Take advantage of systemd's security features:

# Additional security options for production
[Service]
# Filesystem isolation
PrivateDevices=true
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictAddressFamilies=AF_INET AF_INET6
RestrictNamespaces=true
LockPersonality=true
RestrictRealtime=true
RestrictSUIDSGID=true
RemoveIPC=true

# Capability restrictions
CapabilityBoundingSet=
AmbientCapabilities=

# System call filtering
SystemCallFilter=@system-service
SystemCallErrorNumber=EPERM

Troubleshooting Systemd Services

# Check service logs
sudo journalctl -u mxcp.service --since "10 minutes ago"

# Check service configuration
sudo systemctl cat mxcp.service

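# Verify service environment (show-environment would print the manager's environment instead)
sudo systemctl show mxcp.service --property=Environment,EnvironmentFiles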

# Debug startup issues
sudo journalctl -xe

# Test service configuration
sudo systemd-analyze verify mxcp.service

# Run service manually for debugging
sudo -u mxcp /usr/local/bin/mxcp serve --debug

Cron Alternative

If you prefer cron over systemd timers:

# Add to mxcp user's crontab
sudo -u mxcp crontab -e

# Run audit cleanup daily at 2 AM
0 2 * * * cd /opt/mxcp && /usr/local/bin/mxcp log-cleanup >> /opt/mxcp/logs/cleanup.log 2>&1

# Run drift check weekly
0 3 * * 0 cd /opt/mxcp && /usr/local/bin/mxcp drift-check >> /opt/mxcp/logs/drift.log 2>&1

Configuration Management

Environment Variables

MXCP supports configuration through environment variables:

# OAuth credentials
export GITHUB_CLIENT_ID="your-client-id"
export GITHUB_CLIENT_SECRET="your-client-secret"

# Database credentials
export DB_HOST="localhost"
export DB_USER="dbuser"
export DB_PASSWORD="dbpass"

# Vault integration
export VAULT_ADDR="https://vault.example.com"
export VAULT_TOKEN="your-vault-token"

# 1Password integration
export OP_SERVICE_ACCOUNT_TOKEN="your-service-account-token"

# Custom paths
export MXCP_CONFIG_PATH="/custom/path/to/config.yml"

Configuration Files

Site Configuration (`mxcp-site.yml`)

Must be accessible to the container:

mxcp: 1
project: my_project
profile: production

profiles:
  production:
    duckdb:
      path: "/app/data/production.duckdb"
      readonly: false
    audit:
      enabled: true
      path: "/app/audit/production.jsonl"
    drift:
      path: "/app/drift/production.json"
    auth:
      enabled: true
      provider: github

sql_tools:
  enabled: false  # Enable only if needed

secrets:
  - db_credentials
  - api_keys

User Configuration (`config.yml`)

Store securely, never in version control:

mxcp: 1
transport:
  provider: streamable-http
  http:
    port: 8000
    host: 0.0.0.0
    stateless: true  # For serverless deployments

projects:
  my_project:
    profiles:
      production:
        secrets:
          - name: db_credentials
            type: database
            parameters:
              host: "${DB_HOST}"
              username: "${DB_USER}"
              password: "${DB_PASSWORD}"
        auth:
          provider: github
          clients:
            - client_id: "${GITHUB_CLIENT_ID}"
              client_secret: "${GITHUB_CLIENT_SECRET}"
              name: "MXCP Production"
              redirect_uris:
                - "https://api.example.com/github/callback"
              scopes:
                - "mxcp:access"
          github:
            client_id: "${GITHUB_CLIENT_ID}"
            client_secret: "${GITHUB_CLIENT_SECRET}"
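
Because the user configuration refers to secrets as `${NAME}` placeholders, a missing environment variable is an easy mistake to make at deploy time. The sketch below scans the config text for placeholders and reports those with no matching variable. It illustrates the idea only and is not how MXCP resolves values; `missing_env_vars` is a hypothetical helper.

```python
import os
import re

# Matches ${NAME} placeholders as used in the config above.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def missing_env_vars(config_text, env=None):
    """Return the sorted names referenced as ${NAME} that are absent from env."""
    env = os.environ if env is None else env
    referenced = set(PLACEHOLDER.findall(config_text))
    return sorted(name for name in referenced if name not in env)
```

Running this against the contents of `config.yml` in a CI step or container entrypoint surfaces unset variables before the server starts.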

Volume Mounts

Essential volumes for production:

# Project files (read-only)
-v /path/to/project:/app:ro

# Configuration (read-only, separate for security)
-v /path/to/config:/config:ro

# Persistent data (read-write)
-v mxcp-data:/app/data
-v mxcp-logs:/app/logs
-v mxcp-audit:/app/audit
-v mxcp-drift:/app/drift

Authentication Setup

OAuth Provider Configuration

GitHub OAuth

1. Create an OAuth App at https://github.com/settings/developers
2. Set the callback URL: https://your-domain.com/github/callback
3. Configure in MXCP:

auth:
  provider: github
  github:
    client_id: "${GITHUB_CLIENT_ID}"
    client_secret: "${GITHUB_CLIENT_SECRET}"
    scope: "user:email"

Keycloak Integration

Deploy Keycloak alongside MXCP:

# docker-compose.yml addition
keycloak:
  image: quay.io/keycloak/keycloak:latest
  environment:
    - KC_DB=postgres
    - KC_DB_URL=jdbc:postgresql://postgres:5432/keycloak
    - KC_DB_USERNAME=keycloak
    - KC_DB_PASSWORD=${KEYCLOAK_DB_PASSWORD}
    - KC_HOSTNAME=auth.example.com
    - KC_PROXY_HEADERS=xforwarded
    - KC_HTTP_ENABLED=true
  command: start

Configure MXCP:

auth:
  provider: keycloak
  keycloak:
    client_id: "${KEYCLOAK_CLIENT_ID}"
    client_secret: "${KEYCLOAK_CLIENT_SECRET}"
    realm: "master"
    server_url: "https://auth.example.com"

Reverse Proxy Configuration

Nginx Example

upstream mxcp {
    server mxcp:8000;
}

server {
    listen 443 ssl http2;
    server_name api.example.com;

    ssl_certificate /etc/ssl/certs/server.crt;
    ssl_certificate_key /etc/ssl/private/server.key;

    # Security headers
    add_header X-Content-Type-Options nosniff;
    add_header X-Frame-Options DENY;
    add_header X-XSS-Protection "1; mode=block";

    location / {
        proxy_pass http://mxcp;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        
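        # Streaming support for SSE: keep the upstream connection open
        # and turn off buffering so events are delivered immediately
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_buffering off;
        
        # Timeouts (the read timeout must exceed the longest idle period of a stream)
        proxy_connect_timeout 60s;
        proxy_send_timeout 60s;
        proxy_read_timeout 3600s;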
    }
}

Traefik Example

http:
  routers:
    mxcp:
      rule: "Host(`api.example.com`)"
      service: mxcp
      tls:
        certResolver: letsencrypt
      middlewares:
        - security-headers
        
  services:
    mxcp:
      loadBalancer:
        servers:
          - url: "http://mxcp:8000"
          
  middlewares:
    security-headers:
      headers:
        sslRedirect: true
        stsSeconds: 31536000
        stsIncludeSubdomains: true
        stsPreload: true
        contentTypeNosniff: true
        browserXssFilter: true

Signal Handling & Hot Reload

SIGHUP Configuration Reload

MXCP supports hot configuration reload via SIGHUP:

# Send SIGHUP to reload configuration
kill -HUP <mxcp-pid>

# In Docker
docker kill -s HUP mxcp-container

What gets reloaded:

External configuration values (environment variables, vault://, file://)
Secret values
Database connections

What doesn't reload:

Endpoint definitions (requires restart)
OAuth configuration (requires restart)
Transport settings (requires restart)
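
The split between reloadable and non-reloadable settings follows a common pattern: the SIGHUP handler re-reads only externally supplied values and leaves everything fixed at startup untouched. The sketch below shows that general pattern under stated assumptions. It is not MXCP's implementation, and the `state` dictionary and `DB_HOST` lookup are placeholders.

```python
import os
import signal

# Hypothetical runtime state: one reloadable value plus a reload counter.
state = {"reloads": 0, "db_host": os.environ.get("DB_HOST", "localhost")}


def handle_sighup(signum, frame):
    """Re-read externally supplied values; leave startup-only settings alone."""
    state["db_host"] = os.environ.get("DB_HOST", "localhost")
    state["reloads"] += 1


# Register the handler (SIGHUP exists on Unix-like systems only).
signal.signal(signal.SIGHUP, handle_sighup)
```

After registration, `kill -HUP <pid>` triggers `handle_sighup` in the running process without interrupting it.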

Graceful Shutdown

MXCP handles SIGTERM for graceful shutdown:

# docker-compose.yml
services:
  mxcp:
    stop_grace_period: 30s

During shutdown:

1. Stops accepting new requests
2. Completes in-flight requests
3. Flushes audit logs
4. Closes database connections
5. Exits cleanly
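
The ordering above can be illustrated with a small sketch. It shows the general technique only (the handler records the request, the main loop drains and cleans up) and is not MXCP's code. The `serve` function and its in-memory queue are hypothetical stand-ins, and closing database connections is omitted for brevity.

```python
import signal
import threading

stop = threading.Event()


def handle_sigterm(signum, frame):
    """Record the shutdown request; cleanup happens outside the handler."""
    stop.set()


signal.signal(signal.SIGTERM, handle_sigterm)


def serve(pending, audit_sink):
    """Handle queued requests until a stop is requested, then flush and return."""
    audit_buffer = []
    while pending and not stop.is_set():
        request = pending.pop(0)                   # accept the next request
        audit_buffer.append(f"handled {request}")  # complete it, buffer its record
    audit_sink.extend(audit_buffer)                # flush buffered audit records
    return len(pending)                            # requests never accepted
```

A request that has already been accepted always finishes before the loop checks the stop flag again, which is why `stop_grace_period` should exceed the longest expected request.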

Monitoring & Observability

MXCP provides comprehensive observability through multiple channels: audit logging, OpenTelemetry (traces and metrics), and application logs. This section covers all aspects of monitoring your MXCP deployment.

Overview of Observability Signals

MXCP provides three complementary observability signals:

Application Logs 📝
- Traditional text-based logs for debugging
- Output to stdout/stderr
- Captured by container runtime
- Best for: Debugging specific issues
Audit Logs 📋
- Structured JSONL format
- Every request is logged
- Includes user, timing, and policy decisions
- Best for: Compliance, security analysis, usage patterns
OpenTelemetry 🔍
- Traces: Distributed tracing for request flow
- Metrics: Performance counters and histograms
- Correlation: Links traces with audit logs
- Best for: Performance analysis, system health

OpenTelemetry Integration

Breaking Change (v0.5.0): MXCP now follows the modern observability pattern where performance metrics are derived from trace spans. Duration histograms are no longer exported directly. You must configure the OpenTelemetry Collector's spanmetrics processor to generate performance metrics. See the Span Metrics section below for configuration details.

What is OpenTelemetry?

OpenTelemetry (OTel) is an open-source observability framework that provides:

Distributed Tracing: Track requests across multiple services and execution steps
Metrics: Collect performance counters and histograms for all operations
Logs: Structured logging with trace context correlation (future)

Enabling Telemetry

Configure telemetry in your user config file (~/.mxcp/config.yml):

mxcp: 1

projects:
  myproject:
    profiles:
      # Development - console output for debugging
      development:
        telemetry:
          enabled: true
          service_name: mxcp-dev
          environment: development
          # Separate configuration for each signal
          tracing:
            enabled: true
            console_export: true  # Print spans to console
          metrics:
            enabled: true
            export_interval: 60  # Export every 60 seconds
            prometheus_port: 9090  # Optional Prometheus endpoint

      # Production - send to OTLP collector
      production:
        telemetry:
          enabled: true
          endpoint: http://otel-collector:4318  # OTLP HTTP endpoint
          service_name: mxcp-prod
          environment: production
          headers:
            Authorization: Bearer your-token
          tracing:
            enabled: true
          metrics:
            enabled: true
            export_interval: 60

What Gets Traced?

MXCP automatically traces:

Endpoint Execution:
- Overall execution time
- Input/output validation
- Policy enforcement
Authentication:
- Token validation
- User context retrieval
- Provider authentication flows
Policy Enforcement:
- Input policy evaluation (before execution)
- Output policy evaluation (after execution)
- Individual policy condition evaluation
- Policy decisions (allow, deny, filter, mask)
Database Operations:
- SQL query execution (queries are hashed for privacy)
- Query type (SELECT, INSERT, UPDATE, etc.)
- Row counts and performance metrics
Python Execution:
- Function calls
- Inline code execution
- Parameter counts

Example trace hierarchy:

mxcp.execution_engine.execute
├── mxcp.policy.enforce_input
│   ├── mxcp.policy.evaluate_input[0]
│   └── mxcp.policy.evaluate_input[1]
├── mxcp.validation.input
├── mxcp.duckdb.execute
│   └── db.query (SELECT * FROM users)
├── mxcp.python.execute
│   └── python.function (calculate_metrics)
├── mxcp.validation.output
└── mxcp.policy.enforce_output
    ├── mxcp.policy.evaluate_output[0]
    └── mxcp.policy.evaluate_output[1]

Metrics Collected

MXCP exports metrics in two ways:

1. Direct Metrics (Default)

These metrics are directly exported by MXCP:

System Metrics:

mxcp.up: Server startup counter
mxcp.config_reloads_total: Configuration reload attempts

Business Metrics (Counters):

mxcp.endpoint.requests_total: Total requests by endpoint, status
mxcp.endpoint.errors_total: Error count by type
mxcp.duckdb.queries_total: Query count by operation type
mxcp.executor.tasks_total: Execution count by language
mxcp.auth.attempts_total: Authentication attempts by provider

Performance Metrics: MXCP does not export duration histograms directly. They are derived from trace spans by the OpenTelemetry Collector's spanmetrics processor, as described under Span Metrics, so every operation is timed the same way without manual timing code.

Gauge Metrics:

mxcp.endpoint.concurrent_executions: Currently running executions
mxcp.auth.active_sessions: Current active sessions

2. Span Metrics (Required for Performance Metrics)

MXCP follows the modern observability approach where performance metrics are derived from trace spans. This eliminates manual timing code and ensures consistency.

Important: To get performance metrics (latency histograms, percentiles, etc.), you MUST configure your OpenTelemetry Collector with the spanmetrics processor:

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    
  spanmetrics:
    metrics_exporter: prometheus
    latency_histogram_buckets: [5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2s, 5s]
    dimensions:
      # Standard dimensions
      - name: service.name
      - name: span.name
      - name: span.kind
      - name: status.code
      
      # MXCP-specific dimensions
      - name: mxcp.endpoint.name
        default: "unknown"
      - name: mxcp.endpoint.type
        default: "unknown"
      - name: mxcp.execution.language
        default: "unknown"
      - name: mxcp.auth.provider
        default: "unknown"
      - name: mxcp.policy.decision
        default: "unknown"
      - name: mxcp.duckdb.operation
        default: "unknown"

exporters:
  otlp/tempo:
    endpoint: tempo:4317
    
  prometheus:
    endpoint: "0.0.0.0:8889"
    namespace: mxcp

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch, spanmetrics]
      exporters: [otlp/tempo]
      
    metrics/spanmetrics:
      receivers: [spanmetrics]
      exporters: [prometheus]

This automatically generates:

Request rate: mxcp_calls_total{span_name="mxcp.endpoint.execute"}
Latency histogram: mxcp_latency_bucket{span_name="mxcp.endpoint.execute", le="100"}
Error rate: Derived from status_code="ERROR" label

For example queries and dashboards, see examples/observability/otel-collector-spanmetrics.yaml.

Benefits of span metrics:

No manual timing in code
Automatic P50, P95, P99 calculations
Consistent metrics for all operations
Perfect correlation with traces
Reduced code complexity

See the OpenTelemetry spanmetrics documentation for advanced configuration.
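
To make the percentile calculation concrete, the sketch below estimates a quantile from cumulative histogram buckets by linear interpolation, which is the same basic idea Prometheus applies in `histogram_quantile`. It is a simplified illustration: the function is written for this guide, and any bucket counts passed to it are the caller's own data.

```python
def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound_ms, count) pairs.

    Buckets must be sorted by upper bound. Observations are assumed to be
    spread evenly inside each bucket (linear interpolation).
    """
    total = buckets[-1][1]
    if total == 0:
        return None
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            in_bucket = count - lower_count
            fraction = (rank - lower_count) / in_bucket if in_bucket else 0.0
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return buckets[-1][0]
```

The estimate is only as precise as the bucket boundaries, so the bucket list in the collector config should bracket the latencies that matter for alerting.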

Privacy and Security

MXCP takes a privacy-first approach to telemetry:

What We DO Send:

span: mxcp.endpoint.execute
  attributes:
    mxcp.endpoint.name: "get_customer"     # ✅ Endpoint name
    mxcp.endpoint.type: "tool"             # ✅ Type
    mxcp.execution.language: "sql"         # ✅ Language used
    mxcp.result.count: 42                  # ✅ Result count
    mxcp.auth.authenticated: true          # ✅ Auth status
    mxcp.policy.decision: "allow"          # ✅ Policy decision
  duration: 150ms

What We DON'T Send:

❌ Actual SQL queries (only hashed query signatures)
❌ Parameter values (only parameter names/types)
❌ Result data (only counts and types)
❌ User credentials or tokens
❌ Python code content
❌ Any PII or sensitive business data

Example privacy protection:

# What happens in the code:
sql_query = "SELECT * FROM customers WHERE email = 'user@example.com'"

# What gets sent to telemetry:
span.set_attribute("mxcp.duckdb.query_hash", "a7b9c3...")  # SHA256 hash
span.set_attribute("mxcp.duckdb.operation", "SELECT")      # Just the operation type
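
A runnable version of the same idea is sketched below. The attribute names come from the example above. The whitespace normalization step and the helper name `telemetry_attributes` are assumptions made for illustration, and MXCP's actual hashing may differ.

```python
import hashlib


def telemetry_attributes(sql_query):
    """Return span attributes that describe a query without exposing its text."""
    normalized = " ".join(sql_query.split())  # collapse whitespace (assumption)
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    operation = normalized.split(" ", 1)[0].upper() if normalized else "UNKNOWN"
    return {
        "mxcp.duckdb.query_hash": digest,    # SHA256 hex digest, not the query
        "mxcp.duckdb.operation": operation,  # first keyword only
    }
```

Hashing the full text means two queries that differ only in a literal value produce different hashes, so the hash identifies an exact statement rather than a query shape.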

Quick Start with Jaeger

For local development, Jaeger provides an all-in-one solution:

# docker-compose.yml
version: '3.8'

services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"  # Jaeger UI
      - "4318:4318"    # OTLP HTTP receiver
    environment:
      - COLLECTOR_OTLP_ENABLED=true

  mxcp:
    build: .
    ports:
      - "8000:8000"
    environment:
      - MXCP_CONFIG_PATH=/config/config.yml
    depends_on:
      - jaeger
    volumes:
      - ./config:/config:ro
      - ./:/app:ro

Configure MXCP:

telemetry:
  enabled: true
  endpoint: http://jaeger:4318
  service_name: mxcp-dev
  environment: development
  tracing:
    enabled: true
  metrics:
    enabled: true

Access Jaeger UI at http://localhost:16686 to view traces.

Production Telemetry Backends

Grafana Cloud

telemetry:
  enabled: true
  endpoint: https://otlp-gateway-prod-us-central-0.grafana.net/otlp
  headers:
    Authorization: Basic <base64-encoded-instance-id:api-key>
  service_name: mxcp-prod
  environment: production
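
The `Authorization` value above is a standard HTTP Basic credential: the instance ID and API key joined by a colon and base64-encoded. A small sketch for producing it follows; the helper name and argument names are illustrative.

```python
import base64


def basic_auth_header(instance_id, api_key):
    """Build the value for an HTTP Basic Authorization header."""
    credentials = f"{instance_id}:{api_key}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")
```

Generating the value at deploy time and injecting it through an environment variable keeps the API key out of the config file.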

AWS X-Ray

Use the AWS Distro for OpenTelemetry Collector:

telemetry:
  enabled: true
  endpoint: http://aws-otel-collector:4318
  service_name: mxcp-prod

Azure Monitor

telemetry:
  enabled: true
  endpoint: https://dc.services.visualstudio.com/v2/track
  headers:
    X-API-Key: your-instrumentation-key

Correlating Traces with Audit Logs

MXCP automatically includes both session IDs and trace IDs in audit logs when telemetry is enabled:

{
  "timestamp": "2024-01-15T10:30:45Z",
  "session_id": "73cb4ef4-a359-484f-a040-c1eb163abb57",
  "trace_id": "a1b2c3d4e5f6g7h8",
  "operation_name": "query_users",
  "duration_ms": 125,
  "status": "success"
}

Query audit logs by trace ID or session ID:

mxcp log --filter trace_id=a1b2c3d4e5f6g7h8
mxcp log --filter session_id=73cb4ef4-a359-484f-a040-c1eb163abb57

Session vs Trace IDs:

Session ID: The MCP session identifier that persists across multiple requests from the same client (when using HTTP transports). This may be null in stateless mode.
Trace ID: The OpenTelemetry trace identifier unique to each request/operation.

Both IDs are included in telemetry spans as attributes:

mxcp.session.id: The MCP session ID
mxcp.trace.id: The OpenTelemetry trace ID
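
For ad hoc scripts that read the JSONL file directly rather than going through `mxcp log`, the same correlation can be done in a few lines. This sketch assumes the field names shown in the audit record above; `filter_audit_records` is a hypothetical helper.

```python
import json


def filter_audit_records(lines, **criteria):
    """Yield parsed JSONL records whose fields equal every given criterion."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if all(record.get(key) == value for key, value in criteria.items()):
            yield record
```

Since `lines` can be any iterable of strings, an open file handle works directly, for example `filter_audit_records(open(path), trace_id=some_id)`.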

Log Collection and Shipping

MXCP produces several types of logs that need to be collected and shipped to centralized logging systems.

Log Types

Application Logs
- Format: Standard Python logging to stdout/stderr
- Content: Operational messages, warnings, errors
- Privacy: Should not contain sensitive data
Audit Logs
- Format: JSONL (JSON Lines)
- Location: Configured in mxcp-site.yml
- Content: Structured execution records
- Privacy: Sensitive data is redacted
Access Logs (HTTP mode)
- Format: Standard HTTP access logs
- Content: Request/response metadata
- Privacy: May contain IP addresses

Log Shipping with Promtail

Promtail is the recommended log shipper for Grafana Loki:

# promtail-config.yml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  # MXCP Application Logs (from Docker)
  - job_name: mxcp_app
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
        filters:
          - name: label
            values: ["com.mxcp.service=mxcp"]
    relabel_configs:
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: 'container'
      - source_labels: ['__meta_docker_container_log_stream']
        target_label: 'stream'
    pipeline_stages:
      - regex:
          expression: '^(?P<timestamp>\S+) - (?P<logger>\S+) - (?P<level>\w+) - (?P<message>.*)$'
      - labels:
          level:
          logger:
      - timestamp:
          format: RFC3339
          source: timestamp

  # MXCP Audit Logs (JSONL files)
  - job_name: mxcp_audit
    static_configs:
      - targets:
          - localhost
        labels:
          job: mxcp_audit
          __path__: /app/audit/*.jsonl
    pipeline_stages:
      - json:
          expressions:
            timestamp: timestamp
            trace_id: trace_id
            operation: operation_name
            duration: duration_ms
            status: status
            user: caller
      - labels:
          operation:
          status:
      - timestamp:
          format: RFC3339
          source: timestamp
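
Before deploying the pipeline, it can help to check the application-log regex against real output. The following is a minimal Python sketch that reuses the same pattern as the regex stage above; the sample lines are hypothetical, so substitute lines captured from your own deployment.

```python
import re
from typing import Optional

# Same pattern as the Promtail regex stage; Python also accepts (?P<name>...) groups.
LOG_LINE = re.compile(
    r"^(?P<timestamp>\S+) - (?P<logger>\S+) - (?P<level>\w+) - (?P<message>.*)$"
)


def parse_app_log(line: str) -> Optional[dict]:
    """Return the named fields for a matching line, or None if it does not match."""
    match = LOG_LINE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


# Hypothetical sample line, not taken from real MXCP output.
sample = "2024-01-15T12:00:00Z - mxcp.server - INFO - Server started"
print(parse_app_log(sample))

# A timestamp that contains a space cannot satisfy \S+, so this line is rejected.
print(parse_app_log("2024-01-15 12:00:00,123 - mxcp.server - INFO - Server started"))
```

If real lines come back as None, adjust the expression (and the timestamp format) before relying on the level and logger labels in Loki.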

Deploy with Docker Compose:

services:
  promtail:
    image: grafana/promtail:latest
    volumes:
      - ./promtail-config.yml:/etc/promtail/config.yml:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - mxcp-audit:/app/audit:ro
    command: -config.file=/etc/promtail/config.yml

  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin

Log Shipping with Fluentd

For more complex log processing:

# fluent.conf
<source>
  @type forward
  port 24224
</source>

# Parse MXCP application logs
<filter mxcp.app>
  @type parser
  key_name log
  <parse>
    @type regexp
    expression /^(?<time>\S+) - (?<logger>\S+) - (?<level>\w+) - (?<message>.*)$/
    time_format %Y-%m-%dT%H:%M:%S
  </parse>
</filter>

# Parse MXCP audit logs
<source>
  @type tail
  path /app/audit/*.jsonl
  pos_file /var/log/fluentd/audit.pos
  tag mxcp.audit
  <parse>
    @type json
    time_key timestamp
    time_format %Y-%m-%dT%H:%M:%SZ
  </parse>
</source>

# Add metadata
<filter mxcp.**>
  @type record_transformer
  <record>
    environment "#{ENV['ENVIRONMENT']}"
    service mxcp
    hostname ${hostname}
  </record>
</filter>

# Output to Elasticsearch
<match mxcp.**>
  @type elasticsearch
  host elasticsearch
  port 9200
  index_name mxcp-%Y.%m.%d
  type_name _doc
  <buffer>
    @type file
    path /var/log/fluentd/buffer
    flush_interval 10s
  </buffer>
</match>

Complete Observability Stack

Here's a complete Docker Compose setup for local development with full observability:

version: '3.8'

services:
  # MXCP Application
  mxcp:
    build: .
    ports:
      - "8000:8000"
    environment:
      - MXCP_CONFIG_PATH=/config/config.yml
    volumes:
      - ./config:/config:ro
      - ./:/app:ro
      - mxcp-audit:/app/audit
    labels:
      - "com.mxcp.service=mxcp"
    depends_on:
      - jaeger
      - loki

  # Tracing Backend
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"  # Jaeger UI
      - "4318:4318"    # OTLP HTTP
    environment:
      - COLLECTOR_OTLP_ENABLED=true

  # Metrics Backend
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'

  # Logs Backend
  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml

  # Log Collector
  promtail:
    image: grafana/promtail:latest
    volumes:
      - ./promtail-config.yml:/etc/promtail/config.yml:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - mxcp-audit:/app/audit:ro
    command: -config.file=/etc/promtail/config.yml

  # Visualization
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_INSTALL_PLUGINS=grafana-clock-panel
    volumes:
      - ./grafana/provisioning:/etc/grafana/provisioning:ro

volumes:
  mxcp-audit:

Prometheus configuration (prometheus.yml):

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'mxcp'
    static_configs:
      - targets: ['mxcp:9090']  # If prometheus_port is configured

Monitoring Without Built-in Health Endpoints

MXCP does not provide built-in health check endpoints, so monitor it using the following approaches:

Create a health check tool:

# tools/health.yml
mxcp: 1
tool:
  name: health_check
  description: "Basic health check endpoint"
  parameters: []
  return:
    type: object
    properties:
      status: { type: string }
      timestamp: { type: string }
  source:
    code: |
      SELECT 
        'healthy' as status,
        CURRENT_TIMESTAMP as timestamp

Process monitoring:

# Check if MXCP process is running
pgrep -f "mxcp serve" || exit 1

# Check if port is listening
nc -z localhost 8000 || exit 1

Synthetic monitoring:

# Test endpoint availability
curl -f http://localhost:8000/tools/list || exit 1
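
Where `nc` is not available (for example in a slim container image), the port check can be approximated in Python. This is a sketch using only the standard library; the function name `port_is_open` is illustrative and the host and port are the same ones assumed in the commands above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


status = "listening" if port_is_open("localhost", 8000) else "not reachable"
print(f"localhost:8000 is {status}")
```

Like `nc -z`, this only proves that something accepts connections on the port, not that MXCP is answering requests correctly, so it complements the synthetic check rather than replacing it.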

Audit Log Analysis

Query audit logs for operational insights:

# Export to DuckDB for analysis
mxcp log --export-duckdb /app/audit/audit.db

# Analyze with SQL
duckdb /app/audit/audit.db <<EOF
-- Request volume by hour
SELECT 
  DATE_TRUNC('hour', timestamp) as hour,
  COUNT(*) as requests,
  COUNT(DISTINCT caller) as unique_users
FROM logs
GROUP BY hour
ORDER BY hour DESC;

-- Error rate by endpoint
SELECT 
  name as endpoint,
  COUNT(*) as total_requests,
  COUNT(CASE WHEN status = 'error' THEN 1 END) as errors,
  ROUND(100.0 * COUNT(CASE WHEN status = 'error' THEN 1 END) / COUNT(*), 2) as error_rate
FROM logs
WHERE type = 'tool'
GROUP BY name
HAVING COUNT(*) > 10
ORDER BY error_rate DESC;

-- Average response time by endpoint
SELECT 
  name,
  AVG(duration_ms) as avg_duration_ms,
  MAX(duration_ms) as max_duration_ms,
  COUNT(*) as request_count
FROM logs
WHERE status = 'success'
GROUP BY name
ORDER BY avg_duration_ms DESC;

-- Policy violations
SELECT 
  timestamp,
  name,
  caller,
  reason
FROM logs
WHERE policy_decision = 'deny'
ORDER BY timestamp DESC
LIMIT 20;
EOF
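
For a quick look without exporting to DuckDB, the error-rate query can be approximated directly over the JSONL file. The sketch below assumes each audit record carries the `type`, `name`, and `status` fields used in the SQL above; verify the field names against your own audit log before relying on it.

```python
import json
from collections import defaultdict


def error_rates(lines, min_requests=1):
    """Return {endpoint: error percentage} for tool calls found in JSONL lines."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("type") != "tool":
            continue
        name = record.get("name", "unknown")
        totals[name] += 1
        if record.get("status") == "error":
            errors[name] += 1
    return {
        name: round(100.0 * errors[name] / count, 2)
        for name, count in totals.items()
        if count >= min_requests
    }


# Hypothetical records for illustration only.
sample = [
    '{"type": "tool", "name": "search", "status": "success", "duration_ms": 12}',
    '{"type": "tool", "name": "search", "status": "error", "duration_ms": 40}',
    '{"type": "resource", "name": "doc", "status": "success", "duration_ms": 5}',
]
print(error_rates(sample))  # {'search': 50.0}
```

To run it against a real file, pass an open file handle, for example `error_rates(open("/app/audit/production.jsonl"), min_requests=11)` to mirror the `HAVING COUNT(*) > 10` filter.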

Performance Monitoring

Key performance indicators to monitor:

Response Time (from traces/metrics):
- P50, P95, P99 latencies
- Breakdown by operation type
- Slow query identification

Error Rates (from metrics):
- Overall error percentage
- Errors by endpoint
- Authentication failures

Resource Usage (from metrics):
- Concurrent executions
- Memory usage
- Database connection pool

Business Metrics (from audit logs):
- Active users
- Most used endpoints
- Policy violation trends
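
When no metrics backend is available, the latency percentiles listed above can be estimated from `duration_ms` values taken from the audit log. This is a small sketch using the nearest-rank method; a metrics backend may interpolate differently, so expect small differences from histogram-based values.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(values)
    # p is a whole-number percentage in (0, 100]; rank is 1-based.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical durations: 1 ms through 100 ms.
durations_ms = list(range(1, 101))
for p in (50, 95, 99):
    print(f"P{p}: {percentile(durations_ms, p)} ms")
```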

Alerting Examples

Configure alerts in Grafana based on metrics. The rules below are written in Prometheus alerting rule syntax:

# High error rate
alert: HighErrorRate
expr: |
  rate(mxcp_endpoint_errors_total[5m]) 
  / rate(mxcp_endpoint_requests_total[5m]) > 0.05
for: 5m
annotations:
  summary: "High error rate detected"
  description: "Error rate is above 5% for 5 minutes"

# Slow response time
alert: SlowResponseTime
expr: |
  histogram_quantile(0.95, 
    rate(mxcp_endpoint_duration_seconds_bucket[5m])
  ) > 1.0
for: 10m
annotations:
  summary: "Slow response times"
  description: "95th percentile response time is above 1 second"

# Authentication failures
alert: AuthenticationFailures
expr: |
  rate(mxcp_auth_attempts_total{status!="success"}[5m]) > 0.1
for: 5m
annotations:
  summary: "High authentication failure rate"
  description: "More than 0.1 auth failures per second"

Troubleshooting Telemetry

Enable debug logging:
```
mxcp serve --debug
```

Use console export to verify:

telemetry:
  enabled: true
  tracing:
    console_export: true  # See spans in logs

Check connectivity:

curl -X POST http://your-collector:4318/v1/traces \
  -H "Content-Type: application/json" \
  -d '{}'

Common issues:
- Firewall blocking the OTLP port (4318 for HTTP, 4317 for gRPC)
- Invalid authentication headers
- Collector not configured for OTLP
- Incorrect endpoint URL format

Security Hardening

Container Security

Run as non-root user:
```
RUN useradd -m -u 1000 mxcp
USER mxcp
```

Minimal base image:

FROM python:3.11-slim
# Avoid full OS images

Security scanning:

# Scan for vulnerabilities
trivy image your-registry/mxcp:latest

Network Security

TLS everywhere:
- Use HTTPS for all external communication
- Enforce TLS 1.2 minimum
- Use strong cipher suites

Network policies (Kubernetes):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mxcp-network-policy
spec:
  podSelector:
    matchLabels:
      app: mxcp
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: nginx
    ports:
    - protocol: TCP
      port: 8000
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: postgres
    ports:
    - protocol: TCP
      port: 5432

Secret Management

Never commit secrets:
```
# .gitignore
config.yml
*.key
*.crt
.env
```

Use secret management tools:
- Kubernetes Secrets
- HashiCorp Vault
- AWS Secrets Manager
- Azure Key Vault

Rotate credentials regularly:

# Vault configuration
vault:
  enabled: true
  address: "https://vault.example.com"
  token_env: "VAULT_TOKEN"

Access Control

Enable authentication:
```
auth:
  enabled: true
  provider: github
```

Implement policies:

policies:
  input:
    - condition: "user.role != 'admin'"
      action: deny
      reason: "Admin access required"

Audit everything:

audit:
  enabled: true
  path: "/app/audit/production.jsonl"

Logging Security Guidelines

NEVER Log Sensitive Data:

- Tokens/Keys: Never log tokens, API keys, or secrets (not even truncated)
- Credentials: No passwords, client secrets, or auth tokens
- PII: Avoid logging email addresses, usernames, IP addresses unless necessary
- Query Content: Don't log SQL queries or their parameters
- Response Data: Never log actual data returned from queries

What TO Log:

- Operation names and types
- Timing and performance metrics
- Status codes and error types
- Counts and aggregates
- Provider/service names (not user identities)

Log Level Guidelines:

- ERROR/WARNING: Only operational context, no data
- INFO: High-level operations without sensitive details
- DEBUG: May include more context but still no secrets
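
For custom Python endpoints, a logging filter can act as a safety net behind these guidelines. The sketch below is an illustration, not part of MXCP: the `RedactingFilter` class, the key names in the pattern, and the `mxcp.example` logger name are all assumptions. It masks `key=value` pairs whose key looks sensitive; it does not replace the rule of never logging such values in the first place.

```python
import logging
import re

# Matches key=value pairs whose key ends in a sensitive word, e.g. client_secret=abc.
SENSITIVE = re.compile(r"(?i)\b([\w-]*(?:token|password|secret|api_key))=\S+")


class RedactingFilter(logging.Filter):
    """Replace the value of sensitive key=value pairs before the record is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the final message first so values passed as arguments are covered too.
        message = record.getMessage()
        record.msg = SENSITIVE.sub(r"\1=[REDACTED]", message)
        record.args = None  # the message is already fully formatted
        return True  # never drop the record, only rewrite it


logger = logging.getLogger("mxcp.example")  # hypothetical logger name
handler = logging.StreamHandler()
handler.addFilter(RedactingFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("OAuth exchange failed: client_secret=%s", "s3cr3t")
```

Attaching the filter to the handler rather than to one logger means records propagated from child loggers pass through it as well.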

High Availability & Scaling

Horizontal Scaling

MXCP can be scaled horizontally, with the following considerations:

Stateless mode for multiple instances:
```
transport:
  http:
    stateless: true
```

Shared storage for DuckDB:
- Use read replicas for query distribution
- Consider DuckDB's limitations for concurrent writes

Load balancing:

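# HAProxy example
# MXCP has no built-in /health endpoint, so no HTTP check is configured;
# "check" on its own makes HAProxy run a TCP connect check against each server.
backend mxcp_backend
  balance roundrobin
  server mxcp1 10.0.1.10:8000 check
  server mxcp2 10.0.1.11:8000 check
  server mxcp3 10.0.1.12:8000 check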

Database Considerations

DuckDB limitations:
- Single writer, multiple readers
- Not suitable for high-concurrency writes
- Consider read replicas for scaling reads

Alternative architectures:
- Use PostgreSQL for high-concurrency needs
- Implement caching layer (Redis)
- Use dbt to pre-aggregate data

Caching Strategy

Implement caching for performance:

# python/cache.py
import redis
import json
from functools import wraps

redis_client = redis.Redis(host='redis', port=6379, decode_responses=True)

def cache_result(ttl=300):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
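            # Sort kwargs so the key does not depend on keyword argument order
            cache_key = f"{func.__name__}:{args!r}:{sorted(kwargs.items())!r}"
            
            # Try cache first (compare against None so an existing entry is never skipped)
            cached = redis_client.get(cache_key)
            if cached is not None:
                return json.loads(cached)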
            
            # Execute and cache
            result = func(*args, **kwargs)
            redis_client.setex(cache_key, ttl, json.dumps(result))
            return result
        return wrapper
    return decorator

@cache_result(ttl=600)
def expensive_query(param: str) -> dict:
    # Your expensive operation here
    pass

Backup & Recovery

Backup Strategy

Database backups:

# Backup DuckDB (stop MXCP or make sure no writes are in progress, so the copy is consistent)
cp /app/data/production.duckdb /backup/production-$(date +%Y%m%d-%H%M%S).duckdb

# Or use DuckDB export
duckdb /app/data/production.duckdb <<EOF
EXPORT DATABASE '/backup/export-$(date +%Y%m%d)' (FORMAT PARQUET);
EOF
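
If the file copy is scripted, it may be worth verifying the copy before trusting it. The following Python sketch mirrors the `cp` command above and adds a SHA-256 comparison; the helper names `sha256_of` and `backup_file` are illustrative, and the same caveat applies that the database must not be written to during the copy.

```python
import hashlib
import shutil
from datetime import datetime
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large databases do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(source: Path, backup_dir: Path) -> Path:
    """Copy source into backup_dir under a timestamped name and verify the copy."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{source.stem}-{stamp}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also preserves timestamps and permissions
    if sha256_of(source) != sha256_of(target):
        target.unlink()
        raise RuntimeError(f"Checksum mismatch while backing up {source}")
    return target


# Example call, using the paths from the shell commands above:
# backup_file(Path("/app/data/production.duckdb"), Path("/backup"))
```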

Configuration backups:

# Backup configurations (excluding secrets)
tar -czf /backup/config-$(date +%Y%m%d).tar.gz \
  --exclude='*.key' \
  --exclude='config.yml' \
  /app/mxcp-site.yml \
  /app/tools \
  /app/resources \
  /app/prompts

Audit log backups:

# Rotate and backup audit logs
mv /app/audit/production.jsonl /backup/audit-$(date +%Y%m%d).jsonl
gzip /backup/audit-*.jsonl

Recovery Procedures

Database recovery:

# Stop MXCP
docker stop mxcp

# Restore database
cp /backup/production-20240115-120000.duckdb /app/data/production.duckdb

# Start MXCP
docker start mxcp

Restore from an export (recovers the database as it was when the export was taken):

-- Restore from export
IMPORT DATABASE '/backup/export-20240115';

Disaster Recovery

Multi-region setup:
- Replicate data to multiple regions
- Use geo-distributed load balancing
- Implement failover procedures

RTO/RPO targets:
- Define Recovery Time Objective
- Define Recovery Point Objective
- Test recovery procedures regularly

Troubleshooting

Common Issues

Container fails to start

# Check logs
docker logs mxcp

# Common causes:
# - Missing configuration files
# - Invalid YAML syntax
# - Missing environment variables
# - Permission issues

# Debug mode
docker run -it --rm \
  -v $(pwd):/app \
  -e MXCP_CONFIG_PATH=/app/config.yml \
  your-registry/mxcp:latest \
  mxcp validate --debug

Authentication failures

# Check OAuth configuration
curl -v https://api.example.com/github/callback

# Verify environment variables
docker exec mxcp env | grep -E "(CLIENT_ID|CLIENT_SECRET)"

# Check redirect URI match
# Must match exactly in OAuth provider settings

Database connection issues

# Test DuckDB connection
docker exec mxcp duckdb /app/data/production.duckdb "SELECT 1;"

# Check file permissions
docker exec mxcp ls -la /app/data/

# Verify DuckDB isn't locked
lsof | grep production.duckdb

Performance issues

# Monitor resource usage
docker stats mxcp

# Check slow queries
mxcp log --since 1h | jq 'select(.duration_ms > 1000)'

# Analyze query patterns
mxcp log --export-duckdb perf.db
duckdb perf.db "SELECT name, AVG(duration_ms) as avg_ms, COUNT(*) as count FROM logs GROUP BY name ORDER BY avg_ms DESC;"

Debug Tools

Enable debug logging:
```
docker run -e MXCP_LOG_LEVEL=DEBUG ...
```

Interactive shell:
```
docker exec -it mxcp /bin/bash
```

Test endpoints:

docker exec mxcp mxcp run tool my_tool --param value=test

Production Checklist

Pre-Deployment

Deployment

Post-Deployment

Operational

Additional Resources

Configuration Guide - Detailed configuration options
Authentication Guide - OAuth provider setup
Production Methodology - Development best practices
Drift Detection - Schema monitoring
Audit Logging - Compliance and monitoring

Support

For operational support:

Check the troubleshooting section above
Review logs with debug mode enabled
Check telemetry traces for performance issues
Consult the community forums
Open an issue on GitHub with:
- MXCP version
- Deployment method (Docker/K8s/bare metal)
- Error logs
- Configuration (without secrets)
