
Monitoring

Related Topics: Deployment (production setup) | Admin Socket (health checks) | Auditing (operation logs) | Drift Detection (schema changes)

This guide covers monitoring and observability for MXCP deployments, including tracing, metrics, and audit log analysis.

MXCP provides four observability signals:

| Signal | Purpose | Output |
| --- | --- | --- |
| App Logs | Server events, errors | stdout/stderr |
| Audit Logs | Operation history | JSONL files |
| OpenTelemetry | Tracing and metrics | OTLP exporters |
| Admin Socket | Real-time status | Unix socket API |

MXCP supports OpenTelemetry for distributed tracing and metrics.

Terminal window
# Standard OpenTelemetry variables
export OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
export OTEL_SERVICE_NAME=mxcp-production
export OTEL_RESOURCE_ATTRIBUTES="environment=production,team=platform"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer token"
# MXCP-specific controls
export MXCP_TELEMETRY_ENABLED=true
export MXCP_TELEMETRY_TRACING_CONSOLE=false # true for debugging
export MXCP_TELEMETRY_METRICS_INTERVAL=60 # seconds

Precedence: Environment variables override config file settings.

~/.mxcp/config.yml
projects:
  my-project:
    profiles:
      default:
        telemetry:
          enabled: true
          endpoint: http://otel-collector:4318
          service_name: mxcp-production
          environment: production
          headers:
            Authorization: "Bearer ${OTEL_TOKEN}"
          tracing:
            enabled: true
            console_export: false  # true for debugging
          metrics:
            enabled: true
            export_interval: 60  # seconds

| Variable | Description | Default |
| --- | --- | --- |
| MXCP_TELEMETRY_ENABLED | Enable/disable telemetry | false |
| MXCP_TELEMETRY_TRACING_CONSOLE | Export traces to console | false |
| MXCP_TELEMETRY_METRICS_INTERVAL | Metrics export interval (seconds) | 60 |

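The precedence rule can be illustrated with a short sketch. This is not MXCP's implementation: the function name `resolve_telemetry`, the flat settings dictionary, and the rule that only the string "true" enables a flag are assumptions made for illustration. Only the variable names and defaults come from the table above.

```python
import os

# Defaults as listed in the table above.
DEFAULTS = {"enabled": False, "console_export": False, "export_interval": 60}


def _parse_bool(value):
    # Assumption: only the string "true" (case-insensitive) enables a flag.
    return value.strip().lower() == "true"


# Environment variable -> (setting name, parser). The mapping is illustrative.
ENV_MAP = {
    "MXCP_TELEMETRY_ENABLED": ("enabled", _parse_bool),
    "MXCP_TELEMETRY_TRACING_CONSOLE": ("console_export", _parse_bool),
    "MXCP_TELEMETRY_METRICS_INTERVAL": ("export_interval", int),
}


def resolve_telemetry(file_settings, environ=None):
    """Apply defaults first, then config-file values, then environment variables."""
    environ = os.environ if environ is None else environ
    settings = {**DEFAULTS, **file_settings}
    for var, (key, parse) in ENV_MAP.items():
        if var in environ:
            settings[key] = parse(environ[var])  # environment wins
    return settings


if __name__ == "__main__":
    from_file = {"enabled": True, "export_interval": 60}
    env = {"MXCP_TELEMETRY_METRICS_INTERVAL": "15"}
    print(resolve_telemetry(from_file, env))
```

In this sketch the file enables telemetry with a 60-second interval, and the environment variable shortens the interval to 15 seconds without touching the other settings.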
Terminal window
# Start Jaeger
docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4318:4318 \
  -e COLLECTOR_OTLP_ENABLED=true \
  jaegertracing/all-in-one:latest
# Configure MXCP
export MXCP_TELEMETRY_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_SERVICE_NAME=mxcp
# Start MXCP
mxcp serve
# View traces at http://localhost:16686

docker-compose.yml
services:
  mxcp:
    build: .
    environment:
      - MXCP_TELEMETRY_ENABLED=true
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4318
      - OTEL_SERVICE_NAME=mxcp
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"  # UI
      - "4318:4318"    # OTLP HTTP
    environment:
      - COLLECTOR_OTLP_ENABLED=true

MXCP automatically instruments:

  1. Endpoint execution - Overall request handling
  2. Authentication - Token validation, user context
  3. Policy enforcement - Input/output policy evaluation
  4. Database operations - SQL queries (hashed for privacy)
  5. Python execution - Function calls and timing

Example span hierarchy:

mxcp.endpoint.execute (155ms)  # Root span with endpoint context
├── mxcp.policy.enforce_input (5ms)
├── mxcp.validation.input (2ms)
├── mxcp.execution_engine.execute (138ms)
│   ├── mxcp.duckdb.execute (120ms)
│   └── mxcp.validation.output (3ms)
└── mxcp.policy.enforce_output (20ms)

Endpoint attributes (on mxcp.endpoint.execute):

| Attribute | Description |
| --- | --- |
| mxcp.endpoint.name | Tool/resource/prompt name |
| mxcp.endpoint.type | "tool", "resource", or "prompt" |
| mxcp.auth.authenticated | Whether user is authenticated |
| mxcp.auth.provider | OAuth provider name |
| mxcp.session.id | MCP session ID |
| mxcp.policy.decision | "allow", "deny", "filter", "mask" |
| mxcp.policy.rules_evaluated | Number of policy rules checked |

Execution attributes (on mxcp.execution_engine.execute):

| Attribute | Description |
| --- | --- |
| mxcp.execution.language | "sql" or "python" |
| mxcp.params.count | Number of parameters passed |
| mxcp.has_input_schema | Whether input validation was performed |
| mxcp.has_output_schema | Whether output validation was performed |
| mxcp.result.count | Number of rows/items returned |

Database attributes (on mxcp.duckdb.execute):

| Attribute | Description |
| --- | --- |
| db.system | "duckdb" |
| db.statement.hash | SHA256 hash of query (for privacy) |
| db.operation | "SELECT", "INSERT", etc. |
| db.parameters.count | Number of parameters |
| db.readonly | Whether connection is readonly |
| db.rows_affected | Result row count |
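
Because spans carry only `db.statement.hash`, one way to relate a span to a query you already know is to hash that query yourself and compare. The sketch below is a rough illustration under stated assumptions: the helper name `statement_hash` is hypothetical, and the whitespace normalization and UTF-8 encoding are guesses. This page does not specify how MXCP prepares the query text before hashing, so the digests may not match MXCP's values without adjustment.

```python
import hashlib


def statement_hash(sql: str) -> str:
    """Return the SHA-256 hex digest of a SQL statement.

    Whitespace is collapsed first so that formatting differences do not
    change the digest. This normalization is an assumption, not documented
    MXCP behavior.
    """
    normalized = " ".join(sql.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = statement_hash("SELECT id, name FROM users WHERE id = $id")
    b = statement_hash("SELECT id, name\n  FROM users\n WHERE id = $id")
    print(a == b)  # same statement, different formatting
    print(len(a))  # hex digest length
```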

MXCP exports these metrics directly:

Counters:

| Metric | Description |
| --- | --- |
| mxcp.endpoint.requests_total | Total requests (labels: endpoint, status) |
| mxcp.endpoint.errors_total | Errors by type (labels: endpoint, error_type) |
| mxcp.policy.evaluations_total | Policy evaluations |
| mxcp.policy.denials_total | Policy denials |

Gauges:

| Metric | Description |
| --- | --- |
| mxcp.endpoint.concurrent_executions | Currently active requests |

Important: MXCP does not export performance metrics such as latency histograms or percentiles directly. They are derived from trace spans in the collector.

Configure your OpenTelemetry Collector with the spanmetrics processor:

otel-collector-config.yaml
processors:
  spanmetrics:
    metrics_exporter: prometheus
    latency_histogram_buckets: [5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2s, 5s]
    dimensions:
      - name: mxcp.endpoint.name
      - name: mxcp.endpoint.type
      - name: mxcp.execution.language
      - name: mxcp.auth.provider
      - name: mxcp.policy.decision
service:
  pipelines:
    traces:
      processors: [spanmetrics]
      exporters: [otlp/tempo]
    metrics/spanmetrics:
      receivers: [spanmetrics]
      exporters: [prometheus]

This generates:

  • latency_bucket - Latency histogram for P50/P95/P99 calculations
  • calls_total - Request rate by span
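  • Error rates, by filtering calls_total on status_code="ERROR"

Deriving metrics from spans has several benefits:

  • No manual timing code needed
  • Percentiles are calculated from the histogram automatically
  • Traces and metrics correlate directly, because both come from the same spans
  • Consistent across all operations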
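
To make the percentile step concrete, here is a minimal sketch of how a quantile can be estimated from cumulative histogram buckets such as `latency_bucket`. It uses linear interpolation within a bucket, in the style of Prometheus `histogram_quantile`. The bucket counts are invented for illustration, and the bounds are assumed to be in milliseconds.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-th quantile (0 < q <= 1) from cumulative buckets.

    buckets: list of (upper_bound_ms, cumulative_count) sorted by bound,
    ending with (math.inf, total_count).
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Quantile falls in the overflow bucket: report the
                # highest finite bound instead of infinity.
                return lower_bound
            # Interpolate linearly between the bucket's two bounds.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


if __name__ == "__main__":
    # Hypothetical cumulative counts for 100 requests.
    sample = [(5, 10), (10, 30), (25, 60), (50, 90), (100, 98), (math.inf, 100)]
    for q in (0.5, 0.95, 0.99):
        print(f"P{round(q * 100)}: {histogram_quantile(q, sample):.2f} ms")
```

The estimate is only as precise as the bucket boundaries, which is why the choice of `latency_histogram_buckets` in the collector configuration matters for alert thresholds.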

MXCP telemetry is privacy-first and NEVER includes:

  • Actual SQL queries (only hashed signatures)
  • Parameter values (only parameter names/types)
  • Result data (only counts and types)
  • User credentials or tokens
  • Python code content
  • Any PII or sensitive business data

When telemetry is enabled, audit logs include trace IDs for correlation:

{
  "timestamp": "2024-01-15T10:30:45Z",
  "session_id": "73cb4ef4-a359-484f-a040-c1eb163abb57",
  "trace_id": "a1b2c3d4e5f6g7h8",
  "operation_name": "query_users",
  "duration_ms": 125,
  "operation_status": "success"
}

Query by trace ID using grep on audit logs:

Terminal window
grep "a1b2c3d4e5f6g7h8" audit/logs.jsonl

Or export to DuckDB and query:

Terminal window
mxcp log --export-duckdb audit.db
duckdb audit.db "SELECT * FROM logs WHERE trace_id = 'a1b2c3d4e5f6g7h8'"
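
As an alternative to grep, the lookup can be done with a few lines of Python that match on the `trace_id` field exactly instead of on any substring of the line. This is a sketch, assuming one JSON object per line as in the example entry above. The function name `find_by_trace_id` is hypothetical.

```python
import json


def find_by_trace_id(lines, trace_id):
    """Yield audit entries whose trace_id equals trace_id.

    Blank lines, malformed JSON, and non-object values are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(entry, dict) and entry.get("trace_id") == trace_id:
            yield entry


if __name__ == "__main__":
    # Path taken from the grep example above.
    with open("audit/logs.jsonl") as f:
        for entry in find_by_trace_id(f, "a1b2c3d4e5f6g7h8"):
            print(entry.get("operation_name"), entry.get("duration_ms"))
```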

MXCP works with any OpenTelemetry-compatible backend.

Terminal window
# Grafana Tempo
export MXCP_TELEMETRY_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo:4317

Terminal window
# Grafana Cloud
export MXCP_TELEMETRY_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-gateway-prod-us-central-0.grafana.net/otlp
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic <base64-encoded-creds>"
export OTEL_SERVICE_NAME=mxcp-prod

Terminal window
# Datadog
export MXCP_TELEMETRY_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
# Datadog agent listens on 4317 for OTLP

Terminal window
# Honeycomb
export MXCP_TELEMETRY_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
export OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=YOUR_API_KEY"

Terminal window
# New Relic
export MXCP_TELEMETRY_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp.nr-data.net:4317
export OTEL_EXPORTER_OTLP_HEADERS="api-key=YOUR_LICENSE_KEY"

Health checks are available via the admin socket. See Admin Socket for details.

Terminal window
# Enable admin socket
export MXCP_ADMIN_ENABLED=true
export MXCP_ADMIN_SOCKET=/run/mxcp/mxcp.sock
# Health check
curl --unix-socket /run/mxcp/mxcp.sock http://localhost/health
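
The same check can be scripted without curl. The sketch below uses only the Python standard library to send `GET /health` over the Unix socket. It assumes that an HTTP 200 response means the server is healthy, since the response format is not described here. The class and function names are hypothetical.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=2.0):
        # The host name is only used for the Host header.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.settimeout(self.timeout)
            sock.connect(self.socket_path)
        except OSError:
            sock.close()
            raise
        self.sock = sock


def is_healthy(socket_path="/run/mxcp/mxcp.sock"):
    """Return True if GET /health answers with HTTP 200 (assumed to mean healthy)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/health")
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        # Missing socket, refused connection, timeout, or a malformed reply.
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    print("healthy" if is_healthy() else "unhealthy")
```

Returning False on any connection error keeps the function usable as a liveness probe, where a missing socket and an unhealthy server are treated the same way.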
Terminal window
# Watch for errors
# ' ?' tolerates an optional space after the colon
tail -f audit/logs.jsonl | grep -E '"operation_status": ?"error"'
# Watch for policy denials
tail -f audit/logs.jsonl | grep -E '"policy_decision": ?"deny"'
Terminal window
# Errors in last hour
mxcp log --status error --since 1h
# Export for analysis
mxcp log --export-duckdb audit.db

-- Error rate by hour
SELECT
  DATE_TRUNC('hour', timestamp) AS hour,
  COUNT(CASE WHEN operation_status = 'error' THEN 1 END) * 100.0 / COUNT(*) AS error_rate
FROM logs
GROUP BY hour
ORDER BY hour DESC;

-- Slowest endpoints
SELECT
  operation_name,
  AVG(duration_ms) AS avg_ms,
  MAX(duration_ms) AS max_ms,
  COUNT(*) AS calls
FROM logs
WHERE operation_status = 'success'
GROUP BY operation_name
ORDER BY avg_ms DESC
LIMIT 10;

-- Policy violations
SELECT
  operation_name,
  policy_reason,
  COUNT(*) AS denials
FROM logs
WHERE policy_decision = 'deny'
GROUP BY operation_name, policy_reason
ORDER BY denials DESC;

Configure alerts in your observability platform using spanmetrics-derived metrics.

| Metric | Description | Alert Threshold |
| --- | --- | --- |
| Error rate | % of failed requests | > 5% |
| Response time | P95 latency | > 1000ms |
| Policy denials | Unauthorized attempts | > 10/min |
| Active requests | Concurrent requests | > 100 |

# High error rate (using spanmetrics)
sum(rate(calls_total{status_code="ERROR", span_name="mxcp.endpoint.execute"}[5m]))
  / sum(rate(calls_total{span_name="mxcp.endpoint.execute"}[5m])) > 0.05
# Slow requests (P95 latency using spanmetrics; bucket bounds are in milliseconds)
histogram_quantile(0.95, sum by (le) (rate(latency_bucket{span_name="mxcp.endpoint.execute"}[5m]))) > 1000
# Policy denials (rate() is per second, so multiply by 60 for a per-minute threshold)
rate(mxcp_policy_denials_total[5m]) * 60 > 10
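
If you evaluate thresholds outside an alerting platform, for example in a cron job, the table above can be encoded as data. The following is a minimal sketch. The metric keys and the function name `breached` are hypothetical, and the snapshot values would have to come from your own metrics backend.

```python
# Thresholds from the table above; key names are illustrative.
THRESHOLDS = {
    "error_rate_pct": 5.0,
    "p95_latency_ms": 1000.0,
    "policy_denials_per_min": 10.0,
    "active_requests": 100.0,
}


def breached(snapshot, thresholds=THRESHOLDS):
    """Return the sorted names of metrics whose value exceeds its threshold.

    Metrics missing from the snapshot are treated as not breached.
    """
    return sorted(
        name
        for name, limit in thresholds.items()
        if name in snapshot and snapshot[name] > limit
    )


if __name__ == "__main__":
    current = {"error_rate_pct": 7.2, "p95_latency_ms": 640.0, "active_requests": 100.0}
    print(breached(current))
```

The comparison is strict, matching the ">" in the table, so a value exactly at the threshold does not trigger.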
check-mxcp.sh
#!/bin/bash
SOCKET="/run/mxcp/mxcp.sock"

# Check if socket exists
if [ ! -S "$SOCKET" ]; then
  echo "ERROR: Admin socket not found"
  exit 1
fi

# Get status; fail if the request itself fails
if ! STATUS=$(curl -sf --unix-socket "$SOCKET" http://localhost/status); then
  echo "ERROR: Status request failed"
  exit 1
fi

# Parse status
VERSION=$(echo "$STATUS" | jq -r '.version')
UPTIME=$(echo "$STATUS" | jq -r '.uptime')
TOOLS=$(echo "$STATUS" | jq -r '.endpoints.tools')

echo "MXCP Status"
echo "==========="
echo "Version: $VERSION"
echo "Uptime: $UPTIME"
echo "Tools: $TOOLS"

# Check for issues
RELOAD_STATUS=$(echo "$STATUS" | jq -r '.reload.last_reload_status')
if [ "$RELOAD_STATUS" = "error" ]; then
  echo "WARNING: Last reload failed"
  exit 1
fi

echo "Status: OK"
exit 0

#!/usr/bin/env python3
"""Analyze MXCP audit logs."""
import json
import sys
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def analyze_logs(log_file, hours=24):
    # Use timezone-aware UTC timestamps throughout
    cutoff = datetime.now(timezone.utc) - timedelta(hours=hours)
    stats = defaultdict(int)
    errors = []

    with open(log_file) as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            entry = json.loads(line)
            ts = datetime.fromisoformat(entry['timestamp'].replace('Z', '+00:00'))
            if ts.tzinfo is None:
                ts = ts.replace(tzinfo=timezone.utc)  # treat naive timestamps as UTC
            if ts < cutoff:
                continue
            stats['total'] += 1
            stats[entry['operation_status']] += 1
            if entry['operation_status'] == 'error':
                errors.append(entry)

    print(f"Stats (last {hours}h):")
    print(f"  Total requests: {stats['total']}")
    print(f"  Successful: {stats['success']}")
    print(f"  Errors: {stats['error']}")
    print(f"  Error rate: {stats['error'] / max(stats['total'], 1) * 100:.2f}%")

    if errors:
        print("\nRecent errors:")
        for e in errors[-5:]:
            print(f"  - {e['operation_name']}: {e.get('error', 'Unknown')}")


if __name__ == '__main__':
    log_file = sys.argv[1] if len(sys.argv) > 1 else 'audit/logs.jsonl'
    analyze_logs(log_file)

Traces not appearing:

  1. Verify MXCP_TELEMETRY_ENABLED=true
  2. Check collector endpoint: curl -X POST http://collector:4318/v1/traces
  3. Enable debug: mxcp serve --debug

Performance metrics missing:

  • Spanmetrics processor must be configured in your collector
  • See the spanmetrics configuration above

Audit logs empty:

  • Check audit.enabled: true in mxcp-site.yml
  • Verify audit.path directory is writable