MXCP Type System
MXCP's type system provides a robust foundation for defining and validating data structures in your endpoints. It combines the best aspects of JSON Schema, OpenAPI, and AI function calling conventions while maintaining compatibility with SQL/DuckDB types.
Core Concepts
Type Definitions
Every parameter and return value in MXCP endpoints is defined using a type definition. These definitions support:
- Base types (string, number, integer, boolean, array, object)
- Format annotations for specialized string types
- Validation constraints (min/max values, lengths, etc.)
- Nested structures (arrays of objects, etc.)
Type Safety
MXCP enforces strict type checking to ensure:
- Input validation before execution
- Output validation after execution
- Compatibility with DuckDB types
- Safe serialization/deserialization
Supported Types
Base Types
Type | Description | Example | DuckDB Type |
---|---|---|---|
string | Text values | "hello" | VARCHAR |
number | Floating-point number | 3.14 | DOUBLE |
integer | Whole number | 42 | INTEGER |
boolean | true or false | true | BOOLEAN |
array | Ordered list of elements | ["a", "b", "c"] | ARRAY |
object | Key-value structure with schema | { "foo": 1 } | STRUCT |
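The table above can be read as a simple lookup. The sketch below restates it in Python; the names `BASE_TYPE_TO_DUCKDB` and `duckdb_type_for` are hypothetical and are not part of MXCP's API.

```python
# Mapping copied from the base types table; names here are illustrative only.
BASE_TYPE_TO_DUCKDB = {
    "string": "VARCHAR",
    "number": "DOUBLE",
    "integer": "INTEGER",
    "boolean": "BOOLEAN",
    "array": "ARRAY",
    "object": "STRUCT",
}


def duckdb_type_for(base_type: str) -> str:
    """Return the DuckDB type name listed for a base type."""
    try:
        return BASE_TYPE_TO_DUCKDB[base_type]
    except KeyError:
        raise ValueError(f"unsupported base type: {base_type!r}") from None
```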
String Format Annotations
MXCP uses format annotations to specialize string types into well-defined subtypes. These formats are mandatory in certain contexts and control serialization, validation, and SQL/DuckDB type mapping.
Format | Description | Example | DuckDB Type |
---|---|---|---|
email | RFC 5322 email address | "alice@example.com" | VARCHAR |
uri | URI/URL string | "https://raw-labs.com" | VARCHAR |
date | ISO 8601 date | "2023-01-01" | DATE |
time | ISO 8601 time | "14:30:00" | TIME |
date-time | ISO 8601 timestamp (Z or offset) | "2023-01-01T14:30:00Z" | TIMESTAMP WITH TIME ZONE |
duration | ISO 8601 duration | "P1DT2H" | INTERVAL |
timestamp | Unix timestamp (seconds since epoch) | 1672531199 | TIMESTAMP (converted) |
Note: Format annotations are validated and converted automatically when passed to SQL endpoints. For example, timestamp values are transformed into proper DuckDB TIMESTAMP types during execution.
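To make the conversion concrete, here is a minimal sketch of how a few of these formats could be turned into Python values using only the standard library. The function `parse_format` is hypothetical; how MXCP performs the conversion internally is not described here.

```python
from datetime import date, datetime, time, timezone


def parse_format(value, fmt):
    """Convert a raw value to a Python object according to a format name."""
    if fmt == "date":
        return date.fromisoformat(value)
    if fmt == "time":
        return time.fromisoformat(value)
    if fmt == "date-time":
        # Older Python versions reject a trailing "Z", so normalise it first.
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    if fmt == "timestamp":
        # Unix timestamp: seconds since the epoch, interpreted as UTC.
        return datetime.fromtimestamp(int(value), tz=timezone.utc)
    # Other formats (email, uri, ...) stay as plain strings in this sketch.
    return value
```

For instance, `parse_format("2023-01-01", "date")` yields a `datetime.date`, while an unknown format leaves the value unchanged.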
Type Annotations
Each type supports standard JSON Schema annotations:
Common Annotations
- description: Human-readable description of the type
- default: Default value if none is provided
- examples: Example values for documentation
- enum: List of allowed values
- required: Whether the field is required (for objects)
String Annotations
- minLength: Minimum string length
- maxLength: Maximum string length
- format: Specialized string format (see above)
Numeric Annotations
- minimum: Minimum value (inclusive)
- maximum: Maximum value (inclusive)
- exclusiveMinimum: Minimum value (exclusive)
- exclusiveMaximum: Maximum value (exclusive)
- multipleOf: Value must be a multiple of this number
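The difference between inclusive and exclusive bounds is easiest to see in code. The following is an illustrative sketch, not MXCP's validator; `check_number` and its message wording are assumptions.

```python
def check_number(value, schema):
    """Return a list of constraint violations for a numeric value."""
    errors = []
    if "minimum" in schema and value < schema["minimum"]:
        errors.append(f"Value must be >= {schema['minimum']}")
    if "maximum" in schema and value > schema["maximum"]:
        errors.append(f"Value must be <= {schema['maximum']}")
    if "exclusiveMinimum" in schema and value <= schema["exclusiveMinimum"]:
        errors.append(f"Value must be > {schema['exclusiveMinimum']}")
    if "exclusiveMaximum" in schema and value >= schema["exclusiveMaximum"]:
        errors.append(f"Value must be < {schema['exclusiveMaximum']}")
    # The modulo test is exact for integers; floats may need a tolerance.
    if "multipleOf" in schema and value % schema["multipleOf"] != 0:
        errors.append(f"Value must be a multiple of {schema['multipleOf']}")
    return errors
```

A value equal to the bound passes `minimum` but fails `exclusiveMinimum`.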
Array Annotations
- minItems: Minimum number of items
- maxItems: Maximum number of items
- uniqueItems: Whether items must be unique
- items: Schema for array items
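As a rough illustration of the array constraints, the sketch below checks length and uniqueness. `check_array` is a hypothetical helper with made-up messages; validating each element against the items schema is left out for brevity.

```python
def check_array(items, schema):
    """Return a list of constraint violations for a list value."""
    errors = []
    if "minItems" in schema and len(items) < schema["minItems"]:
        errors.append(f"Array must have at least {schema['minItems']} item(s)")
    if "maxItems" in schema and len(items) > schema["maxItems"]:
        errors.append(f"Array must have at most {schema['maxItems']} item(s)")
    if schema.get("uniqueItems"):
        # A list is used instead of a set so unhashable items (dicts) work.
        # Note that Python treats 1 and True as equal, unlike JSON.
        seen = []
        for item in items:
            if item in seen:
                errors.append(f"Array items must be unique, found duplicate: {item!r}")
                break
            seen.append(item)
    return errors
```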
Object Annotations
- properties: Schema for object properties
- required: List of required properties
- additionalProperties: Whether to allow undefined properties
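The interaction between required and additionalProperties can be sketched as follows. This is an assumption-laden illustration: `check_object` is hypothetical, and the "Unexpected properties" wording is invented for the example.

```python
def check_object(value, schema):
    """Return violations of required / additionalProperties for a dict."""
    errors = []
    properties = schema.get("properties", {})
    missing = [name for name in schema.get("required", []) if name not in value]
    if missing:
        errors.append("Missing required properties: " + ", ".join(missing))
    # Only an explicit false forbids properties not declared in the schema.
    if schema.get("additionalProperties") is False:
        extra = sorted(set(value) - set(properties))
        if extra:
            errors.append("Unexpected properties: " + ", ".join(extra))
    return errors
```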
Type Conversion
MXCP automatically handles type conversion between:
- JSON/YAML input → Python types
- Python types → DuckDB types
- DuckDB results → Python types
- Python types → JSON/YAML output
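The last step of that chain (Python types to JSON output) could look roughly like the sketch below, which uses the standard `json` module with a fallback for values JSON cannot represent natively. The helper names are hypothetical and the choice to emit ISO 8601 strings is an assumption based on the format table above.

```python
import json
from datetime import date, datetime, time
from decimal import Decimal


def to_jsonable(value):
    """Fallback for json.dumps: handle types JSON has no literal for."""
    if isinstance(value, (datetime, date, time)):
        return value.isoformat()
    if isinstance(value, Decimal):
        return float(value)
    raise TypeError(f"cannot serialize {type(value).__name__}")


def serialize_row(row):
    """Serialize one result row (a dict of Python values) to JSON text."""
    return json.dumps(row, default=to_jsonable)
```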
Example Type Definition
parameters:
  - name: user_id
    type: string
    format: email
    description: "User's email address"
    examples: ["user@example.com"]
  - name: age
    type: integer
    minimum: 0
    maximum: 120
    description: "User's age in years"
  - name: preferences
    type: object
    properties:
      theme:
        type: string
        enum: ["light", "dark"]
      notifications:
        type: boolean
        default: true
    required: ["theme"]
Limitations
MXCP intentionally restricts schema complexity to promote clarity and compatibility. The following features are not supported:
- $ref (no schema reuse or references)
- allOf, oneOf, anyOf (no union or intersection types)
- patternProperties, pattern (no regex-based constraints)
- Conditional schemas (if/then)
This allows MXCP endpoints to remain:
- Static and serializable
- Directly usable in SQL-based execution
- Compatible with AI tooling
- Easy to validate and test
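When porting an existing JSON Schema, it can help to scan it for the unsupported keywords listed above. The following is a hypothetical pre-check, not an MXCP feature; the keyword set is taken directly from the list in this section.

```python
UNSUPPORTED_KEYWORDS = {
    "$ref", "allOf", "oneOf", "anyOf",
    "patternProperties", "pattern", "if", "then",
}


def find_unsupported(schema, path="$"):
    """Return the paths of unsupported keywords found in a schema dict."""
    found = []
    if not isinstance(schema, dict):
        return found
    for key, value in schema.items():
        if key in UNSUPPORTED_KEYWORDS:
            found.append(f"{path}.{key}")
        elif key == "properties" and isinstance(value, dict):
            # Property names are user-defined, so only their schemas are scanned.
            for name, sub in value.items():
                found.extend(find_unsupported(sub, f"{path}.properties.{name}"))
        elif key == "items":
            found.extend(find_unsupported(value, f"{path}.items"))
    return found
```

A property that merely happens to be named pattern is not reported; only the keyword inside its schema is.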
Best Practices
- Use Format Annotations
  - Always specify format for specialized string types
  - This ensures proper validation and DuckDB type mapping
- Provide Examples
  - Include examples for better documentation
  - Helps with testing and validation
- Be Explicit
  - Define all required fields
  - Set additionalProperties: false when appropriate
  - Use enum for constrained choices
- Validate Early
  - Use mxcp validate to check type definitions
  - Test with example values before deployment
Error Handling
MXCP provides clear error messages for type validation failures:
- Type mismatches
- Format validation errors
- Constraint violations
- Missing required fields
Example error messages:
Error: Invalid email format: not-an-email
Error: Value must be >= 0
Error: String must be at least 3 characters long
Error: Missing required properties: name, email
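To show where messages like these might come from, here is a small sketch of a string check that reproduces two of them. `check_string` is hypothetical, and the email test is deliberately simplistic rather than a full RFC 5322 check.

```python
def check_string(value, schema):
    """Return validation errors for a string value."""
    if not isinstance(value, str):
        return [f"Expected string, got {type(value).__name__}"]
    errors = []
    if "minLength" in schema and len(value) < schema["minLength"]:
        errors.append(f"String must be at least {schema['minLength']} characters long")
    if "maxLength" in schema and len(value) > schema["maxLength"]:
        errors.append(f"String must be at most {schema['maxLength']} characters long")
    if schema.get("format") == "email":
        # Rough shape check only: something@domain.tld
        local, sep, domain = value.partition("@")
        if not (sep and local and "." in domain):
            errors.append(f"Invalid email format: {value}")
    return errors
```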
Sensitive Data Marking
Fields containing sensitive data can be marked with the sensitive flag. This provides:
- Automatic redaction in audit logs
- Policy-based filtering for access control
- Clear documentation of sensitive data
The sensitive flag can be applied to any type: strings, numbers, integers, booleans, arrays, or objects. When a type is marked as sensitive, it is completely redacted in logs and can be filtered out by policies.
Example: Marking Sensitive Fields
parameters:
  - name: username
    type: string
    description: User's username
  - name: password
    type: string
    sensitive: true  # This field will be redacted in logs
    description: User's password
  - name: balance
    type: number
    sensitive: true  # Numbers can also be sensitive
    description: Account balance
  - name: config
    type: object
    properties:
      host:
        type: string
      api_key:
        type: string
        sensitive: true  # Nested sensitive field
Marking Entire Objects as Sensitive
You can mark an entire object or array as sensitive:
return:
  type: object
  properties:
    user_info:
      type: object
      properties:
        name:
          type: string
        email:
          type: string
    credentials:
      type: object
      sensitive: true  # Entire object is sensitive
      properties:
        token:
          type: string
        refresh_token:
          type: string
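Conceptually, log redaction walks the value alongside its schema and replaces anything flagged as sensitive, whether it is a scalar or a whole object. The sketch below illustrates that idea; the `redact` function and the "[REDACTED]" placeholder are assumptions, not MXCP's actual log format.

```python
REDACTED = "[REDACTED]"  # placeholder chosen for this sketch


def redact(value, schema):
    """Return a copy of value with sensitive parts replaced by a marker."""
    if schema.get("sensitive"):
        # Applies to any type: the whole value is replaced, not inspected.
        return REDACTED
    if schema.get("type") == "object" and isinstance(value, dict):
        props = schema.get("properties", {})
        return {k: redact(v, props.get(k, {})) for k, v in value.items()}
    if schema.get("type") == "array" and isinstance(value, list):
        item_schema = schema.get("items", {})
        return [redact(v, item_schema) for v in value]
    return value
```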
Using with Policies
The filter_sensitive_fields policy action automatically removes all fields marked as sensitive:
policies:
  output:
    - condition: "user.role != 'admin'"
      action: filter_sensitive_fields
      reason: "Non-admin users cannot see sensitive data"
This is more maintainable than filter_fields, because sensitive fields are defined once in the schema rather than repeated in policies.
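The effect of removing sensitive fields, as opposed to masking them, can be sketched as a schema-driven filter. `filter_sensitive` is a hypothetical helper written for illustration; it does not evaluate policy conditions and is not MXCP's implementation.

```python
def filter_sensitive(value, schema):
    """Return a copy of value with fields marked sensitive removed."""
    if schema.get("type") == "object" and isinstance(value, dict):
        props = schema.get("properties", {})
        return {
            key: filter_sensitive(val, props.get(key, {}))
            for key, val in value.items()
            if not props.get(key, {}).get("sensitive")
        }
    if schema.get("type") == "array" and isinstance(value, list):
        item_schema = schema.get("items", {})
        return [filter_sensitive(item, item_schema) for item in value]
    return value
```

Because the filter reads the flag from the schema, adding a new sensitive field requires no change to the policy itself.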
Examples
Simple Parameter Types
parameters:
  - name: user_id
    type: integer
    description: Unique user identifier
    minimum: 1
  - name: email
    type: string
    format: email
    description: User's email address
  - name: is_active
    type: boolean
    description: Whether the user is active
    default: true
Complex Object Types
parameters:
  - name: filter
    type: object
    description: Filter criteria
    properties:
      status:
        type: string
        enum: ["active", "inactive", "pending"]
      created_after:
        type: string
        format: date-time
      tags:
        type: array
        items:
          type: string
        minItems: 1
    required: ["status"]
Return Type Definition
return:
  type: array
  description: List of matching users
  items:
    type: object
    properties:
      id:
        type: integer
      name:
        type: string
      email:
        type: string
        format: email
      api_token:
        type: string
        sensitive: true  # Removed for non-admin users when a filter_sensitive_fields policy applies
      created_at:
        type: string
        format: date-time
    required: ["id", "name", "email"]
Best Practices
- Always define types - Even for simple parameters
- Use constraints - They provide validation and documentation
- Mark sensitive fields - Use the sensitive flag for any data that should be protected
- Provide descriptions - Help users understand what each field is for
- Use enums - When there's a fixed set of valid values
- Define return types - Helps with validation and client code generation
- Group sensitive data - Consider putting all sensitive fields in a dedicated object that can be marked sensitive as a whole