SQL Reference
Related Topics: SQL Endpoints Tutorial (step-by-step guide) | DuckDB Integration (extensions, data sources) | Authentication (user functions)
MXCP uses DuckDB SQL syntax with additional built-in functions for authentication and access control.
DuckDB SQL
Section titled “DuckDB SQL”MXCP endpoints use DuckDB SQL, which extends PostgreSQL syntax with analytical features:
- PostgreSQL Compatible: Most PostgreSQL queries work unchanged
- Column-Store Engine: Optimized for analytical queries
- Rich Type System: Arrays, structs, maps, and more
- Window Functions: Full analytical window function support
- CTEs: Common Table Expressions with recursive support
Built-in Functions
Section titled “Built-in Functions”Authentication Functions
Section titled “Authentication Functions”When authentication is enabled, these functions provide user information:
| Function | Returns | Description |
|---|---|---|
get_username() | VARCHAR | Authenticated user’s username |
get_user_email() | VARCHAR | User’s email address |
get_user_provider() | VARCHAR | OAuth provider (github, atlassian, etc.) |
get_user_external_token() | VARCHAR | User’s OAuth token |
All functions return NULL when authentication is disabled or user is not authenticated.
Examples
Section titled “Examples”-- Filter data by authenticated userSELECT * FROM projectsWHERE owner = get_username();
-- Access external API with user's tokenSELECT *FROM read_json_auto( 'https://api.github.com/user/repos', headers = MAP { 'Authorization': 'Bearer ' || get_user_external_token(), 'User-Agent': 'MXCP-' || get_username() });
-- Audit loggingINSERT INTO audit_log (user, action, timestamp)VALUES (get_username(), 'query_executed', NOW());
-- User-specific dataSELECT id, title, CASE WHEN owner = get_username() THEN content ELSE '[Restricted]' END as contentFROM documents;Request Header Functions
Section titled “Request Header Functions”HTTP transport exposes request headers to DuckDB, allowing you to audit or forward them without extra plumbing:
| Function | Returns | Description |
|---|---|---|
get_request_header(name) | VARCHAR | Specific header value (NULL if missing) |
get_request_headers_json() | VARCHAR | All headers as JSON (NULL if no headers) |
Only available when using HTTP transport (streamable-http, sse). Returns NULL for stdio transport.
-- Forward authorization headerSELECT http_post( 'https://example.internal/audit', headers := map { 'Authorization': get_request_header('Authorization') }, body := json('{"status": "ok"}'));
-- Access custom headersSELECT get_request_header('X-Request-ID') as request_id;
-- Get all headersSELECT get_request_headers_json() as all_headers;Parameter Binding
Section titled “Parameter Binding”Use named parameters with $ prefix:
-- Simple parametersSELECT * FROM users WHERE id = $user_id;
-- Multiple parametersSELECT * FROM ordersWHERE customer_id = $customer_id AND status = $status AND created_at > $since;
-- String parametersSELECT * FROM productsWHERE name LIKE '%' || $search || '%';
-- Array parametersSELECT * FROM usersWHERE id IN (SELECT unnest($user_ids));In YAML Endpoints
Section titled “In YAML Endpoints”tool: name: search_users parameters: - name: department type: string - name: min_age type: integer source: code: | SELECT * FROM users WHERE department = $department AND age >= $min_ageFrom CLI
Section titled “From CLI”mxcp query "SELECT * FROM users WHERE age > $age" --param age=18mxcp run tool search_users --param department=Engineering --param min_age=25Common Extensions
Section titled “Common Extensions”MXCP typically loads these DuckDB extensions:
httpfs
Section titled “httpfs”Read from HTTP/S3 endpoints:
-- HTTP filesSELECT * FROM read_csv('https://example.com/data.csv');
-- S3 filesSELECT * FROM read_parquet('s3://bucket/data.parquet');JSON parsing and manipulation:
-- Extract JSON field (returns JSON type)SELECT json_extract(data, '$.name') as name FROM events;
-- Extract as stringSELECT data->>'$.name' as name FROM events;
-- Unnest JSON arrays (convert to list first)SELECT unnest(from_json(data->'$.items', '["JSON"]')) as item FROM orders;parquet
Section titled “parquet”Parquet file support:
-- Read ParquetSELECT * FROM read_parquet('data/*.parquet');
-- Write ParquetCOPY (SELECT * FROM users) TO 'output.parquet';postgres
Section titled “postgres”PostgreSQL connectivity:
-- Attach databaseATTACH 'postgresql://user:pass@host:5432/db' AS pg;
-- Query tablesSELECT * FROM pg.public.users;Excel file support:
-- Read ExcelSELECT * FROM read_xlsx('report.xlsx', sheet='Data');Analytical Features
Section titled “Analytical Features”Window Functions
Section titled “Window Functions”-- Running totalsSELECT date, amount, SUM(amount) OVER (ORDER BY date) as running_totalFROM sales;
-- RankingSELECT name, score, RANK() OVER (ORDER BY score DESC) as rankFROM players;
-- Partitioned calculationsSELECT department, employee, salary, AVG(salary) OVER (PARTITION BY department) as dept_avgFROM employees;Common Table Expressions
Section titled “Common Table Expressions”-- Simple CTEWITH active_users AS ( SELECT * FROM users WHERE status = 'active')SELECT * FROM active_users WHERE age > 25;
-- Multiple CTEsWITH orders_2024 AS ( SELECT * FROM orders WHERE year = 2024 ), high_value AS ( SELECT * FROM orders_2024 WHERE total > 1000 )SELECT customer_id, COUNT(*) as high_value_ordersFROM high_valueGROUP BY customer_id;
-- Recursive CTEWITH RECURSIVE subordinates AS ( SELECT id, name, manager_id, 1 as level FROM employees WHERE id = $root_id
UNION ALL
SELECT e.id, e.name, e.manager_id, s.level + 1 FROM employees e JOIN subordinates s ON e.manager_id = s.id)SELECT * FROM subordinates;PIVOT and UNPIVOT
Section titled “PIVOT and UNPIVOT”Transform data between wide and long formats:
- PIVOT: Converts rows to columns (long → wide). Takes distinct values from one column and creates new columns for each value.
- UNPIVOT: Converts columns to rows (wide → long). Takes multiple columns and stacks them into key-value pairs.
-- Pivot: Convert rows to columns-- Input: sales(product, month, amount) with months as rows-- Output: One row per product with jan, feb, mar as columnsPIVOT sales ON month USING SUM(amount);
-- Unpivot: Convert columns to rows-- Input: monthly_data with jan, feb, mar columns-- Output: Rows with (month, amount) pairsUNPIVOT monthly_data ON jan, feb, mar INTO NAME month VALUE amount;List Aggregations
Section titled “List Aggregations”Aggregate values into arrays instead of scalar results. Useful for collecting related items or preserving ordering within groups.
-- Collect names into arrays per departmentSELECT department, LIST(name) as employeesFROM usersGROUP BY department;
-- Collect with ordering (highest salaries first)SELECT department, ARRAY_AGG(salary ORDER BY salary DESC) as salariesFROM employeesGROUP BY department;Type System
Section titled “Type System”Basic Types
Section titled “Basic Types”| Type | Example |
|---|---|
VARCHAR | 'hello' |
INTEGER | 42 |
DOUBLE | 3.14 |
BOOLEAN | true, false |
DATE | DATE '2024-01-01' |
TIMESTAMP | TIMESTAMP '2024-01-01 12:00:00' |
Complex Types
Section titled “Complex Types”DuckDB supports nested data structures for representing complex data:
-- Arrays: ordered collections of same-type valuesSELECT [1, 2, 3] as numbers;SELECT array_agg(name) as names FROM users;
-- Structs: named fields (like objects/records)SELECT {'name': 'Alice', 'age': 30} as person;SELECT struct_pack(name := 'Alice', age := 30);
-- Maps: key-value pairs with dynamic keysSELECT MAP {'key1': 'value1', 'key2': 'value2'};Type Casting
Section titled “Type Casting”Convert between types explicitly. Use TRY_CAST when the conversion might fail to avoid errors.
-- Explicit castSELECT CAST(age AS VARCHAR) as age_str FROM users;
-- Cast shorthand (PostgreSQL-style)SELECT age::VARCHAR as age_str FROM users;
-- Try cast (returns NULL on failure instead of error)SELECT TRY_CAST(value AS INTEGER) FROM data;Best Practices
Section titled “Best Practices”1. Use Parameters
Section titled “1. Use Parameters”Always use parameters for user input:
-- Good: ParameterizedSELECT * FROM users WHERE id = $user_id;
-- Bad: String concatenation (SQL injection risk)SELECT * FROM users WHERE id = ''' + user_id + ''';2. Limit Results
Section titled “2. Limit Results”Always limit large result sets:
SELECT * FROM logsORDER BY timestamp DESCLIMIT 100;3. Use CTEs for Clarity
Section titled “3. Use CTEs for Clarity”-- Clear and readableWITH recent_orders AS ( SELECT * FROM orders WHERE date > $since),order_totals AS ( SELECT customer_id, SUM(amount) as total FROM recent_orders GROUP BY customer_id)SELECT * FROM order_totals WHERE total > 1000;4. Index-Friendly Queries
Section titled “4. Index-Friendly Queries”-- Use equality for indexed columnsSELECT * FROM users WHERE id = $user_id;
-- Avoid functions on indexed columns-- Bad: WHERE LOWER(email) = 'test@example.com'-- Good: WHERE email = 'test@example.com'Next Steps
Section titled “Next Steps”- Python Reference - Runtime API
- DuckDB Documentation - Full SQL reference
- Tutorials - SQL endpoint examples