Overview
The `/v1/messages` endpoint provides full Anthropic API compatibility. Clients using the Anthropic SDK can use NanoGPT by simply changing the base URL; no code changes are required.
NanoGPT accepts requests in the Anthropic Messages format, routes them to the requested NanoGPT model, and returns responses back in the Anthropic Messages shape.
For non‑Anthropic models, NanoGPT transparently translates the request to an OpenAI-style chat format internally and then converts the response back to Anthropic Messages format.
This endpoint supports:
- Text generation (streaming and non-streaming)
- Multi-turn conversations
- Tool use (function calling)
- Vision (images) and document/PDF processing
- Extended thinking (reasoning models)
- Prompt caching
Endpoint
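All requests go to a single endpoint:

```
POST https://nano-gpt.com/api/v1/messages
```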
Authentication
Use either header:

- `Authorization: Bearer YOUR_API_KEY`
- `x-api-key: YOUR_API_KEY`
Request Format
Required Fields
| Field | Type | Description |
|---|---|---|
| `model` | string | Model identifier (any NanoGPT model, including non-Anthropic models) |
| `max_tokens` | number | Maximum tokens to generate (must be a finite number) |
| `messages` | array | Array of conversation messages |
Optional Fields
| Field | Type | Default | Description |
|---|---|---|---|
| `system` | string or array | — | System prompt (string or array of text blocks) |
| `stream` | boolean | false | Enable streaming responses |
| `temperature` | number | — | Sampling temperature |
| `top_p` | number | — | Nucleus sampling parameter |
| `top_k` | number | — | Top-k sampling parameter |
| `stop_sequences` | string[] | — | Custom stop sequences |
| `tools` | array | — | Tool definitions for function calling |
| `tool_choice` | string or object | — | Control tool selection behavior |
| `disable_parallel_tool_use` | boolean | — | Disable parallel tool calls |
| `thinking` | object | — | Enable extended thinking for supported models |
| `metadata` | object | — | Request metadata (`user` or `user_id`) |
Message Format
Messages must have a `role` (`user` or `assistant`) and `content`. The `content` may be a plain string or an array of content blocks:
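A minimal request illustrating the shape (the model name is one example from the supported-models list; any NanoGPT model works):

```json
{
  "model": "claude-sonnet-4-5-20250929",
  "max_tokens": 1024,
  "messages": [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": [{"type": "text", "text": "Hi! How can I help?"}]},
    {"role": "user", "content": [{"type": "text", "text": "Tell me a joke."}]}
  ]
}
```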
Content Block Types
Text Block
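The simplest content block, following the Anthropic Messages shape:

```json
{"type": "text", "text": "Hello, world"}
```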
Image Block (for vision-capable models)
Supported media types: `image/jpeg`, `image/png`, `image/gif`, `image/webp`.
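Images are sent base64-encoded in the Anthropic image block shape (the data string below is truncated for illustration):

```json
{
  "type": "image",
  "source": {
    "type": "base64",
    "media_type": "image/png",
    "data": "iVBORw0KGgoAAAANSUhEUg..."
  }
}
```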
Document Block (for PDF-capable models)
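PDFs use the same base64 source shape with a `document` block type (data truncated for illustration):

```json
{
  "type": "document",
  "source": {
    "type": "base64",
    "media_type": "application/pdf",
    "data": "JVBERi0xLjQK..."
  }
}
```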
Tool Use Block (in assistant messages)
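Emitted by the model when it decides to call a tool. The tool name and input here are hypothetical examples:

```json
{
  "type": "tool_use",
  "id": "toolu_01A09q90qw90lq917835lq9",
  "name": "get_weather",
  "input": {"location": "San Francisco, CA"}
}
```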
Tool Result Block (in user messages)
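Sent back by the client in the next user message; `tool_use_id` must match the `id` of the corresponding tool use block:

```json
{
  "type": "tool_result",
  "tool_use_id": "toolu_01A09q90qw90lq917835lq9",
  "content": "15 degrees C, partly cloudy"
}
```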
Tool Definitions
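Tools are defined with a name, description, and a JSON Schema `input_schema`, following the Anthropic format (the weather tool is a hypothetical example):

```json
{
  "name": "get_weather",
  "description": "Get the current weather for a location",
  "input_schema": {
    "type": "object",
    "properties": {
      "location": {"type": "string", "description": "City and state, e.g. San Francisco, CA"}
    },
    "required": ["location"]
  }
}
```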
Tool Choice Options
| Value | Description |
|---|---|
"auto" | Model may use tools if appropriate |
"none" | Disable tool use for this request |
"any" | Model must use at least one tool |
"required" | Model must use at least one tool |
{"type": "tool", "name": "tool_name"} | Force use of a specific tool |
Extended Thinking
For models that support extended thinking (reasoning):budget_tokensmust be >= 1024budget_tokensmust be <max_tokens- Model must support thinking (e.g., models with
-thinkingsuffix)
thinking parameter and routes the request to the base model.
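A request sketch with thinking enabled, using the Anthropic `thinking` parameter shape (note the budget satisfies both constraints above):

```json
{
  "model": "claude-sonnet-4-5-20250929-thinking",
  "max_tokens": 8192,
  "thinking": {"type": "enabled", "budget_tokens": 4096},
  "messages": [{"role": "user", "content": "What is 27 * 43? Think step by step."}]
}
```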
Response Format
Non-Streaming Response
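A representative response body in the Anthropic Messages shape (ID and token counts are illustrative):

```json
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "model": "claude-sonnet-4-5-20250929",
  "content": [{"type": "text", "text": "Hello! How can I help you today?"}],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 10, "output_tokens": 12}
}
```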
Stop Reasons
| Stop Reason | Description |
|---|---|
| `end_turn` | Natural end of response |
| `max_tokens` | Hit token limit |
| `stop_sequence` | Hit a custom stop sequence |
| `tool_use` | Model wants to use a tool |
| `content_filter` | Content was filtered |
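As an illustration of handling stop reasons client-side, here is a minimal sketch. The response dict mirrors the non-streaming shape; the helper name is our own, not part of any SDK:

```python
def extract_tool_calls(response: dict) -> list[dict]:
    """Return the tool_use blocks from an Anthropic-style response,
    but only when the model actually stopped to call a tool."""
    if response.get("stop_reason") != "tool_use":
        return []
    return [b for b in response.get("content", []) if b.get("type") == "tool_use"]

# Example response in the Anthropic Messages shape.
response = {
    "stop_reason": "tool_use",
    "content": [
        {"type": "text", "text": "Let me check the weather."},
        {"type": "tool_use", "id": "toolu_123", "name": "get_weather",
         "input": {"location": "Paris"}},
    ],
}

calls = extract_tool_calls(response)
```

After executing the tool, the client appends a `tool_result` block in a new user message and sends the conversation again.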
Streaming Response (SSE)
See also: Streaming Protocol (SSE). When `stream: true`, the response is Server-Sent Events with named event types:
Event: message_start
Event: content_block_start
Event: content_block_delta
Event: content_block_stop
Event: message_delta
Event: message_stop
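A representative stream for a short text response, abbreviated; field values are illustrative:

```
event: message_start
data: {"type":"message_start","message":{"id":"msg_...","role":"assistant","content":[]}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":5}}

event: message_stop
data: {"type":"message_stop"}
```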
Streaming Tool Use
When the model uses tools during streaming, the tool call arrives as a `tool_use` content block whose arguments stream incrementally via `input_json_delta` events.

Supported Models
Claude Models (Full Support)
| Model | Streaming | Tools | Vision | Thinking |
|---|---|---|---|---|
| claude-opus-4-6 | Yes | Yes | Yes | Yes* |
| claude-opus-4-5-20251101 | Yes | Yes | Yes | Yes* |
| claude-opus-4-1-20250805 | Yes | Yes | Yes | Yes* |
| claude-sonnet-4-5-20250929 | Yes | Yes | Yes | Yes* |
* Add the `-thinking` suffix for extended reasoning.
Other Models (Via Compatibility Layer)
The `v1/messages` endpoint also works with non-Anthropic models:

| Model | Streaming | Tools | Vision |
|---|---|---|---|
| openai/gpt-5.2 | Yes | Yes | Yes |
| google/gemini-3-flash-preview | Yes | Yes | Yes |
| google/gemini-3-pro-preview | Yes | Yes | Yes |
| zai-org/glm-4.7 | Yes | Yes | — |
Prompt Caching
For the full guide (supported models, thresholds, pricing, and usage fields), see Prompt Caching. Enable prompt caching to reduce costs on repeated prompts.

Enable via Header
TTL Options
- Default: 5-minute cache TTL
- Extended: add `extended-cache-ttl-2025-04-11` to request a 1-hour TTL
Cache Control in Content
Add `cache_control` to content blocks:
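For example, marking a large system prompt as cacheable, following the Anthropic `cache_control` shape:

```json
{
  "system": [
    {
      "type": "text",
      "text": "You are a helpful assistant. <large reference document here>",
      "cache_control": {"type": "ephemeral"}
    }
  ]
}
```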
Cache Usage in Response
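Cache activity is reported in the response `usage` object. The field names below mirror Anthropic's; values are illustrative:

```json
{
  "usage": {
    "input_tokens": 10,
    "output_tokens": 120,
    "cache_creation_input_tokens": 2048,
    "cache_read_input_tokens": 0
  }
}
```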
Error Handling
For a general guide across NanoGPT APIs, see Error Handling.

Error Response Format
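Errors follow the Anthropic error envelope (the message text is illustrative):

```json
{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "max_tokens: must be a finite number"
  }
}
```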
Error Types
| HTTP Status | Error Type | Description |
|---|---|---|
| 400 | invalid_request_error | Invalid request (missing fields, bad format) |
| 401 | authentication_error | Invalid or missing API key |
| 403 | permission_error | Insufficient permissions |
| 404 | not_found_error | Unknown model |
| 429 | rate_limit_error | Rate limit exceeded |
| 500+ | api_error | Server error |
Include the `X-Request-ID` response header when contacting support.
Headers
Request Headers
| Header | Required | Description |
|---|---|---|
| `Authorization` | Yes* | Bearer token authentication |
| `x-api-key` | Yes* | Alternative API key header |
| `Content-Type` | Yes | Must be `application/json` |
| `anthropic-beta` | No | Enable beta features (e.g., prompt caching) |
| `anthropic-version` | No | API version (accepted but not required) |

* Either `Authorization` or `x-api-key` is required.
BYOK Headers
For Bring Your Own Key:

| Header | Description |
|---|---|
| `x-use-byok` | Set to `true` to use your own API key |
| `x-byok-provider` | Provider name for your key |
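For example (the provider name is illustrative):

```
x-use-byok: true
x-byok-provider: anthropic
```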
Examples
Basic Request (cURL)
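A sketch, assuming your key is in the `NANOGPT_API_KEY` environment variable:

```bash
curl https://nano-gpt.com/api/v1/messages \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```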
Streaming Request (cURL)
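The same request with `stream: true`, using the alternative `x-api-key` header (`-N` disables curl's output buffering so events print as they arrive):

```bash
curl -N https://nano-gpt.com/api/v1/messages \
  -H "x-api-key: $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "max_tokens": 1024,
    "stream": true,
    "messages": [{"role": "user", "content": "Write a haiku"}]
  }'
```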
With Tools (cURL)
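A sketch with a hypothetical weather tool attached:

```bash
curl https://nano-gpt.com/api/v1/messages \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "max_tokens": 1024,
    "tools": [{
      "name": "get_weather",
      "description": "Get the current weather for a location",
      "input_schema": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"]
      }
    }],
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}]
  }'
```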
Anthropic SDK (Node.js)
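A sketch using the official `@anthropic-ai/sdk` package; only the `baseURL` and API key differ from a stock Anthropic setup:

```javascript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "https://nano-gpt.com/api",
  apiKey: process.env.NANOGPT_API_KEY,
});

const message = await client.messages.create({
  model: "claude-sonnet-4-5-20250929",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Hello" }],
});

console.log(message.content);
```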
Anthropic SDK with Streaming (Node.js)
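A sketch using the SDK's streaming helper, which handles the SSE events for you:

```javascript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "https://nano-gpt.com/api",
  apiKey: process.env.NANOGPT_API_KEY,
});

const stream = client.messages.stream({
  model: "claude-sonnet-4-5-20250929",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Write a haiku" }],
});

// Text deltas arrive incrementally; the final message is assembled for you.
stream.on("text", (text) => process.stdout.write(text));
const final = await stream.finalMessage();
console.log("\nstop_reason:", final.stop_reason);
```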
Anthropic SDK with Prompt Caching (Node.js)
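A sketch marking a large system prompt as cacheable; subsequent requests with the same prefix should report cache reads in `usage`:

```javascript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "https://nano-gpt.com/api",
  apiKey: process.env.NANOGPT_API_KEY,
});

const message = await client.messages.create({
  model: "claude-sonnet-4-5-20250929",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "You are a helpful assistant. <large reference text here>",
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: "Summarize the reference text." }],
});

console.log(message.usage);
```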
Vision Example (Node.js)
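A sketch sending a local PNG as a base64 image block (the filename is a placeholder):

```javascript
import Anthropic from "@anthropic-ai/sdk";
import fs from "node:fs";

const client = new Anthropic({
  baseURL: "https://nano-gpt.com/api",
  apiKey: process.env.NANOGPT_API_KEY,
});

const imageData = fs.readFileSync("photo.png").toString("base64");

const message = await client.messages.create({
  model: "claude-sonnet-4-5-20250929",
  max_tokens: 1024,
  messages: [
    {
      role: "user",
      content: [
        {
          type: "image",
          source: { type: "base64", media_type: "image/png", data: imageData },
        },
        { type: "text", text: "Describe this image." },
      ],
    },
  ],
});

console.log(message.content);
```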
Python SDK
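A sketch using the official `anthropic` package; as with Node.js, only `base_url` and the key change:

```python
import os
from anthropic import Anthropic

client = Anthropic(
    base_url="https://nano-gpt.com/api",
    api_key=os.environ["NANOGPT_API_KEY"],
)

message = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)
print(message.content[0].text)
```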
Limits
| Limit | Value |
|---|---|
| Request timeout | 800 seconds |
| Tool argument size | ~100 KB per tool call |
| Image types | JPEG, PNG, GIF, WebP |
Limitations
- GPU-TEE models do not support streaming through `POST /api/v1/messages`. Use `POST /api/v1/chat/completions` if you need streaming with GPU-TEE models.
Migration from Anthropic
To migrate from Anthropic's API to NanoGPT:

1. Change the base URL:
   - From: `https://api.anthropic.com`
   - To: `https://nano-gpt.com/api`
   The full endpoint becomes `https://nano-gpt.com/api/v1/messages`.
2. Use your NanoGPT API key instead of your Anthropic key.
3. No other code changes are required; the API is fully compatible.
Service tier compatibility
Anthropic-style service tiers are normalized when routing to providers that support service tiers:

- `standard` → `default`
- `priority` → `priority`
- `batch` → ignored for service-tier routing
Notes
- The `anthropic-version` header is accepted but not required
- Token usage numbers use NanoGPT's token accounting (and may differ slightly from Anthropic's exact counts)
- All Anthropic SDK features are supported, including streaming, tools, and caching