Overview
The `/v1/messages` endpoint provides full Anthropic API compatibility. Clients using the Anthropic SDK can use NanoGPT by simply changing the base URL; no other code changes are required.
This endpoint supports:
- Text generation (streaming and non-streaming)
- Multi-turn conversations
- Tool use (function calling)
- Vision (images) and document/PDF processing
- Extended thinking (reasoning models)
- Prompt caching
Endpoint

`POST https://nano-gpt.com/api/v1/messages`
Authentication
Use either header:

- `Authorization: Bearer YOUR_API_KEY`
- `x-api-key: YOUR_API_KEY`
Request Format
Required Fields
| Field | Type | Description |
|---|---|---|
| `model` | string | Model identifier (e.g., `claude-3-5-sonnet-20241022`) |
| `max_tokens` | number | Maximum tokens to generate (must be a finite number) |
| `messages` | array | Array of conversation messages |
Optional Fields
| Field | Type | Default | Description |
|---|---|---|---|
| `system` | string or array | — | System prompt (string or array of text blocks) |
| `stream` | boolean | false | Enable streaming responses |
| `temperature` | number | — | Sampling temperature |
| `top_p` | number | — | Nucleus sampling parameter |
| `top_k` | number | — | Top-k sampling parameter |
| `stop_sequences` | string[] | — | Custom stop sequences |
| `tools` | array | — | Tool definitions for function calling |
| `tool_choice` | string or object | — | Control tool selection behavior |
| `disable_parallel_tool_use` | boolean | — | Disable parallel tool calls |
| `thinking` | object | — | Enable extended thinking for supported models |
| `metadata` | object | — | Request metadata (`user` or `user_id`) |
Message Format
Messages must have a `role` (`user` or `assistant`) and `content`.
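An illustrative `messages` array; `content` may be a plain string or an array of content blocks:

```json
[
  {"role": "user", "content": "What is the capital of France?"},
  {"role": "assistant", "content": "Paris."},
  {
    "role": "user",
    "content": [
      {"type": "text", "text": "And of Germany?"}
    ]
  }
]
```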
Content Block Types
Text Block
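Following Anthropic's content-block shape, a text block looks like:

```json
{"type": "text", "text": "Hello, world"}
```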
Image Block (for vision-capable models)
Supported media types: `image/jpeg`, `image/png`, `image/gif`, `image/webp`
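An illustrative base64 image block (Anthropic's standard shape; the `data` value is a truncated placeholder):

```json
{
  "type": "image",
  "source": {
    "type": "base64",
    "media_type": "image/png",
    "data": "iVBORw0KGgo..."
  }
}
```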
Document Block (for PDF-capable models)
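A document block follows the same source shape as images, with a PDF media type (the `data` value is a truncated placeholder):

```json
{
  "type": "document",
  "source": {
    "type": "base64",
    "media_type": "application/pdf",
    "data": "JVBERi0x..."
  }
}
```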
Tool Use Block (in assistant messages)
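An illustrative tool use block as emitted in assistant messages (the `get_weather` tool and `id` value are hypothetical):

```json
{
  "type": "tool_use",
  "id": "toolu_abc123",
  "name": "get_weather",
  "input": {"location": "Paris"}
}
```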
Tool Result Block (in user messages)
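The corresponding tool result block, returned in a user message, references the tool call by `tool_use_id` (values hypothetical):

```json
{
  "type": "tool_result",
  "tool_use_id": "toolu_abc123",
  "content": "15°C and sunny"
}
```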
Tool Definitions
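Tool definitions describe their parameters with JSON Schema under `input_schema`, as in Anthropic's API (the `get_weather` tool here is hypothetical):

```json
{
  "name": "get_weather",
  "description": "Get the current weather for a location",
  "input_schema": {
    "type": "object",
    "properties": {
      "location": {"type": "string", "description": "City name"}
    },
    "required": ["location"]
  }
}
```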
Tool Choice Options
| Value | Description |
|---|---|
| `"auto"` | Model may use tools if appropriate |
| `"none"` | Disable tool use for this request |
| `"any"` | Model must use at least one tool |
| `"required"` | Model must use at least one tool |
| `{"type": "tool", "name": "tool_name"}` | Force use of a specific tool |
Extended Thinking
For models that support extended thinking (reasoning):

- `budget_tokens` must be >= 1024
- `budget_tokens` must be < `max_tokens`
- Model must support thinking (e.g., models with a `-thinking` suffix)
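A request enabling extended thinking might look like this, using the `thinking` parameter shape from Anthropic's API (model name taken from the supported-models table, with the `-thinking` suffix):

```json
{
  "model": "claude-sonnet-4-5-20250929-thinking",
  "max_tokens": 4096,
  "thinking": {"type": "enabled", "budget_tokens": 2048},
  "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
}
```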
Response Format
Non-Streaming Response
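An illustrative non-streaming response body, following Anthropic's message shape (values are placeholders):

```json
{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "model": "claude-3-5-sonnet-20241022",
  "content": [{"type": "text", "text": "Paris."}],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 12, "output_tokens": 4}
}
```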
Stop Reasons
| Stop Reason | Description |
|---|---|
| `end_turn` | Natural end of response |
| `max_tokens` | Hit token limit |
| `stop_sequence` | Hit a custom stop sequence |
| `tool_use` | Model wants to use a tool |
| `content_filter` | Content was filtered |
Streaming Response (SSE)
When `stream: true` is set, the response is delivered as Server-Sent Events with named event types:
Event: message_start
Event: content_block_start
Event: content_block_delta
Event: content_block_stop
Event: message_delta
Event: message_stop
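An illustrative SSE transcript for a short text reply, showing the event sequence above (payloads abbreviated; values are placeholders):

```
event: message_start
data: {"type":"message_start","message":{"id":"msg_abc123","role":"assistant","content":[],"usage":{"input_tokens":12,"output_tokens":0}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Paris"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":4}}

event: message_stop
data: {"type":"message_stop"}
```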
Streaming Tool Use
When the model uses tools during streaming, the tool input arrives incrementally: `content_block_start` carries a `tool_use` block, and subsequent `content_block_delta` events carry `input_json_delta` fragments of the input JSON.

Supported Models
Claude Models (Full Support)
| Model | Streaming | Tools | Vision | Thinking |
|---|---|---|---|---|
| claude-sonnet-4-5-20250929 | Yes | Yes | Yes | Yes* |
| claude-3-5-sonnet-20241022 | Yes | Yes | Yes | — |
| claude-3-5-haiku-20241022 | Yes | Yes | Yes | — |
\*Append the `-thinking` suffix to enable extended reasoning.
Other Models (Via Compatibility Layer)
The `/v1/messages` endpoint also works with non-Anthropic models:

| Model | Streaming | Tools | Vision |
|---|---|---|---|
| gpt-4o | Yes | Yes | Yes |
| gpt-4o-mini | Yes | Yes | Yes |
| gemini-2.0-flash | Yes | Yes | Yes |
| llama-3.3-70b | Yes | Yes | — |
| deepseek-chat | Yes | Yes | — |
Prompt Caching
Enable prompt caching to reduce costs on repeated prompts.

Enable via Header
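Assuming NanoGPT follows Anthropic's beta-header convention (see the `anthropic-beta` entry in the Headers section), caching can be requested with a header like:

```
anthropic-beta: prompt-caching-2024-07-31
```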
TTL Options
- Default: 5-minute cache TTL
- Extended: Add `extended-cache-ttl` to request a 1-hour TTL
Cache Control in Content
Add `cache_control` to content blocks:
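For example, a cacheable text block using Anthropic's `cache_control` shape (the prompt text is a placeholder):

```json
{
  "type": "text",
  "text": "You are a helpful assistant. <long, stable instructions...>",
  "cache_control": {"type": "ephemeral"}
}
```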
Cache Usage in Response
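Cache activity is reported in the `usage` object, following Anthropic's field names (illustrative values):

```json
{
  "usage": {
    "input_tokens": 21,
    "cache_creation_input_tokens": 1024,
    "cache_read_input_tokens": 0,
    "output_tokens": 128
  }
}
```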
Error Handling
Error Response Format
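Errors follow Anthropic's error envelope (message text illustrative):

```json
{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "max_tokens: field required"
  }
}
```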
Error Types
| HTTP Status | Error Type | Description |
|---|---|---|
| 400 | invalid_request_error | Invalid request (missing fields, bad format) |
| 401 | authentication_error | Invalid or missing API key |
| 403 | permission_error | Insufficient permissions |
| 404 | not_found_error | Unknown model |
| 429 | rate_limit_error | Rate limit exceeded |
| 500+ | api_error | Server error |
Include the `X-Request-ID` response header in support requests.
Headers
Request Headers
| Header | Required | Description |
|---|---|---|
| `Authorization` | Yes* | Bearer token authentication |
| `x-api-key` | Yes* | Alternative API key header |
| `Content-Type` | Yes | Must be `application/json` |
| `anthropic-beta` | No | Enable beta features (e.g., prompt caching) |
| `anthropic-version` | No | API version (accepted but not required) |

\*Either `Authorization` or `x-api-key` is required.
BYOK Headers
For Bring Your Own Key:

| Header | Description |
|---|---|
| `x-use-byok` | Set to `true` to use your own API key |
| `x-byok-provider` | Provider name for your key |
Examples
Basic Request (cURL)
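A minimal non-streaming request (model name taken from the supported-models table; replace `YOUR_API_KEY`):

```shell
curl https://nano-gpt.com/api/v1/messages \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```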
Streaming Request (cURL)
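The same request with `stream: true`; `-N` disables curl's buffering so SSE events print as they arrive:

```shell
curl -N https://nano-gpt.com/api/v1/messages \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "stream": true,
    "messages": [{"role": "user", "content": "Write a haiku about the sea."}]
  }'
```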
With Tools (cURL)
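A request with a tool definition (the `get_weather` tool is hypothetical; uses the `x-api-key` header variant):

```shell
curl https://nano-gpt.com/api/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "tools": [{
      "name": "get_weather",
      "description": "Get the current weather for a location",
      "input_schema": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"]
      }
    }],
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}]
  }'
```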
Anthropic SDK (Node.js)
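A minimal sketch using the official `@anthropic-ai/sdk` package; only the `baseURL` differs from a standard Anthropic setup (the `NANOGPT_API_KEY` environment variable name is an assumption):

```javascript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.NANOGPT_API_KEY,
  baseURL: "https://nano-gpt.com/api",
});

const message = await client.messages.create({
  model: "claude-3-5-sonnet-20241022",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(message.content[0].text);
```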
Anthropic SDK with Streaming (Node.js)
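A streaming sketch using the SDK's `messages.stream` helper, which emits `text` events and assembles the final message:

```javascript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.NANOGPT_API_KEY,
  baseURL: "https://nano-gpt.com/api",
});

const stream = client.messages.stream({
  model: "claude-3-5-sonnet-20241022",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Write a haiku about the sea." }],
});

// Print text deltas as they arrive, then wait for the assembled message.
stream.on("text", (text) => process.stdout.write(text));
const finalMessage = await stream.finalMessage();
console.log("\nstop_reason:", finalMessage.stop_reason);
```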
Anthropic SDK with Prompt Caching (Node.js)
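A prompt-caching sketch: the stable system prompt is marked with `cache_control`, and the beta header is passed via per-request options (the specific `anthropic-beta` value follows Anthropic's convention and is an assumption here):

```javascript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.NANOGPT_API_KEY,
  baseURL: "https://nano-gpt.com/api",
});

// Mark the large, reusable system prompt as cacheable.
const message = await client.messages.create(
  {
    model: "claude-3-5-sonnet-20241022",
    max_tokens: 1024,
    system: [
      {
        type: "text",
        text: "You are a support agent. <long, stable instructions...>",
        cache_control: { type: "ephemeral" },
      },
    ],
    messages: [{ role: "user", content: "How do I reset my password?" }],
  },
  { headers: { "anthropic-beta": "prompt-caching-2024-07-31" } }
);

// usage reports cache_creation_input_tokens / cache_read_input_tokens.
console.log(message.usage);
```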
Vision Example (Node.js)
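A vision sketch sending a local JPEG as a base64 image block (the `photo.jpg` filename is a placeholder):

```javascript
import fs from "node:fs";
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.NANOGPT_API_KEY,
  baseURL: "https://nano-gpt.com/api",
});

// Read a local image and encode it for the image content block.
const imageData = fs.readFileSync("photo.jpg").toString("base64");

const message = await client.messages.create({
  model: "claude-3-5-sonnet-20241022",
  max_tokens: 1024,
  messages: [{
    role: "user",
    content: [
      {
        type: "image",
        source: { type: "base64", media_type: "image/jpeg", data: imageData },
      },
      { type: "text", text: "Describe this image." },
    ],
  }],
});

console.log(message.content[0].text);
```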
Python SDK
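The equivalent sketch with the official `anthropic` Python package; again only the `base_url` changes:

```python
from anthropic import Anthropic

client = Anthropic(
    api_key="YOUR_API_KEY",
    base_url="https://nano-gpt.com/api",
)

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)

print(message.content[0].text)
```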
Limits
| Limit | Value |
|---|---|
| Request timeout | 800 seconds |
| Tool argument size | ~100 KB per tool call |
| Image types | JPEG, PNG, GIF, WebP |
Migration from Anthropic
To migrate from Anthropic's API to NanoGPT:

1. Change the base URL:
   - From: `https://api.anthropic.com`
   - To: `https://nano-gpt.com/api`
   - Full endpoint: `https://nano-gpt.com/api/v1/messages`
2. Use your NanoGPT API key instead of your Anthropic key
3. No other code changes are required; the API is fully compatible
Notes
- The `anthropic-version` header is accepted but not required
- Token usage numbers use NanoGPT's token accounting (may differ slightly from Anthropic's exact counts)
- All Anthropic SDK features are supported, including streaming, tools, and caching