Overview

The /v1/messages endpoint provides full Anthropic API compatibility. Clients using the Anthropic SDK can use NanoGPT by changing only the base URL — no code changes required. NanoGPT accepts requests in the Anthropic Messages format, routes them to the requested NanoGPT model, and returns responses in the Anthropic Messages shape. For non‑Anthropic models, NanoGPT transparently translates the request to an OpenAI-style chat format internally, then converts the response back to the Anthropic Messages format. This endpoint supports:
  • Text generation (streaming and non-streaming)
  • Multi-turn conversations
  • Tool use (function calling)
  • Vision (images) and document/PDF processing
  • Extended thinking (reasoning models)
  • Prompt caching

Endpoint

POST https://nano-gpt.com/api/v1/messages

Authentication

Use either header:
  • Authorization: Bearer YOUR_API_KEY
  • x-api-key: YOUR_API_KEY
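
Either header works; as a minimal sketch (the helper name is ours, not part of the API), building the two header styles in Python might look like:

```python
# Build request headers using either supported auth style.
# The header names come from the docs above; the helper is illustrative.
def auth_headers(api_key: str, use_x_api_key: bool = False) -> dict:
    headers = {"Content-Type": "application/json"}
    if use_x_api_key:
        headers["x-api-key"] = api_key
    else:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers
```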

Request Format

Required Fields

| Field | Type | Description |
| --- | --- | --- |
| model | string | Model identifier (any NanoGPT model, including non‑Anthropic models) |
| max_tokens | number | Maximum tokens to generate (must be a finite number) |
| messages | array | Array of conversation messages |

Optional Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| system | string or array | — | System prompt (string or array of text blocks) |
| stream | boolean | false | Enable streaming responses |
| temperature | number | — | Sampling temperature |
| top_p | number | — | Nucleus sampling parameter |
| top_k | number | — | Top-k sampling parameter |
| stop_sequences | string[] | — | Custom stop sequences |
| tools | array | — | Tool definitions for function calling |
| tool_choice | string or object | — | Control tool selection behavior |
| disable_parallel_tool_use | boolean | — | Disable parallel tool calls |
| thinking | object | — | Enable extended thinking for supported models |
| metadata | object | — | Request metadata (user or user_id) |
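
As a client-side sanity check, the three required fields above can be validated before sending. This is an illustrative sketch (the function is ours), mirroring the constraints stated in the tables:

```python
import math

# Validate the required fields of a /v1/messages payload locally,
# before the request is sent. Returns a list of problems (empty = OK).
def validate_request(payload: dict) -> list[str]:
    errors = []
    if not isinstance(payload.get("model"), str) or not payload.get("model"):
        errors.append("model must be a non-empty string")
    mt = payload.get("max_tokens")
    if not isinstance(mt, (int, float)) or isinstance(mt, bool) or not math.isfinite(mt):
        errors.append("max_tokens must be a finite number")
    if not isinstance(payload.get("messages"), list) or not payload.get("messages"):
        errors.append("messages must be a non-empty array")
    return errors
```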

Message Format

Messages must have a role (user or assistant) and content:
{
  "role": "user",
  "content": "Hello!"
}
Or with structured content blocks:
{
  "role": "user",
  "content": [
    { "type": "text", "text": "What's in this image?" },
    {
      "type": "image",
      "source": {
        "type": "base64",
        "media_type": "image/jpeg",
        "data": "<base64-encoded-image>"
      }
    }
  ]
}

Content Block Types

Text Block

{ "type": "text", "text": "Your message here" }

Image Block (for vision-capable models)

{
  "type": "image",
  "source": {
    "type": "base64",
    "media_type": "image/jpeg",
    "data": "<base64-data>"
  }
}
Or with URL:
{
  "type": "image",
  "source": {
    "type": "url",
    "url": "https://example.com/image.jpg"
  }
}
Supported media types: image/jpeg, image/png, image/gif, image/webp
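
A base64 image block can be assembled from a local file; this sketch (helper name and extension map are ours) covers the four supported media types:

```python
import base64
import pathlib

# Map file extensions to the four supported image media types.
MEDIA_TYPES = {
    ".jpg": "image/jpeg", ".jpeg": "image/jpeg",
    ".png": "image/png", ".gif": "image/gif", ".webp": "image/webp",
}

# Build a base64 image content block from a local file.
def image_block(path: str) -> dict:
    p = pathlib.Path(path)
    media_type = MEDIA_TYPES[p.suffix.lower()]  # KeyError for unsupported formats
    data = base64.b64encode(p.read_bytes()).decode("ascii")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }
```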

Document Block (for PDF-capable models)

{
  "type": "document",
  "source": {
    "type": "base64",
    "media_type": "application/pdf",
    "data": "<base64-data>"
  }
}

Tool Use Block (in assistant messages)

{
  "type": "tool_use",
  "id": "tool_abc123",
  "name": "get_weather",
  "input": { "city": "Paris" }
}

Tool Result Block (in user messages)

{
  "type": "tool_result",
  "tool_use_id": "tool_abc123",
  "content": "The weather in Paris is sunny, 22 C"
}

Tool Definitions

{
  "tools": [
    {
      "name": "get_weather",
      "description": "Get current weather for a city",
      "input_schema": {
        "type": "object",
        "properties": {
          "city": { "type": "string", "description": "City name" }
        },
        "required": ["city"]
      }
    }
  ]
}

Tool Choice Options

| Value | Description |
| --- | --- |
| "auto" | Model may use tools if appropriate |
| "none" | Disable tool use for this request |
| "any" | Model must use at least one tool |
| "required" | Model must use at least one tool |
| {"type": "tool", "name": "tool_name"} | Force use of a specific tool |

Extended Thinking

For models that support extended thinking (reasoning):
{
  "thinking": {
    "type": "enabled",
    "budget_tokens": 8192
  }
}
Requirements:
  • budget_tokens must be >= 1024
  • budget_tokens must be < max_tokens
  • Model must support thinking (e.g., models with -thinking suffix)
If the requested model does not support thinking, NanoGPT strips the thinking parameter and routes the request to the base model.
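
The budget constraints can be checked client-side before sending. A sketch (the function is illustrative, not part of any SDK):

```python
# Client-side check of the thinking constraints listed above.
def validate_thinking(thinking: dict, max_tokens: int) -> None:
    if thinking.get("type") != "enabled":
        return
    budget = thinking["budget_tokens"]
    if budget < 1024:
        raise ValueError("budget_tokens must be >= 1024")
    if budget >= max_tokens:
        raise ValueError("budget_tokens must be < max_tokens")
```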

Response Format

Non-Streaming Response

{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "model": "claude-opus-4-5-20251101",
  "content": [
    { "type": "text", "text": "Hello! How can I help you today?" }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 10,
    "output_tokens": 12,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "cache_creation": {
      "ephemeral_5m_input_tokens": 0,
      "ephemeral_1h_input_tokens": 0
    },
    "service_tier": "standard"
  }
}

Stop Reasons

| Stop Reason | Description |
| --- | --- |
| end_turn | Natural end of response |
| max_tokens | Hit token limit |
| stop_sequence | Hit a custom stop sequence |
| tool_use | Model wants to use a tool |
| content_filter | Content was filtered |

Streaming Response (SSE)

See also: Streaming Protocol (SSE). When stream: true, the response is Server-Sent Events with named event types:

Event: message_start

event: message_start
data: {"type": "message_start", "message": {"id": "msg_abc", "type": "message", "role": "assistant", "model": "claude-opus-4-5-20251101", "content": [], "stop_reason": null, "usage": {"input_tokens": 10, "output_tokens": 0}}}

Event: content_block_start

event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}}

Event: content_block_delta

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello"}}

Event: content_block_stop

event: content_block_stop
data: {"type": "content_block_stop", "index": 0}

Event: message_delta

event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "end_turn"}, "usage": {"output_tokens": 12}}

Event: message_stop

event: message_stop
data: {"type": "message_stop"}

Streaming Tool Use

When the model uses tools during streaming:
event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "tool_use", "id": "tool_abc", "name": "get_weather", "input": {}}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "input_json_delta", "partial_json": "{\"city\":"}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "input_json_delta", "partial_json": " \"Paris\"}"}}

event: content_block_stop
data: {"type": "content_block_stop", "index": 0}

Supported Models

Claude Models (Full Support)

| Model | Streaming | Tools | Vision | Thinking |
| --- | --- | --- | --- | --- |
| claude-opus-4-6 | Yes | Yes | Yes | Yes* |
| claude-opus-4-5-20251101 | Yes | Yes | Yes | Yes* |
| claude-opus-4-1-20250805 | Yes | Yes | Yes | Yes* |
| claude-sonnet-4-5-20250929 | Yes | Yes | Yes | Yes* |

*Use the model with the -thinking suffix for extended reasoning.

Other Models (Via Compatibility Layer)

The v1/messages endpoint also works with non-Anthropic models:
| Model | Streaming | Tools | Vision |
| --- | --- | --- | --- |
| openai/gpt-5.2 | Yes | Yes | Yes |
| google/gemini-3-flash-preview | Yes | Yes | Yes |
| google/gemini-3-pro-preview | Yes | Yes | Yes |
| zai-org/glm-4.7 | Yes | Yes | — |
See the Models documentation for the full list.

Prompt Caching

For the full guide (supported models, thresholds, pricing, and usage fields), see Prompt Caching. Enable prompt caching to reduce costs on repeated prompts.

Enable via Header

anthropic-beta: prompt-caching-2024-07-31

TTL Options

  • Default: 5-minute cache TTL
  • Extended: add extended-cache-ttl-2025-04-11 to the anthropic-beta header to request a 1-hour TTL

Cache Control in Content

Add cache_control to content blocks:
{
  "type": "text",
  "text": "This is a long system prompt...",
  "cache_control": { "type": "ephemeral" }
}

Cache Usage in Response

{
  "usage": {
    "input_tokens": 100,
    "output_tokens": 50,
    "cache_creation_input_tokens": 80,
    "cache_read_input_tokens": 0
  }
}
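
To gauge how well caching is working, you can compute the fraction of the prompt served from cache. This sketch assumes the usual Anthropic-style accounting where input_tokens, cache_read_input_tokens, and cache_creation_input_tokens are disjoint counts that together make up the prompt:

```python
# Fraction of prompt tokens served from cache for a usage object
# (token counts only; pricing is model-specific).
def cache_read_fraction(usage: dict) -> float:
    read = usage.get("cache_read_input_tokens", 0)
    total = (usage.get("input_tokens", 0) + read
             + usage.get("cache_creation_input_tokens", 0))
    return read / total if total else 0.0
```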

Error Handling

For a general guide across NanoGPT APIs, see Error Handling.

Error Response Format

{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "max_tokens is required",
    "param": "max_tokens"
  }
}

Error Types

| HTTP Status | Error Type | Description |
| --- | --- | --- |
| 400 | invalid_request_error | Invalid request (missing fields, bad format) |
| 401 | authentication_error | Invalid or missing API key |
| 403 | permission_error | Insufficient permissions |
| 404 | not_found_error | Unknown model |
| 429 | rate_limit_error | Rate limit exceeded |
| 500+ | api_error | Server error |
All error responses include an X-Request-ID header for support requests.
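
Of the types above, rate limits and server errors are the ones worth retrying; 4xx validation and auth errors are not. A sketch of that classification (the function is ours):

```python
# Decide whether a failed request is worth retrying, based on the
# HTTP status and the parsed error body. Keep the X-Request-ID header
# from the response when filing a support request.
def should_retry(status: int, body: dict) -> bool:
    err_type = body.get("error", {}).get("type", "")
    return (status == 429 or status >= 500
            or err_type in {"rate_limit_error", "api_error"})
```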

Headers

Request Headers

| Header | Required | Description |
| --- | --- | --- |
| Authorization | Yes* | Bearer token authentication |
| x-api-key | Yes* | Alternative API key header |
| Content-Type | Yes | Must be application/json |
| anthropic-beta | No | Enable beta features (e.g., prompt caching) |
| anthropic-version | No | API version (accepted but not required) |

*One of Authorization or x-api-key is required.

BYOK Headers

For Bring Your Own Key:
| Header | Description |
| --- | --- |
| x-use-byok | Set to true to use your own API key |
| x-byok-provider | Provider name for your key |

Examples

Basic Request (cURL)

curl -X POST https://nano-gpt.com/api/v1/messages \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "claude-opus-4-5-20251101",
    "max_tokens": 256,
    "messages": [
      { "role": "user", "content": "Hello!" }
    ]
  }'

Streaming Request (cURL)

curl -N -X POST https://nano-gpt.com/api/v1/messages \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "claude-opus-4-5-20251101",
    "stream": true,
    "max_tokens": 256,
    "messages": [{ "role": "user", "content": "Hello!" }]
  }'

With Tools (cURL)

curl -X POST https://nano-gpt.com/api/v1/messages \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "claude-opus-4-5-20251101",
    "max_tokens": 1024,
    "tools": [
      {
        "name": "get_weather",
        "description": "Get current weather",
        "input_schema": {
          "type": "object",
          "properties": {
            "city": { "type": "string" }
          },
          "required": ["city"]
        }
      }
    ],
    "messages": [
      { "role": "user", "content": "What is the weather in Tokyo?" }
    ]
  }'

Anthropic SDK (Node.js)

import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({
  apiKey: process.env.NANOGPT_API_KEY,
  baseURL: "https://nano-gpt.com/api"
});

const message = await anthropic.messages.create({
  model: "claude-opus-4-5-20251101",
  max_tokens: 256,
  messages: [
    { role: "user", content: "Hello!" }
  ]
});

console.log(message.content[0].text);

Anthropic SDK with Streaming (Node.js)

import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({
  apiKey: process.env.NANOGPT_API_KEY,
  baseURL: "https://nano-gpt.com/api"
});

const stream = await anthropic.messages.stream({
  model: "claude-opus-4-5-20251101",
  max_tokens: 256,
  messages: [
    { role: "user", content: "Tell me a story" }
  ]
});

for await (const event of stream) {
  if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
    process.stdout.write(event.delta.text);
  }
}

Anthropic SDK with Prompt Caching (Node.js)

import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({
  apiKey: process.env.NANOGPT_API_KEY,
  baseURL: "https://nano-gpt.com/api"
});

const message = await anthropic.messages.create({
  model: "claude-opus-4-5-20251101",
  max_tokens: 256,
  system: [
    {
      type: "text",
      text: "You are a helpful assistant with expertise in...",
      cache_control: { type: "ephemeral" }
    }
  ],
  messages: [
    { role: "user", content: "Hello!" }
  ]
}, {
  headers: {
    "anthropic-beta": "prompt-caching-2024-07-31"
  }
});

Vision Example (Node.js)

import Anthropic from "@anthropic-ai/sdk";
import fs from "fs";

const anthropic = new Anthropic({
  apiKey: process.env.NANOGPT_API_KEY,
  baseURL: "https://nano-gpt.com/api"
});

const imageData = fs.readFileSync("image.jpg").toString("base64");

const message = await anthropic.messages.create({
  model: "claude-opus-4-5-20251101",
  max_tokens: 1024,
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image",
          source: {
            type: "base64",
            media_type: "image/jpeg",
            data: imageData
          }
        }
      ]
    }
  ]
});

Python SDK

import anthropic

client = anthropic.Anthropic(
    api_key="YOUR_NANOGPT_API_KEY",
    base_url="https://nano-gpt.com/api"
)

message = client.messages.create(
    model="claude-opus-4-5-20251101",
    max_tokens=256,
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(message.content[0].text)

Limits

| Limit | Value |
| --- | --- |
| Request timeout | 800 seconds |
| Tool argument size | ~100 KB per tool call |
| Image types | JPEG, PNG, GIF, WebP |

Limitations

  • GPU-TEE models do not support streaming through POST /api/v1/messages. Use POST /api/v1/chat/completions if you need streaming with GPU-TEE models.

Migration from Anthropic

To migrate from Anthropic’s API to NanoGPT:
  1. Change the base URL:
    • From: https://api.anthropic.com
    • To: https://nano-gpt.com/api
    The full endpoint will be: https://nano-gpt.com/api/v1/messages
  2. Use your NanoGPT API key instead of your Anthropic key
  3. No other code changes required — the API is fully compatible

Service tier compatibility

Anthropic-style service tiers are normalized when routing to providers that support service tiers:
  • standard → default
  • priority → priority
  • batch → ignored for service-tier routing
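
The mapping above can be sketched as a small normalizer (illustrative only; returning None models the "ignored" case):

```python
# Normalize an Anthropic-style service tier for provider routing.
# Returns None when the tier should be ignored (as with "batch").
def normalize_service_tier(tier: str):
    mapping = {"standard": "default", "priority": "priority"}
    return mapping.get(tier)
```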

Notes

  • The anthropic-version header is accepted but not required
  • Token usage numbers use NanoGPT’s token accounting (may differ slightly from Anthropic’s exact counts)
  • All Anthropic SDK features are supported, including streaming, tools, and caching