POST /v1/chat/completions
cURL
curl --request POST \
  --url https://nano-gpt.com/api/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "chatgpt-4o-latest",
  "messages": [
    {
      "role": "user",
      "content": "Testing, please reply!"
    }
  ],
  "stream": false,
  "temperature": 0.7,
  "max_tokens": 4000,
  "top_p": 1,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "cache_control": {
    "enabled": false,
    "ttl": "5m"
  }
}'

Example response:
{
  "id": "<string>",
  "object": "<string>",
  "created": 123,
  "choices": [
    {
      "index": 123,
      "message": {
        "role": "assistant",
        "content": "<string>"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 123,
    "total_tokens": 123
  }
}

Overview

The Chat Completion endpoint provides OpenAI-compatible chat completions. All models can access real-time web information by appending special suffixes to the model name:
  • :online - Standard web search ($0.006 per request)
    • Returns 10 search results
    • Perfect for straightforward questions
  • :online/linkup-deep - Deep web search ($0.06 per request)
    • Iteratively searches for comprehensive information
    • Ideal when initial results aren’t sufficient

Examples

import requests

BASE_URL = "https://nano-gpt.com/api/v1"
API_KEY = "YOUR_API_KEY"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

# Standard web search
data = {
    "model": "chatgpt-4o-latest:online",
    "messages": [
        {"role": "user", "content": "What are the latest developments in AI?"}
    ]
}

response = requests.post(
    f"{BASE_URL}/chat/completions",
    headers=headers,
    json=data
)

# Deep web search
data_deep = {
    "model": "chatgpt-4o-latest:online/linkup-deep",
    "messages": [
        {"role": "user", "content": "Provide a comprehensive analysis of recent AI breakthroughs"}
    ]
}

response_deep = requests.post(
    f"{BASE_URL}/chat/completions",
    headers=headers,
    json=data_deep
)

Image Input

Send images using the OpenAI‑compatible chat format. Provide image parts alongside text in the messages array.

Supported Forms

  • Remote URL: {"type":"image_url","image_url":{"url":"https://..."}}
  • Base64 data URL: {"type":"image_url","image_url":{"url":"data:image/png;base64,...."}}
Notes:
  • Prefer HTTPS URLs; some upstreams reject non‑HTTPS. If in doubt, use base64 data URLs.
  • Accepted mime types: image/png, image/jpeg, image/jpg, image/webp.
  • Inline markdown images in plain text (e.g., ![alt](data:image/...;base64,...)) are auto‑normalized into structured parts server‑side.

Message Shape

{
  "role": "user",
  "content": [
    { "type": "text", "text": "What is in this image?" },
    { "type": "image_url", "image_url": { "url": "https://example.com/image.jpg" } }
  ]
}

cURL — Image URL (non‑streaming)

curl -sS \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -X POST https://nano-gpt.com/api/v1/chat/completions \
  --data '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in three words."},
          {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/3/3f/Fronalpstock_big.jpg"}}
        ]
      }
    ],
    "stream": false
  }'

cURL — Base64 Data URL (non‑streaming)

Embed your image as a data URL, replacing ...BASE64... with your base64‑encoded image data.
curl -sS \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type": "application/json" \
  -X POST https://nano-gpt.com/api/v1/chat/completions \
  --data '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What is shown here?"},
          {"type": "image_url", "image_url": {"url": "data:image/png;base64,...BASE64..."}}
        ]
      }
    ],
    "stream": false
  }'
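
If you are assembling the data URL programmatically, here is a minimal Python sketch (the file name, model, and prompt are placeholder assumptions):

import base64
import requests

API_KEY = "YOUR_API_KEY"

# Encode a local PNG as a base64 data URL
with open("image.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("ascii")

payload = {
    "model": "gpt-4o-mini",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown here?"},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}}
        ]
    }],
    "stream": False
}

response = requests.post(
    "https://nano-gpt.com/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload
)
print(response.json()["choices"][0]["message"]["content"])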

cURL — Streaming SSE

curl -N \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -X POST https://nano-gpt.com/api/v1/chat/completions \
  --data '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Two words only."},
          {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
        ]
      }
    ],
    "stream": true
  }'
The response streams data: { ... } lines until a final data: [DONE] terminator and may include a usage object at the end.
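
A minimal Python sketch of consuming the stream, assuming the standard OpenAI-style data: [DONE] terminator:

import json
import requests

API_KEY = "YOUR_API_KEY"

payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Two words only."}],
    "stream": True
}

with requests.post(
    "https://nano-gpt.com/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}", "Accept": "text/event-stream"},
    json=payload,
    stream=True
) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        chunk = line[len("data: "):]
        if chunk == "[DONE]":
            break
        choices = json.loads(chunk).get("choices") or []
        if choices:
            # a trailing usage chunk may carry no choices/content
            print(choices[0].get("delta", {}).get("content") or "", end="", flush=True)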

Troubleshooting

  • 400 unsupported image: ensure the image is a valid PNG/JPEG/WebP, not a tiny 1×1 pixel, and either HTTPS URL or a base64 data URL.
  • 503 after fallbacks: try a different model, verify API key/session, and prefer base64 data URL for local or protected assets.

Context Memory

Enable unlimited-length conversations with lossless, hierarchical memory.
  • Append :memory to any model name
  • Or send header memory: true
  • Can be combined with web search: :online:memory
  • Retention: default 30 days; configure via :memory-<days> (1..365) or header memory_expiration_days: <days>; header takes precedence
import requests

BASE_URL = "https://nano-gpt.com/api/v1"
API_KEY = "YOUR_API_KEY"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

# Suffix-based
payload = {
    "model": "chatgpt-4o-latest:memory",
    "messages": [{"role": "user", "content": "Keep our previous discussion in mind and continue."}]
}
requests.post(f"{BASE_URL}/chat/completions", headers=headers, json=payload)
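
Continuing the example above, a header-based variant (the 60-day retention value is an arbitrary example within the 1..365 range):

# Header-based, with a custom retention period
payload = {
    "model": "chatgpt-4o-latest",
    "messages": [{"role": "user", "content": "Keep our previous discussion in mind and continue."}]
}
requests.post(
    f"{BASE_URL}/chat/completions",
    headers={**headers, "memory": "true", "memory_expiration_days": "60"},
    json=payload
)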

Custom Context Size Override

When Context Memory is enabled, you can override the model-derived context size used for the memory compression step with model_context_limit.
  • Parameter: model_context_limit (number or numeric string)
  • Default: Derived from the selected model’s context size
  • Minimum: Values below 10,000 are clamped internally
  • Scope: Only affects memory compression; does not change the target model’s own window
Examples:
# Enable memory via header; use model default context size
curl -s -X POST \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -H "memory: true" \
  https://nano-gpt.com/api/v1/chat/completions \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role":"user","content":"Briefly say hello."}],
    "stream": false
  }'

# Explicit numeric override
curl -s -X POST \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -H "memory: true" \
  https://nano-gpt.com/api/v1/chat/completions \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role":"user","content":"Briefly say hello."}],
    "model_context_limit": 20000,
    "stream": false
  }'

# String override (server coerces to number)
curl -s -X POST \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -H "memory: true" \
  https://nano-gpt.com/api/v1/chat/completions \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role":"user","content":"Briefly say hello."}],
    "model_context_limit": "30000",
    "stream": false
  }'

Reasoning Streams

The Chat Completions endpoint separates the model’s visible answer from its internal reasoning. By default, reasoning is included and delivered alongside normal content so that clients can decide whether to display it. Requests that use the thinking model suffix (for example :thinking or -thinking:8192) are normalized before dispatch, but the response contract remains the same.

Endpoint variants

Choose the base path that matches how your client consumes reasoning streams:
  • https://nano-gpt.com/api/v1/chat/completions — default endpoint that streams internal thoughts through choices[0].delta.reasoning (and repeats them in message.reasoning on completion). Recommended for apps like SillyTavern that understand the modern response shape.
  • https://nano-gpt.com/api/v1legacy/chat/completions — legacy contract that swaps the field name to choices[0].delta.reasoning_content / message.reasoning_content for older OpenAI-compatible clients. Use this for LiteLLM’s OpenAI adapter to avoid downstream parsing errors.
  • https://nano-gpt.com/api/v1thinking/chat/completions — reasoning-aware models write everything into the normal choices[0].delta.content stream so clients that ignore reasoning fields still see the full conversation transcript. This is the preferred base URL for JanitorAI.
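
If a client supports several of these shapes, one way to keep the choice explicit is a small lookup (a sketch; the keys are arbitrary labels, the URLs are the paths listed above):

# Map each response contract to its base URL
BASE_URLS = {
    "modern": "https://nano-gpt.com/api/v1",            # delta.reasoning / message.reasoning
    "legacy": "https://nano-gpt.com/api/v1legacy",      # delta.reasoning_content
    "thinking": "https://nano-gpt.com/api/v1thinking",  # reasoning folded into content
}
url = f"{BASE_URLS['legacy']}/chat/completions"  # e.g., for LiteLLM's OpenAI adapter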

Streaming payload format

Server-Sent Event (SSE) streams emit the answer in choices[0].delta.content and the thought process in choices[0].delta.reasoning (plus optional delta.reasoning_details). Reasoning deltas are dispatched before or alongside regular content, letting you render both panes in real-time.
data: {
  "choices": [{
    "delta": {
      "reasoning": "Assessing possible tool options…"
    }
  }]
}
data: {
  "choices": [{
    "delta": {
      "content": "Let me walk you through the solution."
    }
  }]
}
When streaming completes, the formatter aggregates the collected values and repeats them in the final payload: choices[0].message.content contains the assistant reply and choices[0].message.reasoning (plus reasoning_details when available) contains the full chain-of-thought. Non-streaming requests reuse the same formatter, so the reasoning block is present as a dedicated field.
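
A minimal Python sketch that renders the two streams separately (assuming the standard data: [DONE] terminator; model and prompt are placeholders):

import json
import requests

API_KEY = "YOUR_API_KEY"

payload = {
    "model": "claude-3-5-sonnet-20241022",
    "messages": [{"role": "user", "content": "What is 2+2?"}],
    "stream": True
}

reasoning_parts, content_parts = [], []
with requests.post(
    "https://nano-gpt.com/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    stream=True
) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue
        chunk = line[len("data: "):]
        if chunk == "[DONE]":
            break
        choices = json.loads(chunk).get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta", {})
        if delta.get("reasoning"):
            reasoning_parts.append(delta["reasoning"])  # thought pane
        if delta.get("content"):
            content_parts.append(delta["content"])      # answer pane

print("Reasoning:", "".join(reasoning_parts))
print("Answer:", "".join(content_parts))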

Showing or hiding reasoning

Send reasoning: { "exclude": true } to strip the reasoning payload from both streaming deltas and the final message. With this flag set, delta.reasoning and message.reasoning are omitted entirely.
curl -X POST https://nano-gpt.com/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "messages": [{"role": "user", "content": "What is 2+2?"}],
    "reasoning": {"exclude": true}
  }'
Without reasoning.exclude:
{
  "choices": [{
    "message": {
      "content": "The answer is 4.",
      "reasoning": "The user is asking for a simple addition. 2+2 equals 4."
    }
  }]
}
With reasoning.exclude:
{
  "choices": [{
    "message": {
      "content": "The answer is 4."
    }
  }]
}

Model suffix: :reasoning-exclude

You can toggle the filter without altering your JSON body by appending :reasoning-exclude to the model name.
  • Equivalent to sending { "reasoning": { "exclude": true } }
  • Only the :reasoning-exclude suffix is stripped before the request is routed; other suffixes remain active
  • Works for streaming and non-streaming responses on both Chat Completions and Text Completions
{
  "model": "claude-3-5-sonnet-20241022:reasoning-exclude",
  "messages": [{ "role": "user", "content": "What is 2+2?" }]
}

Combine with other suffixes

:reasoning-exclude composes safely with the other routing suffixes you already use:
  • :thinking (and variants like …-thinking:8192)
  • :online and :online/linkup-deep
  • :memory and :memory-<days>
Examples:
  • claude-3-7-sonnet-thinking:8192:reasoning-exclude
  • gpt-4o:online:reasoning-exclude
  • claude-3-5-sonnet-20241022:memory-30:online/linkup-deep:reasoning-exclude

Legacy delta field compatibility

Older clients that expect the legacy reasoning_content field can opt in per request. Set reasoning.delta_field to "reasoning_content", or use the top-level shorthands reasoning_delta_field / reasoning_content_compat if updating nested objects is difficult. When the toggle is active, every streaming and non-streaming response exposes reasoning_content instead of reasoning, and the modern key is omitted. The compatibility pass is skipped if reasoning.exclude is true, because no reasoning payload is emitted.

If you cannot change the request payload, target https://nano-gpt.com/api/v1legacy/chat/completions instead: the legacy endpoint keeps reasoning_content without extra flags, and LiteLLM's OpenAI adapter should point here to maintain compatibility. For clients that ignore reasoning-specific fields entirely, use https://nano-gpt.com/api/v1thinking/chat/completions so the full text appears in the standard content stream; this is the correct choice for JanitorAI.
{
  "model": "openai/gpt-4o-mini",
  "messages": [...],
  "reasoning": {
    "delta_field": "reasoning_content"
  }
}

Notes and limitations

  • GPU-TEE models (phala/*) require byte-for-byte SSE passthrough for signature verification. For those models, streaming cannot be filtered; the suffix has no effect on the streaming bytes.
  • When assistant content is an array (e.g., vision/text parts), only text parts are filtered; images and tool/metadata content are untouched.

YouTube Transcripts

Automatically fetch and prepend YouTube video transcripts when the latest user message contains YouTube links.

Defaults

  • Parameter: youtube_transcripts (boolean)
  • Default: true (backwards compatible)
  • Limit: Up to 3 YouTube URLs processed per request
  • Injection: Transcripts are added as a system message before your messages
  • Billing: $0.01 per transcript fetched

Disable automatic transcripts

Set youtube_transcripts to false to skip detection and fetching (no transcript cost applies).
curl -X POST https://nano-gpt.com/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Summarize this: https://youtu.be/dQw4w9WgXcQ"}
    ],
    "youtube_transcripts": false
  }'

Notes

  • Web scraping is separate. To scrape non‑YouTube URLs, set scraping: true (see the sketch after this list). YouTube transcripts do not require scraping: true.
  • When disabled, YouTube links are ignored for transcript fetching and are not billed.
  • If your balance is insufficient when enabled, the request may be blocked with a 402.
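
By way of contrast with transcripts, a hedged Python sketch of scraping a non‑YouTube URL (the target URL is a placeholder; only the scraping flag comes from the notes above):

import requests

API_KEY = "YOUR_API_KEY"

payload = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "user", "content": "Summarize this page: https://example.com/article"}
    ],
    "scraping": True,             # scrape non-YouTube URLs in the latest user message
    "youtube_transcripts": False  # independent of transcripts, per the notes above
}

response = requests.post(
    "https://nano-gpt.com/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload
)
print(response.json()["choices"][0]["message"]["content"])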

Performance Benchmarks

LinkUp achieves state-of-the-art performance on OpenAI’s SimpleQA benchmark:
Provider                  Score
LinkUp Deep Search        90.10%
Exa                       90.04%
Perplexity Sonar Pro      86%
LinkUp Standard Search    85%
Perplexity Sonar          77%
Tavily                    73%

Important Notes

  • Web search increases input token count, which affects total cost
  • Models gain access to real-time information published less than a minute ago
  • Internet connectivity can provide up to 10x improvement in factuality
  • All models support web search - simply append the suffix to any model name

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json

Parameters for chat completion

model
string
default:chatgpt-4o-latest
required

The model to use for completion. Append ':online' for web search ($0.006/request) or ':online/linkup-deep' for deep web search ($0.06/request).

Examples:

"chatgpt-4o-latest"

"chatgpt-4o-latest:online"

"chatgpt-4o-latest:online/linkup-deep"

"claude-3-5-sonnet-20241022:online"

messages
object[]
required

Array of message objects with role and content

stream
boolean
default:false

Whether to stream the response

temperature
number
default:0.7

Controls randomness (0-2)

max_tokens
integer
default:4000

Maximum number of tokens to generate

top_p
number
default:1

Nucleus sampling parameter (0-1)

frequency_penalty
number
default:0

Penalty for frequency of tokens (-2 to 2)

presence_penalty
number
default:0

Penalty for presence of tokens (-2 to 2)

cache_control
object

Cache control settings for Claude models only
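
Example (a sketch; enabled and ttl are the fields shown in the request example at the top of this page):

{
  "model": "claude-3-5-sonnet-20241022",
  "messages": [...],
  "cache_control": { "enabled": true, "ttl": "5m" }
}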

Response

Chat completion response

id
string

Unique identifier for the completion

object
string

Object type, always 'chat.completion'

created
integer

Unix timestamp of when the completion was created

choices
object[]

Array of completion choices

usage
object
Token usage statistics: prompt_tokens, completion_tokens, and total_tokens