POST /v1/chat/completions

cURL
curl --request POST \
  --url https://nano-gpt.com/api/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "chatgpt-4o-latest",
  "messages": [
    {
      "role": "user",
      "content": "Testing, please reply!"
    }
  ],
  "stream": false,
  "temperature": 0.7,
  "max_tokens": 4000,
  "top_p": 1,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "cache_control": {
    "enabled": false,
    "ttl": "5m"
  }
}'
Example response

{
  "id": "<string>",
  "object": "<string>",
  "created": 123,
  "choices": [
    {
      "index": 123,
      "message": {
        "role": "assistant",
        "content": "<string>"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 123,
    "total_tokens": 123
  }
}

Overview

The Chat Completion endpoint provides OpenAI-compatible chat completions, with web search available through our LinkUp integration. All models can access real-time web information by appending special suffixes to the model name:
  • :online - Standard web search ($0.006 per request)
    • Returns 10 search results
    • Perfect for straightforward questions
  • :online/linkup-deep - Deep web search ($0.06 per request)
    • Iteratively searches for comprehensive information
    • Ideal when initial results aren’t sufficient

Examples

import requests

BASE_URL = "https://nano-gpt.com/api/v1"
API_KEY = "YOUR_API_KEY"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

# Standard web search
data = {
    "model": "chatgpt-4o-latest:online",
    "messages": [
        {"role": "user", "content": "What are the latest developments in AI?"}
    ]
}

response = requests.post(
    f"{BASE_URL}/chat/completions",
    headers=headers,
    json=data
)

# Deep web search
data_deep = {
    "model": "chatgpt-4o-latest:online/linkup-deep",
    "messages": [
        {"role": "user", "content": "Provide a comprehensive analysis of recent AI breakthroughs"}
    ]
}

response_deep = requests.post(
    f"{BASE_URL}/chat/completions",
    headers=headers,
    json=data_deep
)

Context Memory

Enable unlimited-length conversations with lossless, hierarchical memory.
  • Append :memory to any model name
  • Or send the header memory: true (a header-based sketch follows the example below)
  • Combine with web search: :online:memory
  • Retention: defaults to 30 days; configure via :memory-<days> (1-365) or the header memory_expiration_days: <days> (the header takes precedence)
import requests

BASE_URL = "https://nano-gpt.com/api/v1"
API_KEY = "YOUR_API_KEY"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

# Suffix-based
payload = {
    "model": "chatgpt-4o-latest:memory",
    "messages": [{"role": "user", "content": "Keep our previous discussion in mind and continue."}]
}
requests.post(f"{BASE_URL}/chat/completions", headers=headers, json=payload)
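
Header-based variant (a minimal sketch; the memory and memory_expiration_days headers follow the retention notes above):
import requests

BASE_URL = "https://nano-gpt.com/api/v1"
API_KEY = "YOUR_API_KEY"

# Header-based: enable Context Memory and set a 30-day retention window.
# The header takes precedence over any :memory-<days> suffix.
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
    "memory": "true",
    "memory_expiration_days": "30"
}

payload = {
    "model": "chatgpt-4o-latest",  # no :memory suffix needed when the header is set
    "messages": [{"role": "user", "content": "Keep our previous discussion in mind and continue."}]
}
requests.post(f"{BASE_URL}/chat/completions", headers=headers, json=payload)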

Custom Context Size Override

When Context Memory is enabled, you can override the model-derived context size used for the memory compression step with model_context_limit.
  • Parameter: model_context_limit (number or numeric string)
  • Default: Derived from the selected model’s context size
  • Minimum: Values below 10,000 are clamped internally
  • Scope: Only affects memory compression; does not change the target model’s own window
Examples:
# Enable memory via header; use model default context size
curl -s -X POST \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -H "memory: true" \
  https://nano-gpt.com/api/v1/chat/completions \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role":"user","content":"Briefly say hello."}],
    "stream": false
  }'

# Explicit numeric override
curl -s -X POST \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -H "memory: true" \
  https://nano-gpt.com/api/v1/chat/completions \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role":"user","content":"Briefly say hello."}],
    "model_context_limit": 20000,
    "stream": false
  }'

# String override (server coerces to number)
curl -s -X POST \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -H "memory: true" \
  https://nano-gpt.com/api/v1/chat/completions \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role":"user","content":"Briefly say hello."}],
    "model_context_limit": "30000",
    "stream": false
  }'
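
The same override in Python (a minimal sketch mirroring the explicit numeric cURL example above):
import requests

headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
    "memory": "true"  # Context Memory must be enabled for the override to apply
}

payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Briefly say hello."}],
    "model_context_limit": 20000,  # values below 10,000 are clamped internally
    "stream": False
}
requests.post("https://nano-gpt.com/api/v1/chat/completions", headers=headers, json=payload)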

Reasoning Parameter

The API supports filtering out thinking content (e.g., Claude’s <think> tags) from responses using the optional reasoning parameter:
{
  "reasoning": {
    "exclude": true
  }
}
When reasoning.exclude is set to true, the API removes content between <think> and </think> tags before returning the response. This works for both streaming and non-streaming requests.
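
For illustration, the effect is equivalent to stripping the tagged span yourself; a minimal client-side sketch (not the server's actual implementation):
import re

def strip_think(text: str) -> str:
    # Remove everything between <think> and </think>, including the tags,
    # mirroring what reasoning.exclude does server-side.
    return re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL)

print(strip_think("<think>2+2 equals 4.</think>\n\nThe answer is 4."))
# -> The answer is 4.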

Example

curl -X POST https://nano-gpt.com/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "messages": [{"role": "user", "content": "What is 2+2?"}],
    "reasoning": {"exclude": true}
  }'
Without reasoning.exclude:
"<think>The user is asking for a simple addition. 2+2 equals 4.</think>\n\nThe answer is 4."
With reasoning.exclude:
"The answer is 4."

Model suffix: :reasoning-exclude

Enable reasoning exclusion without modifying the request body by appending :reasoning-exclude to the model string.
  • Behaves exactly like sending { "reasoning": { "exclude": true } }
  • Only :reasoning-exclude is stripped before provider routing; all other suffixes remain active
  • Works for both streaming and non-streaming responses
  • Available on both Chat Completions and Text Completions
{
  "model": "claude-3-5-sonnet-20241022:reasoning-exclude",
  "messages": [{ "role": "user", "content": "What is 2+2?" }]
}
curl -X POST https://nano-gpt.com/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-5-sonnet-20241022:reasoning-exclude",
    "messages": [{"role": "user", "content": "What is 2+2?"}],
    "stream": false
  }'

Combine with other suffixes

You can compose :reasoning-exclude with existing suffixes; only :reasoning-exclude is removed prior to provider routing (a combined request is sketched after the examples):
  • :thinking (and variants like …-thinking:8192)
  • :online and :online/linkup-deep
  • :memory and :memory-<days>
Examples:
  • claude-3-7-sonnet-thinking:8192:reasoning-exclude
  • gpt-4o:online:reasoning-exclude
  • claude-3-5-sonnet-20241022:memory-30:online/linkup-deep:reasoning-exclude
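
For instance, a combined request (a minimal sketch using the last model string above):
import requests

headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

# Suffixes compose; only :reasoning-exclude is stripped before provider routing
payload = {
    "model": "claude-3-5-sonnet-20241022:memory-30:online/linkup-deep:reasoning-exclude",
    "messages": [{"role": "user", "content": "Summarize today's AI news."}],
    "stream": False
}
requests.post("https://nano-gpt.com/api/v1/chat/completions", headers=headers, json=payload)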

Notes and limitations

  • GPU-TEE models (phala/*) require byte-for-byte SSE passthrough for signature verification. For those models, streaming cannot be filtered; the suffix has no effect on the streaming bytes.
  • When assistant content is an array (e.g., vision/text parts), only text parts are filtered; images and tool/metadata content are untouched.

YouTube Transcripts

Automatically fetch and prepend YouTube video transcripts when the latest user message contains YouTube links.

Defaults

  • Parameter: youtube_transcripts (boolean)
  • Default: true (backwards compatible)
  • Limit: Up to 3 YouTube URLs processed per request
  • Injection: Transcripts are added as a system message before your messages
  • Billing: $0.01 per transcript fetched

Disable automatic transcripts

Set youtube_transcripts to false to skip detection and fetching (no transcript cost applies).
curl -X POST https://nano-gpt.com/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Summarize this: https://youtu.be/dQw4w9WgXcQ"}
    ],
    "youtube_transcripts": false
  }'

Notes

  • Web scraping is separate. To scrape non-YouTube URLs, set scraping: true (see the sketch after these notes). YouTube transcripts do not require scraping: true.
  • When disabled, YouTube links are ignored for transcript fetching and are not billed.
  • If your balance is insufficient when enabled, the request may be blocked with a 402.
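
For example, scraping a regular article while skipping YouTube transcript fetching (a sketch; the URL is illustrative):
import requests

headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

# scraping: true enables fetching of non-YouTube URLs in the message;
# YouTube transcript fetching is controlled separately by youtube_transcripts.
payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Summarize this article: https://example.com/post"}],
    "scraping": True,
    "youtube_transcripts": False
}
requests.post("https://nano-gpt.com/api/v1/chat/completions", headers=headers, json=payload)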

Performance Benchmarks

LinkUp achieves state-of-the-art performance on OpenAI’s SimpleQA benchmark:
Provider                  Score
LinkUp Deep Search        90.10%
Exa                       90.04%
Perplexity Sonar Pro      86%
LinkUp Standard Search    85%
Perplexity Sonar          77%
Tavily                    73%

Important Notes

  • Web search increases the input token count, which increases total cost
  • Models gain access to real-time information, including content published within the last minute
  • Internet connectivity can improve factuality by up to 10x
  • All models support web search: simply append the suffix to any model name

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json

Parameters for chat completion

model
string
default:chatgpt-4o-latest
required

The model to use for completion. Append ':online' for web search ($0.006/request) or ':online/linkup-deep' for deep web search ($0.06/request)

Examples:

"chatgpt-4o-latest"

"chatgpt-4o-latest:online"

"chatgpt-4o-latest:online/linkup-deep"

"claude-3-5-sonnet-20241022:online"

messages
object[]
required

Array of message objects with role and content

stream
boolean
default:false

Whether to stream the response

temperature
number
default:0.7

Controls randomness (0-2)

max_tokens
integer
default:4000

Maximum number of tokens to generate

top_p
number
default:1

Nucleus sampling parameter (0-1)

frequency_penalty
number
default:0

Penalty for frequency of tokens (-2 to 2)

presence_penalty
number
default:0

Penalty for presence of tokens (-2 to 2)

cache_control
object

Cache control settings for Claude models only
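
For example, enabling caching on a Claude model (a minimal sketch; the fields mirror the request example at the top of this page):
import requests

# cache_control applies to Claude models only; "5m" is the ttl shown in the default example
payload = {
    "model": "claude-3-5-sonnet-20241022",
    "messages": [{"role": "user", "content": "Testing, please reply!"}],
    "cache_control": {"enabled": True, "ttl": "5m"}
}
requests.post(
    "https://nano-gpt.com/api/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json"},
    json=payload
)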

Response

Chat completion response

id
string

Unique identifier for the completion

object
string

Object type, always 'chat.completion'

created
integer

Unix timestamp of when the completion was created

choices
object[]

Array of completion choices; each contains an index, a message object (role and content), and a finish_reason

usage
object

Token usage for the request: prompt_tokens, completion_tokens, and total_tokens
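
Putting it together, a minimal sketch that reads the fields above from a non-streaming response:
import requests

response = requests.post(
    "https://nano-gpt.com/api/v1/chat/completions",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    json={
        "model": "chatgpt-4o-latest",
        "messages": [{"role": "user", "content": "Testing, please reply!"}],
        "stream": False
    }
)
body = response.json()

# The assistant's reply and token accounting, per the response schema above
print(body["choices"][0]["message"]["content"])
print(body["usage"]["total_tokens"])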