POST /v1/chat/completions
cURL
curl --request POST \
  --url https://nano-gpt.com/api/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "chatgpt-4o-latest",
  "messages": [
    {
      "role": "user",
      "content": "Testing, please reply!"
    }
  ],
  "stream": false,
  "temperature": 0.7,
  "max_tokens": 4000,
  "top_p": 1,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "cache_control": {
    "enabled": false,
    "ttl": "5m"
  }
}'

Response

{
  "id": "<string>",
  "object": "<string>",
  "created": 123,
  "choices": [
    {
      "index": 123,
      "message": {
        "role": "assistant",
        "content": "<string>"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 123,
    "total_tokens": 123
  }
}

Overview

The Chat Completion endpoint provides OpenAI-compatible chat completions with support for web search capabilities through our LinkUp integration. All models can access real-time web information by appending special suffixes to the model name:
  • :online - Standard web search ($0.006 per request)
    • Returns 10 search results
    • Perfect for straightforward questions
  • :online/linkup-deep - Deep web search ($0.06 per request)
    • Iteratively searches for comprehensive information
    • Ideal when initial results aren’t sufficient

Examples

import requests
import json

BASE_URL = "https://nano-gpt.com/api/v1"
API_KEY = "YOUR_API_KEY"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

# Standard web search
data = {
    "model": "chatgpt-4o-latest:online",
    "messages": [
        {"role": "user", "content": "What are the latest developments in AI?"}
    ]
}

response = requests.post(
    f"{BASE_URL}/chat/completions",
    headers=headers,
    json=data
)

# Deep web search
data_deep = {
    "model": "chatgpt-4o-latest:online/linkup-deep",
    "messages": [
        {"role": "user", "content": "Provide a comprehensive analysis of recent AI breakthroughs"}
    ]
}

response_deep = requests.post(
    f"{BASE_URL}/chat/completions",
    headers=headers,
    json=data_deep
)
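
The endpoint also accepts "stream": true and returns OpenAI-compatible server-sent events. A minimal sketch of consuming the stream, reusing BASE_URL and headers from above and assuming standard OpenAI-style "data: {...}" chunks terminated by "data: [DONE]":

# Streaming: each SSE line carries a JSON chunk with a delta fragment
data_stream = {
    "model": "chatgpt-4o-latest",
    "messages": [{"role": "user", "content": "Testing, please reply!"}],
    "stream": True
}

with requests.post(
    f"{BASE_URL}/chat/completions",
    headers=headers,
    json=data_stream,
    stream=True
) as response_stream:
    for line in response_stream.iter_lines():
        if not line.startswith(b"data: "):
            continue  # skip blank keep-alive lines
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            print(delta, end="", flush=True)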

Context Memory

Enable unlimited-length conversations with lossless, hierarchical memory.
  • Append :memory to any model name
  • Or send header memory: true (see the header-based example below)
  • Can be combined with web search: :online:memory
  • Retention: default 30 days; configure via :memory-<days> (1..365) or header memory_expiration_days: <days>; header takes precedence

import requests

BASE_URL = "https://nano-gpt.com/api/v1"
API_KEY = "YOUR_API_KEY"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

# Suffix-based
payload = {
    "model": "chatgpt-4o-latest:memory",
    "messages": [{"role": "user", "content": "Keep our previous discussion in mind and continue."}]
}
requests.post(f"{BASE_URL}/chat/completions", headers=headers, json=payload)

Custom Context Size Override

When Context Memory is enabled, you can override the model-derived context size used for the memory compression step with model_context_limit.
  • Parameter: model_context_limit (number or numeric string)
  • Default: Derived from the selected model’s context size
  • Minimum: values below 10,000 are clamped up to 10,000 internally
  • Scope: Only affects memory compression; does not change the target model’s own window
Examples:
# Enable memory via header; use model default context size
curl -s -X POST \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -H "memory: true" \
  https://nano-gpt.com/api/v1/chat/completions \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role":"user","content":"Briefly say hello."}],
    "stream": false
  }'

# Explicit numeric override
curl -s -X POST \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -H "memory: true" \
  https://nano-gpt.com/api/v1/chat/completions \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role":"user","content":"Briefly say hello."}],
    "model_context_limit": 20000,
    "stream": false
  }'

# String override (server coerces to number)
curl -s -X POST \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -H "memory: true" \
  https://nano-gpt.com/api/v1/chat/completions \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role":"user","content":"Briefly say hello."}],
    "model_context_limit": "30000",
    "stream": false
  }'

Reasoning Parameter

The API supports filtering out thinking content (e.g., Claude’s <think> tags) from responses using the optional reasoning parameter:
{
  "reasoning": {
    "exclude": true
  }
}
When reasoning.exclude is set to true, the API removes content between <think> and </think> tags before returning the response. This works for both streaming and non-streaming requests.
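
The effect is equivalent to stripping the tagged span yourself. A minimal client-side sketch in Python, illustrative only and assuming well-formed <think>...</think> blocks:

import re

def strip_reasoning(text: str) -> str:
    # Remove <think>...</think> blocks and trailing whitespace, mirroring
    # what reasoning.exclude does server-side (illustrative sketch only)
    return re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL)

print(strip_reasoning("<think>The user is asking for a simple addition. 2+2 equals 4.</think>\n\nThe answer is 4."))
# The answer is 4.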

Example

curl -X POST https://nano-gpt.com/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "messages": [{"role": "user", "content": "What is 2+2?"}],
    "reasoning": {"exclude": true}
  }'
Without reasoning.exclude:

"<think>The user is asking for a simple addition. 2+2 equals 4.</think>\n\nThe answer is 4."

With reasoning.exclude:

"The answer is 4."

Model suffix: :reasoning-exclude

Enable reasoning exclusion without modifying the request body by appending :reasoning-exclude to the model string.
  • Behaves exactly like sending { "reasoning": { "exclude": true } }
  • Only :reasoning-exclude is stripped before provider routing; all other suffixes remain active
  • Works for both streaming and non-streaming responses
  • Available on both Chat Completions and Text Completions
{
  "model": "claude-3-5-sonnet-20241022:reasoning-exclude",
  "messages": [{ "role": "user", "content": "What is 2+2?" }]
}
curl -X POST https://nano-gpt.com/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-5-sonnet-20241022:reasoning-exclude",
    "messages": [{"role": "user", "content": "What is 2+2?"}],
    "stream": false
  }'

Combine with other suffixes

You can compose :reasoning-exclude with existing suffixes; only :reasoning-exclude is removed prior to provider routing:
  • :thinking (and variants like …-thinking:8192)
  • :online and :online/linkup-deep
  • :memory and :memory-<days>
Examples:
  • claude-3-7-sonnet-thinking:8192:reasoning-exclude
  • gpt-4o:online:reasoning-exclude
  • claude-3-5-sonnet-20241022:memory-30:online/linkup-deep:reasoning-exclude
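
For instance, a request using the second of these combined model strings, reusing BASE_URL and headers from the Python examples above:

# Combined suffixes: web search plus reasoning exclusion in one model string
combined = {
    "model": "gpt-4o:online:reasoning-exclude",
    "messages": [{"role": "user", "content": "What are the latest developments in AI?"}]
}
requests.post(f"{BASE_URL}/chat/completions", headers=headers, json=combined)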

Notes and limitations

  • GPU-TEE models (phala/*) require byte-for-byte SSE passthrough for signature verification. For those models, streaming cannot be filtered; the suffix has no effect on the streaming bytes.
  • When assistant content is an array (e.g., vision/text parts), only text parts are filtered; images and tool/metadata content are untouched.
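
For illustration, in an assistant message shaped like the sketch below (an assumed OpenAI-style content-parts layout, not taken verbatim from this page), only the "text" part would have its <think> span removed:

# Hypothetical assistant message with mixed content parts (assumed layout):
# reasoning.exclude filters only the text part; the image part is untouched
assistant_message = {
    "role": "assistant",
    "content": [
        {"type": "text", "text": "<think>Reasoning...</think>The answer is 4."},
        {"type": "image_url", "image_url": {"url": "https://example.com/diagram.png"}}
    ]
}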

Performance Benchmarks

LinkUp achieves state-of-the-art performance on OpenAI’s SimpleQA benchmark:
Provider                  Score
LinkUp Deep Search        90.10%
Exa                       90.04%
Perplexity Sonar Pro      86%
LinkUp Standard Search    85%
Perplexity Sonar          77%
Tavily                    73%

Important Notes

  • Web search increases input token count, which affects total cost
  • Models gain access to real-time information, including content published less than a minute ago
  • Internet connectivity can provide up to a 10x improvement in factuality
  • All models support web search: simply append the suffix to any model name

Authorizations

Authorization (string, header, required)

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json

Parameters for chat completion

The body is of type object.

Response

200 (application/json)

Chat completion response

The response is of type object.