Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.nano-gpt.com/llms.txt

Use this file to discover all available pages before exploring further.

Model Suffixes

Append supported suffixes to the model value to request optional behavior for a single request. Most suffixes are stripped before the base model is routed. Model identity suffixes, such as supported :thinking variants, are preserved because they identify a distinct model or alias. Suffix parsing is case-insensitive for provider routing and provider suffixes. Avoid combining multiple suffixes that make conflicting provider-selection requests.

Provider Routing Preference Suffixes

These apply only to provider-selection-capable models.
SuffixMeaning
:speedPick the provider with the best estimated completion time, using TTFT plus TPS.
:fastAlias for :speed.
:throughputPick the provider with the highest tokens per second.
:latencyPick the provider with the lowest time to first token.
:pricePick the cheapest provider by base input-plus-output token price.
:cheapAlias for :price.
:floorAlias for :price.
:toolsRoute to a tools-capable provider path for models that support tools provider selection.
{ "model": "zai-org/glm-5:fast", "messages": [{ "role": "user", "content": "Hello" }] }
{ "model": "zai-org/glm-5:cheap", "messages": [{ "role": "user", "content": "Hello" }] }
{ "model": "moonshotai/kimi-k2.6:tools", "messages": [{ "role": "user", "content": "Hello" }], "tools": [{ "type": "function", "function": { "name": "lookup", "parameters": { "type": "object", "properties": {} } } }] }
Rules:
  • Routing preference suffixes only consider user-selectable providers. Internal routing-only providers are excluded.
  • Routing preference suffixes are stripped before model mapping/routing. Non-routing identity suffixes such as :thinking are preserved.
  • Do not combine :speed, :fast, :throughput, :latency, :price, :cheap, or :floor with X-Provider, body provider, or a provider model suffix.
  • Do not combine :tools with a routing preference suffix, X-Provider, body provider, a provider model suffix, or caching=true.
  • Routing preference requests are billed like explicit provider-selection requests.
  • Conflict error codes use the existing speed_suffix_* family for API compatibility, even when the suffix is not literally :speed.

Provider Suffixes

The API accepts a trailing provider suffix for public user-selectable provider IDs:
{ "model": "zai-org/glm-5:cerebras", "messages": [{ "role": "user", "content": "Hello" }] }
Recommended: use X-Provider for explicit provider overrides. The API also accepts a trailing provider suffix for user-selectable providers, such as model-id:cerebras, but X-Provider is clearer and easier to validate against provider-discovery responses. Provider suffixes must match a public user-selectable provider ID. They are case-insensitive, cannot be combined with routing preference suffixes or :tools, and are billed like explicit provider selection.

Web Search Suffixes

SuffixMeaning
:onlineDefault web search, standard depth. For GPT-5+ / o-series models this may use OpenAI native web search unless a provider is explicit.
:online/linkupLinkup standard.
:online/linkup-deepLinkup deep.
:online/tavilyTavily standard.
:online/tavily-deepTavily deep.
:online/braveBrave standard.
:online/brave-deepBrave deep.
:online/exa-fastExa fast.
:online/exa-autoExa auto.
:online/exa-neuralExa neural.
:online/exa-deepExa deep.
:online/exa-deep-reasoningExa deep-reasoning.
:online/exa-instantExa instant.
:online/kagiKagi standard search.
:online/kagi-webKagi standard web.
:online/kagi-newsKagi standard news.
:online/kagi-searchKagi deep search.
:online/perplexityPerplexity standard.
:online/perplexity-deepPerplexity deep.
:online/valyuValyu standard, all sources.
:online/valyu-deepValyu deep, all sources.
:online/valyu-webValyu standard, web only.
:online/valyu-web-deepValyu deep, web only.
Web search suffixes can compose with memory and reasoning-exclude suffixes. If request body webSearch.enabled or legacy linkup.enabled is true, body configuration takes precedence over model suffix configuration.

Memory Suffixes

SuffixMeaning
:memoryEnable Context Memory with default 30-day retention.
:memory-<days>Enable Context Memory with retention clamped to 1..365 days.
Header memory_expiration_days takes precedence over :memory-<days>. memory: false in the request body explicitly disables memory even if the model has a memory suffix. Memory can compose with web search, for example openai/gpt-5.2:online:memory-90.

Reasoning and Thinking Suffixes

SuffixMeaning
:thinkingModel-specific thinking/reasoning variant when that exact ID or documented alias exists.
-thinkingLegacy alias pattern for some model families only, not universal.
:<number> on Anthropic thinking IDsThinking budget suffix for mapped Anthropic thinking aliases, for example claude-sonnet-4-thinking:8192.
:reasoning-excludeEquivalent to reasoning: { "exclude": true }; hides reasoning fields/blocks and strips the suffix before routing.
:reasoning-exclude works on Chat Completions and Text Completions. It composes with other suffixes such as :thinking, :online, and :memory. :thinking is model identity, not provider routing, and is not universal.

Official and Original Route Suffixes

Some model families expose :official / :original aliases to force the official-provider route. These are model-specific; check the model page or provider-selection docs for supported IDs.

Conflict Rules

Do not combine conflicting routing directives:
{ "model": "zai-org/glm-5:fast:cerebras" }
{ "model": "zai-org/glm-5:cheap", "provider": "cerebras" }
{ "model": "zai-org/glm-5:tools:fast" }

Examples

{ "model": "zai-org/glm-5:fast", "messages": [{ "role": "user", "content": "Hello" }] }
{ "model": "zai-org/glm-5:cheap", "messages": [{ "role": "user", "content": "Hello" }] }
{ "model": "zai-org/glm-5:thinking:fast", "messages": [{ "role": "user", "content": "Hello" }] }
{ "model": "openai/gpt-5.2:online/exa-instant:memory-30", "messages": [{ "role": "user", "content": "Hello" }] }
{ "model": "anthropic/claude-opus-4.6:memory-30:online/linkup-deep:reasoning-exclude", "messages": [{ "role": "user", "content": "Hello" }] }