Overview
Some models generate a separate reasoning stream (sometimes called thinking) in addition to the final answer content. NanoGPT exposes this in an OpenAI-compatible way for Chat Completions:
- Streaming (SSE): choices[0].delta.reasoning (or legacy choices[0].delta.reasoning_content)
- Non-streaming: choices[0].message.reasoning (or legacy choices[0].message.reasoning_content)
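The two field locations above can be read with a small helper that falls back to the legacy name. This is a minimal sketch that works on already-parsed JSON; the function names are hypothetical, and it assumes each streamed chunk has been decoded from its SSE `data:` line into a dict before being passed in.

```python
from typing import Any, Iterable


def reasoning_from_message(response: dict[str, Any]) -> tuple[str, str]:
    """Return (reasoning, content) from a non-streaming response body."""
    message = response["choices"][0]["message"]
    # Prefer the current field name, then fall back to the legacy one.
    reasoning = message.get("reasoning") or message.get("reasoning_content") or ""
    return reasoning, message.get("content") or ""


def accumulate_stream(chunks: Iterable[dict[str, Any]]) -> tuple[str, str]:
    """Join reasoning and content deltas from parsed streaming chunks."""
    reasoning_parts: list[str] = []
    content_parts: list[str] = []
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue  # skip chunks that carry no choices
        delta = choices[0].get("delta") or {}
        piece = delta.get("reasoning") or delta.get("reasoning_content")
        if piece:
            reasoning_parts.append(piece)
        if delta.get("content"):
            content_parts.append(delta["content"])
    return "".join(reasoning_parts), "".join(content_parts)
```

Keeping reasoning and content in separate buffers lets a client decide later whether to display, log, or discard the reasoning text.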
The :thinking suffix is model-specific and only works when that exact model ID (or a documented alias) exists.
The -thinking suffix is a legacy alias pattern for some model families only; it is not universal.
Do not assume -thinking works for arbitrary model IDs. Always check GET /api/v1/models for the exact valid IDs.
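One way to follow this advice is to look up the candidate ID in the model list before using it. The sketch below assumes the GET /api/v1/models response follows the OpenAI-style shape `{"data": [{"id": ...}]}`, which this page does not confirm; `resolve_thinking_model` and the model IDs in the usage note are hypothetical.

```python
from typing import Any, Optional


def resolve_thinking_model(models_response: dict[str, Any], base_id: str) -> Optional[str]:
    """Return a thinking variant of base_id only if it is actually listed."""
    # Assumed response shape: {"data": [{"id": "..."}, ...]}
    known_ids = {entry.get("id") for entry in models_response.get("data", [])}
    # Try the model-specific suffix first, then the legacy alias pattern.
    for candidate in (f"{base_id}:thinking", f"{base_id}-thinking"):
        if candidate in known_ids:
            return candidate
    return None  # no thinking variant exists for this model
```

For example, with a list containing `example-model` and `example-model:thinking`, calling `resolve_thinking_model(models, "example-model")` returns the `:thinking` ID, while an unlisted base ID yields `None` so the caller can fall back to the plain model.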
Endpoint Variants (Chat Completions)
NanoGPT provides three base paths for chat completions. All accept the same request format and model names, but differ in how reasoning content is delivered:

| Base URL | Behavior | Use When |
|---|---|---|
| /api/v1/chat/completions | Reasoning and answer are separate fields (reasoning + content). | Most OpenAI-compatible clients |
| /api/v1legacy/chat/completions | Same as /api/v1/, but uses the legacy field name reasoning_content. | Clients that only parse reasoning_content |
| /api/v1thinking/chat/completions | Reasoning and answer are merged into the normal content stream. | Clients that ignore reasoning fields but should still display thoughts |
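The choice between the three paths can be expressed as a lookup keyed on what the client is able to parse. This is only an illustrative sketch: the `ReasoningDelivery` enum and `chat_completions_path` function are hypothetical names, and only the paths themselves come from the table above.

```python
from enum import Enum


class ReasoningDelivery(Enum):
    SEPARATE = "separate"          # client reads the `reasoning` field
    LEGACY_FIELD = "legacy_field"  # client only parses `reasoning_content`
    MERGED = "merged"              # client ignores reasoning fields entirely


_PATHS = {
    ReasoningDelivery.SEPARATE: "/api/v1/chat/completions",
    ReasoningDelivery.LEGACY_FIELD: "/api/v1legacy/chat/completions",
    ReasoningDelivery.MERGED: "/api/v1thinking/chat/completions",
}


def chat_completions_path(delivery: ReasoningDelivery) -> str:
    """Return the chat completions path for the desired delivery mode."""
    return _PATHS[delivery]
```

Because all three paths accept the same request format and model names, switching delivery mode only changes the URL, not the payload.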
Controlling Reasoning Output
Hide Reasoning
To strip reasoning from the response (both streaming and non-streaming), send reasoning.exclude in the request body. reasoning.exclude controls output visibility only; it is not the same as disabling reasoning compute.
Or append the model suffix:
:reasoning-exclude
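Both options can be sketched as small payload transformations. The example assumes that reasoning.exclude takes the boolean value `true` and that the suffix is appended directly to the model ID; neither detail is spelled out on this page. The helper names and the model ID in the usage note are hypothetical.

```python
from copy import deepcopy
from typing import Any


def hide_reasoning_via_field(payload: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the payload with reasoning.exclude set."""
    out = deepcopy(payload)
    # Assumption: the flag is a boolean; other reasoning options are kept.
    out.setdefault("reasoning", {})["exclude"] = True
    return out


def hide_reasoning_via_suffix(payload: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the payload with the :reasoning-exclude model suffix."""
    out = deepcopy(payload)
    if not out["model"].endswith(":reasoning-exclude"):
        out["model"] += ":reasoning-exclude"
    return out
```

Given a base payload with `"model": "example-model"`, the first helper adds `"reasoning": {"exclude": true}` once serialized to JSON, and the second changes the model to `example-model:reasoning-exclude`. The suffix form can be handy when a client exposes only the model name and not arbitrary request fields.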
Reasoning Effort
reasoning_effort (or reasoning.effort) controls reasoning depth and also acts as an explicit reasoning-mode signal.
Any value other than none is treated as a request to enable reasoning/thinking behavior.
Use none to explicitly disable reasoning behavior.
reasoning_effort is authoritative for Chat Completions request shaping.
Valid values are none, minimal, low, medium, high, xhigh.
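A client can validate the value before sending it. The sketch below uses the list of valid values given above and the top-level reasoning_effort field; the helper names are hypothetical.

```python
from typing import Any

VALID_EFFORTS = ("none", "minimal", "low", "medium", "high", "xhigh")


def with_reasoning_effort(payload: dict[str, Any], effort: str) -> dict[str, Any]:
    """Return a copy of the payload with a validated reasoning_effort."""
    if effort not in VALID_EFFORTS:
        raise ValueError(
            f"invalid reasoning_effort {effort!r}; expected one of {VALID_EFFORTS}"
        )
    return {**payload, "reasoning_effort": effort}


def requests_reasoning(effort: str) -> bool:
    """Any value other than 'none' is treated as a request to enable reasoning."""
    return effort != "none"
```

Rejecting unknown values on the client side avoids relying on how the server handles a misspelled effort level.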
Legacy Field Name Compatibility (reasoning_content)
If a client expects reasoning_content instead of reasoning, you can:
- Use /api/v1legacy/chat/completions, or
- Set reasoning.delta_field: "reasoning_content", or
- Use the shorthands reasoning_delta_field / reasoning_content_compat.
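The second option above, setting reasoning.delta_field, might look like this as a payload transformation. The helper name is hypothetical, and the shorthand fields are left out because their expected value types are not stated here.

```python
from copy import deepcopy
from typing import Any


def with_legacy_reasoning_field(payload: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the payload that asks for the legacy field name."""
    out = deepcopy(payload)
    # Keeps any other reasoning options (such as effort) already present.
    out.setdefault("reasoning", {})["delta_field"] = "reasoning_content"
    return out
```

This keeps the standard /api/v1/chat/completions path while still serving a client that only parses reasoning_content.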
How Reasoning Appears in Responses
Streaming (SSE)
Reasoning deltas appear before or alongside content deltas.
Non-Streaming
The final message may include a separate reasoning field.
Cost Notes
Reasoning tokens are billed as output tokens. If you enable higher reasoning effort (or use thinking variants), expect higher completion_tokens and higher cost.
See Also
- Chat Completions: reasoning controls and endpoint variants (Chat Completion)
- Streaming protocol details across endpoints (Streaming Protocol)