Overview
NanoGPT supports streaming responses via Server-Sent Events (SSE). Streaming is available on these endpoints:| Endpoint | Style | Content-Type |
|---|---|---|
POST /v1/chat/completions | OpenAI Chat Completions compatible | text/event-stream |
POST /v1/messages | Anthropic Messages compatible | text/event-stream (named SSE events) |
POST /v1/responses | OpenAI Responses API compatible | text/event-stream |
data: frames separated by a blank line. Some endpoints also include an event: line to name the event type.
Enabling Streaming
Set"stream": true in the JSON request body.
Chat Completions:
stream is omitted or false, the endpoint returns a single JSON response.
Chat Completions Streaming (/v1/chat/completions)
Chat Completions streams OpenAI-style chat.completion.chunk objects.
Frame Format
Each SSE frame is a JSON object in adata: line:
delta.role: "assistant". Subsequent chunks typically include only incremental deltas like delta.content.
Finish Reasons
The final chunk has a non-nullfinish_reason:
stoplengthtool_callscontent_filter
End of Stream
After the final chunk, the stream terminates with:[DONE] as a literal string (do not JSON-parse it).
Reasoning / Thinking Deltas
Some models stream reasoning alongside content. Depending on the endpoint variant you use, the delta field may bereasoning or reasoning_content.
Example:
Tool Call Deltas
Tool calls stream viadelta.tool_calls[]. Accumulate function.arguments across frames for the same tool_calls[index].
Usage In Streaming
Usage is not included by default in streaming. To receive usage, set:usage field. (Some features, like prompt caching helpers, can cause usage to be included automatically.)
Messages Streaming (/v1/messages)
The Messages endpoint streams Anthropic-style named SSE events. Each event includes an event: line and a data: line.
Typical sequence:
message_startcontent_block_startcontent_block_delta(repeated)content_block_stopmessage_deltamessage_stop
Tool Use
Tool calls appear astool_use content blocks. Tool input streams as input_json_delta fragments inside content_block_delta events.
Thinking Blocks
Thinking can appear as a separate content block type (thinking) before normal text blocks.
Usage
Usage information is included near the end of the stream (for example onmessage_delta), and includes input_tokens and output_tokens. When prompt caching is active, cache token fields may also appear.
Responses Streaming (/v1/responses)
Responses streams a sequence of typed objects. NanoGPT emits these as SSE data: frames containing JSON with a type field (for example response.created, response.output_text.delta, etc.).
Example:
response.completedresponse.incompleteresponse.failed
Tool Calls
Tool calls appear asfunction_call output items. Arguments stream via response.function_call_arguments.delta frames and finish with response.function_call_arguments.done.
Usage
Usage appears on the terminal event inside the fullresponse object (for example response.completed.response.usage).
Error Handling Notes
- If an error happens before streaming begins, you will receive a normal JSON error response with an HTTP status code.
- If an error happens mid-stream, the stream may end early. Your parser should handle EOF without a terminal marker as an error/retry condition.
Raw SSE Parsing Tips
When parsing SSE manually:- Events are separated by a blank line (
\\n\\nor\\r\\n\\r\\n). - A single event can include multiple
data:lines; concatenate them with\\nbefore parsing as JSON. - Handle
[DONE]as a sentinel string.