Responses
Create a response with the OpenAI-compatible Responses API
Documentation Index
Fetch the complete documentation index at: https://docs.nano-gpt.com/llms.txt
Use this file to discover all available pages before exploring further.
Overview
The/v1/responses API is an OpenAI Responses API-compatible endpoint for creating AI model responses. It supports:
- Stateless and stateful (conversation threading) chat completions
- Streaming responses via Server-Sent Events (SSE)
- Background (async) processing for long-running requests
- Response storage and retrieval
- Function/tool calling support
- Multimodal inputs (images, files) for supported models
X-X402: true header. See X-402 Micropayments for details.X-Provider explicitly selects a provider for the request and is always billed pay-as-you-go at the selected provider’s price, including provider-selection markup. For provider-selection-capable models, model may include routing preference suffixes such as :fast (alias for :speed) and :cheap (alias for :price). These are billed like explicit provider selection and follow the same conflict rules. For subscription users, sending X-Provider bypasses subscription coverage for that request; X-Billing-Mode: paygo is only needed when forcing pay-as-you-go without an explicit provider or when saved provider preferences should apply to subscription-included traffic. See Provider Selection, Model Suffixes, and Pay-As-You-Go Billing Override.Authentication
All requests require authentication via API key:x-team-id to choose team context when team defaults are evaluated (for example, retention defaults).
Endpoints
POST /v1/responses- Create a new response from the modelGET /v1/responses- Returns endpoint informationGET /v1/responses/{id}- Retrieve a stored response by IDDELETE /v1/responses/{id}- Delete a stored response (soft delete)
BYOK Encryption (Stored Responses)
If you setstore: true, you can optionally encrypt the stored response at rest using your own key or passphrase.
To encrypt a stored response, include one of these headers on POST /v1/responses:
x-encryption-key: YOUR_ENCRYPTION_KEYx-encryption-passphrase: YOUR_PASSPHRASE
Create Response
Request
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
model | string | Yes | Model ID to use for the response. Provider-selection-capable models may include routing preference suffixes such as :fast, :speed, :cheap, :price, :latency, :throughput, :floor, or :tools. |
input | string or array | Yes | The input prompt or array of input items |
instructions | string | No | System instructions for the model |
max_output_tokens | integer | No | Maximum tokens in the response (minimum: 16) |
max_tool_calls | integer | No | Maximum number of tool calls allowed |
temperature | number | No | Sampling temperature (0-2). If omitted, NanoGPT does not force a value and the routed provider/model default applies (OpenAI defaults to 1.0). Not supported by reasoning-capable models |
top_p | number | No | Nucleus sampling parameter. Not supported by reasoning-capable models |
presence_penalty | number | No | Presence penalty for sampling (-2.0 to 2.0) |
frequency_penalty | number | No | Frequency penalty for sampling (-2.0 to 2.0) |
top_logprobs | integer | No | Number of top logprobs to return (0-20) |
tools | array | No | Array of tools available to the model |
tool_choice | string or object | No | Tool use: auto, none, required, { type: "function", name: "..." }, or { type: "allowed_tools", ... } |
parallel_tool_calls | boolean | No | Allow multiple tool calls in parallel |
stream | boolean | No | Enable streaming responses (default: false) |
stream_options | object | No | Streaming options: { include_obfuscation?: boolean } |
store | boolean | No | Store response for later retrieval (default: false) |
retention_days | integer or null | No | Per-request retention override in days (0..365). null means no request-level override |
retentionDays | integer or null | No | Alias for retention_days. If both are sent, values must match |
previous_response_id | string | No | Link to previous response for conversation threading |
reasoning | object | No | Reasoning configuration. Setting reasoning.effort to any non-none value explicitly requests reasoning mode. |
text | object | No | Text output configuration (format + verbosity) |
metadata | object | No | Custom metadata (max 16 keys, 64 char keys, 512 char values) |
truncation | string | No | Truncation strategy: auto or disabled |
user | string | No | Unique user identifier |
seed | integer | No | Random seed for reproducibility |
conversation | object | No | Conversation context: { id?: string, messages?: InputItem[] } |
include | string[] | No | Additional fields to include in response |
safety_identifier | string | No | Safety tracking identifier |
prompt_cache_key | string | No | Key for prompt caching |
background | boolean | No | Enable background/async processing |
service_tier | string | No | Service tier: "auto", "default", "flex", or "priority". See Service tiers (flex and priority) near the end. |
Retention Resolution
Effective retention for/v1/responses resolves in this order:
- Request override (
retention_days/retentionDays) - Team setting (
responses_retention_days) - User setting (
responsesRetentionDays) - Platform default (
7days)
retention_daysandretentionDaysaccept integer values0..365, ornull.nullmeans “no request override” and falls back to team/user/platform defaults.- If both request fields are provided, they must match.
- Invalid retention values return
400withinvalid_request_error. 0enables zero-retention behavior for that request.- Existing clients that omit retention fields keep default behavior (team/user/platform retention resolution).
0:
previous_response_idis rejected.backgroundis rejected.
- If
x-team-idis present and the caller is a member, that team is used. - Otherwise, the API uses the caller session’s default team (
default_team_uuid/default_team_id) when membership is valid.
Input Types
Theinput parameter accepts either a simple string or an array of input items.
Simple String Input
Array Input
Input Item Types
| Type | Description |
|---|---|
message | A message with role and content |
function_call | A tool/function call made by the model |
function_call_output | The result of a tool/function call |
Message Item
user, assistant, system, developer
Content can be a string or an array of content parts:
Content Part Types
| Type | Description |
|---|---|
input_text | Text input |
input_image | Image input (via URL or file_id) |
input_file | File input |
output_text | Text output (includes annotations/logprobs) |
refusal | Model refusal |
Image Input
detail parameter can be: auto, low, or high.
Function Call Item
Function Call Output Item
Tools
Provide function tools and built-in tools the model can use:Function Tool
Define functions that the model can call:Web Search Tool
File Search Tool
Code Interpreter Tool
MCP Tool
Image Generation Tool
Tool Choice
Useallowed_tools to restrict which tools the model may choose from:
Function Tool Normalization
Function tools in responses always include nullable fields:Reasoning Configuration
Usereasoning to control depth and visibility of reasoning output:
| Parameter | Values | Description |
|---|---|---|
effort | none, minimal, low, medium, high, xhigh | Reasoning depth. Any value other than none explicitly requests reasoning mode. |
summary | none, auto, detailed, concise | Reasoning summary format |
exclude | true, false | Controls output visibility (hides reasoning fields/blocks). It does not inherently disable reasoning compute. |
Text/Format Configuration
Control response format and verbosity:Text Parameter Structure
Format Types
{ "type": "text" }- Plain text (default){ "type": "json_object" }- JSON object output{ "type": "json_schema", "json_schema": { ... } }- Structured JSON with schema
Verbosity Values
low- Short, compact responsesmedium- Balanced detailhigh- Most detailed output
JSON Schema Format
Response Format
Successful Response
Response Fields
All fields below are always present; nullable values indicate an option was not set.| Field | Type | Description |
|---|---|---|
id | string | Unique response identifier (format: resp_*) |
object | string | Always "response" |
created_at | integer | Unix timestamp of creation |
completed_at | integer or null | Unix timestamp when response completed |
model | string | Model used for the response |
status | string | Response status |
instructions | string or null | System instructions used |
previous_response_id | string or null | ID of previous response in conversation |
tools | array | Tools available (normalized with nullable fields) |
tool_choice | string or object | Tool choice setting used |
parallel_tool_calls | boolean | Whether parallel tool calls were enabled |
truncation | string | Truncation strategy: auto or disabled |
text | object | Resolved text configuration |
reasoning | object or null | Reasoning configuration |
temperature | number | Temperature used |
top_p | number | Top-p value used |
presence_penalty | number | Presence penalty used |
frequency_penalty | number | Frequency penalty used |
top_logprobs | number | Top logprobs setting |
max_output_tokens | integer or null | Max output tokens setting |
max_tool_calls | integer or null | Max tool calls setting |
user | string or null | User identifier |
store | boolean | Whether response was stored |
background | boolean | Whether processed in background |
safety_identifier | string or null | Safety identifier |
prompt_cache_key | string or null | Prompt cache key |
output | array | Array of output items |
output_text | string | Convenience field with concatenated text output |
usage | object | Token usage statistics |
error | object | Error details (if status is failed) |
incomplete_details | object | Details if status is incomplete |
metadata | object | Custom metadata (if provided) |
service_tier | string | Service tier used (echoed when provided) |
Usage Object
Theusage object always includes token details:
Response Status Values
| Status | Description |
|---|---|
queued | Background request is queued |
in_progress | Request is being processed |
completed | Request completed successfully |
incomplete | Response was truncated |
failed | Request failed with error |
cancelled | Request was cancelled |
reasoning Response Field
text Response Field (Resolved)
Output Item Types
All output items include astatus field.
Message Output
Function Call Output
Reasoning Output (reasoning-capable models)
Web Search Call Output
Image Generation Call Output
Computer Call Output
Output Item Status Values
| Status | Description |
|---|---|
completed | Item finished successfully |
in_progress | Item still being generated |
incomplete | Item was truncated/interrupted |
Output Text Parts
Output text parts include annotations and logprobs:Annotation Types
URL Citation
File Citation
File Path
Streaming
See also: Streaming Protocol (SSE). Enable streaming to receive incremental response updates:Streaming Response
The response is delivered as Server-Sent Events (SSE):Streaming Event Types
| Event | Description |
|---|---|
response.created | Response object created |
response.in_progress | Processing started |
response.output_item.added | New output item started |
response.output_item.done | Output item completed |
response.content_part.added | Content part started |
response.content_part.done | Content part completed |
response.output_text.delta | Incremental text chunk |
response.output_text.done | Text content completed |
response.reasoning.delta | Incremental reasoning text |
response.reasoning.done | Reasoning content completed |
response.function_call_arguments.delta | Incremental function arguments |
response.function_call_arguments.done | Function call completed |
response.completed | Response completed successfully |
response.incomplete | Response truncated |
response.failed | Response failed |
Updated Event Fields
- All content/output events include
item_idfor the parent output item. - Text delta/done events include
logprobs.
response.output_text.delta:
Conversation Threading
Chain responses together for multi-turn conversations. You can useprevious_response_id or the conversation object (id or messages) to manage context.
First Request
id: "resp_abc123"
Follow-up Request
previous_response_id requires authentication, store: true on previous responses, and effective retention greater than 0.
Background Mode
For long-running requests, use background mode to receive an immediate response and poll for results.Initiate Background Request
Immediate Response (202 Accepted)
Poll for Completion
status is completed, failed, or incomplete.
Constraints:
- Cannot be combined with
stream: true - Requires authentication
- Effective retention must be greater than
0 - Maximum processing time: approximately 800 seconds
Retrieve Response
Response
Returns the full response object (same format as POST response).Errors
404- Response not found or belongs to different account401- Authentication required/invalid
Delete Response
Response
Error Handling
Error Response Format
HTTP Status Codes
| HTTP Status | Description |
|---|---|
400 | Invalid request parameters |
401 | Missing or invalid API key |
403 | Insufficient permissions |
404 | Resource not found |
429 | Rate limit exceeded |
500 | Internal server error |
503 | Service unavailable |
Common Error Codes
| Code | Description |
|---|---|
missing_required_parameter | Required parameter not provided |
model_not_found | Specified model does not exist |
response_not_found | Response ID not found |
invalid_response_id | Invalid response ID format |
invalid_request_error | Invalid request shape/value (for example retention out of range or mismatched alias fields) |
authentication_required | No API key provided |
invalid_api_key | API key is invalid or inactive |
Complete Examples
Simple Text Completion
Multi-turn Conversation
Streaming Response
Per-request Retention Override
Function Calling
Submitting Tool Results
Image Input (Vision)
JSON Output
Background Processing
Limitations
- Deep research models: Deep research variants are not supported.
- GPU-TEE streaming: Streaming is not supported for GPU-TEE models. Use
/v1/chat/completionsfor these models. - Background mode: Maximum duration is approximately 800 seconds.
- Metadata limits: Maximum 16 keys, 64 character key names, 512 character values.
Service tiers (flex and priority)
Setservice_tier to request a non-default capacity tier on providers that support service tiers:
autoor omitted: use NanoGPT’s normal routing and the provider default.default: request the provider’s standard tier where the provider accepts an explicit default value.flex: request lower-cost, variable-capacity processing where supported.priority: request higher-cost priority processing where supported.
- Service tier availability is model- and provider-specific. Model pages show which tiers are supported.
- Flex and priority tiers are only applied when the routed provider supports them.
- Header provider overrides (like
X-Provider) and explicit provider selection are honored for pricing and x402 estimates. - Provider-native web search can force routing; tier pricing follows that routing.
- If you explicitly force a provider that does not support service tiers, the requested tier may be ignored by the upstream provider, or routing and pricing may differ from the default route.
- Flex tier billing uses flex pricing where applicable.
- Priority tier billing uses priority pricing where applicable.
- High-context pricing may also apply for models and providers with separate high-context SKUs, such as
es2kpricing for GPT-5.5/GPT-5.4 where available.
Example: flex tier
Example: priority tier
Response Headers
All responses include:| Header | Description |
|---|---|
X-Request-ID | Unique request/response identifier |
Content-Type | application/json or text/event-stream |
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Headers
Optional explicit provider override for supported open-source models (case-insensitive). Explicit provider selection is billed pay-as-you-go at the selected provider's price, including provider-selection markup; for subscription users it bypasses subscription coverage for that request.
Optional billing override to force pay-as-you-go without an explicit provider, or to apply saved provider preferences to subscription-included traffic (e.g., paygo). Header name is case-insensitive.
Optional team context override for API-key requests. If provided, it must reference a team the caller belongs to.
Body
Parameters for the response request
Model ID to use for the response. Provider-selection-capable models may include routing preference suffixes such as ':fast', ':speed', ':cheap', ':price', ':latency', ':throughput', ':floor', or ':tools'.
Prompt string or array of input items
Billing override to force pay-as-you-go without an explicit provider, or to apply saved provider preferences to subscription-included traffic. Accepted values (case-insensitive): paygo, pay-as-you-go, pay_as_you_go, paid, payg.
Alias for billing_mode.
System instructions for the model
Maximum tokens in the response
x >= 16Sampling temperature (not supported by reasoning models)
0 <= x <= 2Nucleus sampling parameter
0 <= x <= 1Function tools available to the model
How the model should use tools
Allow multiple tool calls in parallel
Enable streaming responses
Store response for later retrieval
Per-request retention override in days. Use null to disable request-level override.
0 <= x <= 365Alias for retention_days. If both are provided, values must match.
0 <= x <= 365Link to previous response for conversation threading
Reasoning configuration. Setting reasoning.effort to any non-none value explicitly requests reasoning mode.
Text/format configuration
Custom metadata
Truncation strategy
auto, disabled Unique user identifier
Random seed for reproducibility
Enable background/async processing
Optional service tier: "auto", "default", "flex", or "priority". Use "flex" for lower-cost variable-capacity processing or "priority" for higher-cost priority processing where supported by the routed model/provider.
auto, default, flex, priority Response
Response created
Response object returned by the Responses API