Create a response with the OpenAI-compatible Responses API
The /v1/responses API is an OpenAI Responses API-compatible endpoint for creating AI model responses. It supports:
- The X-Provider header or saved preferences to choose a provider. If you are on a subscription and want provider selection for a subscription-included model, force paid routing with the pay-as-you-go billing override (billing_mode: "paygo" or X-Billing-Mode: paygo). See Provider Selection and Pay-As-You-Go Billing Override.
- x-team-id to choose team context when team defaults are evaluated (for example, retention defaults).
Endpoints:

- POST /v1/responses - Create a new response from the model
- GET /v1/responses - Returns endpoint information
- GET /v1/responses/{id} - Retrieve a stored response by ID
- DELETE /v1/responses/{id} - Delete a stored response (soft delete)

When store: true, you can optionally encrypt the stored response at rest using your own key or passphrase.
To encrypt a stored response, include one of these headers on POST /v1/responses:
- x-encryption-key: YOUR_ENCRYPTION_KEY
- x-encryption-passphrase: YOUR_PASSPHRASE

Request parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
model | string | Yes | The model to use (e.g., openai/gpt-5.2, anthropic/claude-opus-4.5) |
input | string or array | Yes | The input prompt or array of input items |
instructions | string | No | System instructions for the model |
max_output_tokens | integer | No | Maximum tokens in the response (minimum: 16) |
max_tool_calls | integer | No | Maximum number of tool calls allowed |
temperature | number | No | Sampling temperature (0-2). If omitted, NanoGPT does not force a value and the routed provider/model default applies (OpenAI defaults to 1.0). Not supported by reasoning-capable models |
top_p | number | No | Nucleus sampling parameter. Not supported by reasoning-capable models |
presence_penalty | number | No | Presence penalty for sampling (-2.0 to 2.0) |
frequency_penalty | number | No | Frequency penalty for sampling (-2.0 to 2.0) |
top_logprobs | integer | No | Number of top logprobs to return (0-20) |
tools | array | No | Array of tools available to the model |
tool_choice | string or object | No | Tool use: auto, none, required, { type: "function", name: "..." }, or { type: "allowed_tools", ... } |
parallel_tool_calls | boolean | No | Allow multiple tool calls in parallel |
stream | boolean | No | Enable streaming responses (default: false) |
stream_options | object | No | Streaming options: { include_obfuscation?: boolean } |
store | boolean | No | Store response for later retrieval (default: false) |
retention_days | integer or null | No | Per-request retention override in days (0..365). null means no request-level override |
retentionDays | integer or null | No | Alias for retention_days. If both are sent, values must match |
previous_response_id | string | No | Link to previous response for conversation threading |
reasoning | object | No | Reasoning configuration for reasoning-capable models |
text | object | No | Text output configuration (format + verbosity) |
metadata | object | No | Custom metadata (max 16 keys, 64 char keys, 512 char values) |
truncation | string | No | Truncation strategy: auto or disabled |
user | string | No | Unique user identifier |
seed | integer | No | Random seed for reproducibility |
conversation | object | No | Conversation context: { id?: string, messages?: InputItem[] } |
include | string[] | No | Additional fields to include in response |
safety_identifier | string | No | Safety tracking identifier |
prompt_cache_key | string | No | Key for prompt caching |
background | boolean | No | Enable background/async processing |
service_tier | string | No | Service tier. Use "priority" where supported. See Service tiers (priority) near the end. |
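As a sketch, a minimal request body can be assembled like this (the model name, prompt, and retention value are illustrative placeholders, not defaults). Send it as JSON to POST /v1/responses with an Authorization: Bearer token.

```python
import json

# Sketch of a minimal POST /v1/responses body.
# Model name, prompt, and retention value are placeholders.
payload = {
    "model": "openai/gpt-5.2",                # required
    "input": "Write a haiku about the sea.",  # required: string or array
    "max_output_tokens": 256,                 # optional, minimum 16
    "store": True,                            # keep the response retrievable
    "retention_days": 30,                     # per-request retention override
}

body = json.dumps(payload)
print(body)
```

The body is plain JSON; pair it with Content-Type: application/json on the request.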
Effective retention for /v1/responses resolves in this order:

1. Request-level override (retention_days / retentionDays)
2. Team default (responses_retention_days)
3. User default (responsesRetentionDays)
4. Platform default (7 days)

Rules:

- retention_days and retentionDays accept integer values 0..365, or null.
- null means "no request override" and falls back to team/user/platform defaults.
- Out-of-range or mismatched values return 400 with invalid_request_error.
- 0 enables zero-retention behavior for that request.

With an effective retention of 0:

- previous_response_id is rejected.
- background is rejected.

Team context resolves as follows:

- If x-team-id is present and the caller is a member, that team is used.
- Otherwise, the caller's default team (default_team_uuid / default_team_id) is used when membership is valid.

The input parameter accepts either a simple string or an array of input items.
| Type | Description |
|---|---|
message | A message with role and content |
function_call | A tool/function call made by the model |
function_call_output | The result of a tool/function call |
Message role can be one of: user, assistant, system, developer.
Content can be a string or an array of content parts:
| Type | Description |
|---|---|
input_text | Text input |
input_image | Image input (via URL or file_id) |
input_file | File input |
output_text | Text output (includes annotations/logprobs) |
refusal | Model refusal |
For input_image, the detail parameter can be: auto, low, or high.
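A structured input array combining the content part types above might look like this sketch (the image URL is a placeholder):

```python
# Sketch of a structured `input` array: one user message combining
# a text part and an image part. The URL is a placeholder.
request_input = [
    {
        "type": "message",
        "role": "user",
        "content": [
            {"type": "input_text", "text": "What is in this photo?"},
            {
                "type": "input_image",
                "image_url": "https://example.com/photo.jpg",
                "detail": "auto",  # auto, low, or high
            },
        ],
    }
]
```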
Use { type: "allowed_tools", ... } in tool_choice to restrict which tools the model may choose from.

The reasoning object accepts the following parameters:
| Parameter | Values | Description |
|---|---|---|
effort | low, medium, high | How much effort the model puts into reasoning |
summary | none, auto, detailed, concise | Reasoning summary format |
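For instance, a reasoning request might combine the effort and summary settings (the model name and prompt are placeholders):

```python
# Sketch of a reasoning configuration for a reasoning-capable model.
# Model name and prompt are placeholders.
payload = {
    "model": "openai/gpt-5.2",
    "input": "Prove that the square root of 2 is irrational.",
    "reasoning": {
        "effort": "high",    # low, medium, or high
        "summary": "auto",   # none, auto, detailed, or concise
    },
}
```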
The text object's format setting controls the output format:

- { "type": "text" } - Plain text (default)
- { "type": "json_object" } - JSON object output
- { "type": "json_schema", "json_schema": { ... } } - Structured JSON with schema

Its verbosity setting controls response length:

- low - Short, compact responses
- medium - Balanced detail
- high - Most detailed output

Response object fields:

| Field | Type | Description |
|---|---|---|
id | string | Unique response identifier (format: resp_*) |
object | string | Always "response" |
created_at | integer | Unix timestamp of creation |
completed_at | integer or null | Unix timestamp when response completed |
model | string | Model used for the response |
status | string | Response status |
instructions | string or null | System instructions used |
previous_response_id | string or null | ID of previous response in conversation |
tools | array | Tools available (normalized with nullable fields) |
tool_choice | string or object | Tool choice setting used |
parallel_tool_calls | boolean | Whether parallel tool calls were enabled |
truncation | string | Truncation strategy: auto or disabled |
text | object | Resolved text configuration |
reasoning | object or null | Reasoning configuration |
temperature | number | Temperature used |
top_p | number | Top-p value used |
presence_penalty | number | Presence penalty used |
frequency_penalty | number | Frequency penalty used |
top_logprobs | number | Top logprobs setting |
max_output_tokens | integer or null | Max output tokens setting |
max_tool_calls | integer or null | Max tool calls setting |
user | string or null | User identifier |
store | boolean | Whether response was stored |
background | boolean | Whether processed in background |
safety_identifier | string or null | Safety identifier |
prompt_cache_key | string or null | Prompt cache key |
output | array | Array of output items |
output_text | string | Convenience field with concatenated text output |
usage | object | Token usage statistics |
error | object | Error details (if status is failed) |
incomplete_details | object | Details if status is incomplete |
metadata | object | Custom metadata (if provided) |
service_tier | string | Service tier used (echoed when provided) |
The usage object always includes token details.

Response status values:
| Status | Description |
|---|---|
queued | Background request is queued |
in_progress | Request is being processed |
completed | Request completed successfully |
incomplete | Response was truncated |
failed | Request failed with error |
cancelled | Request was cancelled |
Each item in the output array also has its own status field:
| Status | Description |
|---|---|
completed | Item finished successfully |
in_progress | Item still being generated |
incomplete | Item was truncated/interrupted |
| Event | Description |
|---|---|
response.created | Response object created |
response.in_progress | Processing started |
response.output_item.added | New output item started |
response.output_item.done | Output item completed |
response.content_part.added | Content part started |
response.content_part.done | Content part completed |
response.output_text.delta | Incremental text chunk |
response.output_text.done | Text content completed |
response.reasoning.delta | Incremental reasoning text |
response.reasoning.done | Reasoning content completed |
response.function_call_arguments.delta | Incremental function arguments |
response.function_call_arguments.done | Function call completed |
response.completed | Response completed successfully |
response.incomplete | Response truncated |
response.failed | Response failed |
Each streaming event carries the item_id of the parent output item. When top_logprobs is set, logprobs are included on response.output_text.delta events.
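A client can reassemble the streamed text by concatenating the delta payloads. This sketch assumes each SSE data: line carries a JSON event whose type matches the table above (framing details may vary by client library):

```python
import json

def collect_output_text(sse_lines):
    """Concatenate text deltas from a stream of SSE `data:` lines (sketch)."""
    chunks = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip event:/comment/blank lines
        event = json.loads(line[len("data: "):])
        if event.get("type") == "response.output_text.delta":
            chunks.append(event.get("delta", ""))
    return "".join(chunks)
```

The same loop pattern extends to reasoning and function-call argument deltas by matching their event types.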
Use previous_response_id or the conversation object (id or messages) to manage context. To continue a conversation, pass the prior response's id (for example, resp_abc123) as previous_response_id.
Using previous_response_id requires authentication, store: true on previous responses, and an effective retention greater than 0.
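Threading can be sketched as a follow-up request that references the prior response (the id and model name are placeholders):

```python
# Sketch of conversation threading: the follow-up request references
# the first response's id. All values are placeholders.
first_id = "resp_abc123"  # id returned by the previous request

follow_up = {
    "model": "openai/gpt-5.2",
    "input": "Now translate your answer into French.",
    "previous_response_id": first_id,
    "store": True,  # needed on each turn you intend to chain from
}
```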
A response is terminal once its status is completed, failed, or incomplete.
Constraints:

- Responses created with stream: true, or with an effective retention of 0, cannot be retrieved later.

Retrieval errors:

- 404 - Response not found or belongs to a different account
- 401 - Authentication required/invalid

HTTP status codes:

| HTTP Status | Description |
|---|---|
400 | Invalid request parameters |
401 | Missing or invalid API key |
403 | Insufficient permissions |
404 | Resource not found |
429 | Rate limit exceeded |
500 | Internal server error |
503 | Service unavailable |
Error codes:

| Code | Description |
|---|---|
missing_required_parameter | Required parameter not provided |
model_not_found | Specified model does not exist |
response_not_found | Response ID not found |
invalid_response_id | Invalid response ID format |
invalid_request_error | Invalid request shape/value (for example retention out of range or mismatched alias fields) |
authentication_required | No API key provided |
invalid_api_key | API key is invalid or inactive |
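Error bodies can be inspected in code. This sketch assumes an OpenAI-style envelope of the form { "error": { "code", "message" } }; the sample values are illustrative:

```python
import json

# Sketch: inspecting an error payload. The envelope shape and sample
# values are illustrative assumptions, not captured API output.
raw = '{"error": {"code": "model_not_found", "message": "Unknown model"}}'

err = json.loads(raw).get("error", {})
if err.get("code") == "model_not_found":
    print("Check the model id:", err.get("message"))
```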
Some models are only available through /v1/chat/completions; use /v1/chat/completions for these models.

Service tiers (priority): set service_tier: "priority" to request priority processing on providers that support service tiers.
Behavior notes:
- Provider overrides (X-Provider) and explicit provider selection are honored for pricing and x402 estimates.

Response headers:

| Header | Description |
|---|---|
X-Request-ID | Unique request/response identifier |
Content-Type | application/json or text/event-stream |
Request headers:

Authorization - Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

X-Provider - Optional provider override for pay-as-you-go requests on supported open-source models (case-insensitive). Subscription requests ignore this header.

X-Billing-Mode - Optional billing override to force pay-as-you-go (e.g., paygo). Header name is case-insensitive.

x-team-id - Optional team context override for API-key requests. If provided, it must reference a team the caller belongs to.
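Assembled together, the request headers for a pay-as-you-go call with provider and team overrides might look like this sketch (all values are placeholders):

```python
# Sketch: headers for a pay-as-you-go request with provider and team
# overrides. All values are placeholders.
headers = {
    "Authorization": "Bearer YOUR_TOKEN",
    "Content-Type": "application/json",
    "X-Provider": "some-provider",  # provider override (paygo only)
    "X-Billing-Mode": "paygo",      # force pay-as-you-go routing
    "x-team-id": "team_123",        # must be a team you belong to
}
```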
Parameters for the response request
Model ID to use for the response
Prompt string or array of input items
Billing override to force pay-as-you-go. Accepted values (case-insensitive): paygo, pay-as-you-go, pay_as_you_go, paid, payg.
Alias for billing_mode.
System instructions for the model
Maximum tokens in the response (x >= 16)

Sampling temperature (0 <= x <= 2; not supported by reasoning models)

Nucleus sampling parameter (0 <= x <= 1)

Function tools available to the model
How the model should use tools
Allow multiple tool calls in parallel
Enable streaming responses
Store response for later retrieval
Per-request retention override in days (0 <= x <= 365). Use null to disable request-level override.

Alias for retention_days (0 <= x <= 365). If both are provided, values must match.

Link to previous response for conversation threading
Reasoning configuration for reasoning-capable models
Text/format configuration
Custom metadata
Truncation strategy (auto or disabled)

Unique user identifier
Random seed for reproducibility
Enable background/async processing
Optional service tier (auto, default, flex, priority). Set to "priority" to request priority processing when supported by the routed provider.

Response created
Response object returned by the Responses API