Responses - NanoGPT API Documentation

Overview

The /v1/responses API is an OpenAI Responses API-compatible endpoint for creating AI model responses. It supports:

Stateless and stateful (conversation threading) chat completions
Streaming responses via Server-Sent Events (SSE)
Background (async) processing for long-running requests
Response storage and retrieval
Function/tool calling support
Multimodal inputs (images, files) for supported models

Provider selection is available for pay-as-you-go requests on supported open-source models. Set the X-Provider header or save preferences to choose a provider. If you are on a subscription and want provider selection for a subscription-included model, force paid routing with the pay-as-you-go billing override (billing_mode: "paygo" or X-Billing-Mode: paygo). See Provider Selection and Pay-As-You-Go Billing Override.

Authentication

All requests require authentication via API key:

Authorization: Bearer YOUR_API_KEY

Alternatively:

x-api-key: YOUR_API_KEY

For API-key requests, you can optionally pass x-team-id to choose team context when team defaults are evaluated (for example, retention defaults).
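As a minimal sketch of the two authentication styles above, the helper below builds the request headers as a plain dictionary. The function name `build_auth_headers` and its parameters are hypothetical and not part of the NanoGPT API; only the header names come from this page.

```python
from typing import Optional


def build_auth_headers(api_key: str,
                       team_id: Optional[str] = None,
                       use_x_api_key: bool = False) -> dict:
    """Return headers for either documented auth style (hypothetical helper)."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    headers = {"Content-Type": "application/json"}
    if use_x_api_key:
        # Alternative style: x-api-key header
        headers["x-api-key"] = api_key
    else:
        # Default style: Authorization: Bearer <key>
        headers["Authorization"] = f"Bearer {api_key}"
    if team_id is not None:
        # Optional team context for team defaults such as retention
        headers["x-team-id"] = team_id
    return headers
```

The returned dictionary can be passed to any HTTP client that accepts a header mapping.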

Endpoints

POST /v1/responses - Create a new response from the model
GET /v1/responses - Returns endpoint information
GET /v1/responses/{id} - Retrieve a stored response by ID
DELETE /v1/responses/{id} - Delete a stored response (soft delete)

BYOK Encryption (Stored Responses)

If you set store: true, you can optionally encrypt the stored response at rest using your own key or passphrase. To encrypt a stored response, include one of these headers on POST /v1/responses:

x-encryption-key: YOUR_ENCRYPTION_KEY
x-encryption-passphrase: YOUR_PASSPHRASE

When retrieving or deleting an encrypted response, include the same header you used at creation time. Example:

# Create an encrypted, stored response
curl -X POST https://nano-gpt.com/api/v1/responses \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-encryption-key: YOUR_ENCRYPTION_KEY" \
  -d '{
    "model": "openai/gpt-5.2",
    "input": "Sensitive information",
    "store": true
  }'

# Retrieve it later (must include the same encryption header)
curl https://nano-gpt.com/api/v1/responses/resp_abc123 \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "x-encryption-key: YOUR_ENCRYPTION_KEY"
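The same flow can be sketched in Python. This example only assembles the body and headers; it sends nothing. The helper names are hypothetical, and the "exactly one header" check is an assumption based on the phrase "include one of these headers" above.

```python
from typing import Optional


def encryption_header(key: Optional[str] = None,
                      passphrase: Optional[str] = None) -> dict:
    """Return the single BYOK header to send (assumes exactly one is allowed)."""
    if (key is None) == (passphrase is None):
        raise ValueError("pass exactly one of key or passphrase")
    if key is not None:
        return {"x-encryption-key": key}
    return {"x-encryption-passphrase": passphrase}


def encrypted_create_request(body: dict, api_key: str, **secret) -> tuple:
    """Pair a request body with headers for an encrypted, stored response."""
    if body.get("store") is not True:
        # Encryption applies to stored responses, so store must be true
        raise ValueError("set store: true to encrypt a stored response")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    headers.update(encryption_header(**secret))
    return body, headers
```

Reuse the same `encryption_header(...)` result on the later GET or DELETE call, since retrieval requires the header used at creation time.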

Create Response

Request

POST /v1/responses
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY

Request Body

Parameter	Type	Required	Description
`model`	string	Yes	The model to use (e.g., `openai/gpt-5.2`, `anthropic/claude-opus-4.5`)
`input`	string or array	Yes	The input prompt or array of input items
`instructions`	string	No	System instructions for the model
`max_output_tokens`	integer	No	Maximum tokens in the response (minimum: 16)
`max_tool_calls`	integer	No	Maximum number of tool calls allowed
`temperature`	number	No	Sampling temperature (0-2). If omitted, NanoGPT does not force a value and the routed provider/model default applies (OpenAI defaults to 1.0). Not supported by reasoning-capable models
`top_p`	number	No	Nucleus sampling parameter. Not supported by reasoning-capable models
`presence_penalty`	number	No	Presence penalty for sampling (-2.0 to 2.0)
`frequency_penalty`	number	No	Frequency penalty for sampling (-2.0 to 2.0)
`top_logprobs`	integer	No	Number of top logprobs to return (0-20)
`tools`	array	No	Array of tools available to the model
`tool_choice`	string or object	No	Tool use: `auto`, `none`, `required`, `{ type: "function", name: "..." }`, or `{ type: "allowed_tools", ... }`
`parallel_tool_calls`	boolean	No	Allow multiple tool calls in parallel
`stream`	boolean	No	Enable streaming responses (default: false)
`stream_options`	object	No	Streaming options: `{ include_obfuscation?: boolean }`
`store`	boolean	No	Store response for later retrieval (default: false)
`retention_days`	integer or null	No	Per-request retention override in days (`0..365`). `null` means no request-level override
`retentionDays`	integer or null	No	Alias for `retention_days`. If both are sent, values must match
`previous_response_id`	string	No	Link to previous response for conversation threading
`reasoning`	object	No	Reasoning configuration for reasoning-capable models
`text`	object	No	Text output configuration (format + verbosity)
`metadata`	object	No	Custom metadata (max 16 keys, 64 char keys, 512 char values)
`truncation`	string	No	Truncation strategy: `auto` or `disabled`
`user`	string	No	Unique user identifier
`seed`	integer	No	Random seed for reproducibility
`conversation`	object	No	Conversation context: `{ id?: string, messages?: InputItem[] }`
`include`	string[]	No	Additional fields to include in response
`safety_identifier`	string	No	Safety tracking identifier
`prompt_cache_key`	string	No	Key for prompt caching
`background`	boolean	No	Enable background/async processing
`service_tier`	string	No	Service tier. Use `"priority"` where supported. See Service tiers (priority) near the end.
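A client can check the numeric limits listed in the table before sending a request. The sketch below is a hypothetical client-side pre-check (`validate_request` is not an API function); it covers only the ranges stated above and does not reproduce server-side validation.

```python
def validate_request(body: dict) -> list:
    """Return a list of problems; an empty list means none were found."""
    problems = []
    for name in ("model", "input"):
        if name not in body:
            problems.append(f"{name} is required")

    def check_range(name, low, high):
        value = body.get(name)
        if value is not None and not (low <= value <= high):
            problems.append(f"{name} must be between {low} and {high}")

    check_range("temperature", 0, 2)
    check_range("presence_penalty", -2.0, 2.0)
    check_range("frequency_penalty", -2.0, 2.0)
    check_range("top_logprobs", 0, 20)

    max_tokens = body.get("max_output_tokens")
    if max_tokens is not None and max_tokens < 16:
        problems.append("max_output_tokens must be at least 16")

    # Metadata limits: max 16 keys, 64-char keys, 512-char values
    metadata = body.get("metadata") or {}
    if len(metadata) > 16:
        problems.append("metadata allows at most 16 keys")
    for key, value in metadata.items():
        if len(key) > 64:
            problems.append("a metadata key exceeds 64 characters")
        if len(str(value)) > 512:
            problems.append("a metadata value exceeds 512 characters")
    return problems
```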

Retention Resolution

Effective retention for /v1/responses resolves in this order:

Request override (retention_days / retentionDays)
Team setting (responses_retention_days)
User setting (responsesRetentionDays)
Platform default (7 days)

Rules:

retention_days and retentionDays accept integer values 0..365, or null.
null means “no request override” and falls back to team/user/platform defaults.
If both request fields are provided, they must match.
Invalid retention values return 400 with invalid_request_error.
0 enables zero-retention behavior for that request.
Existing clients that omit retention fields keep default behavior (team/user/platform retention resolution).

With effective retention 0:

previous_response_id is rejected.
background is rejected.

API-key team context for retention defaults:

If x-team-id is present and the caller is a member, that team is used.
Otherwise, the API uses the caller session’s default team (default_team_uuid / default_team_id) when membership is valid.
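The resolution order and rules above can be sketched as a small client-side model. The function names and the `team_days` / `user_days` arguments are hypothetical stand-ins for the team and user settings; this is an illustration of the documented order, not the server's implementation.

```python
from typing import Optional

PLATFORM_DEFAULT_DAYS = 7  # platform default stated above


def request_override(body: dict) -> Optional[int]:
    """Validate retention_days / retentionDays; return the override or None."""
    present = [body[k] for k in ("retention_days", "retentionDays") if k in body]
    if len(present) == 2 and present[0] != present[1]:
        raise ValueError("retention_days and retentionDays must match")
    value = present[0] if present else None
    if value is None:
        return None  # null or omitted: no request-level override
    if isinstance(value, bool) or not isinstance(value, int) or not 0 <= value <= 365:
        raise ValueError("retention must be an integer in 0..365 or null")
    return value


def effective_retention(body: dict,
                        team_days: Optional[int] = None,
                        user_days: Optional[int] = None) -> int:
    """Resolve request override, then team, then user, then platform default."""
    for candidate in (request_override(body), team_days, user_days):
        if candidate is not None:
            return candidate
    return PLATFORM_DEFAULT_DAYS


def zero_retention_conflicts(body: dict, days: int) -> list:
    """List fields that are rejected when effective retention is 0."""
    if days != 0:
        return []
    return [name for name in ("previous_response_id", "background") if body.get(name)]
```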

Input Types

The input parameter accepts either a simple string or an array of input items.

Simple String Input

{
  "model": "openai/gpt-5.2",
  "input": "What is the capital of France?"
}

Array Input

{
  "model": "openai/gpt-5.2",
  "input": [
    {
      "type": "message",
      "role": "user",
      "content": "What is the capital of France?"
    }
  ]
}

Input Item Types

Type	Description
`message`	A message with role and content
`function_call`	A tool/function call made by the model
`function_call_output`	The result of a tool/function call

Message Item

{
  "type": "message",
  "role": "user",
  "content": "Hello, how are you?"
}

Supported roles: user, assistant, system, developer.

Content can be a string or an array of content parts:

{
  "type": "message",
  "role": "user",
  "content": [
    { "type": "input_text", "text": "What's in this image?" },
    { "type": "input_image", "image_url": "https://example.com/image.jpg" }
  ]
}

Content Part Types

Type	Description
`input_text`	Text input
`input_image`	Image input (via URL or file_id)
`input_file`	File input
`output_text`	Text output (includes annotations/logprobs)
`refusal`	Model refusal

Image Input

{
  "type": "input_image",
  "image_url": "https://example.com/image.jpg",
  "detail": "auto"
}

The detail parameter can be: auto, low, or high.
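For illustration, a message combining text and an image can be assembled as follows. `image_message` is a hypothetical helper; the item shape and the allowed `detail` values are taken from the examples above.

```python
ALLOWED_DETAIL = ("auto", "low", "high")


def image_message(text: str, image_url: str, detail: str = "auto") -> dict:
    """Build a user message item with one text part and one image part."""
    if detail not in ALLOWED_DETAIL:
        raise ValueError(f"detail must be one of {ALLOWED_DETAIL}")
    return {
        "type": "message",
        "role": "user",
        "content": [
            {"type": "input_text", "text": text},
            {"type": "input_image", "image_url": image_url, "detail": detail},
        ],
    }
```

The result is one entry for the `input` array of a request body.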

Function Call Item

{
  "type": "function_call",
  "id": "fc_123",
  "call_id": "call_abc123",
  "name": "get_weather",
  "arguments": "{\"location\": \"Paris\"}"
}

Function Call Output Item

{
  "type": "function_call_output",
  "call_id": "call_abc123",
  "output": "{\"temperature\": 22, \"condition\": \"sunny\"}"
}
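The two item types pair up through `call_id`. The sketch below runs a local function for a `function_call` item and builds the matching `function_call_output` item. The `registry` mapping and the helper name are hypothetical; the sketch assumes `arguments` is a JSON-encoded object, as in the examples above.

```python
import json


def run_function_call(item: dict, registry: dict) -> dict:
    """Execute a function_call item and return its function_call_output item."""
    if item.get("type") != "function_call":
        raise ValueError("expected a function_call item")
    arguments = json.loads(item["arguments"])      # arguments arrive as a JSON string
    result = registry[item["name"]](**arguments)   # look up and call the local function
    return {
        "type": "function_call_output",
        "call_id": item["call_id"],                # must match the originating call
        "output": json.dumps(result),              # output is sent back as a JSON string
    }
```

Append both the original `function_call` item and the returned item to the `input` array of the next request.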

Tools

Provide function tools and built-in tools the model can use:

Function Tool

Define functions that the model can call:

{
  "model": "openai/gpt-5.2",
  "input": "What's the weather in Paris?",
  "tools": [
    {
      "type": "function",
      "name": "get_weather",
      "description": "Get current weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "City name"
          }
        },
        "required": ["location"]
      },
      "strict": false
    }
  ],
  "tool_choice": "auto"
}

Web Search Tool

{
  "type": "web_search_preview",
  "search_context_size": "low",
  "user_location": {
    "type": "approximate",
    "country": "US",
    "city": "San Francisco",
    "region": "California"
  }
}

File Search Tool

{
  "type": "file_search",
  "vector_store_ids": ["vs_..."],
  "max_num_results": 10,
  "ranking_options": {
    "ranker": "auto",
    "score_threshold": 0.5
  }
}

Code Interpreter Tool

{
  "type": "code_interpreter",
  "container": { "type": "auto" }
}

MCP Tool

{
  "type": "mcp",
  "server_label": "my-server",
  "server_url": "https://...",
  "headers": { "Authorization": "Bearer ..." },
  "require_approval": "auto"
}

Image Generation Tool

{
  "type": "image_generation"
}

Tool Choice

Use allowed_tools to restrict which tools the model may choose from:

{
  "tool_choice": {
    "type": "allowed_tools",
    "tools": [{ "type": "function", "name": "get_weather" }],
    "mode": "auto"
  }
}

Function Tool Normalization

Function tools in responses always include nullable fields:

{
  "type": "function",
  "name": "get_weather",
  "description": null,
  "parameters": null,
  "strict": null
}
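A client that compares the tools it sent with the tools echoed back may want to apply the same shape locally. This is a sketch of the documented output shape only, with a hypothetical helper name; it does not claim to mirror the server's normalization logic.

```python
def normalize_function_tool(tool: dict) -> dict:
    """Return a function tool with every nullable field present."""
    if tool.get("type") != "function":
        raise ValueError("only function tools are handled in this sketch")
    return {
        "type": "function",
        "name": tool["name"],
        "description": tool.get("description"),  # None when omitted
        "parameters": tool.get("parameters"),    # None when omitted
        "strict": tool.get("strict"),            # None when omitted
    }
```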

Reasoning Configuration

For reasoning-capable models:

{
  "model": "anthropic/claude-opus-4.5",
  "input": "Solve this complex problem...",
  "reasoning": {
    "effort": "high",
    "summary": "auto"
  }
}

Parameter	Values	Description
`effort`	`low`, `medium`, `high`	How much effort the model puts into reasoning
`summary`	`none`, `auto`, `detailed`, `concise`	Reasoning summary format

Text/Format Configuration

Control response format and verbosity:

{
  "model": "openai/gpt-5.2",
  "input": "List 3 colors",
  "text": {
    "format": { "type": "json_object" },
    "verbosity": "medium"
  }
}

Text Parameter Structure

{
  "format": { "type": "text" } | { "type": "json_object" } | { "type": "json_schema", "json_schema": { ... } },
  "verbosity": "low" | "medium" | "high"
}

Format Types

{ "type": "text" } - Plain text (default)
{ "type": "json_object" } - JSON object output
{ "type": "json_schema", "json_schema": { ... } } - Structured JSON with schema

Verbosity Values

low - Short, compact responses
medium - Balanced detail
high - Most detailed output

JSON Schema Format

{
  "text": {
    "format": {
      "type": "json_schema",
      "json_schema": {
        "name": "color_list",
        "schema": {
          "type": "object",
          "properties": {
            "colors": {
              "type": "array",
              "items": { "type": "string" }
            }
          }
        },
        "strict": true
      }
    }
  }
}
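As a sketch, the `text` object for a schema-constrained request can be built with a small helper, and the result read back from `output_text`. Both helper names are hypothetical, and the parsing step assumes the model returned valid JSON in `output_text`.

```python
import json


def json_schema_text(name: str, schema: dict, strict: bool = True) -> dict:
    """Build the `text` request field for json_schema output."""
    return {
        "format": {
            "type": "json_schema",
            "json_schema": {"name": name, "schema": schema, "strict": strict},
        }
    }


def parse_structured_output(response: dict) -> dict:
    """Decode the JSON document carried in the response's output_text."""
    return json.loads(response["output_text"])
```

For example, `json_schema_text("color_list", {...})` produces the `text` value shown above, and `parse_structured_output` raises `json.JSONDecodeError` if the text is not valid JSON.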

Response Format

Successful Response

{
  "id": "resp_abc123",
  "object": "response",
  "created_at": 1699000000,
  "completed_at": 1699000001,
  "model": "openai/gpt-5.2",
  "status": "completed",
  "instructions": null,
  "previous_response_id": null,
  "tools": [],
  "tool_choice": "auto",
  "parallel_tool_calls": false,
  "truncation": "disabled",
  "text": {
    "format": { "type": "text" },
    "verbosity": "medium"
  },
  "reasoning": null,
  "temperature": 1,
  "top_p": 1,
  "presence_penalty": 0,
  "frequency_penalty": 0,
  "top_logprobs": 0,
  "max_output_tokens": null,
  "max_tool_calls": null,
  "user": null,
  "store": true,
  "background": false,
  "safety_identifier": null,
  "prompt_cache_key": null,
  "output": [
    {
      "type": "message",
      "id": "msg_xyz789",
      "role": "assistant",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "text": "The capital of France is Paris.",
          "annotations": [],
          "logprobs": []
        }
      ]
    }
  ],
  "output_text": "The capital of France is Paris.",
  "usage": {
    "input_tokens": 15,
    "output_tokens": 10,
    "total_tokens": 25,
    "input_tokens_details": { "cached_tokens": 0 },
    "output_tokens_details": { "reasoning_tokens": 0 }
  },
  "metadata": {},
  "service_tier": "auto"
}

Response Fields

All fields below are always present; nullable values indicate an option was not set.

Field	Type	Description
`id`	string	Unique response identifier (format: `resp_*`)
`object`	string	Always `"response"`
`created_at`	integer	Unix timestamp of creation
`completed_at`	integer or null	Unix timestamp when response completed
`model`	string	Model used for the response
`status`	string	Response status
`instructions`	string or null	System instructions used
`previous_response_id`	string or null	ID of previous response in conversation
`tools`	array	Tools available (normalized with nullable fields)
`tool_choice`	string or object	Tool choice setting used
`parallel_tool_calls`	boolean	Whether parallel tool calls were enabled
`truncation`	string	Truncation strategy: `auto` or `disabled`
`text`	object	Resolved text configuration
`reasoning`	object or null	Reasoning configuration
`temperature`	number	Temperature used
`top_p`	number	Top-p value used
`presence_penalty`	number	Presence penalty used
`frequency_penalty`	number	Frequency penalty used
`top_logprobs`	number	Top logprobs setting
`max_output_tokens`	integer or null	Max output tokens setting
`max_tool_calls`	integer or null	Max tool calls setting
`user`	string or null	User identifier
`store`	boolean	Whether response was stored
`background`	boolean	Whether processed in background
`safety_identifier`	string or null	Safety identifier
`prompt_cache_key`	string or null	Prompt cache key
`output`	array	Array of output items
`output_text`	string	Convenience field with concatenated text output
`usage`	object	Token usage statistics
`error`	object	Error details (if status is `failed`)
`incomplete_details`	object	Details if status is `incomplete`
`metadata`	object	Custom metadata (if provided)
`service_tier`	string	Service tier used (echoed when provided)

Usage Object

The usage object always includes token details:

{
  "input_tokens": 100,
  "output_tokens": 50,
  "total_tokens": 150,
  "input_tokens_details": {
    "cached_tokens": 0
  },
  "output_tokens_details": {
    "reasoning_tokens": 0
  }
}

Response Status Values

Status	Description
`queued`	Background request is queued
`in_progress`	Request is being processed
`completed`	Request completed successfully
`incomplete`	Response was truncated
`failed`	Request failed with error
`cancelled`	Request was cancelled

`reasoning` Response Field

{
  "effort": "low" | "medium" | "high" | null,
  "summary": "none" | "auto" | "detailed" | "concise" | null
}

`text` Response Field (Resolved)

{
  "format": { "type": "text" | "json_object" | "json_schema", "...": "..." },
  "verbosity": "low" | "medium" | "high" | undefined
}

Output Item Types

All output items include a status field.

Message Output

{
  "type": "message",
  "id": "msg_123",
  "role": "assistant",
  "status": "completed",
  "content": [
    {
      "type": "output_text",
      "text": "Response text here",
      "annotations": [],
      "logprobs": []
    }
  ]
}

Function Call Output

{
  "type": "function_call",
  "id": "fc_123",
  "call_id": "call_abc",
  "name": "get_weather",
  "arguments": "{\"location\": \"Paris\"}",
  "status": "completed"
}

Reasoning Output (reasoning-capable models)

{
  "type": "reasoning",
  "id": "reasoning_123",
  "status": "completed",
  "summary": [
    {
      "type": "summary_text",
      "text": "I analyzed the problem by..."
    }
  ],
  "content": [
    {
      "type": "reasoning_text",
      "text": "Detailed reasoning goes here."
    }
  ],
  "encrypted_content": null
}

Web Search Call Output

{
  "type": "web_search_call",
  "id": "ws_123",
  "status": "completed",
  "action": { "query": "search query" },
  "results": [{ "url": "...", "title": "...", "snippet": "..." }]
}

Image Generation Call Output

{
  "type": "image_generation_call",
  "id": "ig_123",
  "status": "completed",
  "result": {
    "b64_json": "...",
    "url": "...",
    "revised_prompt": "..."
  }
}

Computer Call Output

{
  "type": "computer_call",
  "id": "cc_123",
  "call_id": "call_abc123",
  "status": "completed",
  "action": { "type": "click" },
  "pending_safety_checks": [{ "id": "...", "code": "...", "message": "..." }]
}

Output Item Status Values

Status	Description
`completed`	Item finished successfully
`in_progress`	Item still being generated
`incomplete`	Item was truncated/interrupted

Output Text Parts

Output text parts include annotations and logprobs:

{
  "type": "output_text",
  "text": "Hello world",
  "annotations": [],
  "logprobs": [
    {
      "token": "Hello",
      "logprob": -0.5,
      "bytes": [72, 101, 108, 108, 111],
      "top_logprobs": [
        { "token": "Hello", "logprob": -0.5, "bytes": [72, 101, 108, 108, 111] },
        { "token": "Hi", "logprob": -1.2, "bytes": [72, 105] }
      ]
    }
  ]
}
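To make the `logprobs` entries easier to read, the sketch below decodes each token from its `bytes` field and converts the log probability to a probability. It assumes `logprob` is a natural logarithm and `bytes` is the token's UTF-8 encoding; neither is stated on this page. `token_probabilities` is a hypothetical helper.

```python
import math


def token_probabilities(part: dict) -> list:
    """Return (token, probability) pairs for an output_text part."""
    rows = []
    for entry in part.get("logprobs", []):
        # Decode the token from its byte values (assumed UTF-8)
        token = bytes(entry["bytes"]).decode("utf-8", errors="replace")
        # Convert log probability to probability (assumed natural log)
        rows.append((token, math.exp(entry["logprob"])))
    return rows
```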

Annotation Types

URL Citation

{
  "type": "url_citation",
  "start_index": 0,
  "end_index": 10,
  "url": "https://...",
  "title": "Page Title"
}

File Citation

{
  "type": "file_citation",
  "start_index": 0,
  "end_index": 10,
  "file_id": "file_..."
}

File Path

{
  "type": "file_path",
  "start_index": 0,
  "end_index": 10,
  "file_id": "file_..."
}
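The following sketch pulls the cited text for each `url_citation` annotation out of an output text part. It assumes `start_index` and `end_index` are character offsets into `text` with an exclusive end, which this page does not state; verify against real responses before relying on it. `cited_spans` is a hypothetical helper.

```python
def cited_spans(part: dict) -> list:
    """Return (cited_text, url) pairs for url_citation annotations."""
    text = part.get("text", "")
    spans = []
    for annotation in part.get("annotations", []):
        if annotation.get("type") != "url_citation":
            continue  # file_citation and file_path carry a file_id, not a url
        start = annotation["start_index"]
        end = annotation["end_index"]  # assumed exclusive
        spans.append((text[start:end], annotation["url"]))
    return spans
```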

Streaming

See also: Streaming Protocol (SSE).

Enable streaming to receive incremental response updates:

{
  "model": "openai/gpt-5.2",
  "input": "Write a short story",
  "stream": true
}

Streaming Response

The response is delivered as Server-Sent Events (SSE):

data: {"type":"response.created","response":{...},"sequence_number":0}

data: {"type":"response.in_progress","response":{...},"sequence_number":1}

data: {"type":"response.output_item.added","output_index":0,"item":{...},"sequence_number":2}

data: {"type":"response.output_text.delta","item_id":"msg_...","output_index":0,"content_index":0,"delta":"The ","logprobs":[...],"sequence_number":3}

data: {"type":"response.output_text.delta","item_id":"msg_...","output_index":0,"content_index":0,"delta":"capital ","logprobs":[...],"sequence_number":4}

data: {"type":"response.output_text.done","item_id":"msg_...","output_index":0,"content_index":0,"text":"The capital of France is Paris.","logprobs":[...],"sequence_number":10}

data: {"type":"response.completed","response":{...},"sequence_number":11}

data: [DONE]

Streaming Event Types

Event	Description
`response.created`	Response object created
`response.in_progress`	Processing started
`response.output_item.added`	New output item started
`response.output_item.done`	Output item completed
`response.content_part.added`	Content part started
`response.content_part.done`	Content part completed
`response.output_text.delta`	Incremental text chunk
`response.output_text.done`	Text content completed
`response.reasoning.delta`	Incremental reasoning text
`response.reasoning.done`	Reasoning content completed
`response.function_call_arguments.delta`	Incremental function arguments
`response.function_call_arguments.done`	Function call completed
`response.completed`	Response completed successfully
`response.incomplete`	Response truncated
`response.failed`	Response failed

Updated Event Fields

All content/output events include item_id for the parent output item.
Text delta/done events include logprobs.

Example response.output_text.delta:

{
  "type": "response.output_text.delta",
  "item_id": "msg_...",
  "output_index": 0,
  "content_index": 0,
  "delta": "Hello",
  "logprobs": [...],
  "sequence_number": 5
}
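A minimal sketch of consuming the stream: the function below takes an iterable of already-decoded SSE lines, parses each `data:` payload as JSON, and concatenates the `delta` values of `response.output_text.delta` events until `[DONE]`. The name `collect_text` is hypothetical, and the sketch assumes each event fits on a single `data:` line, as in the example stream above.

```python
import json


def collect_text(lines) -> str:
    """Concatenate output text deltas from an iterable of SSE lines."""
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        event = json.loads(payload)
        if event.get("type") == "response.output_text.delta":
            chunks.append(event["delta"])
    return "".join(chunks)
```

With an HTTP client that exposes the response body line by line, pass that line iterator directly to `collect_text`.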

Conversation Threading

Chain responses together for multi-turn conversations. You can use previous_response_id or the conversation object (id or messages) to manage context.

First Request

{
  "model": "openai/gpt-5.2",
  "input": "My name is Alice.",
  "store": true
}

Response includes id: "resp_abc123"

Follow-up Request

{
  "model": "openai/gpt-5.2",
  "input": "What is my name?",
  "previous_response_id": "resp_abc123"
}

The model has access to the conversation history and responds: “Your name is Alice.”

Note: previous_response_id requires authentication, store: true on previous responses, and effective retention greater than 0.
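As a sketch, a follow-up body can be derived from the previous response object. The helper name is hypothetical. It checks the `store` field echoed in the previous response because threading requires stored responses, and it sets `store: true` on the new request so the chain can continue, which is a design choice of this sketch rather than an API requirement.

```python
from typing import Optional


def follow_up_body(previous: dict, text: str, model: Optional[str] = None) -> dict:
    """Build a request body that threads onto a previous stored response."""
    if not previous.get("store"):
        raise ValueError("previous response was not stored; threading needs store: true")
    return {
        "model": model or previous["model"],   # reuse the earlier model by default
        "input": text,
        "previous_response_id": previous["id"],
        "store": True,                          # keep this turn available for the next one
    }
```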

Background Mode

For long-running requests, use background mode to receive an immediate response and poll for results.

Initiate Background Request

{
  "model": "openai/gpt-5.2",
  "input": "Write a detailed analysis...",
  "background": true
}

Immediate Response (202 Accepted)

{
  "id": "resp_abc123",
  "object": "response",
  "created_at": 1699000000,
  "model": "openai/gpt-5.2",
  "status": "queued",
  "output": []
}

Poll for Completion

GET /v1/responses/resp_abc123
Authorization: Bearer YOUR_API_KEY

Keep polling until status is completed, failed, or incomplete.

Constraints:

Cannot be combined with stream: true
Requires authentication
Effective retention must be greater than 0
Maximum processing time: approximately 800 seconds
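The polling loop can be sketched as below. `fetch` is a hypothetical callable that performs `GET /v1/responses/{id}` and returns the parsed JSON; it is injected so the loop itself stays free of network code. The 2-second interval is an arbitrary choice, the 800-second timeout mirrors the approximate maximum processing time, and treating `cancelled` as terminal is an assumption based on the status table rather than the sentence above.

```python
import time

# completed, failed, incomplete come from the text; cancelled is an assumption
TERMINAL_STATUSES = {"completed", "failed", "incomplete", "cancelled"}


def poll_response(fetch, response_id: str, interval: float = 2.0,
                  timeout: float = 800.0, sleep=time.sleep,
                  clock=time.monotonic) -> dict:
    """Call fetch(response_id) until the response reaches a terminal status."""
    deadline = clock() + timeout
    while True:
        response = fetch(response_id)
        if response.get("status") in TERMINAL_STATUSES:
            return response
        if clock() >= deadline:
            raise TimeoutError(f"{response_id} did not finish within {timeout} seconds")
        sleep(interval)
```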

Retrieve Response

GET /v1/responses/{id}
Authorization: Bearer YOUR_API_KEY

Response

Returns the full response object (same format as POST response).

Errors

404 - Response not found or belongs to a different account
401 - Authentication required/invalid

Delete Response

DELETE /v1/responses/{id}
Authorization: Bearer YOUR_API_KEY

Response

{
  "id": "resp_abc123",
  "object": "response.deleted",
  "deleted": true
}

Error Handling

Error Response Format

{
  "error": {
    "code": "missing_required_parameter",
    "message": "model is required"
  }
}

HTTP Status Codes

HTTP Status	Description
`400`	Invalid request parameters
`401`	Missing or invalid API key
`403`	Insufficient permissions
`404`	Resource not found
`429`	Rate limit exceeded
`500`	Internal server error
`503`	Service unavailable

Common Error Codes

Code	Description
`missing_required_parameter`	Required parameter not provided
`model_not_found`	Specified model does not exist
`response_not_found`	Response ID not found
`invalid_response_id`	Invalid response ID format
`invalid_request_error`	Invalid request shape/value (for example retention out of range or mismatched alias fields)
`authentication_required`	No API key provided
`invalid_api_key`	API key is invalid or inactive

Complete Examples

Simple Text Completion

curl -X POST https://nano-gpt.com/api/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "openai/gpt-5.2",
    "input": "Explain quantum computing in one sentence."
  }'

Multi-turn Conversation

# First turn
curl -X POST https://nano-gpt.com/api/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "openai/gpt-5.2",
    "input": "I want to learn Python programming.",
    "store": true
  }'

# Second turn (using response ID from first request)
curl -X POST https://nano-gpt.com/api/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "openai/gpt-5.2",
    "input": "Where should I start?",
    "previous_response_id": "resp_abc123"
  }'

Streaming Response

curl -X POST https://nano-gpt.com/api/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "openai/gpt-5.2",
    "input": "Write a haiku about programming",
    "stream": true
  }'

Per-request Retention Override

curl -X POST https://nano-gpt.com/api/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "input": "hello",
    "store": true,
    "retention_days": 3
  }'

Function Calling

curl -X POST https://nano-gpt.com/api/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "openai/gpt-5.2",
    "input": "What is the weather in Tokyo?",
    "tools": [
      {
        "type": "function",
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": { "type": "string" }
          },
          "required": ["location"]
        }
      }
    ]
  }'

Submitting Tool Results

curl -X POST https://nano-gpt.com/api/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "openai/gpt-5.2",
    "input": [
      {
        "type": "message",
        "role": "user",
        "content": "What is the weather in Tokyo?"
      },
      {
        "type": "function_call",
        "id": "fc_1",
        "call_id": "call_123",
        "name": "get_weather",
        "arguments": "{\"location\": \"Tokyo\"}"
      },
      {
        "type": "function_call_output",
        "call_id": "call_123",
        "output": "{\"temperature\": 18, \"condition\": \"cloudy\"}"
      }
    ]
  }'

Image Input (Vision)

curl -X POST https://nano-gpt.com/api/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "openai/gpt-5.2",
    "input": [
      {
        "type": "message",
        "role": "user",
        "content": [
          { "type": "input_text", "text": "What is in this image?" },
          { "type": "input_image", "image_url": "https://example.com/photo.jpg", "detail": "auto" }
        ]
      }
    ]
  }'

JSON Output

curl -X POST https://nano-gpt.com/api/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "openai/gpt-5.2",
    "input": "List the planets in our solar system",
    "text": {
      "format": { "type": "json_object" }
    }
  }'

Background Processing

# Start background request
curl -X POST https://nano-gpt.com/api/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "openai/gpt-5.2",
    "input": "Generate a comprehensive report...",
    "background": true
  }'

# Poll for results
curl https://nano-gpt.com/api/v1/responses/resp_abc123 \
  -H "Authorization: Bearer YOUR_API_KEY"

Limitations

Deep research models: Deep research variants are not supported.
GPU-TEE streaming: Streaming is not supported for GPU-TEE models. Use /v1/chat/completions for these models.
Background mode: Maximum duration is approximately 800 seconds.
Metadata limits: Maximum 16 keys, 64 character key names, 512 character values.

Service tiers (priority)

Set service_tier: "priority" to request priority processing on providers that support service tiers.

Behavior notes:

Priority tiers are only applied when the routed provider supports them.
Priority tiers are gated on the routed provider, not just the model name.
Header provider overrides (like X-Provider) and explicit provider selection are honored for pricing and x402 estimates.
Provider-native web search can force routing; priority pricing follows that routing.

Billing note:

Priority tier billing uses priority pricing when applicable.

Example: priority tier

{
  "model": "gpt-5.2",
  "input": "Say hi in one sentence.",
  "service_tier": "priority"
}

Response Headers

All responses include:

Header	Description
`X-Request-ID`	Unique request/response identifier
`Content-Type`	`application/json` or `text/event-stream`

Authorizations

Header	Type	Required	Description
`Authorization`	string	Yes	Bearer authentication header of the form `Bearer <token>`, where `<token>` is your auth token.

Headers

Header	Type	Description
`X-Provider`	string	Optional provider override for pay-as-you-go requests on supported open-source models (case-insensitive). Subscription requests ignore this header.
`X-Billing-Mode`	string	Optional billing override to force pay-as-you-go (e.g., `paygo`). Header name is case-insensitive.
`X-Team-Id`	string	Optional team context override for API-key requests. If provided, it must reference a team the caller belongs to.

Body

application/json

Parameters for the response request

Parameter	Type	Required	Description
`model`	string	Yes	Model ID to use for the response
`input`	string or array	Yes	Prompt string or array of input items
`billing_mode`	string	No	Billing override to force pay-as-you-go. Accepted values (case-insensitive): `paygo`, `pay-as-you-go`, `pay_as_you_go`, `paid`, `payg`
`billingMode`	string	No	Alias for `billing_mode`
`instructions`	string	No	System instructions for the model
`max_output_tokens`	integer	No	Maximum tokens in the response. Required range: x >= 16
`temperature`	number	No	Sampling temperature (not supported by reasoning models). Required range: 0 <= x <= 2
`top_p`	number	No	Nucleus sampling parameter. Required range: 0 <= x <= 1
`tools`	object[]	No	Function tools available to the model
`tool_choice`	string or object	No	How the model should use tools
`parallel_tool_calls`	boolean	No	Allow multiple tool calls in parallel
`stream`	boolean	No	Enable streaming responses (default: false)
`store`	boolean	No	Store response for later retrieval (default: false)
`retention_days`	integer or null	No	Per-request retention override in days. Use `null` to disable request-level override. Required range: 0 <= x <= 365
`retentionDays`	integer or null	No	Alias for `retention_days`. If both are provided, values must match. Required range: 0 <= x <= 365
`previous_response_id`	string	No	Link to previous response for conversation threading
`reasoning`	object	No	Reasoning configuration for reasoning-capable models
`text`	object	No	Text/format configuration
`metadata`	object	No	Custom metadata
`truncation`	enum<string>	No	Truncation strategy. Available options: `auto`, `disabled`
`user`	string	No	Unique user identifier
`seed`	integer	No	Random seed for reproducibility
`background`	boolean	No	Enable background/async processing
`service_tier`	enum<string>	No	Optional service tier. Set to `"priority"` to request priority processing when supported by the routed provider. Available options: `auto`, `default`, `flex`, `priority`

Response

Response created

Response object returned by the Responses API

Field	Type
`id`	string
`object`	string
`created_at`	integer
`model`	string
`status`	enum<string>. Available options: `queued`, `in_progress`, `completed`, `incomplete`, `failed`, `cancelled`
`output`	object[]
`output_text`	string
`usage`	object
`error`	object
`incomplete_details`	object
`metadata`	object
`service_tier`	string
