Overview
The Chat Completion endpoint provides OpenAI-compatible chat completions with support for web search capabilities through our LinkUp integration.

Web Search

All models can access real-time web information by appending special suffixes to the model name:

:online
- Standard web search ($0.006 per request)
- Returns 10 search results
- Perfect for straightforward questions

:online/linkup-deep
- Deep web search ($0.06 per request)
- Iteratively searches for comprehensive information
- Ideal when initial results aren’t sufficient
Examples
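A minimal sketch of a web-search request. The base URL and API key below are placeholders (they are not given in this document); the model names and suffixes come from the section above.

```python
import json

# Placeholder endpoint and token -- substitute your deployment's values.
BASE_URL = "https://api.example.com/v1/chat/completions"
API_KEY = "YOUR_API_KEY"

def build_web_search_request(model: str, question: str, deep: bool = False) -> dict:
    """Append the web-search suffix to the model name and build the request body."""
    suffix = ":online/linkup-deep" if deep else ":online"
    return {
        "model": model + suffix,
        "messages": [{"role": "user", "content": question}],
    }

payload = build_web_search_request("chatgpt-4o-latest", "What happened in the news today?")
print(json.dumps(payload, indent=2))

# Send with any HTTP client, e.g.:
# requests.post(BASE_URL, json=payload, headers={"Authorization": f"Bearer {API_KEY}"})
```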
Context Memory

Enable unlimited-length conversations with lossless, hierarchical memory.

- Append :memory to any model name, or send header memory: true
- Can be combined with web search: :online:memory
- Retention: default 30 days; configure via :memory-<days> (1..365) or header memory_expiration_days: <days>; the header takes precedence
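The two equivalent ways to enable Context Memory described above, sketched side by side (the token is a placeholder):

```python
# 1) Model-name suffixes, optionally combined with web search and a retention period:
model = "claude-3-5-sonnet-20241022" + ":online:memory"
model_with_retention = "claude-3-5-sonnet-20241022:memory-90"  # retain memory for 90 days

# 2) Request headers instead of suffixes:
headers = {
    "Authorization": "Bearer YOUR_API_KEY",  # placeholder token
    "memory": "true",
    "memory_expiration_days": "90",          # takes precedence over :memory-<days>
}
```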
Custom Context Size Override

When Context Memory is enabled, you can override the model-derived context size used for the memory compression step with model_context_limit.

- Parameter: model_context_limit (number or numeric string)
- Default: Derived from the selected model’s context size
- Minimum: Values below 10,000 are clamped internally
- Scope: Only affects memory compression; does not change the target model’s own window
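A sketch of the clamping behavior described above, plus a request body combining Context Memory with the override. The function is illustrative, not the server's actual implementation:

```python
MIN_CONTEXT_LIMIT = 10_000  # values below this are clamped internally

def effective_context_limit(requested, model_default: int) -> int:
    """Illustrative clamp: fall back to the model-derived size; floor at 10,000."""
    if requested is None:
        return model_default
    return max(int(requested), MIN_CONTEXT_LIMIT)  # accepts numbers or numeric strings

# Request body combining Context Memory with the override:
payload = {
    "model": "claude-3-5-sonnet-20241022:memory",
    "messages": [{"role": "user", "content": "Continue our long conversation."}],
    "model_context_limit": "32000",  # numeric string is accepted
}
```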
Reasoning Parameter

The API supports filtering out thinking content (e.g., Claude’s <think> tags) from responses using the optional reasoning parameter.

When reasoning.exclude is set to true, the API removes content between <think> and </think> tags before returning the response. This works for both streaming and non-streaming requests.
Example

Without reasoning.exclude:
"<think>The user is asking for a simple addition. 2+2 equals 4.</think>\n\nThe answer is 4."

With reasoning.exclude:
"The answer is 4."
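The filtering above can be reproduced with a simple regular expression. This is an illustrative re-implementation, not the server's code:

```python
import re

def exclude_reasoning(text: str) -> str:
    """Drop <think>...</think> blocks and any trailing whitespace they leave behind,
    mimicking reasoning.exclude (illustrative sketch)."""
    return re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL)

raw = "<think>The user is asking for a simple addition. 2+2 equals 4.</think>\n\nThe answer is 4."
print(exclude_reasoning(raw))  # -> The answer is 4.
```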
Model suffix: :reasoning-exclude

Enable reasoning exclusion without modifying the request body by appending :reasoning-exclude to the model string.

- Behaves exactly like sending { "reasoning": { "exclude": true } }
- Only :reasoning-exclude is stripped before provider routing; all other suffixes remain active
- Works for both streaming and non-streaming responses
- Available on both Chat Completions and Text Completions
Combine with other suffixes
You can compose:reasoning-exclude
with existing suffixes; only :reasoning-exclude
is removed prior to provider routing:
:thinking
(and variants like…-thinking:8192
):online
and:online/linkup-deep
:memory
and:memory-<days>
claude-3-7-sonnet-thinking:8192:reasoning-exclude
gpt-4o:online:reasoning-exclude
claude-3-5-sonnet-20241022:memory-30:online/linkup-deep:reasoning-exclude
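The pre-routing behavior described above can be sketched as follows; this assumes the suffix appears at the end of the model string, as in every example here, and is not the gateway's actual code:

```python
def strip_reasoning_exclude(model: str) -> tuple:
    """Remove only the :reasoning-exclude suffix before provider routing;
    every other suffix stays intact (illustrative sketch)."""
    suffix = ":reasoning-exclude"
    if model.endswith(suffix):
        return model[: -len(suffix)], True
    return model, False

routed, exclude = strip_reasoning_exclude(
    "claude-3-5-sonnet-20241022:memory-30:online/linkup-deep:reasoning-exclude"
)
# routed keeps :memory-30 and :online/linkup-deep; exclude is True
```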
Notes and limitations

- GPU-TEE models (phala/*) require byte-for-byte SSE passthrough for signature verification. For those models, streaming cannot be filtered; the suffix has no effect on the streaming bytes.
- When assistant content is an array (e.g., vision/text parts), only text parts are filtered; images and tool/metadata content are untouched.
YouTube Transcripts

Automatically fetch and prepend YouTube video transcripts when the latest user message contains YouTube links.

Defaults

- Parameter: youtube_transcripts (boolean)
- Default: true (backwards compatible)
- Limit: Up to 3 YouTube URLs processed per request
- Injection: Transcripts are added as a system message before your messages
- Billing: $0.01 per transcript fetched
Disable automatic transcripts

Set youtube_transcripts to false to skip detection and fetching (no transcript cost applies).
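An illustrative request body opting out of transcript fetching; the model name is one of this document's examples:

```python
payload = {
    "model": "chatgpt-4o-latest",
    "messages": [
        {
            "role": "user",
            "content": "Summarize https://www.youtube.com/watch?v=dQw4w9WgXcQ",
        },
    ],
    "youtube_transcripts": False,  # YouTube links above are ignored and not billed
}
```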
Notes

- Web scraping is separate. To scrape non-YouTube URLs, set scraping: true. YouTube transcripts do not require scraping: true.
- When disabled, YouTube links are ignored for transcript fetching and are not billed.
- If your balance is insufficient when enabled, the request may be blocked with a 402.
Performance Benchmarks

LinkUp achieves state-of-the-art performance on OpenAI’s SimpleQA benchmark:

| Provider | Score |
|---|---|
| LinkUp Deep Search | 90.10% |
| Exa | 90.04% |
| Perplexity Sonar Pro | 86% |
| LinkUp Standard Search | 85% |
| Perplexity Sonar | 77% |
| Tavily | 73% |
Important Notes
- Web search increases input token count, which affects total cost
- Models gain access to real-time information published less than a minute ago
- Internet connectivity can provide up to 10x improvement in factuality
- All models support web search - simply append the suffix to any model name
Authorizations

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Body

application/json

Parameters for chat completion

model
The model to use for completion. Append ':online' for web search ($0.005/request) or ':online/linkup-deep' for deep web search ($0.05/request).

Examples:
"chatgpt-4o-latest"
"chatgpt-4o-latest:online"
"chatgpt-4o-latest:online/linkup-deep"
"claude-3-5-sonnet-20241022:online"
messages
Array of message objects with role and content

stream
Whether to stream the response

temperature
Controls randomness (0-2)

max_tokens
Maximum number of tokens to generate

top_p
Nucleus sampling parameter (0-1)

frequency_penalty
Penalty for frequency of tokens (-2 to 2)

presence_penalty
Penalty for presence of tokens (-2 to 2)

cache_control
Cache control settings for Claude models only
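Putting the body parameters together, a full request sketch. The endpoint URL and token are placeholders, and the parameter values merely illustrate the documented ranges:

```python
url = "https://api.example.com/v1/chat/completions"  # placeholder endpoint
headers = {
    "Authorization": "Bearer YOUR_API_KEY",  # placeholder token
    "Content-Type": "application/json",
}
payload = {
    "model": "claude-3-5-sonnet-20241022:online",
    "messages": [{"role": "user", "content": "What's new in AI this week?"}],
    "stream": False,
    "temperature": 0.7,        # 0-2
    "max_tokens": 1024,
    "top_p": 1.0,              # 0-1
    "frequency_penalty": 0.0,  # -2 to 2
    "presence_penalty": 0.0,   # -2 to 2
}

# Send with any HTTP client, e.g.:
# import requests
# response = requests.post(url, json=payload, headers=headers)
```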