Overview
The standalone Context Memory endpoint compresses an entire conversation into a single memory message. This endpoint does not run a model; it returns the compressed memory message and usage so you can pipe it into your own chat completion request or store it.
- No model inference is performed
- Pass your `messages` array and optional settings
Authentication
`Authorization: Bearer YOUR_API_KEY`
or `x-api-key: YOUR_API_KEY`
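Either header form authenticates the request. A minimal sketch in Python; `auth_headers` is an illustrative helper, not part of the API, and `YOUR_API_KEY` is a placeholder:

```python
def auth_headers(api_key: str, use_bearer: bool = True) -> dict:
    """Return request headers using either documented auth scheme.

    The two forms are interchangeable; pick one per request.
    """
    if use_bearer:
        return {"Authorization": f"Bearer {api_key}"}
    return {"x-api-key": api_key}
```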
Request
Headers
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY or x-api-key: YOUR_API_KEY
memory_expiration_days: <1..365> (optional; overrides body; defaults to 30)
Body
- `messages` (required): OpenAI-style messages. `user`, `assistant`, `system`, `tool`, and `function` roles are accepted. Assistant `tool_calls` are ignored during compression.
- `expiration_days` (optional): 1..365; default 30. If both header and body are provided, the header takes precedence.
- `model_context_limit` (optional): context target for compression. Default 128k; values below 10k are clamped internally.
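The body parameters above can be assembled as a plain JSON-serializable dict. A sketch; `build_memory_request` is an illustrative helper, and only the documented `expiration_days` range is validated client-side:

```python
def build_memory_request(
    messages: list[dict],
    expiration_days: int = 30,
    model_context_limit: int = 128_000,
) -> dict:
    """Assemble the request body for the Context Memory endpoint.

    `expiration_days` must be 1..365 (default 30). Values of
    `model_context_limit` below 10k are clamped server-side, so no
    client-side clamping is attempted here.
    """
    if not 1 <= expiration_days <= 365:
        raise ValueError("expiration_days must be in 1..365")
    return {
        "messages": messages,
        "expiration_days": expiration_days,
        "model_context_limit": model_context_limit,
    }
```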
Response
Success (200)
- `messages`: the single memory-compressed message array to use as your full context in a chat completion request
- `usage`: token usage. When available, `prompt_tokens_details.cached_tokens` indicates discounted cached input tokens
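Since the endpoint runs no model itself, the returned `messages` array is meant to become the context of your next chat completion request. A sketch of that handoff, assuming the response has been parsed to JSON; `to_chat_request` and its parameters are illustrative, not part of the API:

```python
def to_chat_request(memory_response: dict, model: str, next_user_message: str) -> dict:
    """Build a chat completion body from a Context Memory response.

    The compressed `messages` array replaces the full conversation
    history; the next user turn is appended after it.
    """
    return {
        "model": model,
        "messages": memory_response["messages"]
        + [{"role": "user", "content": next_user_message}],
    }
```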
Error Examples
400 Bad Request
401 Unauthorized
402 Payment Required
429 Too Many Requests
Pricing & Billing
- Non-cached input tokens: $5.00 / 1M
- Cached input tokens: $2.50 / 1M (when applicable)
- Output tokens: $10.00 / 1M
When the compressed messages are then sent to `/v1/chat/completions`, model costs are billed separately.
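The rates above can be applied to the returned `usage` object. A sketch that assumes OpenAI-style usage field names (`prompt_tokens`, `prompt_tokens_details.cached_tokens`, `completion_tokens`); `memory_cost_usd` is an illustrative helper:

```python
def memory_cost_usd(usage: dict) -> float:
    """Estimate the cost of one Context Memory call from its usage object.

    Cached input tokens, when reported, are billed at the discounted
    $2.50/1M rate; remaining input at $5.00/1M; output at $10.00/1M.
    """
    prompt = usage.get("prompt_tokens", 0)
    cached = usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)
    output = usage.get("completion_tokens", 0)
    return (
        (prompt - cached) * 5.00 / 1_000_000
        + cached * 2.50 / 1_000_000
        + output * 10.00 / 1_000_000
    )
```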
Retention
- Default retention: 30 days
- Configure via body `expiration_days` or header `memory_expiration_days`
- The header value takes precedence over the body when both are supplied