Overview
The standalone Context Memory endpoint compresses an entire conversation into a single memory message. This endpoint does not run a model; it returns the compressed memory message and usage information so you can pipe it into your own chat completion request or store it.
- No model inference is performed
- Pass your `messages` array and optional settings
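A minimal sketch of preparing a compression request. The endpoint URL is not stated in this document, so the path below is purely illustrative; only the body shape follows the docs.

```python
import json

# Hypothetical URL -- the actual endpoint path is not specified in this doc.
CONTEXT_MEMORY_URL = "https://api.example.com/v1/context-memory"

def build_compression_request(messages, expiration_days=None, model_context_limit=None):
    """Build the JSON body for the standalone Context Memory endpoint.

    Only `messages` is required; the optional settings are included
    only when given, matching the Body parameters described below.
    """
    body = {"messages": messages}
    if expiration_days is not None:
        body["expiration_days"] = expiration_days
    if model_context_limit is not None:
        body["model_context_limit"] = model_context_limit
    return json.dumps(body)

# Compress a long conversation; the returned memory message array then
# becomes the full context of your own /v1/chat/completions request.
payload = build_compression_request(
    [{"role": "user", "content": "Summarize our project discussion so far."}],
    expiration_days=30,
)
```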
Authentication
`Authorization: Bearer YOUR_API_KEY` or `x-api-key: YOUR_API_KEY`
Request
Headers
- `Content-Type: application/json`
- `Authorization: Bearer YOUR_API_KEY` or `x-api-key: YOUR_API_KEY`
- `memory_expiration_days: <1..365>` (optional; overrides the body value; defaults to 30)
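A small helper for assembling these headers, assuming either authentication style is acceptable as described above:

```python
def build_headers(api_key, use_x_api_key=False, memory_expiration_days=None):
    """Assemble request headers for the Context Memory endpoint.

    Either auth style works; `memory_expiration_days` is optional and,
    per the docs, overrides any `expiration_days` in the request body.
    """
    headers = {"Content-Type": "application/json"}
    if use_x_api_key:
        headers["x-api-key"] = api_key
    else:
        headers["Authorization"] = f"Bearer {api_key}"
    if memory_expiration_days is not None:
        if not 1 <= memory_expiration_days <= 365:
            raise ValueError("memory_expiration_days must be in 1..365")
        headers["memory_expiration_days"] = str(memory_expiration_days)
    return headers
```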
Body
- `messages` (required): OpenAI-style messages. The `user`, `assistant`, `system`, `tool`, and `function` roles are accepted. Assistant `tool_calls` are ignored during compression.
- `expiration_days` (optional): 1..365; defaults to 30. If both the header and the body value are provided, the header takes precedence.
- `model_context_limit` (optional): context-size target for compression. Defaults to 128k; values below 10k are clamped internally.
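An illustrative body exercising the accepted roles and both optional fields; the message contents are placeholders:

```python
ACCEPTED_ROLES = {"user", "assistant", "system", "tool", "function"}

# Example body. Assistant tool_calls are ignored during compression,
# so they can be left in place without being stripped client-side.
body = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What did we decide about caching?"},
        {"role": "assistant", "content": "We agreed to cache embeddings."},
        {"role": "tool", "content": "{\"cache_hit\": true}"},
    ],
    "expiration_days": 45,          # 1..365, default 30
    "model_context_limit": 128000,  # values below 10k are clamped internally
}

assert all(m["role"] in ACCEPTED_ROLES for m in body["messages"])
```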
Response
Success (200)
- `messages`: the single memory-compressed message array to use as your full context in a chat completion request
- `usage`: token usage. When available, `prompt_tokens_details.cached_tokens` indicates discounted cached input tokens
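A sketch of consuming a successful response; the sample payload below is illustrative, but the field names follow the Success (200) description. Since `prompt_tokens_details.cached_tokens` is only present when available, it is read defensively:

```python
def extract_memory_and_cached_tokens(response):
    """Pull the compressed message array and the cached-token count
    (defaulting to 0 when not reported) from a 200 response."""
    memory_messages = response["messages"]
    usage = response.get("usage", {})
    cached = usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)
    return memory_messages, cached

# Illustrative response; real values come from the endpoint.
sample = {
    "messages": [{"role": "system", "content": "Compressed conversation memory."}],
    "usage": {
        "prompt_tokens": 12000,
        "prompt_tokens_details": {"cached_tokens": 4000},
    },
}
memory, cached = extract_memory_and_cached_tokens(sample)
# Use `memory` as the entire `messages` array of your next
# /v1/chat/completions request.
```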
Error Examples
400 Bad Request
401 Unauthorized
402 Payment Required
429 Too Many Requests
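One way to handle the documented error codes client-side; this retry policy is a suggestion, not part of the API contract:

```python
def should_retry(status_code):
    """Coarse handling for the documented error codes.

    400/401/402 indicate a problem with the request, API key, or
    account balance and will not succeed on a blind retry; 429 is
    rate limiting and is worth retrying after a backoff.
    """
    if status_code in (400, 401, 402):
        return False
    if status_code == 429:
        return True
    raise ValueError(f"unhandled status: {status_code}")
```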
Pricing & Billing
- Non-cached input tokens: $5.00 / 1M
- Cached input tokens: $2.50 / 1M (when applicable)
- Output tokens: $10.00 / 1M
When the compressed memory is then used in a `/v1/chat/completions` request, model costs are billed separately.
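The published rates translate directly into a per-request estimate; a sketch (Context Memory tokens only, excluding the separately billed model costs):

```python
def estimate_cost_usd(non_cached_input, cached_input, output_tokens):
    """Estimate Context Memory cost from the published per-1M rates:
    $5.00/1M non-cached input, $2.50/1M cached input, $10.00/1M output."""
    return (
        non_cached_input * 5.00 / 1_000_000
        + cached_input * 2.50 / 1_000_000
        + output_tokens * 10.00 / 1_000_000
    )

# 100k non-cached input + 50k cached input + 10k output
# = $0.50 + $0.125 + $0.10 = $0.725
cost = estimate_cost_usd(100_000, 50_000, 10_000)
```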
Retention
- Default retention: 30 days
- Configure via the `expiration_days` body field or the `memory_expiration_days` header
- The header value takes precedence over the body value when both are supplied
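The precedence and default rules above can be captured in a small resolver; a sketch of the documented behavior:

```python
def effective_expiration_days(header_value=None, body_value=None):
    """Resolve the retention window: the header overrides the body
    when both are supplied; with neither, the default is 30 days."""
    value = header_value if header_value is not None else body_value
    if value is None:
        return 30
    if not 1 <= value <= 365:
        raise ValueError("expiration must be in 1..365")
    return value
```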