Overview

The standalone Context Memory endpoint compresses an entire conversation into a single memory message. It does not run a model; it returns the compressed memory message and token usage, which you can pass to your own chat completion request or store for later.
  • No model inference is performed
  • Pass your messages array and optional settings

Authentication

  • Authorization: Bearer YOUR_API_KEY or
  • x-api-key: YOUR_API_KEY

Request

Headers

  • Content-Type: application/json
  • Authorization: Bearer YOUR_API_KEY or x-api-key: YOUR_API_KEY
  • memory_expiration_days: <1..365> (optional); overrides the body's expiration_days when both are supplied. Retention defaults to 30 days when neither is set.

Body

{
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Summarize our previous discussion and continue." }
  ],
  "expiration_days": 45,
  "model_context_limit": 128000
}
  • messages (required): OpenAI-style messages. user, assistant, system, tool, and function roles are accepted. Assistant tool_calls are ignored during compression.
  • expiration_days (optional): 1..365; default 30. If both header and body are provided, the header takes precedence.
  • model_context_limit (optional): Context target for compression. Default 128k; values below 10k are clamped internally.
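The constraints above can be checked on the client before sending a request. The following is a minimal sketch, not part of the API: buildMemoryRequestBody is a hypothetical helper name, and the requirement that expiration_days be an integer is an assumption (the documentation only states the range 1..365). The server remains the source of truth for validation.

```javascript
// Hypothetical client-side helper (not provided by the API).
// Builds the JSON body for the memory endpoint and rejects inputs
// that the documented constraints would not allow.
function buildMemoryRequestBody(messages, options = {}) {
  const { expirationDays, modelContextLimit } = options;

  // The API answers 400 for an empty or missing messages array.
  if (!Array.isArray(messages) || messages.length === 0) {
    throw new Error('messages must be a non-empty array');
  }

  const body = { messages };

  if (expirationDays !== undefined) {
    // Assumption: whole days only. The documented range is 1..365.
    if (!Number.isInteger(expirationDays) || expirationDays < 1 || expirationDays > 365) {
      throw new RangeError('expiration_days must be an integer from 1 to 365');
    }
    body.expiration_days = expirationDays;
  }

  if (modelContextLimit !== undefined) {
    // Passed through unchanged; the documentation says small values
    // are clamped on the server side.
    body.model_context_limit = modelContextLimit;
  }

  return body;
}
```

Omitted options are left out of the body entirely, so the server-side defaults (30 days, 128k) apply.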

Response

Success (200)

{
  "messages": [
    { "role": "system", "content": "<compressed-context>..." }
  ],
  "usage": {
    "prompt_tokens": 51234,
    "completion_tokens": 1234,
    "total_tokens": 52468,
    "prompt_tokens_details": {
      "cached_tokens": 4096
    }
  }
}
  • messages: An array containing the single compressed memory message. Use it as the full context in a chat completion request
  • usage: Token usage. When available, prompt_tokens_details.cached_tokens indicates discounted cached input tokens
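As a rough illustration of how the returned messages array might feed a later chat completion request, the sketch below builds a request body from a memory response. The helper name toChatCompletionBody and the model value are placeholders, and the { model, messages } body shape is assumed from the OpenAI-style convention the endpoint follows.

```javascript
// Hypothetical helper (not provided by the API).
// Turns a memory endpoint response into a body for /v1/chat/completions.
function toChatCompletionBody(memoryResponse, model) {
  const { messages } = memoryResponse;

  // Guard against malformed or error responses before reusing them.
  if (!Array.isArray(messages) || messages.length === 0) {
    throw new Error('memory response does not contain a messages array');
  }

  // The compressed memory message stands in for the whole conversation.
  return { model, messages };
}

// Example with the response shape shown above; 'YOUR_MODEL' is a placeholder.
const exampleBody = toChatCompletionBody(
  { messages: [{ role: 'system', content: '<compressed-context>...' }], usage: {} },
  'YOUR_MODEL'
);
```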

Error Examples

400 Bad Request
{ "error": "messages must be a non-empty array" }
401 Unauthorized
{ "error": "Invalid session" }
402 Payment Required
{ "error": "Insufficient balance" }
429 Too Many Requests
{ "error": "Rate limit exceeded. Please wait before sending another request." }
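One way a client might react to these responses is sketched below. classifyMemoryError and its action labels are hypothetical names chosen for illustration. Treating only 429 as retryable is an assumption based on its message text; the documentation does not specify retry behavior.

```javascript
// Hypothetical helper (not provided by the API).
// Maps the documented error statuses to a suggested client action.
function classifyMemoryError(status, payload) {
  // Error bodies have the form { "error": "..." }; fall back to the
  // status code if the body is missing or shaped differently.
  const message =
    payload && typeof payload.error === 'string' ? payload.error : `HTTP ${status}`;

  const actions = {
    400: 'fix-request',       // e.g. empty messages array
    401: 'check-credentials', // invalid or missing API key
    402: 'add-balance',       // insufficient balance
    429: 'retry-later',       // rate limited
  };

  return {
    status,
    message,
    action: actions[status] || 'unknown',
    // Assumption: only rate limiting is worth retrying automatically.
    retryable: status === 429,
  };
}
```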

Pricing & Billing

  • Non-cached input tokens: $5.00 / 1M
  • Cached input tokens: $2.50 / 1M (when applicable)
  • Output tokens: $10.00 / 1M
Note: This endpoint charges only for memory compression. If you later call /v1/chat/completions, model costs are billed separately.
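To show how the rates above combine, here is a small sketch that estimates the charge from a usage object. The helper name estimateMemoryCostUsd is hypothetical, and the calculation assumes cached_tokens is a subset of prompt_tokens (the usual OpenAI-style convention), which the documentation does not state explicitly. Actual billing may differ.

```javascript
// Published rates in USD per one million tokens.
const RATES_USD_PER_MILLION = { input: 5.0, cachedInput: 2.5, output: 10.0 };

// Hypothetical helper (not provided by the API).
// Estimates the compression charge for one response's usage object.
function estimateMemoryCostUsd(usage) {
  const details = usage.prompt_tokens_details || {};
  const cached = details.cached_tokens || 0;

  // Assumption: cached tokens are already counted inside prompt_tokens.
  const nonCached = Math.max(usage.prompt_tokens - cached, 0);

  const totalMicroDollars =
    nonCached * RATES_USD_PER_MILLION.input +
    cached * RATES_USD_PER_MILLION.cachedInput +
    usage.completion_tokens * RATES_USD_PER_MILLION.output;

  return totalMicroDollars / 1e6;
}

// Usage values taken from the sample success response.
const sampleCost = estimateMemoryCostUsd({
  prompt_tokens: 51234,
  completion_tokens: 1234,
  total_tokens: 52468,
  prompt_tokens_details: { cached_tokens: 4096 },
});
```

Under that assumption, the sample response works out to 47,138 non-cached input tokens, 4,096 cached input tokens, and 1,234 output tokens, or roughly $0.258.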

Retention

  • Default retention: 30 days
  • Configure via body expiration_days or header memory_expiration_days
  • Header value takes precedence over body when both are supplied
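The precedence rule can be mirrored on the client to predict which retention value will apply. This is a sketch only: resolveRetentionDays is a hypothetical name, and the actual resolution happens on the server.

```javascript
const DEFAULT_RETENTION_DAYS = 30;

// Hypothetical helper (not provided by the API).
// Mirrors the documented order: header first, then body, then the default.
function resolveRetentionDays({ headerValue, bodyValue } = {}) {
  // Header values travel as strings, so convert to a number.
  if (headerValue !== undefined && headerValue !== null) {
    return Number(headerValue);
  }
  if (bodyValue !== undefined && bodyValue !== null) {
    return bodyValue;
  }
  return DEFAULT_RETENTION_DAYS;
}
```

For instance, a request that sends memory_expiration_days: 45 in the headers and expiration_days: 60 in the body would resolve to 45 under this rule.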

Examples

const res = await fetch('https://nano-gpt.com/api/v1/memory', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
    'memory_expiration_days': '45'
  },
  body: JSON.stringify({
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'Optimize our previous plan and continue.' }
    ]
  })
});

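// Error responses carry a JSON body of the form { "error": "..." }
if (!res.ok) {
  const { error } = await res.json();
  throw new Error(`Memory request failed (${res.status}): ${error}`);
}

const { messages, usage } = await res.json();
// Use `messages` as the full context for a subsequent /v1/chat/completions call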