Context Memory (Standalone)

Overview

The standalone Context Memory endpoint compresses an entire conversation into a single memory message. This endpoint does not run a model. It returns the compressed memory message and usage so you can pipe it into your own chat completion request or store it.

No model inference is performed
Pass your messages array and optional settings

Authentication

Send your API key in either of these headers:

Authorization: Bearer YOUR_API_KEY
x-api-key: YOUR_API_KEY
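
As a minimal sketch of the two header styles, the helper below builds a headers object for either one. The function name `buildAuthHeaders` and the `scheme` argument are illustrative assumptions, not part of the API.

```javascript
// Build request headers for either documented authentication style.
// `buildAuthHeaders` and the `scheme` values are hypothetical names for this sketch.
function buildAuthHeaders(apiKey, scheme = 'bearer') {
  if (typeof apiKey !== 'string' || apiKey.length === 0) {
    throw new Error('apiKey must be a non-empty string');
  }
  const headers = { 'Content-Type': 'application/json' };
  if (scheme === 'bearer') {
    headers['Authorization'] = `Bearer ${apiKey}`;
  } else if (scheme === 'x-api-key') {
    headers['x-api-key'] = apiKey;
  } else {
    throw new Error(`unknown scheme: ${scheme}`);
  }
  return headers;
}
```

Pass the result as the `headers` option of a `fetch` call.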

Request

Headers

Content-Type: application/json
Authorization: Bearer YOUR_API_KEY or x-api-key: YOUR_API_KEY
memory_expiration_days: <1..365> (optional); overrides expiration_days in the body; defaults to 30

Body

{
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Summarize our previous discussion and continue." }
  ],
  "expiration_days": 45,
  "model_context_limit": 128000
}

messages (required): OpenAI-style messages. user, assistant, system, tool, and function roles are accepted. Assistant tool_calls are ignored during compression.
expiration_days (optional): 1..365; default 30. If both header and body are provided, the header takes precedence.
model_context_limit (optional): Context target for compression. Default 128k; values below 10k are clamped internally.
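
The parameter rules above can be checked on the client before sending. This is a sketch only: `buildMemoryBody` and its camelCase option names are hypothetical, and the integer checks are an assumption (the documentation states only the 1..365 range).

```javascript
// Assemble a request body and validate the documented ranges locally.
// `buildMemoryBody`, `expirationDays`, and `modelContextLimit` are hypothetical names.
function buildMemoryBody(messages, options = {}) {
  if (!Array.isArray(messages) || messages.length === 0) {
    throw new Error('messages must be a non-empty array');
  }
  const body = { messages };
  const { expirationDays, modelContextLimit } = options;
  if (expirationDays !== undefined) {
    // Assumption: whole days only; the docs give the range 1..365.
    if (!Number.isInteger(expirationDays) || expirationDays < 1 || expirationDays > 365) {
      throw new RangeError('expiration_days must be an integer from 1 to 365');
    }
    body.expiration_days = expirationDays;
  }
  if (modelContextLimit !== undefined) {
    if (!Number.isInteger(modelContextLimit) || modelContextLimit <= 0) {
      throw new RangeError('model_context_limit must be a positive integer');
    }
    body.model_context_limit = modelContextLimit;
  }
  return body;
}
```

Omitted options are left out of the body so the server applies its own defaults (30 days, 128k).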

Response

Success (200)

{
  "messages": [
    { "role": "system", "content": "<compressed-context>..." }
  ],
  "usage": {
    "prompt_tokens": 51234,
    "completion_tokens": 1234,
    "total_tokens": 52468,
    "prompt_tokens_details": {
      "cached_tokens": 4096
    }
  }
}

messages: An array containing the single compressed memory message. Use it as the full context in a chat completion request.
usage: Token usage for the compression. When available, prompt_tokens_details.cached_tokens reports the cached input tokens, which are billed at the discounted rate.
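
One way to pipe the returned messages into a chat completion request is sketched below. The helper name `buildChatPayload` is hypothetical, appending a new user turn after the memory message is an assumption about typical usage, and the `model` value is whatever model identifier you would normally send.

```javascript
// Build an OpenAI-style chat completion payload from the compressed memory.
// `buildChatPayload` is a hypothetical helper; appending a user turn is an assumption.
function buildChatPayload(memoryMessages, nextUserContent, model) {
  if (!Array.isArray(memoryMessages) || memoryMessages.length === 0) {
    throw new Error('memoryMessages must be a non-empty array');
  }
  return {
    model,
    messages: [
      ...memoryMessages, // the compressed context returned by the memory endpoint
      { role: 'user', content: nextUserContent },
    ],
  };
}
```

The resulting object can be serialized with `JSON.stringify` and sent as the body of a /v1/chat/completions request.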

Error Examples

400 Bad Request

{ "error": "messages must be a non-empty array" }

401 Unauthorized

{ "error": "Invalid session" }

402 Payment Required

{ "error": "Insufficient balance" }

429 Too Many Requests

{ "error": "Rate limit exceeded. Please wait before sending another request." }
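
A client might map the four documented statuses to handling decisions along these lines. The helper `classifyMemoryError`, its `kind` labels, and the choice to treat only 429 as retryable are assumptions of this sketch, not behavior specified by the API.

```javascript
// Map a failed response to a handling decision.
// `classifyMemoryError` and the `kind` labels are hypothetical names.
function classifyMemoryError(status, body) {
  // Error bodies are documented as { "error": "<message>" }.
  const message = body && typeof body.error === 'string' ? body.error : `HTTP ${status}`;
  switch (status) {
    case 400: return { kind: 'bad_request', retryable: false, message };
    case 401: return { kind: 'unauthorized', retryable: false, message };
    case 402: return { kind: 'insufficient_balance', retryable: false, message };
    case 429: return { kind: 'rate_limited', retryable: true, message }; // wait, then retry
    default: return { kind: 'unknown', retryable: false, message };
  }
}
```

For example, `classifyMemoryError(res.status, await res.json())` can be called whenever `res.ok` is false.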

Pricing & Billing

Non-cached input tokens: $5.00 / 1M
Cached input tokens: $2.50 / 1M (when applicable)
Output tokens: $10.00 / 1M

Note: This endpoint only charges for memory compression. If you later call /v1/chat/completions, model costs are billed separately.
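
The rates above can be applied to a `usage` object to estimate the charge for one call. This sketch assumes that `cached_tokens` is a subset of `prompt_tokens` (the usual OpenAI-style convention); the function and constant names are hypothetical.

```javascript
// Listed prices in USD per one million tokens.
const RATES_PER_MILLION = { input: 5.0, cachedInput: 2.5, output: 10.0 };

// Estimate the compression charge in USD for one response's `usage` object.
// Assumption: cached_tokens is counted inside prompt_tokens.
function estimateMemoryCost(usage) {
  const cached = usage.prompt_tokens_details?.cached_tokens ?? 0;
  const nonCached = Math.max(usage.prompt_tokens - cached, 0);
  return (
    nonCached * RATES_PER_MILLION.input +
    cached * RATES_PER_MILLION.cachedInput +
    usage.completion_tokens * RATES_PER_MILLION.output
  ) / 1e6;
}
```

For the sample response shown earlier (51,234 prompt tokens of which 4,096 are cached, and 1,234 completion tokens), this works out to 47,138 × $5 + 4,096 × $2.50 + 1,234 × $10 per million, or about $0.258.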

Retention

Default retention: 30 days
Configure via body expiration_days or header memory_expiration_days
Header value takes precedence over body when both are supplied
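
The precedence rule can be mirrored on the client to predict which retention applies to a request. `resolveExpirationDays` is a hypothetical helper for illustration; the server remains the source of truth.

```javascript
// Predict the effective retention in days: header, then body, then the 30-day default.
// `resolveExpirationDays` is a hypothetical name; header values arrive as strings.
function resolveExpirationDays({ headerValue, bodyValue } = {}) {
  if (headerValue !== undefined) return Number(headerValue);
  if (bodyValue !== undefined) return bodyValue;
  return 30;
}
```

With a header of '45' and a body value of 10, the result is 45; with neither set, it is 30.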

Examples

const res = await fetch('https://nano-gpt.com/api/v1/memory', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
    'memory_expiration_days': '45'
  },
  body: JSON.stringify({
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'Optimize our previous plan and continue.' }
    ]
  })
});

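if (!res.ok) {
  // Error responses carry a JSON body of the form { "error": "..." }
  const { error } = await res.json();
  throw new Error(`Memory request failed (${res.status}): ${error}`);
}

const { messages, usage } = await res.json();
// Use `messages` as the full context for a subsequent /v1/chat/completions call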
