Overview

Large Language Models are limited by their context window. As conversations grow, models forget details, degrade in quality, or hit hard limits. Context Memory solves this with lossless, hierarchical compression of your entire conversation history, enabling effectively unlimited coding sessions and conversations while keeping the model aware of everything that came before.

The Problem

Traditional memory solutions are semantic and store general facts. They miss episodic memory: recalling specific events at the right level of detail. Simple summarization drops critical details, while RAG surfaces isolated chunks without surrounding context. Without proper episodic memory:
  • Important details get lost during summarization
  • Conversations are cut short when context limits are reached
  • Agents lose track of previous work

How It Works

Context Memory builds a tree where upper levels contain summaries and lower levels preserve verbatim detail. Relevant sections are expanded while others remain compressed:
  • High-level summaries provide overall context
  • Mid-level sections explain relationships
  • Verbatim details are retrieved precisely when needed
Example from a coding session:
Token estimation function refactoring
├── Initial user request
├── Refactoring to support integer inputs
├── Error: "exceeds the character limit"
│   └── Fixed by changing test params from strings to integers
└── Variable name refactoring
Ask, “What errors did we encounter?” and the relevant section expands automatically—no overload, no missing context.
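As a rough illustration, the tree can be pictured as nested nodes that each carry a summary, with verbatim text kept at the leaves; only subtrees relevant to the current query get expanded. This is a hypothetical sketch of the idea, not the actual implementation:

from dataclasses import dataclass, field

@dataclass
class MemoryNode:
    summary: str                       # compressed view of this subtree
    verbatim: str | None = None        # exact text, kept only at leaves
    children: list["MemoryNode"] = field(default_factory=list)

def render(node: MemoryNode, relevant) -> list[str]:
    """Emit summaries, expanding only subtrees the query deems relevant."""
    if relevant(node) and node.children:
        out = [node.summary]
        for child in node.children:
            out += render(child, relevant)
        return out
    if relevant(node) and node.verbatim:
        return [node.verbatim]         # full detail for the matching leaf
    return [node.summary]              # everything else stays compressed

# e.g., asking about errors expands only the error-related subtree:
#   render(root, lambda n: "error" in n.summary.lower())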

Benefits

  • For Developers: Long coding sessions without losing context; agents learn from past mistakes; documentation retains project-wide context
  • For Conversations: Extended discussions with continuity; research that compounds; complex problem-solving with full history

Use Cases

  • Role‑playing and Storytelling: Preserve 500k+ tokens of story history while delivering 8k–20k tokens of perfectly relevant context
  • Software Development: Summaries keep the big picture; verbatim code snippets are restored only when needed—no overload, no omissions

Using Context Memory

You can enable Context Memory on the POST /v1/chat/completions endpoint in two ways, both shown in the snippets below, and combine it with other features:
  • Model suffix: Append :memory to any model name
  • Header: Send memory: true as a request header
  • Combine: Stack with web search via :online:memory
import requests

BASE_URL = "https://nano-gpt.com/api/v1"
API_KEY = "YOUR_API_KEY"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

# Use the :memory suffix
payload = {
    "model": "chatgpt-4o-latest:memory",
    "messages": [
        {"role": "user", "content": "Remember our plan and continue from step 3."}
    ]
}

r = requests.post(f"{BASE_URL}/chat/completions", headers=headers, json=payload)
print(r.json())
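Alternatively, send the memory: true header instead of the suffix (continuing the snippet above; note that HTTP header values are strings):

# Use the memory header instead of the :memory suffix
headers_with_memory = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
    "memory": "true"
}

payload = {
    "model": "chatgpt-4o-latest",   # no suffix needed with the header
    "messages": [
        {"role": "user", "content": "Remember our plan and continue from step 3."}
    ]
}

r = requests.post(f"{BASE_URL}/chat/completions", headers=headers_with_memory, json=payload)
print(r.json())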

Technical Details

Context Memory is implemented as a B‑tree with lossless compression over message histories. Upper nodes store summaries; leaves retain verbatim excerpts relevant to recent turns. Retrieval returns details contextualized by their summaries, unlike RAG, which returns isolated chunks. Using the messages themselves as identifiers supports:
  • Natural conversation branching
  • Easy reversion to earlier states
  • No complex indexing
Compression targets 8k–20k tokens of output, about 10% of Claude's 200k-token context window, while preserving access to the full history.
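To see how this differs from RAG, here is an equally hypothetical continuation of the MemoryNode sketch from "How It Works": a matching leaf comes back prefixed by the summaries on its path from the root, so the verbatim detail arrives with its surrounding context rather than as an isolated chunk.

def retrieve(root: MemoryNode, matches) -> list[str]:
    """Return a matching leaf's verbatim text preceded by its ancestor
    summaries, i.e. detail contextualized by the levels above it."""
    def walk(node: MemoryNode, path: list[str]):
        path = path + [node.summary]
        if node.verbatim and matches(node.verbatim):
            return path[:-1] + [node.verbatim]   # ancestor summaries, then raw text
        for child in node.children:
            found = walk(child, path)
            if found:
                return found
        return None
    return walk(root, []) or [root.summary]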

Privacy & Partnership

We partner with Polychat to provide this technology.
  • API usage of Context Memory does not send data to Google Analytics or use cookies
  • Only your conversation messages are sent to Polychat for compression
  • No email, IP address, or other metadata is shared beyond prompts
  • When you delete conversations locally, no memory data persists on Polychat’s systems
See Polychat’s full privacy policy at https://polychat.co/legal/privacy.

Pricing

  • Input Processing: $5.00 per million tokens
  • Output Generation: $10.00 per million tokens
  • Typical Usage: 8k–20k tokens per session
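For a rough, illustrative estimate (assuming input covers the history sent for compression and output covers the compressed context returned): compressing 100k tokens of history into a 15k-token context would cost 100,000/1,000,000 × $5.00 + 15,000/1,000,000 × $10.00 = $0.50 + $0.15, or about $0.65.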

Getting Started

  1. Append :memory to any model name
  2. Or send the memory: true header
  3. Optionally combine with other features like :online
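For example, combining memory with web search (the :online:memory suffix noted above) only changes the model name in the earlier snippet:

payload["model"] = "chatgpt-4o-latest:online:memory"   # web search + context memory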
Context Memory ensures your AI remembers everything that matters—for coding, research, and long‑form conversations.