Overview
Large Language Models are limited by their context window. As conversations grow, models forget details, degrade in quality, or hit hard limits. Context Memory solves this with lossless, hierarchical compression of your entire conversation history, enabling unlimited-length coding sessions and conversations while preserving full awareness.

The Problem
Traditional memory solutions are semantic and store general facts. They miss episodic memory: recalling specific events at the right level of detail. Simple summarization drops critical details, while RAG surfaces isolated chunks without surrounding context. Without proper episodic memory:
- Important details get lost during summarization
- Conversations are cut short when context limits are reached
- Agents lose track of previous work
How It Works
Context Memory builds a tree where upper levels contain summaries and lower levels preserve verbatim detail. Relevant sections are expanded while others remain compressed (see the sketch after this list):
- High-level summaries provide overall context
- Mid-level sections explain relationships
- Verbatim details are retrieved precisely when needed
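To make the idea concrete, here is a minimal, illustrative sketch of this kind of summary tree in Python. The `MemoryNode` structure, its field names, and the relevance predicate are assumptions for illustration, not Polychat's actual implementation:

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Illustrative only: node/field names are assumptions, not Polychat's schema.
@dataclass
class MemoryNode:
    summary: str                       # compressed view of everything below this node
    children: List["MemoryNode"] = field(default_factory=list)
    verbatim: List[str] = field(default_factory=list)  # leaf-level raw messages

def render(node: MemoryNode, relevant: Callable[[MemoryNode], bool]) -> List[str]:
    """Walk the tree, expanding only branches the predicate marks relevant.

    Irrelevant branches contribute just their summary, so the rendered
    context stays small while verbatim detail survives where it matters.
    """
    if not relevant(node):
        return [node.summary]
    if not node.children:              # leaf: emit the raw messages
        return node.verbatim
    out = [node.summary]               # keep the summary as orienting context
    for child in node.children:
        out.extend(render(child, relevant))
    return out

# Example: expand only branches whose summary mentions "auth".
tree = MemoryNode(
    summary="Project chat history",
    children=[
        MemoryNode(summary="Discussed auth bug",
                   verbatim=["user: token expires early", "assistant: clock skew fix"]),
        MemoryNode(summary="Unrelated CSS tweaks",
                   verbatim=["user: change button color"]),
    ],
)
print(render(tree, lambda n: "auth" in n.summary or n.summary == "Project chat history"))
```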
Benefits
- For Developers: Long coding sessions without losing context; agents learn from past mistakes; documentation retains project-wide context
- For Conversations: Extended discussions with continuity; research that compounds; complex problem-solving with full history
Use Cases
- Role‑playing and Storytelling: Preserve 500k+ tokens of story history while delivering 8k–20k tokens of perfectly relevant context
- Software Development: Summaries keep the big picture; verbatim code snippets are restored only when needed—no overload, no omissions
Using Context Memory
You can enable Context Memory in the `POST /v1/chat/completions` endpoint in two ways (both shown in the sketch after this list):
- Model suffix: Append `:memory` to any model name
- Header: Add `memory: true`
- Combine: Use with web search via `:online:memory`
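As a sketch, here is how both options might look using Python's `requests`. The base URL and API key are placeholders; only the `:memory` suffix, the `memory: true` header, and the `/v1/chat/completions` path come from this page:

```python
import requests

BASE_URL = "https://api.example.com"  # placeholder; substitute your provider's base URL
HEADERS = {"Authorization": "Bearer sk-..."}  # placeholder API key

# Option 1: append the ":memory" suffix to any model name.
resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    headers=HEADERS,
    json={
        "model": "gpt-4o:memory",
        "messages": [{"role": "user", "content": "Where did we leave off?"}],
    },
)
print(resp.json())

# Option 2: send the memory header instead of the suffix.
resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    headers={**HEADERS, "memory": "true"},
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Where did we leave off?"}],
    },
)
print(resp.json())
```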
Retention and Caching
- Default retention: 30 days.
- Configure via model suffix: `:memory-<days>`, where `<days>` is 1..365
  - Example: `gpt-4o:memory-90`
- Configure via header: `memory_expiration_days: <days>` (1..365)
  - Example: `memory_expiration_days: 90`
- Precedence: If both suffix and header are provided, the header value takes precedence for retention (see the sketch after this list).
- Data lifecycle: The compressed chat state is retained server‑side for the configured period (or until you delete the conversation). When you delete conversations locally, no memory data persists on Polychat’s systems.
- Caching: Compressed memory messages are not currently cached; compression runs on every request. This keeps correctness and privacy easy to reason about. Caching strategies may be introduced later.
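A sketch of the precedence rule, again with a placeholder base URL and key: the model suffix asks for 30 days of retention while the header asks for 90, so the effective retention is 90 days:

```python
import requests

BASE_URL = "https://api.example.com"  # placeholder; use your provider's base URL

resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    headers={
        "Authorization": "Bearer sk-...",
        "memory_expiration_days": "90",   # header wins: retention is 90 days
    },
    json={
        "model": "gpt-4o:memory-30",      # suffix asks for 30 days
        "messages": [{"role": "user", "content": "Continue our refactor."}],
    },
)
print(resp.status_code)
```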
Technical Details
Context Memory is implemented as a B‑tree with lossless compression over message histories. Upper nodes store summaries; leaves retain verbatim excerpts relevant to recent turns. Retrieval returns details contextualized by their summaries, unlike RAG, which returns isolated chunks. Using messages as identifiers (sketched after this list) supports:
- Natural conversation branching
- Easy reversion to earlier states
- No complex indexing
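The branching and reversion properties fall out of treating each message id as a pointer into the tree. A minimal Python sketch (hypothetical structures, not Polychat's internals):

```python
from dataclasses import dataclass
from typing import Optional

# Each message points at its parent, so branching and reverting are just
# choosing a different head message -- no separate index is needed.
@dataclass(frozen=True)
class Message:
    id: str
    parent: Optional[str]   # id of the preceding message, None for the root
    text: str

log = {m.id: m for m in [
    Message("m1", None, "user: design the schema"),
    Message("m2", "m1", "assistant: here is v1"),
    Message("m3", "m2", "user: try a different approach"),   # branch A
    Message("m3b", "m2", "user: actually, extend v1"),       # branch B, same parent
]}

def history(head: str) -> list[str]:
    """Walk parent pointers back to the root; reverting = picking an older head."""
    out = []
    cur: Optional[str] = head
    while cur is not None:
        out.append(log[cur].text)
        cur = log[cur].parent
    return list(reversed(out))

print(history("m3"))   # branch A
print(history("m3b"))  # branch B shares m1..m2 without re-indexing
```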
Privacy & Partnership
We partner with Polychat to provide this technology.
- API usage of Context Memory does not send data to Google Analytics or use cookies
- Only your conversation messages are sent to Polychat for compression
- No email, IP address, or other metadata is shared beyond prompts
- When you delete conversations locally, no memory data persists on Polychat’s systems
Pricing
- Input Processing: $5.00 per million tokens
- Output Generation: $10.00 per million tokens
- Typical Usage: 8k–20k tokens per session
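A quick worked example at these rates (the 2k output figure is an illustrative assumption):

```python
# Cost at the listed rates: $5 per 1M input tokens, $10 per 1M output tokens.
INPUT_RATE = 5.00 / 1_000_000
OUTPUT_RATE = 10.00 / 1_000_000

def session_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A session at the top of the typical 8k-20k input range, with 2k of output
# (the output figure is an assumption for illustration):
print(f"${session_cost(20_000, 2_000):.4f}")  # $0.1200
```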
Getting Started
- Append `:memory` to any model name
- Or send the `memory: true` header
- Optionally combine with other features like `:online`
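Putting it together, a minimal sketch using the OpenAI Python SDK, assuming the service is OpenAI-compatible (the base URL and key are placeholders):

```python
from openai import OpenAI

# Assumption: an OpenAI-compatible endpoint; replace base_url and api_key.
client = OpenAI(base_url="https://api.example.com/v1", api_key="sk-...")

resp = client.chat.completions.create(
    model="gpt-4o:online:memory",   # web search + Context Memory via suffixes
    messages=[{"role": "user", "content": "Pick up where we left off."}],
)
print(resp.choices[0].message.content)
```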