Large Language Models are limited by their context window. As conversations grow, models forget details, degrade in quality, or hit hard limits. Context Memory solves this with lossless, hierarchical compression of your entire conversation history, enabling unlimited-length coding sessions and conversations while preserving full awareness.
Traditional memory solutions are semantic and store general facts. They miss episodic memory: recalling specific events at the right level of detail. Simple summarization drops critical details, while RAG surfaces isolated chunks without surrounding context.

Without proper episodic memory:
Important details get lost during summarization
Conversations are cut short when context limits are reached
Context Memory builds a tree where upper levels contain summaries and lower levels preserve verbatim detail. Relevant sections are expanded while others remain compressed:
High-level summaries provide overall context
Mid-level sections explain relationships
Verbatim details are retrieved precisely when needed
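A minimal sketch of that shape, with hypothetical names (`MemoryNode`, `render`) rather than the actual implementation: each node carries a compressed summary, leaves keep the verbatim messages they cover, and only expanded branches are rendered in full.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryNode:
    """One node in the hierarchical memory tree (hypothetical shape)."""
    summary: str                                         # compressed description of everything below
    children: list["MemoryNode"] = field(default_factory=list)
    verbatim: list[str] = field(default_factory=list)    # raw messages, kept only at leaves
    expanded: bool = False                               # whether this branch is rendered in full

def render(node: MemoryNode, depth: int = 0) -> list[str]:
    """Emit summaries for collapsed branches and full detail for expanded ones."""
    lines = [("  " * depth) + node.summary]
    if node.expanded:
        for child in node.children:
            lines.extend(render(child, depth + 1))
        for message in node.verbatim:
            lines.append(("  " * (depth + 1)) + message)
    return lines
```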
Example from a coding session:
```
Token estimation function refactoring
├── Initial user request
├── Refactoring to support integer inputs
├── Error: "exceeds the character limit"
│   └── Fixed by changing test params from strings to integers
└── Variable name refactoring
```
Ask, “What errors did we encounter?” and the relevant section expands automatically—no overload, no missing context.
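Reusing the `MemoryNode`/`render` sketch above, a crude keyword match stands in for the real relevance scoring, just to show the effect of such a query: the matching branch unfolds to its verbatim detail while everything else stays summarized.

```python
# The session tree from the example above, expressed as MemoryNode values.
root = MemoryNode(
    summary="Token estimation function refactoring",
    children=[
        MemoryNode(summary="Initial user request"),
        MemoryNode(summary="Refactoring to support integer inputs"),
        MemoryNode(
            summary='Error: "exceeds the character limit"',
            verbatim=["Fixed by changing test params from strings to integers"],
        ),
        MemoryNode(summary="Variable name refactoring"),
    ],
)

def expand_relevant(node: MemoryNode, query: str) -> bool:
    """Expand branches whose summaries overlap the query; a crude stand-in
    for the relevance decisions the real system makes."""
    keywords = {w.strip('?.,"').lower().rstrip("s") for w in query.split()}
    hit = any(k and k in node.summary.lower() for k in keywords)
    for child in node.children:
        hit = expand_relevant(child, query) or hit
    node.expanded = hit
    return hit

expand_relevant(root, "What errors did we encounter?")
print("\n".join(render(root)))   # only the Error branch unfolds to its verbatim fix
```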
Context Memory is implemented as a B-tree with lossless compression over message histories. Upper nodes store summaries; leaves retain verbatim excerpts relevant to recent turns. Retrieval returns details contextualized by their summaries, unlike RAG, which returns isolated chunks.

Using messages as identifiers supports:
Natural conversation branching
Easy reversion to earlier states
No complex indexing
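To illustrate why that is enough (a sketch under assumed names, not the service's actual storage): if every node records the span of message IDs it covers, reverting to an earlier state is just pruning nodes past the cut-off.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SpanNode:
    """Hypothetical node keyed by the span of messages it covers."""
    message_ids: tuple[int, int]                        # (first, last) message in this subtree
    summary: str
    children: list["SpanNode"] = field(default_factory=list)

def revert_to(node: SpanNode, last_kept_id: int) -> Optional[SpanNode]:
    """Restore the tree as it stood when `last_kept_id` was the newest message:
    subtrees covering only later messages are dropped. Because nodes are keyed
    by message IDs, there is no separate index to rebuild."""
    first, last = node.message_ids
    if first > last_kept_id:
        return None                                     # entire subtree is newer than the cut point
    kept = [c for c in (revert_to(child, last_kept_id) for child in node.children) if c]
    # In the real system the summary would be recomputed for the truncated span.
    return SpanNode((first, min(last, last_kept_id)), node.summary, kept)
```

Branching works the same way under this sketch: revert to the shared prefix, then append the new messages as fresh subtrees.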
Compression targets 8k–20k tokens of output—about 10% of Claude’s context window—while preserving access to full history.
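A sketch of how such a budget could be enforced over the `MemoryNode` tree from the earlier sketch. Only the 20k ceiling comes from the stated target; the 4-characters-per-token estimate and the collapse-deepest-first policy are assumptions for illustration. Nothing is deleted, so collapsed branches can be expanded again later.

```python
TARGET_MAX_TOKENS = 20_000            # upper end of the stated 8k-20k output target

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)     # crude ~4 characters/token heuristic

def fit_to_budget(root: MemoryNode) -> str:
    """Collapse the deepest expanded branches until the rendered history fits
    the budget; the full history stays in the tree for later expansion."""
    while True:
        rendered = "\n".join(render(root))
        if estimate_tokens(rendered) <= TARGET_MAX_TOKENS:
            return rendered
        candidates = list(_expanded_nodes(root, 0))
        if not candidates:
            return rendered           # nothing left to collapse
        _, deepest = max(candidates, key=lambda pair: pair[0])
        deepest.expanded = False

def _expanded_nodes(node: MemoryNode, depth: int):
    if node.expanded:
        yield depth, node
    for child in node.children:
        yield from _expanded_nodes(child, depth + 1)
```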