Reduce LLM costs and hallucinations by compressing conversation state into evidence-bound "memory packs" with verification and repair.
- Evidence-bound answers — every claim has a citation or is marked [UNVERIFIED]
- Low-token memory packs — ~11x cheaper per grounded-correct answer
- Automatic repair loop — fallback retrieval when verification fails
```bash
pip install -e .
python examples/quickstart.py
```

```python
from pcm import PCMConfig
from pcm.stores import InMemorySourceStore, InMemoryMemoryStore
from pcm.extraction import MemoryExtractor
from pcm.runtime import PackBuilder, Generator, Verifier, FallbackHandler

# Initialize
config = PCMConfig()
source_store = InMemorySourceStore()
memory_store = InMemoryMemoryStore()

# Ingest session context
session_id = "demo"
text = """
Project deadline is 2026-02-15.
Authentication must use OAuth2.
Do not store PII in logs.
"""

# ... ingest and extract atoms ...

# Query with PCM
pack = await pack_builder.build(session_id, "What is the deadline?")
output = await generator.generate("What is the deadline?", pack)

print(output.answer)
# "The deadline is 2026-02-15. [mem_deadline_001]"

print(output.claims)
# [Claim(text="The deadline is 2026-02-15", memory_refs=["mem_deadline_001"])]
```

| Endpoint | Method | Description |
|---|---|---|
| /ingest | POST | Ingest text into a session |
| /query | POST | Query with evidence-bound response |
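Answers embed their evidence references inline in square brackets (e.g. `[mem_deadline_001]`). If a client needs to pull those references out of a raw answer string, a simple regex suffices; this is an illustrative sketch, not part of the `pcm` API:

```python
import re

# Hypothetical helper (not part of the pcm API): extract bracketed
# evidence references like [mem_deadline_001] from an answer string.
REF_PATTERN = re.compile(r"\[([a-z]+_[a-z0-9_]+)\]")

def extract_refs(answer: str) -> list[str]:
    """Return all evidence references cited in an answer, in order."""
    return REF_PATTERN.findall(answer)

print(extract_refs("The deadline is 2026-02-15. [mem_deadline_001]"))
# ['mem_deadline_001']
```

In practice, clients should prefer the structured `claims` field in the response, which carries the same references without string parsing.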
POST /ingest

```json
{
  "session_id": "my_session",
  "text": "Project deadline is 2026-02-15. Use OAuth2 for auth."
}
```

POST /query

```json
{
  "session_id": "my_session",
  "query": "What is the deadline?"
}
```

Response:

```json
{
  "answer": "The deadline is 2026-02-15. [mem_deadline_001]",
  "claims": [
    {"text": "The deadline is 2026-02-15", "refs": ["mem_deadline_001"]}
  ],
  "verification": {"ok": true},
  "tokens_used": 423
}
```

| Metric | v15 | vs Full-Context |
|---|---|---|
| Grounded accuracy (G-Acc) | 71.0% | +45.2 pts |
| Unsupported claims per question | 0.00 | -0.07 |
| Tokens per grounded-correct | 596.1 | 11.1x cheaper |
| Fallback recovery (when triggered) | 60% | — |
| Refusal correctness (missing info) | 100% | +100 pts |
Compared to full_context_cited, PCM is ~11.1x cheaper per grounded-correct answer (Tokens/G-Correct), not per raw query.
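The Tokens/G-Correct metric divides total tokens spent across a run by the number of answers that were both grounded and correct, so a method is charged for every token but credited only for useful answers. A sketch of that definition (illustrative numbers, not the benchmark harness):

```python
def tokens_per_grounded_correct(total_tokens: int, grounded_correct: int) -> float:
    """Cost metric: total tokens spent divided by grounded-correct answers.

    A method can look cheap per raw query yet be expensive per *useful*
    answer; this metric charges all tokens to the answers that count.
    """
    if grounded_correct == 0:
        return float("inf")  # no useful answers: infinitely expensive
    return total_tokens / grounded_correct

# Illustrative numbers (not the benchmark's actual totals):
print(tokens_per_grounded_correct(59_610, 100))  # 596.1
```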
Evaluated on the LoCoMo benchmark (199 questions, long conversational memory):
| Metric | PCM | Progress |
|---|---|---|
| Accuracy | 24.1% | Baseline established |
| Grounded Accuracy | 15.6% | Evidence-bound answers |
| Wrong Refusals | 38.2% | Reduced via EVENT_TIME extraction |
| Unsupported Claims/Q | 0.31 | Low hallucination rate |
Recent improvements:
- ✅ Added EVENT_TIME atom extraction for temporal queries ("when did X happen?")
- ✅ Implemented refusal override with expanded retrieval
- ✅ Added semantic embeddings for better fact retrieval
- ✅ Fixed citation aggregation and scoring
See docs/EVALUATION_BENCHMARK.md for evaluation methodology.
```bash
# Reproduce benchmark
python benchmarks/run_benchmark.py \
  --dataset benchmarks/datasets/benchmark_v10.jsonl \
  --out benchmarks/results/benchmark_v15.json
```

See docs/benchmark.md for methodology.
- Ingest — chunk text, extract structured atoms (facts, constraints, decisions)
- Pack — build memory pack under strict token budget
- Generate — produce answer with claim-level citations
- Verify — check claims against evidence
- Repair — if failed, retrieve evidence → patch repair or regenerate
- Return — answer + audit trail
See docs/ARCHITECTURE.md for details.
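The generate → verify → repair loop above can be sketched as control flow. Every function passed in here is a stub standing in for a real PCM component (Generator, Verifier, FallbackHandler); only the shape of the loop reflects the pipeline:

```python
def answer_query(query, pack, generate, verify, retrieve_more, max_repairs=1):
    """Sketch of the pipeline loop: generate, verify, repair, return with audit trail.

    generate(query, pack) -> output, verify(output) -> bool, and
    retrieve_more(query, pack) -> pack are caller-supplied stubs here.
    """
    audit = ["generate"]
    output = generate(query, pack)
    repairs = 0
    while not verify(output):
        if repairs >= max_repairs:
            audit.append("unverified")  # repair budget exhausted
            return output, audit
        # Verification failed: widen retrieval, then regenerate.
        pack = retrieve_more(query, pack)
        output = generate(query, pack)
        audit += ["fallback_retrieval", "regenerate"]
        repairs += 1
    audit.append("verified")
    return output, audit
```

The audit trail is returned alongside the answer so callers can see whether the first generation passed verification or a fallback round was needed.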
| Guarantee | Description |
|---|---|
| Citation enforcement | Every factual claim has an evidence reference or is marked [UNVERIFIED] |
| Unsupported blocking | Claims without evidence are blocked by verifier |
| Refusal over fabrication | Missing info triggers refusal instead of hallucination |
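The first two guarantees can be illustrated with a toy check (a simplified sketch, not the real Verifier, which also checks that each reference actually supports the claim text): a claim either carries at least one evidence reference, or it is marked [UNVERIFIED] and excluded from the verified set.

```python
from dataclasses import dataclass, field

# Toy claim type for illustration; the real Claim type lives in the pcm package.
@dataclass
class Claim:
    text: str
    memory_refs: list = field(default_factory=list)

def enforce_citations(claims):
    """Split claims into (verified, blocked); blocked claims get [UNVERIFIED]."""
    verified, blocked = [], []
    for claim in claims:
        if claim.memory_refs:
            verified.append(claim)
        else:
            blocked.append(Claim(text=f"{claim.text} [UNVERIFIED]"))
    return verified, blocked
```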
| Status | Track |
|---|---|
| ✅ v15 stable | Production-ready baseline |
| 🔄 Active development | LoCoMo evaluation, EVENT_TIME extraction |
| 🧪 v16 experimental | Canonical constraint schema (not merged) |
Recent Progress:
- ✅ EVENT_TIME atom extraction with deterministic relative time resolution
- ✅ Semantic embeddings for improved fact retrieval
- ✅ Refusal override mechanism for better recall
- ✅ LoCoMo benchmark integration and evaluation
Roadmap:
- Broader extraction coverage for rare phrasings
- Improved temporal event extraction and resolution
- PostgreSQL + pgvector persistence
- Enhanced citation faithfulness scoring
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
MIT License - see LICENSE file for details.