
vinzify/PCM-Proof-Carrying-Memory-Compression


PCM: Proof-Carrying Memory for LLM Agents

Reduce LLM costs and hallucinations by compressing conversation state into evidence-bound "memory packs" with verification and repair.

  • Evidence-bound answers — every claim has a citation or is marked [UNVERIFIED]
  • Low-token memory packs — ~11x cheaper per grounded-correct answer
  • Automatic repair loop — fallback retrieval when verification fails

Quickstart

pip install -e .
python examples/quickstart.py

Using the components directly (`build` and `generate` are awaited, so they must run inside an async function):

import asyncio

from pcm import PCMConfig
from pcm.stores import InMemorySourceStore, InMemoryMemoryStore
from pcm.extraction import MemoryExtractor
from pcm.runtime import PackBuilder, Generator, Verifier, FallbackHandler


async def main():
    # Initialize
    config = PCMConfig()
    source_store = InMemorySourceStore()
    memory_store = InMemoryMemoryStore()

    # Ingest session context
    session_id = "demo"
    text = """
Project deadline is 2026-02-15.
Authentication must use OAuth2.
Do not store PII in logs.
"""
    # ... ingest and extract atoms ...
    # (construction of pack_builder and generator is elided here;
    #  see examples/quickstart.py)

    # Query with PCM: build an evidence pack, then generate a cited answer
    query = "What is the deadline?"
    pack = await pack_builder.build(session_id, query)
    output = await generator.generate(query, pack)

    print(output.answer)
    # "The deadline is 2026-02-15. [mem_deadline_001]"

    print(output.claims)
    # [Claim(text="The deadline is 2026-02-15", memory_refs=["mem_deadline_001"])]


asyncio.run(main())

API

| Endpoint | Method | Description |
|---|---|---|
| /ingest | POST | Ingest text into a session |
| /query | POST | Query with evidence-bound response |

POST /ingest

{
  "session_id": "my_session",
  "text": "Project deadline is 2026-02-15. Use OAuth2 for auth."
}

POST /query

{
  "session_id": "my_session",
  "query": "What is the deadline?"
}

Response

{
  "answer": "The deadline is 2026-02-15. [mem_deadline_001]",
  "claims": [
    {"text": "The deadline is 2026-02-15", "refs": ["mem_deadline_001"]}
  ],
  "verification": {"ok": true},
  "tokens_used": 423
}
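A minimal client for the two endpoints can be sketched with the Python standard library alone. Only the paths and JSON bodies come from the examples above; the base URL `http://localhost:8000` and the helper names `build_request`, `post_json`, and `uncited_claims` are hypothetical, since this README does not say where the server listens.

```python
import json
import urllib.request

# Hypothetical base URL: the README does not document the server address.
BASE_URL = "http://localhost:8000"


def build_request(base_url: str, path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for one of the PCM endpoints."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(base_url: str, path: str, payload: dict) -> dict:
    """Send the request and decode the JSON response (performs network I/O)."""
    req = build_request(base_url, path, payload)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def uncited_claims(response: dict) -> list:
    """Return the text of every claim in a /query response that has no refs."""
    return [c["text"] for c in response.get("claims", []) if not c.get("refs")]
```

Against a running server, `post_json(BASE_URL, "/ingest", {...})` followed by `post_json(BASE_URL, "/query", {...})` returns the response dict shown above; `uncited_claims` should then come back empty whenever `verification.ok` is true.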

Benchmark Results

Internal Benchmark (v15 Stable)

| Metric | v15 | vs Full-Context |
|---|---|---|
| Grounded accuracy (G-Acc) | 71.0% | +45.2 pts |
| Unsupported claims per question | 0.00 | -0.07 |
| Tokens per grounded-correct answer | 596.1 | 11.1x cheaper |
| Fallback recovery (when triggered) | 60% | |
| Refusal correctness (missing info) | 100% | +100 pts |

Compared to full_context_cited, PCM is ~11.1x cheaper per grounded-correct answer (Tokens/G-Correct), not per raw query.
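The headline cost figure is a ratio of two per-answer costs. The sketch below assumes that G-Acc is the share of questions answered both correctly and with full citation support, and that Tokens/G-Correct is total tokens for a run divided by the number of such answers; docs/EVALUATION_BENCHMARK.md is the authority on the exact definitions. The `QuestionResult` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QuestionResult:
    # Hypothetical per-question record; field names are illustrative only.
    correct: bool   # answer matches the gold answer
    grounded: bool  # every claim is supported by cited evidence
    tokens: int     # total tokens spent on this question


def grounded_accuracy(results: list[QuestionResult]) -> float:
    """Fraction of questions answered both correctly and with full grounding."""
    hits = sum(1 for r in results if r.correct and r.grounded)
    return hits / len(results)


def tokens_per_grounded_correct(results: list[QuestionResult]) -> float:
    """Total tokens for the whole run divided by grounded-correct answers."""
    hits = sum(1 for r in results if r.correct and r.grounded)
    if hits == 0:
        return float("inf")  # no grounded-correct answers: cost is unbounded
    return sum(r.tokens for r in results) / hits


def cost_ratio(baseline: list[QuestionResult], system: list[QuestionResult]) -> float:
    """How many times cheaper `system` is than `baseline` per grounded-correct answer."""
    return tokens_per_grounded_correct(baseline) / tokens_per_grounded_correct(system)
```

Because only grounded-correct answers count in the denominator, the ratio rewards accuracy as well as brevity: a system that uses 10x fewer tokens per question and also answers twice as many questions correctly comes out 20x cheaper per grounded-correct answer. Reading the v15 table the same way (taking the right-hand column as differences from the baseline), the full-context baseline's G-Acc is about 25.8% (71.0 - 45.2), and its cost is roughly 596.1 × 11.1 ≈ 6,600 tokens per grounded-correct answer.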

LoCoMo Long-Context Evaluation (Recent)

Evaluated on the LoCoMo benchmark (199 questions over long conversational memory):

| Metric | PCM | Notes |
|---|---|---|
| Accuracy | 24.1% | Baseline established |
| Grounded Accuracy | 15.6% | Evidence-bound answers |
| Wrong Refusals | 38.2% | Reduced via EVENT_TIME extraction |
| Unsupported Claims/Q | 0.31 | Low hallucination rate |

Recent improvements:

  • ✅ Added EVENT_TIME atom extraction for temporal queries ("when did X happen?")
  • ✅ Implemented refusal override with expanded retrieval
  • ✅ Added semantic embeddings for better fact retrieval
  • ✅ Fixed citation aggregation and scoring
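The refusal override can be pictured as a retry with a wider retrieval window: rank memory atoms by embedding similarity, answer from the top k, and if the model refuses, retry once with a larger k before accepting the refusal. This is a sketch under stated assumptions, not PCM's code: embeddings are taken as precomputed vectors, similarity is plain cosine similarity, a refusal is signalled by the answer function returning `None`, and `retrieve`, `answer_with_override`, and the default k values are hypothetical.

```python
import math
from typing import Callable, Optional

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Vector, atoms: dict[str, Vector], k: int) -> list[str]:
    """Return the ids of the k atoms most similar to the query."""
    ranked = sorted(atoms, key=lambda atom_id: cosine(query_vec, atoms[atom_id]), reverse=True)
    return ranked[:k]


def answer_with_override(
    query_vec: Vector,
    atoms: dict[str, Vector],
    answer_fn: Callable[[list[str]], Optional[str]],
    k: int = 2,
    expanded_k: int = 8,
) -> Optional[str]:
    """Answer from the top-k atoms; on refusal (None), retry once with expanded_k."""
    answer = answer_fn(retrieve(query_vec, atoms, k))
    if answer is None and expanded_k > k:
        # Refusal override: widen retrieval before accepting "not enough information".
        answer = answer_fn(retrieve(query_vec, atoms, expanded_k))
    return answer
```

Retrying only on refusal keeps the common case cheap: the wider, more expensive context is paid for only when the narrow pack was not enough.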

See docs/EVALUATION_BENCHMARK.md for evaluation methodology.

# Reproduce benchmark
python benchmarks/run_benchmark.py \
  --dataset benchmarks/datasets/benchmark_v10.jsonl \
  --out benchmarks/results/benchmark_v15.json

See docs/benchmark.md for methodology.

How It Works

  1. Ingest — chunk text, extract structured atoms (facts, constraints, decisions)
  2. Pack — build memory pack under strict token budget
  3. Generate — produce answer with claim-level citations
  4. Verify — check claims against evidence
  5. Repair — if verification fails, retrieve more evidence, then patch the answer or regenerate it
  6. Return — answer + audit trail

See docs/ARCHITECTURE.md for details.
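Steps 2 to 6 form a bounded loop. The sketch below is an illustration, not PCM's implementation: it packs atoms greedily by relevance score until a token budget is reached (tokens are approximated by whitespace-separated words), generates, verifies, and on failure merges in fallback evidence and regenerates, recording each attempt in an audit trail. It skips ingestion (step 1) and shows only the regenerate path, not patch repair. The `Atom` and `Result` types, the injected callables, and the `repair_budget` and `max_repairs` parameters are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Atom:
    atom_id: str
    text: str
    score: float  # relevance to the current query (hypothetical field)


@dataclass
class Result:
    answer: str
    ok: bool
    audit: list[str] = field(default_factory=list)


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def build_pack(atoms: list[Atom], budget: int) -> list[Atom]:
    """2. Pack: greedily keep the highest-scoring atoms that fit in the budget."""
    pack, used = [], 0
    for atom in sorted(atoms, key=lambda a: a.score, reverse=True):
        cost = approx_tokens(atom.text)
        if used + cost <= budget:
            pack.append(atom)
            used += cost
    return pack


def run_query(
    query: str,
    atoms: list[Atom],
    generate: Callable[[str, list[Atom]], str],
    verify: Callable[[str, list[Atom]], bool],
    fallback_retrieve: Callable[[str], list[Atom]],
    budget: int = 50,
    repair_budget: int = 100,
    max_repairs: int = 1,
) -> Result:
    pack = build_pack(atoms, budget)
    audit = [f"pack={[a.atom_id for a in pack]}"]
    draft = generate(query, pack)                          # 3. Generate
    for attempt in range(max_repairs + 1):
        if verify(draft, pack):                            # 4. Verify
            return Result(draft, True, audit)              # 6. Return with audit trail
        if attempt == max_repairs:
            break
        # 5. Repair: add fallback evidence (deduplicated by id) and regenerate.
        merged = {a.atom_id: a for a in pack + fallback_retrieve(query)}
        pack = build_pack(list(merged.values()), repair_budget)
        audit.append(f"repair {attempt + 1}: pack={[a.atom_id for a in pack]}")
        draft = generate(query, pack)
    audit.append("verification failed")
    return Result(draft, False, audit)
```

Capping the loop with `max_repairs` keeps the worst-case cost predictable, which matters when the point of the system is a strict token budget.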

Guarantees (v15)

| Guarantee | Description |
|---|---|
| Citation enforcement | Every factual claim has an evidence reference or is marked [UNVERIFIED] |
| Unsupported blocking | Claims without evidence are blocked by the verifier |
| Refusal over fabrication | Missing information triggers a refusal instead of a hallucinated answer |
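The three guarantees can be read as checks over a list of claims. The sketch below assumes claims arrive as text plus a list of refs (as in the /query response) and that evidence is a mapping from memory ids such as `mem_deadline_001` to source text. It labels uncited claims `[UNVERIFIED]`, blocks claims whose refs are not in the evidence, and refuses when no claim survives. The `enforce` function, the `Claim` and `Verdict` types, and the refusal wording are hypothetical; a real verifier would also check that the cited text supports the claim, which this sketch does not attempt.

```python
from dataclasses import dataclass

# Hypothetical refusal wording.
REFUSAL = "I don't have enough information to answer that."


@dataclass
class Claim:
    text: str
    refs: list[str]


@dataclass
class Verdict:
    ok: bool
    answer: str
    blocked: list[str]


def enforce(claims: list[Claim], evidence: dict[str, str]) -> Verdict:
    kept, blocked = [], []
    for claim in claims:
        if not claim.refs:
            # Citation enforcement: an uncited claim survives only if labeled.
            kept.append(f"{claim.text}. [UNVERIFIED]")
        elif all(ref in evidence for ref in claim.refs):
            kept.append(claim.text + "." + "".join(f" [{r}]" for r in claim.refs))
        else:
            # Unsupported blocking: a ref that points at no known evidence.
            blocked.append(claim.text)
    if not kept:
        # Refusal over fabrication: nothing left to say that is backed or labeled.
        return Verdict(ok=False, answer=REFUSAL, blocked=blocked)
    return Verdict(ok=not blocked, answer=" ".join(kept), blocked=blocked)
```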

Project Status

| Status | Track |
|---|---|
| v15 stable | Production-ready baseline |
| 🔄 Active development | LoCoMo evaluation, EVENT_TIME extraction |
| 🧪 v16 experimental | Canonical constraint schema (not merged) |

Recent Progress:

  • ✅ EVENT_TIME atom extraction with deterministic relative time resolution
  • ✅ Semantic embeddings for improved fact retrieval
  • ✅ Refusal override mechanism for better recall
  • ✅ LoCoMo benchmark integration and evaluation
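Deterministic relative time resolution means a phrase like "yesterday" or "3 days ago" is resolved against the date of the message it appears in, not the date the question is asked, so the same transcript always yields the same EVENT_TIME. The sketch below handles a few fixed patterns; the pattern set, the function name `resolve_event_time`, treating "last week" as exactly seven days earlier, and returning `None` for unrecognized phrasing are assumptions for illustration. Months are left out on purpose because they have no fixed length in days.

```python
import re
from datetime import date, timedelta
from typing import Optional

_UNIT_DAYS = {"day": 1, "week": 7}
_AGO = re.compile(r"^(\d+)\s+(day|week)s?\s+ago$")


def resolve_event_time(phrase: str, anchor: date) -> Optional[date]:
    """Resolve a relative time phrase against the message date `anchor`."""
    p = phrase.strip().lower()
    if p == "today":
        return anchor
    if p == "yesterday":
        return anchor - timedelta(days=1)
    if p == "last week":
        return anchor - timedelta(days=7)
    m = _AGO.match(p)
    if m:
        n, unit = int(m.group(1)), m.group(2)
        return anchor - timedelta(days=n * _UNIT_DAYS[unit])
    try:
        return date.fromisoformat(p)  # absolute ISO dates pass through unchanged
    except ValueError:
        return None  # unknown phrasing: leave the event time unresolved
```

Because the result depends only on the phrase and the anchor date, re-running extraction over the same session never shifts an event's date.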

Roadmap:

  • Broader extraction coverage for rare phrasings
  • Improved temporal event extraction and resolution
  • PostgreSQL + pgvector persistence
  • Enhanced citation faithfulness scoring

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE file for details.
