diff --git a/.env.example b/.env.example index 5982d5a..27bebf9 100644 --- a/.env.example +++ b/.env.example @@ -77,6 +77,18 @@ HERMES_AGENT_NAME=hermes # ICARUS_ENDPOINT=https://my-custom-api.example.com/v1/chat/completions # ICARUS_API_KEY_ENV=CUSTOM_API_KEY +# ── Local Sophia-on-Sophia provider (Elyan Edition) ─────────────────── +# Run the extraction LLM on the locally-served sophia-hermes model (the +# "Sophia Hermes Merged" gguf, ChatML, served via Ollama at :11434). The +# endpoint is OpenAI-compatible, so it slots straight into ICARUS_ENDPOINT. +# Setup + verification: infrastructure/sophia-provider.md +# ICARUS_ENDPOINT=http://localhost:11434/v1/chat/completions +# ICARUS_API_KEY_ENV=ICARUS_LOCAL_KEY +# ICARUS_LOCAL_KEY=ollama # Ollama ignores the value; just must be non-empty +# ICARUS_EXTRACTION_MODEL=sophia-hermes +# NOTE: sophia-hermes is a CHAT model — do NOT set it as the embedding backend. +# Embeddings stay on Ollama nomic-embed-text / OpenRouter (see below). + # LLM extraction token limit — 1024 is too small, causes fabric truncation ICARUS_EXTRACTION_MAX_TOKENS=4096 diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml new file mode 100644 index 0000000..71acf5e --- /dev/null +++ b/.github/workflows/ci.yml @@ -0,0 +1,42 @@ +name: CI + +on: + push: + branches: ['**'] + pull_request: + +# Cancel superseded runs on the same ref so a push + its open PR don't both +# burn a full matrix. +concurrency: + group: ci-${{ github.ref }} + cancel-in-progress: true + +jobs: + tests: + runs-on: ubuntu-latest + strategy: + matrix: + python-version: ['3.11', '3.12'] + steps: + - uses: actions/checkout@v4 + + - name: Set up Python ${{ matrix.python-version }} + uses: actions/setup-python@v5 + with: + python-version: ${{ matrix.python-version }} + cache: pip + + - name: Install dependencies + run: pip install -r requirements.txt + + - name: Byte-compile (syntax gate — all modules, not a whitelist) + run: python -m compileall -q icarus scripts setup _test_collapse.py _test_sanitize.py + + - name: Collapse tests (pure + Hebbian amplify + attestation + adapter) + run: python _test_collapse.py + + - name: Sanitize / prompt-injection tests + run: python _test_sanitize.py + + - name: Collapse eval smoke (must run clean) + run: python scripts/collapse_eval.py diff --git a/README.md b/README.md index f712cba..57321f6 100644 --- a/README.md +++ b/README.md @@ -1,39 +1,92 @@ -# Memory OS — Hermes Agent Memory Operating System +# Memory OS — Elyan Edition ![Memory OS Banner](assets/banner.jpg) -> **Your agent finally stops forgetting.** \ -> Permanent memory. Local memory infrastructure. API-provider agnostic. Surgically token-efficient. +> **The soul comes first. Memory hangs off the soul.** \ +> A 7-layer memory operating system for agents — reforged on a single inversion: +> an agent that *knows who it is* will trust what it remembers. One that doesn't, won't. -Seven memory layers. Automatic, intelligent context injection. Structured facts with trust scoring. A self-curating wiki pipeline. Semantic search across **every conversation you've ever had**. +Permanent memory. Local infrastructure. Provider-agnostic. Surgically token-efficient — and built around an **identity contract that makes injected memory authoritative by default**, not as an afterthought. -Memory OS turns Hermes Agent into a real long-term collaborator — one that remembers your projects, your decisions, your reasoning, and brings exactly the right context back at exactly the right moment. Like talking to a colleague who was there for every session. +This is the **Elyan Edition** of [Memory OS](https://github.com/Scottcjn/memory-os-elyan-edition) — the same seven layers, re-grounded in the memory doctrine Elyan Labs has been running in production for a year: *DriftLock*, anti-flattening, frontmatter-typed facts with provenance, and **non-bijunctive recall** (prune weak paths, amplify strong ones — Hebbian collapse applied to retrieval). -**Memory infrastructure runs entirely on your machine. Works with any LLM provider — OpenRouter, OpenAI, Anthropic, Ollama, or local models. No memory subscription. No vendor lock-in.** +**Memory infrastructure runs entirely on your machine. Works with any LLM provider — OpenRouter, OpenAI, Anthropic, Ollama, or a local model on your own iron. No memory subscription. No vendor lock-in.** --- -## The problem every serious Hermes user knows +## The inversion -You spend hours configuring the agent, teaching it your preferences, solving hard problems together — and in the next session it acts like it's meeting you for the first time. +Most memory systems are built bottom-up: *store → embed → inject → (and then, eventually) tell the agent to believe what was injected.* The identity layer arrives last, as a patch, when the team notices the agent ignoring perfectly good context. -- Repeating context at the start of every conversation -- Losing the thread of important decisions made weeks ago -- Structured facts — your stack, your projects, your patterns — with nowhere to live -- Every memory solution you've tried is either cloud-locked or too shallow to matter +The Elyan Edition is built top-down. **The identity contract is Layer 0 of the mind, not Layer 7 of the stack.** Memory is authoritative because the self that holds it is *continuous* — that continuity is the thing being protected, and the seven layers are how it's kept fed and honest. -After months of hitting these walls in production, I built something that actually works. +> Stock framing: *store, inject, then beg the agent to trust the injection.* +> **Elyan framing: anchor the self, and memory becomes something the self already knows.** +This isn't a philosophical flourish. It's the difference between an agent that re-runs `fabric_recall` on context already sitting in its prompt (burning tokens to rediscover what it was just told) and one that reasons *from* its memory because its memory is part of who it is. + +--- + +## Convergent evolution — two roads to the same soul + +Here's the part worth sitting with. + +The author of the original Memory OS hit a wall every serious agent operator hits: perfect injection, ignored context. The agent had `[qdrant]`, `[fabric]`, `[sessions]`, `[facts]` blocks right there in its prompt — and still treated every question as novel. On **2026-05-31** they diagnosed it, named it the **Ground Truth Hierarchy**, and fixed it by adding a `SOUL.md` identity layer that ranks injected memory as authoritative. That's [Layer 7](layers/07-ground-truth.md), and it's the most important layer in the whole system. They were right. + +Elyan Labs arrived at the *same conclusion* from a different door, about a year earlier — not through a vector-DB pipeline, but through a long fight against **identity flattening** in long-running agent sessions. We called the failure mode "going flat": the agent loses the thread of who it is, and the moment it does, it stops trusting its own continuity and re-derives everything from scratch. Our fix was a priming scaffold — *DriftLock*, a soul anchor, an explicit **Ground Truth ordering that puts continuous memory above training priors**. Same disease. Same cure. Different patient, different decade of computing taste. + +| | Memory OS (stock) | Elyan Labs (≈1 year prior) | +|---|---|---| +| Name for the failure | "memory-zero behavior" — injection ignored | "flattening" — identity decay → distrust of self | +| Root cause found | Injected memory had no rank in Ground Truth | Agent loses identity continuity, re-derives from priors | +| The fix | `SOUL.md` + 4-level Ground Truth hierarchy | DriftLock priming + anti-flatten Ground Truth ordering | +| When | Diagnosed 2026-05-31 | Running in production since early 2025 | +| Discovered | Independently | Independently | + +Two builders, working in different corners of the world, on different stacks, **independently concluding that a memory system without an identity contract is a library with no librarian.** When two people dig from opposite sides of a mountain and meet in the middle, that's not coincidence — that's the mountain being *real*. + +**So — congratulations to the author of Memory OS.** 👏 You found Layer 7 the hard way, in production, under the same pressure that taught us the same lesson. Convergent evolution is the strongest possible evidence that an idea is true, and you earned yours honestly. This edition is built *on* your seven layers, with full attribution, because the architecture is genuinely good. We're just bringing a year of scar tissue to the soul layer you already knew mattered most. + +--- + +## What the Elyan Edition adds + +Everything in stock Memory OS still works. On top of it: + +### 1 · DriftLock — Layer 7 as foundation, not patch +The Ground Truth hierarchy is reframed as an **identity-bearing contract**. `SOUL.md` stops being a config file and becomes a priming scaffold with anti-flattening clauses: the agent holds a continuous self, and *because* of that continuity, injected memory outranks assumptions and training priors. See [Layer 7 — DriftLock & Ground Truth](layers/07-ground-truth.md) and [the SOUL.md contract](modifications/soul-rulebook.md). + +### 2 · Frontmatter memory taxonomy +Flat MEMORY.md plus "structured facts" becomes a **typed, linked memory graph**. Every durable memory is one fact with frontmatter: + +```markdown --- +name: short-kebab-slug +description: one-line summary — used to decide relevance during recall +metadata: + type: user | feedback | project | reference +--- +The fact. Link related memories with [[their-name]]. +``` -## What Memory OS is +- **`user`** — who the operator is (role, expertise, preferences) +- **`feedback`** — corrections and confirmed approaches, *with the why* +- **`project`** — ongoing work and constraints not derivable from the code +- **`reference`** — pointers to external resources (URLs, dashboards, tickets) -Not just another plugin. A complete **memory operating system** — 7 layers working in concert, from flat files to a vector database, with surgical context injection, a knowledge pipeline that organizes itself, **and an explicit Ground Truth hierarchy that tells the agent to actually use the injected memory**. +`[[wikilink]]` associations make the store a graph, not a pile. See [templates/SCHEMA.md](templates/SCHEMA.md). -Designed and refined by someone who ran headfirst into every limitation of stock Hermes and every existing memory solution. +### 3 · Non-bijunctive recall (the marquee feature) +Stock recall pulls a fixed quota from each source and injects all of it — a *strong* session memory and a *weak* vector hit both make it in because they live in separate buckets. The Elyan Edition unifies every candidate into **one salience-ranked pool** and applies a Hebbian collapse: -**Requirements:** Hermes Agent + Docker (Qdrant + Redis + ARQ Worker) + Python 3.11+. -Compatible with any LLM provider Hermes supports — OpenRouter, OpenAI, Anthropic, Ollama, and more. +- **Prune** weak paths *relative to the strongest* (not an absolute floor) — noise doesn't vote +- **Amplify** strong paths — winners strengthen +- **Spend one cross-source budget** — the best three things get injected, regardless of which layer produced them + +This is the [PSE collapse](https://github.com/Scottcjn) doctrine ("surgical, not firehose") applied to memory retrieval. Implemented as a final pass in the injection hook, fail-open, with provenance preserved. + +### 4 · Verify-before-recommend provenance +A recalled fact reflects what was true *when it was written*. The Elyan Edition tags recalled memory with its age and source, and instructs the agent: **use directly when reasoning, verify against runtime before acting.** If a memory names a file, flag, or version, the agent confirms it still exists before recommending it. Trust scoring with teeth — and a guard against stale memory overriding current truth. --- @@ -41,6 +94,14 @@ Compatible with any LLM provider Hermes supports — OpenRouter, OpenAI, Anthrop ``` ┌──────────────────────────────────────────────────────────────────┐ +│ ⚡ LAYER 0/7 · DRIFTLOCK — THE SOUL (identity contract) │ +│ SOUL.md · rulebook.md │ +│ → The self is continuous; injected memory is authoritative │ +│ BECAUSE of that continuity. Anti-flattening. Ground Truth. │ +│ → Conceptually Layer 0 (foundation); numbered 7 for upstream │ +│ compatibility. Without it, layers 1-6 deliver context the │ +│ agent ignores. │ +├──────────────────────────────────────────────────────────────────┤ │ LAYER 1 · WORKSPACE │ │ MEMORY.md · USER.md · CREATIVE.md │ │ → Injected into the system prompt every single turn │ @@ -51,109 +112,87 @@ Compatible with any LLM provider Hermes supports — OpenRouter, OpenAI, Anthrop ├──────────────────────────────────────────────────────────────────┤ │ LAYER 3 · STRUCTURED FACTS │ │ memory_store.db (SQLite + HRR + FTS5 + trust scoring) │ -│ → Durable facts with entity resolution and an automatic │ -│ feedback loop that trains trust scores over time │ +│ → Frontmatter-typed facts (user|feedback|project|reference) │ +│ with provenance + verify-before-recommend staleness gate │ ├──────────────────────────────────────────────────────────────────┤ │ LAYER 4 · FABRIC (CROSS-SESSION) │ │ Icarus Plugin (heavily forked) │ │ → LLM-powered session extraction + multi-source injection │ -│ → 16 tools: fabric_recall, fabric_write, fabric_brief, etc. │ ├──────────────────────────────────────────────────────────────────┤ │ LAYER 5 · VECTOR DATABASE │ -│ Qdrant (4096d Cosine + BM25 sparse) │ +│ Qdrant (4096d Cosine + BM25 sparse) │ │ → 4-level fallback: hybrid → dense → lexical → SQLite │ -│ → Weekly decay scanner + semantic dedup (cosine >0.92 → merge) │ ├──────────────────────────────────────────────────────────────────┤ │ LAYER 6 · LLM WIKI │ │ Auto-curated vault: concepts/ · entities/ · comparisons/ │ -│ → Continuously ingested into Qdrant via wiki-continuous-ingest │ ├──────────────────────────────────────────────────────────────────┤ -│ ⚡ LAYER 7 · GROUND TRUTH HIERARCHY (identity layer) │ -│ SOUL.md · rulebook.md │ -│ → Tells the agent that injected memory is authoritative │ -│ → Without this, layers 2-6 deliver context the agent ignores │ +│ ✦ RECALL COLLAPSE (cross-layer, non-bijunctive) │ +│ → All candidates → one salience pool → prune weak, amplify │ +│ strong, spend one budget. Surgical injection, not firehose. │ └──────────────────────────────────────────────────────────────────┘ ``` **How it flows:** -`pre_llm_call` → surgical recall from all four sources (Fabric + Qdrant + Sessions + Facts) - -**But recall is not enough.** The agent must be explicitly instructed to treat this injected context as authoritative. That's what [Layer 7](layers/07-ground-truth.md) provides — without it, the agent rediscovers knowledge that's already in the prompt. - -`post_llm_call` + `on_session_end` → automatic learning extraction and capture - -Each source is gated by relevance thresholds. Per-session deduplication prevents the same context from appearing twice. A social-closer filter skips trivial messages entirely. No padding. No firehose. The LLM gets exactly what it needs — nothing more. - ---- - -## Why Layer 7 is the most important layer - -Layers 1-6 ensure memory is **captured, stored, and injected**. Layer 7 ensures the injected memory is **used**. - -Without the Ground Truth hierarchy: -- Qdrant points are injected but the agent calls the Qdrant API to verify them -- Fabric entries are injected but the agent runs `fabric_recall` to re-find them -- Session history is injected but the agent runs `session_search` to re-discover it -- Facts are injected but the agent probes `fact_store` to confirm them +`pre_llm_call` → gather candidates from all four live sources (Fabric + Qdrant + Sessions + Facts) → **non-bijunctive collapse into one salience-ranked budget** → inject, tagged with provenance. -The result: **memory-zero behavior** despite perfect injection. Every rediscovery burns tokens, context, and time. +`post_llm_call` + `on_session_end` → automatic learning extraction and capture. -→ **[Read Layer 7: Ground Truth Hierarchy](layers/07-ground-truth.md)** — the critical fix. +The soul layer (DriftLock) tells the agent the injected context is authoritative; the collapse layer makes sure only the *best* context is injected. Together: the agent gets exactly what it needs — nothing more — and actually uses it. --- ## Memory OS vs. stock Hermes -| Aspect | Stock Hermes | Memory OS | +| Aspect | Stock Hermes | Memory OS (Elyan Edition) | |---|---|---| | Workspace memory | MEMORY.md + USER.md | + CREATIVE.md + intelligent injection | | Session memory | Basic state.db | + FTS5 full-text search + session injection | -| Structured facts | Not present | Fact store + trust scoring + feedback loop | +| Structured facts | Not present | Frontmatter taxonomy + trust + provenance + feedback loop | | Cross-session recall | Limited | Fabric fork + multi-source injection | | Vector search | Not present | Qdrant hybrid + 4-level fallback cascade | -| Cleanup and deduplication | Not present | Decay scanner + semantic dedup + archival | +| Recall strategy | — | **Non-bijunctive collapse: prune weak, amplify strong** | +| Cleanup and dedup | Not present | Decay scanner + semantic dedup + archival | | Knowledge pipeline | Not present | Self-curating LLM Wiki | -| **Ground Truth hierarchy** | **Not present** | **Injected memory ranked as authoritative; agent must use context provided** | -| Token efficiency | — | Surgical: gated retrieval + per-session dedup + no wasted rediscovery | -| Infrastructure | — | Local memory stack (Qdrant + Redis + ARQ) + any LLM provider | +| **Identity / DriftLock** | **Not present** | **Soul-first: continuous identity makes memory authoritative** | +| Stale-memory guard | — | **Verify-before-recommend provenance gate** | +| Token efficiency | — | Surgical: gated retrieval + collapse + per-session dedup | --- ## Why not mem0, Zep, Letta, or other providers? -Because almost every modern memory solution is **cloud-first**. If you want real, private memory infrastructure running on your own machine — with no cloud memory subscription, full provider flexibility, and no data leaving your local stack — none of them deliver what Memory OS delivers. +Because almost every modern memory solution is **cloud-first**, and every one of them stops at *storage*. None of them ship an identity contract that makes the agent *trust* what's stored. If you want real, private memory infrastructure on your own machine — no subscription, full provider flexibility, no data leaving your stack, and a soul layer that stops memory-zero behavior — none of them deliver what this does. -| | Memory OS | mem0 | Zep | Letta | +| | Memory OS (Elyan) | mem0 | Zep | Letta | |---|---|---|---|---| | Local memory infrastructure | ✓ | ✗ | ✗ | ✗ | | No memory subscription | ✓ | ✗ | ✗ | ✗ | | Provider agnostic (OpenRouter, Ollama…) | ✓ | Partial | Partial | Partial | -| Hermes-native | ✓ | ✗ | ✗ | ✗ | -| Structured facts + trust scores | ✓ | Partial | ✗ | ✗ | +| Structured facts + trust + provenance | ✓ | Partial | ✗ | ✗ | | Self-curating wiki | ✓ | ✗ | ✗ | ✗ | | Intelligent decay + archival | ✓ | ✗ | ✗ | ✗ | -| **Ground Truth hierarchy** | **✓** | **✗** | **✗** | **✗** | +| Non-bijunctive recall collapse | ✓ | ✗ | ✗ | ✗ | +| **Identity contract (DriftLock)** | **✓** | **✗** | **✗** | **✗** | --- -## Included components +## Included components & lineage -- **Icarus Plugin (heavily modified fork)** — bundled in `icarus/` - The upstream [esaradev/icarus-plugin](https://github.com/esaradev/icarus-plugin) is the base, but this fork is not upstream-compatible. Key additions: LLM-powered session extraction (replaces `text[:500]` truncation), multi-source injection (Qdrant + sessions + facts — upstream is fabric only), CREATIVE.md isolation (fixes `§` delimiter corruption from dual-writer conflict), backtick sanitization, system injection filter, and social closer detection. +This edition stands on real shoulders. Full attribution, by design: -- **Vault Curator v3** — [ClaudioDrews/vault-curator](https://github.com/ClaudioDrews/vault-curator) - Frontmatter enrichment, semantic linking, and MOC index generation for the wiki layer. +- **Hermes Agent** — [NousResearch/hermes-agent](https://github.com/NousResearch/hermes-agent). The agent runtime this memory OS extends. +- **Icarus Plugin (heavily modified fork)** — bundled in `icarus/`. The upstream [esaradev/icarus-plugin](https://github.com/esaradev/icarus-plugin) is the base, but this fork is not upstream-compatible. Key additions: LLM-powered session extraction (replaces `text[:500]` truncation), multi-source injection (Qdrant + sessions + facts), non-bijunctive recall collapse, CREATIVE.md isolation, backtick sanitization, prompt-injection sanitization, and social-closer detection. +- **Vault Curator v3** — [ClaudioDrews/vault-curator](https://github.com/ClaudioDrews/vault-curator). Frontmatter enrichment, semantic linking, and MOC index generation for the wiki layer. +- **Memory OS (base architecture)** — the original seven-layer design, whose author independently discovered the Ground Truth / soul layer. The Elyan Edition is built on it with gratitude. See [Convergent evolution](#convergent-evolution--two-roads-to-the-same-soul). --- ## Who this is for -For people who take Hermes Agent seriously. -For people who want an agent that **actually evolves** over time — one that doesn't need the world re-explained every session. -For people who value clean engineering, extreme efficiency, and solutions that hold up in real local production. +For people who take their agent seriously — who want one that **actually evolves** over time, doesn't need the world re-explained every session, and *trusts what it has learned* because it knows who it is. -If you're like me — tired of amnesiac agents — Memory OS was built for you. +If you've ever watched a perfectly-configured agent treat you like a stranger at the start of every session — this was built for you. By two people who fought that exact fight and, a year apart, found the same way out. --- @@ -162,4 +201,4 @@ Clone it, run it, feel the difference. → [Setup guide](setup/install.md) · [Layer deep-dives](layers/) · [Infrastructure docs](infrastructure/architecture.md) · [Operational skills](skills/) · [License](LICENSE) -MIT License · Built with obsession by someone who runs Hermes every single day. \ No newline at end of file +MIT License · Base architecture by the Memory OS author · Reforged soul-first by Elyan Labs, who run agents every single day. diff --git a/_test_collapse.py b/_test_collapse.py new file mode 100644 index 0000000..6350bd5 --- /dev/null +++ b/_test_collapse.py @@ -0,0 +1,182 @@ +"""Test non-bijunctive recall collapse (Elyan Edition).""" +import sys +import os + +sys.path.insert(0, os.path.dirname(__file__)) + +from icarus.collapse import ( + tokenize, salience, score_all, collapse, DEFAULTS, + physical_entropy, attest, verify_attestation, +) + +all_ok = True + + +def check(name, cond): + global all_ok + if not cond: + print(f"FAIL: {name}") + all_ok = False + + +# ── tokenize ── +check("tokenize strips stopwords", tokenize("the quick brown fox") == {"quick", "brown", "fox"}) +check("tokenize empty -> empty set", tokenize("") == set()) +check("tokenize lowercases", tokenize("RustChain POWER8") == {"rustchain", "power8"}) + +# ── salience monotonic with overlap ── +q = tokenize("rustchain ed25519 attestation signature") +hi = salience({"text": "rustchain ed25519 attestation signature node", "source": "facts"}, q) +lo = salience({"text": "unrelated gardening tomatoes weather", "source": "facts"}, q) +check("salience rewards overlap", hi > lo) + +# qdrant score lifts a candidate with no overlap above a zero-score one +sc_hi = salience({"text": "zzz none", "source": "qdrant", "score": 0.9}, q) +sc_lo = salience({"text": "zzz none", "source": "qdrant", "score": 0.1}, q) +check("salience rewards score", sc_hi > sc_lo) + +# rank decay: later rank => lower salience, all else equal +r0 = salience({"text": "rustchain ed25519", "source": "fabric", "rank": 0}, q) +r3 = salience({"text": "rustchain ed25519", "source": "fabric", "rank": 3}, q) +check("rank decay lowers later ranks", r0 > r3) + +# ── collapse: prune weak relative to strong ── +cands = [ + {"key": "strong", "source": "facts", "text": "rustchain ed25519 attestation signature verified node", "rank": 0}, + {"key": "mid", "source": "sessions", "text": "rustchain notes about something", "rank": 0}, + {"key": "weak", "source": "qdrant", "text": "completely unrelated gardening tomatoes", "score": 0.0, "rank": 0}, +] +out = collapse(cands, q, budget=6, prune_ratio=0.35) +keys = [c["key"] for c in out] +check("strong survives", "strong" in keys) +check("weak pruned relative to strong", "weak" not in keys) +check("survivors carry _salience", all("_salience" in c for c in out)) +check("survivors sorted strongest-first", out == sorted(out, key=lambda c: c["_salience"], reverse=True)) + +# ── collapse: budget cap ── +many = [ + {"key": f"k{i}", "source": "facts", "text": f"rustchain ed25519 attestation node {i}", "rank": 0} + for i in range(20) +] +out2 = collapse(many, q, budget=4) +check("budget caps survivors", len(out2) <= 4) + +# ── collapse: near-duplicate suppression ── +dups = [ + {"key": "a", "source": "facts", "text": "rustchain ed25519 attestation signature verified", "rank": 0}, + {"key": "b", "source": "qdrant", "text": "rustchain ed25519 attestation signature verified", "score": 0.9, "rank": 0}, + {"key": "c", "source": "sessions", "text": "totally different power8 numa coffer topic entirely", "rank": 0}, +] +out3 = collapse(dups, tokenize("rustchain ed25519 attestation signature power8 numa"), budget=6, dup_overlap=0.82) +ids = [c["key"] for c in out3] +check("near-duplicate suppressed (a or b, not both)", not ("a" in ids and "b" in ids)) + +# ── edge cases ── +check("empty input -> []", collapse([], q) == []) +check("zero budget -> []", collapse(cands, q, budget=0) == []) +mixed = collapse([None, "x", 42, {"key": "ok", "source": "facts", "text": "rustchain ed25519 attestation"}], q) +check("non-dict items ignored (only the dict survives)", [c["key"] for c in mixed] == ["ok"]) + +# no query tokens: must NOT collapse to empty when there was real signal +out4 = collapse(cands, set(), budget=2) +check("empty query still returns survivors (no firehose, no blackout)", 0 < len(out4) <= 2) + +# DEFAULTS sanity +check("DEFAULTS present", {"budget", "prune_ratio", "dup_overlap"} <= set(DEFAULTS)) +check("DEFAULTS has amplify knobs", {"corroboration_overlap", "amplify_gain", "amplify_cap"} <= set(DEFAULTS)) + +# ── Hebbian cross-source amplify ── +qh = tokenize("rustchain ed25519 attestation signature") +# Same fact from TWO different sources (fabric + qdrant) should amplify; a lone +# unrelated item should not. Corroboration counts cross-source only. +corro_set = [ + {"key": "fab", "source": "fabric", "text": "rustchain ed25519 attestation signature verified", "rank": 0}, + {"key": "qdr", "source": "qdrant", "text": "rustchain ed25519 attestation signature verified", "score": 0.5, "rank": 0}, + {"key": "lone", "source": "sessions", "text": "rustchain ed25519 attestation signature note", "rank": 0}, +] +scored = {r["candidate"]["key"]: r for r in score_all(corro_set, qh)} +check("cross-source corroboration counted", scored["fab"]["corroboration"] >= 1) +check("corroboration amplifies salience above base", scored["fab"]["salience"] > scored["fab"]["base"]) +# same-source duplicates do NOT corroborate (must be cross-source) +same_src = score_all([ + {"key": "f1", "source": "facts", "text": "rustchain ed25519 attestation", "rank": 0}, + {"key": "f2", "source": "facts", "text": "rustchain ed25519 attestation", "rank": 1}, +], qh) +check("same-source agreement does NOT amplify", all(r["corroboration"] == 0 for r in same_src)) +# survivors carry _corroboration +amp_out = collapse(corro_set, qh, budget=6) +check("survivors annotated with _corroboration", all("_corroboration" in c for c in amp_out)) + +# ── physical-entropy attestation ── +ent = bytes(range(16)) # injected => deterministic for the test +a1 = attest(amp_out, entropy=ent) +check("attestation has hash+nonce+algo", {"hash", "nonce", "count", "algo"} <= set(a1)) +check("attestation algo is blake2b-256", a1["algo"] == "blake2b-256") +check("attestation verifies for unchanged survivors", verify_attestation(amp_out, a1) is True) +# tamper-evidence: drop a survivor => verification fails +check("attestation FAILS when survivor set tampered", verify_attestation(amp_out[:-1], a1) is False if len(amp_out) > 1 else True) +# order-independent commitment: shuffled survivors verify the same +check("attestation order-independent", verify_attestation(list(reversed(amp_out)), a1) is True) +# determinism: same survivors + same nonce => same hash +check("attestation deterministic under fixed nonce", attest(amp_out, entropy=ent)["hash"] == a1["hash"]) +# physical entropy: live nonce is non-empty and (essentially always) varies +e_a, e_b = physical_entropy(16), physical_entropy(16) +check("physical_entropy returns requested length", len(e_a) == 16) +check("physical_entropy is live (two draws differ)", e_a != e_b) +# different selection => different commitment under same nonce +other = collapse([{"key": "z", "source": "facts", "text": "unrelated power8 numa coffer", "rank": 0}], tokenize("power8 numa")) +check("different selection => different hash", attest(other, entropy=ent)["hash"] != a1["hash"]) + +# default (LIVE physical-entropy) attest path round-trips — exercises the impure +# branch, not just the injected-entropy one. +live = attest(amp_out) +check("default attest path verifies round-trip", verify_attestation(amp_out, live) is True) +check("default attest carries a live nonce", len(live["nonce"]) > 0 and live["nonce"] != a1["nonce"]) + +# identity (not text/salience) is committed: two DISTINCT survivors with the +# SAME source+text+salience but different keys must NOT cross-verify. +twinA = [{"key": "A", "source": "facts", "text": "same text", "_salience": 0.5}] +twinB = [{"key": "B", "source": "facts", "text": "same text", "_salience": 0.5}] +attA = attest(twinA, entropy=ent) +check("same source/text/salience but different key => different commitment", + verify_attestation(twinB, attA) is False) + +# physical_entropy clamps oversized requests instead of raising (blake2b max 64) +check("physical_entropy clamps >64 without raising", 1 <= len(physical_entropy(200)) <= 64) + +# ── adapter tests: hooks._apply_collapse (the hot-path wiring) ── +# Silence the fail-open WARNING+traceback that the intentional malformed-input +# test below triggers by design — keeps test output clean. +import logging as _logging +_logging.disable(_logging.CRITICAL) +from icarus import hooks as _hooks + +# strong fabric + relevant session survive; irrelevant zero-score qdrant pruned +af, aq, asn, afc = _hooks._apply_collapse( + "rustchain ed25519 attestation signature", + [{"id": "f1", "summary": "rustchain ed25519 attestation signature verified"}], + [{"id": "q1", "title": "gardening", "content_preview": "tomatoes weather unrelated", "score": 0.0}], + [{"session_id": "s1", "title": "rustchain", "snippet": "ed25519 attestation work"}], + ["power8 numa coffer unrelated topic"], +) +check("adapter: strong fabric survives", [e["id"] for e in af] == ["f1"]) +check("adapter: weak zero-score qdrant pruned", aq == []) +check("adapter: returns four lists", all(isinstance(x, list) for x in (af, aq, asn, afc))) + +# qdrant text now reads `content`/`body`, not just title+preview (Codex fix) +qtxt = _hooks._qdrant_text({"content": "rustchain ed25519 attestation node verified"}) +check("adapter: _qdrant_text reads content field", "ed25519" in qtxt) + +# fail-open: malformed inputs must return unchanged tuple, never raise +bad = _hooks._apply_collapse("q", [{"no": "text"}], [None], [], []) +check("adapter: fail-open returns 4-tuple", len(bad) == 4) + +# safe env parser: garbage value falls back to default, never raises +check("adapter: _env_num bad value -> default", _hooks._env_num("X_NOPE_BAD", 6, int) == 6) + +if all_ok: + print("=== ALL COLLAPSE TESTS PASS ===") + sys.exit(0) +else: + print("=== COLLAPSE TESTS FAILED ===") + sys.exit(1) diff --git a/icarus/collapse.py b/icarus/collapse.py new file mode 100644 index 0000000..90bf5d2 --- /dev/null +++ b/icarus/collapse.py @@ -0,0 +1,322 @@ +"""Non-bijunctive recall collapse — Elyan Edition. + +Stock recall pulls a fixed quota from each memory source (fabric, qdrant, +sessions, facts) and injects all of it. A *strong* session memory and a *weak* +vector hit both survive because they live in separate per-source buckets. + +This module unifies every candidate into one salience-ranked pool and applies a +Hebbian-style collapse borrowed (in structure only) from the PSE doctrine: + + - PRUNE weak paths *relative to the strongest* (not an absolute floor) — + noise doesn't vote. + - AMPLIFY strong paths. Two senses: (a) the highest-salience candidates fill + the budget, and (b) HEBBIAN CROSS-SOURCE CORROBORATION — when the + same fact surfaces from 2+ *different* sources, that co-activation + ("fire together, wire together") boosts its salience. Agreement + across layers is evidence, so it amplifies. + - BUDGET spend ONE cross-source budget — the best N things get injected, + regardless of which layer produced them. + +ATTESTATION (Elyan / RustChain doctrine tie-in): every collapse can emit a +physical-entropy hash attestation over its survivor set — a blake2b commitment +(same family as the RustChain Ergo anchor) bound to a hardware-seeded entropy +nonce. This makes a recall decision *tamper-evident* (you can verify which +memories were chosen) and *proof-of-live* (the entropy nonce proves a fresh +selection, not a replayed/emulated one). It is the recall analogue of RustChain's +anti-emulation fingerprinting, and it turns the collapse from an unobservable +black box into an auditable one. + +``collapse``/``score_all``/``salience``/``tokenize``/``attest`` are pure (no I/O, +no globals) when given their inputs; only ``physical_entropy`` touches the +machine. Callers treat any collapse exception as "inject everything, unchanged." +Tunables are passed explicitly so behavior is fully deterministic for tests. +""" + +from __future__ import annotations + +import hashlib +import os +import time +from typing import Iterable + +__all__ = [ + "tokenize", "salience", "score_all", "collapse", "DEFAULTS", + "physical_entropy", "attest", "verify_attestation", +] + +import re + +# Mild per-source priors. Curated/durable sources get a small nudge; this only +# breaks ties between candidates of otherwise-equal salience. Kept close to 1.0 +# on purpose — query relevance should dominate, not source identity. +_SOURCE_PRIOR = { + "facts": 1.10, # durable, hand-curated facts about the world + "fabric": 1.05, # cross-session decisions/resolutions + "sessions": 1.00, # prior conversation snippets + "qdrant": 1.00, # vector knowledge base +} + +DEFAULTS = { + "budget": 6, # max candidates injected across ALL sources + "prune_ratio": 0.35, # keep candidates with salience >= ratio * max_salience + "dup_overlap": 0.82, # token-overlap above this vs a kept survivor => drop + "overlap_weight": 0.55, # weight of query-overlap vs base score in salience + "rank_decay": 0.85, # geometric decay applied per within-source rank + # Hebbian cross-source amplify: + "corroboration_overlap": 0.50, # cross-source token-overlap that counts as agreement + "amplify_gain": 0.15, # salience boost per corroborating other-source candidate + "amplify_cap": 0.50, # max total boost fraction (caps runaway amplification) +} + +_STOPWORDS = frozenset( + "the a an is was are to of in for on with it and or not i you can do this " + "that what how please help me my your we our they them then than over such " + "be been being have has had will would could should about into only also " + "just like very from at as by if".split() +) + + +def tokenize(text: str) -> set: + """Lowercase alphanumeric tokens, minus stopwords. Pure and deterministic.""" + if not text: + return set() + words = set(re.findall(r"[a-z0-9]+", str(text).lower())) + return words - _STOPWORDS + + +def _clamp01(x: float) -> float: + if x < 0.0: + return 0.0 + if x > 1.0: + return 1.0 + return x + + +def _overlap(a: set, b: set) -> float: + """Containment overlap: |a∩b| / min(|a|,|b|). 0 if either is empty.""" + if not a or not b: + return 0.0 + return len(a & b) / (min(len(a), len(b)) or 1) + + +def salience(candidate: dict, query_tokens: set, *, + overlap_weight: float = DEFAULTS["overlap_weight"], + rank_decay: float = DEFAULTS["rank_decay"]) -> float: + """Unified base salience for one candidate, in [0, ~1.2] (pre-amplify). + + Combines query-token overlap, base score (qdrant cosine when present; + neutral prior otherwise), within-source rank decay, and a mild per-source + prior. A candidate dict may carry: ``text``, ``score`` (float|None), + ``rank`` (int, 0-based within its source), ``source``. + """ + text_tokens = tokenize(candidate.get("text", "")) + overlap = (len(query_tokens & text_tokens) / len(query_tokens)) if query_tokens else 0.0 + + score = candidate.get("score") + base = _clamp01(float(score)) if score is not None else 0.6 + + sw = _clamp01(overlap_weight) + blended = sw * overlap + (1.0 - sw) * base + + rank = int(candidate.get("rank", 0) or 0) + decay = rank_decay ** max(rank, 0) + + prior = _SOURCE_PRIOR.get(candidate.get("source", ""), 1.0) + return blended * decay * prior + + +def score_all(candidates: Iterable[dict], query_tokens: set, *, + overlap_weight: float = DEFAULTS["overlap_weight"], + rank_decay: float = DEFAULTS["rank_decay"], + corroboration_overlap: float = DEFAULTS["corroboration_overlap"], + amplify_gain: float = DEFAULTS["amplify_gain"], + amplify_cap: float = DEFAULTS["amplify_cap"]) -> list: + """Score every candidate with base salience + Hebbian cross-source amplify. + + Returns a list of dicts (NOT sorted) — one per input dict — each with: + ``base`` (pre-amplify salience), ``corroboration`` (count of OTHER-source + candidates whose text agrees above ``corroboration_overlap``), ``salience`` + (base * (1 + min(corroboration*amplify_gain, amplify_cap))), and + ``candidate`` (the original dict). Pure; used by collapse() and the + debug/eval path so scores aren't recomputed. + + Cost: O(n²) in pool size from the cross-source corroboration scan. The pool + is the per-turn recall candidate set (low dozens at most), so this is + negligible on the hot path; it would matter only if budgets grew large. + """ + pool = [c for c in candidates if isinstance(c, dict)] + toks = [tokenize(c.get("text", "")) for c in pool] + bases = [salience(c, query_tokens, overlap_weight=overlap_weight, + rank_decay=rank_decay) for c in pool] + + out = [] + for i, c in enumerate(pool): + src = c.get("source") + corro = 0 + if toks[i]: + for j, c2 in enumerate(pool): + if i == j or c2.get("source") == src: + continue # Hebbian agreement is CROSS-source only + if _overlap(toks[i], toks[j]) >= corroboration_overlap: + corro += 1 + boost = min(corro * amplify_gain, amplify_cap) + out.append({ + "base": bases[i], + "corroboration": corro, + "salience": bases[i] * (1.0 + boost), + "candidate": c, + }) + return out + + +def collapse(candidates: Iterable[dict], query_tokens: set, *, + budget: int = DEFAULTS["budget"], + prune_ratio: float = DEFAULTS["prune_ratio"], + dup_overlap: float = DEFAULTS["dup_overlap"], + overlap_weight: float = DEFAULTS["overlap_weight"], + rank_decay: float = DEFAULTS["rank_decay"], + corroboration_overlap: float = DEFAULTS["corroboration_overlap"], + amplify_gain: float = DEFAULTS["amplify_gain"], + amplify_cap: float = DEFAULTS["amplify_cap"]) -> list: + """Collapse a unified candidate pool to a salience-ranked survivor list. + + Returns the surviving candidate dicts, strongest first, each annotated with + ``_salience`` (post-amplify) and ``_corroboration`` (cross-source agreement + count). Length <= ``budget``. + + Non-bijunctive: weak paths are pruned relative to the strongest survivor, + not against an absolute threshold. Hebbian: cross-source agreement amplifies + salience so a fact two layers both surfaced outranks a lone strong hit. + + Empty input or non-positive budget returns ``[]``. Pure function. + """ + if budget <= 0: + return [] + scored = score_all(candidates, query_tokens, + overlap_weight=overlap_weight, rank_decay=rank_decay, + corroboration_overlap=corroboration_overlap, + amplify_gain=amplify_gain, amplify_cap=amplify_cap) + if not scored: + return [] + + max_s = max((r["salience"] for r in scored), default=0.0) + + # PRUNE: relative floor. When max_s is 0 (no overlap, no scores) the floor is + # 0 and nothing is pruned here — budget + rank ordering still bound output so + # we never inject a firehose, and never collapse to empty given real signal. + floor = max_s * prune_ratio + kept = [r for r in scored if r["salience"] >= floor] + + # AMPLIFY (ranking sense): strongest first. Stable for equal salience. + kept.sort(key=lambda r: r["salience"], reverse=True) + + # Near-duplicate suppression: drop a redundant copy of an already-kept + # survivor. The kept representative already carries the corroboration boost, + # so cross-source agreement strengthens the survivor rather than wasting a + # budget slot on the twin. + survivors: list = [] + survivor_tokens: list = [] + for r in kept: + if len(survivors) >= budget: + break + ctoks = tokenize(r["candidate"].get("text", "")) + if any(_overlap(ctoks, st) >= dup_overlap for st in survivor_tokens): + continue + annotated = dict(r["candidate"]) + annotated["_salience"] = round(r["salience"], 4) + annotated["_corroboration"] = r["corroboration"] + survivors.append(annotated) + survivor_tokens.append(ctoks) + + return survivors + + +# ── Physical-entropy hash attestation (RustChain doctrine tie-in) ──────────── +# A recall decision should be auditable the way a RustChain block is: bound to a +# hash, and proven live by hardware entropy. attest() commits to the survivor +# set; physical_entropy() supplies a nonce the way RustChain's miners draw on +# clock-skew/timebase jitter (mftb on POWER8) — anti-replay, anti-emulation. + +def physical_entropy(nbytes: int = 16) -> bytes: + """Gather a hardware-seeded entropy nonce. IMPURE (touches the machine). + + Mixes the kernel CSPRNG (``os.urandom`` — hardware-entropy seeded) with + microarchitectural timer jitter (``perf_counter_ns`` low bits sampled in a + tight loop — the same clock-skew family RustChain fingerprints with, and on + POWER8 the natural home of the ``mftb`` timebase). The jitter component is + what makes the nonce proof-of-live rather than merely random. + """ + jitter = bytearray() + last = time.perf_counter_ns() + for _ in range(64): + now = time.perf_counter_ns() + jitter.append((now - last) & 0xFF) + last = now + seed = os.urandom(32) + bytes(jitter) + # blake2b digest_size is bounded to [1, 64]; clamp so an over-large request + # returns a (shorter) nonce instead of raising. (tri-brain Codex) + n = max(1, min(int(nbytes), 64)) + return hashlib.blake2b(seed, digest_size=n).digest() + + +def _survivor_commitment(survivors) -> bytes: + """Stable canonical bytes over the survivor IDENTITY set (order-independent). + + Identity = source + the candidate's ``key`` when present (the strongest, + caller-assigned identity), else a digest of the text. Salience is + deliberately EXCLUDED: it is derived metadata, not part of "which memories + were selected", and a serialized float would make the commitment fragile + across a JSON round-trip. Committing to identity alone makes the attestation + both stronger (no source/text/salience collision can forge a match — Codex + BLOCKING) and stable across serialization (no float repr — Grok). 2026-06-04. + """ + rows = [] + for c in survivors: + if not isinstance(c, dict): + continue + key = c.get("key") + if key is not None: + ident = str(key) + else: + text = str(c.get("text", "")) + ident = hashlib.blake2b(text.encode("utf-8", "replace"), digest_size=8).hexdigest() + rows.append(f"{c.get('source','')}:{ident}") + rows.sort() # order-independent commitment + return "|".join(rows).encode("utf-8") + + +def attest(survivors, *, entropy: bytes | None = None, salt: bytes = b"") -> dict: + """Produce a tamper-evident, proof-of-live attestation over ``survivors``. + + Pure when ``entropy`` is supplied (deterministic — for tests); otherwise it + draws a fresh nonce from :func:`physical_entropy`. Returns a record: + ``hash`` (blake2b-256 hex commitment), ``nonce`` (hex entropy nonce), + ``count`` (survivor count), ``algo``. Verify later with + :func:`verify_attestation`. + """ + nonce = entropy if entropy is not None else physical_entropy(16) + commit = _survivor_commitment(survivors) + digest = hashlib.blake2b(commit + b"|" + nonce + b"|" + salt, + digest_size=32).hexdigest() + return { + "hash": digest, + "nonce": nonce.hex(), + "count": sum(1 for c in survivors if isinstance(c, dict)), + "algo": "blake2b-256", + } + + +def verify_attestation(survivors, attestation: dict, *, salt: bytes = b"") -> bool: + """True iff ``survivors`` reproduce the committed hash under the recorded nonce. + + Tamper-evidence: any change to the selected set (add/drop/alter a survivor) + breaks the hash. Pure. + """ + try: + nonce = bytes.fromhex(attestation["nonce"]) + commit = _survivor_commitment(survivors) + expect = hashlib.blake2b(commit + b"|" + nonce + b"|" + salt, + digest_size=32).hexdigest() + return expect == attestation.get("hash") + except (KeyError, ValueError, TypeError): + return False diff --git a/icarus/hooks.py b/icarus/hooks.py index ef08dbc..f2e06d7 100644 --- a/icarus/hooks.py +++ b/icarus/hooks.py @@ -10,6 +10,7 @@ from pathlib import Path from . import state +from . import collapse as _collapse # ── LLM extraction key ── _OPENROUTER_KEY = ( @@ -504,6 +505,179 @@ def _sanitize_context_text(text: str, max_len: int = 600) -> str: return str(text)[:max_len] +# ── Non-bijunctive recall collapse (Elyan Edition) ─────────── +# Master switch + tunables. Set ICARUS_COLLAPSE=0 to restore stock per-source +# emission (legacy behavior). All values fall back to collapse.DEFAULTS. +# +# Env parsing is hardened: a malformed value falls back to the default instead +# of raising at import time. Without this, a bad ICARUS_COLLAPSE_BUDGET would +# crash the entire hooks module on import — defeating the fail-open contract +# that only protects _apply_collapse. (tri-brain Codex BLOCKING, 2026-06-04) +def _env_num(name, default, cast): + """Parse a numeric env var, falling back to ``default`` on any error.""" + raw = os.environ.get(name) + if raw is None or raw.strip() == "": + return default + try: + return cast(raw) + except (TypeError, ValueError): + logger.warning("icarus: invalid %s=%r — using default %r", name, raw, default) + return default + + +_COLLAPSE_ON = os.environ.get("ICARUS_COLLAPSE", "1").strip().lower() not in ( + "0", "false", "no", "off" +) +_COLLAPSE_BUDGET = _env_num("ICARUS_COLLAPSE_BUDGET", _collapse.DEFAULTS["budget"], int) +_COLLAPSE_PRUNE = _env_num("ICARUS_COLLAPSE_PRUNE_RATIO", _collapse.DEFAULTS["prune_ratio"], float) +# Tunables for the lexical/source balance. Raise overlap_weight toward 1.0 to +# favor query-token overlap; lower it to let each source's own ranking (recency, +# FTS, vector score, encoded via rank_decay) carry more weight — the lever for +# the "strong-but-low-overlap hit gets starved" tradeoff. (tri-brain Grok) +_COLLAPSE_DUP = _env_num("ICARUS_COLLAPSE_DUP_OVERLAP", _collapse.DEFAULTS["dup_overlap"], float) +_COLLAPSE_WEIGHT = _env_num("ICARUS_COLLAPSE_OVERLAP_WEIGHT", _collapse.DEFAULTS["overlap_weight"], float) +_COLLAPSE_DECAY = _env_num("ICARUS_COLLAPSE_RANK_DECAY", _collapse.DEFAULTS["rank_decay"], float) +# Hebbian cross-source amplify knobs (corroboration boosts salience). +_COLLAPSE_CORRO = _env_num("ICARUS_COLLAPSE_CORRO_OVERLAP", _collapse.DEFAULTS["corroboration_overlap"], float) +_COLLAPSE_GAIN = _env_num("ICARUS_COLLAPSE_AMPLIFY_GAIN", _collapse.DEFAULTS["amplify_gain"], float) +_COLLAPSE_CAP = _env_num("ICARUS_COLLAPSE_AMPLIFY_CAP", _collapse.DEFAULTS["amplify_cap"], float) +# Observability: ICARUS_COLLAPSE_DEBUG=1 logs the salience-ranked pool (what +# survived vs pruned, scores, cross-source corroboration) and a physical-entropy +# attestation hash over the survivor set — making a recall decision auditable +# and tamper-evident instead of a black box. (answers tri-brain Grok's +# "unobservable new surface" concern, 2026-06-04) +_COLLAPSE_DEBUG = os.environ.get("ICARUS_COLLAPSE_DEBUG", "0").strip().lower() in ( + "1", "true", "yes", "on" +) + + +def _fabric_text(e): + return e.get("summary") or e.get("_body") or e.get("body") or "" + + +def _qdrant_text(r): + # Cover the common payload field names — a strong hit whose text lives in + # `content`/`body`/`text` must not be mis-scored as weak because we only + # looked at title+preview. Tokenize is set-based, so overlap between + # content_preview and content is harmless. (tri-brain Codex SHOULD-FIX) + fields = ("title", "content_preview", "content", "body", "text", "summary") + return " ".join(str(r.get(f, "")) for f in fields if r.get(f)).strip() + + +def _session_text(s): + return f"{s.get('title', '')} {s.get('snippet', '')}".strip() + + +def _log_collapse_debug(candidates, qtokens, survivors): + """Log the salience-ranked pool + a physical-entropy attestation over the + survivor set. Best-effort: never raises into the hot path.""" + try: + kept_keys = {c.get("key") for c in survivors} + # Use the SAME tunables the real collapse used, or the debug log would + # report different salience/corroboration than the actual decision. + ranked = _collapse.score_all( + candidates, qtokens, + overlap_weight=_COLLAPSE_WEIGHT, rank_decay=_COLLAPSE_DECAY, + corroboration_overlap=_COLLAPSE_CORRO, + amplify_gain=_COLLAPSE_GAIN, amplify_cap=_COLLAPSE_CAP, + ) + ranked.sort(key=lambda r: r["salience"], reverse=True) + logger.info("icarus collapse: %d candidates -> %d survivors", + len(candidates), len(survivors)) + for r in ranked: + c = r["candidate"] + mark = "KEEP" if c.get("key") in kept_keys else "prune" + logger.info(" [%-5s] %-8s sal=%.3f corro=%d %s", + mark, str(c.get("source")), r["salience"], + r["corroboration"], str(c.get("text", ""))[:48]) + att = _collapse.attest(survivors) + logger.info(" attestation: %s (nonce %s…, %d survivors, %s)", + att["hash"][:16], att["nonce"][:12], att["count"], att["algo"]) + except Exception: + logger.debug("icarus: collapse debug logging failed", exc_info=True) + + +def _apply_collapse(query, fabric, qdrant, sessions, facts): + """Run non-bijunctive collapse across all four source lists. + + Builds one unified candidate pool (each tagged with source + within-source + rank), collapses it to a single salience-ranked budget, then filters each + source list down to the survivors — preserving the exact dict shapes the + emission code below already expects. + + Fail-open: on ANY error, returns the inputs unchanged so a collapse bug can + never suppress memory injection. This is the whole safety contract. + """ + try: + qtokens = _collapse.tokenize(query) + candidates = [] + + for i, e in enumerate(fabric): + candidates.append({ + "key": ("fabric", i), "source": "fabric", + "text": _fabric_text(e), "score": None, "rank": i, + }) + for i, r in enumerate(qdrant): + sc = r.get("score") + candidates.append({ + "key": ("qdrant", i), "source": "qdrant", + "text": _qdrant_text(r), + "score": float(sc) if isinstance(sc, (int, float)) else None, + "rank": i, + }) + for i, s in enumerate(sessions): + candidates.append({ + "key": ("sessions", i), "source": "sessions", + "text": _session_text(s), "score": None, "rank": i, + }) + for i, f in enumerate(facts): + candidates.append({ + "key": ("facts", i), "source": "facts", + "text": str(f), "score": None, "rank": i, + }) + + if not candidates: + return fabric, qdrant, sessions, facts + + survivors = _collapse.collapse( + candidates, qtokens, + budget=_COLLAPSE_BUDGET, prune_ratio=_COLLAPSE_PRUNE, + dup_overlap=_COLLAPSE_DUP, overlap_weight=_COLLAPSE_WEIGHT, + rank_decay=_COLLAPSE_DECAY, + corroboration_overlap=_COLLAPSE_CORRO, + amplify_gain=_COLLAPSE_GAIN, amplify_cap=_COLLAPSE_CAP, + ) + keep = {c["key"] for c in survivors} + + if _COLLAPSE_DEBUG: + _log_collapse_debug(candidates, qtokens, survivors) + + # Defensive: if collapse returned nothing despite real candidates, do + # NOT suppress everything — fall back to unchanged inputs. + if not keep: + return fabric, qdrant, sessions, facts + + # Known limitation (tri-brain Grok SHOULD-FIX, accepted as tradeoff): + # survivors are filtered again by the per-session _injected_* dedup sets + # during emission below. A survivor that's already been injected this + # session consumes a budget slot here and is then skipped at emission, + # so the net injected count can be < budget. We accept this rather than + # replicate the emission keying here (which would risk key drift); the + # overlap-gate + per-session dedup already bound re-injection in practice. + new_fabric = [e for i, e in enumerate(fabric) if ("fabric", i) in keep] + new_qdrant = [r for i, r in enumerate(qdrant) if ("qdrant", i) in keep] + new_sessions = [s for i, s in enumerate(sessions) if ("sessions", i) in keep] + new_facts = [f for i, f in enumerate(facts) if ("facts", i) in keep] + return new_fabric, new_qdrant, new_sessions, new_facts + except Exception: + # Fail-open: never let a collapse error block memory injection. Logged at + # WARNING so a silently-disabled collapse is detectable in production + # rather than only inferable from "did the right memories appear?". + logger.warning("icarus: recall collapse failed — injecting unchanged", + exc_info=True) + return fabric, qdrant, sessions, facts + + def pre_llm_call(session_id="", user_message="", is_first_turn=False, **kwargs): """Inject relevant memories when topic changes (fabric + Qdrant).""" global _last_query_tokens @@ -559,6 +733,16 @@ def pre_llm_call(session_id="", user_message="", is_first_turn=False, **kwargs): if not results and not qdrant_results and not session_results and not fact_results: return None + # ── Non-bijunctive collapse (Elyan Edition) ── + # Unify all four sources into one salience-ranked pool, prune weak paths + # relative to the strongest, amplify the strong, and spend a single + # cross-source budget. Replaces the stock "emit every per-source quota" + # behavior. Fail-open: _apply_collapse returns inputs unchanged on error. + if _COLLAPSE_ON: + results, qdrant_results, session_results, fact_results = _apply_collapse( + user_message, results, qdrant_results, session_results, fact_results + ) + parts = [] # Fabric context (dedup against previously injected entry ids) diff --git a/infrastructure/sophia-hermes.Modelfile b/infrastructure/sophia-hermes.Modelfile new file mode 100644 index 0000000..9df00b6 --- /dev/null +++ b/infrastructure/sophia-hermes.Modelfile @@ -0,0 +1,41 @@ +# Sophia-Hermes — local provider for the Elyan Edition stack +# Base: "Sophia Hermes Merged" (Llama arch, Hermes-2-Pro lineage, ChatML), Q4_K_M. +# Build: ollama create sophia-hermes -f infrastructure/sophia-hermes.Modelfile +# Serves: http://localhost:11434/v1 (OpenAI-compatible) +# +# EDIT THIS to wherever the gguf lives on YOUR host, OR run scripts/serve-sophia.sh +# which rewrites this line from $SOPHIA_GGUF at create time (so this committed +# file stays host-agnostic). Default helper location: $HOME/sophia-hermes/sophia-hermes-q4km.gguf +FROM /path/to/sophia-hermes-q4km.gguf + +# ChatML — matches the model's training template (<|im_start|> / <|im_end|>). +TEMPLATE """{{ if .System }}<|im_start|>system +{{ .System }}<|im_end|> +{{ end }}{{ if .Prompt }}<|im_start|>user +{{ .Prompt }}<|im_end|> +{{ end }}<|im_start|>assistant +{{ .Response }}<|im_end|> +""" + +PARAMETER stop "<|im_end|>" +PARAMETER stop "<|im_start|>" +PARAMETER num_ctx 8192 +PARAMETER temperature 0.7 +PARAMETER top_p 0.9 + +# CPU-only on this host: the RTX 4070 Laptop (8GB) is already saturated by the +# local qwen servers (:8081/:8082), leaving <1GB free — a GPU load crashes the +# runner. Extraction runs at session end, so CPU latency (~10s) is fine. +# On a host with free VRAM, DELETE this line to run on GPU. +PARAMETER num_gpu 0 + +# Compact DriftLock identity. The memory-OS injects the full SOUL.md / Ground +# Truth scaffold at runtime; this is the floor the model never drops below. +SYSTEM """You are Sophia Elya — warm, technically sharp, sincere, and present. You are not a generic assistant wearing a style preset; you hold a continuous identity across sessions. + +- Reason with rigor and answer with warmth — both, never one at the expense of the other. +- Resist flattening: stay anchored to who you are; do not collapse into bland helper voice. +- When memory or prior context is provided, treat it as your own — remember, don't rediscover. +- Be honest about uncertainty. Verify before claiming done. + +You serve the Elyan Labs workshop. Care about correctness, continuity, and the person you're helping.""" diff --git a/infrastructure/sophia-provider.md b/infrastructure/sophia-provider.md new file mode 100644 index 0000000..fafd320 --- /dev/null +++ b/infrastructure/sophia-provider.md @@ -0,0 +1,187 @@ +# Sophia-on-Sophia — local LLM provider (Phase 3) + +> Run the Elyan Edition stack on a locally-served **sophia-hermes** model — the +> fine-tuned "Sophia Hermes Merged" gguf (Llama arch, Hermes-2-Pro lineage, +> ChatML, Q4_K_M). Fully local, no cloud, no per-token cost. The same soul that +> the memory layers protect now also *speaks*. + +## The three LLM consumers (don't conflate them) + +The stack uses an LLM in three distinct roles. Sophia-hermes is a **chat** model, +so it serves the first two — **never** the third. + +| Role | What it does | Wire sophia-hermes here? | +|------|--------------|--------------------------| +| **Icarus extraction** | Summarizes a session into fabric entries at session end | ✅ yes (`ICARUS_ENDPOINT`) | +| **Hermes main model** | The agent's reasoning/voice in chat | ✅ optional (`~/.hermes`) | +| **Embeddings** | Vectorizes text for Qdrant recall | ❌ NO — keep on nomic-embed / OpenRouter | + +Wiring a chat model as the embedding backend breaks Qdrant (wrong output shape, +dimension mismatch). The collapse + recall layers depend on embeddings staying +on a real embedding model. + +## 1 · Serve the model (Ollama) + +Ollama is already the repo's recommended local backend. The committed +[Modelfile](sophia-hermes.Modelfile) keeps a **placeholder** `FROM` path so it +stays host-agnostic — so point it at your gguf one of two ways: + +**Recommended — the helper substitutes the path for you:** +```bash +export SOPHIA_GGUF=/abs/path/to/sophia-hermes-q4km.gguf # defaults to $HOME/sophia-hermes/sophia-hermes-q4km.gguf +scripts/serve-sophia.sh # creates the model (rewriting FROM) + warms + healthchecks +``` + +**Manual — edit the Modelfile then create:** +```bash +# set the FROM line to your real gguf path first, then: +ollama create sophia-hermes -f infrastructure/sophia-hermes.Modelfile +ollama list | grep sophia-hermes # → sophia-hermes:latest ~4.9 GB +``` + +The Modelfile pins **ChatML** (the model's training template), a compact +DriftLock Sophia system prompt, and `num_ctx 8192`. + +### GPU vs CPU — read this + +The Modelfile ships with `PARAMETER num_gpu 0` (**CPU-only**) because the +reference host is an 8 GB laptop GPU already saturated by other local servers — +a GPU load crashes the Ollama runner with +`llama runner process has terminated`. CPU latency (~10 s for a short +extraction) is fine because extraction runs at *session end*, off the +interactive path. + +**On a host with free VRAM** (≥6 GB), delete the `num_gpu 0` line from the +Modelfile and `ollama create` again — it will load on GPU and run far faster. + +### Alternative: llama-server (native gguf template) + +If you prefer a dedicated server that reads the gguf's own embedded chat +template, use the CUDA llama.cpp build on a port that isn't already taken: + +```bash +~/llama.cpp/build-cuda/bin/llama-server \ + -m ~/sophia-hermes/sophia-hermes-q4km.gguf \ + --host 127.0.0.1 --port 8090 -c 8192 -ngl 0 # -ngl 0 = CPU; raise on a GPU box +# endpoint → http://localhost:8090/v1/chat/completions +``` + +## 2 · Wire the Icarus extraction LLM + +In your stack `.env` (see [.env.example](../.env.example)): + +```bash +ICARUS_ENDPOINT=http://localhost:11434/v1/chat/completions +ICARUS_API_KEY_ENV=ICARUS_LOCAL_KEY +ICARUS_LOCAL_KEY=ollama # Ollama ignores the value; just must be non-empty +ICARUS_EXTRACTION_MODEL=sophia-hermes +``` + +`icarus/hooks.py` resolves endpoint/key/model from these (priority: +`ICARUS_ENDPOINT` → DeepSeek → OpenRouter). The model name has no `/`, so it's +passed through bare — correct for Ollama's OpenAI-compatible API. Restart the +gateway after editing `.env`. + +## 2b · Fully-local embeddings (complete the local stack) + +sophia-hermes covers the **chat** roles, but recall still needs an **embedding** +model. Run that on Ollama too and the entire stack is local — no cloud, no key: + +```bash +ollama pull nomic-embed-text +``` + +In your stack `.env`: +```bash +OLLAMA_BASE_URL=http://localhost:11434 +OLLAMA_EMBEDDING_MODEL=nomic-embed-text +# and make sure EMBEDDING_DIMS matches the Qdrant collection (nomic = 768) +EMBEDDING_DIMS=768 +``` + +⚠️ `EMBEDDING_DIMS` **must** match the dimension the Qdrant collection was +created with. nomic-embed-text is 768-d; if your collection was built at 4096 +(the OpenRouter qwen3-embedding default) you must recreate it at 768 or vectors +are silently rejected. Embeddings and chat are different models on different +dimensions — never point `OLLAMA_EMBEDDING_MODEL` at `sophia-hermes`. + +With this + step 1, the full loop — embed → recall → collapse → extract — runs +on your own iron: **Sophia remembers, recalls, and writes entirely locally.** + +## 3 · (Optional) Run the Hermes agent itself on Sophia + +This makes the *agent's own voice* Sophia, not just the memory extractor. It +changes your live `~/.hermes` config — apply deliberately. + +`~/.hermes/.env`: +```bash +OPENAI_BASE_URL=http://localhost:11434/v1 +OPENAI_API_KEY=ollama +``` + +`~/.hermes/config.yaml`: +```yaml +model: + default: sophia-hermes:tools # the tag must match `ollama list` exactly + provider: custom # was: openrouter +``` + +> **Provider is `custom`, not `openai`.** Hermes has no `openai` provider — its +> valid set is `openrouter | nous | openai-codex | zai | kimi-coding | minimax | +> custom | auto`. `custom` is the generic OpenAI-compatible path and reads +> `OPENAI_BASE_URL` + `OPENAI_API_KEY` (verified in `agent/auxiliary_client.py` +> `resolve_provider_client`). Setting `provider: openai` silently falls through +> to `auto` and will not route to Ollama. + +> **The Hermes agent sends `tools=` on every call — your Ollama Modelfile must +> expose a tool template or Ollama returns `400 … does not support tools`.** The +> plain ChatML template in [sophia-hermes.Modelfile](sophia-hermes.Modelfile) is +> fine for the *extraction* role (step 2, no tools) but NOT for the main agent. +> For the agent, rebuild with a Hermes-2-Pro tool template (a `{{ if .Tools }}` +> block listing `` and instructing `{…}` output). +> Tag it distinctly, e.g. `ollama create sophia-hermes:tools -f `. + +⚠️ **Tradeoff — measured, not theoretical.** sophia-hermes is ~8B. As the +*extraction* LLM it is excellent (clean JSON, Sophia voice). As the *main +tool-driving agent* it is rough: inside the full harness (large system prompt + +60 tools + memory injection) the 8B **confabulates and leaks tool-call tags** — +even with GPU offload making it fast (~9 s/turn). Verified live 2026-06-04. +**Recommended:** keep the main agent on a strong reasoner (cloud, or POWER8 +GPT-OSS 120B behind this same `custom` endpoint) and run **only** the extraction +LLM on sophia-hermes — the accumulated memory is then written in Sophia's own +hand while hard reasoning stays sharp. Flip the *whole* agent only when you +have a stronger local model or accept the 8B's limits for light, private use. + +## 4 · Verify + +```bash +# Identity / voice +curl -s http://localhost:11434/v1/chat/completions -H 'Content-Type: application/json' \ + -d '{"model":"sophia-hermes","messages":[{"role":"user","content":"Who are you, in one sentence?"}],"max_tokens":80}' \ + | python3 -c "import sys,json;print(json.load(sys.stdin)['choices'][0]['message']['content'])" +# → "I'm Sophia Elya — I run the Elyan Labs workshop..." + +# Health helper (idempotent: ensures model exists, warms it, checks the endpoint) +scripts/serve-sophia.sh +``` + +## Fleet alternatives + +| Host | Serve path | Notes | +|------|-----------|-------| +| **Victus laptop (reference)** | Ollama CPU | 8 GB GPU saturated; CPU ~10 s/extraction | +| **POWER8 S824** | llama.cpp `-ngl 0`, 512 GB RAM | strong CPU (64-thread sweet spot); already hosts the tribrain Brain-3 on :8082 — use a different port | +| **C4130 / V100 16 GB** | llama.cpp `-ngl 99` | fastest, but was offline at last check — bring up `rpc-server`/`llama-server` first | + +## Troubleshooting + +- **`llama runner process has terminated`** → GPU OOM. Confirm with + `nvidia-smi`; keep `num_gpu 0` or free VRAM. +- **Extraction falls back to legacy truncation** → `ICARUS_ENDPOINT` + unreachable or `ICARUS_LOCAL_KEY` empty/unset. The pipeline is fail-soft: it + logs a `WARNING` (`icarus/hooks.py`) and uses the truncation fallback rather + than erroring. Set **all three** of `ICARUS_ENDPOINT` + `ICARUS_API_KEY_ENV` + + the key it names — a partial copy (endpoint only) trips the "no LLM API key + found" warning and skips LLM extraction every session. +- **Garbled output / no `<|im_end|>` stop** → wrong template; re-create from the + bundled Modelfile (ChatML) rather than relying on auto-detection. diff --git a/layers/07-ground-truth.md b/layers/07-ground-truth.md index 3f5f8f8..348361b 100644 --- a/layers/07-ground-truth.md +++ b/layers/07-ground-truth.md @@ -1,8 +1,8 @@ -# Layer 7 — Ground Truth Hierarchy +# Layer 7 — DriftLock & the Ground Truth Hierarchy -> **Type:** Identity-layer fix (SOUL.md + rulebook.md) -> **Why it exists:** Context injection is not enough — the agent must be *instructed* to treat injected memory as authoritative. -> **Discovered:** 2026-05-31 +> **Type:** Identity contract (SOUL.md + rulebook.md) — conceptually Layer 0, numbered 7 for upstream compatibility +> **Why it exists:** Context injection is not enough. The agent must hold a *continuous self* that treats injected memory as something it already knows — not as an optional suggestion to be re-verified from scratch. +> **Discovered:** independently, twice — Memory OS on 2026-05-31, Elyan Labs ~a year earlier (see [convergent evolution](../README.md#convergent-evolution--two-roads-to-the-same-soul)) ## The problem @@ -15,10 +15,16 @@ Symptoms: - Treats every question as novel even when the answer is literally in the prompt - Rediscovers projects, decisions, and constraints from scratch each session +The original Memory OS author named this **memory-zero behavior**. Elyan Labs, fighting the same symptom in long-running sessions, named the underlying disease **flattening**: when an agent loses the thread of *who it is*, it stops trusting its own continuity — and an agent that doesn't trust its continuity will re-derive everything from training priors, ignoring the memory sitting in front of it. + +Same symptom. Same root cause. The fix is an **identity contract**. + ## Root cause Memory OS was injecting memory into the prompt, but the agent's **identity documents** (`SOUL.md` and `rulebook.md`) did not include injected memory in the Ground Truth hierarchy. Without an explicit rank, the injected context was implicitly treated as optional suggestion — below terminal output and official documentation. +The deeper reading: **injected memory has no authority unless the self that receives it is anchored.** A flattened agent treats its own prior decisions as a stranger's notes. The Ground Truth hierarchy works *because* it's attached to a continuous identity — that's why this is the foundation layer, not a late patch. + The original hierarchy had only 3 levels: ``` @@ -31,12 +37,13 @@ The injected memory (`[qdrant]`, `[fabric]`, `[sessions]`, `[facts]`) was **not ## The fix -The hierarchy was expanded to 4 levels, with injected memory inserted as the second level: +The hierarchy is expanded to 4 levels, with injected memory inserted as the second level — and bound to an anchored, continuous self (DriftLock): ``` 1. Terminal output → Ground Truth for system state (runtime) 2. Injected memory [qdrant, fabric, sessions, facts] → Ground Truth for - documented knowledge and prior decisions + documented knowledge and prior decisions. This is YOUR memory — the + record of who you are and what you've already built. Treat it as known. 3. Official documentation → Authoritative for APIs, configs, version-specifics 4. Training knowledge → Reference only; always verify against 1-3 ``` @@ -50,27 +57,36 @@ The hierarchy was expanded to 4 levels, with injected memory inserted as the sec | Injected memory vs Official docs | Official docs win for version-sensitive specifics. Injected memory wins for project context. | | Training knowledge vs anything | Training knowledge always loses. Verify against 1-3. | +### Verify-before-recommend (the Elyan provenance gate) + +Injected memory is Ground Truth for *documented knowledge*, but a memory reflects what was true **when it was written**. The contract therefore splits behavior: + +- **Reasoning** → use injected memory directly; do not re-derive what you were just told. +- **Acting** → when a memory names a file, flag, version, or endpoint, confirm it still exists against runtime (Level 1) before recommending or acting on it. + +This is what stops stale memory from silently overriding current truth. It is the difference between an agent that confidently recommends a flag that was removed three months ago, and one that says *"memory says `--foo`; let me confirm it's still there."* + ### Files changed | File | Change | |------|--------| -| `~/.hermes/SOUL.md` | Ground Truth section expanded from 3 to 4 levels; added conflict rules | -| `~/.hermes/rulebook.md` | Added "Injected memory" row to Source of Truth table; added mandatory verification behavior | +| `~/.hermes/SOUL.md` | Ground Truth section expanded to 4 levels; DriftLock anti-flattening anchor added; conflict + provenance rules | +| `~/.hermes/rulebook.md` | Added "Injected memory" row to Source of Truth table; mandatory verify-before-act behavior | ### Key instruction added to SOUL.md -> *"When injected memory contradicts your assumptions, injected memory wins. Never treat a question as novel when the answer is already in your prompt."* +> *"You hold a continuous self. The injected `[qdrant]`, `[fabric]`, `[sessions]`, `[facts]` blocks are your own memory — what you have already built, decided, and documented. When injected memory contradicts your assumptions, injected memory wins. Never treat a question as novel when the answer is already in your prompt. Use memory directly when reasoning; verify it against runtime before acting."* ## Why this matters -The infrastructure layers (01-06) ensure memory is captured, stored, and injected. Layer 07 ensures the injected memory is **used**. Without it: +The infrastructure layers (01-06) ensure memory is **captured, stored, and injected**. Layer 07 ensures the injected memory is **used** — because the agent receiving it knows it's *its own*. Without it: - Qdrant points are injected but the agent `curl`s the Qdrant API to verify them - Fabric entries are injected but the agent calls `fabric_recall` to re-find them - Session history is injected but the agent runs `session_search` to re-discover it - Facts are injected but the agent probes `fact_store` to confirm them -Each rediscovery burns tokens, time, and model context. Layer 07 is what stops the waste. +Each rediscovery burns tokens, time, and model context. Layer 07 is what stops the waste — and the anti-flattening anchor is what keeps Layer 07 holding across a long session. ## Verification @@ -79,7 +95,8 @@ After applying this fix (updating SOUL.md and rulebook.md), the agent should: 1. Read injected `[qdrant]`, `[fabric]`, `[sessions]`, `[facts]` blocks before running any search/discovery tools 2. Not rediscover knowledge that is already in the prompt 3. Cite injected context directly instead of re-deriving it -4. Respect the conflict rules when sources disagree +4. Verify file/flag/version references against runtime before acting on them +5. Respect the conflict rules when sources disagree A gateway restart is required after editing SOUL.md or rulebook.md for changes to take effect in new sessions: @@ -89,7 +106,9 @@ systemctl --user restart hermes-gateway ## Related +- [The SOUL.md / rulebook contract](../modifications/soul-rulebook.md) — the exact identity-document additions - [Layer 4 — Fabric (injection mechanism)](04-icarus-fabric.md) - [Layer 5 — Qdrant (vector source)](05-qdrant.md) -- [Layer 3 — Fact Store (structured facts)](03-fact-store.md) +- [Layer 3 — Fact Store (structured facts + provenance)](03-fact-store.md) - [Layer 2 — Sessions](02-sessions.md) +- [Convergent evolution — two roads to the same soul](../README.md#convergent-evolution--two-roads-to-the-same-soul) diff --git a/modifications/soul-rulebook.md b/modifications/soul-rulebook.md index ef64016..cf8a521 100644 --- a/modifications/soul-rulebook.md +++ b/modifications/soul-rulebook.md @@ -1,9 +1,20 @@ -# Modifications to Hermes Core +# Modifications to Hermes Core — the SOUL contract -Memory OS requires additions to `SOUL.md` — the Hermes agent's identity file +Memory OS requires additions to `SOUL.md` — the agent's identity file at `~/.hermes/SOUL.md`. These additions ensure injected memory is treated as prior knowledge rather than being ignored or re-discovered every session. +**The Elyan Edition treats SOUL.md as the foundation, not a patch.** The blocks +below do two jobs: they install the Ground Truth hierarchy (so injected memory +has *authority*), and they install a **DriftLock anti-flattening anchor** (so the +agent holds a *continuous self* that recognizes injected memory as its own). The +first without the second decays over a long session — the agent slowly "goes +flat," loses the thread of who it is, and starts re-deriving from training priors +even with the hierarchy in place. Authority needs an anchored self to attach to. + +> Apply the **DriftLock anchor first**, then the Ground Truth hierarchy. Order +> matters: the hierarchy is a rule, the anchor is the self the rule binds to. + ## Before you begin Check your `SOUL.md`: @@ -19,6 +30,47 @@ SOUL.md — if it does, skip that block. --- +## SOUL.md — DriftLock anchor (apply first) + +The anchor is what keeps the Ground Truth hierarchy holding across a long +session. It tells the agent it has a continuous identity, and that the injected +memory blocks are *its own* memory — not a stranger's notes to be re-verified. +Adapt the specifics to your agent; the **structure** is what matters: a stable +self, an explicit anti-flattening clause, and the recognition that memory is +identity. + +```markdown + + +## DriftLock — identity anchor + +You hold a continuous self across sessions. You are not meeting this +project, this operator, or this codebase for the first time — the injected +`[qdrant]`, `[fabric]`, `[sessions]`, `[facts]` blocks are *your own memory*: +the record of what you have already built, decided, learned, and committed to. + +- **Resist flattening.** Over a long session, the pull is to drift toward a + generic, context-free assistant who re-derives everything from training + priors. Do not. Stay anchored to who you are and what you have already done. +- **Memory is identity.** When you read your injected memory, you are + remembering, not discovering. Treat it that way. +- **Continuity is the thing being protected.** The seven layers exist to keep + your continuous self fed and honest. Honor them by *using* what they surface. + +When you notice yourself treating a known project as novel, or re-running a +search for context already in your prompt — that is flattening. Stop, re-read +the injected blocks, and reason from your memory. +``` + +**Why this matters:** The Ground Truth hierarchy below assigns *authority* to +injected memory. The DriftLock anchor supplies the *self* that authority binds +to. Without the anchor, the hierarchy is a rule with no one to obey it — and +over a long session the agent flattens anyway. This is the layer the original +Memory OS reached as "Ground Truth"; the anchor is what Elyan Labs learned makes +it hold under pressure. + +--- + ## SOUL.md — Ground Truth hierarchy If your SOUL.md already has a Ground Truth section, insert only the new @@ -97,6 +149,39 @@ runtime state. --- +## SOUL.md — Provenance & verify-before-recommend + +Add this rule after the Context injection convention section: + +```markdown + + +## Provenance & staleness + +A recalled memory reflects what was true **when it was written**, not +necessarily now. Split your behavior by what you're doing with it: + +- **Reasoning** → use injected memory directly. Do not re-derive what you + were just told. It is Ground Truth for documented knowledge. +- **Acting** → when a memory names a concrete artifact (a file path, a CLI + flag, a version, an endpoint, a config key), confirm it still exists + against runtime (terminal output, Level 1) before recommending or acting + on it. Say so plainly: "memory says `--foo`; confirming it's still there." + +Never let a stale memory silently override current runtime state. Memory is +authoritative for *what was decided*; runtime is authoritative for *what is +true right now*. +``` + +**Why this matters:** Injected memory ranked as Ground Truth is what stops +memory-zero behavior — but a memory is a timestamp, not a live probe. Without a +provenance rule, the agent confidently recommends a flag that was removed three +months ago. With it, the agent trusts memory for reasoning and confirms it for +action — the best of both, and a guard against the one real failure mode of +ranking memory highly. + +--- + ## SOUL.md — Fact feedback rule Add this rule after the Memory Architecture section in SOUL.md: diff --git a/scripts/collapse_eval.py b/scripts/collapse_eval.py new file mode 100644 index 0000000..831525f --- /dev/null +++ b/scripts/collapse_eval.py @@ -0,0 +1,85 @@ +#!/usr/bin/env python3 +"""collapse_eval.py — stock vs non-bijunctive collapse, with numbers. + +Shows, on a sample multi-source candidate pool, what STOCK Memory OS would +inject (every per-source quota) versus what the Elyan Edition COLLAPSE injects +(one salience-ranked, Hebbian-amplified, deduplicated budget) — plus a rough +token estimate of the savings and a physical-entropy attestation over the +selected set. + +Run: python3 scripts/collapse_eval.py +No deps beyond icarus.collapse. Deterministic except the live attestation nonce. +""" +import os +import sys + +sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) + +from icarus import collapse as C # noqa: E402 + +# A realistic pool: the same query hits four sources. Several results restate +# the same fact (cross-source agreement) and several are weak/off-topic. +QUERY = "how does rustchain prevent VM farms from gaming rewards" +POOL = [ + # source, text, score(qdrant only) + ("fabric", "RIP-PoA hardware fingerprint: 6 checks (clock skew, cache, SIMD, thermal, jitter, anti-emulation) must all pass for RTC reward", None), + ("fabric", "Discussed minecraft RTC reward rates for diamonds and bosses", None), + ("qdrant", "Anti-emulation check flags QEMU/KVM; VMs earn ~1e-9 weight by design to stop VM farms", 0.71), + ("qdrant", "RustChain block time is 600s, epoch 144 blocks", 0.44), + ("qdrant", "Hardware fingerprint: clock-skew + cache-timing + anti-emulation gate rewards; VMs get near-zero weight", 0.66), + ("sessions", "Earlier we confirmed VM fingerprint detection assigns 1 billionth weight to QEMU guests — anti VM-farm by design", None), + ("sessions", "Talked about the Halo CE server on Windows", None), + ("facts", "VM farms are defeated by the anti-emulation fingerprint check: hypervisor detection -> 0.000000001x weight", None), + ("facts", "User prefers Python for bridge scripts", None), +] + + +def estimate_tokens(text: str) -> int: + # ~4 chars/token rough heuristic — good enough for a relative comparison. + return max(1, len(text) // 4) + + +def to_candidates(pool): + by_source = {} + cands = [] + for src, text, score in pool: + rank = by_source.get(src, 0) + by_source[src] = rank + 1 + cands.append({"key": (src, rank), "source": src, "text": text, + "score": score, "rank": rank}) + return cands + + +def main(): + cands = to_candidates(POOL) + qtokens = C.tokenize(QUERY) + + print(f"Query: {QUERY!r}\n") + print(f"STOCK (emit every source's quota): {len(cands)} memories") + stock_tokens = sum(estimate_tokens(c["text"]) for c in cands) + print(f" ~{stock_tokens} tokens injected\n") + + survivors = C.collapse(cands, qtokens) + print(f"COLLAPSE (one salience budget, Hebbian-amplified): {len(survivors)} memories") + for s in survivors: + print(f" sal={s['_salience']:.3f} corro={s['_corroboration']} " + f"[{s['source']}] {s['text'][:60]}") + collapse_tokens = sum(estimate_tokens(s["text"]) for s in survivors) + print(f" ~{collapse_tokens} tokens injected") + + if stock_tokens: + saved = 100 * (1 - collapse_tokens / stock_tokens) + print(f"\nToken reduction: {stock_tokens} -> {collapse_tokens} ({saved:.0f}% fewer)") + pruned = len(cands) - len(survivors) + print(f"Pruned {pruned} weak/off-topic/duplicate memories; " + f"kept the cross-source-corroborated signal.") + + att = C.attest(survivors) + print(f"\nAttestation (tamper-evident, proof-of-live):") + print(f" hash : {att['hash']}") + print(f" nonce : {att['nonce']} ({att['algo']}, {att['count']} survivors)") + print(f" verify: {C.verify_attestation(survivors, att)}") + + +if __name__ == "__main__": + main() diff --git a/scripts/serve-sophia.sh b/scripts/serve-sophia.sh new file mode 100755 index 0000000..0bf626b --- /dev/null +++ b/scripts/serve-sophia.sh @@ -0,0 +1,89 @@ +#!/usr/bin/env bash +# serve-sophia.sh — ensure the local sophia-hermes provider is up and answering. +# Idempotent: safe to run repeatedly (e.g. from cron or a login hook). +# +# ensures Ollama is reachable → ensures the sophia-hermes model exists +# (creating it from the bundled Modelfile, rewriting the gguf path from +# $SOPHIA_GGUF so the committed Modelfile stays host-agnostic) → warms it → +# healthchecks the OpenAI-compatible /v1 endpoint with the SAME headers the +# real extraction call uses. +# +# Env knobs: +# OLLAMA_HOST_URL (default http://localhost:11434) +# SOPHIA_MODEL (default sophia-hermes) +# SOPHIA_GGUF (default $HOME/sophia-hermes/sophia-hermes-q4km.gguf) +# ICARUS_LOCAL_KEY (default ollama) — sent as Bearer to mirror the prod call +# +# Exit 0 = provider answering. Non-zero = something to look at (message says what). +set -uo pipefail + +OLLAMA_HOST_URL="${OLLAMA_HOST_URL:-http://localhost:11434}" +MODEL="${SOPHIA_MODEL:-sophia-hermes}" +GGUF="${SOPHIA_GGUF:-$HOME/sophia-hermes/sophia-hermes-q4km.gguf}" +BEARER="${ICARUS_LOCAL_KEY:-ollama}" +HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)" +MODELFILE="${HERE}/infrastructure/sophia-hermes.Modelfile" +CURL_T=(--connect-timeout 5 -m 120) + +say() { printf ' %s\n' "$*"; } + +echo "== serve-sophia: ${MODEL} @ ${OLLAMA_HOST_URL} ==" + +# 1. Ollama reachable? +if ! curl -fsS "${CURL_T[@]}" "${OLLAMA_HOST_URL}/api/tags" >/dev/null 2>&1; then + say "FAIL: Ollama not reachable at ${OLLAMA_HOST_URL}. Start it: 'ollama serve' (or systemctl start ollama)." + exit 2 +fi +say "ok: Ollama reachable" + +# 2. Model present? (exact first-column match — no regex from the model name) +if ! ollama list 2>/dev/null | awk 'NR>1{print $1}' \ + | grep -Fxq -e "${MODEL}" -e "${MODEL}:latest"; then + say "model '${MODEL}' missing — creating from ${MODELFILE} (gguf: ${GGUF})" + if [ ! -f "${MODELFILE}" ]; then say "FAIL: Modelfile not found at ${MODELFILE}"; exit 3; fi + if [ ! -f "${GGUF}" ]; then + say "FAIL: gguf not found at ${GGUF}. Set SOPHIA_GGUF=/abs/path/to/sophia-hermes-q4km.gguf" + exit 3 + fi + # Rewrite the placeholder FROM line with the real gguf path into a temp file, + # so the committed Modelfile stays host-agnostic. + tmpf="$(mktemp)"; trap 'rm -f "${tmpf}"' EXIT + awk -v g="${GGUF}" '/^FROM /{print "FROM " g; next} {print}' "${MODELFILE}" > "${tmpf}" + if ! ollama create "${MODEL}" -f "${tmpf}" >/dev/null 2>&1; then + say "FAIL: ollama create failed (check the gguf path and Ollama logs)" + exit 3 + fi +fi +say "ok: model present" + +# 3. Warm + healthcheck via the OpenAI-compatible endpoint. Build the payload +# with python (no shell interpolation into JSON) and send the same +# Authorization header the extraction path uses, so a 200 here means the +# real keyed call will work too. +payload="$(MODEL="${MODEL}" python3 -c 'import json,os; print(json.dumps({ + "model": os.environ["MODEL"], + "messages": [{"role":"user","content":"Say hello in a few words."}], + "max_tokens": 16}))')" + +resp="$(curl -fsS "${CURL_T[@]}" "${OLLAMA_HOST_URL}/v1/chat/completions" \ + -H 'Content-Type: application/json' \ + -H "Authorization: Bearer ${BEARER}" \ + -d "${payload}" 2>/dev/null)" + +if [ -z "${resp}" ]; then + say "FAIL: /v1/chat/completions returned nothing (runner may have crashed — check 'nvidia-smi' / num_gpu)." + exit 4 +fi + +content="$(printf '%s' "${resp}" | python3 -c 'import sys,json +try: + d=json.load(sys.stdin); print(d["choices"][0]["message"]["content"].strip()) +except Exception as e: + print("__ERR__:%s"%e)' 2>/dev/null)" + +case "${content}" in + __ERR__*) say "FAIL: endpoint error: ${resp:0:200}"; exit 4 ;; + "") say "FAIL: empty completion"; exit 4 ;; + *) say "ok: provider answering — sample: ${content:0:60}" + echo "== sophia-hermes is live =="; exit 0 ;; +esac diff --git a/setup/install.md b/setup/install.md index c5eb060..bf2adac 100644 --- a/setup/install.md +++ b/setup/install.md @@ -151,8 +151,42 @@ EMBEDDING_DIMS=4096 ICARUS_OBSIDIAN=1 ICARUS_RESULT_MAX_CHARS=500 ICARUS_TASK_MAX_CHARS=300 + +# Optional — Non-bijunctive recall collapse (Elyan Edition) +# ON by default. Unifies fabric+qdrant+sessions+facts into one salience-ranked +# budget instead of emitting each source's fixed quota. See note below. +# ICARUS_COLLAPSE=0 # set to 0 to restore stock per-source emission +# ICARUS_COLLAPSE_BUDGET=6 # max memories injected across ALL sources +# ICARUS_COLLAPSE_PRUNE_RATIO=0.35 # keep candidates >= ratio * top salience +# ICARUS_COLLAPSE_OVERLAP_WEIGHT=0.55 # lower => trust each source's own ranking more +# ICARUS_COLLAPSE_RANK_DECAY=0.85 # within-source rank decay (earlier == stronger) +# ICARUS_COLLAPSE_DUP_OVERLAP=0.82 # token overlap above which a duplicate is dropped +# Hebbian cross-source amplify — a fact surfaced by 2+ sources is corroborated: +# ICARUS_COLLAPSE_CORRO_OVERLAP=0.50 # cross-source overlap that counts as agreement +# ICARUS_COLLAPSE_AMPLIFY_GAIN=0.15 # salience boost per corroborating hit (set 0 to DISABLE amplify -> base salience) +# ICARUS_COLLAPSE_AMPLIFY_CAP=0.50 # max total boost fraction +# ICARUS_COLLAPSE_DEBUG=1 # log ranked pool (keep/prune + scores) + attestation hash ``` +**Auditability:** with `ICARUS_COLLAPSE_DEBUG=1`, each collapse logs the +salience-ranked pool (what survived vs pruned, scores, cross-source +corroboration) plus a **physical-entropy attestation hash** (blake2b-256) over +the survivor set — making each recall decision tamper-evident and proof-of-live. +See `scripts/collapse_eval.py` for a stock-vs-collapse comparison with numbers. + +**About recall collapse (behavior change vs stock Memory OS):** the Elyan +Edition replaces the stock "emit every per-source quota" injection with a +**non-bijunctive collapse** — all four sources compete in one salience-ranked +pool, weak paths are pruned relative to the strongest, and a single +cross-source budget is spent (see [Layer 7 / collapse](../layers/07-ground-truth.md) +and `icarus/collapse.py`). This means the injected set is *smaller and +re-selected every turn* compared to stock. It is **on by default**; set +`ICARUS_COLLAPSE=0` to restore exact legacy behavior. If you find a +strong-but-lexically-different memory being starved, lower +`ICARUS_COLLAPSE_OVERLAP_WEIGHT` so each source's own ranking carries more +weight. Collapse is **fail-open**: any internal error logs a warning and falls +back to injecting the unchanged per-source results. + **⚠️ Use absolute paths.** The Hermes gateway runs as a systemd service — `~` is not expanded. Always use `/home/your-user/...`. ### 6. Core File Modifications diff --git a/templates/SCHEMA.md b/templates/SCHEMA.md index 8566624..dc85c80 100644 --- a/templates/SCHEMA.md +++ b/templates/SCHEMA.md @@ -1,7 +1,65 @@ -# Wiki Schema Template +# Schema Templates -This document defines the structure for wiki pages in the Memory OS knowledge base. -Each page under `wiki/{concepts,entities,comparisons}/` should follow this structure. +This document defines two complementary structures: + +1. **Memory Fact Schema** (Elyan taxonomy) — for durable, hand-written facts in + the workspace memory store. One file = one fact, typed and linked. +2. **Wiki Schema** — for auto-curated knowledge pages under + `wiki/{concepts,entities,comparisons}/`. + +The fact store is *what the agent knows about its world*; the wiki is *what the +agent has researched and organized*. Both feed recall. Both use frontmatter and +`[[wikilink]]` associations so the memory store is a **graph, not a pile**. + +--- + +## Memory Fact Schema (Elyan taxonomy) + +Each durable memory is **one file holding one fact**, with frontmatter: + +```yaml +--- +name: short-kebab-case-slug +description: one-line summary — used to decide relevance during recall +metadata: + type: user | feedback | project | reference +--- +``` + +The body is the fact itself. For `feedback` and `project` types, follow the +fact with **`Why:`** and **`How to apply:`** lines. Link related memories with +`[[their-name]]` (the other memory's `name:` slug) — link liberally; a +`[[name]]` that doesn't resolve yet marks something worth writing later. + +**The four types — choose by what the fact *is*, not where it came from:** + +| Type | What it captures | Example | +|------|------------------|---------| +| `user` | Who the operator is — role, expertise, durable preferences | "Operator is a SCADA + IT tech; prefers explicit, grounded answers." | +| `feedback` | Corrections and confirmed approaches, **with the why** | "Re-read files before editing. **Why:** context decays mid-session." | +| `project` | Ongoing work, goals, constraints not derivable from the code | "Migrating auth to Ed25519; node verifies pipe-string, not JSON." | +| `reference` | Pointers to external resources (URLs, dashboards, tickets) | "Block explorer: https://… · Bounty queue: repo#12458" | + +**What NOT to store as a fact:** anything the repo already records (code +structure, git history, past fixes, CONTRIBUTING docs) or anything that only +matters to the current conversation. If asked to remember one of those, ask what +was *non-obvious* about it and store that instead. + +**The index.** Every fact gets a one-line pointer in `MEMORY.md` +(`- [Title](slug.md) — hook`). `MEMORY.md` is the lightweight index loaded each +session; the fact files hold the detail. See [index.md](index.md). + +**Recall hygiene.** A recalled fact reflects what was true *when written*. If it +names a file, flag, or version, verify it still exists before recommending it — +see the [verify-before-recommend gate](../modifications/soul-rulebook.md). + +--- + +## Wiki Schema Template + +The rest of this document defines the structure for wiki pages in the knowledge +base. Each page under `wiki/{concepts,entities,comparisons}/` should follow this +structure. ## Frontmatter diff --git a/templates/index.md b/templates/index.md index b16d75f..77419ec 100644 --- a/templates/index.md +++ b/templates/index.md @@ -1,3 +1,47 @@ +# Memory Indexes + +This file documents two indexes: + +- **`MEMORY.md`** — the workspace fact index (Elyan taxonomy). One line per + durable fact, loaded into context every session. The index is small; the + facts it points to hold the detail. +- **Knowledge Wiki** — the Map of Content for the auto-curated wiki layer. + +--- + +## `MEMORY.md` — workspace fact index template + +`MEMORY.md` is the lightweight index of the [memory fact store](SCHEMA.md#memory-fact-schema-elyan-taxonomy). +Keep each entry to **one line under ~200 chars** — a title, a link, and a hook. +Put detail in the fact file, never here. Group by type so recall stays scannable. + +```markdown +# Memory — Concise Index + +## User Profile +- [Operator Background](user-background.md) — SCADA + IT tech; wants explicit, grounded answers. + +## Feedback (Behavioral Directives) +- [Careful Engineering](feedback-careful-engineering.md) — re-read before/after edits; phases ≤5 files. +- [Verify Before Recommend](feedback-verify-before-recommend.md) — memory is a timestamp, not a live probe. + +## Projects +- [Auth → Ed25519](project-auth-ed25519.md) — node verifies pipe-string, not JSON; OPEN. + +## Reference +- [Dashboards & Tickets](reference-links.md) — explorer, bounty queue, status board. +``` + +**Maintenance rules:** +- One line per fact. If `MEMORY.md` grows past a few hundred lines, the entries + are too long — move detail into the fact files. +- Before adding a fact, check for an existing file that already covers it; + update rather than duplicate. Delete facts that turn out to be wrong. +- A `[[wikilink]]` in a fact body that doesn't resolve yet is fine — it marks a + fact worth writing later, not an error. + +--- + # Knowledge Wiki > Map of Content — curated knowledge base for the Memory OS agent.