feat(agents): paged, multi-fidelity context memory for McpClient#1915

Open
iamMihirT wants to merge 4 commits into dimensionalOS:dev from iamMihirT:feat/agent-context-memory

Conversation

@iamMihirT

TL;DR

Replaces the unbounded list[BaseMessage] history in McpClient with a
paged, multi-fidelity memory layer that keeps the assembled prompt inside
the active model's context window. The worker thread also recovers from
turn-level exceptions instead of dying silently, which was the original
user-visible symptom (messages piling up in the queue with agent_idle
stuck at False).

Closes #1899.

Before / after

Before: every BaseMessage the agent saw was appended to self._history
and replayed in full on every turn. As the conversation grew — especially
with image artefacts — the request eventually exceeded the model's context
window. OpenAI returned context_length_exceeded, the exception bubbled
out of _process_message, the worker thread died, and every subsequent
enqueued message sat unprocessed forever.

After: McpClient owns a MemoryEngine that converts each inbound
message into one or more typed Page objects, each storing four fidelity
representations (POINTER, STRUCTURED, COMPRESSED, FULL). A
two-phase greedy selector chooses per-page fidelities to fit the active
model's token budget. Recent image artefacts are pinned at FULL; older
pages gracefully degrade. The worker thread wraps each turn in
try/except, emits a PHYSICAL_INSUFFICIENCY fault, and keeps draining
the queue.
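The recovery contract can be sketched independently of McpClient with stdlib primitives (a minimal sketch; `process` and `emit_fault` are illustrative stand-ins, not the real API):

```python
import queue
import threading

def run_worker(q, process, emit_fault, results):
    """Drain the queue; a failing turn emits a fault instead of killing the loop."""
    while True:
        msg = q.get()
        if msg is None:  # sentinel: shut down cleanly
            return
        try:
            results.append(process(msg))
        except Exception as exc:  # turn-level containment, as in this PR
            emit_fault(exc)       # e.g. a PHYSICAL_INSUFFICIENCY fault event

faults, results = [], []
q = queue.Queue()
for m in ["a", "boom", "c"]:
    q.put(m)
q.put(None)

def process(msg):
    if msg == "boom":
        raise RuntimeError("context_length_exceeded")
    return msg.upper()

t = threading.Thread(target=run_worker, args=(q, process, faults.append, results))
t.start()
t.join()
print(results, len(faults))  # prints: ['A', 'C'] 1
```

The second turn raises, but the first and third are still processed and the worker exits cleanly on the sentinel.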

Usage

Existing callers get the new behaviour automatically with sensible defaults
(full model window, 3 recent images pinned at FULL, 4096-token output
reserve). Three new optional knobs on McpClientConfig:

| field | default | purpose |
| --- | --- | --- |
| token_budget | None | Override the per-model context window. |
| pin_recent_evidence | 3 | Keep the N most recent image artefacts at FULL. |
| output_reserve_tokens | 4096 | Tokens carved out of the window for the response. |

Plus a new faults: Out[FaultEvent] stream on the module for
observability (eviction, fidelity-drop, rehydrate, and
physical-insufficiency events flow through it alongside structlog).
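The arithmetic behind these knobs can be sketched as follows (the helper and the overhead numbers are illustrative; the real accounting lives in budget.py):

```python
def effective_budget(context_window: int, output_reserve: int,
                     system_overhead: int, per_message_overhead: int,
                     n_messages: int) -> int:
    """Tokens left for page content after all reserves are carved out."""
    return (context_window
            - output_reserve
            - system_overhead
            - n_messages * per_message_overhead)

# e.g. a 128k window with the default 4096-token output reserve,
# hypothetical 512-token system overhead and 4 tokens/message framing
print(effective_budget(128_000, 4_096, 512, 4, n_messages=200))  # prints: 122592
```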

Component walkthrough

Everything new lives under dimos/agents/memory/. Each file is a single
responsibility; the public surface is deliberately small.

| file | role |
| --- | --- |
| pages.py | Page dataclass, fidelity enum, page-type enum. Invariants enforced in __post_init__. |
| page_table.py | Thread-safe page store. Explicit + auto pins, recency tracking, pin-rebalance diffs. |
| tokens.py | TokenCounter protocol + TiktokenCounter (OpenAI) and HeuristicCounter (fallback). |
| budget.py | ModelBudget: per-model context window, output reserve, per-message overhead. |
| faults.py | Structured FaultEvent + FaultObserver (routes to structlog and an optional Out stream). |
| ingestion.py | ingest_message: turns a BaseMessage into list[Page], with multimodal split rules. |
| selector.py | Two-phase greedy assemble_prompt respecting pins and the effective token budget. |
| engine.py | MemoryEngine facade: ingestion + assembly + pin rebalancing + fault emission + rehydration. |
| artefact_tool.py | build_get_artefact_tool: LLM-callable rehydration of a degraded EVIDENCE page back to FULL. |
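A minimal sketch of the two-phase greedy idea in selector.py (simplified: one shared cost table, no tool-call-pair atomicity, no eviction; all names illustrative):

```python
from dataclasses import dataclass

@dataclass
class Page:
    turn_seq: int
    costs: dict            # fidelity -> token cost, e.g. {"POINTER": 1, "FULL": 10}
    pinned_full: bool = False

FIDELITIES = ["POINTER", "STRUCTURED", "COMPRESSED", "FULL"]

def assemble(pages: list, budget: int) -> dict:
    """Phase 1: everything at cheapest fidelity (pins held at FULL).
    Phase 2: upgrade most recent pages first while the budget allows."""
    choice = {p.turn_seq: ("FULL" if p.pinned_full else "POINTER") for p in pages}
    spent = sum(p.costs[choice[p.turn_seq]] for p in pages)
    for p in sorted(pages, key=lambda p: -p.turn_seq):   # recent first
        while choice[p.turn_seq] != "FULL":
            nxt = FIDELITIES[FIDELITIES.index(choice[p.turn_seq]) + 1]
            delta = p.costs[nxt] - p.costs[choice[p.turn_seq]]
            if spent + delta > budget:
                break
            choice[p.turn_seq], spent = nxt, spent + delta
    return choice

pages = [Page(i, {"POINTER": 1, "STRUCTURED": 3, "COMPRESSED": 6, "FULL": 10})
         for i in range(5)]
choice = assemble(pages, budget=30)  # recent pages end at FULL, older ones degrade
```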

Integration is a single-file change in dimos/agents/mcp/mcp_client.py:
replace self._history with self._engine, feed state_graph.stream from
engine.assemble(), add the get_artefact tool to the agent's tool list,
and wrap _process_message in try/except inside _thread_loop.

Testing

170 tests ship with this PR (all green):

  • 168 unit tests in dimos/agents/memory/test_*.py — cover fidelity
    invariants, pin policy, token accounting contracts, budget resolution,
    fault emission, multimodal split rules, selector edge cases, and the
    rehydration path.
  • 2 integration tests in
    dimos/agents/mcp/test_mcp_client_history_compaction.py:
    • test_history_compaction_keeps_recent_images_at_full_under_pressure
      drives the real MemoryEngine from a bypass-constructed McpClient
      under an 8k-token budget and verifies the effective-cap,
      recent-image-pinning, and fault-emission contracts.
    • test_thread_loop_recovers_from_process_message_exception is the
      regression guard for the motivating bug: makes the second of three
      turns raise and verifies all three are processed, a
      PHYSICAL_INSUFFICIENCY fault is emitted, and the worker exits
      cleanly.

Run them with:

```shell
uv run pytest dimos/agents/memory/ \
              dimos/agents/mcp/test_mcp_client_history_compaction.py
```

Not included in this PR

  • Agent class: the older dimos.agents.agent.Agent was removed
    upstream before this work landed. Only McpClient is wired up.
  • End-to-end LLM tests: no real API calls in the test suite. The
    integration tests bypass-construct McpClient and drive the engine
    directly, which is fast and deterministic but does not exercise the
    full request round-trip.
  • Dynamic model window discovery: the ModelBudget table is hardcoded
    for current OpenAI/Anthropic/Gemini/Ollama model names (with
    longest-prefix matching). Add new entries in budget.py as models ship.
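The longest-prefix lookup can be sketched as follows (table entries and the normalization rule are illustrative simplifications of budget.py):

```python
WINDOWS = {                      # illustrative entries; the real table is in budget.py
    "gpt-4o": 128_000,
    "gpt-4o-mini": 128_000,
    "claude-3-5-sonnet": 200_000,
    "llama3.2": 128_000,
}

def normalize(model_name: str) -> str:
    """Reduce 'provider:model:tag' to the bare model name; leave others alone."""
    parts = model_name.split(":")
    return parts[1] if len(parts) == 3 else model_name

def resolve_window(model_name: str, default: int = 8_192) -> int:
    """Match the longest known prefix, falling back to a conservative default."""
    name = normalize(model_name)
    best = max((k for k in WINDOWS if name.startswith(k)), key=len, default=None)
    return WINDOWS[best] if best is not None else default
```

Longest-prefix matching is what lets date-stamped names like `"gpt-4o-2024-08-06"` resolve without a dedicated table entry.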

Reviewer test plan

  • Read dimos/agents/memory/engine.py and
    dimos/agents/memory/selector.py — these are the two most
    architecturally load-bearing files.
  • Skim dimos/agents/memory/ingestion.py multimodal split rules
    (the ingest_message docstring lists the full matrix).
  • Verify the new McpClientConfig knobs are optional and
    default-equivalent to the old full-history behaviour for small
    conversations.
  • Run the two integration tests locally to confirm thread recovery.
  • Smoke-test your preferred blueprint (e.g. unitree-go2-agentic)
    against a long conversation with image artefacts to watch the
    fault stream in action.

Files changed

  • dimos/agents/memory/ — 10 production files + 9 unit test files (new)
  • dimos/agents/mcp/mcp_client.py — modified
  • dimos/agents/mcp/test_mcp_client_history_compaction.py — new
  • AGENTS.md — new Context Memory subsection
  • pyproject.toml — tiktoken>=0.8.0 added to the agents extra
  • uv.lock — lockfile update for tiktoken

Total: 24 files, ~6.3k insertions / ~18 deletions across 4 commits.

Contributor License Agreement

  • I have read and approved the CLA.

Made with Cursor

Mihir Trivedi added 4 commits April 24, 2026 15:08
Introduces a new ``dimos.agents.memory`` package that manages agent
conversation state as a set of typed, multi-fidelity ``Page`` objects
rather than an unbounded list of LangChain messages. Each page stores
four fidelity rungs eagerly (POINTER, STRUCTURED, COMPRESSED, FULL),
and a two-phase greedy selector picks per-page rungs to assemble a
prompt that fits the active model's token budget.

Core components:

- ``pages.py``: ``Page`` dataclass + fidelity / page-type enums, with
  invariants enforced in ``__post_init__``.
- ``page_table.py``: thread-safe store with explicit and auto pins;
  ``pin_recent_evidence`` keeps the N most recent EVIDENCE pages at
  FULL fidelity.
- ``tokens.py``: ``TokenCounter`` protocol, tiktoken-backed counter for
  OpenAI models and a heuristic fallback for others. Counters return
  content-only tokens; per-message ChatML overhead lives on the budget.
- ``budget.py``: ``ModelBudget`` with per-model context windows,
  output reserve, and per-message overhead; longest-prefix model name
  resolution.
- ``faults.py``: structured fault events (PAGE_EVICTED, PAGE_DEGRADED,
  REFETCH_FAULT, PHYSICAL_INSUFFICIENCY, PIN_REBALANCE) emitted to
  structlog plus an optional ``Out[FaultEvent]`` stream for live UI.
- ``ingestion.py``: turns a LangChain ``BaseMessage`` into one or more
  pages. Multimodal HumanMessages split into a CONVERSATION page plus
  one EVIDENCE page per image; AIMessage ``tool_calls`` payloads are
  cost-counted uniformly across all four fidelity rungs.
- ``selector.py``: two-phase greedy ``assemble_prompt`` that respects
  pin invariants and the effective (per-message-overhead-aware)
  budget.
- ``engine.py``: ``MemoryEngine`` façade coordinating ingestion,
  assembly, pin rebalancing, fault emission, and artefact rehydration.
- ``artefact_tool.py``: ``build_get_artefact_tool`` returns a
  ``StructuredTool`` an LLM can call to rehydrate a degraded EVIDENCE
  page back to FULL.
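The invariants-in-``__post_init__`` pattern described above can be sketched like this (field names simplified; not the actual ``Page`` definition):

```python
from dataclasses import dataclass
from enum import IntEnum

class Fidelity(IntEnum):
    POINTER = 0
    STRUCTURED = 1
    COMPRESSED = 2
    FULL = 3

@dataclass
class Page:
    page_type: str                        # e.g. "CONVERSATION" or "EVIDENCE"
    fidelity: Fidelity
    min_fidelity: Fidelity = Fidelity.POINTER
    pinned_at_full: bool = False

    def __post_init__(self) -> None:
        # fidelity may never sit below the floor the table has set
        if self.fidelity < self.min_fidelity:
            raise ValueError("fidelity below min_fidelity")
        # a page pinned at FULL must actually be at FULL
        if self.pinned_at_full and self.fidelity is not Fidelity.FULL:
            raise ValueError("pinned_at_full requires FULL fidelity")
```

Constructing an invalid page fails eagerly, so downstream selector and table code can assume the invariants hold.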

Adds ``tiktoken>=0.8.0`` to the ``agents`` extra in ``pyproject.toml``.

168 unit tests ship alongside the package covering fidelity invariants,
pin policy, token accounting, budget resolution, fault emission,
multimodal split rules, selector edge cases, and the rehydration path.

Replaces the unbounded ``self._history: list[BaseMessage]`` with a
``MemoryEngine`` that ingests every inbound message, hands the LLM an
assembled prompt that fits the active model's context window, and
emits fault events whenever pages are degraded or evicted to make room.

Changes:

- New ``McpClientConfig`` fields (all optional, default-equivalent to
  the model's full budget):
  - ``token_budget``: override the per-model context window.
  - ``pin_recent_evidence`` (default 3): keep the N most recent image
    artefacts at FULL fidelity.
  - ``output_reserve_tokens`` (default 4096): carved out of the
    context window for the model's response.
- New ``faults: Out[FaultEvent]`` stream on the module so operators
  can observe memory-layer events alongside the existing ``agent``
  and ``agent_idle`` streams.
- ``on_system_modules`` now appends the engine's ``get_artefact``
  StructuredTool to the agent's tool list so the LLM can rehydrate a
  degraded EVIDENCE page back to FULL on demand.
- ``_process_message`` ingests the inbound message through the engine
  and streams the assembled ``list[BaseMessage]`` into the state graph
  instead of the raw history.
- ``_thread_loop`` wraps ``_process_message`` in ``try/except`` so a
  turn-level failure no longer kills the worker thread silently; the
  exception is forwarded to the fault stream via
  ``emit_physical_insufficiency`` and the loop continues draining
  queued messages.

Fixes the original motivating bug: once the worker thread died on a
``context_length_exceeded`` or any other turn-level error, subsequent
messages piled up in the queue unprocessed and ``agent_idle`` stayed
at ``False`` forever.

…hread recovery

Adds two integration tests that drive a real ``MemoryEngine`` from a
bypass-constructed ``McpClient`` (no MCP server needed).

- ``test_history_compaction_keeps_recent_images_at_full_under_pressure``:
  ingests a system bootstrap + three camera-snapshot images
  interleaved with 100 chatty text turns under a tight 8k token budget.
  Asserts that (1) the assembled prompt respects
  ``budget.effective_budget_for_messages(n_surviving)``, (2) the three
  most recent EVIDENCE pages stay at FULL fidelity, (3) older
  CONVERSATION pages get degraded and a PAGE_DEGRADED fault is emitted,
  and (4) ``physical_insufficient`` stays False.

- ``test_thread_loop_recovers_from_process_message_exception``:
  regression guard for the motivating bug. Enqueues three messages and
  monkeypatches ``_process_message`` to raise on the second. Asserts
  that all three messages are processed (the thread survived), a
  PHYSICAL_INSUFFICIENCY fault carrying ``details["exception"]`` was
  emitted, ``agent_idle`` returns to True, and the worker exits
  cleanly on shutdown.

Adds a new ``Context Memory`` subsection under the Agent System block
covering: the original unbounded-history problem, the paged
multi-fidelity memory layer, the four fidelity rungs, the pin policy,
the ``McpClient`` configuration knobs, the fault event taxonomy, the
``get_artefact`` tool, and the ``_thread_loop`` recovery contract.

Also adds a Further Reading bullet pointing at the ``dimos.agents.memory``
package.
@greptile-apps

greptile-apps Bot commented Apr 24, 2026

Greptile Summary

This PR replaces the unbounded list[BaseMessage] history in McpClient with a paged, multi-fidelity MemoryEngine that keeps assembled prompts inside the model's context window, and wraps _process_message in a try/except so the worker thread survives turn-level failures. The architecture is well-designed — single-responsibility modules, strong Page invariants, correct tool-call pair atomicity in the selector, and a tolerant fault observer.

  • P1 — incomplete agent_idle fix: the try/finally that publishes agent_idle=True only wraps state_graph.stream(...). If engine.ingest() raises (e.g. TypeError from _role_of on an unsupported BaseMessage subclass), the exception escapes before that block, _thread_loop catches it, but agent_idle remains False forever — the exact stuck-idle symptom the PR targets. Moving the finally to wrap the entire post-publish(False) body of _process_message fixes this.
  • P2 — mutable Page objects in snapshots: PageTable.ordered() returns a shallow copy; callers hold references to the same Page instances that rebalance_evidence_pins mutates under its lock, making stale snapshots susceptible to torn reads.

Confidence Score: 3/5

Not safe to merge as-is: the agent_idle fix is incomplete and the stuck-idle symptom can still be triggered by an unexpected message type during ingestion.

One P1 defect present: the try/finally scope in _process_message does not cover engine.ingest() or engine.assemble(), so a TypeError from _role_of (reachable with any non-standard LangChain message subclass) leaves agent_idle permanently False — the exact symptom this PR sets out to fix. The ceiling for a P1 is 4; the defect sits directly on the motivating bug path, pulling the score to 3.

dimos/agents/mcp/mcp_client.py (_process_message try/finally scope) and dimos/agents/memory/ingestion.py (_role_of error handling) need attention before merge.

Important Files Changed

| Filename | Overview |
| --- | --- |
| dimos/agents/mcp/mcp_client.py | Core integration: replaces _history list with MemoryEngine; adds fault stream. P1 bug: try/finally for agent_idle.publish(True) does not cover ingest/assemble calls, leaving agent_idle stuck at False on those failure paths. |
| dimos/agents/memory/engine.py | Façade wiring page table, counter, budget, and fault observer; well-structured single-responsibility design with proper locking and fault emission. |
| dimos/agents/memory/selector.py | Two-phase greedy selector with correct tool-call pair atomicity and pin handling; Phase-2 upgrade loop is O(N×U) which could degrade at scale but is documented. |
| dimos/agents/memory/page_table.py | Thread-safe page store with auto/manual pin rebalancing; ordered() returns shallow copies exposing mutable Page objects that can be seen mid-mutation by stale snapshots. |
| dimos/agents/memory/pages.py | Core data types with strong __post_init__ invariants covering role, fidelity monotonicity, pin consistency, and tool-call pairing requirements. |
| dimos/agents/memory/ingestion.py | Converts BaseMessage into Page objects with four fidelity levels; _role_of raises TypeError for unrecognised subclasses, which can trigger the agent_idle stuck bug in mcp_client.py. |
| dimos/agents/memory/budget.py | Immutable ModelBudget with longest-prefix model lookup; _normalize_model comment overstates what suffixes are stripped but the resolve logic is correct. |
| dimos/agents/memory/faults.py | Fault event types and observer; correctly tolerant (never raises from emit), routes to both structlog and optional Out stream. |
| dimos/agents/memory/artefact_tool.py | LLM-callable rehydration tool; cleanly separates the get_artefact side-effect (marking a page for FULL on the next turn) from prompt assembly. |
| dimos/agents/memory/tokens.py | Token counter protocol with tiktoken (preferred) and character-heuristic fallback; image cost constants are fixed estimates, expected for budget approximations. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Q as MessageQueue
    participant TL as _thread_loop
    participant PM as _process_message
    participant ME as MemoryEngine
    participant PT as PageTable
    participant SEL as assemble_prompt
    participant LG as LangGraph

    Q->>TL: message (dequeued)
    TL->>PM: _process_message(state_graph, message)
    PM->>ME: ingest(message)
    ME->>PT: add(page(s))
    ME->>PT: rebalance_evidence_pins()
    PT-->>ME: PinRebalance diff
    ME-->>PM: list[Page]
    PM->>ME: assemble()
    ME->>PT: ordered()
    ME->>SEL: assemble_prompt(pages, budget)
    SEL-->>ME: AssembledPrompt
    ME-->>PM: AssembledPrompt
    PM->>LG: state_graph.stream(messages)
    LG-->>PM: streamed node outputs
    PM->>ME: ingest(response_msg)
    PM->>PM: finally → agent_idle.publish(True)
    TL->>TL: except → emit_physical_insufficiency()
```

Comments Outside Diff (2)

  1. dimos/agents/memory/page_table.py, line 604-607

    P2 Mutable Page objects returned by ordered() can be seen mid-mutation

    PageTable.ordered() returns a shallow copy of _pages — so callers receive a list of the same Page instances held in the table. rebalance_evidence_pins and mark_pinned_full mutate those shared objects (setting pinned, pinned_at_full, min_fidelity) while holding _lock. Any code that retains a stale snapshot from a previous ordered() call can observe partially-mutated pages without the lock.

    The MemoryEngine drives everything from one worker thread so the practical risk is low today, but the invariant is fragile. Consider freezing Page (@dataclass(frozen=True)) and replacing pins with a side-car mapping inside PageTable, or at minimum document that pages must never be retained across a lock boundary.

  2. dimos/agents/memory/selector.py, line 1343-1413

    P2 Phase-2 upgrade loop is O(N²) over groups

    _upgrade_until_full rescans every non-evicted group on every iteration to find the best upgrade candidate. With U upgrades applied and N groups, the loop is O(N×U). The docstring acknowledges this, but a long conversation with hundreds of tool-result pages will exhibit quadratic scan cost on every assemble() call. A priority-queue keyed on (delta, -turn_seq) would keep this O(N log N) if scale becomes a concern.
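A sketch of the priority-queue variant the comment suggests (the `groups` shape is hypothetical; real code would also re-push entries whose upgrade delta changes after an upgrade):

```python
import heapq

def upgrade_order(groups):
    """Yield groups cheapest-upgrade-first, most-recent-first on ties.
    One heapify + N pops replaces rescanning every group per upgrade."""
    heap = [(g["delta"], -g["turn_seq"], i) for i, g in enumerate(groups)]
    heapq.heapify(heap)
    while heap:
        _, _, i = heapq.heappop(heap)
        yield groups[i]

groups = [
    {"turn_seq": 1, "delta": 7},
    {"turn_seq": 2, "delta": 3},
    {"turn_seq": 3, "delta": 3},
]
order = [g["turn_seq"] for g in upgrade_order(groups)]  # [3, 2, 1]
```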

Reviews (1): Last reviewed commit: "docs(AGENTS.md): document the context me..."

Comment on lines +303 to 345
```diff
     self,
     state_graph: CompiledStateGraph[Any, Any, Any, Any],
     message: BaseMessage,
 ) -> None:
     self.agent_idle.publish(False)
-    self._history.append(message)
+    # Ingest the inbound message into the memory engine; this becomes
+    # a Page (or several, for multimodal) and drives the next
+    # assemble. Edge case: under a physically-insufficient budget
+    # the fresh input itself may be degraded. The engine emits a
+    # PHYSICAL_INSUFFICIENCY fault in that case.
+    self._engine.ingest(message)
     pretty_print_langchain_message(message)
     self.agent.publish(message)

-    for update in state_graph.stream({"messages": self._history}, stream_mode="updates"):
-        for node_output in update.values():
-            for msg in node_output.get("messages", []):
-                self._history.append(msg)
-                pretty_print_langchain_message(msg)
-                self.agent.publish(msg)
-
-    if self._message_queue.empty():
-        self.agent_idle.publish(True)
-
-
-def _append_image_to_history(
+    assembled = self._engine.assemble()
+    try:
+        for update in state_graph.stream(
+            {"messages": assembled.messages}, stream_mode="updates"
+        ):
+            for node_output in update.values():
+                for msg in node_output.get("messages", []):
+                    self._engine.ingest(msg)
+                    pretty_print_langchain_message(msg)
+                    self.agent.publish(msg)
+    finally:
+        if self._message_queue.empty():
+            self.agent_idle.publish(True)
+
+
+def _queue_image_artefact_message(
+    mcp_client: McpClient, func_name: str, uuid_: str, result: Any
+) -> None:
+    """Queue a HumanMessage containing an image artefact returned by a
+    non-text tool result.
+
+    Ingestion splits this multimodal message into a CONVERSATION page
+    (with the preamble text) plus one EVIDENCE page per image artefact
+    automatically — see ``dimos.agents.memory.ingestion``. The caller
+    doesn't need to set a page type hint.
+    """
+    mcp_client.add_message(
+        HumanMessage(
+            content=[
```

P1 agent_idle stuck at False when ingest raises

The try/finally that resets agent_idle only wraps the state_graph.stream(...) call. If self._engine.ingest(message) or self._engine.assemble() raises — for example, ingestion._role_of raises TypeError for any BaseMessage subclass beyond the four explicit cases (FunctionMessage, etc.) — the exception propagates out of _process_message before the try block is ever entered. _thread_loop catches it (preventing thread death), but agent_idle remains False forever. This is the same stuck-idle symptom the PR intends to cure, re-introduced for a narrower but still real failure mode.

Move the finally to cover the entire post-publish(False) body:

```python
def _process_message(
    self,
    state_graph: CompiledStateGraph[Any, Any, Any, Any],
    message: BaseMessage,
) -> None:
    self.agent_idle.publish(False)
    try:
        self._engine.ingest(message)
        pretty_print_langchain_message(message)
        self.agent.publish(message)

        assembled = self._engine.assemble()
        for update in state_graph.stream(
            {"messages": assembled.messages}, stream_mode="updates"
        ):
            for node_output in update.values():
                for msg in node_output.get("messages", []):
                    self._engine.ingest(msg)
                    pretty_print_langchain_message(msg)
                    self.agent.publish(msg)
    finally:
        if self._message_queue.empty():
            self.agent_idle.publish(True)
```

Comment on lines +96 to +104
```python
def _default_provenance(msg: BaseMessage) -> str:
    if isinstance(msg, SystemMessage):
        return "system"
    if isinstance(msg, HumanMessage):
        return "user_input"
    if isinstance(msg, AIMessage):
        return "llm_response"
    if isinstance(msg, ToolMessage):
        return "tool_response"
```

P2 _role_of raises TypeError for unrecognised message types

_role_of has an explicit raise TypeError(...) fall-through for any BaseMessage subclass beyond System/Human/AI/Tool. LangChain occasionally surfaces FunctionMessage or vendor-specific subclasses. When that happens, engine.ingest() raises, escaping _process_message before the try/finally that resets agent_idle. Combined with the companion comment on _process_message, this path is the concrete trigger for the stuck-idle bug.

Consider adding a fallback that maps unknown subclasses to "human" (or emits a structured warning and skips ingestion) rather than raising, so a single unexpected message type doesn't stall the agent.
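One possible shape for that fallback, using stand-in classes rather than the real LangChain types (a sketch, not the PR's code):

```python
import logging

log = logging.getLogger("ingestion")

# Stand-ins for the LangChain message hierarchy; FunctionMessage plays
# the role of an unexpected subclass.
class SystemMessage: ...
class HumanMessage: ...
class AIMessage: ...
class ToolMessage: ...
class FunctionMessage: ...

_ROLES = [(SystemMessage, "system"), (HumanMessage, "human"),
          (AIMessage, "ai"), (ToolMessage, "tool")]

def role_of(msg) -> str:
    for cls, role in _ROLES:
        if isinstance(msg, cls):
            return role
    # Unknown subclass: warn and degrade to "human" instead of raising,
    # so one odd message type cannot stall the worker's idle signal.
    log.warning("unrecognised message type %s; treating as human",
                type(msg).__name__)
    return "human"
```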

Comment on lines +207 to +224
```python
    system_overhead: int = DEFAULT_SYSTEM_OVERHEAD,
    per_message_overhead: int = DEFAULT_PER_MESSAGE_OVERHEAD,
) -> ModelBudget:
    """Resolve a :class:`ModelBudget` for *model_name*.

    All arguments after ``model_name`` are keyword-only; callers must pass
    ``override=...`` etc. explicitly.

    Args:
        model_name: Provider-qualified model string (e.g. ``"gpt-4o"`` or
            ``"ollama:llama3.2:latest"``).
        override: If set, use this as the ``context_window`` instead of the
            table lookup. Useful when operators want to keep prompts
            artificially small for cost / latency control.
        output_reserve: Tokens held back for the response.
        system_overhead: Formatting / tool-schema slack.
        per_message_overhead: Per-message role/framing cost. See
            :class:`ModelBudget`.
```

P2 _normalize_model comment overstates what it strips

The docstring says it drops "OpenAI -YYYY-MM-DD date suffixes", but the implementation only strips the :tag suffix. Date-stamped strings like "gpt-4o-2024-08-06" pass through unchanged and are resolved by the longest-prefix fallback in resolve_budget. The fallback works correctly, but the mismatch makes the intended contract ambiguous for future editors. Either update the comment or add the date-suffix strip here.
