fix(agents): stabilize context memory CI#1916

Open
iamMihirT wants to merge 6 commits into dimensionalOS:dev from iamMihirT:fix/agent-context-memory-ci

Conversation

@iamMihirT

Description

Fixes CI issues in the agent context memory PR without changing the memory plan architecture.

Changes:

  • Keep agent_idle cleanup in a finally around memory ingest, assemble, and graph streaming.
  • Make text memory token estimates monotonic for empty/short tool-call messages.
  • Remove the forbidden memory package __init__.py.
  • Remove section marker comments and apply repo formatting/lint cleanup.
  • Add small typing cleanups surfaced by mypy in the memory code.
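As a rough sketch of the clamping fix (the function and parameter names here are illustrative, not the actual dimos API), the per-rung estimates can be folded into monotone order with chained `max()` calls, so an empty or tool-call-only message can no longer produce an inversion:

```python
# Hypothetical sketch: force token estimates into
# POINTER <= STRUCTURED <= COMPRESSED <= FULL order.
def clamp_monotonic(
    pointer: int, structured: int, compressed: int, full: int
) -> tuple[int, int, int, int]:
    structured = max(structured, pointer)      # STRUCTURED may not undercut POINTER
    compressed = max(compressed, structured)   # and so on up the rungs
    full = max(full, compressed)
    return pointer, structured, compressed, full
```

For an empty tool-call message where only the pointer rendering carries tokens, every higher rung is lifted to the pointer cost rather than reported as zero.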

How to Test

env UV_CACHE_DIR=/tmp/uv-cache uv run --extra dev pre-commit run --all-files
env UV_CACHE_DIR=/tmp/uv-cache uv run pytest --noconftest dimos/agents/memory/ dimos/agents/mcp/test_mcp_client_history_compaction.py -q

Also verified direct ingestion of an empty AIMessage with tool_calls no longer raises the monotonic token estimate error.

Contributor License Agreement

  • I have read and approved the CLA.

Refs #1899

Mihir Trivedi added 6 commits April 24, 2026 15:08

Introduces a new ``dimos.agents.memory`` package that manages agent
conversation state as a set of typed, multi-fidelity ``Page`` objects
rather than an unbounded list of LangChain messages. Each page stores
four fidelity rungs eagerly (POINTER, STRUCTURED, COMPRESSED, FULL),
and a two-phase greedy selector picks per-page rungs to assemble a
prompt that fits the active model's token budget.

Core components:

- ``pages.py``: ``Page`` dataclass + fidelity / page-type enums, with
  invariants enforced in ``__post_init__``.
- ``page_table.py``: thread-safe store with explicit and auto pins;
  ``pin_recent_evidence`` keeps the N most recent EVIDENCE pages at
  FULL fidelity.
- ``tokens.py``: ``TokenCounter`` protocol, tiktoken-backed counter for
  OpenAI models and a heuristic fallback for others. Counters return
  content-only tokens; per-message ChatML overhead lives on the budget.
- ``budget.py``: ``ModelBudget`` with per-model context windows,
  output reserve, and per-message overhead; longest-prefix model name
  resolution.
- ``faults.py``: structured fault events (PAGE_EVICTED, PAGE_DEGRADED,
  REFETCH_FAULT, PHYSICAL_INSUFFICIENCY, PIN_REBALANCE) emitted to
  structlog plus an optional ``Out[FaultEvent]`` stream for live UI.
- ``ingestion.py``: turns a LangChain ``BaseMessage`` into one or more
  pages. Multimodal HumanMessages split into a CONVERSATION page plus
  one EVIDENCE page per image; AIMessage ``tool_calls`` payloads are
  cost-counted uniformly across all four fidelity rungs.
- ``selector.py``: two-phase greedy ``assemble_prompt`` that respects
  pin invariants and the effective (per-message-overhead-aware)
  budget.
- ``engine.py``: ``MemoryEngine`` façade coordinating ingestion,
  assembly, pin rebalancing, fault emission, and artefact rehydration.
- ``artefact_tool.py``: ``build_get_artefact_tool`` returns a
  ``StructuredTool`` an LLM can call to rehydrate a degraded EVIDENCE
  page back to FULL.
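The fidelity invariant enforced in ``__post_init__`` can be sketched as a minimal ``Page`` dataclass (a simplified stand-in for ``dimos.agents.memory.pages``, with illustrative fields, not the real class):

```python
import time
import uuid
from dataclasses import dataclass, field
from enum import IntEnum


class Fidelity(IntEnum):
    POINTER = 0
    STRUCTURED = 1
    COMPRESSED = 2
    FULL = 3


@dataclass
class Page:
    content: dict          # Fidelity -> str: one rendering per rung, stored eagerly
    tokens: dict           # Fidelity -> int: content-only token cost per rung
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    ts: float = field(default_factory=time.time)

    def __post_init__(self) -> None:
        # Every rung must be present, and costs must be monotone non-decreasing.
        if set(self.content) != set(Fidelity) or set(self.tokens) != set(Fidelity):
            raise ValueError("all four fidelity rungs must be stored eagerly")
        costs = [self.tokens[f] for f in sorted(Fidelity)]
        if costs != sorted(costs):
            raise ValueError("token costs must be monotone across rungs")
```

Storing all four rungs eagerly trades a little memory for O(1) degradation: the selector can drop a page to a cheaper rung without re-summarizing.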

Adds ``tiktoken>=0.8.0`` to the ``agents`` extra in ``pyproject.toml``.

168 unit tests ship alongside the package covering fidelity invariants,
pin policy, token accounting, budget resolution, fault emission,
multimodal split rules, selector edge cases, and the rehydration path.
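A minimal sketch of the two-phase greedy idea (illustrative data shapes, not the real `selector.assemble_prompt` signature; the actual value function and pin handling are richer):

```python
# Phase 1: start every page at its floor rung and evict the least-valuable
# (here simply: oldest) non-pinned pages until the floor total fits the budget.
# Phase 2: spend what remains upgrading survivors, newest first.
def assemble(pages, budget):
    """pages: oldest-first list of dicts with 'costs' (rung -> tokens),
    'pinned' (bool), and 'min_rung' (int floor)."""
    chosen = {id(p): p["min_rung"] for p in pages}
    total = sum(p["costs"][chosen[id(p)]] for p in pages)
    survivors = list(pages)
    for p in pages:                      # Phase 1: evict
        if total <= budget:
            break
        if not p["pinned"]:
            survivors.remove(p)
            total -= p["costs"][chosen.pop(id(p))]
    # (If still over budget here, the real engine emits a
    # physical-insufficiency fault instead of silently truncating.)
    for p in reversed(survivors):        # Phase 2: upgrade, newest first
        for rung in range(chosen[id(p)] + 1, 4):
            step = p["costs"][rung] - p["costs"][chosen[id(p)]]
            if total + step > budget:
                break
            chosen[id(p)], total = rung, total + step
    return [(p, chosen[id(p)]) for p in survivors], total
```

Separating eviction from upgrade keeps the algorithm predictable: a generous budget never evicts anything, while a tight one degrades or drops the oldest unpinned pages first.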

Replaces the unbounded ``self._history: list[BaseMessage]`` with a
``MemoryEngine`` that ingests every inbound message, hands the LLM an
assembled prompt that fits the active model's context window, and
emits fault events whenever pages are degraded or evicted to make room.

Changes:

- New ``McpClientConfig`` fields (all optional, default-equivalent to
  the model's full budget):
  - ``token_budget``: override the per-model context window.
  - ``pin_recent_evidence`` (default 3): keep the N most recent image
    artefacts at FULL fidelity.
  - ``output_reserve_tokens`` (default 4096): carved out of the
    context window for the model's response.
- New ``faults: Out[FaultEvent]`` stream on the module so operators
  can observe memory-layer events alongside the existing ``agent``
  and ``agent_idle`` streams.
- ``on_system_modules`` now appends the engine's ``get_artefact``
  StructuredTool to the agent's tool list so the LLM can rehydrate a
  degraded EVIDENCE page back to FULL on demand.
- ``_process_message`` ingests the inbound message through the engine
  and streams the assembled ``list[BaseMessage]`` into the state graph
  instead of the raw history.
- ``_thread_loop`` wraps ``_process_message`` in ``try/except`` so a
  turn-level failure no longer kills the worker thread silently; the
  exception is forwarded to the fault stream via
  ``emit_physical_insufficiency`` and the loop continues draining
  queued messages.

Fixes the original motivating bug: once the worker thread died on a
``context_length_exceeded`` or any other turn-level error, subsequent
messages piled up in the queue unprocessed and ``agent_idle`` stayed
at ``False`` forever.

…hread recovery

Adds two integration tests that drive a real ``MemoryEngine`` from a
bypass-constructed ``McpClient`` (no MCP server needed).

- ``test_history_compaction_keeps_recent_images_at_full_under_pressure``:
  ingests a system bootstrap + three camera-snapshot images
  interleaved with 100 chatty text turns under a tight 8k token budget.
  Asserts that (1) the assembled prompt respects
  ``budget.effective_budget_for_messages(n_surviving)``, (2) the three
  most recent EVIDENCE pages stay at FULL fidelity, (3) older
  CONVERSATION pages get degraded and a PAGE_DEGRADED fault is emitted,
  and (4) ``physical_insufficient`` stays False.

- ``test_thread_loop_recovers_from_process_message_exception``:
  regression guard for the motivating bug. Enqueues three messages and
  monkeypatches ``_process_message`` to raise on the second. Asserts
  that all three messages are processed (the thread survived), a
  PHYSICAL_INSUFFICIENCY fault carrying ``details["exception"]`` was
  emitted, ``agent_idle`` returns to True, and the worker exits
  cleanly on shutdown.

Adds a new ``Context Memory`` subsection under the Agent System block
covering: the original unbounded-history problem, the paged
multi-fidelity memory layer, the four fidelity rungs, the pin policy,
the ``McpClient`` configuration knobs, the fault event taxonomy, the
``get_artefact`` tool, and the ``_thread_loop`` recovery contract.

Also adds a Further Reading bullet pointing at the ``dimos.agents.memory``
package.

greptile-apps Bot commented Apr 24, 2026

Greptile Summary

This PR stabilizes CI for the paged multi-fidelity memory layer introduced in #1899. The three substantive fixes are: (1) moving agent_idle.publish(True) into a finally block inside _process_message so idle state is always restored even on exception; (2) wrapping the _process_message call in _thread_loop with try/except so turn-level failures no longer kill the worker thread; and (3) clamping token estimates with max() in the representation builders to enforce POINTER ≤ STRUCTURED ≤ COMPRESSED ≤ FULL monotonicity for empty or tool-call-only AIMessage content. The remaining changes are lint/typing cleanup, removal of the forbidden __init__.py, and AGENTS.md documentation for the memory subsystem.

Confidence Score: 4/5

Safe to merge; all findings are style/P2 — no logic bugs or regressions identified.

The three primary CI fixes (finally-block idle restore, thread-loop exception guard, monotonic token clamping) are correct and well-tested. Only P2 issues were found: a misleading comment in _normalize_model, a timing-dependent sleep in one integration test, and mutable Page fields that bypass __post_init__ invariants when mutated externally (currently done correctly, but fragile for future refactors). No security concerns or data-loss paths identified.

Flagged files: dimos/agents/memory/pages.py (mutable dataclass pin invariants), dimos/agents/mcp/test_mcp_client_history_compaction.py (timing-dependent sleep), and dimos/agents/memory/budget.py (misleading comment).

Important Files Changed

| Filename | Overview |
| --- | --- |
| dimos/agents/mcp/mcp_client.py | Core fix: adds try/finally around _process_message to always restore agent_idle, and wraps the call site in _thread_loop with try/except to prevent thread death on turn exceptions; replaces bare _history list with MemoryEngine. |
| dimos/agents/memory/engine.py | New façade: owns PageTable, token counter, FaultObserver, and turn counter; exposes ingest, assemble, and emit_physical_insufficiency; thread-safe via RLock shared with PageTable. |
| dimos/agents/memory/ingestion.py | New ingestor: converts BaseMessage to Page(s); applies monotonic-clamp fix for empty/short AIMessage tool-call content; handles multimodal splitting into CONVERSATION + EVIDENCE pages. |
| dimos/agents/memory/pages.py | Core data types: mutable @dataclass with __post_init__ invariants that can be bypassed by direct field mutation; pin invariants are currently upheld externally but not enforced at the Page level. |
| dimos/agents/memory/selector.py | Two-phase greedy selector: Phase 1 evicts least-valuable non-pinned groups; Phase 2 upgrades survivors; correctly accounts for per-message ChatML overhead via floating effective cap. |
| dimos/agents/memory/page_table.py | Thread-safe page store with O(1) id/artefact lookups; correctly distinguishes auto-pinned (rebalanceable) from manually-pinned (one-way upgrade) pages via _auto_pinned_ids. |
| dimos/agents/memory/tokens.py | Clean separation: counters return content-only tokens; per-message ChatML prelude cost is exclusively owned by ModelBudget.effective_budget_for_messages; tiktoken/heuristic fallback is well-guarded. |
| dimos/agents/memory/budget.py | Model context-window registry with longest-prefix matching for versioned model names; _normalize_model comment overstates date-suffix stripping (only strips :tag, not -YYYY-MM-DD). |
| dimos/agents/memory/faults.py | Fault observability: FaultObserver routes events to structlog + optional Out stream; never raises; convenience emitters keep call sites clean; counts dict supports test assertions. |
| dimos/agents/mcp/test_mcp_client_history_compaction.py | Good integration coverage; test 2 (test_thread_loop_recovers_from_process_message_exception) uses a fixed time.sleep(0.1) to wait for idle-state publication, which can flake under scheduler pressure. |
| dimos/agents/memory/artefact_tool.py | LangChain tool for EVIDENCE rehydration; clear failure message on unknown UUID; deferred langchain_core import avoids hard dependency at package import time. |

Sequence Diagram

sequenceDiagram
    participant Q as MessageQueue
    participant TL as _thread_loop
    participant PM as _process_message
    participant ME as MemoryEngine
    participant PT as PageTable
    participant SG as StateGraph
    participant FO as FaultObserver

    Q->>TL: get() message
    TL->>PM: _process_message(state_graph, message)
    PM->>PM: agent_idle.publish(False)
    
    rect rgb(230, 245, 255)
        note over PM: try block
        PM->>ME: ingest(message)
        ME->>PT: add(page)
        ME->>PT: rebalance_evidence_pins()
        PT-->>ME: PinRebalance diff
        ME->>FO: pin_rebalance(...)
        ME-->>PM: list[Page]
        
        PM->>ME: assemble()
        ME->>PT: ordered()
        PT-->>ME: list[Page]
        ME->>ME: assemble_prompt(pages, budget)
        ME->>FO: evict/degrade/physical_insufficiency faults
        ME-->>PM: AssembledPrompt

        PM->>SG: stream({messages: assembled.messages})
        loop each update
            SG-->>PM: node_output messages
            PM->>ME: ingest(response_msg)
        end
    end

    rect rgb(255, 245, 230)
        note over PM: finally block
        PM->>Q: empty()?
        alt queue empty
            PM->>PM: agent_idle.publish(True)
        end
    end

    alt exception raised
        PM-->>TL: Exception propagates
        TL->>ME: emit_physical_insufficiency(exception)
        ME->>FO: physical_insufficiency(needed=-1, ...)
        TL->>TL: log + continue loop
    end

Comments Outside Diff (2)

  1. dimos/agents/memory/budget.py, line 926-929 (link)

    P2 Misleading comment about date-suffix stripping

The comment says "Drop ollama :tag and OpenAI -YYYY-MM-DD date suffixes" but the code only strips colon-delimited tags (:tag). Hyphenated date suffixes like gpt-4o-2024-08-06 are not stripped here; they fall through to the longest-prefix matching below, which correctly resolves gpt-4o-2024-08-06 to gpt-4o. The code works, but the comment overstates what this block does and will confuse future maintainers.
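For illustration, a longest-prefix resolver along these lines (the registry values and function names are examples, not dimos's actual table) shows why dated model names resolve correctly even though only the colon tag is stripped:

```python
# Example context-window registry (illustrative sizes).
CONTEXT_WINDOWS = {
    "gpt-4o": 128_000,
    "gpt-4o-mini": 128_000,
    "gpt-4.1": 1_000_000,
}


def resolve_window(model: str, default: int = 8_192) -> int:
    name = model.split(":", 1)[0]      # strip only ollama-style ":tag"
    best = ""
    for known in CONTEXT_WINDOWS:      # dated names like "gpt-4o-2024-08-06"
        if name.startswith(known) and len(known) > len(best):
            best = known               # longest prefix wins, so "gpt-4o-mini"
    return CONTEXT_WINDOWS.get(best, default)   # beats "gpt-4o" when both match
```

The longest-prefix rule matters for families like gpt-4o vs. gpt-4o-mini, where a naive first-match lookup could pick the shorter, wrong entry.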

  2. dimos/agents/mcp/test_mcp_client_history_compaction.py, line 543-549 (link)

    P2 Timing-dependent sleep between queue drain and final assertion

    After the queue empties, time.sleep(0.1) is used to let the worker thread publish agent_idle=True. Under heavy scheduler load the worker may not complete fake_process(turn 3) within 100 ms, causing the idle-history assertion below to see False as the last value. A more robust approach is to poll fake_idle.idle_history with a short timeout (similar to how the queue drain is already polled) or use a threading.Event that fake_process sets when idle=True is published.
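A sketch of the Event-based alternative suggested above (`idle_event` and `publish_idle` are hypothetical test helpers, not existing fixtures):

```python
import threading

# The test wires this in place of the real idle publisher; the worker thread
# calls it when it publishes idle state.
idle_event = threading.Event()


def publish_idle(value: bool) -> None:
    if value:
        idle_event.set()

# In the test body, replace `time.sleep(0.1)` with a bounded wait:
#     assert idle_event.wait(timeout=5.0), "worker never published idle=True"
```

`Event.wait(timeout=...)` returns as soon as the flag is set, so the test stays fast on a quiet machine while tolerating seconds of scheduler delay on a loaded one.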

Reviews (1): Last reviewed commit: "fix(agents): stabilize context memory ci"

Comment on lines +150 to +155
id: Stable internal identifier, unique per page within a session.
type: What kind of state this page carries (see :class:`PageType`).
provenance: Human-readable origin string, e.g. ``"user_input"``,
``"tool:camera.capture"``, ``"system"``.
turn_seq: Monotonic per-agent turn counter assigned at ingestion.
ts: ``time.time()`` snapshot at ingestion.

P2 Mutable dataclass with invariants enforced only at construction

Page is a plain (non-frozen) @dataclass, so callers can mutate fields like pinned_at_full, pinned, and min_fidelity after construction without __post_init__ re-running. The three mutually-consistent pin fields are currently always written together in page_table.rebalance_evidence_pins and mark_pinned_full, so the invariant holds today. Any future caller that sets only one of the three (e.g., page.pinned_at_full = True without updating pinned or min_fidelity) will silently violate the constraint. Consider either making Page frozen and mutating pin state via a method on PageTable, or adding a dedicated helper like Page.set_pinned_full() that enforces the invariant atomically.

