fix(agents): stabilize context memory CI#1916

Open
iamMihirT wants to merge 6 commits into dimensionalOS:dev from iamMihirT:fix/agent-context-memory-ci

Conversation

@iamMihirT

Description

Fixes CI issues in the agent context memory PR without changing the memory plan architecture.

Changes:

  • Keep agent_idle cleanup in a finally around memory ingest, assemble, and graph streaming.
  • Make text memory token estimates monotonic for empty/short tool-call messages.
  • Remove the forbidden memory package __init__.py.
  • Remove section marker comments and apply repo formatting/lint cleanup.
  • Add small typing cleanups surfaced by mypy in the memory code.
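As a rough sketch of the clamping fix (the function and parameter names here are illustrative, not the actual dimos API), the per-rung estimates can be folded into monotone order with chained `max()` calls, so an empty or tool-call-only message can no longer produce an inversion:

```python
# Hypothetical sketch: force token estimates into
# POINTER <= STRUCTURED <= COMPRESSED <= FULL order.
def clamp_monotonic(
    pointer: int, structured: int, compressed: int, full: int
) -> tuple[int, int, int, int]:
    structured = max(structured, pointer)      # STRUCTURED may not undercut POINTER
    compressed = max(compressed, structured)   # and so on up the rungs
    full = max(full, compressed)
    return pointer, structured, compressed, full
```

For an empty tool-call message where only the pointer rendering carries tokens, every higher rung is lifted to the pointer cost rather than reported as zero.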

How to Test

env UV_CACHE_DIR=/tmp/uv-cache uv run --extra dev pre-commit run --all-files
env UV_CACHE_DIR=/tmp/uv-cache uv run pytest --noconftest dimos/agents/memory/ dimos/agents/mcp/test_mcp_client_history_compaction.py -q

Also verified direct ingestion of an empty AIMessage with tool_calls no longer raises the monotonic token estimate error.

Contributor License Agreement

  • I have read and approved the CLA.

Refs #1899

Mihir Trivedi added 6 commits April 24, 2026 15:08

Introduces a new ``dimos.agents.memory`` package that manages agent
conversation state as a set of typed, multi-fidelity ``Page`` objects
rather than an unbounded list of LangChain messages. Each page stores
four fidelity rungs eagerly (POINTER, STRUCTURED, COMPRESSED, FULL),
and a two-phase greedy selector picks per-page rungs to assemble a
prompt that fits the active model's token budget.

Core components:

- ``pages.py``: ``Page`` dataclass + fidelity / page-type enums, with
  invariants enforced in ``__post_init__``.
- ``page_table.py``: thread-safe store with explicit and auto pins;
  ``pin_recent_evidence`` keeps the N most recent EVIDENCE pages at
  FULL fidelity.
- ``tokens.py``: ``TokenCounter`` protocol, tiktoken-backed counter for
  OpenAI models and a heuristic fallback for others. Counters return
  content-only tokens; per-message ChatML overhead lives on the budget.
- ``budget.py``: ``ModelBudget`` with per-model context windows,
  output reserve, and per-message overhead; longest-prefix model name
  resolution.
- ``faults.py``: structured fault events (PAGE_EVICTED, PAGE_DEGRADED,
  REFETCH_FAULT, PHYSICAL_INSUFFICIENCY, PIN_REBALANCE) emitted to
  structlog plus an optional ``Out[FaultEvent]`` stream for live UI.
- ``ingestion.py``: turns a LangChain ``BaseMessage`` into one or more
  pages. Multimodal HumanMessages split into a CONVERSATION page plus
  one EVIDENCE page per image; AIMessage ``tool_calls`` payloads are
  cost-counted uniformly across all four fidelity rungs.
- ``selector.py``: two-phase greedy ``assemble_prompt`` that respects
  pin invariants and the effective (per-message-overhead-aware)
  budget.
- ``engine.py``: ``MemoryEngine`` façade coordinating ingestion,
  assembly, pin rebalancing, fault emission, and artefact rehydration.
- ``artefact_tool.py``: ``build_get_artefact_tool`` returns a
  ``StructuredTool`` an LLM can call to rehydrate a degraded EVIDENCE
  page back to FULL.
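The fidelity invariant enforced in ``__post_init__`` can be sketched as a minimal ``Page`` dataclass (a simplified stand-in for ``dimos.agents.memory.pages``, with illustrative fields, not the real class):

```python
import time
import uuid
from dataclasses import dataclass, field
from enum import IntEnum


class Fidelity(IntEnum):
    POINTER = 0
    STRUCTURED = 1
    COMPRESSED = 2
    FULL = 3


@dataclass
class Page:
    content: dict          # Fidelity -> str: one rendering per rung, stored eagerly
    tokens: dict           # Fidelity -> int: content-only token cost per rung
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    ts: float = field(default_factory=time.time)

    def __post_init__(self) -> None:
        # Every rung must be present, and costs must be monotone non-decreasing.
        if set(self.content) != set(Fidelity) or set(self.tokens) != set(Fidelity):
            raise ValueError("all four fidelity rungs must be stored eagerly")
        costs = [self.tokens[f] for f in sorted(Fidelity)]
        if costs != sorted(costs):
            raise ValueError("token costs must be monotone across rungs")
```

Storing all four rungs eagerly trades a little memory for O(1) degradation: the selector can drop a page to a cheaper rung without re-summarizing.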

Adds ``tiktoken>=0.8.0`` to the ``agents`` extra in ``pyproject.toml``.

168 unit tests ship alongside the package covering fidelity invariants,
pin policy, token accounting, budget resolution, fault emission,
multimodal split rules, selector edge cases, and the rehydration path.
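A minimal sketch of the two-phase greedy idea (illustrative data shapes, not the real `selector.assemble_prompt` signature; the actual value function and pin handling are richer):

```python
# Phase 1: start every page at its floor rung and evict the least-valuable
# (here simply: oldest) non-pinned pages until the floor total fits the budget.
# Phase 2: spend what remains upgrading survivors, newest first.
def assemble(pages, budget):
    """pages: oldest-first list of dicts with 'costs' (rung -> tokens),
    'pinned' (bool), and 'min_rung' (int floor)."""
    chosen = {id(p): p["min_rung"] for p in pages}
    total = sum(p["costs"][chosen[id(p)]] for p in pages)
    survivors = list(pages)
    for p in pages:                      # Phase 1: evict
        if total <= budget:
            break
        if not p["pinned"]:
            survivors.remove(p)
            total -= p["costs"][chosen.pop(id(p))]
    # (If still over budget here, the real engine emits a
    # physical-insufficiency fault instead of silently truncating.)
    for p in reversed(survivors):        # Phase 2: upgrade, newest first
        for rung in range(chosen[id(p)] + 1, 4):
            step = p["costs"][rung] - p["costs"][chosen[id(p)]]
            if total + step > budget:
                break
            chosen[id(p)], total = rung, total + step
    return [(p, chosen[id(p)]) for p in survivors], total
```

Separating eviction from upgrade keeps the algorithm predictable: a generous budget never evicts anything, while a tight one degrades or drops the oldest unpinned pages first.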

Replaces the unbounded ``self._history: list[BaseMessage]`` with a
``MemoryEngine`` that ingests every inbound message, hands the LLM an
assembled prompt that fits the active model's context window, and
emits fault events whenever pages are degraded or evicted to make room.

Changes:

- New ``McpClientConfig`` fields (all optional, default-equivalent to
  the model's full budget):
  - ``token_budget``: override the per-model context window.
  - ``pin_recent_evidence`` (default 3): keep the N most recent image
    artefacts at FULL fidelity.
  - ``output_reserve_tokens`` (default 4096): carved out of the
    context window for the model's response.
- New ``faults: Out[FaultEvent]`` stream on the module so operators
  can observe memory-layer events alongside the existing ``agent``
  and ``agent_idle`` streams.
- ``on_system_modules`` now appends the engine's ``get_artefact``
  StructuredTool to the agent's tool list so the LLM can rehydrate a
  degraded EVIDENCE page back to FULL on demand.
- ``_process_message`` ingests the inbound message through the engine
  and streams the assembled ``list[BaseMessage]`` into the state graph
  instead of the raw history.
- ``_thread_loop`` wraps ``_process_message`` in ``try/except`` so a
  turn-level failure no longer kills the worker thread silently; the
  exception is forwarded to the fault stream via
  ``emit_physical_insufficiency`` and the loop continues draining
  queued messages.

Fixes the original motivating bug: once the worker thread died on a
``context_length_exceeded`` or any other turn-level error, subsequent
messages piled up in the queue unprocessed and ``agent_idle`` stayed
at ``False`` forever.

…hread recovery

Adds two integration tests that drive a real ``MemoryEngine`` from a
bypass-constructed ``McpClient`` (no MCP server needed).

- ``test_history_compaction_keeps_recent_images_at_full_under_pressure``:
  ingests a system bootstrap + three camera-snapshot images
  interleaved with 100 chatty text turns under a tight 8k token budget.
  Asserts that (1) the assembled prompt respects
  ``budget.effective_budget_for_messages(n_surviving)``, (2) the three
  most recent EVIDENCE pages stay at FULL fidelity, (3) older
  CONVERSATION pages get degraded and a PAGE_DEGRADED fault is emitted,
  and (4) ``physical_insufficient`` stays False.

- ``test_thread_loop_recovers_from_process_message_exception``:
  regression guard for the motivating bug. Enqueues three messages and
  monkeypatches ``_process_message`` to raise on the second. Asserts
  that all three messages are processed (the thread survived), a
  PHYSICAL_INSUFFICIENCY fault carrying ``details["exception"]`` was
  emitted, ``agent_idle`` returns to True, and the worker exits
  cleanly on shutdown.

Adds a new ``Context Memory`` subsection under the Agent System block
covering: the original unbounded-history problem, the paged
multi-fidelity memory layer, the four fidelity rungs, the pin policy,
the ``McpClient`` configuration knobs, the fault event taxonomy, the
``get_artefact`` tool, and the ``_thread_loop`` recovery contract.

Also adds a Further Reading bullet pointing at the ``dimos.agents.memory``
package.

greptile-apps Bot commented Apr 24, 2026

Greptile Summary

This PR stabilizes CI for the paged multi-fidelity memory layer introduced in #1899. The three substantive fixes are: (1) moving agent_idle.publish(True) into a finally block inside _process_message so idle state is always restored even on exception; (2) wrapping the _process_message call in _thread_loop with try/except so turn-level failures no longer kill the worker thread; and (3) clamping token estimates with max() in the representation builders to enforce POINTER ≤ STRUCTURED ≤ COMPRESSED ≤ FULL monotonicity for empty or tool-call-only AIMessage content. The remaining changes are lint/typing cleanup, removal of the forbidden __init__.py, and AGENTS.md documentation for the memory subsystem.

Confidence Score: 4/5

Safe to merge; all findings are style/P2 — no logic bugs or regressions identified.

The three primary CI fixes (finally-block idle restore, thread-loop exception guard, monotonic token clamping) are correct and well-tested. Only P2 issues were found: a misleading comment in _normalize_model, a timing-dependent sleep in one integration test, and mutable Page fields that bypass __post_init__ invariants when mutated externally (currently done correctly, but fragile for future refactors). No security concerns or data-loss paths identified.

Flagged files: dimos/agents/memory/pages.py (mutable dataclass pin invariants), dimos/agents/mcp/test_mcp_client_history_compaction.py (timing-dependent sleep), and dimos/agents/memory/budget.py (misleading comment).

Important Files Changed

| Filename | Overview |
| --- | --- |
| dimos/agents/mcp/mcp_client.py | Core fix: adds try/finally around _process_message to always restore agent_idle, and wraps the call site in _thread_loop with try/except to prevent thread death on turn exceptions; replaces bare _history list with MemoryEngine. |
| dimos/agents/memory/engine.py | New façade: owns PageTable, token counter, FaultObserver, and turn counter; exposes ingest, assemble, and emit_physical_insufficiency; thread-safe via RLock shared with PageTable. |
| dimos/agents/memory/ingestion.py | New ingestor: converts BaseMessage to Page(s); applies monotonic-clamp fix for empty/short AIMessage tool-call content; handles multimodal splitting into CONVERSATION + EVIDENCE pages. |
| dimos/agents/memory/pages.py | Core data types: mutable @dataclass with __post_init__ invariants that can be bypassed by direct field mutation; pin invariants are currently upheld externally but not enforced at the Page level. |
| dimos/agents/memory/selector.py | Two-phase greedy selector: Phase 1 evicts least-valuable non-pinned groups; Phase 2 upgrades survivors; correctly accounts for per-message ChatML overhead via floating effective cap. |
| dimos/agents/memory/page_table.py | Thread-safe page store with O(1) id/artefact lookups; correctly distinguishes auto-pinned (rebalanceable) from manually-pinned (one-way upgrade) pages via _auto_pinned_ids. |
| dimos/agents/memory/tokens.py | Clean separation: counters return content-only tokens; per-message ChatML prelude cost is exclusively owned by ModelBudget.effective_budget_for_messages; tiktoken/heuristic fallback is well-guarded. |
| dimos/agents/memory/budget.py | Model context-window registry with longest-prefix matching for versioned model names; _normalize_model comment overstates date-suffix stripping (only strips :tag, not -YYYY-MM-DD). |
| dimos/agents/memory/faults.py | Fault observability: FaultObserver routes events to structlog + optional Out stream; never raises; convenience emitters keep call sites clean; counts dict supports test assertions. |
| dimos/agents/mcp/test_mcp_client_history_compaction.py | Good integration coverage; test 2 (test_thread_loop_recovers_from_process_message_exception) uses a fixed time.sleep(0.1) to wait for idle-state publication, which can flake under scheduler pressure. |
| dimos/agents/memory/artefact_tool.py | LangChain tool for EVIDENCE rehydration; clear failure message on unknown UUID; deferred langchain_core import avoids hard dependency at package import time. |

Sequence Diagram

sequenceDiagram
    participant Q as MessageQueue
    participant TL as _thread_loop
    participant PM as _process_message
    participant ME as MemoryEngine
    participant PT as PageTable
    participant SG as StateGraph
    participant FO as FaultObserver

    Q->>TL: get() message
    TL->>PM: _process_message(state_graph, message)
    PM->>PM: agent_idle.publish(False)
    
    rect rgb(230, 245, 255)
        note over PM: try block
        PM->>ME: ingest(message)
        ME->>PT: add(page)
        ME->>PT: rebalance_evidence_pins()
        PT-->>ME: PinRebalance diff
        ME->>FO: pin_rebalance(...)
        ME-->>PM: list[Page]
        
        PM->>ME: assemble()
        ME->>PT: ordered()
        PT-->>ME: list[Page]
        ME->>ME: assemble_prompt(pages, budget)
        ME->>FO: evict/degrade/physical_insufficiency faults
        ME-->>PM: AssembledPrompt

        PM->>SG: stream({messages: assembled.messages})
        loop each update
            SG-->>PM: node_output messages
            PM->>ME: ingest(response_msg)
        end
    end

    rect rgb(255, 245, 230)
        note over PM: finally block
        PM->>Q: empty()?
        alt queue empty
            PM->>PM: agent_idle.publish(True)
        end
    end

    alt exception raised
        PM-->>TL: Exception propagates
        TL->>ME: emit_physical_insufficiency(exception)
        ME->>FO: physical_insufficiency(needed=-1, ...)
        TL->>TL: log + continue loop
    end

Comments Outside Diff (2)

  1. dimos/agents/memory/budget.py, line 926-929 (link)

    P2 Misleading comment about date-suffix stripping

The comment says "Drop ollama :tag and OpenAI -YYYY-MM-DD date suffixes" but the code only strips colon-delimited tags (:tag). Hyphenated date suffixes like gpt-4o-2024-08-06 are not stripped here; they fall through to the longest-prefix matching below, which correctly resolves gpt-4o-2024-08-06 to gpt-4o. The code works, but the comment overstates what this block does and will confuse future maintainers.
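For illustration, a longest-prefix resolver along these lines (the registry values and function names are examples, not dimos's actual table) shows why dated model names resolve correctly even though only the colon tag is stripped:

```python
# Example context-window registry (illustrative sizes).
CONTEXT_WINDOWS = {
    "gpt-4o": 128_000,
    "gpt-4o-mini": 128_000,
    "gpt-4.1": 1_000_000,
}


def resolve_window(model: str, default: int = 8_192) -> int:
    name = model.split(":", 1)[0]      # strip only ollama-style ":tag"
    best = ""
    for known in CONTEXT_WINDOWS:      # dated names like "gpt-4o-2024-08-06"
        if name.startswith(known) and len(known) > len(best):
            best = known               # longest prefix wins, so "gpt-4o-mini"
    return CONTEXT_WINDOWS.get(best, default)   # beats "gpt-4o" when both match
```

The longest-prefix rule matters for families like gpt-4o vs. gpt-4o-mini, where a naive first-match lookup could pick the shorter, wrong entry.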

  2. dimos/agents/mcp/test_mcp_client_history_compaction.py, line 543-549 (link)

    P2 Timing-dependent sleep between queue drain and final assertion

    After the queue empties, time.sleep(0.1) is used to let the worker thread publish agent_idle=True. Under heavy scheduler load the worker may not complete fake_process(turn 3) within 100 ms, causing the idle-history assertion below to see False as the last value. A more robust approach is to poll fake_idle.idle_history with a short timeout (similar to how the queue drain is already polled) or use a threading.Event that fake_process sets when idle=True is published.
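A sketch of the Event-based alternative suggested above (`idle_event` and `publish_idle` are hypothetical test helpers, not existing fixtures):

```python
import threading

# The test wires this in place of the real idle publisher; the worker thread
# calls it when it publishes idle state.
idle_event = threading.Event()


def publish_idle(value: bool) -> None:
    if value:
        idle_event.set()

# In the test body, replace `time.sleep(0.1)` with a bounded wait:
#     assert idle_event.wait(timeout=5.0), "worker never published idle=True"
```

`Event.wait(timeout=...)` returns as soon as the flag is set, so the test stays fast on a quiet machine while tolerating seconds of scheduler delay on a loaded one.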

Reviews (1): Last reviewed commit: "fix(agents): stabilize context memory ci"

Comment on lines +150 to +155
id: Stable internal identifier, unique per page within a session.
type: What kind of state this page carries (see :class:`PageType`).
provenance: Human-readable origin string, e.g. ``"user_input"``,
``"tool:camera.capture"``, ``"system"``.
turn_seq: Monotonic per-agent turn counter assigned at ingestion.
ts: ``time.time()`` snapshot at ingestion.

P2 Mutable dataclass with invariants enforced only at construction

Page is a plain (non-frozen) @dataclass, so callers can mutate fields like pinned_at_full, pinned, and min_fidelity after construction without __post_init__ re-running. The three mutually-consistent pin fields are currently always written together in page_table.rebalance_evidence_pins and mark_pinned_full, so the invariant holds today. Any future caller that sets only one of the three (e.g., page.pinned_at_full = True without updating pinned or min_fidelity) will silently violate the constraint. Consider either making Page frozen and mutating pin state via a method on PageTable, or adding a dedicated helper like Page.set_pinned_full() that enforces the invariant atomically.

