fix(agents): stabilize context memory CI#1916
iamMihirT wants to merge 6 commits into dimensionalOS:dev
Conversation
Introduces a new ``dimos.agents.memory`` package that manages agent conversation state as a set of typed, multi-fidelity ``Page`` objects rather than an unbounded list of LangChain messages. Each page stores four fidelity rungs eagerly (POINTER, STRUCTURED, COMPRESSED, FULL), and a two-phase greedy selector picks per-page rungs to assemble a prompt that fits the active model's token budget.

Core components:

- ``pages.py``: ``Page`` dataclass plus fidelity / page-type enums, with invariants enforced in ``__post_init__``.
- ``page_table.py``: thread-safe store with explicit and auto pins; ``pin_recent_evidence`` keeps the N most recent EVIDENCE pages at FULL fidelity.
- ``tokens.py``: ``TokenCounter`` protocol, a tiktoken-backed counter for OpenAI models, and a heuristic fallback for others. Counters return content-only tokens; per-message ChatML overhead lives on the budget.
- ``budget.py``: ``ModelBudget`` with per-model context windows, output reserve, and per-message overhead; longest-prefix model name resolution.
- ``faults.py``: structured fault events (PAGE_EVICTED, PAGE_DEGRADED, REFETCH_FAULT, PHYSICAL_INSUFFICIENCY, PIN_REBALANCE) emitted to structlog, plus an optional ``Out[FaultEvent]`` stream for live UI.
- ``ingestion.py``: turns a LangChain ``BaseMessage`` into one or more pages. Multimodal HumanMessages split into a CONVERSATION page plus one EVIDENCE page per image; AIMessage ``tool_calls`` payloads are cost-counted uniformly across all four fidelity rungs.
- ``selector.py``: two-phase greedy ``assemble_prompt`` that respects pin invariants and the effective (per-message-overhead-aware) budget.
- ``engine.py``: ``MemoryEngine`` façade coordinating ingestion, assembly, pin rebalancing, fault emission, and artefact rehydration.
- ``artefact_tool.py``: ``build_get_artefact_tool`` returns a ``StructuredTool`` the LLM can call to rehydrate a degraded EVIDENCE page back to FULL.

Adds ``tiktoken>=0.8.0`` to the ``agents`` extra in ``pyproject.toml``.
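A minimal sketch of the ``pages.py`` shape described above. The rung names and the pinned-at-FULL invariant come from this PR description; the concrete field names ``content`` and ``fidelity`` are illustrative assumptions, not the package's actual API:

```python
from dataclasses import dataclass
from enum import IntEnum


class Fidelity(IntEnum):
    """The four rungs, ordered cheapest to richest (from the PR description)."""
    POINTER = 0
    STRUCTURED = 1
    COMPRESSED = 2
    FULL = 3


@dataclass
class Page:
    id: str
    content: dict[Fidelity, str]        # one rendering per rung, stored eagerly
    fidelity: Fidelity = Fidelity.FULL  # rung currently selected for the prompt
    pinned_at_full: bool = False

    def __post_init__(self) -> None:
        # Invariant: all four rungs must be materialised at construction.
        missing = [f.name for f in Fidelity if f not in self.content]
        if missing:
            raise ValueError(f"missing fidelity rungs: {missing}")
        # Invariant: a page pinned at FULL must actually sit at FULL.
        if self.pinned_at_full and self.fidelity is not Fidelity.FULL:
            raise ValueError("pinned_at_full requires fidelity == FULL")
```

Storing all four rungs eagerly trades memory for determinism: degradation is just a pointer move down the enum, never a re-summarisation at selection time.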
168 unit tests ship alongside the package covering fidelity invariants, pin policy, token accounting, budget resolution, fault emission, multimodal split rules, selector edge cases, and the rehydration path.
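The two-phase greedy selection can be sketched with a toy token counter. The real ``assemble_prompt`` in ``selector.py`` also handles ``min_fidelity`` pins and per-message overhead; everything below is a simplified assumption-laden illustration:

```python
FIDELITY_ORDER = ["pointer", "structured", "compressed", "full"]


def assemble(pages, budget, count_tokens):
    """Two-phase greedy rung selection (sketch).

    ``pages`` is a list of ``(page_id, {rung: text}, pinned_at_full)``
    tuples, oldest first; ``count_tokens`` maps text to a token count.
    """
    # Phase 1: floor — pinned pages at FULL, everything else at POINTER.
    choice = {pid: ("full" if pinned else "pointer") for pid, _, pinned in pages}
    spent = sum(count_tokens(rungs[choice[pid]]) for pid, rungs, _ in pages)
    # Phase 2: upgrade newest-first, one rung at a time, while budget allows.
    for pid, rungs, pinned in reversed(pages):
        if pinned:
            continue
        start = FIDELITY_ORDER.index(choice[pid])
        for rung in FIDELITY_ORDER[start + 1:]:
            delta = count_tokens(rungs[rung]) - count_tokens(rungs[choice[pid]])
            if spent + delta > budget:
                break
            choice[pid] = rung
            spent += delta
    return choice, spent
```

The floor pass guarantees pinned pages survive at FULL before any optional upgrade spends a token, which is what makes the pin invariant independent of recency.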
Replaces the unbounded ``self._history: list[BaseMessage]`` with a
``MemoryEngine`` that ingests every inbound message, hands the LLM an
assembled prompt that fits the active model's context window, and
emits fault events whenever pages are degraded or evicted to make room.
Changes:
- New ``McpClientConfig`` fields (all optional, default-equivalent to
the model's full budget):
- ``token_budget``: override the per-model context window.
- ``pin_recent_evidence`` (default 3): keep the N most recent image
artefacts at FULL fidelity.
- ``output_reserve_tokens`` (default 4096): carved out of the
context window for the model's response.
- New ``faults: Out[FaultEvent]`` stream on the module so operators
can observe memory-layer events alongside the existing ``agent``
and ``agent_idle`` streams.
- ``on_system_modules`` now appends the engine's ``get_artefact``
StructuredTool to the agent's tool list so the LLM can rehydrate a
degraded EVIDENCE page back to FULL on demand.
- ``_process_message`` ingests the inbound message through the engine
and streams the assembled ``list[BaseMessage]`` into the state graph
instead of the raw history.
- ``_thread_loop`` wraps ``_process_message`` in ``try/except`` so a
turn-level failure no longer kills the worker thread silently; the
exception is forwarded to the fault stream via
``emit_physical_insufficiency`` and the loop continues draining
queued messages.
Fixes the original motivating bug: once the worker thread died on a
``context_length_exceeded`` or any other turn-level error, subsequent
messages piled up in the queue unprocessed and ``agent_idle`` stayed
at ``False`` forever.
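The recovery contract described above can be sketched as follows. The ``emit_physical_insufficiency`` name comes from this description; the ``None`` shutdown sentinel is an assumption for the sketch:

```python
import queue
import threading


def thread_loop(inbox: queue.Queue, process, emit_physical_insufficiency):
    """Worker loop sketch: a turn-level failure is reported as a fault
    and the loop keeps draining queued messages instead of dying."""
    while True:
        message = inbox.get()
        if message is None:  # shutdown sentinel (assumption)
            break
        try:
            process(message)
        except Exception as exc:  # turn-level guard, intentionally broad
            emit_physical_insufficiency(exc)  # forward to the fault stream
        finally:
            inbox.task_done()
```

The key property is that the ``except`` sits inside the ``while``: one bad turn costs one fault event, not the whole worker.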
…hread recovery

Adds two integration tests that drive a real ``MemoryEngine`` from a bypass-constructed ``McpClient`` (no MCP server needed).

- ``test_history_compaction_keeps_recent_images_at_full_under_pressure``: ingests a system bootstrap plus three camera-snapshot images interleaved with 100 chatty text turns under a tight 8k token budget. Asserts that (1) the assembled prompt respects ``budget.effective_budget_for_messages(n_surviving)``, (2) the three most recent EVIDENCE pages stay at FULL fidelity, (3) older CONVERSATION pages get degraded and a PAGE_DEGRADED fault is emitted, and (4) ``physical_insufficient`` stays False.
- ``test_thread_loop_recovers_from_process_message_exception``: regression guard for the motivating bug. Enqueues three messages and monkeypatches ``_process_message`` to raise on the second. Asserts that all three messages are processed (the thread survived), a PHYSICAL_INSUFFICIENCY fault carrying ``details["exception"]`` was emitted, ``agent_idle`` returns to True, and the worker exits cleanly on shutdown.
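The budget arithmetic the compaction test asserts against might look roughly like this. The default ``output_reserve_tokens`` value is from this PR; the 4-token per-message ChatML overhead is an assumption for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelBudget:
    """Sketch: counters return content-only tokens, so per-message
    ChatML framing overhead is charged on the budget side."""
    context_window: int
    output_reserve_tokens: int = 4096
    per_message_overhead: int = 4  # typical ChatML framing cost (assumption)

    def effective_budget_for_messages(self, n_messages: int) -> int:
        # What remains for page content after reserving the model's
        # response and per-message framing for the surviving messages.
        return (
            self.context_window
            - self.output_reserve_tokens
            - n_messages * self.per_message_overhead
        )
```

Because the effective budget shrinks as more messages survive selection, the selector and the budget have to agree on the final message count, which is why the test asserts against ``effective_budget_for_messages(n_surviving)`` rather than a fixed number.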
Adds a new ``Context Memory`` subsection under the Agent System block covering: the original unbounded-history problem, the paged multi-fidelity memory layer, the four fidelity rungs, the pin policy, the ``McpClient`` configuration knobs, the fault event taxonomy, the ``get_artefact`` tool, and the ``_thread_loop`` recovery contract. Also adds a Further Reading bullet pointing at the ``dimos.agents.memory`` package.
Greptile Summary

This PR stabilizes CI for the paged multi-fidelity memory layer introduced in #1899. The three primary CI fixes (finally-block idle restore, thread-loop exception guard, monotonic token clamping) are correct and well-tested.

Confidence Score: 4/5 — safe to merge; all findings are style/P2, no logic bugs or regressions identified. Only P2 issues were found: a misleading comment in …
Sequence Diagram

```mermaid
sequenceDiagram
    participant Q as MessageQueue
    participant TL as _thread_loop
    participant PM as _process_message
    participant ME as MemoryEngine
    participant PT as PageTable
    participant SG as StateGraph
    participant FO as FaultObserver
    Q->>TL: get() message
    TL->>PM: _process_message(state_graph, message)
    PM->>PM: agent_idle.publish(False)
    rect rgb(230, 245, 255)
        note over PM: try block
        PM->>ME: ingest(message)
        ME->>PT: add(page)
        ME->>PT: rebalance_evidence_pins()
        PT-->>ME: PinRebalance diff
        ME->>FO: pin_rebalance(...)
        ME-->>PM: list[Page]
        PM->>ME: assemble()
        ME->>PT: ordered()
        PT-->>ME: list[Page]
        ME->>ME: assemble_prompt(pages, budget)
        ME->>FO: evict/degrade/physical_insufficiency faults
        ME-->>PM: AssembledPrompt
        PM->>SG: stream({messages: assembled.messages})
        loop each update
            SG-->>PM: node_output messages
            PM->>ME: ingest(response_msg)
        end
    end
    rect rgb(255, 245, 230)
        note over PM: finally block
        PM->>Q: empty()?
        alt queue empty
            PM->>PM: agent_idle.publish(True)
        end
    end
    alt exception raised
        PM-->>TL: Exception propagates
        TL->>ME: emit_physical_insufficiency(exception)
        ME->>FO: physical_insufficiency(needed=-1, ...)
        TL->>TL: log + continue loop
    end
```
    id: Stable internal identifier, unique per page within a session.
    type: What kind of state this page carries (see :class:`PageType`).
    provenance: Human-readable origin string, e.g. ``"user_input"``,
        ``"tool:camera.capture"``, ``"system"``.
    turn_seq: Monotonic per-agent turn counter assigned at ingestion.
    ts: ``time.time()`` snapshot at ingestion.
Mutable dataclass with invariants enforced only at construction
Page is a plain (non-frozen) @dataclass, so callers can mutate fields like pinned_at_full, pinned, and min_fidelity after construction without __post_init__ re-running. The three mutually-consistent pin fields are currently always written together in page_table.rebalance_evidence_pins and mark_pinned_full, so the invariant holds today. Any future caller that sets only one of the three (e.g., page.pinned_at_full = True without updating pinned or min_fidelity) will silently violate the constraint. Consider either making Page frozen and mutating pin state via a method on PageTable, or adding a dedicated helper like Page.set_pinned_full() that enforces the invariant atomically.
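The helper the comment proposes could look roughly like this. The three field names are taken from the comment itself; everything else is a sketch, not the package's actual code:

```python
from dataclasses import dataclass
from enum import IntEnum


class Fidelity(IntEnum):
    POINTER = 0
    STRUCTURED = 1
    COMPRESSED = 2
    FULL = 3


@dataclass
class Page:
    pinned: bool = False
    pinned_at_full: bool = False
    min_fidelity: Fidelity = Fidelity.POINTER

    def set_pinned_full(self, value: bool) -> None:
        """Write the three mutually-consistent pin fields together,
        so no caller can leave them in a mixed state."""
        self.pinned = value
        self.pinned_at_full = value
        self.min_fidelity = Fidelity.FULL if value else Fidelity.POINTER
```

Routing all pin mutations through one method keeps the invariant local to ``Page`` without the churn of making the dataclass frozen.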
Description
Fixes CI issues in the agent context memory PR without changing the memory plan architecture.
Changes:
- ``agent_idle`` cleanup in a ``finally`` around memory ingest, assemble, and graph streaming.
- ``__init__.py``.

How to Test
Also verified direct ingestion of an empty ``AIMessage`` with ``tool_calls`` no longer raises the monotonic token estimate error.
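One plausible shape for the monotonic token clamping mentioned in the review summary. The function name and rung keys here are illustrative; the PR only states that the empty-``AIMessage``-with-``tool_calls`` case no longer raises:

```python
def clamp_monotonic(estimates: dict[str, int]) -> dict[str, int]:
    """Force rung token estimates to be non-decreasing from POINTER to
    FULL by clamping each cheaper rung from above by the next rung up.
    An all-equal profile (e.g. tool_calls cost-counted uniformly across
    rungs) passes through unchanged instead of raising."""
    clamped = dict(estimates)
    for upper, lower in [
        ("full", "compressed"),
        ("compressed", "structured"),
        ("structured", "pointer"),
    ]:
        clamped[lower] = min(clamped[lower], clamped[upper])
    return clamped
```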
Refs #1899