Feature/agent compaction 1899 #1923
Conversation
- Add `max_history_tokens` to `AgentConfig` and `McpClientConfig` to allow bounding the LLM context window.
- Implement in-memory conversation pruning using LangChain's `trim_messages` before running the state graph.
- Configure trimming to preserve the system prompt (`include_system=True`) and strictly maintain matched ToolCall/ToolMessage pairs (`allow_partial=False`) to prevent API `BadRequestError`s.
- Add a custom `_estimate_tokens` heuristic to gracefully estimate token counts for both text and multimodal/JSON tool artifacts without requiring an instantiated model tokenizer.

Co-authored-by: Copilot <copilot@github.com>
- Extract `_estimate_tokens` from the `Agent` loop into `dimos/agents/utils.py` to allow isolated unit testing.
- Create `dimos/agents/test_compaction.py`, using pytest to verify boundaries against real LangChain structured messages.
- Assert that `allow_partial=False` safely drops an `AIMessage` containing tool calls together with its subsequent `ToolMessage` artifacts, so orphaned tool calls cannot crash execution.
- Assert that `include_system=True` retains the robot system prompt despite aggressive `max_history_tokens` pruning.

Co-authored-by: Copilot <copilot@github.com>
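A self-contained sketch of the pairing invariant those tests assert (plain dicts stand in for the real `AIMessage`/`ToolMessage` classes, and the helper name is hypothetical):

```python
def assert_no_orphan_tools(history):
    # Pairing invariant from the tests above: every "tool" message must follow
    # an "ai" message whose tool_call ids include its own id.
    pending = set()
    for m in history:
        if m["role"] == "ai":
            pending = set(m.get("tool_call_ids", []))
        elif m["role"] == "tool":
            assert m["id"] in pending, f"orphan ToolMessage {m['id']}"

# A correctly trimmed history keeps (or drops) the ToolCall/ToolMessage pair together:
assert_no_orphan_tools([
    {"role": "system"},
    {"role": "ai", "tool_call_ids": ["call_1"]},
    {"role": "tool", "id": "call_1"},
    {"role": "human"},
])
```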
- Created `bin/test-compaction` bash script to automatically load-test agent history truncation under tight token limits.
- The script dynamically lowers `max_history_tokens` via `sed`, spawns a daemon simulation, and hammers the agent with spam text to force context limits.
error: Distribution `pyrealsense2==2.56.5.9235 @ registry+https://pypi.org/simple` doesn't have a source distribution or wheel for the current platform i.e. (`macosx_26_0_arm64`)
Co-authored-by: Copilot <copilot@github.com>
Greptile Summary

This PR adds configurable history compaction to both `Agent` and `McpClient`.
Confidence Score: 3/5

Core compaction logic is sound, but the E2E validation script is broken on Linux/CI, leaving the only automated integration test non-functional on the recommended test platform. One P1 (platform-incompatible `sed` command in the E2E test script) brings the ceiling to 4; the additional P2s (import ordering in `utils.py`, unconditional copy in the disabled path) pull the score down one more notch to 3. `bin/test-compaction` needs a cross-platform `sed` fix; `dimos/agents/utils.py` needs import reorganization.

Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Q as MessageQueue
    participant P as _process_message
    participant TM as trim_messages
    participant SG as state_graph.stream
    Q->>P: message (HumanMessage)
    P->>P: _history.append(message)
    alt max_history_tokens is set
        P->>TM: trim_messages(_history, max_tokens, strategy=last, include_system=True, allow_partial=False)
        TM-->>P: trimmed_history (list)
        P->>P: _history = trimmed_history.copy()
    else max_history_tokens is None
        P->>P: _history = _history.copy() (no-op copy)
    end
    P->>SG: stream({messages: _history})
    SG-->>P: node_output messages (AI/tool responses)
    P->>P: _history.append(each response msg)
    Note over P,SG: LLM responses appended without compaction until next _process_message call
```
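In code form, the loop in the diagram looks roughly like this (a minimal sketch: `trim_messages_stub` is a simplified stand-in for LangChain's `trim_messages` with `strategy="last"`, `include_system=True`, `allow_partial=False`, and plain strings stand in for message objects):

```python
def trim_messages_stub(history, max_tokens, counter):
    # Keep the system message (index 0) plus the longest suffix that fits the budget.
    system, rest = history[:1], history[1:]
    budget = max_tokens - counter(system)
    kept = []
    for msg in reversed(rest):
        if counter(kept + [msg]) > budget:
            break
        kept.insert(0, msg)
    return system + kept

def process_message(history, message, max_history_tokens, counter):
    history.append(message)
    if max_history_tokens is not None:
        history = list(trim_messages_stub(history, max_history_tokens, counter))
    return history  # then handed to state_graph.stream({"messages": history})

# ~4 chars per token plus fixed per-message padding, mirroring the PR's heuristic
counter = lambda msgs: sum(len(m) // 4 + 10 for m in msgs)
hist = ["SYSTEM: robot base prompt"]
for text in ["a" * 200, "b" * 200, "c" * 200]:
    hist = process_message(hist, text, max_history_tokens=100, counter=counter)
# hist now holds the system prompt plus only the newest message
```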
Reviews (1): Last reviewed commit: "solved error with ctransformers[cuda]==0..."
```bash
# 1. Temporarily patch the agent configuration to a very low 500-token limit
echo "Temporarily patching AgentConfig to max_history_tokens=500..."
# macOS requires -i '' for sed
sed -i '' 's/max_history_tokens: int | None = None/max_history_tokens: int | None = 500/g' dimos/agents/agent.py
```
`sed -i ''` is macOS-only and breaks on Linux/CI

`sed -i ''` is BSD sed syntax. On GNU sed (Linux), the optional suffix argument to `-i` must be directly concatenated (e.g. `-i.bak`); passing it as a separate token causes GNU sed to treat the empty string `''` as the script and `s/...` as a filename — the source file is never modified, and the cleanup trap reverts nothing useful. The PR description specifically recommends running this E2E script on Linux/CI, so the script will silently produce a no-op on that platform.
Use an in-line conditional for portability:
```bash
if [[ "$OSTYPE" == "darwin"* ]]; then
  sed -i '' 's/max_history_tokens: int | None = None/max_history_tokens: int | None = 500/g' dimos/agents/agent.py
else
  sed -i 's/max_history_tokens: int | None = None/max_history_tokens: int | None = 500/g' dimos/agents/agent.py
fi
```

```python
import json


def estimate_tokens(msgs: list[BaseMessage]) -> int:
    """Safely estimates token counts for agent history compaction."""
    count = 0
    for m in msgs:
        content_str = json.dumps(m.content) if not isinstance(m.content, str) else m.content
        count += len(content_str) // 4 + 10
        if getattr(m, "tool_calls", None):
            count += 50 * len(m.tool_calls)  # type: ignore
    return count
```
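For intuition, a self-contained re-run of the heuristic's arithmetic (`Msg` and `rough_tokens` are illustrative stand-ins, not the PR's classes):

```python
import json
from dataclasses import dataclass, field

@dataclass
class Msg:  # plain stand-in for langchain_core's BaseMessage
    content: object
    tool_calls: list = field(default_factory=list)

def rough_tokens(msgs):
    # Same arithmetic as estimate_tokens above: ~4 chars per token,
    # +10 padding per message, +50 per tool call.
    count = 0
    for m in msgs:
        s = m.content if isinstance(m.content, str) else json.dumps(m.content)
        count += len(s) // 4 + 10
        count += 50 * len(m.tool_calls)
    return count

text_only = Msg("hello world")                      # 11 chars -> 11 // 4 + 10 = 12
tool_msg = Msg([{"type": "image", "data": "..."}],  # non-str content is JSON-dumped
               tool_calls=[{"name": "move"}])       # each tool call adds +50
```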
Function definition inserted between module-level imports

`import json` and `estimate_tokens` are placed after the `BaseMessage` import but before `setup_logger` and the rest of the module. PEP 8 requires all imports to be grouped at the top, followed by module-level code. Having a function definition in the middle of the import block makes the module harder to scan and can surprise static-analysis tools. Move `import json` to the top import group and relocate `estimate_tokens` below the `setup_logger` / constant declarations.
```python
        else:
            trimmed_history = self._history

        # We replace the internal history with the pruned one so it doesn't grow indefinitely in RAM
        self._history = trimmed_history.copy()
```
Unconditional `list.copy()` wastes memory when compaction is disabled
When `max_history_tokens` is `None`, `trimmed_history` is just an alias for `self._history`, so `self._history = trimmed_history.copy()` creates a full copy of the entire message list on every single call — a cost that grows as the session ages. The same pattern appears in `mcp_client.py`. The copy is only meaningful when `trim_messages` returns a new list; in the `else` branch it is a no-op allocation.
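A minimal sketch of the suggested fix, in function form for illustration (the names and the toy `trim` lambda are assumptions standing in for the real `trim_messages` call):

```python
def compact_history(history, max_history_tokens, trim):
    # Disabled path: return the same list object, avoiding a per-call O(n) copy.
    if max_history_tokens is None:
        return history
    # Enabled path: the trimmer returns a new list, so no extra copy is needed.
    return list(trim(history, max_history_tokens))

trim = lambda h, n: h[-n:]  # toy trimmer standing in for trim_messages
h = ["sys", "m1", "m2"]
assert compact_history(h, None, trim) is h          # no allocation when disabled
assert compact_history(h, 2, trim) == ["m1", "m2"]  # fresh trimmed list when enabled
```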
Problem

Long-running agent sessions accumulate extensive message history in `self._history`. Without limits, this eventually exceeds the context window of the underlying LLMs (e.g., GPT-4o, Claude), leading to API errors (`TokenLimitExceeded`/`BadRequest`). We need a mechanism to compact the history automatically based on model limits while preserving vital system context and avoiding orphaned tool calls.

Closes #1899
Closes DIM-807
Solution

- Add `max_history_tokens` to `AgentConfig` and `McpClientConfig` to allow configurable history caps.
- Apply `trim_messages` inside `_process_message` across both `Agent` and `McpClient`.
- Document the behavior in `docs/agents/history_compaction.md`.

Key design decisions / tradeoffs:

- `allow_partial=False` ensures we never split a `ToolCall` from its corresponding `ToolMessage`. This prevents HTTP 400 errors from strict LLM APIs.
- `include_system=True` protects the robot's base system prompt from eviction.
- A custom heuristic (`dimos.agents.utils.estimate_tokens`) is used instead of blindly passing the model string to `trim_messages` (which crashes). It manually handles JSON dumping of complex artifacts and artificially pads `tool_call` entries with a +50 bloat token factor to safely overestimate limits.

Breaking Changes

None
How to Test

Unit Tests:

Run the new compaction suite to verify the `estimate_tokens` logic and the LangChain array trimmers:

E2E Integration:

Run the automated testing bash script:

- Lowers `max_history_tokens=500` via `sed`.
- Spawns the `unitree-go2-agentic` daemon.
- Sends spam text (`dimos agent-send`) to instantly breach the context window.
- Verifies that `trim_messages` successfully dropped the oldest messages cleanly.

Note on macOS local testing: E2E daemon testing (`dimos --simulation`) on macOS natively requires extensive manual system network tuning (`net.inet.udp.recvspace`/`maxdgram`) and `mjpython` Homebrew symlinking. It is recommended to run the E2E script on Linux/CI environments, where the MuJoCo simulation spins up cleanly.

Contributor License Agreement