
Feature/agent compaction 1899 #1923

Open
jeevanbhatta wants to merge 5 commits into dimensionalOS:main from
jeevanbhatta:feature/agent-compaction-1899

Conversation

@jeevanbhatta

Problem

Long-running agent sessions accumulate extensive message history in self._history. Without limits, this eventually exceeds the context window of the underlying LLM (e.g., GPT-4o, Claude), causing API errors (TokenLimitExceeded/BadRequest). We need a mechanism that compacts the history automatically based on model limits while preserving vital system context and avoiding orphaned tool calls.

Closes #1899
Closes DIM-807

Solution

  • Added max_history_tokens to AgentConfig and McpClientConfig to allow configurable history caps.
  • Integrated LangChain's trim_messages inside _process_message across both Agent and McpClient.
  • Documented the E2E testing methodology in docs/agents/history_compaction.md.

Key design decisions / tradeoffs:

  • Enforced allow_partial=False to ensure we never split a ToolCall from its corresponding ToolMessage. This prevents HTTP 400 errors from strict LLM APIs.
  • Enforced include_system=True to protect the robot's base system prompt from eviction.
  • Created a custom heuristic token estimator (dimos.agents.utils.estimate_tokens) instead of passing the model string to trim_messages (which crashes). The estimator handles JSON-dumping of complex artifacts and pads each tool_call entry with an extra 50 tokens to safely overestimate usage.
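How the two flags interact can be sketched in pure Python (plain dicts stand in for LangChain's message classes; the names and the ~4-chars-per-token heuristic mirror the PR description, but this is illustrative, not the shipped implementation):

```python
def estimate(msg: dict) -> int:
    # ~4 chars per token, +10 per-message overhead, +50 per tool call
    n = len(msg.get("content", "")) // 4 + 10
    return n + 50 * len(msg.get("tool_calls", []))

def trim_history(history: list[dict], max_tokens: int) -> list[dict]:
    # include_system=True analogue: the leading system message is always kept
    system = [m for m in history[:1] if m["role"] == "system"]
    rest = history[len(system):]

    # allow_partial=False analogue: each tool result is grouped with the AI
    # message that issued the call, so the pair is kept or dropped atomically
    units, i = [], 0
    while i < len(rest):
        unit = [rest[i]]
        i += 1
        while i < len(rest) and rest[i]["role"] == "tool":
            unit.append(rest[i])
            i += 1
        units.append(unit)

    # strategy="last" analogue: keep the newest units that fit the budget
    budget = max_tokens - sum(estimate(m) for m in system)
    kept = []
    for unit in reversed(units):
        cost = sum(estimate(m) for m in unit)
        if cost > budget:
            break
        kept.append(unit)
        budget -= cost
    kept.reverse()
    return system + [m for u in kept for m in u]
```

Under this sketch, an aggressive limit evicts the oldest human turn while keeping the system prompt and the full tool-call/tool-result pair intact.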

Breaking Changes

None

How to Test

Unit Tests:
Run the new compaction suite to verify the estimate_tokens logic and the trim_messages integration:

uv run pytest dimos/agents/test_compaction.py -v

E2E Integration:
Run the automated testing bash script:

./bin/test-compaction
  1. Lowers max_history_tokens to 500 via sed.
  2. Starts the unitree-go2-agentic daemon.
  3. Hammers the agent with 1200+ padded words over LCM (dimos agent-send) to breach the context window.
  4. Submits a final CLI tool call to verify that the LLM accepts the request and that trim_messages cleanly dropped the oldest messages.

Note on macOS local testing: E2E daemon testing (dimos --simulation) on macOS requires manual network tuning (net.inet.udp.recvspace/maxdgram) and mjpython Homebrew symlinking. Run the E2E script on Linux/CI environments, where the MuJoCo simulation spins up cleanly.

Contributor License Agreement

  • I have read and approved the CLA.

jeevanbhatta and others added 5 commits April 23, 2026 14:20
- Add `max_history_tokens` to `AgentConfig` and `McpClientConfig` to allow bounding the LLM context window.
- Implement in-memory conversation pruning using LangChain's `trim_messages` before running the state graph.
- Configure trimming to preserve the system prompt (`include_system=True`) and strictly maintain matched ToolCall/ToolMessage pairs (`allow_partial=False`) to prevent API `BadRequestError`s.
- Add a custom `_estimate_tokens` heuristic to gracefully estimate token counts for both text and multimodal/JSON tool artifacts without requiring an instantiated model tokenizer.

Co-authored-by: Copilot <copilot@github.com>
- Extract `_estimate_tokens` from `Agent` loop into `dimos/agents/utils.py` to allow isolated unit testing.
- Create `dimos/agents/test_compaction.py` utilizing pytest to verify boundaries against real LangChain structured messages.
- Assert `allow_partial=False` drops an `AIMessage` containing tool calls together with its subsequent `ToolMessage` artifacts, so orphaned tool calls never reach the API.
- Assert `include_system=True` retains robot system prompt despite aggressive `max_history_tokens` pruning heuristics.

Co-authored-by: Copilot <copilot@github.com>
- Created `bin/test-compaction` bash script to automatically load-test the agent history truncation under heavy token limits.
- Script dynamically lowers `max_history_tokens` via sed, spawns a daemon simulation, and hammers the agent with spam text to force context limits.
error: Distribution `pyrealsense2==2.56.5.9235 @ registry+https://pypi.org/simple` doesn't have a source distribution or wheel for the current platform (i.e. `macosx_26_0_arm64`)
@greptile-apps
Contributor

greptile-apps Bot commented Apr 28, 2026

Greptile Summary

This PR adds configurable history compaction to both Agent and McpClient by integrating LangChain's trim_messages inside _process_message, guarded by a new max_history_tokens field in both config dataclasses. A custom estimate_tokens heuristic handles complex content and pads tool-call entries to avoid split tool-call/tool-result pairs.

  • P1 — E2E test script broken on Linux/CI: bin/test-compaction uses sed -i '' (BSD/macOS syntax); GNU sed on Linux treats the empty-string argument as the script and never patches the source file, so the test silently does nothing on the platform the PR description recommends for running it.

Confidence Score: 3/5

Core compaction logic is sound but the E2E validation script is broken on Linux/CI, leaving the only automated integration test non-functional on the recommended test platform.

One P1 (platform-incompatible sed command in the E2E test script) brings the ceiling to 4; the additional P2s (import ordering in utils.py, unconditional copy in the disabled path) pull the score down one more notch to 3.

bin/test-compaction needs a cross-platform sed fix; dimos/agents/utils.py needs import reorganization.

Important Files Changed

| Filename | Overview |
| --- | --- |
| bin/test-compaction | New E2E test script that uses macOS-only `sed -i ''` syntax, breaking on Linux/CI where the PR description recommends running it. |
| dimos/agents/agent.py | Adds `max_history_tokens` to `AgentConfig` and integrates `trim_messages` in `_process_message`; minor inefficiency with an unconditional `.copy()` in the disabled path. |
| dimos/agents/mcp/mcp_client.py | Mirror of agent.py compaction changes for the MCP client; same unconditional `.copy()` inefficiency in the else branch. |
| dimos/agents/utils.py | Adds the `estimate_tokens` heuristic; `import json` and the function body are inserted mid-file between import groups, violating PEP 8 ordering. |
| dimos/agents/test_compaction.py | New unit tests for `estimate_tokens` and the `trim_messages` integration; token arithmetic in comments and assertions is consistent and correct. |
| pyproject.toml | Adds platform guards for `pyrealsense2` and `ctransformers[cuda]` to exclude macOS; unrelated to compaction but improves cross-platform packaging. |

Sequence Diagram

sequenceDiagram
    participant Q as MessageQueue
    participant P as _process_message
    participant TM as trim_messages
    participant SG as state_graph.stream

    Q->>P: message (HumanMessage)
    P->>P: _history.append(message)
    alt max_history_tokens is set
        P->>TM: trim_messages(_history, max_tokens, strategy=last, include_system=True, allow_partial=False)
        TM-->>P: trimmed_history (list)
        P->>P: _history = trimmed_history.copy()
    else max_history_tokens is None
        P->>P: _history = _history.copy() (no-op copy)
    end
    P->>SG: stream({messages: _history})
    SG-->>P: node_output messages (AI/tool responses)
    P->>P: _history.append(each response msg)
    Note over P,SG: LLM responses appended without compaction until next _process_message call

Last reviewed commit: "solved error with ctransformers[cuda]==0..."

Comment thread bin/test-compaction
# 1. Temporarily patch the agent configuration to a very low 500-token limit
echo "Temporarily patching AgentConfig to max_history_tokens=500..."
# macOS requires -i '' for sed
sed -i '' 's/max_history_tokens: int | None = None/max_history_tokens: int | None = 500/g' dimos/agents/agent.py

P1 sed -i '' is macOS-only and breaks on Linux/CI

sed -i '' is BSD sed syntax. On GNU sed (Linux), the optional suffix argument to -i must be directly concatenated (e.g. -i.bak); passing it as a separate token causes GNU sed to treat the empty string '' as the script and 's/...' as a filename — the source file is never modified and the cleanup trap reverts nothing useful. The PR description specifically recommends running this E2E script on Linux/CI, so the script will silently produce a no-op on that platform.

Use an in-line conditional for portability:

if [[ "$OSTYPE" == "darwin"* ]]; then
    sed -i '' 's/max_history_tokens: int | None = None/max_history_tokens: int | None = 500/g' dimos/agents/agent.py
else
    sed -i 's/max_history_tokens: int | None = None/max_history_tokens: int | None = 500/g' dimos/agents/agent.py
fi
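An alternative that avoids the branch entirely: both GNU and BSD sed accept a backup suffix glued directly to the flag (`-i.bak`). A scratch-file demo (the scratch path is hypothetical; the same one-liner applies to dimos/agents/agent.py):

```shell
# Create a scratch file containing the line the script patches
printf 'max_history_tokens: int | None = None\n' > /tmp/agent_demo.py
# -i.bak (suffix attached to the flag) works under both GNU and BSD sed
sed -i.bak 's/max_history_tokens: int | None = None/max_history_tokens: int | None = 500/g' /tmp/agent_demo.py
cat /tmp/agent_demo.py   # prints: max_history_tokens: int | None = 500
rm /tmp/agent_demo.py.bak
```

The cleanup trap can then restore from the `.bak` copy instead of re-running sed in reverse.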

Comment thread dimos/agents/utils.py
Comment on lines +19 to +29
import json

def estimate_tokens(msgs: list[BaseMessage]) -> int:
    """Safely estimates token counts for agent history compaction."""
    count = 0
    for m in msgs:
        content_str = json.dumps(m.content) if not isinstance(m.content, str) else m.content
        count += len(content_str) // 4 + 10
        if getattr(m, "tool_calls", None):
            count += 50 * len(m.tool_calls)  # type: ignore
    return count

P2 Function definition inserted between module-level imports

import json and estimate_tokens are placed after the BaseMessage import but before setup_logger and the rest of the module. PEP 8 requires all imports to be grouped at the top, followed by module-level code. Having a function definition in the middle of the import block makes the module harder to scan and can surprise static analysis tools. Move import json to the top import group and relocate estimate_tokens below the setup_logger / constant declarations.

Comment thread dimos/agents/agent.py
Comment on lines +155 to +159
else:
    trimmed_history = self._history

# We replace the internal history with the pruned one so it doesn't grow indefinitely in RAM
self._history = trimmed_history.copy()

P2 Unconditional list.copy() wastes memory when compaction is disabled

When max_history_tokens is None, trimmed_history is just an alias for self._history, so self._history = trimmed_history.copy() creates a full copy of the entire message list on every single call — growing cost as sessions age. The same pattern appears in mcp_client.py. The copy is only meaningful when trim_messages returns a new list; in the else branch it is a no-op allocation.
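The suggested shape of the fix can be sketched as follows (names are hypothetical stand-ins, not the PR's actual code): copy only when trimming is enabled, otherwise return the existing list object unchanged.

```python
def apply_compaction(history, max_tokens, trim):
    """Prune history only when a limit is set; avoid the no-op copy otherwise."""
    if max_tokens is None:
        return history                      # alias: nothing was trimmed
    return list(trim(history, max_tokens))  # materialize only the pruned result

# The disabled path now allocates nothing, while the enabled path still
# detaches the pruned list from whatever trim() returned.
h = [1, 2, 3]
assert apply_compaction(h, None, None) is h
assert apply_compaction(h, 2, lambda hs, n: hs[-n:]) == [2, 3]
```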

