Feature/agent compaction 1899 #1923
Conversation
- Add `max_history_tokens` to `AgentConfig` and `McpClientConfig` to allow bounding the LLM context window.
- Implement in-memory conversation pruning using LangChain's `trim_messages` before running the state graph.
- Configure trimming to preserve the system prompt (`include_system=True`) and strictly maintain matched ToolCall/ToolMessage pairs (`allow_partial=False`) to prevent API `BadRequestError`s.
- Add a custom `_estimate_tokens` heuristic to gracefully estimate token counts for both text and multimodal/JSON tool artifacts without requiring an instantiated model tokenizer.

Co-authored-by: Copilot <copilot@github.com>
- Extract `_estimate_tokens` from the `Agent` loop into `dimos/agents/utils.py` to allow isolated unit testing.
- Create `dimos/agents/test_compaction.py`, using pytest to verify boundaries against real LangChain structured messages.
- Assert that `allow_partial=False` safely drops an `AIMessage` containing tool calls together with its subsequent `ToolMessage` artifacts, so orphaned tool calls cannot crash execution.
- Assert that `include_system=True` retains the robot system prompt despite aggressive `max_history_tokens` pruning.

Co-authored-by: Copilot <copilot@github.com>
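A self-contained sketch of the pairing invariant those tests assert (plain dicts stand in for the real `AIMessage`/`ToolMessage` classes, and the helper name is hypothetical):

```python
def assert_no_orphan_tools(history):
    # Pairing invariant from the tests above: every "tool" message must follow
    # an "ai" message whose tool_call ids include its own id.
    pending = set()
    for m in history:
        if m["role"] == "ai":
            pending = set(m.get("tool_call_ids", []))
        elif m["role"] == "tool":
            assert m["id"] in pending, f"orphan ToolMessage {m['id']}"

# A correctly trimmed history keeps (or drops) the ToolCall/ToolMessage pair together:
assert_no_orphan_tools([
    {"role": "system"},
    {"role": "ai", "tool_call_ids": ["call_1"]},
    {"role": "tool", "id": "call_1"},
    {"role": "human"},
])
```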
- Created `bin/test-compaction` bash script to automatically load-test agent history truncation under tight token limits.
- The script dynamically lowers `max_history_tokens` via `sed`, spawns a daemon simulation, and hammers the agent with spam text to force context limits.
error: Distribution `pyrealsense2==2.56.5.9235 @ registry+https://pypi.org/simple` doesn't have a source distribution or wheel for the current platform i.e. (`macosx_26_0_arm64`)
Co-authored-by: Copilot <copilot@github.com>
Greptile Summary

This PR adds configurable history compaction to both `Agent` and `McpClient`.
Confidence Score: 3/5

Core compaction logic is sound, but the E2E validation script is broken on Linux/CI, leaving the only automated integration test non-functional on the recommended test platform. One P1 (platform-incompatible `sed` command in the E2E test script) brings the ceiling to 4; the additional P2s (import ordering in `utils.py`, unconditional copy in the disabled path) pull the score down one more notch to 3. `bin/test-compaction` needs a cross-platform `sed` fix; `dimos/agents/utils.py` needs import reorganization.

Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Q as MessageQueue
    participant P as _process_message
    participant TM as trim_messages
    participant SG as state_graph.stream
    Q->>P: message (HumanMessage)
    P->>P: _history.append(message)
    alt max_history_tokens is set
        P->>TM: trim_messages(_history, max_tokens, strategy=last, include_system=True, allow_partial=False)
        TM-->>P: trimmed_history (list)
        P->>P: _history = trimmed_history.copy()
    else max_history_tokens is None
        P->>P: _history = _history.copy() (no-op copy)
    end
    P->>SG: stream({messages: _history})
    SG-->>P: node_output messages (AI/tool responses)
    P->>P: _history.append(each response msg)
    Note over P,SG: LLM responses appended without compaction until next _process_message call
```
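In code form, the loop in the diagram looks roughly like this (a minimal sketch: `trim_messages_stub` is a simplified stand-in for LangChain's `trim_messages` with `strategy="last"`, `include_system=True`, `allow_partial=False`, and plain strings stand in for message objects):

```python
def trim_messages_stub(history, max_tokens, counter):
    # Keep the system message (index 0) plus the longest suffix that fits the budget.
    system, rest = history[:1], history[1:]
    budget = max_tokens - counter(system)
    kept = []
    for msg in reversed(rest):
        if counter(kept + [msg]) > budget:
            break
        kept.insert(0, msg)
    return system + kept

def process_message(history, message, max_history_tokens, counter):
    history.append(message)
    if max_history_tokens is not None:
        history = list(trim_messages_stub(history, max_history_tokens, counter))
    return history  # then handed to state_graph.stream({"messages": history})

# ~4 chars per token plus fixed per-message padding, mirroring the PR's heuristic
counter = lambda msgs: sum(len(m) // 4 + 10 for m in msgs)
hist = ["SYSTEM: robot base prompt"]
for text in ["a" * 200, "b" * 200, "c" * 200]:
    hist = process_message(hist, text, max_history_tokens=100, counter=counter)
# hist now holds the system prompt plus only the newest message
```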
Reviews (1): Last reviewed commit: "solved error with ctransformers[cuda]==0..."
```bash
# 1. Temporarily patch the agent configuration to a very low 500-token limit
echo "Temporarily patching AgentConfig to max_history_tokens=500..."
# macOS requires -i '' for sed
sed -i '' 's/max_history_tokens: int | None = None/max_history_tokens: int | None = 500/g' dimos/agents/agent.py
```
`sed -i ''` is macOS-only and breaks on Linux/CI

`sed -i ''` is BSD sed syntax. On GNU sed (Linux), the optional suffix argument to `-i` must be directly concatenated (e.g. `-i.bak`); passing it as a separate token causes GNU sed to treat the empty string `''` as the script and `s/...` as a filename — the source file is never modified, and the cleanup trap reverts nothing useful. The PR description specifically recommends running this E2E script on Linux/CI, so the script will silently produce a no-op on that platform.
Use an in-line conditional for portability:
```bash
if [[ "$OSTYPE" == "darwin"* ]]; then
  sed -i '' 's/max_history_tokens: int | None = None/max_history_tokens: int | None = 500/g' dimos/agents/agent.py
else
  sed -i 's/max_history_tokens: int | None = None/max_history_tokens: int | None = 500/g' dimos/agents/agent.py
fi
```

```python
import json


def estimate_tokens(msgs: list[BaseMessage]) -> int:
    """Safely estimates token counts for agent history compaction."""
    count = 0
    for m in msgs:
        content_str = json.dumps(m.content) if not isinstance(m.content, str) else m.content
        count += len(content_str) // 4 + 10
        if getattr(m, "tool_calls", None):
            count += 50 * len(m.tool_calls)  # type: ignore
    return count
```
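For intuition, a self-contained re-run of the heuristic's arithmetic (`Msg` and `rough_tokens` are illustrative stand-ins, not the PR's classes):

```python
import json
from dataclasses import dataclass, field

@dataclass
class Msg:  # plain stand-in for langchain_core's BaseMessage
    content: object
    tool_calls: list = field(default_factory=list)

def rough_tokens(msgs):
    # Same arithmetic as estimate_tokens above: ~4 chars per token,
    # +10 padding per message, +50 per tool call.
    count = 0
    for m in msgs:
        s = m.content if isinstance(m.content, str) else json.dumps(m.content)
        count += len(s) // 4 + 10
        count += 50 * len(m.tool_calls)
    return count

text_only = Msg("hello world")                      # 11 chars -> 11 // 4 + 10 = 12
tool_msg = Msg([{"type": "image", "data": "..."}],  # non-str content is JSON-dumped
               tool_calls=[{"name": "move"}])       # each tool call adds +50
```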
Function definition inserted between module-level imports

`import json` and `estimate_tokens` are placed after the `BaseMessage` import but before `setup_logger` and the rest of the module. PEP 8 requires all imports to be grouped at the top, followed by module-level code. Having a function definition in the middle of the import block makes the module harder to scan and can surprise static-analysis tools. Move `import json` to the top import group and relocate `estimate_tokens` below the `setup_logger` / constant declarations.
```python
        else:
            trimmed_history = self._history

        # We replace the internal history with the pruned one so it doesn't grow indefinitely in RAM
        self._history = trimmed_history.copy()
```
Unconditional `list.copy()` wastes memory when compaction is disabled
When `max_history_tokens` is `None`, `trimmed_history` is just an alias for `self._history`, so `self._history = trimmed_history.copy()` creates a full copy of the entire message list on every single call — a cost that grows as the session ages. The same pattern appears in `mcp_client.py`. The copy is only meaningful when `trim_messages` returns a new list; in the `else` branch it is a no-op allocation.
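A minimal sketch of the suggested fix, in function form for illustration (the names and the toy `trim` lambda are assumptions standing in for the real `trim_messages` call):

```python
def compact_history(history, max_history_tokens, trim):
    # Disabled path: return the same list object, avoiding a per-call O(n) copy.
    if max_history_tokens is None:
        return history
    # Enabled path: the trimmer returns a new list, so no extra copy is needed.
    return list(trim(history, max_history_tokens))

trim = lambda h, n: h[-n:]  # toy trimmer standing in for trim_messages
h = ["sys", "m1", "m2"]
assert compact_history(h, None, trim) is h          # no allocation when disabled
assert compact_history(h, 2, trim) == ["m1", "m2"]  # fresh trimmed list when enabled
```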
Problem

Long-running agent sessions accumulate extensive message history in `self._history`. Without limits, this eventually exceeds the context window of the underlying LLMs (e.g., GPT-4o, Claude), leading to API errors (`TokenLimitExceeded`/`BadRequest`). We need a mechanism to compact the history automatically based on model limits while preserving vital system context and avoiding orphaned tool calls.

Closes #1899
Closes DIM-807
Solution

- Add `max_history_tokens` to `AgentConfig` and `McpClientConfig` to allow configurable history caps.
- Apply `trim_messages` inside `_process_message` across both `Agent` and `McpClient`.
- Document the behavior in `docs/agents/history_compaction.md`.

Key design decisions / tradeoffs:

- `allow_partial=False` ensures we never split a `ToolCall` from its corresponding `ToolMessage`. This prevents HTTP 400 errors from strict LLM APIs.
- `include_system=True` protects the robot's base system prompt from eviction.
- A custom heuristic (`dimos.agents.utils.estimate_tokens`) is used instead of blindly passing the model string to `trim_messages` (which crashes). It manually handles JSON dumping of complex artifacts and artificially pads `tool_call` entries with a +50 bloat token factor to safely overestimate limits.

Breaking Changes

None
How to Test

Unit Tests:

Run the new compaction suite to verify the `estimate_tokens` logic and the LangChain array trimmers:

E2E Integration:

Run the automated testing bash script:

- Lowers `max_history_tokens=500` via `sed`.
- Spawns the `unitree-go2-agentic` daemon.
- Sends spam text (`dimos agent-send`) to instantly breach the context window.
- Verifies that `trim_messages` successfully dropped the oldest messages cleanly.

Note on macOS local testing: E2E daemon testing (`dimos --simulation`) on macOS natively requires extensive manual system network tuning (`net.inet.udp.recvspace`/`maxdgram`) and `mjpython` Homebrew symlinking. It is recommended to run the E2E script on Linux/CI environments, where the MuJoCo simulation spins up cleanly.

Contributor License Agreement