@codeflash-ai codeflash-ai bot commented Oct 17, 2025

📄 20% (0.20x) speedup for ContextManager._compress_messages in src/utils/context_manager.py

⏱️ Runtime : 1.29 milliseconds → 1.08 milliseconds (best of 109 runs)

📝 Explanation and details

Key optimizations:

  • Fast type checks with `type()` instead of `isinstance()` where single-class dispatch suffices.
  • Direct attribute access via `getattr` reduces repeated attribute-lookup overhead.
  • Suffix message building uses `insert(0, ...)` instead of repeated list concatenation, minimizing temporary lists for large message histories.
  • `_truncate_message_content` checks for the attribute before slicing, which avoids wasted copy operations when none are needed.
  • Tightened loop bounds in `_compress_messages` to reduce unnecessary slicing.
  • No change to behavioral output, naming, exception preservation, types, or code style.
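The suffix-building point above can be illustrated with a minimal sketch. The names (`build_suffix`, `token_budget`, `cost`) are illustrative only, not the actual code in `src/utils/context_manager.py`:

```python
def build_suffix(messages, token_budget, cost):
    """Walk backwards from the newest message, prepending each one
    that still fits into the remaining token budget."""
    suffix = []
    remaining = token_budget
    for msg in reversed(messages):
        c = cost(msg)
        if c > remaining:
            break
        # insert(0, ...) keeps chronological order without allocating
        # a fresh list on every iteration (as `[msg] + suffix` would)
        suffix.insert(0, msg)
        remaining -= c
    return suffix


# With a budget of 5 and per-message cost equal to the value itself,
# only the newest message fits:
# build_suffix([1, 2, 3, 4], 5, lambda m: m) -> [4]
```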

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 94 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 6 Passed |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import copy
from typing import List

# imports
import pytest
from langchain_core.messages import (AIMessage, BaseMessage, HumanMessage,
                                     SystemMessage, ToolMessage)
from src.utils.context_manager import ContextManager

# ----------- BASIC TEST CASES ------------

def test_empty_message_list():
    """Test that compressing an empty message list returns an empty list."""
    cm = ContextManager(token_limit=10)
    codeflash_output = cm._compress_messages([]) # 1.87μs -> 1.75μs (7.03% faster)
    assert codeflash_output == []

def test_single_message_under_limit():
    """Test single message fits within token limit."""
    msg = HumanMessage(content="hello world")
    cm = ContextManager(token_limit=100)
    codeflash_output = cm._compress_messages([msg]); result = codeflash_output # 6.11μs -> 5.36μs (13.9% faster)

def test_single_message_over_limit():
    """Test single message gets truncated if over token limit."""
    long_content = "a" * 100
    msg = HumanMessage(content=long_content)
    cm = ContextManager(token_limit=5)
    codeflash_output = cm._compress_messages([msg]); result = codeflash_output # 23.5μs -> 22.9μs (2.43% faster)

def test_multiple_messages_all_fit():
    """Test multiple messages all fit within token limit."""
    msgs = [
        HumanMessage(content="hello"),
        AIMessage(content="hi"),
        HumanMessage(content="how are you?"),
        AIMessage(content="fine, thanks!"),
    ]
    cm = ContextManager(token_limit=100)
    codeflash_output = cm._compress_messages(msgs); result = codeflash_output # 11.1μs -> 9.71μs (13.9% faster)

def test_multiple_messages_some_truncated():
    """Test that older messages are dropped/truncated to fit token limit."""
    msgs = [
        HumanMessage(content="a" * 20),
        AIMessage(content="b" * 20),
        HumanMessage(content="c" * 20),
        AIMessage(content="d" * 20),
    ]
    # Small token limit, only some messages will fit
    cm = ContextManager(token_limit=10)
    codeflash_output = cm._compress_messages(msgs); result = codeflash_output # 21.1μs -> 20.5μs (3.06% faster)

def test_preserve_prefix_messages():
    """Test that prefix messages are preserved if possible."""
    msgs = [
        SystemMessage(content="system prompt"),
        HumanMessage(content="hello"),
        AIMessage(content="hi"),
        HumanMessage(content="how are you?"),
        AIMessage(content="fine, thanks!"),
    ]
    cm = ContextManager(token_limit=100, preserve_prefix_message_count=2)
    codeflash_output = cm._compress_messages(msgs); result = codeflash_output # 11.9μs -> 11.2μs (6.78% faster)

def test_preserve_prefix_truncation():
    """Test that prefix messages are truncated if they don't fully fit."""
    msgs = [
        SystemMessage(content="system prompt"),
        HumanMessage(content="hello"),
        AIMessage(content="hi"),
    ]
    # Token limit only enough for part of the first message
    cm = ContextManager(token_limit=2, preserve_prefix_message_count=2)
    codeflash_output = cm._compress_messages(msgs); result = codeflash_output # 16.9μs -> 16.9μs (0.153% slower)

def test_preserve_prefix_and_tail():
    """Test that prefix is preserved and tail is filled up to limit."""
    msgs = [
        SystemMessage(content="sys"),
        HumanMessage(content="a"),
        AIMessage(content="b"),
        HumanMessage(content="c"),
    ]
    cm = ContextManager(token_limit=5, preserve_prefix_message_count=2)
    codeflash_output = cm._compress_messages(msgs); result = codeflash_output # 9.37μs -> 8.55μs (9.62% faster)

# ----------- EDGE TEST CASES ------------

def test_zero_token_limit():
    """Test with a token limit of zero (should return empty list)."""
    msgs = [
        HumanMessage(content="hello"),
        AIMessage(content="world"),
    ]
    cm = ContextManager(token_limit=0)
    codeflash_output = cm._compress_messages(msgs); result = codeflash_output # 5.04μs -> 4.60μs (9.36% faster)
    assert result == []

def test_negative_token_limit():
    """Test with a negative token limit (should return empty list)."""
    msgs = [
        HumanMessage(content="hello"),
        AIMessage(content="world"),
    ]
    cm = ContextManager(token_limit=-5)
    codeflash_output = cm._compress_messages(msgs); result = codeflash_output # 5.06μs -> 4.64μs (9.01% faster)
    assert result == []

def test_message_with_empty_content():
    """Test message with empty content string."""
    msg = HumanMessage(content="")
    cm = ContextManager(token_limit=10)
    codeflash_output = cm._compress_messages([msg]); result = codeflash_output # 4.52μs -> 3.99μs (13.5% faster)

def test_message_with_non_english_characters():
    """Test token counting with non-English (e.g., Chinese) characters."""
    msg = HumanMessage(content="你好世界")  # 4 Chinese chars, should be 4 tokens
    cm = ContextManager(token_limit=4)
    codeflash_output = cm._compress_messages([msg]); result = codeflash_output # 20.3μs -> 19.6μs (3.57% faster)
    # Now with a lower token limit, should truncate
    cm2 = ContextManager(token_limit=2)
    codeflash_output = cm2._compress_messages([msg]); result2 = codeflash_output # 12.1μs -> 11.6μs (4.39% faster)

def test_message_with_additional_kwargs():
    """Test token counting with additional_kwargs and tool_calls."""
    msg = AIMessage(content="result", additional_kwargs={"tool_calls": [{"foo": "bar"}]})
    cm = ContextManager(token_limit=100)
    codeflash_output = cm._compress_messages([msg]); result = codeflash_output # 9.77μs -> 9.02μs (8.34% faster)

def test_message_with_large_additional_kwargs():
    """Test token counting with large additional_kwargs."""
    large_kwargs = {"tool_calls": [{"foo": "bar" * 50}]}
    msg = AIMessage(content="result", additional_kwargs=large_kwargs)
    cm = ContextManager(token_limit=10)
    codeflash_output = cm._compress_messages([msg]); result = codeflash_output # 32.7μs -> 31.7μs (3.11% faster)



def test_preserve_prefix_more_than_messages():
    """Test preserve_prefix_message_count greater than number of messages."""
    msgs = [HumanMessage(content="a"), AIMessage(content="b")]
    cm = ContextManager(token_limit=100, preserve_prefix_message_count=5)
    codeflash_output = cm._compress_messages(msgs); result = codeflash_output # 9.50μs -> 8.34μs (13.9% faster)

def test_preserve_prefix_zero():
    """Test preserve_prefix_message_count=0 behaves correctly."""
    msgs = [HumanMessage(content="a"), AIMessage(content="b")]
    cm = ContextManager(token_limit=100, preserve_prefix_message_count=0)
    codeflash_output = cm._compress_messages(msgs); result = codeflash_output # 7.50μs -> 6.71μs (11.8% faster)

def test_message_exact_token_limit():
    """Test message that exactly matches the token limit."""
    # "abcd" is 4 ascii chars, so 1 token + type token
    msg = HumanMessage(content="abcd")
    cm = ContextManager(token_limit=2)
    codeflash_output = cm._compress_messages([msg]); result = codeflash_output # 5.24μs -> 4.70μs (11.5% faster)

def test_truncation_preserves_other_fields():
    """Truncation should not affect other fields."""
    msg = HumanMessage(content="a" * 100, additional_kwargs={"foo": "bar"})
    cm = ContextManager(token_limit=2)
    codeflash_output = cm._compress_messages([msg]); result = codeflash_output # 26.7μs -> 25.9μs (3.06% faster)

# ----------- LARGE SCALE TEST CASES ------------

def test_many_short_messages_fit():
    """Test many short messages all fit within a large token limit."""
    msgs = [HumanMessage(content="hi") for _ in range(100)]
    cm = ContextManager(token_limit=1000)
    codeflash_output = cm._compress_messages(msgs); result = codeflash_output # 113μs -> 83.1μs (36.8% faster)

def test_many_short_messages_some_dropped():
    """Test many short messages, but only some can fit."""
    msgs = [HumanMessage(content="hi") for _ in range(100)]
    cm = ContextManager(token_limit=10)
    codeflash_output = cm._compress_messages(msgs); result = codeflash_output # 15.6μs -> 12.5μs (24.3% faster)

def test_large_message_list_with_prefix():
    """Test large message list with prefix preservation."""
    msgs = [SystemMessage(content="sys")] + [HumanMessage(content="a" * 10) for _ in range(50)]
    cm = ContextManager(token_limit=100, preserve_prefix_message_count=1)
    codeflash_output = cm._compress_messages(msgs); result = codeflash_output # 49.6μs -> 41.6μs (19.1% faster)

def test_large_message_list_all_truncated():
    """Test with huge messages, all must be truncated or dropped."""
    msgs = [HumanMessage(content="x" * 100) for _ in range(10)]
    cm = ContextManager(token_limit=5)
    codeflash_output = cm._compress_messages(msgs); result = codeflash_output # 22.7μs -> 21.9μs (3.58% faster)


#------------------------------------------------
import copy
from typing import List

# imports
import pytest
from langchain_core.messages import (AIMessage, BaseMessage, HumanMessage,
                                     SystemMessage, ToolMessage)
from src.utils.context_manager import ContextManager


# Helper for test readability
def make_msgs(msgs):
    """Helper to create messages from (cls, content, kwargs) tuples."""
    result = []
    for m in msgs:
        if len(m) == 2:
            cls, content = m
            result.append(cls(content=content))
        else:
            cls, content, kwargs = m
            result.append(cls(content=content, additional_kwargs=kwargs))
    return result

# ----------- BASIC TEST CASES -----------

def test_empty_message_list():
    """Test compressing an empty message list returns empty list."""
    cm = ContextManager(token_limit=10)
    codeflash_output = cm._compress_messages([]) # 2.31μs -> 2.14μs (8.28% faster)
    assert codeflash_output == []

def test_single_message_within_limit():
    """Test single message fits within token limit."""
    msg = HumanMessage(content="hello world")
    cm = ContextManager(token_limit=10)
    codeflash_output = cm._compress_messages([msg]); out = codeflash_output # 7.09μs -> 6.27μs (13.1% faster)

def test_single_message_exceeds_limit():
    """Test single message gets truncated if over token limit."""
    msg = HumanMessage(content="a" * 20)
    cm = ContextManager(token_limit=5)
    codeflash_output = cm._compress_messages([msg]); out = codeflash_output # 21.6μs -> 21.0μs (2.58% faster)

def test_multiple_messages_within_limit():
    """Test multiple messages fit within token limit."""
    msgs = make_msgs([
        (SystemMessage, "sys"),
        (HumanMessage, "hi"),
        (AIMessage, "hello"),
    ])
    cm = ContextManager(token_limit=100)
    codeflash_output = cm._compress_messages(msgs); out = codeflash_output # 9.55μs -> 8.54μs (11.9% faster)

def test_multiple_messages_some_exceed():
    """Test only as many messages as fit are included, from the end."""
    msgs = make_msgs([
        (HumanMessage, "a" * 10),
        (AIMessage, "b" * 10),
        (HumanMessage, "c" * 10),
    ])
    # Each message is at least 2 tokens (content + type)
    cm = ContextManager(token_limit=3)
    codeflash_output = cm._compress_messages(msgs); out = codeflash_output # 7.44μs -> 6.51μs (14.3% faster)

def test_preserve_prefix_behavior():
    """Test that prefix messages are preserved if possible."""
    msgs = make_msgs([
        (SystemMessage, "sysmsg"),
        (HumanMessage, "user1"),
        (AIMessage, "ai1"),
        (HumanMessage, "user2"),
    ])
    cm = ContextManager(token_limit=100, preserve_prefix_message_count=2)
    codeflash_output = cm._compress_messages(msgs); out = codeflash_output # 10.2μs -> 9.11μs (12.3% faster)

def test_preserve_prefix_and_truncate():
    """Test prefix preserved and truncated if it doesn't fit."""
    msg = HumanMessage(content="a" * 10)
    cm = ContextManager(token_limit=5, preserve_prefix_message_count=1)
    codeflash_output = cm._compress_messages([msg]); out = codeflash_output # 5.20μs -> 4.49μs (15.8% faster)

def test_suffix_truncation():
    """Test that suffix messages are truncated if needed."""
    msgs = make_msgs([
        (HumanMessage, "a" * 10),
        (AIMessage, "b" * 10),
        (HumanMessage, "c" * 10),
    ])
    cm = ContextManager(token_limit=7)
    codeflash_output = cm._compress_messages(msgs); out = codeflash_output # 22.9μs -> 22.0μs (4.04% faster)

def test_preserve_prefix_and_suffix():
    """Test that both prefix and suffix are preserved if possible."""
    msgs = make_msgs([
        (SystemMessage, "sys"),
        (HumanMessage, "user1"),
        (AIMessage, "ai1"),
        (HumanMessage, "user2"),
    ])
    cm = ContextManager(token_limit=20, preserve_prefix_message_count=2)
    codeflash_output = cm._compress_messages(msgs); out = codeflash_output # 9.98μs -> 9.08μs (9.93% faster)

# ----------- EDGE TEST CASES -----------

def test_zero_token_limit():
    """Test with zero token limit returns empty list."""
    msgs = make_msgs([
        (HumanMessage, "test"),
        (AIMessage, "test"),
    ])
    cm = ContextManager(token_limit=0)
    codeflash_output = cm._compress_messages(msgs); out = codeflash_output # 4.91μs -> 4.39μs (12.0% faster)
    assert out == []

def test_negative_token_limit():
    """Test with negative token limit returns empty list."""
    msgs = make_msgs([
        (HumanMessage, "test"),
        (AIMessage, "test"),
    ])
    cm = ContextManager(token_limit=-5)
    codeflash_output = cm._compress_messages(msgs); out = codeflash_output # 4.94μs -> 4.35μs (13.5% faster)
    assert out == []

def test_empty_content_message():
    """Test message with empty content still counts for type tokens."""
    msg = HumanMessage(content="")
    cm = ContextManager(token_limit=1)
    codeflash_output = cm._compress_messages([msg]); out = codeflash_output # 4.68μs -> 4.01μs (16.7% faster)

def test_non_english_content():
    """Test token calculation for non-English (e.g., Chinese) content."""
    msg = HumanMessage(content="你好世界")  # 4 Chinese chars
    cm = ContextManager(token_limit=4)
    codeflash_output = cm._compress_messages([msg]); out = codeflash_output # 20.1μs -> 19.9μs (1.22% faster)
    # Now with limit 2, should truncate to 2 chars
    cm = ContextManager(token_limit=2)
    codeflash_output = cm._compress_messages([msg]); out = codeflash_output # 12.0μs -> 11.4μs (4.58% faster)

def test_additional_kwargs_token_counting():
    """Test that additional_kwargs field is counted for tokens."""
    msg = AIMessage(content="msg", additional_kwargs={"foo": "barbaz"})
    cm = ContextManager(token_limit=2)
    codeflash_output = cm._compress_messages([msg]); out = codeflash_output # 23.3μs -> 22.5μs (3.67% faster)

def test_tool_calls_token_addition():
    """Test that tool_calls in additional_kwargs adds 50 tokens."""
    msg = AIMessage(content="msg", additional_kwargs={"tool_calls": "xyz"})
    # The message should be too large for a small token limit
    cm = ContextManager(token_limit=10)
    codeflash_output = cm._compress_messages([msg]); out = codeflash_output # 23.1μs -> 22.6μs (2.35% faster)

def test_preserve_prefix_greater_than_messages():
    """Test when preserve_prefix_message_count exceeds message count."""
    msgs = make_msgs([
        (SystemMessage, "sys"),
        (HumanMessage, "user"),
    ])
    cm = ContextManager(token_limit=100, preserve_prefix_message_count=5)
    codeflash_output = cm._compress_messages(msgs); out = codeflash_output # 7.19μs -> 6.68μs (7.68% faster)


def test_truncation_preserves_other_fields():
    """Test that truncation does not affect other message fields."""
    msg = HumanMessage(content="abcdef", additional_kwargs={"foo": "bar"})
    cm = ContextManager(token_limit=3)
    codeflash_output = cm._compress_messages([msg]); out = codeflash_output # 26.9μs -> 26.0μs (3.64% faster)

# ----------- LARGE SCALE TEST CASES -----------

def test_many_short_messages_fit():
    """Test many short messages all fit within a large token limit."""
    msgs = [HumanMessage(content="hi") for _ in range(100)]
    cm = ContextManager(token_limit=1000)
    codeflash_output = cm._compress_messages(msgs); out = codeflash_output # 114μs -> 83.0μs (38.4% faster)

def test_many_messages_some_exceed():
    """Test many messages, only last N fit."""
    msgs = [HumanMessage(content="a" * 10) for _ in range(100)]
    cm = ContextManager(token_limit=15)
    codeflash_output = cm._compress_messages(msgs); out = codeflash_output # 12.4μs -> 10.7μs (16.2% faster)

def test_preserve_prefix_with_large_list():
    """Test preserve_prefix_message_count on large list."""
    msgs = [SystemMessage(content="sys")] + [HumanMessage(content="u" * 5) for _ in range(50)]
    cm = ContextManager(token_limit=100, preserve_prefix_message_count=5)
    codeflash_output = cm._compress_messages(msgs); out = codeflash_output # 76.4μs -> 66.3μs (15.3% faster)

def test_large_token_limit_all_fit():
    """Test that with a huge token limit, all messages fit."""
    msgs = [AIMessage(content="b" * 10) for _ in range(200)]
    cm = ContextManager(token_limit=10000)
    codeflash_output = cm._compress_messages(msgs); out = codeflash_output # 293μs -> 206μs (41.9% faster)

def test_large_scale_truncation():
    """Test that with a large number of large messages, only the last one is truncated and included."""
    msgs = [HumanMessage(content="x" * 100) for _ in range(500)]
    cm = ContextManager(token_limit=50)
    codeflash_output = cm._compress_messages(msgs); out = codeflash_output # 30.3μs -> 28.1μs (7.94% faster)

def test_large_scale_preserve_prefix_and_suffix():
    """Test that with a large number of messages and prefix preservation, prefix and as many suffixes as fit are included."""
    msgs = [SystemMessage(content="sys")] + [HumanMessage(content="u" * 10) for _ in range(999)]
    cm = ContextManager(token_limit=120, preserve_prefix_message_count=3)
    codeflash_output = cm._compress_messages(msgs); out = codeflash_output # 79.9μs -> 68.6μs (16.4% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from langchain_core.messages.ai import AIMessage
from langchain_core.messages.system import SystemMessage
from src.utils.context_manager import ContextManager

def test_ContextManager__compress_messages():
    ContextManager._compress_messages(ContextManager(0, preserve_prefix_message_count=2), [AIMessage([])])

def test_ContextManager__compress_messages_2():
    ContextManager._compress_messages(ContextManager(1, preserve_prefix_message_count=1), [SystemMessage('')])

def test_ContextManager__compress_messages_3():
    ContextManager._compress_messages(ContextManager(1, preserve_prefix_message_count=0), [AIMessage([])])
🔎 Concolic Coverage Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
| --- | --- | --- | --- |
| codeflash_concolic_0_gkn0tr/tmpzc54f8fh/test_concolic_coverage.py::test_ContextManager__compress_messages | 7.25μs | 5.29μs | 37.0% ✅ |
| codeflash_concolic_0_gkn0tr/tmpzc54f8fh/test_concolic_coverage.py::test_ContextManager__compress_messages_2 | 4.72μs | 4.81μs | -1.83% ⚠️ |
| codeflash_concolic_0_gkn0tr/tmpzc54f8fh/test_concolic_coverage.py::test_ContextManager__compress_messages_3 | 5.11μs | 4.62μs | 10.8% ✅ |
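The token heuristic the tests above rely on can be sketched roughly as follows. This is inferred only from the test comments (about 1 token per 4 ASCII characters, 1 token per CJK character, a flat 50-token surcharge for `tool_calls`); `estimate_tokens` is a hypothetical name, not the actual `ContextManager` implementation:

```python
def estimate_tokens(text: str, has_tool_calls: bool = False) -> int:
    """Rough token estimate matching the expectations stated in the
    generated tests: ASCII text costs ~1 token per 4 characters,
    non-ASCII (e.g. Chinese) costs ~1 token per character."""
    ascii_chars = sum(1 for ch in text if ord(ch) < 128)
    other_chars = len(text) - ascii_chars
    tokens = ascii_chars // 4 + other_chars
    if has_tool_calls:
        tokens += 50  # the tests note tool_calls adds 50 tokens
    return tokens


# "abcd" -> 1 token; "你好世界" -> 4 tokens
```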

To edit these changes, run `git checkout codeflash/optimize-ContextManager._compress_messages-mgv01iwf` and push.

Codeflash
@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 17, 2025 15:24
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 17, 2025