@codeflash-ai codeflash-ai bot commented Oct 17, 2025

📄 20% (0.20x) speedup for ContextManager._compress_messages in src/utils/context_manager.py

⏱️ Runtime : 1.29 milliseconds → 1.08 milliseconds (best of 109 runs)

📝 Explanation and details

Key optimizations:

  • Fast type checks with `type()` instead of `isinstance()` where single-class dispatch suffices.
  • Direct attribute access via `getattr` reduces repeated attribute-lookup overhead.
  • Suffix message building uses `insert(0, ...)` instead of repeated list concatenation, minimizing temporary lists for large message histories.
  • `_truncate_message_content` checks for the attribute before slicing, which avoids wasted copy operations when none are needed.
  • Tightened loop bounds in `_compress_messages` to reduce unnecessary slicing.
  • No change to behavioral output, naming, exception preservation, types, or code style.
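The suffix-building point above can be illustrated with a minimal sketch. The names (`build_suffix`, `token_budget`, `cost`) are illustrative only, not the actual code in `src/utils/context_manager.py`:

```python
def build_suffix(messages, token_budget, cost):
    """Walk backwards from the newest message, prepending each one
    that still fits into the remaining token budget."""
    suffix = []
    remaining = token_budget
    for msg in reversed(messages):
        c = cost(msg)
        if c > remaining:
            break
        # insert(0, ...) keeps chronological order without allocating
        # a fresh list on every iteration (as `[msg] + suffix` would)
        suffix.insert(0, msg)
        remaining -= c
    return suffix


# With a budget of 5 and per-message cost equal to the value itself,
# only the newest message fits:
# build_suffix([1, 2, 3, 4], 5, lambda m: m) -> [4]
```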

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 94 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 6 Passed |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import copy
from typing import List

# imports
import pytest
from langchain_core.messages import (AIMessage, BaseMessage, HumanMessage,
                                     SystemMessage, ToolMessage)
from src.utils.context_manager import ContextManager

# ----------- BASIC TEST CASES ------------

def test_empty_message_list():
    """Test that compressing an empty message list returns an empty list."""
    cm = ContextManager(token_limit=10)
    codeflash_output = cm._compress_messages([]) # 1.87μs -> 1.75μs (7.03% faster)
    assert codeflash_output == []

def test_single_message_under_limit():
    """Test single message fits within token limit."""
    msg = HumanMessage(content="hello world")
    cm = ContextManager(token_limit=100)
    codeflash_output = cm._compress_messages([msg]); result = codeflash_output # 6.11μs -> 5.36μs (13.9% faster)

def test_single_message_over_limit():
    """Test single message gets truncated if over token limit."""
    long_content = "a" * 100
    msg = HumanMessage(content=long_content)
    cm = ContextManager(token_limit=5)
    codeflash_output = cm._compress_messages([msg]); result = codeflash_output # 23.5μs -> 22.9μs (2.43% faster)

def test_multiple_messages_all_fit():
    """Test multiple messages all fit within token limit."""
    msgs = [
        HumanMessage(content="hello"),
        AIMessage(content="hi"),
        HumanMessage(content="how are you?"),
        AIMessage(content="fine, thanks!"),
    ]
    cm = ContextManager(token_limit=100)
    codeflash_output = cm._compress_messages(msgs); result = codeflash_output # 11.1μs -> 9.71μs (13.9% faster)

def test_multiple_messages_some_truncated():
    """Test that older messages are dropped/truncated to fit token limit."""
    msgs = [
        HumanMessage(content="a" * 20),
        AIMessage(content="b" * 20),
        HumanMessage(content="c" * 20),
        AIMessage(content="d" * 20),
    ]
    # Small token limit, only some messages will fit
    cm = ContextManager(token_limit=10)
    codeflash_output = cm._compress_messages(msgs); result = codeflash_output # 21.1μs -> 20.5μs (3.06% faster)

def test_preserve_prefix_messages():
    """Test that prefix messages are preserved if possible."""
    msgs = [
        SystemMessage(content="system prompt"),
        HumanMessage(content="hello"),
        AIMessage(content="hi"),
        HumanMessage(content="how are you?"),
        AIMessage(content="fine, thanks!"),
    ]
    cm = ContextManager(token_limit=100, preserve_prefix_message_count=2)
    codeflash_output = cm._compress_messages(msgs); result = codeflash_output # 11.9μs -> 11.2μs (6.78% faster)

def test_preserve_prefix_truncation():
    """Test that prefix messages are truncated if they don't fully fit."""
    msgs = [
        SystemMessage(content="system prompt"),
        HumanMessage(content="hello"),
        AIMessage(content="hi"),
    ]
    # Token limit only enough for part of the first message
    cm = ContextManager(token_limit=2, preserve_prefix_message_count=2)
    codeflash_output = cm._compress_messages(msgs); result = codeflash_output # 16.9μs -> 16.9μs (0.153% slower)

def test_preserve_prefix_and_tail():
    """Test that prefix is preserved and tail is filled up to limit."""
    msgs = [
        SystemMessage(content="sys"),
        HumanMessage(content="a"),
        AIMessage(content="b"),
        HumanMessage(content="c"),
    ]
    cm = ContextManager(token_limit=5, preserve_prefix_message_count=2)
    codeflash_output = cm._compress_messages(msgs); result = codeflash_output # 9.37μs -> 8.55μs (9.62% faster)

# ----------- EDGE TEST CASES ------------

def test_zero_token_limit():
    """Test with a token limit of zero (should return empty list)."""
    msgs = [
        HumanMessage(content="hello"),
        AIMessage(content="world"),
    ]
    cm = ContextManager(token_limit=0)
    codeflash_output = cm._compress_messages(msgs); result = codeflash_output # 5.04μs -> 4.60μs (9.36% faster)
    assert result == []

def test_negative_token_limit():
    """Test with a negative token limit (should return empty list)."""
    msgs = [
        HumanMessage(content="hello"),
        AIMessage(content="world"),
    ]
    cm = ContextManager(token_limit=-5)
    codeflash_output = cm._compress_messages(msgs); result = codeflash_output # 5.06μs -> 4.64μs (9.01% faster)
    assert result == []

def test_message_with_empty_content():
    """Test message with empty content string."""
    msg = HumanMessage(content="")
    cm = ContextManager(token_limit=10)
    codeflash_output = cm._compress_messages([msg]); result = codeflash_output # 4.52μs -> 3.99μs (13.5% faster)

def test_message_with_non_english_characters():
    """Test token counting with non-English (e.g., Chinese) characters."""
    msg = HumanMessage(content="你好世界")  # 4 Chinese chars, should be 4 tokens
    cm = ContextManager(token_limit=4)
    codeflash_output = cm._compress_messages([msg]); result = codeflash_output # 20.3μs -> 19.6μs (3.57% faster)
    # Now with a lower token limit, should truncate
    cm2 = ContextManager(token_limit=2)
    codeflash_output = cm2._compress_messages([msg]); result2 = codeflash_output # 12.1μs -> 11.6μs (4.39% faster)

def test_message_with_additional_kwargs():
    """Test token counting with additional_kwargs and tool_calls."""
    msg = AIMessage(content="result", additional_kwargs={"tool_calls": [{"foo": "bar"}]})
    cm = ContextManager(token_limit=100)
    codeflash_output = cm._compress_messages([msg]); result = codeflash_output # 9.77μs -> 9.02μs (8.34% faster)

def test_message_with_large_additional_kwargs():
    """Test token counting with large additional_kwargs."""
    large_kwargs = {"tool_calls": [{"foo": "bar" * 50}]}
    msg = AIMessage(content="result", additional_kwargs=large_kwargs)
    cm = ContextManager(token_limit=10)
    codeflash_output = cm._compress_messages([msg]); result = codeflash_output # 32.7μs -> 31.7μs (3.11% faster)



def test_preserve_prefix_more_than_messages():
    """Test preserve_prefix_message_count greater than number of messages."""
    msgs = [HumanMessage(content="a"), AIMessage(content="b")]
    cm = ContextManager(token_limit=100, preserve_prefix_message_count=5)
    codeflash_output = cm._compress_messages(msgs); result = codeflash_output # 9.50μs -> 8.34μs (13.9% faster)

def test_preserve_prefix_zero():
    """Test preserve_prefix_message_count=0 behaves correctly."""
    msgs = [HumanMessage(content="a"), AIMessage(content="b")]
    cm = ContextManager(token_limit=100, preserve_prefix_message_count=0)
    codeflash_output = cm._compress_messages(msgs); result = codeflash_output # 7.50μs -> 6.71μs (11.8% faster)

def test_message_exact_token_limit():
    """Test message that exactly matches the token limit."""
    # "abcd" is 4 ascii chars, so 1 token + type token
    msg = HumanMessage(content="abcd")
    cm = ContextManager(token_limit=2)
    codeflash_output = cm._compress_messages([msg]); result = codeflash_output # 5.24μs -> 4.70μs (11.5% faster)

def test_truncation_preserves_other_fields():
    """Truncation should not affect other fields."""
    msg = HumanMessage(content="a" * 100, additional_kwargs={"foo": "bar"})
    cm = ContextManager(token_limit=2)
    codeflash_output = cm._compress_messages([msg]); result = codeflash_output # 26.7μs -> 25.9μs (3.06% faster)

# ----------- LARGE SCALE TEST CASES ------------

def test_many_short_messages_fit():
    """Test many short messages all fit within a large token limit."""
    msgs = [HumanMessage(content="hi") for _ in range(100)]
    cm = ContextManager(token_limit=1000)
    codeflash_output = cm._compress_messages(msgs); result = codeflash_output # 113μs -> 83.1μs (36.8% faster)

def test_many_short_messages_some_dropped():
    """Test many short messages, but only some can fit."""
    msgs = [HumanMessage(content="hi") for _ in range(100)]
    cm = ContextManager(token_limit=10)
    codeflash_output = cm._compress_messages(msgs); result = codeflash_output # 15.6μs -> 12.5μs (24.3% faster)

def test_large_message_list_with_prefix():
    """Test large message list with prefix preservation."""
    msgs = [SystemMessage(content="sys")] + [HumanMessage(content="a" * 10) for _ in range(50)]
    cm = ContextManager(token_limit=100, preserve_prefix_message_count=1)
    codeflash_output = cm._compress_messages(msgs); result = codeflash_output # 49.6μs -> 41.6μs (19.1% faster)

def test_large_message_list_all_truncated():
    """Test with huge messages, all must be truncated or dropped."""
    msgs = [HumanMessage(content="x" * 100) for _ in range(10)]
    cm = ContextManager(token_limit=5)
    codeflash_output = cm._compress_messages(msgs); result = codeflash_output # 22.7μs -> 21.9μs (3.58% faster)


#------------------------------------------------
import copy
from typing import List

# imports
import pytest
from langchain_core.messages import (AIMessage, BaseMessage, HumanMessage,
                                     SystemMessage, ToolMessage)
from src.utils.context_manager import ContextManager


# Helper for test readability
def make_msgs(msgs):
    """Helper to create messages from (cls, content, kwargs) tuples."""
    result = []
    for m in msgs:
        if len(m) == 2:
            cls, content = m
            result.append(cls(content=content))
        else:
            cls, content, kwargs = m
            result.append(cls(content=content, additional_kwargs=kwargs))
    return result

# ----------- BASIC TEST CASES -----------

def test_empty_message_list():
    """Test compressing an empty message list returns empty list."""
    cm = ContextManager(token_limit=10)
    codeflash_output = cm._compress_messages([]) # 2.31μs -> 2.14μs (8.28% faster)
    assert codeflash_output == []

def test_single_message_within_limit():
    """Test single message fits within token limit."""
    msg = HumanMessage(content="hello world")
    cm = ContextManager(token_limit=10)
    codeflash_output = cm._compress_messages([msg]); out = codeflash_output # 7.09μs -> 6.27μs (13.1% faster)

def test_single_message_exceeds_limit():
    """Test single message gets truncated if over token limit."""
    msg = HumanMessage(content="a" * 20)
    cm = ContextManager(token_limit=5)
    codeflash_output = cm._compress_messages([msg]); out = codeflash_output # 21.6μs -> 21.0μs (2.58% faster)

def test_multiple_messages_within_limit():
    """Test multiple messages fit within token limit."""
    msgs = make_msgs([
        (SystemMessage, "sys"),
        (HumanMessage, "hi"),
        (AIMessage, "hello"),
    ])
    cm = ContextManager(token_limit=100)
    codeflash_output = cm._compress_messages(msgs); out = codeflash_output # 9.55μs -> 8.54μs (11.9% faster)

def test_multiple_messages_some_exceed():
    """Test only as many messages as fit are included, from the end."""
    msgs = make_msgs([
        (HumanMessage, "a" * 10),
        (AIMessage, "b" * 10),
        (HumanMessage, "c" * 10),
    ])
    # Each message is at least 2 tokens (content + type)
    cm = ContextManager(token_limit=3)
    codeflash_output = cm._compress_messages(msgs); out = codeflash_output # 7.44μs -> 6.51μs (14.3% faster)

def test_preserve_prefix_behavior():
    """Test that prefix messages are preserved if possible."""
    msgs = make_msgs([
        (SystemMessage, "sysmsg"),
        (HumanMessage, "user1"),
        (AIMessage, "ai1"),
        (HumanMessage, "user2"),
    ])
    cm = ContextManager(token_limit=100, preserve_prefix_message_count=2)
    codeflash_output = cm._compress_messages(msgs); out = codeflash_output # 10.2μs -> 9.11μs (12.3% faster)

def test_preserve_prefix_and_truncate():
    """Test prefix preserved and truncated if it doesn't fit."""
    msg = HumanMessage(content="a" * 10)
    cm = ContextManager(token_limit=5, preserve_prefix_message_count=1)
    codeflash_output = cm._compress_messages([msg]); out = codeflash_output # 5.20μs -> 4.49μs (15.8% faster)

def test_suffix_truncation():
    """Test that suffix messages are truncated if needed."""
    msgs = make_msgs([
        (HumanMessage, "a" * 10),
        (AIMessage, "b" * 10),
        (HumanMessage, "c" * 10),
    ])
    cm = ContextManager(token_limit=7)
    codeflash_output = cm._compress_messages(msgs); out = codeflash_output # 22.9μs -> 22.0μs (4.04% faster)

def test_preserve_prefix_and_suffix():
    """Test that both prefix and suffix are preserved if possible."""
    msgs = make_msgs([
        (SystemMessage, "sys"),
        (HumanMessage, "user1"),
        (AIMessage, "ai1"),
        (HumanMessage, "user2"),
    ])
    cm = ContextManager(token_limit=20, preserve_prefix_message_count=2)
    codeflash_output = cm._compress_messages(msgs); out = codeflash_output # 9.98μs -> 9.08μs (9.93% faster)

# ----------- EDGE TEST CASES -----------

def test_zero_token_limit():
    """Test with zero token limit returns empty list."""
    msgs = make_msgs([
        (HumanMessage, "test"),
        (AIMessage, "test"),
    ])
    cm = ContextManager(token_limit=0)
    codeflash_output = cm._compress_messages(msgs); out = codeflash_output # 4.91μs -> 4.39μs (12.0% faster)
    assert out == []

def test_negative_token_limit():
    """Test with negative token limit returns empty list."""
    msgs = make_msgs([
        (HumanMessage, "test"),
        (AIMessage, "test"),
    ])
    cm = ContextManager(token_limit=-5)
    codeflash_output = cm._compress_messages(msgs); out = codeflash_output # 4.94μs -> 4.35μs (13.5% faster)
    assert out == []

def test_empty_content_message():
    """Test message with empty content still counts for type tokens."""
    msg = HumanMessage(content="")
    cm = ContextManager(token_limit=1)
    codeflash_output = cm._compress_messages([msg]); out = codeflash_output # 4.68μs -> 4.01μs (16.7% faster)

def test_non_english_content():
    """Test token calculation for non-English (e.g., Chinese) content."""
    msg = HumanMessage(content="你好世界")  # 4 Chinese chars
    cm = ContextManager(token_limit=4)
    codeflash_output = cm._compress_messages([msg]); out = codeflash_output # 20.1μs -> 19.9μs (1.22% faster)
    # Now with limit 2, should truncate to 2 chars
    cm = ContextManager(token_limit=2)
    codeflash_output = cm._compress_messages([msg]); out = codeflash_output # 12.0μs -> 11.4μs (4.58% faster)

def test_additional_kwargs_token_counting():
    """Test that additional_kwargs field is counted for tokens."""
    msg = AIMessage(content="msg", additional_kwargs={"foo": "barbaz"})
    cm = ContextManager(token_limit=2)
    codeflash_output = cm._compress_messages([msg]); out = codeflash_output # 23.3μs -> 22.5μs (3.67% faster)

def test_tool_calls_token_addition():
    """Test that tool_calls in additional_kwargs adds 50 tokens."""
    msg = AIMessage(content="msg", additional_kwargs={"tool_calls": "xyz"})
    # The message should be too large for a small token limit
    cm = ContextManager(token_limit=10)
    codeflash_output = cm._compress_messages([msg]); out = codeflash_output # 23.1μs -> 22.6μs (2.35% faster)

def test_preserve_prefix_greater_than_messages():
    """Test when preserve_prefix_message_count exceeds message count."""
    msgs = make_msgs([
        (SystemMessage, "sys"),
        (HumanMessage, "user"),
    ])
    cm = ContextManager(token_limit=100, preserve_prefix_message_count=5)
    codeflash_output = cm._compress_messages(msgs); out = codeflash_output # 7.19μs -> 6.68μs (7.68% faster)


def test_truncation_preserves_other_fields():
    """Test that truncation does not affect other message fields."""
    msg = HumanMessage(content="abcdef", additional_kwargs={"foo": "bar"})
    cm = ContextManager(token_limit=3)
    codeflash_output = cm._compress_messages([msg]); out = codeflash_output # 26.9μs -> 26.0μs (3.64% faster)

# ----------- LARGE SCALE TEST CASES -----------

def test_many_short_messages_fit():
    """Test many short messages all fit within a large token limit."""
    msgs = [HumanMessage(content="hi") for _ in range(100)]
    cm = ContextManager(token_limit=1000)
    codeflash_output = cm._compress_messages(msgs); out = codeflash_output # 114μs -> 83.0μs (38.4% faster)

def test_many_messages_some_exceed():
    """Test many messages, only last N fit."""
    msgs = [HumanMessage(content="a" * 10) for _ in range(100)]
    cm = ContextManager(token_limit=15)
    codeflash_output = cm._compress_messages(msgs); out = codeflash_output # 12.4μs -> 10.7μs (16.2% faster)

def test_preserve_prefix_with_large_list():
    """Test preserve_prefix_message_count on large list."""
    msgs = [SystemMessage(content="sys")] + [HumanMessage(content="u" * 5) for _ in range(50)]
    cm = ContextManager(token_limit=100, preserve_prefix_message_count=5)
    codeflash_output = cm._compress_messages(msgs); out = codeflash_output # 76.4μs -> 66.3μs (15.3% faster)

def test_large_token_limit_all_fit():
    """Test that with a huge token limit, all messages fit."""
    msgs = [AIMessage(content="b" * 10) for _ in range(200)]
    cm = ContextManager(token_limit=10000)
    codeflash_output = cm._compress_messages(msgs); out = codeflash_output # 293μs -> 206μs (41.9% faster)

def test_large_scale_truncation():
    """Test that with a large number of large messages, only the last one is truncated and included."""
    msgs = [HumanMessage(content="x" * 100) for _ in range(500)]
    cm = ContextManager(token_limit=50)
    codeflash_output = cm._compress_messages(msgs); out = codeflash_output # 30.3μs -> 28.1μs (7.94% faster)

def test_large_scale_preserve_prefix_and_suffix():
    """Test that with a large number of messages and prefix preservation, prefix and as many suffixes as fit are included."""
    msgs = [SystemMessage(content="sys")] + [HumanMessage(content="u" * 10) for _ in range(999)]
    cm = ContextManager(token_limit=120, preserve_prefix_message_count=3)
    codeflash_output = cm._compress_messages(msgs); out = codeflash_output # 79.9μs -> 68.6μs (16.4% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from langchain_core.messages.ai import AIMessage
from langchain_core.messages.system import SystemMessage
from src.utils.context_manager import ContextManager

def test_ContextManager__compress_messages():
    ContextManager._compress_messages(ContextManager(0, preserve_prefix_message_count=2), [AIMessage([])])

def test_ContextManager__compress_messages_2():
    ContextManager._compress_messages(ContextManager(1, preserve_prefix_message_count=1), [SystemMessage('')])

def test_ContextManager__compress_messages_3():
    ContextManager._compress_messages(ContextManager(1, preserve_prefix_message_count=0), [AIMessage([])])
🔎 Concolic Coverage Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
| --- | --- | --- | --- |
| codeflash_concolic_0_gkn0tr/tmpzc54f8fh/test_concolic_coverage.py::test_ContextManager__compress_messages | 7.25μs | 5.29μs | 37.0% ✅ |
| codeflash_concolic_0_gkn0tr/tmpzc54f8fh/test_concolic_coverage.py::test_ContextManager__compress_messages_2 | 4.72μs | 4.81μs | -1.83% ⚠️ |
| codeflash_concolic_0_gkn0tr/tmpzc54f8fh/test_concolic_coverage.py::test_ContextManager__compress_messages_3 | 5.11μs | 4.62μs | 10.8% ✅ |
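The token heuristic the tests above rely on can be sketched roughly as follows. This is inferred only from the test comments (about 1 token per 4 ASCII characters, 1 token per CJK character, a flat 50-token surcharge for `tool_calls`); `estimate_tokens` is a hypothetical name, not the actual `ContextManager` implementation:

```python
def estimate_tokens(text: str, has_tool_calls: bool = False) -> int:
    """Rough token estimate matching the expectations stated in the
    generated tests: ASCII text costs ~1 token per 4 characters,
    non-ASCII (e.g. Chinese) costs ~1 token per character."""
    ascii_chars = sum(1 for ch in text if ord(ch) < 128)
    other_chars = len(text) - ascii_chars
    tokens = ascii_chars // 4 + other_chars
    if has_tool_calls:
        tokens += 50  # the tests note tool_calls adds 50 tokens
    return tokens


# "abcd" -> 1 token; "你好世界" -> 4 tokens
```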

To edit these changes, run `git checkout codeflash/optimize-ContextManager._compress_messages-mgv01iwf` and push.

Codeflash
@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 17, 2025 15:24
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 17, 2025