
codeflash-ai bot commented Dec 11, 2025

⚡️ This pull request contains optimizations for PR #945

If you approve this dependent PR, these changes will be merged into the original PR branch feat/feedback-loop-for-unmatched-test-results.

This PR will be automatically closed if the original PR is merged.


📄 45% speedup (1.45x faster) for `AiServiceClient._get_valid_candidates` in `codeflash/api/aiservice.py`

⏱️ Runtime: 3.25 milliseconds → 2.24 milliseconds (best of 106 runs)

📝 Explanation and details

The optimization achieves a 45% speedup by restructuring how Pydantic model instances are created during markdown parsing.

Key Change: Instead of creating an empty `CodeStringsMarkdown()` object and appending each parsed `CodeString` to its `code_strings` list, the optimized version collects all code blocks into a plain Python list first and then instantiates the Pydantic model once with the complete list.
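The two construction shapes can be compared side by side in a small sketch. It assumes a fence whose first line is the file path followed by the code; the regex, the function names `parse_before` and `parse_after`, and the dataclass stand-ins for the real Pydantic `CodeString` and `CodeStringsMarkdown` models are all hypothetical. Dataclasses do not reproduce Pydantic's construction cost, so this only illustrates the shape of the change, not its speed.

```python
import re
from dataclasses import dataclass, field
from pathlib import Path


# Hypothetical stand-ins for the real Pydantic models (only the fields used here).
@dataclass
class CodeString:
    code: str
    file_path: Path


@dataclass
class CodeStringsMarkdown:
    code_strings: list[CodeString] = field(default_factory=list)


# Hypothetical pattern: "```<path>\n<code>```", matched non-greedily across lines.
MARKDOWN_BLOCK = re.compile(r"```([^\n]*)\n(.*?)```", re.DOTALL)


def parse_before(markdown: str) -> CodeStringsMarkdown:
    # Original shape: create the container up front, then mutate it per block.
    result = CodeStringsMarkdown()
    for path, code in MARKDOWN_BLOCK.findall(markdown):
        result.code_strings.append(CodeString(code=code, file_path=Path(path)))
    return result


def parse_after(markdown: str) -> CodeStringsMarkdown:
    # Optimized shape: accumulate in a plain list, build the container once.
    code_string_list = [
        CodeString(code=code, file_path=Path(path))
        for path, code in MARKDOWN_BLOCK.findall(markdown)
    ]
    return CodeStringsMarkdown(code_strings=code_string_list)


sample = "```foo.py\nprint('foo')```\n```bar.py\nprint('bar')```"
assert parse_before(sample) == parse_after(sample)  # same result, different construction
```

Both functions return equal objects; only the point at which the container is constructed differs, which is what makes the change safe to apply without altering behavior.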

Why This is Faster:

  • Reduced Pydantic overhead: The model is constructed once, from the finished list, instead of being created empty up front and then mutated once per `CodeString`.
  • Fewer model interactions: The loop only appends to a local list (`code_string_list.append()`) and does not touch the model until all blocks are collected.
  • Profiler evidence: The line creating `CodeStringsMarkdown()` dropped from 89.6% of function time (18.05 ms) to 81% (8.45 ms), roughly a 2.1x improvement on the bottleneck line.
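The headline and profiler figures can be checked with simple ratios. The sketch assumes that "X% faster" means old time / new time - 1, which is consistent with the reported 45% (3.25 ms → 2.24 ms); the helper name `times_faster` is illustrative only.

```python
def times_faster(before: float, after: float) -> float:
    """How many times faster the new timing is (before / after)."""
    return before / after


# End-to-end runtime of _get_valid_candidates, in milliseconds (from the report).
overall = times_faster(3.25, 2.24)
# Profiled time on the CodeStringsMarkdown(...) line, in milliseconds.
bottleneck = times_faster(18.05, 8.45)

print(f"overall: {overall:.2f}x, i.e. {overall - 1:.0%} faster")  # overall: 1.45x, i.e. 45% faster
print(f"bottleneck line: {bottleneck:.2f}x")                        # bottleneck line: 2.14x
```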

Impact on Workloads: This optimization is most effective when a single call processes many candidates (the 200- and 500-candidate tests see 46-47% improvements). Because `parse_markdown_code` is called once per candidate in a tight loop within `_get_valid_candidates`, the per-call savings add up across a batch of optimization candidates.
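The loop that makes the per-call savings add up can be sketched roughly as follows. This is a hypothetical, simplified version of `_get_valid_candidates`: the dataclasses stand in for the real Pydantic models, `source_code` is a plain list instead of `CodeStringsMarkdown`, and the regex is an assumption. The behaviors it mirrors (skipping candidates without a parsable code block, `KeyError` on a missing `explanation` or `optimization_id`, optional `parent_id`) are the ones the generated regression tests exercise.

```python
import re
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Optional


# Hypothetical, simplified stand-ins for the real models.
@dataclass
class CodeString:
    code: str
    file_path: Path


@dataclass
class OptimizedCandidate:
    source_code: list[CodeString]
    explanation: str
    optimization_id: str
    source: str
    parent_id: Optional[str] = None


BLOCK = re.compile(r"```([^\n]*)\n(.*?)```", re.DOTALL)  # assumed fence format


def parse_markdown_code(markdown: str) -> list[CodeString]:
    # Collect into a plain list; nothing is mutated inside the loop.
    return [CodeString(code=c, file_path=Path(p)) for p, c in BLOCK.findall(markdown)]


def get_valid_candidates(optimizations: list[dict[str, Any]], source: str) -> list[OptimizedCandidate]:
    candidates = []
    for opt in optimizations:
        # Parsed once per candidate, so per-call savings scale with batch size.
        code_strings = parse_markdown_code(opt["source_code"])
        if not code_strings:
            continue  # no parsable code block: skip this candidate
        candidates.append(
            OptimizedCandidate(
                source_code=code_strings,
                explanation=opt["explanation"],  # KeyError if missing
                optimization_id=opt["optimization_id"],  # KeyError if missing
                source=source,
                parent_id=opt.get("parent_id"),  # optional
            )
        )
    return candidates
```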

Test Case Performance: Non-empty inputs improve by roughly 20-47% across the test scenarios, with the largest gains on the tests with hundreds of candidates; the empty-input cases are unchanged within noise (602ns -> 621ns and 541ns -> 541ns). This is consistent with the per-call savings accumulating as batches grow.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 6 Passed |
| 🌀 Generated Regression Tests | 1786 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 85.7% |
⚙️ Existing Unit Tests and Runtime
🌀 Generated Regression Tests and Runtime
# imports
import pytest
from pydantic import BaseModel

from codeflash.api.aiservice import AiServiceClient


# Simulate the OptimizedCandidateSource enum or type
class OptimizedCandidateSource(str):
    TEST = "test"
    PROD = "prod"


# Simulate the OptimizedCandidate model
class OptimizedCandidate(BaseModel):
    source_code: "CodeStringsMarkdown"
    explanation: str
    optimization_id: str
    source: OptimizedCandidateSource
    parent_id: str | None = None


# --- Unit Tests ---


@pytest.fixture
def client():
    return AiServiceClient()


# --- 1. BASIC TEST CASES ---


def test_single_valid_candidate(client):
    """Test a single optimization with one valid code block."""
    optimizations = [
        {
            "source_code": "```main.py\nprint('Hello, world!')```",
            "explanation": "Prints hello world",
            "optimization_id": "opt_1",
        }
    ]
    codeflash_output = client._get_valid_candidates(optimizations, OptimizedCandidateSource.TEST)
    result = codeflash_output
    oc = result[0]
    cs = oc.source_code.code_strings[0]


def test_multiple_valid_candidates(client):
    """Test multiple optimizations, each with a valid code block."""
    optimizations = [
        {"source_code": "```a.py\nx=1```", "explanation": "Set x", "optimization_id": "opt_2"},
        {"source_code": "```b.py\ny=2```", "explanation": "Set y", "optimization_id": "opt_3"},
    ]
    codeflash_output = client._get_valid_candidates(optimizations, OptimizedCandidateSource.TEST)
    result = codeflash_output  # 16.8μs -> 12.9μs (29.9% faster)


def test_candidate_with_parent_id(client):
    """Test that parent_id is included if present."""
    optimizations = [
        {
            "source_code": "```foo.py\npass```",
            "explanation": "No-op",
            "optimization_id": "opt_4",
            "parent_id": "parent_1",
        }
    ]
    codeflash_output = client._get_valid_candidates(optimizations, OptimizedCandidateSource.TEST)
    result = codeflash_output  # 10.6μs -> 8.23μs (29.2% faster)


def test_multiple_code_blocks_in_one_candidate(client):
    """Test a single optimization with multiple code blocks."""
    optimizations = [
        {
            "source_code": ("```foo.py\nprint('foo')``````bar.py\nprint('bar')```"),
            "explanation": "Prints foo and bar",
            "optimization_id": "opt_5",
        }
    ]
    codeflash_output = client._get_valid_candidates(optimizations, OptimizedCandidateSource.TEST)
    result = codeflash_output
    code_strings = result[0].source_code.code_strings
    paths = {cs.file_path for cs in code_strings}
    codes = {cs.code for cs in code_strings}


# --- 2. EDGE TEST CASES ---


def test_empty_optimizations_list(client):
    """Test that an empty optimizations list returns an empty result."""
    codeflash_output = client._get_valid_candidates([], OptimizedCandidateSource.TEST)
    result = codeflash_output  # 602ns -> 621ns (3.06% slower)


def test_invalid_markdown_code_block(client):
    """Test that an invalid markdown (no code block) is skipped."""
    optimizations = [
        {"source_code": "This is not a code block.", "explanation": "No code block", "optimization_id": "opt_6"}
    ]
    codeflash_output = client._get_valid_candidates(optimizations, OptimizedCandidateSource.TEST)
    result = codeflash_output  # 12.2μs -> 9.96μs (22.3% faster)


def test_code_block_with_empty_code(client):
    """Test that a code block with empty code is still considered valid."""
    optimizations = [{"source_code": "```empty.py\n```", "explanation": "Empty code", "optimization_id": "opt_7"}]
    codeflash_output = client._get_valid_candidates(optimizations, OptimizedCandidateSource.TEST)
    result = codeflash_output
    cs = result[0].source_code.code_strings[0]


def test_code_block_with_whitespace_code(client):
    """Test that a code block with only whitespace is considered valid."""
    optimizations = [
        {"source_code": "```ws.py\n   \n```", "explanation": "Whitespace code", "optimization_id": "opt_8"}
    ]
    codeflash_output = client._get_valid_candidates(optimizations, OptimizedCandidateSource.TEST)
    result = codeflash_output
    cs = result[0].source_code.code_strings[0]


def test_missing_explanation_field(client):
    """Test that missing explanation field raises KeyError."""
    optimizations = [{"source_code": "```foo.py\nprint(1)```", "optimization_id": "opt_9"}]
    with pytest.raises(KeyError):
        client._get_valid_candidates(optimizations, OptimizedCandidateSource.TEST)


def test_missing_optimization_id_field(client):
    """Test that missing optimization_id field raises KeyError."""
    optimizations = [{"source_code": "```foo.py\nprint(1)```", "explanation": "Missing id"}]
    with pytest.raises(KeyError):
        client._get_valid_candidates(optimizations, OptimizedCandidateSource.TEST)


def test_code_block_with_invalid_path(client):
    """Test that a code block with no file path on the fence line is still parsed into a candidate."""
    # The path is empty; pathlib.Path("") is accepted and equals Path(".").
    optimizations = [{"source_code": "```\nprint('no path')```", "explanation": "No path", "optimization_id": "opt_10"}]
    # parse_markdown_code accepts the empty path, so one candidate is expected.
    codeflash_output = client._get_valid_candidates(optimizations, OptimizedCandidateSource.TEST)
    result = codeflash_output
    cs = result[0].source_code.code_strings[0]


def test_mixed_valid_and_invalid_candidates(client):
    """Test that only valid candidates are returned when some are invalid."""
    optimizations = [
        {"source_code": "This is not a code block.", "explanation": "Not valid", "optimization_id": "opt_11"},
        {"source_code": "```valid.py\nx=42```", "explanation": "Valid", "optimization_id": "opt_12"},
    ]
    codeflash_output = client._get_valid_candidates(optimizations, OptimizedCandidateSource.TEST)
    result = codeflash_output  # 17.0μs -> 13.4μs (27.1% faster)


def test_candidate_with_multiple_parent_id_types(client):
    """Test that parent_id can be None or a string."""
    optimizations = [
        {"source_code": "```foo.py\npass```", "explanation": "No parent", "optimization_id": "opt_13"},
        {
            "source_code": "```bar.py\npass```",
            "explanation": "Has parent",
            "optimization_id": "opt_14",
            "parent_id": "parent_14",
        },
    ]
    codeflash_output = client._get_valid_candidates(optimizations, OptimizedCandidateSource.TEST)
    result = codeflash_output  # 15.1μs -> 11.0μs (37.2% faster)


def test_code_block_with_newlines_in_path(client):
    """Test that a code block with a newline in the path is parsed correctly (should not match)."""
    # The regex expects the path to be on the first line, so a newline in the path breaks parsing.
    optimizations = [
        {"source_code": "```foo\nbar.py\nprint('bad')```", "explanation": "Bad path", "optimization_id": "opt_15"}
    ]
    codeflash_output = client._get_valid_candidates(optimizations, OptimizedCandidateSource.TEST)
    result = codeflash_output  # 10.6μs -> 8.06μs (31.5% faster)


def test_code_block_with_unicode_path_and_code(client):
    """Test that unicode characters in file path and code are handled."""
    optimizations = [
        {"source_code": "```üñîçødë.py\nprint('你好,世界')```", "explanation": "Unicode", "optimization_id": "opt_16"}
    ]
    codeflash_output = client._get_valid_candidates(optimizations, OptimizedCandidateSource.TEST)
    result = codeflash_output
    cs = result[0].source_code.code_strings[0]


# --- 3. LARGE SCALE TEST CASES ---


def test_large_number_of_candidates(client):
    """Test with a large number of valid candidates (performance and correctness)."""
    N = 500  # Keep under 1000 for test speed
    optimizations = [
        {"source_code": f"```file_{i}.py\nprint({i})```", "explanation": f"Prints {i}", "optimization_id": f"opt_{i}"}
        for i in range(N)
    ]
    codeflash_output = client._get_valid_candidates(optimizations, OptimizedCandidateSource.TEST)
    result = codeflash_output
    # Spot check a few
    for i in [0, N // 2, N - 1]:
        oc = result[i]
        cs = oc.source_code.code_strings[0]


def test_large_candidate_with_many_code_blocks(client):
    """Test a single optimization with many code blocks."""
    M = 300  # Keep under 1000 for test speed
    code_blocks = "".join(f"```block_{j}.py\nprint({j})```\n" for j in range(M))
    optimizations = [{"source_code": code_blocks, "explanation": "Many blocks", "optimization_id": "opt_many"}]
    codeflash_output = client._get_valid_candidates(optimizations, OptimizedCandidateSource.TEST)
    result = codeflash_output
    code_strings = result[0].source_code.code_strings
    # Spot check a few
    for j in [0, M // 2, M - 1]:
        cs = code_strings[j]


def test_large_mixed_valid_invalid_candidates(client):
    """Test a large list with a mix of valid and invalid candidates."""
    N = 200
    optimizations = []
    for i in range(N):
        if i % 2 == 0:
            # Valid
            optimizations.append(
                {"source_code": f"```good_{i}.py\nx={i}```", "explanation": f"Good {i}", "optimization_id": f"opt_{i}"}
            )
        else:
            # Invalid (no code block)
            optimizations.append(
                {"source_code": f"not a code block {i}", "explanation": f"Bad {i}", "optimization_id": f"opt_{i}"}
            )
    codeflash_output = client._get_valid_candidates(optimizations, OptimizedCandidateSource.TEST)
    result = codeflash_output  # 516μs -> 353μs (46.1% faster)
    for idx, oc in enumerate(result):
        even_i = idx * 2
        cs = oc.source_code.code_strings[0]


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
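The generated tests store each result in `codeflash_output` so that the original and optimized implementations can be compared on identical inputs. A minimal sketch of that kind of differential check follows; `outputs_match` and the two sample functions are hypothetical illustrations, not Codeflash's actual harness. It also treats "both raise the same exception type" as a match, mirroring the `pytest.raises(KeyError)` tests.

```python
from typing import Any, Callable, Iterable


def _run(fn: Callable[..., Any], args: tuple) -> tuple:
    # Capture either the return value or the exception type.
    try:
        return ("ok", fn(*args))
    except Exception as exc:
        return ("raised", type(exc))


def outputs_match(original: Callable[..., Any], optimized: Callable[..., Any], cases: Iterable[tuple]) -> bool:
    """Return True if both implementations behave identically on every case."""
    return all(_run(original, args) == _run(optimized, args) for args in cases)


# Hypothetical pair: same behavior, different construction strategy.
def squares_append(n: int) -> list[int]:
    out = []
    for i in range(n):
        out.append(i * i)
    return out


def squares_comprehension(n: int) -> list[int]:
    return [i * i for i in range(n)]


print(outputs_match(squares_append, squares_comprehension, [(0,), (5,), (100,)]))  # True
```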
from pathlib import Path

# imports
import pytest

from codeflash.api.aiservice import AiServiceClient

# --- Minimal stubs for dependencies to make the test self-contained ---


class CodeString:
    def __init__(self, code: str, file_path: Path):
        self.code = code
        self.file_path = file_path

    def __eq__(self, other):
        return isinstance(other, CodeString) and self.code == other.code and self.file_path == other.file_path


class OptimizedCandidateSource:
    # Just a stub for typing
    pass


class OptimizedCandidate:
    def __init__(self, source_code, explanation, optimization_id, source, parent_id=None):
        self.source_code = source_code
        self.explanation = explanation
        self.optimization_id = optimization_id
        self.source = source
        self.parent_id = parent_id

    def __eq__(self, other):
        return (
            isinstance(other, OptimizedCandidate)
            and self.source_code.code_strings == other.source_code.code_strings
            and self.explanation == other.explanation
            and self.optimization_id == other.optimization_id
            and self.source == other.source
            and self.parent_id == other.parent_id
        )


# --- Unit tests ---


@pytest.fixture
def client():
    return AiServiceClient()


@pytest.fixture
def source():
    return OptimizedCandidateSource()


# 1. Basic Test Cases


def test_single_valid_candidate(client, source):
    # One valid candidate with one valid code block
    optimizations_json = [
        {"source_code": "```foo.py\nprint('hello')```", "explanation": "prints hello", "optimization_id": "opt1"}
    ]
    codeflash_output = client._get_valid_candidates(optimizations_json, source)
    candidates = codeflash_output  # 12.4μs -> 9.88μs (25.4% faster)


def test_multiple_valid_candidates(client, source):
    # Two valid candidates
    optimizations_json = [
        {"source_code": "```foo.py\nprint('foo')```", "explanation": "foo", "optimization_id": "id1"},
        {
            "source_code": "```bar.py\nprint('bar')```",
            "explanation": "bar",
            "optimization_id": "id2",
            "parent_id": "parent1",
        },
    ]
    codeflash_output = client._get_valid_candidates(optimizations_json, source)
    candidates = codeflash_output  # 15.3μs -> 11.5μs (33.7% faster)


def test_candidate_with_multiple_code_blocks(client, source):
    # A candidate with multiple code blocks
    optimizations_json = [
        {
            "source_code": "```foo.py\nfoo()\n```\n```bar.py\nbar()\n```",
            "explanation": "two files",
            "optimization_id": "multi1",
        }
    ]
    codeflash_output = client._get_valid_candidates(optimizations_json, source)
    candidates = codeflash_output  # 10.7μs -> 8.19μs (30.8% faster)


# 2. Edge Test Cases


def test_empty_optimizations_json(client, source):
    # No candidates
    codeflash_output = client._get_valid_candidates([], source)
    candidates = codeflash_output  # 541ns -> 541ns (0.000% faster)


def test_invalid_code_block_returns_no_candidate(client, source):
    # Invalid code block (illegal file path character)
    optimizations_json = [
        {"source_code": "```foo?.py\nprint('bad')```", "explanation": "bad", "optimization_id": "bad1"}
    ]
    codeflash_output = client._get_valid_candidates(optimizations_json, source)
    candidates = codeflash_output  # 10.9μs -> 8.35μs (30.2% faster)


def test_mixed_valid_and_invalid_candidates(client, source):
    # One valid, one invalid
    optimizations_json = [
        {"source_code": "```foo.py\nprint('ok')```", "explanation": "ok", "optimization_id": "ok1"},
        {"source_code": "```foo?.py\nprint('bad')```", "explanation": "bad", "optimization_id": "bad1"},
    ]
    codeflash_output = client._get_valid_candidates(optimizations_json, source)
    candidates = codeflash_output  # 14.6μs -> 10.7μs (36.1% faster)


def test_candidate_with_no_code_blocks(client, source):
    # Code block missing (no code block syntax)
    optimizations_json = [{"source_code": "no code blocks here", "explanation": "none", "optimization_id": "none1"}]
    codeflash_output = client._get_valid_candidates(optimizations_json, source)
    candidates = codeflash_output  # 10.00μs -> 7.72μs (29.4% faster)


def test_candidate_with_empty_code_block(client, source):
    # Code block present but file path is empty
    optimizations_json = [
        {"source_code": "```\nprint('no file')```", "explanation": "empty file path", "optimization_id": "empty1"}
    ]
    codeflash_output = client._get_valid_candidates(optimizations_json, source)
    candidates = codeflash_output  # 10.1μs -> 7.57μs (33.1% faster)


def test_candidate_with_parent_id_none(client, source):
    # parent_id explicitly set to None
    optimizations_json = [
        {
            "source_code": "```foo.py\nprint('foo')```",
            "explanation": "foo",
            "optimization_id": "pid1",
            "parent_id": None,
        }
    ]
    codeflash_output = client._get_valid_candidates(optimizations_json, source)
    candidates = codeflash_output  # 10.2μs -> 7.69μs (32.4% faster)


def test_candidate_with_missing_explanation_and_id(client, source):
    # Should raise KeyError because explanation and optimization_id are required
    optimizations_json = [{"source_code": "```foo.py\nprint('foo')```"}]
    with pytest.raises(KeyError):
        client._get_valid_candidates(optimizations_json, source)


def test_candidate_with_special_characters_in_code(client, source):
    # Code block with special characters, but valid file path
    code = "def foo():\n    print('!@#$%^&*()_+|')"
    optimizations_json = [
        {"source_code": f"```foo.py\n{code}```", "explanation": "special chars", "optimization_id": "special1"}
    ]
    codeflash_output = client._get_valid_candidates(optimizations_json, source)
    candidates = codeflash_output  # 12.3μs -> 10.3μs (19.5% faster)


def test_candidate_with_duplicate_code_blocks(client, source):
    # Two identical code blocks in one candidate
    optimizations_json = [
        {
            "source_code": "```foo.py\nprint('foo')```\n```foo.py\nprint('foo')```",
            "explanation": "dupes",
            "optimization_id": "dupe1",
        }
    ]
    codeflash_output = client._get_valid_candidates(optimizations_json, source)
    candidates = codeflash_output  # 11.0μs -> 8.47μs (29.4% faster)


# 3. Large Scale Test Cases


def test_large_number_of_candidates(client, source):
    # 500 valid candidates
    N = 500
    optimizations_json = [
        {"source_code": f"```file{i}.py\nprint({i})```", "explanation": f"explanation {i}", "optimization_id": f"id{i}"}
        for i in range(N)
    ]
    codeflash_output = client._get_valid_candidates(optimizations_json, source)
    candidates = codeflash_output  # 1.26ms -> 865μs (46.1% faster)
    # Spot check a few
    for i in (0, N // 2, N - 1):
        pass


def test_large_number_of_code_blocks_per_candidate(client, source):
    # One candidate with 100 code blocks
    code_blocks = "\n".join(f"```file{i}.py\nprint({i})```" for i in range(100))
    optimizations_json = [{"source_code": code_blocks, "explanation": "100 blocks", "optimization_id": "largeblocks"}]
    codeflash_output = client._get_valid_candidates(optimizations_json, source)
    candidates = codeflash_output
    for i, cs in enumerate(candidates[0].source_code.code_strings):
        pass


def test_large_mixed_valid_and_invalid_candidates(client, source):
    # 500 candidates, every 10th is invalid
    N = 500
    optimizations_json = []
    for i in range(N):
        if i % 10 == 0:
            # invalid file path
            optimizations_json.append(
                {"source_code": f"```bad?.py\nprint({i})```", "explanation": f"bad {i}", "optimization_id": f"bad{i}"}
            )
        else:
            optimizations_json.append(
                {"source_code": f"```file{i}.py\nprint({i})```", "explanation": f"ok {i}", "optimization_id": f"ok{i}"}
            )
    codeflash_output = client._get_valid_candidates(optimizations_json, source)
    candidates = codeflash_output  # 1.27ms -> 863μs (47.3% faster)
    # Ensure no candidate has optimization_id starting with 'bad'
    for c in candidates:
        pass


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-pr945-2025-12-11T15.08.58` and push.

codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Dec 11, 2025
mohammedahmed18 merged commit 4dea247 into feat/feedback-loop-for-unmatched-test-results Dec 11, 2025
14 of 23 checks passed
codeflash-ai bot deleted the codeflash/optimize-pr945-2025-12-11T15.08.58 branch December 11, 2025 17:00