
Conversation

@mohammedahmed18
Contributor

@mohammedahmed18 mohammedahmed18 commented Nov 27, 2025

PR Type

Enhancement, Tests


Description

  • Add AI-driven code repair flow

  • Return rich test diff metadata

  • Parse pytest failures from stdout

  • Deduplicate candidate evaluation


Diagram Walkthrough

flowchart LR
  OPT["FunctionOptimizer"] -- "compare results" --> EQ["compare_test_results"]
  EQ -- "mismatch + diffs" --> REPAIR["AiServiceClient /code_repair"]
  REPAIR -- "new candidate" --> OPT
  PARSER["parse_test_failures_from_stdout"] -- "test_failures map" --> TR["TestResults"]

File Walkthrough

Relevant files

Enhancement (5 files)

aiservice.py: Add code repair request/response handling (+51/-1)
models.py: Introduce TestDiff and code repair request models (+69/-0)
function_optimizer.py: Integrate repair loop and candidate deduplication (+157/-35)
equivalence.py: Return match flag with detailed TestDiffs (+56/-26)
parse_test_output.py: Parse pytest failures into TestResults (+60/-0)

Tests (5 files)

test_codeflash_capture.py: Adapt tests to new compare API and add E2E repair scenario (+305/-6)
test_comparator.py: Update assertions for tuple return from comparator (+14/-7)
test_instrument_all_and_run.py: Replace boolean compares with (match, diffs) usage (+16/-8)
test_instrumentation_run_results_aiservice.py: Adjust to new comparator API and expectations (+6/-4)
test_pickle_patcher.py: Migrate to match/diffs comparator semantics (+4/-4)

@github-actions

github-actions bot commented Nov 27, 2025

PR Reviewer Guide 🔍

(Review updated until commit 79387c3)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

In compare_test_results, when original_test_result or cdd_test_result is None the function immediately returns (False, []). This short-circuit discards any diffs accumulated so far and may hide useful context, and it can leave the result inconsistent when some tests have already been processed. Consider appending a DID_PASS/TIMED_OUT diff or continuing to gather diffs before returning (a sketch follows the excerpt below).

# If helper function instance_state verification is not present, that's ok. continue
if (
    original_test_result.verification_type
    and original_test_result.verification_type == VerificationType.INIT_STATE_HELPER
    and cdd_test_result is None
):
    continue
if original_test_result is None or cdd_test_result is None:
    return False, []
did_all_timeout = did_all_timeout and original_test_result.timed_out
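
One way to address this (a minimal sketch, not the PR's implementation; the TestDiff fields are taken from their usage elsewhere in this review and other required fields may be omitted) is to record a DID_PASS diff for the one-sided invocation before returning, so the caller still sees which test diverged:

if original_test_result is None or cdd_test_result is None:
    # sketch only: record the one-sided result instead of silently returning (False, [])
    present = original_test_result or cdd_test_result
    test_diffs.append(
        TestDiff(
            scope=TestDiffScope.DID_PASS,
            original_value=str(original_test_result.did_pass) if original_test_result else "missing",
            candidate_value=str(cdd_test_result.did_pass) if cdd_test_result else "missing",
            test_src_code=present.id.get_src_code(present.file_name) if present else "",
            original_pass=bool(original_test_result and original_test_result.did_pass),
            candidate_pass=bool(cdd_test_result and cdd_test_result.did_pass),
        )
    )
    return False, test_diffs
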
Robustness

parse_test_failures_from_stdout relies on regex parsing of pytest stdout blocks, which can vary across pytest versions and plugins. The end-of-failures detection and header matching may be brittle; add guards for different formats or fall back to structured pytest output such as --junitxml or pytest-reportlog where available (a sketch follows the excerpt below).

def parse_test_failures_from_stdout(test_results: TestResults, stdout: str) -> TestResults:
    """Extract individual pytest test failures from stdout grouped by test case qualified name, and add them to the test results."""
    lines = stdout.splitlines()
    start = end = None

    for i, line in enumerate(lines):
        if FAILURES_HEADER_RE.search(line.strip()):
            start = i
            break

    if start is None:
        return test_results

    for j in range(start + 1, len(lines)):
        stripped = lines[j].strip()
        if "short test summary info" in stripped:
            end = j
            break
        # any new === section === block
        if stripped.startswith("=") and stripped.count("=") > 3:
            end = j
            break

    # If no clear "end", just grab the rest of the string
    if end is None:
        end = len(lines)

    failure_block = lines[start:end]

    failures: dict[str, str] = {}
    current_name = None
    current_lines: list[str] = []

    for line in failure_block:
        m = TEST_HEADER_RE.match(line.strip())
        if m:
            if current_name is not None:
                failures[current_name] = "".join(current_lines)

            current_name = m.group(1)
            current_lines = []
        elif current_name:
            current_lines.append(line + "\n")

    if current_name:
        failures[current_name] = "".join(current_lines)

    test_results.test_failures = failures
    return test_results
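
If stdout parsing proves too brittle, one possible fallback (a hypothetical sketch, not part of this PR) is to read failures from pytest's built-in --junitxml report, which is structured and stable across plugins; the helper name and the qualified-name construction below are assumptions:

import xml.etree.ElementTree as ET
from pathlib import Path


def parse_test_failures_from_junitxml(junitxml_path: Path) -> dict[str, str]:
    """Hypothetical fallback: collect failure text per test from a pytest --junitxml report."""
    failures: dict[str, str] = {}
    root = ET.parse(junitxml_path).getroot()
    for testcase in root.iter("testcase"):
        failure = testcase.find("failure")
        if failure is None:
            failure = testcase.find("error")
        if failure is None:
            continue
        # classname + name approximates the qualified test name used as the map key
        qualified_name = ".".join(p for p in (testcase.get("classname"), testcase.get("name")) if p)
        failures[qualified_name] = (failure.get("message") or "") + "\n" + (failure.text or "")
    return failures
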
State Consistency

ast_code_to_id is a mutable instance attribute used across candidate runs and recursive code-repair calls; ensure it is correctly reset on each determine_best_candidate cycle and stays consistent after code_repair recursion, to avoid stale mappings or mismatched optimization_ids (a defensive sketch follows the excerpt below).

logger.info(
    f"Determining best optimization candidate (out of {len(candidates)}) for "
    f"{self.function_to_optimize.qualified_name}…"
)
console.rule()

future_all_refinements: list[concurrent.futures.Future] = []
self.ast_code_to_id.clear()
valid_optimizations = []
optimizations_post = {}  # we need to overwrite some opt candidates' code strings as they are no longer evaluated, instead their shorter/longer versions might be evaluated

# Start a new thread for AI service request
ai_service_client = self.aiservice_client if exp_type == "EXP0" else self.local_aiservice_client
future_line_profile_results = self.executor.submit(
    ai_service_client.optimize_python_code_line_profiler,
    source_code=code_context.read_writable_code.markdown,
    dependency_code=code_context.read_only_context_code,
    trace_id=self.function_trace_id[:-4] + exp_type if self.experiment_id else self.function_trace_id,
    line_profiler_results=original_code_baseline.line_profile_results["str_out"],
    num_candidates=N_CANDIDATES_LP_EFFECTIVE,
    experiment_metadata=ExperimentMetadata(
        id=self.experiment_id, group="control" if exp_type == "EXP0" else "experiment"
    )
    if self.experiment_id
    else None,
)

# Initialize candidate processor
processor = CandidateProcessor(candidates, future_line_profile_results, future_all_refinements)
candidate_index = 0

# Process candidates using queue-based approach
while not processor.is_done():
    candidate = processor.get_next_candidate()
    if candidate is None:
        logger.debug("everything done, exiting")
        break

    try:
        candidate_index += 1
        get_run_tmp_file(Path(f"test_return_values_{candidate_index}.bin")).unlink(missing_ok=True)
        get_run_tmp_file(Path(f"test_return_values_{candidate_index}.sqlite")).unlink(missing_ok=True)
        logger.info(f"h3|Optimization candidate {candidate_index}/{processor.candidate_len}:")
        code_print(
            candidate.source_code.flat,
            file_name=f"candidate_{candidate_index}.py",
            lsp_message_id=LSPMessageId.CANDIDATE.value,
        )
        # map ast normalized code to diff len, unnormalized code
        # map opt id to the shortest unnormalized code
        try:
            did_update = self.replace_function_and_helpers_with_optimized_code(
                code_context=code_context,
                optimized_code=candidate.source_code,
                original_helper_code=original_helper_code,
            )
            if not did_update:
                logger.warning(
                    "force_lsp|No functions were replaced in the optimized code. Skipping optimization candidate."
                )
                console.rule()
                continue
        except (ValueError, SyntaxError, cst.ParserSyntaxError, AttributeError) as e:
            logger.error(e)
            self.write_code_and_helpers(
                self.function_to_optimize_source_code, original_helper_code, self.function_to_optimize.file_path
            )
            continue
        # check if this code has been evaluated before by checking the ast normalized code string
        normalized_code = normalize_code(candidate.source_code.flat.strip())
        if self.was_candidate_tested_before(normalized_code):
            self.update_results_for_duplicate_candidate(
                candidate=candidate,
                code_context=code_context,
                normalized_code=normalized_code,
                speedup_ratios=speedup_ratios,
                is_correct=is_correct,
                optimized_runtimes=optimized_runtimes,
                optimized_line_profiler_results=optimized_line_profiler_results,
                optimizations_post=optimizations_post,
            )
            continue
        self.ast_code_to_id[normalized_code] = {
            "optimization_id": candidate.optimization_id,
            "shorter_source_code": candidate.source_code,
            "diff_len": diff_length(candidate.source_code.flat, code_context.read_writable_code.flat),
        }
        run_results, new_candidate = self.run_optimized_candidate(
            optimization_candidate_index=candidate_index,
            baseline_results=original_code_baseline,
            original_helper_code=original_helper_code,
            file_path_to_helper_classes=file_path_to_helper_classes,
            code_context=code_context,
            candidate=candidate,
            exp_type=exp_type,
        )
        if candidate.optimization_id != new_candidate.optimization_id:
            # override the candidate if the optimization_id has changed, this may happen if the candidate was modified by the code-repair
            candidate = new_candidate

        console.rule()
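
A defensive pattern for the recursion concern above (a hypothetical sketch, not code from this PR; attempt_code_repair is an invented name standing in for the repair call): snapshot the mapping before the repair attempt and restore it if the repaired candidate is rejected, so entries added for a failed repair cannot shadow later candidates.

import copy

# Hypothetical guard: keep ast_code_to_id consistent across a recursive repair attempt.
snapshot = copy.deepcopy(self.ast_code_to_id)
repaired_candidate = self.attempt_code_repair(candidate, diffs)  # invented helper name
if repaired_candidate is None:
    # the repair produced nothing usable; drop any mappings added during the attempt
    self.ast_code_to_id = snapshot
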

@github-actions

github-actions bot commented Nov 27, 2025

PR Code Suggestions ✨

Latest suggestions up to 79387c3
Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Create independent diff records

Avoid reusing a single TestDiff instance for multiple mismatch scopes; it causes
mixed data if multiple fields differ. Create and append a fresh TestDiff per
mismatch, ensuring each recorded diff is independent and accurate.

codeflash/verification/equivalence.py [65-114]

 test_src_code = original_test_result.id.get_src_code(original_test_result.file_name)
-test_diff = TestDiff(
-    scope=TestDiffScope.RETURN_VALUE,
-    original_value=f"{original_test_result.return_value!r}",
-    candidate_value=f"{cdd_test_result.return_value!r}",
-    test_src_code=test_src_code,
-    candidate_pytest_error=cdd_pytest_error,
-    original_pass=original_test_result.did_pass,
-    candidate_pass=cdd_test_result.did_pass,
-    original_pytest_error=original_pytest_error,
-)
+
+# Return value diff
 if not comparator(original_test_result.return_value, cdd_test_result.return_value, superset_obj=superset_obj):
-    test_diff.scope = TestDiffScope.RETURN_VALUE
-    test_diffs.append(test_diff)
+    test_diffs.append(
+        TestDiff(
+            scope=TestDiffScope.RETURN_VALUE,
+            original_value=f"{original_test_result.return_value!r}",
+            candidate_value=f"{cdd_test_result.return_value!r}",
+            test_src_code=test_src_code,
+            candidate_pytest_error=cdd_pytest_error,
+            original_pass=original_test_result.did_pass,
+            candidate_pass=cdd_test_result.did_pass,
+            original_pytest_error=original_pytest_error,
+        )
+    )
 
-...
+# Stdout diff
 if (original_test_result.stdout and cdd_test_result.stdout) and not comparator(
     original_test_result.stdout, cdd_test_result.stdout
 ):
-    test_diff.scope = TestDiffScope.STDOUT
-    test_diff.original_value = str(original_test_result.stdout)
-    test_diff.candidate_value = str(cdd_test_result.stdout)
-    test_diffs.append(test_diff)
+    test_diffs.append(
+        TestDiff(
+            scope=TestDiffScope.STDOUT,
+            original_value=str(original_test_result.stdout),
+            candidate_value=str(cdd_test_result.stdout),
+            test_src_code=test_src_code,
+            candidate_pytest_error=cdd_pytest_error,
+            original_pass=original_test_result.did_pass,
+            candidate_pass=cdd_test_result.did_pass,
+            original_pytest_error=original_pytest_error,
+        )
+    )
 
+# Did-pass diff
 if original_test_result.test_type in {
     TestType.EXISTING_UNIT_TEST,
     TestType.CONCOLIC_COVERAGE_TEST,
     TestType.GENERATED_REGRESSION,
     TestType.REPLAY_TEST,
 } and (cdd_test_result.did_pass != original_test_result.did_pass):
-    test_diff.scope = TestDiffScope.DID_PASS
-    test_diff.original_value = str(original_test_result.did_pass)
-    test_diff.candidate_value = str(cdd_test_result.did_pass)
-    test_diffs.append(test_diff)
+    test_diffs.append(
+        TestDiff(
+            scope=TestDiffScope.DID_PASS,
+            original_value=str(original_test_result.did_pass),
+            candidate_value=str(cdd_test_result.did_pass),
+            test_src_code=test_src_code,
+            candidate_pytest_error=cdd_pytest_error,
+            original_pass=original_test_result.did_pass,
+            candidate_pass=cdd_test_result.did_pass,
+            original_pytest_error=original_pytest_error,
+        )
+    )
Suggestion importance[1-10]: 8


Why: The current code mutates and reuses a single TestDiff object across multiple scopes, risking mixed data; creating separate instances per mismatch is correct and significantly improves accuracy of reported diffs.

Impact: Medium
Prevent None access in comparisons

Guard against original_test_result being None before accessing its fields. When a
test id exists only on one side, the function currently short-circuits, but these
lines can still execute earlier and raise AttributeError. Add a defensive check to
only compute failure lookups when both results exist.

codeflash/verification/equivalence.py [31-42]

 candidate_test_failures = candidate_results.test_failures
 original_test_failures = original_results.test_failures
-cdd_pytest_error = (
-    candidate_test_failures.get(original_test_result.id.test_fn_qualified_name(), "")
-    if candidate_test_failures
-    else ""
-)
-original_pytest_error = (
-    original_test_failures.get(original_test_result.id.test_fn_qualified_name(), "")
-    if original_test_failures
-    else ""
-)
 
+cdd_pytest_error = ""
+original_pytest_error = ""
+
+if original_test_result is not None:
+    test_name = original_test_result.id.test_fn_qualified_name()
+    if candidate_test_failures:
+        cdd_pytest_error = candidate_test_failures.get(test_name, "")
+    if original_test_failures:
+        original_pytest_error = original_test_failures.get(test_name, "")
+
Suggestion importance[1-10]: 7


Why: This guards against accessing original_test_result.id when original_test_result could be None, which would raise an AttributeError. The change is accurate and low-risk, improving robustness without altering logic.

Impact: Medium
General
Fix mismatch ratio calculation

Use the count of compared test invocations, not the length of the TestResults
container, to compute the unmatched percentage. len(candidate_behavior_results) may
not reflect the number of tests compared and can be zero-dividing or misleading;
base the denominator on the number of unique ids or on len(diffs) + matched_count.

codeflash/optimization/function_optimizer.py [1869-1876]

-result_unmatched_perc = len(diffs) / len(candidate_behavior_results)
+total_compared = len(candidate_behavior_results.get_all_unique_invocation_loop_ids())
+result_unmatched_perc = (len(diffs) / total_compared) if total_compared else 1.0
Suggestion importance[1-10]: 6


Why: Using the count of unique invocation ids avoids misleading denominators and division by zero; the proposal is reasonable and improves correctness, though its impact is moderate within the broader flow.

Impact: Low

Previous suggestions

Suggestions up to commit 5830a70
Category | Suggestion | Impact
Possible issue
Safely access optional mapping

Guard access to test_failures since it can be None and avoid AttributeError. Also
handle missing keys safely to keep comparison robust when no failures were parsed.

codeflash/verification/equivalence.py [43]

-candidate_pytest_error = candidate_results.test_failures.get(original_test_result.id.test_function_name)
+candidate_pytest_error = None
+if getattr(candidate_results, "test_failures", None):
+    candidate_pytest_error = candidate_results.test_failures.get(original_test_result.id.test_function_name)
Suggestion importance[1-10]: 8


Why: test_failures is declared Optional in TestResults, so direct .get can raise if None; guarding prevents an AttributeError and aligns with new parsing logic.

Impact: Medium
Ensure recursion limit restoration

Preserve the recursion limit restoration even on early returns to avoid leaving the
process with a higher limit. Move recursion limit increase before any early return
or ensure restoration in all paths.

codeflash/verification/equivalence.py [30-35]

 if len(original_results) == 0 or len(candidate_results) == 0:
-    return False, []  # empty test results are not equal
+    return False, []
+original_recursion_limit = sys.getrecursionlimit()
+try:
+    if original_recursion_limit < INCREASED_RECURSION_LIMIT:
+        sys.setrecursionlimit(INCREASED_RECURSION_LIMIT)
+    # ... rest of the function body unchanged ...
+finally:
+    sys.setrecursionlimit(original_recursion_limit)
Suggestion importance[1-10]: 6


Why: Early return before saving/restoring the recursion limit can skip restoration if that logic ever moves; wrapping with try/finally improves robustness though current early return happens before any change.

Impact: Low
General
Use logger instead of print

Replace print with the existing logger to keep consistent output handling and avoid
noisy stdout in library code. Log the exception with traceback for better
diagnostics.

codeflash/verification/equivalence.py [76-87]

 try:
-    print(
-        f"File Name: {original_test_result.file_name}\n"
-        f"Test Type: {original_test_result.test_type}\n"
-        f"Verification Type: {original_test_result.verification_type}\n"
-        f"Invocation ID: {original_test_result.id}\n"
-        f"Original return value: {original_test_result.return_value}\n"
-        f"Candidate return value: {cdd_test_result.return_value}\n"
+    logger.debug(
+        "File Name: %s\nTest Type: %s\nVerification Type: %s\nInvocation ID: %s\nOriginal return value: %r\nCandidate return value: %r",
+        original_test_result.file_name,
+        original_test_result.test_type,
+        original_test_result.verification_type,
+        original_test_result.id,
+        original_test_result.return_value,
+        cdd_test_result.return_value,
     )
-except Exception as e:
-    logger.error(e)
+except Exception:
+    logger.exception("Failed to log return value comparison details")
 break
Suggestion importance[1-10]: 7


Why: Replacing print with logger.debug/exception keeps output consistent and avoids noisy stdout; the improved code accurately mirrors the existing block’s intent with better diagnostics.

Impact: Medium

@mohammedahmed18 mohammedahmed18 marked this pull request as draft November 27, 2025 14:27
Comment on lines 338 to 343
x = prev[index1]
y = prev[index1 + 1]
z = curr[index1]
min_xy = min(x, y)
min_xyz = min(z, min_xy)
curr[index1 + 1] = 1 + min_xyz

⚡️Codeflash found 73% (0.73x) speedup for levenshtein_distance in codeflash/discovery/functions_to_optimize.py

⏱️ Runtime: 2.04 seconds → 1.18 seconds (best of 8 runs)

📝 Explanation and details

The optimized version achieves a 73% speedup by eliminating Python's built-in min() function calls and replacing them with direct comparisons. This is a targeted micro-optimization that addresses one of the most expensive operations in the Levenshtein distance algorithm.

Key optimization:

  • Replaced min() calls with direct comparisons: The original code used min(x, y) and min(z, min_xy) which create temporary tuples and invoke Python's generic minimum function. The optimized version uses nested if statements to find the minimum value directly, avoiding function call overhead and tuple creation.

Why this provides a speedup:

  • The min() function in Python has significant overhead for small numbers of arguments, especially when called millions of times in nested loops
  • Direct comparisons (if x < y) are primitive operations that execute much faster than function calls
  • Eliminates temporary tuple creation that min() uses internally
  • Reduces the call stack depth in the inner loop

Performance impact by test case type:

  • Identical/similar strings: 55-65% faster - benefits from reduced overhead in character matching paths
  • Completely different strings: 109-121% faster - maximizes benefit since every character comparison triggers the min() replacement logic
  • Large strings with many differences: 83-93% faster - compounds the per-operation savings across many iterations
  • Small strings: 15-50% faster - still benefits but overhead reduction is less pronounced

The optimization is particularly effective for the Levenshtein algorithm because the min() operation occurs in the innermost loop that executes O(n×m) times, making even small per-call improvements significant when multiplied across all iterations.

Correctness verification report:

⚙️ Existing Unit Tests: 🔘 None Found
🌀 Generated Regression Tests: 148 Passed
⏪ Replay Tests: 🔘 None Found
🔎 Concolic Coverage Tests: 🔘 None Found
📊 Tests Coverage: 96.6%
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

# imports
import pytest  # used for our unit tests

from codeflash.discovery.functions_to_optimize import levenshtein_distance

# unit tests

# 1. Basic Test Cases


def test_identical_strings():
    # Levenshtein distance between identical strings should be 0
    codeflash_output = levenshtein_distance("kitten", "kitten")  # 15.8μs -> 9.90μs (60.0% faster)
    codeflash_output = levenshtein_distance("", "")  # 480ns -> 441ns (8.84% faster)
    codeflash_output = levenshtein_distance("a", "a")  # 2.09μs -> 1.95μs (7.22% faster)


def test_single_insertion():
    # Inserting one character
    codeflash_output = levenshtein_distance("kitten", "kitte")  # 13.0μs -> 8.69μs (49.4% faster)
    codeflash_output = levenshtein_distance("kitte", "kitten")  # 10.1μs -> 6.12μs (65.6% faster)
    codeflash_output = levenshtein_distance("", "a")  # 421ns -> 421ns (0.000% faster)
    codeflash_output = levenshtein_distance("a", "")  # 360ns -> 361ns (0.277% slower)


def test_single_deletion():
    # Deleting one character
    codeflash_output = levenshtein_distance("kitten", "kittn")  # 12.6μs -> 8.52μs (48.5% faster)
    codeflash_output = levenshtein_distance("kittn", "kitten")  # 10.0μs -> 5.96μs (67.9% faster)


def test_single_substitution():
    # Substituting one character
    codeflash_output = levenshtein_distance("kitten", "sitten")  # 14.8μs -> 9.26μs (60.2% faster)
    codeflash_output = levenshtein_distance("kitten", "kitteb")  # 11.7μs -> 6.81μs (72.4% faster)
    codeflash_output = levenshtein_distance("a", "b")  # 2.22μs -> 1.89μs (17.5% faster)


def test_multiple_operations():
    # Multiple edits required
    codeflash_output = levenshtein_distance("kitten", "sitting")  # 16.4μs -> 10.3μs (58.8% faster)
    codeflash_output = levenshtein_distance("flaw", "lawn")  # 6.70μs -> 4.47μs (50.0% faster)


def test_empty_and_nonempty():
    # One string empty, one non-empty
    codeflash_output = levenshtein_distance("", "abc")  # 751ns -> 751ns (0.000% faster)
    codeflash_output = levenshtein_distance("abc", "")  # 431ns -> 451ns (4.43% slower)


# 2. Edge Test Cases


def test_both_empty():
    # Both strings are empty
    codeflash_output = levenshtein_distance("", "")  # 781ns -> 761ns (2.63% faster)


def test_one_char_vs_empty():
    # One string is a single character, other is empty
    codeflash_output = levenshtein_distance("a", "")  # 771ns -> 781ns (1.28% slower)
    codeflash_output = levenshtein_distance("", "z")  # 431ns -> 441ns (2.27% slower)


def test_case_sensitivity():
    # Case should matter
    codeflash_output = levenshtein_distance("abc", "Abc")  # 7.70μs -> 5.87μs (31.1% faster)
    codeflash_output = levenshtein_distance("ABC", "abc")  # 5.14μs -> 3.73μs (37.9% faster)


def test_unicode_characters():
    # Unicode characters
    codeflash_output = levenshtein_distance("café", "cafe")  # 9.39μs -> 6.81μs (37.8% faster)
    codeflash_output = levenshtein_distance("naïve", "naive")  # 9.85μs -> 5.75μs (71.3% faster)
    codeflash_output = levenshtein_distance("你好", "你")  # 3.12μs -> 2.81μs (10.7% faster)
    codeflash_output = levenshtein_distance("你好", "您好")  # 3.10μs -> 2.71μs (14.5% faster)


def test_completely_different_strings():
    # No characters in common
    codeflash_output = levenshtein_distance("abc", "xyz")  # 7.45μs -> 5.61μs (32.9% faster)
    codeflash_output = levenshtein_distance("123", "abc")  # 5.14μs -> 3.46μs (48.7% faster)


def test_prefix_and_suffix():
    # One string is a prefix or suffix of the other
    codeflash_output = levenshtein_distance("abc", "abcd")  # 7.88μs -> 6.11μs (29.0% faster)
    codeflash_output = levenshtein_distance("abcd", "abc")  # 5.18μs -> 3.78μs (37.1% faster)
    codeflash_output = levenshtein_distance("abc", "zabc")  # 5.23μs -> 3.41μs (53.6% faster)
    codeflash_output = levenshtein_distance("abc", "abcz")  # 4.87μs -> 3.19μs (52.8% faster)


def test_repeated_characters():
    # Strings with repeated characters
    codeflash_output = levenshtein_distance("aaa", "aaaa")  # 4.89μs -> 4.79μs (2.11% faster)
    codeflash_output = levenshtein_distance("aaaa", "aaa")  # 2.92μs -> 3.06μs (4.89% slower)
    codeflash_output = levenshtein_distance("aaa", "bbb")  # 5.54μs -> 3.56μs (55.7% faster)


def test_numbers_and_symbols():
    # Strings with digits and symbols
    codeflash_output = levenshtein_distance("1234", "1243")  # 8.68μs -> 6.73μs (28.9% faster)
    codeflash_output = levenshtein_distance("!@#$", "!@#")  # 5.76μs -> 4.13μs (39.6% faster)
    codeflash_output = levenshtein_distance("!@#$", "$#@!")  # 6.25μs -> 4.45μs (40.5% faster)


def test_long_identical_strings():
    # Long identical strings (edge, but also performance)
    s = "a" * 100
    codeflash_output = levenshtein_distance(s, s)  # 519μs -> 535μs (2.86% slower)


def test_long_strings_one_difference():
    # Long strings with one difference at the end
    s1 = "a" * 999 + "b"
    s2 = "a" * 1000
    codeflash_output = levenshtein_distance(s1, s2)  # 60.1ms -> 59.3ms (1.27% faster)
    codeflash_output = levenshtein_distance(s2, s1)  # 60.3ms -> 59.7ms (1.11% faster)


def test_long_strings_completely_different():
    # Long completely different strings
    s1 = "a" * 500
    s2 = "b" * 500
    codeflash_output = levenshtein_distance(s1, s2)  # 67.1ms -> 30.4ms (121% faster)


# 3. Large Scale Test Cases


def test_large_equal_strings():
    # Large identical strings
    s = "abcde" * 200  # length 1000
    codeflash_output = levenshtein_distance(s, s)  # 242ms -> 114ms (111% faster)


def test_large_one_insertion():
    # Large string with one insertion
    s1 = "a" * 500 + "b" + "a" * 499  # length 1000
    s2 = "a" * 1000
    codeflash_output = levenshtein_distance(s1, s2)  # 58.2ms -> 56.2ms (3.59% faster)


def test_large_one_substitution():
    # Large string with one substitution in the middle
    s1 = "a" * 499 + "b" + "a" * 500
    s2 = "a" * 1000
    codeflash_output = levenshtein_distance(s1, s2)  # 57.9ms -> 57.2ms (1.16% faster)


def test_large_completely_different():
    # Large strings, all substitutions
    s1 = "a" * 1000
    s2 = "b" * 1000
    codeflash_output = levenshtein_distance(s1, s2)  # 274ms -> 129ms (112% faster)


def test_large_half_and_half():
    # Half the string is the same, half is different
    s1 = "a" * 500 + "b" * 500
    s2 = "a" * 1000
    codeflash_output = levenshtein_distance(s1, s2)  # 171ms -> 93.5ms (83.5% faster)


def test_large_with_unicode():
    # Large string with unicode characters
    s1 = "你" * 500 + "好" * 500
    s2 = "你" * 1000
    codeflash_output = levenshtein_distance(s1, s2)  # 174ms -> 96.3ms (81.0% faster)


# 4. Additional Robustness Cases


@pytest.mark.parametrize(
    "s1,s2,expected",
    [
        ("", "", 0),
        ("", "abc", 3),
        ("abc", "", 3),
        ("abc", "abc", 0),
        ("abc", "ab", 1),
        ("a", "b", 1),
        ("", "a", 1),
        ("a", "", 1),
        ("kitten", "sitting", 3),
        ("flaw", "lawn", 2),
        ("intention", "execution", 5),
        ("distance", "difference", 5),
        ("abcdef", "azced", 3),
        ("short", "ports", 3),
    ],
)
def test_various_cases(s1, s2, expected):
    # Parametrized test for various scenarios
    codeflash_output = levenshtein_distance(s1, s2)  # 130μs -> 85.5μs (52.5% faster)


# 5. Commutativity property (Levenshtein distance is symmetric)
def test_commutativity():
    pairs = [
        ("kitten", "sitting"),
        ("flaw", "lawn"),
        ("abc", "xyz"),
        ("", "abc"),
        ("a" * 500, "b" * 500),
        ("abcde" * 100, "edcba" * 100),
    ]
    for s1, s2 in pairs:
        codeflash_output = levenshtein_distance(s1, s2)
        d1 = codeflash_output  # 126ms -> 58.6ms (116% faster)
        codeflash_output = levenshtein_distance(s2, s1)
        d2 = codeflash_output  # 126ms -> 58.8ms (115% faster)


# 6. Triangle inequality property
def test_triangle_inequality():
    # For Levenshtein distance, d(x,z) <= d(x,y) + d(y,z)
    triples = [("kitten", "sitting", "sittin"), ("abc", "abd", "ab"), ("a" * 100, "a" * 99 + "b", "a" * 99 + "c")]
    for x, y, z in triples:
        codeflash_output = levenshtein_distance(x, z)
        d_xz = codeflash_output  # 557μs -> 537μs (3.89% faster)
        codeflash_output = levenshtein_distance(x, y)
        d_xy = codeflash_output  # 553μs -> 532μs (3.98% faster)
        codeflash_output = levenshtein_distance(y, z)
        d_yz = codeflash_output


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from __future__ import annotations

# imports
import pytest  # used for our unit tests

from codeflash.discovery.functions_to_optimize import levenshtein_distance

# unit tests


# 1. Basic Test Cases
def test_identical_strings():
    # Identical strings should have distance 0
    codeflash_output = levenshtein_distance("kitten", "kitten")  # 14.4μs -> 9.29μs (55.1% faster)
    codeflash_output = levenshtein_distance("", "")  # 611ns -> 521ns (17.3% faster)
    codeflash_output = levenshtein_distance("a", "a")  # 2.03μs -> 1.98μs (2.52% faster)


def test_single_insertion():
    # One insertion required
    codeflash_output = levenshtein_distance("kitten", "kittena")  # 16.1μs -> 9.74μs (65.7% faster)
    codeflash_output = levenshtein_distance("abc", "abcd")  # 5.73μs -> 3.86μs (48.6% faster)


def test_single_deletion():
    # One deletion required
    codeflash_output = levenshtein_distance("kitten", "kittn")  # 12.9μs -> 8.69μs (49.0% faster)
    codeflash_output = levenshtein_distance("abcd", "abc")  # 5.71μs -> 4.03μs (41.8% faster)


def test_single_substitution():
    # One substitution required
    codeflash_output = levenshtein_distance("kitten", "kittan")  # 14.5μs -> 9.22μs (57.3% faster)
    codeflash_output = levenshtein_distance("abc", "adc")  # 4.67μs -> 3.47μs (34.7% faster)


def test_multiple_operations():
    # Multiple operations needed
    codeflash_output = levenshtein_distance("kitten", "sitting")  # 16.6μs -> 10.1μs (65.1% faster)
    codeflash_output = levenshtein_distance("flaw", "lawn")  # 6.70μs -> 4.50μs (49.0% faster)
    codeflash_output = levenshtein_distance("gumbo", "gambol")  # 10.7μs -> 6.22μs (72.6% faster)


def test_case_sensitivity():
    # Should be case-sensitive
    codeflash_output = levenshtein_distance("a", "A")  # 4.12μs -> 3.55μs (16.1% faster)
    codeflash_output = levenshtein_distance("Python", "python")  # 13.1μs -> 7.71μs (69.8% faster)


def test_completely_different_strings():
    # All characters different
    codeflash_output = levenshtein_distance("abc", "xyz")  # 7.57μs -> 5.60μs (35.2% faster)
    codeflash_output = levenshtein_distance("aaa", "bbb")  # 4.95μs -> 3.26μs (52.0% faster)


# 2. Edge Test Cases


def test_empty_strings():
    # One or both strings empty
    codeflash_output = levenshtein_distance("", "abc")  # 822ns -> 751ns (9.45% faster)
    codeflash_output = levenshtein_distance("abc", "")  # 441ns -> 460ns (4.13% slower)
    codeflash_output = levenshtein_distance("", "")  # 290ns -> 321ns (9.66% slower)


def test_one_character_strings():
    # Single character to/from empty or another char
    codeflash_output = levenshtein_distance("a", "")  # 742ns -> 771ns (3.76% slower)
    codeflash_output = levenshtein_distance("", "a")  # 431ns -> 411ns (4.87% faster)
    codeflash_output = levenshtein_distance("a", "b")  # 3.80μs -> 3.29μs (15.5% faster)


def test_unicode_strings():
    # Unicode and multi-byte characters
    codeflash_output = levenshtein_distance("café", "cafe")  # 9.28μs -> 6.86μs (35.2% faster)
    codeflash_output = levenshtein_distance("你好", "你们好")  # 4.51μs -> 3.69μs (22.3% faster)
    codeflash_output = levenshtein_distance("🙂", "🙃")  # 2.33μs -> 2.08μs (12.0% faster)
    codeflash_output = levenshtein_distance("a🙂b", "a🙃b")  # 4.81μs -> 3.54μs (36.0% faster)


def test_whitespace_and_special_chars():
    # Strings with whitespace and special characters
    codeflash_output = levenshtein_distance("a b", "ab")  # 6.26μs -> 5.17μs (21.1% faster)
    codeflash_output = levenshtein_distance("a_b", "a-b")  # 5.12μs -> 3.48μs (47.3% faster)
    codeflash_output = levenshtein_distance("hello!", "hello")  # 10.1μs -> 5.99μs (68.2% faster)


def test_long_repeated_chars():
    # Strings with repeated characters
    codeflash_output = levenshtein_distance("aaaaa", "aaaa")  # 5.47μs -> 5.39μs (1.48% faster)
    codeflash_output = levenshtein_distance("aaaaa", "bbbbb")  # 10.9μs -> 6.39μs (71.0% faster)


def test_palindromes_and_reverses():
    # Palindrome and reversed strings
    codeflash_output = levenshtein_distance("abcde", "edcba")  # 11.9μs -> 7.68μs (54.8% faster)


def test_large_difference_in_length():
    # One string much longer than the other
    codeflash_output = levenshtein_distance("a", "a" * 100)  # 25.4μs -> 25.7μs (1.09% slower)
    codeflash_output = levenshtein_distance("b" * 100, "b")  # 23.3μs -> 23.4μs (0.474% slower)


def test_strings_with_numbers():
    # Strings with numbers
    codeflash_output = levenshtein_distance("abc123", "abc124")  # 14.5μs -> 9.02μs (60.9% faster)
    codeflash_output = levenshtein_distance("12345", "54321")  # 9.13μs -> 5.82μs (56.8% faster)


# 3. Large Scale Test Cases


def test_large_identical_strings():
    # Large identical strings should have distance 0
    s = "a" * 500
    codeflash_output = levenshtein_distance(s, s)  # 13.9ms -> 13.5ms (2.37% faster)


def test_large_one_insertion():
    # Large string with one insertion
    s1 = "a" * 499
    s2 = "a" * 250 + "b" + "a" * 249
    codeflash_output = levenshtein_distance(s1, s2)  # 13.8ms -> 13.6ms (1.61% faster)


def test_large_one_deletion():
    # Large string with one deletion
    s1 = "a" * 500
    s2 = "a" * 499
    codeflash_output = levenshtein_distance(s1, s2)  # 13.7ms -> 13.5ms (1.69% faster)


def test_large_one_substitution():
    # Large string with one substitution in the middle
    s1 = "a" * 250 + "b" + "a" * 249
    s2 = "a" * 500
    codeflash_output = levenshtein_distance(s1, s2)  # 13.9ms -> 13.5ms (2.27% faster)


def test_large_completely_different():
    # Large strings, all characters different
    s1 = "a" * 500
    s2 = "b" * 500
    codeflash_output = levenshtein_distance(s1, s2)  # 67.2ms -> 30.7ms (119% faster)


def test_large_partial_overlap():
    # Large strings with partial overlap
    s1 = "a" * 250 + "b" * 250
    s2 = "a" * 200 + "b" * 300
    # 50 a's replaced with b's
    codeflash_output = levenshtein_distance(s1, s2)  # 41.7ms -> 21.7ms (92.6% faster)


def test_large_strings_with_unicode():
    # Large strings with unicode characters
    s1 = "é" * 500
    s2 = "e" * 500
    codeflash_output = levenshtein_distance(s1, s2)  # 67.2ms -> 30.4ms (121% faster)


def test_large_strings_with_alternating_chars():
    # Alternating characters
    s1 = "ab" * 250
    s2 = "ba" * 250
    # Each position is different except for the middle if even length
    codeflash_output = levenshtein_distance(s1, s2)  # 41.5ms -> 21.5ms (92.9% faster)


# 4. Additional Edge Cases


def test_nonequivalent_lengths_and_content():
    # Both length and content differ
    codeflash_output = levenshtein_distance("abcdefg", "xyz")  # 12.9μs -> 8.40μs (53.8% faster)


def test_substring():
    # One string is a substring of the other
    codeflash_output = levenshtein_distance("abcdef", "abc")  # 9.93μs -> 7.42μs (33.7% faster)
    codeflash_output = levenshtein_distance("abc", "abcdef")  # 7.66μs -> 4.98μs (53.7% faster)


def test_strings_with_tabs_and_newlines():
    # Special whitespace characters
    codeflash_output = levenshtein_distance("abc\tdef", "abcdef")  # 16.8μs -> 10.3μs (62.8% faster)
    codeflash_output = levenshtein_distance("abc\ndef", "abcdef")  # 13.7μs -> 7.80μs (76.0% faster)


def test_zero_length_and_long_string():
    # One empty, one long
    codeflash_output = levenshtein_distance("", "a" * 999)  # 912ns -> 811ns (12.5% faster)
    codeflash_output = levenshtein_distance("b" * 999, "")  # 631ns -> 541ns (16.6% faster)


# 5. Determinism and Symmetry


@pytest.mark.parametrize(
    "s1,s2",
    [
        ("kitten", "sitting"),
        ("flaw", "lawn"),
        ("", "abc"),
        ("abc", ""),
        ("abc", "cba"),
        ("abc", "abc"),
        ("", ""),
        ("a", "b"),
        ("abc123", "abc124"),
        ("a" * 500, "a" * 500),
    ],
)
def test_symmetry(s1, s2):
    # Levenshtein distance is symmetric
    codeflash_output = levenshtein_distance(s1, s2)  # 13.8ms -> 13.5ms (1.90% faster)


# 6. Type robustness


def test_non_string_inputs():
    # Should raise TypeError if input is not string
    with pytest.raises(TypeError):
        levenshtein_distance(123, "abc")
    with pytest.raises(TypeError):
        levenshtein_distance("abc", None)
    with pytest.raises(TypeError):
        levenshtein_distance(["a", "b"], "ab")
    with pytest.raises(TypeError):
        levenshtein_distance("ab", ["a", "b"])


# 7. Stress test: Large but feasible within constraints


def test_large_strings_max_size():
    # Both strings at the upper limit (1000 chars)
    s1 = "a" * 1000
    s2 = "b" * 1000
    codeflash_output = levenshtein_distance(s1, s2)  # 272ms -> 130ms (109% faster)


def test_large_strings_one_char_difference():
    # 999 identical, 1 different
    s1 = "a" * 999 + "b"
    s2 = "a" * 1000
    codeflash_output = levenshtein_distance(s1, s2)  # 58.4ms -> 57.5ms (1.56% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To test or edit this optimization locally, run: git merge codeflash/optimize-pr945-2025-11-27T14.39.26

Suggested change
x = prev[index1]
y = prev[index1 + 1]
z = curr[index1]
min_xy = min(x, y)
min_xyz = min(z, min_xy)
curr[index1 + 1] = 1 + min_xyz
# Avoid min() function call overhead by using direct comparisons
x = prev[index1]
y = prev[index1 + 1]
z = curr[index1]
if x < y:
    if x < z:
        curr[index1 + 1] = 1 + x
    else:
        curr[index1 + 1] = 1 + z
elif y < z:
    curr[index1 + 1] = 1 + y
else:
    curr[index1 + 1] = 1 + z


The optimized code achieves a **15% speedup** through several targeted micro-optimizations that reduce computational overhead in the parsing loop:

**Key Optimizations:**

1. **Single-pass boundary search**: Instead of checking both conditions (`start_line != -1 and end_line != -1`) on every iteration, the optimized version uses `None` values and breaks immediately when both markers are found, eliminating redundant condition checks.

2. **Fast-path string matching**: Before calling the expensive `.startswith("_______")` method, it first checks if `line[0] == "_"`, avoiding the method call for most lines that don't start with underscores.

3. **Method lookup optimization**: Pulls `current_failure_lines.append` into a local variable to avoid repeated attribute lookups in the hot loop where failure lines are processed.

4. **Memory-efficient list management**: Uses `current_failure_lines.clear()` instead of creating new list objects (`current_failure_lines = []`), reducing object allocation pressure.

**Performance Impact:**
The optimizations show the most significant gains in large-scale scenarios:
- **Large failure sets**: 14.2% faster with 500 failures, 14.0% faster with 999 failures  
- **Large output**: 29.2% faster for single failures with 1000 lines of output
- **Complex scenarios**: 22.3% faster with 50 cases having 10 lines each

**Hot Path Context:**
Based on the function reference, `parse_test_failures_from_stdout` is called from `parse_test_results`, which appears to be part of a test optimization pipeline. The function processes pytest stdout to extract failure information, making it performance-critical when dealing with large test suites or verbose test outputs. The 15% improvement becomes meaningful when processing hundreds of test failures in CI/CD environments or during iterative code optimization workflows.
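
A generic sketch of the micro-optimizations described above (illustrative only, not the actual patch; the function name and separator below are invented for the example):

def collect_failure_lines(lines: list[str]) -> list[str]:
    """Illustrates the fast-path check, bound append, and list reuse described above."""
    collected: list[str] = []
    current: list[str] = []
    append = current.append  # bind the method once instead of looking it up on every iteration
    for line in lines:
        # fast path: inspect the first character before the costlier startswith() call
        if line and line[0] == "_" and line.startswith("_______"):
            collected.extend(current)
            current.clear()  # reuse the same list object rather than allocating a new one
            continue
        append(line)
    collected.extend(current)
    return collected
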
@codeflash-ai
Contributor

codeflash-ai bot commented Nov 27, 2025

⚡️ Codeflash found optimizations for this PR

📄 16% (0.16x) speedup for parse_test_failures_from_stdout in codeflash/verification/parse_test_output.py

⏱️ Runtime: 2.76 milliseconds → 2.39 milliseconds (best of 250 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch feat/feedback-loop-for-unmatched-test-results).


…25-11-27T14.49.01

⚡️ Speed up function `parse_test_failures_from_stdout` by 16% in PR #945 (`feat/feedback-loop-for-unmatched-test-results`)
@codeflash-ai
Contributor

codeflash-ai bot commented Nov 27, 2025

@codeflash-ai
Contributor

codeflash-ai bot commented Nov 27, 2025

⚡️ Codeflash found optimizations for this PR

📄 655% (6.55x) speedup for compare_test_results in codeflash/verification/equivalence.py

⏱️ Runtime: 90.0 milliseconds → 11.9 milliseconds (best of 5 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch feat/feedback-loop-for-unmatched-test-results).


@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Codeflash Bot seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

console.rule()
return Failure("Test results did not match the test results of the original code.")

def repair_if_possible() -> None:

nitpick: not a big fan of nested functions

)

test_src_code = original_test_result.id.get_src_code(original_test_result.file_name)
test_diff = TestDiff(

we're constructing it now even when the test cases match, which would slow down the comparator; something to improve

Codeflash Bot and others added 5 commits December 10, 2025 18:50
The optimization achieves a **45% speedup** by restructuring how Pydantic model instances are created during markdown parsing. 

**Key Change**: Instead of creating an empty `CodeStringsMarkdown()` object and repeatedly appending to its `code_strings` list (which triggers Pydantic field validation on each append), the optimized version collects all code blocks into a plain Python list first, then creates the Pydantic model once with the complete list.

**Why This is Faster**: 
- **Reduced Pydantic overhead**: The original code performed O(n) Pydantic field validations as each `CodeString` was appended. The optimization reduces this to O(1) by doing a single model instantiation.
- **Fewer object mutations**: Plain list operations (`code_string_list.append()`) are significantly faster than mutating Pydantic model fields.
- **Profiler evidence**: The line creating `CodeStringsMarkdown()` dropped from 89.6% of function time (18.05ms) to 81% (8.45ms) - nearly a 2x improvement on the bottleneck line.

**Impact on Workloads**: This optimization is particularly effective for scenarios processing multiple markdown code blocks (as shown in test results where larger datasets see 46-47% improvements). Since `parse_markdown_code` is called in a tight loop within `_get_valid_candidates`, the per-call savings compound significantly when processing batches of optimization candidates.

**Test Case Performance**: The optimization shows consistent 25-47% improvements across various test scenarios, with the largest gains on tests with multiple candidates or code blocks, confirming the batching approach scales well.
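
A generic sketch of the batching pattern described above (illustrative only; the real CodeString and CodeStringsMarkdown models live in codeflash and may carry additional fields and validation):

from pydantic import BaseModel


class CodeString(BaseModel):
    code: str


class CodeStringsMarkdown(BaseModel):
    code_strings: list[CodeString] = []


def parse_blocks_slow(blocks: list[str]) -> CodeStringsMarkdown:
    # original pattern (per the explanation above): create an empty model and append to its field in a loop
    parsed = CodeStringsMarkdown()
    for block in blocks:
        parsed.code_strings.append(CodeString(code=block))
    return parsed


def parse_blocks_fast(blocks: list[str]) -> CodeStringsMarkdown:
    # optimized pattern: collect plain Python objects first, then construct the model once
    code_string_list = [CodeString(code=block) for block in blocks]
    return CodeStringsMarkdown(code_strings=code_string_list)
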
@codeflash-ai
Contributor

codeflash-ai bot commented Dec 11, 2025

⚡️ Codeflash found optimizations for this PR

📄 45% (0.45x) speedup for AiServiceClient._get_valid_candidates in codeflash/api/aiservice.py

⏱️ Runtime: 3.25 milliseconds → 2.24 milliseconds (best of 106 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch feat/feedback-loop-for-unmatched-test-results).


…25-12-11T15.08.58

⚡️ Speed up method `AiServiceClient._get_valid_candidates` by 45% in PR #945 (`feat/feedback-loop-for-unmatched-test-results`)
@codeflash-ai
Contributor

codeflash-ai bot commented Dec 11, 2025

