Conversation

@codeflash-ai codeflash-ai bot commented Nov 19, 2025

📄 20% (0.20x) speedup for EvaluateEngine._check_any_pass in loom/engines/evaluate.py

⏱️ Runtime: 769 microseconds → 639 microseconds (best of 42 runs)

📝 Explanation and details

The optimization achieves a 20% speedup by eliminating redundant attribute lookups within the loop through method localization.

Key Changes:

  • Localized record.evaluation_scores to avoid repeated attribute access on each iteration
  • Cached the .get() method as a local variable to eliminate method lookup overhead

Why This Works:
In Python, attribute access (like record.evaluation_scores.get) involves dictionary lookups in the object's __dict__ and method resolution. By storing these references as local variables before the loop, we convert expensive attribute/method lookups into fast local variable access during each iteration.
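
As a rough illustration, the change amounts to hoisting the dict and its bound `.get` method out of the loop. The sketch below is a plausible before/after inferred from the description and the tests further down, not the verbatim `loom/engines/evaluate.py` source:

```python
# Sketch only: a plausible before/after of EvaluateEngine._check_any_pass,
# inferred from the description above and the tests below -- not the verbatim
# loom/engines/evaluate.py source.

def check_any_pass_before(config, record) -> bool:
    for evaluator in config.evaluators:
        # record.evaluation_scores and its .get are re-resolved every iteration
        score = record.evaluation_scores.get(evaluator.name)
        if score is not None and score >= evaluator.threshold:
            return True
    return False


def check_any_pass_after(config, record) -> bool:
    scores = record.evaluation_scores      # attribute looked up once
    get_score = scores.get                 # bound method cached once
    for evaluator in config.evaluators:
        score = get_score(evaluator.name)  # fast local-variable call
        if score is not None and score >= evaluator.threshold:
            return True
    return False
```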

Performance Impact:
The line profiler shows the optimization is most effective with larger evaluator sets (a standalone micro-benchmark sketch follows this list):

  • Small cases (1-3 evaluators): Mixed results, sometimes slightly slower due to setup overhead
  • Large cases (1000 evaluators): Consistent 13-33% improvements, with the best gains when all evaluators must be checked (missing scores: 33%, all fail: 20-25%)
  • Early termination cases: Still benefit (17-18% faster) since the localization overhead is minimal
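
For intuition on where the per-iteration saving comes from, here is a minimal, self-contained micro-benchmark. It is independent of loom; the `Holder` class and loop sizes are made up for illustration, and the absolute numbers will vary by machine:

```python
import timeit


class Holder:
    """Stand-in for a record carrying an evaluation_scores-style dict."""

    def __init__(self) -> None:
        self.scores = {f"e{i}": 0.5 for i in range(1000)}


holder = Holder()
names = [f"e{i}" for i in range(1000)]


def attribute_lookup() -> None:
    # holder.scores and its .get are re-resolved on every iteration
    for name in names:
        holder.scores.get(name)


def localized_lookup() -> None:
    # hoist the bound method once; the loop body only touches locals
    get = holder.scores.get
    for name in names:
        get(name)


print("attribute:", timeit.timeit(attribute_lookup, number=2_000))
print("localized:", timeit.timeit(localized_lookup, number=2_000))
```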

Real-World Benefits:
This optimization is particularly valuable for evaluation engines processing many records with numerous evaluators, which is common in ML model evaluation pipelines. The consistent performance gains on large-scale test cases demonstrate this will meaningfully improve throughput in production workloads where evaluation latency directly impacts system performance.

Correctness verification report:

| Test                          | Status        |
|-------------------------------|---------------|
| ⚙️ Existing Unit Tests         | 🔘 None Found |
| 🌀 Generated Regression Tests  | 84 Passed     |
| ⏪ Replay Tests                | 2 Passed      |
| 🔎 Concolic Coverage Tests     | 🔘 None Found |
| 📊 Tests Coverage              | 100.0%        |
🌀 Generated Regression Tests and Runtime
import logging
from collections import namedtuple

import pytest

from loom.engines.evaluate import EvaluateEngine

# Mock EvaluateConfig, Record, and EvaluatorConfig to avoid external dependencies
EvaluatorConfig = namedtuple("EvaluatorConfig", ["name", "threshold"])
EvaluateConfig = namedtuple("EvaluateConfig", ["evaluators", "quality_gate", "timeout"])
Record = namedtuple("Record", ["evaluation_scores"])


# Dummy quality_gate and timeout (not used in _check_any_pass)
class DummyQualityGate:
    value = "dummy"


dummy_quality_gate = DummyQualityGate()
dummy_timeout = 10


logger = logging.getLogger(__name__)

# -------------------------
# Unit tests for _check_any_pass
# -------------------------


# Basic Test Cases
def test_single_evaluator_pass():
    """Single evaluator, score above threshold."""
    evals = [EvaluatorConfig(name="e1", threshold=0.5)]
    config = EvaluateConfig(evals, dummy_quality_gate, dummy_timeout)
    engine = EvaluateEngine(config)
    record = Record(evaluation_scores={"e1": 0.7})
    codeflash_output = engine._check_any_pass(record)  # 1.41μs -> 1.34μs (5.37% faster)


def test_single_evaluator_fail():
    """Single evaluator, score below threshold."""
    evals = [EvaluatorConfig(name="e1", threshold=0.8)]
    config = EvaluateConfig(evals, dummy_quality_gate, dummy_timeout)
    engine = EvaluateEngine(config)
    record = Record(evaluation_scores={"e1": 0.7})
    codeflash_output = engine._check_any_pass(
        record
    )  # 1.29μs -> 1.30μs (0.693% slower)


def test_multiple_evaluators_one_pass():
    """Multiple evaluators, one passes."""
    evals = [
        EvaluatorConfig(name="e1", threshold=0.5),
        EvaluatorConfig(name="e2", threshold=0.9),
        EvaluatorConfig(name="e3", threshold=0.3),
    ]
    config = EvaluateConfig(evals, dummy_quality_gate, dummy_timeout)
    engine = EvaluateEngine(config)
    record = Record(evaluation_scores={"e1": 0.2, "e2": 0.95, "e3": 0.1})
    codeflash_output = engine._check_any_pass(record)  # 1.42μs -> 1.41μs (1.14% faster)


def test_multiple_evaluators_none_pass():
    """Multiple evaluators, none pass."""
    evals = [
        EvaluatorConfig(name="e1", threshold=0.5),
        EvaluatorConfig(name="e2", threshold=0.9),
        EvaluatorConfig(name="e3", threshold=0.3),
    ]
    config = EvaluateConfig(evals, dummy_quality_gate, dummy_timeout)
    engine = EvaluateEngine(config)
    record = Record(evaluation_scores={"e1": 0.2, "e2": 0.8, "e3": 0.1})
    codeflash_output = engine._check_any_pass(record)  # 1.51μs -> 1.41μs (7.68% faster)


def test_multiple_evaluators_all_pass():
    """Multiple evaluators, all pass."""
    evals = [
        EvaluatorConfig(name="e1", threshold=0.5),
        EvaluatorConfig(name="e2", threshold=0.9),
        EvaluatorConfig(name="e3", threshold=0.3),
    ]
    config = EvaluateConfig(evals, dummy_quality_gate, dummy_timeout)
    engine = EvaluateEngine(config)
    record = Record(evaluation_scores={"e1": 0.6, "e2": 0.95, "e3": 0.5})
    codeflash_output = engine._check_any_pass(record)  # 1.10μs -> 1.21μs (9.17% slower)


# Edge Test Cases
def test_score_exactly_at_threshold():
    """Score exactly equals threshold."""
    evals = [EvaluatorConfig(name="e1", threshold=0.5)]
    config = EvaluateConfig(evals, dummy_quality_gate, dummy_timeout)
    engine = EvaluateEngine(config)
    record = Record(evaluation_scores={"e1": 0.5})
    codeflash_output = engine._check_any_pass(
        record
    )  # 1.22μs -> 1.22μs (0.082% slower)


def test_evaluator_score_is_none():
    """Evaluator score is None (missing from record)."""
    evals = [EvaluatorConfig(name="e1", threshold=0.5)]
    config = EvaluateConfig(evals, dummy_quality_gate, dummy_timeout)
    engine = EvaluateEngine(config)
    record = Record(evaluation_scores={})
    codeflash_output = engine._check_any_pass(record)  # 1.10μs -> 1.04μs (6.15% faster)


def test_evaluator_score_is_explicit_none():
    """Evaluator score is explicitly None in the dict."""
    evals = [EvaluatorConfig(name="e1", threshold=0.5)]
    config = EvaluateConfig(evals, dummy_quality_gate, dummy_timeout)
    engine = EvaluateEngine(config)
    record = Record(evaluation_scores={"e1": None})
    codeflash_output = engine._check_any_pass(record)  # 1.04μs -> 1.08μs (3.53% slower)


def test_evaluator_score_is_negative():
    """Evaluator score is negative, threshold is zero."""
    evals = [EvaluatorConfig(name="e1", threshold=0.0)]
    config = EvaluateConfig(evals, dummy_quality_gate, dummy_timeout)
    engine = EvaluateEngine(config)
    record = Record(evaluation_scores={"e1": -0.1})
    codeflash_output = engine._check_any_pass(
        record
    )  # 1.19μs -> 1.19μs (0.590% faster)


def test_evaluator_score_is_large_negative():
    """Evaluator score is a large negative number."""
    evals = [EvaluatorConfig(name="e1", threshold=-100)]
    config = EvaluateConfig(evals, dummy_quality_gate, dummy_timeout)
    engine = EvaluateEngine(config)
    record = Record(evaluation_scores={"e1": -100})
    codeflash_output = engine._check_any_pass(
        record
    )  # 1.22μs -> 1.21μs (0.990% faster)


def test_evaluator_score_is_inf():
    """Evaluator score is infinity."""
    evals = [EvaluatorConfig(name="e1", threshold=1000)]
    config = EvaluateConfig(evals, dummy_quality_gate, dummy_timeout)
    engine = EvaluateEngine(config)
    record = Record(evaluation_scores={"e1": float("inf")})
    codeflash_output = engine._check_any_pass(
        record
    )  # 1.27μs -> 1.25μs (0.956% faster)


def test_evaluator_score_is_nan():
    """Evaluator score is NaN."""
    evals = [EvaluatorConfig(name="e1", threshold=0.5)]
    config = EvaluateConfig(evals, dummy_quality_gate, dummy_timeout)
    engine = EvaluateEngine(config)
    record = Record(evaluation_scores={"e1": float("nan")})
    # NaN comparisons always fail
    codeflash_output = engine._check_any_pass(record)  # 1.23μs -> 1.21μs (1.65% faster)


def test_empty_evaluators_list():
    """No evaluators configured."""
    evals = []
    config = EvaluateConfig(evals, dummy_quality_gate, dummy_timeout)
    engine = EvaluateEngine(config)
    record = Record(evaluation_scores={"e1": 1.0})
    codeflash_output = engine._check_any_pass(record)  # 687ns -> 829ns (17.1% slower)


def test_record_has_extra_scores():
    """Record contains scores for evaluators not in config."""
    evals = [EvaluatorConfig(name="e1", threshold=0.5)]
    config = EvaluateConfig(evals, dummy_quality_gate, dummy_timeout)
    engine = EvaluateEngine(config)
    record = Record(evaluation_scores={"e1": 0.4, "e2": 0.99})
    codeflash_output = engine._check_any_pass(record)  # 1.26μs -> 1.28μs (1.95% slower)


def test_threshold_is_zero():
    """Threshold is zero, score is zero."""
    evals = [EvaluatorConfig(name="e1", threshold=0.0)]
    config = EvaluateConfig(evals, dummy_quality_gate, dummy_timeout)
    engine = EvaluateEngine(config)
    record = Record(evaluation_scores={"e1": 0.0})
    codeflash_output = engine._check_any_pass(
        record
    )  # 1.28μs -> 1.28μs (0.156% slower)


def test_threshold_is_negative():
    """Threshold is negative, score is zero."""
    evals = [EvaluatorConfig(name="e1", threshold=-1.0)]
    config = EvaluateConfig(evals, dummy_quality_gate, dummy_timeout)
    engine = EvaluateEngine(config)
    record = Record(evaluation_scores={"e1": 0.0})
    codeflash_output = engine._check_any_pass(record)  # 1.30μs -> 1.19μs (9.23% faster)


def test_score_is_non_numeric():
    """Score is a non-numeric value (should not pass)."""
    evals = [EvaluatorConfig(name="e1", threshold=0.5)]
    config = EvaluateConfig(evals, dummy_quality_gate, dummy_timeout)
    engine = EvaluateEngine(config)
    record = Record(evaluation_scores={"e1": "not_a_number"})
    # Comparing a str score with a float threshold raises TypeError
    with pytest.raises(TypeError):
        engine._check_any_pass(record)  # 2.76μs -> 2.67μs (3.25% faster)


# Large Scale Test Cases
def test_many_evaluators_all_fail():
    """Large number of evaluators, none pass."""
    evals = [EvaluatorConfig(name=f"e{i}", threshold=0.99) for i in range(1000)]
    config = EvaluateConfig(evals, dummy_quality_gate, dummy_timeout)
    engine = EvaluateEngine(config)
    scores = {f"e{i}": 0.5 for i in range(1000)}
    record = Record(evaluation_scores=scores)
    codeflash_output = engine._check_any_pass(record)  # 92.1μs -> 76.3μs (20.7% faster)


def test_many_evaluators_one_pass_first():
    """Large number of evaluators, first one passes."""
    evals = [EvaluatorConfig(name=f"e{i}", threshold=0.5) for i in range(1000)]
    config = EvaluateConfig(evals, dummy_quality_gate, dummy_timeout)
    engine = EvaluateEngine(config)
    scores = {f"e{i}": 0.4 for i in range(1000)}
    scores["e0"] = 0.6  # First passes
    record = Record(evaluation_scores=scores)
    codeflash_output = engine._check_any_pass(record)  # 1.82μs -> 1.72μs (5.82% faster)


def test_many_evaluators_one_pass_last():
    """Large number of evaluators, last one passes."""
    evals = [EvaluatorConfig(name=f"e{i}", threshold=0.5) for i in range(1000)]
    config = EvaluateConfig(evals, dummy_quality_gate, dummy_timeout)
    engine = EvaluateEngine(config)
    scores = {f"e{i}": 0.4 for i in range(1000)}
    scores["e999"] = 0.6  # Last passes
    record = Record(evaluation_scores=scores)
    codeflash_output = engine._check_any_pass(record)  # 91.2μs -> 77.2μs (18.1% faster)


def test_many_evaluators_some_missing_scores():
    """Large number of evaluators, some missing scores, one passes."""
    evals = [EvaluatorConfig(name=f"e{i}", threshold=0.5) for i in range(1000)]
    config = EvaluateConfig(evals, dummy_quality_gate, dummy_timeout)
    engine = EvaluateEngine(config)
    scores = {f"e{i}": 0.4 for i in range(990)}  # 10 missing
    scores["e500"] = 0.7  # One passes
    record = Record(evaluation_scores=scores)
    codeflash_output = engine._check_any_pass(record)  # 45.4μs -> 39.3μs (15.6% faster)


def test_many_evaluators_all_missing_scores():
    """Large number of evaluators, all scores missing."""
    evals = [EvaluatorConfig(name=f"e{i}", threshold=0.5) for i in range(1000)]
    config = EvaluateConfig(evals, dummy_quality_gate, dummy_timeout)
    engine = EvaluateEngine(config)
    record = Record(evaluation_scores={})
    codeflash_output = engine._check_any_pass(record)  # 62.8μs -> 47.3μs (32.9% faster)


def test_many_evaluators_all_scores_none():
    """Large number of evaluators, all scores explicitly None."""
    evals = [EvaluatorConfig(name=f"e{i}", threshold=0.5) for i in range(1000)]
    config = EvaluateConfig(evals, dummy_quality_gate, dummy_timeout)
    engine = EvaluateEngine(config)
    scores = {f"e{i}": None for i in range(1000)}
    record = Record(evaluation_scores=scores)
    codeflash_output = engine._check_any_pass(record)  # 76.3μs -> 60.6μs (25.8% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from dataclasses import dataclass
from typing import Dict, List

# imports
from loom.engines.evaluate import EvaluateEngine


# Minimal stubs for EvaluateConfig, Record, and EvaluatorConfig for testing
@dataclass
class EvaluatorConfig:
    name: str
    threshold: float


@dataclass
class EvaluateConfig:
    evaluators: List[EvaluatorConfig]
    quality_gate: object  # Not used in _check_any_pass
    timeout: int  # Not used in _check_any_pass


@dataclass
class Record:
    evaluation_scores: Dict[str, float]


# --- Begin: unit tests ---
# Helper for dummy quality_gate
class DummyQualityGate:
    value = "dummy"


# Basic Test Cases
def test_single_evaluator_pass():
    """Single evaluator, score above threshold should pass."""
    config = EvaluateConfig(
        evaluators=[EvaluatorConfig(name="eval1", threshold=0.5)],
        quality_gate=DummyQualityGate(),
        timeout=10,
    )
    record = Record(evaluation_scores={"eval1": 0.7})
    engine = EvaluateEngine(config)
    codeflash_output = engine._check_any_pass(record)  # 1.28μs -> 1.32μs (3.18% slower)


def test_single_evaluator_fail():
    """Single evaluator, score below threshold should fail."""
    config = EvaluateConfig(
        evaluators=[EvaluatorConfig(name="eval1", threshold=0.8)],
        quality_gate=DummyQualityGate(),
        timeout=10,
    )
    record = Record(evaluation_scores={"eval1": 0.7})
    engine = EvaluateEngine(config)
    codeflash_output = engine._check_any_pass(record)  # 1.09μs -> 1.21μs (9.86% slower)


def test_multiple_evaluators_one_pass():
    """Multiple evaluators, one score above threshold should pass."""
    config = EvaluateConfig(
        evaluators=[
            EvaluatorConfig(name="eval1", threshold=0.8),
            EvaluatorConfig(name="eval2", threshold=0.5),
            EvaluatorConfig(name="eval3", threshold=0.9),
        ],
        quality_gate=DummyQualityGate(),
        timeout=10,
    )
    record = Record(evaluation_scores={"eval1": 0.7, "eval2": 0.6, "eval3": 0.4})
    engine = EvaluateEngine(config)
    codeflash_output = engine._check_any_pass(record)  # 1.37μs -> 1.40μs (2.29% slower)


def test_multiple_evaluators_none_pass():
    """Multiple evaluators, all scores below threshold should fail."""
    config = EvaluateConfig(
        evaluators=[
            EvaluatorConfig(name="eval1", threshold=0.8),
            EvaluatorConfig(name="eval2", threshold=0.5),
            EvaluatorConfig(name="eval3", threshold=0.9),
        ],
        quality_gate=DummyQualityGate(),
        timeout=10,
    )
    record = Record(evaluation_scores={"eval1": 0.7, "eval2": 0.4, "eval3": 0.8})
    engine = EvaluateEngine(config)
    codeflash_output = engine._check_any_pass(record)  # 1.39μs -> 1.31μs (6.66% faster)


def test_multiple_evaluators_multiple_pass():
    """Multiple evaluators, more than one score above threshold should pass."""
    config = EvaluateConfig(
        evaluators=[
            EvaluatorConfig(name="eval1", threshold=0.5),
            EvaluatorConfig(name="eval2", threshold=0.5),
            EvaluatorConfig(name="eval3", threshold=0.5),
        ],
        quality_gate=DummyQualityGate(),
        timeout=10,
    )
    record = Record(evaluation_scores={"eval1": 0.6, "eval2": 0.7, "eval3": 0.4})
    engine = EvaluateEngine(config)
    codeflash_output = engine._check_any_pass(record)  # 984ns -> 1.13μs (12.6% slower)


# Edge Test Cases
def test_score_exactly_at_threshold():
    """Score exactly at threshold should pass."""
    config = EvaluateConfig(
        evaluators=[EvaluatorConfig(name="eval1", threshold=0.5)],
        quality_gate=DummyQualityGate(),
        timeout=10,
    )
    record = Record(evaluation_scores={"eval1": 0.5})
    engine = EvaluateEngine(config)
    codeflash_output = engine._check_any_pass(record)  # 1.10μs -> 1.19μs (7.00% slower)


def test_missing_evaluator_score():
    """Evaluator listed but score missing from record should not pass."""
    config = EvaluateConfig(
        evaluators=[EvaluatorConfig(name="eval1", threshold=0.5)],
        quality_gate=DummyQualityGate(),
        timeout=10,
    )
    record = Record(evaluation_scores={})  # No score for eval1
    engine = EvaluateEngine(config)
    codeflash_output = engine._check_any_pass(record)  # 929ns -> 1.04μs (10.5% slower)


def test_score_is_none():
    """Evaluator score explicitly set to None should not pass."""
    config = EvaluateConfig(
        evaluators=[EvaluatorConfig(name="eval1", threshold=0.5)],
        quality_gate=DummyQualityGate(),
        timeout=10,
    )
    record = Record(evaluation_scores={"eval1": None})
    engine = EvaluateEngine(config)
    codeflash_output = engine._check_any_pass(record)  # 977ns -> 1.04μs (6.15% slower)


def test_empty_evaluators_list():
    """No evaluators configured should not pass."""
    config = EvaluateConfig(evaluators=[], quality_gate=DummyQualityGate(), timeout=10)
    record = Record(evaluation_scores={"eval1": 0.9})
    engine = EvaluateEngine(config)
    codeflash_output = engine._check_any_pass(record)  # 539ns -> 812ns (33.6% slower)


def test_evaluator_with_negative_threshold_and_score():
    """Negative threshold and negative score, score above threshold should pass."""
    config = EvaluateConfig(
        evaluators=[EvaluatorConfig(name="eval1", threshold=-1.0)],
        quality_gate=DummyQualityGate(),
        timeout=10,
    )
    record = Record(evaluation_scores={"eval1": -0.5})
    engine = EvaluateEngine(config)
    codeflash_output = engine._check_any_pass(record)  # 1.11μs -> 1.19μs (6.56% slower)


def test_evaluator_with_zero_threshold():
    """Zero threshold, score zero should pass."""
    config = EvaluateConfig(
        evaluators=[EvaluatorConfig(name="eval1", threshold=0.0)],
        quality_gate=DummyQualityGate(),
        timeout=10,
    )
    record = Record(evaluation_scores={"eval1": 0.0})
    engine = EvaluateEngine(config)
    codeflash_output = engine._check_any_pass(record)  # 1.08μs -> 1.16μs (6.72% slower)


def test_evaluator_with_high_threshold_and_score():
    """Very high threshold, score just below threshold should not pass."""
    config = EvaluateConfig(
        evaluators=[EvaluatorConfig(name="eval1", threshold=1e6)],
        quality_gate=DummyQualityGate(),
        timeout=10,
    )
    record = Record(evaluation_scores={"eval1": 1e6 - 1})
    engine = EvaluateEngine(config)
    codeflash_output = engine._check_any_pass(record)  # 1.09μs -> 1.16μs (6.29% slower)


def test_evaluator_with_high_threshold_and_score_pass():
    """Very high threshold, score exactly at threshold should pass."""
    config = EvaluateConfig(
        evaluators=[EvaluatorConfig(name="eval1", threshold=1e6)],
        quality_gate=DummyQualityGate(),
        timeout=10,
    )
    record = Record(evaluation_scores={"eval1": 1e6})
    engine = EvaluateEngine(config)
    codeflash_output = engine._check_any_pass(record)  # 1.06μs -> 1.15μs (7.69% slower)


def test_evaluator_with_float_precision():
    """Test floating point precision edge case."""
    config = EvaluateConfig(
        evaluators=[EvaluatorConfig(name="eval1", threshold=0.30000000000000004)],
        quality_gate=DummyQualityGate(),
        timeout=10,
    )
    record = Record(evaluation_scores={"eval1": 0.3})
    engine = EvaluateEngine(config)
    # 0.3 < 0.30000000000000004 due to floating point, so should fail
    codeflash_output = engine._check_any_pass(record)  # 1.06μs -> 1.15μs (7.57% slower)


# Large Scale Test Cases
def test_many_evaluators_none_pass():
    """1000 evaluators, none pass."""
    evaluators = [EvaluatorConfig(name=f"eval{i}", threshold=1.0) for i in range(1000)]
    config = EvaluateConfig(
        evaluators=evaluators, quality_gate=DummyQualityGate(), timeout=10
    )
    # All scores below threshold
    record = Record(evaluation_scores={f"eval{i}": 0.5 for i in range(1000)})
    engine = EvaluateEngine(config)
    codeflash_output = engine._check_any_pass(record)  # 109μs -> 96.7μs (13.6% faster)


def test_many_evaluators_one_pass():
    """1000 evaluators, one passes."""
    evaluators = [EvaluatorConfig(name=f"eval{i}", threshold=1.0) for i in range(1000)]
    config = EvaluateConfig(
        evaluators=evaluators, quality_gate=DummyQualityGate(), timeout=10
    )
    # Only eval500 passes
    scores = {f"eval{i}": 0.5 for i in range(1000)}
    scores["eval500"] = 1.2
    record = Record(evaluation_scores=scores)
    engine = EvaluateEngine(config)
    codeflash_output = engine._check_any_pass(record)  # 55.5μs -> 47.0μs (17.9% faster)


def test_many_evaluators_all_pass():
    """1000 evaluators, all pass."""
    evaluators = [EvaluatorConfig(name=f"eval{i}", threshold=1.0) for i in range(1000)]
    config = EvaluateConfig(
        evaluators=evaluators, quality_gate=DummyQualityGate(), timeout=10
    )
    record = Record(evaluation_scores={f"eval{i}": 2.0 for i in range(1000)})
    engine = EvaluateEngine(config)
    codeflash_output = engine._check_any_pass(record)  # 1.77μs -> 1.74μs (2.07% faster)


def test_many_evaluators_random_scores():
    """1000 evaluators, random scores, some pass, some fail."""
    import random

    random.seed(42)
    evaluators = [EvaluatorConfig(name=f"eval{i}", threshold=0.5) for i in range(1000)]
    config = EvaluateConfig(
        evaluators=evaluators, quality_gate=DummyQualityGate(), timeout=10
    )
    scores = {f"eval{i}": random.random() for i in range(1000)}
    record = Record(evaluation_scores=scores)
    engine = EvaluateEngine(config)
    # At least one score will be >= 0.5 due to random distribution
    codeflash_output = engine._check_any_pass(record)  # 119μs -> 95.0μs (25.7% faster)


def test_large_record_missing_scores():
    """1000 evaluators, all missing scores."""
    evaluators = [EvaluatorConfig(name=f"eval{i}", threshold=0.5) for i in range(1000)]
    config = EvaluateConfig(
        evaluators=evaluators, quality_gate=DummyQualityGate(), timeout=10
    )
    record = Record(evaluation_scores={})  # No scores for any evaluator
    engine = EvaluateEngine(config)
    codeflash_output = engine._check_any_pass(record)  # 72.5μs -> 54.5μs (33.2% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
⏪ Replay Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| test_pytest_testsunittest_evaluate_engine_py_testsunittest_transform_engine_py_testsunittest_extract_engi__replay_test_0.py::test_loom_engines_evaluate_EvaluateEngine__check_any_pass | 3.31μs | 3.17μs | 4.61% ✅ |

To edit these changes, run `git checkout codeflash/optimize-EvaluateEngine._check_any_pass-mi6lbmt3` and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 19, 2025 22:45
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash and 🎯 Quality: High labels Nov 19, 2025