@codeflash-ai codeflash-ai bot commented Nov 18, 2025

📄 148% (1.48x) speedup for any_none in pandas/core/common.py

⏱️ Runtime: 238 microseconds → 96.1 microseconds (best of 123 runs)

📝 Explanation and details

The optimization replaces a generator expression with the `in` operator to check for `None` values. The original code, `any(arg is None for arg in args)`, builds a generator that tests each argument with an identity check and passes the results to the `any()` builtin. The optimized version, `None in args`, applies the `in` operator directly to the tuple of arguments.
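
For reference, the two forms described above can be written side by side. This is a minimal sketch: the names `any_none_original` and `any_none_optimized` are hypothetical labels used for illustration, not names from pandas.

```python
def any_none_original(*args) -> bool:
    """Return True if any argument is None (generator-based form)."""
    return any(arg is None for arg in args)


def any_none_optimized(*args) -> bool:
    """Return True if any argument is None (tuple-membership form)."""
    return None in args


# For ordinary arguments, both forms return the same result.
for sample in [(), (1, "a"), (None, 2), (0, False, [], None), ([None],)]:
    assert any_none_original(*sample) == any_none_optimized(*sample)
```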

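**Key optimization**: Tuple membership is implemented in C, so it avoids creating a generator object and resuming it from Python code for every element. For each element, `in` checks identity first and falls back to `==` only when the identity check fails. Both approaches short-circuit at the first `None`, but the optimized version does so inside the C loop rather than at the Python level. Because of the `==` fallback, the two expressions are not strictly equivalent for arguments that define a custom `__eq__`.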
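
One point that may be worth checking during review: tuple membership compares each element by identity and then by `==`, whereas the original form uses identity only. The sketch below illustrates the difference with a hypothetical class `EqualsNone`; whether such arguments ever reach `any_none()` in practice is not established by this PR.

```python
class EqualsNone:
    """Hypothetical object that claims equality with None without being None."""

    def __eq__(self, other):
        return other is None

    # Defining __eq__ disables the default hash; restore it for completeness.
    __hash__ = object.__hash__


obj = EqualsNone()
args = (obj, 1)

identity_result = any(arg is None for arg in args)  # identity checks only
membership_result = None in args  # identity first, then == as a fallback

print(identity_result, membership_result)  # False True
```

The same fallback applies to any argument whose `==` does not return a plain boolean. A multi-element NumPy array, for example, raises `ValueError` when the result of `array == None` is converted to a truth value.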

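**Performance characteristics**: The per-test timings show the optimized version running faster in every test that reaches the function body, with speedups ranging from about 5% to 243%; most short-argument cases fall between roughly 35% and 190%. The gains are largest for:

- **Early None detection in short argument lists** (182% faster when `None` is the first of three arguments): `in` returns at the first element
- **Large argument lists that are scanned fully or almost fully** (232-243% faster for 1000 arguments with no `None` or with `None` last): C-level iteration is much faster than resuming a generator per element
- **No arguments** (117-157% faster): no generator is created at all

With 1000 arguments and `None` first, the gain drops to 35-43%, presumably because the cost of passing the arguments dominates both versions.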
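
The scenarios listed above can be approximated locally with `timeit`. This is a rough sketch, not the harness Codeflash used: the helper `best_time`, the scenario names, and the repeat counts are assumptions chosen for illustration, and absolute numbers will vary by machine and Python version.

```python
import timeit


def any_none_original(*args):
    return any(arg is None for arg in args)


def any_none_optimized(*args):
    return None in args


def best_time(func, args, number=1000, repeat=3):
    """Best per-call time in seconds over several repeats."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number


scenarios = {
    "no arguments": (),
    "None first of three": (None, 2, 3),
    "1000 ints, no None": tuple(range(1000)),
    "1000 args, None last": tuple(range(999)) + (None,),
}

results = {}
for name, args in scenarios.items():
    t_orig = best_time(any_none_original, args)
    t_opt = best_time(any_none_optimized, args)
    results[name] = (t_orig, t_opt)
    print(f"{name}: {t_orig * 1e9:.0f}ns -> {t_opt * 1e9:.0f}ns")
```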

**Impact on workloads**: Based on the function reference, `any_none()` is called in pandas' `date_range()` function for parameter validation. The measured saving is on the order of one microsecond per call (2.13μs to 907ns in the existing unit test), so the end-to-end effect on `date_range()` is likely to be small. It is most likely to be noticeable in data analysis workflows that create very many date ranges in a tight loop.
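
To make the call pattern concrete, the sketch below shows the kind of validation step in which such a helper is typically used. The function `resolve_freq` and its defaulting rule are hypothetical and written for illustration only; they are not copied from the pandas source, and the local `any_none` is a stand-in for the pandas helper.

```python
def any_none(*args) -> bool:
    # Local stand-in for the helper discussed in this PR.
    return None in args


def resolve_freq(start=None, end=None, periods=None, freq=None):
    """Hypothetical validation step: fall back to daily frequency
    when freq is omitted and the range is not fully specified."""
    if freq is None and any_none(periods, start, end):
        freq = "D"
    return freq


print(resolve_freq(start="2025-01-01", periods=3))  # D
print(resolve_freq(start="2025-01-01", end="2025-01-10", periods=4))  # None
```

In a pattern like this the helper receives a small, fixed number of arguments per call, so the short-argument timings are the relevant ones rather than the 1000-argument cases.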

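**Test case suitability**: The optimization is faster in every generated test scenario that returns a result, with the largest gains in short argument lists and in long lists that must be scanned to the end or near it (no `None`, or `None` last), making it broadly applicable. The generated tests record `codeflash_output` so that the original and optimized results can be compared; they contain no explicit `assert` statements.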

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 14 Passed |
| 🌀 Generated Regression Tests | 68 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

⚙️ Existing Unit Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| test_common.py::test_any_none | 2.13μs | 907ns | 135%✅ |

🌀 Generated Regression Tests and Runtime

import pytest  # used for our unit tests
from pandas.core.common import any_none

# unit tests

# ------------------------
# Basic Test Cases
# ------------------------


def test_all_non_none():
    # All arguments are non-None; should return False
    codeflash_output = any_none(
        1, "hello", [3, 4], {}
    )  # 1.16μs -> 726ns (59.2% faster)


def test_one_none_at_start():
    # First argument is None; should return True
    codeflash_output = any_none(None, 2, 3)  # 1.27μs -> 450ns (182% faster)


def test_one_none_at_end():
    # Last argument is None; should return True
    codeflash_output = any_none(2, 3, None)  # 1.30μs -> 517ns (151% faster)


def test_one_none_in_middle():
    # Middle argument is None; should return True
    codeflash_output = any_none(2, None, 3)  # 1.24μs -> 428ns (190% faster)


def test_multiple_nones():
    # Multiple None arguments; should return True
    codeflash_output = any_none(None, None, None)  # 1.13μs -> 414ns (173% faster)


def test_no_arguments():
    # No arguments passed; should return False
    codeflash_output = any_none()  # 879ns -> 406ns (117% faster)


def test_mixed_types_with_none():
    # Mix of types, one is None; should return True
    codeflash_output = any_none(
        0, False, [], None, "string"
    )  # 1.37μs -> 606ns (126% faster)


def test_mixed_types_without_none():
    # Mix of types, none is None; should return False
    codeflash_output = any_none(
        0, False, [], "string"
    )  # 1.10μs -> 606ns (82.0% faster)


# ------------------------
# Edge Test Cases
# ------------------------


def test_single_none():
    # Only one argument, which is None; should return True
    codeflash_output = any_none(None)  # 1.21μs -> 486ns (148% faster)


def test_single_non_none():
    # Only one argument, which is not None; should return False
    codeflash_output = any_none(0)  # 1.02μs -> 542ns (88.2% faster)


def test_none_vs_false_vs_empty():
    # None, False, 0, "", [], {}, () - only None should trigger True
    codeflash_output = any_none(
        False, 0, "", [], {}, ()
    )  # 1.11μs -> 809ns (37.6% faster)
    codeflash_output = any_none(
        False, 0, "", [], {}, (), None
    )  # 1.04μs -> 435ns (138% faster)


def test_nested_none():
    # None inside a container should NOT trigger True
    codeflash_output = any_none(
        [None], {"a": None}, ("None",), {"x": [None]}
    )  # 1.07μs -> 649ns (64.6% faster)


def test_none_in_set():
    # None as an element of a set should NOT trigger True
    codeflash_output = any_none({None})  # 1.00μs -> 596ns (67.8% faster)


def test_none_in_dict_key_value():
    # None as a key or value in a dict should NOT trigger True
    codeflash_output = any_none(
        {"key": None, None: "value"}
    )  # 962ns -> 590ns (63.1% faster)


def test_none_in_string():
    # The string "None" is not the same as None
    codeflash_output = any_none("None")  # 1.02μs -> 540ns (88.1% faster)


def test_none_in_bytes():
    # The bytes object b'None' is not None
    codeflash_output = any_none(b"None")  # 1.01μs -> 564ns (79.1% faster)


def test_none_in_custom_object():
    # Custom object with None as an attribute should NOT trigger True
    class Dummy:
        def __init__(self):
            self.value = None

    dummy = Dummy()
    codeflash_output = any_none(dummy)  # 988ns -> 532ns (85.7% faster)


def test_none_in_lambda():
    # Lambda returning None should NOT trigger True
    f = lambda: None
    codeflash_output = any_none(f)  # 975ns -> 500ns (95.0% faster)


# ------------------------
# Large Scale Test Cases
# ------------------------


def test_large_all_non_none():
    # Large number of non-None arguments; should return False
    args = [i for i in range(1000)]
    codeflash_output = any_none(*args)  # 21.9μs -> 6.59μs (232% faster)


def test_large_one_none_at_start():
    # Large number of arguments, first is None; should return True
    args = [None] + [i for i in range(999)]
    codeflash_output = any_none(*args)  # 3.04μs -> 2.13μs (42.7% faster)


def test_large_one_none_at_end():
    # Large number of arguments, last is None; should return True
    args = [i for i in range(999)] + [None]
    codeflash_output = any_none(*args)  # 22.3μs -> 6.50μs (243% faster)


def test_large_one_none_in_middle():
    # Large number of arguments, one None in the middle; should return True
    args = [i for i in range(500)] + [None] + [i for i in range(499)]
    codeflash_output = any_none(*args)  # 12.6μs -> 4.36μs (189% faster)


def test_large_multiple_nones():
    # Large number of arguments, multiple Nones; should return True
    args = [None if i % 100 == 0 else i for i in range(1000)]
    codeflash_output = any_none(*args)  # 2.94μs -> 2.15μs (36.7% faster)


def test_large_all_none():
    # All arguments are None; should return True
    args = [None] * 1000
    codeflash_output = any_none(*args)  # 3.08μs -> 2.21μs (39.5% faster)


def test_large_all_falsey_but_not_none():
    # Many falsey values but no None; should return False
    args = [0, "", [], {}, (), False] * 166  # 996 elements
    args += [0, "", [], {}, (), False]  # 1002 elements, but all non-None
    codeflash_output = any_none(*args)  # 21.7μs -> 7.52μs (189% faster)


# ------------------------
# Additional Robustness Cases
# ------------------------


def test_args_are_none_type_object():
    # Passing the NoneType type itself, not the value None
    codeflash_output = any_none(type(None))  # 1.05μs -> 547ns (92.1% faster)


def test_args_are_none_string():
    # Passing the string "None" should not trigger True
    codeflash_output = any_none("None")  # 990ns -> 522ns (89.7% faster)


def test_args_are_none_bytes():
    # Passing bytes b'None' should not trigger True
    codeflash_output = any_none(b"None")  # 967ns -> 544ns (77.8% faster)


def test_args_are_none_in_tuple():
    # Passing a tuple containing None should not trigger True
    codeflash_output = any_none((None,))  # 954ns -> 541ns (76.3% faster)


def test_args_are_none_in_list():
    # Passing a list containing None should not trigger True
    codeflash_output = any_none([None])  # 979ns -> 512ns (91.2% faster)


def test_args_are_none_in_dict():
    # Passing a dict containing None as value should not trigger True
    codeflash_output = any_none({"a": None})  # 959ns -> 654ns (46.6% faster)


def test_args_are_none_in_set():
    # Passing a set containing None should not trigger True
    codeflash_output = any_none({None})  # 967ns -> 581ns (66.4% faster)


def test_args_are_none_in_frozenset():
    # Passing a frozenset containing None should not trigger True
    codeflash_output = any_none(frozenset([None]))  # 919ns -> 517ns (77.8% faster)


def test_args_are_none_in_complex_structure():
    # Passing a complex structure containing None should not trigger True
    complex_structure = [{"a": [None, 2]}, (None, 3), {"b": None}]
    codeflash_output = any_none(complex_structure)  # 967ns -> 532ns (81.8% faster)


def test_args_are_none_as_keyword_argument():
    # any_none accepts only positional arguments (*args), so passing
    # keyword arguments should raise TypeError
    with pytest.raises(TypeError):
        any_none(a=None, b=1)  # 1.18μs -> 1.24μs (5.08% slower)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from __future__ import annotations

# imports
from pandas.core.common import any_none

# unit tests

# -------------------------
# 1. Basic Test Cases
# -------------------------


def test_no_arguments_returns_false():
    # No arguments: should return False (since there is no None)
    codeflash_output = any_none()  # 960ns -> 374ns (157% faster)


def test_all_non_none_arguments():
    # All arguments are non-None
    codeflash_output = any_none(
        1, "a", 0, False, [], {}, set()
    )  # 1.19μs -> 862ns (38.4% faster)


def test_one_none_among_others():
    # One argument is None
    codeflash_output = any_none(1, None, "test")  # 1.30μs -> 502ns (160% faster)


def test_multiple_nones():
    # Multiple arguments are None
    codeflash_output = any_none(None, None, 3)  # 1.17μs -> 434ns (170% faster)


def test_none_as_only_argument():
    # Only argument is None
    codeflash_output = any_none(None)  # 1.14μs -> 468ns (144% faster)


def test_first_argument_none():
    # First argument is None
    codeflash_output = any_none(None, 1, 2, 3)  # 1.10μs -> 456ns (142% faster)


def test_last_argument_none():
    # Last argument is None
    codeflash_output = any_none(1, 2, 3, None)  # 1.32μs -> 529ns (150% faster)


def test_middle_argument_none():
    # Middle argument is None
    codeflash_output = any_none(1, "x", None, "y")  # 1.25μs -> 512ns (144% faster)


# -------------------------
# 2. Edge Test Cases
# -------------------------


def test_all_arguments_none():
    # All arguments are None
    codeflash_output = any_none(None, None, None)  # 1.12μs -> 444ns (153% faster)


def test_zero_and_false_are_not_none():
    # 0 and False are not None
    codeflash_output = any_none(0, False, "")  # 1.06μs -> 540ns (95.9% faster)


def test_empty_collections_are_not_none():
    # Empty collections are not None
    codeflash_output = any_none([], {}, set(), "")  # 1.08μs -> 785ns (37.2% faster)


def test_mixed_types_with_none():
    # Mixed types, including None
    codeflash_output = any_none(0, [], None, {}, False)  # 1.29μs -> 546ns (136% faster)


def test_nested_none_not_detected():
    # None inside a list is not detected
    codeflash_output = any_none([None])  # 1.03μs -> 563ns (82.2% faster)


def test_none_in_dict_value_not_detected():
    # None as a value inside a dict is not detected
    codeflash_output = any_none({"a": None})  # 992ns -> 601ns (65.1% faster)


def test_none_in_tuple_argument():
    # None inside a tuple is not detected
    codeflash_output = any_none((None,))  # 987ns -> 522ns (89.1% faster)


def test_object_with_custom_eq_none():
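    # Object whose __eq__ returns True for None, but is not None.
    # `w is None` is False, while `None in (w,)` falls back to w == None,
    # which is True, so the two implementations can disagree on this input.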
    class Weird:
        def __eq__(self, other):
            return other is None

    w = Weird()
    codeflash_output = any_none(w)  # 1.26μs -> 921ns (36.6% faster)


def test_nan_is_not_none():
    # float('nan') is not None
    codeflash_output = any_none(float("nan"))  # 1.07μs -> 620ns (72.6% faster)


def test_bool_none_is_false():
    # bool(None) is False, but None is detected
    codeflash_output = any_none(None, False)  # 1.30μs -> 459ns (183% faster)


def test_bytearray_and_bytes_are_not_none():
    # bytearray and bytes are not None
    codeflash_output = any_none(b"", bytearray())  # 1.05μs -> 677ns (55.7% faster)


def test_none_in_args_only():
    # None only in args
    codeflash_output = any_none(1, 2, None)  # 1.67μs -> 624ns (167% faster)


def test_none_with_various_falsey_values():
    # None among various falsey values
    codeflash_output = any_none(
        0, False, None, "", [], {}
    )  # 1.42μs -> 540ns (163% faster)


# -------------------------
# 3. Large Scale Test Cases
# -------------------------


def test_large_no_none():
    # Large number of arguments, none are None
    args = [i for i in range(1000)]
    codeflash_output = any_none(*args)  # 22.1μs -> 6.53μs (238% faster)


def test_large_with_one_none_at_start():
    # Large number of arguments, first is None
    args = [None] + [i for i in range(999)]
    codeflash_output = any_none(*args)  # 3.02μs -> 2.24μs (35.0% faster)


def test_large_with_one_none_at_end():
    # Large number of arguments, last is None
    args = [i for i in range(999)] + [None]
    codeflash_output = any_none(*args)  # 22.2μs -> 6.51μs (242% faster)


def test_large_with_one_none_in_middle():
    # Large number of arguments, one None in the middle
    args = [i for i in range(500)] + [None] + [i for i in range(499)]
    codeflash_output = any_none(*args)  # 12.5μs -> 4.33μs (188% faster)


def test_large_many_nones():
    # Large number of arguments, many are None
    args = [None if i % 100 == 0 else i for i in range(1000)]
    codeflash_output = any_none(*args)  # 2.83μs -> 2.14μs (32.2% faster)


def test_large_all_none():
    # Large number of arguments, all are None
    args = [None] * 1000
    codeflash_output = any_none(*args)  # 2.75μs -> 2.61μs (5.29% faster)


def test_large_with_no_none_and_falsey_values():
    # Large number of arguments, all falsey but not None
    args = [0, False, "", [], {}, set()] * 166  # 996 elements
    args += [0, False, ""]  # 999 elements
    codeflash_output = any_none(*args)  # 21.8μs -> 8.22μs (165% faster)


# -------------------------
# Additional edge cases for robustness
# -------------------------


def test_unicode_and_none():
    # Unicode strings and None
    codeflash_output = any_none("αβγ", None, "δεζ")  # 1.25μs -> 484ns (158% faster)


def test_bytes_and_none():
    # Bytes and None
    codeflash_output = any_none(b"hello", None)  # 1.19μs -> 495ns (140% faster)


def test_object_is_none():
    # Passing None as an object
    obj = None
    codeflash_output = any_none(obj)  # 1.49μs -> 523ns (185% faster)


def test_object_not_none():
    # Passing a non-None object
    obj = object()
    codeflash_output = any_none(obj)  # 1.07μs -> 576ns (85.8% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
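
The generated tests above store results in `codeflash_output` without asserting on them. A minimal sketch of some of the same cases with explicit expectations follows. It defines a local stand-in `any_none` so that it runs without pandas, which is an assumption made for illustration; the real tests import the function from `pandas.core.common`.

```python
def any_none(*args) -> bool:
    # Local stand-in mirroring the optimized form described in this PR.
    return None in args


# (arguments, expected result) pairs drawn from the generated tests above.
CASES = [
    ((), False),
    ((1, "hello", [3, 4], {}), False),
    ((None, 2, 3), True),
    ((2, 3, None), True),
    ((False, 0, "", [], {}, ()), False),
    (([None], {"a": None}, ("None",)), False),
    ((float("nan"),), False),
    (tuple(range(999)) + (None,), True),
]


def check_cases():
    """Assert each expectation and return the number of cases checked."""
    for args, expected in CASES:
        result = any_none(*args)
        assert result is expected, f"any_none{args!r} returned {result!r}"
    return len(CASES)


print(check_cases())  # 8
```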

To edit these changes, run `git checkout codeflash/optimize-any_none-mi3z7b53` and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 18, 2025 02:50
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Nov 18, 2025