⚡️ Speed up function nested_to_record by 364%
#343
📄 364% (3.64x) speedup for `nested_to_record` in `pandas/io/json/_normalize.py`

⏱️ Runtime: 10.1 milliseconds → 2.17 milliseconds (best of 67 runs)

📝 Explanation and details
The key optimization is replacing `copy.deepcopy(d)` with `dict(d)` on line 67, which accounts for the 364% speedup.

What changed:
- `copy.deepcopy(d)` was replaced with `dict(d)` to create a shallow copy instead of a deep copy
- The `copy` module import was removed since it is no longer needed

Why this optimization works:
The original code used `deepcopy` unnecessarily because the algorithm only mutates the top-level keys of the copied dictionary during flattening. When recursing into nested dictionaries, those nested values are popped from the copy of the parent and replaced with flattened key-value pairs. Since nested dictionaries are never modified in place (only removed and replaced), a shallow copy is sufficient and much faster.
- `deepcopy` recursively copies all nested objects, which is expensive for deeply nested structures.
- `dict(d)` only copies the top-level key-value pairs, leaving nested objects as references, which is exactly what is needed here.

Performance impact based on test results:
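The per-test figures are not reproduced here. As a rough illustration of where the time goes, the sketch below contrasts the two copy strategies on a small made-up record. The `record` data and the `time_copy` helper are assumptions for illustration only, not part of the PR or of pandas, and absolute timings will vary by machine.

```python
import copy
import timeit

# Hypothetical nested record standing in for one JSON object.
record = {
    "id": 1,
    "user": {"name": "a", "address": {"city": "x", "geo": {"lat": 0.0, "lon": 0.0}}},
    "tags": ["p", "q", "r"],
}

shallow = dict(record)
deep = copy.deepcopy(record)

# dict() builds a new top-level mapping; nested objects are shared references.
assert shallow is not record and shallow["user"] is record["user"]
# deepcopy() rebuilds every nested container, which is where the cost comes from.
assert deep["user"] is not record["user"] and deep == record


def time_copy(fn, number=2000):
    """Return the total seconds taken by `number` calls of fn(record)."""
    return timeit.timeit(lambda: fn(record), number=number)


if __name__ == "__main__":
    print(f"dict(d):          {time_copy(dict):.6f} s")
    print(f"copy.deepcopy(d): {time_copy(copy.deepcopy):.6f} s")
```

The gap should widen as nesting depth grows, since `deepcopy` visits every nested object while `dict(d)` touches only the top-level entries.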
Hot path considerations:
Based on the function references, `nested_to_record` is called from `json_normalize`, which is a primary pandas JSON processing function. It's used both directly for simple normalization and within recursive extraction for complex record paths. This optimization significantly benefits any JSON data processing workflows in pandas, especially those dealing with nested structures or large datasets.

The optimization is particularly effective for the common pandas use case of flattening JSON data with moderate to deep nesting, where the original `deepcopy` overhead dominated execution time.

✅ Correctness verification report:
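The argument that a shallow copy preserves behavior can be checked on a small scale. The following is a simplified sketch of a flattener that uses `dict(d)`. It is not the pandas implementation: the `flatten` function is a hypothetical stand-in that assumes string keys and omits `max_level` handling.

```python
import copy


def flatten(d, prefix="", sep=".", level=0):
    """Flatten nested dicts into a single level of sep-joined keys (sketch)."""
    new_d = dict(d)  # shallow copy: only this copy's top-level keys are changed
    for k, v in d.items():
        newkey = k if level == 0 else prefix + sep + k
        if not isinstance(v, dict):
            if level != 0:
                # Rename the key on the copy; the value itself is untouched.
                new_d[newkey] = new_d.pop(k)
            continue
        # Remove the nested dict from the copy and splice in its flattened form.
        new_d.pop(k)
        new_d.update(flatten(v, newkey, sep, level + 1))
    return new_d


original = {"a": 1, "b": {"c": 2, "d": {"e": 3}}, "tags": ["x"]}
snapshot = copy.deepcopy(original)  # used only to verify non-mutation

flat = flatten(original)

assert flat == {"a": 1, "b.c": 2, "b.d.e": 3, "tags": ["x"]}
assert original == snapshot  # the input is left unchanged
assert flat["tags"] is original["tags"]  # non-dict values are shared by reference
```

Each recursive call works on its own shallow copy, so the caller's dictionaries are never mutated. Non-dict values such as lists remain shared between input and output, which is the expected consequence of copying only the top level.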
⚙️ Existing Unit Tests and Runtime
- io/json/test_normalize.py::TestNestedToRecord.test_donot_drop_nonevalues
- io/json/test_normalize.py::TestNestedToRecord.test_flat_stays_flat
- io/json/test_normalize.py::TestNestedToRecord.test_nested_flattens
- io/json/test_normalize.py::TestNestedToRecord.test_nonetype_multiple_levels
- io/json/test_normalize.py::TestNestedToRecord.test_nonetype_top_level_bottom_level
- io/json/test_normalize.py::TestNestedToRecord.test_one_level_deep_flattens
- io/json/test_normalize.py::TestNestedToRecord.test_with_large_max_level
- io/json/test_normalize.py::TestNestedToRecord.test_with_max_level

🌀 Generated Regression Tests and Runtime
To edit these changes, `git checkout codeflash/optimize-nested_to_record-mi3x6q78` and push.