⚡️ Speed up function merge_strings by 17%
#1
📄 17% (0.17x) speedup for `merge_strings` in `doctr/models/recognition/utils.py`
⏱️ Runtime: 7.12 milliseconds → 6.09 milliseconds (best of 208 runs)
📝 Explanation and details
The optimized code achieves a 17% speedup through several targeted micro-optimizations:
Key Performance Improvements:
- Eliminated the expensive list comprehension for Hamming distance calculation: The original code used a list comprehension that called `Hamming.distance()` for every potential overlap (37.8% of total time). The optimized version replaces it with a manual loop that checks string equality before calling Hamming, so the distance calculation is skipped when the two substrings are identical.
- Pre-cached string lengths: Added `len_a_crop` and `len_b_crop` variables to avoid repeated `len()` calls during substring operations.
- Replaced lambda-based `min()` with a manual loop: The original code used `min(zero_matches, key=lambda x: abs(x - expected_overlap))`, which was expensive (13.7% of time). The optimized version uses a simple loop to find the minimum, eliminating the per-element function call overhead.
- Optimized final scoring loop: Instead of building a `combined_scores` list and then finding its minimum index (5.9% of time), the optimized code scans the scores once, tracking the best score and its index directly.
- Manual zero-matches collection: Replaced the list comprehension that finds zero scores with a manual loop and `append()`, reducing overhead.
Performance Characteristics by Test Case:
The optimizations are most effective for cases with longer strings and repeated character patterns, where the string equality checks can bypass expensive Hamming distance calculations.
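The two main ideas described above (the equality check before the distance call, and the single-pass search that replaces the lambda-based `min()`) can be illustrated with a minimal sketch. This is not the code from `doctr/models/recognition/utils.py`: the helper names `hamming_distance`, `overlap_scores`, and `closest_zero_overlap` are hypothetical, and a pure-Python distance function stands in for the library's `Hamming.distance()` so that the example is self-contained.

```python
from __future__ import annotations


def hamming_distance(a: str, b: str) -> int:
    """Hypothetical stand-in for a library Hamming distance on equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(ch_a != ch_b for ch_a, ch_b in zip(a, b))


def overlap_scores(a: str, b: str) -> list[int]:
    """Score every candidate overlap between the end of `a` and the start of `b`."""
    len_a, len_b = len(a), len(b)  # cache the lengths once instead of calling len() in the loop
    scores = []
    for size in range(1, min(len_a, len_b) + 1):
        suffix = a[len_a - size:]
        prefix = b[:size]
        # Equality shortcut: identical substrings have distance 0,
        # so the distance function is only called when they differ.
        if suffix == prefix:
            scores.append(0)
        else:
            scores.append(hamming_distance(suffix, prefix))
    return scores


def closest_zero_overlap(scores: list[int], expected_overlap: int) -> int | None:
    """Return the zero-score overlap size closest to `expected_overlap`, or None.

    A single pass tracks the best candidate directly, instead of building a
    list of zero matches and calling min() with a lambda key.
    """
    best_size = None
    best_gap = None
    for index, score in enumerate(scores):
        if score != 0:
            continue
        size = index + 1  # scores[0] corresponds to an overlap of one character
        gap = abs(size - expected_overlap)
        # Strict comparison keeps the first candidate on ties, as min() would.
        if best_gap is None or gap < best_gap:
            best_size, best_gap = size, gap
    return best_size
```

For example, `overlap_scores("abcab", "abxyz")` scores the two-character overlap `"ab"` as zero without calling the distance helper, and `closest_zero_overlap` then picks that overlap size. Whether the shortcut pays off in practice depends on how often the compared substrings are actually identical, which matches the observation above about repeated character patterns.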
✅ Correctness verification report:
⚙️ Existing Unit Tests and Runtime
`common/test_models_recognition_utils.py::test_merge_strings`
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-merge_strings-mg7ihvjw` and push.