⚡️ Speed up method ViTSTR.compute_loss by 6%
#4
📄 6% (0.06x) speedup for `ViTSTR.compute_loss` in `doctr/models/recognition/vitstr/pytorch.py`

⏱️ Runtime: 4.25 milliseconds → 4.03 milliseconds (best of 183 runs)

📝 Explanation and details
The optimized code achieves a roughly 6% speedup through three key improvements:
What optimizations were applied:
- Uses a new variable `seq_len_` instead of modifying the input `seq_len` tensor in place, preventing potential memory allocation overhead from tensor mutation.
- Replaces `cce[mask_2d] = 0` with `cce.masked_fill_(mask_2d, 0)`, which uses PyTorch's optimized in-place masking operation.
- Builds the mask in explicit steps, `row_range = torch.arange(...)` and `mask_2d = row_range.unsqueeze(0) >= seq_len_.unsqueeze(1)`, to avoid repeated tensor indexing operations.

Why these optimizations work:

- The `masked_fill_` operation is a specialized PyTorch kernel that is faster than general tensor assignment for zeroing masked elements.

Performance characteristics:
The optimizations show consistent 6-15% improvements across varied sequence lengths and batch sizes, with particularly strong gains on:
The changes preserve all original behavior and error handling while delivering measurable performance gains across the full range of typical ViTSTR loss computation scenarios.
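The masking change described above can be illustrated with a minimal, self-contained sketch. It is not the actual `ViTSTR.compute_loss` code: the function names `mask_by_index_assignment` and `mask_by_masked_fill` and the toy tensors are hypothetical, and only `cce`, `seq_len`, `row_range` and `mask_2d` are taken from the description. The sketch assumes `cce` is a 2-D per-position loss tensor of shape `(batch, steps)` and `seq_len` holds one valid length per sample.

```python
import torch


def mask_by_index_assignment(cce: torch.Tensor, seq_len: torch.Tensor) -> torch.Tensor:
    """Hypothetical baseline: zero padded positions via boolean-index assignment."""
    out = cce.clone()  # work on a copy so the caller's tensor is untouched
    mask_2d = torch.arange(out.shape[1], device=out.device)[None, :] >= seq_len[:, None]
    out[mask_2d] = 0
    return out


def mask_by_masked_fill(cce: torch.Tensor, seq_len: torch.Tensor) -> torch.Tensor:
    """Hypothetical variant: same mask, applied with masked_fill_."""
    out = cce.clone()
    # Column indices 0..steps-1, created once and broadcast against the lengths.
    row_range = torch.arange(out.shape[1], device=out.device)
    # True wherever the column index lies at or beyond the sample's valid length.
    mask_2d = row_range.unsqueeze(0) >= seq_len.unsqueeze(1)
    out.masked_fill_(mask_2d, 0)
    return out


# Toy input: two samples, four decoding steps, valid lengths 1 and 3.
cce = torch.ones(2, 4)
seq_len = torch.tensor([1, 3])

baseline = mask_by_index_assignment(cce, seq_len)
optimized = mask_by_masked_fill(cce, seq_len)
print(torch.equal(baseline, optimized))
```

Both functions produce the same masked tensor, so any difference between them is in runtime only; the size of that difference will depend on tensor shapes, device and PyTorch version.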
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, `git checkout codeflash/optimize-ViTSTR.compute_loss-mg7j7c4h` and push.