⚡️ Speed up function _leapfrog_step_body_tf by 589%
#1080
+3
−1
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
📄 589% (5.89x) speedup for
_leapfrog_step_body_tfincode_to_optimize/sample_code.py⏱️ Runtime :
573 milliseconds→83.2 milliseconds(best of28runs)📝 Explanation and details
The optimized code achieves a 588% speedup (from 573ms to 83.2ms) through two key optimizations:
Primary Optimizations
XLA Compilation (
@tf.function(jit_compile=True))_leapfrog_compute_accelerations_tfand_leapfrog_step_body_tftf.einsumfor Acceleration Calculationtf.reduce_sum(tf.expand_dims(force_factor, -1) * diff, axis=1)withtf.einsum('ij,ijk->ik', force_factor, diff)expand_dimsand allows XLA to optimize the contraction more effectivelyWhy This Works
The original line profiler shows that
_leapfrog_compute_accelerations_tfconsumed 94.7% of the total runtime in_leapfrog_step_body_tf. Within this function:tf.whereoperation took 51.9% of timetf.reduce_sumfor distance calculation took 33.8%XLA compilation dramatically reduces this overhead by:
Test Case Performance
All test cases show consistent 13-14x speedups (1300-1400% improvements), indicating the optimization is uniformly effective across:
The optimization particularly benefits scenarios with repeated calls (like the 50-step sequential test showing 245% speedup total), as XLA compilation overhead is amortized across multiple invocations.
✅ Correctness verification report:
🌀 Click to see Generated Regression Tests
To edit these changes
git checkout codeflash/optimize-_leapfrog_step_body_tf-mkgou0lqand push.