Conversation

@5000user5000 (Owner)

Based on the investigation in issue #30, LUT GEMM was found to suffer from high L1 cache miss rates, making memory latency, rather than computation, the performance bottleneck. Several optimizations were applied: prefetching the LUT into the L1 cache, and tiling the matrices so that no chunk is large enough to thrash the cache. Multithreading was also introduced to further improve performance, and the original naive implementation was updated to support it as well, making it easier to compare the two versions under a fixed thread count.
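A minimal sketch of what the prefetch-plus-tiling combination looks like in a LUT-based GEMM inner loop. All names here (`lut_gemm_tiled`, `TILE`, the index layout) are illustrative, not the PR's actual code; the point is only the shape of the optimization: prefetch the table before each tile, then keep the working set tile-sized.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative sketch (not the PR's code): accumulate c[n] += LUT[w[k][n]]
// over k, where w holds small integer weight indices and the LUT holds
// precomputed partial products. Table lookups replace multiplications.

constexpr int TILE = 64;  // tile edge chosen so a tile's working set fits in L1

void lut_gemm_tiled(const float* lut, int lut_len,
                    const uint8_t* w, float* c, int K, int N) {
    for (int k0 = 0; k0 < K; k0 += TILE) {
        // Prefetch the LUT so the inner-loop gathers hit L1 instead of DRAM.
        for (int i = 0; i < lut_len; i += 16)
            __builtin_prefetch(lut + i, /*rw=*/0, /*locality=*/3);
        for (int n0 = 0; n0 < N; n0 += TILE) {
            // Tiled loops: each (k,n) tile touches a bounded slice of w and c,
            // avoiding the overly large chunks that caused cache misses.
            const int kmax = std::min(k0 + TILE, K);
            const int nmax = std::min(n0 + TILE, N);
            for (int k = k0; k < kmax; ++k)
                for (int n = n0; n < nmax; ++n)
                    c[n] += lut[w[k * N + n]];  // table lookup, no multiply
        }
    }
}
```

For multithreading, the tile structure also gives a natural parallel split: different threads can own disjoint ranges of `n0` tiles, since each writes a disjoint slice of `c`.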
Lastly, tests/test_matrix_ops.cpp was added to help dump the assembly of both the naive (float) and LUT GEMM versions for comparison.

@5000user5000 (Owner, Author)

@yungyuc
I've already looked into Monday's request — identifying why the LUT implementation performs similarly to, or even slightly worse than, the naive version. In this PR, I've also made efforts to mitigate some of LUT's drawbacks, such as excessive L1 cache misses.

@yungyuc commented May 7, 2025

Good findings. When adding them to the presentation, make sure you address all the assessment points for grading.

Where in the code do you want my review? Please add inline annotations.

Don't hesitate to merge the PR. Review and discussions still work after merging.

@5000user5000 5000user5000 merged commit f97df84 into main May 8, 2025
2 checks passed
