perf: skip bound checks in take native for better performance #9277
|
run benchmark take |
|
🤖 Hi @rluvaton, thanks for the request (#9277 (comment)).
Please choose one or more of these with |
|
run benchmark take_kernels |
|
The failing tests are out-of-bounds access tests; such access is now undefined behavior rather than a panic |
let index = index.as_usize();
// Safety: we either checked already bounds (passed check_bounds = true) or the user
// guarantees the value to be in range.
// Avoiding bound checks allows the compiler to vectorize it and do better loop unrolling
We can't just do this, as we get the indices from the user / other safe code (and we don't check bounds by default).
This is why I want first to merge:
Maybe we can move that conversation there?
We can't just make it unsafe, as that would make safe code UB. See also #8879 |
|
show benchmark queue |
|
🤖 Hi @rluvaton, you asked to view the benchmark queue (#9277 (comment)).
|
I will add validation up front and see what the performance impact is |
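A rough sketch of what validating first and then using unchecked access could look like. This is not the arrow-rs code: `take_validated`, the `i32`/`u32` types, and the `String` error are all assumptions made for illustration. Whether the extra pass over the indices keeps the speedup is exactly what the benchmark would need to show.

```rust
/// Hypothetical helper, not part of arrow-rs: check every index once,
/// then gather without per-element bounds checks.
pub fn take_validated(values: &[i32], indices: &[u32]) -> Result<Vec<i32>, String> {
    let len = values.len();

    // Single validation pass: report the first index that is out of range.
    if let Some(&bad) = indices.iter().find(|&&i| i as usize >= len) {
        return Err(format!("index {bad} out of bounds for length {len}"));
    }

    // SAFETY: the pass above proved that every index is < values.len().
    let out = indices
        .iter()
        .map(|&i| unsafe { *values.get_unchecked(i as usize) })
        .collect();
    Ok(out)
}
```

With this shape the function stays safe to call from safe code, because an out-of-range index is turned into an error before the unchecked loop runs.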
|
Show benchmark queue |
|
🤖 Hi @rluvaton, you asked to view the benchmark queue (#9277 (comment)).
|
Which issue does this PR close?
None
Rationale for this change
Make the take kernel faster when building native values (take_native).
You can see in Godbolt that the compiler vectorizes and unrolls the loop when using unchecked access.
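For reference, a minimal sketch of the two loop shapes being compared. The function names and the `i32`/`u32` types are hypothetical stand-ins, not the actual arrow-rs kernel. The checked version keeps a panic branch per element; the unchecked version removes it, which is what lets the optimizer treat the loop body as straight-line code.

```rust
/// Hypothetical checked gather: `values[i]` panics on an out-of-range index.
pub fn take_checked(values: &[i32], indices: &[u32]) -> Vec<i32> {
    indices.iter().map(|&i| values[i as usize]).collect()
}

/// Hypothetical unchecked gather: no per-element bounds check.
///
/// # Safety
/// Every index must be strictly less than `values.len()`;
/// otherwise the behavior is undefined.
pub unsafe fn take_unchecked(values: &[i32], indices: &[u32]) -> Vec<i32> {
    indices
        .iter()
        .map(|&i| unsafe { *values.get_unchecked(i as usize) })
        .collect()
}
```

Note that the sketch marks the unchecked variant as `unsafe fn`, so the in-range requirement is part of the caller's contract rather than something safe code can violate silently.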
What changes are included in this PR?
Use get_unchecked in take_native
Are these changes tested?
Existing tests
Are there any user-facing changes?
Undefined behavior if an index is out of range
See: