-
-
Notifications
You must be signed in to change notification settings - Fork 14.4k
Rollup of 5 pull requests #151727
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rollup of 5 pull requests #151727
Conversation
Refactor the current functionality into a helper function Use `as_chunks` to encourage auto-vectorization in the optimized chunk processing function Add a codegen test Add benches for `eq_ignore_ascii_case` The optimized function is initially only enabled for x86_64 which has `sse2` as part of its baseline, but none of the code is platform specific. Other platforms with SIMD instructions may also benefit from this implementation. Performance improvements only manifest for slices of 16 bytes or longer, so the optimized path is gated behind a length check for greater than or equal to 16.
Refactor the eq check into an inner function for reuse in tail checking Rather than fall back to the simple implementation for tail handling, load the last 16 bytes to take advantage of vectorization. This doesn't seem to negatively impact check time even when the remainder count is low.
Add #[inline(always)] to inner function and check not for filecheck test
Add comments for the optimized function invariants to the caller Add const-hack fixme for using while-loops Document the invariant for the `_chunks` function Add a debug assert for the tail handling invariant
Remove copyright notices for files licensed under the standard terms (MIT OR Apache-2.0).
I tried removing it in rust-lang#151203, to replace it with something simpler. But a couple of fuzzing failures have come up and I don't have a clear picture on how to fix them. So I'm reverting the main part of rust-lang#151203. This commit also adds the two fuzzing tests. Fixes rust-lang#151226, rust-lang#151358.
They currently aren't used because r-a didn't support them, but r-a support was recently merged in rust-lang/rust-analyzer#21243.
…youxu Try to reduce rustdoc GUI tests flakyness Should help with rust-lang#93784. I replaced a use of `puppeteer.wait` function with a loop instead (like the rest of `browser-ui-test`). r? @jieyouxu
…=scottmcm
slice/ascii: Optimize `eq_ignore_ascii_case` with auto-vectorization
- Refactor the current functionality into a helper function
- Use `as_chunks` to encourage auto-vectorization in the optimized chunk processing function
- Add a codegen test checking for vectorization and no panicking
- Add benches for `eq_ignore_ascii_case`
---
The optimized function is initially only enabled for x86_64 which has `sse2` as part of its baseline, but none of the code is platform specific. Other platforms with SIMD instructions may also benefit from this implementation.
Performance improvements only manifest for slices of 16 bytes or longer, so the optimized path is gated behind a length check for greater than or equal to 16.
Benchmarks - Cases below 16 bytes are unaffected, cases above all show sizeable improvements.
```
before:
str::eq_ignore_ascii_case::bench_large_str_eq 4942.30ns/iter +/- 48.20
str::eq_ignore_ascii_case::bench_medium_str_eq 632.01ns/iter +/- 16.87
str::eq_ignore_ascii_case::bench_str_17_bytes_eq 16.28ns/iter +/- 0.45
str::eq_ignore_ascii_case::bench_str_31_bytes_eq 35.23ns/iter +/- 2.28
str::eq_ignore_ascii_case::bench_str_of_8_bytes_eq 7.56ns/iter +/- 0.22
str::eq_ignore_ascii_case::bench_str_under_8_bytes_eq 2.64ns/iter +/- 0.06
after:
str::eq_ignore_ascii_case::bench_large_str_eq 611.63ns/iter +/- 28.29
str::eq_ignore_ascii_case::bench_medium_str_eq 77.10ns/iter +/- 19.76
str::eq_ignore_ascii_case::bench_str_17_bytes_eq 3.49ns/iter +/- 0.39
str::eq_ignore_ascii_case::bench_str_31_bytes_eq 3.50ns/iter +/- 0.27
str::eq_ignore_ascii_case::bench_str_of_8_bytes_eq 7.27ns/iter +/- 0.09
str::eq_ignore_ascii_case::bench_str_under_8_bytes_eq 2.60ns/iter +/- 0.05
```
Reintroduce `QueryStackFrame` split. I tried removing it in rust-lang#151203, to replace it with something simpler. But a couple of fuzzing failures have come up and I don't have a clear picture on how to fix them. So I'm reverting the main part of rust-lang#151203. This commit also adds the two fuzzing tests. Fixes rust-lang#151226, rust-lang#151358. r? @oli-obk
…ts-query-key, r=Noratrieb Use an associated type default for `Key::Cache`. They currently aren't used because r-a didn't support them, but r-a support was recently merged in rust-lang/rust-analyzer#21243. r? @Noratrieb
…jieyouxu Omit standard copyright notice Remove copyright notices for files licensed under the standard terms (MIT OR Apache-2.0).
|
Rollup of everything. @bors r+ rollup=never p=5 |
This comment has been minimized.
This comment has been minimized.
|
📌 Perf builds for each rolled up PR:
previous master: ebf13cca58 In the case of a perf regression, run the following command for each PR you suspect might be the cause: |
What is this?This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.Comparing ebf13cc (parent) -> 78df2f9 (this PR) Test differencesShow 38 test diffsStage 1
Stage 2
Additionally, 18 doctest diffs were found. These are ignored, as they are noisy. Job group index
Test dashboardRun cargo run --manifest-path src/ci/citool/Cargo.toml -- \
test-dashboard 78df2f92de1da3601d967dc8beb9f9cea267e45f --output-dir test-dashboardAnd then open Job duration changes
How to interpret the job duration changes?Job durations can vary a lot, based on the actual runner instance |
|
Finished benchmarking commit (78df2f9): comparison URL. Overall result: ❌✅ regressions and improvements - please read the text belowOur benchmarks found a performance regression caused by this PR. Next Steps:
@rustbot label: +perf-regression Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary 3.6%, secondary 2.3%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (secondary -0.9%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary -0.0%, secondary 0.0%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 472.178s -> 472.873s (0.15%) |
|
The regressions are very tiny and only on two secondary benchmarks, I don't think we have to investigate further. @rustbot label: +perf-regression-triaged |
Successful merges:
eq_ignore_ascii_casewith auto-vectorization #147436 (slice/ascii: Optimizeeq_ignore_ascii_casewith auto-vectorization)QueryStackFramesplit. #151390 (ReintroduceQueryStackFramesplit.)Key::Cache. #151097 (Use an associated type default forKey::Cache.)r? @ghost
Create a similar rollup