Rollup of 6 pull requests #151667
Per my understanding, this is needed for the mut access on the next line.
Use explicit SSE2 intrinsics to avoid LLVM's broken AVX-512 auto-vectorization, which generates ~31 kshiftrd instructions.

Performance
- AVX-512: 34-48x faster
- SSE2: 1.5-2x faster

Improves on earlier pr
The SSE2 helper function is not inlined across crate boundaries, so we cannot verify the codegen in an assembly test. The fix is still verified by the absence of performance regression.
…tests by default

A `codegen-llvm` test (and other codegen test mode tests) will now by default have an implied `//@ needs-target-std` directive, *unless* the test explicitly has an `#![no_std]`/`#![no_core]` attribute, which disables this implied behavior.

- When a test has both `#![no_std]`/`#![no_core]` and `//@ needs-target-std`, the explicit `//@ needs-target-std` directive will still cause the test to be ignored for targets that do not support std.

This is to make it easier to test out-of-tree targets / custom targets (and targets not tested in r-l/r CI) without requiring target maintainers to do a bunch of manual `//@ needs-target-std` busywork.

Co-authored-by: Edoardo Marangoni <ecmm@anche.no>
…erformance, r=folkertdev

Improve is_ascii performance on x86_64 with explicit SSE2 intrinsics

# Summary

Improves `slice::is_ascii` performance for SSE2 targets by roughly 1.5-2x on larger inputs. AVX-512 keeps similar performance characteristics.

This builds on the work already merged in rust-lang#151259. In particular, this PR improves the default SSE2 performance, so I no longer consider this a temporary fix.

Thanks to @folkertdev for pointing me to consider `as_chunk` again.

# The implementation:
- Uses 64-byte chunks with 4x 16-byte SSE2 loads OR'd together
- Extracts the MSB mask with a single `pmovmskb` instruction
- Falls back to usize-at-a-time SWAR for inputs < 64 bytes

# Performance impact (vs before rust-lang#151259):
- AVX-512: 34-48x faster
- SSE2: 1.5-2x faster

<details>
<summary>Benchmark Results (click to expand)</summary>

Benchmarked on AMD Ryzen 9 9950X (AVX-512 capable). Values show relative performance (1.00 = fastest). Tops out at 139 GB/s for large inputs.

### early_non_ascii

| Input Size | new_avx512 | new_sse2 | old_avx512 | old_sse2 |
|------------|------------|----------|------------|----------|
| 64 | 1.01 | **1.00** | 13.45 | 1.13 |
| 1024 | 1.01 | **1.00** | 13.53 | 1.14 |
| 65536 | 1.01 | **1.00** | 13.99 | 1.12 |
| 1048576 | 1.02 | **1.00** | 13.29 | 1.12 |

### late_non_ascii

| Input Size | new_avx512 | new_sse2 | old_avx512 | old_sse2 |
|------------|------------|----------|------------|----------|
| 64 | **1.00** | 1.01 | 13.37 | 1.13 |
| 1024 | 1.10 | **1.00** | 42.42 | 1.95 |
| 65536 | **1.00** | 1.06 | 42.22 | 1.73 |
| 1048576 | **1.00** | 1.03 | 34.73 | 1.46 |

### pure_ascii

| Input Size | new_avx512 | new_sse2 | old_avx512 | old_sse2 |
|------------|------------|----------|------------|----------|
| 4 | 1.03 | **1.00** | 1.75 | 1.32 |
| 8 | **1.00** | 1.14 | 3.89 | 2.06 |
| 16 | **1.00** | 1.04 | 1.13 | 1.62 |
| 32 | 1.07 | 1.19 | 5.11 | **1.00** |
| 64 | **1.00** | 1.13 | 13.32 | 1.57 |
| 128 | **1.00** | 1.01 | 19.97 | 1.55 |
| 256 | **1.00** | 1.02 | 27.77 | 1.61 |
| 1024 | **1.00** | 1.02 | 41.34 | 1.84 |
| 4096 | 1.02 | **1.00** | 45.61 | 1.98 |
| 16384 | 1.01 | **1.00** | 48.67 | 2.04 |
| 65536 | **1.00** | 1.03 | 43.86 | 1.77 |
| 262144 | **1.00** | 1.06 | 41.44 | 1.79 |
| 1048576 | 1.02 | **1.00** | 35.36 | 1.44 |

</details>

## Reproduction / Test Projects

Standalone validation tools: https://github.com/bonega/is-ascii-fix-validation
- `bench/` - Criterion benchmarks for SSE2 vs AVX-512 comparison
- `fuzz/` - Compares old/new implementations with libfuzzer

Relates to: llvm/llvm-project#176906
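For readers who want to see the shape of the approach listed under "The implementation" in the is_ascii PR description, here is a minimal sketch. It is not the code merged into `core`: the function names `is_ascii_sse2_sketch` and `is_ascii_swar_sketch` are hypothetical, it uses `chunks_exact` rather than `as_chunk`, and it applies the SWAR fallback to the tail after the 64-byte chunks, which is an assumption about how short inputs and remainders might be handled.

```rust
// Illustrative sketch only; names and structure are assumptions, not the merged code.

/// usize-at-a-time SWAR check: a byte is non-ASCII exactly when its top bit is set.
fn is_ascii_swar_sketch(bytes: &[u8]) -> bool {
    const WORD: usize = std::mem::size_of::<usize>();
    // A word with 0x80 in every byte position.
    const HIGH_BITS: usize = usize::from_ne_bytes([0x80; WORD]);

    let mut words = bytes.chunks_exact(WORD);
    for w in &mut words {
        let mut buf = [0u8; WORD];
        buf.copy_from_slice(w);
        if usize::from_ne_bytes(buf) & HIGH_BITS != 0 {
            return false;
        }
    }
    // Leftover bytes that do not fill a whole word.
    words.remainder().iter().all(|b| *b < 0x80)
}

#[cfg(target_arch = "x86_64")]
fn is_ascii_sse2_sketch(bytes: &[u8]) -> bool {
    use std::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_movemask_epi8, _mm_or_si128};

    let mut chunks = bytes.chunks_exact(64);
    for chunk in &mut chunks {
        let p = chunk.as_ptr() as *const __m128i;
        // SAFETY: SSE2 is part of the x86_64 baseline, and `chunk` is exactly
        // 64 bytes long, so the four unaligned 16-byte loads stay in bounds.
        let mask = unsafe {
            let a = _mm_loadu_si128(p);
            let b = _mm_loadu_si128(p.add(1));
            let c = _mm_loadu_si128(p.add(2));
            let d = _mm_loadu_si128(p.add(3));
            // OR the four vectors so any set top bit survives into one vector,
            // then extract the per-byte top bits (this compiles to `pmovmskb`).
            let combined = _mm_or_si128(_mm_or_si128(a, b), _mm_or_si128(c, d));
            _mm_movemask_epi8(combined)
        };
        if mask != 0 {
            return false;
        }
    }
    // Fewer than 64 bytes remain: fall back to the SWAR path.
    is_ascii_swar_sketch(chunks.remainder())
}

#[cfg(not(target_arch = "x86_64"))]
fn is_ascii_sse2_sketch(bytes: &[u8]) -> bool {
    // No SSE2 on this target; use the portable path for everything.
    is_ascii_swar_sketch(bytes)
}
```

OR-ing the four loads before a single mask extraction means each 64-byte chunk costs one branch, which is presumably part of why the explicit version avoids the mask-shuffling code the PR attributes to LLVM's auto-vectorizer.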
…r=joboet

Add missing mut to pin.rs docs

Per my understanding, this is needed for the mut access on the next line.
…=Zalathar

compiletest: add implied `needs-target-std` for `codegen` mode tests unless annotated with `#![no_std]`/`#![no_core]`

A `codegen` mode test (such as the `codegen-llvm` test suite) will now by default have an implied `//@ needs-target-std` directive, *unless* the test explicitly has an `#![no_std]`/`#![no_core]` attribute, which disables this behavior.

- When a test has both `#![no_std]`/`#![no_core]` and `//@ needs-target-std`, the explicit `//@ needs-target-std` directive will still cause the test to be ignored for targets that do not support std.

This is to make it easier to test out-of-tree targets / custom targets (and targets not tested in r-l/r CI) without requiring target maintainers to do a bunch of manual `//@ needs-target-std` busywork.

Context: [#t-compiler/help > `compiletest` cannot find `core` library for target != host](https://rust-lang.zulipchat.com/#narrow/channel/182449-t-compiler.2Fhelp/topic/.60compiletest.60.20cannot.20find.20.60core.60.20library.20for.20target.20!.3D.20host/with/568652419)

## Implementation remarks

This is an alternative version of rust-lang#150672, with some differences:

- *This* PR applies this implied-`needs-target-std` behavior to all `codegen` test mode tests.
- *This* PR does the synthetic directive injection in the same place as implied-`codegen-run` directives. Both are of course hacks, but at least they're together next to each other.
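The rule described in the compiletest PR above can be sketched as a small decision function. This is only an illustration of the stated behavior: the name `needs_target_std`, its parameters, and the simple line-prefix matching are assumptions, and compiletest's actual directive handling is not shown here.

```rust
// Hypothetical sketch; the function name and the line-based detection are
// assumptions, not compiletest's actual implementation.

/// Returns true when a test should be treated as requiring a target with `std`.
fn needs_target_std(is_codegen_mode: bool, test_source: &str) -> bool {
    let mut explicit_directive = false;
    let mut opts_out = false;

    for line in test_source.lines().map(str::trim) {
        if line.starts_with("//@ needs-target-std") {
            explicit_directive = true;
        }
        if line.starts_with("#![no_std]") || line.starts_with("#![no_core]") {
            opts_out = true;
        }
    }

    // An explicit directive always applies. Otherwise the directive is implied
    // for codegen-mode tests that did not opt out via `#![no_std]`/`#![no_core]`.
    explicit_directive || (is_codegen_mode && !opts_out)
}
```

Under this sketch, a codegen test with `#![no_core]` and no directive is not gated on std, while the same test with an explicit `//@ needs-target-std` still is, matching the bullet in the description.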
…aumeGomez

Add a `documentation` remapping path scope for rustdoc usage

This PR adds a new remapping path scope for rustdoc usage, `documentation`, instead of rustdoc abusing the other scopes for its usage.

Like remapping paths in rustdoc, this scope is unstable. (rustdoc doesn't yet even have an equivalent to [rustc `--remap-path-scope`](https://doc.rust-lang.org/nightly/rustc/remap-source-paths.html#--remap-path-scope)).

I also took the opportunity to add a bit of documentation to the rustdoc book.
Fix broken WASIp1 reference link

### Location (URL)

https://doc.rust-lang.org/rustc/platform-support/wasm32-wasip1.html

<img width="800" alt="image" src="https://github.com/user-attachments/assets/b9402b3a-db7b-405f-b4ef-d849c03ad893" />

### Summary

The WASIp1 reference link in the `wasm32-wasip1` platform documentation currently points to a path that no longer exists in the WASI repository.

The WASI project recently migrated the WASI 0.1 (preview1) documentation from the `legacy/preview1` directory to the dedicated `wasi-0.1` branch (WebAssembly/WASI#855).

This updates the link to point to the intended historical WASIp1 reference, which matches the documented intent of the `wasm32-wasip1` target.
…obzol

Update `sysinfo` version to `0.38.0`

Some bugfixes and added support for NetBSD.

r? @Kobzol
Rollup of everything.

@bors r+ rollup=never p=5

Checking the probably-flaky failure from #151664 (comment):

@bors try jobs=i686-gnu-1
Rollup of 6 pull requests try-job: i686-gnu-1
📌 Perf builds for each rolled up PR:

previous master: fb292b75fb

In the case of a perf regression, run the following command for each PR you suspect might be the cause:
What is this?
This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing fb292b7 (parent) -> 0462e8f (this PR)

Test differences
Show 79 test diffs
Stage 0
Stage 1
Stage 2
Additionally, 64 doctest diffs were found. These are ignored, as they are noisy.

Job group index

Test dashboard
Run
cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard 0462e8f7e51f20692b02d68efee68bb28a6f4457 --output-dir test-dashboard
And then open

Job duration changes
How to interpret the job duration changes?
Job durations can vary a lot, based on the actual runner instance
Finished benchmarking commit (0462e8f): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count
This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)
Results (primary 2.3%, secondary -0.9%)
A less reliable metric. May be of interest, but not used to determine the overall result above.

Cycles
Results (primary -1.0%)
A less reliable metric. May be of interest, but not used to determine the overall result above.

Binary size
Results (primary 0.1%)
A less reliable metric. May be of interest, but not used to determine the overall result above.

Bootstrap: 475.284s -> 473.061s (-0.47%)
Successful merges:
- #151294 (compiletest: add implied `needs-target-std` for `codegen` mode tests unless annotated with `#![no_std]`/`#![no_core]`)
- #151589 (Add a `documentation` remapping path scope for rustdoc usage)
- #151645 (Update `sysinfo` version to `0.38.0`)

r? @ghost