Rollup of 4 pull requests #151664
Per my understanding, needed for mut access next line.
Use explicit SSE2 intrinsics to avoid LLVM's broken AVX-512 auto-vectorization, which generates ~31 `kshiftrd` instructions.

Performance:
- AVX-512: 34-48x faster
- SSE2: 1.5-2x faster

Improves on earlier PR.
The SSE2 helper function is not inlined across crate boundaries, so we cannot verify the codegen in an assembly test. The fix is still verified by the absence of performance regression.
refactor rustc-hash integration

I found that rustc-hash is used in multiple compiler crates. Also some types use `FxBuildHasher` whereas others use `BuildHasherDefault<FxHasher>` (both do the same thing). In order to simplify future hashing experiments, I changed every location to use `rustc_data_structures::fx::*` types instead, and also removed the `BuildHasherDefault` variant. This will simplify future experiments with hashing (for example trying out a hasher that doesn't implement `Default` for whatever reason).
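To illustrate the last point, here is a minimal sketch using only the standard library. `SeededHasher`, `SeededBuildHasher`, `SeededHashMap`, and `new_seeded_map` are hypothetical names invented for this example; they are not part of rustc-hash or `rustc_data_structures`. The hasher deliberately has no `Default` impl, so `BuildHasherDefault<SeededHasher>` could not be used as a map's hash builder, while a dedicated `BuildHasher` type hidden behind one alias still works.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasher, Hasher};

/// Hypothetical toy hasher (FNV-1a style mixing). It is NOT FxHasher.
/// It intentionally does not implement `Default`, because it needs a seed.
pub struct SeededHasher {
    state: u64,
}

impl Hasher for SeededHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.state ^= u64::from(b);
            self.state = self.state.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }

    fn finish(&self) -> u64 {
        self.state
    }
}

/// A dedicated builder. Unlike `BuildHasherDefault<H>`, it places no
/// `Default` requirement on the hasher it constructs.
#[derive(Clone, Copy)]
pub struct SeededBuildHasher {
    seed: u64,
}

impl BuildHasher for SeededBuildHasher {
    type Hasher = SeededHasher;

    fn build_hasher(&self) -> SeededHasher {
        SeededHasher { state: self.seed }
    }
}

/// Call sites name only this alias, so swapping the hasher for an
/// experiment means editing a single definition.
pub type SeededHashMap<K, V> = HashMap<K, V, SeededBuildHasher>;

/// Hypothetical constructor for the alias above.
pub fn new_seeded_map<K, V>(seed: u64) -> SeededHashMap<K, V> {
    HashMap::with_hasher(SeededBuildHasher { seed })
}
```

A map created with `new_seeded_map(42)` behaves like any other `HashMap`. Code that spelled out `BuildHasherDefault<...>` at each use site would need to change everywhere before such a hasher could be tried.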
…erformance, r=folkertdev

Improve is_ascii performance on x86_64 with explicit SSE2 intrinsics

# Summary

Improves `slice::is_ascii` performance for SSE2 targets by roughly 1.5-2x on larger inputs. AVX-512 keeps similar performance characteristics.

This builds on the work already merged in rust-lang#151259. In particular, this PR improves the default SSE2 performance; I don't consider this a temporary fix anymore. Thanks to @folkertdev for pointing me to consider `as_chunk` again.

# The implementation:

- Uses 64-byte chunks with 4x 16-byte SSE2 loads OR'd together
- Extracts the MSB mask with a single `pmovmskb` instruction
- Falls back to usize-at-a-time SWAR for inputs < 64 bytes

# Performance impact (vs before rust-lang#151259):

- AVX-512: 34-48x faster
- SSE2: 1.5-2x faster

<details>
<summary>Benchmark Results (click to expand)</summary>

Benchmarked on AMD Ryzen 9 9950X (AVX-512 capable). Values show relative performance (1.00 = fastest). Tops out at 139GB/s for large inputs.

### early_non_ascii

| Input Size | new_avx512 | new_sse2 | old_avx512 | old_sse2 |
|------------|------------|----------|------------|----------|
| 64 | 1.01 | **1.00** | 13.45 | 1.13 |
| 1024 | 1.01 | **1.00** | 13.53 | 1.14 |
| 65536 | 1.01 | **1.00** | 13.99 | 1.12 |
| 1048576 | 1.02 | **1.00** | 13.29 | 1.12 |

### late_non_ascii

| Input Size | new_avx512 | new_sse2 | old_avx512 | old_sse2 |
|------------|------------|----------|------------|----------|
| 64 | **1.00** | 1.01 | 13.37 | 1.13 |
| 1024 | 1.10 | **1.00** | 42.42 | 1.95 |
| 65536 | **1.00** | 1.06 | 42.22 | 1.73 |
| 1048576 | **1.00** | 1.03 | 34.73 | 1.46 |

### pure_ascii

| Input Size | new_avx512 | new_sse2 | old_avx512 | old_sse2 |
|------------|------------|----------|------------|----------|
| 4 | 1.03 | **1.00** | 1.75 | 1.32 |
| 8 | **1.00** | 1.14 | 3.89 | 2.06 |
| 16 | **1.00** | 1.04 | 1.13 | 1.62 |
| 32 | 1.07 | 1.19 | 5.11 | **1.00** |
| 64 | **1.00** | 1.13 | 13.32 | 1.57 |
| 128 | **1.00** | 1.01 | 19.97 | 1.55 |
| 256 | **1.00** | 1.02 | 27.77 | 1.61 |
| 1024 | **1.00** | 1.02 | 41.34 | 1.84 |
| 4096 | 1.02 | **1.00** | 45.61 | 1.98 |
| 16384 | 1.01 | **1.00** | 48.67 | 2.04 |
| 65536 | **1.00** | 1.03 | 43.86 | 1.77 |
| 262144 | **1.00** | 1.06 | 41.44 | 1.79 |
| 1048576 | 1.02 | **1.00** | 35.36 | 1.44 |

</details>

## Reproduction / Test Projects

Standalone validation tools: https://github.com/bonega/is-ascii-fix-validation

- `bench/` - Criterion benchmarks for SSE2 vs AVX-512 comparison
- `fuzz/` - Compares old/new implementations with libfuzzer

Relates to: llvm/llvm-project#176906
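The three implementation bullets in this description can be sketched roughly as follows. This is an illustration written for this page, not the code merged into `core`: the names `is_ascii_sketch`, `chunk_is_ascii`, and `swar_is_ascii` are hypothetical, and the real implementation may differ in chunking, inlining, and how the tail is handled. Only `std::arch::x86_64` intrinsics that map to the instructions named above are used (`_mm_movemask_epi8` is the intrinsic for `pmovmskb`).

```rust
/// Hypothetical helper: checks one 64-byte chunk with four 16-byte SSE2
/// loads OR'd together, then a single movemask (`pmovmskb`).
#[cfg(target_arch = "x86_64")]
fn chunk_is_ascii(chunk: &[u8]) -> bool {
    use std::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_movemask_epi8, _mm_or_si128};
    assert_eq!(chunk.len(), 64);
    // SAFETY: SSE2 is part of the x86_64 baseline, and the four unaligned
    // 16-byte loads stay inside the 64-byte chunk checked above.
    unsafe {
        let p = chunk.as_ptr() as *const __m128i;
        let a = _mm_loadu_si128(p);
        let b = _mm_loadu_si128(p.add(1));
        let c = _mm_loadu_si128(p.add(2));
        let d = _mm_loadu_si128(p.add(3));
        let combined = _mm_or_si128(_mm_or_si128(a, b), _mm_or_si128(c, d));
        // A set most significant bit in any byte means a non-ASCII byte.
        _mm_movemask_epi8(combined) == 0
    }
}

/// Portable stand-in so the sketch also compiles on other targets.
#[cfg(not(target_arch = "x86_64"))]
fn chunk_is_ascii(chunk: &[u8]) -> bool {
    chunk.iter().all(|b| b.is_ascii())
}

/// Hypothetical helper: usize-at-a-time SWAR check for short inputs and tails.
fn swar_is_ascii(bytes: &[u8]) -> bool {
    const WORD: usize = std::mem::size_of::<usize>();
    // 0x80 repeated in every byte of a usize.
    const HIGH_BITS: usize = usize::from_ne_bytes([0x80; WORD]);
    let mut words = bytes.chunks_exact(WORD);
    for w in &mut words {
        let mut buf = [0u8; WORD];
        buf.copy_from_slice(w);
        if usize::from_ne_bytes(buf) & HIGH_BITS != 0 {
            return false;
        }
    }
    // Fewer than WORD bytes remain; check them one by one.
    words.remainder().iter().all(|b| b.is_ascii())
}

/// Hypothetical entry point combining both paths.
pub fn is_ascii_sketch(bytes: &[u8]) -> bool {
    let mut chunks = bytes.chunks_exact(64);
    for chunk in &mut chunks {
        if !chunk_is_ascii(chunk) {
            return false;
        }
    }
    swar_is_ascii(chunks.remainder())
}
```

The sketch should agree with `<[u8]>::is_ascii` on every input, which is the kind of differential check the `fuzz/` project above is described as performing.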
…r=joboet

Add missing mut to pin.rs docs

Per my understanding, needed for mut access next line.
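The documentation example touched by this change is not quoted here, so the following is only a generic sketch of why a pinned binding needs `mut`; the function name `bump` is hypothetical. Both `Pin::as_mut` and `Pin::set` take `&mut self`, so a binding declared without `mut` cannot be used for mutable access on the following line.

```rust
use std::pin::Pin;

/// Hypothetical example function: pins a value on the heap, then
/// overwrites it through the pin.
pub fn bump(value: u32) -> u32 {
    // `mut` is required here: `as_mut` borrows `pinned` mutably.
    // Without it, the next line is rejected with a
    // "cannot borrow as mutable" error.
    let mut pinned: Pin<Box<u32>> = Box::pin(value);
    pinned.as_mut().set(value.wrapping_add(1));
    *pinned
}
```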
Fix broken WASIp1 reference link

### Location (URL)

https://doc.rust-lang.org/rustc/platform-support/wasm32-wasip1.html

### Summary

The WASIp1 reference link in the `wasm32-wasip1` platform documentation currently points to a path that no longer exists in the WASI repository.

The WASI project recently migrated the WASI 0.1 (preview1) documentation from the `legacy/preview1` directory to the dedicated `wasi-0.1` branch (WebAssembly/WASI#855). This updates the link to point to the intended historical WASIp1 reference, which matches the documented intent of the `wasm32-wasip1` target.
@bors r+ rollup=never p=4
💔 Test for b614256 failed: CI. Failed job:
Surprising, I've only seen this on x86_64 macOS before, I think?

No immediately obvious cause, and this is partially being "try"-d in #151667. Deferring to that.
Successful merges:
r? @ghost