
@Zalathar
Member

Successful merges:

r? @ghost

Use explicit SSE2 intrinsics to avoid LLVM's broken AVX-512
auto-vectorization, which generates ~31 `kshiftrd` instructions.

Performance
- AVX-512: 34-48x faster
- SSE2: 1.5-2x faster

Improves on an earlier PR.
The SSE2 helper function is not inlined across crate boundaries,
so we cannot verify the codegen in an assembly test. The fix is
still verified by the absence of a performance regression.
…tests by default

A `codegen-llvm` test (and other codegen test mode tests) will now by
default have an implied `//@ needs-target-std` directive, *unless* the
test explicitly has an `#![no_std]`/`#![no_core]` attribute which
disables this implied behavior.

- When a test has both `#![no_std]`/`#![no_core]` and `//@ needs-target-std`,
  the explicit `//@ needs-target-std` directive will still cause the test to
  be ignored for targets that do not support std.

This is to make it easier to test out-of-tree targets / custom targets
(and targets not tested in r-l/r CI) without requiring target
maintainers to do a bunch of manual `//@ needs-target-std` busywork.

Co-authored-by: Edoardo Marangoni <ecmm@anche.no>
…erformance, r=folkertdev

Improve is_ascii performance on x86_64 with explicit SSE2 intrinsics

# Summary

Improves `slice::is_ascii` performance for SSE2 targets by roughly 1.5-2x on larger inputs.
AVX-512 targets keep similar performance characteristics.

This is building on the work already merged in rust-lang#151259.
In particular, this PR improves the default SSE2 performance; I no longer consider this a temporary fix.
Thanks to @folkertdev for pointing me to consider `as_chunk` again.

# The implementation:
- Uses 64-byte chunks with 4x 16-byte SSE2 loads OR'd together
- Extracts the MSB mask with a single `pmovmskb` instruction
- Falls back to usize-at-a-time SWAR for inputs < 64 bytes
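
The three steps above can be sketched as a standalone function. This is a minimal illustration, not the code that landed in `core`: `is_ascii_sketch`, `is_ascii_swar`, and `is_ascii_long` are hypothetical names, and on non-x86_64 targets the sketch simply uses the SWAR loop everywhere.

```rust
const WORD: usize = std::mem::size_of::<usize>();
// 0x8080...80: the high bit of every byte in a machine word.
const HIGH_BITS: usize = usize::from_ne_bytes([0x80; WORD]);

/// Hypothetical helper: word-at-a-time (SWAR) check, then a byte tail.
fn is_ascii_swar(bytes: &[u8]) -> bool {
    let mut chunks = bytes.chunks_exact(WORD);
    for chunk in &mut chunks {
        let word = usize::from_ne_bytes(chunk.try_into().unwrap());
        if word & HIGH_BITS != 0 {
            return false;
        }
    }
    chunks.remainder().iter().all(|b| b.is_ascii())
}

/// Hypothetical SSE2 path: 64-byte chunks, 4x 16-byte loads OR'd together,
/// one movemask (`pmovmskb`) per chunk to extract the high bits.
#[cfg(target_arch = "x86_64")]
fn is_ascii_long(bytes: &[u8]) -> bool {
    use std::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_movemask_epi8, _mm_or_si128};
    let mut chunks = bytes.chunks_exact(64);
    for chunk in &mut chunks {
        let p = chunk.as_ptr() as *const __m128i;
        // SAFETY: `chunk` is exactly 64 bytes, so four unaligned 16-byte
        // loads stay in bounds; SSE2 is part of the x86_64 baseline.
        let mask = unsafe {
            let a = _mm_loadu_si128(p);
            let b = _mm_loadu_si128(p.add(1));
            let c = _mm_loadu_si128(p.add(2));
            let d = _mm_loadu_si128(p.add(3));
            _mm_movemask_epi8(_mm_or_si128(_mm_or_si128(a, b), _mm_or_si128(c, d)))
        };
        if mask != 0 {
            return false;
        }
    }
    is_ascii_swar(chunks.remainder())
}

/// Portable fallback for the sketch on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn is_ascii_long(bytes: &[u8]) -> bool {
    is_ascii_swar(bytes)
}

/// Hypothetical entry point mirroring the described dispatch.
pub fn is_ascii_sketch(bytes: &[u8]) -> bool {
    if bytes.len() < 64 {
        is_ascii_swar(bytes)
    } else {
        is_ascii_long(bytes)
    }
}
```

OR-ing first works because a non-ASCII byte has its high bit set and that bit survives the OR, so a single movemask per 64 bytes is enough to detect it.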

# Performance impact (vs before rust-lang#151259):
- AVX-512: 34-48x faster
- SSE2: 1.5-2x faster

  <details>
  <summary>Benchmark Results (click to expand)</summary>

  Benchmarked on AMD Ryzen 9 9950X (AVX-512 capable). Values show relative performance within each row (1.00 = fastest; higher is slower).
  Throughput tops out at 139 GB/s for large inputs.

  ### early_non_ascii

  | Input Size | new_avx512 | new_sse2 | old_avx512 | old_sse2 |
  |------------|------------|----------|------------|----------|
  | 64 | 1.01 | **1.00** | 13.45 | 1.13 |
  | 1024 | 1.01 | **1.00** | 13.53 | 1.14 |
  | 65536 | 1.01 | **1.00** | 13.99 | 1.12 |
  | 1048576 | 1.02 | **1.00** | 13.29 | 1.12 |

  ### late_non_ascii

  | Input Size | new_avx512 | new_sse2 | old_avx512 | old_sse2 |
  |------------|------------|----------|------------|----------|
  | 64 | **1.00** | 1.01 | 13.37 | 1.13 |
  | 1024 | 1.10 | **1.00** | 42.42 | 1.95 |
  | 65536 | **1.00** | 1.06 | 42.22 | 1.73 |
  | 1048576 | **1.00** | 1.03 | 34.73 | 1.46 |

  ### pure_ascii

  | Input Size | new_avx512 | new_sse2 | old_avx512 | old_sse2 |
  |------------|------------|----------|------------|----------|
  | 4 | 1.03 | **1.00** | 1.75 | 1.32 |
  | 8 | **1.00** | 1.14 | 3.89 | 2.06 |
  | 16 | **1.00** | 1.04 | 1.13 | 1.62 |
  | 32 | 1.07 | 1.19 | 5.11 | **1.00** |
  | 64 | **1.00** | 1.13 | 13.32 | 1.57 |
  | 128 | **1.00** | 1.01 | 19.97 | 1.55 |
  | 256 | **1.00** | 1.02 | 27.77 | 1.61 |
  | 1024 | **1.00** | 1.02 | 41.34 | 1.84 |
  | 4096 | 1.02 | **1.00** | 45.61 | 1.98 |
  | 16384 | 1.01 | **1.00** | 48.67 | 2.04 |
  | 65536 | **1.00** | 1.03 | 43.86 | 1.77 |
  | 262144 | **1.00** | 1.06 | 41.44 | 1.79 |
  | 1048576 | 1.02 | **1.00** | 35.36 | 1.44 |

  </details>
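
One way to read the tables: assuming each cell is a measured time divided by the fastest time in its row, a value such as 13.45 means that variant took about 13.45x as long as the fastest one. A minimal sketch of that normalization (the helper name is hypothetical, not part of the benchmark tooling):

```rust
/// Hypothetical: normalize one row of timings so the fastest entry is 1.00
/// and every other entry says how many times slower it was.
fn relative_to_fastest(times: &[f64]) -> Vec<f64> {
    let fastest = times.iter().copied().fold(f64::INFINITY, f64::min);
    times.iter().map(|t| t / fastest).collect()
}
```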

## Reproduction / Test Projects

Standalone validation tools: https://github.com/bonega/is-ascii-fix-validation

- `bench/` - Criterion benchmarks for SSE2 vs AVX-512 comparison
- `fuzz/` - Compares old/new implementations with libfuzzer

Relates to: llvm/llvm-project#176906
…r=joboet

Add missing mut to pin.rs docs

Per my understanding, needed for mut access next line.
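
The exact `pin.rs` example is not quoted here, but the general pattern is that `Pin::as_mut` takes `&mut self`, so the binding holding the pin must be `mut`. A minimal sketch with hypothetical helper names:

```rust
use std::pin::{pin, Pin};

/// Hypothetical helper that mutates through a pinned reference.
fn bump(counter: Pin<&mut u32>) {
    // `u32: Unpin`, so unwrapping the pin with `get_mut` is safe.
    *counter.get_mut() += 1;
}

fn pinned_counter_demo() -> u32 {
    // Without `mut` here, the `as_mut()` calls below fail to compile,
    // because `Pin::as_mut` takes `&mut self`.
    let mut pinned: Pin<&mut u32> = pin!(0);
    bump(pinned.as_mut());
    bump(pinned.as_mut());
    *pinned
}
```
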
…=Zalathar

compiletest: add implied `needs-target-std` for `codegen` mode tests unless annotated with `#![no_std]`/`#![no_core]`

A `codegen` mode test (such as `codegen-llvm` test suite) will now by default have an implied `//@ needs-target-std` directive, *unless* the test explicitly has an `#![no_std]`/`#![no_core]` attribute which disables this behavior.

- When a test has both `#![no_std]`/`#![no_core]` and `//@ needs-target-std`, the explicit `//@ needs-target-std` directive will still cause the test to be ignored for targets that do not support std.
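
The decision rule described above can be sketched as follows, assuming a simple line-based scan of the test source. `TestMode` and `requires_target_std` are hypothetical, and the exact-match parsing is much cruder than compiletest's real directive handling.

```rust
/// Hypothetical subset of compiletest's test modes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TestMode {
    Codegen,
    Ui,
}

/// Sketch: should this test be ignored on targets without std?
fn requires_target_std(mode: TestMode, source: &str) -> bool {
    let lines = || source.lines().map(str::trim);
    // An explicit directive always applies, even alongside `#![no_std]`/`#![no_core]`.
    let explicit = lines().any(|l| l == "//@ needs-target-std");
    // `#![no_std]` or `#![no_core]` opts a codegen test out of the implied directive.
    let opts_out = lines().any(|l| l == "#![no_std]" || l == "#![no_core]");
    explicit || (mode == TestMode::Codegen && !opts_out)
}
```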

This is to make it easier to test out-of-tree targets / custom targets (and targets not tested in r-l/r CI) without requiring target maintainers to do a bunch of manual `//@ needs-target-std` busywork.

Context: [#t-compiler/help > `compiletest` cannot find `core` library for target != host](https://rust-lang.zulipchat.com/#narrow/channel/182449-t-compiler.2Fhelp/topic/.60compiletest.60.20cannot.20find.20.60core.60.20library.20for.20target.20!.3D.20host/with/568652419)

## Implementation remarks

This is an alternative version of rust-lang#150672, with some differences:

- *This* PR applies this implied-`needs-target-std` behavior to all `codegen` test mode tests.
- *This* PR does the synthetic directive injection in the same place as implied-`codegen-run` directives. Both are of course hacks, but at least they're together next to each other.
…aumeGomez

Add a `documentation` remapping path scope for rustdoc usage

This PR adds a new remapping path scope for rustdoc usage, `documentation`, instead of having rustdoc abuse the other scopes for its purposes.

Like path remapping in rustdoc itself, this scope is unstable (rustdoc does not yet even have an equivalent of [rustc `--remap-path-scope`](https://doc.rust-lang.org/nightly/rustc/remap-source-paths.html#--remap-path-scope)).

I also took the opportunity to add a bit of documentation to the rustdoc book.
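
A hypothetical model of what a dedicated scope buys rustdoc: scopes are parsed from a list, and rustdoc consults only `documentation` instead of reinterpreting scopes meant for other consumers. Only the `documentation` name comes from this PR; the other scope names, the comma-separated syntax, and all types and functions below are assumptions for illustration, not rustc's real code.

```rust
/// Hypothetical model of remap-path scopes (names other than
/// `documentation` are assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RemapScope {
    Macro,
    Diagnostics,
    Debuginfo,
    Documentation,
}

/// Parse an assumed comma-separated scope list such as "macro,documentation".
fn parse_scopes(arg: &str) -> Result<Vec<RemapScope>, String> {
    arg.split(',')
        .map(|name| match name {
            "macro" => Ok(RemapScope::Macro),
            "diagnostics" => Ok(RemapScope::Diagnostics),
            "debuginfo" => Ok(RemapScope::Debuginfo),
            "documentation" => Ok(RemapScope::Documentation),
            other => Err(format!("unknown remap path scope `{other}`")),
        })
        .collect()
}

/// With a dedicated scope, rustdoc only has to check one variant.
fn rustdoc_should_remap(scopes: &[RemapScope]) -> bool {
    scopes.contains(&RemapScope::Documentation)
}
```
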
Fix broken WASIp1 reference link

### Location (URL)
https://doc.rust-lang.org/rustc/platform-support/wasm32-wasip1.html

<img width="800" alt="image" src="https://github.com/user-attachments/assets/b9402b3a-db7b-405f-b4ef-d849c03ad893" />

### Summary
The WASIp1 reference link in the `wasm32-wasip1` platform documentation currently points to a path that no longer exists in the WASI repository.

The WASI project recently migrated the WASI 0.1 (preview1) documentation from the `legacy/preview1` directory to the dedicated `wasi-0.1` branch (WebAssembly/WASI#855).

This updates the link to point to the intended historical WASIp1 reference, which matches the documented intent of the `wasm32-wasip1` target.
…obzol

Update `sysinfo` version to `0.38.0`

Some bug fixes and added support for NetBSD.

r? @Kobzol
@rust-bors rust-bors bot added the rollup A PR which is a rollup label Jan 26, 2026
@rustbot rustbot added A-compiletest Area: The compiletest test runner A-run-make Area: port run-make Makefiles to rmake.rs A-rustc-dev-guide Area: rustc-dev-guide A-testsuite Area: The testsuite used to check the correctness of rustc S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. T-rustdoc-frontend Relevant to the rustdoc-frontend team, which will review and decide on the web UI/UX output. labels Jan 26, 2026
@Zalathar
Member Author

Rollup of everything.

@bors r+ rollup=never p=5

@rust-bors
Contributor

rust-bors bot commented Jan 26, 2026

📌 Commit ec48041 has been approved by Zalathar

It is now in the queue for this repository.

@rust-bors rust-bors bot added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 26, 2026
@Zalathar
Copy link
Member Author

Checking the probably-flaky failure from #151664 (comment):

@bors try jobs=i686-gnu-1

rust-bors bot pushed a commit that referenced this pull request Jan 26, 2026
Rollup of 6 pull requests


try-job: i686-gnu-1

@rust-bors
Contributor

rust-bors bot commented Jan 26, 2026

☀️ Try build successful (CI)
Build commit: b7d669e (b7d669eb39c5d2b94f5ea136486ee5b8e9a760b0, parent: 873d4682c7d285540b8f28bfe637006cef8918a6)

@rust-bors rust-bors bot added merged-by-bors This PR was explicitly merged by bors. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jan 26, 2026
@rust-bors
Contributor

rust-bors bot commented Jan 26, 2026

☀️ Test successful - CI
Approved by: Zalathar
Duration: 3h 42m 19s
Pushing 0462e8f to main...

@rust-bors rust-bors bot merged commit 0462e8f into rust-lang:main Jan 26, 2026
13 checks passed
@rustbot rustbot added this to the 1.95.0 milestone Jan 26, 2026
@Zalathar Zalathar deleted the rollup-OzG0S5m branch January 26, 2026 09:26
@rust-timer
Collaborator

📌 Perf builds for each rolled up PR:

| PR# | Message | Perf Build Sha |
|-----|---------|----------------|
| #150705 | Add missing mut to pin.rs docs | 33e78feeede8a7d57d9b910b199b088550a5ced1 |
| #151294 | compiletest: add implied needs-target-std for codegen m… | 099101ea82c952349f4130f8caea617fbdeafb1b |
| #151589 | Add a documentation remapping path scope for rustdoc usage | 00c04f9fa38ce73385147dc052b66eee4afb9e2e |
| #151611 | Improve is_ascii performance on x86_64 with explicit SSE2 i… | 34b0423a2432740ce6f2f7203896331792d3e7ce |
| #151639 | Fix broken WASIp1 reference link | d971337d8c863fa5561e80b0d03cb27711873f16 |
| #151645 | Update sysinfo version to 0.38.0 | a04ec21aff8ec0e5e71fbace137f443da7aacaa9 |

previous master: fb292b75fb

In the case of a perf regression, run the following command for each PR you suspect might be the cause: @rust-timer build $SHA

@github-actions
Contributor

What is this? This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing fb292b7 (parent) -> 0462e8f (this PR)

Test differences

79 test diffs

Stage 0

  • directives::tests::implied_needs_target_std: [missing] -> pass (J2)

Stage 1

  • [ui] tests/ui/compile-flags/invalid/remap-path-scope.rs#foo: [missing] -> pass (J0)
  • [ui] tests/ui/compile-flags/invalid/remap-path-scope.rs#underscore: [missing] -> pass (J0)
  • [ui] tests/ui/errors/remap-path-prefix-diagnostics.rs#only-doc-in-deps: [missing] -> pass (J0)
  • [ui] tests/ui/errors/remap-path-prefix-diagnostics.rs#with-doc-in-deps: [missing] -> pass (J0)
  • [ui] tests/ui/errors/remap-path-prefix-macro.rs#only-doc-in-deps: [missing] -> pass (J0)
  • [ui] tests/ui/errors/remap-path-prefix-macro.rs#with-doc-in-deps: [missing] -> pass (J0)
  • [ui] tests/ui/feature-gates/remap-path-scope-documentation.rs: [missing] -> pass (J0)

Stage 2

  • [ui] tests/ui/compile-flags/invalid/remap-path-scope.rs#foo: [missing] -> pass (J1)
  • [ui] tests/ui/compile-flags/invalid/remap-path-scope.rs#underscore: [missing] -> pass (J1)
  • [ui] tests/ui/errors/remap-path-prefix-diagnostics.rs#only-doc-in-deps: [missing] -> pass (J1)
  • [ui] tests/ui/errors/remap-path-prefix-diagnostics.rs#with-doc-in-deps: [missing] -> pass (J1)
  • [ui] tests/ui/errors/remap-path-prefix-macro.rs#only-doc-in-deps: [missing] -> pass (J1)
  • [ui] tests/ui/errors/remap-path-prefix-macro.rs#with-doc-in-deps: [missing] -> pass (J1)
  • [ui] tests/ui/feature-gates/remap-path-scope-documentation.rs: [missing] -> pass (J1)

Additionally, 64 doctest diffs were found. These are ignored, as they are noisy.

Job group index

Test dashboard

Run

cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard 0462e8f7e51f20692b02d68efee68bb28a6f4457 --output-dir test-dashboard

And then open test-dashboard/index.html in your browser to see an overview of all executed tests.

Job duration changes

  1. dist-aarch64-apple: 8394.7s -> 5965.4s (-28.9%)
  2. pr-check-1: 1685.2s -> 2018.9s (+19.8%)
  3. i686-gnu-2: 5233.2s -> 6164.0s (+17.8%)
  4. aarch64-apple: 11260.7s -> 12925.4s (+14.8%)
  5. x86_64-rust-for-linux: 2723.3s -> 3125.0s (+14.8%)
  6. x86_64-gnu-tools: 3352.2s -> 3846.1s (+14.7%)
  7. x86_64-gnu-miri: 4409.3s -> 5026.3s (+14.0%)
  8. i686-gnu-nopt-1: 7326.3s -> 8298.0s (+13.3%)
  9. armhf-gnu: 4851.2s -> 5424.5s (+11.8%)
  10. i686-gnu-1: 7595.4s -> 8460.5s (+11.4%)
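
The percentages in this list are consistent with a plain relative change, (after - before) / before; for example, dist-aarch64-apple gives (5965.4 - 8394.7) / 8394.7 ≈ -28.9%. A small sketch (the helper name is hypothetical):

```rust
/// Hypothetical helper: relative change from `before` to `after`, in percent.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}
```

The same formula also reproduces the bootstrap figure further down (475.284s -> 473.061s, about -0.47%).
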
How to interpret the job duration changes?

Job durations can vary a lot, based on the actual runner instance
that executed the job, system noise, invalidated caches, etc. The table above is provided
mostly for t-infra members, for simpler debugging of potential CI slow-downs.

@rust-timer
Collaborator

Finished benchmarking commit (0462e8f): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results (primary 2.3%, secondary -0.9%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.3% | [2.3%, 2.3%] | 1 |
| Regressions ❌ (secondary) | 7.8% | [7.8%, 7.8%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -5.3% | [-5.4%, -5.1%] | 2 |
| All ❌✅ (primary) | 2.3% | [2.3%, 2.3%] | 1 |

Cycles

Results (primary -1.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.0% | [-1.4%, -0.6%] | 2 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -1.0% | [-1.4%, -0.6%] | 2 |

Binary size

Results (primary 0.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.1% | [0.1%, 0.1%] | 1 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.1% | [0.1%, 0.1%] | 1 |

Bootstrap: 475.284s -> 473.061s (-0.47%)
Artifact size: 383.64 MiB -> 383.58 MiB (-0.02%)
