Conversation

@JonathanBrouwer
Contributor

Successful merges:

r? @ghost

bonega and others added 26 commits January 17, 2026 17:38
When `[u8]::is_ascii()` is compiled with `-C target-cpu=native` on
AVX-512 CPUs, LLVM generates inefficient code. Because `is_ascii` is
marked `#[inline]`, it gets inlined and recompiled with the user's
target settings. The previous implementation used a counting loop that
LLVM auto-vectorizes to `pmovmskb` on SSE2, but with AVX-512 enabled,
LLVM uses k-registers and extracts bits individually with ~31
`kshiftrd` instructions.

This fix replaces the counting loop with explicit SSE2 intrinsics
(`_mm_loadu_si128`, `_mm_or_si128`, `_mm_movemask_epi8`) for x86_64.
`_mm_movemask_epi8` compiles to `pmovmskb`, forcing efficient codegen
regardless of CPU features.

Benchmark results on AMD Ryzen 5 7500F (Zen 4 with AVX-512):
- Default build: ~73 GB/s → ~74 GB/s (no regression)
- With -C target-cpu=native: ~3 GB/s → ~67 GB/s (22x improvement)

The loongarch64 implementation retains the original counting loop
since it doesn't have this issue.

Regression from: rust-lang#130733
For inputs smaller than 32 bytes, use usize-at-a-time processing
instead of calling the SSE2 function. This avoids function call
overhead from #[target_feature(enable = "sse2")] which prevents
inlining.

Also moves CHUNK_SIZE to module level so it can be shared between
is_ascii and is_ascii_sse2.
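
A minimal sketch of the usize-at-a-time idea for short inputs, under stated assumptions: `is_ascii_small`, `is_ascii_dispatch` and `HIGH_BITS` are hypothetical names, and the long-input branch simply defers to the standard library as a stand-in for the SIMD path. It is not the code from the commit.

```rust
/// Inputs shorter than this take the word-at-a-time path. The value 32 follows
/// the commit message; the dispatch function below is illustrative.
const CHUNK_SIZE: usize = 32;

/// Number of bytes in one machine word.
const WORD: usize = core::mem::size_of::<usize>();

/// A word with the high bit of every byte set (0x8080...80).
const HIGH_BITS: usize = usize::from_ne_bytes([0x80; WORD]);

/// Hypothetical helper: checks a short slice one `usize` at a time.
fn is_ascii_small(bytes: &[u8]) -> bool {
    let mut chunks = bytes.chunks_exact(WORD);
    for chunk in &mut chunks {
        // `chunks_exact` yields slices of exactly WORD bytes, so the copy cannot panic.
        let mut buf = [0u8; WORD];
        buf.copy_from_slice(chunk);
        // Any byte >= 0x80 leaves its high bit set after masking.
        if usize::from_ne_bytes(buf) & HIGH_BITS != 0 {
            return false;
        }
    }
    // Fewer than WORD bytes remain; check them individually.
    chunks.remainder().iter().all(|&b| b < 0x80)
}

/// Hypothetical dispatcher: short inputs avoid the out-of-line SIMD call.
fn is_ascii_dispatch(bytes: &[u8]) -> bool {
    if bytes.len() < CHUNK_SIZE {
        is_ascii_small(bytes)
    } else {
        // Stand-in for the SSE2 path described in the previous commit.
        bytes.is_ascii()
    }
}
```

The point of the split is that the short path can be inlined into the caller, while only inputs of 32 bytes or more pay for the call into the vectorized routine.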
This will be used in order to emit HVX intrinsics
Combine the x86_64 and loongarch64 is_ascii tests into a single file
using compiletest revisions. Both now test assembly output:

- X86_64: Verifies no broken kshiftrd/kshiftrq instructions (AVX-512 fix)
- LA64: Verifies vmskltz.b instruction is used (auto-vectorization)
Remove the `#[target_feature(enable = "sse2")]` attribute and make the
function safe to call. The SSE2 requirement is already enforced by the
`#[cfg(target_feature = "sse2")]` predicate.

Individual unsafe blocks are used for intrinsic calls with appropriate
SAFETY comments.

Also adds FIXME reference to llvm#176906 for tracking when this
workaround can be removed.
Removed the comment about reproducibility failures with crate type `bin` and `-Cdebuginfo=2` on non-Windows machines.
issue rust-lang#89911
Implements WCAG 2.4.1 (Level A) - Bypass Blocks accessibility feature.

Changes:
- Add skip-main-content link in page.html with tabindex=-1 on main-content
- Add CSS styling per reviewer feedback (outline border, themed colors)
- Add GOML test for skip navigation functionality

Fixes rust-lang#151420
This changes the `test` build script so that it does not use the default
fingerprinting mechanism in cargo which causes a full scan of the
package every time it runs. This build script does not depend on any of
the files in the package.

This is the recommended approach for writing build scripts.
Dropped the `align` test since the `POOL_ALIGNMENT` and `align_size`
items it uses do not exist.

The other changes are straightforward fixes for places where the test
code drifted from the current API, since the tests are not yet built in
CI for the UEFI target.
…umbv8r, r=petrochenkov

Add Tier 3 Thumb-mode targets for Armv7-A, Armv7-R and Armv8-R

We currently have targets for bare-metal Armv7-R, Armv7-A and Armv8-R, but only in Arm mode. This PR adds five new targets enabling bare-metal support on these architectures in Thumb mode.

This has been tested using https://github.com/rust-embedded/aarch32/compare/main...thejpster:aarch32:support-thumb-mode-v7-v8?expand=1 and they all seem to work as expected.

However, I wasn't sure what to do with the maintainer lists as these are five new targets, but they share the docs page with the existing Arm versions. I can ask the Embedded Devices WG Arm Team about taking on these ones too, but whether Arm themselves want to take them on I guess is a bigger question.
…ertdev

Fix is_ascii performance regression on AVX-512 CPUs when compiling with -C target-cpu=native

## Summary

This PR fixes a severe performance regression in `slice::is_ascii` on AVX-512 CPUs when compiling with `-C target-cpu=native`.

On affected systems, the current implementation achieves only ~3 GB/s for large inputs, compared to ~60–70 GB/s previously (≈20–24× regression). This PR restores the original performance characteristics.

This change is intended as a **temporary workaround** for poor codegen in upstream LLVM. Once the underlying LLVM issue is fixed and Rust is able to consume that fix, this workaround should be reverted.

  ## Problem

  When `is_ascii` is compiled with AVX-512 enabled, LLVM's auto-vectorization generates ~31 `kshiftrd` instructions to extract mask bits one-by-one, instead of using the efficient `pmovmskb` instruction. This causes a **~22x performance regression**.

  Because `is_ascii` is marked `#[inline]`, it gets inlined and recompiled with the user's target settings, affecting anyone using `-C target-cpu=native` on AVX-512 CPUs.
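
For orientation, a counting-loop formulation of the kind described above might look roughly like the following. This is a sketch only: `is_ascii_counting` is a hypothetical name and the exact shape of the previous library code is not reproduced here.

```rust
/// Chunk width for the counting loop; 32 is an assumption taken from the
/// commit messages in this rollup.
const CHUNK_SIZE: usize = 32;

/// Hypothetical counting-loop check: count the ASCII bytes in each chunk and
/// compare the count with the chunk length. Written this way, the inner loop
/// is a natural target for auto-vectorization.
fn is_ascii_counting(bytes: &[u8]) -> bool {
    let mut chunks = bytes.chunks_exact(CHUNK_SIZE);
    for chunk in &mut chunks {
        let mut count: u8 = 0;
        for &b in chunk {
            // Adds 1 for every byte in 0x00..=0x7F; at most 32, so `u8` cannot overflow.
            count += (b <= 0x7F) as u8;
        }
        if count != CHUNK_SIZE as u8 {
            return false;
        }
    }
    // Tail shorter than one chunk: plain byte-wise check.
    chunks.remainder().iter().all(|&b| b <= 0x7F)
}
```

How the compiler lowers the per-chunk comparison depends on the enabled target features, which is why the same source can produce `pmovmskb` under default settings and a chain of mask-register shifts when AVX-512 is enabled.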

## Root cause (upstream)

The underlying issue appears to be an LLVM vectorizer/backend bug affecting certain AVX-512 patterns.

An upstream issue has been filed by @folkertdev  to track the root cause: llvm/llvm-project#176906

Until this is resolved in LLVM and picked up by rustc, this PR avoids triggering the problematic codegen pattern.

  ## Solution

  Replace the counting loop with explicit SSE2 intrinsics (`_mm_movemask_epi8`) that force `pmovmskb` codegen regardless of CPU features.
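
A minimal sketch of the intrinsic-based approach, assuming a 32-byte chunk processed as two 16-byte loads. The function name, the chunking and the non-x86_64 fallback are illustrative assumptions, not the code merged in this PR.

```rust
/// Sketch of an SSE2-based check. SSE2 is part of the x86_64 baseline, so the
/// `cfg` below is normally satisfied on that architecture.
#[cfg(all(target_arch = "x86_64", target_feature = "sse2"))]
fn is_ascii_sse2(bytes: &[u8]) -> bool {
    use core::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_movemask_epi8, _mm_or_si128};

    const CHUNK_SIZE: usize = 32;
    let mut chunks = bytes.chunks_exact(CHUNK_SIZE);
    for chunk in &mut chunks {
        let ptr = chunk.as_ptr();
        // SAFETY: `chunk` is exactly 32 bytes long, so both unaligned 16-byte
        // loads stay in bounds, and SSE2 is statically enabled by the `cfg`.
        let mask = unsafe {
            let lo = _mm_loadu_si128(ptr as *const __m128i);
            let hi = _mm_loadu_si128(ptr.add(16) as *const __m128i);
            // OR the halves, then collect the high bit of each byte into an i32.
            _mm_movemask_epi8(_mm_or_si128(lo, hi))
        };
        // A non-zero mask means at least one byte had its high bit set.
        if mask != 0 {
            return false;
        }
    }
    chunks.remainder().iter().all(|b| b.is_ascii())
}

/// Fallback so the sketch compiles on other targets (an assumption for this example).
#[cfg(not(all(target_arch = "x86_64", target_feature = "sse2")))]
fn is_ascii_sse2(bytes: &[u8]) -> bool {
    bytes.is_ascii()
}
```

Because `_mm_movemask_epi8` maps directly to `pmovmskb` (or its VEX form), the mask extraction no longer depends on what the auto-vectorizer picks for the enabled CPU features.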

  ## Godbolt Links (Rust 1.92)

  | Pattern | Target | Link | Result |
  |---------|--------|------|--------|
  | Counting loop (old) | Default SSE2 | https://godbolt.org/z/sE86xz4fY | `pmovmskb` |
  | Counting loop (old) | AVX-512 (znver4) | https://godbolt.org/z/b3jvMhGd3 | 31x `kshiftrd` (broken) |
  | SSE2 intrinsics (fix) | Default SSE2 | https://godbolt.org/z/hMeGfeaPv | `pmovmskb` |
  | SSE2 intrinsics (fix) | AVX-512 (znver4) | https://godbolt.org/z/Tdvdqjohn | `vpmovmskb` (fixed) |

  ## Benchmark Results

  **CPU:** AMD Ryzen 5 7500F (Zen 4 with AVX-512)

  ### Default Target (SSE2) — Mixed

  | Size | Before | After | Change |
  |------|--------|-------|--------|
  | 4 B | 1.8 GB/s | 2.0 GB/s | **+11%** |
  | 8 B | 3.2 GB/s | 5.8 GB/s | **+81%** |
  | 16 B | 5.3 GB/s | 8.5 GB/s | **+60%** |
  | 32 B | 17.7 GB/s | 15.8 GB/s | -11% |
  | 64 B | 28.6 GB/s | 25.1 GB/s | -12% |
  | 256 B | 51.5 GB/s | 48.6 GB/s | ~same |
  | 1 KB | 64.9 GB/s | 60.7 GB/s | ~same |
  | 4 KB+ | ~68-70 GB/s | ~68-72 GB/s | ~same |

  ### Native Target (AVX-512) — Up to 24x Faster

  | Size | Before | After | Speedup |
  |------|--------|-------|---------|
  | 4 B | 1.2 GB/s | 2.0 GB/s | **1.7x** |
  | 8 B | 1.6 GB/s | 5.0 GB/s | **3.3x** |
  | 16 B | ~7 GB/s | ~7 GB/s | ~same |
  | 32 B | 2.9 GB/s | 14.2 GB/s | **4.9x** |
  | 64 B | 2.9 GB/s | 23.2 GB/s | **8x** |
  | 256 B | 2.9 GB/s | 47.2 GB/s | **16x** |
  | 1 KB | 2.8 GB/s | 60.0 GB/s | **21x** |
  | 4 KB+ | 2.9 GB/s | ~68-70 GB/s | **23-24x** |

  ### Summary

  - **SSE2 (default):** Small inputs (4-16 B) 11-81% faster; 32-64 B ~11% slower; large inputs unchanged
  - **AVX-512 (native):** 21-24x faster for inputs ≥1 KB, peak ~70 GB/s (was ~3 GB/s)

  Note: these numbers are for the pure-ASCII path, but the story is similar for the other paths.
  See the linked bench project.

  ## Test Plan

  - [x] Assembly test (`slice-is-ascii-avx512.rs`) verifies no `kshiftrd` with AVX-512
  - [x] Existing codegen test updated to `loongarch64`-only (auto-vectorization still used there)
  - [x] Fuzz testing confirms old/new implementations produce identical results (~53M iterations)
  - [x] Benchmarks confirm performance improvement
  - [x] Tidy checks pass
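
The differential check in the test plan can be approximated without a fuzzing harness. The sketch below is not the linked libfuzzer setup: the generator, the function names and the choice of `slice::is_ascii` as the candidate are all assumptions made for illustration.

```rust
/// Small xorshift generator so the sketch needs no external crates.
/// The seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Reference implementation: the obvious byte-by-byte check.
fn is_ascii_reference(bytes: &[u8]) -> bool {
    bytes.iter().all(|&b| b < 0x80)
}

/// Candidate under test; here the standard library method stands in for
/// whichever optimized implementation is being validated.
fn is_ascii_candidate(bytes: &[u8]) -> bool {
    bytes.is_ascii()
}

/// Returns the first generated input on which the two implementations
/// disagree, or `None` if they agree on every input.
fn find_mismatch(iterations: usize, seed: u64) -> Option<Vec<u8>> {
    let mut rng = XorShift64(seed);
    for _ in 0..iterations {
        // Lengths 0..=128 cover the short path, whole chunks and ragged tails.
        let len = (rng.next_u64() % 129) as usize;
        let mut input: Vec<u8> = (0..len).map(|_| (rng.next_u64() & 0x7F) as u8).collect();
        // Roughly half of the non-empty inputs get one non-ASCII byte.
        if len > 0 && rng.next_u64() % 2 == 0 {
            let pos = (rng.next_u64() as usize) % len;
            input[pos] |= 0x80;
        }
        if is_ascii_reference(&input) != is_ascii_candidate(&input) {
            return Some(input);
        }
    }
    None
}
```

Calling `find_mismatch(10_000, 0x9E37_79B9_7F4A_7C15)` and asserting that it returns `None` gives a quick regression check; a coverage-guided fuzzer explores far more of the input space than this fixed generator.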

  ## Reproduction / Test Projects

  Standalone validation tools: https://github.com/bonega/is-ascii-fix-validation

  - `bench/` - Criterion benchmarks for SSE2 vs AVX-512 comparison
  - `fuzz/` - Compares old/new implementations with libfuzzer

  ## Related Issues
  - issue opened by @folkertdev llvm/llvm-project#176906
  - Regression introduced in rust-lang#130733
…r=folkertdev

hexagon: Add HVX target features

This will be used in order to emit HVX intrinsics
…sts-linux, r=Kobzol

Enable reproducible binary builds with debuginfo on Linux

Fixes rust-lang#89911

This PR enables `-Cdebuginfo=2` for binary crate types in the `reproducible-build` run-make test on Linux platforms.

- Removed the `!matches!(crate_type, CrateType::Bin)` check in `diff_dir_test()`
- SHA256 hashes match: `932be0d950f4ffae62451f7b4c8391eb458a68583feb11193dd501551b6201d4`

This scenario was previously disabled due to rust-lang#89911. I have verified locally on Linux (WSL) with LLVM 21 that the regression reported in that issue appears to be resolved, and the tests now pass with debug info enabled.
…r=GuillaumeGomez

Add "Skip to main content" link for keyboard navigation in rustdoc

## Summary

This PR adds a "Skip to main content" link for keyboard navigation in rustdoc, improving accessibility by allowing users to bypass the sidebar and navigate directly to the main content area.

## Changes

- **`src/librustdoc/html/templates/page.html`**: Added a skip link (`<a class="skip-main-content">`) immediately after the `<body>` tag that links to `#main-content`
- **`src/librustdoc/html/static/css/rustdoc.css`**: Added CSS styles for the skip link:
  - Visually hidden by default (`position: absolute; top: -100%`)
  - Becomes visible when focused via Tab key (`top: 0` on `:focus`)
  - Styled consistently with rustdoc theme using existing CSS variables
- **`tests/rustdoc-gui/skip-navigation.goml`**: Added GUI test to verify the skip link functionality

## WCAG Compliance

This addresses **WCAG Success Criterion 2.4.1 (Level A)** - Bypass Blocks:
> A mechanism is available to bypass blocks of content that are repeated on multiple web pages.

## Demo

When pressing Tab on a rustdoc page, the first focusable element is now the "Skip to main content" link, allowing keyboard users to jump directly to the main content without tabbing through the entire sidebar.

## Future Improvements

Based on the discussion in rust-lang#151420, additional skip links could be added between the page summary and module contents sections. This PR provides the foundation, and we can iterate on adding more skip links based on feedback.

Fixes rust-lang#151420

r? @JayanAXHF
…s-under-feature-gate-const-bool, r=jhpratt

constify boolean methods

```rs
// core::bool

impl bool {
    pub const fn then_some<T: [const] Destruct>(self, t: T) -> Option<T>;
    pub const fn then<T, F: [const] FnOnce() -> T + [const] Destruct>(self, f: F) -> Option<T>;
    pub const fn ok_or<E: [const] Destruct>(self, err: E) -> Result<(), E>;
    pub const fn ok_or_else<E, F: [const] FnOnce() -> E + [const] Destruct>(self, f: F) -> Result<(), E>;
}
```

I will open a tracking issue if this PR is well received.
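
For readers unfamiliar with these methods, here is a small runtime example of the two that are already stable, `then_some` and `then`. The helper names are hypothetical. Calling the methods in a `const` context relies on the unstable constification from this PR and on a nightly feature gate that is not named here, so that usage is not shown.

```rust
/// Hypothetical helper: `then_some` wraps an already-computed value.
fn describe(n: i32) -> Option<&'static str> {
    (n > 0).then_some("positive")
}

/// Hypothetical helper: `then` runs the closure only when the condition holds.
fn checked_sqrt(x: f64) -> Option<f64> {
    (x >= 0.0).then(|| x.sqrt())
}
```

`describe(3)` yields `Some("positive")` and `describe(-1)` yields `None`; `checked_sqrt` behaves the same way but never evaluates `sqrt` for negative input.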
Don't use default build-script fingerprinting in `test`

This changes the `test` build script so that it does not use the default fingerprinting mechanism in cargo which causes a full scan of the package every time it runs. This build script does not depend on any of the files in the package.

This is the recommended approach for writing build scripts.
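
As an illustration of the mechanism (not the actual `test` build script), a build script opts out of the default whole-package scan by printing at least one `rerun-if-changed` directive. The helper name below is hypothetical.

```rust
use std::io::Write;

/// Hypothetical helper for a `build.rs`: writes the directive that tells Cargo
/// to re-run the script only when `build.rs` itself changes. Once a script
/// emits any `rerun-if-changed` line, Cargo stops tracking every file in the package.
pub fn emit_rerun_directives(out: &mut impl Write) -> std::io::Result<()> {
    writeln!(out, "cargo:rerun-if-changed=build.rs")
}

// In an actual build script this would be called from `fn main()` as:
//     emit_rerun_directives(&mut std::io::stdout()).unwrap();
```

Writing to a generic `Write` keeps the sketch testable; a real build script would usually just call `println!` directly.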
…i-test, r=Ayush1325,tgross35

Fix compilation of std/src/sys/pal/uefi/tests.rs

Dropped the `align` test since the `POOL_ALIGNMENT` and `align_size` items it uses do not exist.

The other changes are straightforward fixes for places where the test code drifted from the current API, since the tests are not yet built in CI for the UEFI target.

CC @Ayush1325
@rust-bors bot added the rollup label (A PR which is a rollup) on Jan 24, 2026.

@rustbot added the following labels on Jan 24, 2026:
- A-run-make: Area: port run-make Makefiles to rmake.rs
- A-rustdoc-js: Area: Rustdoc's JS front-end
- S-waiting-on-review: Status: Awaiting review from the assignee but also interested parties.

@rustbot added the following labels on Jan 24, 2026:
- T-bootstrap: Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap)
- T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.
- T-libs: Relevant to the library team, which will review and decide on the PR/issue.
- T-rustdoc: Relevant to the rustdoc team, which will review and decide on the PR/issue.
- T-rustdoc-frontend: Relevant to the rustdoc-frontend team, which will review and decide on the web UI/UX output.

@JonathanBrouwer
Contributor Author

@bors r+ rollup=never p=5

@rust-bors
Contributor

rust-bors bot commented Jan 24, 2026

📌 Commit 85430df has been approved by JonathanBrouwer

It is now in the queue for this repository.

@rust-bors bot added the S-waiting-on-bors label (Status: Waiting on bors to run and complete tests. Bors will change the label on completion.) and removed the S-waiting-on-review label (Status: Awaiting review from the assignee but also interested parties.) on Jan 24, 2026.
@rust-bors bot added the merged-by-bors label (This PR was explicitly merged by bors.) and removed the S-waiting-on-bors label (Status: Waiting on bors to run and complete tests. Bors will change the label on completion.) on Jan 24, 2026.
@rust-bors
Contributor

rust-bors bot commented Jan 24, 2026

☀️ Test successful - CI
Approved by: JonathanBrouwer
Duration: 3h 13m 34s
Pushing a18e6d9 to main...

@rust-bors rust-bors bot merged commit a18e6d9 into rust-lang:main Jan 24, 2026
12 checks passed
@rustbot rustbot added this to the 1.95.0 milestone Jan 24, 2026
@rust-timer
Collaborator

📌 Perf builds for each rolled up PR:

| PR# | Message | Perf Build Sha |
|-----|---------|----------------|
| #150556 | Add Tier 3 Thumb-mode targets for Armv7-A, Armv7-R and Armv… | a7a0883591a032b98140bb6c17d9533153081680 |
| #151259 | Fix is_ascii performance regression on AVX-512 CPUs when co… | 7d12730deb916c5f9098de6f63da0d4eab71c9b8 |
| #151482 | Add "Skip to main content" link for keyboard navigation in … | a3ab7d3ce3a474b737929e066e3c75d6b4f3ed98 |
| #151489 | constify boolean methods | ababb0cdde005a806bd7551796bf8ff8134b3c76 |
| #151500 | hexagon: Add HVX target features | 28ec7137dfa258e50e834b2fdd838310f4800a0b |
| #151517 | Enable reproducible binary builds with debuginfo on Linux | 56a3070059dabd84962a3067bd9c8148b4cf7de4 |
| #151551 | Don't use default build-script fingerprinting in test | d8d29e759f03b5dfdc6218b8be900e76a072f3b3 |
| #151555 | Fix compilation of std/src/sys/pal/uefi/tests.rs | ac32d98b1541c3b0fdd3f71c1da32be32c0e69e8 |

previous master: 87b2721871

In the case of a perf regression, run the following command for each PR you suspect might be the cause: @rust-timer build $SHA

@github-actions
Contributor

What is this? This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing 87b2721 (parent) -> a18e6d9 (this PR)

Test differences

Show 79 test diffs

Stage 0

  • spec::tests::thumbv7a_none_eabi: [missing] -> pass (J1)
  • spec::tests::thumbv7a_none_eabihf: [missing] -> pass (J1)
  • spec::tests::thumbv7r_none_eabi: [missing] -> pass (J1)
  • spec::tests::thumbv7r_none_eabihf: [missing] -> pass (J1)
  • spec::tests::thumbv8r_none_eabihf: [missing] -> pass (J1)

Stage 1

  • [assembly] tests/assembly-llvm/slice-is-ascii.rs#LA64: [missing] -> ignore (only executed when the architecture is loongarch64) (J1)
  • [assembly] tests/assembly-llvm/slice-is-ascii.rs#X86_64: [missing] -> pass (J1)
  • [assembly] tests/assembly-llvm/targets/targets-elf.rs#thumbv7a_none_eabi: [missing] -> pass (J1)
  • [assembly] tests/assembly-llvm/targets/targets-elf.rs#thumbv7a_none_eabihf: [missing] -> pass (J1)
  • [assembly] tests/assembly-llvm/targets/targets-elf.rs#thumbv7r_none_eabi: [missing] -> pass (J1)
  • [assembly] tests/assembly-llvm/targets/targets-elf.rs#thumbv7r_none_eabihf: [missing] -> pass (J1)
  • [assembly] tests/assembly-llvm/targets/targets-elf.rs#thumbv8r_none_eabihf: [missing] -> pass (J1)
  • [codegen] tests/codegen-llvm/slice-is-ascii.rs: pass -> [missing] (J1)
  • [codegen] tests/codegen-llvm/slice-is-ascii.rs: ignore (only executed when the architecture is x86_64) -> [missing] (J3)
  • spec::tests::thumbv7a_none_eabi: [missing] -> pass (J5)
  • spec::tests::thumbv7a_none_eabihf: [missing] -> pass (J5)
  • spec::tests::thumbv7r_none_eabi: [missing] -> pass (J5)
  • spec::tests::thumbv7r_none_eabihf: [missing] -> pass (J5)
  • spec::tests::thumbv8r_none_eabihf: [missing] -> pass (J5)

Stage 2

  • [assembly] tests/assembly-llvm/slice-is-ascii.rs#X86_64: [missing] -> ignore (only executed when the architecture is x86_64) (J0)
  • [codegen] tests/codegen-llvm/slice-is-ascii.rs: ignore (only executed when the architecture is x86_64) -> [missing] (J0)
  • [assembly] tests/assembly-llvm/slice-is-ascii.rs#X86_64: [missing] -> pass (J2)
  • [codegen] tests/codegen-llvm/slice-is-ascii.rs: pass -> [missing] (J2)
  • [assembly] tests/assembly-llvm/slice-is-ascii.rs#LA64: [missing] -> ignore (only executed when the architecture is loongarch64) (J4)
  • [assembly] tests/assembly-llvm/targets/targets-elf.rs#thumbv7a_none_eabi: [missing] -> pass (J4)
  • [assembly] tests/assembly-llvm/targets/targets-elf.rs#thumbv7a_none_eabihf: [missing] -> pass (J4)
  • [assembly] tests/assembly-llvm/targets/targets-elf.rs#thumbv7r_none_eabi: [missing] -> pass (J4)
  • [assembly] tests/assembly-llvm/targets/targets-elf.rs#thumbv7r_none_eabihf: [missing] -> pass (J4)
  • [assembly] tests/assembly-llvm/targets/targets-elf.rs#thumbv8r_none_eabihf: [missing] -> pass (J4)

Additionally, 50 doctest diffs were found. These are ignored, as they are noisy.

Job group index

Test dashboard

Run

cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard a18e6d9d1473d9b25581dd04bef6c7577999631c --output-dir test-dashboard

And then open test-dashboard/index.html in your browser to see an overview of all executed tests.

Job duration changes

  1. dist-aarch64-apple: 7400.1s -> 8977.8s (+21.3%)
  2. i686-gnu-nopt-1: 8328.4s -> 7042.7s (-15.4%)
  3. i686-gnu-2: 6123.6s -> 5199.6s (-15.1%)
  4. dist-aarch64-llvm-mingw: 5517.9s -> 6339.4s (+14.9%)
  5. pr-check-1: 1773.3s -> 1521.9s (-14.2%)
  6. i686-gnu-nopt-2: 8588.5s -> 7582.6s (-11.7%)
  7. aarch64-msvc-1: 7280.6s -> 6444.0s (-11.5%)
  8. x86_64-gnu-llvm-21-3: 6710.9s -> 5979.5s (-10.9%)
  9. i686-gnu-1: 8311.6s -> 7476.5s (-10.0%)
  10. aarch64-gnu-debug: 4296.5s -> 3935.6s (-8.4%)
How to interpret the job duration changes?

Job durations can vary a lot, based on the actual runner instance
that executed the job, system noise, invalidated caches, etc. The table above is provided
mostly for t-infra members, for simpler debugging of potential CI slow-downs.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (a18e6d9): comparison URL.

Overall result: ❌✅ regressions and improvements - no action needed

@rustbot label: -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|------|-------|-------|
| Regressions ❌ (primary) | 0.3% | [0.3%, 0.3%] | 1 |
| Regressions ❌ (secondary) | 0.2% | [0.2%, 0.2%] | 1 |
| Improvements ✅ (primary) | -0.2% | [-0.2%, -0.2%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.1% | [-0.2%, 0.3%] | 2 |

Max RSS (memory usage)

Results (primary 3.1%, secondary -0.8%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|------|-------|-------|
| Regressions ❌ (primary) | 3.1% | [3.1%, 3.1%] | 1 |
| Regressions ❌ (secondary) | 2.5% | [1.9%, 3.1%] | 2 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -3.1% | [-6.0%, -1.6%] | 3 |
| All ❌✅ (primary) | 3.1% | [3.1%, 3.1%] | 1 |

Cycles

Results (secondary -1.2%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|------|-------|-------|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 3.5% | [3.5%, 3.5%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -3.6% | [-4.7%, -2.5%] | 2 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

Results (primary 0.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|------|-------|-------|
| Regressions ❌ (primary) | 0.3% | [0.1%, 0.8%] | 4 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.4% | [-0.4%, -0.4%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.1% | [-0.4%, 0.8%] | 5 |

Bootstrap: 471.473s -> 469.207s (-0.48%)
Artifact size: 383.50 MiB -> 383.51 MiB (0.00%)
