Skip to content

Conversation

@JonathanBrouwer
Copy link
Contributor

Successful merges:

r? @ghost

Create a similar rollup

bonega and others added 30 commits January 17, 2026 17:38
When `[u8]::is_ascii()` is compiled with `-C target-cpu=native` on
AVX-512 CPUs, LLVM generates inefficient code. Because `is_ascii` is
marked `#[inline]`, it gets inlined and recompiled with the user's
target settings. The previous implementation used a counting loop that
LLVM auto-vectorizes to `pmovmskb` on SSE2, but with AVX-512 enabled,
LLVM uses k-registers and extracts bits individually with ~31
`kshiftrd` instructions.

This fix replaces the counting loop with explicit SSE2 intrinsics
(`_mm_loadu_si128`, `_mm_or_si128`, `_mm_movemask_epi8`) for x86_64.
`_mm_movemask_epi8` compiles to `pmovmskb`, forcing efficient codegen
regardless of CPU features.

Benchmark results on AMD Ryzen 5 7500F (Zen 4 with AVX-512):
- Default build: ~73 GB/s → ~74 GB/s (no regression)
- With -C target-cpu=native: ~3 GB/s → ~67 GB/s (22x improvement)

The loongarch64 implementation retains the original counting loop
since it doesn't have this issue.

Regression from: rust-lang#130733
For inputs smaller than 32 bytes, use usize-at-a-time processing
instead of calling the SSE2 function. This avoids function call
overhead from #[target_feature(enable = "sse2")] which prevents
inlining.

Also moves CHUNK_SIZE to module level so it can be shared between
is_ascii and is_ascii_sse2.
It is a singleton which doesn't actually need to be passed through over
the bridge.
And rename FreeFunctions struct to Methods.
This fixes stage1 builds when the proc-macro bridge api changed.
The rustc_proc_macro crate is identical to the proc_macro that would end
up in the sysroot of the rustc compiler rustc_proc_macro is linked into.
Combine the x86_64 and loongarch64 is_ascii tests into a single file
using compiletest revisions. Both now test assembly output:

- X86_64: Verifies no broken kshiftrd/kshiftrq instructions (AVX-512 fix)
- LA64: Verifies vmskltz.b instruction is used (auto-vectorization)
Remove the `#[target_feature(enable = "sse2")]` attribute and make the
function safe to call. The SSE2 requirement is already enforced by the
`#[cfg(target_feature = "sse2")]` predicate.

Individual unsafe blocks are used for intrinsic calls with appropriate
SAFETY comments.

Also adds FIXME reference to llvm#176906 for tracking when this
workaround can be removed.
Removed comment about reproducibility failures with crate type `bin` and `-Cdebuginfo=2` on non windows machines 
issue rust-lang#89911
Implements WCAG 2.4.1 (Level A) - Bypass Blocks accessibility feature.

Changes:
- Add skip-main-content link in page.html with tabindex=-1 on main-content
- Add CSS styling per reviewer feedback (outline border, themed colors)
- Add GOML test for skip navigation functionality

Fixes rust-lang#151420
Use allocator_shim_contents in allocator_shim_symbols
…umbv8r, r=petrochenkov

Add Tier 3 Thumb-mode targets for Armv7-A, Armv7-R and Armv8-R

We currently have targets for bare-metal Armv7-R, Armv7-A and Armv8-R, but only in Arm mode. This PR adds five new targets enabling bare-metal support on these architectures in Thumb mode.

This has been tested using https://github.com/rust-embedded/aarch32/compare/main...thejpster:aarch32:support-thumb-mode-v7-v8?expand=1 and they all seem to work as expected.

However, I wasn't sure what to do with the maintainer lists as these are five new targets, but they share the docs page with the existing Arm versions. I can ask the Embedded Devices WG Arm Team about taking on these ones too, but whether Arm themselves want to take them on I guess is a bigger question.
…ertdev

Fix is_ascii performance regression on AVX-512 CPUs when compiling with -C target-cpu=native

## Summary

This PR fixes a severe performance regression in `slice::is_ascii` on AVX-512 CPUs when compiling with `-C target-cpu=native`.

On affected systems, the current implementation achieves only ~3 GB/s for large inputs, compared to ~60–70 GB/s previously (≈20–24× regression). This PR restores the original performance characteristics.

This change is intended as a **temporary workaround** for upstream LLVM poor codegen. Once the underlying LLVM issue is fixed and Rust is able to consume that fix, this workaround should be reverted.

  ## Problem

  When `is_ascii` is compiled with AVX-512 enabled, LLVM's auto-vectorization generates ~31 `kshiftrd` instructions to extract mask bits one-by-one, instead of using the efficient `pmovmskb`
  instruction. This causes a **~22x performance regression**.

  Because `is_ascii` is marked `#[inline]`, it gets inlined and recompiled with the user's target settings, affecting anyone using `-C target-cpu=native` on AVX-512 CPUs.

## Root cause (upstream)

The underlying issue appears to be an LLVM vectorizer/backend bug affecting certain AVX-512 patterns.

An upstream issue has been filed by @folkertdev  to track the root cause: llvm/llvm-project#176906

Until this is resolved in LLVM and picked up by rustc, this PR avoids triggering the problematic codegen pattern.

  ## Solution

  Replace the counting loop with explicit SSE2 intrinsics (`_mm_movemask_epi8`) that force `pmovmskb` codegen regardless of CPU features.

  ## Godbolt Links (Rust 1.92)

  | Pattern | Target | Link | Result |
  |---------|--------|------|--------|
  | Counting loop (old) | Default SSE2 | https://godbolt.org/z/sE86xz4fY | `pmovmskb` |
  | Counting loop (old) | AVX-512 (znver4) | https://godbolt.org/z/b3jvMhGd3 | 31x `kshiftrd` (broken) |
  | SSE2 intrinsics (fix) | Default SSE2 | https://godbolt.org/z/hMeGfeaPv | `pmovmskb` |
  | SSE2 intrinsics (fix) | AVX-512 (znver4) | https://godbolt.org/z/Tdvdqjohn | `vpmovmskb` (fixed) |

  ## Benchmark Results

  **CPU:** AMD Ryzen 5 7500F (Zen 4 with AVX-512)

  ### Default Target (SSE2) — Mixed

  | Size | Before | After | Change |
  |------|--------|-------|--------|
  | 4 B | 1.8 GB/s | 2.0 GB/s | **+11%** |
  | 8 B | 3.2 GB/s | 5.8 GB/s | **+81%** |
  | 16 B | 5.3 GB/s | 8.5 GB/s | **+60%** |
  | 32 B | 17.7 GB/s | 15.8 GB/s | -11% |
  | 64 B | 28.6 GB/s | 25.1 GB/s | -12% |
  | 256 B | 51.5 GB/s | 48.6 GB/s | ~same |
  | 1 KB | 64.9 GB/s | 60.7 GB/s | ~same |
  | 4 KB+ | ~68-70 GB/s | ~68-72 GB/s | ~same |

  ### Native Target (AVX-512) — Up to 24x Faster

  | Size | Before | After | Speedup |
  |------|--------|-------|---------|
  | 4 B | 1.2 GB/s | 2.0 GB/s | **1.7x** |
  | 8 B | 1.6 GB/s | 5.0 GB/s | **3.3x** |
  | 16 B | ~7 GB/s | ~7 GB/s | ~same |
  | 32 B | 2.9 GB/s | 14.2 GB/s | **4.9x** |
  | 64 B | 2.9 GB/s | 23.2 GB/s | **8x** |
  | 256 B | 2.9 GB/s | 47.2 GB/s | **16x** |
  | 1 KB | 2.8 GB/s | 60.0 GB/s | **21x** |
  | 4 KB+ | 2.9 GB/s | ~68-70 GB/s | **23-24x** |

  ### Summary

  - **SSE2 (default):** Small inputs (4-16 B) 11-81% faster; 32-64 B ~11% slower; large inputs unchanged
  - **AVX-512 (native):** 21-24x faster for inputs ≥1 KB, peak ~70 GB/s (was ~3 GB/s)

  Note: this is the pure ascii path, but the story is similar for the others.
  See linked bench project.

  ## Test Plan

  - [x] Assembly test (`slice-is-ascii-avx512.rs`) verifies no `kshiftrd` with AVX-512
  - [x] Existing codegen test updated to `loongarch64`-only (auto-vectorization still used there)
  - [x] Fuzz testing confirms old/new implementations produce identical results (~53M iterations)
  - [x] Benchmarks confirm performance improvement
  - [x] Tidy checks pass

  ## Reproduction / Test Projects

  Standalone validation tools: https://github.com/bonega/is-ascii-fix-validation

  - `bench/` - Criterion benchmarks for SSE2 vs AVX-512 comparison
  - `fuzz/` - Compares old/new implementations with libfuzzer

  ## Related Issues
  - issue opened by @folkertdev llvm/llvm-project#176906
  - Regression introduced in rust-lang#130733
…r=GuillaumeGomez

Add "Skip to main content" link for keyboard navigation in rustdoc

## Summary

This PR adds a "Skip to main content" link for keyboard navigation in rustdoc, improving accessibility by allowing users to bypass the sidebar and navigate directly to the main content area.

## Changes

- **`src/librustdoc/html/templates/page.html`**: Added a skip link (`<a class="skip-main-content">`) immediately after the `<body>` tag that links to `#main-content`
- **`src/librustdoc/html/static/css/rustdoc.css`**: Added CSS styles for the skip link:
  - Visually hidden by default (`position: absolute; top: -100%`)
  - Becomes visible when focused via Tab key (`top: 0` on `:focus`)
  - Styled consistently with rustdoc theme using existing CSS variables
- **`tests/rustdoc-gui/skip-navigation.goml`**: Added GUI test to verify the skip link functionality

## WCAG Compliance

This addresses **WCAG Success Criterion 2.4.1 (Level A)** - Bypass Blocks:
> A mechanism is available to bypass blocks of content that are repeated on multiple web pages.

## Demo

When pressing Tab on a rustdoc page, the first focusable element is now the "Skip to main content" link, allowing keyboard users to jump directly to the main content without tabbing through the entire sidebar.

## Future Improvements

Based on the discussion in rust-lang#151420, additional skip links could be added between the page summary and module contents sections. This PR provides the foundation, and we can iterate on adding more skip links based on feedback.

Fixes rust-lang#151420

r? @JayanAXHF
…rochenkov

Various refactors to the proc_macro bridge

This reduces the amount of types, traits and other abstractions that are involved with the bridge, which should make it easier to understand and modify. This should also help a bit with getting rid of the type marking hack, which is complicating the code a fair bit.
…sts-linux, r=Kobzol

Enable reproducible binary builds with debuginfo on Linux

Fixes rust-lang#89911

This PR enables `-Cdebuginfo=2` for binary crate types in the `reproducible-build` run-make test on Linux platforms.

- Removed the `!matches!(crate_type, CrateType::Bin)` check in `diff_dir_test()`
- SHA256 hashes match: `932be0d950f4ffae62451f7b4c8391eb458a68583feb11193dd501551b6201d4`

This scenario was previously disabled due to rust-lang#89911. I have verified locally on Linux (WSL) with LLVM 21 that the regression reported in that issue appears to be resolved, and the tests now pass with debug info enabled.
Tweak bounds check in `DepNodeColorMap.get`

This aligns the outer bounds check for a green color in `DepNodeColorMap.get` with the bounds check in `DepNodeIndex::from_u32`, allowing the latter to be optimized out.
@rust-bors rust-bors bot added the rollup A PR which is a rollup label Jan 23, 2026
@rustbot rustbot added A-query-system Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html) A-run-make Area: port run-make Makefiles to rmake.rs A-rustdoc-js Area: Rustdoc's JS front-end S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. T-rust-analyzer Relevant to the rust-analyzer team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. T-rustdoc-frontend Relevant to the rustdoc-frontend team, which will review and decide on the web UI/UX output. labels Jan 23, 2026
@JonathanBrouwer
Copy link
Contributor Author

@bors r+ rollup=never p=5

@rust-bors
Copy link
Contributor

rust-bors bot commented Jan 23, 2026

📌 Commit 20ae8d8 has been approved by JonathanBrouwer

It is now in the queue for this repository.

@rust-bors rust-bors bot added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 23, 2026
@rust-bors
Copy link
Contributor

rust-bors bot commented Jan 23, 2026

⌛ Testing commit 20ae8d8 with merge af432da...

Workflow: https://github.com/rust-lang/rust/actions/runs/21296369531

rust-bors bot pushed a commit that referenced this pull request Jan 23, 2026
…uwer

Rollup of 7 pull requests

Successful merges:

 - #149848 (Use allocator_shim_contents in allocator_shim_symbols)
 - #150556 (Add Tier 3 Thumb-mode targets for Armv7-A, Armv7-R and Armv8-R)
 - #151259 (Fix is_ascii performance regression on AVX-512 CPUs when compiling with -C target-cpu=native)
 - #151482 (Add "Skip to main content" link for keyboard navigation in rustdoc)
 - #151505 (Various refactors to the proc_macro bridge)
 - #151517 (Enable reproducible binary builds with debuginfo on Linux)
 - #151540 (Tweak bounds check in `DepNodeColorMap.get`)

r? @ghost
@Kobzol
Copy link
Member

Kobzol commented Jan 23, 2026

@bors r-

I'll cancel this because I want to r- #151540.

@rust-bors rust-bors bot added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Jan 23, 2026
@rust-bors
Copy link
Contributor

rust-bors bot commented Jan 23, 2026

Commit 20ae8d8 has been unapproved.

Auto build cancelled due to unapproval. Cancelled workflows:

@rust-bors rust-bors bot removed the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Jan 23, 2026
@rustbot rustbot removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Jan 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-query-system Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html) A-run-make Area: port run-make Makefiles to rmake.rs A-rustdoc-js Area: Rustdoc's JS front-end rollup A PR which is a rollup T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. T-rust-analyzer Relevant to the rust-analyzer team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. T-rustdoc-frontend Relevant to the rustdoc-frontend team, which will review and decide on the web UI/UX output.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants