@rustbot
Collaborator

rustbot commented Jan 6, 2026

These commits modify compiler targets.
(See the Target Tier Policy.)

rustbot added the labels A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.), S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.), and T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.) on Jan 6, 2026
@rustbot
Collaborator

rustbot commented Jan 6, 2026

⚠️ Warning ⚠️

@nikic
Contributor Author

nikic commented Jan 6, 2026

@bors try @rust-timer queue

rust-bors bot added a commit that referenced this pull request Jan 6, 2026
rustbot added the label S-waiting-on-perf (Status: Waiting on a perf run to be completed.) on Jan 6, 2026

@rust-bors
Contributor

rust-bors bot commented Jan 6, 2026

☀️ Try build successful (CI)
Build commit: dabe9cd (dabe9cd2e5a1630ac5ed5f1b87f891b6dc563025, parent: da476f1942868cdf94ed88b01ea31170cfe95047)

@rust-timer
Collaborator

Finished benchmarking commit (dabe9cd): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|:--|:--|:--|:--|
| Regressions ❌ (primary) | 0.9% | [0.2%, 8.9%] | 228 |
| Regressions ❌ (secondary) | 1.4% | [0.2%, 8.0%] | 260 |
| Improvements ✅ (primary) | -0.8% | [-1.1%, -0.3%] | 6 |
| Improvements ✅ (secondary) | -2.2% | [-8.0%, -0.2%] | 28 |
| All ❌✅ (primary) | 0.9% | [-1.1%, 8.9%] | 234 |

Max RSS (memory usage)

Results (primary 1.0%, secondary -1.3%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|:--|:--|:--|:--|
| Regressions ❌ (primary) | 1.0% | [0.8%, 1.3%] | 2 |
| Regressions ❌ (secondary) | 1.8% | [0.9%, 2.4%] | 8 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -3.7% | [-4.8%, -2.6%] | 10 |
| All ❌✅ (primary) | 1.0% | [0.8%, 1.3%] | 2 |

Cycles

Results (primary 3.2%, secondary 2.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|:--|:--|:--|:--|
| Regressions ❌ (primary) | 3.2% | [1.6%, 10.2%] | 19 |
| Regressions ❌ (secondary) | 3.3% | [1.3%, 7.4%] | 38 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -4.8% | [-7.5%, -2.5%] | 7 |
| All ❌✅ (primary) | 3.2% | [1.6%, 10.2%] | 19 |

Binary size

Results (primary -0.6%, secondary -1.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|:--|:--|:--|:--|
| Regressions ❌ (primary) | 0.1% | [0.1%, 0.1%] | 4 |
| Regressions ❌ (secondary) | 2.6% | [0.1%, 5.1%] | 2 |
| Improvements ✅ (primary) | -0.8% | [-1.5%, -0.1%] | 14 |
| Improvements ✅ (secondary) | -1.1% | [-3.9%, -0.0%] | 93 |
| All ❌✅ (primary) | -0.6% | [-1.5%, 0.1%] | 18 |

Bootstrap: 473.133s -> 480.877s (1.64%)
Artifact size: 390.77 MiB -> 402.18 MiB (2.92%)

rustbot added the label perf-regression (Performance regression.) and removed the label S-waiting-on-perf (Status: Waiting on a perf run to be completed.) on Jan 6, 2026

@rust-bors
Contributor

rust-bors bot commented Jan 6, 2026

☔ The latest upstream changes made this pull request unmergeable. Please resolve the merge conflicts.

@bors
Collaborator

bors commented Jan 6, 2026

☔ The latest upstream changes (presumably #150726) made this pull request unmergeable. Please resolve the merge conflicts.

@nikic
Contributor Author

nikic commented Jan 8, 2026

Based on helloworld, the perf regressions seem to be related to the allocator somehow. Previously tcache_alloc_small_hard was called 3 times, now it's called 189 times. The total number of allocations is smaller, but the time spent in the allocator is larger.

@nikic
Contributor Author

nikic commented Jan 8, 2026

Or maybe the issue is not actually the allocator behavior itself. I suspect that we might have lost LTO on jemalloc: tcache_alloc_small_hard previously got inlined into malloc_default, but now it no longer is. Updating the host toolchain at the same time, so that the versions match, might help.

rustbot added the labels A-CI (Area: Our Github Actions CI), A-testsuite (Area: The testsuite used to check the correctness of rustc), and T-infra (Relevant to the infrastructure team, which will review and decide on the PR/issue.) on Jan 8, 2026

rustbot added the label A-run-make (Area: port run-make Makefiles to rmake.rs) on Jan 20, 2026
@nikic
Contributor Author

nikic commented Jan 20, 2026

@bors try jobs=aarch64-msvc-1

rust-bors bot pushed a commit that referenced this pull request Jan 20, 2026
Update to LLVM 22


try-job: aarch64-msvc-1
rust-lang deleted a comment from rust-bors bot on Jan 20, 2026
@rust-bors
Contributor

rust-bors bot commented Jan 20, 2026

💔 Test for c6faa0e failed: CI. Failed job:

@nikic
Contributor Author

nikic commented Jan 20, 2026

@bors try jobs=aarch64-msvc-1

rust-bors bot pushed a commit that referenced this pull request Jan 20, 2026
Update to LLVM 22


try-job: aarch64-msvc-1

@rust-bors
Contributor

rust-bors bot commented Jan 20, 2026

☀️ Try build successful (CI)
Build commit: 1c09150 (1c091504c0d54f2f5af8dea98d70ead54c1392ea, parent: fffc4fcf96b30bc838551de5104d74f82400b35b)

@nikic
Contributor Author

nikic commented Jan 20, 2026

Yay, with that all issues should have a pending patch.

I looked into the 4% regression on include-blob, and am somewhat confused. It looks like we're spending 170M instructions in MCObjectStreamer::emitBytes() instead of 6M -- and I think the reason is that it no longer calls into memcpy and instead uses this most beautiful loop:

 8a1cc3a:   48 83 c1 08             add    $0x8,%rcx
 8a1cc3e:   45 31 c0                xor    %r8d,%r8d
 8a1cc41:   46 0f b6 0c 06          movzbl (%rsi,%r8,1),%r9d
 8a1cc46:   46 88 0c 00             mov    %r9b,(%rax,%r8,1)
 8a1cc4a:   46 0f b6 4c 06 01       movzbl 0x1(%rsi,%r8,1),%r9d
 8a1cc50:   46 88 4c 00 01          mov    %r9b,0x1(%rax,%r8,1)
 8a1cc55:   46 0f b6 4c 06 02       movzbl 0x2(%rsi,%r8,1),%r9d
 8a1cc5b:   46 88 4c 00 02          mov    %r9b,0x2(%rax,%r8,1)
 8a1cc60:   46 0f b6 4c 06 03       movzbl 0x3(%rsi,%r8,1),%r9d
 8a1cc66:   46 88 4c 00 03          mov    %r9b,0x3(%rax,%r8,1)
 8a1cc6b:   46 0f b6 4c 06 04       movzbl 0x4(%rsi,%r8,1),%r9d
 8a1cc71:   46 88 4c 00 04          mov    %r9b,0x4(%rax,%r8,1)
 8a1cc76:   46 0f b6 4c 06 05       movzbl 0x5(%rsi,%r8,1),%r9d
 8a1cc7c:   46 88 4c 00 05          mov    %r9b,0x5(%rax,%r8,1)
 8a1cc81:   46 0f b6 4c 06 06       movzbl 0x6(%rsi,%r8,1),%r9d
 8a1cc87:   46 88 4c 00 06          mov    %r9b,0x6(%rax,%r8,1)
 8a1cc8c:   46 0f b6 4c 06 07       movzbl 0x7(%rsi,%r8,1),%r9d
 8a1cc92:   46 88 4c 00 07          mov    %r9b,0x7(%rax,%r8,1)
 8a1cc97:   48 83 c1 f8             add    $0xfffffffffffffff8,%rcx
 8a1cc9b:   49 83 c0 08             add    $0x8,%r8
 8a1cc9f:   48 83 f9 08             cmp    $0x8,%rcx
 8a1cca3:   7f 9c                   jg     8a1cc41 <_ZN4llvm16MCObjectStreamer9emitBytesENS_9StringRefE+0xa9>
 8a1cca5:   e9 63 ff ff ff          jmp    8a1cc0d <_ZN4llvm16MCObjectStreamer9emitBytesENS_9StringRefE+0x75>

But I'm not sure how we can end up with something like this. This is ultimately just a std::copy, which should be converted to memmove by the STL headers, so that would imply that LLVM is going out of its way to convert the memcpy into this loop. I don't see this in my local build. I wonder whether this is some weird PGO transform, but I'm not aware of anything that would cause a loop expansion (only constant-size specialization).
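
For readers less used to AT&T syntax, the listing above reads roughly like a byte-at-a-time copy unrolled by eight. The sketch below is only an interpretation of the disassembly, not LLVM's source: the function name `bytewise_copy_unrolled` and its signature are hypothetical, and the tail loop is an assumption because the listing does not show how leftover bytes are handled.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical source-level reading of the disassembled loop: each iteration
// performs eight single-byte load/store pairs (the movzbl/mov pairs), then
// advances the offset by 8.
inline void bytewise_copy_unrolled(const char* src, unsigned char* dst, std::size_t n) {
    assert(n == 0 || (src != nullptr && dst != nullptr));
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        dst[i + 0] = static_cast<unsigned char>(src[i + 0]);
        dst[i + 1] = static_cast<unsigned char>(src[i + 1]);
        dst[i + 2] = static_cast<unsigned char>(src[i + 2]);
        dst[i + 3] = static_cast<unsigned char>(src[i + 3]);
        dst[i + 4] = static_cast<unsigned char>(src[i + 4]);
        dst[i + 5] = static_cast<unsigned char>(src[i + 5]);
        dst[i + 6] = static_cast<unsigned char>(src[i + 6]);
        dst[i + 7] = static_cast<unsigned char>(src[i + 7]);
    }
    // Assumed tail handling (not visible in the listing): copy the
    // remaining 0..7 bytes one at a time.
    for (; i < n; ++i) {
        dst[i] = static_cast<unsigned char>(src[i]);
    }
}
```

An optimizing compiler is free to lower this source differently (for example into a memcpy call or vector moves), so it illustrates the shape of the emitted loop rather than a way to reproduce it.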

@nikic
Contributor Author

nikic commented Jan 20, 2026

Oh, I think this may be related to the fact that we're using an old libstdc++ 9.5. It looks like that version does not specialize to memmove if the input and output types of the iterators aren't the same: https://cpp.godbolt.org/z/x7cMb1shT In this case there is a signed / unsigned mismatch.

It doesn't explain why we get the terrible byte-wise copies instead of at least xmm copies, but it does explain why we don't get memmove.

Looks like memmove is only getting used for the different-type case starting with libstdc++ 15: https://cpp.godbolt.org/z/qzc5ehMcP
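
The mismatch described above can be sketched in isolation. This is a minimal illustration, assuming a `char` source and an `unsigned char` destination; the helper names `copy_mismatched` and `copy_explicit` are hypothetical and are not taken from LLVM. Whether the first helper ends up as a memmove call depends on the standard library version and the optimizer, as discussed in the comment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: std::copy between different one-byte element types
// (char -> unsigned char). Because the value types differ, an older
// libstdc++ may not dispatch this to memmove and instead leaves an
// element-wise loop for the optimizer.
inline void copy_mismatched(const char* src, std::size_t n, unsigned char* dst) {
    assert(n == 0 || (src != nullptr && dst != nullptr));
    std::copy(src, src + n, dst);
}

// Hypothetical alternative: ask for the bulk copy explicitly. This is valid
// here because both element types are one byte wide and trivially copyable.
inline void copy_explicit(const char* src, std::size_t n, unsigned char* dst) {
    assert(n == 0 || (src != nullptr && dst != nullptr));
    if (n != 0) {
        std::memcpy(dst, src, n);
    }
}
```

Both helpers write the same bytes; they differ only in how the copy is expressed to the library and the compiler.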

@rustbot
Collaborator

rustbot commented Jan 21, 2026

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@nikic
Contributor Author

nikic commented Jan 21, 2026

@bors try @rust-timer queue

rust-bors bot pushed a commit that referenced this pull request Jan 21, 2026
rustbot added the label S-waiting-on-perf (Status: Waiting on a perf run to be completed.) on Jan 21, 2026
@rust-bors
Contributor

rust-bors bot commented Jan 21, 2026

☀️ Try build successful (CI)
Build commit: 0da05b9 (0da05b9571e60e86e9fc63537cd6abc32cfddd5b, parent: 838db2538201a845a3694c99d9114a1acebd6e28)

@rust-timer
Collaborator

Finished benchmarking commit (0da05b9): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|:--|:--|:--|:--|
| Regressions ❌ (primary) | 2.9% | [0.7%, 7.4%] | 3 |
| Regressions ❌ (secondary) | 2.1% | [0.4%, 6.2%] | 4 |
| Improvements ✅ (primary) | -0.6% | [-2.9%, -0.2%] | 230 |
| Improvements ✅ (secondary) | -0.9% | [-9.9%, -0.2%] | 278 |
| All ❌✅ (primary) | -0.6% | [-2.9%, 7.4%] | 233 |

Max RSS (memory usage)

Results (primary -0.9%, secondary -2.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|:--|:--|:--|:--|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 1.6% | [0.9%, 3.8%] | 7 |
| Improvements ✅ (primary) | -0.9% | [-0.9%, -0.9%] | 1 |
| Improvements ✅ (secondary) | -4.3% | [-6.5%, -1.6%] | 12 |
| All ❌✅ (primary) | -0.9% | [-0.9%, -0.9%] | 1 |

Cycles

Results (primary 3.2%, secondary -2.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|:--|:--|:--|:--|
| Regressions ❌ (primary) | 8.9% | [8.9%, 8.9%] | 1 |
| Regressions ❌ (secondary) | 4.6% | [2.1%, 10.2%] | 6 |
| Improvements ✅ (primary) | -2.5% | [-2.5%, -2.5%] | 1 |
| Improvements ✅ (secondary) | -4.3% | [-7.9%, -2.0%] | 17 |
| All ❌✅ (primary) | 3.2% | [-2.5%, 8.9%] | 2 |

Binary size

Results (primary -0.3%, secondary -1.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|:--|:--|:--|:--|
| Regressions ❌ (primary) | 0.1% | [0.1%, 0.1%] | 4 |
| Regressions ❌ (secondary) | 2.6% | [0.1%, 5.1%] | 2 |
| Improvements ✅ (primary) | -0.4% | [-1.5%, -0.1%] | 53 |
| Improvements ✅ (secondary) | -1.1% | [-3.8%, -0.0%] | 98 |
| All ❌✅ (primary) | -0.3% | [-1.5%, 0.1%] | 57 |

Bootstrap: 472.653s -> 476.124s (0.73%)
Artifact size: 383.22 MiB -> 397.37 MiB (3.69%)

rustbot removed the label S-waiting-on-perf (Status: Waiting on a perf run to be completed.) on Jan 21, 2026

Labels

A-CI (Area: Our Github Actions CI)
A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.)
A-run-make (Area: port run-make Makefiles to rmake.rs)
A-testsuite (Area: The testsuite used to check the correctness of rustc)
perf-regression (Performance regression.)
S-waiting-on-author (Status: This is awaiting some action (such as code changes or more information) from the author.)
T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)
T-infra (Relevant to the infrastructure team, which will review and decide on the PR/issue.)

6 participants