Update to LLVM 22 #150722

nikic · 2026-01-06T11:26:00Z

Scheduled release date: Feb 24
1.94 becomes stable: Mar 5

Depends on:

rustbot · 2026-01-06T11:26:03Z

These commits modify compiler targets.
(See the Target Tier Policy.)

rustbot · 2026-01-06T11:26:05Z

⚠️ Warning ⚠️

Some commits in this PR modify submodules.

If this was not intentional, see "I changed a submodule on accident" in the rustc dev guide.

nikic · 2026-01-06T11:39:10Z

@bors try @rust-timer queue

Update to LLVM 22

rust-bors · 2026-01-06T14:36:15Z

☀️ Try build successful (CI)
Build commit: dabe9cd (dabe9cd2e5a1630ac5ed5f1b87f891b6dc563025, parent: da476f1942868cdf94ed88b01ea31170cfe95047)

rust-timer · 2026-01-06T15:17:47Z

Finished benchmarking commit (dabe9cd): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

	mean	range	count
Regressions ❌ (primary)	0.9%	[0.2%, 8.9%]	228
Regressions ❌ (secondary)	1.4%	[0.2%, 8.0%]	260
Improvements ✅ (primary)	-0.8%	[-1.1%, -0.3%]	6
Improvements ✅ (secondary)	-2.2%	[-8.0%, -0.2%]	28
All ❌✅ (primary)	0.9%	[-1.1%, 8.9%]	234

Max RSS (memory usage)

Results (primary 1.0%, secondary -1.3%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	1.0%	[0.8%, 1.3%]	2
Regressions ❌ (secondary)	1.8%	[0.9%, 2.4%]	8
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-3.7%	[-4.8%, -2.6%]	10
All ❌✅ (primary)	1.0%	[0.8%, 1.3%]	2

Cycles

Results (primary 3.2%, secondary 2.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	3.2%	[1.6%, 10.2%]	19
Regressions ❌ (secondary)	3.3%	[1.3%, 7.4%]	38
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-4.8%	[-7.5%, -2.5%]	7
All ❌✅ (primary)	3.2%	[1.6%, 10.2%]	19

Binary size

Results (primary -0.6%, secondary -1.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	0.1%	[0.1%, 0.1%]	4
Regressions ❌ (secondary)	2.6%	[0.1%, 5.1%]	2
Improvements ✅ (primary)	-0.8%	[-1.5%, -0.1%]	14
Improvements ✅ (secondary)	-1.1%	[-3.9%, -0.0%]	93
All ❌✅ (primary)	-0.6%	[-1.5%, 0.1%]	18

Bootstrap: 473.133s -> 480.877s (1.64%)
Artifact size: 390.77 MiB -> 402.18 MiB (2.92%)

rust-bors · 2026-01-06T18:45:25Z

☔ The latest upstream changes made this pull request unmergeable. Please resolve the merge conflicts.

bors · 2026-01-06T19:43:13Z

☔ The latest upstream changes (presumably #150726) made this pull request unmergeable. Please resolve the merge conflicts.

nikic · 2026-01-08T11:49:52Z

Based on helloworld, the perf regressions seem to be related to the allocator somehow. Previously tcache_alloc_small_hard was called 3 times, now it's called 189 times. The total number of allocations is smaller, but the time spent in the allocator is larger.

nikic · 2026-01-08T14:29:53Z

Or maybe the issue is not actually the allocator behavior itself. I suspect that we might have lost LTO on jemalloc: tcache_alloc_small_hard previously got inlined into malloc_default, but no longer is. Possibly updating the host toolchain at the same time, so that the versions match, will help.

nikic · 2026-01-20T10:06:56Z

@bors try jobs=aarch64-msvc-1

Update to LLVM 22

try-job: aarch64-msvc-1

rust-bors · 2026-01-20T12:12:11Z

💔 Test for c6faa0e failed: CI. Failed job:

try - aarch64-msvc-1 (web logs, enhanced plaintext logs)

nikic · 2026-01-20T13:45:39Z

@bors try jobs=aarch64-msvc-1

Update to LLVM 22

try-job: aarch64-msvc-1

rust-bors · 2026-01-20T16:15:54Z

☀️ Try build successful (CI)
Build commit: 1c09150 (1c091504c0d54f2f5af8dea98d70ead54c1392ea, parent: fffc4fcf96b30bc838551de5104d74f82400b35b)

nikic · 2026-01-20T17:16:58Z

Yay, with that all issues should have a pending patch.

I looked into the 4% regression on include-blob, and am somewhat confused. It looks like we're spending 170M instructions in MCObjectStreamer::emitBytes() instead of 6M -- and I think the reason is that it no longer calls into memcpy and instead uses this most beautiful loop:

 8a1cc3a:   48 83 c1 08             add    $0x8,%rcx
 8a1cc3e:   45 31 c0                xor    %r8d,%r8d
 8a1cc41:   46 0f b6 0c 06          movzbl (%rsi,%r8,1),%r9d
 8a1cc46:   46 88 0c 00             mov    %r9b,(%rax,%r8,1)
 8a1cc4a:   46 0f b6 4c 06 01       movzbl 0x1(%rsi,%r8,1),%r9d
 8a1cc50:   46 88 4c 00 01          mov    %r9b,0x1(%rax,%r8,1)
 8a1cc55:   46 0f b6 4c 06 02       movzbl 0x2(%rsi,%r8,1),%r9d
 8a1cc5b:   46 88 4c 00 02          mov    %r9b,0x2(%rax,%r8,1)
 8a1cc60:   46 0f b6 4c 06 03       movzbl 0x3(%rsi,%r8,1),%r9d
 8a1cc66:   46 88 4c 00 03          mov    %r9b,0x3(%rax,%r8,1)
 8a1cc6b:   46 0f b6 4c 06 04       movzbl 0x4(%rsi,%r8,1),%r9d
 8a1cc71:   46 88 4c 00 04          mov    %r9b,0x4(%rax,%r8,1)
 8a1cc76:   46 0f b6 4c 06 05       movzbl 0x5(%rsi,%r8,1),%r9d
 8a1cc7c:   46 88 4c 00 05          mov    %r9b,0x5(%rax,%r8,1)
 8a1cc81:   46 0f b6 4c 06 06       movzbl 0x6(%rsi,%r8,1),%r9d
 8a1cc87:   46 88 4c 00 06          mov    %r9b,0x6(%rax,%r8,1)
 8a1cc8c:   46 0f b6 4c 06 07       movzbl 0x7(%rsi,%r8,1),%r9d
 8a1cc92:   46 88 4c 00 07          mov    %r9b,0x7(%rax,%r8,1)
 8a1cc97:   48 83 c1 f8             add    $0xfffffffffffffff8,%rcx
 8a1cc9b:   49 83 c0 08             add    $0x8,%r8
 8a1cc9f:   48 83 f9 08             cmp    $0x8,%rcx
 8a1cca3:   7f 9c                   jg     8a1cc41 <_ZN4llvm16MCObjectStreamer9emitBytesENS_9StringRefE+0xa9>
 8a1cca5:   e9 63 ff ff ff          jmp    8a1cc0d <_ZN4llvm16MCObjectStreamer9emitBytesENS_9StringRefE+0x75>

But I'm not sure how we can end up with something like this. This is ultimately just a std::copy, which should be converted to memmove by the STL headers, so that would imply that LLVM is going out of its way to convert the memcpy into this loop. I don't see this in my local build. I wonder whether this is some weird PGO transform, but I'm not aware of anything that would cause a loop expansion (only constant-size specialization).
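For readers who don't want to decode the listing, here is a rough C++ rendering of what that loop does. It is only a sketch: the function name and signature are made up, `remaining` stands in for %rcx after the initial add, `i` for %r8, `src` for %rsi and `dst` for %rax. The guard before the loop and the tail handling at the jump target are outside the excerpt, so the precondition below is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical rendering of the unrolled loop from the disassembly.
// Copies 8 bytes per iteration, one byte at a time, and returns the number
// of bytes copied. Any leftover bytes are left for the caller, mirroring
// the jump out of the loop in the listing.
std::ptrdiff_t bytewise_copy_unrolled(const unsigned char* src,
                                      unsigned char* dst,
                                      std::ptrdiff_t remaining) {
    assert(remaining >= 8);  // assumption: at least one full block is present
    std::ptrdiff_t i = 0;    // xor %r8d,%r8d
    do {
        // Eight separate one-byte load/store pairs (movzbl + mov).
        dst[i + 0] = src[i + 0];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
        dst[i + 4] = src[i + 4];
        dst[i + 5] = src[i + 5];
        dst[i + 6] = src[i + 6];
        dst[i + 7] = src[i + 7];
        remaining -= 8;       // add $0xfffffffffffffff8,%rcx
        i += 8;               // add $0x8,%r8
    } while (remaining > 8);  // cmp $0x8,%rcx ; jg
    return i;
}
```

Every byte costs its own load and store plus a share of the loop overhead, so a large blob goes through this path at several instructions per byte, where a memcpy call would move many bytes per instruction.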

nikic · 2026-01-20T17:30:01Z

Oh, I think this may be related to the fact that we're using an old libstdc++ 9.5. It looks like that version does not specialize to memmove if the value types of the input and output iterators aren't the same (https://cpp.godbolt.org/z/x7cMb1shT). In this case there is a signed / unsigned mismatch.

It doesn't explain why we get the terrible byte-wise copies instead of at least xmm copies, but at least that explains why we don't get memmove.

Looks like memmove is only getting used for the different-type case starting with libstdc++ 15: https://cpp.godbolt.org/z/qzc5ehMcP
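To make the mismatch concrete, here is a minimal sketch of the pattern, not the actual LLVM code. The function names are hypothetical, and which side is signed in the real code is an assumption. The first function relies on std::copy across differing element types, so whether it becomes a memmove is up to the standard library version. The second calls memcpy directly and does not depend on that specialization.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in: source bytes are `char`, destination elements are
// `unsigned char`. The element types differ, so lowering to memmove depends
// on the standard library implementation.
std::vector<unsigned char> copy_with_std_copy(const char* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    std::vector<unsigned char> dst(len);
    std::copy(data, data + len, dst.begin());
    return dst;
}

// Same result, but the bulk copy is requested explicitly.
std::vector<unsigned char> copy_with_memcpy(const char* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    std::vector<unsigned char> dst(len);
    if (len != 0) {
        // Guarded because passing a null pointer to memcpy is not allowed,
        // even when the length is zero.
        std::memcpy(dst.data(), data, len);
    }
    return dst;
}
```

Both functions produce identical bytes. The only difference is whether the bulk copy is left to the library's std::copy specialization or spelled out by hand.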

rustbot · 2026-01-21T11:55:42Z

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

nikic · 2026-01-21T11:57:01Z

@bors try @rust-timer queue

Update to LLVM 22

rust-bors · 2026-01-21T14:20:03Z

☀️ Try build successful (CI)
Build commit: 0da05b9 (0da05b9571e60e86e9fc63537cd6abc32cfddd5b, parent: 838db2538201a845a3694c99d9114a1acebd6e28)

rust-timer · 2026-01-21T15:00:58Z

Finished benchmarking commit (0da05b9): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

	mean	range	count
Regressions ❌ (primary)	2.9%	[0.7%, 7.4%]	3
Regressions ❌ (secondary)	2.1%	[0.4%, 6.2%]	4
Improvements ✅ (primary)	-0.6%	[-2.9%, -0.2%]	230
Improvements ✅ (secondary)	-0.9%	[-9.9%, -0.2%]	278
All ❌✅ (primary)	-0.6%	[-2.9%, 7.4%]	233

Max RSS (memory usage)

Results (primary -0.9%, secondary -2.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	1.6%	[0.9%, 3.8%]	7
Improvements ✅ (primary)	-0.9%	[-0.9%, -0.9%]	1
Improvements ✅ (secondary)	-4.3%	[-6.5%, -1.6%]	12
All ❌✅ (primary)	-0.9%	[-0.9%, -0.9%]	1

Cycles

Results (primary 3.2%, secondary -2.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	8.9%	[8.9%, 8.9%]	1
Regressions ❌ (secondary)	4.6%	[2.1%, 10.2%]	6
Improvements ✅ (primary)	-2.5%	[-2.5%, -2.5%]	1
Improvements ✅ (secondary)	-4.3%	[-7.9%, -2.0%]	17
All ❌✅ (primary)	3.2%	[-2.5%, 8.9%]	2

Binary size

Results (primary -0.3%, secondary -1.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	0.1%	[0.1%, 0.1%]	4
Regressions ❌ (secondary)	2.6%	[0.1%, 5.1%]	2
Improvements ✅ (primary)	-0.4%	[-1.5%, -0.1%]	53
Improvements ✅ (secondary)	-1.1%	[-3.8%, -0.0%]	98
All ❌✅ (primary)	-0.3%	[-1.5%, 0.1%]	57

Bootstrap: 472.653s -> 476.124s (0.73%)
Artifact size: 383.22 MiB -> 397.37 MiB (3.69%)