feat(pm): preload tgz #2549
base: next
Conversation
Summary of Changes

Hello @elrrrrrrr, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces a significant performance enhancement to the package manager by implementing a pipeline-driven approach for package installation. It allows for the concurrent fetching of package manifests and the preloading of tarball archives.
Code Review
This pull request introduces a significant performance improvement by implementing a preloading pipeline for package tarballs. The new architecture leverages concurrent manifest resolution and tarball downloading, using a `OnceMap` for deduplication and `libdeflater` for faster gzip decompression. The changes are well-structured, separating pipeline logic into a new module and integrating it effectively with existing install processes. The addition of a benchmarking script is a valuable step for validating these performance gains.
```rust
/// Estimate uncompressed size from gzip footer (last 4 bytes store original size mod 2^32)
fn estimate_uncompressed_size(gzip_data: &[u8]) -> usize {
    if gzip_data.len() < 4 {
        return gzip_data.len() * 10; // fallback estimate
    }
    let last_4 = &gzip_data[gzip_data.len() - 4..];
    let size = u32::from_le_bytes([last_4[0], last_4[1], last_4[2], last_4[3]]) as usize;
    // Sanity check: if size is 0 or too small, use a reasonable estimate
    if !(16..=512 * 1024 * 1024).contains(&size) {
        gzip_data.len() * 10
    } else {
        size
    }
}
```
The `estimate_uncompressed_size` function is duplicated in `crates/pm/src/service/pipeline.rs`. It should be extracted into a shared utility function to avoid code duplication and ensure consistent logic. Additionally, the fallback estimate `gzip_data.len() * 10` could lead to very large memory allocations if the input `gzip_data` is large and the actual uncompressed size is small or zero. Consider adding a maximum cap to this estimate or a more robust fallback strategy to prevent excessive memory usage.
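A capped version of the helper could look like the sketch below (the function name matches the PR, but the 64 MiB cap and the structure of the fallback are illustrative assumptions, not the crate's actual code):

```rust
/// Hypothetical shared helper: same gzip-footer heuristic as the PR,
/// but the fallback is capped so a large, incompressible input cannot
/// drive a multi-hundred-megabyte allocation.
fn estimate_uncompressed_size(gzip_data: &[u8]) -> usize {
    const MAX_FALLBACK: usize = 64 * 1024 * 1024; // assumed 64 MiB cap
    let fallback = gzip_data.len().saturating_mul(10).min(MAX_FALLBACK);
    if gzip_data.len() < 4 {
        return fallback;
    }
    let last_4 = &gzip_data[gzip_data.len() - 4..];
    let size = u32::from_le_bytes([last_4[0], last_4[1], last_4[2], last_4[3]]) as usize;
    if (16..=512 * 1024 * 1024).contains(&size) {
        size
    } else {
        fallback
    }
}

fn main() {
    // A plausible footer (1000 bytes uncompressed) is trusted as before.
    let mut data = vec![0u8; 8];
    data[4..].copy_from_slice(&1000u32.to_le_bytes());
    assert_eq!(estimate_uncompressed_size(&data), 1000);
    // A zeroed footer falls back to the heuristic, now bounded by the cap.
    let big = vec![0u8; 32 * 1024 * 1024];
    assert_eq!(estimate_uncompressed_size(&big), 64 * 1024 * 1024);
}
```

Extracting this into one shared module would also keep the sanity-check range in a single place.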
```toml
# tokio-tar removed - using sync tar with spawn_blocking for libdeflate batch processing
tokio-util = { version = "0.7" }
```
The comment for the `tokio-tar` removal is clear, but the removal of `features = ["io"]` from `tokio-util` is not explicitly mentioned. While it might be implicitly handled by the new approach, it would be good to add a brief note about why these features are no longer needed, or whether they are now covered by other dependencies.
```rust
fn get_download_semaphore() -> &'static Arc<Semaphore> {
    DOWNLOAD_SEMAPHORE.get_or_init(|| {
        let limit = get_manifests_concurrency_limit_sync();
        tracing::debug!("Initializing download semaphore with limit: {}", limit);
        Arc::new(Semaphore::new(limit))
    })
}
```
The `DOWNLOAD_SEMAPHORE` is initialized using `get_manifests_concurrency_limit_sync()`. This is fine, but note that `get_manifests_concurrency_limit_sync()` returns the default value if the config is not yet loaded. Ensure this behavior is intended for the semaphore's initial limit, or consider whether `get_manifests_concurrency_limit().await` is more appropriate here if it is guaranteed to be called after config loading.
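The pinning hazard described here can be shown with a small std-only sketch. A plain `usize` stands in for the `Arc<Semaphore>`, and `limit_sync`/the config plumbing are stand-ins, not the crate's real API:

```rust
use std::sync::OnceLock;

const DEFAULT_LIMIT: usize = 64;

// Stand-in for get_manifests_concurrency_limit_sync(): yields the
// configured value when the config is loaded, the default otherwise.
fn limit_sync(loaded_config: Option<usize>) -> usize {
    loaded_config.unwrap_or(DEFAULT_LIMIT)
}

static DOWNLOAD_LIMIT: OnceLock<usize> = OnceLock::new();

fn get_download_limit(loaded_config: Option<usize>) -> usize {
    // First caller wins: whatever limit is visible at first use is
    // pinned for the lifetime of the process.
    *DOWNLOAD_LIMIT.get_or_init(|| limit_sync(loaded_config))
}

fn main() {
    // If the first call happens before the config loads, the default wins...
    assert_eq!(get_download_limit(None), 64);
    // ...and a later, loaded config value (16 here) can no longer take effect.
    assert_eq!(get_download_limit(Some(16)), 64);
}
```

If the real initializer can run before config load, switching to the async getter (or deferring semaphore creation until after load) would avoid pinning the default.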
crates/pm/src/util/downloader.rs (outdated)
```rust
if let Some(parent) = entry.path.parent() {
    let parent_path = parent.to_path_buf();
    if !created_dirs.contains(&parent_path) {
        crate::fs::create_dir_all(&parent_path).await.ok();
        // ...
    }
}
```
The `crate::fs::create_dir_all(&parent_path).await.ok()` call silently ignores any errors during directory creation. While this might be acceptable if the directory is expected to exist, `create_dir_all` can fail for reasons other than the directory already existing (e.g., permissions or invalid path components). It is safer to handle these errors explicitly, perhaps by logging a warning or returning an error, to prevent later failures during file writing.
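One way to make the error handling explicit is sketched below, using `std::fs` as a stand-in for the async `crate::fs` wrapper (the helper name and `HashSet` cache are illustrative):

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Hypothetical helper: tolerate a concurrent creator racing us, but
// surface real failures (permissions, invalid paths) to the caller
// instead of swallowing them with `.ok()`.
fn ensure_parent_dir(path: &Path, created_dirs: &mut HashSet<PathBuf>) -> io::Result<()> {
    if let Some(parent) = path.parent() {
        let parent_path = parent.to_path_buf();
        if !created_dirs.contains(&parent_path) {
            match fs::create_dir_all(&parent_path) {
                Ok(()) => {}
                // Another task created it first: fine.
                Err(e) if e.kind() == io::ErrorKind::AlreadyExists => {}
                // Anything else would make the upcoming write fail anyway.
                Err(e) => return Err(e),
            }
            created_dirs.insert(parent_path);
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let mut created = HashSet::new();
    let file = std::env::temp_dir().join("pm_preload_demo").join("pkg").join("index.js");
    ensure_parent_dir(&file, &mut created)?;
    assert!(file.parent().unwrap().is_dir());
    Ok(())
}
```

Propagating the error here gives a much clearer diagnostic than the later write failure would.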
```diff
 /// Default concurrency limit for manifest fetching
-pub const DEFAULT_CONCURRENCY: usize = 20;
+pub const DEFAULT_CONCURRENCY: usize = 64;
```
The `DEFAULT_CONCURRENCY` for manifest fetching has been increased from 20 to 64. This is a significant change: while it can improve performance, it also increases resource consumption (network requests, CPU for parsing manifests). It would be beneficial to add a comment explaining the rationale behind this specific value, perhaps referencing benchmarks or expected system capabilities.
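As a sketch, the requested rationale comment might read like this; the justification text is hypothetical wording, not a measured result from this PR:

```rust
/// Default concurrency limit for manifest fetching.
///
/// (Hypothetical rationale, illustrating the kind of comment asked for:)
/// manifest requests are small and latency-bound, so a higher in-flight
/// count mostly overlaps network round-trips rather than adding CPU
/// load; 64 stays well under typical OS file-descriptor limits and
/// registry rate limits.
pub const DEFAULT_CONCURRENCY: usize = 64;

fn main() {
    assert_eq!(DEFAULT_CONCURRENCY, 64);
}
```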
```shell
    esac

    local start=$(date +%s.%N)
    eval "$cmd" >/dev/null 2>&1 || true
```
The `eval "$cmd" >/dev/null 2>&1 || true` command suppresses all output and errors from the package manager commands and forces the script to continue even if a command fails. While this keeps the benchmark output clean, it can make debugging failures very difficult. Consider logging errors to a file or conditionally showing output for failed commands to aid troubleshooting.
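A minimal version of that change could look like the sketch below, assuming `$cmd` holds the benchmarked command as in the script (the demo command and variable names are illustrative):

```shell
#!/bin/sh
# Keep stdout quiet, but capture stderr to a log file and report
# failures instead of swallowing them with `|| true`.
cmd="ls /nonexistent-path-for-demo"   # stand-in for a real pm install command
log_file="$(mktemp)"
start=$(date +%s.%N)
if ! eval "$cmd" >/dev/null 2>"$log_file"; then
    echo "warn: benchmark command failed: $cmd (stderr in $log_file)" >&2
fi
end=$(date +%s.%N)
```

The timing measurement is unchanged; only failures now leave a trace for later inspection.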