@chaserhkj
Contributor

This PR fixes the broken garbage collection implementation provided by cfsctl. To my knowledge this GC implementation is not used anywhere else, but the fixed implementation could be helpful for downstream projects like bootc as well; see bootc-dev/bootc#1808.

Previously, the GC implementation was completely broken: it never added any object IDs to the live set and so always marked every object for deletion. The code in gc_category seems to assume an old composefs structure in which non-first-level entries in the streams/ and images/ directories (e.g. streams/refs/some_distro/some_version) could link directly to the object store. Currently, however, these links always point to first-level entries, and the first-level entries in turn link to objects in the store.

This PR fixes this by doing a proper walk that adds every object referred to from named references to the live set, so that exactly the unlinked objects are marked for deletion.

Furthermore, the old implementation did a naive shallow walk of the streams. This is problematic for pulled OCI images in a composefs repo, since they have two layers of links: a config split stream links to all layers, and each layer links to its layer contents. This PR adds a full walk algorithm that walks down the entire stream tree and prunes it, marking unlinked objects for deletion.
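The full walk described above can be sketched as a reachability pass over an in-memory model. This is a minimal illustration and not the code in this PR: `StreamInfo`, `mark_live` and the string IDs are hypothetical stand-ins for whatever the repository actually stores.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory view of one split stream: the objects it refers to
/// directly and the names of the other streams it links to.
#[derive(Default)]
struct StreamInfo {
    objects: Vec<String>,
    stream_refs: Vec<String>,
}

/// Collects every stream and object reachable from `roots`.
/// Returns `(live_streams, live_objects)`.
fn mark_live(
    streams: &HashMap<String, StreamInfo>,
    roots: &[&str],
) -> (HashSet<String>, HashSet<String>) {
    let mut live_streams = HashSet::new();
    let mut live_objects = HashSet::new();
    let mut pending: Vec<String> = roots.iter().map(|r| r.to_string()).collect();

    while let Some(name) = pending.pop() {
        // Visit each stream once, so shared layers and cycles terminate.
        if live_streams.contains(&name) {
            continue;
        }
        // A dangling reference contributes nothing to the live set.
        let Some(info) = streams.get(&name) else { continue };
        live_objects.extend(info.objects.iter().cloned());
        pending.extend(info.stream_refs.iter().cloned());
        live_streams.insert(name);
    }
    (live_streams, live_objects)
}
```

Anything in the object store that is absent from `live_objects` after this pass would then be a deletion candidate, which covers the two-level config-to-layer case as well as deeper chains.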

Currently this is still dry-run only, but I have changed the output format to add a "#" before all non-delete lines, so the output can now simply be piped to a shell to perform the deletion.
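To illustrate why prefixing "#" makes the output shell-safe, here is a small sketch. The PR text does not show the exact shape of a delete line, so the `rm --` command, the `GcLine` type and the quoting helper below are all assumptions.

```rust
/// One line of dry-run output (hypothetical type). `Note` lines are
/// informational; `Delete` lines carry the path of an unreferenced object.
enum GcLine {
    Note(String),
    Delete(String),
}

/// Wraps `s` in single quotes for a POSIX shell, escaping embedded quotes.
fn shell_quote(s: &str) -> String {
    format!("'{}'", s.replace('\'', "'\\''"))
}

/// Renders the lines so that a shell ignores notes and executes deletions.
fn render_script(lines: &[GcLine]) -> String {
    let mut out = String::new();
    for line in lines {
        match line {
            GcLine::Note(text) => {
                // Keep a note on one line so it stays a single shell comment.
                out.push_str("# ");
                out.push_str(&text.replace('\n', " "));
            }
            GcLine::Delete(path) => {
                // Assumed delete command; `--` guards paths starting with '-'.
                out.push_str("rm -- ");
                out.push_str(&shell_quote(path));
            }
        }
        out.push('\n');
    }
    out
}
```

Because every informational line becomes a comment, piping the result to `sh` would execute only the delete lines.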

Note that bootc has its own GC implementations here. But bootc uses bootloader entries as part of the GC root, which I believe should be considered out of composefs scope. Bootc GC also does not prune streams. I think ideally we should have a complete GC implementation in composefs and bootc should just forward the call.

@Johan-Liebert1
Collaborator

I think a lot of this will change once #185 lands. We'd probably want to wait until then, I think

@chaserhkj
Contributor Author

I briefly skimmed the PR. I think as long as the split stream changes expose the same get_object_refs interface with sane semantics, the changes here will be fine. The stream walking logic is also still good as long as that interface returns objects referenced by the new named references as well.

That being said, it's totally fair to delay this PR until a potentially dependent change as large as that one lands. I can work on a rebase once it has landed.

Collaborator

@cgwalters cgwalters left a comment

We're not adding any tests here, but we definitely want them. Basic ones would be ensuring that adding content and then removing all the streams also GCs all objects, and that with two streams sharing objects, removing just one keeps the right objects, etc.
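The two basic cases named above could look roughly like this against a toy model. `ToyRepo` is a hypothetical stand-in, not the composefs repository type, and in a real crate each scenario function would carry `#[test]`.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Toy stand-in for a repository: a flat object set plus named streams.
#[derive(Default)]
struct ToyRepo {
    objects: BTreeSet<&'static str>,
    streams: BTreeMap<&'static str, Vec<&'static str>>,
}

impl ToyRepo {
    fn add_stream(&mut self, name: &'static str, objects: &[&'static str]) {
        self.objects.extend(objects.iter().copied());
        self.streams.insert(name, objects.to_vec());
    }

    fn remove_stream(&mut self, name: &str) {
        self.streams.remove(name);
    }

    /// Deletes every object no remaining stream refers to; returns the count.
    fn gc(&mut self) -> usize {
        let live: BTreeSet<&str> = self.streams.values().flatten().copied().collect();
        let before = self.objects.len();
        self.objects.retain(|o| live.contains(o));
        before - self.objects.len()
    }
}

/// Scenario 1: removing every stream lets GC reclaim every object.
fn removing_all_streams_frees_all_objects() {
    let mut repo = ToyRepo::default();
    repo.add_stream("a", &["o1", "o2"]);
    repo.add_stream("b", &["o3"]);
    repo.remove_stream("a");
    repo.remove_stream("b");
    assert_eq!(repo.gc(), 3);
    assert!(repo.objects.is_empty());
}

/// Scenario 2: an object shared by two streams survives removing one of them.
fn shared_objects_survive_partial_removal() {
    let mut repo = ToyRepo::default();
    repo.add_stream("a", &["only-a", "shared"]);
    repo.add_stream("b", &["only-b", "shared"]);
    repo.remove_stream("a");
    assert_eq!(repo.gc(), 1);
    assert!(repo.objects.contains("shared"));
    assert!(repo.objects.contains("only-b"));
    assert!(!repo.objects.contains("only-a"));
}
```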

@cgwalters
Collaborator

Also sorry to be clear the above review was using the newly landed https://github.com/bootc-dev/agent-skills/blob/main/perform-forge-review/SKILL.md flow...and it somehow seems to have lost my header comment? I'll look at that.

@chaserhkj
Contributor Author

I rebased the branch onto main after the splitstream changes. It seems that in the new format, manifests refer to other streams through a separate table, stream_refs. This slightly changes how things should be handled in my GC implementation. I'll work those changes in first, then address all the review comments and add unit tests. I also have a few more GC improvements on another branch that I feel fall within the same scope, so I might pull these in as well.

@chaserhkj chaserhkj force-pushed the gc-fix branch 9 times, most recently from 87a7a1a to 5d61fcf Compare January 13, 2026 18:51
@chaserhkj
Contributor Author

Done. I have included some improvements to the GC code as well. The GC process can now take caller-specified GC roots and remove broken links in the repo. cfsctl interfaces are added, and --force now actually performs removal. A full test suite is also included that tests behavior against streams, images, and streams using named references. I feel the code at this stage should be robust enough.

PTAL @cgwalters

@chaserhkj chaserhkj requested a review from cgwalters January 13, 2026 18:57
@chaserhkj
Contributor Author

chaserhkj commented Jan 13, 2026

Hold on, I found another bug while testing against an actual bootc repository.
I assumed the file names in streams/ would just be the same as the name field in the named reference table, but apparently they are not. So the stream name map might still be needed.

@chaserhkj
Contributor Author

Should be good now. I also included a regression test for the bug mentioned above.
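A stream name map of the kind mentioned in this thread could be sketched as follows. The thread does not say what the map is keyed on, so this assumes each named reference records the ID of the object backing the referenced stream and that each file under streams/ resolves to such an object; `NamedRef`, `build_stream_name_map` and `repo_name_for` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical entry from a split stream's named reference table.
struct NamedRef<'a> {
    /// Name recorded in the table; may differ from the file name in `streams/`.
    table_name: &'a str,
    /// ID of the object backing the referenced stream (assumed to be present).
    object_id: &'a str,
}

/// Builds a map from backing object ID to the file name used under `streams/`.
/// `links` holds `(file name, target object ID)` pairs (assumed layout).
fn build_stream_name_map<'a>(links: &[(&'a str, &'a str)]) -> HashMap<&'a str, &'a str> {
    links
        .iter()
        .map(|&(file_name, object_id)| (object_id, file_name))
        .collect()
}

/// Resolves a table entry to its repository file name through the object ID,
/// deliberately ignoring `table_name`.
fn repo_name_for<'a>(map: &HashMap<&'a str, &'a str>, r: &NamedRef<'_>) -> Option<&'a str> {
    map.get(r.object_id).copied()
}
```

A regression test for the bug would then use a table name that differs from the file name and check that the referenced stream is still found.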

Collaborator

@cgwalters cgwalters left a comment

Thanks for this!! Sorry about the delay. I only have a relatively superficial review so far

GC {
    // digest of root images for gc operations
    #[clap(long, short = 'i')]
    root_images: Vec<String>,
Collaborator

Wait why would these be provided externally? Are they additional to the images present?

Contributor Author

Yes, these are added to the list of GC roots used for GC analysis. The existing internal references (in streams/refs/ and images/refs/) are always considered and added to the GC root list.

The main reason to allow additional external roots is that we don't have a concrete standard for partitioning the */refs directories into namespaces for different users of composefs. Even bootc does not call composefs in a way that would create these references. So it is better to assume the */refs references are not enough and leave it to the user to supply all sensible roots. Such a standard/scheme becomes particularly relevant when pursuing the unified containers storage approach, and we will probably need to settle on one later.

Besides, it's really handy to be able to specify additional roots when doing tests, management and diagnosis.

}

- fn walk_symlinkdir(fd: OwnedFd, objects: &mut HashSet<ObjectID>) -> Result<()> {
+ fn walk_symlinkdir(fd: OwnedFd, entry_digests: &mut HashSet<CString>) -> Result<()> {
Collaborator

I think we should prefer Rust-native representations in memory.

Contributor Author

Now uses OsString

Contributor Author

I am not sure whether we want String or OsString here. Since the hash set stores digest strings, they should always be valid UTF-8 and never produce UTF-8 parsing errors. But the UTF-8 conversion feels redundant anyway. Let me know your preference.

Collaborator

On Unix platforms there's not a "conversion" to UTF-8, only a validation. (Yes, Windows is messier w/UTF-16 but we are not targeting that)

Now when we start to get into representation precision: any reason not to require a digest type here?
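One way to read the suggestion above is a small newtype that validates a directory entry name once and then carries the guarantee in the type. This is only a sketch: `HexDigest` and `DigestError` are hypothetical, and the 64-character length assumes a hex-encoded SHA-256 digest, so other digest sizes would need a different check.

```rust
use std::ffi::OsStr;

/// Hypothetical newtype for a hex-encoded digest (assumed to be 64 lowercase
/// hex characters, i.e. SHA-256).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct HexDigest(String);

#[derive(Debug, PartialEq, Eq)]
enum DigestError {
    NotUtf8,
    BadLength(usize),
    BadChar(char),
}

impl HexDigest {
    /// Parses a directory entry name into a digest.
    fn from_os_str(name: &OsStr) -> Result<Self, DigestError> {
        // `to_str` only checks that the name is valid UTF-8 and borrows it.
        let s = name.to_str().ok_or(DigestError::NotUtf8)?;
        if s.len() != 64 {
            return Err(DigestError::BadLength(s.len()));
        }
        if let Some(c) = s.chars().find(|c| !matches!(c, '0'..='9' | 'a'..='f')) {
            return Err(DigestError::BadChar(c));
        }
        Ok(Self(s.to_owned()))
    }

    fn as_str(&self) -> &str {
        &self.0
    }
}
```

A `HashSet<HexDigest>` would then reject malformed entries at the point where the directory is read, rather than wherever the string is used later.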

@chaserhkj
Contributor Author

Addressed most of the comments, PTAL.

@chaserhkj chaserhkj requested a review from cgwalters January 24, 2026 01:54
@cgwalters
Collaborator

cgwalters commented Jan 27, 2026

OK I was looking at this more, and the thing that strongly relates to this (that to be clear is a pre-existing problem) is how we've ended up having this project trying to be too OCI independent in my opinion.

This relates to the topic of garbage collection in that it's really important to have a strong data model - what references what and how.

So on this topic I put up #216 as draft which has a high level goal of just starting by ensuring we store the manifest too (and have a schema for tags that name OCI images specifically including transports).

But the big next step we could take is that when this project is compiled with oci support (and why wouldn't one do that?) the core GC model just directly parses manifest JSON and not splitstream - we would in fact cease to store splitstream for manifests and configs, only for tar layers.

Anyways...just something I'm thinking about while looking at this, they are still distinct threads.

chaserhkj and others added 2 commits January 27, 2026 14:28
(Commit message from Colin Walters, Assisted-by: Opus 4.5)

The previous GC implementation was completely broken - it never added
any object IDs to the live set and would mark every object for deletion.
The code assumed an old composefs structure where non-first-level entries
could directly link to object stores.

This commit provides a working GC implementation that:

- Properly walks named references in `*/refs/` directories to find GC roots
- Recursively traverses splitstream named references (stream_refs) to find
  all transitively reachable objects, handling the two-layer OCI structure
  where config splitstreams reference layer splitstreams
- Supports caller-specified additional GC roots via `--root-images` and
  `--root-streams` flags, useful for external integrations like bootc
- Actually performs deletions (previously was dry-run only)
- Cleans up broken symlinks in images/ and streams/ after GC
- Uses `--dry-run` flag (conventional) instead of `--force` (inverted)

Includes comprehensive test coverage for:
- Stream and image GC with various root configurations
- Shared objects between multiple streams/images
- Named reference traversal with different table vs repo names

Signed-off-by: Chaser Huang <huangkangjing@gmail.com>
Signed-off-by: Colin Walters <walters@verbum.org>
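The broken-symlink cleanup mentioned in the commit message can be sketched with the standard library alone. This is not the merged code: `prune_broken_symlinks` is a hypothetical helper that handles a single directory level, whereas the real layout also has nested `refs/` directories.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Removes symlinks directly inside `dir` whose targets no longer exist and
/// returns how many were removed.
fn prune_broken_symlinks(dir: &Path) -> io::Result<usize> {
    let mut pruned = 0;
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        // `symlink_metadata` inspects the link itself, not its target.
        if !fs::symlink_metadata(&path)?.file_type().is_symlink() {
            continue;
        }
        // `metadata` follows the link, so NotFound means the target is gone.
        match fs::metadata(&path) {
            Ok(_) => {}
            Err(e) if e.kind() == io::ErrorKind::NotFound => {
                fs::remove_file(&path)?;
                pruned += 1;
            }
            Err(e) => return Err(e),
        }
    }
    Ok(pruned)
}
```

Running this after the object sweep means links whose objects were just deleted are the ones that show up as broken.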

Refactor the GC API:

- gc(additional_roots) performs GC, returns GcResult with statistics
- gc_dry_run(additional_roots) previews without deleting

Key difference: gc_dry_run() only acquires a shared lock since it
doesn't modify anything, allowing concurrent reads during preview.

Additional roots are looked up in both images and streams by name,
useful for external integrations (like bootc) that track roots
outside the repository's refs.

GcResult includes:
- objects_removed: count of unreferenced objects deleted
- objects_bytes: total bytes freed
- images_pruned/streams_pruned: broken symlinks cleaned up

Tests verify GcResult values match expected counts.

Assisted-by: Claude (Opus)
Signed-off-by: Colin Walters <walters@verbum.org>
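The split between `gc` and `gc_dry_run` can be illustrated with a toy store. The `GcResult` field names follow the commit message, but the field types, `ToyStore`, and the use of an in-process `RwLock` in place of the repository lock are assumptions. The live set is also passed in directly here to keep the sketch short, whereas the real API takes additional roots.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::sync::RwLock;

/// Statistics returned by a GC run (field types are assumed).
#[derive(Debug, Default, PartialEq, Eq)]
struct GcResult {
    objects_removed: u64,
    objects_bytes: u64,
    images_pruned: u64,
    streams_pruned: u64,
}

/// Toy object store mapping object ID to size in bytes.
struct ToyStore {
    objects: RwLock<BTreeMap<String, u64>>,
}

impl ToyStore {
    /// Works out which objects are dead and what removing them would free.
    fn plan(objects: &BTreeMap<String, u64>, live: &BTreeSet<String>) -> (Vec<String>, GcResult) {
        let dead: Vec<String> = objects
            .keys()
            .filter(|id| !live.contains(*id))
            .cloned()
            .collect();
        let bytes: u64 = dead.iter().map(|id| objects[id]).sum();
        let result = GcResult {
            objects_removed: dead.len() as u64,
            objects_bytes: bytes,
            ..Default::default()
        };
        (dead, result)
    }

    /// Preview only: a shared (read) lock suffices because nothing changes.
    fn gc_dry_run(&self, live: &BTreeSet<String>) -> GcResult {
        let objects = self.objects.read().unwrap();
        Self::plan(&objects, live).1
    }

    /// Real run: takes the exclusive (write) lock, then deletes dead objects.
    fn gc(&self, live: &BTreeSet<String>) -> GcResult {
        let mut objects = self.objects.write().unwrap();
        let (dead, result) = Self::plan(&objects, live);
        for id in &dead {
            objects.remove(id);
        }
        result
    }
}
```

Sharing `plan` between the two entry points keeps the preview and the real run from drifting apart, which is also what makes it possible to compare their results in tests.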
@cgwalters
Collaborator

Hi @chaserhkj I took the liberty of making the "force => dry-run" change per above, and also:

  • squashed the commits
  • rebased on git main
  • Fixed a test failure that fell out of that rebase
  • Fleshed out the commit message

Then I added a commit on top which further tweaks the API. WDYT?

Collaborator

@Johan-Liebert1 Johan-Liebert1 left a comment

Tested some basic cases locally. LGTM

@cgwalters cgwalters merged commit 17289c6 into containers:main Jan 28, 2026
14 of 15 checks passed
@chaserhkj
Contributor Author

Sorry was working on another project in the last few days. Thanks for the changes and merging!
