Add per-branch cuckoo filters for cross-branch-safe index GC #1102
Draft
bplatz wants to merge 15 commits into feature/rebase from feature/cuckoo-index-check
495b4b5 to e4d3906
…set functionality
- add cuckoo filter implementation with chain support for 100K+ segments
- integrate filters into index refresh and garbage collection processes
- use FNV-1a 32-bit hash for cross-platform determinism (CLJ/CLJS)
- implement proactive filter growth at 90% capacity threshold
- cache other-branch filters during GC to reduce I/O operations
- exclude garbage files from filter, only track actual index segments
- add comprehensive test suite for filter and chain operations
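For readers unfamiliar with the hash named in this commit, here is a minimal Python sketch of standard 32-bit FNV-1a. It is illustrative only and is not the PR's Clojure code; the function name `fnv1a_32` is made up for this example. Every intermediate value is masked to 32 bits, which is the property that lets platforms with different integer types agree on the result.

```python
# Standard 32-bit FNV-1a constants.
FNV_OFFSET_BASIS_32 = 0x811C9DC5
FNV_PRIME_32 = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """Return the 32-bit FNV-1a hash of `data` (hypothetical helper name)."""
    h = FNV_OFFSET_BASIS_32
    for byte in data:
        h ^= byte                            # FNV-1a: XOR the byte first...
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF  # ...then multiply, kept to 32 bits
    return h


if __name__ == "__main__":
    # The same bytes always hash to the same value, on any platform.
    print(hex(fnv1a_32(b"foobar")))
```

The explicit `& 0xFFFFFFFF` matters because an unmasked multiply would grow without bound in Python and lose precision in JavaScript; masking after each step is one way to keep the arithmetic identical across runtimes.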
…e S3 connection options
7b8734d to f8b0d71
- Integrated cuckoo filter operations with updated branch metadata structure
- Kept cuckoo filter copying on branch creation and deletion on branch delete
- Fixed typo: :db/unkown-ledger -> :db/unknown-ledger
- Added cuckoo and psot to index file lists in tests
- Preserved branch metadata flattening for index optimization
… deletion handling
- Removed outdated cuckoo chain test suite and replaced it with integration tests for garbage collection and round-trip serialization.
- Added new tests for CBOR encoding/decoding to ensure data integrity during storage operations.
- Updated existing tests to utilize the new filter chain structure, ensuring compatibility with recent changes in the Cuckoo filter implementation.
- Enhanced edge case handling and collision detection tests to improve robustness.
- Adjusted assertions in the main test suite to reflect changes in filter structure and statistics.
Contributor
This link is broken.
Contributor
Author
Not really, it will work once the PR is merged and it is on the main branch... but the doc is part of the PR.
This sits on top of this PR: #1095
Which sits on top of this PR: #1096
All three PRs should be thought of as one package, as without this PR we might garbage collect index segments still relied on by other branches.
Summary
Introduces per-branch cuckoo filters to prevent deletion of index nodes still referenced by other branches during garbage collection. Uses 16-bit fingerprints from SHA-256 hashes with a chain design for dynamic growth. Integrates seamlessly with indexing and GC workflows.
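The approach in this summary can be sketched roughly as follows. This is a minimal Python illustration, not the code in `fluree.db.indexer.cuckoo`. The bucket size, kick limit, fixed per-filter size, and the names `CuckooFilter`, `FilterChain`, and `safe_to_delete` are all assumptions made for the example. Only the 16-bit SHA-256 fingerprint, the chain design, and the 90% growth threshold come from the PR text.

```python
import hashlib
import random

BUCKET_SIZE = 4   # assumed slots per bucket
MAX_KICKS = 500   # assumed relocation limit per insert
GROW_AT = 0.9     # load factor at which the chain adds a filter


def _h32(data: bytes) -> int:
    """First 32 bits of SHA-256, as an integer."""
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


class CuckooFilter:
    def __init__(self, num_buckets: int = 1024):
        if num_buckets <= 0 or num_buckets & (num_buckets - 1):
            raise ValueError("num_buckets must be a power of two")
        self.mask = num_buckets - 1
        self.buckets = [[] for _ in range(num_buckets)]
        self.count = 0

    def _alt(self, i: int, fp: int) -> int:
        # Partial-key cuckoo hashing: XOR makes this its own inverse.
        return i ^ (_h32(fp.to_bytes(2, "big")) & self.mask)

    def _locate(self, item: str):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        fp = int.from_bytes(digest[:2], "big")  # 16-bit fingerprint
        i1 = int.from_bytes(digest[2:6], "big") & self.mask
        return fp, i1, self._alt(i1, fp)

    def load_factor(self) -> float:
        return self.count / (len(self.buckets) * BUCKET_SIZE)

    def place(self, fp: int, i: int):
        """Store fp in bucket i or its alternate, evicting if needed.
        Returns None on success, or the homeless (fp, bucket) pair."""
        for _ in range(MAX_KICKS):
            for j in (i, self._alt(i, fp)):
                if len(self.buckets[j]) < BUCKET_SIZE:
                    self.buckets[j].append(fp)
                    self.count += 1
                    return None
            # Both buckets are full: swap fp with a random resident.
            i = random.choice((i, self._alt(i, fp)))
            slot = random.randrange(BUCKET_SIZE)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
        return fp, i

    def add(self, item: str):
        fp, i1, _ = self._locate(item)
        return self.place(fp, i1)

    def contains(self, item: str) -> bool:
        fp, i1, i2 = self._locate(item)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def remove(self, item: str) -> bool:
        fp, i1, i2 = self._locate(item)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                self.count -= 1
                return True
        return False


class FilterChain:
    """Grows by appending equally sized filters instead of resizing."""

    def __init__(self, num_buckets: int = 1024):
        self.num_buckets = num_buckets
        self.filters = [CuckooFilter(num_buckets)]

    def _grow(self) -> CuckooFilter:
        self.filters.append(CuckooFilter(self.num_buckets))
        return self.filters[-1]

    def add(self, item: str) -> None:
        tail = self.filters[-1]
        if tail.load_factor() >= GROW_AT:  # proactive growth
            tail = self._grow()
        victim = tail.add(item)
        if victim is not None:             # rare: rehome the evicted entry
            self._grow().place(*victim)

    def contains(self, item: str) -> bool:
        return any(f.contains(item) for f in self.filters)

    def remove(self, item: str) -> bool:
        return any(f.remove(item) for f in self.filters)


def safe_to_delete(segment: str, other_branch_chains) -> bool:
    """GC check: delete only if no other branch's filter may hold it."""
    return not any(c.contains(segment) for c in other_branch_chains)
```

A false positive here only causes a segment to be retained longer than necessary. Because a cuckoo filter has no false negatives for items that were added and not removed, a segment still tracked by another branch's filter is never reported as safe to delete.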
Problem
Solution
Cuckoo filter implementation (fluree.db.indexer.cuckoo):
ledger/index/cuckoo/<branch>.cbor
Integration:
Performance
Documentation
See docs/cuckoo-filter-gc-strategy.md for detailed implementation notes.