Balanced index splits (byte-based multi-split), forward leaf link #1140

bplatz · 2025-10-15T15:10:18Z

With the way we previously indexed, the persistent index tree could become unbalanced (leaves at varying heights). In larger ledgers, certain insert patterns could degrade performance as a result.

This fix keeps the index tree height-balanced: every leaf sits at the same depth. I also added a :next-id to leaves. Once range scans support it, they will be able to skip directly to the next leaf, which should also improve performance; how much is TBD.

Summary

- Leaf rebalancing now performs byte-based multi-splits: a single oversized leaf can split into N leaves, targeting ≈ half the overflow threshold per leaf.
- Branch rebalancing uses median-by-count splits to maintain even fanout.
- Forward leaf link :next-id added (optional) and resolved at write-time in a single pass.
- Traversal remains index-only; novelty overlay stays post-traversal and authorized.
- Tests added for split invariants, cascading behavior, and backward compatibility.

Motivation

Heavy inserts localized to one key range could repeatedly split the same path, increasing depth on the “hot” side. Prior leaf splitting didn’t always produce balanced results when a single leaf was significantly over target. We also lacked a forward link to optimize sequential scans across leaves. This change ensures:

- Oversized leaves split into evenly sized siblings by byte size.
- Branches remain balanced with median child splits.
- Range scans can, once supported, hop between leaves via :next-id.

Key Changes

Leaf splits (byte-based, N-way)
- New split planner computes target number of leaves from total bytes and a per-leaf byte budget (≈ overflow/2).
- Single pass partitions flakes into multiple leaves; sets :rhs on each left sibling to the next sibling’s first flake; final sibling inherits original :rhs.
- Emits leaves in reverse order (right-to-left) so right siblings are written first; this enables resolving each left sibling’s :next-id to its right sibling’s final id without extra writes.
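A minimal sketch of the byte-based N-way leaf split, in Clojure to match the codebase. It is not the PR's code: flakes are modeled as hypothetical maps `{:k sort-key :size bytes}`, leaves as plain maps, and `plan-split`, `partition-by-bytes`, and `split-leaf` are illustrative names. Right-to-left emission and `:next-id` resolution are left out.

```clojure
;; Hypothetical model, not the PR's code: a flake is {:k sort-key :size bytes}
;; and a leaf is {:flakes [...] :rhs key :leftmost? bool}.

(defn plan-split
  "Targets roughly overflow/2 bytes per leaf; returns how many leaves to
  create and the byte budget each should reach."
  [total-bytes overflow-bytes]
  (let [budget        (quot overflow-bytes 2)
        target-leaves (max 1 (long (Math/ceil (/ total-bytes (double budget)))))]
    {:target-leaves  target-leaves
     :bytes-per-leaf (long (Math/ceil (/ total-bytes (double target-leaves))))}))

(defn partition-by-bytes
  "Single pass: closes a chunk once it reaches bytes-per-leaf, but never
  closes early on the last flake, so every chunk holds at least one flake."
  [flakes bytes-per-leaf]
  (loop [remaining (seq flakes), chunk [], chunk-bytes 0, chunks []]
    (if remaining
      (let [f      (first remaining)
            more   (next remaining)
            chunk' (conj chunk f)
            bytes' (+ chunk-bytes (:size f))]
        (if (and more (>= bytes' bytes-per-leaf))
          (recur more [] 0 (conj chunks chunk'))
          (recur more chunk' bytes' chunks)))
      (cond-> chunks (seq chunk) (conj chunk)))))

(defn split-leaf
  "Splits a leaf into N siblings: each left sibling's :rhs is its right
  neighbour's first key, the last inherits the original :rhs, and only the
  first keeps :leftmost?."
  [{:keys [flakes rhs leftmost?]} overflow-bytes]
  (let [total  (reduce + (map :size flakes))
        budget (:bytes-per-leaf (plan-split total overflow-bytes))
        chunks (partition-by-bytes flakes budget)
        firsts (mapv (comp :k first) chunks)
        n      (count chunks)]
    (mapv (fn [i chunk]
            {:flakes    chunk
             :first     (nth firsts i)
             :rhs       (if (< (inc i) n) (nth firsts (inc i)) rhs)
             :leftmost? (boolean (and leftmost? (zero? i)))})
          (range n)
          chunks)))
```

With positive flake sizes, a chunk is only closed after reaching the budget and the final chunk is never closed early, so the greedy pass yields at most `:target-leaves` non-empty leaves.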
Branch splits (median-by-count)
- When a branch overflows, its children are split at the median child, fence keys (:first, :rhs) are reconstructed, and only the first sibling keeps :leftmost?.
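The median-by-count branch split can be sketched the same way. This is a hypothetical illustration: branches are maps with `:children`, `:first`, `:rhs`, and `:leftmost?`, and `make-branch` stands in for the PR's `reconstruct-branch`, whose real signature differs.

```clojure
;; Hypothetical model: a child is any map with a :first key; a branch is
;; {:children [...] :first k :rhs k :leftmost? bool}.
(defn make-branch
  [children rhs leftmost?]
  {:children  children
   :first     (:first (first children))
   :rhs       rhs
   :leftmost? leftmost?})

(defn split-branch-median
  "Splits an overflowing branch at its median child and rebuilds fence keys:
  the left half ends where the right half begins, the right half inherits
  the original :rhs, and only the left half may remain :leftmost?."
  [{:keys [children rhs leftmost?]}]
  (let [children (vec children)
        median-i (quot (count children) 2)
        left     (subvec children 0 median-i)
        right    (subvec children median-i)]
    [(make-branch left (:first (first right)) (boolean leftmost?))
     (make-branch right rhs false)]))
```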
Cascading and global height
- Parent branches are updated on child splits; if parent overflows, split and promote; root overflow creates a new root. Height increases only at the root, keeping leaf depth uniform globally.
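A sketch of the upward cascade, assuming a branch is `{:children [...] :height h}` and `max-children` is the overflow limit (both hypothetical). Fence-key maintenance is omitted for brevity; the point is that a new level only ever appears above the old root.

```clojure
;; Hypothetical model: branches are {:children [...] :height h}.
(defn- split-median [children]
  (let [children (vec children)
        i        (quot (count children) 2)]
    [(subvec children 0 i) (subvec children i)]))

(defn replace-child
  "Replaces child i of `branch` with `new-children` (the child itself, or its
  split siblings). Returns [branch], or two branches if it overflowed."
  [branch i new-children max-children]
  (let [cs       (vec (:children branch))
        children (vec (concat (subvec cs 0 i) new-children (subvec cs (inc i))))]
    (if (> (count children) max-children)
      (mapv #(assoc branch :children %) (split-median children))
      [(assoc branch :children children)])))

(defn promote-root
  "If the root split, wrap the pieces in a new root one level higher. This is
  the only place height grows, so every leaf stays at the same depth."
  [root-nodes]
  (if (= 1 (count root-nodes))
    (first root-nodes)
    {:children root-nodes
     :height   (inc (:height (first root-nodes)))}))
```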
Forward leaf link
- Optional :next-id added to leaf node schema (backward compatible).
- On split, left sibling sets :next-tempid to the right sibling’s temp id. Before persistence, :next-tempid is resolved to the right sibling’s final id using the existing updated-ids mapping. No second write required.
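The write-time resolution of `:next-tempid` can be sketched as one right-to-left pass. Assumptions: a node's final id is only known once it is written (simulated here with `clojure.core/hash`), and `write-node!` with an atom-backed store is a hypothetical stand-in for the real persistence layer.

```clojure
;; Hypothetical persistence stand-in: the final id depends on node contents.
(defn write-node! [store node]
  (let [id (str "node-" (hash node))]
    (swap! store assoc id node)
    id))

(defn write-siblings!
  "Writes split siblings right-to-left, resolving each :next-tempid through
  the accumulated updated-ids map (tempid -> final id) before writing, so
  :next-id is set without a second write. Returns updated-ids."
  [store siblings]
  (reduce (fn [updated-ids {:keys [tempid next-tempid] :as node}]
            ;; right-to-left order means the referenced sibling already has a final id
            (let [node' (cond-> (dissoc node :tempid :next-tempid)
                          next-tempid (assoc :next-id (get updated-ids next-tempid)))]
              (assoc updated-ids tempid (write-node! store node'))))
          {}
          (rseq (vec siblings))))
```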

Behavior and Invariants

- Leaf multi-split maintains order, preserves all flakes, and sets correct fence keys: for each adjacent pair, left.rhs = right.first.
- Only the first sibling retains :leftmost?; others are false.
- :next-id is forward-only and optional; readers fall back to traversal if absent.
- Traversal/auth unchanged: novelty overlay is post-traversal and runs through authorization and paging (as previously fixed).
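The split invariants translate directly into a checker, roughly the shape a split test could take. The leaf shape and `valid-split?` are hypothetical and not taken from the PR's tests; the non-empty-leaf check mirrors the "at least one flake per leaf" edge case mentioned in the merge commit.

```clojure
;; Hypothetical leaf shape: {:flakes [{:k ...} ...] :first k :rhs k :leftmost? bool}.
(defn valid-split?
  "True when `leaves` is a correct split of `original`."
  [original leaves]
  (let [leaves     (vec leaves)
        all-flakes (vec (mapcat :flakes leaves))
        ks         (mapv :k all-flakes)]
    (and (every? (comp seq :flakes) leaves)            ; every leaf holds a flake
         (= (vec (:flakes original)) all-flakes)        ; nothing lost or reordered
         (= ks (sort ks))                               ; sort order maintained
         (every? (fn [[l r]] (= (:rhs l) (:first r)))   ; left.rhs = right.first
                 (partition 2 1 leaves))
         (= (:rhs original) (:rhs (peek leaves)))       ; last inherits :rhs
         (not-any? :leftmost? (rest leaves)))))         ; only the first may be leftmost
```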

Backward Compatibility

- On-disk format is unchanged except for an optional :next-id on leaves. Old nodes remain valid; readers ignore missing :next-id.
- No full reindex required; subtrees converge to a balanced shape as they are updated.
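A reader that tolerates both old and new leaves only needs to branch on the presence of `:next-id`. In this sketch `resolve-node` and `next-leaf-by-traversal` are hypothetical functions passed in by the caller.

```clojure
;; Hypothetical: resolve-node looks a node up by id; next-leaf-by-traversal
;; finds the following leaf the old way, by walking the tree.
(defn next-leaf
  [leaf resolve-node next-leaf-by-traversal]
  (if-let [next-id (:next-id leaf)]
    (resolve-node next-id)             ; new leaves: hop directly
    (next-leaf-by-traversal leaf)))    ; old leaves lack :next-id: traverse
```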

zonotope · 2025-10-16T03:34:10Z

src/fluree/db/flake/index/novelty.cljc

+        right-children (subvec (vec child-nodes) median-i)
+        left-branch    (reconstruct-branch branch t left-children)
+        right-branch   (reconstruct-branch branch t right-children)]
+    (update-sibling-leftmost [left-branch right-branch])))


This change would only cut a branch in half. A reindexing that incorporates a large amount of novelty could theoretically cause a branch to have more than double the target count of children, which would mean that the new branches resulting from this process would still be too big.

bplatz · Oct 16, 2025

This was intentional: the scenario that would trigger it requires close to 100MB of novelty, likely beyond a reasonable amount. If it did happen, the remaining split would simply occur at the next index, and a temporarily larger branch is fairly benign.
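To make the concern concrete: with a hypothetical fanout limit of 100 and 250 children after a large reindex, a median split leaves two branches of 125, while an N-way split by count stays within the limit. `split-n-way` only illustrates the alternative; the previous implementation is not shown in this thread.

```clojure
;; Hypothetical helpers comparing the two strategies on a vector of children.
(defn split-in-half
  "Median split: always exactly two branches."
  [children]
  (let [i (quot (count children) 2)]
    [(subvec children 0 i) (subvec children i)]))

(defn split-n-way
  "Splits into as many near-equal branches as needed to respect max-children."
  [children max-children]
  (let [n    (long (Math/ceil (/ (count children) (double max-children))))
        size (long (Math/ceil (/ (count children) (double n))))]
    (mapv vec (partition-all size children))))

(mapv count (split-in-half (vec (range 250))))   ;; => [125 125]
(mapv count (split-n-way (vec (range 250)) 100)) ;; => [84 84 82]
```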

zonotope · 2025-10-21T06:55:43Z

src/fluree/db/flake/index/novelty.cljc

What's the advantage of only splitting by two instead of the way it was before?

zonotope · 2025-10-21T07:00:38Z

src/fluree/db/flake/index/novelty.cljc

+                  (let [size (flake/size-flake f)]
+                    [(conj tuples [f size])
+                     (+ sum size)]))
+                [[] 0]


I tend to prefer reduce over loop myself, but I think loop will be faster in this particular case because of all the extra vector allocations for the intermediate steps the reduce version has to do.
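The trade-off can be seen by putting the two accumulation styles side by side. `size-flake` here is a hypothetical stand-in for `flake/size-flake` that just counts the characters of a string.

```clojure
;; Hypothetical stand-in for flake/size-flake: flakes are strings here.
(defn size-flake [f] (count f))

(defn sizes-reduce
  "reduce version: every step destructures and rebuilds a [tuples sum] vector."
  [flakes]
  (reduce (fn [[tuples sum] f]
            (let [size (size-flake f)]
              [(conj tuples [f size]) (+ sum size)]))
          [[] 0]
          flakes))

(defn sizes-loop
  "loop version: tuples and sum are separate loop locals, so no per-step
  accumulator vector is allocated."
  [flakes]
  (loop [fs (seq flakes), tuples [], sum 0]
    (if fs
      (let [f    (first fs)
            size (size-flake f)]
        (recur (next fs) (conj tuples [f size]) (+ sum size)))
      [tuples sum])))
```

Both return the same `[tuples total-bytes]` pair; only the `reduce` version allocates a fresh two-element accumulator on each step.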

zonotope · 2025-10-21T07:18:02Z

src/fluree/db/flake/index/novelty.cljc

+  - :target-leaves - number of leaves to create
+  - :bytes-per-leaf - target bytes per leaf"
+  [flakes]
+  (let [flake-vec             (vec flakes)


Why is this necessary? It looks like flake-vec is only used in the reduce call below. I think flakes is a clojure.data.avl sorted set, which also implements IReduce.
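Reducing over a sorted set directly works without copying it into a vector first. This uses `clojure.core/sorted-set` as a stand-in; whether `clojure.data.avl` sets take a dedicated `IReduce` fast path is the reviewer's recollection, not something shown here.

```clojure
;; clojure.core/sorted-set stands in for the clojure.data.avl set.
(def flakes (sorted-set 3 1 2))

;; reduce accepts the set directly and walks it in sort order,
;; so copying it into a vector first only adds an extra allocation.
(reduce conj [] flakes) ;; => [1 2 3]
(reduce + 0 flakes)     ;; => 6
```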

zonotope · 2025-10-21T20:50:52Z

* Cascading and global height
  * Parent branches are updated on child splits; if parent overflows, split and promote; root overflow creates a new root. Height increases only at the root, keeping leaf depth uniform globally.

What part of the diff from this pull request changes the previous behavior related to these bullet points?

bplatz added 2 commits October 15, 2025 07:34

base balanced split

ad1aeea

byte-based split, multiple splits

eb6d79f

bplatz requested a review from a team October 15, 2025 15:10

zonotope reviewed Oct 16, 2025

zonotope reviewed Oct 21, 2025

Merge main into fix/balanced-persist-idx

280d922

Resolved conflict in novelty.cljc by keeping the balanced splitting algorithm which already handles the edge case fixed in main (ensuring each leaf has at least one flake via seq check before splitting).

bplatz marked this pull request as draft October 24, 2025 15:54
