@bplatz commented Oct 15, 2025

With the previous indexing approach, the persistent index tree could become unbalanced, with leaves at varying depths. In larger ledgers with inserts concentrated in one key range, this can hurt performance.

This fix keeps all leaves at the same depth, so the index tree has uniform height. I also added an optional :next-id to each leaf. Once range scans support it (not yet implemented), they will be able to skip directly to the next leaf instead of traversing the tree again; how much this improves performance is still to be determined.

Summary

  • Leaf rebalancing now performs byte-based multi-splits: a single oversized leaf can split into N leaves, targeting ≈ half the overflow threshold per leaf.
  • Branch rebalancing uses median-by-count splits to maintain even fanout.
  • Forward leaf link :next-id added (optional) and resolved at write-time in a single pass.
  • Traversal remains index-only; novelty overlay stays post-traversal and authorized.
  • Tests added for split invariants, cascading behavior, and backward compatibility.
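
As a rough illustration of the byte-based multi-split described above, here is a minimal sketch. The names `plan-split` and `partition-by-bytes` and the `[item size]` input shape are assumptions made for this example, not the PR's actual functions; the only ideas taken from the description are the per-leaf budget of about half the overflow threshold and the single partitioning pass.

```clojure
;; Hypothetical sketch; function names and input shape are illustrative only.
(defn plan-split
  "Given a leaf's total byte size and the overflow threshold, returns the
  target leaf count and the per-leaf byte budget (about half the threshold)."
  [total-bytes overflow-bytes]
  (let [budget (max 1 (quot overflow-bytes 2))
        n      (max 1 (long (Math/ceil (/ total-bytes (double budget)))))]
    {:target-leaves  n
     :bytes-per-leaf (long (Math/ceil (/ total-bytes (double n))))}))

(defn partition-by-bytes
  "Single pass over [item size] pairs. Starts a new leaf when adding the next
  item would exceed the budget. Every leaf holds at least one item, so a
  single item larger than the budget still gets a leaf of its own."
  [sized-items bytes-per-leaf]
  (loop [items sized-items, leaf [], leaf-bytes 0, leaves []]
    (if-let [[item size] (first items)]
      (if (and (seq leaf) (> (+ leaf-bytes size) bytes-per-leaf))
        (recur items [] 0 (conj leaves leaf))
        (recur (rest items) (conj leaf item) (+ leaf-bytes size) leaves))
      (cond-> leaves (seq leaf) (conj leaf)))))
```

For example, ten items of 100 bytes each with an overflow threshold of 400 bytes would be planned as five leaves of 200 bytes, and the partition pass would place two items in each leaf.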

Motivation

Heavy inserts localized to one key range could repeatedly split the same path, increasing depth on the “hot” side. Prior leaf splitting didn’t always produce balanced results when a single leaf was significantly over target. We also lacked a forward link to optimize sequential scans across leaves. This change ensures:

  • Branches remain balanced with median child splits.
  • Range scans can optionally hop via :next-id.

Key Changes

  • Leaf splits (byte-based, N-way)
    • New split planner computes target number of leaves from total bytes and a per-leaf byte budget (≈ overflow/2).
    • Single pass partitions flakes into multiple leaves; sets :rhs on each left sibling to the next sibling’s first flake; final sibling inherits original :rhs.
    • Emits leaves in reverse order (right-to-left) so right siblings are written first; this enables resolving each left sibling’s :next-id to its right sibling’s final id without extra writes.
  • Branch splits (median-by-count)
    • When a branch overflows, its children are split at the median child, fence keys (:first, :rhs) are reconstructed, and only the first sibling keeps :leftmost?.
  • Cascading and global height
    • Parent branches are updated on child splits; if parent overflows, split and promote; root overflow creates a new root. Height increases only at the root, keeping leaf depth uniform globally.
  • Forward leaf link
    • Optional :next-id added to leaf node schema (backward compatible).
    • On split, left sibling sets :next-tempid to the right sibling’s temp id. Before persistence, :next-tempid is resolved to the right sibling’s final id using the existing updated-ids mapping. No second write required.
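
The right-to-left write order and the :next-tempid resolution can be sketched as follows. This is a simplified model under stated assumptions: leaves are plain maps, `write!` is a hypothetical function that persists a node and returns its final id, and the `updated-ids` map here is a local stand-in for the existing mapping the description refers to.

```clojure
;; Hypothetical sketch; `write!` and the node shape are assumptions.
(defn link-siblings
  "Sets :rhs on each left sibling to the next sibling's :first, and
  :next-tempid to the next sibling's temp :id. The last sibling inherits
  the original :rhs and gets no forward link."
  [leaves original-rhs]
  (mapv (fn [leaf right]
          (cond-> (assoc leaf :rhs (if right (:first right) original-rhs))
            right (assoc :next-tempid (:id right))))
        leaves
        (concat (rest leaves) [nil])))

(defn write-right-to-left
  "Persists leaves from right to left. Because each right sibling is written
  first, its final id is already in `updated-ids` when the left sibling is
  written, so no node has to be written twice."
  [write! leaves]
  (loop [remaining (reverse leaves), updated-ids {}, written ()]
    (if-let [leaf (first remaining)]
      (let [resolved (if-let [tmp (:next-tempid leaf)]
                       (-> leaf
                           (dissoc :next-tempid)
                           (assoc :next-id (updated-ids tmp)))
                       leaf)
            final-id (write! resolved)]
        (recur (rest remaining)
               (assoc updated-ids (:id leaf) final-id)
               ;; conj on a list prepends, restoring left-to-right order
               (conj written (assoc resolved :id final-id))))
      (vec written))))
```

The design point is that a node's final id is only known after it is written, so a left-to-right order would force a second write of each left sibling once its right neighbor's id became known.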

Behavior and Invariants

  • Leaf multi-split maintains order, preserves all flakes, sets correct fence keys: for each adjacent pair, left.rhs = right.first.
  • Only the first sibling retains :leftmost?; others are false.
  • :next-id is forward-only and optional; readers fall back to traversal if absent.
  • Traversal/auth unchanged: novelty overlay is post-traversal and runs through authorization and paging (as previously fixed).
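
The fence-key and :leftmost? invariants above could be checked with a small predicate like the one below. This is a sketch only: `valid-siblings?` is a hypothetical helper, nodes are modeled as plain maps, and plain `=` stands in for whatever comparison the index actually uses for flakes.

```clojure
;; Hypothetical helper; node shape and equality check are assumptions.
(defn valid-siblings?
  "True when, for each adjacent pair, left :rhs equals right :first, the last
  sibling carries the original :rhs, and no sibling after the first is
  marked :leftmost?."
  [siblings original-rhs]
  (and (every? (fn [[l r]] (= (:rhs l) (:first r)))
               (partition 2 1 siblings))
       (= original-rhs (:rhs (last siblings)))
       (not-any? :leftmost? (rest siblings))))
```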

Backward Compatibility

  • On-disk format unchanged except an optional :next-id on leaves. Old nodes remain valid; readers ignore missing :next-id.
  • No full reindex required; subtrees converge to balanced shape as they are updated.
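
A reader that tolerates old nodes might look roughly like this. Range-scan support for :next-id is described above as future work, so this is purely a sketch: `next-leaf`, `resolve-id`, and `traverse-to` are hypothetical names, and treating a nil :rhs as "no following leaf" is an assumption of the example.

```clojure
;; Hypothetical sketch; none of these names come from the PR.
(defn next-leaf
  "Returns the leaf that follows `leaf`, or nil for the rightmost leaf.
  Uses the forward link when present; otherwise falls back to `traverse-to`,
  which finds the leaf covering a key by descending from the root."
  [resolve-id traverse-to leaf]
  (when-some [rhs (:rhs leaf)]
    (if-let [next-id (:next-id leaf)]
      (resolve-id next-id)
      (traverse-to rhs))))
```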

@bplatz requested a review from a team October 15, 2025 15:10
right-children (subvec (vec child-nodes) median-i)
left-branch (reconstruct-branch branch t left-children)
right-branch (reconstruct-branch branch t right-children)]
(update-sibling-leftmost [left-branch right-branch])))

Contributor

This change would only cut a branch in half. A reindexing that incorporates a large amount of novelty could theoretically cause a branch to have more than double the target count of children, which would mean that the new branches resulting from this process would still be too big.

Contributor Author

This was intentional. The scenario would require close to 100MB of novelty, which is likely beyond any reasonable amount. If it did happen, the second split would simply occur at the next index, and a larger branch in the meantime is fairly benign.
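
To make the trade-off in this thread concrete, here is a minimal sketch of both options. The names and the limit of 100 children are made up for illustration; `split-until-fits` is an alternative shown only for comparison and is not what the PR does. It assumes `max-children` is at least 1.

```clojure
;; Hypothetical sketch; the child limit and function names are illustrative.
(defn split-at-median
  "Splits a vector of children into two halves at the median index."
  [children]
  (let [median-i (quot (count children) 2)]
    [(subvec children 0 median-i) (subvec children median-i)]))

(defn split-until-fits
  "Recursively halves until every group has at most `max-children` children."
  [children max-children]
  (if (<= (count children) max-children)
    [children]
    (into []
          (mapcat #(split-until-fits % max-children))
          (split-at-median children))))

;; A branch with 250 children and a limit of 100:
(mapv count (split-at-median (vec (range 250))))      ;; => [125 125]
(mapv count (split-until-fits (vec (range 250)) 100)) ;; => [62 63 62 63]
```

With a single median split, both halves are still over the limit and would be split again at the next index, which is the behavior the author describes as acceptable.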

Contributor

What's the advantage of only splitting by two instead of the way it was before?

(let [size (flake/size-flake f)]
[(conj tuples [f size])
(+ sum size)]))
[[] 0]

Contributor

I tend to prefer reduce over loop myself, but I think loop will be faster in this particular case because of all the extra vector allocations for the intermediate steps the reduce version has to do.
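
For comparison, a `loop` version of the same accumulation could look like the sketch below. `sizes-with-total` and `size-fn` are hypothetical names, with `size-fn` standing in for the per-flake size function; the point is only that the accumulator pair is carried in loop bindings rather than rebuilt as a new two-element vector on every step. No benchmark is implied.

```clojure
;; Hypothetical sketch; `size-fn` stands in for the per-flake size function.
(defn sizes-with-total
  "Returns [[[item size] ...] total-size] in one pass, without allocating an
  intermediate [tuples sum] vector per step."
  [size-fn items]
  (loop [items (seq items), tuples [], sum 0]
    (if items
      (let [item (first items)
            size (size-fn item)]
        (recur (next items) (conj tuples [item size]) (+ sum size)))
      [tuples sum])))

(sizes-with-total count ["a" "bbb"]) ;; => [[["a" 1] ["bbb" 3]] 4]
```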

- :target-leaves - number of leaves to create
- :bytes-per-leaf - target bytes per leaf"
[flakes]
(let [flake-vec (vec flakes)
Contributor

Why is this necessary? It looks like flake-vec is only used in the reduce call below. I think flakes is a clojure.data.avl sorted set, which also implements IReduce.
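
A small sketch of the point being raised: `reduce` works directly on a sorted set, so copying to a vector first is not required just to reduce over it. The example uses `clojure.core/sorted-set` as a stand-in, since the actual type of `flakes` is not confirmed in this thread, and `total-size` is a hypothetical helper.

```clojure
;; Stand-in example; uses clojure.core/sorted-set rather than the real flake set.
(defn total-size
  "Sums `size-fn` over any reducible collection."
  [size-fn flakes]
  (reduce (fn [sum f] (+ sum (size-fn f))) 0 flakes))

(def example-flakes (sorted-set 3 1 2))

(total-size identity example-flakes)       ;; => 6
(total-size identity (vec example-flakes)) ;; => 6
```

The `vec` call would still matter if later code needed indexed access such as `nth` or `subvec`.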

Resolved conflict in novelty.cljc by keeping the balanced splitting
algorithm which already handles the edge case fixed in main (ensuring
each leaf has at least one flake via seq check before splitting).
@zonotope
Contributor

  • Cascading and global height
    • Parent branches are updated on child splits; if parent overflows, split and promote; root overflow creates a new root. Height increases only at the root, keeping leaf depth uniform globally.

What part of the diff from this pull request changes the previous behavior related to these bullet points?

@bplatz marked this pull request as draft October 24, 2025 15:54