Member

@CascadingRadium CascadingRadium commented Dec 28, 2025

  • Use a bitset to track eligible documents instead of a slice of N uint64s, reducing memory usage from 8N bytes to N/8 bytes per segment (up to 64× reduction) and improving cache locality.
  • Pass an iterator over eligible documents that iterates the bitset directly, allowing direct translation into a bitset of eligible vector IDs in the storage layer and eliminating the need for a separate slice intermediary.
  • Fix garbage creation in the UnadornedPostingsIterator, which previously allocated a temporary struct on every Next() call to wrap a doc number and satisfy the Postings interface. The iterator now returns a single reusable struct (a one-time allocation), matching how the storage-layer PostingsIterator works.
  • Avoid unnecessary BytesRead statistics computation when executing searches in no-scoring mode, removing redundant work as a micro-optimization.
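
For illustration, the first two points can be sketched roughly as follows. This is a minimal, self-contained sketch and not the code in this PR: `eligibleBitset`, `eligibleIterator`, and their methods are hypothetical names, and the bitset is hand-rolled on `math/bits` rather than taken from any particular library.

```go
package main

import (
	"fmt"
	"math/bits"
)

// eligibleBitset is a hypothetical stand-in for a per-segment bitset:
// bit i is set when local doc number i is eligible.
type eligibleBitset struct {
	words []uint64
}

func newEligibleBitset(numDocs uint64) *eligibleBitset {
	return &eligibleBitset{words: make([]uint64, (numDocs+63)/64)}
}

func (b *eligibleBitset) set(doc uint64) {
	b.words[doc/64] |= 1 << (doc % 64)
}

// sizeBytes reports the memory held by the backing words.
func (b *eligibleBitset) sizeBytes() int {
	return len(b.words) * 8
}

// eligibleIterator walks the set bits in ascending order without
// materializing a []uint64 of doc numbers.
type eligibleIterator struct {
	bs   *eligibleBitset
	word int    // index of the word being scanned
	cur  uint64 // bits of that word not yet visited
}

func (b *eligibleBitset) iterator() *eligibleIterator {
	it := &eligibleIterator{bs: b}
	if len(b.words) > 0 {
		it.cur = b.words[0]
	}
	return it
}

// next returns the next eligible doc number, or false when exhausted.
func (it *eligibleIterator) next() (uint64, bool) {
	for it.cur == 0 {
		it.word++
		if it.word >= len(it.bs.words) {
			return 0, false
		}
		it.cur = it.bs.words[it.word]
	}
	tz := bits.TrailingZeros64(it.cur)
	it.cur &= it.cur - 1 // clear the lowest set bit
	return uint64(it.word)*64 + uint64(tz), true
}

func main() {
	const numDocs = 1024
	bs := newEligibleBitset(numDocs)
	for _, d := range []uint64{3, 64, 700} {
		bs.set(d)
	}
	it := bs.iterator()
	for d, ok := it.next(); ok; d, ok = it.next() {
		fmt.Println("eligible doc:", d)
	}
	fmt.Println("bitset bytes:", bs.sizeBytes(),
		"slice bytes if every doc were eligible:", numDocs*8)
}
```

With 1024 docs the bitset occupies 128 bytes, while a []uint64 listing every doc would occupy 8192 bytes; that worst case is where the 64× figure comes from.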

Contributor

Copilot AI left a comment

Pull request overview

This PR re-architects vector search to improve memory efficiency and reduce garbage collection pressure. The changes replace slice-based eligible document tracking with bitsets, achieving up to 64× memory reduction per segment, and optimize the iterator pattern to eliminate per-call allocations in the unadorned postings iterator.

Key changes:

  • Replaced slice-based eligible document tracking ([]uint64) with bitsets, reducing memory from 8N bytes to N/8 bytes per segment
  • Introduced iterator-based API for eligible documents that directly translates to bitset iteration at the storage layer
  • Fixed garbage creation in UnadornedPostingsIterator by reusing a single struct instance instead of allocating per Next() call
  • Optimized bytes read tracking to skip computation in no-scoring mode
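
The struct-reuse pattern described for the unadorned iterator can be sketched as below. This is a simplified illustration under stated assumptions, not the actual bleve types: `posting`, `unadornedPosting`, and `reusingIterator` are hypothetical, pared-down names.

```go
package main

import "fmt"

// posting is a hypothetical, pared-down Postings-style interface.
type posting interface {
	Number() uint64
}

// unadornedPosting carries only a doc number; the pointer receiver lets
// the iterator hand out one shared instance.
type unadornedPosting struct {
	docNum uint64
}

func (p *unadornedPosting) Number() uint64 { return p.docNum }

// reusingIterator embeds a single posting that is allocated together
// with the iterator and overwritten on each Next call.
type reusingIterator struct {
	docNums []uint64
	pos     int
	reuse   unadornedPosting
}

// Next overwrites the shared struct and returns a pointer to it, so no
// new value is allocated per call. It returns nil when exhausted.
func (it *reusingIterator) Next() posting {
	if it.pos >= len(it.docNums) {
		return nil
	}
	it.reuse.docNum = it.docNums[it.pos]
	it.pos++
	return &it.reuse
}

func main() {
	it := &reusingIterator{docNums: []uint64{5, 9, 12}}
	for p := it.Next(); p != nil; p = it.Next() {
		fmt.Println("doc:", p.Number())
	}
}
```

The trade-off in this sketch is that a returned posting is only valid until the next Next() call; a caller that needs the value afterwards has to copy the doc number out first.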

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| index/scorch/snapshot_vector_index.go | Introduces bitset-based eligible document storage and iterator API, replacing the previous slice-based approach |
| index/scorch/unadorned.go | Changes UnadornedPosting from uint64 to struct with pointer receivers and adds reusable struct fields to iterators to eliminate per-call allocations |
| index/scorch/snapshot_index_tfr.go | Adds conditional bytes read tracking via updateBytesRead flag to skip computation in no-scoring mode |
| index/scorch/snapshot_index.go | Initializes updateBytesRead flag based on scoring requirements |
| index/scorch/optimize_knn.go | Removes requiresFiltering flag and updates to use new SegmentEligibleDocuments API |
| index/scorch/optimize.go | Sets updateBytesRead to false for unadorned term field readers |
| index/scorch/snapshot_index_vr.go | Updates InterpretVectorIndex call to remove filtering parameter |
| index_test.go | Updates expected bytes read values to reflect the optimization that skips unnecessary computation |
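
The updateBytesRead entries above describe a flag that gates statistics work. A rough sketch of that idea follows; only the flag name comes from the summary, while `termFieldReader`, `newTermFieldReader`, and the byte-counting logic are hypothetical simplifications.

```go
package main

import "fmt"

// termFieldReader is a hypothetical, simplified reader; only the
// updateBytesRead name is taken from the PR summary.
type termFieldReader struct {
	postings        [][]byte
	pos             int
	updateBytesRead bool
	bytesRead       uint64
}

// newTermFieldReader enables byte accounting only when scoring is needed
// (an assumption about how the flag is derived).
func newTermFieldReader(postings [][]byte, needsScoring bool) *termFieldReader {
	return &termFieldReader{postings: postings, updateBytesRead: needsScoring}
}

// next returns the next raw posting and, if enabled, adds its length to
// the bytesRead counter.
func (r *termFieldReader) next() ([]byte, bool) {
	if r.pos >= len(r.postings) {
		return nil, false
	}
	p := r.postings[r.pos]
	r.pos++
	if r.updateBytesRead {
		r.bytesRead += uint64(len(p))
	}
	return p, true
}

func main() {
	data := [][]byte{[]byte("abc"), []byte("de")}
	for _, scoring := range []bool{true, false} {
		r := newTermFieldReader(data, scoring)
		for _, ok := r.next(); ok; _, ok = r.next() {
		}
		fmt.Println("scoring:", scoring, "bytesRead:", r.bytesRead)
	}
}
```

In this sketch the no-scoring reader reports zero bytes read, which is the kind of change that would require expected values in a test such as index_test.go to be updated.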

