Skip to content

Conversation

@Fengzdadi
Copy link
Contributor

@Fengzdadi Fengzdadi commented Jan 15, 2026

Summary

This PR implements the core VarOptItemsSketch[T] for variance-optimal weighted sampling, following the C++ and Java implementations exactly. ref #98

Changes

internal/family.go

  • Added VarOptItems family (ID=13, MaxPreLongs=4)

sampling/varopt_items_sketch.go (new file, ~520 lines)

  • VarOptItemsSketch[T] struct with fields: k, n, h, m, r, totalWeightR, data, weights
  • Update algorithm matching C++/Java:
    • Warmup mode (h ≤ k): stores all items directly
    • Transition at h > k via transitionFromWarmup()
    • growCandidateSet() / downsampleCandidateSet() for variance-optimal sampling
    • chooseDeleteSlot() / chooseWeightedDeleteSlot() for weighted random selection
  • Min-heap operations: heapify(), siftUp(), siftDown()
  • Go 1.23 iterator: Samples() iter.Seq2[T, float64] for elegant range iteration

sampling/varopt_items_sketch_test.go (new file, ~218 lines)

8 unit tests matching C++ test coverage:

  • TestVarOptItemsSketch_NewSketch - constructor and k validation
  • TestVarOptItemsSketch_WarmupPhase - exact mode behavior
  • TestVarOptItemsSketch_TransitionToEstimation - h > k transition
  • TestVarOptItemsSketch_EstimationMode - sampling mode invariants
  • TestVarOptItemsSketch_InvalidWeight - negative/zero weight handling
  • TestVarOptItemsSketch_Reset - reset state verification
  • TestVarOptItemsSketch_UniformWeights - reservoir-like behavior
  • TestVarOptItemsSketch_CumulativeWeight - weight sum preservation (matches C++ test with exp(5*N(0,1)) distribution and 1e-13 tolerance)

API Usage

sketch, _ := NewVarOptItemsSketch[string](256)

// Add weighted items
sketch.Update("item1", 1.5)
sketch.Update("item2", 3.0)

// Iterate samples
for item, weight := range sketch.Samples() {
    fmt.Printf("Item: %s, Weight: %f\n", item, weight)
}

Note

The current implementation includes detailed comments for clarity. If the reviewers prefer, we can simplify the comments.

- Add VarOptItems family (ID=13) to internal/family.go
- Implement VarOptItemsSketch[T] with full C++/Java-compatible algorithm:
  - Warmup mode with h > k transition
  - growCandidateSet/downsampleCandidateSet for variance-optimal sampling
  - chooseDeleteSlot with weighted random selection
  - Min-heap operations (heapify, siftUp, siftDown)
- Add Go 1.23 iter.Seq2-based Samples() iterator
- Add 8 unit tests matching C++ test coverage:
  - NewSketch, WarmupPhase, TransitionToEstimation, EstimationMode
  - InvalidWeight, Reset, UniformWeights, CumulativeWeight

Implements PR1 of apache#98
}

// NewVarOptItemsSketchWithResizeFactor creates a new VarOpt sketch with specified k and resize factor.
func NewVarOptItemsSketchWithResizeFactor[T any](k int, rf ResizeFactor) (*VarOptItemsSketch[T], error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you consider functional options pattern? so you can integrate with "NewVarOptItemsSketch"

// for item, weight := range sketch.Samples() {
// // process item and weight
// }
func (s *VarOptItemsSketch[T]) Samples() iter.Seq2[T, float64] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

returing struct like

type Sample[T] struct {
    data T
    Weight float64
}

is more clear API.

NOTE: convention is using All. ref: https://pkg.go.dev/iter#hdr-Naming_Conventions

// chooseDeleteSlot randomly selects which item to delete from candidates.
func (s *VarOptItemsSketch[T]) chooseDeleteSlot(wtCands float64, numCands int) int {
if s.r == 0 {
panic("choosing delete slot while in exact mode")
Copy link
Member

@proost proost Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not panic. can you returning error?

- Use functional options pattern for constructor
- Return Sample[T] struct from All() iterator
- Return error instead of panic in chooseDeleteSlot
@Fengzdadi
Copy link
Contributor Author

Thanks for the great feedback @proost! I've addressed all three points. I learned a lot about Go design philosophy from your review - still have much to learn about idiomatic Go patterns :)

@Fengzdadi Fengzdadi force-pushed the feat-varopt-items-sketch branch from 1d9436d to fe39ed5 Compare January 17, 2026 22:34
Copy link
Member

@proost proost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@proost proost merged commit 37838ad into apache:main Jan 22, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants