-
Notifications
You must be signed in to change notification settings - Fork 11
Open
Description
Overview
This issue tracks the implementation of VarOptItemsSketch (Variance Optimal Sampling) for the Go library, following the Java and C++ implementations.
VarOpt provides weighted sampling with optimal variance for subset sum estimation - useful for network traffic monitoring, ad analytics, and log sampling.
Proposed Implementation Plan
We plan to split this into multiple PRs for easier review:
PR 1: Core VarOptItemsSketch (basic functionality)
- Add
VarOptItemsfamily (ID=13) tointernal/family.go - Implement
VarOptItemsSketch[T]struct with fields: k, n, h, m, r, totalWtR, data, weights - Implement warmup mode (n ≤ k): store all items
- Implement estimation mode (n > k): weighted sampling with tau threshold
- Min-heap operations for H region
- Basic unit tests
PR 2: Serialization & Compatibility
- Implement
ToByteArray()(Java-compatible format) - Implement
NewVarOptItemsSketchFromSlice() - Add cross-language compatibility tests
- Generate test data files for validation
PR 3: Subset Sum Estimation
- Implement
EstimateSubsetSum(predicate)returning bounds - Pseudo-hypergeometric confidence intervals
- Tests for estimation accuracy
PR 4: VarOptItemsUnion (optional, can be deferred)
- Implement union with gadget/marks mechanism
- Serialization for union state
References
- Java:
VarOptItemsSketch.java - C++:
var_opt_sketch.hpp - Algorithm paper: Variance Optimal Sampling
Metadata
Metadata
Assignees
Labels
No labels