Describe the bug
Error:
Parquet error: Invalid offset in sparse column chunk data: 754, no matching page found.
If you are using a `SelectionStrategyPolicy::Mask`, ensure that the OffsetIndex is provided when creating the InMemoryRowGroup.
Occurs when:
- A predicate uses `RowSelectionStrategy::Selectors` with a `RowSelector` list that skips an entire page.
- Another predicate uses `RowSelectionStrategy::Mask` by triggering the mask run-length threshold.
- The column with `RowSelectionStrategy::Mask` is not in the output projection, so `should_force_selectors` does not force it to use `RowSelectionStrategy::Selectors`.
- The mask strategy attempts to fetch pages that were skipped, resulting in an error.
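The interaction described in the list above can be modelled with a small, self-contained sketch. It does not use the parquet crate: `Run`, `pages_to_fetch`, and `mask_read` are hypothetical names invented for illustration, and the 50-row page layout in the usage note is an assumption. The sketch only shows why a reader that expects every page to be present fails once an earlier selection has caused a whole page to be skipped.

```rust
use std::ops::Range;

/// One run of a row selection (hypothetical stand-in for a selector entry).
/// `skip == true` means the rows in this run are not read.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Run {
    row_count: usize,
    skip: bool,
}

/// Return the indices of pages that overlap at least one selected row.
/// Pages that fall entirely inside skipped runs are never fetched.
fn pages_to_fetch(pages: &[Range<usize>], selection: &[Run]) -> Vec<usize> {
    let mut fetched = Vec::new();
    let mut start = 0;
    for run in selection {
        let end = start + run.row_count;
        if !run.skip && run.row_count > 0 {
            for (i, page) in pages.iter().enumerate() {
                let overlaps = page.start < end && start < page.end;
                if overlaps && !fetched.contains(&i) {
                    fetched.push(i);
                }
            }
        }
        start = end;
    }
    fetched.sort_unstable();
    fetched
}

/// Simplified mask-style read: it walks the pages in order and assumes
/// each one is available, so a page that was never fetched is an error.
fn mask_read(pages: &[Range<usize>], fetched: &[usize]) -> Result<(), String> {
    for i in 0..pages.len() {
        if !fetched.contains(&i) {
            return Err(format!("no matching page found for page {i}"));
        }
    }
    Ok(())
}
```

For example, with six assumed pages of 50 rows each and the selection `[select 100, skip 100, select 100]`, `pages_to_fetch` returns pages 0, 1, 4, and 5, and `mask_read` then fails on page 2.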
To Reproduce
A minimal reproducer is available at: https://github.com/erratic-pattern/parquet_mask_strategy_missing_pages
git clone https://github.com/erratic-pattern/parquet_mask_strategy_missing_pages
cd parquet_mask_strategy_missing_pages
cargo test

The test uses a parquet file with:
- 2 row groups, 300 rows each
- Tag column with values 'a', 'b', 'c' sorted (100 rows each)
- Time column with alternating in-range/out-of-range values
- Page size set so tag='b' section contains at least one full page
The test simulates a query like `SELECT tag WHERE tag IN ('a', 'c') AND time >= X AND time < Y` with three predicates:
1. `tag IN ('a', 'c')` - creates initial selection `[select 100, skip 100, select 100]`
2. `time >= X` - creates sparse selection, pages fetched as Sparse
3. `time < Y` - triggers Mask strategy due to sparse selection from predicate 2
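To make the predicate sequence concrete, here is a rough sketch of how the selection goes from three long runs to many short ones. It is a model only: `Strategy`, `to_runs`, and `choose_strategy` are hypothetical names, and both the average-run-length heuristic and the threshold value are assumptions rather than the crate's actual logic.

```rust
/// Hypothetical stand-in for the two selection strategies in this issue.
#[derive(Debug, PartialEq)]
enum Strategy {
    Selectors,
    Mask,
}

/// Run-length encode a per-row keep/skip vector into (keep, length) runs.
fn to_runs(keep: &[bool]) -> Vec<(bool, usize)> {
    let mut runs: Vec<(bool, usize)> = Vec::new();
    for &k in keep {
        if let Some(last) = runs.last_mut() {
            if last.0 == k {
                last.1 += 1;
                continue;
            }
        }
        runs.push((k, 1));
    }
    runs
}

/// Assumed heuristic: short average runs favour a mask, long runs favour
/// selectors. The threshold is a caller-supplied illustrative value.
fn choose_strategy(runs: &[(bool, usize)], threshold: usize) -> Strategy {
    if runs.is_empty() {
        return Strategy::Selectors;
    }
    let total_rows: usize = runs.iter().map(|r| r.1).sum();
    if total_rows / runs.len() < threshold {
        Strategy::Mask
    } else {
        Strategy::Selectors
    }
}
```

In this model, the `tag` predicate over one 300-row row group encodes to three runs of 100 rows each, which stays on selectors for any threshold up to 100. ANDing in an alternating `time` predicate breaks the selected ranges into runs of length 1, which pushes the average run length below the threshold and selects the mask.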
Additional context
- Introduced in parquet 57.1.0 via [Parquet] Adaptive Parquet Predicate Pushdown #8733
- Related to [Parquet] Support skipping pages with mask based evaluation #8845