Description
In #1523 we added the explicit `IDMask` enum as the return type of `AttributeFilterExec::execute`. There are a few places where we can use its variants to eliminate some work.
- In `FilterExec::execute`, if `AttributeFilterExec::execute` returns either the `None` or `All` variant of the `IDMask`, we don't need to compute the selection_vector here from the ID column. We can skip over all this:
otel-arrow/rust/experimental/query_engine/engine-columnar/src/pipeline/filter.rs
Lines 598 to 653 in 02c327d
```rust
let id_col = match get_id_col_from_parent(root_rb, attrs_filter.payload_type())? {
    Some(id_col) => id_col,
    None => {
        // None of the records have any attributes
        return Ok(BooleanArray::new(
            if self.missing_attrs_pass {
                BooleanBuffer::new_set(root_rb.num_rows())
            } else {
                BooleanBuffer::new_unset(root_rb.num_rows())
            },
            None,
        ));
    }
};

let id_mask = attrs_filter.execute(otap_batch, session_ctx, false)?;

let mut attrs_selection_vec_builder = BooleanBufferBuilder::new(root_rb.num_rows());

// we append to the selection vector in contiguous segments rather than doing it 1-by-1
// for each value, as this is a faster way to build up the BooleanBuffer
let mut segment_validity = false;
let mut segment_len = 0usize;

for index in 0..id_col.len() {
    let row_validity = if id_col.is_valid(index) {
        id_mask.contains(id_col.value(index) as u32)
    } else {
        // attribute does not exist
        self.missing_attrs_pass
    };

    if segment_validity != row_validity {
        if segment_len > 0 {
            attrs_selection_vec_builder.append_n(segment_len, segment_validity);
        }
        segment_validity = row_validity;
        segment_len = 0;
    }

    segment_len += 1;
}

// append the last segment
if segment_len > 0 {
    attrs_selection_vec_builder.append_n(segment_len, segment_validity);
}

let attr_selection_vec = BooleanArray::new(attrs_selection_vec_builder.finish(), None);
selection_vec = Some(match selection_vec {
    // update the result selection_vec to be the intersection of what's already filtered
    // and the attributes filters
    Some(selection_vec) => and(&selection_vec, &attr_selection_vec)?,

    // no predicate was applied to root batch, so we are just filtering by attributes
    None => attr_selection_vec,
});
```

- In #1514 (Columnar query engine optimization for attribute filtering) we added the optimizer that turns `Composite<FilterExec>` into `Composite<AttributeFilterExec>` where it makes sense from a performance perspective. It didn't make sense to do this for `Composite::<FilterExec>::Not` because, when inverting `Composite<AttributeFilterExec>`, we needed to do a double scan over the parent ID column to compute the inverted ID mask, so nothing was gained. However, now that it's cheap to create `IdMask::NotSome`, this optimization might make more sense.
  - Note that when we do this, we need to ensure we don't break the checks `FilterExec` uses to figure out whether missing attributes pass the predicate, which were added in #1538 (Columnar query engine filter handle `== null` filters).
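To make both ideas concrete, here is a rough, standalone sketch. Everything in it is a hypothetical stand-in rather than the engine's actual code: the `IdMask` layout (a `HashSet<u32>` payload), the `invert` and `attrs_selection` names, and the use of `&[Option<u32>]` and `Vec<bool>` in place of the Arrow ID column and `BooleanArray`. It assumes that rows with a null ID keep following `missing_attrs_pass`, as in the code quoted above, so the `None`/`All` fast path only collapses to a constant vector when the constant agrees with that flag.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the ID mask enum; the real type may be laid out differently.
#[derive(Debug, Clone, PartialEq)]
enum IdMask {
    None,
    All,
    Some(HashSet<u32>),
    NotSome(HashSet<u32>),
}

impl IdMask {
    fn contains(&self, id: u32) -> bool {
        match self {
            IdMask::None => false,
            IdMask::All => true,
            IdMask::Some(ids) => ids.contains(&id),
            IdMask::NotSome(ids) => !ids.contains(&id),
        }
    }

    /// Inversion only swaps the variant; no scan over the parent ID column is needed.
    fn invert(self) -> IdMask {
        match self {
            IdMask::None => IdMask::All,
            IdMask::All => IdMask::None,
            IdMask::Some(ids) => IdMask::NotSome(ids),
            IdMask::NotSome(ids) => IdMask::Some(ids),
        }
    }
}

/// Builds the attribute selection vector for the parent rows.
/// A `None` entry in `id_col` models a row that has no attributes.
fn attrs_selection(id_col: &[Option<u32>], mask: &IdMask, missing_attrs_pass: bool) -> Vec<bool> {
    // For `All` / `None`, every row with a valid ID gets the same answer.
    let constant = match mask {
        IdMask::All => Some(true),
        IdMask::None => Some(false),
        _ => None,
    };

    match constant {
        // Fast path: valid and missing rows agree, so the ID column is never read.
        Some(hit) if hit == missing_attrs_pass => vec![hit; id_col.len()],
        // Only validity matters; no per-row mask lookup.
        Some(hit) => id_col
            .iter()
            .map(|id| if id.is_some() { hit } else { missing_attrs_pass })
            .collect(),
        // General path: per-row membership test, as in the existing loop.
        None => id_col
            .iter()
            .map(|id| match id {
                Some(id) => mask.contains(*id),
                None => missing_attrs_pass,
            })
            .collect(),
    }
}
```

In the sketch, `invert` leaves null-ID rows governed by `missing_attrs_pass`; whether that flag itself has to change under `Not` is the question raised by the note about #1538, and the sketch does not try to answer it.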