Skip to content

Conversation

@rluvaton
Copy link
Member

@rluvaton rluvaton commented Jan 27, 2026

Which issue does this PR close?

None

Rationale for this change

Making take kernel faster for building native.

You can see that the compiler vectorize and unroll the loop in GodBolt when using unchecked

What changes are included in this PR?

use get_unchecked in take_native

Are these changes tested?

Existing tests

Are there any user-facing changes?

Undefined behavior if out of range

See:

@rluvaton
Copy link
Member Author

run benchmark take

@alamb-ghbot
Copy link

🤖 Hi @rluvaton, thanks for the request (#9277 (comment)).

scrape_comments.py only supports whitelisted benchmarks.

  • Standard: (none)
  • Criterion: array_iter, arrow_reader, arrow_reader_clickbench, arrow_reader_row_filter, arrow_statistics, arrow_writer, bitwise_kernel, boolean_kernels, buffer_bit_ops, cast_kernels, coalesce_kernels, comparison_kernels, concatenate_kernel, csv_writer, encoding, filter_kernels, interleave_kernels, json-reader, metadata, row_format, take_kernels, union_array, variant_builder, variant_kernels, variant_validation, view_types, zip_kernels

Please choose one or more of these with run benchmark <name> or run benchmark <name1> <name2>...
Unsupported benchmarks: take.

@rluvaton
Copy link
Member Author

run benchmark take_kernels

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jan 27, 2026
@rluvaton
Copy link
Member Author

The failing tests are out of bounds access tests that is now undefined behavior rather than a panic

let index = index.as_usize();
// Safety: we either checked already bounds (passed check_bounds = true) or the user
// guarantees the value to be in range.
// Avoiding bound checks allows the compiler to vectorize it and do better loop unrolling
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't just do this, as we get the indices from the user / other safe ckdd (and we don't check bounds by default).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can take the conversation about that there?

@Dandandan
Copy link
Contributor

The failing tests are out of bounds access tests that is now undefined behavior rather than a panic

We can't just make it unsafe as it will make safe code UB.

See also #8879

@rluvaton
Copy link
Member Author

show benchmark queue

@alamb-ghbot
Copy link

🤖 Hi @rluvaton, you asked to view the benchmark queue (#9277 (comment)).

Job User Benchmarks Comment
20013_3802121366.sh Dandandan default https://github.com/apache/datafusion/pull/20013#issuecomment-3802121366
20013_3802419751.sh Dandandan default https://github.com/apache/datafusion/pull/20013#issuecomment-3802419751
20013_3803579216.sh Dandandan default https://github.com/apache/datafusion/pull/20013#issuecomment-3803579216
18392_3806155369.sh comphead tpcds https://github.com/apache/datafusion/pull/18392#issuecomment-3806155369
arrow-9277-3806313724.sh rluvaton take_kernels https://github.com/apache/arrow-rs/pull/9277#issuecomment-3806313724

@rluvaton
Copy link
Member Author

The failing tests are out of bounds access tests that is now undefined behavior rather than a panic

We can't just make it unsafe as it will make safe code UB.

See also #8879

I will add validation before and see what is the performance impact

@rluvaton
Copy link
Member Author

Show benchmark queue

@alamb-ghbot
Copy link

🤖 Hi @rluvaton, you asked to view the benchmark queue (#9277 (comment)).

Job User Benchmarks Comment
20013_3802121366.sh Dandandan default https://github.com/apache/datafusion/pull/20013#issuecomment-3802121366
20013_3802419751.sh Dandandan default https://github.com/apache/datafusion/pull/20013#issuecomment-3802419751
20013_3803579216.sh Dandandan default https://github.com/apache/datafusion/pull/20013#issuecomment-3803579216
18392_3806155369.sh comphead tpcds https://github.com/apache/datafusion/pull/18392#issuecomment-3806155369
arrow-9277-3806313724.sh rluvaton take_kernels https://github.com/apache/arrow-rs/pull/9277#issuecomment-3806313724

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants