-
Notifications
You must be signed in to change notification settings - Fork 3k
API: Optimize NOT IN and != predicates for single-value partition manifests #15064
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| return null; | ||
| } | ||
|
|
||
| if (fieldStats.containsNaN() != null && fieldStats.containsNaN()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
containsNaN() is nullable, so null likely means “not recorded/unknown”. Should we treat null as unknown and skip the single-value pruning in that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank for your your review Huaxin.
You're right, I missed handling the null case. I've updated the code to treat containsNaN() == null as unknown and skip pruning. Also added a test case to verify this behavior.
| return null; | ||
| } | ||
|
|
||
| if (fieldStats.containsNaN() == null || fieldStats.containsNaN()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can just combine the above two cases (ifs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your review Manu.
You are right, I have combined them into a single condition.
Summary
This PR adds manifest pruning optimization for
NOT INand!=predicates when a manifest contains a single distinct partition value (i.e., whenlower == upper).Problem
Currently, ManifestEvaluator cannot prune manifests for
NOT INand!=predicates, even when the manifest provably contains no matching partitions.Solution
When
lower == upperand the manifest has no nulls or NaNs, we can safely prune if:NOT IN: the single partition value is in the exclusion list!=: the single partition value equals the literalThis mirrors the optimization added in #14593 for InclusiveMetricsEvaluator, but applied at the manifest level for partition pruning.
Testing
notInandnotEqoptimizationsFixes #15063