@joyhaldar joyhaldar commented Jan 16, 2026

Summary

This PR adds manifest pruning optimization for NOT IN and != predicates when a manifest contains a single distinct partition value (i.e., when lower == upper).

Problem

Currently, ManifestEvaluator cannot prune manifests for NOT IN and != predicates, even when the manifest provably contains no matching partitions.

Solution

When lower == upper and the manifest has no nulls or NaNs, we can safely prune if:

  • For NOT IN: the single partition value is in the exclusion list
  • For !=: the single partition value equals the literal
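The two rules above can be sketched in isolation as follows. This is a minimal illustration, not the code in this PR: `Summary` is a hypothetical stand-in for a manifest's partition field summary (it is not Iceberg's real type), values are simplified to `Integer`, and a `true` result is assumed to mean "the manifest might contain matching rows and must be read".

```java
import java.util.Set;

final class SingleValuePruningSketch {
  // Hypothetical stand-in for a manifest's partition field summary.
  static final class Summary {
    final Integer lower;
    final Integer upper;
    final boolean containsNull;
    final Boolean containsNaN; // nullable: null means "not recorded"

    Summary(Integer lower, Integer upper, boolean containsNull, Boolean containsNaN) {
      this.lower = lower;
      this.upper = upper;
      this.containsNull = containsNull;
      this.containsNaN = containsNaN;
    }
  }

  // Returns the only value the partition field can hold, or null when that cannot be proven.
  static Integer singleValue(Summary s) {
    if (s.lower == null || s.upper == null || s.containsNull) {
      return null;
    }
    // Unknown NaN status is treated the same as "may contain NaN".
    if (s.containsNaN == null || s.containsNaN) {
      return null;
    }
    return s.lower.equals(s.upper) ? s.lower : null;
  }

  // field != literal: prune only when the single value equals the literal.
  static boolean mightMatchNotEq(Summary s, int literal) {
    Integer value = singleValue(s);
    return value == null || value.intValue() != literal;
  }

  // field NOT IN (excluded): prune only when the single value is in the exclusion list.
  static boolean mightMatchNotIn(Summary s, Set<Integer> excluded) {
    Integer value = singleValue(s);
    return value == null || !excluded.contains(value);
  }
}
```

Whenever the single value cannot be established, both methods fall back to `true`, so the sketch never prunes on incomplete statistics.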

This mirrors the optimization added in #14593 for InclusiveMetricsEvaluator, but applied at the manifest level for partition pruning.

Testing

  • Added unit tests for both notIn and notEq optimizations

Fixes #15063

@github-actions github-actions bot added the API label Jan 16, 2026
@joyhaldar joyhaldar marked this pull request as ready for review January 16, 2026 22:48
return null;
}

if (fieldStats.containsNaN() != null && fieldStats.containsNaN()) {
Contributor

containsNaN() is nullable, so null likely means “not recorded/unknown”. Should we treat null as unknown and skip the single-value pruning in that case?

Contributor Author

Thank you for your review, Huaxin.

You're right, I missed handling the null case. I've updated the code to treat containsNaN() == null as unknown and skip pruning. Also added a test case to verify this behavior.
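For reference, the difference between the two conditions comes down to how a nullable `Boolean` is read. The sketch below is only an illustration with hypothetical method names (`skipBefore`, `skipAfter`); it contrasts the condition originally in the diff with the null-safe form described above.

```java
final class NullableNaNCheckSketch {
  // Condition as first written: skips the optimization only when NaN presence is known to be true.
  // A null (unknown) value falls through, so pruning could proceed on unrecorded statistics.
  static boolean skipBefore(Boolean containsNaN) {
    return containsNaN != null && containsNaN;
  }

  // Null-safe condition: skips the optimization when NaN presence is unknown or true.
  static boolean skipAfter(Boolean containsNaN) {
    return containsNaN == null || containsNaN;
  }
}
```

The two forms agree for `TRUE` and `FALSE` and differ only for `null`, which is exactly the case raised in the review.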

return null;
}

if (fieldStats.containsNaN() == null || fieldStats.containsNaN()) {
Member

I think we can just combine the above two cases (ifs)

Contributor Author

Thank you for your review, Manu.

You are right, I have combined them into a single condition.

Development

Successfully merging this pull request may close these issues.

NOT IN and != predicates do not prune manifests when lower == upper

4 participants