-
Notifications
You must be signed in to change notification settings - Fork 13
Open
Description
It seems like BigQuery optimizer is struggling with the date filter on the data sources:
AND ds.submission_date BETWEEN '2024-10-31' AND '2024-12-19'
AND ds.submission_date BETWEEN
DATE_ADD(e.enrollment_date, interval e.analysis_window_start day)
AND DATE_ADD(e.enrollment_date, interval e.analysis_window_end day)
It ends up scanning all partitions between 2024-10-31 and 2024-12-19, even though for daily runs only the second part of the filter is relevant and will filter on a single date. When I changed the filter to a specific date like ds.submission_date = '2024-12-19' then it scanned significantly less data and way less slot usage.
BigQuery optimizer seems to make optimizations based on the size of source data as well, so this issue doesn't seem to consistently affect all experiments. But one that was affected was default-ui-experiment.
We may need to rethink how this filter should work.
┆Issue is synchronized with this Jira Task
Metadata
Metadata
Assignees
Labels
No labels