-
-
Notifications
You must be signed in to change notification settings - Fork 60
feat(traces): Add subquery optimization for GetTraces endpoint [DO NOT MERGE] #7622
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| """ | ||
| filter_expressions = [] | ||
| for trace_filter in trace_filters: | ||
| assert hasattr(trace_filter, "fitler") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: A typo in an assertion, hasattr(trace_filter, "fitler") instead of "filter", will cause cross-item queries to crash when a specific feature flag is enabled.
Severity: CRITICAL | Confidence: High
🔍 Detailed Analysis
In the _build_cross_item_query function, an assertion checks for a misspelled attribute fitler on the trace_filter object. However, the correct attribute name is filter. This typo will cause hasattr(trace_filter, "fitler") to return False, triggering an AssertionError. This will cause any cross-item query to fail when the use_subquery_optimization_for_traces feature flag is enabled, as this code path is executed under that condition.
💡 Suggested Fix
Correct the typo in the assert statement on line 492 from fitler to filter. The line should be assert hasattr(trace_filter, "filter").
🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: snuba/web/rpc/v1/endpoint_get_traces.py#L492
Potential issue: In the `_build_cross_item_query` function, an assertion checks for a
misspelled attribute `fitler` on the `trace_filter` object. However, the correct
attribute name is `filter`. This typo will cause `hasattr(trace_filter, "fitler")` to
return `False`, triggering an `AssertionError`. This will cause any cross-item query to
fail when the `use_subquery_optimization_for_traces` feature flag is enabled, as this
code path is executed under that condition.
Did we get this right? 👍 / 👎 to inform future reviews.
Reference ID: 8313976
| placeholder_pattern = "in(trace_id, ['00000000-0000-0000-0000-000000000000'])" | ||
|
|
||
| # Use IN clause with subquery instead of placeholder | ||
| # Both queries use hex format for trace_id comparison | ||
| # Replace the placeholder IN clause with the subquery directly | ||
| in_subquery_clause = f"replaceAll(toString(trace_id), '-', '') IN ({subquery_sql})" | ||
| final_sql = metadata_sql.replace(placeholder_pattern, in_subquery_clause) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: SQL injection via fragile string replacement can fail silently, returning empty results. The function also returns incorrect metadata from a dry run, not the actual query.
Severity: CRITICAL | Confidence: High
🔍 Detailed Analysis
When subquery optimization is enabled, the code constructs a final SQL query by replacing a hardcoded placeholder string. This replacement is fragile and can fail silently if the SQL formatter's output changes, causing the query to execute with a placeholder UUID and return no results. Additionally, the function returns metadata and statistics from a dry_run query, not the actual executed query. This results in incorrect debug information and metrics, as the reported SQL and stats will not match what was actually run against the database.
💡 Suggested Fix
Instead of string replacement, use a more robust method to combine the subquery and the main query, such as modifying the query's abstract syntax tree (AST) before formatting. To fix the metadata issue, capture the results and metadata from the final _execute_direct_sql call and use that to construct the QueryResult that is returned.
🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: snuba/web/rpc/v1/endpoint_get_traces.py#L1055-L1061
Potential issue: When subquery optimization is enabled, the code constructs a final SQL
query by replacing a hardcoded placeholder string. This replacement is fragile and can
fail silently if the SQL formatter's output changes, causing the query to execute with a
placeholder UUID and return no results. Additionally, the function returns metadata and
statistics from a `dry_run` query, not the actual executed query. This results in
incorrect debug information and metrics, as the reported SQL and stats will not match
what was actually run against the database.
Did we get this right? 👍 / 👎 to inform future reviews.
Reference ID: 8313976
| trace_item_filters_expression = trace_item_filters_to_expression( | ||
| TraceItemFilter( | ||
| and_filter=AndFilter( | ||
| filters=[f.filter for f in in_msg.filters], | ||
| ), | ||
| ), | ||
| attribute_key_to_expression, | ||
| ) | ||
| selected_columns: list[SelectedExpression] = [ | ||
| SelectedExpression( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: The subquery optimization path does not set the sampling tier, causing it to query the wrong storage table when trace sampling is enabled, leading to incorrect results.
Severity: CRITICAL | Confidence: High
🔍 Detailed Analysis
When both enable_trace_sampling and use_subquery_optimization_for_traces are enabled, the subquery optimization path fails to apply the sampling tier setting. The _get_subquery_sql function generates SQL for the subquery without calling settings.set_sampling_tier(), causing it to default to the non-sampled storage. However, the main query may be routed to a downsampled storage tier. This mismatch of storage tiers between the subquery and the main query can lead to incorrect results, schema mismatches, or query failures.
💡 Suggested Fix
In the _get_subquery_sql function, ensure the sampling_tier from the routing_decision is applied to the HTTPQuerySettings before the dry run query is executed. This will ensure the subquery is generated against the correct storage tier, matching the main query.
🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: snuba/web/rpc/v1/endpoint_get_traces.py#L629-L638
Potential issue: When both `enable_trace_sampling` and
`use_subquery_optimization_for_traces` are enabled, the subquery optimization path fails
to apply the sampling tier setting. The `_get_subquery_sql` function generates SQL for
the subquery without calling `settings.set_sampling_tier()`, causing it to default to
the non-sampled storage. However, the main query may be routed to a downsampled storage
tier. This mismatch of storage tiers between the subquery and the main query can lead to
incorrect results, schema mismatches, or query failures.
Did we get this right? 👍 / 👎 to inform future reviews.
Reference ID: 8313976
This is entirely vibecoded, don't review yet
Problem
The
EndpointGetTracesendpoint had a performance bottleneck when retrieving metadata for large numbers of trace IDs. The current approach uses anINclause with potentially thousands of trace IDs, which can hit ClickHouse query sizelimits and cause failures.
Solution
Implemented a subquery optimization that eliminates large
INclauses by using SQL subqueries instead of passing trace ID lists back and forth.Key Changes:
use_subquery_optimization_for_tracesto enable/disable the optimization_get_metadata_for_traces_with_subquery()method that uses placeholder substitutionTechnical Approach:
replaceAll(toString(trace_id), '-', '') IN (SELECT ...)Performance Impact
Testing
Configuration