
Conversation

@volokluev
Member

This is entirely vibecoded; don't review yet.

Problem

The EndpointGetTraces endpoint has a performance bottleneck when retrieving metadata for large numbers of trace IDs: the current approach passes an IN clause containing potentially thousands of trace IDs, which can hit ClickHouse query size limits and cause failures.

Solution

Implemented a subquery optimization that eliminates large IN clauses by using SQL subqueries instead of passing trace ID lists back and forth.

Key Changes:

  • Added runtime configuration use_subquery_optimization_for_traces to enable/disable the optimization
  • Implemented _get_metadata_for_traces_with_subquery() method that uses placeholder substitution
  • Added helper functions for SQL extraction and direct execution
  • Supports both single-item and cross-item queries

Technical Approach:

  1. Generate trace ID subquery SQL using dry run mode
  2. Generate metadata query with UUID placeholder instead of actual trace IDs
  3. Replace placeholder pattern with subquery SQL: replaceAll(toString(trace_id), '-', '') IN (SELECT ...)
  4. Execute combined query directly and map result columns appropriately
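
A minimal, runnable sketch of these four steps follows; the three helpers are stand-ins for the ones named in this PR (_get_subquery_sql, the dry-run metadata formatting, and _execute_direct_sql), and the table and column names are illustrative only.

# Sketch only: helper bodies and SQL below are stand-ins, not the PR's code.
PLACEHOLDER = "in(trace_id, ['00000000-0000-0000-0000-000000000000'])"

def _get_subquery_sql() -> str:
    # Step 1: dry-run formatting of the trace ID query (no execution).
    return "SELECT DISTINCT trace_id FROM items WHERE project_id = 1"

def _get_metadata_sql_with_placeholder() -> str:
    # Step 2: dry-run formatting of the metadata query, with a placeholder
    # UUID standing in for the real trace ID list.
    return f"SELECT trace_id, count() FROM items WHERE {PLACEHOLDER} GROUP BY trace_id"

def _execute_direct_sql(sql: str) -> None:
    # Step 4: direct ClickHouse execution of the combined query.
    print(sql)

# Step 3: splice the subquery into the placeholder IN clause.
subquery_sql = _get_subquery_sql()
metadata_sql = _get_metadata_sql_with_placeholder()
in_clause = f"replaceAll(toString(trace_id), '-', '') IN ({subquery_sql})"
final_sql = metadata_sql.replace(PLACEHOLDER, in_clause)
_execute_direct_sql(final_sql)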

Performance Impact

  • Eliminates query size limits: the metadata query is no longer bounded by the number of trace IDs
  • Reduces query complexity: a single combined query instead of multiple round trips
  • Backward compatible: falls back to the original approach when disabled
  • Runtime configurable: can be enabled per environment

Testing

  • ✅ All 14 existing trace tests pass
  • ✅ New test validates optimization for cross-event queries
  • ✅ Verified all tests pass with optimization enabled globally

Configuration

# Enable optimization (disabled by default)
set_config("use_subquery_optimization_for_traces", True)
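
At request time the endpoint checks this flag and falls back to the original approach when it is off. A hedged sketch of that gate; the import path and the fallback helper name are assumptions, not the PR's actual code:

# Sketch only: get_config is assumed to be snuba's runtime-config reader and
# _get_metadata_for_traces_with_in_clause is an illustrative name for the
# original code path.
from snuba.state import get_config

def get_metadata_for_traces(request):
    if get_config("use_subquery_optimization_for_traces", 0):
        # Optimized path: one combined query built around a SQL subquery.
        return _get_metadata_for_traces_with_subquery(request)
    # Default path: original IN-clause approach (backward compatible).
    return _get_metadata_for_traces_with_in_clause(request)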

@volokluev volokluev requested review from a team as code owners January 8, 2026 01:10
@volokluev volokluev marked this pull request as draft January 8, 2026 01:10
"""
filter_expressions = []
for trace_filter in trace_filters:
    assert hasattr(trace_filter, "fitler")

Bug: A typo in an assertion, hasattr(trace_filter, "fitler") instead of "filter", will cause cross-item queries to crash when a specific feature flag is enabled.
Severity: CRITICAL | Confidence: High

🔍 Detailed Analysis

In the _build_cross_item_query function, an assertion checks for a misspelled attribute fitler on the trace_filter object. However, the correct attribute name is filter. This typo will cause hasattr(trace_filter, "fitler") to return False, triggering an AssertionError. This will cause any cross-item query to fail when the use_subquery_optimization_for_traces feature flag is enabled, as this code path is executed under that condition.

💡 Suggested Fix

Correct the typo in the assert statement on line 492 from fitler to filter. The line should be assert hasattr(trace_filter, "filter").

🤖 Prompt for AI Agent

Review the code at the location below and verify whether this is a real issue. If it is, propose a fix; if not, explain why it is not valid.

Location: snuba/web/rpc/v1/endpoint_get_traces.py#L492

Comment on lines +1055 to +1061
placeholder_pattern = "in(trace_id, ['00000000-0000-0000-0000-000000000000'])"

# Use IN clause with subquery instead of placeholder
# Both queries use hex format for trace_id comparison
# Replace the placeholder IN clause with the subquery directly
in_subquery_clause = f"replaceAll(toString(trace_id), '-', '') IN ({subquery_sql})"
final_sql = metadata_sql.replace(placeholder_pattern, in_subquery_clause)

Bug: Splicing the subquery into the SQL via fragile string replacement can fail silently, returning empty results. The function also returns metadata from a dry run, not from the actual executed query.
Severity: CRITICAL | Confidence: High

🔍 Detailed Analysis

When subquery optimization is enabled, the code constructs a final SQL query by replacing a hardcoded placeholder string. This replacement is fragile and can fail silently if the SQL formatter's output changes, causing the query to execute with a placeholder UUID and return no results. Additionally, the function returns metadata and statistics from a dry_run query, not the actual executed query. This results in incorrect debug information and metrics, as the reported SQL and stats will not match what was actually run against the database.

💡 Suggested Fix

Instead of string replacement, use a more robust method to combine the subquery and the main query, such as modifying the query's abstract syntax tree (AST) before formatting. To fix the metadata issue, capture the results and metadata from the final _execute_direct_sql call and use that to construct the QueryResult that is returned.
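
As a narrower interim mitigation (distinct from the AST-level fix suggested above), the string replacement can at least be made to fail loudly when the formatter no longer emits the expected placeholder; a self-contained sketch:

# Sketch only: guards the fragile replacement so a formatter change surfaces
# as an error instead of a query against the dummy UUID.
PLACEHOLDER = "in(trace_id, ['00000000-0000-0000-0000-000000000000'])"

def splice_subquery_or_raise(metadata_sql: str, subquery_sql: str) -> str:
    if PLACEHOLDER not in metadata_sql:
        raise ValueError(
            "placeholder IN clause not found in formatted metadata SQL; "
            "the SQL formatter output may have changed"
        )
    in_clause = f"replaceAll(toString(trace_id), '-', '') IN ({subquery_sql})"
    return metadata_sql.replace(PLACEHOLDER, in_clause, 1)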

🤖 Prompt for AI Agent

Review the code at the location below and verify whether this is a real issue. If it is, propose a fix; if not, explain why it is not valid.

Location: snuba/web/rpc/v1/endpoint_get_traces.py#L1055-L1061

Comment on lines +629 to +638
trace_item_filters_expression = trace_item_filters_to_expression(
    TraceItemFilter(
        and_filter=AndFilter(
            filters=[f.filter for f in in_msg.filters],
        ),
    ),
    attribute_key_to_expression,
)
selected_columns: list[SelectedExpression] = [
    SelectedExpression(

Bug: The subquery optimization path does not set the sampling tier, causing it to query the wrong storage table when trace sampling is enabled, leading to incorrect results.
Severity: CRITICAL | Confidence: High

🔍 Detailed Analysis

When both enable_trace_sampling and use_subquery_optimization_for_traces are enabled, the subquery optimization path fails to apply the sampling tier setting. The _get_subquery_sql function generates SQL for the subquery without calling settings.set_sampling_tier(), causing it to default to the non-sampled storage. However, the main query may be routed to a downsampled storage tier. This mismatch of storage tiers between the subquery and the main query can lead to incorrect results, schema mismatches, or query failures.

💡 Suggested Fix

In the _get_subquery_sql function, ensure the sampling_tier from the routing_decision is applied to the HTTPQuerySettings before the dry run query is executed. This will ensure the subquery is generated against the correct storage tier, matching the main query.
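
A sketch of that part of the fix; set_sampling_tier comes from the analysis above, while the shape of routing_decision (a .tier attribute) is an assumption and may not match the PR's code:

from typing import Any

def apply_sampling_tier(settings: Any, routing_decision: Any) -> None:
    # Pin the dry-run subquery to the same storage tier chosen for the main
    # query so both read from the same (possibly downsampled) table.
    sampling_tier = getattr(routing_decision, "tier", None)
    if sampling_tier is not None:
        settings.set_sampling_tier(sampling_tier)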

🤖 Prompt for AI Agent

Review the code at the location below and verify whether this is a real issue. If it is, propose a fix; if not, explain why it is not valid.

Location: snuba/web/rpc/v1/endpoint_get_traces.py#L629-L638
