
@bplatz bplatz commented Oct 13, 2025

Summary

Implements query pattern optimization based on property and class statistics. Adds an explain API that shows optimization decisions without executing the query.

Key Features

Query Optimization

  • Reorders WHERE clause patterns based on selectivity scores
  • Lower selectivity score = more selective pattern = executes first
  • Respects optimization boundaries (filters, binds, etc.)
  • Statistics-driven: uses property/class counts from index
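A minimal sketch of the reordering idea, in illustrative Python rather than the project's own code. The `(label, score)` pattern representation and the `reorder_by_selectivity` name are assumptions made for this example, and the scores are made-up values standing in for counts from the index. It also assumes ties keep their original order, which matches the "equal selectivity" no-optimization case but is not confirmed by the PR.

```python
from typing import List, Tuple

# Hypothetical representation: (pattern label, selectivity score).
# Lower score means the pattern is expected to match fewer flakes.
Pattern = Tuple[str, float]


def reorder_by_selectivity(patterns: List[Pattern]) -> List[Pattern]:
    """Return patterns ordered from most to least selective.

    sorted() is stable, so patterns with equal scores keep the order
    the user wrote them in (an assumption about the desired behavior).
    """
    return sorted(patterns, key=lambda p: p[1])


# Example WHERE clause with made-up scores.
patterns = [
    ("?s rdf:type ex:Person", 10000),      # common class
    ('?s ex:email "a@example.org"', 0),    # specific value lookup
    ("?s ex:nickname ?n", 25),             # rare property
]
optimized = reorder_by_selectivity(patterns)
```

Here the value lookup moves to the front and the common class pattern moves to the end.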

Explain API

  • New fluree.db.api/explain function
  • Shows original and optimized pattern order
  • Includes selectivity scores for each pattern
  • User-friendly output with decoded IRIs

Selectivity Scoring

  • Specific value lookups: 0 (most selective)
  • ID patterns: 0 (single entity)
  • Property scans: property count from stats
  • Class patterns: class count from stats
  • Full scans: ∞ (least selective)
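The scoring rules above can be sketched roughly as follows, again in illustrative Python. The `Stats` and `TriplePattern` types, the `selectivity` function, and the choice to treat a missing statistic as ∞ are all assumptions for this example and do not reflect the actual implementation.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional

RDF_TYPE = "rdf:type"


@dataclass
class Stats:
    # Hypothetical stand-in for the property/class counts kept in the index.
    property_counts: Dict[str, int]
    class_counts: Dict[str, int]


@dataclass
class TriplePattern:
    # None means the position is an unbound variable.
    subject: Optional[str]
    predicate: Optional[str]
    obj: Optional[str]


def selectivity(p: TriplePattern, stats: Stats) -> float:
    """Score a pattern; lower means more selective."""
    if p.subject is not None:
        return 0  # ID pattern: a single entity
    if p.predicate == RDF_TYPE and p.obj is not None:
        # Class pattern: number of instances of the class.
        # Falling back to infinity when no stat exists is an assumption.
        return stats.class_counts.get(p.obj, math.inf)
    if p.predicate is not None and p.obj is not None:
        return 0  # specific value lookup
    if p.predicate is not None:
        # Property scan: number of flakes for the property.
        return stats.property_counts.get(p.predicate, math.inf)
    return math.inf  # full scan


stats = Stats(property_counts={"ex:name": 500}, class_counts={"ex:Person": 10000})
```

The class check comes before the value-lookup check because a class pattern also has a bound object and would otherwise be scored 0.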

Implementation

Protocol-based Design

  • Optimizable protocol for FlakeDB, AsyncDB, Dataset
  • Pattern segmentation preserves optimization boundaries
  • Independent segment optimization
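A rough sketch of segmentation with independent segment optimization, in illustrative Python. The dictionary-based pattern shape, the `"tuple"` type tag, and the `is_boundary` and `optimize` names are hypothetical. The sketch assumes any non-tuple pattern (filter, bind, and so on) is a boundary that stays in place.

```python
from typing import Dict, List

# Hypothetical pattern shape: {"type": ..., "label": ..., "score": ...}.
Pattern = Dict[str, object]


def is_boundary(p: Pattern) -> bool:
    # Assumption: only plain tuple patterns may be reordered.
    return p["type"] != "tuple"


def optimize(patterns: List[Pattern]) -> List[Pattern]:
    """Sort each run of tuple patterns by score; never move a boundary."""
    result: List[Pattern] = []
    segment: List[Pattern] = []
    for p in patterns:
        if is_boundary(p):
            # Close the current segment, then emit the boundary unchanged.
            result.extend(sorted(segment, key=lambda q: q["score"]))
            segment = []
            result.append(p)
        else:
            segment.append(p)
    # Flush the trailing segment.
    result.extend(sorted(segment, key=lambda q: q["score"]))
    return result


where = [
    {"type": "tuple", "label": "A", "score": 100},
    {"type": "tuple", "label": "B", "score": 0},
    {"type": "filter", "label": "F"},
    {"type": "tuple", "label": "C", "score": 50},
    {"type": "tuple", "label": "D", "score": 5},
]
order = [p["label"] for p in optimize(where)]
```

Patterns never cross the filter: A and B are reordered among themselves, as are C and D, so a pattern that the filter depends on cannot end up after it.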

Query Integration

  • Optimization runs automatically before query execution
  • No breaking changes to existing query API
  • Federated queries (DataSets) are not optimized

Test Coverage

315 new assertions across 5 integration tests:

  • No-optimization scenarios (equal selectivity)
  • Value lookup optimization (specific value → class)
  • Property count optimization (rare property → common class)
  • Optimization boundaries (filters separate segments)
  • Multiple segment optimization

All tests pass (290 tests, 2142 assertions).

- Implemented `explain` function to return query execution plans.
- Added `optimize-query` function to reorder query patterns based on selectivity.
- Introduced `Optimizable` protocol for query optimization.
- Created integration tests for explain functionality and optimization behavior.
- Added unit tests for pattern recognition and boundary splitting in optimization.
@bplatz bplatz requested a review from a team October 13, 2025 17:31

@dpetran dpetran left a comment

It would be nice to see tests with more unoptimizable patterns with nested clauses: optional, union, subquery, etc.


dpetran commented Oct 23, 2025

I was looking at how to reconcile this explain api and the one I put together earlier this summer: #1030

I think they're quite complementary - if you're familiar with Postgres, this work corresponds to the EXPLAIN statement, reporting information about the query plan, while the other PR more closely corresponds with EXPLAIN ANALYZE, where it actually runs the query and reports true flake counts and other execution metrics.

This one doesn't yet have support for nested clauses, and I think we could integrate the two approaches without too much trouble. And I'd be happy to pick this up and finish it, depending on your availability.

Base automatically changed from feature/data-stats to main October 26, 2025 12:03

bplatz commented Oct 30, 2025

> I was looking at how to reconcile this explain api and the one I put together earlier this summer: #1030
>
> I think they're quite complementary - if you're familiar with Postgres, this work corresponds to the EXPLAIN statement, reporting information about the query plan, while the other PR more closely corresponds with EXPLAIN ANALYZE, where it actually runs the query and reports true flake counts and other execution metrics.
>
> This one doesn't yet have support for nested clauses, and I think we could integrate the two approaches without too much trouble. And I'd be happy to pick this up and finish it, depending on your availability.

Please do! The main purpose of including this is to see how the query got reordered for an end-user, but I'm sure there is lots more value we can bring. The upstream branch includes detailed statistics on each property to explain the state of the data and why it was reordered, so you should at least use that as the baseline for any future work here.


bplatz commented Oct 30, 2025

Closing because all the work was done on the upstream branch, which is based on this branch.

@bplatz bplatz closed this Oct 30, 2025
3 participants