@wojiaodoubao
Contributor

Adds a Python API to support using FTS as a filter for vector search, or using vector_query as a filter for FTS. Related to #4928.

@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@github-actions bot added the enhancement (New feature or request) and python labels Dec 17, 2025
@wojiaodoubao
Contributor Author

Hi @BubbleCal, could you help review this when you have time? Thanks very much~

Contributor

@BubbleCal left a comment

Great work!
Just need to add some docstrings, then feel free to merge.

def filter(
self,
filter: Union[
str, pa.compute.Expression, FullTextQuery, VectorSearchQuery, dict
Contributor

Let's add some docstrings for this param, especially for the dict case.

Member

@westonpace left a comment

Can we call this a "search filter" instead of a "query filter"? I think of queries as both "filtered scans" and "searches". This filter both:

  • Only applies to searches (can't use this on a filtered scan)
  • Is a search itself (e.g. a VectorSearchQuery or FullTextQuery)
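The first bullet is effectively a validation rule. A minimal sketch of what such a check could look like, assuming a hypothetical helper `check_search_filter` whose name and arguments are illustrative only and not taken from Lance's scanner:

```python
from typing import Optional


def check_search_filter(
    vector_query: Optional[object],
    fts_query: Optional[object],
    search_filter: Optional[object],
) -> None:
    """Hypothetical validation helper (not Lance's actual code).

    A search filter is only accepted when the scan itself is a search
    (vector or full text), never on a plain filtered scan.
    """
    is_search = vector_query is not None or fts_query is not None
    if search_filter is not None and not is_search:
        raise ValueError(
            "search_filter only applies to vector or full text searches, "
            "not to plain filtered scans"
        )
```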

@wojiaodoubao force-pushed the query_filter-python branch 2 times, most recently from 2740551 to 2d0dc31, December 18, 2025 14:26
@wojiaodoubao changed the title from feat(python): expose query_filter in scanner to feat(python): expose search_filter in scanner Dec 19, 2025
@wojiaodoubao
Contributor Author

Hi @westonpace, the PR has been updated. Please review when you have time, thanks very much!

Comment on lines +741 to +747
filter=VectorSearchQuery(
"vector",
np.array([12, 17, 300, 10], dtype=np.float32),
5,
20,
True,
)
Member

I still don't actually know what happens in this case. Is the full text search applied after the vector search? Basically just re-ranking the vector search results?

Or is there some equivalent to "at least one token matches" that is being used here?

Contributor Author

> Is the full text search applied after the vector search? Basically just re-ranking the vector search results?

Yes, in this case the vector search is used as a filter for the full text search. But that doesn't mean the full text search is always applied after the vector search: the Scanner::prefilter setting determines whether the vector search filter is applied before or after the full text search.

If prefilter=false, we first run the full text search and keep the rows with score > 0, then apply a refine filter using flat kNN to get the top 5 nearest rows among them.
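A toy model of the prefilter=false path, to make the order of operations concrete. Everything here is an assumption for illustration: `ROWS` is made-up data, `fts_score` is a simple token-overlap stand-in for Lance's real full text scoring, and `fts_then_flat_knn` is a hypothetical name, not Lance code.

```python
import math

# Hypothetical toy rows: row id -> (text, vector). Illustration only.
ROWS = {
    0: ("lance columnar format", [1.0, 0.0]),
    1: ("vector search", [0.9, 0.1]),
    2: ("lance full text search", [0.8, 0.3]),
    3: ("lance index", [0.0, 1.0]),
}


def fts_score(text, query):
    # Toy stand-in for a real FTS score (e.g. BM25): the number of distinct
    # query tokens that appear in the text. score > 0 means "matched".
    tokens = set(text.split())
    return sum(1 for t in set(query.split()) if t in tokens)


def l2(a, b):
    # Euclidean distance between two vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fts_then_flat_knn(rows, text_query, query_vec, k):
    """prefilter=False path: full text search first, then a flat kNN
    refine step over the matched rows only."""
    # 1. Full text search: keep only rows with score > 0.
    scores = {rid: fts_score(text, text_query) for rid, (text, _) in rows.items()}
    matched = {rid: s for rid, s in scores.items() if s > 0}
    # 2. Flat (brute-force) kNN over the matched rows: keep the k nearest.
    nearest = sorted(matched, key=lambda rid: l2(rows[rid][1], query_vec))[:k]
    # Returned nearest-first with their FTS scores (ordering is an assumption).
    return [(rid, matched[rid]) for rid in nearest]


print(fts_then_flat_knn(ROWS, "lance text", [1.0, 0.0], k=3))
# [(0, 1), (2, 2), (3, 1)]
```

Row 1 is the second-nearest vector but never appears, because it does not match the text query and is removed before the kNN step. The sketch returns rows nearest-first; the thread does not say how the final output is ordered in this mode.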

If prefilter=true, we first run the vector search to get 5 rows, then run a flat full text search over just those 5 rows. The full text search drops the unmatched rows and re-ranks the remaining results by score.
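A toy model of the prefilter=true path, under the same kind of assumptions: `ROWS` is made-up data, `fts_score` is a token-overlap stand-in for Lance's real full text scoring, the vector step is brute force for simplicity, and `flat_knn_then_fts` is a hypothetical name, not Lance code.

```python
import math

# Hypothetical toy rows: row id -> (text, vector). Illustration only.
ROWS = {
    0: ("lance columnar format", [1.0, 0.0]),
    1: ("vector search", [0.9, 0.1]),
    2: ("lance full text search", [0.8, 0.3]),
    3: ("lance index", [0.0, 1.0]),
}


def fts_score(text, query):
    # Toy stand-in for a real FTS score: count of distinct query tokens
    # present in the text. score > 0 means "matched".
    tokens = set(text.split())
    return sum(1 for t in set(query.split()) if t in tokens)


def l2(a, b):
    # Euclidean distance between two vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flat_knn_then_fts(rows, text_query, query_vec, k):
    """prefilter=True path: vector search first, then a flat full text
    search over just those k rows."""
    # 1. Vector search: the k nearest rows (brute force in this sketch).
    nearest = sorted(rows, key=lambda rid: l2(rows[rid][1], query_vec))[:k]
    # 2. Flat full text search over those rows only: drop unmatched rows...
    scored = [(rid, fts_score(rows[rid][0], text_query)) for rid in nearest]
    matched = [(rid, s) for rid, s in scored if s > 0]
    # ...and re-rank the survivors by FTS score, highest first.
    return sorted(matched, key=lambda pair: pair[1], reverse=True)


print(flat_knn_then_fts(ROWS, "lance text", [1.0, 0.0], k=3))
# [(2, 2), (0, 1)]
```

Row 1 is among the 3 nearest vectors but is dropped by the full text step, and row 3 matches the text but is never considered because it is not among the 3 nearest. So fewer than k rows can come back, and row 2 moves ahead of row 0 on FTS score.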
