feat(pull): filter by scanning all rows #351
Open
This patch introduces a `--scann` flag that modifies the behavior of a pull. This is particularly useful when you need to extract a significant portion of your database. Instead of filtering data during the extraction process, this mode allows you to first pull all the data and then apply the filter as a post-processing step.

This method can be faster than querying the database for each individual row with filters applied, especially when dealing with large datasets. It minimizes the need for multiple database queries and speeds up the extraction process by retrieving all the data at once, then excluding unwanted rows afterward.
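The scan-then-filter idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from this patch: `Row`, `KeySet`, `keyOf`, and `scanFilter` are hypothetical names, and the real `KeyStore` type may look quite different.

```go
package main

import "fmt"

// Row is a hypothetical stand-in for one pulled record.
type Row map[string]interface{}

// KeySet is a hypothetical in-memory set of accepted keys
// (an assumption, not the KeyStore type from the patch).
type KeySet map[string]struct{}

// keyOf builds a lookup key from the selected columns of a row.
// Values are compared by their printed form, which is a simplification.
func keyOf(r Row, cols []string) string {
	key := ""
	for _, c := range cols {
		key += fmt.Sprintf("%v\x00", r[c])
	}
	return key
}

// scanFilter walks every pulled row once and keeps those whose key
// is in included, preserving the original row order.
func scanFilter(rows []Row, cols []string, included KeySet) []Row {
	kept := make([]Row, 0, len(rows))
	for _, r := range rows {
		if _, ok := included[keyOf(r, cols)]; ok {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	// Pretend these rows came from a single unfiltered query.
	rows := []Row{
		{"id": 1, "name": "alice"},
		{"id": 2, "name": "bob"},
		{"id": 3, "name": "carol"},
	}
	cols := []string{"id"}
	included := KeySet{
		keyOf(Row{"id": 3}, cols): {},
		keyOf(Row{"id": 1}, cols): {},
	}
	for _, r := range scanFilter(rows, cols, included) {
		fmt.Println(r["id"], r["name"])
	}
}
```

The trade-off in this sketch is one full pass over the data plus a constant-time set lookup per row, in place of one filtered query per filter entry. Whether that is faster depends on how large a share of the table the filters select.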
Changes Overview:

CLI Update:
- `scann` flag (`--scann`) is added to the pull command for filtering in memory.
- `scann` flag to filter in memory.

Handler Changes:
- `scann` option, which influences how the filters are applied.

Driver Interface Update:
- `Pull` method's signature is updated to accept an additional parameter `included KeyStore` to support filtering in memory.

Test Additions:
- `--scann` functionality, including tests for filtering with files, applying filters, handling no matches, and ensuring order consistency.

Summary of Key Modifications:
- `--scann` flag and uses it to load data into memory for filtering instead of using a database filter.
- `puller` and `pullerParallel` now handle the new `included KeyStore` to filter data in memory when `scann` is enabled.
- `--scann` flag, including cases where there are no matches or multiple rows with specific filters.

Example of the `--scann` Flag Use:

`lino pull source --scann --filter-from-file customer_filter.jsonl`

The flag ensures that the entire dataset is pulled and then filtered in memory based on the provided `customer_filter.jsonl` file.