feat(Python-client):add scan filter supported #2305
base: master
Pull Request Overview
This PR adds support for hashkey and sortkey scan filters to the Python Pegasus client. The changes enable filtering of scan results based on pattern matching (prefix, postfix, anywhere) for both hash keys and sort keys during scanning operations.
- Added filter configuration fields to ScanOptions class for hashkey and sortkey filtering
- Fixed issues in the generate_next_bytes function to handle different input types and avoid infinite loops
- Enhanced scanner functionality to properly set filter parameters in scan requests
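The three pattern-matching modes named in the overview (prefix, postfix, anywhere) can be illustrated with a small sketch. This is not the pypegasus API: the `FilterType` enum and the `matches` helper are hypothetical names used only to show the intended semantics, and it assumes an empty pattern matches every key.

```python
from enum import Enum, auto


class FilterType(Enum):
    """Hypothetical filter modes; names are illustrative, not from pypegasus."""
    NO_FILTER = auto()
    ANYWHERE = auto()
    PREFIX = auto()
    POSTFIX = auto()


def matches(key: bytes, filter_type: FilterType, pattern: bytes) -> bool:
    """Return True if `key` passes the filter (assumed semantics)."""
    # Assumption: no filter or an empty pattern lets every key through.
    if filter_type is FilterType.NO_FILTER or not pattern:
        return True
    if filter_type is FilterType.PREFIX:
        return key.startswith(pattern)
    if filter_type is FilterType.POSTFIX:
        return key.endswith(pattern)
    # ANYWHERE: the pattern may appear at any offset inside the key.
    return pattern in key


keys = [b"user_001", b"user_002", b"order_001"]
prefix_hits = [k for k in keys if matches(k, FilterType.PREFIX, b"user")]
postfix_hits = [k for k in keys if matches(k, FilterType.POSTFIX, b"_001")]
```

The same helper would apply to a hash key or a sort key, since both are treated here as plain byte strings.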
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| python-client/pypegasus/utils/tools.py | Added filter configuration fields to ScanOptions and utility function for byte value handling |
| python-client/pypegasus/pgclient.py | Enhanced scanning logic with filter support, fixed generate_next_bytes function, and improved key generation |
| python-client/pypegasus/base/ttypes.py | Added raw() method to blob class for accessing underlying data |
f1f3481 to
a228dbc
Compare
| if isinstance(data,str): | ||
| self._is_str = True | ||
| data = data.encode('UTF-8') | ||
| else: | ||
| self._is_str = False | ||
| self.data = data |
| if isinstance(data,str): | |
| self._is_str = True | |
| data = data.encode('UTF-8') | |
| else: | |
| self._is_str = False | |
| self.data = data | |
| if isinstance(data,str): | |
| self._is_str = True | |
| self.data = data.encode('UTF-8') | |
| else: | |
| self._is_str = False | |
| self.data = data |
| if sort_key_len >= 0xFFFF: | ||
| raise ValueError("sort_key length must be less than 65535") |
There is currently no restriction on the length of the sort key. The limit on the hash key exists because the hash key length is stored in only two bytes.
| if sort_key_len >= 0xFFFF: | |
| raise ValueError("sort_key length must be less than 65535") |
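The reviewer's point can be sketched as follows. The layout below (a two-byte big-endian hash key length, then the hash key, then the sort key) is an assumption made for illustration, and `encode_key_sketch` is a hypothetical name, not the function in pgclient.py. It shows why only the hash key needs a length check: its length has to fit in the two-byte prefix, while the sort key simply occupies the rest of the buffer.

```python
import struct

# Largest value an unsigned two-byte length field can hold.
UINT16_MAX = 0xFFFF


def encode_key_sketch(hash_key: bytes, sort_key: bytes) -> bytes:
    """Pack a composite key as [2-byte hash key length][hash key][sort key].

    Assumed layout for illustration only.
    """
    if len(hash_key) >= UINT16_MAX:
        raise ValueError("hash_key length must be less than 65535")
    # The sort key has no length field of its own, so no check is applied to it.
    return struct.pack(">H", len(hash_key)) + hash_key + sort_key


encoded = encode_key_sketch(b"ab", b"c")
long_sort_key_ok = encode_key_sketch(b"h", b"s" * 70000)
```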
| return arr | ||
| else: | ||
| return buff + chr(0) | ||
| return bytes(arr) |
Why is it converted to bytes here?
What problem does this PR solve?
Add support for hashkey and sortkey scan filters
What is changed and how does it work?
The generate_next_bytes function has two problems:
a. The input buff (i.e. hashkey) can be either ‘str’ or ‘bytearray’. If it's a str, in-place modification like buff[pos] += 1 won't work since strings are immutable.
b. The pos variable was initialized to a fixed index (len(buff) - 1) and never moved, which is counterintuitive and could lead to an infinite loop.
incubator-pegasus/python-client/pypegasus/pgclient.py
Lines 613 to 624 in 44400f6
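A minimal sketch of how both problems could be avoided is shown below. It is not the code merged in this PR: `next_bytes_sketch` is a hypothetical name, and the fallback of appending a zero byte when every byte is 0xFF is an assumption taken from the `buff + chr(0)` line visible in the diff. The input is copied into a fresh bytearray so that str input works and the caller's buffer is not modified, and `pos` is decremented on every iteration so the loop always terminates.

```python
def next_bytes_sketch(buff):
    """Increment the last byte of `buff` that is not 0xFF and return bytes.

    Accepts str, bytes, or bytearray. Illustrative only.
    """
    # Problem (a): str is immutable, so work on a mutable copy of the bytes.
    data = buff.encode("utf-8") if isinstance(buff, str) else bytes(buff)
    arr = bytearray(data)

    # Problem (b): walk backwards and move `pos` each time, so the loop ends.
    pos = len(arr) - 1
    while pos >= 0:
        if arr[pos] != 0xFF:
            arr[pos] += 1
            return bytes(arr)
        pos -= 1

    # Assumed fallback: every byte was 0xFF (or the input was empty).
    return bytes(arr) + b"\x00"


from_str = next_bytes_sketch("ab")
original = bytearray(b"a\xff")
from_bytearray = next_bytes_sketch(original)
all_ff = next_bytes_sketch(b"\xff\xff")
```

Returning bytes for every input type gives callers a single type to handle, which is one possible answer to the reviewer's question about the bytes conversion.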
Checklist
Tests
The test script is as below