-
Notifications
You must be signed in to change notification settings - Fork 411
Core: Pass REST auth manager to S3 signer #2846
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for the fix. I commented on the original issue (#2544) to see if anyone can verify the solution.
I'll also try to reproduce the issue and try out this fix locally
| if token := self.properties.get(TOKEN): | ||
| signer_headers = {"Authorization": f"Bearer {token}"} | ||
| auth_header = f"Bearer {token}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe we can get rid of accessing TOKEN through properties here and just standardize on using the auth manager.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be out of scope for this bug-fix PR, but it looks like we’re tightly coupling AuthManager and authentication tokens which are RestCatalog concepts into FileIO, which should be Catalog type agnostic.
It might be worth revisiting this design in more detail in the future to ensure we don’t introduce fallback logic that’s driven by configuration properties rather than clearer separation of concerns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea great point. i think it would be better to split out the "rest signer" from fileio. there's a good example already in the REST catalog,
iceberg-python/pyiceberg/catalog/rest/__init__.py
Lines 409 to 459 in a99dcad
| def _init_sigv4(self, session: Session) -> None: | |
| from urllib import parse | |
| import boto3 | |
| from botocore.auth import SigV4Auth | |
| from botocore.awsrequest import AWSRequest | |
| from requests import PreparedRequest | |
| from requests.adapters import HTTPAdapter | |
| class SigV4Adapter(HTTPAdapter): | |
| def __init__(self, **properties: str): | |
| super().__init__() | |
| self._properties = properties | |
| self._boto_session = boto3.Session( | |
| region_name=get_first_property_value(self._properties, AWS_REGION), | |
| botocore_session=self._properties.get(BOTOCORE_SESSION), | |
| aws_access_key_id=get_first_property_value(self._properties, AWS_ACCESS_KEY_ID), | |
| aws_secret_access_key=get_first_property_value(self._properties, AWS_SECRET_ACCESS_KEY), | |
| aws_session_token=get_first_property_value(self._properties, AWS_SESSION_TOKEN), | |
| ) | |
| def add_headers(self, request: PreparedRequest, **kwargs: Any) -> None: # pylint: disable=W0613 | |
| credentials = self._boto_session.get_credentials().get_frozen_credentials() | |
| region = self._properties.get(SIGV4_REGION, self._boto_session.region_name) | |
| service = self._properties.get(SIGV4_SERVICE, "execute-api") | |
| url = str(request.url).split("?")[0] | |
| query = str(parse.urlsplit(request.url).query) | |
| params = dict(parse.parse_qsl(query)) | |
| # remove the connection header as it will be updated after signing | |
| del request.headers["connection"] | |
| aws_request = AWSRequest( | |
| method=request.method, url=url, params=params, data=request.body, headers=dict(request.headers) | |
| ) | |
| SigV4Auth(credentials, service, region).add_auth(aws_request) | |
| original_header = request.headers | |
| signed_headers = aws_request.headers | |
| relocated_headers = {} | |
| # relocate headers if there is a conflict with signed headers | |
| for header, value in original_header.items(): | |
| if header in signed_headers and signed_headers[header] != value: | |
| relocated_headers[f"Original-{header}"] = value | |
| request.headers.update(relocated_headers) | |
| request.headers.update(signed_headers) | |
| session.mount(self.uri, SigV4Adapter(**self.properties)) |
it might also be easier to just pass in the request Session from the REST catalog to the Signer. So we dont need to recreate the auth header directly
but again, we can refactor this after the bug fix :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed - just leaving a comment so we don't forget 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in case we forget, #2862
| header = getattr(auth_manager, "auth_header", None) | ||
| if callable(header): | ||
| auth_header = header() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| header = getattr(auth_manager, "auth_header", None) | |
| if callable(header): | |
| auth_header = header() | |
| auth_header = auth_manager.auth_header() |
could we just call the function directly? this will fail if the auth_header function does not exist.
i think the current solution will fail silently, i.e. not add any auth header if the auth_header function does not exist
| _ENV_CONFIG = Config() | ||
|
|
||
| TOKEN = "token" | ||
| AUTH_MANAGER = "auth.manager" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, could we move this to pyiceberg/catalog/rest/auth.py? with the other auth manager code.
|
i was able to verify this locally by using the amazing repro script from @martyngigg |
|
|
||
| auth_header: str | None = None | ||
| if token := self.properties.get(TOKEN): | ||
| signer_headers = {"Authorization": f"Bearer {token}"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was partially mentioned but, should the auth manager take precedence over token when both are set?
What does this change do?
Why is this needed?
How was this tested?
Closes #2544