Missing AWS Profile Support in PyIceberg #2841

@thomas-pfeiffer

Feature Request / Improvement

Feature: Missing AWS Profile Support in PyIceberg / PyIceberg should support AWS profiles

Description:
AWS profiles are a convenient way to work with multiple AWS configs / credentials in parallel. Ideally, PyIceberg should therefore support AWS profiles as well; currently it does so only for the Glue client.

Current state (as of writing, PyIceberg v0.10.0):

  • The Glue part of the GlueCatalog can be configured to use a profile, either by passing an explicit Glue client to the GlueCatalog or via the glue.profile-name config parameter:
from boto3 import Session
...
catalog = GlueCatalog(
    name="your_glue_catalog",
    client=Session(profile_name="your_aws_profile").client("glue"),
    ...
)

or

catalog = GlueCatalog(
    name="your_glue_catalog",
    **{ 
        "glue.profile-name": "your_aws_profile",
        ...
    },
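)

  • The S3 part (FileIO) has no corresponding profile setting, although s3fs, which backs the fsspec FileIO, accepts a profile through its session:
from s3fs import S3FileSystem
from aiobotocore.session import AioSession
...
fs = S3FileSystem(session=AioSession(profile="your_aws_profile"), ...)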

Workaround for this feature gap:

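from boto3 import Session
from pyiceberg.catalog.glue import GlueCatalog

# Resolve the credentials of the profile via boto3 ...
session = Session(profile_name="your_aws_profile")
credentials = session.get_credentials()
if credentials is None:
    raise ValueError("Could not retrieve credentials for profile")
# ... and hand them to PyIceberg as unified client.* properties,
# which are applied to both the Glue client and the S3 FileIO.
catalog = GlueCatalog(
    name="your_glue_catalog",
    **{
        "client.access-key-id": credentials.access_key,
        "client.secret-access-key": credentials.secret_key,
        "client.session-token": credentials.token,
        ...
    },
)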
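
The workaround above can be wrapped in a small helper. This is only a sketch: `credentials_to_properties` and `profile_properties` are hypothetical names and not part of PyIceberg; only the `client.*` property keys are taken from the workaround. It uses boto3's `get_frozen_credentials()` so that access key, secret key and token come from one consistent snapshot, and it leaves out the session token for profiles that use long-lived keys. The resulting credentials are static, so temporary credentials will not be refreshed once they expire.

```python
from typing import Dict, Optional


def credentials_to_properties(
    access_key: str, secret_key: str, token: Optional[str] = None
) -> Dict[str, str]:
    """Map AWS credentials to the unified client.* properties used in the workaround."""
    properties = {
        "client.access-key-id": access_key,
        "client.secret-access-key": secret_key,
    }
    # Profiles with long-lived keys have no session token; skip the key then.
    if token:
        properties["client.session-token"] = token
    return properties


def profile_properties(profile_name: str) -> Dict[str, str]:
    """Resolve an AWS profile with boto3 and return catalog properties (hypothetical helper)."""
    from boto3 import Session  # imported here so the mapping above has no boto3 dependency

    credentials = Session(profile_name=profile_name).get_credentials()
    if credentials is None:
        raise ValueError(f"Could not retrieve credentials for profile {profile_name!r}")
    # Take one consistent snapshot of key, secret and token.
    frozen = credentials.get_frozen_credentials()
    return credentials_to_properties(frozen.access_key, frozen.secret_key, frozen.token)
```

With this, the catalog could be created as GlueCatalog(name="your_glue_catalog", **profile_properties("your_aws_profile")).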

To-Be / Expected Behavior:

  1. PyIceberg should have new client.profile-name and s3.profile-name configuration parameters (next to the existing glue.profile-name).
  2. New client.profile-name should also set glue.profile-name (same behaviour as for all the other unified AWS credentials).
  3. For now, AWS profile support should be implemented for fsspec backend and client.profile-name and s3.profile-name should only be supported when using fsspec backend ("py-io-impl": "pyiceberg.io.fsspec.FsspecFileIO").
  4. Once PyArrow supports AWS profile names (see [Python][C++] Add Profile support to S3FileSystem arrow#47880), AWS profile support should be implemented for PyArrow backend as well and client.profile-name and s3.profile-name should be fully supported.
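
The lookup proposed in points 1 to 3 could look roughly like the following sketch. None of this exists in PyIceberg today: client.profile-name and s3.profile-name are the parameters proposed in this issue, and `resolve_profile_name` / `resolve_s3_profile_name` are hypothetical helper names. Two assumptions are made: a service-specific value takes precedence over the unified client.profile-name, and an S3 profile is rejected unless the fsspec backend is selected explicitly via py-io-impl.

```python
from typing import Mapping, Optional

FSSPEC_FILE_IO = "pyiceberg.io.fsspec.FsspecFileIO"


def resolve_profile_name(properties: Mapping[str, str], service: str) -> Optional[str]:
    """Return the profile for a service ("glue" or "s3").

    Assumption: "<service>.profile-name" wins, "client.profile-name" is the fallback.
    """
    return properties.get(f"{service}.profile-name") or properties.get("client.profile-name")


def resolve_s3_profile_name(properties: Mapping[str, str]) -> Optional[str]:
    """Resolve the S3 profile, allowing it only for the fsspec backend (point 3)."""
    profile = resolve_profile_name(properties, "s3")
    if profile is not None and properties.get("py-io-impl") != FSSPEC_FILE_IO:
        raise ValueError(
            "AWS profiles for S3 are only supported with "
            f'"py-io-impl": "{FSSPEC_FILE_IO}"'
        )
    return profile


# Example: one unified profile, fsspec backend selected explicitly.
properties = {"client.profile-name": "your_aws_profile", "py-io-impl": FSSPEC_FILE_IO}
glue_profile = resolve_profile_name(properties, "glue")
s3_profile = resolve_s3_profile_name(properties)
```

Once PyArrow supports profiles (point 4), the backend check in `resolve_s3_profile_name` would simply be dropped.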

Remark: I found this feature gap with the GlueCatalog; the RestCatalog might be equally affected, but I am not sure.
Issues possibly related to this issue: #570, #1207, #2657