-
Notifications
You must be signed in to change notification settings - Fork 426
AWS profile support to glue and fsspec s3 fileio #2948
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
79e50a1 to
be62e94
Compare
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
Left a comment about passing profile name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request adds AWS profile support for the Glue catalog client and fsspec-based S3 FileIO, addressing issue #2841. Users can now explicitly configure AWS profiles through client.profile-name (unified) and s3.profile-name (S3-specific) properties, with s3.profile-name taking precedence over client.profile-name for S3 operations. Similarly, glue.profile-name takes precedence over client.profile-name for Glue catalog operations.
Changes:
- Added
AWS_PROFILE_NAMEandS3_PROFILE_NAMEconfiguration constants - Extended GlueCatalog to support fallback from
glue.profile-nametoclient.profile-name - Implemented profile support in S3FileSystem by creating AioSession with the configured profile
- Added comprehensive unit tests validating profile propagation
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| pyiceberg/io/init.py | Defines new constants AWS_PROFILE_NAME and S3_PROFILE_NAME for configuration properties |
| pyiceberg/catalog/glue.py | Updates GlueCatalog to fall back to client.profile-name when glue.profile-name is not set |
| pyiceberg/io/fsspec.py | Implements profile support by creating AioSession with the configured profile for S3FileSystem |
| tests/catalog/test_glue_profile.py | Adds test verifying GlueCatalog uses client.profile-name when provided |
| tests/io/test_fsspec_profile.py | Adds tests verifying S3FileIO uses both s3.profile-name and client.profile-name |
| tests/io/test_fsspec.py | Updates existing tests to include session=None parameter in S3FileSystem assertions |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
3b4f02e to
97c562a
Compare
97c562a to
5547322
Compare
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for also updating the docs
have you tested to see if s3/glue client actually used the given profile name? we have it mocked and the docs say it should work but would be good to verify locally
| } | ||
|
|
||
| if profile_name := get_first_property_value(properties, S3_PROFILE_NAME, AWS_PROFILE_NAME): | ||
| s3_fs_kwargs["profile"] = profile_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Heres the docs for "profile", as reference
https://s3fs.readthedocs.io/en/latest/#credentials
You can specify a profile using s3fs.S3FileSystem(profile='PROFILE'). Otherwise sf3s will use authentication via boto environment variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating your feedback!
I performed local verification using a script without mocks against real my AWS profiles. I made two AWS IAM Users with different permission profile(s3-only-allowed vs glue-only-allowed) and tested.
First, I verified this locally using real AWS credentials.
- S3 (fsspec): using
s3.profile-name=s3-only-allowed successfully wrote to a test S3 bucket. Switching to a different profile without S3 permissions resulted inAccessDenied, confirming the profile is actually used. - Glue: Glue catalog calls (ex, list_namespaces) succeeded only when using the glue-only-allowed profile. Using a profile without Glue permissions resulted in an authorization failure.
So, using distinct profiles (ex, s3-allowed vs glue-allowed) correctly isolates permissions — the client succeeded only for the service allowed by the respective profile.
Second, I confirmed that providing a non-existent profile correctly raises botocore.exceptions.ProfileNotFound from both Glue and S3 clients.
Third, I checked that specific properties like s3.profile-name and glue.profile-name correctly override client.profile-name in a real environment..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, changes looks good!
The cred resolving chain looks as expected with s3.profile-name --> client.profile-name using get_first_property_value, and you're only passing the profile kwarg when it's actually set, which avoids the profile=None issue. The Glue pattern matches what's already there for region_name and aws_access_key_id, so that's consistent.
Tests cover both the fallback and resolving chain scenarios, which is what matters. And thanks for testing with distinct IAM profiles in your actual setup. That's alot more useful than unit tests alone for catching edge cases. One thing to note is a user can set profile and credentials but credentials should take precedence here.
Closes #2841
Rationale for this change
This PR adds explicit AWS profile support for both the Glue catalog client and
fsspec-based S3 FileIO.
While
GlueCatalogalready supports profile configuration, fsspec-based S3operations did not propagate profile selection to the underlying
S3FileSystemor async AWS session. As a result, users had to rely on environmentvariables or the default AWS profile, which makes it difficult to work with
multiple AWS configurations in parallel.
This change introduces two configuration properties:
client.profile-name: a unified AWS profile for the catalog client and FileIOs3.profile-name: an AWS profile specifically for S3 FileIOProfile resolution follows this precedence:
s3.profile-nameclient.profile-nameThis ensures consistent and explicit credential selection across catalog and
FileIO layers when using the fsspec backend.
Are these changes tested?
Yes. New unit tests were added to validate the profile propagation behavior.
Glue Catalog
boto3.Session(profile_name=...)is created when initializingGlueCatalogwithclient.profile-name.S3 FileIO (fsspec)
client.profile-nameors3.profile-nameresults in thecreation of an async AWS session with the correct profile, which is then
passed to
S3FileSystem.The tests were run locally with:
Output would be:
Are there any user-facing changes?
Yes, this adds new configuration properties that users can set:
client.profile-name: Sets the AWS profile for both the catalog client and FileIO (unified configuration).s3.profile-name: Sets the AWS profile specifically for S3 FileIO.Example Usage: