-
Notifications
You must be signed in to change notification settings - Fork 3.2k
OpenAPI: Standardize credentials in loadTable/loadView responses #10722
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -3103,6 +3103,21 @@ components: | |
| uuid: | ||
| type: string | ||
|
|
||
| StorageCredential: | ||
| type: object | ||
| required: | ||
|
nastra marked this conversation as resolved.
Outdated
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could we add the
Adding the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1 for including
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Despite the other comment below - the region should be the same as the one for the S3FileIO configuration?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @flyrain the region is already sent back in the
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think the region is bound to the credential. I believe it's possible to build a policy that allows access to buckets in multiple regions, so region should be associated with the bucket, not the credentials. Therefore, we shouldn't include it here.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jackye1995 it's not published yet because I wanted to get this proposal in first and otherwise it would deviate the discussion. Refreshing vended credentials would entail adding a new This new endpoint would be called by the mechanism that is offered by the respective storage provider. For example, GCS offers a
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Cool, in that case I think this is a great step, it does make a lot of sense to have dedicated credentials refresh API, we were also thinking about proposing it later. We have implemented credentials vending in LakeFormation for a long time now, and last year S3 also offered access grant with similar feature. The user pattern has been proven that having a dedicated API for that makes a lot of sense since refreshing credentials is a much more frequent operation than loading table and can be done very quickly with a dedicated API, and from security perspective it is also a better posture. In that case I think what is proposed here makes sense, and I am more in favor of the current design than what Yufei suggests, because it makes it clear that these fields are related to that new API, and existing config map continues to work for configuring the FileIO, with the additional note that the new credentials field would override the credentials config in the config map.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @nastra, thanks for the context. The suggest I provide is a minor structure change, which isn't needed in the new endpoint. I am OK with the current one if that's more align with the new endpoint.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @flyrain yes the structure that is being proposed here is fully aligned with how credentials will be returned by the new credentials endpoint. I understand your reasoning for adding one additional structure on top, but I think we should only do that if we have a concrete use case that this would address (which I don't think we currently have).
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. based on recent discussions the feedback was that we don't want to have anything storage-specific in the OpenAPI spec (other than documenting the different storage configurations, which is handled by #10576). Therefore I've updated the PR and made it flexible enough so that we could still pass different credentials based on a given |
||
| - prefix | ||
| - config | ||
| properties: | ||
| prefix: | ||
| type: string | ||
| description: Indicates a storage location prefix where the credential is relevant. Clients should choose the most | ||
| specific prefix (by selecting the longest prefix) if several credentials of the same type are available. | ||
| config: | ||
| type: object | ||
| additionalProperties: | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I understand there was a general agreement to treat vendor-specific credentials as simple property bags. However, would it make sense to define some common attributes (like expiry time) in this spec? I suppose those would be relevant to the follow-up effort of refreshing credentials on the fly. WDYT?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I understand that there could be some common attributes that we could extract, but that would lead to a mixture of having to look up things in a config map vs other properties. This isn't set in stone and we can always revisit this once we implement refreshing vended credentials for the different providers and see how helpful having a separate field for expiration would be. I'd like to unblock this proposal so that we can make progress on refreshing vended credentials |
||
| type: string | ||
|
|
||
| LoadTableResult: | ||
| description: | | ||
| Result used when a table is successfully loaded. | ||
|
|
@@ -3129,6 +3144,11 @@ components: | |
| - `s3.secret-access-key`: secret for credentials that provide access to data in S3 | ||
| - `s3.session-token`: if present, this value should be used for as the session token | ||
| - `s3.remote-signing-enabled`: if `true` remote signing should be performed as described in the `s3-signer-open-api.yaml` specification | ||
|
|
||
| ## Storage Credentials | ||
|
|
||
| Credentials for ADLS / GCS / S3 / ... are provided through the `storage-credentials` field. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Servers have to provide concrete properties for vendor-specific credentials. Since this spec does not define them, how will servers know what properties to provide? If servesr adapt to the java REST client impl. then the java client will become the "spec"... I'd prefer for Iceberg to provide additional guidelines (which do not have to be part of the REST Catalog spec) to clarify credential properties for known use cases. It might be necessary for this spec to define a sub-type property since some vendors support multiple auth methods for the same URI scheme.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. thx!
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @dimas-b does the documentation of the storage related properties address your comment about "additional guidelines"?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. #10576 appears to define credential properties in the REST spec, which is fine from my POV :) I assume after rebasing it will be talking about credential properties under the new config section introduced in this PR.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes exactly, we're trying to move the docs to this new credential section |
||
| Clients must first check whether the respective credentials exist in the `storage-credentials` field before checking the `config` for credentials. | ||
| type: object | ||
| required: | ||
| - metadata | ||
|
|
@@ -3142,6 +3162,10 @@ components: | |
| type: object | ||
| additionalProperties: | ||
| type: string | ||
| storage-credentials: | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just curious why should return multiple credentials for one table? Any use cases? The current TableOps just support one fileIO and also noticed the AWS and GCS credential refresh handler just pick the first credential from the server.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The use case for this is outlined in #10722 (comment). |
||
| type: array | ||
| items: | ||
| $ref: '#/components/schemas/StorageCredential' | ||
|
|
||
| ScanTasks: | ||
| type: object | ||
|
|
@@ -3395,6 +3419,10 @@ components: | |
|
|
||
| - `token`: Authorization bearer token to use for view requests if OAuth2 security is enabled | ||
|
|
||
| ## Storage Credentials | ||
|
|
||
| Credentials for ADLS / GCS / S3 / ... are provided through the `storage-credentials` field. | ||
| Clients must first check whether the respective credentials exist in the `storage-credentials` field before checking the `config` for credentials. | ||
| type: object | ||
| required: | ||
| - metadata-location | ||
|
|
@@ -3408,6 +3436,10 @@ components: | |
| type: object | ||
| additionalProperties: | ||
| type: string | ||
| storage-credentials: | ||
| type: array | ||
| items: | ||
| $ref: '#/components/schemas/StorageCredential' | ||
|
|
||
| TokenType: | ||
| type: string | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.