Add BigQuery Metastore federation support #4050
Conversation
dimas-b
left a comment
Thanks for your contribution, @joyhaldar ! The PR LGTM 👍 Still, I believe we need to open a dev discussion for it before merging as it includes REST API changes.
```yaml
ICEBERG_REST: "#/components/schemas/IcebergRestConnectionConfigInfo"
HADOOP: "#/components/schemas/HadoopConnectionConfigInfo"
HIVE: "#/components/schemas/HiveConnectionConfigInfo"
BIGQUERY: "#/components/schemas/BigQueryMetastoreConnectionConfigInfo"
```

Please open a dev ML discussion for this (it's customary for all REST API changes).
Thank you for your review Dmitri.
I have an existing dev ML thread for this. Replied with the PR link: https://lists.apache.org/thread/n3hh5s1zxn6yn7cbowfgf8p029z6m7g1
Would you like me to start a new discussion?
I opened https://lists.apache.org/thread/m05xm7szd7znrm9yos1rnld5ljx04y0h specifically for community awareness of REST API changes.
```java
connectionConfigInfoDpo.asIcebergCatalogProperties(polarisCredentialManager));

BigQueryMetastoreCatalog bigQueryMetastoreCatalog = new BigQueryMetastoreCatalog();
bigQueryMetastoreCatalog.initialize(warehouse, mergedProperties);
```
How / where does it get BigQuery connection credentials?
Thank you for your review Dmitri.
BigQuery Metastore uses Application Default Credentials via Iceberg's BigQueryProperties.
Credentials are resolved automatically from the environment, GOOGLE_APPLICATION_CREDENTIALS, gcloud auth, or attached Service Account.
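The lookup order described above can be sketched as follows. This is a hypothetical helper for illustration only — the real resolution chain lives inside Google's auth library (google-auth-library-java), not in Polaris; `resolveCredentialSource` and its parameters are assumptions, not real APIs.

```java
import java.util.Map;

public class AdcOrderSketch {
  // Hypothetical illustration of the Application Default Credentials (ADC)
  // lookup order; the real logic is implemented by google-auth-library.
  public static String resolveCredentialSource(
      Map<String, String> env, boolean gcloudLoggedIn, boolean onGce) {
    if (env.containsKey("GOOGLE_APPLICATION_CREDENTIALS")) {
      // 1. Explicit service-account key file pointed to by the env var
      return "service-account key file: " + env.get("GOOGLE_APPLICATION_CREDENTIALS");
    }
    if (gcloudLoggedIn) {
      // 2. User credentials from `gcloud auth application-default login`
      return "gcloud user credentials";
    }
    if (onGce) {
      // 3. Service account attached to the VM/pod, via the metadata server
      return "attached service account (metadata server)";
    }
    return "no credentials found";
  }

  public static void main(String[] args) {
    System.out.println(
        resolveCredentialSource(
            Map.of("GOOGLE_APPLICATION_CREDENTIALS", "/secrets/sa.json"), true, true));
  }
}
```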
So, if storage is also in GCP, the BigQuery connection and storage connections will have to use the same credentials, effectively (in current Polaris code).
It's not a big deal ATM, I think... just trying to understand the full picture :)
Yes, that's my understanding also. Both BigQuery Metastore and GCS storage will use the same ADC credentials, or impersonated SA if configured. Thank you for clarifying the full picture. Sorry for the late reply.
Thanks for the explanation, @joyhaldar. Can we comment out this "hidden" behavior somewhere?
Done. Added a comment explaining the ADC credential behavior in this commit.
Force-pushed 8f471ad to a3db29b.
CI failure is unrelated to this PR; it's a MongoDB testcontainer timeout if I understand correctly.
Force-pushed c6980e7 to 07b00dc.
jbonofre
left a comment
The PR is solid overall, but I have a concern about a potential NPE. Can you please clarify?
```java
.add("gcpProjectId", gcpProjectId)
.add("gcpLocation", gcpLocation)
.add("listAllTables", listAllTables)
.add("impersonateServiceAccount", impersonateServiceAccount)
```
Is it OK to include impersonateServiceAccount in toString()?
It could contain sensitive data, I guess, so it could potentially land in log messages.
Maybe we should remove the account details or obfuscate them?
Thank you for your review JB.
Done. I have removed all impersonation fields from toString().
```java
// IMPLICIT authentication.
AuthenticationParametersDpo authenticationParametersDpo =
    connectionConfigInfoDpo.getAuthenticationParameters();
if (authenticationParametersDpo.getAuthenticationTypeCode()
```
Should we include a null check here ?
If connectionConfigInfoDpo.getAuthenticationParameters() returns null, then authenticationParametersDpo.getAuthenticationTypeCode() will throw NPE.
Since the constructor marks authenticationParameters as required = false I guess it can happen.
Thank you for your review JB.
I copied this pattern from HiveFederatedCatalogFactory and HadoopFederatedCatalogFactory which also don't have null checks. Should I add the null check here and open a separate PR to fix Hive and Hadoop as well? Or keep it consistent with existing code for now?
I'd say let's make new code robust immediately and fix old code later.
Note: IIRC, Hive federation is not even bundled by default (see polaris/runtime/server/build.gradle.kts, lines 42 to 43 in b67e1a5).
IIUC, This path seems to rely on authenticationParameters being present, while the public BIGQUERY contract still appears to allow it to be omitted. I wonder if that mismatch should be resolved at validation / schema time rather than here in the runtime path.
Done. Added null check here. If authenticationParameters is null, BigQuery uses ADC by default.
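The guard described here can be sketched as below. Note this is a simplified illustration, not the Polaris code: `AuthParams` and `AuthType` are hypothetical stand-ins for the real `AuthenticationParametersDpo` types.

```java
public class AuthGuardSketch {
  // Simplified stand-ins for the Polaris DPO types, for illustration only.
  enum AuthType { IMPLICIT, OAUTH }
  record AuthParams(AuthType type) {}

  // Mirrors the fix discussed above: treat a missing auth block as IMPLICIT
  // (Application Default Credentials) instead of dereferencing it and risking an NPE.
  static AuthType effectiveAuthType(AuthParams params) {
    if (params == null) {
      return AuthType.IMPLICIT; // BigQuery falls back to ADC
    }
    return params.type();
  }

  public static void main(String[] args) {
    System.out.println(effectiveAuthType(null)); // missing auth block -> IMPLICIT
  }
}
```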
```java
AuthenticationParametersDpo.fromAuthenticationParametersModelWithSecrets(
    bigqueryConfigModel.getAuthenticationParameters(), secretReferences);
String bigqueryUri =
    bigqueryConfigModel.getUri() != null
```
nit: If URI is null, we default to https://bigquery.googleapis.com. Maybe this default should live in DPO for visibility (or maybe OpenAPI as you started a discussion to update it 😄 ).
Done. I have moved the default URI to a public constant in BigQueryMetastoreConnectionConfigInfoDpo.
flyingImer
left a comment
I'm aligned with the direction here overall. I think there are two high-priority issues worth tightening before this moves forward:

1. The current BIGQUERY auth contract does not look aligned yet. IIUC, the API/model path still allows `authenticationParameters` to be omitted, while the runtime only supports `IMPLICIT` auth and later dereferences that field as if it were always present. That feels like more than a local null-safety issue; the boundary contract is still looser than the implementation.
2. Since this adds a new federation connector across OpenAPI, model conversion, and runtime wiring, I think we should have at least one round-trip/wiring test for `BIGQUERY` before the contract settles.

2 cents: this PR also seems like a useful point to clarify whether federation here is aiming for a common Polaris contract over native Iceberg catalogs, or for backend-specific adapters behind one REST surface. I think that answer affects how strict/explicit the connector contract should be.
```yaml
items:
  type: string
description: Delegation chain for impersonation
required:
```
IIUC, BIGQUERY can still omit authenticationParameters at the API boundary, but the current implementation only supports IMPLICIT auth and later code paths assume the field is present. Would it make sense to tighten the BIGQUERY contract here instead, so the boundary reflects what the runtime actually supports today?
Thank you for the feedback. I have added null checks in both BigQueryMetastoreFederatedCatalogFactory and asConnectionConfigInfoModel. If authenticationParameters is null, BigQuery uses ADC by default. Please let me know if you'd like me to change the approach. Sorry for the late reply.
```java
.setImpersonateLifetimeSeconds(impersonateLifetimeSeconds)
.setImpersonateScopes(impersonateScopes)
.setImpersonateDelegates(impersonateDelegates)
.setAuthenticationParameters(
```
Related contract gap: the readback/model-conversion path also seems to assume authenticationParameters is non-null here. So even if the create/update boundary accepts a BIGQUERY payload without an auth block, we can still fail later when converting the stored config back to the API model.
I have added null check in asConnectionConfigInfoModel.
Thank you for the detailed review. I have tried to address both points.

Please let me know if you'd like any changes.
```java
// IMPLICIT authentication.
AuthenticationParametersDpo authenticationParametersDpo =
    connectionConfigInfoDpo.getAuthenticationParameters();
if (authenticationParametersDpo != null
```
What's the use case for permitting federated catalogs without an AuthenticationParametersDpo object?
Same concern here. Should we fail fast when authenticationParametersDpo is null?
BigQuery always uses Application Default Credentials or the GOOGLE_APPLICATION_CREDENTIALS environment variable to automatically pick up the user's credentials or a JSON file pointing to their Service Account. Making authenticationParameters required would just force users to type IMPLICIT for no reason IMO, but please let me know what you think, or please let me know if you think I am incorrect.
I think we will need an AuthenticationType regardless of how BigQuery picks up credentials. In this case, it should be IMPLICIT (I agreed with line 58 here). However, from a syntax perspective, we may not want users to specify it explicitly. We could default it when the AuthenticationType is missing. That said, I would not recommend handling the defaulting logic here; defaulting to IMPLICIT should happen elsewhere. WDYT?
I believe declaring the IMPLICIT authentication explicitly 😉 is actually good. It increases clarity about the intended catalog behaviour. It is also more robust in case of future API and backend code changes.
Agreed. The point is that even if we want a default value for better UX, this is not the place to handle the defaulting logic.
@joyhaldar, sorry to miss this one, I think we will actually need this to be resolved. I'd suggest adding it to the method fromConnectionConfigInfoModelWithSecrets(), something like this:

```java
case BIGQUERY:
  BigQueryMetastoreConnectionConfigInfo bigqueryConfigModel =
      (BigQueryMetastoreConnectionConfigInfo) connectionConfigurationModel;
  // Default to IMPLICIT authentication if not provided
  authenticationParameters =
      bigqueryConfigModel.getAuthenticationParameters() != null
          ? AuthenticationParametersDpo.fromAuthenticationParametersModelWithSecrets(
              bigqueryConfigModel.getAuthenticationParameters(), secretReferences)
          : new ImplicitAuthenticationParametersDpo();
```
No, I am sorry, I missed it.
I tried adding the defaulting in fromConnectionConfigInfoModelWithSecrets(), but it doesn't work: PolarisServiceImpl.validateAuthenticationParameters() runs before conversion and throws an NPE when authenticationParameters is null, IIUC.
Fixing that would mean changing PolarisServiceImpl too, which seems out of scope for this PR.
For now I'm going with @dimas-b's suggestion, users must pass authenticationParameters: {authenticationType: IMPLICIT} explicitly (commit).
Let me know if my understanding is correct. Would appreciate your guidance.
```yaml
gcpLocation:
  type: string
  description: The GCP location (default = us)
listAllTables:
  type: boolean
  description: Whether to list all tables (default = true)
impersonateServiceAccount:
  type: string
  description: Service account email to impersonate
impersonateLifetimeSeconds:
  type: integer
  description: Token lifetime in seconds for impersonation (default = 3600)
impersonateScopes:
  type: array
  items:
    type: string
  description: OAuth scopes for impersonation (default = cloud-platform)
impersonateDelegates:
  type: array
  items:
    type: string
  description: Delegation chain for impersonation
```
How often does the BQMS config surface change? For example, when new fields are introduced or existing fields are updated. Locking every field into the spec creates a maintenance burden, since each upstream change requires a Polaris spec PR, client regeneration, and a new release.
Instead, we could consider using a bag of open properties to keep the model more flexible:

```yaml
properties:
  type: object
  additionalProperties:
    type: string
```
You are right, I will work on addressing this.
Done. Refactored to use a properties bag. Only warehouse and gcpProjectId remain as required fields, everything else goes into the map. Thank you for your help.
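For illustration, a BIGQUERY connection config under the properties-bag shape might look roughly like this. The field names are assumptions pieced together from the discussion (not the final spec), and the project/warehouse values are placeholders:

```yaml
# Hypothetical example payload; exact field names may differ in the merged spec.
connectionConfigInfo:
  connectionType: BIGQUERY
  uri: https://bigquery.googleapis.com
  authenticationParameters:
    authenticationType: IMPLICIT   # actual credentials resolved via ADC
  properties:
    gcp.bigquery.project-id: my-gcp-project
    warehouse: gs://my-bucket/warehouse
    gcp.bigquery.location: us
```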
flyrain
left a comment
Thanks @joyhaldar for adding this. I think it would be really useful. The PR is in the right direction. Left some comments.
```java
private static final String IMPERSONATE_LIFETIME_SECONDS =
    "gcp.bigquery.impersonate.lifetime-seconds";
private static final String IMPERSONATE_SCOPES = "gcp.bigquery.impersonate.scopes";
private static final String IMPERSONATE_DELEGATES = "gcp.bigquery.impersonate.delegates";
```
For these property keys, should we just use the iceberg sdk's properties?
The property keys in Iceberg's BigQueryProperties are package-private, so I had to duplicate them here. I can add a comment referencing the Iceberg source. Let me know what you think.
```java
private final String impersonateServiceAccount;
private final Integer impersonateLifetimeSeconds;
private final List<String> impersonateScopes;
private final List<String> impersonateDelegates;
```
These are authentication-related properties; wondering if we need to create a new authentication type for GCP?
Thank you for the review. These fields are now part of the properties bag.
XJDKC
left a comment
Side Question: if we can federate to BigLake, we should be able to federate to BigQuery through BigLake right?
jbonofre
left a comment
I have two major questions/concerns:

- As we introduce new dependencies (for BigQuery and its transitive dependencies), the LICENSE/NOTICE should be updated, subject to the second point.
- Do we ship BigQuery Metastore federation by default in the server? If yes, the LICENSE/NOTICE of the server should be updated. If no, there is no need to update it.
```java
.add("gcpProjectId", gcpProjectId)
.add("gcpLocation", gcpLocation)
.add("listAllTables", listAllTables)
.add("authenticationParameters", getAuthenticationParameters())
```
I know I had a comment in this toString() method, and you addressed it. Thanks a lot.
Sorry to be a pain, but I have a follow up question 😄
Here we directly call getAuthenticationParameters(). authenticationParameters is @Nullable.
It's similar to HiveConnectionConfigInfoDpo which has the same nullable pattern.
If other DPOs call toString() on the auth params, this could throw NPE.
The existing Hadoop DPO has authenticationParameters as @Nonnull and calls toString() explicitly.
Maybe we should do something similar here to prevent NPE ?
We can do something simple:

```java
// before
.add("authenticationParameters", getAuthenticationParameters())
// after
.add("authenticationParameters",
    getAuthenticationParameters() != null ? getAuthenticationParameters().toString() : "null")
```
Done. Added the null check as suggested. Thank you again JB!
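The null-safe pattern agreed on above can be sketched with plain string concatenation. This is an illustration only: `describe()` is a hypothetical stand-in, and the real DPO builds its toString() with a MoreObjects-style helper rather than manual concatenation.

```java
public class ToStringSketch {
  // Illustration of the null-safe toString() pattern suggested above.
  // The explicit null check avoids an NPE when the @Nullable field is absent.
  static String describe(Object authenticationParameters) {
    return "BigQueryMetastoreConnectionConfigInfoDpo{"
        + "authenticationParameters="
        + (authenticationParameters != null ? authenticationParameters.toString() : "null")
        + "}";
  }

  public static void main(String[] args) {
    System.out.println(describe(null)); // safe even when the field is null
  }
}
```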
flyrain
left a comment
Thanks for continuing to work on it, @joyhaldar! I think we are getting closer. Left some comments. Could you fix the CI failures as well?
A few follow-up items (not blockers):
- Document this new feature
- CLI changes to support BQMS.
Made it optional for now to unblock CI. Will follow up on LICENSE/NOTICE updates separately if we decide to ship it by default.
```java
connectionConfigInfoDpo.asIcebergCatalogProperties(polarisCredentialManager));

// Credentials are resolved via Google Application Default Credentials (ADC).
// GCS storage operations use the same ADC credentials.
```
The second statement is not 100% accurate. In principle a user can override GCS storage credentials:
I'd propose adding ... by default.
Updated the comment to say by default.
flyingImer
left a comment
This is directionally better now, but I still think there are two blocker-level contract issues here.

1. BIGQUERY can still accept auth modes that the connector itself does not actually support. The connector is still IMPLICIT-only, but the create/update path validates auth too generically, so an invalid BIGQUERY catalog can still be accepted and persisted, then fail later at catalog instantiation time.
2. The `authenticationParameters` contract is still inconsistent. The schema/DPO/factory path still treats it as effectively optional, but the create/admin validation path dereferences it as if it were required. So right now a missing auth block can still turn into a null-deref path instead of a deterministic validation/defaulting decision.

My preference would be to make this narrower and explicit before merge:
- validate BIGQUERY auth per connection type at create/update time, not only later in the connector, and
- pick one contract for missing `authenticationParameters` (required + IMPLICIT, or missing means IMPLICIT) and make the spec/validation/DPO path all match.
```yaml
properties:
  type: object
  additionalProperties:
    type: string
  description: Additional catalog properties
```
Thanks for adding this. Thinking more broadly, this properties would be useful for other federated catalog type like HMS. I'd proposed to move it up to its parent ConnectionConfigInfo, so that all subtypes can benefit from it. WDYT? cc @XJDKC @HonahX @singhpk234 @dimas-b
Yes, the more I think about it the more I like the idea of supporting general properties inside ConnectionConfigInfo. That REST API change should be able to support both BigQuery Catalog federation and #3729 (CC: @PhillHenry).
Here is another use case that justifies it. A client for the remote catalog may support more properties than those defined in the spec. Without a freeform properties field, we would need to keep updating the spec to accommodate properties already supported by existing clients (for example HMS or BQMS), which is both unnecessary and heavy. With properties, users can simply update their federated catalog in the metastore instead.
I have moved properties to parent ConnectionConfigInfo so all federation types can use it.
Co-authored-by: Joy Haldar <Joy.Haldar@target.com>
Co-authored-by: Joy Haldar <joy.haldar@target.com>
Force-pushed 1074346 to 65ab8be.
…am change Co-authored-by: Joy Haldar <joy.haldar@target.com>
```java
properties.put(GCP_BIGQUERY_PROJECT_ID, gcpProjectId);
properties.put(CatalogProperties.WAREHOUSE_LOCATION, warehouse);
```
Why not use the generic getProperties() map (line 108) for these parameter too?
These are required fields in the spec. projectId is validated as required in BigQueryProperties.java, and warehouse is validated in BigQueryMetastoreCatalog.java. The properties bag is for optional settings like gcp.bigquery.location. But let me know if this is not appropriate, would love your opinion.
From my POV this creates a skew in the Polaris API, which becomes dependent on particular Iceberg java classes.
Conceptually, this class merely passes some user-defined properties to the Iceberg catalog impl. This is fine. However, do we have to elevate a sub-set of those properties to explicit Polaris OpenAPI properties?.. I'm not sure.
I believe keeping the Polaris API generic and uniform for all federated catalog properties might be preferable.
Implementations (both Iceberg java classes and Polaris classes) can and should perform config validation as appropriate, of course.
Can I do this in a follow-up PR?
I'm trying to figure out the right pattern here: Hadoop and Hive have warehouse as an explicit field, but BigQueryMetastoreCatalog needs both warehouse and projectId. Should I:

1. Keep `warehouse` explicit (consistent with Hadoop/Hive) and move `projectId` to properties?
2. Move both to properties (fully generic, but diverging from the Hadoop/Hive pattern)?
3. Something else?
Would appreciate your guidance.
Retracting API changes is not ideal (in a follow-up).
I'd propose to remove gcpProjectId and warehouse from OpenAPI explicit properties and process corresponding Iceberg properties from the generic properties bag. Also, add description to OpenAPI YAML to list required properties.
My rationale is that it is fine to accept pass-through properties destined to become Iceberg (federated) catalog properties. Yet, specific processing / validation is a feature of the implementation and can change in new Polaris versions without requiring API changes.
I wonder what other reviewers think about this.
@dimas-b's proposal works for me, as long as it's clearly documented. That said, I'm also fine keeping them, as they're "bigquery" specific.
Re: Hadoop and Hive: from my POV the same argument applies to them, but I missed out on the review of that code when it was introduced 😅
This is why my request in this thread is non-blocking.
However, I do not really see a reason to promote an old pattern if we're in agreement about the rationale for generic properties in this case (hence my request for more reviewer opinions).
I have moved warehouse and gcpProjectId to the properties map with validation in the factory and documented required properties in the spec in this commit.
flyrain
left a comment
LGTM. Thanks @joyhaldar !
dimas-b
left a comment
The PR LGTM overall 👍
Feel free to consider my point about properties as non-blocking.
flyrain
left a comment
The defaulting logic needs to be moved.
…y validation Co-authored-by: Joy Haldar <joy.haldar@target.com>
Co-authored-by: Joy Haldar <joy.haldar@target.com>
```diff
   @JsonProperty(value = "warehouse", required = false) @Nullable String remoteCatalogName) {
-    super(ConnectionType.HADOOP.getCode(), uri, authenticationParameters, serviceIdentityInfo);
+    super(
+        ConnectionType.HADOOP.getCode(), uri, authenticationParameters, serviceIdentityInfo, null);
```
nit: I think we can spare adding these ugly 'null' parameters in the super calls. We can overload the public constructor of ConnectionConfigInfoDpo like this:

```java
public ConnectionConfigInfoDpo(
    @JsonProperty(value = "connectionTypeCode", required = true) int connectionTypeCode,
    @JsonProperty(value = "uri", required = true) @Nonnull String uri,
    @JsonProperty(value = "authenticationParameters", required = true) @Nullable
        AuthenticationParametersDpo authenticationParameters,
    @JsonProperty(value = "serviceIdentity", required = false) @Nullable
        ServiceIdentityInfoDpo serviceIdentity,
    @JsonProperty(value = "properties", required = false) @Nullable
        Map<String, String> properties) {
  this(connectionTypeCode, uri, authenticationParameters, serviceIdentity, true, properties);
}

public ConnectionConfigInfoDpo(
    @JsonProperty(value = "connectionTypeCode", required = true) int connectionTypeCode,
    @JsonProperty(value = "uri", required = true) @Nonnull String uri,
    @JsonProperty(value = "authenticationParameters", required = true) @Nullable
        AuthenticationParametersDpo authenticationParameters,
    @JsonProperty(value = "serviceIdentity", required = false) @Nullable
        ServiceIdentityInfoDpo serviceIdentity) {
  this(connectionTypeCode, uri, authenticationParameters, serviceIdentity, true, null);
}
```
I have added the overloaded constructor and updated the subclasses.
```java
if (warehouse == null || warehouse.isEmpty()) {
  throw new IllegalArgumentException("warehouse is required for BigQuery Metastore federation");
}
if (properties.get("gcp.bigquery.project-id") == null) {
```
Is empty string accepted for gcp.bigquery.project-id?
I have added empty string validation for gcp.bigquery.project-id.
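The combined null/empty validation can be sketched as below. `requireNonEmpty` is a hypothetical helper for illustration, not the actual Polaris factory code; only the property key mirrors the PR.

```java
import java.util.Map;

public class PropertyValidationSketch {
  // Sketch of the validation discussed above: reject both missing and
  // empty values for required federation properties.
  static String requireNonEmpty(Map<String, String> properties, String key) {
    String value = properties.get(key);
    if (value == null || value.isEmpty()) {
      throw new IllegalArgumentException(
          key + " is required for BigQuery Metastore federation");
    }
    return value;
  }

  public static void main(String[] args) {
    System.out.println(
        requireNonEmpty(Map.of("gcp.bigquery.project-id", "demo"), "gcp.bigquery.project-id"));
  }
}
```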
```java
 */
public class BigQueryMetastoreConnectionConfigInfoDpo extends ConnectionConfigInfoDpo {

  public static final String DEFAULT_URI = "https://bigquery.googleapis.com";
```
nit: I think we can keep this private.
Actually it's public because it's used in ConnectionConfigInfoDpo.fromConnectionConfigInfoModelWithSecrets() for URI defaulting. I kept it in the BQMS DPO since it's specific to BQMS. But let me know if you disagree, would appreciate your inputs.
Oh okay, I missed that it is used there. Makes sense, thanks!
Co-authored-by: Joy Haldar <joy.haldar@target.com>
nandorKollar
left a comment
I'm not an expert of BigQuery, but overall LGTM.
dimas-b
left a comment
LGTM 👍 Thanks for bearing with me 🙂
```yaml
BigQueryMetastoreConnectionConfigInfo:
  type: object
  description: |
    Configuration necessary for connecting to a BigQuery Metastore Catalog.
```
nit: maybe make a note that properties are effectively Iceberg's BigQuery catalog properties (for clarity)?
```java
if (warehouse == null || warehouse.isEmpty()) {
  throw new IllegalArgumentException("warehouse is required for BigQuery Metastore federation");
}
String projectId = properties.get("gcp.bigquery.project-id");
```
nit: make a constant, or refer to a constant from the Iceberg jars?
No, thank you for taking the time to thoroughly review my PR and giving me so much constructive feedback!

Merge is blocked by GitHub CI. Re-triggered. I will merge it today if there is no more feedback.

I believe there is an option to re-run failed CI jobs without closing/reopening the PR 🤔 +1 to merging today.

Thanks @joyhaldar for the change! It's a very useful feature. Thanks everyone for the review.

@dimas-b, got it. Mind sharing the option?
Adds federation support for BigQuery Metastore catalogs, enabling Polaris to serve as a unified REST catalog interface for Iceberg tables managed in BigQuery Metastore.
Changes
- BigQueryMetastoreFederatedCatalogFactory - factory for creating BigQuery Metastore catalog handles
- BigQueryMetastoreConnectionConfigInfoDpo - connection configuration with all BigQueryProperties
- BIGQUERY connection type in ConnectionType
- BigQueryMetastoreConnectionConfigInfo

Checklist

- CHANGELOG.md (if needed)
- site/content/in-dev/unreleased (if needed)