[v26.1.x] iceberg: Add case-insensitive schema matching#30577
Open
vbotbuildovich wants to merge 5 commits into
(cherry picked from commit 84df79a)
Passes field_name_comparison to catalog_schema_manager::ensure_table_schema, compatibility::check, and table_metadata field lookups so that name comparisons can be made case-insensitively when the caller requests it. coordinator and translation/deps pass field_name_comparison::verbatim as a placeholder; config-driven resolution is wired up in the next commit. (cherry picked from commit 0924bb2)
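As a rough illustration of what a threaded-through comparison mode could look like at a field lookup site, here is a minimal sketch. Only `field_name_comparison` and its `verbatim` value come from the commit message; the `case_insensitive` enumerator and the `field_names_equal` helper are hypothetical names, and the folding here is ASCII-only for brevity (the PR pulls in utf8proc, so the real comparison presumably handles Unicode).

```cpp
#include <algorithm>
#include <cctype>
#include <string_view>

// Stand-in for the type named in the commit message. `verbatim` appears in
// the source; `case_insensitive` is an assumed enumerator name.
enum class field_name_comparison { verbatim, case_insensitive };

// Hypothetical helper: compares two field names under the requested mode.
// ASCII-only case folding, purely for illustration.
inline bool field_names_equal(
  std::string_view lhs, std::string_view rhs, field_name_comparison mode) {
    if (mode == field_name_comparison::verbatim) {
        return lhs == rhs;
    }
    return std::equal(
      lhs.begin(), lhs.end(), rhs.begin(), rhs.end(),
      [](unsigned char a, unsigned char b) {
          return std::tolower(a) == std::tolower(b);
      });
}
```

With `verbatim`, `"UserId"` and `"userid"` are distinct fields; with the case-insensitive mode they match, which is the behavior a caller would request when the catalog may return lower-cased names.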
Adds the cluster-level property iceberg_schema_case_insensitive with values yes/no/auto (default auto). This controls whether Iceberg schema field name matching is done case-insensitively. auto enables case-insensitive matching when the configured REST catalog is AWS Glue (detected via SigV4 auth + service name "glue"), and uses exact matching otherwise. This addresses a sporadic issue where AWS Glue returns schema field names lower-cased rather than verbatim. The resolution logic lives in datalake/coordinator/catalog_config, which wires the cluster config and Glue detection together into a field_name_comparison value that is then threaded into coordinator and translation/deps. (cherry picked from commit a3ad07e)
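The yes/no/auto resolution described above can be sketched as follows. This is an assumption-laden illustration, not the code in datalake/coordinator/catalog_config: the `rest_catalog_auth` struct, the `is_aws_glue` and `resolve_field_name_comparison` functions, the `case_insensitive` enumerator, and the `automatic` spelling (chosen because `auto` is a C++ keyword) are all hypothetical.

```cpp
#include <string_view>

// `verbatim` is named in the source; `case_insensitive` is assumed.
enum class field_name_comparison { verbatim, case_insensitive };

// Mirrors the property values yes/no/auto; enumerator names are assumed.
enum class iceberg_schema_case_insensitive { yes, no, automatic };

// Hypothetical summary of the REST catalog's auth settings.
struct rest_catalog_auth {
    bool uses_sigv4;
    std::string_view service_name;
};

// Glue detection as described: SigV4 auth plus service name "glue".
inline bool is_aws_glue(const rest_catalog_auth& auth) {
    return auth.uses_sigv4 && auth.service_name == "glue";
}

// Combines the cluster property and Glue detection into a comparison mode.
inline field_name_comparison resolve_field_name_comparison(
  iceberg_schema_case_insensitive setting, const rest_catalog_auth& auth) {
    switch (setting) {
    case iceberg_schema_case_insensitive::yes:
        return field_name_comparison::case_insensitive;
    case iceberg_schema_case_insensitive::no:
        return field_name_comparison::verbatim;
    case iceberg_schema_case_insensitive::automatic:
        return is_aws_glue(auth) ? field_name_comparison::case_insensitive
                                 : field_name_comparison::verbatim;
    }
    return field_name_comparison::verbatim; // unreachable for valid values
}
```

Under this sketch, an explicit yes or no always wins, and the catalog's auth settings only matter when the property is left at its default of auto.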
The backport bot substituted the dev branch lockfile (lockFileVersion 26, Bazel 9.1.0). Regenerate from v26.1.x base (lockFileVersion 18, Bazel 8.4.1) to add just the utf8proc and transitive rules_cc entries.
The utf8proc BCR build has includes=["."] commented out, so the header is not on the angle-bracket include path in Bazel 8.4.1. Use a quoted include instead, consistent with project conventions for external deps.
Fixed up the lockfile, plus a knock-on fix.
Backport of PR #30459
Conflict details
format_to(iceberg_schema_case_insensitive ...) and operator>>(iceberg_schema_case_insensitive&) adjacent to fips_mode_flag's formatter. On dev, fips_mode_flag had already migrated from operator<< to format_to; v26.1.x still has the operator<< form. Kept v26.1.x's existing operator<<(fips_mode_flag&) unchanged and inserted the new iceberg_schema_case_insensitive formatter/parser above it. The incoming fips_mode_flag format_to was an unrelated context change from the source branch, not part of this PR's intent.
The following files were cherry-picked and may need regeneration:
These files were accepted as-is from the source branch. Before merging,
regenerate them on the target branch to ensure they're correct. For example:
bazel mod deps --lockfile_mode=update
go mod tidy