[SPARK-56054][SQL] Add test for MERGE INTO schema evolution with aliased assignments#55239
[SPARK-56054][SQL] Add test for MERGE INTO schema evolution with aliased assignments#55239johanl-db wants to merge 2 commits intoapache:masterfrom
Conversation
szehon-ho
left a comment
There was a problem hiding this comment.
these are all DataFrame API, wondering is there any equivalent in SQL? Be good to abstract it out if possible
If not, maybe we can we make a new file MergeIntoSchemaEvolutionExtraDataFrameTests to make them only run in DataFrame mode?
It is a bit of a mess now due to the inheritance patterns
.../org/apache/spark/sql/connector/MergeIntoSchemaEvolutionTypeWideningAndExtraFieldTests.scala
Outdated
Show resolved
Hide resolved
|
I moved the tests to a dedicated trait for dataframe tests. There's no SQL equivalent, it's not possible to specify an alias in an assignment expression using SQL afaict: |
| spark.table("source") | ||
| .mergeInto(tableNameAsString, | ||
| col(s"$tableNameAsString.pk") === col("source.pk")) | ||
| .whenMatched().update(Map("info" -> col("source.info").as("info"))) |
There was a problem hiding this comment.
Okay, this seems like a behavior change compared to the initial PR then? The code in master would evolve the schema in this case because we had that alias support?
| .mergeInto(tableNameAsString, | ||
| col(s"$tableNameAsString.pk") === col("source.pk")) | ||
| .whenMatched().update(Map( | ||
| "info" -> col("source.info").as("something_else"))) |
There was a problem hiding this comment.
How does Delta behave in this case?
|
@johanl-db, what about cases when the source query has an alias? |
What changes were proposed in this pull request?
Follow up from #54891 in particular this comment.
Adds more tests covering schema evolution in MERGE INTO when the assignment contains an alias.
This change also undoes the fix from #54891. On closer inspection, this isn't needed in Spark as Spark already removes trivial aliases on nested field accessors in MERGE in resolveExprInAssignment, which is what the change aimed to allow. See this test added here covering trivial aliases implictily added on nested field access.
Delta doesn't strip aliases during resolution in that way and will require special handling, but that's up to Delta to implement in its custom resolution logic to replicate what Spark already does
How was this patch tested?
Adds tests covering MERGE INTO schema evolution with aliases in assignments