fix: support literal sha2() with 'Unsupported argument types'#3466
fix: support literal sha2() with 'Unsupported argument types'#34660lai0 wants to merge 13 commits intoapache:mainfrom
Conversation
|
Test failure example: |
|
Thanks for looking into this @0lai0. I don't think we should implement a new version of Sha2. The version in this PR has less functionality than the upstream version that it is replacing (fewer data types supported). I looked at the latest upstream code and it does appear to support scalar arguments. I wonder if there were improvements in DF v52.0.0 (we have a PR for upgrading to that). If not, perhaps it would be better just to fall back to Spark if all arguments are scalar, by updating the Scala-side |
|
Thanks @andygrove for the review. I'll incorporate these suggestions into the update. Thank you! |
andygrove
left a comment
There was a problem hiding this comment.
One thing that concerns me is the test changes. The main hash query in hash.sql is changed from query to query spark_answer_only, but that query tests md5, sha1, hash, and xxhash64 alongside sha2. Switching the whole thing to spark_answer_only means we no longer verify that those other functions run natively. It would be better to split the sha2 calls into a separate query expect_fallback(...) line and keep the original query for the rest.
Same issue in CometExpressionSuite.scala where checkSparkAnswerAndOperator is changed to checkSparkAnswer. Those queries include md5, hash, xxhash64, and sha1 which should still be verified to run natively. Could you split out the sha2 calls into a separate checkSparkAnswer and keep checkSparkAnswerAndOperator for the rest?
|
Sure! Thanks for the reminder. I'll review and make changes based on comments to split out the sha2 tests. |
|
Thanks @andygrove , PR updated. |
Which issue does this PR close?
Closes #3340
part of #3328
Rationale for this change
When Spark's ConstantFolding optimizer rule is disabled, an all-literal sha2('test', 256) call reaches the native engine as ScalarValue arguments.
What changes are included in this PR?
Added a dedicated scalar execution path to handle literal inputs, preventing crashes when ConstantFolding is disabled.
How are these changes tested?
cargo test -p datafusion-comet-spark-expr hash_funcs::sha2::tests