
fix: support literal sha2() with 'Unsupported argument types' #3466

Open

0lai0 wants to merge 13 commits into apache:main from 0lai0:native_engine_crashes_on_sha2

Conversation

@0lai0 (Contributor) commented Feb 10, 2026

Which issue does this PR close?

Closes #3340
Part of #3328

Rationale for this change

When Spark's ConstantFolding optimizer rule is disabled, an all-literal call such as sha2('test', 256) reaches the native engine as ScalarValue arguments rather than arrays, and the existing kernel rejects them with 'Unsupported argument types'.

What changes are included in this PR?

Added a dedicated scalar execution path to handle literal inputs, preventing crashes when ConstantFolding is disabled.
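The dispatch pattern described above can be sketched as follows. This is a minimal, self-contained model only: the ColumnarValue enum here is a simplified stand-in for DataFusion's real type, and fake_digest is a placeholder for an actual SHA-2 kernel, so the names and signatures do not match the real crate API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Simplified stand-in for DataFusion's ColumnarValue (illustrative only;
// the real type lives in the datafusion crate and wraps Arrow arrays).
#[derive(Debug, PartialEq)]
enum ColumnarValue {
    Scalar(Option<String>),     // a single literal value
    Array(Vec<Option<String>>), // one value per row
}

// Placeholder digest; a real kernel would compute SHA-224/256/384/512.
fn fake_digest(s: &str) -> String {
    let mut h = DefaultHasher::new();
    s.hash(&mut h);
    format!("{:016x}", h.finish())
}

// The point of the fix: handle the Scalar variant explicitly instead of
// assuming every argument arrives as an array, so all-literal input
// (e.g. when ConstantFolding is disabled) no longer crashes.
fn sha2_expr(arg: ColumnarValue) -> ColumnarValue {
    match arg {
        ColumnarValue::Array(values) => ColumnarValue::Array(
            values.iter().map(|v| v.as_deref().map(fake_digest)).collect(),
        ),
        ColumnarValue::Scalar(v) => {
            ColumnarValue::Scalar(v.as_deref().map(fake_digest))
        }
    }
}

fn main() {
    let scalar = sha2_expr(ColumnarValue::Scalar(Some("test".to_string())));
    let array = sha2_expr(ColumnarValue::Array(vec![
        Some("test".to_string()),
        None,
    ]));
    println!("{:?}", scalar);
    println!("{:?}", array);
}
```

The invariant worth testing is that a literal input produces exactly the same digest as the same value flowing through the array path, and that nulls pass through unchanged.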

How are these changes tested?

cargo test -p datafusion-comet-spark-expr hash_funcs::sha2::tests

@andygrove (Member)

Test failure example:

2026-02-13T16:58:50.3294698Z - hash functions *** FAILED *** (392 milliseconds)
2026-02-13T16:58:50.3299944Z   org.apache.spark.SparkException: Job aborted due to stage failure: Task 4 in stage 3753.0 failed 1 times, most recent failure: Lost task 4.0 in stage 3753.0 (TID 12908) (localhost executor driver): org.apache.comet.CometNativeException: could not cast array of type Binary to arrow_array::array::byte_array::GenericByteArray<arrow_array::types::GenericStringType<i32>>.
2026-02-13T16:58:50.3302951Z This issue was likely caused by a bug in DataFusion's code. Please help us to resolve this by filing a bug report in our issue tracker: https://github.com/apache/datafusion/issues

@andygrove (Member)

Thanks for looking into this @0lai0. I don't think we should implement a new version of Sha2. The version in this PR has less functionality than the upstream version that it is replacing (fewer data types supported). I looked at the latest upstream code and it does appear to support scalar arguments. I wonder if there were improvements in DF v52.0.0 (we have a PR for upgrading to that). If not, perhaps it would be better just to fall back to Spark if all arguments are scalar, by updating the Scala-side getSupportLevel checks.

@0lai0 (Contributor, Author) commented Feb 17, 2026

Thanks @andygrove for the review. I'll incorporate these suggestions into the update. Thank you!

@0lai0 0lai0 marked this pull request as draft February 28, 2026 12:49
@0lai0 0lai0 marked this pull request as ready for review February 28, 2026 15:20
@andygrove (Member) left a comment

One thing that concerns me is the test changes. The main hash query in hash.sql is changed from query to query spark_answer_only, but that query tests md5, sha1, hash, and xxhash64 alongside sha2. Switching the whole thing to spark_answer_only means we no longer verify that those other functions run natively. It would be better to move the sha2 calls into their own query with an expect_fallback(...) line and keep the original query for the rest.

Same issue in CometExpressionSuite.scala where checkSparkAnswerAndOperator is changed to checkSparkAnswer. Those queries include md5, hash, xxhash64, and sha1 which should still be verified to run natively. Could you split out the sha2 calls into a separate checkSparkAnswer and keep checkSparkAnswerAndOperator for the rest?

@0lai0 (Contributor, Author) commented Mar 14, 2026

Sure! Thanks for the reminder. I'll go through the comments and split out the sha2 tests accordingly.

@0lai0 (Contributor, Author) commented Mar 14, 2026

Thanks @andygrove , PR updated.



Development

Successfully merging this pull request may close these issues.

Native engine crashes on literal sha2() with 'Unsupported argument types'
