feat: add SparkPow UDF returning Infinity for pow(0, negative)#22605
Spark returns Infinity for pow(0, <negative>) following IEEE 754, while the DataFusion default (PowerFunc) raises an error to match PostgreSQL behavior. This adds SparkPow to the datafusion-spark crate which overrides the Float64 path to explicitly return +Infinity when base == 0.0 and exp < 0.0 (covers both 0.0 and -0.0), and delegates all decimal types to the existing PowerFunc. Both 'pow' and 'power' aliases are covered. Closes apache#22598
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a Spark-compatible pow/power implementation and updates SLT expectations to match Spark’s 0 ^ negative = Infinity behavior.
Changes:
- Introduces SparkPow UDF that overrides DataFusion’s default pow/power semantics for 0 ^ negative.
- Registers the new function in the Spark math module and exposes it via expr_fn.
- Updates SLT cases to assert Spark’s Infinity results for pow(0, -1) variants.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| datafusion/sqllogictest/test_files/spark/math/pow.slt | Updates Spark SLT expectations for pow and adds 0 ^ negative coverage expecting Infinity. |
| datafusion/spark/src/function/math/pow.rs | Adds SparkPow UDF wrapping PowerFunc with Spark-specific edge-case behavior and unit tests. |
| datafusion/spark/src/function/math/mod.rs | Registers pow UDF module, exports it, and adds to the function registry. |
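The wrapping described for pow.rs can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the PR's code: `Value`, `Pow2`, `DefaultPow`, and `SparkPowSketch` are hypothetical stand-ins for the Arrow-based types and the real PowerFunc/SparkPow, and the integer branch is simplified.

```rust
/// Hypothetical stand-in for a typed scalar argument (the real crate uses Arrow types).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Value {
    Float64(f64),
    Int64(i64),
}

/// Hypothetical minimal interface for a two-argument scalar function.
pub trait Pow2 {
    fn invoke(&self, base: Value, exp: Value) -> Result<Value, String>;
}

/// Stand-in for the default behavior: 0 raised to a negative power is an error.
pub struct DefaultPow;

impl Pow2 for DefaultPow {
    fn invoke(&self, base: Value, exp: Value) -> Result<Value, String> {
        match (base, exp) {
            (Value::Float64(b), Value::Float64(e)) => {
                if b == 0.0 && e < 0.0 {
                    Err("zero raised to a negative power is undefined".to_string())
                } else {
                    Ok(Value::Float64(b.powf(e)))
                }
            }
            (Value::Int64(b), Value::Int64(e)) => {
                // Simplified integer path: reject exponents that do not fit in u32.
                if e < 0 || e > u32::MAX as i64 {
                    return Err("unsupported integer exponent".to_string());
                }
                b.checked_pow(e as u32)
                    .map(Value::Int64)
                    .ok_or_else(|| "integer overflow".to_string())
            }
            _ => Err("mismatched argument types".to_string()),
        }
    }
}

/// Wrapper that handles the Float64 path itself and delegates everything else.
pub struct SparkPowSketch<P: Pow2> {
    pub inner: P,
}

impl<P: Pow2> Pow2 for SparkPowSketch<P> {
    fn invoke(&self, base: Value, exp: Value) -> Result<Value, String> {
        match (base, exp) {
            (Value::Float64(b), Value::Float64(e)) => {
                // `b == 0.0` is true for both 0.0 and -0.0.
                let out = if b == 0.0 && e < 0.0 { f64::INFINITY } else { b.powf(e) };
                Ok(Value::Float64(out))
            }
            // Non-Float64 inputs go to the wrapped implementation unchanged.
            _ => self.inner.invoke(base, exp),
        }
    }
}
```

With `SparkPowSketch { inner: DefaultPow }`, a Float64 call with a zero base and negative exponent yields `Value::Float64(f64::INFINITY)`, while the same call on `DefaultPow` alone returns the error. Keeping the default implementation as an inner field means only the one edge case has to be maintained separately.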
| // Only Float64 needs the Spark override. | ||
| // Decimal / integer paths are delegated to the standard PowerFunc which | ||
| // already handles them correctly (decimal can't represent Infinity anyway). | ||
| if !matches!(args.args[0].data_type(), DataType::Float64) { | ||
| return self.inner.invoke_with_args(args); |
| (Some(b), Some(e)) => { | ||
| if b == 0.0 && e < 0.0 { | ||
| Some(f64::INFINITY) | ||
| } else { | ||
| Some(b.powf(e)) | ||
| } | ||
| } |
| // ── Array path ─────────────────────────────────────────────────────── | ||
| let [base, exponent] = take_function_args(self.name(), &args.args)?; | ||
|
|
||
| let base_arr: ArrayRef = base.to_array(num_rows)?; | ||
| let exp_arr: ArrayRef = exponent.to_array(num_rows)?; |
| query R | ||
| SELECT pow(2::int, 3::int); | ||
| ---- | ||
| 8 |
comphead
left a comment
Thanks @Brijesh-Thakkar, this is a solid PR. Please remove the tests from pow.rs; they are repetitive.
For pow.slt, let's add more double edge cases, specifically nulls, NaNs, -0, +0, -Inf, and +Inf.
Spark returns Infinity for pow(0, <negative>) following IEEE 754, while the DataFusion default (PowerFunc) raises an error to match PostgreSQL behavior. This adds SparkPow to the datafusion-spark crate which overrides the Float64 path to return +Infinity when base == 0.0 and exp < 0.0 (covers both 0.0 and -0.0). All decimal types delegate to PowerFunc. Both 'pow' and 'power' aliases are covered. Adds sqllogictest edge cases for: nulls, NaN, signed zeros (-0/+0), and signed infinities (-Inf/+Inf) including array and mixed paths. Closes apache#22598
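The null, NaN, signed-zero, and infinity cases listed above can be sketched element-wise over plain slices. This is an illustration under assumptions: `spark_pow_opt` and `spark_pow_column` are hypothetical names, and `Option<f64>` stands in for a nullable Float64 array slot rather than the Arrow arrays the PR operates on.

```rust
/// Null-aware scalar rule: a null (None) on either side yields null.
pub fn spark_pow_opt(base: Option<f64>, exp: Option<f64>) -> Option<f64> {
    // `Option::zip` returns None unless both sides are Some.
    base.zip(exp).map(|(base, exp)| {
        if base == 0.0 && exp < 0.0 {
            // Covers 0.0 and -0.0, since the two zeros compare equal.
            f64::INFINITY
        } else {
            // NaN and infinite inputs follow the platform's pow semantics.
            base.powf(exp)
        }
    })
}

/// Applies the rule pairwise; `Iterator::zip` stops at the shorter input.
pub fn spark_pow_column(bases: &[Option<f64>], exps: &[Option<f64>]) -> Vec<Option<f64>> {
    bases
        .iter()
        .zip(exps.iter())
        .map(|(base, exp)| spark_pow_opt(*base, *exp))
        .collect()
}
```

Naming the closure parameters `base` and `exp` keeps the rule readable on its own, and keeping the scalar rule in one function means a scalar path and an array path cannot drift apart.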
@comphead Removed the tests from pow.rs as you suggested and also added more edge cases.
| ] = args.args.as_slice() | ||
| { | ||
| // b and e are &Option<f64>; Option<f64> is Copy. | ||
| let result = (*b).zip(*e).map(|(b, e)| { |
Let's call them base and exp instead of b and e.
| let result: Float64Array = base_f64 | ||
| .iter() | ||
| .zip(exp_f64.iter()) | ||
| .map(|(b, e)| match (b, e) { |
@comphead I will fix this and commit the changes.
@comphead I have made the changes you suggested and committed them.
@comphead All requested changes have been addressed and checks are passing. Could you please approve the PR when you get a chance? Thanks!
comphead
left a comment
Thanks @Brijesh-Thakkar lgtm!
Which issue does this PR close?
Closes #22598
Rationale for this change
In Apache Spark, the pow(base, exp) function follows IEEE 754 semantics, where raising 0 (or -0.0) to a negative exponent yields positive Infinity.
Currently, DataFusion's default core PowerFunc mimics PostgreSQL behavior, throwing an explicit error ("zero raised to a negative power is undefined"). To support standard Spark compatibility without breaking core DataFusion expectations, this PR introduces a specialized SparkPow UDF inside the datafusion-spark crate.
What changes are included in this PR?
This PR introduces the following changes within the datafusion-spark integration crate:
- SparkPow UDF (datafusion/spark/src/function/math/pow.rs): Overrides the Float64 execution path to evaluate base == 0.0 && exp < 0.0 as f64::INFINITY (safely catching both 0.0 and -0.0 due to IEEE 754 equality rules).
- Decimal types are delegated to the existing PowerFunc, as decimals cannot represent infinity.
- datafusion/spark/src/function/math/mod.rs: Registers the new pow function and establishes power as a valid alias.
- datafusion/sqllogictest/test_files/spark/math/pow.slt: Updates and adds test coverage ensuring pow(0, -1), power(0, -1), and pow(0.0, -1.0) return Infinity.
Are these changes tested?
Yes, the changes are covered by both unit and integration tests:
- Unit tests: test_spark_pow_zero_negative_returns_infinity and test_spark_pow_normal_cases within pow.rs, to validate the core scalar execution logic.
- Integration tests: datafusion/sqllogictest/test_files/spark/math/pow.slt, to verify the end-to-end SQL evaluation behavior.
Are there any user-facing changes?
Yes, but only for users of the datafusion-spark compatibility features. When the Spark dialect/crate is active, evaluating pow(0, <negative>) now returns Infinity instead of throwing an evaluation error. Core DataFusion behavior remains unchanged.
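The point in the change list that base == 0.0 catches both 0.0 and -0.0 can be checked with a short standard-library sketch. The helper names `is_any_zero` and `is_negative_zero` are hypothetical and do not appear in the PR.

```rust
/// True for both 0.0 and -0.0: IEEE 754 equality treats the two zeros as equal.
pub fn is_any_zero(x: f64) -> bool {
    x == 0.0
}

/// True only for -0.0: the sign bit is the one thing that tells the zeros apart.
pub fn is_negative_zero(x: f64) -> bool {
    x == 0.0 && x.is_sign_negative()
}
```

Because the two zeros differ only in their sign bit, a comparison such as `base == 0.0` needs no separate branch for -0.0, which is why a single condition suffices in the Float64 path.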