[SPARK-55818][PANDAS/PS] Decimal-float mixed arithmetic should always raise TypeError #56340
Open
HenryBui777 wants to merge 1 commit into
… raise TypeError
In the Pandas API on Spark, when performing arithmetic between a float Series and a decimal.Decimal scalar, the behavior was inconsistent:
- With ANSI mode ON: TypeError is raised (correct)
- With ANSI mode OFF: the operation completes silently (incorrect)
Native pandas always raises TypeError in this case.
This commit fixes FractionalOps to always raise TypeError for decimal-float mixed arithmetic operations (add, sub), regardless of the ANSI mode setting. Other operations (mul, truediv, floordiv, mod, rmul, rmod) already checked ANSI mode; they are left unchanged here and are treated as a separate concern.
Resolves: https://issues.apache.org/jira/browse/SPARK-55818
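A minimal, pure-Python sketch of the kind of guard described above, assuming a list of floats stands in for a float Series and a list of `Decimal`s stands in for a decimal Series/Index. `fractional_op`, `is_decimal_operand`, `GUARDED_OPS`, the `ansi_enabled` parameter and the error message are all hypothetical illustrations; this is not the actual code in `num_ops.py`, and it does not touch pandas or Spark.

```python
import decimal
import operator
from typing import Any, Callable, Dict, List, Sequence

# Hypothetical table of the four guarded operations. The reflected variants
# (radd, rsub) put the other operand on the left, as in
# decimal.Decimal(1.5) + float_series.
GUARDED_OPS: Dict[str, Callable[[Any, Any], Any]] = {
    "add": operator.add,
    "sub": operator.sub,
    "radd": lambda x, other: operator.add(other, x),
    "rsub": lambda x, other: operator.sub(other, x),
}


def is_decimal_operand(other: Any) -> bool:
    """True for a Decimal scalar, or a non-empty list/tuple of Decimals
    (standing in for a decimal-typed Series or Index)."""
    if isinstance(other, decimal.Decimal):
        return True
    if isinstance(other, (list, tuple)) and other:
        return all(isinstance(v, decimal.Decimal) for v in other)
    return False


def fractional_op(
    name: str, left: Sequence[float], other: Any, ansi_enabled: bool
) -> List[float]:
    """Hypothetical stand-in for FractionalOps.add/sub/radd/rsub on floats.

    ansi_enabled is accepted only to show that it never affects the guard.
    """
    # Guard first, before any ANSI-dependent code path could run.
    if is_decimal_operand(other):
        raise TypeError(f"{name} can not be applied to float and decimal.Decimal.")
    fn = GUARDED_OPS[name]
    if isinstance(other, (list, tuple)):
        # Element-wise, like Series-with-Series arithmetic.
        return [fn(x, y) for x, y in zip(left, other)]
    return [fn(x, other) for x in left]


# The Decimal operand is rejected for every guarded operation under both settings.
for ansi in (True, False):
    for op_name in GUARDED_OPS:
        try:
            fractional_op(op_name, [1.0, 2.0], decimal.Decimal("1.5"), ansi_enabled=ansi)
        except TypeError as exc:
            print(f"ansi_enabled={ansi}: {exc}")
```

The design point the sketch tries to capture is ordering: the Decimal check runs before anything that could consult the ANSI flag, so the flag has no way to turn the error into a silent null result.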
What changes were proposed in this pull request?
In the Pandas API on Spark, when performing arithmetic between a float Series (or Index) and a `decimal.Decimal` scalar, the behavior was inconsistent depending on the Spark ANSI mode setting:
- With ANSI mode ON: `TypeError` was raised correctly (matching pandas behavior).
- With ANSI mode OFF: the operation completed silently and returned `None` or `NaN` (incorrect behavior).
Native pandas always raises `TypeError` for mixed arithmetic operations between float and decimal types.
This PR fixes `FractionalOps` (which handles arithmetic operations for float types) to always raise `TypeError` for decimal-float mixed arithmetic operations (specifically `__add__`, `__sub__`, `__radd__`, and `__rsub__`), regardless of the Spark ANSI mode setting.
Changes:
- `python/pyspark/pandas/data_type_ops/num_ops.py`:
  - Updated `FractionalOps.add` and `FractionalOps.sub` to explicitly check whether the other operand is a `decimal.Decimal` (or a Series/Index of decimals) and raise `TypeError` immediately.
  - Applied the same check in `FractionalOps.radd` and `FractionalOps.rsub` for right-side operations (e.g., `decimal.Decimal(1.5) + float_series`).
Why are the changes needed?
The Pandas API on Spark should match native pandas as closely as possible. In particular, it should not silently succeed and return incorrect results (such as nulls) depending on a Spark configuration (such as ANSI mode) when native pandas rejects such mixed-type operations outright.
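The rule that native pandas enforces here can be reproduced with the standard library alone: plain Python refuses to mix `float` and `decimal.Decimal` in arithmetic, in either operand order, which is very likely where the `TypeError` seen in pandas originates. This is an illustrative sketch only; `ARITH_OPS` and `mixed_op_raises` are hypothetical names, and nothing here exercises pandas or Spark. It also includes the multiplicative operations listed in the commit message.

```python
import decimal
import operator

# Binary arithmetic operators that float and Decimal refuse to mix.
ARITH_OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "truediv": operator.truediv,
    "floordiv": operator.floordiv,
    "mod": operator.mod,
}


def mixed_op_raises(op, left, right) -> bool:
    """Return True if applying `op` to (left, right) raises TypeError."""
    try:
        op(left, right)
    except TypeError:
        return True
    return False


f, d = 1.5, decimal.Decimal("1.5")
for name, op in ARITH_OPS.items():
    # Check both operand orders, mirroring add/radd, sub/rsub, and so on.
    # Every line printed ends with "True True".
    print(name, mixed_op_raises(op, f, d), mixed_op_raises(op, d, f))
```

Because Python itself raises for these operands regardless of any configuration, an ANSI-dependent result in the Pandas API on Spark has no counterpart in native pandas.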
Does this PR introduce any user-facing change?
Yes. Operations like `float_series + decimal.Decimal('1.5')` and `float_series - decimal.Decimal('1.5')` now always raise `TypeError`, even when Spark ANSI mode is turned off.
How was this patch tested?
Added unit tests in:
- `python/pyspark/pandas/tests/data_type_ops/test_decimal_float_arithmetic.py`, which verifies that `TypeError` is raised in all cases under both ANSI ON and ANSI OFF configurations.
Was this patch authored or co-authored using generative AI tooling?
No.