[SPARK-55818][PANDAS/PS] Decimal-float mixed arithmetic should always raise TypeError#56340

Open
HenryBui777 wants to merge 1 commit into apache:master from HenryBui777:SPARK-55818-decimal-float-arithmetic

@HenryBui777 HenryBui777 commented Jun 5, 2026

What changes were proposed in this pull request?

In the Pandas API on Spark, when performing arithmetic between a float Series (or Index) and a decimal.Decimal scalar, the behavior was inconsistent depending on the Spark ANSI mode setting:

  • ANSI mode ON: TypeError was raised correctly (matching pandas behavior).
  • ANSI mode OFF: The operation completed silently, returning a Series filled with None or NaN (incorrect behavior).

Native pandas always raises TypeError for mixed arithmetic operations between float and decimal types.
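
For reference, the scalar-level Python behavior behind this expectation can be checked without Spark or pandas. The snippet below is only a sketch using the standard library; the helper name raises_type_error is made up for illustration, and it covers plain float and decimal.Decimal scalars, not Series or Index objects.

```python
import decimal


def raises_type_error(op):
    """Return True if calling op() raises TypeError (illustrative helper)."""
    try:
        op()
    except TypeError:
        return True
    return False


d = decimal.Decimal("1.5")

# Check addition and subtraction with the Decimal on either side.
results = {
    "float + Decimal": raises_type_error(lambda: 2.0 + d),
    "Decimal + float": raises_type_error(lambda: d + 2.0),
    "float - Decimal": raises_type_error(lambda: 2.0 - d),
    "Decimal - float": raises_type_error(lambda: d - 2.0),
}

for name, raised in results.items():
    print(f"{name}: TypeError raised = {raised}")
```

All four combinations raise TypeError in plain Python, which is the outcome this PR makes the Pandas API on Spark report consistently for add and sub.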

This PR fixes FractionalOps (which handles arithmetic operations for float types) to always raise TypeError for decimal-float mixed arithmetic operations (specifically __add__, __sub__, __radd__, and __rsub__), regardless of the Spark ANSI mode setting.

Changes:

  • python/pyspark/pandas/data_type_ops/num_ops.py:
    • Modified FractionalOps.add and FractionalOps.sub to explicitly check if the other operand is a decimal.Decimal (or a Series/Index of decimals) and raise TypeError immediately.
    • Added the same check to FractionalOps.radd and FractionalOps.rsub for right-side operations (e.g., decimal.Decimal(1.5) + float_series).
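
The shape of such a guard can be sketched roughly as follows. This is not the code from this PR: reject_decimal_operand, FloatOpsSketch, and the error message text are hypothetical names chosen for illustration, the "Series" is a plain list of floats, and the check for a Series/Index of decimals is left out.

```python
import decimal
from typing import Any, List


def reject_decimal_operand(other: Any, op_name: str) -> None:
    # Hypothetical helper: raise before any computation happens, so the
    # outcome cannot depend on a runtime setting such as ANSI mode.
    if isinstance(other, decimal.Decimal):
        raise TypeError(
            f"{op_name} can not be applied to float and decimal operands."
        )


class FloatOpsSketch:
    """Toy stand-in for FractionalOps; 'left' is a plain list of floats."""

    def add(self, left: List[float], right: Any) -> List[float]:
        reject_decimal_operand(right, "Addition")
        return [x + right for x in left]

    def sub(self, left: List[float], right: Any) -> List[float]:
        reject_decimal_operand(right, "Subtraction")
        return [x - right for x in left]

    def radd(self, left: List[float], right: Any) -> List[float]:
        # Reflected form: computes right + left.
        reject_decimal_operand(right, "Addition")
        return [right + x for x in left]

    def rsub(self, left: List[float], right: Any) -> List[float]:
        # Reflected form: computes right - left.
        reject_decimal_operand(right, "Subtraction")
        return [right - x for x in left]


ops = FloatOpsSketch()
print(ops.add([1.0, 2.0], 0.5))  # float operands are unaffected by the guard

try:
    ops.radd([1.0, 2.0], decimal.Decimal("1.5"))
except TypeError as exc:
    print(f"TypeError: {exc}")
```

The point of checking the operand type up front is that the error is raised on the Python side, before any Spark expression is built, so it does not matter how Spark would later evaluate the mixed-type expression.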

Why are the changes needed?

The Pandas API on Spark should match native pandas as closely as possible. When native pandas strictly prohibits a mixed-type operation, the result should not depend on Spark's internal configuration (such as ANSI mode), and the operation should not silently succeed and return incorrect results (such as nulls).

Does this PR introduce any user-facing change?

Yes. Operations like float_series + decimal.Decimal('1.5') and float_series - decimal.Decimal('1.5') now always raise TypeError, even when Spark ANSI mode is turned off.

How was this patch tested?

Added unit tests in:

  • python/pyspark/pandas/tests/data_type_ops/test_decimal_float_arithmetic.py - covers:
    • Float Series + Decimal scalar (both sides)
    • Float Series - Decimal scalar (both sides)
    • Float Index + Decimal scalar (both sides)
    • Float Index - Decimal scalar (both sides)
    • Float Series + Decimal Series
    • Verified that TypeError is raised in all cases under both ANSI ON and ANSI OFF configurations.

Was this patch authored or co-authored using generative AI tooling?

No.

… raise TypeError

In the Pandas API on Spark, when performing arithmetic between a float
Series and a decimal.Decimal scalar, the behavior was inconsistent:
- With ANSI mode ON: TypeError is raised (correct)
- With ANSI mode OFF: Operation completes silently (incorrect)

Native pandas always raises TypeError in this case. This commit fixes
FractionalOps to always raise TypeError for decimal-float mixed
arithmetic operations (add, sub), regardless of ANSI mode setting.

Other operations (mul, truediv, floordiv, mod, rmul, rmod) already
checked ANSI mode; they are unchanged as a separate concern.

Resolves: https://issues.apache.org/jira/browse/SPARK-55818