[SPARK-56375][PANDAS/PS] Implement DataFrame.set_axis and Series.set_axis by HenryBui777 · Pull Request #56339 · apache/spark

HenryBui777 · 2026-06-05T07:54:56Z

What changes were proposed in this pull request?

This PR implements DataFrame.set_axis() and Series.set_axis() for the Pandas API on Spark, which were previously unsupported (listed in missing/frame.py and missing/series.py).

Changes:

- python/pyspark/pandas/frame.py: Added DataFrame.set_axis(labels, axis=0) method.
  - axis=0 or axis='index': Reassigns the row index using the provided labels (delegates to set_index with a pd.Index)
  - axis=1 or axis='columns': Reassigns column labels using rename(columns=...)
  - Raises ValueError if the number of labels doesn't match the axis length
- python/pyspark/pandas/series.py: Added Series.set_axis(labels, axis=0) method.
  - Only axis=0 is supported for Series (consistent with pandas)
  - Raises ValueError if labels length doesn't match the Series length
  - Raises ValueError if axis != 0
- python/pyspark/pandas/missing/frame.py: Removed set_axis from _unsupported_function list
- python/pyspark/pandas/missing/series.py: Removed set_axis from _unsupported_function list

Supported behavior (matching pandas):

import pyspark.pandas as ps

# DataFrame - change row index
df = ps.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df.set_axis(['x', 'y', 'z'])           # axis=0 (default)
df.set_axis(['x', 'y', 'z'], axis=0)
df.set_axis(['x', 'y', 'z'], axis='index')

# DataFrame - change column labels
df.set_axis(['I', 'II'], axis=1)
df.set_axis(['I', 'II'], axis='columns')

# Series
s = ps.Series([1, 2, 3])
s.set_axis(['a', 'b', 'c'])
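
The validation rules listed under Changes (axis aliases and the label-length check) can be sketched in plain Python. This is a minimal illustration only: the helper names `normalize_axis` and `validate_labels` are hypothetical, are not taken from the patch, and the error wording is an assumption rather than the message the implementation emits.

```python
from typing import Hashable, List, Sequence, Tuple, Union


def normalize_axis(axis: Union[int, str]) -> int:
    """Map 0/'index' to 0 and 1/'columns' to 1; reject anything else.

    Hypothetical helper, not part of pyspark.pandas.
    """
    aliases = {0: 0, "index": 0, 1: 1, "columns": 1}
    try:
        return aliases[axis]
    except (KeyError, TypeError):
        raise ValueError(f"No axis named {axis!r}") from None


def validate_labels(
    labels: Sequence[Hashable], axis: Union[int, str], n_rows: int, n_cols: int
) -> Tuple[int, List[Hashable]]:
    """Check that the number of labels matches the length of the chosen axis."""
    axis_num = normalize_axis(axis)
    expected = n_rows if axis_num == 0 else n_cols
    labels = list(labels)
    if len(labels) != expected:
        raise ValueError(
            f"Length mismatch: axis has {expected} elements, "
            f"got {len(labels)} labels"
        )
    return axis_num, labels


# A 3-row, 2-column frame: two labels are valid for columns but not for rows.
axis_num, checked = validate_labels(["I", "II"], "columns", n_rows=3, n_cols=2)
```

Running the length check before touching the data means a mismatch fails early with a ValueError instead of producing a partially relabeled result.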

Why are the changes needed?

DataFrame.set_axis() and Series.set_axis() are standard pandas APIs for reassigning index or column labels. Until now they were unsupported in the Pandas API on Spark and raised PandasNotImplementedError, even though the underlying functionality (index/column renaming) was already supported. This PR enables compatibility with pandas code that uses set_axis.

Reference: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.set_axis.html

Does this PR introduce any user-facing change?

Yes. DataFrame.set_axis() and Series.set_axis() now work in the Pandas API on Spark instead of raising PandasNotImplementedError.

How was this patch tested?

Added new unit test files:

- python/pyspark/pandas/tests/frame/test_set_axis.py covers:
  - test_set_axis_index: axis=0 with list, string 'index', and pd.Index labels
  - test_set_axis_columns: axis=1 with list, string 'columns', and pd.Index labels
  - test_set_axis_errors: ValueError on length mismatch and invalid axis
  - test_set_axis_numeric_index: numeric index labels
- python/pyspark/pandas/tests/series/test_set_axis.py covers:
  - test_set_axis_index: axis=0 with list, string 'index', and pd.Index labels
  - test_set_axis_errors: ValueError on length mismatch and invalid axis (axis=1)
  - test_set_axis_named: named Series
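
For reference, the native pandas behavior that these cases mirror can be reproduced with plain pandas. This sketch assumes pandas is installed and does not exercise the pyspark.pandas code or the test files from this PR. The helper `raises_value_error` is hypothetical and exists only for illustration.

```python
import pandas as pd

pdf = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Relabel rows and columns; set_axis returns a new object.
by_index = pdf.set_axis(["x", "y", "z"], axis="index")
by_columns = pdf.set_axis(["I", "II"], axis="columns")


def raises_value_error(func) -> bool:
    """Return True if calling func() raises ValueError (hypothetical helper)."""
    try:
        func()
    except ValueError:
        return True
    return False


# Two labels for a three-row axis: pandas rejects the length mismatch.
length_mismatch = raises_value_error(lambda: pdf.set_axis(["x", "y"], axis=0))

# A Series has only axis 0, so axis=1 is rejected.
series_bad_axis = raises_value_error(
    lambda: pd.Series([1, 2, 3]).set_axis(["a", "b", "c"], axis=1)
)

# The name of a named Series survives relabeling.
named = pd.Series([1, 2, 3], name="v").set_axis(["a", "b", "c"])
```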

Was this patch authored or co-authored using generative AI tooling?

No.

…axis

This commit implements DataFrame.set_axis() and Series.set_axis() for the Pandas API on Spark, matching the behavior of the native pandas API.

- DataFrame.set_axis(labels, axis=0) supports both axis=0 (index) and axis=1 (columns).
- Series.set_axis(labels, axis=0) supports only axis=0 (index), consistent with pandas.
- Removed set_axis from the missing function lists in missing/frame.py and missing/series.py.
- Added unit tests in tests/frame/test_set_axis.py and tests/series/test_set_axis.py.

Resolves: https://issues.apache.org/jira/browse/SPARK-56375

… raise TypeError

In the Pandas API on Spark, arithmetic between a float Series and a decimal.Decimal scalar behaved inconsistently:

- With ANSI mode ON: TypeError is raised (correct)
- With ANSI mode OFF: Operation completes silently (incorrect)

Native pandas always raises TypeError in this case. This commit fixes FractionalOps to always raise TypeError for decimal-float mixed arithmetic operations (add, sub), regardless of the ANSI mode setting. Other operations (mul, truediv, floordiv, mod, rmul, rmod) already checked ANSI mode; they are unchanged as a separate concern.

Resolves: https://issues.apache.org/jira/browse/SPARK-55818
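
The TypeError described in this commit message has a counterpart in plain Python, where mixing a float with a decimal.Decimal in addition is rejected. The sketch below shows only that scalar-level behavior. It does not touch FractionalOps or Spark's ANSI mode, and the helper `try_add` is hypothetical.

```python
from decimal import Decimal


def try_add(left, right):
    """Return left + right, or a short description of the TypeError raised."""
    try:
        return left + right
    except TypeError as exc:
        return f"TypeError: {exc}"


# float + Decimal is not defined, so Python raises TypeError.
mixed = try_add(1.5, Decimal("2.0"))

# Decimal + int is allowed and stays exact.
exact = try_add(Decimal("1"), 2)

print(mixed)
print(exact)
```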

Henry Bui added 2 commits June 5, 2026 14:37

HenryBui777 changed the title ~~Spark 56375 set axis~~ [SPARK-56375][PANDAS/PS] Implement DataFrame.set_axis and Series.set_axis Jun 5, 2026

HenryBui777 wants to merge 2 commits into apache:master from HenryBui777:SPARK-56375-set-axis
