[SPARK-56375][PANDAS/PS] Implement DataFrame.set_axis and Series.set_axis #56339
Open
HenryBui777 wants to merge 2 commits into
added 2 commits
June 5, 2026 14:37
…axis

This commit implements DataFrame.set_axis() and Series.set_axis() for the Pandas API on Spark, matching the behavior of the native pandas API.

- DataFrame.set_axis(labels, axis=0) supports both axis=0 (index) and axis=1 (columns).
- Series.set_axis(labels, axis=0) supports only axis=0 (index), consistent with pandas.
- Removed set_axis from the missing function lists in missing/frame.py and missing/series.py.
- Added unit tests in tests/frame/test_set_axis.py and tests/series/test_set_axis.py.

Resolves: https://issues.apache.org/jira/browse/SPARK-56375
… raise TypeError

In the Pandas API on Spark, when performing arithmetic between a float Series and a decimal.Decimal scalar, the behavior was inconsistent:

- With ANSI mode ON: TypeError is raised (correct)
- With ANSI mode OFF: Operation completes silently (incorrect)

Native pandas always raises TypeError in this case. This commit fixes FractionalOps to always raise TypeError for decimal-float mixed arithmetic operations (add, sub), regardless of ANSI mode setting. Other operations (mul, truediv, floordiv, mod, rmul, rmod) already checked ANSI mode; they are unchanged as a separate concern.

Resolves: https://issues.apache.org/jira/browse/SPARK-55818
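The rule this commit aims for can be sketched in plain Python, where float and decimal.Decimal already refuse to mix. The helper name checked_add and its error message are hypothetical and are not the FractionalOps code; the sketch only illustrates a type check that runs unconditionally instead of depending on the ANSI mode setting.

```python
from decimal import Decimal
from typing import List, Union

Number = Union[int, float]


def checked_add(values: List[float], scalar: Union[Number, Decimal]) -> List[float]:
    """Add a scalar to every float, rejecting Decimal operands up front.

    Hypothetical helper: the check does not consult any ANSI-mode flag,
    so the outcome is the same whichever mode is active.
    """
    if isinstance(scalar, Decimal):
        raise TypeError("cannot add a Decimal scalar to float values")
    return [v + scalar for v in values]


# Plain Python applies the same restriction to a single float.
try:
    1.5 + Decimal("1")
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

Here mixed_ok ends up False, and checked_add raises TypeError for a Decimal scalar while still accepting ordinary int or float scalars.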
What changes were proposed in this pull request?
This PR implements DataFrame.set_axis() and Series.set_axis() for the Pandas API on Spark, which were previously unsupported (listed in missing/frame.py and missing/series.py).

Changes:
- python/pyspark/pandas/frame.py: Added DataFrame.set_axis(labels, axis=0) method.
  - axis=0 or axis='index': Reassigns the row index using the provided labels (delegates to set_index with a pd.Index)
  - axis=1 or axis='columns': Reassigns column labels using rename(columns=...)
  - Raises ValueError if the number of labels doesn't match the axis length
- python/pyspark/pandas/series.py: Added Series.set_axis(labels, axis=0) method.
  - Only axis=0 is supported for Series (consistent with pandas)
  - Raises ValueError if labels length doesn't match the Series length
  - Raises ValueError if axis != 0
- python/pyspark/pandas/missing/frame.py: Removed set_axis from _unsupported_function list
- python/pyspark/pandas/missing/series.py: Removed set_axis from _unsupported_function list

Supported behavior (matching pandas):
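A minimal sketch of the axis handling described above, written in plain Python with no Spark or pandas dependency. The names normalize_axis and set_axis_sketch, the list-based index and columns, and the error messages are all hypothetical stand-ins for illustration, not the code added in this PR.

```python
from typing import Any, Dict, List, Sequence, Tuple, Union

Axis = Union[int, str]


def normalize_axis(axis: Axis, allow_columns: bool = True) -> int:
    """Map 0/'index' to 0 and 1/'columns' to 1; reject anything else.

    allow_columns=False models the Series case, where only axis 0 exists.
    """
    mapping: Dict[Axis, int] = {0: 0, "index": 0}
    if allow_columns:
        mapping.update({1: 1, "columns": 1})
    if axis not in mapping:
        raise ValueError(f"No axis named {axis!r}")
    return mapping[axis]


def set_axis_sketch(
    index: Sequence[Any],
    columns: Sequence[Any],
    labels: Sequence[Any],
    axis: Axis = 0,
) -> Tuple[List[Any], List[Any]]:
    """Return (new_index, new_columns) without mutating the inputs."""
    new_labels = list(labels)
    axis_no = normalize_axis(axis)
    # The length check runs against whichever axis is being relabeled.
    target = index if axis_no == 0 else columns
    if len(new_labels) != len(target):
        raise ValueError(
            f"Length mismatch: expected {len(target)} labels, got {len(new_labels)}"
        )
    if axis_no == 0:
        return new_labels, list(columns)
    return list(index), new_labels
```

With a three-row, two-column table, set_axis_sketch([0, 1, 2], ["A", "B"], ["a", "b", "c"], axis="index") replaces the row labels, axis=1 or axis="columns" replaces the column labels, and a wrong label count or an unknown axis raises ValueError.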
Why are the changes needed?
DataFrame.set_axis() and Series.set_axis() are standard pandas APIs used extensively for reassigning index or column labels. They were completely unsupported in the Pandas API on Spark, causing PandasNotImplementedError even though the underlying functionality (index/column renaming) is already supported. This PR enables compatibility with pandas code that uses set_axis.

Reference: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.set_axis.html
Does this PR introduce any user-facing change?
Yes. DataFrame.set_axis() and Series.set_axis() now work in the Pandas API on Spark instead of raising PandasNotImplementedError.

How was this patch tested?
Added new unit test files:
- python/pyspark/pandas/tests/frame/test_set_axis.py - covers:
  - test_set_axis_index: axis=0 with list, string 'index', and pd.Index labels
  - test_set_axis_columns: axis=1 with list, string 'columns', and pd.Index labels
  - test_set_axis_errors: ValueError on length mismatch and invalid axis
  - test_set_axis_numeric_index: numeric index labels
- python/pyspark/pandas/tests/series/test_set_axis.py - covers:
  - test_set_axis_index: axis=0 with list, string 'index', and pd.Index labels
  - test_set_axis_errors: ValueError on length mismatch and invalid axis (axis=1)
  - test_set_axis_named: named Series

Was this patch authored or co-authored using generative AI tooling?
No.