Fix shift and pad on pandas extension-array variables (#10301) #11426
Open
MaykThewessen wants to merge 1 commit into
DataArray/Dataset .shift() and .pad() raised TypeError on variables backed by pandas nullable extension arrays (Int64, Float64, boolean): the NA fill value could not be written into the numpy array that np.pad coerces the extension array to. Register an np.pad NEP-18 handler for PandasExtensionArray that builds the padded result from same-dtype extension arrays and concatenates them, so the extension dtype and its native pd.NA are preserved. duck_array_ops.pad wraps raw extension arrays for constant-mode padding (matching as_shared_dtype); non-constant modes keep the existing numpy path.
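The root cause named above can be checked without xarray. The following is a minimal sketch, not code from this PR: `can_hold_na` is a hypothetical helper that only tests whether a plain ndarray of a given dtype accepts `pd.NA` as an element, which is the step that fails once the extension array has been coerced to numpy.

```python
import numpy as np
import pandas as pd


def can_hold_na(dtype) -> bool:
    """Return True if an ndarray of `dtype` accepts pd.NA as an element.

    Hypothetical helper for illustration only; it is not part of xarray.
    """
    buf = np.zeros(1, dtype=dtype)
    try:
        # numpy has to convert pd.NA to the buffer's element type here.
        buf[0] = pd.NA
    except TypeError:
        return False
    return True


# The numpy dtypes that Int64 / Float64 / boolean data would be coerced to,
# plus object, which is what string-backed arrays coerce to.
results = {name: can_hold_na(name) for name in ("int64", "float64", "bool", "object")}
print(results)
```

Only the `object` buffer accepts the fill value, which is consistent with string-backed arrays having worked by accident.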
What
DataArray/Dataset .shift() and .pad() raise TypeError on variables backed by pandas nullable extension arrays (Int64, Float64, boolean).
Both methods route through duck_array_ops.pad → np.pad, which coerces the extension array to a plain numpy array and then cannot insert the pd.NA fill value into an integer/float/bool ndarray. (String-backed arrays accidentally worked because they coerce to object.)
This is the part of #10301 not covered by the merged #10423 (which fixed the NA type-promotion path used by reindex/fillna, but not the padding path).
How
Register an np.pad NEP-18 handler on PandasExtensionArray (__extension_duck_array__pad). For mode="constant" it builds the padded result from same-dtype extension arrays and concatenates them with _concat_same_type, so the extension dtype and its native pd.NA are preserved. Per-side construction also honors constant_values=(before, after).
duck_array_ops.pad wraps raw extension arrays in PandasExtensionArray for constant-mode padding (the same idiom already used by as_shared_dtype), so the handler is reached. Non-constant modes (edge, reflect, …) don't introduce fill values and keep the existing numpy path unchanged.
Scope
Intentionally narrow: this only fixes the shift/pad crash and does not attempt the broader ufunc/operator work being explored in #10380. Numeric (Int64/Float64/boolean), as well as categorical and other extension types, are exercised by the tests.
shift/pad part of "Regression in DataArrays created from Pandas" #10301
Tests added (test_duck_array_ops.py, test_dataarray.py)
whats-new.rst
api.rst (internal only)
Authored with AI assistance (Claude), reviewed and verified by me. Happy to adjust the approach, e.g. fold it into #10380 if maintainers prefer a single consolidated fix.
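For reviewers, the constant-mode idea described under How can be sketched in isolation. This is an illustration under stated assumptions, not the handler in this PR: `pad_constant_1d` and its parameters are hypothetical names, it handles only 1-D arrays, and it builds the pad pieces with `pd.array` before joining them with the `_concat_same_type` classmethod from the pandas ExtensionArray interface.

```python
import pandas as pd


def pad_constant_1d(arr, before, after, fill_before=pd.NA, fill_after=pd.NA):
    """Pad a 1-D pandas extension array while keeping its extension dtype.

    Hypothetical sketch: the pad pieces are created with the same dtype as
    `arr`, so the result never passes through a plain numpy array.
    """
    # Same-dtype pieces for each side; separate fills mirror
    # constant_values=(before, after).
    left = pd.array([fill_before] * before, dtype=arr.dtype)
    right = pd.array([fill_after] * after, dtype=arr.dtype)
    # Concatenate arrays of one extension type without changing the dtype.
    return type(arr)._concat_same_type([left, arr, right])


values = pd.array([1, 2, 3], dtype="Int64")
padded = pad_constant_1d(values, before=1, after=2)
print(padded.dtype, list(padded.isna()))
```

Because every piece already has the extension dtype, pd.NA stays the native missing value and no integer-to-float or object promotion happens.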