
Fix shift and pad on pandas extension-array variables (#10301) #11426

Open
MaykThewessen wants to merge 1 commit into pydata:main from MaykThewessen:fix-extension-array-shift-pad


@MaykThewessen


What

DataArray/Dataset .shift() and .pad() raise TypeError on variables backed by pandas nullable extension arrays (Int64, Float64, boolean):

import numpy as np, pandas as pd, xarray as xr

arr = pd.Series(index=np.array([1, 2, 3]), data=1).convert_dtypes().to_xarray()
arr.shift(index=1)       # TypeError: int() argument must be ... not 'NAType'
arr.pad(index=(1, 1))    # TypeError: int() argument must be ... not 'NAType'

Both methods route through duck_array_ops.pad and then np.pad, which coerces the extension array to a plain numpy array and then cannot insert the pd.NA fill value into an integer/float/bool ndarray. (String-backed arrays happened to work because they coerce to object dtype.)
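
The failure can be reproduced at the NumPy level without xarray. The following is a minimal sketch, assuming the extension array has already been coerced to a plain ndarray; the helper name `pad_with_na_fails` is hypothetical and not part of xarray or this PR.

```python
import numpy as np
import pandas as pd


def pad_with_na_fails(values: np.ndarray) -> bool:
    """Return True if constant-padding `values` with pd.NA raises TypeError."""
    try:
        # np.pad allocates an output of the same dtype as `values`
        # and then assigns the fill value into the padded region.
        np.pad(values, (1, 1), mode="constant", constant_values=pd.NA)
    except TypeError:
        return True
    return False


# An int64 ndarray has no slot that can hold pd.NA, so the assignment fails.
int_failed = pad_with_na_fails(np.array([1, 2, 3]))

# An object ndarray can hold any Python object, including pd.NA.
obj_failed = pad_with_na_fails(np.array(["a", "b"], dtype=object))
```

The object-dtype case illustrates why string-backed arrays did not hit the error.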

This is the part of #10301 not covered by the merged #10423 (which fixed the NA type-promotion path used by reindex/fillna, but not the padding path).

How

  • Register an np.pad NEP-18 handler on PandasExtensionArray (__extension_duck_array__pad). For mode="constant" it builds the padded result from same-dtype extension arrays and concatenates them with _concat_same_type, so the extension dtype and its native pd.NA are preserved. Per-side construction also honors constant_values=(before, after).
  • duck_array_ops.pad wraps raw extension arrays in PandasExtensionArray for constant-mode padding (the same idiom already used by as_shared_dtype), so the handler is reached. Non-constant modes (edge, reflect, …) don't introduce fill values and keep the existing numpy path unchanged.
>>> arr.shift(index=1)
<xarray.DataArray (index: 3)> ...
<IntegerArray>
[<NA>, 1, 1]
Length: 3, dtype: Int64
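
The per-side construction described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the handler added in this PR: the function name `pad_extension_constant` and its parameters are hypothetical, it handles only one dimension, and it builds the fill blocks with the public `pd.array` constructor before joining them with `_concat_same_type`.

```python
import pandas as pd


def pad_extension_constant(values, before, after,
                           fill_before=pd.NA, fill_after=pd.NA):
    """Constant-pad a 1-D pandas extension array, keeping its dtype.

    Hypothetical helper for illustration only.
    """
    dtype = values.dtype
    # Build the fill blocks as extension arrays of the same dtype,
    # so pd.NA is stored natively instead of being cast to a numpy scalar.
    left = pd.array([fill_before] * before, dtype=dtype)
    right = pd.array([fill_after] * after, dtype=dtype)
    # Concatenate within the extension type; no round trip through numpy.
    return type(values)._concat_same_type([left, values, right])


data = pd.array([1, 2, 3], dtype="Int64")

# Default fill: pd.NA on both sides.
padded_na = pad_extension_constant(data, 1, 1)

# Distinct fill values per side, analogous to constant_values=(before, after).
padded_pair = pad_extension_constant(data, 1, 2, fill_before=0, fill_after=9)
```

Because every piece shares the same extension dtype, the result stays `Int64` and the fill positions hold `pd.NA` rather than a numpy sentinel.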

Scope

Intentionally narrow: this only fixes the shift/pad crash and does not attempt the broader ufunc/operator work being explored in #10380. Numeric (Int64/Float64/boolean), as well as categorical and other extension types, are exercised by the tests.


Authored with AI assistance (Claude), reviewed and verified by me. Happy to adjust the approach, e.g. fold it into #10380 if maintainers prefer a single consolidated fix.

DataArray/Dataset .shift() and .pad() raised TypeError on variables backed
by pandas nullable extension arrays (Int64, Float64, boolean): the NA fill
value could not be written into the numpy array that np.pad coerces the
extension array to.

Register an np.pad NEP-18 handler for PandasExtensionArray that builds the
padded result from same-dtype extension arrays and concatenates them, so the
extension dtype and its native pd.NA are preserved. duck_array_ops.pad wraps
raw extension arrays for constant-mode padding (matching as_shared_dtype);
non-constant modes keep the existing numpy path.
@github-actions (bot) added the topic-arrays label (related to flexible array support) on Jun 30, 2026