fix: preserve scalar variables in reduce operations (GH#11417) #11418

Open
C1-BA-B1-F3 wants to merge 4 commits into pydata:main from C1-BA-B1-F3:fix-11417-sum-string-scalars

Problem

Calling Dataset.sum() (or another numeric-only reduce operation such as mean or std) on a dataset that contains scalar (0-dimensional) non-numeric data variables raises:

TypeError: the resolved dtypes are not compatible with add.reduce. Resolved (dtype('<U5'), dtype('<U5'), dtype('<U10'))

Reproduction

import xarray as xr
ds = xr.Dataset(data_vars={
    'a': (['index'], [1, 2, 3]),
    'd': ([], 'hello')
})
ds.sum('index')  # Raises TypeError

Root Cause

In Dataset.reduce(), the reduce_maybe_single logic determines which axis to pass to the reduction function. For a 0-dimensional scalar variable with no matching reduce dimensions:

  • reduce_dims = [] (empty)
  • len(reduce_dims) == var.ndim → 0 == 0 → True
  • var.ndim != 1 → 0 != 1 → True
  • reduce_maybe_single = None (reduce over all dims)

This leads to var.reduce(func, dim=None), which calls func(self.data) without an axis argument. For string scalars this fails because NumPy's add.reduce cannot reduce string dtypes: the resolved output dtype differs from the input dtype.

Fix

Added a check that reduce_dims is non-empty before setting reduce_maybe_single = None. When reduce_dims is empty (variable doesn't have the target dimension), reduce_maybe_single stays as [] (empty list), which triggers the invariant_0d check in duck_array_ops and returns the scalar value unchanged.

reduce_maybe_single = (
    None
    if reduce_dims
    and len(reduce_dims) == var.ndim
    and var.ndim != 1
    else reduce_dims
)
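
To see which variables the extra `reduce_dims` check affects, the condition can be tried on plain lists and integers, without xarray. This is only a sketch: `select_axis_before` is reconstructed from the Root Cause bullets rather than copied from the xarray source, and both function names and the case labels are hypothetical.

```python
def select_axis_before(reduce_dims, ndim):
    """Condition as described in the Root Cause bullets (reconstructed)."""
    return None if len(reduce_dims) == ndim and ndim != 1 else reduce_dims


def select_axis_after(reduce_dims, ndim):
    """Condition from the patch: an empty reduce_dims never maps to None."""
    return (
        None
        if reduce_dims and len(reduce_dims) == ndim and ndim != 1
        else reduce_dims
    )


# (reduce_dims, ndim) pairs standing in for variables of different shapes.
cases = {
    "0-d scalar, no matching dim": ([], 0),
    "1-d, reduce its only dim": (["index"], 1),
    "2-d, reduce both dims": (["x", "y"], 2),
    "2-d, reduce one dim": (["x"], 2),
}

for label, (reduce_dims, ndim) in cases.items():
    before = select_axis_before(reduce_dims, ndim)
    after = select_axis_after(reduce_dims, ndim)
    print(f"{label}: before={before!r}, after={after!r}")
```

Only the 0-d case changes (from None to an empty list). Variables that are fully reduced over two or more dimensions still get None, so their behaviour should be unaffected.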

Test

Added test_reduce_string_scalar to verify that:

  • sum('index') preserves string scalar variables
  • mean('index') preserves string scalar variables
  • min('index') preserves string scalar variables
  • max('index') preserves string scalar variables

Closes #11417

C1-BA-B1-F3 and others added 3 commits June 26, 2026 10:00
…(GH#11323)

The `flush_only_netcdf_file` class was defined inside `_open_scipy_netcdf()`,
so each call created a new class object. After opening two scipy-backed
datasets from file-like objects, the first dataset's class reference became
unreachable by qualname, causing pickle's class-identity check to fail with:

    PicklingError: Can't pickle
    <class 'xarray.backends.scipy_._PickleWorkaround.flush_only_netcdf_file'>:
    it's not the same object as
    xarray.backends.scipy_._PickleWorkaround.flush_only_netcdf_file

Fix: create the class once in `_get_flush_only_class()`, set its
`__qualname__` to a module-level name, and register it as a module
attribute so pickle can always resolve it.

Regression test included.
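
The create-once pattern described in this commit message can be sketched in isolation. This is a minimal illustration, not the code from `xarray/backends/scipy_.py`: `BaseFile`, `flush_only_file`, and `get_flush_only_class` are hypothetical stand-ins, and the sketch assumes it runs as a normal script or imported module so that `sys.modules[__name__]` exists.

```python
import pickle
import sys

_FLUSH_ONLY_CLASS = None  # cache so the class object is created exactly once


class BaseFile:
    """Hypothetical stand-in for the wrapped file class."""

    def __init__(self, name):
        self.name = name


def get_flush_only_class():
    """Return the single shared subclass, creating it on first use."""
    global _FLUSH_ONLY_CLASS
    if _FLUSH_ONLY_CLASS is None:

        class flush_only_file(BaseFile):
            pass

        # Replace the "<locals>" qualname with a module-level name.
        flush_only_file.__qualname__ = "flush_only_file"
        # Register the class on the module so pickle's lookup by
        # (module, qualname) resolves to this exact object.
        setattr(sys.modules[__name__], "flush_only_file", flush_only_file)
        _FLUSH_ONLY_CLASS = flush_only_file
    return _FLUSH_ONLY_CLASS


first = get_flush_only_class()("a.nc")
second = get_flush_only_class()("b.nc")

# Both instances share one class object, so the first one still pickles
# after the second has been created.
restored = pickle.loads(pickle.dumps(first))
print(type(first) is type(second), restored.name)
```

Because the factory caches the class, a second call cannot replace the object that the module attribute points to, which is the condition pickle's identity check relies on.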

The line that registers the class on the module dynamically adds an
attribute to the module for pickle resolution. Mypy cannot track this
pattern, so we add a suppression for the attr-defined error on that line.

Fixes mypy error:
  xarray/backends/scipy_.py:168: error: Module has no attribute
  "flush_only_netcdf_file" [attr-defined]

When calling reduce operations like sum/mean on a Dataset with scalar
(0-dimensional) data variables, the reduce_maybe_single logic would
set axis=None for 0-d variables with no matching reduce dims. This
caused numpy to attempt a reduction on the scalar value itself, which
failed for non-numeric types like strings.

The fix adds a check that reduce_dims is non-empty before setting
reduce_maybe_single=None. When reduce_dims is empty (variable doesn't
have the target dimension), reduce_maybe_single stays as [] (empty
list), which triggers the invariant_0d check in duck_array_ops and
returns the scalar value unchanged.

Closes pydata#11417

Linked issue: Dataset.sum() when string scalars present: "resolved dtypes not compatible with add.reduce"