fix: preserve scalar variables in reduce operations (GH#11417) #11418
Open
C1-BA-B1-F3 wants to merge 4 commits into
…(GH#11323)
The class was defined inside `_open_scipy_netcdf()`,
so each call created a new class object. After opening two scipy-backed
datasets from file-like objects, the first dataset's class reference became
unreachable by qualname, causing pickle's class-identity check to fail with:
PicklingError: Can't pickle
<class 'xarray.backends.scipy_._PickleWorkaround.flush_only_netcdf_file'>:
it's not the same object as
xarray.backends.scipy_._PickleWorkaround.flush_only_netcdf_file
Fix: create the class once in `_get_flush_only_class()`, set its
`__qualname__` to a module-level name, and register it as a module
attribute so pickle can always resolve it.
Regression test included.
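The identity check described in this commit message can be reproduced with the standard library alone. The following is a minimal sketch, not the code from `xarray/backends/scipy_.py`: the module name `flush_demo` and the helpers `class_per_call` and `class_once` are hypothetical stand-ins chosen for illustration.

```python
import pickle
import sys
import types

# Hypothetical stand-in module; registering it in sys.modules lets pickle
# import it by name regardless of how this example is executed.
demo = types.ModuleType("flush_demo")
sys.modules["flush_demo"] = demo


def class_per_call():
    """Buggy pattern: every call builds and registers a brand-new class object."""

    class FlushOnly:
        pass

    FlushOnly.__module__ = "flush_demo"
    FlushOnly.__qualname__ = "FlushOnly"
    demo.FlushOnly = FlushOnly  # replaces the class registered by any earlier call
    return FlushOnly


_cached_class = None


def class_once():
    """Fixed pattern: build the class a single time and reuse it afterwards."""
    global _cached_class
    if _cached_class is None:

        class FlushOnlyOnce:
            pass

        FlushOnlyOnce.__module__ = "flush_demo"
        FlushOnlyOnce.__qualname__ = "FlushOnlyOnce"
        demo.FlushOnlyOnce = FlushOnlyOnce
        _cached_class = FlushOnlyOnce
    return _cached_class


# Two calls to the buggy factory: the first instance's class is no longer
# the object that the name flush_demo.FlushOnly resolves to.
first = class_per_call()()
second = class_per_call()()
try:
    pickle.dumps(first)
    first_pickles = True
except pickle.PicklingError:
    first_pickles = False

# With the cached class, every instance shares one class object.
a = class_once()()
b = class_once()()
both_round_trip = all(
    isinstance(pickle.loads(pickle.dumps(obj)), class_once()) for obj in (a, b)
)
```

Pickle stores classes by reference (module name plus qualified name) and verifies that the name still resolves to the same object, which is why caching the class is sufficient here.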
The line dynamically adds an attribute to the module for pickle resolution, a pattern mypy cannot track. This commit fixes the resulting mypy error:
xarray/backends/scipy_.py:168: error: Module has no attribute "flush_only_netcdf_file" [attr-defined]
When calling reduce operations like `sum`/`mean` on a Dataset with scalar (non-dimensional) data variables, the `reduce_maybe_single` logic would set `axis=None` for 0-d variables with no matching reduce dims. This caused numpy to attempt reduction on the scalar value itself, which failed for non-numeric types like strings.

The fix adds a check that `reduce_dims` is non-empty before setting `reduce_maybe_single=None`. When `reduce_dims` is empty (the variable doesn't have the target dimension), `reduce_maybe_single` stays as `[]` (empty list), which triggers the `invariant_0d` check in `duck_array_ops` and returns the scalar value unchanged.

Closes pydata#11417
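The branch change described in this commit message can be modelled in a few lines. This is a simplified sketch built only from the conditions stated above, not the actual body of `Dataset.reduce()`; the function names `pick_reduce_arg_before` and `pick_reduce_arg_after` are hypothetical.

```python
def pick_reduce_arg_before(reduce_dims, ndim):
    """Model of the old branch: None means 'reduce over everything'."""
    if len(reduce_dims) == ndim and ndim != 1:
        return None
    return reduce_dims


def pick_reduce_arg_after(reduce_dims, ndim):
    """Model of the fixed branch: an empty reduce_dims never becomes None."""
    if reduce_dims and len(reduce_dims) == ndim and ndim != 1:
        return None
    return reduce_dims


# A 0-d variable that lacks the target dimension: reduce_dims is empty, ndim is 0.
scalar_before = pick_reduce_arg_before([], 0)
scalar_after = pick_reduce_arg_after([], 0)

# A 2-d variable reduced over both of its dimensions is unaffected by the fix.
full_before = pick_reduce_arg_before(["x", "y"], 2)
full_after = pick_reduce_arg_after(["x", "y"], 2)
```

In this model only the empty-`reduce_dims` case changes: it now stays `[]` instead of collapsing to `None`.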
for more information, see https://pre-commit.ci
Problem
When calling `Dataset.sum()` (or other numeric-only reduce operations like `mean`, `std`, etc.) on a dataset with scalar (0-dimensional) non-numeric data variables, it raises:

Reproduction
Root Cause
In `Dataset.reduce()`, the `reduce_maybe_single` logic determines which axis to pass to the reduction function. For a 0-dimensional scalar variable with no matching reduce dimensions:
- `reduce_dims = []` (empty)
- `len(reduce_dims) == var.ndim` → `0 == 0` → True
- `var.ndim != 1` → `0 != 1` → True
- `reduce_maybe_single = None` (reduce over all dims)

This causes `var.reduce(func, dim=None)`, which calls `func(self.data)` without an axis argument. For string scalars, this fails because numpy's `add.reduce` can't handle string concatenation.

Fix
Added a check that `reduce_dims` is non-empty before setting `reduce_maybe_single = None`. When `reduce_dims` is empty (the variable doesn't have the target dimension), `reduce_maybe_single` stays as `[]` (empty list), which triggers the `invariant_0d` check in `duck_array_ops` and returns the scalar value unchanged.

Test
Added `test_reduce_string_scalar` to verify that:
- `sum('index')` preserves string scalar variables
- `mean('index')` preserves string scalar variables
- `min('index')` preserves string scalar variables
- `max('index')` preserves string scalar variables

Closes #11417
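To illustrate why an empty axis selection leaves the scalar untouched while `None` does not, here is a toy model of the short-circuit in plain Python. It assumes the reduction wrapper returns its input when asked to reduce over no axes; `strict_sum` and `reduce_0d` are hypothetical stand-ins, not functions from `duck_array_ops` or numpy.

```python
def strict_sum(value):
    """Stand-in for a numeric-only reduction: rejects anything that is not a number."""
    if not isinstance(value, (int, float)):
        raise TypeError(f"cannot reduce value of type {type(value).__name__}")
    return value


def reduce_0d(func, value, axis):
    """Apply func to a 0-d value, skipping the call when no axes are requested."""
    if axis == ():  # nothing to reduce over, so the value is invariant
        return value
    return func(value)


# axis=None (the scalar's path before the fix): the reduction runs and fails.
try:
    reduce_0d(strict_sum, "abc", None)
    string_reduction_failed = False
except TypeError:
    string_reduction_failed = True

# axis=() (the scalar's path after the fix): the value passes through unchanged.
preserved = reduce_0d(strict_sum, "abc", ())
```

Numeric scalars behave the same on either path in this model, so the short-circuit only matters for values the reduction cannot handle.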