Skip to content

Conversation

@Illviljan
Copy link
Contributor

@Illviljan Illviljan commented Dec 6, 2025

The non-flox version reduces chunksizes significantly:

x = xr.DataArray([1, 1, 1, 1, 1], name="x").chunk()
grp_idx = xr.DataArray([-1, 0, 0, -1, 1])
with xr.set_options(use_flox=False):
    print(x.groupby(grp_idx).cumsum())
<xarray.DataArray 'x' (dim_0: 5)> Size: 40B
dask.array<getitem, shape=(5,), dtype=int64, chunksize=(2,), chunktype=numpy.ndarray>
Dimensions without coordinates: dim_0

With flox the chunksize is retained:

x = xr.DataArray([1, 1, 1, 1, 1], name="x").chunk()
grp_idx = xr.DataArray([-1, 0, 0, -1, 1])
with xr.set_options(use_flox=True):
    print(x.groupby(grp_idx).cumsum())
<xarray.DataArray 'x' (dim_0: 5)> Size: 40B
dask.array<_finalize_scan, shape=(5,), dtype=int64, chunksize=(5,), chunktype=numpy.ndarray>
Dimensions without coordinates: dim_0

Other changes:

  • Changes DataArray.cumsum/Dataset.cumsum/DataTree.cumsum/DataArray.groupby.cumsum/Dataset.groupby.cumsum etc.
  • Coordinates are now retained

Notes
groupby_scan was added in: https://github.com/xarray-contrib/flox/releases/tag/v0.9.9
cumsum was added in: https://github.com/xarray-contrib/flox/releases/tag/v0.10.5

Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
**kwargs,
)

# Prefer Dataset.func(...) over Dataset.reduce(func, ...):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed

cc @TomNicholas

Copy link
Contributor Author

@Illviljan Illviljan Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of #6528 and #10987 (comment). Since the problem of Datasets dropping the coords is solved in .cumsum in this PR, DataTree must also use ds.cumsum instead of ds.reduce("cumsum") in order to avoid dropping the coordinates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's update the code comment to reflect your comment above

@Illviljan
Copy link
Contributor Author

@dcherian, this is ready for a another review now. It was only changes in tests since the last time.

@dcherian
Copy link
Contributor

dcherian commented Jan 8, 2026

Are you able to address the extra testing requested in #10987 (comment)?

If you're too busy, we can just merge. This is a good improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

topic-DataTree Related to the implementation of a DataTree class topic-groupby

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cumsum drops index coordinates

2 participants