Description
What happened?
When loading a zarr-backed DataArray via an fsspec URL, xarray appears to treat the load as a request for a Dataset rather than a DataArray if the DataArray has coordinates. It then tries to load the coordinate as a distinct variable within the file, where it is not present.
This issue does not occur when loading a DirectoryStore-backed zarr; the coordinate dimension is loaded successfully.
What did you expect to happen?
Behaviour should be identical between DirectoryStore-backed DataArrays and fsspec-backed DataArrays, and both should support arrays with coordinates.
Minimal Complete Verifiable Example
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "xarray[complete]@git+https://github.com/pydata/xarray.git@main",
# ]
# ///
#
# This script automatically imports the development branch of xarray to check for issues.
# Please delete this header if you have _not_ tested this script with `uv run`!
import xarray as xr
xr.show_versions()
# your reproducer code ...
import numpy as np
import os
os.system('rm -rf good.zipstore.zip')
os.system('rm -rf bad.zipstore.zip')
os.system('rm -rf good.dirstore.zarr')
os.system('rm -rf bad.dirstore.zarr')
## Working example, no coordinates
foo = xr.DataArray(data=np.zeros(1),
dims='foo')
# coords={'foo' : [0]})
foo.to_zarr('good.dirstore.zarr',consolidated=False,zarr_format=3).close()
# Make zipstore
os.system('cd good.dirstore.zarr; zip -0rv ../good.zipstore.zip ./')
foo_dirstore = xr.load_dataarray('good.dirstore.zarr',engine='zarr',zarr_format=3,consolidated=False)
print(foo_dirstore)
# <xarray.DataArray (foo: 1)> Size: 8B
# array([0.])
# Dimensions without coordinates: foo
foo_zipstore = xr.load_dataarray('zip:///::good.zipstore.zip',engine='zarr',zarr_format=3,consolidated=False)
print(foo_zipstore)
# <xarray.DataArray (foo: 1)> Size: 8B
# array([0.])
# Dimensions without coordinates: foo
## Broken example, with coordinates
foo = xr.DataArray(data=np.zeros(1),
dims='foo',
coords={'foo' : [0]})
foo.to_zarr('bad.dirstore.zarr',consolidated=False,zarr_format=3).close()
# Make zipstore
os.system('cd bad.dirstore.zarr; zip -0rv ../bad.zipstore.zip ./')
foo_dirstore = xr.load_dataarray('bad.dirstore.zarr',engine='zarr',zarr_format=3,consolidated=False)
print(foo_dirstore)
# <xarray.DataArray (foo: 1)> Size: 8B
# array([0.])
# Coordinates:
# * foo (foo) int64 8B 0
foo_zipstore = xr.load_dataarray('zip:///::bad.zipstore.zip',engine='zarr',zarr_format=3,consolidated=False)
print(foo_zipstore)
# KeyError Traceback (most recent call last)
# /tmp/ipython-input-2393741926.py in <cell line: 0>()
# ----> 1 foo_zipstore = xr.load_dataarray('zip:///::bad.zipstore.zip',engine='zarr',zarr_format=3,consolidated=False)
# 2 print(foo_zipstore)
# ...
# /usr/lib/python3.12/zipfile/__init__.py in getinfo(self, name)
# 1547 info = self.NameToInfo.get(name)
# 1548 if info is None:
# -> 1549 raise KeyError(
# 1550 'There is no item named %r in the archive' % name)
# 1551
# KeyError: "There is no item named 'foo/c/0' in the archive"
Steps to reproduce
See MCVE: write a DataArray containing coordinates to a DirectoryStore, zip it to a ZipStore, and open the file via an fsspec zip:// URL. MCVE tested on Google Colab.
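One way to narrow this down is to check whether the key named in the traceback (`foo/c/0`) is actually a member of the zip archive. The sketch below uses only the standard-library `zipfile` module. The helper name `missing_members` is hypothetical, and the demo archive contents are illustrative assumptions, not the real contents of `bad.zipstore.zip`.

```python
import os
import tempfile
import zipfile


def missing_members(zip_path, keys):
    """Return the keys that are not present as member names in the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        names = set(zf.namelist())
    return [key for key in keys if key not in names]


# Self-contained demo on a throwaway archive. The member names below are
# illustrative assumptions, not the actual layout of bad.zipstore.zip.
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(demo_zip, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("zarr.json", "{}")
        zf.writestr("foo/zarr.json", "{}")
    demo_missing = missing_members(
        demo_zip, ["zarr.json", "foo/zarr.json", "foo/c/0"]
    )

print(demo_missing)
```

Against the reproducer, the equivalent call would be `missing_members('bad.zipstore.zip', ['foo/c/0'])`. A non-empty result would suggest the chunk file is absent from the archive itself, which would help separate a store-level lookup problem from the Dataset/DataArray handling described above.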
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
<xarray.DataArray (foo: 1)> Size: 8B
array([0.])
Dimensions without coordinates: foo
<xarray.DataArray (foo: 1)> Size: 8B
array([0.])
Dimensions without coordinates: foo
<xarray.DataArray (foo: 1)> Size: 8B
array([0.])
Coordinates:
* foo (foo) int64 8B 0
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/tmp/ipython-input-3205390801.py in <cell line: 0>()
45 # * foo (foo) int64 8B 0
46
---> 47 foo_zipstore = xr.load_dataarray('zip:///::bad.zipstore.zip',engine='zarr',zarr_format=3,consolidated=False)
48 print(foo_zipstore)
49 # KeyError Traceback (most recent call last)
35 frames
/usr/local/lib/python3.12/dist-packages/xarray/backends/api.py in load_dataarray(filename_or_obj, **kwargs)
189 raise TypeError("cache has no effect in this context")
190
--> 191 with open_dataarray(filename_or_obj, **kwargs) as da:
192 return da.load()
193
/usr/local/lib/python3.12/dist-packages/xarray/backends/api.py in open_dataarray(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, create_default_indexes, inline_array, chunked_array_type, from_array_kwargs, backend_kwargs, **kwargs)
811 """
812
--> 813 dataset = open_dataset(
814 filename_or_obj,
815 decode_cf=decode_cf,
/usr/local/lib/python3.12/dist-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, create_default_indexes, inline_array, chunked_array_type, from_array_kwargs, backend_kwargs, **kwargs)
610 **kwargs,
611 )
--> 612 ds = _dataset_from_backend_dataset(
613 backend_ds,
614 filename_or_obj,
/usr/local/lib/python3.12/dist-packages/xarray/backends/api.py in _dataset_from_backend_dataset(backend_ds, filename_or_obj, engine, chunks, cache, overwrite_encoded_chunks, inline_array, chunked_array_type, from_array_kwargs, create_default_indexes, **extra_tokens)
300
301 if create_default_indexes:
--> 302 ds = _maybe_create_default_indexes(backend_ds)
303 else:
304 ds = backend_ds
/usr/local/lib/python3.12/dist-packages/xarray/backends/api.py in _maybe_create_default_indexes(ds)
276 if coord.dims == (name,) and name not in ds.xindexes
277 }
--> 278 return ds.assign_coords(Coordinates(to_index))
279
280
/usr/local/lib/python3.12/dist-packages/xarray/core/coordinates.py in __init__(self, coords, indexes)
313 var = as_variable(data, name=name, auto_convert=False)
314 if var.dims == (name,) and indexes is None:
--> 315 index, index_vars = create_default_index_implicit(var, list(coords))
316 default_indexes.update(dict.fromkeys(index_vars, index))
317 variables.update(index_vars)
/usr/local/lib/python3.12/dist-packages/xarray/core/indexes.py in create_default_index_implicit(dim_variable, all_variables)
1636 else:
1637 dim_var = {name: dim_variable}
-> 1638 index = PandasIndex.from_variables(dim_var, options={})
1639 index_vars = index.create_variables(dim_var)
1640
/usr/local/lib/python3.12/dist-packages/xarray/core/indexes.py in from_variables(cls, variables, options)
718 # preserve wrapped pd.Index (if any)
719 # accessing `.data` can load data from disk, so we only access if needed
--> 720 data = var._data if isinstance(var._data, PandasIndexingAdapter) else var.data # type: ignore[redundant-expr]
721 # multi-index level variable: get level index
722 if isinstance(var._data, PandasMultiIndexingAdapter):
/usr/local/lib/python3.12/dist-packages/xarray/core/variable.py in data(self)
454 duck_array = self._data.array
455 elif isinstance(self._data, indexing.ExplicitlyIndexed):
--> 456 duck_array = self._data.get_duck_array()
457 elif is_duck_array(self._data):
458 duck_array = self._data
/usr/local/lib/python3.12/dist-packages/xarray/core/indexing.py in get_duck_array(self)
941
942 def get_duck_array(self):
--> 943 duck_array = self.array.get_duck_array()
944 # ensure the array object is cached in-memory
945 self.array = as_indexable(duck_array)
/usr/local/lib/python3.12/dist-packages/xarray/core/indexing.py in get_duck_array(self)
895
896 def get_duck_array(self):
--> 897 return self.array.get_duck_array()
898
899 async def async_get_duck_array(self):
/usr/local/lib/python3.12/dist-packages/xarray/core/indexing.py in get_duck_array(self)
735
736 if isinstance(self.array, BackendArray):
--> 737 array = self.array[self.key]
738 else:
739 array = apply_indexer(self.array, self.key)
/usr/local/lib/python3.12/dist-packages/xarray/backends/zarr.py in __getitem__(self, key)
260 elif isinstance(key, indexing.OuterIndexer):
261 method = self._oindex
--> 262 return indexing.explicit_indexing_adapter(
263 key, array.shape, indexing.IndexingSupport.VECTORIZED, method
264 )
/usr/local/lib/python3.12/dist-packages/xarray/core/indexing.py in explicit_indexing_adapter(key, shape, indexing_support, raw_indexing_method)
1127 """
1128 raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support)
-> 1129 result = raw_indexing_method(raw_key.tuple)
1130 if numpy_indices.tuple:
1131 # index the loaded duck array
/usr/local/lib/python3.12/dist-packages/xarray/backends/zarr.py in _getitem(self, key)
223
224 def _getitem(self, key):
--> 225 return self._array[key]
226
227 async def _async_getitem(self, key):
/usr/local/lib/python3.12/dist-packages/zarr/core/array.py in __getitem__(self, selection)
2866 return self.vindex[cast("CoordinateSelection | MaskSelection", selection)]
2867 elif is_pure_orthogonal_indexing(pure_selection, self.ndim):
-> 2868 return self.get_orthogonal_selection(pure_selection, fields=fields)
2869 else:
2870 return self.get_basic_selection(cast("BasicSelection", pure_selection), fields=fields)
/usr/local/lib/python3.12/dist-packages/zarr/core/array.py in get_orthogonal_selection(self, selection, out, fields, prototype)
3337 prototype = default_buffer_prototype()
3338 indexer = OrthogonalIndexer(selection, self.shape, self.metadata.chunk_grid)
-> 3339 return sync(
3340 self.async_array._get_selection(
3341 indexer=indexer, out=out, fields=fields, prototype=prototype
/usr/local/lib/python3.12/dist-packages/zarr/core/sync.py in sync(coro, loop, timeout)
157
158 if isinstance(return_result, BaseException):
--> 159 raise return_result
160 else:
161 return return_result
/usr/local/lib/python3.12/dist-packages/zarr/core/sync.py in _runner(coro)
117 """
118 try:
--> 119 return await coro
120 except Exception as ex:
121 return ex
/usr/local/lib/python3.12/dist-packages/zarr/core/array.py in _get_selection(self, indexer, prototype, out, fields)
1563
1564 # reading chunks and decoding them
-> 1565 await self.codec_pipeline.read(
1566 [
1567 (
/usr/local/lib/python3.12/dist-packages/zarr/core/codec_pipeline.py in read(self, batch_info, out, drop_axes)
471 drop_axes: tuple[int, ...] = (),
472 ) -> None:
--> 473 await concurrent_map(
474 [
475 (single_batch_info, out, drop_axes)
/usr/local/lib/python3.12/dist-packages/zarr/core/common.py in concurrent_map(items, func, limit)
114 return await func(*item)
115
--> 116 return await asyncio.gather(*[asyncio.ensure_future(run(item)) for item in items])
117
118
/usr/local/lib/python3.12/dist-packages/zarr/core/common.py in run(item)
112 async def run(item: tuple[Any]) -> V:
113 async with sem:
--> 114 return await func(*item)
115
116 return await asyncio.gather(*[asyncio.ensure_future(run(item)) for item in items])
/usr/local/lib/python3.12/dist-packages/zarr/core/codec_pipeline.py in read_batch(self, batch_info, out, drop_axes)
268 out[out_selection] = fill_value_or_default(chunk_spec)
269 else:
--> 270 chunk_bytes_batch = await concurrent_map(
271 [(byte_getter, array_spec.prototype) for byte_getter, array_spec, *_ in batch_info],
272 lambda byte_getter, prototype: byte_getter.get(prototype),
/usr/local/lib/python3.12/dist-packages/zarr/core/common.py in concurrent_map(items, func, limit)
114 return await func(*item)
115
--> 116 return await asyncio.gather(*[asyncio.ensure_future(run(item)) for item in items])
117
118
/usr/local/lib/python3.12/dist-packages/zarr/core/common.py in run(item)
112 async def run(item: tuple[Any]) -> V:
113 async with sem:
--> 114 return await func(*item)
115
116 return await asyncio.gather(*[asyncio.ensure_future(run(item)) for item in items])
/usr/local/lib/python3.12/dist-packages/zarr/storage/_common.py in get(self, prototype, byte_range)
166 if prototype is None:
167 prototype = default_buffer_prototype()
--> 168 return await self.store.get(self.path, prototype=prototype, byte_range=byte_range)
169
170 async def set(self, value: Buffer) -> None:
/usr/local/lib/python3.12/dist-packages/zarr/storage/_fsspec.py in get(self, key, prototype, byte_range)
287 try:
288 if byte_range is None:
--> 289 value = prototype.buffer.from_bytes(await self.fs._cat_file(path))
290 elif isinstance(byte_range, RangeByteRequest):
291 value = prototype.buffer.from_bytes(
/usr/local/lib/python3.12/dist-packages/fsspec/implementations/asyn_wrapper.py in wrapper(*args, **kwargs)
25 @functools.wraps(func)
26 async def wrapper(*args, **kwargs):
---> 27 return await asyncio.to_thread(func, *args, **kwargs)
28
29 return wrapper
/usr/lib/python3.12/asyncio/threads.py in to_thread(func, *args, **kwargs)
23 ctx = contextvars.copy_context()
24 func_call = functools.partial(ctx.run, func, *args, **kwargs)
---> 25 return await loop.run_in_executor(None, func_call)
/usr/lib/python3.12/concurrent/futures/thread.py in run(self)
57
58 try:
---> 59 result = self.fn(*self.args, **self.kwargs)
60 except BaseException as exc:
61 self.future.set_exception(exc)
/usr/local/lib/python3.12/dist-packages/fsspec/spec.py in cat_file(self, path, start, end, **kwargs)
767 """
768 # explicitly set buffering off?
--> 769 with self.open(path, "rb", **kwargs) as f:
770 if start is not None:
771 if start >= 0:
/usr/local/lib/python3.12/dist-packages/fsspec/spec.py in open(self, path, mode, block_size, cache_options, compression, **kwargs)
1308 else:
1309 ac = kwargs.pop("autocommit", not self._intrans)
-> 1310 f = self._open(
1311 path,
1312 mode=mode,
/usr/local/lib/python3.12/dist-packages/fsspec/implementations/zip.py in _open(self, path, mode, block_size, autocommit, cache_options, **kwargs)
128 if "r" in self.mode and "w" in mode:
129 raise OSError("ZipFS can only be open for reading or writing, not both")
--> 130 out = self.zip.open(path, mode.strip("b"), force_zip64=self.force_zip_64)
131 if "r" in mode:
132 info = self.info(path)
/usr/lib/python3.12/zipfile/__init__.py in open(self, name, mode, pwd, force_zip64)
1619 else:
1620 # Get info object for name
-> 1621 zinfo = self.getinfo(name)
1622
1623 if mode == 'w':
/usr/lib/python3.12/zipfile/__init__.py in getinfo(self, name)
1547 info = self.NameToInfo.get(name)
1548 if info is None:
-> 1549 raise KeyError(
1550 'There is no item named %r in the archive' % name)
1551
KeyError: "There is no item named 'foo/c/0' in the archive"
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.12.12 (main, Oct 10 2025, 08:52:57) [GCC 11.4.0]
python-bits: 64
OS: Linux
OS-release: 6.6.105+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.6
libnetcdf: 4.9.3
xarray: 2025.11.1.dev8+g6e82a3afa
pandas: 2.2.2
numpy: 2.0.2
scipy: 1.16.3
netCDF4: 1.7.3
pydap: 3.5.8
h5netcdf: 1.7.3
h5py: 3.15.1
zarr: 3.1.5
cftime: 1.6.5
nc_time_axis: 1.4.1
iris: None
bottleneck: 1.4.2
dask: 2025.9.1
distributed: 2025.9.1
matplotlib: 3.10.0
cartopy: 0.25.0
seaborn: 0.13.2
numbagg: 0.9.3
fsspec: 2025.3.0
cupy: 13.6.0
pint: None
sparse: 0.17.0
flox: 0.10.7
numpy_groupies: 0.11.3
setuptools: 75.2.0
pip: 24.1.2
conda: None
pytest: 8.4.2
mypy: None
IPython: 7.34.0
sphinx: 8.2.3