diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000..1ecca1a --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,75 @@ +# AGENTS.md + +Guidance for automation tools and AI coding assistants working in this +repository. + +## Purpose + +Xee is an Xarray backend for Google Earth Engine. The primary user API is: + +- `xr.open_dataset(..., engine='ee')` + +When making workflow examples, integrations, or docs updates, optimize for this +user-facing API first. + +## Current Version Context + +- Xee `v0.1.0` is a refactor with breaking changes. +- Some repository examples are still pre-v0.1.0 and are being updated. +- For migration context, see `docs/migration-guide-v0.1.0.md`. + +## Canonical Dev Commands + +Use Pixi environments so behavior is reproducible across platforms. + +- Unit tests: `pixi run -e tests pytest -q xee/ext_test.py` +- Integration tests: `pixi run -e tests pytest -q xee/ext_integration_test.py` +- Docs build: `pixi run -e docs docs-build` +- Docs strict check: `pixi run -e docs docs-check` + +Before proposing completion, run at least unit tests for touched areas and +`docs-check` for docs changes. + +## Integration Guidance (for tools helping users adopt Xee) + +1. Prefer `xr.open_dataset(..., engine='ee')` examples over backend internals. +2. Show one of these grid strategies explicitly: + - `helpers.extract_grid_params(...)` for matching source grid + - `helpers.fit_geometry(..., grid_shape=...)` for fixed output shape + - `helpers.fit_geometry(..., grid_scale=...)` for fixed physical resolution +3. Use consistent endpoint advice: + - High-volume endpoint for stored collections + - Standard endpoint for computed collections / iterative workflows +4. Mention that both plain asset IDs and `ee://...` forms are accepted. +5. Prefer AOI wording as: `AOI (area of interest)` on first use. 
+ +## Files To Prefer For Source-of-Truth + +- Install/setup: `docs/installation.md` +- First-user flow: `docs/quickstart.md` +- Concepts/terminology: `docs/concepts.md` +- Canonical parameter reference: `docs/open_dataset.md` +- Performance guidance: `docs/performance.md` +- Contributor process and required checks: `docs/contributing.md` + +## Documentation Expectations + +If behavior or API usage changes, update docs in the same change where practical: + +- Update user-facing docs first, then examples. +- Avoid adding duplicate guidance in many places; link to canonical pages. +- Keep examples explicit about grid parameters and endpoint assumptions. + +## Avoid + +- Recommending backend internals (`EarthEngineStore`) as primary user entrypoint. +- Adding new examples that depend on outdated pre-v0.1.0 assumptions. +- Mixing contradictory endpoint guidance across docs. +- Introducing new terminology variants when established wording exists. + +## PR-Ready Checklist + +- Code and docs are aligned with current `v0.1.0` guidance. +- Relevant tests or checks were run and summarized. +- New docs links resolve and `docs-check` passes. +- Any known limitations are stated explicitly. diff --git a/README.md b/README.md index 9d659ea..6f9d36d 100644 --- a/README.md +++ b/README.md @@ -1,12 +1,9 @@ > **⚠️ Breaking Change in v0.1.0** > -> A major refactor was released in v0.1.0, introducing breaking changes to the Xee API. In most cases, existing code written for pre-v0.1.0 versions will require updates to remain compatible. +> v0.1.0 includes a major refactor with breaking API changes. > -> - See the [Migration Guide](https://github.com/google/Xee/blob/main/docs/migration-guide-v0.1.0.md) for details on updating your code. -> - If you need more time to migrate, you can pin your environment to the latest pre-v0.1.0 release. -> -> **During the v0.1.0 prerelease window:** `pip install xee` and `conda install xee` may still install the previous stable line. 
-> To use the refactored API documented here, install a prerelease with `pip install --upgrade --pre xee` or pin an RC such as `pip install xee==0.1.0rc1`. +> - Migration steps: [docs/migration-guide-v0.1.0.md](docs/migration-guide-v0.1.0.md) +> - Canonical install options (prerelease vs stable): [docs/installation.md](docs/installation.md) # Xee: Xarray + Google Earth Engine @@ -22,31 +19,14 @@ Xee is an Xarray backend for Google Earth Engine. Open `ee.Image` / `ee.ImageCol ## Install -For the refactored v0.1.0 API documented below (prerelease period): +For the latest v0.1.0 prerelease: ```bash pip install --upgrade --pre xee ``` -or pin a specific release candidate: - -```bash -pip install xee==0.1.0rc1 -``` - -For the current stable line (pre-v0.1.0 API): - -```bash -pip install --upgrade xee -``` - -or - -```bash -conda install -c conda-forge xee -``` - -Note: conda-forge may lag PyPI during prerelease testing. Use pip for the latest RC builds. +For all installation paths (including stable line and conda), see +[docs/installation.md](docs/installation.md). 
## Minimal example @@ -59,7 +39,7 @@ from xee import helpers # earthengine authenticate project = 'PROJECT-ID' # Set your Earth Engine registered Google Cloud project ID -# Initialize (high‑volume endpoint recommended for reading stored collections) +# Initialize (high-volume endpoint recommended for reading stored collections) ee.Initialize(project=project, opt_url='https://earthengine-highvolume.googleapis.com') # Open a dataset by matching its native grid @@ -71,9 +51,9 @@ print(ds) Next steps: -- [Quickstart](https://github.com/google/Xee/blob/main/docs/quickstart.md) -- [Concepts (grid params, CRS, orientation)](https://github.com/google/Xee/blob/main/docs/concepts.md) -- [User Guide (workflows)](https://github.com/google/Xee/blob/main/docs/guide.md) +- [Quickstart](docs/quickstart.md) +- [Concepts (grid params, CRS, orientation)](docs/concepts.md) +- [User Guide (workflows)](docs/guide.md) ## Features diff --git a/docs/README.md b/docs/README.md index 086eef1..27d57ee 100644 --- a/docs/README.md +++ b/docs/README.md @@ -2,10 +2,10 @@ > **⚠️ Breaking Change in v0.1.0** > -> A major refactor was released in v0.1.0, introducing breaking changes to the Xee API. In most cases, existing code written for pre-v0.1.0 versions will require updates to remain compatible. +> v0.1.0 includes breaking API changes. > -> - See the [Migration Guide](migration-guide-v0.1.0.md) for details on updating your code. -> - If you need more time to migrate, you can pin your environment to the latest pre-v0.1.0 release. 
+> - Migration steps: [migration-guide-v0.1.0.md](migration-guide-v0.1.0.md) +> - Install options (prerelease vs stable): [installation.md](installation.md) ## For nicely rendered documentation diff --git a/docs/concepts.md b/docs/concepts.md index 8978228..0a9ae26 100644 --- a/docs/concepts.md +++ b/docs/concepts.md @@ -31,6 +31,7 @@ The tuple follows the [Rasterio/`affine.Affine`](https://affine.readthedocs.io/e Instead of constructing these manually, prefer helpers: +- AOI means area of interest. - `extract_grid_params(obj)`: Match an `ee.Image` or `ee.ImageCollection` source grid. - `fit_geometry(geometry, grid_crs, grid_scale=(x, y))`: Define pixel size (resolution) over an AOI. - `fit_geometry(geometry, grid_crs, grid_shape=(w, h))`: Define output array dimensions, letting resolution float. @@ -45,7 +46,7 @@ Datasets are returned as `[time, y, x]` aligning with CF conventions and most ge ## Stored vs Computed Collections -- Stored: unmodified `ee.ImageCollection('ID')` — use high‑volume endpoint for throughput. +- Stored: unmodified `ee.ImageCollection('ID')` — use high-volume endpoint for throughput. - Computed: collections after `.map()`, `.select()`, filtering, band math — standard endpoint sometimes more efficient due to caching. ## Choosing a Grid Strategy diff --git a/docs/examples.md b/docs/examples.md new file mode 100644 index 0000000..ec03a1c --- /dev/null +++ b/docs/examples.md @@ -0,0 +1,38 @@ +--- +title: Examples +orphan: true +--- + +# Examples + +```{admonition} Status note +:class: warning + +Most examples linked here currently target the pre-v0.1.0 API and may not work +as-is with the refactored v0.1.0 interface documented in this site. + +These examples are being updated and expanded. +``` + +This page points to runnable end-to-end examples maintained in the repository. 
+ +## Core examples + +- [examples/README.md](https://github.com/google/Xee/blob/main/examples/README.md) +- [examples/ee_to_zarr.py](https://github.com/google/Xee/blob/main/examples/ee_to_zarr.py) +- [examples/ee_to_zarr_reqs.txt](https://github.com/google/Xee/blob/main/examples/ee_to_zarr_reqs.txt) + +## Dataflow pipeline example + +- [examples/dataflow/README.md](https://github.com/google/Xee/blob/main/examples/dataflow/README.md) +- [examples/dataflow/ee_to_zarr_dataflow.py](https://github.com/google/Xee/blob/main/examples/dataflow/ee_to_zarr_dataflow.py) +- [examples/dataflow/requirements.txt](https://github.com/google/Xee/blob/main/examples/dataflow/requirements.txt) +- [examples/dataflow/Dockerfile](https://github.com/google/Xee/blob/main/examples/dataflow/Dockerfile) + +## Choosing where to start + +- New to Xee: start with [Quickstart](quickstart.md), then + [examples/README.md](https://github.com/google/Xee/blob/main/examples/README.md). +- Need reproducible chunked outputs: use + [examples/ee_to_zarr.py](https://github.com/google/Xee/blob/main/examples/ee_to_zarr.py). +- Need scalable batch execution: use the Dataflow example set. diff --git a/docs/guide.md b/docs/guide.md index 1678fc4..5b24fae 100644 --- a/docs/guide.md +++ b/docs/guide.md @@ -2,6 +2,14 @@ This guide collects practical workflows. For underlying theory see [Core Concepts](concepts.md). For a minimal setup see the [Quickstart](quickstart.md). +```{admonition} Where do these parameters come from? +:class: tip + +All examples in this guide ultimately call `xr.open_dataset(..., engine='ee')`. +Use [Open Dataset Reference](open_dataset.md) for the complete parameter list, +defaults, and runtime behavior. +``` + ## Match Source Grid Use `helpers.extract_grid_params` to mirror the dataset's native projection & resolution. 
@@ -15,7 +23,7 @@ grid_params = helpers.extract_grid_params(ic) ds = xr.open_dataset(ic, engine='ee', **grid_params) ``` -## Fit Area to a Shape +## Fit Area of Interest (AOI) to a Shape Derive a grid that covers an AOI with a fixed pixel count (resolution floats). @@ -33,7 +41,7 @@ grid_params = helpers.fit_geometry( ds = xr.open_dataset('ee://ECMWF/ERA5_LAND/MONTHLY_AGGR', engine='ee', **grid_params) ``` -## Fit Area to a Scale (Resolution) +## Fit Area of Interest (AOI) to a Scale (Resolution) Fix physical pixel size; grid dimensions derived from AOI extent. @@ -159,4 +167,5 @@ temp_slice.plot() - [Core Concepts](concepts.md) - [Performance & Limits](performance.md) - [FAQ](faq.md) -- Examples: see [examples](https://github.com/google/Xee/tree/main/examples) directory in the repository +- Repository examples (legacy pre-v0.1.0, update in progress): + diff --git a/docs/index.md b/docs/index.md index 82dc1e7..e396122 100644 --- a/docs/index.md +++ b/docs/index.md @@ -21,6 +21,7 @@ quickstart installation concepts guide +open_dataset client-vs-server performance api diff --git a/docs/installation.md b/docs/installation.md index 33c7750..258064a 100644 --- a/docs/installation.md +++ b/docs/installation.md @@ -106,7 +106,7 @@ top of your script, include one of the following expressions with the `project` argument modified to match the Google Cloud project ID enabled and registered for Earth Engine use. -#### High-volume endpoint (bulk stored data) +#### High-volume endpoint (stored collections) If you are requesting stored data (supplying a collection ID or passing an unmodified `ee.ImageCollection()` object to `xarray.open_dataset`), connect to @@ -120,7 +120,7 @@ ee.Initialize( ) ``` -#### Standard endpoint (computed / cached) +#### Standard endpoint (computed collections / iterative development) If you are requesting computed data (applying expressions to the data), consider connecting to the [standard @@ -131,3 +131,5 @@ something about the request. 
```python ee.Initialize(project='your-project-id') ``` + +For more tuning guidance, see [Performance & Limits](performance.md). diff --git a/docs/open_dataset.md b/docs/open_dataset.md new file mode 100644 index 0000000..614572b --- /dev/null +++ b/docs/open_dataset.md @@ -0,0 +1,309 @@ +# Open Dataset Reference (`engine='ee'`) + +This page is the canonical user-facing reference for calling: + +```python +xr.open_dataset(..., engine='ee') +``` + +## How The Call Chain Works + +In plain terms: + +1. You call `xarray.open_dataset(..., engine='ee')`. +2. Xarray routes that call to Xee's backend entrypoint method: + `xee.EarthEngineBackendEntrypoint.open_dataset`. +3. That entrypoint creates and uses `xee.EarthEngineStore` internally to stream + pixels and metadata. + +`EarthEngineStore` is an internal/core backend type. Most users should treat +`xr.open_dataset(..., engine='ee')` as the public API and use this page as the +parameter reference. + +Related API pages: + +- [EarthEngineBackendEntrypoint autosummary](_autosummary/xee.EarthEngineBackendEntrypoint) +- [EarthEngineStore autosummary](_autosummary/xee.EarthEngineStore) + +## Required vs Optional Parameters + +When `engine='ee'`, the grid parameters are required at call time: + +- `crs` +- `crs_transform` +- `shape_2d` + +Most other parameters are optional tuning or decoding controls. + +Input source (`filename_or_obj`) can be one of: + +- An `ee.ImageCollection` object +- An `ee.Image` object (wrapped internally as an ImageCollection) +- An asset id string/path, including `ee://...` / `ee:...` style URIs + +## Canonical Parameter List + +The signature and parameter docs below are rendered from the backend method +used at runtime, so this reference stays aligned with implementation behavior. + +```{eval-rst} +.. currentmodule:: xee +.. 
automethod:: EarthEngineBackendEntrypoint.open_dataset +``` + +## Parameter Name Mapping (User API vs Core Backend) + +Most users should pass arguments to `xr.open_dataset(..., engine='ee')`. +Some names differ in the core backend API (`EarthEngineStore.open`). + +| User-facing (`xr.open_dataset`) | Core backend (`EarthEngineStore.open`) | Notes | +|---|---|---| +| `filename_or_obj` | `image_collection` | Backend always operates on an `ee.ImageCollection` | +| `io_chunks` | `chunk_store` / `chunks` | Same concept, different name at different layers | +| `ee_mask_value` | `mask_value` | Same behavior | + +If you are reading backend API pages, these name differences are expected. + +## Practical Parameter Guide + +The list below explains the most common practical usage patterns for parameters +you may see in user docs and backend API docs. + +### `image_collection` (`ee.ImageCollection`) + +- Backend/core parameter corresponding to user-facing `filename_or_obj`. +- You usually pass either an EE object (`ee.ImageCollection`/`ee.Image`) or + an asset URI string into `xr.open_dataset`; Xee normalizes to an + `ee.ImageCollection` internally. +- Asset paths usually come from either: + - The public Earth Engine Data Catalog: + + - The Awesome GEE Community Catalog (community datasets): + + - Your own Earth Engine assets (personal, team, or project-owned): + +- Example catalog path: + `ECMWF/ERA5_LAND/MONTHLY_AGGR` (or URI form `ee://ECMWF/ERA5_LAND/MONTHLY_AGGR`). + +### `crs` (`str`) + +- Output coordinate reference system for all opened variables. +- Required at runtime for `engine='ee'`. +- Prefer `helpers.extract_grid_params(...)` / `helpers.fit_geometry(...)` + unless you explicitly need a manual override. + +### `crs_transform` (`tuple[float, float, float, float, float, float] | Affine`) + +- Geotransform defining pixel size/origin in the selected CRS. +- Required at runtime for `engine='ee'`. 
+- Keep this consistent with `shape_2d`; mismatches can cause confusing bounds + or orientation outcomes. + +### `shape_2d` (`tuple[int, int]`) + +- Pixel grid size in `(width, height)` order. +- Required at runtime for `engine='ee'`. +- Large shapes increase memory and request pressure. + +### `chunks` (`int | dict[Any, Any] | Literal['auto'] | None`) + +- Default: `None`. +- Dask/Xarray chunking in the returned dataset. +- Affects downstream compute scheduling/memory behavior, not just EE request + boundaries. +- Start with modest time chunks and tune only when needed. + +### `n_images` (`int`) + +- Default: `-1` (include all images). +- Limits the number of images loaded from the collection. +- Useful for quick iteration, debugging, or very large collections. + +### `primary_dim_name` (`str | None`) + +- Default: `None` (resolved to `time`). +- Renames the primary stacked dimension. +- Keep the default unless integrating with an existing schema. + +### `primary_dim_property` (`str | None`) + +- Default: `None` (resolved to `system:time_start`). +- EE image property used to derive the primary dimension's coordinate + values. +- Change only if your collection indexing semantics depend on another property. + +### `mask_value` (`float | None`) + +- Default: `None` (resolved to `np.iinfo(np.int32).max`, i.e. `2147483647`). +- Backend/core mask sentinel corresponding to user-facing `ee_mask_value`. +- Sentinel used when converting EE nodata pixels to masked (NaN) values. + +### `request_byte_limit` (`int`) + +- Default: `48 * 1024 * 1024` (48 MB). +- Upper bound for per-request payload size. +- Advanced tuning control: Earth Engine size constraints vary by workload. +- Lower this value if requests fail or become unstable because of their size. +- Avoid increasing it unless validated for your specific dataset/query pattern. + +### `ee_init_kwargs` (`dict[str, Any] | None`) + +- Default: `None`. 
+- Keyword arguments forwarded to `ee.Initialize(...)` during optional worker + auto-initialization. +- Useful in distributed settings where workers need credentials/project config. + +### `ee_init_if_necessary` (`bool`) + +- Default: `False`. +- Whether Xee should attempt EE initialization on demand (commonly for remote + workers). +- Keep `False` for standard local workflows where EE is already initialized. + +### `executor_kwargs` (`dict[str, Any] | None`) + +- Default: `None` (internally treated as `{}`). +- Thread pool settings for parallel pixel retrieval. +- Advanced tuning: increasing worker count may improve throughput or trigger + more rate/quota pressure depending on workload. + +### `getitem_kwargs` (`dict[str, int] | None`) + +- Default: `None` (uses internal defaults: `max_retries=6`, + `initial_delay=500` ms). +- Retry/backoff tuning for array indexing fetches. +- Useful for transient quota/rate errors. +- Tune conservatively (`max_retries`, `initial_delay`) and prefer reducing + concurrency before aggressive retry expansion. + +### `fast_time_slicing` (`bool`) + +- Default: `False`. +- Enables a faster slice path by loading images by ID. +- Important: for computed/modified ImageCollections, this can return original + asset images (looked up by ID) rather than your computed image values. + +## `fast_time_slicing` Deep Dive + +`fast_time_slicing=True` is an important optimization, but it changes how time +slices are resolved. + +What it does: + +- `False` (default): Xee slices directly from the in-memory EE + `ImageCollection` object. +- `True`: Xee slices by `system:id` first and then loads by those IDs. + +Why this can be confusing: + +- If your collection is computed/modified (for example: `.map(...)`, band math, + clipping/masking, or replacing images), slicing by ID can bypass those + computed modifications and return the original images associated with the + IDs. 
+- In other words, `fast_time_slicing=True` can be faster but may not reflect + computed collection transformations. + +When to use it: + +- Good fit: direct/stored collections where you want faster time slicing and + are not depending on computed per-image transformations. +- Use caution: computed collections where transformed pixel values must be + preserved in reads. + +Practical recommendation: + +1. Start with `fast_time_slicing=False` for correctness-sensitive workflows. +2. Enable `fast_time_slicing=True` only after validating that sampled outputs + match your intended processing semantics. +3. If enabled and your collection lacks image IDs, Xee logs a warning and + falls back to default (non-fast) behavior. + +## Common Recipes + +### 1. Match source projection/resolution + +Use this when you want output aligned to the dataset's native grid. + +```python +import ee +import xarray as xr +from xee import helpers + +ic = ee.ImageCollection('ECMWF/ERA5_LAND/MONTHLY_AGGR') +grid = helpers.extract_grid_params(ic) + +ds = xr.open_dataset(ic, engine='ee', **grid) +``` + +### 2. Manual grid override + +Use this when you must align with an external raster/grid spec. + +```python +import xarray as xr + +manual_crs = 'EPSG:4326' +manual_transform = (0.25, 0, -180, 0, -0.25, 90) +manual_shape = (1440, 720) # (width, height) + +ds = xr.open_dataset( + 'ee://ECMWF/ERA5_LAND/MONTHLY_AGGR', + engine='ee', + crs=manual_crs, + crs_transform=manual_transform, + shape_2d=manual_shape, +) +``` + +### 3. Performance/chunking tuning + +Use this when throughput or memory behavior needs tuning. 
+ +```python +import ee +import xarray as xr +from xee import helpers + +ic = ee.ImageCollection('ECMWF/ERA5_LAND/MONTHLY_AGGR') +grid = helpers.extract_grid_params(ic) + +ds = xr.open_dataset( + ic, + engine='ee', + **grid, + chunks={'time': 12}, + io_chunks={'time': 24, 'x': 256, 'y': 256}, + request_byte_limit=32 * 1024 * 1024, +) +``` + +```{admonition} Advanced tuning only +:class: warning + +`io_chunks` and `request_byte_limit` are advanced controls. Earth Engine +imposes response/request size constraints, so these values usually require +trial-and-error for each workload. + +Start from defaults and tune conservatively. In most cases, reducing request +size is safer than increasing it. +``` + +Notes: + +- `chunks` controls Dask chunking in Xarray. +- `io_chunks` controls request windows used by Xee for Earth Engine reads. +- `request_byte_limit` limits per-request payload size. Prefer reducing this if + you encounter request-size failures or unstable reads. +- Avoid increasing `request_byte_limit` unless you have validated behavior + against Earth Engine limits for your specific dataset and query pattern. + +## Object vs URI Inputs + +For `engine='ee'`, these are equivalent in outcome once resolved: + +- Passing an EE object (`ee.ImageCollection`/`ee.Image`) +- Passing a URI/asset id string (`ee://...` style or asset path string) + +Object inputs are often convenient in notebooks where you've already built a +computed collection. URI/asset id strings are useful for concise, declarative +loading and config-driven workflows. diff --git a/docs/performance.md b/docs/performance.md index 4a312f6..946ae95 100644 --- a/docs/performance.md +++ b/docs/performance.md @@ -6,11 +6,13 @@ title: Performance & Limits Guidance for working efficiently within Earth Engine and Xee constraints. +AOI means area of interest. 
+ ## Endpoints | Endpoint | Use case | Notes | |----------|----------|-------| -| High‑volume | Reading stored ImageCollections | Higher throughput, intended for bulk pixel access. | +| High-volume | Reading stored ImageCollections | Higher throughput, intended for bulk pixel access. | | Standard | Computed collections / iterative dev | Caching can accelerate repeated computations. | Switch endpoints by passing / omitting `opt_url` in `ee.Initialize`. diff --git a/docs/quickstart.md b/docs/quickstart.md index 18fb84e..d0cbb7e 100644 --- a/docs/quickstart.md +++ b/docs/quickstart.md @@ -8,32 +8,16 @@ Get up and running with Xee in a few minutes. ## 1. Install -This quickstart uses the refactored v0.1.0 API. +This quickstart uses the refactored v0.1.0 API. For canonical installation +options (prerelease, stable, conda, and upgrade guidance), see +[Installation](installation.md). -Install prerelease from pip: +Fast path (latest v0.1.0 prerelease from pip): ```bash pip install --upgrade --pre xee ``` -or pin a specific RC: - -```bash -pip install xee==0.1.0rc1 -``` - -If you need the current stable line (pre-v0.1.0 API), use: - -```bash -pip install --upgrade xee -``` - -```bash -conda install -c conda-forge xee -``` - -Note: conda-forge may lag PyPI during prerelease testing. - Optional (plotting): `pip install matplotlib`. ## 2. Earth Engine access @@ -53,7 +37,7 @@ import ee ee.Authenticate() ``` -Initialize (high‑volume endpoint recommended for reading stored ImageCollections): +Initialize (high-volume endpoint recommended for reading stored collections): ```python import ee @@ -63,7 +47,7 @@ ee.Initialize( ) ``` -For computed collections (server-side expressions) you can omit `opt_url` to use the standard endpoint which benefits from caching. +For computed collections (server-side expressions), omit `opt_url` to use the standard endpoint, which benefits from caching during iterative development. ## 3. 
Open your first dataset @@ -77,6 +61,31 @@ ds = xr.open_dataset(ic, engine='ee', **grid) print(ds) ``` +```{admonition} Choosing a grid strategy quickly +:class: tip + +- Exploratory analysis: use `helpers.extract_grid_params(...)`. +- Fixed output shape (for model inputs): use `helpers.fit_geometry(..., grid_shape=...)`. +- Fixed physical resolution: use `helpers.fit_geometry(..., grid_scale=...)`. +- Manual `crs` / `crs_transform` / `shape_2d`: advanced alignment workflows only. +``` + +```{admonition} Dataset ID input forms +:class: note + +Xee accepts both plain Earth Engine asset IDs (for example, +`ECMWF/ERA5_LAND/MONTHLY_AGGR`) and URI forms (for example, +`ee://ECMWF/ERA5_LAND/MONTHLY_AGGR`). +``` + +```{admonition} Where do these parameters come from? +:class: tip + +`xr.open_dataset(..., engine='ee')` is the primary user entrypoint for Xee. +For the complete, canonical parameter reference (including defaults and backend +behavior), see [Open Dataset Reference](open_dataset.md). +``` + Plot the first time slice (matplotlib required): ```python @@ -104,6 +113,8 @@ ds['temperature_2m'].isel(time=0).plot() ## 6. Example: custom AOI at fixed size +AOI means area of interest. + ```python import shapely from xee import helpers diff --git a/xee/ext.py b/xee/ext.py index 8669500..945936a 100644 --- a/xee/ext.py +++ b/xee/ext.py @@ -168,6 +168,35 @@ def open( getitem_kwargs: dict[str, int] | None = None, fast_time_slicing: bool = False, ) -> EarthEngineStore: + """Create an EarthEngineStore for a normalized ImageCollection input. + + This method is used internally by the Xarray backend entrypoint. Most users + should call ``xarray.open_dataset(..., engine='ee')`` instead of calling + this API directly. + + Args: + image_collection: Normalized EE ImageCollection to read from. + crs: Output coordinate reference system. + crs_transform: Output geotransform in the selected CRS. + shape_2d: Output pixel grid shape as ``(width, height)``. + mode: Access mode. 
Only ``'r'`` is supported. + chunk_store: Request chunk configuration propagated to store chunks. + n_images: Maximum number of images to include; ``-1`` includes all. + primary_dim_name: Primary dimension name in the resulting dataset. + primary_dim_property: EE image property used for primary coordinate + values. + mask_value: Sentinel value used for mask/nodata conversion. + request_byte_limit: Maximum bytes to request per EE call. + ee_init_kwargs: Optional kwargs for ``ee.Initialize`` when auto init is + enabled. + ee_init_if_necessary: Whether to attempt EE auto-initialization. + executor_kwargs: Optional ThreadPoolExecutor kwargs for parallel fetches. + getitem_kwargs: Optional retry/backoff overrides for robust item fetches. + fast_time_slicing: Enable faster ID-based time slicing path. + + Returns: + A configured ``EarthEngineStore`` instance. + """ if mode != 'r': raise ValueError( f'mode {mode!r} is invalid: data can only be read from Earth Engine.' @@ -926,7 +955,7 @@ def open_dataset( Args: filename_or_obj: An asset ID for an ImageCollection, or an - ee.ImageCollection object. + ``ee.ImageCollection`` object. crs: The coordinate reference system (a CRS code or WKT string). This defines the frame of reference to coalesce all variables upon opening. @@ -936,20 +965,20 @@ def open_dataset( drop_variables (optional): Variables or bands to drop before opening. io_chunks (optional): Specifies the chunking strategy for loading data from EE. By default, this automatically calculates optional chunks based - on the `request_byte_limit`. + on ``request_byte_limit``. n_images (optional): The max number of EE images in the collection to open. Useful when there are a large number of images in the collection since calculating collection size can be slow. -1 indicates that all images should be included. - mask_and_scale (optional): Lazily scale (using scale_factor and - add_offset) and mask (using _FillValue). 
- decode_times (optional): Decode cf times (e.g., integers since "hours - since 2000-01-01") to np.datetime64. + mask_and_scale (optional): Lazily scale (using ``scale_factor`` and + ``add_offset``) and mask (using ``_FillValue``). + decode_times (optional): Decode CF times (e.g., integers since ``"hours + since 2000-01-01"``) to ``np.datetime64``. decode_timedelta (optional): If True, decode variables and coordinates with time units in {"days", "hours", "minutes", "seconds", "milliseconds", "microseconds"} into timedelta objects. If False, leave - them encoded as numbers. If None (default), assume the same value of - decode_time. + them encoded as numbers. If ``None`` (default), assume the same value + of ``decode_times``. use_cftime (optional): Only relevant if encoded dates come from a standard calendar (e.g. "gregorian", "proleptic_gregorian", "standard", or not specified). If None (default), attempt to decode times to @@ -960,35 +989,36 @@ def open_dataset( decode times to ``np.datetime64[ns]`` objects; if this is not possible raise an error. concat_characters (optional): Should character arrays be concatenated to - strings, for example: ["h", "e", "l", "l", "o"] -> "hello" - decode_coords (optional): bool or {"coordinates", "all"}, Controls which - variables are set as coordinate variables: - "coordinates" or True: Set - variables referred to in the ``'coordinates'`` attribute of the datasets - or individual variables as coordinate variables. - "all": Set variables - referred to in ``'grid_mapping'``, ``'bounds'`` and other attributes as - coordinate variables. + strings, for example: ``["h", "e", "l", "l", "o"] -> "hello"``. + decode_coords (optional): ``bool`` or ``{"coordinates", "all"}``. Controls + which variables are set as coordinate variables. Use + ``"coordinates"`` (or ``True``) to set variables referenced by the + ``'coordinates'`` attribute of datasets or individual variables as + coordinate variables. 
Use ``"all"`` to additionally set variables + referenced by ``'grid_mapping'``, ``'bounds'``, and related attributes + as coordinate variables. primary_dim_name (optional): Override the name of the primary dimension of - the output Dataset. By default, the name is 'time'. - primary_dim_property (optional): Override the `ee.Image` property for + the output Dataset. By default, the name is ``'time'``. + primary_dim_property (optional): Override the ``ee.Image`` property for which to derive the values of the primary dimension. By default, this is - 'system:time_start'. + ``'system:time_start'``. ee_mask_value (optional): Value to mask to EE nodata values. By default, - this is 'np.iinfo(np.int32).max' i.e. 2147483647. - request_byte_limit: the max allowed bytes to request at a time from Earth - Engine. By default, it is 48MBs. + this is ``np.iinfo(np.int32).max`` (i.e., ``2147483647``). + request_byte_limit: The max allowed bytes to request at a time from Earth + Engine. By default, it is ``48 * 1024 * 1024`` (48 MB). ee_init_if_necessary: boolean flag to set if auto initialize for Earth - Engine should be attempted. Set to True if using distributed compute + Engine should be attempted. Set to ``True`` if using distributed compute frameworks. ee_init_kwargs: keywords to pass to Earth Engine Initialize when attempting to auto init for remote workers. executor_kwargs (optional): A dictionary of keyword arguments to pass to - the ThreadPoolExecutor that handles the parallel computation of pixels - i.e. {'max_workers': 2}. + the ``ThreadPoolExecutor`` that handles the parallel computation of + pixels, for example ``{'max_workers': 2}``. getitem_kwargs (optional): Exponential backoff kwargs passed into the - xarray function to index the array (`robust_getitem`). - - 'max_retries', the maximum number of retry attempts. Defaults to 6. - - 'initial_delay', the initial delay in milliseconds before the first - retry. Defaults to 500. 
+ xarray function used to index the array (``robust_getitem``). Supported + keys include ``'max_retries'`` (maximum retry attempts, default 6) and + ``'initial_delay'`` (initial delay in milliseconds before the first + retry, default 500). fast_time_slicing (optional): Whether to perform an optimization that makes slicing an ImageCollection across time faster. This optimization loads EE images in a slice by ID, so any modifications to images in a