Skip to content

refactor(errors): unify request errors under a DataRetrievalError root#313

Draft
thodson-usgs wants to merge 1 commit into
DOI-USGS:mainfrom
thodson-usgs:refactor/unify-http-error-handling
Draft

refactor(errors): unify request errors under a DataRetrievalError root#313
thodson-usgs wants to merge 1 commit into
DOI-USGS:mainfrom
thodson-usgs:refactor/unify-http-error-handling

Conversation

@thodson-usgs
Copy link
Copy Markdown
Collaborator

Problem

An HTTP failure surfaces as a different exception type depending on which module made the request, so there is no single except a caller can use to handle "any dataretrieval request failure":

Path Raised on failure
legacy query() (wqp, nwis, ngwmn, nldi) ValueError, or NoSitesError(Exception)
waterdata RateLimited / ServiceUnavailable (RuntimeError), RequestTooLarge (ValueError), ChunkInterrupted (RuntimeError)
nadp, streamstats bare httpx.HTTPStatusError

Change

Introduce a single base class, dataretrieval.utils.DataRetrievalError (re-exported as dataretrieval.DataRetrievalError), and root the existing exceptions on it. Unification is at the root only — each exception keeps the built-in type it has always raised, so this is backward compatible:

  • Legacy query() path — new typed errors BadRequestError (400), NotFoundError (404), RequestTooLargeError (414 / over-long URL), ServiceUnavailableError (5xx), each (DataRetrievalError, ValueError). Same messages, same statuses raised on; still ValueError.
  • The inline status ladder in query() is consolidated into a reusable _raise_for_status() helper.
  • NoSitesError now subclasses DataRetrievalError (was Exception).
  • waterdataRateLimited / ServiceUnavailable / RequestTooLarge / ChunkInterrupted gain DataRetrievalError as an additional base (still RuntimeError / ValueError; the chunker's isinstance classification is unaffected).

Callers can now write:

try:
    df, md = dataretrieval.wqp.get_results(...)
except dataretrieval.DataRetrievalError:
    ...   # catches a failure from any module

while except ValueError (legacy path) and except RuntimeError (waterdata retryable types) continue to work.

Backward compatibility

No behavior change beyond the widened type hierarchy — all raised messages and the exact set of statuses query() raises on are identical. 208 tests pass (200 existing + 8 new). The new tests assert both the unified hierarchy and the preserved built-in bases. Live-verified through the refactored query(): a real 200 (happy path) and a real over-long-URL 414 (now RequestTooLargeError, still caught as ValueError).

Out of scope (proposed follow-ups)

This is a focused first step that establishes the root and migrates the two major subsystems. Deliberately left for follow-up PRs, flagged here for discussion:

  • nadp / streamstats still raise httpx.HTTPStatusError — a one-line swap each to _raise_for_status, but it changes their public error type.
  • nldi's manual non-200 ValueError and nwis._parse_json_or_raise's HTML-detection ValueError aren't rooted yet.
  • waterdata._raise_for_non_200's catch-all for non-retryable 4xx stays a bare RuntimeError — it is load-bearing for the chunker's fatal-vs-resumable classifier, so rooting it needs a dedicated ClientError(DataRetrievalError, RuntimeError) plus a chunker-classification review.

🤖 Generated with Claude Code

…onomy

An HTTP failure used to surface as a different exception type depending on which
module made the request:

  - legacy query() (wqp, nwis, ngwmn, nldi): ValueError, or NoSitesError(Exception)
  - waterdata: RateLimited / ServiceUnavailable (RuntimeError),
    RequestTooLarge (ValueError), ChunkInterrupted (RuntimeError)
  - nadp / streamstats: bare httpx.HTTPStatusError

so no single `except` clause could catch "any dataretrieval request failure."

Add a `DataRetrievalError` base and root the existing exceptions on it, so a
caller can `except dataretrieval.DataRetrievalError` for any request failure.
Each subclass also keeps the built-in it has historically raised (BadRequestError
is a ValueError; RateLimited is a RuntimeError), so existing `except ValueError` /
`except RuntimeError` handlers keep working unchanged.

The taxonomy lives in a new, dependency-free `dataretrieval/exceptions.py` — the
single home for it (cf. requests.exceptions, botocore.exceptions) — rather than
buried in the pandas/httpx-heavy utils.py:

  - exceptions.py: DataRetrievalError + the legacy query() types (BadRequestError,
    NotFoundError, RequestTooLargeError, ServiceUnavailableError; all ValueError)
    + NoSitesError + the Water Data transport types (RateLimited, ServiceUnavailable,
    RequestTooLarge). Explicit __all__, re-exported from dataretrieval/__init__.py
    so the top-level export is intentional, not an accident of `import *`.
  - utils.py keeps the behavior: query() and a consolidated _raise_for_status()
    status->exception mapper; it imports the types it raises.
  - chunking.py keeps the chunker-specific resumable types (ChunkInterrupted and
    its subclasses, which carry a ChunkedCall resume handle) and imports the
    transport types from exceptions.

The old import paths still resolve (utils.NoSitesError,
waterdata.chunking.RateLimited, ...) via re-import, so nothing downstream breaks.

Out of scope (follow-ups): nadp/streamstats still raise httpx.HTTPStatusError;
nldi's manual non-200 ValueError isn't rooted; waterdata.utils._raise_for_non_200's
catch-all for non-retryable 4xx stays a bare RuntimeError (a deliberate
fatal/non-resumable signal the chunker relies on).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@thodson-usgs thodson-usgs force-pushed the refactor/unify-http-error-handling branch from 2afd2d7 to 9801177 Compare June 1, 2026 17:05
thodson-usgs added a commit to thodson-usgs/dataretrieval-python that referenced this pull request Jun 2, 2026
…ils/ package

## Why

`dataretrieval/waterdata/utils.py` had grown to 2033 LOC spanning ~6 unrelated
domains -- request building, response parsing, result finalization,
pagination/async, stats post-processing, and validation -- plus constants and
the public engines. It was the package's one genuine god-module. (An
architecture review found the package's OO is otherwise appropriate, so this is
a modularization, not an OO-pattern refactor.)

## What

Convert `utils.py` into a `utils/` package: the public surface stays in
`utils/__init__.py` (a thin facade) and the implementation is split across six
cohesive submodules, moving every definition verbatim (no signature/logic
changes):

| submodule | holds |
|---|---|
| `utils/constants.py` | URLs, `_OUTPUT_ID_BY_SERVICE`, regexes, param sets (dependency-free) |
| `utils/http.py` | headers, `_error_body`, `_raise_for_non_200`, retry-after |
| `utils/validate.py` | arg normalization/validation (`_get_args`, `_check_*`) |
| `utils/requests.py` | request building (`_construct_api_requests`, CQL2, dates) |
| `utils/responses.py` | geometry-agnostic parsing / finalization / stats shaping |
| `utils/engine.py` | pagination/async driver (`_paginate`, `_run_sync`, ...) |

`utils/__init__.py` re-exports the internal API (explicit `__all__`, 56 names),
so every existing `from dataretrieval.waterdata.utils import ...` and
`mock.patch("dataretrieval.waterdata.utils.<name>")` keeps working -- no import
sites or tests were touched. `dataretrieval.waterdata.utils` resolves to the
package's `__init__`, so the import path is unchanged from when it was a module.

Seven functions remain physically defined in `utils/__init__.py`
(`get_ogc_data`, `_fetch_once`, `get_stats_data`, `_get_resp_data`,
`_ogc_parse_response`, `_walk_pages`, `_handle_stats_nesting`) because the test
suite monkeypatches them (or `gpd`) by their `dataretrieval.waterdata.utils.*`
name, and a function's global lookups resolve in its defining module. The
geopandas probe stays with them, and the pagination logger keeps the name
`dataretrieval.waterdata.utils` (a caplog test pins it). These could later move
to the `engine`/`responses` submodules -- which do not import the package, so
there is no cycle -- but that requires re-targeting the test patches; left as a
follow-up.

## Behavior-preserving

- 56 top-level definitions moved verbatim -- none lost, none duplicated.
- 469 tests pass, 2 skipped; ruff clean; submodules import without cycles
  (`constants` <- `http`/`validate` <- `requests`/`responses` <- `engine` <-
  `__init__`); `chunking.py` untouched.

## Note

Overlaps with the error-taxonomy (DOI-USGS#313) and namespace (DOI-USGS#315) PRs on `waterdata/`
imports -- sequence on merge.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant