refactor(errors): unify request errors under a DataRetrievalError root by thodson-usgs · Pull Request #313 · DOI-USGS/dataretrieval-python

thodson-usgs · 2026-06-01T00:51:03Z

Problem

An HTTP failure surfaces as a different exception type depending on which module made the request, so there is no single except a caller can use to handle "any dataretrieval request failure":

Path	Raised on failure
legacy `query()` (`wqp`, `nwis`, `ngwmn`, `nldi`)	`ValueError`, or `NoSitesError(Exception)`
`waterdata`	`RateLimited` / `ServiceUnavailable` (`RuntimeError`), `RequestTooLarge` (`ValueError`), `ChunkInterrupted` (`RuntimeError`)
`nadp`, `streamstats`	bare `httpx.HTTPStatusError`

Change

Introduce a single base class, dataretrieval.utils.DataRetrievalError (re-exported as dataretrieval.DataRetrievalError), and root the existing exceptions on it. Unification is at the root only — each exception keeps the built-in type it has always raised, so this is backward compatible:

Legacy query() path — new typed errors BadRequestError (400), NotFoundError (404), RequestTooLargeError (414 / over-long URL), ServiceUnavailableError (5xx), each (DataRetrievalError, ValueError). Same messages, same statuses raised on; still ValueError.
The inline status ladder in query() is consolidated into a reusable _raise_for_status() helper.
NoSitesError now subclasses DataRetrievalError (was Exception).
waterdata — RateLimited / ServiceUnavailable / RequestTooLarge / ChunkInterrupted gain DataRetrievalError as an additional base (still RuntimeError / ValueError; the chunker's isinstance classification is unaffected).

Callers can now write:

try:
    df, md = dataretrieval.wqp.get_results(...)
except dataretrieval.DataRetrievalError:
    ...   # catches a failure from any module

while except ValueError (legacy path) and except RuntimeError (waterdata retryable types) continue to work.

Backward compatibility

No behavior change beyond the widened type hierarchy — all raised messages and the exact set of statuses query() raises on are identical. 208 tests pass (200 existing + 8 new). The new tests assert both the unified hierarchy and the preserved built-in bases. Live-verified through the refactored query(): a real 200 (happy path) and a real over-long-URL 414 (now RequestTooLargeError, still caught as ValueError).

Out of scope (proposed follow-ups)

This is a focused first step that establishes the root and migrates the two major subsystems. Deliberately left for follow-up PRs, flagged here for discussion:

nadp / streamstats still raise httpx.HTTPStatusError — a one-line swap each to _raise_for_status, but it changes their public error type.
nldi's manual non-200 ValueError and nwis._parse_json_or_raise's HTML-detection ValueError aren't rooted yet.
waterdata._raise_for_non_200's catch-all for non-retryable 4xx stays a bare RuntimeError — it is load-bearing for the chunker's fatal-vs-resumable classifier, so rooting it needs a dedicated ClientError(DataRetrievalError, RuntimeError) plus a chunker-classification review.

🤖 Generated with Claude Code

…onomy An HTTP failure used to surface as a different exception type depending on which module made the request: - legacy query() (wqp, nwis, ngwmn, nldi): ValueError, or NoSitesError(Exception) - waterdata: RateLimited / ServiceUnavailable (RuntimeError), RequestTooLarge (ValueError), ChunkInterrupted (RuntimeError) - nadp / streamstats: bare httpx.HTTPStatusError so no single `except` clause could catch "any dataretrieval request failure." Add a `DataRetrievalError` base and root the existing exceptions on it, so a caller can `except dataretrieval.DataRetrievalError` for any request failure. Each subclass also keeps the built-in it has historically raised (BadRequestError is a ValueError; RateLimited is a RuntimeError), so existing `except ValueError` / `except RuntimeError` handlers keep working unchanged. The taxonomy lives in a new, dependency-free `dataretrieval/exceptions.py` — the single home for it (cf. requests.exceptions, botocore.exceptions) — rather than buried in the pandas/httpx-heavy utils.py: - exceptions.py: DataRetrievalError + the legacy query() types (BadRequestError, NotFoundError, RequestTooLargeError, ServiceUnavailableError; all ValueError) + NoSitesError + the Water Data transport types (RateLimited, ServiceUnavailable, RequestTooLarge). Explicit __all__, re-exported from dataretrieval/__init__.py so the top-level export is intentional, not an accident of `import *`. - utils.py keeps the behavior: query() and a consolidated _raise_for_status() status->exception mapper; it imports the types it raises. - chunking.py keeps the chunker-specific resumable types (ChunkInterrupted and its subclasses, which carry a ChunkedCall resume handle) and imports the transport types from exceptions. The old import paths still resolve (utils.NoSitesError, waterdata.chunking.RateLimited, ...) via re-import, so nothing downstream breaks. Out of scope (follow-ups): nadp/streamstats still raise httpx.HTTPStatusError; nldi's manual non-200 ValueError isn't rooted; waterdata.utils._raise_for_non_200's catch-all for non-retryable 4xx stays a bare RuntimeError (a deliberate fatal/non-resumable signal the chunker relies on). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ils/ package ## Why `dataretrieval/waterdata/utils.py` had grown to 2033 LOC spanning ~6 unrelated domains -- request building, response parsing, result finalization, pagination/async, stats post-processing, and validation -- plus constants and the public engines. It was the package's one genuine god-module. (An architecture review found the package's OO is otherwise appropriate, so this is a modularization, not an OO-pattern refactor.) ## What Convert `utils.py` into a `utils/` package: the public surface stays in `utils/__init__.py` (a thin facade) and the implementation is split across six cohesive submodules, moving every definition verbatim (no signature/logic changes): | submodule | holds | |---|---| | `utils/constants.py` | URLs, `_OUTPUT_ID_BY_SERVICE`, regexes, param sets (dependency-free) | | `utils/http.py` | headers, `_error_body`, `_raise_for_non_200`, retry-after | | `utils/validate.py` | arg normalization/validation (`_get_args`, `_check_*`) | | `utils/requests.py` | request building (`_construct_api_requests`, CQL2, dates) | | `utils/responses.py` | geometry-agnostic parsing / finalization / stats shaping | | `utils/engine.py` | pagination/async driver (`_paginate`, `_run_sync`, ...) | `utils/__init__.py` re-exports the internal API (explicit `__all__`, 56 names), so every existing `from dataretrieval.waterdata.utils import ...` and `mock.patch("dataretrieval.waterdata.utils.<name>")` keeps working -- no import sites or tests were touched. `dataretrieval.waterdata.utils` resolves to the package's `__init__`, so the import path is unchanged from when it was a module. Seven functions remain physically defined in `utils/__init__.py` (`get_ogc_data`, `_fetch_once`, `get_stats_data`, `_get_resp_data`, `_ogc_parse_response`, `_walk_pages`, `_handle_stats_nesting`) because the test suite monkeypatches them (or `gpd`) by their `dataretrieval.waterdata.utils.*` name, and a function's global lookups resolve in its defining module. The geopandas probe stays with them, and the pagination logger keeps the name `dataretrieval.waterdata.utils` (a caplog test pins it). These could later move to the `engine`/`responses` submodules -- which do not import the package, so there is no cycle -- but that requires re-targeting the test patches; left as a follow-up. ## Behavior-preserving - 56 top-level definitions moved verbatim -- none lost, none duplicated. - 469 tests pass, 2 skipped; ruff clean; submodules import without cycles (`constants` <- `http`/`validate` <- `requests`/`responses` <- `engine` <- `__init__`); `chunking.py` untouched. ## Note Overlaps with the error-taxonomy (DOI-USGS#313) and namespace (DOI-USGS#315) PRs on `waterdata/` imports -- sequence on merge. 🤖 Generated with [Claude Code](https://claude.com/claude-code)

thodson-usgs mentioned this pull request Jun 1, 2026

chore(typing): set up mypy and tighten the package to mypy --strict #314

Merged

thodson-usgs force-pushed the refactor/unify-http-error-handling branch from 2afd2d7 to 9801177 Compare June 1, 2026 17:05

thodson-usgs mentioned this pull request Jun 1, 2026

refactor(waterdata): decompose the 2033-LOC utils.py god-module into cohesive modules #318

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(errors): unify request errors under a DataRetrievalError root#313

refactor(errors): unify request errors under a DataRetrievalError root#313
thodson-usgs wants to merge 1 commit into
DOI-USGS:mainfrom
thodson-usgs:refactor/unify-http-error-handling

thodson-usgs commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

thodson-usgs commented Jun 1, 2026

Problem

Change

Backward compatibility

Out of scope (proposed follow-ups)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant