refactor(waterdata): decompose the 2033-LOC utils.py god-module into cohesive modules by thodson-usgs · Pull Request #318 · DOI-USGS/dataretrieval-python

thodson-usgs · 2026-06-01T19:05:58Z

Why

dataretrieval/waterdata/utils.py had grown to 2033 LOC spanning ~6 unrelated domains — request building, response parsing, result finalization, pagination/async, stats post-processing, and validation — plus constants and the public engines. It was the package's one genuine god-module: changing how requests are built means navigating the same file as type coercion, datetime parsing, and the async driver.

(An architecture review concluded the package's OO is otherwise appropriate — the chunker is well-factored, the modern getters are cleanly delegated — so this is a modularization, not an OO-pattern refactor.)

What

Split it into six cohesive private modules under dataretrieval/waterdata/, moving every definition verbatim (no signature/logic changes):

module	holds
`_constants.py`	URLs, `_OUTPUT_ID_BY_SERVICE`, regexes, param sets (dependency-free)
`_http.py`	headers, `_error_body`, `_raise_for_non_200`, retry-after
`_validate.py`	arg normalization/validation (`_get_args`, `_check_*`)
`_requests.py`	request building (`_construct_api_requests`, CQL2, dates)
`_responses.py`	geometry-agnostic parsing / finalization / stats shaping
`_engine.py`	pagination/async driver (`_paginate`, `_run_sync`, …)

utils.py (2033 → 651 LOC) becomes a thin façade that re-exports the internal API (explicit __all__, 56 names), so every existing from dataretrieval.waterdata.utils import … and mock.patch("dataretrieval.waterdata.utils.<name>") keeps working — no import sites or tests were touched.

Seven functions remain physically defined in the façade (get_ogc_data, _fetch_once, get_stats_data, _get_resp_data, _ogc_parse_response, _walk_pages, _handle_stats_nesting) because the test suite monkeypatches them (or gpd) by their module-global name: a function's global lookups resolve in its defining module, so they must stay in utils.py for the patches to take effect. The geopandas import probe stays with them, and the pagination logger keeps the name dataretrieval.waterdata.utils (a caplog test pins it).

Behavior-preserving

56 top-level definitions moved AST-identically — none lost, none duplicated (verified mechanically).
469 tests pass, 2 skipped; ruff clean; all 6 modules import without cycles; chunking.py untouched.

Note

Overlaps with the in-flight error-taxonomy (#313) and namespace (#315) PRs on waterdata/ imports — sequence on merge (rebasing each surfaces any conflict, as we've done for the others).

🤖 Generated with Claude Code

…ils/ package ## Why `dataretrieval/waterdata/utils.py` had grown to 2033 LOC spanning ~6 unrelated domains -- request building, response parsing, result finalization, pagination/async, stats post-processing, and validation -- plus constants and the public engines. It was the package's one genuine god-module. (An architecture review found the package's OO is otherwise appropriate, so this is a modularization, not an OO-pattern refactor.) ## What Convert `utils.py` into a `utils/` package: the public surface stays in `utils/__init__.py` (a thin facade) and the implementation is split across six cohesive submodules, moving every definition verbatim (no signature/logic changes): | submodule | holds | |---|---| | `utils/constants.py` | URLs, `_OUTPUT_ID_BY_SERVICE`, regexes, param sets (dependency-free) | | `utils/http.py` | headers, `_error_body`, `_raise_for_non_200`, retry-after | | `utils/validate.py` | arg normalization/validation (`_get_args`, `_check_*`) | | `utils/requests.py` | request building (`_construct_api_requests`, CQL2, dates) | | `utils/responses.py` | geometry-agnostic parsing / finalization / stats shaping | | `utils/engine.py` | pagination/async driver (`_paginate`, `_run_sync`, ...) | `utils/__init__.py` re-exports the internal API (explicit `__all__`, 56 names), so every existing `from dataretrieval.waterdata.utils import ...` and `mock.patch("dataretrieval.waterdata.utils.<name>")` keeps working -- no import sites or tests were touched. `dataretrieval.waterdata.utils` resolves to the package's `__init__`, so the import path is unchanged from when it was a module. Seven functions remain physically defined in `utils/__init__.py` (`get_ogc_data`, `_fetch_once`, `get_stats_data`, `_get_resp_data`, `_ogc_parse_response`, `_walk_pages`, `_handle_stats_nesting`) because the test suite monkeypatches them (or `gpd`) by their `dataretrieval.waterdata.utils.*` name, and a function's global lookups resolve in its defining module. The geopandas probe stays with them, and the pagination logger keeps the name `dataretrieval.waterdata.utils` (a caplog test pins it). These could later move to the `engine`/`responses` submodules -- which do not import the package, so there is no cycle -- but that requires re-targeting the test patches; left as a follow-up. ## Behavior-preserving - 56 top-level definitions moved verbatim -- none lost, none duplicated. - 469 tests pass, 2 skipped; ruff clean; submodules import without cycles (`constants` <- `http`/`validate` <- `requests`/`responses` <- `engine` <- `__init__`); `chunking.py` untouched. ## Note Overlaps with the error-taxonomy (DOI-USGS#313) and namespace (DOI-USGS#315) PRs on `waterdata/` imports -- sequence on merge. 🤖 Generated with [Claude Code](https://claude.com/claude-code)

thodson-usgs force-pushed the refactor/split-waterdata-utils branch from df8943e to cca750e Compare June 2, 2026 00:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(waterdata): decompose the 2033-LOC utils.py god-module into cohesive modules#318

refactor(waterdata): decompose the 2033-LOC utils.py god-module into cohesive modules#318
thodson-usgs wants to merge 1 commit into
DOI-USGS:mainfrom
thodson-usgs:refactor/split-waterdata-utils

thodson-usgs commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

thodson-usgs commented Jun 1, 2026

Why

What

Behavior-preserving

Note

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant