refactor(waterdata): decompose the 2033-LOC utils.py god-module into cohesive modules#318
Draft
thodson-usgs wants to merge 1 commit into
Draft
Conversation
…ils/ package
## Why
`dataretrieval/waterdata/utils.py` had grown to 2033 LOC spanning ~6 unrelated
domains -- request building, response parsing, result finalization,
pagination/async, stats post-processing, and validation -- plus constants and
the public engines. It was the package's one genuine god-module. (An
architecture review found the package's OO is otherwise appropriate, so this is
a modularization, not an OO-pattern refactor.)
## What
Convert `utils.py` into a `utils/` package: the public surface stays in
`utils/__init__.py` (a thin facade) and the implementation is split across six
cohesive submodules, moving every definition verbatim (no signature/logic
changes):
| submodule | holds |
|---|---|
| `utils/constants.py` | URLs, `_OUTPUT_ID_BY_SERVICE`, regexes, param sets (dependency-free) |
| `utils/http.py` | headers, `_error_body`, `_raise_for_non_200`, retry-after |
| `utils/validate.py` | arg normalization/validation (`_get_args`, `_check_*`) |
| `utils/requests.py` | request building (`_construct_api_requests`, CQL2, dates) |
| `utils/responses.py` | geometry-agnostic parsing / finalization / stats shaping |
| `utils/engine.py` | pagination/async driver (`_paginate`, `_run_sync`, ...) |
`utils/__init__.py` re-exports the internal API (explicit `__all__`, 56 names),
so every existing `from dataretrieval.waterdata.utils import ...` and
`mock.patch("dataretrieval.waterdata.utils.<name>")` keeps working -- no import
sites or tests were touched. `dataretrieval.waterdata.utils` resolves to the
package's `__init__`, so the import path is unchanged from when it was a module.
Seven functions remain physically defined in `utils/__init__.py`
(`get_ogc_data`, `_fetch_once`, `get_stats_data`, `_get_resp_data`,
`_ogc_parse_response`, `_walk_pages`, `_handle_stats_nesting`) because the test
suite monkeypatches them (or `gpd`) by their `dataretrieval.waterdata.utils.*`
name, and a function's global lookups resolve in its defining module. The
geopandas probe stays with them, and the pagination logger keeps the name
`dataretrieval.waterdata.utils` (a caplog test pins it). These could later move
to the `engine`/`responses` submodules -- which do not import the package, so
there is no cycle -- but that requires re-targeting the test patches; left as a
follow-up.
## Behavior-preserving
- 56 top-level definitions moved verbatim -- none lost, none duplicated.
- 469 tests pass, 2 skipped; ruff clean; submodules import without cycles
(`constants` <- `http`/`validate` <- `requests`/`responses` <- `engine` <-
`__init__`); `chunking.py` untouched.
## Note
Overlaps with the error-taxonomy (DOI-USGS#313) and namespace (DOI-USGS#315) PRs on `waterdata/`
imports -- sequence on merge.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
df8943e to
cca750e
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
dataretrieval/waterdata/utils.pyhad grown to 2033 LOC spanning ~6 unrelated domains — request building, response parsing, result finalization, pagination/async, stats post-processing, and validation — plus constants and the public engines. It was the package's one genuine god-module: changing how requests are built means navigating the same file as type coercion, datetime parsing, and the async driver.(An architecture review concluded the package's OO is otherwise appropriate — the chunker is well-factored, the modern getters are cleanly delegated — so this is a modularization, not an OO-pattern refactor.)
What
Split it into six cohesive private modules under
dataretrieval/waterdata/, moving every definition verbatim (no signature/logic changes):_constants.py_OUTPUT_ID_BY_SERVICE, regexes, param sets (dependency-free)_http.py_error_body,_raise_for_non_200, retry-after_validate.py_get_args,_check_*)_requests.py_construct_api_requests, CQL2, dates)_responses.py_engine.py_paginate,_run_sync, …)utils.py(2033 → 651 LOC) becomes a thin façade that re-exports the internal API (explicit__all__, 56 names), so every existingfrom dataretrieval.waterdata.utils import …andmock.patch("dataretrieval.waterdata.utils.<name>")keeps working — no import sites or tests were touched.Seven functions remain physically defined in the façade (
get_ogc_data,_fetch_once,get_stats_data,_get_resp_data,_ogc_parse_response,_walk_pages,_handle_stats_nesting) because the test suite monkeypatches them (orgpd) by their module-global name: a function's global lookups resolve in its defining module, so they must stay inutils.pyfor the patches to take effect. The geopandas import probe stays with them, and the pagination logger keeps the namedataretrieval.waterdata.utils(a caplog test pins it).Behavior-preserving
chunking.pyuntouched.Note
Overlaps with the in-flight error-taxonomy (#313) and namespace (#315) PRs on
waterdata/imports — sequence on merge (rebasing each surfaces any conflict, as we've done for the others).🤖 Generated with Claude Code