Skip to content

chore(typing): set up mypy and tighten the package to mypy --strict#314

Merged
thodson-usgs merged 3 commits into
DOI-USGS:mainfrom
thodson-usgs:chore/setup-mypy
Jun 1, 2026
Merged

chore(typing): set up mypy and tighten the package to mypy --strict#314
thodson-usgs merged 3 commits into
DOI-USGS:mainfrom
thodson-usgs:chore/setup-mypy

Conversation

@thodson-usgs
Copy link
Copy Markdown
Collaborator

@thodson-usgs thodson-usgs commented Jun 1, 2026

Why

The package ships a py.typed marker — it advertises itself as typed to downstream users — but nothing type-checked it. This adds mypy, gets a clean run, and tightens all the way to mypy --strict.

Two commits:

  1. Set up mypy + clean baseline — config, CI job, and the fixes to reach 0 errors under a lenient config.
  2. Annotate the package + enable strict = true — annotate the remaining 71 functions and resolve every strict finding.

Setup

  • [tool.mypy] in pyproject.tomlstrict = true, target python_version = "3.9" (the project floor), scoped to dataretrieval. The one remaining relaxation is ignore_missing_imports (untyped third-party libs like pandas/geopandas/anyio → Any); dropping that via pandas-stubs + per-module overrides is a possible follow-up.
  • mypy<2 in the [test] extra (pinned <2 so it can target 3.9; mypy 2.x requires ≥3.10).
  • A type-check CI job, parallel to the ruff lint job.

What got fixed

Baseline (commit 1):

  • One annotation cleared ~55 errors: HTTPX_DEFAULTS: dict[str, Any] so **HTTPX_DEFAULTS type-checks when splatted into httpx.get / AsyncClient.
  • A real 3.9 bug: utils.py lacked from __future__ import annotations, so the new str | None annotations would have been a runtime error on Python 3.9 — mypy (targeting 3.9) caught it.
  • ~22 other annotation gaps (BaseMetadata.comment typed None, list-invariance in _format_api_dates, the dict | None indexing in ratings._search, nldi bool/Literal reuse, …).

Strict (commit 2):

  • Typed the @_deprecated decorator with a signature-preserving TypeVar (3.9-safe), un-erasing the decorated nwis getters.
  • Annotated all 71 functions — getters as tuple[pd.DataFrame, <Metadata>], parameterized bare dict generics, a few justified casts (NLDI FeatureCollection endpoints, format-keyed StreamStats), defunct stubs as -> NoReturn.

Behavior

Annotations + type-narrowing only. The single runtime-visible change (from commit 1) is nldi.search() raising a clearer ValueError up front for a basin search missing feature_source / feature_id. 469 tests pass (2 skipped), unchanged.

Notes / surfaced findings

  • Name collision in the public namespace (pre-existing): dataretrieval.get_ratings (nwis vs waterdata) and dataretrieval.what_sites (nwis vs wqp) collide across the import * in __init__.py — the modern definitions win by import order. mypy flagged the last-binding-wins shadowing; it's silenced with a comment here, but it's a real API-surface question worth resolving (e.g. via __all__).
  • The untracked WIP dataretrieval/ngwmn.py (still on requests) isn't in the repo, so CI won't see it; when committed it'll need its own annotations to pass strict.
  • Independent of refactor(errors): unify request errors under a DataRetrievalError root #313.

🤖 Generated with Claude Code

@thodson-usgs thodson-usgs changed the title chore(typing): set up mypy and fix the type errors it surfaces chore(typing): set up mypy and tighten the package to mypy --strict Jun 1, 2026
thodson-usgs and others added 3 commits June 1, 2026 08:23
The package ships a ``py.typed`` marker (advertising itself as typed to
downstream users) but nothing type-checked it. Add mypy and get a clean run.

Setup:
  - [tool.mypy] in pyproject.toml: a lenient first-pass config
    (ignore_missing_imports, target python_version 3.9), scoped to the
    dataretrieval package.
  - mypy<2 added to the [test] extra (<2 so it can still target 3.9).
  - a type-check job in the CI workflow, parallel to the ruff lint job.

Fixes (mypy went from 78 errors to 0 on the tracked package):
  - HTTPX_DEFAULTS annotated dict[str, Any] so **-splatting it into
    httpx.get / httpx.AsyncClient type-checks -- cleared ~55 errors across
    7 call sites at once.
  - utils.py gains `from __future__ import annotations`: mypy (targeting
    3.9) caught that the new `str | None` annotations there would be a
    runtime error on 3.9, because this module -- unlike the rest of the
    package -- lacked the future import.
  - BaseMetadata.comment annotated `str | None` (was inferred `None`, which
    rejected every subclass that assigns a comment string).
  - _format_api_dates: accept Sequence[str | None] (covariant) so a
    list[str] caller type-checks, and build the formatted list with an
    early return so the final join sees list[str].
  - _as_str_list: delegate to _normalize_str_iterable then wrap, so the
    declared return type list[str] | None holds.
  - _next_req_url: declare next_host / cur_host as `str | None`.
  - ratings._search: build the query dict in a non-Optional local before
    aliasing it to the loop's `params` (which toggles to None per page).
  - nldi: drop the bool->str / Literal->str variable reuse; guard the basin
    branch so feature_source / feature_id are non-None before get_basin.
  - chunking: narrow the optional filter before _is_chunkable; fix a stale
    `# type: ignore` error code.

The fixes are annotations and type-narrowing guards. The only runtime-visible
change is that nldi.search() now raises a clear ValueError up front when a
basin search is missing feature_source/feature_id, where the same condition
previously raised deeper inside get_basin. 259 tests pass across the affected
suites.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Builds on the lenient baseline: annotate the 71 previously-unannotated
functions, resolve the remaining strict findings, and flip the mypy config to
``strict = true`` (keeping only ``ignore_missing_imports`` for untyped
third-party libraries).

- nwis: type the ``@_deprecated`` decorator with a signature-preserving
  ``TypeVar`` (3.9-safe; ParamSpec would need 3.10), which un-erases the
  decorated public getters; annotate the getters and helpers; defunct stubs
  that only ``raise`` are typed ``-> NoReturn``.
- wqp / nldi / nadp / streamstats / samples / utils / waterdata{utils,api,
  nearest,ratings,chunking}: precise parameter and return annotations
  (``tuple[pd.DataFrame, <Metadata>]`` for the getters), parameterized bare
  ``dict`` generics, and a few justified ``cast``s where a callee's union
  return is statically wider than the single concrete type a caller is
  guaranteed (NLDI FeatureCollection endpoints, format-keyed StreamStats).
- __init__: ``get_ratings`` (nwis vs waterdata) and ``what_sites`` (nwis vs
  wqp) collide across the star-imports; the modern definitions win by import
  order. mypy can't model last-binding-wins, so the two re-binding mismatches
  are silenced with an explanatory comment. (The collision itself is a
  pre-existing public-namespace question, left as-is here.)

Annotations only -- no runtime behavior change. ``mypy --strict`` is clean and
the full test suite (469 passed, 2 skipped) is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The type-check job ran ``pip install .[test]``, pulling pytest/coverage/ruff/
pytest-httpx just to run mypy. Add a minimal ``type-check`` extra (just
``mypy<2``, pinned once; ``test`` self-references it) and install
``.[type-check]`` instead — the job's install drops from 26 to 16 packages.
mypy --strict is unchanged (0 errors).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@thodson-usgs thodson-usgs marked this pull request as ready for review June 1, 2026 16:39
@thodson-usgs thodson-usgs merged commit dca49a7 into DOI-USGS:main Jun 1, 2026
9 checks passed
@thodson-usgs thodson-usgs deleted the chore/setup-mypy branch June 1, 2026 16:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant