
fix(waterdata): raise RequestTooLarge for an unchunkable over-budget request (not a silent 414) #309

Merged
thodson-usgs merged 1 commit into DOI-USGS:main from thodson-usgs:fix/chunker-size-check-unchunkable
Jun 1, 2026

@thodson-usgs
Collaborator

Problem

ChunkPlan's "no chunkable axes" branch returned immediately without sizing the request; its comment even said "if that produces an over-budget URL, the server (or httpx itself) rejects." As a result, a single large CQL-text filter with one big IN (...) clause, which has no top-level OR and therefore yields no chunk axis, was shipped verbatim and failed with an opaque HTTP 414 rather than a RequestTooLarge:

ids = ", ".join(f"'USGS-{i:08d}'" for i in range(1000))
get_daily(filter=f"monitoring_location_id IN ({ids})", filter_lang="cql-text")
# 17 KB request -> 414, no actionable error
# (the equivalent get_daily(monitoring_location_id=[...]) chunks fine)

Fix

Size-check the no-axes path: if the single request fits the byte limit, pass through exactly as before (the common hot path); if it's over budget there's nothing to split, so raise RequestTooLarge with actionable guidance (narrow the query / simplify the filter / split manually) instead of shipping it.
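
For readers who want the shape of the check, here is a minimal sketch of the pass-through-or-raise logic described above. It is not the code in this PR: `URL_BYTE_LIMIT`, `request_bytes`, `plan_without_axes`, and the example URL are hypothetical names and values chosen for illustration, and `RequestTooLarge` here is a local stand-in for the library's exception.

```python
from urllib.parse import urlencode

# Hypothetical byte budget; the actual limit used by the chunker is not stated here.
URL_BYTE_LIMIT = 8000


class RequestTooLarge(ValueError):
    """Local stand-in for the library's exception of the same name."""


def request_bytes(base_url: str, params: dict) -> int:
    """Return the size in bytes of the fully encoded GET URL."""
    return len(f"{base_url}?{urlencode(params)}".encode("utf-8"))


def plan_without_axes(base_url: str, params: dict, limit: int = URL_BYTE_LIMIT) -> list:
    """No chunkable axes: pass through if the request fits, otherwise raise."""
    size = request_bytes(base_url, params)
    if size <= limit:
        return [params]  # hot path: one request, sent unchanged
    raise RequestTooLarge(
        f"request is {size} bytes (limit {limit}) and has no axis to split; "
        "narrow the query, simplify the filter, or split it manually"
    )


if __name__ == "__main__":
    # Same 1000-id IN (...) filter as in the Problem section.
    ids = ", ".join(f"'USGS-{i:08d}'" for i in range(1000))
    big = {"filter": f"monitoring_location_id IN ({ids})", "filter_lang": "cql-text"}
    try:
        plan_without_axes("https://example.invalid/daily", big)
    except RequestTooLarge as exc:
        print(f"raised: {exc}")
```

The point of raising at plan time is that the caller gets the guidance before any network round trip, instead of having to interpret a 414 from the server.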

Verification

(a) 1000-id CQL IN filter -> RequestTooLarge ✓   (was: shipped -> 414)
(b) small scalar query    -> passthrough (0 axes), no raise ✓
(c) monitoring_location_id=[2000 ids] -> chunked (1 axis), no raise ✓

The chunking suite passes: the old test that asserted "pass an over-budget request through (the server may 414)" now expects RequestTooLarge, and a new fits→passthrough case is added. ruff is clean.

🤖 Generated with Claude Code

@thodson-usgs force-pushed the fix/chunker-size-check-unchunkable branch from b728e91 to 8b952bb on June 1, 2026 12:23
…request

ChunkPlan's "no chunkable axes" branch returned immediately without sizing the
request, deliberately leaving an over-budget URL for the server to reject. So a
single large CQL-text `filter` with one big `IN (...)` clause (no top-level
`OR`, hence no chunk axis) was shipped verbatim and failed with an opaque HTTP
414 rather than a RequestTooLarge. (The equivalent
monitoring_location_id=[...] chunks fine.)

Size-check the no-axes path: if the single request fits, pass through as
before; if it's over budget and in the chunker's domain (cql-text / GET
multi-value), there's nothing left to split, so raise RequestTooLarge with
actionable guidance instead of shipping it.

A cql-json filter is NOT in the chunker's domain — chunking splits only
cql-text — so an over-budget cql-json request is passed through unchanged
regardless of size; the server judges it, not the chunker. This preserves the
existing cql-json passthrough contract (tests/waterdata_filters_test.py::
test_cql_json_filter_is_not_chunked).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
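
The commit message above distinguishes three outcomes for a request with no chunk axes: it fits, it is over budget but outside the chunker's domain (cql-json), or it is over budget inside the domain. A rough sketch of that decision follows, assuming the filter language is available as a plain string. `should_pass_through` and its parameters are hypothetical names, not the library's API, and `RequestTooLarge` is again a local stand-in.

```python
from typing import Optional


class RequestTooLarge(ValueError):
    """Local stand-in for the library's exception of the same name."""


def should_pass_through(size: int, limit: int, filter_lang: Optional[str] = None) -> bool:
    """Decide what to do with a request that has no chunkable axes.

    Returns True when the request should be sent unchanged; raises
    RequestTooLarge when it is over budget and inside the chunker's domain.
    """
    if size <= limit:
        return True  # fits: pass through as before
    if filter_lang == "cql-json":
        return True  # outside the chunker's domain: the server judges it
    raise RequestTooLarge(
        f"request is {size} bytes (limit {limit}) and cannot be split further"
    )
```

Checking the size first keeps the common case cheap, and checking the filter language before raising is what preserves the cql-json passthrough contract mentioned in the commit message.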
@thodson-usgs force-pushed the fix/chunker-size-check-unchunkable branch from 8b952bb to 07e27b6 on June 1, 2026 12:33
@thodson-usgs marked this pull request as ready for review on June 1, 2026 16:41
@thodson-usgs merged commit eddcffc into DOI-USGS:main on Jun 1, 2026
8 checks passed
@thodson-usgs deleted the fix/chunker-size-check-unchunkable branch on June 1, 2026 16:41