Fetch COG tiles concurrently in HTTP path to mask RTT #1487
Merged
brendancol merged 2 commits into main on May 5, 2026
_read_cog_http walked tiles one at a time, blocking on each range request before sending the next. Over a 50 ms RTT link, a 100-tile COG paid 5 s in round trips before any data flowed.

Add _HTTPSource.read_ranges, which takes a list of (offset, length) pairs and fetches them through a ThreadPoolExecutor (default 8 workers, override via XRSPATIAL_COG_HTTP_WORKERS). _read_cog_http collects every tile range up front, fetches them in parallel, then decodes and places each tile in input order. read_range is unchanged so other call sites are unaffected. Local file reads do not go through this path.

Tests: order preservation, empty-list handling, single-request fast path, concurrency speedup against an artificial-latency fake source, and a round-trip correctness test against a local http.server.
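The round-trip arithmetic above can be checked with a small back-of-the-envelope model. This is only a sketch: it assumes every request costs exactly one RTT, ignores transfer time, connection setup, and server-side throttling, and the function names are hypothetical rather than anything in the PR.

```python
import math


def sequential_latency_s(n_requests: int, rtt_s: float) -> float:
    """Round-trip cost when each request waits for the previous one."""
    return n_requests * rtt_s


def pooled_latency_s(n_requests: int, rtt_s: float, workers: int) -> float:
    """Idealized bound: requests leave in waves of at most `workers`."""
    return math.ceil(n_requests / workers) * rtt_s


# 100 tiles over a 50 ms RTT link, as in the commit message.
seq = sequential_latency_s(100, 0.050)
# Same workload with the default pool size of 8 (an idealized estimate).
pooled = pooled_latency_s(100, 0.050, 8)
```

Under these assumptions the 100-tile case needs 13 waves with 8 workers, so the round-trip cost drops from 5 s to roughly 0.65 s. Real gains depend on the server and the link.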
The previous test_read_ranges_concurrency_masks_latency required 20 fake-latency requests at 50 ms each to finish in under 500 ms. On a busy macOS CI runner the run took 519 ms, 4% over the budget, and tripped fail-fast across the matrix. Track the max-in-flight count in _FakeHTTPSource instead and assert that it is >= 2. That proves the thread pool dispatches concurrently without depending on wall-clock timing.
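A minimal sketch of the max-in-flight idea, assuming a fake source shaped roughly like the one the commit describes. The class name InFlightTrackingSource, the fetch_all helper, and the zero-filled payload are illustrative assumptions, not the PR's actual _FakeHTTPSource.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class InFlightTrackingSource:
    """Hypothetical fake source that records peak concurrent requests."""

    def __init__(self, latency_s=0.05):
        self._latency_s = latency_s
        self._lock = threading.Lock()
        self._in_flight = 0
        self.max_in_flight = 0

    def read_range(self, offset, length):
        # Count this request as in flight and update the observed peak.
        with self._lock:
            self._in_flight += 1
            self.max_in_flight = max(self.max_in_flight, self._in_flight)
        try:
            time.sleep(self._latency_s)  # simulated round trip
            return bytes(length)  # zero-filled placeholder payload
        finally:
            with self._lock:
                self._in_flight -= 1


def fetch_all(source, ranges, workers=8):
    """Fetch every (offset, length) pair through a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: source.read_range(*r), ranges))


source = InFlightTrackingSource()
ranges = [(i * 16, 16) for i in range(20)]
chunks = fetch_all(source, ranges)
```

Asserting `source.max_in_flight >= 2` checks that requests overlapped at all. Unlike a wall-clock budget, that check does not get tighter when the CI runner is slow.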
Closes #1480.
Summary
_read_cog_http walked tiles one at a time, blocking on each range request before sending the next. Over a 50 ms RTT link, a 100-tile COG paid 5 s in round trips before any data flowed.

This PR adds _HTTPSource.read_ranges, which takes a list of (offset, length) pairs and fetches them through a ThreadPoolExecutor. Pool size defaults to 8 and can be overridden with XRSPATIAL_COG_HTTP_WORKERS. _read_cog_http now collects every tile range up front, fetches them in parallel, then decodes and places each tile in input order. _HTTPSource.read_range is unchanged, so no other call sites are affected. Local file reads do not go through this path.

Measured on the test fake source (20 requests, 50 ms each, 8 workers): sequential bound is 1.00 s, concurrent fetch took 0.15 s, a 6.6x speedup.
Test plan
pytest xrspatial/geotiff/tests/test_cog_http_concurrent.py -- 6 new tests cover ordering, empty list, single-range fast path, concurrency speedup vs an artificial-latency fake, and round-trip correctness against a local http.server.
pytest xrspatial/geotiff/tests/test_cog.py xrspatial/geotiff/tests/test_reader.py -- existing 32 tests still pass.
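A round-trip check against a local server might look something like the sketch below. The standard library's SimpleHTTPRequestHandler does not serve byte ranges, so this sketch uses a small hypothetical RangeHandler that answers `bytes=start-end` requests with 206 responses. The handler, the payload, and the fetch_range helper are assumptions for illustration, not the PR's test code.

```python
import threading
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

PAYLOAD = bytes(i % 251 for i in range(4096))  # stand-in for a COG file


class RangeHandler(BaseHTTPRequestHandler):
    """Hypothetical handler serving only 'bytes=start-end' requests."""

    def do_GET(self):
        header = self.headers.get("Range", "")
        if not header.startswith("bytes="):
            self.send_error(400, "Range header required")
            return
        start_s, end_s = header[len("bytes="):].split("-", 1)
        start, end = int(start_s), int(end_s)
        body = PAYLOAD[start:end + 1]  # HTTP range ends are inclusive
        self.send_response(206)
        self.send_header(
            "Content-Range", f"bytes {start}-{end}/{len(PAYLOAD)}")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


# Bypass any proxy configured in the environment for the local request.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_range(url, offset, length):
    req = urllib.request.Request(
        url, headers={"Range": f"bytes={offset}-{offset + length - 1}"})
    with _opener.open(req) as resp:
        return resp.read()


server = ThreadingHTTPServer(("127.0.0.1", 0), RangeHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    url = f"http://127.0.0.1:{server.server_address[1]}/payload.bin"
    ranges = [(0, 16), (1024, 32), (4000, 96)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        fetched = list(pool.map(lambda r: fetch_range(url, *r), ranges))
finally:
    server.shutdown()
    server.server_close()
```

Comparing `fetched` against the matching slices of `PAYLOAD` confirms that concurrent fetching returns the right bytes in the right order.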