Skip to content

Refresh geotiff mmap cache when the file at a path is replaced#1506

Open
brendancol wants to merge 1 commit intomainfrom
fix-mmap-cache-stale-inode
Open

Refresh geotiff mmap cache when the file at a path is replaced#1506
brendancol wants to merge 1 commit intomainfrom
fix-mmap-cache-stale-inode

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

  • The shared _MmapCache keys entries by realpath and never re-stat'd after the first open. After an atomic-rename overwrite (which is what to_geotiff does via tempfile.mkstemp + os.replace) a re-read served bytes from the unlinked old file because the realpath still matched the cache key. Same hazard hits in-place truncate-and-rewrite from external tools.
  • Cache entries now carry (st_ino, st_size, st_mtime_ns); acquire stats the path and drops the cached entry when the ident has changed. Idle entries get closed; in-use mmaps stay valid for their current holders (they still own a reference) but are unlinked from the cache so subsequent acquire calls re-mmap the new file.
  • Reproduces on the numpy and GPU read paths: write A, read A, overwrite at same path with B, read returns A. After the fix, read returns B on every backend.

Test plan

  • pytest xrspatial/geotiff/tests/test_polish_1488.py::TestP3MmapLRU passes (3 prior + 2 new).
  • Full geotiff suite: pytest xrspatial/geotiff/tests/ passes (the 3 pre-existing TestPalette failures are unrelated to the cache and fail on origin/main too).
  • GPU read path verified end-to-end via read_geotiff_gpu round-trip.

The shared `_MmapCache` keyed entries by realpath and never re-stat'd
after the first open. `to_geotiff` writes via `tempfile.mkstemp` +
`os.replace`, which keeps the path stable but swaps the inode. A
process that read a TIFF, overwrote it via `to_geotiff`, then read
again would get bytes from the unlinked old file -- the cache served
the prior mmap because the realpath still matched. Same hazard hits
in-place truncate-and-rewrite from any external tool.

Cache entries now carry `(st_ino, st_size, st_mtime_ns)`; `acquire`
stat's the path and drops the cached entry when the ident has
changed. Idle entries are closed; in-use mmaps stay valid for their
current holders (they still own a reference) but are unlinked from
the cache so subsequent `acquire` calls re-mmap the new file.

Tests cover atomic-rename replacement and in-place truncate
overwrite.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant