HATS Phase‑2: lazy partition-aware dataset with bundle cache, docs, and tests #879

Draft
mtauraso wants to merge 2 commits into main from codex/draft-detailed-plan-for-phase-2

@mtauraso

Member

Motivation

  • Replace eager full-catalog materialization with a lazy, partition-aware access model to support large HATS catalogs.
  • Improve random/clustered read performance by introducing a bounded row-bundle cache and column projection.
  • Provide a clear design and usage examples to guide implementation and adoption.

Description

  • Add lazy index/cache/accessor components: HATSPartitionIndex, HATSRowBundleCache, and HATSLazyAccessor implementing global index mapping, LRU row-bundle caching, and partition-aware reads.
  • Refactor HyraxHATSDataset to open HATS catalogs lazily via lsdb.open_catalog/lsdb.read_hats, extract per-dataset hats config, compute __len__ from the partition index, and route per-column get_<column> getters to the lazy accessor.
  • Implement safe fallbacks for catalogs without partition metadata and a conservative full-catalog compute when necessary. Add column-projection logic and required-column detection from the data_request.
  • Add design doc specs/hasts_dataset_phase2.md, example notebook docs/notebooks/hats_dataset_phase2_loopback.ipynb, and unit tests in tests/hyrax/test_hats_dataset_phase2.py covering index resolution, catalog open fallback, and cache behavior.
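
To make the index/cache/accessor split above concrete, here is a minimal sketch of the general technique: cumulative partition lengths with a binary search for the global-to-partition mapping, and an `OrderedDict`-based LRU for row bundles. The names `SketchPartitionIndex`, `SketchRowBundleCache`, `read_row`, the `loader` callback, and the `bundle_size` parameter are hypothetical and are not the signatures of `HATSPartitionIndex`, `HATSRowBundleCache`, or `HATSLazyAccessor` in this PR.

```python
from bisect import bisect_right
from collections import OrderedDict
from itertools import accumulate


class SketchPartitionIndex:
    """Hypothetical: map a global row index to (partition, offset in partition)."""

    def __init__(self, partition_lengths):
        # Cumulative end offsets, e.g. lengths [3, 2, 4] give ends [3, 5, 9].
        self._ends = list(accumulate(partition_lengths))

    def __len__(self):
        # Total row count, available without reading any partition data.
        return self._ends[-1] if self._ends else 0

    def resolve(self, global_idx):
        if not 0 <= global_idx < len(self):
            raise IndexError(global_idx)
        # First partition whose cumulative end is strictly greater than the index.
        part = bisect_right(self._ends, global_idx)
        start = self._ends[part - 1] if part else 0
        return part, global_idx - start


class SketchRowBundleCache:
    """Hypothetical: bounded LRU cache keyed by (partition, bundle number)."""

    def __init__(self, loader, max_bundles=2):
        self._loader = loader  # assumed callable: key -> sequence of rows
        self._max = max_bundles
        self._bundles = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._bundles:
            self.hits += 1
            self._bundles.move_to_end(key)  # mark as most recently used
            return self._bundles[key]
        self.misses += 1
        bundle = self._loader(key)
        self._bundles[key] = bundle
        if len(self._bundles) > self._max:
            self._bundles.popitem(last=False)  # evict least recently used
        return bundle


def read_row(index, cache, global_idx, bundle_size=2):
    """Hypothetical accessor: resolve the index, then read through the cache."""
    part, offset = index.resolve(global_idx)
    bundle = cache.get((part, offset // bundle_size))
    return bundle[offset % bundle_size]
```

In this sketch, neighbouring global indices within a partition land in the same bundle, so clustered reads hit the cache after the first load, while `max_bundles` bounds memory regardless of catalog size.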

Testing

  • Added unit tests in tests/hyrax/test_hats_dataset_phase2.py that validate HATSPartitionIndex mapping, preference for lsdb.open_catalog with fallback to read_hats, and basic bundle-cache hit/miss behavior.
  • These tests exercise the fake-catalog abstractions and the new dataset accessor logic.
  • No automated test runs or CI results are included in this PR; the tests still need to be run in CI or locally (e.g., pytest tests/hyrax/test_hats_dataset_phase2.py).
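
As an illustration of the open-with-fallback behavior those tests target, the sketch below prefers `open_catalog` and falls back to `read_hats` on a module-like object. The helper name `open_hats_catalog` and the fake modules are hypothetical, and attribute lookup is only one way to implement the fallback; the PR may do it differently. Passing the module in as an argument keeps the example runnable without lsdb installed.

```python
from types import SimpleNamespace


def open_hats_catalog(lsdb_module, path, **kwargs):
    """Hypothetical helper: prefer open_catalog, fall back to read_hats."""
    opener = getattr(lsdb_module, "open_catalog", None)
    if opener is None:
        opener = getattr(lsdb_module, "read_hats", None)
    if opener is None:
        raise AttributeError("module provides neither open_catalog nor read_hats")
    # Nothing is computed here; the opener is expected to return a lazy catalog.
    return opener(path, **kwargs)


# Fake stand-ins for the lsdb module, in the spirit of the fake-catalog tests.
new_api = SimpleNamespace(
    open_catalog=lambda path, **kw: ("open_catalog", path),
    read_hats=lambda path, **kw: ("read_hats", path),
)
old_api = SimpleNamespace(read_hats=lambda path, **kw: ("read_hats", path))

print(open_hats_catalog(new_api, "some/catalog"))
print(open_hats_catalog(old_api, "some/catalog"))
```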

Codex Task

@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@mtauraso mtauraso marked this pull request as draft April 16, 2026 21:08
@mtauraso mtauraso self-assigned this Apr 23, 2026
Base automatically changed from mtauraso/hats-dataset to main April 25, 2026 02:43