
feat(pineapple): add gget pineapple module (#161) #245

Draft
Elarwei001 wants to merge 2 commits into scverse:dev from Elarwei001:feature/pineapple-161


@Elarwei001

Contributor

Summary

Adds a new gget pineapple module to list and download curated bio-imaging data from Pineapple.

One point worth stating explicitly: Pineapple is a Rust command-line tool for image-based cell profiling, and it does not expose a REST/JSON API. What it does provide, through its `pineapple download` command, is a curated, standardized catalog of bio-imaging datasets and pre-trained model weights hosted on Google Drive; each resource maps to a Google Drive file ID hardcoded in the `pineapple-data` crate. This module mirrors that catalog and downloads the resources directly from the same Google Drive host, so users get the data without installing the Rust binary.
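Mirroring the catalog amounts to keeping a static registry that maps each resource name to its metadata and Google Drive file ID. The following is a minimal sketch, assuming one frozen dataclass per resource with fields modeled on the columns listed below (name, author, size in GB, license, filename, Drive ID). The `CatalogEntry` class, the `lookup`/`list_category` helpers, and the placeholder entries are hypothetical and not taken from `gget_pineapple.py`.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CatalogEntry:
    """One mirrored catalog record (hypothetical field names)."""
    name: str
    category: str  # "segmentation", "benchmark" or "weights"
    author: str
    size_gb: float
    license: str
    filename: str
    drive_id: str  # Google Drive file ID copied from pineapple-data


# Placeholder entries only: the real names, IDs and licenses live in the
# pineapple-data crate and are not reproduced here.
CATALOG = {
    entry.name: entry
    for entry in [
        CatalogEntry("example_seg", "segmentation", "Author A", 1.2,
                     "CC BY 4.0", "example-seg.tar.gz", "DRIVE_ID_1"),
        CatalogEntry("example_weights", "weights", "Author B", 0.1,
                     "MIT", "example-weights.tar.gz", "DRIVE_ID_2"),
    ]
}


def lookup(name: str) -> CatalogEntry:
    """Resolve a resource name to its entry, or fail listing the valid names."""
    try:
        return CATALOG[name]
    except KeyError:
        valid = ", ".join(sorted(CATALOG))
        raise KeyError(f"Unknown resource {name!r}; expected one of: {valid}") from None


def list_category(category: str) -> List[CatalogEntry]:
    """Return every entry in one category, sorted by name."""
    return sorted((e for e in CATALOG.values() if e.category == category),
                  key=lambda e: e.name)
```

A frozen dataclass keeps the mirrored snapshot immutable at runtime, so a refresh only ever happens by editing the source, which matches the point-in-time nature of the mirror.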

What it does

  • List the catalog by category: segmentation (12 datasets), benchmark (13 datasets), or weights (5 pre-trained model weights). Returns names, authors, sizes (GB), licenses, filenames, and Google Drive IDs.
  • Download a resource by name directly from Google Drive (handles the large-file virus-scan-warning confirmation flow).
  • Available in both the Python API (gget.pineapple(...)) and the command line (gget pineapple ...).
```bash
# List the segmentation datasets
gget pineapple --category segmentation

# Download one by name
gget pineapple vicar_2021 --download --out_dir ./pineapple_data

# List pre-trained model weights
gget pineapple --category weights
```

```python
import gget
# List the segmentation datasets
gget.pineapple(category="segmentation")
# Download one resource by name into ./pineapple_data
gget.pineapple("vicar_2021", download=True, out_dir="./pineapple_data")
```
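The large-file confirmation step can be sketched with the standard library alone. When a file is too large for Google Drive to virus-scan, the first request returns an HTML warning page instead of the file; a common workaround is to read the hidden inputs of that page's download form and re-request the form's action URL with them. This is a sketch under that assumption: the warning page is Google's undocumented markup and may change without notice, and `DRIVE_UC_URL`, `initial_download_url`, and `confirm_url_from_warning` are hypothetical names rather than the module's API.

```python
from html.parser import HTMLParser
from typing import Dict, Optional
from urllib.parse import urlencode

# Widely used first-request endpoint; an assumption, not necessarily what
# gget_pineapple.py calls.
DRIVE_UC_URL = "https://drive.google.com/uc"


def initial_download_url(drive_id: str) -> str:
    """URL for the first request; small files are served directly from here."""
    return f"{DRIVE_UC_URL}?{urlencode({'export': 'download', 'id': drive_id})}"


class _WarningFormParser(HTMLParser):
    """Records the action URL and hidden inputs of the first <form> on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.action: Optional[str] = None
        self.params: Dict[str, str] = {}
        self._seen_form = False
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "form" and not self._seen_form:
            self._seen_form = True
            self._in_form = True
            self.action = attr.get("action")
        elif tag == "input" and self._in_form:
            if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
                self.params[attr["name"]] = attr.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def confirm_url_from_warning(html: str) -> Optional[str]:
    """Rebuild the 'download anyway' URL from a warning page, or None if no form."""
    parser = _WarningFormParser()
    parser.feed(html)
    parser.close()
    if not parser.action:
        return None
    return f"{parser.action}?{urlencode(parser.params)}"
```

A caller would request `initial_download_url(drive_id)`, check whether the response's `Content-Type` is `text/html`, and if so fetch `confirm_url_from_warning(body)` instead; a binary response means the file was served directly.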

The catalog includes each dataset's original author and license so users can check usage terms (some datasets are non-commercial only).
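Because licenses are part of the listing, a user can screen out non-commercial resources before downloading. The following is a heuristic sketch, assuming the license column holds Creative Commons short names such as "CC BY-NC 4.0"; the rows below are invented examples, not catalog data, and licenses with custom non-commercial terms would not be caught, so the license text itself remains the authority.

```python
# Hypothetical rows shaped like the catalog listing; the license strings are
# examples, not copied from the real catalog.
rows = [
    {"name": "dataset_a", "license": "CC BY 4.0"},
    {"name": "dataset_b", "license": "CC BY-NC 4.0"},
    {"name": "dataset_c", "license": "CC0 1.0"},
]


def is_non_commercial(license_name: str) -> bool:
    """Heuristic: a Creative Commons license with an NC clause forbids commercial use."""
    parts = license_name.upper().replace("-", " ").split()
    return "NC" in parts


# Names of resources whose license string carries no NC clause.
commercial_ok = [r["name"] for r in rows if not is_non_commercial(r["license"])]
```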

Testing

  • 10 network-free unit tests (tests/test_pineapple.py, mocked downloads) covering catalog contents/columns, filename conventions, Google Drive virus-scan-warning form parsing, and download routing. No real downloads happen in tests.
  • ruff: clean on all changed files.
  • Live-verified: a real Google Drive file ID from the catalog returns HTTP 200 (application/octet-stream, Content-Disposition: attachment; filename="arvidsson-2022.tar.gz"), matching the module's filename convention.
  • Docs added (docs/src/en/pineapple.md + SUMMARY entry) and an updates.md bullet under Version ≥ 0.30.8.
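The two properties checked above (no real network traffic in tests, and a server-reported filename that matches the catalog's convention) can be exercised together. Below is a minimal sketch, assuming a hypothetical `download_to` helper built on `urllib.request`. The real module may use another HTTP client such as `requests`, in which case the patch target differs. The sketch parses `Content-Disposition` parameters with `email.message.Message` because the older `cgi.parse_header` was removed in Python 3.13.

```python
import tempfile
import unittest
import urllib.request
from email.message import Message
from pathlib import Path
from unittest import mock


def filename_from_disposition(header):
    """Return the filename parameter of a Content-Disposition value, or None."""
    if not header:
        return None
    msg = Message()
    msg["Content-Disposition"] = header
    return msg.get_filename()


def download_to(url: str, out_dir: str) -> Path:
    """Hypothetical helper: save the response body under the server-provided name."""
    with urllib.request.urlopen(url) as resp:
        name = filename_from_disposition(resp.headers.get("Content-Disposition"))
        if not name:
            raise ValueError("response has no Content-Disposition filename")
        dest = Path(out_dir) / Path(name).name  # drop any directory components
        dest.write_bytes(resp.read())
    return dest


class DownloadToTest(unittest.TestCase):
    def test_uses_server_filename_without_network(self):
        resp = mock.MagicMock()
        resp.headers = {"Content-Disposition": 'attachment; filename="arvidsson-2022.tar.gz"'}
        resp.read.return_value = b"fake archive bytes"
        fake = mock.MagicMock()
        fake.__enter__.return_value = resp  # urlopen is used as a context manager
        with tempfile.TemporaryDirectory() as tmp:
            with mock.patch("urllib.request.urlopen", return_value=fake) as opened:
                path = download_to("https://example.invalid/file", tmp)
            opened.assert_called_once_with("https://example.invalid/file")
            self.assertEqual(path.name, "arvidsson-2022.tar.gz")
            self.assertEqual(path.read_bytes(), b"fake archive bytes")
```

The test class runs under `pytest` or `python -m unittest`; patching `urllib.request.urlopen` at the module attribute works because `download_to` looks it up at call time.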

⚠️ Note for maintainers

Because Pineapple hardcodes its dataset Google Drive file IDs in its own source (the upstream author notes this isn't ideal and may move them out of the library later), this module's catalog is a point-in-time mirror of those IDs. If upstream Pineapple adds/changes datasets, gget's catalog would need a corresponding refresh. I went with mirroring because there is no programmatic API to query — flagging this wrapper approach for your input in case you'd prefer a different strategy (e.g. fetching the registry from a maintained upstream manifest if/when one exists).
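If a maintained manifest ever appears upstream, one option is to prefer it at runtime and keep the bundled mirror as a fallback. The following is a sketch under that assumption. No such manifest exists today, so its JSON shape, `REQUIRED_KEYS`, `BUNDLED`, and `load_catalog` are all hypothetical. Fetching the manifest text, ideally with a timeout, is left to the caller.

```python
import json
from typing import List, Optional

# Keys every manifest entry must carry to be usable for a download.
REQUIRED_KEYS = {"name", "category", "filename", "drive_id"}

# Bundled point-in-time mirror (placeholder content, not real IDs).
BUNDLED: List[dict] = [
    {"name": "example_seg", "category": "segmentation",
     "filename": "example-seg.tar.gz", "drive_id": "DRIVE_ID_1"},
]


def load_catalog(manifest_text: Optional[str]) -> List[dict]:
    """Use an upstream manifest when one is supplied and well-formed, else the bundled mirror."""
    if manifest_text is None:
        return BUNDLED
    try:
        entries = json.loads(manifest_text)
    except json.JSONDecodeError:
        return BUNDLED
    well_formed = (
        isinstance(entries, list)
        and len(entries) > 0
        and all(isinstance(e, dict) and REQUIRED_KEYS.issubset(e) for e in entries)
    )
    return entries if well_formed else BUNDLED
```

Falling back rather than raising keeps `gget pineapple` usable offline or when the upstream file is malformed, at the cost of silently serving a possibly stale catalog; logging a warning on fallback would make that visible.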

Resolves #161

Elarwei001 and others added 2 commits June 24, 2026 23:20
…data (scverse#161)

New module `gget pineapple` lists and downloads the curated bio-imaging
datasets (segmentation/benchmark) and pre-trained model weights
distributed by Pineapple (https://github.com/tomouellette/pineapple,
suggested by Prof. Anne Carpenter). The catalog (names, authors, sizes,
licenses, Google Drive IDs) is mirrored from pineapple's pineapple-data
crate; resources are downloaded directly from their Google Drive host
(with the large-file virus-scan-warning bypass), so no Rust binary is
required. Exposed via the Python API and the command line.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@codecov-commenter

codecov-commenter commented Jun 25, 2026


Codecov Report

❌ Patch coverage is 70.93023% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.54%. Comparing base (5cf607f) to head (af30558).
⚠️ Report is 1 commit behind head on dev.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| gget/gget_pineapple.py | 70.23% | 25 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##              dev     #245      +/-   ##
==========================================
+ Coverage   56.14%   56.54%   +0.40%
==========================================
  Files          29       30       +1
  Lines        9244     9330      +86
==========================================
+ Hits         5190     5276      +86
  Misses       4054     4054
```


Elarwei001 marked this pull request as draft on June 25, 2026, 03:44.