feat(pineapple): add gget pineapple module (#161) by Elarwei001 · Pull Request #245 · scverse/gget

Elarwei001 · 2026-06-25T00:23:12Z

Summary

Adds a new gget pineapple module to list and download curated bio-imaging data from Pineapple.

To be explicit: Pineapple is a Rust command-line tool for image-based cell profiling, and it does not expose a REST/JSON API. What it does provide (via its pineapple download command) is a curated, standardized catalog of bio-imaging datasets and pre-trained model weights hosted on Google Drive; each resource maps to a hardcoded Google Drive file ID in the pineapple-data crate. This module mirrors that catalog and downloads the resources directly from the same Google Drive host, so users get the data without installing the Rust binary.

What it does

List the catalog by category: segmentation (12 datasets), benchmark (13 datasets), or weights (5 pre-trained model weights). Returns names, authors, sizes (GB), licenses, filenames, and Google Drive IDs.
Download a resource by name directly from Google Drive (handles the large-file virus-scan-warning confirmation flow).
Available in both the Python API (gget.pineapple(...)) and the command line (gget pineapple ...).
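
The virus-scan-warning step can be sketched as follows. This is a minimal illustration, not the module's actual code: it assumes the warning page is an HTML document containing a form whose hidden inputs must be re-submitted to the form's action URL. The helper name `parse_confirm_form`, the sample HTML (including its host and field names), and the ID `FILE_ID_PLACEHOLDER` are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class _FormParser(HTMLParser):
    """Collect the first form's action URL and its hidden input fields."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attrs.get("action")
            self._in_form = True
        elif tag == "input" and self._in_form and attrs.get("type") == "hidden":
            if attrs.get("name"):
                self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def parse_confirm_form(html):
    """Return the confirmed download URL, or None if the page has no form."""
    parser = _FormParser()
    parser.feed(html)
    if parser.action is None:
        return None
    return f"{parser.action}?{urlencode(parser.fields)}"


# Hypothetical warning page; the host and field names are assumptions.
SAMPLE_WARNING_PAGE = """
<html><body>
<form id="download-form" action="https://drive.usercontent.google.com/download" method="get">
  <input type="hidden" name="id" value="FILE_ID_PLACEHOLDER">
  <input type="hidden" name="export" value="download">
  <input type="hidden" name="confirm" value="t">
  <input type="submit" value="Download anyway">
</form>
</body></html>
"""

confirmed_url = parse_confirm_form(SAMPLE_WARNING_PAGE)
```

Returning `None` when no form is found lets a caller treat the first response as the file itself, which is the case for small files that skip the warning page.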

# List the segmentation datasets
gget pineapple --category segmentation

# Download one by name
gget pineapple vicar_2021 --download --out_dir ./pineapple_data

# List pre-trained model weights
gget pineapple --category weights

import gget
gget.pineapple(category="segmentation")
gget.pineapple("vicar_2021", download=True, out_dir="./pineapple_data")

The catalog includes each dataset's original author and license so users can check usage terms (some datasets are non-commercial only).
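
One way to picture the mirrored catalog is as a list of records filtered by category. The sketch below is illustrative only: the `Resource` type, its field names, and every row (names, licenses, sizes, Drive IDs) are placeholders rather than the module's real catalog. The filename rule (underscores to hyphens plus `.tar.gz`) is inferred from the `arvidsson-2022.tar.gz` example and is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    category: str  # "segmentation", "benchmark", or "weights"
    license: str
    size_gb: float
    drive_id: str

    @property
    def filename(self):
        # Assumed convention: underscores become hyphens, plus ".tar.gz".
        return self.name.replace("_", "-") + ".tar.gz"


# Placeholder rows: every value here is made up for illustration.
CATALOG = [
    Resource("example_2021", "segmentation", "CC-BY-4.0", 1.2, "DRIVE_ID_A"),
    Resource("sample_2022", "segmentation", "CC-BY-NC-4.0", 0.4, "DRIVE_ID_B"),
    Resource("demo_weights", "weights", "MIT", 0.1, "DRIVE_ID_C"),
]


def list_category(category):
    """Return all catalog rows in one category."""
    valid = {"segmentation", "benchmark", "weights"}
    if category not in valid:
        raise ValueError(f"category must be one of {sorted(valid)}")
    return [r for r in CATALOG if r.category == category]


def is_non_commercial(resource):
    """Flag licenses with an NC clause so users can check usage terms."""
    return "-NC" in resource.license.upper()


segmentation = list_category("segmentation")
restricted = [r.name for r in segmentation if is_non_commercial(r)]
```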

Testing

10 network-free unit tests (tests/test_pineapple.py, mocked downloads) covering catalog contents/columns, filename conventions, Google Drive virus-scan-warning form parsing, and download routing. No real downloads happen in tests.
ruff: clean on all changed files.
Live-verified: a real Google Drive file ID from the catalog returns HTTP 200 (application/octet-stream, Content-Disposition: attachment; filename="arvidsson-2022.tar.gz"), matching the module's filename convention.
Docs added (docs/src/en/pineapple.md + SUMMARY entry) and an updates.md bullet under Version ≥ 0.30.8.
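
The live check above compares the filename in the Content-Disposition header with the name the module expects. A minimal, network-free sketch of that comparison using only the standard library is below. The helper name `filename_from_header` is hypothetical, the catalog name `arvidsson_2022` is assumed, and the header string is the one quoted in the Testing notes.

```python
from email.message import Message


def filename_from_header(content_disposition):
    """Extract the filename parameter from a Content-Disposition header value."""
    msg = Message()
    msg["Content-Disposition"] = content_disposition
    return msg.get_filename()


# Header value as quoted in the Testing notes.
header = 'attachment; filename="arvidsson-2022.tar.gz"'
served_name = filename_from_header(header)

# Assumed catalog name and filename convention (underscores to hyphens + ".tar.gz").
expected_name = "arvidsson_2022".replace("_", "-") + ".tar.gz"
matches_convention = served_name == expected_name
```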

⚠️ Note for maintainers

Pineapple hardcodes its dataset Google Drive file IDs in its own source (the upstream author notes this isn't ideal and may move them out of the library later), so this module's catalog is a point-in-time mirror of those IDs. If upstream Pineapple adds or changes datasets, gget's catalog will need a corresponding refresh. I went with mirroring because there is no programmatic API to query. I'm flagging this wrapper approach for your input in case you'd prefer a different strategy (e.g. fetching the registry from a maintained upstream manifest if/when one exists).
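
If a maintained upstream manifest ever exists, a refresh could start from a plain diff between the mirrored IDs and the upstream ones. The sketch below assumes both sides can be reduced to a name-to-Drive-ID mapping; the function name `diff_catalog` and all names and IDs are hypothetical.

```python
def diff_catalog(mirrored, upstream):
    """Compare two {name: drive_id} mappings and report what drifted."""
    added = sorted(set(upstream) - set(mirrored))
    removed = sorted(set(mirrored) - set(upstream))
    changed = sorted(
        name
        for name in set(mirrored) & set(upstream)
        if mirrored[name] != upstream[name]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Placeholder data: none of these names or IDs come from the real catalog.
mirrored = {"example_2021": "ID_A", "sample_2022": "ID_B"}
upstream = {"example_2021": "ID_A", "sample_2022": "ID_B2", "new_2023": "ID_C"}
drift = diff_catalog(mirrored, upstream)
```

A non-empty result in any of the three lists would signal that the mirror is stale and needs a refresh.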

Resolves #161

…data (scverse#161)

New module `gget pineapple` lists and downloads the curated bio-imaging datasets (segmentation/benchmark) and pre-trained model weights distributed by Pineapple (https://github.com/tomouellette/pineapple, suggested by Prof. Anne Carpenter).

The catalog (names, authors, sizes, licenses, Google Drive IDs) is mirrored from pineapple's pineapple-data crate; resources are downloaded directly from their Google Drive host (with the large-file virus-scan-warning bypass), so no Rust binary is required. Exposed via the Python API and the command line.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

codecov-commenter · 2026-06-25T00:37:55Z

Codecov Report

❌ Patch coverage is 70.93023% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.54%. Comparing base (5cf607f) to head (af30558).
⚠️ Report is 1 commit behind head on dev.

Files with missing lines	Patch %	Lines
gget/gget_pineapple.py	70.23%	25 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##              dev     #245      +/-   ##
==========================================
+ Coverage   56.14%   56.54%   +0.40%     
==========================================
  Files          29       30       +1     
  Lines        9244     9330      +86     
==========================================
+ Hits         5190     5276      +86     
  Misses       4054     4054
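
The percentages in the diff follow directly from the hit and line counts. As a quick check (a sketch, not Codecov's own code), the reported figures are reproduced when each ratio is truncated to two decimals rather than rounded. The helper name `pct` is hypothetical, and the patch size of 86 lines is taken from the "+86" line delta together with the 25 missing lines.

```python
import math


def pct(hits, lines):
    """Coverage as a percentage, truncated (not rounded) to two decimals."""
    return math.floor(hits / lines * 10_000) / 100


base = pct(5190, 9244)      # base (5cf607f)
head = pct(5276, 9330)      # head (af30558)
delta = round(head - base, 2)
patch = pct(86 - 25, 86)    # 25 of the 86 changed lines are uncovered
```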


Elarwei001 and others added 2 commits June 24, 2026 23:20

[pre-commit.ci] auto fixes from pre-commit.com hooks

af30558

for more information, see https://pre-commit.ci

Elarwei001 marked this pull request as draft June 25, 2026 03:44

Elarwei001 wants to merge 2 commits into scverse:dev from Elarwei001:feature/pineapple-161
