Releases · NygenAnalytics/CyteType · GitHub

02 Apr 03:13

parashardhapola

0.19.4 Latest


What's Changed

Fix NaN values in adata.obs by @suu-yi in #74
Gene symbols patch by @suu-yi in #76
Logging improvements and version bump by @parashardhapola in #79

Full Changelog: 0.19.3...0.19.4

Contributors

parashardhapola and suu-yi


08 Mar 22:34

parashardhapola

0.19.3

What's Changed

Update version to 0.19.3 and enhance gene symbol handling by @parashardhapola in #72

Full Changelog: 0.19.2...0.19.3

Contributors

parashardhapola


08 Mar 10:33

parashardhapola

0.19.2

What's Changed

Update version to 0.19.2 and enhance metadata handling in CyteType by @parashardhapola in #71

Full Changelog: 0.19.1...0.19.2

Contributors

parashardhapola


07 Mar 18:21

parashardhapola

0.19.1

What's Changed

Update version to 0.19.1 and improve file upload initiation by @parashardhapola in #70

Full Changelog: 0.19.0...0.19.1

Contributors

parashardhapola


07 Mar 13:04

parashardhapola

0.19.0

What's Changed

Update version to 0.19.0 and enhance file upload functionality by @parashardhapola in #69

Full Changelog: 0.18.1...0.19.0

Contributors

parashardhapola


03 Mar 20:13

parashardhapola

0.18.1

What's Changed

Update version to 0.18.1 and enhance public API formatting by @parashardhapola in #68

Full Changelog: 0.18.0...0.18.1

Contributors

parashardhapola


03 Mar 15:30

parashardhapola

0.18.0

✨ What's New

🧬 Raw Counts in `vars.h5` Artifact

The `save_features_matrix` function now writes an optional `raw` group to the H5 artifact containing integer raw counts (LZ4-compressed CSR).
`CyteType.__init__` auto-resolves raw counts from `adata.layers['counts']`, `adata.raw.X`, or `adata.X` (if integer-valued), and embeds them alongside normalized counts.
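
The resolution order described above could be sketched roughly as follows. This is an illustration only, not CyteType's code: `resolve_raw_counts` and `is_integer_valued` are hypothetical names, matrices are plain nested lists rather than AnnData members, and the sketch assumes the integer check is applied to every candidate (the note above only states it for `adata.X`).

```python
from typing import Optional, Sequence

Matrix = Sequence[Sequence[float]]


def is_integer_valued(matrix: Matrix) -> bool:
    """True when every entry is a whole number (NaN counts as non-integer)."""
    return all(float(v).is_integer() for row in matrix for v in row)


def resolve_raw_counts(layers: dict, raw_x: Optional[Matrix], x: Matrix):
    """Return (source_name, matrix) for the first usable candidate, else None.

    Candidates are tried in the order listed in the release note:
    layers['counts'], then raw.X, then X.
    """
    candidates = [
        ("layers['counts']", layers.get("counts")),
        ("raw.X", raw_x),
        ("X", x),
    ]
    for name, matrix in candidates:
        # Assumption: a candidate is accepted only if it is integer-valued.
        if matrix is not None and is_integer_valued(matrix):
            return name, matrix
    return None
```

Returning `None` when nothing qualifies mirrors the idea that the `raw` group is optional: normalized data alone is still a valid input.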

🚀 `rank_genes_groups_backed` — Memory-Efficient Differential Expression

New public function: a drop-in replacement for `sc.tl.rank_genes_groups` that works on backed/on-disk `_CSRDataset` matrices.
Streams cell chunks in a single pass, computes Welch's t-test (one-vs-rest) with BH or Bonferroni correction, and writes scanpy-compatible output to `adata.uns`.
Exported at `cytetype.rank_genes_groups_backed`.
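
To make the statistics concrete, here is a minimal pure-Python sketch of the idea for a single gene: accumulate per-group count, sum, and sum of squares in one pass over chunks, derive a one-vs-rest Welch t statistic from those totals, and apply Benjamini-Hochberg adjustment to a list of p-values. All function names are hypothetical and this is not the library's implementation; converting t to a p-value needs a t-distribution CDF, which is left out, and groups with fewer than two cells or zero variance are not handled.

```python
import math
from collections import defaultdict


def accumulate(chunks):
    """Single pass over (labels, values) chunks for one gene.

    Returns {group: [n, sum, sum_of_squares]}.
    """
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for labels, values in chunks:
        for group, value in zip(labels, values):
            s = stats[group]
            s[0] += 1
            s[1] += value
            s[2] += value * value
    return dict(stats)


def welch_one_vs_rest(stats, group):
    """Welch t statistic and Welch-Satterthwaite df for `group` vs all other cells."""
    n1, s1, q1 = stats[group]
    rest = [s for g, s in stats.items() if g != group]
    n2 = sum(s[0] for s in rest)
    s2 = sum(s[1] for s in rest)
    q2 = sum(s[2] for s in rest)
    m1, m2 = s1 / n1, s2 / n2
    v1 = (q1 - n1 * m1 * m1) / (n1 - 1)  # unbiased sample variance
    v2 = (q2 - n2 * m2 * m2) / (n2 - 1)
    se1, se2 = v1 / n1, v2 / n2
    t = (m1 - m2) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


def benjamini_hochberg(pvalues):
    """BH-adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Because only running totals are kept, memory use depends on the number of groups, not the number of cells, which is what makes a single streaming pass over an on-disk matrix feasible.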

✂️ `subsample_by_group` — Per-Group Cell Subsampling

New preprocessing utility that caps each cluster to a configurable maximum number of cells (`max_cells_per_group`), keeping smaller groups intact.
Works with both in-memory and backed AnnData objects.
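
As a rough illustration of per-group capping, the sketch below selects row indices from a list of cluster labels. `subsample_indices` and its `seed` argument are hypothetical; only the `max_cells_per_group` name comes from the note above, and the real utility operates on AnnData objects rather than plain lists.

```python
import random
from collections import defaultdict


def subsample_indices(labels, max_cells_per_group, seed=0):
    """Return sorted row indices with at most max_cells_per_group per label."""
    rng = random.Random(seed)  # local RNG so the result is reproducible
    by_group = defaultdict(list)
    for index, group in enumerate(labels):
        by_group[group].append(index)

    keep = []
    for indices in by_group.values():
        if len(indices) > max_cells_per_group:
            # Oversized group: draw a random subset without replacement.
            indices = rng.sample(indices, max_cells_per_group)
        # Groups at or below the cap are kept in full.
        keep.extend(indices)
    return sorted(keep)
```

Returning sorted indices keeps the original cell order, which matters when the indices are later used to slice a backed matrix sequentially.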

🔍 Auto-Detection of Gene Symbols Column

`gene_symbols_column` now defaults to `None` and auto-detects by checking well-known column names (`feature_name`, `gene_symbols`, etc.), then `adata.var_names`, then a heuristic scan of all `var` columns.
Detects and skips composite gene values (e.g., `TSPAN6_ENSG00000000003`).
Candidates are scored by ID-like percentage, uniqueness ratio, and priority; the best non-ID column wins.
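
One way such a scoring heuristic might look is sketched below. Everything here is an assumption made for illustration: the regular expressions, the 0.5 cut-off for "ID-like", the tie-breaking order, and the function names are not taken from CyteType.

```python
import re

# Assumed patterns: Ensembl-style gene IDs, and "SYMBOL_ENSG..." composites.
ENSEMBL_ID = re.compile(r"^ENS[A-Z]*G\d+(\.\d+)?$")
COMPOSITE = re.compile(r"^.+_ENS[A-Z]*G\d+(\.\d+)?$")


def score_column(values):
    """Return (id_like_fraction, uniqueness_ratio) for one candidate column."""
    n = len(values)
    id_like = sum(
        bool(ENSEMBL_ID.match(v) or COMPOSITE.match(v)) for v in values
    ) / n
    unique = len(set(values)) / n
    return id_like, unique


def pick_symbol_column(columns, max_id_like=0.5):
    """Pick the best non-ID column from an ordered {name: values} mapping.

    Earlier entries have higher priority. Columns that are mostly ID-like
    (or composite) are skipped; among the rest, prefer fewer ID-like values,
    then higher uniqueness, then higher priority.
    """
    best = None
    for priority, (name, values) in enumerate(columns.items()):
        id_like, unique = score_column(values)
        if id_like > max_id_like:
            continue
        key = (id_like, -unique, priority)  # lower tuple is better
        if best is None or key < best[0]:
            best = (key, name)
    return best[1] if best else None
```

For example, given a `gene_ids` column of Ensembl IDs, a composite column, and a `feature_name` column of plain symbols, this sketch selects `feature_name`.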

🎨 `marker_dotplot` — Category-Grouped Dot Plot

New plotting module (`cytetype.plotting`) with `marker_dotplot` that reads stored CyteType results and creates a scanpy dotplot grouped by cluster categories with top supporting marker genes.

⚡ Improvements

🏗️ Artifact Pipeline Restructuring

Artifacts (`vars.h5`, `obs.duckdb`) are now built during `__init__` and uploaded during `run()`, decoupling build from upload.
`vars_h5_path` and `obs_duckdb_path` moved from `run()` to `__init__()` parameters.
New `cleanup()` method replaces the removed `cleanup_artifacts` parameter on `run()`.

💾 CSR-Backed Write Path for Normalized Counts

New two-pass column-group scatter algorithm (`_write_csc_via_row_batches`) converts CSR-backed data to CSC in the H5 file without loading the full matrix.
Configurable memory budget via `WRITE_MEM_BUDGET` (default 4 GB).
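
The general shape of a two-pass CSR-to-CSC conversion over row batches can be sketched in plain Python as below. This is a simplified illustration, not `_write_csc_via_row_batches`: the function names are hypothetical, the output arrays are held in memory as lists instead of being written to H5, and the column grouping that keeps the real algorithm within a memory budget is omitted.

```python
def csr_row_batches(indptr, indices, data, batch_rows):
    """Yield (first_row, indptr_slice, indices_slice, data_slice) per row batch."""
    n_rows = len(indptr) - 1
    for start in range(0, n_rows, batch_rows):
        stop = min(start + batch_rows, n_rows)
        lo, hi = indptr[start], indptr[stop]
        yield start, indptr[start:stop + 1], indices[lo:hi], data[lo:hi]


def csr_to_csc_two_pass(indptr, indices, data, n_cols, batch_rows=2):
    """Convert CSR arrays to CSC arrays while reading rows only in batches."""
    # Pass 1: count non-zeros per column to build the CSC column pointer.
    col_counts = [0] * n_cols
    for _, _, idx, _ in csr_row_batches(indptr, indices, data, batch_rows):
        for col in idx:
            col_counts[col] += 1
    csc_indptr = [0] * (n_cols + 1)
    for col in range(n_cols):
        csc_indptr[col + 1] = csc_indptr[col] + col_counts[col]

    # Pass 2: scatter each entry to the next free slot of its column.
    nnz = csc_indptr[-1]
    cursor = csc_indptr[:-1]  # slicing copies: next write position per column
    csc_indices = [0] * nnz
    csc_data = [0] * nnz
    for first_row, ptr, idx, vals in csr_row_batches(indptr, indices, data, batch_rows):
        base = ptr[0]
        for r in range(len(ptr) - 1):
            for k in range(ptr[r] - base, ptr[r + 1] - base):
                col = idx[k]
                dest = cursor[col]
                csc_indices[dest] = first_row + r
                csc_data[dest] = vals[k]
                cursor[col] += 1
    return csc_indptr, csc_indices, csc_data
```

Because rows are visited in increasing order during the scatter pass, row indices within each output column come out sorted without an extra sort step.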

📊 Expression Percentage Calculation

Refactored to use single-pass row-batched accumulation (reuses `_accumulate_group_stats`) instead of gene-batched pandas groupby.
Default `pcent_batch_size` increased from 2000 to 5000.
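
A minimal sketch of row-batched accumulation for expression percentages follows. It assumes "expressed" means a value greater than zero and uses dense rows for readability; `expression_percentages` is a hypothetical name and does not reproduce `_accumulate_group_stats`.

```python
from collections import defaultdict


def expression_percentages(row_batches, n_genes):
    """Percent of cells per group with a non-zero value for each gene.

    row_batches yields (labels, rows); each row is a list of n_genes values.
    Every cell is visited exactly once.
    """
    nonzero = defaultdict(lambda: [0] * n_genes)
    cells = defaultdict(int)
    for labels, rows in row_batches:
        for group, row in zip(labels, rows):
            cells[group] += 1
            counts = nonzero[group]
            for gene, value in enumerate(row):
                if value > 0:
                    counts[gene] += 1
    return {
        group: [100.0 * c / cells[group] for c in nonzero[group]]
        for group in cells
    }
```

Compared with batching over genes, batching over rows reads each cell once, which suits row-major (CSR) storage on disk.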

☁️ Upload Enhancements

`vars_h5` max upload size increased from 10 GB to 50 GB.
Upload progress now uses `tqdm` progress bars when available.
Default connect timeout increased from 30 s to 60 s.

📈 Progress Reporting

`tqdm` progress bars added throughout: `rank_genes_groups`, subsampling, raw counts writing, normalized counts writing, and chunk uploads.

🐛 Bug Fixes / Error Handling

🆕 New `ClientDisconnectedError` exception for HTTP 499 / `CLIENT_DISCONNECTED` responses.
🗑️ Removed stale `hasattr(adata.var, gene_symbols_col)` check (was always True for DataFrames).
🛡️ Raw counts write failures are caught gracefully: the `raw` group is cleaned up and skipped with a warning.

⚠️ Breaking Changes

`gene_symbols_column` default changed from `"gene_symbols"` to `None` (auto-detect).
`vars_h5_path` and `obs_duckdb_path` moved from `run()` to `CyteType.__init__()`.
`cleanup_artifacts` parameter removed from `run()`; use the `cleanup()` method instead.
`pcent_batch_size` default changed from 2000 to 5000.
`batch_size` parameter in `aggregate_expression_percentages` renamed to `cell_batch_size`.


23 Feb 23:05

parashardhapola

0.17.0

What's Changed

Obsm in duckdb by @parashardhapola in #66

Full Changelog: 0.16.1...0.17.0

Contributors

parashardhapola


20 Feb 22:09

parashardhapola

0.16.1

What's Changed

Retry chunk uploads and allow partial artifacts by @parashardhapola in #64

Full Changelog: 0.16.0...0.16.1

Contributors

parashardhapola


19 Feb 14:07

parashardhapola

0.16.0

What's Changed

Support parallel chunked uploads and bump version by @parashardhapola in #63

Full Changelog: 0.15.0...0.16.0

Contributors

parashardhapola
