Releases: NygenAnalytics/CyteType
0.19.4
What's Changed
- Fix NaN values in adata.obs by @suu-yi in #74
- Gene symbols patch by @suu-yi in #76
- logging improvements and version bump by @parashardhapola in #79
Full Changelog: 0.19.3...0.19.4
0.19.3
What's Changed
- Update version to 0.19.3 and enhance gene symbol handling by @parashardhapola in #72
Full Changelog: 0.19.2...0.19.3
0.19.2
What's Changed
- Update version to 0.19.2 and enhance metadata handling in CyteType by @parashardhapola in #71
Full Changelog: 0.19.1...0.19.2
0.19.1
What's Changed
- Update version to 0.19.1 and improve file upload initiation by @parashardhapola in #70
Full Changelog: 0.19.0...0.19.1
0.19.0
What's Changed
- Update version to 0.19.0 and enhance file upload functionality by @parashardhapola in #69
Full Changelog: 0.18.1...0.19.0
0.18.1
What's Changed
- Update version to 0.18.1 and enhance public API formatting by @parashardhapola in #68
Full Changelog: 0.18.0...0.18.1
0.18.0
✨ What's New
🧬 Raw Counts in vars.h5 Artifact
- The `save_features_matrix` function now writes an optional `raw` group to the H5 artifact containing integer raw counts (LZ4-compressed CSR). `CyteType.__init__` auto-resolves raw counts from `adata.layers['counts']`, `adata.raw.X`, or `adata.X` (if integer-valued), and embeds them alongside normalized counts.
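The resolution order above can be sketched as follows. This is a minimal illustration, not CyteType's implementation: `resolve_raw_counts` and `is_integer_valued` are hypothetical helpers, and a `SimpleNamespace` holding nested lists stands in for an AnnData object.

```python
from types import SimpleNamespace


def is_integer_valued(matrix):
    """Return True if every entry of a list-of-lists matrix is a whole number."""
    return all(float(v).is_integer() for row in matrix for v in row)


def resolve_raw_counts(adata):
    """Pick a raw-count source in the order the notes describe.

    Hypothetical helper: returns (source_name, matrix), or (None, None)
    when no candidate qualifies.
    """
    layers = getattr(adata, "layers", None) or {}
    if "counts" in layers:
        return "layers['counts']", layers["counts"]
    raw = getattr(adata, "raw", None)
    if raw is not None and getattr(raw, "X", None) is not None:
        return "raw.X", raw.X
    # Only adata.X is checked for integer values, matching the wording above.
    if is_integer_valued(adata.X):
        return "X", adata.X
    return None, None


# Stand-in object with normalized (non-integer) X and no other source.
normalized_only = SimpleNamespace(layers={}, raw=None, X=[[0.5, 1.2]])
print(resolve_raw_counts(normalized_only))
```

In this sketch a dataset with only normalized values yields no raw counts, which would correspond to the optional `raw` group being left out.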
🚀 rank_genes_groups_backed — Memory-Efficient Differential Expression
- New public function: a drop-in replacement for `sc.tl.rank_genes_groups` that works on backed/on-disk `_CSRDataset` matrices.
- Streams cell chunks in a single pass, computes Welch's t-test (one-vs-rest) with BH or Bonferroni correction, and writes scanpy-compatible output to `adata.uns`.
- Exported at `cytetype.rank_genes_groups_backed`.
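The statistics named above can be illustrated for a single gene in plain Python. This is a sketch under stated assumptions, not the library's code: `accumulate`, `welch_t_one_vs_rest`, and `benjamini_hochberg` are hypothetical names, and converting the t statistic to a p-value is left out because the standard library has no t-distribution CDF.

```python
import math


def accumulate(chunks, n_groups):
    """Single pass over (values, labels) chunks for one gene.

    Keeps only the per-group count, sum, and sum of squares.
    """
    n = [0] * n_groups
    s = [0.0] * n_groups
    ss = [0.0] * n_groups
    for values, labels in chunks:
        for v, g in zip(values, labels):
            n[g] += 1
            s[g] += v
            ss[g] += v * v
    return n, s, ss


def welch_t_one_vs_rest(n, s, ss, group):
    """Welch's t statistic for `group` against all other cells pooled."""
    n1, s1, ss1 = n[group], s[group], ss[group]
    n2, s2, ss2 = sum(n) - n1, sum(s) - s1, sum(ss) - ss1
    m1, m2 = s1 / n1, s2 / n2
    # Unbiased sample variances recovered from the running sums.
    v1 = (ss1 - n1 * m1 * m1) / (n1 - 1)
    v2 = (ss2 - n2 * m2 * m2) / (n2 - 1)
    return (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)


def benjamini_hochberg(pvals):
    """BH-adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Because only counts, sums, and sums of squares are kept per group, each chunk can be discarded as soon as it has been read, which is what makes a single streaming pass sufficient for the test statistic.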
✂️ subsample_by_group — Per-Group Cell Subsampling
- New preprocessing utility that caps each cluster to a configurable maximum number of cells (`max_cells_per_group`), keeping smaller groups intact.
- Works with both in-memory and backed AnnData objects.
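The capping behavior can be sketched on a plain list of cluster labels. `subsample_indices` and its `seed` argument are hypothetical and only illustrate the idea; the actual signature of `subsample_by_group` is not given in these notes.

```python
import random
from collections import defaultdict


def subsample_indices(labels, max_cells_per_group, seed=0):
    """Return sorted cell indices with each group capped at max_cells_per_group."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for idx, label in enumerate(labels):
        by_group[label].append(idx)

    keep = []
    for members in by_group.values():
        # Groups at or below the cap are kept intact.
        if len(members) > max_cells_per_group:
            members = rng.sample(members, max_cells_per_group)
        keep.extend(members)
    return sorted(keep)


labels = ["a"] * 10 + ["b"] * 3
kept = subsample_indices(labels, max_cells_per_group=5)
print(len(kept))
```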
🔍 Auto-Detection of Gene Symbols Column
- `gene_symbols_column` now defaults to `None` and auto-detects by checking well-known column names (`feature_name`, `gene_symbols`, etc.), then `adata.var_names`, then a heuristic scan of all var columns.
- Detects and skips composite gene values (e.g., `TSPAN6_ENSG00000000003`).
- Candidates are scored by ID-like percentage, uniqueness ratio, and priority; the best non-ID column wins.
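A rough sketch of this kind of scoring follows. The regular expressions, the 0.5 cutoff, and the function names are all assumptions made for illustration; the notes do not spell out the library's actual rules.

```python
import re

# Assumed patterns: Ensembl-style gene IDs and "SYMBOL_ID" composites.
ID_LIKE = re.compile(r"^ENS[A-Z]*G\d{6,}(\.\d+)?$")
COMPOSITE = re.compile(r"^.+_ENS[A-Z]*G\d{6,}(\.\d+)?$")


def score_column(values):
    """Fraction of ID-like values, composite values, and unique values."""
    n = len(values)
    return {
        "id_like": sum(bool(ID_LIKE.match(v)) for v in values) / n,
        "composite": sum(bool(COMPOSITE.match(v)) for v in values) / n,
        "unique": len(set(values)) / n,
    }


def pick_symbol_column(columns, preferred=("feature_name", "gene_symbols")):
    """Return the best-scoring non-ID column name, or None if none qualifies."""
    best_name, best_key = None, None
    for name, values in columns.items():
        s = score_column(values)
        # Skip columns dominated by IDs or composite values (assumed cutoff).
        if s["id_like"] > 0.5 or s["composite"] > 0.5:
            continue
        priority = 1 if name in preferred else 0
        key = (priority, s["unique"], -s["id_like"])
        if best_key is None or key > best_key:
            best_name, best_key = name, key
    return best_name


columns = {
    "gene_ids": ["ENSG00000000003", "ENSG00000000005", "ENSG00000000419"],
    "combined": ["TSPAN6_ENSG00000000003", "TNMD_ENSG00000000005", "DPM1_ENSG00000000419"],
    "feature_name": ["TSPAN6", "TNMD", "DPM1"],
}
print(pick_symbol_column(columns))
```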
🎨 marker_dotplot — Category-Grouped Dot Plot
- New plotting module (`cytetype.plotting`) with `marker_dotplot` that reads stored CyteType results and creates a scanpy dotplot grouped by cluster categories with top supporting marker genes.
⚡ Improvements
🏗️ Artifact Pipeline Restructuring
- Artifacts (`vars.h5`, `obs.duckdb`) are now built during `__init__` and uploaded during `run()`, decoupling build from upload.
- `vars_h5_path` and `obs_duckdb_path` moved from `run()` to `__init__()` parameters.
- New `cleanup()` method replaces the removed `cleanup_artifacts` parameter on `run()`.
💾 CSR-Backed Write Path for Normalized Counts
- New two-pass column-group scatter algorithm (`_write_csc_via_row_batches`) converts CSR-backed data to CSC in the H5 file without loading the full matrix.
- Configurable memory budget via `WRITE_MEM_BUDGET` (default 4 GB).
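One way a row-batched CSR-to-CSC conversion can work is sketched below in plain Python. It is only one reading of "two-pass column-group scatter", not the code in `_write_csc_via_row_batches`: the function name, the `row_batches` callable, and `cols_per_group` are hypothetical, and the output lists stand in for datasets that would live in the H5 file.

```python
def csr_to_csc_in_batches(row_batches, n_cols, cols_per_group):
    """Convert streamed CSR row batches to CSC arrays.

    row_batches: callable returning a fresh iterator of (indptr, indices, data)
    CSR batches, with rows in their original order.
    """
    # Counting pass: nonzeros per column give the CSC indptr.
    counts = [0] * n_cols
    for _, indices, _ in row_batches():
        for c in indices:
            counts[c] += 1
    csc_indptr = [0]
    for c in counts:
        csc_indptr.append(csc_indptr[-1] + c)

    csc_indices = [0] * csc_indptr[-1]
    csc_data = [0] * csc_indptr[-1]

    # Scatter pass: handle one column group at a time, re-streaming the rows.
    for start in range(0, n_cols, cols_per_group):
        stop = min(start + cols_per_group, n_cols)
        cursor = csc_indptr[start:stop]  # next write slot for each column in the group
        row_offset = 0
        for indptr, indices, data in row_batches():
            for r in range(len(indptr) - 1):
                for k in range(indptr[r], indptr[r + 1]):
                    c = indices[k]
                    if start <= c < stop:
                        pos = cursor[c - start]
                        csc_indices[pos] = row_offset + r
                        csc_data[pos] = data[k]
                        cursor[c - start] += 1
            row_offset += len(indptr) - 1
    return csc_indptr, csc_indices, csc_data
```

In this sketch the group width trades memory for repeated reads: narrower column groups keep less state per scatter step but stream the row batches more times.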
📊 Expression Percentage Calculation
- Refactored to use single-pass row-batched accumulation (reuses `_accumulate_group_stats`) instead of gene-batched pandas groupby.
- Default `pcent_batch_size` increased from 2000 to 5000.
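Single-pass row-batched accumulation can be illustrated as below. The function name and the dense list-of-lists input are assumptions for the sketch; it does not reproduce `_accumulate_group_stats`.

```python
def expression_percentages(row_batches, n_genes):
    """Percent of cells per group with a nonzero value for each gene.

    row_batches yields (rows, labels); each row is a dense list of n_genes values.
    """
    n_cells = {}
    n_expressing = {}
    for rows, labels in row_batches:
        for row, group in zip(rows, labels):
            n_cells[group] = n_cells.get(group, 0) + 1
            counts = n_expressing.setdefault(group, [0] * n_genes)
            for j, value in enumerate(row):
                if value > 0:
                    counts[j] += 1
    return {
        group: [100.0 * c / n_cells[group] for c in n_expressing[group]]
        for group in n_cells
    }


batches = [
    ([[1, 0], [0, 0]], ["a", "a"]),
    ([[2, 3], [0, 1]], ["a", "b"]),
]
print(expression_percentages(batches, n_genes=2))
```

Each batch is visited once and only per-group counters are retained, so memory use depends on the number of groups and genes rather than the number of cells.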
☁️ Upload Enhancements
- `vars_h5` max upload size increased from 10 GB to 50 GB.
- Upload progress now uses `tqdm` progress bars when available.
- Default connect timeout increased from 30s to 60s.
📈 Progress Reporting
- tqdm progress bars added throughout: rank_genes_groups, subsampling, raw counts writing, normalized counts writing, and chunk uploads.
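A common pattern for treating `tqdm` as an optional dependency is shown below. The `with_progress` helper is hypothetical and is not taken from the CyteType codebase.

```python
try:
    from tqdm import tqdm  # optional dependency
except ImportError:
    tqdm = None


def with_progress(iterable, desc=None):
    """Wrap iterable in a tqdm bar when tqdm is installed, else return it unchanged."""
    if tqdm is None:
        return iterable
    return tqdm(iterable, desc=desc)


total = 0
for chunk in with_progress(range(5), desc="chunks"):
    total += chunk
```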
🐛 Bug Fixes / Error Handling
- 🆕 New `ClientDisconnectedError` exception for HTTP 499 / `CLIENT_DISCONNECTED` responses.
- 🗑️ Removed stale `hasattr(adata.var, gene_symbols_col)` check (was always True for DataFrames).
- 🛡️ Raw counts write failures are caught gracefully: the `raw` group is cleaned up and skipped with a warning.
⚠️ Breaking Changes
- `gene_symbols_column` default changed from `"gene_symbols"` to `None` (auto-detect).
- `vars_h5_path` and `obs_duckdb_path` moved from `run()` to `CyteType.__init__()`.
- `cleanup_artifacts` parameter removed from `run()`; use `cleanup()` method instead.
- `pcent_batch_size` default changed from 2000 to 5000.
- `batch_size` parameter in `aggregate_expression_percentages` renamed to `cell_batch_size`.
0.17.0
0.16.1
What's Changed
- Retry chunk uploads and allow partial artifacts by @parashardhapola in #64
Full Changelog: 0.16.0...0.16.1
0.16.0
What's Changed
- Support parallel chunked uploads and bump version by @parashardhapola in #63
Full Changelog: 0.15.0...0.16.0