Skip to content

Compute DARs in the V3 CLI way#227

Open
DadaAb wants to merge 2 commits intopycistopic_v3from
v3_diff_features
Open

Compute DARs in the V3 CLI way#227
DadaAb wants to merge 2 commits intopycistopic_v3from
v3_diff_features

Conversation

@DadaAb
Copy link
Copy Markdown
Collaborator

@DadaAb DadaAb commented Apr 23, 2026

Add dars CLI subcommand: differentially accessible regions from topic models

Adds a new pycistopic dars command with two subcommands that wrap pycisTopic.diff_features functionality in a CLI suitable for batch/HPC use.

compute_hv_regions

Computes highly variable regions from region-topic and cell-topic H5AD files and writes them to a BED file. Optionally saves the mean-vs-dispersion plot.

compute_dars

Computes differentially accessible regions per contrast. Supports:

  • Flexible contrast specification via --contrasts-tsv: a two-column TSV with foreground and background (comma-separated annotations). If --contrasts-tsv is omitted, every unique annotation is run 1-vs-all-others. If background is empty for a row, all cells not in the foreground are used as background.
  • Reusing precomputed HV regions via --hv-regions-bed, which skips the (slow) HV computation step entirely.
  • Per-contrast HV pre-filtering to handle the ValueError: "highly_variable_regions" contains regions that are never accessible in any cell case, which happens when globally-HV regions turn out to be silent in a specific contrast's cell subset (common for rare cell types). Instead of aborting, we drop the offending regions and report how many.
  • Incremental, resumable output: each contrast's BED file is written as soon as it completes. A crash mid-run preserves all already-finished contrasts. Re-running the command automatically skips contrasts whose BED already exists — rm a file to force recomputation.
  • Configurable thresholds: --adjusted-pvalue-threshold, --log2-fold-change-threshold, --scale-factor1, --regions-chunk-size.

Example usage

Compute highly variable regions once and save a plot:

pycistopic dars compute_hv_regions \
  --region-topic-h5ad mm_ctx_dev_e13_to_p56_300_iter_.100_topics_region_topic_adata.h5ad \
  --cell-topic-h5ad mm_ctx_dev_e13_to_p56_300_iter_.100_topics_cell_topic_adata.h5ad \
  --output-bed hv_regions_v3.bed \
  --plot hv_dispersion.png

1-vs-all DARs for every cell type (no contrasts TSV → each annotation vs all others):

pycistopic dars compute_dars \
  --region-topic-h5ad mm_ctx_dev_e13_to_p56_300_iter_.100_topics_region_topic_adata.h5ad \
  --cell-topic-h5ad mm_ctx_dev_e13_to_p56_300_iter_.100_topics_cell_topic_adata.h5ad \
  --cell-data cell_data.tsv \
  --output-dir dars_dir/

Custom contrasts with reused HV regions:

pycistopic dars compute_dars \
  --region-topic-h5ad mm_ctx_dev_e13_to_p56_300_iter_.100_topics_region_topic_adata.h5ad \
  --cell-topic-h5ad mm_ctx_dev_e13_to_p56_300_iter_.100_topics_cell_topic_adata.h5ad \
  --cell-data cell_data.tsv \
  --hv-regions-bed hv_regions_v3.bed \
  --contrasts-tsv contrasts_example.tsv \
  --output-dir dars_dir_test/

Example contrasts_example.tsv:

foreground	background
Astro	
CR	
PVALB,SST,SST_chodl,Vip,Lamp5,Sncg	
PVALB,SST,SST_chodl	Vip,Lamp5,Sncg
L23_bin3,L23_bin4,L23_bin5,L23_bin6	L5PT,L5NP,L5IT_bin3,L5IT_bin4,L5IT_bin5,L5IT_bin6
Oligo_mature	Oligo_dev
RGC_1,RGC_2	L4,L23_bin4,L6CT_bin5,PVALB,Astro,Oligo_mature

Validation

Tested on a mouse cortex dataset: 807,231 regions × 269,070 cells, 100-topic model (E13.5 to P56). HV regions (62,580) output is byte-identical to an earlier reference run.

Seven contrasts covering a mix of common and edge cases (1-vs-all, group-vs-group, rare cell types, close-related pairs, developmental contrasts) all completed in ~2 minutes 11 seconds after HV-region loading:

Contrast                                      FG cells   BG cells     DARs
------------------------------------------   ---------  ---------  -------
Astro vs others                                  7,506    113,537   18,023
CR vs others (rare cell type: 337 cells)           337    120,706    6,572
All interneurons vs others                      12,526    108,517    5,557
MGE-derived vs CGE-derived interneurons          9,087      3,439   12,344
L2/3 layers vs L5 layers                        14,799      9,386    7,422
Oligo_mature vs Oligo_dev                        5,436      1,476   12,040
RGCs vs curated mature panel                     7,494     45,303   15,085

Final summary line:

DARs computation completed. 7 newly computed, 0 skipped (already existed),
0 empty (no HV regions left after filtering), 7 total contrasts.

The pre-filter safeguard didn't need to drop any regions in this test (the HV regions happened to all be accessible in every contrast, including the 337-cell CR contrast) but the code path is in place to handle it gracefully if it does occur.

Output format

Each contrast produces a 3-column BED file (chrom, start, end) named DARs_<contrast_name>.bed in the --output-dir. Region names are parsed from the chr:start-end format used in the region-topic matrix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant