Compute DARs in the V3 CLI way #227
Open
DadaAb wants to merge 2 commits into pycistopic_v3 from
Add `dars` CLI subcommand: differentially accessible regions from topic models
Adds a new `pycistopic dars` command with two subcommands that wrap `pycisTopic.diff_features` functionality in a CLI suitable for batch/HPC use.
compute_hv_regions
Computes highly variable (HV) regions from region-topic and cell-topic H5AD files and writes them to a BED file. Optionally saves the mean-vs-dispersion plot.
compute_dars
Computes differentially accessible regions (DARs) per contrast. Supports:
- `--contrasts-tsv`: a two-column TSV with `foreground` and `background` (comma-separated annotations). If `--contrasts-tsv` is omitted, every unique annotation is run 1-vs-all-others. If `background` is empty for a row, all cells not in the foreground are used as background.
- `--hv-regions-bed`, which skips the (slow) HV computation step entirely.
- The `ValueError: "highly_variable_regions" contains regions that are never accessible in any cell` case, which happens when globally HV regions turn out to be silent in a specific contrast's cell subset (common for rare cell types). Instead of aborting, we drop the offending regions and report how many.
- `rm` a file to force recomputation.
- `--adjusted-pvalue-threshold`, `--log2-fold-change-threshold`, `--scale-factor1`, `--regions-chunk-size`.
Example usage
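As a rough illustration of the contrast rules described for `--contrasts-tsv` (this is a sketch, not the code in this PR), the logic could look like the following in Python. The function names, the assumption that the TSV has a header row naming the `foreground` and `background` columns, and the barcode-to-annotation dictionary are all hypothetical.

```python
import csv
import io


def parse_contrasts(tsv_text):
    """Parse a two-column contrasts TSV into (foreground, background) pairs.

    Assumption: the TSV has a header row with `foreground` and `background`.
    An empty background is returned as None, meaning "all other cells".
    """
    contrasts = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        fg = [a.strip() for a in row["foreground"].split(",") if a.strip()]
        bg_raw = (row.get("background") or "").strip()
        bg = [a.strip() for a in bg_raw.split(",") if a.strip()] or None
        contrasts.append((fg, bg))
    return contrasts


def one_vs_all(annotations):
    """Default when no TSV is given: each unique annotation vs all others."""
    return [([a], None) for a in sorted(set(annotations))]


def split_cells(cell_annotations, foreground, background=None):
    """Return (foreground_cells, background_cells) for one contrast.

    `cell_annotations` maps a cell barcode to its annotation label.
    """
    fg_cells = [c for c, a in cell_annotations.items() if a in foreground]
    if background is None:
        # Empty background: every cell that is not in the foreground.
        bg_cells = [c for c, a in cell_annotations.items() if a not in foreground]
    else:
        bg_cells = [c for c, a in cell_annotations.items() if a in background]
    return fg_cells, bg_cells
```

Representing an empty background as `None` keeps "all other cells" distinct from an explicit list of background annotations, so both cases go through the same `split_cells` call.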
Compute highly variable regions once and save a plot:
1-vs-all DARs for every cell type (no contrasts TSV → each annotation vs all others):
Custom contrasts with reused HV regions:
Example `contrasts_example.tsv`:
Validation
Tested on a mouse cortex dataset (E13.5 to P56): 807,231 regions × 269,070 cells, 100-topic model. The HV-regions output (62,580 regions) is byte-identical to an earlier reference run.
Seven contrasts covering a mix of common and edge cases (1-vs-all, group-vs-group, rare cell types, closely related pairs, developmental contrasts) all completed in ~2 minutes 11 seconds after HV-region loading:
Final summary line:
The pre-filter safeguard didn't need to drop any regions in this test (the HV regions happened to all be accessible in every contrast, including the 337-cell `CR` contrast), but the code path is in place to handle it gracefully if it does occur.
Output format
Each contrast produces a 3-column BED file (`chrom`, `start`, `end`) named `DARs_<contrast_name>.bed` in the `--output-dir`. Region names are parsed from the `chr:start-end` format used in the region-topic matrix.
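A minimal sketch of how `chr:start-end` region names could be turned into the 3-column BED rows described above. The helper names and the regular expression are assumptions made for illustration; the PR's own parsing code may differ.

```python
import re

# Greedy chromosome group so contig names that contain ':' still parse.
_REGION_RE = re.compile(r"^(?P<chrom>.+):(?P<start>\d+)-(?P<end>\d+)$")


def region_to_bed_fields(region_name):
    """Split a 'chr:start-end' region name into (chrom, start, end)."""
    match = _REGION_RE.match(region_name)
    if match is None:
        raise ValueError(f"Not a chr:start-end region name: {region_name!r}")
    return match["chrom"], int(match["start"]), int(match["end"])


def regions_to_bed_text(region_names):
    """Render region names as tab-separated 3-column BED lines."""
    lines = []
    for name in region_names:
        chrom, start, end = region_to_bed_fields(name)
        lines.append(f"{chrom}\t{start}\t{end}\n")
    return "".join(lines)


def bed_filename(contrast_name):
    """Output file name for one contrast, following DARs_<contrast_name>.bed."""
    return f"DARs_{contrast_name}.bed"
```

Raising on a malformed name, rather than skipping it, surfaces a mismatch between the region-topic matrix and the expected naming format early.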