feat: allow exponentiation post agg for log-fold-change in rank_genes_groups#4037
feat: allow exponentiation post agg for log-fold-change in rank_genes_groups#4037
rank_genes_groups#4037Conversation
rank_genes_groups
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4037 +/- ##
==========================================
+ Coverage 78.58% 78.63% +0.04%
==========================================
Files 118 118
Lines 12753 12764 +11
==========================================
+ Hits 10022 10037 +15
+ Misses 2731 2727 -4
Flags with carried forward coverage won't be shown. Click here to find out more.
|
There was a problem hiding this comment.
Regarding 1: I didn’t look at the math, but I assume to get both accurate logfoldchanges and correct t-test results, the t-test needs the means of the data, and the logfoldchanges need the means of e^data. So we’d have to calculate means twice I think. If @eroell doesn’t chime in I’ll re-learn the math …
Regarding 3: maybe lfc_estim_fast or so?
I think it would be confusing if the option only applied to one of them, but of course I don’t know what people expect. |
|
That's a pretty good reason! Nothing in the issue is focused on wilcoxon, I guess! |
Looking back at things, I think this setting was created for a time when the code looked very different. It is only faster for |
|
I think you’re right, but |
|
No changes in benchmarks. Comparison: https://github.com/scverse/scanpy/compare/063262b7c2f6f7e6ca675e5efc2603e21d22df22..39712198c255e2a603c2dcfdcc8a7279f7d410e4 More details: https://github.com/scverse/scanpy/pull/4037/checks?check_run_id=71803666404 |
|
huh, I didn’t expect it to have zero impact. OK then, let’s see what @eroell says |
I think the
How about |
|
Also, a side comment on my understanding of the issue: my agreement with the issue there is that I also consider a log fold change to compute the mean of a group in "count space". Since data seems to lie around generally in log space prior to this function call, the issue mentions From this point of view comes also my suggestion for the argument name of calling the current version log space. For the tests, if a Sorry for the slow response to this issue, thank you so much for taking it up! |
| corr_method: _CorrMethod = "benjamini-hochberg", | ||
| tie_correct: bool = False, | ||
| layer: str | None = None, | ||
| exp_post_agg: bool | Default = Default( |
|
Regarding the unchanged benchmark, of which I did not get into the weeds; Does this test numerically computed logfoldchange values, or rather rankings of genes generated by them? |
Only speed
I like this! thanks for the reply! |
My three questions here:
illicoto compare? I added a very simple test.exp_post_aggmake sense?exp_post_mean?