feat: scaffolding, caching, EGFR by tristan-f-r · Pull Request #65 · Reed-CompBio/spras-benchmarking

tristan-f-r · 2026-03-18T03:20:16Z

We bundle EGFR along with the rest of the caching infrastructure. Notes:

All motivation for the caching system lives under cache/README.md.
We removed pra.yaml for now, as the only PRAs are the synthetic data and the ResponseNet data, and soon the DepMap data.
The CONTRIBUTING.md file is in Changes to CONTRIBUTING guide #57.
directory.py contains unnecessary files from other datasets that were deemed universal.

ntalluri

I did a light review of the PR; I did not look too hard at the code itself yet. I was mostly gathering ideas about what was happening from the READMEs.

tristan-f-r · 2026-04-21T01:10:01Z

I implemented the approach to versioning above, which I'll document, but effectively:

If a person wants to just use 'STRING', they say 'STRING/latest' and it will point to STRING v12
If a dataset wants to use a specific version of STRING, they say 'STRING/v12'

This is naively implemented by allowing strings in directory that point to other keys.
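A minimal sketch of the aliasing described above, assuming the directory is a plain Python dict. The name `DIRECTORY`, the `resolve` helper, and the example URL are hypothetical stand-ins for illustration, not the actual contents of directory.py:

```python
# Hypothetical stand-in for the mapping in directory.py.
# The keys mirror the examples in this comment; the URL is a placeholder.
DIRECTORY = {
    "STRING/v12": {"url": "https://example.org/string-v12.txt.gz"},
    # A plain string value is an alias that points at another key.
    "STRING/latest": "STRING/v12",
}


def resolve(key: str, directory: dict = DIRECTORY) -> dict:
    """Follow string aliases until a concrete (non-string) entry is reached."""
    seen = set()
    while isinstance(directory[key], str):
        if key in seen:
            # Guard against aliases that point back at each other.
            raise ValueError(f"alias cycle detected at {key!r}")
        seen.add(key)
        key = directory[key]
    return directory[key]


# Both spellings end up at the same concrete entry.
entry = resolve("STRING/latest")
```

Under this sketch, moving `STRING/latest` to a newer release would be a one-line change to the alias, while datasets that name `STRING/v12` keep their version.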

ntalluri

Here is another review of this PR.

Main issues:

1. Why are we still erroring when pinned != cached? I thought we agreed at the last meeting that pinned != cached would be a warning.
2. When the error above occurs, how is a user supposed to fix it so that they can download the data and get rid of the error?
3. Having data in both directory.py and the dataset-specific Snakefiles doesn't make sense (and having the cached files in two places is also confusing). We need to decide to put everything either in directory.py or in the dataset-specific Snakefiles, and have the contributing guide reflect this.
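To make the first point concrete, here is a rough sketch of what "warn instead of error" could look like. It assumes that the pinned and cached copies are compared by SHA-256 checksum, which is an assumption of this sketch and not something stated in the PR; the function names are hypothetical as well:

```python
import hashlib
import warnings
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex digest of the file contents (the comparison method is an assumption)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def check_pinned_matches_cached(pinned: Path, cached: Path) -> bool:
    """Return True if the two files match; warn, rather than raise, if they differ."""
    if sha256_of(pinned) == sha256_of(cached):
        return True
    warnings.warn(
        f"pinned file {pinned} differs from cached file {cached}; "
        "continuing with the download anyway",
        stacklevel=2,
    )
    return False
```

With a warning, a mismatch is still visible in the logs but does not block the user from downloading the data.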

How is a user supposed to know when their data is supposed to be used in multiple other datasets? That is a lot to expect them to know.

Why is there no config for the egfr dataset collection?

ntalluri · 2026-04-29T17:07:49Z

Also not sure why the workflow is breaking.

tristan-f-r · 2026-04-29T17:58:33Z

~~As for (1) and (2), we did decide this! I don't know if I didn't push it or what, but I did completely intend to make that not an error.~~ I forgot to remove the move/copy flag 👍 this has been removed in the non-styling commit after this comment.

tristan-f-r · 2026-04-30T01:06:34Z

My two notes:

How important is it that the prize value of 10 stays 10? (e.g. does 1000 work just as well?) The methodology described in the paper says only that it needs to be a value greater than every other prize.
Unless (*) we plan to have different algorithm parameter inputs for different dataset collections, I disagree with the paper's methodology of one config per dataset collection; I think it is more straightforward to continue as is. If (*) is not true, separate configs directly contradict one of SPRAS's usability goals: running several datasets on several algorithms.
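As an illustration of the first note, the only constraint quoted from the paper is that the value be greater than every other prize. A small sketch of that check follows; the node names and prize values are made up for the example and are not taken from the EGFR dataset:

```python
def exceeds_all_other_prizes(prizes: dict[str, float], node: str) -> bool:
    """True if `node`'s prize is strictly greater than every other node's prize."""
    others = [value for name, value in prizes.items() if name != node]
    return all(prizes[node] > value for value in others)


# Hypothetical prizes; "PINNED_NODE" stands in for whichever node receives the fixed value.
with_10 = {"PINNED_NODE": 10.0, "A": 3.2, "B": 7.9}
with_1000 = {"PINNED_NODE": 1000.0, "A": 3.2, "B": 7.9}

# Both values satisfy the stated constraint for these made-up prizes.
ok_10 = exceeds_all_other_prizes(with_10, "PINNED_NODE")
ok_1000 = exceeds_all_other_prizes(with_1000, "PINNED_NODE")
```

This check says nothing about whether algorithms behave the same for 10 and 1000, which is the open question in the note.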

ntalluri · 2026-04-30T16:56:26Z

Based on meeting:

update to have datasets fetch configs either in directory.py or in a dataset-specific Snakemake file (one or the other, not both)

this idea will be updated in the contributing guide as well

keeping the value of 10 for the prizes for egfr

hard to justify but that's just the plan for now

we will have 4 separate dataset collection configs (we can also keep the test-config too if needed).

chore: drop other datasets

b49439e

tristan-f-r added the enhancement label (New feature or request) Mar 18, 2026

tristan-f-r added 2 commits March 17, 2026 20:36

Merge branch 'main' into egfr-and-infrastructure

2018a13

chore: re-include

136e5ff

tristan-f-r mentioned this pull request Mar 18, 2026

Changes to CONTRIBUTING guide #57

Draft

tristan-f-r added 2 commits March 17, 2026 20:42

chore: drop tools

472468d

not needed just yet

chore: re-add tools

a5de971

This was referenced Mar 18, 2026

dataset: DISEASES #66

Open

dataset: yeast osmotic stress #67

Open

dataset: hiv #68

Open

dataset: muscle skeletal (from ResponseNet) #69

Open

dataset: DepMap #70

Draft

tristan-f-r added the dataset label (Mutating datasets in any way.) Mar 18, 2026

tristan-f-r mentioned this pull request Mar 18, 2026

dataset: synthetic from PANTHER #71

Draft

tristan-f-r added 3 commits March 18, 2026 05:53

docs: cache

8ddccb4

style: fmt

90cc277

docs: on caching

eb23b8f

tristan-f-r mentioned this pull request Mar 18, 2026

chore: delete [temporarily!] #64

Merged

tristan-f-r changed the title ~~feat: initial scaffolding, EGFR~~ feat: scaffolding, caching, EGFR Mar 18, 2026

ntalluri reviewed Mar 18, 2026

Comment thread web/public/favicon.svg Outdated

ntalluri reviewed Mar 18, 2026

tristan-f-r and others added 2 commits March 18, 2026 16:54

docs: suggestions from review

4b524bc

Co-authored-by: Neha Talluri <78840540+ntalluri@users.noreply.github.com>

docs: more comments, refactor: mv function out of Snakefile

69fda05

ntalluri reviewed Mar 19, 2026

Comment thread cache/README.md Outdated

ntalluri reviewed Mar 19, 2026

Comment thread cache/README.md Outdated

tristan-f-r added 4 commits March 19, 2026 18:58

docs(datasets): mention responsenet and egfr

15c7ecb

docs(datasets): add old synthetic data branch

729a51b

chore: mv to scores instead of dmmm

922be5d

docs: drop expiration docs

f3d6d41

this is only in github actions

tristan-f-r mentioned this pull request Mar 23, 2026

Hook into loguru to warn for outdated datasets #72

Open

docs: mention v12 in the file name

ef1773b

tristan-f-r added 12 commits April 21, 2026 01:15

docs: mention aliasing

0c6d6f0

feat: uncompressing zip files

c2d17db

test: setup with directory redirects

66de95b

feat: deduplicate edges of output interactome

d4749e7

chore: mv directory test

b5e6332

fix: add init file to cache tests

dbba053

docs: apply suggestions

f8ce2a8

docs: cache

8a51052

fix: designate cache & tools as modules

69de55b

docs: general egfr cmts, fix: make egfr dataset work

df4a16e

feat: actual interactome trimming

a178a2a

style: fmt

5df4a13

tristan-f-r requested a review from ntalluri April 23, 2026 19:23

ntalluri requested changes Apr 29, 2026

tristan-f-r added 7 commits April 29, 2026 18:29

style: fmt

5e13705

fix: address larger notes

b5ddaa4

docs: fix outdated comments, apply some minor suggestions

cd41227

chore: more minor suggestions

38ef456

chore: apply suggestions

b6b958e

refactor: change file name

5f6720c

chore: apply suggestions

8755803

tristan-f-r requested a review from ntalluri April 30, 2026 01:04

fix: set header=False for normalized interactome

c7efcba

refactor: use other prize value

1b60dcb
