feat: scaffolding, caching, EGFR #65
not needed just yet
ntalluri left a comment:
I did a light review of the PR and didn't look too hard at the code itself yet. I was mostly gathering ideas about what was happening from the READMEs.
Co-authored-by: Neha Talluri <78840540+ntalluri@users.noreply.github.com>
I implemented the approach to versioning above, which I'll document, but effectively:
This is naively implemented by allowing strings in
ntalluri left a comment:
Here is another review of this PR.
Main issues:
- Why are we still erroring when pinned != cached? I thought we agreed at the last meeting that pinned != cached would be a warning.
- What happens when the error above occurs? How is a user supposed to fix it so they can download the data and get rid of the error?
- Having data in both directory.py and the dataset-specific Snakefiles doesn't make sense, and having the cached files in two places is confusing. We need to decide whether to put everything in directory.py or in the dataset-specific Snakefiles, and have the contributing guide reflect that.
- How is a user supposed to know when their data will be used by multiple other datasets? That's a lot to expect them to know.
- Why is there no config for the EGFR dataset collection?
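A minimal Python sketch of the warning-instead-of-error behavior raised in the first point. It assumes the pin is a SHA-256 digest of the cached file; the hashing scheme and the names file_sha256 and check_pinned are hypothetical and are not taken from this PR.

```python
import hashlib
import warnings
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_pinned(cached_path: Path, pinned_sha256: str) -> bool:
    """Warn (rather than raise) when the cached file differs from its pin.

    Hypothetical helper: assumes pins are SHA-256 hex digests.
    Returns True when the cached copy matches the pin, False otherwise.
    """
    actual = file_sha256(cached_path)
    if actual != pinned_sha256:
        # A mismatch is reported but does not stop the workflow.
        warnings.warn(
            f"{cached_path} does not match its pinned hash "
            f"(expected {pinned_sha256}, got {actual}); "
            "continuing with the cached copy.",
            stacklevel=2,
        )
        return False
    return True
```

Returning a boolean instead of raising lets the caller decide whether to re-download or keep going, which is the choice the second point asks about.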
Also, I'm not sure why the workflow is breaking.
My two notes:
Based on the meeting:
We bundle EGFR along with the rest of the caching infrastructure. Notes:
- cache/README.md
- .pra.yaml for now, as the only PRAs are the synthetic data and the ResponseNet data, and soon the DepMap data.
- The CONTRIBUTING.md file is in Changes to CONTRIBUTING guide #57.
- directory.py contains unnecessary files from other datasets that were deemed universal.