dataset: synthetic from PANTHER#71
Can you add two configs specific to the PANTHER pathways and how we are using them for the computational performance and pathway accuracy/algorithm similarity assessments?
We might want to consider dataset categories. Separate configs for each dataset would kill parallelism.
| def main(): | ||
| pathways_df = parse_pc_pathways(current_directory / "raw" / "pathways.txt") | ||
| print("Fetching pathways... [This may take some time. On the author's desktop machine, it took 15 minutes.]") |
| print("Fetching pathways... [This may take some time. On the author's desktop machine, it took 15 minutes.]") | |
| print("Fetching pathways... [This may take some time; around 15 minutes.]") |
Can you add an overview comment on what is happening in this code?
| human_receptors = human_receptors[["NODE", "uniprot"]] | ||
| human_receptors.to_csv(folder / "sources.txt", sep="\t", index=False) | ||
| # Finally, scores |
| # Finally, scores | |
| # Finally, scores and actives |
| # Finally, scores | ||
| scores = pd.concat([human_tfs, human_receptors]).drop_duplicates() | ||
| scores["prizes"] = 1 |
| scores["prizes"] = 1 | |
| scores["prizes"] = 1.0 |
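A minimal sketch of why the `1.0` literal matters, using made-up node names and UniProt IDs rather than the PR's real data: assigning the integer `1` gives the column an `int64` dtype, while `1.0` gives `float64`, which matches how prizes are described elsewhere in this diff ("1.0 prizes").

```python
import pandas as pd

# Toy stand-ins for the PR's human_tfs / human_receptors frames (values are illustrative).
human_tfs = pd.DataFrame({"NODE": ["TF1", "TF2"], "uniprot": ["P1", "P2"]})
human_receptors = pd.DataFrame({"NODE": ["R1", "TF2"], "uniprot": ["P3", "P2"]})

# Same steps as the diff: combine, drop exact duplicate rows, assign a uniform prize.
scores = pd.concat([human_tfs, human_receptors]).drop_duplicates()
scores["prizes"] = 1.0  # float literal, so the column dtype is float64 rather than int64

print(scores.dtypes)
```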
| # Then, we need to get the sources and targets, save them, | ||
| # and mark them with 1.0 prizes: | ||
| # First, for our targets, or transcription factors |
| # First, for our targets, or transcription factors | |
| # First, for our targets (transcription factors) |
| human_tfs = human_tfs[["NODE", "uniprot"]] | ||
| human_tfs.to_csv(folder / "targets.txt", sep="\t", index=False) | ||
| # Then, for our receptors. NOTE: we skip the first row since it's empty in the XLSX, so this might break if the surfaceome authors fix this. |
| # Then, for our receptors. NOTE: we skip the first row since it's empty in the XLSX, so this might break if the surfaceome authors fix this. | |
| # Then, for our receptors (surfaceomes). NOTE: we skip the first row since it's empty in the XLSX, so this might break if the surfaceome authors fix this. |
Why is this trimming to what is in the gold standard? My plan was to trim all data to what is available in the interactome.
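For reference, trimming to the interactome could look roughly like the sketch below. It assumes the interactome is an edge list with hypothetical columns `Interactor1` and `Interactor2`, and that the node tables carry the `uniprot` column seen in this PR; the helper name `trim_to_interactome` is illustrative and not part of the codebase.

```python
import pandas as pd


def trim_to_interactome(nodes: pd.DataFrame, interactome: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows whose `uniprot` ID appears as an endpoint of some interactome edge."""
    # Column names Interactor1 / Interactor2 are assumptions made for this sketch.
    known = set(interactome["Interactor1"]) | set(interactome["Interactor2"])
    return nodes[nodes["uniprot"].isin(known)].reset_index(drop=True)


# Toy data: P4 is absent from the interactome, so its row is dropped.
interactome = pd.DataFrame({"Interactor1": ["P1", "P2"], "Interactor2": ["P2", "P3"]})
human_tfs = pd.DataFrame({"NODE": ["a", "b", "c"], "uniprot": ["P1", "P4", "P3"]})
trimmed = trim_to_interactome(human_tfs, interactome)
print(trimmed)
```

The same helper would apply unchanged to the receptor table before writing `sources.txt`.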
This is a draft since we don't provide any configs linking to any specific data.