dataset: synthetic from PANTHER by tristan-f-r · Pull Request #71 · Reed-CompBio/spras-benchmarking

tristan-f-r · 2026-03-18T05:41:33Z

This is a draft since we don't provide any configs linking to any specific data.

Blocked by feat: scaffolding, caching, EGFR #65.
Blocked by feat: web #73

not needed just yet

ntalluri

Partly reviewed.

ntalluri · 2026-03-25T18:03:24Z

Can you add two configs specific to the PANTHER pathways, covering how we are using them for the computational performance and pathway accuracy/algorithm similarity assessments?

We might want to consider dataset categories. Separate configs for each dataset would kill parallelism.

ntalluri · 2026-03-25T18:04:25Z

Should be removed by #65.

ntalluri · 2026-03-25T18:06:06Z

+
+def main():
+    pathways_df = parse_pc_pathways(current_directory / "raw" / "pathways.txt")
+    print("Fetching pathways... [This may take some time. On the author's desktop machine, it took 15 minutes.]")


Suggested change

print("Fetching pathways... [This may take some time. On the author's desktop machine, it took 15 minutes.]")

print("Fetching pathways... [This may take some time; around 15 minutes.]")

ntalluri · 2026-03-25T18:06:51Z

Can you add an overview comment explaining what is happening in this code?

ntalluri · 2026-03-25T18:55:19Z

+    human_receptors = human_receptors[["NODE", "uniprot"]]
+    human_receptors.to_csv(folder / "sources.txt", sep="\t", index=False)
+
+    # Finally, scores


Suggested change

# Finally, scores

# Finally, scores and actives

ntalluri · 2026-03-25T18:55:28Z

+
+    # Finally, scores
+    scores = pd.concat([human_tfs, human_receptors]).drop_duplicates()
+    scores["prizes"] = 1


Suggested change

scores["prizes"] = 1

scores["prizes"] = 1.0
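
For context on the suggestion above, here is a minimal sketch of what the `1` versus `1.0` change does to the written file. The two input frames are hypothetical stand-ins for `human_tfs` and `human_receptors`; the node names and accessions are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the human_tfs and human_receptors frames in the diff.
human_tfs = pd.DataFrame({"NODE": ["TF1", "SHARED"], "uniprot": ["P00001", "P00003"]})
human_receptors = pd.DataFrame({"NODE": ["REC1", "SHARED"], "uniprot": ["P00002", "P00003"]})

# Combine both node sets; a node that is both a TF and a receptor is kept once.
scores = pd.concat([human_tfs, human_receptors]).drop_duplicates()

# Assigning 1.0 rather than 1 makes the column float64 instead of int64,
# so the value is serialized as "1.0" in the tab-separated output.
scores["prizes"] = 1.0

print(scores.to_csv(sep="\t", index=False))
```

With an integer `1`, the same column would be `int64` and serialize as `1`, which matters if a downstream reader expects floating-point prizes.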

ntalluri · 2026-03-25T18:55:56Z

+    # Then, we need to get the sources and targets, save them,
+    # and mark them with 1.0 prizes:
+
+    # First, for our targets, or transcription factors


Suggested change

# First, for our targets, or transcription factors

# First, for our targets (transcription factors)

ntalluri · 2026-03-25T18:56:25Z

+    human_tfs = human_tfs[["NODE", "uniprot"]]
+    human_tfs.to_csv(folder / "targets.txt", sep="\t", index=False)
+
+    # Then, for our receptors. NOTE: we skip the first row since it's empty in the XLSX, so this might break if the surfaceome authors fix this.


Suggested change

# Then, for our receptors. NOTE: we skip the first row since it's empty in the XLSX, so this might break if the surfaceome authors fix this.

# Then, for our receptors (surfaceomes). NOTE: we skip the first row since it's empty in the XLSX, so this might break if the surfaceome authors fix this.
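
One possible way to address the fragility called out in the NOTE above is to detect the header row instead of hard-coding the skip. This is only a sketch: `promote_first_nonempty_row` is a hypothetical helper, not something in this PR, and it assumes the sheet is first read without a header (for example with `pd.read_excel(..., header=None)`).

```python
import pandas as pd


def promote_first_nonempty_row(raw: pd.DataFrame) -> pd.DataFrame:
    """Use the first row that is not entirely empty as the header row."""
    nonempty = raw.notna().any(axis=1)
    if not nonempty.any():
        raise ValueError("sheet has no non-empty rows")
    # Position of the first row containing at least one value.
    header_pos = int(nonempty.to_numpy().argmax())
    body = raw.iloc[header_pos + 1:].reset_index(drop=True)
    body.columns = raw.iloc[header_pos].tolist()
    return body


# Hypothetical raw sheets, as they would look when read with header=None.
with_blank = pd.DataFrame([[None, None], ["NODE", "uniprot"], ["REC1", "P00002"]])
without_blank = pd.DataFrame([["NODE", "uniprot"], ["REC1", "P00002"]])

# Both layouts yield the same table, whether or not the empty first row exists.
print(promote_first_nonempty_row(with_blank))
print(promote_first_nonempty_row(without_blank))
```

This would keep working if the surfaceome authors remove the empty row, at the cost of one extra pass over the sheet.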

ntalluri · 2026-04-16T15:21:20Z

Why is this trimming to what is in the gold standard? My plan was to trim all data to what is available in the interactome.
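
To make the alternative in the comment above concrete, here is a rough sketch of trimming a node file to what is available in the interactome. It does not reflect what `trim.py` in this PR does; `trim_to_interactome`, the `source`/`target` column names, and the example data are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical interactome edge list and node file; column names are assumptions.
interactome = pd.DataFrame({"source": ["A", "B"], "target": ["B", "C"]})
sources = pd.DataFrame({"NODE": ["A", "Z"], "uniprot": ["P1", "P9"]})


def trim_to_interactome(nodes: pd.DataFrame, interactome: pd.DataFrame) -> pd.DataFrame:
    """Keep only the nodes that appear as an endpoint of some interactome edge."""
    known = set(interactome["source"]) | set(interactome["target"])
    return nodes[nodes["NODE"].isin(known)].reset_index(drop=True)


trimmed = trim_to_interactome(sources, interactome)
print(trimmed)
```

The same filter could be applied to sources, targets, prizes, and the gold standard so that every file refers only to nodes the algorithms can actually reach.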

tristan-f-r added 6 commits March 18, 2026 03:16

chore: drop other datasets

b49439e

Merge branch 'main' into egfr-and-infrastructure

2018a13

chore: re-include

136e5ff

chore: drop tools

472468d

not needed just yet

chore: re-add tools

a5de971

dataset: synthetic data

50fa813

tristan-f-r added the dataset (Mutating datasets in any way.) and blocked-by-other-pr (For PRs that depend on other PRs.) labels Mar 18, 2026

tristan-f-r marked this pull request as draft March 18, 2026 05:42

tristan-f-r mentioned this pull request Mar 18, 2026

feat: scaffolding, caching, EGFR #65

Open

ntalluri reviewed Mar 19, 2026


Comment thread datasets/egfr/README.md

ntalluri reviewed Mar 26, 2026


fix(tools): re-introduce trim.py

d366409

ntalluri reviewed Apr 16, 2026

tristan-f-r wants to merge 7 commits into main from synthetic-dataset


2 participants
