Open
34 commits
bda6baa
:card_file_box: IO and Split utils
Nov 18, 2025
7108ab3
:sparkles: PPI Dataset: HIGH-PPI + CORUM
Nov 18, 2025
086ef7b
:bug: Creation of unlabeled cells that were meant to be negative + La…
Nov 18, 2025
2487ee7
:waste_basket: Remove unused imports
Nov 18, 2025
c99fb65
:bug: 1D tensor label for edge-level regression
Nov 19, 2025
57728a0
:construction: Prepare cell-level prediction
Nov 19, 2025
92a60c2
:children_crossing: Infer num features
Nov 19, 2025
b705e4b
:technologist: Filter labels to high-ppi edges + infrastructure for …
Nov 19, 2025
f1339b7
:twisted_rightwards_arrows: Merge branch 'ppi-data' into cell-level-p…
Nov 19, 2025
88efcb3
:sparkles: Extend SCCNN to work for arbitrary ranks
Nov 19, 2025
4036d39
:memo: Include edge_task in name to prevent cache conflicts
Nov 19, 2025
586e8e0
:bug: Message passing
Nov 19, 2025
95a9e06
:sparkles: SCCNN cell wrapper using higher cell features and allowing…
Nov 19, 2025
b4a6c75
:sparkles: Cell Readout layer for cell level prediction
Nov 19, 2025
ba34cd1
:technologist: Modify All Cell Encoder for transductive learning
Nov 20, 2025
2410966
:lipstick: More direct dataset passing
Nov 20, 2025
3e0b324
:card_file_box: IO and Split utils
grapentt Nov 18, 2025
8f0ed34
:sparkles: PPI Dataset: HIGH-PPI + CORUM
grapentt Nov 18, 2025
9057e8c
:bug: Creation of unlabeled cells that were meant to be negative + La…
grapentt Nov 18, 2025
9c6047d
:waste_basket: Remove unused imports
grapentt Nov 18, 2025
12fbb1a
:bug: 1D tensor label for edge-level regression
grapentt Nov 19, 2025
95084b3
:construction: Prepare cell-level prediction
grapentt Nov 19, 2025
e3bf31b
:children_crossing: Infer num features
grapentt Nov 19, 2025
e2e63ba
:technologist: Filter labels to high-ppi edges + infrastructure for …
grapentt Nov 19, 2025
51157e3
:sparkles: Extend SCCNN to work for arbitrary ranks
grapentt Nov 19, 2025
82171c9
:memo: Include edge_task in name to prevent cache conflicts
grapentt Nov 19, 2025
0646111
:bug: Message passing
grapentt Nov 19, 2025
ec900d8
:sparkles: SCCNN cell wrapper using higher cell features and allowing…
grapentt Nov 19, 2025
2a34048
:sparkles: Cell Readout layer for cell level prediction
grapentt Nov 19, 2025
d85e69b
:heavy_plus: add dependency
Nov 26, 2025
7334e26
:twisted_rightwards_arrows: Merge
Nov 26, 2025
ae8d011
:bug: non existing attribute
Nov 26, 2025
289c6b3
:lipstick: More test fixes
Nov 26, 2025
4a86a7c
:bug: Fix tests
grapentt Nov 26, 2025
98 changes: 98 additions & 0 deletions configs/dataset/simplicial/ppi_highppi.yaml
@@ -0,0 +1,98 @@
################################################################################
# HIGH-PPI + CORUM: Protein Interaction Prediction via Simplicial Complexes
################################################################################
#
# Data Structure:
# - Proteins (rank 0): ~1,553 proteins
# - Edges (rank 1): ~6,660 HIGH-PPI edges with:
# * Features: 8-dim (7 interaction types + 1 STRING confidence score)
# - Interaction types: reaction, binding, ptmod, activation, inhibition, catalysis, expression
# - Confidence score [0, 1] measuring interaction probability (mapped to [-1, 1])
# - Higher-order cells: CORUM protein complexes (2+ proteins)
# * Features: 1-dim (binary existence: 1=real, -1=fake)
#
# Note: Features at any rank can also serve as prediction targets (labels).
# Models should mask features of the rank being predicted to avoid data leakage.
#
# Prediction Tasks:
# - Edge (rank 1): Regression (confidence scores) or multi-label (interaction types)
# - Cell (ranks 2+): Binary classification (complex existence)
#
################################################################################

# Data loading configuration
loader:
  _target_: topobench.data.loaders.PPIHighPPIDatasetLoader
  parameters:
    data_domain: simplicial
    model_domain: simplicial
    data_name: ppi_highppi
    data_dir: ${paths.data_dir}/${dataset.loader.parameters.data_domain}

    # CORUM Complex Configuration
    min_complex_size: 2  # Minimum proteins per CORUM complex (2+ allowed)
    # Edge features for edges in CORUM:
    # - In HIGH-PPI: Interaction types + confidence boosted to 1.0
    # - Not in HIGH-PPI: [0,0,0,0,0,0,0, 1.0] (unknown types, high confidence)
    max_complex_size: 6  # Maximum proteins per CORUM complex

    # Negative Sampling (for classification tasks)
    neg_ratio: 1.0  # Ratio of negative to positive samples (1.0 = balanced)

    # Multi-Rank Prediction
    target_ranks: [2, 3, 4, 5]  # Which ranks to predict (train/test on)
    # Max target rank must be <= max_complex_size - 1

    # Edge Task Type (only applied when rank 1 in target_ranks)
    edge_task: score  # "score": Regression - predict confidence of interaction [0-1]
    # "interaction_type": Multi-label - predict 7 interaction types

# Model training configuration
parameters:
  _num_proteins: 1553  # HIGH-PPI has 1,553 proteins

  num_features: ${infer_ppi_num_features:${dataset.parameters._num_proteins},${dataset.loader.parameters.edge_task},${dataset.loader.parameters.max_complex_size}}

  num_classes: 2  # Depends on task:
  # - Higher-order (ranks 2+): 2 (exists/doesn't exist)
  # - Edge regression (rank 1, score): 1 (continuous output)
  # - Edge multi-label (rank 1, interaction_type): 7 (7 types)
  task: classification  # Depends on target_ranks and edge_task:
  # - Higher-order (ranks 2+): classification
  # - Edge regression (rank 1, score): regression
  # - Edge multi-label (rank 1, interaction_type): classification
  task_level: cell  # Predict on cells (edges/triangles/etc), not nodes or graphs

  # Multi-Rank Prediction
  target_ranks: ${dataset.loader.parameters.target_ranks}

  loss_type: cross_entropy  # Depends on task:
  # - Higher-order binary: cross_entropy
  # - Edge regression: mse or mae
  # - Edge multi-label: bce_with_logits
  monitor_metric: auroc  # Depends on task:
  # - Higher-order binary: auroc, f1, accuracy
  # - Edge regression: mae, rmse
  # - Edge multi-label: f1, auroc

# Splits Configuration
split_params:
  learning_setting: transductive  # Single complex, split labeled cells
  data_split_dir: ${paths.data_dir}/data_splits/${dataset.loader.parameters.data_name}
  data_seed: 42  # Random seed for reproducible splits

  # Split Type Options:
  # - "random": Random splitting with train_prop ratio
  # - "k-fold": K-fold cross-validation
  # - "fixed": Use HIGH-PPI's official train/val split (if available in raw data)
  split_type: random

  train_prop: 0.8  # For random/k-fold: 80% train, 10% val, 10% test
  # Ignored when split_type: fixed

# Dataloader
dataloader_params:
  batch_size: 1
  num_workers: 0
  pin_memory: False
  persistent_workers: False
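
The comments in this config tie `num_classes`, `task`, `loss_type`, and `monitor_metric` to the choice of `target_ranks` and `edge_task`, and require the maximum target rank to be at most `max_complex_size - 1`. The Python sketch below restates that lookup in one place. `resolve_task_settings` is a hypothetical helper written for illustration and is not part of TopoBench; picking `mse` and `mae` for the regression case is an assumption, since the config lists several options, and mixing rank 1 with higher ranks is rejected only because the config does not describe that case.

```python
def resolve_task_settings(target_ranks, edge_task, max_complex_size):
    """Hypothetical helper mirroring the comments in ppi_highppi.yaml."""
    if not target_ranks:
        raise ValueError("target_ranks must not be empty")

    # Constraint stated in the config: max target rank <= max_complex_size - 1.
    if max(target_ranks) > max_complex_size - 1:
        raise ValueError(
            f"max target rank {max(target_ranks)} exceeds "
            f"max_complex_size - 1 = {max_complex_size - 1}"
        )

    # Higher-order cells (ranks 2+): binary classification of complex existence.
    if all(rank >= 2 for rank in target_ranks):
        return {
            "num_classes": 2,
            "task": "classification",
            "loss_type": "cross_entropy",
            "monitor_metric": "auroc",
        }

    # Edge-level prediction (rank 1 only): behaviour depends on edge_task.
    if list(target_ranks) == [1]:
        if edge_task == "score":
            return {
                "num_classes": 1,
                "task": "regression",
                "loss_type": "mse",  # assumption: the config allows mse or mae
                "monitor_metric": "mae",  # assumption: the config allows mae or rmse
            }
        if edge_task == "interaction_type":
            return {
                "num_classes": 7,
                "task": "classification",
                "loss_type": "bce_with_logits",
                "monitor_metric": "f1",  # assumption: the config allows f1 or auroc
            }
        raise ValueError(f"unknown edge_task: {edge_task!r}")

    # The config does not describe mixing rank 1 (or rank 0) with higher ranks.
    raise ValueError("unsupported combination of target ranks in this sketch")


settings = resolve_task_settings([2, 3, 4, 5], "score", max_complex_size=6)
print(settings)
```

With the values committed in this file (`target_ranks: [2, 3, 4, 5]`, `max_complex_size: 6`), the sketch returns the same `num_classes`, `task`, `loss_type`, and `monitor_metric` that the config sets by hand.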
45 changes: 45 additions & 0 deletions configs/model/simplicial/sccnn_cell.yaml
@@ -0,0 +1,45 @@
_target_: topobench.model.TBModel

model_name: sccnn_cell
model_domain: simplicial

_hidden_dim: 32  # Hidden dimension for all ranks

_in_channels: ${infer_in_channels:${dataset},${oc.select:transforms,null}}
_num_ranks: ${infer_num_cell_dimensions:null,${model._in_channels}}
_channel_list: ${infer_channel_list:${model._hidden_dim},${model._num_ranks}}

feature_encoder:
  _target_: topobench.nn.encoders.AllCellFeatureEncoder
  encoder_name: AllCellFeatureEncoder
  in_channels: ${model._in_channels}
  out_channels: ${model._hidden_dim}
  proj_dropout: 0.0

backbone:
  _target_: topobench.nn.backbones.simplicial.sccnn.SCCNNCustom
  in_channels_all: ${model._channel_list}
  hidden_channels_all: ${model._channel_list}
  conv_order: 1
  sc_order: ${model._num_ranks}
  aggr_norm: false
  update_func: sigmoid
  n_layers: 2

backbone_wrapper:
  _target_: topobench.nn.wrappers.SCCNNCellWrapper
  _partial_: true
  wrapper_name: SCCNNCellWrapper
  num_cell_dimensions: ${model._num_ranks}
  target_ranks: ${dataset.parameters.target_ranks}
  out_channels: ${model._hidden_dim}

readout:
  _target_: topobench.nn.readouts.LinearCellLevelReadout
  hidden_dim: ${model._hidden_dim}
  out_channels: ${dataset.parameters.num_classes}
  num_cell_dimensions: ${model._num_ranks}
  target_ranks: ${dataset.parameters.target_ranks}

# Compile model for faster training (pytorch 2.0+)
compile: false
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -54,6 +54,8 @@ dependencies=[
     "topomodelx @ git+https://github.com/pyt-team/TopoModelX.git",
     "toponetx @ git+https://github.com/pyt-team/TopoNetX.git",
     "lightning==2.4.0",
+    "gdown",
+    "pybiomart",
 ]

 [project.optional-dependencies]
5 changes: 1 addition & 4 deletions test/data/load/test_datasetloaders.py
@@ -1,12 +1,9 @@
"""Comprehensive test suite for all dataset loaders."""
import os
import pytest
import torch
import hydra
from pathlib import Path
from typing import List, Tuple, Dict, Any
from omegaconf import DictConfig
from topobench.data.preprocessor.preprocessor import PreProcessor

class TestLoaders:
"""Comprehensive test suite for all dataset loaders."""

5 changes: 2 additions & 3 deletions test/nn/backbones/simplicial/test_sccnn.py
@@ -94,9 +94,8 @@ def test_sccnn_basic_initialization():

     # Verify layer structure
     assert len(model.layers) == 2  # Default n_layers is 2
-    assert hasattr(model, 'in_linear_0')
-    assert hasattr(model, 'in_linear_1')
-    assert hasattr(model, 'in_linear_2')
+    assert hasattr(model, 'in_linears')
+    assert len(model.in_linears) == 3  # Should have 3 input linear layers for ranks 0, 1, 2

def test_update_functions():
"""Test different update functions in the SCCNN."""
4 changes: 4 additions & 0 deletions test/pipeline/test_pipeline.py
@@ -7,6 +7,10 @@
DATASET = "graph/MUTAG" # ADD YOUR DATASET HERE
MODELS = ["graph/gcn", "cell/topotune", "simplicial/topotune"] # ADD ONE OR SEVERAL MODELS OF YOUR CHOICE HERE

# HIGH-PPI B2 integration (optional - uncomment to test)
# DATASET = "simplicial/ppi_highppi"
# MODELS = ["simplicial/sccnn_cell"]


class TestPipeline:
"""Test pipeline for a particular dataset and model."""
12 changes: 11 additions & 1 deletion test/test_tutorials.py
@@ -1,6 +1,7 @@
 """Unit tests for the tutorials."""

 import glob
+import os
 import subprocess
 import tempfile

@@ -28,7 +29,16 @@ def _exec_tutorial(path):
         file_name,
         path,
     ]
-    subprocess.check_call(args)
+
+    # Set PYTHONPATH to include the project root so notebooks can import topobench
+    env = os.environ.copy()
+    project_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+    if 'PYTHONPATH' in env:
+        env['PYTHONPATH'] = f"{project_root}:{env['PYTHONPATH']}"
+    else:
+        env['PYTHONPATH'] = project_root
+
+    subprocess.check_call(args, env=env)


paths = sorted(glob.glob("tutorials/*.ipynb"))
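
The change to `_exec_tutorial` prepends the project root to `PYTHONPATH` and joins entries with a hard-coded `:`, which is the separator on POSIX systems but not on Windows. A possible variant of the same logic, sketched below, uses `os.pathsep` instead. `build_env` and its `base_env` parameter are hypothetical names introduced here for illustration; they do not appear in the PR.

```python
import os


def build_env(project_root, base_env=None):
    """Hypothetical helper: copy an environment and prepend project_root to PYTHONPATH."""
    # Copy so the caller's mapping (or os.environ) is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH")
    if existing:
        # os.pathsep is ":" on POSIX and ";" on Windows.
        env["PYTHONPATH"] = os.pathsep.join([project_root, existing])
    else:
        env["PYTHONPATH"] = project_root
    return env


example = build_env("/repo", {"PYTHONPATH": "/other"})
print(example["PYTHONPATH"])
```

The returned mapping can be passed as `env=` to `subprocess.check_call`, as the diff above already does with its own `env` dictionary.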