An end-to-end automated pipeline for TCR-pMHC structure prediction using TCRdock and Rosetta.
This pipeline takes a CSV file with TCR information and automatically:
- Generates target files for each TCR-pMHC complex
- Sets up AlphaFold inputs
- Predicts structures using TCRdock (modified AlphaFold pipeline)
- Relaxes structures with Rosetta
- Relabels chains for InterfaceAnalyzer (MHC+Peptide→A, TCRα+TCRβ→B)
- Analyzes interfaces with Rosetta InterfaceAnalyzer
- Environment setup
- TCRdock (https://github.com/phbradley/TCRdock)
- AlphaFold database (complete the steps in the "Obtaining Genetic Databases" section of the AlphaFold documentation)
- Rosetta version 3.14, obtained as a non-commercial user
Required binaries:
- relax.linuxgccrelease
- InterfaceAnalyzer.linuxgccrelease
Your input CSV must contain the following columns:
- Peptide: Peptide sequence
- HLA: HLA allele (formats supported: HLA-A0201, A*02:01, A0201, etc.)
- Va: TCR alpha V gene
- Ja: TCR alpha J gene
- CDR3a: TCR alpha CDR3 sequence
- Vb: TCR beta V gene
- Jb: TCR beta J gene
- CDR3b: TCR beta CDR3 sequence
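The column requirement above can be checked before any jobs are submitted. The following is a minimal sketch under stated assumptions, not the validation that 01_generate_targets.py performs; the function names `missing_columns` and `incomplete_rows` are hypothetical.

```python
import csv
import io

# Column names taken from the list of required columns above.
REQUIRED_COLUMNS = ["Peptide", "HLA", "Va", "Ja", "CDR3a", "Vb", "Jb", "CDR3b"]


def missing_columns(csv_text):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    present = set(reader.fieldnames or [])
    return [col for col in REQUIRED_COLUMNS if col not in present]


def incomplete_rows(csv_text):
    """Return 1-based data row numbers that have an empty required field."""
    reader = csv.DictReader(io.StringIO(csv_text))
    bad = []
    for number, row in enumerate(reader, start=1):
        # Short rows yield None for missing fields, so fall back to "".
        if any(not (row.get(col) or "").strip() for col in REQUIRED_COLUMNS):
            bad.append(number)
    return bad
```

Passing the text of the input file, for instance `missing_columns(open(path).read())`, returns an empty list when the header is complete.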
Example:
Peptide,HLA,Va,Ja,CDR3a,Vb,Jb,CDR3b
GILGFVFTL,HLA-A*02:01,TRAV12-1,TRAJ33,CVVNGGFGNVLHC,TRBV7-2,TRBJ2-7,CASSLAPGTGELFF
- Clone this repository:
git clone https://github.com/gloriagrama/tcr_structure.git
cd tcr_structure
- Set up the environment (see the environment setup requirement above)
- Ensure all scripts are executable:
chmod +x *.sh
Edit 00_config.sh to set your system-specific paths:
# Input data
export INPUT_CSV="/path/to/your/input.csv"
# Working directory (all outputs will go here)
export WORK_DIR="/path/to/your/working_directory"
# Software paths
export TCRDOCK_PATH="/path/to/TCRdock"
export AF_DATA_DIR="/path/to/alphafold_data"
export ROSETTA_BIN="/path/to/rosetta/bin"
# SLURM settings
export SLURM_ACCOUNT="your_account"
export SLURM_PARTITION="htc"
export SLURM_QOS="public"
# Conda environment
export CONDA_ENV_PATH="/path/to/your/conda/env"
You can also adjust:
- CPU and memory allocations for each step
- Time limits
- Organism and MHC class
- GPU requirements
- Edit 00_config.sh with your paths
- Run the pipeline:
tmux new -s tcrpipe
bash run_pipeline.sh
That's it! The script will:
- Validate your configuration
- Generate all target files
- Submit all jobs to SLURM with proper dependencies
- Handle the entire workflow automatically
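Chaining jobs with dependencies can be sketched as follows. This is an illustration of the general SLURM pattern, not the contents of run_pipeline.sh: it assumes each stage is a single batch script, and the function name `submit_chain` and its `run` parameter are hypothetical. It relies on two real sbatch options, `--parsable` (print only the job ID) and `--dependency=afterok:<jobid>` (start only after the named job succeeds).

```python
import subprocess


def submit_chain(scripts, run=None):
    """Submit scripts in order; each job starts only if the previous one succeeded.

    `run` takes an argument list and returns the command's stdout. It defaults
    to calling sbatch, and can be replaced with a stub for dry runs.
    """
    if run is None:
        def run(cmd):
            return subprocess.run(
                cmd, check=True, capture_output=True, text=True
            ).stdout

    job_ids = []
    for script in scripts:
        cmd = ["sbatch", "--parsable"]
        if job_ids:
            # Wait for the previously submitted job to finish successfully.
            cmd.append(f"--dependency=afterok:{job_ids[-1]}")
        cmd.append(script)
        # --parsable prints "jobid" or "jobid;cluster".
        job_ids.append(run(cmd).strip().split(";")[0])
    return job_ids
```

With `afterok`, a failed prediction job leaves the downstream relaxation job pending rather than running it on missing input.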
Monitor your jobs:
squeue -u $USER
Check logs:
# Setup logs
ls $WORK_DIR/slurm_logs/setup_*.out
# Prediction logs
ls $WORK_DIR/slurm_logs/predict_*.out
# Relaxation logs
ls $WORK_DIR/slurm_logs/relax_*.out
# Relabeling logs
ls $WORK_DIR/slurm_logs/relabel_*.out
# Interface analysis logs
ls $WORK_DIR/slurm_logs/interface_*.out
$WORK_DIR/
├── targets/ # Individual target TSV files (one per TCR)
├── user_outputs/ # AlphaFold setup outputs
├── predictions/ # AlphaFold predicted structures
├── relaxed/ # Rosetta-relaxed structures
├── relabeled/ # Chain-relabeled structures (A/B format)
├── interface_scores/ # Interface analysis scores
├── interface_logs/ # Interface analysis logs
└── slurm_logs/ # SLURM job logs
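Given the layout above, progress can be estimated by counting the files in each stage directory. This is a rough sketch: the `.tsv` and `.pdb` extensions follow the file names used elsewhere in this README, and the helper name `stage_counts` is hypothetical.

```python
from pathlib import Path

# Stage directory -> glob pattern for its output files (extensions assumed
# from the file names shown in this README).
STAGE_PATTERNS = {
    "targets": "*.tsv",
    "predictions": "*.pdb",
    "relaxed": "*.pdb",
    "relabeled": "*.pdb",
}


def stage_counts(work_dir):
    """Count output files per stage; a missing directory counts as zero."""
    root = Path(work_dir)
    counts = {}
    for stage, pattern in STAGE_PATTERNS.items():
        stage_dir = root / stage
        counts[stage] = len(list(stage_dir.glob(pattern))) if stage_dir.is_dir() else 0
    return counts
```

A stage whose count is lower than the number of targets points to the step whose logs are worth checking first.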
- Script: 01_generate_targets.py
- Input: CSV file with TCR data
- Output: Individual TSV files in targets/
- Features: HLA normalization, validation, error handling
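HLA normalization of the kind listed above could look like the sketch below. It is not the code in 01_generate_targets.py: the canonical output form `A*02:01`, the restriction to two-field alleles, and the function name `normalize_hla` are all assumptions made for illustration.

```python
import re

# Optional "HLA-" prefix, gene name, optional "*", two 2-digit fields with an
# optional ":" between them. Only two-field alleles are handled in this sketch.
_HLA_PATTERN = re.compile(r"^(?:HLA-)?([A-Z]+\d*?)\*?(\d{2}):?(\d{2})$")


def normalize_hla(allele):
    """Convert HLA-A0201, A*02:01, A0201, HLA-A*02:01 to the form A*02:01."""
    match = _HLA_PATTERN.match(allele.strip().upper())
    if match is None:
        raise ValueError(f"Unrecognized HLA allele: {allele!r}")
    gene, field1, field2 = match.groups()
    return f"{gene}*{field1}:{field2}"
```

Raising on unrecognized input, rather than passing it through, lets a bad allele surface during target generation instead of at prediction time.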
- Script: 02_setup_alphafold.sh
- Input: Target TSV files
- Output: AlphaFold-ready inputs in user_outputs/
- Uses TCRdock's setup_for_alphafold.py
- Script: 03_run_prediction.sh
- Input: Setup outputs
- Output: Predicted PDB structures in predictions/
- Uses AlphaFold via TCRdock's run_prediction.py
- Requires GPU
- Script: 04_relax_structure.sh
- Input: Predicted structures
- Output: Relaxed structures in relaxed/
- Uses Rosetta relax protocol
- Script: 04b_relabel_chains.py
- Input: Relaxed structures
- Output: Relabeled structures in relabeled/
- Purpose: Converts AlphaFold chain labels to A/B format for InterfaceAnalyzer
  - Chain A: MHC + Peptide
  - Chain B: TCRα + TCRβ
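The A/B mapping above can be illustrated on plain PDB text. This sketch is not 04b_relabel_chains.py (which takes a targets.tsv and may use BioPython): it assumes the residues appear in the order MHC, peptide, TCRα, TCRβ and that the four chain lengths are already known. The function name `relabel_chains` is hypothetical, and the sketch does not renumber residues or rewrite TER records.

```python
def relabel_chains(pdb_lines, chain_lengths):
    """Assign chain A to MHC + peptide residues and chain B to TCR residues.

    chain_lengths: residue counts in file order [MHC, peptide, TCRα, TCRβ].
    """
    if len(chain_lengths) != 4:
        raise ValueError("expected four chain lengths")
    boundary = chain_lengths[0] + chain_lengths[1]  # first TCR residue index
    total = sum(chain_lengths)

    out = []
    residue_index = -1
    last_key = None
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            # Columns 22-27: chain ID, residue number, insertion code.
            key = line[21:27]
            if key != last_key:
                residue_index += 1
                last_key = key
            if residue_index >= total:
                raise ValueError("structure has more residues than expected")
            new_chain = "A" if residue_index < boundary else "B"
            line = line[:21] + new_chain + line[22:]
        out.append(line)
    if residue_index + 1 != total:
        raise ValueError("structure has fewer residues than expected")
    return out
```

The length checks guard against applying the lengths of one target to the structure of another.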
- Script: 05_interface_analysis.sh
- Input: Relabeled structures
- Output: Interface metrics in interface_scores/
- Uses Rosetta InterfaceAnalyzer
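Interface metrics can be collected from Rosetta score files with a small parser. The sketch below assumes the usual Rosetta scorefile layout, in which lines start with `SCORE:` and the first such line is the header; the exact file names written to interface_scores/ are not specified here, and the function name `parse_scorefile` is hypothetical.

```python
def parse_scorefile(text):
    """Parse Rosetta scorefile text into a list of dicts keyed by column name."""
    header = None
    records = []
    for line in text.splitlines():
        if not line.startswith("SCORE:"):
            continue  # skip SEQUENCE: and any other non-score lines
        fields = line.split()[1:]
        if header is None:
            header = fields  # first SCORE: line names the columns
            continue
        if fields == header:
            continue  # header repeated when runs are appended to one file
        row = {}
        for name, value in zip(header, fields):
            try:
                row[name] = float(value)
            except ValueError:
                row[name] = value  # e.g. the description column
        records.append(row)
    return records
```

Each returned record can then be queried by column, for example `record["dG_separated"]` or `record["dSASA_int"]` when those columns are present.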
If you see validation errors, check:
- All paths exist and are correct
- You have read/write permissions
- Software is properly installed
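The checks above can be run in one pass over the path variables exported by 00_config.sh. This is a sketch of that idea, not the validation built into run_pipeline.sh; the function name `check_config` is hypothetical, and it assumes every listed variable should point to an existing path.

```python
import os
from pathlib import Path

# Path-valued variables exported by 00_config.sh.
PATH_VARIABLES = [
    "INPUT_CSV", "WORK_DIR", "TCRDOCK_PATH",
    "AF_DATA_DIR", "ROSETTA_BIN", "CONDA_ENV_PATH",
]


def check_config(env=None):
    """Return a list of human-readable problems; an empty list means all clear."""
    env = os.environ if env is None else env
    problems = []
    for name in PATH_VARIABLES:
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).exists():
            problems.append(f"{name} points to a missing path: {value}")
        elif name == "WORK_DIR" and not os.access(value, os.W_OK):
            problems.append(f"WORK_DIR is not writable: {value}")
    return problems
```

Run it in a shell where 00_config.sh has been sourced so that the exported variables are visible to Python.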
Check the SLURM logs in $WORK_DIR/slurm_logs/:
- Look for error messages
- Verify resource allocations are sufficient
- Check that dependencies are properly installed
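Scanning many log files by hand is tedious, so the first check above can be scripted. The keyword list in this sketch is an assumption about what failures tend to look like, not a list taken from the pipeline, and the function name `find_log_errors` is hypothetical.

```python
import re
from pathlib import Path

# Keywords assumed to indicate a failure; adjust for your cluster and tools.
ERROR_PATTERN = re.compile(r"error|traceback|out of memory|cancelled", re.IGNORECASE)


def find_log_errors(log_dir, pattern="*.out"):
    """Map each log file name to the lines in it that match ERROR_PATTERN."""
    hits = {}
    for log in sorted(Path(log_dir).glob(pattern)):
        lines = log.read_text(errors="replace").splitlines()
        matches = [line for line in lines if ERROR_PATTERN.search(line)]
        if matches:
            hits[log.name] = matches
    return hits
```

Passing a narrower pattern such as `"relax_*.out"` limits the scan to a single step.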
Ensure all required software is installed:
# Check Python
python --version
# Check Python packages
python -c "import pandas"
python -c "from Bio.PDB import PDBParser"
# Check if Rosetta binaries exist
ls $ROSETTA_BIN/relax.linuxgccrelease
ls $ROSETTA_BIN/InterfaceAnalyzer.linuxgccrelease
# Check TCRdock
ls $TCRDOCK_PATH/setup_for_alphafold.py
ls $TCRDOCK_PATH/run_prediction.py
- "No valid targets generated"
  - Check your input CSV format
  - Ensure required columns are present
  - Look for data validation errors in output
- GPU allocation failed
  - Adjust PREDICT_GPU in config
  - Check GPU availability:
    sinfo -o "%20P %5a %10l %6D %6t %10G"
- Rosetta crashes
  - Verify Rosetta binary path
  - Check if structures have proper format
  - Review relaxation/interface logs
- Relabeling fails
  - Ensure BioPython is installed: pip install biopython
  - Check that targets.tsv exists for each structure
  - Verify relaxed PDB files are valid
Based on typical usage:
- Setup: 4 CPUs, 16GB RAM, 2 min
- Prediction: 4 CPUs, 32GB RAM, 1 GPU, 5 min
- Relaxation: 4 CPUs, 16GB RAM, 5 min
- Relabeling: 2 CPUs, 8GB RAM, 3 min
- Interface: 4 CPUs, 32GB RAM, 5 min
Total pipeline time: ~20 minutes per TCR
Adjust in 00_config.sh based on your system and data.
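The per-step figures above add up as shown in this short sketch, which also scales them to a batch. The numbers are copied from the list above; the helper names are hypothetical, and the batch estimate ignores queue wait and any parallelism across TCRs.

```python
# Per-step estimates copied from the list above.
STEPS = {
    "setup":      {"cpus": 4, "mem_gb": 16, "gpus": 0, "minutes": 2},
    "prediction": {"cpus": 4, "mem_gb": 32, "gpus": 1, "minutes": 5},
    "relaxation": {"cpus": 4, "mem_gb": 16, "gpus": 0, "minutes": 5},
    "relabeling": {"cpus": 2, "mem_gb": 8,  "gpus": 0, "minutes": 3},
    "interface":  {"cpus": 4, "mem_gb": 32, "gpus": 0, "minutes": 5},
}


def minutes_per_tcr():
    """Wall-clock minutes for one TCR when the steps run back to back."""
    return sum(step["minutes"] for step in STEPS.values())


def cpu_minutes_per_tcr():
    """CPU-minutes requested for one TCR (CPUs times minutes, per step)."""
    return sum(step["cpus"] * step["minutes"] for step in STEPS.values())


def serial_hours(n_tcrs):
    """Hours needed if n_tcrs complexes were processed one after another."""
    return n_tcrs * minutes_per_tcr() / 60
```

Here `minutes_per_tcr()` gives 20, matching the total stated above, and `serial_hours(30)` gives 10.0.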
You can run individual scripts manually:
# Step 1
python 01_generate_targets.py \
--input_csv input.csv \
--output_dir targets/
# Step 2
bash 02_setup_alphafold.sh \
targets/0.tsv \
user_outputs/ \
/path/to/TCRdock
# Step 3
bash 03_run_prediction.sh \
user_outputs/0/targets.tsv \
/path/to/TCRdock \
/path/to/alphafold_data \
user_outputs
# Step 4
bash 04_relax_structure.sh \
predictions/0_run_model_2_ptm.pdb \
/path/to/rosetta/bin \
relaxed/
# Step 5
python 04b_relabel_chains.py \
--pdb_file relaxed/0_run_model_2_ptm_relaxed.pdb \
--targets_tsv user_outputs/0/targets.tsv \
--output_file relabeled/0_run_model_2_ptm_relaxed_relabeled.pdb
# Step 6
bash 05_interface_analysis.sh \
relabeled/0_run_model_2_ptm_relaxed_relabeled.pdb \
/path/to/rosetta/bin \
interface_scores/ \
interface_logs/
To customize the pipeline:
- Modify individual scripts (01-05) for specific needs
- Adjust resource allocations in 00_config.sh
- Change SLURM parameters in run_pipeline.sh
For issues or questions:
- Check the troubleshooting section
- Review SLURM logs for error messages
- Verify all dependencies are properly installed