This tool deconvolves multi-regional bulk sequencing samples from variant calls, including structural variations (SVs), single-nucleotide variants (SNVs) and copy number variations (CNVs), and infers a comprehensive tumor phylogenetic tree.
This work is published in Xuecong Fu, Haoyun Lei, Yifeng Tao, Russell Schwartz, Reconstructing tumor clonal lineage trees incorporating single-nucleotide variants, copy number alterations and structural variations, Bioinformatics, Volume 38, Issue Supplement_1, July 2022, Pages i125–i133, https://doi.org/10.1093/bioinformatics/btac253
The program is written in Python 2. A Python 3 version will come out soon.
The packages you will need to install are listed below.
numpy
graphviz
ete2
biopython
gurobipy
PyVCF
To obtain a Gurobi license, you can sign up as an academic user here https://www.gurobi.com/downloads/end-user-license-agreement-academic/ and follow the instructions for downloading a license.
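Before running the tool, it can help to confirm that the packages listed above are importable. The following is a minimal sketch, not part of tusv-ext.py. The helper name `missing_modules` is hypothetical, and the import names are assumptions based on how these packages are usually imported (biopython as `Bio`, PyVCF as `vcf`).

```python
import importlib

# ASSUMPTION: import names for the packages listed above. biopython is
# normally imported as "Bio" and PyVCF as "vcf"; the rest use their own names.
REQUIRED_MODULES = ["numpy", "graphviz", "ete2", "Bio", "gurobipy", "vcf"]


def missing_modules(names):
    """Return the entries of `names` that fail to import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except Exception:
            # Covers ImportError as well as errors raised while loading a
            # package built for a different Python version.
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages (by import name): " + ", ".join(missing))
    else:
        print("All required packages can be imported.")
```

Run it with the same interpreter you intend to use for tusv-ext.py, since packages are installed per interpreter.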
python tusv-ext.py
The script tusv-ext.py takes as input a single directory containing one or more .vcf files. See https://samtools.github.io/hts-specs/VCFv4.2.pdf for the .vcf format specification. Each .vcf file should contain the SV breakpoints, CNVs and SNVs, with their processed copy numbers, from one sample of a patient.
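As a quick sanity check of the input directory described above, the sketch below lists the .vcf files it contains, one per sample. It is an illustration only and does not reproduce how tusv-ext.py discovers its inputs. The function name `list_vcf_files` and the file names in the demo are hypothetical.

```python
import os
import shutil
import tempfile


def list_vcf_files(input_dir):
    """Return the sorted paths of the .vcf files directly inside input_dir."""
    return sorted(
        os.path.join(input_dir, name)
        for name in os.listdir(input_dir)
        if name.endswith(".vcf")
    )


# Demo on a throwaway directory with hypothetical file names.
demo_dir = tempfile.mkdtemp()
try:
    for name in ("sample1.vcf", "sample2.vcf", "notes.txt"):
        open(os.path.join(demo_dir, name), "w").close()
    found = [os.path.basename(path) for path in list_vcf_files(demo_dir)]
finally:
    shutil.rmtree(demo_dir)

print("Samples found: %d" % len(found))
```

Each file found this way corresponds to one sample of the patient, so the count should match the number of sequenced regions.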
Inputs:
-i: the input directory containing vcf files of different samples from one patient
-o: the output directory with deconvoluted results
-n: number of leaves to infer in phylogenetic tree
-c: maximum copy number allowed for any breakpoint or segment on any node
-t: maximum number of coordinate-descent iterations (program can finish sooner if convergence is reached)
-r: number of random initializations of the coordinate-descent algorithm
-col: binary flag whether to collapse the redundant nodes
-leaf: binary flag whether to assume only leaf nodes are in the mixed samples or not
-sv_ub: approximate maximum number of subsampled breakpoints of structural variants, -1 if you don't want to do the subsampling and include all breakpoints
-const: maximum number of total subsampled breakpoints and SNVs
The following inputs are optional:
-b: (recommended) binary flag to automatically set hyper-parameters lambda 1 and lambda 2
-l: lambda 1 hyper-parameter. Controls phylogenetic cost
-a: lambda 2 hyper-parameter. Controls breakpoint-to-segment consistency
-m: maximum time (in seconds) for a single coordinate-descent iteration
-s: number of segments (in addition to those containing breakpoints) that are randomly kept for unmixing. Default keeps all segments
-p: (not recommended) number of processors to use. Uses all available processors by default
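To show how the options above fit together, here is a small sketch that assembles a command line from them. It only builds and prints the argument list and does not run the tool. The helper `build_command`, the paths and all numeric values are hypothetical. It also assumes that the binary flags (-col, -leaf, -b) are passed as bare switches without a value, which this description does not confirm.

```python
def build_command(input_dir, output_dir, n_leaves, max_copy_number,
                  max_iters, n_restarts, sv_ub, const,
                  collapse=False, leaf_only=False, auto_lambda=True):
    """Assemble an argument list for tusv-ext.py from the documented options."""
    cmd = [
        "python", "tusv-ext.py",
        "-i", input_dir,
        "-o", output_dir,
        "-n", str(n_leaves),
        "-c", str(max_copy_number),
        "-t", str(max_iters),
        "-r", str(n_restarts),
        "-sv_ub", str(sv_ub),
        "-const", str(const),
    ]
    # ASSUMPTION: binary flags are bare switches (present means enabled).
    if collapse:
        cmd.append("-col")
    if leaf_only:
        cmd.append("-leaf")
    if auto_lambda:
        cmd.append("-b")
    return cmd


# Hypothetical paths and values, chosen only for illustration.
cmd = build_command("patient1_vcfs", "patient1_out", n_leaves=3,
                    max_copy_number=10, max_iters=2, n_restarts=2,
                    sv_ub=80, const=120, collapse=True)
print(" ".join(cmd))
```

Passing -b lets the program choose lambda 1 and lambda 2 itself, which is why the sketch enables it by default and omits -l and -a.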
Outputs:
C.tsv: the C matrix, which holds the variant copy number profiles of each clone
U.tsv: the U matrix, which holds the frequencies of each clone in each sample
T.dot: the inferred phylogenetic tree
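A result such as U.tsv can be inspected with a few lines of code. The sketch below parses a tab-separated numeric matrix and sums each row. The file layout is an assumption, not something stated here: it supposes one row per sample, one column per clone, and no header or label columns. The toy values are invented for illustration and are not output of the tool.

```python
import csv
import io


def read_matrix(handle):
    """Parse tab-separated numeric rows into a list of lists of floats."""
    reader = csv.reader(handle, delimiter="\t")
    return [[float(value) for value in row] for row in reader if row]


def row_sums(matrix):
    """Return the sum of each row of the matrix."""
    return [sum(row) for row in matrix]


# Toy stand-in for U.tsv (ASSUMED layout: samples as rows, clones as columns).
toy_u = io.StringIO("0.5\t0.3\t0.2\n0.1\t0.0\t0.9\n")
U = read_matrix(toy_u)
sums = row_sums(U)

for index, total in enumerate(sums):
    print("sample %d: clone frequencies sum to %.3f" % (index, total))
```

If the rows do represent samples, each row of clone frequencies would be expected to sum to roughly 1. For a real file, replace the `io.StringIO` object with `open("U.tsv")` and adjust the parsing if the file carries headers.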
