GitHub - integrativebioinformatics/longnoncoder: A pipeline for isoform-level transcriptome assembly and lncRNA discovery from long-reads

Introduction

integrativebioinformatics/longnoncoder is a bioinformatics Nextflow pipeline that provides a comprehensive analysis of raw long-read RNA-seq data, encompassing transcriptome assembly, quantification, and characterization. The pipeline reports a detailed overview of the entire transcriptome, with particular emphasis on lncRNA structure and isoforms across annotated transcripts and novel candidates.

For more details and further functionality, please refer to the usage and output documentation.

Important

LongNonCoder is compatible with Ensembl reference genomes and annotations from the following organisms: Homo sapiens, Mus musculus, Danio rerio, Anolis carolinensis, Chrysemys picta bellii, Eptatretus burgeri, Gallus gallus, Latimeria chalumnae, Monodelphis domestica, Notechis scutatus, Ornithorhynchus anatinus, Petromyzon marinus, Sphenodon punctatus, and Xenopus tropicalis. In upcoming releases, we plan to extend the workflow to cover more organisms or broader taxonomic classes.

The workflow

We can describe each step of the workflow as follows:

Quality control of reads (NanoComp)
Filtering and trimming (chopper)
Mapping to a genome reference (minimap2 and samtools)
Quality control of mapped reads (NanoComp)
Transcriptome Assembly (Bambu)
Compare novel transcripts to the annotation reference (GffCompare)
Convert novel transcripts GTF file to FASTA (GffRead)
Predict transcripts as protein-coding or non-coding (RNAmining)
Gather all data from previous steps and generate informative and re-usable metadata .csv and GTF files for both novel and annotated transcripts (Metadata handling)
Provide a report and data visualization for the full transcriptome, with emphasis on lncRNAs (Report)
Gather all possible QC information from the previous steps (MultiQC)
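To make the mapping step above more concrete, here is a minimal dry-run sketch of how a spliced long-read alignment with minimap2 and samtools is commonly chained. The file names are placeholders and the flags are typical usage, assumed for illustration; the options the pipeline actually passes may differ, so check the modules in the repository for the real invocation.

```shell
# Print (dry run) a typical spliced long-read alignment command chain.
# File names are placeholders; the flags are common minimap2/samtools
# usage, not necessarily what the pipeline runs internally.
print_mapping_commands() {
    ref="$1"; reads="$2"; out="$3"
    # -ax splice: spliced alignment preset with SAM output
    printf 'minimap2 -ax splice %s %s | samtools sort -o %s -\n' "$ref" "$reads" "$out"
    # Index the sorted BAM so downstream tools can access it by region
    printf 'samtools index %s\n' "$out"
}

print_mapping_commands genome.fa sample.fastq.gz sample.sorted.bam
```

The function only prints the commands, so it can be inspected without either tool installed.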

Usage

Note

If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data. The pipeline is compatible with Docker and Singularity/Apptainer.

You can run an example test by following these instructions:

Enter the test_data folder

cd test_data

Download and unzip the reference FASTA and GTF files, and download the fastq.gz files, using the download-ref-fastq.sh script.

First make the script executable:

chmod +x download-ref-fastq.sh

Then run it:

./download-ref-fastq.sh

Add the full (absolute) path of each sample to the samplesheet.csv file. For example, the full path for a sample could be:

/home/user/longnoncoder/test_data/thesample.fastq.gz

Go back to the main directory and run the test:
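If you are unsure what the full paths are on your machine, a small helper like the one below can print them for you. This is only a sketch: the function name list_fastq_paths is hypothetical and not part of the pipeline, and it assumes the downloaded reads end in .fastq.gz and sit in the directory you point it at.

```shell
# Print the absolute path of every .fastq.gz file in a directory
# (defaults to the current directory), one per line.
list_fastq_paths() {
    dir="${1:-.}"
    # Resolve the directory in a subshell so the caller's working
    # directory is left untouched
    abs_dir="$(cd "$dir" && pwd)" || return 1
    for f in "$abs_dir"/*.fastq.gz; do
        # Skip the unexpanded pattern when no files match
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

list_fastq_paths .
```

Run from inside test_data, each printed line can be copied into samplesheet.csv as it is.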

cd ..

nextflow run main.nf -profile test,singularity -params-file test_data/testing.yml

Warning

Please provide pipeline parameters via the CLI or via the Nextflow -params-file option with a YAML parameters file. Custom config files, including those provided by the -c Nextflow option, can be used to provide any configuration except for parameters; see docs.
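As an illustration of the -params-file approach, the sketch below writes a small YAML parameters file and prints the matching launch command. The file name my_params.yml and the keys input and outdir are assumptions based on the usual nf-core template convention, not confirmed names for this pipeline; nextflow_schema.json and test_data/testing.yml in the repository are the places to check which parameters are actually accepted.

```shell
# Write a hypothetical parameters file. The keys `input` and `outdir` are
# assumed from the common nf-core convention; verify them against
# nextflow_schema.json before use.
cat > my_params.yml <<'EOF'
input: "test_data/samplesheet.csv"
outdir: "results"
EOF

# Show the launch command that would use this file (printed, not executed)
echo "nextflow run main.nf -profile singularity -params-file my_params.yml"
```

Keeping parameters in a file like this makes a run easier to repeat than a long list of CLI flags.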

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

Citations

If you use integrativebioinformatics/longnoncoder in your research, please consider citing it. An extensive list of references for the tools incorporated by the pipeline can be found in the CITATIONS.md file.

Acknowledgments

Development & Contributions

The longnoncoder pipeline was originally developed by Bárbara Borges (@borgessbarbara). We extend our sincere thanks to Lucas Freitas (@lfreitasl), João Cavalcante (@jvfe), and Gleison Azevedo (@gleisonm) for their significant contributions and assistance.

Supervision

This project was carried out under the leadership and supervision of Principal Investigators Vinícius Maracajá-Coutinho, Thaís Gaudencio, and Rodrigo Dalmolin.

Computational resources

This project was supported by the High-Performance Computing Center at UFRN (NPAD/UFRN) and the National Laboratory for High Performance Computing (NLHPC) (CCSS210001) at UChile.

Funding

CAPES (001), CNPq (MCTI/FNDCT 445067/2024-1), FONDECYT-ANID Postdoctorado (3250452), FONDECYT-ANID (1211731), FONDAP-ANID (15130011 and 1523A0008), and Anillo-ANID (ATE220016).
