GENEFLOW: Genomic Examination and Nucleotide Evaluation For Laboratory Operations Workflow

GENEFLOW is the latest version of the Illumina sequencing processing workflow at NYU's Center for Genomics and Systems Biology. The pipeline is written in Nextflow and runs the following steps:

Archive the Run Directory
Basecalling
Demultiplexing (Optional)
Demultiplexing Reports
Data Merging
FastQC Reports
MultiQC Report
Data Delivery

The pipeline interfaces with TuboWeb, a web-based platform for NGS data analysis and visualization, to retrieve run metadata and customize run parameters.

Configuration Instructions

To successfully deploy and run the GENEFLOW pipeline, follow these setup steps:

1. Update `launch.sh`

Modify the launch script (launch.sh) to include your email address:

#SBATCH --mail-user=your_netID@nyu.edu

2. Update `nextflow.config`

Global Configuration in Nextflow

Configure the global variables in nextflow.config as follows:

alpha: The primary work directory for the pipeline.
tmp_dir: Temporary working directory for Picard tools.
fastqc_path: Destination for rsyncing FastQC files (e.g., web server).
archive_path: Destination for archived run directories.
admin_email: Email for pipeline administration notifications.

Set up the module paths
Specify the workDir
Configure email settings

3. Update `config.py`

TuboWeb API Configuration

API path
User credentials
API key

File Delivery and Storage Paths

delivery_folder_root: Destination for FastQ files.
raw_run_dir_delivery_root: Destination for raw run directories.
raw_run_root: Storage location for raw run directories.
alpha: The primary work directory for the pipeline (as in nextflow.config).

Gmail Credentials

Set the Gmail user and password for email notifications.
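
As a rough illustration of how these settings might be laid out, here is a minimal sketch of a config.py. The four path variable names come from the list above. Every other name (`tuboweb_api_url`, `tuboweb_user`, `tuboweb_password`, `tuboweb_api_key`, `gmail_user`, `gmail_password`, the environment variable names, and the `missing_settings` helper) is hypothetical, and all values are placeholders. Check the config.py shipped with the pipeline for the identifiers it actually expects.

```python
import os

# --- TuboWeb API configuration (variable names are hypothetical) ---
tuboweb_api_url = os.environ.get("TUBOWEB_API_URL", "https://tuboweb.example.org/api/")
tuboweb_user = os.environ.get("TUBOWEB_USER", "")
tuboweb_password = os.environ.get("TUBOWEB_PASSWORD", "")
tuboweb_api_key = os.environ.get("TUBOWEB_API_KEY", "")

# --- File delivery and storage paths (names from the README, values are placeholders) ---
delivery_folder_root = "/path/to/fastq/delivery/"
raw_run_dir_delivery_root = "/path/to/raw/run/delivery/"
raw_run_root = "/path/to/raw/runs/"
alpha = "/path/to/pipeline/workdir/"  # should match `alpha` in nextflow.config

# --- Gmail credentials (variable names are hypothetical) ---
gmail_user = os.environ.get("GENEFLOW_GMAIL_USER", "")
gmail_password = os.environ.get("GENEFLOW_GMAIL_PASSWORD", "")


def missing_settings():
    """Return the names of settings that are still empty (hypothetical helper)."""
    settings = {
        "tuboweb_api_url": tuboweb_api_url,
        "tuboweb_user": tuboweb_user,
        "tuboweb_password": tuboweb_password,
        "tuboweb_api_key": tuboweb_api_key,
        "delivery_folder_root": delivery_folder_root,
        "raw_run_dir_delivery_root": raw_run_dir_delivery_root,
        "raw_run_root": raw_run_root,
        "alpha": alpha,
        "gmail_user": gmail_user,
        "gmail_password": gmail_password,
    }
    return sorted(name for name, value in settings.items() if not value)
```

Reading the credentials from environment variables is a choice made for this sketch so that secrets stay out of version control; the pipeline itself may expect them to be written directly into the file.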

Launching the Pipeline

The launch.sh script starts the pipeline. It takes two required parameters and one optional parameter:

Run Directory Path
Flowcell ID
Optional: Entry point for the pipeline (used to resume the pipeline from a specific step)

Usage Examples

Basic launch:

launch.sh /scratch/gencore/sequencers/{machine_name}/{run_dir_name} {fcid}

Specific Example Launch:

launch.sh /scratch/gencore/sequencers/NB502067/240124_NB502067_0578_AHKFT5BGXV HKFT5BGXV

Launch with Entry Point (to resume at a specific step, e.g. 'demux' for demultiplexing):
```
launch.sh /scratch/gencore/sequencers/{machine_name}/{run_dir_name} {fcid} {entry}
```

Example with Entry Point:

launch.sh /scratch/gencore/sequencers/NB502067/240124_NB502067_0578_AHKFT5BGXV HKFT5BGXV demux
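
In the examples above, the flowcell ID is the last underscore-separated field of the run directory name with its leading letter removed (AHKFT5BGXV becomes HKFT5BGXV). The sketch below builds the launch command from a run directory on that basis. Both functions are hypothetical helpers, not part of the pipeline, and the rule is an assumption inferred from the example names: a leading A or B is treated as a flowcell-side prefix, and names containing a hyphen (such as 000000000-TEST1) are used unchanged. Always confirm the derived ID before launching.

```python
import shlex
from pathlib import PurePosixPath


def fcid_from_run_dir(run_dir):
    """Guess the flowcell ID from a run directory name (assumed naming rule)."""
    last_field = PurePosixPath(run_dir).name.split("_")[-1]
    # Assumption: a leading A/B is a flowcell-side prefix and is dropped;
    # hyphenated IDs such as 000000000-TEST1 are returned unchanged.
    if "-" not in last_field and last_field[:1] in ("A", "B"):
        return last_field[1:]
    return last_field


def launch_command(run_dir, entry=None):
    """Build the launch.sh command line, optionally with an entry point."""
    args = ["launch.sh", run_dir, fcid_from_run_dir(run_dir)]
    if entry:
        args.append(entry)
    return shlex.join(args)


print(launch_command(
    "/scratch/gencore/sequencers/NB502067/240124_NB502067_0578_AHKFT5BGXV",
    entry="demux",
))
```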

Production Deployment

In a production environment, launch.sh is typically submitted as a Slurm batch job with sbatch. Create the directories for the error and output files beforehand, because Slurm does not create them.

  mkdir -p /scratch/gencore/GENEFLOW/alpha/logs/HKFT5BGXV/pipeline
  sbatch --output=/scratch/gencore/GENEFLOW/alpha/logs/HKFT5BGXV/pipeline/slurm-%j.out \
         --error=/scratch/gencore/GENEFLOW/alpha/logs/HKFT5BGXV/pipeline/slurm-%j.err \
         --job-name="GENEFLOW_MANAGER_(HKFT5BGXV)" \
         launch.sh /scratch/gencore/sequencers/NB502067/240124_NB502067_0578_AHKFT5BGXV HKFT5BGXV
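
The same submission can be scripted so that the log directory and job name are derived from the flowcell ID. This is a minimal sketch, assuming the log layout shown above; `build_sbatch_command` and `submit` are hypothetical helpers, not part of the pipeline. Passing the arguments as a list means the parentheses in the job name need no shell quoting.

```python
import os
import posixpath
import subprocess


def build_sbatch_command(run_dir, fcid, alpha="/scratch/gencore/GENEFLOW/alpha"):
    """Return (log_dir, argv) for submitting launch.sh through sbatch."""
    log_dir = posixpath.join(alpha, "logs", fcid, "pipeline")
    cmd = [
        "sbatch",
        f"--output={log_dir}/slurm-%j.out",  # %j expands to the Slurm job ID
        f"--error={log_dir}/slurm-%j.err",
        f"--job-name=GENEFLOW_MANAGER_({fcid})",
        "launch.sh",
        run_dir,
        fcid,
    ]
    return log_dir, cmd


def submit(run_dir, fcid):
    """Create the log directory, then submit the job."""
    log_dir, cmd = build_sbatch_command(run_dir, fcid)
    os.makedirs(log_dir, exist_ok=True)  # Slurm will not create it for us
    subprocess.run(cmd, check=True)
```

Calling `submit("/scratch/gencore/sequencers/NB502067/240124_NB502067_0578_AHKFT5BGXV", "HKFT5BGXV")` on a Slurm login node would then mirror the commands above.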

Testing

The pipeline includes a regression test suite that compares new pipeline output against known-good (ground truth) results.

How it works

1. Establish ground truth: Run the pipeline normally on a real run directory. The QC reports produced by MultiQC (demux report, run stats summary, undetermined barcodes) serve as the baseline. Copy these report files to a persistent ground truth directory (e.g. /home/gencore/GENEFLOW_TESTS_GT/<fcid>/).
2. Prepare a test run directory: Copy the original sequencer run directory and rename it with a test flowcell ID (e.g. 000000000-TEST1). This prevents test runs from writing logs, deliveries, and other artifacts into the production directories for the real run.
3. Configure the test: Create a test config file (e.g. test-miseq.config) that sets:

   params.run_dir_path: path to the renamed test run directory
   params.truth_dir: path to the ground truth reports saved in step 1

4. Run the test: Submit launch_tests.sh via Slurm. It runs the pipeline with the --test flag, which:

   Skips the deliver process (no emails sent, no data delivered)
   Enables the compare_runs process, which calls compare_runs.py to diff the new MultiQC reports against the ground truth files

5. Results: compare_runs.py compares three report types (demux report, run stats summary, undetermined barcodes) cell-by-cell. It reports PASS if all values match within tolerance, or FAIL with a detailed breakdown of differences. Exit code 0 = pass, 2 = fail.
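
To make the cell-by-cell comparison concrete, here is a minimal sketch of the idea. It is not the actual compare_runs.py: the tab-separated input format, the relative tolerance of 1e-6, and all function names are assumptions chosen for illustration. Only the PASS/FAIL behavior and the exit codes 0 and 2 come from the description above.

```python
import csv
import io
import math


def read_table(text):
    """Parse tab-separated report text into a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


def cells_match(a, b, rel_tol=1e-6):
    """Identical strings match; numeric cells match within a relative tolerance."""
    if a == b:
        return True
    try:
        return math.isclose(float(a), float(b), rel_tol=rel_tol)
    except ValueError:
        return False  # at least one cell is non-numeric and the text differs


def compare_tables(new_rows, truth_rows, rel_tol=1e-6):
    """Return a list of human-readable differences (empty list means PASS)."""
    diffs = []
    if len(new_rows) != len(truth_rows):
        diffs.append(f"row count: new={len(new_rows)} truth={len(truth_rows)}")
    for r, (new_row, truth_row) in enumerate(zip(new_rows, truth_rows)):
        if len(new_row) != len(truth_row):
            diffs.append(f"row {r}: new has {len(new_row)} cells, truth has {len(truth_row)}")
            continue
        for c, (a, b) in enumerate(zip(new_row, truth_row)):
            if not cells_match(a, b, rel_tol):
                diffs.append(f"row {r}, col {c}: new={a!r} truth={b!r}")
    return diffs


def exit_code(diffs):
    """Map the comparison result to the exit codes described above."""
    return 0 if not diffs else 2
```

A caller would read the new and ground truth versions of each of the three reports, collect the differences, print them, and pass `exit_code(diffs)` to `sys.exit`.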

Example: adding a test for a new sequencer type

1. After a successful production run of flowcell H323NDRX7, copy its reports:

mkdir -p /home/gencore/GENEFLOW_TESTS_GT/H323NDRX7
cp /path/to/multiqc/output/*.txt /home/gencore/GENEFLOW_TESTS_GT/H323NDRX7/

2. Copy and rename the run directory:

cp -r /scratch/gencore/sequencers/A01097/250822_A01097_0361_AH323NDRX7 \
      /scratch/gencore/sequencers/A01097/250822_A01097_0361_ANOVATEST1

3. Create test-novaseq.config pointing to both paths.
4. Update launch_tests.sh with the new fcid/config, then submit it with sbatch launch_tests.sh.
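
The test config for this example needs at least the two parameters named earlier, params.run_dir_path and params.truth_dir. The sketch below renders them in Nextflow config syntax. `render_test_config` is a hypothetical helper, and the real test-novaseq.config may contain further settings that are not shown here.

```python
def render_test_config(run_dir_path, truth_dir):
    """Render the two test parameters as Nextflow config assignments."""
    for value in (run_dir_path, truth_dir):
        if "'" in value:
            raise ValueError(f"path must not contain a single quote: {value!r}")
    lines = [
        f"params.run_dir_path = '{run_dir_path}'",
        f"params.truth_dir = '{truth_dir}'",
    ]
    return "\n".join(lines) + "\n"


config_text = render_test_config(
    "/scratch/gencore/sequencers/A01097/250822_A01097_0361_ANOVATEST1",
    "/home/gencore/GENEFLOW_TESTS_GT/H323NDRX7",
)
print(config_text)
```

The printed text can be saved as test-novaseq.config and referenced from launch_tests.sh.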
