Mynamemd/WES-Analysis-Nextflow

Tumor-Normal Variant Analysis Pipeline using Nextflow

Project Overview

This project demonstrates a modular bioinformatics workflow for tumor-normal sequencing data analysis using Nextflow DSL2. The pipeline performs quality control, read trimming, sequence alignment, variant analysis preparation, and variant annotation. The project also includes IGV-based visualization for comparing tumor and normal samples.


Objectives

  • Perform quality assessment of raw sequencing reads
  • Trim adapters and low-quality bases
  • Align sequencing reads to the human reference genome (hg38)
  • Generate sorted and indexed BAM files
  • Perform variant analysis preparation
  • Annotate variants using ANNOVAR
  • Visualize genomic variants using IGV
  • Compare tumor and normal samples at selected genomic loci

Workflow Diagram

FASTQ
  ↓
FastQC
  ↓
Fastp
  ↓
BWA-MEM
  ↓
SAMtools
  ↓
GATK Variant Analysis
  ↓
ANNOVAR Annotation
  ↓
IGV Validation

Tools Used

Tool            Purpose
Nextflow DSL2   Workflow management
FastQC          Read quality assessment
Fastp           Read trimming and filtering
BWA-MEM         Sequence alignment
SAMtools        BAM processing and indexing
GATK            Variant analysis preparation
ANNOVAR         Variant annotation
IGV             Variant visualization

Input Data

Input files:

tumor_R1.fastq.gz
tumor_R2.fastq.gz
normal_R1.fastq.gz
normal_R2.fastq.gz

Reference genome:

hg38.fa

The pipeline accepts paired-end FASTQ sequencing reads from tumor and normal samples.
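
Because every downstream step works on read pairs, it helps to check that each sample has both mates before launching anything. The following is a minimal Python sketch, not part of the repository, that groups files by sample using the `_R1`/`_R2` naming shown above. The function name `pair_fastq_files` is hypothetical.

```python
import re
from collections import defaultdict

# Matches names such as tumor_R1.fastq.gz. The _R1/_R2 suffix convention is
# taken from the input file names listed in this README.
FASTQ_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])\.fastq\.gz$")


def pair_fastq_files(filenames):
    """Group FASTQ names into {sample: (R1, R2)}; reject incomplete pairs."""
    grouped = defaultdict(dict)
    for name in filenames:
        match = FASTQ_PATTERN.match(name)
        if match is None:
            raise ValueError(f"Unrecognized FASTQ name: {name}")
        grouped[match.group("sample")][match.group("read")] = name

    pairs = {}
    for sample, reads in grouped.items():
        if set(reads) != {"1", "2"}:
            raise ValueError(f"Sample {sample!r} is missing a mate file")
        pairs[sample] = (reads["1"], reads["2"])
    return pairs


if __name__ == "__main__":
    inputs = [
        "tumor_R1.fastq.gz", "tumor_R2.fastq.gz",
        "normal_R1.fastq.gz", "normal_R2.fastq.gz",
    ]
    for sample, (r1, r2) in pair_fastq_files(inputs).items():
        print(sample, r1, r2)
```

A missing or misnamed mate raises `ValueError` up front instead of failing partway through alignment.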


Running the Pipeline

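The modules were developed and tested separately, so each stage is currently launched as its own Nextflow script rather than through a single entry point. Run them in the order shown in the workflow diagram.

Example commands: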

nextflow run fastqc.nf
nextflow run fastp.nf
nextflow run bwa_gatk.nf
nextflow run annovar.nf
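
Until the modules are combined, a small driver can launch them in order and stop at the first failure. The sketch below is an illustration only: `build_run_commands` and `run_all` are hypothetical helpers, and adding Nextflow's `-resume` option (which reuses cached task results) is an assumption about how one might want to run the scripts.

```python
import shlex
import subprocess

# Module scripts in workflow order, as listed in the example commands above.
MODULES = ["fastqc.nf", "fastp.nf", "bwa_gatk.nf", "annovar.nf"]


def build_run_commands(modules=MODULES, resume=True):
    """Return one `nextflow run` argument list per module, in order."""
    commands = []
    for script in modules:
        cmd = ["nextflow", "run", script]
        if resume:
            cmd.append("-resume")  # reuse cached results from earlier runs
        commands.append(cmd)
    return commands


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, so later stages never
            # start after a failed one.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(build_run_commands(), dry_run=True)
```

With `dry_run=True` the driver only prints the commands, which is a cheap way to review the sequence before committing compute time.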

Output Files

The pipeline generates:

Quality Control

  • FastQC HTML reports
  • FastQC ZIP reports

Read Trimming

  • Trimmed FASTQ files
  • Fastp HTML report
  • Fastp JSON report
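
The three trimming outputs map directly onto fastp's paired-end options. As a hedged sketch of what the trimming step might invoke, the helper below assembles such a command. The output directory and file naming are assumptions, and the repository's `fastp.nf` may use different options.

```python
def fastp_command(sample, r1, r2, outdir="trimmed"):
    """Build a paired-end fastp command as an argument list.

    The outdir value and the output file names are illustrative assumptions.
    """
    return [
        "fastp",
        "-i", r1,                                       # read 1 input
        "-I", r2,                                       # read 2 input
        "-o", f"{outdir}/{sample}_R1.trimmed.fastq.gz",  # read 1 output
        "-O", f"{outdir}/{sample}_R2.trimmed.fastq.gz",  # read 2 output
        "--html", f"{outdir}/{sample}.fastp.html",       # HTML report
        "--json", f"{outdir}/{sample}.fastp.json",       # JSON report
    ]


if __name__ == "__main__":
    print(" ".join(fastp_command("tumor", "tumor_R1.fastq.gz", "tumor_R2.fastq.gz")))
```

Returning a list rather than a string keeps the command safe to pass to `subprocess.run` without shell quoting concerns.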

Alignment

  • Sorted BAM files
  • BAM index (.bai) files
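
A sorted, indexed BAM is typically produced by piping BWA-MEM into `samtools sort` and then running `samtools index`. The sketch below builds those two shell commands. It assumes the reference has already been indexed with `bwa index`, and the read-group values (including `PL:ILLUMINA`) and the output file name are assumptions rather than settings taken from `bwa_gatk.nf`. A read group is included because GATK tools require one.

```python
import shlex


def alignment_commands(sample, r1, r2, reference="hg38.fa", threads=4):
    """Return [align-and-sort pipeline, index command] as shell strings."""
    bam = f"{sample}.sorted.bam"  # assumed output name
    # bwa expects literal "\t" separators in the -R string; the platform
    # value is an assumption.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    bwa = ["bwa", "mem", "-t", str(threads), "-R", read_group, reference, r1, r2]
    # The trailing "-" tells samtools sort to read from standard input.
    sort = ["samtools", "sort", "-@", str(threads), "-o", bam, "-"]
    pipeline = f"{shlex.join(bwa)} | {shlex.join(sort)}"
    index = shlex.join(["samtools", "index", bam])
    return [pipeline, index]


if __name__ == "__main__":
    for line in alignment_commands("tumor", "tumor_R1.fastq.gz", "tumor_R2.fastq.gz"):
        print(line)
```

Piping avoids writing an intermediate SAM file, which is usually much larger than the final BAM.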

Variant Annotation

  • Annotated VCF files
  • Multianno text reports
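
The annotated VCF and the multianno text report are the two files that ANNOVAR's `table_annovar.pl` writes when given VCF input. The following sketch shows how such a call could be assembled. The database directory `humandb` and the single `refGene` protocol are assumptions, and the repository's `annovar.nf` may use a different set of databases.

```python
def annovar_command(vcf, prefix, database_dir="humandb", build="hg38",
                    protocols=("refGene",), operations=("g",)):
    """Build a table_annovar.pl command for VCF input.

    Each protocol needs a matching operation (g = gene-based,
    r = region-based, f = filter-based).
    """
    if len(protocols) != len(operations):
        raise ValueError("protocols and operations must have the same length")
    return [
        "table_annovar.pl", vcf, database_dir,
        "-buildver", build,
        "-out", prefix,
        "-remove",                       # delete intermediate files
        "-protocol", ",".join(protocols),
        "-operation", ",".join(operations),
        "-nastring", ".",                # placeholder for missing values
        "-vcfinput",
    ]


def expected_outputs(prefix, build="hg38"):
    """Names of the annotated VCF and multianno text report."""
    return [f"{prefix}.{build}_multianno.vcf", f"{prefix}.{build}_multianno.txt"]


if __name__ == "__main__":
    print(" ".join(annovar_command("tumor.vcf", "tumor_annotated")))
    print(expected_outputs("tumor_annotated"))
```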

IGV Visualization

Integrative Genomics Viewer (IGV) was used to inspect aligned reads and validate genomic variants.

Example screenshots are available in:

examples/
├── igv_tumor.png
└── igv_normal.png

Observed features:

  • Read alignment patterns
  • Coverage differences
  • Variant-supporting reads
  • Tumor-normal comparison at selected loci
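
Screenshots like these can also be captured reproducibly with an IGV batch script rather than by hand. The sketch below generates one using IGV's batch commands (`new`, `genome`, `load`, `snapshotDirectory`, `goto`, `snapshot`, `exit`). The BAM names and the locus are placeholders, not the loci inspected in this project.

```python
def igv_batch_script(bams, loci, snapshot_dir="examples", genome="hg38"):
    """Return IGV batch-script text that snapshots each locus.

    `loci` maps a snapshot name to a locus string such as "chr1:1000-2000".
    """
    lines = ["new", f"genome {genome}"]
    lines += [f"load {bam}" for bam in bams]
    lines.append(f"snapshotDirectory {snapshot_dir}")
    for name, locus in loci.items():
        lines.append(f"goto {locus}")
        lines.append(f"snapshot {name}.png")
    lines.append("exit")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    script = igv_batch_script(
        bams=["tumor.sorted.bam", "normal.sorted.bam"],  # placeholder names
        loci={"locus_1": "chr1:1000-2000"},              # placeholder locus
    )
    print(script, end="")
```

Loading both BAMs into one session puts the tumor and normal tracks in the same snapshot, which makes the side-by-side comparison easier to read.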

Tumor vs Normal Comparison

A comparative visualization was performed using IGV.

Key observations:

  • Variants present in tumor reads can be inspected visually.
  • Read coverage between tumor and normal samples can be compared.
  • Candidate somatic mutations may be identified through comparative analysis.

Detailed observations are available in:

examples/comparison.md
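
The visual comparison can be made more concrete by comparing variant allele fractions (VAF), the share of reads at a site that carry the alternate allele. A candidate somatic variant shows a clear alternate-allele signal in the tumor and essentially none in the normal. The sketch below illustrates that reasoning only. The thresholds are arbitrary assumptions and the check is no substitute for a dedicated somatic variant caller.

```python
from dataclasses import dataclass


@dataclass
class AlleleCounts:
    """Reference and alternate read counts at one site in one sample."""
    ref: int
    alt: int

    @property
    def depth(self):
        return self.ref + self.alt

    @property
    def vaf(self):
        # Variant allele fraction; defined as 0.0 when there is no coverage.
        return self.alt / self.depth if self.depth else 0.0


def is_candidate_somatic(tumor, normal, min_depth=10,
                         min_tumor_vaf=0.05, max_normal_vaf=0.01):
    """Flag a site with alt support in the tumor but not in the normal.

    All three thresholds are illustrative assumptions.
    """
    if tumor.depth < min_depth or normal.depth < min_depth:
        return False  # too little coverage to judge either sample
    return tumor.vaf >= min_tumor_vaf and normal.vaf <= max_normal_vaf


if __name__ == "__main__":
    tumor = AlleleCounts(ref=40, alt=10)
    normal = AlleleCounts(ref=50, alt=0)
    print(tumor.vaf, normal.vaf, is_candidate_somatic(tumor, normal))
```

A site where both samples show a VAF near 0.5 is more likely a germline heterozygous variant, which this check rejects because the normal VAF exceeds the threshold.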

Future Improvements

  • Complete end-to-end workflow integration through a single main.nf pipeline
  • Add automated variant calling workflow
  • Generate MultiQC summary reports
  • Add containerized execution using Docker/Singularity
  • Support large-scale cohort analysis
  • Integrate CNV and structural variant analysis

Author

Md Mobarak Ansari

Bioinformatics and Genomics Enthusiast
