This project demonstrates a modular bioinformatics workflow for tumor-normal sequencing data analysis using Nextflow DSL2. The pipeline performs quality control, read trimming, sequence alignment, variant analysis preparation, and variant annotation. The project also includes IGV-based visualization for comparing tumor and normal samples.
- Perform quality assessment of raw sequencing reads
- Trim adapters and low-quality bases
- Align sequencing reads to the human reference genome (hg38)
- Generate sorted and indexed BAM files
- Perform variant analysis preparation
- Annotate variants using ANNOVAR
- Visualize genomic variants using IGV
- Compare tumor and normal samples at selected genomic loci
FASTQ
↓
FastQC
↓
Fastp
↓
BWA-MEM
↓
SAMtools
↓
GATK Variant Analysis
↓
ANNOVAR Annotation
↓
IGV Validation
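
The first four stages of the workflow above can be sketched as plain shell commands for one sample. This is a minimal sketch, not the project's module code: the output file names, the `qc` directory, the thread count, and the read-group values are assumptions made for illustration, and the function only prints the commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Print (do not execute) the QC, trimming, alignment and indexing commands
# for one sample. Output names, "qc", "-t 4" and the read group are assumed.
print_sample_commands() {
    sample=$1            # "tumor" or "normal"
    ref=${2:-hg38.fa}    # reference FASTA, assumed already indexed with `bwa index`
    r1="${sample}_R1.fastq.gz"
    r2="${sample}_R2.fastq.gz"
    t1="${sample}_R1.trimmed.fastq.gz"
    t2="${sample}_R2.trimmed.fastq.gz"

    # 1. Quality assessment of the raw reads (FastQC needs an existing output dir)
    printf '%s\n' "mkdir -p qc"
    printf '%s\n' "fastqc -o qc ${r1} ${r2}"

    # 2. Adapter and quality trimming, with HTML and JSON reports
    printf '%s\n' "fastp -i ${r1} -I ${r2} -o ${t1} -O ${t2} --html ${sample}_fastp.html --json ${sample}_fastp.json"

    # 3. Alignment piped directly into coordinate sorting
    printf '%s\n' "bwa mem -t 4 -R '@RG\\tID:${sample}\\tSM:${sample}\\tPL:ILLUMINA' ${ref} ${t1} ${t2} | samtools sort -o ${sample}.sorted.bam -"

    # 4. BAM index (.bai)
    printf '%s\n' "samtools index ${sample}.sorted.bam"
}

for s in tumor normal; do
    print_sample_commands "$s"
done
```

The read group is included because GATK tools generally expect one in the BAM header; the GATK and ANNOVAR stages are left out here because their exact steps depend on the module scripts.
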
| Tool | Purpose |
|---|---|
| Nextflow DSL2 | Workflow management |
| FastQC | Read quality assessment |
| Fastp | Read trimming and filtering |
| BWA-MEM | Sequence alignment |
| SAMtools | BAM processing and indexing |
| GATK | Variant analysis preparation |
| ANNOVAR | Variant annotation |
| IGV | Variant visualization |
Input files:
tumor_R1.fastq.gz
tumor_R2.fastq.gz
normal_R1.fastq.gz
normal_R2.fastq.gz

Reference genome:
hg38.fa

The pipeline accepts paired-end FASTQ sequencing reads from tumor and normal samples.
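
Before launching any module, it can help to confirm that the four FASTQ files and the reference are present. The helper below is a small sketch; the function name `check_inputs` and the choice to treat empty files as missing are assumptions, not part of the pipeline.

```shell
#!/bin/sh
# Report every expected input that is missing or empty in a data directory.
# Returns 0 when all five inputs are present, 1 otherwise.
check_inputs() {
    dir=${1:-.}
    missing=0
    for f in tumor_R1.fastq.gz tumor_R2.fastq.gz \
             normal_R1.fastq.gz normal_R2.fastq.gz hg38.fa; do
        if [ ! -s "${dir}/${f}" ]; then
            echo "missing or empty: ${dir}/${f}" >&2
            missing=1
        fi
    done
    return "$missing"
}

check_inputs . || echo "Add the missing files before running the modules."
```
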
Individual modules were developed and tested separately.
Example commands:
nextflow run fastqc.nf
nextflow run fastp.nf
nextflow run bwa_gatk.nf
nextflow run annovar.nf

The pipeline generates:
- FastQC HTML reports
- FastQC ZIP reports
- Trimmed FASTQ files
- Fastp HTML report
- Fastp JSON report
- Sorted BAM files
- BAM index (.bai) files
- Annotated VCF files
- Multianno text reports
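
The last two outputs come from ANNOVAR. As a hedged illustration of how they arise, the sketch below prints a `table_annovar.pl` call for one VCF together with the two file names that call should produce. The input VCF name, the `humandb/` database directory, and the refGene-only protocol are assumptions; the actual `annovar.nf` module may use different databases and options.

```shell
#!/bin/sh
# Print a table_annovar.pl command for one VCF and the files it should create.
# Assumes the hg38 refGene database has already been downloaded to humandb/.
print_annovar_command() {
    vcf=$1                  # input VCF (hypothetical name, e.g. tumor.vcf)
    prefix=${vcf%.vcf}      # output prefix: input name without the .vcf suffix
    printf '%s\n' "table_annovar.pl ${vcf} humandb/ -buildver hg38 -out ${prefix} -remove -protocol refGene -operation g -nastring . -vcfinput"
    printf '%s\n' "# expected: ${prefix}.hg38_multianno.vcf ${prefix}.hg38_multianno.txt"
}

print_annovar_command tumor.vcf
```

With `-vcfinput`, ANNOVAR writes both an annotated VCF and a tab-delimited multianno text file under the chosen prefix.
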
Integrative Genomics Viewer (IGV) was used to inspect aligned reads and validate genomic variants.
Example screenshots are available in:
examples/
├── igv_tumor.png
├── igv_normal.png
Observed features:
- Read alignment patterns
- Coverage differences
- Variant-supporting reads
- Tumor-normal comparison at selected loci
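
Inspections like these can be made repeatable with an IGV batch script. The sketch below writes one that loads both BAM files and takes a snapshot at a single locus. The BAM names, the snapshot name, and the locus are placeholders rather than values from this project, and the existing screenshots may have been captured interactively instead.

```shell
#!/bin/sh
# Write an IGV batch script that loads tumor and normal BAMs and snapshots
# one locus. File names and the locus are illustrative placeholders.
write_igv_batch() {
    locus=$1                    # region in chr:start-end form
    out=${2:-igv_batch.txt}     # where to write the batch script
    cat > "$out" <<EOF
new
genome hg38
load tumor.sorted.bam
load normal.sorted.bam
snapshotDirectory examples
goto ${locus}
snapshot igv_tumor_normal.png
exit
EOF
}

write_igv_batch "chr1:1000000-1001000" igv_batch.txt
cat igv_batch.txt
```

The resulting file can be passed to IGV in batch mode (for example `igv.sh -b igv_batch.txt`, assuming the launcher script is on the PATH).
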
A comparative visualization was performed using IGV.
Key observations:
- Variants present in tumor reads can be inspected visually.
- Read coverage between tumor and normal samples can be compared.
- Candidate somatic mutations may be identified through comparative analysis.
Detailed observations are available in:
examples/comparison.md
Planned improvements:
- Integrate all modules into a single end-to-end main.nf pipeline
- Add automated variant calling workflow
- Generate MultiQC summary reports
- Add containerized execution using Docker/Singularity
- Support large-scale cohort analysis
- Integrate CNV and structural variant analysis
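
Until the modules are combined in one Nextflow script, they could be chained by a small wrapper. The sketch below is only a stand-in for that integration: `run_modules` and the `DRY_RUN` variable are hypothetical names, and the module order is taken from the example commands in this README.

```shell
#!/bin/sh
# Run the four module scripts in pipeline order, stopping at the first failure.
# With DRY_RUN=1 (the default in this sketch) the commands are only printed.
run_modules() {
    for module in fastqc.nf fastp.nf bwa_gatk.nf annovar.nf; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "nextflow run ${module}"
        else
            nextflow run "${module}" || return 1
        fi
    done
}

run_modules
```

Setting `DRY_RUN=0` executes the modules for real. A wrapper like this does not pass channels between modules, which is what a single DSL2 workflow would add.
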
- Nextflow Documentation
- FastQC Documentation
- Fastp Documentation
- BWA Documentation
- SAMtools Documentation
- GATK Documentation
- ANNOVAR Documentation
- IGV Documentation
Md Mobarak Ansari
Bioinformatics and Genomics Enthusiast


