Genomic-Analysis-of-Founder-Effects-and-Bottleneck-Events-in-Dingo-Dogs

Production-grade implementation of an end-to-end Whole Genome Sequencing (WGS) variant calling pipeline, coupled with a full-stack analytical dashboard for inspection, validation, and interpretability of results.

The backend pipeline orchestrates all stages of genomic data processing, from raw FASTQ acquisition to final annotated variants, using modular, reproducible Bash workflows coordinated via a master execution script. It includes data download and QC, read trimming, reference genome preparation, alignment, and dual variant calling using both GATK HaplotypeCaller (GVCF workflow) and BCFtools mpileup/call, followed by merging, filtering, annotation with SnpEff, and cross-tool comparison.
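
As a rough illustration of how a master script can coordinate numbered stage scripts, the sketch below runs every `NN_*.sh` file in a directory in lexical order and stops at the first failure. It is not the project's actual master script: the function name `run_stages` and the log format are assumptions made for this example, and the intended order of `02_2_prepare_reference_genome.sh` relative to `02_prepare_reference_genome.sh` is not stated in this README, so plain glob ordering may not match it.

```shell
#!/usr/bin/env bash
# Hypothetical stage runner (illustration only, not the project's master script).
# Runs scripts named like 01_*.sh, 02_*.sh, ... in the order the shell glob returns them.

run_stages() {
    local script_dir="$1"
    local script
    for script in "$script_dir"/[0-9][0-9]_*.sh; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$script" ] || continue
        echo "[stage] $(basename "$script")"
        # Run each stage in its own bash process and stop on the first failure.
        if ! bash "$script"; then
            echo "[error] $(basename "$script") failed; stopping" >&2
            return 1
        fi
    done
    echo "[done] all stages finished"
}
```

Calling `run_stages .` from the repository root would attempt the stages in sequence. Running each stage as a separate process keeps a failure in one script from leaving the others in a half-configured shell environment.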

The pipeline is designed with explicit intermediate outputs, validation checkpoints, and structured result directories, ensuring reproducibility, traceability, and ease of debugging across all stages (QC, alignment, variant calling, filtering, annotation, and post-processing).
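
A validation checkpoint of the kind described above can be as simple as confirming that a stage produced its expected files before the next stage starts. The helper below is a minimal sketch under that assumption; the name `checkpoint` and its messages are hypothetical and do not come from the project's scripts.

```shell
# Hypothetical checkpoint helper (illustration only).
# Usage: checkpoint <stage-name> <file> [<file> ...]
# Returns non-zero if any listed file is missing or empty.

checkpoint() {
    local stage="$1"
    shift
    local path
    local status=0
    for path in "$@"; do
        if [ -s "$path" ]; then
            echo "[$stage] ok: $path"
        else
            echo "[$stage] missing or empty: $path" >&2
            status=1
        fi
    done
    return "$status"
}
```

A stage script could end with a line such as `checkpoint alignment "$out_dir/sample.sorted.bam" || exit 1`, where `out_dir` and the file name are placeholders. The helper checks every file before returning, so one run reports all missing outputs rather than only the first.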

On top of this, the project delivers a production-grade web dashboard that acts as a visualization and validation layer over the pipeline. It exposes pipeline state, execution trace, parameters, and outputs in a structured and interactive format, enabling both technical and biological inspection.

Core capabilities:

- End-to-end automated WGS pipeline with modular Bash scripts and deterministic execution flow
- Dual variant calling strategy (GATK vs BCFtools) with downstream harmonization and comparison
- Variant filtering pipeline (missingness, SNP selection, LD pruning) and functional annotation (SnpEff)
- Post-processing layer for contig normalization and chromosome mapping
- Full observability of pipeline stages, inputs/outputs, and tool configurations
- Interactive dashboard with:
  - Aggregated metrics (reads, variants, filtering impact)
  - Step-by-step execution trace with artifacts and parameters
  - Side-by-side comparison of variant callers (counts, overlap, distributions)
  - Variant filtering funnel and chromosome-level visualizations
  - Annotation summaries and consistency checks
- Embedded LLM assistant enabling contextual querying over pipeline outputs and results
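
To make the contig normalization and chromosome mapping step more concrete, here is a minimal sketch that rewrites the CHROM column of uncompressed VCF text using a two-column mapping file. The function name `rename_contigs`, the mapping-file layout, and the contig names in the usage note are assumptions for illustration, not the project's implementation. The sketch leaves `##contig` header lines untouched; `bcftools annotate --rename-chrs` is the usual tool when headers must be renamed as well.

```shell
# Hypothetical helper (illustration only).
# Usage: rename_contigs <map.tsv> < input.vcf > output.vcf
# map.tsv holds two tab-separated columns: old contig name, new chromosome name.
# Assumes map.tsv is non-empty; header lines (starting with '#') are not modified.

rename_contigs() {
    local map_file="$1"
    awk -F'\t' -v OFS='\t' '
        NR == FNR   { map[$1] = $2; next }   # first input: load the mapping table
        /^#/        { print; next }          # VCF header lines pass through as-is
        ($1 in map) { $1 = map[$1] }         # data line: replace CHROM when mapped
                    { print }
    ' "$map_file" -
}
```

With a mapping line such as `contig_A<TAB>chr1` (placeholder names), records on `contig_A` are emitted with `chr1` in the first column and unmapped contigs are kept unchanged.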

This project bridges raw bioinformatics execution with a modern data engineering and analytics layer, providing a reproducible, inspectable, and explainable environment for genomic variant analysis.

The dashboard is designed as a thin visualization and interpretation layer on top of the bioinformatics workflows, with emphasis on transparency, comparability, and debugging of variant calling pipelines.

Main dashboard view of a Dingo WGS analysis pipeline, showing high-level pipeline metrics, sample and reference metadata, and a visual summary of variant processing stages.

Includes variant counts (GATK vs BCFtools), filtering funnel, pipeline stage progression, and quick access to detailed results and analyses.
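
The numbers behind a filtering funnel are per-stage variant counts. The sketch below shows one way to produce them from VCF files, assuming each stage writes a plain or gzip/bgzip-compressed VCF. The function names `count_variants` and `print_funnel` and the stage labels are hypothetical; this is not how the dashboard necessarily computes its metrics.

```shell
# Hypothetical helpers (illustration only).

# Count variant records (lines not starting with '#') in a VCF file.
# Files ending in .gz are decompressed with gzip, which also reads bgzip output.
count_variants() {
    local vcf="$1"
    case "$vcf" in
        *.gz) gzip -dc "$vcf" ;;
        *)    cat "$vcf" ;;
    esac | awk '!/^#/ { n++ } END { print n + 0 }'
}

# Print a tab-separated funnel table from label/file pairs.
# Usage: print_funnel <label1> <vcf1> [<label2> <vcf2> ...]
print_funnel() {
    local label vcf
    printf 'stage\tvariants\n'
    while [ "$#" -ge 2 ]; do
        label="$1"
        vcf="$2"
        shift 2
        printf '%s\t%s\n' "$label" "$(count_variants "$vcf")"
    done
}
```

For example, `print_funnel raw raw.vcf.gz filtered filtered.vcf.gz` (placeholder file names) prints one row per stage, which a dashboard could plot directly.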

Detailed pipeline configuration and execution view, showing all tools used (with versions and roles) and a step-by-step breakdown of the workflow.

Includes preprocessing, alignment, variant calling, filtering, and annotation stages, with parameters, scripts, and generated outputs for each step—enabling full transparency and reproducibility of the analysis.

Step-by-step execution view of the genomic pipeline, presenting each stage from raw data quality control through trimming, alignment, variant calling, filtering, and annotation.

Displays key metrics, parameters, intermediate outputs, and comparison between GATK and BCFtools results, enabling traceability of how raw reads are transformed into final high-confidence variants.

___

Comparison view between GATK HaplotypeCaller and BCFtools pipelines, highlighting differences in variant detection.

Includes overlap analysis (shared vs unique variants), detailed metrics (raw counts, filtering stages, SNP/indel breakdown), and chromosome-level distributions, supported by visualizations and concise explanations of methodological differences between callers.
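
The shared-versus-unique breakdown can be approximated by comparing site keys from the two call sets. The sketch below builds a `CHROM:POS:REF:ALT` key per record and uses `comm` to count the overlap. It assumes uncompressed VCF input and exact string matching, so multiallelic records or differently normalized indels will not match. The function names are hypothetical, and in practice `bcftools norm` followed by `bcftools isec` is the more robust route.

```shell
# Hypothetical helpers (illustration only).

# Emit one CHROM:POS:REF:ALT key per variant record, sorted and de-duplicated.
variant_keys() {
    awk -F'\t' '!/^#/ { print $1 ":" $2 ":" $4 ":" $5 }' "$1" | LC_ALL=C sort -u
}

# Print shared and caller-specific site counts for two uncompressed VCF files.
# Usage: compare_callers <first.vcf> <second.vcf>
compare_callers() {
    local first="$1"
    local second="$2"
    local keys_a keys_b
    keys_a="$(mktemp)"
    keys_b="$(mktemp)"
    variant_keys "$first" > "$keys_a"
    variant_keys "$second" > "$keys_b"
    # comm needs sorted input; -12 keeps lines common to both, -23 / -13 the unique ones.
    printf 'shared\t%s\n' "$(LC_ALL=C comm -12 "$keys_a" "$keys_b" | wc -l | tr -d ' ')"
    printf 'only_first\t%s\n' "$(LC_ALL=C comm -23 "$keys_a" "$keys_b" | wc -l | tr -d ' ')"
    printf 'only_second\t%s\n' "$(LC_ALL=C comm -13 "$keys_a" "$keys_b" | wc -l | tr -d ' ')"
    rm -f "$keys_a" "$keys_b"
}
```

Running `compare_callers gatk.vcf bcftools.vcf` (placeholder file names) prints three tab-separated rows that map directly onto a Venn-style overlap chart.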

___

End-to-end pipeline view visualizing the full genomic workflow from raw data ingestion to final annotated variants.

Shows each stage (QC, trimming, alignment, variant calling, filtering, annotation, post-processing) in a sequential timeline with associated inputs, outputs, and parameters, providing a complete, traceable execution overview of the pipeline.

___

Integrated LLM assistant interface that enables querying pipeline results using natural language.

Provides contextual answers based on processed genomic data (variants, samples, annotations), with pre-defined prompts and full pipeline awareness to support interpretation, troubleshooting, and exploratory analysis.

Repository files:

- dingo_dashboard
- 01_setup_and_preprocessing.sh
- 02_2_prepare_reference_genome.sh
- 02_prepare_reference_genome.sh
- 03_align_and_process_reads.sh
- 04_variant_calling_with_gatk.sh
- 05_variant_calling_with_bcftools.sh
- 06_merge_bcftools_variants.sh
- 07_filter_variants.sh
- 08_annotate_filtered_variants.sh
- 09_compare_annotations.sh
- 10_contigs.sh
- 11_add_chromosome_annotation.sh
- 12_fix_annotation_chromosomes.sh
- README.md
- dingo.sh
