This shotgun metagenomics pipeline processes raw short paired-end reads into usable microbiome data suitable for postprocessing. The pipeline performs sequence quality control, host genome sequence removal, taxonomic profiling, and functional profiling. It is meant to give beginners a seamless tool for basic microbiome analyses.
All downstream scripts used to create the figures in our paper are located in analysis.
Please cite: Metagenomic profiling and predictive modeling of the gut microbiome reveal signatures of gestational disease
To use MGPipe, you need to have conda installed, MGPipe cloned locally, Kraken2/Bracken databases downloaded, and HUMAnN3 installed.
- Unix-based system (Linux/macOS)
- Minimum 16GB RAM (32GB recommended)
- 100GB+ free disk space
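
The disk requirement above can be checked before downloading any databases. The following is a minimal sketch, not part of MGPipe: the function names `free_disk_gb` and `has_free_disk_gb` are hypothetical, and the 100 GB threshold simply mirrors the list above. RAM is not checked here because the way to query it differs between Linux and macOS.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper; not part of MGPipe.

# Print free space (in whole GB) on the filesystem holding the given directory.
free_disk_gb() {
    local dir="${1:-.}"
    # df -Pk prints POSIX-format output in 1024-byte blocks; field 4 is "Available".
    df -Pk "$dir" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Return 0 if at least min_gb of free space is available, 1 otherwise.
has_free_disk_gb() {
    local dir="$1" min_gb="$2"
    [ "$(free_disk_gb "$dir")" -ge "$min_gb" ]
}

if has_free_disk_gb . 100; then
    echo "Disk space OK"
else
    echo "Warning: less than 100 GB free in $(pwd)" >&2
fi
```

Run it from the directory where MGPipe and its databases will live, since free space is reported per filesystem.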
mkdir -p ./bin
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ./bin/miniconda.sh
bash ./bin/miniconda.sh -b -p ./bin/miniconda3

Important: Update CONDAPATH in mgpipe.sh to match your installation path, especially if you already have conda installed:
CONDAPATH="./bin/miniconda3" # Modify this path if needed

git clone https://github.com/ginnymortensen/MGPipe.git

The standard Kraken2/Bracken reference database is updated periodically.
To download the most recent database, please reference https://benlangmead.github.io/aws-indexes/k2.
cd MGPipe
curl -L -o k2_standard_20240605.tar.gz https://genome-idx.s3.amazonaws.com/kraken/k2_standard_20240605.tar.gz
tar -xzvf k2_standard_20240605.tar.gz

Notice: Update KRAKEN2_DB in taxonomic_profiler.sh to match your installation path if you already have the Kraken2 database installed:
KRAKEN2_DB="k2_standard_20240605" # Modify this path if needed

HUMAnN is updated periodically.
Reference https://github.com/biobakery/humann for installation instructions.
curl -L -o humann-3.9.tar.gz https://files.pythonhosted.org/packages/b2/8f/0d908a2a43f89f03e4d1f22baf80b77a4bce342b721552737173c4da74cd/humann-3.9.tar.gz

Follow the installation instructions for HUMAnN after the download is complete. The databases for HUMAnN are installed via:
cd MGPipe
humann_databases --download chocophlan full humann_databases
humann_databases --download uniref uniref90_diamond humann_databases

Notice: Update DB_DIR in functional_profiler.sh to match your HUMAnN3 database installation path if you already have the databases installed:
DB_DIR="humann_databases" # Modify this path if needed

MGPipe will automatically download ftp://ftp.ccb.jhu.edu/pub/data/bowtie_indexes/grch38_1kgmaj.fa.gz and build bowtie2 indexes from it if it does not find them in the default directory during its initial execution. This step takes a significant amount of time to run.
If you already have bowtie2 indexes installed, update DB_PATH and INDEX_NAME in host_remover.sh to match your bowtie2 index installation path and index name:
DB_PATH="bowtie_indexes" # Modify this path if needed
INDEX_NAME="grch38_1kgmaj" # Modify this index name if needed

- Create a directory called `raw` at the same directory tree level as `MGPipe`:
  mkdir raw
- Ensure sequences are in `fastq.gz` format
- Place paired-end FASTQs in `raw/` with the standard short-read naming convention: `*_R1_001.fastq.gz` and `*_R2_001.fastq.gz`
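
The naming convention above can be verified before the first run. The sketch below is not part of MGPipe; `find_unpaired_reads` is a hypothetical helper that lists every `*_R1_001.fastq.gz` file lacking a matching `*_R2_001.fastq.gz` mate. It only checks file names, not file contents.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of MGPipe.

# List every R1 file in a directory that has no matching R2 file.
# Prints one path per line and returns 1 if any mate is missing.
find_unpaired_reads() {
    local dir="$1" r1 r2 status=0
    for r1 in "$dir"/*_R1_001.fastq.gz; do
        [ -e "$r1" ] || continue   # the glob matched nothing
        # Swap the R1 suffix for the R2 suffix to get the expected mate.
        r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "$r1"
            status=1
        fi
    done
    return "$status"
}

if [ -d raw ]; then
    find_unpaired_reads raw || echo "The R1 files listed above have no R2 mate" >&2
fi
```

Run it from the directory that contains `raw/`. No output means every R1 file has a mate.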
Your directory should have this structure prior to your initial run:
.
├── MGPipe
│   ├── humann_databases/
│   │   ├── chocophlan/
│   │   └── uniref/
│   ├── k2_standard_20240605/
│   ├── functional_profiler.sh
│   ├── host_remover.sh
│   ├── mgpipe_env.yaml
│   ├── mgpipe.sh
│   ├── README.md
│   ├── taxonomic_profiler.sh
│   └── trimmer.sh
└── raw/
    ├── sample1_R1_001.fastq.gz
    ├── sample1_R2_001.fastq.gz
    └── ...

cd MGPipe
. mgpipe.sh

If you'd like to skip taxonomic profiling and/or functional profiling steps:
. mgpipe.sh --skip taxonomic_profiler,functional_profiler

When running natively, your output directory will have this structure:
.
├── MGPipe
│   ├── bowtie_indexes
│   ├── humann_databases
│   │   ├── chocophlan
│   │   └── uniref
│   └── k2_standard_20240605
├── raw
├── reports
│   ├── sample1
│   └── ...
└── results
    ├── functional_profile
    │   ├── combined_tables
    │   ├── renormalized_tables
    │   ├── restratified_tables
    │   └── sample_tables
    ├── no_host
    ├── taxonomic_profile
    │   ├── combined_tables
    │   ├── kraken2_bracken_output
    │   └── sample_tables
    └── trimmed

. mgpipe.sh --help

| Script | Purpose | Key Tools | Tool Documentation |
|---|---|---|---|
| trimmer.sh | Quality control & adapter trimming | FASTP | FASTP Manual |
| host_remover.sh | Host DNA removal | bowtie2 | bowtie2 Manual |
| taxonomic_profiler.sh | Species-level profiling | Kraken2, Bracken | Kraken2 Wiki, Bracken Paper |
| functional_profiler.sh | Metabolic pathway analysis | HUMAnN3 | HUMAnN3 Docs |
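
Since each script in the table depends on an external tool, it can help to confirm the tools are on `PATH` before a long run. The sketch below is not part of MGPipe; `missing_tools` is a hypothetical helper, and the executable names (`fastp`, `bowtie2`, `kraken2`, `bracken`, `humann`) are assumptions based on how these tools are usually installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of MGPipe.

# Print the name of each given command that is not found on PATH.
# Returns 1 if anything is missing.
missing_tools() {
    local tool status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Executable names below are assumptions, not taken from the MGPipe scripts.
missing_tools fastp bowtie2 kraken2 bracken humann \
    || echo "Install the tools listed above or activate the conda environment that provides them" >&2
```

No output means all five commands were found in the current shell environment.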
- FASTP
  Official documentation: https://github.com/OpenGene/fastp
- bowtie2
  User manual: http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml
  Index building command used:

      bowtie2-build --threads $THREADS $REF_FASTA $INDEX_NAME

- Kraken2
  Configuration guide: https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown
  Database requirements:

      KRAKEN2_DB="path/to/standard_db"
      kraken2 --db $KRAKEN2_DB --threads $THREADS --paired $INPUT_FILES

- Bracken
  Abundance estimation methodology: https://ccb.jhu.edu/software/bracken/
- HUMAnN3
  Full documentation: https://github.com/biobakery/humann#humann-30
  Critical database files:

      # ChocoPhlAn database
      humann_databases --download chocophlan full humann_databases
      # UniRef90 database
      humann_databases --download uniref uniref90_diamond humann_databases
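
The database requirements listed above can be sanity-checked with a few file tests. The sketch below is not part of MGPipe and all function names are hypothetical. It assumes the usual on-disk layouts: a Kraken2 database directory containing `hash.k2d`, `opts.k2d`, and `taxo.k2d`, and a bowtie2 index made of six files sharing one prefix. The paths in the usage lines are assumptions taken from the default names in this README.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of MGPipe.

# Return 0 if the directory holds the three core Kraken2 database files.
kraken2_db_ok() {
    local db="$1" f
    for f in hash.k2d opts.k2d taxo.k2d; do
        [ -s "$db/$f" ] || { echo "missing: $db/$f" >&2; return 1; }
    done
}

# Return 0 if all six bowtie2 index files exist for the given prefix
# (either a small ".bt2" index or a large ".bt2l" index).
bowtie2_index_ok() {
    local prefix="$1" ext part found
    for ext in bt2 bt2l; do
        found=1
        for part in 1 2 3 4 rev.1 rev.2; do
            [ -e "$prefix.$part.$ext" ] || { found=0; break; }
        done
        if [ "$found" -eq 1 ]; then
            return 0
        fi
    done
    echo "missing bowtie2 index: $prefix" >&2
    return 1
}

# Return 0 if the directory exists and contains at least one entry.
dir_not_empty() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Assumed default locations, relative to the MGPipe directory.
kraken2_db_ok "k2_standard_20240605" || echo "Kraken2 database not ready" >&2
bowtie2_index_ok "bowtie_indexes/grch38_1kgmaj" || echo "bowtie2 index not ready" >&2
dir_not_empty "humann_databases/chocophlan" || echo "ChocoPhlAn database not ready" >&2
dir_not_empty "humann_databases/uniref" || echo "UniRef database not ready" >&2
```

These checks only confirm that the expected files are present. They do not validate the contents, so a truncated download can still pass.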