NCBI BLAST nt database configuration with dadasnake: Example config.yaml files for use with BLAST #35

Description

@jonwhit

Hi Anna and coauthors, thanks in advance for any advice. I really like the pipeline and could use some help getting it to work with BLAST and NCBI's nt database. I am having trouble finding the correct config settings for using the NCBI nt database and taxdb as reference databases for COI.

What are the appropriate config parameters to use NCBI's nt database and taxonomy (taxdb) as reference for a marker like COI?
Could you provide an example config.yaml file that uses the BLAST nt database as the reference db?

I am able to run the pipeline, but I get errors at the blastn_cluster step. The BLAST database is named 'nt', but because the NCBI nt database is so large, there is no single file named 'nt', only many files named nt.XXX. The error is reported in logs/blastn_cluster.log, and the problem appears to be the makeblastdb step in blastn_cluster. The database is already built and sits in a local directory. I have the NCBI nt and taxdump databases installed locally, following the installation instructions from BASTA as linked in the dadasnake installation instructions.

Here are the errors I'm getting:

BLAST options error: File /home/jwhitney/dadasnake/DBs/blastdbs/nt does not exist.

log: logs/blastn_cluster.log (check log file(s) for error message)

conda-env: /home/jwhitney/programs/dadasnake/conda/66132e6a149ec730ec4c2d24861f8d4c

shell:

    if [ -s clusteredTables/consensus.fasta ]; then
      if [ ! -f "/home/jwhitney/dadasnake/DBs/blastdbs/nt.nin" ]
      then
        makeblastdb -dbtype nucl -in /home/jwhitney/dadasnake/DBs/blastdbs/nt -out /home/jwhitney/dadasnake/DBs/blastdbs/nt &> logs/blastn_cluster.log
      fi
      blastn -db /home/jwhitney/dadasnake/DBs/blastdbs/nt -query clusteredTables/consensus.fasta -outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore staxids stitle" -out clusteredTables/blast_results.tsv -max_target_seqs 10 &>> logs/blastn_cluster.log
    else
      touch clusteredTables/blast_results.tsv
    fi

(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
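The shell block above only tests for a single index file, nt.nin, which a multi-volume download would not contain, so makeblastdb is attempted on an input file that does not exist. As a rough sketch of a more tolerant check (not dadasnake's actual code): the helper name `blast_nucl_db_present` is hypothetical, and it assumes the prepackaged nt database ships as numbered volumes (for example nt.00.nin) together with an alias file nt.nal.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of dadasnake: succeeds if a nucleotide BLAST
# database with the given prefix appears to be present in any of three forms.
blast_nucl_db_present() {
    local prefix="$1"
    # Single-volume database: one index file named <prefix>.nin
    [ -f "${prefix}.nin" ] && return 0
    # Multi-volume database: an alias file <prefix>.nal lists the volumes
    [ -f "${prefix}.nal" ] && return 0
    # Fallback (assumed naming): numbered volume index files such as <prefix>.00.nin
    compgen -G "${prefix}.[0-9]*.nin" > /dev/null
}

# Path taken from the log above; override by setting DB in the environment.
DB="${DB:-/home/jwhitney/dadasnake/DBs/blastdbs/nt}"

if blast_nucl_db_present "$DB"; then
    echo "database found, makeblastdb can be skipped"
else
    echo "database not found, makeblastdb would be needed"
fi
```

If the second branch is printed for the nt directory, listing the directory contents should show which of the three file patterns is actually missing.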


And here are the relevant parts of the config.yaml:

    # SETTINGS FOR TAXONOMIC ANNOTATION
    taxonomy:
      dada:
        do: TRUE
        # classification is only done, if do_taxonomy is true

    taxonomy:
      mothur:
        do: FALSE
        db_path: "/home/jwhitney/.basta/taxonomy"
        tax_db: ""

    blast:
      do: true
      # blast is only done, if do_taxonomy is true
      run_on:
        - ASV
        - cluster
      db_path: "/home/jwhitney/dadasnake/DBs/blastdbs"
      tax_db: "nt"
      e_val: 0.01
      tax2id: ""
      all: true
      max_targets: 10
      run_basta: true
      basta_db: "/home/jwhitney/.basta/taxonomy"
      basta_e_val: 0.00001
      basta_alen: 100
      basta_number: 0
      basta_min: 3
      basta_id: 80
      basta_besthit: true
      basta_perchits: 99


Thanks in advance for any advice.
