Hi Anna and coauthors, thanks in advance for any advice. I really like the pipeline and could use some help getting it to work with BLAST and NCBI's nt database. I am having trouble finding the correct config settings for using the NCBI nt database and taxdb as reference databases for COI.
What are the appropriate config parameters to use NCBI's nt database and taxonomy (taxdb) as reference for a marker like COI?
Could you provide an example config.yaml file that uses the BLAST nt database as the reference db?
I am able to run the pipeline, but I get errors at the blastn_cluster step. Specifically, the name of the BLAST database is 'nt', but because the NCBI nt database is so big, there is no single file named 'nt', only many files named nt.XXX. The error is in logs/blastn_cluster.log. The issue appears to be with the makeblastdb step in blastn_cluster: the database is already built and sits in a local directory, yet the rule tries to build it again. I have the NCBI nt and taxdump databases installed locally, following the installation instructions from BASTA as linked in the dadasnake installation instructions.
Here are the errors I'm getting:
BLAST options error: File /home/jwhitney/dadasnake/DBs/blastdbs/nt does not exist.
log: logs/blastn_cluster.log (check log file(s) for error message)
conda-env: /home/jwhitney/programs/dadasnake/conda/66132e6a149ec730ec4c2d24861f8d4c
shell:
if [ -s clusteredTables/consensus.fasta ]; then
  if [ ! -f "/home/jwhitney/dadasnake/DBs/blastdbs/nt.nin" ]
  then
    makeblastdb -dbtype nucl -in /home/jwhitney/dadasnake/DBs/blastdbs/nt -out /home/jwhitney/dadasnake/DBs/blastdbs/nt &> logs/blastn_cluster.log
  fi
  blastn -db /home/jwhitney/dadasnake/DBs/blastdbs/nt -query clusteredTables/consensus.fasta -outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore staxids stitle" -out clusteredTables/blast_results.tsv -max_target_seqs 10 &>> logs/blastn_cluster.log
else
  touch clusteredTables/blast_results.tsv
fi
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
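One possible reading of the shell block above, offered as a guess rather than a diagnosis: the rule decides whether to run makeblastdb by testing only for nt.nin. A multi-volume database such as nt ships volume files like nt.00.nin plus an alias file nt.nal, so nt.nin never exists and makeblastdb is started on a FASTA file that is not there. A minimal sketch of a check that accepts both layouts follows; the function name blast_db_exists is hypothetical and not part of dadasnake.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of dadasnake: succeed if a nucleotide BLAST
# database with the given path prefix already appears to be formatted.
blast_db_exists() {
    local db_prefix="$1"
    # Single-volume database: index file <prefix>.nin
    # Multi-volume database (such as nt): alias file <prefix>.nal
    [ -f "${db_prefix}.nin" ] || [ -f "${db_prefix}.nal" ]
}
```

With a check like this, the makeblastdb call would only run when neither file is present, for example: if ! blast_db_exists /home/jwhitney/dadasnake/DBs/blastdbs/nt; then makeblastdb ...; fi. Whether the pipeline should adopt this is up to the maintainers.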
And here are the relevant parts of the config.yaml:
# SETTINGS FOR TAXONOMIC ANNOTATION
taxonomy:
dada:
do: TRUE
# classification is only done, if do_taxonomy is true
taxonomy:
mothur:
do: FALSE
db_path: "/home/jwhitney/.basta/taxonomy"
tax_db: ""
blast:
do: true
# blast is only done, if do_taxonomy is true
run_on:
- ASV
- cluster
db_path: "/home/jwhitney/dadasnake/DBs/blastdbs"
tax_db: "nt"
e_val: 0.01
tax2id: ""
all: true
max_targets: 10
run_basta: true
basta_db: "/home/jwhitney/.basta/taxonomy"
basta_e_val: 0.00001
basta_alen: 100
basta_number: 0
basta_min: 3
basta_id: 80
basta_besthit: true
basta_perchits: 99
Thanks in advance for any advice.