A time- and memory-efficient HiFi read de novo assembler, producing primary contigs.
# 1. clone or download this repository
git clone https://github.com/cyr20040123/RapidASM.git
# 2. enter the build folder containing makefile
cd build
# 3. compile
make
# then the executable "asm" is generated in the build folder
Download and unzip the HiFi read dataset and reference genome from the "Releases" section of https://github.com/cyr20040123/downsampled_Ecoli
/path/to/asm -threads 64 -readfile /path/to/SRR11434954.downsampled.fasta
The assembly result, "contigs.fasta", is then written to the current folder.
# install QUAST with conda:
conda install -c bioconda quast
# run evaluation with the assembly result (contigs.fasta) and reference genome (GCF_000026545.1_ASM2654v1_genomic.fna)
quast -o /path/to/results -r /path/to/GCF_000026545.1_ASM2654v1_genomic.fna /path/to/contigs.fasta
Usage: asm [options]
Options:
-overlaps [int] Dump overlaps to file (0: no dump, 1: dump AG format, 2: load dumped overlaps, 3: dump PAF format, 4: dump PAF to /dev/shm) <0>
-k [int] k value for minimizer <32>
-threads [int] Number of threads <-1>
-logfile [string] Log file prefix <>
-contigfile [string] Contig file prefix <>
-readfile [string(s)] Read files (COMPULSORY)
-acc [double] Sequencing accuracy, similarity threshold for overlap detection & contig generation. <0.9850>
-worstacc [double] Worst accuracy threshold in overlap detection. <0.9700>
-mmratio [int] Minimizer ratio <193>
-alignmm [int] Minimum aligned minimizer threshold <2>
-minmm [int] Min count of minimizer, suggested value: depth / 10 <3>
-maxmm [int] Max count of minimizer, suggested value: 2.5 * depth <500>
-chains [int] Keep top x chains in minimizer seed-and-chain alignment <3>
-min_ovlp [double] Min overlap ratio (in bp), in (0,1], for read overlapping during edge generation. <0.0500>
-minedges [int] Min support edges when extending reads in contig generation <2>
-downsample [int] Down-sample x% reads <100>
-print [int] Print debug info for this node id and its reverse complement. <-1>
-pdebug [int] Print extra (non-essential) debug info. (0: disable, 1: enable) <0>
-ag [string] Prefix of the output filename for the cleaned assembly graph. Leave blank for no output. <>
-gt [string] Ground truth positions of nodes in reference, for debugging purpose only. <>
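The -minmm and -maxmm suggestions above depend on sequencing depth. The helpers below are a minimal POSIX shell sketch for turning a read file into those two values. It rests on a few assumptions that are not part of RapidASM. Depth is estimated as total read bases divided by a genome size that you supply. The function names fasta_total_bases and suggest_mm_options are hypothetical. The rule that keeps -minmm at 1 or above is our own safeguard.

```shell
#!/bin/sh
# Hypothetical helpers (not part of RapidASM): estimate sequencing depth from a
# FASTA read file and derive the suggested -minmm / -maxmm values
# (-minmm ~ depth / 10, -maxmm ~ 2.5 * depth).

# Print the total number of bases in a FASTA file (lines starting with '>' are headers).
fasta_total_bases() {
    # printf "%.0f" avoids scientific notation for very large totals in some awk variants
    awk '!/^>/ { total += length($0) } END { printf "%.0f\n", total }' "$1"
}

# Print "-minmm X -maxmm Y" for an integer depth, using integer arithmetic.
suggest_mm_options() {
    mm_depth=$1
    mm_min=$(( mm_depth / 10 ))
    mm_max=$(( mm_depth * 5 / 2 ))   # 2.5 * depth, rounded down
    # Safeguard of our own (not from the README): keep -minmm at least 1
    if [ "$mm_min" -lt 1 ]; then
        mm_min=1
    fi
    echo "-minmm $mm_min -maxmm $mm_max"
}
```

Example usage. GENOME_SIZE is a placeholder for your own genome size estimate in bp:
# estimate depth, then pass the derived options to asm
depth=$(( $(fasta_total_bases /path/to/reads.fasta) / GENOME_SIZE ))
/path/to/asm -threads 64 -readfile /path/to/reads.fasta $(suggest_mm_options "$depth")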