cyr20040123/RapidASM

RapidASM: Efficient De Novo Assembly for HiFi Reads

Description

A time- and memory-efficient HiFi read de novo assembler, producing primary contigs.

Build

# 1. clone or download this repository
git clone https://github.com/cyr20040123/RapidASM.git
# 2. enter the build folder (inside the cloned repository) containing the makefile
cd RapidASM/build
# 3. compile
make
# the executable "asm" is then generated in the build folder

Run on a test dataset

1. Download dataset

Download and unzip the HiFi read dataset and the reference genome from the "Releases" page of https://github.com/cyr20040123/downsampled_Ecoli

2. Run RapidASM

/path/to/asm -threads 64 -readfile /path/to/SRR11434954.downsampled.fasta

The assembly result, "contigs.fasta", is then written to the current folder.
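Before running a full evaluation, a quick sanity check of the output can be useful. The sketch below counts the sequences and total bases in a FASTA file with awk. The helper name fasta_stats is hypothetical and not part of RapidASM; it only assumes that contigs.fasta is a standard FASTA file with ">" header lines.

```shell
# Hypothetical helper (not part of RapidASM): summarize a FASTA file.
# Prints "<number of sequences> contigs, <total bases> bp".
fasta_stats() {
  awk '
    /^>/ { n++; next }                              # header line: one more sequence
    { gsub(/[ \t\r]/, ""); total += length($0) }    # sequence line: add its length
    END { printf "%d contigs, %d bp\n", n, total }
  ' "$1"
}

# Example usage (run in the folder where the assembly was written):
#   fasta_stats contigs.fasta
```

For the downsampled E. coli dataset, the total length would be expected to be on the order of the reference genome size; a much larger or smaller value suggests something went wrong.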

3. (optional) Evaluate assembly result with QUAST

# install QUAST with conda:
conda install -c bioconda quast
# run evaluation with the assembly result (contigs.fasta) and reference genome (GCF_000026545.1_ASM2654v1_genomic.fna)
quast -o /path/to/results -r /path/to/GCF_000026545.1_ASM2654v1_genomic.fna /path/to/contigs.fasta
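To pull a single metric out of the evaluation, something like the following sketch may help. It assumes that QUAST writes a tab-separated report.tsv into the output directory, with the metric name in the first column and the value for the assembly in the second. The helper name quast_metric is hypothetical and not part of QUAST or RapidASM.

```shell
# Hypothetical helper: look up one metric in a QUAST report.tsv.
# $1: QUAST output directory (the value passed to -o)
# $2: exact metric name as it appears in the first column, e.g. N50
quast_metric() {
  awk -F '\t' -v name="$2" '$1 == name { print $2; exit }' "$1/report.tsv"
}

# Example usage:
#   quast_metric /path/to/results N50
```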

Arguments

Usage: asm [options]
Options:
  -overlaps    [int] Dump overlaps to file (0: no dump, 1: dump AG format, 2: load dumped overlaps, 3: dump PAF format, 4: dump PAF to /dev/shm) <0>
  -k           [int] k value for minimizer <32>
  -threads     [int] Number of threads <-1>
  -logfile     [string] Log file prefix <>
  -contigfile  [string] Contig file prefix <>
  -readfile    [string(s)] Read files (COMPULSORY)
  -acc         [double] Sequencing accuracy, similarity threshold for overlap detection & contig generation. <0.9850>
  -worstacc    [double] Worst accuracy threshold in overlap detection. <0.9700>
  -mmratio     [int] Minimizer ratio <193>
  -alignmm     [int] Minimum aligned minimizer threshold <2>
  -minmm       [int] Min count of minimizer, suggested value: depth / 10 <3>
  -maxmm       [int] Max count of minimizer, suggested value: 2.5 * depth <500>
  -chains      [int] Keep top x chains in minimizer seed-and-chain alignment <3>
  -min_ovlp    [double] Min overlap bps ratio (0,1] in read overlapping of edge generation. <0.0500>
  -minedges    [int] Min support edges when extending reads in contig generation <2>
  -downsample  [int] Down-sample x% reads <100>
  -print       [int] Print debug info for this node id and its reverse complement. <-1>
  -pdebug      [int] Print unnecessary debug info. (0: disable, 1: enable) <0>
  -ag          [string] Prefix of filename to output cleared assembly graph. Blank for no output. <>
  -gt          [string] Ground truth positions of nodes in reference, for debugging purpose only. <>
