ruanjue / smartdenovo
Ultra-fast de novo assembler using long noisy reads
License: GNU General Public License v3.0
Hi,
Thanks for developing smartdenovo and wtdbg2. I am actually working on a very heterozygous insect genome. The smartdenovo log file returns a genome size of around 400 Mb (which is close to the estimated haploid genome size) whereas the consensus (cns) sequence is around 180Mb. Is it possible to parse the layout files in order to retrieve all FASTA sequences belonging to the 400Mb genome assembly?
P.S.: I am also trying to optimise wtdbg2 parameters on my read data (right now I get rather messy results).
Thank you very much in advance,
Ben
Hi
Is there any way to get a .gfa file of the assembly?
Hi, I have assembled a 1 Gb diploid genome in which 70~80% of the sequence is repetitive. The coverage of my PacBio data is approximately 40x. The final assembly N50 is 82 kb. I would like to know how to adjust the parameters to improve the assembly. Could you please give me some advice? Thank you very much!
Hello,
I encountered the following error when compiling smartdenovo. Could you help to diagnose what might have caused this problem? Thanks!
....
wtzmo.c:91:1: note: in expansion of macro ‘define_list’
define_list(pbreadv, pbread_t);
^~~~~~~~~~~
wtzmo.c: In function ‘push_long_read_wtzmo’:
wtzmo.c:210:2: warning: incompatible implicit declaration of built-in function ‘memcpy’
memcpy(ptr, name, name_len);
^~~~~~
wtzmo.c:210:2: note: include ‘<string.h>’ or provide a declaration of ‘memcpy’
wtzmo.c: In function ‘thread_midx_func’:
wtzmo.c:247:1: warning: incompatible implicit declaration of built-in function ‘memset’
memset(&U, 0, sizeof(hzmh_t));
^~~~~~
wtzmo.c:247:1: note: include ‘<string.h>’ or provide a declaration of ‘memset’
wtzmo.c: In function ‘thread_mzmo_func’:
wtzmo.c:789:1: warning: incompatible implicit declaration of built-in function ‘memset’
memset(&SEED[0], 0, sizeof(wt_seed_t));
^~~~~~
wtzmo.c:789:1: note: include ‘<string.h>’ or provide a declaration of ‘memset’
wtzmo.c:1058:15: warning: implicit declaration of function ‘strdup’ [-Wimplicit-function-declaration]
HIT.cigar = strdup(cigar_str->string);
^~~~~~
wtzmo.c:1058:15: warning: incompatible implicit declaration of built-in function ‘strdup’
In file included from wtzmo.c:25:0:
wtzmo.c: In function ‘main’:
file_reader.h:97:30: warning: implicit declaration of function ‘ref_VStrv’; did you mean ‘ref_diagv’? [-Wimplicit-function-declaration]
#define get_col_str(fr, col) ref_VStrv((fr)->tabs, col)->string
^
wtzmo.c:1738:28: note: in expansion of macro ‘get_col_str’
set_read_clip_wtzmo(wt, get_col_str(fr, 0), atoi(get_col_str(fr, 1)), atoi(get_col_str(fr, 2)));
^~~~~~~~~~~
file_reader.h:97:56: error: invalid type argument of ‘->’ (have ‘int’)
#define get_col_str(fr, col) ref_VStrv((fr)->tabs, col)->string
^
wtzmo.c:1738:28: note: in expansion of macro ‘get_col_str’
set_read_clip_wtzmo(wt, get_col_str(fr, 0), atoi(get_col_str(fr, 1)), atoi(get_col_str(fr, 2)));
^~~~~~~~~~~
file_reader.h:97:56: error: invalid type argument of ‘->’ (have ‘int’)
#define get_col_str(fr, col) ref_VStrv((fr)->tabs, col)->string
^
wtzmo.c:1738:53: note: in expansion of macro ‘get_col_str’
set_read_clip_wtzmo(wt, get_col_str(fr, 0), atoi(get_col_str(fr, 1)), atoi(get_col_str(fr, 2)));
^~~~~~~~~~~
file_reader.h:97:56: error: invalid type argument of ‘->’ (have ‘int’)
#define get_col_str(fr, col) ref_VStrv((fr)->tabs, col)->string
^
wtzmo.c:1738:79: note: in expansion of macro ‘get_col_str’
set_read_clip_wtzmo(wt, get_col_str(fr, 0), atoi(get_col_str(fr, 1)), atoi(get_col_str(fr, 2)));
^~~~~~~~~~~
file_reader.h:97:56: error: invalid type argument of ‘->’ (have ‘int’)
#define get_col_str(fr, col) ref_VStrv((fr)->tabs, col)->string
^
wtzmo.c:1766:43: note: in expansion of macro ‘get_col_str’
if((pb1 = kv_get_cuhash(wt->rdname2id, get_col_str(fr, 0))) == 0xFFFFFFFFU) continue;
^~~~~~~~~~~
file_reader.h:97:56: error: invalid type argument of ‘->’ (have ‘int’)
#define get_col_str(fr, col) ref_VStrv((fr)->tabs, col)->string
^
wtzmo.c:1767:43: note: in expansion of macro ‘get_col_str’
if((pb2 = kv_get_cuhash(wt->rdname2id, get_col_str(fr, 1))) == 0xFFFFFFFFU) continue;
^~~~~~~~~~~
Makefile:33: recipe for target 'wtzmo' failed
make: *** [wtzmo] Error 1
Best,
Jia-Xing
Hi,
I downloaded smartdenovo and ran this command:
-bash-4.1$ ./smartdenovo-master/smartdenovo.pl -c 1 ~/scsio/Nanopore/Data/N_all_filt.fq > wtasm.mak
but I got this error:
Died at ./smartdenovo-master/smartdenovo.pl line 23.
I don't know how to deal with it. Could you tell me how to fix it?
Thank you.
Does it support multiple input files, such as input*.fa.gz?
How do I provide multiple input files?
For example, I have:
input1.fa
input2.fa
input3.fa
What would the command look like?
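One workaround that does not depend on how smartdenovo.pl handles its arguments is to merge the inputs into a single FASTA file first. The Python sketch below does that for plain and gzip-compressed files. Whether smartdenovo.pl accepts several read files directly is not established in this thread, and the output name all_reads.fa is a hypothetical choice.

```python
import glob
import gzip


def open_maybe_gzip(path):
    """Open a plain or gzip-compressed file for binary reading."""
    return gzip.open(path, "rb") if path.endswith(".gz") else open(path, "rb")


def merge_fasta(paths, out_path):
    """Concatenate FASTA files into one uncompressed file.

    Every line is rewritten with a single trailing newline, so a file that
    lacks a final newline cannot fuse with the next header.
    Returns the number of records (header lines) written.
    """
    n_records = 0
    with open(out_path, "wb") as out:
        for path in paths:
            with open_maybe_gzip(path) as handle:
                for line in handle:
                    line = line.rstrip(b"\r\n")
                    if not line:
                        continue  # skip blank lines
                    if line.startswith(b">"):
                        n_records += 1
                    out.write(line + b"\n")
    return n_records


if __name__ == "__main__":
    # Hypothetical file names matching the example in the question.
    inputs = sorted(glob.glob("input*.fa")) + sorted(glob.glob("input*.fa.gz"))
    if inputs:
        print(merge_fasta(inputs, "all_reads.fa"), "records written to all_reads.fa")
```

The merged file can then be passed as the single reads argument, in the same way as reads.fa in the commands quoted elsewhere on this page.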
Dear Ruanjue,
If I understand correctly, smartdenovo works very differently from DBG assemblers and is a kind of OLC assembler. So I am not sure whether the k-mer size should be chosen based on Jellyfish or KmerGenie. How should I select the k-mer size for smartdenovo?
Sincerely,
panpan
Making a release with an actual version number makes it easier to know exactly what has been installed on a system. Especially HPC sysadmins like versioned software.
Hi, I am trying to run smartdenovo but I keep on getting this output
`ubuntu@biolinux:/mnt/Federico/TD_1/Smartdenovo_assembly$ '/home/ubuntu/miniconda3/pkgs/smartdenovo-1.0.0-pl5.22.0_1/bin/smartdenovo.pl' /mnt/Federico/TD_1/basecalled.fasta
PREFIX=wtasm
EXE_PRE=/home/ubuntu/miniconda3/pkgs/smartdenovo-1.0.0-pl5.22.0_1/bin/wtpre
EXE_ZMO=/home/ubuntu/miniconda3/pkgs/smartdenovo-1.0.0-pl5.22.0_1/bin/wtzmo
EXE_OBT=/home/ubuntu/miniconda3/pkgs/smartdenovo-1.0.0-pl5.22.0_1/bin/wtobt
EXE_GBO=/home/ubuntu/miniconda3/pkgs/smartdenovo-1.0.0-pl5.22.0_1/bin/wtgbo
EXE_CLP=/home/ubuntu/miniconda3/pkgs/smartdenovo-1.0.0-pl5.22.0_1/bin/wtclp
EXE_LAY=/home/ubuntu/miniconda3/pkgs/smartdenovo-1.0.0-pl5.22.0_1/bin/wtlay
EXE_CNS=/home/ubuntu/miniconda3/pkgs/smartdenovo-1.0.0-pl5.22.0_1/bin/wtcns
N_THREADS=8
all:$(PREFIX).dmo.lay
$(PREFIX).fa.gz:
`
If I redirect the output with > genome.mak, nothing happens. Any ideas? Thanks
I want to test wtcorr. However, it failed to compile:
$ make wtcorr
gcc -W -Wall -O4 -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -mpopcnt -mssse3 -o wtcorr wtcorr.c file_reader.c -lm -lpthread
wtcorr.c:388:1: error: invalid suffix "hashcode" on integer constant
wtcorr.c:388:1: error: invalid suffix "hashcode" on integer constant
wtcorr.c:388:1: error: invalid suffix "hashcode" on integer constant
wtcorr.c:388:1: error: invalid suffix "hashcode" on integer constant
wtcorr.c:388:1: error: invalid suffix "hashcode" on integer constant
wtcorr.c:388:1: error: invalid suffix "hashcode" on integer constant
wtcorr.c:422:1: warning: "E" redefined
wtcorr.c:385:1: warning: this is the location of the previous definition
wtcorr.c: In function 'get_dphash':
wtcorr.c:425: error: 'gdpv' undeclared (first use in this function)
wtcorr.c:425: error: (Each undeclared identifier is reported only once
wtcorr.c:425: error: for each function it appears in.)
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c: In function 'prepare_dphash':
wtcorr.c:425: error: 'gdpv' undeclared (first use in this function)
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c: In function 'exists_dphash':
wtcorr.c:425: error: 'gdpv' undeclared (first use in this function)
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c: In function 'add_dphash':
wtcorr.c:425: error: 'gdpv' undeclared (first use in this function)
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c: In function 'remove_dphash':
wtcorr.c:425: error: 'gdpv' undeclared (first use in this function)
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c: In function 'encap_dphash':
wtcorr.c:425: error: 'gdpv' undeclared (first use in this function)
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c: In function 'last_base_dp_kmer':
wtcorr.c:554: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:554: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c: In function 'trace_aln_paths':
wtcorr.c:565: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:565: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c: In function 'count_covered_qmers':
wtcorr.c:584: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:584: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c: In function 'call_correct_seq':
wtcorr.c:624: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:624: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:636: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:636: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:644: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:651: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:651: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:659: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:659: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:681: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c: In function 'init_bf_kmer_dbgaln':
wtcorr.c:689: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:689: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c: In function 'print_dp_kmers':
wtcorr.c:724: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:724: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c: In function 'dbg_aln_core':
wtcorr.c:757: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:757: error: 'dbg_aligner' has no member named 'qmaxs'
wtcorr.c:757: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:772: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:776: error: 'dbg_aligner' has no member named 'smin'
wtcorr.c:782: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:782: error: 'dbg_aligner' has no member named 'qtop'
wtcorr.c:784: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:784: error: 'dbg_aligner' has no member named 'qtop'
wtcorr.c:785: error: 'dbg_aligner' has no member named 'qtop'
wtcorr.c:785: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:788: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:788: error: 'dbg_aligner' has no member named 'qtop'
wtcorr.c:793: error: 'dbg_aligner' has no member named 'qmaxs'
wtcorr.c:793: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:795: error: 'dbg_aligner' has no member named 'qmaxs'
wtcorr.c:795: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:796: error: 'dbg_aligner' has no member named 'qmaxs'
wtcorr.c:796: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:804: warning: implicit declaration of function 'process_cached_dps_dbgaln'
wtcorr.c:818: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:822: error: 'dbg_dp_t' has no member named 'fw_idx'
wtcorr.c:823: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:823: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:826: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:829: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:834: error: 'dc' undeclared (first use in this function)
wtcorr.c:834: warning: implicit declaration of function 'prepare_dbgcachehash'
wtcorr.c:834: error: 'dbg_aligner' has no member named 'gcache'
wtcorr.c:834: error: 'dbg_cache_t' undeclared (first use in this function)
wtcorr.c:834: error: expected ')' before '{' token
wtcorr.c:835: error: 'dc_exists' undeclared (first use in this function)
wtcorr.c:837: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:838: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:854: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:868: error: 'dbg_dp_t' has no member named 'aux1'
wtcorr.c:871: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:874: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:874: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:875: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:881: error: 'dbg_dp_t' has no member named 'link'
wtcorr.c:895: error: 'dbg_dp_t' has no member named 'aux1'
wtcorr.c:897: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:899: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:899: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:900: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:906: error: 'dbg_dp_t' has no member named 'link'
wtcorr.c:918: error: 'dbg_dp_t' has no member named 'aux1'
wtcorr.c:921: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:921: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:922: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:922: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:928: error: 'dbg_dp_t' has no member named 'link'
wtcorr.c:752: warning: unused variable 'qpos'
wtcorr.c:752: warning: unused variable 'found'
wtcorr.c:752: warning: unused variable 'kc_exists'
wtcorr.c:750: warning: unused variable 'path'
wtcorr.c:750: warning: unused variable 'pidx'
wtcorr.c:750: warning: unused variable 'didx'
wtcorr.c:749: warning: unused variable 'BR'
wtcorr.c:749: warning: unused variable 'BK'
wtcorr.c:749: warning: unused variable 'bk'
wtcorr.c:747: warning: unused variable 'kc'
wtcorr.c:746: warning: unused variable 'dp1'
wtcorr.c: In function 'dbg_aln':
wtcorr.c:940: warning: implicit declaration of function 'clear_dbgcachehash'
wtcorr.c:940: error: 'dbg_aligner' has no member named 'gcache'
wtcorr.c:947: error: 'dbg_aligner' has no member named 'qmaxs'
wtcorr.c:948: error: 'dbg_aligner' has no member named 'qmaxs'
wtcorr.c:972: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:973: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:998: error: 'dbg_dp_t' has no member named 'aux1'
wtcorr.c:999: error: 'dbg_dp_t' has no member named 'aux2'
wtcorr.c:1001: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:1001: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:1005: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:1005: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:1005: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:1007: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:1007: error: 'dbg_aligner' has no member named 'qmaxs'
wtcorr.c:1010: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:1011: error: 'dbg_aligner' has no member named 'qtop'
wtcorr.c:1011: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:1012: error: 'dbg_aligner' has no member named 'last_cached_pos'
wtcorr.c:1044: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:1045: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:1045: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:1045: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:1045: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:1063: error: 'dbg_dp_t' has no member named 'mat'
wtcorr.c:1080: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:1080: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c: In function 'main':
wtcorr.c:1399: error: 'DBG_MAX_BT_IDX' undeclared (first use in this function)
wtcorr.c:1423: error: 'dbg_aligner' has no member named 'smin'
wtcorr.c:1450: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:1456: error: 'dbg_dp_t' has no member named 'qpos'
make: *** [wtcorr] Error 1
How can I fix this?
Hi,
I am trying to install smartdenovo for my analysis, but the make command fails.
This is the error I am getting. Please help me fix it:
collect2: error: ld returned 1 exit status
Makefile:48: recipe for target 'wtlay' failed
make: *** [wtlay] Error 1
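Hello, ruanjue!
About the -k parameter in wtzmo: why is it restricted to the range 5-32? What would happen if it were set above 32, or if the limit were removed?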
Dear community,
The -c parameter is used to generate the consensus sequence, but it is too slow.
Can I use prefix.dmo.lay.utg as the final contig.fasta?
Thank you.
Hi,
Thank you for this useful tool.
I was wondering how you would like it to be cited.
best,
John
Hi Ruan,
Thanks for providing this fast assembly program.
I am using smartdenovo to assemble an insect genome (350 Mb estimated genome size but expected highly heterozygous), with 170 X nanopore raw reads. The first round of smartdenovo resulted in a 672 Mb genome assembly with N50 240 Kb. I am wondering which parameters should I tune to improve the assembly?
In addition, I got 34.4 Mb of sequence in the prefix.dmo.cns file, which is far less than the estimated genome size. Did I do anything wrong?
Looking forward to your reply!
Dear Ruanjue, I used the default parameters to assemble a plant genome. The mitochondrial genome (~400 kb) is only partly assembled and the chloroplast genome (150 kb) is missing entirely. Do you have any suggestions, such as adjusting some parameters?
Best,
panpan
Dear SMARTdenovo,
Hope this email finds you well.
While testing the program on PacBio data (genome size 2.5 Gb) in a PBS Pro environment, I keep hitting the same error: "[SMDS849.dmo.ovl] Bus error (core dumped)".
The bus error occurs on SMDS849.dmo.ovl. FYI, please see below for the output file.
Looking forward to your reply!
Regards,
Taek
Hi,
I used SMARTdenovo on a yeast strain to assemble it with Nanopore reads. I obtained very good contiguity, but in the output *.dmo.cns I can only find a part of the mitochondrion.
If I align the *.utg.dup against the reference genome, I can see that the mitochondrion is present in its entirety. I think that maybe SMARTdenovo is rejecting parts of the mitochondrion due to a very high coverage, since it's present in unitigs but not in contigs.
I tried to decrease the coverage of reads given as input to SMARTdenovo but didn't obtain better results.
Do you have any tips concerning the parameters I could use to get the mitochondrion assembled in the consensus file ?
Hello,
I wonder why smartdenovo is taking so long.
I have reserved 10 CPUs and 100G per CPU. I have 52G of data (ONT reads >= 10 kb) and launched the assembly with default parameters, which runs:
build_conda_envs/3f08aa12/bin/wtzmo -t 8 -i SMART.fa.gz -fo SMART.dmo.ovl -k 16 -z 10 -Z 16 -U -1 -m 0.1 -A 1000
It has been running for 10 days.
At the moment, I have 55G of SMART.dmo.ovl and 17G of SMART.fa.gz.
Do you think it will ever finish? How can I estimate how much time a smartdenovo assembly needs?
Any ideas?
Thanks.
Julie
I want to use it on my aarch64 (arm64) platform.
That means I must use SSE2NEON.h instead of emmintrin.h in ksw.c.
Also, the ARM version of gcc does not accept "-mpopcnt" and "-mssse3".
So we need this patch for the aarch64 platform:
diff --git a/Makefile b/Makefile
index 0802f65..3816b6e 100644
--- a/Makefile
+++ b/Makefile
@@ -2,9 +2,9 @@ VERSION=1.0.0
MINOR_VER=20140314
CC=gcc
ifdef DEBUG
-CFLAGS=-g3 -W -Wall -O0 -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -mpopcnt -mssse3
+CFLAGS=-g3 -W -Wall -O0 -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE
else
-CFLAGS=-W -Wall -O4 -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -mpopcnt -mssse3
+CFLAGS=-W -Wall -O4 -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE
endif
INSTALLDIR=/usr/local/bin
GLIBS=-lm -lpthread
diff --git a/ksw.c b/ksw.c
index 15dd0f2..22641ed 100644
--- a/ksw.c
+++ b/ksw.c
@@ -25,7 +25,7 @@
#include <stdlib.h>
#include <stdint.h>
-#include <emmintrin.h>
+#include "SSE2NEON.h"
#include "ksw.h"
#ifdef USE_MALLOC_WRAPPERS
Could anyone help us merge this change into smartdenovo?
Hello,
I have a couple of questions.
1) I was thinking of using Racon to polish the smartdenovo assembly. Racon's usage is:
racon [options ...] <sequences> <overlaps> <target sequences>
For the first argument (sequences) I can give Illumina reads as input. I am not sure what the best input for the second argument (overlaps) would be, since smartdenovo does not produce MHAP files the way other assemblers such as Canu do. For the third argument (target sequences) I will give the .cns file generated by the smartdenovo assembly.
Can you please tell me what should go in the overlaps input file?
2) Is there a better polishing tool than Racon for a smartdenovo assembly? Note: I have used long reads from both PacBio and Nanopore for my assembly.
Thanks heaps in advance
S
Hello All,
I am trying to run a smartdenovo assembly on data generated with PacBio, Nanopore MinION and PromethION. I got as far as the .cns files, and I am not sure what is meant to happen from there. I was hoping a summary file with the number of contigs, the N50 and so on would be generated, but I could not find one. Can you suggest where to look for it, please?
I use these commands:
/group/pasture/Saila/Smartdenovo/smartdenovo-master/smartdenovo.pl -p saila_smartdenovo -c 1 /group/pasture/Saila/Smartdenovo/all_fastq_files/all_fastq_files_smartdenovo.fastq > saila_smartdenovo.mak
make -f saila_smartdenovo.mak
The type of files I got so far are
.cns.log
.cns
.lay.utg.dup
.utg
.lnk
.dup
.lay
.lay.1/2/3/4/5.dot
lay.contained_reads
.dmo.obt
.ovl.contained
.ovl
And no summary or fasta files. So I am a bit confused about it.
Thanks
Saila
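Since another post on this page reports that wtcns writes FASTA, the contig count and N50 asked about here can be computed directly from the .cns file. The Python sketch below is one way to do it. It assumes the file is plain or gzipped FASTA, and the path prefix.dmo.cns in the usage line is a hypothetical placeholder.

```python
import gzip


def read_fasta_lengths(path):
    """Return the length of every sequence in a plain or gzipped FASTA file."""
    opener = gzip.open if path.endswith(".gz") else open
    lengths = []
    current = None  # length of the record being read, None before the first header
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Largest length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def summarize(path):
    """Collect basic assembly statistics for one FASTA file."""
    lengths = read_fasta_lengths(path)
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }
```

Calling summarize("prefix.dmo.cns") returns a dictionary with the contig count, total size, longest contig and N50.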
Dear SMARTdenovo,
Hope this email finds you well. I assume “smartdenovo.pl” includes all these programs: wtzmo, wtgbo, wtclp, wtcns, and wtmsa.
After running this script
/smartdenovo/smartdenovo.pl -p prefix reads.fa > prefix.mak
make -f prefix.mak
It was supposed to generate “prefix.cns” but I could not find it. The generated files are .lay, .lay.dup, .lay.utg, .lay.utg.dup, and dmo.ovl.
Did I miss something?
Looking forward to your reply!
Best regards,
Taek
Hi,
I am assembling a 1 Gb genome using 17x of canu-corrected reads. smartdenovo crashes at the last step, wtlay.
gdb /usr/local/bioinfo/src/smartdenovo/smartdenovo/wtlay core.180804
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/...
Reading symbols from /usr/local/bioinfo/src/smartdenovo/smartdenovo/wtlay...(no debugging symbols found)...done.
[New Thread 180804]
Missing separate debuginfo for
Try: yum --disablerepo='' --enablerepo='-debug*' install /usr/lib/debug/.build-id/f7/95efbe6950d1523c5748594c166cedd4254c33
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Core was generated by `/usr/local/bioinfo/src/smartdenovo/smartdenovo/wtlay -i All7.fa.gz -b All7.dmo.'.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004166b0 in merge_bubble_core_best_overlap_strgraph ()
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.149.el6_6.5.x86_64
What could cause the crash?
What could I test to go further?
Ch+
I became interested in your tool after watching Bjorn Usadel's talk at London Calling 2017.
I'm assembling a repeat-rich animal genome. I haven't obtained a good assembly yet and want to test your tool. After running it following the steps on your GitHub homepage, I got the files below. I don't know what some of them mean, and I don't see a .cns file.
The following files were generated:
wtasm.dmo.lay
wtasm.dmo.lay.dup
wtasm.dmo.lay.lnk
wtasm.dmo.lay.utg
wtasm.dmo.lay.utg.dup
wtasm.dmo.lay.4.dot
wtasm.dmo.lay.5.dot
wtasm.dmo.lay.3.dot
wtasm.dmo.lay.2.dot
wtasm.dmo.lay.contained_reads
wtasm.dmo.lay.1.dot
wtasm.dmo.obt
wtasm.dmo.ovl
wtasm.dmo.ovl.contained
wtasm.fa.gz
Thanks!
Dear ruanjue,
May I ask whether I can get a FASTQ file from wtcns? I need FASTQ for downstream analysis such as read alignment. So far I can only obtain FASTA from wtcns.
Best,
panpan
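If a downstream tool only needs the FASTQ container and not real base qualities, the FASTA output can be wrapped with a constant placeholder quality. The Python sketch below does this. It does not recover any quality information from wtcns, and the default quality character "I" (Phred 40 in the Phred+33 encoding) is an arbitrary assumption, so tools that weight by quality will be misled by it.

```python
def write_fastq_record(out, name, seq, quality_char):
    """Write one four-line FASTQ record with a constant quality string."""
    out.write(f"@{name}\n{seq}\n+\n{quality_char * len(seq)}\n")


def fasta_to_fastq(fasta_path, fastq_path, quality_char="I"):
    """Convert a FASTA file to FASTQ using a placeholder quality for every base.

    Returns the number of records written.
    """
    count = 0
    name, chunks = None, []
    with open(fasta_path) as src, open(fastq_path, "w") as out:
        for line in src:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if name is not None:
                    write_fastq_record(out, name, "".join(chunks), quality_char)
                    count += 1
                name, chunks = line[1:], []
            elif line and name is not None:
                chunks.append(line)
        # Flush the last record.
        if name is not None:
            write_fastq_record(out, name, "".join(chunks), quality_char)
            count += 1
    return count
```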
Hi,
I would like to assemble a plant mitochondrial genome from nanopore reads using SMARTdenovo, but I could not find any commands for running this kind of data. Could you please provide the commands?
Thank you.
Hi,
I am trying to use smartdenovo to assemble an invertebrate genome (700 Mb genome size) from 30X coverage of nanopore reads. I know that the final output is the .cns file. However, the N50 of the .cns file is ~100 kb, while the dmo.lay file gives 500 kb (with 5.4G total assembly size). So my questions are: Is this decrease normal during the consensus step? Is there any way to use the dmo.lay file and somehow get rid of the redundancy?
Many thanks
Szabolcs
In smartdenovo.pl, wtgbo is used only in zmo overlapper mode. When using the dmo overlapper in wtzmo, is it necessary to use wtgbo to rescue overlaps? I noticed that some overlaps between reads of high frequency, such as those derived from plastid, were missed.
Hi, does smartdenovo work with PacBio HiFi reads? Thank you very much!
Hi,
I ran SMARTdenovo on some canu-corrected reads. It ended with this error message:
CA690_canu.mak:27: recipe for target 'CA690_canu.dmo.cns' failed
make: *** [CA690_canu.dmo.cns] Error 1
What can I do?
Cheers Gabi
Dear developers,
First, thank you for your great tool, smartdenovo.
I ran it and it is very fast, which is awesome.
Your documentation says to run the following commands:
/path/to/smartdenovo/smartdenovo.pl -p prefix reads.fa > prefix.mak
make -f prefix.mak
That is what I did, but I do not get a « .cns » file, and my makefile contains no rule that would generate it.
My makefile:
PREFIX=GW_smartdenovo
EXE_PRE=/path/to/smartdenovo/wtpre
EXE_ZMO=/path/to/smartdenovo/wtzmo
EXE_OBT=/path/to/smartdenovo/wtobt
EXE_GBO=/path/to/smartdenovo/wtgbo
EXE_CLP=/path/to/smartdenovo/wtclp
EXE_LAY=/path/to/smartdenovo/wtlay
EXE_CNS=/path/to/smartdenovo/wtcns
N_THREADS=10
all:$(PREFIX).dmo.lay
$(PREFIX).fa.gz:
	$(EXE_PRE) -J 5000 raw_reads.GW.fasta | gzip -c -1 > $@
$(PREFIX).dmo.ovl:$(PREFIX).fa.gz
	$(EXE_ZMO) -t $(N_THREADS) -i $(PREFIX).fa.gz -fo $@ -k 16 -z 10 -Z 16 -U -1 -m 0.1 -A 1000
$(PREFIX).dmo.obt:$(PREFIX).fa.gz $(PREFIX).dmo.ovl
	$(EXE_CLP) -i $(PREFIX).dmo.ovl -fo $@ -d 3 -k 300 -m 0.1 -FT
$(PREFIX).dmo.lay:$(PREFIX).fa.gz $(PREFIX).dmo.obt $(PREFIX).dmo.ovl
	$(EXE_LAY) -i $(PREFIX).fa.gz -b $(PREFIX).dmo.obt -j $(PREFIX).dmo.ovl -fo $(PREFIX).dmo.lay -w 300 -s 200 -m 0.1 -r 0.95 -c 1
My outputs :
GW_smartdenovo.dmo.lay
GW_smartdenovo.dmo.lay.1.dot
GW_smartdenovo.dmo.lay.2.dot
GW_smartdenovo.dmo.lay.3.dot
GW_smartdenovo.dmo.lay.4.dot
GW_smartdenovo.dmo.lay.5.dot
GW_smartdenovo.dmo.lay.contained_reads
GW_smartdenovo.dmo.lay.dup
GW_smartdenovo.dmo.lay.lnk
GW_smartdenovo.dmo.lay.utg
GW_smartdenovo.dmo.lay.utg.dup
GW_smartdenovo.dmo.obt
GW_smartdenovo.dmo.ovl
GW_smartdenovo.dmo.ovl.contained
GW_smartdenovo.fa.gz
GW_smartdenovo.mak
Is the « .dmo.lay.utg » file the final assembly? Was there a polishing step? I ask because I have « quiver » available on the server on which I ran smartdenovo.
Thank you for your response.
Best,
Amandine
Dear colleague,
Greetings. I am trying your wtdbg2 v2.2 on my data. I have about two million long reads, and wtdbg2/wtpoa-cns successfully assembled them into several thousand contigs. However, I ran into difficulties when trying to correct the assembly using a BAM file generated with minimap2. The command I issued is exactly the one given in the wiki:
samtools view profile.ctg.lay.map.srt | ./wtpoa-cns -t 16 -d profile.ctg.lay.fa -i - -fo profile.ctg.lay.2nd.fa -v
I added "-v" for more verbose output, and the program stops here:
(omitted)
JOINT 222259 qe = 2005 -> 2005 te = 497 -> 497 [ 2005, 2015] [ 499, 496,1,2,0]
Segmentation fault (core dumped)
Please kindly let me know if you have any suggestions. Thank you.
I have some corrected ONT data that I compressed to save storage. Can I use the .gz file as input? I don't understand what "wtpre" does. Can I modify the .mak file and skip the "wtpre" step? My .mak file looks like this:
PREFIX=species

EXE_PRE=/smartdenovo/wtpre
EXE_ZMO=/smartdenovo/wtzmo
EXE_OBT=/smartdenovo/wtobt
EXE_GBO=/smartdenovo/wtgbo
EXE_CLP=/smartdenovo/wtclp
EXE_LAY=/smartdenovo/wtlay
EXE_CNS=/smartdenovo/wtcns
N_THREADS=20

all:$(PREFIX).dmo.lay
#$(PREFIX).fa.gz:
#
Thank you for your reply.
I am getting the following error when I run the software. These are the commands I used:
/home/pbp/smartdenovo/smartdenovo.pl -p haemo3_smartdenovo -c 1 /home/pbp/Documents/run7/run7_filtered/PSHaemo3.fastq > /home/pbp/Documents/run7/run7_filtered/haemo3_smartdenovo.mak make -f haemo3_smartdenovo.mak
make -f pbp_smartdenovo.mak
make: pbp_smartdenovo.mak: No such file or directory
make: *** No rule to make target `pbp_smartdenovo.mak'. Stop.
Please help me resolve this issue.
Hi,
A successful assembly should generate a prefix.cns file in the same folder, but in my case I cannot find this file. Here is the complete log. Could you please help me understand it?
PS: My input is PacBio Sequel data in FASTA format (already quality filtered).
== Message for debug ==
Sequence coverage statistic:
1046073 10824668 16129060 16678696 13701160 9843146 6814368 4611437 3192178 2359773
1663221 1197984 962899 740646 677049 652907 564440 538527 466340 412399
420808 370740 321274 308875 283371 255599 232553 245738 239084 226107
202768 188779 176133 167918 182581 155351 143535 131581 137333 127253
119882 120116 114681 113873 92210 109550 99129 91885 99879 94600
101932 91627 84404 83529 92875 86413 76855 69795 62975 66319
63799 66225 68975 69673 65977 68269 58591 56706 61120 57096
50637 60998 53594 52469 54132 54345 50195 48843 41447 46475
47198 52709 49962 52236 42375 47926 49525 49452 46554 45652
41429 36299 35484 35829 41193 41163 31880 27408 29407 34106
[Fri Jan 19 16:19:54 2018] output
[Fri Jan 19 16:19:54 2018] Done
~/smartdenovo/wtlay -i TestSample.fa.gz -b TestSample.zmo.obt -j TestSample.zmo.ovl.short -j TestSample.zmo.gbo.short -fo TestSample.zmo.lay -s 200 -m 0.6 -R -r 1 -c 1
[Fri Jan 19 16:19:54 2018] loading reads
[Fri Jan 19 16:20:54 2018] Done, 389105 reads
[Fri Jan 19 16:20:54 2018] loading reads obt information
[Fri Jan 19 16:20:54 2018] Done
[Fri Jan 19 16:20:54 2018] loading alignments
loaded 166589 overlaps
building edges
116045 fine overlaps
[Fri Jan 19 16:20:55 2018] Done
[Fri Jan 19 16:20:55 2018] calculating edge coverage ...
[Fri Jan 19 16:20:56 2018] removed 444 duplicate edges
[Fri Jan 19 16:20:56 2018] Done
[Fri Jan 19 16:20:56 2018] masked 27258 contained reads
[Fri Jan 19 16:20:56 2018] masked 14912 low coverage (<1) edges
[Fri Jan 19 16:20:56 2018] 'best_overlap' cut 100267 non-best edges
1951 tips, 203 bubbles, 2 chimera, 187 non-bog, 0 recoveries
[Fri Jan 19 16:20:56 2018] repair 2341 bog elements
67 tips, 1 bubbles, 0 chimera, 5 non-bog, 0 recoveries
[Fri Jan 19 16:20:57 2018] repair 73 bog elements
7 tips, 0 bubbles, 0 chimera, 0 non-bog, 0 recoveries
[Fri Jan 19 16:20:57 2018] repair 7 bog elements
1 tips, 0 bubbles, 0 chimera, 0 non-bog, 0 recoveries
[Fri Jan 19 16:20:57 2018] repair 1 bog elements
0 tips, 0 bubbles, 0 chimera, 0 non-bog, 0 recoveries
[Fri Jan 19 16:20:57 2018] generated 208703 unitigs
[Fri Jan 19 16:20:57 2018] recovered 878 edges inter unitigs
724 tips, 17 bubbles, 0 chimera, 15 non-bog, 0 recoveries
[Fri Jan 19 16:20:57 2018] repair 756 bog elements
6 tips, 1 bubbles, 0 chimera, 1 non-bog, 0 recoveries
[Fri Jan 19 16:20:57 2018] repair 8 bog elements
0 tips, 0 bubbles, 0 chimera, 0 non-bog, 0 recoveries
[Fri Jan 19 16:20:58 2018] generated 209101 unitigs
[Fri Jan 19 16:20:58 2018] recover 62 edges inter unitigs
[Fri Jan 19 16:21:33 2018] output 329 independent unitigs
[Fri Jan 19 16:22:02 2018] Done
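The graph-cleaning rounds in the wtlay log above can be tabulated with a short script, which makes it easier to compare runs with different parameters. This is only a sketch: the regular expressions are written against the line shapes visible in this log, and the function name `parse_wtlay_rounds` and the dictionary keys are made up for illustration, not part of smartdenovo.

```python
import re

# Matches tally lines such as "67 tips, 1 bubbles, 0 chimera, 5 non-bog, 0 recoveries".
TALLY_RE = re.compile(
    r"(\d+) tips, (\d+) bubbles, (\d+) chimera, (\d+) non-bog, (\d+) recoveries"
)
# Matches lines such as "[Fri Jan 19 16:20:57 2018] repair 73 bog elements".
REPAIR_RE = re.compile(r"repair (\d+) bog elements")
FIELDS = ("tips", "bubbles", "chimera", "non_bog", "recoveries")


def parse_wtlay_rounds(log_text):
    """Return one dict per tally line, plus the repair count that follows it (if any)."""
    rounds = []
    for line in log_text.splitlines():
        tally = TALLY_RE.search(line)
        if tally:
            counts = dict(zip(FIELDS, map(int, tally.groups())))
            counts["repaired"] = None  # filled in by the next "repair" line
            rounds.append(counts)
            continue
        repair = REPAIR_RE.search(line)
        if repair and rounds and rounds[-1]["repaired"] is None:
            rounds[-1]["repaired"] = int(repair.group(1))
    return rounds


# Two rounds copied from the log above.
SAMPLE = """67 tips, 1 bubbles, 0 chimera, 5 non-bog, 0 recoveries
[Fri Jan 19 16:20:57 2018] repair 73 bog elements
7 tips, 0 bubbles, 0 chimera, 0 non-bog, 0 recoveries
[Fri Jan 19 16:20:57 2018] repair 7 bog elements"""

for entry in parse_wtlay_rounds(SAMPLE):
    print(entry)
```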
Dear RuanJue,
I noticed that at the end of the wtclp step there is an estimate of the genome size, in this case 1.0 Gbp (shown below), which is the size we expect for the genome. However, the output from wtlay (with parameters: -w 300 -s 200 -m 0.1 -r 0.97 -c 1) has a total assembly size of only 898 Mb. Is there any way to recover the missing ~100 Mb (10%) of sequence to see what it is?
Thank you in advance.
Regards, Chu Shin
--
Total aviable sequences: 51310363234 bp
Average Coverage(?): 51
Genome Size(?): 1006085553 bp
--
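The figures in this log excerpt are consistent with a simple relation: total bases divided by average coverage gives the reported genome size. The sketch below reproduces that arithmetic and the size of the gap to the 898 Mb assembly. Whether wtclp actually computes its estimate this way is an assumption, and the variable names are illustrative only.

```python
# Values copied from the wtclp log excerpt and the post above.
total_bp = 51_310_363_234           # "Total aviable sequences"
coverage = 51                       # "Average Coverage(?)"
reported_genome_bp = 1_006_085_553  # "Genome Size(?)"
assembly_bp = 898_000_000           # approximate wtlay assembly size (898 Mb)

# Integer division of total bases by coverage (assumed relation, see lead-in).
estimated_bp = total_bp // coverage

# Gap between the estimated genome size and the assembled sequence.
missing_bp = reported_genome_bp - assembly_bp
missing_fraction = missing_bp / reported_genome_bp

print(estimated_bp)
print(f"{missing_bp / 1e6:.0f} Mb missing ({missing_fraction:.1%})")
```

With these inputs the integer division gives exactly the reported 1006085553 bp, and the gap works out to about 108 Mb, or roughly 10.7% of the estimate.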
Hi,
Thanks for the great tool.
I have messed around with smartdenovo on a few datasets using the following:
~/software/smartdenovo/smartdenovo/smartdenovo.pl -c 1 data/all_subreads.fasta > wtasm.mak
make -f wtasm.mak
I have messed around with providing all reads and varying degrees of higher quality subsets of reads. It seems like smartdenovo is very sensitive to read quality in ways other tools are not. For example, Miniasm and Abruijn do better with all or most of the reads whereas smartdenovo does better with higher quality subsets of them.
Are there any parameters I should tweak to help smartdenovo exploit all the reads?
best,
John
We are assembling a large genome (close to 2 Gb). We corrected the reads with canu and assembled them with smartdenovo. The assembly size is 16% lower than expected. Some chromosomes share large sub-sequences that are assembled only once and show twice the average read depth when we realign the reads to the assembly. We think the two copies are not different enough in sequence to be separated by smartdenovo with the standard parameters.
Have you already seen this?
Which parameter(s) should we change to try to separate both parts?
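One way to locate such collapsed regions is to scan per-base read depth along each contig and flag windows whose mean depth is well above the typical depth. The sketch below assumes the depths are already available as a list of integers (for example, exported from an alignment depth tool). The function name, the window size and the 1.5x threshold are illustrative choices, not values taken from smartdenovo.

```python
from statistics import median


def find_high_depth_windows(depths, window=1000, ratio=1.5):
    """Return (start, end, mean_depth) for windows with mean depth >= ratio * median.

    depths: per-base read depths along one contig (0-based positions).
    window: window size in bases (non-overlapping windows).
    ratio:  multiple of the contig-wide median depth used as the threshold.
    """
    if not depths:
        return []
    baseline = median(depths)
    hits = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        mean_depth = sum(chunk) / len(chunk)
        if baseline > 0 and mean_depth >= ratio * baseline:
            hits.append((start, start + len(chunk), mean_depth))
    return hits


# Toy contig: 30x depth everywhere except a 1 kb stretch at 60x.
toy_depths = [30] * 3000 + [60] * 1000 + [30] * 2000
print(find_high_depth_windows(toy_depths))
```

Windows reported at about twice the median depth are candidates for sequence that is present twice in the genome but only once in the assembly.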
Hello ruanjue,
I have recently been testing smartdenovo with our own ONT data (S. cerevisiae, 200X coverage). The final assembly looks good in general, except that one of the chromosomes (chrIII) is completely missing. All the other chromosomes look good, even the most difficult one, chrXII. I obtained the same results in multiple independent runs on different machines, so I guess this must be related to smartdenovo. Do you have any suggestions about this? I can share my input reads for your testing if you send me your email address. Another observation is that smartdenovo seems to consume quite a lot of memory in certain intermediate steps. Do you have any suggestions about that as well? Thanks in advance!
Best,
Jia-Xing
Hi,
I just tried to download and compile smartdenovo using your command lines:
git clone https://github.com/ruanjue/smartdenovo.git
cd smartdenovo
make
but the compilation ended with:
/tmp/ccBHOwYi.o: In function `merge_bubbles_strgraph':
wtlay.c:(.text+0x984d): undefined reference to `cut_biedge_strgraph2'
wtlay.c:(.text+0x9861): undefined reference to `cut_biedge_strgraph2'
wtlay.c:(.text+0x992d): undefined reference to `cut_biedge_strgraph2'
wtlay.c:(.text+0x994a): undefined reference to `cut_biedge_strgraph2'
collect2: error: ld returned 1 exit status
Makefile:48: recipe for target 'wtlay' failed
make: *** [wtlay] Error 1
make: *** Waiting for unfinished jobs....
Do you have any idea what could resolve this issue?
(I am running an up-to-date Ubuntu 18.04.)
Yo
Hi
I obtained the .cns output from a smartdenovo assembly of nanopore reads.
I want to convert it to FASTA format for assessment with QUAST. How do I convert the .cns file to FASTA format?
Furthermore, it would be a great help if you could let me know which program is suitable for assessing contiguity and sequence identity (the error rate of the nanopore assembly) against a reference genome.
Thanks
sam
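If the .cns file already consists of '>' header lines followed by sequence lines (an assumption worth checking on your own output), it can be passed to FASTA-based tools as it is. The sketch below tests that assumption and reports record lengths. The function name `read_fasta_lengths` is hypothetical, and the parser only handles plain or gzip-compressed text.

```python
import gzip


def read_fasta_lengths(path):
    """Return {record_name: sequence_length}; raise ValueError if the file is not FASTA."""
    opener = gzip.open if str(path).endswith(".gz") else open
    lengths = {}
    name = None
    with opener(path, "rt") as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first whitespace-delimited token of the header as the name.
                parts = line[1:].split()
                name = parts[0] if parts else f"record{len(lengths) + 1}"
                lengths[name] = 0
            elif name is None:
                raise ValueError("sequence data before the first '>' header; not FASTA")
            else:
                lengths[name] += len(line)
    if not lengths:
        raise ValueError("no FASTA records found")
    return lengths
```

For example, `sum(read_fasta_lengths("assembly.cns").values())` would give the total assembly size, where `assembly.cns` is a placeholder file name. A `ValueError` means the file needs a closer look before it is handed to other tools.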
Hi...
While trying to assemble 20 Gb of ONT data, I get an error called "Illegal instruction".
I'm not sure what this error means or how to tackle it. Could you please help me fix it?
[peru@juncea]$ perl ~/anaconda3/pkgs/smartdenovo-1.0.0-0/bsta > sdn_Bn6zz.mak
[peru@juncea]$ make -f sdn_Bn6zz.mak
make: Warning: File `sdn_Bn6zz.mak' has modification time 3.5e+03 s in the future
~/anaconda3/pkgs/smartdenovo-1.0.0-0/bin/wtpre -J 5000 Bn6_QC30x.fasta ~/smartdenovo-1.0.0-0/bin/wtzmo -t 8 -i sdn_Bn6zz.fa.gzn_Bn6zz.dmo.ovl -k 16 -z 10 -Z 16 -U -1 -m 0.1 -A 1000
[Wed Sep 13 19:57:43 2017] loading long reads
[Wed Sep 13 20:02:23 2017] Done, 1672538 reads (length >= 0)
[Wed Sep 13 20:02:31 2017] sorted sequences by length dsc
[Wed Sep 13 20:02:31 2017] calculating overlaps, 8 threads
[Wed Sep 13 20:02:31 2017] indexing 1/1
[Wed Sep 13 20:02:31 2017] - scanning kmers (16 bp) 1672538 reads
[Wed Sep 13 20:15:24 2017] - high frequency kmer depth is set to 2405
[Wed Sep 13 20:15:25 2017] - average kmer depth = 481
[Wed Sep 13 20:15:25 2017] - 159317 high frequency kmers (>=2405)
[Wed Sep 13 20:15:25 2017] - indexing 7015528 kmers 1672538 reads
[Wed Sep 13 20:29:32 2017] Done
[Wed Sep 13 20:29:32 2017] querying 1/1
000000000000 0make: *** [sdn_Bn6zz.dmo.ovl] Illegal instruction
make: *** Deleting file `sdn_Bn6zz.dmo.ovl'
Thanks
sam
Hi, can smartdenovo be resumed if a run fails?
Hi Ruanjue
I am assembling a big genome (>20 Gbp) and would like to run wtzmo in parallel, e.g. split into 10 parts. Can you give an example of how to set the parameters? And if I run wtzmo in parallel, will the memory usage be reduced?
Thank you very much.
Should nanopore reads be corrected with canu or another corrector before using smartdenovo?
Hello all,
I was wondering whether smartdenovo can do hybrid assemblies of PacBio and Nanopore reads.
Or does it work with either PacBio or Nanopore only?
Can someone please help me with this?
Thanks heaps in advance
Saila
When running make in the smartdenovo folder, the following errors were shown:
/tmp/ccTfg5aF.o: In function `merge_bubbles_strgraph':
wtlay.c:(.text+0x984d): undefined reference to `cut_biedge_strgraph2'
wtlay.c:(.text+0x9861): undefined reference to `cut_biedge_strgraph2'
wtlay.c:(.text+0x992d): undefined reference to `cut_biedge_strgraph2'
wtlay.c:(.text+0x994a): undefined reference to `cut_biedge_strgraph2'
collect2: error: ld returned 1 exit status
Makefile:48: recipe for target 'wtlay' failed
make: *** [wtlay] Error 1
Dear community,
The -c parameter is used to generate the consensus sequence, but I still do not understand what the value given to -c means. For example:
smartdenovo.pl -c 1 -J 1000 -k ${k} -p Crystal_c1s1_${k}mer ../Crystal_all_1kb.correctedReads.fasta > Crystal_v1.8_1kb_17mer.mak
I don't know what difference it makes if I change -c 1 to -c 2 or -c 5. (Are other values allowed, such as -2?)
I currently have corrected Nanopore reads, and I want to assemble them with smartdenovo.
Thank you.