ruanjue / smartdenovo

Ultra-fast de novo assembler using long noisy reads

License: GNU General Public License v3.0

Makefile 0.28% Perl 1.64% C 95.20% C++ 2.70% Shell 0.18%

pacbio assembler

smartdenovo's Issues

Discrepancies genome size

Hi,

Thanks for developing smartdenovo and wtdbg2. I am actually working on a very heterozygous insect genome. The smartdenovo log file returns a genome size of around 400 Mb (which is close to the estimated haploid genome size) whereas the consensus (cns) sequence is around 180Mb. Is it possible to parse the layout files in order to retrieve all FASTA sequences belonging to the 400Mb genome assembly?

P.S.: I am also trying to optimise wtdbg2 parameters on my read data (right now I get rather messy results).

Thank you very much in advance,
Ben

.gfa file produced

Hi,
Is there any way to get a .gfa file of the assembly?

How to adjust parameters to improve the assembly result

Hi, I have assembled a 1 Gb diploid genome that is 70~80% repetitive sequence. The coverage of my PacBio data is approximately 40x. The final assembly N50 is 82 kb. I would like to know how to adjust the parameters to improve the assembly result. Could you please give me some advice? Thank you very much!

Compilation error

Hello,

I encountered the following error when compiling smartdenovo. Could you help to diagnose what might have caused this problem? Thanks!
....
wtzmo.c:91:1: note: in expansion of macro ‘define_list’
define_list(pbreadv, pbread_t);
^~~~~~~~~~~
wtzmo.c: In function ‘push_long_read_wtzmo’:
wtzmo.c:210:2: warning: incompatible implicit declaration of built-in function ‘memcpy’
memcpy(ptr, name, name_len);
^~~~~~
wtzmo.c:210:2: note: include ‘<string.h>’ or provide a declaration of ‘memcpy’
wtzmo.c: In function ‘thread_midx_func’:
wtzmo.c:247:1: warning: incompatible implicit declaration of built-in function ‘memset’
memset(&U, 0, sizeof(hzmh_t));
^~~~~~
wtzmo.c:247:1: note: include ‘<string.h>’ or provide a declaration of ‘memset’
wtzmo.c: In function ‘thread_mzmo_func’:
wtzmo.c:789:1: warning: incompatible implicit declaration of built-in function ‘memset’
memset(&SEED[0], 0, sizeof(wt_seed_t));
^~~~~~
wtzmo.c:789:1: note: include ‘<string.h>’ or provide a declaration of ‘memset’
wtzmo.c:1058:15: warning: implicit declaration of function ‘strdup’ [-Wimplicit-function-declaration]
HIT.cigar = strdup(cigar_str->string);
^~~~~~
wtzmo.c:1058:15: warning: incompatible implicit declaration of built-in function ‘strdup’
In file included from wtzmo.c:25:0:
wtzmo.c: In function ‘main’:
file_reader.h:97:30: warning: implicit declaration of function ‘ref_VStrv’; did you mean ‘ref_diagv’? [-Wimplicit-function-declaration]
#define get_col_str(fr, col) ref_VStrv((fr)->tabs, col)->string
^
wtzmo.c:1738:28: note: in expansion of macro ‘get_col_str’
set_read_clip_wtzmo(wt, get_col_str(fr, 0), atoi(get_col_str(fr, 1)), atoi(get_col_str(fr, 2)));
^~~~~~~~~~~
file_reader.h:97:56: error: invalid type argument of ‘->’ (have ‘int’)
#define get_col_str(fr, col) ref_VStrv((fr)->tabs, col)->string
^
wtzmo.c:1738:28: note: in expansion of macro ‘get_col_str’
set_read_clip_wtzmo(wt, get_col_str(fr, 0), atoi(get_col_str(fr, 1)), atoi(get_col_str(fr, 2)));
^~~~~~~~~~~
file_reader.h:97:56: error: invalid type argument of ‘->’ (have ‘int’)
#define get_col_str(fr, col) ref_VStrv((fr)->tabs, col)->string
^
wtzmo.c:1738:53: note: in expansion of macro ‘get_col_str’
set_read_clip_wtzmo(wt, get_col_str(fr, 0), atoi(get_col_str(fr, 1)), atoi(get_col_str(fr, 2)));
^~~~~~~~~~~
file_reader.h:97:56: error: invalid type argument of ‘->’ (have ‘int’)
#define get_col_str(fr, col) ref_VStrv((fr)->tabs, col)->string
^
wtzmo.c:1738:79: note: in expansion of macro ‘get_col_str’
set_read_clip_wtzmo(wt, get_col_str(fr, 0), atoi(get_col_str(fr, 1)), atoi(get_col_str(fr, 2)));
^~~~~~~~~~~
file_reader.h:97:56: error: invalid type argument of ‘->’ (have ‘int’)
#define get_col_str(fr, col) ref_VStrv((fr)->tabs, col)->string
^
wtzmo.c:1766:43: note: in expansion of macro ‘get_col_str’
if((pb1 = kv_get_cuhash(wt->rdname2id, get_col_str(fr, 0))) == 0xFFFFFFFFU) continue;
^~~~~~~~~~~
file_reader.h:97:56: error: invalid type argument of ‘->’ (have ‘int’)
#define get_col_str(fr, col) ref_VStrv((fr)->tabs, col)->string
^
wtzmo.c:1767:43: note: in expansion of macro ‘get_col_str’
if((pb2 = kv_get_cuhash(wt->rdname2id, get_col_str(fr, 1))) == 0xFFFFFFFFU) continue;
^~~~~~~~~~~
Makefile:33: recipe for target 'wtzmo' failed
make: *** [wtzmo] Error 1

Best,
Jia-Xing

Died at ./smartdenovo-master/smartdenovo.pl line 23.

Hi,
I downloaded smartdenovo and ran this command:

-bash-4.1$ ./smartdenovo-master/smartdenovo.pl -c 1 ~/scsio/Nanopore/Data/N_all_filt.fq > wtasm.mak
but I got this error:
Died at ./smartdenovo-master/smartdenovo.pl line 23.
I don't know how to deal with it. Could you tell me what to do about it?
Thank you.

Does it support multiple input files like input*.fa ?

Does it support multiple input files like input*.fa.gz?
How do I provide multiple input files?
For example, I have:
input1.fa
input2.fa
input3.fa

What would the command look like?
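
One possible workaround, sketched below, is to merge the inputs into a single FASTA before running the pipeline. This assumes smartdenovo.pl takes one reads file, which is how it is invoked in every command on this page; whether it also accepts several files is not confirmed here. The names `merge_fasta`, `open_maybe_gzip`, and `all_reads.fasta` are hypothetical.

```python
import glob
import gzip


def open_maybe_gzip(path):
    """Open a FASTA file for binary reading, decompressing .gz on the fly."""
    return gzip.open(path, "rb") if path.endswith(".gz") else open(path, "rb")


def merge_fasta(pattern, merged_path):
    """Concatenate all files matching `pattern` into one uncompressed FASTA.

    Returns the list of input files that were merged, in sorted order.
    """
    # Skip the output file itself in case the pattern happens to match it.
    paths = [p for p in sorted(glob.glob(pattern)) if p != merged_path]
    with open(merged_path, "wb") as out:
        for path in paths:
            last_byte = b"\n"
            with open_maybe_gzip(path) as src:
                # Copy in 1 MiB chunks so large read sets never sit in memory.
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    last_byte = chunk[-1:]
            # A file without a trailing newline would otherwise glue its last
            # sequence line to the next file's first header.
            if last_byte != b"\n":
                out.write(b"\n")
    return paths
```

After `merge_fasta("input*.fa*", "all_reads.fasta")`, the merged file can be passed to smartdenovo.pl in place of a single reads file.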

Kmer selection

Dear Ruanjue,

If I understand correctly, smartdenovo works very differently from DBG assemblers and is a kind of OLC assembler. So I am not sure whether k-mer selection should be based on Jellyfish or KmerGenie. How should I select the k-mer size for smartdenovo?

Sincerely,
panpan

It would be good to have an actual release of this.

Making a release with an actual version number makes it easier to know exactly what has been installed on a system. Especially HPC sysadmins like versioned software.

Error while trying to run smartdenovo

Hi, I am trying to run smartdenovo but I keep on getting this output

`ubuntu@biolinux:/mnt/Federico/TD_1/Smartdenovo_assembly$ '/home/ubuntu/miniconda3/pkgs/smartdenovo-1.0.0-pl5.22.0_1/bin/smartdenovo.pl' /mnt/Federico/TD_1/basecalled.fasta
PREFIX=wtasm

EXE_PRE=/home/ubuntu/miniconda3/pkgs/smartdenovo-1.0.0-pl5.22.0_1/bin/wtpre
EXE_ZMO=/home/ubuntu/miniconda3/pkgs/smartdenovo-1.0.0-pl5.22.0_1/bin/wtzmo
EXE_OBT=/home/ubuntu/miniconda3/pkgs/smartdenovo-1.0.0-pl5.22.0_1/bin/wtobt
EXE_GBO=/home/ubuntu/miniconda3/pkgs/smartdenovo-1.0.0-pl5.22.0_1/bin/wtgbo
EXE_CLP=/home/ubuntu/miniconda3/pkgs/smartdenovo-1.0.0-pl5.22.0_1/bin/wtclp
EXE_LAY=/home/ubuntu/miniconda3/pkgs/smartdenovo-1.0.0-pl5.22.0_1/bin/wtlay
EXE_CNS=/home/ubuntu/miniconda3/pkgs/smartdenovo-1.0.0-pl5.22.0_1/bin/wtcns
N_THREADS=8

all:$(PREFIX).dmo.lay

$(PREFIX).fa.gz:
$(EXE_PRE) -J 5000 /mnt/Federico/TD_1/basecalled.fasta | gzip -c -1 > $@

$(PREFIX).dmo.ovl:$(PREFIX).fa.gz
$(EXE_ZMO) -t $(N_THREADS) -i $(PREFIX).fa.gz -fo $@ -k 16 -z 10 -Z 16 -U -1 -m 0.1 -A 1000

$(PREFIX).dmo.obt:$(PREFIX).fa.gz $(PREFIX).dmo.ovl
$(EXE_CLP) -i $(PREFIX).dmo.ovl -fo $@ -d 3 -k 300 -m 0.1 -FT

$(PREFIX).dmo.lay:$(PREFIX).fa.gz $(PREFIX).dmo.obt $(PREFIX).dmo.ovl
$(EXE_LAY) -i $(PREFIX).fa.gz -b $(PREFIX).dmo.obt -j $(PREFIX).dmo.ovl -fo $(PREFIX).dmo.lay -w 300 -s 200 -m 0.1 -r 0.95 -c 1
`

If I add the redirect > genome.mak, nothing happens. Any ideas? Thanks

Failed to `make wtcorr`

I want to test wtcorr. However, it failed to compile:

$ make wtcorr
gcc -W -Wall -O4 -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -mpopcnt -mssse3 -o wtcorr wtcorr.c file_reader.c -lm -lpthread
wtcorr.c:388:1: error: invalid suffix "hashcode" on integer constant
wtcorr.c:388:1: error: invalid suffix "hashcode" on integer constant
wtcorr.c:388:1: error: invalid suffix "hashcode" on integer constant
wtcorr.c:388:1: error: invalid suffix "hashcode" on integer constant
wtcorr.c:388:1: error: invalid suffix "hashcode" on integer constant
wtcorr.c:388:1: error: invalid suffix "hashcode" on integer constant
wtcorr.c:422:1: warning: "E" redefined
wtcorr.c:385:1: warning: this is the location of the previous definition
wtcorr.c: In function 'get_dphash':
wtcorr.c:425: error: 'gdpv' undeclared (first use in this function)
wtcorr.c:425: error: (Each undeclared identifier is reported only once
wtcorr.c:425: error: for each function it appears in.)
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c: In function 'prepare_dphash':
wtcorr.c:425: error: 'gdpv' undeclared (first use in this function)
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c: In function 'exists_dphash':
wtcorr.c:425: error: 'gdpv' undeclared (first use in this function)
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c: In function 'add_dphash':
wtcorr.c:425: error: 'gdpv' undeclared (first use in this function)
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c: In function 'remove_dphash':
wtcorr.c:425: error: 'gdpv' undeclared (first use in this function)
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c: In function 'encap_dphash':
wtcorr.c:425: error: 'gdpv' undeclared (first use in this function)
wtcorr.c:425: error: expected expression before ')' token
wtcorr.c: In function 'last_base_dp_kmer':
wtcorr.c:554: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:554: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c: In function 'trace_aln_paths':
wtcorr.c:565: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:565: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c: In function 'count_covered_qmers':
wtcorr.c:584: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:584: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c: In function 'call_correct_seq':
wtcorr.c:624: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:624: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:636: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:636: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:644: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:651: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:651: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:659: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:659: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:681: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c: In function 'init_bf_kmer_dbgaln':
wtcorr.c:689: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:689: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c: In function 'print_dp_kmers':
wtcorr.c:724: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:724: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c: In function 'dbg_aln_core':
wtcorr.c:757: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:757: error: 'dbg_aligner' has no member named 'qmaxs'
wtcorr.c:757: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:772: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:776: error: 'dbg_aligner' has no member named 'smin'
wtcorr.c:782: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:782: error: 'dbg_aligner' has no member named 'qtop'
wtcorr.c:784: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:784: error: 'dbg_aligner' has no member named 'qtop'
wtcorr.c:785: error: 'dbg_aligner' has no member named 'qtop'
wtcorr.c:785: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:788: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:788: error: 'dbg_aligner' has no member named 'qtop'
wtcorr.c:793: error: 'dbg_aligner' has no member named 'qmaxs'
wtcorr.c:793: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:795: error: 'dbg_aligner' has no member named 'qmaxs'
wtcorr.c:795: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:796: error: 'dbg_aligner' has no member named 'qmaxs'
wtcorr.c:796: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:804: warning: implicit declaration of function 'process_cached_dps_dbgaln'
wtcorr.c:818: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:822: error: 'dbg_dp_t' has no member named 'fw_idx'
wtcorr.c:823: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:823: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:826: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:829: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:834: error: 'dc' undeclared (first use in this function)
wtcorr.c:834: warning: implicit declaration of function 'prepare_dbgcachehash'
wtcorr.c:834: error: 'dbg_aligner' has no member named 'gcache'
wtcorr.c:834: error: 'dbg_cache_t' undeclared (first use in this function)
wtcorr.c:834: error: expected ')' before '{' token
wtcorr.c:835: error: 'dc_exists' undeclared (first use in this function)
wtcorr.c:837: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:838: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:854: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:868: error: 'dbg_dp_t' has no member named 'aux1'
wtcorr.c:871: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:874: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:874: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:875: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:881: error: 'dbg_dp_t' has no member named 'link'
wtcorr.c:895: error: 'dbg_dp_t' has no member named 'aux1'
wtcorr.c:897: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:899: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:899: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:900: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:906: error: 'dbg_dp_t' has no member named 'link'
wtcorr.c:918: error: 'dbg_dp_t' has no member named 'aux1'
wtcorr.c:921: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:921: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:922: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:922: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:928: error: 'dbg_dp_t' has no member named 'link'
wtcorr.c:752: warning: unused variable 'qpos'
wtcorr.c:752: warning: unused variable 'found'
wtcorr.c:752: warning: unused variable 'kc_exists'
wtcorr.c:750: warning: unused variable 'path'
wtcorr.c:750: warning: unused variable 'pidx'
wtcorr.c:750: warning: unused variable 'didx'
wtcorr.c:749: warning: unused variable 'BR'
wtcorr.c:749: warning: unused variable 'BK'
wtcorr.c:749: warning: unused variable 'bk'
wtcorr.c:747: warning: unused variable 'kc'
wtcorr.c:746: warning: unused variable 'dp1'
wtcorr.c: In function 'dbg_aln':
wtcorr.c:940: warning: implicit declaration of function 'clear_dbgcachehash'
wtcorr.c:940: error: 'dbg_aligner' has no member named 'gcache'
wtcorr.c:947: error: 'dbg_aligner' has no member named 'qmaxs'
wtcorr.c:948: error: 'dbg_aligner' has no member named 'qmaxs'
wtcorr.c:972: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:973: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:998: error: 'dbg_dp_t' has no member named 'aux1'
wtcorr.c:999: error: 'dbg_dp_t' has no member named 'aux2'
wtcorr.c:1001: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:1001: error: 'dbg_dp_t' has no member named 'k'
wtcorr.c:1005: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:1005: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:1005: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:1007: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:1007: error: 'dbg_aligner' has no member named 'qmaxs'
wtcorr.c:1010: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:1011: error: 'dbg_aligner' has no member named 'qtop'
wtcorr.c:1011: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:1012: error: 'dbg_aligner' has no member named 'last_cached_pos'
wtcorr.c:1044: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:1045: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:1045: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:1045: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:1045: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:1063: error: 'dbg_dp_t' has no member named 'mat'
wtcorr.c:1080: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:1080: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c: In function 'main':
wtcorr.c:1399: error: 'DBG_MAX_BT_IDX' undeclared (first use in this function)
wtcorr.c:1423: error: 'dbg_aligner' has no member named 'smin'
wtcorr.c:1450: error: 'dbg_dp_t' has no member named 'qpos'
wtcorr.c:1456: error: 'dbg_dp_t' has no member named 'qpos'
make: *** [wtcorr] Error 1

How can I fix this?

installation error

Hi,
I am trying to install smartdenovo for my analysis, but it fails with an error when I run the make command.

This is the error I am getting. Please help me fix it.

collect2: error: ld returned 1 exit status
Makefile:48: recipe for target 'wtlay' failed
make: *** [wtlay] Error 1

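Parameters in wtzmo

Hello, ruanjue!
About the -k parameter in wtzmo: why is it restricted to the range 5-32? What would happen with a value greater than 32, or if the limit were removed?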

prefix.dmo.lay.utg and contig.fasta

Dear community,
The -c parameter is used to generate the consensus sequence, but it is too slow.
Can I use prefix.dmo.lay.utg as the final contig.fasta?

Thank you.

citation

Hi,

Thank you for this useful tool.

I was wondering how you would like it to be cited.

best,
John

Performance improvement

Hi Ruan,
Thanks for providing this fast assembly program.
I am using smartdenovo to assemble an insect genome (350 Mb estimated genome size, but expected to be highly heterozygous) with 170X nanopore raw reads. The first round of smartdenovo resulted in a 672 Mb genome assembly with an N50 of 240 kb. Which parameters should I tune to improve the assembly?
In addition, I got 34.4 Mb of sequence in the prefix.dmo.cns file, which is far less than the estimated genome size. Did I do anything wrong?
Looking forward to your reply!

chloroplast genome totally lost

Dear Ruanjue, I used the default parameters to assemble a plant genome. The mitochondrial genome (~400 kb) is partly assembled, but the chloroplast genome (150 kb) is totally lost. Do you have any suggestions, such as parameters to adjust?
Best,
panpan

[SMDS849.dmo.ovl] Bus error (core dumped)

Dear SMARTdenovo,

Hope this email finds you well.
While testing the program on PacBio data (genome size 2.5 Gb) in a PBSpro environment, I keep running into the same issue: "[SMDS849.dmo.ovl] Bus error (core dumped)".
There was a bus error on SMDS849.dmo.ovl. FYI, please see below for the output file.

Looking forward to your reply!

Regards,

Taek

SMD_SRX849_Output.txt

the output file is null

I ran smartdenovo.pl, but the output file is empty and I don't know why. My command is as follows:
/data/wuxiaopei/software/smartdenovo/smartdenovo/smartdenovo.pl –c 1 –p SMARTdenovo_longest_50X longest_50X.fasta > SMARTdenovo_longest_50X.mak
make -f SMARTdenovo_longest_50X.mak

The output is:
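
One thing that may be worth checking in the command as displayed above: the characters before `c` and `p` look like en-dashes rather than ASCII hyphens. Whether they were in the real command or were introduced when the post was rendered is unknown, so the following is only a sketch of how to test a pasted command line for such characters. The function name `find_suspicious_dashes` is hypothetical.

```python
import unicodedata

# Dash-like code points that are easily mistaken for the ASCII hyphen-minus.
DASH_LIKE = {"\u2010", "\u2011", "\u2012", "\u2013", "\u2014", "\u2212"}


def find_suspicious_dashes(command):
    """Return (index, character, Unicode name) for every non-ASCII dash."""
    return [
        (index, char, unicodedata.name(char))
        for index, char in enumerate(command)
        if char in DASH_LIKE
    ]


if __name__ == "__main__":
    pasted = "smartdenovo.pl \u2013c 1 \u2013p out reads.fasta"
    for index, char, name in find_suspicious_dashes(pasted):
        print(f"column {index}: {name}")
```

If the check reports anything, retyping the options with a plain `-` rules out this cause.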

Assembly of the mitochondrion

Hi,
I used SMARTdenovo on a yeast strain to assemble it with Nanopore reads. I obtained a very good continuity but in the output *.dmo.cns, I can only find a part of the mitochondrion.

If I align the *.utg.dup against the reference genome, I can see that the mitochondrion is present in its entirety. I think that maybe SMARTdenovo is rejecting parts of the mitochondrion due to a very high coverage, since it's present in unitigs but not in contigs.

I tried to decrease the coverage of reads given as input to SMARTdenovo but didn't obtain better results.

Do you have any tips concerning the parameters I could use to get the mitochondrion assembled in the consensus file ?

so looong!!

Hello,
I wonder why smartdenovo takes so long.
I have reserved 10 CPUs with 100G per CPU. I have 52G of data (ONT reads >= 10 kb) and launched it
with default parameters using build_conda_envs/3f08aa12/bin/wtzmo -t 8 -i SMART.fa.gz -fo SMART.dmo.ovl -k 16 -z 10 -Z 16 -U -1 -m 0.1 -A 1000
It has been running for 10 days :S
At the moment, I have 55G of SMART.dmo.ovl and 17G of SMART.fa.gz.
Do you think it will ever finish? How can I estimate how much time a smartdenovo assembly requires?
Any idea?
Thanks.
Julie

aarch64 platform support

I want to use it on my aarch64 (arm64) platform.
This means I must use SSE2NEON.h instead of emmintrin.h in ksw.c,
and the ARM version of gcc does not accept "-mpopcnt" or "-mssse3".

So we need this patch for the aarch64 platform:

diff --git a/Makefile b/Makefile
index 0802f65..3816b6e 100644
--- a/Makefile
+++ b/Makefile
@@ -2,9 +2,9 @@ VERSION=1.0.0
 MINOR_VER=20140314
 CC=gcc
 ifdef DEBUG
-CFLAGS=-g3 -W -Wall -O0 -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -mpopcnt -mssse3
+CFLAGS=-g3 -W -Wall -O0 -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE
 else
-CFLAGS=-W -Wall -O4 -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -mpopcnt -mssse3
+CFLAGS=-W -Wall -O4 -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE
 endif
 INSTALLDIR=/usr/local/bin
 GLIBS=-lm -lpthread
diff --git a/ksw.c b/ksw.c
index 15dd0f2..22641ed 100644
--- a/ksw.c
+++ b/ksw.c
@@ -25,7 +25,7 @@

 #include <stdlib.h>
 #include <stdint.h>
-#include <emmintrin.h>
+#include "SSE2NEON.h"
 #include "ksw.h"

 #ifdef USE_MALLOC_WRAPPERS

Could anyone help us merge this change into smartdenovo?

polishing Smartdenovo assembly

Hello,
I have a couple of questions.

1) I was thinking of using Racon to polish the smartdenovo assembly. Racon's usage lists these inputs:
racon [options ...] <sequences> <overlaps> <target sequences>
<sequences>: input file in FASTA/FASTQ format (can be compressed with gzip) containing sequences used for correction
<overlaps>: input file in MHAP/PAF/SAM format (can be compressed with gzip) containing overlaps between sequences and target sequences
<target sequences>: input file in FASTA/FASTQ format (can be compressed with gzip) containing sequences which will be corrected

For the first input, I can give Illumina reads. I am not sure what the best input for the second would be, as smartdenovo does not produce MHAP files the way other assemblers such as Canu do. For the third, I will use the .cns file generated by the smartdenovo assembly.

Can you please tell me what needs to go in the overlaps input?
2) Is there a better polishing tool than Racon for a smartdenovo assembly? Note: I have used long reads from both PacBio and Nanopore for my assembly.

Thanks heaps in advance
S
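
A commonly used way to produce the overlaps input is to map the correcting reads back to the assembly with minimap2, which writes PAF by default, and hand that file to Racon. The sketch below only builds the two command lines. It assumes long reads are used for correction rather than Illumina reads, that minimap2 and racon are on the PATH, and that `map-ont` is the appropriate minimap2 preset (`map-pb` is the PacBio counterpart). All file names and the helper names `polishing_commands` and `run_to_file` are hypothetical.

```python
import subprocess


def polishing_commands(reads, assembly, preset="map-ont", threads=8,
                       overlaps="overlaps.paf", polished="polished.fa"):
    """Build the minimap2 and racon command lines for one polishing round.

    Each returned pair is (argument list, file that should receive stdout).
    """
    # minimap2 takes the target (assembly) first and the query (reads) second.
    minimap2 = ["minimap2", "-x", preset, "-t", str(threads), assembly, reads]
    # racon takes: <sequences> <overlaps> <target sequences>.
    racon = ["racon", "-t", str(threads), reads, overlaps, assembly]
    return (minimap2, overlaps), (racon, polished)


def run_to_file(command, out_path):
    """Run a command and redirect its standard output to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(command, stdout=out, check=True)


if __name__ == "__main__":
    for command, out_path in polishing_commands("reads.fq", "prefix.dmo.cns"):
        print(" ".join(command), ">", out_path)
```

With mixed PacBio and Nanopore reads, one option would be to run the mapping separately per read type with its own preset; that choice is left open here.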

What does a SMARTdenovo output file look like

Hello All,
I am trying to do a smartdenovo assembly of my PacBio and Nanopore (MinION and PromethION) data. I got as far as the .cns files, and I am not sure what is meant to happen from there. I was hoping a summary file with the number of contigs, the N50 and so on would be generated, but I could not find one. Can you suggest where to look for it, please?
I used these commands:
/group/pasture/Saila/Smartdenovo/smartdenovo-master/smartdenovo.pl -p saila_smartdenovo -c 1 /group/pasture/Saila/Smartdenovo/all_fastq_files/all_fastq_files_smartdenovo.fastq > saila_smartdenovo.mak

make -f saila_smartdenovo.mak

The files I have so far are:
.cns.log
.cns
.lay.utg.dup
.utg
.lnk
.dup
.lay
.lay.1/2/3/4/5.dot
lay.contained_reads
.dmo.obt
.ovl.contained
.ovl

And no summary or fasta files. So I am a bit confused about it.

Thanks
Saila
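
None of the file listings on this page includes a summary report, so the numbers asked for here can be computed from the consensus file directly. The sketch below assumes the .cns file is plain FASTA (another issue on this page mentions getting FASTA from wtcns). The function names are hypothetical.

```python
import gzip


def read_fasta_lengths(path):
    """Return the sequence length of every record in a FASTA file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    lengths = []
    current = None
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def summarize(path):
    """Collect contig count, total size, longest contig, and N50."""
    lengths = read_fasta_lengths(path)
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }
```

Calling `summarize` on the .cns file returns the four values as a dictionary, which can be printed or logged alongside the run.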

No .cns file

Dear SMARTdenovo,

Hope this email finds you well. I assume “smartdenovo.pl” includes all these programs: wtzmo, wtgbo, wtclp, wtcns, and wtmsa.

After running this script
/smartdenovo/smartdenovo.pl -p prefix reads.fa > prefix.mak
make -f prefix.mak

It was supposed to generate “prefix.cns” but I could not find it. The generated files are .lay, .lay.dup, .lay.utg, .lay.utg.dup, and dmo.ovl.

Did I miss something?

Looking forward to your reply!

Best regards,

Taek

dump core while running wtlay

Hi,

I am assembling a 1 Gb genome using 17x of Canu-corrected reads. smartdenovo crashes at the last step: wtlay.

gdb /usr/local/bioinfo/src/smartdenovo/smartdenovo/wtlay core.180804
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/...
Reading symbols from /usr/local/bioinfo/src/smartdenovo/smartdenovo/wtlay...(no debugging symbols found)...done.
[New Thread 180804]
Missing separate debuginfo for
Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/f7/95efbe6950d1523c5748594c166cedd4254c33
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Core was generated by `/usr/local/bioinfo/src/smartdenovo/smartdenovo/wtlay -i All7.fa.gz -b All7.dmo.'.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004166b0 in merge_bubble_core_best_overlap_strgraph ()
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.149.el6_6.5.x86_64

What could cause the crash?
What could I test to go further?

Ch+

The meaning of each output file

I became interested in your tool after watching Bjorn Usadel's talk at London Calling 2017.
I'm assembling a repeat-rich animal genome. I haven't got a good assembly yet and want to test your tool. After running it following the steps on your GitHub homepage, I got the files below, and I don't know what some of them mean. I also didn't see a .cns file.

The following files were generated:

wtasm.dmo.lay
wtasm.dmo.lay.dup
wtasm.dmo.lay.lnk
wtasm.dmo.lay.utg
wtasm.dmo.lay.utg.dup
wtasm.dmo.lay.4.dot
wtasm.dmo.lay.5.dot
wtasm.dmo.lay.3.dot
wtasm.dmo.lay.2.dot
wtasm.dmo.lay.contained_reads
wtasm.dmo.lay.1.dot
wtasm.dmo.obt
wtasm.dmo.ovl
wtasm.dmo.ovl.contained
wtasm.fa.gz

What is the meaning of:
wtasm.dmo.lay.dup
wtasm.dmo.lay.lnk
wtasm.dmo.lay.utg
wtasm.dmo.lay.utg.dup
And how do I generate the .cns file?

Thanks!

output fastq from wtcns

Dear ruanjue,

May I ask whether I can get a FASTQ file from wtcns? I need FASTQ for downstream analysis such as read alignment. So far, I can only obtain FASTA from wtcns.

Best,
panpan
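
Whether wtcns can emit FASTQ is not answered on this page. If a downstream tool only needs the FASTQ container and not real base qualities, a stopgap is to convert the FASTA and fill in a constant placeholder quality. The sketch below does that; the quality values it writes are invented, not measured, and the function names are hypothetical.

```python
def read_fasta(path):
    """Yield (name, sequence) for each record in a FASTA file."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name, chunks = line[1:], []
            elif name is not None and line:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fasta_to_fastq(fasta_path, fastq_path, quality_char="I"):
    """Write each FASTA record as FASTQ with a constant quality string.

    "I" is Phred 40 in the Phred+33 encoding; it is an arbitrary placeholder.
    Returns the number of records written.
    """
    count = 0
    with open(fastq_path, "w") as out:
        for name, seq in read_fasta(fasta_path):
            out.write(f"@{name}\n{seq}\n+\n{quality_char * len(seq)}\n")
            count += 1
    return count
```

Any analysis that weighs base qualities should not rely on a file produced this way.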

Nanopore sequence commands

Hi,
I would like to assemble a plant mitochondrial genome from nanopore sequence data using SMARTdenovo, but I could not find any commands for running this kind of data. Could you please provide them?

Thank you.

dmo.lay.

Hi,

I am trying to use smartdenovo to assemble an invertebrate genome (700 Mb genome size) from 30X coverage of nanopore sequencing. I know that the final output is the .cns file; however, the N50 of the .cns file is ~100 kb, while we got 500 kb for the dmo.lay file (with 5.4G total assembly size). So my questions are the following: Is this decrease normal during the consensus step? Is there any way to use the dmo.lay file and get rid of the redundancy somehow?

Many thanks

Szabolcs

why is wtgbo skipped when using dmo overlapper in smartdenovo.pl?

In smartdenovo.pl, wtgbo is used only in zmo overlapper mode. When using dmo overlapper in wtzmo, is it necessary to use wtgbo to rescue overlaps? I noticed that some overlaps between reads of high frequency, such as those derived from plastid, were missed.

Does smartdenovo apply to PacBio HiFi Reads?

Hi, does smartdenovo apply to PacBio HiFi Reads? Thank you very much!

SMARTdenovo failure

Hi,
I ran SMARTdenovo on some Canu-corrected reads. In the end it finished with this error message:
CA690_canu.mak:27: recipe for target 'CA690_canu.dmo.cns' failed
make: *** [CA690_canu.dmo.cns] Error 1

What can I do?
Cheers Gabi

Generate consensus fasta file with Smartdenovo

Dear developers,

First, thank you for your great tool, smartdenovo.

I ran the tool, and it is very fast. Awesome.

In your documentation, you said to run the following commands :

/path/to/smartdenovo/smartdenovo.pl -p prefix reads.fa > prefix.mak
make -f prefix.mak

That’s what I did but I don’t get a « .cns » file and I don’t have any lines that generate it in my make file.

My make file :

PREFIX=GW_smartdenovo

EXE_PRE=/path/to/smartdenovo/wtpre
EXE_ZMO=/path/to/smartdenovo/wtzmo
EXE_OBT=/path/to/smartdenovo/wtobt
EXE_GBO=/path/to/smartdenovo/wtgbo
EXE_CLP=/path/to/smartdenovo/wtclp
EXE_LAY=/path/to/smartdenovo/wtlay
EXE_CNS=/path/to/smartdenovo/wtcns
N_THREADS=10

all:$(PREFIX).dmo.lay

$(PREFIX).fa.gz:
$(EXE_PRE) -J 5000 raw_reads.GW.fasta | gzip -c -1 > $@

$(PREFIX).dmo.ovl:$(PREFIX).fa.gz
$(EXE_ZMO) -t $(N_THREADS) -i $(PREFIX).fa.gz -fo $@ -k 16 -z 10 -Z 16 -U -1 -m 0.1 -A 1000

$(PREFIX).dmo.obt:$(PREFIX).fa.gz $(PREFIX).dmo.ovl
$(EXE_CLP) -i $(PREFIX).dmo.ovl -fo $@ -d 3 -k 300 -m 0.1 -FT

$(PREFIX).dmo.lay:$(PREFIX).fa.gz $(PREFIX).dmo.obt $(PREFIX).dmo.ovl
$(EXE_LAY) -i $(PREFIX).fa.gz -b $(PREFIX).dmo.obt -j $(PREFIX).dmo.ovl -fo $(PREFIX).dmo.lay -w 300 -s 200 -m 0.1 -r 0.95 -c 1

My outputs :

GW_smartdenovo.dmo.lay
GW_smartdenovo.dmo.lay.1.dot
GW_smartdenovo.dmo.lay.2.dot
GW_smartdenovo.dmo.lay.3.dot
GW_smartdenovo.dmo.lay.4.dot
GW_smartdenovo.dmo.lay.5.dot
GW_smartdenovo.dmo.lay.contained_reads
GW_smartdenovo.dmo.lay.dup
GW_smartdenovo.dmo.lay.lnk
GW_smartdenovo.dmo.lay.utg
GW_smartdenovo.dmo.lay.utg.dup
GW_smartdenovo.dmo.obt
GW_smartdenovo.dmo.ovl
GW_smartdenovo.dmo.ovl.contained
GW_smartdenovo.fa.gz
GW_smartdenovo.mak

Is the file « .dmo.lay.utg » the final assembly? Was there a polishing step? Note that I have « quiver » available on the server where I ran smartdenovo.

Thank you for your response.
Best,
Amandine

wtpoa-cns bam correction stage segmentation fault

Dear colleague,
Greetings. I am trying your wtdbg2 v2.2 on my data. I have about two million long reads, and wtdbg2/wtpoa-cns successfully assembled them into several thousand contigs. However, I ran into difficulties when trying to correct the assembly using a BAM file generated with minimap2. The command I issued is exactly the one given in the wiki, which is

samtools view profile.ctg.lay.map.srt | ./wtpoa-cns -t 16 -d profile.ctg.lay.fa -i - -fo profile.ctg.lay.2nd.fa -v

I added "-v" for more verbose output and see the program stops here

(omitted)
JOINT 222259 qe = 2005 -> 2005 te = 497 -> 497 [ 2005, 2015] [ 499, 496,1,2,0]
Segmentation fault (core dumped)

Please kindly let me know if you have any suggestions. Thank you.

Does it support fa.gz as the input?

Well, I have some corrected ONT data that I compressed to save storage. Can I use the .gz file as the input? I don't know what "wtpre" does. Can I modify the mak file and skip the "wtpre" step? My mak file looks like this:

PREFIX=species

EXE_PRE=/smartdenovo/wtpre
EXE_ZMO=/smartdenovo/wtzmo
EXE_OBT=/smartdenovo/wtobt
EXE_GBO=/smartdenovo/wtgbo
EXE_CLP=/smartdenovo/wtclp
EXE_LAY=/smartdenovo/wtlay
EXE_CNS=/smartdenovo/wtcns
N_THREADS=20

all:$(PREFIX).dmo.lay
#$(PREFIX).fa.gz:
# $(EXE_PRE) -J 5000 species.correctedReads.fasta.gz | gzip -c -1 > $@

$(PREFIX).dmo.ovl:$(PREFIX).fa.gz
$(EXE_ZMO) -t $(N_THREADS) -i $(PREFIX).fa.gz -fo $@ -k 21 -z 10 -Z 16 -U -1 -m 0.1 -A 1000

$(PREFIX).dmo.obt:$(PREFIX).fa.gz $(PREFIX).dmo.ovl
$(EXE_CLP) -i $(PREFIX).dmo.ovl -fo $@ -d 3 -k 300 -m 0.1 -FT

$(PREFIX).dmo.lay:$(PREFIX).fa.gz $(PREFIX).dmo.obt $(PREFIX).dmo.ovl
$(EXE_LAY) -i $(PREFIX).fa.gz -b $(PREFIX).dmo.obt -j $(PREFIX).dmo.ovl -fo $(PREFIX).dmo.lay -w 300 -s 200 -m 0.1 -r 0.95 -c 1

Thank you for your reply.

error while running smartdenovo

I am getting the following error when I run the software. These are the commands I used:
/home/pbp/smartdenovo/smartdenovo.pl -p haemo3_smartdenovo -c 1 /home/pbp/Documents/run7/run7_filtered/PSHaemo3.fastq > /home/pbp/Documents/run7/run7_filtered/haemo3_smartdenovo.mak
make -f haemo3_smartdenovo.mak
make -f pbp_smartdenovo.mak
make: pbp_smartdenovo.mak: No such file or directory
make: *** No rule to make target `pbp_smartdenovo.mak'. Stop.
Please help me resolve this issue.
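
Judging only from the commands quoted in this post, one likely cause is a name mismatch: the makefile was written as haemo3_smartdenovo.mak inside /home/pbp/Documents/run7/run7_filtered/, while make was asked for pbp_smartdenovo.mak, a file that none of the quoted commands creates. A minimal Python sketch that derives both command lines from a single prefix, so the two names cannot drift apart. The helper name and the decision to place the makefile in the working directory are illustrative assumptions, not part of smartdenovo; only the -p and -c options already shown in the post are used.

```python
import shlex
from pathlib import PurePosixPath


def smartdenovo_commands(script, prefix, reads, workdir):
    """Return (generate, run) shell command strings sharing one makefile path.

    Hypothetical helper for illustration; it only formats strings and
    does not execute anything.
    """
    makefile = PurePosixPath(workdir) / f"{prefix}.mak"
    generate = " ".join([
        shlex.quote(str(script)),
        "-p", shlex.quote(prefix),
        "-c", "1",
        shlex.quote(str(reads)),
        ">", shlex.quote(str(makefile)),
    ])
    # make is pointed at exactly the file the first command wrote.
    run = f"make -f {shlex.quote(str(makefile))}"
    return generate, run


generate, run = smartdenovo_commands(
    "/home/pbp/smartdenovo/smartdenovo.pl",
    "haemo3_smartdenovo",
    "/home/pbp/Documents/run7/run7_filtered/PSHaemo3.fastq",
    "/home/pbp/Documents/run7/run7_filtered",
)
print(generate)
print(run)
```

Passing the full makefile path to make -f avoids depending on which directory the shell happens to be in when make is started.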

Final Assembly File (*.cns) is not generated

Hi,

A successful assembly should generate a prefix.cns file in the same folder, but in my case I cannot find this file. Here is the complete log; could you please help me understand what went wrong?

PS: My input is PacBio Sequel data in FASTA format (already quality filtered).

$ make -f TestSample.mak
make: Warning: File `TestSample.mak' has modification time 18 s in the future
~/smartdenovo/wtpre -J 5000 Reads.fasta | gzip -c -1 > TestSample.fa.gz
~/smartdenovo/wtzmo -t 10 -i TestSample.fa.gz -fo - -k 17 -s 200 -m 0.6 | cut -f1-16 > TestSample.zmo.ovl.short
[Fri Jan 19 12:47:16 2018] loading long reads
[Fri Jan 19 12:48:39 2018] Done, 389105 reads (length >= 0)
[Fri Jan 19 12:48:41 2018] sorted sequences by length dsc
[Fri Jan 19 12:48:41 2018] calculating overlaps, 10 threads
[Fri Jan 19 12:48:41 2018] indexing 1/1
[Fri Jan 19 12:48:41 2018] - scanning kmers (17 bp)
389105 reads
[Fri Jan 19 12:52:26 2018] - high frequency kmer depth is set to 155
[Fri Jan 19 12:52:26 2018] - average kmer depth = 31
[Fri Jan 19 12:52:26 2018] - 402525 high frequency kmers (>=155)
[Fri Jan 19 12:52:26 2018] - indexing 20926250 kmers
389105 reads
[Fri Jan 19 12:55:53 2018] Done
[Fri Jan 19 12:55:53 2018] querying 1/1
000000025600 320978
000000025800 322253
000000025900 323051
000000130400 715924
000000137500 727353
progress: 389105 833476 100.00%, 105970.32 CPU seconds
[Fri Jan 19 15:45:47 2018] Done
~/smartdenovo/wtgbo -t 10 -i TestSample.fa.gz -j TestSample.zmo.ovl.short -fo - | cut -f1-16 > TestSample.zmo.gbo.short
[Fri Jan 19 15:45:47 2018] loading reads
[Fri Jan 19 15:47:00 2018] Done, 389105 reads
[Fri Jan 19 15:47:00 2018] No obt information

[Fri Jan 19 15:47:00 2018] iteration 1
[Fri Jan 19 15:47:00 2018] loading alignments
loaded 833476 overlaps
building edges
380774 fine overlaps
[Fri Jan 19 15:47:02 2018] Done
[Fri Jan 19 15:47:02 2018] calculating edge coverage ...
[Fri Jan 19 15:47:03 2018] removed 30 duplicate edges
[Fri Jan 19 15:47:03 2018] Done
[Fri Jan 19 15:47:03 2018] masked 85365 contained reads
[Fri Jan 19 15:47:03 2018] masked 71350 low coverage (<1) edges
[Fri Jan 19 15:47:03 2018] 'best_overlap' cut 400653 non-best edges
[Fri Jan 19 15:47:03 2018] graph based overlapping
[Fri Jan 19 16:09:58 2018] 389105
[Fri Jan 19 16:09:58 2018] 281326 candidates
[Fri Jan 19 16:09:58 2018] Done, 44582 new overlaps
[Fri Jan 19 16:09:58 2018] anchoring based overlapping
[Fri Jan 19 16:18:53 2018] 389105
[Fri Jan 19 16:18:53 2018] 45340 candidates
[Fri Jan 19 16:18:53 2018] Done, 9790 new overlaps

[Fri Jan 19 16:18:53 2018] iteration 2
[Fri Jan 19 16:18:53 2018] bulding edges
building edges
435146 fine overlaps
[Fri Jan 19 16:18:53 2018] Done
[Fri Jan 19 16:18:53 2018] calculating edge coverage ...
[Fri Jan 19 16:18:53 2018] removed 49433 duplicate edges
[Fri Jan 19 16:18:53 2018] Done
[Fri Jan 19 16:18:53 2018] masked 85482 contained reads
[Fri Jan 19 16:18:53 2018] masked 52618 low coverage (<1) edges
[Fri Jan 19 16:18:53 2018] 'best_overlap' cut 489099 non-best edges
[Fri Jan 19 16:18:53 2018] graph based overlapping
[Fri Jan 19 16:19:39 2018] 389105
[Fri Jan 19 16:19:39 2018] 18555 candidates
[Fri Jan 19 16:19:39 2018] Done, 598 new overlaps
[Fri Jan 19 16:19:39 2018] anchoring based overlapping
[Fri Jan 19 16:19:41 2018] 389105
[Fri Jan 19 16:19:41 2018] 0 candidates
[Fri Jan 19 16:19:41 2018] Done, 0 new overlaps

[Fri Jan 19 16:19:41 2018] iteration 3
[Fri Jan 19 16:19:41 2018] bulding edges
building edges
435744 fine overlaps
[Fri Jan 19 16:19:41 2018] Done
[Fri Jan 19 16:19:41 2018] calculating edge coverage ...
[Fri Jan 19 16:19:42 2018] removed 50023 duplicate edges
[Fri Jan 19 16:19:42 2018] Done
[Fri Jan 19 16:19:42 2018] masked 85482 contained reads
[Fri Jan 19 16:19:42 2018] masked 52612 low coverage (<1) edges
[Fri Jan 19 16:19:42 2018] 'best_overlap' cut 490258 non-best edges
[Fri Jan 19 16:19:42 2018] graph based overlapping
[Fri Jan 19 16:19:44 2018] 389105
[Fri Jan 19 16:19:44 2018] 89 candidates
[Fri Jan 19 16:19:44 2018] Done, 3 new overlaps
[Fri Jan 19 16:19:44 2018] anchoring based overlapping
[Fri Jan 19 16:19:46 2018] 389105
[Fri Jan 19 16:19:46 2018] 0 candidates
[Fri Jan 19 16:19:46 2018] Done, 0 new overlaps

[Fri Jan 19 16:19:46 2018] iteration 4
[Fri Jan 19 16:19:46 2018] bulding edges
building edges
435747 fine overlaps
[Fri Jan 19 16:19:46 2018] Done
[Fri Jan 19 16:19:46 2018] calculating edge coverage ...
[Fri Jan 19 16:19:47 2018] removed 50026 duplicate edges
[Fri Jan 19 16:19:47 2018] Done
[Fri Jan 19 16:19:47 2018] masked 85482 contained reads
[Fri Jan 19 16:19:47 2018] masked 52612 low coverage (<1) edges
[Fri Jan 19 16:19:47 2018] 'best_overlap' cut 490264 non-best edges
[Fri Jan 19 16:19:47 2018] graph based overlapping
[Fri Jan 19 16:19:48 2018] 389105
[Fri Jan 19 16:19:48 2018] 0 candidates
[Fri Jan 19 16:19:48 2018] Done, 0 new overlaps
[Fri Jan 19 16:19:48 2018] anchoring based overlapping
[Fri Jan 19 16:19:50 2018] 389105
[Fri Jan 19 16:19:50 2018] 0 candidates
[Fri Jan 19 16:19:50 2018] Done, 0 new overlaps
~/smartdenovo/wtclp -i TestSample.zmo.ovl.short -i TestSample.zmo.gbo.short -fo TestSample.zmo.obt -F -d 2
[Fri Jan 19 16:19:51 2018] loading alignments
[Fri Jan 19 16:19:53 2018] 922385
[Fri Jan 19 16:19:53 2018] Done, 243517 reads, 922385 overlaps
[Fri Jan 19 16:19:53 2018] clipping based on overlap depth
Before: legal overlaps = 430385
After: legal overlaps = 133017
[Fri Jan 19 16:19:54 2018] Done
[Fri Jan 19 16:19:54 2018] iteration 1
2676 reads were filtered by connection-checking
278 reads were truncated by chimera-checking
legal overlaps = 135341
[Fri Jan 19 16:19:54 2018] iteration 2
4 reads were filtered by connection-checking
11 reads were truncated by chimera-checking
legal overlaps = 135387
[Fri Jan 19 16:19:54 2018] iteration 3
0 reads were filtered by connection-checking
0 reads were truncated by chimera-checking
legal overlaps = 135387
[Fri Jan 19 16:19:54 2018] Done

== Message for debug ==
Sequence coverage statistic:
1046073 10824668 16129060 16678696 13701160 9843146 6814368 4611437 3192178 2359773
1663221 1197984 962899 740646 677049 652907 564440 538527 466340 412399
420808 370740 321274 308875 283371 255599 232553 245738 239084 226107
202768 188779 176133 167918 182581 155351 143535 131581 137333 127253
119882 120116 114681 113873 92210 109550 99129 91885 99879 94600
101932 91627 84404 83529 92875 86413 76855 69795 62975 66319
63799 66225 68975 69673 65977 68269 58591 56706 61120 57096
50637 60998 53594 52469 54132 54345 50195 48843 41447 46475
47198 52709 49962 52236 42375 47926 49525 49452 46554 45652
41429 36299 35484 35829 41193 41163 31880 27408 29407 34106

Total aviable sequences: 961524040 bp
Average Coverage(?): 4
Genome Size(?): 240381010 bp

[Fri Jan 19 16:19:54 2018] output
[Fri Jan 19 16:19:54 2018] Done
~/smartdenovo/wtlay -i TestSample.fa.gz -b TestSample.zmo.obt -j TestSample.zmo.ovl.short -j TestSample.zmo.gbo.short -fo TestSample.zmo.lay -s 200 -m 0.6 -R -r 1 -c 1
[Fri Jan 19 16:19:54 2018] loading reads
[Fri Jan 19 16:20:54 2018] Done, 389105 reads
[Fri Jan 19 16:20:54 2018] loading reads obt information
[Fri Jan 19 16:20:54 2018] Done
[Fri Jan 19 16:20:54 2018] loading alignments
loaded 166589 overlaps
building edges
116045 fine overlaps
[Fri Jan 19 16:20:55 2018] Done
[Fri Jan 19 16:20:55 2018] calculating edge coverage ...
[Fri Jan 19 16:20:56 2018] removed 444 duplicate edges
[Fri Jan 19 16:20:56 2018] Done
[Fri Jan 19 16:20:56 2018] masked 27258 contained reads
[Fri Jan 19 16:20:56 2018] masked 14912 low coverage (<1) edges
[Fri Jan 19 16:20:56 2018] 'best_overlap' cut 100267 non-best edges
1951 tips, 203 bubbles, 2 chimera, 187 non-bog, 0 recoveries
[Fri Jan 19 16:20:56 2018] repair 2341 bog elements
67 tips, 1 bubbles, 0 chimera, 5 non-bog, 0 recoveries
[Fri Jan 19 16:20:57 2018] repair 73 bog elements
7 tips, 0 bubbles, 0 chimera, 0 non-bog, 0 recoveries
[Fri Jan 19 16:20:57 2018] repair 7 bog elements
1 tips, 0 bubbles, 0 chimera, 0 non-bog, 0 recoveries
[Fri Jan 19 16:20:57 2018] repair 1 bog elements
0 tips, 0 bubbles, 0 chimera, 0 non-bog, 0 recoveries
[Fri Jan 19 16:20:57 2018] generated 208703 unitigs
[Fri Jan 19 16:20:57 2018] recovered 878 edges inter unitigs
724 tips, 17 bubbles, 0 chimera, 15 non-bog, 0 recoveries
[Fri Jan 19 16:20:57 2018] repair 756 bog elements
6 tips, 1 bubbles, 0 chimera, 1 non-bog, 0 recoveries
[Fri Jan 19 16:20:57 2018] repair 8 bog elements
0 tips, 0 bubbles, 0 chimera, 0 non-bog, 0 recoveries
[Fri Jan 19 16:20:58 2018] generated 209101 unitigs
[Fri Jan 19 16:20:58 2018] recover 62 edges inter unitigs
[Fri Jan 19 16:21:33 2018] output 329 independent unitigs
[Fri Jan 19 16:22:02 2018] Done

Assembly versus genome size

Dear RuanJue,

I noticed that at the end of the wtclp step there is an estimate of genome size, in this case 1.0 Gbp (shown below), which is the size we expect for the genome. However, the output from wtlay (with parameters: -w 300 -s 200 -m 0.1 -r 0.97 -c 1) has a total assembly size of only 898 Mb. Is there any way to recover the missing ~100 Mb (10%) of sequence to see what it is?

Thank you in advance.

Regards, Chu Shin

--
Total aviable sequences: 51310363234 bp
Average Coverage(?): 51
Genome Size(?): 1006085553 bp
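
For readers comparing these figures: in both wtclp logs quoted on this page, the reported "Genome Size(?)" equals the total available sequence divided by the average coverage, rounded down. That is an observation from the two logs, not a statement about how wtclp derives the coverage itself. A small Python sketch of the arithmetic, with function names chosen here for illustration:

```python
def implied_genome_size(total_bp, average_coverage):
    """Genome size implied by total sequence and coverage (integer division)."""
    return total_bp // average_coverage


def missing_fraction(assembly_bp, genome_bp):
    """Fraction of the estimated genome absent from the assembly."""
    return 1 - assembly_bp / genome_bp


# Figures copied from the two wtclp logs on this page.
assert implied_genome_size(51310363234, 51) == 1006085553
assert implied_genome_size(961524040, 4) == 240381010

# 898 Mb is the approximate wtlay assembly size quoted in the post.
gap = missing_fraction(898_000_000, 1006085553)
print(f"missing fraction: {gap:.1%}")
```

Because the coverage is reported as a whole number, the estimate is coarse: at a coverage of 4, a change of one unit shifts the implied genome size by tens of megabases.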

Getting started, advice, etc

Hi,

Thanks for the great tool.

I have messed around with smartdenovo on a few datasets using the following:

~/software/smartdenovo/smartdenovo/smartdenovo.pl -c 1 data/all_subreads.fasta > wtasm.mak
make -f wtasm.mak

I have messed around with providing all reads and varying degrees of higher quality subsets of reads. It seems like smartdenovo is very sensitive to read quality in ways other tools are not. For example, Miniasm and Abruijn do better with all or most of the reads whereas smartdenovo does better with higher quality subsets of them.

Are there any parameters I should tweak to help smartdenovo exploit all the reads?

best,

John

separating similar chromosome parts in corrected read assemblies

We are assembling a large genome (close to 2 Gb). We corrected the reads with canu and assembled them with smartdenovo. The assembly is 16% smaller than expected. Some chromosomes share large sub-sequences that are assembled only once and show twice the average read depth when we realign the reads to the assembly. We think the sequences of these regions are not different enough to be separated by smartdenovo with standard parameters.
Have you seen this before?
Which parameter(s) should we change to try to separate the two copies?
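
The depth check described in this post can be sketched as follows: given mean read depth per fixed-size window along the assembly, flag the windows whose depth is well above the typical value and estimate how much sequence they might be hiding. This is only an illustration in Python. The 1.5x threshold, the window size and the example depths are assumptions, and the estimate presumes each flagged window stands for exactly two copies.

```python
from statistics import median


def collapsed_windows(depths, window_bp, ratio=1.5):
    """Return (flagged window indices, estimated hidden bp).

    depths: mean read depth per window along the assembly.
    A window is flagged when its depth is at least `ratio` times the median.
    """
    typical = median(depths)
    flagged = [i for i, d in enumerate(depths) if d >= ratio * typical]
    # One extra copy per flagged window, under the two-copy assumption.
    return flagged, len(flagged) * window_bp


# Made-up depths: most windows near 30x, three windows near 60x.
depths = [30, 31, 29, 62, 60, 30, 28, 59, 30, 31]
flagged, hidden_bp = collapsed_windows(depths, window_bp=10_000)
print(flagged, hidden_bp)
```

Summing the hidden base pairs over the whole assembly gives a rough figure to compare against the 16% shortfall.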

missing one entire chromosome in the final assembly

Hello ruanjue,

I have recently been testing smartdenovo on our own ONT data (S. cerevisiae, 200X coverage). The final assembly looks good in general, except that one chromosome (chrIII) is completely missing. All the other chromosomes look good, even the most difficult one, chrXII. I obtained the same result in multiple independent runs on different machines, so I suspect it is related to smartdenovo. Do you have any suggestions? I can share my input reads for testing if you send me your email address. I also observed that smartdenovo consumes quite a lot of memory in certain intermediate steps. Do you have any suggestions about this? Thanks in advance!

Best,
Jia-Xing

Compilation failed on "undefined reference to `cut_biedge_strgraph2'"

Hi,

I just tried to download and compile smartdenovo using your command lines:

git clone https://github.com/ruanjue/smartdenovo.git  
cd smartdenovo  
make

but the compilation ended with:

/tmp/ccBHOwYi.o: In function `merge_bubbles_strgraph':
wtlay.c:(.text+0x984d): undefined reference to `cut_biedge_strgraph2'
wtlay.c:(.text+0x9861): undefined reference to `cut_biedge_strgraph2'
wtlay.c:(.text+0x992d): undefined reference to `cut_biedge_strgraph2'
wtlay.c:(.text+0x994a): undefined reference to `cut_biedge_strgraph2'
collect2: error: ld returned 1 exit status
Makefile:48: recipe for target 'wtlay' failed
make: *** [wtlay] Error 1
make: *** Waiting for unfinished jobs....

Do you have any idea what could resolve this issue?
(I am running an up-to-date Ubuntu 18.04.)

cns output to fasta format

Hi
I got the .cns output from a smartdenovo assembly of nanopore reads.
I want to convert it to FASTA format for QUAST assessment. How do I convert the .cns file to FASTA?

It would also be a great help if you could tell me which program is good for assessing contiguity and sequence identity (the error rate of the nanopore assembly) against a reference genome.

Thanks
sam
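
Before converting anything, it may be worth checking whether the .cns file is already FASTA-like, in which case QUAST might accept it as-is. The layout of .cns is not described in this thread, so that is an assumption to verify, not a documented fact. A minimal Python sketch that checks for ">" header lines and reports record count and total length; the file name and contents in the demo are hypothetical.

```python
import os
import tempfile


def fasta_summary(path):
    """Return (n_records, total_bases) if the file parses as FASTA, else None."""
    n_records = 0
    total_bases = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                n_records += 1
            elif n_records == 0:
                return None  # sequence text before any header: not FASTA
            else:
                total_bases += len(line)
    return (n_records, total_bases) if n_records else None


# Demo on a tiny made-up file standing in for a real .cns output.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.cns")
    with open(demo, "w") as out:
        out.write(">utg1\nACGTACGT\nACGT\n>utg2\nGGCC\n")
    print(fasta_summary(demo))
```

If the function returns None for a real .cns file, the file is in some other layout and would need a proper converter.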

Error running SMARTdenovo

Hi...
While trying to assemble 20 Gb of ONT data, I am getting an "Illegal instruction" error.
I'm not sure what this error means or how to tackle it. Could you please help me fix it?

[peru@juncea]$ perl ~/anaconda3/pkgs/smartdenovo-1.0.0-0/bsta > sdn_Bn6zz.mak
[peru@juncea]$ make -f sdn_Bn6zz.mak
make: Warning: File `sdn_Bn6zz.mak' has modification time 3.5e+03 s in the future
~/anaconda3/pkgs/smartdenovo-1.0.0-0/bin/wtpre -J 5000 Bn6_QC30x.fasta ~/smartdenovo-1.0.0-0/bin/wtzmo -t 8 -i sdn_Bn6zz.fa.gzn_Bn6zz.dmo.ovl -k 16 -z 10 -Z 16 -U -1 -m 0.1 -A 1000
[Wed Sep 13 19:57:43 2017] loading long reads
[Wed Sep 13 20:02:23 2017] Done, 1672538 reads (length >= 0)
[Wed Sep 13 20:02:31 2017] sorted sequences by length dsc
[Wed Sep 13 20:02:31 2017] calculating overlaps, 8 threads
[Wed Sep 13 20:02:31 2017] indexing 1/1
[Wed Sep 13 20:02:31 2017] - scanning kmers (16 bp)
1672538 reads
[Wed Sep 13 20:15:24 2017] - high frequency kmer depth is set to 2405
[Wed Sep 13 20:15:25 2017] - average kmer depth = 481
[Wed Sep 13 20:15:25 2017] - 159317 high frequency kmers (>=2405)
[Wed Sep 13 20:15:25 2017] - indexing 7015528 kmers
1672538 reads
[Wed Sep 13 20:29:32 2017] Done
[Wed Sep 13 20:29:32 2017] querying 1/1
000000000000 0make: *** [sdn_Bn6zz.dmo.ovl] Illegal instruction
make: *** Deleting file `sdn_Bn6zz.dmo.ovl'

Thanks
sam

Resume

Hi, can smartdenovo be resumed if it fails?
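
Since the pipeline on this page is driven by make -f prefix.mak, make's usual rule applies: a step is rerun only when its target is missing or older than one of its prerequisites, so rerunning the same make command generally picks up after the last completed step. The Python sketch below is a simplified model of that rule, not smartdenovo code, and the file names in the demo are hypothetical.

```python
import os
import tempfile


def needs_rebuild(target, prerequisites):
    """Simplified make rule: rebuild if target is missing or older than any input."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


with tempfile.TemporaryDirectory() as tmp:
    reads = os.path.join(tmp, "reads.fa.gz")   # hypothetical prerequisite
    overlaps = os.path.join(tmp, "reads.ovl")  # hypothetical target
    open(reads, "w").close()
    print(needs_rebuild(overlaps, [reads]))    # target missing: step would run
    open(overlaps, "w").close()
    os.utime(reads, (1000, 1000))
    os.utime(overlaps, (2000, 2000))
    print(needs_rebuild(overlaps, [reads]))    # target newer: step would be skipped
```

One caveat: a step that writes through a shell redirect can leave a partial output file behind after a crash, which make would then treat as complete. The "Illegal instruction" log elsewhere on this page shows make deleting the failed target, but checking the last output file by hand before resuming is still a sensible precaution.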

How to run wtzmo in parallel

Hi Ruanjue

I am assembling a big genome (>20 Gbp) and would like to run wtzmo in parallel, e.g. in 10 parts. Can you give an example of how to set the parameters? And if I run wtzmo in parallel, will the memory usage be reduced?

Thank you very much.

Should Nanopore reads be corrected by canu or another corrector before using smartdenovo?

Should Nanopore reads be corrected by canu or another corrector before using smartdenovo?

PacBio and Nanopore Hybrid Assembly

Hello all,
I was wondering whether smartdenovo does hybrid assemblies of PacBio and Nanopore reads.
Or does it only handle either PacBio or Nanopore?
Can someone please help me with this?
Thanks heaps in advance
Saila

smartdenovo make error

When running make in the smartdenovo folder, the following errors were shown:
/tmp/ccTfg5aF.o: In function `merge_bubbles_strgraph':
wtlay.c:(.text+0x984d): undefined reference to `cut_biedge_strgraph2'
wtlay.c:(.text+0x9861): undefined reference to `cut_biedge_strgraph2'
wtlay.c:(.text+0x992d): undefined reference to `cut_biedge_strgraph2'
wtlay.c:(.text+0x994a): undefined reference to `cut_biedge_strgraph2'
collect2: error: ld returned 1 exit status
Makefile:48: recipe for target 'wtlay' failed
make: *** [wtlay] Error 1

About meaning of -c parameter

Dear community,

The -c parameter is used to generate the consensus sequence, but I still do not understand the meaning of the value given to -c. For example:

smartdenovo.pl -c 1 -J 1000 -k ${k} -p Crystal_c1s1_${k}mer ../Crystal_all_1kb.correctedReads.fasta > Crystal_v1.8_1kb_17mer.mak

I don't know what difference it makes if I change -c 1 to -c 2 or -c 5. (Are other values allowed, such as -2?)
So far I have corrected Nanopore reads, and I want to use smartdenovo to assemble them.

Thank you.
