Comments (6)
Actually, I have no experience with the Canu -> wtdbg workflow. If you have no better option, try using --aln-noskip. Also, please have a look at wtdbg-1.2.8 --help.
from wtdbg2.
From my experience using a few assemblers (Canu, HGAP4, MECAT, Miniasm, SMARTdenovo, WTDBG, and Flye) what you're seeing is normal. Specifically, for my species, both Canu and MECAT give greatly increased genome sizes (approx. 430-450Mb) relative to the other assemblers (approx. 270-300Mb).
Running Redundans on the assemblies and BUSCO on my predicted gene models shows that both Canu and MECAT have a lot of redundancy, i.e., they're assembling the two haplotypes of the diploid genome separately, whereas most other assemblers collapse these regions (which is the behaviour that I actually want).
As an example, my Canu BUSCO score has 73.3% duplicated models, whereas WTDBG has only 0.9% (!). Things like this lead me to believe that WTDBG provides the most realistic assembly size out of the programs I've used (even though SMARTdenovo seems to give me the most balanced result for my species from what I've seen thus far).
I think if you do a similar analysis, including Redundans and gene model prediction, you'll be able to see whether Canu's large genome size is real or whether WTDBG's is more accurate, similar to what I've seen with my data.
Zac.
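The comparison described above can be sketched in shell. This is a minimal sketch, not the exact commands used in this thread: the function names assembly_size and busco_duplicated are hypothetical helpers, and the BUSCO summary line is assumed to look like C:..%[S:..%,D:..%],F:..%,M:..%,n:... (check the output of your BUSCO version).

```shell
#!/usr/bin/env bash

# Total number of bases in a FASTA file (header lines are skipped,
# spaces, tabs and carriage returns are stripped before counting).
assembly_size() {
    awk '!/^>/ { gsub(/[ \t\r]/, ""); total += length($0) }
         END   { printf "%.0f\n", total }' "$1"
}

# Percentage of duplicated BUSCOs, taken from the first summary line
# containing "D:<number>%" (summary format is an assumption, see above).
busco_duplicated() {
    sed -n 's/.*D:\([0-9.]*\)%.*/\1/p' "$1" | head -n 1
}
```

With one FASTA file and one BUSCO summary per assembler, calling both helpers on each pair gives a small size-versus-duplication table. A large assembly paired with a high duplicated percentage points to uncollapsed haplotypes rather than a genuinely larger genome.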
Thanks, Zac. Combining Redundans and BUSCO is a really good way to choose among several assemblies.
You also gave me useful news about MECAT. I tried it on a C. elegans PacBio assembly, but got: ./mecat.sh: line 7: 56374 Segmentation fault (core dumped) mecat2cns -i 0 -t 96 pairwise.pm.m4 wt.fa corrected.fasta
Here, wt.fa was the input sequences file. Do you by chance know what causes this?
Regards,
Jue
Thanks, Zac and Jue. I will try what you did!
Hi Jue,
I'm not sure whether I encountered that error with mecat2cns; it was a while ago that I performed the assembly. If it helps, this is the script I used to run the program, which finished successfully. It might help to see how I formatted all of my program calls.
# Make the HDF5 libraries and the MECAT / DEXTRACTOR binaries visible
export HDF5_INCLUDE=/home/stewarz3/various_programs/hdf5/include
export HDF5_LIB=/home/stewarz3/various_programs/hdf5/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/stewarz3/various_programs/hdf5/lib
export PATH=/home/stewarz3/various_programs/MECAT/Linux-amd64/bin:$PATH
export PATH=/home/stewarz3/various_programs/DEXTRACTOR:$PATH
# Input location, output prefix, resources, genome size (bp) and coverage
FILEDIR=/home/stewarz3/species_assembly/assembly_ready
FILENAME=species_subreads
PREFIX=species_mecat
CPUS=12
MEM=80
GENSIZE=300000000
COV=75
# Step 1: detect pairwise overlaps between the raw reads
mecat2pw -j 0 -d $FILEDIR/${FILENAME}.fasta -o ${PREFIX}.fasta.pm.can -w wrk_dir -t $CPUS
# Step 2: correct the reads (note: thread count is hard-coded to 16 here rather than $CPUS)
mecat2cns -i 0 -t 16 ${PREFIX}.fasta.pm.can $FILEDIR/${FILENAME}.fasta corrected_species_filtered.fasta
# Step 3: extract corrected reads for the given genome size and coverage
extract_sequences corrected_species_filtered.fasta corrected_species_75x.fasta $GENSIZE $COV
# Step 4: assemble the extracted corrected reads
mecat2canu -trim-assemble -p $PREFIX -d $PREFIX genomeSize=$GENSIZE ErrorRate=0.02 maxMemory=$MEM maxThreads=$CPUS useGrid=0 Overlapper=mecat2asmpw -pacbio-corrected corrected_species_75x.fasta.fasta
Otherwise, I used MECAT commit de87b0b4794a3270a1f5c2bc92c7ce15653574ab. Sorry, I'm not sure if I can help much beyond this.
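As a quick sanity check on the GENSIZE and COV values in the script above, the number of corrected bases that the extraction step aims for can be computed directly. This is a minimal sketch that assumes extract_sequences targets roughly genome size times coverage; the helper name target_bases is hypothetical.

```shell
#!/usr/bin/env bash

# Bases implied by a genome size (bp) and a coverage depth.
# bash integer arithmetic is 64-bit, so values in the tens of Gb are safe.
target_bases() {
    local gensize=$1 cov=$2
    echo $(( gensize * cov ))
}

GENSIZE=300000000
COV=75
TARGET_BASES=$(target_bases "$GENSIZE" "$COV")
echo "Target corrected bases: $TARGET_BASES"
```

If the corrected read set holds far fewer bases than this target, the extraction step has less than the requested coverage to work with, which is worth knowing before starting the assembly.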
ZAC, Thanks! I will try it again.