Comments (6)

ruanjue commented on July 29, 2024

Actually, I have no experience with running CANU->wtdbg. If you have no better option, try using --aln-noskip. Also, please have a look at wtdbg-1.2.8 --help.

from wtdbg2.

zkstewart commented on July 29, 2024

From my experience using a few assemblers (Canu, HGAP4, MECAT, Miniasm, SMARTdenovo, WTDBG, and Flye) what you're seeing is normal. Specifically, for my species, both Canu and MECAT give greatly increased genome sizes (approx. 430-450Mb) relative to the other assemblers (approx. 270-300Mb).

From doing things like running Redundans on the genomes and BUSCO on my predicted gene models, both Canu and MECAT have a lot of redundancy, i.e., they're assembling the diploid strands individually, whereas most other assemblers are collapsing these regions (which is the behaviour that I actually want).

As an example, my Canu BUSCO score has 73.3% duplicated models, whereas WTDBG has only 0.9% (!). Things like this lead me to believe that WTDBG provides the most realistic assembly size out of the programs I've used (even though SMARTdenovo seems to give me the most balanced result for my species from what I've seen thus far).
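For anyone wanting to compare the duplicated percentage across several assemblies, here is a minimal sketch. It assumes BUSCO's short summary contains a one-line result of the form C:..%[S:..%,D:..%],F:..%,M:..%,n:.. and the function name busco_duplicated is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read BUSCO summary text on stdin and print the
# number that follows "D:" (the duplicated percentage), without the % sign.
# Assumes the one-line "C:..%[S:..%,D:..%],..." result format.
busco_duplicated() {
    sed -n 's/.*D:\([0-9.]*\)%.*/\1/p'
}
```

It could be used as `busco_duplicated < short_summary.txt`, where the file name stands in for whatever summary file your BUSCO run produced.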

I think that if you do a similar analysis, including Redundans and gene model prediction, you'll be able to see whether Canu's large genome size is real or whether WTDBG's is more accurate, as I've seen with my data.
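As a first step in that kind of comparison, the total size of each assembly can be checked directly from its FASTA file. This is only a sketch: the function name assembly_size is an assumption, and it simply counts every character on non-header lines.

```shell
#!/bin/sh
# Hypothetical helper: print the total number of sequence characters in a
# FASTA file, i.e. the summed length of all lines that are not ">" headers.
assembly_size() {
    awk '!/^>/ { n += length($0) } END { printf "%.0f\n", n }' "$1"
}
```

Running `assembly_size canu_assembly.fasta` and `assembly_size wtdbg_assembly.fasta` (placeholder file names) gives two numbers that can be compared before and after a redundancy-reduction step.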

Zac.

ruanjue commented on July 29, 2024

Thanks, Zac. Using Redundans and BUSCO is a really good way to choose among several assemblies.

You also gave me some useful news about MECAT. I tried it on a C. elegans PacBio assembly, but got this error:

./mecat.sh: line 7: 56374 Segmentation fault (core dumped) mecat2cns -i 0 -t 96 pairwise.pm.m4 wt.fa corrected.fasta

Here, wt.fa was the input sequences file. Do you by chance know anything about it?

Regards,
Jue

tangerzhang commented on July 29, 2024

Thanks, Zac and Jue. I will try what you did!

zkstewart commented on July 29, 2024

Hi Jue,

I'm not sure whether I encountered that error with mecat2cns; it was a while ago that I performed the assembly. If it helps, below is the script I used to run the program, which finished successfully. It might be useful to see how I formatted all of my program calls.

export HDF5_INCLUDE=/home/stewarz3/various_programs/hdf5/include
export HDF5_LIB=/home/stewarz3/various_programs/hdf5/lib

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/stewarz3/various_programs/hdf5/lib
export PATH=/home/stewarz3/various_programs/MECAT/Linux-amd64/bin:$PATH
export PATH=/home/stewarz3/various_programs/DEXTRACTOR:$PATH

FILEDIR=/home/stewarz3/species_assembly/assembly_ready
FILENAME=species_subreads
PREFIX=species_mecat

CPUS=12
MEM=80
GENSIZE=300000000
COV=75

mecat2pw -j 0 -d $FILEDIR/${FILENAME}.fasta -o ${PREFIX}.fasta.pm.can -w wrk_dir -t $CPUS
mecat2cns -i 0 -t 16 ${PREFIX}.fasta.pm.can $FILEDIR/${FILENAME}.fasta corrected_species_filtered.fasta
extract_sequences corrected_species_filtered.fasta corrected_species_75x.fasta $GENSIZE $COV
mecat2canu -trim-assemble -p $PREFIX -d $PREFIX genomeSize=$GENSIZE ErrorRate=0.02 maxMemory=$MEM maxThreads=$CPUS useGrid=0 Overlapper=mecat2asmpw -pacbio-corrected corrected_species_75x.fasta.fasta

Otherwise, I used MECAT commit de87b0b4794a3270a1f5c2bc92c7ce15653574ab. Sorry, I'm not sure if I can help much beyond this.

ruanjue commented on July 29, 2024

Zac, thanks! I will try it again.
