I compared the N50 of the following three FASTA files:
comb.hifiasm.p_ctg.fa
haplotype_binning_-3-4out.hap1.p_ctg.fa
haplotype_binning_-3-4out.hap2.p_ctg.fa
The N50 is not much different:
combined primary contig N50:
"totalContigLength": "1419992",
"numberOfContigs": "3",
"contigN50": "492778",
"longestContig": "657757",
"totalScaffoldLength": "1419992",
"numberOfScaffolds": "3",
"scaffoldN50": "492778",
"longestScaffold": "657757",
haplotype1 primary contig N50:
"totalContigLength": "1419063",
"numberOfContigs": "3",
"contigN50": "491528",
"longestContig": "658078",
"totalScaffoldLength": "1419063",
"numberOfScaffolds": "3",
"scaffoldN50": "491528",
"longestScaffold": "658078",
haplotype2 primary contig N50:
"totalContigLength": "1416418",
"numberOfContigs": "3",
"contigN50": "492613",
"longestContig": "655973",
"totalScaffoldLength": "1416418",
"numberOfScaffolds": "3",
"scaffoldN50": "492613",
"longestScaffold": "655973",
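To double-check numbers like these, the contig count, total length, longest contig and N50 can be recomputed directly from each FASTA file. A minimal sketch, assuming uncompressed FASTA input; the script and function names are hypothetical, and this is not the tool that produced the statistics above.

```python
import sys
from typing import Iterable, List


def fasta_lengths_from_lines(lines: Iterable[str]) -> List[int]:
    """Return the length of every record in FASTA text (headers start with '>')."""
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first header
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def fasta_lengths(path: str) -> List[int]:
    """Assumes an uncompressed FASTA file such as the *.p_ctg.fa files above."""
    with open(path) as handle:
        return fasta_lengths_from_lines(handle)


def n50(lengths: Iterable[int]) -> int:
    """Smallest length L such that contigs of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


if __name__ == "__main__":
    for path in sys.argv[1:]:
        lens = fasta_lengths(path)
        print(f"{path}\tcontigs={len(lens)}\ttotal={sum(lens)}\t"
              f"longest={max(lens, default=0)}\tN50={n50(lens)}")
```

For the combined assembly, the three contig lengths are 657757 bp, 492778 bp and, from the 1419992 bp total, 269457 bp; `n50` on these returns 492778, matching the reported contigN50.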
They are not much different. I guess this means that the haplotype binning information I provided is no better than what hifiasm can do on its own.
So presumably, if the haplotype-specific HiFi reads I provide spanned much longer haplotype blocks than HiFi reads alone can phase, the result would be better.
Anyway, sorry to bother you.
I was also wondering how the primary unitigs (p_utg.gfa) and the alternate unitigs (r_utg.gfa) differ from the primary contigs.
Is there a way to visualize the .gfa files for these different outputs, for illustration?
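A quick, non-graphical way to see how the unitig graphs differ from the contig output is to count segments, links and connected components in each GFA file. This is a hypothetical helper, not part of hifiasm, and it does not draw the graph. It assumes tab-separated GFA 1 text in which S lines carry the sequence (or `*` plus an `LN:i:` length tag) and L lines carry the two segment names with their orientations.

```python
import sys
from typing import Dict, Iterable, List, Tuple

Link = Tuple[str, str, str, str]  # (from_segment, from_orient, to_segment, to_orient)


def parse_gfa(lines: Iterable[str]) -> Tuple[Dict[str, int], List[Link]]:
    """Collect segment lengths (S lines) and links (L lines) from GFA 1 text."""
    lengths: Dict[str, int] = {}
    links: List[Link] = []
    for raw in lines:
        fields = raw.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            name, seq = fields[1], fields[2]
            length = len(seq) if seq != "*" else 0
            for tag in fields[3:]:
                if tag.startswith("LN:i:"):
                    length = int(tag[5:])  # use the explicit length tag when present
            lengths[name] = length
        elif fields[0] == "L" and len(fields) >= 5:
            links.append((fields[1], fields[2], fields[3], fields[4]))
    return lengths, links


def count_components(names: Iterable[str], links: List[Link]) -> int:
    """Number of connected components, ignoring orientation (union-find)."""
    parent = {n: n for n in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, _, b, _ in links:
        if a in parent and b in parent:
            parent[find(a)] = find(b)
    return sum(1 for n in parent if find(n) == n)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path) as handle:
            lengths, links = parse_gfa(handle)
        print(f"{path}\tsegments={len(lengths)}\tlinks={len(links)}\t"
              f"total_bp={sum(lengths.values())}\t"
              f"components={count_components(lengths, links)}")
```

Running it on each GFA file side by side gives a rough sense of how fragmented and interconnected each graph is compared with the contig set.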
Finally, how can I use the overlap information (comb.hifiasm.ovlp.source.bin/comb.hifiasm.ovlp.reverse.bin) to identify the HiFi reads that overlap with particular HiFi reads of interest among the input reads?
thanks a lot,
zhenzhen
The assemblies with/without -3 and -4 are totally different:
- If you run hifiasm with default settings, it outputs a primary assembly that contains haplotype switches between the two haplotypes. So the primary assembly is not fully phased; it is a mixture of both haplotypes.
- If you run hifiasm with -3 and -4, it tries to output two fully phased assemblies, one for each haplotype (see Figure 1 and Figure 2 in https://arxiv.org/pdf/2008.01237.pdf).
If the N50s of hap1, hap2 and the primary assembly are similar, I guess the reason might be that your sample is too simple. You could try larger and more complex samples.
As for the overlaps, hifiasm can output all overlaps in PAF format with the option --write-paf.
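Once the overlaps are in PAF, finding the reads that overlap reads of interest comes down to matching names in PAF column 1 (query name) and column 6 (target name). A minimal sketch, assuming a plain-text PAF file written with --write-paf (the thread does not give its exact file name) and a hypothetical text file with one read name per line. It does not read the .ovlp.source.bin/.ovlp.reverse.bin files, whose binary layout is not described here. Records are matched in both directions because a read of interest may appear as either query or target.

```python
import sys
from typing import Dict, Iterable, Set


def overlap_partners(paf_lines: Iterable[str], reads_of_interest: Set[str]) -> Dict[str, Set[str]]:
    """Map each read of interest to the set of reads it overlaps (PAF columns 1 and 6)."""
    partners: Dict[str, Set[str]] = {r: set() for r in reads_of_interest}
    for raw in paf_lines:
        fields = raw.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or incomplete lines; PAF has 12 mandatory columns
        query, target = fields[0], fields[5]
        if query in partners and target != query:
            partners[query].add(target)
        if target in partners and query != target:
            partners[target].add(query)
    return partners


if __name__ == "__main__":
    # usage (hypothetical file names): python overlap_partners.py OVERLAPS.paf READS.txt
    paf_path, list_path = sys.argv[1], sys.argv[2]
    with open(list_path) as fh:
        wanted = {line.strip() for line in fh if line.strip()}
    with open(paf_path) as fh:
        result = overlap_partners(fh, wanted)
    for read, others in sorted(result.items()):
        print(read, len(others), ",".join(sorted(others)), sep="\t")
```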
thanks, Haoyu!
When you say large and complex samples, do you mean an input file containing HiFi reads from a bigger region of the genome?
Option -1/-2 or -3/-4 is intended for whole-genome phasing.
I see. Thanks!
I wonder what happens with -3/-4 if I can only designate some of the reads as haplotype-specific, i.e. only a portion of the reads are identified as belonging to one haplotype and provided via -3/-4. How does hifiasm handle the remaining reads? Does it consider all of them homozygous and add them to both haplotypes?
No, in this case, it can still check which read is heterozygous, and then assign heterozygous reads to one haplotype randomly.
That's wonderful! Thank you!
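The behaviour described in the reply about partially labelled reads can be written as a toy model, which may help when reasoning about what ends up in hap1 and hap2. This is not hifiasm's implementation: it takes each read's heterozygous status as a given input (hifiasm determines this itself), and the rule that reads not flagged as heterozygous go to both haplotypes is an assumption inferred from the reply, not something the thread states outright.

```python
import random
from typing import Iterable, Set, Tuple


def assign_reads(
    reads: Iterable[str],
    hap1_list: Set[str],
    hap2_list: Set[str],
    heterozygous: Set[str],
    seed: int = 0,
) -> Tuple[Set[str], Set[str]]:
    """Toy model (assumption, not hifiasm code):
    - reads named in the -3 / -4 style lists keep that label;
    - unlabelled heterozygous reads go to one haplotype chosen at random;
    - all other unlabelled reads are treated as homozygous and go to both."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    hap1: Set[str] = set()
    hap2: Set[str] = set()
    for read in reads:
        if read in hap1_list:
            hap1.add(read)
        elif read in hap2_list:
            hap2.add(read)
        elif read in heterozygous:
            (hap1 if rng.random() < 0.5 else hap2).add(read)
        else:
            hap1.add(read)
            hap2.add(read)
    return hap1, hap2
```

The random choice for unlabelled heterozygous reads mirrors the reply's wording; with real data, such reads would be placed without the long-range consistency that the -3/-4 lists provide.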