Comments (11)
Could you please zoom in on the utg graph around this 20-Mb region? I'd like to see what the subgraph looks like. Also, could you please show the following numbers from the hifiasm log?
[M::ha_pt_gen] peak_hom: []; peak_het: []
[M::purge_dups] purge duplication coverage threshold: []
from hifiasm.
Thanks! I aligned a 5-Mb sequence from the 20-Mb region to all utg FASTA sequences and found that it mapped to utg000017l (47M).
The requested numbers from the hifiasm log are listed below:
[M::ha_pt_gen] peak_hom: 25; peak_het: -1
[M::purge_dups] purge duplication coverage threshold: 31
from hifiasm.
Based on the mapping of genetic markers, can you assign this 20Mb to other chromosomes?
from hifiasm.
Thank you, Dr. Li! Very strange: this 20-Mb region does not have any genetic markers.
from hifiasm.
A few more things to try:
- Blast pieces from this 20Mb region against the "nt" database and check the top hits.
- Run RepeatMasker to check the repeat content.
- When you map genetic markers, do you see any hits to this 20Mb or do most hits here have ambiguous mappings?
from hifiasm.
Thank you very much for your suggestions!
- I BLASTed a 5-Mb sequence retrieved from this 20Mb region against the nt database. All of the top 10 hits are sequences from the same plant as my sample, so we can rule out contamination. Further, I also aligned this 5-Mb sequence to a high-quality reference genome (contig N50 47 Mb) and found that it partially mapped to many unanchored scaffolds.
- I ran repeat annotation with EDTA, but only the LTR annotation step of the pipeline. LTR density is lower in this 20 Mb region of chr7.
- Thank you for reminding me. I had filtered the markers and retained only uniquely mapped ones, so I mistakenly concluded that this 20Mb region contained no genetic markers. After rechecking, I found that many markers map ambiguously to this region, and those markers are located in different linkage groups.
- Maybe this region is rDNA or another repeat element?
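The effect of the unique-mapping filter described above can be illustrated with a small Python sketch. The input layout (marker, linkage group, contig, position) and all names and values are hypothetical; real marker alignments would first need to be parsed into this form.

```python
from collections import defaultdict

def summarize_region_markers(hits, contig, start, end):
    """Summarize markers with at least one hit in contig:[start, end).

    hits: iterable of (marker, linkage_group, contig, pos) tuples.
    This tuple layout is an assumption made for illustration only.
    """
    total_hits = defaultdict(int)   # marker -> number of hits genome-wide
    region_markers = {}             # marker -> linkage group, for hits in the region
    for marker, group, ctg, pos in hits:
        total_hits[marker] += 1
        if ctg == contig and start <= pos < end:
            region_markers[marker] = group
    unique = [m for m in region_markers if total_hits[m] == 1]
    multi = [m for m in region_markers if total_hits[m] > 1]
    return {
        "unique": len(unique),
        "multi": len(multi),
        "linkage_groups": sorted(set(region_markers.values())),
    }

# Made-up example: two markers hit the region, and both also map elsewhere.
hits = [
    ("m1", "LG7", "chr7", 100),
    ("m2", "LG3", "chr7", 5000),
    ("m2", "LG3", "chr3", 900),
    ("m3", "LG5", "chr7", 5200),
    ("m3", "LG5", "chr5", 40),
]
summary = summarize_region_markers(hits, "chr7", 4000, 6000)
```

In this toy input a uniquely-mapped filter would leave the region with no markers at all, even though two markers from different linkage groups hit it.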
from hifiasm.
- Is your sample an inbred diploid, i.e. two sets of nearly identical chromosomes?
- You should check rDNA and centromere satellite in this 20Mb.
- Run HiCanu and see how HiCanu assembles this region.
from hifiasm.
- If your sample is inbred, it would be better to disable purge_dups using '-l0'.
- To find the unitigs in r_utg that correspond to this 20Mb region, a better way is to find the reads in this region (A-lines in p_ctg) and then grep for them in r_utg. I assume the region corresponds to the tangle between utg000017l and utg000018l. If it is a potential misassembly, the safe option is to drop the 20Mb region of p_ctg at the boundaries of the tangle.
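The read lookup suggested above could be sketched in Python roughly as follows. This assumes the tab-separated A-line layout of hifiasm GFA output (contig or unitig name, start offset, strand, read name, and further fields); the contig name, coordinates, and read names below are made up for illustration.

```python
from collections import defaultdict

def reads_in_region(gfa_lines, contig, start, end):
    """Collect read names from A-lines of `contig` whose start offset is in [start, end)."""
    reads = set()
    for line in gfa_lines:
        if not line.startswith("A\t"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Assumed columns: A, contig name, start on contig, strand, read name, ...
        name, pos, read = fields[1], int(fields[2]), fields[4]
        if name == contig and start <= pos < end:
            reads.add(read)
    return reads

def unitigs_with_reads(gfa_lines, reads):
    """Count, per unitig, how many of `reads` occur in its A-lines."""
    hits = defaultdict(int)
    for line in gfa_lines:
        if not line.startswith("A\t"):
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[4] in reads:
            hits[fields[1]] += 1
    return dict(hits)

# Made-up miniature stand-ins for the p_ctg and r_utg GFA files.
p_ctg = [
    "S\tptg000001l\t*\tLN:i:30000",
    "A\tptg000001l\t0\t+\tread1\t0\t10000\tid:i:0\tHG:A:a",
    "A\tptg000001l\t8000\t-\tread2\t0\t12000\tid:i:1\tHG:A:a",
    "A\tptg000001l\t18000\t+\tread3\t0\t11000\tid:i:2\tHG:A:a",
]
r_utg = [
    "A\tutg000017l\t0\t+\tread2\t0\t12000\tid:i:1\tHG:A:a",
    "A\tutg000018l\t0\t+\tread3\t0\t11000\tid:i:2\tHG:A:a",
    "A\tutg000018l\t5000\t+\tread9\t0\t9000\tid:i:9\tHG:A:a",
]
region_reads = reads_in_region(p_ctg, "ptg000001l", 5000, 20000)
unitig_hits = unitigs_with_reads(r_utg, region_reads)
```

With real files, the lists would be replaced by open file handles, and the unitigs with the most hits would be the ones to inspect in the graph.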
from hifiasm.
Thanks, all!
Yes, it is an inbred haploid; het was 0.232% in my survey analysis, and I assembled the genome using "-l0".
After repeat annotation, 85% of this region was annotated as 180-bp knob repeat, which is a specific tandem repeat in plants.
This region was therefore not assembled in previous studies, which shows that HiFi reads and hifiasm are very efficient and accurate at assembling long tandem repeats. Thank you all again!
Furthermore, I ran a nucmer alignment of utg000017l against itself, and it also shows that the terminal 11 Mb is tandem repeat.
However, I still do not understand why the CCS read coverage drops by half in this region.
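To put a number on the drop, per-window mean depth can be compared with the peak_hom value of 25 from the log. The following Python sketch assumes three-column, tab-separated depth lines (sequence name, 1-based position, depth) as written by `samtools depth`, with every position reported (for example via `-a`); the window size, tolerance, and toy data are arbitrary choices for illustration.

```python
from collections import defaultdict

def windowed_mean_depth(depth_lines, window):
    """Mean depth per (sequence, window index) from 'name<TAB>pos<TAB>depth' lines."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for line in depth_lines:
        chrom, pos, depth = line.rstrip("\n").split("\t")
        key = (chrom, (int(pos) - 1) // window)  # positions are 1-based
        sums[key] += int(depth)
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

def half_coverage_windows(means, peak_hom, tolerance=0.15):
    """Windows whose mean depth divided by peak_hom is within `tolerance` of 0.5."""
    return sorted(k for k, m in means.items() if abs(m / peak_hom - 0.5) <= tolerance)

# Toy data: the first window sits at full depth, the second at about half.
depth_lines = ["chr7\t1\t26", "chr7\t2\t24", "chr7\t3\t13", "chr7\t4\t12"]
means = windowed_mean_depth(depth_lines, window=2)
flagged = half_coverage_windows(means, peak_hom=25)
```

A long run of flagged windows would mark where the half-coverage region begins and ends.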
from hifiasm.
As someone was referring to this issue, I have reread the thread. I am seeing:
- A 20Mb region on the chr7 scaffold that has half of the expected coverage.
- The first 5Mb in this 20Mb is located at the end of utg000017l.
If this description is right, this is not a contig misassembly. You have an inbred diploid genome. One possibility is that this region is diverged between the two haplotypes although the rest of the genome is nearly homozygous. The solution is to remove the diverged copy from the primary assembly. By the way, when you scaffolded the contigs, have you discarded prefix.a_ctg.gfa?
from hifiasm.
Maybe you are right: this half-coverage repeat region may have diverged rapidly between the two haplotypes. Yes, I only used prefix.p_ctg.gfa for further assembly.
from hifiasm.