Giter Site home page Giter Site logo

Comments (4)

iminkin avatar iminkin commented on September 27, 2024 1

Hi!

The coverage is calculated as # the total number of positions in all genomes covered by the alignment / total length of the genomes. The alignments blocks output by SibeliaZ do not overlap, at least they are not supposed to. You can add as many input genomes as you want. One more note on coverage: SibeliaZ filters out repeats with many copies, so sometimes the coverage can be less than expected. This behavior is parameter-dependent though.

from sibeliaz.

iminkin avatar iminkin commented on September 27, 2024 1

Hi @ohdongha ,

Sorry for the late reply.

  1. Yes, each position in the genome is only supposed to be a part of a single alignment block.
  2. Yes, lower frequency repeats can be captured, depending on the parameter -a.
  3. MAF can be both reference or non-reference based. I am not sure, but it is likely that the notion of MAF being reference-based comes from the fact that earlier WGA tools like MultiZ were reference-based. SibeliaZ does not treat any particular input genome as a reference. Also, some other aligners (e.g. Cactus) also use MAF, but are not reference-based.
  4. I am not the author of maf2synteny, so I would refer this question to Mikhail Kolmogorov, who wrote the software. AFAIK, maf2synteny follows the general logic of Sibelia, where shorter blocks are gradually merged to form longer ones. This process does not have specific conditions on the multiplicity of the blocks. In other words, the resulting blocks could be either 1:multi, multi:1, or multi:multi, all depending on the actual data.
  5. I think the easiest way to compute it is to write a short Python script analyzing the data. It should be relatively straightforward. For example, BioPython has MAF-parser, which makes reading and operating MAF files very easy (https://biopython.org/docs/1.75/api/Bio.AlignIO.MafIO.html).

Hope this helps!

from sibeliaz.

ohdongha avatar ohdongha commented on September 27, 2024

Thanks, Ilia, for the quick response :) I have a few more questions regarding the "non-overlapping" part since this appears a requirement for maf2synteny.

  1. The 1st alignment from the GRCh38-T2T comparison (attached below) had numerous (=180) sequences from mostly chromosome 1s from the two assemblies. Likewise, the alignments at the beginning of the MAF output contained many sequences aligned. But this number of sequences decreased and later alignments are mostly 1:1.
    human_GRCh38_vs_T2T_firstBlock.maf.txt
    I assume "non-overlapping" means that even if each alignment can be 1:multi, multi:1, or multi:multi, the same genomic region won't appear in more than one alignment. Is it correct?

  2. The 1st alignment appears to be from repeats considering the many sequences in it. I used the default setting (-a 150) which would remove k-mers with frequency >150. Does this mean that repeat with copy number <150 can be still captured in the alignment while higher-copy repeats are more likely removed?

  3. I read that MAF is "reference-based". Does SibeliaZ consider one of the genomes as the reference, e.g. the first one among the two (or many) given as arguments? If so, does the output MAF have a distinction about which is from the reference?

  4. How does the maf2synteny deal with the sequences in 1:multi, multi:1, or multi:multi alignments when merging alignments to create longer synteny blocks? Will they be ignored or somehow some (or one?) of them be included in the synteny block?

  5. Is there an easy way to find out the proportions of each genome that were aligned 1:1 or 1:multi / multi:1 / multi:multi from the MAF or GFF output?

Thanks again!

from sibeliaz.

ohdongha avatar ohdongha commented on September 27, 2024

Thanks a lot, Ilia.
I noticed that sometimes the first sequence of the SibeliaZ output maf was on - strand direction and some tools (e.g. phyloFit of PHAST) don’t like it. In case of Cactus, it seems to estimate an ancestral sequence and put it on top of all alignments (even for a pairwise comparison) (and in + direction) so that the ancestral sequence kind of serve as the reference. But these may be more about the MAF specification not very well defined/agreed upon?

I also noticed that some alignment blocks have sequences from only one species (more precisely, only one chromosome of a species). Does SibeliaZ print paralogous alignment as well?

I will also try ‘maf2synteny’ and the MAF-parser and report back!
Dong-Ha

from sibeliaz.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.