estebanpw commented on August 18, 2024

Hello @lijing28101

Nice to hear that you are using Chromeister!

I have just pushed an update to the gecko repository (and updated the chromeister README accordingly) so that coordinates are included when you extract alignments. I think this will help with what you are trying to achieve.

When you run gecko on the output of chromeister, you get a CSV file that already contains the coordinates of each alignment, e.g.:

Type,xStart,yStart,xEnd,yEnd,strand(f/r),block,length,score,ident,similarity,%ident,SeqX,SeqY
Frag,10501365,169863604,10501485,169863724,f,0,121,292,97,60.33,0.80,0,0
Frag,10501365,169863600,10501485,169863720,f,0,121,324,101,66.94,0.83,0,0
Frag,10417407,169989214,10417776,169988845,r,0,370,920,300,62.16,0.81,0,0
Frag,10437686,169985195,10437886,169984995,r,0,201,564,171,70.15,0.85,0,0
Frag,10534666,169927933,10535652,169926947,r,0,987,3452,925,87.44,0.94,0,0
[...]

The second column is xStart (start coordinate on the query), the third is yStart (start coordinate on the reference), and the fourth and fifth columns (xEnd, yEnd) are the corresponding end coordinates. On the reverse strand (r), yEnd is smaller than yStart.
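If you want to pull these coordinates out of the CSV programmatically, here is a minimal sketch using Python's standard csv module. It assumes the columns are exactly the ones in the header above; the function name read_fragments is hypothetical, and skipping lines before the "Type," header is only a precaution in case your file carries a preamble.

```python
import csv
import itertools

# The header and two fragments copied from the example output above.
SAMPLE = """Type,xStart,yStart,xEnd,yEnd,strand(f/r),block,length,score,ident,similarity,%ident,SeqX,SeqY
Frag,10501365,169863604,10501485,169863724,f,0,121,292,97,60.33,0.80,0,0
Frag,10417407,169989214,10417776,169988845,r,0,370,920,300,62.16,0.81,0,0"""


def read_fragments(lines):
    """Yield (xStart, yStart, xEnd, yEnd, strand) for each 'Frag' row.

    Lines before the 'Type,...' header are skipped, in case the file has a
    preamble (an assumption; the example above starts at the header)."""
    it = iter(lines)
    for line in it:
        if line.startswith("Type,"):
            break
    else:
        return  # no header line found
    for row in csv.DictReader(itertools.chain([line], it)):
        if row["Type"] != "Frag":
            continue
        yield (int(row["xStart"]), int(row["yStart"]),
               int(row["xEnd"]), int(row["yEnd"]), row["strand(f/r)"])


frags = list(read_fragments(SAMPLE.splitlines()))
for x1, y1, x2, y2, strand in frags:
    # On the reverse strand yStart > yEnd, so order the reference span explicitly.
    print(f"query {x1}-{x2}  reference {min(y1, y2)}-{max(y1, y2)}  strand {strand}")
```

With a real file, pass the open file handle instead of the sample lines, e.g. read_fragments(open(path)).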

If you also need the alignments themselves along with their coordinates, just add the keyword alignments to your gecko command, like this:

bin/guidefastas.sh query.fasta ref.fasta hits-XY-dotplot.mat.hits 1000 100 60 32 alignments

This will generate an alignments file containing the alignments and their coordinates, such as:

AAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAAAGAAAGAAAGAAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAGAAAGAAAA
||||||||||||||||||||||||||||||||||| | |||  ||  || ||| ||| |||||||||||||||||||||||||||||||||| | | | ||
AAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAAAAGAAAGAAGGAAGGAAGGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAA
@ FORWARD STRAND x1: 10501385 y1: 169863509 x2: 10501485 y2: 169863609 Identity: 88/101 (87.1287%)
TTTTCCCATTGATTAATATTTTTCCTGTTGAGCAGATGAGAGAAAGCCAAAAAAAGCACAGCTGGGCCATTTCCCCTCACTGGGAACGTCATTTCCAGGCACTTTGTGCTTACTTGAT
|||||||||||||| ||||||||| | |||| | |||||||||||||||||||||||||||||||||||||||| |||||||  |||||||||||||| |||| |||||| |||||||
TTTTCCCATTGATTGATATTTTTCTTATTGAACTGATGAGAGAAAGCCAAAAAAAGCACAGCTGGGCCATTTCCTCTCACTGTAAACGTCATTTCCAGTCACTCTGTGCTCACTTGAT
@ REVERSE STRAND x1: 10451224 y1: 169981322 x2: 10451341 y2: 169981205 Identity: 107/118 (90.678%)

Notice that each alignment is followed by a line with its coordinates, e.g. x1: 10501385 y1: 169863509 x2: 10501485 y2: 169863609.
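To collect those coordinates from the alignments file, one option is to match the "@ ... STRAND" footer lines with a regular expression. This is a minimal sketch that assumes every footer follows exactly the layout shown above; parse_footer and FOOTER are hypothetical names, and the sequence and match-bar lines are simply ignored.

```python
import re

# Footer lines look like:
# @ FORWARD STRAND x1: 10501385 y1: 169863509 x2: 10501485 y2: 169863609 Identity: 88/101 (87.1287%)
FOOTER = re.compile(
    r"@\s+(?P<strand>FORWARD|REVERSE)\s+STRAND"
    r"\s+x1:\s*(?P<x1>\d+)\s+y1:\s*(?P<y1>\d+)"
    r"\s+x2:\s*(?P<x2>\d+)\s+y2:\s*(?P<y2>\d+)"
    r"\s+Identity:\s*(?P<matches>\d+)/(?P<length>\d+)"
)


def parse_footer(line):
    """Return coordinates and identity from a footer line, or None otherwise."""
    m = FOOTER.match(line.strip())
    if m is None:
        return None  # sequence or match-bar line
    rec = {k: v if k == "strand" else int(v) for k, v in m.groupdict().items()}
    rec["identity"] = rec["matches"] / rec["length"]
    return rec


fwd = parse_footer("@ FORWARD STRAND x1: 10501385 y1: 169863509 x2: 10501485 "
                   "y2: 169863609 Identity: 88/101 (87.1287%)")
rev = parse_footer("@ REVERSE STRAND x1: 10451224 y1: 169981322 x2: 10451341 "
                   "y2: 169981205 Identity: 107/118 (90.678%)")
print(fwd["x1"], fwd["x2"], f"{100 * fwd['identity']:.4f}%")
print(rev["y1"], rev["y2"], f"{100 * rev['identity']:.3f}%")
```

In both examples above, x2 - x1 + 1 equals the alignment length (101 and 118), and on the reverse strand y1 is larger than y2.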

Let me know if this helps. If you use it, remember to run git pull origin in your gecko repository on the inmemory_guided_chrom branch.

Best regards,
Esteban

lijing28101 commented on August 18, 2024

Hi Esteban,

I've tried the new version of gecko, but the result is still not what I need.
The coordinates in the CSV for a whole-genome comparison are cumulative, not the actual coordinates within each chromosome.
For example, if chr1 is 1-10000, then the coordinates for chr2 are 10000-20000, chr3 20000-30000, and so on.
I want the coordinates of each block to be relative to its own chromosome.
Furthermore, the syntenic blocks reported for chr1 do not start at the beginning of the chromosome. I tested two maize lines, and the first block is:

#chromeister output
Type,xStart,yStart,xEnd,yEnd,strand(f/r),block,length,score,ident,similarity,%ident,SeqX,SeqY
Frag,1098106,2069163,1099226,2070283,f,0,1121,4076,1070,90.90,0.95,_chr1,_chr1
Frag,1102117,2067025,1102307,2067215,f,0,191,596,170,78.01,0.89,_chr1,_chr1
Frag,1102517,2067414,1103418,2068315,f,0,902,2560,771,70.95,0.85,_chr1,_chr1

However, when I tried MUMmer4, it found syntenic blocks starting from the beginning of the chromosome:

#mummer4 output
chr1    1       1867    chr1    10038   11881   95.45   -
chr1    1       1170    chr1    14911   16057   93.00   -
chr1    1       1819    chr1    273654165       273655922       85.96   +
chr1    1       3628    chr3    892015  895591  87.05   +
chr1    1       2471    chr5    199334512       199336939       91.96   -
chr1    1       1582    chr5    200550730       200552286       94.26   -
chr1    1       7776    chr5    201082035       201089629       89.17   -
chr1    1       7768    chr5    201256076       201263686       91.20   +
chr1    1       3970    chr5    201438340       201442219       93.36   -
chr1    1       13056   chr5    201538604       201551437       90.03   +

Best,
Jing

estebanpw commented on August 18, 2024

Hello @lijing28101

Thank you for your feedback. I have added (and changed) functionality to the chromeister/gecko pipeline in order to achieve what you are asking.

First, remember to update your gecko repository on the inmemory_guided_chrom branch.

Second, to get coordinates that are relative to each chromosome and sorted, you can now run the guidefastas script like this:

bin/guidefastas.sh querySeqs.fasta refSeqs.fasta hits-XY-dotplot.mat.hits 1000 100 60 32 --local
(of course, remember to adjust the dimension/length/similarity/wordsize parameters to your data)
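If you drive the pipeline from a script, it can help to name the four numeric arguments explicitly. The sketch below only builds the argument list; guidefastas_cmd is a hypothetical helper, and it assumes the positional order is dimension, length, similarity, wordsize, as listed above. The thread does not say whether alignments and --local can be combined, so check that before mixing them.

```python
import shlex


def guidefastas_cmd(query, ref, hits, dimension=1000, length=100,
                    similarity=60, wordsize=32,
                    alignments=False, local=False, names=False):
    """Build the argument list for bin/guidefastas.sh (hypothetical helper).

    Assumes the four numeric parameters are positional in the order
    dimension, length, similarity, wordsize, as listed in the thread."""
    cmd = ["bin/guidefastas.sh", query, ref, hits,
           str(dimension), str(length), str(similarity), str(wordsize)]
    if alignments:
        cmd.append("alignments")  # keyword from the first reply
    if local:
        cmd.append("--local")     # per-sequence, sorted coordinates
    if names:
        cmd.append("--names")     # sequence names instead of indices
    return cmd


cmd = guidefastas_cmd("querySeqs.fasta", "refSeqs.fasta",
                      "hits-XY-dotplot.mat.hits", local=True)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Keeping the command as a list (rather than one string) avoids shell quoting issues when file names contain spaces.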

This converts the coordinates from cumulative (global) to local, relative to each sequence, and sorts the fragments first by sequence and then by coordinate, for example:

Frag,18304,910588,18370,910522,r,0,67,228,62,85.07,0.93,1,3
Frag,18376,910508,19135,909749,r,0,760,2496,692,82.11,0.91,1,3
Frag,1,475077,476,474602,r,0,476,1888,474,99.16,1.00,1,4
Frag,2485,472593,7003,468075,r,0,4519,17956,4504,99.34,1.00,1,4
Frag,6982,468128,7228,467882,r,0,247,756,218,76.52,0.88,1,4
Frag,7184,467927,7505,467606,r,0,322,1184,309,91.93,0.96,1,4
Frag,9256,465864,9326,465794,r,0,71,276,70,97.18,0.99,1,4

Notice that the third fragment, the first one in the 1,4 comparison, starts at position 1.
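To make the cumulative-to-local idea concrete, here is a minimal sketch of that kind of conversion. It is not the actual gecko code: it assumes 1-based cumulative coordinates in which each sequence begins right after the previous one ends, that you know the sequence lengths in FASTA order, and that no fragment crosses a sequence boundary. make_locator and to_local are hypothetical names; the sort key (sequence pair, then xStart) mimics the ordering visible in the output above.

```python
import bisect
from itertools import accumulate


def make_locator(lengths):
    """Return a function mapping a 1-based cumulative coordinate to
    (0-based sequence index, 1-based local coordinate).

    `lengths` are the sequence lengths in FASTA order (an assumption)."""
    starts = [0] + list(accumulate(lengths))[:-1]  # offset of each sequence
    total = sum(lengths)

    def locate(pos):
        if not 1 <= pos <= total:
            raise ValueError(f"coordinate {pos} outside 1..{total}")
        i = bisect.bisect_right(starts, pos - 1) - 1
        return i, pos - starts[i]

    return locate


def to_local(frags, query_lengths, ref_lengths):
    """Convert (xStart, yStart, xEnd, yEnd, strand) fragments to
    (seqX, seqY, xStart, yStart, xEnd, yEnd, strand) with local coordinates,
    sorted by sequence pair and then by position."""
    qloc, rloc = make_locator(query_lengths), make_locator(ref_lengths)
    out = []
    for x1, y1, x2, y2, strand in frags:
        sx, lx1 = qloc(x1)
        sy, ly1 = rloc(y1)
        # Assumes a fragment never crosses a sequence boundary, so its end
        # shifts by the same offset as its start.
        out.append((sx, sy, lx1, ly1, lx1 + (x2 - x1), ly1 + (y2 - y1), strand))
    out.sort(key=lambda f: (f[0], f[1], f[2], f[3]))
    return out


chroms = [10000, 10000, 10000]  # chr1..chr3 from the example in the thread
frags = [(10001, 5, 10050, 54, "f"),
         (20500, 15000, 20600, 14900, "r"),
         (1, 1, 100, 100, "f")]
for f in to_local(frags, chroms, chroms):
    print(f)
```

The binary search over sequence start offsets keeps each lookup logarithmic in the number of sequences, which matters little for a handful of chromosomes but helps with many scaffolds.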

Also, if you would rather see the sequence names instead of indices such as 1,4, run it like this instead:

bin/guidefastas.sh HOMSA.Chr.X.fasta MUSMU.Chr.X.fasta hits-XY-dotplot.mat.hits 1000 100 60 32 --local --names

Finally, when you use --local, two CSV files are generated: the original CSV, which still contains the accumulated coordinates, and a second one named *.localsorted.csv. The latter is the one you want. We keep both so that you can still upload the regular CSV to our visualizer and explore the alignments interactively.
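Once you have the *.localsorted.csv file, a quick per-chromosome-pair summary is a simple check that the local coordinates look sensible. This minimal sketch assumes the file uses the same column header as the first example in this thread (the --local rows above are shown without a header); summarize is a hypothetical name. SeqX and SeqY are kept as strings so the same code works with --names.

```python
import csv
import io
from collections import defaultdict

# Header from the first example plus the --local rows shown above; assumes
# the .localsorted.csv file uses the same columns.
SAMPLE = """Type,xStart,yStart,xEnd,yEnd,strand(f/r),block,length,score,ident,similarity,%ident,SeqX,SeqY
Frag,18304,910588,18370,910522,r,0,67,228,62,85.07,0.93,1,3
Frag,18376,910508,19135,909749,r,0,760,2496,692,82.11,0.91,1,3
Frag,1,475077,476,474602,r,0,476,1888,474,99.16,1.00,1,4
Frag,2485,472593,7003,468075,r,0,4519,17956,4504,99.34,1.00,1,4
Frag,6982,468128,7228,467882,r,0,247,756,218,76.52,0.88,1,4
Frag,7184,467927,7505,467606,r,0,322,1184,309,91.93,0.96,1,4
Frag,9256,465864,9326,465794,r,0,71,276,70,97.18,0.99,1,4
"""


def summarize(handle):
    """Return {(SeqX, SeqY): (fragment count, total aligned length)}."""
    stats = defaultdict(lambda: [0, 0])
    for row in csv.DictReader(handle):
        if row["Type"] != "Frag":
            continue
        s = stats[(row["SeqX"], row["SeqY"])]
        s[0] += 1
        s[1] += int(row["length"])
    return {pair: tuple(v) for pair, v in stats.items()}


for pair, (n, total) in summarize(io.StringIO(SAMPLE)).items():
    print(pair, n, total)
```

For a real run, pass the open *.localsorted.csv file handle instead of io.StringIO(SAMPLE).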

Hope this helps. Since these are new changes, please report any bugs you find.
Best,
Esteban