Comments (9)

AstrMary commented on August 18, 2024

Dear Esteban,

Your tool is exactly what I was looking for!

Thank you again for your help,
Maria

from chromeister.

estebanpw commented on August 18, 2024

Dear Mary,

what do you mean by messy? Can you share the dotplot?
Do you mean something like the Multi-fasta example in the Readme (find it here) but on a larger scale?

What score are you getting?

If there are a lot of contigs and these are unordered, then there is nothing that can be done in chromeister to make it less "messy". Also, what is your aim with the comparison?

If you could share the dotplot (or a section of it, in case you have privacy constraints), then I might be able to help :-)

AstrMary commented on August 18, 2024

Dear Esteban,
thank you for your prompt response.
My score is 0.996, setting z = 1.
If I run this command:

./CHROMEISTER -query Genome1.fa -db Genome2.fa -out dotplot.mat -diffuse 1 && Rscript compute_score.R dotplot.mat 1000

I get the plot below:

[image: dotplot mat filtWithGrids]

If I run this command:

/CHROMEISTER -query Genome1.fa -db /Genome2.fa -out dotplot.mat -diffuse 1 && Rscript compute_score-nogrid.R dotplot.mat 1000

I get the following plot:

[image: dotplot mat filt]

I expected to get something like this:

[image: dotplot mat filtsame]

We are trying to choose our reference genome, so my aim is to compare these two varieties of Olea in order to assess their similarity.
Yes, you are right: the genomes of these two varieties have a great many contigs, so this might be the problem.

Your help would be more than welcome :-)
Many thanks,
Maria

estebanpw commented on August 18, 2024

Dear Mary,

thanks for the info. I think that the problem might be this one:

There are so many contigs (around 40k in one of the genomes) that none of them is long enough on its own for chromeister to consider it an interesting signal (and it might be filtering them out). This would be aggravated if the contigs were unordered (which I assume they are, since otherwise you would probably have assembled them into larger scaffolds; is this true?)
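To check how fragmented each assembly is before comparing, a minimal sketch along these lines could count the contigs and compute their N50. It is not part of CHROMEISTER; the function names `contig_lengths` and `n50` are hypothetical, and it assumes plain, uncompressed FASTA input.

```python
import sys
from typing import Iterable, List


def contig_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Return the length of every record in a FASTA file, in file order."""
    lengths: List[int] = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the assembly."""
    if not lengths:
        return 0
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical script name): python contig_stats.py Genome1.fa
    with open(sys.argv[1]) as handle:
        lens = contig_lengths(handle)
    print(f"contigs: {len(lens)}  total bp: {sum(lens)}  N50: {n50(lens)}")
```

A very high contig count together with a small N50 would be consistent with the explanation above, although what counts as "too short" depends on the filtering chromeister applies.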

In any case I have just included a new script unfiltered_plot.R that should be able to plot without filtering anything, so that you should also be able to see the matches. You can run this script similarly to what you were doing:

(remember to git pull origin first in your local repository)

./CHROMEISTER -query Genome1.fa -db Genome2.fa -out dotplot.mat -diffuse 1 && Rscript unfiltered_plot.R dotplot.mat 1000

For instance, comparing Homo sapiens chrX against Mus musculus chrX with this script generates:

[image: HX-MX-unfiltered]

Note that the plot is now more "blurry" as it includes single matches which are considered "noise" in the original CHROMEISTER pipeline.

And in a case that might be similar to yours, this is a comparison of two contig files (from the same species):

[image: contigs-contigs]

As you can see there are matches between the contigs, but these are scattered around according to the order in the files.

However, if it is in fact the case that contigs are small and unordered, then I doubt that you will get a straight diagonal, but rather a lot of scattered points (such as in the previous contigs example). Let me know if this helps and please post here the new results that you get with the unfiltered script.

Bests,
Esteban

AstrMary commented on August 18, 2024

Dear Esteban,
thank you so much for your new script! You really helped me a lot.

I just ran it and got the plot below:

[image: dotplot mat filt]

Yes, it seems that the contigs are small and unordered. I will try to find a way to order them and then run the new script again.

As soon as I have the new results I will post them :-)

Thank you for your time,
Maria

estebanpw commented on August 18, 2024

I am happy it helped!

Btw: if you have a reference genome (even if it consists of contigs or scaffolds, as long as they are in order), then you can compare the unordered one with the reference and sort the contigs according to the coordinates of the alignments they are matched to. Also, let me know if you find a better way to order them, since it's a problem we have had in the past at our lab.
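The sorting idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the alignments must come from an aligner of your choice, the `(contig name, reference start)` hit format and the `order_contigs` function are hypothetical, and strand, overlaps and multi-chromosome references are ignored.

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple

# Hypothetical hit format: (contig name, start coordinate of the match on the reference).
Hit = Tuple[str, int]


def order_contigs(contigs: Sequence[str], hits: Sequence[Hit]) -> List[str]:
    """Sort contigs by the median reference coordinate of their matches.

    Contigs without any match cannot be placed, so they are appended at the
    end in their original order.
    """
    positions: Dict[str, List[int]] = {}
    for name, ref_start in hits:
        positions.setdefault(name, []).append(ref_start)

    placed = [c for c in contigs if c in positions]
    unplaced = [c for c in contigs if c not in positions]

    # The median is less sensitive than the mean to a few spurious
    # matches far away from where the contig really belongs.
    placed.sort(key=lambda c: median(positions[c]))
    return placed + unplaced


example_contigs = ["c1", "c2", "c3", "c4"]
example_hits = [
    ("c1", 900), ("c1", 1100),
    ("c2", 100),
    ("c3", 500), ("c3", 520), ("c3", 9000),  # 9000 is a stray match
]
print(order_contigs(example_contigs, example_hits))
```

In this toy input `c3` is placed by its median (520) despite the stray match at 9000, and `c4`, which has no matches, goes last.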

Bests,
esteban

AstrMary commented on August 18, 2024

To be honest, this is the first time I have faced this kind of problem, so your advice is very helpful.
Of course, I will post the results with a description of my analysis, and I will be glad to read your comments.
Kind thanks,
Maria

AstrMary commented on August 18, 2024

Dear Esteban,
I hope you are well.
Further to our previous conversation, I checked the quality of the two assemblies and used the better-quality assembly to order the other one. The tool I used for the ordering is Mauve, which I ran from the command line:

java -Xmx500m -cp Mauve.jar org.gel.mauve.contigs.ContigOrderer -output results_dir -ref reference.gbk -draft draft.fasta

Then I ran your tool:

./CHROMEISTER -query ordered.fasta -db reference.fasta -out dotplot.mat -diffuse 1 && Rscript compute_score-nogrid.R dotplot.mat 1000

and I got this plot:

[image: dotplot mat filtWithNoFilter]

Then I ran the following command:

./CHROMEISTER -query ordered.fasta -db reference.fasta -out dotplot.mat -diffuse 1 && Rscript unfiltered_plot.R dotplot.mat 1000

and I got:

[image: dotplot mat filtered ordered]

Any comments :-) ?
Thank you again for your help
Maria

estebanpw commented on August 18, 2024

Dear Maria,

It's looking much better now! I would say that the diagonal is "there"; just some contigs/scaffolds seem to be missing or not in the correct order.

CHROMEISTER won't get you much further though, as it is mostly aimed at producing the "big picture". Do you need the actual alignments?

Also:

The tool that I used for the ordering is the Mauve and I run it through command line
"java -Xmx500m -cp Mauve.jar org.gel.mauve.contigs.ContigOrderer -output results_dir -ref reference.gbk -draft draft.fasta"

I didn't know Mauve could do that. Thanks for letting me know!

Bests,
Esteban
