Comments (4)

pjm43 commented on July 1, 2024

Sorry, I should be more specific. The events.txt file has multiple descriptions of the syntenic blocks. I'm specifically interested in the rearranged blocks (i.e., inversions, transpositions, etc.) and not the conserved blocks.
Thanks,
Jeff

from chromeister.

estebanpw commented on July 1, 2024

Hello Jeff!

You are correct in your guess: the coordinates are always global (internally the fasta headers > sequence ... get removed). In case you want these to be local there are two approaches:

  • You could re-run everything with every chromosome in a separate file (this takes a bit more time but it is still fast).
  • You can convert them by using the following script that I just pushed into the repository (remember to update your repository). It is located in the bin folder and named global_to_local.sh. To use it, you have to run bin/global_to_local.sh comparison.csv comparison.events.txt > localEvents.txt. Important: the two provided arguments are the comparison.csv which is the auto-generated csv file (which contains coordinates) and the events file. The new local coordinates per chromosome will be in the localEvents.txt file. Disclaimer: I have not had much time to test it so be careful with it and check that it works (e.g. if you notice that coordinates are too big or negative, then that would be a sign that it does not work). Also, remember that the coordinates in the events file are a coarse-grained approximation!

Note: The script uses a naive approach, which makes it slow if the fasta files contain many sequences or if the events file contains many events. It is naive because it checks every event against the lengths of all the sequences until one that is bigger is found. This could easily be improved with a binary search, but it should work without further problems if you have e.g. 30 by 30 chromosomes.
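The binary search mentioned above could look roughly like the following Python sketch. It is not the code of global_to_local.sh: the function names, the 0-based coordinates, and the idea of passing the sequence lengths as a plain list are all assumptions made for illustration, and parsing of the csv and events files is left out.

```python
from bisect import bisect_right
from itertools import accumulate


def build_offsets(lengths):
    """Return (starts, total): the global start of each sequence and the total length.

    Assumes the sequences were concatenated in the given order, with no gaps.
    """
    ends = list(accumulate(lengths))
    starts = [0] + ends[:-1]
    return starts, ends[-1]


def global_to_local(pos, starts, total):
    """Map a 0-based global coordinate to (sequence index, local coordinate)."""
    if not 0 <= pos < total:
        # Coordinates that are negative or too big signal a problem upstream
        raise ValueError(f"coordinate {pos} is outside [0, {total})")
    # bisect_right finds the first start greater than pos; the sequence
    # containing pos is the one just before it
    idx = bisect_right(starts, pos) - 1
    return idx, pos - starts[idx]


# Hypothetical lengths of three chromosomes
lengths = [1000, 500, 2000]
starts, total = build_offsets(lengths)
print(global_to_local(1200, starts, total))  # (1, 200)
```

Each lookup costs O(log n) in the number of sequences instead of a linear scan, which only starts to matter with many sequences or many events.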

Btw: I am no longer able to maintain this repository on a daily basis as I recently switched jobs. Still I will try to help when possible!

Hope this helps,
Esteban


pjm43 commented on July 1, 2024

Esteban,

Thanks so much for your help - very much appreciated - I'll give your script a try now!

A couple more quick questions - is there a way to export the figure as a higher resolution file? I'm looking for something that I could import into Illustrator to make manuscript quality figures. Lastly, I'm still trying to wrap my head around the level argument. It appears that it should be set to 4, except when used with large plant genomes, where you indicate that it could be set to 1. What exactly is this argument doing?

I know you've switched jobs and really appreciate you still helping with chromeister.

Jeff


estebanpw commented on July 1, 2024

Hey there @pjm43 ,

> Thanks so much for your help - very much appreciated

Thanks for using CHROMEISTER! :)

> I'll give your script a try now!

Let me know if it works alright!

> is there a way to export the figure as a higher resolution file?

I think this should be possible. In essence, what CHROMEISTER plots is basically a matrix of 0's and 1's after it has computed some stuff. This is the matrix called dotplot.mat.raw.txt (although the first two lines include the sequence lengths). I have tried something out that might work out for you. First, remove the two length lines from the file by doing:

tail -n +3 dotplot.mat.raw.txt > dotplot.nolen
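As an alternative to the tail step, the two length lines can probably be skipped at load time with the skiprows argument of np.loadtxt. The sketch below uses a tiny stand-in file, because the exact layout of the two length lines is an assumption here (one number per line); only the "skip the first two lines" part comes from the thread.

```python
import os
import tempfile

import numpy as np

# Build a tiny stand-in for dotplot.mat.raw.txt: two length lines
# (assumed format), followed by a 3x3 matrix of 0/1 values
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dotplot.mat.raw.txt")
    with open(path, "w") as fh:
        fh.write("1000\n2000\n1 0 0\n0 1 0\n0 0 1\n")

    # skiprows=2 ignores the two length lines, like `tail -n +3`
    matrix = np.loadtxt(path, skiprows=2)

print(matrix.shape)  # (3, 3)
```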

Then copy this python code:

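import matplotlib
matplotlib.use('Agg')  # select the non-interactive backend before importing pyplot
import matplotlib.pyplot as plt
import numpy as np

# Load the 0/1 dotplot matrix (the two length lines must already be removed)
matrix = np.loadtxt('dotplot.nolen')

fig = plt.figure(figsize=(16, 16))
# 'nearest' keeps every matrix cell as a sharp square instead of smoothing it
plt.imshow(matrix, cmap='Greys', interpolation='nearest')
plt.savefig("dotplot.svg")  # the .svg extension selects vector output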

Note: Remember to change the dotplot.nolen to the name of your comparison file.
Note 2: You may need to set up a virtual env and install python packages with pip.

Then you can run the above python script and it will save the matrix to an svg file.
If you are happy with the output, you can then add the axes names and titles (check out the matplotlib library for this).

This is the resulting svg for a normal comparison:

[Image: dotplot]

(Note: you should be able to download the svg by right clicking on it, I am not sure how github will render it).
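For the axes names and titles mentioned above, a minimal sketch could look like this. The helper name save_labeled_dotplot, the label texts, and the output file name are placeholders, and a small identity matrix stands in for the real dotplot matrix. Which sequence ends up on which axis is not stated in this thread, so check it against the png that CHROMEISTER produces.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, set before importing pyplot
import matplotlib.pyplot as plt
import numpy as np


def save_labeled_dotplot(matrix, out_path, x_label, y_label, title):
    """Draw a 0/1 matrix as a dotplot with axis labels and a title."""
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.imshow(matrix, cmap="Greys", interpolation="nearest")
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    ax.set_title(title)
    # The file extension picks the format; .svg gives vector output
    fig.savefig(out_path, bbox_inches="tight")
    plt.close(fig)


# Stand-in matrix; replace with np.loadtxt('dotplot.nolen') for real data
demo = np.eye(100)
save_labeled_dotplot(
    demo,
    "dotplot_labeled.svg",
    x_label="Sequence A (placeholder)",
    y_label="Sequence B (placeholder)",
    title="Dotplot (placeholder title)",
)
```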

This is not a very clean solution but it might work for what you need.
IMPORTANT: saving dotplots from CHROMEISTER as a vector graphics file was not planned originally because the png file should provide enough detail - especially considering that CHROMEISTER is very, very heuristic. This means that if you render the dotplot at too high a resolution, some boundary pixels might show signal when there is none in reality, i.e. CHROMEISTER will give you a good coarse-grain preview, but going into the fine-grain detail might produce some artifacts (remember that every pixel in the dotplot typically corresponds to around 100,000 base pairs in sequences of length 100 Mbp!). Always check that the results are correct and that they make sense!

> I'm still trying to wrap my head around the level argument. It appears that it should be set to 4, except when used with large plant genomes, where you indicate that it could be set to 1. What exactly is this argument doing?

I would not give too much thought to this parameter, but in essence it controls how many nucleotides are skipped when calculating the hash function. This means that if you input z=4 (the default), only 1 in every 4 nucleotides will be used to compute the hash of a word. If you use z=1, all of them will be used, thus leading to fewer collisions. Therefore, a low z value is recommended when comparing very large sequences with many repeats (such as plants) to avoid "too many" collisions. But in my experience this affects the output very little and only in some comparisons, so I think you are probably safe leaving it at 4.
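To make the effect of z concrete, here is a toy Python sketch. It is not CHROMEISTER's hash function: sampled_key is a made-up helper, and sampling positions 0, z, 2z, ... is an assumption, since the explanation above only says that 1 in every z nucleotides is used. It shows two different words that become indistinguishable at z=4 but not at z=1.

```python
def sampled_key(word, z):
    """Keep only every z-th nucleotide of the word (positions 0, z, 2z, ...)."""
    return word[::z]


# Two 12-mers that agree at positions 0, 4 and 8 and differ elsewhere
a = "ACGTACGTACGT"
b = "ATTTAGGGACCC"

# With z=4 only positions 0, 4 and 8 contribute, so both words collapse
# to the same key and would collide
print(sampled_key(a, 4), sampled_key(b, 4))  # AAA AAA

# With z=1 every nucleotide contributes and the keys differ
print(sampled_key(a, 1) == sampled_key(b, 1))  # False
```

Fewer sampled positions means more distinct words share a key, which is why a lower z would be expected to reduce collisions in repeat-rich sequences.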

Hope it helps!
Esteban
