Comments (4)
Sorry, I should have been more specific. The events.txt file has multiple descriptions of the syntenic blocks; I'm specifically interested in the rearranged blocks (e.g., inversions, transpositions, etc.) and not the conserved blocks.
Thanks,
Jeff
from chromeister.
Hello Jeff!
You are correct in your guess: the coordinates are always global (internally, the FASTA header lines, ">sequence ...", get removed). If you want them to be local, there are two approaches:
- You could re-run everything with every chromosome in a separate file (this takes a bit more time but it is still fast).
- You can convert them using the following script, which I just pushed to the repository (remember to update your local copy). It is located in the bin folder and named global_to_local.sh. To use it, run:
  bin/global_to_local.sh comparison.csv comparison.events.txt > localEvents.txt
  Important: the two arguments are comparison.csv, which is the auto-generated CSV file (it contains coordinates), and the events file. The new local coordinates per chromosome will be written to localEvents.txt. Disclaimer: I have not had much time to test it, so be careful with it and check that it works (e.g. coordinates that are too big or negative would be a sign that it does not work). Also, remember that the coordinates in the events file are a coarse-grained approximation!
Note: the script uses a naive approach, which makes it slow if the FASTA files contain many sequences or if the events file contains many events. It is naive because it checks every event against the sequence lengths one by one until it finds one that is bigger. This could easily be improved with a binary search, but it should work without problems for, e.g., a 30-by-30 chromosome comparison.
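The binary-search improvement mentioned above can be sketched in Python. This is not the global_to_local.sh script itself: it assumes 0-based coordinates, sequences concatenated in FASTA order with nothing in between, and that you already have the sequence names and lengths (e.g. from your FASTA files). The function name global_to_local and the variable ends are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def global_to_local(global_pos, names, ends):
    """Map a global coordinate to (sequence name, local coordinate).

    Assumptions (not taken from CHROMEISTER itself): coordinates are 0-based and
    the sequences were concatenated in FASTA order with nothing in between, so
    ends[i] is the exclusive end of sequence i in global space.
    """
    if not 0 <= global_pos < ends[-1]:
        # Same warning sign as in the text: negative or too-big coordinates
        raise ValueError(f"global coordinate {global_pos} outside [0, {ends[-1]})")
    # Binary search: index of the first sequence whose end is greater than the position
    i = bisect_right(ends, global_pos)
    start = ends[i - 1] if i > 0 else 0
    return names[i], global_pos - start


# Hypothetical example: three chromosomes of 100, 50 and 200 bp
names = ["chr1", "chr2", "chr3"]
ends = list(accumulate([100, 50, 200]))  # [100, 150, 350], computed once for all events
for pos in (0, 120, 349):
    print(pos, global_to_local(pos, names, ends))
```

Computing ends once and searching it with bisect makes each lookup logarithmic in the number of sequences, instead of scanning all the lengths for every event.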
Btw: I am no longer able to maintain this repository on a daily basis, as I recently switched jobs. Still, I will try to help when possible!
Hope this helps,
Esteban
Esteban,
Thanks so much for your help - very much appreciated - I'll give your script a try now!
A couple more quick questions: is there a way to export the figure as a higher resolution file? I'm looking for something that I could import into Illustrator to make manuscript-quality figures. Lastly, I'm still trying to wrap my head around the level argument. It appears that it should be set to 4, except when used with large plant genomes, where you indicate that it could be set to 1. What exactly is this argument doing?
I know you've switched jobs and really appreciate you still helping with chromeister.
Jeff
Hey there @pjm43,
> Thanks so much for your help - very much appreciated
Thanks for using CHROMEISTER! :)
> I'll give your script a try now!
Let me know if it works alright!
> is there a way to export the figure as a higher resolution file?
I think this should be possible. In essence, what CHROMEISTER plots is a matrix of 0's and 1's obtained once its computations are done. This matrix is stored in dotplot.mat.raw.txt (note that its first two lines contain the sequence lengths). I have tried something out that might work for you. First, remove the two length lines from the file by running:
tail -n +3 dotplot.mat.raw.txt > dotplot.nolen
Then save the following Python code as a script:
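import matplotlib
matplotlib.use('Agg')  # non-interactive backend; select it before importing pyplot
import matplotlib.pyplot as plt
import numpy as np

# Load the 0/1 dotplot matrix (the two sequence-length lines already removed)
matrix = np.loadtxt('dotplot.nolen')

# One square per matrix cell, no smoothing between cells
fig = plt.figure(figsize=(16, 16))
plt.imshow(matrix, cmap='Greys', interpolation='nearest')

# Write the figure out as SVG
plt.savefig("dotplot.svg")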
Note: remember to change dotplot.nolen to the name of your comparison file.
Note 2: you may need to set up a virtual environment and install the required Python packages (numpy and matplotlib) with pip.
Then run the Python script above; it will save the matrix to an SVG file.
If you are happy with the output, you can then add axis labels and titles (check out the matplotlib documentation for this).
This is the resulting SVG for a normal comparison:
[Figure: resulting SVG dotplot for a normal comparison (image not preserved)]
(Note: you should be able to download the SVG by right-clicking on it; I am not sure how GitHub will render it.)
This is not a very clean solution but it might work for what you need.
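For the step of adding axis labels and a title, here is a minimal sketch under a few assumptions. It reads the raw matrix file directly with np.loadtxt(skiprows=2), which replaces the tail -n +3 step, and labels the axes. The function plot_dotplot and its x_len/y_len parameters are hypothetical: they assume you copy the two sequence lengths (in bp) from the header lines yourself, and which length belongs to which axis should be checked against your comparison.

```python
import matplotlib
matplotlib.use("Agg")  # render to files only, no display needed
import matplotlib.pyplot as plt
import numpy as np


def plot_dotplot(raw_path, out_path, x_len=None, y_len=None,
                 x_label="Sequence X", y_label="Sequence Y",
                 title="CHROMEISTER dotplot"):
    """Plot a 0/1 dotplot matrix with axis labels and a title.

    x_len / y_len (hypothetical parameters) are sequence lengths in bp; when
    both are given, the axes are rescaled from matrix cells to megabases.
    Returns the (rows, cols) shape of the plotted matrix.
    """
    # skiprows=2 drops the two sequence-length lines (same effect as tail -n +3)
    matrix = np.loadtxt(raw_path, skiprows=2, ndmin=2)
    rows, cols = matrix.shape

    extent = None
    if x_len is not None and y_len is not None:
        # (left, right, bottom, top) in Mbp; row 0 stays at the top, as in imshow's default
        extent = (0, x_len / 1e6, y_len / 1e6, 0)
        x_label += " (Mbp)"
        y_label += " (Mbp)"

    fig, ax = plt.subplots(figsize=(16, 16))
    ax.imshow(matrix, cmap="Greys", interpolation="nearest",
              extent=extent, aspect="auto")
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    ax.set_title(title)
    fig.savefig(out_path)  # the extension (.svg, .pdf, .png) selects the format
    plt.close(fig)
    return rows, cols
```

For example, plot_dotplot("dotplot.mat.raw.txt", "dotplot.svg") produces a labeled SVG without the intermediate dotplot.nolen file; the file names are the ones used in this thread and may differ for your comparison.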
IMPORTANT: saving dotplots from CHROMEISTER as vector graphics files was not originally planned, because the PNG file should provide enough detail, especially considering that CHROMEISTER is very, very heuristic. This means that if you make the dotplot at too high a resolution, some boundary pixels might show signal where there is none in reality; i.e., CHROMEISTER gives you a good coarse-grained preview, but going into fine-grained detail might produce artifacts (remember that every pixel in the dotplot typically corresponds to around 100,000 base pairs in sequences of length 100 Mbp!). Always check that the results are correct and that they make sense!
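The "around 100,000 base pairs per pixel" figure is just the sequence length divided by the number of cells along that axis. The sketch below assumes a hypothetical 1000-cell axis (which is what the numbers above imply) and an even split of the sequence across cells; bp_per_cell and cell_to_bp_range are illustrative helpers, not CHROMEISTER functions.

```python
def bp_per_cell(seq_len_bp, matrix_dim):
    """Approximate number of base pairs summarized by one cell along one axis."""
    return seq_len_bp / matrix_dim


def cell_to_bp_range(cell_index, seq_len_bp, matrix_dim):
    """Half-open [start, end) bp interval covered by a cell, assuming an even split."""
    if not 0 <= cell_index < matrix_dim:
        raise ValueError("cell index outside the matrix")
    size = bp_per_cell(seq_len_bp, matrix_dim)
    start = int(cell_index * size)
    end = min(int((cell_index + 1) * size), seq_len_bp)
    return start, end


# Numbers from the text: a 100 Mbp sequence on a (hypothetical) 1000-cell axis
print(bp_per_cell(100_000_000, 1000))           # 100000.0
print(cell_to_bp_range(42, 100_000_000, 1000))  # (4200000, 4300000)
```

A single dark cell therefore only says "somewhere in this ~100 kbp window", which is why fine-grained reading of a high-resolution export can be misleading.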
> I'm still trying to wrap my head around the level argument. It appears that it should be set to 4, except when used with large plant genomes, where you indicate that it could be set to 1. What exactly is this argument doing?
I would not give too much thought to this parameter, but in essence it controls how many nucleotides are skipped when calculating the hash function. If you input z=4 (the default), only 1 in every 4 nucleotides is used to compute the hash of a word. If you use z=1, all of them are used, leading to fewer collisions. Therefore, a low z value is recommended when comparing very large sequences with many repeats (such as plants) to avoid "too many" collisions. In my experience, however, this affects the output very little and only in some comparisons, so I think you are probably safe leaving it at 4.
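To make the collision argument concrete, here is an illustrative sketch, not CHROMEISTER's actual hash function: a word is hashed from only every z-th nucleotide at 2 bits per base, and distinct words sharing a hash are counted. The 32-nt word length, the encoding and the function names are all assumptions chosen for the example.

```python
import random

ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def sampled_hash(word, z):
    """Hash a DNA word from every z-th nucleotide only, 2 bits per sampled base."""
    h = 0
    for base in word[::z]:
        h = (h << 2) | ENCODE[base]
    return h


def count_collisions(words, z):
    """Number of distinct words that share their hash with another distinct word."""
    distinct = set(words)
    return len(distinct) - len({sampled_hash(w, z) for w in distinct})


# Hypothetical data: 10,000 random 32-nt words
random.seed(0)
words = {"".join(random.choices("ACGT", k=32)) for _ in range(10_000)}
print("z=1:", count_collisions(words, 1))  # 0: with every base used, distinct words never share a hash
print("z=4:", count_collisions(words, 4))  # only 8 bases sampled, so many words share a hash
```

With z=4 only 8 of the 32 bases contribute, so there are at most 4^8 = 65,536 possible hashes, and words that differ only in skipped positions always collide; with z=1 every base contributes and fixed-length words map to distinct hashes.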
Hope it helps!
Esteban