kn-bibs / dotplot
Simple visualisation tool for sequences' similarity in bioinformatics
License: GNU Lesser General Public License v3.0
This is the easy one: it is basically a copy of window_size
but with a different name, description and validation. None should be allowed here, and the maximal range should be defined as pow(window_size, 2).
Today we don't have any licence. It should be there to encourage contribution from outside programmers.
Right now we cannot invoke ./dotplot.py
properly because we don't have a shebang pointing to the python3 location.
This is an easy task: one should find out the most generic (cross-platform compatible) shebang ("google is your friend") and place it at the very beginning of the dotplot.py
file.
Later it would be great to test it on both Linux and macOS.
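A commonly used portable form is `#!/usr/bin/env python3`, which asks `env` to look up `python3` on the user's PATH instead of hard-coding an interpreter location. A minimal sketch of how the top of the file could look; the `main` function here is a placeholder, not the project's actual code:

```python
#!/usr/bin/env python3
"""Sketch of the top of dotplot.py; only the first line matters here."""
import sys


def main(argv):
    """Hypothetical entry point standing in for the real program logic."""
    message = "dotplot called with: " + " ".join(argv)
    print(message)
    return message


if __name__ == "__main__":
    main(sys.argv[1:])
```

The file also needs the executable bit set (for example with `chmod +x dotplot.py`) before `./dotplot.py` can be invoked directly.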
At the moment nothing indicates which sequence is on which axis. We should use at least a basic label with the sequence's name.
The plotter should support different window sizes (currently it is fixed at "1", so we only have exact matching).
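One possible reading of "window size", sketched below, is to compare fragments of that length along the diagonal and store the fraction of matching positions. The function name `windowed_scores` and the scoring rule are assumptions made for illustration; the project may define the score differently.

```python
def windowed_scores(seq_a, seq_b, window_size=1):
    """Return a matrix of match fractions between windows of two sequences.

    Cell [j][i] holds the fraction of positions at which the window starting
    at seq_a[i] equals the window starting at seq_b[j]. With window_size=1
    this reduces to exact matching (values 0.0 or 1.0).
    """
    cols = len(seq_a) - window_size + 1
    rows = len(seq_b) - window_size + 1
    if window_size < 1 or cols < 1 or rows < 1:
        raise ValueError("window_size must be between 1 and the shorter length")
    matrix = []
    for j in range(rows):
        row = []
        for i in range(cols):
            # Count matching characters at the same offset in both windows.
            matches = sum(
                seq_a[i + k] == seq_b[j + k] for k in range(window_size)
            )
            row.append(matches / window_size)
        matrix.append(row)
    return matrix
```

Note that under this definition the matrix has `len - window_size + 1` cells per axis, so it is smaller than the sequences themselves whenever `window_size` is greater than 1.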
It would be nice to have some more general information visible after typing:
python3 dotplot.py -h
Only small changes are needed: to start with, one can add a description parameter to ArgumentParser, copying and adapting some text from the README file.
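A minimal sketch of that change, using the standard `argparse` module. The description text and the help strings are placeholders, and the three options are simply the ones that appear in the invocations quoted in these issues, not the program's full interface:

```python
import argparse


def build_parser():
    """Build a parser whose -h output starts with a general description."""
    parser = argparse.ArgumentParser(
        prog="dotplot.py",
        # Placeholder text; the real wording would be adapted from the README.
        description="Simple visualisation tool for sequences' similarity "
                    "in bioinformatics.",
    )
    parser.add_argument("--fasta", nargs=2, metavar="FILE",
                        help="two FASTA files to compare")
    parser.add_argument("--window_size", type=int, default=1,
                        help="size of the comparison window")
    parser.add_argument("--gui", action="store_true",
                        help="start the graphical interface")
    return parser


if __name__ == "__main__":
    build_parser().print_help()
```

With a `description` set, `python3 dotplot.py -h` prints that text between the usage line and the list of options.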
This involves all chunks of code: we should have at least 75% coverage before publishing a beta version.
Dotplot.py and gui.py score lower than 7.5 according to pylint. According to the README, we don't allow that. We should clean up the code: add docstrings, fix import errors, shorten overly long lines, get rid of unused variables, etc.
We should add an option to present the result as a heatmap. I think it could be a good idea to use matplotlib's pyplot package. Examples would be https://plot.ly/matplotlib/heatmaps/ and http://heartland.geocities.jp/ecodata222/ed-e/ede1-3-0.html.
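A rough sketch of what such an option could do with pyplot's `imshow`. Both function names are hypothetical, and the matrix here is a plain exact-match matrix built only to have something to draw; matplotlib is imported inside the drawing function so that it remains an optional dependency.

```python
def identity_matrix(seq_a, seq_b):
    """Return rows of 1.0/0.0 marking where residues of seq_b (rows)
    equal residues of seq_a (columns)."""
    return [[1.0 if a == b else 0.0 for a in seq_a] for b in seq_b]


def show_heatmap(matrix, name_a="sequence 1", name_b="sequence 2"):
    """Draw the matrix as a heatmap; requires matplotlib to be installed."""
    import matplotlib.pyplot as plt  # imported lazily: optional dependency

    # Fix the colour scale to [0, 1] so partial matches are comparable
    # between plots.
    plt.imshow(matrix, cmap="Greys", interpolation="nearest", vmin=0, vmax=1)
    plt.colorbar(label="similarity")
    plt.xlabel(name_a)
    plt.ylabel(name_b)
    plt.show()
```

Calling `show_heatmap(identity_matrix("ACGT", "ACGA"))` would open a window with the plot; any matrix of values between 0 and 1 can be passed in the same way.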
When plotting:
./dotplot.py --fasta 1.fa 2.fa --gui
and
./dotplot.py --fasta 1.fa 2.fa --window_size 10 --gui
in both cases we really consider all residues/nucleotides etc., so the full sequence should be shown in both. In the second case, however, the sequence is trimmed to the dimensions of the plotted matrix, which shouldn't happen.
Currently we display sequences as labels on the sides of the plot if the sequence is shorter than 100 (and only if the chosen drawer is matplotlib).
We could add a command line argument drawer.show_sequences
which would allow users to override this behavior and specify whether sequences should be shown.
Adding sequences to other drawers would also be a nice-to-have feature; the problem arises when window_size != 1
is chosen: then we should refrain from adding sequences in unicode and ascii mode (for simplicity and because those modes are not of the foremost priority).
It would be great if we could display an example or two of a sequence identifier (for each online source) in the window where the user can choose a sequence to download. It might be interactive (shown only after the user selects a given radio button) and implemented by changing the text in some new QLabel (e.g. below the radio buttons).
We can start with the most popular amino acid substitution matrices (like PAM250) and add a switch to the command line arguments.
Using reStructuredText instead of Markdown will make it easy to display the README on the PyPI page.
@pjanek @bartma11 and rest of drawer team:
Maybe we can (besides basic ASCII characters) use special Unicode characters (https://en.wikipedia.org/wiki/Box-drawing_character) that are cross-platform (section DOS?) and produce nice-looking effects. There could also be a switch (an argument for argparse) so that the user can fall back to ASCII on a really odd and old terminal. Do you agree?
The user should be able to save the plot to file.
We have fragments of identical code in plotter.py
file: https://codeclimate.com/github/kn-bibs/dotplot/issues.
This should be easy to fix.
Since the implementation of window_size
we are able to generate "gradients" that show partial matches between fragments of sequences in "windows" of a specified size. You can see the effect by running:
./dotplot.py --fasta 1.fa 2.fa --gui --window_size 2
It would be good to have this functionality available in the UTF8 drawer as well. We can use characters described on the Wikipedia "Block Elements" page, such as ▓, ▒ and ░, to show different percentages (right now we have █ if 1, else [space]; we want ranges defined for different fractions of 1).
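A minimal sketch of such a mapping. The function names and the thresholds (three equal ranges between 0 and 1) are assumptions chosen for illustration; the existing behaviour for exactly 0 and exactly 1 is kept.

```python
def shade_for(score):
    """Map a score in [0, 1] to a character of increasing density."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score == 0.0:
        return " "  # no match: same as the current drawer
    if score == 1.0:
        return "█"  # full match: same as the current drawer
    # Partial matches: three equal ranges (an arbitrary choice).
    if score < 1 / 3:
        return "░"
    if score < 2 / 3:
        return "▒"
    return "▓"


def render(matrix):
    """Turn a matrix of scores into a multi-line string, one row per line."""
    return "\n".join("".join(shade_for(s) for s in row) for row in matrix)
```

For example, `print(render([[1.0, 0.5], [0.0, 0.25]]))` draws a full block next to a medium shade on the first line, and a space next to a light shade on the second.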
Databases we are currently interested in are: NCBI (APIs: https://www.ncbi.nlm.nih.gov/home/develop/api.shtml), Uniprot (http://www.uniprot.org/help/programmatic_access#retrieving_individual_entries), Ensembl (BioMart) (API: http://rest.ensembl.org/)
Now Dotplot generates ASCII plots. We should enable creating more sophisticated graphics. Because it looks nicer, that's why.
We should detect when there is more than one sequence in a FASTA file and ask the user which one they would like to use. It should also be possible to pass the sequence name as a program parameter.
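The detection step could be sketched as follows. `read_fasta` and `choose_sequence` are hypothetical helpers, not existing project functions, and matching on the full header line is a simplifying assumption:

```python
def read_fasta(lines):
    """Return a list of (name, sequence) pairs found in FASTA-formatted lines."""
    records = []
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def choose_sequence(records, wanted_name=None):
    """Pick a record by name; without a name only a single record is unambiguous."""
    if wanted_name is not None:
        for name, sequence in records:
            if name == wanted_name:
                return name, sequence
        raise KeyError(wanted_name)
    if len(records) == 1:
        return records[0]
    names = ", ".join(name for name, _ in records)
    raise ValueError("several sequences found; choose one of: " + names)
```

In the command line version the `ValueError` could be turned into a prompt or an error message listing the available names, while the GUI could show the same list in a selection dialog.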
It would be great to have a built-in window with tutorial or help describing how to:
That's self-explanatory. It would be nice to offer the user an option to undo/redo previous actions. We could add two entries to the menu and create a "state" object which would hold the whole current configuration, and keep a list of those states (or even better, a queue of fixed length).
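One way to sketch the fixed-length queue idea is with `collections.deque` and its `maxlen` argument. The `History` class and its method names are hypothetical, and a state is assumed to be any object that is not mutated after being recorded (a dict is used in the usage note below):

```python
from collections import deque


class History:
    """Keeps the current state plus bounded undo and redo stacks."""

    def __init__(self, initial_state, max_length=20):
        # deque(maxlen=...) silently drops the oldest state when full.
        self._undo = deque(maxlen=max_length)
        self._redo = []
        self.current = initial_state

    def record(self, new_state):
        """Store a new state; a fresh action invalidates the redo stack."""
        self._undo.append(self.current)
        self._redo.clear()
        self.current = new_state

    def undo(self):
        """Step back one state; return False if there is nothing to undo."""
        if not self._undo:
            return False
        self._redo.append(self.current)
        self.current = self._undo.pop()
        return True

    def redo(self):
        """Step forward one state; return False if there is nothing to redo."""
        if not self._redo:
            return False
        self._undo.append(self.current)
        self.current = self._redo.pop()
        return True
```

The two menu entries would then call `undo()` and `redo()` and re-apply `history.current` to the interface, for example after `history.record({"window_size": 2})`.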
We could show the user the last action they performed by changing the text in the status bar to a message like "Sequence file selected" or "Plot generation completed". Currently it shows only "Welcome".
We can place it beside the sequences panel; it can be created in a separate file and imported into the GUI, as it will be used extensively later (but it's not obligatory to start in a new file). We can start with window size control only: a spinner or text field will be enough for the beginning.
We could add more data sources and enable the user to download sequences by gene name (HGNC and Ensembl). We will retrieve the canonical sequence for the given gene.
As of right now our program only accepts text in the FASTA format. We'd like to be able to load any plain text file as long as it meets certain criteria (i.e. it is a valid sequence). In that situation we also need to provide a way for users to input basic information about the sequence (e.g. the name of the sequence).
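The criteria are not spelled out here, so the sketch below assumes one plausible rule: after removing whitespace, every character must be a standard nucleotide or amino acid letter. The function name `parse_plain_sequence` and both alphabets are assumptions for illustration only.

```python
# Assumed alphabets; the project may want to accept more symbols
# (ambiguity codes, gaps, stop codons).
NUCLEOTIDES = set("ACGTUN")
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_plain_sequence(text, name):
    """Validate plain text as a sequence and pair it with a user-given name."""
    # Drop all whitespace (spaces, tabs, line breaks) and normalise case.
    sequence = "".join(text.split()).upper()
    if not sequence:
        raise ValueError("no sequence characters found")
    invalid = sorted(set(sequence) - (NUCLEOTIDES | AMINO_ACIDS))
    if invalid:
        raise ValueError("unexpected characters: " + "".join(invalid))
    return name, sequence
```

Since a plain text file carries no header line, the name has to come from elsewhere, for instance a command line argument or a text field in the GUI.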
Right now we have implemented functions allowing us to fetch sequences directly from online databases (#23). We should expose some interface so users could use that easily. Two things to be done here: