kn-bibs / dotplot

Simple visualisation tool for sequences' similarity in bioinformatics

License: GNU Lesser General Public License v3.0

Python 99.93% Shell 0.07%

bioinformatics dotplot gene-similarity protein-sequence visualisation

dotplot's Issues

Add "stringency" parameter to CLI (argument_parser.py)

This is the easy one: it is basically a copy of window_size with a different name, description and validation. None should be an allowed value here, and the maximum of the range should be defined as pow(window_size, 2).
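
A minimal sketch of how this could look with argparse. The option names follow the issue, but the help texts, the lower bound of 0 and the helper functions build_parser and parse are assumptions made for illustration, not the code in argument_parser.py.

```python
import argparse


def build_parser():
    """Build a parser with window_size and a stringency option modelled on it."""
    parser = argparse.ArgumentParser(prog="dotplot.py")
    parser.add_argument("--window_size", type=int, default=1,
                        help="Length of the compared window (positive integer).")
    # default=None means that no stringency was requested.
    parser.add_argument("--stringency", type=int, default=None,
                        help="Minimal score; at most window_size squared.")
    return parser


def parse(argv):
    """Parse argv and validate stringency against window_size."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.window_size < 1:
        parser.error("window_size must be a positive integer")
    if args.stringency is not None:
        upper = pow(args.window_size, 2)
        # The lower bound of 0 is an assumption; the issue only names the maximum.
        if not 0 <= args.stringency <= upper:
            parser.error(f"stringency must be between 0 and {upper}")
    return args
```

The range check runs after parsing because the allowed maximum depends on another argument, which a per-argument type function cannot see.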

We need to have a licence

Today we don't have any licence. It should be there to encourage contribution from outside programmers.

Add appropriate shebang to main file

Right now we cannot invoke ./dotplot.py properly because we don't have a shebang pointing to the python3 location.

This is an easy task: one should find out the most generic (cross-platform compatible) shebang ("google is your friend") and place it at the very beginning of the dotplot.py file.

Later it would be great if we could test it on both Linux and macOS.
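
For reference, the form most often recommended for portability asks env to find python3 on the PATH instead of hard-coding the interpreter location. A sketch of the top of the file follows; main is only a placeholder here, and the file also needs the executable bit (chmod +x dotplot.py).

```python
#!/usr/bin/env python3
"""Sketch of the first lines of dotplot.py; main() is a placeholder."""
import sys


def main(argv=None):
    """Placeholder entry point; the real one would parse arguments and plot."""
    argv = sys.argv[1:] if argv is None else argv
    return 0


if __name__ == "__main__":
    main()
```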

Add axes labels

At the moment nothing indicates which sequence is on which axis. We should use at least a basic label with the sequence's name.

Implement "window-size" for plotter

We should allow different window sizes in the plotter (right now it is fixed at "1", so we only have exact matching).
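
One possible way to do it, as a sketch: compare every window of seq_a with every window of seq_b and store the fraction of matching positions. The function name and the scoring by match fraction are assumptions for illustration, not the plotter's actual algorithm.

```python
def windowed_dotplot(seq_a, seq_b, window_size=1):
    """Return a matrix of match fractions for every pair of window starts."""
    if window_size < 1:
        raise ValueError("window_size must be a positive integer")
    # A sequence of length n has n - window_size + 1 full windows.
    rows = max(len(seq_a) - window_size + 1, 0)
    cols = max(len(seq_b) - window_size + 1, 0)
    matrix = []
    for i in range(rows):
        row = []
        for j in range(cols):
            matches = sum(
                seq_a[i + k] == seq_b[j + k] for k in range(window_size)
            )
            row.append(matches / window_size)
        matrix.append(row)
    return matrix
```

With window_size equal to 1 this reduces to exact matching, so every cell is either 0.0 or 1.0. For larger windows the matrix is smaller than the sequences by window_size - 1 in each dimension.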

Add basic description to built-in help

It would be nice to have some more general information visible after typing:
python3 dotplot.py -h
Only small changes are needed: to start with, one can add a description parameter to ArgumentParser, copying and adapting some text from the README file.
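
A small sketch of that change. The description string below is just the project's one-line summary used as a placeholder; the final wording would come from the README.

```python
import argparse

# Placeholder text; the real wording would be adapted from the README.
DESCRIPTION = (
    "Simple visualisation tool for sequences' similarity in bioinformatics."
)


def build_parser():
    """Create a parser whose -h output starts with a general description."""
    return argparse.ArgumentParser(prog="dotplot.py", description=DESCRIPTION)
```

argparse prints the description between the usage line and the list of options.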

Increase tests coverage to at least 75%

This applies to all parts of the code: we should reach at least 75% coverage before publishing a beta version.

Clean dotplot.py and gui.py to raise its score in pylint to at least 7.5

dotplot.py and gui.py score lower than 7.5 in pylint. According to the README, we don't allow that. We should clean up the code: add docstrings, fix import errors, shorten overly long lines, get rid of unused variables etc.

Dotplot should be able to generate a heatmap

We should add an option to present result as a heatmap. I think it could be a good idea to use matplotlib's pyplot package. Examples would be https://plot.ly/matplotlib/heatmaps/ and http://heartland.geocities.jp/ecodata222/ed-e/ede1-3-0.html.

Sequence label in matplotlib does not work well with window_size option

When plotting:

./dotplot.py --fasta 1.fa 2.fa --gui

and

./dotplot.py --fasta 1.fa 2.fa --window_size 10 --gui

in both cases we actually consider all residues/nucleotides etc., so the full sequence should be shown both times. In the second case, however, the sequence is trimmed to the dimensions of the plotted matrix, which shouldn't happen.

Let user decide if sequence should be displayed

Currently we display sequences as labels on sides of plot if the sequence is shorter than 100 (and only if chosen drawer is matplotlib).

We could add a command line argument drawer.show_sequences which would allow users to override this behavior and specify whether sequences should be shown.

Adding sequences to other drawers would also be a nice-to-have feature. The problem arises when a window_size other than 1 is chosen: in that case we should refrain from adding sequences in unicode and ascii mode (for simplicity, and because those modes are not of the foremost priority).

Add exemplar sequence identifiers to chooser

It would be great if we could display an example or two of a sequence identifier (for each online source) in the window where the user chooses a sequence to download. It might be interactive (shown only after the user selects a given radio button) and implemented by changing the text of some new QLabel (e.g. below the radio buttons).

Add different substitution matrices for plotter

We can start with the most popular amino acid substitution matrices (like PAM250) and add a switch to the command line arguments.

Our program should interpret without us!

[proposition] Rewrite "readme" to rst

Using reStructuredText instead of Markdown will allow the README to be displayed easily on pip pages.

Let's use unicode characters for drawing

@pjanek @bartma11 and rest of drawer team:
Maybe we can (besides basic ASCII characters) use special Unicode characters (https://en.wikipedia.org/wiki/Box-drawing_character) that are cross-platform (section DOS?) and produce nice-looking effects. There could also be a switch (an argument for argparse) so the user can fall back to ASCII in case of a really odd and old terminal. Do you agree?

Save plot to file

The user should be able to save the plot to file.

Eliminate repetitive fragments of code in 'plotter.py'

We have fragments of identical code in plotter.py file: https://codeclimate.com/github/kn-bibs/dotplot/issues.

This should be easy to fix.

Use block elements from UTF-8 charset to display different shades of dotplot

Since implementation of window_size we are able to generate "gradients" to show partial matches of fragments of sequences in "windows" of specified size. You can see the effect running:

./dotplot.py --fasta 1.fa 2.fa --gui --window_size 2

It would be good to have this functionality available in the UTF-8 drawer as well. We can use characters described on the Wikipedia "Block Elements" page, like ▓, ▒, ░, to show different percentages (right now we have █ if 1, else [space]; we want to have ranges defined for different fractions of 1).
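
A sketch of one possible mapping. It keeps the current behaviour at the extremes (space for 0, █ for 1) and splits the values in between into three equal ranges; the thresholds and the function names are assumptions, and the ranges are still open for discussion.

```python
def shade(fraction):
    """Map a match fraction in [0, 1] to a block-element character."""
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    if fraction == 0:
        return " "
    if fraction == 1:
        return "█"
    # Three equal ranges for partial matches; the cut-offs are an assumption.
    if fraction < 1 / 3:
        return "░"
    if fraction < 2 / 3:
        return "▒"
    return "▓"


def draw(matrix):
    """Render a matrix of fractions as lines of shaded characters."""
    return "\n".join("".join(shade(value) for value in row) for row in matrix)
```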

We need to be able to download sequences from various databases

Databases we are currently interested in are: NCBI (APIs: https://www.ncbi.nlm.nih.gov/home/develop/api.shtml), Uniprot (http://www.uniprot.org/help/programmatic_access#retrieving_individual_entries), Ensembl (BioMart) (API: http://rest.ensembl.org/)

We should generate graphics

Now Dotplot generates ASCII plots. We should enable creating more sophisticated graphics. Because it looks nicer, that's why.

Reading multiple sequences from one fasta file

We should detect when there is more than one sequence in a fasta file, and ask the user which one they would like to use. We should also allow the sequence name to be passed as a program parameter.
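
A rough sketch of the detection part. The function names are hypothetical, the sequence name is assumed to be the first word of the header line, and instead of asking the user interactively the sketch raises an error listing the available names.

```python
def read_fasta(lines):
    """Return {name: sequence} for every record found in the given lines."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the name is the first word after '>'.
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def pick_sequence(records, wanted=None):
    """Return the requested sequence, or the only one if there is no choice."""
    if wanted is not None:
        if wanted not in records:
            raise KeyError(f"no sequence named {wanted!r}")
        return records[wanted]
    if len(records) == 1:
        return next(iter(records.values()))
    names = ", ".join(records)
    raise ValueError(f"several sequences found, choose one of: {names}")
```

Records with the same name overwrite each other in this sketch, which a real implementation would probably want to report.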

Add help/tutorial window to GUI

It would be great to have a built-in window with tutorial or help describing how to:

interpret the results
use the GUI

"Undo" option

That's self-explanatory. It would be nice to offer the user an option to undo/redo previous actions. We could add two entries to the menu and create a "state" object which would hold the whole current configuration, and keep a list of those states (or even better, a queue of fixed length).
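
A minimal sketch of the fixed-length queue idea using collections.deque. The History class and the limit of 20 are assumptions for illustration; a state can be any object holding the configuration, here a plain dict.

```python
from collections import deque


class History:
    """Keep the current state plus bounded undo and redo stacks."""

    def __init__(self, initial, limit=20):
        # deque(maxlen=...) silently drops the oldest state when full.
        self._undo = deque(maxlen=limit)
        self._redo = []
        self.current = initial

    def record(self, state):
        """Store a new state; a new action invalidates the redo stack."""
        self._undo.append(self.current)
        self._redo.clear()
        self.current = state

    def undo(self):
        """Go back one state if possible and return the current state."""
        if self._undo:
            self._redo.append(self.current)
            self.current = self._undo.pop()
        return self.current

    def redo(self):
        """Reapply the last undone state if any and return the current state."""
        if self._redo:
            self._undo.append(self.current)
            self.current = self._redo.pop()
        return self.current
```

The two menu entries would then simply call undo() and redo() and refresh the GUI from the returned state.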

Show messages in status bar after file selection and plotting

We could show the user the last action they performed by changing the text in the status bar to a log message like "Sequence file selected" or "Plot generation completed". Currently it shows only "Welcome".

Add options panel where a user can set parameters for plotter like "window size" or "stringency"

We can place it beside the sequences panel; it can be created in a separate file and imported into the GUI, as it will be used extensively later (but starting in a new file is not obligatory). We can start with a window size control only: a spinner or a text field will be enough for the beginning.

Fetch sequence by gene

We could add more data sources and enable the user to download sequences by gene name (HGNC and Ensembl). We will retrieve the canonical sequence for the given gene.

Loading non-FASTA plain text files

As of right now our program only accepts text in the FASTA format. We'd like to be able to load any plain text file as long as it meets certain criteria (i.e. it is a valid sequence). In such a situation we also need to provide a way for users to input basic information about the sequence (e.g. the name of the sequence).
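
A sketch of what such a check could look like. The criteria are not decided yet, so the allowed alphabet below (letters plus "*" and "-") and the function name are assumptions; the name is passed in separately because a plain text file has no header to take it from.

```python
import string

# Assumed alphabet: any letter plus '*' (stop) and '-' (gap).
ALLOWED = set(string.ascii_uppercase + "*-")


def load_plain_sequence(text, name="unnamed"):
    """Validate plain text as a sequence and return (name, sequence)."""
    # Join all lines and drop whitespace so wrapped files are accepted.
    sequence = "".join(text.split()).upper()
    if not sequence:
        raise ValueError("no sequence found")
    invalid = sorted(set(sequence) - ALLOWED)
    if invalid:
        raise ValueError(f"invalid characters: {''.join(invalid)}")
    return name, sequence
```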

Graphical and command-line interface for sequence retrieval

Right now we have implemented functions allowing us to fetch sequences directly from online databases (#23). We should expose some interface so users could use that easily. Two things to be done here:

add several options to argument parser
create GUI widgets (maybe dedicated window?) with adequate options