
kn-bibs / dotplot


Simple visualisation tool for sequences' similarity in bioinformatics

License: GNU Lesser General Public License v3.0

Python 99.93% Shell 0.07%
dotplot visualisation protein-sequence bioinformatics gene-similarity

dotplot's Introduction

Dotplot


Idea

A dotplot is a plot used mainly in biology to visualise the similarity of sequences graphically. Read more on Wikipedia.
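As an illustration of the idea, here is a minimal sketch of a dotplot computed by plain identity comparison. The function names `dotplot_matrix` and `render_ascii` are hypothetical and are not taken from this package; the package itself may compute and draw plots differently.

```python
def dotplot_matrix(first, second):
    """Return a matrix with 1 where symbols are identical, else 0.

    Rows correspond to positions in `second`, columns to positions in `first`.
    """
    return [
        [1 if a == b else 0 for a in first]
        for b in second
    ]


def render_ascii(matrix, mark="*", blank=" "):
    """Draw the matrix as text, one line per row."""
    return "\n".join(
        "".join(mark if cell else blank for cell in row)
        for row in matrix
    )


if __name__ == "__main__":
    # Comparing a sequence with itself gives a full main diagonal.
    print(render_ascii(dotplot_matrix("GATTACA", "GATTACA")))
```

Similar regions of the two sequences show up as diagonal runs of marks.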

Why create a new package?

There are already many programs for creating dotplots. Unfortunately, most of them were created a long time ago and written in old versions of Java. This Python 3 package aims to make generating dotplots much easier for new generations of bioinformaticians.

Installation & usage

Installation with pip

The easiest way to install this package with all its dependencies is to use pip:

pip install dotplot

Manual installation

git clone https://github.com/kn-bibs/dotplot

To use the graphical user interface, you need to have PyQt5 installed, e.g. with:

sudo apt-get install python3-pyqt5

To use matplotlib for drawing, you need to have it installed, e.g. with:

sudo pip3 install matplotlib

Note: If you have chosen manual installation, run the program with the python3 dotplot command (while in the dotplot directory) instead of plain dotplot.

Basic usage

dotplot --fasta 1.fa 2.fa

To use the graphical user interface, type:

dotplot --fasta 1.fa 2.fa --gui

You can also fetch sequences from several sources at once:

dotplot --gui --ncbi NP_001009852 --uniprot P03086

Advanced options

You can set the window size used to create the plot:

dotplot --fasta 1.fa 2.fa --gui --window_size 2

Furthermore, you can combine it with stringency:

dotplot --fasta 1.fa 2.fa --gui --window_size 2 --stringency 2

And you can use a similarity matrix to compare amino acids:

dotplot --fasta 1.fa 2.fa --gui --window_size 2 --stringency 2 --matrix PAM120

Getting help

To see the list of available options, run any of the commands above with the -h option added.

What will it do?

In the future, our application will be able to read a wide range of input formats, and users will be able to parametrise the alignment process and the output format to their liking.

Dependencies & development

We write in Python 3 and are strict about code style, validating with pep8 and pylint. All code merged to master must have a pylint score of at least 7.5. To check this, first install pylint with pip3, then run the following command: python3 -m pylint dotplot.py, replacing dotplot.py with the name of the module to be tested.

dotplot's People

Contributors

agreal1118, bartma11, behoston, chmielowyzboj, kinga322, krassowski, maciosz, magickris93, maryniak95, pasliwa, pjanek, rlatawiec, sienkie, szymek2137, vaira123, xidron


Forkers

sienkie jackywu

dotplot's Issues

Graphical and command-line interface for sequence retrieval

We have now implemented functions that fetch sequences directly from online databases (#23). We should expose an interface so that users can use them easily. Two things need to be done here:

  • add several options to argument parser
  • create GUI widgets (maybe dedicated window?) with adequate options

Let user decide if sequence should be displayed

Currently we display sequences as labels on the sides of the plot if the sequence is shorter than 100 (and only if the chosen drawer is matplotlib).

We could add a command-line argument drawer.show_sequences which would allow users to override this behavior and specify whether sequences should be shown.

Adding sequences to other drawers would also be a nice-to-have feature. The problem arises when a window_size other than 1 is chosen: in that case we should refrain from adding sequences in unicode and ascii modes (for simplicity, and because those modes are not a top priority).
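The proposed override could look roughly like the sketch below. It only covers the length rule and the user's choice; the check for the matplotlib drawer is left out. The names `MAX_LABEL_LENGTH` and `should_show_sequence` are hypothetical and do not come from the existing code.

```python
# Assumed threshold, taken from the "shorter than 100" rule described above.
MAX_LABEL_LENGTH = 100


def should_show_sequence(sequence, override=None):
    """Decide whether a sequence should be drawn as an axis label.

    `override` is None when the user gave no preference; otherwise it is
    True or False and wins over the length-based default.
    """
    if override is not None:
        return override
    return len(sequence) < MAX_LABEL_LENGTH
```

Keeping `None` as a third state lets the current default stay in place when the option is not given.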

Add basic description to built-in help

It would be nice to have some more general information visible after typing:
python3 dotplot.py -h
Only small changes are needed: to start with, one can add the description parameter to ArgumentParser, copying and adapting some text from the README file.
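A minimal sketch of that change, assuming the parser is built with the standard argparse module. The function name `build_parser`, the `nargs` setting and the help text are assumptions for illustration; only the --fasta option name and the description text come from the README.

```python
import argparse


def build_parser():
    """Build a parser whose -h output starts with a general description."""
    parser = argparse.ArgumentParser(
        prog="dotplot",
        # Text borrowed from the project's one-line summary.
        description="Simple visualisation tool for sequences' similarity "
                    "in bioinformatics.",
    )
    # Assumed definition of an option the README already uses.
    parser.add_argument(
        "--fasta",
        nargs="+",
        metavar="FILE",
        help="FASTA files with the sequences to compare",
    )
    return parser


if __name__ == "__main__":
    build_parser().print_help()
```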

Add exemplar sequence identifiers to chooser

It would be great if we could display an example or two of a sequence identifier (for each online source) in the window where the user chooses a sequence to download. It might be interactive (shown only after the user selects a given radio button) and implemented by changing the text in a new QLabel (e.g. below the radio buttons).

"Undo" option

That's self-explanatory. It would be nice to offer the user an option to undo/redo previous actions. We could add two menu entries and create a "state" object which would hold the whole current configuration, and keep a list of those states (or even better, a queue of fixed length).
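One possible shape for this, sketched with a fixed-length queue from the standard library. The class name `History` and the use of plain dictionaries as "state" objects are assumptions made for illustration only.

```python
from collections import deque


class History:
    """Keep a bounded list of previous states for undo/redo."""

    def __init__(self, initial_state, max_length=20):
        # deque(maxlen=...) silently drops the oldest state when full.
        self._undo = deque(maxlen=max_length)
        self._redo = []
        self.current = initial_state

    def record(self, new_state):
        """Store the current state and switch to a new one."""
        self._undo.append(self.current)
        self._redo.clear()  # a new action invalidates the redo chain
        self.current = new_state

    def undo(self):
        """Step back one state; do nothing if there is no history."""
        if self._undo:
            self._redo.append(self.current)
            self.current = self._undo.pop()
        return self.current

    def redo(self):
        """Re-apply the most recently undone state, if any."""
        if self._redo:
            self._undo.append(self.current)
            self.current = self._redo.pop()
        return self.current
```

The two menu entries would then simply call `undo()` and `redo()` and redraw the plot from `current`.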

Add axes labels

At the moment nothing indicates which sequence is on which axis. We should use at least a basic label with the sequence's name.

Sequence label in matplotlib does not work well with window_size option

When plotting:

./dotplot.py --fasta 1.fa 2.fa --gui

and

./dotplot.py --fasta 1.fa 2.fa --window_size 10 --gui

In both cases we really consider all residues/nucleotides etc., so the full sequence should be shown in both. In the second case, however, the sequence is trimmed to the dimensions of the plotted matrix, which shouldn't happen.

We should generate graphics

Now Dotplot generates ASCII plots. We should enable creating more sophisticated graphics. Because it looks nicer, that's why.

Loading non-FASTA plain text files

As of right now our program only accepts text in the FASTA format. We'd like to be able to load any plain text file as long as it meets certain criteria (i.e. it is a valid sequence). In that situation we also need to provide a way for users to input basic information about the sequence (e.g. the name of the sequence).
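A rough sketch of such a loader follows. The allowed alphabet, the function name `parse_plain_sequence` and the default name are all assumptions; the project would need to decide which symbols actually count as a valid sequence.

```python
# Assumed set of allowed symbols: common nucleotide and amino acid letters.
ALLOWED_SYMBOLS = set("ACGTUN") | set("ACDEFGHIKLMNPQRSTVWY")


def parse_plain_sequence(text, name="unnamed"):
    """Turn plain text into a (name, sequence) pair or raise ValueError.

    Whitespace and line breaks are ignored and letters are upper-cased.
    The name has to be supplied by the user, since plain text has no header.
    """
    sequence = "".join(text.split()).upper()
    if not sequence:
        raise ValueError("The file does not contain a sequence.")
    invalid = sorted(set(sequence) - ALLOWED_SYMBOLS)
    if invalid:
        raise ValueError("Unexpected symbols in sequence: " + ", ".join(invalid))
    return name, sequence
```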

Fetch sequence by gene

We could add more data sources and enable the user to download sequences by gene name (HGNC and Ensembl). We would retrieve the canonical sequence for the given gene.

Use block elements from UTF-8 charset to display different shades of dotplot

Since the implementation of window_size we are able to generate "gradients" to show partial matches of fragments of sequences in "windows" of a specified size. You can see the effect by running:

./dotplot.py --fasta 1.fa 2.fa --gui --window_size 2

It would be good to have this functionality available in the UTF-8 drawer as well. We can use characters described on the Wikipedia: Block Elements page, such as ▓, ▒, ░, to show different percentages (right now we have █ if 1, else [space]; we want to have ranges defined for different fractions of 1).
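The mapping from fractions to block characters might be sketched as follows. The thresholds (0.25, 0.5, 0.75) are assumptions chosen for illustration, as are the names `shade` and `render_unicode`; the input is assumed to be a matrix of match fractions between 0 and 1.

```python
def shade(fraction):
    """Pick a block character for a match fraction between 0 and 1."""
    if fraction >= 1.0:
        return "█"
    if fraction >= 0.75:
        return "▓"
    if fraction >= 0.5:
        return "▒"
    if fraction >= 0.25:
        return "░"
    return " "


def render_unicode(matrix):
    """Draw a matrix of fractions as lines of block characters."""
    return "\n".join(
        "".join(shade(value) for value in row)
        for row in matrix
    )


if __name__ == "__main__":
    print(render_unicode([[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]]))
```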

Add appropriate shebang to main file

Right now we cannot invoke ./dotplot.py properly because we don't have a shebang pointing to the python3 location.

This is an easy task: one should find out the most generic (cross-platform compatible) shebang ("google is your friend") and place it at the very beginning of the dotplot.py file.

Later it would be great if we could test it on both Linux and macOS.
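A commonly used form is a shebang that goes through env, which looks python3 up on the PATH instead of hard-coding its location. The sketch below shows where the line goes; the `main` function is a placeholder and not the actual content of dotplot.py.

```python
#!/usr/bin/env python3
"""Placeholder entry point; the real dotplot.py does more than this."""
import sys


def main(argv=None):
    """Print the received arguments and return an exit status."""
    argv = sys.argv[1:] if argv is None else argv
    print("arguments:", argv)
    return 0


if __name__ == "__main__":
    main()
```

The file also has to be marked as executable (for example with chmod +x dotplot.py) before ./dotplot.py can be invoked.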

We need to have a licence

Today we don't have any licence. It should be there to encourage contribution from outside programmers.
