achtman-lab / grapetree

GrapeTree is a fully interactive, tree visualization program, which supports facile manipulations of both tree layout and metadata. Click the first link to launch: https://achtman-lab.github.io/GrapeTree/MSTree_holder.html

Home Page: https://genome.cshlp.org/content/28/9/1395

License: GNU General Public License v3.0

JavaScript 45.28% Python 9.56% C++ 2.70% HTML 37.66% CSS 4.73% Shell 0.04% Batchfile 0.03%
mlst tree visualization bigdata

grapetree's Introduction

GrapeTree

Launch a local version of GrapeTree!

GrapeTree is an integral part of EnteroBase and we advise that you use GrapeTree through EnteroBase for the best results. However, many people have asked for a stand-alone GrapeTree version that they could use offline or integrate into other applications.

The stand-alone version emulates the EnteroBase version through a lightweight webserver running on your local computer. You interact with the program as you would in EnteroBase: through a web browser. We recommend Google Chrome for best results.

For detailed help please see: http://enterobase.readthedocs.io/en/latest/grapetree/grapetree-about.html

For a formal description, please see the accepted manuscript in Genome Research: https://doi.org/10.1101/gr.232397.117

Installing and Running GrapeTree

There are a number of different ways to interact with GrapeTree. For best results, install via pip:

pip install grapetree
grapetree

We also have ready-made binaries for download here: https://github.com/achtman-lab/GrapeTree/releases

Running on Mac: Download GrapeTree_mac.zip

You will need to unzip GrapeTree_mac.zip (just double click). Inside there will be an app you can drag into your Applications folder. You may be warned about Security settings. If so, right click on the GrapeTree app and then click "Open", and it should be fine.

Running on Windows: Download GrapeTree_win.zip

Once downloaded, you will need to unzip GrapeTree_win.zip, open the extracted folder, and run GrapeTree_win.exe. When you run it for the first time on Windows you might get a prompt about security. On Windows 10, click the small text "More info", and then the button "Run Anyway".

Running from Source code

GrapeTree requires Python 2.7 or Python 3.6 and some additional Python modules (listed in requirements.txt). The easiest way to install these modules is with pip:

pip install -r requirements.txt
chmod +x binaries/*

On Linux or MacOSX you need to make sure the binaries in binaries/ can be executed, which is what the chmod command above does. To run GrapeTree:

  1. Navigate to the directory where you installed GrapeTree.
  2. Run it through python as below.
\GrapeTree>python grapetree.py
 * Running on http://127.0.0.1:8000/ (Press CTRL+C to quit)

The program will automatically open your web browser and you will see the GrapeTree Splash Screen. If at any time you want to restart the page, you can visit http://localhost:8000 in your web browser. To view a tree (Newick or Nexus) or create a tree from an allele profile, just drag and drop the file into the browser window.

Configuration

Runtime behaviour can be configured in grapetree/config.py.

Developers may wish to look at the JavaScript documentation (JSDoc).

Tests

To run tests, run pytest in the top level directory.

pytest

Usage - Command line module for generating Trees

>grapetree -h
usage: MSTrees.py [-h] --profile FNAME [--method TREE] [--matrix MATRIX_TYPE]
                  [--recraft] [--missing HANDLER] [--wgMLST]
                  [--heuristic HEURISTIC] [--n_proc NUMBER_OF_PROCESSES]
                  [--check]

For details, see "https://github.com/achtman-lab/GrapeTree/blob/master/README.md".
In brief, GrapeTree generates a NEWICK tree to the default output (screen)
or a redirect output, e.g., a file.

optional arguments:
  -h, --help            show this help message and exit
  --profile FNAME, -p FNAME
                        [REQUIRED] An input filename of a file containing MLST or SNP character data, 
                        OR a fasta file containing aligned sequences.
  --method TREE, -m TREE
                        "MSTreeV2" [DEFAULT]
                        "MSTree"
                        "NJ": FastME V2 NJ tree
                        "RapidNJ": RapidNJ for very large datasets
                        "distance": p-distance matrix in PHYLIP format.
  --matrix MATRIX_TYPE, -x MATRIX_TYPE
                        "symmetric": [DEFAULT: MSTree and NJ]
                        "asymmetric": [DEFAULT: MSTreeV2].
  --recraft, -r         Triggers local branch recrafting. [DEFAULT: MSTreeV2].
  --missing HANDLER, -y HANDLER
                        ONLY FOR symmetric DISTANCE MATRIX.
                        0: [DEFAULT] ignore missing data in pairwise comparison.
                        1: Remove column with missing data.
                        2: treat data as an allele.
                        3: Absolute number of allelic differences.
  --heuristic HEURISTIC, -t HEURISTIC
                        Tiebreak heuristic used only in MSTree and MSTreeV2
                        "eBurst" [DEFAULT: MSTree]
                        "harmonic" [DEFAULT: MSTreeV2]
  --n_proc NUMBER_OF_PROCESSES, -n NUMBER_OF_PROCESSES
                        Number of CPU processes in parallel use. [DEFAULT]: 5.
  --check, -c           Only calculate the expected time/memory requirements.

NOTE:

Inputs

profile

The profile file is a tab-delimited text file.

Follow an example here: https://github.com/achtman-lab/GrapeTree/blob/master/examples/simulated_data.profile

#Strain	Gene_1	Gene_2	Gene_3	Gene_4	Gene_5	Gene_6	Gene_7	...
0	1	1	1	1	1	1	1	...
1	1	1	1	1	1	1	1	...
2	1	2	2	2	2	2	2	...
...

The first row is required and represents column labels. It has to start with a '#'. Any other column label that starts with a '#' is treated as a comment and that column will not be used in downstream analysis. The first column needs to contain unique identifiers for strains. Each of the remaining rows represents a different strain.

Use '-' or '0' to represent missing alleles.
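
The rules above can be sketched as a small reader. This is only an illustration of the file format as described here, not GrapeTree's own parser; the function names read_profile and allelic_distance are hypothetical, and the distance shown simply skips loci that are missing in either strain.

```python
import csv
import io

# Values that the format description treats as a missing allele.
MISSING = {"-", "0", ""}


def read_profile(handle):
    """Parse a tab-delimited allele profile into (loci, {strain: alleles})."""
    rows = csv.reader(handle, delimiter="\t")
    header = next(rows)
    if not header or not header[0].startswith("#"):
        raise ValueError("the first row must start with '#'")
    # Keep every column after the first whose label is not a '#' comment.
    keep = [i for i, label in enumerate(header)
            if i > 0 and not label.startswith("#")]
    loci = [header[i] for i in keep]
    profiles = {}
    for row in rows:
        if not row or not row[0]:
            continue  # skip blank lines
        strain = row[0]
        if strain in profiles:
            raise ValueError("duplicate strain identifier: " + strain)
        # Short rows are padded with missing values.
        cells = [row[i] if i < len(row) else "" for i in keep]
        profiles[strain] = [None if c in MISSING else c for c in cells]
    return loci, profiles


def allelic_distance(a, b):
    """Count differing loci, skipping loci missing in either profile."""
    return sum(1 for x, y in zip(a, b)
               if x is not None and y is not None and x != y)


example = (
    "#Strain\tGene_1\tGene_2\t#note\tGene_3\n"
    "0\t1\t1\tx\t1\n"
    "1\t1\t2\ty\t-\n"
    "2\t1\t2\tz\t2\n"
)
loci, profiles = read_profile(io.StringIO(example))
```

Here the '#note' column is dropped as a comment, and the '-' in strain 1 is stored as None so that it does not count as a difference.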

Aligned FASTA

An aligned FASTA file contains multiple sequences of the same length in FASTA format. Many sequence alignment tools, e.g., MAFFT and MUSCLE, use FASTA as a default format for their outputs.

Find an example here: http://wwwabi.snv.jussieu.fr/public/Clustal2Dna/fastali.html

Note that GrapeTree supports only p-distance for the moment.
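
For reference, the p-distance between two aligned sequences is the proportion of compared sites at which they differ. The sketch below is a minimal illustration under stated assumptions: read_fasta and p_distance are hypothetical helpers, and skipping sites that contain '-', 'N' or '?' is an assumption made for this example, not a statement about how GrapeTree treats gaps.

```python
def read_fasta(lines):
    """Read aligned FASTA lines into {name: sequence}; lengths must match."""
    parts = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            parts[name] = []
        elif name is not None:
            parts[name].append(line.upper())
    seqs = {n: "".join(chunks) for n, chunks in parts.items()}
    if len({len(s) for s in seqs.values()}) > 1:
        raise ValueError("sequences in an alignment must have equal length")
    return seqs


def p_distance(a, b, skip="-N?"):
    """Proportion of differing sites among sites that are compared."""
    compared = 0
    differing = 0
    for x, y in zip(a, b):
        if x in skip or y in skip:
            continue  # assumption: ignore gapped or ambiguous sites
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else 0.0


alignment = [">a", "ACGTAC", ">b", "ACGAAC", ">c", "AC-TTC"]
seqs = read_fasta(alignment)
d_ab = p_distance(seqs["a"], seqs["b"])  # 1 difference over 6 sites
d_ac = p_distance(seqs["a"], seqs["c"])  # 1 difference over 5 compared sites
```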

metadata

The metadata file is either a tab-delimited or a comma-delimited text file. It is only used for tree presentation in the stand-alone version.

Follow an example here: https://github.com/achtman-lab/GrapeTree/blob/master/examples/simulated_data.metadata.txt

ID	Country	Year
0	China	1983
1	China	1984
...

The first row is required and describes the labels of the columns. If a column labeled "ID" is present, it will be used to correlate metadata with profiles; otherwise the first column will be used.
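
The key-column rule can be illustrated with a short sketch. The function read_metadata is a hypothetical name, and guessing the delimiter from the header line is an assumption of this example rather than documented GrapeTree behaviour.

```python
import csv
import io


def read_metadata(text):
    """Return (key_column, {key_value: row}) for a metadata table."""
    lines = text.splitlines()
    if not lines:
        raise ValueError("the metadata file is empty")
    # Assumption: a tab in the header line means tab-delimited, else comma.
    delimiter = "\t" if "\t" in lines[0] else ","
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    fields = reader.fieldnames
    # Use the "ID" column when present, otherwise fall back to column one.
    key = "ID" if "ID" in fields else fields[0]
    return key, {row[key]: row for row in reader}


tab_text = "ID\tCountry\tYear\n0\tChina\t1983\n1\tChina\t1984\n"
key, meta = read_metadata(tab_text)

csv_text = "Strain,Country\nA,UK\n"
fallback_key, fallback_meta = read_metadata(csv_text)
```

In the second call there is no "ID" column, so the first column ("Strain") becomes the key.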

outputs

tree

The tree is described in NEWICK format. https://en.wikipedia.org/wiki/Newick_format

distance matrix

Use the option '--method distance' to generate a distance matrix without calculating the tree. The matrix is presented in PHYLIP format. http://evolution.genetics.washington.edu/phylip/doc/distance.html
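
As a rough picture of the square PHYLIP layout, the sketch below formats a small matrix: a first line with the number of taxa, then one line per taxon with its name followed by its distances. The helper format_phylip is hypothetical, and the column widths and number of decimals are assumptions; the exact text GrapeTree writes may differ.

```python
def format_phylip(names, matrix):
    """Format a square distance matrix in a PHYLIP-style layout."""
    if len(names) != len(matrix) or any(len(r) != len(names) for r in matrix):
        raise ValueError("matrix must be square and match the names")
    lines = ["{:>5}".format(len(names))]
    for name, row in zip(names, matrix):
        values = " ".join("{:.6f}".format(d) for d in row)
        # Names are left-aligned and padded to ten characters.
        lines.append("{:<10} {}".format(name, values))
    return "\n".join(lines) + "\n"


names = ["0", "1", "2"]
matrix = [
    [0, 1, 2],
    [1, 0, 1],
    [2, 1, 0],
]
phylip_text = format_phylip(names, matrix)
```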

Command line examples

MSTree V2

python grapetree.py -p examples/simulated_data.profile -m MSTreeV2

NJ tree

python grapetree.py -p examples/simulated_data.profile -m NJ

distance matrix

python grapetree.py -p examples/simulated_data.profile -m distance
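
Because the tree or matrix is written to standard output, a script can capture it by redirecting that stream to a file. The sketch below assembles the same commands as above using only the --profile, --method and --n_proc options listed in the help text; build_command and run_to_file are hypothetical helper names, and the script path is an assumption about where you run it from.

```python
import subprocess

# Method names as listed in the command line help.
METHODS = ("MSTreeV2", "MSTree", "NJ", "RapidNJ", "distance")


def build_command(profile, method="MSTreeV2", n_proc=None,
                  script="grapetree.py"):
    """Build the argument list for one command line run."""
    if method not in METHODS:
        raise ValueError("unknown method: " + method)
    cmd = ["python", script, "--profile", profile, "--method", method]
    if n_proc is not None:
        cmd += ["--n_proc", str(n_proc)]
    return cmd


def run_to_file(cmd, out_path):
    """Run the command and redirect its standard output to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)


nj_cmd = build_command("examples/simulated_data.profile", "NJ")
# run_to_file(nj_cmd, "simulated_data.NJ.nwk")  # uncomment to execute
```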

License

Copyright Warwick University

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

External programs

Detailed information for the standard NJ implemented in FastME V2: http://www.atgc-montpellier.fr/fastme/

Citation

EnteroMSTree - GrapeTree has been formally accepted by Genome Research. Please use the citation:

Z Zhou, NF Alikhan, MJ Sergeant, N Luhmann, C Vaz, AP Francisco, JA Carrico, M Achtman (2018) "GrapeTree: Visualization of core genomic relationships among 100,000 bacterial pathogens", Genome Res; doi: https://doi.org/10.1101/gr.232397.117

grapetree's People

Contributors

happykhan, kjolley, pertuyf, zheminzhou

grapetree's Issues

Memory error on edmonds for Windows machines.

Memory error on edmonds for Windows machine using fairly large allele profile.

rMLST.rep.txt

Traceback (most recent call last):
  File "MSTrees.py", line 340, in <module>
    tre = backend(**dict([p.split('=') for p in sys.argv[1:]]))
  File "MSTrees.py", line 330, in backend
    tre = eval('methods.' + params['method'])(names, profiles, **params)
  File "MSTrees.py", line 216, in MSTree
    tree = eval('methods._'+matrix_type)(wdist, **params)
  File "MSTrees.py", line 89, in _asymmetric
    mstree = Popen([params['edmonds_' + platform.system()]], stdin=PIPE, stdout=PIPE).communicate(input='\n'.join(['\t'.join([str(dd) for dd in d]) for d in dist.tolist()]))[0]
MemoryError

It seems to run to completion in the Linux version.

Enterobase buttons

The menu buttons under "Enterobase" are a bit confusing.
Suggestion:

  • remove "Highlight selected" and move this over to the Enterobase window
  • "Load selected" and "Filter selected" are kind of the same, keep only load
  • Synch -> Sync ( can be a checkbox?)

Then there are 4 buttons left:
"Load selected nodes"
"Import fields"
"Save tree"
"Sync" <- can be a checkbox

There should be some notification for the user that all these buttons trigger a change in the enterobase table based on selections in the GrapeTree.

Disabling buttons

If there is no tree loaded and I press buttons like save tree or centre graph, the browser throws a big ugly error and the buttons don't do anything. Either users should not be allowed to press those buttons, or there should be a message.

Enterobase mst

Should block over 8K nodes and should explain how to simplify dataset.

How to Search for related strains. For cgmlst only.

Limit categories coloured

BLACK balls show up with too many categories, there should just be an upper limit when you run out of colours. It's not meaningful showing 300 different groups on a tree. Also, the default column it loads from the metadata should NEVER be the ID.
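
One way to picture the suggested upper limit: keep only the most frequent categories and pool the rest under a single label, so the palette never runs out. This is a sketch of the idea only, not of GrapeTree's JavaScript front end; limit_categories, max_categories and the "Others" label are all hypothetical.

```python
from collections import Counter


def limit_categories(values, max_categories=20, other_label="Others"):
    """Keep the most frequent categories and pool the rest into one label."""
    counts = Counter(values)
    kept = {value for value, _ in counts.most_common(max_categories)}
    return [v if v in kept else other_label for v in values]


countries = ["a", "a", "b", "b", "b", "c"]
limited = limit_categories(countries, max_categories=2)
```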

wish list --- Mark

  1. multi column legend
  2. allow assignment of a new column to UDF after created.
  3. Multi column sorting in metadata grid
  4. SNPs -> choose presentation between GrapeTree and phylogram.

"ID" requirement for metadata is too specific

When loading metadata, the requirement for an ID column is too specific; it should be case insensitive. Only "ID" is accepted at the moment, but it should also allow "id". Rather than being so strict, if the ID column specification fails, the program should just try to use the first column. If that also fails, then give a message that the ID cannot be found (at the moment it just says my file is no good but doesn't give any hints as to why).

MSTREE.py does not handle special character for Newick output

If one of the taxa in the profile has a ':' in it, this will break the final Newick file, as this is a special character. As such, mstree.py needs to sanitize the input labels for the characters : ( ) ,

Also, loading the Newick file never seems to stop, and the program does not recognise that there is something wrong with the file.
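
Two common ways to make a label safe for Newick are sketched here: replacing the special characters, or wrapping the label in single quotes with embedded quotes doubled. Both helpers are hypothetical and are not taken from the GrapeTree code. Note that replacement can make two different labels identical, and that not every Newick reader accepts quoted labels.

```python
import re

# Characters with structural meaning in Newick, plus whitespace and quotes.
_UNSAFE = re.compile(r"[\s:;,()\[\]']")


def sanitize_label(label, replacement="_"):
    """Replace every unsafe character with the replacement string."""
    return _UNSAFE.sub(replacement, label)


def quote_label(label):
    """Quote a label only if it contains an unsafe character."""
    if _UNSAFE.search(label):
        return "'" + label.replace("'", "''") + "'"
    return label


cleaned = sanitize_label("strain:1 (copy)")
quoted = quote_label("a,b")
```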

figure legend overlaps menu

the initial placement of the figure legend overlaps the menu on the left hand side

  • laptop screen
  • firefox

suggestion: figure legend initially placed on right side of the window looks nicer

GUI changing

Outputs -> Export

Label text right after Show labels and possibly remove title.

Relative scaling -> Kurtosis

add metadata -> add column

search in label -> highlight label

Legend issues

  • Separate strains duplicates the legend, and then it doesn't go away.
  • The SVG output font is a weird serif font.

Help

  • Embedded help, especially for key combinations to do things
  • Help link for metadata & Main window
  • Help links on contextual menus.

Alter. collapsing branch

There should be different metrics for collapsing branches: either an absolute cutoff (what it does at the moment), or it should measure the branch distance from tip to root and do so cumulatively. This will allow people to just prune the tips without losing the deeper structure.

Parameters working with profiles

When I have loaded a profile, and change the parameters, I pressed refresh and nothing happened. The only way to change these parameters and affect the tree is at load time apparently. Either these options should only be set at load time, or the refresh button should recalculate the tree with those new settings.

branch scaling issue

the branch scaling slider/ option causes two issues in automatic rendering mode:

  1. the scaling is only triggered after I click on something else, it is not scaling with the slider
  2. after adjusting branch lengths, the tree is frozen and nodes cannot be moved (it can be fixed when switching to static rendering and back again)

Better exception handling for BASA/ maketree step:

Better exception handling for BASA/ maketree step: When this fails I get a single error message back with no consideration for the exception thrown at the back end. We may want a more informative message because it is very hard to troubleshoot (even for me).

Issues with MacOSX binaries

All profiling methods fail from running in App (issue with launching subprocess)?

MSTree fails when running with

python MSTree.py profile=examples/simulated_data.profile method=MSTree edge_weight=eBurst

TypeError: apply_along_axis() got unexpected keyword argument 'minlength'

Button to bring back splash screen

The splash screen is very informative, but once I've loaded a file for the first time and want to load another one, I can't find it. Either there should be a button ("Help") that brings it back, or it should show up when an exception/error is thrown.

right click menu

select all
unselect all
show/hide metadata
show/hide hypothetical nodes

Method Dropdown still shown

"tree construction is not available" message, i still see the dropdown for the method select (NJ, BASA, MST), should be hidden
