
biocore / empress


A fast and scalable phylogenetic tree viewer for microbiome data analysis

License: BSD 3-Clause "New" or "Revised" License

JavaScript 76.85% HTML 3.84% CSS 1.16% Makefile 0.28% Python 17.77% TeX 0.10%
bioinformatics biplot microbiome ordination phylogeny qiime qiime2 tree-plot visualization

empress's People

Contributors

96radhikajadhav, ahdilmore, antgonza, chriskeefe, eldeveloper, ericap258, esayyari, fedarko, gibsramen, gwarmstrong, htoo97, kwcantrell, mestaki, mortonjt, sjanssen2, wasade, yimengyang


empress's Issues

Gradient coloring of edges / clades

Would be great to have continuous variables plotted for edges / clades

It may be worthwhile to apply a logit transform when plotting probabilities.
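A minimal sketch of the logit idea, assuming probabilities are clipped away from 0 and 1 before transforming and then rescaled to [0, 1] for a color gradient. The function names and the `eps` value are hypothetical, not part of Empress.

```python
import math


def logit(p, eps=1e-6):
    """Map a probability in [0, 1] to the real line via log(p / (1 - p))."""
    # Clip so that p == 0 or p == 1 does not produce an infinite value.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def to_unit_interval(value, lo, hi):
    """Linearly rescale value from [lo, hi] to [0, 1] for a color gradient."""
    return (value - lo) / (hi - lo)


probs = [0.01, 0.5, 0.99]
scores = [logit(p) for p in probs]
lo, hi = min(scores), max(scores)
gradient_positions = [to_unit_interval(s, lo, hi) for s in scores]
```

The transform spreads out values near 0 and 1, so small differences between extreme probabilities become visible in the gradient.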

Toy Model

Need a simple model that

  1. Calculates the coordinates of the nodes given an skbio.TreeNode object.
  2. Calculates all of the parent/child relationships
  3. Assigns metadata (i.e. edge color and node size)
  4. Outputs a node metadata pd.DataFrame and an edge metadata pd.DataFrame

See the coords method in Gneiss. It may make more sense to output the parent node rather than the children, since every node will have a parent (except for the root node).
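The four steps above could be sketched roughly as follows. This is not the Gneiss `coords` method: the `Node` class is a hypothetical stand-in for `skbio.TreeNode`, the layout is a simple rectangular one (x is the distance from the root, y is the tip order), node names are assumed unique, and the `size` and `color` defaults are placeholders. The rows are plain dicts so the sketch needs only the standard library.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for skbio.TreeNode; names are assumed unique."""
    name: str
    length: float = 0.0
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None


def build_tables(root):
    """Return (node_rows, edge_rows): one dict per node, in preorder."""
    # Preorder pass: link parents and accumulate x as distance from the root.
    x = {root.name: 0.0}
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        for child in reversed(node.children):
            child.parent = node
            x[child.name] = x[node.name] + child.length
            stack.append(child)

    # Tips get consecutive y values; walking the preorder backwards visits
    # children before parents, so each internal node can take the mean y.
    y = {}
    tips = [n for n in order if not n.children]
    for i, tip in enumerate(tips):
        y[tip.name] = float(i)
    for node in reversed(order):
        if node.children:
            y[node.name] = sum(y[c.name] for c in node.children) / len(node.children)

    node_rows, edge_rows = [], []
    for node in order:
        p = node.parent
        node_rows.append({"id": node.name, "x": x[node.name],
                          "y": y[node.name], "size": 1.0})
        edge_rows.append({
            "id": node.name, "x": x[node.name], "y": y[node.name],
            "parent": p.name if p is not None else None,
            "px": x[p.name] if p is not None else None,
            "py": y[p.name] if p is not None else None,
            "color": "#000000",
        })
    return node_rows, edge_rows


tree = Node("root", children=[
    Node("a", 1.0),
    Node("b", 0.5, [Node("c", 1.0), Node("d", 2.0)]),
])
node_rows, edge_rows = build_tables(tree)
```

Each list of dicts can be passed to `pd.DataFrame(...)` to obtain the node and edge metadata tables. Storing the parent on each row, as suggested above, means only the root row carries `None` values.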

Close ports on shutdown

Need a way to shut down ports when the program is closed.

kill -9 $(lsof -t -i:<portnumber>) will force-kill the process holding a port, which frees the port
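Rather than killing the process from outside, the program could release its port itself on exit. The sketch below uses a plain standard-library socket as a stand-in for whatever server Empress actually runs; `open_listener` and `install_signal_handlers` are hypothetical names.

```python
import atexit
import signal
import socket
import sys


def open_listener(port=0, host="127.0.0.1"):
    """Bind a listening socket and make sure it is closed at interpreter exit."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow quick restarts on the same port after a clean shutdown.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    atexit.register(sock.close)  # closing an already-closed socket is harmless
    return sock


def install_signal_handlers():
    """Turn Ctrl-C / SIGTERM into a normal exit so atexit hooks still run.

    Must be called from the main thread.
    """
    def _exit(signum, frame):
        sys.exit(0)

    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, _exit)


# Demonstration: once the listener is closed, the same port can be bound again.
listener = open_listener()
port = listener.getsockname()[1]
listener.close()
reopened = open_listener(port)
reopened.close()
```

Note that `atexit` hooks do not run after `kill -9`, so this only covers normal exits and catchable signals.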

Alpha release TODO

  1. Merge in remaining pull requests (particularly with tests)
  2. Documentation
  3. QIIME 2 compatibility
  4. Pypi / Conda / Git release

Visualization rendering exploration

Depends on #7

Need to build a simple renderer that

  1. Plots lines with attributes such as width, color and transparency
  2. Plots nodes with attributes such as size, color and transparency
  3. Plots polygons, (such as triangles) with specified coordinates

For starters, it's worthwhile checking out these tools

  • D3
  • Cytoscape
  • Vega (this is extremely exploratory)

Other packages are welcome too.

Worthwhile to check out JSFiddle to prototype.
An example of d3 can be found here.

relative zoom and move

The current mouse wheel zooming and mouse drag moving operate on an absolute amount of distance, which isn't preferable when the scaling is too high or too low. It would be better to make this distance relative to the current scaling factor.
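One way to make both operations relative is sketched below. It is written in Python to show the arithmetic only (the real handlers live in the JavaScript callbacks), and the `Camera` class, its fields, and the default `factor` are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Camera:
    scale: float = 1.0  # screen pixels per tree unit
    x: float = 0.0      # tree coordinate shown at the screen origin
    y: float = 0.0

    def pan(self, dx_pixels, dy_pixels):
        """Drag by a number of screen pixels, whatever the zoom level."""
        # Dividing by scale converts pixels to tree units, so the tree
        # follows the cursor equally well when zoomed far in or far out.
        self.x -= dx_pixels / self.scale
        self.y -= dy_pixels / self.scale

    def zoom(self, wheel_steps, factor=1.1):
        """Each wheel step changes the scale by the same percentage."""
        self.scale *= factor ** wheel_steps


cam = Camera(scale=2.0)
cam.pan(10, 0)             # moves 5 tree units at scale 2
cam.zoom(1, factor=2.0)    # scale doubles from 2 to 4
```

With multiplicative zoom, zooming in by n steps and out by n steps returns to the starting scale, which an additive step does not guarantee to feel uniform across zoom levels.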

Metadata docs

There are two metadata files, namely

  • internal_metadata
  • leaf_metadata

There will need to be documentation on this

Metadata color handling

TODO: need to change edge metadata to hold color as a "#RRGGBB" hex string and then enable WebGL to parse it

Raised in #22
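A small sketch of the conversion between "#RRGGBB" strings and the 0 to 1 float triples that WebGL colors typically use. It is written in Python to show the logic; the function names are hypothetical and the parsing in the viewer itself would be done in JavaScript.

```python
import re

_HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")


def hex_to_rgb(color):
    """Convert '#RRGGBB' to a tuple of three floats in [0, 1]."""
    if not _HEX_COLOR.fullmatch(color):
        raise ValueError("expected a '#RRGGBB' string, got %r" % (color,))
    # Characters 1-2, 3-4 and 5-6 hold the red, green and blue bytes.
    return tuple(int(color[i:i + 2], 16) / 255.0 for i in (1, 3, 5))


def rgb_to_hex(r, g, b):
    """Convert three floats in [0, 1] back to an upper-case '#RRGGBB' string."""
    return "#{:02X}{:02X}{:02X}".format(*(round(c * 255) for c in (r, g, b)))


red = hex_to_rgb("#FF0000")
```

Validating the string up front also gives a clear error for malformed metadata instead of a rendering glitch later.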

Branch length handling

The branch length preprocessing will need to be done in the model (e.g. some branches don't have lengths).

Issue identified in #22
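One possible preprocessing step is sketched below, assuming branch lengths arrive as a `{node_name: length}` mapping with `None` for missing values. The default of 0.0 is an assumption; a uniform length such as 1.0 may suit some trees better.

```python
def fill_missing_lengths(lengths, default=0.0):
    """Replace missing (None) branch lengths with a default value.

    Returns the filled mapping and the list of node names that were
    missing a length, so the caller can warn the user about them.
    """
    filled, missing = {}, []
    for name, length in lengths.items():
        if length is None:
            missing.append(name)
            filled[name] = default
        else:
            filled[name] = float(length)
    return filled, missing


filled, missing = fill_missing_lengths({"root": None, "a": 0.3, "b": 1})
```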

Seg fault

A seg fault occurred when handling a tree of around 800k tips plus metadata

Tests to make

Model

  • merge_metada (not a function but part of the init)
  • center_tree(self)
  • select_edge_category(self)
  • select_node_category(self)
  • select_category(self, attributes, is_visible_col)
  • update_edge_category(self, attribute, category)
  • collapse_clades(self, sliderScale)

Tree

  • cache_ntips(self)
  • from_tree(cls, tree, use_lengths)
  • coords(self, height, width)
  • rescale(self, width, height)

View

init_webpage.js

  • function extractEdgeInfo()
  • function normalizeTree()

drawing_loop.js

  • function loop()
  • function draw()

callbacks.js

  • function initCallbacks()
  • function mouseDown(event)
  • function mouseUp(event)
  • function mouseMove(event)
  • function mouseWheel(event)
  • function selectHighlight()

Controller

  • ModelHandler(RequestHandler)
  • EdgeHandler(RequestHandler)
  • TriangleHandler(RequestHandler)
  • ZoomHandler(RequestHandler)
  • HighlightHandler(RequestHandler)
  • BenchmarkHandler(RequestHandler)
  • CollapseHandler(RequestHandler)
  • CollapseEdgeHandler(RequestHandler)

Code Review - Creating pandas dataframe

To create the pandas data frame, a dictionary is first built by iterating through the tree to extract coordinates and other information; the data frame is then created from that dictionary. On very large trees, pandas takes a while to convert the dictionary to a data frame and uses a lot of memory doing so, which causes a memory overflow.

Metadata selection

It would be great if we could highlight nodes based on metadata categories, similar to what is currently being done with qiime feature-table filter-samples

CC @ebolyen

Testing framework

We will want to have all of the code in place unit tested and reviewed. We will be following scikit-bio's guidelines for developing code.

See Contributing.md

Tests that need to be made

ToyModel.py

view.py

  • plot : will want to add in tests for matplotlib (may want to look at test_heatmap.py)

Toy Restful interface

We'll want this in order to facilitate communication between the model / view / controller.
For starters, let's just have simple get / post commands made.

Eventually, we'll want to have the following enabled.

  • Post node/edge metadata (model)
  • Get node/edge metadata (viewer)

We'll want to have this protocol prototyped in

  • Python Flask
  • Javascript node.js

This python tutorial could be useful.
This node.js tutorial could be useful.
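To make the get / post idea concrete, here is a toy version using only the Python standard library instead of Flask or node.js, so it runs without extra dependencies. The `/nodes` and `/edges` paths, the in-memory `STORE`, and the helper names are all hypothetical.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

STORE = {"nodes": [], "edges": []}  # in-memory stand-in for the model


class MetadataHandler(BaseHTTPRequestHandler):
    def _send_json(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # The viewer fetches node or edge metadata.
        key = self.path.strip("/")
        if key in STORE:
            self._send_json(200, STORE[key])
        else:
            self._send_json(404, {"error": "unknown table"})

    def do_POST(self):
        # The model publishes node or edge metadata as a JSON list.
        key = self.path.strip("/")
        if key not in STORE:
            self._send_json(404, {"error": "unknown table"})
            return
        length = int(self.headers.get("Content-Length", 0))
        STORE[key] = json.loads(self.rfile.read(length) or b"[]")
        self._send_json(201, {"stored": len(STORE[key])})

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def start_server(port=0):
    """Start the server on a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), MetadataHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call(port, method, path, payload=None):
    """Send one request and return (status, decoded JSON body)."""
    conn = HTTPConnection("127.0.0.1", port, timeout=5)
    body = None if payload is None else json.dumps(payload).encode("utf-8")
    conn.request(method, path, body=body)
    resp = conn.getresponse()
    data = json.loads(resp.read())
    conn.close()
    return resp.status, data


server = start_server()
port = server.server_address[1]
post_status, _ = call(port, "POST", "/edges", [{"id": "a", "color": "#FF0000"}])
get_status, edges = call(port, "GET", "/edges")
server.shutdown()
server.server_close()
```

The same two routes map directly onto a Flask or node.js prototype; only the handler plumbing changes.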

Magic numbers

There are a few magic numbers in init_webgl.js that could use some more documentation

Type checking in webGL

In the WebGL initialization, there needs to be type checking of the properties

Issue identified in #22 .

Robust metadata specification

I got this error when trying to test out the code

(image: screenshot of the error)

The fix isn't too bad. The work-around is to convert all metadata index labels to strings i.e.

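    import pandas as pd

    # Read only the header row to learn the name of the index column.
    cols = pd.read_csv(metadata_file, nrows=1, sep='\t').columns.tolist()

    # Read the index column as strings, then use it as the index.
    metadata = pd.read_table(
        metadata_file, dtype={cols[0]: object})
    metadata = metadata.set_index(cols[0])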

A working setup.py and proper imports

We need to make sure that this is completely working.

In addition, we need to make sure that all of the imports are working properly (i.e. the following should be unnecessary in controller.py):

sys.path.append('../..')

Clade collapsing idea

  • Triangles (shortest node, longest node, root)
  • Triangles (root, shortest node, longest node, overall angle)
  • Convex hull around branches in the tree (see scipy )
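For the convex hull option, scipy would be the natural choice as noted above. The sketch below swaps in a pure-Python monotone chain algorithm so it runs without dependencies; it assumes the clade's node coordinates are available as (x, y) tuples.

```python
def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


clade_points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
hull = convex_hull(clade_points)
```

The resulting polygon could be drawn in place of the collapsed clade. The triangle options need only three points (root, shortest tip, longest tip) and are cheaper to compute.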

Node / Edge metadata format

Node Metadata format

pd.DataFrame with the following columns

  1. Node id (i.e. name of the node). Should be able to query this in the tree directly using the find method.
  2. x coordinate to be rendered
  3. y coordinate to be rendered
  4+. attributes to be plotted (e.g. color, size, width, transparency, ...)

Edge Metadata format

pd.DataFrame with the following columns

  1. Node id (i.e. name of the node). Should be able to query this in the tree directly using the find method.
  2. x coordinate to be rendered
  3. y coordinate to be rendered
  4. Parent id (i.e. the name of the parent node). If the node is root, the parent id is None.
  5. px parent x coordinate to be rendered (is None if there is no parent).
  6. py parent y coordinate to be rendered (is None if there is no parent).
  7+. attributes to be plotted (e.g. color, size, width, transparency, ...)

TODO: rationalize why px and py must be stored rather than looked up.

TODO before demo

  1. Merge in outstanding changes wrt the optimizations of label rendering -- maybe worthwhile to look at sklearn kdtrees
  2. Add example commands + data in README (create data folder)
  3. Test out installation instructions
  4. Add screenshot to README
  5. Make sure that the metadata handling works: #54
  6. Prepare 5-10 slides outlining motivation, things added, demo, and things to add, and open up to suggestions

Video tutorial

Prior to an official release, we need to have a tutorial on how to use Empress

Optimization tweaks in webGL

There is some unnecessary computation going on in the WebGL rendering that was noticed in #22, namely with the use of requestAnimationFrame(loop);

As noted by @ElDeveloper

With this loop you will redraw a new frame every possible time, which means that even if the objects have not changed since the last rendering, you will redraw the same frame leading to your CPU burning a lot of unnecessary compute. This is fine for now, but it would be a good idea to start integrating logic in this code-base to allow for selectively rendering a new frame i.e. only when something in the view has changed. I suggest opening an issue about this.
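The selective rendering suggested above is commonly done with a "dirty" flag. The sketch below shows the logic in Python for illustration only; the actual loop is JavaScript driven by requestAnimationFrame, and the `Renderer` class and its method names are hypothetical.

```python
class Renderer:
    """Draws a frame only when something in the view has changed."""

    def __init__(self, draw):
        self._draw = draw        # callback that actually renders the scene
        self._dirty = True       # draw the first frame unconditionally
        self.frames_drawn = 0

    def invalidate(self):
        """Call from zoom, pan, or recolor handlers to request a redraw."""
        self._dirty = True

    def tick(self):
        """Called once per animation frame; returns True if a frame was drawn."""
        if not self._dirty:
            return False
        self._draw()
        self._dirty = False
        self.frames_drawn += 1
        return True


renderer = Renderer(draw=lambda: None)
first = renderer.tick()    # draws the initial frame
second = renderer.tick()   # nothing changed, so no draw
renderer.invalidate()      # e.g. the user zoomed
third = renderer.tick()    # draws again
```

In the JavaScript version, each event callback would set the flag and the loop would skip `draw()` whenever it is unset.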

Clean up model code

  1. split up the Dendrogram object and the Model object into separate files
  2. We need tests for the model

Scalable layout calculations

Depends on #8. We'll want to abstract out the recursive calls and replace them with a for loop.
In this way, we'll avoid potential issues with running into stack errors when we scale this up to trees with 100,000 nodes.
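As an illustration of replacing recursion with a loop, the sketch below counts the tips under every node using an explicit stack. The tree is assumed to be given as a `{node: [children]}` mapping, which is a simplification of the real tree object.

```python
def count_tips(children, root):
    """Return {node: number of descendant tips} without any recursion."""
    # Pass 1: collect nodes so that every parent precedes its descendants.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))

    # Pass 2: walk backwards so children are always handled before parents.
    ntips = {}
    for node in reversed(order):
        kids = children.get(node, [])
        ntips[node] = 1 if not kids else sum(ntips[k] for k in kids)
    return ntips


small = {"root": ["a", "b"], "b": ["c", "d"]}
small_counts = count_tips(small, "root")
```

Because the stack is an ordinary list, the depth of the tree is limited by available memory rather than by the interpreter's recursion limit, so a very deep caterpillar-shaped tree is handled the same way as a balanced one.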

Toy viewer

Need to create a simple view model that

  1. Accepts edge metadata pd.DataFrame
  2. Accepts node metadata pd.DataFrame
  3. Plots the results onto a canvas using matplotlib

Automatic port detection

Maybe worthwhile to think about a good way to automatically detect open ports similar to how it is done in Jupyter notebook and Tensorboard.

It would also be nice if ports could be specified from the command line interface

  • Port specification from CLI
  • Automatic port detection
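Both items could be combined along these lines. This sketch does not reproduce how Jupyter or Tensorboard do it; it tries successive ports starting from a default, and the `--port` flag, the default of 8080, and the function names are assumptions.

```python
import argparse
import socket


def find_free_port(start=8080, tries=100, host="127.0.0.1"):
    """Return the first port in [start, start + tries) that can be bound."""
    for port in range(start, start + tries):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is in use, try the next one
            return port
    raise RuntimeError("no free port found in the requested range")


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Hypothetical launcher")
    parser.add_argument("--port", type=int, default=None,
                        help="port to serve on (default: first free port)")
    return parser.parse_args(argv)


def choose_port(argv=None):
    """Use the port given on the command line, else detect a free one."""
    args = parse_args(argv)
    return args.port if args.port is not None else find_free_port()


explicit = choose_port(["--port", "8888"])
```

The check is only advisory: another process can take the port between detection and the server's own bind, so the server should still handle a bind failure.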

Node ranks

Will need to brainstorm a ranking function that will prioritize which nodes will be displayed.
One such function is to rank the nodes based on number of tips.

But some exploratory analysis will be necessary to do this.
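As a starting point for that exploration, ranking by number of tips could look like the sketch below. It assumes a precomputed `{node: tip_count}` mapping; `top_ranked` is a hypothetical name.

```python
import heapq


def top_ranked(ntips, k):
    """Return the k node names with the most descendant tips, largest first."""
    # nlargest avoids sorting every node when only a few will be displayed.
    return heapq.nlargest(k, ntips, key=ntips.get)


ntips = {"root": 5, "a": 3, "b": 2, "t1": 1}
to_display = top_ranked(ntips, 2)
```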
