Documentation / Tutorials

We need tutorials and documentation so that the program can be run on someone else's machine.

Allow the user to specify a custom color for their highlight file

Gradient coloring of edges / clades

It would be great to have continuous variables plotted on edges / clades
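A minimal sketch of gradient coloring, assuming colors are exchanged as "#RRGGBB" strings and that each continuous value is mapped linearly between two endpoint colors. The function names and the default blue-to-red endpoints are illustrative choices, not existing Empress code.

```python
def hex_to_rgb(color):
    """Convert "#RRGGBB" to an (r, g, b) tuple of ints in [0, 255]."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of ints back to "#rrggbb"."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def gradient_colors(values, low="#0000ff", high="#ff0000"):
    """Map each continuous value onto a linear gradient between two colors."""
    vmin, vmax = min(values), max(values)
    span = (vmax - vmin) or 1.0  # avoid dividing by zero when all values are equal
    lo, hi = hex_to_rgb(low), hex_to_rgb(high)
    colors = []
    for v in values:
        t = (v - vmin) / span  # position along the gradient, from 0 to 1
        rgb = tuple(round(a + t * (b - a)) for a, b in zip(lo, hi))
        colors.append(rgb_to_hex(rgb))
    return colors
```

The resulting strings could be stored in the edge metadata's color column, one per edge or clade.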

It may be worthwhile to perform a logit transform when plotting probabilities
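A possible sketch of the transform, assuming probabilities arrive as plain floats. Values are clipped to [eps, 1 - eps] because logit(0) and logit(1) are infinite; the `eps` default is an arbitrary, hypothetical choice.

```python
import math


def logit(p, eps=1e-6):
    """Map a probability in [0, 1] onto the real line via log(p / (1 - p)).

    p is clipped to [eps, 1 - eps] so that 0 and 1 stay finite.
    """
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))
```

The transformed values could then be fed into a continuous color gradient like any other variable.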

Porting skbio TreeNode object

We'll want to port over the TreeNode object from scikit-bio and modify accordingly. This will be necessary for calculating the layout coordinates.

Relevant to #8

Toy Model

Need a simple model that

Calculates the coordinates of the nodes given an skbio.TreeNode object.
Calculates all of the parent/child relationships
Assigns metadata (i.e. edge color and node size)
Outputs a node metadata pd.DataFrame and an edge metadata pd.DataFrame

See the coords method in Gneiss. It may make more sense to output the parent node rather than the children, since every node will have a parent (except for the root node).
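A stdlib-only sketch of such a model, assuming a rectangular layout (x = distance from the root, y = tip order, with internal nodes placed at the mean y of their children). `Node` is a hypothetical stand-in exposing the same `name`, `length`, `children` and `parent` attributes as `skbio.TreeNode`; node names are assumed unique, and the rows follow the parent-based suggestion above (the root gets `None` for its parent fields).

```python
class Node:
    """Hypothetical stand-in for skbio.TreeNode: name, length, children, parent."""

    def __init__(self, name, length=None, children=()):
        self.name = name
        self.length = length
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

    def preorder(self):
        """Yield this node and its descendants, parents before children."""
        stack = [self]
        while stack:
            node = stack.pop()
            yield node
            stack.extend(reversed(node.children))


def layout(root):
    """Return (node_rows, edge_rows) for a rectangular layout of the tree."""
    nodes = list(root.preorder())
    x, y = {}, {}
    # x: cumulative branch length from the root (missing lengths count as 0).
    for node in nodes:
        if node.parent is None:
            x[node.name] = 0.0
        else:
            x[node.name] = x[node.parent.name] + (node.length or 0.0)
    # y: tips are numbered in left-to-right (preorder) order.
    tip_index = 0
    for node in nodes:
        if not node.children:
            y[node.name] = float(tip_index)
            tip_index += 1
    # Reverse preorder visits children before parents, so each internal node
    # can be placed at the mean y of its already-placed children.
    for node in reversed(nodes):
        if node.children:
            y[node.name] = sum(y[c.name] for c in node.children) / len(node.children)

    node_rows = [{"id": n.name, "x": x[n.name], "y": y[n.name]} for n in nodes]
    edge_rows = []
    for n in nodes:
        p = n.parent  # None for the root
        edge_rows.append({
            "id": n.name, "x": x[n.name], "y": y[n.name],
            "parent": p.name if p is not None else None,
            "px": x[p.name] if p is not None else None,
            "py": y[p.name] if p is not None else None,
        })
    return node_rows, edge_rows
```

Each list of row dicts can be turned into the requested table with `pd.DataFrame(rows)`.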

Need a way to shut down ports when the program is closed.

kill -9 $(lsof -t -i:<portnumber>) will force a port to close
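Rather than force-killing the port afterwards, the process can release its own listening socket on exit. A minimal sketch, using the standard library's `http.server` as a stand-in for whatever server the controller actually runs; `make_server` and `handle_sigterm` are hypothetical names. Note that `kill -9` sends SIGKILL, which cannot be intercepted, so this only covers normal exits and a plain `kill` (SIGTERM).

```python
import atexit
import signal
import sys
from http.server import HTTPServer, SimpleHTTPRequestHandler


def make_server(port=0, handle_sigterm=True):
    """Create an HTTP server whose listening socket is released when Python exits.

    port=0 asks the OS for any free port.
    """
    server = HTTPServer(("127.0.0.1", port), SimpleHTTPRequestHandler)
    # Close the listening socket during normal interpreter shutdown.
    atexit.register(server.server_close)
    if handle_sigterm:
        # `kill <pid>` sends SIGTERM; turn it into a normal exit so the
        # atexit hook above still runs. (SIGKILL cannot be caught.)
        signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(0))
    return server
```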

Alpha release TODO

Merge in remaining pull requests (particularly with tests)
Documentation
Qiime2 compatibility
PyPI / Conda / Git release

Pulldown menu / autofilling?

List all of the nodes for inspection (rather than having the user type everything in)

Take a look at ili for an example of how to do this

https://ili.embl.de/
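One way to back a pulldown / autofill box is a prefix search over the sorted list of node names. A sketch, assuming node names are strings that fit in memory; `autocomplete` and `limit` are illustrative names.

```python
import bisect


def autocomplete(sorted_names, prefix, limit=10):
    """Return up to `limit` node names that start with `prefix`.

    `sorted_names` must be sorted; bisect finds the first candidate in
    O(log n), and all matches are contiguous from there.
    """
    start = bisect.bisect_left(sorted_names, prefix)
    matches = []
    for name in sorted_names[start:start + limit]:
        if not name.startswith(prefix):
            break
        matches.append(name)
    return matches
```

The names only need to be sorted once, when the tree is loaded.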

Have model accept dataframes directly

Visualization rendering exploration

Depends on #7

Need to build a simple renderer that

Plots lines with attributes such as width, color and transparency
Plots nodes with attributes such as size, color and transparency
Plots polygons, (such as triangles) with specified coordinates

For starters, it's worthwhile checking out these tools:

D3
Cytoscape
Vega (this is extremely exploratory)
Other packages are welcome too.

It's worthwhile to check out JSFiddle for prototyping.
An example of d3 can be found here.

Unknown attributes

Not clear what size and width are

https://github.com/biocore/empress/blob/code_review/empress/tree.py#L186

The current mouse-wheel zooming and mouse-drag panning move by a fixed absolute distance, which doesn't work well when the scaling is very high or very low. This distance should instead be relative to the current scaling factor.
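A sketch of relative zooming, assuming the view maps coordinates as `screen = world * scale + offset` (an assumption about the renderer, not taken from the current code). Each wheel step multiplies the scale by a constant factor, and the offset is adjusted so the point under the cursor stays put; `zoom_at` and `factor` are hypothetical names.

```python
def zoom_at(scale, offset, cursor, wheel_steps, factor=1.1):
    """Zoom multiplicatively, keeping the world point under the cursor fixed.

    Assumes screen = world * scale + offset. Returns (new_scale, new_offset).
    """
    new_scale = scale * factor ** wheel_steps
    # World coordinates of the point currently under the cursor.
    wx = (cursor[0] - offset[0]) / scale
    wy = (cursor[1] - offset[1]) / scale
    # Choose the new offset so that (wx, wy) still lands on the cursor.
    new_offset = (cursor[0] - wx * new_scale, cursor[1] - wy * new_scale)
    return new_scale, new_offset
```

Under the same assumed mapping, panning by adding the raw pixel drag to `offset` is already relative: the tree moves exactly as far as the mouse does on screen, whatever the zoom level.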

Metadata docs

There are two metadata files, namely

internal_metadata
leaf_metadata

There will need to be documentation on this

Metadata color handling

TODO: change the edge metadata to hold colors as "#RRGGBB" hex strings, and then enable WebGL to parse them

Raised in #22
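If the hex string is converted before it reaches the shaders, a sketch of that conversion follows. It assumes colors are stored as "#RRGGBB" and targets normalized floats in [0, 1], the range a shader color vector normally uses; `hex_to_gl` is a hypothetical helper, and an equivalent JavaScript parse would follow the same steps.

```python
def hex_to_gl(color):
    """Convert "#RRGGBB" to an (r, g, b) tuple of floats in [0, 1]."""
    if len(color) != 7 or not color.startswith("#"):
        raise ValueError("expected '#RRGGBB', got %r" % (color,))
    # Each pair of hex digits is one 0-255 channel; divide to normalize.
    return tuple(int(color[i:i + 2], 16) / 255.0 for i in (1, 3, 5))
```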

Branch length handling

The branch length preprocessing will need to be done in the model (i.e. there are some branches that don't have lengths).

Issue identified in #22
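A minimal preprocessing sketch, assuming branch lengths are collected into a dict keyed by node name, with `None` (what `skbio.TreeNode.length` holds when no length is given) or NaN marking a missing value. The default of 1.0 keeps every branch visible; 0.0 or the mean of the known lengths are other reasonable choices, and which one the model should use is still open. `fill_missing_lengths` is a hypothetical name.

```python
import math


def fill_missing_lengths(lengths, default=1.0):
    """Return a copy of `lengths` with None/NaN entries replaced by `default`."""
    filled = {}
    for name, length in lengths.items():
        if length is None or (isinstance(length, float) and math.isnan(length)):
            filled[name] = default
        else:
            filled[name] = float(length)
    return filled
```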

Seg fault

A seg fault occurred when handling a tree of around 800k tips plus metadata

metadata table font too big and serif

But if the consensus is to remove metadata table completely, then this issue is not relevant.

Uncollapsing clades

Clades cannot be uncollapsed

Tests to make

Model

merge_metada (not a function but part of the init)
center_tree(self)
select_edge_category(self)
select_node_category(self)
select_category(self, attributes, is_visible_col)
update_edge_category(self, attribute, category)
collapse_clades(self, sliderScale)

Tree

cache_ntips(self)
from_tree(cls, tree, use_lengths)
coords(self, height, width)
rescale(self, width, height)

View

init_webpage.js

function extractEdgeInfo()
function normalizeTree()

drawing_loop.js

function loop()
function draw()

callbacks.js

Controller

ModelHandler(RequestHandler)
EdgeHandler(RequestHandler)
TriangleHandler(RequestHandler)
ZoomHandler(RequestHandler)
HighlightHandler(RequestHandler)
BenchmarkHandler(RequestHandler)
CollapseHandler(RequestHandler)
CollapseEdgeHandler(RequestHandler)

Model needs some way of dynamically finding screen resolution

Right now screen resolution is hard coded in init function

Make WebGL resize the canvas when user changes size of browser window

Code Review - Creating pandas dataframe

To create the pandas data frame, a dictionary is first built by iterating through the tree to extract the tree coordinates and other information. The data frame is then created from that dictionary. On very large trees, pandas takes a long time to convert the dictionary to a data frame and uses so much memory that it causes a memory overflow.
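One way to cut the overhead is to accumulate the data column by column during the traversal instead of building one dict per node. A sketch, assuming the traversal yields `(node_id, x, y)` tuples; coordinates go into `array('d')` buffers, which store raw 8-byte doubles rather than one Python float object per value. `collect_columns` is a hypothetical name.

```python
from array import array


def collect_columns(records):
    """Accumulate node data column by column instead of one dict per node.

    `records` yields (node_id, x, y) tuples, e.g. from a tree traversal.
    """
    ids, xs, ys = [], array("d"), array("d")
    for node_id, x, y in records:
        ids.append(node_id)
        xs.append(x)
        ys.append(y)
    return {"id": ids, "x": xs, "y": ys}
```

The columns can then be handed to pandas, e.g. `pd.DataFrame({"id": cols["id"], "x": np.frombuffer(cols["x"]), "y": np.frombuffer(cols["y"])})`; `np.frombuffer` wraps the array's buffer without copying it. Whether this is enough for the largest trees would need to be benchmarked.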

Metadata selection

It would be great if we could highlight nodes based on metadata categories, similar to what is currently being done with qiime feature-table filter-samples

CC @ebolyen
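A minimal sketch of the selection step, assuming feature metadata is keyed by node ID. `select_nodes` is a hypothetical helper; with a pandas table indexed by node ID the same filter is `metadata.index[metadata[column].isin(values)]`. Supporting a `--p-where`-style query, as `qiime feature-table filter-samples` does, would be a larger design decision.

```python
def select_nodes(metadata, column, values):
    """Return the sorted IDs of nodes whose `column` value is in `values`.

    `metadata` maps node ID -> {column: value}; nodes lacking the column are skipped.
    """
    wanted = set(values)
    return sorted(node_id for node_id, row in metadata.items()
                  if row.get(column) in wanted)
```

The returned IDs could then be passed to the same code path that the highlight file uses.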

Testing framework

We will want to have all of the code in place unit tested and reviewed. We will be following scikit-bio's guidelines for developing code.

See Contributing.md

Tests that need to be made

ToyModel.py

pep8 (also worth looking at autopep8)
Model unittests
- If we use the stack, make sure to unittest all of the methods
- Otherwise, take a look at preorder
- cache_ntips
- from_tree
- coords
- rescale
- update_coordinate

view.py

plot: we will want to add tests for matplotlib (may want to look at test_heatmap.py)

rectangular selection does not work in Firefox

Maybe because the keycode of "Shift" differs between Chrome and Firefox.

Toy Restful interface

We'll want this in order to facilitate communication between the model / view / controller.
For starters, let's just have simple get / post commands made.

Eventually, we'll want to have the following enabled.

Post node/edge metadata (model)
Get node/edge metadata (viewer)

We'll want to have this protocol prototyped in

Python Flask
Javascript node.js

This python tutorial could be useful.
This node.js tutorial could be useful.
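A minimal Flask sketch of the two operations listed above, assuming JSON lists of row records and an in-memory store. The `/metadata/<kind>` route, the `METADATA` dict and the record shape are all hypothetical; a real model would compute the records rather than store whatever is posted.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# In-memory store for the prototype only.
METADATA = {"nodes": [], "edges": []}


@app.route("/metadata/<kind>", methods=["GET", "POST"])
def metadata(kind):
    if kind not in METADATA:
        return jsonify(error="unknown kind %r" % kind), 404
    if request.method == "POST":
        # The model posts a JSON list of records, e.g. [{"id": "a", "x": 0.0, "y": 1.0}].
        METADATA[kind] = request.get_json()
        return jsonify(count=len(METADATA[kind])), 201
    # The viewer fetches the current records.
    return jsonify(METADATA[kind])
```

For local testing the app can be started with `app.run(port=...)` or the `flask run` command.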

Magic numbers

There are a few magic numbers in init_webgl.js that could use more documentation

Type checking in webGL

In the WebGL initialization, the properties need to be type checked

Issue identified in #22.

Robust metadata specification

I got this error when trying to test out the code

The fix isn't too bad: the work-around is to convert all metadata index labels to strings, i.e.

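    import pandas as pd

    # Read only the header row to learn the name of the first (index) column.
    cols = pd.read_csv(metadata_file, nrows=0, sep='\t').columns.tolist()

    # Parse the index column as strings so IDs such as "001" are not
    # converted to numbers.
    metadata = pd.read_csv(metadata_file, sep='\t', dtype={cols[0]: str})
    metadata = metadata.set_index(cols[0])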

A working setup.py and proper imports

We need to make sure that this is completely working.

In addition, we need to make sure that all of the imports are working properly (i.e. the following should be unnecessary in controller.py):

    sys.path.append('../..')

Need to make reading the metadata file more general.

Clade collapsing idea

Triangles (shortest node, longest node, root)
Triangles (root, shortest node, longest node, overall angle)
Convex hull around branches in the tree (see scipy)
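For the convex-hull option, `scipy.spatial.ConvexHull` is the scipy route mentioned above. As a dependency-free alternative, here is a sketch of Andrew's monotone chain algorithm, assuming the clade's node coordinates are given as `(x, y)` tuples; the returned vertices could be drawn as the clade's polygon.

```python
def convex_hull(points):
    """Andrew's monotone chain: return hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # > 0 if o -> a -> b turns counter-clockwise, < 0 if clockwise, 0 if collinear.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]
```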

Command Line Interface

We will want to expose a command-line interface. Namely, there should be only two files accepted: the tree and the feature metadata. I'd recommend looking at click:

http://click.pocoo.org/5/
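A minimal click sketch of that interface, assuming the two files are passed as `--tree` and `--metadata` options; the option names and the echoed placeholder are illustrative, not a settled design.

```python
import click


@click.command()
@click.option("--tree", "tree_file", required=True,
              type=click.Path(exists=True, dir_okay=False),
              help="Newick tree file.")
@click.option("--metadata", "metadata_file", required=True,
              type=click.Path(exists=True, dir_okay=False),
              help="Tab-separated feature metadata file.")
def main(tree_file, metadata_file):
    """Launch the viewer for a tree and its feature metadata."""
    # Placeholder: a real entry point would build the model and start the server.
    click.echo("tree=%s metadata=%s" % (tree_file, metadata_file))
```

`click.Path(exists=True)` makes click reject missing files with a usage error before `main` runs.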

Node / Edge metadata format

Node Metadata format

pd.DataFrame with the following columns

1. Node id (i.e. name of the node). Should be able to query this in the tree directly using the find method.
2. x coordinate to be rendered
3. y coordinate to be rendered
4+. attributes to be plotted (e.g. color, size, width, transparency, ...)

Edge Metadata format

pd.DataFrame with the following columns

1. Node id (i.e. name of the node). Should be able to query this in the tree directly using the find method.
2. x coordinate to be rendered
3. y coordinate to be rendered
4. Parent id (i.e. the name of the parent node). If the node is root, the parent id is None.
5. px: parent x coordinate to be rendered (None if there is no parent).
6. py: parent y coordinate to be rendered (None if there is no parent).
7+. attributes to be plotted (e.g. color, size, width, transparency, ...)

TODO: rationalize why px and py must be stored rather than looked up.

TODO before demo

Merge in outstanding changes related to the label rendering optimizations (maybe worthwhile to look at sklearn KD-trees)
Add example commands + data in README (create data folder)
Test out installation instructions
Add screenshot to README
Make sure that the metadata handling works: #54
Prepare 5-10 slides outlining motivation, things added, demo, and things to add, and open up to suggestions

Shift bug in rendering

The clade coloring is offset a bit

@kwcantrell could you provide screenshot?

Tree displayed incorrectly on open until metadata is highlighted

Code Review - javascript memory leak

For the most part, a functional programming approach has been used: global data, plus functions that act on that data.

Collapsed clades don't transfer to subtree

A working online example

It would be great if there were a working online example of this.

Video tutorial

Prior to an official release, we need to have a tutorial on how to use Empress

Optimization tweaks in webGL

There is some unnecessary computation going on in the WebGL rendering that was noticed in #22, namely with the use of requestAnimationFrame(loop);

As noted by @ElDeveloper:

With this loop you will redraw a new frame every possible time, which means that even if the objects have not changed since the last rendering, you will redraw the same frame leading to your CPU burning a lot of unnecessary compute. This is fine for now, but it would be a good idea to start integrating logic in this code-base to allow for selectively rendering a new frame i.e. only when something in the view has changed. I suggest opening an issue about this.

Test cases

Need to fix test_model

Clean up model code

split up the Dendrogram object and the Model object into separate files
We need tests for the model

Scalable layout calculations

Depends on #8. We'll want to replace the recursive calls with loops.
In this way, we'll avoid running into stack errors when we scale this up to trees with 100,000 nodes.
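A sketch of the idea applied to `cache_ntips` (listed under the Tree tests above), assuming nodes expose a `children` list as `skbio.TreeNode` does; `Node` here is a hypothetical minimal stand-in. An explicit stack of `(node, children_done)` pairs replaces the call stack, so tree depth is limited by memory rather than by Python's recursion limit.

```python
class Node:
    """Hypothetical minimal stand-in for skbio.TreeNode (only `children`)."""

    def __init__(self, children=()):
        self.children = list(children)


def cache_ntips(root):
    """Set `ntips` (number of descendant tips) on every node, without recursion."""
    stack = [(root, False)]
    while stack:
        node, children_done = stack.pop()
        if not node.children:
            node.ntips = 1  # a tip counts itself
        elif children_done:
            # Every child was pushed after this entry, so all are annotated by now.
            node.ntips = sum(child.ntips for child in node.children)
        else:
            # Revisit this node once its children have been handled.
            stack.append((node, True))
            stack.extend((child, False) for child in node.children)
```

The same two-phase stack pattern carries over to `coords` and the other postorder computations.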

Code Review - Metadata handling

JavaScript runs into a memory overflow error when passed a large amount of metadata (i.e. tempilton metadata)

Identify and document scalability benchmarks

Empress needs to be run against a huge tree (> 1 million tips)

Toy viewer

Need to create a simple view model that

Accepts edge metadata pd.DataFrame
Accepts node metadata pd.DataFrame
Plots the results onto a canvas using matplotlib
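A sketch of the plotting step, assuming each table has been converted to a list of row dicts (e.g. with `df.to_dict("records")`) using the node/edge metadata column names, with optional `color` and `size` columns. Edges are drawn as one `LineCollection` of segments from `(px, py)` to `(x, y)`, so the edge table alone is enough to draw every branch; `plot_tree` is a hypothetical name.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for an interactive window
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection


def plot_tree(node_rows, edge_rows):
    """Draw edges as segments from (px, py) to (x, y) and nodes as points."""
    fig, ax = plt.subplots()
    edges = [e for e in edge_rows if e["parent"] is not None]  # the root has no edge
    segments = [[(e["px"], e["py"]), (e["x"], e["y"])] for e in edges]
    ax.add_collection(LineCollection(
        segments, colors=[e.get("color", "black") for e in edges]))
    ax.scatter([n["x"] for n in node_rows], [n["y"] for n in node_rows],
               s=[n.get("size", 10) for n in node_rows],
               c=[n.get("color", "black") for n in node_rows])
    ax.autoscale()  # add_collection does not rescale the view by itself
    return fig, ax
```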

Automatic port detection

It may be worthwhile to think about a good way to automatically detect open ports, similar to how it is done in Jupyter Notebook and TensorBoard.

And it would be nice if ports can be specified from the command line interface

Port specification from CLI
Automatic port detection
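One simple approach to automatic port detection is to probe successive ports until one can be bound. A sketch, where the default start port, the number of tries and `find_open_port` are all assumptions. The returned port is only known to be free at the moment of the check, so the server should still handle a failed bind; a CLI `--port` option could supply `start`.

```python
import socket


def find_open_port(start=8080, tries=50, host="127.0.0.1"):
    """Return the first port in [start, start + tries) that can be bound on host."""
    stop = min(start + tries, 65536)  # valid TCP ports end at 65535
    for port in range(start, stop):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # in use (or not permitted): try the next port
            return port  # the probe socket is closed when the with-block exits
    raise RuntimeError("no free port in range %d-%d" % (start, stop - 1))
```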

Node ranks

Will need to brainstorm a ranking function that will prioritize which nodes will be displayed.
One such function is to rank the nodes based on number of tips.

But some exploratory analysis will be necessary to do this.
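The tip-count ranking suggested above could start as simply as taking the k largest entries. A sketch, assuming per-node tip counts are already available (e.g. from a `cache_ntips` pass); `top_nodes_by_tips` is a hypothetical helper, and ties are broken by node ID to keep the output deterministic.

```python
import heapq


def top_nodes_by_tips(ntips, k):
    """Return the k node IDs with the most descendant tips.

    `ntips` maps node ID -> number of tips below that node.
    """
    best = heapq.nsmallest(k, ntips.items(), key=lambda item: (-item[1], item[0]))
    return [node_id for node_id, _ in best]
```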
