biocore / empress
A fast and scalable phylogenetic tree viewer for microbiome data analysis
License: BSD 3-Clause "New" or "Revised" License
Need to have tutorials with documentation in order to run the program on someone else's machine.
Would be great to have continuous variables plotted for edges / clades
Maybe worthwhile performing a logit transform for plotting probabilities
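A minimal sketch of what the logit transform could look like before plotting. The function name and the `eps` clipping tolerance are assumptions for illustration (clipping keeps probabilities of exactly 0 or 1 finite); nothing here is taken from the Empress code.

```python
import math


def logit(p, eps=1e-6):
    """Map a probability in [0, 1] to the real line via log(p / (1 - p)).

    eps is a hypothetical clipping tolerance so that 0 and 1 stay finite.
    """
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


# Probabilities near 0 or 1 get spread out, which can make them easier
# to distinguish on a color scale.
probabilities = [0.01, 0.5, 0.99]
transformed = [logit(p) for p in probabilities]
```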
We'll want to port over the TreeNode object from scikit-bio and modify accordingly. This will be necessary for calculating the layout coordinates.
Relevant to #8
Need a simple model that
skbio.TreeNode
object.pd.DataFrame
and an edge metadata pd.DataFrame
See the coords method in Gneiss. It may make more sense to output the parent node rather than the children, since every node will have a parent (except for the root node).
kill -9 $(lsof -t -i:<portnumber>)
will force a port to close
List all of the nodes for inspection (rather than having the user type everything in)
Take a look at ili for an example of how to do this
Depends on #7
Need to build a simple renderer that
For starters, it's worthwhile checking out these tools
Worthwhile to check out JSFiddle to prototype.
An example of d3 can be found here.
Not clear what size and width are
https://github.com/biocore/empress/blob/code_review/empress/tree.py#L186
The current mouse wheel zooming and mouse drag moving operate on an absolute amount of distance, which isn't preferable when the scaling is too high or too low. Would rather make this distance relative to the current scaling factor.
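One way to make both operations relative, sketched in Python for the arithmetic only (the actual callbacks live in JavaScript). It assumes `scale` means screen pixels per tree unit; the function names and the `factor` of 1.1 per wheel notch are hypothetical.

```python
def zoom(scale, wheel_steps, factor=1.1):
    """Return the new scale after wheel_steps notches.

    Each notch multiplies the scale by factor, so a notch feels the same
    whether the view is zoomed far in or far out.
    """
    return scale * factor ** wheel_steps


def pan(offset, drag_pixels, scale):
    """Convert a drag measured in screen pixels into tree units.

    Dividing by the current scale means the tree follows the cursor
    at any zoom level.
    """
    return offset + drag_pixels / scale
```

With this scheme, zooming in three notches and back out three notches returns to the starting scale, and a 100 pixel drag moves the tree by 25 units at scale 4 but by 200 units at scale 0.5.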
There are two metadata files, namely
There will need to be documentation on this
TODO: need to change edge metadata to hold color as a "#RRGGBB" hex string and then enable WebGL to parse it
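A rough sketch of the parsing step, written in Python to show the conversion (the WebGL side would do the equivalent in JavaScript). The helper name is hypothetical, and the output is assumed to be floats in [0, 1], which is what shader color attributes typically expect.

```python
def hex_to_rgb(color):
    """Convert a '#RRGGBB' string to a tuple of three floats in [0, 1]."""
    # Minimal shape check only; int() rejects non-hex digits below.
    if len(color) != 7 or not color.startswith("#"):
        raise ValueError(f"expected '#RRGGBB', got {color!r}")
    return tuple(int(color[i:i + 2], 16) / 255 for i in (1, 3, 5))
```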
Raised in #22
The branch length preprocessing will need to be done in the model (i.e. there are some branches that don't have lengths).
Issue identified in #22
Seg fault occurred when handling a tree of around 800k tips plus metadata
But if the consensus is to remove metadata table completely, then this issue is not relevant.
Clades cannot be uncollapsed
merge_metada (not a function but part of the init)
center_tree(self)
select_edge_category(self)
select_node_category(self)
select_category(self, attributes, is_visible_col)
update_edge_category(self, attribute, category)
collapse_clades(self, sliderScale)
cache_ntips(self)
from_tree(cls, tree, use_lengths)
coords(self, height, width)
rescale(self, width, height)
init_webpage.js
function extractEdgeInfo()
function normalizeTree()
drawing_loop.js
function loop()
function draw()
callbacks.js
function initCallbacks()
function mouseDown(event)
function mouseUp(event)
function mouseMove(event)
function mouseWheel(event)
function selectHighlight()
ModelHandler(RequestHandler)
EdgeHandler(RequestHandler)
TriangleHandler(RequestHandler)
ZoomHandler(RequestHandler)
HighlightHandler(RequestHandler)
BenchmarkHandler(RequestHandler)
CollapseHandler(RequestHandler)
CollapseEdgeHandler(RequestHandler)
Right now screen resolution is hard coded in init function
To create the pandas data frame, a dictionary is first built by iterating through the tree to extract tree coordinates and other information. The data frame is then created from that dictionary. On a very large tree, pandas takes a while to convert the dictionary to a data frame and also uses a lot of memory doing so, which causes a memory overflow.
It would be great if we could highlight nodes based on metadata categories similar to what is currently being done with qiime feature-table filter-samples
CC @ebolyen
Will want to have all of the code in place unit tested and reviewed. We will be following scikit-bio's guidelines for developing code.
See Contributing.md
Tests that need to be made
ToyModel.py
view.py
Maybe because the keycode of "Shift" is different in Chrome and in Firefox.
We'll want this in order to facilitate communication between the model / view / controller.
For starters, let's just have simple get / post commands made.
Eventually, we'll want to have the following enabled.
We'll want to have this protocol prototyped in
This python tutorial could be useful.
This node.js tutorial could be useful.
There are a few magic numbers in init_webgl.js that could use some more documentation
In the webGL initialization, there needs to be type checking in the properties
Issue identified in #22 .
I got this error when trying to test out the code
The fix isn't too bad. The work-around is to convert all metadata index labels to strings i.e.
cols = pd.read_csv(metadata_file, nrows=1, sep='\t').columns.tolist()
metadata = pd.read_table(
metadata_file, dtype={cols[0]: object})
metadata = metadata.set_index(cols[0])
We need to make sure that this is completely working.
In addition, we need to make sure that all of the imports are working properly (i.e. the following is unnecessary in controller.py):
sys.path.append('../..')
We will want to expose some command-line interface. Namely there should be only two files accepted, the tree and the feature metadata. I'd recommend looking at click
Node Metadata format
pd.DataFrame with the following columns
Node id: (i.e. name of the node). Should be able to query this in the tree directly using the find method.
x: coordinate to be rendered
y: coordinate to be rendered
Edge Metadata format
pd.DataFrame with the following columns
Node id: (i.e. name of the node). Should be able to query this in the tree directly using the find method.
x: coordinate to be rendered
y: coordinate to be rendered
Parent id: (i.e. the name of the parent node). If the node is root, the parent id is None.
px: parent x coordinate to be rendered (is None if there is no parent).
py: parent y coordinate to be rendered (is None if there is no parent).
TODO: rationalize why px and py must be stored rather than looked up.
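To make the edge metadata layout above concrete, here is a small sketch that builds one row per node from a toy tree. The `parents` and `coords` inputs and the `edge_rows` helper are hypothetical stand-ins for whatever the model derives from the tree; only the column names come from the format described above.

```python
def edge_rows(parents, coords):
    """Build one edge-metadata row per node.

    parents maps node id -> parent id (None for the root);
    coords maps node id -> (x, y).
    """
    rows = []
    for node, parent in parents.items():
        x, y = coords[node]
        # The root has no parent, so its px / py are None.
        px, py = coords[parent] if parent is not None else (None, None)
        rows.append({"Node id": node, "x": x, "y": y,
                     "Parent id": parent, "px": px, "py": py})
    return rows


# Toy three-node tree with made-up coordinates.
parents = {"root": None, "a": "root", "b": "root"}
coords = {"root": (0.0, 0.0), "a": (1.0, 1.0), "b": (1.0, -1.0)}
rows = edge_rows(parents, coords)
```

A list of dicts like `rows` can be handed to `pd.DataFrame(rows)` to get the table. Storing px and py this way duplicates the parent's coordinates on every child row, which is the trade-off the TODO asks about.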
The clade coloring is offset a bit
@kwcantrell could you provide a screenshot?
For the most part, a functional programming approach has been used: global data, and functions that then act on that data.
It would be great if there is an online working example of this.
Prior to an official release, we need to have a tutorial on how to use Empress
There is some unnecessary compute going on with the WebGL rendering that was noticed in #22, namely with the use of requestAnimationFrame(loop);
As noted by @ElDeveloper
With this loop you will redraw a new frame every possible time, which means that even if the objects have not changed since the last rendering, you will redraw the same frame leading to your CPU burning a lot of unnecessary compute. This is fine for now, but it would be a good idea to start integrating logic in this code-base to allow for selectively rendering a new frame i.e. only when something in the view has changed. I suggest opening an issue about this.
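The selective rendering suggested here is often done with a "dirty" flag. The sketch below shows the idea in Python; the real loop is JavaScript, and the `View` class and its attributes are hypothetical, not part of the existing code.

```python
class View:
    """Toy view that only redraws when its state has changed."""

    def __init__(self):
        self.scale = 1.0
        self.dirty = True       # the first frame always needs drawing
        self.frames_drawn = 0

    def set_scale(self, scale):
        # Mark the view dirty only when something actually changes.
        if scale != self.scale:
            self.scale = scale
            self.dirty = True

    def draw(self):
        self.frames_drawn += 1  # stand-in for the actual rendering calls

    def loop_once(self):
        """One iteration of the animation loop; skips unchanged frames."""
        if self.dirty:
            self.draw()
            self.dirty = False
```

The loop can still be scheduled on every animation frame; the point is that an iteration becomes nearly free when nothing in the view has changed.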
Need to fix test_model
Depends on #8. We'll want to abstract out the recursive calls and replace them with a for loop.
In this way, we'll avoid potential issues with running into stack errors when we scale this up to trees with 100,000 nodes.
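As an illustration of the recursion-free pattern, here is a sketch that counts tips per clade with an explicit stack instead of recursive calls. The `children` dict is a hypothetical stand-in for the tree object, and `count_tips` is not an existing function in the code base.

```python
def count_tips(children, root):
    """Return {node: number of tips in the clade rooted at node}.

    children maps node id -> list of child ids (empty or missing for tips).
    Uses an explicit stack, so tree depth is not limited by the
    interpreter's recursion limit.
    """
    ntips = {}
    stack = [(root, False)]
    while stack:
        node, visited = stack.pop()
        kids = children.get(node, [])
        if not kids:
            ntips[node] = 1
        elif visited:
            # All children were processed before this second visit.
            ntips[node] = sum(ntips[k] for k in kids)
        else:
            # Revisit this node after its children have been handled.
            stack.append((node, True))
            stack.extend((k, False) for k in kids)
    return ntips
```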
JavaScript gets a memory overflow error when passing a large amount of metadata (i.e. tempilton metadata)
Empress needs to be run against a huge tree (> 1 million tips)
Need to create a simple view model that
pd.DataFrame
pd.DataFrame
Maybe worthwhile to think about a good way to automatically detect open ports similar to how it is done in Jupyter notebook and Tensorboard.
And it would be nice if ports can be specified from the command line interface
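A possible starting point, assuming a plain standard-library approach: ask the operating system for a free port by binding to port 0, and let a `--port` option override it. This is one common technique and not necessarily what Jupyter or Tensorboard do internally. `argparse` is used here only to keep the sketch dependency-free, and the function names and the `--port` flag are hypothetical.

```python
import argparse
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for an unused port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def parse_args(argv=None):
    """Parse a hypothetical --port option, falling back to a free port."""
    parser = argparse.ArgumentParser(description="hypothetical launcher")
    parser.add_argument("--port", type=int, default=None,
                        help="port to serve on (default: pick a free one)")
    args = parser.parse_args(argv)
    if args.port is None:
        args.port = find_free_port()
    return args
```

Note that the port is released before the server binds to it, so another process could in principle grab it in between; a more robust version would keep the socket open and hand it to the server.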
Will need to brainstorm a ranking function that will prioritize which nodes will be displayed.
One such function is to rank the nodes based on number of tips.
But some exploratory analysis will be necessary to do this.
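For the tip-count ranking mentioned above, a sketch could be as small as the following. It assumes a precomputed mapping from node id to number of tips; the `top_nodes` name and the toy counts are made up for illustration.

```python
import heapq


def top_nodes(ntips, k):
    """Return the k node ids with the most tips, largest first."""
    return heapq.nlargest(k, ntips, key=ntips.get)


# Toy tip counts; in practice these would come from the tree.
ntips = {"root": 5, "a": 3, "b": 2, "c": 1}
shown = top_nodes(ntips, 2)
```

Swapping in a different ranking function only means changing the `key`, which should make the exploratory analysis easier to iterate on.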