The ankura from byu-aml-lab

Tandem Anchors adding epsilon to single facet "tandem anchors"

Inside the tandem_anchors function, we still call hmean(Q[anchor, :] + epsilon, axis=0) for anchors that contain only one word. The harmonic mean of a single row is that row itself, so this just adds epsilon to every one-word anchor. Although the default epsilon is 1e-10, it has been shown to make a difference in classification (in some cases making the accuracy go up). This was noticed when all the anchors are just one word (for instance, when we call tandem_anchors directly from gs_anchors in TBUIE).

At some point, this should be addressed: either fixed, or agreed to be the desired behavior. For now, I will just use a smaller epsilon in TBUIE.
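To make the behavior concrete, here is a minimal sketch of one possible fix, assuming the fix is simply to skip epsilon smoothing for one-word anchors. It is not the ankura implementation: the name tandem_anchors_sketch is hypothetical, and plain Python lists stand in for the NumPy matrix Q so the snippet has no dependencies.

```python
def harmonic_mean(values):
    """Harmonic mean of a list of strictly positive numbers."""
    return len(values) / sum(1.0 / v for v in values)


def tandem_anchors_sketch(anchors, Q, epsilon=1e-10):
    """Build one basis row per anchor (hypothetical helper, not ankura's API).

    anchors: list of anchors, each a list of row indices into Q.
    Q: list of rows (lists of floats).
    """
    basis = []
    for anchor in anchors:
        if len(anchor) == 1:
            # One-word anchor: copy the row unchanged, without adding epsilon.
            basis.append(list(Q[anchor[0]]))
        else:
            rows = [Q[i] for i in anchor]
            # Column-wise harmonic mean of the epsilon-smoothed rows,
            # mirroring hmean(Q[anchor, :] + epsilon, axis=0).
            basis.append([
                harmonic_mean([v + epsilon for v in column])
                for column in zip(*rows)
            ])
    return basis
```

With this variant, a one-word anchor returns its row of Q exactly, while multi-word anchors are still smoothed before the harmonic mean is taken. Whether that difference is desirable is the open question above.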

Split off active and sampler into a separate repo

This repo will be a bit more coherent if the only thing it contains is vanilla anchor words code. The split-off repo(s) will import ankura as a dependency, provide pipeline functions which add feature columns to Q for supervised anchors, and contain the sampling-based supervised topic models and active learning.

Add topic evaluations

Given a word-topic matrix, we need a way to quantitatively evaluate the quality of those topics. Potential evaluations include:

accuracy on a classification using predicted topics as features
topic coherence as defined by Newman et al.
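As an illustration of the second option, here is a minimal sketch of a PMI-style coherence score computed from document co-occurrence, in the spirit of Newman et al. It is an assumption-laden sketch: the function name pmi_coherence, the choice of reference documents, and the smoothing constant are all hypothetical and not taken from ankura or from the paper.

```python
import math
from itertools import combinations


def pmi_coherence(top_words, documents, smoothing=1e-12):
    """Average pairwise PMI of a topic's top words (needs at least two words).

    top_words: the highest-probability words of one topic.
    documents: reference corpus as an iterable of token lists.
    smoothing: assumed small constant that keeps the log finite when
        two words never co-occur.
    """
    docs = [set(doc) for doc in documents]
    n_docs = len(docs)

    def prob(*words):
        # Fraction of documents that contain all of the given words.
        return sum(all(w in doc for w in words) for doc in docs) / n_docs

    scores = []
    for a, b in combinations(top_words, 2):
        joint = prob(a, b)
        independent = prob(a) * prob(b)
        scores.append(math.log((joint + smoothing) / (independent + smoothing)))
    return sum(scores) / len(scores)
```

Topics whose top words tend to appear in the same documents score higher than topics whose top words rarely co-occur.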

Add interactive operators

Add functions allowing users to modify a list of anchor words. These functions will likely not be automatically exported by ankura, but will be available through a sub-package. They will mostly be used for interactive exploration inside of IPython. Some of the potential operators include:

append word to existing anchor
remove word from existing anchor
merge two or more existing anchors
delete an existing anchor
create an anchor with a word
create an anchor using gram-schmidt

Many of these operators might be useful when composed. For example, we might remove a word from an anchor and then create a new anchor from that same word.
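A minimal sketch of how such operators could compose, assuming anchors are represented as a list of word lists. All function names here are hypothetical, and the Gram-Schmidt operator is left out because it needs the cooccurrence matrix. Each operator returns a new list instead of mutating its input, which makes composition and undo straightforward in an interactive session.

```python
def append_word(anchors, index, word):
    """Return a copy of anchors with word appended to the anchor at index."""
    return [a + [word] if i == index else list(a) for i, a in enumerate(anchors)]


def remove_word(anchors, index, word):
    """Return a copy of anchors with word removed from the anchor at index."""
    return [
        [w for w in a if w != word] if i == index else list(a)
        for i, a in enumerate(anchors)
    ]


def merge_anchors(anchors, indices):
    """Return a copy where the anchors at indices are merged into one new anchor."""
    merged = [w for i in indices for w in anchors[i]]
    kept = [list(a) for i, a in enumerate(anchors) if i not in indices]
    return kept + [merged]


def delete_anchor(anchors, index):
    """Return a copy of anchors without the anchor at index."""
    return [list(a) for i, a in enumerate(anchors) if i != index]


def create_anchor(anchors, word):
    """Return a copy of anchors with a new single-word anchor added."""
    return [list(a) for a in anchors] + [[word]]


# Composition: split "stock" out of the first anchor into its own anchor.
anchors = [["cat", "stock"], ["bond"]]
split = create_anchor(remove_word(anchors, 0, "stock"), "stock")
```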

Migrate to P3k

It's really time to get with the times and move on to Python 3. Seriously. It's time.

Can you please share a link to the live demo?

Python3 Slowness

Since switching to Python3, things have been a lot slower. Time to spend some time with the profiler to figure out why!
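One way to start, sketched here with the standard library's cProfile and pstats modules: wrap the call you suspect and print the entries with the largest cumulative time. The helper name profile_call is hypothetical, and the function you pass in is up to you.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile; return its result and a text report.

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()
```

Running the same call under Python 2 and Python 3 and comparing the two reports should show which functions account for the slowdown.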

Rows after index 0 are incorrectly translated!

https://github.com/jefflund/ankura/blob/ea70b229095eb1ee49b22e6ffeac7f1bec0c7e29/ankura/anchor.py#L176

Simple web server

To eventually support a web-based user study of interactive anchors, it would be useful to have a simple web server which serves some useful endpoints. Some potential queries include:

retrieve a dataset
retrieve the topics resulting from a set of anchors
retrieve topic predictions for a document

For now, we will only worry about coming up with these endpoints. The easiest way to get started is probably with a micro-framework like Flask. Eventually, it would be best to move as much as possible client side, but for now, we'll do inference server side. We leave the work of actually coming up with the client views for later.
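A minimal sketch of the three endpoints follows. It uses only the standard library's http.server rather than Flask, so that it runs without extra dependencies. The URL paths, query parameter names, and the placeholder functions recover_topics and predict_topics are all assumptions made for illustration and are not part of ankura.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory stand-in for a real dataset.
DATASET = {"vocab": ["cat", "dog", "stock"], "docs": [["cat", "dog"], ["stock"]]}


def recover_topics(anchors):
    """Placeholder: echo each anchor instead of running real topic recovery."""
    return [{"anchor": anchor} for anchor in anchors]


def predict_topics(doc_id):
    """Placeholder: return the document's words instead of real predictions."""
    return {"doc": doc_id, "words": DATASET["docs"][doc_id]}


def route(url):
    """Map a request URL to a (status, JSON-serializable payload) pair."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    if parsed.path == "/dataset":
        return 200, DATASET
    if parsed.path == "/topics":
        # Each anchor is passed as ?anchor=word1,word2 (repeatable).
        anchors = [value.split(",") for value in query.get("anchor", [])]
        return 200, {"topics": recover_topics(anchors)}
    if parsed.path == "/predict":
        try:
            return 200, predict_topics(int(query["doc"][0]))
        except (KeyError, ValueError, IndexError):
            return 400, {"error": "expected ?doc=<valid document index>"}
    return 404, {"error": "unknown endpoint"}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, payload = route(self.path)
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="localhost", port=8000):
    """Start the server; blocks until interrupted."""
    HTTPServer((host, port), Handler).serve_forever()
```

Keeping the routing in a plain function separate from the HTTP handler means the endpoints can be tested without starting a server, and the same function could later sit behind Flask routes.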

Store row-normalized Q on Dataset

Each call to ankura.topic.recover_topic copies Q and row-normalizes the copy. This work can be precomputed (so it doesn't get included as part of interactive topic updates). Store this precomputed row-normalized Q on the Dataset object, and point out where Nozomu can add hooks to tweak individual rows of Q and row-normalized Q.
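A minimal sketch of the idea, assuming a simplified dataset object that holds only Q. The class name DatasetSketch, the attribute Q_rownorm, and the set_row hook are hypothetical, and plain lists stand in for the NumPy matrix.

```python
class DatasetSketch:
    """Holds Q together with a precomputed row-normalized copy."""

    def __init__(self, Q):
        self.Q = [list(row) for row in Q]
        # Normalize once up front so topic recovery can reuse the result.
        self.Q_rownorm = [self._normalize(row) for row in self.Q]

    @staticmethod
    def _normalize(row):
        """Scale a row to sum to 1; all-zero rows are left unchanged."""
        total = sum(row)
        return [v / total for v in row] if total else list(row)

    def set_row(self, index, row):
        """Hook: replace one row of Q and refresh only its normalized row."""
        self.Q[index] = list(row)
        self.Q_rownorm[index] = self._normalize(row)
```

Routing row tweaks through a single hook keeps Q and its normalized copy in sync without renormalizing the whole matrix on every interactive update.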

changing sparsity during import gives a warning

When starting up the server, I get the following warning:

/usr/lib64/python3.4/site-packages/scipy/sparse/compressed.py:698: SparseEfficiencyWarning: Changing the sparsity structure of a csc_matrix is expensive. lil_matrix is more efficient.
SparseEfficiencyWarning)

Getting rid of LAPACK headers

We need to figure out why the LAPACK headers aren't getting found by the C compiler when they aren't in the same directory as the C code.
