ankura's Issues

Tandem Anchors adding epsilon to single facet "tandem anchors"

Inside the tandem_anchors function, we still call hmean(Q[anchor, :] + epsilon, axis=0) for anchors that contain only one word. The harmonic mean of a single row is the row itself, so for these anchors the call just adds epsilon to every entry. Although the default epsilon is 1e-10, this has been shown to make a difference in classification (in some cases making the accuracy go up). It was noticed when all the anchors are a single word (for instance, when we call tandem_anchors directly from gs_anchors in TBUIE).

At some point this should be addressed: either fix it, or agree that the current behavior is as desired. For now, I will just use a smaller epsilon in TBUIE.
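One possible fix can be sketched as follows. This is not the ankura implementation: it uses plain Python lists instead of NumPy and scipy's hmean, and the names harmonic_mean and tandem_anchor_vector are hypothetical. The assumption is that single-word anchors should pass through untouched, which is only one of the two outcomes discussed above.

```python
def harmonic_mean(values):
    """Harmonic mean of strictly positive numbers."""
    return len(values) / sum(1.0 / v for v in values)


def tandem_anchor_vector(rows, epsilon=1e-10):
    """Combine the Q rows of one anchor into a single basis vector.

    rows: list of equal-length lists, one per word in the anchor.
    A single-word anchor is returned unchanged (no epsilon added).
    A multi-word anchor uses the column-wise harmonic mean of rows + epsilon.
    """
    if len(rows) == 1:
        return list(rows[0])
    return [
        harmonic_mean([value + epsilon for value in column])
        for column in zip(*rows)
    ]


# Single-word anchor: the row comes back exactly as it went in.
single = tandem_anchor_vector([[0.2, 0.0, 0.8]])

# Two-word anchor: epsilon keeps the zero entry from dividing by zero.
pair = tandem_anchor_vector([[0.5, 0.0], [0.5, 0.4]])
```

With this guard, epsilon only affects anchors that actually need a harmonic mean, so results for single-word anchors no longer depend on its value.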

Split off active and sampler into a separate repo

This repo will be more coherent if it contains only vanilla anchor-word functionality. The split-off repo(s) will import ankura as a dependency and will hold the pipeline functions that add feature columns to Q for supervised anchors, as well as the sampling-based supervised topic models and active learning.

Add topic evaluations

Given a word-topic matrix, we need a way to quantitatively evaluate the quality of those topics. Potential evaluations include:

  • accuracy on a classification using predicted topics as features
  • topic coherence as defined by Newman et al.
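As a rough illustration of the second option, the sketch below scores a topic by the mean pairwise pointwise mutual information (PMI) of its top words. It rests on several assumptions: probabilities are estimated from document-level co-occurrence in whatever reference documents are passed in, the smoothing constant is arbitrary, and the function name pmi_coherence is hypothetical. The published evaluation may differ in these details.

```python
import math
from itertools import combinations


def pmi_coherence(topic_words, documents, smoothing=1e-12):
    """Mean pairwise PMI of topic_words (needs at least two words).

    documents: iterable of token lists used as the reference corpus.
    Probabilities are document frequencies; smoothing avoids log(0).
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents containing every one of the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        joint = prob(w1, w2)
        independent = prob(w1) * prob(w2)
        scores.append(math.log((joint + smoothing) / (independent + smoothing)))
    return sum(scores) / len(scores)


docs = [["cat", "dog"], ["cat", "dog"], ["car", "road"], ["car", "road"]]
coherent = pmi_coherence(["cat", "dog"], docs)    # words that co-occur
incoherent = pmi_coherence(["cat", "car"], docs)  # words that never co-occur
```

Words that tend to appear in the same documents score higher than words that never appear together, which is the property a coherence metric should reward.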

Add interactive operators

Add functions that let users modify a list of anchor words. These functions will probably not be exported automatically by ankura, but will be available through a sub-package. They will mostly be used for interactive exploration inside IPython. Some of the potential operators include:

  • append word to existing anchor
  • remove word from existing anchor
  • merge two or more existing anchors
  • delete an existing anchor
  • create an anchor with a word
  • create an anchor using gram-schmidt

Many of these operators might be useful when composed. For example, we might remove a word from an anchor and then create a new anchor from that same word.
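A minimal sketch of how such operators could compose, assuming anchors are represented as a list of lists of words. All function names here are hypothetical, every operator returns a new list rather than mutating its input, and the Gram-Schmidt operator is left out because it needs Q.

```python
def append_word(anchors, index, word):
    """Append word to the anchor at index."""
    return [a + [word] if i == index else list(a) for i, a in enumerate(anchors)]


def remove_word(anchors, index, word):
    """Remove word from the anchor at index."""
    return [
        [w for w in a if w != word] if i == index else list(a)
        for i, a in enumerate(anchors)
    ]


def merge_anchors(anchors, indices):
    """Merge the anchors at the given indices into one anchor at the end."""
    chosen = set(indices)
    merged = [w for i in indices for w in anchors[i]]
    kept = [list(a) for i, a in enumerate(anchors) if i not in chosen]
    return kept + [merged]


def delete_anchor(anchors, index):
    """Delete the anchor at index."""
    return [list(a) for i, a in enumerate(anchors) if i != index]


def create_anchor(anchors, word):
    """Create a new single-word anchor."""
    return [list(a) for a in anchors] + [[word]]


# Composition from the text: split "money" out into its own anchor.
anchors = [["bank", "money"], ["river"]]
split = create_anchor(remove_word(anchors, 0, "money"), "money")
```

Returning new lists keeps each step of an interactive session available for inspection or undo, at the cost of some copying.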

Migrate to P3k

It's really time to get with the times and move on to Python 3. Seriously. It's time.

Python3 Slowness

Since switching to Python3, things have been a lot slower. Time to spend some time with the profiler to figure out why!
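A starting point for the profiling session might look like the sketch below, which uses the standard library's cProfile and pstats. The helper name profile_call and the toy workload are placeholders. In practice, the function passed in would be whichever ankura call feels slow.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and show the ten most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


def square_sum(n):
    """Toy workload standing in for a slow ankura call."""
    return sum(i * i for i in range(n))


result, report = profile_call(square_sum, 1000)
```

Running the same helper under both interpreters and comparing the top entries should show which calls account for the slowdown.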

Simple web server

To eventually support a web-based user study of interactive anchors, it would be useful to have a simple web server that serves a few endpoints. Some potential queries include:

  • retrieve a dataset
  • retrieve the topics resulting from a set of anchors
  • retrieve topic predictions for a document

For now, we will only worry about coming up with these endpoints. The easiest way to get started is probably with a micro-framework like Flask. Eventually, it would be best to move as much as possible client side, but for now we'll do inference server side. We leave the work of actually coming up with the client views for later.
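The three queries above could be sketched as follows. To stay dependency-free, this sketch uses the standard library's wsgiref rather than Flask, but the routing would map directly onto Flask routes. The URL paths, the DATASETS dictionary, and the recover_topics and predict_topics placeholders are all assumptions made for illustration. Nothing here calls real ankura code, and there is no input validation.

```python
import json
from wsgiref.simple_server import make_server

# Hypothetical in-memory stand-in for real datasets.
DATASETS = {"toy": {"vocab": ["cat", "dog", "car"], "num_docs": 3}}


def recover_topics(anchors):
    """Placeholder: a real server would run topic recovery here."""
    return [{"anchor": anchor} for anchor in anchors]


def predict_topics(tokens):
    """Placeholder: a real server would run inference on the document."""
    return {"num_tokens": len(tokens), "topics": []}


def handle(method, path, payload):
    """Route one request; return (status line, JSON-serializable body)."""
    if method == "GET" and path.startswith("/dataset/"):
        name = path[len("/dataset/"):]
        if name in DATASETS:
            return "200 OK", DATASETS[name]
        return "404 Not Found", {"error": "unknown dataset"}
    if method == "POST" and path == "/topics":
        return "200 OK", {"topics": recover_topics(payload["anchors"])}
    if method == "POST" and path == "/predict":
        return "200 OK", predict_topics(payload["tokens"])
    return "404 Not Found", {"error": "unknown endpoint"}


def app(environ, start_response):
    """WSGI entry point: decode the JSON body, dispatch, encode the reply."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    raw = environ["wsgi.input"].read(length) if length else b""
    payload = json.loads(raw.decode("utf-8")) if raw else None
    status, body = handle(
        environ["REQUEST_METHOD"], environ.get("PATH_INFO", ""), payload
    )
    data = json.dumps(body).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(data))),
    ]
    start_response(status, headers)
    return [data]


def serve(port=8000):
    """Serve until interrupted; not called automatically."""
    with make_server("", port, app) as server:
        server.serve_forever()
```

Keeping the routing in a plain handle function means the endpoints can be exercised without starting a server, and client views can be added later without touching the inference code.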

Store row-normalized Q on Dataset

Each call to ankura.topic.recover_topic copies Q and row-normalizes the copy. This work can be precomputed so that it is not included as part of interactive topic updates. Store the precomputed row-normalized Q on the Dataset object, and point out where Nozomu can add hooks to tweak individual rows of Q and of the row-normalized Q.
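The idea can be sketched with a stripped-down stand-in for the Dataset object. This is an illustration only: the real Q is presumably a NumPy array, while this version uses plain lists to stay dependency-free, and the attribute Q_norm and the method set_row are hypothetical names.

```python
class Dataset:
    """Minimal stand-in holding Q and a cached row-normalized copy."""

    def __init__(self, Q):
        self.Q = [list(row) for row in Q]
        # Normalize once up front instead of on every topic recovery.
        self.Q_norm = [self._normalize(row) for row in self.Q]

    @staticmethod
    def _normalize(row):
        """Scale a row to sum to 1; all-zero rows are left as they are."""
        total = sum(row)
        return [value / total for value in row] if total else list(row)

    def set_row(self, index, row):
        """Hook: replace one row of Q and refresh only that cached row."""
        self.Q[index] = list(row)
        self.Q_norm[index] = self._normalize(row)


dataset = Dataset([[1, 1], [1, 3]])
before = [list(row) for row in dataset.Q_norm]
dataset.set_row(0, [3, 1])
after = dataset.Q_norm
```

Because set_row touches a single row, tweaking Q interactively costs one row normalization rather than a full copy of the matrix.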

Changing sparsity during import gives a warning

When starting up the server, I get the following warning:

/usr/lib64/python3.4/site-packages/scipy/sparse/compressed.py:698: SparseEfficiencyWarning: Changing the sparsity structure of a csc_matrix is expensive. lil_matrix is more efficient.
SparseEfficiencyWarning)

Getting rid of LAPACK headers

We need to figure out why the LAPACK headers aren't getting found by the C compiler when they aren't in the same directory as the C code.
