Hierarchical Latent Dirichlet Allocation (HLDA)

URL: http://code.google.com/p/hlda-cpp/
License: Apache 2.0

This is a C++ re-implementation of David Blei's Hierarchical Latent Dirichlet 
Allocation (HLDA) topic modeling software.
The model finds a hierarchy of topics, where the data determines the structure 
of the hierarchy. The depth of the hierarchy is fixed.

The HLDA model is described in two of David Blei's papers:
http://www.cs.princeton.edu/~blei/papers/BleiGriffithsJordanTenenbaum2003.pdf
http://www.cs.princeton.edu/~blei/papers/BleiGriffithsJordan2009a.pdf

David Blei's implementation of HLDA in C:
http://www.cs.princeton.edu/~blei/downloads/hlda-c.tgz

Dependencies:
This code depends on the GNU Scientific Library (GSL):
http://www.gnu.org/software/gsl/

Input:
Each line of the input file describes one document, in the following format:
[number of unique words] [word id] : [count] ...
There is a sample test file in the testdata directory, along with a settings 
file for this test data.
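
As an illustration of the input format, here is a minimal sketch of a reader
for a single corpus line. It is not taken from the hlda-cpp sources:
ParsedDocument and ParseCorpusLine are hypothetical names, and the sketch
assumes one document per line and that the first field equals the number of
[word id] : [count] pairs that follow. It accepts the colon with or without
surrounding spaces.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container for one parsed document (not a type from hlda-cpp).
struct ParsedDocument {
  int unique_words = 0;                          // leading field of the line
  std::vector<std::pair<int, int>> word_counts;  // (word id, count) pairs
};

// Parses one corpus line with basic validation only.
// Returns false when the fields do not match the assumed format.
bool ParseCorpusLine(const std::string& line, ParsedDocument* doc) {
  // Turn every ':' into a space so "3:2" and "3 : 2" tokenize identically.
  std::string cleaned = line;
  for (char& c : cleaned) {
    if (c == ':') c = ' ';
  }

  std::istringstream in(cleaned);
  if (!(in >> doc->unique_words) || doc->unique_words < 0) return false;

  doc->word_counts.clear();
  int id = 0;
  int count = 0;
  while (in >> id) {
    if (!(in >> count)) return false;  // a word id without a count
    doc->word_counts.emplace_back(id, count);
  }
  if (!in.eof()) return false;  // stopped on a non-numeric token

  // Assumption: the leading field is the number of pairs on the line.
  return static_cast<int>(doc->word_counts.size()) == doc->unique_words;
}
```

For example, the line "3 0:2 5:1 7:4" would yield three pairs, (0, 2), (5, 1)
and (7, 4), while "2 4:1" would be rejected because only one pair follows the
leading 2.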

Other corpora in the same format can be found here:
* the Associated Press corpus: 
http://www.cs.princeton.edu/~blei/lda-c/ap.tgz
* Journal of the ACM corpus: 
http://www.cs.princeton.edu/~blei/downloads/jacm.tgz 

Instructions to run the code:
* Compile the code by running make
* Run the program: ./hlda input_corpus settings_file

input_corpus: the input file, in the format described under "Input".
settings_file: a settings file. An example settings file can be found in the 
testdata directory. 

HLDA settings file:

DEPTH the maximum depth of the hierarchy.
ETA represents the expected variance of the underlying topics.
GAM not used in this implementation.
GEM_MEAN a parameter of the GEM distribution. It controls the proportion 
of general words relative to specific words.
GEM_SCALE a parameter of the GEM distribution. It controls how strictly 
documents should follow the general versus specific word proportions.
SCALING_SHAPE scaling parameter for the G prior.
SCALING_SCALE scaling parameter for the G prior.
SAMPLE_ETA whether the ETA parameter is sampled.
SAMPLE_GEM whether the GEM parameters are sampled.
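
To make the roles of GEM_MEAN and GEM_SCALE more concrete, the sketch below
draws per-document level proportions by stick-breaking, following the GEM(m, pi)
construction in the 2009 paper: V_i ~ Beta(m * pi, (1 - m) * pi), and level i
receives V_i times the stick left over by the levels above it. This is an
illustration and not code from hlda-cpp: SampleBeta and SampleLevelProportions
are hypothetical names, the C++ standard library is used in place of GSL so the
example is self-contained, and assigning the leftover stick to the deepest
level is an assumed way of truncating at DEPTH.

```cpp
#include <random>
#include <vector>

// Draws Beta(a, b) as X / (X + Y) with X ~ Gamma(a, 1) and Y ~ Gamma(b, 1).
// Requires a > 0 and b > 0.
double SampleBeta(double a, double b, std::mt19937* rng) {
  std::gamma_distribution<double> gamma_a(a, 1.0);
  std::gamma_distribution<double> gamma_b(b, 1.0);
  const double x = gamma_a(*rng);
  const double y = gamma_b(*rng);
  return x / (x + y);
}

// Hypothetical helper: level proportions for one document.
// gem_mean must lie strictly between 0 and 1, gem_scale must be positive.
std::vector<double> SampleLevelProportions(int depth, double gem_mean,
                                           double gem_scale,
                                           std::mt19937* rng) {
  std::vector<double> theta(depth, 0.0);
  double remaining = 1.0;  // length of the stick not yet assigned
  for (int level = 0; level < depth; ++level) {
    if (level == depth - 1) {
      // Assumed truncation: the deepest level takes whatever is left.
      theta[level] = remaining;
      break;
    }
    const double v = SampleBeta(gem_mean * gem_scale,
                                (1.0 - gem_mean) * gem_scale, rng);
    theta[level] = v * remaining;
    remaining *= (1.0 - v);
  }
  return theta;
}
```

Since each V_i has mean GEM_MEAN, a larger GEM_MEAN puts more expected mass on
levels near the root (general words). A larger GEM_SCALE reduces the variance
of each V_i, so the proportions vary less from one document to the next.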

This work was done in the context of the RENDER project
(http://www.render-project.eu/).
