clarkwang1214 / hlda-cpp-1
Hierarchical Latent Dirichlet Allocation
License: Apache License 2.0
Hierarchical Latent Dirichlet Allocation (HLDA)

URL: http://code.google.com/p/hlda-cpp/
License: Apache 2.0

This is a C++ re-implementation of David Blei's Hierarchical Latent Dirichlet Allocation (HLDA) topic modeling software. The model finds a hierarchy of topics, where the data determines the structure of the hierarchy. The depth of the hierarchy is fixed.

The HLDA model is described in two of David Blei's papers:
http://www.cs.princeton.edu/~blei/papers/BleiGriffithsJordanTenenbaum2003.pdf
http://www.cs.princeton.edu/~blei/papers/BleiGriffithsJordan2009a.pdf

David Blei's implementation of HLDA in C:
http://www.cs.princeton.edu/~blei/downloads/hlda-c.tgz

Dependencies:
This code depends on the GSL GNU Scientific Library:
http://www.gnu.org/software/gsl/

Input:
The input is in the following format:
[number of unique words] [word id] : [count] ...

There is a sample test file in the testdata directory, along with a settings file for this test data.

Other corpora in the same format can be found here:
* the Associated Press corpus: http://www.cs.princeton.edu/~blei/lda-c/ap.tgz
* Journal of the ACM corpus: http://www.cs.princeton.edu/~blei/downloads/jacm.tgz

Instructions to run the code:
* Compile the code by running make
* ./hlda input_corpus settings_file

input_corpus: the input file, in the format described under "Input".
settings_file: a settings file. An example settings file can be found in the testdata directory.

HLDA settings file:
DEPTH          the maximum depth of the hierarchy.
ETA            represents the expected variance of the underlying topics.
GAM            not used in this implementation.
GEM_MEAN       a parameter of the GEM distribution. It shows the proportion of general words relative to specific words.
GEM_SCALE      a parameter of the GEM distribution. It shows how strictly documents should follow the general versus specific word proportions.
SCALING_SHAPE  scaling parameter for the G prior.
SCALING_SCALE  scaling parameter for the G prior.
SAMPLE_ETA     if the ETA parameter is sampled.
SAMPLE_GEM     if the GEM parameters are sampled.

This work was done in the context of the RENDER project (http://www.render-project.eu/).
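Example: reading one line of the input corpus.
The sketch below shows one way to parse a line in the format described under "Input". It is not taken from the hlda-cpp sources: the Document struct and the ParseDocumentLine function are hypothetical names used only for illustration. It assumes one document per line, non-negative integer word ids and counts, and accepts the colon with or without surrounding spaces.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container: one document as (word id, count) pairs.
struct Document {
    std::vector<std::pair<int, int>> word_counts;
};

// Parses "[number of unique words] [word id] : [count] ...".
// Returns false if the line is malformed, if it holds fewer or more
// pairs than declared, or if an id or count is negative (an assumption
// of this sketch). On failure, *doc may hold a partial result.
bool ParseDocumentLine(const std::string& line, Document* doc) {
    assert(doc != nullptr);
    std::istringstream in(line);

    int unique_words = 0;
    if (!(in >> unique_words) || unique_words < 0) return false;

    doc->word_counts.clear();
    for (int i = 0; i < unique_words; ++i) {
        int word_id = 0;
        int count = 0;
        char colon = '\0';
        // operator>> skips leading whitespace, so both "3:2" and
        // "3 : 2" are read as word id 3 with count 2.
        if (!(in >> word_id >> colon >> count) || colon != ':') return false;
        if (word_id < 0 || count < 0) return false;
        doc->word_counts.emplace_back(word_id, count);
    }

    // Anything left after the declared pairs means the header count is wrong.
    std::string extra;
    return !(in >> extra);
}
```

To read a whole corpus, call ParseDocumentLine on each line returned by std::getline and stop, or skip the line, when it returns false.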