This project forked from blei-lab/class-slda

Implements supervised topic models with a categorical response.

License: GNU General Public License v2.0

Supervised latent Dirichlet allocation for classification

(C) Copyright 2009, Chong Wang, David Blei and Li Fei-Fei. Written by Chong Wang; part of the code is from lda-c.

This is a C++ implementation of supervised latent Dirichlet allocation (sLDA) for classification. The code differs slightly from what is described in [2], in order to run faster. Only the sLDA part is included here; the annotation part has not been posted yet.

Sample data

A preprocessed 8-class image dataset [2] from Labelme. Download ./data.tgz.

UIUC Sports annotation files: annotations and meta information. The source image files can be found here. (Note: there might be some discrepancies and it is unclear why...)

References

[1] Chong Wang, David M. Blei and Li Fei-Fei. Simultaneous image classification and annotation. In CVPR, 2009. [PDF]

[2] David M. Blei and Jon McAuliffe. Supervised topic models. In NIPS, 2007. [PDF]


README

Note that this code requires the GNU Scientific Library (GSL), http://www.gnu.org/software/gsl/


TABLE OF CONTENTS

A. COMPILING

B. ESTIMATION

C. INFERENCE


A. COMPILING

Make sure the GSL is installed, then type "make" in a shell.


B. ESTIMATION

Estimate the model by executing:

 slda [est] [data] [label] [settings] [alpha] [k] [seeded/random/model_path] [directory]

Each saved model is written to two files:

 <iteration>.model is the model saved in binary format, which is easy and
 fast to load for inference.

 <iteration>.model.txt is the model saved in text format, which is
 convenient for printing topics or for analysis using Python.

The variational posterior Dirichlets are in:

 <iteration>.gamma
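
The argument order in the usage line above can be captured in a small helper. The following is only a sketch: it assumes the bracketed [est] stands for the literal keyword est, that the binary is at ./slda, and all file names and the alpha and k values are hypothetical placeholders.

```python
def build_est_command(data, label, settings, alpha, k, init, directory,
                      binary="./slda"):
    """Assemble the argument list for an estimation run.

    The order mirrors the usage line in the README. `binary` and every
    path passed in are assumptions about the local setup.
    """
    return [
        str(binary), "est",            # assumed literal keyword for [est]
        str(data), str(label), str(settings),
        str(alpha), str(k),
        str(init),                     # "seeded", "random", or a model path
        str(directory),
    ]


# Hypothetical file names and values, for illustration only.
cmd = build_est_command("train-data.dat", "train-label.dat", "settings.txt",
                        alpha=1.0, k=20, init="random", directory="out")
print(" ".join(cmd))
# To actually launch the run: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when a path contains spaces.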

Data format

(1) [data] is a file where each line is of the form:

 [M] [term_1]:[count] [term_2]:[count] ...  [term_N]:[count]

where [M] is the number of unique terms in the document, and the [count] associated with each term is how many times that term appeared in the document.
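
As an illustration of this line format, here is a minimal Python sketch of a parser. It is not part of the package; the function name is hypothetical, and it assumes that terms are given as integer indices.

```python
def parse_document(line):
    """Parse one line of the [data] file into a {term: count} dict.

    Raises ValueError if the leading [M] does not match the number of
    distinct term:count pairs that follow.
    """
    fields = line.split()
    if not fields:
        raise ValueError("empty line")
    num_unique = int(fields[0])
    counts = {}
    for pair in fields[1:]:
        term, count = pair.split(":")
        counts[int(term)] = int(count)   # assumes integer term indices
    if len(counts) != num_unique:
        raise ValueError(
            f"expected {num_unique} unique terms, found {len(counts)}")
    return counts


# A document with 3 unique terms and 2 + 1 + 4 = 7 word occurrences.
doc = parse_document("3 0:2 5:1 17:4")
total_words = sum(doc.values())
print(doc, total_words)
```

Checking [M] against the parsed pairs catches truncated or badly merged lines before they reach the estimation step.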

(2) [label] is a file where each line is the corresponding label for [data]. The labels must be 0, 1, ..., C-1, if we have C classes.
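
A quick sanity check of the label file before estimation might look like the sketch below. The function name is hypothetical. Reporting classes that never occur is an extra check added here for convenience, not a requirement stated above.

```python
def check_labels(labels, num_docs):
    """Validate labels against the 0, 1, ..., C-1 convention.

    Returns (C, missing), where C = max(labels) + 1 and `missing` lists
    class indices below C that never occur in `labels`.
    """
    if len(labels) != num_docs:
        raise ValueError(
            f"{len(labels)} labels for {num_docs} documents")
    if not labels:
        raise ValueError("no labels given")
    if any(y < 0 for y in labels):
        raise ValueError("labels must be non-negative integers")
    num_classes = max(labels) + 1
    missing = sorted(set(range(num_classes)) - set(labels))
    return num_classes, missing


# Labels as they would be read from a [label] file, one per line.
text = "0\n2\n1\n0\n"
labels = [int(s) for s in text.split()]
num_classes, missing = check_labels(labels, num_docs=4)
print(num_classes, missing)
```

A non-empty `missing` list usually means the labels were not remapped to a contiguous range starting at 0.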


C. INFERENCE

To perform inference on a different set of data (in the same format as for estimation), execute:

 slda [inf] [data] [label] [settings] [model] [directory]

where [model] is the binary file from the estimation.

The predicted labels are in:

 inf-labels.dat

The variational posterior Dirichlets are in:

 inf-gamma.dat
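
To score the predictions against the true labels, something like the following sketch could be used. It assumes that inf-labels.dat holds one integer label per line, in the same order as the documents in [data]; this layout is not documented above, so check it against your own output. The helper names are hypothetical.

```python
def read_labels(path):
    """Read one integer label per non-blank line (assumed file layout)."""
    with open(path) as f:
        return [int(line) for line in f if line.strip()]


def accuracy(true_labels, predicted_labels):
    """Fraction of documents whose predicted label equals the true one."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists differ in length")
    if not true_labels:
        raise ValueError("no labels to compare")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# In-memory example; with real files one would instead call
#   accuracy(read_labels(<label file>), read_labels(<inf-labels.dat path>))
score = accuracy([0, 1, 2, 1], [0, 1, 1, 1])
print(score)
```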
