This project forked from lejon/partiallycollapsedlda

Implementations of various fast parallelized samplers for LDA, including Partially Collapsed LDA, Light LDA, Partially Collapsed Light LDA

PC-LDA

Repo for our Partially Collapsed ParallelLDA implementation described in the article arXiv:1506.03784:

   @unpublished{MagnussonJonsson2015,
      author = "Måns Magnusson and Leif Jonsson and Mattias Villani and David Broman",
      title = "Parallelizing LDA using Partially Collapsed Gibbs Sampling",
      note = "http://github.com/lejon/PartiallyCollapsedLDA",
      year = 2015}

The toolkit is Open Source Software, released under the Common Public License. You are welcome to use the code under the terms of the license for research or commercial purposes; however, please acknowledge its use with a citation: Magnusson, Jonsson, Villani, Broman. "Parallelizing LDA using Partially Collapsed Gibbs Sampling."

The dataset (from the datasets folder) and the stopwords file (stopwords.txt, included in the repository) should be in the folder from which you run the sampler.
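A quick way to confirm that the working directory is set up as described is a small pre-flight check. The sketch below is not part of the toolkit: the class name PreflightCheck and its missing helper are hypothetical, and the only file name taken from this README is stopwords.txt. Dataset file names are whatever you pass as arguments.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PCPLDA: checks that required files
// are present in the directory the sampler will be started from.
public class PreflightCheck {

    /** Returns the entries of `required` that do not exist inside `workDir`. */
    static List<String> missing(Path workDir, String... required) {
        List<String> absent = new ArrayList<>();
        for (String name : required) {
            if (!Files.exists(workDir.resolve(name))) {
                absent.add(name);
            }
        }
        return absent;
    }

    public static void main(String[] args) {
        // The directory the JVM was started from, i.e. where the sampler would run.
        Path workDir = Paths.get("").toAbsolutePath();

        // stopwords.txt is named in the README; dataset file names are
        // taken from the command line (none are assumed here).
        List<String> required = new ArrayList<>();
        required.add("stopwords.txt");
        for (String arg : args) {
            required.add(arg);
        }

        List<String> absent = missing(workDir, required.toArray(new String[0]));
        if (absent.isEmpty()) {
            System.out.println("All required files found in " + workDir);
        } else {
            System.out.println("Missing from " + workDir + ": " + absent);
        }
    }
}
```

Run it from the same folder you intend to run the sampler from, passing the dataset file name as an argument.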

Example Run command:

java -cp PCPLDA-X.X.X.jar cc.mallet.topics.tui.ParallelLDA --run_cfg=src/main/resources/configuration/PLDAConfig.cfg
java -jar PCPLDA-X.X.X.jar --run_cfg=src/main/resources/configuration/PLDAConfig.cfg

(PCPLDA-X.X.X.jar is created in the 'target' folder by the 'mvn package' command)

For very large datasets you might need to raise the JVM's maximum heap size, for example by adding the -Xmx60g flag to the java command.
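To verify that a heap flag actually took effect, a tiny standalone program can print the limit the JVM reports. This is a minimal sketch, assuming nothing about the toolkit itself: the class name HeapLimit is made up for illustration, and the only API used is the standard Runtime.maxMemory().

```java
// Hypothetical standalone check, not part of PCPLDA.
public class HeapLimit {

    /** Converts a byte count to gibibytes. */
    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // Maximum amount of memory the JVM will attempt to use for the heap.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.printf("Max heap: %.1f GiB%n", toGiB(maxBytes));
    }
}
```

Compile it with javac HeapLimit.java and run it with the same flag you plan to give the sampler, for example java -Xmx60g HeapLimit. The reported value should be roughly the requested size; the exact figure can differ slightly between JVMs.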

Please remember that this is a research prototype and the standard disclaimers apply. You will see printouts during unit tests, commented-out code, old code that has not yet been cleaned out, etc.

That said, the basic sampler has been tested and evaluated in a scientific manner, and we have gone to great pains to ensure that it is correct. The sampler referred to in the article as "PC sampler" or "PC-LDA" corresponds to the class 'cc.mallet.topics.SpaliasUncollapsedParallelLDA' in the code. The variable selection parts are implemented in the 'cc.mallet.topics.NZVSSpaliasUncollapsedParallelLDA' class.

An example of a "main" class is cc.mallet.topics.tui.ParallelLDA.

Installation

  1. Install Apache Maven, then run the following in bash:

mvn package

Occasionally some of the "probabilistic" tests fail due to random chance. This is acceptable in a statistical sense but not for a test suite, so the tests should eventually be tuned. For now, re-running the suite should resolve it.
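One common way to tune such tests is to seed the random number generator and choose a tolerance well outside the expected sampling noise. The sketch below only illustrates that idea and is not taken from the project's test suite: the class SeededFrequencyCheck, the distribution, the seed and the tolerance are all made up for the example.

```java
import java.util.Random;

// Illustrative only, not part of PCPLDA: a statistical check that is
// reproducible because the random number generator is seeded.
public class SeededFrequencyCheck {

    /** Draws n samples from the categorical distribution `probs`; returns observed frequencies. */
    static double[] empiricalFrequencies(double[] probs, int n, Random rng) {
        int[] counts = new int[probs.length];
        for (int i = 0; i < n; i++) {
            double u = rng.nextDouble();
            // Walk the cumulative distribution; the last index absorbs rounding leftovers.
            int k = 0;
            double cumulative = probs[0];
            while (k < probs.length - 1 && u >= cumulative) {
                k++;
                cumulative += probs[k];
            }
            counts[k]++;
        }
        double[] freqs = new double[probs.length];
        for (int k = 0; k < probs.length; k++) {
            freqs[k] = counts[k] / (double) n;
        }
        return freqs;
    }

    /** Largest absolute difference between observed and expected values. */
    static double maxAbsError(double[] observed, double[] expected) {
        double worst = 0.0;
        for (int k = 0; k < observed.length; k++) {
            worst = Math.max(worst, Math.abs(observed[k] - expected[k]));
        }
        return worst;
    }

    public static void main(String[] args) {
        double[] probs = {0.2, 0.3, 0.5}; // arbitrary example distribution
        long seed = 42L;                  // fixed seed: every run draws the same samples
        double tolerance = 0.02;          // far wider than the noise at this sample size

        double[] freqs = empiricalFrequencies(probs, 100_000, new Random(seed));
        double err = maxAbsError(freqs, probs);
        System.out.println("max abs error = " + err + ", within tolerance: " + (err < tolerance));
    }
}
```

With a fixed seed the check gives the same verdict on every run, at the cost of exercising only one random sequence. Whether that trade-off suits this project's tests is a judgment call for the maintainers.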

Example run using binary (the release JAR)

java -cp PCPLDA-4.7.3.jar cc.mallet.topics.tui.ParallelLDA --run_cfg=src/main/resources/configuration/PLDAConfig.cfg

