Giter Site home page Giter Site logo

distributed_skipgram_mixture's Introduction

Distributed Multisense Word Embedding

The Distributed Multisense Word Embedding(DMWE) tool is a parallelization of the Skip-Gram Mixture [1] algorithm on top of the DMTK parameter server. It provides an efficient "scaling to industry size" solution for multi sense word embedding.

For more details, please view our website http://www.dmtk.io

Download

$ git clone https://github.com/Microsoft/distributed_skipgram_mixture

Build

Prerequisite

DMWE is built on top of the DMTK parameter sever, therefore please download and build DMTK first (https://github.com/Microsoft/multiverso).

For Windows

Open windows\distributed_skipgram_mixture\distributed_skipgram_mixture.sln using Visual Studio 2013. Add the necessary include path (for example, the path for DMTK multiverso) and lib path. Then build the solution.

For Ubuntu (Tested on Ubuntu 12.04)

Download and build by running $ sh build.sh. Modify the include and lib path in Makefile. Then run $ make all -j4.

Run

For parameter settings, see scripts/parameters_settings.txt. For running it, see the example script scripts/run.py.

Reference

[1] Tian, F., Dai, H., Bian, J., Gao, B., Zhang, R., Chen, E., & Liu, T. Y. (2014). A probabilistic model for learning multi-prototype word embeddings. In Proceedings of COLING (pp. 151-160).

Microsoft Open Source Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

distributed_skipgram_mixture's People

Contributors

feiga avatar makrai avatar msftgits avatar projdler avatar ustctf avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

distributed_skipgram_mixture's Issues

CMake Error

cmake .. && make && sudo make install
OpenMP found
/usr/lib/openmpi/lib/libmpi_cxx.so/usr/lib/openmpi/lib/libmpi.so
/usr/lib/openmpi/lib/libmpi_cxx.so/usr/lib/openmpi/lib/libmpi.so
CMake Warning at /usr/share/cmake-3.5/Modules/FindBoost.cmake:725 (message):
Imported targets not available for Boost version
Call Stack (most recent call first):
/usr/share/cmake-3.5/Modules/FindBoost.cmake:763 (_Boost_COMPONENT_DEPENDENCIES)
/usr/share/cmake-3.5/Modules/FindBoost.cmake:1332 (_Boost_MISSING_DEPENDENCIES)
Test/unittests/CMakeLists.txt:3 (find_package)

CMake Error at /usr/share/cmake-3.5/Modules/FindBoost.cmake:1677 (message):
Unable to find the requested Boost libraries.

Unable to find the Boost header files. Please set BOOST_ROOT to the root
directory containing Boost or BOOST_INCLUDEDIR to the directory containing
Boost's headers.
Call Stack (most recent call first):
Test/unittests/CMakeLists.txt:3 (find_package)

CMake Error at Test/unittests/CMakeLists.txt:9 (MESSAGE):
message called with incorrect number of arguments

Boost_INCLUDE_DIR-NOTFOUND
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
/clouddisk/shaolin/DMTK/multiverso/Test/unittests/Boost_INCLUDE_DIR
used as include directory in directory /clouddisk/shaolin/DMTK/multiverso/Test/unittests

-- Configuring incomplete, errors occurred!
See also "/clouddisk/shaolin/DMTK/multiverso/build/CMakeFiles/CMakeOutput.log".

are there pre-trainned models?

can you post your pretrainned models? or are there any available?

it takes a while to train them so it would be nice. if storage is the issue, torrent is fine.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.