uSIF

This is an implementation of unsupervised smoothed inverse frequency (uSIF), a simple but effective way to create sentence embeddings without any labelled data (Best Paper, Repl4NLP @ ACL 2018). See the paper for more details.

*01/11/18: The code now runs on Python 3 instead of Python 2.

Setup

  1. Unzip the pre-trained ParaNMT word vectors (thanks to John Wieting for providing these).
  2. Install the python packages in requirements.txt.
  3. Initialize a uSIF embedding model with usif.py. Call get_paranmt_usif to get the model that uses the ParaNMT vectors, then call test_STS to check that you get the expected results. Once you know it's working, feel free to try it with other word vectors.
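
Step 3 could look roughly like the sketch below. It rests on assumptions that are not confirmed here: that usif.py is importable from the working directory, that get_paranmt_usif takes no arguments, and that test_STS accepts the model. run_sts_check is a hypothetical helper name, so check usif.py for the actual signatures before relying on it.

```python
def run_sts_check():
    """Build the ParaNMT-backed model and run the STS check from step 3."""
    # Imported lazily so this snippet loads even when usif.py is not on the path.
    from usif import get_paranmt_usif, test_STS

    # Assumption: no arguments are needed once the vectors from step 1 are unzipped.
    model = get_paranmt_usif()

    # Assumption: test_STS takes the model and reports the STS results.
    test_STS(model)
    return model
```

Calling run_sts_check() from the repository root, after steps 1 and 2, should then build the model and run the STS check.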

Embedding Individual Sentences

If you don't have a sizable list of related sentences to embed, there is little point in doing piecewise common component removal, so you can set m = 0 when initializing uSIF. Even for STS tasks, setting m = 0 only decreases performance by 1-4%.
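
To see why m = 0 suits a lone sentence, here is a minimal NumPy sketch of common component removal. It is an illustration under stated assumptions, not the code in usif.py: the function name remove_common_components is hypothetical, the components are taken to be the top m right singular vectors of the batch of sentence embeddings, and each projection is weighted by that component's share of the squared singular values.

```python
import numpy as np


def remove_common_components(embeddings, m):
    """Subtract weighted projections onto the top-m singular vectors of the batch.

    Hypothetical helper for illustration; with m <= 0 the input is returned as-is.
    """
    X = np.asarray(embeddings, dtype=float)
    if m <= 0:
        return X.copy()

    # Right singular vectors describe directions shared across the batch.
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    k = min(m, len(s))
    s, vt = s[:k], vt[:k]

    total = (s ** 2).sum()
    if total == 0.0:
        return X.copy()

    # Assumption: each component's weight is its share of squared singular values.
    weights = (s ** 2) / total
    for weight, component in zip(weights, vt):
        # Remove a weighted projection of every row onto this component.
        X = X - weight * np.outer(X @ component, component)
    return X


single = np.array([[1.0, 2.0, 2.0]])
kept = remove_common_components(single, m=0)   # unchanged copy of the input
wiped = remove_common_components(single, m=1)  # approximately all zeros
```

With only one sentence in the batch, the single "common" direction is the sentence embedding itself, so removing it leaves almost nothing. That is the degenerate case that m = 0 avoids.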

Reference

If you use this code, please cite

@article{ethayarajh2018unsupervised,
  title={Unsupervised Random Walk Sentence Embeddings: A Strong but Simple Baseline},
  author={Ethayarajh, Kawin},
  journal={ACL 2018},
  pages={91},
  year={2018}
}
