tteofili / parse2vec

NLP tool to enrich word embeddings with parse tree information and generate type, word and sentence embeddings

License: Apache License 2.0

Java 100.00%
nlp parsetree embeddings natural-language-processing machine-learning

parse2vec

Tool for generating parse tree embeddings, parse tree enriched word embeddings and parse tree enriched sentence embeddings.

How it works

Given a set of sentences (one per line) in a text file, this tool:

  • learns word embeddings using word2vec
  • builds the parse tree of each sentence
  • uses the parse tree structure to recursively average the word embeddings per PoS type across the parse trees of all sentences, so that each PoS tag ends up with its own embedding
  • to enrich a word embedding with parse tree information, for each existing word:
    • recursively sums the type embeddings of the word's ancestors (in the parse tree) and averages the result with the word's embedding
  • to generate a sentence embedding enhanced with parse tree information:
    • recursively builds the sentence vector from the parse tree enriched word embeddings, using the par2hier algorithm for building sentence vectors from hierarchical structures
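
The type embedding and word enrichment steps can be illustrated with a small sketch. This is not the tool's actual code: the `Node` class, the method names and the toy two-dimensional vectors are all hypothetical, and the sketch reflects one reading of the steps above. It assumes that a phrase-level node's vector is the average of its children's vectors, that a tag's embedding is the average over every node carrying that tag, and that a word's ancestors start at its own PoS node. The par2hier sentence step is not covered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ParseTreeEnrichmentSketch {

    /** Hypothetical parse tree node: a tag, plus a word for pre-terminal (PoS) nodes. */
    static final class Node {
        final String tag;
        final String word; // null for phrase-level nodes such as NP or VP
        final List<Node> children = new ArrayList<>();
        Node parent;

        Node(String tag, String word) {
            this.tag = tag;
            this.word = word;
        }

        Node add(Node child) {
            child.parent = this;
            children.add(child);
            return this;
        }
    }

    private final Map<String, double[]> sums = new HashMap<>();
    private final Map<String, Integer> counts = new HashMap<>();

    /** Computes a node vector bottom-up and records it under the node's tag.
     *  Assumes every phrase-level node has at least one child. */
    double[] accumulate(Node node, Map<String, double[]> wordVectors) {
        double[] vector;
        if (node.word != null) {
            vector = wordVectors.get(node.word).clone();
        } else {
            double[] total = null;
            for (Node child : node.children) {
                double[] c = accumulate(child, wordVectors);
                total = (total == null) ? c : plus(total, c);
            }
            vector = times(total, 1.0 / node.children.size());
        }
        sums.merge(node.tag, vector, ParseTreeEnrichmentSketch::plus);
        counts.merge(node.tag, 1, Integer::sum);
        return vector;
    }

    /** Averages the accumulated vectors per tag across every tree seen so far. */
    Map<String, double[]> typeEmbeddings() {
        Map<String, double[]> result = new HashMap<>();
        for (Map.Entry<String, double[]> e : sums.entrySet()) {
            result.put(e.getKey(), times(e.getValue(), 1.0 / counts.get(e.getKey())));
        }
        return result;
    }

    /** Sums the type embeddings from the word's PoS node up to the root,
     *  then averages that sum with the word vector. */
    static double[] enrich(Node leaf, Map<String, double[]> wordVectors, Map<String, double[]> types) {
        double[] word = wordVectors.get(leaf.word);
        double[] sum = new double[word.length];
        for (Node n = leaf; n != null; n = n.parent) {
            sum = plus(sum, types.get(n.tag));
        }
        return times(plus(sum, word), 0.5);
    }

    /** Element-wise sum; returns a new array. */
    static double[] plus(double[] a, double[] b) {
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] + b[i];
        }
        return out;
    }

    /** Scalar multiplication; returns a new array. */
    static double[] times(double[] a, double s) {
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] * s;
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy word vectors standing in for word2vec output.
        Map<String, double[]> words = new HashMap<>();
        words.put("dogs", new double[] {1, 0});
        words.put("bark", new double[] {0, 1});

        // (S (NP (NNS dogs)) (VP (VBP bark)))
        Node dogs = new Node("NNS", "dogs");
        Node root = new Node("S", null)
            .add(new Node("NP", null).add(dogs))
            .add(new Node("VP", null).add(new Node("VBP", "bark")));

        ParseTreeEnrichmentSketch sketch = new ParseTreeEnrichmentSketch();
        sketch.accumulate(root, words);
        System.out.println(Arrays.toString(enrich(dogs, words, sketch.typeEmbeddings())));
    }
}
```

In a real run `accumulate` would be called once per sentence tree before `typeEmbeddings()` is read, so that each tag is averaged over the whole corpus. A word that occurs in several sentences has several ancestor chains; how those are combined is left open here.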

Examples

Parse Tree Embeddings as visualized in TensorBoard

parse tree embeddings nearest neighbour sample results:

nearest(VB) = VP
nearest(JJR) = RBR
nearest(CONJP) = AUX
...
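
A `nearest(...)` lookup of this kind can be sketched as a brute-force search over a map from label to vector. The source does not say which similarity measure the tool uses; the sketch below assumes cosine similarity, and the class name, method names and toy vectors are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class NearestNeighbourSketch {

    /** Cosine similarity; returns 0 when either vector has zero norm. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        double denom = Math.sqrt(normA) * Math.sqrt(normB);
        return denom == 0 ? 0 : dot / denom;
    }

    /** Returns the key most similar to the query, excluding the query itself,
     *  or null when there is no other entry. */
    static String nearest(String query, Map<String, double[]> embeddings) {
        double[] target = embeddings.get(query);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> e : embeddings.entrySet()) {
            if (e.getKey().equals(query)) {
                continue;
            }
            double score = cosine(target, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy vectors with placeholder labels, not output of the tool.
        Map<String, double[]> embeddings = new LinkedHashMap<>();
        embeddings.put("a", new double[] {1.0, 0.1});
        embeddings.put("b", new double[] {0.9, 0.2});
        embeddings.put("c", new double[] {0.0, 1.0});
        System.out.println("nearest(a) = " + nearest("a", embeddings));
    }
}
```

The same lookup applies unchanged to type, word or sentence vectors, since only the keys of the map differ.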

Parse Tree Enriched Word Embeddings as visualized in TensorBoard

parse tree enriched word embeddings sample results:

nearest(crowd) = multiple, man, ...
nearest(hierarchical) = relationship, soft-max ...
nearest(Sutskever) = Greg, Kai, ...
...

Parse Tree Enriched Sentence Embeddings as visualized in TensorBoard

parse tree enriched sentence embeddings sample results:

nearest(In order to capture in a quantitative way the nuance necessary to distinguish man from woman ...) = 
 - In parallel in the last few years language models based on neural networks have been used to cope with complex natural language processing tasks like emotion and paraphrase detection.
 - Based on a recent work that proposed to learn a generic language model that can be modified through a set of document-specific parameters we explore use of new neural network models that are adapted to ad-hoc IR tasks.

nearest(We introduce a new dataset with human judgments on pairs of words in sentential context and ...) =
 - The result can be used to enrich lexicons of under-resourced languages to identify ambiguities and to perform clustering and classification .
 - We consider the conditional probabilities p(c|w) and given a corpus Text the goal is to set the parameters θ of p(c|w;θ) so as to maximize the corpus probability .
...
