4lang

This repository provides

  • the 4lang concept dictionary, which contains manually written concept definitions
  • the text_to_4lang module, which creates concept graph representations from running text
  • the dict_to_4lang module, which builds more of these definitions from human-readable dictionaries

Dependencies

pymachine

Our tools require the pymachine implementation of Eilenberg-machines. To install it, clone the repository and run setup.py:

git clone https://github.com/kornai/pymachine.git
cd pymachine
python setup.py install

hunmorph

For lemmatization, 4lang uses the hunmorph tool. On most UNIX-based systems you can use these pre-compiled executables and models (just extract them in your 4lang directory). If they don't work on your system, you may have to download and recompile hunmorph and/or the model it uses, following the instructions here. This process is quite error-prone, so please reach out to us if you run into problems and we'll be happy to help!

NOTE: All remaining dependencies are required only for building 4lang graphs. If you only want to use the graphs we provide (e.g. for the machine similarity component of our Semeval STS system), you can skip the rest of this section and continue with "Downloading pre-compiled graphs".

Stanford Parser, CoreNLP, jython

For parsing dictionary definitions, 4lang requires the Stanford Dependency Parser. Additionally, text_to_4lang.py requires the Stanford CoreNLP toolkit for parsing and coreference resolution, while the dict_to_4lang tool requires jython for customized parsing via the Stanford Parser API. Both tools require a copy of the RNN-based parser model for English, which is distributed alongside the Stanford Parser.

After downloading and installing these tools, all you need to do is edit the stanford and corenlp sections of the default configuration file conf/default.cfg so that the relevant fields point to your installations of each tool and your copy of the englishRNN.ser.gz model (more on config files below).
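Since a mistyped path in the config file tends to surface only once a parser fails to start, it can help to check the edited file first. The following is a minimal sketch, not part of 4lang: it assumes the config file can be read by Python 3's configparser, and it uses a simple heuristic (any value containing a path separator is treated as a path). The function name find_missing_paths is hypothetical.

```python
import configparser
import os


def find_missing_paths(cfg_path, sections=("stanford", "corenlp")):
    """Return (section, option, value) triples whose value looks like a
    filesystem path but does not exist on this machine."""
    parser = configparser.ConfigParser()
    # read() returns the list of files it managed to parse
    if not parser.read(cfg_path):
        raise FileNotFoundError(cfg_path)
    missing = []
    for section in sections:
        if not parser.has_section(section):
            continue
        for option, value in parser.items(section):
            # Heuristic (an assumption): only values containing a path
            # separator are treated as paths; everything else is skipped.
            if os.sep not in value:
                continue
            if not os.path.exists(os.path.expanduser(value)):
                missing.append((section, option, value))
    return missing
```

Calling find_missing_paths("conf/default.cfg") from your 4lang directory should return an empty list once every path-like field in the stanford and corenlp sections points to something that exists.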

Downloading pre-compiled graphs

We provide serialized machine graphs built from 4lang definitions as well as from the English Wiktionary (using the dict_to_4lang module). Unpacking this archive in your 4lang directory will place them in the data/machines directory, which is the default location for compiled machine graphs.

Usage

Semeval STS

To use 4lang from our Semeval STS system, edit the 4langpath and hunmorph_path attributes in your semeval config file so that they point to your 4lang directory and the downloaded hunmorph binaries, respectively.

Dict_to_4lang and Text_to_4lang

To run each module on small test datasets, simply run:

python src/dict_to_4lang.py
python src/text_to_4lang.py

Both tools can be configured by editing a copy of conf/default.cfg and passing it as an argument. Run

python src/dict_to_4lang.py MY_CONFIG_FILE

to build 4lang-style definitions from a monolingual dictionary such as Wiktionary or Longman, and

cat INPUT_FILE | python src/text_to_4lang.py MY_CONFIG_FILE

to create concept graphs from running English text.
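If you would rather drive text_to_4lang.py from another Python program than from the shell, the pipe above can be reproduced with the standard subprocess module. This is only a sketch: the wrapper name run_text_to_4lang and its parameters are hypothetical, and it makes no assumption about where the tool writes its graphs; it simply returns whatever the script prints to standard output.

```python
import subprocess


def run_text_to_4lang(text, config_file, script="src/text_to_4lang.py",
                      python="python"):
    """Pipe `text` into the script's stdin, mirroring
    `cat INPUT_FILE | python src/text_to_4lang.py MY_CONFIG_FILE`,
    and return whatever the script writes to stdout."""
    result = subprocess.run(
        [python, script, config_file],
        input=text,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # exchange str rather than bytes
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

The python parameter lets you point at the interpreter in which pymachine is installed, which may differ from the one running the calling program.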

The config file

Contact

This repository is maintained by Gábor Recski. Questions, suggestions, bug reports, etc. are very welcome and can be sent by email to recski at mokk bme hu.

Publications

If you use the 4lang module, please cite:

@unpublished{Kornai:2015a,
author = "Andr\'as Kornai and Judit \'Acs and M\'arton Makrai and D\'avid Nemeskey and Katalin Pajkossy and G\'abor Recski",
title = "Competence in Lexical Semantics",
year = 2015,
note = "{T}o appear in Proc. *SEM-2015"
}
