Giter Site home page Giter Site logo

xiliangsong / defvectors Goto Github PK

View Code? Open in Web Editor NEW

This project forked from jgc128/defvectors

0.0 2.0 0.0 3.04 MB

A tool for semantic relation extraction. The program finds pairs of semantically related words based on the text definitions coming from the Wikipedia articles (other texts may be also used). The extraction method implemented in this system is based on three similarity measures (cosine, gloss overlap, and Karaulov's measure) between texts and two nearest-neighbor algorithms (KNN and Mutual KNN). The tool is a cross-platform console application.

Makefile 0.68% Python 61.72% C++ 33.13% C 3.32% Batchfile 0.57% Shell 0.57%

defvectors's Introduction

Serelex

Name

Serelex - a tool for semantic relation extraction.

Synopsis

serelex [options]

Description

A tool for semantic relation extraction. The program finds pairs of semantically related words based on the text definitions coming from the Wikipedia articles (other texts may be also used). The extraction method implemented in this system is based on three similarity measures (cosine, gloss overlap, and Karaulov's measure) between texts and two nearest-neighbor algorithms (KNN and Mutual KNN). The tool is a cross-platform console application.

Licensed under LGPLv3.

Options

-c file
Concepts file, default concepts.csv. A text file containing a set of input words (one word per line). The program will try to find semantic relations between these input words. For instance, if words 'crocodile, alligator, house, and building' were given as input the program will try to return pairs 'crocodile,alligator' and 'house,building' among all possible combinations.

-d file
Definitions file, default definitions.csv. A text file containing a set of definitions for words specified in the concepts.csv file. If this file contains definitions for more words that given in concepts.csv the program will skip the words which do not appear in the concepts.csv. The sample-data directory contains files with "definitions" derived from the introduction of Wikipedia articles. Other texts may be used as definitions (traditional dictionary glosses, WordNet, etc.) if provided in the same format.

-s file
Stop-words file, default stoplist.csv. A text file containing stop words (one word per line). Words from this list will not be used by a similarity measure. All occurrences of these words in definitons.csv will be removed.

-o file
Output file, default result.csv. A text file containing set of found semantic relations between words specified in the file concepts.csv. Each line of this file contains a pair of semantically related words, according to the specified method.

-S o|c|k
Similarity method, default o. Semantic similarity measure used for relation extraction.
o - Gloss overlap measure equal to number of common words in the definitions of two words.
c - Cosine between bag-of-word vectors build from definitions of respective of two words
k - Karaulov's semantic similarity measure.

-M 1|2
Component analysis method, default knn. An algorithm used to derive semantic relations from pairwise similarity scores between the words.
1 - knn. Standard nearest-neighbor algorithm (KNN). Here K most similar words are related to a target word.
2 - mknn. Mutual nearest-neighbor algorithm (MKNN). Here K mutually most similar words are related to a target word.

-K NUM
Number of nearest-neighbors, default K = 2.

-T T1 T2 T3
T1, T2, T3 - parameters of Karaulov's semantic similarity measure, default T1 = 2, T1 = 1, T3 = 6.

Files and Catalogs

bin - contains binary excutable file:

  • serelex_win32.exe - excutable file for 32-bit Windows
  • serelex_win64.exe - excutable file for 64-bit Windows
  • serelex_i686 - executable file for i686 Debian-based systems
  • serelex_amd64 - executable file for amd64 Debian-based systems

docs - provides documentation in html format

sample-data - contains sample source data:

  • concepts.csv - a text file containing a set of input words (one word per line)
  • definitions.csv - a text file containing a set of definitions for words specified in the concepts.csv file
  • stoplist.csv - a text file containing stop words (one word per line)

src - the source code

windows - provides project to Microsoft Visual Studio 2010

test.sh and test.bat - run analysis for algorithm KNN, MutalKNN, with overlap and cosinus method and K = 1, 2, 5, 10

Build

To build under Windows, use MS Visual Studio 2010 project in windows folder. To select configuration - x32 or x64 - use select list on tools panel.
To build under *nix use command make. To build on amd64 use command make ARCH=-m64

defvectors's People

Contributors

jgc128 avatar pomanob avatar alexanderpanchenko avatar

Watchers

James Cloos avatar Song Xiliang avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.