dcavar / fomamwt

Foma-based multi-word tagger and morphological analyzer

Home Page: http://damir.cavar.me/

License: Apache License 2.0

CMake 20.01% C++ 79.99%
foma cpp nlp nlp-parsing natural-language-processing multiword-extraction multiword-expressions finite-state-transducer xfst

fomamwt's Introduction

Foma example code

(C) 2016-2018 by Damir Cavar

Last edited: 2018-08-06, Damir Cavar

Intro

This code example shows how a Foma-based FST can be used to process multi-word expressions that are given in a dictionary and compiled into a Finite State Transducer.

There is a default window size specified in the code. It can be altered using command line arguments.

The maximum multi-word window size can also be compiled into the Finite State Transducer and read out through the C wrapper, which avoids unnecessary lookups. Given a comprehensive list of multi-word expressions to compile into the transducer, this method is very fast and exposes the internal structural and morphosyntactic properties of multi-word expressions.
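
The windowed lookup described here can be pictured as a greedy longest-match loop. The following is only a minimal sketch under stated assumptions, not the code in this repository: `Lexicon` is a hypothetical hash-map stand-in for the compiled transducer (a real setup would query the Foma FST instead), `tag_sentence` is an illustrative name, and the tab-separated output and the `+?` marker for unknown tokens are made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the compiled transducer: surface string -> analysis.
using Lexicon = std::unordered_map<std::string, std::string>;

// At each position, try the widest window first and shrink it until a lookup
// succeeds. A token with no match at width 1 is emitted with "+?".
std::vector<std::string> tag_sentence(const std::vector<std::string>& tokens,
                                      const Lexicon& lexicon,
                                      std::size_t max_window) {
    assert(max_window >= 1);  // a window must cover at least one token
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < tokens.size()) {
        bool matched = false;
        // Never let the window run past the end of the sentence.
        std::size_t width = std::min(max_window, tokens.size() - i);
        for (; width >= 1; --width) {
            // Join the tokens in the current window with single spaces.
            std::string candidate = tokens[i];
            for (std::size_t k = 1; k < width; ++k) {
                candidate += " " + tokens[i + k];
            }
            auto hit = lexicon.find(candidate);
            if (hit != lexicon.end()) {
                out.push_back(candidate + "\t" + hit->second);
                i += width;  // skip everything the match consumed
                matched = true;
                break;
            }
        }
        if (!matched) {
            out.push_back(tokens[i] + "\t+?");
            ++i;
        }
    }
    return out;
}
```

With a window of 3, each position costs at most three lookups. Knowing the longest entry in the dictionary lets the window be set no wider than necessary, which is the saving the paragraph above refers to.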

The example implementation should be straightforward to understand. It expects its input as a file (or a stream) that contains one tokenized sentence per line. See the included test.txt file for an example.
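
Reading that input format could look roughly like the sketch below. It assumes tokens are separated by whitespace, which the text above does not state explicitly, and the function names `split_tokens` and `read_sentences` are hypothetical, not taken from the project.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Split one line into tokens on whitespace (assumed token separator).
std::vector<std::string> split_tokens(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> tokens;
    std::string tok;
    while (in >> tok) {
        tokens.push_back(tok);
    }
    return tokens;
}

// Read one sentence per line from any input stream; blank lines are skipped.
std::vector<std::vector<std::string>> read_sentences(std::istream& in) {
    std::vector<std::vector<std::string>> sentences;
    std::string line;
    while (std::getline(in, line)) {
        std::vector<std::string> tokens = split_tokens(line);
        if (!tokens.empty()) {
            sentences.push_back(std::move(tokens));
        }
    }
    return sentences;
}
```

Because `read_sentences` takes a `std::istream&`, the same function works with a `std::ifstream` opened on a file and with `std::cin`.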

Build from Code

To compile this example, you need the full Foma distribution (binaries, headers, and libraries) set up on your system. You also need a C++11-capable compiler and some additional libraries, for example the Boost libraries.

The project is a CMake project, so make sure that CMake is also installed and set up on your system.

To build the FomaMWT binary, run the following in the project folder:

cmake CMakeLists.txt

This generates the Makefile and other build files in the same folder. Then run:

make

The build should succeed if all paths and folders are set up correctly and the required libraries are found.

If you want to test the speed of the processor, run the following command:

time ./mwtagger test.txt > res.txt
