
timur's Introduction

Finite-state morphology for German

Join the chat at https://gitter.im/timur-morph/community

This package started as a migration of a set of finite-state grammars for the morphological analysis of German words from SFST, a finite-state transducer (FST) toolkit by Helmut Schmid with which the grammars are delivered, to Pynini, another FST toolkit. Pynini has the advantage of being implemented as a Python library, which allows for seamless interaction with many other useful Python packages. Since then, a number of morphological operations have been added and some analysis strategies adjusted in comparison to the original rule set.

Installation

timur is implemented in Python 3. In the following, we assume a working Python 3 installation (tested with versions 3.5 and 3.6) as well as a working C++ compiler supporting C++11.

OpenFST

The underlying FST toolkit Pynini is itself based on OpenFST, a C++ library for constructing, combining, optimizing, and searching weighted FSTs. Get the latest version of OpenFST that works with the current version of Pynini (finding a working combination can be a little tricky since Pynini is usually a bit behind OpenFST; comparing the release dates helps), unpack the archive, then build and install via

$ ./configure --enable-grm
$ make
$ [sudo] make install && [sudo] ldconfig

re2

TODO

virtualenv

Using virtualenv is highly recommended, although not strictly necessary for installing timur. It may be installed via:

$ [sudo] pip install virtualenv

Create a virtual environment in a subdirectory of your choice (e.g. env) using

$ virtualenv -p python3 env

and activate it.

$ . env/bin/activate

Python requirements

timur uses various third-party Python packages (including Pynini), which are best installed using pip:

(env) $ pip install -r requirements.txt

Finally, timur itself can be installed via pip:

(env) $ pip install .
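
After the steps above, a quick sanity check of the environment can save some debugging time. The following is a minimal sketch, not part of timur: the helper name check_environment is hypothetical, the minimum version (3, 5) is taken from the tested versions mentioned above, and pynini is assumed to be the import name of the Pynini package.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 5), packages=("pynini",)):
    """Return a list of problems found; an empty list means all checks passed.

    Hypothetical helper for illustration only, not shipped with timur.
    """
    problems = []
    # Compare the running interpreter against the minimum tested version.
    if sys.version_info < min_version:
        problems.append(
            "Python {}.{} or newer expected".format(*min_version)
        )
    # find_spec returns None when a top-level package cannot be located.
    for name in packages:
        if importlib.util.find_spec(name) is None:
            problems.append("package not importable: " + name)
    return problems
```

Calling check_environment() inside the activated virtual environment should return an empty list if Pynini was installed successfully; otherwise each entry names one thing to fix.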

timur's People

Contributors

gitter-badger, wrznr


Forkers

gitter-badger

timur's Issues

Add a morpheme class for formatives

Words like Himbeere are clearly morphologically complex although their first part is not a word. To mark the non-word part in such constructions, a special morpheme category formative (<FT>) should be introduced.

Distinguish the different German prefix types

Right now, only <PREF> is available as prefix marker. We want at least

  1. Verbal prefixes
    a. separable (i.e. particle)
    b. non-separable (i.e. prefix)
  2. Inflectional prefix (i.e. ge in past participle)
  3. Adjective and noun prefixes
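
The proposed distinction could be sketched as a small enumeration. This is only an illustration of the taxonomy listed above: the class name, the member names, and especially the tag strings are hypothetical placeholders, not symbols that timur defines. The example prefixes are ordinary German cases (ausgehen, zerbrechen, gemacht, unklar).

```python
from enum import Enum


class PrefixType(Enum):
    """Hypothetical prefix categories; the tag strings are placeholders."""

    VERB_SEPARABLE = "<VPART>"      # 1a. separable verbal prefix (particle)
    VERB_NONSEPARABLE = "<VPREF>"   # 1b. non-separable verbal prefix
    INFLECTIONAL = "<INFLPREF>"     # 2. ge in the past participle
    NOMINAL = "<NPREF>"             # 3. adjective and noun prefixes


# Illustrative assignments for a few common prefixes.
EXAMPLES = {
    "aus": PrefixType.VERB_SEPARABLE,     # ausgehen -> geht aus
    "zer": PrefixType.VERB_NONSEPARABLE,  # zerbrechen -> zerbricht
    "ge": PrefixType.INFLECTIONAL,        # gemacht
    "un": PrefixType.NOMINAL,             # unklar, Unglück
}
```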

Add proper testing and deployment infrastructure

At the moment, only a few trivial tests exist. In the long run, we want continuous integration and 100 % test coverage. @kba I'd be really thankful if you could give me a hand in setting up the necessary 3rd-party tools.

Add stem type filter for prefixes

Right now, productive prefixes can be added to manually prefixed words:

{"result": "zer<PREF>:<epsilon>aus<PREF>:<epsilon><epsilon>:g<epsilon>:eW:wert<NN>:<epsilon><CONV>:<epsilon>e:<epsilon>n:<epsilon><epsilon>:e<epsilon>:t<+V>:<epsilon><PPast>:<epsilon>", "query": "zerausgewertet"}

This could be prevented by applying a stem type filter to prefixing.
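
To make the output above easier to read, the symbol pairs can be split into their two sides. The sketch below is an illustration and not part of timur: the function name split_sides is hypothetical, and it assumes that the left side of each a:b pair belongs to the analysis level and the right side to the surface level (which is consistent with the right side spelling out the query), and that a symbol written without a colon maps to itself. Note that counting <PREF> afterwards is only a post-hoc check; the filter proposed here would have to be applied inside the transducer.

```python
import json
import re

# One symbol: a multi-character tag such as <PREF>, or any single
# character other than the pair separator ':'.
SYMBOL = r"(?:<[^<>]+>|[^:])"
PAIR = re.compile(r"({0})(?::({0}))?".format(SYMBOL))
EPSILON = "<epsilon>"


def split_sides(result):
    """Split a pair string into (left side, right side), dropping epsilons."""
    left, right = [], []
    for match in PAIR.finditer(result):
        first, second = match.group(1), match.group(2)
        if second is None:
            # A symbol without ':' stands for an identity pair.
            second = first
        if first != EPSILON:
            left.append(first)
        if second != EPSILON:
            right.append(second)
    return "".join(left), "".join(right)


line = (
    '{"result": "zer<PREF>:<epsilon>aus<PREF>:<epsilon><epsilon>:g'
    '<epsilon>:eW:wert<NN>:<epsilon><CONV>:<epsilon>e:<epsilon>'
    'n:<epsilon><epsilon>:e<epsilon>:t<+V>:<epsilon><PPast>:<epsilon>", '
    '"query": "zerausgewertet"}'
)
record = json.loads(line)
analysis, surface = split_sides(record["result"])
prefix_count = analysis.count("<PREF>")

print(analysis)      # zer<PREF>aus<PREF>Wert<NN><CONV>en<+V><PPast>
print(surface)       # zerausgewertet
print(prefix_count)  # 2
```

The two <PREF> markers in the analysis make the problem visible: zer has been attached on top of the already prefixed stem.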
