SENNA LuaJIT Interface

Disclaimer: while this glue code is provided under a BSD license, SENNA is not. Please refer to the SENNA license.

This interface supports part-of-speech tagging, chunking, named entity recognition, and semantic role labeling.

Installation

Because SENNA is distributed under its own license, it is not included in this repository. Follow these steps to install the SENNA LuaJIT interface:

  • Clone the SENNA LuaJIT interface:
git clone https://github.com/torch/senna.git
  • Go into the created directory:
cd senna
  • Get SENNA. You must accept the license to proceed further.

  • Unpack the SENNA archive into the cloned git directory.

  • Run luarocks:

luarocks make rocks/senna-scm-1.rockspec

Running

An example program, senna.run, is provided. It reads text from stdin and writes the tags to stdout. Typical usage:

luajit -lsenna.run < file_to_tag.txt > tags.txt

Typical output:

echo "The Dow Jones industrials closed at 2569.26 ." | luajit -lsenna.run
                 The        DT      (NP*         *                   -      (A1*
                 Dow       NNP         *    (MISC*                   -         *
               Jones       NNP         *        *)                   -         *
         industrials       NNS        *)         *                   -        *)
              closed       VBD     (VP*)         *              closed      (V*)
                  at        IN     (PP*)         *                   -  (AM-EXT*
             2569.26        CD     (NP*)         *                   -        *)
                   .         .         *         *                   -         *

To use the interface in your own LuaJIT code, see the example usage file (run.lua). It gives a good overview of how things work.

Interface Description

The LuaJIT interface provides several objects encapsulating SENNA's tools.

Hash

SENNA's Hash.

senna.Hash(path, filename[, admissible_keys_filename])

Loads the hash stored in the file filename, located in the directory path. If admissible_keys_filename is given, the hash is created with admissible keys (needed for NER).

Hash:index(key)

Returns the index of the given string key.

Hash:key(idx)

Returns the string at the given index idx (a number).

Hash:size()

Returns the number of (key, value) pairs stored in the hash.

Hash:IOBES2IOB()

Transforms the IOBES hash values (strings) into IOB format.

Hash:IOBES2BRK()

Transforms the IOBES hash values (strings) into bracket format.
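To make the three tag formats concrete, here is a minimal pure-Lua sketch of the per-tag mapping. It does not call SENNA, and the function names iobes_to_iob and iobes_to_brk are hypothetical. The bracket mapping is inferred from the sample output shown earlier (a multi-word chunk appears as (NP* ... *) and a single-word chunk as (VP*)); the IOB mapping assumes the common convention in which B- opens every chunk. The actual conversion inside SENNA may differ in details.

```lua
-- Hypothetical helpers, not part of the senna module.

-- IOBES -> IOB: "E-" (end) becomes "I-", "S-" (single) becomes "B-".
local function iobes_to_iob(tag)
  local prefix, label = tag:match('^([BIES])%-(.+)$')
  if not prefix then return tag end          -- "O" and unknown tags pass through
  if prefix == 'E' then prefix = 'I'
  elseif prefix == 'S' then prefix = 'B' end
  return prefix .. '-' .. label
end

-- IOBES -> bracket: B opens a chunk, E closes it, S does both.
local function iobes_to_brk(tag)
  local prefix, label = tag:match('^([BIES])%-(.+)$')
  if not prefix then return '*' end          -- outside any chunk
  if prefix == 'B' then return '(' .. label .. '*'
  elseif prefix == 'I' then return '*'
  elseif prefix == 'E' then return '*)'
  else return '(' .. label .. '*)' end       -- prefix == 'S'
end

-- Chunk tags for "The Dow Jones industrials closed", written in IOBES.
for _, tag in ipairs({ 'B-NP', 'I-NP', 'I-NP', 'E-NP', 'S-VP' }) do
  print(tag, iobes_to_iob(tag), iobes_to_brk(tag))
end
```

In normal use the conversion is selected through the hashtype argument of the analyzers rather than by calling such helpers directly.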

Tokens

Encapsulates the tokens returned by the Tokenizer. Instances are created only by the tokenizer.

Tokens:words()

Returns a table containing the tokenized word strings.

Tokenizer

Encapsulates SENNA's tokenizer.

senna.Tokenizer([is_tokenized])

Creates a new tokenizer. The tokenizer tokenizes text and creates any features required by SENNA subroutines. If is_tokenized is true, the tokenizer assumes the input is already tokenized, with words separated by spaces.

Tokenizer:tokenize(sentence)

Tokenizes the given string. Returns Tokens.

Important note: because the Tokenizer retains internal state, it is not possible to tokenize and process several sentences at the same time. Keep this in mind when calling the analysis tools.

Part Of Speech

SENNA's Part-of-speech (POS) module.

senna.POS()

Creates a POS analyzer.

POS:forward(tokens)

Returns a table containing the POS tags computed on the given tokens (which must come from the Tokenizer module).

Chunking

SENNA's chunking (shallow parsing) module.

senna.CHK([hashtype])

Creates a chunking analyzer. The optional hashtype argument indicates the format of the generated tags. By default it will be IOBES. Other options are IOB or BRK (for bracketing tags).

CHK:forward(tokens, pos_tags)

Returns a table containing the chunking tags computed on the given tokens (which must come from the Tokenizer module) and POS tags (which must come from the POS module).
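The following is a minimal sketch of how the tokenizer, POS, and chunking modules might be chained, handling one sentence at a time as the note in the Tokenizer section requires. The helper name analyze is hypothetical. The sketch assumes that require 'senna' returns the module table, that hashtype is passed as a string such as 'BRK', and that the returned tag tables are arrays aligned with Tokens:words(). None of this is confirmed here; run.lua remains the authoritative example.

```lua
-- Hypothetical helper: analyze one sentence with already-built analyzers.
-- The results are copied into a plain Lua table before returning, so the
-- caller does not depend on the tokenizer's internal state once the next
-- sentence is tokenized.
local function analyze(tokenizer, pos, chk, sentence)
  local tokens = tokenizer:tokenize(sentence)
  local words = tokens:words()
  local pos_tags = pos:forward(tokens)
  local chk_tags = chk:forward(tokens, pos_tags)
  local rows = {}
  for i, word in ipairs(words) do
    rows[i] = { word = word, pos = pos_tags[i], chk = chk_tags[i] }
  end
  return rows
end

-- This part runs only if the senna module is installed.
local ok, senna = pcall(require, 'senna')
if ok and type(senna) == 'table' then
  local tokenizer = senna.Tokenizer()
  local pos = senna.POS()
  local chk = senna.CHK('BRK')  -- assumption: hashtype given as a string
  local sentences = { 'The Dow Jones industrials closed at 2569.26 .' }
  for _, sentence in ipairs(sentences) do
    -- Finish one sentence completely before tokenizing the next.
    for _, row in ipairs(analyze(tokenizer, pos, chk, sentence)) do
      print(row.word, row.pos, row.chk)
    end
  end
end
```

Passing the analyzers in as arguments keeps their construction outside the per-sentence loop, so each one is created only once.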

Named Entity Recognition

SENNA's named entity recognition (NER) module.

senna.NER([hashtype])

Creates an NER analyzer. The optional hashtype argument indicates the format of the generated tags. By default it will be IOBES. Other options are IOB or BRK (for bracketing tags).

NER:forward(tokens)

Returns a table containing the NER tags computed on the given tokens (which must come from the Tokenizer module).

Semantic Role Labeling

SENNA's semantic role labeling (SRL) module.

senna.SRL([hashtype],[verbtype])

Creates an SRL analyzer. The optional hashtype argument indicates the format of the generated tags. By default it will be IOBES. Other options are IOB or BRK (for bracketing tags).

The optional verbtype argument indicates how verbs are found. The default is VBS, SENNA's own method of finding verbs. Use POS to take the verbs from the POS tags, or USR to supply the verbs yourself.

SRL:forward(tokens, pos_labels[, usr_verb_labels])

Returns a table of tables of SRL tags, computed on the given tokens (which must come from the Tokenizer module) and POS tags (which must come from the POS module).

Each inner table corresponds to one detected or provided verb and contains a tag for each word in the sentence.

The returned table also contains a verb field, which is a table of booleans. A value of true means the corresponding word was treated as a verb.
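As an illustration of this structure, the sketch below walks a result table and builds one tab-separated row per word: the word, a verb marker, and one SRL column per verb, similar to the last columns of the sample output. The function name srl_rows is hypothetical, and the sketch assumes that the per-verb tables sit at integer keys 1..n, so that ipairs visits them and skips the verb field. The data at the bottom is hand-written from the sample output, not produced by SENNA.

```lua
-- Hypothetical helper, not part of the senna module.
-- words:    array of word strings (as returned by Tokens:words())
-- srl_tags: table of per-verb tag arrays, plus a boolean array in .verb
local function srl_rows(words, srl_tags)
  local rows = {}
  for i, word in ipairs(words) do
    -- Second column repeats the word when it was treated as a verb.
    local cols = { word, srl_tags.verb[i] and word or '-' }
    for _, verb_tags in ipairs(srl_tags) do  -- one column per verb
      cols[#cols + 1] = verb_tags[i]
    end
    rows[i] = table.concat(cols, '\t')
  end
  return rows
end

-- Hand-written stand-in for a BRK-format SRL result.
local words = { 'The', 'Dow', 'Jones', 'industrials', 'closed', 'at', '2569.26', '.' }
local srl_tags = {
  { '(A1*', '*', '*', '*)', '(V*)', '(AM-EXT*', '*)', '*' },
  verb = { false, false, false, false, true, false, false, false },
}
for _, row in ipairs(srl_rows(words, srl_tags)) do
  print(row)
end
```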

If USR was passed as verbtype when the module was created, the caller must also mark which words are verbs in usr_verb_labels. This must be a list of booleans with one entry per token in the sentence. A value of true means the corresponding word will be treated as a verb.
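One possible way to build such a list is sketched below in plain Lua. The function name verb_mask is hypothetical; it marks every word that appears in a caller-supplied list of verbs. The commented call at the end assumes that verbtype is passed as the string 'USR', which is not confirmed here.

```lua
-- Hypothetical helper, not part of the senna module.
-- Returns a list of booleans, one per word, true where the word is in `verbs`.
local function verb_mask(words, verbs)
  local is_verb = {}
  for _, v in ipairs(verbs) do is_verb[v] = true end
  local mask = {}
  for i, w in ipairs(words) do
    mask[i] = is_verb[w] == true   -- always a boolean, never nil
  end
  return mask
end

local words = { 'The', 'Dow', 'Jones', 'industrials', 'closed', 'at', '2569.26', '.' }
local mask = verb_mask(words, { 'closed' })
for i, w in ipairs(words) do print(w, mask[i]) end

-- With senna installed, the mask would be passed as the third argument:
--   local srl = senna.SRL('BRK', 'USR')
--   local srl_tags = srl:forward(tokens, pos_tags, mask)
```

Storing an explicit false for every non-verb keeps the list the same length as the sentence, which is what the description above requires.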

Additional available functions

senna.verbose(flag)

Sets SENNA's verbose mode to flag (true or false).
