PyDelphin
Python libraries for DELPH-IN

NOTE for previous PyDelphin users: Recent versions of PyDelphin may have backwards-incompatible changes with prior versions. Please file an issue if you have trouble upgrading.

PyDelphin is a set of Python libraries for processing DELPH-IN data. It doesn't aim to do heavy tasks like parsing or treebanking, but rather to provide Python modules for loading a variety of DELPH-IN formats, such as [incr tsdb()] profiles or Minimal Recursion Semantics representations. These modules offer a programmatic interface to the data so that developers and researchers can bootstrap their own tools without having to reinvent the wheel. PyDelphin also provides a front-end tool for tasks such as refreshing [incr tsdb()] profiles to a new schema, creating sub-profiles, or converting between MRS representations (SimpleMRS, MRS XML, DMRS, etc.).

Documentation

Documentation is available on the wiki. Help is appreciated! See here for instructions.

Usage Examples

Here's a brief example of using the itsdb library:

>>> from delphin import itsdb
>>> prof = itsdb.ItsdbProfile('~/logon/dfki/jacy/tsdb/gold/mrs')
>>> for row in prof.read_table('item'):
...     print(row.get('i-input'))
雨 が 降っ た .
[...]
>>> next(prof.read_table('result')).get('derivation')
'(utterance-root (91 utterance_rule-decl-finite -0.723739 0 4 (90 head_subj_rule -1.05796 0 4 (87 hf-complement-rule -0.50201 0 2 (86 quantify-n-rule -0.32216 0 1 (5 ame-noun 0 0 1 ("雨" 1 "\\"雨\\""))) (6 ga 0.531537 1 2 ("が" 2 "\\"が\\""))) (89 vstem-vend-rule -0.471785 2 4 (88 t-lexeme-c-stem-infl-rule 0.120963 2 3 (14 furu_1 0 2 3 ("降っ" 3 "\\"降っ\\""))) (24 ta-end -0.380719 3 4 ("た" 4 "\\"た\\""))))))'
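
The derivation comes back as a plain string in a bracketed format, so simple inspection is possible even without a tree parser. The following is a minimal sketch, not part of PyDelphin, that pulls the surface tokens out of the string above with a regular expression. The helper name `terminal_forms` is hypothetical, and the pattern assumes every terminal has the shape `("form" N "...")` seen in this example.

```python
import re

# The derivation string from the example above (a raw string, so the
# backslash-escaped quotes are kept as they appear in the value).
derivation = r'(utterance-root (91 utterance_rule-decl-finite -0.723739 0 4 (90 head_subj_rule -1.05796 0 4 (87 hf-complement-rule -0.50201 0 2 (86 quantify-n-rule -0.32216 0 1 (5 ame-noun 0 0 1 ("雨" 1 "\"雨\""))) (6 ga 0.531537 1 2 ("が" 2 "\"が\""))) (89 vstem-vend-rule -0.471785 2 4 (88 t-lexeme-c-stem-infl-rule 0.120963 2 3 (14 furu_1 0 2 3 ("降っ" 3 "\"降っ\""))) (24 ta-end -0.380719 3 4 ("た" 4 "\"た\""))))))'

# Assumption: a terminal node opens with ("form" followed by an integer.
TERMINAL = re.compile(r'\("([^"]*)"\s+\d+\s')


def terminal_forms(deriv):
    """Return the surface forms of the terminal nodes, left to right."""
    return TERMINAL.findall(deriv)


print(' '.join(terminal_forms(derivation)))
```

For anything beyond this kind of quick check, the `derivation` module listed under Sub-packages is the better starting point.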

Here's an example of loading a SimpleMRS representation:

>>> from delphin.codecs import simplemrs
>>> m = simplemrs.loads_one('[ LTOP: h1 INDEX: e2 [ e TENSE: PAST MOOD: INDICATIVE PROG: - PERF: - SF: PROP ASPECT: DEFAULT_ASPECT PASS: - ] RELS: < [ udef_q_rel<0:1> LBL: h3 ARG0: x6 [ x PERS: 3 ] RSTR: h5 BODY: h4 ] [ "_ame_n_rel"<0:1> LBL: h7 ARG0: x6 ] [ "_furu_v_1_rel"<2:3> LBL: h8 ARG0: e2 ARG1: x6 ] > HCONS: < h5 qeq h7 > ]')
>>> m.ltop
'h1'
>>> for p in m.preds():
...     print('{}|{}|{}|{}'.format(p.string, p.lemma, p.pos, p.sense))
... 
udef_q_rel|udef|q|None
"_ame_n_rel"|ame|n|None
"_furu_v_1_rel"|furu|v|1
>>> print(simplemrs.dumps_one(m, pretty_print=True))
[ TOP: h1
  INDEX: e2 [ e TENSE: PAST MOOD: INDICATIVE PROG: - PERF: - SF: PROP ASPECT: DEFAULT_ASPECT PASS: - ]
  RELS: < [ udef_q_rel<0:1> LBL: h3 ARG0: x6 [ x PERS: 3 ] RSTR: h5 BODY: h4 ]
          [ "_ame_n_rel"<0:1> LBL: h7 ARG0: x6 ]
          [ "_furu_v_1_rel"<2:3> LBL: h8 ARG0: e2 ARG1: x6 ] >
  HCONS: < h5 qeq h7 > ]
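
The `lemma`, `pos`, and `sense` values printed above follow from the naming convention of the predicate strings. As a rough illustration only (this is not PyDelphin's implementation, and the function name `split_pred` is hypothetical), the convention can be approximated for the three predicates in this example as follows. Lemmas that themselves contain underscores, and predicates without a part-of-speech field, are not handled.

```python
def split_pred(predstr):
    """Split a predicate string into (lemma, pos, sense).

    Simplified: assumes the [_]lemma_pos[_sense]_rel pattern seen above.
    """
    s = predstr.strip('"')        # drop the quotes of string-type predicates
    if s.startswith('_'):
        s = s[1:]                 # surface predicates start with '_'
    if s.endswith('_rel'):
        s = s[:-len('_rel')]      # drop the conventional suffix
    parts = s.split('_')
    lemma, pos = parts[0], parts[1]
    sense = parts[2] if len(parts) > 2 else None
    return lemma, pos, sense


for pred in ('udef_q_rel', '"_ame_n_rel"', '"_furu_v_1_rel"'):
    print('{}|{}|{}|{}'.format(pred, *split_pred(pred)))
```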

Here's an example of TDL introspection:

>>> from delphin import tdl
>>> import os
>>> f = open(os.path.expanduser('~/logon/lingo/erg/fundamentals.tdl'), 'r')
>>> types = {t.identifier: t for t in tdl.parse(f)}
>>> types['basic_word'].supertypes
['word_or_infl_rule', 'word_or_punct_rule']
>>> types['basic_word'].features()
[('SYNSEM', <TdlDefinition object at 140559634120136>), ('TOKENS', <TdlDefinition object at 140559631479864>), ('ORTH', <TdlDefinition object at 140559631479000>)]
>>> types['basic_word'].coreferences
[('#rb', ['ORTH.RB', 'TOKENS.+LAST.+TRAIT.+RB']), ('#to', ['SYNSEM.LKEYS.KEYREL.CTO', 'ORTH.TO', 'TOKENS.+LAST.+TO']), ('#lb', ['ORTH.LB', 'TOKENS.+LIST.FIRST.+TRAIT.+LB']), ('#form', ['ORTH.FORM', 'TOKENS.+LIST.FIRST.+FORM']), ('#tl', ['SYNSEM.PHON.ONSET.--TL', 'TOKENS.+LIST']), ('#from', ['SYNSEM.LKEYS.KEYREL.CFROM', 'ORTH.FROM', 'TOKENS.+LIST.FIRST.+FROM']), ('#class', ['ORTH.CLASS', 'TOKENS.+LIST.FIRST.+CLASS'])]
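
Because `types` is an ordinary dictionary keyed by identifier, questions about the type hierarchy reduce to dictionary lookups. The sketch below collects all ancestors of a type by following `supertypes`. To keep it runnable without a grammar, it uses stand-in objects rather than real PyDelphin types, and apart from the two supertypes of `basic_word` shown above, the hierarchy is invented for illustration. The function name `ancestors` is likewise hypothetical.

```python
from types import SimpleNamespace

# Stand-ins for parsed TDL types. Only basic_word's supertypes come from
# the example above; the rest of this hierarchy is made up.
types = {
    'basic_word': SimpleNamespace(
        supertypes=['word_or_infl_rule', 'word_or_punct_rule']),
    'word_or_infl_rule': SimpleNamespace(supertypes=['sign']),
    'word_or_punct_rule': SimpleNamespace(supertypes=['sign']),
    'sign': SimpleNamespace(supertypes=[]),
}


def ancestors(name, types):
    """Return all supertypes of `name`, nearest first, without repeats."""
    seen = []
    queue = list(types[name].supertypes)
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.append(current)
        # Types defined in other files may be missing from the dict.
        if current in types:
            queue.extend(types[current].supertypes)
    return seen


print(ancestors('basic_word', types))
```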

And here's how to compile, parse, and generate with the ACE wrapper:

>>> from delphin.interfaces import ace
>>> ace.compile('../jacy/ace/config.tdl', 'jacy.dat')
[...]
>>> response = ace.parse('jacy.dat', '犬 が 吠える')
>>> len(response.results())
1
>>> response.result(0).keys()
dict_keys(['DERIV', 'MRS'])
>>> response.result(0)['MRS']
'[ LTOP: h0 INDEX: e2 [ e TENSE: pres MOOD: indicative PROG: - PERF: - ASPECT: default_aspect PASS: - SF: prop ] RELS: < [ udef_q_rel<0:1> LBL: h4 ARG0: x3 [ x PERS: 3 ] RSTR: h5 BODY: h6 ]  [ "_inu_n_rel"<0:1> LBL: h7 ARG0: x3 ]  [ "_hoeru_v_1_rel"<4:7> LBL: h1 ARG0: e2 ARG1: x3 ] > HCONS: < h0 qeq h1 h5 qeq h7 > ]'
>>> response.result(0).mrs()
<Xmrs object (udef inu hoeru) at 140352613240112>
>>> ace.generate('jacy.dat', response.result(0)['MRS']).results()
[ParseResult({'SENT': '犬 が 吠える'})]
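
The wrapper runs the `ace` binary, so ACE has to be installed and on `PATH` for these calls to work. As a rough guide to what happens underneath, the sketch below builds the basic command lines for the three tasks using ACE's `-g`, `-G`, and `-e` options. The function names are hypothetical, nothing is executed, and PyDelphin may well pass additional options.

```python
import shutil


def compile_cmd(cfg_path, out_path):
    # ace -g <config.tdl> -G <grammar.dat> writes a compiled grammar image
    return ['ace', '-g', cfg_path, '-G', out_path]


def parse_cmd(grm_path):
    # ace -g <grammar.dat> reads sentences from standard input
    return ['ace', '-g', grm_path]


def generate_cmd(grm_path):
    # -e switches ACE from parsing to generation (MRS in, sentences out)
    return ['ace', '-g', grm_path, '-e']


# shutil.which returns None when no `ace` executable is found on PATH.
ace_available = shutil.which('ace') is not None

print(' '.join(compile_cmd('../jacy/ace/config.tdl', 'jacy.dat')))
print('ace found on PATH:', ace_available)
```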

Installation and Requirements

PyDelphin is developed for Python 3 (3.4+), but it has also been tested to work with Python 2.7. Optional requirements include:

  • NetworkX for MRS isomorphism checking
  • requests for the REST client
  • Pygments for TDL and SimpleMRS syntax highlighting
  • Penman for PENMAN serialization of DMRS and EDS
  • tikz-dependency, while not a Python requirement, is needed for compiling LaTeX documents using exported DMRSs
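
To see which of the optional Python packages are present in the current environment, something like the following can be used. It is a small sketch that relies only on the standard library; the function name `available_extras` is hypothetical, and the module names are the usual import names of the packages listed above.

```python
import importlib.util

# Import names of the optional Python dependencies listed above.
OPTIONAL = {
    'networkx': 'MRS isomorphism checking',
    'requests': 'REST client',
    'pygments': 'TDL and SimpleMRS syntax highlighting',
    'penman': 'PENMAN serialization of DMRS and EDS',
}


def available_extras():
    """Map each optional module name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None
            for name in OPTIONAL}


for name, found in sorted(available_extras().items()):
    status = 'found' if found else 'missing'
    print('{:<10} {:<8} ({})'.format(name, status, OPTIONAL[name]))
```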

PyDelphin itself does not need to be installed to be used. You can adjust PYTHONPATH to include the PyDelphin directory.
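
Setting `PYTHONPATH` in the shell is the usual route; the same effect can be had from within a script by extending `sys.path` before the import. A minimal sketch, where `/path/to/pydelphin` is a placeholder for wherever the repository was cloned:

```python
import sys

# Placeholder: replace with the directory that contains the `delphin` package.
PYDELPHIN_DIR = '/path/to/pydelphin'

if PYDELPHIN_DIR not in sys.path:
    sys.path.insert(0, PYDELPHIN_DIR)

# After this, `from delphin import itsdb` would resolve against that
# directory, provided the path is real.
```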

If you would rather install it, however, it is available on PyPI:

$ pip install pydelphin

Sub-packages

The following packages/modules are available:

  • derivation: Derivation trees
  • itsdb: [incr tsdb()] profiles
  • mrs: Minimal Recursion Semantics
  • tdl: Type-Description Language
  • tfs: Typed-Feature Structures
  • tokens: Token lattices
  • extra.highlight: Pygments-based syntax highlighting (currently just for TDL and SimpleMRS)
  • extra.latex: Formatting for LaTeX (just DMRS)
  • interfaces.ace: Python wrapper for common tasks using ACE
  • interfaces.rest: Client for the RESTful web API

Contributors

  • dantiston
  • fcbond
  • goodmami
  • guyemerson
