dkpro-lsr's People

Contributors

dependabot[bot], jlleitschuh, logological, reckart, zesch

dkpro-lsr's Issues

Sample config file

As far as I can tell, there is no documentation whatsoever about the config file format for DKPro LSR, nor is a sample file provided in the source tree. It's critical that we provide some documentation about the format, as well as an example file and instructions on where to put it so that DKPro LSR can find it. DKPro LSR cannot be used without a config file.

Is it possible there was some documentation about this on Google Code that didn't get migrated here to GitHub?

JUnit test testGraphIntegrity fails randomly

The JUnit test de.tudarmstadt.ukp.dkpro.lexsemresource.graph.EntityGraphJGraphTTest.testGraphIntegrity occasionally fails when Jenkins runs it, even though no corresponding code has changed. (For example, in Build 36 the test passed, but in Build 37, where the only change was to add a .gitignore file to the repository, the test failed.) The assertion fails as follows:

Error Message

expected:<17> but was:<16>

Stacktrace

java.lang.AssertionError: expected:<17> but was:<16>
    at org.junit.Assert.fail(Assert.java:91)
    at org.junit.Assert.failNotEquals(Assert.java:645)
    at org.junit.Assert.assertEquals(Assert.java:126)
    at org.junit.Assert.assertEquals(Assert.java:470)
    at org.junit.Assert.assertEquals(Assert.java:454)
    at de.tudarmstadt.ukp.dkpro.lexsemresource.graph.EntityGraphJGraphTTest.testGraphIntegrity(EntityGraphJGraphTTest.java:75)

The test always seems to pass when run on my local machine.

Using WordNet as Maven Dependency

It would be nice if DKPro LSR could be used with WordNet without having to create a resource.xml and without having to download WordNet manually. We already use extjwnl which can load WordNet from a Maven dependency... we should be able to benefit from that.

Update GermaNet API to support GermaNet 15

The package de.tudarmstadt.ukp.dkpro.lexsemresource.germanet-gpl should be updated to use a more recent version of the GermaNet API, as the current one does not support the most recent GermaNet dataset.

I've been testing version 13.2.1 on my own machine, and there is just one minor incompatibility, but it's unclear to me whether the change that's responsible is intentional or not (hopefully we get an answer here). I'm also unsure what would be the best way to handle the situation in LSR, so I think for now version 13.1.0 would be sufficient. This version also supports the current GermaNet data, and there is no incompatibility.

Don't output serialized graphs to the root directory

The JUnit tests in de.tudarmstadt.ukp.dkpro.lexsemresource/de.tudarmstadt.ukp.dkpro.lexsemresource.graph-asl read in LSR graphs and serialize them to files in the module's root directory. (I'm not sure if the tests themselves are to blame, or if it's the underlying API methods which are doing this.) In any case, this shouldn't be happening. Unless otherwise specified, the graphs should probably go into the target directory, or some other place where the developer and their SCM aren't going to confuse them with newly added source files.
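A minimal sketch of the behaviour requested above, using only the Java standard library. The class GraphOutputLocation, its method names, and the default location target/graphs are hypothetical and are not part of the DKPro LSR API; the sketch only illustrates falling back to a directory under target when no output location is specified.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; not an existing DKPro LSR class.
public class GraphOutputLocation {

    // Returns the configured directory, or target/graphs (an assumed default)
    // when nothing is configured. The directory is created if it is missing.
    public static Path resolveOutputDir(String configured) throws IOException {
        Path dir = (configured == null || configured.isEmpty())
                ? Paths.get("target", "graphs")
                : Paths.get(configured);
        return Files.createDirectories(dir);
    }

    // Serializes any Serializable object (standing in for a graph) into the
    // given directory and returns the path of the file that was written.
    public static Path serialize(Serializable graph, Path dir, String fileName)
            throws IOException {
        Path file = dir.resolve(fileName);
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(graph);
        }
        return file;
    }
}
```

A test could then pass a temporary directory explicitly, while ordinary builds would write below target, which Maven's clean phase removes and which is usually excluded from version control.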

Migrate from JWNL to extJWNL 1.9

DKPro LSR currently uses the ancient, buggy, and unmaintained JWNL. Most modern NLP applications have moved to extJWNL, which is maintained. Unfortunately, using both JWNL and extJWNL in the same project can lead to errors, so this limits the usability of DKPro LSR in other projects.

It would be nice if DKPro LSR were rewritten to use the latest version of extJWNL (1.9).

I've done a local rewrite myself. The only classes which I was unable to migrate were de.tudarmstadt.ukp.dkpro.lexsemresource.wordnet.util.UkpFileManagerImpl and de.tudarmstadt.ukp.dkpro.lexsemresource.wordnet.util.UkpRandomAccessDictionaryFile. I have no idea what these are for; they have no meaningful documentation and they're not referenced anywhere else in DKPro LSR, including in any unit tests. Would it be OK if I simply removed these two classes?

Wordnet 'index.sense' file name automatically set to 'sense.idx'

Originally reported by turkovic.mladen, Jul 7, 2014 on GoogleCode

In UkpFileManagerImpl.java, lines 108-111 reset the WordNet "index.sense" file name to "sense.idx":

    String sense = "index.sense";
    if (JWNL.getVersion().getNumber() < 2.1) {
        sense = "sense.idx";
    }

However, all of my WordNet installations (1.7, 2.1, 3.0, 3.1) on Unix (Ubuntu 12.04) contain only "index.sense" files.

This results in an error when calling "LsrSenseInventoryResource".

Workaround:
Copy the "index.sense" file in WordNet 1.7/dict and rename the copy to "sense.idx".
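One possible alternative to the version check, sketched here with the Java standard library only, is to probe the dictionary directory and use whichever of the two file names is actually present. The class SenseIndexLocator and its method are hypothetical; this is not the code in UkpFileManagerImpl, and the preference for "index.sense" over "sense.idx" is an assumption based on the installations described in this report.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; not an existing DKPro LSR or JWNL class.
public class SenseIndexLocator {

    // Candidate file names, in the order they are tried.
    private static final String[] CANDIDATES = {"index.sense", "sense.idx"};

    // Returns the first candidate that exists as a regular file in dictDir.
    public static Path locate(Path dictDir) throws IOException {
        for (String name : CANDIDATES) {
            Path candidate = dictDir.resolve(name);
            if (Files.isRegularFile(candidate)) {
                return candidate;
            }
        }
        throw new FileNotFoundException(
                "Neither index.sense nor sense.idx found in " + dictDir);
    }
}
```

With this approach the workaround of copying and renaming the file would not be needed, because the lookup no longer depends on the reported WordNet version.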


The problem occurred with:
de.tudarmstadt.ukp.dkpro.wsd.examples-gpl / Senseval2EnLSExample.java

Running Ubuntu 12.04 (Eclipse Kepler Service Release 2)

Migrate to extJWNL 1.8.0

DKPro LSR depends on JWNL, a long-unmaintained Java library for WordNet. Modern NLP code is increasingly moving to extJWNL, a fork of JWNL which is well maintained. Unfortunately, recent versions of extJWNL clash with JWNL; it's not possible to have both dependencies in the same project, so much recent NLP code which depends on extJWNL cannot use DKPro LSR any more. (See extjwnl/extjwnl#1 for further details.)

I suggest migrating DKPro LSR from JWNL to extJWNL. This should be pretty painless; the API is mostly the same, and the WordNet properties file syntax is only slightly different.

Update SCM data and copyright notices

The copyright notices in all files need to be updated to reflect the current year.

The SCM data in the POMs needs to be updated to reflect the project's new home on GitHub.

Accessing DKPro LSR's instance of WordNet/GermaNet

I think it's a common use case to utilize not only the functions exposed by DKPro LSR for GermaNet and WordNet, but also other functions provided only in the WordNet/GermaNet APIs. (One example missing in LSR is getIlirecords() from the GermaNet API.)
However, creating a LexicalSemanticResource object and also a WordNet/GermaNet object can require too much memory.
It should be possible to either access the WordNet/GermaNet object created by LSR, or to provide an existing GermaNet/WordNet object to LSR to construct the resource class with (through the ResourceFactory class or maybe directly to the relevant resource class?).
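Both options could look roughly like the following sketch. Every name in it is hypothetical: Backend stands in for a heavyweight GermaNet or WordNet API object, and Resource stands in for an LSR resource class. None of these are existing DKPro LSR types; the point is only that a single backend instance is shared instead of being created twice.

```java
import java.util.Objects;

// Hypothetical design sketch; none of these types exist in DKPro LSR.
public class SharedBackendSketch {

    // Stand-in for an expensive-to-load lexicon object.
    public static class Backend {
        // Counts how many backends have been constructed, for illustration.
        public static int created = 0;

        public Backend() {
            created++;
        }
    }

    // Stand-in for an LSR resource that wraps a backend.
    public static class Resource {
        private final Backend backend;

        // Current behaviour as described above: the resource builds its own backend.
        public Resource() {
            this(new Backend());
        }

        // Option 2: construct the resource around an existing backend.
        public Resource(Backend existing) {
            this.backend = Objects.requireNonNull(existing, "existing backend");
        }

        // Option 1: expose the backend so callers can use API-only functions.
        public Backend getBackend() {
            return backend;
        }
    }
}
```

Either the accessor or the extra constructor alone would avoid holding two copies of the lexicon in memory; offering both would leave the choice to the caller.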
