The flair from reynoldsnlp

Add Russian detailed morphological analysis

The view project uses the seco lookup tool. The Russian model can come from my udar project here.

Maybe start with this file to get an idea how it is used: https://gitlab.com/view/core/blob/master/app/src/main/java/werti/util/HFSTAnalyser.java

In general, anything with HFST or CG3 in the name might be relevant. HFST is the tool that does the lookup. A given token could have more than one possible grammatical analysis (reading), so we use a Constraint Grammar (CG3) to eliminate readings based on context.

User Interface - search button doesn't hide after search

After loading flair, the problem can be reproduced as follows:

1. Enter a search query and hit search.
2. Cancel the search.
3. Enter another search query.
4. Wait for the results to load.
5. Click the orange magnifying-glass floating action button in the bottom right.

Note that the floating action buttons are still displayed over the bottom sheet where a search query is entered.

Decide what Russian grammatical constructions to incorporate

We need to decide which grammatical constructions the project should recognize and report.

Remove src/main/webapp/META-INF directory and ensure functionality

It seems that the directory src/main/webapp/META-INF is extraneous. Assuming this is correct and its removal will not cause problems, it should be removed.

server appears to be down

This may be planned maintenance or something, but when I run a search in any language I get the following errors.

Failed to load resource: the server responded with a status of 404 ()

Failed to load resource: the server responded with a status of 500 ()

SEVERE: Couldn't begin web search operation. Exception: com.google.gwt.user.client.rpc.StatusCodeException: 500 The call failed on the server; see server log for details
Obb @ flair-0.js:3041
Hhc @ flair-0.js:2862
Ghc @ flair-0.js:2070
Ihc @ flair-0.js:1836
Tz @ flair-0.js:2844
Pz @ flair-0.js:1639
Tl @ flair-0.js:3040
Wc @ flair-0.js:3035
rib @ flair-0.js:2974
LS @ flair-0.js:2259
YS @ flair-0.js:3041
(anonymous) @ flair-0.js:2110
xJ @ flair-0.js:1445
AJ @ flair-0.js:2750
(anonymous) @ flair-0.js:2100

testing

We need automated testing. Let's use this thread to discuss/plan/prioritize.

add Russian to the front end

This was a localization issue.

localization files are not being updated

src/main/webapp/WEB-INF/deploy/flair/symbolMaps/* are static files, checked in to the repository. They should probably NOT be in the repo, but should be dynamically generated.

Also, mvn clean should probably remove them.

Add names to constructions for conjugation classes

On the branch russian2 in the file src/main/java/com/flair/client/localization/resources/strings-en-constructions.tsv, lines 688 through 741 need user-facing info about conjugation classes.

Rows that start with _gram-name_ need a short name describing the conjugation class
Rows that start with _gram-helpText_ need a short description and/or examples

These values should be placed in the 3rd column (after the 2nd \t character)
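
As a rough aid for this task, the sketch below lists which `_gram-name_` and `_gram-helpText_` rows still lack a value in the 3rd column. The class name, method name, and the sample rows are hypothetical; only the row prefixes and the tab-separated layout come from the description above.

```java
import java.util.ArrayList;
import java.util.List;

class ConstructionRowCheck {
    /**
     * Returns the 1-based line numbers of _gram-name_ / _gram-helpText_ rows
     * whose 3rd tab-separated column is missing or blank.
     */
    public static List<Integer> rowsMissingThirdColumn(List<String> lines) {
        List<Integer> missing = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (!line.startsWith("_gram-name_") && !line.startsWith("_gram-helpText_")) {
                continue; // not a row this task is concerned with
            }
            // The -1 limit keeps trailing empty columns instead of dropping them.
            String[] cols = line.split("\t", -1);
            if (cols.length < 3 || cols[2].trim().isEmpty()) {
                missing.add(i + 1);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows; the real file's 2nd-column values may differ.
        List<String> sample = List.of(
            "_gram-name_\tverbClassA\tFirst conjugation",
            "_gram-helpText_\tverbClassA\t",
            "_gram-name_\tverbClassB");
        System.out.println(rowsMissingThirdColumn(sample));
    }
}
```

To check the real file, the lines could be read with `java.nio.file.Files.readAllLines` and passed to the same method.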

Make resource locations consistent across platforms and implementations

In other words, remove all absolute paths from code. This includes reading/writing weka models, reading/writing temporary MADAMIRA files, etc.

The location of the source repository should not be assumed
The location of the tomcat instance should not be assumed
Tomcat can be configured to restart an instance when a new file is written inside webapps/, so this location should be avoided for writing.
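
One possible way to meet these points is to resolve a single writable working directory at startup, taken from configuration with the system temp directory as a fallback. This is only a sketch: the property name `flair.workdir` and the class `WorkDirResolver` are assumptions for illustration and do not exist in the code base.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class WorkDirResolver {
    // Hypothetical JVM system property; not an existing FLAIR setting.
    static final String PROPERTY = "flair.workdir";

    /**
     * Uses the configured directory when one is given; otherwise falls back
     * to a "flair" subdirectory of the supplied temp directory.
     */
    public static Path resolve(String configured, String tmpDir) {
        if (configured != null && !configured.trim().isEmpty()) {
            return Paths.get(configured).toAbsolutePath().normalize();
        }
        return Paths.get(tmpDir, "flair").toAbsolutePath().normalize();
    }

    /** Reads both values from JVM system properties. */
    public static Path resolve() {
        return resolve(System.getProperty(PROPERTY), System.getProperty("java.io.tmpdir"));
    }

    public static void main(String[] args) {
        // Prints the directory that model files and temporary files would go under.
        System.out.println(resolve());
    }
}
```

With this approach the location would be set when the JVM is launched, for example with `-Dflair.workdir=/some/path`, so neither the repository location nor `webapps/` is baked into the code.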

upgrade tika-core

...to at least 1.20

Speed up search times

make flair deployable using docker

One container for MADAMIRA and one for FLAIR.

Make `Constructions:` section optional (for Arabic)

add Persian

Recognize indirect speech that includes a question.

We need to use a list of verbs that can introduce indirect speech.
For example, "скажу ему, когда ты придешь." ("I will tell him when you arrive.") should be recognized as not being a question.
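
A minimal sketch of the idea, assuming a simple check on surface forms: a question word counts as a direct question only if no reporting verb precedes it in the sentence. The word lists here are tiny, illustrative placeholders; a real implementation would presumably match lemmas from the morphological analysis rather than raw tokens.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

class IndirectSpeechSketch {
    // Illustrative surface forms only; a real list would hold verb lemmas.
    static final Set<String> REPORTING_VERBS = Set.of("скажу", "сказал", "спросил", "знаю");
    // Illustrative subset of Russian question words.
    static final Set<String> QUESTION_WORDS = Set.of("когда", "где", "кто", "почему");

    /**
     * Returns true if the sentence contains a question word that is not
     * preceded by one of the reporting verbs.
     */
    public static boolean looksLikeDirectQuestion(String sentence) {
        // Split on anything that is not a Unicode letter.
        List<String> tokens = Arrays.asList(sentence.toLowerCase().split("[^\\p{L}]+"));
        for (int i = 0; i < tokens.size(); i++) {
            if (!QUESTION_WORDS.contains(tokens.get(i))) {
                continue;
            }
            // A reporting verb before the question word signals indirect speech.
            for (String earlier : tokens.subList(0, i)) {
                if (REPORTING_VERBS.contains(earlier)) {
                    return false;
                }
            }
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeDirectQuestion("скажу ему, когда ты придешь."));
        System.out.println(looksLikeDirectQuestion("Когда ты придешь?"));
    }
}
```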

page crashes when you click on Arabic document

Recognize separate conjugation classes and declension classes in Russian

We need some way to recognize which conjugation class each verb belongs to, as well as which declension class each noun belongs to.

make front-end more responsive to smaller screens

Make Grammar constructions language-specific

fix Arabic default readability scores

Get feedback from users

Google Form or email

upgrade tika in pom.xml

Document how to add a new language

search on specific websites, possibly even curate a list of sites

Custom readability levels for each language

The Arabic model has 4 readability levels (1, 2, 3 and 4), and they are not based on CEFR levels (A1-C2), so the labels on the webpage are misleading.
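
One way this could be modeled is a per-language lookup of level labels with CEFR as the default. The sketch below is an assumption about how such a lookup might look, not a description of the current code; the class name and the language code `ar` as a key are hypothetical.

```java
import java.util.List;
import java.util.Map;

class ReadabilityLabels {
    // Default labels, used for languages without a custom scale.
    static final List<String> CEFR = List.of("A1", "A2", "B1", "B2", "C1", "C2");

    // Languages whose models use their own scale; Arabic has levels 1 to 4.
    static final Map<String, List<String>> BY_LANGUAGE =
        Map.of("ar", List.of("1", "2", "3", "4"));

    /** Returns the labels to display for the given language code. */
    public static List<String> labelsFor(String languageCode) {
        return BY_LANGUAGE.getOrDefault(languageCode, CEFR);
    }

    public static void main(String[] args) {
        System.out.println(labelsFor("ar"));
        System.out.println(labelsFor("en"));
    }
}
```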

Arabic NLP resources?

I saw this on corpora-list, and thought they could come in handy for us.

I don't understand exactly what you mean by basic vocabulary used for defining words, but if you mean roots from which Arabic words can be generated, here are two dictionaries. In our team we have two Arabic dictionaries available in LMF format:

Contemporary Arabic dictionary with 32300 lexical entries generated from 5778 roots

"Al wassit" Arabic dictionary with 61101 lexical entries generated from 6900 roots

Both can be freely downloaded from http://arabic.emi.ac.ma/alelm/?q=Resources

You can also take a look at the second release of Arabic wordnet from http://globalwordnet.org/arabic-wordnet

best

karim

German user-facing text for Russian constructions

Russian constructions need names, paths, and help texts for German localization of the UI.

These localizations should be edited in these files:
src/main/java/com/flair/client/localization/resources/strings-de-constructions.tsv
src/main/java/com/flair/client/localization/resources/strings-de-general.tsv

The original English localizations can be seen in these files:
src/main/java/com/flair/client/localization/resources/strings-en-constructions.tsv
src/main/java/com/flair/client/localization/resources/strings-en-general.tsv

Arabic results are coming back VERY slowly

I searched for الحور الرجراج with 40 results requested, and the results were processed very slowly. The relevant entries in catalina.out start at about 25/04/2019 16:19:41.

Load models automatically on server startup?

~~What will it take to daemonize this?~~

~~One approach to this sort of thing is Apache UIMA, but this would take a huge refactor.~~

Rework MADAMIRA<=>flair interface to not use filesystem

Currently, data is sent to the MADAMIRA server using a temporary file. We should try to find a way to pass it directly without all of the disk I/O.

Let user know it's still processing

When there are more search results than can be seen on the page, it is unclear that FLAIR is still processing. It would be beneficial to have some kind of notice saying that it's in the middle of processing.

MADAMIRA keeps dying

Figure out why. We may also need to integrate it with systemd, i.e. turn it into a service.

Verify that "unreal conditionals" are considered separate.

Should there be a specific sub-category of conditionals for unreal situations? For example, the English sentence "If you had been here, it would've been better" is an unreal conditional. Should that be a separate construction (with accompanying weight slider) from just "conditionals"?

Ability to request the next batch of results

It would be nice if it were possible to request the next x search results. For example, if I request 10 sites, and none of them are quite what I'm looking for, it would be nice to just look for the next 10 instead of having to request 20 and reprocess the 10 I already know aren't going to work.
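
The request amounts to keeping an offset into the ranked result list so that each call continues where the last one stopped. A minimal sketch, assuming the ranked URLs are available as a list (in practice the offset would more likely be passed on to the web search API); `ResultCursor` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

class ResultCursor {
    private final List<String> ranked; // stands in for the search engine's ranked URLs
    private int offset = 0;            // index of the first result not yet handed out

    public ResultCursor(List<String> ranked) {
        this.ranked = ranked;
    }

    /** Returns up to `count` results that have not been returned before. */
    public List<String> next(int count) {
        int end = Math.min(offset + count, ranked.size());
        List<String> batch = new ArrayList<>(ranked.subList(offset, end));
        offset = end; // later calls skip everything already returned
        return batch;
    }

    public static void main(String[] args) {
        ResultCursor cursor = new ResultCursor(List.of("site1", "site2", "site3"));
        System.out.println(cursor.next(2));
        System.out.println(cursor.next(2));
    }
}
```

Because the cursor only moves forward, documents from earlier batches are never fetched or parsed a second time.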

Handle outOfMemory errors and exceptions

When an individual server instance has to use more than three parser models, it results in an outOfMemory error from the server. On the user's end, all they see is a web page that is stuck loading documents. There should be a way for us to gracefully handle such situations. Ideally we would find a way to work around the high memory needs of this project.
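
One possible mitigation, offered only as a sketch, is to cap how many parser models stay loaded and evict the least recently used one when the cap is exceeded. The example below uses the standard `LinkedHashMap` access-order mode for this; the class name `ModelCache` and the idea of keying models by language code are assumptions, and this does not replace catching and reporting the error to the user.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Keeps at most maxModels entries, evicting the least recently used one.
 * Not thread-safe; callers on a server would need external synchronization.
 */
class ModelCache<M> extends LinkedHashMap<String, M> {
    private final int maxModels;

    public ModelCache(int maxModels) {
        // accessOrder = true makes iteration order follow recency of access.
        super(16, 0.75f, true);
        this.maxModels = maxModels;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, M> eldest) {
        // Called by LinkedHashMap after each insertion.
        return size() > maxModels;
    }

    public static void main(String[] args) {
        ModelCache<String> cache = new ModelCache<>(2);
        cache.put("en", "English model");
        cache.put("de", "German model");
        cache.get("en");                  // "en" becomes the most recently used
        cache.put("ru", "Russian model"); // evicts "de"
        System.out.println(cache.keySet());
    }
}
```

An evicted model would have to be reloaded on its next use, so this trades memory for load time.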
