flair's People

Contributors

dependabot[bot], example, jhveem, mjbriggs, reynoldsnlp, rkechols

flair's Issues

Add Russian detailed morphological analysis

The view project uses the seco lookup tool. The Russian model can come from my udar project.

Maybe start with this file to get an idea how it is used: https://gitlab.com/view/core/blob/master/app/src/main/java/werti/util/HFSTAnalyser.java

In general, anything with HFST or CG3 in the name might be relevant. HFST is the tool that does the lookup. A given token could have more than one possible grammatical analysis (reading), so we use a Constraint Grammar (CG3) to eliminate readings based on context.
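To make the idea of readings and context-based elimination concrete, here is a toy sketch in Java. It is not the CG3 rule language and not the view project's code: the `Reading` and `Token` types, the tag names, and the single rule (drop verb readings right after an unambiguous determiner) are all hypothetical. It does follow the usual Constraint Grammar convention that the last remaining reading is never removed.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: real disambiguation is done by CG3 rules, not Java code.
class ReadingFilterSketch {
    // One possible analysis of a token: a lemma plus a part-of-speech tag (hypothetical tag set).
    record Reading(String lemma, String pos) {}

    // A token together with every reading the analyser returned for it.
    record Token(String surface, List<Reading> readings) {}

    // Hypothetical rule: remove verb readings when the previous token is unambiguously a determiner.
    static List<Reading> disambiguate(Token previous, Token current) {
        boolean afterDeterminer = previous != null
                && !previous.readings().isEmpty()
                && previous.readings().stream().allMatch(r -> r.pos().equals("DET"));
        if (!afterDeterminer) {
            return current.readings();
        }
        List<Reading> kept = new ArrayList<>();
        for (Reading r : current.readings()) {
            if (!r.pos().equals("V")) {
                kept.add(r);
            }
        }
        // Never remove the last remaining reading.
        return kept.isEmpty() ? current.readings() : kept;
    }

    public static void main(String[] args) {
        Token the = new Token("the", List.of(new Reading("the", "DET")));
        Token can = new Token("can", List.of(new Reading("can", "N"), new Reading("can", "V")));
        // After "the", only the noun reading of "can" survives.
        System.out.println(disambiguate(the, can));
    }
}
```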

User Interface - search button doesn't hide after search

After loading flair, the problem can be found by:

  1. Entering a search query and hitting search
  2. Canceling the search
  3. Entering another search query
  4. Waiting for results to load
  5. Clicking the orange magnifying glass floating action button in the bottom right.
    Note that the floating action buttons are still displayed over the bottom sheet where a search query is entered.

server appears to be down

This may be planned maintenance or something, but when I run a search in any language I get the following errors.

Failed to load resource: the server responded with a status of 404 ()

Failed to load resource: the server responded with a status of 500 ()

SEVERE: Couldn't begin web search operation. Exception: com.google.gwt.user.client.rpc.StatusCodeException: 500 The call failed on the server; see server log for details
Obb @ flair-0.js:3041
Hhc @ flair-0.js:2862
Ghc @ flair-0.js:2070
Ihc @ flair-0.js:1836
Tz @ flair-0.js:2844
Pz @ flair-0.js:1639
Tl @ flair-0.js:3040
Wc @ flair-0.js:3035
rib @ flair-0.js:2974
LS @ flair-0.js:2259
YS @ flair-0.js:3041
(anonymous) @ flair-0.js:2110
xJ @ flair-0.js:1445
AJ @ flair-0.js:2750
(anonymous) @ flair-0.js:2100

testing

We need automated testing. Let's use this thread to discuss/plan/prioritize.

localization files are not being updated

src/main/webapp/WEB-INF/deploy/flair/symbolMaps/* are static files, checked in to the repository. They should probably NOT be in the repo, but should be dynamically generated.

Also, mvn clean should probably remove them.

Add names to constructions for conjugation classes

On the branch russian2 in the file src/main/java/com/flair/client/localization/resources/strings-en-constructions.tsv, lines 688 through 741 need user-facing info about conjugation classes.

  • Rows that start with _gram-name_ need a short name describing the conjugation class
  • Rows that start with _gram-helpText_ need a short description and/or examples

These values should be placed in the 3rd column (after the 2nd \t character)
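For anyone scripting this edit rather than doing it by hand, a minimal sketch of writing a value into the 3rd tab-separated column could look like the following. The class and method names are hypothetical, and the sample row in `main` is invented; it does not reproduce a real row from the TSV file.

```java
import java.util.Arrays;

class TsvColumnSketch {
    // Returns the row with its 3rd tab-separated column set to value,
    // padding with empty columns when the row has fewer than three.
    static String withThirdColumn(String row, String value) {
        String[] cols = row.split("\t", -1); // limit -1 keeps trailing empty columns
        if (cols.length < 3) {
            cols = Arrays.copyOf(cols, 3); // new slots are null until filled below
        }
        for (int i = 0; i < cols.length; i++) {
            if (cols[i] == null) {
                cols[i] = "";
            }
        }
        cols[2] = value;
        return String.join("\t", cols);
    }

    public static void main(String[] args) {
        // Invented two-column row; the real file's keys and layout may differ.
        System.out.println(withThirdColumn("some-key\tsome-path", "Short class name"));
    }
}
```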

Make resource locations consistent across platforms and implementations

In other words, remove all absolute paths from code. This includes reading/writing weka models, reading/writing temporary MADAMIRA files, etc.

  1. The location of the source repository should not be assumed
  2. The location of the tomcat instance should not be assumed
  3. Tomcat can be configured to restart an instance when a new file is written inside webapps/, so this location should be avoided for writing.
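One way to approach the points above is to resolve a single writable data directory from configuration and create all temporary files beneath it. The sketch below assumes a system property `flair.data.dir` and an environment variable `FLAIR_DATA_DIR`; both names are hypothetical and do not exist in the project. When neither is set it falls back to a subdirectory of `java.io.tmpdir`, which lies outside webapps/ in a typical Tomcat setup.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ResourceLocationSketch {
    // Resolves the writable data directory without assuming where the repo or Tomcat lives.
    // "flair.data.dir" and "FLAIR_DATA_DIR" are hypothetical configuration names.
    static Path dataDir() {
        String configured = System.getProperty("flair.data.dir");
        if (configured == null || configured.isBlank()) {
            configured = System.getenv("FLAIR_DATA_DIR");
        }
        Path dir = (configured == null || configured.isBlank())
                ? Paths.get(System.getProperty("java.io.tmpdir"), "flair")
                : Paths.get(configured);
        try {
            return Files.createDirectories(dir); // no-op if the directory already exists
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a uniquely named temporary file inside the data directory.
    static Path newTempFile(String prefix, String suffix) {
        try {
            return Files.createTempFile(dataDir(), prefix, suffix);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(newTempFile("madamira-", ".xml"));
    }
}
```

Read-only resources such as bundled models could instead be loaded from the classpath, which also avoids absolute paths.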

Arabic NLP resources?

I saw this on corpora-list, and thought they could come in handy for us.

I don't understand exactly what you mean by basic vocabulary used for defining words, but if you mean roots from which Arabic words can be generated, here are two dictionaries. In our team we have two Arabic dictionaries available in LMF format:

  • Contemporary Arabic dictionary with 32300 lexical entries generated from 5778 roots

  • "Al wassit" Arabic dictionary with 61101 lexical entries generated from 6900 roots

Both can be freely downloaded from http://arabic.emi.ac.ma/alelm/?q=Resources

You can also take a look at the second release of Arabic wordnet from http://globalwordnet.org/arabic-wordnet

best

karim

German user-facing text for Russian constructions

Russian constructions need names, paths, and help texts for German localization of the UI.

These localizations should be edited in these files:
src/main/java/com/flair/client/localization/resources/strings-de-constructions.tsv
src/main/java/com/flair/client/localization/resources/strings-de-general.tsv

The original English localizations can be seen in these files:
src/main/java/com/flair/client/localization/resources/strings-en-constructions.tsv
src/main/java/com/flair/client/localization/resources/strings-en-general.tsv

Let user know it's still processing

When there are more search results than can be seen on the page, it is unclear that FLAIR is still processing. It would be beneficial to have some kind of notice saying that it's in the middle of processing.

MADAMIRA keeps dying

Figure out why. We may also need to run it under systemd, i.e. turn it into a service, so that it is restarted automatically.

Verify that "unreal conditionals" are considered separate.

Should there be a specific sub-category of conditionals for unreal situations? For example, the English sentence "If you had been here, it would've been better" is an unreal conditional. Should that be a separate construction (with accompanying weight slider) from just "conditionals"?

Ability to request the next batch of results

It would be nice if it were possible to request the next x search results. For example, if I request 10 sites, and none of them are quite what I'm looking for, it would be nice to just look for the next 10 instead of having to request 20 and reprocess the 10 I already know aren't going to work.
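A rough sketch of how this could work: keep a cursor that remembers how many results have already been fetched, and ask the search backend only for the next slice. The `ResultCursorSketch` class and the `(offset, count)` fetch function are hypothetical; whether the underlying web search API accepts an offset is an assumption.

```java
import java.util.List;
import java.util.function.BiFunction;

class ResultCursorSketch {
    // Hypothetical backend call: (offset, count) -> up to count results starting at offset.
    private final BiFunction<Integer, Integer, List<String>> fetch;
    private int offset = 0;

    ResultCursorSketch(BiFunction<Integer, Integer, List<String>> fetch) {
        this.fetch = fetch;
    }

    // Returns the next batch and advances past it, so earlier results are never re-requested.
    List<String> nextBatch(int count) {
        List<String> batch = fetch.apply(offset, count);
        offset += batch.size();
        return batch;
    }
}
```

With this shape, asking for 10 results twice yields results 1 to 10 and then 11 to 20, and only the second batch needs to be parsed on the second request.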

Handle outOfMemory errors and exceptions

When an individual server instance has to use more than three parser models, it results in an outOfMemory error from the server. On the user's end, all they see is a web page that is stuck loading documents. There should be a way for us to gracefully handle such situations. Ideally we would find a way to work around the high memory needs of this project.
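One possible workaround, sketched below, is to cap how many parser models stay loaded at once and evict the least recently used one when the cap is exceeded. This is not the project's code: `ModelCacheSketch` and its loader function are hypothetical, and eviction only frees memory if nothing else still holds a reference to the evicted model. It bounds memory use rather than catching the error itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Keeps at most maxModels parser models in memory, evicting the least recently used.
class ModelCacheSketch<M> {
    private final Function<String, M> loader; // hypothetical: language code -> loaded model
    private final Map<String, M> cache;

    ModelCacheSketch(int maxModels, Function<String, M> loader) {
        this.loader = loader;
        // accessOrder = true makes iteration order run from least to most recently used.
        this.cache = new LinkedHashMap<String, M>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, M> eldest) {
                return size() > maxModels;
            }
        };
    }

    // Returns the cached model for a language, loading it (and possibly evicting another) if absent.
    synchronized M get(String language) {
        M model = cache.get(language);
        if (model == null) {
            model = loader.apply(language);
            cache.put(language, model);
        }
        return model;
    }

    synchronized boolean isLoaded(String language) {
        return cache.containsKey(language);
    }
}
```

A cap of three would match the limit described above, though the right number depends on model sizes and the heap available to the server.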

cg-conv hangs on certain inputs

When the cg-conv utility is run from src/main/java/com/flair/server/utilities/CgConv.java, certain inputs cause it to hang, then time out as programmed.

One such input for Russian is the content of this site, which can be found by searching говорить in Russian with curated domains; it is the 3rd result (as of June 18, 2020)
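For reference while debugging, here is a generic sketch of running an external command with a timeout and forced cleanup. It is not the code in CgConv.java, and how that class feeds input to cg-conv is not shown here. The sketch redirects the child's output to a temporary file, because a child process can block once an unread pipe buffer fills up; whether that is the cause of this particular hang is an assumption worth checking, not a diagnosis.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.TimeUnit;

class TimedProcessSketch {
    // Runs a command; returns its combined output, or empty if it had to be killed on timeout.
    static Optional<String> run(List<String> command, long timeoutMillis) {
        try {
            File out = File.createTempFile("proc-out", ".txt");
            try {
                Process p = new ProcessBuilder(command)
                        .redirectErrorStream(true) // merge stderr into stdout
                        .redirectOutput(out)       // a file cannot fill up like a pipe buffer
                        .start();
                p.getOutputStream().close();       // this sketch sends no stdin
                if (!p.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    p.destroyForcibly().waitFor(); // make sure the hung process is gone
                    return Optional.empty();
                }
                return Optional.of(Files.readString(out.toPath()));
            } finally {
                out.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```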

Changes to the project's name

Is the change of name to "Foreign Language Acquisition Information Retrieval" intentional or a typo? The original project is called "Form-Focused Linguistically Aware Information Retrieval".
