
Comments (2)

kosloot avatar kosloot commented on June 16, 2024

OK. This is caused by the fact that Frog processes FoLiA sequentially.
The <s> in the input bears the text: "Een Bug? Nee toch?"

The tokenizer sees 2 sentences here, which is correct.
The first sentence "Een Bug?" generates 3 <w> nodes, which are appended to the <s>.
Then it is given to the Chunker, which creates a <chunking> layer with one Chunk.

Then the second sentence "Nee toch?" is analyzed. Again 3 words are generated, and they are appended AFTER the <chunking> layer.
Then the Chunker generates another 2 Chunks, which are added to the already existing <chunking> layer,
so BEFORE the last 3 words.
But the chunks carry Word references, so the layer now refers to words that only appear after it, which creates a problem.
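
To make the ordering concrete, here is a minimal sketch in plain Python. It does not use Frog or any FoLiA library: the children of the <s> are modelled as a list, the word ids `w1`..`w6` are hypothetical, and the toy chunker simply emits one chunk per tokenizer sentence (the real Chunker produced 1 and then 2 chunks).

```python
# Toy model (an assumption, not Frog's data structures): the children of <s>
# are a list in which words are id strings and the chunking layer is a dict.

def process_sequentially(token_sentences):
    """Append words and chunks one tokenizer sentence at a time."""
    children = []
    layer = None
    for words in token_sentences:
        children.extend(words)              # new <w> nodes go at the end of <s>
        if layer is None:
            layer = {"chunking": []}        # layer is created after the first sentence
            children.append(layer)
        layer["chunking"].append(list(words))  # later chunks reuse the same layer
    return children

def forward_references(children):
    """Return word ids that a chunking layer references before they appear."""
    position = {c: i for i, c in enumerate(children) if isinstance(c, str)}
    dangling = []
    for i, child in enumerate(children):
        if isinstance(child, dict):
            for chunk in child["chunking"]:
                dangling += [w for w in chunk if position[w] > i]
    return dangling

children = process_sequentially([["w1", "w2", "w3"], ["w4", "w5", "w6"]])
print(children)
print(forward_references(children))
```

In this model the layer sits at index 3, directly after `w3`, yet its second chunk refers to `w4`, `w5` and `w6`, which only follow it.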

Possible solutions:

  • Allow Frog/Tokenizer to create a <p> and 2 new sentences, replacing the original sentence. This seems nice, but creates a lot of problems. At least it should be done as a Correction.
  • Detect this case and just IGNORE that the tokenizer detected multiple sentences: gather all words in one list to append to the <s>, and also hand this list to the Chunker.
  • Allow multiple Chunking layers interleaved with Words inside the Sentence

For now, I assume the second solution to be the "best" and least intrusive.
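
A rough sketch of what the second solution amounts to, again in plain Python rather than Frog's actual code. The list-of-children model, the word ids and the single-chunk toy chunker are all assumptions made for illustration.

```python
# Toy model (an assumption, not Frog's implementation): ignore the tokenizer's
# sentence split, append all words first, then chunk the merged list once.

def process_merged(token_sentences):
    # Gather the words of every tokenizer sentence into one flat list.
    all_words = [w for words in token_sentences for w in words]
    children = list(all_words)              # all <w> nodes are appended to <s> first
    # Toy chunker: a single chunk covering the merged word list.
    layer = {"chunking": [list(all_words)]}
    children.append(layer)                  # the layer comes after every word
    return children

merged = process_merged([["w1", "w2", "w3"], ["w4", "w5", "w6"]])
print(merged)
```

Because the layer is appended only after all words, every Word reference in it points backwards.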

from frog.

kosloot avatar kosloot commented on June 16, 2024

As far as I can see, the fix (using the second scenario) is indeed sound and solves the problem.
Planning a new release RSN.

from frog.
