
Comments (6)

dakrone commented on July 17, 2024

I agree, it'd be better to have an actual parser for the S-expression treebank string, and I'd like to write one that handles it well. In the meantime, though, the raw string is still available to people (which is why I didn't run the treebank output through the make-tree method by default).

I'll leave this open; in the future I may write an actual parser for it (or have one contributed).

from clojure-opennlp.

croeder commented on July 17, 2024

It may turn out to be a fun Clojure exercise. I've parsed treebank strings in Python before, so I see the challenge as partly about Clojure in general and partly about how to identify the different types of scalar. If you're itching to take a crack at it, go ahead; I probably won't get to it for a little while. I want to integrate found names with the parse.


alexott commented on July 17, 2024

@croeder I can try to write such a parser. Can you provide some test data as an example?


croeder commented on July 17, 2024

@alexott It'll be a few weeks before I can make the time. My wife and I just moved.

To create some, run the treebank-parser here on nearly any sentence, then modify the tokens so they aren't proper Lisp s-expressions. I mentioned 2:30 above. I think it's the leading digit that makes Lisp expect either an integer or a float, and then the colon throws it off. Quoting the token would make it a string of characters instead. Some creative thinking while reading the Lisp specs may reveal more cases.

OK, maybe I can do one off the top of my head:
"2:30 is bad." might create a treebank like this: (S (SUBJ (2:30 NN)) (VP (is VB) (bad ADJ)) )
It's horrid linguistics, and the structure and tags are wrong, but it captures the idea. LISP/Clojure is generally happy to read a string like that, but IIRC the reader wants numbers, strings and identifiers, and the 2:30 breaks it.
(S (SUBJ ("2:30" NN)) (VP (is VB) (good ADJ)) ) would work. As would
(S (SUBJ (230 NN)) (VP (is VB) (good ADJ)) )
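
For anyone who wants a starting point, here is a minimal sketch of the idea of not handing the string to the Lisp reader at all. It is in Python only because that's what I've used for this before, and it is not the parser that ended up in this library. The name parse_treebank and the choice to keep every leaf as a plain string are assumptions made for illustration. It treats parentheses as structure and any other run of non-space characters as an opaque token, so 2:30 never has to be classified as a number or a symbol.

```python
import re

# Parentheses are structural; any other run of characters that are neither
# whitespace nor parentheses is a single token, whatever it looks like.
TOKEN_RE = re.compile(r"[()]|[^\s()]+")


def parse_treebank(text):
    """Parse a bracketed treebank string into nested lists.

    Hypothetical helper: every leaf stays a str, so tokens such as
    2:30 are never interpreted as numbers or symbols.
    """
    stack = [[]]  # stack[0] collects the top-level trees
    for tok in TOKEN_RE.findall(text):
        if tok == "(":
            stack.append([])          # open a new subtree
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            node = stack.pop()        # close the subtree
            stack[-1].append(node)    # and attach it to its parent
        else:
            stack[-1].append(tok)     # leaf token, kept verbatim
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


if __name__ == "__main__":
    print(parse_treebank("(S (SUBJ (2:30 NN)) (VP (is VB) (bad ADJ)) )"))
    # [['S', ['SUBJ', ['2:30', 'NN']], ['VP', ['is', 'VB'], ['bad', 'ADJ']]]]
```

The trade-off is that you get strings everywhere and have to decide later which leaves are words and which are tags, but nothing in the input can trip the reader.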


alexott commented on July 17, 2024

OK, thank you. I'll try to investigate this issue.


dakrone commented on July 17, 2024

Merged Alex's PR and released 0.3.0 with his fix; thanks @alexott!

