Comments (6)
I agree, it'd be better to have an actual parser for the S-expression treebank string. I'd like to be able to write one that handles it well. In the meantime though, the string is still available for people (which is why I didn't run the treebank stuff through the make-tree method by default).
I'll leave this open; in the future I may write an actual parser for it (or have one contributed).
from clojure-opennlp.
It may turn out to be a fun Clojure exercise. I've parsed treebank strings in Python before, so I see the challenge as being partly about Clojure in general, and partly about how to identify the different types of scalar. If you're itching to take a crack at it, go ahead; I probably won't get to it for a little while. I want to integrate found names with the parse.
@croeder I can try to write such a parser. Can you provide test data as an example?
@alexott It'll be a few weeks before I can make the time. My wife and I just moved.
To create some, run the treebank-parser here on nearly any sentence, then modify the tokens so they aren't proper Lisp s-expressions. I mentioned 2:30 above. I think it's the leading digit that makes Lisp expect either an integer or a float, and then the colon throws it off. Quoting it would make it a string of characters. Some creative thinking while reading the Lisp specs may reveal more.
OK, maybe I can do one off the top of my head:
"2:30 is bad." might create a treebank like this: (S (SUBJ (2:30 NN)) (VP (is VB) (bad ADJ)) )
It's horrid linguistics (the structure and tags are wrong), but it captures the idea. Lisp/Clojure is happy to read a string of that shape, but IIRC the reader wants numbers, strings and identifiers, and the 2:30 token breaks it.
(S (SUBJ ("2:30" NN)) (VP (is VB) (bad ADJ)) ) would work, as would
(S (SUBJ (230 NN)) (VP (is VB) (bad ADJ)) )
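One way around the reader problem described above is to skip the Lisp reader and tokenize the treebank string directly, treating every non-parenthesis token as an opaque string. The following is only a rough sketch in Python of that idea, not the fix that was eventually merged into clojure-opennlp. The name `parse_treebank` is made up for illustration, and the sketch assumes that literal parentheses never appear inside a token.

```python
import re

# Parentheses are structural; any other run of characters that are
# neither whitespace nor parentheses is treated as a single atom.
TOKEN = re.compile(r"[()]|[^\s()]+")


def parse_treebank(text):
    """Parse a bracketed treebank string into nested lists of strings.

    Hypothetical helper for illustration. Atoms are never interpreted as
    numbers or symbols, so a token such as 2:30 survives unchanged.
    """
    stack = [[]]  # stack[0] collects the top-level trees
    for tok in TOKEN.findall(text):
        if tok == "(":
            stack.append([])          # open a new subtree
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            node = stack.pop()        # close the current subtree
            stack[-1].append(node)    # and attach it to its parent
        else:
            stack[-1].append(tok)     # plain atom, kept as a string
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


if __name__ == "__main__":
    print(parse_treebank("(S (SUBJ (2:30 NN)) (VP (is VB) (bad ADJ)) )"))
```

Because nothing is ever handed to a number or symbol reader, the unquoted 2:30 comes back as the string "2:30" alongside the other tokens.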
OK, thank you. I'll try to investigate this issue.
Merged Alex's PR and released 0.3.0 with his fix; thanks @alexott!