org.clojurenlp.core

Natural language processing in Clojure based on the Stanford-CoreNLP parser.

👋 MAINTAINERS WANTED!

We need help getting this project moving. Please feel free to email [email protected] to join the org, or drop a line in the chat room.

This is a work in progress, currently in the POC phase.

Usage

Tokenization

(use 'org.clojurenlp.core)
(tokenize "This is a simple sentence.")
;; => '({:token "This", :start-offset 0, :end-offset 4}
        {:token "is", :start-offset 5, :end-offset 7}
        {:token "a", :start-offset 8, :end-offset 9}
        {:token "simple", :start-offset 10, :end-offset 16}
        {:token "sentence", :start-offset 17, :end-offset 25}
        {:token ".", :start-offset 25, :end-offset 26}) 
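
The offsets index into the original string, so each token's text can be recovered with subs. The following is a minimal sketch in plain Clojure that uses the token maps from the output above as literal data rather than calling the library; token-text is a hypothetical helper name, not part of org.clojurenlp.core.

```clojure
;; Token maps copied from the example output above (literal data, not a live call).
(def text "This is a simple sentence.")

(def tokens
  [{:token "This", :start-offset 0, :end-offset 4}
   {:token "is", :start-offset 5, :end-offset 7}
   {:token "a", :start-offset 8, :end-offset 9}
   {:token "simple", :start-offset 10, :end-offset 16}
   {:token "sentence", :start-offset 17, :end-offset 25}
   {:token ".", :start-offset 25, :end-offset 26}])

;; Hypothetical helper: slice the source text using a token's offsets.
;; :end-offset is treated as exclusive, which matches the values above.
(defn token-text [s {:keys [start-offset end-offset]}]
  (subs s start-offset end-offset))

(map #(token-text text %) tokens)
;; => ("This" "is" "a" "simple" "sentence" ".")
```

Because the end offset is exclusive, adjacent tokens such as "sentence" and "." share a boundary (25) without overlapping.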

Part-of-Speech Tagging

To get a list of TaggedWord objects:

(use 'org.clojurenlp.core)
;;  use any of these:
(-> "Short and sweet." tokenize pos-tag)
(-> "Short and sweet." split-sentences first pos-tag)
(-> ["Short" "and" "sweet" "."] pos-tag)
(-> "Short and sweet." pos-tag)

;; => [#<TaggedWord Short/JJ> #<TaggedWord and/CC> ...]

To return the tag string from a TaggedWord object:

(->> "Short and sweet." tokenize pos-tag first .tag)
;; => "JJ"
(->> "Short and sweet." tokenize pos-tag (map #(.tag %)))
;; => ("JJ" "CC" "JJ" ".")
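
Once the tags are plain strings, ordinary sequence functions are enough to filter by part of speech. The sketch below pairs each word with its tag and keeps the adjectives; the words and tags are copied from the examples above as literal data (no library call is made), and the names tagged and adjectives are illustrative only.

```clojure
;; Words and tags copied from the examples above (literal data, not a live call).
(def words ["Short" "and" "sweet" "."])
(def tags  ["JJ" "CC" "JJ" "."])

;; Pair each word with its tag.
(def tagged (map vector words tags))
;; => (["Short" "JJ"] ["and" "CC"] ["sweet" "JJ"] ["." "."])

;; Keep only the words tagged as adjectives (Penn Treebank tag JJ).
(def adjectives
  (for [[word tag] tagged :when (= tag "JJ")] word))
;; => ("Short" "sweet")
```

With live output, the same pairs could presumably be built from the TaggedWord objects themselves, using .word alongside .tag.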

For more information, see the relevant Javadoc.

Named Entity Recognition

To tag named entities using the standard Stanford NER model:

(use 'org.clojurenlp.core)
(def pipeline (initialize-pipeline))
(def text "The United States of America will be tagged as a location")
(tag-ner pipeline text)

Training your own model: see "How to Train Your Own Model".

To tag named entities using a custom-trained model:

(use 'org.clojurenlp.core)
(def pipeline (initialize-pipeline "path-to-serialized-model"))
(def text "The United States of America will be tagged as a location")
(tag-ner pipeline text)

Either NER tagging strategy returns a map containing the original text, sentences, tokens, and NER tags.

Parsing

To parse a sentence:

(use 'org.clojurenlp.core)
(def text "I like cheese.") ; any sentence string
(parse (tokenize text))

You will get back a LabeledScoredTreeNode, which you can pass to other Stanford CoreNLP functions or convert to a standard Treebank string with:

(str (parse (tokenize text)))

Stanford Dependencies

(dependency-graph "I like cheese.")

This parses the sentence and returns the dependency graph as a loom graph, which you can then traverse with standard graph algorithms such as shortest path. You can also view it:

(def graph (dependency-graph "I like cheese."))
(use 'loom.io)
(view graph)

This requires GraphViz to be installed.

License

© 2018 The ClojureNLP Organization and Contributors

Distributed under the Apache 2.0 License. See LICENSE for details.

The ClojureNLP Organization

  • Leon Talbot @leontalbot
  • Andrew McLoud @andrewmcloud

Contributors

  • Cory Giles
  • Hans Engel
  • Damien Stanton
  • Andrew McLoud
  • Leon Talbot
  • Marek Owsikowski

core's Issues

README examples fail

From the README:

core> (corenlp/pos-tag (corenlp/tokenize "Colorless green ideas sleep furiously."))
IOException Unable to resolve "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger" as either class path, filename or URL  edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem (IOUtils.java:434)

and

mefingram2.core> (corenlp/parse (corenlp/tokenize "I like cheese."))
NullPointerException   clojure.lang.Reflector.invokeInstanceMethod (Reflector.java:26)

and

mefingram2.core> (corenlp/dependency-graph "I like cheese")
#loom.graph.SimpleDigraph{:nodeset #{}, :adj {}, :in {}}

dependency-graph breaks on some sentences

(-> "Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas" nlp/dependency-graph)
;; => IllegalArgumentException No implementation of method: :src of protocol: #'loom.graph/Edge found for class: java.lang.Long clojure.core/-cache-protocol-fn (core_deftype.clj:583)

Available on clojars?

The README doesn't include simple install instructions (e.g. for Leiningen), and searching Clojars for stanford-corenlp returns a lot of results. Is there an easy Clojars install for stanford-corenlp? That would be tremendously useful at the top of the README.

ClassNotFoundException

Hi. I am trying to load the project in lein repl, and I get this ClassNotFoundException:

ClassNotFoundException java.lang.ClassNotFoundException: edu.stanford.nlp.ling CoreAnnotations

I am working on Windows 7 with Clojure 1.5.1. What should I do to make this class available?

Loom throwing compiler exception

CompilerException java.lang.IllegalArgumentException: 
No implementation of method: :src of protocol: 
#'loom.graph/Edge found for class: java.lang.Long, 
compiling:(form-init6444626591033701781.clj:1:12)

Appears to be caused by the (now several-years-old) version of Loom that the original repo depended on. I will work on fixing this in the coming week or so.

More unit tests

Currently there is only a single test that checks tagger output. We should have unit test coverage for everything currently in the core namespace.

Tasks:

  • Unit test coverage for org.clojurenlp.core

pprint methods

We could simplify the way Java objects are represented in the REPL.

;; from
[#object[edu.stanford.nlp.ling.TaggedWord 0x6a16b1ef "Short/JJ"]]
;; to 
[#<TaggedWord Short/JJ>]
