Giter Site home page Giter Site logo

Comments (7)

jimpil avatar jimpil commented on August 16, 2024

Actually this won't work because the dispath-fn expects a single argument...

from clojure-opennlp.

jimpil avatar jimpil commented on August 16, 2024

We need variadic args in the dispatch-fn and the 2 defmethods like so:

(defmulti make-name-finder 
  "Return a function for finding names from tokens based on a given
   model file."
 (fn [model & args] (class model))) ;;the new dispatch-fn

(defmethod make-name-finder :default
  [modelfile & args]   ;; args are ignored here
  (with-open [model-stream (input-stream modelfile)]
    (make-name-finder (TokenNameFinderModel. model-stream))))

(defmethod make-name-finder TokenNameFinderModel
  [model & {:keys [feature-generator]}] ;;but not ignored here
  (fn name-finder
    [tokens & contexts]
    {:pre [(seq tokens)
           (every? #(= (class %) String) tokens)]}
    (let [finder (NameFinderME. model feature-generator *beam-size*) ;;can be nil - no problem
          matches (.find finder (into-array String tokens))
          probs (seq (.probs finder))]
      (with-meta
        (distinct (Span/spansToStrings matches (into-array String tokens)))
        {:probabilities probs}))))

What do you think?

Jim

from clojure-opennlp.

jimpil avatar jimpil commented on August 16, 2024

and to finish things off, the NER trainer must be slightly changed to take into account the custom feature generator:

(defn ^TokenNameFinderModel train-name-finder
  "Returns a name finder based on a given training file"
  ([in] (train-name-finder "en" in))
  ([lang in] (train-name-finder lang in 100 5))
  ([lang entity in feature-gen iter cut] 
     (NameFinderME/train
      lang
     (if (nil? entity) "default" entity)
      (->> (reader in)
           (PlainTextByLineStream.)
           (NameSampleDataStream.))
      feature-gen     
      {}
      iter
      cut)))

from clojure-opennlp.

jimpil avatar jimpil commented on August 16, 2024

Well, obviously the above will break a couple of things first being the f2 overloads that delegate to it, so again I suggest optional arguments like this:

(defn ^TokenNameFinderModel train-name-finder
  "Returns a name finder based on a given training file"
  ([in] (train-name-finder "en" in))
  ([lang in] (train-name-finder lang in 100 5))
  ([lang in iter cut & {:keys [entity-type feature-gen]  
                        :or  {entity-type "default"}}]                    
     (NameFinderME/train
      lang
      entity-type
      (->> (reader in)
           (PlainTextByLineStream.)
           (NameSampleDataStream.))
      feature-gen     
      {}
      iter
      cut)))

this doesn't break anything - whatever callers used 4 arguments they can keep doing so. Custom feature generation is very important for real use...As a last point, it would be nice to be able to train a perceptron as well, so we can compare with maXent. The only thing difference here is that we need to call a different NameFinderME.train() method. The one that takes a TrainingParameters object. I cannot squeeze it in the existing implementation via an optional arg without involving a big ugly 'if' so the simplest thing that comes to me as non-breaking is pulling out the real work done into 3 separate functions. the current top level 'train-name-finder' would take another optional argument :classifier (could default to maXent) and will delegate the real work accordingly to a 'train-name-finder-maxent' or a 'train-name-finder-perceptron' . It makes no sense to paste the whole thing here but if you're interested let me know.

Jim

from clojure-opennlp.

dakrone avatar dakrone commented on August 16, 2024

I am interested, I'll take a look at pulling it into a better method. In the meantime, I'm more than happy to take a pull request if you have specific ideas about implementation.

from clojure-opennlp.

jimpil avatar jimpil commented on August 16, 2024

Yeah sure... I've already forked it but I thought I should wait cos basically the same thing can be applied to all the models...I've only touched the name-finder and his training.

Jim

from clojure-opennlp.

dakrone avatar dakrone commented on August 16, 2024

This is in clojure-opennlp 0.2.0, just released.

from clojure-opennlp.

Related Issues (17)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.