
bazaar's People

Contributors

ajratner, alldefector, netj, raphaelhoffmann, zifeishan


bazaar's Issues

Medical Map

Hi,

We are trying to map medical articles. I looked at the PMC-OA dataset, but the words are not recognized as medical terms. Has any of you dealt with NER for medical articles? I am exploring MetaMap from NLM. I was wondering whether the bazaar parser already has something that does NER tagging for medical articles.

Any help is highly appreciated.

regards,
Manjunath

[error] error message in setting up pip

Dear @raphaelhoffmann

I cloned the 'netj-view-scripts-cleanup' branch, and every time I run setup.sh I get this error:

...
[error] Server access Error: Unexpected end of file from server url=https://repo1.maven.org/maven2/edu/stanford/nlp/stanford-corenlp/3.5.2/stanford-corenlp-3.5.2-models.jar
[warn]  [NOT FOUND  ] edu.stanford.nlp#stanford-corenlp;3.5.2!stanford-corenlp.jar (139350ms)
[warn] ==== public: tried
[warn]   https://repo1.maven.org/maven2/edu/stanford/nlp/stanford-corenlp/3.5.2/stanford-corenlp-3.5.2-models.jar
...
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
[warn]  ::              FAILED DOWNLOADS            ::
[warn]  :: ^ see resolution messages for details  ^ ::
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
[warn]  :: edu.stanford.nlp#stanford-corenlp;3.5.2!stanford-corenlp.jar
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
sbt.ResolveException: download failed: edu.stanford.nlp#stanford-corenlp;3.5.2!stanford-corenlp.jar
...
[error] (*:update) sbt.ResolveException: download failed: edu.stanford.nlp#stanford-corenlp;3.5.2!stanford-corenlp.jar
[error] Total time: 214 s, completed Aug 4, 2015 10:03:12 AM

...

It seems there is something wrong with the download, so I tried this:

wget https://repo1.maven.org/maven2/edu/stanford/nlp/stanford-corenlp/3.5.2/stanford-corenlp-3.5.2-models.jar

This shows that I can download the file successfully.

Could you give me some suggestions on how to resolve this error?

Thanks a lot!

Vincent

Auto-calculate the batch size?

@raphaelhoffmann Especially since the loading time of CoreNLP is (relatively) long, a better default than batch_size=1000 might be to divide the number of lines by the number of cores (or cores*nodes for distributed runs). For example, I just got a 2x speedup on an EC2 node very easily this way (I had forgotten to set the batch size the first time around...). Thoughts?
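The proposal above can be sketched as follows. This is only an illustration of the arithmetic, not bazaar's actual code: the function name `auto_batch_size` and its parameters are hypothetical, and it assumes one batch per worker is the goal.

```python
import math
import os


def auto_batch_size(num_lines, num_cores=None, num_nodes=1):
    """Return a batch size that gives each worker roughly one batch.

    Hypothetical helper: bazaar does not necessarily expose anything
    like this. num_cores defaults to the local CPU count.
    """
    if num_cores is None:
        num_cores = os.cpu_count() or 1
    # Total workers: cores on one node, or cores * nodes when distributed.
    workers = max(1, num_cores * num_nodes)
    # Round up so no lines are left over, and never return 0.
    return max(1, math.ceil(num_lines / workers))


print(auto_batch_size(10_000, num_cores=8))               # 1250
print(auto_batch_size(10_000, num_cores=8, num_nodes=4))  # 313
```

With one batch per worker, each process pays the CoreNLP loading cost only once, which is the motivation given above.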

Something else causing fab parse to abort...

This run seems to have gone through almost the entire process with no errors, and I still can't find the suspect error/exception in the output logs...

This time it is not due to memory or disk space as before.

Exception and no TSVs

Hello, I get:

Exception in thread "main" java.lang.ClassCastException: play.api.libs.json.JsNull$ cannot be cast to play.api.libs.json.JsString
at com.clearcut.nlp.JSONReader$$anonfun$1.apply(JSONReader.scala:40)
at com.clearcut.nlp.JSONReader$$anonfun$1.apply(JSONReader.scala:40)
at scala.Option.map(Option.scala:145)
at com.clearcut.nlp.JSONReader.fetchNext(JSONReader.scala:40)
at com.clearcut.nlp.JSONReader.next(JSONReader.scala:18)
at com.clearcut.nlp.JSONReader.next(JSONReader.scala:7)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at com.clearcut.nlp.JSONReader.foreach(JSONReader.scala:7)
at com.clearcut.nlp.Main$delayedInit$body.apply(Main.scala:116)
at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:71)
at scala.App$$anonfun$main$1.apply(App.scala:71)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
at scala.App$class.main(App.scala:71)
at com.clearcut.nlp.Main$.main(Main.scala:10)
at com.clearcut.nlp.Main.main(Main.scala)

And no .tsv file appears. My *.json.split directory contains .parsed files, but only 2 of the 457 have any data in them.

What's gone wrong?
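One plausible reading of the `JsNull$ cannot be cast to JsString` message is that some input record has a JSON `null` where the reader expects a string. A quick way to check this, assuming the input holds one JSON object per line, is a small scan like the sketch below. The function name `find_null_fields` and the keys `id` and `content` in the sample are hypothetical; the keys bazaar actually reads are not shown in the trace.

```python
import json


def find_null_fields(lines):
    """Yield (line_number, key) for each top-level key whose value is null.

    Assumes one JSON object per line; blank lines are skipped.
    """
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if not isinstance(record, dict):
            continue
        for key, value in record.items():
            if value is None:  # JSON null is parsed as Python None
                yield lineno, key


# Hypothetical sample input; real key names may differ.
sample = [
    '{"id": "doc1", "content": "some text"}',
    '{"id": "doc2", "content": null}',
]
print(list(find_null_fields(sample)))  # [(2, 'content')]
```

To scan a real file, pass an open file object instead of the sample list, since iterating over a file yields its lines.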

Parser processes are killed (memory error)?

In the parse operation, if the parallelism or batch size is set too high, the run aborts because processes get killed...

I suppose a rough solution is to leave enough memory headroom... but "enough" is poorly defined...
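One rough way to make "enough" concrete is to cap parallelism by memory rather than by core count alone. The sketch below is only an illustration: `max_parallelism` is a hypothetical helper, and the per-process memory and reserve figures are assumptions that would have to be measured for a real parser run.

```python
def max_parallelism(total_mem_gb, per_process_gb, reserve_gb=2.0, num_cores=None):
    """Return how many parser processes fit in memory, leaving a reserve.

    All figures are caller-supplied assumptions; nothing here is measured.
    """
    usable = total_mem_gb - reserve_gb
    if usable < per_process_gb:
        # Not enough room for even one process plus the reserve: run serially.
        return 1
    by_memory = int(usable // per_process_gb)
    if num_cores is not None:
        # No point in running more processes than cores.
        by_memory = min(by_memory, num_cores)
    return max(1, by_memory)


print(max_parallelism(30, 4))               # 7
print(max_parallelism(30, 4, num_cores=4))  # 4
```

The reserve is the headroom mentioned above; raising it trades throughput for a lower chance of processes being killed.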

Condor/AWS unified interface?

This is awesome, and I want to re-parse all of open data with it.

Can we make sure that it runs easily on both AWS and Condor? Let's make sure there are simple scripts that make this easy.

Ideally, we could make Bazaar a submodule of DeepDive so that it is super easy for users to parse their own documents (locally, in an AWS directory, in a Condor directory, whatever...)

@zhangce @netj @raphaelhoffmann @feiranwang @senwu @alldefector

Awesome work, Raphael!
