bazaar's People
Forkers
colossus infinitespace sgt101 musfir90 jcye kaychaks codeaudit ryan2x pireerliu jackysnake suveshbaskar
bazaar's Issues
Keep local server register of started & completed segments
Medical Map
Hi,
We are trying to map medical articles. I looked at the PMC-OA dataset, but the words are not recognized as medical terms. Has any of you dealt with NER for medical articles? I am exploring MetaMap from NLM. I was wondering whether the bazaar parser already has something that does NER tagging for medical articles.
Any help is highly appreciated.
regards,
Manjunath
[error] error message in setting up pip
Dear @raphaelhoffmann
I cloned from the 'netj-view-scripts-cleanup' branch, and every time I run setup.sh I get this error:
...
[error] Server access Error: Unexpected end of file from server url=https://repo1.maven.org/maven2/edu/stanford/nlp/stanford-corenlp/3.5.2/stanford-corenlp-3.5.2-models.jar
[warn] [NOT FOUND ] edu.stanford.nlp#stanford-corenlp;3.5.2!stanford-corenlp.jar (139350ms)
[warn] ==== public: tried
[warn] https://repo1.maven.org/maven2/edu/stanford/nlp/stanford-corenlp/3.5.2/stanford-corenlp-3.5.2-models.jar
...
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: FAILED DOWNLOADS ::
[warn] :: ^ see resolution messages for details ^ ::
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: edu.stanford.nlp#stanford-corenlp;3.5.2!stanford-corenlp.jar
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
sbt.ResolveException: download failed: edu.stanford.nlp#stanford-corenlp;3.5.2!stanford-corenlp.jar
...
[error] (*:update) sbt.ResolveException: download failed: edu.stanford.nlp#stanford-corenlp;3.5.2!stanford-corenlp.jar
[error] Total time: 214 s, completed Aug 4, 2015 10:03:12 AM
...
It seems there is something wrong with the download, so I tried this:
wget https://repo1.maven.org/maven2/edu/stanford/nlp/stanford-corenlp/3.5.2/stanford-corenlp-3.5.2-models.jar
This shows that I can download the file successfully.
Could you suggest how to resolve this error?
Thanks a lot!
Vincent
Auto-calculate the batch size?
@raphaelhoffmann Especially since the loading time of CoreNLP is (relatively) so long, it seems like a better default than batch_size=1000 would be to divide the number of lines by the number of cores (or cores*nodes for distributed runs). For example, I just got a 2x speedup on an EC2 node really easily here (I had forgotten to set the batch size the first time around...). Thoughts?
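A minimal sketch of the proposed default, in Python. The function name `auto_batch_size` and its parameters are hypothetical and are not part of bazaar; it only illustrates dividing the input lines evenly across cores (times nodes), rounding up.

```python
import math
import os


def auto_batch_size(num_lines, cores=None, nodes=1):
    """Return a batch size that gives each core roughly one batch.

    Hypothetical helper: `cores` defaults to the local CPU count and
    `nodes` to 1 (single-machine run).
    """
    if cores is None:
        cores = os.cpu_count() or 1
    workers = max(1, cores * nodes)
    # Round up so that at most `workers` batches are produced.
    return max(1, math.ceil(num_lines / workers))


print(auto_batch_size(100_000, cores=8))           # 12500
print(auto_batch_size(100_000, cores=8, nodes=4))  # 3125
```

With one batch per core, the CoreNLP loading cost is paid once per core rather than once per 1000 lines. The trade-off is that very large batches lose more work if a process is killed.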
Instances can run out of HD space during parse operation
Both to avoid this issue and for general reasons, we might want to collect parsed segments as we parse. Will see if there is a relatively easy way to do this.
Something else causing fab parse to abort...
This run seems to have gone through almost the entire process with no errors, and I still can't find the suspect error/exception in the output logs...
This time it is not due to memory or disk space as before.
Exception and no TSV's
Hello, I get:
Exception in thread "main" java.lang.ClassCastException: play.api.libs.json.JsNull$ cannot be cast to play.api.libs.json.JsString
at com.clearcut.nlp.JSONReader$$anonfun$1.apply(JSONReader.scala:40)
at com.clearcut.nlp.JSONReader$$anonfun$1.apply(JSONReader.scala:40)
at scala.Option.map(Option.scala:145)
at com.clearcut.nlp.JSONReader.fetchNext(JSONReader.scala:40)
at com.clearcut.nlp.JSONReader.next(JSONReader.scala:18)
at com.clearcut.nlp.JSONReader.next(JSONReader.scala:7)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at com.clearcut.nlp.JSONReader.foreach(JSONReader.scala:7)
at com.clearcut.nlp.Main$delayedInit$body.apply(Main.scala:116)
at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:71)
at scala.App$$anonfun$main$1.apply(App.scala:71)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
at scala.App$class.main(App.scala:71)
at com.clearcut.nlp.Main$.main(Main.scala:10)
at com.clearcut.nlp.Main.main(Main.scala)
And no .tsv file appears. There are .parsed files that have data in them in my *.json.split directory, but only 2 out of 457 have any data in them.
What's gone wrong?
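The trace shows a cast from JsNull to JsString at JSONReader.scala line 40, which suggests (an inference from the exception, not something confirmed here) that some input record holds null where the reader expects a string. A small Python check like the one below could locate such records before parsing. It assumes the input has one JSON object per line; the field names `id` and `content` in the sample are made up for illustration.

```python
import json


def null_fields(lines):
    """Yield (line_number, key) for every top-level key whose value is null."""
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if not isinstance(record, dict):
            continue  # only JSON objects have named fields to inspect
        for key, value in record.items():
            if value is None:
                yield number, key


# Hypothetical sample input: the second record has a null field.
sample = [
    '{"id": "doc1", "content": "Some text."}',
    '{"id": "doc2", "content": null}',
]
print(list(null_fields(sample)))  # [(2, 'content')]
```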
Bazaar/Parser doesn't correctly escape some characters in TSV
For example, carriage returns should also be escaped, but they are not, which causes trouble like HazyResearch/deepdive#523.
I think this part of the code needs more careful work to conform to Postgres' TSV format or some other stricter standard:
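For reference, a sketch of the escaping that PostgreSQL's COPY text format requires, written in Python for illustration rather than in the parser's Scala. It is not the parser's code. The helper names are hypothetical. The rules it follows are that backslash, tab (the default delimiter), newline and carriage return must each be backslash-escaped, and that NULL is written as `\N`.

```python
def escape_copy_text(value):
    """Escape one field for PostgreSQL's COPY text (tab-separated) format."""
    if value is None:
        return r"\N"  # COPY's default NULL marker
    # Escape backslashes first, otherwise the backslashes added by the
    # later replacements would themselves be doubled.
    return (
        value.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_tsv_row(fields):
    """Join already-escaped fields with literal tabs."""
    return "\t".join(escape_copy_text(f) for f in fields)


print(to_tsv_row(["line one\r\nline two", None, "C:\\temp"]))
```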
Launching an r3.8xlarge instance doesn't work
Launches an m3.2xlarge instead (confirmed via ssh). Any ideas here?
Parser processes are killed (memory error)?
In the parse operation, if the parallelism / batch size is set too high the process aborts due to killed processes...
I suppose a rough solution is to leave a large enough memory overhead... but large enough is poorly defined...
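One way to make "large enough" concrete is to cap parallelism by memory as well as by cores. The sketch below is only a rough heuristic: the function name, the per-process memory figure and the reserve fraction are all assumptions that would need to be measured for a real run.

```python
def safe_parallelism(total_mem_gb, per_process_gb, cores, reserve_fraction=0.2):
    """Return how many parser processes fit in memory, capped at `cores`.

    `reserve_fraction` of total memory is left free as headroom for the
    OS and for spikes; `per_process_gb` is an estimate supplied by the caller.
    """
    usable = total_mem_gb * (1.0 - reserve_fraction)
    by_memory = int(usable // per_process_gb)
    return max(1, min(cores, by_memory))


# Hypothetical numbers: 32 GB machine, 4 GB per parser process, 8 cores.
print(safe_parallelism(32, 4, cores=8, reserve_fraction=0.25))  # 6
```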
Condor/AWS unified interface?
This is awesome, and I want to re-parse all of open data with it.
Can we make sure that we can run easily on AWS and Condor? Let's just make sure there are simple scripts for us to make it easy.
Ideally, we could make Bazaar a submodule of DeepDive that makes it super easy to parse users' own documents (locally, and then an AWS directory, and a Condor directory, whatever...)
@zhangce @netj @raphaelhoffmann @feiranwang @senwu @alldefector
Awesome work, Raphael!
Default settings require empty config.properties
We define a few default settings for the different annotators. However, these are only loaded when the current directory contains a config.properties file (it can be empty). The issue is that setProperties(..) doesn't get called without an existing properties file: https://github.com/HazyResearch/bazaar/blob/master/pipe/src/main/scala/com/clearcut/pipe/Main.scala#L81
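The intended behaviour can be sketched as "start from the defaults, then overlay the file only if it exists". The Python below illustrates that pattern and is not the Scala code in Main.scala. The key in `DEFAULTS` is a placeholder, not a real bazaar setting, and the parser handles only simple `key=value` lines.

```python
import os

# Placeholder default; the real annotator settings live in the Scala code.
DEFAULTS = {"hypothetical.annotator.option": "default"}


def load_settings(path="config.properties"):
    """Return the defaults, overridden by `path` when that file exists."""
    settings = dict(DEFAULTS)  # defaults apply even without a file
    if os.path.isfile(path):
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue  # skip blanks, comments and malformed lines
                key, value = line.split("=", 1)
                settings[key.strip()] = value.strip()
    return settings


print(load_settings("/nonexistent/config.properties"))
```

The important point is that the defaults are applied unconditionally, so a missing file and an empty file behave the same way.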