clulab / eidos

Machine reading system for World Modelers

License: Apache License 2.0

Scala 75.41% Shell 0.95% HTML 0.20% JavaScript 19.97% CSS 1.98% Batchfile 0.07% Python 1.05% Dockerfile 0.06% Java 0.30% Makefile 0.01%

eidos's People

Contributors

adarshp, ajaynagesh, beckysharp, bethard, bgyori, brandomr, egolaparra, fan-luo, gcgbarbosa, hclent, jerryzeyu, kwalcock, marcovzla, maxaalexeeva, mihaisurdeanu, mithunpaul08, vanh17, vikas95, yanzv, zhengtang1120, zhengzhongliang, zupon

eidos's Issues

ported_token_1_noun-Causal is overall not good

I've been systematically looking at extractions and the overwhelming majority of extractions by ported_token_1_noun-Causal aren't right. The failures come from actual texts but I could reproduce them on minimal examples in the shell.

The issue seems to be that causal relationships are extracted from non-causal phrases because the rule is inadvertently triggered by a noun, e.g. "IPC Phase Source", "food aid delivery", "several aid agencies", "social support networks", "climate change impacts", "methodological limitations identified", and the list goes on. There are some cases where the phrase does carry some sort of causal influence, like "fuel options compound food insecurity risk", but even in these cases the cause and effect are assigned in the opposite direction from what I would expect (i.e. food insecurity risk causes fuel options, delivery causes food, impacts cause climate, etc.).

While the above examples are not due to this, I also think that this rule is particularly sensitive to messy PDF to text conversions where groups of disconnected nouns can show up next to each other, separated by spaces, even though they don't actually appear in a real sentence.

Hearst patterns extracted as causal

I noticed that the ported_syntax_noun_Hearst-Causal rule extracts Hearst patterns that could be useful for finding hyponyms (I just learned about what all this means in the last 10 minutes) but I don't think these should be extracted as Causal. For instance,

  • dietary alternatives, such as other livestock
  • emergency livestock support including provision of feed

currently produce causal relations.

DOMSource cannot be processed

On certain strings, I am seeing a large number of these errors

Error 
  DOMSource cannot be processed: check that saxon8-dom.jar is on the classpath

when reading via the python interface which uses a fat JAR of Eidos and its dependencies. These errors flood the screen but don't seem to have a serious effect: the reading completes and returns results.

I found that one example of a string that triggers this is "NOV". It looks like lowercase "nov" does too. So then I thought maybe it has to do with month names, and indeed "dec", "jan", "feb", etc. all result in the same error. Similarly, "november", "december", etc. result in the same.

So given this info, does anybody know what this could be?

Reading Software

Model for reading software:

  • Joshua wants a POC to show that the idea is possible and to justify DARPA funding
  • Look for dependency graph software that parses fortran code to dependency graphs
    -- Need a graph of variables in the code
    -- Note: there are many versions of fortran that are diff in what they do
  • Joshua wants engagement from crop-modelling community
  • There’s a workshop this Spring, we’d like a walk-through example

Grammars

  • Convert to mention_with_mod (now in Processors master)
    -- update taxonomy
    -- use action with modification rules to convert NPs
  • Convert to universal dependencies

change argument names

  • to more generic names (i.e., src/target, controller/controlled, whichever)
  • check to make sure other code doesn't break in webapp/brat vis, etc
  • coord with @kwalcock to make names work with JSON-LD reps

Rule files have a comment at the beginning

Each of the rule.yml files has a comment at the beginning describing the types of rules present in that file. It helps developers/test fixers to narrow down where to look.

OutOfMemoryError when calling EidosSystem.extractFromText

I'm getting an OutOfMemoryError when calling the extractFromText method, even though I raised my memory allowance in .sbtopts from 6GB to 12GB (this is what is in .sbtopts: -J-Xmx12G). Here is the sequence of actions that produces the error:

Bring up the scala REPL using scala, then do:

import org.clulab.wm.eidos.EidosSystem
val reader = new EidosSystem()
val text = "Conflict causes displacement"
val annotatedDocument = reader.extractFromText(text, true)

Can anybody reproduce this error?

Here is the error log:

java.lang.OutOfMemoryError: Java heap space
  at java.io.ObjectInputStream$HandleTable.grow(ObjectInputStream.java:3468)
  at java.io.ObjectInputStream$HandleTable.assign(ObjectInputStream.java:3275)
  at java.io.ObjectInputStream.readString(ObjectInputStream.java:1650)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1342)
  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
  at java.util.HashMap.readObject(HashMap.java:1394)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:483)
  at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1896)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
  at edu.stanford.nlp.ie.crf.CRFClassifier.loadClassifier(CRFClassifier.java:2591)
  at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1468)
  at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1500)
  at edu.stanford.nlp.ie.crf.CRFClassifier.getClassifier(CRFClassifier.java:2941)
  at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifierFromPath(ClassifierCombiner.java:282)
  at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifiers(ClassifierCombiner.java:266)
  at edu.stanford.nlp.ie.ClassifierCombiner.<init>(ClassifierCombiner.java:142)
  at edu.stanford.nlp.ie.NERClassifierCombiner.<init>(NERClassifierCombiner.java:128)
  at edu.stanford.nlp.pipeline.NERCombinerAnnotator.<init>(NERCombinerAnnotator.java:91)
  at edu.stanford.nlp.pipeline.AnnotatorImplementations.ner(AnnotatorImplementations.java:70)
  at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getNamedAnnotators$44(StanfordCoreNLP.java:498)
  at edu.stanford.nlp.pipeline.StanfordCoreNLP$$Lambda$1351/1167582947.apply(Unknown Source)
  at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getDefaultAnnotatorPool$65(StanfordCoreNLP.java:533)

UN Reports

  • remind Becky to send query she used for 8500 papers

  • add to dropbox folder

IndexOutOfBoundsException when reading multiple documents

When I try to read the 52 MITRE documents with ExtractFromDirectory, I run into the following error after a certain number of documents are processed.

java.lang.IndexOutOfBoundsException: 4
	at scala.collection.immutable.Vector.checkRangeConvert(Vector.scala:123)
	at scala.collection.immutable.Vector.apply(Vector.scala:114)
	at org.clulab.wm.eidos.EidosActions.$anonfun$mergeAttachments$4(EidosActions.scala:190)

(I have suppressed the rest of the stack trace for readability)

This occurs whether I keep or remove the parallelism in ExtractFromDirectory. The issue seems to be in the mergeAttachments function in EidosActions - in this line:

exampleMention = group(index)

the index is out of bounds. Was it @ZhengTang1120 who wrote the function? If so, could you (@ZhengTang1120 ) take a look at it and see what's going on?
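
Until the root cause is found, a defensive lookup would at least turn the exception into a value that can be logged. This is a minimal sketch on a plain Seq, not the actual mergeAttachments code; elementAt is a hypothetical helper built on the standard lift method of Scala sequences.

```scala
object SafeIndex {
  /** Returns the element at `index`, or None when the index is out of range. */
  def elementAt[A](group: Seq[A], index: Int): Option[A] = group.lift(index)

  def main(args: Array[String]): Unit = {
    val group = Vector("m0", "m1", "m2", "m3")
    println(elementAt(group, 2)) // index within range
    println(elementAt(group, 4)) // index 4 on a group of size 4, as in the trace
  }
}
```

This only hides the symptom. Why the index exceeds the group size would still need to be understood.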

Questionable code for matching brackets and braces

This code was copied into Eidos from Processors. The definition of "match" is in question. After the first pair, the match is quite loose. Can @myedibleenso, who GitHub blames for this, explain the intention? I rewrote it to check for proper nesting, but that may be overkill. Neither code deals well with things like "We want to 1) do this 2) do that and 3) do the other thing." which doesn't pair the parens. I wonder if we should keep, remove, or replace. Thanks.

  /** Check if brackets and braces match */
  def matchingBrackets(mention: Mention): Boolean = {
    val pairs = Seq(("(", ")"), ("{", "}"), ("[", "]"))
    pairs.forall(pair => matchingBrackets(mention, pair._1, pair._2))
  }

  def matchingBrackets(mention: Mention, opening: String, closing: String): Boolean = {
    val lhsIdx = mention.words.indexOf(opening)
    val rhsIdx = mention.words.indexOf(closing)
    (lhsIdx, rhsIdx) match {
      // no brackets found
      case (-1, -1) => true
      // unmatched set
      case (-1, _) => false
      case (_, -1) => false
      // closing bracket appears before first opening
      case (broken1, broken2) if broken1 > broken2 => false
      // lhs precedes rhs, so count pairs
      case _ =>
        val lhsCnt = mention.words.count(_ == opening)
        val rhsCnt = mention.words.count(_ == closing)
        lhsCnt == rhsCnt
    }
  }
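
For comparison, a proper-nesting check can be sketched with a stack of expected closing tokens. This is not the rewrite mentioned above, just an illustration of the idea. It works on a plain Seq[String] of tokens rather than a Mention, and the names BracketCheck and properlyNested are made up.

```scala
object BracketCheck {
  // Opening token -> the closing token it requires.
  private val pairs = Map("(" -> ")", "{" -> "}", "[" -> "]")
  private val closers = pairs.values.toSet

  /** True if every bracket token in `words` is closed in the right order. */
  def properlyNested(words: Seq[String]): Boolean = {
    // The state is None once a mismatch is seen; otherwise it is the stack
    // of closing tokens we are still waiting for (innermost first).
    val finalStack = words.foldLeft(Option(List.empty[String])) {
      case (None, _) => None
      case (Some(stack), w) if pairs.contains(w) => Some(pairs(w) :: stack)
      case (Some(stack), w) if closers.contains(w) =>
        stack match {
          case expected :: rest if expected == w => Some(rest)
          case _ => None // closer with no matching opener
        }
      case (state, _) => state // ordinary word, nothing to do
    }
    finalStack.exists(_.isEmpty)
  }

  def main(args: Array[String]): Unit = {
    println(properlyNested(Seq("a", "(", "b", "[", "c", "]", ")")))
    println(properlyNested(Seq("(", "[", ")", "]")))
    println(properlyNested(Seq("1", ")", "do", "this", "2", ")", "do", "that")))
  }
}
```

Unlike the counting version, this rejects interleaved input such as ( [ ) ], which the original accepts because each pair is counted independently. It still rejects enumerations like "1) do this 2) do that", so the keep/remove/replace question remains open.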

broke display of attachments in webapp

we now get:
List(NounPhrase, Entity) => production
------------------------------
Rule => simple-np++Decrease_ported_syntax_2_verb
Type => TextBoundMention
------------------------------
NounPhrase, Entity => production
* Attachments: org.clulab.wm.eidos.attachments.Decrease@1c905409, org.clulab.wm.eidos.attachments.Quantification@b716b2d4
------------------------------

quantification1 pulls out quantifier as part of entity

In attempting to fix test 8 in p6s2, I found that quantification1 in entityQuantification.yml pulls out the quantifier along with the noun that it applies to. E.g. "poor" is included in "poor access to services", whereas I would have expected this entity to just be "access to services":

List(NounPhrase, Entity) => poor access to services
	------------------------------
	Rule => simple-np++quantification1
	Type => TextBoundMention
	------------------------------
	NounPhrase, Entity => poor access to services
	  * Attachments: Quantification(poor)
	------------------------------

Am I right that this is a problem? Or is the quantifier supposed to be part of the entity?

Inconsistent results

Two runs of, for instance, ExtractFromDirectory, especially in the parallel version, can produce different results.

Classpath problem with sbt

The kwalcock-classpath branch has a test TestSerialization with a few lines that cause problems. (See PR #289.) They are commented out in the master branch. Basically, the line val copy = decoder.readObject() causes

[info] TestSerialization:
[info] Standard Serializer
[info] - should serialize and deserialize mentions *** FAILED ***
[info]   java.lang.ClassNotFoundException: org.clulab.struct.DirectedGraph
[info]   at java.net.URLClassLoader.findClass(Unknown Source)
[info]   at java.lang.ClassLoader.loadClass(Unknown Source)
[info]   at java.lang.ClassLoader.loadClass(Unknown Source)
[info]   at java.lang.Class.forName0(Native Method)
[info]   at java.lang.Class.forName(Unknown Source)
[info]   at java.io.ObjectInputStream.resolveClass(Unknown Source)
[info]   at java.io.ObjectInputStream.readNonProxyDesc(Unknown Source)
[info]   at java.io.ObjectInputStream.readClassDesc(Unknown Source)
[info]   at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
[info]   at java.io.ObjectInputStream.readObject0(Unknown Source)
[info]   ...

when run under sbt. However, it works for IntelliJ and Eclipse. I assume these have taken over the classpath and magically solve the problem. Our users, most importantly Travis, need to be able to make do with sbt.

Another possible symptom of a build.sbt which may not be quite right is that when I import the project into IntelliJ, it gets an extra eidos module beside the core, core-build, webapp, and webapp-build. The IDE will not let me right click and run tests until the extra module is removed. If I manually make a configuration to run the test, I recall that it also has classpath problems until the module is removed.

It seems like build.sbt has some kind of implicit module (or lack of one) that is causing some problems. It does seem structured differently than the same file for processors and reach.

Canonical name handling formatting-related characters

I found that the canonicalName in the JSON-LD output sometimes contains things like list bullet points. I'm sorry I don't understand how canonical names are constructed deeply enough to make this change myself, so I'm just reporting the issue.

Here is a case where the bullet point character is not stripped off in the canonical name:

       "@type": "Entity",
       "@id": "_:Entity_207",
       "labels": [
         "NounPhrase",
         "Entity"
       ],
       "text": "\u2022 The wetter",
       "rule": "simple-np",
       "canonicalName": "\u2022 The wetter",

Here is a more problematic case where the actual string is lost and only the bullet point (arrow) character remains as the canonical name:

      "@type": "Entity",
       "@id": "_:Entity_1780",
       "labels": [
        "NounPhrase",
         "Entity"
       ],
       "text": "\u27a4\u27a4 Across Central Asia",
       "rule": "simple-np",
      "canonicalName": "\u27a4",
     ...
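
One possible pre-processing step is to strip leading list-marker characters before the canonical name is built. The sketch below is independent of how Eidos actually constructs canonical names. The function name and the set of marker characters (the two from the examples above plus a few other common bullets) are assumptions.

```scala
object BulletStrip {
  // U+2022 (bullet) and U+27A4 (arrowhead) come from the report above;
  // the remaining characters are extra guesses at common list markers.
  private val bulletChars = Set('\u2022', '\u27a4', '\u25aa', '\u25cf', '\u2023')

  /** Drops any run of bullet characters and whitespace at the start of `text`. */
  def stripLeadingBullets(text: String): String =
    text.dropWhile(c => bulletChars.contains(c) || c.isWhitespace)

  def main(args: Array[String]): Unit = {
    println(stripLeadingBullets("\u2022 The wetter"))
    println(stripLeadingBullets("\u27a4\u27a4 Across Central Asia"))
  }
}
```

Whether "The" should also be removed from the canonical name is a separate question that this sketch does not address.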

Better expansion for VP entities.

In the text:

"Urgent action to end the conflict, improve humanitarian access
|to severely food insecure populations, and increase size and
|scope of emergency assistance delivery is critical to save
|lives over the coming year."

We should pick "save lives over the coming year" as an entity. That is, expand verbal entities across dobj and nmod:* dependencies.
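
The expansion described here can be sketched on a toy dependency graph. Everything below is hypothetical: the Edge type, the edge labels, and the assumed parse of "save lives over the coming year" are illustrations only and do not come from Processors or the Eidos entity finder.

```scala
object VpExpansion {
  /** One dependency edge between token indices. */
  final case class Edge(head: Int, label: String, dependent: Int)

  // Follow direct objects and any nmod variant (nmod_of, nmod:over, ...).
  private def followable(label: String): Boolean =
    label == "dobj" || label.startsWith("nmod")

  /** Returns the (first, last) token index of the expanded span. */
  def expand(start: Int, edges: Seq[Edge]): (Int, Int) = {
    @scala.annotation.tailrec
    def loop(frontier: List[Int], seen: Set[Int]): Set[Int] = frontier match {
      case Nil => seen
      case h :: rest =>
        val next = edges.collect {
          case Edge(`h`, label, dep) if followable(label) && !seen.contains(dep) => dep
        }
        loop(rest ++ next.toList, seen ++ next)
    }
    val reached = loop(List(start), Set(start))
    (reached.min, reached.max)
  }

  def main(args: Array[String]): Unit = {
    val words = Vector("save", "lives", "over", "the", "coming", "year")
    // Assumed parse: dobj(save, lives), nmod_over(save, year), plus modifiers of "year".
    val edges = Seq(
      Edge(0, "dobj", 1), Edge(0, "nmod_over", 5),
      Edge(5, "case", 2), Edge(5, "det", 3), Edge(5, "amod", 4)
    )
    val (first, last) = expand(0, edges)
    println(words.slice(first, last + 1).mkString(" "))
  }
}
```

Taking the span from the smallest to the largest reached index pulls in the modifiers in between ("over the coming") without having to follow their edges explicitly.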

Extension of JSON-LD format to accommodate BBN data and our own future work

Disclaimer: There are many ways to do this and picking one will likely involve unknown forces of subjectivity, intuition, and conjecture that won't match mine. In other words, this may be totally off base.

I'd like to divide the participants in our (binary) DirectedRelation and (binary) UndirectedRelation into essential and non-essential participants and not mix them. This is so that the focus is clear and simple and not everyone is forced to deal with the non-essential. This comes at the expense of needing different code to process the two types.

In the case of DirectedRelation, we have to somehow specify the two (sets of) participants and the direction. Without these there just isn't a DirectedRelation. These should be placed in unambiguous locations and not require extra reasoning to extract. We have something called source and destination to fulfill these requirements.

The situation is then complicated by the fact that there may be different kinds of DirectedRelation. We've been focussed on Causal, but there are more in our own pipeline and other groups will have theirs. Right now we have some indication of the particular relationship in the labels field and less directly, a rule field. The former contains a list for us and it is only by convention that someone would know that the first item is most specific. It doesn't seem likely that all groups will have a list or a rule field. I suggest a new field called relationName that specifies in a single string fit for human consumption the kind of DirectedRelation.

Each kind of relation may have special names for the participants that will help a human understand what is meant by the generic source and destination. These might be used to label nodes even when the relationName is not recognized. I'd call these sourceName and destinationName.

It is possible to store this information in many different ways, such as
participants: [ { role: "source", name: "cause", value: "lack of food" }, { role: "destination", name: "effect", value: "hunger" } ]
and I will suggest that for the non-essential participants that have more open class roles. For the essential participants I think this complicates extraction of information such as the name of the destination. Finding the answer requires a search through a list.

The non-essential participants, which may be thought of as modifying the main relationship in any of a myriad of ways (and drawn as such) need a field name. Modifier and argument are already in use once, attachment is used in the program, but not JSON-LD. A brainstorm of other names is pasted below. I'd like to require that these non-essential participants be other instances of Entity, DirectedRelation, or UndirectedRelation and be included by reference. They can be in charge of their own rules, triggers, labels, provenance, etc. Other fields can be "role" for a formal, potentially standardized explanation of what the non-essential is doing there, and then "name" for a human friendly version, and lastly, "value".

Here is a simplified example:

She gave him a book yesterday at the library. (This is an increase event.)

{
@type: "DirectedRelation",
trigger: { text : "gave" },
relationName : "transfer",
sourceName : "transferrer",
destinationName: "transferred",
sources : [ "She" ],
destinations : [ "book, a" ],
relatedItems : [ { role : "indirectObject", name : "recipient", value : "him" }, { role : "time", name : "when", value: "yesterday" }, { role: "location", name: "where", value: "at the library" } ]
}

For UndirectedRelation, I would add relationName and just argumentName, since there is no distinction between source and destination. After that, the non-essential items can be added.

Brainstorm: environment, details, observers, hangersOn, audience, players, relatedItems, relatedInfo, relatives, accomplices, helpers, cast, crew, costars, public, detours, waymarks, context, supportingRoles, supportingArguments, sidekicks, optional
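
To make the trade-off concrete, here is a rough sketch of the proposed shape as Scala case classes. It is only an illustration of the proposal, not existing Eidos or JSON-LD code. The type names are assumptions, and plain strings stand in for what would be references to other Entity or relation instances.

```scala
object RelationSketch {
  /** A non-essential participant: formal role, human-friendly name, value. */
  final case class RelatedItem(role: String, name: String, value: String)

  /** Essential participants get fixed fields; the rest go into relatedItems. */
  final case class DirectedRelation(
    relationName: String,
    sourceName: String,
    destinationName: String,
    sources: Seq[String],
    destinations: Seq[String],
    relatedItems: Seq[RelatedItem]
  ) {
    // Non-essential participants have open-class roles, so they need a search.
    def related(role: String): Option[RelatedItem] = relatedItems.find(_.role == role)
  }

  val gave: DirectedRelation = DirectedRelation(
    relationName = "transfer",
    sourceName = "transferrer",
    destinationName = "transferred",
    sources = Seq("She"),
    destinations = Seq("book, a"),
    relatedItems = Seq(
      RelatedItem("indirectObject", "recipient", "him"),
      RelatedItem("time", "when", "yesterday"),
      RelatedItem("location", "where", "at the library")
    )
  )

  def main(args: Array[String]): Unit = {
    println(gave.destinationName)                // direct field access
    println(gave.related("time").map(_.value))   // lookup by role
  }
}
```

The essential fields are read directly, while a non-essential participant is found by role, which is the asymmetry the proposal accepts on purpose.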

P4S2 in TestCagP2.scala

Sentence:

Violence has caused livestock to be looted, killed and disease-prone and crops destroyed, and displacement has caused delayed planting.

Test:

tester.test(newEdgeSpec(displacement, Causal, delayedPlanting)) should be (successful)

Problem:

delayedPlanting is not extracted, as shown in figure below.

I tried to debug and noticed that both ‘delayed’ and ‘planting’ are considered as I-VP

The tag for 'delayed planting’ also seems wrong:
‘delayed’ is considered as VBN, and ‘planting’ is considered as VBG.

This issue seems to come from tagPartsOfSpeech in processor.

import vars

import the variables at the tops of grammar yml files (e.g. noun_modifiers)

Getting grounding in JSON-LD output

We're working on processing grounding information from the JSON-LD output (can get it from the example json-ld file posted here on Github) but haven't yet figured out how to run the reader and the serializer to get the groundings in the output. We found that running the example at https://github.com/clulab/eidos/blob/master/src/main/scala/org/clulab/wm/eidos/apps/examples/ExtractFromText.scala
doesn't produce the grounding entries for each of the entities. What would need to be changed to get those?

ported_syntax_1c_verb-Causal is overall not good

Partly due to #265, which is quite common, but also due to a number of other issues, the vast majority of ported_syntax_1c_verb-Causal extractions are incorrect. Since this rule accounts for ~18% of all extractions, putting some effort into improving it would make a big difference.

I found many cases where a causal relation was warranted like:

  • supplementation of work oxen is likely to lead to improved tillage
  • job-seekers leading to low real wage levels
  • political instability have themselves contributed further to difficulties
  • CO2 concentrations will lead to complex
  • campaign also contributed to peace and security
  • access to food have contributed to political instability

but in each of these cases, the cause and effect were flipped i.e. tillage causes supplementation of work oxen.

There are also a large number of cases where there isn't a causal relation mentioned but we get extractions from sentences like:

  • hunger has spread to locations
  • remote rural locations that had limited to no basic services
  • searches were limited to publications in Popline

Then I found some cases where the causal relation is essentially correct but the polarity of one of the arguments is incorrect:

  • Hot days induced by precipitation -> here "hot days" for some reason comes with a negative polarity

Parentheses escaped as -LRB-, -RRB-

When parentheses are present in the text, they get escaped as -LRB-, -RRB-, etc. This gets propagated to the sentence text in the JSON-LD output file. I suspect it might also cause some weird issues - such as the entity "Conflict" not being grounded in the sentence

"Conflict affects mostly the Greater Upper Nile Region (states of Upper Nile, Unity and Jonglei) with Central Equatoria remaining by and large unaffected after the early stages of the conflict."

Minimal working example with sbt console:

import org.clulab.wm.eidos.EidosSystem
val reader = new EidosSystem()
reader.extractFromText("X (Y) causes Z").document.sentences.head.getSentenceText
res2: String = X -LRB- Y -RRB- causes Z

I think the issue is related to Universal Dependencies - I managed to find the following issues filed in 2015:

UniversalDependencies/UD_English-EWT#1
UniversalDependencies/docs#148

Any ideas on how to fix this?
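
As a stopgap on the output side, the escaped tokens could be mapped back to the original characters. -LRB-, -RRB-, -LSB-, -RSB-, -LCB- and -RCB- are the Penn Treebank escapes for round, square and curly brackets. The sketch below is a standalone illustration with hypothetical names, not a fix inside Processors or Eidos.

```scala
object PtbUnescape {
  // Penn Treebank bracket escapes and the characters they stand for.
  private val ptbToOriginal = Map(
    "-LRB-" -> "(", "-RRB-" -> ")",
    "-LSB-" -> "[", "-RSB-" -> "]",
    "-LCB-" -> "{", "-RCB-" -> "}"
  )

  /** Maps a single escaped token back; any other token is returned unchanged. */
  def unescapeToken(token: String): String = ptbToOriginal.getOrElse(token, token)

  /** Applies unescapeToken to each space-separated token of `text`. */
  def unescapeSentence(text: String): String =
    text.split(" ").map(unescapeToken).mkString(" ")

  def main(args: Array[String]): Unit =
    println(unescapeSentence("X -LRB- Y -RRB- causes Z"))
}
```

The result keeps the tokenizer's spacing ("X ( Y ) causes Z"), so restoring the exact original text would still require character offsets.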

Handling Quantifiers modifiers

Does the system handle modifiers of Quantifiers? For example, in quantification_adjective_rule_3 the following line is commented out:

# adverbs: Quantifier? = ${quant_modifiers} // todo: we should allow for modification here (e.g. 'more')
Is this already implemented?

Thanks!

valid edge tag in entity finder

makes it so expanded entities can be omitted as a whole:
This is the highest level of hunger since the conflict in South Sudan began two years ago.

compare with:
This is the highest level of hunger since the conflict in South Sudan began two years.

copular quants

[resolved for this issue]
"We are now seeing sharp spikes of need in new areas, such as Eastern Equatoria or Western
Bahr el-Ghazal, where malnutrition rates in some places are reaching dangerous levels."

X [lemma=be] VB-reaching Quant(dangerous) NN-transparent(levels|amounts|quantities)

Memory requirements and configuration

I'm recording this as an issue so that it doesn't take up meeting time and for the record. Some assignees can also double check to make sure I haven't gone insane.

As @adarshp noticed some time ago and documented nicely on the README.md page, Eidos does not perform well under low memory conditions, and by low I mean a very high low. One solution is to use the .sbtopts file to increase the maximum memory allowed. As @hclent once discovered, Travis isn't necessarily happy with that.

The short story is that Travis has been reconfigured to be happy with it and 20 days ago the file slipped out to GitHub and is now in master, with no apparent ill effects. Travis is now using, instead of a default Amazon EC2 instance with 4GB memory and only 2GB for Java, the GCE from Google with 7.5GB memory. (See .travis.yml for details.) Other projects, Processors and Reach, had been doing this all along, unbeknownst to me. A side-effect of not borrowing their configuration files was discovering how to activate caching at Travis, which halved the test time. Sometimes ignorance has a silver lining.

So, if people have a personal .sbtopts file they have been maintaining, it should no longer be necessary. Indeed, the file will likely be overwritten with the GitHub version. I should change some of the documentation in README to describe the new situation. I'm also double checking whether the exceptions for Windows are necessary. I still get out of memory with 6GB configured in a different way (along with the appropriately sized heap dump), so maybe .sbtopts was actually being heeded after all.

Mistaken bidirectional causality with certain sentences

When Eidos processes the sentence:

Water trucking has decreased due to the increased cost of fuel

It reports two causal events with opposite directions, instead of just one:

events:
List(Causal, DirectedRelation, EntityLinker, Event) => Water trucking has decreased due to the increased cost of fuel
	------------------------------
	Rule => dueToSyntax4-Causal
	Type => EventMention
	------------------------------
	trigger => due
	cause (NounPhrase, Entity) => cost of fuel
	  * Attachments: Increase(increased,None)
	effect (NounPhrase, Entity) => Water trucking
	  * Attachments: Decrease(decreased,None)
	------------------------------

List(Causal, DirectedRelation, EntityLinker, Event) => Water trucking has decreased due to the increased cost of fuel
	------------------------------
	Rule => ported_syntax_7_verb-Causal
	Type => EventMention
	------------------------------
	trigger => decreased
	effect (NounPhrase, Entity) => cost of fuel
	  * Attachments: Increase(increased,None)
	cause (NounPhrase, Entity) => Water trucking
	  * Attachments: Decrease(decreased,None)
	------------------------------

Is this something that can be fixed?
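
Whatever the fix in the rules turns out to be, such cases can be detected after extraction by looking for pairs of causal events whose cause and effect are swapped. A minimal sketch, using a simplified stand-in for the event mentions; the Causal case class and conflictingPairs are hypothetical, not Eidos types.

```scala
object ConflictingCausal {
  /** Simplified stand-in for a causal event mention. */
  final case class Causal(rule: String, cause: String, effect: String)

  /** All pairs of events where one event's cause is the other's effect and vice versa. */
  def conflictingPairs(events: Seq[Causal]): Seq[(Causal, Causal)] =
    for {
      (a, i) <- events.zipWithIndex
      (b, j) <- events.zipWithIndex
      if i < j && a.cause == b.effect && a.effect == b.cause
    } yield (a, b)

  def main(args: Array[String]): Unit = {
    val events = Seq(
      Causal("dueToSyntax4-Causal", "cost of fuel", "Water trucking"),
      Causal("ported_syntax_7_verb-Causal", "Water trucking", "cost of fuel")
    )
    conflictingPairs(events).foreach { case (a, b) =>
      println(s"${a.rule} conflicts with ${b.rule}")
    }
  }
}
```

Which of the two events to keep is a separate decision. Here the dueToSyntax4-Causal reading matches the sentence.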

Reference corpus to highlight rule side-effects

I brought up this idea last year for REACH and then never pursued it, but perhaps we could do it for Eidos. If we had a standard text corpus, we could maintain a reference output for that corpus.
As part of the Travis process, this corpus could be re-read, and its output compared against the reference to highlight changes. These changes could then be browsed by a human to explore the side effects of the rule updates. If any specific, unwanted side-effect change occurs, an explicit unit test for it can be added. I think this could be a very useful safe-guard in addition to the explicit unit tests. Does this make sense? I feel like a lot of the issues cropping up recently could be detected early and more systematically like this.
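
The comparison step could be as simple as a set difference over some stable string rendering of each extraction. The sketch below assumes such a rendering exists (for example "rule | cause | effect"). The format and all names are made up for illustration.

```scala
object ReferenceDiff {
  /** Extractions that appeared or disappeared relative to the reference run. */
  final case class Diff(added: Set[String], removed: Set[String]) {
    def isEmpty: Boolean = added.isEmpty && removed.isEmpty
  }

  def compare(reference: Set[String], current: Set[String]): Diff =
    Diff(added = current -- reference, removed = reference -- current)

  def main(args: Array[String]): Unit = {
    val reference = Set(
      "dueToSyntax4-Causal | cost of fuel | Water trucking",
      "ported_syntax_7_verb-Causal | Water trucking | cost of fuel"
    )
    // Pretend a rule change removed the reversed extraction.
    val current = Set("dueToSyntax4-Causal | cost of fuel | Water trucking")
    val diff = compare(reference, current)
    diff.removed.foreach(line => println(s"- $line"))
    diff.added.foreach(line => println(s"+ $line"))
  }
}
```

A non-empty diff would not have to fail the build. It could just be printed for a human to browse, as suggested above.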

Affect relation -- X was affected by Y

per convo between Ajay and Becky:
When X and Y share semantic polarity (i.e., rainfall and crop production), implied increase
When X and Y don't (i.e., rainfall deficit and crop production), implied decrease...
simple EdgeSpec(Y, Causal, X) doesn't capture this, but without Affect relation that is the best fit?

ExtractFromDirectory app throws errors

Hi,

I tried extracting events from the MITRE evaluation document corpus with the following invocation:

sbt "runMain org.clulab.wm.eidos.apps.ExtractFromDirectory ~/ml4ai/WorldModelers/MITRE_Evaluation_Documents output_dir"

and things seem to run ok for a while, but then I get this error:

17:29:52.902 [run-main-0] INFO  org.clulab.wm.eidos.EidosSystem$ - domainOntologyPath: /org/clulab/wm/eidos/toy_ontology.yml
[error] (run-main-0) java.lang.NullPointerException
java.lang.NullPointerException
	at org.clulab.wm.eidos.apps.ExtractFromDirectory$.delayedEndpoint$org$clulab$wm$eidos$apps$ExtractFromDirectory$1(ExtractFromDirectory.scala:22)
	at org.clulab.wm.eidos.apps.ExtractFromDirectory$delayedInit$body.apply(ExtractFromDirectory.scala:14)
	at scala.Function0.apply$mcV$sp(Function0.scala:34)
	at scala.Function0.apply$mcV$sp$(Function0.scala:34)
	at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
	at scala.App.$anonfun$main$1$adapted(App.scala:76)
	at scala.App$$Lambda$5/1422905801.apply(Unknown Source)
	at scala.collection.immutable.List.foreach(List.scala:389)
	at scala.App.main(App.scala:76)
	at scala.App.main$(App.scala:74)
	at org.clulab.wm.eidos.apps.ExtractFromDirectory$.main(ExtractFromDirectory.scala:14)
	at org.clulab.wm.eidos.apps.ExtractFromDirectory.main(ExtractFromDirectory.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:483)
[trace] Stack trace suppressed: run last core/compile:runMain for the full output.
java.lang.RuntimeException: Nonzero exit code: 1
	at scala.sys.package$.error(package.scala:27)
[trace] Stack trace suppressed: run last core/compile:runMain for the full output.
[error] (core/compile:runMain) Nonzero exit code: 1
[error] Total time: 3 s, completed Mar 21, 2018 5:29:52 PM
  1. Do I have the right invocation?
  2. Has anybody else run into this issue?

Move content to Eidos

  • grounding can stay in, but not be called
  • the json serialization and deserialization will live here

Increased by X percent extracted as a causal relation

I guess the pattern here is "A increased by B" being extracted as B causes A but "increase by" is often followed by the magnitude of the increase, not its cause:

  • Wheat consumption increased by more than 900 percent
  • net sorghum production increased by only 15 percentage points
  • wheat increased by 69, 40, and 40 percentage points

These types of sentences shouldn't yield causal extractions.

Similar issues include extractions from sentences like:

  • absolute number of IDPs increased by about 250 000 people
  • surplus shrunk to only about 15 000 tonnes
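
One cheap heuristic would be to reject a candidate cause that looks like a magnitude, i.e. starts with a number once hedge words are skipped. This is only a sketch of the idea; the hedge-word list and the function name are assumptions, and a real fix would more likely live in the rule itself.

```scala
object MagnitudeFilter {
  // Words that may precede the number in phrases like "more than 900 percent".
  private val hedges = Set("more", "than", "less", "only", "about", "over", "nearly")

  /** True if the phrase starts with a digit after skipping hedge words. */
  def looksLikeMagnitude(phrase: String): Boolean = {
    val tokens = phrase.trim.toLowerCase.split("\\s+").toSeq
    tokens.dropWhile(hedges.contains).headOption.exists(t => t.nonEmpty && t.head.isDigit)
  }

  def main(args: Array[String]): Unit = {
    val candidates = Seq(
      "more than 900 percent",
      "only 15 percentage points",
      "about 250 000 people",
      "the increased cost of fuel"
    )
    candidates.foreach(c => println(s"$c -> ${looksLikeMagnitude(c)}"))
  }
}
```

Spelled-out numbers ("nine hundred percent") would slip through, so this is a partial filter at best.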

Eidos not picking up entity

It seems that Eidos is not picking up entities when they are at the beginning of a sentence that is preceded by another sentence.

  • The sentence Conflict causes displacement works fine - the nodes and the edge are picked up.
  • The sentences X causes Y . Conflict causes displacement. result in no entities or events picked up in the second sentence.
  • If an article is added before 'Conflict', it works: X causes Y . The conflict causes displacement. Now, all entities and events are picked up.

These were tested using the EidosShell.

P3S2 in TestCagP3.scala

I have problems with three of the four tests for p3s2. Besides, there seem to be some faulty causal relations.

[Figures: causal graph; dependency graph]

Test:

 val impacts = NodeSpec("impacts of flooding")
 val insecurity = NodeSpec("food insecurity", Quant("critical"))
 tester.test(EdgeSpec(impacts, Causal, production)) should be (successful)

Problem:

 Error in the dependency graph: "flooding" is a nmod_of of "collapse" instead of "impacts", so they are extracted as separate entities.

Test:

  val conflict = NodeSpec("conflict")
  val production = NodeSpec("agricultural production", Dec("reduced"))
  tester.test(EdgeSpec(conflict, Causal, production)) should be (successful)

Problem:

 Error in the dependency graph: "spikes in conflict" is a nmod_as of "remained" instead of nsubj of "reduced".

Test:

  val production = NodeSpec("agricultural production", Dec("reduced"))
  val insecurity = NodeSpec("food insecurity", Quant("critical"))
  tester.test(EdgeSpec(production, Correlation, insecurity)) should be (successful)

Problem:

 There is not a clear trigger for this case. 

syntax_4_verb-Correlation

In the explicit linkers, see the comments in the rule:

    # todo: this is likely only valid if there's some verb happening in at least one of the entities.  We should check for
    # that in an action maybe -- look for either a literal verb in the span OR an inc/dec attachment
    # example of when it *shouldn't* match: "Fields were additionally treated with fertilizer for increased crop yield."
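
The check described in that todo could be sketched roughly as follows. This is a self-contained illustration under stated assumptions, not the Eidos action API: `EntitySketch`, its fields, and the attachment labels "Increase" and "Decrease" are hypothetical stand-ins for whatever the real mentions expose.

```scala
object VerbOrChangeCheck {
  // Hypothetical stand-in for an entity mention: the Penn Treebank POS tags
  // of its tokens plus the labels of any attachments on it.
  final case class EntitySketch(posTags: Seq[String], attachmentLabels: Set[String])

  // Assumed label names for increase/decrease attachments.
  private val changeLabels = Set("Increase", "Decrease")

  // True if the span contains a literal verb or carries an inc/dec attachment.
  def hasVerbOrChange(e: EntitySketch): Boolean =
    e.posTags.exists(_.startsWith("VB")) || e.attachmentLabels.exists(changeLabels)

  // Keep the correlation only if at least one argument passes the check.
  def keepCorrelation(a: EntitySketch, b: EntitySketch): Boolean =
    hasVerbOrChange(a) || hasVerbOrChange(b)
}
```

Whether an inc/dec attachment alone should be enough is a design question the todo leaves open; the sketch simply follows its wording.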

Print syntactic dependencies in RAPShell

I would love to see the syntactic dependencies printed in the RAPShell (though I know it's available via the web interface). After some digging, I found that the reason dependencies aren't currently displayed is that this line is commented out:

// printSyntacticDependencies(s)

Shall I uncomment it in a PR, or is there a good reason not to do that? Thanks!

Effect of period at end of sentence?

With a period at the end of the sentence, one of the causal relations is missed:

[screenshot: 2018-04-01, 11:47:57 pm]

But when the period is removed, the third one is picked up:
[screenshot: 2018-04-01, 11:48:23 pm]

Does anyone know why this is?

How to get JSON-LD with groundings?

What is the invocation for getting a JSON-LD file with Eidos mentions with attached groundings? I tried doing

sbt "runMain org.clulab.wm.eidos.apps.ExtractFromDirectory ../../ml4ai/WorldModelers/MITRE_Evaluation_Documents output_dir"

but the output .jsonld files do not have the grounding information.

JSON serialization has multiple instances of same event

For the sentence

The government promotes improved cultivar to boost agricultural production for ensuring food security.

The JSON output contains 4 copies of the same event, here with ID E:-1154077341:

    {
      "type": "EventMention",
      "id": "E:-1154077341",
      "text": "cultivar to boost agricultural production",
      "labels": [
        "Causal",
...

with the same foundBy, same arguments that have the same IDs, etc. Note that this is the JSON I obtain from serialization.json.WMJSONSerializer.

Is this a serialization issue or something else?
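
Until the cause is known, a consumer of the output could collapse the copies by ID. The following is only a workaround sketch and does not diagnose the duplication; `MentionRecord` and `dedupeById` are hypothetical names, not part of WMJSONSerializer.

```scala
object DedupeMentions {
  // Hypothetical stand-in for a serialized mention; only the id matters here.
  final case class MentionRecord(id: String, text: String)

  // Keep the first record seen for each id, preserving the original order.
  def dedupeById(records: Seq[MentionRecord]): Seq[MentionRecord] = {
    val seen = scala.collection.mutable.Set.empty[String]
    // Set.add returns false when the id was already present.
    records.filter(r => seen.add(r.id))
  }
}
```

This assumes that records sharing an ID really are identical, as described above; if they could differ, the duplicates should be compared before any are dropped.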

'not been possible' is not recognized as decrease trigger 'inhibit'

Sentence p4s4 in testCAGP4.scala:

Borehole repairs have not been possible in areas hardest hit by conflict, including large swathes of Upper Nile, due to lack of access due to insecurity and lack of technical expertise and supplies.

The test is supposed to capture the entity "Borehole repairs" with the decrease trigger 'inhibit', but 'inhibit' itself does not appear in the sentence.

val repairs = newNodeSpec("Borehole repairs", newDecrease("inhibit"))

Do not get the correct entities.

When parsing "the rapidly depreciating value of the South Sudanese Pound (SSP)", the system split the entity "value of the South Sudanese Pound" into "value" and "South Sudanese Pound", and the test failed.
