Comments (58)

kwalcock commented on August 17, 2024

Those assignees are there just so they can get pinged for feedback (except for me).

from eidos.

MihaiSurdeanu commented on August 17, 2024

I personally think this is overcomplicating things...
I suggest we take inspiration for this representation from FrameNet (https://framenet.icsi.berkeley.edu/fndrupal/), which has been doing this for a while. What I would suggest is:

  1. We remove the distinction between Directed vs. Undirected because this will become clear from other things, see below.
  2. We instead allow the event @type to be something custom, from a fixed taxonomy. In our case, we currently have two types: Causal, Correlation. BBN will add a bunch more. We understand directionality from these types, e.g., Causal is directed, Correlation is not. Most of the others extracted by BBN are not directed. For example, I would argue that Keith's Give event, "She gave a book", is not directed.
  3. We simplify the participants. That is, instead of having both @ROLE and @name, we can keep just @ROLE. Following FrameNet, essential arguments are then Agent and Patient, e.g., "She" is Agent, "him" is Patient, "a book" is Object. All the others are non-essential, and can take other arbitrary names.

Directionality comes naturally out of this: for Causal events, it goes from Agent to Patient, and it is not enforced on other frames, where we don't need it.
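To make the proposal concrete, here is a minimal Python sketch in which directionality is looked up from the event type rather than stored on each event. The `TAXONOMY` dictionary, the `Participant` class, and the choice to treat unknown types as undirected are assumptions for illustration; only Causal and Correlation come from the discussion.

```python
from dataclasses import dataclass

# Hypothetical fixed taxonomy: event type -> is it directed?
# Causal and Correlation come from the discussion; BBN's types would be added here.
TAXONOMY = {
    "Causal": True,
    "Correlation": False,
}


@dataclass(frozen=True)
class Participant:
    role: str       # "Agent", "Patient", or any non-essential role name
    entity_id: str  # identifier of the participating entity


def directed_edges(event_type, participants):
    """Return the (source, target) pairs implied by an event.

    Directed types yield one edge from every Agent to every Patient.
    Undirected types, and types missing from TAXONOMY (an assumption),
    yield no edges.
    """
    if not TAXONOMY.get(event_type, False):
        return []
    agents = [p.entity_id for p in participants if p.role == "Agent"]
    patients = [p.entity_id for p in participants if p.role == "Patient"]
    return [(a, b) for a in agents for b in patients]


participants = [Participant("Agent", "rainfall"), Participant("Patient", "flooding")]
print(directed_edges("Causal", participants))       # [('rainfall', 'flooding')]
print(directed_edges("Correlation", participants))  # []
```

Because the edges are computed from a list of participants, the same sketch also covers events with several Agents or Patients.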

What do you all think?
Also, I think we should invite BBN to this discussion. @bsharpataz: can you please ask Bonan for his github id?

adarshp commented on August 17, 2024

I agree with Mihai's suggestions. Since we are thinking of going to a 'hyperedge' representation, perhaps there should be a meta-label called participants at the same level as @type, which maps to a list. This would allow for arbitrary numbers of agents and patients.

kwalcock commented on August 17, 2024

My first reaction is "Collin Baker, Chuck Fillmore, ICSI? I used to know these people. Cool." (Not that I especially knew their work.) There are definitely some things I like about the idea. I saw the word revenge, which is associated with avenger, degree, depictive, injured_party, injury, instrument, manner, offender, place, punishment, purpose, result, time. This sounds like the kind of thing we heard at the meeting: some group wanted to fill in a large number of blanks/slots like these. On the other hand, I can't find much information about their file format other than the XML used for both frames and LUs. I'm hoping for a large, possibly grandiose change in how we think about what previously seemed like simple nodes and edges, yet only a small change in their representation.

  1. I'm also slightly worried about removing this distinction. One advantage of distinguishing it explicitly somewhere is that the information is sitting right there even if the kind of relation, maybe a revenge relation, is unknown to the receiver of the data.

  2. It seemed like we had quite a list: causal, isA, origin, transparentLink for directed; correlation and sameAs for undirected. Maybe they were just ideas. I'd be most concerned about the "fixed" part and how this taxonomy is maintained and distributed. It would be possible to include the necessary parts of the taxonomy in the exchanged file format, similar to how @context is used in the current format: "If/when you see causal below, treat it as directed and expect a cause and effect. If you see gave, treat it as undirected and expect a giver and recipient and maybe a place and time..."

  3. I agree that role and name overlap greatly and that one can probably be eliminated. On the other hand, if there is an avenger and an injuredParty for revenge, it might be useful to compare them to an agent and patient for gave. I have no idea what our use cases are; this one does seem unlikely. How do we specify which are essential vs. non-essential? Arbitrary names cause a small alarm bell to go off for me, perhaps falsely.

So, my concern is mostly with the taxonomy and maybe with what comes naturally. However, I think these high-level design decisions are best left to people with more background than me, those who know, for example, that FrameNet even existed.

Should we use the online tool they have to try to process some sentences?
GitHub isn't too bad at this, but do we have something better to use for further discussion with BBN?

BeckySharp commented on August 17, 2024

@MihaiSurdeanu sorry for the delay -- just emailed Bonan

@kwalcock about the list of relations -- those are largely going away/getting set aside for now. We'll end up with Causal and Correlation (and Inc/Dec/Quant, though they don't make it to the output)

BeckySharp commented on August 17, 2024

OK -- trying to add @bnmin now...

bnmin commented on August 17, 2024

Thanks for looping me in @bsharpataz ! I prefer the simplified solution as @MihaiSurdeanu suggested, though I might not know as much context as many of you do. If you could share a JSON-LD representation for a few sentences, that would be very helpful.

kwalcock commented on August 17, 2024

A draft of the new format can be found at https://github.com/clulab/eidos/wiki/JSON-LD2. The two relations have been reduced to one relation with the addition of a "type" field to specify something like "causation" or "correlation". Those must come from a second table which lists relations and their expected arguments. Arguments are described in a single list which is similar to what was there before, but each argument has its own "type" like "cause" or "effect" which again should come from an agreed upon list.
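The "second table" could be checked mechanically by a consumer. The sketch below assumes a hypothetical `RELATION_TYPES` dictionary mapping each relation type to its expected argument types; "causation" with "cause" and "effect" follows the draft, and any further entries would have to be agreed upon.

```python
# Hypothetical relation-type table: relation type -> expected argument types.
RELATION_TYPES = {
    "causation": {"cause", "effect"},
    # further agreed-upon relation types would be listed here
}


def validate(relation, table=RELATION_TYPES):
    """Return a list of problems; an empty list means the relation conforms."""
    rtype = relation.get("type")
    if rtype not in table:
        return [f"unknown relation type: {rtype!r}"]
    expected = table[rtype]
    seen = [arg.get("type") for arg in relation.get("arguments", [])]
    problems = [f"unexpected argument type: {t!r}" for t in seen if t not in expected]
    problems += [f"missing argument type: {t!r}" for t in sorted(expected - set(seen))]
    return problems


relation = {
    "type": "causation",
    "arguments": [
        {"type": "cause", "value": {"@id": "_:Entity_1"}},
        {"type": "effect", "value": {"@id": "_:Entity_2"}},
    ],
}
print(validate(relation))  # []
print(validate({"type": "causation", "arguments": [{"type": "cause"}]}))
# ["missing argument type: 'effect'"]
```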

I had to use a Wiki page for the formatting, but will look for feedback here. There should be some real, working output in this format shortly. Thanks.

adarshp commented on August 17, 2024

Just as a heads-up @kwalcock : You can make formatted tables in Github issues as well: you just need to convert each + in the second row to a minimum of three dashes ---.

|Name|Property|Type|Description|
|---|---|---|---|
|Corpus|`@type`|"Corpus"|A corpus is typed.|
||documents|[Document]|It has a list of documents|

becomes:

Name     Property   Type        Description
Corpus   @type      "Corpus"    A corpus is typed.
         documents  [Document]  It has a list of documents

(You need to put the backticks around @type to prevent Github from automatically creating a mention link and notifying the Github user with the username type.)

The JSON-LD description in the wiki looks good.

kwalcock commented on August 17, 2024

Thanks. I had concluded before that Org-mode was necessary, but I guess that's not the case. I just hadn't used the secret word.

bnmin commented on August 17, 2024

Thanks a lot! In general, BBN's CAG representation is similar to this.

Here are a few places that are different:

  1. Our CAG doesn't include sentences, words, and dependencies. We keep these for internal analysis but not as output to downstream applications.

For any extraction (entity, relation, event), we include a document ID and a pair of character offsets (start and end) as provenance. Our offsets are character offsets from the beginning of the document (the first character is at offset 0).

  2. BBN's causal factors (arguments of a causal relation) are mostly events. We define event broadly to include occurrences, actions, processes, states, changes of state, etc. An event has 1) a trigger (a word or a phrase), 2) properties such as polarity (positive, negative), tense (past, present, and future), etc., and optionally 3) a list of arguments.

It looks like Event could be defined as a subclass that combines Entity and Relation:

  • We can represent event properties as a list of states (similar to Entity)
  • Arguments can be represented as a list of arguments (similar to Relation)
  • The trigger can be represented similarly to Relation

How about making a new concept Event in this way (it has all attributes from Entity and Relation)?

  3. Arguments of a relation can be entities/events/relations.

  4. We also output entities, their mentions, and value mentions such as dates. These entities and dates are used as arguments of events. Our entity types include Person, GeoPoliticalEntity (GPE), and Organization. We follow the ACE definition of entities.

We could reuse the "Entity" concept to represent our ACE-style entities as well as value mentions (as an instance of the temporal entity in your ontology).

  5. More relation types

We extract temporal relations (e.g., occurs_before) between events, as well as entity-entity relations such as <GPE1, part_of, GPE2>. We can provide a list of types to be added into the ontology (their representation will be similar to the relation "causation").

A few questions:

  1. Does grounding here mean assigning a type to an entity?

  2. The usage of "label" is not clear to us. It looks like it can refer to the relation type ("Correlation"), directedness ("UndirectedRelation"), and other things (e.g., "EntityLinker", "Event").

Thanks,
Bonan

MihaiSurdeanu commented on August 17, 2024

Hi Bonan,
Thanks for the detailed comments!

Let me try to answer:
"Our CAG doesn't include sentences, words, and dependencies. We keep these for internal analysis but not as output to downstream applications." - this is fine. Sentences/words/deps are optional. Some systems may produce them, some not.

Offsets: we use token offsets because during tokenization we may transform the text (e.g., replace Unicode Greek characters with their ASCII versions). But maybe we can change the provenance to allow for either token or character offsets?

Representation of events: I think these fit very well under Relation (btw, we can change the name "Relation" to something more descriptive...). Our Relations can take state as well; we just don't use it now. Correct, @bsharpataz? I think the simplest solution is to adjust Relation to accommodate your Events. I think these adjustments will be minimal.
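As a rough illustration of how small that adjustment could be, the Python classes below give a Relation an optional trigger and a list of states alongside its arguments. Field names mirror the JSON keys used in this thread, but the classes and the `state` helper are hypothetical; the example values are borrowed from BBN's "production" event in the sample posted later in the thread.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class State:
    type: str  # e.g. "polarity", "modality"
    text: str  # e.g. "Positive", "Asserted"


@dataclass
class Argument:
    type: str      # e.g. "cause", "effect", "Place"
    value_id: str  # "@id" of an Entity, Event, or Relation


@dataclass
class Relation:
    type: str
    trigger: Optional[str] = None  # events have one; plain relations may not
    states: List[State] = field(default_factory=list)
    arguments: List[Argument] = field(default_factory=list)

    def state(self, name: str) -> Optional[str]:
        """Return the text of the first state of the given type, if any."""
        for s in self.states:
            if s.type == name:
                return s.text
        return None


production = Relation(
    type="Agriculture",  # taken from BBN's grounding "/event/Agriculture"
    trigger="production",
    states=[State("modality", "Asserted"), State("polarity", "Positive")],
    arguments=[Argument("Place", "ENT-ENG_NW_20180101-392")],
)
print(production.state("polarity"))  # Positive
print(production.state("tense"))     # None
```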

"Arguments of a relation can be entities/events/relations" - yep, same for us.

"We also output entities, their mentions, and value mentions such as dates" - We have a representation for entities (as you saw), which I think accommodates all these types. But we output only entities that participate in events to avoid overwhelming the downstream user, since in our case any NP is a potential entity. But this doesn't really matter for the format. I think our entities are similar (but yours have more types).

"We extract temporal relations" - this format supports an arbitrary number of relation types, so no issues here.

"Does grounding here mean assigning a type to an entity" - essentially yes. But we allow a 1-to-n mapping between an entity and its possible types. Further, we plan to ground to multiple name spaces. For example, we will continue to ground to our in-house ontology, but we will add another that contains FAO indicators, which are much more fine-grained.
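One way to picture a 1-to-n grounding across several name spaces is a mapping from name space to scored concepts, from which a single best type can be picked per name space. In the sketch below the name-space keys, concept paths, and scores are illustrative placeholders, and `best_per_namespace` is a hypothetical helper, not part of any existing API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Grounding:
    concept: str  # path of a concept in one ontology
    score: float  # confidence of this grounding


def best_per_namespace(groundings: Dict[str, List[Grounding]]) -> Dict[str, Optional[str]]:
    """Pick the highest-scoring concept in each name space (None if there is none)."""
    result = {}
    for namespace, candidates in groundings.items():
        best = max(candidates, key=lambda g: g.score, default=None)
        result[namespace] = best.concept if best is not None else None
    return result


# Illustrative values only: one entity grounded to two name spaces.
entity_groundings = {
    "in_house": [Grounding("/entities/human/livelihood", 0.71),
                 Grounding("/entities/human/food", 0.42)],
    "fao": [],
}
print(best_per_namespace(entity_groundings))
# {'in_house': '/entities/human/livelihood', 'fao': None}
```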

"The usage of "label" is not clear to us" - yes, it does seem that @type and @labels are now redundant. @kwalcock, @bsharpataz: what did you have in mind?

kwalcock commented on August 17, 2024

The @type with @ is a JSON-LD thing. labels without @ and type without @ are largely redundant. If nobody needs the list of labels (or any of the other fields), they can be trimmed, of course. Some elements may be present more for completeness than usefulness. I would say that JSON is pretty good about ignoring things, though, so missing items may be more important than extra items.

MihaiSurdeanu commented on August 17, 2024

kwalcock commented on August 17, 2024

I don't think so, at least not for Relation, but perhaps for Argument. The value for @type is a closed class value that is used to define the JSON syntax, much like a class name. Perhaps the second type should be renamed to avoid confusion. There we're just expecting a string from an open (but agreed upon with others as need be) class to describe the relation or argument, but not influence any syntax. As far as I know, if we see ("@type" : "newfangled"), we'd be expected to say "I don't know what you're talking about." If we see ("type" : "newfangled"), we say "That's a strange value, but whatever. You still have to have a this and that field because your @type tells us so." If the syntax of certain kinds of relations or arguments needs to be different, though, then we'd have to do more with @type.
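The distinction can be expressed as reader behavior. In this minimal sketch, a hypothetical `KNOWN_TYPES` table lists, for each closed-class "@type", the fields a reader would require (the required sets are assumptions). An unknown "@type" is rejected, while an unfamiliar open-class "type" value and any unrecognized extra fields are passed over.

```python
# Hypothetical closed class: "@type" value -> fields that type obliges a node to have.
KNOWN_TYPES = {
    "Relation": {"type", "arguments"},
    "Argument": {"type", "value"},
}


def read_node(node):
    """Check a parsed JSON node against its "@type" and keep only the known fields."""
    at_type = node.get("@type")
    if at_type not in KNOWN_TYPES:
        raise ValueError(f"unknown @type: {at_type!r}")
    missing = KNOWN_TYPES[at_type] - set(node)
    if missing:
        raise ValueError(f"{at_type} is missing fields: {sorted(missing)}")
    # The open-class "type" value is accepted as-is; extra fields are ignored.
    return {key: node[key] for key in KNOWN_TYPES[at_type]}


arg = read_node({"@type": "Argument", "type": "newfangled",
                 "value": {"@id": "_:Entity_1"}, "extra": True})
print(arg["type"])  # newfangled

try:
    read_node({"@type": "newfangled"})
except ValueError as err:
    print(err)  # unknown @type: 'newfangled'
```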

MihaiSurdeanu commented on August 17, 2024

Thanks! To make sure we're on the same page, can you please include here a simple example, say, what is the JSON format for the causal relation extracted from "A causes B"?

bnmin commented on August 17, 2024

Thanks, Mihai!

Offsets: I think supporting both character offsets in raw documents and token offsets would be great! This allows systems that assume raw character offsets to be incorporated.

Representation of events: that matches my understanding. "relation" is less of a descriptive word to me:)
Adjusting Relation to include states/event properties would be useful. I think it would be really helpful to rename it to something that's more "inclusive". I also heard "causal events" from other performers when they were describing causal relations.

Thanks,
Bonan

kwalcock commented on August 17, 2024

It might look like this with the new design, whereby the changed part is at the bottom:

  {
    "@type" : "Relation",
    "@id" : "_:Relation_1",
    "labels" : [ "Causal", "DirectedRelation", "EntityLinker", "Event" ],
    "text" : "A causes B",
    "rule" : "ported_syntax_1_verb-Causal",
    "canonicalName" : "a cause b",
    "provenance" : [ {
      "@type" : "Provenance",
      "document" : {
        "@id" : "_:Document_1"
      },
      "sentence" : {
        "@id" : "_:Sentence_1"
      },
      "positions" : {
        "@type" : "Interval",
        "start" : 1,
        "end" : 3
      }
    } ],
    "trigger" : {
      "@type" : "Trigger",
      "text" : "causes",
      "provenance" : [ {
        "@type" : "Provenance",
        "document" : {
          "@id" : "_:Document_1"
        },
        "sentence" : {
          "@id" : "_:Sentence_1"
        },
        "positions" : {
          "@type" : "Interval",
          "start" : 2,
          "end" : 2
        }
      } ]
    },
    "type" : "causal",
    "arguments" : [ {
      "@type" : "Argument",
      "type" : "cause",
      "value" : {
        "@id" : "_:Entity_1"
      }
    }, {
      "@type" : "Argument",
      "type" : "effect",
      "value" : {
        "@id" : "_:Entity_2"
      }
    } ]
  }
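Arguments refer to entities only by "@id", so a consumer has to look them up elsewhere in the output. A minimal sketch, assuming the extractions have already been parsed into Python dicts (for example with `json.load`) and that every referenced node carries an "@id"; the entity texts "A" and "B" are illustrative.

```python
def index_by_id(nodes):
    """Index parsed JSON-LD nodes by their "@id"."""
    return {node["@id"]: node for node in nodes if "@id" in node}


def resolve_arguments(relation, index):
    """Pair each argument's type with the node its value refers to (None if unknown)."""
    return [(arg["type"], index.get(arg["value"]["@id"])) for arg in relation["arguments"]]


# Trimmed nodes; the entity texts are illustrative.
entities = [
    {"@type": "Entity", "@id": "_:Entity_1", "text": "A"},
    {"@type": "Entity", "@id": "_:Entity_2", "text": "B"},
]
relation = {
    "@type": "Relation",
    "type": "causal",
    "arguments": [
        {"@type": "Argument", "type": "cause", "value": {"@id": "_:Entity_1"}},
        {"@type": "Argument", "type": "effect", "value": {"@id": "_:Entity_2"}},
    ],
}
index = index_by_id(entities)
print([(role, node["text"]) for role, node in resolve_arguments(relation, index)])
# [('cause', 'A'), ('effect', 'B')]
```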

MihaiSurdeanu commented on August 17, 2024

I see. So "@type" is simply the data structure type encoded in the JSON.
I vote to rename it from Relation to Event.

"labels" are the labels that apply to this event. In Keith's example above, these are hypernymy labels from our taxonomy. That is, Causal IS-A DirectedRelation IS-A EntityLinker IS-A Event, where Event is the top of the taxonomy, and Causal is the terminal. Bonan, I think what we store in here can be adjusted. Minimally, of course, we want at least the actual type of the relation/event.
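Since the labels are just the path from the terminal type up to the root, they can be generated from a parent map. A minimal sketch, assuming a hypothetical `PARENT` dictionary that encodes only the chain named above and an acyclic taxonomy:

```python
# Hypothetical parent map for the chain described above; Event is the root.
PARENT = {
    "Causal": "DirectedRelation",
    "DirectedRelation": "EntityLinker",
    "EntityLinker": "Event",
}


def labels_for(terminal, parent=PARENT):
    """Walk from the terminal label up to the root, most specific first.

    Assumes the taxonomy has no cycles; otherwise this loop would not end.
    """
    labels = [terminal]
    while labels[-1] in parent:
        labels.append(parent[labels[-1]])
    return labels


print(labels_for("Causal"))  # ['Causal', 'DirectedRelation', 'EntityLinker', 'Event']
```

This reproduces the "labels" list in Keith's example, so only the terminal type would strictly need to be stored if consumers had the taxonomy.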

I think we're close to convergence, no?

kwalcock commented on August 17, 2024

I renamed Relation to Event as was suggested.

For each of our words in a sentence, we have startOffset and endOffset. These appear to be offsets in characters from the start of the document. Perhaps it isn't the original document, though, and I'll check. The value in the sentence texts has at least spaces added and there is some conversion of things like ( to -LRB-. For those, the offsets indicate a single character and not five. We may want to add a text field for the entire document which preserves as pristine a copy as possible.

Still, the provenance we use is based on start and end words. We have wordPositionsInSentence rather than characterPositionsInDocument. It takes an extra mapping to follow the start and end words to startOffset and endOffset of the document. The words are contained in sentences, not directly in the document, so we have an extra layer there. Both could be included in something like the following (where documentWordPositions and sentenceCharPositions would also be possible). It seems like allowing for all possibilities may be more expensive than standardizing on just one of them.

    "provenance" : [ {
      "@type" : "Provenance",
      "document" : {
        "@id" : "_:Document_6"
      },
      "documentCharPositions" : {
        "@type" : "Interval",
        "start" : 0,
        "end" : 5
      },
      "sentence" : {
        "@id" : "_:Sentence_6"
      },
      "sentenceWordPositions" : {
        "@type" : "Interval",
        "start" : 12,
        "end" : 12
      }
    } ],
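The extra mapping from sentence word positions to document character positions can be sketched as follows. This assumes word positions are 1-based and inclusive (as in the "A causes B" example, where the relation spans words 1 to 3 and "causes" is 2 to 2) and that each word's endOffset points just past its last character; both are assumptions, and the `Word` class is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    start_offset: int  # document character offset of the first character
    end_offset: int    # document character offset just past the last character (assumed)


def word_to_char_interval(words: List[Word], start: int, end: int) -> dict:
    """Map an inclusive, 1-based sentenceWordPositions interval to document characters."""
    first, last = words[start - 1], words[end - 1]
    return {"start": first.start_offset, "end": last.end_offset}


document = "A causes B."
words = [Word("A", 0, 1), Word("causes", 2, 8), Word("B", 9, 10), Word(".", 10, 11)]
span = word_to_char_interval(words, 1, 3)
print(span, repr(document[span["start"]:span["end"]]))
# {'start': 0, 'end': 10} 'A causes B'
```

Because the offsets point into the original text, slicing the document recovers the pristine characters even where the sentence text has been normalized (e.g. "(" rewritten as "-LRB-").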

It sounds like some of the other BBN information could be encoded like this:

"extractions" : [ {
  "@type" : "Event",
  "type" : "occurrence|action|process|state|change_of_state|occurs_before|part_of",
  "trigger" : ...,
  "states" : [ {
    "@type" : "State",
    "type" : "polarity",
    "text" : "positive|negative" /* maybe value is a better name than text */
  }, {
    "@type" : "State",
    "type" : "tense",
    "text" : "past|present|future"
  } ],
  "arguments" : [ {
    "@type" : "Argument",
    "type" : ?,
    "value" : {
      "@id" : "_:Entity|Event|Relation_1"
    }
  } ]
} , {
  "@type" : "Entity",
  "type" : "Person|GeoPoliticalEntity|Organization|Temporal?",
  "mentions" : [
    "?mention?"
  ],
  "valueMentions" : [
    "?date?"
  ]
} ]

Please feel free to copy and edit.

MihaiSurdeanu commented on August 17, 2024

Thank you @kwalcock!
It seems to me that we have a format.

@bnmin: I wonder if you could produce the output of some representative BBN relations/events in this format, to make sure that we are on the same page? Then we can write a spec for this format, and share it with the rest of the program.

Thanks!

bnmin commented on August 17, 2024

Thank you! @kwalcock @MihaiSurdeanu

@MihaiSurdeanu Sure! We are working on implementing a serializer that can output this format. I'll keep you posted. There are a few issues we found. I will also post our comments here.

Thanks,
Bonan

bnmin commented on August 17, 2024

Here is a CAG produced by our preliminary implementation of this JSON-LD format:
BBN_wm_m6_debug_10doc.v0.1.json-ld.zip

We haven't implemented all required/useful features yet (for example, provenance for relations is missing). My apologies if this looks "half-baked". We will send an updated version in the next few days.

The following block shows examples of a document (with sentences), an entity (with mentions), a "Cause-Effect" relation between a pair of events, and a "PART-WHOLE.Geographical" relation between two GeoPolitical Entities, etc.

Please let me know if you have any questions or suggested changes. I also plan to post our comments later on.

{
    "@context": {
        "Argument": "https://github.com/clulab/eidos/wiki/JSON-LD#Argument",
        "Corpus": "https://github.com/clulab/eidos/wiki/JSON-LD#Corpus",
        ...
    },
    "@type": "Corpus",
    "documents": [
        {
            "@id": "ENG_NW_20180124",
            "@type": "Document",
            "sentences": [
                {
                    "@id": "SEN-ENG_NW_20180124-42",
                    "@type": "Sentence",
                    "text": "1.5 million S. Sudanese risk facing famine, says UN"
                },
                {
                    "@id": "SEN-ENG_NW_20180124-43",
                    "@type": "Sentence",
                    "text": "January 24, 2018 (JUBA) –"
                },
                {
                    "@id": "SEN-ENG_NW_20180124-44",
                    "@type": "Sentence",
                    "text": "At least 1.5 million South Sudanese could face famine while up to 20,000 of them are experiencing famine conditions, a United Nations humanitarian officials told the Security Council on Wednesday."
                }
            ]
        }
    ],
    "extractions": [
        {
            "@id": "ENT-ENG_NW_20160629-64",
            "@type": "Entity",
            "canonicalName": "some 80 million people",
            "grounding": [
                {
                    "@type": "Grounding",
                    "ontologyConcept": "/entity/PER/Group",
                    "value": 0.5
                }
            ],
            "labels": [
                "Entity"
            ],
            "mentions": [
                {
                    "provenance": {
                        "@type": "Provenance",
                        "document": {
                            "@id": "ENG_NW_20160629"
                        },
                        "positions": {
                            "@type": "Interval",
                            "end": 4940,
                            "start": 4935
                        }
                    },
                    "text": "some 80 million people"
                }
            ]
        },
        {
            "@id": "EVE-ENG_NW_20180101-340",
            "@type": "Event",
            "arguments": [
                {
                    "@type": "Argument",
                    "type": "Place",
                    "value": { 
                        "@id": "ENT-ENG_NW_20180101-392"
                    }
                }
            ], 
            "grounding": [
                {
                    "@type": "Grounding", 
                    "ontologyConcept": "/event/Agriculture",
                    "value": 1.0
                }
            ], 
            "labels": [
                "Event"
            ], 
            "provenance": [
                {
                    "@type": "Provenance",
                    "document": {
                        "@id": "ENG_NW_20180101"
                    }, 
                    "positions": {
                        "@type": "Interval",
                        "end": 213, 
                        "start": 204
                    }
                }
            ], 
            "states": [
                {
                    "@type": "State", 
                    "text": "Asserted",
                    "type": "modality"
                },
                {
                    "@type": "State", 
                    "text": "Specific", 
                    "type": "genericity"
                },
                {
                    "@type": "State", 
                    "text": "Positive",
                    "type": "polarity"
                }
            ], 
            "trigger": { 
                "@type": "Trigger",
                "provenance": [
                    {
                        "@type": "Provenance",
                        "document": {
                            "@id": "ENG_NW_20180101"
                        }, 
                        "positions": {
                            "@type": "Interval",
                            "end": 213, 
                            "start": 204
                        }
                    }
                ], 
                "text": "production"
            }
        },
        {
            "@id": "REL-ENG_NW_20170811-229",
            "@type": "Relation",
            "arguments": [
                {
                    "@type": "Argument",
                    "type": "has_cause",
                    "value": {
                        "@id": "EVE-ENG_NW_20170811-247"
                    }
                },
                {
                    "@type": "Argument",
                    "type": "has_effect",
                    "value": {
                        "@id": "EVE-ENG_NW_20170811-248"
                    }
                }
            ],
            "labels": [
                "Cause-Effect"
            ]
        },
        {
            "@id": "REL-ENG_NW_20180117-19",
            "@type": "Relation",
            "arguments": [
                {
                    "@type": "Argument",
                    "type": "left_arg",
                    "value": {
                        "@id": "ENT-ENG_NW_20180117-75"
                    }
                },
                {
                    "@type": "Argument",
                    "type": "right_arg",
                    "value": {
                        "@id": "ENT-ENG_NW_20180117-76"
                    }
                }
            ],
            "labels": [
                "PART-WHOLE.Geographical"
            ]
        }
    ]
}
   

MihaiSurdeanu commented on August 17, 2024

Thanks @bnmin!

We are close, but there are a few differences:

  • Are you grouping all entity mentions under an Entity block? That is, do you report all mentions of "John Doe" under a "John Doe" entity? We report mentions, so each instance gets its own block.
  • We need to add state to Event.
  • We need to allow Events to have grounding, not just Entities.
  • I would like to merge the Event and Relation objects, for simplicity. I think they are very similar, no?

kwalcock commented on August 17, 2024

| Event type | Argument type | Notes |
|---|---|---|
| ?Event? | Place | |
| Cause-Effect | has_cause | |
| | has_effect | |
| PART-WHOLE.Geographical | left_arg | |
| | right_arg | |

| State type | State text |
|---|---|
| modality | Asserted |
| genericity | Specific |
| polarity | Positive |

Here are some notes from a very low level perspective.

  • On the events and relations I was expecting to see some kind of "type". Maybe it's implied and generic (or, as the Wiki table says, [general]), or it is recorded in the label instead.
  • Where there is "mentions": [ { "provenance": { we would just have "provenance": [. We could perhaps add the text, but that is already identified by the provenance.
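The difference between the two shapes is small enough that converting between them is mechanical. A minimal sketch, using a trimmed copy of the BBN entity above; `mentions_to_provenance` is a hypothetical helper, and dropping the mention-level "text" assumes the provenance already identifies the span.

```python
def mentions_to_provenance(entity):
    """Flatten BBN-style "mentions" into an eidos-style "provenance" list.

    Mention-level extras such as "text" are dropped, on the assumption that
    the provenance already identifies the span.
    """
    return [m["provenance"] for m in entity.get("mentions", []) if "provenance" in m]


bbn_entity = {
    "@type": "Entity",
    "mentions": [{
        "provenance": {
            "@type": "Provenance",
            "document": {"@id": "ENG_NW_20160629"},
            "positions": {"@type": "Interval", "start": 4935, "end": 4940},
        },
        "text": "some 80 million people",
    }],
}
print(mentions_to_provenance(bbn_entity)[0]["document"])  # {'@id': 'ENG_NW_20160629'}
```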

bnmin commented on August 17, 2024

@MihaiSurdeanu To answer your questions:

  1. Entity mentions: Yes, we grouped all entity mentions under an entity block where possible (for some NPs we have only one mention per entity).
  2. Merge events and relations: Yes. They are similar in representation. We need to tweak our code a bit to support that.

Thanks!
Bonan

bnmin commented on August 17, 2024

Please find a newer version of our CAG in this JSON-LD format:
wm_m6_debug_10doc.json-ld.zip

A few issues/questions, or our thoughts:

  1. Word, Dependency, and Sentence

We did not output Word or Dependency objects. Words are not necessary because we only output document character offsets as provenance.

In fact, Sentences aren't very useful in our current output. We did output Sentence objects because we plan to provide event provenance pointing to sentences (for some events, the provenance comes from non-consecutive words, so it is more useful to just use the sentence as provenance).

  2. We use "documentCharPositions" instead of "positions", as suggested by @kwalcock

  3. "labels", "type", and "groundings"

We include "labels", "type" for entities, events, and relations. We include "groundings" for entities and events.

  • Label: ["Entity"] (entities), or ["Event"] (events), or ["DirectedRelation"] (relations)
  • grounding: a list of "soft" groundings to ontological types
  • type: a single ontology type (best "grounding")
  4. Mentions and value mentions

Our entities (e.g., Person, GeoPolitical entity) will have a list of mentions. I think it is still useful to use the following structure because it allows richer information such as "text" and mention level (pronoun, name, descriptor) in addition to provenance.

"mentions" : [ {
  "provenance": {},
  "text": "Xyz Inc.",
  ...
} ],

  5. "Unifies" relation

To represent soft grouping of entities, as well as of events, we propose to add a new relation type "Unifies". This is similar to cross-document coreference, but can be defined broadly, for example,

Event1: "food insecurity"
Event2: "famine in South Sudan"
Event3: "famine in Sudan"

We will create another Event Event4 ("food insecurity"), and add

Event4 Unifies Event1
Event4 Unifies Event2
Event4 Unifies Event3

This kind of grouping allows higher-level of abstraction of causal semantics, and better visualization. Related events can be grouped (similar for relations).

The current JSON-LD representation is already sufficient for this purpose - We just have to add a relation type "Unifies".

  6. Document location (path)

It is useful to include a path to the original document for each file. Neither "@id" nor "title" is a good place for it.

Can we create a property "path" or "filename" for Document?

  7. Namespaces for contexts and concepts

This is a minor issue. At the ontology telecon, ISI suggested that we use a URI naming scheme that all performers can access, contribute to, and "negotiate" content in. While the "https://github.com/clulab/eidos/wiki/JSON-LD" namespace is certainly very useful, is it possible to put it in a place that can facilitate collaboration? An example suggested by ISI is w3id.org.

A similar problem applies to ontology concepts. For example:
"ontologyConcept" : "/entities/human/livelihood"

Where are the ontology concepts defined? It would be great if they resided at a URL similar to the contexts.

Thank you!

Best,
Bonan

MihaiSurdeanu commented on August 17, 2024

Thanks @bnmin! I think we're close.
Answers to your points:

  1. Agreed. Words/sentences are only needed if the system outputs token offsets. Otherwise they should be optional.
  2. Ok with me. @kwalcock?
  3. Ok. This is compatible with the format. I would still like to merge Event and Relation. Semantically they seem very close to me... Plus, I think State should be supported on all types (from Entity to Event). @kwalcock: can you please indicate how we would represent BBN's state as attributes of these objects?
  4. I see. I think we should include a mentions [] block in the representation. In your case, this may contain multiple mentions for you. For us, at least for now, it will be 1 mention per entity. @kwalcock: can you please adjust the format to include such mentions? (We can talk off line if needed).
  5. I like this. Similar to your Unifies relation, we will add others, including CorefersWith. This is fully compatible with the format.
  6. I agree! @kwalcock: can you please add a field for this?
  7. I agree. @kwalcock: any suggestions for more global name space that is equally available to all performers? @bnmin: our ontologies are defined in github as well. Maybe we should include a path to each ontology used?

bnmin commented on August 17, 2024

Thank you @MihaiSurdeanu !

  1. Re: merge Event and Relation

Yes. We have merged Event and Relation. There will only be Event (relations can be identified by looking at "labels").

  2. Re: #7

I might be missing something, but I couldn't find where the path to the ontology is specified in your JSON-LD file. Is there a hidden assumption about where it is?

Here I'm throwing out some random thoughts (I'm not a big fan of either of these two):

2.1. Maybe we can use a prefix such as the following?

  • "/ontology/UA/entities/human/livelihood", and
  • "/ontology/BBN/entities/per"

2.2 or it looks like we can also go with a single ontology that everyone can view and edit (this is more elegant, but might be hard to do at this moment)
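Option 2.1 implies that consumers must split the combined string back into an ontology name and a concept path. A minimal sketch of that parsing, assuming every producer follows exactly the "/ontology/<NAME>/<concept>" convention; `split_concept` and the separate "ontology" field shown at the end are hypothetical.

```python
def split_concept(path, prefix="/ontology/"):
    """Split '/ontology/<NAME>/<concept...>' into (NAME, '/<concept...>').

    This only works if every producer follows the same convention, which is
    the extra shared knowledge a combined string requires.
    """
    if not path.startswith(prefix):
        raise ValueError(f"not a namespaced concept: {path!r}")
    name, _, rest = path[len(prefix):].partition("/")
    return name, "/" + rest


print(split_concept("/ontology/UA/entities/human/livelihood"))
# ('UA', '/entities/human/livelihood')

# The alternative: keep the ontology name in its own (hypothetical) field,
# so no parsing convention is needed.
grounding = {"ontology": "BBN", "ontologyConcept": "/entities/per"}
```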

Thanks,
Bonan

MihaiSurdeanu commented on August 17, 2024

On 7: we do not report the path to the ontology in the grounding now, but we should. @kwalcock: what is the simple idea for this (see also the work that Ajay is implementing this week)?

MihaiSurdeanu avatar MihaiSurdeanu commented on August 17, 2024

@bnmin: I slept on it, and I have an issue with item 4 above: you seem to include mentions for entities but not for events/relations. Correct? If so, wouldn't it be more elegant to store individual mentions for all extractions, and add a CorefersWith relation to link mentions of the same entity together?

bnmin avatar bnmin commented on August 17, 2024

Thanks @MihaiSurdeanu. A CorefersWith relation would be more elegant. However, we would prefer to use "Unifies" relations, which are broadly defined (they can group extractions that don't corefer with each other). This relation allows better visualization and lets us show a reasonable, abstract causal graph from limited extractions.

kwalcock avatar kwalcock commented on August 17, 2024

That's a lot to process, but I'll try to update the Wiki document based on the comments and the one example file from above. In general, I did not try to indicate at all whether or not something was required. I think that the convention is to ignore anything unrecognized, such as an unexpected path/filename field, but on the other hand to try to explain anything that might reasonably be expected. "If you see this, then it means..."

Regarding the ontology or ontologies, we recently added the ability to use multiple ontologies. In the example below they have just been given a name, "un" or "fao". This could be something more involved like "/ontology/UA/un", but I would hesitate to combine the name (or some other description) of the ontology and the ontologyConcept into one string, because separating the parts again requires extra knowledge about the conventions used. These ontologies are just local files which are published with the source code. Should they be made available in a more public way, with details in the JSON-LD output?

    "groundings" : [ {
      "@type" : "Groundings",
      "name" : "un",
      "values" : [ {
        "@type" : "Grounding",
        "ontologyConcept" : "/entities/human/livelihood",
        "value" : 0.506400226133985
      }, {
        "@type" : "Grounding",
        "ontologyConcept" : "/entities/human/government/government_entity",
        "value" : 0.4335030428381624
      } ]
    }, {
      "@type" : "Groundings",
      "name" : "fao",
      "values" : [ {
        "@type" : "Grounding",
        "ontologyConcept" : "/events/Value/Value of food imports over total merchandise exports (%) (3-year average)",
        "value" : 0.4484180772282565
      } ]
    } ]
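Because the ontology name and the ontologyConcept are separate fields, a consumer can use them without any string-parsing convention. A minimal sketch that picks the highest-scoring grounding per ontology from the fragment above (wrapped in braces so it parses on its own; the helper name is hypothetical):

```python
import json

GROUNDINGS_JSON = """
{ "groundings": [ {
    "@type": "Groundings", "name": "un",
    "values": [
      { "@type": "Grounding", "ontologyConcept": "/entities/human/livelihood",
        "value": 0.506400226133985 },
      { "@type": "Grounding", "ontologyConcept": "/entities/human/government/government_entity",
        "value": 0.4335030428381624 } ]
  }, {
    "@type": "Groundings", "name": "fao",
    "values": [
      { "@type": "Grounding",
        "ontologyConcept": "/events/Value/Value of food imports over total merchandise exports (%) (3-year average)",
        "value": 0.4484180772282565 } ]
  } ] }
"""


def top_grounding_per_ontology(node):
    """Map each ontology name to its highest-scoring (concept, value) pair."""
    best = {}
    for groundings in node.get("groundings", []):
        values = groundings.get("values", [])
        if values:
            top = max(values, key=lambda g: g["value"])
            best[groundings["name"]] = (top["ontologyConcept"], top["value"])
    return best


node = json.loads(GROUNDINGS_JSON)
for name, (concept, value) in top_grounding_per_ontology(node).items():
    print(f"{name}: {concept} ({value:.3f})")
```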

kwalcock avatar kwalcock commented on August 17, 2024

Right now we (UA) do not output the original document text, but only the processed sentence text. It would be good to see the original text without having to consult the original file. They (BBN) do record the original text at the sentence level (I'm not sure about any sentence separators), but then report documentCharPositions. To get to the text, one would have to concatenate all the sentence texts and then count to the correct position (give or take some separators?). Perhaps we should both include text at the document level. We could also allow for sentenceCharPositions if the provenance identifies a particular sentence.
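Without document-level text, a consumer has to rebuild the document from its sentences and guess the separators. A minimal sketch of that reconstruction, and of converting documentCharPositions into the sentenceCharPositions floated above, assuming a single-space separator and end-exclusive offsets (both hypothetical; the helper names are illustrative):

```python
def document_text(sentences, separator=" "):
    """Concatenate sentence texts. The separator is a guess: the original
    document may have used newlines or several spaces instead."""
    return separator.join(sentences)


def to_sentence_positions(sentences, doc_start, doc_end, separator=" "):
    """Convert document-level [doc_start, doc_end) offsets into
    (sentence_index, start, end) offsets within a single sentence."""
    offset = 0
    for i, sentence in enumerate(sentences):
        if offset <= doc_start and doc_end <= offset + len(sentence):
            return i, doc_start - offset, doc_end - offset
        offset += len(sentence) + len(separator)
    raise ValueError("span does not fall inside a single sentence")


sentences = ["Rainfall was low.", "Crop yields fell sharply."]
text = document_text(sentences)
start = text.index("Crop yields")
end = start + len("Crop yields")
print(text[start:end])                              # Crop yields
print(to_sentence_positions(sentences, start, end))  # (1, 0, 11)
```

If the guessed separator is wrong, every offset after the first sentence drifts, which is an argument for including the text at the document level.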

adarshp avatar adarshp commented on August 17, 2024

@kwalcock: I thought the original document text is contained in the Sentence objects (under Corpus/Document)? Is that not accurate?

kwalcock avatar kwalcock commented on August 17, 2024

FWIW, we have IDs like "@id" : "_:Document_1" rather than "@id": "SEN-ENG_NW_20180124-42" based on https://www.w3.org/TR/json-ld/#node-identifiers where it says

6.14 Identifying Blank Nodes
This section is non-normative.

At times, it becomes necessary to be able to express information without being able to uniquely identify the node with an IRI. This type of node is called a blank node. JSON-LD does not require all nodes to be identified using @id. However, some graph topologies may require identifiers to be serializable. Graphs containing loops, e.g., cannot be serialized using embedding alone, @id must be used to connect the nodes. In these situations, one can use blank node identifiers, which look like IRIs using an underscore (_) as scheme. This allows one to reference the node locally within the document, but makes it impossible to reference the node from an external document. The blank node identifier is scoped to the document in which it is used.
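Since blank node identifiers are scoped to the document, every reference to one must resolve inside the same JSON-LD output. A minimal sketch of such a check; the structure (Corpus, documents, extractions, provenance keys) is a simplified, hypothetical stand-in for the real output, and treating an "@id"-only object as a reference is a simplification of JSON-LD:

```python
def blank_node_ids(node, defined=None, referenced=None):
    """Walk a JSON-LD-like tree and collect blank node ids ("_:...").

    A dict with "@id" plus other keys counts as a definition; a dict whose
    only key is "@id" counts as a reference.
    """
    if defined is None:
        defined, referenced = set(), set()
    if isinstance(node, dict):
        node_id = node.get("@id")
        if isinstance(node_id, str) and node_id.startswith("_:"):
            (referenced if set(node) == {"@id"} else defined).add(node_id)
        for value in node.values():
            blank_node_ids(value, defined, referenced)
    elif isinstance(node, list):
        for item in node:
            blank_node_ids(item, defined, referenced)
    return defined, referenced


doc = {
    "@type": "Corpus",
    "documents": [{"@type": "Document", "@id": "_:Document_1", "sentences": [
        {"@type": "Sentence", "@id": "_:Sentence_1", "text": "Rainfall was low."}]}],
    "extractions": [{"@type": "Extraction", "@id": "_:Extraction_1",
                     "provenance": [{"document": {"@id": "_:Document_1"},
                                     "sentence": {"@id": "_:Sentence_1"}}]}],
}
defined, referenced = blank_node_ids(doc)
print(sorted(referenced - defined))  # [] means every reference resolves locally
```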

kwalcock avatar kwalcock commented on August 17, 2024

@adarshp, here's an example:

"text" : "The International Food Policy Research Institute -LRB- IFPRI -RRB- , established in 1975 , provides evidence-based"
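The -LRB- and -RRB- tokens are Penn Treebank escapes for parentheses, so the stored text is tokenizer output rather than the source. A rough sketch of how far one can get back toward readable text; the helper is hypothetical, and it cannot restore the original spacing, which is why storing the original text is preferable:

```python
import re

# Penn Treebank bracket escapes produced by common tokenizers.
PTB_ESCAPES = {"-LRB-": "(", "-RRB-": ")", "-LSB-": "[", "-RSB-": "]",
               "-LCB-": "{", "-RCB-": "}"}


def approximate_original(processed):
    """Undo bracket escapes and tidy spacing around brackets and punctuation.
    Only an approximation: the original whitespace is not recoverable."""
    text = " ".join(PTB_ESCAPES.get(tok, tok) for tok in processed.split())
    text = re.sub(r"([(\[{]) ", r"\1", text)        # no space after an opening bracket
    text = re.sub(r" ([)\]}.,;:!?])", r"\1", text)  # no space before closers/punctuation
    return text


s = ("The International Food Policy Research Institute -LRB- IFPRI -RRB- , "
     "established in 1975 , provides evidence-based")
print(approximate_original(s))
```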

adarshp avatar adarshp commented on August 17, 2024

Ok, got it. I think it would be good to have the original sentence text included with the JSON-LD output. If we encounter a link in a CAG that seems off, we can examine the sentence that produced it and try to understand why it happened (broken syntax/entity not being captured, etc). In other words, increased transparency, ability to 'drill down', etc.

kwalcock avatar kwalcock commented on August 17, 2024

If what are known in the table (https://github.com/clulab/eidos/wiki/JSON-LD2) as Entity and Event are consolidated, something needs to be done with Entity.mentions, Entity.state, Event.trigger, and Event.arguments. Entity.mentions might morph well into Event.arguments:

    Extraction type   Argument type   Notes
    "entity"          "mention"

But an argument is supposed to be an existing entity. Maybe the Mention is indeed a budding entity.

Maybe state and trigger are just not required for certain kinds of extractions.

kwalcock avatar kwalcock commented on August 17, 2024

The value "type": "/entity/GPE/Nation" for an entity, "labels": [ "Entity" ] found in the file, is not quite what we had in mind. I think that we would express it the other way around: "type": "entity" and "labels": [ "Nation", "GPE", "entity" ]. The extraction type (as in the table above) is thought to come from a small set of pre-defined values like causation, correlation, unification.
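The two conventions carry the same information, so one can be derived from the other. A minimal sketch of the conversion described above, assuming (hypothetically) that the first path segment always names the extraction type and that labels are ordered most specific first:

```python
def split_type_path(type_path):
    """Convert a path-style type such as "/entity/GPE/Nation" into a coarse
    extraction type plus labels ordered most specific first."""
    parts = [p for p in type_path.split("/") if p]
    if not parts:
        raise ValueError(f"empty type path: {type_path!r}")
    return {"type": parts[0], "labels": list(reversed(parts))}


print(split_type_path("/entity/GPE/Nation"))
# {'type': 'entity', 'labels': ['Nation', 'GPE', 'entity']}
```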

kwalcock avatar kwalcock commented on August 17, 2024
    State type    State text
    modality      Asserted
    genericity    Specific
    polarity

There was one table above that describes State. For us, the state is something like INC, DEC, or QUANT, which corresponds reasonably well. For the state text, however, we have actual text from the document rather than predefined values like Asserted and Specific above. Since we're using these fields differently, the two kinds of output may get mixed up in the file.

            "states": [
                {
                    "@type": "State", 
                    "text": "Asserted", 
                    "type": "modality"
                }, 
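One way to see the mix-up is that a consumer currently needs outside knowledge to tell what "text" means for a given State. A minimal sketch, with vocabularies taken from this thread and a hypothetical classifier (the "increased" state is made-up sample data):

```python
# Vocabularies from this thread; the split itself is a hypothetical convention.
UA_STATE_TYPES = {"INC", "DEC", "QUANT"}                  # "text" is document text
BBN_STATE_TYPES = {"modality", "genericity", "polarity"}  # "text" is a predefined value


def state_kind(state):
    """Report whether a State's text is document text or a predefined value."""
    state_type = state.get("type")
    if state_type in UA_STATE_TYPES:
        return "document text"
    if state_type in BBN_STATE_TYPES:
        return "predefined value"
    return "unknown"


states = [
    {"@type": "State", "text": "Asserted", "type": "modality"},
    {"@type": "State", "text": "increased", "type": "INC"},  # hypothetical sample
]
for s in states:
    print(s["type"], "->", state_kind(s))
```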

bnmin avatar bnmin commented on August 17, 2024

Thanks @kwalcock! I was confused by the usage of "type" and "labels". In your example, Entity doesn't have a "type" but has "labels" such as "labels" : [ "NounPhrase", "Entity" ]. That led us to believe that "labels" come from a small set (e.g., whether this is a NounPhrase and/or an Entity), while "type" comes from a large set of ontological types such as "/entity/GPE/Nation", etc.

Thanks,
Bonan

kwalcock avatar kwalcock commented on August 17, 2024

Re: "https://github.com/clulab/eidos/wiki/JSON-LD" namespace is certainly very useful, is it possible to put it in a place that can facilitate collaboration? An example suggested by ISI is w3id.org.

Most of the JSON-LD schemas are at schema.org, but I don't see a way to easily add your own schemas there. Our Wiki page has the additional disadvantage of not fully supporting HTML, so we can't easily make an anchor like https://github.com/clulab/eidos/wiki/JSON-LD#Corpus; the link only gets us close to the right spot.

For w3id.org we need to come up with a PROJECT-ID that fits their format, https://w3id.org/PROJECT-ID/SUB-ID... I also need to figure out how to get an .htaccess file to redirect to wherever the data really is, and then keep the data there up to date. It may be that the rewrite rules in the .htaccess file can convert wiki/JSON-LD/Corpus to something like wiki/JSON-LD#WikiStyleAnchorToCorpus. That would be an added bonus.
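The mapping such a rewrite rule would perform can be modeled in a few lines. This is a Python stand-in for the .htaccess logic, not an Apache configuration; PROJECT-ID, the regular expression, and the anchor format are all assumptions (GitHub may normalize wiki anchors differently, e.g. by lowercasing them):

```python
import re

# Hypothetical rule: https://w3id.org/<PROJECT-ID>/<Name> -> anchor on the wiki page.
RULE = re.compile(r"^https://w3id\.org/(?P<project>[^/]+)/(?P<name>[A-Za-z]+)$")
TARGET = "https://github.com/clulab/eidos/wiki/JSON-LD#{name}"


def redirect(url):
    """Return the redirect target for url, or None if the rule does not match."""
    match = RULE.match(url)
    return TARGET.format(name=match.group("name")) if match else None


print(redirect("https://w3id.org/PROJECT-ID/Corpus"))
# https://github.com/clulab/eidos/wiki/JSON-LD#Corpus
```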

kwalcock avatar kwalcock commented on August 17, 2024

Not much has been happening here, so this is meant to restart the conversation. Our output has changed slightly to address the more straightforward issues via commit #305. There is now document text, documentCharIntervals, provenance on almost everything, multiple groundings, and arrays of intervals.
There are several somewhat sticky issues remaining. One is combining what we called Entity, DirectedRelation, and UndirectedRelation in the JSON-LD into some unified kind of node/object; we've used the term extraction for this before. A comment above asks what to do with BBN's Entity.mentions, Entity.state, Event.trigger, and Event.arguments. Most could just be optional values, and Entity.mentions may become part of a Unifies relation. In his most recent comment before this one, Mihai expressed some concerns. Then there were some smaller issues with State: whether type and text are OK, or whether it is more type and value. See the table above.
In any case this all needs to be worked out. Please have at it.

MihaiSurdeanu avatar MihaiSurdeanu commented on August 17, 2024

Hi @bnmin,
I hope your LREC trip was great.
If you're back, any comments on @kwalcock's comment above?
Thank you!
Mihai

bnmin avatar bnmin commented on August 17, 2024

Thanks @MihaiSurdeanu
There are many points in @kwalcock's comment above that I don't fully understand. Could you please elaborate on each of them?

I understand that states, trigger, arguments can be optional values.

I think Entity.mentions ("CoreferWith" relations between mentions) are different from what a broadly-defined "Unifies" relation would ideally represent ("Unifies" can group extractions that don't corefer with each other)

For provenance/offsets, we used documentCharPositions which has been removed from the latest JSON-LD2 format?

I think it would be nice to have a way to track changes in the JSON-LD format so that we know what has been changed.

@MihaiSurdeanu We would be happy to hear your thoughts on our latest JSON-LD files for iteration 2, including but not limited to representation:)

Thanks,
Bonan

kwalcock avatar kwalcock commented on August 17, 2024

I don't think that documentCharPositions was ever in JSON-LD2, but I'll put it there. The plan is to adopt anything that is agreed upon from JSON-LD2 into the standard format and then drop this one. documentCharPositions was already advanced to JSON-LD.

I'll make an update to combine the three different kinds of extractions and note that some things are optional like the triggers.

To see the history of a Wiki page, click on the revisions text, apparently.

MihaiSurdeanu avatar MihaiSurdeanu commented on August 17, 2024

Thanks @bnmin, @kwalcock!

On the Unifies vs. CoreferWith relations: yes, they are different. The point I was trying to make was that coreference relations could be represented in a similar fashion, with an explicit relation that links two mentions (CoreferWith), rather than packaging them in the same Entity block. I think having the explicit CoreferWith relation is more elegant for at least two reasons:

  • We can include a score to indicate the strength of the connection.
  • We can then keep all Entity/Event objects as mentions. Right now it seems to me (if I understand it correctly!) that in the BBN output Events are mentions, whereas Entity aggregates all mentions of the same entity.

kwalcock avatar kwalcock commented on August 17, 2024

The JSON-LD2 page was updated. The Entity and Event have been combined into an Extraction (a better word is welcome). What used to be an Entity is now an Extraction with an Extraction type (see the bottom table) of "entity". Events have some other kind of Extraction type such as "causation", etc.

The trigger, state, and arguments may be difficult to tell apart, so here is a summary of the differences as far as JSON-LD is concerned.

    Type      Arity     Data                                    Description
    Trigger   singular  has text                                not an extraction, provenance only
    State     plural    has text and a string type from a       not an extraction, provenance only
                        known list (INC, DEC, QUANT)
    Argument  plural    has a string type from a known list     references an extraction with its own
                        (cause, effect, argument)               text and provenance
    Modifier  plural    has text                                attaches to state, has some grounding
                                                                info, provenance only

This may too closely match our software design in some places. The Trigger could easily be expressed as a State with a type TRIG.
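The table can be read as a small data model. A minimal sketch using Python dataclasses, where every class and field name is a hypothetical rendering of the table rather than the actual JSON-LD schema, and where states_with_trigger shows the Trigger folded into the states as a State of type TRIG:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Provenance:
    document_id: str
    start: int  # document character offset (end-exclusive, an assumption)
    end: int


@dataclass
class Modifier:   # plural; has text; attaches to a State
    text: str
    provenance: Provenance


@dataclass
class State:      # plural; text plus a type from a known list
    type: str     # e.g. "INC", "DEC", "QUANT"
    text: str
    provenance: Provenance
    modifiers: List[Modifier] = field(default_factory=list)


@dataclass
class Trigger:    # singular; text and provenance only
    text: str
    provenance: Provenance


@dataclass
class Argument:   # plural; typed reference to another Extraction
    type: str     # e.g. "cause", "effect", "argument"
    extraction_id: str


@dataclass
class Extraction:
    id: str
    type: str     # e.g. "entity", "causation"
    text: str
    provenance: Provenance
    trigger: Optional[Trigger] = None
    states: List[State] = field(default_factory=list)
    arguments: List[Argument] = field(default_factory=list)


def states_with_trigger(extraction):
    """Return the states, with any trigger folded in as a State of type "TRIG"."""
    states = list(extraction.states)
    if extraction.trigger is not None:
        t = extraction.trigger
        states.insert(0, State(type="TRIG", text=t.text, provenance=t.provenance))
    return states


sentence = "flooding caused higher food prices"
flood = Extraction("_:Extraction_1", "entity", "flooding", Provenance("doc1", 0, 8))
prices = Extraction("_:Extraction_2", "entity", "food prices", Provenance("doc1", 23, 34),
                    states=[State("INC", "higher", Provenance("doc1", 16, 22))])
cause = Extraction("_:Extraction_3", "causation", sentence, Provenance("doc1", 0, 34),
                   trigger=Trigger("caused", Provenance("doc1", 9, 15)),
                   arguments=[Argument("cause", flood.id), Argument("effect", prices.id)])
print([s.type for s in states_with_trigger(cause)])   # ['TRIG']
print([s.type for s in states_with_trigger(prices)])  # ['INC']
```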

The "mentions" will probably be discussed.

bnmin avatar bnmin commented on August 17, 2024

Thanks! @kwalcock @MihaiSurdeanu
Thanks for the hint on viewing revisions @kwalcock! As you can see I don't use github as much as I should:)
Agree with @MihaiSurdeanu on the value of CoreferWith relations. I think we can use CoreferWith relations to represent coreference (for entities or other types), in addition to the Unifies relations.

On @kwalcock 's post above, in general, these all make sense. I'll review with the team and let you know if we have questions.

Thanks,
Bonan

dgarijo avatar dgarijo commented on August 17, 2024

Hello,
@kwalcock
Regarding the w3id, some time ago I created https://w3id.org/wm/ontology (WM is for World Modelers). If you want, we can use that w3id for this effort, with something like https://w3id.org/wm/cag (or any other name that you prefer). Please let me know if you want me to help here; I have set up many w3ids in the past.

kwalcock avatar kwalcock commented on August 17, 2024

@dgarijo Thanks for the offer and the reminder. Somewhere above I wondered about what name we should use and nobody responded. If it turns out to be wm/*, you'll be sure to find out. @MihaiSurdeanu should be sure to weigh in. I wonder if there are any issues about who is in charge of the mapping.

MihaiSurdeanu avatar MihaiSurdeanu commented on August 17, 2024

@dgarijo's suggestion sounds great. @kwalcock: can we go with that namespace?
Thanks!

kwalcock avatar kwalcock commented on August 17, 2024

@dgarijo, there is a new issue #315 for namespace considerations.

kwalcock avatar kwalcock commented on August 17, 2024

The context is no longer based on the relatively short-lived https://github.com/clulab/eidos/wiki/JSON-LD but rather on https://w3id.org/wm/cag/ and the two JSON-LD Wiki pages have been updated to reflect that.

MihaiSurdeanu avatar MihaiSurdeanu commented on August 17, 2024

Hi @bnmin: to follow up on our conversation at NAACL, I think the only major thing remaining before we can declare success on this task is switching to a mention-based representation for Entities. Agree?

bnmin avatar bnmin commented on August 17, 2024

Yes! @MihaiSurdeanu Agree that we should switch to a mention-based representation for Entities. I think we would want to represent an Entity coreference cluster {e1, e2, e3} by 1) creating an event group e4 (also of type Event), 2) creating relations CoreferWith(e4, e1), CoreferWith(e4, e2), CoreferWith(e4, e3). Does this make sense? Thanks! - Bonan
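The proposal can be sketched directly: one group node of type Event plus one CoreferWith(group, member) relation per mention, with relations also typed as Event since Event and Relation were merged. The "CoreferenceGroup" label, the argument keys, the ids, and the scores are all hypothetical; the optional score reflects the earlier suggestion that each link can carry a strength:

```python
def coreference_cluster(group_id, mention_ids, scores=None):
    """Build a group node plus CoreferWith(group, member) relations."""
    group = {"@type": "Event", "@id": group_id, "labels": ["CoreferenceGroup"]}
    relations = []
    for i, mention_id in enumerate(mention_ids, start=1):
        relation = {
            "@type": "Event",
            "@id": f"{group_id}_CoreferWith_{i}",
            "labels": ["CoreferWith"],
            "arguments": {"group": {"@id": group_id}, "member": {"@id": mention_id}},
        }
        if scores is not None and mention_id in scores:
            relation["score"] = scores[mention_id]
        relations.append(relation)
    return group, relations


def members(group_id, relations):
    """Recover the mentions that corefer through a given group node."""
    return [r["arguments"]["member"]["@id"] for r in relations
            if "CoreferWith" in r["labels"] and r["arguments"]["group"]["@id"] == group_id]


# Illustrative scores only.
group, relations = coreference_cluster("_:e4", ["_:e1", "_:e2", "_:e3"],
                                       scores={"_:e2": 0.9, "_:e3": 0.7})
print(members("_:e4", relations))  # ['_:e1', '_:e2', '_:e3']
```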

MihaiSurdeanu avatar MihaiSurdeanu commented on August 17, 2024

Yes, it does. Thanks!
