nlp2rdf / ontologies

All ontologies used in NIF 2.0 (NIF-Core + vocabulary modules + helper modules)

Home Page: http://nlp2rdf.org

ApacheConf 1.48% Shell 1.48% Web Ontology Language 3.69% HTML 93.35%


ontologies's Issues

add and describe nif:taNerdCoreClassRef

nif:taNerdCoreClassRef

ITSRDF properties for confidence and provenance

@VladimirAlexiev wrote:

Just noticed that this proposal also disregards the direct props that exist in ITSRDF [...]

I added a new section in the docs (http://nif.readthedocs.org/en/2.1-rc/prov-and-conf.html#relation-of-nif-2-1-companion-properties-to-itsrdf-properties) discussing the ITS semantics and why we decided to create our own complementary versions.

Commit 464c0d7 also added notes in the ontology documents and formal OWL declarations of their relatedness, to the maximum degree possible from my point of view without imposing OWL inference ramifications on the ITSRDF project without prior coordination.

nif:topic assertions contain illegal punning

This property structurally violates OWL 2 DL constraints (it is declared an owl:DatatypeProperty, but its range nif:Annotation is a class):

nif:topic 
    a owl:DatatypeProperty ;
    owl:versionInfo "0.0.1" ; 
    rdfs:label "topic" ;
    rdfs:comment """The topic of a string
    Changelog:
    * 0.0.1 initial commit of property"""@en ;
    rdfs:domain nif:String ;
    rdfs:range nif:Annotation .

@der-bruemmer Sebastian told me you added this property a bit 'quick-and-dirty' for GERBIL. How is it actually used there? (ObjectProperty pointing to nif:Annotation vs. DatatypeProperty with some string description)
@der-bruemmer Can you maybe also suggest a more fleshed-out description with better guidance when and how usage of this property is appropriate?
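
For illustration, the two readings asked about above could be declared roughly as follows. This is only a sketch: which reading matches the GERBIL usage is exactly the open question, and the xsd:string range in variant B is an assumption, not something stated in the ontology.

```turtle
@prefix nif:  <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# Variant A: keep the class range, so the property has to be an object property.
nif:topic
    a owl:ObjectProperty ;
    rdfs:domain nif:String ;
    rdfs:range nif:Annotation .

# Variant B (assumed alternative; use instead of A, never together with it):
# keep the datatype property and give it a datatype range.
# nif:topic
#     a owl:DatatypeProperty ;
#     rdfs:domain nif:String ;
#     rdfs:range xsd:string .
```

Either variant on its own removes the clash between the property type and the range.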

Are lots of sub-classes and sub-properties needed?

Consider the consequences of making sub-classes and sub-properties. Economy of representation (number of triples) is an important consideration to keep NLP as RDF a feasible idea (because NLP generates a lot of data), and NIF 2 thought carefully about that (counting triples for the Simple, Stanbol and OpenAnnotation profiles).
Injudicious use of sub-classes and sub-properties might induce NIF users to abandon RDFS... or NIF itself.
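
To make the cost concrete, here is a small sketch of what happens to a single annotated string once sub-property inferences are materialized. It assumes the sub-property axioms discussed for NIF 2.1 (itsrdf:taIdentRef under nif:annotation, itsrdf:taConfidence under nif:confidence); the document URI is a placeholder.

```turtle
@prefix nif:    <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .

# Asserted by the NLP tool: 2 annotation triples.
<http://example.org/document/1#offset_20_24>
    itsrdf:taIdentRef <http://dbpedia.org/resource/Bill_Clinton> ;
    itsrdf:taConfidence "0.9"^^xsd:decimal .

# Added when the assumed sub-property axioms are materialized: 2 more triples.
<http://example.org/document/1#offset_20_24>
    nif:annotation <http://dbpedia.org/resource/Bill_Clinton> ;
    nif:confidence "0.9"^^xsd:decimal .
```

In this sketch the annotation triples double, and every further level of sub-properties adds another copy.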

remove nif:topic

@der-bruemmer says in #10: "The point of this property was to annotate a complete text with topical tags or entity classes, as opposed to a single entity... I'll ask the Gerbillians for more information."

These questions of mine were not answered, and more information from Gerbilians was not received:

If it's a property of the complete text, why not define the domain as nif:Context?
For "topical tag", why not use dct:subject but invent a new property?
(Actually, its:taIdentRef is better for this purpose)
For "entity class" why not use its:taClassRef but invent a new property?

I don't see a need for such property, we can use its:taIdentRef or its:taClassRef
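
A rough sketch of what that could look like for a whole text, using only existing properties. The document URI, the chosen DBpedia resources and the use of nif:Context as the subject are illustrative assumptions.

```turtle
@prefix nif:    <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .

# Hypothetical document: the nif:Context stands for the complete text.
<http://example.org/document/1#offset_0_1000>
    a nif:Context ;
    # "topical tag" for the whole text
    itsrdf:taIdentRef <http://dbpedia.org/resource/Jazz> ;
    # "entity class" for the whole text
    itsrdf:taClassRef <http://dbpedia.org/ontology/Organisation> .
```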

comment on itsrdf:taAnnotatorsRef

In http://vladimiralexiev.github.io/Multisensor/#sec-4 I got this:

<#char=1116,1123> a nif:Word;
  itsrdf:taClassRef nerd:Organization;
  itsrdf:taConfidence 0.9; # means the same as "0.9"^^xsd:decimal
  itsrdf:taAnnotatorsRef "text-analysis|http://linguatec.com". # !!!!

And later: "itsrdf:taAnnotatorsRef is not a URL but a specially formatted string (coming from the XML heritage of ITS, see 5.7 ITS Tools Annotation)"

But ITSRDF defines itsrdf:taAnnotatorsRef as an owl:ObjectProperty, so its value should be a URL, not a "specially formatted string". So my mind was badly twisted, interpreting XML data into RDF in such a twisted way.

To help people like me, please add a comment eg

itsrdf:taAnnotatorsRef rdfs:comment 
"""URL or URI of the software or person that made the annotation, eg 
<http://some-company.com> or <http://some-company.com/some-software> or 
<http://some-company.com/some-software/v1.23>.

Unlike ITS XML which uses a specially formatted string 
(e.g. "text-analysis|http://some-company.com"),
use a proper URL or URI, so you can attach extra info to it, eg

<http://some-company.com/some-software/v1.23>
  a prov:SoftwareAgent, doap:Version ;
  doap:shortdesc "Some Company's powerful Software" ;
  doap:revision "1.23".

Both itsrdf:taAnnotatorsRef and itsrdf:taConfidence apply to all annotations attached to the same node,
including itsrdf:taClassRef, itsrdf:taIdentRef, nif:oliaLink, nif:oliaClass, etc
"""

(Note: this assumes that NIF uses itsrdf:taAnnotatorsRef and not nif:provenance, as per #15. If not, put this comment on nif:provenance.)
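
Applied to the Multisensor snippet above, the corrected data might look like the sketch below. The @base value is a placeholder, the nerd: namespace is written from memory and should be checked, and dropping the "text-analysis|" part of the ITS string is one possible reading of the proposal.

```turtle
@base <http://example.org/document/1> .
@prefix nif:    <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix nerd:   <http://nerd.eurecom.fr/ontology#> .
@prefix prov:   <http://www.w3.org/ns/prov#> .

<#char=1116,1123> a nif:Word ;
    itsrdf:taClassRef nerd:Organization ;
    itsrdf:taConfidence 0.9 ;
    # a URL instead of the string "text-analysis|http://linguatec.com"
    itsrdf:taAnnotatorsRef <http://linguatec.com> .

# Extra information can now be attached to the annotator itself.
<http://linguatec.com> a prov:Agent .
```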

nif:AnnotationUnit vs nifs:Annotation vs fise:EntityAnnotation vs fam:EntityAnnotation

@kurzum "do you also agree to merge nifs:Annotation from https://github.com/NLP2RDF/ontologies/blob/master/nif-core/nif-stanbol.ttl#L78 into nif-core and rename it AnnotationUnit ? We really need the feature to express basic alternative annotations."

Such a node is definitely needed. But let's examine the "prior art". In chronological order I think it is:

1. FISE (Stanbol): http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html
2. http://jens-lehmann.org/files/2013/iswc_nif.pdf fig.3, which uses FISE
3. NIF Stanbol: http://persistence.uni-leipzig.org/nlp2rdf/specification/stanbol.html, incl. nifs:Annotation
4. FAM (Fusepool P3): https://github.com/fusepoolP3/overall-architecture/blob/master/wp3/fp-anno-model/fp-anno-model.md

2 is contradicted by 3, eg

2                       3
nif:EntityAnnotation    fise:EntityAnnotation
nifs:extractedFrom      fise:extracted-from
nif:oliaConf            fise:confidence
But I guess 2 is older and 3 is current.

I don't know if someone has used 3 (NIF Stanbol). Multisensor hasn't.

I'd be happy if you consolidate all this into AnnotationUnit and friends.
KEY QUESTION: I guess that with NIF 2.1, the Stanbol profile will go away?

Ideally and if possible, this should be coordinated with the FAM people, but I don't know any of them...

An additional property for annotation assertions grouped by distinct individual?

As previously discussed, we want a hybrid scheme for annotations in NIF 3.0:

a lean way to attach annotations directly to nif:Strings when there is only one annotation per aspect
a scheme with an intermediate (blank) node for alternative annotations concerning the same aspect

Here is my idea for details, already as draft for documentation text:

==== QUOTE START====

NIF 2.1 annotation schemes

NIF 2.1 offers two schemes to attach annotations to a nif:String individual S:

direct attachment

Assertions comprising the information for a given annotation are attached
directly, S being the subject of the corresponding triples. Several
sub-properties of nif:annotation are provided for this approach,
e.g. itsrdf:taIdentRef, nif:oliaLink.

If one intends to specify confidence and provenance information, no more than
one annotation property assertion per aspect may be attached directly, to prevent ambiguity:

<http://example.org/document/1#offset_20_24>
    a nif:String, nif:OffsetBasedString ;
    nif:beginIndex "20"^^xsd:int ;
    nif:endIndex "24"^^xsd:int ;
    nif:anchorOf "Bill"^^xsd:string ;
    itsrdf:taIdentRef <http://dbpedia.org/resource/Bill_Clinton> ;
    itsrdf:taConfidence "0.9"^^xsd:decimal .
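
For contrast, a sketch of the ambiguous case this rule is meant to exclude. The second link is chosen purely for illustration.

```turtle
@prefix nif:    <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .

# NOT recommended: two itsrdf:taIdentRef values attached directly.
# The data cannot tell which of the two links the confidence 0.9 refers to.
<http://example.org/document/1#offset_20_24>
    a nif:String ;
    itsrdf:taIdentRef <http://dbpedia.org/resource/Bill_Clinton> ,
                      <http://dbpedia.org/resource/Bill_Evans> ;
    itsrdf:taConfidence "0.9"^^xsd:decimal .
```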

To allow use of the direct attachment scheme to annotate several aspects of S
simultaneously, corresponding specialisations of nif:confidence and nif:provenance
are provided for some of the specialisations of nif:annotation (e.g. nif:oliaConf
and nif:oliaProv for nif:oliaLink):

<http://example.org/document/1#offset_20_24>
    a nif:String, nif:OffsetBasedString ;
    nif:beginIndex "20"^^xsd:int ;
    nif:endIndex "24"^^xsd:int ;
    nif:anchorOf "Bill"^^xsd:string ;
    itsrdf:taIdentRef <http://dbpedia.org/resource/Bill_Clinton> ;
    itsrdf:taConfidence "0.9"^^xsd:decimal ;
    nif:oliaLink <http://acoli.cs.uni-frankfurt.de/resources/olia/penn.owl#NNP> ;
    nif:oliaConf "0.95"^^xsd:decimal .

related T-Box assertions / constraints

SubObjectPropertyOf(itsrdf:taIdentRef nif:annotation)
SubObjectPropertyOf(nif:oliaLink nif:annotation)

SubDataPropertyOf(itsrdf:taConfidence nif:confidence)
SubDataPropertyOf(nif:oliaConf nif:confidence)
FunctionalDataProperty(itsrdf:taConfidence) # this still has to be coordinated with the itsrdf maintainers
FunctionalDataProperty(nif:oliaConf)

individual per annotation

A new nif:AnnotationUnit individual is introduced for each annotation and
connected to S using nif:annotationUnit. Using this scheme, several
annotations for the same annotation aspect can be expressed for S
(for example, different links to Linked Data resources for the same
token obtained from several Named Entity Recognition and
Disambiguation systems).

<http://example.org/document/1#offset_20_24>
    a nif:String, nif:OffsetBasedString ;
    nif:beginIndex "20"^^xsd:int ;
    nif:endIndex "24"^^xsd:int ;
    nif:anchorOf "Bill"^^xsd:string ;
    nif:annotationUnit [
        itsrdf:taIdentRef <http://dbpedia.org/resource/Bill_Evans> ;
        itsrdf:taConfidence "0.8"^^xsd:decimal ;
    ] .

related T-Box assertions / constraints

ObjectPropertyDomain(nif:annotationUnit nif:String)
ObjectPropertyRange(nif:annotationUnit nif:AnnotationUnit)

Each nif:AnnotationUnit carries exactly one piece of annotation
information. This can be a property assertion with nif:annotation,
nif:classAnnotation or nif:literalAnnotation, or the inherent
annotation of a text span if the annotation unit is also
a nif:TextSpanAnnotation.

There is no more than one confidence and provenance assertion to
prevent ambiguity:

SubClassOf(nif:AnnotationUnit DataMaxCardinality( 1 nif:confidence))
SubClassOf(nif:AnnotationUnit ObjectMaxCardinality( 1 nif:provenance))
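
As an illustration, a single annotation unit that respects both cardinality constraints could look like this. The explicit typing of the blank node and the tool URI under example.org are assumptions made for the sketch.

```turtle
@prefix nif:    <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .

<http://example.org/document/1#offset_20_24>
    a nif:String ;
    nif:annotationUnit [
        a nif:AnnotationUnit ;
        # exactly one piece of annotation information
        itsrdf:taIdentRef <http://dbpedia.org/resource/Bill_Evans> ;
        # at most one confidence value and one provenance reference
        itsrdf:taConfidence "0.8"^^xsd:decimal ;
        nif:provenance <http://example.org/tools/ner-a>
    ] .
```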

unified use and querying of both schemes

Naturally, both annotation schemes can be combined for a nif:String individual:

<http://example.org/document/1#offset_20_24>
    a nif:String, nif:OffsetBasedString ;
    nif:beginIndex "20"^^xsd:int ;
    nif:endIndex "24"^^xsd:int ;
    nif:anchorOf "Bill"^^xsd:string ;
    itsrdf:taIdentRef <http://dbpedia.org/resource/Bill_Clinton> ;
    itsrdf:taConfidence "0.9"^^xsd:decimal ;
    nif:annotationUnit [
        itsrdf:taIdentRef <http://dbpedia.org/resource/Bill_Evans_(saxophonist)> ;
        itsrdf:taConfidence "0.4"^^xsd:decimal 
    ] ;
    nif:annotationUnit [
        itsrdf:taIdentRef <http://dbpedia.org/resource/Bill_Evans> ;
        itsrdf:taConfidence "0.8"^^xsd:decimal ;
    ] ;
    nif:annotationUnit [
        itsrdf:taIdentRef <http://dbpedia.org/resource/Bill_Stewart_(musician)> ;
        itsrdf:taConfidence "0.2"^^xsd:decimal
    ].

The property path feature in SPARQL allows for a concise way to access
annotation expressed using both schemes simultaneously:

PREFIX nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#>
PREFIX itsrdf: <http://www.w3.org/2005/11/its/rdf#>

select ?str ?link ?conf {
 ?str nif:annotationUnit? [
      itsrdf:taIdentRef ?link ;
      itsrdf:taConfidence ?conf
  ] ;
  a nif:String .
}

If the SPARQL processor in use supports RDFS inference (or if the relevant inferences are
materialized), it is possible to query more generally for all (object) annotations in a
NIF document in a similar fashion:

PREFIX nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#>

select ?str ?anno ?conf {
 ?str nif:annotationUnit? [
      nif:annotation ?anno ;
      nif:confidence ?conf
  ] ;
  a nif:String .
}

==== QUOTE END====

Do you also deem a specific nif:annotationUnit property justified and useful?

location of NIF3.0 and issue tracker

Dimitris Kontokostas told me NIF 3.0 is coming out at the end of this month. Where can I take a look at a draft and change log?
What's the preferred tracker? I see issues tracked here, but some months ago I posted a couple at https://github.com/NLP2RDF/specification/issues
Is it correct to make a reference to a linguistic resource, e.g. itsrdf:taIdentRef bn:123456? I thought that taIdentRef is for named entities. I like such usage, but it should be clarified with an example.

Cheers!

update dc:description in the ontology

At the end of the dc:description values some updates are necessary, e.g. who to contact for feedback.

annotations over strings

The idea is to enable annotations over strings, so that we could track provenance at the annotation level if needed. Currently, it is proposed to use nif:AnnotationGroup for this purpose.

URL persistence vs modularisation

Modularisation of NIF moved a bunch of elements from the nif: namespace to the nif-ann: namespace. See "deprecated" in https://github.com/NLP2RDF/ontologies/blob/nif2.1/nif-core/nif-core.ttl.

Modularization is all nice, but have you considered the tons (megatons!) of NIF data already in existence? Is the word persistence in http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core# merely a charade?

I think this one change will prevent us in Multisensor from using NIF 2.1. I think that an overriding priority in NIF 2.1 must be backward compatibility.

Note: backward compatibility only starts with URL permanence. Other issues to consider are the consequences of making sub-classes and sub-properties. Economy of representation (number of triples) is an important consideration to keep NLP as RDF a feasible idea, and NIF 2 thought carefully about that. Injudicious use of sub-classes and sub-properties will force NIF users to abandon RDFS... or NIF itself.

nif:wasConvertedFrom

define that the conversion is done between different URI syntax patterns.

mix of namespaces "nifs" and "nif" for Annotation class

In the nif-stanbol.ttl ontology you can find the class Annotation under two namespaces, nif and nifs.

nif:lang has multiple domains

nif-core.ttl defines nif:lang to have two domains:

# in manchester: "String and not Context"
rdfs:domain nif:String ;
# additional constraint, must not be used on a nif:Context.
rdfs:domain [ rdf:type owl:Class ; owl:complementOf nif:Context ] . 
# too complex: rdfs:domain [ owl:intersectionOf (  nif:String 
    #                         [ rdf:type owl:Class ; owl:complementOf nif:Context ]     ) ] .

Unfortunately this doesn't work: rather than constraining the subject of nif:lang, it will infer said subject to have both types. So either:

1. Use the "too complex" definition.
2. Use just nif:String and leave the rest to a RDFUnit test case.

I favor 2.
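
To illustrate why the two separate domains do not act as a constraint, here is a sketch with hypothetical data in which nif:lang is used on a nif:Context by mistake. The document URI and the language URI are placeholders.

```turtle
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# Asserted (hypothetical): nif:lang on a context, which the comment forbids.
<http://example.org/document/1#offset_0_1000>
    a nif:Context ;
    nif:lang <http://lexvo.org/id/iso639-3/eng> .

# Entailed by the two rdfs:domain axioms; the triple above is not rejected:
<http://example.org/document/1#offset_0_1000>
    a nif:String ;
    a [ a owl:Class ; owl:complementOf nif:Context ] .
```

A plain RDFS reasoner simply adds both types. Only an OWL reasoner would go on to report that the subject is now both a nif:Context and a member of its complement.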

For 1, you should mention it's an owl:Class as follows.

     rdfs:domain [ rdf:type owl:Class ;
                   owl:intersectionOf ( nif:String
                                        [ rdf:type owl:Class ;
                                          owl:complementOf nif:Context ])].

Don't know if it's critical but you can verify it by converting this Manchester notation to Turtle using this convertor

Prefix: nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#>
Class: nif:String
Class: nif:Context
ObjectProperty: nif:lang
    Domain: 
        nif:String and not nif:Context
