
Comments (12)

jonathanrobie commented on August 14, 2024

Inline glosses are very useful for queries, and they also help people who do not read Hebrew or Greek get some idea of what is in the tree. I would really prefer glosses on the words themselves.

For Ref, SubjRef, and Frame, using an external dictionary may be the right way to go.

from macula-hebrew.

jonathanrobie commented on August 14, 2024

Let's raise a new issue for references to things larger than single words.

#10

For now, the references are what they are.

jonathanrobie commented on August 14, 2024

Participant referent data:
SubjRef appears only on verbs with implied subjects; format: SubjRef="{010010310021}"
Ref usually appears only on nouns, pronouns, or adjectives; format: Ref="{010010120082}"

@rkjtan - If I use SubjRef and Ref whenever the value is not "{}", will that give the right result? Are there instances where these have values but are not desired?
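As a sketch of the rule in the question (use SubjRef and Ref only when the value is not "{}"), the Python below pulls the morphIds out of an attribute value and returns nothing for "{}" or a missing attribute. The function name and the 12-digit pattern are assumptions based on the two example values above, not part of any existing macula-hebrew tool.

```python
import re

# MorphIds in these attributes are 12-digit strings such as "010010310021".
MORPH_ID = re.compile(r"\b\d{12}\b")


def referent_ids(value):
    """Return the morphIds in a Ref or SubjRef value, or [] if there are none.

    A missing attribute (None) and the empty set "{}" both mean "no referent".
    Collecting every 12-digit run is an assumption that avoids depending on
    the separator used if a value ever lists more than one id.
    """
    if value is None or value.strip() == "{}":
        return []
    return MORPH_ID.findall(value)


# Hypothetical attribute dictionaries standing in for two word nodes.
verb = {"SubjRef": "{010010310021}"}
noun = {"Ref": "{}"}

print(referent_ids(verb.get("SubjRef")))  # ['010010310021']
print(referent_ids(noun.get("Ref")))      # []
```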

jonathanrobie commented on August 14, 2024

Glosses:
The English and Chinese glosses in the full trees cannot be used. Mike has Andi's automatically calculated English and Chinese glosses, mapped for YTB & ClearSuite, which we should be able to use.

@themikejr - could you please tell me where to find these?

jonathanrobie commented on August 14, 2024

Semantic Roles:
Frame="{A0:010010310021; A1:010010310041;}"

@rkjtan I assume we also want FrameGloss?
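The Frame format above pairs each semantic role with a morphId. A minimal Python sketch of reading it into a role-to-id mapping follows; the role pattern only assumes labels like the A0/A1 in the example, and `parse_frame` is a hypothetical name.

```python
import re

# One "role:morphId" pair, e.g. "A0:010010310021".
ROLE_PAIR = re.compile(r"([A-Za-z0-9-]+)\s*:\s*(\d{12})")


def parse_frame(value):
    """Map each semantic role in a Frame value to the morphId it points to.

    "{}" (no frame) or a missing attribute yields an empty dict. If a role
    were repeated, the later pair would overwrite the earlier one.
    """
    if value is None or value.strip() == "{}":
        return {}
    return dict(ROLE_PAIR.findall(value))


frame = parse_frame("{A0:010010310021; A1:010010310041;}")
print(frame)  # {'A0': '010010310021', 'A1': '010010310041'}
```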

rkjtan commented on August 14, 2024

For SubjRef & Ref, there are only two known typos right now.

In Dan 4:29, the "SubjRef" of two verbs, MorphId 270040290281 (יִצְבֵּא) and 270040290291 (יִתְּנִנּ), is MorphId 270040290241. This is a typo: it should be 270040290231 ("the Most High"), not 270040290241.

In Esth 4:4, the "SubjRef" reference 170040040092 (for the verb תִּשְׁלַח with MorphId 170040040102) is a typo: it should be 170040040082, not 170040040092.
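Until these two typos are fixed in the source trees, one option is a small patch table keyed by the verb's MorphId and applied when the annotations are exported. The MorphIds below come from the two reports above; the table and `fix_subjref` are hypothetical.

```python
# Known SubjRef typos reported in this thread:
# verb MorphId -> (wrong referent, corrected referent)
SUBJREF_FIXES = {
    "270040290281": ("270040290241", "270040290231"),  # Dan 4:29
    "270040290291": ("270040290241", "270040290231"),  # Dan 4:29
    "170040040102": ("170040040092", "170040040082"),  # Esth 4:4
}


def fix_subjref(morph_id, subjref):
    """Return the SubjRef value with a known typo corrected, if there is one."""
    fix = SUBJREF_FIXES.get(morph_id)
    if fix is None:
        return subjref
    wrong, right = fix
    return subjref.replace(wrong, right)


print(fix_subjref("170040040102", "{170040040092}"))  # {170040040082}
print(fix_subjref("010010310021", "{010010310021}"))  # {010010310021}
```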

rkjtan commented on August 14, 2024

On the glosses, I sent you an email with a link to Andi's TSV file on Dropbox and also to Ulrik's application of these glosses to Andi's version of the OSHB trees. Mike may have another source.

jonathanrobie commented on August 14, 2024

Except for the typos mentioned above and the glosses from Mike, this should be close:

annotations.xml.zip

Here is the query I used to generate this:

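```xquery
(: Copy the annotation attributes of one word node, dropping empty values :)
declare function local:annotations($n)
{
  <node>{
      $n ! (
      @morphId,
      @StrongNumberX,
      @Vocative[.="True"],        (: only when the word is a vocative :)
      @SenseNumber[. ne "0"],
      
      @Frame[. ne "{}"],          (: "{}" means no semantic frame :)
      @FrameGloss,
      
      @Ref[. ne "{}"],            (: "{}" means no referent :)
      @SubjRef[. ne "{}"],
      
      @Greek[. ne ""],
      @GreekStrong[. ne "0"]
      )
  }</node>
};

(: One <node> per leaf Node (a Node with no element children), in morphId order :)
<annotations>{
  for $n in //Node[empty(*)]
  order by $n/@morphId
  return local:annotations($n) 
}</annotations>
```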

rkjtan commented on August 14, 2024

If we are not doing glosses for Ref or SubjRef, perhaps we shouldn't include a gloss for Frame in the XML either? One option is to use the Ids strictly as an index for finding the right words, including any glosses you have for them. That would display nicely in a UI, whereas adding glosses for every annotation might clog up the XML file. What do you think?
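A sketch of the index-only approach: the XML carries only ids, and the UI resolves them against a gloss table loaded once from a separate source. The gloss table contents and `gloss_refs` are hypothetical; the Frame value is the example quoted earlier in this thread.

```python
import re

MORPH_ID = re.compile(r"\b\d{12}\b")


def gloss_refs(value, glosses):
    """Resolve the ids in a Ref, SubjRef, or Frame value to glosses for display.

    `glosses` is any morphId -> gloss mapping; ids without a gloss are
    returned unchanged so the UI can still show something.
    """
    return [glosses.get(mid, mid) for mid in MORPH_ID.findall(value or "")]


# Illustrative placeholder table; real glosses would come from a separate file.
glosses = {"010010310021": "God"}
print(gloss_refs("{A0:010010310021; A1:010010310041;}", glosses))
# ['God', '010010310041']
```

This keeps the XML small, at the cost of requiring every consumer to load the gloss table separately.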

rkjtan commented on August 14, 2024

One important caveat about Ref, SubjRef, and Frame should be stated. When I did this work, I used Id rather than nodeId, because I knew that nodeIds could change and would lead to errors more easily. However, when a referent is actually a phrase rather than a single word, the Id doesn't cover the whole phrase. Whenever the word I refer to in Ref, SubjRef, or Frame is part of a larger phrase, the referent is really the phrase node, not just the single word identified by the Id. For example, if the referent is the phrase "Jesus Christ" rather than just "Jesus," I could only refer to the head of the phrase, "Jesus," with an Id, but the referent is "Jesus Christ." Going up to the full noun phrase headed by "Jesus" gives you the full referent.

The most extreme example in the OT is Ps 49:14-15. There, "Ref", "SubjRef", and "Frame" on a number of morphemes in these two verses (190490140022, 190490140071, 190490150031, 190490150052, 190490150072, 190490150113, 190490150152) refer to MorphId 190490140052. That Id refers to a "prep," because the referent is a pp that actually functions as a noun, "those after them/their followers." So the whole pp, functioning as an np, is the referent.

So, in the UI, we need to make sure that an Id in Ref, SubjRef, or Frame by default selects the whole phrase node of which the word with that Id is the head. I'm not sure of the best way to handle this for the GitHub release of the data itself; perhaps we could switch over to nodeId later, following the rule above, once nodeIds are stable.
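The rule above (an Id should select the whole phrase node headed by that word) can be sketched as a climb up the tree: keep moving to the parent while the parent's Head points back at the node you came from. This Python sketch assumes, as in the trees quoted in this thread, that Head is a zero-based index into a node's child Node elements. `referent_phrase` and its `stop_cats` cut-off are hypothetical; the sample is the Gen 2:7 "Yahweh God" phrase quoted later in the thread, reduced to the attributes used here.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<Node Cat="S" Rule="Np2S" Head="0" nodeId="010020070060091">
  <Node Cat="np" Rule="Np-Appos" Head="0" nodeId="010020070060090">
    <Node Cat="np" Rule="N2NP" Head="0" nodeId="010020070060041">
      <Node Cat="noun" morphId="010020070021" nodeId="010020070060040"/>
    </Node>
    <Node Cat="np" Rule="N2NP" Head="0" nodeId="010020070100051">
      <Node Cat="noun" morphId="010020070031" nodeId="010020070100050"/>
    </Node>
  </Node>
</Node>
"""


def referent_phrase(root, morph_id, stop_cats=("S",)):
    """Return the largest Node headed by the word with the given morphId.

    Parents whose Cat is in stop_cats are never entered; this cut-off is
    a hypothetical way to keep clause nodes out of the result.
    """
    parent_of = {child: parent for parent in root.iter("Node")
                 for child in parent.findall("Node")}
    node = root.find(f".//Node[@morphId='{morph_id}']")
    if node is None:
        raise KeyError(morph_id)
    while node in parent_of:
        parent = parent_of[node]
        if parent.get("Cat") in stop_cats:
            break
        # Head is read as a zero-based index into the parent's child Nodes.
        children = parent.findall("Node")
        if children[int(parent.get("Head", "0"))] is not node:
            break
        node = parent
    return node


root = ET.fromstring(SAMPLE.strip())
print(referent_phrase(root, "010020070021").get("nodeId"))  # 010020070060090
print(referent_phrase(root, "010020070031").get("nodeId"))  # 010020070100051
```

With `stop_cats=()` the word for Yahweh climbs into the enclosing S node, so some cut-off of this kind seems necessary. The same climb would also pass through a pp whose Head points at its noun phrase.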

jonathanrobie commented on August 14, 2024

With regard to phrase references (#3 (comment)), I think we need a good way to reference subtrees. One option might be to use the first and last morphemes spanned by a subtree as a reference to their least common ancestor. Would that be sufficient?

For instance, lca(010010010011, 010010010012) could refer to the least common ancestor of these two nodes in the following subtree:

```xml
<Node Cat="pp" Start="0" End="1" Rule="PrepNp" Head="1" nodeId="010010010010060" Length="6">
  <Node Cat="pp" Start="0" End="0" Rule="P2PP" Head="0" nodeId="010010010010011" Length="1">
    <Node n="010010010011" Cat="prep" Start="0" End="0" Length="1" morphId="010010010011" Unicode="בְּ" nodeId="010010010010010">
      <m n="010010010011" morph="R" lang="H" lemma="b" pos="preposition">בְּ</m>
    </Node>
  </Node>
  <Node Cat="np" Start="1" End="1" Rule="N2NP" Head="0" nodeId="010010010020051" Length="5">
    <Node n="010010010012" Cat="noun" Start="1" End="1" Length="5" morphId="010010010012" Unicode="רֵאשִׁית" nodeId="010010010020050">
      <m n="010010010012" morph="Ncfsa" lang="H" lemma="7225" after=" " pos="noun" type="common" gender="feminine" number="singular" state="absolute">רֵאשִׁית</m>
    </Node>
  </Node>
</Node>
```
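To check that the lca proposal is workable, here is a small Python sketch that resolves lca(first, last) over the subtree above, reduced to the attributes it needs. It assumes word Nodes are found by their morphId attribute and that both ids are in the same tree; `lca` and `node_by_morph_id` are illustrative names, not an existing macula-hebrew API.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<Node Cat="pp" Rule="PrepNp" Head="1" nodeId="010010010010060">
  <Node Cat="pp" Rule="P2PP" Head="0" nodeId="010010010010011">
    <Node Cat="prep" morphId="010010010011" nodeId="010010010010010"/>
  </Node>
  <Node Cat="np" Rule="N2NP" Head="0" nodeId="010010010020051">
    <Node Cat="noun" morphId="010010010012" nodeId="010010010020050"/>
  </Node>
</Node>
"""


def node_by_morph_id(root, morph_id):
    node = root.find(f".//Node[@morphId='{morph_id}']")
    if node is None:
        raise KeyError(morph_id)
    return node


def lca(root, first_id, last_id):
    """Return the least common ancestor of two word Nodes, given their morphIds."""
    parent_of = {child: parent for parent in root.iter("Node")
                 for child in parent.findall("Node")}

    def ancestors_or_self(node):
        path = [node]
        while path[-1] in parent_of:
            path.append(parent_of[path[-1]])
        return path

    first_path = {id(n) for n in ancestors_or_self(node_by_morph_id(root, first_id))}
    # The first ancestor of the last word that is also above the first word.
    for node in ancestors_or_self(node_by_morph_id(root, last_id)):
        if id(node) in first_path:
            return node
    raise ValueError("no common ancestor")  # not reachable within one tree


root = ET.fromstring(SAMPLE.strip())
print(lca(root, "010010010011", "010010010012").get("nodeId"))  # 010010010010060
```

One limitation shows up even in this small tree: lca(x, x) returns the word itself, so a pair of morphIds cannot pick out a unary node such as the P2PP pp that wraps בְּ.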

rkjtan commented on August 14, 2024

That might work. Note, however, that a prepositional phrase is usually not going to be a Ref or SubjRef (Frame might be different); the exceptions are the one case in Ps 49:14-15 in the OT and an unknown, small number of cases in the NT. A good example is this noun phrase from Gen 2:7:

```xml
<Node Cat="S" Start="2" End="3" Rule="Np2S" Head="0" nodeId="010020070060091" Length="9">
  <Node Cat="np" Start="2" End="3" Rule="Np-Appos" Head="0" nodeId="010020070060090" Length="9">
    <Node Cat="np" Start="2" End="2" Rule="N2NP" Head="0" nodeId="010020070060041" Length="4">
      <Node n="010020070021" Cat="noun" Start="2" End="2" Length="4" morphId="010020070021" Unicode="יְהוָ֨ה" nodeId="010020070060040"><m n="010020070021" lang="H" after=" " lemma="3068" morph="Np" id="01pPp" pos="noun" type="proper">יְהוָ֨ה</m></Node>
    </Node>
    <Node Cat="np" Start="3" End="3" Rule="N2NP" Head="0" nodeId="010020070100051" Length="5">
      <Node n="010020070031" Cat="noun" Start="3" End="3" Length="5" morphId="010020070031" Unicode="אֱלֹהִ֜ים" nodeId="010020070100050"><m n="010020070031" lang="H" after=" " lemma="430" morph="Ncmpa" id="01ieN" pos="noun" type="common" gender="masculine" number="plural" state="absolute">אֱלֹהִ֜ים</m></Node>
    </Node>
  </Node>
</Node>
```

"Yahweh God" is the implied subject of the verb "he breathed" (n="010020070082"), but the SubjRef uses only 010020070021 for Yahweh, because Yahweh is the head of the noun phrase. Head, Start, and End are zero-based, so Start="2" End="3" tells us that words 3 and 4 of the verse are in the phrase, and Head="0" tells us that the first word inside the phrase, "Yahweh," is its head. We can safely take the whole noun phrase as the referent.

You could in theory use the nodeId instead. However, nodeId="010020070060090" is currently consonant-based; if it were word-based, it would be 010020070030020.

For Ref and SubjRef there is one complication: prepositional phrases (nodes with Cat="pp" Rule="PrepNp") in theory still take the noun as their head. So we could end up systematically pulling in the prepositional phrase when only the noun phrase that is the object of the preposition is the referent. To avoid this, we will want to consistently make the preposition the head of a prepositional phrase.

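The two nodeIds in the comment above suggest a layout for these ids. Reading 010020070060090 as book 01, chapter 002, verse 007, start 006, length 009, and a final digit 0 agrees with Length="9" on that node, and the word-based form 010020070030020 follows the same layout with start 003 and length 002 (from Start="2" and End="3", zero-based). The Python sketch below encodes that reading. The field widths and the meaning of the final digit are inferred from the ids quoted in this thread rather than from a published specification, and the function names are hypothetical.

```python
from typing import NamedTuple


class NodeId(NamedTuple):
    """Fields of a 15-digit nodeId, as inferred from the examples in this thread."""
    book: int
    chapter: int
    verse: int
    start: int   # 1-based position of the first unit (consonant or word) covered
    length: int  # number of units covered
    level: int   # appears to separate nodes with the same span (0 = lowest)


def parse_node_id(node_id: str) -> NodeId:
    """Split a nodeId into 2+3+3+3+3+1 digit fields (inferred layout)."""
    if len(node_id) != 15 or not node_id.isdigit():
        raise ValueError(f"expected 15 digits, got {node_id!r}")
    return NodeId(
        book=int(node_id[0:2]),
        chapter=int(node_id[2:5]),
        verse=int(node_id[5:8]),
        start=int(node_id[8:11]),
        length=int(node_id[11:14]),
        level=int(node_id[14]),
    )


def word_based_node_id(book: int, chapter: int, verse: int,
                       start: int, end: int, level: int = 0) -> str:
    """Build a word-based nodeId from the zero-based Start/End attributes."""
    return f"{book:02d}{chapter:03d}{verse:03d}{start + 1:03d}{end - start + 1:03d}{level}"


print(parse_node_id("010020070060090"))
# NodeId(book=1, chapter=2, verse=7, start=6, length=9, level=0)
print(word_based_node_id(1, 2, 7, start=2, end=3))  # 010020070030020
```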