Additional Data to Port over from Original Trees about macula-hebrew HOT 12 OPEN

rkjtan commented on August 14, 2024

Additional Data to Port over from Original Trees

from macula-hebrew.

Comments (12)

jonathanrobie commented on August 14, 2024 1

Inline glosses are very useful for queries and also to help people who do not read Hebrew or Greek have some idea what is in the tree. I would really prefer glosses on the words themselves.

For Ref, SubjRef, and Frame, using an external dictionary may be the right way to go.

jonathanrobie commented on August 14, 2024 1

Let's raise a new issue for references to things larger than single words.

#10

For now, the references are what they are.

jonathanrobie commented on August 14, 2024

Participant referent data:
SubjRef only on verbs with implied subjects; format SubjRef="{010010310021}"
Ref only on nouns, pronouns, or adjectives usually; format Ref="{010010120082}"

@rkjtan - If I use SubjRef and Ref whenever the value is not "{}", will that give the right result? Are there instances where these have values but are not desired?
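
As a rough illustration of the "use the value whenever it is not {}" rule, here is a minimal Python sketch. It assumes each value is a brace-wrapped list of numeric morph Ids, as in the two formats above; `parse_ref` is a hypothetical helper name, not something in the repository.

```python
import re

def parse_ref(value):
    """Return the morph Ids in a Ref/SubjRef value; [] for "{}" or a missing attribute."""
    # Assumption: Ids are plain digit runs inside the braces, e.g. "{010010310021}".
    if not value or value == "{}":
        return []
    return re.findall(r"\d+", value)

print(parse_ref("{010010310021}"))
print(parse_ref("{}"))
```

Returning a list keeps the door open for values that name more than one Id, if such values exist in the data.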

jonathanrobie commented on August 14, 2024

Glosses:
English and Chinese glosses in the full trees cannot be used. Mike has Andi's automatically calculated glosses for English and Chinese, mapped for YTB & ClearSuite, which we should be able to use.

@themikejr - could you please tell me where to find these?

jonathanrobie commented on August 14, 2024

Semantic Roles:
Frame="{A0:010010310021; A1:010010310041;}"

@rkjtan I assume we also want FrameGloss?
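
For anyone consuming the Frame attribute, a small Python sketch of how the value could be split into roles. It assumes the format is exactly as shown above (semicolon-separated `label:Id` pairs inside braces); `parse_frame` is a hypothetical name, and storing a list per role is my assumption in case a label can repeat.

```python
def parse_frame(value):
    """Map role labels (A0, A1, ...) to lists of morph Ids for one Frame value."""
    roles = {}
    for part in value.strip("{}").split(";"):
        part = part.strip()
        if not part:  # skips the empty piece after the trailing ";" and the "{}" case
            continue
        label, _, morph_id = part.partition(":")
        roles.setdefault(label.strip(), []).append(morph_id.strip())
    return roles

print(parse_frame("{A0:010010310021; A1:010010310041;}"))
```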

rkjtan commented on August 14, 2024

For SubjRef & Ref, there are only two known typos right now.

In Da4:29, the "SubjRef" on two verbs, MorphId 270040290281 (יִצְבֵּא‎) and 270040290291 (יִתְּנִנּ‎), points to MorphId 270040290241. This is a typo: it should be 270040290231 ("the Most High"), not 270040290241.

In Es4:4, the "SubjRef" reference 170040040092 (for the verb תִּשְׁלַח‎ with MorphId 170040040102) is a typo: it should be 170040040082, not 170040040092.
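
If it helps, the two fixes could be applied as a small patch table when porting the data. This is only a sketch in Python: the Ids come from the descriptions above, while `SUBJREF_FIXES` and `fix_subjref` are hypothetical names.

```python
# verb morphId -> (wrong SubjRef Id, corrected SubjRef Id), from the two known typos
SUBJREF_FIXES = {
    "270040290281": ("270040290241", "270040290231"),
    "270040290291": ("270040290241", "270040290231"),
    "170040040102": ("170040040092", "170040040082"),
}

def fix_subjref(morph_id, value):
    """Return the SubjRef value with the known typo corrected, or unchanged."""
    fix = SUBJREF_FIXES.get(morph_id)
    if fix is None:
        return value
    wrong, right = fix
    return value.replace(wrong, right)

print(fix_subjref("170040040102", "{170040040092}"))
```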

rkjtan commented on August 14, 2024

On the glosses, I sent you an email with a link to Andi’s TSV file on Dropbox and also to Ulrik's application of these glosses to Andi's version of the OSHB trees. Mike may have another source.

jonathanrobie commented on August 14, 2024

Except for the typos mentioned above and the glosses from Mike, this should be close:

annotations.xml.zip

Here is the query I used to generate this:

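(: Copy the annotation attributes worth porting from one terminal node. :)
declare function local:annotations($n)
{
  <node>{
      $n ! (
      @morphId,
      @StrongNumberX,
      @Vocative[.="True"],
      @SenseNumber[. ne "0"],

      (: semantic roles; an empty-brace value means no frame :)
      @Frame[. ne "{}"],
      @FrameGloss,

      (: participant referents; an empty-brace value means no reference :)
      @Ref[. ne "{}"],
      @SubjRef[. ne "{}"],

      @Greek[. ne ""],
      @GreekStrong[. ne "0"]
      )
  }</node>
};

(: One <node> per childless Node, ordered by morphId. :)
<annotations>{
  for $n in //Node[empty(*)]
  order by $n/@morphId
  return local:annotations($n)
}</annotations>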

rkjtan commented on August 14, 2024

If we are not including glosses for Ref or SubjRef, perhaps we shouldn't include glosses for Frame in the XML either? One way to go is to use the Ids strictly as an index for finding the right words, including any glosses you have for them. This would show up nicely in a UI. Adding glosses for every annotation, however, might clog up the XML file. What do you think?

rkjtan commented on August 14, 2024

One important caveat for Ref, SubjRef, & Frame should be expressed: at the time I was doing the work, I used Id because I knew that nodeIds could change & lead to errors more easily. However, when a referent is actually a phrase rather than a single word, the Id doesn't cover the whole phrase. In other words, whenever the word that I refer to in Ref, SubjRef, or Frame is part of a larger phrase, it is really the phrase node that is the referent & not just the single word identified by the Id.

For example, if the referent is the phrase "Jesus Christ" rather than just "Jesus," I was only able to refer to the head of the phrase, "Jesus," with the Id. However, the referent is "Jesus Christ." By going up to the full noun phrase with "Jesus" as head, you find the full referent.

The most extreme example in the OT is Ps49:14-15. There are references to MorphId 190490140052 in "Ref", "SubjRef", and "Frame" for a number of morphemes in these two verses (190490140022, 190490140071, 190490150031, 190490150052, 190490150072, 190490150113, 190490150152). The Id used, 190490140052, refers to a "prep." This is because it is a pp that actually functions as a noun, "those after them/their followers." So the whole pp that functions as an np is the referent.

So, in the UI, we need to make sure that the Ids for Ref, SubjRef, & Frame by default result in the selection of the whole phrase node of which the word with the Id is the head. I am not sure what the best way to do this is for the GitHub release of the data itself (perhaps switch over to nodeId at a later point, according to the rule I express here, once nodeIds are stable).

jonathanrobie commented on August 14, 2024

wrt phrase references #3 (comment), I think we need a good way to reference subtrees. One way might be to use the first and last morphemes in a spanning tree as a reference to their least common ancestor. Would that be sufficient?

For instance, lca(010010010011, 010010010012) could refer to the least common ancestor of these two nodes in the following subtree:

<Node Cat="pp" Start="0" End="1" Rule="PrepNp" Head="1" nodeId="010010010010060" Length="6">
  <Node Cat="pp" Start="0" End="0" Rule="P2PP" Head="0" nodeId="010010010010011" Length="1">
    <Node n="010010010011" Cat="prep" Start="0" End="0" Length="1" morphId="010010010011" Unicode="בְּ" nodeId="010010010010010">
      <m n="010010010011" morph="R" lang="H" lemma="b" pos="preposition">בְּ</m>
    </Node>
  </Node>
  <Node Cat="np" Start="1" End="1" Rule="N2NP" Head="0" nodeId="010010010020051" Length="5">
    <Node n="010010010012" Cat="noun" Start="1" End="1" Length="5" morphId="010010010012" Unicode="רֵאשִׁית" nodeId="010010010020050">
      <m n="010010010012" morph="Ncfsa" lang="H" lemma="7225" after=" " pos="noun" type="common" gender="feminine" number="singular" state="absolute">רֵאשִׁית</m>
    </Node>
  </Node>
</Node>
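
To make the lca() idea concrete, here is a minimal Python sketch using only the standard library. It works on a trimmed copy of the subtree above (most attributes and the <m> children are dropped for brevity); the function names are hypothetical, and a real implementation would presumably live in XQuery next to the other queries.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the subtree above: only Cat, nodeId, and morphId are kept.
SAMPLE = """
<Node Cat="pp" nodeId="010010010010060">
  <Node Cat="pp" nodeId="010010010010011">
    <Node Cat="prep" morphId="010010010011" nodeId="010010010010010"/>
  </Node>
  <Node Cat="np" nodeId="010010010020051">
    <Node Cat="noun" morphId="010010010012" nodeId="010010010020050"/>
  </Node>
</Node>
"""

def ancestors(node, parents):
    """Return [node, parent, grandparent, ...] up to the root."""
    chain = [node]
    while chain[-1] in parents:
        chain.append(parents[chain[-1]])
    return chain

def lca(root, first_id, last_id):
    """Least common ancestor of the terminal nodes with the two given morphIds."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parents = {child: parent for parent in root.iter() for child in parent}
    by_morph = {n.get("morphId"): n for n in root.iter("Node") if n.get("morphId")}
    last_chain = ancestors(by_morph[last_id], parents)
    for node in ancestors(by_morph[first_id], parents):
        if any(node is other for other in last_chain):
            return node
    return None

tree = ET.fromstring(SAMPLE)
print(lca(tree, "010010010011", "010010010012").get("nodeId"))
```

On this sample the result is the outer pp node, which is the smallest node spanning both morphemes.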

rkjtan commented on August 14, 2024

That might work. Note, however, that a prepositional phrase (with the one exception in Ps49:14-15 in the OT & an unknown small number of cases in the NT) is usually not going to be a Ref or SubjRef (Frame might be different). A good example is:

            <Node Cat="S" Start="2" End="3" Rule="Np2S" Head="0" nodeId="010020070060091" Length="9">
              <Node Cat="np" Start="2" End="3" Rule="Np-Appos" Head="0" nodeId="010020070060090" Length="9">
                <Node Cat="np" Start="2" End="2" Rule="N2NP" Head="0" nodeId="010020070060041" Length="4">
                  <Node n="010020070021" Cat="noun" Start="2" End="2" Length="4" morphId="010020070021" Unicode="יְהוָ֨ה" nodeId="010020070060040"><m n="010020070021" lang="H" after=" " lemma="3068" morph="Np" id="01pPp" pos="noun" type="proper">יְהוָ֨ה</m></Node>
                </Node>
                <Node Cat="np" Start="3" End="3" Rule="N2NP" Head="0" nodeId="010020070100051" Length="5">
                  <Node n="010020070031" Cat="noun" Start="3" End="3" Length="5" morphId="010020070031" Unicode="אֱלֹהִ֜ים" nodeId="010020070100050"><m n="010020070031" lang="H" after=" " lemma="430" morph="Ncmpa" id="01ieN" pos="noun" type="common" gender="masculine" number="plural" state="absolute">אֱלֹהִ֜ים</m></Node>
                </Node>
              </Node>
            </Node>

"Yahweh God" is the implied subject of the verb "he breathed" (n="010020070082"), but the SubjRef uses only 010020070021 (Yahweh), because Yahweh is the head of the noun phrase. Head, Start, & End are zero-based: Start="2" End="3" tells us that words 3 & 4 in the verse are in the phrase, and Head="0" tells us that the first word inside the phrase, "Yahweh," is the head of the noun phrase. We can safely take the whole noun phrase as the referent.

You could theoretically use the nodeId. However, nodeId="010020070060090" is currently consonant-based; if it were word-based, it would be 010020070030020.

For Ref & SubjRef there is one complication: nodes with Cat="pp" Rule="PrepNp" currently also have the noun as head. So we could end up systematically bringing in the whole prepositional phrase when only the noun phrase that is the object of the preposition is the referent. To avoid this problem, we will want to make the preposition consistently the head of the prepositional phrase.
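
A rough Python sketch of the "go up to the full noun phrase" rule, run on a trimmed copy of the tree above. It rests on one assumption of mine: climbing only through parents with Cat="np" keeps the result from spilling into the enclosing S or into a pp. `phrase_for_head` and its `phrase_cats` parameter are hypothetical, and the Ps49:14-15 case (a pp acting as an np) would need `phrase_cats` widened or special handling.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Ge2:7 subtree above: only Cat, Head, nodeId, and morphId are kept.
SAMPLE = """
<Node Cat="S" Head="0" nodeId="010020070060091">
  <Node Cat="np" Head="0" nodeId="010020070060090">
    <Node Cat="np" Head="0" nodeId="010020070060041">
      <Node Cat="noun" morphId="010020070021" nodeId="010020070060040"/>
    </Node>
    <Node Cat="np" Head="0" nodeId="010020070100051">
      <Node Cat="noun" morphId="010020070031" nodeId="010020070100050"/>
    </Node>
  </Node>
</Node>
"""

def phrase_for_head(root, morph_id, phrase_cats=("np",)):
    """Climb from a word to the largest phrase (within phrase_cats) that it heads."""
    parents = {child: parent for parent in root.iter() for child in parent}
    node = next(n for n in root.iter("Node") if n.get("morphId") == morph_id)
    while node in parents:
        parent = parents[node]
        if parent.get("Cat") not in phrase_cats or parent.get("Head") is None:
            break
        children = parent.findall("Node")
        head_index = int(parent.get("Head"))  # Head is a zero-based child index
        if head_index >= len(children) or children[head_index] is not node:
            break  # this node is not the head of its parent, so stop here
        node = parent
    return node

tree = ET.fromstring(SAMPLE)
print(phrase_for_head(tree, "010020070021").get("nodeId"))
```

For "Yahweh" this lands on the appositional np covering "Yahweh God"; for "God," which is not the head of that np, it stops at the inner np.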
