Comments (12)
Inline glosses are very useful for queries and also to help people who do not read Hebrew or Greek have some idea what is in the tree. I would really prefer glosses on the words themselves.
For Ref, SubjRef, and Frame, using an external dictionary may be the right way to go.
from macula-hebrew.
Let's raise a new issue for references to things larger than single words.
For now, the references are what they are.
from macula-hebrew.
Participant referent data:
SubjRef only on verbs with implied subjects; format SubjRef="{010010310021}"
Ref only on nouns, pronouns, or adjectives usually; format Ref="{010010120082}"
@rkjtan - If I use SubjRef and Ref whenever the value is not "{}", will that give the right result? Are there instances where these have values but are not desired?
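The "use it whenever the value is not {}" rule could be sketched as follows. This is only an illustration: `parse_ref` is a hypothetical helper, and treating every run of digits inside the braces as one ID is an assumption about the format, not something confirmed in this thread.

```python
import re

def parse_ref(value):
    """Return the IDs found in a Ref or SubjRef attribute value.

    Assumption: the value is a brace-wrapped list of numeric IDs and
    "{}" (or a missing attribute) means there is no reference.
    """
    if value is None:
        return []
    # Every run of digits is treated as one ID (hypothetical rule).
    return re.findall(r"\d+", value)


print(parse_ref("{010010310021}"))
print(parse_ref("{}"))
```

An empty list then corresponds to the "{}" case that would be skipped.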
from macula-hebrew.
Glosses:
English and Chinese glosses in the full trees cannot be used. Mike has Andi's automatically calculated English and Chinese glosses mapped for YTB & ClearSuite, which we should be able to use.
@themikejr - could you please tell me where to find these?
from macula-hebrew.
Semantic Roles:
Frame="{A0:010010310021; A1:010010310041;}"
@rkjtan I assume we also want FrameGloss?
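For anyone consuming the Frame value, here is a minimal sketch of splitting it into roles. It assumes the value is always a brace-wrapped list of `role:ID` pairs separated by semicolons, as in the example above; `parse_frame` is a hypothetical name, and collecting IDs in a list per role is a guess in case a role can repeat.

```python
def parse_frame(value):
    """Split a Frame attribute value into {role: [IDs]}.

    Assumption: the format is "{A0:<id>; A1:<id>;}" and "{}" means
    the verb has no frame.
    """
    roles = {}
    body = value.strip().strip("{}")
    for part in body.split(";"):
        part = part.strip()
        if not part:
            continue  # skip the empty piece after the last ";"
        role, _, target = part.partition(":")
        roles.setdefault(role.strip(), []).append(target.strip())
    return roles


print(parse_frame("{A0:010010310021; A1:010010310041;}"))
```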
from macula-hebrew.
For SubjRef & Ref, there are only two known typos right now.
In Da 4:29, the "SubjRef" for two verbs, MorphId 270040290281 (יִצְבֵּא) and 270040290291 (יִתְּנִנּ), references MorphId 270040290241. This is a typo: it should be 270040290231 ("the Most High"), not 270040290241.
In Es 4:4, the "SubjRef" reference 170040040092 (for the verb תִּשְׁלַח with MorphId 170040040102) is a typo: it should be 170040040082, not 170040040092.
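Until the source data is corrected, these two typos could be patched while reading the trees. The sketch below is one possible way to do that; `SUBJREF_FIXES` and `fix_subjref` are hypothetical names, and the IDs are the ones listed in this comment.

```python
# verb morphId -> (wrong SubjRef ID, corrected SubjRef ID)
SUBJREF_FIXES = {
    "270040290281": ("270040290241", "270040290231"),
    "270040290291": ("270040290241", "270040290231"),
    "170040040102": ("170040040092", "170040040082"),
}

def fix_subjref(morph_id, subjref):
    """Return the SubjRef value with the known typo corrected, if any."""
    fix = SUBJREF_FIXES.get(morph_id)
    if fix is None:
        return subjref  # no known typo for this verb
    wrong, right = fix
    return subjref.replace(wrong, right)


print(fix_subjref("270040290281", "{270040290241}"))
```

Keying the table by the verb's morphId keeps the patch from touching other words that legitimately refer to the same IDs.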
from macula-hebrew.
On the glosses, I had sent you an email with a link to Andi’s tsv file on DropBox & also to Ulrik's application of these to Andi's version of the OSHB trees. Mike may have another source.
from macula-hebrew.
Except for the typos mentioned above and the glosses from Mike, this should be close:
Here is the query I used to generate this:
declare function local:annotations($n)
{
    <node>{
        $n ! (
            @morphId,
            @StrongNumberX,
            @Vocative[.="True"],
            @SenseNumber[. ne "0"],
            @Frame[. ne "{}"],
            @FrameGloss,
            @Ref[. ne "{}"],
            @SubjRef[. ne "{}"],
            @Greek[. ne ""],
            @GreekStrong[. ne "0"]
        )
    }</node>
};

<annotations>{
    for $n in //Node[empty(*)]
    order by $n/@morphId
    return local:annotations($n)
}</annotations>
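For readers without an XQuery processor, the same selection can be approximated in Python with the standard library. This is a sketch rather than a drop-in replacement: the `KEEP` table and the `annotations` function are illustrative names, and it assumes, like the query, that the annotated words are `Node` elements with no child elements.

```python
import xml.etree.ElementTree as ET

# Attribute name -> test deciding whether a value is worth keeping,
# mirroring the filters in the query above.
KEEP = {
    "morphId": lambda v: True,
    "StrongNumberX": lambda v: True,
    "Vocative": lambda v: v == "True",
    "SenseNumber": lambda v: v != "0",
    "Frame": lambda v: v != "{}",
    "FrameGloss": lambda v: True,
    "Ref": lambda v: v != "{}",
    "SubjRef": lambda v: v != "{}",
    "Greek": lambda v: v != "",
    "GreekStrong": lambda v: v != "0",
}

def annotations(root):
    """Build an <annotations> element with one <node> per leaf Node."""
    out = ET.Element("annotations")
    # Leaf nodes only: Node elements without child elements.
    leaves = [n for n in root.iter("Node") if len(n) == 0]
    for leaf in sorted(leaves, key=lambda n: n.get("morphId", "")):
        node = ET.SubElement(out, "node")
        for name, keep in KEEP.items():
            value = leaf.get(name)
            if value is not None and keep(value):
                node.set(name, value)
    return out
```

Calling `ET.tostring(annotations(ET.parse(path).getroot()))` on a tree file would then give output comparable to the query's, where `path` is whatever file is being processed.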
from macula-hebrew.
If we are not adding glosses for Ref or SubjRef, perhaps we shouldn't add a gloss for Frame in the XML either? One way we could go is to use the Ids strictly as an index to find the right words, including any glosses you have for them. This would show up nicely in a UI. However, we might clog up the XML file with lots of glosses if we add glosses for every annotation. What do you think?
from macula-hebrew.
One important caveat for Ref, SubjRef, & Frame should be stated: at the time I was doing the work, I used Id because I knew that nodeIds could change & lead to errors more easily. However, when a referent is actually a phrase rather than a single word, the Id doesn't cover the whole phrase. In other words, whenever the word that I refer to in Ref, SubjRef, or Frame is part of a larger phrase, it is really the phrase node that is the referent, not just the single word identified by the Id. For example, if the referent is the phrase "Jesus Christ" rather than just "Jesus," I was only able to refer to the head of the phrase, "Jesus," with the Id, even though the referent is "Jesus Christ." By going up to the full noun phrase with "Jesus" as head, you find the full referent.
The most extreme example in the OT is Ps 49:14-15. A number of morphemes in these two verses (190490140022, 190490140071, 190490150031, 190490150052, 190490150072, 190490150113, 190490150152) reference MorphId 190490140052 in "Ref", "SubjRef", and "Frame". The Id 190490140052 refers to a "prep." This is because it is a pp that actually functions as a noun, "those after them/their followers." So the whole pp that functions as an np is the referent.
So, in the UI, we need to make sure that the Ids for Ref, SubjRef, & Frame by default result in the selection of the whole phrase node of which the word with the Id is the head. I am not sure what the best way to do this is for the GitHub release of the data itself (perhaps switch over to nodeId at a later point, according to the rule I express here, once nodeIds are stable).
from macula-hebrew.
wrt phrase references #3 (comment), I think we need a good way to reference subtrees. One way might be to use the first and last morphemes in a spanning tree as a reference to their least common ancestor. Would that be sufficient?
For instance, lca(010010010011, 010010010012)
could refer to the least common ancestor of these two nodes in the following subtree:
<Node Cat="pp" Start="0" End="1" Rule="PrepNp" Head="1" nodeId="010010010010060" Length="6">
  <Node Cat="pp" Start="0" End="0" Rule="P2PP" Head="0" nodeId="010010010010011" Length="1">
    <Node n="010010010011" Cat="prep" Start="0" End="0" Length="1" morphId="010010010011" Unicode="בְּ" nodeId="010010010010010">
      <m n="010010010011" morph="R" lang="H" lemma="b" pos="preposition">בְּ</m>
    </Node>
  </Node>
  <Node Cat="np" Start="1" End="1" Rule="N2NP" Head="0" nodeId="010010010020051" Length="5">
    <Node n="010010010012" Cat="noun" Start="1" End="1" Length="5" morphId="010010010012" Unicode="רֵאשִׁית" nodeId="010010010020050">
      <m n="010010010012" morph="Ncfsa" lang="H" lemma="7225" after=" " pos="noun" type="common" gender="feminine" number="singular" state="absolute">רֵאשִׁית</m>
    </Node>
  </Node>
</Node>
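A minimal Python sketch of such an `lca` lookup is shown here, using only the standard library. The function name and signature are hypothetical, and it assumes each morpheme is a `Node` element carrying a `morphId` attribute, as in the subtree above.

```python
import xml.etree.ElementTree as ET

def lca(root, first_id, last_id):
    """Return the lowest element containing both morphemes, or None."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent = {child: node for node in root.iter() for child in node}

    def path_to_root(morph_id):
        node = next(
            (n for n in root.iter("Node") if n.get("morphId") == morph_id),
            None,
        )
        path = []
        while node is not None:
            path.append(node)
            node = parent.get(node)
        return path

    seen = set(path_to_root(first_id))
    # Walk up from the second morpheme until we hit a shared ancestor.
    for node in path_to_root(last_id):
        if node in seen:
            return node
    return None
```

With the subtree above as `root`, `lca(root, "010010010011", "010010010012")` would return the outer `pp` node with nodeId 010010010010060.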
from macula-hebrew.
That might work. Note, however, that a prepositional phrase (with the one exception in Ps49:14-15 in the OT & an unknown small number of cases in the NT) is usually not going to be a Ref or SubjRef (Frame might be different). A good example is:
<Node Cat="S" Start="2" End="3" Rule="Np2S" Head="0" nodeId="010020070060091" Length="9">
  <Node Cat="np" Start="2" End="3" Rule="Np-Appos" Head="0" nodeId="010020070060090" Length="9">
    <Node Cat="np" Start="2" End="2" Rule="N2NP" Head="0" nodeId="010020070060041" Length="4">
      <Node n="010020070021" Cat="noun" Start="2" End="2" Length="4" morphId="010020070021" Unicode="יְהוָ֨ה" nodeId="010020070060040"><m n="010020070021" lang="H" after=" " lemma="3068" morph="Np" id="01pPp" pos="noun" type="proper">יְהוָ֨ה</m></Node>
    </Node>
    <Node Cat="np" Start="3" End="3" Rule="N2NP" Head="0" nodeId="010020070100051" Length="5">
      <Node n="010020070031" Cat="noun" Start="3" End="3" Length="5" morphId="010020070031" Unicode="אֱלֹהִ֜ים" nodeId="010020070100050"><m n="010020070031" lang="H" after=" " lemma="430" morph="Ncmpa" id="01ieN" pos="noun" type="common" gender="masculine" number="plural" state="absolute">אֱלֹהִ֜ים</m></Node>
    </Node>
  </Node>
</Node>
"Yahweh God" is the implied subject of the verb "he breathed" (n="010020070082"), but the SubjRef just uses 010020070021 for Yahweh because Yahweh is the head of the noun phrase. Head, Start, & End are zero-based. So Start="2" End="3" tells us that words 3 & 4 in the verse are in the phrase, and Head="0" tells us that the first word inside the phrase, "Yahweh," is the head of the noun phrase. We can safely take the whole noun phrase as the referent.
You could theoretically use the nodeId. However, nodeId="010020070060090" is currently consonant based. If it were word based, it would be 010020070030020.
For Ref & SubjRef, prepositional phrases in the node with Cat="pp" Rule="PrepNp" theoretically also still have the head noun as the head. So we could end up systematically bringing in the prepositional phrase when only the noun phrase that is the object of the preposition is the referent. We will want to make the preposition consistently the head of the prepositional phrase to avoid this problem.
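The "go up to the whole noun phrase" rule could be sketched in Python as below. Several things here are assumptions rather than facts from the data: that `Head` is a zero-based index into a node's child `Node` elements, that climbing should stop as soon as the parent is not `Cat="np"` (so a `pp` is never pulled in), and the name `referent_phrase` itself. The Ps 49:14-15 case, where the referent is a pp, would need a different `phrase_cat`.

```python
import xml.etree.ElementTree as ET

def referent_phrase(root, morph_id, phrase_cat="np"):
    """Climb from a word to the largest phrase of which it is the head."""
    parent = {child: node for node in root.iter() for child in node}
    node = next(
        (n for n in root.iter("Node") if n.get("morphId") == morph_id),
        None,
    )
    if node is None:
        return None
    while True:
        up = parent.get(node)
        # Stop at the root or when the parent is not the wanted phrase type.
        if up is None or up.tag != "Node" or up.get("Cat") != phrase_cat:
            return node
        children = [c for c in up if c.tag == "Node"]
        head = up.get("Head")
        if head is None or not head.isdigit() or int(head) >= len(children):
            return node
        # Stop when the current node is not the head child of its parent.
        if children[int(head)] is not node:
            return node
        node = up
```

On the tree above, starting from 010020070021 ("Yahweh") this climbs to the `Np-Appos` node with nodeId 010020070060090, while starting from 010020070031 it stops at that word's own `N2NP` node because it is not the head of the apposition.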
from macula-hebrew.