clear-bible / macula-hebrew
Syntax trees, morphology, and linguistic annotations for the Hebrew Bible
License: Other
Ulrik's original proposal (which you can modify as needed) makes this need clear:
The OSHB trees are based on the Open Scriptures morphhb Hebrew Bible data. This data contains a “Lemma” attribute that is really a pointer into an index that Open Scriptures created. Each index entry ties together:
We should have the following:
In addition, what is now the “Lemma” attribute should be renamed into “AugmentedStrongs”.
Create a mapping between Groves, MARBLE, and MACULA at:
I've created a new annotations.xml file from the full trees, this time. When I tried to change the morphIds for issues #13 and #14, it appeared that our mapping (between old trees and macula) does not suffice for the annotations file: the morphIds in the annotation file do not align with those in the full trees. Since it is not clear where they come from (a different version of the full trees?) and the original file might be lost, it is probably best to recreate the annotations from a file that is unlikely to be changed or deleted (the full trees in trees-oshb).
So what I've done is:
Frame, Ref and SubjRef ids, but see below.
However, there are at least 2 issues:
morphIds into. OSHB ns or the new morphId (macula)? I've left both options open for now.
We should probably review these cases manually. These cases can be retrieved easily, because I added a @duplicate attribute to the node element. These are all such cases:
<node morphId="010380240022" macula-text="מִ" duplicate="2macula-1full" OLDmorphId="010380240022" StrongNumberX="7969a" Unicode="מִשְׁלֹ֣שׁ"/>
<node morphId="010380240023" macula-text="ִשְׁלֹ֣שׁ" duplicate="2macula-1full" OLDmorphId="010380240022" StrongNumberX="7969a" Unicode="מִשְׁלֹ֣שׁ"/>
<node morphId="010430200021" macula-text="בִּ֣י" duplicate="1macula-2full" OLDmorphId="010430200021|010430200022" StrongNumberX="0994" Unicode="בּ|ִ֣י" Ref="{010430180022}" Greek="δεόμεθα" GreekStrong="1189"/>
<node morphId="010440180051" macula-text="בִּ֣י" duplicate="1macula-2full" OLDmorphId="010440180051|010440180052" StrongNumberX="0994" Unicode="בּ|ִ֣י" Ref="{010440180031}" Greek="δέομαι" GreekStrong="1189"/>
<node morphId="020040100051" macula-text="בִּ֣י" duplicate="1macula-2full" OLDmorphId="020040100051|020040100052" StrongNumberX="0994" Unicode="בּ|ִ֣י" Ref="{020040100021}" Greek="δέομαι" GreekStrong="1189"/>
<node morphId="020040130021" macula-text="בִּ֣יּ" duplicate="1macula-2full" OLDmorphId="020040130021|020040130022" StrongNumberX="0994" Unicode="בּ|ִ֣י" Ref="{020040100021}" Greek="δέομαι" GreekStrong="1189"/>
<node morphId="040120110051" macula-text="בִּ֣י" duplicate="1macula-2full" OLDmorphId="040120110051|040120110052" StrongNumberX="0994" Unicode="בּ|ִ֣י" Ref="{040120110021}" Greek="δέομαι" GreekStrong="1189"/>
<node morphId="060070080011" macula-text="בִּ֖י" duplicate="1macula-2full" OLDmorphId="060070080011|060070080012" StrongNumberX="0994" Unicode="בּ|ִ֖י" Ref="{060070070021}"/>
<node morphId="070060130041" macula-text="בִּ֣י" duplicate="1macula-2full" OLDmorphId="070060130041|070060130042" StrongNumberX="0994" Unicode="בּ|ִ֣י" Greek="ἐν|ἐμοί" GreekStrong="1722|1698" Ref="{070060130031}"/>
<node morphId="070060150031" macula-text="בִּ֣י" duplicate="1macula-2full" OLDmorphId="070060150031|070060150032" StrongNumberX="0994" Unicode="בּ|ִ֣י" Greek="ἐν|ἐμοί" GreekStrong="1722|1698" Ref="{070060130031}"/>
<node morphId="070090410032" macula-text="ארוּמָ֑ה" duplicate="1macula-2full" OLDmorphId="070090410032|070090410033" StrongNumberX="1886a|0725" Unicode="ארוּמָ֑ה" Greek="αρημα"/>
<node morphId="070130080061" macula-text="בִּ֣י" duplicate="1macula-2full" OLDmorphId="070130080061|070130080062" StrongNumberX="0994" Unicode="בּ|ִ֣י" Greek="ἐν|ἐμοί" GreekStrong="1722|1698" Ref="{070130080021}"/>
<node morphId="070190130041" macula-text="וְ" duplicate="1macula-2full" OLDmorphId="070190130032|070190130041" StrongNumberX="1886j|2050b" Unicode="ָ֥|וְ" Greek="καὶ" GreekStrong="2532"/>
<node morphId="090010260021" macula-text="בִּ֣י" duplicate="1macula-2full" OLDmorphId="090010260021|090010260022" StrongNumberX="0994" Unicode="בּ|ִ֣י" Greek="ἐν|ἐμοί" GreekStrong="1722|1698" Ref="{090010230182}"/>
<node morphId="110030170041" macula-text="בִּ֣י" duplicate="1macula-2full" OLDmorphId="110030170041|110030170042" StrongNumberX="0994" Unicode="בּ|ִ֣י" Greek="ἐν|ἐμοί" GreekStrong="1722|1698" Vocative="True" Ref="{110030170022}"/>
<node morphId="110030260141" macula-text="בִּ֣י" duplicate="1macula-2full" OLDmorphId="110030260141|110030260142" StrongNumberX="0994" Unicode="בּ|ִ֣י" Greek="ἐν|ἐμοί" GreekStrong="1722|1698" Ref="{110030260022}"/>
<node morphId="130020520061" macula-text="הָרֹאֶ֖ה" duplicate="1macula-2full" OLDmorphId="130020520061|130020520062" StrongNumberX="1886a|7204a" Unicode="הָ|רֹאֶ֖ה" SenseNumber="1" Greek="αραα"/>
<node morphId="130050090041" macula-text="לְ" duplicate="2macula-1full" OLDmorphId="130050090041" StrongNumberX="3820b" Unicode="לְב֣וֹא" Greek="ἐρξομένων" GreekStrong="2064"/>
<node morphId="130050090042" macula-text="ב֣וֹא" duplicate="2macula-1full" OLDmorphId="130050090041" StrongNumberX="3820b" Unicode="לְב֣וֹא" Greek="ἐρξομένων" GreekStrong="2064"/>
<node morphId="130200040021" macula-text="אַחֲרֵי" duplicate="2macula-1full" OLDmorphId="130200040021" StrongNumberX="0310" Unicode="אַחֲרֵיכֵ֔ן"/>
<node morphId="130200040022" macula-text="כֵ֔ן" duplicate="2macula-1full" OLDmorphId="130200040021" StrongNumberX="0310" Unicode="אַחֲרֵיכֵ֔ן"/>
<node morphId="140200010021" macula-text="אַֽחֲרֵי" duplicate="2macula-1full" OLDmorphId="140200010021" StrongNumberX="0310" Unicode="אַֽחֲרֵיכֵ֡ן"/>
<node morphId="140200010022" macula-text="כֵ֡ן" duplicate="2macula-1full" OLDmorphId="140200010021" StrongNumberX="0310" Unicode="אַֽחֲרֵיכֵ֡ן"/>
<node morphId="140200350012" macula-text="אַחֲרֵי" duplicate="2macula-1full" OLDmorphId="140200350012" StrongNumberX="0310" Unicode="אַחֲרֵיכֵ֗ן"/>
<node morphId="140200350013" macula-text="כֵ֗ן" duplicate="2macula-1full" OLDmorphId="140200350012" StrongNumberX="0310" Unicode="אַחֲרֵיכֵ֗ן"/>
<node morphId="140240040021" macula-text="אַחֲרֵי" duplicate="2macula-1full" OLDmorphId="140240040021" StrongNumberX="0310" Unicode="אַחֲרֵיכֵ֑ן"/>
<node morphId="140240040022" macula-text="כֵ֑ן" duplicate="2macula-1full" OLDmorphId="140240040021" StrongNumberX="0310" Unicode="אַחֲרֵיכֵ֑ן"/>
<node morphId="140260080081" macula-text="לְ" duplicate="2macula-1full" OLDmorphId="140260080081" StrongNumberX="3820b" Unicode="לְב֣וֹא" Greek="εἰσόδου" GreekStrong="1529"/>
<node morphId="140260080082" macula-text="ב֣וֹא" duplicate="2macula-1full" OLDmorphId="140260080081" StrongNumberX="3820b" Unicode="לְב֣וֹא" Greek="εἰσόδου" GreekStrong="1529"/>
<node morphId="150030050012" macula-text="אַחֲרֵי" duplicate="2macula-1full" OLDmorphId="150030050012" StrongNumberX="0310" Unicode="אַחֲרֵיכֵ֞ן"/>
<node morphId="150030050013" macula-text="כֵ֞ן" duplicate="2macula-1full" OLDmorphId="150030050012" StrongNumberX="0310" Unicode="אַחֲרֵיכֵ֞ן"/>
<node morphId="150040090091" macula-text="דִּ֠ינָי" duplicate="2macula-1full" OLDmorphId="150040090091" StrongNumberX="1784" Unicode="דִּ֠ינָיֵא" Greek="διναῖοι"/>
<node morphId="150040090092" macula-text="ֵא" duplicate="2macula-1full" OLDmorphId="150040090091" StrongNumberX="1784" Unicode="דִּ֠ינָיֵא" Greek="διναῖοι"/>
<node morphId="220080060181" macula-text="שַׁלְהֶ֥בֶתְ" duplicate="2macula-1full" OLDmorphId="220080060181" StrongNumberX="7957a" SenseNumber="1" Unicode="שַׁלְהֶ֥בֶתְיָֽה"/>
<node morphId="220080060182" macula-text="יָֽה" duplicate="2macula-1full" OLDmorphId="220080060181" StrongNumberX="7957a" SenseNumber="1" Unicode="שַׁלְהֶ֥בֶתְיָֽה"/>
<node morphId="380020130061" macula-text="עֲלֵי" duplicate="2macula-2full" OLDmorphId="380020130061|380020130062" StrongNumberX="5921|3963a" Unicode="עֲל|ֵיהֶ֔ם" Greek="ἐπ’|αὐτούς" GreekStrong="1909|848" Ref="{380020120102}"/>
<node morphId="380020130062" macula-text="הֶ֔ם" duplicate="2macula-2full" OLDmorphId="380020130061|380020130062" StrongNumberX="5921|3963a" Unicode="עֲל|ֵיהֶ֔ם" Greek="ἐπ’|αὐτούς" GreekStrong="1909|848" Ref="{380020120102}"/>
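Retrieving the flagged cases can be sketched in Python as follows. This is only an illustration: the `<annotations>` wrapper element, the function name, and the last sample node (which has no @duplicate) are assumptions made for the example, not part of the actual file.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Small stand-in for annotations.xml; the wrapper element and the final
# node (without @duplicate) are hypothetical.
SAMPLE = """<annotations>
  <node morphId="130200040021" duplicate="2macula-1full" OLDmorphId="130200040021" StrongNumberX="0310"/>
  <node morphId="130200040022" duplicate="2macula-1full" OLDmorphId="130200040021" StrongNumberX="0310"/>
  <node morphId="010430200021" duplicate="1macula-2full" OLDmorphId="010430200021|010430200022" StrongNumberX="0994"/>
  <node morphId="010010010011" StrongNumberX="0001"/>
</annotations>"""


def duplicates_by_kind(root):
    """Group the morphIds of all nodes carrying @duplicate by its value."""
    groups = defaultdict(list)
    for node in root.iter("node"):
        kind = node.get("duplicate")
        if kind is not None:
            groups[kind].append(node.get("morphId"))
    return dict(groups)


groups = duplicates_by_kind(ET.fromstring(SAMPLE))
for kind, ids in sorted(groups.items()):
    print(kind, ids)
```

Grouping by the value of @duplicate keeps the "2macula-1full" and "1macula-2full" cases apart, which may help when reviewing them manually.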
The morphId column in annotations.xml uses Groves Center numbering. It needs to be updated.
We need to decide whether to apply these annotations at the leaf Node level or at lower levels when we merge into the tree. @rkjtan , any thoughts?
I think this should be orthogonal to prepare-oshb; it should be a separate merge.
In annotations.xml, attributes that contain Septuagint Greek are in Beta encoding. They should be converted to Unicode.
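A minimal sketch of such a conversion, assuming plain Beta Code with lowercase letters and the common diacritic signs. It is deliberately incomplete: capitals (the `*` prefix) are not handled, and `|` is left untouched because the annotation attributes use it as a separator between values, so iota subscript is not converted. The function name is hypothetical.

```python
import re
import unicodedata

# Beta Code letter -> Greek letter (lowercase only).
LETTERS = dict(zip("abgdezhqiklmncoprstufxyw", "αβγδεζηθικλμνξοπρστυφχψω"))

# Beta Code diacritic sign -> Unicode combining mark.
# "|" (iota subscript) is intentionally omitted, see the note above.
MARKS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex
    "+": "\u0308",   # diaeresis
}


def beta_to_unicode(beta):
    """Convert a simplified Beta Code string to precomposed Unicode Greek."""
    out = []
    for ch in beta.lower():
        out.append(LETTERS.get(ch) or MARKS.get(ch) or ch)
    text = "".join(out)
    # A sigma that is not followed by another letter or mark is word-final.
    text = re.sub(r"σ(?![α-ω\u0300-\u036f])", "ς", text)
    return unicodedata.normalize("NFC", text)


print(beta_to_unicode("de/omai"))
print(beta_to_unicode("e)n|e)moi/"))
```

Normalizing to NFC at the end composes each letter with its marks, so the output can be compared directly with Greek that is already stored as Unicode.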
I've extracted all English lexicon entries from marble-lexicon\SDBH\SDBH-EXPORT-en.XML and grouped them by the ids they are associated with. The result can be found (sorted) in \trees-oshb\py\create-annotations-and-glosses\annotations-and-glosses\marble-lexicon-entries.xml.
There are a few issues with the extracted data:
1. There is often more than one lexicon entry associated with a single morpheme.
2. These entries often don't refer exactly to the morpheme they are connected to (by id).
3. The entries often contain unclear data, like "NO DATA YET" or what appear to be references, like "({S:0010010010})".
A simple algorithm strips, tweaks, and merges multiple entries/meanings into one entry, which is the one that has been added to the trees. This is also included in the extracted and sorted lexicon entries file.
Consider the following examples:
<morph marble-id="00503300200046" marble-text="אשׁ">
<entry lemma="אֵשׁ" gloss="fire" definition="= state of burning, in which substances combine chemically with oxygen from the air and give out bright light, heat, and smoke; ► used for cooking, melting, cleansing, heating, and destroying; ≈ many aspects of life are compared with fire, such as anger, jealousy, aggression, wickedness, words, life, certain sicknesses, suffering, etc."/>
<entry lemma="אֲשֵׁדָה" gloss="slope" definition="= side of a mountain or hill"/>
<entry lemma="אֲשֵׁדָה" gloss="NO DATA YET" definition="NO DATA YET"/>
<entry lemma="אֵשְׁדָּת" gloss="fiery law" definition="read {L:אֵשׁ<SDBH:אֵשׁ>} {L:דָּת<SDBH:דָּת>} with {A:MT-Q}"/>
<entry lemma="דאה" gloss="to swoop down" definition="= action by which a bird moves swiftly downwards ► in order to capture its prey; ≈ often used metaphorically to refer to an army attacking the enemy"/>
<entry lemma="דאה" gloss="NO DATA YET" definition="NO DATA YET"/>
<merged-entry lemma="אֲשֵׁדָה|אֵשְׁדָּת|דאה" gloss="fire|slope|fiery law|to swoop down" definition="= state of burning, in which substances combine chemically with oxygen from the air and give out bright light, heat, and smoke; ► used for cooking, melting, cleansing, heating, and destroying; ≈ many aspects of life are compared with fire, such as anger, jealousy, aggression, wickedness, words, life, certain sicknesses, suffering, etc.|= side of a mountain or hill|read אֵשׁ (אֵשׁ) דָּת (דָּת) with |= action by which a bird moves swiftly downwards ► in order to capture its prey; ≈ often used metaphorically to refer to an army attacking the enemy"/>
</morph>
<morph marble-id="01002400600024" marble-text="יַּ֔עַן">
<entry lemma="דָּן" gloss="Dan" definition="= man, tribe, and territory; ◄ fifth son of {L:Jacob<SDBH:יַעֲקֹב>} and first son of {L:Bilhah<SDBH:בִּלְהָה>}, slave of {L:Rachel<SDBH:רָחֵל>}; ► founder of a tribe"/>
<entry lemma="דָּן" gloss="NO DATA YET" definition="NO DATA YET"/>
<entry lemma="יַעַן" gloss="Dan Jaan" definition="read {L:דָּנָה יַעַן<SDBH:דָּן יַעַן>}"/>
<entry lemma="עִיֹּון" gloss="Ijon" definition="= town; ◄ territory of {L:Naphtali<SDBH:נַפְתָּלִי>}; ► part of northern kingdom of {L:Israel<SDBH:יִשְׂרָאֵל>}; conquered by {L:Ben-Hadad<SDBH:בֶּן־הֲדַד>} of Syria during reign of king {L:Baasha<SDBH:בַּעְשָׁא>}; conquered by {L:Tiglath-Pileser<SDBH:תִּגְלַת פִּלְאֶסֶר>} of Assyria during reign of {L:Pekah<SDBH:פֶּקַח>}"/>
<entry lemma="עִיֹּון" gloss="NO DATA YET" definition="NO DATA YET"/>
<merged-entry lemma="דָּן|יַעַן|עִיֹּון" gloss="Dan Jaan|Ijon" definition="= man, tribe, and territory; ◄ fifth son of Jacob (יַעֲקֹב) and first son of Bilhah (בִּלְהָה) slave of Rachel (רָחֵל) ► founder of a tribe|read דָּנָה יַעַן (דָּן יַעַן)|= town; ◄ territory of Naphtali (נַפְתָּלִי) ► part of northern kingdom of Israel (יִשְׂרָאֵל) conquered by Ben Hadad (בֶּן־הֲדַד) of Syria during reign of king Baasha (בַּעְשָׁא) conquered by Tiglath Pileser (תִּגְלַת פִּלְאֶסֶר) of Assyria during reign of Pekah (פֶּקַח)"/>
</morph>
<morph marble-id="01802301200014" marble-text="חֻקִּ֗י">
<entry lemma="חֹק" gloss="law|decree" definition="= a pattern of behavior required by somebody in authority"/>
<entry lemma="חֹק" gloss="law|decree" definition="= a pattern of behavior required by somebody in authority"/>
<entry lemma="חֹק" gloss="daily allotment" definition="= a certain quantity of food that an individual needs every day in order to survive"/>
<entry lemma="חֹק" gloss="NO DATA YET" definition="NO DATA YET"/>
<merged-entry lemma="חֹק" gloss="law|decree|daily allotment" definition="= a pattern of behavior required by somebody in authority|= a certain quantity of food that an individual needs every day in order to survive"/>
</morph>
<morph marble-id="01802401200026" marble-text="תִּפְלָֽה">
<entry lemma="תְּפִלָּה" gloss="prayer" definition="= action by which humans speak to a deity, often by raising their hands, ► requesting help or expressing their thankfulness"/>
<entry lemma="תְּפִלָּה" gloss="NO DATA YET" definition="NO DATA YET"/>
<entry lemma="תִּפְלָה" gloss="irrationality|senselessness" definition="= state when a certain activity does not appear to be in accordance with good sense"/>
<entry lemma="תִּפְלָה" gloss="irrationality|senselessness" definition="= state when a certain activity does not appear to be in accordance with good sense"/>
<merged-entry lemma="תְּפִלָּה|תִּפְלָה" gloss="prayer|irrationality|senselessness" definition="= action by which humans speak to a deity, often by raising their hands, ► requesting help or expressing their thankfulness|= state when a certain activity does not appear to be in accordance with good sense"/>
</morph>
<morph marble-id="01900800600010" marble-text="אֱלֹהִ֑ים">
<entry lemma="אֱלֹהִים" gloss="heavenly beings" definition="plural with plural meaning: = generic term for a supernatural being, worshiped by individuals or entire nations"/>
<entry lemma="אֱלֹהִים" gloss="god (of someone)|God (of someone)" definition="plural with singular meaning: = generic term for a supernatural being, worshiped by individuals or entire nations"/>
<entry lemma="אֱלֹהִים" gloss="God" definition="plural with singular meaning: = the highest God, creator of heaven and earth"/>
<entry lemma="אֱלֹהִים" gloss="God" definition="plural with singular meaning: = the highest God, creator of heaven and earth"/>
<merged-entry lemma="אֱלֹהִים" gloss="heavenly beings|god (of someone)|God (of someone)" definition="plural with plural meaning: = generic term for a supernatural being, worshiped by individuals or entire nations|plural with singular meaning: = generic term for a supernatural being, worshiped by individuals or entire nations|plural with singular meaning: = the highest God, creator of heaven and earth"/>
</morph>
<morph marble-id="02303002200028" marble-text="דָוָ֔ה">
<entry lemma="דָּוֶה" gloss="menstruous woman" definition="= female person discharging blood and other material from the lining of the uterus at intervals of about one lunar month; ≈ regarded as ritually unclean"/>
<entry lemma="דָּוֶה" gloss="menstrual discharge" definition="= blood and other material from the lining of the uterus discharged from the body in the menstrual period; ≈ regarded as unclean"/>
<entry lemma="דָּוֶה" gloss="filthy things = objects stained by menstruation" definition="= an unspecified object ► brought in contact to a woman undergoing menstruation; ≈ regarded as unclean"/>
<entry lemma="דָּוֶה" gloss="menstruous cloth" definition="= piece of cloth ► worn or touched by a woman undergoing menstruation; ≈ regarded as unclean"/>
<merged-entry lemma="דָּוֶה" gloss="menstruous woman|menstrual discharge|filthy things = objects stained by menstruation|menstruous cloth" definition="= female person discharging blood and other material from the lining of the uterus at intervals of about one lunar month; ≈ regarded as ritually unclean|= blood and other material from the lining of the uterus discharged from the body in the menstrual period; ≈ regarded as unclean|= an unspecified object ► brought in contact to a woman undergoing menstruation; ≈ regarded as unclean|= piece of cloth ► worn or touched by a woman undergoing menstruation; ≈ regarded as unclean"/>
</morph>
<morph marble-id="02800600700008" marble-text="אָדָ֖ם">
<entry lemma="אָדָם" gloss="human|humankind|human being(s)" definition="= human being as an individual or as a class of living creatures; sometimes explicitly subdivided between {L:male<SDBH:זָכָר>} and {L:female<SDBH:נְקֵבָה>}; ≈ associated with mortality"/>
<entry lemma="אָדָם" gloss="Adam" definition="= first man; ◄ created by God; ► husband of {L:Eve<SDBH:חַוָּה>}, father of {L:Cain<SDBH:קַיִן>}, {L:Abel<SDBH:הֶבֶל>}, and {L:Seth<SDBH:שֵׁת>}"/>
<entry lemma="אָדָם" gloss="Adam" definition="= town; ◄ located near the river {L:Jordan<SDBH:יַרְדֵּן>} near {L:Zarethan<SDBH:צָרְתָן>}"/>
<entry lemma="אַדְמָה" gloss="Admah" definition="= town; ◄ located near {L:Dead Sea<SDBH:יָם־הַמֶּלַח>}; ► destroyed with {L:Sodom<SDBH:סְדֹם>} and {L:Gomorrah<SDBH:עֲמֹרָה>}"/>
<merged-entry lemma="אַדְמָה" gloss="humankind|human being(s)|Adam|Admah" definition="= human being as an individual or as a class of living creatures; sometimes explicitly subdivided between male (זָכָר) and female (נְקֵבָה) ≈ associated with mortality|= first man; ◄ created by God; ► husband of Eve (חַוָּה) father of Cain (קַיִן) Abel (הֶבֶל) and Seth (שֵׁת)|= town; ◄ located near the river Jordan (יַרְדֵּן) near Zarethan (צָרְתָן)|= town; ◄ located near Dead Sea (יָם־הַמֶּלַח) ► destroyed with Sodom (סְדֹם) and Gomorrah (עֲמֹרָה)"/>
</morph>
<morph marble-id="03400301600016" marble-text="פָּשַׁ֖ט">
<entry lemma="פשׁט" gloss="to cast (one's) skin|to shed (one's) skin" definition="meaning unsure; possibly: = process by which a locust sheds its skin"/>
<entry lemma="פשׁט" gloss="to advance upon (an area or people)|to make a raid" definition="= action by which a group of armed people makes a sudden move in order to attack the people living there and steal their possessions"/>
<entry lemma="פשׁט" gloss="to advance upon (an area or people)|to make a raid" definition="= action by which a group of armed people makes a sudden move in order to attack the people living there and steal their possessions"/>
<entry lemma="פשׁט" gloss="to spread (one's) wings" definition="meaning unsure; possibly: = action by which an animal extends its wings in order to fly away"/>
<merged-entry lemma="פשׁט" gloss="to cast (one's) skin|to shed (one's) skin|to advance upon (an area or people)|to make a raid|to spread (one's) wings" definition="meaning unsure; possibly: = process by which a locust sheds its skin|= action by which a group of armed people makes a sudden move in order to attack the people living there and steal their possessions|meaning unsure; possibly: = action by which an animal extends its wings in order to fly away"/>
</morph>
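The strip-and-merge step described above can be approximated with the following sketch. It is not the actual algorithm used for the trees: it only drops "NO DATA YET" placeholders, removes exact duplicates, rewrites `{L:label<SDBH:lemma>}` links as `label (lemma)`, and strips other brace references. It will therefore not reproduce every detail of the merged entries shown above (for example, the real output sometimes omits a gloss that is contained in a longer one). All function names are hypothetical.

```python
import re

PLACEHOLDER = "NO DATA YET"
LINK = re.compile(r"\{L:([^<}]*)<SDBH:([^>]*)>\}")   # {L:label<SDBH:lemma>}
OTHER_REF = re.compile(r"\(?\{[A-Z]:[^}]*\}\)?")      # {A:MT-Q}, ({S:0010010010})


def clean(text):
    """Rewrite lexicon links as 'label (lemma)' and drop other references."""
    text = LINK.sub(r"\1 (\2)", text)
    return OTHER_REF.sub("", text).strip()


def unique(values):
    """Keep first occurrences only, skipping empty and placeholder values."""
    seen, kept = set(), []
    for value in values:
        if value and value != PLACEHOLDER and value not in seen:
            seen.add(value)
            kept.append(value)
    return kept


def merge_entries(entries):
    """Merge several entries (dicts with lemma, gloss, definition) into one."""
    lemmas = [e["lemma"] for e in entries]
    glosses = [g for e in entries for g in e["gloss"].split("|")]
    definitions = [clean(e["definition"]) for e in entries]
    return {
        "lemma": "|".join(unique(lemmas)),
        "gloss": "|".join(unique(glosses)),
        "definition": "|".join(unique(definitions)),
    }


# The four entries of marble-id 01802301200014 from the examples above.
ENTRIES = [
    {"lemma": "חֹק", "gloss": "law|decree",
     "definition": "= a pattern of behavior required by somebody in authority"},
    {"lemma": "חֹק", "gloss": "law|decree",
     "definition": "= a pattern of behavior required by somebody in authority"},
    {"lemma": "חֹק", "gloss": "daily allotment",
     "definition": "= a certain quantity of food that an individual needs every day in order to survive"},
    {"lemma": "חֹק", "gloss": "NO DATA YET", "definition": "NO DATA YET"},
]
merged = merge_entries(ENTRIES)
print(merged["gloss"])
```

For this input the sketch yields the same gloss list as the merged entry shown above for that id; the other examples would need additional rules.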
Attribute order needs to be consistent and readable.
Right now c elements do not have any attributes in lowfat.
The Lowfat element does not have proper spacing at the moment. For example,
<p>
<milestone unit="verse" id="GEN 1:2">GEN 1:2</milestone>
וְ הָ אָ֗רֶץ הָיְתָ֥ה תֹ֨הוּ֙ וָ בֹ֔הוּ וְ חֹ֖שֶׁךְ עַל פְּנֵ֣י תְה֑וֹם וְ ר֣וּחַ אֱלֹהִ֔ים מְרַחֶ֖פֶת עַל פְּנֵ֥י הַ מָּֽיִם
</p>
The numbering of semantic roles and participant references uses the Groves Center skeleton files, not the current numbering system. These ids need to be renumbered to match the current trees.
Participant referent data:
SubjRef only on verbs with implied subjects; format SubjRef="{010010310021}"
Ref only on nouns, pronouns, or adjectives usually; format Ref="{010010120082}"
Semantic Roles:
Frame="{A0:010010310021; A1:010010310041;}"
Once this is done, we can pull these attributes into the main tree. We need to consider whether to do this at the word level (using prepare-oshb) or at the lowest Node level (perhaps in a separate annotations merge, which could be useful for cooking trees of varying complexity).
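The renumbering itself could be sketched like this, assuming a mapping from old (Groves Center) ids to current ids is available as a dictionary. The mapping shown is made up for illustration, and the function name is hypothetical. Ids that have no entry in the mapping are left unchanged and reported, so they can be reviewed by hand.

```python
import re

# A 12-digit id that is not part of a longer digit run (nodeIds have 16 digits).
MORPH_ID = re.compile(r"(?<!\d)\d{12}(?!\d)")


def renumber(value, id_map):
    """Replace every 12-digit id in a Frame/Ref/SubjRef value via id_map.

    Returns the new value and the list of ids missing from id_map.
    """
    missing = []

    def swap(match):
        old = match.group(0)
        if old not in id_map:
            missing.append(old)
            return old
        return id_map[old]

    return MORPH_ID.sub(swap, value), missing


# Hypothetical mapping: old id -> current id.
ID_MAP = {"010010310021": "010010310031"}

frame, unmapped = renumber("{A0:010010310021; A1:010010310041;}", ID_MAP)
print(frame)
print(unmapped)
```

Because the same function works on the raw attribute value, it can be applied to Frame, Ref and SubjRef alike without parsing their different formats.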
First of all, there is a bug/typo in the cherith file wlc-gloss.tsv. Somewhere halfway through the document, the remaining lines are all crammed into one 'cell', making the file unprocessable with simple tsv readers. I've worked around the error, but we should probably check where or why it occurred.
The Cherith morphemes map nicely to our Node leaves except for 10 cases:
<word morphId="010380240022;010380240023" cherithId="010380240022" hebrew="מִשְׁלֹ֣שׁ" english="three" chinese="三"/>
<word morphId="130050090041;130050090042" cherithId="130050090041" hebrew="לְב֣וֹא" english="entrance" chinese="来~到"/>
<word morphId="130200040021;130200040022" cherithId="130200040021" hebrew="אַחֲרֵיכֵ֔ן" english="after this" chinese="此后"/>
<word morphId="140200010021;140200010022" cherithId="140200010021" hebrew="אַֽחֲרֵיכֵ֡ן" english="after this" chinese="此后"/>
<word morphId="140200350012;140200350013" cherithId="140200350012" hebrew="אַחֲרֵיכֵ֗ן" english="after this" chinese="此后"/>
<word morphId="140240040021;140240040022" cherithId="140240040021" hebrew="אַחֲרֵיכֵ֑ן" english="afterward" chinese="此后"/>
<word morphId="140260080081;140260080082" cherithId="140260080081" hebrew="לְב֣וֹא" english="border" chinese="至~来到"/>
<word morphId="150030050012;150030050013" cherithId="150030050012" hebrew="אַחֲרֵיכֵ֞ן" english="after that" chinese="此后"/>
<word morphId="150040090091;150040090092" cherithId="150040090091" hebrew="דִּ֠ינָיֵא" english="judges" chinese="法官"/>
<word morphId="220080060181;220080060182" cherithId="220080060181" hebrew="שַׁלְהֶ֥בֶתְיָֽה" english="raging flame" chinese="不可遏制的烈焰"/>
How should we deal with this? Add the glosses to both nodes in our trees? Or try to split the glosses manually?
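The first option (adding the gloss to both nodes) might look like the sketch below. It copies the gloss to every morphId listed for a word and flags shared glosses so they can still be split manually later. The `<words>` wrapper, the `shared` flag, and the function name are assumptions for illustration; only the English gloss is carried over to keep the example short.

```python
import xml.etree.ElementTree as ET

# Two of the ten cases listed above, reduced to the attributes used here.
SAMPLE = """<words>
  <word morphId="130200040021;130200040022" cherithId="130200040021" english="after this"/>
  <word morphId="150040090091;150040090092" cherithId="150040090091" english="judges"/>
  <word morphId="010010010021" cherithId="010010010021" english="created"/>
</words>"""


def glosses_per_morph(root):
    """Map each morphId to its gloss, marking glosses shared by several nodes."""
    glosses = {}
    for word in root.iter("word"):
        ids = word.get("morphId").split(";")
        for morph_id in ids:
            glosses[morph_id] = {
                "english": word.get("english"),
                "shared": len(ids) > 1,
            }
    return glosses


glosses = glosses_per_morph(ET.fromstring(SAMPLE))
to_review = sorted(m for m, g in glosses.items() if g["shared"])
print(to_review)
```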
USFM identifiers identify verses, not sentences. Our extension for words has a further problem: placing a word identifier on morphemes and calling it an identifier is faulty, because it is not a unique identifier when used this way.
Since USFM is the reference system we are using, we don't have to say USFM each time we use it.
Instead of this:
<?xml version="1.0" encoding="UTF-8"?><Sentences>
<Sentence USFMId="GEN 4:1">
<Trees>
<Tree>
<Node Cat="S" Head="0" nodeId="0100400100110200">
<Node Cat="cjp" Rule="Cj2Cjp" Head="0" nodeId="0100400100110011">
<Node n="010040010011" Cat="cj" morphId="010040010011" Unicode="וְ" nodeId="0100400100110010">
<m USFMId="GEN 4:1!1" n="010040010011" morph="C" lang="H" lemma="c" pos="conjunction">וְ</m>
</Node>
</Node>
<Node Cat="CL" Rule="S-V-O" Head="1" nodeId="0100400100120070">
<Node Cat="S" Rule="Np2S" Head="0" nodeId="0100400100120021">
<Node Cat="np" Rule="DetNP" Head="1" nodeId="0100400100120020">
<Node n="010040010012" Cat="art" morphId="010040010012" Unicode="הָ֣" nodeId="0100400100120010">
<m USFMId="GEN 4:1!1" n="010040010012" morph="Td" lang="H" lemma="d" pos="particle" type="definite article">הָ֣</m>
</Node>
<Node Cat="np" Rule="N2NP" Head="0" nodeId="0100400100130011">
<Node n="010040010013" Cat="noun" morphId="010040010013" Unicode="אָדָ֔ם" nodeId="0100400100130010">
<m USFMId="GEN 4:1!1" n="010040010013" morph="Ncmsa" lang="H" lemma="120" after=" " pos="noun" type="common" gender="masculine" number="singular" state="absolute">אָדָ֔ם</m>
</Node>
</Node>
</Node>
</Node>
I would prefer this:
<?xml version="1.0" encoding="UTF-8"?><Sentences>
<Sentence verse="GEN 4:1">
<Trees>
<Tree>
<Node Cat="S" Head="0" nodeId="0100400100110200">
<Node Cat="cjp" Rule="Cj2Cjp" Head="0" nodeId="0100400100110011">
<Node n="010040010011" Cat="cj" morphId="010040010011" Unicode="וְ" nodeId="0100400100110010">
<m word="GEN 4:1!1" n="010040010011" morph="C" lang="H" lemma="c" pos="conjunction">וְ</m>
</Node>
</Node>
<Node Cat="CL" Rule="S-V-O" Head="1" nodeId="0100400100120070">
<Node Cat="S" Rule="Np2S" Head="0" nodeId="0100400100120021">
<Node Cat="np" Rule="DetNP" Head="1" nodeId="0100400100120020">
<Node n="010040010012" Cat="art" morphId="010040010012" Unicode="הָ֣" nodeId="0100400100120010">
<m word="GEN 4:1!1" n="010040010012" morph="Td" lang="H" lemma="d" pos="particle" type="definite article">הָ֣</m>
</Node>
<Node Cat="np" Rule="N2NP" Head="0" nodeId="0100400100130011">
<Node n="010040010013" Cat="noun" morphId="010040010013" Unicode="אָדָ֔ם" nodeId="0100400100130010">
<m word="GEN 4:1!1" n="010040010013" morph="Ncmsa" lang="H" lemma="120" after=" " pos="noun" type="common" gender="masculine" number="singular" state="absolute">אָדָ֔ם</m>
</Node>
</Node>
</Node>
</Node>
13 sentences are currently missing their morphology and text. These include 9 sentences that have directional suffixes within a compound:
("2s20:15","ca6:12","2s24:6","gn46:1","gn28:2","gn28:5","gn28:6","gn28:7","js19:13")
They also include 3 sentences where OSHB and Westminster disagree on a Qere:
("ne5:7","ps21:2","da2:39")
The final sentence involves a mismatch in the implicit article:
("lv27:16")
Obviously, we need to fix these.
Because some words had to be broken up into constituent parts for analysis, one unique id would have to be shared across its two or three constituent parts to carry over into the trees. For example:
<verse osisID="Gen.1.1">
<w lemma="b/7225" n="1.0" morph="HR/Ncfsa" id="01xeN">בְּ/רֵאשִׁ֖ית</w>
<w lemma="1254 a" morph="HVqp3ms" id="01Nvk">בָּרָ֣א</w>
<w lemma="430" n="1" morph="HNcmpa" id="01TyA">אֱלֹהִ֑ים</w>
<w lemma="853" morph="HTo" id="01vuQ">אֵ֥ת</w>
<w lemma="d/8064" n="0.0" morph="HTd/Ncmpa" id="01TSc">הַ/שָּׁמַ֖יִם</w>
<w lemma="c/853" morph="HC/To" id="01k5P">וְ/אֵ֥ת</w>
<w lemma="d/776" n="0" morph="HTd/Ncbsa" id="01nPh">הָ/אָֽרֶץ</w><seg type="x-sof-pasuq">׃</seg>
</verse>
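How one OSHB id could be shared across the constituent parts of a word can be sketched as follows. The sketch splits the text, lemma and morph of a single `<w>` element on "/" and attaches the same OSHB id to each part. The output keys (`oshbId`, `part`, and so on) and the function name are hypothetical, not an existing convention in the trees.

```python
import xml.etree.ElementTree as ET

# First word of Gen 1:1 from the OSHB example above.
W = '<w lemma="b/7225" n="1.0" morph="HR/Ncfsa" id="01xeN">בְּ/רֵאשִׁ֖ית</w>'


def split_word(w):
    """Split an OSHB <w> element into morphemes that share its id."""
    texts = w.text.split("/")
    lemmas = w.get("lemma").split("/")
    morph = w.get("morph")
    lang, codes = morph[0], morph[1:].split("/")  # leading letter = language
    if not len(texts) == len(lemmas) == len(codes):
        raise ValueError("text, lemma and morph have different part counts")
    return [
        {"oshbId": w.get("id"), "part": i, "text": t,
         "lemma": l, "lang": lang, "morph": c}
        for i, (t, l, c) in enumerate(zip(texts, lemmas, codes), start=1)
    ]


parts = split_word(ET.fromstring(W))
for p in parts:
    print(p["oshbId"], p["part"], p["lemma"], p["morph"])
```

Keeping a part index next to the shared id means the pair (id, part) stays unique even though the id alone no longer is.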
"in beginning", "the heavens," "and [object marker]", "the earth" all didn't keep their OSHB unique Ids due to having been separated into 2 parts, while "created", "God", "[object marker]" still show their OSHB ids in the trees. Perhaps should strip all the OSHB ids to avoid confusion.
There have been several issues and small inconsistencies concerning mapping different datasets (OSHB, old/new trees, Marble, etc). This issue should help to keep the discussions and decisions about these issues documented and traceable.
Please refer to this document for comments and issues, and feel free to add your own.
//Node[c|m][not(./@n)] and //Node[c] return exactly the same.
The Nodes that have compounds do not have an @n attribute.
<Node Cat="noun" morphId="010040220061" Unicode="תּ֣וּבַל קַ֔יִן" nodeId="0100402200610010" StrongNumberX="8423">
<c english="Tubal-cain" mandarin="土八该隐" SDBH="תּוּבַל קַיִן:003001007:Names of People:Tubal-Cain">
<m word="GEN 4:22!6" xml:id="o010040220061" lang="H" after=" " lemma="8423+" morph="Np" pos="noun" type="proper">תּ֣וּבַל</m>
<m word="GEN 4:22!7" xml:id="o010040220071" lang="H" after=" " lemma="8423" morph="Np" pos="noun" type="proper">קַ֔יִן</m>
</c>
</Node>
</Node>
<Node Cat="np" Rule="NPofNP" Head="0" nodeId="0100402200810060">
<Node Cat="np" Rule="Vp2Np" Head="0" nodeId="0100402200810012">
<Node Cat="vp" Rule="V2VP" Head="0" nodeId="0100402200810011">
<Node n="o010040220081" Cat="verb" morphId="010040220071" Unicode="לֹטֵ֕שׁ" nodeId="0100402200810010" StrongNumberX="3913" SenseNumber="2" Frame="A0:010040220061; A1:010040220091;" SubjRef="010040220061" Greek="σφυροκόπος">
<m word="GEN 4:22!8" xml:id="o010040220081" lang="H" after=" " lemma="3913" morph="Vqrmsc" pos="verb" stem="qal" type="participle active" gender="masculine" number="singular" state="construct" english="made" mandarin="打造" SDBH="לטשׁ:002001001048:Shape:to hammer;to forge">לֹטֵ֕שׁ</m>
</Node>
</Node>
</Node>
We have several sources of glosses, and they have different advantages and purposes. We need simple attribute names that support the glosses we are using:
Obviously, glosses in other languages may also become a factor.
I don't particularly like attribute names like cherith-english in the following:
<Node xmlns:xi="http://www.w3.org/2001/XInclude" Cat="noun" morphId="130020160092" Unicode="עֲשָׂה־אֵ֖ל" nodeId="1300201600920010" StrongNumberX="6214" Greek="ασαηλ">
<c cherith-english="Asahel" cherith-chinese="亚撒黑" marble-sense="עֲשָׂהאֵל:003001007:Names of People:Asahel|שָׁלֹשׁ:002001001042:Quantity;002001003009:Frequency:three">
<m word="1CH 2:16!9" n="130020160092" morph="Np" lang="H" lemma="6214+" after="־" pos="noun" type="proper">עֲשָׂה</m>
<m word="1CH 2:16!10" n="130020160101" lang="H" after=" " lemma="6214" morph="Np" pos="noun" type="proper">אֵ֖ל</m>
</c>
</Node>
So we need a naming convention that gives us flexibility while keeping this simple. I don't think we need the attribute name to attribute the source; we can do that in documentation and copyright / license statements.
Any suggestions?
At the last minute, we found we had IP issues with the English glosses we had intended to use.
We need to compare available glosses and pick glosses that are particularly good.
We need to develop tooling that makes it possible for an expert like @rkjtan to systematically examine differences. For morphemes, we have used a consonant-only comparison, using | to indicate boundaries between morphemes. For instance:
Verse: 27002039
- ו|בתר|כ|תקומ|מלכו|אחרי|ארע|מנ|כ|ו|מלכו|תליתאה|אחרי|די|נחש|א|די|תשלט|ב|כל|ארע|א
- ו|בתר|כ|תקומ|מלכו|אחרי|ארעא|מנ|כ|ו|מלכו|תליתאה|אחרי|די|נחש|א|די|תשלט|ב|כל|ארע|א
Verse: 16005007
- ו|ימלכ|לב|י|על|י|ו|אריב|ה|את|ה|חרימ|ו|את|ה|סגנימ|ו|אמר|ה|ל|המ|משא|איש|ב|אחי|ו|אתמ|נשימ|ו|אתנ|עלי|המ|קהלה|גדולה
- ו|ימלכ|לב|י|על|י|ו|אריב|ה|את|ה|חרימ|ו|את|ה|סגנימ|ו|אמר|ה|ל|המ|משא|איש|ב|אחי|ו|אתמ|נשאימ|ו|אתנ|עלי|המ|קהלה|גדולה
Verse: 19021002
- יהוה|ב|עז|כ|ישמח|מלכ|ו|ב|ישועת|כ|מה|יגל|מאד
- יהוה|ב|עז|כ|ישמח|מלכ|ו|ב|ישועת|כ|מה|יגיל|מאד
This would be good to have as part of a general-purpose tool that also identifies other inconsistencies.
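A consonant-only comparison of this kind can be sketched as below. Two points are inferred from the examples rather than stated: final letter forms appear to be normalized to their regular forms (the examples show "מלכ" and "קומ"), and everything except the 22 base letters is dropped. The function names are hypothetical.

```python
import unicodedata

# Final forms -> regular forms (kaf, mem, nun, pe, tsadi).
FINALS = str.maketrans(
    "\u05da\u05dd\u05df\u05e3\u05e5",
    "\u05db\u05de\u05e0\u05e4\u05e6",
)


def consonants(morpheme):
    """Strip vowels, accents and punctuation; normalize final letter forms."""
    decomposed = unicodedata.normalize("NFD", morpheme)
    letters = "".join(ch for ch in decomposed if "\u05d0" <= ch <= "\u05ea")
    return letters.translate(FINALS)


def skeleton(morphemes):
    """Join the consonantal forms of a verse's morphemes with '|'."""
    return "|".join(consonants(m) for m in morphemes)


def first_difference(a, b):
    """Return (index, left, right) for the first differing morpheme, or None.

    Only the overlapping part is compared; a length mismatch needs its own check.
    """
    for i, (x, y) in enumerate(zip(a.split("|"), b.split("|"))):
        if x != y:
            return i, x, y
    return None


print(skeleton(["בְּ", "רֵאשִׁ֖ית", "בָּרָ֣א", "אֱלֹהִ֑ים"]))
```

Reporting the index of the first differing morpheme, rather than only printing both lines, makes it easier to jump to the spot where two sources disagree.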
When I run this simple count:
'# of m', count(//m),
'# of @Domain', count(//@Domain),
'# of @Extends', count(//@Extends)
I get this result (I would have expected more @Domain):
<?xml version="1.0" encoding="UTF-8"?># of m 475911 # of @Domain 214661 # of @Extends 35441
I am noticing a number of words do not have @Domain. E.g.,
<Node n="o010160030081" Cat="noun" morphId="010160030081" Unicode="שִׁפְחָת" nodeId="0101600300810010" StrongNumberX="8198" SenseNumber="1" Greek="παιδίσκην" GreekStrong="3814">
<m word="GEN 16:3!8" xml:id="o010160030081" morph="Ncfsc" lang="H" lemma="8198" pos="noun" type="common" gender="feminine" number="singular" state="construct" english="servant" mandarin="婢女">שִׁפְחָתָ֔</m>
</Node>
This word, you will notice, does have @SenseNumber. Is something falling through the cracks? @klosoter @jonathanrobie
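To list such words systematically, something like the following sketch could help. It reports every Node that has @SenseNumber while neither the Node nor its m children carry @Domain. Where @Domain actually lives is an assumption here (both places are checked), and the second sample Node, including its Domain value, is invented for contrast.

```python
import xml.etree.ElementTree as ET

# First Node is reduced from the example above; the second is hypothetical.
SAMPLE = """<Sentence>
  <Node morphId="010160030081" StrongNumberX="8198" SenseNumber="1">
    <m word="GEN 16:3!8" lemma="8198" english="servant"/>
  </Node>
  <Node morphId="000000000001" SenseNumber="1">
    <m word="XXX 1:1!1" Domain="001001"/>
  </Node>
</Sentence>"""


def sense_without_domain(root):
    """Return morphIds of Nodes that have @SenseNumber but no @Domain."""
    hits = []
    for node in root.iter("Node"):
        if node.get("SenseNumber") is None:
            continue
        carriers = [node, *node.findall("m")]
        if not any(e.get("Domain") for e in carriers):
            hits.append(node.get("morphId"))
    return hits


hits = sense_without_domain(ET.fromstring(SAMPLE))
print(hits)
```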
This is what the extracted SIL data looks like (full file here)
{
'wd': 'B.:/R")$I73YT',
'ws': 'בְּרֵאשִׁ֖ית',
'wt': 'bərēʾšîṯ',
'wc': 'בְּרֵאשִׁית',
'bf': '\\p',
'vdm': '{"Temporal <H>בְּ</H>" 39.6.2}',
'netB': '',
'wbc': '{"The definite article is lacking, but \'in the beginning\' is an acceptable translation" GEN.1.1.b}',
'egs': 'in.beginning',
'morphs': {
'010010010011': {
'm': 'B.:',
'ms': 'בְּ',
'mt': 'bə',
'l': 'B.:',
'ls': 'בְּ',
'lt': 'bə',
'dfA': '\\7b\\d0\\62\\7d',
'df': '}בּ{',
't': 'Pp'
},
'010010010012': {
'm': 'R")$I73YT',
'ms': 'רֵאשִׁ֖ית',
'mt': 'rēʾšiyṯ',
'l': 'R")$IYT',
'ls': 'רֵאשִׁית',
'lt': 'rēʾšîṯ',
'dfA': '\\7b\\74\\79\\69\\48\\27\\e3\\72\\7d',
'df': '}רֵאשִׁית{',
't': 'ncfsa',
'str': '{07225}'
}
}
}
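As a rough illustration of how such a record might be consumed, the sketch below walks the `morphs` dictionary and pulls out the morph id, the tag in `t`, and the number in `str` with its braces removed. The interpretation of the keys (`t` as morphology tag, `str` as Strong's number) is a guess from the sample, and the function name is hypothetical.

```python
# Reduced copy of the record above, keeping only the fields used here.
RECORD = {
    "ws": "בְּרֵאשִׁ֖ית",
    "egs": "in.beginning",
    "morphs": {
        "010010010011": {"ms": "בְּ", "t": "Pp"},
        "010010010012": {"ms": "רֵאשִׁ֖ית", "t": "ncfsa", "str": "{07225}"},
    },
}


def morph_rows(record):
    """Return (morph id, tag, number from 'str' or None) for each morpheme."""
    rows = []
    for morph_id, morph in sorted(record["morphs"].items()):
        number = morph.get("str", "").strip("{}") or None
        rows.append((morph_id, morph.get("t"), number))
    return rows


rows = morph_rows(RECORD)
for row in rows:
    print(row)
```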
There are several inconsistencies with the Marble lexicon entries from /ubsicap/marble-lexicon/SDBH/SDBH-EXPORT-en.XML.
Please see the following files to check this out:
- @morphId and marble-BHS.
- @n and marble-BHS.
- Collected issues with marble lexicon data.
The following cases occur:
<morph SDBH="עַקְרָב:001001002001006:Swarming Creatures:scorpion|עַקְרָב:003001010:Names of Locations:Ascent of Akrabbim;Ascent of Scorpions" marble-text="מַעֲלֵ֤ה עַקְרַבִּים֙">
<Node n="060150030042" Cat="noun" morphId="060150030042" Unicode="מַעֲלֵ֤ה" nodeId="0601500300420010" StrongNumberX="4608" SenseNumber="1" Greek="προσαναβάσεως">
<m word="JOS 15:3!4" n="060150030042" morph="Ncmsc" lang="H" lemma="4610+" after=" " pos="noun" type="common" gender="masculine" number="singular" state="construct" english="ascent" mandarin="隘口">מַעֲלֵ֤ה</m>
</Node>
<Node n="060150030051" Cat="noun" morphId="060150030051" Unicode="עַקְרַבִּים֙" nodeId="0601500300510010" StrongNumberX="6137" SenseNumber="2" Greek="ακραβιν">
<m word="JOS 15:3!5" n="060150030051" lang="H" after=" " lemma="4610" morph="Np" pos="noun" type="proper" english="akrabbim" mandarin="亚克拉滨">עַקְרַבִּים֙</m>
</Node>
</morph>
<c> elements and some of their <m> children are aligned with a marble sense (using the mapping), which results in duplicates:
<c english="Arameans of Beth-rehob" mandarin="伯·利合的亚兰人" SDBH="בֵּית רְחֹוב:003001010:Names of Locations:Beth-Rehob">
<m USFMId="2SA 10:6!12" n="100100060121" lang="H" after=" " lemma="758" morph="Np" pos="noun" type="proper" SDBH="אֲרַם בֵּית־רְחֹוב:003001006:Names of Groups:Arameans of Beth-Rehob">אֲרַ֨ם</m>
<m USFMId="2SA 10:6!13" n="100100060131" lang="H" after="־" lemma="1050+" morph="Np" pos="noun" type="proper">בֵּית</m>
<m USFMId="2SA 10:6!14" n="100100060141" lang="H" after=" " lemma="1050" morph="Np" pos="noun" type="proper">רְח֜וֹב</m>
</c>
<m> element. This can best be seen by checking the marble-senses.xml, which contains, for each Marble Id (marble-text being the text associated with the id in Marble-BHS):
<morph marble-id="00503300200046" marble-text="אשׁ" combined-data="אֵשׁ:001006006:Fire:fire|אֲשֵׁדָה:001005003:Landforms:slope|אֵשְׁדָּת::fiery law|דאה:002002001009:Move:to swoop down">
<entry lemma="אֵשׁ" gloss="fire" domain="001006006:Fire"/>
<entry lemma="אֲשֵׁדָה" gloss="slope" domain="001005003:Landforms"/>
<entry lemma="אֵשְׁדָּת" gloss="fiery law" domain=""/>
<entry lemma="דאה" gloss="to swoop down" domain="002002001009:Move"/>
</morph>
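For reference, the `combined-data` string can be split back into its entries roughly as follows. This sketch assumes entries are separated by `|`, that the first field is the lemma and the last is the gloss, and that glosses never contain a colon. `parse_combined` is a hypothetical name.

```python
def parse_combined(combined):
    """Split a combined-data string into lemma / domain / gloss dictionaries.

    The domain is whatever sits between the lemma and the gloss; it is
    either "code:name" or empty.
    """
    entries = []
    for part in combined.split("|"):
        fields = part.split(":")
        entries.append({
            "lemma": fields[0],
            "domain": ":".join(fields[1:-1]),
            "gloss": fields[-1],
        })
    return entries


combined = (
    "אֵשׁ:001006006:Fire:fire|אֲשֵׁדָה:001005003:Landforms:slope"
    "|אֵשְׁדָּת::fiery law|דאה:002002001009:Move:to swoop down"
)
entries = parse_combined(combined)
for entry in entries:
    print(entry["domain"] or "(no domain)", "=>", entry["gloss"])
```

Under these assumptions the result should line up with the four `<entry>` elements shown above, including the empty domain for "fiery law".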
It is nearly impossible to automatically select the entries for a given Id that match the actual Hebrew text, because the lexicon lemmas are often base forms. So there will probably be some SDBH senses that do not match the <m> or <c> element they are added to.
Any thoughts, suggestions, or solutions?
Instead of this:
<Node n="190010010021" Cat="art" morphId="190010010021" Unicode="הָ" nodeId="1900100100210010" StrongNumberX="1886a" english="the" chinese="这">
<m USFMId="PSA 1:1!2" n="190010010021" morph="Td" lang="H" lemma="d" pos="particle" type="definite article">הָ</m>
</Node>
I would like this:
<Node n="190010010021" Cat="art" morphId="190010010021" Unicode="הָ" nodeId="1900100100210010" StrongNumberX="1886a">
<m USFMId="PSA 1:1!2" n="190010010021" morph="Td" lang="H" lemma="d" pos="particle" type="definite article" english="the" chinese="这">הָ</m>
</Node>
For compound words, put the gloss on the compound:
<Node Cat="noun" morphId="010040220061" Unicode="תּ֣וּבַל קַ֔יִן" nodeId="0100402200610010">
<c>
<m USFMId="GEN 4:22!6" n="010040220061" lang="H" after=" " lemma="8423+" morph="Np" pos="noun" type="proper">תּ֣וּבַל</m>
<m USFMId="GEN 4:22!7" n="010040220071" lang="H" after=" " lemma="8423" morph="Np" pos="noun" type="proper">קַ֔יִן</m>
</c>
</Node>
There are 266 leaf Node elements with no <m> or <c> children. This is a bug. We should add this test to our unit tests, @jacobwegner.
//Node[empty(*)]
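A unit test along these lines could be sketched in Python as follows. This is only an illustration on an inline sample: `childless_nodes` is a hypothetical helper, and a real test would load the tree files and assert that the returned list is empty.

```python
import xml.etree.ElementTree as ET

# One well-formed leaf (with an <m> child) and one broken leaf (text only).
SAMPLE = """<Tree>
  <Node Cat="CL">
    <Node Cat="art" morphId="190010010021"><m>x</m></Node>
    <Node Cat="cj" morphId="100200150011">WA</Node>
  </Node>
</Tree>"""


def childless_nodes(root):
    """Return every <Node> with no element children, like //Node[empty(*)]."""
    return [node for node in root.iter("Node") if len(node) == 0]


root = ET.fromstring(SAMPLE)
broken = childless_nodes(root)
print([node.get("morphId") for node in broken])
```

`len(node)` counts only child elements, so a Node that holds just transliterated text is still reported.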
Here are the nodes in question:
<Node Cat="cj" morphId="100200150011" Unicode="וַ" nodeId="100200150010010">WA</Node>
<Node Cat="verb" morphId="100200150012" Unicode="יָּבֹ֜אוּ" nodeId="100200150020040">Y.FBO61)W.</Node>
<Node Cat="cj" morphId="100200150021" Unicode="וַ" nodeId="100200150060010">WA</Node>
<Node Cat="verb" morphId="100200150022" Unicode="יָּצֻ֣רוּ" nodeId="100200150070040">Y.FCU74RW.</Node>
<Node Cat="prep" morphId="100200150031" Unicode="עָלָ֗י" nodeId="100200150110030">(FLF81Y</Node>
<Node Cat="pron" morphId="100200150032" Unicode="ו" nodeId="100200150140010">W</Node>
<Node Cat="prep" morphId="100200150041" Unicode="בְּ" nodeId="100200150150010">B.:</Node>
<Node Cat="cj" morphId="100200150051" Unicode="וַ" nodeId="100200150280010">WA</Node>
<Node Cat="verb" morphId="100200150052" Unicode="יִּשְׁפְּכ֤וּ" nodeId="100200150290050">Y.I$:P.:K70W.</Node>
<Node Cat="noun" morphId="100200150061" Unicode="סֹֽלְלָה֙" nodeId="100200150340040">SO75L:LFH03</Node>
<Node Cat="prep" morphId="100200150071" Unicode="אֶל־" nodeId="100200150380020">)EL-</Node>
<Node Cat="art" morphId="100200150081" Unicode="הָ" nodeId="100200150400010">HF</Node>
<Node Cat="noun" morphId="100200150082" Unicode="עִ֔יר" nodeId="100200150410030">(I80YR</Node>
<Node Cat="cj" morphId="100200150091" Unicode="וַֽ" nodeId="100200150440010">WA75</Node>
<Node Cat="verb" morphId="100200150092" Unicode="תַּעֲמֹ֖ד" nodeId="100200150450040">T.A(:AMO73D</Node>
<Node Cat="prep" morphId="100200150101" Unicode="בַּ" nodeId="100200150490010">B.A</Node>
<Node Cat="art" morphId="100200150102" Unicode="" nodeId="100200150500000">_</Node>
<Node Cat="noun" morphId="100200150103" Unicode="חֵ֑ל" nodeId="100200150500020">X"92L</Node>
<Node Cat="cj" morphId="100200150111" Unicode="וְ" nodeId="100200150520010">W:</Node>
<Node Cat="noun" morphId="100200150112" Unicode="כָל־" nodeId="100200150530020">KFL-</Node>
<Node Cat="art" morphId="100200150121" Unicode="הָ" nodeId="100200150550010">HF</Node>
<Node Cat="noun" morphId="100200150122" Unicode="עָם֙" nodeId="100200150560020">(FM03</Node>
<Node Cat="rel" morphId="100200150131" Unicode="אֲשֶׁ֣ר" nodeId="100200150580030">):A$E74R</Node>
<Node Cat="prep" morphId="100200150141" Unicode="אֶת־" nodeId="100200150610020">)ET-</Node>
<Node Cat="noun" morphId="100200150151" Unicode="יוֹאָ֔ב" nodeId="100200150630040">YOW)F80B</Node>
<Node Cat="verb" morphId="100200150161" Unicode="מַשְׁחִיתִ֖ם" nodeId="100200150670060">MA$:XIYTI73M</Node>
<Node Cat="prep" morphId="100200150171" Unicode="לְ" nodeId="100200150730010">L:</Node>
<Node Cat="verb" morphId="100200150172" Unicode="הַפִּ֥יל" nodeId="100200150740040">HAP.I71YL</Node>
<Node Cat="art" morphId="100200150181" Unicode="הַ" nodeId="100200150780010">HA</Node>
<Node Cat="noun" morphId="100200150182" Unicode="׃חוֹמָֽה" nodeId="100200150790040">XOWMF75H00</Node>
<Node Cat="cj" morphId="030270160011" Unicode="וְ" nodeId="030270160010010">W:</Node>
<Node Cat="cj" morphId="030270160012" Unicode="אִ֣ם ׀" nodeId="030270160020020">)I74M05</Node>
<Node Cat="prep" morphId="030270160021" Unicode="מִ" nodeId="030270160040010">MI</Node>
<Node Cat="noun" morphId="030270160022" Unicode="שְּׂדֵ֣ה" nodeId="030270160050030">&.:D"74H</Node>
<Node Cat="noun" morphId="030270160031" Unicode="אֲחֻזָּת" nodeId="030270160080040">):AXUZ.FT</Node>
<Node Cat="pron" morphId="030270160032" Unicode="֗וֹ" nodeId="030270160120010">O81W</Node>
<Node Cat="verb" morphId="030270160041" Unicode="יַקְדִּ֥ישׁ" nodeId="030270160130050">YAQ:D.I71Y$</Node>
<Node Cat="noun" morphId="030270160051" Unicode="אִישׁ֙" nodeId="030270160180030">)IY$03</Node>
<Node Cat="prep" morphId="030270160061" Unicode="לַֽ" nodeId="030270160210010">LA75</Node>
<Node Cat="noun" morphId="030270160062" Unicode="יהוָ֔ה" nodeId="030270160220040">YHWF80H</Node>
<Node Cat="cj" morphId="030270160071" Unicode="וְ" nodeId="030270160260010">W:</Node>
<Node Cat="verb" morphId="030270160072" Unicode="הָיָ֥ה" nodeId="030270160270030">HFYF71H</Node>
<Node Cat="noun" morphId="030270160081" Unicode="עֶרְכּ" nodeId="030270160300030">(ER:K.</Node>
<Node Cat="pron" morphId="030270160082" Unicode="ְךָ֖" nodeId="030270160330010">:KF73</Node>
<Node Cat="prep" morphId="030270160091" Unicode="לְ" nodeId="030270160340010">L:</Node>
<Node Cat="noun" morphId="030270160092" Unicode="פִ֣י" nodeId="030270160350020">PI74Y</Node>
<Node Cat="noun" morphId="030270160101" Unicode="זַרְע" nodeId="030270160370030">ZAR:(</Node>
<Node Cat="pron" morphId="030270160102" Unicode="֑וֹ" nodeId="030270160400010">O92W</Node>
<Node Cat="noun" morphId="030270160111" Unicode="זֶ֚רַע" nodeId="030270160410030">10ZERA(</Node>
<Node Cat="noun" morphId="030270160121" Unicode="חֹ֣מֶר" nodeId="030270160440030">XO74MER</Node>
<Node Cat="noun" morphId="030270160131" Unicode="שְׂעֹרִ֔ים" nodeId="030270160470050">&:(ORI80YM</Node>
<Node Cat="prep" morphId="030270160141" Unicode="בַּ" nodeId="030270160520010">B.A</Node>
<Node Cat="num" morphId="030270160142" Unicode="חֲמִשִּׁ֖ים" nodeId="030270160530050">X:AMI$.I73YM</Node>
<Node Cat="noun" morphId="030270160151" Unicode="שֶׁ֥קֶל" nodeId="030270160580030">$E71QEL</Node>
<Node Cat="noun" morphId="030270160161" Unicode="׃כָּֽסֶף" nodeId="030270160610030">K.F75SEP00</Node>
<Node Cat="cj" morphId="160050070011" Unicode="וַ" nodeId="160050070010010">WA</Node>
<Node Cat="verb" morphId="160050070012" Unicode="יִּמָּלֵ֨ךְ" nodeId="160050070020040">Y.IM.FL"63K:</Node>
<Node Cat="noun" morphId="160050070021" Unicode="לִבּ" nodeId="160050070060020">LIB.</Node>
<Node Cat="pron" morphId="160050070022" Unicode="ִ֜י" nodeId="160050070080010">I61Y</Node>
<Node Cat="prep" morphId="160050070031" Unicode="עָל" nodeId="160050070090020">(FL</Node>
<Node Cat="pron" morphId="160050070032" Unicode="ַ֗י" nodeId="160050070110010">A81Y</Node>
<Node Cat="cj" morphId="160050070041" Unicode="וָ" nodeId="160050070120010">WF</Node>
<Node Cat="verb" morphId="160050070042" Unicode="אָרִ֙יב" nodeId="160050070130040">)FRI33YB</Node>
<Node Cat="x" morphId="160050070043" Unicode="ָה֙" nodeId="160050070170010">FH03</Node>
<Node Cat="prep" morphId="160050070051" Unicode="אֶת־" nodeId="160050070180020">)ET-</Node>
<Node Cat="art" morphId="160050070061" Unicode="הַ" nodeId="160050070200010">HA</Node>
<Node Cat="noun" morphId="160050070062" Unicode="חֹרִ֣ים" nodeId="160050070210040">XORI74YM</Node>
<Node Cat="cj" morphId="160050070071" Unicode="וְ" nodeId="160050070250010">W:</Node>
<Node Cat="prep" morphId="160050070072" Unicode="אֶת־" nodeId="160050070260020">)ET-</Node>
<Node Cat="art" morphId="160050070081" Unicode="הַ" nodeId="160050070280010">HA</Node>
<Node Cat="noun" morphId="160050070082" Unicode="סְּגָנִ֔ים" nodeId="160050070290050">S.:GFNI80YM</Node>
<Node Cat="cj" morphId="160050070091" Unicode="וָ" nodeId="160050070340010">WF</Node>
<Node Cat="verb" morphId="160050070092" Unicode="אֹמְר" nodeId="160050070350030">)OM:R</Node>
<Node Cat="x" morphId="160050070093" Unicode="ָ֣ה" nodeId="160050070380010">F74H</Node>
<Node Cat="prep" morphId="160050070101" Unicode="ל" nodeId="160050070390010">L</Node>
<Node Cat="pron" morphId="160050070102" Unicode="ָהֶ֔ם" nodeId="160050070400020">FHE80M</Node>
<Node Cat="noun" morphId="160050070111" Unicode="מַשָּׁ֥א" nodeId="160050070420030">MA$.F71)</Node>
<Node Cat="noun" morphId="160050070121" Unicode="אִישׁ־" nodeId="160050070450030">)IY$-</Node>
<Node Cat="prep" morphId="160050070131" Unicode="בְּ" nodeId="160050070480010">B.:</Node>
<Node Cat="noun" morphId="160050070132" Unicode="אָחִ֖י" nodeId="160050070490030">)FXI73Y</Node>
<Node Cat="pron" morphId="160050070133" Unicode="ו" nodeId="160050070520010">W</Node>
<Node Cat="pron" morphId="160050070141" Unicode="אַתֶּ֣ם" nodeId="160050070530030">)AT.E74M</Node>
<Node Cat="verb" morphId="160050070151" Unicode="נשאים" nodeId="160050070560050">*NO$:)IYM</Node>
<Node Cat="cj" morphId="160050070171" Unicode="וָ" nodeId="160050070650010">WF</Node>
<Node Cat="verb" morphId="160050070172" Unicode="אֶתֵּ֥ן" nodeId="160050070660030">)ET."71N</Node>
<Node Cat="prep" morphId="160050070181" Unicode="עֲלֵי" nodeId="160050070690030">(:AL"Y</Node>
<Node Cat="pron" morphId="160050070182" Unicode="הֶ֖ם" nodeId="160050070720020">HE73M</Node>
<Node Cat="noun" morphId="160050070191" Unicode="קְהִלָּ֥ה" nodeId="160050070740040">Q:HIL.F71H</Node>
<Node Cat="adj" morphId="160050070201" Unicode="׃גְדוֹלָֽה" nodeId="160050070780050">G:DOWLF75H00</Node>
<Node Cat="adv" morphId="220060120011" Unicode="לֹ֣א" nodeId="220060120010020">LO74)</Node>
<Node Cat="verb" morphId="220060120021" Unicode="יָדַ֔עְתִּי" nodeId="220060120030050">YFDA80(:T.IY</Node>
<Node Cat="noun" morphId="220060120031" Unicode="נַפְשׁ" nodeId="220060120080030">NAP:$</Node>
<Node Cat="pron" morphId="220060120032" Unicode="ִ֣י" nodeId="220060120110010">I74Y</Node>
<Node Cat="verb" morphId="220060120041" Unicode="שָׂמַ֔ת" nodeId="220060120120030">&FMA80T</Node>
<Node Cat="pron" morphId="220060120042" Unicode="ְנִי" nodeId="220060120150020">:NIY</Node>
<Node Cat="noun" morphId="220060120051" Unicode="מַרְכְּב֖וֹת" nodeId="220060120170060">MAR:K.:BO73WT</Node>
<Node Cat="cj" morphId="100240060011" Unicode="וַ" nodeId="100240060010010">WA</Node>
<Node Cat="verb" morphId="100240060012" Unicode="יָּבֹ֙אוּ֙" nodeId="100240060020040">Y.FBO33)W.03</Node>
<Node Cat="art" morphId="100240060021" Unicode="הַ" nodeId="100240060060010">HA</Node>
<Node Cat="noun" morphId="100240060022" Unicode="גִּלְעָ֔ד" nodeId="100240060070040">G.IL:(F80D</Node>
<Node Cat="x" morphId="100240060023" Unicode="ָה" nodeId="100240060110010">FH</Node>
<Node Cat="cj" morphId="100240060031" Unicode="וְ" nodeId="100240060120010">W:</Node>
<Node Cat="prep" morphId="100240060032" Unicode="אֶל־" nodeId="100240060130020">)EL-</Node>
<Node Cat="noun" morphId="100240060041" Unicode="אֶ֥רֶץ" nodeId="100240060150030">)E71REC</Node>
<Node Cat="cj" morphId="100240060061" Unicode="וַ" nodeId="100240060270010">WA</Node>
<Node Cat="verb" morphId="100240060062" Unicode="יָּבֹ֙אוּ֙" nodeId="100240060280040">Y.FBO33)W.03</Node>
<Node Cat="cj" morphId="100240060081" Unicode="וְ" nodeId="100240060380010">W:</Node>
<Node Cat="adv" morphId="100240060082" Unicode="סָבִ֖יב" nodeId="100240060390040">SFBI73YB</Node>
<Node Cat="prep" morphId="100240060091" Unicode="אֶל־" nodeId="100240060430020">)EL-</Node>
<Node Cat="noun" morphId="100240060101" Unicode="׃צִידֽוֹן" nodeId="100240060450050">CIYDO75WN00</Node>
<Node Cat="noun" morphId="190210020011" Unicode="יְֽהוָ֗ה" nodeId="190210020010040">Y:75HWF81H</Node>
<Node Cat="prep" morphId="190210020021" Unicode="בְּ" nodeId="190210020050010">B.:</Node>
<Node Cat="noun" morphId="190210020022" Unicode="עָזּ" nodeId="190210020060020">(FZ.</Node>
<Node Cat="pron" morphId="190210020023" Unicode="ְךָ֥" nodeId="190210020080010">:KF71</Node>
<Node Cat="verb" morphId="190210020031" Unicode="יִשְׂמַח־" nodeId="190210020090040">YI&:MAX-</Node>
<Node Cat="noun" morphId="190210020041" Unicode="מֶ֑לֶךְ" nodeId="190210020130030">ME92LEK:</Node>
<Node Cat="cj" morphId="190210020051" Unicode="וּ֝" nodeId="190210020160010">11W.</Node>
<Node Cat="prep" morphId="190210020052" Unicode="בִ" nodeId="190210020170010">BI</Node>
<Node Cat="noun" morphId="190210020053" Unicode="ישׁ֥וּעָת" nodeId="190210020180050">Y$71W.(FT</Node>
<Node Cat="pron" morphId="190210020054" Unicode="ְךָ֗" nodeId="190210020230010">:KF81</Node>
<Node Cat="pron" morphId="190210020061" Unicode="מַה־" nodeId="190210020240020">MAH-</Node>
<Node Cat="verb" morphId="190210020071" Unicode="יגיל" nodeId="190210020260040">*Y.FG"YL</Node>
<Node Cat="adv" morphId="190210020091" Unicode="׃מְאֹֽד" nodeId="190210020330030">M:)O75D00</Node>
<Node Cat="cj" morphId="010460010011" Unicode="וַ" nodeId="010460010010010">WA</Node>
<Node Cat="verb" morphId="010460010012" Unicode="יִּסַּ֤ע" nodeId="010460010020030">Y.IS.A70(</Node>
<Node Cat="noun" morphId="010460010021" Unicode="יִשְׂרָאֵל֙" nodeId="010460010050050">YI&:RF)"L03</Node>
<Node Cat="cj" morphId="010460010031" Unicode="וְ" nodeId="010460010100010">W:</Node>
<Node Cat="noun" morphId="010460010032" Unicode="כָל־" nodeId="010460010110020">KFL-</Node>
<Node Cat="rel" morphId="010460010041" Unicode="אֲשֶׁר־" nodeId="010460010130030">):A$ER-</Node>
<Node Cat="prep" morphId="010460010051" Unicode="ל" nodeId="010460010160010">L</Node>
<Node Cat="pron" morphId="010460010052" Unicode="֔וֹ" nodeId="010460010170010">O80W</Node>
<Node Cat="cj" morphId="010460010061" Unicode="וַ" nodeId="010460010180010">WA</Node>
<Node Cat="verb" morphId="010460010062" Unicode="יָּבֹ֖א" nodeId="010460010190030">Y.FBO73)</Node>
<Node Cat="cj" morphId="010460010081" Unicode="וַ" nodeId="010460010290010">WA</Node>
<Node Cat="verb" morphId="010460010082" Unicode="יִּזְבַּ֣ח" nodeId="010460010300040">Y.IZ:B.A74X</Node>
<Node Cat="noun" morphId="010460010091" Unicode="זְבָחִ֔ים" nodeId="010460010340050">Z:BFXI80YM</Node>
<Node Cat="prep" morphId="010460010101" Unicode="לֵ" nodeId="010460010390010">L"</Node>
<Node Cat="noun" morphId="010460010102" Unicode="אלֹהֵ֖י" nodeId="010460010400040">)LOH"73Y</Node>
<Node Cat="noun" morphId="010460010111" Unicode="אָבִ֥י" nodeId="010460010440030">)FBI71Y</Node>
<Node Cat="pron" morphId="010460010112" Unicode="ו" nodeId="010460010470010">W</Node>
<Node Cat="noun" morphId="010460010121" Unicode="׃יִצְחָֽק" nodeId="010460010480040">YIC:XF75Q00</Node>
<Node Cat="cj" morphId="270020390011" Unicode="וּ" nodeId="270020390010010">W.</Node>
<Node Cat="prep" morphId="270020390012" Unicode="בָתְר" nodeId="270020390020030">BFT:R</Node>
<Node Cat="pron" morphId="270020390013" Unicode="ָ֗ךְ" nodeId="270020390050010">F81K:</Node>
<Node Cat="verb" morphId="270020390021" Unicode="תְּק֛וּם" nodeId="270020390060040">T.:Q91W.M</Node>
<Node Cat="noun" morphId="270020390031" Unicode="מַלְכ֥וּ" nodeId="270020390100040">MAL:K71W.</Node>
<Node Cat="adj" morphId="270020390041" Unicode="אָחֳרִ֖י" nodeId="270020390140040">)FX:FRI73Y</Node>
<Node Cat="noun" morphId="270020390051" Unicode="אֲרַ֣עא" nodeId="270020390180040">):ARA74()</Node>
<Node Cat="prep" morphId="270020390061" Unicode="מִנּ" nodeId="270020390220020">MIN.</Node>
<Node Cat="pron" morphId="270020390062" Unicode="ָ֑ךְ" nodeId="270020390240010">F92K:</Node>
<Node Cat="cj" morphId="270020390071" Unicode="וּ" nodeId="270020390250010">W.</Node>
<Node Cat="noun" morphId="270020390072" Unicode="מַלְכ֨וּ" nodeId="270020390260040">MAL:K63W.</Node>
<Node Cat="num" morphId="270020390091" Unicode="תְלִיתָאָ֤ה" nodeId="270020390360060">**T:LIYTF)F70H</Node>
<Node Cat="adj" morphId="270020390101" Unicode="אָחֳרִי֙" nodeId="270020390420040">)FX:FRIY03</Node>
<Node Cat="rel" morphId="270020390111" Unicode="דִּ֣י" nodeId="270020390460020">D.I74Y</Node>
<Node Cat="noun" morphId="270020390121" Unicode="נְחָשׁ" nodeId="270020390480030">N:XF$</Node>
<Node Cat="art" morphId="270020390122" Unicode="ָ֔א" nodeId="270020390510010">F80)</Node>
<Node Cat="rel" morphId="270020390131" Unicode="דִּ֥י" nodeId="270020390520020">D.I71Y</Node>
<Node Cat="verb" morphId="270020390141" Unicode="תִשְׁלַ֖ט" nodeId="270020390540040">TI$:LA73+</Node>
<Node Cat="prep" morphId="270020390151" Unicode="בְּ" nodeId="270020390580010">B.:</Node>
<Node Cat="noun" morphId="270020390152" Unicode="כָל־" nodeId="270020390590020">KFL-</Node>
<Node Cat="noun" morphId="270020390161" Unicode="אַרְע" nodeId="270020390610030">)AR:(</Node>
<Node Cat="art" morphId="270020390162" Unicode="׃ָֽא" nodeId="270020390640010">F75)00</Node>
<Node Cat="verb" morphId="010280020011" Unicode="ק֥וּם" nodeId="010280020010030">Q71W.M</Node>
<Node Cat="verb" morphId="010280020021" Unicode="לֵךְ֙" nodeId="010280020040020">L"K:03</Node>
<Node Cat="noun" morphId="010280020041" Unicode="בֵּ֥ית" nodeId="010280020130030">B."71YT</Node>
<Node Cat="x" morphId="010280020042" Unicode="ָה" nodeId="010280020160010">FH</Node>
<Node Cat="noun" morphId="010280020051" Unicode="בְתוּאֵ֖ל" nodeId="010280020170050">B:TW.)"73L</Node>
<Node Cat="noun" morphId="010280020061" Unicode="אֲבִ֣י" nodeId="010280020220030">):ABI74Y</Node>
<Node Cat="noun" morphId="010280020071" Unicode="אִמּ" nodeId="010280020250020">)IM.</Node>
<Node Cat="pron" morphId="010280020072" Unicode="ֶ֑ךָ" nodeId="010280020270010">E92KF</Node>
<Node Cat="cj" morphId="010280020081" Unicode="וְ" nodeId="010280020280010">W:</Node>
<Node Cat="verb" morphId="010280020082" Unicode="קַח־" nodeId="010280020290020">QAX-</Node>
<Node Cat="prep" morphId="010280020091" Unicode="ל" nodeId="010280020310010">L</Node>
<Node Cat="pron" morphId="010280020092" Unicode="ְךָ֤" nodeId="010280020320010">:KF70</Node>
<Node Cat="prep" morphId="010280020101" Unicode="מִ" nodeId="010280020330010">MI</Node>
<Node Cat="adv" morphId="010280020102" Unicode="שָּׁם֙" nodeId="010280020340020">$.FM03</Node>
<Node Cat="noun" morphId="010280020111" Unicode="אִשָּׁ֔ה" nodeId="010280020360030">)I$.F80H</Node>
<Node Cat="prep" morphId="010280020121" Unicode="מִ" nodeId="010280020390010">MI</Node>
<Node Cat="noun" morphId="010280020122" Unicode="בְּנ֥וֹת" nodeId="010280020400040">B.:NO71WT</Node>
<Node Cat="noun" morphId="010280020131" Unicode="לָבָ֖ן" nodeId="010280020440030">LFBF73N</Node>
<Node Cat="noun" morphId="010280020141" Unicode="אֲחִ֥י" nodeId="010280020470030">):AXI71Y</Node>
<Node Cat="noun" morphId="010280020151" Unicode="אִמּ" nodeId="010280020500020">)IM.</Node>
<Node Cat="pron" morphId="010280020152" Unicode="׃ֶֽךָ" nodeId="010280020520010">E75KF00</Node>
<Node Cat="cj" morphId="010280050011" Unicode="וַ" nodeId="010280050010010">WA</Node>
<Node Cat="verb" morphId="010280050012" Unicode="יִּשְׁלַ֤ח" nodeId="010280050020040">Y.I$:LA70X</Node>
<Node Cat="noun" morphId="010280050021" Unicode="יִצְחָק֙" nodeId="010280050060040">YIC:XFQ03</Node>
<Node Cat="om" morphId="010280050031" Unicode="אֶֽת־" nodeId="010280050100020">)E75T-</Node>
<Node Cat="noun" morphId="010280050041" Unicode="יַעֲקֹ֔ב" nodeId="010280050120040">YA(:AQO80B</Node>
<Node Cat="cj" morphId="010280050051" Unicode="וַ" nodeId="010280050160010">WA</Node>
<Node Cat="verb" morphId="010280050052" Unicode="יֵּ֖לֶךְ" nodeId="010280050170030">Y."73LEK:</Node>
<Node Cat="prep" morphId="010280050071" Unicode="אֶל־" nodeId="010280050270020">)EL-</Node>
<Node Cat="noun" morphId="010280050081" Unicode="לָבָ֤ן" nodeId="010280050290030">LFBF70N</Node>
<Node Cat="noun" morphId="010280050091" Unicode="בֶּן־" nodeId="010280050320020">B.EN-</Node>
<Node Cat="noun" morphId="010280050101" Unicode="בְּתוּאֵל֙" nodeId="010280050340050">B.:TW.)"L03</Node>
<Node Cat="art" morphId="010280050111" Unicode="הָֽ" nodeId="010280050390010">HF75</Node>
<Node Cat="noun" morphId="010280050112" Unicode="אֲרַמִּ֔י" nodeId="010280050400040">):ARAM.I80Y</Node>
<Node Cat="noun" morphId="010280050121" Unicode="אֲחִ֣י" nodeId="010280050440030">):AXI74Y</Node>
<Node Cat="noun" morphId="010280050131" Unicode="רִבְקָ֔ה" nodeId="010280050470040">RIB:QF80H</Node>
<Node Cat="noun" morphId="010280050141" Unicode="אֵ֥ם" nodeId="010280050510020">)"71M</Node>
<Node Cat="noun" morphId="010280050151" Unicode="יַעֲקֹ֖ב" nodeId="010280050530040">YA(:AQO73B</Node>
<Node Cat="cj" morphId="010280050161" Unicode="וְ" nodeId="010280050570010">W:</Node>
<Node Cat="noun" morphId="010280050162" Unicode="׃עֵשָֽׂו" nodeId="010280050580030">("&F75W00</Node>
<Node Cat="cj" morphId="010280060011" Unicode="וַ" nodeId="010280060010010">WA</Node>
<Node Cat="verb" morphId="010280060012" Unicode="יַּ֣רְא" nodeId="010280060020030">Y.A74R:)</Node>
<Node Cat="noun" morphId="010280060021" Unicode="עֵשָׂ֗ו" nodeId="010280060050030">("&F81W</Node>
<Node Cat="cj" morphId="010280060031" Unicode="כִּֽי־" nodeId="010280060080020">K.I75Y-</Node>
<Node Cat="verb" morphId="010280060041" Unicode="בֵרַ֣ךְ" nodeId="010280060100030">B"RA74K:</Node>
<Node Cat="noun" morphId="010280060051" Unicode="יִצְחָק֮" nodeId="010280060130040">YIC:XFQ02</Node>
<Node Cat="om" morphId="010280060061" Unicode="אֶֽת־" nodeId="010280060170020">)E75T-</Node>
<Node Cat="noun" morphId="010280060071" Unicode="יַעֲקֹב֒" nodeId="010280060190040">YA(:AQOB01</Node>
<Node Cat="cj" morphId="010280060081" Unicode="וְ" nodeId="010280060230010">W:</Node>
<Node Cat="verb" morphId="010280060082" Unicode="שִׁלַּ֤ח" nodeId="010280060240030">$IL.A70X</Node>
<Node Cat="om" morphId="010280060091" Unicode="אֹת" nodeId="010280060270020">)OT</Node>
<Node Cat="pron" morphId="010280060092" Unicode="וֹ֙" nodeId="010280060290010">OW03</Node>
<Node Cat="prep" morphId="010280060111" Unicode="לָ" nodeId="010280060370010">LF</Node>
<Node Cat="verb" morphId="010280060112" Unicode="קַֽחַת־" nodeId="010280060380030">QA75XAT-</Node>
<Node Cat="prep" morphId="010280060121" Unicode="ל" nodeId="010280060410010">L</Node>
<Node Cat="pron" morphId="010280060122" Unicode="֥וֹ" nodeId="010280060420010">O71W</Node>
<Node Cat="prep" morphId="010280060131" Unicode="מִ" nodeId="010280060430010">MI</Node>
<Node Cat="adv" morphId="010280060132" Unicode="שָּׁ֖ם" nodeId="010280060440020">$.F73M</Node>
<Node Cat="noun" morphId="010280060141" Unicode="אִשָּׁ֑ה" nodeId="010280060460030">)I$.F92H</Node>
<Node Cat="prep" morphId="010280060151" Unicode="בְּ" nodeId="010280060490010">B.:</Node>
<Node Cat="verb" morphId="010280060152" Unicode="בָרֲכ" nodeId="010280060500030">BFR:AK</Node>
<Node Cat="pron" morphId="010280060153" Unicode="֣וֹ" nodeId="010280060530010">O74W</Node>
<Node Cat="om" morphId="010280060161" Unicode="אֹת" nodeId="010280060540020">)OT</Node>
<Node Cat="pron" morphId="010280060162" Unicode="֔וֹ" nodeId="010280060560010">O80W</Node>
<Node Cat="cj" morphId="010280060171" Unicode="וַ" nodeId="010280060570010">WA</Node>
<Node Cat="verb" morphId="010280060172" Unicode="יְצַ֤ו" nodeId="010280060580030">Y:CA70W</Node>
<Node Cat="prep" morphId="010280060181" Unicode="עָלָי" nodeId="010280060610030">(FLFY</Node>
<Node Cat="pron" morphId="010280060182" Unicode="ו֙" nodeId="010280060640010">W03</Node>
<Node Cat="prep" morphId="010280060191" Unicode="לֵ" nodeId="010280060650010">L"</Node>
<Node Cat="verb" morphId="010280060192" Unicode="אמֹ֔ר" nodeId="010280060660030">)MO80R</Node>
<Node Cat="adv" morphId="010280060201" Unicode="לֹֽא־" nodeId="010280060690020">LO75)-</Node>
<Node Cat="verb" morphId="010280060211" Unicode="תִקַּ֥ח" nodeId="010280060710030">TIQ.A71X</Node>
<Node Cat="noun" morphId="010280060221" Unicode="אִשָּׁ֖ה" nodeId="010280060740030">)I$.F73H</Node>
<Node Cat="prep" morphId="010280060231" Unicode="מִ" nodeId="010280060770010">MI</Node>
<Node Cat="noun" morphId="010280060232" Unicode="בְּנ֥וֹת" nodeId="010280060780040">B.:NO71WT</Node>
<Node Cat="noun" morphId="010280060241" Unicode="׃כְּנָֽעַן" nodeId="010280060820040">K.:NF75(AN00</Node>
<Node Cat="cj" morphId="010280070011" Unicode="וַ" nodeId="010280070010010">WA</Node>
<Node Cat="verb" morphId="010280070012" Unicode="יִּשְׁמַ֣ע" nodeId="010280070020040">Y.I$:MA74(</Node>
<Node Cat="noun" morphId="010280070021" Unicode="יַעֲקֹ֔ב" nodeId="010280070060040">YA(:AQO80B</Node>
<Node Cat="prep" morphId="010280070031" Unicode="אֶל־" nodeId="010280070100020">)EL-</Node>
<Node Cat="noun" morphId="010280070041" Unicode="אָבִ֖י" nodeId="010280070120030">)FBI73Y</Node>
<Node Cat="pron" morphId="010280070042" Unicode="ו" nodeId="010280070150010">W</Node>
<Node Cat="cj" morphId="010280070051" Unicode="וְ" nodeId="010280070160010">W:</Node>
<Node Cat="prep" morphId="010280070052" Unicode="אֶל־" nodeId="010280070170020">)EL-</Node>
<Node Cat="noun" morphId="010280070061" Unicode="אִמּ" nodeId="010280070190020">)IM.</Node>
<Node Cat="pron" morphId="010280070062" Unicode="֑וֹ" nodeId="010280070210010">O92W</Node>
<Node Cat="cj" morphId="010280070071" Unicode="וַ" nodeId="010280070220010">WA</Node>
<Node Cat="verb" morphId="010280070072" Unicode="יֵּ֖לֶךְ" nodeId="010280070230030">Y."73LEK:</Node>
<Node Cat="cj" morphId="060190130011" Unicode="וּ" nodeId="060190130010010">W.</Node>
<Node Cat="prep" morphId="060190130012" Unicode="מִ" nodeId="060190130020010">MI</Node>
<Node Cat="adv" morphId="060190130013" Unicode="שָּׁ֤ם" nodeId="060190130030020">$.F70M</Node>
<Node Cat="verb" morphId="060190130021" Unicode="עָבַר֙" nodeId="060190130050030">(FBAR03</Node>
<Node Cat="adv" morphId="060190130031" Unicode="קֵ֣דְמ" nodeId="060190130080030">Q"74D:M</Node>
<Node Cat="x" morphId="060190130032" Unicode="ָה" nodeId="060190130110010">FH</Node>
<Node Cat="noun" morphId="060190130041" Unicode="מִזְרָ֔ח" nodeId="060190130120040">MIZ:RF80X</Node>
<Node Cat="x" morphId="060190130042" Unicode="ָה" nodeId="060190130160010">FH</Node>
<Node Cat="cj" morphId="060190130071" Unicode="וְ" nodeId="060190130300010">W:</Node>
<Node Cat="verb" morphId="060190130072" Unicode="יָצָ֛א" nodeId="060190130310030">YFCF91)</Node>
<Node Cat="noun" morphId="060190130081" Unicode="רִמּ֥וֹן" nodeId="060190130340040">RIM.O71WN</Node>
<Node Cat="art" morphId="060190130091" Unicode="הַ" nodeId="060190130380010">HA</Node>
<Node Cat="verb" morphId="060190130092" Unicode="מְּתֹאָ֖ר" nodeId="060190130390040">M.:TO)F73R</Node>
<Node Cat="art" morphId="060190130101" Unicode="הַ" nodeId="060190130430010">HA</Node>
<Node Cat="noun" morphId="060190130102" Unicode="׃נֵּעָֽה" nodeId="060190130440030">N."(F75H00</Node>
We could either consistently use <pc> nodes or consistently use @after. The latter option seems preferable, since punctuation characters are not part of the syntactic tree, but either way should be fine.
Word-level identifiers should use ! syntax for now, pending any updates on this issue.
Numeric attributes in non-leaf nodes need to be recomputed now that we have rewritten the trees.
We need ids for elements in Hebrew. They need to be unique, i.e. not the same as an id for a morph.
Sometimes we need annotations to refer to phrases or clauses. How can we create stable identifiers for structures larger than individual words?
I created a Jupyter notebook (here: 4d1ff81) to test the integrity of our word-level text content by comparing the nodes trees to the lowfat trees, because I found that the two trees contain different numbers of @xml:id values.
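The core of that comparison can be sketched as a set difference over `xml:id` values. This is not the notebook's code, just an illustration under the assumption that both trees carry `xml:id` on their word-level elements. The second id in the sample is made up.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def xml_ids(xml_text):
    """Collect all xml:id values found anywhere in the document."""
    root = ET.fromstring(xml_text)
    return {el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None}


# Inline stand-ins for a nodes tree and a lowfat tree.
nodes_tree = """<Node>
  <Node><m xml:id="o010380210061">a</m></Node>
  <Node><m xml:id="o010380210071">b</m></Node>
</Node>"""
lowfat_tree = """<wg><w xml:id="o010380210061">a</w></wg>"""

only_in_nodes = xml_ids(nodes_tree) - xml_ids(lowfat_tree)
only_in_lowfat = xml_ids(lowfat_tree) - xml_ids(nodes_tree)
print(sorted(only_in_nodes), sorted(only_in_lowfat))
```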
The current issue seems to pertain to particles only, for example:
<Node Cat="P" Rule="ptcl2P" Head="0" nodeId="0103802100610011">
<Node n="o010380210061"
Cat="ptcl"
morphId="010380210061"
Unicode="אַיֵּ֧ה"
nodeId="0103802100610010"
StrongNumberX="0346"
Greek="ποῦ"
GreekStrong="4226">
<m word="GEN 38:21!6"
xml:id="o010380210061"
lang="H"
after=" "
lemma="346"
morph="Ti"
pos="particle"
type="interrogative"
english="where"
mandarin="哪里"
Domain="003002004"
SDBH="000321001001000">אַיֵּ֧ה</m>
</Node>
</Node>
We need to document semantic domains for our data. They appear in the @domain, @extends, and @sdbh attributes:
<w ref="GEN 1:1!1"
xml:id="o010010010012"
mandarin="起初"
english="beginning"
domain="002003003004"
sdbh="006652001001000"
greek="ἀρξῇ"
strongnumberx="7225"
class="noun"
unicode="רֵאשִׁ֖ית"
morph="Ncfsa"
lang="H"
lemma="7225"
pos="noun"
gender="feminine"
number="singular"
state="absolute"
after=" ">רֵאשִׁ֖ית</w>
@domain values are documented in this file:
@sdbh contains the identifier for the lexicon entry itself in the Semantic Dictionary of Biblical Hebrew.
In writing this, I realize that I don't understand @extends as well as I thought. It is used, e.g., when the word for Spirit is based on a metaphor involving breath:
<w ref="GEN 1:2!9"
xml:id="o010010020092"
mandarin="灵"
english="Spirit"
domain="001001001"
extends="001006001"
sdbh="006717001001000 006717001005000"
greek="πνεῦμα"
strongnumberx="7307"
class="noun"
unicode="ר֣וּחַ"
morph="Ncbsc"
lang="H"
lemma="7307"
pos="noun"
gender="both"
number="singular"
state="construct"
after=" ">ר֣וּחַ</w>
I had thought it was used precisely when a domain was derived metaphorically from another, e.g. "Spirit" as an extension of the concept of "breath". I am not sure this is an adequate explanation - I'll need to ask Reinier for more information on this.
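When documenting these attributes, it may help to show how a consumer would read them. The sketch below assumes that an attribute can hold several space-separated codes (as `@sdbh` does above) and that domain codes are hierarchical in three-digit groups. The second point is a guess based on codes such as `001`, `001003`, and `001006006`, and is not confirmed here. Both function names are hypothetical.

```python
def split_codes(value):
    """Split a space-separated attribute value into a list of codes."""
    return value.split() if value else []


def ancestors(code):
    """Return the code's prefixes in three-digit steps (assumed hierarchy)."""
    return [code[:i] for i in range(3, len(code) + 1, 3)]


# Attribute values taken from the GEN 1:2!9 example above.
word = {
    "domain": "001001001",
    "extends": "001006001",
    "sdbh": "006717001001000 006717001005000",
}

sdbh_ids = split_codes(word["sdbh"])
extends_path = ancestors(word["extends"])
print(sdbh_ids)
print(extends_path)
```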
See Clear-Bible/macula-greek#21 for the same issue for Macula-Greek.
In Macula-Greek, we created two attributes:
domain="SubDomainNumber:DomainName:SubDomainName"
sdbg="Lemma:EntryCode:Glosses"
For the OT, we can't do the same:
EntryCode or any other unique identifier that points to a lexicon
The domains are extracted from SDBH-DOMAINS1.XML and SDBH-DOMAINS2.XML.
There are two sources that contain the entire lexicon:
- SDBH-01.XML through SDBH-23.XML, which contain up-to-date full entries (multiple languages).
- SDBH-EXPORT-en.XML, which contains all English data from the full lexicon.
However, the first (most up-to-date) source has very minimal data on the domain. It only contains a category and a label for the lex domain and the core domain:
<LEXDomains>
<LEXDomain>Parts: Vegetation</LEXDomain>
</LEXDomains>
<LEXCoreDomains>
<LEXCoreDomain>Plant</LEXCoreDomain>
</LEXCoreDomains>
SDBH-DOMAINS.XML
Conclusion: there is no way to find out to which specific domain they actually belong
The second source contains more information about the domain:
<LEXDomains>
<LEXDomain>001003:Vegetation</LEXDomain>
</LEXDomains>
<LEXCoreDomains>
<LEXCoreDomain>117:Plant</LEXCoreDomain>
</LEXCoreDomains>
There are a few more issues with the domain data:
001, 002, 003, and 004 occur twice in the domains file, so they are not unique. The second occurrence is very general and does not have subdomains. However, this should not be a problem:
001:Objects
['Valuable > Objects>001:Objects',
'Valuable > Objects>001:Objects',
'Speed > Move>002002001009:Move',
'Shake > Non-Exist>002002002006:Non-Exist',
'Shake > Move>002002001009:Move',
'Shake > Afraid>002001002001:Afraid',
'Shake > Afraid>002001002001:Afraid',
'Valuable > Objects>001:Objects',
'Shake > Afraid>002001002001:Afraid',
'Shake > Afraid>002001002001:Afraid',
'Shake > Afraid>002001002001:Afraid',
'Shake > Afraid>002001002001:Afraid',
'Intensity > Respect>002001002025:Respect',
'Shake > Afraid>002001002001:Afraid',
'Shake > Shame>002001002026:Shame',
'Shake > Musical Instruments>001004001013:Musical Instruments',
'Mix > Engage>002003003009:Engage',
'Valuable > Objects>001:Objects',
'Shake > Afraid>002001002001:Afraid',
'Shake > Move>002002001009:Move',
'Shake > Relief>002001002024:Relief',
'Shake > Shame>002001002026:Shame',
'Valuable > Objects>001:Objects',
'Valuable > Objects>001:Objects',
'Shake > Ready>002001002022:Ready',
'Shake > Distress>002001002007:Distress',
'Body > People>001001002003:People']
SDBH-EXPORT-en.XML
Valuable > Objects>001:Objects => 001:Objects
<m domain="DomainID:DomainLabel:SubDomainLabel:..:SubDomainLabel:CoreDomainId:CoreDomainLabel" />
<m domain="001003:Objects:Vegetation:117:Plant"/>
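To make the proposed format concrete, here is a small sketch that splits such a domain value into its parts and joins it back. It assumes that labels never contain a colon and that every value carries at least one label plus the core domain id and label; the `DomainValue` class and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DomainValue:
    domain_id: str      # e.g. "001003"
    labels: List[str]   # domain label followed by subdomain labels
    core_id: str        # e.g. "117"
    core_label: str     # e.g. "Plant"


def parse_domain(value: str) -> DomainValue:
    """Split 'DomainID:Label:..:CoreDomainId:CoreDomainLabel' into fields."""
    parts = value.split(":")
    if len(parts) < 4:
        raise ValueError(f"expected at least 4 colon-separated fields: {value!r}")
    # First field is the id, the last two describe the core domain,
    # everything in between is the chain of (sub)domain labels.
    return DomainValue(parts[0], parts[1:-2], parts[-2], parts[-1])


def format_domain(d: DomainValue) -> str:
    """Inverse of parse_domain."""
    return ":".join([d.domain_id, *d.labels, d.core_id, d.core_label])


example = parse_domain("001003:Objects:Vegetation:117:Plant")
print(example)
```

If a label ever does contain a colon, a different separator or escaping rule would be needed, because the label chain has variable length.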
The Nodes representation contains attributes that aren't in Lowfat:
Perhaps others.
<m word="GEN 1:1!1" xml:id="o010010010012" morph="Ncfsa" lang="H" lemma="7225" after=" " pos="noun" type="common" gender="feminine" number="singular" state="absolute" english="beginning" mandarin="起初" SDBH="רֵאשִׁית:002003003004:Begin:beginning">רֵאשִׁית</m>
We have relied on the OSHB text without systematically comparing it to the text found at tanach.us.
We should do a comparison, categorize any differences we find, and discuss them. This should not be a one-off: we need tooling for it, and we need that tooling for both:
This may include differences in choice of Unicode characters for a particular character, differences in cantillation, etc.
So far, we have relied on consonant-only comparisons, which are good enough to identify corresponding morphemes. We may not know exactly what to do with the things we learn by looking at every character until we see what the differences are and think about them systematically.
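As a starting point for such tooling, here is a rough sketch that puts a pair of corresponding word forms into a coarse difference category. The category names are made up for illustration, the Unicode ranges are assumptions about what counts as a consonant (U+05D0 to U+05EA) and as cantillation (U+0591 to U+05AF), and the pairing of words between the two texts is assumed to have been done already.

```python
import unicodedata


def consonants_only(text: str) -> str:
    """Keep only the Hebrew letters U+05D0..U+05EA."""
    return "".join(ch for ch in text if "\u05d0" <= ch <= "\u05ea")


def without_accents(text: str) -> str:
    """Drop Hebrew cantillation marks (U+0591..U+05AF)."""
    return "".join(ch for ch in text if not "\u0591" <= ch <= "\u05af")


def classify_difference(a: str, b: str) -> str:
    """Return a coarse category for how two word forms differ."""
    if a == b:
        return "identical"
    # NFD puts combining marks into canonical order, so two strings that
    # differ only in the order of their marks become equal.
    na, nb = (unicodedata.normalize("NFD", s) for s in (a, b))
    if na == nb:
        return "mark-order"
    if without_accents(na) == without_accents(nb):
        return "cantillation"
    if consonants_only(a) == consonants_only(b):
        return "pointing-or-punctuation"
    return "consonants"


# The word from GEN 1:1!1 with and without its accent, written as escapes
# so that the exact code points are unambiguous.
with_accent = "\u05e8\u05b5\u05d0\u05e9\u05c1\u05b4\u0596\u05d9\u05ea"
plain = "\u05e8\u05b5\u05d0\u05e9\u05b4\u05c1\u05d9\u05ea"
print(classify_difference(with_accent, plain))
```

Counting these categories over the whole text would give a first overview before looking at individual cases.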
(LexDomain, ContextualDomain, CoreDomain) from node trees
(gloss and transliteration) from node trees
This has a high priority. @ryderwishart, could you do this?
Originally posted by @klosoter in https://github.com/Clear-Bible/symphony-team/discussions/75#discussioncomment-3231622
This doesn't seem to be in the tree:
010040220061 תּ֣וּבַל קַ֔יִן Tubal Tubal-cain 土八该隐
We currently have @ID on each Sentence. It would make a lot of sense to simply hard-code the OSIS ref for each verse directly into our data since we use OSIS refs elsewhere.
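One possible way to derive a verse-level OSIS ref is from the word identifiers already in the data. The sketch below assumes the layout seen in ids such as o010010010012 (two digits for the book, three for the chapter, three for the verse, then word and part); the book table is hypothetical and deliberately partial, and the format of the Sentence @ID itself is not assumed here.

```python
# Hypothetical, deliberately partial lookup from two-digit book numbers to
# OSIS book abbreviations. A real table would cover every book.
OSIS_BOOKS = {"01": "Gen", "02": "Exod"}


def osis_verse_ref(word_id: str) -> str:
    """Turn an id like 'o010040220061' or '010040220061' into 'Gen.4.22'."""
    # Drop the optional leading 'o' and keep book + chapter + verse digits.
    digits = word_id.lstrip("o")[:8]
    if len(digits) < 8 or not digits.isdigit():
        raise ValueError(f"unexpected id format: {word_id!r}")
    book = digits[:2]
    chapter = int(digits[2:5])
    verse = int(digits[5:8])
    return f"{OSIS_BOOKS[book]}.{chapter}.{verse}"


print(osis_verse_ref("o010010010012"))
print(osis_verse_ref("010040220061"))
```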
Participant referent data:
SubjRef only on verbs with implied subjects; format SubjRef="{010010310021}"
Ref only on nouns, pronouns, or adjectives usually; format Ref="{010010120082}"
LXX mapping:
GreekStrong="1722" Greek="e)poi/hsen"
Semantic Roles:
Frame="{A0:010010310021; A1:010010310041;}"
Word Sense data:
SenseNumber="2" (SenseNumber="0" means that it is a function word for which we did no word sense disambiguation)
Glosses:
English and Chinese glosses in the full trees cannot be used. Mike has Andi's automatically calculated glosses for English and Chinese mapped for YTB & ClearSuite, and we should be able to use those.
Object Complements:
There are actually two types of nodes that are currently labeled as O2 (second object). Some have the attribute Label="OC" in the original trees, meaning that the node is an object complement rather than strictly a second object. This Label="OC" data needs to be ported over and used to convert the relevant O2s into OCs.
Vocatives:
Attribute Vocative="True" (ignore all Vocative="False")
For comparison & checking purposes at some point later in the process:
Compare the StrongNumber and StrongNumberX work Clear did against OSHB's values (in the vast majority of cases they will be identical, so there is usually no need to add these redundant values, but a comparison may show some places where OSHB should be double-checked).
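For porting, the brace-delimited values above can be unpacked with very little code. This is a sketch based only on the formats quoted in this issue; the function names are hypothetical, and whether a Ref can hold more than one id or a Frame can repeat a role is an open assumption, which is why both functions return lists.

```python
import re
from typing import List, Tuple


def parse_ref(value: str) -> List[str]:
    """'{010010120082}' -> ['010010120082'] (works for Ref and SubjRef)."""
    return re.findall(r"\d+", value)


def parse_frame(value: str) -> List[Tuple[str, str]]:
    """'{A0:010010310021; A1:010010310041;}' -> [('A0', ...), ('A1', ...)]."""
    inner = value.strip().strip("{}")
    pairs = []
    for item in inner.split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty piece after the trailing semicolon
        role, _, target = item.partition(":")
        pairs.append((role.strip(), target.strip()))
    return pairs


print(parse_ref("{010010310021}"))
print(parse_frame("{A0:010010310021; A1:010010310041;}"))
```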
We have Strong's numbers as @lemma, and we should be pulling in the actual lemmas.
Jonathan mentioned something about using Strong's data from our OSHB data.
@jonathanrobie mentioned a while ago that we might also want to add phonetic spelling.
Does anyone know some good resources?
This is more of an epic than an individual issue.
When merging existing syntax trees with morphology, we have probably created instances where the morphological interpretation does not agree with the syntactic interpretation. We need to do a survey of instances where the OSHB morphological interpretation disagrees with the interpretation used to build the original Groves trees by comparing the two morphologies. Most of these differences will probably fall into categories that we can enumerate. This will include both differences in interpretation and clear bugs.
This task involves:
Please don't create working directories on the public github. The new-nodes directory should be a branch, not a subdirectory on the main branch.
For example, see the parent Node of @n=020070110041ה.
Some attributes, like SDBH and the Cherith glosses, are on the c elements instead of the m elements. They thus get skipped when building the lowfat.
<Node Cat="noun" morphId="010040220061" Unicode="תּ֣וּבַל קַ֔יִן" nodeId="0100402200610010" StrongNumberX="8423">
<c english="Tubal-cain" mandarin="土八该隐" SDBH="תּוּבַל קַיִן:003001007:Names of People:Tubal-Cain">
<m word="GEN 4:22!6" xml:id="o010040220061" lang="H" after=" " lemma="8423+" morph="Np" pos="noun" type="proper">תּ֣וּבַל</m>
<m word="GEN 4:22!7" xml:id="o010040220071" lang="H" after=" " lemma="8423" morph="Np" pos="noun" type="proper">קַ֔יִן</m>
</c>
</Node>
<wg class="compound">
<w ref="GEN 4:22!6" role="noun" xml:id="o010040220061" StrongNumberX="8423" Cat="noun" Unicode="תּ֣וּבַל קַ֔יִן" morph="Np" lang="H" lemma="8423+" pos="noun" after=" ">תּ֣וּבַל</w>
<w ref="GEN 4:22!7" role="noun" xml:id="o010040220071" StrongNumberX="8423" Cat="noun" Unicode="תּ֣וּבַל קַ֔יִן" morph="Np" lang="H" lemma="8423" pos="noun" after=" ">קַ֔יִן</w>
</wg>
I think if we want to keep them, it makes sense to put them on wg/w like this:
<wg class="compound" english="Tubal-cain" mandarin="土八该隐" SDBH="תּוּבַל קַיִן:003001007:Names of People:Tubal-Cain">
<w ref="GEN 4:22!6" role="noun" xml:id="o010040220061" StrongNumberX="8423" Cat="noun" Unicode="תּ֣וּבַל קַ֔יִן" morph="Np" lang="H" lemma="8423+" pos="noun" after=" ">תּ֣וּבַל</w>
<w ref="GEN 4:22!7" role="noun" xml:id="o010040220071" StrongNumberX="8423" Cat="noun" Unicode="תּ֣וּבַל קַ֔יִן" morph="Np" lang="H" lemma="8423" pos="noun" after=" ">קַ֔יִן</w>
</wg>
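A rough Python sketch of that idea, assuming the Node/c/m shape shown above: the attributes of the c element are copied onto a new wg element, and each m becomes a w. The `compound_to_wg` helper is hypothetical, the sample is trimmed, and the actual lowfat conversion may be implemented quite differently.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Node shown above (fewer attributes, same structure).
NODE_XML = """<Node Cat="noun" morphId="010040220061" StrongNumberX="8423">
  <c english="Tubal-cain" mandarin="土八该隐">
    <m word="GEN 4:22!6" lemma="8423+" morph="Np" pos="noun">תּ֣וּבַל</m>
    <m word="GEN 4:22!7" lemma="8423" morph="Np" pos="noun">קַ֔יִן</m>
  </c>
</Node>"""


def compound_to_wg(node: ET.Element) -> ET.Element:
    """Build <wg class="compound"> from a Node whose child is a <c>."""
    c = node.find("c")
    wg = ET.Element("wg", {"class": "compound"})
    # Carry the attributes that would otherwise be skipped (glosses, SDBH).
    wg.attrib.update(c.attrib)
    for m in c.findall("m"):
        w = ET.SubElement(wg, "w", dict(m.attrib))
        # Lowfat uses @ref where the node trees use @word.
        w.set("ref", w.attrib.pop("word"))
        w.text = m.text
    return wg


wg = compound_to_wg(ET.fromstring(NODE_XML))
print(ET.tostring(wg, encoding="unicode"))
```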