clear-bible / macula-hebrew

Syntax trees, morphology, and linguistic annotations for the Hebrew Bible

License: Other

CSS 4.89% XQuery 21.08% XSLT 20.96% Jupyter Notebook 19.81% Python 33.26%

macula-hebrew's Issues

“Lemma” attribute in OSHB is really a pointer into an index that Open Scriptures created & need to pull in the actual lemma data

Ulrik's original proposal (which you can modify as needed) makes this need clear:

The OSHB trees are based on the Open Scriptures morphhb Hebrew Bible data. This data contains a “Lemma” attribute that is really a pointer into an index that Open Scriptures created. Each index entry ties together:

An index into their version of the Brown Driver Briggs lexicon (BDB)
The BDB lemma, with transliteration
A gloss (…)
A part of speech
The Strong’s number
A sub-division (aug) for the Strong’s number.
TWOT number
Information about the etymology as described by BDB

We should have the following:

The id of the LexicalIndex entry. Call it LexicalIndexID
The Hebrew BDB lemma (inside …). Call it BDBLemma
The transliterated Hebrew BDB lemma () Call it BDBLemmaXLit
BDB entry index () Call it BDBId
The gloss. Call it OpenScripturesGloss

In addition, what is now the “Lemma” attribute should be renamed to “AugmentedStrongs”.
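
A minimal sketch of the proposed attribute change, assuming the index has already been loaded into a dictionary keyed by the current “Lemma” value. The layout of the real index is not shown in this issue, so the dictionary, its key, and every value in it are placeholders for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory index keyed by the current "Lemma" value.
# All values are placeholders, not real index data.
LEXICAL_INDEX = {
    "1254 a": {
        "LexicalIndexID": "example-index-id",
        "BDBLemma": "example-hebrew-lemma",
        "BDBLemmaXLit": "example-transliteration",
        "BDBId": "example-bdb-id",
        "OpenScripturesGloss": "example-gloss",
    },
}


def enrich(node, index):
    """Rename Lemma to AugmentedStrongs and copy the index data onto the node."""
    pointer = node.attrib.pop("Lemma", None)
    if pointer is None:
        return node
    node.set("AugmentedStrongs", pointer)
    # Nodes whose pointer is missing from the index only get the rename.
    for name, value in index.get(pointer, {}).items():
        node.set(name, value)
    return node


node = enrich(ET.fromstring('<Node Lemma="1254 a"/>'), LEXICAL_INDEX)
print(node.attrib)
```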

Create mapping for Groves / MARBLE / MACULA

Create a mapping between Groves, MARBLE, and MACULA at:

The word level, and
The morpheme level

Recreating annotations.xml

I've created a new annotations.xml file, this time from the full trees. When I tried to change the morphIds for issues #13 and #14, it turned out that our mapping (between the old trees and macula) does not suffice for the annotations file: the morphIds in the annotations file do not align with those in the full trees. It is not clear where they come from (a different version of the full trees?), and the original file might be lost, so it is probably best to recreate the file from a source that is unlikely to be changed or deleted (the full trees in trees-oshb).

So what I've done is:

1) created empty 'node' elements from our new trees (macula)
2) added the macula morphId
3) added macula text, full tree morphId, and full tree text (Unicode) for clarity/comparison reasons
4) used the mapping between full trees and our macula trees to add the desired attributes from the full trees
5) applied the usual conversions: Greek to Unicode, updating Frame, Ref and SubjRef ids, but see below.

However, there are at least 2 issues:

Regarding step 5), I'm not sure what to convert the old (full tree) morphIds into: the OSHB ns or the new (macula) morphIds? I've left both options open for now.
There are cases where macula nodes do not correspond one to one to full trees' nodes. The following patterns occur:
1 macula: 2+ full tree
2 macula: 2 full tree
2 macula: 1 full tree
Not all macula nodes (mostly implied articles) correspond to nodes from the full tree, so they won't have any 'annotations'.

We should probably review these cases manually. These cases can be retrieved easily, because I added a @duplicate attribute to the node element. These are all such cases:

<node morphId="010380240022" macula-text="מִ" duplicate="2macula-1full" OLDmorphId="010380240022" StrongNumberX="7969a" Unicode="מִשְׁלֹ֣שׁ"/>
<node morphId="010380240023" macula-text="ִשְׁלֹ֣שׁ" duplicate="2macula-1full" OLDmorphId="010380240022" StrongNumberX="7969a" Unicode="מִשְׁלֹ֣שׁ"/>
<node morphId="010430200021" macula-text="בִּ֣י" duplicate="1macula-2full" OLDmorphId="010430200021|010430200022" StrongNumberX="0994" Unicode="בּ|ִ֣י" Ref="{010430180022}" Greek="δεόμεθα" GreekStrong="1189"/>
<node morphId="010440180051" macula-text="בִּ֣י" duplicate="1macula-2full" OLDmorphId="010440180051|010440180052" StrongNumberX="0994" Unicode="בּ|ִ֣י" Ref="{010440180031}" Greek="δέομαι" GreekStrong="1189"/>
<node morphId="020040100051" macula-text="בִּ֣י" duplicate="1macula-2full" OLDmorphId="020040100051|020040100052" StrongNumberX="0994" Unicode="בּ|ִ֣י" Ref="{020040100021}" Greek="δέομαι" GreekStrong="1189"/>
<node morphId="020040130021" macula-text="בִּ֣יּ" duplicate="1macula-2full" OLDmorphId="020040130021|020040130022" StrongNumberX="0994" Unicode="בּ|ִ֣י" Ref="{020040100021}" Greek="δέομαι" GreekStrong="1189"/>
<node morphId="040120110051" macula-text="בִּ֣י" duplicate="1macula-2full" OLDmorphId="040120110051|040120110052" StrongNumberX="0994" Unicode="בּ|ִ֣י" Ref="{040120110021}" Greek="δέομαι" GreekStrong="1189"/>
<node morphId="060070080011" macula-text="בִּ֖י" duplicate="1macula-2full" OLDmorphId="060070080011|060070080012" StrongNumberX="0994" Unicode="בּ|ִ֖י" Ref="{060070070021}"/>
<node morphId="070060130041" macula-text="בִּ֣י" duplicate="1macula-2full" OLDmorphId="070060130041|070060130042" StrongNumberX="0994" Unicode="בּ|ִ֣י" Greek="ἐν|ἐμοί" GreekStrong="1722|1698" Ref="{070060130031}"/>
<node morphId="070060150031" macula-text="בִּ֣י" duplicate="1macula-2full" OLDmorphId="070060150031|070060150032" StrongNumberX="0994" Unicode="בּ|ִ֣י" Greek="ἐν|ἐμοί" GreekStrong="1722|1698" Ref="{070060130031}"/>
<node morphId="070090410032" macula-text="ארוּמָ֑ה" duplicate="1macula-2full" OLDmorphId="070090410032|070090410033" StrongNumberX="1886a|0725" Unicode="ארוּמָ֑ה" Greek="αρημα"/>
<node morphId="070130080061" macula-text="בִּ֣י" duplicate="1macula-2full" OLDmorphId="070130080061|070130080062" StrongNumberX="0994" Unicode="בּ|ִ֣י" Greek="ἐν|ἐμοί" GreekStrong="1722|1698" Ref="{070130080021}"/>
<node morphId="070190130041" macula-text="וְ" duplicate="1macula-2full" OLDmorphId="070190130032|070190130041" StrongNumberX="1886j|2050b" Unicode="ָ֥|וְ" Greek="καὶ" GreekStrong="2532"/>
<node morphId="090010260021" macula-text="בִּ֣י" duplicate="1macula-2full" OLDmorphId="090010260021|090010260022" StrongNumberX="0994" Unicode="בּ|ִ֣י" Greek="ἐν|ἐμοί" GreekStrong="1722|1698" Ref="{090010230182}"/>
<node morphId="110030170041" macula-text="בִּ֣י" duplicate="1macula-2full" OLDmorphId="110030170041|110030170042" StrongNumberX="0994" Unicode="בּ|ִ֣י" Greek="ἐν|ἐμοί" GreekStrong="1722|1698" Vocative="True" Ref="{110030170022}"/>
<node morphId="110030260141" macula-text="בִּ֣י" duplicate="1macula-2full" OLDmorphId="110030260141|110030260142" StrongNumberX="0994" Unicode="בּ|ִ֣י" Greek="ἐν|ἐμοί" GreekStrong="1722|1698" Ref="{110030260022}"/>
<node morphId="130020520061" macula-text="הָרֹאֶ֖ה" duplicate="1macula-2full" OLDmorphId="130020520061|130020520062" StrongNumberX="1886a|7204a" Unicode="הָ|רֹאֶ֖ה" SenseNumber="1" Greek="αραα"/>
<node morphId="130050090041" macula-text="לְ" duplicate="2macula-1full" OLDmorphId="130050090041" StrongNumberX="3820b" Unicode="לְב֣וֹא" Greek="ἐρξομένων" GreekStrong="2064"/>
<node morphId="130050090042" macula-text="ב֣וֹא" duplicate="2macula-1full" OLDmorphId="130050090041" StrongNumberX="3820b" Unicode="לְב֣וֹא" Greek="ἐρξομένων" GreekStrong="2064"/>
<node morphId="130200040021" macula-text="אַחֲרֵי" duplicate="2macula-1full" OLDmorphId="130200040021" StrongNumberX="0310" Unicode="אַחֲרֵיכֵ֔ן"/>
<node morphId="130200040022" macula-text="כֵ֔ן" duplicate="2macula-1full" OLDmorphId="130200040021" StrongNumberX="0310" Unicode="אַחֲרֵיכֵ֔ן"/>
<node morphId="140200010021" macula-text="אַֽחֲרֵי" duplicate="2macula-1full" OLDmorphId="140200010021" StrongNumberX="0310" Unicode="אַֽחֲרֵיכֵ֡ן"/>
<node morphId="140200010022" macula-text="כֵ֡ן" duplicate="2macula-1full" OLDmorphId="140200010021" StrongNumberX="0310" Unicode="אַֽחֲרֵיכֵ֡ן"/>
<node morphId="140200350012" macula-text="אַחֲרֵי" duplicate="2macula-1full" OLDmorphId="140200350012" StrongNumberX="0310" Unicode="אַחֲרֵיכֵ֗ן"/>
<node morphId="140200350013" macula-text="כֵ֗ן" duplicate="2macula-1full" OLDmorphId="140200350012" StrongNumberX="0310" Unicode="אַחֲרֵיכֵ֗ן"/>
<node morphId="140240040021" macula-text="אַחֲרֵי" duplicate="2macula-1full" OLDmorphId="140240040021" StrongNumberX="0310" Unicode="אַחֲרֵיכֵ֑ן"/>
<node morphId="140240040022" macula-text="כֵ֑ן" duplicate="2macula-1full" OLDmorphId="140240040021" StrongNumberX="0310" Unicode="אַחֲרֵיכֵ֑ן"/>
<node morphId="140260080081" macula-text="לְ" duplicate="2macula-1full" OLDmorphId="140260080081" StrongNumberX="3820b" Unicode="לְב֣וֹא" Greek="εἰσόδου" GreekStrong="1529"/>
<node morphId="140260080082" macula-text="ב֣וֹא" duplicate="2macula-1full" OLDmorphId="140260080081" StrongNumberX="3820b" Unicode="לְב֣וֹא" Greek="εἰσόδου" GreekStrong="1529"/>
<node morphId="150030050012" macula-text="אַחֲרֵי" duplicate="2macula-1full" OLDmorphId="150030050012" StrongNumberX="0310" Unicode="אַחֲרֵיכֵ֞ן"/>
<node morphId="150030050013" macula-text="כֵ֞ן" duplicate="2macula-1full" OLDmorphId="150030050012" StrongNumberX="0310" Unicode="אַחֲרֵיכֵ֞ן"/>
<node morphId="150040090091" macula-text="דִּ֠ינָי" duplicate="2macula-1full" OLDmorphId="150040090091" StrongNumberX="1784" Unicode="דִּ֠ינָיֵא" Greek="διναῖοι"/>
<node morphId="150040090092" macula-text="ֵא" duplicate="2macula-1full" OLDmorphId="150040090091" StrongNumberX="1784" Unicode="דִּ֠ינָיֵא" Greek="διναῖοι"/>
<node morphId="220080060181" macula-text="שַׁלְהֶ֥בֶתְ" duplicate="2macula-1full" OLDmorphId="220080060181" StrongNumberX="7957a" SenseNumber="1" Unicode="שַׁלְהֶ֥בֶתְיָֽה"/>
<node morphId="220080060182" macula-text="יָֽה" duplicate="2macula-1full" OLDmorphId="220080060181" StrongNumberX="7957a" SenseNumber="1" Unicode="שַׁלְהֶ֥בֶתְיָֽה"/>
<node morphId="380020130061" macula-text="עֲלֵי" duplicate="2macula-2full" OLDmorphId="380020130061|380020130062" StrongNumberX="5921|3963a" Unicode="עֲל|ֵיהֶ֔ם" Greek="ἐπ’|αὐτούς" GreekStrong="1909|848" Ref="{380020120102}"/>
<node morphId="380020130062" macula-text="הֶ֔ם" duplicate="2macula-2full" OLDmorphId="380020130061|380020130062" StrongNumberX="5921|3963a" Unicode="עֲל|ֵיהֶ֔ם" Greek="ἐπ’|αὐτούς" GreekStrong="1909|848" Ref="{380020120102}"/>
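
A short sketch of how the flagged cases could be pulled out for manual review, grouped by their @duplicate pattern. The root element name and the last node in the sample are assumptions for illustration; the other nodes are abbreviated from the list above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def group_duplicates(root):
    """Map each @duplicate pattern to the morphIds of the nodes carrying it."""
    groups = defaultdict(list)
    for node in root.iter("node"):
        pattern = node.get("duplicate")
        if pattern is not None:
            groups[pattern].append(node.get("morphId"))
    return dict(groups)


# The <annotations> wrapper and the final node (no @duplicate) are assumed.
sample = """<annotations>
  <node morphId="010380240022" duplicate="2macula-1full" OLDmorphId="010380240022"/>
  <node morphId="010380240023" duplicate="2macula-1full" OLDmorphId="010380240022"/>
  <node morphId="010430200021" duplicate="1macula-2full" OLDmorphId="010430200021|010430200022"/>
  <node morphId="010010010011"/>
</annotations>"""

groups = group_duplicates(ET.fromstring(sample))
for pattern, ids in sorted(groups.items()):
    print(pattern, len(ids), ids)
```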

Numbering of annotations.xml

The morphId column in annotations.xml uses Groves Center numbering. It needs to be updated.

We need to decide whether to apply these annotations at the leaf Node level or at lower levels when we merge into the tree. @rkjtan, any thoughts?

I think this should be orthogonal to prepare-oshb, it should be a separate merge.

Convert Greek beta encoding to Unicode

In annotations.xml, attributes that contain Septuagint Greek are in Beta encoding. They should be converted to Unicode.
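
As a rough sketch of the conversion, the function below maps plain Beta Code letters to Greek and picks the final sigma at word ends. It is deliberately incomplete: accents, breathings, iota subscript, and the capital-letter marker are passed through unchanged, so a real conversion would need a full Beta Code table or an existing converter. The function name is made up for this example.

```python
BETA_TO_GREEK = {
    "a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ",
    "h": "η", "q": "θ", "i": "ι", "k": "κ", "l": "λ", "m": "μ",
    "n": "ν", "c": "ξ", "o": "ο", "p": "π", "r": "ρ", "s": "σ",
    "t": "τ", "u": "υ", "f": "φ", "x": "χ", "y": "ψ", "w": "ω",
}


def beta_to_unicode(text):
    """Convert Beta Code letters to Greek; diacritics are NOT handled here."""
    lowered = text.lower()
    out = []
    for i, ch in enumerate(lowered):
        greek = BETA_TO_GREEK.get(ch, ch)
        following = lowered[i + 1] if i + 1 < len(lowered) else ""
        # A sigma not followed by another letter is word-final.
        if ch == "s" and following not in BETA_TO_GREEK:
            greek = "ς"
        out.append(greek)
    return "".join(out)


print(beta_to_unicode("sofos|kai"))
```

Because the "|" separator used in multi-valued attributes is not a letter, each value in such an attribute gets its own final sigma.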

Extracting and tweaking data from Marble-Lexicon

I've extracted all English lexicon entries from marble-lexicon\SDBH\SDBH-EXPORT-en.XML and grouped them by the ids they are associated with. The result can be found (sorted) in \trees-oshb\py\create-annotations-and-glosses\annotations-and-glosses\marble-lexicon-entries.xml.

There are a few issues with the extracted data:
1 There is often more than one lexicon entry associated with a single morpheme.
2 These entries often don't refer exactly to the morpheme they are connected to (by id).
3 There often is some unclear data in the entries, like "NO DATA YET" or several references (?) like "({S:0010010010})".

A simple algorithm strips, tweaks, and merges multiple entries/meanings into one entry, which is the one that has been added to the trees. This is also included in the extracted and sorted lexicon entries file.

Consider the following examples:

<morph marble-id="00503300200046" marble-text="אשׁ">
  <entry lemma="אֵשׁ" gloss="fire" definition="= state of burning, in which substances combine chemically with oxygen from the air and give out bright light, heat, and smoke; ► used for cooking, melting, cleansing, heating, and destroying; ≈ many aspects of life are compared with fire, such as anger, jealousy, aggression, wickedness, words, life, certain sicknesses, suffering, etc."/>
  <entry lemma="אֲשֵׁדָה" gloss="slope" definition="= side of a mountain or hill"/>
  <entry lemma="אֲשֵׁדָה" gloss="NO DATA YET" definition="NO DATA YET"/>
  <entry lemma="אֵשְׁדָּת" gloss="fiery law" definition="read {L:אֵשׁ&lt;SDBH:אֵשׁ&gt;} {L:דָּת&lt;SDBH:דָּת&gt;} with {A:MT-Q}"/>
  <entry lemma="דאה" gloss="to swoop down" definition="= action by which a bird moves swiftly downwards ► in order to capture its prey; ≈ often used metaphorically to refer to an army attacking the enemy"/>
  <entry lemma="דאה" gloss="NO DATA YET" definition="NO DATA YET"/>
  <merged-entry lemma="אֲשֵׁדָה|אֵשְׁדָּת|דאה" gloss="fire|slope|fiery law|to swoop down" definition="= state of burning, in which substances combine chemically with oxygen from the air and give out bright light, heat, and smoke; ► used for cooking, melting, cleansing, heating, and destroying; ≈ many aspects of life are compared with fire, such as anger, jealousy, aggression, wickedness, words, life, certain sicknesses, suffering, etc.|= side of a mountain or hill|read אֵשׁ (אֵשׁ) דָּת (דָּת) with |= action by which a bird moves swiftly downwards ► in order to capture its prey; ≈ often used metaphorically to refer to an army attacking the enemy"/>
</morph>
<morph marble-id="01002400600024" marble-text="יַּ֔עַן">
  <entry lemma="דָּן" gloss="Dan" definition="= man, tribe, and territory; ◄ fifth son of {L:Jacob&lt;SDBH:יַעֲקֹב&gt;} and first son of {L:Bilhah&lt;SDBH:בִּלְהָה&gt;}, slave of {L:Rachel&lt;SDBH:רָחֵל&gt;}; ► founder of a tribe"/>
  <entry lemma="דָּן" gloss="NO DATA YET" definition="NO DATA YET"/>
  <entry lemma="יַעַן" gloss="Dan Jaan" definition="read {L:דָּנָה יַעַן&lt;SDBH:דָּן יַעַן&gt;}"/>
  <entry lemma="עִיֹּון" gloss="Ijon" definition="= town; ◄ territory of {L:Naphtali&lt;SDBH:נַפְתָּלִי&gt;}; ► part of northern kingdom of {L:Israel&lt;SDBH:יִשְׂרָאֵל&gt;}; conquered by {L:Ben-Hadad&lt;SDBH:בֶּן־הֲדַד&gt;} of Syria during reign of king {L:Baasha&lt;SDBH:בַּעְשָׁא&gt;}; conquered by {L:Tiglath-Pileser&lt;SDBH:תִּגְלַת פִּלְאֶסֶר&gt;} of Assyria during reign of {L:Pekah&lt;SDBH:פֶּקַח&gt;}"/>
  <entry lemma="עִיֹּון" gloss="NO DATA YET" definition="NO DATA YET"/>
  <merged-entry lemma="דָּן|יַעַן|עִיֹּון" gloss="Dan Jaan|Ijon" definition="= man, tribe, and territory; ◄ fifth son of Jacob (יַעֲקֹב) and first son of Bilhah (בִּלְהָה) slave of Rachel (רָחֵל) ► founder of a tribe|read דָּנָה יַעַן (דָּן יַעַן)|= town; ◄ territory of Naphtali (נַפְתָּלִי) ► part of northern kingdom of Israel (יִשְׂרָאֵל) conquered by Ben Hadad (בֶּן־הֲדַד) of Syria during reign of king Baasha (בַּעְשָׁא) conquered by Tiglath Pileser (תִּגְלַת פִּלְאֶסֶר) of Assyria during reign of Pekah (פֶּקַח)"/>
</morph>
<morph marble-id="01802301200014" marble-text="חֻקִּ֗י">
  <entry lemma="חֹק" gloss="law|decree" definition="= a pattern of behavior required by somebody in authority"/>
  <entry lemma="חֹק" gloss="law|decree" definition="= a pattern of behavior required by somebody in authority"/>
  <entry lemma="חֹק" gloss="daily allotment" definition="= a certain quantity of food that an individual needs every day in order to survive"/>
  <entry lemma="חֹק" gloss="NO DATA YET" definition="NO DATA YET"/>
  <merged-entry lemma="חֹק" gloss="law|decree|daily allotment" definition="= a pattern of behavior required by somebody in authority|= a certain quantity of food that an individual needs every day in order to survive"/>
</morph>
<morph marble-id="01802401200026" marble-text="תִּפְלָֽה">
  <entry lemma="תְּפִלָּה" gloss="prayer" definition="= action by which humans speak to a deity, often by raising their hands, ► requesting help or expressing their thankfulness"/>
  <entry lemma="תְּפִלָּה" gloss="NO DATA YET" definition="NO DATA YET"/>
  <entry lemma="תִּפְלָה" gloss="irrationality|senselessness" definition="= state when a certain activity does not appear to be in accordance with good sense"/>
  <entry lemma="תִּפְלָה" gloss="irrationality|senselessness" definition="= state when a certain activity does not appear to be in accordance with good sense"/>
  <merged-entry lemma="תְּפִלָּה|תִּפְלָה" gloss="prayer|irrationality|senselessness" definition="= action by which humans speak to a deity, often by raising their hands, ► requesting help or expressing their thankfulness|= state when a certain activity does not appear to be in accordance with good sense"/>
</morph>
<morph marble-id="01900800600010" marble-text="אֱלֹהִ֑ים">
  <entry lemma="אֱלֹהִים" gloss="heavenly beings" definition="plural with plural meaning: = generic term for a supernatural being, worshiped by individuals or entire nations"/>
  <entry lemma="אֱלֹהִים" gloss="god (of someone)|God (of someone)" definition="plural with singular meaning: = generic term for a supernatural being, worshiped by individuals or entire nations"/>
  <entry lemma="אֱלֹהִים" gloss="God" definition="plural with singular meaning: = the highest God, creator of heaven and earth"/>
  <entry lemma="אֱלֹהִים" gloss="God" definition="plural with singular meaning: = the highest God, creator of heaven and earth"/>
  <merged-entry lemma="אֱלֹהִים" gloss="heavenly beings|god (of someone)|God (of someone)" definition="plural with plural meaning: = generic term for a supernatural being, worshiped by individuals or entire nations|plural with singular meaning: = generic term for a supernatural being, worshiped by individuals or entire nations|plural with singular meaning: = the highest God, creator of heaven and earth"/>
</morph>
<morph marble-id="02303002200028" marble-text="דָוָ֔ה">
  <entry lemma="דָּוֶה" gloss="menstruous woman" definition="= female person discharging blood and other material from the lining of the uterus at intervals of about one lunar month; ≈ regarded as ritually unclean"/>
  <entry lemma="דָּוֶה" gloss="menstrual discharge" definition="= blood and other material from the lining of the uterus discharged from the body in the menstrual period; ≈ regarded as unclean"/>
  <entry lemma="דָּוֶה" gloss="filthy things = objects stained by menstruation" definition="= an unspecified object ► brought in contact to a woman undergoing menstruation; ≈ regarded as unclean"/>
  <entry lemma="דָּוֶה" gloss="menstruous cloth" definition="= piece of cloth ► worn or touched by a woman undergoing menstruation; ≈ regarded as unclean"/>
  <merged-entry lemma="דָּוֶה" gloss="menstruous woman|menstrual discharge|filthy things = objects stained by menstruation|menstruous cloth" definition="= female person discharging blood and other material from the lining of the uterus at intervals of about one lunar month; ≈ regarded as ritually unclean|= blood and other material from the lining of the uterus discharged from the body in the menstrual period; ≈ regarded as unclean|= an unspecified object ► brought in contact to a woman undergoing menstruation; ≈ regarded as unclean|= piece of cloth ► worn or touched by a woman undergoing menstruation; ≈ regarded as unclean"/>
</morph>
<morph marble-id="02800600700008" marble-text="אָדָ֖ם">
  <entry lemma="אָדָם" gloss="human|humankind|human being(s)" definition="= human being as an individual or as a class of living creatures; sometimes explicitly subdivided between {L:male&lt;SDBH:זָכָר&gt;} and {L:female&lt;SDBH:נְקֵבָה&gt;}; ≈ associated with mortality"/>
  <entry lemma="אָדָם" gloss="Adam" definition="= first man; ◄ created by God; ► husband of {L:Eve&lt;SDBH:חַוָּה&gt;}, father of {L:Cain&lt;SDBH:קַיִן&gt;}, {L:Abel&lt;SDBH:הֶבֶל&gt;}, and {L:Seth&lt;SDBH:שֵׁת&gt;}"/>
  <entry lemma="אָדָם" gloss="Adam" definition="= town; ◄ located near the river {L:Jordan&lt;SDBH:יַרְדֵּן&gt;} near {L:Zarethan&lt;SDBH:צָרְתָן&gt;}"/>
  <entry lemma="אַדְמָה" gloss="Admah" definition="= town; ◄ located near {L:Dead Sea&lt;SDBH:יָם־הַמֶּלַח&gt;}; ► destroyed with {L:Sodom&lt;SDBH:סְדֹם&gt;} and {L:Gomorrah&lt;SDBH:עֲמֹרָה&gt;}"/>
  <merged-entry lemma="אַדְמָה" gloss="humankind|human being(s)|Adam|Admah" definition="= human being as an individual or as a class of living creatures; sometimes explicitly subdivided between male (זָכָר) and female (נְקֵבָה) ≈ associated with mortality|= first man; ◄ created by God; ► husband of Eve (חַוָּה) father of Cain (קַיִן) Abel (הֶבֶל) and Seth (שֵׁת)|= town; ◄ located near the river Jordan (יַרְדֵּן) near Zarethan (צָרְתָן)|= town; ◄ located near Dead Sea (יָם־הַמֶּלַח) ► destroyed with Sodom (סְדֹם) and Gomorrah (עֲמֹרָה)"/>
</morph>
<morph marble-id="03400301600016" marble-text="פָּשַׁ֖ט">
  <entry lemma="פשׁט" gloss="to cast (one's) skin|to shed (one's) skin" definition="meaning unsure; possibly: = process by which a locust sheds its skin"/>
  <entry lemma="פשׁט" gloss="to advance upon (an area or people)|to make a raid" definition="= action by which a group of armed people makes a sudden move in order to attack the people living there and steal their possessions"/>
  <entry lemma="פשׁט" gloss="to advance upon (an area or people)|to make a raid" definition="= action by which a group of armed people makes a sudden move in order to attack the people living there and steal their possessions"/>
  <entry lemma="פשׁט" gloss="to spread (one's) wings" definition="meaning unsure; possibly: = action by which an animal extends its wings in order to fly away"/>
  <merged-entry lemma="פשׁט" gloss="to cast (one's) skin|to shed (one's) skin|to advance upon (an area or people)|to make a raid|to spread (one's) wings" definition="meaning unsure; possibly: = process by which a locust sheds its skin|= action by which a group of armed people makes a sudden move in order to attack the people living there and steal their possessions|meaning unsure; possibly: = action by which an animal extends its wings in order to fly away"/>
</morph>
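
The strip-and-merge step described above could look roughly like the sketch below. This is not the script that produced the merged-entry elements: it only drops "NO DATA YET", removes duplicates while keeping order, rewrites {L:name<SDBH:lemma>} links as "name (lemma)", and deletes other brace references. Some merged entries shown above differ from what this sketch would produce (for example, they omit some lemmas and glosses), so the real algorithm evidently applies further rules. All function names here are made up.

```python
import re

NO_DATA = "NO DATA YET"
LINK = re.compile(r"\{L:([^<}]*)<SDBH:([^>}]*)>\}")  # {L:Jacob<SDBH:...>}
OTHER_REF = re.compile(r"\{[A-Z]:[^}]*\}")           # {A:MT-Q}, {S:0010010010}


def clean(text):
    """Rewrite lexicon links as 'name (lemma)' and drop other brace references."""
    text = LINK.sub(r"\1 (\2)", text)
    return OTHER_REF.sub("", text).strip()


def unique(values):
    """Keep first occurrences in order, skipping empty and NO DATA YET values."""
    seen, out = set(), []
    for value in values:
        if value and value != NO_DATA and value not in seen:
            seen.add(value)
            out.append(value)
    return out


def merge_entries(entries):
    """Merge several entry dicts (lemma, gloss, definition) into one."""
    return {
        "lemma": "|".join(unique(e["lemma"] for e in entries)),
        "gloss": "|".join(unique(g for e in entries for g in e["gloss"].split("|"))),
        "definition": "|".join(unique(clean(e["definition"]) for e in entries)),
    }


entries = [
    {"lemma": "חֹק", "gloss": "law|decree",
     "definition": "= a pattern of behavior required by somebody in authority"},
    {"lemma": "חֹק", "gloss": "law|decree",
     "definition": "= a pattern of behavior required by somebody in authority"},
    {"lemma": "חֹק", "gloss": "daily allotment",
     "definition": "= a certain quantity of food that an individual needs every day in order to survive"},
    {"lemma": "חֹק", "gloss": NO_DATA, "definition": NO_DATA},
]
merged = merge_entries(entries)
print(merged["gloss"])
```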

Consistent attribute order

Attribute order needs to be consistent and readable.

Retain `c` attributes in lowfat

Right now c elements do not have any attributes in lowfat.

The <p> element's text child should properly space the Hebrew words

The Lowfat <p> element does not have proper spacing at the moment. For example,

<p>
<milestone unit="verse" id="GEN 1:2">GEN 1:2</milestone>
וְ הָ אָ֗רֶץ הָיְתָ֥ה תֹ֨הוּ֙ וָ בֹ֔הוּ וְ חֹ֖שֶׁךְ עַל פְּנֵ֣י תְה֑וֹם וְ ר֣וּחַ אֱלֹהִ֔ים מְרַחֶ֖פֶת עַל פְּנֵ֥י הַ מָּֽיִם
</p>

Correct spacing between sub-words

@after should contain any whitespace needed

@after attributes should include whitespace such that sorting morphs in document order, concatenating @after to each morph, and concatenating these morphs gives the sentences with cantillation as in a printed text.
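
The requirement can be expressed as a small check: walk the m elements in document order and concatenate each morph's text with its @after. The sample below reuses words from GEN 1:2; the @after values in it are assumptions about what the corrected data would contain.

```python
import xml.etree.ElementTree as ET


def sentence_text(root):
    """Concatenate every morph and its @after value in document order."""
    return "".join((m.text or "") + m.get("after", "") for m in root.iter("m"))


# The @after values here are assumed, not taken from the current trees.
sample = """<Sentence>
  <m after="">וְ</m>
  <m after="">הָ</m>
  <m after=" ">אָ֗רֶץ</m>
  <m after=" ">הָיְתָ֥ה</m>
  <m after="">תֹ֨הוּ֙</m>
</Sentence>"""

print(sentence_text(ET.fromstring(sample)))
```

Comparing this output with the running text of a printed edition, verse by verse, would show where an @after value is missing or wrong.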

Renumbering semantic roles and participant reference attributes.

The numbering of semantic roles and participant references uses the Groves Center skeleton files, not the current numbering system. These attributes need to be renumbered to match the current trees.

Participant referent data:
SubjRef only on verbs with implied subjects; format SubjRef="{010010310021}"
Ref only on nouns, pronouns, or adjectives usually; format Ref="{010010120082}"

Semantic Roles:
Frame="{A0:010010310021; A1:010010310041;}"

Once this is done, we can pull these attributes into the main tree. We need to consider whether to do this at the word level (using prepare-oshb) or at the lowest Node level (perhaps in a separate annotations merge, which could be useful for cooking trees of varying complexity).
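
A sketch of the renumbering step, assuming a dictionary from old (Groves Center) ids to current ids is available. The mapping in the example is invented; ids without a mapping are left unchanged so they can be found and reviewed afterwards.

```python
import re

# Exactly twelve digits, not part of a longer number such as a 16-digit nodeId.
MORPH_ID = re.compile(r"(?<!\d)\d{12}(?!\d)")


def renumber(value, mapping):
    """Replace every 12-digit id in an attribute value using the mapping."""
    return MORPH_ID.sub(lambda m: mapping.get(m.group(), m.group()), value)


def renumber_attributes(attrs, mapping, names=("Frame", "Ref", "SubjRef")):
    """Renumber only the semantic-role and participant-reference attributes."""
    return {k: renumber(v, mapping) if k in names else v for k, v in attrs.items()}


# Hypothetical old-to-new mapping, for illustration only.
mapping = {"010010310021": "010010310022"}
attrs = {
    "morphId": "010010310021",
    "Frame": "{A0:010010310021; A1:010010310041;}",
    "SubjRef": "{010010310021}",
}
print(renumber_attributes(attrs, mapping))
```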

Issues with OT Cherith

First of all, there is a bug/typo in the Cherith file wlc-gloss.tsv. Somewhere halfway through the document, the remaining lines are all crammed into one 'cell', making the file unprocessable with simple TSV readers. I've worked around the error, but we should probably check where and why it occurred.
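
One way to locate the problem line is sketched below. The cause is not known from this issue; the sketch simply flags rows whose field count differs from the first row and rows with an odd number of double quotes, since an unbalanced quote is a common reason for a reader to swallow the rest of a file into one cell. The sample rows are invented.

```python
def find_suspect_rows(lines):
    """Return (line number, reasons) for rows that look malformed."""
    suspects = []
    expected = None
    for number, line in enumerate(lines, start=1):
        row = line.rstrip("\r\n")
        fields = row.split("\t")
        if expected is None:
            expected = len(fields)  # the first row defines the column count
        reasons = []
        if len(fields) != expected:
            reasons.append(f"{len(fields)} fields, expected {expected}")
        if row.count('"') % 2:
            reasons.append("odd number of double quotes")
        if reasons:
            suspects.append((number, reasons))
    return suspects


# Invented rows; for the real file, pass an open file object instead of a list.
rows = ["id\ttext\tgloss\n", "1\ta\tb\n", '2\t"c\td\n', "3\te\n"]
for number, reasons in find_suspect_rows(rows):
    print(number, "; ".join(reasons))
```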

The Cherith morphemes map nicely to our Node leaves except for 10 cases:

<word morphId="010380240022;010380240023" cherithId="010380240022" hebrew="מִשְׁלֹ֣שׁ" english="three" chinese="三"/>
<word morphId="130050090041;130050090042" cherithId="130050090041" hebrew="לְב֣וֹא" english="entrance" chinese="来~到"/>
<word morphId="130200040021;130200040022" cherithId="130200040021" hebrew="אַחֲרֵיכֵ֔ן" english="after this" chinese="此后"/>
<word morphId="140200010021;140200010022" cherithId="140200010021" hebrew="אַֽחֲרֵיכֵ֡ן" english="after this" chinese="此后"/>
<word morphId="140200350012;140200350013" cherithId="140200350012" hebrew="אַחֲרֵיכֵ֗ן" english="after this" chinese="此后"/>
<word morphId="140240040021;140240040022" cherithId="140240040021" hebrew="אַחֲרֵיכֵ֑ן" english="afterward" chinese="此后"/>
<word morphId="140260080081;140260080082" cherithId="140260080081" hebrew="לְב֣וֹא" english="border" chinese="至~来到"/>
<word morphId="150030050012;150030050013" cherithId="150030050012" hebrew="אַחֲרֵיכֵ֞ן" english="after that" chinese="此后"/>
<word morphId="150040090091;150040090092" cherithId="150040090091" hebrew="דִּ֠ינָיֵא" english="judges" chinese="法官"/>
<word morphId="220080060181;220080060182" cherithId="220080060181" hebrew="שַׁלְהֶ֥בֶתְיָֽה" english="raging flame" chinese="不可遏制的烈焰"/>

How should we deal with this? Add the glosses to both nodes in our trees? Or try to split the glosses manually?

Update `@n` to `@xml:id` and add corpus-specific prefix

USFMId issues

USFM identifiers identify verses, not sentences. Our extension for words adds a word identifier, but placing that word identifier on morphemes and calling it an identifier is faulty: several morphemes can share the same value, so it is not a unique identifier when used this way.

Since USFM is the reference system we are using, we don't have to say USFM each time we use it.

Instead of this:

<?xml version="1.0" encoding="UTF-8"?><Sentences>
  <Sentence USFMId="GEN 4:1">
    <Trees>
      <Tree>
        <Node Cat="S" Head="0" nodeId="0100400100110200">
          <Node Cat="cjp" Rule="Cj2Cjp" Head="0" nodeId="0100400100110011">
            <Node n="010040010011" Cat="cj" morphId="010040010011" Unicode="וְ" nodeId="0100400100110010">
              <m USFMId="GEN 4:1!1" n="010040010011" morph="C" lang="H" lemma="c" pos="conjunction">וְ</m>
            </Node>
          </Node>
          <Node Cat="CL" Rule="S-V-O" Head="1" nodeId="0100400100120070">
            <Node Cat="S" Rule="Np2S" Head="0" nodeId="0100400100120021">
              <Node Cat="np" Rule="DetNP" Head="1" nodeId="0100400100120020">
                <Node n="010040010012" Cat="art" morphId="010040010012" Unicode="הָ֣" nodeId="0100400100120010">
                  <m USFMId="GEN 4:1!1" n="010040010012" morph="Td" lang="H" lemma="d" pos="particle" type="definite article">הָ֣</m>
                </Node>
                <Node Cat="np" Rule="N2NP" Head="0" nodeId="0100400100130011">
                  <Node n="010040010013" Cat="noun" morphId="010040010013" Unicode="אָדָ֔ם" nodeId="0100400100130010">
                    <m USFMId="GEN 4:1!1" n="010040010013" morph="Ncmsa" lang="H" lemma="120" after=" " pos="noun" type="common" gender="masculine" number="singular" state="absolute">אָדָ֔ם</m>
                  </Node>
                </Node>
              </Node>
            </Node>

I would prefer this:

<?xml version="1.0" encoding="UTF-8"?><Sentences>
  <Sentence verse="GEN 4:1">
    <Trees>
      <Tree>
        <Node Cat="S" Head="0" nodeId="0100400100110200">
          <Node Cat="cjp" Rule="Cj2Cjp" Head="0" nodeId="0100400100110011">
            <Node n="010040010011" Cat="cj" morphId="010040010011" Unicode="וְ" nodeId="0100400100110010">
              <m word="GEN 4:1!1" n="010040010011" morph="C" lang="H" lemma="c" pos="conjunction">וְ</m>
            </Node>
          </Node>
          <Node Cat="CL" Rule="S-V-O" Head="1" nodeId="0100400100120070">
            <Node Cat="S" Rule="Np2S" Head="0" nodeId="0100400100120021">
              <Node Cat="np" Rule="DetNP" Head="1" nodeId="0100400100120020">
                <Node n="010040010012" Cat="art" morphId="010040010012" Unicode="הָ֣" nodeId="0100400100120010">
                  <m word="GEN 4:1!1" n="010040010012" morph="Td" lang="H" lemma="d" pos="particle" type="definite article">הָ֣</m>
                </Node>
                <Node Cat="np" Rule="N2NP" Head="0" nodeId="0100400100130011">
                  <Node n="010040010013" Cat="noun" morphId="010040010013" Unicode="אָדָ֔ם" nodeId="0100400100130010">
                    <m word="GEN 4:1!1" n="010040010013" morph="Ncmsa" lang="H" lemma="120" after=" " pos="noun" type="common" gender="masculine" number="singular" state="absolute">אָדָ֔ם</m>
                  </Node>
                </Node>
              </Node>
            </Node>
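
The rename itself is mechanical. A sketch with ElementTree follows, keeping each attribute in its original position so the attribute order of the output does not change; the function names are made up for this example.

```python
import xml.etree.ElementTree as ET


def rename_attr(elem, old, new):
    """Rename an attribute in place, preserving attribute order."""
    if old in elem.attrib:
        items = [(new if k == old else k, v) for k, v in elem.attrib.items()]
        elem.attrib.clear()
        elem.attrib.update(items)


def rename_usfm_ids(root):
    """Sentence/@USFMId becomes @verse; m/@USFMId becomes @word."""
    for sentence in root.iter("Sentence"):
        rename_attr(sentence, "USFMId", "verse")
    for m in root.iter("m"):
        rename_attr(m, "USFMId", "word")
    return root


sample = (
    '<Sentences><Sentence USFMId="GEN 4:1">'
    '<m USFMId="GEN 4:1!1" n="010040010011" morph="C">x</m>'
    "</Sentence></Sentences>"
)
root = rename_usfm_ids(ET.fromstring(sample))
print(ET.tostring(root, encoding="unicode"))
```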

Individual words should not always have a word group wrapping them

13 total missing sentences

13 sentences are currently missing their morphology and text. These include 9 sentences that have directional suffixes within a compound:

("2s20:15","ca6:12","2s24:6","gn46:1","gn28:2","gn28:5","gn28:6","gn28:7","js19:13")

They also include 3 sentences where OSHB and Westminster disagree on a Qere:

("ne5:7","ps21:2","da2:39")

The final sentence involves a mismatch in the implicit article:

("lv27:16")

Obviously, we need to fix these.

Inconsistent Preservation of OSHB's Unique Ids for Words

Because some words had to be broken up into constituent parts for analysis, one unique id would have to be shared across its two or three constituent parts to carry over into the trees. For example:

    <verse osisID="Gen.1.1">
      <w lemma="b/7225" n="1.0" morph="HR/Ncfsa" id="01xeN">בְּ/רֵאשִׁ֖ית</w>
      <w lemma="1254 a" morph="HVqp3ms" id="01Nvk">בָּרָ֣א</w>
      <w lemma="430" n="1" morph="HNcmpa" id="01TyA">אֱלֹהִ֑ים</w>
      <w lemma="853" morph="HTo" id="01vuQ">אֵ֥ת</w>
      <w lemma="d/8064" n="0.0" morph="HTd/Ncmpa" id="01TSc">הַ/שָּׁמַ֖יִם</w>
      <w lemma="c/853" morph="HC/To" id="01k5P">וְ/אֵ֥ת</w>
      <w lemma="d/776" n="0" morph="HTd/Ncbsa" id="01nPh">הָ/אָֽרֶץ</w><seg type="x-sof-pasuq">׃</seg>
    </verse>
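
To make the sharing concrete, here is a sketch that splits one OSHB w element on "/" and carries the same id onto every part. The output keys (oshbId, part) are invented names, and treating the first character of @morph as a language prefix that applies to all parts is an assumption based on the sample above.

```python
import xml.etree.ElementTree as ET


def split_word(w):
    """Split an OSHB <w> into its '/'-separated parts, each keeping the word id."""
    texts = (w.text or "").split("/")
    lemmas = w.get("lemma", "").split("/")
    morph = w.get("morph", "")
    language, codes = morph[:1], morph[1:].split("/")
    parts = []
    for i, text in enumerate(texts):
        parts.append({
            "oshbId": w.get("id"),  # shared by all parts of the word
            "part": i + 1,
            "text": text,
            "lemma": lemmas[i] if i < len(lemmas) else None,
            "morph": language + codes[i] if i < len(codes) else None,
        })
    return parts


w = ET.fromstring('<w lemma="b/7225" n="1.0" morph="HR/Ncfsa" id="01xeN">בְּ/רֵאשִׁ֖ית</w>')
for part in split_word(w):
    print(part)
```

With a shared id plus a part number, the pair stays unique per morpheme while the link back to the OSHB word is preserved.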

"in beginning", "the heavens", "and [object marker]", and "the earth" all lost their OSHB unique ids because they were separated into 2 parts, while "created", "God", and "[object marker]" still show their OSHB ids in the trees. Perhaps we should strip all the OSHB ids to avoid confusion.

Issues with morpheme/attribute mappings

There have been several issues and small inconsistencies concerning mapping different datasets (OSHB, old/new trees, Marble, etc). This issue should help to keep the discussions and decisions about these issues documented and traceable.

Please refer to this document for comments and issues, and feel free to add your own.

Nodes with compounds missing `@n`

//Node[c|m][not(./@n)] and //Node[c] return exactly the same nodes.

The Nodes that have compounds do not have an @n attribute.

<Node Cat="noun" morphId="010040220061" Unicode="תּ֣וּבַל קַ֔יִן" nodeId="0100402200610010" StrongNumberX="8423">
    <c english="Tubal-cain" mandarin="土八该隐" SDBH="תּוּבַל קַיִן:003001007:Names of People:Tubal-Cain">
        <m word="GEN 4:22!6" xml:id="o010040220061" lang="H" after=" " lemma="8423+" morph="Np" pos="noun" type="proper">תּ֣וּבַל</m>
        <m word="GEN 4:22!7" xml:id="o010040220071" lang="H" after=" " lemma="8423" morph="Np" pos="noun" type="proper">קַ֔יִן</m>
      </c>
    </Node>
  </Node>
  <Node Cat="np" Rule="NPofNP" Head="0" nodeId="0100402200810060">
    <Node Cat="np" Rule="Vp2Np" Head="0" nodeId="0100402200810012">
      <Node Cat="vp" Rule="V2VP" Head="0" nodeId="0100402200810011">
        <Node n="o010040220081" Cat="verb" morphId="010040220071" Unicode="לֹטֵ֕שׁ" nodeId="0100402200810010" StrongNumberX="3913" SenseNumber="2" Frame="A0:010040220061; A1:010040220091;" SubjRef="010040220061" Greek="σφυροκόπος">
          <m word="GEN 4:22!8" xml:id="o010040220081" lang="H" after=" " lemma="3913" morph="Vqrmsc" pos="verb" stem="qal" type="participle active" gender="masculine" number="singular" state="construct" english="made" mandarin="打造" SDBH="לטשׁ:002001001048:Shape:to hammer;to forge">לֹטֵ֕שׁ</m>
        </Node>
      </Node>
    </Node>

Gloss upon gloss

We have several sources of glosses, and they have different advantages and purposes. We need simple attribute names that support the glosses we are using:

Cherith Mandarin glosses
Cherith English glosses
SIL English glosses - broken down in a very interlinear-friendly way (as in Paratext SLT)
Berean Literal Bible (if we want them)

Obviously, glosses in other languages may also become a factor.

I don't particularly like attribute names like cherith-english in the following:

<Node xmlns:xi="http://www.w3.org/2001/XInclude" Cat="noun" morphId="130020160092" Unicode="עֲשָׂה־אֵ֖ל" nodeId="1300201600920010" StrongNumberX="6214" Greek="ασαηλ">
  <c cherith-english="Asahel" cherith-chinese="亚撒黑" marble-sense="עֲשָׂהאֵל:003001007:Names of People:Asahel|שָׁלֹשׁ:002001001042:Quantity;002001003009:Frequency:three">
    <m word="1CH 2:16!9" n="130020160092" morph="Np" lang="H" lemma="6214+" after="־" pos="noun" type="proper">עֲשָׂה</m>
    <m word="1CH 2:16!10" n="130020160101" lang="H" after=" " lemma="6214" morph="Np" pos="noun" type="proper">אֵ֖ל</m>
  </c>
</Node>

So we need a naming convention that gives us flexibility while keeping this simple. I don't think the attribute name needs to identify the source; we can do that in documentation and copyright / license statements.

Any suggestions?

Marble Domains (`Domain`, `ContextualDomain`, `CoreDomain`)

English glosses

At the last minute, we found we had IP issues with the English glosses we had intended to use.

We need to compare available glosses and pick glosses that are particularly good.

Consistency Check: Morphology in Skeleton, OSHB

We need to develop tooling that makes it possible for an expert like @rkjtan to systematically examine differences. For morphemes, we have used a consonant-only comparison, using | to indicate boundaries between morphemes. For instance:

Verse: 27002039

ו|בתר|כ|תקומ|מלכו|אחרי|ארע|מנ|כ|ו|מלכו|תליתאה|אחרי|די|נחש|א|די|תשלט|ב|כל|ארע|א

ו|בתר|כ|תקומ|מלכו|אחרי|ארעא|מנ|כ|ו|מלכו|תליתאה|אחרי|די|נחש|א|די|תשלט|ב|כל|ארע|א

Verse: 16005007

ו|ימלכ|לב|י|על|י|ו|אריב|ה|את|ה|חרימ|ו|את|ה|סגנימ|ו|אמר|ה|ל|המ|משא|איש|ב|אחי|ו|אתמ|נשימ|ו|אתנ|עלי|המ|קהלה|גדולה

ו|ימלכ|לב|י|על|י|ו|אריב|ה|את|ה|חרימ|ו|את|ה|סגנימ|ו|אמר|ה|ל|המ|משא|איש|ב|אחי|ו|אתמ|נשאימ|ו|אתנ|עלי|המ|קהלה|גדולה

Verse: 19021002

יהוה|ב|עז|כ|ישמח|מלכ|ו|ב|ישועת|כ|מה|יגל|מאד

יהוה|ב|עז|כ|ישמח|מלכ|ו|ב|ישועת|כ|מה|יגיל|מאד

This would be good to have as part of a general-purpose tool that also identifies other inconsistencies.
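
A minimal Python sketch of the consonant-only comparison described above. The function names are hypothetical, and the normalization rules are inferred from the examples (points and accents dropped, final letter forms folded to regular forms); they are not taken from the actual tooling.

```python
import unicodedata

# Final forms (kaf, mem, nun, pe, tsadi) mapped to their regular forms,
# which is what the skeletons in the examples appear to do.
FINALS = str.maketrans("\u05da\u05dd\u05df\u05e3\u05e5",
                       "\u05db\u05de\u05e0\u05e4\u05e6")

def consonants(morpheme):
    """Keep only Hebrew letters (U+05D0..U+05EA) and normalize final forms."""
    decomposed = unicodedata.normalize("NFD", morpheme)
    letters = "".join(ch for ch in decomposed if "\u05d0" <= ch <= "\u05ea")
    return letters.translate(FINALS)

def skeleton(morphemes):
    """Join the consonant-only morphemes with | as the boundary marker."""
    return "|".join(consonants(m) for m in morphemes)

def diff(a, b):
    """Return (index, a, b) for each position where two skeletons differ."""
    left, right = a.split("|"), b.split("|")
    if len(left) != len(right):
        raise ValueError("morpheme counts differ; align before comparing")
    return [(i, x, y) for i, (x, y) in enumerate(zip(left, right)) if x != y]

# The two skeletons for verse 19021002 shown above.
a = "יהוה|ב|עז|כ|ישמח|מלכ|ו|ב|ישועת|כ|מה|יגל|מאד"
b = "יהוה|ב|עז|כ|ישמח|מלכ|ו|ב|ישועת|כ|מה|יגיל|מאד"
print(diff(a, b))
```

Reporting only the differing positions keeps the output short enough for manual review. Verses where the morpheme counts differ need a real alignment step first, which this sketch deliberately leaves out.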

@Domain and @Extends do not seem to exist on many words

When I run this simple count:

'# of m', count(//m),
'# of @Domain', count(//@Domain),
'# of @Extends', count(//@Extends)

I get this result (I would have expected more @Domain):

# of m 475911
# of @Domain 214661
# of @Extends 35441

I am noticing a number of words do not have @Domain. E.g.,

<Node n="o010160030081" Cat="noun" morphId="010160030081" Unicode="שִׁפְחָת" nodeId="0101600300810010" StrongNumberX="8198" SenseNumber="1" Greek="παιδίσκην" GreekStrong="3814">
  <m word="GEN 16:3!8" xml:id="o010160030081" morph="Ncfsc" lang="H" lemma="8198" pos="noun" type="common" gender="feminine" number="singular" state="construct" english="servant" mandarin="婢女">שִׁפְחָתָ֔</m>
</Node>

This word, you will notice, does have @SenseNumber. Is something falling through the cracks? @klosoter @jonathanrobie
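
To list the affected words rather than just count them, something like the following Python sketch could help. It assumes, and this is only an assumption, that @Domain would sit on the leaf Node or one of its descendants; the function names and the three-node sample are hypothetical.

```python
import xml.etree.ElementTree as ET

def attribute_counts(root, names=("Domain", "Extends")):
    """Count <m> elements and the elements carrying each named attribute."""
    counts = {"m": sum(1 for _ in root.iter("m"))}
    for name in names:
        counts["@" + name] = sum(1 for el in root.iter() if name in el.attrib)
    return counts

def sensed_without_domain(root):
    """morphIds of Nodes with @SenseNumber but no @Domain on the Node or below it."""
    return [
        node.get("morphId")
        for node in root.iter("Node")
        if "SenseNumber" in node.attrib
        and not any("Domain" in el.attrib for el in node.iter())
    ]

# Hypothetical sample standing in for the real trees.
sample = """<Sentence>
  <Node morphId="1" SenseNumber="1"><m>x</m></Node>
  <Node morphId="2" SenseNumber="1" Domain="001"><m>y</m></Node>
  <Node morphId="3"><m Extends="z">z</m></Node>
</Sentence>"""

root = ET.fromstring(sample)
print(attribute_counts(root))
print(sensed_without_domain(root))
```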

Partial mapping SIL - Macula (word and morph level)

This is what the extracted SIL data looks like (full file here)

{
    'wd': 'B.:/R")$I73YT',
    'ws': 'בְּרֵאשִׁ֖ית',
    'wt': 'bərēʾšîṯ',
    'wc': 'בְּרֵאשִׁית',
    'bf': '\\p',
    'vdm': '{"Temporal <H>בְּ</H>" 39.6.2}',
    'netB': '',
    'wbc': '{"The definite article is lacking, but \'in the beginning\' is an acceptable translation" GEN.1.1.b}',
    'egs': 'in.beginning',
    'morphs': {
        '010010010011': {
            'm': 'B.:',
            'ms': 'בְּ',
            'mt': 'bə',
            'l': 'B.:',
            'ls': 'בְּ',
            'lt': 'bə',
            'dfA': '\\7b\\d0\\62\\7d',
            'df': '}בּ{',
            't': 'Pp'
        },
        '010010010012': {
            'm': 'R")$I73YT',
            'ms': 'רֵאשִׁ֖ית',
            'mt': 'rēʾšiyṯ',
            'l': 'R")$IYT',
            'ls': 'רֵאשִׁית',
            'lt': 'rēʾšîṯ',
            'dfA': '\\7b\\74\\79\\69\\48\\27\\e3\\72\\7d',
            'df': '}רֵאשִׁית{',
            't': 'ncfsa',
            'str': '{07225}'
        }
    }
}
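
For the morph-level mapping, a record like this can be flattened into one row per morph id. The sketch below is hypothetical: the reading of the keys ('ms' as pointed surface form, 'ls' as lemma, 't' as morphology tag, 'str' as a Strong's number in braces) is a guess from the sample above, not a documented schema, and the keys of 'morphs' are only assumed to line up with Macula morphIds.

```python
def morph_rows(word):
    """Flatten one SIL word record into one row per morph id."""
    rows = []
    for morph_id, morph in word["morphs"].items():
        rows.append({
            "morphId": morph_id,
            "surface": morph.get("ms"),
            "lemma": morph.get("ls"),
            "tag": morph.get("t"),
            # 'str' looks like a Strong's number wrapped in braces;
            # it is absent on some morphs, so fall back to None.
            "strongs": morph.get("str", "").strip("{}") or None,
        })
    return rows

# Reduced copy of the record shown above.
word = {
    "egs": "in.beginning",
    "morphs": {
        "010010010011": {"ms": "בְּ", "ls": "בְּ", "t": "Pp"},
        "010010010012": {"ms": "רֵאשִׁ֖ית", "ls": "רֵאשִׁית",
                         "t": "ncfsa", "str": "{07225}"},
    },
}

rows = morph_rows(word)
for row in rows:
    print(row["morphId"], row["tag"], row["strongs"])
```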

Inconsistencies with Marble Lexicon entries

There are several inconsistencies with the Marble lexicon entries from /ubsicap/marble-lexicon/SDBH/SDBH-EXPORT-en.XML.

The following files document them:

macula-marble-mapping.xml
Mapping between macula trees @morphId and marble-BHS.
oshb-marble-mapping.xml
Mapping between macula trees @n and marble-BHS.
marble-senses.xml
Glosses and LEXDomains grouped by marble-id extracted from ubsicap
marble-issues.xml
Collected issues with marble lexicon data.

The following cases occur:

One marble sense corresponds to more than one Macula Node

<morph SDBH="עַקְרָב:001001002001006:Swarming Creatures:scorpion|עַקְרָב:003001010:Names of Locations:Ascent of Akrabbim;Ascent of Scorpions" marble-text="מַעֲלֵ֤ה עַקְרַבִּים֙">
  <Node n="060150030042" Cat="noun" morphId="060150030042" Unicode="מַעֲלֵ֤ה" nodeId="0601500300420010" StrongNumberX="4608" SenseNumber="1" Greek="προσαναβάσεως">
    <m word="JOS 15:3!4" n="060150030042" morph="Ncmsc" lang="H" lemma="4610+" after=" " pos="noun" type="common" gender="masculine" number="singular" state="construct" english="ascent" mandarin="隘口">מַעֲלֵ֤ה</m>
  </Node>
  <Node n="060150030051" Cat="noun" morphId="060150030051" Unicode="עַקְרַבִּים֙" nodeId="0601500300510010" StrongNumberX="6137" SenseNumber="2" Greek="ακραβιν">
    <m word="JOS 15:3!5" n="060150030051" lang="H" after=" " lemma="4610" morph="Np" pos="noun" type="proper" english="akrabbim" mandarin="亚克拉滨">עַקְרַבִּים֙</m>
  </Node>
</morph>

There are multiple lexicon entries that correspond to one Marble Id (see the last case below).
For compounds, both the <c> element and some of its <m> children are aligned with a marble sense (using the mapping), which results in duplicates:

<c english="Arameans of Beth-rehob" mandarin="伯·利合的亚兰人" SDBH="בֵּית רְחֹוב:003001010:Names of Locations:Beth-Rehob">
  <m USFMId="2SA 10:6!12" n="100100060121" lang="H" after=" " lemma="758" morph="Np" pos="noun" type="proper" SDBH="אֲרַם בֵּית־רְחֹוב:003001006:Names of Groups:Arameans of Beth-Rehob">אֲרַ֨ם</m>
  <m USFMId="2SA 10:6!13" n="100100060131" lang="H" after="־" lemma="1050+" morph="Np" pos="noun" type="proper">בֵּית</m>
  <m USFMId="2SA 10:6!14" n="100100060141" lang="H" after=" " lemma="1050" morph="Np" pos="noun" type="proper">רְח֜וֹב</m>
</c>

Sometimes the lemma associated with a lexicon entry does not match the Hebrew word(s) associated with the reference id. See the first <m> element in the previous example. This is best seen in marble-senses.xml, which contains for each Marble Id:

all associated lexicon entries with lemmas, glosses, and domains
the Hebrew text corresponding to the id in the original Marble-BHS
Example (the marble-text being the text associated with the id in Marble-BHS):

<morph marble-id="00503300200046" marble-text="אשׁ" combined-data="אֵשׁ:001006006:Fire:fire|אֲשֵׁדָה:001005003:Landforms:slope|אֵשְׁדָּת::fiery law|דאה:002002001009:Move:to swoop down">
  <entry lemma="אֵשׁ" gloss="fire" domain="001006006:Fire"/>
  <entry lemma="אֲשֵׁדָה" gloss="slope" domain="001005003:Landforms"/>
  <entry lemma="אֵשְׁדָּת" gloss="fiery law" domain=""/>
  <entry lemma="דאה" gloss="to swoop down" domain="002002001009:Move"/>
</morph>

It is nearly impossible to automatically select, for a given Id, only the entries that match the actual Hebrew text, because the lexicon lemmas are often base forms. So there will probably be some SDBH senses that do not match the <m> or <c> element they are added to.
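
For anyone inspecting these strings by hand, here is a small Python sketch that splits a combined-data / SDBH value back into entries. The field layout is inferred from the examples in this issue (lemma first, gloss last, everything in between is the domain, possibly empty), and it assumes glosses never contain a colon; `parse_senses` is a hypothetical helper name.

```python
def parse_senses(combined):
    """Split 'lemma:domain...:gloss|lemma:domain...:gloss' into entry dicts."""
    entries = []
    for chunk in combined.split("|"):
        parts = chunk.split(":")
        if len(parts) < 3:
            raise ValueError("unexpected entry format: " + chunk)
        entries.append({
            "lemma": parts[0],
            # The middle fields hold 'code:Name' pairs; empty when no domain is given.
            "domain": ":".join(parts[1:-1]),
            "gloss": parts[-1],
        })
    return entries

# combined-data from the marble-senses.xml example above.
combined = ("אֵשׁ:001006006:Fire:fire|אֲשֵׁדָה:001005003:Landforms:slope"
            "|אֵשְׁדָּת::fiery law|דאה:002002001009:Move:to swoop down")

entries = parse_senses(combined)
for entry in entries:
    print(entry["gloss"], "/", entry["domain"] or "(no domain)")
```

This does not solve the selection problem, but it makes it easier to list the candidate lemmas next to the marble-text for each id.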

Any thoughts, suggestions, or solutions?

Please put Cherith glosses on the <m> elements

Instead of this:

<Node n="190010010021" Cat="art" morphId="190010010021" Unicode="הָ" nodeId="1900100100210010" StrongNumberX="1886a" english="the" chinese="这">
         <m USFMId="PSA 1:1!2" n="190010010021" morph="Td" lang="H" lemma="d" pos="particle" type="definite article">הָ</m>
</Node>

I would like this:

<Node n="190010010021" Cat="art" morphId="190010010021" Unicode="הָ" nodeId="1900100100210010" StrongNumberX="1886a">
         <m USFMId="PSA 1:1!2" n="190010010021" morph="Td" lang="H" lemma="d" pos="particle" type="definite article" english="the" chinese="这">הָ</m>
</Node>

For compound words, put the gloss on the compound:

                    <Node Cat="noun" morphId="010040220061" Unicode="תּ֣וּבַל קַ֔יִן" nodeId="0100402200610010">
                      <c>
                        <m USFMId="GEN 4:22!6" n="010040220061" lang="H" after=" " lemma="8423+" morph="Np" pos="noun" type="proper">תּ֣וּבַל</m>
                        <m USFMId="GEN 4:22!7" n="010040220071" lang="H" after=" " lemma="8423" morph="Np" pos="noun" type="proper">קַ֔יִן</m>
                      </c>
                    </Node>
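
A rough Python sketch of the requested change, in case it helps whoever implements it. The attribute names are the ones used in this issue; the function name and the assumption that each leaf Node has either one `<c>` or one `<m>` child are mine.

```python
import xml.etree.ElementTree as ET

GLOSS_ATTRS = ("english", "chinese")

def push_glosses_down(root):
    """Move gloss attributes from each Node to its <c> child, else its <m> child."""
    for node in root.iter("Node"):
        target = node.find("c")
        if target is None:
            target = node.find("m")
        if target is None:
            continue  # phrase-level Node or childless leaf: nothing to move
        for name in GLOSS_ATTRS:
            if name in node.attrib:
                target.set(name, node.attrib.pop(name))

# Hypothetical reduced sample: one simple word and one compound.
sample = """<Sentence>
  <Node Cat="art" english="the" chinese="这"><m lemma="d">x</m></Node>
  <Node Cat="noun" english="Tubal-cain"><c><m>a</m><m>b</m></c></Node>
</Sentence>"""

root = ET.fromstring(sample)
push_glosses_down(root)
print(ET.tostring(root, encoding="unicode"))
```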

Leaf Nodes without <m> elements

There are 266 leaf Node elements with no <m> or <c> children. This is a bug. We should add this check to our unit tests, @jacobwegner.

//Node[empty(*)]
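
If the unit tests are written in Python, the same check could look roughly like this. It is a sketch with a hypothetical helper name and sample data; a real test would load the tree files and assert that the returned list is empty.

```python
import xml.etree.ElementTree as ET

def childless_nodes(root):
    """Return Node elements with no child elements, like //Node[empty(*)]."""
    return [node for node in root.iter("Node") if len(node) == 0]

# Hypothetical sample: one well-formed leaf and one broken leaf.
sample = """<Sentence>
  <Node Cat="cj" morphId="100200150011"><m>ok</m></Node>
  <Node Cat="verb" morphId="100200150012">Y.FBO61)W.</Node>
</Sentence>"""

broken = childless_nodes(ET.fromstring(sample))
print([node.get("morphId") for node in broken])
```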

Here are the nodes in question:

<Node Cat="cj" morphId="100200150011" Unicode="וַ" nodeId="100200150010010">WA</Node>
<Node Cat="verb" morphId="100200150012" Unicode="יָּבֹ֜אוּ" nodeId="100200150020040">Y.FBO61)W.</Node>
<Node Cat="cj" morphId="100200150021" Unicode="וַ" nodeId="100200150060010">WA</Node>
<Node Cat="verb" morphId="100200150022" Unicode="יָּצֻ֣רוּ" nodeId="100200150070040">Y.FCU74RW.</Node>
<Node Cat="prep" morphId="100200150031" Unicode="עָלָ֗י" nodeId="100200150110030">(FLF81Y</Node>
<Node Cat="pron" morphId="100200150032" Unicode="ו" nodeId="100200150140010">W</Node>
<Node Cat="prep" morphId="100200150041" Unicode="בְּ" nodeId="100200150150010">B.:</Node>
<Node Cat="cj" morphId="100200150051" Unicode="וַ" nodeId="100200150280010">WA</Node>
<Node Cat="verb" morphId="100200150052" Unicode="יִּשְׁפְּכ֤וּ" nodeId="100200150290050">Y.I$:P.:K70W.</Node>
<Node Cat="noun" morphId="100200150061" Unicode="סֹֽלְלָה֙" nodeId="100200150340040">SO75L:LFH03</Node>
<Node Cat="prep" morphId="100200150071" Unicode="אֶל־" nodeId="100200150380020">)EL-</Node>
<Node Cat="art" morphId="100200150081" Unicode="הָ" nodeId="100200150400010">HF</Node>
<Node Cat="noun" morphId="100200150082" Unicode="עִ֔יר" nodeId="100200150410030">(I80YR</Node>
<Node Cat="cj" morphId="100200150091" Unicode="וַֽ" nodeId="100200150440010">WA75</Node>
<Node Cat="verb" morphId="100200150092" Unicode="תַּעֲמֹ֖ד" nodeId="100200150450040">T.A(:AMO73D</Node>
<Node Cat="prep" morphId="100200150101" Unicode="בַּ" nodeId="100200150490010">B.A</Node>
<Node Cat="art" morphId="100200150102" Unicode="" nodeId="100200150500000">_</Node>
<Node Cat="noun" morphId="100200150103" Unicode="חֵ֑ל" nodeId="100200150500020">X"92L</Node>
<Node Cat="cj" morphId="100200150111" Unicode="וְ" nodeId="100200150520010">W:</Node>
<Node Cat="noun" morphId="100200150112" Unicode="כָל־" nodeId="100200150530020">KFL-</Node>
<Node Cat="art" morphId="100200150121" Unicode="הָ" nodeId="100200150550010">HF</Node>
<Node Cat="noun" morphId="100200150122" Unicode="עָם֙" nodeId="100200150560020">(FM03</Node>
<Node Cat="rel" morphId="100200150131" Unicode="אֲשֶׁ֣ר" nodeId="100200150580030">):A$E74R</Node>
<Node Cat="prep" morphId="100200150141" Unicode="אֶת־" nodeId="100200150610020">)ET-</Node>
<Node Cat="noun" morphId="100200150151" Unicode="יוֹאָ֔ב" nodeId="100200150630040">YOW)F80B</Node>
<Node Cat="verb" morphId="100200150161" Unicode="מַשְׁחִיתִ֖ם" nodeId="100200150670060">MA$:XIYTI73M</Node>
<Node Cat="prep" morphId="100200150171" Unicode="לְ" nodeId="100200150730010">L:</Node>
<Node Cat="verb" morphId="100200150172" Unicode="הַפִּ֥יל" nodeId="100200150740040">HAP.I71YL</Node>
<Node Cat="art" morphId="100200150181" Unicode="הַ" nodeId="100200150780010">HA</Node>
<Node Cat="noun" morphId="100200150182" Unicode="׃חוֹמָֽה" nodeId="100200150790040">XOWMF75H00</Node>
<Node Cat="cj" morphId="030270160011" Unicode="וְ" nodeId="030270160010010">W:</Node>
<Node Cat="cj" morphId="030270160012" Unicode="אִ֣ם ׀" nodeId="030270160020020">)I74M05</Node>
<Node Cat="prep" morphId="030270160021" Unicode="מִ" nodeId="030270160040010">MI</Node>
<Node Cat="noun" morphId="030270160022" Unicode="שְּׂדֵ֣ה" nodeId="030270160050030">&amp;.:D"74H</Node>
<Node Cat="noun" morphId="030270160031" Unicode="אֲחֻזָּת" nodeId="030270160080040">):AXUZ.FT</Node>
<Node Cat="pron" morphId="030270160032" Unicode="֗וֹ" nodeId="030270160120010">O81W</Node>
<Node Cat="verb" morphId="030270160041" Unicode="יַקְדִּ֥ישׁ" nodeId="030270160130050">YAQ:D.I71Y$</Node>
<Node Cat="noun" morphId="030270160051" Unicode="אִישׁ֙" nodeId="030270160180030">)IY$03</Node>
<Node Cat="prep" morphId="030270160061" Unicode="לַֽ" nodeId="030270160210010">LA75</Node>
<Node Cat="noun" morphId="030270160062" Unicode="יהוָ֔ה" nodeId="030270160220040">YHWF80H</Node>
<Node Cat="cj" morphId="030270160071" Unicode="וְ" nodeId="030270160260010">W:</Node>
<Node Cat="verb" morphId="030270160072" Unicode="הָיָ֥ה" nodeId="030270160270030">HFYF71H</Node>
<Node Cat="noun" morphId="030270160081" Unicode="עֶרְכּ" nodeId="030270160300030">(ER:K.</Node>
<Node Cat="pron" morphId="030270160082" Unicode="ְךָ֖" nodeId="030270160330010">:KF73</Node>
<Node Cat="prep" morphId="030270160091" Unicode="לְ" nodeId="030270160340010">L:</Node>
<Node Cat="noun" morphId="030270160092" Unicode="פִ֣י" nodeId="030270160350020">PI74Y</Node>
<Node Cat="noun" morphId="030270160101" Unicode="זַרְע" nodeId="030270160370030">ZAR:(</Node>
<Node Cat="pron" morphId="030270160102" Unicode="֑וֹ" nodeId="030270160400010">O92W</Node>
<Node Cat="noun" morphId="030270160111" Unicode="זֶ֚רַע" nodeId="030270160410030">10ZERA(</Node>
<Node Cat="noun" morphId="030270160121" Unicode="חֹ֣מֶר" nodeId="030270160440030">XO74MER</Node>
<Node Cat="noun" morphId="030270160131" Unicode="שְׂעֹרִ֔ים" nodeId="030270160470050">&amp;:(ORI80YM</Node>
<Node Cat="prep" morphId="030270160141" Unicode="בַּ" nodeId="030270160520010">B.A</Node>
<Node Cat="num" morphId="030270160142" Unicode="חֲמִשִּׁ֖ים" nodeId="030270160530050">X:AMI$.I73YM</Node>
<Node Cat="noun" morphId="030270160151" Unicode="שֶׁ֥קֶל" nodeId="030270160580030">$E71QEL</Node>
<Node Cat="noun" morphId="030270160161" Unicode="׃כָּֽסֶף" nodeId="030270160610030">K.F75SEP00</Node>
<Node Cat="cj" morphId="160050070011" Unicode="וַ" nodeId="160050070010010">WA</Node>
<Node Cat="verb" morphId="160050070012" Unicode="יִּמָּלֵ֨ךְ" nodeId="160050070020040">Y.IM.FL"63K:</Node>
<Node Cat="noun" morphId="160050070021" Unicode="לִבּ" nodeId="160050070060020">LIB.</Node>
<Node Cat="pron" morphId="160050070022" Unicode="ִ֜י" nodeId="160050070080010">I61Y</Node>
<Node Cat="prep" morphId="160050070031" Unicode="עָל" nodeId="160050070090020">(FL</Node>
<Node Cat="pron" morphId="160050070032" Unicode="ַ֗י" nodeId="160050070110010">A81Y</Node>
<Node Cat="cj" morphId="160050070041" Unicode="וָ" nodeId="160050070120010">WF</Node>
<Node Cat="verb" morphId="160050070042" Unicode="אָרִ֙יב" nodeId="160050070130040">)FRI33YB</Node>
<Node Cat="x" morphId="160050070043" Unicode="ָה֙" nodeId="160050070170010">FH03</Node>
<Node Cat="prep" morphId="160050070051" Unicode="אֶת־" nodeId="160050070180020">)ET-</Node>
<Node Cat="art" morphId="160050070061" Unicode="הַ" nodeId="160050070200010">HA</Node>
<Node Cat="noun" morphId="160050070062" Unicode="חֹרִ֣ים" nodeId="160050070210040">XORI74YM</Node>
<Node Cat="cj" morphId="160050070071" Unicode="וְ" nodeId="160050070250010">W:</Node>
<Node Cat="prep" morphId="160050070072" Unicode="אֶת־" nodeId="160050070260020">)ET-</Node>
<Node Cat="art" morphId="160050070081" Unicode="הַ" nodeId="160050070280010">HA</Node>
<Node Cat="noun" morphId="160050070082" Unicode="סְּגָנִ֔ים" nodeId="160050070290050">S.:GFNI80YM</Node>
<Node Cat="cj" morphId="160050070091" Unicode="וָ" nodeId="160050070340010">WF</Node>
<Node Cat="verb" morphId="160050070092" Unicode="אֹמְר" nodeId="160050070350030">)OM:R</Node>
<Node Cat="x" morphId="160050070093" Unicode="ָ֣ה" nodeId="160050070380010">F74H</Node>
<Node Cat="prep" morphId="160050070101" Unicode="ל" nodeId="160050070390010">L</Node>
<Node Cat="pron" morphId="160050070102" Unicode="ָהֶ֔ם" nodeId="160050070400020">FHE80M</Node>
<Node Cat="noun" morphId="160050070111" Unicode="מַשָּׁ֥א" nodeId="160050070420030">MA$.F71)</Node>
<Node Cat="noun" morphId="160050070121" Unicode="אִישׁ־" nodeId="160050070450030">)IY$-</Node>
<Node Cat="prep" morphId="160050070131" Unicode="בְּ" nodeId="160050070480010">B.:</Node>
<Node Cat="noun" morphId="160050070132" Unicode="אָחִ֖י" nodeId="160050070490030">)FXI73Y</Node>
<Node Cat="pron" morphId="160050070133" Unicode="ו" nodeId="160050070520010">W</Node>
<Node Cat="pron" morphId="160050070141" Unicode="אַתֶּ֣ם" nodeId="160050070530030">)AT.E74M</Node>
<Node Cat="verb" morphId="160050070151" Unicode="נשאים" nodeId="160050070560050">*NO$:)IYM</Node>
<Node Cat="cj" morphId="160050070171" Unicode="וָ" nodeId="160050070650010">WF</Node>
<Node Cat="verb" morphId="160050070172" Unicode="אֶתֵּ֥ן" nodeId="160050070660030">)ET."71N</Node>
<Node Cat="prep" morphId="160050070181" Unicode="עֲלֵי" nodeId="160050070690030">(:AL"Y</Node>
<Node Cat="pron" morphId="160050070182" Unicode="הֶ֖ם" nodeId="160050070720020">HE73M</Node>
<Node Cat="noun" morphId="160050070191" Unicode="קְהִלָּ֥ה" nodeId="160050070740040">Q:HIL.F71H</Node>
<Node Cat="adj" morphId="160050070201" Unicode="׃גְדוֹלָֽה" nodeId="160050070780050">G:DOWLF75H00</Node>
<Node Cat="adv" morphId="220060120011" Unicode="לֹ֣א" nodeId="220060120010020">LO74)</Node>
<Node Cat="verb" morphId="220060120021" Unicode="יָדַ֔עְתִּי" nodeId="220060120030050">YFDA80(:T.IY</Node>
<Node Cat="noun" morphId="220060120031" Unicode="נַפְשׁ" nodeId="220060120080030">NAP:$</Node>
<Node Cat="pron" morphId="220060120032" Unicode="ִ֣י" nodeId="220060120110010">I74Y</Node>
<Node Cat="verb" morphId="220060120041" Unicode="שָׂמַ֔ת" nodeId="220060120120030">&amp;FMA80T</Node>
<Node Cat="pron" morphId="220060120042" Unicode="ְנִי" nodeId="220060120150020">:NIY</Node>
<Node Cat="noun" morphId="220060120051" Unicode="מַרְכְּב֖וֹת" nodeId="220060120170060">MAR:K.:BO73WT</Node>
<Node Cat="cj" morphId="100240060011" Unicode="וַ" nodeId="100240060010010">WA</Node>
<Node Cat="verb" morphId="100240060012" Unicode="יָּבֹ֙אוּ֙" nodeId="100240060020040">Y.FBO33)W.03</Node>
<Node Cat="art" morphId="100240060021" Unicode="הַ" nodeId="100240060060010">HA</Node>
<Node Cat="noun" morphId="100240060022" Unicode="גִּלְעָ֔ד" nodeId="100240060070040">G.IL:(F80D</Node>
<Node Cat="x" morphId="100240060023" Unicode="ָה" nodeId="100240060110010">FH</Node>
<Node Cat="cj" morphId="100240060031" Unicode="וְ" nodeId="100240060120010">W:</Node>
<Node Cat="prep" morphId="100240060032" Unicode="אֶל־" nodeId="100240060130020">)EL-</Node>
<Node Cat="noun" morphId="100240060041" Unicode="אֶ֥רֶץ" nodeId="100240060150030">)E71REC</Node>
<Node Cat="cj" morphId="100240060061" Unicode="וַ" nodeId="100240060270010">WA</Node>
<Node Cat="verb" morphId="100240060062" Unicode="יָּבֹ֙אוּ֙" nodeId="100240060280040">Y.FBO33)W.03</Node>
<Node Cat="cj" morphId="100240060081" Unicode="וְ" nodeId="100240060380010">W:</Node>
<Node Cat="adv" morphId="100240060082" Unicode="סָבִ֖יב" nodeId="100240060390040">SFBI73YB</Node>
<Node Cat="prep" morphId="100240060091" Unicode="אֶל־" nodeId="100240060430020">)EL-</Node>
<Node Cat="noun" morphId="100240060101" Unicode="׃צִידֽוֹן" nodeId="100240060450050">CIYDO75WN00</Node>
<Node Cat="noun" morphId="190210020011" Unicode="יְֽהוָ֗ה" nodeId="190210020010040">Y:75HWF81H</Node>
<Node Cat="prep" morphId="190210020021" Unicode="בְּ" nodeId="190210020050010">B.:</Node>
<Node Cat="noun" morphId="190210020022" Unicode="עָזּ" nodeId="190210020060020">(FZ.</Node>
<Node Cat="pron" morphId="190210020023" Unicode="ְךָ֥" nodeId="190210020080010">:KF71</Node>
<Node Cat="verb" morphId="190210020031" Unicode="יִשְׂמַח־" nodeId="190210020090040">YI&amp;:MAX-</Node>
<Node Cat="noun" morphId="190210020041" Unicode="מֶ֑לֶךְ" nodeId="190210020130030">ME92LEK:</Node>
<Node Cat="cj" morphId="190210020051" Unicode="וּ֝" nodeId="190210020160010">11W.</Node>
<Node Cat="prep" morphId="190210020052" Unicode="בִ" nodeId="190210020170010">BI</Node>
<Node Cat="noun" morphId="190210020053" Unicode="ישׁ֥וּעָת" nodeId="190210020180050">Y$71W.(FT</Node>
<Node Cat="pron" morphId="190210020054" Unicode="ְךָ֗" nodeId="190210020230010">:KF81</Node>
<Node Cat="pron" morphId="190210020061" Unicode="מַה־" nodeId="190210020240020">MAH-</Node>
<Node Cat="verb" morphId="190210020071" Unicode="יגיל" nodeId="190210020260040">*Y.FG"YL</Node>
<Node Cat="adv" morphId="190210020091" Unicode="׃מְאֹֽד" nodeId="190210020330030">M:)O75D00</Node>
<Node Cat="cj" morphId="010460010011" Unicode="וַ" nodeId="010460010010010">WA</Node>
<Node Cat="verb" morphId="010460010012" Unicode="יִּסַּ֤ע" nodeId="010460010020030">Y.IS.A70(</Node>
<Node Cat="noun" morphId="010460010021" Unicode="יִשְׂרָאֵל֙" nodeId="010460010050050">YI&amp;:RF)"L03</Node>
<Node Cat="cj" morphId="010460010031" Unicode="וְ" nodeId="010460010100010">W:</Node>
<Node Cat="noun" morphId="010460010032" Unicode="כָל־" nodeId="010460010110020">KFL-</Node>
<Node Cat="rel" morphId="010460010041" Unicode="אֲשֶׁר־" nodeId="010460010130030">):A$ER-</Node>
<Node Cat="prep" morphId="010460010051" Unicode="ל" nodeId="010460010160010">L</Node>
<Node Cat="pron" morphId="010460010052" Unicode="֔וֹ" nodeId="010460010170010">O80W</Node>
<Node Cat="cj" morphId="010460010061" Unicode="וַ" nodeId="010460010180010">WA</Node>
<Node Cat="verb" morphId="010460010062" Unicode="יָּבֹ֖א" nodeId="010460010190030">Y.FBO73)</Node>
<Node Cat="cj" morphId="010460010081" Unicode="וַ" nodeId="010460010290010">WA</Node>
<Node Cat="verb" morphId="010460010082" Unicode="יִּזְבַּ֣ח" nodeId="010460010300040">Y.IZ:B.A74X</Node>
<Node Cat="noun" morphId="010460010091" Unicode="זְבָחִ֔ים" nodeId="010460010340050">Z:BFXI80YM</Node>
<Node Cat="prep" morphId="010460010101" Unicode="לֵ" nodeId="010460010390010">L"</Node>
<Node Cat="noun" morphId="010460010102" Unicode="אלֹהֵ֖י" nodeId="010460010400040">)LOH"73Y</Node>
<Node Cat="noun" morphId="010460010111" Unicode="אָבִ֥י" nodeId="010460010440030">)FBI71Y</Node>
<Node Cat="pron" morphId="010460010112" Unicode="ו" nodeId="010460010470010">W</Node>
<Node Cat="noun" morphId="010460010121" Unicode="׃יִצְחָֽק" nodeId="010460010480040">YIC:XF75Q00</Node>
<Node Cat="cj" morphId="270020390011" Unicode="וּ" nodeId="270020390010010">W.</Node>
<Node Cat="prep" morphId="270020390012" Unicode="בָתְר" nodeId="270020390020030">BFT:R</Node>
<Node Cat="pron" morphId="270020390013" Unicode="ָ֗ךְ" nodeId="270020390050010">F81K:</Node>
<Node Cat="verb" morphId="270020390021" Unicode="תְּק֛וּם" nodeId="270020390060040">T.:Q91W.M</Node>
<Node Cat="noun" morphId="270020390031" Unicode="מַלְכ֥וּ" nodeId="270020390100040">MAL:K71W.</Node>
<Node Cat="adj" morphId="270020390041" Unicode="אָחֳרִ֖י" nodeId="270020390140040">)FX:FRI73Y</Node>
<Node Cat="noun" morphId="270020390051" Unicode="אֲרַ֣עא" nodeId="270020390180040">):ARA74()</Node>
<Node Cat="prep" morphId="270020390061" Unicode="מִנּ" nodeId="270020390220020">MIN.</Node>
<Node Cat="pron" morphId="270020390062" Unicode="ָ֑ךְ" nodeId="270020390240010">F92K:</Node>
<Node Cat="cj" morphId="270020390071" Unicode="וּ" nodeId="270020390250010">W.</Node>
<Node Cat="noun" morphId="270020390072" Unicode="מַלְכ֨וּ" nodeId="270020390260040">MAL:K63W.</Node>
<Node Cat="num" morphId="270020390091" Unicode="תְלִיתָאָ֤ה" nodeId="270020390360060">**T:LIYTF)F70H</Node>
<Node Cat="adj" morphId="270020390101" Unicode="אָחֳרִי֙" nodeId="270020390420040">)FX:FRIY03</Node>
<Node Cat="rel" morphId="270020390111" Unicode="דִּ֣י" nodeId="270020390460020">D.I74Y</Node>
<Node Cat="noun" morphId="270020390121" Unicode="נְחָשׁ" nodeId="270020390480030">N:XF$</Node>
<Node Cat="art" morphId="270020390122" Unicode="ָ֔א" nodeId="270020390510010">F80)</Node>
<Node Cat="rel" morphId="270020390131" Unicode="דִּ֥י" nodeId="270020390520020">D.I71Y</Node>
<Node Cat="verb" morphId="270020390141" Unicode="תִשְׁלַ֖ט" nodeId="270020390540040">TI$:LA73+</Node>
<Node Cat="prep" morphId="270020390151" Unicode="בְּ" nodeId="270020390580010">B.:</Node>
<Node Cat="noun" morphId="270020390152" Unicode="כָל־" nodeId="270020390590020">KFL-</Node>
<Node Cat="noun" morphId="270020390161" Unicode="אַרְע" nodeId="270020390610030">)AR:(</Node>
<Node Cat="art" morphId="270020390162" Unicode="׃ָֽא" nodeId="270020390640010">F75)00</Node>
<Node Cat="verb" morphId="010280020011" Unicode="ק֥וּם" nodeId="010280020010030">Q71W.M</Node>
<Node Cat="verb" morphId="010280020021" Unicode="לֵךְ֙" nodeId="010280020040020">L"K:03</Node>
<Node Cat="noun" morphId="010280020041" Unicode="בֵּ֥ית" nodeId="010280020130030">B."71YT</Node>
<Node Cat="x" morphId="010280020042" Unicode="ָה" nodeId="010280020160010">FH</Node>
<Node Cat="noun" morphId="010280020051" Unicode="בְתוּאֵ֖ל" nodeId="010280020170050">B:TW.)"73L</Node>
<Node Cat="noun" morphId="010280020061" Unicode="אֲבִ֣י" nodeId="010280020220030">):ABI74Y</Node>
<Node Cat="noun" morphId="010280020071" Unicode="אִמּ" nodeId="010280020250020">)IM.</Node>
<Node Cat="pron" morphId="010280020072" Unicode="ֶ֑ךָ" nodeId="010280020270010">E92KF</Node>
<Node Cat="cj" morphId="010280020081" Unicode="וְ" nodeId="010280020280010">W:</Node>
<Node Cat="verb" morphId="010280020082" Unicode="קַח־" nodeId="010280020290020">QAX-</Node>
<Node Cat="prep" morphId="010280020091" Unicode="ל" nodeId="010280020310010">L</Node>
<Node Cat="pron" morphId="010280020092" Unicode="ְךָ֤" nodeId="010280020320010">:KF70</Node>
<Node Cat="prep" morphId="010280020101" Unicode="מִ" nodeId="010280020330010">MI</Node>
<Node Cat="adv" morphId="010280020102" Unicode="שָּׁם֙" nodeId="010280020340020">$.FM03</Node>
<Node Cat="noun" morphId="010280020111" Unicode="אִשָּׁ֔ה" nodeId="010280020360030">)I$.F80H</Node>
<Node Cat="prep" morphId="010280020121" Unicode="מִ" nodeId="010280020390010">MI</Node>
<Node Cat="noun" morphId="010280020122" Unicode="בְּנ֥וֹת" nodeId="010280020400040">B.:NO71WT</Node>
<Node Cat="noun" morphId="010280020131" Unicode="לָבָ֖ן" nodeId="010280020440030">LFBF73N</Node>
<Node Cat="noun" morphId="010280020141" Unicode="אֲחִ֥י" nodeId="010280020470030">):AXI71Y</Node>
<Node Cat="noun" morphId="010280020151" Unicode="אִמּ" nodeId="010280020500020">)IM.</Node>
<Node Cat="pron" morphId="010280020152" Unicode="׃ֶֽךָ" nodeId="010280020520010">E75KF00</Node>
<Node Cat="cj" morphId="010280050011" Unicode="וַ" nodeId="010280050010010">WA</Node>
<Node Cat="verb" morphId="010280050012" Unicode="יִּשְׁלַ֤ח" nodeId="010280050020040">Y.I$:LA70X</Node>
<Node Cat="noun" morphId="010280050021" Unicode="יִצְחָק֙" nodeId="010280050060040">YIC:XFQ03</Node>
<Node Cat="om" morphId="010280050031" Unicode="אֶֽת־" nodeId="010280050100020">)E75T-</Node>
<Node Cat="noun" morphId="010280050041" Unicode="יַעֲקֹ֔ב" nodeId="010280050120040">YA(:AQO80B</Node>
<Node Cat="cj" morphId="010280050051" Unicode="וַ" nodeId="010280050160010">WA</Node>
<Node Cat="verb" morphId="010280050052" Unicode="יֵּ֖לֶךְ" nodeId="010280050170030">Y."73LEK:</Node>
<Node Cat="prep" morphId="010280050071" Unicode="אֶל־" nodeId="010280050270020">)EL-</Node>
<Node Cat="noun" morphId="010280050081" Unicode="לָבָ֤ן" nodeId="010280050290030">LFBF70N</Node>
<Node Cat="noun" morphId="010280050091" Unicode="בֶּן־" nodeId="010280050320020">B.EN-</Node>
<Node Cat="noun" morphId="010280050101" Unicode="בְּתוּאֵל֙" nodeId="010280050340050">B.:TW.)"L03</Node>
<Node Cat="art" morphId="010280050111" Unicode="הָֽ" nodeId="010280050390010">HF75</Node>
<Node Cat="noun" morphId="010280050112" Unicode="אֲרַמִּ֔י" nodeId="010280050400040">):ARAM.I80Y</Node>
<Node Cat="noun" morphId="010280050121" Unicode="אֲחִ֣י" nodeId="010280050440030">):AXI74Y</Node>
<Node Cat="noun" morphId="010280050131" Unicode="רִבְקָ֔ה" nodeId="010280050470040">RIB:QF80H</Node>
<Node Cat="noun" morphId="010280050141" Unicode="אֵ֥ם" nodeId="010280050510020">)"71M</Node>
<Node Cat="noun" morphId="010280050151" Unicode="יַעֲקֹ֖ב" nodeId="010280050530040">YA(:AQO73B</Node>
<Node Cat="cj" morphId="010280050161" Unicode="וְ" nodeId="010280050570010">W:</Node>
<Node Cat="noun" morphId="010280050162" Unicode="׃עֵשָֽׂו" nodeId="010280050580030">("&amp;F75W00</Node>
<Node Cat="cj" morphId="010280060011" Unicode="וַ" nodeId="010280060010010">WA</Node>
<Node Cat="verb" morphId="010280060012" Unicode="יַּ֣רְא" nodeId="010280060020030">Y.A74R:)</Node>
<Node Cat="noun" morphId="010280060021" Unicode="עֵשָׂ֗ו" nodeId="010280060050030">("&amp;F81W</Node>
<Node Cat="cj" morphId="010280060031" Unicode="כִּֽי־" nodeId="010280060080020">K.I75Y-</Node>
<Node Cat="verb" morphId="010280060041" Unicode="בֵרַ֣ךְ" nodeId="010280060100030">B"RA74K:</Node>
<Node Cat="noun" morphId="010280060051" Unicode="יִצְחָק֮" nodeId="010280060130040">YIC:XFQ02</Node>
<Node Cat="om" morphId="010280060061" Unicode="אֶֽת־" nodeId="010280060170020">)E75T-</Node>
<Node Cat="noun" morphId="010280060071" Unicode="יַעֲקֹב֒" nodeId="010280060190040">YA(:AQOB01</Node>
<Node Cat="cj" morphId="010280060081" Unicode="וְ" nodeId="010280060230010">W:</Node>
<Node Cat="verb" morphId="010280060082" Unicode="שִׁלַּ֤ח" nodeId="010280060240030">$IL.A70X</Node>
<Node Cat="om" morphId="010280060091" Unicode="אֹת" nodeId="010280060270020">)OT</Node>
<Node Cat="pron" morphId="010280060092" Unicode="וֹ֙" nodeId="010280060290010">OW03</Node>
<Node Cat="prep" morphId="010280060111" Unicode="לָ" nodeId="010280060370010">LF</Node>
<Node Cat="verb" morphId="010280060112" Unicode="קַֽחַת־" nodeId="010280060380030">QA75XAT-</Node>
<Node Cat="prep" morphId="010280060121" Unicode="ל" nodeId="010280060410010">L</Node>
<Node Cat="pron" morphId="010280060122" Unicode="֥וֹ" nodeId="010280060420010">O71W</Node>
<Node Cat="prep" morphId="010280060131" Unicode="מִ" nodeId="010280060430010">MI</Node>
<Node Cat="adv" morphId="010280060132" Unicode="שָּׁ֖ם" nodeId="010280060440020">$.F73M</Node>
<Node Cat="noun" morphId="010280060141" Unicode="אִשָּׁ֑ה" nodeId="010280060460030">)I$.F92H</Node>
<Node Cat="prep" morphId="010280060151" Unicode="בְּ" nodeId="010280060490010">B.:</Node>
<Node Cat="verb" morphId="010280060152" Unicode="בָרֲכ" nodeId="010280060500030">BFR:AK</Node>
<Node Cat="pron" morphId="010280060153" Unicode="֣וֹ" nodeId="010280060530010">O74W</Node>
<Node Cat="om" morphId="010280060161" Unicode="אֹת" nodeId="010280060540020">)OT</Node>
<Node Cat="pron" morphId="010280060162" Unicode="֔וֹ" nodeId="010280060560010">O80W</Node>
<Node Cat="cj" morphId="010280060171" Unicode="וַ" nodeId="010280060570010">WA</Node>
<Node Cat="verb" morphId="010280060172" Unicode="יְצַ֤ו" nodeId="010280060580030">Y:CA70W</Node>
<Node Cat="prep" morphId="010280060181" Unicode="עָלָי" nodeId="010280060610030">(FLFY</Node>
<Node Cat="pron" morphId="010280060182" Unicode="ו֙" nodeId="010280060640010">W03</Node>
<Node Cat="prep" morphId="010280060191" Unicode="לֵ" nodeId="010280060650010">L"</Node>
<Node Cat="verb" morphId="010280060192" Unicode="אמֹ֔ר" nodeId="010280060660030">)MO80R</Node>
<Node Cat="adv" morphId="010280060201" Unicode="לֹֽא־" nodeId="010280060690020">LO75)-</Node>
<Node Cat="verb" morphId="010280060211" Unicode="תִקַּ֥ח" nodeId="010280060710030">TIQ.A71X</Node>
<Node Cat="noun" morphId="010280060221" Unicode="אִשָּׁ֖ה" nodeId="010280060740030">)I$.F73H</Node>
<Node Cat="prep" morphId="010280060231" Unicode="מִ" nodeId="010280060770010">MI</Node>
<Node Cat="noun" morphId="010280060232" Unicode="בְּנ֥וֹת" nodeId="010280060780040">B.:NO71WT</Node>
<Node Cat="noun" morphId="010280060241" Unicode="׃כְּנָֽעַן" nodeId="010280060820040">K.:NF75(AN00</Node>
<Node Cat="cj" morphId="010280070011" Unicode="וַ" nodeId="010280070010010">WA</Node>
<Node Cat="verb" morphId="010280070012" Unicode="יִּשְׁמַ֣ע" nodeId="010280070020040">Y.I$:MA74(</Node>
<Node Cat="noun" morphId="010280070021" Unicode="יַעֲקֹ֔ב" nodeId="010280070060040">YA(:AQO80B</Node>
<Node Cat="prep" morphId="010280070031" Unicode="אֶל־" nodeId="010280070100020">)EL-</Node>
<Node Cat="noun" morphId="010280070041" Unicode="אָבִ֖י" nodeId="010280070120030">)FBI73Y</Node>
<Node Cat="pron" morphId="010280070042" Unicode="ו" nodeId="010280070150010">W</Node>
<Node Cat="cj" morphId="010280070051" Unicode="וְ" nodeId="010280070160010">W:</Node>
<Node Cat="prep" morphId="010280070052" Unicode="אֶל־" nodeId="010280070170020">)EL-</Node>
<Node Cat="noun" morphId="010280070061" Unicode="אִמּ" nodeId="010280070190020">)IM.</Node>
<Node Cat="pron" morphId="010280070062" Unicode="֑וֹ" nodeId="010280070210010">O92W</Node>
<Node Cat="cj" morphId="010280070071" Unicode="וַ" nodeId="010280070220010">WA</Node>
<Node Cat="verb" morphId="010280070072" Unicode="יֵּ֖לֶךְ" nodeId="010280070230030">Y."73LEK:</Node>
<Node Cat="cj" morphId="060190130011" Unicode="וּ" nodeId="060190130010010">W.</Node>
<Node Cat="prep" morphId="060190130012" Unicode="מִ" nodeId="060190130020010">MI</Node>
<Node Cat="adv" morphId="060190130013" Unicode="שָּׁ֤ם" nodeId="060190130030020">$.F70M</Node>
<Node Cat="verb" morphId="060190130021" Unicode="עָבַר֙" nodeId="060190130050030">(FBAR03</Node>
<Node Cat="adv" morphId="060190130031" Unicode="קֵ֣דְמ" nodeId="060190130080030">Q"74D:M</Node>
<Node Cat="x" morphId="060190130032" Unicode="ָה" nodeId="060190130110010">FH</Node>
<Node Cat="noun" morphId="060190130041" Unicode="מִזְרָ֔ח" nodeId="060190130120040">MIZ:RF80X</Node>
<Node Cat="x" morphId="060190130042" Unicode="ָה" nodeId="060190130160010">FH</Node>
<Node Cat="cj" morphId="060190130071" Unicode="וְ" nodeId="060190130300010">W:</Node>
<Node Cat="verb" morphId="060190130072" Unicode="יָצָ֛א" nodeId="060190130310030">YFCF91)</Node>
<Node Cat="noun" morphId="060190130081" Unicode="רִמּ֥וֹן" nodeId="060190130340040">RIM.O71WN</Node>
<Node Cat="art" morphId="060190130091" Unicode="הַ" nodeId="060190130380010">HA</Node>
<Node Cat="verb" morphId="060190130092" Unicode="מְּתֹאָ֖ר" nodeId="060190130390040">M.:TO)F73R</Node>
<Node Cat="art" morphId="060190130101" Unicode="הַ" nodeId="060190130430010">HA</Node>
<Node Cat="noun" morphId="060190130102" Unicode="׃נֵּעָֽה" nodeId="060190130440030">N."(F75H00</Node>

Punctuation is inconsistent between Hebrew and Greek. Should this be consistent?

We could either consistently use <pc> nodes, or we can consistently use @after. The latter option seems preferable, since punctuation characters are not part of the syntactic tree, but either way should be fine.

We need to modify the pipeline so that we add `@xml:id` instead of `@n`

Add USFM book-, verse-, and word-level identifiers

Word-level identifiers should use ! syntax for now, pending any updates on this issue.
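
A minimal sketch of how such word-level identifiers could be split into their parts. It assumes the shape `BOOK CHAPTER:VERSE!WORD` seen in examples such as `GEN 38:21!6`; the names `WordRef` and `parse_word_ref` are hypothetical and not part of the pipeline.

```python
import re
from typing import NamedTuple

# Assumed shape: three-character book code, chapter:verse, then "!" and the word number.
WORD_REF = re.compile(
    r"^(?P<book>[0-9A-Z]{3}) (?P<chapter>\d+):(?P<verse>\d+)!(?P<word>\d+)$"
)


class WordRef(NamedTuple):
    book: str
    chapter: int
    verse: int
    word: int


def parse_word_ref(ref: str) -> WordRef:
    """Split a reference such as "GEN 1:1!1" into book, chapter, verse, and word."""
    match = WORD_REF.match(ref)
    if match is None:
        raise ValueError(f"not a word-level reference: {ref!r}")
    return WordRef(
        match["book"], int(match["chapter"]), int(match["verse"]), int(match["word"])
    )
```

If the `!` syntax changes later, only the regular expression should need to change.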

Numeric attributes in non-leaf nodes

Numeric attributes in non-leaf nodes need to be recomputed now that we have rewritten the trees.

Ids for <c> elements

We need ids for <c> elements in Hebrew. They need to be unique, i.e. not the same as the id of any morph.

Referring to phrases, clauses, and other things bigger than words

Sometimes we need annotations to refer to phrases or clauses. How can we create stable identifiers for structures larger than individual words?

There are missing `m/@xml:id`s in our current lowfat trees

I created a Jupyter notebook (here: 4d1ff81) to test the integrity of our word-level text content by comparing the nodes trees to the lowfat trees. I did this because the two sets of trees contain different numbers of @xml:ids.

The current issue seems to pertain to particles only, for example:

<Node Cat="P" Rule="ptcl2P" Head="0" nodeId="0103802100610011">
   <Node n="o010380210061"
         Cat="ptcl"
         morphId="010380210061"
         Unicode="אַיֵּ֧ה"
         nodeId="0103802100610010"
         StrongNumberX="0346"
         Greek="ποῦ"
         GreekStrong="4226">
      <m word="GEN 38:21!6"
         xml:id="o010380210061"
         lang="H"
         after=" "
         lemma="346"
         morph="Ti"
         pos="particle"
         type="interrogative"
         english="where"
         mandarin="哪里"
         Domain="003002004"
         SDBH="000321001001000">אַיֵּ֧ה</m>
   </Node>
</Node>
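
The kind of check described above can be sketched roughly as follows. This is not the notebook's code; it assumes that word-level ids sit on `m` or `w` elements and that both trees are available as XML strings. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined "xml:" prefix to this namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def xml_ids(xml_text: str, tags: tuple[str, ...] = ("m", "w")) -> set[str]:
    """Collect every @xml:id found on the given element names."""
    root = ET.fromstring(xml_text)
    return {
        el.get(XML_ID)
        for el in root.iter()
        if el.tag in tags and el.get(XML_ID) is not None
    }


def missing_ids(nodes_xml: str, lowfat_xml: str) -> list[str]:
    """Return ids present in the nodes tree but absent from the lowfat tree."""
    return sorted(xml_ids(nodes_xml) - xml_ids(lowfat_xml))
```

Running `missing_ids` per book or per sentence would narrow down where the counts diverge.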

Remove 3 Types of Superfluous Intermediate Nodes

Nodes with Rule="Cj2Cjp" and Cat="cjp": conjunctions are always single words, so cjp is superfluous
Nodes with Rule="ObjMarker" and Cat="omp": object markers are always single words, so omp is superfluous
Nodes with Rule="P2PP" and Cat="pp": turning a preposition into a prepositional phrase before it takes its object is superfluous
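
One way to remove these three node types is sketched below. It assumes each such node wraps exactly one child `Node`, and it leaves any wrapper with a different number of children untouched. The function name is hypothetical, and the root element itself is never collapsed.

```python
import xml.etree.ElementTree as ET

SUPERFLUOUS_RULES = {"Cj2Cjp", "ObjMarker", "P2PP"}


def collapse_superfluous(root: ET.Element) -> None:
    """Replace each single-child wrapper node, in place, by its only child."""
    # Snapshot the elements first, because the tree is modified while walking it.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            tail = child.tail  # keep the whitespace that followed the wrapper
            while child.get("Rule") in SUPERFLUOUS_RULES and len(child) == 1:
                child = child[0]
            child.tail = tail
            parent[index] = child
```

Attributes on the removed wrapper (for example its nodeId) are dropped, so anything that refers to those ids would need to be checked separately.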

Documentation for Semantic Domains, SDBH

We need to document semantic domains for our data. They appear in the @domain, @extends, and @sdbh attributes:

               <w ref="GEN 1:1!1"
                  xml:id="o010010010012"
                  mandarin="起初"
                  english="beginning"
                  domain="002003003004"
                  sdbh="006652001001000"
                  greek="ἀρξῇ"
                  strongnumberx="7225"
                  class="noun"
                  unicode="רֵאשִׁ֖ית"
                  morph="Ncfsa"
                  lang="H"
                  lemma="7225"
                  pos="noun"
                  gender="feminine"
                  number="singular"
                  state="absolute"
                  after=" ">רֵאשִׁ֖ית</w>

@domain values are documented in this file:

https://github.com/Clear-Bible/macula-hebrew/blob/main/sources/MARBLE/SDBH/marble-domain-label-mapping.json

@sdbh contains the identifier for the lexicon entry itself in the Semantic Dictionary of Biblical Hebrew.

In writing this, I realize that I don't understand @extends as well as I thought. It is used, e.g. when the word for Spirit is based on a metaphor involving breath:

                  <w ref="GEN 1:2!9"
                     xml:id="o010010020092"
                     mandarin="灵"
                     english="Spirit"
                     domain="001001001"
                     extends="001006001"
                     sdbh="006717001001000 006717001005000"
                     greek="πνεῦμα"
                     strongnumberx="7307"
                     class="noun"
                     unicode="ר֣וּחַ"
                     morph="Ncbsc"
                     lang="H"
                     lemma="7307"
                     pos="noun"
                     gender="both"
                     number="singular"
                     state="construct"
                     after=" ">ר֣וּחַ</w>

I had thought it was used precisely when a domain was derived metaphorically from another, e.g. "Spirit" as an extension of the concept of "breath". I am not sure this is an adequate explanation - I'll need to ask Reinier for more information on this.

Extracting domain and senses from MARBLE OT

See Clear-Bible/macula-greek#21 for the same issue for Macula-Greek.

In Macula-Greek, we created two attributes:

domain="SubDomainNumber:DomainName:SubDomainName"
sdbg="Lemma:EntryCode:Glosses"

For the OT, we can't do the same:

There is no EntryCode or any other unique identifier that points to a lexicon
The domain data is problematic for multiple reasons.

Problems with domain data in MARBLE SDBH

The domains are extracted from SDBH-DOMAINS1.XML and SDBH-DOMAINS2.XML

There are two sources that contain the entire lexicon:

SDBH-01.XML through SDBH-23.XML, which contain up-to-date full entries (multiple languages).
SDBH-EXPORT-en.XML which contains all English data from the full lexicon.

Domains in the full lexicon

However, the first (most up-to-date) source has very minimal data on the domain. It contains only a category and a label for the lex domain and the core domain:

<LEXDomains>
    <LEXDomain>Parts: Vegetation</LEXDomain>
</LEXDomains>
<LEXCoreDomains>
    <LEXCoreDomain>Plant</LEXCoreDomain>
</LEXCoreDomains>

the (core)domain identifier is missing
the domain label does not correspond to the labels in the SDBH-DOMAINS.XML
the domain label consists of the label of the last subdomain (which is very generic and not unique, e.g., "Kinship")

Conclusion: there is no way to find out to which specific domain they actually belong

Domains in the English (exported) lexicon

The second source contains more information about the domain:

<LEXDomains>
    <LEXDomain>001003:Vegetation</LEXDomain>
</LEXDomains>
<LEXCoreDomains>
    <LEXCoreDomain>117:Plant</LEXCoreDomain>
</LEXCoreDomains>

Both domains have identifiers here. The core domain id always consists of 3 characters (no subdomains, hence 'core domain') while the other domain can consist of up to 15 characters.
Again, only the label of the last domain is included.

Additional Issues

There are a few more issues with the domain data:

The domain ids 001, 002, 003, and 004 occur twice in the domains file, so they are not unique. The second occurrence is very general and does not have subdomains. However, this should not be a problem:

the lexicon only refers to these 4 ids 8 times
and always to the first occurrence of "001", that is the core domain 001:Objects
However, looking up a domain by identifier alone (without the domain label) will return duplicates for these four ids.

Some domain data (id + label) is preceded by a category indication that breaks up the usual format of "{domain id}:{domain label}":

['Valuable > Objects>001:Objects',
 'Valuable > Objects>001:Objects',
 'Speed > Move>002002001009:Move',
 'Shake > Non-Exist>002002002006:Non-Exist',
 'Shake > Move>002002001009:Move',
 'Shake > Afraid>002001002001:Afraid',
 'Shake > Afraid>002001002001:Afraid',
 'Valuable > Objects>001:Objects',
 'Shake > Afraid>002001002001:Afraid',
 'Shake > Afraid>002001002001:Afraid',
 'Shake > Afraid>002001002001:Afraid',
 'Shake > Afraid>002001002001:Afraid',
 'Intensity > Respect>002001002025:Respect',
 'Shake > Afraid>002001002001:Afraid',
 'Shake > Shame>002001002026:Shame',
 'Shake > Musical Instruments>001004001013:Musical Instruments',
 'Mix > Engage>002003003009:Engage',
 'Valuable > Objects>001:Objects',
 'Shake > Afraid>002001002001:Afraid',
 'Shake > Move>002002001009:Move',
 'Shake > Relief>002001002024:Relief',
 'Shake > Shame>002001002026:Shame',
 'Valuable > Objects>001:Objects',
 'Valuable > Objects>001:Objects',
 'Shake > Ready>002001002022:Ready',
 'Shake > Distress>002001002007:Distress',
 'Body > People>001001002003:People']
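
A small sketch of how entries like these could be normalized back to the usual `{domain id}:{domain label}` form. It assumes that labels never contain `>` themselves, so everything up to the last `>` is the category indication; `normalize_domain` is a hypothetical name.

```python
def normalize_domain(raw: str) -> tuple[str, str]:
    """Return (domain_id, label), dropping any "Category > Label>" prefix."""
    # Everything after the last ">" is assumed to be the real "id:label" pair.
    entry = raw.rsplit(">", 1)[-1].strip()
    domain_id, _, label = entry.partition(":")
    if not domain_id.isdigit() or not label:
        raise ValueError(f"unexpected domain entry: {raw!r}")
    return domain_id, label
```

Entries without a prefix pass through unchanged, so the same function can be applied to every domain value.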

Suggested Approach

We take the glosses, lemma, and domain from SDBH-EXPORT-en.XML
We remove the prefixed category indication mentioned in Additional issue 2: Valuable > Objects>001:Objects => 001:Objects
We extend the domain label with all parent domain labels, like this:

We also use the core domain and put it in the attribute like this:

<m domain="DomainID:DomainLabel:SubDomainLabel:..:SubDomainLabel:CoreDomainId:CoreDomainLabel" />

<m domain="001003:Objects:Vegetation:117:Plant"/>
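
The attribute value in the suggested approach could be assembled along these lines. This sketch assumes that domain ids nest in 3-character levels (so the parents of `001003` are `001` and `001003`) and that a lookup table from id to label has already been built from the domains files. Both the `labels` mapping and the function name are hypothetical.

```python
def domain_attribute(domain_id: str, core_domain: str, labels: dict[str, str]) -> str:
    """Build "DomainID:Label:...:SubLabel:CoreDomainId:CoreDomainLabel"."""
    # Assumed nesting: each 3-character prefix names one ancestor level.
    chain = [domain_id[:end] for end in range(3, len(domain_id) + 1, 3)]
    parts = [domain_id] + [labels[level] for level in chain] + [core_domain]
    return ":".join(parts)


# Hypothetical lookup table; for the duplicated ids 001 to 004 it would have
# to hold the first occurrence from the domains file.
labels = {"001": "Objects", "001003": "Vegetation"}
attribute = domain_attribute("001003", "117:Plant", labels)
```

With the table above, `attribute` reproduces the example value `001003:Objects:Vegetation:117:Plant`.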

Lowfat is falling behind (missing attributes)

The Nodes representation contains attributes that aren't in Lowfat:

English
Mandarin
SDBH

Perhaps others.

<m word="GEN 1:1!1" xml:id="o010010010012" morph="Ncfsa" lang="H" lemma="7225" after=" " pos="noun" type="common" gender="feminine" number="singular" state="absolute" english="beginning" mandarin="起初" SDBH="רֵאשִׁית:002003003004:Begin:beginning">רֵאשִׁית</m>

Consistency check: Textual differences

We have relied on the OSHB text without systematically comparing it to the text found at tanach.us.

We should do a comparison, categorize any differences we find, and discuss them. This should not be a one-off: we need tooling for it, and that tooling should support both:

Providing input to an expert for expert opinion, and
Unit tests

This may include differences in choice of Unicode characters for a particular character, differences in cantillation, etc.

So far, we have relied on consonant-only comparisons, which are good enough to identify corresponding morphemes. We may not know exactly what to do with the things we learn by looking at every character until we see what the differences are and think about them systematically.
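
For reference, a consonant-only comparison of the kind described can be sketched as below. This is not necessarily how the existing comparison was implemented; it simply keeps the Hebrew letters in the range U+05D0 to U+05EA and drops vowel points, cantillation marks, maqaf, and sof pasuq. The function names are hypothetical.

```python
import unicodedata


def consonants_only(text: str) -> str:
    """Keep only Hebrew letters, dropping points, accents, and punctuation."""
    # NFD splits presentation forms such as U+FB2A into base letter plus mark.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if "\u05d0" <= ch <= "\u05ea")


def same_consonants(a: str, b: str) -> bool:
    """True when two strings agree once everything but consonants is removed."""
    return consonants_only(a) == consonants_only(b)
```

A full character-level comparison would instead keep the marks and report exactly which code points differ, which is the part that still needs tooling.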

5. Repopulate Hebrew lowfat with the latest updates:

Repopulate Hebrew lowfat with the latest updates:

Pull in new domain attributes (LexDomain, ContextualDomain, CoreDomain) from node trees
Remove old attributes from lowfat (LexicalDomain, Domain, Extends)
Pull in SIL data (gloss and transliteration) from node trees

This has a high priority. @ryderwishart, could you do this?

Originally posted by @klosoter in https://github.com/Clear-Bible/symphony-team/discussions/75#discussioncomment-3231622

Are glosses for compounds going missing?

This doesn't seem to be in the tree:

010040220061 תּ֣וּבַל קַ֔יִן Tubal Tubal-cain 土八该隐

Add OSIS refs to sentences

We currently have @ID on each Sentence. It would make a lot of sense to simply hard-code the OSIS ref for each verse directly into our data since we use OSIS refs elsewhere.

Additional Data to Port over from Original Trees

Participant referent data:
SubjRef only on verbs with implied subjects; format SubjRef="{010010310021}"
Ref only on nouns, pronouns, or adjectives usually; format Ref="{010010120082}"

LXX mapping:
GreekStrong="1722" Greek="e)poi/hsen"

Semantic Roles:
Frame="{A0:010010310021; A1:010010310041;}"

Word Sense data:
SenseNumber="2" (SenseNumber="0" means that it is a function word on which we did not do word sense disambiguation)

Glosses:
English and Chinese glosses in the full trees cannot be used. Mike has Andi's automatically calculated English and Chinese glosses, mapped for YTB & ClearSuite, that we should be able to use

Object Complements:
There are actually two types of nodes that are currently labeled as O2 (second object). Some have the attribute Label="OC" in the original trees, meaning that the node is an object complement rather than strictly a second object. This Label="OC" data needs to be ported over & used to convert the relevant O2's into OCs.

Vocatives:
Attribute Vocative="True" (ignore all Vocative="False")

For comparison & checking purposes at some point later in the process:
Compare the Strong Number and StrongNumberX work Clear did to OSHB's values (in the vast majority of cases they will be identical, so there is usually no need to add these redundant values, but a comparison may show some places to double-check OSHB).
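
The SubjRef, Ref, and Frame formats listed above could be read with helpers like these. The sketch assumes that each role label appears at most once per Frame and that Ref and SubjRef hold a single id in braces; `parse_frame` and `parse_ref` are hypothetical names.

```python
def parse_ref(value: str) -> str:
    """Strip the braces from a value such as "{010010120082}"."""
    return value.strip().strip("{}").strip()


def parse_frame(frame: str) -> dict[str, str]:
    """Turn "{A0:010010310021; A1:010010310041;}" into a role-to-id mapping."""
    roles: dict[str, str] = {}
    for item in parse_ref(frame).split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty piece after the trailing ";"
        role, _, target = item.partition(":")
        roles[role.strip()] = target.strip()
    return roles
```

If a role turns out to repeat within one Frame, the mapping would need to hold lists of ids instead of single ids.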

Add lemmas to Hebrew nodes trees

We have Strong's numbers as @lemma, and we should be pulling in the actual lemmas.

Jonathan mentioned something about using Strong's data from our OSHB data.

Add phonetic spelling. From which resource?

@jonathanrobie mentioned a while ago that we might also want to add phonetic spelling.

Does anyone know some good resources?

Consistency check: syntax and morphology

This is more of an epic than an individual issue.

When merging existing syntax trees with morphology, we have probably created instances where the morphological interpretation does not agree with the syntactic interpretation. We need to do a survey of instances where the OSHB morphological interpretation disagrees with the interpretation used to build the original Groves trees by comparing the two morphologies. Most of these differences will probably fall into categories that we can enumerate. This will include both differences in interpretation and clear bugs.

This task involves:

Comparing morphologies and creating a list of differences
As far as possible, automatically categorizing these differences into categories
Creating a representation of these differences that is easy for a Hebrew expert to assess efficiently and make decisions.
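
As a starting point, the first two steps might look like the sketch below. It assumes the two morphologies have already been aligned by morph id and reduced to one comparable label per morph (for example, part of speech). The input dictionaries and the function name are hypothetical.

```python
from collections import defaultdict


def categorize_differences(
    oshb: dict[str, str], groves: dict[str, str]
) -> dict[tuple[str, str], list[str]]:
    """Group disagreeing morph ids by their (OSHB label, Groves label) pair."""
    categories: dict[tuple[str, str], list[str]] = defaultdict(list)
    # Only ids present in both sources can be compared.
    for morph_id in sorted(oshb.keys() & groves.keys()):
        if oshb[morph_id] != groves[morph_id]:
            categories[(oshb[morph_id], groves[morph_id])].append(morph_id)
    return dict(categories)
```

Sorting the resulting categories by size would let a Hebrew expert review the most frequent kinds of disagreement first.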

Trees:

<Node Cat="noun" morphId="010040220061" Unicode="תּ֣וּבַל קַ֔יִן" nodeId="0100402200610010" StrongNumberX="8423">
  <c english="Tubal-cain" mandarin="土八该隐" SDBH="תּוּבַל קַיִן:003001007:Names of People:Tubal-Cain">
    <m word="GEN 4:22!6" xml:id="o010040220061" lang="H" after=" " lemma="8423+" morph="Np" pos="noun" type="proper">תּ֣וּבַל</m>
    <m word="GEN 4:22!7" xml:id="o010040220071" lang="H" after=" " lemma="8423" morph="Np" pos="noun" type="proper">קַ֔יִן</m>
  </c>
</Node>

Lowfat:

<wg class="compound">
  <w ref="GEN 4:22!6" role="noun" xml:id="o010040220061" StrongNumberX="8423" Cat="noun" Unicode="תּ֣וּבַל קַ֔יִן" morph="Np" lang="H" lemma="8423+" pos="noun" after=" ">תּ֣וּבַל</w>
  <w ref="GEN 4:22!7" role="noun" xml:id="o010040220071" StrongNumberX="8423" Cat="noun" Unicode="תּ֣וּבַל קַ֔יִן" morph="Np" lang="H" lemma="8423" pos="noun" after=" ">קַ֔יִן</w>
</wg>

I think if we want to keep them, it makes sense to put them on wg/w like this:

<wg class="compound" english="Tubal-cain" mandarin="土八该隐" SDBH="תּוּבַל קַיִן:003001007:Names of People:Tubal-Cain">
  <w ref="GEN 4:22!6" role="noun" xml:id="o010040220061" StrongNumberX="8423" Cat="noun" Unicode="תּ֣וּבַל קַ֔יִן" morph="Np" lang="H" lemma="8423+" pos="noun" after=" ">תּ֣וּבַל</w>
  <w ref="GEN 4:22!7" role="noun" xml:id="o010040220071" StrongNumberX="8423" Cat="noun" Unicode="תּ֣וּבַל קַ֔יִן" morph="Np" lang="H" lemma="8423" pos="noun" after=" ">קַ֔יִן</w>
</wg>