However, these mappings are not perfectly one-to-one. Generally, there are three cases:
1. More than one SILId for one maculaId (morph level, 2 cases)
Use
for $node in //morph[@SILId => contains(";")]
return $node/..
to find all words which have morphemes containing more than one SILId (this does not occur at the word level):
<word maculaText="בָּארוּמָ֑ה" maculaId="07009041003" SILText="בָּארוּמָ֑ה" SILId="07009041003" SILGlosses="at.(the).Arumah" SILTransliteration="bāʾrûmâ">
<morph maculaText="בָּ" maculaId="070090410031" SILText="בָּ" SILId="070090410031" SILMorphology="Pp" SILTransliteration="bā"/>
<morph maculaText="ארוּמָ֑ה" maculaId="070090410032" SILText="|ארוּמָ֑ה" SILId="070090410032;070090410033" SILMorphology="Pa|np" SILTransliteration="–|ʾrûmāh"/>
</word>
<word maculaText="הָרֹאֶ֖ה" maculaId="13002052007" SILText="הָרֹאֶ֖ה" SILId="13002052006" SILGlosses="Haroeh" SILTransliteration="hārōʾeh">
<morph maculaText="הָרֹאֶ֖ה" maculaId="130020520071" SILText="הָ|רֹאֶ֖ה" SILId="130020520061;130020520062" SILMorphology="Pa|ncmsa" SILTransliteration="hā|rōʾeh"/>
</word>
2. More than one maculaId for one SILId (word level, 1003 cases)
Use
for $node in //word[@maculaId => contains(";")]
return $node
to find all words that have more than one maculaId for one SILId:
<word maculaText="עַל|כֵּן֙" maculaId="01002024001;01002024002" SILText="עַל־כֵּן֙" SILId="01002024001" SILGlosses="therefore" SILTransliteration="ʿal-kēn">
<morph maculaText="עַל|כֵּן֙" maculaId="010020240011;010020240021" SILText="עַל־כֵּן֙" SILId="010020240011" SILMorphology="Pd" SILTransliteration="ʿal-kēn"/>
</word>
<word maculaText="תּ֣וּבַל|קַ֔יִן" maculaId="01004022006;01004022007" SILText="תּ֣וּבַל קַ֔יִן" SILId="01004022006" SILGlosses="Tubal-Cain" SILTransliteration="tûḇal qayin">
<morph maculaText="תּ֣וּבַל|קַ֔יִן" maculaId="010040220061;010040220071" SILText="תּ֣וּבַל קַ֔יִן" SILId="010040220061" SILMorphology="np" SILTransliteration="tûḇal qayin"/>
</word>
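For downstream processing, a multi-valued attribute can be expanded back into its parts. The Python sketch below is only an illustration: it assumes, based solely on the examples above, that IDs are joined with ";" and the matching text segments with "|", and that both lists have the same length. The helper name split_macula is hypothetical and not part of the mapping tooling.

```python
import xml.etree.ElementTree as ET

# Word element adapted from the example above (morph child omitted).
SAMPLE = '''<word maculaText="עַל|כֵּן֙" maculaId="01002024001;01002024002" SILText="עַל־כֵּן֙" SILId="01002024001" SILGlosses="therefore" SILTransliteration="ʿal-kēn"/>'''


def split_macula(element):
    """Pair each Macula ID with its text segment.

    Assumption (inferred from the examples, not from a spec): IDs are
    separated by ';' and the matching text segments by '|'.
    """
    ids = element.get("maculaId", "").split(";")
    texts = element.get("maculaText", "").split("|")
    if len(ids) != len(texts):
        raise ValueError(f"ID/text count mismatch for SILId {element.get('SILId')}")
    return list(zip(ids, texts))


word = ET.fromstring(SAMPLE)
pairs = split_macula(word)
for macula_id, text in pairs:
    # Each Macula word points at the same single SIL word.
    print(macula_id, text, "->", word.get("SILId"))
```

The same helper works on morph elements, since they carry the same two attributes.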
3. More than one maculaId for one SILId (morph level, 48,453 cases)
Use
for $node in //morph[@maculaId => contains(";")]
return $node/..
to find all words containing morphs that have more than one maculaId for one SILId:
<word maculaText="לְמִינ֔וֹ" maculaId="01001011013" SILText="לְמִינ֔וֹ" SILId="01001011013" SILGlosses="to.its.kind" SILTransliteration="ləmînô">
<morph maculaText="לְ" maculaId="010010110131" SILText="לְ" SILId="010010110131" SILMorphology="Pp" SILTransliteration="lə"/>
<morph maculaText="מִינ֔|וֹ" maculaId="010010110132;010010110133" SILText="מִינ֔וֹ" SILId="010010110132" SILMorphology="ncmscX3ms" SILTransliteration="mînô"/>
</word>
<word maculaText="זַרְעוֹ" maculaId="01001011015" SILText="זַרְעוֹ־" SILId="01001011015" SILGlosses="its.seed" SILTransliteration="zarʿô-">
<morph maculaText="זַרְע|וֹ" maculaId="010010110151;010010110152" SILText="זַרְעוֹ־" SILId="010010110151" SILMorphology="ncmscX3ms" SILTransliteration="zarʿô-"/>
</word>
Most of these cases involve a suffix (SIL does not split off suffixes). Use
for $node in //morph[@maculaId => contains(";")]
where not(contains($node/@SILMorphology, "X"))
return $node/..
to filter these cases out; the query skips every morph whose SILMorphology contains an X.
The remaining 1041 cases are mostly compounds. Use
for $node in //morph[@maculaId => contains(";")]
where not(contains($node/@SILMorphology, "X"))
where not(contains($node/../@maculaId, ";"))
return $node/..
to filter these out as well; the second condition skips morphs whose parent word already has more than one maculaId.
The remaining 63 cases mostly involve implied articles:
<word maculaText="כָּעֵ֣ת" maculaId="01018010005" SILText="כָּעֵ֣ת" SILId="01018010005" SILGlosses="about.[the].time" SILTransliteration="kāʿēṯ">
<morph maculaText="כָּ" maculaId="010180100051" SILText="ךָּ" SILId="010180100051" SILMorphology="Pp" SILTransliteration="kā"/>
<morph maculaText="|עֵ֣ת" maculaId="010180100051ה;010180100052" SILText="עֵ֣ת" SILId="010180100052" SILMorphology="ncbsa" SILTransliteration="ʿēṯ"/>
</word>
<word maculaText="כָּעֵ֥ת" maculaId="01018014007" SILText="כָּעֵ֥ת" SILId="01018014007" SILGlosses="about.[the].time" SILTransliteration="kāʿēṯ">
<morph maculaText="כָּ" maculaId="010180140071" SILText="ךָּ" SILId="010180140071" SILMorphology="Pp" SILTransliteration="kā"/>
<morph maculaText="|עֵ֥ת" maculaId="010180140071ה;010180140072" SILText="עֵ֥ת" SILId="010180140072" SILMorphology="ncbsa" SILTransliteration="ʿēṯ"/>
</word>
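The three filter steps for case 3 can also be expressed as a single pass that puts each multi-ID morph into a bucket. This is a rough Python sketch, not the method used to produce the counts above: the sample is trimmed from the examples in this thread to the attributes the filters need, the function name classify and the bucket labels are hypothetical, and it counts morphs whereas the queries return parent words.

```python
import xml.etree.ElementTree as ET

# Trimmed copies of three examples from this thread (text attributes omitted).
SAMPLE = """<maculaSilMapping>
<word maculaId="01001011013" SILId="01001011013">
<morph maculaId="010010110131" SILId="010010110131" SILMorphology="Pp"/>
<morph maculaId="010010110132;010010110133" SILId="010010110132" SILMorphology="ncmscX3ms"/>
</word>
<word maculaId="01002024001;01002024002" SILId="01002024001">
<morph maculaId="010020240011;010020240021" SILId="010020240011" SILMorphology="Pd"/>
</word>
<word maculaId="01018010005" SILId="01018010005">
<morph maculaId="010180100051" SILId="010180100051" SILMorphology="Pp"/>
<morph maculaId="010180100051ה;010180100052" SILId="010180100052" SILMorphology="ncbsa"/>
</word>
</maculaSilMapping>"""


def classify(word, morph):
    """Apply the same conditions as the XQuery filters, in the same order."""
    if "X" in morph.get("SILMorphology", ""):
        return "suffix"      # first filter: SILMorphology contains "X"
    if ";" in word.get("maculaId", ""):
        return "compound"    # second filter: parent word has several maculaIds
    return "remaining"       # what is left, e.g. implied articles


root = ET.fromstring(SAMPLE)
counts = {}
for word in root.iter("word"):
    for morph in word.findall("morph"):
        if ";" not in morph.get("maculaId", ""):
            continue  # only morphs with more than one maculaId are of interest
        label = classify(word, morph)
        counts[label] = counts.get(label, 0) + 1

print(counts)
```

Run against the full mapping file, the per-bucket numbers may differ slightly from the word counts quoted above if a word contains more than one matching morph.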
from macula-hebrew.
A mapping file can be found here
The mapping is done at both the word and the morph level, since SIL has some useful attributes that are present on only one of them (e.g., the contextual glosses attribute egs and the morphology coding attribute t).
<maculaSilMapping>
<word maculaText="בְּרֵאשִׁ֖ית" maculaId="01001001001" SILText="בְּרֵאשִׁ֖ית" SILId="01001001001" SILGlosses="in.beginning" SILTransliteration="bərēʾšîṯ">
<morph maculaText="בְּ" maculaId="010010010011" SILText="בְּ" SILId="010010010011" SILMorphology="Pp" SILTransliteration="bə"/>
<morph maculaText="רֵאשִׁ֖ית" maculaId="010010010012" SILText="רֵאשִׁ֖ית" SILId="010010010012" SILMorphology="ncfsa" SILTransliteration="rēʾšiyṯ"/>
</word>
<word maculaText="בָּרָ֣א" maculaId="01001001002" SILText="בָּרָ֣א" SILId="01001001002" SILGlosses="he.created" SILTransliteration="bārāʾ">
<morph maculaText="בָּרָ֣א" maculaId="010010010021" SILText="בָּרָ֣א" SILId="010010010021" SILMorphology="vqp3ms" SILTransliteration="bārāʾ"/>
</word>
<word maculaText="אֱלֹהִ֑ים" maculaId="01001001003" SILText="אֱלֹהִ֑ים" SILId="01001001003" SILGlosses="God" SILTransliteration="ʾĕlōhîm">
<morph maculaText="אֱלֹהִ֑ים" maculaId="010010010031" SILText="אֱלֹהִ֑ים" SILId="010010010031" SILMorphology="ncmpa" SILTransliteration="ʾĕlōhiym"/>
</word>
<word maculaText="אֵ֥ת" maculaId="01001001004" SILText="אֵ֥ת" SILId="01001001004" SILGlosses="(et)" SILTransliteration="ʾēṯ">
<morph maculaText="אֵ֥ת" maculaId="010010010041" SILText="אֵ֥ת" SILId="010010010041" SILMorphology="Po" SILTransliteration="ʾēṯ"/>
</word>
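Because glosses live on the word element and morphology codes on the morph element, a consumer will usually want both in one lookup keyed by Macula morph ID. The following Python sketch shows one way to build such an index. It is an assumption-laden illustration: the sample is trimmed from the excerpt above, the function name build_index and the dictionary keys are hypothetical, and IDs joined with ";" are simply pointed at the same entry.

```python
import xml.etree.ElementTree as ET

# Trimmed from the excerpt above; only the attributes used here are kept.
SAMPLE = """<maculaSilMapping>
<word maculaId="01001001001" SILId="01001001001" SILGlosses="in.beginning">
<morph maculaId="010010010011" SILId="010010010011" SILMorphology="Pp"/>
<morph maculaId="010010010012" SILId="010010010012" SILMorphology="ncfsa"/>
</word>
<word maculaId="01001001002" SILId="01001001002" SILGlosses="he.created">
<morph maculaId="010010010021" SILId="010010010021" SILMorphology="vqp3ms"/>
</word>
</maculaSilMapping>"""


def build_index(root):
    """Map each Macula morph ID to the SIL data of its morph and parent word."""
    index = {}
    for word in root.iter("word"):
        for morph in word.findall("morph"):
            entry = {
                "SILId": morph.get("SILId"),
                "gloss": word.get("SILGlosses"),           # word level only
                "morphology": morph.get("SILMorphology"),  # morph level only
            }
            # Assumption: several Macula IDs for one SIL morph are joined by ';'.
            for macula_id in morph.get("maculaId", "").split(";"):
                index[macula_id] = entry
    return index


index = build_index(ET.fromstring(SAMPLE))
print(index["010010010012"])
```

Note that the gloss stored here is still the whole-word gloss, so both morphs of the first word carry "in.beginning".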
To be done:
- split SIL morph data (e.g., transliteration) into suffixes aligning with Macula
- split the egs attributes (contextual glosses) at the word level into smaller glosses corresponding to morphemes