klosoter commented on August 14, 2024

However, these mappings are not perfectly one-to-one. Generally, there are three cases:

1. More than one SILId for one maculaId (morph level, 2 cases)

Use

for $node in //morph[@SILId => contains(";")]
return $node/..

to find all words which have morphemes containing more than one SILId (this does not occur at the word level):

<word maculaText="בָּארוּמָ֑ה" maculaId="07009041003" SILText="בָּארוּמָ֑ה" SILId="07009041003" SILGlosses="at.(the).Arumah" SILTransliteration="bāʾrûmâ">
  <morph maculaText="בָּ" maculaId="070090410031" SILText="בָּ" SILId="070090410031" SILMorphology="Pp" SILTransliteration=""/>
  <morph maculaText="ארוּמָ֑ה" maculaId="070090410032" SILText="|ארוּמָ֑ה" SILId="070090410032;070090410033" SILMorphology="Pa|np" SILTransliteration="–|ʾrûmāh"/>
</word>
<word maculaText="הָרֹאֶ֖ה" maculaId="13002052007" SILText="הָרֹאֶ֖ה" SILId="13002052006" SILGlosses="Haroeh" SILTransliteration="hārōʾeh">
  <morph maculaText="הָרֹאֶ֖ה" maculaId="130020520071" SILText="הָ|רֹאֶ֖ה" SILId="130020520061;130020520062" SILMorphology="Pa|ncmsa" SILTransliteration="hā|rōʾeh"/>
</word>
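
In these two cases the pipe-separated SIL attributes appear to line up with the semicolon-separated SILIds. A minimal Python sketch of splitting such a morph into one record per SILId follows. The function name `split_morph` and the plain-dict representation are my own assumptions, not part of the mapping file or of any Macula tooling.

```python
# Hypothetical helper: split one <morph> whose SILId holds several IDs
# into one record per SILId. Assumes SILText, SILMorphology and
# SILTransliteration use "|" with the same number of parts as SILId uses ";".
PIPE_FIELDS = ("SILText", "SILMorphology", "SILTransliteration")


def split_morph(attrs):
    ids = attrs["SILId"].split(";")
    columns = {}
    for name in PIPE_FIELDS:
        parts = attrs[name].split("|")
        if len(parts) != len(ids):
            # Refuse to guess when the fields do not line up.
            raise ValueError(f"{name} has {len(parts)} parts for {len(ids)} SILIds")
        columns[name] = parts
    records = []
    for i, sil_id in enumerate(ids):
        record = {"SILId": sil_id}
        for name in PIPE_FIELDS:
            record[name] = columns[name][i]
        records.append(record)
    return records


# Attributes copied from the second example above.
EXAMPLE = {
    "SILText": "הָ|רֹאֶ֖ה",
    "SILId": "130020520061;130020520062",
    "SILMorphology": "Pa|ncmsa",
    "SILTransliteration": "hā|rōʾeh",
}

for record in split_morph(EXAMPLE):
    print(record["SILId"], record["SILMorphology"], record["SILTransliteration"])
```

The sketch raises an error instead of padding when the part counts differ, since only two such morphs exist and both can be checked by hand.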

2. More than one maculaId for one SILId (word level, 1003 cases)

Use

for $node in //word[@maculaId => contains(";")]
return $node

to find all words that have more than one maculaId for one SILId:

<word maculaText="עַל|כֵּן֙" maculaId="01002024001;01002024002" SILText="עַל־כֵּן֙" SILId="01002024001" SILGlosses="therefore" SILTransliteration="ʿal-kēn">
  <morph maculaText="עַל|כֵּן֙" maculaId="010020240011;010020240021" SILText="עַל־כֵּן֙" SILId="010020240011" SILMorphology="Pd" SILTransliteration="ʿal-kēn"/>
</word>
<word maculaText="תּ֣וּבַל|קַ֔יִן" maculaId="01004022006;01004022007" SILText="תּ֣וּבַל קַ֔יִן" SILId="01004022006" SILGlosses="Tubal-Cain" SILTransliteration="tûḇal qayin">
  <morph maculaText="תּ֣וּבַל|קַ֔יִן" maculaId="010040220061;010040220071" SILText="תּ֣וּבַל קַ֔יִן" SILId="010040220061" SILMorphology="np" SILTransliteration="tûḇal qayin"/>
</word>

3. More than one maculaId for one SILId (morph level, 48,453 cases)

Use

for $node in //morph[@maculaId => contains(";")]
return $node/..

to find all words containing morphs that have more than one maculaId for one SILId:

<word maculaText="לְמִינ֔וֹ" maculaId="01001011013" SILText="לְמִינ֔וֹ" SILId="01001011013" SILGlosses="to.its.kind" SILTransliteration="ləmînô">
  <morph maculaText="לְ" maculaId="010010110131" SILText="לְ" SILId="010010110131" SILMorphology="Pp" SILTransliteration=""/>
  <morph maculaText="מִינ֔|וֹ" maculaId="010010110132;010010110133" SILText="מִינ֔וֹ" SILId="010010110132" SILMorphology="ncmscX3ms" SILTransliteration="mînô"/>
</word>
<word maculaText="זַרְעוֹ" maculaId="01001011015" SILText="זַרְעוֹ־" SILId="01001011015" SILGlosses="its.seed" SILTransliteration="zarʿô-">
  <morph maculaText="זַרְע|וֹ" maculaId="010010110151;010010110152" SILText="זַרְעוֹ־" SILId="010010110151" SILMorphology="ncmscX3ms" SILTransliteration="zarʿô-"/>
</word>

These are mostly cases where a suffix is involved (SIL does not split suffixes).
Use

for $node in //morph[@maculaId => contains(";")]
where not(contains($node/@SILMorphology, "X"))
return $node/..

to filter these cases out.

The remaining 1041 cases are mostly compounds. Use

for $node in //morph[@maculaId => contains(";")]
where not(contains($node/@SILMorphology, "X"))
where not(contains($node/../@maculaId, ";"))
return $node/..

to filter these cases out as well.
The remaining 63 cases mostly involve implied articles:

<word maculaText="כָּעֵ֣ת" maculaId="01018010005" SILText="כָּעֵ֣ת" SILId="01018010005" SILGlosses="about.[the].time" SILTransliteration="kāʿēṯ">
  <morph maculaText="כָּ" maculaId="010180100051" SILText="ךָּ" SILId="010180100051" SILMorphology="Pp" SILTransliteration=""/>
  <morph maculaText="|עֵ֣ת" maculaId="010180100051ה;010180100052" SILText="עֵ֣ת" SILId="010180100052" SILMorphology="ncbsa" SILTransliteration="ʿēṯ"/>
</word>
<word maculaText="כָּעֵ֥ת" maculaId="01018014007" SILText="כָּעֵ֥ת" SILId="01018014007" SILGlosses="about.[the].time" SILTransliteration="kāʿēṯ">
  <morph maculaText="כָּ" maculaId="010180140071" SILText="ךָּ" SILId="010180140071" SILMorphology="Pp" SILTransliteration=""/>
  <morph maculaText="|עֵ֥ת" maculaId="010180140071ה;010180140072" SILText="עֵ֥ת" SILId="010180140072" SILMorphology="ncbsa" SILTransliteration="ʿēṯ"/>
</word>
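
The filter cascade for case 3 can also be sketched in Python, for anyone without an XQuery processor at hand. This is only an approximation of the queries above, written with the standard library's `xml.etree.ElementTree`. The function name `classify` and the bucket labels are my own, and I have not checked that it reproduces the counts quoted above on the full file. It counts morphs, not distinct words.

```python
import xml.etree.ElementTree as ET

# Small sample built from the examples in this comment (only the
# attributes the filters look at are kept).
SAMPLE = """
<maculaSilMapping>
  <word maculaId="01001011013" SILId="01001011013">
    <morph maculaId="010010110131" SILId="010010110131" SILMorphology="Pp"/>
    <morph maculaId="010010110132;010010110133" SILId="010010110132" SILMorphology="ncmscX3ms"/>
  </word>
  <word maculaId="01002024001;01002024002" SILId="01002024001">
    <morph maculaId="010020240011;010020240021" SILId="010020240011" SILMorphology="Pd"/>
  </word>
  <word maculaId="01018010005" SILId="01018010005">
    <morph maculaId="010180100051" SILId="010180100051" SILMorphology="Pp"/>
    <morph maculaId="010180100051ה;010180100052" SILId="010180100052" SILMorphology="ncbsa"/>
  </word>
</maculaSilMapping>
"""


def classify(root):
    """Bucket every morph that carries more than one maculaId."""
    counts = {"suffix": 0, "compound": 0, "other": 0}
    for word in root.iter("word"):
        for morph in word.findall("morph"):
            if ";" not in morph.get("maculaId", ""):
                continue  # one maculaId only, not a case-3 morph
            if "X" in morph.get("SILMorphology", ""):
                counts["suffix"] += 1    # same test as the "X" filter above
            elif ";" in word.get("maculaId", ""):
                counts["compound"] += 1  # parent word is also merged
            else:
                counts["other"] += 1     # e.g. implied articles
    return counts


print(classify(ET.fromstring(SAMPLE)))
```

To run it on the real mapping file, replace `ET.fromstring(SAMPLE)` with `ET.parse(path).getroot()`, where `path` points to wherever the file has been saved.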

from macula-hebrew.

klosoter commented on August 14, 2024

A mapping file can be found here

The mapping is done at both the word and the morph level, since SIL has some useful attributes that are present on only one of them (e.g., the contextual glosses, egs, and the morphology coding, t).

<maculaSilMapping>
  <word maculaText="בְּרֵאשִׁ֖ית" maculaId="01001001001" SILText="בְּרֵאשִׁ֖ית" SILId="01001001001" SILGlosses="in.beginning" SILTransliteration="bərēʾšîṯ">
    <morph maculaText="בְּ" maculaId="010010010011" SILText="בְּ" SILId="010010010011" SILMorphology="Pp" SILTransliteration=""/>
    <morph maculaText="רֵאשִׁ֖ית" maculaId="010010010012" SILText="רֵאשִׁ֖ית" SILId="010010010012" SILMorphology="ncfsa" SILTransliteration="rēʾšiyṯ"/>
  </word>
  <word maculaText="בָּרָ֣א" maculaId="01001001002" SILText="בָּרָ֣א" SILId="01001001002" SILGlosses="he.created" SILTransliteration="bārāʾ">
    <morph maculaText="בָּרָ֣א" maculaId="010010010021" SILText="בָּרָ֣א" SILId="010010010021" SILMorphology="vqp3ms" SILTransliteration="bārāʾ"/>
  </word>
  <word maculaText="אֱלֹהִ֑ים" maculaId="01001001003" SILText="אֱלֹהִ֑ים" SILId="01001001003" SILGlosses="God" SILTransliteration="ʾĕlōhîm">
    <morph maculaText="אֱלֹהִ֑ים" maculaId="010010010031" SILText="אֱלֹהִ֑ים" SILId="010010010031" SILMorphology="ncmpa" SILTransliteration="ʾĕlōhiym"/>
  </word>
  <word maculaText="אֵ֥ת" maculaId="01001001004" SILText="אֵ֥ת" SILId="01001001004" SILGlosses="(et)" SILTransliteration="ʾēṯ">
    <morph maculaText="אֵ֥ת" maculaId="010010010041" SILText="אֵ֥ת" SILId="010010010041" SILMorphology="Po" SILTransliteration="ʾēṯ"/>
  </word>
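
As a rough illustration of how the mapping could be consumed, here is a minimal Python sketch that builds a lookup from each maculaId to its SILId(s). The function name `build_lookup` is hypothetical, and the inline sample keeps only the ID attributes from the excerpt above. Semicolon-separated values are split so that the lists from the non-one-to-one cases are handled as well.

```python
import xml.etree.ElementTree as ET

# Two words from the excerpt above, reduced to their ID attributes.
SAMPLE = """
<maculaSilMapping>
  <word maculaId="01001001001" SILId="01001001001" SILGlosses="in.beginning">
    <morph maculaId="010010010011" SILId="010010010011" SILMorphology="Pp"/>
    <morph maculaId="010010010012" SILId="010010010012" SILMorphology="ncfsa"/>
  </word>
  <word maculaId="01001001002" SILId="01001001002" SILGlosses="he.created">
    <morph maculaId="010010010021" SILId="010010010021" SILMorphology="vqp3ms"/>
  </word>
</maculaSilMapping>
"""


def build_lookup(root, tag):
    """Map every maculaId on <tag> elements to the list of SILIds it is paired with."""
    lookup = {}
    for element in root.iter(tag):
        sil_ids = element.get("SILId", "").split(";")
        for macula_id in element.get("maculaId", "").split(";"):
            if macula_id:
                lookup[macula_id] = sil_ids
    return lookup


root = ET.fromstring(SAMPLE)
words = build_lookup(root, "word")
morphs = build_lookup(root, "morph")
print(words["01001001002"], morphs["010010010012"])
```

Keeping the word and morph lookups separate mirrors the two levels of the file.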


klosoter commented on August 14, 2024

To be done:

  • split SIL morph data (e.g., transliteration) into suffixes aligning with Macula
  • split the egs attributes (contextual glosses) at the word level into smaller glosses corresponding to morphemes
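
For the second item, a naive first pass might split the gloss on "." and accept the result only when the number of pieces equals the number of Macula morphemes. The Python sketch below is only that heuristic, with a hypothetical function name, and not a proposed solution. Even when the counts match, the order can differ: "to.its.kind" has three pieces for three morphemes, but the suffix comes last in Hebrew and second in the gloss.

```python
def split_gloss(gloss, n_morphs):
    """Return the dot-separated gloss pieces, or None if the count does not match.

    Naive heuristic: a matching count does not guarantee matching order.
    """
    pieces = gloss.split(".")
    return pieces if len(pieces) == n_morphs else None


print(split_gloss("in.beginning", 2))  # two morphemes, two pieces
print(split_gloss("he.created", 1))    # one morpheme, two pieces
```

Cases that return None would still need manual alignment or a smarter rule.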
