
charlesloder / havarotjs


A Typescript package for getting syllabic data about Hebrew text with niqqud.

Home Page: https://www.npmjs.com/package/havarotjs

License: MIT License

JavaScript 2.76% TypeScript 97.16% Shell 0.07%

havarotjs's People

Contributors

charlesloder, dependabot[bot], ighmaz, m-yac, ryuusama09

havarotjs's Issues

Update npm homepage info

havarotjs/package.json

Lines 21 to 28 in 6ca463a

"repository": {
"type": "git",
"url": "https://github.com/charlesLoder/havarot.git"
},
"bugs": {
"url": "https://github.com/charlesLoder/havarot/issues"
},
"homepage": "https://github.com/charlesLoder/havarot",

Holem waw with final aleph

The fix in #17 caused an error where a word with a final aleph (e.g. ס֣וֹא) would lose the aleph. This is a non-standard Hebrew spelling.

Add `vowelName` property to `Cluster`

Similar to #74, a property called vowelName should exist that returns the Unicode character name of the vowel.

new Text("בְּאֶ֣רֶץ").clusters.map(c => c.vowelName)
// ["SHEVA", "SEGOL", "SEGOL", null]

Things to consider:

  • is SHEVA a "vowel"? See especially hasShewa property
  • how should names be formatted? Leaning towards replacing spaces with underscores
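A minimal sketch of what such a lookup could look like, assuming names are taken from the Unicode character names with the "HEBREW POINT" prefix dropped and spaces replaced by underscores, and assuming SHEVA is treated as a vowel (as in the expected output above). The function `vowelName` and the table `VOWEL_NAMES` are hypothetical and operate on a cluster's text, not on the real Cluster class.

```typescript
// Hypothetical lookup table: Hebrew vowel points mapped to their Unicode names
// (prefix "HEBREW POINT" dropped, spaces replaced with underscores).
const VOWEL_NAMES: Record<string, string> = {
  "\u05B0": "SHEVA",
  "\u05B1": "HATAF_SEGOL",
  "\u05B2": "HATAF_PATAH",
  "\u05B3": "HATAF_QAMATS",
  "\u05B4": "HIRIQ",
  "\u05B5": "TSERE",
  "\u05B6": "SEGOL",
  "\u05B7": "PATAH",
  "\u05B8": "QAMATS",
  "\u05B9": "HOLAM",
  "\u05BA": "HOLAM_HASER_FOR_VAV",
  "\u05BB": "QUBUTS",
  "\u05C7": "QAMATS_QATAN",
};

// Returns the name of the first vowel point in the cluster text, or null.
function vowelName(clusterText: string): string | null {
  const match = clusterText.match(/[\u05B0-\u05BB\u05C7]/);
  return match ? VOWEL_NAMES[match[0]] : null;
}

// The four clusters of בְּאֶ֣רֶץ, written with escapes to make the points explicit.
const clusters = ["\u05D1\u05B0\u05BC", "\u05D0\u05B6\u05A3", "\u05E8\u05B6", "\u05E5"];
console.log(clusters.map(vowelName));
// ["SHEVA", "SEGOL", "SEGOL", null]
```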

Improved Metheg/Siluq Distinction

The Cluster.hasMetheg property needs to better distinguish between the use of U+05BD as a metheg and as a siluq.

  1. check if the Cluster even has a metheg. If no, return false
  2. If yes, loop over the text of Clusters via this.next.
    a. check if a sof pasuq is present. If yes, then the metheg is really a siluq. Return false
    b. check if another Cluster has a metheg. If yes, then the second metheg is the siluq, and the current one is a metheg. Return true
    c. if no sof pasuq or additional metheg is found, then return true

This logic will have to be tweaked a bit
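The steps above can be sketched roughly as follows. This is only an illustration of the proposed loop, not the library's code: `ClusterLike` is a hypothetical stand-in for Cluster with just `text` and `next`, and the sketch assumes the sof pasuq (U+05C3) appears in the text of some later cluster.

```typescript
// Hypothetical minimal shape; the real Cluster class has more to it.
interface ClusterLike {
  text: string;
  next: ClusterLike | null;
}

const METHEG = /\u05BD/;
const SOF_PASUQ = /\u05C3/;

function hasMetheg(cluster: ClusterLike): boolean {
  // 1. no U+05BD at all
  if (!METHEG.test(cluster.text)) return false;
  // 2. walk the following clusters
  let next = cluster.next;
  while (next !== null) {
    // a. a sof pasuq comes first, so the mark is really a siluq
    if (SOF_PASUQ.test(next.text)) return false;
    // b. a later U+05BD comes first, so that one is the siluq
    if (METHEG.test(next.text)) return true;
    next = next.next;
  }
  // c. neither found
  return true;
}

// Helper for the example: link an array of cluster texts via `next`.
function link(texts: string[]): ClusterLike[] {
  const clusters: ClusterLike[] = texts.map((text) => ({ text, next: null }));
  clusters.forEach((c, i) => {
    c.next = clusters[i + 1] ?? null;
  });
  return clusters;
}

// Four clusters: metheg, none, siluq, sof pasuq.
const verse = link(["\u05D5\u05B7\u05BD", "\u05D9\u05B0", "\u05D4\u05B4\u05BD", "\u05D9\u05C3"]);
console.log(verse.map(hasMetheg)); // [true, false, false, false]
```

One case this sketch does not cover is a cluster that carries both U+05BD and the sof pasuq itself, which is the kind of tweak the logic would still need.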

Single shureq failing

A single shureq וּ fails with the error:

TypeError: Cannot read properties of undefined (reading 'hasTaamim')
    at /Users/charlesloder/Documents/code/personal/havarot/dist/utils/syllabifier.js:297:70
    at Array.filter (<anonymous>)
    at setIsAccented (/Users/charlesloder/Documents/code/personal/havarot/dist/utils/syllabifier.js:297:42)
    at /Users/charlesloder/Documents/code/personal/havarot/dist/utils/syllabifier.js:347:37
    at Array.forEach (<anonymous>)
    at syllabify (/Users/charlesloder/Documents/code/personal/havarot/dist/utils/syllabifier.js:347:15)
    at get syllables [as syllables] (/Users/charlesloder/Documents/code/personal/havarot/dist/word.js:67:44)
    at /Users/charlesloder/Documents/code/personal/havarot/dist/text.js:146:46
    at Array.map (<anonymous>)
    at get syllables [as syllables] (/Users/charlesloder/Documents/code/personal/havarot/dist/text.js:146:27)

The error is caused by

https://github.com/charlesLoder/havarot/blob/a824a06690b2b823f37c555aa734088ce27904e7/src/utils/syllabifier.ts#L39-L50

Need to check if arr[i] exists

Vav Holam Shifts

Hello Charles,

Thank you so much for your continued work on this fantastic library. I'm experiencing an issue with a Vav Holam shifting to the previous letter after initializing a Text object. Any suggestions on what may be going wrong? I'm using version 0.7.2 with Node for reference.

Word passed to Text() = א֑וֹר
Syllable returned = אֹ֑ור

Thank you!!

Divine name dropping latin char after

See here

The Latin character after the Divine Name is dropped

Hebrew

כִּי אִם בְּתוֹרַת יְהוָה, חֶפְצוֹ; וּבְתוֹרָתוֹ יֶהְגֶּה, יוֹמָם וָלָיְלָה

Transliteration

kî ʾim bǝtôrat yhwh ḥepṣô; ûbǝtôrātô yehgê, yômām wālāyǝlâ

Paseq Should Not Be Included

The paseq is included when words are split (e.g. "עֵֽדֹתֶ֨יךָ ׀ "). Because the paseq is a word divider, it should not be included. Additionally, it interferes with adding accents to syllables; in the example above, it causes the final kaf to be marked as accented.

The paseq should be counted as its own word with a single syllable similar to how non-Hebrew words are handled.

Syllable should include more linguistic data

Though the Syllable has useful properties, it should have linguistic properties of syllables as well. Suggestions:

Syllable.onset: string | null
Syllable.nucleus: string
Syllable.coda: string | null

Syllable.onset

The overwhelming majority of Hebrew syllables have an onset. Though the aleph or ayin may not be considered an onset in Modern Hebrew, they were in Biblical, and orthographically function like an onset.

The only syllable that won't have an onset is a word-initial shureq (e.g. וּמֶלֶךְ [u. 'mε. lεk])

In Biblical Hebrew, there are no medial consonants in the onset; that is, there are no consonant clusters (i.e. CCV or CCVC types). The only exception is for the numeral שְׁתַּיִם and its various forms.

Syllable.nucleus

Every syllable must have a nucleus (i.e. vowel). A vocal shewa counts as a nucleus.

Syllable.coda

A coda is optional. A final qamets-he or qamets-aleph would not count as a coda; these would be of the syllable type CV, but a he with a mappiq would be a coda, making the syllable type CVC.
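As an illustration of the suggested shape, here is a small sketch. `SyllableStructure` and `syllableType` are hypothetical names, and placing a mater (such as final he) inside the nucleus string is an assumption made for the example, not something the issue settles.

```typescript
// Hypothetical shape mirroring the suggested Syllable properties.
interface SyllableStructure {
  onset: string | null;
  nucleus: string;
  coda: string | null;
}

// Derives the syllable type (V, CV, CVC, ...) from which parts are present.
function syllableType(s: SyllableStructure): string {
  const onset = s.onset === null ? "" : "C";
  const coda = s.coda === null ? "" : "C";
  return `${onset}V${coda}`;
}

// וּמֶלֶךְ split as [u. 'mε. lεk]
const umelek: SyllableStructure[] = [
  { onset: null, nucleus: "\u05D5\u05BC", coda: null }, // word-initial shureq
  { onset: "\u05DE", nucleus: "\u05B6", coda: null }, // mem + segol
  { onset: "\u05DC", nucleus: "\u05B6", coda: "\u05DA\u05B0" }, // lamed + segol + final kaf
];
console.log(umelek.map(syllableType)); // ["V", "CV", "CVC"]

// Final qamets-he (no coda) versus he with mappiq (coda).
const qametsHe: SyllableStructure = { onset: "\u05DB\u05BC", nucleus: "\u05B8\u05D4", coda: null };
const heMappiq: SyllableStructure = { onset: "\u05DC", nucleus: "\u05B8", coda: "\u05D4\u05BC" };
console.log(syllableType(qametsHe), syllableType(heMappiq)); // "CV" "CVC"
```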

Aleph-Shureq Failing

In the word יִירָא֥וּךָ the aleph was being parsed as a quiesced aleph (i.e. the syllable as רָא֥) instead of as a consonant (i.e. as א֥וּ)

Shewa and Shureq Not Syllabifying Correctly

Words in the form of CǝCûC are being syllabified as 1 syllable, when they should be two syllables.

Words in the form of CǝCû (w/o the final consonant) are correctly syllabified as 2 syllables, and so are words of the form CǝCVC.

There is likely a problem in the groupFinal logic.

Sanitize Holem-Waw Orthography

There are two ways to write a holem-waw:

Pattern Word
(1) consonant + holem + waw שָׁלֹום
(2) consonant + waw + holem שָׁלוֹם

Additionally, instead of a holem (U+05B9), a holem haser for vav (U+05BA) can also be used for typographic reasons, meaning there are four possible patterns for encoding a holem-vav.

Pattern (1) is preferred because:

  • it is semantically correct—the vowel belongs to the consonant, the waw is simply an orthographic marker
  • it reduces confusion for when a waw is being used as a consonant with a holem as its vowel (e.g. עָוֹן)

Because the holem haser for vav (U+05BA) is primarily used for typographic reasons, it will be best to convert all occurrences of U+05BA to U+05B9.

In order to semantically encode a holem-waw, all occurrences in each word of:

a waw preceding a holem, but no vowel preceding the waw will be swapped so that the holem precedes the waw.

Examples:

  • For שָׁלוֹם, since the waw precedes the holem, but no vowel precedes the waw, the holem and waw would be switched so that it becomes שָׁלֹום.
  • For עָוֹן, since a vowel precedes the waw, they would not be switched

Because taamim can occur before a waw but do not need to, the taamim will be removed, the characters swapped, and then the strings rebuilt, as in the qametsQatan sanitation.
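A rough sketch of the sanitation described above, under some assumptions: `sanitizeHolemWaw` is a hypothetical name, and instead of stripping and rebuilding the taamim it captures them in the regex and re-emits them around the swapped characters. A sheva before the waw counts as a vowel here, so such words are left untouched.

```typescript
const HOLEM = "\u05B9";
const WAW = "\u05D5";

// A consonant, its non-vowel marks (taamim, dagesh, metheg, rafe, shin/sin dot),
// then a waw, any taamim on the waw, and a holem.
// A vowel between the consonant and the waw prevents a match.
const WAW_HOLEM =
  /([\u05D0-\u05EA])([\u0591-\u05AF\u05BC\u05BD\u05BF\u05C1\u05C2]*)\u05D5([\u0591-\u05AF\u05BD]*)\u05B9/g;

function sanitizeHolemWaw(word: string): string {
  return word
    // holem haser for vav (U+05BA) becomes a plain holem (U+05B9)
    .replace(/\u05BA/g, HOLEM)
    // move the holem onto the preceding consonant
    .replace(
      WAW_HOLEM,
      (_match: string, consonant: string, marks: string, taamim: string) =>
        `${consonant}${HOLEM}${marks}${WAW}${taamim}`
    );
}

// שָׁלוֹם: no vowel before the waw, so the holem and waw are swapped
const shalom = sanitizeHolemWaw("\u05E9\u05B8\u05C1\u05DC\u05D5\u05B9\u05DD");
// עָוֹן: a qamets precedes the waw, so nothing changes
const avon = sanitizeHolemWaw("\u05E2\u05B8\u05D5\u05B9\u05DF");
console.log(shalom === "\u05E9\u05B8\u05C1\u05DC\u05B9\u05D5\u05DD", avon === "\u05E2\u05B8\u05D5\u05B9\u05DF");
```

Because each waw is checked against its own preceding consonant, a word with both kinds of waw (e.g. עֲוֹנוֹתֵינוּ) only has the second one swapped in this sketch.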

Mater text is reversed

On v0.1.2, example:

const str = "מַשִׁיחַ";
const doc = new Text(str);
const res = doc.syllables.map((el) => el.text);
[
  "מַ", // \u{5DE}\u{5B7} (mem, patach) 
  "ישִׁ",// \u{5D9}\u{5E9}\u{5C1}\u{5B4} (yod, shin, shin-dot, hiriq)
  "חַ" // \u{5D7}\u{5B7} (chet, patach)
]

The yod should be after the shin cluster.

Something is wrong in the new syllabifier.ts logic. Need to add tests.

Maybe `isVocalSheva` property on `Cluster` only

On the Cluster object, add a property called isVocalSheva that returns a boolean indicating whether the shewa is vocal or not.

new Text("בְּאֶ֣רֶץ").clusters.map(c => c.isVocalSheva)
// [true, false, false, false]

Things to consider:

  • Perhaps null if there is no shewa?
  • vocal shewa or shewa na?

Seghol Yod not counted as Mater

The combination of Seghol-Yod (e.g. יךָ◌ֶ) is not counted as a mater.

This is used as a mater in Biblical Hebrew commonly.

Add more forms of כל

There are a lot of variations on the form כל (e.g. וְכָל). Most issues are going to occur in non-Biblical texts where the use of the maqqef is less common.

I will need to identify these — hopefully systematically.

Error thrown with Divine Name

The Divine Name יְהוָה causes the error "Syllable should not precede a Cluster with a Mater".

This wasn't anticipated as the Divine Name does not follow typical rules.

The name can be written two ways:

  1. as יְהֹוָה with a holem, which produces no error
  2. as יְהוָה w/o a holem, which produces an error but is more typical.

Perhaps add a property Word.isDivineName?

Shureq on Alef and preceded by Syllable with Shewa throws error

This text:

וְאֵ֗לֶּה שְׁמוֹת֙ בְּנֵ֣י יִשְׂרָאֵ֔ל הַבָּאִ֖ים מִצְרָ֑יְמָה אֵ֣ת יַעֲקֹ֔ב אִ֥ישׁ וּבֵית֖וֹ בָּֽאוּ׃ ברְאוּבֵ֣ן שִׁמְע֔וֹן לֵוִ֖י וִיהוּדָֽה׃ גיִשָּׂשכָ֥ר זְבוּלֻ֖ן וּבִנְיָמִֽן׃ דדָּ֥ן וְנַפְתָּלִ֖י גָּ֥ד וְאָשֵֽׁר׃ הוַֽיְהִ֗י כׇּל־נֶ֛פֶשׁ יֹצְאֵ֥י יֶֽרֶךְ־יַעֲקֹ֖ב שִׁבְעִ֣ים נָ֑פֶשׁ וְיוֹסֵ֖ף הָיָ֥ה בְמִצְרָֽיִם׃ ווַיָּ֤מׇת יוֹסֵף֙ וְכׇל־אֶחָ֔יו וְכֹ֖ל הַדּ֥וֹר הַהֽוּא׃ זוּבְנֵ֣י יִשְׂרָאֵ֗ל פָּר֧וּ וַֽיִּשְׁרְצ֛וּ וַיִּרְבּ֥וּ וַיַּֽעַצְמ֖וּ בִּמְאֹ֣ד מְאֹ֑ד וַתִּמָּלֵ֥א הָאָ֖רֶץ אֹתָֽם׃ {פ} חוַיָּ֥קׇם מֶֽלֶךְ־חָדָ֖שׁ עַל־מִצְרָ֑יִם אֲשֶׁ֥ר לֹֽא־יָדַ֖ע אֶת־יוֹסֵֽף׃ טוַיֹּ֖אמֶר אֶל־עַמּ֑וֹ הִנֵּ֗ה עַ֚ם בְּנֵ֣י יִשְׂרָאֵ֔ל רַ֥ב וְעָצ֖וּם מִמֶּֽנּוּ׃ יהָ֥בָה נִֽתְחַכְּמָ֖ה ל֑וֹ פֶּן־יִרְבֶּ֗ה וְהָיָ֞ה כִּֽי־תִקְרֶ֤אנָה מִלְחָמָה֙ וְנוֹסַ֤ף גַּם־הוּא֙ עַל־שֹׂ֣נְאֵ֔ינוּ וְנִלְחַם־בָּ֖נוּ וְעָלָ֥ה מִן־הָאָֽרֶץ׃

throws the error:

Error: Syllable should not precede a Cluster with a Mater

Figure out why

Changelog guard

I keep forgetting to update the changelog. Create some guard to ensure that it's updated. Maybe even a simple Y/n prompt on the command line.

Loss of Dagesh Chazaq after Article and Interrogative

Acc. to GKC §20m, there are instances after the article and the interrogative מה where the dagesh chazaq (or forte) is omitted:

Very frequently in certain consonants with Šewâ mobile, since the absence of a strong vowel causes the strengthening to be less noticeable. This occurs principally in the case of ו and י (on יְ and יֵּ after the article, see § 35 b; on יְּ after מַה־, § 37 b); and in the sonants מ‍,[6] נ‍ and ל; also in the sibilants, especially when a guttural follows (but note Is 629, מְאַסְפָיו, as ed. Mant. and Ginsb. correctly read, while Baer has מְאָֽסְ׳ with compensatory lengthening, and others even מְאָסְ׳; מִשְׁמַנֵּי Gn 2728, 39; מִשְׁלשׁ 38:24 for מִשְּׁ׳, הַֽשְׁלַבִּים 1 K 728; אֶֽשְֽׁקָה־ 1 K 1920 from נָשַׁק, הַֽשְׁפַתַּ֫יִם Ez 4043 and לַֽשְׁפַנִּים ψ 10418; מִשְׁתֵּים Jon 411, הַֽצְפַרְדְּעִים Ex 81 &c.);—and finally in the emphatic ק.[7]

Of the Begadkephath letters, ב occurs without Dageš in מִבְצִיר Ju 82; ג in מִגְבֽוּרָתָם Ez 3230; ד in נִדְחֵי Is 1112 56:8, ψ 1472 (not in Jer 4936), supposing that it is the Participle Niphʿal of נָדַח; lastly, ת in תִּתְצוּ Is 2210. Examples, עִוְרִים, וַיְהִי (so always the preformative יְ in the imperf. of verbs), מִלְמַ֫עְלָה, לַֽמְנַצֵּחַ, הִנְנִי, הַֽלֲלוּ, מִלְאוּ, כִּסְאִי, יִשְׂאוּ, יִקְחוּ, מַקְלוֹת, מִקְצֵה, &c. In correct MSS. the omission of the Dageš is indicated by the Rāphè stroke (§ 14) over the consonant. However, in these cases, we must assume at least a virtual strengthening of the consonant (Dageš forte implicitum, see § 22 c, end).

The second paragraph is likely beyond the scope of this package.

The first paragraph has three categories for when a dagesh chazaq may be lost, but the shewa should still be counted as a shewa naʿ (or shewa mobile/vocal):

  1. in the case of ו and י (on יְ and יֵּ after the article, see § 35 b; on יְּ after מַה־, § 37 b)
  2. in the sonants מ‍, נ‍ and ל
  3. in the sibilants, especially when a guttural follows

The Article

Waltke & O'Connor §13.3d give a simplified explanation:

image

According to this, the shewa is a shewa nach not a shewa na' seemingly contra GKC.

The Interrogative

GKC's references are ambiguous


see charlesLoder/hebrew-transliteration#14

...wip


Goal

In the forms with a metheg there is nothing to check. For the others, something like:

  • if cluster.hasShewa and /י/.test(cluster.text) and /הַ/.test(cluster.prev.text)

should syllabify as:

["מִן־", "הַ", "יְ", "אֹ֗ר"]

That would limit it only to the article, but it would be a start.
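The condition above could be sketched like this. `ClusterLike` is a hypothetical stand-in exposing only the three members the check uses, and `isYodShewaAfterArticle` is an illustrative name; as noted, the check only covers a yod after the article.

```typescript
// Hypothetical minimal shape with only what the check needs.
interface ClusterLike {
  text: string;
  hasShewa: boolean;
  prev: ClusterLike | null;
}

// True when a yod with shewa directly follows he + patah (the article),
// in which case the yod should start its own syllable.
function isYodShewaAfterArticle(cluster: ClusterLike): boolean {
  return (
    cluster.hasShewa &&
    /\u05D9/.test(cluster.text) &&
    cluster.prev !== null &&
    /\u05D4\u05B7/.test(cluster.prev.text)
  );
}

// הַ + יְ, as in הַיְאֹר
const article: ClusterLike = { text: "\u05D4\u05B7", hasShewa: false, prev: null };
const yod: ClusterLike = { text: "\u05D9\u05B0", hasShewa: true, prev: article };

// וַ + יְ, as in וַיְהִי: not the article, so this check does not apply
const waw: ClusterLike = { text: "\u05D5\u05B7", hasShewa: false, prev: null };
const yodAfterWaw: ClusterLike = { text: "\u05D9\u05B0", hasShewa: true, prev: waw };

console.log(isYodShewaAfterArticle(yod), isYodShewaAfterArticle(yodAfterWaw)); // true false
```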

Add premade syllabification schemas

The way it checks for a schema and then sets options according to that isn't intuitive.

Instead, create premade syllabification schemas that can just be imported

Add `hasVowel` property to `Cluster`

In the same vein as #74 and #75, create a property (maybe method is the better term here) called hasVowel that takes a vowel name and returns a boolean

new Text("בְּאֶ֣רֶץ").clusters.map(c => c.hasVowel("SEGOL"));
// [false, true, true, false]
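A standalone sketch of the idea, working on plain strings, not on the real Cluster class. `VOWELS`, `splitClusters`, and `hasVowel` are hypothetical names, and the cluster split is a deliberate simplification (a Hebrew letter followed by any marks).

```typescript
// Hypothetical name-to-character table for the Hebrew vowel points.
const VOWELS: Record<string, string> = {
  SHEVA: "\u05B0",
  HATAF_SEGOL: "\u05B1",
  HATAF_PATAH: "\u05B2",
  HATAF_QAMATS: "\u05B3",
  HIRIQ: "\u05B4",
  TSERE: "\u05B5",
  SEGOL: "\u05B6",
  PATAH: "\u05B7",
  QAMATS: "\u05B8",
  HOLAM: "\u05B9",
  QUBUTS: "\u05BB",
};

// Simplified split: each Hebrew letter plus the marks that follow it.
function splitClusters(text: string): string[] {
  return text.match(/[\u05D0-\u05EA][\u0591-\u05C7]*/g) ?? [];
}

function hasVowel(clusterText: string, name: string): boolean {
  const vowel = VOWELS[name];
  if (vowel === undefined) {
    throw new Error(`Unknown vowel name: ${name}`);
  }
  return clusterText.indexOf(vowel) !== -1;
}

// בְּאֶ֣רֶץ written with escapes
const word = "\u05D1\u05B0\u05BC\u05D0\u05B6\u05A3\u05E8\u05B6\u05E5";
console.log(splitClusters(word).map((c) => hasVowel(c, "SEGOL")));
// [false, true, true, false]
```

Throwing on an unknown name is one possible design; returning false would be the more lenient alternative.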

Add q.q. check

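Add a check for q.q. (qamets qatan): if the word already contains the qamets qatan character (U+05C7), return it unchanged. The first snippet is the current check, the second is the proposed one.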

const qametsReg = /\u{05B8}/u;
const hatefQamRef = /\u{05B3}/u;
// if no qamets, return
if (!qametsReg.test(word)) {
  return word;
}

 const qametsReg = /\u{05B8}/u; 
 const qametsQatReg = /\u{05C7}/u; 
 const hatefQamRef = /\u{05B3}/u; 
  
 // if no qamets or has qamets qatan char, return 
 if (!qametsReg.test(word) || qametsQatReg.test(word)) { 
   return word; 
 } 

Letters that Reject Dagesh Chazaq, but have Shewa Na'

Certain letters—שׁ, שׂ, ס, צ, נ, מ, ל, ו, י—when they have a shewa na' (i.e. vocal shewa) reject a dagesh chazaq (i.e. forte).

E.g. וַיְּהִי* becomes וַיְהִי

Should be syllabified as: ["וַ", "יְ", "הִי"], but instead get ["וַיְ", "הִי"].

Some may consider the first syllable (i.e. "וַ") as closed, but it will be considered open.

Option for Modern Hebrew Syllabification

Currently, havarot syllabifies words according to Traditional (i.e. Sephardic) or Tiberian rules.
The ability to syllabify words according to general Modern Hebrew pronunciation would be beneficial, especially for augmenting with transliteration schemas that follow Modern Hebrew.

Differences

Syllable Properties

Syllable.medial

In issue #2, it is proposed to introduce more linguistic properties to syllables.
Modern Hebrew differs in its syllable properties.

A medial property would need to be included:

Syllable.medial: string | null

Modern Hebrew allows for syllable types of CCV and CCVC.

E.g. גְּדֹולִים is realized as [gdo. 'lim]

Syllable.onset

For syllables beginning with א, ע, or ה, the onset can be realized as null.
Though, orthographically, they do function like an onset.

Realization of Shewa

In Biblical Hebrew reading traditions, the shewa is often vocalic, but in Modern Hebrew it is often realized as a zero-vowel [Ø] (Coffin and Bolozky, A Reference Grammar of Modern Hebrew, 22), creating syllables of CCV or CCVC types (see above)

The most common cases where a word-initial (maybe syllable-initial) shewa is realized as vocalic are (1) when its onset is a י, ל, מ, נ, or ר, or (2) when the second letter is א, ה, or ע.

Example of (1):

  • גְּדֹולִים is [gdo. 'lim]
  • לְבָנִים is [lǝ. va. 'nim]

Example of (2):

  • תְּשׁוּקָה is [tʃu. ˈka], but
  • תְּאוּנָה is [tǝ. u. ˈna]

A shewa preceded by a shewa is typically vocal as well, just like Tiberian, but not necessarily so.
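The two word-initial conditions could be expressed as a small heuristic. This is only a sketch of rules (1) and (2) above; `isInitialShewaVocal` is a hypothetical name, and since the text describes these as the most common cases, the result should be read as a tendency, not a firm rule.

```typescript
// (1) onsets after which a word-initial shewa tends to be vocalic: י ל מ נ ר
const VOCALIC_ONSETS = /[\u05D9\u05DC\u05DE\u05E0\u05E8]/;
// (2) second letters before which it tends to be vocalic: א ה ע
const GUTTURAL_SECOND = /[\u05D0\u05D4\u05E2]/;

// firstLetter carries the shewa; secondLetter is the consonant that follows it.
function isInitialShewaVocal(firstLetter: string, secondLetter: string): boolean {
  return VOCALIC_ONSETS.test(firstLetter) || GUTTURAL_SECOND.test(secondLetter);
}

console.log(isInitialShewaVocal("\u05D2", "\u05D3")); // גְּדֹולִים  [gdo. 'lim]     -> false
console.log(isInitialShewaVocal("\u05DC", "\u05D1")); // לְבָנִים   [lǝ. va. 'nim]  -> true
console.log(isInitialShewaVocal("\u05EA", "\u05E9")); // תְּשׁוּקָה  [tʃu. ˈka]      -> false
console.log(isInitialShewaVocal("\u05EA", "\u05D0")); // תְּאוּנָה  [tǝ. u. ˈna]    -> true
```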

Allow incorrect syllabification option

Maybe an option like strict and when false, allows for incorrect text.

Basically,

if (nxt instanceof Syllable) {
throw new Error("Syllable should not precede a Cluster with a Mater");
}

and

if (nxt instanceof Syllable) {
throw new Error("Syllable should not precede a Cluster with a Mater");
}

would need to be bypassed, along with errors of the form Cannot read properties of undefined (reading 'has<something>').
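One way such an option might be wired in is sketched below. `SyllabifyOptions`, its `strict` flag, and `checkOrder` are hypothetical names for illustration; only the error message comes from the snippets above.

```typescript
// Hypothetical options object; `strict` defaults to true to keep current behavior.
interface SyllabifyOptions {
  strict?: boolean;
}

// Returns whether the ordering is valid. In strict mode an invalid ordering
// throws; otherwise it is reported through the return value so the caller can continue.
function checkOrder(nextIsSyllable: boolean, options: SyllabifyOptions = {}): boolean {
  const strict = options.strict ?? true;
  if (nextIsSyllable && strict) {
    throw new Error("Syllable should not precede a Cluster with a Mater");
  }
  return !nextIsSyllable;
}

console.log(checkOrder(false)); // true
console.log(checkOrder(true, { strict: false })); // false, but no error is thrown
```

The undefined-property errors would need a similar guard wherever a neighboring cluster may be missing.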

Remove Reduce

Remove reduce() (slow), and clean up

const sequenceSnippets = (arr: string[]) => {
  return arr.map((snippet) => {
    const text = snippet.normalize("NFKD");
    const sequencedChar = sequence(text).flat();
    return sequencedChar.reduce((a, c) => a + c.text, "");
  });
};

To something like

const sequenceSnippets = (arr: string[]) => {
  return arr.map((snippet) =>
    // join the text of each sequenced Char, not the Char objects themselves
    sequence(snippet.normalize("NFKD")).flat().map((c) => c.text).join("")
  );
};

Fix docs failing

The doc CI job always fails because typedoc was updated. Either downgrade typedoc or find a better pages plugin

Add `vowel` property to `Cluster`

On the Cluster object, add a property that returns the Unicode character of the vowel.

Something like:

new Text("בְּאֶ֣רֶץ").clusters.map(c => c.vowel)

The first three should return the vowel characters of SHEVA, SEGOL, and SEGOL, and the final should return null.

Various spellings of Jerusalem

The various spellings of 'Jerusalem' do not sequence correctly.

Uncommon

The most uncommon spelling — יְרוּשָׁלַיִם like וִירוּשָׁלַ֨יִם֙ in Jer 26:18 — syllabifies fine ✅

Common

The common spelling of יְרוּשָׁלִַ֗ם like in Josh 10:1 does syllabify correctly, but switches the hiriq and the patach in the final syllable 👎

With a metheg/sof pasuq

See יְרוּשָׁלִָֽם in 2 Sam 14:23; the same issue as above 👎


The issue resides in how the Cluster sequences the Chars.

Pipe character causing errors

The pipe character (e.g. אֲשֶׁר | אָֽנֹכִי) causes the error Cannot read properties of undefined (reading 'hasVowel').

Some texts use a pipe character instead of a paseq.

The pipe characters are separated into their own words, and when they are syllabified, all the Latin chars are removed and an empty array is used when trying to group clusters

https://github.com/charlesLoder/havarot/blob/1dd198029947386b03dd1433d8fadeece0bfd57b/src/utils/syllabifier.ts#L379

In order to fix this, add a check to see if the Word is Hebrew or not. If not, just make a single syllable, as is done with the Divine Name.
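A minimal sketch of that guard, assuming "Hebrew" means the word contains at least one Hebrew letter (U+05D0 to U+05EA). `syllabifyWord` is a hypothetical wrapper, and the real syllabifier is passed in as a callback so the sketch stays self-contained.

```typescript
// A word counts as Hebrew here if it has at least one Hebrew letter.
const HEBREW_LETTER = /[\u05D0-\u05EA]/;

// Hypothetical wrapper: non-Hebrew words (a pipe, Latin text, a bare paseq)
// become a single syllable and never reach the cluster-grouping logic.
function syllabifyWord(word: string, syllabify: (w: string) => string[]): string[] {
  if (!HEBREW_LETTER.test(word)) {
    return [word];
  }
  return syllabify(word);
}

// Stand-in syllabifier for the example: splits on "." only.
const fakeSyllabify = (w: string): string[] => w.split(".");

console.log(syllabifyWord("|", fakeSyllabify)); // ["|"]
console.log(syllabifyWord("\u05D0\u05B2.\u05E9\u05B6\u05C1\u05E8", fakeSyllabify).length); // 2
```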

Incorrect Holem waw

When a word contains both a "waw with a holem" and a "holem waw", the "waw with a holem" is incorrectly replaced with a "holem waw".

E.g. עֲוֹנוֹתֵינוּ
