mattlianje / loquax

NLP framework for phonology

License: GNU General Public License v3.0

Python 87.10% JavaScript 1.31% HTML 5.91% Shell 5.68%
digital-humanities history linguistics nlp nlp-library nlp-parsing phonological-features phonology functional-programming

loquax

A Classical Phonology framework

Loquax (Latin for "chatty") is an extensible, zero-dependency, FP-style Python library for phonological analysis.

[Image: the loquax web client]

Features

Languages

Language/Dialect    IPA    Syllabification    Scansion
Latin/Classical     ✓      ✓                  ✓
Greek/Classical     X      X                  X

Quickstart

pip install loquax
from loquax import Document
from loquax.languages import Latin

catilinarian_orations = Document("Quoūsque tandem abutēre, Catilīna, patientiā nostrā?", Latin)
print(catilinarian_orations.to_string(ipa=True, scansion=True))

# outputs:
# kʷɔ.uːs.kʷɛ    tan.dɛm    a.bʊ.teː.rɛ    ka.tɪ.liː.na    pa.tɪ.ɛn.tɪ.aː    nɔs.traː
#  u   -   u      -   u     u u   -  u     u  u   -  u     u  u  u  u  -      u   -

Syllabification, Tokenization

print(catilinarian_orations.tokens)

# outputs:
# [kʷɔ.uːs.kʷɛ, tan.dɛm, a.bʊ.teː.rɛ, ka.tɪ.liː.na, pa.tɪ.ɛn.tɪ.aː, nɔs.traː]

print(catilinarian_orations.tokens[0].syllables)

# outputs:
# [quo, ūs, que]

Phoneme Analysis

Inspect individual sounds and the classes they belong to relative to a given Language.

from loquax.abstractions import Phoneme
from loquax.languages import Latin

r = Phoneme('r', Latin)
print(r.is_consonant and r.is_liquid)  # outputs: True
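
One way to picture "relative to a Language" is that a phoneme carries no properties of its own and only looks them up in its language's sound classes. The following is a minimal sketch of that idea, not Loquax's implementation: ToyInventory, ToyPhoneme, and the class memberships are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ToyInventory:
    """Hypothetical sound classes for one language (illustrative values only)."""
    vowels: FrozenSet[str]
    liquids: FrozenSet[str]


@dataclass(frozen=True)
class ToyPhoneme:
    """Hypothetical phoneme whose properties are looked up in an inventory."""
    symbol: str
    inventory: ToyInventory

    @property
    def is_vowel(self) -> bool:
        return self.symbol in self.inventory.vowels

    @property
    def is_consonant(self) -> bool:
        # In this toy model, anything that is not a vowel counts as a consonant.
        return not self.is_vowel

    @property
    def is_liquid(self) -> bool:
        return self.symbol in self.inventory.liquids


toy_latin = ToyInventory(vowels=frozenset("aeiou"), liquids=frozenset("lr"))
r = ToyPhoneme("r", toy_latin)
print(r.is_consonant and r.is_liquid)  # True
```

Because the inventory is passed in, the same symbol can be classified differently under a different language definition.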

Morphology

A central problem of phonology is that linguistic units change their features depending on their context and neighbours.

Loquax lets users tackle this by defining their own morphisms: rules that transform a unit when the unit itself, and optionally its neighbours, satisfy given conditions.

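from dataclasses import replace

from loquax.morphisms import Morphism, Rule, RuleSequence
from loquax.syllables import Syllable

# "Long by position": a closed syllable becomes long when the next
# syllable begins with at least one consonant.
long_position_morphism = Morphism[Syllable](
    # The syllable to transform: it must have a nucleus and a non-empty coda.
    target=Rule[Syllable](check_fn=lambda s: s.nucleus and s.coda and len(s.coda) >= 1),
    # Return a copy of the syllable marked as long.
    transformation=lambda s: replace(s, is_long=True),
    # Context that must follow the target: a syllable with a non-empty onset.
    suffix=RuleSequence(
        [Rule[Syllable](check_fn=lambda s: s.onset and len(s.onset) >= 1)]
    ),
)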

MorphismStore lets you organize your morphisms and apply all of them to a given syllable or phoneme sequence:

from loquax.abstractions import MorphismStore

morphism_store = MorphismStore([morphism1, morphism2, morphism3])
syllables_sequence = [syllable1, syllable2, syllable3]
transformed_sequence = morphism_store.apply_all(syllables_sequence)
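
To make the idea of a context-sensitive rule concrete, here is a self-contained sketch that does not import Loquax at all. ToySyllable, ToyMorphism, and apply_all are hypothetical stand-ins that only mimic the target / transformation / suffix shape shown above; how Loquax actually evaluates its rules may differ.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ToySyllable:
    """Hypothetical stand-in for a syllable; not loquax's Syllable class."""
    onset: str
    nucleus: str
    coda: str
    is_long: bool = False


@dataclass(frozen=True)
class ToyMorphism:
    """Rewrite a unit when it matches and, optionally, its right neighbour matches."""
    target: Callable[[ToySyllable], bool]
    transformation: Callable[[ToySyllable], ToySyllable]
    suffix: Optional[Callable[[ToySyllable], bool]] = None

    def apply(self, units: List[ToySyllable]) -> List[ToySyllable]:
        result = []
        for i, unit in enumerate(units):
            following = units[i + 1] if i + 1 < len(units) else None
            # With a suffix condition, a following unit must exist and satisfy it.
            context_ok = self.suffix is None or (
                following is not None and self.suffix(following)
            )
            if self.target(unit) and context_ok:
                unit = self.transformation(unit)
            result.append(unit)
        return result


def apply_all(morphisms: List[ToyMorphism], units: List[ToySyllable]) -> List[ToySyllable]:
    """Apply each morphism in order, feeding the result of one into the next."""
    for morphism in morphisms:
        units = morphism.apply(units)
    return units


long_by_position = ToyMorphism(
    target=lambda s: bool(s.coda),
    transformation=lambda s: replace(s, is_long=True),
    suffix=lambda s: bool(s.onset),
)

tandem = [ToySyllable("t", "a", "n"), ToySyllable("d", "e", "m")]
marked = apply_all([long_by_position], tandem)
print([s.is_long for s in marked])  # [True, False]
```

The input list is never mutated: each transformation returns a new frozen dataclass, which keeps the sketch in line with the FP style the library describes.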

IPA

To transcribe text into the International Phonetic Alphabet (IPA), call the to_string method with ipa=True:

print(catilinarian_orations.to_string(ipa=True))

# outputs:
# kʷɔ.uːs.kʷɛ    tan.dɛm    a.bʊ.teː.rɛ    ka.tɪ.liː.na    pa.tɪ.ɛn.tɪ.aː    nɔs.traː
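
As a rough illustration of what such a transcription involves, the sketch below maps spelling to IPA with a lookup table. The table contains only correspondences that can be read off the sample output above (for example qu → kʷ and short e → ɛ); IPA_MAP, syllable_to_ipa, and word_to_ipa are hypothetical names, and Loquax's own transcription rules are certainly richer than this.

```python
# Spelling-to-IPA correspondences visible in the sample output above.
IPA_MAP = {
    "qu": "kʷ",
    "c": "k",
    "ā": "aː",
    "ē": "eː",
    "ī": "iː",
    "ū": "uː",
    "e": "ɛ",
    "i": "ɪ",
    "o": "ɔ",
    "u": "ʊ",
}


def syllable_to_ipa(syllable: str) -> str:
    """Transcribe one syllable, preferring two-letter matches such as 'qu'."""
    out = []
    i = 0
    while i < len(syllable):
        pair = syllable[i:i + 2]
        if pair in IPA_MAP:
            out.append(IPA_MAP[pair])
            i += 2
        else:
            # Letters without an entry (t, n, s, ...) are kept unchanged.
            out.append(IPA_MAP.get(syllable[i], syllable[i]))
            i += 1
    return "".join(out)


def word_to_ipa(word: str) -> str:
    """Transcribe a word whose syllables are already separated by dots."""
    return ".".join(syllable_to_ipa(s) for s in word.split("."))


print(word_to_ipa("quo.ūs.que"))   # kʷɔ.uːs.kʷɛ
print(word_to_ipa("ca.ti.lī.na"))  # ka.tɪ.liː.na
```

A context-free table like this cannot express the assimilation rules discussed in the issues further down; those need rules that look at neighbouring sounds.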

Scansion

Scansion is the process of marking the metrical pattern of a line of verse and dividing it into feet. Classical Latin and Ancient Greek verse is quantitative, so scanning it means marking syllables as long or short rather than stressed or unstressed. It is a central part of the study and enjoyment of classical poetry, and Loquax makes it easy to integrate into your language analysis pipeline.

Currently, only the distinction between long (-) and short (u) syllables is marked.

print(catilinarian_orations.to_string(scansion=True))

# outputs:
# quo.ūs.que    tan.dem    a.bu.tē.re    ca.ti.lī.na    pa.ti.en.ti.ā    nos.trā
#  u  -   u      -   u     u u  -  u     u  u  -  u     u  u  u  u  -     u   -
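
The long/short decision can be sketched with two textbook rules: a syllable is long by nature if its vowel is long, and long by position if it is closed and the next syllable starts with a consonant. The code below is a simplified illustration under those assumptions; it ignores diphthongs and word boundaries, uses hypothetical helper names, and is not guaranteed to agree with Loquax's output in every case.

```python
from typing import Optional

VOWELS = set("aeiouāēīōū")
LONG_VOWELS = set("āēīōū")  # a macron marks a vowel that is long by nature


def is_long(syllable: str, following: Optional[str]) -> bool:
    """Decide syllable length from the syllable and the one after it."""
    if any(ch in LONG_VOWELS for ch in syllable):
        return True  # long by nature
    closed = syllable[-1] not in VOWELS
    next_starts_with_consonant = following is not None and following[0] not in VOWELS
    return closed and next_starts_with_consonant  # long by position


def scan(word: str) -> str:
    """Return scansion marks for a word whose syllables are separated by dots."""
    syllables = word.split(".")
    marks = []
    for i, syllable in enumerate(syllables):
        following = syllables[i + 1] if i + 1 < len(syllables) else None
        marks.append("-" if is_long(syllable, following) else "u")
    return " ".join(marks)


print(scan("tan.dem"))     # - u
print(scan("a.bu.tē.re"))  # u u - u
```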

Extensibility

Loquax is extensible: you can build and customize your own rule sets for unique or theoretical languages. Here's an example of how to define custom rules and assemble them into a Language:

from loquax.languages import Latin
from loquax.abstractions import (
    PhonemeSyllabificationRuleStore, Language, 
    Constants, Tokenizer, MorphismStore, 
    Syllable, Morphism, Phoneme
)

syllabification_rules = PhonemeSyllabificationRuleStore(...)
constants = Constants(...)
tokenizer = Tokenizer(...)
syllable_morphisms = MorphismStore[Syllable]([...])
phoneme_morphisms = MorphismStore[Phoneme]([...])

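# Python does not allow positional arguments after keyword arguments, so the
# first two values are passed positionally here (this assumes language_name
# and iso_639_code are the constructor's first two parameters, in that order).
my_lang = Language(
    'MyLang',  # language_name
    'myl',     # iso_639_code
    constants,
    syllabification_rules,
    syllable_morphisms,
    phoneme_morphisms,
    tokenizer,
)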

loquax's Issues

Classical Latin - Phonology - Reddit #2

From u/christmas_fan1 (original Reddit comment):

Very cool idea.

Just a few points about the IPA:

- short i before a vowel has /i/ quality, likewise short u /u/

- you've got your work cut out for you writing all the cases for final m (assimilates to following consonants, in certain cases even word internally such as quamquam, -que, etc., nasalizes preceding vowels in front of fricatives, vowels and h-)

- -gn- = ŋn

- n before c, g, k, x = ŋ

- ngu<vowel> should become ŋɡʷ except in certain cases such as languī which is /laŋɡui/

- consonant cluster assimilation: plebs = /plɛps/. You could write a rule b -> p before voiceless consonants. There are other examples of this but I can't think of any right now.

Looking forward to see where this goes!
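
Three of the points above (-gn-, n before a velar, and b before a voiceless consonant) are simple enough to sketch as ordered rewrite rules. This is only an illustration of the suggestions, not code from Loquax: it works on lowercase spelling rather than on Loquax's phoneme objects, RULES and apply_rules are hypothetical names, and the set of letters treated as voiceless is an assumption.

```python
import re

# Ordered (pattern, replacement) pairs; each rule sees the output of the previous one.
RULES = [
    (re.compile(r"gn"), "ŋn"),              # -gn- = ŋn
    (re.compile(r"n(?=[cgkx])"), "ŋ"),      # n before c, g, k, x = ŋ
    (re.compile(r"b(?=[ptckqsfx])"), "p"),  # b -> p before voiceless consonants
]


def apply_rules(word: str) -> str:
    """Apply every rewrite rule in order to a lowercase word."""
    for pattern, replacement in RULES:
        word = pattern.sub(replacement, word)
    return word


print(apply_rules("plebs"))   # pleps
print(apply_rules("magnus"))  # maŋnus
print(apply_rules("longus"))  # loŋgus
```

The final-m cases and the languī exception mentioned above depend on word boundaries and lexical knowledge, so they would need more than a flat list of patterns.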

Classical Latin - Phonology - Reddit #1

Suggestion from Reddit user u/LatPronunciationGeek (original Reddit comment):

Points 1-3 will be new syllabification rules.
Point 4 is a phonological morphism rule to be added.

#1 neglego and related words are always syllabified in Latin poetry with [nɛg.l].

#2 words like abluo or abrumpo are always syllabified in Latin poetry with [ab.l] and [ab.r]. The same goes for words starting with ob-, sub-, ad-: the consonant at the end of the prefix doesn't get syllabified with a following /r/ or /l/ (although it does get syllabified with a following vowel).

#3 Words like gaza are syllabified in Classical Latin poetry as [gaz.za], with double [z.z].

#4 Words like maior/major have double [j.j] in the middle, not /ɪ/ or single /j/.
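
Points #2, #3 and #4 can be approximated with pattern matching on spelling, which gives a feel for what the new rules would have to do. The sketch below is an assumption-laden illustration, not a proposed implementation: mark_special_boundaries is a hypothetical helper, it only inserts the boundaries discussed here rather than syllabifying the whole word, and it will over-apply to words where ab-, ob-, sub- or ad- is not a prefix or where an intervocalic i is a vowel. Point #1 (neglego) is lexical and is left out.

```python
import re

V = "aeiouāēīōū"

# #2: the prefix-final consonant is not syllabified with a following l or r.
PREFIX_BEFORE_LIQUID = re.compile(r"^(ab|ob|sub|ad)(?=[lr])")
# #3: z between vowels counts as double, split across the syllable boundary.
INTERVOCALIC_Z = re.compile(rf"(?<=[{V}])z(?=[{V}])")
# #4: consonantal i/j between vowels counts as double [j.j].
INTERVOCALIC_J = re.compile(rf"(?<=[{V}])[ij](?=[{V}])")


def mark_special_boundaries(word: str) -> str:
    """Insert a dot at the syllable boundaries described in points #2-#4."""
    word = PREFIX_BEFORE_LIQUID.sub(r"\1.", word)
    word = INTERVOCALIC_Z.sub("z.z", word)
    word = INTERVOCALIC_J.sub("j.j", word)
    return word


print(mark_special_boundaries("abluo"))    # ab.luo
print(mark_special_boundaries("abrumpo"))  # ab.rumpo
print(mark_special_boundaries("gaza"))     # gaz.za
print(mark_special_boundaries("maior"))    # maj.jor
```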
