Giter Site home page Giter Site logo

classical-greek-nlp-resources's Introduction

classical-greek-nlp-resources

🚧 🙏 This is currently a work in progress 🙏 🚧

This repository contains a list of publicly available resources that might come in handy while performing NLP tasks for Classical Greek

Liddell, Scott, Jones Ancient Greek Lexicon (LSJ)

This is the de facto best known / most popular Ancient Greek - English lexicon.

Comments on each below.

  • 12M gzip, 36M unpacked.
  • A single txt file, plaintext version of Greek-English LSJ.
  • There's also an html version but it's essentially the same txt file within a single <pre></pre> block.
  • Unicode, not Betacode.
  • Easy to separate definitions (divided by a series of "*")
  • No out-of-the-box way of parsing within a definition.
  • Some entries contain extra chars such as '-', e.g. ἀγαθο-γονία
Snippet
************************************************************

ἀβᾰθής, ές, (βάθος) not deep, φάλαγξ Arr.Tact.5.6; in
single rank, ἡ ἐφ’ ἑνὸς ἀ. τάξις ib.17.5, ἕλκεα
Aret.SA1.9, Gal.11.127.

2.  Geom., without depth, ἐπιφάνεια S.E.P.3.43, cf.
Simp.in Ph.572.25.

************************************************************

ἄβαθρος, ον, without foundation, Cyr.

Perseus LSJ - https://github.com/gcelano/lexica (last commit: 2022-06-10)

  • Perseus export of LSJ.
  • XML. Well structured.
  • Betacode.
  • 40M zip-compressed, 345M unpacked.
Snippet
<entryFree id="n92041"
           key="r(abdouxe/w"
           type="main"
           opt="n"
           TEIform="entryFree">
    <orth extent="full" lang="greek" opt="n" TEIform="orth">r(abdoux-e/w</orth>,
    <sense id="n92041.0" n="A" level="1" opt="n" TEIform="sense">
        <tr opt="n" TEIform="tr">to be a</tr>
        <foreign lang="greek" TEIform="foreign">r(abdou=xos</foreign>, <tr opt="n" TEIform="tr">carry a rod</tr> or <tr
            opt="n" TEIform="tr">wand</tr>, esp. as a badge
        <pb n="1563" TEIform="pb"/>
        of office,
        <bibl default="NO" TEIform="bibl">
            <author TEIform="author">Hippias Erythr.</author>
        </bibl>
        ap.
        <bibl n="urn:cts:greekLit:tlg0008.tlg001.perseus-grc1:6:259d"
              default="NO"
              TEIform="bibl">
            <author TEIform="author">Ath.</author>
            <biblScope TEIform="biblScope">6.259d</biblScope>
        </bibl>
        ; of the Roman lictors, <tr opt="n" TEIform="tr">bear the fasces</tr>,
        <bibl n="urn:cts:greekLit:tlg0385.tlg001:48:43"
              default="NO"
              TEIform="bibl">
            <author TEIform="author">D.C.</author>
            <biblScope TEIform="biblScope">48.43</biblScope>
        </bibl>
        :—
        <gramGrp opt="n" TEIform="gramGrp">
            <gram type="voice" opt="n" TEIform="gram">Pass.</gram>
        </gramGrp>
        , <tr opt="n" TEIform="tr">have the fasces borne before one</tr>,
        <bibl n="urn:cts:greekLit:tlg0007.tlg005:10" default="NO" TEIform="bibl">
            <author TEIform="author">Plu.</author>
            <title TEIform="title">Num.</title>
            <biblScope TEIform="biblScope">10</biblScope>
        </bibl>
        .
    </sense>
</entryFree>

Unicode Perseus LSJ - https://github.com/gcelano/LSJ_GreekUnicode (last commit: 2022-09-22)

  • ^, but converted to Unicode.
  • 26M zip-compressed, 228M unpacked.
Snippet
<entryFree id="n92041" key="ῥαβδουχέω" type="main" opt="n" TEIform="entryFree">
    <orth extent="full" lang="greek" opt="n" TEIform="orth">ῥαβδουχ-έω</orth>,
    <sense id="n92041.0" n="A" level="1" opt="n" TEIform="sense">
        <tr opt="n" TEIform="tr">to be a</tr>
        <foreign lang="greek" TEIform="foreign">ῥαβδοῦχος</foreign>,<tr opt="n" TEIform="tr">carry a rod</tr>or<tr
            opt="n" TEIform="tr">wand</tr>, esp. as a badge<pb n="1563" TEIform="pb"/>of office,
        <bibl default="NO" TEIform="bibl">
            <author TEIform="author">Hippias Erythr.</author>
        </bibl>
        ap.
        <bibl n="urn:cts:greekLit:tlg0008.tlg001.perseus-grc1:6:259d" default="NO" TEIform="bibl">
            <author TEIform="author">Ath.</author>
            <biblScope TEIform="biblScope">6.259d</biblScope>
        </bibl>
        ; of the Roman lictors,<tr opt="n" TEIform="tr">bear the fasces</tr>,
        <bibl n="urn:cts:greekLit:tlg0385.tlg001:48:43" default="NO" TEIform="bibl">
            <author TEIform="author">D.C.</author>
            <biblScope TEIform="biblScope">48.43</biblScope>
        </bibl>
        :—
        <gramGrp opt="n" TEIform="gramGrp">
            <gram type="voice" opt="n" TEIform="gram">Pass.</gram>
        </gramGrp>
        ,<tr opt="n" TEIform="tr">have the fasces borne before one</tr>,
        <bibl n="urn:cts:greekLit:tlg0007.tlg005:10" default="NO" TEIform="bibl">
            <author TEIform="author">Plu.</author>
            <title TEIform="title">Num.</title>
            <biblScope TEIform="biblScope">10</biblScope>
        </bibl>
        .
    </sense>
</entryFree>

Unicode Perseus LSJ + 2 extra tags - https://github.com/gcelano/LSJ_additions (last commit: 2018-11-06)

  • 27M zip-compressed, 233M unpacked.

  • The extra tags are:

    • key2 = value of @key without U+0308 U+0304 U+0306,
    • key3 = value of @key without accents
Snippet
<entryFree key2="ῥαβδουχέω" key3="ραβδουχεω" id="n92041" key="ῥαβδουχέω" type="main" opt="n" TEIform="entryFree">
    <orth extent="full" lang="greek" opt="n" TEIform="orth">ῥαβδουχ-έω</orth>,
    <sense id="n92041.0" n="A" level="1" opt="n" TEIform="sense">
        <tr opt="n" TEIform="tr">to be a</tr>
        <foreign lang="greek" TEIform="foreign">ῥαβδοῦχος</foreign>,<tr opt="n" TEIform="tr">carry a rod</tr>or<tr
            opt="n" TEIform="tr">wand</tr>, esp. as a badge<pb n="1563" TEIform="pb"/>of office,
        <bibl default="NO" TEIform="bibl">
            <author TEIform="author">Hippias Erythr.</author>
        </bibl>
        ap.
        <bibl n="urn:cts:greekLit:tlg0008.tlg001.perseus-grc1:6:259d" default="NO" TEIform="bibl">
            <author TEIform="author">Ath.</author>
            <biblScope TEIform="biblScope">6.259d</biblScope>
        </bibl>
        ; of the Roman lictors,<tr opt="n" TEIform="tr">bear the fasces</tr>,
        <bibl n="urn:cts:greekLit:tlg0385.tlg001:48:43" default="NO" TEIform="bibl">
            <author TEIform="author">D.C.</author>
            <biblScope TEIform="biblScope">48.43</biblScope>
        </bibl>
        :—
        <gramGrp opt="n" TEIform="gramGrp">
            <gram type="voice" opt="n" TEIform="gram">Pass.</gram>
        </gramGrp>
        ,<tr opt="n" TEIform="tr">have the fasces borne before one</tr>,
        <bibl n="urn:cts:greekLit:tlg0007.tlg005:10" default="NO" TEIform="bibl">
            <author TEIform="author">Plu.</author>
            <title TEIform="title">Num.</title>
            <biblScope TEIform="biblScope">10</biblScope>
        </bibl>
        .
    </sense>
</entryFree>

Morphological forms of nouns + adjectives extracted from LSJ - https://github.com/gcelano/Thesaurus_Morphologicus_Linguae_Graecae (last commit: 2019-05-12)

  • xml format, easy to parse
  • contains nouns (decl. I and II) and adjectives (I and II) only
  • 732K zip-compressed, 5.6M unpacked.
Snippet (adj)
<l id="n73673" key="ὁμοιόσημος" key2="ὁμοιόσημος" cls="1" bse="ὁμοιοσημ" end="ος_ον"/>
<l id="n73675" key="ὁμοιόσκευος" key2="ὁμοιόσκευος" cls="1" bse="ὁμοιοσκευ" end="ος_ον"/>
Snippet (nouns decl. I)
<l id="n164" key="ἁβροβάτης" key2="ἁβροβάτης" gnt="ου" gnd="" dcl="1" bse="ἁβροβατ" end="ης"/>
<l id="n169" key="ἁβροδίαιτα" key2="ἁβροδίαιτα" gnt="ας" gnd="" dcl="1" bse="ἁβροδιαιτ" end="α"/>
Snippet (nouns decl. II)
<l id="n397" key="ἄγγελος" key2="ἄγγελος" gnt="ου" gnd="" dcl="2" bse="ἀγγελ" end="ος"/>
<l id="n410" key="ἄγεθλον" key2="ἄγεθλον" gnt="ου" gnd="τό" dcl="2" bse="ἀγεθλ" end="ον"/>
<l id="n442" key="ἀγελητρόφος" key2="ἀγελητρόφος" gnt="ου" gnd="" dcl="2" bse="ἀγελητροφ" end="ος"/>

Perseus DL - https://github.com/PerseusDL/lexica/ (last commit: 2022-08-19)

Perseus Digital Library comprises various Greek and Latin digital resources. More details: https://github.com/PerseusDL/lexica/wiki The resources are distributed across two places:

Snippet
<entryFree id="n92041"
           key="r(abdouxe/w"
           type="main"
           opt="n"
           TEIform="entryFree">
    <orth extent="full" lang="greek" opt="n" TEIform="orth">r(abdoux-e/w</orth>,
    <sense id="n92041.0" n="A" level="1" opt="n" TEIform="sense">
        <tr opt="n" TEIform="tr">to be a</tr>
        <foreign lang="greek" TEIform="foreign">r(abdou=xos</foreign>, <tr opt="n" TEIform="tr">carry a rod</tr> or <tr
            opt="n" TEIform="tr">wand</tr>, esp. as a badge
        <pb n="1563" TEIform="pb"/>
        of office,
        <bibl default="NO" TEIform="bibl">
            <author TEIform="author">Hippias Erythr.</author>
        </bibl>
        ap.
        <bibl n="urn:cts:greekLit:tlg0008.tlg001.perseus-grc1:6:259d"
              default="NO"
              TEIform="bibl">
            <author TEIform="author">Ath.</author>
            <biblScope TEIform="biblScope">6.259d</biblScope>
        </bibl>
        ; of the Roman lictors, <tr opt="n" TEIform="tr">bear the fasces</tr>,
        <bibl n="urn:cts:greekLit:tlg0385.tlg001:48:43"
              default="NO"
              TEIform="bibl">
            <author TEIform="author">D.C.</author>
            <biblScope TEIform="biblScope">48.43</biblScope>
        </bibl>
        :—
        <gramGrp opt="n" TEIform="gramGrp">
            <gram type="voice" opt="n" TEIform="gram">Pass.</gram>
        </gramGrp>
        , <tr opt="n" TEIform="tr">have the fasces borne before one</tr>,
        <bibl n="urn:cts:greekLit:tlg0007.tlg005:10" default="NO" TEIform="bibl">
            <author TEIform="author">Plu.</author>
            <title TEIform="title">Num.</title>
            <biblScope TEIform="biblScope">10</biblScope>
        </bibl>
        .
    </sense>
</entryFree>

Betacode converters

Betacode is a romanized way of representing Greek texts (e.g. ὠμόβυρσος -> w)mo/bursos) (wikipedia). There's a bunch of tools allowing for working with this format and converting it to Unicode.

More reading:

Online Betacode <-> Unicode converters

Python packages

classical-greek-nlp-resources's People

Contributors

mrapacz avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.