
abbott-smith's Introduction

Abbott-Smith - Summary

Abbott-Smith is a project to mark up G. Abbott-Smith's A Manual Greek Lexicon of the New Testament (New York: Scribner's, 1922) using TEI.

Source and Copyright

The PDF file with a text layer (manualgreeklexic00abborich.pdf) was obtained from http://archive.org/details/manualgreeklexic00abborich. Certain restrictions apply to the use of this file. These are included in the PDF file.

The lexicon (abbott-smith.tei.xml), including the marked up version in this repository, is in the public domain.

Viewing and Downloading

The main file in this repository is abbott-smith.tei.xml.

To use the lexicon, download any release from the Releases page.

Also, a module for the SWORD Library is available from CrossWire.

Contributors

The work of marking up and checking the text is complete. Many thanks to all those who devoted time and expertise:

  • Daniel Owens
  • Dardo Sordi
  • Chuck Bearden
  • Patrick Durusau
  • Jonathan Robie
  • Stephen Hughes (aka Στέφανος)
  • David Statezni
  • Bram vandenHeuvel
  • Drew Curley
  • Chapel Presson
  • Todd L. Price

Markup Information

All text from the lexicon is marked up using CrossWire.org's iteration of TEI XML, which supports several features of OSIS XML that are relevant to biblical studies (especially biblical references). For helpful documentation on this iteration of TEI, see http://www.crosswire.org/wiki/TEI_Dictionaries. For the schema definition, see http://www.crosswire.org/OSIS/teiP5osis.1.4.xsd. For detailed documentation on TEI dictionaries, see http://www.tei-c.org/release/doc/tei-p5-doc/en/html/DI.html.

Changelog

2017/02/03 - Release of v. 1.0: This release comes thanks to the work of David Statezni, Bram vandenHeuvel, Drew Curley, Chapel Presson, and Todd L. Price to complete the markup and checking of the lexicon. The lexicon is complete.

2013/12/12 - Release of v. 0.5: This release contains the majority of entries in Abbott-Smith. Some have been marked up very carefully, but many others still require manual editing. The release includes numerous fixes to the data from DSAW's version, particularly correcting references and restoring <div> elements and page numbers. Also added: <etym> for etymological data and <re> for related entry information (mostly for synonyms). Many thanks to Jonathan Robie and Patrick Durusau for collating the data and to Dardo Sordi for working countless hours to improve the data. From this point forward we will use the GitHub release feature, since there is no longer any nonsense OCR text to remove before release. However, much editing remains to be done, and there may be errors. Total entries: 5,726. Total pages checked: 4/526.

2013/12/04 - Replaced all missing entries with entries generated from DSAW's version using an XQuery, added Hebrew text to DSAW's entries. Initially complete. We still plan to restructure the entries and verify a number of things.

2012/12/12 - Release of v. 0.15: Includes pages iii-16 and entries for words occurring 100 times or more in the Greek NT. Total entries: 555.

2012/10/01 - Release of v. 0.14: Includes pages iii-9 and entries for words occurring 100 times or more in the Greek NT. Also moved markup instructions to markdown file instead of PDF. Total entries: 299.

2012/09/07 - Release of v. 0.13: Includes pages iii-5 and entries for words occurring 200 times or more in the Greek NT. Total entries: 148.

2012/09/01 - Release of v. 0.12: Includes pages iii-4 and entries for words occurring 300 times or more in the Greek NT. Total entries: 110.

2012/08/07 - Release of v. 0.11: Includes pages iii-4 and entries for words occurring 500 times or more in the Greek NT. Also changed to using <gloss> instead of <def>. Many thanks to Dardo Sordi for corrections and additional entries. Total entries: 85.

2012/07/27 - First Release (v. 0.1): Includes pages iii-3 and entries for words occurring 1,000 times or more in the Greek NT. Total entries: 50.

2012/05/10 - Moved markup instructions to PDF file

2012/05/09 - Initial upload with frontmatter and page numbers marked up

abbott-smith's Issues

Final Pages (final pages of A-S)

  • page 495

  • page 496

  • page 497

  • page 498

  • page 499

  • page 500

  • page 501

  • page 502

  • page 503

  • page 504

  • page 505

  • page 506

  • page 507

  • page 508

  • page 509

  • page 510

  • page 511

  • page 512

  • page 513

Kappa (all A-S pages with entries starting with the letter kappa)

  • page 222
  • page 223
  • page 224
  • page 225
  • page 226
  • page 227
  • page 228
  • page 229
  • page 230
  • page 231
  • page 232
  • page 233
  • page 234
  • page 235
  • page 236
  • page 237
  • page 238
  • page 239
  • page 240
  • page 241
  • page 242
  • page 243
  • page 244
  • page 245
  • page 246
  • page 247
  • page 248
  • page 249
  • page 250
  • page 251
  • page 252
  • page 253
  • page 254
  • page 255
  • page 256
  • page 257
  • page 258
  • page 259
  • page 260
  • page 261
  • page 262

<gloss> versus <emph>

I may have missed it, but a "command decision" should be made as to when to use <gloss> versus <emph>. There seem to be 3 classes of instances of italicization in the A-S, with the 2nd being the most straight-forward use for <gloss>. An analysis function could be developed to analyze and report the instances of these classes.

  1. Within a derivation clause (example uses <gloss>):
    <seg type="derivation">(&lt; <foreign xml:lang="grc">συγκυρέω</foreign>, <gloss>to happen</gloss>), </seg>
  2. Within a sense clause, particularly when immediately following that XML tag:
    <sense><gloss>chance, coincidence</gloss>: <foreign xml:lang="grc">κατὰ σ.</foreign> (v. MM, xxiii), <ref osisRef="Luke.10.31">Lk 10:31</ref> (Hippocr., Eccl.).†</sense>
  3. Within a sense clause, but within parentheses that identify either RV or AV as the source (first example uses <gloss>, second has neither):
    ... (RV,<gloss>exact wrongfully</gloss>;... or ...(AV, comforter;...
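
The analysis function suggested above could start from something like the following. This is only a minimal sketch: it counts <gloss> elements by where they sit, it assumes the fragment is parsed without the TEI namespace (the real file would need namespace-qualified tag names), and the function name and class labels are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Small sample modelled on the snippets quoted in this issue (no TEI namespace).
SAMPLE = """<entry>
<etym><seg type="derivation">(&lt; <foreign xml:lang="grc">συγκυρέω</foreign>, <gloss>to happen</gloss>), </seg></etym>
<sense><gloss>chance, coincidence</gloss>: (RV, <gloss>exact wrongfully</gloss>)</sense>
</entry>"""


def classify_glosses(xml_text):
    """Count <gloss> elements by structural context (hypothetical class labels)."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    counts = Counter()
    for gloss in root.iter("gloss"):
        parent = parent_of.get(gloss)
        if parent is not None and parent.tag == "seg" and parent.get("type") == "derivation":
            counts["derivation"] += 1
        elif (parent is not None and parent.tag == "sense"
              and parent[0] is gloss and not (parent.text or "").strip()):
            # The gloss is the very first thing inside <sense>.
            counts["sense-initial"] += 1
        else:
            counts["other"] += 1
    return counts


print(classify_glosses(SAMPLE))
```

Italicized text that carries no tag at all (the "AV, comforter" case) cannot be found this way, since the italics are not recorded in the XML; that class would need a comparison against the PDF.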

a number of issues I hit with the entry/@n values

Stray iota subscript
-γαστήρͅ|G1064
+γαστήρ|G1064

Stray semicolon
-διαβλέπω;|G1227
+διαβλέπω|G1227

I presume the n attribute shouldn't contain the hyphens (but not sure intention)
-ἐκ-κρέμαννυμι
+ἐκκρέμαννυμι

Stray period
-Ἑσρώμ.|G2074|Ἑσρώμ
+Ἑσρώμ|G2074|Ἑσρώμ

Not sure if parenthetical μήν should be in n attribute
-ἦ(μήν)|G2229
+ἦ|G2229

Stray period
-ὅπως.|G3704
+ὅπως|G3704

OCR problem?
-πυνθώ'ομαι|G4441
+πυνθάνομαι|G4441
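
Problems of this kind could probably be found mechanically. Below is a rough sketch of a checker for a single entry/@n value; it assumes the lemma is the first "|"-separated field, and the function name and the list of checks are my own, not anything already in the repository.

```python
import re
import unicodedata


def check_n_value(n_value):
    """Return a list of suspicious features in the lemma part of an @n value."""
    lemma = n_value.split("|")[0]
    problems = []
    if re.search(r"[.;,:]$", lemma):
        problems.append("trailing punctuation")
    if "-" in lemma:
        problems.append("hyphen")
    if "(" in lemma or ")" in lemma:
        problems.append("parenthetical")
    if "'" in lemma or "\u2019" in lemma:
        problems.append("apostrophe")
    # An iota subscript (U+0345) is only expected on alpha, eta, or omega.
    decomposed = unicodedata.normalize("NFD", lemma)
    for i, ch in enumerate(decomposed):
        if ch == "\u0345":
            j = i - 1
            while j >= 0 and unicodedata.combining(decomposed[j]):
                j -= 1  # skip back over accents and breathings to the base letter
            base = decomposed[j].lower() if j >= 0 else ""
            if base not in ("α", "η", "ω"):
                problems.append("stray iota subscript")
    return problems


for value in ["διαβλέπω;|G1227", "ἐκ-κρέμαννυμι", "ἦ(μήν)|G2229", "γαστήρ|G1064"]:
    print(value, check_n_value(value))
```

Whether hyphens and parentheticals are really errors is still the open question above; the checker only reports them so an editor can decide.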

osisRefs for I Ki & II Ki

The osisRef for I Ki and II Ki can either point to 1Sam and 2Sam, respectively, if the context is LXX, or point to 1King and 2King if the context is the Hebrew text. An analysis function should be developed which can either determine whether the mapping is correct for each instance (there are both in the A-S), or generate a report that would be analyzed by an editor to determine correctness of mapping.
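
For the report variant, a simple starting point might look like this. It is a sketch only: it scans the raw XML text with a regular expression, assumes the reference text begins with "I Ki" or "II Ki" directly inside the <ref>, and leaves the judgement about LXX versus Hebrew context to the editor. The sample references are invented for illustration.

```python
import re

# Matches <ref> elements whose visible text starts with "I Ki" or "II Ki".
KINGS_REF = re.compile(r'<ref osisRef="([^"]+)">\s*(I{1,2}) Ki\b([^<]*)</ref>')


def kings_report(xml_text):
    """List (reference text, osisRef, target book) for every I Ki / II Ki reference."""
    rows = []
    for match in KINGS_REF.finditer(xml_text):
        osis_ref, numeral, rest = match.groups()
        book = osis_ref.split(".")[0]  # e.g. "1Sam" from "1Sam.17.4"
        rows.append((numeral + " Ki" + rest, osis_ref, book))
    return rows


# Invented sample lines, not taken from the lexicon.
SAMPLE = ('<ref osisRef="1Sam.17.4">I Ki 17:4</ref> and '
          '<ref osisRef="2Kgs.2.1">II Ki 2:1</ref>')

for row in kings_report(SAMPLE):
    print(row)
```

Sorting or grouping the rows by target book would give the editor one list per mapping to review.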

How to tag subscripts after Bible books

The list of abbreviations in the PDF (p. XVI) says this about subscripted numbers after the names of Bible books:

“An inferior numeral after a biblical book (e.g. III Mac 6) indicates the number of times a word occurs in that book.”
So in the entry for 'ἀγών|G73', we have the following:

II Mac <hi rend="subscript">6</hi>, IV Mac <hi rend="subscript">5</hi>

Should we tag these also as osisRef elements? The above would become e.g.

<ref osisRef="2Macc">II Mac <hi rend="subscript">6</hi></ref>, <ref osisRef="4Macc">IV Mac <hi rend="subscript">5</hi></ref>

I've tagged a couple of instances this way before looking for precedents in what has already been tagged, but these should be easy to detect and change if we decide not to make them osisRef elements. I just haven't thought this through, and I don't know if it was decided already.

Representation of consecutive verses in references

For 2 consecutive verses in a reference, the scan process has a representation that is a “valid” representation of the XML, but varies from the A-S PDF, by replacing the comma between the two verses with a dash. There are two other ways to specify this which are both “valid” reference syntax, as well as “identical” representation of the A-S PDF. Below is the scan presentation and then both of the other representations of John 13: 26 & 27. (The markdown language of this issue buffer would not let me show the underscoring in the visualization, where the entire string shown would be underlined, except that for representation 2 the space after the comma would NOT be underlined.)

I have been converting most instances to representation 2. I am not sure whether others have left the scan representation AS-IS, or whether they have chosen representation 1 or 2. If it is decided that the scan representation should be corrected, at the end of “editing” we can search for instances of it, and correct those to match either representation 1 or 2. The choice of either of those is another decision that needs to be made.

Scan Representation:
XML: <ref osisRef="John.13.26-John.13.27">Jo 13:26-27</ref>
Visualization: Jo 13:26-27

Representation 1:
XML: <ref osisRef="John.13.26-John.13.27">Jo 13:26, 27</ref>
Visualization: Jo 13:26, 27

Representation 2:
XML: <ref osisRef="John.13.26">Jo 13:26</ref>, <ref osisRef="John.13.27">27</ref>
Visualization: Jo 13:26, 27
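
If we do decide to correct the scan representation, the search at the end of editing could be sketched roughly as follows. It assumes osisRef values of the form Book.chapter.verse and works on the raw text rather than a parsed tree; the function names are hypothetical.

```python
import re

# A <ref> whose osisRef is a range ("start-end") with plain text content.
RANGE_REF = re.compile(r'<ref osisRef="([^"-]+)-([^"-]+)">([^<]*)</ref>')


def parse_osis(osis_id):
    """Split 'John.13.26' into ('John', 13, 26); return None for other shapes."""
    parts = osis_id.split(".")
    if len(parts) == 3 and parts[1].isdigit() and parts[2].isdigit():
        return parts[0], int(parts[1]), int(parts[2])
    return None


def dashed_consecutive_pairs(xml_text):
    """Find two-verse ranges whose visible text uses a dash (the scan representation)."""
    hits = []
    for match in RANGE_REF.finditer(xml_text):
        start, end = parse_osis(match.group(1)), parse_osis(match.group(2))
        text = match.group(3)
        if start is None or end is None:
            continue
        consecutive = start[:2] == end[:2] and end[2] == start[2] + 1
        if consecutive and ("-" in text or "\u2013" in text):
            hits.append((match.group(1) + "-" + match.group(2), text))
    return hits


SAMPLE = ('<ref osisRef="John.13.26-John.13.27">Jo 13:26-27</ref> '
          '<ref osisRef="John.13.26-John.13.27">Jo 13:26, 27</ref> '
          '<ref osisRef="John.13.26">Jo 13:26</ref>, <ref osisRef="John.13.27">27</ref>')

print(dashed_consecutive_pairs(SAMPLE))
```

Only the scan representation is reported; representations 1 and 2 pass through untouched, so the choice between them can be made separately.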

Need to update list of contributors

In the /TEI/teiHeader[1]/encodingDesc[1] element, we need to add the names of all the new contributors (Dave Statezni, Chapel Presson, et al.) who don't need to remain anonymous.

Scan omitted Grammar tagging in many instances

We should list below all of the grammar abbreviations that should have grammar tagging around them, e.g. adv. for an adverb. It should be possible to develop a script that does a global replace (adding the tagging) for each instance that is not already tagged. The list of abbreviations can be extracted from section "I. GENERAL." at the beginning of the XML file.

Most of the current instances of tagging occur after the <form...> tag-pair and the <etym...> tag-pair and before the first <sense...> tag-pair, but there are also instances that are part of the contents of a <sense...> tag-pair. When developing and running this script, a decision will need to be made whether the replacements should be made only before the <sense...> tag-pair or wherever the abbreviations occur.

Sigma (all A-S pages with entries starting with the letter sigma)

  • page 400

  • page 401

  • page 402

  • page 403

  • page 404

  • page 405

  • page 406

  • page 407

  • page 408

  • page 409

  • page 410

  • page 411

  • page 412

  • page 413

  • page 414

  • page 415

  • page 416

  • page 417

  • page 418

  • page 419

  • page 420

  • page 421

  • page 422

  • page 423

  • page 424

  • page 425

  • page 426

  • page 427

  • page 428

  • page 429

  • page 430

  • page 431

  • page 432

  • page 433

  • page 434

  • page 435

  • page 436

  • page 437

  • page 438

Epsilon (all A-S pages with entries starting with the letter epsilon)

  • page 125
  • page 126
  • page 127
  • page 128
  • page 129
  • page 130
  • page 131
  • page 132
  • page 133
  • page 134
  • page 135
  • page 136
  • page 137
  • page 138
  • page 139
  • page 140
  • page 141
  • page 142
  • page 143
  • page 144
  • page 145
  • page 146
  • page 147
  • page 148
  • page 149
  • page 150
  • page 151
  • page 152
  • page 153
  • page 154
  • page 155
  • page 156
  • page 157
  • page 158
  • page 159
  • page 160
  • page 161
  • page 162
  • page 163
  • page 164
  • page 165
  • page 166
  • page 167
  • page 168
  • page 169
  • page 170
  • page 171
  • page 172
  • page 173
  • page 174
  • page 175
  • page 176
  • page 177
  • page 178
  • page 179
  • page 180
  • page 181
  • page 182
  • page 183
  • page 184
  • page 185
  • page 186
  • page 187
  • page 188
  • page 189
  • page 190
  • page 191
  • page 192

Alpha (all A-S pages with entries starting with the letter alpha)

Pages already checked off for this issue have been checked by another contributor.

  • page 1

  • page 2

  • page 3

  • page 4

  • page 5

  • page 6

  • page 7

  • page 8

  • page 9

  • page 10

  • page 11

  • page 12

  • page 13

  • page 14

  • page 15

  • page 16

  • page 17

  • page 18

  • page 19

  • page 20

  • page 21

  • page 22

  • page 23

  • page 24

  • page 25

  • page 26

  • page 27

  • page 28

  • page 29

  • page 30

  • page 31

  • page 32

  • page 33

  • page 34

  • page 35

  • page 36

  • page 37

  • page 38

  • page 39

  • page 40

  • page 41

  • page 42

  • page 43

  • page 44

  • page 45

  • page 46

  • page 47

  • page 48

  • page 49

  • page 50

  • page 51

  • page 52

  • page 53

  • page 54

  • page 55

  • page 56

  • page 57

  • page 58

  • page 59

  • page 60

  • page 61

  • page 62

  • page 63

  • page 64

  • page 65

  • page 66

  • page 67

  • page 68

  • page 69

  • page 70

  • page 71

  • page 72

Miscellaneous simple markup problems

These problems are each too minor to have their own issue. I could easily fix them, but I don't want to complicate things during merges from the section editors. Would it make sense for them to check for and implement the changes?

  1. Well-formedness error: at line 55272 there is an unescaped ampersand ("&" should be "&amp;").
  2. There are several <ref> elements with the attribute osisref; it should be osisRef, with an upper-case "R". This is a validity error against the teiP5osis.2.5.0.xsd schema.
  3. The element for sense "2" of "ἐπιούσιος" should probably enclose the following lettered sub-senses, but it is closed before "(a)".
  4. There are three instances of an extra greater-than (">") following the closing greater-than of a tag (lines 5926, 10708, 35047).
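
Section editors who want to check their own portion before merging could use something along these lines. It is a rough line-based sketch, not a replacement for a real well-formedness or schema check; the check names and the function are invented for illustration.

```python
import re

# Each check is a label and a pattern applied to one line of raw XML text.
CHECKS = {
    # "&" that does not start an entity or character reference such as &amp; or &#x3b1;
    "unescaped ampersand": re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)"),
    # attribute spelled with a lower-case "r"
    "osisref attribute": re.compile(r"\bosisref="),
    # extra ">" directly after the ">" that closes a tag
    "doubled '>'": re.compile(r">>"),
}


def lint_lines(text):
    """Return (line number, label) for every check that fires on a line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in CHECKS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings


SAMPLE = ('a &amp; b\n'
          'Smith & Sons\n'
          '<ref osisref="John.1.1">Jo 1:1</ref>\n'
          '<gloss>>x</gloss>\n')

for finding in lint_lines(SAMPLE):
    print(finding)
```

The sense-nesting problem in item 3 is structural and would still need to be checked by hand.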

Missing entry - αλλαχου

Page 21 missing entry between and , should be: αλλαχου

This will impact the numbering of entries G238 and following.

Remove soft hyphens G262, G263

... of amaranth (Inscr.); hence un­fading: I Pe 5:4.†

suspect the hyphen in un-fading was soft hyphen - correct to unfading

... un­fading (whence ὁ ἀ., the amaranth, an unfading flower): I Pe 1:4 (cf. MM, VGT, s.v.).†

suspect the hyphen in un-fading was soft hyphen - correct to unfading
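
If these really are soft hyphens (U+00AD), the same problem may exist in other entries. A small sketch for counting and stripping them, assuming the character in the file is U+00AD and not an ordinary hyphen:

```python
SOFT_HYPHEN = "\u00ad"


def count_soft_hyphens(text):
    """Number of soft hyphens (U+00AD) in the text."""
    return text.count(SOFT_HYPHEN)


def strip_soft_hyphens(text):
    """Remove every soft hyphen, leaving ordinary hyphens untouched."""
    return text.replace(SOFT_HYPHEN, "")


sample = "hence un\u00adfading: I Pe 5:4"
print(count_soft_hyphens(sample), strip_soft_hyphens(sample))
```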

'foreign[@n]' tags an English word referring to another Greek entry

In the entry for 'ἀνθ-υπατεύω', the English word 'word' is tagged with foreign, probably so as to supply it with an n attribute that refers to another entry.

<entry n="ἀνθυπατεύω|G445">
  <note type="occurrencesNT">1</note>
  <form>*† <orth>ἀνθ-υπατεύω</orth></form>
<etym>
  <seg type="derivation">(see next <foreign xml:lang="grc" n="G446">word</foreign>), </seg>
</etym>
  <sense>to be proconsul: <ref osisRef="Acts.18.12">Ac 18:12</ref> Rec. (v.s. <foreign xml:lang="grc">ἀνθύπατος</foreign>).†</sense>
</entry>

The normal usage of this construct seems to be to tag the foreign word and put the Strong's number in the @n. What seems needed here is a structure to refer to the entry.

At a future time, I will write a script to search for other examples of English tagged as foreign.
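
A first version of that search might look like the sketch below. It assumes the elements have only text content and that "Greek" means at least one character from the Greek or Greek Extended Unicode blocks; the function name is hypothetical.

```python
import re

# <foreign xml:lang="grc" ...>text</foreign> with plain text content.
FOREIGN_GRC = re.compile(r'<foreign\s+xml:lang="grc"[^>]*>([^<]*)</foreign>')
# Any character from the Greek (U+0370-03FF) or Greek Extended (U+1F00-1FFF) blocks.
GREEK_CHAR = re.compile(r"[\u0370-\u03ff\u1f00-\u1fff]")


def non_greek_foreign(xml_text):
    """Return the contents of grc-tagged <foreign> elements that contain no Greek."""
    return [match.group(1)
            for match in FOREIGN_GRC.finditer(xml_text)
            if not GREEK_CHAR.search(match.group(1))]


SAMPLE = ('<seg type="derivation">(see next <foreign xml:lang="grc" n="G446">word</foreign>), </seg>'
          '<sense>to be proconsul (v.s. <foreign xml:lang="grc">ἀνθύπατος</foreign>).</sense>')

print(non_greek_foreign(SAMPLE))
```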

Character to use for primes when Greek letters are cited as numerals

E.g. in the entry for the alpha:

<sense><gloss>alpha</gloss>, the first letter of the Greek alphabet. As a numeral, <foreign xml:lang="grc">ά</foreign> = 1,

Here the alpha has the tonos rather than the following prime; the beta in its entry has a simple apostrophe. We should use the same character throughout the dictionary.
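
To see which representations are actually in use before settling on one, a report along these lines could help. It is only a sketch: it assumes the numeral citations look like the alpha example (a short <foreign> element followed by "= digit") and that any mark sits inside the element. U+0374 is the Unicode Greek numeral sign; whether to adopt it is the decision still to be made.

```python
import re
import unicodedata
from collections import Counter

# A short Greek token followed by "= <digit>", as in the alpha entry.
NUMERAL = re.compile(r'<foreign xml:lang="grc">([^<]{1,3})</foreign>\s*=\s*\d')


def classify_mark(token):
    """Describe how the numeral mark on a cited letter is encoded."""
    if token.endswith("\u0374"):
        return "Greek numeral sign U+0374"
    if token.endswith("\u02b9"):
        return "modifier prime U+02B9"
    if token.endswith("'"):
        return "apostrophe"
    if "\u0301" in unicodedata.normalize("NFD", token):
        return "tonos or acute on the letter"
    return "no mark"


def mark_report(xml_text):
    """Count the encodings used across all numeral citations found."""
    return Counter(classify_mark(m.group(1)) for m in NUMERAL.finditer(xml_text))


SAMPLE = 'As a numeral, <foreign xml:lang="grc">\u03ac</foreign> = 1,'
print(mark_report(SAMPLE))
```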

AutoHotKey Hebrew pointing removed plus many keystrokes to speed editing

Go to site: https://autohotkey.com/
Click “Download” button, then “Installer” button that appears
To Install:
If using Chrome: the file AutoHotkey_1.1.24.03_setup.exe should appear at the bottom of the Chrome browser. Right click over that filename and select open.
If using Firefox, select the “Save File” button on the pop-up menu. Then select the Downloads arrow on the icon line and then select “Show All Downloads”. Right-click the file in the new window and select “Open Containing Folder”. Right-click that file again and select “Open”.
When Installation starts you will want to select the standard installation which will be the Unicode version for your machine type (32 or 64 bit). When installation is complete, select Exit. Windows Explorer will now be configured to use the toolset.
I have put several ahk files in the attached zip file; the file type for these must be kept as ahk. These are defined as:
RemoveHebrewPointing.ahk – Remove the Hebrew pointing of the selected Hebrew text with an Alt+. (Alt key and period) {See note below}
XMLGloss.ahk - Insert “” at the current cursor location with an Alt+g keystroke.
XMLSlashGloss.ahk - Insert “” at the current cursor location with an Alt+h keystroke.
XMLRef.ahk - Insert “” at the current cursor location with an Alt+r keystroke.
XMLSlashRef.ahk - Insert “” at the current cursor location with an Alt+t keystroke.
XMLemph.ahk - Insert “” at the current cursor location with an Alt+e keystroke.
For all but the first file, the file contents are very simple, with an initial definition of the key sequence, followed by an action, in this case a SendRaw command followed by the text that is desired for entry at the current cursor location. There is nothing magic about the key sequences that I have defined for these. You can change them as you prefer. I have found that even though the tool supports the use of the Windows key, the Windows operating system seemed to take precedence over what is defined by AHK, so I just stayed away from using that key. Once you have an ahk file configured as you desire, or the first time you use what I have attached, you will need to open Windows Explorer to the folder where you have stored the files. Right-click the appropriate ahk file and select “Run Script”. You are now set up to make use of these hotkeys.
The RemoveHebrewPointing.ahk file is a little more complex and makes use of a subroutine which does the work of removing the Hebrew vowels from the text that is in the copy-paste clipboard buffer. The first action of this file is to do a Ctrl+C (copy), which puts the highlighted text into the clipboard buffer. It then calls the subroutine and finally does a Ctrl+V (paste) to replace the selected Hebrew text with its vowels removed. This file will remove ALL Hebrew points except shin, sin, dagesh/mapiq, and sof pasuq. I did find that with a final kaf with a sof pasuq, that sof pasuq ends up really being a sheva, so this function will remove that pointing. I did not want to open up the editing to remove the sheva, because there are many places where we want that preserved. If you are able to select all but that last letter, you can apply the function to the remainder of the Hebrew word.
If you have any questions, please post them as comments against this issue. This GitHub toolset won't let me directly attach .ahk files, so I put them in a zip file/folder.

AHK_Files.zip
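
For anyone not on Windows, the pointing removal can be approximated outside AutoHotkey. The sketch below is not a port of RemoveHebrewPointing.ahk, whose exact character list I have not reproduced; it assumes "vowel points" means U+05B0 through U+05BB plus qamats qatan (U+05C7), and it leaves dagesh/mapiq (U+05BC), the shin and sin dots (U+05C1, U+05C2), and sof pasuq (U+05C3) in place.

```python
# Assumed set of vowel points to strip: sheva through qubuts, plus qamats qatan.
VOWEL_POINTS = {chr(cp) for cp in range(0x05B0, 0x05BC)} | {"\u05c7"}


def remove_hebrew_vowels(text):
    """Drop Hebrew vowel points; keep letters, dagesh, shin/sin dots, and punctuation."""
    return "".join(ch for ch in text if ch not in VOWEL_POINTS)


# shin + qamats + shin dot, lamed, vav + holam, final mem
pointed = "\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd"
print(remove_hebrew_vowels(pointed))
```

Like the hotkey version, this removes every sheva in the selection, so the same caution about words where the sheva should be preserved applies.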

Hot-Links missing for some classes of word references

There appear to be 3 classes of instances where hot-links (XML <ref...> tag-pairs) should be present for links to other words in the XML file:

  1. The use of the character "<" for "derived from" should have the following word(s) hot-linked
  2. The word(s) following the "SYN" keyword should be hot-linked
  3. (This needs a confirmation from Todd) The use of the character "=" for "equal to" with following Greek, and not English, should have those following word(s) hot-linked

For each of these cases, only the referenced words that exist in this XML should be hot-linked, otherwise they should NOT BE modified to have the XML <ref...> tag-pair.

This could be an automated task by an XML-smart tool to check for compliance, and to update with the tag-pair, where needed
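
As a starting point for the first class ("<" for "derived from"), a report of link candidates could be sketched as follows. It assumes headwords can be read from the first field of entry/@n, that the derived-from word is a <foreign> element directly after "&lt;", and that both are spelled identically (no Unicode normalization is attempted). The sample entries are invented and the function name is hypothetical.

```python
import re

# Headword = first "|"-separated field of the entry's n attribute.
ENTRY_N = re.compile(r'<entry n="([^"|]+)')
# Greek word directly following the "<" (written &lt;) derivation marker.
DERIVED_FROM = re.compile(r'&lt;\s*<foreign xml:lang="grc">([^<]+)</foreign>')


def link_candidates(xml_text):
    """Return derived-from words that also exist as headwords in the same file."""
    headwords = {match.group(1) for match in ENTRY_N.finditer(xml_text)}
    return [word for word in DERIVED_FROM.findall(xml_text) if word in headwords]


# Invented sample: the first derivation has a matching entry, the second does not.
SAMPLE = ('<entry n="συγκυρέω"></entry>'
          '<entry n="συγκυρία"><seg type="derivation">'
          '(&lt; <foreign xml:lang="grc">συγκυρέω</foreign>)</seg></entry>'
          '<entry n="κύριος"><seg type="derivation">'
          '(&lt; <foreign xml:lang="grc">κῦρος</foreign>)</seg></entry>')

print(link_candidates(SAMPLE))
```

Words that are not headwords are simply left out of the report, which matches the rule above that they should not be modified.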

Pi (all A-S pages with entries starting with the letter pi)

  • page 332

  • page 333

  • page 334

  • page 335

  • page 336

  • page 337

  • page 338

  • page 339

  • page 340

  • page 341

  • page 342

  • page 343

  • page 344

  • page 345

  • page 346

  • page 347

  • page 348

  • page 349

  • page 350

  • page 351

  • page 352

  • page 353

  • page 354

  • page 355

  • page 356

  • page 357

  • page 358

  • page 359

  • page 360

  • page 361

  • page 362

  • page 363

  • page 364

  • page 365

  • page 366

  • page 367

  • page 368

  • page 369

  • page 370

  • page 371

  • page 372

  • page 373

  • page 374

  • page 375

  • page 376

  • page 377

  • page 378

  • page 379

  • page 380

  • page 381

  • page 382

  • page 383

  • page 384

  • page 385

  • page 386

  • page 387

  • page 388

  • page 389

  • page 390

  • page 391

  • page 392

  • page 393

  • page 394

  • page 395

typos in note/@type

προφήτης has type="occurrencesnT"
ὧδε has type="occurrencstNT"
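
Other misspellings of the same attribute value may be hiding elsewhere. A quick way to list them, sketched under the assumption that "occurrencesNT" is the only expected value checked here (other legitimate note types would need to be added to the expected list):

```python
import re
from collections import Counter

NOTE_TYPE = re.compile(r'<note\s+type="([^"]*)"')


def note_type_counts(xml_text):
    """Count every distinct note/@type value in the raw XML text."""
    return Counter(NOTE_TYPE.findall(xml_text))


def unexpected_note_types(xml_text, expected=("occurrencesNT",)):
    """Return the type values that are not in the expected list, sorted."""
    return sorted(t for t in note_type_counts(xml_text) if t not in expected)


SAMPLE = ('<note type="occurrencesNT">1</note>'
          '<note type="occurrencesnT">2</note>'
          '<note type="occurrencstNT">3</note>')

print(unexpected_note_types(SAMPLE))
```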
