translatable-exegetical-tools / abbott-smith

Abbott-Smith's Manual Greek Lexicon

PHP 2.37% CSS 2.14% XSLT 3.05% JavaScript 81.46% HTML 0.32% Shell 0.53% Perl 8.89% Python 0.93% Batchfile 0.32%

abbott-smith's Introduction

Abbott-Smith - Summary

Abbott-Smith is a project to mark up G. Abbott-Smith's A Manual Greek Lexicon of the New Testament (New York: Scribner's, 1922) using TEI.

Source and Copyright

The PDF file with a text layer (manualgreeklexic00abborich.pdf) was obtained from http://archive.org/details/manualgreeklexic00abborich. Certain restrictions apply to the use of this file. These are included in the PDF file.

The lexicon (abbott-smith.tei.xml), including the marked up version in this repository, is in the public domain.

Viewing and Downloading

The main file in this repository is abbott-smith.tei.xml.

To use the lexicon, download any release from the Releases page.

Also, a module for the SWORD Library is available from CrossWire.

Contributors

The work of marking up and checking the text is complete. Many thanks to all those who devoted time and expertise:

Daniel Owens
Dardo Sordi
Chuck Bearden
Patrick Durusau
Jonathan Robie
Stephen Hughes (aka Στέφανος)
David Statezni
Bram vandenHeuvel
Drew Curley
Chapel Presson
Todd L. Price

Markup Information

All text from the lexicon is marked up using CrossWire.org's iteration of TEI XML, which supports several features of OSIS XML that are relevant to biblical studies (especially biblical references). For helpful documentation on this iteration of TEI, see http://www.crosswire.org/wiki/TEI_Dictionaries. For the schema definition, see http://www.crosswire.org/OSIS/teiP5osis.1.4.xsd. For detailed documentation on TEI dictionaries, see http://www.tei-c.org/release/doc/tei-p5-doc/en/html/DI.html.

Changelog

2017/02/03 - Release of v. 1.0: This release comes thanks to the work of David Statezni, Bram vandenHeuvel, Drew Curley, Chapel Presson, and Todd L. Price to complete the markup and checking of the lexicon. The lexicon is complete.

2013/12/12 - Release of v. 0.5: This release contains the majority of entries in Abbott-Smith, some of which have been marked up very carefully but many others that require manual editing. The release includes numerous fixes to the data from DSAW's version, particularly correcting references and restoring <div> elements and page numbers. Also added: <etym> for etymological data and <re> for related entry information (mostly for synonyms). Many thanks to Jonathan Robie and Patrick Durusau for collating the data and to Dardo Sordi for working countless hours to improve the data. From this point forward we will use the Github release feature since there is no longer any nonsense OCR text to remove before release. However, much editing remains to be done, and there may be errors. Note the Total entries: 5,726. Total pages checked: 4/526.

2013/12/04 - Replaced all missing entries with entries generated from DSAW's version using an XQuery, added Hebrew text to DSAW's entries. Initially complete. We still plan to restructure the entries and verify a number of things.

2012/12/12 - Release of v. 0.15: Includes pages iii-16 and entries for words occurring 100 times or more in the Greek NT. Total entries: 555.

2012/10/01 - Release of v. 0.14: Includes pages iii-9 and entries for words occurring 100 times or more in the Greek NT. Also moved markup instructions to markdown file instead of PDF. Total entries: 299.

2012/09/07 - Release of v. 0.13: Includes pages iii-5 and entries for words occurring 200 times or more in the Greek NT. Total entries: 148.

2012/09/01 - Release of v. 0.12: Includes pages iii-4 and entries for words occurring 300 times or more in the Greek NT. Total entries: 110.

2012/08/07 - Release of v. 0.11: Includes pages iii-4 and entries for words occurring 500 times or more in the Greek NT. Also changed to using <gloss> instead of <def>. Many thanks to Dardo Sordi for corrections and additional entries. Total entries: 85.

2012/07/27 - First Release (v. 0.1): Includes pages iii-3 and entries for words occurring 1,000 times or more in the Greek NT. Total entries: 50.

2012/05/10 - Moved markup instructions to PDF file

2012/05/09 - Initial upload with frontmatter and page numbers marked up

abbott-smith's People

Contributors

Stargazers

Watchers

Forkers

dardosordi cbearden gregorycrane pdurusau biblicalhumanities jag3773 bramvandenheuvel drew-curley cpresson viktor-zhuromskyy emg destatez freely-given-org standardgalactic helmadik mrgreekgeek lordfrishetti1

abbott-smith's Issues

Final Pages (final pages of A-S)

Kappa (all A-S pages with entries starting with the letter kappa)

ἔπειτα missing entry

The entry for ἔπειτα seems to be missing.

Gamma (all A-S pages with entries starting with the letter gamma)

Mu (all A-S pages with entries starting with the letter mu)

<gloss> versus <emph>

I may have missed it, but a "command decision" should be made as to when to use <gloss> versus <emph>. There seem to be 3 classes of instances of italicization in the A-S, with the 2nd being the most straight-forward use for <gloss>. An analysis function could be developed to analyze and report the instances of these classes.

Within a derivation clause (example uses <gloss>):
<seg type="derivation">(< <foreign xml:lang="grc">συγκυρέω</foreign>, <gloss>to happen</gloss>), </seg>
Within a sense clause, particularly when immediately following that XML tag:
<sense><gloss>chance, coincidence</gloss>: <foreign xml:lang="grc">κατὰ σ.</foreign> (v. MM, xxiii), <ref osisRef="Luke.10.31">Lk 10:31</ref> (Hippocr., Eccl.).†</sense>
Within a sense clause, but within parenthesis that identify either RV or AS as the source (first example uses <gloss>, second has neither):
... (RV,<gloss>exact wrongfully</gloss>;... or ...(AV, comforter;...
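
The analysis function suggested above could start from something like the following sketch. It only separates <gloss> and <emph> elements found in a derivation clause from those found in a sense clause; telling the third class apart (parentheses citing RV or AV) would additionally need a look at the text preceding each element. The sample entry and the names local and italic_contexts are made up for illustration and are not part of the repository.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up composite entry for illustration; not copied from the lexicon file.
SAMPLE = """<entry n="sample">
  <etym><seg type="derivation">(&lt; <foreign xml:lang="grc">συγκυρέω</foreign>, <gloss>to happen</gloss>), </seg></etym>
  <sense><gloss>chance, coincidence</gloss> (AV, <emph>comforter</emph>)</sense>
</entry>"""


def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def italic_contexts(root):
    """Count <gloss>/<emph> elements by the kind of clause that contains them."""
    counts = Counter()

    def walk(elem, ancestors):
        name = local(elem.tag)
        if name == "seg" and elem.get("type") == "derivation":
            name = "derivation"
        if name in ("gloss", "emph"):
            if "derivation" in ancestors:
                context = "derivation"
            elif "sense" in ancestors:
                context = "sense"
            else:
                context = "other"
            counts[(name, context)] += 1
        for child in elem:
            walk(child, ancestors + [name])

    walk(root, [])
    return counts


counts = italic_contexts(ET.fromstring(SAMPLE))
for (tag, context), total in sorted(counts.items()):
    print(tag, context, total)
```

Run against the full abbott-smith.tei.xml, the same counts would show how often each tag is currently used in each kind of clause, which may help with the decision.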

Typo in electronic text - <entry n="ἀνάδειξις|G323">

a sheaving forth, announcement:

The 1922 edition reads: a shewing forth, announcement:

sheaving -> shewing

a number of issues I hit with the entry/@n values

Stray iota subscript
-γαστήρͅ|G1064
+γαστήρ|G1064

Stray semicolon
-διαβλέπω;|G1227
+διαβλέπω|G1227

I presume the n attribute shouldn't contain the hyphens (but not sure intention)
-ἐκ-κρέμαννυμι
+ἐκκρέμαννυμι

Stray period
-Ἑσρώμ.|G2074|Ἑσρώμ
+Ἑσρώμ|G2074|Ἑσρώμ

Not sure if parenthetical μήν should be in n attribute
-ἦ(μήν)|G2229
+ἦ|G2229

Stray period
-ὅπως.|G3704
+ὅπως|G3704

OCR problem?
-πυνθώ'ομαι|G4441
+πυνθάνομαι|G4441
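
Problems of this kind can probably be found mechanically. The sketch below checks the headword part of an entry/@n value (everything before the first "|") and flags any character that is not a letter or combining mark, plus an iota subscript that does not follow α, η, or ω. The function name check_headword is hypothetical, and the rules are assumptions drawn from the cases listed above, not an agreed project policy.

```python
import unicodedata

IOTA_SUBSCRIPT = "\u0345"  # COMBINING GREEK YPOGEGRAMMENI


def check_headword(n_value):
    """Return a list of suspected problems in the headword part of an @n value."""
    headword = n_value.split("|", 1)[0]
    problems = []
    base = None  # most recent base letter seen
    for ch in unicodedata.normalize("NFD", headword):
        category = unicodedata.category(ch)
        if category.startswith("L"):
            base = ch
        elif category.startswith("M"):
            # An iota subscript is only expected under alpha, eta, or omega.
            if ch == IOTA_SUBSCRIPT and (base is None or base.lower() not in "αηω"):
                problems.append("iota subscript after %r" % base)
        else:
            problems.append("unexpected character %r" % ch)
    return problems


for value in ["γαστήρ\u0345|G1064", "διαβλέπω;|G1227", "ἐκ-κρέμαννυμι", "ὅπως|G3704"]:
    print(value, check_headword(value))
```

A case like ἦ(μήν) is flagged because of the parentheses; whether it is actually wrong remains an editorial decision.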

A-S Pages Cleanup

osisRefs for I Ki & II Ki

The osisRef for I Ki and II Ki can either point to 1Sam and 2Sam, respectively, if the context is LXX, or point to 1King and 2King if the context is the Hebrew text. An analysis function should be developed which can either determine whether the mapping is correct for each instance (there are both in the A-S), or generate a report that would be analyzed by an editor to determine correctness of mapping.
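
For the report option, a minimal sketch could list every <ref> whose visible label begins with "I Ki" or "II Ki" together with the book its osisRef points to, so that an editor can review each mapping. The sample references below are invented, the OSIS book codes used are 1Sam and 2Kgs, and the function name kings_report is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Invented references, for illustration only.
SAMPLE = """<sense>
  <ref osisRef="1Sam.17.4">I Ki 17:4</ref>,
  <ref osisRef="2Kgs.2.1">II Ki 2:1</ref>,
  <ref osisRef="John.1.1">Jo 1:1</ref>
</sense>"""

# Matches labels starting with "I Ki" or "II Ki", but not "III Ki" or "IV Ki".
KINGS_LABEL = re.compile(r"^(I|II)\s+Ki\b")


def kings_report(root):
    """Return (label, osis_book) pairs for every I Ki / II Ki reference."""
    rows = []
    for elem in root.iter():
        if elem.tag.rsplit("}", 1)[-1] != "ref":
            continue
        label = "".join(elem.itertext()).strip()
        if KINGS_LABEL.match(label):
            book = elem.get("osisRef", "").split(".", 1)[0]
            rows.append((label, book))
    return rows


rows = kings_report(ET.fromstring(SAMPLE))
for label, book in rows:
    print(label, "->", book)
```

Deciding whether each context is LXX or Hebrew would still be done by the editor reading the report.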

How to tag subscripts after Bible books

The list of abbreviations in the PDF (p. XVI) says this about subscripted numbers after the names of Bible books:

“An inferior numeral after a biblical book (e.g. III Mac 6) indicates the number of times a word occurs in that book.”
So in the entry for 'ἀγών|G73', we have the following:
II Mac <hi rend="subscript">6</hi>, IV Mac <hi rend="subscript">5</hi>
Should we tag these also as osisRef elements? The above would become e.g.

<ref osisRef="2Macc">II Mac <hi rend="subscript">6</hi></ref>, <ref osisRef="4Macc">IV Mac <hi rend="subscript">5</hi></ref>

I've tagged a couple of instances this way before looking for precedents in what has already been tagged, but these should be easy to detect and change if we decide not to make them osisRef elements. I just haven't thought this through, and I don't know if it was decided already.

Representation of consecutive verses in references

For 2 consecutive verses in a reference, the scan process has a representation that is a “valid” representation of the XML, but varies from the A-S PDF, by replacing the comma between the two verses with a dash. There are two other ways to specify this which are both “valid” reference syntax, as well as “identical” representation of the A-S PDF. Below is the scan presentation and then both of the other representations of John 13: 26 & 27. (The markdown language of this issue buffer would not let me show the underscoring in the visualization, where the entire string shown would be underlined, except that for representation 2 the space after the comma would NOT be underlined.)

I have been converting most instances to representation 2. I am not sure whether others have left the scan representation AS-IS, or whether they have chosen representation 1 or 2. If it is decided that the scan representation should be corrected, at the end of “editing” we can search for instances of it, and correct those to match either representation 1 or 2. The choice of either of those is another decision that needs to be made.

Scan Representation:
XML: <ref osisRef="John.13.26-John.13.27">Jo 13:26-27</ref>
Visualization: Jo 13:26-27

Representation 1:
XML: <ref osisRef="John.13.26-John.13.27">Jo 13:26, 27</ref>
Visualization: Jo 13:26, 27

Representation 2:
XML: <ref osisRef="John.13.26">Jo 13:26</ref>, <ref osisRef="John.13.27">27</ref>
Visualization: Jo 13:26, 27
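
The search for remaining instances at the end of editing could look roughly like the sketch below. It finds <ref> elements whose osisRef is a range of two consecutive verses and reports whether the visible text uses a dash (the scan representation) or not (representation 1). Representation 2 uses two separate <ref> elements and is therefore not reported. The function name two_verse_ranges is hypothetical.

```python
import xml.etree.ElementTree as ET

# The three representations from this issue, wrapped in one element.
SAMPLE = """<sense>
  <ref osisRef="John.13.26-John.13.27">Jo 13:26-27</ref>
  <ref osisRef="John.13.26-John.13.27">Jo 13:26, 27</ref>
  <ref osisRef="John.13.26">Jo 13:26</ref>, <ref osisRef="John.13.27">27</ref>
</sense>"""


def two_verse_ranges(root):
    """Return (text, style) for refs whose osisRef spans two consecutive verses."""
    found = []
    for elem in root.iter():
        if elem.tag.rsplit("}", 1)[-1] != "ref":
            continue
        start, sep, end = elem.get("osisRef", "").partition("-")
        if not sep:
            continue  # not a range
        start_prefix, _, start_verse = start.rpartition(".")
        end_prefix, _, end_verse = end.rpartition(".")
        if (
            start_prefix == end_prefix
            and start_verse.isdigit()
            and end_verse.isdigit()
            and int(end_verse) == int(start_verse) + 1
        ):
            text = "".join(elem.itertext())
            found.append((text, "dash" if "-" in text else "no dash"))
    return found


found = two_verse_ranges(ET.fromstring(SAMPLE))
for text, style in found:
    print(style, text)
```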

Theta (all A-S pages with entries starting with the letter theta)

Need to update list of contributors

In the /TEI/teiHeader[1]/encodingDesc[1] element, we need to add the names of all the new contributors (Dave Statezni, Chapel Presson, et al.) who don't need to remain anonymous.

Rho (all A-S pages with entries starting with the letter rho)

page 396
page 397
page 398
page 399

Iota (all A-S pages with entries starting with the letter iota)

ποία - missing entry

My paper copy has a separate entry for ποία that is missing in the online version.

Scan omitted Grammar tagging in many instances

We should identify below all of the grammar abbreviations that occur and should have grammar tagging around them, e.g. adv. for an adverb. A script could then do a global replace (adding the tagging) for each instance that is not already tagged. The list of abbreviations can be extracted from section "I. GENERAL." at the beginning of the XML file.

Most of the current instances of tagging occur after the <form...> tag-pair and the <etym...> tag-pair and before the first <sense...> tag-pair, but there are also instances that are part of the contents of a <sense...> tag-pair. A decision will need to be made when developing and running this script: should the "replacements" be made only before the <sense...> tag-pair, or wherever the abbreviations occur?

Chi (all A-S pages with entries starting with the letter chi)

Nu (all A-S pages with entries starting with the letter nu)

Sigma (all A-S pages with entries starting with the letter sigma)

Epsilon (all A-S pages with entries starting with the letter epsilon)

Xi (all A-S pages with entries starting with the letter xi)

page 308

Alpha (all A-S pages with entries starting with the letter alpha)

*** Pages already checked off for this issue have been checked by another contributor**

εἶτα - missing entry

The entry for εἶτα seems to be missing

XML correctness of having 2 different n="Hxxxx" subclauses within one foreign tag-pair

Our XML "Editor" / viewer flagged the instance of the subject line as an error. The XML line is shown in the attached file. Is this really an XML error? @cbearden

Possible_Issues.txt

Miscellaneous simple markup problems

These are problems too minor each to have their own issues. I could easily fix them, but I don't want to complicate things during merges from the section editors. Would it make sense for them to check for and implement the changes?

Well-formedness error: at line 55272 there is an unescaped ampersand ("&" should be "&amp;").
There are several <ref> elements with the attribute osisref; it should be osisRef, with an upper-case "R". This is a validity error against the teiP5osis.2.5.0.xsd schema.
The element for sense "2" of "ἐπιούσιος" should probably enclose the following lettered sub-senses, but it is closed before "(a)".
There are three instances of an extra greater-than (">") following the closing greater-than of a tag (lines 5926, 10708, 35047).
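
Section editors who want to check for these problems before merging could use a line-based scan along the lines of this sketch. It works on the raw text rather than parsed XML, since an unescaped ampersand would stop a parser at the first error. The patterns and the name scan are illustrative assumptions and may produce false positives.

```python
import re

CHECKS = {
    "lower-case osisref attribute": re.compile(r"\bosisref="),
    "extra > after a tag": re.compile(r"<[^<>]+>>"),
    "unescaped ampersand": re.compile(r"&(?!#?\w+;)"),
}

# Invented lines, for illustration only.
SAMPLE = "\n".join([
    '<ref osisref="John.1.1">Jo 1:1</ref>',
    "<gloss>word</gloss>> and more",
    "Liddell & Scott",
    'A &amp; B, <ref osisRef="Luke.1.1">Lk 1:1</ref>',
])


def scan(text):
    """Return (line_number, problem) pairs for each suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), 1):
        for label, pattern in CHECKS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings


findings = scan(SAMPLE)
for lineno, label in findings:
    print(lineno, label)
```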

Beta (all A-S pages containing the letter beta)

@drew-curley

Psi (all A-S pages with entries starting with the letter psi)

page 487
page 488
page 489

Upsilon (all A-S pages with entries starting with the letter upsilon)

Omega (all A-S pages with entries starting with the letter omega)

page 490
page 491

Missing entry - αλλαχου

Page 21 missing entry between and , should be: αλλαχου

This will impact the numbering of entries G238 and following.

Remove soft hyphens G262, G263

... of amaranth (Inscr.); hence unfading: I Pe 5:4.†

suspect the hyphen in un-fading was soft hyphen - correct to unfading

... unfading (whence ὁ ἀ., the amaranth, an unfading flower): I Pe 1:4 (cf. MM, VGT, s.v.).†

suspect the hyphen in un-fading was soft hyphen - correct to unfading

'foreign[@n]' tags an English word referring to another Greek entry

In the entry for 'ἀνθ-υπατεύω', the English word 'word' is tagged with foreign, probably so as to supply it with an n attribute that refers to another entry.

<entry n="ἀνθυπατεύω|G445">
  <note type="occurrencesNT">1</note>
  <form>*† <orth>ἀνθ-υπατεύω</orth></form>
<etym>
  <seg type="derivation">(see next <foreign xml:lang="grc" n="G446">word</foreign>), </seg>
</etym>
  <sense>to be proconsul: <ref osisRef="Acts.18.12">Ac 18:12</ref> Rec. (v.s. <foreign xml:lang="grc">ἀνθύπατος</foreign>).†</sense>
</entry>

The normal usage of this construct seems to be to tag the foreign word and put the Strong's number in the @n. What seems needed here is a structure to refer to the entry.

At a future time, I will write a script to search for other examples of English tagged as foreign.
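
One possible shape for that script, as a sketch only: report every <foreign xml:lang="grc"> element whose text contains no Greek characters. The Greek test is a simple range check on the Greek and Greek Extended Unicode blocks, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The entry quoted in this issue.
SAMPLE = """<entry n="ἀνθυπατεύω|G445">
  <note type="occurrencesNT">1</note>
  <form>*† <orth>ἀνθ-υπατεύω</orth></form>
<etym>
  <seg type="derivation">(see next <foreign xml:lang="grc" n="G446">word</foreign>), </seg>
</etym>
  <sense>to be proconsul: <ref osisRef="Acts.18.12">Ac 18:12</ref> Rec. (v.s. <foreign xml:lang="grc">ἀνθύπατος</foreign>).†</sense>
</entry>"""


def has_greek(text):
    """True if the text contains a character from the Greek or Greek Extended blocks."""
    return any("\u0370" <= ch <= "\u03ff" or "\u1f00" <= ch <= "\u1fff" for ch in text)


def non_greek_foreign(root):
    """Return (text, n) for Greek-tagged <foreign> elements without Greek text."""
    hits = []
    for elem in root.iter():
        if elem.tag.rsplit("}", 1)[-1] == "foreign" and elem.get(XML_LANG) == "grc":
            text = "".join(elem.itertext())
            if not has_greek(text):
                hits.append((text, elem.get("n")))
    return hits


hits = non_greek_foreign(ET.fromstring(SAMPLE))
print(hits)
```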

Typo in original, correct in electronic version?

Should we correct the typo:

I Pe 1:19; etbically,

etbically -> ethically

Appears as "etbically" in the 1922 edition.

Common scan omissions that could be detected and replaced by a script

The cases below are examples where a script could be written to correct the omissions from the scan process:

" acc " should be " acc. "
" in l)" should be " in l.)"
" acc," should be " acc.,"
" acc)" should be " acc.)"

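The four cases above can be expressed as two regular-expression substitutions, sketched below. This is an illustration under assumptions: it should be applied to text content only (not to markup or attribute values), and the name restore_periods is hypothetical.

```python
import re

FIXES = [
    # " acc " / " acc," / " acc)"  ->  " acc. " / " acc.," / " acc.)"
    (re.compile(r"(?<= )acc(?=[ ,)])"), "acc."),
    # " in l)"  ->  " in l.)"
    (re.compile(r"(?<= )in l(?=\))"), "in l."),
]


def restore_periods(text):
    """Restore periods that the scan dropped after 'acc' and 'in l'."""
    for pattern, replacement in FIXES:
        text = pattern.sub(replacement, text)
    return text


print(restore_periods("c. acc, of person"))
print(restore_periods("(so in l)"))
```

Already-correct text such as "c. acc. pers." is left alone, because the pattern only matches when "acc" is followed directly by a space, comma, or closing parenthesis.
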
Character to use for primes when Greek letters are cited as numerals

E.g. in the entry for the alpha:

<sense><gloss>alpha</gloss>, the first letter of the Greek alphabet. As a numeral, <foreign xml:lang="grc">ά</foreign> = 1,

Here the alpha has the tonos rather than the following prime; the beta in its entry has a simple apostrophe. We should use the same character throughout the dictionary.
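
If one character is chosen, a small helper could apply it consistently. The sketch below assumes the choice is U+0374 (GREEK NUMERAL SIGN); that is only one candidate and not a project decision. It takes a single-letter numeral token, drops an acute accent or a trailing apostrophe or prime, and appends the numeral sign. The name normalize_numeral is hypothetical.

```python
import unicodedata

KERAIA = "\u0374"  # GREEK NUMERAL SIGN (assumed target character)
ACUTE = "\u0301"   # combining acute accent, what tonos/oxia decompose to
PRIME_LIKE = "'\u2019\u2032\u02b9\u0374"  # apostrophes and primes to strip


def normalize_numeral(token):
    """Rewrite a Greek-letter numeral token so it ends in the numeral sign."""
    stripped = token.rstrip(PRIME_LIKE)
    decomposed = unicodedata.normalize("NFD", stripped)
    letter = unicodedata.normalize("NFC", decomposed.replace(ACUTE, ""))
    # The sign is appended after normalization on purpose: Unicode
    # normalization replaces U+0374 with U+02B9.
    return letter + KERAIA


print(normalize_numeral("\u03ac"))   # alpha with tonos
print(normalize_numeral("\u03b2'"))  # beta followed by an apostrophe
```

The helper should only be applied to tokens already identified as numerals, since it would also strip the accent from an ordinary accented word.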

Lambda (all A-S pages with entries starting with the letter lambda)

Phi (all A-S pages with entries starting with the letter phi)

AutoHotKey Hebrew pointing removed plus many keystrokes to speed editing

Go to site: https://autohotkey.com/
Click “Download” button, then “Installer” button that appears
To Install:
If using Chrome: the file AutoHotkey_1.1.24.03_setup.exe should appear at the bottom of the Chrome browser. Right click over that filename and select open.
If using Firefox select the “Save File” button on the pop-up menu. Then select the Downloads arrow on the icon line and then select “Show All Downloads”. Right-click the file in the new window and select “Open Containing Folder”. Right-click that file again and select “Open”.
When Installation starts you will want to select the standard installation which will be the Unicode version for your machine type (32 or 64 bit). When installation is complete, select Exit. Windows Explorer will now be configured to use the toolset.
I have put several ahk files in the attached zip file; the file type for these must be maintained as ahk. These are defined as:
RemoveHebrewPointing.ahk – Remove the Hebrew pointing of the selected Hebrew text with an Alt+. (Alt key and period) {See note below}
XMLGloss.ahk - Insert “” at the current cursor location with an Alt+g keystroke.
XMLSlashGloss.ahk - Insert “” at the current cursor location with an Alt+h keystroke.
XMLRef.ahk - Insert “” at the current cursor location with an Alt+r keystroke.
XMLSlashRef.ahk - Insert “” at the current cursor location with an Alt+t keystroke.
XMLemph.ahk - Insert “” at the current cursor location with an Alt+e keystroke.
For all but the first file, the file contents are very simple, with an initial definition of the key sequence, followed by an action, in this case a SendRaw command followed by the text that is desired for entry at the current cursor location. There is nothing magic about the key sequences that I have defined for these. You can change them as you prefer. I have found that even though the tool supports the use of the Windows key, the Windows operating system seemed to take precedence over what is defined by AHK, so I just stayed away from using that key. Once you have an ahk file configured as you desire, or the first time you use what I have attached, you will need to open up the Windows Explorer to the folder where you have stored them. Right-click the appropriate ahk file and select “Run Script”. You are now set up to make use of these hotkeys.
The RemoveHebrewPointing.ahk file is a little more complex and makes use of a subroutine which does the work of removing the Hebrew vowels for the text that is in the copy-paste clipboard buffer. The first action of this file is to do a Ctrl+C (copy), which puts the highlighted text into the clipboard buffer. It then calls the subroutine and finally does a Ctrl+V (paste) to replace the selected Hebrew text with its vowels removed. This file will remove ALL Hebrew points except shin, sin, dagesh/mapiq, and sof pasuq. I did find that with a final kaf with a sof pasuq, that sof pasuq ends up really being a sheva, so this function will not remove that pointing. I did not want to open up the editing to remove the sheva, because there are many places where we want that preserved. If you are able to select all but that last letter, you can apply the function to the remainder of the Hebrew word.
If you have any questions, please post them as comments against this issue. This GitHub toolset won't let me directly attach .ahk files, so I put them in a Zip file/folder.

AHK_Files.zip

Eta (all A-S pages with entries starting with the letter eta)

page 197
page 198
page 199
page 200

Zeta (all A-S pages with entries starting with the letter zeta)

page 193
page 194
page 195
page 196

Hot-Links missing for some classes of word references

There appear to be 3 classes of instances where hot-links (XML <ref...> tag-pairs) should be present for links to other words in the XML file:

The use of the character "<" for "derived from" should have the following word(s) hot-linked
The word(s) following the "SYN" keyword should be hot-linked
(This needs a confirmation from Todd) The use of the character "=" for "equal to" with following Greek, and not English, should have those following word(s) hot-linked

For each of these cases, only the referenced words that exist in this XML should be hot-linked, otherwise they should NOT BE modified to have the XML <ref...> tag-pair.

This could be an automated task by an XML-smart tool to check for compliance, and to update with the tag-pair, where needed
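
As a starting point for the compliance check, the sketch below handles only the first class (words following "<" in a derivation clause). It collects the headwords from entry/@n and reports which Greek words inside derivation segments match an existing entry; adding the <ref...> tag-pair is left out. The sample entries are simplified illustrations, matching is exact string comparison (no accent or hyphen normalization), and the function name linkable_derivations is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified illustrative entries; not copied from the lexicon file.
SAMPLE = """<div>
  <entry n="λέγω|G3004"><sense>to say</sense></entry>
  <entry n="λόγος|G3056">
    <etym><seg type="derivation">(&lt; <foreign xml:lang="grc">λέγω</foreign>), </seg></etym>
  </entry>
  <entry n="sample">
    <etym><seg type="derivation">(&lt; <foreign xml:lang="grc">συγκυρέω</foreign>), </seg></etym>
  </entry>
</div>"""


def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def linkable_derivations(root):
    """Return derivation words that match the headword of an existing entry."""
    headwords = {
        elem.get("n", "").split("|", 1)[0]
        for elem in root.iter()
        if local(elem.tag) == "entry"
    }
    hits = []
    for seg in root.iter():
        if local(seg.tag) == "seg" and seg.get("type") == "derivation":
            for child in seg.iter():
                if local(child.tag) == "foreign":
                    word = (child.text or "").strip()
                    if word in headwords:
                        hits.append(word)
    return hits


hits = linkable_derivations(ET.fromstring(SAMPLE))
print(hits)
```

Words that do not appear as a headword, such as συγκυρέω in the sample, are skipped, which matches the rule that only words existing in this XML should be hot-linked.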

2 Ἀλεξανδρεύς, -έως, ὁ an Alexandrian: Ac 6:9 18:24.†

be on page 19 and not 20?

Soft hyphen <entry n="ἀνακαίνωσις|G342">

SYN.: παλινγενεσία, in NT, new birth, of which ἀ. is the conse-quent renewal

suspect soft hyphen at conse-quent -> consequent
