cost-eltec / eltec-fra

French novel collection for the ELTeC (European Literary Text Collection)

Makefile 100.00%

french novel xml-tei

eltec-fra's Issues

FRA05702: Fix basic errors

Missing chapter divs.
Check split paragraphs.
Fix spelling / OCR errors generally.

This is definitely a long-term request, but I just noticed that only 8 of the files contain links to their print source page images in Gallica. I am sure others are available, and it would be nice to see their ARKs too, one day...

Replace novels to increase variety

A number of novels should be replaced with others. Only 10 authors should be represented by 3 novels each (30 novels in total). The remaining 70 novels should, as far as possible, each be written by a different author.
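
A minimal sketch of how the target distribution could be checked, assuming a plain list with one author name per novel is available (how that list is extracted from the files is not shown here, and the function name is hypothetical):

```python
from collections import Counter


def check_author_balance(authors, max_triples=10):
    """Check the proposed author distribution.

    `authors` holds one author name per novel (hypothetical input).
    Returns (ok, triples, offenders).
    """
    counts = Counter(authors)
    # Authors represented by exactly three novels.
    triples = sorted(a for a, n in counts.items() if n == 3)
    # Any count other than 1 or 3 does not fit the proposed scheme.
    offenders = sorted(a for a, n in counts.items() if n not in (1, 3))
    ok = len(triples) <= max_triples and not offenders
    return ok, triples, offenders
```

The `offenders` list would then point to the authors whose novels are candidates for replacement.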

author@ref syntax

Where you want to supply codes for an author in two different authority files, pointers should be separated by white space, not semicolons. For example, in FRA06501_Gyp.xml:

<author ref="viaf:96218572;wikidata:Q199228">Gyp ... </author>

should be

<author ref="viaf:96218572 wikidata:Q199228">Gyp ... </author>

See definition for this element in the schema.
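
The fix could be applied across files with something like the following. This is only a sketch: it works on the raw file text, assumes the attribute is written with double quotes, and both function names are made up for illustration.

```python
import re

# Matches <author ... ref="..."> and captures the attribute value.
AUTHOR_REF = re.compile(r'(<author\b[^>]*\bref=")([^"]*)(")')


def normalize_ref(ref):
    """Split on semicolons and/or whitespace, rejoin with single spaces."""
    return " ".join(part for part in re.split(r"[;\s]+", ref) if part)


def fix_author_refs(xml_text):
    """Rewrite every author/@ref value so pointers are space-separated."""
    return AUTHOR_REF.sub(
        lambda m: m.group(1) + normalize_ref(m.group(2)) + m.group(3),
        xml_text,
    )
```

Working on the text rather than a parsed tree leaves the rest of the file byte-for-byte unchanged, which keeps diffs small.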

All files: retrieve and add/check reprintCount category

Run (and possibly improve) the WorldCat script
Check, revise, and add the reprintCount category label
Upload all reprintCount data to the repo as well

All files: one round of pretty-printing

Apply the same pretty-printing to all files.

FRA01702: remove linebreaks inside p
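
One possible way to do this, sketched with the standard library. It assumes the files use the TEI namespace and that collapsing each line break (plus surrounding indentation) into a single space is acceptable; the function name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # assumed namespace of the files
LINEBREAK = re.compile(r"\s*\n\s*")


def collapse_linebreaks(root):
    """Replace line breaks inside every <p> with a single space, in place."""
    for p in root.iter(f"{{{TEI_NS}}}p"):
        for node in p.iter():  # p itself and all of its descendants
            if node.text:
                node.text = LINEBREAK.sub(" ", node.text)
            # The tail of p itself lies outside the paragraph, so skip it.
            if node is not p and node.tail:
                node.tail = LINEBREAK.sub(" ", node.tail)
```

Whitespace between paragraphs is left alone, so the overall layout of the file is not affected.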

FRA01601: Encode notes using "ref"

FRA01701: remove linebreaks inside p

FRA00401: issue with footnotes (LB)

In text FRA00401 the footnotes
a) are all in a div type=liminal instead of a div type=notes
b) are not tagged as elements
c) do not appear to be linked from the text (i.e. I can't find any elements pointing at them)
I don't know how pervasive this problem is.

Several files are invalid : please fix

Mostly caused by <milestone/> without a @unit
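
To locate the affected files before running full validation, a quick check along these lines might help. It is a sketch that assumes the TEI namespace and only looks for milestone elements lacking a unit attribute; it does not replace validation against the schema, and the function name is invented.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # assumed namespace of the files


def count_milestones_missing_unit(xml_text):
    """Return how many <milestone> elements have no unit attribute."""
    root = ET.fromstring(xml_text)
    return sum(
        1
        for m in root.iter(f"{{{TEI_NS}}}milestone")
        if m.get("unit") is None
    )
```

Running this over each file and listing those with a non-zero count would give the set of files to fix.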

Generally: turn "hi" into semantic encoding where possible

Add more novels to reach 100.

FRA03201_Blandy: check <pb /> for words and figures

Add wikidata + VIAF identifiers (for authors, novels) as well

Just an idea: add Wikidata and VIAF identifiers for all novels and authors. This might make integration of our (meta)data with Wikidata easier. DraCor relies heavily on Wikidata, so a later integration into a next-level DraCor would then be easier. Not urgent at this time.

All files: create TXM binary corpus from complete set

Check integrity of Ponson_Baronne

All files: improve sourceDesc

firstEdition with year
digitalSource with a URL

Better and more homogeneous metadata (e.g. for use in HTML display)

Encoded by; words; pages (in "digital source" or "first edition"); available on Zenodo, community, DOI; narrative perspective; genre;

Add narrative-perspective to "profileDesc/textClass/keyword/term"

narrative-perspective: heterodiegetic, homodiegetic, autodiegetic, epistolary

All volumes of "Jean-Christophe" instead of just one? (Romain Rolland)

issue with filenames and identifiers

Concerns FRA008XX and FRA038XX filenames and xml:ids.

FRA02303: remove linebreaks inside p

All files: check for milestones that mark letters / poems.

FRA01402: encode notes using "ref"

Add digitalSource

Add info about digitalSource (especially "publisher") more widely, so that it can be picked up by the metadata extraction (and of course for better provenance documentation).

Check that all quotes have an @type

ID and filename must match!

My post-processing fell over in a heap because the root element of the file named FRA0704_Sand has an xml:id of FRA0703. I fixed it by renaming the file: you may wish to check that this is what you intended!
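
A check of this kind could be automated. The sketch below assumes that the identifier is the part of the file name before the first underscore and that it should equal the xml:id on the root element; the function name is hypothetical.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def id_matches_filename(filename, xml_text):
    """Compare the root xml:id with the file name prefix before '_'."""
    root = ET.fromstring(xml_text)
    prefix = Path(filename).stem.split("_", 1)[0]
    return root.get(XML_ID) == prefix
```

Any file for which this returns False needs either its name or its xml:id corrected.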

FRA01601_GautierJ: Encode notes at the end of chapters (" ↑ ")

FRA00901_Daudet: Sort out quotation marks

FRA06301 : milestones not marked

here and elsewhere in this text

<p><hi rend="italic"/>\<hi rend="italic"> \</hi> \<hi rend="italic"/>*</p>

should be a <milestone> methinks