Project Orientation (cityslavegirls issue, closed)

rjp43 commented on June 1, 2024
Project Orientation

from cityslavegirls.

Comments (8)

RJP43 commented on June 1, 2024

@KariWomack A lot of the information currently in the file names will be moved into the TEI headers of the individual files. In the meantime, go ahead and post in this issue a systematic way of labeling the different source files so that we have shorter file and folder names. Don't change the actual files just yet: I still need to pull some of that information into the headers before it disappears under your better file-naming system. To receive your 10 pts for this week, leave a comment here listing your suggestions for each of the folder names and a system for naming each source type's files. If you run into issues, comment in this issue so the instructors can take note of your efforts in completing this task. Thanks.

RJP43 commented on June 1, 2024

Great news!!! @abrennr completed the OCR of our third source. We will have to review the files and use regex to clean up the extra characters and general wonkiness that comes with OCR text. @KariWomack @spadafour @rCarls @CodyKarch check out the new files here; notice how for every .pdf file there is now a corresponding .txt file! You will want to consider our "new" source while developing TEI tags.

ebeshero commented on June 1, 2024

@RJP43 @KariWomack @ghbondar I just took part in a command-line "boot camp" here in Pittsburgh, working with Pittsburgh Supercomputer space, and I learned how to apply regular expression patterns to match on file names and change them in their file directories. So, based on what I learned, I know we can quickly change file names by matching on a particular pattern or series of characters (like we do with any regex matching) and then running some commands that loop through a directory and change those files in any way we designate.

So, think about regex, or rather about the consistent patterns we can remove from the old file names: can you identify some patterns?
And what simpler names will make the most functional and human-readable sense for the project?
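
The loop-and-match idea can be sketched in Python as well as on the command line. This is only a rough sketch, not the boot camp's method: the function names are made up for illustration, and the plan is built first and applied separately so that nothing is renamed before someone has looked it over.

```python
import re
from pathlib import Path


def plan_renames(directory, pattern, replacement):
    """Return (old_path, new_path) pairs for files whose names match pattern.

    Nothing is renamed here, so the plan can be reviewed first.
    """
    regex = re.compile(pattern)
    plan = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue  # leave folders alone
        new_name = regex.sub(replacement, path.name)
        if new_name != path.name:
            plan.append((path, path.with_name(new_name)))
    return plan


def apply_renames(plan):
    """Carry out a reviewed plan, refusing to overwrite an existing file."""
    for old, new in plan:
        if new.exists():
            raise FileExistsError(f"{new} already exists")
        old.rename(new)
```

For example, `plan_renames("some_folder", r"\s+", "")` would list every file in a (hypothetical) `some_folder` whose name contains spaces, paired with the same name with the spaces removed; `apply_renames` then performs the changes.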

KariWomack commented on June 1, 2024

Okay, exactly how many of these files need to be renamed? I'm asking because I was wondering whether things like tables and graphs will be part of other files, or whether they also need to be systematically renamed on their own. Also, how do you feel about the commentary file names looking something like NComm8-21 instead of Nell Nelson_8-21-1888_Commentary? Since all of the commentary files have the same year, all we need to note is the day and month, N for Nelson, and Comm for commentary. I also figured we could use the same scheme for the articles but substitute Art for Comm. Comments? @ebeshero

ebeshero commented on June 1, 2024

So, I'm meeting with @RJP43 and @ghbondar now, and we're thinking maybe we want to keep the full date in case we want to add 1889 or 1890 articles later. But we can definitely shorten the names and that's a really good idea. I will suggest perhaps foregrounding the newspaper title in the filename as a standard way to put forward the publication medium: Eventually you may be adding articles from OTHER newspapers (say the New York World), so you want to be able to tell instantly what the source is from the file name, just to make your project development life a little easier!

Think about sorting the files by year in the file directory:

1888-01-25-ChTimes.xml

1890-07-29-NYWorld.xml

We profs recommend changing the PDF file names to match, so you can instantly correlate them.
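
Here is a small Python sketch of why the date-first names sort well. It assumes the old names carry a month-day-year date like the `8-21-1888` in `Nell Nelson_8-21-1888_Commentary`; the function name and the two bare date strings are made up for illustration. It also drops the commentary/article label, since the naming scheme above does not say where that would go.

```python
import re

# Assumed old-style date: month-day-year with hyphens, e.g. "8-21-1888".
OLD_DATE = re.compile(r"(\d{1,2})-(\d{1,2})-(\d{4})")


def date_first_name(old_name, paper, extension):
    """Build 'YYYY-MM-DD-Paper.ext' from a name containing an M-D-YYYY date."""
    match = OLD_DATE.search(old_name)
    if match is None:
        raise ValueError(f"no M-D-YYYY date found in {old_name!r}")
    month, day, year = (int(part) for part in match.groups())
    return f"{year:04d}-{month:02d}-{day:02d}-{paper}{extension}"


names = [
    date_first_name("Nell Nelson_8-21-1888_Commentary", "ChTimes", ".xml"),
    date_first_name("7-29-1890", "NYWorld", ".xml"),  # made-up input
    date_first_name("1-25-1888", "ChTimes", ".xml"),  # made-up input
]
# Plain alphabetical sorting is now also chronological:
print(sorted(names))
# ['1888-01-25-ChTimes.xml', '1888-08-21-ChTimes.xml', '1890-07-29-NYWorld.xml']
```

Passing `".pdf"` as the extension gives the matching PDF name, so each transcription and its page image end up side by side.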

ebeshero commented on June 1, 2024

Book publications: keep it simple? Try for the books:

BarkleyS01
BarkleyS02
BarkleyS03
...
BarkleyS10
BarkleyS11
etc
(The Barkley pub isn't divided into chapters, and has more than 10 sections. Make your numbering go:
01 ... 10...20...30 so they can be sorted.)

VS.

McEnnisC01
McEnnisC02
McEnnisC03
...
McEnnisC39
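
The reason for the two-digit numbering can be seen in a short Python sketch. The helper name is made up for illustration, and the letter is simply whatever the list above uses (S or C).

```python
def section_name(author, kind, number, width=2):
    """Return e.g. 'BarkleyS01': author, a one-letter kind, zero-padded number."""
    return f"{author}{kind}{number:0{width}d}"


unpadded = [f"BarkleyS{n}" for n in (1, 2, 10, 11)]
padded = [section_name("Barkley", "S", n) for n in (1, 2, 10, 11)]

print(sorted(unpadded))  # 'BarkleyS10' and 'BarkleyS11' sort before 'BarkleyS2'
print(sorted(padded))    # sections stay in reading order
```

A file browser sorts names the same way, character by character, so without the leading zero section 10 would be listed right after section 1.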

RJP43 commented on June 1, 2024

To get credit for participation in last week's and this week's project development, I would like each of you, @spadafour @rCarls @CodyKarch @KariWomack, to read one article from the PDF images of the original articles and try your hand at transcribing it.

In oXygen, open a new XML document and give it a root element of <div type="article">. Be sure to leave a comment tag, or feel free to begin creating a basic TEI header, with the publication date included inside. Separate the headlines from the main body of the article with div elements, setting their attributes to @type="headlines" and @type="body". Use the self-closing <gap/> element, with either a comment tag or a logical attribute, to mark words that are difficult to transcribe and give the reason why. Separate paragraphs with <p> elements, and if advertisements for future newspaper issues related to the Nelson series follow the main body of the article, put those into a separate <div type="advertisement">. Mark areas of interest and parts of the article that stand out to you with comments (that include your name and the date), and then push your finished transcription to the OriginalArticle_XML folder using your desktop client.

Once you have completed your transcription (not an easy task, so give yourself sufficient time), go into the Anon.WhiteSlaveGirls folder and hunt for the corresponding Barkley section; the best way to do this is to skim the section headlines for ones that match those from your article. Compare your section with your article and jot down any noticeable differences. Once you have completed all of these tasks, comment in Issue 9 to tell us which section of the Barkley text (a.k.a. Anon.WhiteSlaveGirls) corresponds to the article you transcribed.

This will be significant project work: transcribing, beginning basic TEI structure tagging, and versioning. It will also give each of you a chance to become better acquainted with the Nelson project and to start finding interesting things you may want to produce data visualizations on for upcoming assignments.
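
If it helps to see the nesting described above before opening oXygen, the following Python sketch prints an empty skeleton with that shape. It is only an illustration under a few assumptions: the placeholder text, the `head` element inside the headlines div, and `reason="illegible"` as the "logical attribute" on `gap` are my own choices, not project decisions.

```python
import xml.etree.ElementTree as ET


def article_skeleton(pub_date, transcriber):
    """Return a string holding the div structure for one transcribed article."""
    root = ET.Element("div", {"type": "article"})
    # Comment carrying the publication date and who transcribed the article.
    root.append(ET.Comment(f" published {pub_date}; transcribed by {transcriber} "))

    headlines = ET.SubElement(root, "div", {"type": "headlines"})
    ET.SubElement(headlines, "head").text = "HEADLINE TEXT"  # assumed element

    body = ET.SubElement(root, "div", {"type": "body"})
    para = ET.SubElement(body, "p")
    para.text = "Readable words, then "
    gap = ET.SubElement(para, "gap", {"reason": "illegible"})  # assumed attribute value
    gap.tail = " and the rest of the paragraph."

    advert = ET.SubElement(root, "div", {"type": "advertisement"})
    ET.SubElement(advert, "p").text = "Notice about a future issue."

    if hasattr(ET, "indent"):  # pretty-printing is available in Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


print(article_skeleton("1888-08-06", "your name"))
```

The printed result can be pasted into a new XML document as a starting point; the real transcription then replaces the placeholder text.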

Article Assignments:
@spadafour --- 8/6/1888
@rCarls --- 8/7/1888
@CodyKarch --- 8/8/1888
@KariWomack --- 8/9/1888

Please contact me via email or in this issue with any questions and concerns as they arise.
Thank you!

RJP43 commented on June 1, 2024

I am going to make a new issue that gives better directions for completing the tasks laid out in the above comment!
