litbank's People

Contributors

dbamman

litbank's Issues

Quotation TSV file names contain a "brat" component

Minor (almost aesthetic) point, given that they are already under a "tsv" folder. On a side note, I wonder whether combining the text-span and relation annotations would allow for a brat-compatible version of this.

Original vs annotated alignment

Hello! Thank you for making this really cool dataset publicly available :)

I'm trying to align the annotations with the original text. Could you please specify which tokenizer was used to produce the dataset? So far I can't get it quite right. Or is there perhaps an easier way to align the original texts and the annotations that I'm missing? Thanks in advance!
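
If the tokenizer is hard to pin down, one tokenizer-independent option is to recover character offsets by searching for each annotated token in the original text, left to right. The following is a minimal sketch, assuming the annotated tokens are available as a list of strings and mostly appear verbatim in the original text; `align_tokens` and its `max_gap` parameter are hypothetical names, not part of LitBank.

```python
from typing import List, Optional, Tuple


def align_tokens(
    text: str, tokens: List[str], max_gap: int = 10
) -> List[Optional[Tuple[int, int]]]:
    """Map each token to (start, end) character offsets in `text`.

    Tokens are searched for left to right; each search starts where the
    previous match ended and may skip at most `max_gap` characters
    (typically whitespace). Tokens not found verbatim map to None and
    leave the search position unchanged.
    """
    spans: List[Optional[Tuple[int, int]]] = []
    pos = 0
    for tok in tokens:
        # str.find(sub, start, end) only matches if sub lies fully in [start, end)
        start = text.find(tok, pos, pos + max_gap + len(tok))
        if start == -1:
            spans.append(None)  # e.g. a quote the tokenizer normalized
            continue
        end = start + len(tok)
        spans.append((start, end))
        pos = end
    return spans


if __name__ == "__main__":
    text = "Call me Ishmael. Some years ago"
    tokens = ["Call", "me", "Ishmael", ".", "Some", "years", "ago"]
    for tok, span in zip(tokens, align_tokens(text, tokens)):
        print(tok, span)
```

Tokens that the tokenizer rewrote (for example, straight double quotes turned into ``) come back as None and would need a small normalization table. The `max_gap` bound keeps a missing token from making the search jump to a much later occurrence of the same string.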

Formatting in some TSV files is broken

Hello everyone, and thanks to the authors for the dataset!

I noticed that the formatting of some TSV files (for example, here) is broken, which makes the texts difficult to parse and work with.

Could you fix this problem, or could we perhaps work on a fix together?
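
The issue doesn't describe the breakage in detail. If it shows up as rows with an unexpected number of tab-separated columns, a quick check like the one below could list the affected lines. It is a minimal sketch that assumes the most common column count in a file is the correct one; `find_malformed_rows` is a hypothetical helper, not part of the dataset's tooling.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def find_malformed_rows(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (line_number, field_count) for every non-blank row whose
    number of tab-separated fields differs from the file's most common count."""
    rows: List[Tuple[int, int]] = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # blank lines (e.g. sentence breaks) are not checked
        rows.append((lineno, line.count("\t") + 1))
    if not rows:
        return []
    # Assumption: the majority column count is the intended one
    expected = Counter(n for _, n in rows).most_common(1)[0][0]
    return [(lineno, n) for lineno, n in rows if n != expected]


if __name__ == "__main__":
    sample = ["a\tb\tc\n", "d\te\tf\n", "\n", "g\th\n", "i\tj\tk\n"]
    print(find_malformed_rows(sample))
```

Because it takes any iterable of lines, it can be run on an open file object directly, e.g. `find_malformed_rows(open(path, encoding="utf-8"))` for a hypothetical `path`.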

Quotation Data

Hi,

We noticed that the quotation data (in the brat format) contains very few lines; for many files, it contains none at all.

Is this example data, and if so, how do we generate the full data?

Thanks

spaCy Model?

spaCy is quickly becoming the most popular NLP library in Python, but, alas, I cannot find any spaCy models trained specifically on literary data. Are there any plans to train a spaCy model on this dataset?

Coref: Percentage of singletons

Hi,

Thanks for creating this awesome resource.
I was trying to reproduce some of the statistics reported in the paper from the data.
My calculations match on the total number of mentions, which is 29,104, but the singleton percentage is coming out to be 19.8% instead of the 17.4% reported in the paper.
Not sure where I'm going wrong.

Thanks,
Shubham
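
"Percentage of singletons" can be read with two different denominators: singleton entities as a share of all entities (clusters), or singleton mentions as a share of all mentions. Since the mention totals already agree, the entity-level bookkeeping (the denominator, and whether entity IDs are scoped per document) is a natural thing to double-check. The minimal sketch below reports both readings, assuming the mentions have already been read into one corpus-unique entity ID per mention; the loading step, the ID format, and `singleton_stats` are hypothetical, and it does not claim which definition the paper uses.

```python
from collections import Counter
from typing import Dict, Iterable


def singleton_stats(mention_entity_ids: Iterable[str]) -> Dict[str, float]:
    """Summarize singletons given one entity ID per mention.

    IDs must be unique across the whole corpus (e.g. "doc_name:T12");
    otherwise clusters from different documents are merged.
    """
    counts = Counter(mention_entity_ids)  # entity ID -> number of mentions
    n_mentions = sum(counts.values())
    n_entities = len(counts)
    n_singletons = sum(1 for c in counts.values() if c == 1)
    return {
        "mentions": n_mentions,
        "entities": n_entities,
        "singletons": n_singletons,
        # Reading 1: singleton clusters / all clusters
        "pct_of_entities": 100.0 * n_singletons / n_entities if n_entities else 0.0,
        # Reading 2: singleton mentions / all mentions
        "pct_of_mentions": 100.0 * n_singletons / n_mentions if n_mentions else 0.0,
    }


if __name__ == "__main__":
    ids = ["d1:A", "d1:A", "d1:B", "d2:A", "d2:C", "d2:C", "d2:C"]
    print(singleton_stats(ids))
```

Dropping the document prefix in the example ("A" instead of "d1:A" and "d2:A") would merge two clusters and change the singleton count, which is the kind of scoping error the prefix guards against.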

About coreference resolution

Hello, thank you very much for open-sourcing such a great dataset. Could you provide a simple coreference resolution experiment?
