dbamman / litbank

Annotated dataset of 100 works of fiction to support tasks in natural language processing and the computational humanities.

Python 99.86% Shell 0.14%

litbank's People

barseghyanartur yushu-liu hhy5277 codeaudit zorrock olivialewke fighting41love rxlian chengyjon sidney1994 alderpaw r-craft ashora emanuelaboros dragomirradev ml-ai-nlp-ir awoziji hanxiao-chen sanger640 tollefj whitewolfkings 21dmohan jiasir803 lauwauw jayanta47 darthbhyrava ukcagreen xiwuhan marconaguib alabrashjr shirelch vrnanshuman techthiyanes kaushikepi kumar-shridhar siyan-sylvia-li devraj-k hamzakhan78600 yofayed buffy05 tt-sk lucy3 plum-yin zhenjiechu ad2000x

litbank's Issues

Quotation TSV file names have "brat" component

A minor (almost aesthetic) point, given that they're already under a "tsv" folder. On a side note, I wonder whether combining the text-span and relation annotations would allow for a brat-compatible version of this data.
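As a rough sketch of what a brat-compatible export could look like, the code below writes each quotation as a brat text-bound annotation (`T` line) and links it to its speaker mention with a relation (`R` line). It assumes the quotations and speakers are already available as character offsets into the original text; the `Span` class, the `Quotation`/`Speaker` labels and the `Said_by` relation type are hypothetical names, not LitBank's own scheme.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Span:
    start: int  # character offset into the original text (inclusive)
    end: int    # character offset (exclusive)
    label: str  # brat entity type, e.g. "Quotation" (hypothetical label)


def _t_line(tid: str, span: Span, text: str) -> str:
    covered = text[span.start:span.end]
    # Multi-line spans would need brat's discontinuous "a b;c d" fragments; not handled here.
    if "\n" in covered:
        raise ValueError(f"span {span.start}-{span.end} crosses a line break")
    return f"{tid}\t{span.label} {span.start} {span.end}\t{covered}"


def to_brat(text: str, pairs: List[Tuple[Span, Optional[Span]]]) -> List[str]:
    """Build brat standoff (.ann) lines from (quotation, speaker) span pairs."""
    lines: List[str] = []
    n_t = n_r = 0
    for quote, speaker in pairs:
        n_t += 1
        quote_id = f"T{n_t}"
        lines.append(_t_line(quote_id, quote, text))
        if speaker is None:
            continue  # unattributed quotation: text-bound annotation only
        n_t += 1
        speaker_id = f"T{n_t}"
        lines.append(_t_line(speaker_id, speaker, text))
        n_r += 1
        # "Said_by" is a hypothetical relation type; it would need declaring in annotation.conf
        lines.append(f"R{n_r}\tSaid_by Arg1:{quote_id} Arg2:{speaker_id}")
    return lines


text = '"Come in," said Anna.'
print("\n".join(to_brat(text, [(Span(0, 10, "Quotation"), Span(16, 20, "Speaker"))])))
```

The joined lines would go into a `.ann` file next to the matching `.txt`. In this sketch a speaker attributed several quotations gets one `T` line per quotation; a real converter would probably deduplicate them.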

Original vs annotated alignment

Hello! Thank you for making this really cool dataset publicly available :)

I'm trying to align the annotations with the original text. Could you please specify which tokenizer was used to produce the dataset? So far I can't get it quite right. Or is there an easier way to align the original texts and the annotations that I'm missing? Thanks in advance.
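One tokenizer-agnostic workaround, sketched here under the assumption that the annotated tokens are verbatim substrings of the original text and appear in the same order, is to scan the text left to right and record each token's character offsets. `align_tokens` and its `max_gap` parameter are hypothetical; tokens the tokenizer rewrote (for example, normalized quotation marks) come back as `None` and need separate handling.

```python
from typing import List, Optional, Tuple


def align_tokens(text: str, tokens: List[str],
                 max_gap: int = 50) -> List[Optional[Tuple[int, int]]]:
    """Map each token to (start, end) character offsets in text, or None."""
    offsets: List[Optional[Tuple[int, int]]] = []
    pos = 0  # cursor: everything before this has already been matched
    for tok in tokens:
        idx = text.find(tok, pos)
        # A match far past the cursor is probably a different occurrence of the
        # same string, so treat it as unaligned instead of jumping ahead.
        if idx == -1 or idx - pos > max_gap:
            offsets.append(None)
            continue
        offsets.append((idx, idx + len(tok)))
        pos = idx + len(tok)
    return offsets


print(align_tokens("Mr. Smith didn't go.", ["Mr.", "Smith", "did", "n't", "go", "."]))
```

Keeping the cursor in place for unmatched tokens means one normalized token does not derail the alignment of everything after it.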

Formatting in tsv-files went wrong

Hello everyone, and thanks to the authors for the dataset!

I noticed that in some tsv files (for example, here) the formatting is broken, which makes the texts difficult to parse and work with.

Could you fix this, or could we work together to solve it?
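To locate the broken rows before fixing them, a quick check is to flag lines whose number of tab-separated fields differs from the most common count in that file. This is a sketch assuming each non-blank line should have the same number of columns and blank lines separate sentences; `find_malformed_rows` is a hypothetical helper, not part of the repository.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def find_malformed_rows(lines: Iterable[str]) -> List[Tuple[int, int, int]]:
    """Return (line_number, field_count, expected_count) for suspicious rows."""
    counts = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # blank line: assumed sentence boundary
        counts.append((lineno, len(line.split("\t"))))
    if not counts:
        return []
    # Treat the most frequent column count in the file as the expected one.
    expected = Counter(n for _, n in counts).most_common(1)[0][0]
    return [(lineno, n, expected) for lineno, n in counts if n != expected]


sample = ["The\tO\tO\n", "man\tB-PER\tO\n", "\n", "walked O O\n"]
print(find_malformed_rows(sample))
```

For a real file, pass the open handle directly, e.g. `with open(path, encoding="utf-8") as f: find_malformed_rows(f)`.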

Quotation Data

Hi,

We noticed that the quotation data contains very few lines; for many files (in the brat format) it contains none at all.

Is this example data, and if so, how do we generate the full data?

Thanks

What about posting your Brat Conf files?

It would help to have the brat conf files as well, if you have them.

spaCy Model?

spaCy is quickly becoming the most popular NLP library in Python, but, alas, I can't find any spaCy models trained specifically on literary data. Are there any plans to train a spaCy model on this dataset?
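Training a spaCy NER model would first require turning the token-level tags into entity spans. The sketch below assumes a single layer of BIO tags (`B-PER`, `I-PER`, `O`, ...) per sentence; `bio_to_spans` is a hypothetical helper. Its end-exclusive token spans match the arguments of spaCy's `Span(doc, start, end, label=...)` constructor, though that step is not shown here.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert BIO tags to (start, end, label) token spans, end exclusive.

    An I- tag that does not continue an open entity of the same label
    starts a new entity (a lenient reading of malformed sequences).
    """
    spans: List[Tuple[int, int, str]] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        continues = tag.startswith("I-") and tag[2:] == label
        if continues:
            continue  # extend the currently open entity
        if start is not None:
            spans.append((start, i, label))  # close the open entity
            start, label = None, None
        if tag != "O":
            start, label = i, tag[2:]  # open a new entity
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


print(bio_to_spans(["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "I-LOC", "O", "I-FAC"]))
```

LitBank's nested entities would need one such conversion per annotation layer, since spaCy's `doc.ents` does not allow overlapping entities.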

Coref: Percentage of singletons

Hi,

Thanks for creating this awesome resource.
I was trying to verify some of the statistics reported in the paper against the data.
My calculations match on the total number of mentions, which is 29,104, but the singleton percentage is coming out to be 19.8% instead of the 17.4% reported in the paper.
Not sure where I'm going wrong.

Thanks,
Shubham
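A possible source of a mismatch like this is the denominator: singletons can be counted as a share of all mentions or as a share of all entities (clusters), and pooling over the corpus can differ from averaging per book. As a sanity check, here is a sketch computing both pooled ratios from a list of per-mention cluster IDs. The input format and `singleton_stats` are hypothetical, and which definition the paper uses is not confirmed here.

```python
from collections import Counter
from typing import Hashable, List, Tuple


def singleton_stats(mention_cluster_ids: List[Hashable]) -> Tuple[float, float]:
    """Given one cluster ID per mention, return
    (singletons / number of entities, singletons / number of mentions)."""
    sizes = Counter(mention_cluster_ids)  # cluster ID -> number of mentions
    n_singletons = sum(1 for size in sizes.values() if size == 1)
    return n_singletons / len(sizes), n_singletons / len(mention_cluster_ids)


# Clusters a (3 mentions), b (1), c (2), d (1): 2 singletons, 4 entities, 7 mentions.
print(singleton_stats(["a", "a", "a", "b", "c", "c", "d"]))
```

Cluster IDs must be unique across books (e.g. prefix them with the book ID) before pooling, or clusters from different books will be merged.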

generating tsv files out of brat.ann and brat.txt files

Hi,

Could you explain in more detail how to use the cl-coref annotator to generate coreference resolution training data?
Thanks in advance!
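As a starting point for turning brat files into training data, the sketch below reads `T` (text-bound) lines and `*` (equivalence) lines from a `.ann` file and assigns every mention a cluster index. It assumes coreference chains are stored as brat equivalence lines such as `*\tCoreference T1 T2`; the actual output of the cl-coref annotator may use a different encoding, and `parse_ann`/`assign_clusters` are hypothetical helpers.

```python
from typing import Dict, Iterable, List, Tuple

Mention = Tuple[str, List[Tuple[int, int]]]  # (label, [(start, end), ...])


def parse_ann(lines: Iterable[str]) -> Tuple[Dict[str, Mention], List[List[str]]]:
    """Parse brat standoff lines into mentions and equivalence clusters.
    Other annotation kinds (R, E, A, #, ...) are ignored in this sketch."""
    mentions: Dict[str, Mention] = {}
    clusters: List[List[str]] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("T"):
            # e.g. "T1\tPER 0 4\tAnna"; discontinuous spans look like "PER 0 4;10 12"
            label, offsets = fields[1].split(" ", 1)
            fragments = [tuple(int(x) for x in frag.split()) for frag in offsets.split(";")]
            mentions[fields[0]] = (label, fragments)
        elif line.startswith("*"):
            # e.g. "*\tCoreference T1 T2": relation type, then member IDs
            clusters.append(fields[1].split()[1:])
    return mentions, clusters


def assign_clusters(mentions: Dict[str, Mention], clusters: List[List[str]]) -> Dict[str, int]:
    """Map every mention ID to a cluster index; unlinked mentions become singletons."""
    assignment = {tid: idx for idx, members in enumerate(clusters) for tid in members}
    next_id = len(clusters)
    for tid in sorted(mentions, key=lambda t: mentions[t][1][0]):  # document order
        if tid not in assignment:
            assignment[tid] = next_id
            next_id += 1
    return assignment


ann = ["T1\tPER 0 4\tAnna\n", "T2\tPER 10 13\tshe\n", "T3\tLOC 20 26\tMoscow\n", "*\tCoreference T1 T2\n"]
print(assign_clusters(*parse_ann(ann)))
```

From the character offsets and cluster indices, token-level training files can then be produced with whatever tokenization the coreference model expects.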

about the coreference resolution

Hello, and thank you very much for open-sourcing such a great dataset. Could you provide a simple coreference resolution experiment?
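For a very simple experiment, a toy baseline can be compared against the gold clusters. The sketch below clusters non-pronominal mentions by exact (case-insensitive) string match and attaches each pronoun to the most recent non-pronominal mention. It is a deliberately crude heuristic (no gender or number agreement), not the model from the LitBank paper; `naive_coref` and the `PRONOUNS` list are hypothetical.

```python
from typing import Dict, List, Optional

# Small illustrative pronoun list; a real baseline would need a fuller one.
PRONOUNS = {"he", "him", "his", "she", "her", "hers", "it", "its",
            "they", "them", "their", "theirs"}


def naive_coref(mentions: List[str]) -> List[int]:
    """Return a cluster ID for each mention string, given in document order."""
    cluster_of_text: Dict[str, int] = {}
    ids: List[int] = []
    last_antecedent: Optional[int] = None
    next_id = 0
    for mention in mentions:
        key = mention.lower()
        if key in PRONOUNS:
            if last_antecedent is None:
                ids.append(next_id)  # no antecedent seen yet: new singleton
                next_id += 1
            else:
                ids.append(last_antecedent)  # link to most recent named mention
            continue
        if key not in cluster_of_text:  # first time this exact string appears
            cluster_of_text[key] = next_id
            next_id += 1
        ids.append(cluster_of_text[key])
        last_antecedent = cluster_of_text[key]
    return ids


print(naive_coref(["Emma", "she", "Mr. Knightley", "him", "Emma", "her"]))
```

Scoring such a baseline against the gold annotations with a standard coreference metric (e.g. the CoNLL average of MUC, B³ and CEAF) would give a floor that a real model should clear.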
