dbamman / litbank
Annotated dataset of 100 works of fiction to support tasks in natural language processing and the computational humanities.
Minor (almost aesthetic) point given they're already under a "tsv" folder. On a side note, I wonder if a combination of text-span and relation annotations would allow for a brat-compatible version of this.
Hello! Thank you for making this really cool dataset publicly available :)
I'm trying to align the annotations with the original text. Could you please specify which tokenizer was used to produce the dataset? So far I can't get it quite right. Or is there perhaps an easier way to align the original texts and the annotations that I'm missing? Thanks in advance.
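For anyone attempting the same alignment without knowing the tokenizer, here is a minimal sketch of one tokenizer-agnostic approach: walk through the original text and locate each annotated token with a forward-moving cursor. The function name `align_tokens` is hypothetical and not part of the dataset's tooling, and the sketch assumes the tokens appear in the text verbatim and in order. Tokens that were normalized (for example, rewritten quotation marks) will not be found and are returned as `None`.

```python
from typing import List, Optional, Tuple


def align_tokens(text: str, tokens: List[str]) -> List[Optional[Tuple[int, int]]]:
    """Map each token to a (start, end) character span in `text`.

    Hypothetical helper: assumes tokens occur verbatim and in order.
    A token that cannot be found from the current cursor gets None,
    and the cursor is left where it was.
    """
    spans: List[Optional[Tuple[int, int]]] = []
    cursor = 0
    for tok in tokens:
        start = text.find(tok, cursor)
        if start == -1:
            spans.append(None)  # token was probably normalized by the tokenizer
            continue
        end = start + len(tok)
        spans.append((start, end))
        cursor = end  # never search behind an already matched token
    return spans


# Illustrative input only, not taken from the dataset files.
example_text = "Call me Ishmael. Some years ago"
example_tokens = ["Call", "me", "Ishmael", ".", "Some"]
example_spans = align_tokens(example_text, example_tokens)
```

Counting the `None` entries gives a quick sense of how far the tokenization departs from the raw text before a more careful alignment is attempted.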
Hello everyone, and thanks to the authors for the dataset!
I noticed that in some TSV files (for example, here) the file formatting is broken, which makes it difficult to parse the texts and work with them afterwards.
Can you fix this problem, or could we perhaps cooperate to solve it?
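A rough way to locate such rows is to compare each line's column count against the most common count in the file. The sketch below assumes tab-separated rows with blank lines used only as separators; the helper name `find_malformed_rows` is hypothetical, and the sample rows are invented for illustration rather than copied from the repository.

```python
from collections import Counter
from typing import List, Tuple


def find_malformed_rows(lines: List[str], sep: str = "\t") -> List[Tuple[int, str]]:
    """Return (line_number, row) pairs whose column count is unusual.

    Hypothetical helper: the "expected" column count is simply the most
    frequent one among non-blank lines. Line numbers are 1-based.
    """
    rows = [(i, ln.rstrip("\n")) for i, ln in enumerate(lines, start=1) if ln.strip()]
    if not rows:
        return []
    counts = Counter(len(row.split(sep)) for _, row in rows)
    expected = counts.most_common(1)[0][0]
    return [(i, row) for i, row in rows if len(row.split(sep)) != expected]


# Invented sample: the fourth line uses a space instead of a tab.
sample_lines = ["Call\tO\n", "me\tO\n", "\n", "Ishmael O\n", ".\tO\n"]
bad_rows = find_malformed_rows(sample_lines)
```

To check a real file, pass the result of `open(path, encoding="utf-8").readlines()` as `lines`.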
Hi,
We noticed that the quotation data (in the brat format) does not contain many lines; for many files it contains none at all.
Is this example data, and if so, how do we generate the full data?
Thanks
It would help to have the brat conf files included alongside the annotations, if you have those too.
spaCy is quickly becoming the most popular NLP module in Python, but alas, I cannot locate any models for spaCy specifically trained on literary data. Are there any plans to train a spaCy model using this dataset?
Hi,
Thanks for creating this awesome resource.
I was trying to confirm some of the stats mentioned in the paper from the data.
My calculations match on the total number of mentions, which is 29,104, but the singleton percentage is coming out to be 19.8% instead of the 17.4% reported in the paper.
Not sure where I'm going wrong.
Thanks,
Shubham
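One thing that may be worth checking when reproducing this figure is which denominator is used. The sketch below computes the singleton share two ways: as a fraction of all mentions and as a fraction of all entities. It assumes a mapping from mention ID to entity ID has already been extracted from the annotation files, with entity IDs made unique per document. The function name and the toy data are hypothetical, and nothing here is confirmed to be how the paper's number was computed.

```python
from collections import Counter
from typing import Dict


def singleton_stats(mention_to_entity: Dict[str, str]) -> Dict[str, float]:
    """Compute singleton percentages from a mention -> entity mapping.

    Hypothetical helper. A singleton is an entity with exactly one
    mention. Entity IDs are assumed to be unique across documents
    (e.g. prefixed with the document name).
    """
    if not mention_to_entity:
        return {"pct_of_mentions": 0.0, "pct_of_entities": 0.0}
    cluster_sizes = Counter(mention_to_entity.values())
    n_mentions = len(mention_to_entity)
    n_entities = len(cluster_sizes)
    n_singletons = sum(1 for size in cluster_sizes.values() if size == 1)
    return {
        # Each singleton entity contributes exactly one mention.
        "pct_of_mentions": 100.0 * n_singletons / n_mentions,
        "pct_of_entities": 100.0 * n_singletons / n_entities,
    }


# Toy data: entity e1 has three mentions, e2 and e3 have one each.
toy = {"T1": "e1", "T2": "e1", "T3": "e2", "T4": "e3", "T5": "e1"}
stats = singleton_stats(toy)
```

On the toy data the two definitions give different values, so agreeing on the denominator, and on whether entity IDs are scoped per document, is a reasonable first step before comparing against a published number.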
Hi,
Can you give some more explanation of how to use the cl-coref annotator to generate coreference resolution training data?
Thanks in advance!
Hello, thank you very much for such a great open-source dataset. Could you provide a simple coreference resolution experiment as an example?