This package contains the data of the MLEE (Multi-Level Event Extraction) corpus, version 1.0.2 (revision 1).

This README provides a brief overview of the package contents. See the LICENSE file included in the package for the data license, the manuscript referenced at the bottom of this file for an introduction to the corpus, and the project homepage http://www.nactem.ac.uk/MLEE/ for data visualizations, supplementary data and more information.


CONTENTS

This package contains the following:

* README: this file
* LICENSE: licenses of the texts and annotations
* standoff: corpus data in standoff format (all annotations)
* conll: corpus data in CoNLL format (entity annotations only)

Both of the standoff/ and conll/ directories contain the following subdirectories:

* development: development split of the data, excluding the test set
* test: final test split of the data, covering all of the data
* full: full corpus data

Each of the development/ and test/ directories further contains the following:

* train: training data for development/final test
* test: test data for development/final test

The format and suggested use of the files contained in these directories are explained below.


FORMAT

The corpus data is provided in two formats: BioNLP Shared Task-style standoff format, and CoNLL shared task-style BIO format.

Standoff format

The data in the standoff/ directory are provided in the standoff format used by the brat annotation tool (http://brat.nlplab.org/). For details of the format, see the documentation page http://brat.nlplab.org/standoff.html

For the full corpus data in standoff/full/, all standoff annotations for a single text file are provided in a single file (.ann). For the data in standoff/development/ and standoff/test/, the annotations are split into entity annotations (.a1) and event annotations (.a2). This is intended to facilitate event extraction experiments where entity annotations are provided as part of the input.
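A minimal Python sketch of reading these files follows. It assumes the brat conventions from the documentation page above: text-bound annotations are `T` lines (ID, TAB, type and character offsets, TAB, covered text), and events are `E` lines (ID, TAB, `Type:Trigger`, then `Role:Argument` pairs). `parse_standoff` and `load_standoff` are hypothetical helper names, other annotation kinds (such as modifiers or equivalences) are simply skipped, and the sample annotations are invented for illustration rather than taken from the corpus.

```python
from dataclasses import dataclass


@dataclass
class TextBound:
    id: str
    type: str
    spans: list  # (start, end) character offsets; several for discontinuous spans
    text: str


@dataclass
class Event:
    id: str
    type: str
    trigger: str  # ID of the text-bound trigger annotation
    args: list  # (role, argument ID) pairs; arguments may be entities or events


def parse_standoff(lines):
    """Parse brat standoff lines into text-bound (T) and event (E) annotations."""
    entities, events = {}, {}
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip blank or malformed lines
        ann_id, rest = line.split("\t", 1)
        if ann_id.startswith("T"):
            type_and_offsets, text = rest.split("\t", 1)
            ann_type, offsets = type_and_offsets.split(" ", 1)
            # Discontinuous spans are written as "start end;start end".
            spans = [tuple(int(x) for x in frag.split()) for frag in offsets.split(";")]
            entities[ann_id] = TextBound(ann_id, ann_type, spans, text)
        elif ann_id.startswith("E"):
            head, *arg_strs = rest.split()
            ev_type, trigger = head.split(":", 1)
            args = [tuple(a.split(":", 1)) for a in arg_strs]
            events[ann_id] = Event(ann_id, ev_type, trigger, args)
        # Other annotation kinds (e.g. "M" modifiers, "*" equivalences) are ignored here.
    return entities, events


def load_standoff(*paths):
    """Read one .ann file, or a matching .a1/.a2 pair, and parse them together."""
    lines = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            lines.extend(f)
    return parse_standoff(lines)


if __name__ == "__main__":
    # Invented annotations for the text "VEGF induces angiogenesis".
    sample = [
        "T1\tGene_or_gene_product 0 4\tVEGF",
        "T2\tPositive_regulation 5 12\tinduces",
        "T3\tBlood_vessel_development 13 25\tangiogenesis",
        "E1\tBlood_vessel_development:T3",
        "E2\tPositive_regulation:T2 Theme:E1 Cause:T1",
    ]
    entities, events = parse_standoff(sample)
    for ev in events.values():
        print(ev.id, ev.type, entities[ev.trigger].text, ev.args)
```

For a document in standoff/development/ or standoff/test/, both files would be passed together, e.g. `load_standoff("doc.a1", "doc.a2")` (placeholder names), so that event arguments in the .a2 file can be resolved against the entities in the .a1 file; for standoff/full/, the single .ann file is enough.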
CoNLL format

The data in the conll/ directory is provided in the column-formatted BIO representation used in many reference resources for mention detection, such as those of the CoNLL shared tasks (see e.g. http://www.cnts.ua.ac.be/conll2002/ner/). Each line contains four TAB-separated columns: token text, start offset, end offset, and tag. Each tag consists of one of the letters B, I or O (for "begin", "in", and "out"), followed, for the B and I tags, by the type of the entity. (The offsets into the source text are provided for reference and can be ignored for most applications.)

The entity mention detection task is to learn to predict the tags (last column) given the token texts (first column).


EVALUATION

The corpus is intended to serve as an evaluation standard. The proposed approach to method development and evaluation is to use the test/ data only for final evaluation, after completing method development and parameter selection.

PLEASE NOTE: the data in the development/ and test/ directories are not separate: the development/ data is a split of the test/train/ data.


CONTACT

For any queries relating to the corpus, please contact Sampo Pyysalo <[email protected]>


CHANGELOG

* 1.0.2 (11.09.2012): first public release


REFERENCES

The corpus is presented in the following manuscript:

* Sampo Pyysalo, Tomoko Ohta, Makoto Miwa, Han-Cheol Cho, Jun'ichi Tsujii and Sophia Ananiadou (2012). Event extraction across multiple levels of biological organization. Bioinformatics 28(18):i575-i581.

The project page is located at http://www.nactem.ac.uk/MLEE/
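The CoNLL format described above can be read with a short Python sketch like the following, which groups the four-column lines into sentences and turns the BIO tags back into typed character spans. It assumes that blank lines separate sentences and that tags are written as the letter, a hyphen and the entity type (e.g. B-Cell); both are assumptions about the files rather than details stated in this README. `read_conll` and `bio_to_mentions` are hypothetical names, and the sample lines are invented.

```python
def read_conll(lines):
    """Group TAB-separated (token, start, end, tag) lines into sentences.

    Assumes sentences are separated by blank lines, as in CoNLL-style files.
    """
    sentences, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        token, start, end, tag = line.split("\t")
        current.append((token, int(start), int(end), tag))
    if current:
        sentences.append(current)
    return sentences


def bio_to_mentions(rows):
    """Convert one sentence of BIO-tagged rows into (type, start, end) mentions.

    Assumes tags are written as "B-Type", "I-Type" or "O". An I- tag that does
    not continue a mention of the same type is treated as starting a new one.
    """
    mentions, current = [], None  # current = [type, start, end]
    for _token, start, end, tag in rows:
        if tag == "O":
            if current:
                mentions.append(tuple(current))
                current = None
            continue
        # maxsplit=1 keeps entity type names that contain hyphens intact.
        prefix, ent_type = tag.split("-", 1)
        if prefix == "I" and current and current[0] == ent_type:
            current[2] = end  # extend the open mention to this token
        else:
            if current:
                mentions.append(tuple(current))
            current = [ent_type, start, end]
    if current:
        mentions.append(tuple(current))
    return mentions


if __name__ == "__main__":
    # Invented example; character offsets refer to "VEGF induces endothelial cells".
    sample = [
        "VEGF\t0\t4\tB-Gene_or_gene_product\n",
        "induces\t5\t12\tO\n",
        "endothelial\t13\t24\tB-Cell\n",
        "cells\t25\t30\tI-Cell\n",
        "\n",
    ]
    for sentence in read_conll(sample):
        print(bio_to_mentions(sentence))
```

Converting tags back to spans in this way lets predicted and gold mentions be compared as (type, start, end) triples; the lenient handling of a stray I- tag is one common convention, not something this README prescribes.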