sergiog95 / csabstracts

Dataset of scientific abstracts for the purpose of sentence classification

dataset machine-learning sentence-classification natural-language-processing scientific-abstracts deep-learning

CS Abstracts Dataset

Dataset of scientific abstracts for the purpose of sentence classification.

The dataset is composed of a total of 654 abstracts collected from the arXiv platform. The sentences were then annotated through crowdsourcing and collective intelligence, with each sentence assigned to one of the following classes:

  • Background
  • Objective
  • Methods
  • Results
  • Conclusions
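
Because the five classes form a closed label set, a classifier usually maps them to integer ids. The following is a minimal Python sketch. It assumes the labels in the files are spelled like the class names above. The exact spelling and casing used in the files is not documented here, so matching is case-insensitive. CLASSES, encode_label and decode_label are hypothetical names.

```python
# Hypothetical label vocabulary: assumes the files use these five class names.
# Matching is case-insensitive because the exact casing in the files is not documented.
CLASSES = ["Background", "Objective", "Methods", "Results", "Conclusions"]
LABEL_TO_ID = {name.upper(): i for i, name in enumerate(CLASSES)}


def encode_label(label: str) -> int:
    """Map a class name to its integer id (raises KeyError for unknown labels)."""
    return LABEL_TO_ID[label.strip().upper()]


def decode_label(idx: int) -> str:
    """Map an integer id back to its class name."""
    return CLASSES[idx]
```

For example, `encode_label("methods")` returns 2 and `decode_label(2)` returns "Methods". Raising on an unknown label surfaces unexpected spellings early instead of silently mislabeling sentences.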

All the abstracts are from the Computer Science field (hence the name CS Abstracts). The following table details the composition of the dataset, giving the number of abstracts and sentences in the training, validation and test sets:

              Training set   Validation set   Test set
#abstracts    500            77               77
#sentences    3287           824              619
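
The split sizes can be checked with a little arithmetic. The abstract counts sum to 654, matching the total stated above, and the sentence counts sum to 4730. The following Python sketch simply recomputes these totals from the table, along with each split's share of abstracts and its average number of sentences per abstract. The dictionary layout is an illustrative choice, not part of the dataset.

```python
# Split sizes copied from the table above (illustrative layout, not a dataset file).
splits = {
    "train": {"abstracts": 500, "sentences": 3287},
    "validation": {"abstracts": 77, "sentences": 824},
    "test": {"abstracts": 77, "sentences": 619},
}

total_abstracts = sum(s["abstracts"] for s in splits.values())
total_sentences = sum(s["sentences"] for s in splits.values())

for name, s in splits.items():
    share = s["abstracts"] / total_abstracts    # fraction of all abstracts
    avg_len = s["sentences"] / s["abstracts"]   # mean sentences per abstract
    print(f"{name:<10} {share:6.1%} of abstracts, {avg_len:.2f} sentences/abstract")

print(total_abstracts, total_sentences)
```

The per-split averages (about 6.57, 10.70 and 8.04 sentences per abstract) can be useful when comparing the splits.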

The dataset is split into three files: train.txt, validation.txt and test.txt.

Each line of a file corresponds to one entry (a single sentence) of the dataset, with columns separated by a tab. The first column is the position of the sentence within its abstract, the second column is the sentence's label (category), and the third column is the text of the sentence itself.
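
The following is a minimal Python sketch for reading this tab-separated format and grouping sentences back into abstracts. Only the column order comes from the description above. The remaining details are assumptions: the position is an integer, blank lines can be ignored, and a new abstract starts whenever the position does not increase. Sentence, parse_lines and load_split are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Sentence:
    position: int  # position of the sentence within its abstract
    label: str     # class name, e.g. "Methods"
    text: str      # sentence text


def parse_lines(lines: Iterable[str]) -> List[List[Sentence]]:
    """Group tab-separated (position, label, text) rows into abstracts.

    Assumptions (not documented in the README): positions are integers,
    blank lines can be skipped, and a new abstract begins whenever the
    position does not increase relative to the previous row.
    """
    abstracts: List[List[Sentence]] = []
    prev_pos = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        # maxsplit=2 keeps any stray tabs inside the sentence text;
        # a row with fewer than three fields raises ValueError here.
        pos_str, label, text = line.split("\t", 2)
        pos = int(pos_str)
        if prev_pos is None or pos <= prev_pos:
            abstracts.append([])  # position reset: start a new abstract
        abstracts[-1].append(Sentence(pos, label, text))
        prev_pos = pos
    return abstracts


def load_split(path: str) -> List[List[Sentence]]:
    """Read one split file, e.g. "train.txt"."""
    with open(path, encoding="utf-8") as f:
        return parse_lines(f)
```

As a usage example, `load_split("train.txt")` would return a list of abstracts, each a list of `Sentence` objects. If the actual files mark abstract boundaries differently (for example with separator lines), the boundary check would need to be adapted.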

The dataset was developed in the context of my Master's Thesis in Engineering and Management of Information Systems.

Citation Request

This dataset is made freely available for research purposes. If you use it, please cite:

Gonçalves, S., Cortez, P., & Moro, S. (2019). A deep learning classifier for sentence classification in biomedical and computer science abstracts. Neural Computing and Applications, in press. http://dx.doi.org/10.1007/s00521-019-04334-2
