SkimLit: Abstract Sentence Classification for Medical Literature

Project Overview

This project replicates the deep learning model presented in the 2017 paper PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts. The model assigns each sentence of a medical abstract a role (e.g., objective, methods, results) so that readers can skim research articles more efficiently.

Problem Statement

The growing number of randomized controlled trial (RCT) papers makes it hard to review the literature efficiently, particularly when abstracts are unstructured. We address this with an NLP model that classifies each abstract sentence by its role, so researchers can skim the literature quickly and still read in depth where needed.

Project Structure

1. Dataset

We start by downloading the PubMed 200k RCT dataset from GitHub. It is the foundation for model training and evaluation.
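As a rough illustration of what happens after the download, the sketch below parses text in the layout the dataset files are understood to use: a `###` line carrying the abstract ID, one `LABEL<TAB>sentence` line per sentence, and a blank line closing each abstract. The function name `parse_abstracts`, the record keys, and the sample sentences and IDs are made up for this example and are not taken from the project.

```python
from typing import Dict, List, Tuple

# Made-up sample in the assumed file layout (not real dataset rows).
SAMPLE = (
    "###00000001\n"
    "OBJECTIVE\tTo assess the effect of a new therapy on pain .\n"
    "METHODS\tPatients were randomized into two groups .\n"
    "RESULTS\tPain scores decreased in the therapy group .\n"
    "\n"
    "###00000002\n"
    "BACKGROUND\tSleep problems are common in older adults .\n"
    "CONCLUSIONS\tThe program improved sleep quality .\n"
    "\n"
)


def parse_abstracts(raw: str) -> List[Dict[str, object]]:
    """Return one record per sentence, with its label and position."""
    records: List[Dict[str, object]] = []
    current: List[Tuple[str, str]] = []  # (label, text) pairs of the open abstract

    def flush() -> None:
        # Emit every sentence of the abstract that has just ended.
        for number, (label, text) in enumerate(current):
            records.append({
                "target": label,
                "text": text,
                "line_number": number,        # zero-based position in the abstract
                "total_lines": len(current),  # number of sentences in the abstract
            })
        current.clear()

    for line in raw.splitlines():
        if line.startswith("###") or not line.strip():
            flush()  # an ID line or a blank line ends the previous abstract
        else:
            label, text = line.split("\t", maxsplit=1)
            current.append((label, text))
    flush()  # handle a file that does not end with a blank line
    return records


records = parse_abstracts(SAMPLE)
```

Keeping `line_number` and `total_lines` on every record makes the positional information available later without re-reading the file.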

2. Data Preprocessing

We develop a preprocessing function to prepare the dataset for modeling, including tokenization and embedding creation.
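A minimal sketch of such a preprocessing step is shown below, assuming each sentence arrives as a record with `target`, `text`, `line_number`, and `total_lines` keys. The helper names, the label order, and the one-hot depths (15 and 20) are assumptions chosen for illustration. The embeddings themselves would be learned inside the model, so this step only prepares their inputs.

```python
from typing import Dict, List

# Assumed label order; the dataset uses these five sentence roles.
LABELS = ["BACKGROUND", "OBJECTIVE", "METHODS", "RESULTS", "CONCLUSIONS"]
LABEL_TO_ID = {name: index for index, name in enumerate(LABELS)}


def split_chars(text: str) -> str:
    """Space-separate characters so a character-level tokenizer can split them."""
    return " ".join(list(text))


def one_hot(index: int, depth: int) -> List[int]:
    """One-hot encode index; values past the end are clipped into the last slot."""
    clipped = min(index, depth - 1)
    return [1 if position == clipped else 0 for position in range(depth)]


def preprocess(record: Dict[str, object],
               line_depth: int = 15,
               total_depth: int = 20) -> Dict[str, object]:
    """Turn one sentence record into the inputs a model would consume."""
    text = str(record["text"]).lower()
    return {
        "tokens": text.split(),      # whitespace tokenization
        "chars": split_chars(text),  # character-level view of the sentence
        "label_id": LABEL_TO_ID[str(record["target"])],
        "line_number_one_hot": one_hot(int(record["line_number"]), line_depth),
        "total_lines_one_hot": one_hot(int(record["total_lines"]), total_depth),
    }


example = preprocess({
    "target": "METHODS",
    "text": "Patients were randomized .",
    "line_number": 3,
    "total_lines": 11,
})
```

One-hot encoding the position, rather than passing the raw integer, avoids implying that line 4 is "twice" line 2.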

3. Baseline Model

We create a TF-IDF classifier to establish a baseline for our modeling experiments.
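One common way to build such a baseline is a scikit-learn pipeline that feeds TF-IDF features into a simple classifier. The text above does not say which classifier is used, so the multinomial Naive Bayes below is an assumption, and the training sentences are made up for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Made-up training sentences; the real project would use the dataset's training split.
train_sentences = [
    "to assess the effect of exercise on chronic pain",
    "to evaluate the efficacy of the new drug",
    "patients were randomized to treatment or placebo",
    "participants were randomly assigned to two groups",
    "pain scores decreased significantly in the treatment group",
    "mean blood pressure was lower after twelve weeks",
]
train_labels = [
    "OBJECTIVE", "OBJECTIVE",
    "METHODS", "METHODS",
    "RESULTS", "RESULTS",
]

# TF-IDF turns each sentence into a weighted bag of words;
# the classifier (an assumption here) learns one word distribution per label.
baseline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultinomialNB()),
])
baseline.fit(train_sentences, train_labels)

predictions = baseline.predict(["to evaluate whether the drug reduces pain"])
```

A baseline like this trains in seconds and gives the deep learning experiments a score to beat.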

4. Deep Learning Models

We experiment with various deep learning models, incorporating different combinations of token embeddings, character embeddings, pretrained embeddings, and positional embeddings.

5. Multimodal Model

We build a multimodal model that replicates the architecture outlined in the research paper and takes several types of input data.
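The sketch below shows how a multi-input model of this kind can be wired with the Keras functional API. It is not the paper's exact architecture. It assumes the text has already been converted to integer token and character IDs, and every size (vocabulary sizes, sequence lengths, layer widths, one-hot depths) is a placeholder chosen for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_multi_input_model(token_vocab=1000, char_vocab=30,
                            token_len=55, char_len=290,
                            line_depth=15, total_depth=20,
                            num_classes=5):
    """Combine token, character, and positional inputs into one classifier."""
    # Token branch: embedding followed by a 1D convolution.
    token_ids = layers.Input(shape=(token_len,), dtype="int32", name="token_ids")
    tokens = layers.Embedding(token_vocab, 128)(token_ids)
    tokens = layers.Conv1D(64, 5, padding="same", activation="relu")(tokens)
    tokens = layers.GlobalMaxPooling1D()(tokens)

    # Character branch: embedding followed by a bidirectional LSTM.
    char_ids = layers.Input(shape=(char_len,), dtype="int32", name="char_ids")
    chars = layers.Embedding(char_vocab, 25)(char_ids)
    chars = layers.Bidirectional(layers.LSTM(24))(chars)

    # Positional branches: one-hot line number and one-hot total line count.
    line_number = layers.Input(shape=(line_depth,), name="line_number")
    line_features = layers.Dense(32, activation="relu")(line_number)
    total_lines = layers.Input(shape=(total_depth,), name="total_lines")
    total_features = layers.Dense(32, activation="relu")(total_lines)

    # Merge the text branches first, then add the positional features.
    text = layers.Concatenate()([tokens, chars])
    text = layers.Dense(256, activation="relu")(text)
    text = layers.Dropout(0.5)(text)
    combined = layers.Concatenate()([line_features, total_features, text])
    outputs = layers.Dense(num_classes, activation="softmax")(combined)

    model = tf.keras.Model(
        inputs=[token_ids, char_ids, line_number, total_lines],
        outputs=outputs,
    )
    model.compile(loss="categorical_crossentropy",
                  optimizer="adam",
                  metrics=["accuracy"])
    return model


model = build_multi_input_model()
```

Calling `model.summary()` lists the four inputs and shows where the branches merge.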

6. Model Evaluation and Analysis

We evaluate the models and inspect the most wrong predictions, that is, the incorrect predictions the model made with the highest confidence, to gain insight into how model performance can be improved.
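A small sketch of how the most confidently wrong predictions could be pulled out is given below. The function name `most_wrong`, its arguments, and the sample probabilities are hypothetical. In practice the probabilities would come from the trained model's output on the validation or test split.

```python
from typing import List, Sequence, Tuple


def most_wrong(sentences: Sequence[str],
               true_labels: Sequence[str],
               pred_probs: Sequence[Sequence[float]],
               class_names: Sequence[str],
               top_k: int = 3) -> List[Tuple[float, str, str, str]]:
    """Return (confidence, sentence, true label, predicted label) rows
    for incorrect predictions, highest confidence first."""
    wrong = []
    for sentence, truth, probs in zip(sentences, true_labels, pred_probs):
        best = max(range(len(probs)), key=lambda index: probs[index])
        predicted = class_names[best]
        if predicted != truth:
            wrong.append((probs[best], sentence, truth, predicted))
    wrong.sort(key=lambda row: row[0], reverse=True)
    return wrong[:top_k]


# Made-up example values for illustration only.
class_names = ["BACKGROUND", "OBJECTIVE", "METHODS", "RESULTS", "CONCLUSIONS"]
sentences = ["sentence a", "sentence b", "sentence c"]
true_labels = ["METHODS", "RESULTS", "BACKGROUND"]
pred_probs = [
    [0.05, 0.05, 0.80, 0.05, 0.05],  # predicted METHODS: correct
    [0.02, 0.02, 0.02, 0.04, 0.90],  # predicted CONCLUSIONS: wrong, very confident
    [0.30, 0.60, 0.05, 0.03, 0.02],  # predicted OBJECTIVE: wrong, less confident
]

worst = most_wrong(sentences, true_labels, pred_probs, class_names)
```

Reading the top rows often reveals mislabeled data or label pairs the model confuses.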

7. Predictions on New Data

Finally, we use our trained model to make predictions on unseen PubMed abstracts, demonstrating the model's practical application.
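Before a trained model can label a new abstract, the raw text has to be split into sentences and given the same positional features used in training. The sketch below shows one way to do that. The regex-based splitter is a deliberately naive stand-in (a dedicated sentence segmenter would be more robust), and the function name, record keys, and sample abstract are made up for this example.

```python
import re
from typing import Dict, List


def abstract_to_samples(abstract: str) -> List[Dict[str, object]]:
    """Split an abstract into sentences and attach position features."""
    # Naive rule: split on whitespace that follows ., ! or ?
    # and precedes an uppercase letter or a digit.
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z0-9])", abstract.strip())
    sentences = [part.strip() for part in parts if part.strip()]
    return [
        {
            "text": sentence,
            "line_number": number,          # zero-based position
            "total_lines": len(sentences),  # sentences in this abstract
        }
        for number, sentence in enumerate(sentences)
    ]


# Made-up abstract for illustration only.
new_abstract = (
    "This study aimed to assess a new therapy. "
    "Patients were assigned to two groups. "
    "Symptoms improved in both groups."
)
samples = abstract_to_samples(new_abstract)
```

Each record can then go through the same preprocessing as the training data before being passed to the model's predict step.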


Contributors

akhtarshadab
