Unsupervised Discovery Of Trends In Biomedical Research Based On The PubMed Baseline Repository

License: MIT License


Data Science for Text Analytics: Student Project (Public Version)

Table of Contents

  1. Introduction
  2. Requirements
  3. Getting Started
  4. Usage
  5. Development

Introduction

PubMed is an online database of biomedical literature from MEDLINE, life science journals, and online books. It contains over 35 million citations, covering various areas of research related to biomedicine and health since 1965.

Our work aims to offer easily accessible insights into hot research areas and to structure and organize the vast amount of information available in PubMed. We developed pmtrendviz, a Python-based text-analytics tool that uses document embedding and clustering methods to identify research areas without supervision. For a given query, it derives trends on a per-cluster basis for a number of clusters most similar to that query.
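
To make the idea concrete, here is a minimal, self-contained sketch of the embed, cluster, query, and trend steps described above. It is not the pmtrendviz implementation: the bag-of-words "embedding", the greedy threshold clustering, the toy corpus, and every function name are assumptions chosen only so the example runs with the Python standard library.

```python
import math
from collections import Counter, defaultdict

# Toy corpus of (publication year, abstract) pairs; purely illustrative data.
DOCS = [
    (2019, "deep learning for tumor image segmentation"),
    (2020, "deep learning tumor detection in mri image"),
    (2021, "neural network tumor image classification"),
    (2019, "gut microbiome and diet in mice"),
    (2021, "diet changes alter gut microbiome composition"),
]


def embed(text):
    """Bag-of-words term counts, a stand-in for a learned document embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(vectors, threshold=0.3):
    """Greedy single-pass clustering: join the most similar centroid or open a new one."""
    centroids, labels = [], []
    for vec in vectors:
        sims = [cosine(vec, c) for c in centroids]
        if sims and max(sims) >= threshold:
            best = sims.index(max(sims))
            centroids[best] = centroids[best] + vec  # accumulate term counts
        else:
            centroids.append(Counter(vec))
            best = len(centroids) - 1
        labels.append(best)
    return centroids, labels


def trend_for_query(query, docs=DOCS):
    """Return {year: document count} for the cluster most similar to the query."""
    vectors = [embed(text) for _, text in docs]
    centroids, labels = cluster(vectors)
    q = embed(query)
    best = max(range(len(centroids)), key=lambda i: cosine(q, centroids[i]))
    counts = defaultdict(int)
    for (year, _), label in zip(docs, labels):
        if label == best:
            counts[year] += 1
    return dict(sorted(counts.items()))
```

Calling trend_for_query("gut microbiome") returns the yearly document counts of the cluster closest to that query. A real pipeline would replace embed and cluster with trained models and read documents from an index rather than from a list.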

Requirements

Software

  • Python 3.10
  • Docker 20.10.20
  • Node 19.4.0 (see the instructions)
  • Git LFS (optional, for installing pre-trained pmtrendviz models)
    • On WSL2, you may need to install git-lfs manually, see this thread

Hardware

Minimum

  • 8GB RAM
  • 10GB free disk space

Recommended

  • 32GB RAM
  • 70GB free disk space
  • GPU (optional)
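
The requirements above can be partly checked before installing. The following sketch is an assumption-laden convenience, not part of pmtrendviz: the function name find_problems is hypothetical, it only checks the Python version and free disk space (RAM is skipped because the standard library has no portable way to query it), and it treats any version other than the listed 3.10 as a warning.

```python
import shutil
import sys

EXPECTED_PYTHON = (3, 10)  # version listed under Software
MIN_FREE_GB = 10           # minimum free disk space listed under Hardware


def find_problems(python_version, free_gb):
    """Return a list of human-readable problems; an empty list means both checks passed."""
    problems = []
    major_minor = tuple(python_version[:2])
    if major_minor != EXPECTED_PYTHON:
        problems.append(
            f"expected Python 3.10, found {major_minor[0]}.{major_minor[1]}"
        )
    if free_gb < MIN_FREE_GB:
        problems.append(
            f"only {free_gb:.1f} GB free, at least {MIN_FREE_GB} GB required"
        )
    return problems


if __name__ == "__main__":
    # Check the interpreter running this script and the current directory's disk.
    free = shutil.disk_usage(".").free / 1e9
    for problem in find_problems(sys.version_info, free):
        print("WARNING:", problem)
```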

Getting Started

Clone The Repository

git clone https://github.com/psaegert/pmtrendviz.git
cd pmtrendviz

Create A Virtual Environment (optional):

With conda

conda create -n pmtrendviz python=3.10
conda activate pmtrendviz

With venv and pyenv

pyenv install 3.10
pyenv local 3.10
python -m venv .venv
source .venv/bin/activate

Install

Option 1 (recommended):
Install the entire package with pip:

pip install -e .

Option 2:
If you prefer not to install the package and instead run the main.py script directly, use the following command to install only the dependencies:

pip install -r requirements.txt

Set up Elasticsearch

docker compose up -d es01 [elasticvue]

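Note: The es01 service is required for all steps of the pipeline. The square brackets mark elasticvue as an optional second service; type the command without the brackets.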
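
Because every later step depends on Elasticsearch, it can help to confirm that the container answers before continuing. The sketch below uses only the Python standard library. The address http://localhost:9200 is an assumption (it is the Elasticsearch default HTTP port; the port mapping in this repository's compose file may differ), and the function name is hypothetical.

```python
import urllib.error
import urllib.request


def elasticsearch_is_up(url="http://localhost:9200", timeout=2.0):
    """Return True if an HTTP server answers at `url`, even with an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, for example with 401 when security is enabled.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a dropped connection.
        return False


if __name__ == "__main__":
    print("Elasticsearch reachable:", elasticsearch_is_up())
```

This only shows that something is listening on the port. It does not verify cluster health or that any index exists.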

Usage

The pmtrendviz pipeline consists of four distinct steps: data collection, training, prediction, and visualization. They can be run in the following ways:

Crunch the data

Option 1: CLI (recommended)
Check out the CLI Documentation or the minimal CLI example.

Option 2: Use pmtrendviz in your own Python code
Check out the minimal Python example.

Visualize the results

To start the visualization, run the start_backend.sh and start_frontend.sh scripts in two separate terminals. Afterwards, open http://localhost:5173/ in your browser, and start typing in the search bar (be patient, it may take a while for the models to load into memory).

Development

Setup

To set up the development environment, run the following command:

pip install -r requirements_dev.txt

Tools

We use

  • flake8 to enforce linting
  • mypy to enforce static typing
  • isort to enforce import sorting
  • pytest to run tests against our code (see tests/)

Pre-Commit Hooks

To set up pre-commit hooks that run linting, static type checking, trailing-whitespace checks, and ordering of requirements.txt and imports on every commit, run the following command:

pre-commit install

To run the pre-commit hooks manually, run the following command:

pre-commit run --all-files

Tests can be run with the following command:

pytest

Citation

If you use this code for your own research, please cite our work:

@misc{pmtrendviz,
  author = {Paul Saegert and Philipp Steichen},
  title = {Unsupervised Discovery Of Trends In Biomedical Research Based On The PubMed Baseline Repository},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub Repository},
  howpublished = {\url{https://github.com/psaegert/pmtrendviz}},
}
