
License: MIT

medspacy

Library for clinical NLP with spaCy.


MedSpaCy is currently in beta.

Overview

MedSpaCy is a library of tools for performing clinical NLP and text processing tasks with the popular spaCy framework. The medspacy package brings together a number of other packages, each of which implements functionality for a common clinical text processing task, such as sentence segmentation, contextual analysis and attribute assertion, and section detection.

medspacy is modularized so that each component can be used independently. All of medspacy is designed to be used as part of a spaCy processing pipeline. Each of the following modules is available as part of medspacy:

  • medspacy.preprocess: Destructive preprocessing for modifying clinical text before processing
  • medspacy.sentence_splitter: Clinical sentence segmentation
  • medspacy.ner: Utilities for extracting concepts from clinical text
  • medspacy.context: Implementation of the ConText algorithm for detecting semantic modifiers and attributes of entities, including negation and uncertainty
  • medspacy.section_detection: Clinical section detection and segmentation
  • medspacy.postprocess: Flexible framework for modifying and removing extracted entities
  • medspacy.io: Utilities for converting processed texts to structured data and interacting with databases
  • medspacy.visualization: Utilities for visualizing concepts and relationships extracted from text
  • SpacyQuickUMLS: UMLS concept extraction compatible with spaCy and medspacy, implemented by our fork of QuickUMLS. More detail on this component, including how to use it and how to generate UMLS resources beyond the small UMLS sample, can be found in this notebook.
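
To make the idea behind medspacy.context more concrete, here is a deliberately simplified, standard-library-only sketch of ConText-style negation: a trigger phrase negates any target concept that follows it within the same sentence. The trigger list, the `find_negated` helper, and the naive sentence splitting are all illustrative assumptions; this is not medspacy's implementation or API, and real ConText rule sets also handle scope termination, backward triggers, and other modifier types.

```python
import re

# Hypothetical trigger phrases for illustration; real rule sets are much larger.
NEGATION_TRIGGERS = ["no evidence of", "denies", "without"]


def find_negated(text, targets):
    """Map each target found in `text` to True if a negation trigger
    precedes it in the same sentence, otherwise False."""
    results = {}
    # Naive sentence split: break on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", text.lower()):
        # End offsets of every trigger occurrence in this sentence
        trigger_ends = [
            m.end()
            for trig in NEGATION_TRIGGERS
            for m in re.finditer(r"\b" + re.escape(trig) + r"\b", sentence)
        ]
        for target in targets:
            pos = sentence.find(target.lower())
            if pos == -1:
                continue  # target not mentioned in this sentence
            # Negated if any trigger ends before the target starts
            results[target] = any(end <= pos for end in trigger_ends)
    return results


text = "There is no evidence of pneumonia. Continue warfarin for Afib."
print(find_negated(text, ["pneumonia", "warfarin"]))
# {'pneumonia': True, 'warfarin': False}
```

In this toy version a target mentioned in several sentences keeps the result from its last mention; a real pipeline tracks each entity span separately.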

Future work could include I/O, relation extraction, and pre-trained clinical models.

As of 10/2/2021 (version 0.2.0.0), medspaCy supports spaCy v3.

Usage

Installation

You can install medspacy using setup.py:

python setup.py install

Or with pip:

pip install medspacy

To install a previous version which uses spaCy 2:

pip install medspacy==0.1.0.2
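
If you are unsure which spaCy major version an installed medspacy release expects, a small check along the following lines may help. It is a sketch that assumes every release from 0.2.0.0 onward targets spaCy v3 and every earlier release targets spaCy 2, which is an extrapolation from the two versions mentioned here; `spacy_major_for` and `installed_version` are hypothetical helper names, not part of medspacy.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def spacy_major_for(medspacy_version):
    """Return the spaCy major version a medspacy release is assumed to target.

    Assumption: releases >= 0.2.0.0 target spaCy v3, earlier ones spaCy 2.
    """
    # Keep up to four numeric components and pad short versions with zeros
    parts = [int(p) for p in re.findall(r"\d+", medspacy_version)][:4]
    parts += [0] * (4 - len(parts))
    return 3 if tuple(parts) >= (0, 2, 0, 0) else 2


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("medspacy")
if found is None:
    print("medspacy is not installed")
else:
    print(f"medspacy {found} is assumed to target spaCy v{spacy_major_for(found)}")
```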

Requirements

The following packages are required and installed when medspacy is installed:

If you download other models, you can use them by providing the model itself or model name to medspacy.load(model_name):

import spacy
import medspacy

# Option 1: Load default
nlp = medspacy.load()

# Option 2: Load from existing model
nlp = spacy.load("en_core_web_sm", disable={"ner"})
nlp = medspacy.load(nlp)

# Option 3: Load from model name
nlp = medspacy.load("en_core_web_sm", disable={"ner"})

Basic Usage

Here is a simple example showing how to implement and visualize a simple rule-based pipeline using medspacy:

import medspacy
from medspacy.ner import TargetRule
from medspacy.visualization import visualize_ent

# Load medspacy model
nlp = medspacy.load()
print(nlp.pipe_names)

text = """
Past Medical History:
1. Atrial fibrillation
2. Type II Diabetes Mellitus

Assessment and Plan:
There is no evidence of pneumonia. Continue warfarin for Afib. Follow up for management of type 2 DM.
"""

# Add rules for target concept extraction
target_matcher = nlp.get_pipe("medspacy_target_matcher")
target_rules = [
    TargetRule("atrial fibrillation", "PROBLEM"),
    TargetRule("atrial fibrillation", "PROBLEM", pattern=[{"LOWER": "afib"}]),
    TargetRule("pneumonia", "PROBLEM"),
    TargetRule("Type II Diabetes Mellitus", "PROBLEM", 
              pattern=[
                  {"LOWER": "type"},
                  {"LOWER": {"IN": ["2", "ii", "two"]}},
                  {"LOWER": {"IN": ["dm", "diabetes"]}},
                  {"LOWER": "mellitus", "OP": "?"}
              ]),
    TargetRule("warfarin", "MEDICATION")
]
target_matcher.add(target_rules)

doc = nlp(text)
visualize_ent(doc)

Output: [image: entity visualization of the example text]
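
To see which surface forms the token pattern for "Type II Diabetes Mellitus" is meant to cover, the following standard-library sketch expresses roughly the same pattern as a regular expression and runs it over the example text. It is only an approximation for illustration: spaCy's matcher operates on tokens rather than characters, and `DIABETES_RE` is a hypothetical name that is not part of medspacy.

```python
import re

# Rough character-level counterpart of the token pattern:
#   "type", then one of 2 / ii / two, then dm or diabetes, then optional "mellitus"
DIABETES_RE = re.compile(
    r"\btype\s+(?:2|ii|two)\s+(?:dm|diabetes)(?:\s+mellitus)?\b",
    re.IGNORECASE,
)

text = """
Past Medical History:
1. Atrial fibrillation
2. Type II Diabetes Mellitus

Assessment and Plan:
There is no evidence of pneumonia. Continue warfarin for Afib. Follow up for management of type 2 DM.
"""

matches = [m.group(0) for m in DIABETES_RE.finditer(text)]
print(matches)
# ['Type II Diabetes Mellitus', 'type 2 DM']
```

Both mentions are caught by a single rule, which is why the TargetRule above uses an explicit pattern instead of relying on the literal string alone.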

For more detailed examples and explanations of each component, see the notebooks folder.

Citing medspaCy

If you use medspaCy in your work, consider citing our paper! It was presented at the AMIA Annual Symposium 2021, and a preprint is available on arXiv.

H. Eyre, A.B. Chapman, K.S. Peterson, J. Shi, P.R. Alba, M.M. Jones, T.L. Box, S.L. DuVall, O.V. Patterson,
Launching into clinical space with medspaCy: a new clinical text processing toolkit in Python,
AMIA Annu. Symp. Proc. 2021 (in press).
http://arxiv.org/abs/2106.07799.
@Article{medspacy,
   Author="Eyre, H.  and Chapman, A. B.  and Peterson, K. S.  and Shi, J.  and Alba, P. R.  and Jones, M. M.  and Box, T. L.  and DuVall, S. L.  and Patterson, O. V. ",
   Title="{{L}aunching into clinical space with medspa{C}y: a new clinical text processing toolkit in {P}ython}",
   Journal="AMIA Annu Symp Proc",
   Year="2021",
   Volume="2021",
   Pages="438--447"
}

Made with medSpaCy

Here are some links to projects or tutorials which use medSpaCy. If you have a project which uses medSpaCy and you'd like it listed here, let us know!

medspacy's Projects

  • cycontext: A Python implementation of the ConText algorithm for clinical text concept assertion using the spaCy framework
  • cycontext_old: A Python implementation of the ConText algorithm for clinical text concept assertion using the spaCy framework
  • medspacy_io: A collection of modules to facilitate reading text from various sources and writing to various sources
  • medspacy_medinfo_2023: Supplementary information for the medspaCy workshop presentation at MedInfo 2023 in Sydney, Australia
  • pysimstring: Python Simstring bindings for Linux, OS X and Windows
  • relation_extraction: spaCy components for Adverse Drug Event (ADE) clinical text processing. This work comes from University of Utah work on the n2c2 2018 and MADE 1.0 ADE data challenges.
  • sectionizer: A rule-based Python module for splitting documents into sections
  • spacy: Industrial-strength Natural Language Processing (NLP) with Python and Cython
