varganti / imeds-temporal-score

This project forked from afbarnard/imeds-temporal-score

Temporal score adverse drug event discovery research method for IMEDS

License: Apache License 2.0

Python 90.61% Shell 9.39%

Temporal Score IMEDS Method

Description

The Temporal Score IMEDS Method is a program that evaluates the adverse drug event likelihood of drug-condition pairs and outputs each pair with its counts and scores in CSV format. The program runs only against an Oracle database that holds electronic medical records data in the IMEDS common data model (CDM) format. This software is submitted as a research method of the Innovation in Medical Evidence Development and Surveillance (IMEDS) program.

This software implements the temporal score described on page 4 of the following paper. A detailed description of the temporal score is in its own section below.

Identifying Adverse Drug Events by Relational Learning. David Page, Vitor Santos Costa, Sriraam Natarajan, Aubrey Barnard, Peggy Peissig, Michael Caldwell. AAAI 2012. http://www.aaai.org/ocs/index.php/AAAI/AAAI12/paper/view/4941

For more information on the IMEDS program and the research methods that it and its collaborators are developing, see http://imeds.reaganudall.org. Historical information can be found at the site of IMEDS's predecessor, the Observational Medical Outcomes Partnership (OMOP), http://omop.org.

License

The Temporal Score IMEDS Method is free, open source software. It is released under the Apache License, Version 2.0, a copy of which can be found as LICENSE.txt in your distribution as a sibling of this file.

Requirements

  • Python 2.7
  • Oracle database with data in IMEDS CDM (version 2) format
  • sqlplus, the Oracle client

How to Use

Download this project and extract it into your preferred directory. Then run the program at the command line as in the following example:

$ python2.7 <...>/temporalScore.py --db-user <oracle-username> -p <params-file> <drug-IDs-file> <condition-IDs-file> > <report-file>

The normal operation is to collect counts from the database for each drug-condition pair and output those counts with computed scores to standard output (or a file). The results table is also left in the database for later inspection or processing. For more information on how to run the program, access the command line help:

$ python2.7 <...>/temporalScore.py -h

Parameters File

Database and algorithm parameters can be specified in a configuration file. The file is in "config" format without any section headers. The parameters are described below and some can also be specified on the command line. Any command line arguments override any settings from the parameters file.

It is possible to run this software without a parameters file, but one will usually have to specify the names of the drug and condition era tables.

  • dbConnectionName: Identifier of the Oracle DB connection to use. Default is 'lsomop'. Also settable on the command line.
  • dbUser: Oracle username. Default is to prompt. Also settable on the command line.
  • dbPass: Oracle password. Default is to prompt. Also settable on the command line.
  • dbSchemaName: Schema to use for temporary tables and the results table. Defaults to the user schema. Also settable on the command line.
  • drugEraTableName: Name of table containing drug era records. Specify a fully-qualified table name if the table is not in the specified schema. Default is 'drug_era'.
  • condEraTableName: Name of the table containing condition era records. Specify a fully-qualified table name if the table is not in the specified schema. Default is 'condition_era'.
  • conditionWindowStart: Number of days after a drug occurrence to allow the earliest associated condition occurrence. Use a negative number to make the window start before the drug. Default is -100,000.
  • conditionWindowEnd: Number of days after a drug occurrence to allow the latest associated condition occurrence. Default is 100,000.
  • drugOccurrenceOffset: Number of days to shift the date of drug occurrences. Default is 0.
  • pseudocount: Pseudocount to add to all counts to avoid zero counts. Default is 1.
  • countsScoresTableName: Name of the table to contain the results report. Default is 'counts_scores'.
  • reportFileName: Name of the file to contain the results report. Default is standard output.
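
As an illustration of the file format, here is a minimal sketch of reading a parameters file without section headers using Python's standard `configparser` module. It is not taken from this program's source. The `key = value` syntax, the sample values, the `cdm` schema name, and the `read_params` helper are all assumptions made for the example.

```python
import configparser

# Hypothetical parameters file content. The "key = value" syntax and the
# values (including the "cdm" schema) are assumptions for illustration.
PARAMS_TEXT = """\
dbConnectionName = lsomop
drugEraTableName = cdm.drug_era
condEraTableName = cdm.condition_era
conditionWindowStart = 0
conditionWindowEnd = 365
pseudocount = 1
"""


def read_params(text):
    """Parse section-less "key = value" text into a dict of strings."""
    parser = configparser.ConfigParser()
    # Keep parameter names case-sensitive (the default lowercases keys).
    parser.optionxform = str
    # configparser requires a section header, so supply a dummy one.
    parser.read_string("[params]\n" + text)
    return dict(parser["params"])


params = read_params(PARAMS_TEXT)
# Values are strings; convert numeric parameters explicitly.
window = (int(params["conditionWindowStart"]),
          int(params["conditionWindowEnd"]))
print(params["drugEraTableName"], window)
```

Prepending a dummy section header is a common workaround because `configparser` rejects files that contain options before any section.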

Report Format

This software reports its results as a table in the Oracle DB (see the 'countsScoresTableName' parameter) and also outputs the table in CSV format (see the 'reportFileName' parameter). These are the fields of the table.

  • drug: Drug ID
  • cond: Condition ID
  • ct_d_bef_c: Count of people having the drug before the condition
  • ct_c_bef_d: Count of people having the condition before the drug
  • ct_d_c: Count of people having both the drug and condition
  • ct_d_bef_anyc: Count of people having the drug before any of the conditions
  • ct_d_anyc: Count of people having the drug and any of the conditions
  • ct_anyd_bef_c: Count of people having any of the drugs before the condition
  • ct_anyd_c: Count of people having any of the drugs and the condition
  • ct_d: Count of people having the drug
  • ct_c: Count of people having the condition
  • ct_ppl: Count of people having any of the drugs and conditions
  • temporal_score: Temporal score

One can use the above counts to do further epidemiology-style 2-by-2 table analysis.
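
As a rough sketch of such an analysis, the following reads one report row and derives a 2-by-2 table and an odds ratio. Several things here are assumptions rather than documented behavior: that the CSV has a header row with the field names listed above, that 'ct_ppl' can serve as the total population size, and that the counts are raw (no pseudocounts included). The sample row is made up for illustration, and `two_by_two` and `odds_ratio` are hypothetical helpers.

```python
import csv
import io

# Made-up report content for illustration only; a header row is assumed.
REPORT_TEXT = """\
drug,cond,ct_d_bef_c,ct_c_bef_d,ct_d_c,ct_d_bef_anyc,ct_d_anyc,ct_anyd_bef_c,ct_anyd_c,ct_d,ct_c,ct_ppl,temporal_score
1001,2001,30,10,40,50,100,60,120,200,150,1000,2.95
"""


def two_by_two(row):
    """Return (both, drug only, condition only, neither) for one row."""
    both = int(row["ct_d_c"])
    drug_only = int(row["ct_d"]) - both
    cond_only = int(row["ct_c"]) - both
    # Assumes ct_ppl is the size of the whole population considered.
    neither = int(row["ct_ppl"]) - both - drug_only - cond_only
    return both, drug_only, cond_only, neither


def odds_ratio(both, drug_only, cond_only, neither):
    """Odds ratio of the condition given the drug (cells must be nonzero)."""
    return (both * neither) / (drug_only * cond_only)


rows = list(csv.DictReader(io.StringIO(REPORT_TEXT)))
table = two_by_two(rows[0])
print(table)  # (40, 160, 110, 690)
print(odds_ratio(*table))
```

To analyze a real report, replace `io.StringIO(REPORT_TEXT)` with an open file handle on the report file.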

Temporal Score Explanation

The temporal score is an evaluation of the likelihood that a drug-condition pair is an adverse drug event. The evaluation starts by defining sets of drugs and conditions, 'D' and 'C'. Then for every drug-condition pair '(d,c)' in the Cartesian product 'D x C', the temporal score for that pair is computed as in the following equation, where 't' is the start time of a drug or condition occurrence.

Pr(t_d < t_c | d, c) /
(Pr(t_d < t_C | d, C) * Pr(t_D < t_c | D, c))

The term in the numerator estimates the probability that the drug occurs before the condition in patients that have both. The terms in the denominator estimate (1) the probability that the drug occurs before any of the conditions in patients that have the drug and any of the conditions and (2) the probability that any of the drugs occur before the condition in patients that have any of the drugs and the condition. The denominator terms "normalize" for drugs that commonly occur with many conditions and conditions that commonly occur with many drugs.

The above probabilities are estimated by counting patients with the desired characteristics and considering only the first occurrence of a drug or condition in a patient. Pseudocounts are added to ensure positive estimates. Thus, having limited the data to only first occurrences, the temporal score is estimated with the following equation in terms of counts.

(#(t_d < t_c) / #(d, c)) /
( (#(t_d < t_C) / #(d, C)) * (#(t_D < t_c) / #(D, c)) )
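
The counting of first occurrences can be sketched in plain Python as follows. This is only an in-memory illustration of the idea: the program itself collects its counts in the Oracle database, and the record layout, the sample data, and the helper names `first_occurrences` and `count_pair` are assumptions. The condition window and drug occurrence offset parameters are ignored here.

```python
from datetime import date

# Hypothetical (person, concept, start date) records for illustration.
drug_records = [
    (1, "d1", date(2010, 1, 1)),
    (1, "d1", date(2010, 6, 1)),   # later repeat, ignored
    (2, "d1", date(2011, 3, 1)),
    (3, "d1", date(2012, 1, 1)),   # person 3 never has the condition
]
cond_records = [
    (1, "c1", date(2010, 2, 1)),
    (2, "c1", date(2011, 1, 1)),
    (2, "c1", date(2011, 5, 1)),   # later repeat, ignored
    (4, "c1", date(2012, 1, 1)),   # person 4 never has the drug
]


def first_occurrences(records):
    """Map (person, concept) to the earliest start date seen."""
    first = {}
    for person, concept, start in records:
        key = (person, concept)
        if key not in first or start < first[key]:
            first[key] = start
    return first


def count_pair(drug_first, cond_first, drug, cond):
    """Return (#(t_d < t_c), #(d, c)) over people having both."""
    before = both = 0
    for (person, d), t_d in drug_first.items():
        if d != drug:
            continue
        t_c = cond_first.get((person, cond))
        if t_c is None:
            continue  # this person has the drug but not the condition
        both += 1
        if t_d < t_c:
            before += 1
    return before, both


drug_first = first_occurrences(drug_records)
cond_first = first_occurrences(cond_records)
print(count_pair(drug_first, cond_first, "d1", "c1"))  # (1, 2)
```

Person 2 shows why restricting to first occurrences matters: the condition recurs after the drug, but its first occurrence precedes the drug, so this person does not count toward '#(t_d < t_c)'.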

One pseudocount ('m') is added to the numerator of each probability estimate and two pseudocounts are added to each denominator. To illustrate, consider the first probability estimate in the following form. (The two other probability estimates are treated analogously.)

(#(t_d < t_c) + m) / (#(t_d < t_c) + m + #(t_d >= t_c) + m)
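
Putting the equations above together, the score can be computed from six counts as in the following sketch. It follows the formulas in this section but is not taken from the program's source, and the function and argument names are assumptions. Because '#(d, c) = #(t_d < t_c) + #(t_d >= t_c)', each smoothed estimate simplifies to '(before + m) / (both + 2m)'.

```python
def smoothed_fraction(n_before, n_both, m=1.0):
    """Estimate a probability as (n_before + m) / (n_both + 2m).

    n_both counts everyone having both items, in either order, so the
    denominator equals (n_before + m) + (n_not_before + m).
    """
    return (n_before + m) / (n_both + 2.0 * m)


def temporal_score(d_bef_c, d_c, d_bef_anyc, d_anyc,
                   anyd_bef_c, anyd_c, m=1.0):
    """Temporal score of a drug-condition pair from raw counts."""
    numerator = smoothed_fraction(d_bef_c, d_c, m)
    denominator = (smoothed_fraction(d_bef_anyc, d_anyc, m)
                   * smoothed_fraction(anyd_bef_c, anyd_c, m))
    return numerator / denominator


# Made-up counts for illustration: the drug precedes the condition more
# often for this pair than for the drug or the condition in general.
score = temporal_score(d_bef_c=30, d_c=40,
                       d_bef_anyc=50, d_anyc=100,
                       anyd_bef_c=60, anyd_c=120)
print(score)
```

With these counts the numerator is 31/42 and each denominator term is exactly 1/2, so the score is (31/42) / (1/4), a little under 3. A pair with no data at all scores 2, since every smoothed estimate falls back to m / 2m = 1/2.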

We have found that the temporal score works better when there are more drugs and conditions, presumably because the normalizing terms are more accurate.

Contact

Contact me about this software through GitHub. First, see if there are any relevant issues. If not, then open a new issue to report a bug or ask a new question. If you have academic inquiries you can find my e-mail address in the git log.

Copyright (c) 2014 Aubrey Barnard. This is free software. See LICENSE.txt for details.
