Impact Pharmacie

We identify high-quality publications demonstrating the impact of pharmacists on health outcomes. Impact Pharmacie is transparent, reproducible, and evidence-based.


Context

The amount of scientific literature describing the impact of pharmacists on health outcomes is increasing, and pharmacists have limited time to keep up with current evidence. We offer a curation service that identifies high-quality publications through a rigorous, transparent and reproducible methodology, and we broadcast these publications weekly to our mailing list.

Through this methodology, we are building a dataset of publications related to pharmacy practice. The features of this dataset are the titles and abstracts of all publications retrieved by our search strategy. The labels indicate whether or not each publication met our inclusion criteria (as determined independently by two reviewers, with disagreements resolved by consensus), its study design, the pharmacy practice field to which it applies, and its practice setting (see details in the methodology). This dataset grows every week. The complete dataset is available as CSV files, updated weekly, in the data/second_gen directory of this repository. Alternatively, the complete extraction logs, the machine learning predictions and ground truths, and the ratings for the current week can be found in a Google Spreadsheet.

We develop machine learning models to help determine whether a given abstract meets our inclusion criteria and to predict its design, field and setting labels. We plan to improve these models as our new dataset grows. This repository includes:

  • The raw data extracted from the first generation platform
  • The dataset from our current generation platform
  • The code used to transform the original data into a machine learning compatible dataset
  • The code to train and evaluate our machine learning models
  • The code to build our current dataset as well as to update our website and to generate our mailing list.

Files

Files contained in the credentials directory

These files are examples (containing no actual credentials) of how to create JSON files containing the authentication credentials for each API accessed by our scripts.
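
For illustration only, a script might load and sanity-check one of these JSON files before calling an API. The sketch below assumes each file holds a flat JSON object; the key names api_key and email, and the file path mentioned after the block, are hypothetical placeholders rather than the actual contents of the example files.

```python
import json
from pathlib import Path

# Hypothetical key names: check the example files in credentials/ for the real ones.
REQUIRED_KEYS = {"api_key", "email"}


def parse_credentials(text, required_keys=REQUIRED_KEYS):
    """Parse JSON credentials and fail early if expected keys are missing."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = set(required_keys) - set(data)
    if missing:
        raise KeyError(f"missing keys {sorted(missing)}")
    return data


def load_credentials(path, required_keys=REQUIRED_KEYS):
    """Read a credentials file from disk and validate it."""
    return parse_credentials(Path(path).read_text(encoding="utf-8"), required_keys)
```

A call such as load_credentials("credentials/pubmed.json") (hypothetical filename) returns a plain dict, and a missing key surfaces as an error before any network request is made.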

Files contained in the data/first_gen directory

These files are the raw HTML scraped from the first generation Impact Pharmacie website and are used by the create_impact_dataset.ipynb notebook to generate the machine learning dataset.

Files contained in the data/second_gen directory

These CSV files contain:

  • A log of the data extractions made from PubMed (extraction_log.csv).
  • The titles, abstract texts and ratings of all papers evaluated with our methodology (ratings.csv).
  • The machine learning tag predictions and ground truths for papers which met our inclusion criteria (predictions.csv).
  • The exclusion thresholds for the model that predicts if a paper can be automatically excluded or needs to be reviewed manually, as well as associated metrics computed when determining the threshold (thresholds.csv).
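
As an illustration of working with these files, the sketch below tallies the values of one label column of ratings.csv using only the standard library. The column name inclusion and its values are hypothetical; the real header of ratings.csv should be checked first.

```python
import csv
from collections import Counter


def count_ratings(csv_lines, label_column="inclusion"):
    """Count how many rated papers fall under each value of one label column.

    csv_lines can be an open file or any iterable of CSV lines.
    The default column name is a hypothetical placeholder.
    """
    reader = csv.DictReader(csv_lines)
    if label_column not in (reader.fieldnames or []):
        raise KeyError(f"column {label_column!r} not found in {reader.fieldnames}")
    return Counter(row[label_column] for row in reader)
```

To run it on the repository data, open data/second_gen/ratings.csv with newline="" (as the csv module documentation recommends) and pass the file object to count_ratings.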

create_impact_dataset.ipynb

This notebook creates a dataset containing as features the titles and abstracts from publications included in the original Impact Pharmacie website, as well as titles and abstracts from publications that were not included.
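
The notebook's actual parsing and sampling steps are not described here. As a rough sketch of the final step only, assuming titles and abstracts have already been extracted from the HTML into (title, abstract) pairs, merging the two groups into one labeled dataset could look like this. The deduplication rule (an included label wins over a non-included duplicate) is an assumption.

```python
import random


def build_inclusion_dataset(included, excluded, seed=42):
    """Merge included and non-included publications into one labeled list.

    Each publication is a (title, abstract) pair. The label is 1 for papers
    that appeared on the original site and 0 otherwise. Exact duplicates are
    kept once; if a pair appears in both groups, the included label wins.
    """
    labeled = {}
    for title, abstract in excluded:
        labeled[(title.strip(), abstract.strip())] = 0
    for title, abstract in included:
        labeled[(title.strip(), abstract.strip())] = 1
    rows = [{"title": t, "abstract": a, "label": y} for (t, a), y in labeled.items()]
    # Shuffle with a fixed seed so the order is reproducible between runs.
    random.Random(seed).shuffle(rows)
    return rows
```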

inclusion_basic_models.ipynb

This notebook uses scikit-learn to train and evaluate various classifier models on the second generation dataset to determine whether a paper should be included in Impact Pharmacie.
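
The notebook relies on scikit-learn for evaluation; which metrics it reports is not stated here. As a dependency-free illustration of what evaluating an inclusion classifier involves, the sketch below computes precision, recall and F1 for the "include" class (label 1) by hand. On inputs with at least one predicted and one actual positive, these values should match scikit-learn's precision_score, recall_score and f1_score with default binary averaging.

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall and F1 for the positive class (1 = include)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly included
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # wrongly included
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed inclusions
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For screening, recall usually matters most, since a missed inclusion means a relevant paper never reaches reviewers.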

inclusion_transformers.ipynb

This notebook uses Hugging Face Transformers to train and evaluate transformer models on the second generation dataset to determine whether a paper should be included in Impact Pharmacie. The notebook also trains the production models.

labels_basic_models.ipynb

This notebook uses scikit-learn to train and evaluate a large number of "classic" machine learning models on the original dataset for label predictions (design, field and setting of included papers).

labels_transformers.ipynb

This notebook uses Hugging Face Transformers to train and evaluate transformer models on the original dataset for label predictions (design, field and setting of included papers). These models performed better than the basic models; therefore, the notebook also includes the training, evaluation and explainability testing of models on the test set, as well as the training of production models.
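
If a paper can carry several field or setting tags (an assumption; this README does not say whether these labels are single- or multi-valued), one common setup is to encode tags as multi-hot vectors for training and to decode per-label probabilities with a threshold at prediction time. The label names and the 0.5 threshold below are hypothetical.

```python
# Hypothetical setting labels; the real label set is defined by our methodology.
SETTINGS = ["hospital", "community", "long-term care"]


def to_multi_hot(tags, vocabulary):
    """Encode a paper's tags as a 0/1 vector aligned with the vocabulary."""
    tag_set = set(tags)
    unknown = tag_set - set(vocabulary)
    if unknown:
        raise ValueError(f"unknown tags: {sorted(unknown)}")
    return [1 if label in tag_set else 0 for label in vocabulary]


def decode(probabilities, vocabulary, threshold=0.5):
    """Turn per-label probabilities (e.g. sigmoid outputs) back into tags."""
    if len(probabilities) != len(vocabulary):
        raise ValueError("one probability per label is required")
    return [label for label, p in zip(vocabulary, probabilities) if p >= threshold]
```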

update_data.py

This script is used to build our new dataset by performing an automated PubMed search.
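
PubMed searches are commonly automated through NCBI's E-utilities. The sketch below only builds an ESearch request URL for a date window; it does not send the request, and the query string in the test is a placeholder rather than our actual search strategy. Whether update_data.py calls E-utilities directly or through a wrapper library is not stated here.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(query, mindate, maxdate, retmax=10000):
    """Build an ESearch URL for PubMed records added within a date window.

    Dates use the YYYY/MM/DD format accepted by E-utilities.
    """
    params = {
        "db": "pubmed",
        "term": query,
        "datetype": "edat",  # Entrez date: when the record was added to PubMed
        "mindate": mindate,
        "maxdate": maxdate,
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"
```

Fetching the URL (for example with urllib.request.urlopen) returns JSON whose esearchresult object lists the matching PMIDs under idlist; NCBI asks clients to respect its published request-rate limits.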

update_site.py

This script is used to update our website from our dataset and to generate our newsletter.
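
The newsletter format is not documented in this README. Purely as a sketch, assuming each included paper is available as a dict with hypothetical title, journal and design keys, a plain-text digest grouped by study design could be rendered as follows; the grouping choice is also an assumption.

```python
def format_digest(papers, week_label):
    """Render included papers as a plain-text weekly digest.

    Papers are grouped by study design, designs are listed alphabetically,
    and titles are sorted within each group.
    """
    by_design = {}
    for paper in papers:
        by_design.setdefault(paper["design"], []).append(paper)
    lines = [f"Impact Pharmacie: {week_label}", ""]
    for design in sorted(by_design):
        lines.append(design)
        for paper in sorted(by_design[design], key=lambda p: p["title"]):
            lines.append(f"  - {paper['title']} ({paper['journal']})")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```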

update_inclusion_model.py

This script is used to update the model that predicts if abstracts can be excluded automatically or need to be reviewed for inclusion.
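
The README does not specify how update_inclusion_model.py chooses its exclusion threshold. One common approach for screening, shown here as an assumption, is to pick the highest score threshold that still keeps recall of included papers at or above a target, then report the fraction of papers that would be excluded automatically (the workload saved). Scores are assumed to be the model's probability of inclusion.

```python
def exclusion_threshold(scores, labels, target_recall=1.0):
    """Highest threshold keeping recall of included papers >= target_recall.

    Papers scoring strictly below the threshold would be excluded
    automatically; the rest go to manual review. Returns the threshold
    and the fraction of all papers that would be auto-excluded.
    """
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    positives = sorted(s for s, y in zip(scores, labels) if y == 1)
    if not positives:
        raise ValueError("need at least one included paper")
    # How many included papers may fall below the threshold; flooring keeps
    # the choice conservative (recall never drops under the target).
    allowed_misses = int(len(positives) * (1 - target_recall))
    threshold = positives[allowed_misses]
    excluded = sum(1 for s in scores if s < threshold)
    return threshold, excluded / len(scores)
```

With target_recall=1.0 the threshold is simply the lowest score given to any included paper in the evaluation data, so on a small evaluation set the chosen threshold may not hold for future papers.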

Prerequisites

Developed using Python 3.9.7

Requires:

  • BeautifulSoup
  • gspread
  • Hugging Face Datasets
  • Hugging Face Transformers
  • NumPy
  • pandas
  • PyTorch
  • Ray Tune
  • scikit-learn
  • SciPy
  • SHAP
  • tqdm

Contributors

Maxime Thibault, Cynthia Tanguay

License

GNU GPL v3

Copyright (C) 2022 Maxime Thibault, Cynthia Tanguay.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

impactpharmacie's Issues

Integrate Google credential verification into the scripts

Currently the script to update Google credentials ( ./credentials/get_gspread_creds.py ) is separate from the data update and site update scripts. We need to move this into the scripts by verifying that the credentials are still valid and if not go through OAuth to update the credentials.
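
A minimal sketch of the verification step, assuming the stored token is JSON with a hypothetical expiry field in ISO 8601 format: the update scripts would call this check first and fall back to the existing OAuth flow in ./credentials/get_gspread_creds.py only when it returns True. The five-minute safety margin is also an assumption.

```python
import json
from datetime import datetime, timedelta, timezone


def credentials_need_refresh(token_json, margin_minutes=5, now=None):
    """Return True if a stored token has no expiry or expires within the margin."""
    now = now or datetime.now(timezone.utc)
    expiry = json.loads(token_json).get("expiry")  # hypothetical field name
    if not expiry:
        return True
    # Python 3.9's fromisoformat() rejects a trailing "Z", so map it to +00:00.
    if expiry.endswith("Z"):
        expiry = expiry[:-1] + "+00:00"
    expires_at = datetime.fromisoformat(expiry)
    if expires_at.tzinfo is None:
        # Assumption: timestamps without an offset are in UTC.
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    return expires_at - timedelta(minutes=margin_minutes) <= now
```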

Add potentially predatory journals

  • Update code to include:

    • When updating the site, obtain and store the lists of predatory publishers and journals
    • Compare the lists to the previously stored versions and alert if they have changed.
    • When obtaining PubMed data for a publication, follow the DOI link and compare the result to the lists.
    • If a publication is potentially predatory, alert us so we can decide whether to tag it
    • If tagged, add WordPress tags and include a link to the method page in the Impact briefing and in the publication post
  • Update website

    • New page to explain
    • Link in method
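
The list comparison and lookup steps could be sketched as below, assuming the lists are plain collections of names and that the journal and publisher names for a paper have already been resolved from its DOI (that resolution step is not shown). The normalization rule (case folding, accent stripping, whitespace collapsing) is an assumption.

```python
import unicodedata


def normalize(name):
    """Case-fold, strip accents and collapse whitespace for name matching."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.casefold().split())


def list_changes(previous, current):
    """Entries added to and removed from a watch list since the last download."""
    prev = {normalize(n) for n in previous}
    curr = {normalize(n) for n in current}
    return sorted(curr - prev), sorted(prev - curr)


def is_flagged(journal, publisher, journal_list, publisher_list):
    """True if the journal or its publisher appears on the corresponding list."""
    return (normalize(journal) in {normalize(j) for j in journal_list}
            or normalize(publisher) in {normalize(p) for p in publisher_list})
```

A non-empty result from list_changes would trigger the alert; a True result from is_flagged would trigger the manual tagging decision.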

Update website methodology to account for data storage in the repo

Update the methodology page on the website to reflect that:

  1. The dataset is now "officially" stored as CSV files in the GitHub repo (under the /data/second_gen directory)
  2. The spreadsheet now only contains ratings for the current week; other ratings will only be found in the CSV files in the repo.

Add inclusion model into workflow

Add the following enhancements:

  • When updating data, call the inclusion model for predictions and log the inclusion score, the suggestion based on the thresholds, and the model version.
  • For now, hide the three new columns after updating the data so that these predictions run silently for prospective model evaluation.
  • After updating the website, create a new inclusion model version with the newly updated inclusion data.
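
The per-paper logging step could be sketched as follows. The field names, the exclude/review labels and the version string are hypothetical; the rule of suggesting exclusion only for scores strictly below the threshold is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class InclusionPrediction:
    """One logged row: inclusion score, threshold-based suggestion, model version."""
    pmid: str
    score: float
    suggestion: str
    model_version: str


def suggest(pmid, score, threshold, model_version):
    """Suggest automatic exclusion below the threshold, manual review otherwise."""
    suggestion = "exclude" if score < threshold else "review"
    return InclusionPrediction(pmid, score, suggestion, model_version)
```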
