
asreview / paper-megameta-postprocessing-screeningresults


This repository is part of the so-called Mega-Meta study, which reviews factors contributing to substance use, anxiety, and depressive disorders. It contains the scripts for post-processing the screening results.

Home Page: https://www.asreview.ai

License: MIT License

R 79.87% Jupyter Notebook 14.93% Python 5.20%
asreview mega-meta systematic-review deduplication

paper-megameta-postprocessing-screeningresults's Introduction





ASReview: Active learning for Systematic Reviews

Systematically screening large amounts of textual data is time-consuming and often tiresome. The rapidly evolving field of Artificial Intelligence (AI) has allowed the development of AI-aided pipelines that assist in finding relevant texts for search tasks. A well-established approach to increasing efficiency is screening prioritization via Active Learning.

The Active learning for Systematic Reviews (ASReview) project, published in Nature Machine Intelligence, implements different machine learning algorithms that interactively query the researcher. ASReview LAB is designed to accelerate the screening of textual data by minimizing the number of records a human has to read, with no or very few false negatives. ASReview LAB will save time, increase the quality of output, and strengthen the transparency of work when screening large amounts of textual data to retrieve relevant information. Active Learning will support decision-making in any discipline or industry.

ASReview software implements three different modes:

  • Oracle: Screen textual data in interaction with the active learning model. The reviewer is the 'oracle', making the labeling decisions.
  • Exploration: Explore or demonstrate ASReview LAB with a completely labeled dataset. This mode is suitable for teaching purposes.
  • Simulation: Evaluate the performance of active learning models on fully labeled data. Simulations can be run in ASReview LAB or via the command line interface with more advanced options.

Installation

The ASReview software requires Python 3.8 or later. Detailed step-by-step instructions to install Python and ASReview are available for Windows and macOS users.

pip install asreview

Upgrade ASReview with the following command:

pip install --upgrade asreview

To install ASReview LAB with Docker, see Install with Docker.

How it works

ASReview LAB explained - animation

Getting started

Getting Started with ASReview LAB.


Citation

If you wish to cite the underlying methodology of the ASReview software, please use the following publication in Nature Machine Intelligence:

van de Schoot, R., de Bruin, J., Schram, R. et al. An open source machine learning framework for efficient and transparent systematic reviews. Nat Mach Intell 3, 125–133 (2021). https://doi.org/10.1038/s42256-020-00287-7

To cite the software, please refer to the specific release of the ASReview software on Zenodo: https://doi.org/10.5281/zenodo.3345592. The menu on the right of that page can be used to find your preferred citation format.

For more scientific publications on the ASReview software, go to asreview.ai/papers.

Contact

For an overview of the team working on ASReview, see ASReview Research Team. ASReview LAB is maintained by Jonathan de Bruin and Yongchao Terry Ma.

The best resources to find an answer to your question or ways to get in contact with the team are:


License

The ASReview software has an Apache 2.0 LICENSE. The ASReview team accepts no responsibility or liability for the use of the ASReview tool or any direct or indirect damages arising out of the application of the tool.

paper-megameta-postprocessing-screeningresults's People

Contributors

jteijema, lhofstee, rensvandeschoot, sagevdbrand


paper-megameta-postprocessing-screeningresults's Issues

Conservative deduplication does not run

There appears to be an issue with the conservative deduplication strategy. After running doi_retrieval in Python and loading the data back into R for the deduplication part, extra columns were added, as the import message shows:

> # IMPORTING RESULTS
> ## from doi retrieval 
> df <- read_xlsx(paste0(OUTPUT_PATH, DOI_RETRIEVED_PATH))
New names:
* `` -> ...1

which caused a hiccup in the conservative deduplication part:

New names:
* ...1 -> ...6
New names:
* ...1 -> ...6
 Error: not compatible: 
not compatible: 
- Cols in y but not x: `...1`.
- Cols in x but not y: `...6`.

Run `rlang::last_error()` to see where the error occurred. 

This issue causes the conservative deduplication function to fail and therefore needs to be fixed.
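
The `...1` name in the messages above is what readxl assigns to a column whose header cell is empty. One possible cause, which this issue does not confirm, is that the Python DOI-retrieval step exports a header-less index column. The sketch below shows how such columns could be stripped before export. It works on plain lists of rows rather than the project's actual Excel files, and the function name `drop_unnamed_columns` and the sample data are hypothetical.

```python
def drop_unnamed_columns(rows):
    """Remove columns whose header cell is empty or whitespace-only.

    `rows` is a list of lists; the first row is the header and every
    row is assumed to be at least as long as the header.
    """
    if not rows:
        return []
    header = rows[0]
    # Indices of columns that have a real (non-blank) header name.
    keep = [i for i, name in enumerate(header)
            if name is not None and str(name).strip()]
    return [[row[i] for i in keep] for row in rows]


# Made-up example: the first column is a header-less row index.
table = [
    ["", "title", "doi"],
    [0, "Paper A", "10.1000/a"],
    [1, "Paper B", "10.1000/b"],
]
cleaned = drop_unnamed_columns(table)
```

If the export is done with pandas, passing `index=False` to `DataFrame.to_excel` avoids writing the index column in the first place.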

solve duplicates

After merging the three datasets, it appeared that there are still some duplicates in the dataset. This holds for relevant papers, for irrelevant papers, and for unseen papers. We need a script that retrieves missing DOIs, for example from Crossref, so that we can apply another round of DOI-based deduplication, and a script for title matching.
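
A minimal sketch of what DOI-based deduplication combined with title matching could look like, assuming each record is a dictionary with optional `doi` and `title` keys. The function names, the normalization rules, and the sample records (including the DOIs) are illustrative assumptions, not the repository's implementation. The Crossref lookup is left out.

```python
import re
import unicodedata


def normalize_doi(doi):
    """Lowercase a DOI and strip common URL/prefix forms; None if empty."""
    if not doi:
        return None
    doi = doi.strip().lower()
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", doi)
    return doi or None


def normalize_title(title):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    if not title:
        return None
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return text or None


def deduplicate(records):
    """Keep the first record seen for each DOI and each normalized title."""
    seen_dois, seen_titles, unique = set(), set(), []
    for rec in records:
        doi = normalize_doi(rec.get("doi"))
        title = normalize_title(rec.get("title"))
        if (doi and doi in seen_dois) or (title and title in seen_titles):
            continue  # duplicate by DOI or by title
        if doi:
            seen_dois.add(doi)
        if title:
            seen_titles.add(title)
        unique.append(rec)
    return unique


# Made-up records for illustration only.
records = [
    {"title": "Anxiety and Substance Use", "doi": "10.1000/XYZ"},
    {"title": "A different title", "doi": "https://doi.org/10.1000/xyz"},
    {"title": "Anxiety and substance use.", "doi": None},
    {"title": "Depressive disorders: a review", "doi": None},
]
unique = deduplicate(records)
```

This variant also treats two records with identical titles but different DOIs as duplicates. A more conservative rule would match on titles only when at least one of the two DOIs is missing.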

add file with requirement

A file requirements.txt needs to be added containing a list of required R-packages including version information.

create two datasets

Can you create two datasets as output:

  • one with all the information for the quality checks
  • one clean dataset which can be used for future studies

For the second dataset there should be five columns:

  • (ir)relevant for topic area 1-3 (output of the combined screening phases using ASReview)
  • misclassified (as part of the quality checks 1->0 or 0->1)
  • final label which can be used for future studies

In this second dataset, records should appear only once.
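
The five-column layout could be sketched as follows. This is only an illustration under stated assumptions: the column names (`topic_1` to `topic_3`, `misclassified`, `final_label`) are hypothetical, the `misclassified` value is assumed to be stored as `"1->0"`, `"0->1"`, or `None`, and a record without a correction is assumed to be relevant if any topic area marked it relevant. The issue does not specify any of these rules.

```python
def final_label(topic_labels, misclassified=None):
    """Derive a final 0/1 label from per-topic labels and a correction.

    topic_labels: iterable of 0/1 labels, one per topic area.
    misclassified: None, "1->0", or "0->1" (result of the quality checks).
    """
    if misclassified == "1->0":
        return 0
    if misclassified == "0->1":
        return 1
    if misclassified is not None:
        raise ValueError(f"unexpected misclassified value: {misclassified!r}")
    # Assumption: relevant if any topic area labelled the record relevant.
    return int(any(topic_labels))


def build_clean_row(record):
    """Return the five columns of the clean dataset for one record."""
    labels = [record["topic_1"], record["topic_2"], record["topic_3"]]
    correction = record.get("misclassified")
    return {
        "topic_1": labels[0],
        "topic_2": labels[1],
        "topic_3": labels[2],
        "misclassified": correction,
        "final_label": final_label(labels, correction),
    }


# Made-up record: labelled relevant in topic area 1, later corrected to 0.
row = build_clean_row(
    {"topic_1": 1, "topic_2": 0, "topic_3": 0, "misclassified": "1->0"}
)
```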

rlang returns an error

This code chunk returns an error:

# First pivot the title and doi columns
mismatch_included_no_source <- mir %>% 
  select(-contains("source")) %>%
  pivot_longer(cols = ends_with(c("title","doi")),
    names_to = c("intended_subject", ".value"),
    names_pattern = "(.+)_(.+)"
  )

error:

Error: `cols` must select at least one column.
Run `rlang::last_error()` to see where the error occurred.

request for descriptive stats

I would very much like to obtain a table with descriptive statistics including:

Generic stats:

  • total number of records per subject area
  • missing information (abstracts, titles, DOI)
  • number of prior relevant/irrelevant papers used in the first phase
  • number of labelled records in the first phase (plus % relevant)
  • number of labelled records in the second phase (plus % relevant)

Quality stats:

  • number of irrelevant papers which appeared to be relevant after screening by a 2nd screener
  • number of relevant papers which appeared to be irrelevant after screening by a 2nd screener
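
A rough sketch of how part of the generic statistics could be computed, assuming each record is a dictionary with a `subject` key, optional `title`, `abstract`, and `doi` fields, and a `label` of 1, 0, or `None` for unlabelled records. The field names, the function name `describe`, and the sample records are hypothetical. The per-phase and quality statistics would need additional columns that are not modelled here.

```python
from collections import defaultdict


def describe(records):
    """Summarize records per subject area: totals, missing fields, % relevant."""
    stats = defaultdict(lambda: {
        "total": 0, "missing_abstract": 0, "missing_title": 0,
        "missing_doi": 0, "labelled": 0, "relevant": 0,
    })
    for rec in records:
        s = stats[rec["subject"]]
        s["total"] += 1
        # A field counts as missing when it is absent, None, or blank.
        for field in ("abstract", "title", "doi"):
            if not (rec.get(field) or "").strip():
                s[f"missing_{field}"] += 1
        if rec.get("label") in (0, 1):
            s["labelled"] += 1
            s["relevant"] += rec["label"]
    for s in stats.values():
        # Percentage of labelled records that are relevant; None if no labels.
        s["pct_relevant"] = (
            100 * s["relevant"] / s["labelled"] if s["labelled"] else None
        )
    return dict(stats)


# Made-up records for illustration only.
sample = [
    {"subject": "anxiety", "title": "A", "abstract": "text", "doi": "", "label": 1},
    {"subject": "anxiety", "title": "B", "abstract": None, "doi": "10.1000/b", "label": 0},
    {"subject": "depression", "title": "C", "abstract": "text", "doi": "10.1000/c", "label": None},
]
summary = describe(sample)
```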

Data for quality check 2 is incomplete

This issue is meant as an extra reminder that the data for quality check 2 (articles which have been incorrectly included) should be updated! Thus far we are working with the preliminary results.

This issue can be resolved when the final data for quality check 2 is available and the master-script is adapted to import this instead of the preliminary results.

Quality checks unclear

While reading your impressive documentation, it remained unclear to me what you did in step 4 of the post-processing, 'Deal with noisy labels corrected in two rounds of quality checks'.
I cannot find a script with a similar name or an explanation.
Could you point me to where it is or, if it does not exist yet, add it to the documentation?
