ADataViewer

Source code and scripts to process the datasets used in the paper: ADataViewer: Exploring Semantically Harmonized Alzheimer’s Disease Cohort Datasets.

Artifacts

DOI

Summary statistics

  • The summary statistics were computed here (quantiles) and here (mean).
    • Four distinct scripts are used to compute the summary statistics.
    • Each script selects a subset of participants based on a diagnostic group, except one that takes all diagnoses into account. For instance, "AD_quantiles_all_datasets-v100.ipynb" computes summary statistics of the selected features for participants who were diagnosed with AD. Note: the datasets are restricted to the baseline visit only.
    • The computed summary statistics are then saved as a CSV file here.
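
The computation described above can be sketched as follows. This is a minimal, dependency-free illustration and not the notebooks' actual code: the record keys "Visit" and "Diagnosis", the coding of the baseline visit as 1, and the function name `summarize_baseline` are all assumptions made for the example.

```python
from statistics import mean, quantiles


def summarize_baseline(records, feature, diagnosis=None):
    """Mean and quartiles of `feature` over baseline visits.

    records   -- list of dicts, one per participant visit (hypothetical layout)
    diagnosis -- restrict to one diagnostic group, or None for all diagnoses
    """
    values = [
        r[feature]
        for r in records
        if r["Visit"] == 1  # assumption: the baseline visit is coded as 1
        and (diagnosis is None or r["Diagnosis"] == diagnosis)
        and r.get(feature) is not None  # skip missing measurements
    ]
    if len(values) < 2:  # quantiles() needs at least two data points
        return None
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {"n": len(values), "mean": mean(values),
            "q1": q1, "median": median, "q3": q3}
```

Calling `summarize_baseline(records, "MMSE", diagnosis="AD")` would correspond to the AD-specific script, and `diagnosis=None` to the script that takes all diagnoses into account.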

From the CSV tables, we produced the following table.

MicrosoftTeams-image (11)

Ethnicity

  • This script produces a table that is used to generate pie charts of the ethnicity distribution of any cohort (shown in the figure below).
  • The table made by this script is saved here.
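
A table of this kind could be built roughly as shown below. This is only a sketch under assumed names: the keys "Cohort" and "Ethnicity" and the function `ethnicity_table` are hypothetical and do not come from the repository.

```python
from collections import Counter


def ethnicity_table(participants):
    """Percentage share of each ethnicity per cohort.

    participants -- list of dicts with hypothetical keys "Cohort" and "Ethnicity"
    Returns {cohort: {ethnicity: percent}}, ready to feed into a pie chart.
    """
    counts = {}
    for p in participants:
        counts.setdefault(p["Cohort"], Counter())[p["Ethnicity"]] += 1

    table = {}
    for cohort, counter in counts.items():
        total = sum(counter.values())
        table[cohort] = {eth: 100 * n / total for eth, n in counter.items()}
    return table
```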

Screenshot from 2021-07-19 19-06-50

Modality

  • To check the available modalities and plot the ranks, we can use this script.

Cohorts_data_availability (2)

Biomarkers

  • To visualize the biomarker distribution, we generated tables that contain measurements of participants for each feature in all datasets
    • The categorical features are Sex, APOE4 and CRD
    • All harmonized numerical features are included
  • There are dedicated scripts for each diagnostic group as well as one script for all diagnoses
  • Each script saves tables in which the columns are named "feature + name of cohort" and the rows contain the measurements of participants
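
The "feature + name of cohort" layout can be illustrated with the short sketch below. The underscore separator, the keys "Cohort" and "Diagnosis", and the function name `biomarker_columns` are assumptions made for the example, not the scripts' actual naming.

```python
def biomarker_columns(records, features, diagnosis=None):
    """Collect measurements into columns keyed "<feature>_<cohort>".

    records   -- list of dicts, one per participant (hypothetical layout)
    features  -- names of the features to export
    diagnosis -- restrict to one diagnostic group, or None for all diagnoses
    """
    columns = {}
    for r in records:
        if diagnosis is not None and r["Diagnosis"] != diagnosis:
            continue
        for feature in features:
            value = r.get(feature)
            if value is not None:  # missing measurements are left out
                key = f"{feature}_{r['Cohort']}"
                columns.setdefault(key, []).append(value)
    return columns
```

Each resulting column holds the values for one box (numerical features) or one stacked bar (categorical features) in the plots.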

The tables are then used to generate boxplots for numerical features and stacked bar plots for categorical features. The figure below is an example of the boxplots.

Screenshot from 2021-07-20 15-08-29

Longitudinal assessment

  • To investigate the number of visits for each study cohort as well as the number of patients in each diagnosis stage, use this script.

all_line_long_plot

Number of conversions from one state to another

  • We can compute the transitions from one diagnostic state to another using this script.

The following table shows the number of patients for each transition.

Screenshot from 2021-07-21 15-52-54
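
One possible way to count such transitions is sketched below. The definition used here (a change of diagnosis between two consecutive visits, with each patient counted once per transition type) is an assumption, as are the input layout, the function name `count_transitions`, and the diagnosis labels other than AD in the usage example.

```python
from collections import Counter, defaultdict


def count_transitions(visits):
    """Number of patients per diagnostic transition.

    visits -- iterable of (patient_id, month, diagnosis) tuples (hypothetical layout)
    Returns a Counter keyed by (from_diagnosis, to_diagnosis).
    """
    by_patient = defaultdict(list)
    for patient_id, month, diagnosis in visits:
        by_patient[patient_id].append((month, diagnosis))

    transitions = Counter()
    for history in by_patient.values():
        history.sort()  # chronological order by month
        seen = set()
        for (_, prev), (_, curr) in zip(history, history[1:]):
            if prev != curr:
                seen.add((prev, curr))
        transitions.update(seen)  # each patient counts once per transition
    return transitions
```

For example, a patient recorded as CU at month 0, MCI at month 12, and AD at month 24 contributes one count to ("CU", "MCI") and one to ("MCI", "AD").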

StudyPicker, Longitudinal Follow-up, Mappings

To support a user-friendly tool for identifying cohort studies that are compatible for use as training and validation datasets, we investigated the feature overlap across our datasets. Additionally, we computed the number of available measurements at each follow-up visit so that the collected measurements can be visualized over the length of the study (script).

Inputs:

  • Harmonized feature space across the cohorts
  • Merged dataset of each cohort

Outputs:

  • One table per modality in which the rows are feature names and the columns are cohorts. A cell contains 1 where the feature was available and 0 where it was not. Note: 0 indicates that the feature was reported in the study but no measurements were collected for any participant of that cohort. The tables are stored here.
  • Similarly, one table that contains all the non-existent features of every cohort dataset (the tables combined into one). This table, called "nonexistence_features.tsv", can be found here.
  • To investigate whether each feature of a cohort dataset was collected for all participants of that cohort or only for a subset, we generated one table per cohort. For instance, ADNI.tsv has one row per participant, identified by a random index, and one column per investigated feature. For each participant, we check in the original dataset (merged table) whether a given feature was recorded at any visit during the study and, if so, store "1" for that participant in the output table.
  • The total number of harmonized cohorts is saved as a distinct TSV file per modality and stored here.
  • Lastly, for each harmonized feature, we count the number of patients with collected measurements at each visit. The results are stored in separate tables for each investigated modality. For instance, the apoe.tsv table contains information about the features related to APOE status. In this table, the rows are cohort names and the columns are harmonized features. Note: for longitudinal cohorts, several rows share the same cohort name as the index, one per time point, and the time point of each row is given in the "Months" column. In other words, the "Months" column indicates whether measurements were collected or skipped at a given time point of the study.
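
Two of the outputs above, the 1/0 availability table and the per-visit measurement counts, can be sketched as follows. This is an illustration under assumptions: the input layout (one dict per participant visit), the function names, and the use of `None` for missing values are hypothetical; only the "Months" column name comes from the description above.

```python
def feature_availability(cohorts, features):
    """1/0 table: was the feature measured for any participant of the cohort?

    cohorts -- {cohort_name: list of visit records (dicts)}
    Returns {feature: {cohort_name: 0 or 1}}.
    """
    return {
        feature: {
            cohort: int(any(r.get(feature) is not None for r in rows))
            for cohort, rows in cohorts.items()
        }
        for feature in features
    }


def measurements_per_visit(rows, features):
    """Count non-missing measurements of each feature at each time point.

    rows -- visit records of one cohort, one per participant and visit,
            each with a "Months" entry
    Returns {months: {feature: count}}.
    """
    counts = {}
    for r in rows:
        month_counts = counts.setdefault(r["Months"], {f: 0 for f in features})
        for feature in features:
            if r.get(feature) is not None:
                month_counts[feature] += 1
    return counts
```

A month in which every count is 0 corresponds to a time point at which the measurements were skipped.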

All the described outputs are then used for the "StudyPicker" as well as "Longitudinal" (i.e. Biomarker-specific Follow-up) tools on the website.

The figure below is an example of "StudyPicker".

case1

The figure below is an example of a longitudinal plot for a set of features in 4 distinct cohorts.

study_picker_result

Contact

Citation

Salimi, Y., Domingo‐Fernández, D., Bobis-Álvarez, C., Hofmann‐Apitius, M., for the Alzheimer's Disease Neuroimaging Initiative, the Japanese Alzheimer’s Disease Neuroimaging Initiative, for the Aging Brain: Vasculature, Ischemia, and Behavior Study, the Alzheimer's Disease Repository Without Borders Investigators, for the European Prevention of Alzheimer’s Disease (EPAD) Consortium, Birkenbihl, C. ADataViewer: Exploring Semantically Harmonized Alzheimer’s Disease Cohort Datasets (2021), medRxiv, 2021.09.01.21262607

License

The code in this package is licensed under the MIT License.
