ADataViewer

Source code and scripts to process the datasets used in the paper: ADataViewer: Exploring Semantically Harmonized Alzheimer’s Disease Cohort Datasets.

Artifacts

Summary statistics

The summary statistics were computed here (quantiles) and here (mean).
- There are 4 distinct scripts that compute the summary statistics.
- Each script selects the participants of one diagnostic group, except one script that takes all diagnoses into account. For instance, "AD_quantiles_all_datasets-v100.ipynb" computes summary statistics of the selected features for participants diagnosed with AD. Note: the datasets are restricted to the baseline visit here.
- The computed summary statistics are then saved as a CSV file here.
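As a rough illustration of what each summary-statistics script computes, the following sketch restricts a merged dataset to the baseline visit and, optionally, to one diagnostic group, then computes quartiles and the mean per feature. It uses only the Python standard library; the row format (dicts as produced by `csv.DictReader`), the column names `months` and `diagnosis`, and the function name `summary_statistics` are hypothetical and not taken from the notebooks.

```python
import statistics
from typing import Dict, Iterable, List, Optional


def summary_statistics(
    rows: Iterable[dict], features: List[str], diagnosis: Optional[str] = None
) -> Dict[str, dict]:
    """Quartiles and mean per feature at the baseline visit.

    Hypothetical columns: "months" (0 = baseline) and "diagnosis".
    Pass diagnosis=None to include all diagnostic groups.
    """
    # Keep only baseline visits, optionally restricted to one diagnostic group.
    baseline = [
        r for r in rows
        if float(r["months"]) == 0 and (diagnosis is None or r["diagnosis"] == diagnosis)
    ]
    stats: Dict[str, dict] = {}
    for feature in features:
        # Ignore missing measurements (empty cells).
        values = [float(r[feature]) for r in baseline if r.get(feature) not in (None, "")]
        if len(values) < 2:
            continue  # statistics.quantiles needs at least two data points
        q1, median, q3 = statistics.quantiles(values, n=4)
        stats[feature] = {
            "n": len(values),
            "q1": q1,
            "median": median,
            "q3": q3,
            "mean": statistics.mean(values),
        }
    return stats
```

Calling it with `diagnosis="AD"` corresponds to an AD-specific script, and `diagnosis=None` to the script covering all diagnoses; the resulting dictionary could then be written to CSV with `csv.DictWriter`.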

The CSV tables were used to build the following table.

Ethnicity

This script creates a table that is used to generate pie charts of the ethnicity distribution of each cohort (shown in the figure below).
The table created by this script is saved here.
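A minimal sketch of such a pie-chart table, assuming one row per visit with the hypothetical columns `cohort`, `participant_id` and `ethnicity`: each participant is counted once, and the counts are converted to percentages per cohort. The function name and the "Unknown" label for missing values are illustrative choices, not the script's actual behavior.

```python
from collections import Counter
from typing import Dict, Iterable


def ethnicity_table(rows: Iterable[dict]) -> Dict[str, Dict[str, float]]:
    """Percentage of participants per ethnicity, per cohort (one pie chart each).

    Hypothetical columns: "cohort", "participant_id", "ethnicity".
    """
    seen = set()
    counts: Dict[str, Counter] = {}
    for r in rows:
        key = (r["cohort"], r["participant_id"])
        if key in seen:
            continue  # count each participant once, not once per visit
        seen.add(key)
        ethnicity = r.get("ethnicity") or "Unknown"
        counts.setdefault(r["cohort"], Counter())[ethnicity] += 1

    table: Dict[str, Dict[str, float]] = {}
    for cohort, counter in counts.items():
        total = sum(counter.values())
        table[cohort] = {eth: 100 * n / total for eth, n in counter.items()}
    return table
```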

Modality

To check which modalities are available and plot their ranks, use this script.

Biomarkers

To visualize the biomarker distributions, we generated tables that contain the measurements of participants for each feature in all datasets:
- The categorical features are Sex, APOE4 and CRD.
- All harmonized numerical features are included.

There are dedicated scripts for each diagnostic group, as well as one script for all diagnoses. Each script saves tables whose columns are "feature + name of cohort" and whose rows contain the measurements of participants.
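The table layout described above can be sketched as follows: measurements are grouped into one column per feature and cohort, and since cohorts differ in size, shorter columns are padded with empty cells when written as TSV. The column names `cohort` and `diagnosis`, the `_` separator in column headers, and both function names are assumptions; the rows are assumed to be already filtered to the visits of interest.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional


def biomarker_columns(
    rows: Iterable[dict], features: List[str], diagnosis: Optional[str] = None
) -> Dict[str, List[str]]:
    """Group measurements into columns named "<feature>_<cohort>".

    Hypothetical columns: "cohort" and "diagnosis"; diagnosis=None keeps all diagnoses.
    """
    columns: Dict[str, List[str]] = {}
    for r in rows:
        if diagnosis is not None and r["diagnosis"] != diagnosis:
            continue
        for feature in features:
            value = r.get(feature)
            if value in (None, ""):
                continue  # no measurement for this participant
            columns.setdefault(f"{feature}_{r['cohort']}", []).append(value)
    return columns


def to_tsv(columns: Dict[str, List[str]]) -> str:
    """Serialize the columns as TSV; shorter columns are padded with empty cells."""
    header = sorted(columns)
    n_rows = max((len(v) for v in columns.values()), default=0)
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for i in range(n_rows):
        writer.writerow([columns[c][i] if i < len(columns[c]) else "" for c in header])
    return buffer.getvalue()
```

Keeping one column per feature and cohort lets a plotting library draw one box (numerical features) or one stacked bar (categorical features) per column.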

The tables are then used to generate boxplots for numerical features and stacked bar plots for categorical features. The figure below shows an example of the boxplots.

Longitudinal assessment

To investigate the number of visits in each study cohort, as well as the number of patients in each diagnostic stage, use this script.
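A minimal sketch of this overview, assuming one row per visit with the hypothetical columns `cohort`, `participant_id`, `months` and `diagnosis`: it counts the distinct visit months per cohort and the number of distinct patients per visit month and diagnosis.

```python
from collections import defaultdict
from typing import Dict, Iterable


def longitudinal_overview(rows: Iterable[dict]) -> Dict[str, dict]:
    """Per cohort: number of distinct visits and patients per (visit month, diagnosis).

    Hypothetical columns: "cohort", "participant_id", "months", "diagnosis".
    """
    visits = defaultdict(set)  # cohort -> distinct visit months
    patients = defaultdict(lambda: defaultdict(set))  # cohort -> (month, dx) -> IDs
    for r in rows:
        month = float(r["months"])
        visits[r["cohort"]].add(month)
        if r.get("diagnosis"):  # visits without a recorded diagnosis are not counted
            patients[r["cohort"]][(month, r["diagnosis"])].add(r["participant_id"])

    return {
        cohort: {
            "n_visits": len(months),
            "patients": {k: len(ids) for k, ids in sorted(patients[cohort].items())},
        }
        for cohort, months in visits.items()
    }
```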

Number of conversions from one state to another

The transitions from one diagnostic state to another can be computed using this script.

The following table illustrates the number of patients for each transition
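One way to obtain such counts is sketched below, assuming one row per visit with the hypothetical columns `cohort`, `participant_id`, `months` and `diagnosis`. Each participant's visits are ordered by month, consecutive diagnoses are compared, and every distinct change of state (e.g. MCI to AD) is counted at most once per patient. Whether the original script counts patients or individual transition events is not stated, so this is an assumption.

```python
from collections import Counter, defaultdict
from typing import Iterable


def count_transitions(rows: Iterable[dict]) -> Counter:
    """Number of patients per (cohort, from_diagnosis, to_diagnosis) transition.

    Hypothetical columns: "cohort", "participant_id", "months", "diagnosis".
    Visits without a diagnosis are skipped.
    """
    # Build each participant's timeline of (visit month, diagnosis) pairs.
    timelines = defaultdict(list)
    for r in rows:
        if r.get("diagnosis"):
            key = (r["cohort"], r["participant_id"])
            timelines[key].append((float(r["months"]), r["diagnosis"]))

    patients_per_transition: Counter = Counter()
    for (cohort, _participant), visits in timelines.items():
        visits.sort()  # chronological order
        # Compare consecutive visits and keep only actual changes of state.
        changes = {
            (before, after)
            for (_, before), (_, after) in zip(visits, visits[1:])
            if before != after
        }
        for before, after in changes:
            patients_per_transition[(cohort, before, after)] += 1
    return patients_per_transition
```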

StudyPicker, Longitudinal Follow-up, Mappings

To provide a user-friendly tool for identifying cohort studies that could be used together as training and validation datasets, we investigated the feature overlap across our datasets. Additionally, we computed the number of available measurements at each follow-up visit to enable visualization of the collected measurements over the study duration (script).

Inputs:

- Harmonized feature space across the cohorts
- Merged dataset of each cohort

Outputs:

- A table per modality where rows are feature names and columns are cohorts; a cell contains 1 if the feature was available and 0 if it was not. Note: 0 indicates that the feature was reported in the study but no measurements were collected for any participant of that cohort. These tables are stored here.
- Similarly, one table that lists all the non-existing features of every cohort dataset (combining the tables into one). This table can be found here, called "nonexistence_features.tsv".
- To investigate whether each feature of a cohort dataset was collected for all participants of that cohort or only for a subset, we generated a table for every cohort. For instance, ADNI.tsv contains a random index for each participant as rows and all investigated features as columns. For each participant, we check the original dataset (merged table) for whether a given feature was recorded at any visit during the study and, if so, store "1" for that participant in the output table.
- The total number of harmonized cohorts is saved as a distinct TSV file per modality and stored here.
- Lastly, for each harmonized feature, we count the number of patients with collected measurements at each visit. The results are stored in separate tables for each investigated modality. For instance, the apoe.tsv table contains information about the features related to APOE status; in this table, cohort names are rows and harmonized features are columns. Note: longitudinal cohorts have multiple rows with the same cohort name as the index, one per time point, and each row stores its time point in the "Months" column. In other words, the "Months" column indicates at which time points of the study measurements were collected and which were skipped.
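The availability tables in the first three outputs can be sketched as below, assuming each merged dataset is a list of visit rows (dicts) with a hypothetical `participant_id` column. `cohort_availability` builds the feature-by-cohort 1/0 matrix, `nonexistent_features` lists the features marked 0 per cohort, and `participant_availability` flags whether each participant has a feature recorded at any visit. For simplicity the sketch keys participants by ID rather than by the random index used in the described output; all names are illustrative.

```python
from typing import Dict, Iterable, List


def has_value(row: dict, feature: str) -> bool:
    """True if the row contains a non-empty measurement for the feature."""
    return row.get(feature) not in (None, "")


def cohort_availability(
    datasets: Dict[str, List[dict]], features: List[str]
) -> Dict[str, Dict[str, int]]:
    """Feature x cohort matrix: 1 if any participant has a measurement, else 0."""
    return {
        feature: {
            cohort: int(any(has_value(r, feature) for r in rows))
            for cohort, rows in datasets.items()
        }
        for feature in features
    }


def nonexistent_features(matrix: Dict[str, Dict[str, int]]) -> Dict[str, List[str]]:
    """For each cohort, the features marked 0 in the availability matrix."""
    cohorts = sorted({c for per_cohort in matrix.values() for c in per_cohort})
    return {
        c: [f for f, per_cohort in matrix.items() if per_cohort[c] == 0]
        for c in cohorts
    }


def participant_availability(
    rows: Iterable[dict], features: List[str]
) -> Dict[str, Dict[str, int]]:
    """Participant x feature matrix: 1 if the feature was recorded at any visit."""
    flags: Dict[str, Dict[str, int]] = {}
    for r in rows:
        entry = flags.setdefault(r["participant_id"], {f: 0 for f in features})
        for f in features:
            if has_value(r, f):
                entry[f] = 1
    return flags
```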

All the described outputs are then used for the "StudyPicker" as well as "Longitudinal" (i.e. Biomarker-specific Follow-up) tools on the website.
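For the per-visit tables behind the Longitudinal tool, a minimal sketch could produce one row per cohort and visit month with a "Months" column and one count per harmonized feature, mirroring the layout described in the outputs list. The input columns `participant_id` and `months` and the function name are hypothetical; visits at which none of the features was measured produce no row.

```python
from collections import defaultdict
from typing import Dict, List


def measurements_per_visit(
    datasets: Dict[str, List[dict]], features: List[str]
) -> List[dict]:
    """One row per (cohort, visit month) with the number of participants measured per feature.

    Hypothetical input columns: "participant_id" and "months".
    """
    # (cohort, month) -> feature -> IDs of participants with a measurement
    measured = defaultdict(lambda: defaultdict(set))
    for cohort, rows in datasets.items():
        for r in rows:
            month = float(r["months"])
            for f in features:
                if r.get(f) not in (None, ""):
                    measured[(cohort, month)][f].add(r["participant_id"])

    table = []
    for (cohort, month), per_feature in sorted(measured.items()):
        row = {"Cohort": cohort, "Months": month}
        row.update({f: len(per_feature.get(f, ())) for f in features})
        table.append(row)
    return table
```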

The figure below is an example of the StudyPicker.

The figure below is an example of a longitudinal plot for a set of features in 4 distinct cohorts.

Contact

Yasamin Salimi: [email protected]
Colin Birkenbihl: [email protected]

Citation

Salimi, Y., Domingo‐Fernández, D., Bobis-Álvarez, C., Hofmann‐Apitius, M., for the Alzheimer's Disease Neuroimaging Initiative, the Japanese Alzheimer’s Disease Neuroimaging Initiative, for the Aging Brain: Vasculature, Ischemia, and Behavior Study, the Alzheimer's Disease Repository Without Borders Investigators, for the European Prevention of Alzheimer’s Disease (EPAD) Consortium, Birkenbihl, C. ADataViewer: Exploring Semantically Harmonized Alzheimer’s Disease Cohort Datasets (2021), medRxiv, 2021.09.01.21262607

License

The code in this package is licensed under the MIT License.
