The deep_learning_metabolomics from fadhlyemen

Deep Learning Accurately Predicts Estrogen Receptor Status in Breast Cancer Metabolomics Data Fadhl M. Alakwaa, Kumardeep Chaudhary, and Lana X. Garmire Journal of Proteome Research 2018 17 (1), 337-347 DOI: 10.1021/acs.jproteome.7b00595

https://pubs.acs.org/action/showCitFormats?doi=10.1021%2Facs.jproteome.7b00595

Problems and objectives

1 in 8 U.S. women (about 12.4%) will develop invasive breast cancer
in 2018, 266,120 diagnosed new cases of invasive breast cancer
Breast cancer has more than one subtypes. We are interested in two subtypes, ER+ and ER-
Diagnosis and further treatment is depend on these subtypes
Classification of Breast cancer patients based on their metabolomics profile
Assess the predictive accuracy of the Deep Learning (DL) to predict estrogen receptor (ER) status from the metabolomics data.

preprocessing:to remove sample to sample variation

pretraining:This step improves the model performance, avoids random initialization of the weights, and selects the best model architecture

Figure 2. (A) Average AUC on 10 hold out test sets of the DL framework against six machine learning algorithms for prediction of ER status from metabolomics data: recursive partitioning and regression trees (RPART) (0.83), linear discriminant analysis (LDA) (0.74), support vector machine (SVM) (0.89), deep learning (DL) (0.93), random forest (RF) (0.89), generalized boosted models (GBM) (0.89), and prediction analysis for microarrays (PAM) (0.88). The above algorithms were run 10 times on different train/test splits. We used pairwise Wilcoxon signed-rank test to estimate the statistical significance of the difference in performance between DL and other methods (∗∗ p < 0.01, ∗ p < 0.1). (B) Bipartite graph of the top 20 important metabolites extracted from DL model and other machine learning algorithms. Large nodes represent the models and small nodes are metabolites. A connection between metabolite and the model means this metabolite is one of the top 20 high importance metabolites extracted by this model.

So, what is the best model? "ALL of them are wrong some of them are useful" George Box

Figure 3. Biological relevance of the DL hidden layers. (A) Activation levels of the high variance nodes extracted from the layer 1 of the DL model. Columns are samples and rows are the top 12 nodes with high variance >0.1. (B) Bipartite graph of enriched significant metabolomics pathways and top hidden nodes. The nodes represent enriched pathways common to all top 12 nodes (green color) in the first hidden layer of DL in KEGG pathway enrichment analysis (FDR< 0.05).

Figure 4. Joint pathway analysis between the top 20 DL metabolites and the highly differentiated enzymes. Only significant pathways with at least five overlapping metabolites are shown. X-axis shows the number of overlapped metabolites with the number of genes (number in parentheses) involved in the same pathway, y-axis shows the adjusted joint P-value calculated from IMPALA tool.(42) The size of the nodes represents the size of metabolomic pathway (number of metabolites involved in that pathway). The color of the nodes represents the database source of these pathways.

Figure 5. Circos plot of Spearman’s correlation values between top 20 DL metabolites and highly differentiated enzymes with cutoff = |0.35|.

fadhlyemen / deep_learning_metabolomics Goto Github PK

deep_learning_metabolomics's Introduction

deep_learning_metabolomics's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent