Cancer remains a significant global health challenge, and early detection and accurate classification are critical for effective treatment. In this repository, we develop an approach to detect the three most common subtypes of renal cancer (chromophobe renal cell carcinoma, clear cell renal cell carcinoma, and papillary renal cell carcinoma) by applying machine learning algorithms to RNA-seq gene expression data.
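Overall predictive performance (%) of the machine learning methods with ANOVA feature selection. Accuracy, precision, recall, specificity, and F1 score match the unweighted (macro) average of the per-class values listed further below.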
Model | Accuracy | Precision | Recall | Specificity | F1 Score | AUC |
---|---|---|---|---|---|---|
Logistic Regression | 94.81 | 93.70 | 82.53 | 94.63 | 86.58 | 97.78 |
Naive Bayes | 95.56 | 87.69 | 90.27 | 95.89 | 88.83 | 98.06 |
K-Nearest Neighbors | 97.41 | 96.75 | 88.53 | 97.43 | 91.68 | 92.98 |
Support Vector Machine | 97.04 | 96.74 | 85.63 | 96.69 | 89.64 | 98.23 |
Decision Tree | 95.56 | 86.81 | 93.21 | 96.44 | 89.51 | 96.82 |
Random Forest Classifier | 97.04 | 89.98 | 93.12 | 98.09 | 91.42 | 99.14 |
Extra Trees Classifier | 96.30 | 87.88 | 92.20 | 97.38 | 89.84 | 98.49 |
Bagging Classifier | 95.93 | 92.12 | 84.42 | 95.72 | 87.43 | 99.16 |
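
The results above rely on ANOVA feature selection, which ranks each gene by a one-way ANOVA F statistic across the three subtypes and keeps the top-scoring genes. The sketch below shows that idea in plain Python on a tiny made-up expression matrix. The function names, the toy data, and the choice of `k` are assumptions for illustration, not code taken from this repository; in practice a library routine such as scikit-learn's `SelectKBest` with `f_classif` would typically be used instead.

```python
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = mean(values)
    # Between-group sum of squares: class means versus the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Within-group sum of squares: samples versus their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_k(X, y, k):
    """Return (indices of the k best features, all F scores)."""
    n_features = len(X[0])
    scores = [anova_f_score([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: 6 samples x 3 genes (not real RNA-seq values).
X = [
    [1.0, 5.0, 2.0],
    [1.2, 4.0, 8.0],
    [5.0, 5.5, 3.0],
    [5.1, 4.5, 7.0],
    [9.0, 5.2, 2.5],
    [9.2, 4.8, 7.5],
]
y = ["KICH", "KICH", "KIRC", "KIRC", "KIRP", "KIRP"]

top, scores = select_top_k(X, y, k=2)
print(top)  # gene 0 separates the classes best, gene 2 not at all -> [0, 1]
```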
Note: KICH stands for chromophobe renal cell carcinoma, KIRC stands for clear cell renal cell carcinoma, and KIRP stands for papillary renal cell carcinoma.

Per-class performance metrics (%) of each classifier.

Classifier | Class | Accuracy | Precision | Recall | Specificity | F1 Score |
---|---|---|---|---|---|---|
Logistic Regression | KICH | 97.22 | 100.00 | 61.54 | 100.00 | 76.19 |
Logistic Regression | KIRC | 94.44 | 94.07 | 97.37 | 89.39 | 95.69 |
Logistic Regression | KIRP | 92.78 | 87.04 | 88.68 | 94.49 | 87.85 |
Naive Bayes | KICH | 96.67 | 73.33 | 84.62 | 97.60 | 78.57 |
Naive Bayes | KIRC | 94.44 | 95.61 | 95.61 | 92.42 | 95.61 |
Naive Bayes | KIRP | 95.56 | 94.12 | 90.57 | 97.64 | 92.31 |
K-Nearest Neighbors | KICH | 97.78 | 100.00 | 69.23 | 100.00 | 81.82 |
K-Nearest Neighbors | KIRC | 97.22 | 97.39 | 98.25 | 95.45 | 97.82 |
K-Nearest Neighbors | KIRP | 97.22 | 92.86 | 98.11 | 96.85 | 95.41 |
Support Vector Machine | KICH | 97.22 | 100.00 | 61.54 | 100.00 | 76.19 |
Support Vector Machine | KIRC | 96.67 | 95.76 | 99.12 | 92.42 | 97.41 |
Support Vector Machine | KIRP | 97.22 | 94.44 | 96.23 | 97.64 | 95.33 |
Decision Tree | KICH | 96.67 | 70.59 | 92.31 | 97.01 | 80.00 |
Decision Tree | KIRC | 93.89 | 97.25 | 92.98 | 95.45 | 95.07 |
Decision Tree | KIRP | 96.11 | 92.59 | 94.34 | 96.85 | 93.46 |
Random Forest Classifier | KICH | 97.22 | 78.57 | 84.62 | 98.20 | 81.48 |
Random Forest Classifier | KIRC | 96.67 | 100.00 | 94.74 | 100.00 | 97.30 |
Random Forest Classifier | KIRP | 97.22 | 91.38 | 100.00 | 96.06 | 95.50 |
Extra Trees Classifier | KICH | 96.67 | 73.33 | 84.62 | 97.60 | 78.57 |
Extra Trees Classifier | KIRC | 95.56 | 99.07 | 93.86 | 98.48 | 96.40 |
Extra Trees Classifier | KIRP | 96.67 | 91.23 | 98.11 | 96.06 | 94.55 |
Bagging Classifier | KICH | 96.67 | 88.89 | 61.54 | 99.40 | 72.73 |
Bagging Classifier | KIRC | 95.00 | 94.87 | 97.37 | 90.91 | 96.10 |
Bagging Classifier | KIRP | 96.11 | 92.59 | 94.34 | 96.85 | 93.46 |
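
Each per-class row can be read as a one-vs-rest evaluation: the class in question is treated as positive and the other two as negative. The sketch below shows how such values follow from a multiclass confusion matrix. The 180-sample matrix is an assumption, reconstructed here so that it reproduces the Logistic Regression rows; it is not output taken from the repository, and the function names are hypothetical.

```python
def per_class_metrics(confusion, labels):
    """One-vs-rest metrics, in percent, for every class.

    confusion[i][j] counts samples whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in confusion)
    results = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp                 # missed samples of this class
        fp = sum(row[i] for row in confusion) - tp  # other classes predicted as this one
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        values = {
            "accuracy": (tp + tn) / total,
            "precision": precision,
            "recall": recall,
            "specificity": specificity,
            "f1": f1,
        }
        results[label] = {name: round(100 * v, 2) for name, v in values.items()}
    return results


def macro_average(results):
    """Unweighted mean of each metric over the classes."""
    names = next(iter(results.values())).keys()
    return {
        name: round(sum(r[name] for r in results.values()) / len(results), 2)
        for name in names
    }


LABELS = ["KICH", "KIRC", "KIRP"]
# Assumed confusion matrix (rows: true class, columns: predicted class).
CONFUSION = [
    [8, 1, 4],
    [0, 111, 3],
    [0, 6, 47],
]

per_class = per_class_metrics(CONFUSION, LABELS)
overall = macro_average(per_class)
for label in LABELS:
    print(label, per_class[label])
print("macro", overall)
```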
Fig. 1 | Receiver operating characteristic (ROC) curve of the Random Forest classifier.
Fig. 2 | Receiver operating characteristic (ROC) curve of the Support Vector Machine classifier.
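
For a three-class problem, ROC curves are usually drawn one-vs-rest: for each subtype, the predicted probability of that subtype is used as the score and the other two subtypes are treated as negatives. How the figures in this repository were produced is not shown here, so the following is only a minimal sketch of that approach in plain Python. The labels and probabilities are made up, and the function names are hypothetical.

```python
def roc_curve_points(y_true, scores):
    """ROC points (fpr, tpr) for binary labels, sweeping the threshold downward."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    prev = None
    for i in order:
        # Emit a point only when the score changes, so ties share one threshold.
        if prev is not None and scores[i] != prev:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
        if y_true[i]:
            tp += 1
        else:
            fp += 1
        prev = scores[i]
    fpr.append(fp / neg)
    tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
        for i in range(1, len(fpr))
    )


def one_vs_rest_auc(labels, probabilities, positive_class):
    """AUC for one class against all others."""
    y_true = [1 if label == positive_class else 0 for label in labels]
    scores = [p[positive_class] for p in probabilities]
    return auc(*roc_curve_points(y_true, scores))


# Hypothetical true labels and predicted class probabilities for 6 samples.
labels = ["KICH", "KIRC", "KIRP", "KIRC", "KICH", "KIRP"]
probabilities = [
    {"KICH": 0.70, "KIRC": 0.20, "KIRP": 0.10},
    {"KICH": 0.10, "KIRC": 0.80, "KIRP": 0.10},
    {"KICH": 0.20, "KIRC": 0.20, "KIRP": 0.60},
    {"KICH": 0.30, "KIRC": 0.50, "KIRP": 0.20},
    {"KICH": 0.25, "KIRC": 0.35, "KIRP": 0.40},
    {"KICH": 0.05, "KIRC": 0.15, "KIRP": 0.80},
]

aucs = {c: one_vs_rest_auc(labels, probabilities, c) for c in ("KICH", "KIRC", "KIRP")}
print(aucs)
```

Plotting the `(fpr, tpr)` pairs returned by `roc_curve_points` for each class gives curves of the kind shown in the figures; averaging the three AUC values gives a single macro-averaged score.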