znreza / machine_learning_on_prostate_cancer


Applying different machine learning algorithms on the TCGA Prostate Cancer Gene Dataset for Feature Selection, Dimensionality Reduction, Classification and Regression



Machine_Learning_on_Prostate_Cancer


In this project, the TCGA Prostate Cancer Gene Expression dataset and the TCGA Clinical dataset are used to apply different machine learning techniques for selecting the biomarkers (genes), out of about 60K genes, that are relevant for predicting a patient's Gleason score, T-stage and tumor recurrence. Different machine learning algorithms have been implemented for feature selection, dimensionality reduction and target prediction. The overall model accuracy is around 96% for each of the three targets.

Merge_dataset.py: This script merges the required columns from the prad_tcga_genes.xls and prad_tcga_clinical_data.xls datasets. For example, for gleason_score prediction, the script generates a new dataset in which the gene columns from prad_tcga_genes.xls and the gleason_score column from prad_tcga_clinical_data.xls are joined for each patient. The Predict_t_stage.csv and Tumor_Recurrence.csv datasets can be made the same way. More detail is available in the script.
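
The merge step can be sketched roughly as follows. This is a minimal illustration with pandas, not the script itself: the two small tables stand in for prad_tcga_genes.xls and prad_tcga_clinical_data.xls, and the one-row-per-patient layout, the `patient_id` key, the gene column names and the helper `merge_target` are all assumptions made for the example.

```python
import pandas as pd

# Toy stand-ins for the two spreadsheets (the real files would be read
# with pd.read_excel). Layout and column names are assumptions.
genes = pd.DataFrame({
    "patient_id": ["P1", "P2", "P3"],
    "GENE_A": [0.5, 1.2, 0.0],
    "GENE_B": [3.1, 0.4, 2.2],
})
clinical = pd.DataFrame({
    "patient_id": ["P1", "P2", "P4"],
    "gleason_score": [7, 9, 6],
    "age": [61, 58, 70],
})


def merge_target(genes, clinical, target, key="patient_id"):
    """Attach one clinical target column to the gene table, patient by patient."""
    # Inner join: keep only patients present in both tables.
    merged = genes.merge(clinical[[key, target]], on=key, how="inner")
    # Rows without a recorded target cannot be used for supervised learning.
    return merged.dropna(subset=[target])


gleason = merge_target(genes, clinical, "gleason_score")
print(gleason)
```

Calling `merge_target` with a different target column would give the per-target tables for T-stage or tumor recurrence in the same way.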

Gleason_score.py: Predicts gleason_score based on genes using PCA, LDA and a Random Forest classifier. The model is evaluated with 10-fold cross-validation, and the results are plotted using matplotlib. A confusion matrix is generated to check the number of TP, TN, FP and FN for each of the five classes. More detail is available in the script.
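
One way such a PCA, LDA and Random Forest workflow could be wired together with scikit-learn is sketched below. The data is synthetic (`make_classification`), and the ordering PCA then LDA then Random Forest, the component counts and the number of trees are assumptions chosen for illustration; the script may arrange these steps differently.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the merged table: 200 "patients", 500 "genes",
# 5 classes (mirroring the five Gleason classes mentioned above).
X, y = make_classification(
    n_samples=200, n_features=500, n_informative=20,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)

# Assumed ordering: unsupervised reduction, supervised reduction, classifier.
model = make_pipeline(
    PCA(n_components=30),
    LinearDiscriminantAnalysis(n_components=4),  # at most n_classes - 1
    RandomForestClassifier(n_estimators=100, random_state=0),
)

# 10-fold cross-validation; each sample is predicted by a model that
# never saw it during fitting.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
y_pred = cross_val_predict(model, X, y, cv=cv)

cm = confusion_matrix(y, y_pred)  # rows: true class, columns: predicted class
print("cross-validated accuracy:", accuracy_score(y, y_pred))
print(cm)
```

Putting the reduction steps inside the pipeline means PCA and LDA are refit on the training part of every fold, so the held-out fold does not leak into the projection.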

Predict_t_stage.py: Predicts t_stage based on genes using PCA, LDA and a Random Forest classifier. The model is evaluated with 10-fold cross-validation, and the results are plotted using matplotlib. A confusion matrix is generated to check the number of TP, TN, FP and FN for each of the stages. More detail is available in the script.

TumorRecurrence.py: Predicts whether a tumor will come back or not (1 or 0) based on genes using PCA, LDA and a Random Forest classifier. The model is evaluated with 10-fold cross-validation, and the results are plotted using matplotlib. A confusion matrix is generated to check the number of TP, TN, FP and FN for the positive and negative classes. More detail is available in the script.
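
For the binary recurrence target, the four confusion-matrix cells can be tallied directly. The sketch below uses plain Python and made-up labels; the helper name `binary_confusion_counts` and the example predictions are hypothetical and only show what each cell counts.

```python
def binary_confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for a two-class problem."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # predicted recurrence, tumor did recur
            else:
                fp += 1  # predicted recurrence, tumor did not recur
        else:
            if truth == positive:
                fn += 1  # missed recurrence
            else:
                tn += 1  # correctly predicted no recurrence
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Made-up labels: 1 = tumor recurred, 0 = no recurrence.
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]

counts = binary_confusion_counts(y_true, y_pred)
sensitivity = counts["TP"] / (counts["TP"] + counts["FN"])
specificity = counts["TN"] / (counts["TN"] + counts["FP"])
print(counts, sensitivity, specificity)
```

With 0/1 labels, scikit-learn's `confusion_matrix(y_true, y_pred).ravel()` returns the same four counts in the order TN, FP, FN, TP. Reporting sensitivity and specificity next to accuracy is useful when one class is much rarer than the other.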

FS_with_random_forest.py: Uses a Random Forest classifier to select the top k features based on feature importance ranking. More detail is available in the script.
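
A minimal sketch of importance-based selection with scikit-learn follows. The synthetic data, the value of k, the number of trees and the helper `top_k_by_importance` are assumptions for illustration, not values taken from the script.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: 300 samples, 50 features, 5 of them informative.
X, y = make_classification(
    n_samples=300, n_features=50, n_informative=5,
    n_redundant=0, random_state=0,
)


def top_k_by_importance(X, y, k, random_state=0):
    """Return indices and scores of the k features the forest ranks highest."""
    forest = RandomForestClassifier(n_estimators=200, random_state=random_state)
    forest.fit(X, y)
    # Impurity-based importances; sort in descending order.
    order = np.argsort(forest.feature_importances_)[::-1][:k]
    return order, forest.feature_importances_[order]


top_idx, top_scores = top_k_by_importance(X, y, k=5)
print(top_idx, top_scores)

# Keep only the selected columns for downstream models.
X_selected = X[:, top_idx]
```

Impurity-based importances can vary between runs and random seeds, so fixing `random_state` (as done here) keeps the ranking reproducible.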

Information_Gain.py: Uses the Information Gain algorithm to select the top k features based on each feature's IG score. More detail is available in the script.
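
The idea behind an IG score can be sketched in plain Python as the drop in class entropy after splitting on a feature. Gene expression values are continuous, so this sketch binarises each feature at its median; that binarisation, the toy values and the function names are assumptions made for the example and may differ from what the script does.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """IG = H(labels) - H(labels | feature), feature split at its median."""
    median = sorted(values)[len(values) // 2]
    groups = {True: [], False: []}
    for value, label in zip(values, labels):
        groups[value >= median].append(label)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values() if g)
    return entropy(labels) - conditional


def top_k_by_information_gain(columns, labels, k):
    """Rank named feature columns by IG and return the k best names."""
    scores = {name: information_gain(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data: GENE_A separates the classes, GENE_B does not, GENE_C partly.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "GENE_A": [0.1, 0.2, 0.3, 0.4, 1.1, 1.2, 1.3, 1.4],
    "GENE_B": [0.5, 1.5, 0.6, 1.6, 0.7, 1.7, 0.8, 1.8],
    "GENE_C": [0.1, 0.2, 0.3, 1.0, 0.4, 1.1, 1.2, 1.3],
}
best = top_k_by_information_gain(columns, labels, k=2)
print(best)
```

For continuous features without manual binning, scikit-learn's `mutual_info_classif` estimates a closely related quantity.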

LowVariance.py: Checks for features (columns) whose values do not change significantly across the samples in the input dataset. These low-variance features contribute little to predicting the target, so they can usually be discarded without hampering the accuracy.
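
A low-variance filter of this kind can be sketched with scikit-learn's `VarianceThreshold`. The toy matrix and the threshold of 0.01 are assumptions; a sensible threshold depends on how the expression values are scaled.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Toy matrix: rows are samples, columns are features.
# Column 0 is constant, column 1 barely changes, column 2 varies a lot.
X = np.array([
    [1.0, 0.0, 5.0],
    [1.0, 0.1, 3.0],
    [1.0, 0.0, 8.0],
    [1.0, 0.1, 1.0],
])

# Features whose variance is not above the threshold are dropped.
selector = VarianceThreshold(threshold=0.01)
X_reduced = selector.fit_transform(X)

kept = selector.get_support()  # boolean mask over the original columns
print("variances:", selector.variances_)
print("kept columns:", np.flatnonzero(kept))
```

Because variance depends on the unit and scale of each column, it is worth checking the per-feature variances before fixing a threshold.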
