CancerImageDetection

An attempt to detect cancer in publicly available CT scans using both traditional statistical learning methods and deep learning.

Background

This project is a classification task on the LIDC-IDRI database of lung nodules in CT scans (link here). The database contains slices of the CT scans as well as radiologist annotations of each nodule found in the scans. A nodule is an abnormal growth of cells in the body; if it is malignant, it is cancerous.

Goal

To classify each nodule in the dataset and thereby identify instances of cancer among the CT scans.

Methods

We used two methods to find cancer among the nodules:

  1. Multinomial statistical learning models trained on each set of annotations describing a nodule. The annotations consist of markings made on the images by radiologists, describing each nodule's structure, shape, texture, and other features that help determine whether cancer is present.

  2. Deep learning methods trained on the images themselves.
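As one illustration of a multinomial model, multinomial logistic regression computes one linear score per class from the annotation features and converts the scores into class probabilities with the softmax function. The sketch below assumes the coefficients have already been estimated (in R this could be done with, for example, `nnet::multinom`); the weights in the usage note are made-up numbers, not fitted values from this project.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    largest = max(scores)  # subtract the max so math.exp cannot overflow
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(x, coefs, intercepts):
    """Multinomial logistic regression: one linear score per class, then softmax.

    x: feature vector for one nodule (hypothetical annotation features).
    coefs[k]: weights for class k; intercepts[k]: bias for class k.
    """
    scores = [b + sum(w * xi for w, xi in zip(weights, x))
              for weights, b in zip(coefs, intercepts)]
    return softmax(scores)


def predict_class(x, coefs, intercepts):
    """Return the index of the most probable class."""
    probs = predict_proba(x, coefs, intercepts)
    return max(range(len(probs)), key=probs.__getitem__)
```

For example, `predict_proba([1.0], [[0.0], [math.log(3)]], [0.0, 0.0])` gives probabilities of 0.25 and 0.75. With two classes and the first class's weights fixed at zero, the second class's probability reduces to the ordinary logistic (sigmoid) function, which is why binary logistic regression is a special case of this model.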

Relevant files for statistical methods used:

EDA:

01_annotation_EDA.Rmd: Explores the relationships between the original nodule characteristic predictors (and their summary statistics) and the response variable. Also explores the distributions of the annotations broken down by radiologist (not used in the final report).
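Because several radiologists may annotate the same nodule, one way to obtain per-nodule summary statistics is to group the annotation rows by nodule and summarize each characteristic across readers. The sketch below is hypothetical and may not match how the repository's notebooks do it: the column names `nodule_id`, `spiculation`, and `texture`, and the choice of mean and median as the summaries, are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_reads(rows, features):
    """Collapse one-row-per-radiologist annotations into one row per nodule.

    rows: list of dicts, each with a 'nodule_id' key (hypothetical name)
          plus one numeric rating per feature.
    Returns {nodule_id: {'<feature>_mean': ..., '<feature>_median': ..., 'n_reads': ...}}.
    """
    reads_by_nodule = defaultdict(list)
    for row in rows:
        reads_by_nodule[row["nodule_id"]].append(row)

    summary = {}
    for nodule_id, reads in reads_by_nodule.items():
        stats = {"n_reads": len(reads)}  # how many radiologists rated this nodule
        for feature in features:
            values = [read[feature] for read in reads]
            stats[f"{feature}_mean"] = mean(values)
            stats[f"{feature}_median"] = median(values)
        summary[nodule_id] = stats
    return summary
```

Keeping `n_reads` alongside the summaries makes it easy to see which nodules were rated by only one radiologist, whose summary statistics rest on a single opinion.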

Modeling:

02_RandomForestModeling.Rmd: Code for the random forest model, including the use of mean decrease in Gini impurity as a variable selection method.

03_LogisticRegressionModeling.Rmd: Code for the logistic regression models.
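The variable selection in the random forest notebook relies on mean decrease in Gini impurity: for each variable, add up the impurity reduction of every split made on that variable across the forest, then average over the trees. The sketch below illustrates the idea on hand-written toy splits rather than a fitted forest. Weighting each node's decrease by its sample count is an assumption; the R `randomForest` package's internal weighting may differ in detail, and the variable names are hypothetical.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_decrease(parent, left, right):
    """Impurity decrease of one split, weighted by node sizes (assumed weighting)."""
    return (len(parent) * gini(parent)
            - len(left) * gini(left)
            - len(right) * gini(right))


def mean_decrease_gini(splits, n_trees):
    """splits: (variable, parent_labels, left_labels, right_labels) for every
    internal node in the forest. Returns {variable: total decrease / n_trees}."""
    totals = defaultdict(float)
    for variable, parent, left, right in splits:
        totals[variable] += split_decrease(parent, left, right)
    return {variable: total / n_trees for variable, total in totals.items()}


def select_top_k(importance, k):
    """Keep the k variables with the largest mean decrease in Gini."""
    return sorted(importance, key=importance.get, reverse=True)[:k]
```

For instance, with two toy trees where `spiculation` makes two perfect splits (decreases 2.0 and 1.0) and `texture` makes one (decrease 4/3), the importances are 1.5 and about 0.667, so `select_top_k(..., 1)` keeps `spiculation`.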

To see how the CNN is fit and what its results are, run the relevant Google Colab notebook at this link (training takes about 10 to 15 minutes): https://colab.research.google.com/drive/1ZXjOXir2pCCJrp7tAIL_8tbW-2jt2OLw?usp=sharing

Final report

The informal final report is available (link here).

Note

All unnumbered files not listed above were used to create the datasets: both the annotation CSVs and the tensor data used to train the vision model. Running them requires the original dataset, which consists of 125 GB of CT scan images and XML files; we did not include that data because of its size. Please refer to the Methods section of the report for a description of how the datasets used in this project were created.

Contributors

anamhira47, alexr626, kenmawer
