Giter Site home page Giter Site logo

2021_group07_final_project's Introduction

Identification of important genes in leukemia patients

Data set

The data set used for this project is a Leukemia data set called Golub (1999) found in John Ramey’s datamicroarray Github repository. The data can be found in the following link: https://github.com/ramhiser/datamicroarray/blob/master/data/golub.RData. In this data set, there are a total of 72 leukemia patients that can be divided into 2 classes based on the type of leukemia; acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML). 47 of the patients are diagnosed with ALL and 25 patients are diagnosed with AML. All patients were assayed for 7129 gene expression levels using the Affymetrix Hgu6800 chips. Each gene is an attribute in the data set making the dimensions 72x7129.

The data set has no missing values.

Motivation

Using the Golub (1999) data, we want to investigate the different genes expressed in leukemia patients. We also want to take a look at which genes are upregulated in leukemia patients and whether the same genes are involved in the different leukemia types. By statistical evaluation, we will examine whether there is a significant difference in gene expression of key genes between ALL and AML.

Code style

The language used in this project is R and the analysis is carried out using functions from the Tidyverse-package.

Content

In the repository of this project you can find a 'data', 'R', 'results' and 'doc'-folder containing the materials, methods and results used and found from this project.

We have in the 'doc'-folder prepared a presentation of the project in an ioslide format.

Analysis procedure

In the 'R'-folder you find the scripts used to carry out the analysis. To begin with, tidying and data wrangling is achieved. We decided based on the large range of expression levels to standardize the gene expression by subtracting the mean and dividing by the standard deviation. After data transformation, modeling in form of logistic regression, PCA and K-means clustering was attempted. PCA and K-means were however not appropriate for this data set. We have furthermore found the most significant genes chosen by lowest p-value. We decided to go with the top 1% equal to 71 genes. To visualize our data, we have created plots of these most significant genes by making bar charts, heatmap and boxplots.

The entire project analysis can be carried out by running the 00_doit.R script.

Dead-ends

Through this project we carried out the unsupervised learning techniques; principal component analysis (PCA) and K-means clustering analysis. These methods were however not suitable for this data set.

Authors

The people that contributed to this project are Charlotte Würtzen s174564, Emma Ahrensbach Rørbeck s173733, Julie Maria Johansen s174595, Simone Majken Stegenborg-Grathwohl s174596.

2021_group07_final_project's People

Contributors

emmaror avatar juliemariaj avatar simone-majken avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.