
drivergene's Introduction

Improving existing analysis pipeline to identify and analyze cancer driver genes using multi-omics data

User Manual

1. Introduction


This repository contains the source code and the original and preprocessed datasets for the paper "Improving existing analysis pipeline to identify and analyze cancer driver genes using multi-omics data". The pipeline systematically integrates state-of-the-art computational tools to identify and characterize driver genes and then to predict their specific subgroups, which establishes a basis for understanding and further validating driver genes. The work is divided into two parts, (I) IDENTIFICATION and (II) ANALYSIS, with the following tasks:

(I) IDENTIFICATION

  • Omics data are used as input to driver gene identification tools, which identify the driver genes

(II) ANALYSIS

  • The four most common analyses are performed on the identified driver genes

The workflow is also applied to the identification and analysis of driver genes for breast cancer patients. All statistical analyses were performed using the R statistical software.

All data can be downloaded here. The data and the source code must be placed in the same folder.
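As a quick sanity check before running anything, the sketch below lists which expected files are missing from a folder. It is written in Python purely for illustration (the repository's own scripts are R). The helper `missing_files` is hypothetical, the three script names are taken from Step 3 of this manual, and a flat folder layout is an assumption, since the actual repository may organize scripts differently.

```python
from pathlib import Path

# Script names as listed in Step 3 of this manual.
# Assumption: scripts and data sit directly in one flat folder.
EXPECTED_FILES = [
    "1.Clinical-preprocess.R",
    "2.Clinical-SA.R",
    "3.Clinical-corr.R",
]


def missing_files(folder, expected=EXPECTED_FILES):
    """Return the names in `expected` that are not regular files in `folder`."""
    root = Path(folder)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # Check the current working directory and report anything absent.
    for name in missing_files("."):
        print("missing:", name)
```

Data file names can be appended to the `expected` list once they are known.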

2. Pipeline


Figure 1. Improved analysis pipeline for the identification and analysis of driver genes. The scheme comprises two stages, identification and analysis: the former uses OncodriveFML and OncodriveCLUSTL to identify driver genes with somatic mutation data as input, and the latter performs the four most common analyses on those driver genes. Abbreviation: CNV, copy number variation.

Here, we use the METABRIC breast cancer (BRCA) dataset as an example to demonstrate the general usability of our workflow to (1) identify and (2) analyze the driver genes.

3. Implementation


Step 1: Identification of driver genes using the two tools OncodriveCLUSTL and OncodriveFML.
Step 2: Enrichment analysis using the tool g:Profiler
Step 3: Individual gene-clinical feature association analysis

  • Pre-processing gene expression data & clinical data: "1.Clinical-preprocess.R"
  • Association between individual driver genes and survival rates: "2.Clinical-SA.R"
  • Identify which driver gene is significantly correlated with which clinical feature: "3.Clinical-corr.R"
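To illustrate the idea behind the gene-clinical feature correlation step, here is a minimal Python sketch (the repository's scripts are R, and "3.Clinical-corr.R" may use a different statistic or test). It computes the Pearson correlation between each gene's expression and one numeric clinical feature, then ranks genes by absolute correlation. The gene names, the values, and the helpers `pearson` and `rank_genes_by_correlation` are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def rank_genes_by_correlation(expression, feature):
    """Rank genes by |r| against one clinical feature, strongest first.

    expression: dict mapping gene name -> per-patient expression values
    feature:    per-patient values of a numeric clinical feature
    """
    scores = {gene: pearson(values, feature) for gene, values in expression.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Toy data for five patients (made up for illustration only).
    expression = {
        "GENE_A": [1, 2, 3, 4, 5],
        "GENE_B": [5, 3, 4, 1, 2],
    }
    tumor_size = [10, 20, 30, 40, 50]
    for gene, r in rank_genes_by_correlation(expression, tumor_size):
        print(gene, round(r, 3))
```

A real analysis would also attach a p-value to each coefficient and correct for multiple testing across genes and features; that part is omitted here.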

Step 4: Co-expressed module-clinical feature association analysis using WGCNA

  • Weighted co-expression driver gene network construction: "1.WGCA-STEPbySTEP.R"
  • Detect associations of functional modules to clinical features, identification of significant genes and hub genes in each of those modules: "2.Association-significantgenes-and-hubgenes.R"
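The core idea of a weighted co-expression network can be sketched in a few lines. The Python example below (illustrative only; the repository uses the WGCNA R package) builds an unsigned soft-threshold adjacency, a_ij = |cor(x_i, x_j)|^beta, sums each gene's adjacencies into a connectivity score, and reports the most connected gene as a hub candidate. The value `beta=6`, the gene names, the toy data, and all helper functions are assumptions; WGCNA additionally performs module detection and other steps not shown here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjacency(expression, beta=6):
    """Unsigned soft-threshold adjacency: a_ij = |cor(x_i, x_j)| ** beta."""
    genes = list(expression)
    adj = {gene: {} for gene in genes}
    for i, gi in enumerate(genes):
        for gj in genes[i + 1:]:
            a = abs(pearson(expression[gi], expression[gj])) ** beta
            adj[gi][gj] = a
            adj[gj][gi] = a  # the network is undirected
    return adj


def connectivity(adj):
    """Sum of each gene's adjacencies to all other genes."""
    return {gene: sum(neighbors.values()) for gene, neighbors in adj.items()}


def hub_gene(expression, beta=6):
    """Gene with the highest connectivity (ties resolved by insertion order)."""
    k = connectivity(adjacency(expression, beta))
    return max(k, key=k.get)


if __name__ == "__main__":
    # Toy expression values for four genes across four samples (made up).
    expression = {
        "A": [1, 2, 3, 4],
        "B": [2, 4, 6, 8],
        "C": [1, 2, 3, 5],
        "D": [4, 1, 3, 2],
    }
    for gene, k in connectivity(adjacency(expression)).items():
        print(gene, round(k, 3))
```

Raising the correlation to a power suppresses weak correlations while keeping strong ones close to 1, which is why a weakly correlated gene such as "D" ends up with near-zero connectivity.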

Step 5: Patient stratification

  • Pre-processing clinical data & CNVs data: "1.CNA-preprocess.R"
  • Patient stratification by the identified driver genes: "2.CNA-hclut.R"
  • Comparison between the identified patient groups: "3.CNA-SA.R" + "4.CNA-test.R"
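Patient stratification by hierarchical clustering can be illustrated with a small, self-contained Python sketch (the repository's scripts are R, and their distance measure, linkage method, and number of groups are not stated in this manual). The example below uses Euclidean distance with average linkage, both of which are assumptions, and merges the closest clusters until `k` groups remain. Patient IDs, copy number values, and the helper names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage_clusters(profiles, k):
    """Agglomerative clustering with average linkage, stopped at k clusters.

    profiles: dict mapping patient ID -> copy number values for the driver genes
    k:        number of patient groups to return
    """
    if not 1 <= k <= len(profiles):
        raise ValueError("k must be between 1 and the number of patients")
    clusters = [[patient] for patient in profiles]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average pairwise distance between members of the two clusters.
            total = sum(
                euclidean(profiles[a], profiles[b])
                for a in clusters[i]
                for b in clusters[j]
            )
            dist = total / (len(clusters[i]) * len(clusters[j]))
            if best is None or dist < best[0]:
                best = (dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because j > i
    return [sorted(cluster) for cluster in clusters]


if __name__ == "__main__":
    # Toy copy number profiles over four genes (made up for illustration).
    profiles = {
        "P1": [2, 1, 0, -1],
        "P2": [2, 1, 0, 0],
        "P3": [-1, 0, 2, 2],
        "P4": [-2, 0, 1, 2],
    }
    print(average_linkage_clusters(profiles, k=2))
```

The resulting groups can then be compared with each other, for example on survival or clinical features, as the last bullet of Step 5 describes.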

4. Citation


If you use the tool in this repository, please cite the following paper (and star this GitHub repository if you find the tool of interest):

Reference Type: Journal Article
Author: Nguyen, Quang-Huy
Le, Duc-Hau
Year: 2020
Title: Improving existing analysis pipeline to identify and analyze cancer driver genes using multi-omics data
Journal: Scientific Reports
Volume: 10
Issue: 1
Pages: 20521
Date: 2020/11/25
ISSN: 2045-2322
DOI: 10.1038/s41598-020-77318-1

Feel free to contact Quang-Huy Nguyen ([email protected]) or Duc-Hau Le ([email protected]) for any questions about the paper, datasets, code and results.
