This project forked from jiawei-huang/deepdecon

DeepDecon: A Deep-learning Method for Estimating Cell Fractions in Bulk RNA-seq Data with Applications to AML

Overview

Here, we present DeepDecon, a deep neural network model that leverages single-cell gene expression information to accurately predict the fraction of cancer cells in bulk tissues. DeepDecon is trained on single-cell RNA sequencing data and is robust to experimental biases and noise. It automatically selects optimal models to recursively estimate malignant cell fractions and improve prediction accuracy. When applied to bone marrow data (see Tutorial), it outperforms existing decomposition methods in both accuracy and robustness. We further show that DeepDecon is robust to the number of single cells within a bulk sample.

Requirements

  • tensorflow 1.14.0
  • scikit-learn 0.24.2
  • python 3.6.12
  • pandas 1.1.3
  • numpy 1.19.2
  • keras 2.3.1
  • scanpy 1.7.2

Installation

Download DeepDecon by cloning the repository:

git clone https://github.com/Jiawei-Huang/DeepDecon.git

Installation has been tested on Linux and macOS with Python 3.6. A GPU is recommended to accelerate training.

Instructions

This section provides instructions on how to run DeepDecon with scRNA-seq datasets.

Data preparation

Several scRNA-seq AML datasets have been prepared as input for the DeepDecon model. These datasets can be downloaded from the Zenodo repository. Uncompress datasets.tar.gz in the datasets folder; each dataset then has its own file containing the gene expression matrix (XXX_norm_sc.txt, where XXX is the subject name). Each row of the matrix corresponds to one cell. The first column gives the cell type (malignant/normal), and the remaining columns correspond to genes.
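
The layout described above can be read with a few lines of Python. This is a minimal sketch using only the standard library: the tab delimiter, the presence of a header row, and the function name load_expression_matrix are assumptions made for illustration, so check them against the downloaded files before relying on it.

```python
import csv
import io


def load_expression_matrix(handle, delimiter="\t", has_header=True):
    """Read a cell-by-gene matrix whose first column is the cell type.

    Returns (genes, labels, matrix). The delimiter and the header row are
    assumptions, not documented properties of the XXX_norm_sc.txt files.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    rows = [row for row in reader if row]  # skip blank lines
    genes = rows[0][1:] if has_header else []
    data_rows = rows[1:] if has_header else rows
    labels = [row[0] for row in data_rows]  # "malignant" or "normal"
    matrix = [[float(value) for value in row[1:]] for row in data_rows]
    return genes, labels, matrix


# Tiny made-up example in the assumed layout (not real data).
example = io.StringIO(
    "cell_type\tGENE_A\tGENE_B\n"
    "malignant\t1.0\t0.0\n"
    "normal\t0.5\t2.0\n"
)
genes, labels, matrix = load_expression_matrix(example)
```

For a real file, pass an open file handle, for example open(path, newline=""), in place of the in-memory example.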

Bulk sample simulation

DeepDecon constructs bulk RNA-seq samples with the get_bulk_samples.py script. To generate a bulk RNA-seq dataset with any ratio of malignant cells, run

python ./src/get_bulk_samples.py [-h] [--cells CELLS] [--samples SAMPLES] [--subject SUBJECT] [--start START] [--end END] [--binomial BINOMIAL] [--data DATA] [--out OUT]
-h, --help            show this help message and exit
--cells CELLS         Number of cells to use for each bulk sample.
--samples SAMPLES, -n SAMPLES
                      Total number of samples to create for each dataset.
--subject SUBJECT     Subject name
--start START         Fraction start range of generated samples e.g. 0 for [0, 100]
--end END             Fraction end range of generated samples e.g. 100 for [0, 100]
--binomial BINOMIAL   Whether generating bulk fractions from binomial distribution, 0=False, 1=True
--data DATA           Directory containing the datasets
--out OUT             Output directory
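
To make the idea behind the simulation concrete, here is a hedged sketch of how a pseudo-bulk sample could be built from labeled single cells. It is not the code in get_bulk_samples.py. The function names, sampling with replacement, summing expression across cells, and drawing the target fraction uniformly from the [start, end] range are all assumptions, and the --binomial option is not modeled.

```python
import random


def simulate_bulk_sample(labels, matrix, n_cells, malignant_fraction, rng):
    """Build one pseudo-bulk profile by summing randomly drawn cells.

    labels: per-cell type strings ("malignant" or anything else for normal).
    matrix: per-cell expression rows, aligned with labels.
    Returns (bulk_profile, realized_fraction).
    """
    malignant = [row for lab, row in zip(labels, matrix) if lab == "malignant"]
    normal = [row for lab, row in zip(labels, matrix) if lab != "malignant"]
    n_malignant = round(n_cells * malignant_fraction)
    # Sample with replacement so n_cells may exceed the available cells.
    chosen = rng.choices(malignant, k=n_malignant)
    chosen += rng.choices(normal, k=n_cells - n_malignant)
    bulk = [sum(column) for column in zip(*chosen)]  # sum each gene over cells
    return bulk, n_malignant / n_cells


def simulate_dataset(labels, matrix, n_cells, n_samples, start, end, seed=0):
    """Create n_samples pseudo-bulk samples with fractions in [start, end] percent."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        fraction = rng.uniform(start, end) / 100.0  # assumed uniform draw
        samples.append(simulate_bulk_sample(labels, matrix, n_cells, fraction, rng))
    return samples


# Toy input: two malignant and two normal cells, two genes (made-up values).
toy_labels = ["malignant", "malignant", "normal", "normal"]
toy_matrix = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
toy_samples = simulate_dataset(toy_labels, toy_matrix, n_cells=100,
                               n_samples=5, start=20, end=60)
```

Each returned pair holds the summed expression profile and the malignant fraction actually realized after rounding to whole cells, which is the value a model would be trained to predict.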

Model training

Once the data are available, train DeepDecon models by running

python train_model.py [-h] [--cells CELLS] [--path PATH] [--lr LR] [--bs BS]
                      [--dr DR] [--start START] [--end END] [--scaler SCALER]
                      [--normalization NORMALIZATION]
  -h, --help            show this help message and exit
  --cells CELLS         Number of cells to use for each bulk sample.
  --path PATH           Training data directory
  --lr LR               learning rate index k, lr = 10^(-k)
  --bs BS               batch size
  --dr DR               dropout
  --start START         Fraction start range of generated samples e.g. 0 for
                        [0, 100]
  --end END             Fraction end range of generated samples e.g. 100 for
                        [0, 100]
  --scaler SCALER       Scaler of neural network, MinMaxScaler (mms) or
                        StandardScaler (ss)
  --normalization NORMALIZATION
                        Normalization methods, TF-IDF, FPKM, CPM or TPM
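
Three of the options above have simple arithmetic behind them: the --lr index, CPM normalization, and min-max scaling. The sketch below illustrates each in plain Python. The function names are hypothetical and this is not the code used by train_model.py; in particular, a min-max scaler in a training pipeline is normally fitted per feature across samples, whereas here it is applied to a single list of values.

```python
def learning_rate_from_index(k):
    """Map the --lr index k to a learning rate, lr = 10^(-k)."""
    return 10.0 ** (-k)


def counts_per_million(counts):
    """Scale one sample's counts so that they sum to one million (CPM)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]  # avoid division by zero for empty samples
    return [c * 1e6 / total for c in counts]


def min_max_scale(values):
    """Rescale values linearly to [0, 1], the idea behind a min-max scaler."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant input carries no spread
    return [(v - lo) / (hi - lo) for v in values]


lr = learning_rate_from_index(4)      # index 4 corresponds to 10^(-4)
cpm = counts_per_million([1, 3])      # each value becomes its share of 1e6
scaled = min_max_scale([2, 4, 6])     # smallest maps to 0, largest to 1
```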

Model evaluation

Next, obtain predictions by running

python eval.py [--cells CELLS] [--dir DIR] [--filepath FILEPATH] [--sub_idx SUB_IDX]

--cells CELLS        Number of cells to use for each bulk sample.
--dir DIR            Training data directory
--filepath FILEPATH  Testing file path
--sub_idx SUB_IDX    Testing subject index, 0-14 refers to subjects in the
                    training datasets, 15 means new dataset.
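
The Overview states that DeepDecon selects models automatically and estimates malignant fractions recursively, and the --start and --end options suggest that models are trained on different fraction ranges. The sketch below shows one possible reading of that idea: start from the widest-range model, then repeatedly switch to the narrowest model whose range contains the current estimate. The interface (a dict of ranges mapped to callables), the stopping rule, and all names are assumptions. The selection rule actually implemented in eval.py may differ.

```python
def recursive_estimate(bulk, models, max_rounds=5):
    """Refine an estimate by moving to narrower-range models.

    models: dict mapping (start, end) percent ranges to callables that
    return a predicted malignant fraction in percent (hypothetical interface).
    Returns (estimate, range_of_the_model_used_last).
    """
    current = max(models, key=lambda r: r[1] - r[0])  # begin with the widest range
    estimate = models[current](bulk)
    for _ in range(max_rounds):
        candidates = [r for r in models if r[0] <= estimate <= r[1]]
        if not candidates:
            break  # estimate falls outside every range; keep the last value
        best = min(candidates, key=lambda r: r[1] - r[0])
        if best == current:
            break  # no narrower model applies; stop refining
        current = best
        estimate = models[current](bulk)
    return estimate, current


# Stand-in "models" that return fixed values, for illustration only.
toy_models = {
    (0, 100): lambda bulk: 30.0,
    (0, 50): lambda bulk: 27.0,
    (20, 40): lambda bulk: 25.0,
}
estimate, used_range = recursive_estimate([0.0], toy_models)
```

With these stand-ins, the widest model places the estimate at 30, which falls inside the (20, 40) range, so the final estimate comes from that narrower model.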

Tutorial

See DeepDecon_example.ipynb to reproduce the experimental results in the paper.

Contact

Feel free to open an issue on GitHub or contact me if you have any problems running DeepDecon.
