Polishing Copy Number Variant Calls on Exome Sequencing Data via Deep Learning

DECoNT is deep learning-based software that corrects CNV predictions on exome sequencing data using read depth sequences.

Deep Learning, Copy Number Variation, Whole Exome Sequencing

Authors

Furkan Ozden, Can Alkan, A. Ercument Cicek

Questions & comments

[firstauthorname].[firstauthorsurname]@bilkent.edu.tr

Reproducing the results given in the manuscript and toy example

To reproduce results given in the manuscript, Polishing Copy Number Variant Calls on Exome Sequencing Data via Deep Learning, please refer to https://zenodo.org/record/3865380#.XtVRchMzZds. This repository also includes the toy example tutorial.

Warning: DECoNT is completely free for academic use. However, it is licensed for commercial use. Please refer to the License section first for more information.

Installation
Features
Instructions Manual
Usage Examples
Citations
License

Installation

DECoNT does not require installation in the conventional sense. It is a Python 3 script that is ready to run once the required packages are installed.

Requirements

Python 3.7.6
NumPy 1.16.2
Pandas 1.0.0
TensorFlow 1.14.0
Keras 2.2.4
Scikit-Learn 0.22.1
keras-metrics 1.1.0
cuDNN 7.6.5 (optional, for GPU support only; required by keras-gpu 2.2.4)

To handle the requirements easily, you can use the DECoNT_linux.yml or DECoNT_mac.yml file to create the appropriate conda environment:

$ conda env create -f DECoNT_linux.yml
$ conda activate DECoNT_linux

$ conda env create -f DECoNT_mac.yml
$ conda activate DECoNT_mac
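
To confirm that an environment matches the pinned versions listed under Requirements, a small check like the following can help. It is a minimal sketch and not part of DECoNT: the function names are hypothetical, the dictionary keys are assumed to be the PyPI distribution names of the listed packages, and cuDNN is left out because it is not a Python package.

```python
# Pinned versions copied from the Requirements list.
# Assumption: keys are the PyPI distribution names of those packages.
REQUIRED = {
    "numpy": "1.16.2",
    "pandas": "1.0.0",
    "tensorflow": "1.14.0",
    "Keras": "2.2.4",
    "scikit-learn": "0.22.1",
    "keras-metrics": "1.1.0",
}


def find_mismatches(installed, required=REQUIRED):
    """Return {package: (required, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in required.items():
        found = installed.get(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


def installed_versions(names):
    """Look up installed versions; packages that are missing map to None."""
    try:
        from importlib import metadata  # Python 3.8+
    except ImportError:
        import importlib_metadata as metadata  # backport for Python 3.7
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    report = find_mismatches(installed_versions(REQUIRED))
    for name, (wanted, found) in report.items():
        print(f"{name}: required {wanted}, found {found}")
```

An empty report means every listed Python package is installed at the pinned version.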

Features

DECoNT optionally supports GPUs. See the -g option under Optional Arguments.
DECoNT shows a progress bar with an ETA for the analysis.
Upcoming version: custom training, custom call polishing.

Instructions Manual

Important notice: Please call the DECoNT_polish.py script from the scripts directory.

Required Arguments

-m, --model

For version 0.1, DECoNT provides pretrained weights for polishing CNV calls from the following WES-based CNV callers: (i) XHMM; (ii) CoNIFER; (iii) CODEX2; (iv) Control-FREEC.
If you want to use the pretrained DECoNT weights for polishing, set this argument to pretrained.
If you want to use custom DECoNT model weights obtained with the DECoNT_train.py script, provide the path to the model weights file (.h5 extension) instead.

-cn, --callername

For version 0.1, DECoNT supports only XHMM, CoNIFER, CODEX2 and Control-FREEC. For future versions, DECoNT will be able to polish any CNV output format with a required CNV output template.
Set to one of the WES-based CNV caller names above for DECoNT to understand the required weights for the polishing process.

-i, --input

Relative or absolute path to the output file of the selected WES-based CNV caller.

-o, --output

Relative or absolute path to the directory where the DECoNT output file will be written.

-s, --samples

Relative or absolute path to the directory containing the read depth files of the samples in the analysis (i.e. the samples used in WES CNV calling). All read depth files must be in the format specified in the examples section below, and the directory must not contain any other files. Read depth files generated by the Sambamba tool are accepted directly, with no reformatting required.
Read depth file names must have the following format: SAMPLENAME.read_depths.txt (e.g. HG00096.read_depths.txt)
Sample names must be consistent between the WES CNV caller output and the read depth file names.
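
The constraints on the -s directory can be checked before a run. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, and the list of called sample names is assumed to have been extracted from the CNV caller output by other means.

```python
import os

# File name suffix required for read depth files (SAMPLENAME.read_depths.txt).
SUFFIX = ".read_depths.txt"


def sample_names_from_files(file_names):
    """Split file names into (sample names, unexpected files)."""
    samples, unexpected = [], []
    for name in sorted(file_names):
        if name.endswith(SUFFIX) and len(name) > len(SUFFIX):
            samples.append(name[: -len(SUFFIX)])
        else:
            unexpected.append(name)
    return samples, unexpected


def check_samples_dir(directory, called_samples):
    """Report stray files and called samples that lack a read depth file."""
    samples, unexpected = sample_names_from_files(os.listdir(directory))
    missing = sorted(set(called_samples) - set(samples))
    return {"samples": samples, "unexpected": unexpected, "missing": missing}
```

A directory is ready when both the "unexpected" and "missing" lists in the returned dictionary are empty.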

Optional Arguments

-g, --gpu

Set this to the PCI BUS ID of the GPU you want to use.
You can check the PCI BUS IDs of the GPUs in your system in various ways. For example, the gpustat tool lists them:

$ gpustat
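
For background, CUDA-based libraries such as TensorFlow can be restricted to one GPU, indexed in PCI bus order, through two standard CUDA environment variables. The sketch below illustrates that general mechanism only. Whether DECoNT_polish.py handles the -g value this way internally is an assumption, and select_gpu is a hypothetical helper name.

```python
import os


def select_gpu(pci_bus_id, environ=os.environ):
    """Expose a single GPU, indexed in PCI bus order, to CUDA-based libraries."""
    # Both variables must be set before the deep learning framework
    # initializes CUDA, otherwise they have no effect.
    environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    environ["CUDA_VISIBLE_DEVICES"] = str(pci_bus_id)
    return environ
```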

-v, --version

Check the version of DECoNT.

-h, --help

See the help page.

Usage Examples

Using DECoNT is simple, and it reports progress with a progress bar and an ETA.

Step-1: Use your preferred WES-based CNV caller to call CNVs on your WES dataset.

For XHMM refer to: XHMM Manual Page
For CoNIFER refer to: CoNIFER Manual Page
For CODEX2 refer to: CODEX2 Manual Page
For Control-FREEC refer to: Control-FREEC Manual Page
After obtaining the output file from one of these tools, store it.
For the purposes of this tutorial, let's call the output WES CNV file: /home/user/analysis.txt

Step-2: Obtain read depth files for samples used in WES CNV calling.

Read depth counts obtained with the Sambamba tool are accepted directly by DECoNT. Note that you should use the -w option of sambamba with the value 1000, which sets the window size (resolution) to 1000 bp. You can run sambamba on your inputs as follows:

$ sambamba depth window -w 1000 HG00096.wes.bam > /home/user/sambamba_read_depths/HG00096.wes.bam_read_depths.txt

Note that all read depth files must have the SAMPLENAME. prefix in the file name.
You can use any read depth generator you like. However, so that DECoNT has a unified input format, we require the following format for read depth files:

Note that DECoNT does not use the mean coverage column shown in the file format figure above. You can fill that column with zeros.

For the purposes of this tutorial, let's call the directory containing all of the read depth files described above: /home/user/sambamba_read_depths/
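
When there are many samples, the sambamba call from Step-2 can be scripted. The following is a minimal sketch with hypothetical helper names. It assumes that sambamba is on the PATH, that the sample name is the BAM file name up to its first dot, and that outputs are named with the SAMPLENAME.read_depths.txt pattern given under the -s argument.

```python
import os
import subprocess


def sambamba_jobs(bam_paths, out_dir, window=1000):
    """Pair each BAM file with a sambamba command and its output file path."""
    jobs = []
    for bam in bam_paths:
        # Assumption: the sample name is the file name up to its first dot.
        sample = os.path.basename(bam).split(".")[0]
        command = ["sambamba", "depth", "window", "-w", str(window), bam]
        output = os.path.join(out_dir, sample + ".read_depths.txt")
        jobs.append((command, output))
    return jobs


def run_jobs(jobs):
    """Run each command, redirecting its standard output to the paired file."""
    for command, output in jobs:
        with open(output, "w") as handle:
            subprocess.run(command, stdout=handle, check=True)
```

Building the job list separately from running it makes it possible to inspect the commands and output names before any file is written.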

Step-3: Run DECoNT on data obtained in Step-1 and Step-2

The requirements of DECoNT must be satisfied. To handle them easily, download the DECoNT_mac.yml or DECoNT_linux.yml file and initialize the DECoNT environment as follows (optional).

$ conda env create -f DECoNT_mac.yml
$ conda activate DECoNT_mac

Note: in this tutorial, we assume that the WES CNV calls were obtained with XHMM. If you obtained them with another supported caller, change the -cn argument to the name of that caller.
After initializing the environment, run DECoNT as follows:

$ python ./DECoNT_polish.py -m pretrained -cn XHMM -i /home/user/analysis.txt -o /home/user/ -s /home/user/sambamba_read_depths/

Optionally, if GPUs are available, you can set the -g argument to the PCI BUS ID of the GPU you want to use. Please refer to the Optional Arguments section. By default, the script uses the CPU.

$ python ./DECoNT_polish.py -g 5 -m pretrained -cn XHMM -i /home/user/analysis.txt -o /home/user/ -s /home/user/sambamba_read_depths/
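
The commands above can also be assembled from Python, for example when polishing several call sets in a loop. This is a minimal sketch that uses only the flags documented in the Instructions Manual. The helper names and the scripts_dir parameter are hypothetical, and the script is run from the scripts directory as the manual requires.

```python
import subprocess
import sys


def build_polish_command(caller, cnv_calls, out_dir, samples_dir,
                         model="pretrained", gpu=None,
                         script="./DECoNT_polish.py"):
    """Assemble the argument list for DECoNT_polish.py."""
    command = [sys.executable, script]
    if gpu is not None:
        # Optional: PCI BUS ID of the GPU to use (CPU is used by default).
        command += ["-g", str(gpu)]
    command += ["-m", model, "-cn", caller, "-i", cnv_calls,
                "-o", out_dir, "-s", samples_dir]
    return command


def run_polish(command, scripts_dir):
    """Run the command with the scripts directory as working directory."""
    subprocess.run(command, cwd=scripts_dir, check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.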

Output file of DECoNT

At the end of the polishing procedure, DECoNT writes its output file to the directory given with the -o option. In this tutorial it is /home/user/
The output file of DECoNT is in a tab-delimited, .bed-like format.
The columns of the output file are, in order: 1. Sample Name, 2. Chromosome, 3. CNV Start Index, 4. CNV End Index, 5. XHMM Prediction (the caller name changes according to the -cn argument), 6. DECoNT Polished Prediction
The following figure is an example of a DECoNT output file.
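
The six-column layout described above can be read with the Python standard library. The following is a minimal sketch: the function and key names are hypothetical, and because it is not stated whether the file starts with a header line, any row whose start and end fields are not integers is skipped.

```python
import csv

# Column order as documented for the DECoNT output file.
COLUMNS = ["sample", "chromosome", "start", "end",
           "caller_prediction", "decont_prediction"]


def read_decont_output(handle):
    """Parse tab-delimited DECoNT output rows into dictionaries."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) != len(COLUMNS):
            continue  # skip blank or malformed lines
        row = dict(zip(COLUMNS, fields))
        if not (row["start"].isdigit() and row["end"].isdigit()):
            continue  # skip a header line, if the file has one
        row["start"], row["end"] = int(row["start"]), int(row["end"])
        rows.append(row)
    return rows


def changed_calls(rows):
    """Return the calls whose label differs after polishing."""
    return [r for r in rows
            if r["caller_prediction"] != r["decont_prediction"]]
```

Opening the output file with open(path, newline="") and passing the handle to read_decont_output yields one dictionary per call, and changed_calls then lists the calls that DECoNT relabeled.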

Running a quick experiment with DECoNT:

Follow the steps above, but use the provided DATA_chaisson_hg00733.xcnv file instead of analysis.txt, and the directory in this link instead of the /sambamba_read_depths/ directory.

Citations

License

CC BY-NC-SA 2.0
For commercial usage, please contact.
