Giter Site home page Giter Site logo

wxyz / circdeep Goto Github PK

View Code? Open in Web Editor NEW

This project forked from uoflbioinformatics/circdeep

0.0 1.0 0.0 48.37 MB

End-to-End learning framework for circular RNA classification from other long non-coding RNA using multimodal deep learning

License: MIT License

Python 100.00%

circdeep's Introduction

circDeep: Deep learning approach for circular RNA classification from other long non-coding RNA

circDeep fuse Reverse Complement Matching (RCM) descriptor, Asymmetric Convolution Neural Network combined with Long Short Term Memory (ACNN-BLSTM) sequence descriptor and conservation descriptor into high level abstraction descriptors, where the shared representations across different modalities are integrated. The experiments show that circDeep is not only faster than existing tools but also performs at an unprecedented level of accuracy by achieving more than 12 percent increase in accuracy over the existing tools.

Prerequisites

We recommend to use Anaconda 3 platform.

Installation

Download circDeep by

git clone https://github.com/UofLBioinformatics/circDeep

Installation has been tested in Anaconda (Linux/Windows) platform with Python3.

Usage

usage: circDeep.py [-h] --train TRAIN --genome GENOME -gtf GTF --bigwig BIGWIG
               [--seq SEQ] [--rcm RCM] [--cons CONS] [--predict PREDICT]
               [--out_file OUT_FILE] [--model_dir MODEL_DIR] 
               [--positive_bed POSITIVE_BED] [--negative_bed NEGATIVE_BED] 
               [--testing_bed TESTING_BED] 

circular RNA classification from other long non-coding RNA using multimodal deep learning

Required arguments:
=================== 
   --data_dir <data_directory>
                        Under this directory, you will have descriptors files used for training, the label file, genome sequencefile , gtf annotation file and bigwig file
  --train TRAIN         use this option for training model
  --genome GENOME       Genome sequence. e.g., hg38.fa
  --gtf GTF             The gtf annotation file. e.g., hg38.gtf
  --bigwig BIGWIG       conservation scores in bigWig file format
                        
 optional arguments:
====================

   -h, --help            show this help message and exit
  --seq SEQ             The modularity of ACNN-BLSTM seq
  --rcm RCM             The modularity of RCM
  --cons CONS           The modularity of conservation
  --predict PREDICT     Predicting circular RNAs. if using train, then it will
                        be False
  --out_file OUT_FILE   The output file used to store the prediction
                        probability of testing data
  --model_dir MODEL_DIR
                        The directory to save the trained models for future
                        prediction
   --positive_bed POSITIVE_BED
                        BED input file for circular RNAs for training, it
                        should be like:chromosome start end gene
  --negative_bed NEGATIVE_BED
                        BED input file for other long non coding RNAs for
                        training, it should be like:chromosome start end gene
  --testing_bed TESTING_BED
                        BED input file for testing data, it should be
                        like:chromosome start end gene

Example

Train the model:

In our experiements, we have used circular RNAs from circRNADb and our negative dataset from GENCODE. The original coordinates of our datasets were in hg19 genome and we convert them to hg38 genome using liftOver provided in UCSC Genome Browser. We need also to download all necessary files and put them in data directory.

python3 circDeep.py --data_dir 'data/' --train True --model_dir 'models/' --seq True --rcm True --cons True --genome 'data/hg38.fasta' --gtf 'data/Homo_sapiens.Ensembl.GRCh38.82.gtf' --bigwig 'data/hg38.phastCons20way.bw' --positive_bed 'data/circRNA_dataset.bed' --negative_bed 'data/negative_dataset.bed'

Test the model:

python3 circDeep.py --data_dir 'data/' --train False --model_dir 'models/' --seq True --rcm True --cons True --genome 'data/hg38.fasta' --gtf 'data/Homo_sapiens.Ensembl.GRCh38.82.gtf' --bigwig 'data/hg38.phastCons20way.bw' --testing_bed 'data/test.bed'

Note:

Input data files for training and testing should be in bed format:

chr17 17507350 17508308 + gene1

chr11 48014405 48015855 - gene2

chr17 77469161 77472770 - gene3

License

Copyright (C) 2017 . See the LICENSE file for license rights and limitations (MIT).

Citation

Mohamed Chaabane, Robert M Williams, Austin T Stephens, Juw Won Park, circDeep: deep learning approach for circular RNA classification from other long non-coding RNA, Bioinformatics, Volume 36, Issue 1, 1 January 2020, Pages 73โ€“80, https://doi.org/10.1093/bioinformatics/btz537

circdeep's People

Contributors

uoflbioinformatics avatar medchaabane avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.