parisepigenetics / rna_feat_ext

Software tools to extract mRNA features from a list of ENSEMBL gene IDs.

Home Page: https://parisepigenetics.github.io/rna_feat_ext/

License: GNU General Public License v3.0

Python 100.00%
bioinformatics bioinformatics-scripts

rna_feat_ext's Introduction

mRNA feature extraction tools

authors: Costas BOUYIOUKOS, Antoine LU and Arnold Franz AKE

A set of computational tools to extract user-defined mRNA features from a list of ENSEMBL gene IDs, using the ENSEMBL BioMart web API and custom computations. Conceived and developed by Costas Bouyioukos (@cbouyio) at Paris Epigenetics (@parisepigenetics) and Université Paris Diderot. Development involved two bioinformatics Master's students: Antoine LU (@antoinezl), who started as part of a coding project during the second year of his degree, and Franz-Arnold AKE (@franzx5), a second-year Master's student who mainly worked on the clustering part of the project.

Installation.

To install the tools in your local python environment (user $HOME directory) type:

./setup.py install --user

(The --user flag installs the software under your personal account, so no root privileges are required.)

Requirements.

Python.

All Python dependencies can be installed with pip install <package_name>

External.

For external tools, please follow the installation guidelines in the provided links.

Main Usage.

geneIDs2fasta.py ENSEMBL_geneIDs_file fasta_output_file

and

fasta2table.py ENSEMBL_fasta_output_file features_table_file

geneIDs2fasta.py

This program takes a text file with a list of ENSEMBL gene IDs and returns a FASTA file of the corresponding cDNA sequences. Each header contains the following metadata, in this order:

>ENSEMBL_transcript_ID |Gene stable ID | Gene name | cDNA start | cDNA end | TSL | APRIS | HAVANA_ENSEMBL | gene description | Source:|
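
A minimal sketch of splitting such a header back into named fields, assuming fields are separated by `|`, that no value itself contains a `|`, and that the trailing `|` leaves one empty field to discard. The snake_case field names and the function `parse_header` are hypothetical labels chosen for illustration, not identifiers used by the tools.

```python
from typing import Dict

# Hypothetical field labels, in the header order documented above.
HEADER_FIELDS = [
    "transcript_id", "gene_id", "gene_name", "cdna_start", "cdna_end",
    "tsl", "appris", "havana_ensembl", "description", "source",
]


def parse_header(line: str) -> Dict[str, str]:
    """Split a geneIDs2fasta-style FASTA header into named fields.

    Assumes '|' separators and that no field value contains '|'.
    """
    if not line.startswith(">"):
        raise ValueError("not a FASTA header: %r" % line)
    # Drop the leading '>' and strip whitespace around every field.
    parts = [p.strip() for p in line[1:].rstrip("\n").split("|")]
    # The trailing '|' leaves an empty last element; discard it.
    if parts and parts[-1] == "":
        parts.pop()
    if len(parts) != len(HEADER_FIELDS):
        raise ValueError(
            "expected %d fields, got %d" % (len(HEADER_FIELDS), len(parts)))
    return dict(zip(HEADER_FIELDS, parts))
```

Values are kept as strings; converting `cdna_start` and `cdna_end` to integers is left to the caller.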

fasta2table.py

This program takes as input the FASTA file produced by geneIDs2fasta.py and returns a semicolon-separated table with the following header:

ensembl_gene_id;gene_name;coding_len;5pUTR_len;5pUTR_GC;5pUTR_MFE;5pUTR_MfeBP;3pUTR_len;3pUTR_GC;3pUTR_MFE;3pUTR_MfeBP;TOP_localScore;CAI;Kozak_Sequence;Kozak_Context
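
For downstream analysis the table can be loaded with Python's standard `csv` module. This is a minimal sketch: the function `load_feature_table` is hypothetical, and which columns hold numbers is an assumption inferred from the column names. Non-numeric cells in those columns (for example an `NA` placeholder) are kept as text.

```python
import csv
from typing import IO, Dict, List, Optional, Union

# Columns assumed to hold numbers, inferred from the header above.
# Kozak_Sequence and Kozak_Context are left as text.
NUMERIC_COLUMNS = {
    "coding_len", "5pUTR_len", "5pUTR_GC", "5pUTR_MFE", "5pUTR_MfeBP",
    "3pUTR_len", "3pUTR_GC", "3pUTR_MFE", "3pUTR_MfeBP",
    "TOP_localScore", "CAI",
}

Cell = Union[str, float, None]


def _to_number(value: Optional[str]) -> Cell:
    """Convert a cell to float; keep the raw value if it is not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return value


def load_feature_table(handle: IO[str]) -> List[Dict[str, Cell]]:
    """Read a semicolon-separated features table into one dict per gene."""
    reader = csv.DictReader(handle, delimiter=";")
    rows = []
    for row in reader:
        rows.append({
            col: _to_number(val) if col in NUMERIC_COLUMNS else val
            for col, val in row.items()
        })
    return rows
```

Usage, with the output file from the main usage example: `with open("features_table_file", newline="") as fh: rows = load_feature_table(fh)`.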

Testing

The test directory contains two files that test and demonstrate the functionality of the tools.

  • test/testENSEMBLids.txt Contains 6 genes with their ENSEMBL IDs.

  • test/testTransExpr.csv Contains the expression levels of each individual transcript of the above genes from a case study.
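
Before running geneIDs2fasta.py on your own list, it can help to check that every line really holds an ENSEMBL gene ID. A minimal sketch, assuming one ID per line with only the first whitespace-separated token used; the function `read_gene_ids` is hypothetical. The pattern follows the ENSEMBL stable ID layout (`ENS`, an optional species code, `G` for gene, 11 digits, an optional `.version`).

```python
import re
from typing import Iterable, List

# ENSEMBL stable gene IDs: "ENS", optional species code, "G", 11 digits,
# optionally followed by a version suffix such as ".5".
GENE_ID_RE = re.compile(r"^ENS[A-Z]*G\d{11}(\.\d+)?$")


def read_gene_ids(lines: Iterable[str]) -> List[str]:
    """Collect gene IDs, one per line, skipping blank lines.

    Raises ValueError on a token that does not look like a gene ID
    (e.g. a transcript ID starting with ENST).
    """
    ids = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        gene_id = fields[0]
        if not GENE_ID_RE.match(gene_id):
            raise ValueError(
                "line %d: not an ENSEMBL gene ID: %r" % (lineno, gene_id))
        ids.append(gene_id)
    return ids
```

Usage: `with open("test/testENSEMBLids.txt") as fh: ids = read_gene_ids(fh)`.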

TODO add section for MEME suite integration.

rna_feat_ext's People

Contributors: antoinezl, cbouyio, franzx5

rna_feat_ext's Issues

Library and script to retrieve data from ENSEMBL

We need to consolidate all the code we have so we end up with something as useful as possible.

  1. We need a pure Python library that will include all the functions (perhaps some classes too).
  2. We will need a script to execute and retrieve a query from the BioMart database API.
    The script will take nothing but a list of ENSEMBL identifiers (either gene or transcript) and will output a table with all the features we need, without doing any computations.
    So no computations of energy, distance or anything else for the moment.
    Later we will include the libraries for computation (MFE, RBPBS and miRNAs).
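
A minimal sketch of the query step in point 2, assuming the public Ensembl BioMart `martservice` endpoint, the human `hsapiens_gene_ensembl` dataset and the `ensembl_gene_id` filter; the helper names are hypothetical and attribute names must be checked against the dataset's attribute list. It only builds the XML query and URL; no request is sent.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from typing import Sequence

# Assumed endpoint: the public Ensembl BioMart REST service.
MARTSERVICE_URL = "https://www.ensembl.org/biomart/martservice"


def build_biomart_query(gene_ids: Sequence[str],
                        attributes: Sequence[str],
                        dataset: str = "hsapiens_gene_ensembl") -> str:
    """Return a BioMart XML query that filters on ENSEMBL gene IDs."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",   # tab-separated output
        "header": "1",        # include column names in the output
        "uniqueRows": "1",
    })
    ds = ET.SubElement(query, "Dataset",
                       {"name": dataset, "interface": "default"})
    # BioMart takes multiple filter values as a comma-separated list.
    ET.SubElement(ds, "Filter",
                  {"name": "ensembl_gene_id", "value": ",".join(gene_ids)})
    for attr in attributes:
        ET.SubElement(ds, "Attribute", {"name": attr})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


def biomart_url(xml_query: str) -> str:
    """URL-encode the XML query as the 'query' parameter of martservice."""
    return MARTSERVICE_URL + "?" + urllib.parse.urlencode({"query": xml_query})
```

The resulting URL could then be fetched with `urllib.request.urlopen`, returning tab-separated text with one row per transcript, which matches the "no computations" scope described above.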

Introduce Makefile

A minimal Makefile to handle installation, cleaning and documentation.

import local_score error

The script rnafeatureslib tries to import local_score, but the module is nowhere to be found. Is it an in-house module or a third-party module that has to be downloaded?

Check min-max size of UTRs.

RNAfold does not process UTRs shorter than 8 or longer than 10000 bp.
Include these checks while extracting features and writing the UTR files.
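
A minimal sketch of such a check, using the bounds stated in this issue (valid lengths from 8 to 10000 nt inclusive); the function names are hypothetical, and returning `None` for out-of-range UTRs is an assumed convention, not the tools' actual behaviour.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Bounds from the issue: RNAfold is reported not to process UTRs
# shorter than 8 nt or longer than 10000 nt.
MIN_UTR_LEN = 8
MAX_UTR_LEN = 10000


def foldable_utr(seq: Optional[str],
                 min_len: int = MIN_UTR_LEN,
                 max_len: int = MAX_UTR_LEN) -> bool:
    """Return True if the UTR exists and its length is within bounds."""
    if not seq:
        return False
    return min_len <= len(seq) <= max_len


def guarded_feature(seq: Optional[str],
                    compute: Callable[[str], T]) -> Optional[T]:
    """Apply compute(seq) only to foldable UTRs; otherwise return None."""
    return compute(seq) if foldable_utr(seq) else None
```

Running the same check before writing the UTR files keeps out-of-range sequences out of the RNAfold input altogether.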

Put a transcript selection option.

Select transcripts based on isoforms expression percentages.

-- Actually, for the moment we will only consider the best-annotated transcript from ENSEMBL, based on the database criteria: HAVANA, APRIS and TSL.
