
scripts_sorfs_ms's Introduction

A collection of Python scripts (v3.5) used for P. patens sORF discovery work.

The corresponding manuscript is currently under revision and is available here: https://www.biorxiv.org/content/early/2017/11/03/213736

GffParser.py

Description: script to find intron and CDS coordinates in the P. patens gff3 file.

Usage: python3 GffParser.py
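As a rough illustration of what extracting CDS and intron coordinates from a gff3 file involves, here is a minimal sketch. It is not the code of GffParser.py: the function names `parse_gff3_features` and `introns_from_exons`, the example records, and the choice to derive introns as gaps between consecutive exons of one transcript are all assumptions made for this example. It also assumes each feature has a single `Parent` value.

```python
from collections import defaultdict


def parse_gff3_features(lines, feature_type="CDS"):
    """Collect (start, end) pairs of one feature type per (seqid, strand, parent).

    Hypothetical helper: assumes nine tab-separated gff3 columns and a single
    Parent attribute per feature (comma-separated parents are not split).
    """
    features = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        parent = attrs.get("Parent", attrs.get("ID", "unknown"))
        features[(cols[0], cols[6], parent)].append((int(cols[3]), int(cols[4])))
    return {key: sorted(value) for key, value in features.items()}


def introns_from_exons(exons):
    """Return 1-based inclusive intron intervals lying between sorted exons."""
    introns = []
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        if next_start - prev_end > 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


# Made-up records, for illustration only.
example = [
    "##gff-version 3",
    "Chr01\tsrc\texon\t100\t200\t.\t+\t.\tID=e1;Parent=tx1",
    "Chr01\tsrc\texon\t301\t400\t.\t+\t.\tID=e2;Parent=tx1",
    "Chr01\tsrc\tCDS\t150\t200\t.\t+\t0\tID=c1;Parent=tx1",
    "Chr01\tsrc\tCDS\t301\t350\t.\t+\t2\tID=c2;Parent=tx1",
]

exons = parse_gff3_features(example, "exon")
cds = parse_gff3_features(example, "CDS")
introns = {key: introns_from_exons(value) for key, value in exons.items()}
```
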

protein_Ka_Ks_codeml.py

Description: script to calculate the dN/dS value. It takes two fasta files: the proteins corresponding to the sORFs and the transcript nucleotide sequences from other species.

Usage: see --help.

Example: python3 protein_Ka_Ks_codeml.py translated_FINAL_sORF_SELECTED.fa Zmays_284_Ensembl-18_2010-01-MaizeSequence.transcript.fa --blast T --threads 10 --makedb T

Dependencies:

Change the variable query_seq_ind to the path of the file with the sORF nucleotide sequences, e.g. (default): query_seq_ind = SeqIO.index("/home/ilia/sORF/sORFfinder2/moss/FINAL_sORF_SELECTED.fa", "fasta")

  • oopBLASTv_forsORF.py script, available in this repository.

NOTE: Some variables have to be changed manually; see the comments in the script files for details. One of these variables points to the fasta file containing the nucleotide sequences of the sORFs. Important: the ids of the protein and nucleotide sORF sequences MUST be identical.
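Because mismatched ids are easy to miss, it can help to compare the two fasta files before running the pipeline. The following is a minimal sketch, not part of the repository: `fasta_ids` and `compare_ids` are hypothetical helpers, and the id is taken as the first whitespace-delimited token of each header, which is how Biopython's SeqIO assigns record ids for fasta files.

```python
def fasta_ids(lines):
    """Return the set of sequence ids found in fasta header lines."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                # id = first whitespace-delimited token of the header
                ids.add(header.split()[0])
    return ids


def compare_ids(protein_lines, nucleotide_lines):
    """Return (ids only in the protein file, ids only in the nucleotide file)."""
    protein = fasta_ids(protein_lines)
    nucleotide = fasta_ids(nucleotide_lines)
    return sorted(protein - nucleotide), sorted(nucleotide - protein)


# Made-up records, for illustration only.
protein_example = [">sORF_1 translated", "MKL", ">sORF_2", "MAV"]
nucleotide_example = [">sORF_1", "ATGAAACTGTAA", ">sORF_3", "ATGGCGGTGTAA"]

only_protein, only_nucleotide = compare_ids(protein_example, nucleotide_example)
```

With real data, the two arguments can be open file handles, for example `compare_ids(open(protein_path), open(nucleotide_path))`. Both returned lists should be empty before the dN/dS script is run.
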

  • Scripts required for protein_Ka_Ks_codeml.py:
  • 2.1 codemlParser folder contains a module to run and parse the Codeml program
  • 2.2 oopBLASTv_forsORF.py - script to run BLAST and parse the results.

Usage examples:

The class takes the query and reference fasta files:

Bl = BlastParser(r"/home/ilia/sORF/sORFfinder2/moss/FINAL_sORF_SELECTED.fa", r"/home/ilia/SOLID_moss/Ppatens_318_v3_index.fa", DB_build=False)

Run BLAST:

Bl.runblast()

Parse the results. The function takes an xml file; the default name for this file is "vs":

Bl.parseBlastXml()

It can also be used to parse an external xml file, e.g.:

Bl.parseBlastXml(file_exo="extranal_xml_blast.xml")

It returns a .bed file with the following columns:

  1. Query id
  2. Hit id
  3. Start HSP in hit (start position of hit sequence involved in alignment)
  4. Stop HSP in hit (stop position of hit sequence involved in alignment)
  5. E-value
  6. hit HSP sequence
  7. query HSP sequence
  8. length of hit HSP
  9. hit strand
  10. Start HSP in query (start position of query sequence involved in alignment)
  11. Stop HSP in query (stop position of query sequence involved in alignment)
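For downstream analysis, the eleven columns above can be loaded into named records. This is a minimal sketch under two assumptions that the column list does not state: that the file is tab-separated, and that coordinates and lengths are integers while the E-value parses as a float. `BlastBedRow` and `read_blast_bed` are hypothetical names, and the hit strand is kept as a plain string because its encoding is not described here.

```python
import csv
import io
from collections import namedtuple

# Field names follow the column list above; the names themselves are made up.
BlastBedRow = namedtuple(
    "BlastBedRow",
    [
        "query_id", "hit_id", "hit_start", "hit_stop", "evalue",
        "hit_seq", "query_seq", "hit_len", "hit_strand",
        "query_start", "query_stop",
    ],
)


def read_blast_bed(handle):
    """Read an 11-column, tab-separated table into BlastBedRow records."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) != 11:
            continue  # skip blank or malformed lines
        rows.append(
            BlastBedRow(
                fields[0], fields[1],
                int(fields[2]), int(fields[3]), float(fields[4]),
                fields[5], fields[6], int(fields[7]), fields[8],
                int(fields[9]), int(fields[10]),
            )
        )
    return rows


# Made-up line, for illustration only.
example = io.StringIO("sORF_1\tChr01\t100\t160\t1e-20\tMKLV\tMKLV\t60\t+\t1\t20\n")
rows = read_blast_bed(example)
```

With a real file, pass an open handle: `read_blast_bed(open("hits.bed", newline=""))`, where `hits.bed` is a placeholder name.
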

KaKsloop.py

Description: example script that can be used to run protein_Ka_Ks_codeml.py for a set of reference sequences
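Such a loop could look roughly like the sketch below. It is not the content of KaKsloop.py: `build_commands` and `run_all` are hypothetical helpers, the second reference file name is a placeholder, and the command-line flags are copied from the example invocation of protein_Ka_Ks_codeml.py in this README.

```python
import shlex
import subprocess


def build_commands(protein_fasta, reference_fastas, threads=10):
    """Build one protein_Ka_Ks_codeml.py command per reference fasta file."""
    commands = []
    for reference in reference_fastas:
        commands.append([
            "python3", "protein_Ka_Ks_codeml.py", protein_fasta, reference,
            "--blast", "T", "--threads", str(threads), "--makedb", "T",
        ])
    return commands


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(" ".join(shlex.quote(part) for part in command))
        if not dry_run:
            subprocess.run(command, check=True)


references = [
    "Zmays_284_Ensembl-18_2010-01-MaizeSequence.transcript.fa",
    "another_species.transcript.fa",  # placeholder file name
]
commands = build_commands("translated_FINAL_sORF_SELECTED.fa", references)

if __name__ == "__main__":
    run_all(commands, dry_run=True)
```

Keeping `dry_run=True` only prints the commands, which is a cheap way to check paths before starting long codeml runs.
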

sORF_completeness_v2.0.py

Description: script to estimate changes in homologous sORF length between different species. It takes 1) a genome/transcriptome fasta file, 2) a fasta file with the protein sequences translated from the sORFs and 3) the bed table created by the oopBLASTv_forsORF.py script. The script returns a table with 18 columns (the first 13 columns are identical to the input bed file):

  1. Query id
  2. Hit id
  3. Start HSP in hit (start position of hit sequence involved in alignment)
  4. Stop HSP in hit (stop position of hit sequence involved in alignment)
  5. E-value
  6. hit HSP sequence
  7. query HSP sequence
  8. length of hit HSP
  9. hit strand
  10. Blank column
  11. Start HSP in query (start position of query sequence involved in alignment)
  12. Stop HSP in query (stop position of query sequence involved in alignment)
  13. Predicted start Codon coordinates of homologous sORF
  14. Predicted stop Codon coordinates of homologous sORF
  15. Premature Stop Codon (- no PSC found)
  16. Predicted length of homologous sORF, aa (if 0, a premature stop codon was found before, i.e. upstream of, the HSP start)
  17. sORF query length, aa
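The idea behind a predicted length and a premature stop codon can be illustrated with a small in-frame scan. This is only a sketch of the general technique, not the algorithm used by sORF_completeness_v2.0.py: `first_inframe_stop` and `orf_length_aa` are hypothetical helpers, coordinates are 0-based, the sequence is assumed to be on the forward strand, and the standard stop codons TAA, TAG and TGA are assumed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_inframe_stop(seq, start):
    """Return the 0-based index of the first in-frame stop codon at or after
    start, or None if the sequence ends before a stop codon is found."""
    seq = seq.upper()
    for pos in range(start, len(seq) - 2, 3):
        if seq[pos:pos + 3] in STOP_CODONS:
            return pos
    return None


def orf_length_aa(seq, start):
    """Return the number of codons between start and the first in-frame stop
    (the stop codon itself is not counted), or None if no stop is found."""
    stop = first_inframe_stop(seq, start)
    if stop is None:
        return None
    return (stop - start) // 3


# Made-up sequences, for illustration only.
query_like = "ATGAAACCCTAAGGG"   # ATG AAA CCC TAA ...
truncated = "ATGTAACCCTAAGGG"    # ATG TAA ... (early stop)

query_len = orf_length_aa(query_like, 0)
truncated_len = orf_length_aa(truncated, 0)
has_premature_stop = truncated_len is not None and truncated_len < query_len
```

Comparing the length found in the homologous sequence with the query sORF length is one simple way to flag a shortened homolog.
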

sORFfastaToBed2.py

Description: script to parse the sORFfinder output file and generate a bed file.

Usage:

positional arguments:
  infile           name of sORF fasta file generated by sORFfinder
  genomFile        name of target genome fasta file used for sORFfinder
  outputfileName   number of threads

optional arguments:
  -h, --help       show this help message and exit

It writes a table with the following columns:

  1. Chromosome name
  2. Start position
  3. End position
  4. Strand
  5. Coding index
  6. sORF name

