
j-andrews7 / pytfmpval


Python bindings for the TFM-Pvalue program.

License: GNU General Public License v3.0

Python 8.82% C++ 91.08% SWIG 0.10%
bioinformatics motifs transcription-factor-binding genomics python motif-analysis

pytfmpval's Introduction

pytfmpval


This Python package serves as a wrapper for the incredibly useful TFM-Pvalue C++ program. It allows users to determine score thresholds for a given transcription factor position frequency matrix associated with a specific p-value. Naturally, it can also perform the reverse, quickly calculating an accurate p-value from a score for a given motif matrix.

pytfmpval allows this functionality to be easily utilized within a Python script, module, or package.

See full documentation and use examples here.

This project has been archived and is provided as-is, with no additional support or development.

Installation

pytfmpval is on PyPI, so you can install via pip easily:

pip install pytfmpval

A Simple Example

JASPAR is a widely used transcription factor motif database from which motif count matrices can be downloaded for a large variety of organisms and transcription factors. Numerous other motif databases exist as well (TRANSFAC, CIS-BP, MEME, HOMER, WORMBASE, etc.), most of which use a similar format for their motifs. Typically, a motif file consists of four rows or columns, with each position in a given row or column corresponding to a base within the motif. Sometimes there is a comment line starting with >. The row or column order is always A, C, G, T. In this example, the motif file consists of four rows, one per base, each holding the counts for that base at the 16 positions of the motif.

>>> from pytfmpval import tfmp
>>> m = tfmp.create_matrix("MA0045.pfm")
>>> tfmp.score2pval(m, 8.7737)
9.992625564336777e-06
>>> tfmp.pval2score(m, 0.00001)
8.773708000000001

This could also be done by concatenating the rows (or columns) of the matrix into a single string and passing it to the read_matrix() function. This method is usually easier, as it allows the user to parse the motif file as necessary to ensure a proper input. It is also better suited to high-throughput use.

>>> from pytfmpval import tfmp
>>> mat = (" 3  7  9  3 11 11 11  3  4  3  8  8  9  9 11  2"
...        " 5  0  1  6  0  0  0  3  1  4  5  1  0  5  0  7"
...        " 4  3  1  4  3  2  2  2  8  6  1  4  2  0  3  0"
...        " 2  4  3  1  0  1  1  6  1  1  0  1  3  0  0  5"
...       )
>>> m = tfmp.read_matrix(mat)
>>> tfmp.pval2score(m, 0.00001)
8.773708000000001
>>> tfmp.score2pval(m, 8.7737)
9.992625564336777e-06

Contribute

Any and all contributions are welcome. Bug reporting via the Issue Tracker is much appreciated. Here's how to contribute:

  1. Fork the pytfmpval repository on GitHub (see forking help).
  2. Make your changes/fixes/improvements locally.
  3. Optional, but much-appreciated: write some tests for your changes. (Don't worry about integrating your tests into the test framework - writing some in your commit comments or providing a test script is fine. I will integrate them later.)
  4. Send a pull request (see pull request help).

Reference

Efficient and accurate P-value computation for Position Weight Matrices
H. Touzet and J.S. Varré
Algorithms for Molecular Biology 2007, 2:15

License

This project is licensed under the GPL3 license. You are free to use, modify, and distribute it as you see fit. The program is provided as is, with no guarantees.

pytfmpval's People

Contributors

j-andrews7


pytfmpval's Issues

logOddsScore in Matrix.h not affecting p-value or score

Hello,

I'm changing the logOdds functions in Matrix.h, but it seems that the changes don't affect anything.

        // mat[k][p] = log((mat[k][p] + 0.25) /(sum + 1)) - log (background[k]); 
        mat[k][p] = 0  ; 
        // mat[k][p] = log(mat[k][p]) - log (background[k]); 

After changing the scoring code, I reinstalled the package with python setup.py install.

I tested it with:

>>> from pytfmpval import tfmp
>>> m = tfmp.create_matrix("MA0045.pfm")
>>> tfmp.score2pval(m, 8.7737)
9.992625564336777e-06
>>> tfmp.pval2score(m, 0.00001)
8.773708000000001

It always gives the same result.
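
For readers following along, the commented-out line in the snippet above can be read as a per-column log-odds transform. Here is a minimal Python sketch of that formula. The function name log_odds_column is hypothetical, and the uniform background of 0.25 per base is an assumption, since the issue does not show the values of background[k].

```python
import math


def log_odds_column(counts, background=(0.25, 0.25, 0.25, 0.25)):
    """Mirror log((count + 0.25) / (sum + 1)) - log(background[k]) for one column.

    counts: the A, C, G, T counts at a single motif position.
    background: assumed uniform here; the real values are not shown in the issue.
    """
    total = sum(counts)
    return [
        math.log((count + 0.25) / (total + 1)) - math.log(bg)
        for count, bg in zip(counts, background)
    ]


# First column of the README matrix (A=3, C=5, G=4, T=2).
print(log_odds_column([3, 5, 4, 2]))
```

Under this formula a column with equal counts for all four bases scores zero for every base, because the smoothed frequency equals the assumed background.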

Inconsistent/non-transparent score definitions

# example from README
from pytfmpval import tfmp
mat = (" 3  7  9  3 11 11 11  3  4  3  8  8  9  9 11  2"
       " 5  0  1  6  0  0  0  3  1  4  5  1  0  5  0  7"
       " 4  3  1  4  3  2  2  2  8  6  1  4  2  0  3  0"
       " 2  4  3  1  0  1  1  6  1  1  0  1  3  0  0  5"
      )
m = tfmp.read_matrix(mat)
print(tfmp.pval2score(m, 0.00001))  # 8.773708000000001
print(tfmp.score2pval(m, 8.7737))  # 9.992625564336777e-06

# my addition
print(m.minScore)  # 0
print(m.maxScore)  # 435430296

The cited paper defines the score as the sum of one element per column (representing the nucleotides).

  1. The minimal score of the matrix cannot be zero, as there are multiple columns that do not contain a zero. It is 9 (2+0+1+1+0+0+0+2+1+1+0+1+0+0+0+0).
  2. The maximal score cannot be 4.35e8; it is 132 (5+7+9+6+11+11+11+6+8+6+8+8+9+9+11+7).
  3. The score returned by pval2score is evidently a different kind of score from the one given by m.minScore and m.maxScore, and it also differs from the definition in the paper.

Can someone please clarify the score definitions?
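
The column-wise extremes discussed in this issue can be recomputed directly from the raw counts with a few lines of plain Python. This is only a check on the count matrix from the README. It makes no claim about how m.minScore and m.maxScore are computed internally.

```python
# Count matrix from the README example, one row per base.
rows = [
    [3, 7, 9, 3, 11, 11, 11, 3, 4, 3, 8, 8, 9, 9, 11, 2],  # A
    [5, 0, 1, 6, 0, 0, 0, 3, 1, 4, 5, 1, 0, 5, 0, 7],      # C
    [4, 3, 1, 4, 3, 2, 2, 2, 8, 6, 1, 4, 2, 0, 3, 0],      # G
    [2, 4, 3, 1, 0, 1, 1, 6, 1, 1, 0, 1, 3, 0, 0, 5],      # T
]

# Transpose so that each tuple holds the four counts at one motif position.
columns = list(zip(*rows))

# Pick one element per column: the smallest for the minimum, the largest for the maximum.
min_count_score = sum(min(col) for col in columns)
max_count_score = sum(max(col) for col in columns)
print(min_count_score, max_count_score)  # 9 132
```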
