adssrc's Introduction

adssrc

Some of the code I use in my own data analysis. Some of this might make it into smithlabcode repositories eventually.

collapsebed

This program takes a set of genomic intervals and identifies each maximal contiguous sub-interval that is overlapped by a constant number of the input intervals. It then reports the sub-intervals that overlap more than a specified number of the original intervals.
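As a rough illustration of the idea, here is a minimal sketch of an endpoint sweep. It is not the code in collapsebed: the names Interval, Covered, and collapse are hypothetical, and it assumes half-open coordinates and a single chromosome.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not taken from collapsebed.
struct Interval {
  std::size_t start;  // inclusive
  std::size_t end;    // exclusive
};

struct Covered {
  std::size_t start;
  std::size_t end;
  int depth;  // number of input intervals overlapping [start, end)
};

// Sweep over the sorted endpoints and emit each maximal run of constant
// depth whose depth is strictly greater than min_depth.
std::vector<Covered>
collapse(const std::vector<Interval> &intervals, const int min_depth) {
  assert(min_depth >= 0);
  // Each event is (position, +1 for a start or -1 for an end).
  std::vector<std::pair<std::size_t, int>> events;
  for (const auto &iv : intervals) {
    assert(iv.start < iv.end);
    events.emplace_back(iv.start, +1);
    events.emplace_back(iv.end, -1);
  }
  std::sort(events.begin(), events.end());

  std::vector<Covered> result;
  int depth = 0;
  std::size_t run_start = 0;
  std::size_t i = 0;
  while (i < events.size()) {
    const std::size_t pos = events[i].first;
    // Apply every event at this position before comparing depths.
    int new_depth = depth;
    for (; i < events.size() && events[i].first == pos; ++i)
      new_depth += events[i].second;
    if (new_depth != depth) {
      if (depth > min_depth)
        result.push_back({run_start, pos, depth});
      run_start = pos;
      depth = new_depth;
    }
  }
  return result;
}
```

For the intervals [0, 10), [5, 15), and [8, 20) with min_depth set to 1, this sketch reports [5, 8) at depth 2, [8, 10) at depth 3, and [10, 15) at depth 2.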

majormethstate

The program looks at each "epiread" in an epiread-formatted file (of the kind used in amrfinder) and determines whether the fraction of methylated states in the read is lower than 0.5. If so, the states are inverted: every C is changed to T, and every T to C. The epiread is then printed to a file or stdout. If you don't know why this might be a useful thing to do, then you probably don't need to do it.
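The inversion step could look something like the sketch below. The function name to_major_state is hypothetical, and the sketch assumes the state string contains only C (methylated), T (unmethylated), and N (no information), with N ignored when computing the fraction. How majormethstate itself treats N is not stated here.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: returns the states unchanged when at least half of
// the informative (C or T) states are methylated, and otherwise returns
// them with C and T swapped. N is assumed to mean "no information".
std::string to_major_state(std::string states) {
  assert(states.find_first_not_of("CTN") == std::string::npos);
  const auto n_meth = std::count(states.begin(), states.end(), 'C');
  const auto n_unmeth = std::count(states.begin(), states.end(), 'T');
  // A methylated fraction below 0.5 is equivalent to n_meth < n_unmeth.
  if (n_meth < n_unmeth) {
    for (char &c : states) {
      if (c == 'C')
        c = 'T';
      else if (c == 'T')
        c = 'C';
    }
  }
  return states;
}
```

With this sketch, the read TTCTN becomes CCTCN, while CCT is left as it is.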

smoothmeth

In methpipe the hmr program uses a 2-state HMM to identify hypomethylated regions (HMRs) using a beta-binomial model for emissions. This program takes the same approach for smoothing methylation levels. Each of the 2 states has an average methylation level, and the posterior probabilities for state occupancy at any CpG site are extracted from the HMM. So if p is the posterior for being in the high methylation state, and h is the methylation level for the "high" state and l is the methylation level for the "low" state, then the smoothed level is p*h + (1 - p)*l. This program might eventually replace how we build tracks for single-CpG methylation levels.
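The final mixing step is simple enough to sketch on its own. The function smooth_levels below is hypothetical and assumes the per-site posteriors have already been obtained from the HMM; it does not show the HMM fitting or decoding.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: given the posterior probability of the "high" state
// at each CpG site, return p*h + (1 - p)*l for every site.
std::vector<double>
smooth_levels(const std::vector<double> &posterior_high,
              const double high_level, const double low_level) {
  std::vector<double> smoothed;
  smoothed.reserve(posterior_high.size());
  for (const double p : posterior_high) {
    assert(p >= 0.0 && p <= 1.0);
    smoothed.push_back(p * high_level + (1.0 - p) * low_level);
  }
  return smoothed;
}
```

For example, with h = 0.9 and l = 0.1, a site with posterior p = 0.5 gets a smoothed level of about 0.5, and a site with p = 1 gets 0.9.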

tsscpgplot

I use this program to make meta-gene plots of methylation levels around transcription start sites (TSS) but also around other fixed landmarks in the genome. The program takes a BED format file, and processes it in a stranded way, but currently it only works if the BED interval is a single site, so the interval has size 1.
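One way to read "stranded" is that positions are measured relative to the landmark in the direction of transcription. The sketch below shows that reading only; the function stranded_offset is hypothetical and is an assumption about the behavior, not code from tsscpgplot.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: signed distance from a single-site landmark to a CpG
// site, where positive values are downstream with respect to the strand.
std::int64_t stranded_offset(const std::int64_t landmark,
                             const std::int64_t site, const char strand) {
  assert(strand == '+' || strand == '-');
  return strand == '+' ? site - landmark : landmark - site;
}
```

Under this assumption, a site 100 bases to the right of a landmark has offset 100 on the + strand and -100 on the - strand, so methylation levels from both strands can be averaged at the same relative position.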

adssrc's People

Contributors

andrewdavidsmith, bdecato, jqujqu, mengzhou, xjlizji

Forkers

jqujqu

adssrc's Issues

How should we view the use of the PhyloTree in our algorithms

My thinking is that for our current purpose the PhyloTree is only used once as a tree object (for each tree that we will consider) in order to construct the proper state transitions, and maybe extract branch lengths associated with state transitions.

If this is true, then we should try to keep anything related to a "neighbor" or a "state" separate from the tree itself. These are not associated with general behaviors of the tree, but rather with our specific use of the tree. Keeping this separation would reduce coupling of software components, making them more easily understood, maintained, debugged, and reused in other contexts. I think probably we can have PhyloTree as a first class object, with methods corresponding to natural and general behaviors, while still having it work just as efficiently when we use it to try and get state transition information. But I'm not completely sure. @jqujqu do you have any thoughts on this?

bit set should never be used

I think Stroustrup recognized this as a mistake from the beginning. The bit set container should never have existed.

collapseBed should output mean score rather than count

Currently, collapseBed outputs a set of BED regions, each with a score that is the "count": the number of input BED regions overlapping it.

Instead, it would be nice to keep a vector of the scores for the currently open BED files and output the mean of those scores each time a BED region is output.
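A possible shape for this, as a sketch only: the code below tracks a running sum of the open scores instead of a vector of them, which is a substitution for what is proposed above. The names ScoredInterval, MeanRegion, and collapse_mean are hypothetical, and half-open coordinates on a single chromosome are assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration; not taken from collapseBed.
struct ScoredInterval {
  std::size_t start;  // inclusive
  std::size_t end;    // exclusive
  double score;
};

struct MeanRegion {
  std::size_t start;
  std::size_t end;
  double mean_score;  // mean score of the intervals open over [start, end)
};

std::vector<MeanRegion>
collapse_mean(const std::vector<ScoredInterval> &intervals) {
  struct Event {
    std::size_t pos;
    int delta;  // +1 when an interval opens, -1 when it closes
    double score;
  };
  std::vector<Event> events;
  for (const auto &iv : intervals) {
    assert(iv.start < iv.end);
    events.push_back({iv.start, +1, iv.score});
    events.push_back({iv.end, -1, iv.score});
  }
  std::sort(events.begin(), events.end(),
            [](const Event &a, const Event &b) { return a.pos < b.pos; });

  std::vector<MeanRegion> result;
  int n_open = 0;          // number of currently open intervals
  double score_sum = 0.0;  // sum of their scores
  std::size_t prev = 0;
  std::size_t i = 0;
  while (i < events.size()) {
    const std::size_t pos = events[i].pos;
    if (n_open > 0 && pos > prev)
      result.push_back({prev, pos, score_sum / n_open});
    // Apply every event at this position before moving on.
    for (; i < events.size() && events[i].pos == pos; ++i) {
      n_open += events[i].delta;
      score_sum += events[i].delta * events[i].score;
    }
    prev = pos;
  }
  return result;
}
```

The running sum avoids rescanning the open scores at each output, at the cost of possible floating-point drift over very long runs. Keeping the scores in a container, as suggested above, would avoid the drift and would also allow other summaries such as the median.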

has_mutated function in methcounts2

This condition:
return is_cytosine(base) ? (cs.nA > 2*(cs.nG + 1)) : (cs.pT > 2*(cs.pC + 1));
marks a cytosine base as mutated if more than 1/3 of the complementary bases are A. Why is that? I think we only care about CpG deamination so should the condition be is_cpg(base)?
In the other case, for a guanine base, the condition that more than 1/3 of the complementary bases are T will almost always hold if the CpG is unmethylated, right?
