
greg-ensembl's People

Contributors

gjuggler


Forkers

jergosh

greg-ensembl's Issues

Collect whole-genome alignments based on Compara alignments

For a given ProteinTree alignment, use the Compara API to collect an equivalent alignment derived from the EPO pipeline. Things to do might include:

  1. Filter out bad Compara gene trees based on lack of overlap with the EPO alignments.
  2. Apply the same overlap filter to individual exons or aligned residues.
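A minimal sketch of the overlap filter, assuming both alignments have already been projected onto genomic coordinates. The encoding (a residue as `(species, chromosome, position)`, an alignment as a list of columns) and the names `homology_pairs`, `epo_overlap`, `keep_trees` and `min_overlap` are hypothetical, not part of the Compara API. Overlap is scored as the fraction of the gene tree's between-species homology pairs that the EPO alignment also asserts.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

# Hypothetical encoding: a residue is (species, chromosome, genomic position),
# and an alignment is a list of columns, each a list of residues.
Residue = Tuple[str, str, int]
Columns = Sequence[Sequence[Residue]]


def homology_pairs(columns: Columns) -> Set[FrozenSet[Residue]]:
    """Unordered between-species residue pairs that share an alignment column."""
    pairs: Set[FrozenSet[Residue]] = set()
    for col in columns:
        for a, b in combinations(sorted(set(col)), 2):
            if a[0] != b[0]:  # skip within-species (paralogous) pairs
                pairs.add(frozenset((a, b)))
    return pairs


def epo_overlap(tree_cols: Columns, epo_cols: Columns) -> float:
    """Fraction of the gene-tree homology pairs also present in the EPO alignment."""
    tree_pairs = homology_pairs(tree_cols)
    if not tree_pairs:
        return 0.0
    return len(tree_pairs & homology_pairs(epo_cols)) / len(tree_pairs)


def keep_trees(
    trees: Dict[str, Tuple[Columns, Columns]], min_overlap: float = 0.5
) -> List[str]:
    """IDs of gene trees whose overlap with EPO is at least min_overlap."""
    return [
        tree_id
        for tree_id, (tree_cols, epo_cols) in trees.items()
        if epo_overlap(tree_cols, epo_cols) >= min_overlap
    ]
```

The per-exon or per-residue variant would reuse `epo_overlap` on just the columns covering one exon, or on the pairs involving one residue.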

Generate sets of genes under clade-specific relaxed or increased purifying constraint

This can be calculated in two ways for each gene:

  1. Calculate the overall mean dN/dS (perhaps excluding positively selected sites) for each gene, then compare these overall values between sub-clades.
  2. Do site-wise comparisons between one sub-clade and either (a) another individual sub-clade, e.g. primates vs. Glires, or (b) the complement of that sub-clade. Count the number or proportion of sites where dN/dS in the sub-clade of interest is significantly lower (LRT-like).

(Note: this can also be done by aggregating over domains, etc.)
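Both approaches can be sketched in a few lines, assuming per-site dN/dS estimates and per-site log-likelihoods have already been produced by some codon-model fit. The `SiteFit` record and its fields (`lnl_shared` for one ω shared by both clades, `lnl_split` for separate ω values) are hypothetical. The site-wise test is a standard likelihood-ratio test with one extra parameter, compared against the 5% critical value of a χ² distribution with 1 degree of freedom; it only counts sites where the sub-clade's ω is the lower one.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple

CHI2_1DF_05 = 3.841  # 95th percentile of chi-square with 1 degree of freedom


@dataclass
class SiteFit:
    """Hypothetical per-site fit results for one sub-clade comparison."""
    lnl_shared: float   # log-likelihood, one omega for both clades (null)
    lnl_split: float    # log-likelihood, separate omegas (alternative)
    omega_clade: float  # omega estimate in the sub-clade of interest
    omega_rest: float   # omega estimate in the other clade / complement


def mean_omega(omegas: List[float], pos_sel: List[bool]) -> Optional[float]:
    """Approach 1: gene-wide mean dN/dS, excluding sites flagged as pos-sel."""
    kept = [w for w, p in zip(omegas, pos_sel) if not p]
    return mean(kept) if kept else None


def count_constrained_sites(
    sites: List[SiteFit], critical: float = CHI2_1DF_05
) -> Tuple[int, float]:
    """Approach 2: number and proportion of sites with significantly lower omega."""
    n = 0
    for s in sites:
        lrt = 2.0 * (s.lnl_split - s.lnl_shared)
        if lrt > critical and s.omega_clade < s.omega_rest:
            n += 1
    return n, (n / len(sites) if sites else 0.0)
```

Collating by domain would just mean calling `count_constrained_sites` on the subset of sites falling in each domain.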

Real-time visualization of inferred vs true alignment residues

This needs to be done in Java / Processing, but it would be cool.

Given a (hidden) true alignment and a (shown) inferred alignment, plot the entire inferred alignment. When the user hovers over a given residue, highlight (a) the current column of inferred homologous residues and (b) the (potentially scattered) set of truly homologous residues.
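The lookup logic behind the hover is independent of the drawing layer, so here it is sketched in Python rather than Java / Processing. It assumes both alignments are stored as `name -> gapped row` with `-` for gaps and contain the same ungapped sequences; `HoverIndex` and `highlight` are hypothetical names. Everything is precomputed once so each hover event is only a few dictionary lookups, which should keep it responsive in a real-time view.

```python
from typing import Dict, Optional, Set, Tuple

Alignment = Dict[str, str]  # sequence name -> gapped row, '-' marks a gap
Cell = Tuple[str, int]      # (sequence name, column in the inferred alignment)


def residue_columns(aln: Alignment) -> Dict[Tuple[str, int], int]:
    """Map (sequence name, ungapped residue index) -> alignment column."""
    cols: Dict[Tuple[str, int], int] = {}
    for name, row in aln.items():
        k = 0
        for c, ch in enumerate(row):
            if ch != "-":
                cols[(name, k)] = c
                k += 1
    return cols


class HoverIndex:
    """Precomputed lookups from an inferred cell to its inferred and true homologs."""

    def __init__(self, inferred: Alignment, true: Alignment):
        self.inferred = inferred
        self.inf_cols = residue_columns(inferred)
        self.true_cols = residue_columns(true)
        # Inverse maps: inferred cell -> residue, and true column -> residues.
        self.cell_to_res = {(n, c): (n, k) for (n, k), c in self.inf_cols.items()}
        self.true_groups: Dict[int, Set[Tuple[str, int]]] = {}
        for res, c in self.true_cols.items():
            self.true_groups.setdefault(c, set()).add(res)

    def highlight(self, cell: Cell) -> Optional[Tuple[Set[Cell], Set[Cell]]]:
        """(cells in the inferred column, cells of true homologs), or None on a gap."""
        res = self.cell_to_res.get(cell)
        if res is None:
            return None
        col = cell[1]
        inferred_set = {(n, col) for n, row in self.inferred.items() if row[col] != "-"}
        true_group = self.true_groups[self.true_cols[res]]
        # Report true homologs by where they sit in the *inferred* alignment.
        true_set = {(n, self.inf_cols[(n, k)]) for n, k in true_group}
        return inferred_set, true_set
```

The Processing sketch would then only need to map mouse coordinates to a `Cell` and draw the two returned sets in different colours.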

Compute GC3 and GCi/GCf for genes

These per-gene-tree values could be useful for isochore and codon-bias analyses.

From Lavner and Kotlar 2005:

For the first three of the four methods described above, we need non-coding sequences neighboring a given gene (the fourth method, MCB, uses the coding sequence itself). We used the sequence consisting of the introns of the gene, the 1000 nucleotides immediately preceding the coding area of the gene, and similarly, those 1000 nucleotides immediately succeeding it (or truncated, as necessary, in the case that genes were less than 1000 nucleotides apart; see also Hey and Kliman, 2002 and Urrutia and Hurst, 2003). If an intron is longer than 2000 bp, only the 1000 nucleotides on each of the intron's ends were taken. By taking 1000 flanking bases, we assure that regions that may be under selective constraints, both in flanking regions and introns, constitute only a small portion of the strands that are used as control. On the other hand, regions of large introns that are far from any coding sequence may not represent the mutational bias that acts on the nearby exons, and thus introns were truncated to 1000 bases on each end. We masked repetitive elements using RepeatMasker (http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker).
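A minimal sketch of GC3 plus a control-region GC following the Lavner and Kotlar rules: 1000 nt flanks clipped at neighbouring genes, and introns longer than 2000 bp trimmed to 1000 nt at each end. This is not their code. It assumes a genome string with 0-based half-open coding-exon coordinates, that repeats have been soft-masked to lowercase (as RepeatMasker output often is) and should be skipped, and that `left_limit`/`right_limit` are hypothetical parameters marking the neighbouring genes' boundaries. GC content is the same on both strands, so reverse-strand genes need no reverse complement.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open genomic coordinates


def gc_fraction(seq: str) -> float:
    """GC fraction over unmasked bases; lowercase (soft-masked) and N are skipped."""
    counted = [b for b in seq if b in "ACGT"]
    if not counted:
        return float("nan")
    return sum(b in "GC" for b in counted) / len(counted)


def gc3(cds: str) -> float:
    """GC fraction at the third position of each complete codon."""
    return gc_fraction(cds.upper()[2::3])


def control_region(
    genome: str,
    coding_exons: List[Interval],
    left_limit: int,
    right_limit: int,
    flank: int = 1000,
    max_intron: int = 2000,
    intron_end: int = 1000,
) -> str:
    """Upstream flank + (trimmed) introns + downstream flank, concatenated."""
    exons = sorted(coding_exons)
    start, end = exons[0][0], exons[-1][1]
    # Flanks are clipped where a neighbouring gene begins (hypothetical limits).
    pieces = [genome[max(left_limit, start - flank):start]]
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        intron = genome[prev_end:next_start]
        if len(intron) > max_intron:
            # Keep only the ends of long introns, nearest the exons.
            intron = intron[:intron_end] + intron[-intron_end:]
        pieces.append(intron)
    pieces.append(genome[end:min(right_limit, end + flank)])
    return "".join(pieces)
```

Usage would be `gc_fraction(control_region(...))` as the per-gene control value alongside `gc3(cds)`.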

Simulate domain-loop structures with differing indel rates

To achieve genome-wide bootstrap simulations, we need to go one level deeper than random simulations with indel processes. Domain-loop structure seems to be the next logical level to model.

On the SLRsim side, we should have a parameter that accepts a Perl arrayref defining a series of "domains", where each domain has its own set of simulation parameters. These are then sent to INDELible as separate blocks, simulated in consecutive order.
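A sketch of how such a domain list might be turned into an INDELible control file, written in Python with a list of records standing in for the Perl arrayref. The `Domain` fields, their defaults and `indelible_control` are hypothetical; the control-file layout (`[TYPE]`, `[MODEL]` with `[submodel] kappa omega`, `[indelmodel] POW`, `[indelrate]`, then `[TREE]`, a `[PARTITIONS]` line listing several `[tree model length]` blocks, and `[EVOLVE]`) follows the INDELible manual as I understand it and should be checked against it before use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Domain:
    """Hypothetical per-domain simulation parameters."""
    name: str           # must be unique: used as the INDELible model name
    length: int         # root length, in codons
    omega: float        # dN/dS for this block
    indel_rate: float   # indel rate relative to substitutions
    kappa: float = 2.0
    indel_a: float = 1.8  # power-law exponent for indel lengths (assumed default)
    indel_max: int = 40   # maximum indel length (assumed default)


def indelible_control(domains: List[Domain], tree: str, reps: int = 1, out: str = "sim") -> str:
    """Build a control file whose partitions are simulated in the given order."""
    if len({d.name for d in domains}) != len(domains):
        raise ValueError("domain names must be unique")
    lines = ["[TYPE] CODON 1"]
    for d in domains:
        lines += [
            f"[MODEL] {d.name}",
            f"  [submodel] {d.kappa} {d.omega}",
            f"  [indelmodel] POW {d.indel_a} {d.indel_max}",
            f"  [indelrate] {d.indel_rate}",
        ]
    lines.append(f"[TREE] t1 {tree}")
    blocks = " ".join(f"[t1 {d.name} {d.length}]" for d in domains)
    lines.append(f"[PARTITIONS] p1 {blocks}")
    lines.append(f"[EVOLVE] p1 {reps} {out}")
    return "\n".join(lines) + "\n"
```

For example, a domain-loop-domain gene could be `[Domain("dom1", 100, 0.1, 0.01), Domain("loop1", 30, 0.8, 0.2), Domain("dom2", 80, 0.1, 0.01)]`, giving loops a higher ω and indel rate than the flanking domains.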

Non-interactive visualization of inferred vs true alignments

In the SLRsim project, it would be nice to directly compare the inferred alignments with the true alignment. We could do this by either:

  1. Coloring columns of the inferred alignment where fewer than a given percentage of the homology inferences are correctly placed.
  2. Coloring individual residues where fewer than a given percentage of their pairwise homology inferences are correct.
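Both scores can come from a single pass over the inferred alignment, sketched here under the assumption that alignments are `name -> gapped row` dicts over the same ungapped sequences. A pairwise inference (two residues placed in the same inferred column) counts as correct if the two residues also share a column in the true alignment. The function names are hypothetical. Columns with a single residue make no pairwise inferences and get `nan`, so `flag_columns` never flags them.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Alignment = Dict[str, str]  # sequence name -> gapped row, '-' marks a gap
ResidueId = Tuple[str, int]  # (sequence name, ungapped residue index)


def residue_columns(aln: Alignment) -> Dict[ResidueId, int]:
    """Map each residue to its column in the alignment."""
    cols: Dict[ResidueId, int] = {}
    for name, row in aln.items():
        k = 0
        for c, ch in enumerate(row):
            if ch != "-":
                cols[(name, k)] = c
                k += 1
    return cols


def homology_scores(
    inferred: Alignment, true: Alignment
) -> Tuple[List[float], Dict[ResidueId, float]]:
    """Per-column and per-residue fractions of correct pairwise homology inferences."""
    true_cols = residue_columns(true)
    ncols = len(next(iter(inferred.values())))
    next_idx = {name: 0 for name in inferred}
    col_scores: List[float] = []
    tallies: Dict[ResidueId, List[int]] = {}  # residue -> [correct, total]
    for c in range(ncols):
        present: List[ResidueId] = []
        for name, row in inferred.items():
            if row[c] != "-":
                present.append((name, next_idx[name]))
                next_idx[name] += 1
        correct = total = 0
        for a, b in combinations(present, 2):
            ok = int(true_cols[a] == true_cols[b])
            correct += ok
            total += 1
            for r in (a, b):
                t = tallies.setdefault(r, [0, 0])
                t[0] += ok
                t[1] += 1
        col_scores.append(correct / total if total else float("nan"))
    residue_scores = {r: t[0] / t[1] for r, t in tallies.items()}
    return col_scores, residue_scores


def flag_columns(col_scores: List[float], cutoff: float) -> List[int]:
    """Indices of columns scoring below the cutoff (to be coloured)."""
    return [c for c, s in enumerate(col_scores) if s < cutoff]
```

The same cutoff comparison applied to `residue_scores` gives the residues to colour for option 2.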
