Giter Site home page Giter Site logo

genomicsnx / ouija Goto Github PK

View Code? Open in Web Editor NEW

This project forked from kieranrcampbell/ouija

0.0 1.0 0.0 3.09 MB

Incorporating prior knowledge in single-cell trajectory learning using Bayesian nonlinear factor analysis

R 50.95% Shell 0.50% Stan 2.71% C++ 45.84%

ouija's Introduction

Ouija

Ouija is a statistical framework for learning pseudotimes from single-cell RNA-seq data using only small panels of marker genes and prior knowledge of gene behaviour. The user can place priors on whether a given gene is up or down regulated along a trajectory, as well as where in the trajectory the regulation happens. Under-the-hood Ouija uses Bayesian hierarchical nonlinear factor analysis as implemented in the probabilistic programming language Stan.

Getting started

Installation

# install.packages("devtools")
devtools::install_github("kieranrcampbell/ouija")

To build the Ouija vignette install using

devtools::install_github("kieranrcampbell/ouija", local = FALSE, 
                          args = "--preclean", build_vignettes = TRUE)

Model fitting

Input is a cell-by-gene expression matrices that is non-negative and represents logged gene expression values. We recommend using log2(TPM + 1).

library(ouija)
data(synth_gex) # synthetic gene expression data bundled

Prior expectations in Ouija are encoded by whether we expect genes to turn on or off along the trajectory through the k_means parameter, with one for each gene. This is discussed further below.

strengths <- 5 * c(1, -1, 1, -1, -1, -1)
oui <- ouija(synth_gex, strengths)

Ouija also supports Variational Bayes using the inference_type = "vb" argument in ouija. For more details see the vignette.

Plotting

Since we've performed MCMC inference we should check the convergence:

plot(oui, what = "diagnostic")

We can then plot the gene behaviour at the MAP pseudotime estimates:

plot(oui, what = "behaviour")

An informative way to understand the uncertainty in the ordering is to look at the gene expression trace plot:

plot(oui, what = "heatmap")

We can also plot how the prior compares to the posterior over the activation parameters, such as strength or time:

plot(oui, what = "pp")

Extracting the pseudotimes

Point estimates

MAP estimates (ie point estimates) may be extracted using the map_pseudotime function for further downstream analysis:

t_map_estimate <- map_pseudotime(oui)

For example, if you're working with an SCESet object from the scater package, you could add this as a new variable in the phenotype data slot:

pData(sce)$Pseudotime <- map_pseudotime(oui)

If you wish to take a more Bayesian approach, the full set of pseudotime samples can be extracted from the underlying stanfit object, which is held in the $fit position of any ouija_fit object. The pseudotimes are encoded in the parameter t. To get the sample-by-cell matrix:

posterior_pseudotime_traces <- rstan::extract(oui$fit, "t")$t

Posterior errors

One of the advantages in performing a Bayesian pseudotime analysis is that we can quantify the posterior uncertainty in the pseudotimes of each cell. To do this we can find the highest probability density (HPD) credible interval, which can be thought of as the Bayesian equivalent of a confidence interval. However, unlike a confidence interval, it does have the interpretation that there's a 95% chance the parameter (here the pseudotimes) will fall within the interval. To find the credible interval call

cred_int <- pseudotime_error(oui)

which returns a matrix with two columns, with the first column corresponding to the lower interval and the second corresponding to the upper interval. By default the interval probability is 95%, but this can be adjusted using the prob parameter.

Incorporating prior information

Ouija assumes gene expression across pseudotime will follow a particular pattern known as a sigmoid. This is a very general pattern that accounts for virtually all forms of monotonically increasing / decreasing expression, but is also very powerful in that we can place priors on where in a trajectory a gene turns on or off and how quickly it turns on or off. We discuss each of these separately below.

Activation strength

The activation strength parameter (known in the statistical model as k) dictates how quickly a gene turns on or off. A large magnitude means the gene turns on/off quickly, while a small magnitude means the gene turns on/off slowly (and in some cases, approximately linearly). The sign of the strength parameter dictates whether the gene turns on (positive) or off (negative) along the trajectory. Examples for various k are in the gif below.

To incorporate prior information about the activation strength we place a N(μk, σk) prior on k for each gene. The mean parameter μk describes the expected behaviour as above, while the standard deviation parameter σk controls how sure we are of this behaviour.

For example, if we expect a gene to turn on quickly, we may impose μk = 20. If we are sure of this we would then place a small standard deviation on this, e.g. σk = 1, while if we are not sure we may place a more diffuse standard deviation, e.g. σk = 5. These two examples are shown below in plots A and B respectively.

The parameters are encoded in the ouija function via the parameters strengths and strength_sd for μk and σk respectively.

Activation time

The activation time (known in the statistical model as t0) tells us where in the trajectory a gene turns on or off. The pseudotimes are defined between 0 and 1, so if t0 = 0 then we expect the behaviour to occur at the start of the trajectory, t0 = 0.5 it would occur in the middle and t0 = 1 it would occur towards the end. Examples for various t0 are shown in the gif below.

To incorporate prior knowledge of t0 we place a N(μt, σt) prior on it. As before, μt gives is the expected position in the trajectory, while σt tells us how sure of this we are. Examples are shown below for a gene we expect to turn on in the middle of the trajectory (μt = 0.5), and two situations where we are very sure of this σt = 0.1 and not so sure σt = 1 in A and B respectively.

The parameters are encoded in the ouija function via the parameters times and time_sd for μt and σt respectively.

Authors

Kieran Campbell & Christopher Yau
Wellcome Trust Centre for Human Genetics, University of Oxford

Artwork

ouija's People

Contributors

kieranrcampbell avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.