Giter Site home page Giter Site logo

biotip_application's Introduction

BioTIP_Application: a collection of transcriptomic data analysis using the package BioTIP

An Overview of BioTIP

In short, BioTIP is an R-package designated for the characterization of biological tipping points. A more detailed overview - (1) what are tipping points, (2) how BioTIP improves on existing methods, (3) what datasets can BioTIP be applied to, and (4) how to install BioTIP - can be found at BioTIP.

BioTIP workflow

The general workflow of BioTIP is detailed in the following chart:

BioTIP workflow

Examples of BioTIP applied to various datasets can be found in the folder Rcode.

We here present two cases in the following sections, one with a single-cell RNA dataset and the other with a bulk RNA dataset.

Example with Single-Cell RNA Dataset: BloodNet (Nestorowa 2016)

The following figure shows BioTIP's analytic pipeline of single-cell RNA-seq analysis, as an alternative to differential expression analysis - instead of discovering marker genes from the differential-expression analysis between stable states, BioTIP introduces the tipping-point analysis that focuses on unstable transition states. The orange arrows show that BioTIP is applied to cell state ensembles (clusters) without the pseudo-order information. The dashed orange arrow shows that BioTIP may use pseudo-order information to focus cell states on the trajectory of interests.

BioTIP workflow

We will perform downstream analysis of the existing the BloodNet dataset, the hematopoietic stem and progenitor cells (HSPCs) from ten female mice. The detail of the experiment can be found in details on our Vignette.

Note that, as mentioned in 'Why BioTIP?', multiple tipping points may coexist during an observed progression and multiple CTSs may coexist in the same critical transition state. We set a parameter n in our function getTopMCI to incorporate situations where multiple tipping points coexist; we also set a parameter n in our function getMaxMCImember to incorporate situations where multiple CTSs coexist. An example of how this function is used is in the Identifying Dynamic Network Biomodule (MCI) section of the Vignette.

Example with Bulk RNA dataset: GSE6136

As indicated in the BioTIP workflow chart, the analysis can be divided into the following two sections:

  1. Data preprocessing before running BioTIP,
  2. Standard Identification in 6 steps, detailed in the BioTIP workflow chart.

Data preprocessing before running BioTIP

Data Preprocessing

An existing dataset, GSE6136, is used to demonstrate how our functions are applied. Samples were collected from transgenic mouse lymphomas and divided into five groups based on clinical presentation, pathology, and flow cytometry (Lenburg 2007), thus belonging to cross-sectional profiles. Noticing these five group represent a control stage similar to non-transgenic B cells and four major periods of B-cell lymphomagenesis, Dr. Chen and coauthors used the DNB method to identify the pre-disease state exists around the normal activated period (P2), i.e., the system transitions to the disease state around the transitional lymphoma period (Figure S12 in publication (Chen 2012)). Start by installing the package 'BioTIP' and other dependent packages such as stringr, psych, and igraph if not. Below are some examples.

# load package
library(BioTIP)
# first row of the GEO-downloaded matrix should be column-name.
data(GSE6136_matrix)
GSE6136 = GSE6136_matrix[,-1]  
rownames(GSE6136) <- GSE6136_matrix[,1]
dim(GSE6136)               #[1] 22690 rows and 26 columns
rm(GSE6136_matrix)

The summary of GEO-downloaded clinical information GSE6136_cli is shown below.

#requires library(stringr)
library(BioTIP)
data(GSE6136_cli)

#dim(GSE6136_cli) #check dimension
cli = t(GSE6136_cli)

library(stringr)
colnames(cli) = str_split_fixed(cli[1,],'_',2)[,2]
cli = cli[-1,]
cli = data.frame(cli)
cli[,"cell-type:ch1"] = str_split_fixed(cli$characteristics_ch1.1,": ",2)[,2]
cli[,"Ig clonality:ch1"] = str_split_fixed(cli$characteristics_ch1.3,": ",2)[,2]

colnames(cli)[colnames(cli) == "cell-type:ch1"] = "group"
cli$Row.names = cli[,1]
head(cli[,1:3])

We normalized the expression of genes using log2() scale. This normalization will ensure a more accurate comparison of the variance between the expression groups (clusters).

df <- log2(GSE6136+1)
 df[1:3,1:5]
#           GSM142398 GSM142399 GSM142400 GSM142401 GSM142402
#1415670_at  10.21917  10.29048  9.987548  10.07682  9.827343
#1415671_at  10.90358  11.15993 10.776186  10.93000 11.468268
#1415672_at  11.11530  10.89209 11.091303  11.04029 11.109504

The sample assembles are previously published.

tmp <- names(table(cli$group))
cli$group = factor(cli$group, levels=c('resting','activated','lymphoma_marginal','lymphoma_transitional','lymphoma_aggressive'))
samplesL <- split(cli[,1],f = cli$group)
names(samplesL)
#[1] "resting"               "activated"             "lymphoma_marginal"    
#[4] "lymphoma_transitional" "lymphoma_aggressive"  

Go to Top

Standard Identification in 6 steps

S1: Finding Tipping Point

The first step is to calculate an Index of Critical transition (Ic score) of the dataset. One can use the function 'getIc' to calculate the Ic score based on randomly selected genes (20-1000). 'GetIc' has a key parameter fun which gives the options to call the existing Ic score or the proposed new Ic* score.

RandomG <- sample(rownames(df), 100)
IC.new <- getIc(df, samplesL, RandomG, fun='BioTIP')
plotIc(IC.new, las = 2)
IC.old <- getIc(df, samplesL, RandomG, fun='cor')
plotIc(IC.old, las = 2)

To estimate the distribution of random Ic scores, one can call the function simulation_Ic.

simuIC <- simulation_Ic(length(RandomG),samplesL,df, fun='BioTIP')
par(mar = c(10,5,0,2))
plot_Ic_Simulation(IC,simuIC,las = 2)

Go to Top

S2: Pre-selection Transcript

Once pre-processed, we can analyze the clinical-stage ensembles. Here, we demonstrate a pre-selection of 226 (1%) genes for each state.

test <- sd_selection(df, samplesL,0.01)
class(test)  #[1] "list"
lapply(test, dim)
#$resting
#[1] 226   5
#
#$activated
#[1] 226   3
#
#$lymphoma_marginal
#[1] 226   6
#
#$lymphoma_transitional
#[1] 226   5
#
#$lymphoma_aggressive
#[1] 226   7

Go to Top

S3: Network Partition

A graphical presentation of genes of interest can be achieved using the following functions. The getNetwork function will obtain an igraph object based on Pearson Correlation of test. This igraphL object is then run using the getCluster_methods function classify nodes.

library(igraph)
library(psych)
igraphL <- getNetwork(test, fdr = 1)

cluster <- getCluster_methods(igraphL)
names(cluster)
class(cluster)  #[1] "list"
class(cluster[[1]])  #[1] "communities"

Go to Top

S4: Identifying Dynamic Network Biomodule

Here, ‘module’ refers to a cluster of network nodes (e.g. transcripts) highly linked (e.g. by correlation). “Biomodule” refers to the module resenting significantly highest score called “Module-Criticality Index (MCI)” per state.

The following step shows a graph of classified clustered samples for five different states. MCI score is calculated for each module using the getMCI function. The getMaxMCImember function will obtain a list of modules with the highest MCI at each stage. Use "head(maxCIms)" to view the MCI scores calculated. Use plotMaxMCI function to view the line linking the highest MCI score at each state.

membersL_noweight <- getMCI(cluster,test,adjust.size = FALSE)
plotBar_MCI(membersL_noweight,ylim = c(0,6))
maxMCIms <- getMaxMCImember(membersL_noweight[[1]],membersL_noweight[[2]],min =10)
names(maxMCIms)
names(maxMCIms[[1]])
names(maxMCIms[[2]])
head(maxMCIms[['idx']])
head(maxMCIms[['members']][['lymphoma_aggressive']])

To get the selected statistics of biomodule (i.e., the gene module that presents the highest MCI score) of each state, please run the following

biomodules = getMaxStats(membersL_noweight[['members']],maxMCIms[[1]])
maxMCI = getMaxStats(membersL_noweight[['MCI']],maxMCIms[[1]])
head(maxMCI)
maxSD = getMaxStats(membersL_noweight[['sd']],maxMCIms[[1]])
head(maxSD)

To get the biomodule with the highest MCI score among all states, as we call it CTS (Critical Transition Signals), please run the following.

CTS = getCTS(maxMCI, maxMCIms[[2]])

Run the following to visualize the trendence of every state represented by the cluster with the highest MCI scores.

par(mar = c(10,5,0,2))
plotMaxMCI(maxMCIms,membersL_noweight[[2]],states = names(samplesL),las = 2)

We then perform simulation for MCI scores based on identified signature size (length(CTS) ) using the simulationMCI function.Use plot_MCI_simulation function to visualize the result. This step usually takes 20-30mins, so here to save time, we picked a small number 3 as the length of the CTS.

simuMCI <- simulationMCI(3,samplesL,df)
plot_MCI_Simulation(maxMCI,simuMCI)

Go to Top

S5: Evaluate Tipping Point

The next step is to calculate an Index of Critical transition (Ic score) of the dataset. First, use the getIc function to calculate the Ic score based on the biomodule previously identified. We use the function plotIc to draw a line plot of the Ic score.

IC <- getIc(df,samplesL,CTS)
par(mar = c(10,5,0,2))
plotIc(IC,las = 2)

Then use the two functions to evaluate two types of empirical significance, respectively. The first function simulation_Ic calculates random Ic-scores by shuffling features (transcripts). Showing in the plot is Ic scores of the identification (red) against its corresponding size-controlled random scores (grey).

simuIC <- simulation_Ic(length(CTS),samplesL,df)
par(mar = c(10,5,0,2))
plot_Ic_Simulation(IC, simuIC,las = 2)

Importantly, the function plot_simulation_sample calculates random Ic-scores by shuffling samples and visualizes the results. Showing in the plot is observed Ic-score (red vertical line) comparing to the density of random scores (grey), at the tipping point identified.

RandomIc = simulation_Ic_sample(df, sampleNo=length(samplesL[['lymphoma_aggressive']]), 
                                 Ic=IC[['lymphoma_aggressive']], genes=CTS, B=100)
plot_Ic_Simulation(IC, RandomIc, las = 2)                                 

Go to Top

S6: Evaluate Delta Score

The final step is to calculate the Delta Score, which evaluates the significance of our identified CTS.

plot_SS_Simulation(Ic,  Ic_simuG, 
                   main = paste("Delta of Ic*", length(CTS),"genes"), 
                   ylab='BioTIP 35 genes') # [1] 0 

plot_Ic_Simulation(Ic,  Ic_simuG,  las = 0,  ylim =  c(0,1.5),  
                   order = NULL,  main = 'BioTIP 35 genes',  
                   ylab = "Ic*",  fun = 'boxplot', 
                   which2point = NULL) 

plot_SS_Simulation(Ic,  Ic_simuG, 
                   main = paste("Delta of Ic*", length(CTS),"genes"), 
                   ylab='BioTIP 35 genes')

Go to Top

Go to Top

biotip_application's People

Contributors

xyang2uchicago avatar ysun98 avatar zhezhenwang avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.