Giter Site home page Giter Site logo

vpang's Introduction

vPanG

1 Introduction

1.1 Pan-genome and PAV analysis

  Pan-genome is the total genes found across members of a given population, revealing the diversity and functional potential within that population. PAV(Presence/absence variation) analysis is an essential step in pangenome analysis. The core genome contains genes shared by all individuals and the distributed genome consist of genes not shared by all. The distributed genome can be further divided into genes shared in most members (soft-core genes), genes shared between some members (distributed/accessory genes), and genes present in only one member (unique/private genes).

  “Map-to-pan” is a common strategy for eukaryotic pan-genome analyses. A pan-genome is first constructed by integrating the reference genome and non-reference sequences. Then, reads are aligned to the pan-genome and the percentage of gene region and coding region covered by read alignment are examined. Finally, gene presence/absence variations (PAVs) are determined and consequent analysis is carried out based on this resulted PAV table.

1.2 The core processing workflow of vPanG

  The tool vPanG is efficient to explore and visualize the complex results in PAV analysis. It provides four modules to 1) display gene coverage distribution, 2) analyze and visualize PAV table, 3) estimate pan/core genome sizes by simulation, and 4) find phenotype-associated genes.

  The workflow starts with an object of COV(module 1) or PAV(module 2-4) class. It can be produced by function get_cov_obj() and get_pav_obj(). It contains coverage/PAV matrix, arguments, phenotype information, and gene information, and will be the main input for the subsequent steps.

  • Module 1 focuses on showing the gene coverage and the distribution of gene coverage among individuals. The functions in module 1 are cov_heatmap() and cov_density().

  • Module 2 focuses on PAV analysis, including overview, classifying genes, clustering, PCA analysis. The functions for PAV analysis are pav_heatmap(), pav_hist(), pav_stackbar(), pav_cluster(), pav_halfviolin() and pav_pca().

  • Module 3 focuses on pan/core/private genome size estimation by simulation and drawing growth curve. The functions in module 3 are sim_stat(), sim_plot(), sim_multi_stat() and sim_multi_plot.

  • Module 4 focuses on phenotype association and visualization. The functions in module 4 include phen_heatmap(), phen_manhattan(), phen_block(), phen_bar() and phen_violin().

2 Installation

2.1 Installing R/RStudio

  In order to run vPanG, we need the R software environment, which can be freely downloaded as follows:

2.2 Check or install packages

packages <- c("snowfall", "data.table", "ggdendro", "ggplot2", "ggrepel", "ggsignif", "randomcoloR")
lapply(packages, function(x) {
	if(!require(x, character.only = TRUE)) {
		install.packages(x, dependencies = TRUE)
	}})
if (!requireNamespace("BiocManager", quietly = TRUE)){
  install.packages("BiocManager")
}
BiocManager::install("ComplexHeatmap")

2.3 Install metaFunc from github.

if (!requireNamespace("devtools", quietly = TRUE))
  install.packages("devtools")
library(devtools)
install_github("SJTU-CGM/vPanG", build_vignettes = TRUE)

3 User manual

  You can use vignette("vPanG") in R to view the vignette of vPanG. It can help you start using vPanG, including the format and examples of input data, and the use of functions. You can use ?function to view the help documentation of the function, such as ?pav_heatmap.

vpang's People

Contributors

xiaonui avatar

Stargazers

xl_g avatar LIU avatar

Watchers

James Cloos avatar CWei avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.