genvisr's Introduction

GenVisR

Please cite: Skidmore et al. 2016. "GenVisR: Genomic Visualizations in R". Bioinformatics 32, 3012-3014.

Bioconductor

Intuitively visualizing and interpreting data from high-throughput genomic technologies continues to be challenging. "Genomic Visualizations in R" (GenVisR) attempts to alleviate this burden by providing highly customizable, publication-quality graphics that support multiple species and focus primarily on the cohort level (i.e., multiple samples/patients). GenVisR attempts to maintain a high degree of flexibility while leveraging the abilities of ggplot2 and Bioconductor to achieve this goal.

Install from Bioconductor

For the majority of users we recommend installing GenVisR from the release branch of Bioconductor. Installation instructions using this method can be found on the GenVisR landing page on Bioconductor.

Please note that GenVisR imports a few packages that have "system requirements". In most cases these requirements will already be installed; if they are not, please follow the instructions given in the R terminal to install them. Briefly, these system packages are "libcurl4-openssl-dev" and "libxml2-dev".

Development

Development for GenVisR occurs on the Griffith Lab GitHub repository available here. For users wishing to contribute to development we recommend cloning the GenVisR repo there and submitting a pull request. Please note that development occurs on the R version that will be available at each Bioconductor release cycle. This ensures that GenVisR will be stable for each Bioconductor release, but it may require developers to download R-devel.

We also encourage users to report bugs and suggest enhancements to GenVisR on the github issue page available here:

To install the latest development version of GenVisR (not recommended for most users):

# install and load devtools package
install.packages("devtools")
library(devtools)

# install GenVisR from github
install_github("griffithlab/GenVisR")

Vignettes

Documentation for GenVisR can be found on the Bioconductor landing page in the form of vignettes, available here: GenVisR. Tutorials can also be found on biostars.org. Vignettes can also be viewed from within R.

# view vignettes
vignette(package="GenVisR")

genvisr's People

Contributors

ahwagner, alanocallaghan, baptiste, cbrueffer, dtenenba, gatoravi, hpages, jwokaty, lbeltrame, malachig, mmoisse, nturaga, obigriffith, vobencha, zlskidmore

genvisr's Issues

gene_plot error detected for CBFB gene

PTEN has only 1 transcript in UCSC; however, three transcripts are being plotted. This might be a bug in the master table creation within this function.

Cache txdb for given range

We should investigate caching all relevant data from txdb in the first call, to speed up subsequent calls to the same region.
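
One way the caching idea could look, as a minimal base R sketch. `fetch_region()` is a hypothetical stand-in for the expensive txdb query and `cached_fetch()` is an illustrative name; neither is part of GenVisR. Results are stored in an environment keyed by the exact region, so only repeated calls with identical coordinates are served from the cache.

```r
# Cache keyed by region string; an environment gives mutable storage.
region_cache <- new.env(parent = emptyenv())

# Hypothetical stand-in for an expensive txdb query (not a GenVisR function).
fetch_region <- function(chr, start, end) {
  data.frame(chr = chr, start = start, end = end, stringsAsFactors = FALSE)
}

# Look the region up in the cache and only run the query on a miss.
cached_fetch <- function(chr, start, end, fetch = fetch_region) {
  key <- paste(chr, start, end, sep = ":")
  if (!exists(key, envir = region_cache, inherits = FALSE)) {
    assign(key, fetch(chr, start, end), envir = region_cache)
  }
  get(key, envir = region_cache, inherits = FALSE)
}

first  <- cached_fetch("chr10", 89622195, 89729532)  # runs the query
second <- cached_fetch("chr10", 89622195, 89729532)  # served from the cache
```

A cache like this would not help for overlapping or nested regions; serving those would need the stored data to be subset by range.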

waterfall hierarchy does not remove duplicate entries

The waterfall_hierarchyTRV function, which is designed to selectively remove mutations based on a hierarchy, does not remove duplicate entries. As an example:
if in =
samp1 MLL3 missense
samp1 MLL3 missense
samp1 MLL3 intronic

out would =
samp1 MLL3 missense
samp1 MLL3 missense

This does not matter for plotting in the main plot however it would affect the mutation recurrence cutoff parameter in theory.

The fix should be to just unique the data frame at the end of the function
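
A minimal sketch of the proposed fix on the toy input above, in base R. The column names, the two-level hierarchy, and `keep_top_mutation()` are assumptions made for illustration and are not the package's internals; the relevant step is the `unique()` call at the end.

```r
# Toy input matching the example above (column names are illustrative).
calls <- data.frame(
  sample   = c("samp1", "samp1", "samp1"),
  gene     = c("MLL3", "MLL3", "MLL3"),
  trv_type = c("missense", "missense", "intronic"),
  stringsAsFactors = FALSE
)

# Hypothetical priority order: earlier entries outrank later ones.
hierarchy <- c("missense", "intronic")

keep_top_mutation <- function(x, hierarchy) {
  # Position of each row's mutation type in the hierarchy.
  rnk <- match(x$trv_type, hierarchy)
  # Best (lowest) rank seen for each sample/gene pair.
  best <- ave(rnk, x$sample, x$gene, FUN = min)
  kept <- x[which(rnk == best), , drop = FALSE]
  # The proposed fix: drop exact duplicate rows before returning.
  out <- unique(kept)
  rownames(out) <- NULL
  out
}

result <- keep_top_mutation(calls, hierarchy)
```

With the duplicate removed, a recurrence count taken per gene would see samp1 once for MLL3 rather than twice.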

Point binning

We should introduce a means of binning points (like in a coverage plot) to a reasonable size, given the parameters of the resulting plot.
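
As a rough base R sketch of what binning could mean here: positions are grouped into fixed-width windows and coverage is averaged within each window. `bin_coverage()` and its arguments are hypothetical names, and using the mean as the summary is an assumption (median or max may suit some plots better).

```r
bin_coverage <- function(pos, cov, bin_width) {
  stopifnot(length(pos) == length(cov), bin_width >= 1)
  # Assign each position to a fixed-width bin, counted from the first position.
  bin <- (pos - min(pos)) %/% bin_width
  data.frame(
    start = as.vector(tapply(pos, bin, min)),
    end   = as.vector(tapply(pos, bin, max)),
    cov   = as.vector(tapply(cov, bin, mean))
  )
}

pos <- 1:10
cov <- c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
binned <- bin_coverage(pos, cov, bin_width = 5)
```

A bin width could be derived from the plot itself, for instance the genomic span divided by the number of points the device can usefully show.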

Intron ribbons

We need to highlight when we are viewing a compressed region.

Gene buffers

We need a mechanism for identifying distinct genes, and keeping a minimum gene.buffer distance between them.

Intron buffer

We need to remove the large intron before/after the gene in the gene_plot data frame. Consider adding a specified intron buffer on either end of the data frame (default 1kb?)

allow user to select isoform in gencov function

Often a user may only be interested in select isoforms. We should give the user the ability to select which isoforms to display.

Currently the only options that exist are to display all isoforms or to reduce into a summary view.

Add calculate gender parameter

LOH plots should reflect whether or not there are two 'X' chromosomes. Default status should be to not calculate 'X', unless user specifies to.

mutSpec

When plotting clinical colors, it's currently difficult to differentiate between variables. For example, if I wanted ER status to be (positive=blue) and HER2 status to be (positive=red), I'd have to rename the variable values to something like HER2_positive, ER_positive, HER2_negative, etc. An alternative solution would be for the user to input a clinical data frame of colors (rather than variable values).

After opening a PDF graphics device, calling mutSpec multiple times only produces a single page of a plot. I'm guessing the layers from different mutSpec calls are getting put on top of each other.

You may want to rename the mutRecur.layers and main.layers. When I was trying to add ggtitle as a layer to the entire plot, I assumed it would be added to main.layers. But adding it to main.layers plotted nothing (ggtitle gets overridden internally there).

When drop_mutation=T, the mutation type colors can change depending on which types are present for a given set of samples. I think it would make more sense to have a static set of default colors that the user can alter manually.

Install requires RMySQL

Hi there,

When I tried to install using the instructions in README.md, I got the following error:

> devtools::install_github("griffithlab/GenVisR")
Downloading GitHub repo griffithlab/GenVisR@master
Installing GenVisR
Installing 1 packages: FField
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ --no-save --no-restore CMD INSTALL  \
  '/private/var/folders/zz/zyxvpxvq6csfxvn_n00004c0000130/T/RtmpRvZbaZ/devtools73056e75bea7/griffithlab-GenVisR-17ba59b'  \
  --library='/Library/Frameworks/R.framework/Versions/3.2/Resources/library' --install-tests 

* installing *source* package ‘GenVisR’ ...
** R
** data
*** moving datasets to lazyload DB
** tests
** preparing package for lazy loading
Creating a generic function for ‘nchar’ from package ‘base’ in package ‘S4Vectors’
Warning in .recacheSubclasses(def@className, def, doSubclasses, env) :
  undefined subclass "externalRefMethod" of class "expressionORfunction"; definition not updated
Warning in .recacheSubclasses(def@className, def, doSubclasses, env) :
  undefined subclass "externalRefMethod" of class "functionORNULL"; definition not updated
Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) : 
  there is no package called ‘RMySQL’
ERROR: lazy loading failed for package ‘GenVisR’
* removing ‘/Library/Frameworks/R.framework/Versions/3.2/Resources/library/GenVisR’
Error: Command failed (1)

After running install.packages('RMySQL') and re-running devtools::install_github("griffithlab/GenVisR"), the install completed successfully.

Reduce functionality broken in geneViz

Somewhere along the way the reduce functionality was broken (i.e., the function currently errors out if reduce is set to TRUE).

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

library(BSgenome.Hsapiens.UCSC.hg19)
genome <- BSgenome.Hsapiens.UCSC.hg19

gr <- GRanges(seqnames=c("chr10"), ranges=IRanges(start=c(89622195), end=c(89729532)), strand=strand(c("+")))

geneViz(txdb, gr, genome, reduce=TRUE)

I think this occurred with the addition of txnames in the data frame.
@ahwagner can you take a look at this when you get a chance?

GenCov additional data quality filters needed

After looking over this code some more, I feel additional quality checks would be useful to limit potential problems encountered by users, specifically:

If a user specifies an ambiguous strand in the GRanges object, we should grab features for both strands. Currently ambiguous strands are not supported.

We should check that an isoform specified by the user actually exists and, if not, return an informative error. Currently the code errors out with an uninformative message.
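
The two checks could be sketched in base R roughly as follows. Both helpers are hypothetical and not part of GenVisR; `available` is assumed to be the vector of transcript names found in the requested region, and "*" is assumed to be how an ambiguous strand is written.

```r
# Hypothetical helper: fail early with a message that names the missing isoforms.
check_isoforms <- function(requested, available) {
  absent <- setdiff(requested, available)
  if (length(absent) > 0) {
    stop("Isoform(s) not found in the requested region: ",
         paste(absent, collapse = ", "), call. = FALSE)
  }
  invisible(requested)
}

# Hypothetical helper: treat an ambiguous strand as "both strands".
expand_strand <- function(strand) {
  if (identical(strand, "*")) c("+", "-") else strand
}
```

For example, `check_isoforms("tx9", c("tx1", "tx2"))` stops with a message that names "tx9", instead of failing later with an unrelated error.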

consistent color palettes

Currently the MAF and the MGI file inputs have different color palettes, which is confusing when switching between them (e.g., Nonsense mutations are grey in the MAF format while Silent mutations are grey in the MGI format).

Cleanup Namespace

Currently the package depends on "UniProt.ws". Putting this package in the Imports field instead of Depends will cause lolliplot to fail.

It would be good to eliminate the dependency if possible; more research is needed to determine the cause of this.

Lolliplot should have a parameter for hard cutoffs when stacking points

Currently there is no limit on points stacking on one another. This makes the graphic unreadable given a large degree of stacking (100+). A parameter should be created to cap this behavior. Further, gene height should be set proportional to the graphic device to avoid inconsistent gene heights between graphics.
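
A hard cutoff could work along these lines; this is a base R sketch under the assumption that each point is stacked one unit above the previous point at the same position. `cap_stack()` and `max_stack` are hypothetical names, not existing lolliplot parameters.

```r
# Keep at most max_stack points per position, recording each point's stack height.
cap_stack <- function(position, max_stack) {
  # Running index of each point within its position (1, 2, 3, ...).
  height <- ave(seq_along(position), position, FUN = seq_along)
  data.frame(position = position, height = height)[height <= max_stack, ]
}

capped <- cap_stack(c(10, 10, 10, 20), max_stack = 2)
```

Points above the cutoff are simply dropped in this sketch; a real implementation would probably want to report how many were omitted at each position.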

example figure for genCov draws UTRs inconsistently

Notice that in the vignette example for genCov the UTRs for the different isoforms look odd. In the top isoform there is just one UTR feature plotted in the middle of the gene. In the second isoform the UTRs are plotted on top of and overlapping with the coding portion of the first and last exon. In the third isoform the UTRs look conventional (as expected).

rename mutSpec function?

The mutation landscape function is currently named mutSpec. This could be confusing as it makes me think of mutation spectrum which this plot is not and for which we kind of have another function (TvTi).

Add coercion of factors entered in the "genes" argument of mutSpec

I got the following error when using a factor for my gene list

mutSpec(tNHL_variants, file_type="MGI",label_x=T,rmv_silent=T,genes=three$gene_name)
Error: Aesthetics must either be length one, or the same length as the dataProblems:gene, trv_type

However, this worked.

mutSpec(tNHL_variants, file_type="MGI",label_x=T,rmv_silent=T,genes=as.character(three$gene_name))
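
For context, a factor stores integer codes plus labels, so code that expects a character vector can misbehave when handed one. The guard below is a hypothetical sketch of the requested coercion in base R; `coerce_genes()` is not an existing GenVisR function.

```r
genes <- factor(c("TP53", "MLL3", "KRAS"))

# The underlying codes follow the sorted levels, not the gene names.
codes  <- as.integer(genes)
labels <- as.character(genes)

# Hypothetical guard a function could apply to its `genes` argument.
coerce_genes <- function(genes) {
  if (is.factor(genes)) {
    warning("`genes` is a factor; coercing to character", call. = FALSE)
    genes <- as.character(genes)
  }
  genes
}
```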

Add in "User Intron" space

In gene_plot.R, we should provide some mechanism for leaving specified intronic regions uncompressed if the transformIntronic flag is set to true.

Does conversion of c.notation to p.notation really work as described?

In the vignette docs it is stated "It is recommended for amino acid change to be in p. notation however lolliplot will attempt to convert from c. notation to p.notation by subtracting the 5’ UTR transcript length from the c. coordinate, when employing this functionality the user must specify an ensembl data set via the ensembl.dataset parameter." Isn't the c.notation at the cDNA/transcript level? Converting to p.notation would require more than just subtracting the UTR length.
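
To make the concern concrete: under HGVS numbering, c. positions count from the A of the ATG start codon (5' UTR positions are written with negative numbers), so the 5' UTR is already excluded and a coding position maps to its codon by dividing by three and rounding up. The base R sketch below covers only that arithmetic for simple substitutions; the helper names are hypothetical, and the resulting amino acid change would still require translating the altered codon.

```r
# Codon (amino acid) number for a coding-sequence position, assuming HGVS-style
# numbering where c.1 is the A of the ATG start codon.
c_to_codon <- function(c_pos) {
  stopifnot(is.numeric(c_pos), all(c_pos >= 1))
  as.integer(ceiling(c_pos / 3))
}

# Pull the coordinate out of simple substitutions such as "c.76A>T".
# Intronic offsets (c.76+2), UTR positions (c.-14, c.*5) and indels are not handled.
parse_c_pos <- function(x) {
  as.numeric(sub("^c\\.([0-9]+)[ACGT]>[ACGT]$", "\\1", x))
}

codon <- c_to_codon(parse_c_pos("c.76A>T"))
```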

number of samples in mutSpec is incorrect

During the various optional subsets of data in mutSpec, samples can be removed from the levels of the data frames the function uses. When plotting the title, the levels of x$sample are used to print n=x.

This functionality used to work fine but now if a sample is plotted with nothing (i.e. NA) it is not counted toward the number of levels resulting in a lower than expected number.

Summary: Everything is still plotted as it should be but the way n is calculated for the title needs to be fixed.
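
One way the count could come out low, and one possible fix, sketched in base R with toy data. The variable names are illustrative and this is a guess at the mechanism, not a trace of the actual mutSpec code: the idea is to record n from the full input before any subsetting drops levels.

```r
samples  <- factor(c("s1", "s1", "s2", "s3"))
mutation <- c("missense", "nonsense", "silent", NA)  # s3 has nothing to plot

# Record n from the full input, before any optional subsetting.
n_samples <- nlevels(samples)

# Subsetting rows and dropping unused levels is where a sample can disappear.
kept <- droplevels(samples[!is.na(mutation)])
n_after_subset <- nlevels(kept)

plot_title <- paste0("n=", n_samples)
```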

Set X-axis limits on coverage plot

Currently X-axis limits are inferred from the coverage input file and the user-defined GRanges object for the coverage and gene plots respectively. The function expects that the GRanges object matches the coverage in terms of range.

This should be changed: we should set the x-axis limits in ggplot based on the GRanges object the user defines.

Bug when specifying a small genomic range

When specifying a small range in a GRanges object, the coverage plot produces an error:

Error in grid.Call.graphics(L_raster, x$raster, x$x, x$y, x$width, x$height, :
Empty raster

To reproduce, specify a GRanges object for input as follows:
gr <- GRanges(seqnames=c("chr16"), ranges=IRanges(start=c(67063051), end=c(67063191)), strand=strand(c("+")))

File Format Conversion Function

I think it would be helpful to have a function (or series of functions) that convert from one file format to another. For example, from a MAF file to the long data frame format that a lot of genviz functions take. Maybe also something handy like a MAF to VCF (and vice versa) converter for offline use. There are probably other file formats too that could be included.
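
A minimal base R sketch of the MAF-to-long conversion. The input column names are the standard MAF fields `Tumor_Sample_Barcode`, `Hugo_Symbol` and `Variant_Classification`; the output names `sample`, `gene` and `variant_class` are an assumption about the long format and should be checked against the function being fed. `maf_to_long()` is a hypothetical name.

```r
maf_to_long <- function(maf) {
  needed <- c("Tumor_Sample_Barcode", "Hugo_Symbol", "Variant_Classification")
  absent <- setdiff(needed, names(maf))
  if (length(absent) > 0) {
    stop("MAF is missing column(s): ", paste(absent, collapse = ", "), call. = FALSE)
  }
  # Output column names are assumed, not taken from the package.
  data.frame(
    sample        = as.character(maf$Tumor_Sample_Barcode),
    gene          = as.character(maf$Hugo_Symbol),
    variant_class = as.character(maf$Variant_Classification),
    stringsAsFactors = FALSE
  )
}

# Toy MAF with made-up sample identifiers.
toy_maf <- data.frame(
  Hugo_Symbol            = c("TP53", "MLL3"),
  Tumor_Sample_Barcode   = c("samp1", "samp2"),
  Variant_Classification = c("Missense_Mutation", "Silent"),
  stringsAsFactors = FALSE
)
long <- maf_to_long(toy_maf)
```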

Warnings: In loop_apply(n, do.ply) : Stacking not well defined when ymin != 0

This happens on the "gene samples with mutation" subplot of the waterfall plot. It occurs because the x axis is reversed (i.e., negative becomes positive), a necessity for matching the main plot.

It would be good to get rid of this warning somehow: either suppress it, or the warning might go away if the code is modified to use stat='identity'.

Error: Error: Results must be all atomic, or all data frames

In the mutations_heatmap function, in the subfunction plot_bar, an error occurs when processing large data sets.

For example, running the code with a recurrence cutoff of 20 seems to work fine; however, a recurrence cutoff of 0 will produce the error. This needs to be looked into further (the test file is the bcla maf file).

add strand information to genCov plot

We should add arrows to denote strand for the gene features in the genCov plot (similar to UCSC). This would require grid and the arrow parameter in geom_segment(), a new data frame will have to be created to accomplish this.

Transition/Transversion ratio add on

From a quick literature search, the Transition/Transversion ratio can vary significantly not only between species but also with the type of data (WGS, exome, mitochondrial, etc.).

Given this I propose letting the user add in the rates as a pre-defined data frame structure if that is something the user wishes to plot.
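
A base R sketch of how a user-supplied table could sit next to observed values. `classify_snv()` and the two-column `expected` layout are hypothetical, and the expected proportions are placeholders chosen only so the example runs; they are not real rates. The classification assumes single-base substitutions over A, C, G, T with ref different from alt.

```r
# A change within purines (A, G) or within pyrimidines (C, T) is a transition;
# a change between the two classes is a transversion.
classify_snv <- function(ref, alt) {
  purines <- c("A", "G")
  same_class <- (ref %in% purines) == (alt %in% purines)
  ifelse(same_class, "transition", "transversion")
}

# Hypothetical user-supplied structure; placeholder proportions, not real rates.
expected <- data.frame(
  type = c("transition", "transversion"),
  prop = c(0.5, 0.5),
  stringsAsFactors = FALSE
)

ref  <- c("A", "C", "G", "T")
alt  <- c("G", "T", "T", "A")
type <- classify_snv(ref, alt)

# Observed proportions, using the same type labels as the expected table.
observed <- data.frame(
  type = expected$type,
  prop = as.vector(prop.table(table(factor(type, levels = expected$type)))),
  stringsAsFactors = FALSE
)

# One row per type with observed and expected proportions side by side.
comparison <- merge(observed, expected, by = "type",
                    suffixes = c("_observed", "_expected"))
```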

Support for additional file_types in waterfall plot

Currently only TGI annotation files and MAF version 2.4 are supported by the waterfall plot. It might be worthwhile to support additional file types (VCF, older MAF versions, etc.).

This would mean adding code in the following functions:
hiearchial_remove_trv_type.R
mutation_heatmap.R
plot_heatmap.R

Option to reduce resolution of genCov

We should have an option to reduce the resolution of genCov by x%. In most cases it is not necessary to plot at single-base resolution.

At the least, this would help speed up vignette creation.

Lolliplot fetch domain function inoperable

The biomaRt query used in lolliplot.fetchdomain no longer works. I have checked that the code has not changed since its inception in Feb.

Either the biomaRt query structure for the interProd-1 database has changed, or that biomaRt functionality is broken.

I believe it will be possible in the interim to change the biomaRt query and restore functionality using the ensembl mart. However, it would be best to somehow store protein domain information within GenVisR, or set up a server to hold this information, at least for H. sapiens.

Bug in lolliplot: disconnect between amino acid change and protein

Currently the protein plotted, as well as the domains, are in amino acid coordinates. A discrepancy would occur if someone supplied c. nomenclature instead of p. in the amino_acid_change column (i.e., gave the coding DNA sequence location instead of the amino acid sequence location).

This needs to be corrected by requiring p. notation or converting everything into it.

Alternatively, switch everything to c. notation and require that.

The function to change is mutationObs.R, and possibly cosmicObs.R.

Collapse UTRs?

We should have a utr.collapse flag indicating if (and how much) to scale UTRs, since they're apparently huge.

Add parallel backend for cpu intensive functions

For a few functions that take over 10 seconds, a parallel backend would significantly speed things up; genCov, TvTi, and lohView seem like good candidates.

Bioconductor recommends using BiocParallel.
