
playbase's Introduction

playbase


The `playbase` package contains the core back-end functionality for the OmicsPlayground. This package allows you to run, develop, and test any essential functions used in the OmicsPlayground directly in the R console without needing to worry about the R Shiny front-end code.

Installation

You can install the development version of playbase from GitHub with:

# install.packages("devtools")
devtools::install_github("bigomics/playbase")

Data upload

The first step in any OmicsPlayground analysis is to upload data, which is then used to create a pgx object. The pgx object is the core data structure in the OmicsPlayground, upon which most analysis and plotting functions operate.

library(playbase)

# Here we check that your input files do not have problems

playbase::PGX_CHECKS # These are the possible errors you can encounter

# individual file checks

SAMPLES = playbase::pgx.checkINPUT(playbase::SAMPLES, type = "SAMPLES")
COUNTS = playbase::pgx.checkINPUT(playbase::COUNTS, type = "COUNTS")
CONTRASTS = playbase::pgx.checkINPUT(playbase::CONTRASTS, type = "CONTRASTS")

# Checks across input files

INPUTS_CHECKED <- pgx.crosscheckINPUT(SAMPLES, COUNTS, CONTRASTS)

SAMPLES = INPUTS_CHECKED$SAMPLES
COUNTS = INPUTS_CHECKED$COUNTS
CONTRASTS = INPUTS_CHECKED$CONTRASTS

If no errors are reported (and PASS is TRUE), these new checked files SAMPLES, COUNTS and CONTRASTS can be used safely in the next step.
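
To make the idea of a cross-file check concrete, here is a minimal base-R sketch on toy data. It is not playbase's implementation: `toy_crosscheck`, the sample-wise contrast matrix, and the structure of the returned list are assumptions made for illustration, and only the notion of a `PASS` flag comes from the text above.

```r
# Toy inputs: 2 genes x 3 samples, with matching sample annotation
counts <- matrix(1:6, nrow = 2,
                 dimnames = list(c("g1", "g2"), c("s1", "s2", "s3")))
samples <- data.frame(group = c("a", "a", "b"),
                      row.names = c("s1", "s2", "s3"))
contrasts <- matrix(c(-1, -1, 1), ncol = 1,
                    dimnames = list(c("s1", "s2", "s3"), "b_vs_a"))

# Hypothetical helper: keep only the samples shared by all three inputs
toy_crosscheck <- function(samples, counts, contrasts) {
  shared <- Reduce(intersect,
                   list(rownames(samples), colnames(counts), rownames(contrasts)))
  list(
    SAMPLES = samples[shared, , drop = FALSE],
    COUNTS = counts[, shared, drop = FALSE],
    CONTRASTS = contrasts[shared, , drop = FALSE],
    PASS = length(shared) > 0
  )
}

checked <- toy_crosscheck(samples, counts, contrasts)

# Stop early instead of passing inconsistent inputs to the next step
if (!checked$PASS) stop("inputs share no sample names")
```

Gating on the flag before continuing keeps a failed check from surfacing later as a harder-to-read error.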

# Here we create a pgx object that can be used in Omics Playground.

# Step 1. create a pgx object

pgx <- playbase::pgx.createPGX(
  samples = playbase::SAMPLES,
  counts = playbase::COUNTS,
  contrasts = playbase::CONTRASTS
)

# Step 2. Populate pgx object with results

pgx <- playbase::pgx.computePGX(
  pgx = pgx
)

The pgx object

The core object in playbase is the pgx object. This object holds the raw data and any analysis results returned from playbase modules / boards. The pgx object is simply an R list. At a minimum, it contains the following list items:

  • counts
  • samples
  • contrasts

A pgx object is created from these three list items via the following function:

my_pgx <- pgx.createPGX(counts = counts, samples = samples, contrasts = contrasts)

Once a pgx object is created from these three items, the various playbase modules can operate on the pgx object to generate the analysis results relevant to that specific module.
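
As a rough illustration of "simply an R list", the sketch below builds a toy list with the three minimal items. It does not call `pgx.createPGX`, which presumably does more than this; the toy data and the `has_minimal_items` helper are hypothetical.

```r
# Toy stand-ins for the three required inputs (not real playbase data)
counts <- matrix(c(10, 0, 5, 3, 8, 1), nrow = 2,
                 dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3")))
samples <- data.frame(treatment = c("ctrl", "ctrl", "drug"),
                      row.names = colnames(counts))
contrasts <- matrix(c(-1, -1, 1), ncol = 1,
                    dimnames = list(colnames(counts), "drug_vs_ctrl"))

# A plain named list holding the three minimal items
toy_pgx <- list(counts = counts, samples = samples, contrasts = contrasts)

# Hypothetical helper: confirm that the minimal items are present
has_minimal_items <- function(x) {
  is.list(x) && all(c("counts", "samples", "contrasts") %in% names(x))
}

has_minimal_items(toy_pgx)
```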

Playbase modules

As mentioned above, the core object in playbase is the pgx object. This holds all of the analysis and results derived from the raw data, as well as the raw data itself. There are various modules in playbase that take a pgx object as input, perform some analysis on the raw data in the pgx object, and then append these results to the pgx object. These modules are more-or-less independent of one another and can therefore be parallelized or run in any arbitrary order.
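
The "take a pgx object, append results, return it" pattern can be sketched in base R as follows. The two module functions here are hypothetical stand-ins, not playbase modules; the point is only that modules which read the raw data and write to separate list items can run in either order.

```r
# Toy pgx-like list holding only a small expression matrix
toy_pgx <- list(X = matrix(c(1, 2, 3, 4, 6, 8), nrow = 2, byrow = TRUE,
                           dimnames = list(c("g1", "g2"), c("s1", "s2", "s3"))))

# Two hypothetical "modules": each reads the raw data and appends one result
add_gene_means <- function(pgx) {
  pgx$gene_means <- rowMeans(pgx$X)
  pgx
}
add_gene_sds <- function(pgx) {
  pgx$gene_sds <- apply(pgx$X, 1, sd)
  pgx
}

# Neither module reads the other's output, so the order does not matter
ab <- add_gene_sds(add_gene_means(toy_pgx))
ba <- add_gene_means(add_gene_sds(toy_pgx))
```

Both `ab` and `ba` end up with the same items, only appended in a different order.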

The core playbase modules operate on either genes or genesets.

The gene modules are as follows:

  • ttest
  • ttest.welch
  • ttest.rank
  • voom.limma
  • trend.limma
  • notrend.limma
  • edger.qlf
  • edger.lrt
  • deseq2.wald
  • deseq2.lrt

The geneset methods are as follows:

  • fisher
  • gsva
  • ssgsea
  • spearman
  • camera
  • fry
  • fgsea

And extra modules are as follows:

  • meta.go
  • deconv
  • infer
  • drugs
  • graph
  • connectivity
  • wordcloud

playbase's People

Contributors

ivokwee, mauromiguelm, phisanti, ncullen93, escri11, shalinipandurangan


playbase's Issues

Error when running pgx.crosscheckINPUT example

I am just running some example data for testing different functions and got the following error. Below is the code:

    SAMPLES = playbase::pgx.checkINPUT(playbase::SAMPLES, type = "SAMPLES")
    COUNTS = playbase::pgx.checkINPUT(playbase::COUNTS, type = "COUNTS")
    CONTRASTS = playbase::pgx.checkINPUT(playbase::SAMPLES, type = "CONTRASTS")

    # Checks across input files

    INPUTS_CHECKED <- pgx.crosscheckINPUT(SAMPLES, COUNTS, CONTRASTS)
[DBUG][2023-07-19 20:38:33][[? MB]] --- [UploadModule] 1 : dim.contrasts1 =  
[DBUG][2023-07-19 20:38:33][[? MB]] --- [UploadModule] 1 : dim.samples1   =
Error in if (dim(contrasts)[1] > dim(samples)[1] && PASS) { :
  missing value where TRUE/FALSE needed
In addition: Warning message:
In max(ncol(counts), nrow(samples)) :
  no non-missing arguments to max; returning -Inf
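
For readers who hit the same messages: in base R, this error and warning pair appears when `dim()`, `ncol()`, and `nrow()` return `NULL`, which is what they do for objects that are not matrices or data frames. The sketch below reproduces the situation with a plain list. Whether that is what happened in this report is an assumption, not something the log confirms.

```r
not_a_table <- list(a = 1)   # a plain list has no dimensions

# ncol() and nrow() both return NULL, so max() has nothing to compare
n <- suppressWarnings(max(ncol(not_a_table), nrow(not_a_table)))

# Comparing NULL dimensions gives a zero-length logical; `&&` turns that
# into NA, and if() cannot branch on NA
msg <- tryCatch(
  {
    if (dim(not_a_table)[1] > dim(not_a_table)[1] && TRUE) "ok"
  },
  error = function(e) conditionMessage(e)
)
```

Checking `is.data.frame(x) || is.matrix(x)` on each input before the cross-check is one way to catch this earlier.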

remove UTF-8 characters from gmt files

load(file="data-raw/extdata/gmt-all.rda",verbose=1)
Loading objects:
gmt.all
Warning message:
In load(file = "data-raw/extdata/gmt-all.rda", verbose = 1) :
input string 'RNASEQ:Amlexanox Ikk-Ɛ Tbk1 Nf-Κb GSE110206 1' cannot be translated to UTF-8, is it valid in 'CP1252'?
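
One possible clean-up, sketched on a toy list rather than on `gmt.all` itself, is to strip characters without an ASCII equivalent from the gene set names using `iconv`. The toy names are made up, and dropping characters outright is only one option (it can create duplicate names, which would need a separate check).

```r
# Toy gene-set list; "\u0190" stands in for a non-ASCII letter in a set name
gmt_toy <- list(c("TBK1", "IKBKE"), c("TP53", "MDM2"))
names(gmt_toy) <- c("RNASEQ:Ikk-\u0190 Tbk1", "HALLMARK:P53")

# Delete every byte that cannot be represented in ASCII (sub = "")
names(gmt_toy) <- iconv(names(gmt_toy), from = "UTF-8", to = "ASCII", sub = "")

names(gmt_toy)
```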

cannot install plsRcox

Apparently plsRcox is on Bioconductor, not on CRAN, so the playbase install will fail because of this dependency.


GSETS not in sync with playdata

  1. The GSET.rda in the data folder in playbase is not the same as playdata::GSETS.
  2. Eventually, playbase::GSETS should be deprecated in favor of playdata::GSETS.
  3. The use of iGSETS (GSETS with integer index) could be deprecated. However, iGSETS uses about 100 MB of memory, whereas GSETS (with gene symbols) uses 1.4 GB. For memory reasons, keeping iGSETS with the accessor function getGSETS (as is now) would be preferable, since it reduces the memory footprint. This needs to be done in both playdata and playbase.
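
The memory argument in point 3 can be illustrated with a toy comparison of gene sets stored as symbols versus integer positions in a shared gene universe. The data and the `get_symbols` accessor below are hypothetical and are not the real `getGSETS`. Note that `object.size` does not account for strings shared between vectors, so the actual saving can differ.

```r
set.seed(1)

# Toy gene universe and two gene sets stored as symbols
universe <- paste0("GENE", seq_len(5000))
sets_by_symbol <- list(setA = sample(universe, 2000),
                       setB = sample(universe, 2000))

# The same sets stored as integer positions in the universe
sets_by_index <- lapply(sets_by_symbol, match, table = universe)

# Hypothetical accessor: translate indices back to symbols on demand
get_symbols <- function(name) universe[sets_by_index[[name]]]

object.size(sets_by_symbol)
object.size(sets_by_index)
```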

GSVA consumes much RAM

According to Mauro, GSVA may use a lot of RAM (perhaps more than 15 GB) during computation and could crash the R process. This needs to be verified and mitigated.

Installation time needs to be greatly reduced by removing dependencies

Currently, it takes 40 - 60 minutes to install playbase. This is mostly due to the many dependencies. We need to greatly reduce this time - ideally to under 15 minutes. We can start by removing dependencies that are no longer needed, and then try to identify those dependencies which are only used once.

I don't think that moving dependencies from Imports to Suggests - thereby making them optional - is a viable strategy, because we need to assume that all dependencies are available in OmicsPlayground. This could help for people using playbase on its own however.
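
For the standalone case, the usual base-R pattern for an optional (Suggests) dependency is a `requireNamespace` guard. The sketch below shows it with a hypothetical helper, `with_optional_pkg`, and is not taken from playbase.

```r
# Hypothetical helper: run `fn` only if the optional package is installed
with_optional_pkg <- function(pkg, fn, fallback = NULL) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Package '", pkg, "' is not installed; skipping.")
    return(fallback)
  }
  fn()
}

# 'stats' ships with R, so the function is executed
with_optional_pkg("stats", function() stats::median(c(1, 5, 9)))

# A package name that should not exist, so the fallback is returned
with_optional_pkg("notARealPackage123", function() 1, fallback = NA)
```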

Error in pgx.computePGX when working on small datasets

Using a reduced version of the example dataset, the base pgx pipeline fails. See code below:

get_mini_example_data <- function() {
  counts <- playbase::COUNTS
  samples <- playbase::SAMPLES
  contrast <- playbase::CONTRASTS

  n_genes <- round(seq(1, nrow(counts), length.out = 100))
  
  # Subset each data frame to facilitate testing
  mini_counts <- counts[n_genes, c(1:3, 8:9, 11:12)]
  mini_samples <- samples[colnames(mini_counts),]
  mini_contrast <- contrast[1:3, 1:3]

  mini_data <- list(counts = mini_counts, samples = mini_samples, contrast = mini_contrast)

  return(mini_data)
}

d <- get_mini_example_data()

pgx2 <- playbase::pgx.createPGX(
  samples = d$samples,
  counts = d$counts,
  contrasts = d$contrast
)

x <- playbase::pgx.computePGX(pgx2)

The error is:

x <- playbase::pgx.computePGX(pgx2)
[pgx.computePGX] testing genes...
>>> computing gene tests for SINGLE-OMICS
[compute_testGenesSingleOmics] detecting stat groups...
[compute_testGenesSingleOmics] contrasts on groups (use design)
replacing contrast matrix...
[compute_testGenesSingleOmics] pruning unused contrasts
[compute_testGenesSingleOmics] normalizing contrasts
[compute_testGenesSingleOmics] 6 : creating model design matrix
[compute_testGenesSingleOmics] WARNING:: low total counts =  11795.79 
[compute_testGenesSingleOmics] applying global mean scaling to 1e6...
filtering for low-expressed genes: > 1 CPM in >= 2 samples
filtering out 6 low-expressed genes
keeping 85 expressed genes
>>> Testing differential expressed genes (DEG) with methods: ttest.welch trend.limma edger.qlf
[compute_testGenesSingleOmics] 12 : start fitting...
[ngs.fitContrastsWithAllMethods] using input log-expression matrix X...
[ngs.fitContrastsWithAllMethods] fitting using Welch t-test
[ngs.fitContrastsWithAllMethods] fitting using LIMMA trend
[ngs.fitContrastsWithLIMMA] fitting LIMMA contrasts using design matrix
[ngs.fitContrastsWithAllMethods] fitting edgeR using QL F-test 
[ngs.fitContrastsWithEDGER] fitting EDGER contrasts using design matrix
[ngs.fitContrastsWithAllMethods] correcting AveExpr values...
[ngs.fitContrastsWithAllMethods] calculating statistics...
[ngs.fitContrastsWithAllMethods] reshape matrices...
[ngs.fitContrastsWithAllMethods] aggregating p-values...
[compute_testGenesSingleOmics] 13 : fitting done!
            user.self sys.self elapsed user.child sys.child
ttest.welch      0.00     0.00    0.01         NA        NA
trend.limma      0.01     0.02    0.03         NA        NA
edger.qlf        0.34     0.01    0.38         NA        NA
[compute_testGenesSingleOmics] done!
[pgx.computePGX] testing genesets...
Filtering gene sets on size...
Matching gene set matrix...
Reducing gene set matrix...
Error in order(colnames(G)[jj]) : argument 1 is not a vector
In addition: Warning message:
2 very small variances detected, have been offset away from zero

As a result, x is never assigned.

> x
Error: object 'x' not found
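
A common way for `colnames()` to come back empty on small inputs is R's default dimension dropping when a matrix is subset down to a single column. The sketch below shows that behavior on a toy matrix. Whether this is what happens to `G` here is an assumption that would need checking in the gene set code.

```r
G <- matrix(1:6, nrow = 3,
            dimnames = list(c("g1", "g2", "g3"), c("setA", "setB")))
jj <- 1   # suppose only one gene set survives a size filter

dropped <- G[, jj]              # becomes a plain named vector
kept <- G[, jj, drop = FALSE]   # stays a 3 x 1 matrix

colnames(dropped)   # NULL
colnames(kept)      # "setA"
```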

implement renv to track dependencies and their versions

Renv should be used here. For instance, I think this may be why the plotly output is so long and weird when using this package sometimes -- because the original code was written with an older version of plotly. Is there an easy way to copy over the renv file from omicsplayground @ivokwee ?

Error doing pgx computation: could not find function "prep"

I get this error when doing background pgx computation from OP.

  processx.1: [pgx.multipleDeconvolution] computing for Cancer type (CCLE)
  processx.1: [DBUG][2023-04-17 11:44:53][5497MB] --- [pgx.deconvolution] calculating DeconRNAseq...
  processx.1: Error in prep(x.data, scale = "none", center = TRUE) : 
  processx.1:   could not find function "prep"

@ncullen93 I saw you mentioned it in the code

## uses pcaMethods::prep
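
The general mechanism can be shown with a package that ships with R: a function from a package that is installed but not attached is only found when the call is namespace-qualified. The example uses `tools::file_ext` purely as a stand-in for `pcaMethods::prep`.

```r
# tools is installed with R but is not attached by default, so the bare
# name is normally not found and the error message is captured instead
unqualified <- tryCatch(
  file_ext("report.txt"),
  error = function(e) conditionMessage(e)
)

# The namespace-qualified call works regardless of what is attached
qualified <- tools::file_ext("report.txt")
```

By analogy, writing the call as `pcaMethods::prep(...)` would avoid relying on the package being attached in the background process.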

Packages that are required but can't be installed

When running devtools::check() you get a list of packages that are used but not included in the Imports section of the DESCRIPTION file. To add these packages to playbase, you run usethis::use_package("PACKAGE_NAME"). The following packages fail when I try to run that:

  • GREP2
  • base2grob
  • cgdsr
  • rgeolocate
  • inferncv

We should add them manually some other way or rethink their usage.

`plsRcox` is not an exported function from pls

In pgx-predict.R there is a function call of pls::plsRcox but the devtools check complains that this function does not exist in pls. A quick google shows that this function exists in the plsRcox package, so maybe it should be plsRcox::plsRcox instead. Can you look into this @ivokwee ? Maybe the pls method is never used in pgx.survivalVariableImportance so it never errors? Or the pls package once had this function? See below:

  if ("pls" %in% methods) {
    res <- pls::plsRcox(t(X), time = time, event = status, nt = 5)
    summary(res)
    cf <- res$Coeffs[, 1]
    cf[is.na(cf)] <- 0
    cf <- cf * sdx[names(cf)] ## really?
    imp[["pls.cox"]] <- abs(cf) / max(abs(cf), na.rm = TRUE)
  }
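
A quick way to check this kind of question from the console is to look at a namespace's export list. The helper below, `is_exported`, is hypothetical and is demonstrated on `stats` so that it runs without extra packages.

```r
# Hypothetical helper: is `fn` exported by the installed package `pkg`?
is_exported <- function(fn, pkg) {
  requireNamespace(pkg, quietly = TRUE) && fn %in% getNamespaceExports(pkg)
}

is_exported("median", "stats")     # TRUE
is_exported("plsRcox", "stats")    # FALSE
```

With both packages installed, `is_exported("plsRcox", "pls")` and `is_exported("plsRcox", "plsRcox")` would show which namespace the call should use.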
