accio / bioqc

Detect tissue heterogeneity in gene expression data with BioQC

Home Page: http://accio.github.io/BioQC/

License: GNU General Public License v3.0

R 81.83% C 12.27% C++ 1.92% Makefile 1.46% Shell 2.52%

bioqc's People

Contributors

accio, dtenenba, grst, hpages, idavydov, jimhester, nturaga, planetmdx, vobencha

bioqc's Issues

Command-line script for BioQC

For integrating BioQC into automated workflows (such as nf-core/rnaseq), it would be really helpful to have a command-line interface (CLI) for BioQC.
Essentially, the script should take

  • a gene expression matrix
  • optionally, a GMT file, or just use BioQC's default if none is specified
  • optionally, a species identifier to automatically remap to orthologous gene symbols (maybe even autodetect that for rat and mouse)

and produce

  • A heatmap with all signatures that are detected in at least one sample
  • A CSV/TSV file with all signature scores
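
A minimal sketch of what such a script could look like is given below. The script name, the argument order, the assumed input layout (genes in rows, gene symbols in the first column) and the output file name are all hypothetical choices made for illustration; only `readGmt()`, `wmwTest()` and the bundled GMT file path are taken from BioQC as used elsewhere on this page. Species remapping and the heatmap are left out.

```r
#!/usr/bin/env Rscript
# Hypothetical CLI sketch: script name, argument order and output file are
# illustrative choices, not an existing BioQC interface.

parse_cli_args <- function(args) {
  # Expected: <expression.tsv> [signatures.gmt]
  if (length(args) < 1) {
    stop("usage: bioqc_cli.R <expression.tsv> [signatures.gmt]")
  }
  list(expr = args[1],
       gmt  = if (length(args) >= 2) args[2] else NA_character_)
}

run_bioqc <- function(expr_file, gmt_file = NA_character_,
                      out_file = "bioqc_scores.tsv") {
  # Assumed input layout: genes in rows (first column holds gene symbols),
  # samples in columns.
  expr <- as.matrix(read.delim(expr_file, row.names = 1, check.names = FALSE))
  if (is.na(gmt_file)) {
    # Fall back to the tissue signatures shipped with BioQC.
    gmt_file <- system.file("extdata/exp.tissuemark.affy.roche.symbols.gmt",
                            package = "BioQC")
  }
  gmt <- BioQC::readGmt(gmt_file)
  scores <- BioQC::wmwTest(expr, gmt, valType = "abs.log10p.greater")
  # col.names = NA writes an empty header cell above the row names.
  write.table(scores, out_file, sep = "\t", quote = FALSE, col.names = NA)
  invisible(scores)
}

# Only run when arguments were supplied and BioQC is installed.
args <- commandArgs(trailingOnly = TRUE)
if (length(args) >= 1 && requireNamespace("BioQC", quietly = TRUE)) {
  opts <- parse_cli_args(args)
  run_bioqc(opts$expr, opts$gmt)
}
```

Under these assumptions it would be invoked as `Rscript bioqc_cli.R expression.tsv [signatures.gmt]`.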

SignedGenesets gives `NaN`s for `valType %in% c("r", "f")`

Hi @Accio ,

I noticed that wmwTest() on SignedGenesets returns NaNs for valTypes "r" and "f". Not sure if that is intentional.

m <- matrix(c(
  0, 0, 0, 0, 0, 5,
  0, 0, 0, 0, 5, 5,
  0, 0, 0, 5, 5, 5,
  5, 5, 0, 0, 0, 0,
  5, 0, 0, 0, 0, 0
), byrow=TRUE, nrow=5)

rownames(m) <- paste0("g", 1:5)
colnames(m) <- paste0("s", 1:6)

sign <- BioQC::SignedGenesets(list(
  list(name="grows left to right", positive = c("g1", "g2", "g3"), negative = c()),
  list(name="grows left to right with neg", positive = c("g1", "g2", "g3"), negative = c("g4", "g5"))
))

BioQC::wmwTest(m, sign, valType="U1")
#>                              s1   s2 s3 s4 s5 s6
#> grows left to right           0  1.5  3  4  5  6
#> grows left to right with neg -6 -3.0  0  2  4  6
BioQC::wmwTest(m, sign, valType="r")
#>                                s1   s2  s3        s4        s5  s6
#> grows left to right            -1 -0.5   0 0.3333333 0.6666667   1
#> grows left to right with neg -Inf -Inf NaN       Inf       Inf Inf
BioQC::wmwTest(m, sign, valType="f")
#>                                s1   s2  s3        s4        s5  s6
#> grows left to right             0 0.25 0.5 0.6666667 0.8333333   1
#> grows left to right with neg -Inf -Inf NaN       Inf       Inf Inf

Created on 2021-08-12 by the reprex package (v2.0.1)
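
One possible explanation, not checked against the BioQC source: the unsigned outputs above are consistent with the usual definitions f = U1 / (n1 * n2) and r = 2f - 1, where n1 is the number of genes in the set and n2 the number of remaining (background) genes. In the second signature all five genes of the matrix are in the set, so the background would be empty (n2 = 0) and the division would produce exactly the `-Inf`/`NaN`/`Inf` pattern shown. The base-R sketch below only reproduces that arithmetic from the printed U1 values; the variable names are made up for illustration.

```r
# U1 values copied from the reprex output above.
u1_unsigned <- c(0, 1.5, 3, 4, 5, 6)    # "grows left to right"
u1_signed   <- c(-6, -3, 0, 2, 4, 6)    # "grows left to right with neg"

# Unsigned set: 3 genes in the set, 2 background genes.
f_unsigned <- u1_unsigned / (3 * 2)
r_unsigned <- 2 * f_unsigned - 1
print(f_unsigned)   # compare with the valType="f" row above
print(r_unsigned)   # compare with the valType="r" row above

# Signed set: all 5 genes are in the set, so the background is empty.
# ASSUMPTION: the denominator becomes n1 * n2 = 5 * 0.
f_signed <- u1_signed / (5 * 0)
print(f_signed)     # -Inf -Inf NaN Inf Inf Inf
```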

U value inverted?

Hi David,

I noticed that the U value computed by wmwTest seems to be inverted. As a result, the r and f values are inverted as well.

Please find a reprex below.

What do you think is the right solution for this?

  1. Update only the behavior of the r and f scores?
  2. Also update the U score computation and potentially break compatibility? E.g., with some warning messages?
  3. Add a flag inverted_U=TRUE, which is default?
  4. Something else?
library(BioQC)
#> Loading required package: Biobase
#> Loading required package: BiocGenerics
#> Loading required package: parallel
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:parallel':
#> 
#>     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
#>     clusterExport, clusterMap, parApply, parCapply, parLapply,
#>     parLapplyLB, parRapply, parSapply, parSapplyLB
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, append, as.data.frame, basename, cbind, colnames,
#>     dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
#>     grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
#>     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
#>     rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
#>     union, unique, unsplit, which, which.max, which.min
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
set.seed(10)
x <- rnorm(10, 1)
y <- rnorm(100, 0)

wilcox.test(x, y, alternative='greater')
#> 
#>  Wilcoxon rank sum test with continuity correction
#> 
#> data:  x and y
#> W = 697, p-value = 0.02052
#> alternative hypothesis: true location shift is greater than 0
# p.value is computed correctly
wmwTest(c(x, y), seq_along(x), valType='p.greater')
#> [1] 0.02052079
# U/W value seems to be inverted
wmwTest(c(x, y), seq_along(x), valType='U')
#> [1] 303
# let's invert and check
wmwTest(c(y, x), seq_along(y), valType='U')
#> [1] 697

Created on 2020-12-07 by the reprex package (v0.3.0)
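
The two numbers in the reprex are complementary: 697 + 303 = 1000 = 10 × 100, which is the standard identity U_x + U_y = n_x · n_y for the two Mann-Whitney statistics. The base-R sketch below (no BioQC required; variable names are illustrative) computes both statistics from ranks and checks that the one for `x` is the `W` reported by `wilcox.test()`.

```r
set.seed(10)
x <- rnorm(10, 1)
y <- rnorm(100, 0)
n_x <- length(x)
n_y <- length(y)

# Rank all observations jointly, then subtract the minimum possible rank sum.
ranks <- rank(c(x, y))
u_x <- sum(ranks[seq_len(n_x)]) - n_x * (n_x + 1) / 2
u_y <- sum(ranks[n_x + seq_len(n_y)]) - n_y * (n_y + 1) / 2

# W from wilcox.test(x, y) is the statistic of the first sample.
w <- unname(wilcox.test(x, y, alternative = "greater")$statistic)

stopifnot(isTRUE(all.equal(u_x, w)))
stopifnot(isTRUE(all.equal(u_x + u_y, n_x * n_y)))
```

So converting between the two conventions is a matter of `n_x * n_y - U`, whichever option is chosen for the package.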

BioQC::gini cannot handle negative values

By definition, the Gini index can only be calculated on non-negative values.
The problem can be fixed by shifting or normalizing the values.
Alternatively, an error should be thrown. At the moment, a nonsensical value is returned.
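
A sketch of the "throw an error" option is shown below. `checked_gini()` is a hypothetical helper written for illustration in base R; it is not the implementation of `BioQC::gini`, and the error message wording is an assumption.

```r
# Hypothetical helper, not BioQC::gini itself.
checked_gini <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) stop("no non-missing values")
  if (any(x < 0)) {
    stop("Gini index is undefined for negative values; ",
         "shift or filter the input first")
  }
  if (sum(x) == 0) return(NaN)  # total is zero, so the ratio is undefined
  x <- sort(x)
  n <- length(x)
  # Rank-weighted formula on the sorted values.
  2 * sum(seq_len(n) * x) / (n * sum(x)) - (n + 1) / n
}

checked_gini(c(1, 1, 1, 1))   # perfectly even
checked_gini(c(0, 0, 0, 1))   # maximally uneven for n = 4
```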

Problem with GMT files imported from the Broad Institute

I am trying to use a GMT file from the Broad Institute with wmwTest instead of the GMT file included in the package, but I get Error in FUN(X[[i]], ...) : index must be either integer vector in R. I tried two ways of importing the GMT file, shown below; neither was successful.

gmt <- Sys.glob("E:/files/Rspace/geneset.gmt")
gmt<- read.gmt(gmt)
and
gmt <- Sys.glob("E:/files/Rspace/geneset.gmt")
gmt<- getGmt(gmt)

GMT files can be found here: http://software.broadinstitute.org/gsea/downloads.jsp
Any solution?
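
One possible workaround, assuming the error arises because `read.gmt()` and `getGmt()` return objects that `wmwTest` does not treat as gene-set indices: either read the file with BioQC's own `readGmt()` (used in another issue on this page), or convert the gene sets to a list of integer row indices yourself. The sketch below does the latter in base R; `read_gmt_as_list()` and `to_index_list()` are hypothetical helpers, and the tiny GMT file is made up so that the example runs on its own.

```r
# Parse a GMT file (tab-separated: name, description, genes...) into a
# named list of character vectors.
read_gmt_as_list <- function(path) {
  fields <- strsplit(readLines(path), "\t", fixed = TRUE)
  sets <- lapply(fields, function(f) f[-(1:2)])
  names(sets) <- vapply(fields, function(f) f[1], character(1))
  sets
}

# Translate gene symbols into integer row indices of the expression matrix,
# dropping genes that are not present.
to_index_list <- function(sets, genes) {
  lapply(sets, function(s) {
    idx <- match(s, genes)
    idx[!is.na(idx)]
  })
}

# Made-up demo file and gene universe.
gmt_path <- tempfile(fileext = ".gmt")
writeLines(c("SET_A\tdemo\tg1\tg2\tg9", "SET_B\tdemo\tg3"), gmt_path)
genes <- paste0("g", 1:5)

idx <- to_index_list(read_gmt_as_list(gmt_path), genes)
str(idx)

# With BioQC installed, the index list could then be passed on, e.g.:
# BioQC::wmwTest(expr_matrix, idx, valType = "p.greater")
```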

Use of BioQC package to compute Shannon entropy on transcriptomic data

Hi,

I became really interested in your package because it integrates the metrics from the paper by Martínez and Reyes-Valdés (PNAS 2008): Shannon index, diversity, and specificity of gene expression. Do you have a tutorial showing how to compute the adapted Shannon entropy, gene specificity, and tissue specialization as they are defined and applied in that paper?

Thanks,
Lucile
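
In the absence of a tutorial, the three quantities can be sketched in base R as follows. This is written from my reading of the definitions in the 2008 paper (diversity H_j = -Σ_i p_ij log2 p_ij, gene specificity S_i = (1/t) Σ_j (p_ij / p_i) log2(p_ij / p_i) with p_i the mean of p_ij over the t tissues, and specialization δ_j = Σ_i p_ij S_i); it is not BioQC's implementation, and `entropy_metrics()` and `xlog2x()` are hypothetical names.

```r
# x * log2(x) with the convention 0 * log2(0) = 0.
xlog2x <- function(x) ifelse(x > 0, x * log2(x), 0)

# m: non-negative expression matrix, genes in rows, tissues in columns,
# every column with a positive total.
entropy_metrics <- function(m) {
  p <- sweep(m, 2, colSums(m), "/")       # relative frequency per tissue
  diversity <- -colSums(xlog2x(p))        # H_j, one value per tissue
  p_mean <- rowMeans(p)                   # average frequency of each gene
  ratio <- p / p_mean
  ratio[!is.finite(ratio)] <- 0           # genes that are zero everywhere
  specificity <- rowMeans(xlog2x(ratio))  # S_i, one value per gene
  specialization <- colSums(p * specificity)  # delta_j, one value per tissue
  list(diversity = diversity,
       specificity = specificity,
       specialization = specialization)
}

# Toy data: g1/g2 are expressed evenly, g3/g4 are tissue-specific.
m <- rbind(g1 = c(1, 1), g2 = c(1, 1), g3 = c(2, 0), g4 = c(0, 2))
colnames(m) <- c("t1", "t2")
res <- entropy_metrics(m)
res
```

With two tissues, a gene expressed in only one of them reaches the maximum specificity of log2(2) = 1, while evenly expressed genes score 0.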

Subsetting a non-existent signature should return an error?

Hi @Accio ,

I noticed that the following code works (and returns an invalid GmtList).

suppressPackageStartupMessages(library(BioQC))
gmtFile <- system.file("extdata/exp.tissuemark.affy.roche.symbols.gmt", package="BioQC")
gmt <- readGmt(gmtFile)
gmt[c("Adipose_NGS_RNASEQATLAS_0.6_3", "NONEXISTeNT", "Brain_NGS_RNASEQATLAS_0.6_3")]
#> A gene-set list in GMT format with 3 genesets
#> Gene-sets:
#>   Adipose_NGS_RNASEQATLAS_0.6_3 (Roche,n=28): THRSP,KRT14,APOB,...
#> character(0)
#>   Brain_NGS_RNASEQATLAS_0.6_3 (Roche,n=387): MIR219-2,MIR378B,SLC17A6,...

Created on 2022-01-25 by the reprex package (v2.0.1)

Sharing this in case you would like to add some kind of warning for this case. Please feel free to close the issue if you consider this expected behavior.
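
Until the package decides how to handle this, a user-side guard could look like the sketch below. `strict_subset()` is a hypothetical helper, demonstrated here on a plain named list; it relies only on `names()` and character subsetting, which the reprex above suggests also work on a GmtList, but that is an assumption.

```r
# Hypothetical helper: fail loudly instead of returning empty entries.
strict_subset <- function(x, keys) {
  missing_keys <- setdiff(keys, names(x))
  if (length(missing_keys) > 0) {
    stop("unknown gene set(s): ", paste(missing_keys, collapse = ", "))
  }
  x[keys]
}

sets <- list(Adipose = c("THRSP", "KRT14"), Brain = c("SLC17A6"))
strict_subset(sets, c("Adipose", "Brain"))           # returns both sets
try(strict_subset(sets, c("Adipose", "NONEXISTeNT")))  # stops with an error
```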

names(GmtList) will remove names

Reported by I. Davydov

library(BioQC)
gmt_file <- system.file("extdata/exp.tissuemark.affy.roche.symbols.gmt", package="BioQC")
names(readGmt(gmt_file))
names(GmtList(readGmt(gmt_file)))

Check issues

  • checking for missing documentation entries ... WARNING
    Undocumented code objects:
    ‘IndexList’ ‘SignedIndexList’ ‘matchGenes’ ‘offset<-’ ‘valTypes’
    Undocumented S4 classes:
    ‘SignedIndexList’
    Undocumented S4 methods:
    generic 'SignedIndexList' and siglist 'list'
    generic 'matchGenes' and siglist 'GmtList,character'
    generic 'matchGenes' and siglist 'GmtList,eSet'
    generic 'matchGenes' and siglist 'GmtList,matrix'
    generic 'matchGenes' and siglist 'SignedGenesets,character'
    generic 'matchGenes' and siglist 'SignedGenesets,eSet'
    generic 'matchGenes' and siglist 'SignedGenesets,matrix'
    generic 'matchGenes' and siglist 'character,character'
    generic 'matchGenes' and siglist 'character,eSet'
    generic 'matchGenes' and siglist 'character,matrix'
    generic 'offset<-' and siglist 'IndexList,numeric'
    generic 'offset<-' and siglist 'SignedIndexList,numeric'
    generic 'wmwTest' and siglist 'ANY,list'
    generic 'wmwTest' and siglist 'ANY,logical'
    generic 'wmwTest' and siglist 'ANY,numeric'
    generic 'wmwTest' and siglist 'eSet,GmtList'
    generic 'wmwTest' and siglist 'eSet,SignedIndexList'
    generic 'wmwTest' and siglist 'eSet,list'
    generic 'wmwTest' and siglist 'eSet,logical'
    generic 'wmwTest' and siglist 'eSet,numeric'
    generic 'wmwTest' and siglist 'matrix,GmtList'
    generic 'wmwTest' and siglist 'matrix,IndexList'
    generic 'wmwTest' and siglist 'matrix,SignedIndexList'
    generic 'wmwTest' and siglist 'numeric,IndexList'
    generic 'wmwTest' and siglist 'numeric,SignedIndexList'

  • checking for code/documentation mismatches ... WARNING
    Codoc mismatches from documentation object 'wmwTest':
    wmwTest
    Code: function(object, indexList, ...)
    Docs: function(x, ind.list, valType = c("p.greater", "p.less",
    "p.two.sided", "U", "abs.log10p.greater",
    "log10p.less", "abs.log10p.two.sided", "Q"), simplify
    = TRUE)
    Argument names in code not in docs:
    object indexList ...
    Argument names in docs not in code:
    x ind.list valType simplify
    Mismatches in argument names:
    Position: 1 Code: object Docs: x
    Position: 2 Code: indexList Docs: ind.list
    Position: 3 Code: ... Docs: valType

  • checking Rd \usage sections ... WARNING
    Undocumented arguments in documentation object 'IndexList,list-method'
    ‘keepNA’ ‘keepDup’

    Undocumented arguments in documentation object 'IndexList,logical-method'
    ‘keepNA’ ‘keepDup’

    Undocumented arguments in documentation object 'IndexList,numeric-method'
    ‘keepNA’ ‘keepDup’

    Undocumented arguments in documentation object 'as.gmtlist'
    ‘list’ ‘description’
    Documented arguments not in \usage in documentation object 'as.gmtlist':
    ‘list:’ ‘description:’

    Undocumented arguments in documentation object 'matchGenes.signedDefault'
    ‘signedGenesets’
    Documented arguments not in \usage in documentation object 'matchGenes.signedDefault':
    ‘gmtList’

    Undocumented arguments in documentation object 'parseIndex'
    ‘x’
    Documented arguments not in \usage in documentation object 'parseIndex':
    ‘list’ ‘offset’

    Undocumented arguments in documentation object 'readSignedGmt'
    ‘posPattern’ ‘negPattern’ ‘nomatch’
    Documented arguments not in \usage in documentation object 'readSignedGmt':
    ‘posPattern:’ ‘negPattern:’ ‘nomatch:’

    Functions with \usage entries need to have the appropriate \alias
    entries, and all their arguments documented.
    The \usage entries must correspond to syntactically valid R code.
    See chapter ‘Writing R documentation files’ in the ‘Writing R
    Extensions’ manual.

00check-20161108.txt

Error when running

Hello,

I'm getting the error below when running the following call from the command line:

wmwTest(x = t(data), indexList = (trait==names(table(trait))[1]), valType="p.two.sided", simplify = TRUE)

I don't get the error when I run it from RStudio. The sessionInfo seems to be the same. Have you seen this before?

Error in .setupMethodsTables(fdef, initialize = TRUE) :
  trying to get slot "group" from an object of a basic class ("NULL") with no slots
Calls: filterfeaturesK ... validityMethod -> as -> .getMethodsTable -> .setupMethodsTables
Execution halted
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /share/apps/R-latest/lib64/R/lib/libRblas.so
LAPACK: /share/apps/R-latest/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] BioQC_1.6.0         Biobase_2.38.0      BiocGenerics_0.24.0
[4] Rcpp_0.12.15

loaded via a namespace (and not attached):
[1] compiler_3.4.3 tools_3.4.3
