mixomicsteam / mixomics

Development repository for the Bioconductor package 'mixOmics'

R 94.13% Dockerfile 0.03% TeX 5.19% HTML 0.65%

bioconductor genomics genomics-data genomics-visualization multivariate-analysis multivariate-statistics omics r r-package r-pkg r-project rstats rstats-package

mixomics's People

Forkers

frohart llrs jonas-hag jdiray cashoes abodein forked-packages blueskypie daniilsarkisyan pjx1990 birsbiointegration clinicopath tuhulab oliver-xie tysoncung aljabadi jkvalentina sdjmchattie mychan24 lsho76 benfeitas khemlalnirmalkar brandonmurugan lemengdong limeng12 clabornd sparg-uk xutongran vincent-van-hoef sheenaseven feigeliudan01 leila-ghalebani hanhanele ning-l zeehio juliedelanote pageneck jamesjiadazhan shicheng-guo hyzhou1990 murugesanraj lafontrapnouiltristan d-morrison topepo keunbae318 trintala aktermitu13 yaohuiding1 brendanlu

mixomics's Issues

installing trouble

I would like to install mixOmics, but I get many error messages, in particular:
The downloaded source packages are in
‘C:\Users\xx\AppData\Local\Temp\RtmpEHj6U3\downloaded_packages’
Warning messages:
1: In install.packages(...) :
installation of package ‘mixOmics’ had non-zero exit status
2: In install.packages(update[instlib == l, "Package"], l, repos = repos, :
installation of package ‘emmeans’ had non-zero exit status
3: In install.packages(update[instlib == l, "Package"], l, repos = repos, :
installation of package ‘rmarkdown’ had non-zero exit status

I tried to install emmeans and rmarkdown manually, but got another error:

install.packages("C:/Users/Emmanuelle Kesse/Desktop/tempo/rmarkdown_1.18.tar.gz", repos = NULL, type = "source")
Installing package into ‘C:/Users/Emmanuelle Kesse/Documents/R/win-library/3.6’
(as ‘lib’ is unspecified)

installing source package 'rmarkdown' ...
** package 'rmarkdown' successfully unpacked and MD5 sums checked
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
Fatal error: cannot open file 'C:\Users\Emmanuelle': No such file or directory

ERROR: lazy loading failed for package 'rmarkdown'

removing 'C:/Users/xx/Documents/R/win-library/3.6/rmarkdown'
restoring previous 'C:/Users/Exxx/Documents/R/win-library/3.6/rmarkdown'
Warning in install.packages :
installation of package ‘C:/Users/xx/Desktop/tempo/rmarkdown_1.18.tar.gz’ had non-zero exit status
Could you help me?

Extracting the PLS-DA embedding function

Hi, I am currently trying to use the PLS-DA function for dimensionality reduction (not classification) and am therefore just interested in extracting the components that it generates. I was wondering if there is an easy way to project new test data into this reduced dimension embedding (the embedding is based only on the training data).
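A minimal base-R sketch of the underlying idea, for the first component only and under stated assumptions: X and the dummy-coded Y are centred and scaled with training statistics only, and the first X weight vector is taken as the leading left singular vector of t(X) %*% Y, which is the usual first PLS step. This is an illustration, not mixOmics' implementation, and `project_first_comp` is a hypothetical name. Within mixOmics, the `predict()` help page for PLS-DA objects describes a `variates` component for `newdata`, which is likely the simpler route.

```r
# Hypothetical helper: project training and test samples onto the first
# PLS-DA-style component, using training-set statistics only.
project_first_comp <- function(X.train, y.train, X.test) {
  y.train <- droplevels(as.factor(y.train))
  # centre and scale X with the training means and standard deviations
  Xs <- scale(X.train)
  ctr <- attr(Xs, "scaled:center")
  scl <- attr(Xs, "scaled:scale")
  # dummy-code the class labels (one indicator column per class), then scale
  Y <- scale(model.matrix(~ y.train - 1))
  # first X weight vector: leading left singular vector of t(Xs) %*% Y
  w <- svd(crossprod(Xs, Y), nu = 1, nv = 0)$u
  # apply the training centring/scaling to the test data before projecting
  Xt <- scale(X.test, center = ctr, scale = scl)
  list(weights = w, train = Xs %*% w, test = Xt %*% w)
}

# usage on a built-in data set: fit on odd rows, project even rows
X <- as.matrix(iris[, 1:4])
train <- seq(1, nrow(X), by = 2)
res <- project_first_comp(X[train, ], iris$Species[train], X[-train, ])
head(res$test)
```

The key design point is that the test set never contributes to the means, standard deviations or weights, so the embedding is defined by the training data alone.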

Classification error rate (perf.diablo)

Hi,

When I try to assess global performance and choose the number of components, I get high classification error rates, between 0.4 and 0.55, and they do not decrease when more components are added. What are the possible reasons, and how can I solve this? Thank you for the help and the package.

Best,

Error in plotting perf results from splsda

Describe the bug
When trying to plot the results from perf(), I get an error with both my data and the example data. I ran these lines from the case study:
library(mixOmics)
data(liver.toxicity)
X <- as.matrix(liver.toxicity$gene)
Y <- as.factor(liver.toxicity$treatment[, 4])
plsda.res <- plsda(X, Y, ncomp = 5)
set.seed(2543) # for reproducibility here, only when the `cpus' argument is not used
perf.plsda <- perf(plsda.res, validation = "Mfold", folds = 5,
progressBar = FALSE, auc = TRUE, nrepeat = 10)

perf.plsda$error.rate # error rates

plot(perf.plsda, col = color.mixo(1:3), sd = TRUE, legend.position = "horizontal")

Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf

Output of sessionInfo():

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
 [1] parallel  stats4    grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] stringr_1.4.0         Hmisc_4.2-0           Formula_1.2-3         survival_2.43-3       org.Mm.eg.db_3.7.0    AnnotationDbi_1.44.0  IRanges_2.16.0       
 [8] S4Vectors_0.20.1      Biobase_2.40.0        BiocGenerics_0.28.0   biomaRt_2.38.0        clusterProfiler_3.8.1 heatmap3_1.1.6        gplots_3.0.1.1       
[15] edgeR_3.22.5          limma_3.36.5          mixOmics_6.6.2        lattice_0.20-38       MASS_7.3-51.1         ggplot2_3.1.0         FSA_0.8.22           
[22] pander_0.6.3          png_0.1-7             xtable_1.8-4          knitr_1.21           

loaded via a namespace (and not attached):
  [1] backports_1.1.3     fastmatch_1.1-0     plyr_1.8.4          igraph_1.2.2        lazyeval_0.2.1      splines_3.5.1       BiocParallel_1.16.6 urltools_1.7.3     
  [9] digest_0.6.18       htmltools_0.3.6     GOSemSim_2.8.0      viridis_0.5.1       GO.db_3.7.0         gdata_2.18.0        magrittr_1.5        checkmate_1.9.1    
 [17] memoise_1.1.0       cluster_2.0.7-1     fastcluster_1.1.25  graphlayouts_0.5.0  matrixStats_0.54.0  rARPACK_0.11-0      enrichplot_1.2.0    prettyunits_1.0.2  
 [25] colorspace_1.4-0    blob_1.1.1          ggrepel_0.8.0       xfun_0.4            dplyr_0.8.3         crayon_1.3.4        RCurl_1.95-4.12     jsonlite_1.6       
 [33] glue_1.3.0          polyclip_1.10-0     gtable_0.2.0        UpSetR_1.4.0        scales_1.0.0        DOSE_3.8.2          DBI_1.0.0           Rcpp_1.0.2         
 [41] viridisLite_0.3.0   progress_1.2.0      htmlTable_1.13.1    gridGraphics_0.4-1  foreign_0.8-71      bit_1.1-14          europepmc_0.3       htmlwidgets_1.3    
 [49] httr_1.4.0          fgsea_1.8.0         RColorBrewer_1.1-2  acepack_1.4.1       pkgconfig_2.0.2     XML_3.98-1.16       farver_1.1.0        nnet_7.3-12        
 [57] locfit_1.5-9.1      ggplotify_0.0.4     tidyselect_0.2.5    labeling_0.3        rlang_0.4.0         reshape2_1.4.3      munsell_0.5.0       tools_3.5.1        
 [65] RSQLite_2.1.1       ggridges_0.5.1      evaluate_0.12       yaml_2.2.0          bit64_0.9-7         tidygraph_1.1.2     caTools_1.17.1.1    purrr_0.2.5        
 [73] ggraph_2.0.0        DO.db_2.9           xml2_1.2.0          compiler_3.5.1      rstudioapi_0.9.0    curl_3.3            tibble_2.0.1        tweenr_1.0.1       
 [81] stringi_1.2.4       RSpectra_0.13-1     Matrix_1.2-15       pillar_1.3.1        BiocManager_1.30.7  triebeard_0.3.0     data.table_1.12.0   cowplot_0.9.4      
 [89] bitops_1.0-6        corpcor_1.6.9       qvalue_2.14.1       R6_2.3.0            latticeExtra_0.6-28 KernSmooth_2.23-15  gridExtra_2.3       gtools_3.8.1       
 [97] assertthat_0.2.0    withr_2.1.2         hms_0.4.2           rpart_4.1-13        tidyr_0.8.2         rmarkdown_1.11      rvcheck_0.1.5       ggforce_0.3.1      
[105] base64enc_0.1-3     ellipse_0.4.1

inconsistent error rates when using perf.mint.splsda and tune.mint.splsda

The performance of the tune.mint.splsda model at optimum hyperparameters:

data(stemcells)
X = stemcells$gene
Y = stemcells$celltype
study <- stemcells$study
tune.mint = tune.mint.splsda(X = X, Y = Y, study = study, ncomp = 2, test.keepX = seq(1, 100, 5),
                 dist = "max.dist", progressBar = FALSE)
plot(tune.mint)

Should be similar to that of perf.mint.splsda using the same hyperparameters:

mint.splsda.res = mint.splsda(X = X, Y = Y, study = study, ncomp = 2,
                              keepX = tune.mint$choice.keepX)

mint.splsda.res # lists useful functions that can be used with a MINT object

perf.mint = perf.mint.splsda(mint.splsda.res, progressBar = FALSE, dist = 'max.dist')

plot(perf.mint)

A possible solution is to ensure that LOGOCV and perf.mint.splsda (and possibly other perf functions) call the same internal function that performs the dev/test split across studies, and then to make sure their outputs are identical as well.
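As a sketch of the shared-internal idea: if both tuning and performance assessment drew their partitions from one helper, such as the hypothetical `logo_splits()` below (not an existing mixOmics function), their leave-one-study-out dev/test splits would be identical by construction.

```r
# Hypothetical helper: leave-one-group-out (here, leave-one-study-out) splits
logo_splits <- function(study) {
  study <- as.factor(study)
  # one split per study: that study is the test set, all others are training
  lapply(setNames(levels(study), levels(study)), function(s) {
    list(train = which(study != s), test = which(study == s))
  })
}

study <- c("s1", "s1", "s2", "s3", "s3", "s3")
splits <- logo_splits(study)
splits$s2$test   # 3
```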

plot.tune.rcc instead of image.tune.rcc

Dear mixOmics team,

It would be great to have a plot.tune.rcc method instead of image.tune.rcc. That would make it easier to plot objects of class tune.rcc, and you already have similar methods for the classes tune.splsda and tune.block.splsda in your package. If wanted, I'm happy to provide a pull request.

Thanks
Jonas

Feature request: add default values for non-compulsory arguments (e.g., keepX, keepY)

In sparse methods, e.g., sPLS-DA (splsda()), could default values be set for arguments that are not compulsory? In particular, could it be keepX = NULL, keepY = NULL? It would then be easier to figure out which arguments are really the most important, and it would prevent syntax-checking and linting warnings like this one:

Possible solution: as I understand it, enabling this functionality would only require changing a few lines in the internal function Check.entry.pls(). E.g., instead of:

    if (missing(keepX)) {

these lines could be used:

    if (missing(keepX) || is.null(keepX)) {

It's crucial to use || here and not |, as the latter will result in an error if keepX is missing.
The same idea could be applied to keepY.

Is there any reason why this functionality could not be implemented in mixOmics?
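A base-R illustration of why `||` matters in the proposed check, using a hypothetical stand-in for the keepX test (the real Check.entry.pls() does more than this):

```r
# Hypothetical stand-in for the keepX check in Check.entry.pls()
check_keepX <- function(keepX) {
  # `||` short-circuits: is.null(keepX) is never evaluated when keepX is missing
  if (missing(keepX) || is.null(keepX)) "use default" else "use supplied"
}

# the same check with the vectorised `|`, which evaluates both sides
check_keepX_bad <- function(keepX) {
  if (missing(keepX) | is.null(keepX)) "use default" else "use supplied"
}

check_keepX()            # "use default"
check_keepX(NULL)        # "use default"
check_keepX(c(10, 5))    # "use supplied"
# check_keepX_bad() fails: evaluating is.null(keepX) forces the missing argument
```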

Predictions with DIABLO: failure if old and new data set don't have the same set of markers

If a DIABLO model is tested with predict() on a different data set that does not have the same genes and/or other markers such as miRNAs (an example is TCGA, which uses very old miRNA annotations from miRBase), the prediction fails with a rather cryptic error.

A look at the code shows that this is done on purpose:

stop("Each 'newdata[[i]]' must include all the variables of 'object$X[[i]]'")

However, this can be problematic when one wants to compare data obtained from different platforms: for example, we have DIABLO predictions made on gene expression and miRNA microarrays built later than those used by TCGA, and thus testing the DIABLO prediction in TCGA itself is not possible.

Add AUC output to perf.diablo

Currently perf.diablo accepts an auc = TRUE/FALSE argument but does not calculate the AUC.

Mixomics on R

Hi, I have downloaded R version 3.5.2 and have since tried to install the mixOmics package using: install.packages("mixOmics")

In response I get this error: "Warning in install.packages : package ‘mixOmics’ is not available (for R version 3.5.2)"

Do I need an older version of R, or what should I do?

Thanks,

Shimon

cim heatmap values

Hi mixOmics team,

Firstly, thank you very much for curating / distributing the package. It's really useful and the graphics are spot on.

I'm particularly fond of the clustered image heatmaps, but I have noticed one issue; perhaps you could suggest a workaround or point out anything I've missed. When using 'cim' to illustrate correlations between features and samples from a centered log-ratio transformed feature table, I get a bizarre scale for the colour key (see below).

Where 'combine_otu' is a log2 transformed, normalised feature table and 'Y_clin' is a character string defining subject class:
combine.splsda = plsda(X = t(combine_otu), Y = Y_clin, ncomp = 3, logratio = 'CLR')
heat = cim(combine.splsda, cluster = "both", transpose = TRUE, center = TRUE)

Produces a 'cim' plot where the colour key scales from -2.61 to 2.61.

I'm assuming this is due to the arbitrary scale of the log-ratio transformed data, but it would make much more sense if I could rescale it to between -1 and 1.

I can see no way to perform this rescaling within the 'cim' function call, so I tried to rescale the splsda matrix myself using the 'scales' package:
combine.splsda$X = rescale(combine.splsda$X, to = c(-1, 1))
Producing the error:
Error in UseMethod("rescale") : no applicable method for 'rescale' applied to an object of class "clr"

So I tried to coerce the matrix to a data frame to allow me to use rescale:
splsda.mat = as.matrix(combine.splsda$X)
splsda.mat = as.data.frame(splsda.mat)
Which also produced an error:
Error in as.data.frame.default(splsda.mat) : cannot coerce class "clr" to a data.frame

Any help would be massively appreciated!

Many thanks,
Greg
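Both errors come from the extra "clr" class: neither `rescale` nor `as.data.frame` has a method for it. A base-R sketch on a toy object (a stand-in for combine.splsda$X, assumed to be a matrix carrying that class) shows that `unclass()` removes the class while keeping the matrix dimensions. It then rescales by dividing by the largest absolute value, a choice that keeps 0 at 0 for a diverging colour key. Whether rescaling the input this way changes the key that cim draws is not established here.

```r
# toy stand-in for an object carrying the "clr" class
m <- structure(matrix(c(-2.61, 0, 1.3, 2.61), nrow = 2), class = "clr")
try(as.data.frame(m))        # fails: no as.data.frame method for "clr"
plain <- unclass(m)          # drop the class; the dim attribute is kept
df <- as.data.frame(plain)   # now coercion works

# zero-preserving rescaling into [-1, 1] without extra packages
rescale_sym <- function(x) x / max(abs(x))
range(rescale_sym(plain))    # -1 1
```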

Similarity matrix / association between variables as output

As requested by several users, we need to make the similarity matrices computed by plotVar and circosPlot for PLS / CCA / DIABLO objects available as output.

Legend: plotIndiv overrides levels of factor

Notice that the levels of the factor "Family" in the des.df data.frame are 1, 2, ..., 10, in that order. However, plotIndiv() ignores this and reorders the levels as 1,10,2,3,...,9. Can this be corrected?

des.df$Family <- factor(des.df$Family, levels = 1:10)

levels(des.df$Family)
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"

plotIndiv(pca.result, ind.names = FALSE,
group = des.df$Treatment, pch = des.df$Family,
legend = TRUE, legend.title = 'CO2 Treatment',
legend.title.pch = "Family")
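The observed order 1, 10, 2, ..., 9 is what base R produces when a factor is rebuilt from character strings, because `factor()` then sorts the labels as text. A sketch of how this can happen, assuming (not verified against plotIndiv's source) that the factor is converted to character somewhere internally, and of how re-applying the original levels avoids it:

```r
family <- factor(rep(1:10, each = 2), levels = 1:10)
levels(family)
# "1" "2" ... "10"

# re-creating the factor from character strings sorts the labels as text
relevelled <- factor(as.character(family))
levels(relevelled)
# "1" "10" "2" "3" "4" "5" "6" "7" "8" "9"

# keeping the original level order when converting
kept <- factor(as.character(family), levels = levels(family))
levels(kept)
```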

Error in n() : could not find function "n"

Hi,

I am new to mixOmics and I was following the tutorial from:
http://mixomics.org/mixdiablo/case-study-tcga/

I downloaded the code and I am getting this problem:

perf.diablo = perf(sgccda.res, validation = 'Mfold', folds = 10, nrepeat = 10)
Error in n() : could not find function "n"

Here is my sessionInfo:

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] mixOmics_6.6.1 ggplot2_3.1.0 lattice_0.20-38 MASS_7.3-51.1 knitr_1.21

loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 RSpectra_0.13-1 pillar_1.3.1 compiler_3.5.1 RColorBrewer_1.1-2
[6] plyr_1.8.4 tools_3.5.1 tibble_2.0.1 gtable_0.2.0 pkgconfig_2.0.2
[11] rlang_0.3.1 Matrix_1.2-15 igraph_1.2.4 rstudioapi_0.9.0 yaml_2.2.0
[16] parallel_3.5.1 xfun_0.5 gridExtra_2.3 withr_2.1.2 dplyr_0.8.0.1
[21] stringr_1.4.0 grid_3.5.1 tidyselect_0.2.5 glue_1.3.0 ellipse_0.4.1
[26] R6_2.4.0 rARPACK_0.11-0 purrr_0.3.0 tidyr_0.8.2 reshape2_1.4.3
[31] corpcor_1.6.9 magrittr_1.5 scales_1.0.0 matrixStats_0.54.0 assertthat_0.2.0
[36] colorspace_1.4-0 stringi_1.3.1 lazyeval_0.2.1 munsell_0.5.0 crayon_1.3.4

Thanks,
João

fix/update website links

From Bitbucket.

block.methods consensus plot

Could we add the following option for the multi-omics integration methods (2+ data sets):

rep.space = 'consensus'

This would be the average of the components across all data sets.

Thanks Al :)

mint.splsda variates not to have space in column names

Hi Florian,

Since some users might want to use the outputs of mint.splsda in their own analyses, I think it would be a good idea to replace spaces with underscores in column names, to avoid having to use backticks in ggplot2, as in the last line below.

library(mixOmics)
library(ggplot2)
data(stemcells)

# -- feature selection
res = mint.splsda(X = stemcells$gene, Y = stemcells$celltype, ncomp = 3, keepX = c(10, 5, 15),
                  study = stemcells$study)

variates = as.data.frame(res$variates$X)
ggplot(variates, aes(`comp 1`, `comp 2`)) + geom_point()
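Until the column names change, a user-side workaround is to rename the columns before plotting. A base-R sketch, where the toy data frame stands in for `as.data.frame(res$variates$X)`:

```r
# toy stand-in for as.data.frame(res$variates$X), with spaces in the names
variates <- data.frame(`comp 1` = c(0.1, -0.4), `comp 2` = c(1.2, 0.3),
                       check.names = FALSE)
# replace spaces with underscores so ggplot2 aes() needs no backticks
colnames(variates) <- gsub(" ", "_", colnames(variates), fixed = TRUE)
colnames(variates)
# then: ggplot(variates, aes(comp_1, comp_2)) + geom_point()
```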

memory issue when applying PCA/NIPALS

Hi dear mixOmics community,
Using pca with the NIPALS algorithm (X has NA values!), I run into a memory issue. I should say that my X matrix has 78,000 variables (and 700 observations).
The call to mixOmics::pca() gives the following message:
Error: cannot allocate vector of size 45.9 Gb
Is there a way to tune the method so that it uses (much) less memory?
Thanks,
Philippe
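A quick arithmetic check helps locate the problem. This is an inference, not confirmed from the pca() source: a dense p × p matrix of doubles needs p² × 8 bytes, which for p = 78,000 is about 45.3 GiB, close to the 45.9 Gb reported (that figure corresponds to p of roughly 78,500). The 700 × 78,000 data matrix itself needs only about 0.4 GiB, so something of variable-by-variable size appears to be allocated.

```r
# memory (in GiB) needed for a dense numeric matrix of nrow x ncol doubles
gib_needed <- function(nrow, ncol) nrow * ncol * 8 / 1024^3

gib_needed(700, 78000)    # the data matrix X itself: about 0.41
gib_needed(78000, 78000)  # a p x p matrix: about 45.3
```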

users to be able to pass `alpha` to the internal `t.test.process` in `tune` and `perf`

In the following:

# Here is the code
data(stemcells)

# the combined data set X
X = stemcells$gene
dim(X)

# the outcome vector Y: 
Y = stemcells$celltype
length(Y)
summary(Y)
study <- stemcells$study
# tuning sPLS-DA
tune.mint = tune.mint.splsda(X = X, Y = Y, study = study, ncomp = 2, test.keepX = seq(1, 100, 5), dist = "max.dist", progressBar = FALSE)
plot(tune.mint)

tune.mint$choice.ncomp

Allow users to specify a significance threshold for the improvement test other than the default 0.01, under which a recommendation of ncomp = 2 would also make sense.
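A base-R sketch of the kind of test involved, with `alpha` exposed as an argument. The function name `choose_ncomp`, the layout of the error matrix and the one-sided paired t-test over repeats are assumptions for illustration; this is not mixOmics' internal t.test.process. The toy error rates are made up so that the improvement is significant at 0.05 but not at 0.01.

```r
# Hypothetical: error rates per repeat (rows) and component (columns);
# a component is chosen if its error is significantly lower at level alpha
choose_ncomp <- function(err, alpha = 0.01) {
  ncomp <- 1
  for (h in seq_len(ncol(err))[-1]) {
    # one-sided paired test: is the error with h components lower than
    # with the currently chosen number of components?
    p <- t.test(err[, h], err[, ncomp], paired = TRUE,
                alternative = "less")$p.value
    if (p < alpha) ncomp <- h
  }
  ncomp
}

# toy error rates: 5 repeats x 2 components (made-up numbers)
err <- cbind(comp1 = rep(0.30, 5),
             comp2 = 0.30 + c(-0.02, -0.01, 0.005, -0.015, -0.01))
choose_ncomp(err, alpha = 0.01)  # 1
choose_ncomp(err, alpha = 0.05)  # 2
```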

R2Y, Q2Y and permutation P value for plsda

Is it possible to generate a figure similar to the one below, which was produced using the R package ropls?

error when running tune.block.splsda

Describe the bug
The function tune.block.splsda doesn't work; an error is returned regarding parallelization. This also occurs if the argument cpus is set to a number larger than 2.

Using the included data set:

> library(mixOmics)
> data("breast.TCGA")
>
> # this is the X data as a list of mRNA and miRNA; the Y data set is a single data set of proteins
> data = list(mrna = breast.TCGA$data.train$mrna, mirna = breast.TCGA$data.train$mirna,
+ protein = breast.TCGA$data.train$protein)
> # set up a full design where every block is connected
> # could also consider other weights, see our mixOmics manuscript
> design = matrix(1, ncol = length(data), nrow = length(data),
+ dimnames = list(names(data), names(data)))
> diag(design) =  0
> design
        mrna mirna protein
mrna       0     1       1
mirna      1     0       1
protein    1     1       0
> # set number of component per data set
> ncomp = 5
> test.keepX = list(mrna = seq(10,40,20), mirna = seq(10,30,10), protein = seq(1,10,5))
> 
> # the following may take some time to run, note that for thorough tuning
> # nrepeat should be > 1
> tune = tune.block.splsda(X = data, Y = breast.TCGA$data.train$subtype,
+ ncomp = ncomp, test.keepX = test.keepX, design = design, nrepeat = 3)

You have provided a sequence of keepX of length: 2 for block mrna and 3 for block mirna and 2 for block protein.
This results in 12 models being fitted for each component and each nrepeat, this may take some time to run, be patient!
Error in if (cpus < 2) { : argument is of length zero
> 
> sessionInfo()
R version 3.6.1 Patched (2019-09-14 r77192)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 
 
locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mixOmics_6.10.2 ggplot2_3.2.1   lattice_0.20-38 MASS_7.3-51.4  

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3         RSpectra_0.15-0    pillar_1.4.2       compiler_3.6.1    
 [5] RColorBrewer_1.1-2 plyr_1.8.4         tools_3.6.1        zeallot_0.1.0     
 [9] tibble_2.1.3       lifecycle_0.1.0    gtable_0.3.0       pkgconfig_2.0.3   
[13] rlang_0.4.1        Matrix_1.2-17      igraph_1.2.4.1     parallel_3.6.1    
[17] gridExtra_2.3      withr_2.1.2        dplyr_0.8.3        stringr_1.4.0     
[21] vctrs_0.2.0        grid_3.6.1         tidyselect_0.2.5   glue_1.3.1        
[25] ellipse_0.4.1      R6_2.4.1           rARPACK_0.11-0     tidyr_1.0.0       
[29] purrr_0.3.3        reshape2_1.4.3     corpcor_1.6.9      magrittr_1.5      
[33] scales_1.0.0       backports_1.1.5    matrixStats_0.55.0 assertthat_0.2.1  
[37] colorspace_1.4-1   stringi_1.4.3      lazyeval_0.2.2     munsell_0.5.0     
[41] crayon_1.3.4      
>

Thanks for having a look at this.
Guido
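The message itself is standard base-R behaviour: `if()` on a zero-length value fails with exactly this error. The likely trigger, assumed here rather than confirmed from the tune.block.splsda source, is `cpus` being NULL when the argument is not supplied. A sketch of the failure and of a defensive default, with a hypothetical helper name:

```r
cpus <- NULL
msg <- tryCatch(if (cpus < 2) "serial" else "parallel",
                error = function(e) conditionMessage(e))
msg  # "argument is of length zero"

# hypothetical guard: fall back to serial execution when cpus is unset
use_parallel <- function(cpus = NULL) {
  !(is.null(cpus) || cpus < 2)
}
use_parallel()    # FALSE
use_parallel(4)   # TRUE
```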

MINT-PLS -> Not finished :'(

Hello,

I am a PhD student at the end of my thesis. For an article, I am using your MINT-PLS tool on data from several experiments, with a continuous Y variable: weight. The function mint.pls works well and allows me to plot my samples with plotIndiv. However, I would like to be able to do everything you do on http://mixomics.org/mixmint/stemcells-example/, such as testing the performance of the model, but with a continuous variable. I also want to use the predict function on a test data set as explained here: http://mixomics.org/methods/pls-da/. Unfortunately, when I use perf.mint.pls, I get this error message telling me that the function is not ready yet:
Error in perf.mint.pls (res, validation = "Mfold", folds = 5, progressBar = FALSE,:
Yet to be implemented
Do you know when it will be ready? The predict function does not work either:
Error in predict.mixo_pls (res, methyl_data, dist = "max.dist"):
'study.test' is missing

If these features will not be ready soon, do you know another way to test the performance of, and use predict on, a model from mint.pls with a continuous Y variable?

Regards,
Jérémy

UNH, UMR1019, Equipe iMPROVINg
INRA, Centre de Clermont Ferrand-Theix
Site de Theix
63122 Saint Genès Champanelle
Tel: 04 73 62 49 39

Add background.predict support for mint.splsda

Hi,

trying to perform this:

data(stemcells)
res = mint.splsda(X = stemcells$gene, Y = stemcells$celltype, ncomp = 3,
                  study = stemcells$study)


background = background.predict(res, comp.predicted = 2, dist = "max.dist")

plotIndiv(res, background = background, legend = TRUE, alpha.background = 0.3 )

I get the following error:

Error in matrix(attr(Y, "scaled:center"), nrow = nrow(t.pred[[1]]), ncol = q, :
'data' must be of a vector type, was 'NULL'

I tried to debug it and found that the mean_centering_per_study function expects "scaled:center" and "scaled:scale" attributes in res$ind.mat, which in the MINT case have different names and are stored per study: mean:[study-index] and sigma:[study-index]. I was thinking that defining a per-study max.dist and taking the max of those for membership might be a good idea, but I wasn't sure, and the code is a bit hard to follow without comments. I would appreciate it if you could add this to mixOmics.
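For reference, a base-R sketch of centring and scaling within each study while storing the per-study statistics so they can be reused later. The function name and the `study.stats` attribute layout are assumptions for illustration; they are not the internal mean_centering_per_study. Note that a study with a single sample would give NA standard deviations.

```r
# Hypothetical: centre and scale each study separately, keeping per-study stats
scale_per_study <- function(X, study) {
  study <- as.factor(study)
  out <- X
  stats <- list()
  for (s in levels(study)) {
    idx <- which(study == s)
    Xs <- scale(X[idx, , drop = FALSE])
    out[idx, ] <- Xs
    stats[[s]] <- list(center = attr(Xs, "scaled:center"),
                       scale = attr(Xs, "scaled:scale"))
  }
  attr(out, "study.stats") <- stats
  out
}

set.seed(1)
X <- matrix(rnorm(40, mean = 5), nrow = 10)
study <- rep(c("A", "B"), each = 5)
Xc <- scale_per_study(X, study)
round(colMeans(Xc[study == "A", ]), 10)  # all zero
```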

plot(tuned.mint.splsda) 's legend blocks the plot.

Hi Florian,

Occasionally, when visualising the output of the tune function, an important part of the plot is hidden under the legend (see attached). It would be nice if the legend were placed outside the main plot.

Extract VCV matrix?

Hello, sorry if this isn't the appropriate place to post this question, but I'd like to extract a variance-covariance matrix between the responses in my model. Are there methods in mixOmics to do this from a PLS, similar to what is available in plsdof?

Thanks for a great package!

keep order of the X variables when plotting loadings from a pls-da model?

I would really like to have the option to keep the order of the covariates in the X matrix when using plotLoadings. Currently, the variables seem to be sorted by order of importance.

No 3D Plot Available

Hi all,

I have been having problems getting the 3D plot for either the PCA or the PLS-DA analysis.

At first I was not aware that the rgl package was needed, so I installed it in R. Afterwards, I realized that it needed XQuartz, which I downloaded both as a tar.gz and via brew.

I have the current setup:

Macbook Pro

-R version 3.5.2 (2018-12-20) -- "Eggshell Igloo"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin15.6.0 (64-bit)

-> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rgl_0.99.16

loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 digest_0.6.18 later_0.7.5 mime_0.6 R6_2.3.0
[6] xtable_1.8-3 jsonlite_1.6 magrittr_1.5 miniUI_0.1.1.1 promises_1.0.1
[11] webshot_0.5.1 tools_3.5.2 manipulateWidget_0.10.0 htmlwidgets_1.3 crosstalk_1.0.0
[16] shiny_1.2.0 httpuv_1.4.5.1 xfun_0.4 compiler_3.5.2 htmltools_0.3.6

-xQuartz 2.7.11 (xorg-server 1.18.4)

If I check the display in the terminal, I get the following:

jsgalan-gray:lib jsgalan$ echo $DISPLAY
/private/tmp/com.apple.launchd.8XHZxzhZOo/org.macosforge.xquartz:0

I have tried to run the rgl examples (basically demo(rgl)), which crashes R and shows the following error:

I am just wondering if:

1- anyone else has encountered/solved this issue (which is mostly rgl related)
2- if there are other packages or options that could be passed to the mixomics library to get the 3D plots without using the rgl package.

Best

plotLoadings margin problem

problem

When calling plotLoadings on a DIABLO object whose feature names are too long, we get an error:

suppressMessages(library(mixOmics))

data(nutrimouse)
Y = nutrimouse$diet
gene = nutrimouse$gene
lipid = nutrimouse$lipid
## extend feature names
suff <- "-a-long-suffix-from-abolutely-nowhere-which-is-gonna-be-longer-than-margins"
colnames(gene) <- paste0(colnames(gene), suff)
colnames(lipid) <- paste0(colnames(lipid), suff)
data = list(gene = gene, lipid = lipid)
design = matrix(c(0,1,1,1,0,1,1,1,0), ncol = 3, nrow = 3, byrow = TRUE)

nutrimouse.sgccda = block.splsda(X = data,
                                 Y = Y,
                                 design = design,
                                 keepX = list(gene = c(10,10), lipid = c(15,15)),
                                 ncomp = 2,
                                 scheme = "centroid")
plotLoadings(nutrimouse.sgccda, contrib = "max")
#> Error in plot.new(): figure margins too large

cause

traceback shows:

traceback()

5: plot.new()
4: barplot.default(df$importance, horiz = TRUE, las = 1, col = df$color, 
       axisnames = TRUE, names.arg = colnames.X, cex.names = size.name, 
       cex.axis = 0.7, beside = TRUE, border = border, xlim = xlim[i, 
           ])
3: barplot(df$importance, horiz = TRUE, las = 1, col = df$color, 
       axisnames = TRUE, names.arg = colnames.X, cex.names = size.name, 
       cex.axis = 0.7, beside = TRUE, border = border, xlim = xlim[i, 
           ]) at plotLoadings.splsda.R#166
2: plotLoadings.sgccda(nutrimouse.sgccda, contrib = "max") at plotLoadings.R#28
1: plotLoadings(nutrimouse.sgccda, contrib = "max")

Although we internally try to accommodate the length of variable names in the plot layout, there is currently no limit on their length, and they can be long in some data sets. The error is also not informative in its current form.

remedy

A possible solution is to trim long variable names in plots only, with a message. As a least-impact remedy, we can simply change the colnames.X passed to barplot in the various functions, but this must be done before the layout uses their length in the margin specifications.
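A base-R sketch of the trimming step, with a hypothetical helper name and an assumed 30-character limit. It shortens only the labels passed to the plot, reports how many were truncated, and uses `make.unique()` because truncation can turn distinct names into duplicates.

```r
# Hypothetical helper: shorten long variable names for plotting only
trim_names <- function(x, max.char = 30) {
  long <- nchar(x) > max.char
  if (any(long)) {
    message(sum(long), " variable name(s) truncated to ", max.char,
            " characters for plotting")
    x[long] <- paste0(substr(x[long], 1, max.char - 3), "...")
  }
  make.unique(x)  # truncation can create duplicates; keep labels distinct
}

nm <- paste0(c("gene1", "gene2"), "-a-long-suffix-from-nowhere-longer-than-margins")
trim_names(nm)
```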

plotArrow

Currently, the x- and y-axis labels of plotArrow ('X' and 'Y') are not explicit enough. Instead, could we match them to the dimension numbers (e.g. 'Dimension xx' and 'Dimension yy')?

Thanks!

plot.tune's X axis to have a consistent grid

In the following plot method for tune objects, the X axis seems to use a log scale, while people often use a linear grid for fine tuning. @mixOmicsTeam, do you think we can switch it back to a linear scale?

data(breast.tumors)
X = breast.tumors$gene.exp
Y = as.factor(breast.tumors$sample$treatment)
tune = tune.splsda(X, Y, ncomp = 2, test.keepX = c(5, 10, 40),
                   folds = 10, dist = "max.dist", nrepeat = 5, progressBar = TRUE)

plot(tune)

Restore default graphic parameters in image.tune.rcc

Dear mixOmics team,

It would be great if you could implement automatic restoration of the default graphics parameters after using image.tune.rcc (for example, as recommended in help(par)). Then one could avoid calling par(mfrow = c(1, 1)) manually after plotting the tuning results. If wanted, I'm happy to provide a pull request.

Thanks
Jonas
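The idiom recommended in help(par) is to save the value returned by `par()` and restore it with `on.exit()`, so that the settings are reset even if plotting fails. A sketch on a hypothetical two-panel plotting function (not image.tune.rcc itself):

```r
# Hypothetical plotting function that changes the layout temporarily
plot_two_panels <- function(x, y) {
  op <- par(mfrow = c(1, 2))  # par() returns the previous values
  on.exit(par(op))            # restored on exit, even after an error
  plot(x, main = "first")
  plot(y, main = "second")
  invisible(NULL)
}

pdf(NULL)  # draw to a null device for this example
plot_two_panels(1:10, 10:1)
par("mfrow")  # back to 1 1
dev.off()
```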

add `return.call=logical` feature to main functions

To address #30. It is better to default to return.call = FALSE to avoid unnecessary memory usage.

plotIndiv(..., ind.names=FALSE) Warning: Removed N rows containing missing values (geom_point)

After upgrading to R 3.6.3, the following works (with ind.names = TRUE):

    data(nutrimouse)
    X <- nutrimouse$lipid
    Y <- nutrimouse$gene
    nutri.res <- rcc(X, Y, ncomp = 3, lambda1 = 0.064, lambda2 = 0.008)
    
    plotIndiv(nutri.res,  group = nutrimouse$genotype, ind.names = TRUE, legend = TRUE)

but the following does not (with ind.names = FALSE):

    plotIndiv(nutri.res, group = nutrimouse$genotype, ind.names = FALSE, legend = TRUE)

Warning messages:
1: Removed 40 rows containing missing values (geom_point).
2: Removed 40 rows containing missing values (geom_point).

Sudden check error in DepecheR because of some Linux change to your sPLS-DA function

Describe the bug
Dear maintainers,
I am the maintainer of DepecheR, which imports the sPLS-DA function from mixOmics. Yesterday, I suddenly got a notice from the automatic build reports that a check error had happened on the Malbec1 test platform on Bioconductor. See the build report here:
http://bioconductor.org/checkResults/3.10/bioc-LATEST/DepecheR/
It still works on merida1 and tokay1, and as I run a Mac myself, I will struggle to reproduce the problem locally. I hope this is due to some new feature and that it is an easy fix for you, because I very much like using the sPLS-DA function in its current form!
Best regards
Jakob Theorell

Expected behavior
See the test results for the merida1 and tokay1 runs.

Minimal reproducible example, preferably using the fantastic reprex package:
See the test module in the DepecheR package.

Output of sessionInfo():

sessionInfo()

Specify ylim when plotting perf

Hi,

I would like to specify the y-axis range of the performance plot, but setting ylim does not work. In fact, plot.perf calls internal_graphic.perf, which computes the range on its own.

I will use a workaround for now, but it would be good to support ylim in a future release.
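Until ylim is supported, one workaround is to draw the error rates yourself. The sketch below assumes that perf() on a PLS-DA model stores its error rates as a components-by-distances matrix (for example perf.res$error.rate$BER); plot_error_rates() is a hypothetical helper, not part of mixOmics, and a small made-up matrix stands in for real perf() output.

```r
# Hypothetical helper: plot an error-rate matrix (rows = components,
# columns = prediction distances) with a user-chosen y-axis range.
plot_error_rates <- function(err, ylim = c(0, 1), ...) {
  err <- as.matrix(err)
  labs <- colnames(err)
  if (is.null(labs)) labs <- paste0("series", seq_len(ncol(err)))
  cols <- seq_len(ncol(err))
  matplot(seq_len(nrow(err)), err, type = "b", pch = 16, lty = 1,
          col = cols, ylim = ylim, xlab = "Component",
          ylab = "Error rate", xaxt = "n", ...)
  axis(1, at = seq_len(nrow(err)))          # integer component ticks only
  legend("topright", legend = labs, col = cols, lty = 1, pch = 16, bty = "n")
  invisible(ylim)
}

# Made-up values standing in for perf.res$error.rate$BER
err <- matrix(c(0.40, 0.25, 0.20,
                0.38, 0.22, 0.18,
                0.42, 0.27, 0.21),
              nrow = 3,
              dimnames = list(paste0("comp", 1:3),
                              c("max.dist", "centroids.dist", "mahalanobis.dist")))
plot_error_rates(err, ylim = c(0, 0.5))
```

Because the helper only needs a numeric matrix, any ylim (or other base-graphics argument passed through `...`) is honoured directly by matplot().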

Best.

Include verbose argument to tune.rcc

Dear mixOmics team,

it would be great if the tune.rcc function included a verbose argument to turn off the automatic printing of the lambda values and CV score. If wanted, I'm happy to provide a pull request.
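Until such an argument exists, a generic workaround is to capture what the function prints. The helper quietly() below is hypothetical, not part of mixOmics; it assumes the lambda/CV-score output is written with cat() or print() (captured by capture.output()) or emitted as messages (silenced by suppressMessages()). A call such as tune.rcc(X, Y, ...) would be wrapped the same way as the stand-in function here.

```r
# Hypothetical helper: evaluate an expression, discard anything it prints to
# the console (cat/print) and any messages, and return its value.
quietly <- function(expr) {
  value <- NULL
  invisible(utils::capture.output(value <- suppressMessages(expr)))
  value
}

# Stand-in for a chatty tuning function (hypothetical, for illustration only)
chatty_tune <- function() {
  cat("lambda1 = 0.1, lambda2 = 0.05, CV-score = 0.87\n")
  message("tuning done")
  list(opt.lambda1 = 0.1, opt.lambda2 = 0.05)
}

res <- quietly(chatty_tune())  # prints nothing
res$opt.lambda1
```

Warnings are deliberately left visible, since they usually signal a real problem rather than progress output.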

Thanks
Jonas

plotIndiv error

Hi, I am running mixOmics with R v3.6.2. I was trying to draw the PLS-DA score plot with plotIndiv and noticed that when I set ind.names = FALSE, the point symbols disappeared from the plot.

The following warnings were given:

Warning messages:
1: Removed 16 rows containing missing values (geom_point).
2: Removed 16 rows containing missing values (geom_point).

I am using the example given in the tutorial. The same script worked fine before I updated R and reinstalled all packages (including ggplot2).

Thank you.

`perf.plsda` error: object 'test.keepX' not found

Original issue on Bitbucket.

Unusual similarity measure outputs from some functions

Context

Consider the following analysis:

library(mixOmics)
data(nutrimouse)
X <- nutrimouse$lipid
Y <- nutrimouse$gene
nutri.pls <- pls(X, Y, ncomp = 2)

cim_res <- cim(nutri.pls, xlab = "genes", ylab = "lipids",
               comp = 1)

We get invalid correlation values:

sum(abs(cim_res$mat.cor)>1)
#> 52

Possible reason

In several functions, such as cim, network and circosPlot, we do something like the following (from network):

cord.X = cor(mat$X, bisect, use = "pairwise")
cord.Y = cor(mat$Y, bisect, use = "pairwise")
mat = cord.X %*% t(cord.Y)

which at times outputs elements > 1 (or < -1) as similarity measures.

I was wondering:

What reference explains why cord.X %*% t(cord.Y) is a similarity measure between X and Y?
What is the interpretation of this similarity measure? Is it supposed to be a correlation-type measure or an unbounded one? If we want to use the cosine of the angle between the vectors as the similarity measure, it seems we need to normalise the inner product somehow.
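A small base-R sketch of both points, using simulated data rather than mixOmics output: when the columns of bisect are strongly correlated with each other (not orthogonal), each entry of cord.X %*% t(cord.Y) sums products of correlations across components and can exceed 1. Dividing each entry by the product of the row norms of cord.X and cord.Y gives the cosine of the angle between those rows, which is bounded by 1 in absolute value (Cauchy-Schwarz). Whether this normalisation is the intended measure is exactly the open question; it is shown only as one option.

```r
set.seed(1)
n <- 50
z <- rnorm(n)
# Two nearly collinear "components" standing in for the columns of bisect
bisect <- cbind(comp1 = z, comp2 = z + 0.1 * rnorm(n))
X <- cbind(x1 = z + 0.2 * rnorm(n), x2 = rnorm(n))
Y <- cbind(y1 = z + 0.2 * rnorm(n), y2 = rnorm(n))

cord.X <- cor(X, bisect, use = "pairwise")
cord.Y <- cor(Y, bisect, use = "pairwise")

# Unnormalised similarity, as in network/cim/circosPlot
raw <- cord.X %*% t(cord.Y)

# Cosine-normalised alternative: divide by the row norms of cord.X and cord.Y
norm.X <- sqrt(rowSums(cord.X^2))
norm.Y <- sqrt(rowSums(cord.Y^2))
cosine <- raw / outer(norm.X, norm.Y)

round(raw, 2)     # the x1/y1 entry is well above 1
round(cosine, 2)  # all entries lie in [-1, 1]
```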

I opened this issue so we can discuss the references used to develop this code and reach a conclusion.

Enable Travis-CI for automatic checking

Please enable a continuous integration service, such as Travis CI (website: https://travis-ci.org), to run free automatic checks on mixOmics. That way you would know whether, e.g., a new pull request breaks the package's functionality or everything is OK.

The quickest way to enable Travis CI is to call usethis::use_travis() (link to usethis) in the mixOmics project. This function creates the necessary setup files and opens the websites where you will have to sign in.

I cannot enable this service, as I am not in the "mixOmicsTeam".

plotVar legend to be customisable

Hi, I have been trying to label the blocks in the plotVar legend without success. It only shows 'X' and 'Y' as legend entries when I set legend = TRUE:

library(mixOmics)
data(nutrimouse)
X <- nutrimouse$lipid
Y <- nutrimouse$gene
nutri.res <- spls(X, Y)
plotVar(nutri.res, legend = TRUE)

It would be great if it were customisable, for example with a legend argument accepting FALSE, TRUE, or a character vector (or list) of block names.
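A minimal sketch of how such an argument could be resolved internally. resolve_legend() is a hypothetical helper, not mixOmics code: FALSE means no legend, TRUE keeps the default block labels, and a character vector (or list) of names replaces them one-to-one.

```r
# Hypothetical helper: turn a user-supplied `legend` argument into block labels.
# Returns NULL (no legend) or a character vector with one label per block.
resolve_legend <- function(legend, default.names) {
  if (isFALSE(legend)) return(NULL)
  if (isTRUE(legend)) return(default.names)
  labels <- as.character(unlist(legend))
  if (length(labels) != length(default.names))
    stop("'legend' must be TRUE, FALSE or have one name per block (",
         length(default.names), ")")
  labels
}

resolve_legend(TRUE, c("X", "Y"))
resolve_legend(list("lipids", "genes"), c("X", "Y"))
```

Validating the length up front gives a clear error instead of a silently mislabelled plot.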
Thanks

logratio.transform

Could logratio.transfo call as.matrix() on its input data directly, so that users do not have to write:

logratio.transfo(as.matrix(data.filter), logratio = 'CLR')
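For reference, the CLR transform itself only needs a numeric matrix, so coercing inside the function is a one-line change. Below is a base-R sketch of the centred log-ratio (each sample's log values centred by their mean, i.e. log(x / geometric mean)), not the mixOmics implementation; clr_transform() and its offset argument are hypothetical. It accepts a data frame because it calls as.matrix() first.

```r
# Hypothetical base-R CLR: rows are samples, columns are (positive) parts.
clr_transform <- function(data, offset = 0) {
  x <- as.matrix(data)            # accept data frames as well as matrices
  if (!is.numeric(x)) stop("'data' must be numeric")
  if (any(x + offset <= 0)) stop("all values (+ offset) must be positive")
  log.x <- log(x + offset)
  log.x - rowMeans(log.x)         # subtract each sample's mean log value
}

counts <- data.frame(a = c(10, 2), b = c(20, 4), c = c(40, 8))
clr_transform(counts)
```

Both samples here are proportional (10:20:40 and 2:4:8), so they get identical CLR values, log(0.5), 0 and log(2).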

Thanks!

Error in tune.spls: unused argument (test.keepY = ncol(Y))

From Bitbucket:

Hi, I'm new to mixOmics, so I apologise if this is naive. When tune.spls is run with multiple CPUs, an error saying "object 'test.keepY' not found" is thrown. When test.keepY is passed explicitly to tune.spls, e.g. tune.spls(X, Y, test.keepX = c(1, 10, 100), test.keepY = ncol(Y)), an unused argument error is thrown (unused argument (test.keepY = ncol(Y))). To avoid these errors, test.keepY has to be defined outside the function. This seems to be an issue only when multiple CPUs are used.
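One plausible mechanism (an assumption, not a diagnosis of tune.spls): when work is spread over several CPUs with a socket cluster, each worker is a fresh R session, so objects that only exist in the calling session are invisible to it unless they are exported. The base-R sketch below uses the parallel package to show this; test.keepY is only a stand-in variable here.

```r
library(parallel)

test.keepY <- 5L  # stand-in for a value such as ncol(Y)

cl <- makeCluster(2)  # two PSOCK worker processes
# Workers start as fresh R sessions: the variable does not exist there yet
before <- unlist(clusterEvalQ(cl, exists("test.keepY")))
# Copy the variable from this session into every worker's global environment
clusterExport(cl, "test.keepY", envir = environment())
after <- unlist(clusterEvalQ(cl, exists("test.keepY")))
stopCluster(cl)

before  # FALSE FALSE
after   # TRUE TRUE
```

If this is the cause, the fix inside the package would be to export (or pass explicitly) every object the worker function needs.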

Error in the bookdown version of the Vignette

Hello,

The plots do not match the code in the vignette:
https://mixomicsteam.github.io/Bookdown/start.html

installing the mixOmics package

Hi, I am trying to run PLS-DA on my data and have been stuck trying to load the mixOmics library. I have R 3.6.1 running on macOS. I used the following script to install the mixOmics package:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("mixOmics")

When I then try to load the mixOmics library, I get the following error:

Error: package or namespace load failed for ‘mixOmics’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]):
there is no package called ‘RSpectra’

When I try to install RSpectra separately with install.packages("RSpectra"), I get the following message repeatedly:

In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/unsupported/Eigen/SparseExtra:51:
/Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/unsupported/Eigen/../../Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
#pragma clang diagnostic pop

What can I do to get the mixOmics library loaded?

Enable Codecov service

After Travis CI is enabled (#32), please also enable a code coverage service, such as Codecov. It shows which lines of the package's code are covered by unit tests, which helps to identify untested functionality in the package.

The simplest way to enable it is to call usethis::use_coverage("codecov") once Travis CI is enabled.

About Codecov for R (link).

splsda - Error in if (classifier == "lda") { : argument is of length zero

Hello,

I ran your example (both copied from the website and downloaded as an R script from the site) and ran into the error below. Any help would be appreciated.

Cheers!

perf.plsda.srbct <- perf(srbct.plsda, validation = "Mfold", folds = 5,
                         progressBar = FALSE, nrepeat = 10)

Error in if (classifier == "lda") { : argument is of length zero

traceback()
7: predict.splsda(object.splsda.temp, newdata.scale = X.test, dist = dist,
misdata.all = any(misdata), is.na.X = list(X = is.na.A.train),
is.na.newdata = list(X = is.na.A.test))
6: predict(object.splsda.temp, newdata.scale = X.test, dist = dist,
misdata.all = any(misdata), is.na.X = list(X = is.na.A.train),
is.na.newdata = list(X = is.na.A.test))
5: FUN(X[[i]], ...)
4: lapply(1:M, fonction.j.folds)
3: MCVfold.splsda(X, Y, multilevel = multilevel, validation = validation,
folds = folds, nrepeat = nrepeat, ncomp = comp, choice.keepX = choice.keepX,
test.keepX = test.keepX, measure = measure, dist = dist,
scale = scale, near.zero.var = near.zero.var, auc = auc,
progressBar = progressBar, class.object = class(object),
cl = cl, parallel = parallel, misdata = misdata, is.na.A = is.na.A,
ind.NA = ind.NA, ind.NA.col = ind.NA.col)
2: perf.plsda(srbct.plsda, validation = "Mfold", folds = 5, progressBar = FALSE,
nrepeat = 10)
1: perf(srbct.plsda, validation = "Mfold", folds = 5, progressBar = FALSE,
nrepeat = 10)
sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252 LC_NUMERIC=C LC_TIME=English_Canada.1252

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] mixOmics_6.3.1 knitr_1.21 spls_2.2-2 klaR_0.6-14 MASS_7.3-51.1 caret_6.0-81 ModelMetrics_1.2.2 ggplot2_3.1.0
[9] lattice_0.20-38 CMA_1.36.0 Biobase_2.38.0 BiocGenerics_0.24.0 BiocInstaller_1.28.0 plsgenomics_1.5-2 rgl_0.99.16 pls_2.7-0
[17] car_3.0-2 carData_3.0-2

loaded via a namespace (and not attached):
Error in x[["Version"]] : subscript out of bounds
In addition: Warning message:
In FUN(X[[i]], ...) :
DESCRIPTION file of package 'purrr' is missing or broken

Below is the script I ran:

library(mixOmics)
data(srbct)
X = srbct$gene # the gene expression data
dim(X)

summary(srbct$class)

pca.srbct = pca(X, ncomp = 10, center = TRUE, scale = TRUE)
# pca.srbct # outputs the explained variance per component
plot(pca.srbct) # screeplot of the eigenvalues (explained variance per component)

plotIndiv(pca.srbct, group = srbct$class, ind.names = FALSE,
          legend = TRUE, title = 'PCA on SRBCT')

Y = srbct$class
summary(Y) # outcome categories

srbct.plsda <- plsda(X, Y, ncomp = 10) # set ncomp to 10 for performance assessment later
plotIndiv(srbct.plsda, comp = 1:2,
          group = srbct$class, ind.names = FALSE,
          ellipse = TRUE, legend = TRUE, title = 'PLSDA on SRBCT')

# with background
background = background.predict(srbct.plsda, comp.predicted = 2, dist = "max.dist")
# optional: xlim = c(-40, 40), ylim = c(-30, 30))

plotIndiv(srbct.plsda, comp = 1:2,
          group = srbct$class, ind.names = FALSE, title = "Maximum distance",
          legend = TRUE, background = background)

# takes a couple of minutes to run
set.seed(2543) # for reproducibility, only when the `cpus' argument is not used
perf.plsda.srbct <- perf(srbct.plsda, validation = "Mfold", folds = 5,
                         progressBar = FALSE, auc = TRUE, nrepeat = 10)
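For context on the error message itself: in R, `if (classifier == "lda")` fails with "argument is of length zero" when classifier is NULL, because NULL == "lda" evaluates to logical(0). That suggests the classifier value seen inside predict.splsda is NULL, although the traceback alone does not establish why (a package version mismatch is one possibility). The base-R sketch below reproduces the message and shows defensive comparisons; obj is a hypothetical stand-in object.

```r
obj <- list(ncomp = 2)  # hypothetical object lacking a 'classifier' field

length(obj$classifier == "lda")  # 0: NULL compared with a string is logical(0)

msg <- tryCatch(
  if (obj$classifier == "lda") "lda" else "other",
  error = conditionMessage
)
msg  # "argument is of length zero"

# Defensive alternatives that treat a missing value as "not lda"
identical(obj$classifier, "lda")  # FALSE
isTRUE(obj$classifier == "lda")   # FALSE
```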

Method plot() fails for "perf", "perf.plsda.mthd"

The plotting method for an object of classes "perf", "perf.plsda.mthd" fails with the error: Error in plot.window(...): need finite 'xlim' values. Consider the example from the manual, http://mixomics.org/methods/pls-da/:

library(mixOmics)
#> Loading required package: MASS
#> Loading required package: lattice
#> Loading required package: ggplot2
#> 
#> Loaded mixOmics 6.8.0
#> 
#> Thank you for using mixOmics! Learn how to apply our methods with our tutorials on www.mixOmics.org, vignette and bookdown on  https://github.com/mixOmicsTeam/mixOmics
#> Questions: email us at mixomics[at]math.univ-toulouse.fr  
#> Bugs, Issues? https://github.com/mixOmicsTeam/mixOmics/issues
#> Cite us:  citation('mixOmics')

data(liver.toxicity)
X <- as.matrix(liver.toxicity$gene)
Y <- as.factor(liver.toxicity$treatment[, 4])             

## PLS-DA function
plsda.res <- plsda(X, Y, ncomp = 5) # where ncomp is the number of components wanted


# this code takes ~ 1 min to run
set.seed(2543) # for reproducibility here, only when the `cpus' argument is not used
perf.plsda <- perf(plsda.res, validation = "Mfold", folds = 5, 
                   progressBar = FALSE, auc = TRUE, nrepeat = 10) 
# perf.plsda$error.rate  # error rates

plot(perf.plsda, col = color.mixo(1:3), sd = TRUE, legend.position = "horizontal")
#> Warning in min(x): no non-missing arguments to min; returning Inf
#> Warning in max(x): no non-missing arguments to max; returning -Inf
#> Error in plot.window(...): need finite 'xlim' values

Created on 2019-07-15 by the reprex package (v0.3.0)

I use Windows 10 and R 3.6.1.
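The two warnings before the error point at the mechanism: min() and max() were called on an empty (or all-NA) vector, which returns Inf and -Inf, and plot.window() then rejects those non-finite limits. Why the vector is empty inside plot.perf is not established here; the base-R sketch below only reproduces the chain of messages and shows a finiteness check a plotting function could apply before calling plot().

```r
vals <- numeric(0)  # e.g. no error rates selected for plotting

lims <- suppressWarnings(c(min(vals), max(vals)))
lims                   # Inf -Inf
all(is.finite(lims))   # FALSE: a guard here would avoid the cryptic error

pdf(NULL)              # draw to a null device so no file is written
msg <- tryCatch(plot(1, xlim = lims), error = conditionMessage)
invisible(dev.off())
msg                    # "need finite 'xlim' values"
```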

Further customisable `auroc`

Description on Bitbucket.

Export logratio.transformed data


By submitting this issue/request, I confirm that:

I have checked that I have the latest version installed, as described here.
I have searched the live NEWS file to see if it has been fixed in devel already. If so, I tried the latest devel version
I am aware that for analysis help I can refer to Discourse to see if my question is already answered or to submit a new one.
I am aware that if I get certain behaviour only using my data and not mixOmics datasets, it is best to send a fully confidential email with the code and (possibly a subset of) the data that reproduce the error to the following email (changing [at] and [dot] to @ and .): mixomics[at]math[dot]univ-toulouse[dot]fr


Describe the bug
A clear and concise description of what the bug is.

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Minimal reproducible example, preferably using the fantastic reprex package:

Output of sessionInfo():

sessionInfo()

Regards,
Jérémy