rprops / phenoflow_package

R package offering functionality for the advanced analysis of microbial flow cytometry data

License: GNU General Public License v2.0

Language: R (100.00%)
Topics: flow, flow-cytometry, diversity

phenoflow_package's People

Contributors: fmkerckhof, rprops

phenoflow_package's Issues

Error on unix

The build passed on Travis CI, but the package failed to install on a Unix platform.

Need to troubleshoot (problem lies with flowCore installation).

Previous mention: http://stackoverflow.com/questions/40721182/error-in-installing-flowcore-package-r

UNIX information:
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.2.1511 (Core)
Release: 7.2.1511
Codename: Core

Everything installs smoothly on cmet server:
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.1 LTS
Release: 16.04
Codename: xenial

Reduce memory footprint of diversity_rf

When multi-threading with diversity_rf, the memory footprint becomes excessive for certain samples.
We should consider:

  • Profiling (profvis) and optimizing the code (Rcpp?)
  • Setting a system-wide 75% RAM cutoff to avoid crashing systems.
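A minimal sketch of the 75% cutoff idea, assuming the number of parallel workers is the main lever on memory use: given the total RAM and an estimate of peak memory per worker (both supplied by the user, e.g. measured while profiling a single worker), it caps the worker count so that the combined use stays under the cutoff. The helper `max_workers` is hypothetical and not part of Phenoflow.

```r
# Hypothetical helper: cap the number of parallel workers so that the
# combined memory use stays below a fraction (default 75%) of total RAM.
# total_ram_gb and ram_per_worker_gb are user-supplied estimates.
max_workers <- function(total_ram_gb, ram_per_worker_gb, cutoff = 0.75,
                        cores = parallel::detectCores()) {
  stopifnot(total_ram_gb > 0, ram_per_worker_gb > 0, cutoff > 0, cutoff <= 1)
  if (is.na(cores)) cores <- 1L                 # detectCores() may return NA
  budget <- total_ram_gb * cutoff               # RAM we allow ourselves to use
  by_memory <- floor(budget / ram_per_worker_gb)
  # never exceed the available cores, but always allow at least one worker
  max(1L, min(as.integer(cores), as.integer(by_memory)))
}

# Example: 16 GB machine, about 3 GB per worker, 8 cores -> floor(12 / 3) = 4
max_workers(16, 3, cores = 8)
```

The result would then bound whatever thread or core count is handed to the multi-threaded diversity_rf call.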

Leinster & Cobbold implementation

A similarity-sensitive diversity metric may be of added value. Phyloseq's data structure allows a phylogenetic tree to be included within the object, from which distance information should be relatively easy to extract. However, the question remains: how would you construct a similarity-sensitive metric from flow cytometric fingerprint data?
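For reference, Leinster & Cobbold define the diversity of order q of a relative-abundance vector p under a similarity matrix Z (Z_ii = 1, 0 <= Z_ij <= 1) as (sum_i p_i (Zp)_i^(q-1))^(1/(1-q)), with the limit exp(-sum_i p_i log (Zp)_i) at q = 1 and 1/max_i (Zp)_i at q = Inf, where sums run over bins with p_i > 0. With Z equal to the identity matrix this reduces to the ordinary Hill numbers. The sketch below implements that formula. One possible answer to the open question, and only an assumption here, is to treat fingerprint bins as "species" and derive Z from distances between bin centres, e.g. Z_ij = exp(-u * d_ij); the helper names and the tuning parameter u are hypothetical.

```r
# Similarity-sensitive diversity of order q (Leinster & Cobbold).
# p: bin abundances; Z: S x S similarity matrix (diagonal 1, entries in [0, 1]).
lc_diversity <- function(p, Z = diag(length(p)), q = 1) {
  stopifnot(is.matrix(Z), nrow(Z) == length(p), ncol(Z) == length(p))
  p <- p / sum(p)                 # make sure abundances sum to 1
  Zp <- as.vector(Z %*% p)        # (Zp)_i: "ordinariness" of bin i
  keep <- p > 0                   # sums run over bins that are present
  p <- p[keep]
  Zp <- Zp[keep]
  if (is.infinite(q)) {
    1 / max(Zp)
  } else if (isTRUE(all.equal(q, 1))) {
    exp(-sum(p * log(Zp)))        # limit for q -> 1
  } else {
    sum(p * Zp^(q - 1))^(1 / (1 - q))
  }
}

# Hypothetical similarity between fingerprint bins: Z_ij = exp(-u * d_ij),
# with coords holding one row of bin-centre coordinates per bin.
similarity_from_coords <- function(coords, u = 1) {
  exp(-u * as.matrix(stats::dist(coords)))
}

p <- c(0.4, 0.3, 0.2, 0.1)                  # relative abundances of 4 bins
Z <- similarity_from_coords(matrix(0:3, ncol = 1), u = 1)
lc_diversity(p, q = 2)         # identity Z: ordinary Hill number of order 2
lc_diversity(p, Z = Z, q = 2)  # similarity-sensitive: lower, bins overlap
```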

Clean install on Windows 10 with R 3.3.3 does not work: dependency settings?

The issue lies with flowFDA, flowCore or matrixStats:

* installing *source* package 'flowFDA' ...
** R
** inst
** preparing package for lazy loading
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) : 
  there is no package called 'matrixStats'
Error : package 'flowCore' could not be loaded
ERROR: lazy loading failed for package 'flowFDA'
* removing 'C:/Users/fpkerckh/Documents/R/win-library/3.3/flowFDA'

After installing flowCore, flowViz still needs to be installed:

* installing *source* package 'flowFDA' ...
** R
** inst
** preparing package for lazy loading
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) : 
  there is no package called 'IDPmisc'
Error : package 'flowViz' could not be loaded
ERROR: lazy loading failed for package 'flowFDA'
* removing 'C:/Users/fpkerckh/Documents/R/win-library/3.3/flowFDA'

Then, an issue with multcomp appears to be present:

* installing *source* package 'flowFDA' ...
** R
** inst
** preparing package for lazy loading
Error : package 'TH.data' required by 'multcomp' could not be found
ERROR: lazy loading failed for package 'flowFDA'
* removing 'C:/Users/fpkerckh/Documents/R/win-library/3.3/flowFDA'

Installing multcomp then appears to clear the issue.
So Phenoflow could only be installed after manually installing flowViz, flowCore and multcomp.
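Since each failed attempt only reveals the next missing package, it may help to check all known dependencies up front. A minimal sketch, assuming the list of packages from the logs above; `missing_deps` is a hypothetical helper, and the packages it reports would still have to be installed (e.g. with BiocManager::install() for Bioconductor packages) before installing Phenoflow.

```r
# List which of the given packages cannot be loaded in this R session.
# Note: requireNamespace() loads the namespace of packages that are present.
missing_deps <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

# Dependencies that had to be installed by hand in this issue
missing_deps(c("flowCore", "flowViz", "matrixStats", "IDPmisc",
               "multcomp", "TH.data"))
```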

Bootstrap fp_contrasts

Bootstrapping would make this a more sound and robust version of the current implementation. The downside is that it will be computationally intensive and require more memory due to the inefficiency of foreach.
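A minimal sketch of what a bootstrapped contrast could look like, assuming the fingerprints are available as a plain samples x bins matrix of densities and exactly two groups are compared. It does not reproduce fp_contrasts' actual inputs or outputs, and `boot_fp_contrast` is a hypothetical name. Samples are resampled with replacement within each group and a percentile interval is reported per bin; the replicates go into a preallocated matrix, so memory use is fixed at bins x B numbers up front.

```r
# Hypothetical sketch: percentile bootstrap of the per-bin difference in mean
# fingerprint density between two groups of samples.
boot_fp_contrast <- function(fp, group, B = 1000, conf = 0.95) {
  stopifnot(nrow(fp) == length(group))
  lv <- unique(group)
  stopifnot(length(lv) == 2)
  idx1 <- which(group == lv[1])
  idx2 <- which(group == lv[2])
  mean_diff <- function(i1, i2) {
    colMeans(fp[i1, , drop = FALSE]) - colMeans(fp[i2, , drop = FALSE])
  }
  obs <- mean_diff(idx1, idx2)
  boot <- matrix(NA_real_, nrow = ncol(fp), ncol = B)  # bins x replicates
  for (b in seq_len(B)) {
    # resample samples (rows) with replacement within each group
    s1 <- idx1[sample.int(length(idx1), replace = TRUE)]
    s2 <- idx2[sample.int(length(idx2), replace = TRUE)]
    boot[, b] <- mean_diff(s1, s2)
  }
  alpha <- (1 - conf) / 2
  ci <- apply(boot, 1, quantile, probs = c(alpha, 1 - alpha), names = FALSE)
  data.frame(bin = seq_len(ncol(fp)), diff = obs,
             lower = ci[1, ], upper = ci[2, ], row.names = NULL)
}

set.seed(42)
fp <- rbind(matrix(runif(30), 3, 10), matrix(runif(30) + 0.1, 3, 10))
group <- rep(c("control", "treatment"), each = 3)
head(boot_fp_contrast(fp, group, B = 500))
```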

Supply datasets

The examples section of each function should contain executable code that runs on data which can be loaded from the package.

Error in the wiki: some alpha-diversity components are not calculated

In the wiki, at one point,

### Export ecological data to .csv file in the chosen directory
write.csv2(file="results.metrics.csv",
           cbind(Diversity.fbasis, Evenness.fbasis,
                 Structural.organization.fbasis,
                 Coef.var.fbasis))

is called.

However, the objects Evenness.fbasis, Structural.organization.fbasis and Coef.var.fbasis are currently not generated in the alpha-diversity section.
In the corresponding vignette, I will remove them from this chunk. However, @rprops: how would you like to proceed here? Shall we keep them in the wiki? Should they be added to the vignette?
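Whichever way the wiki is updated, one way to keep such an export chunk from failing is to write only the objects that actually exist and report the rest. A minimal sketch; `export_metrics` is a hypothetical helper, and the object names in the usage comment are the ones from the wiki snippet above.

```r
# Export only the diversity objects that exist, mirroring
# write.csv2(file = ..., cbind(obj1, obj2, ...)) from the wiki.
export_metrics <- function(obj_names, file, envir = parent.frame()) {
  present <- obj_names[vapply(obj_names, exists, logical(1), envir = envir)]
  absent <- setdiff(obj_names, present)
  if (length(absent) > 0)
    message("Skipping objects that were not generated: ",
            paste(absent, collapse = ", "))
  stopifnot(length(present) > 0)
  # unname() so cbind behaves as with positional arguments in the wiki code
  write.csv2(file = file, do.call(cbind, unname(mget(present, envir = envir))))
  invisible(present)
}

# export_metrics(c("Diversity.fbasis", "Evenness.fbasis",
#                  "Structural.organization.fbasis", "Coef.var.fbasis"),
#                file = "results.metrics.csv")
```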

Rescaling for bandwidth calculations

In the Phenotypic Diversity Analysis wiki, at one point we select maxval <- max(summary[,9]). Here the column index depends entirely on which parameter sits in that column (FL1-H for the BD Accuri C6, but possibly something completely different for e.g. the BD FACSVerse). Why don't we:

  1. use vector-based rescaling and rescale each parameter by its own largest value? Or adapt mytrans to mytrans <- function(x) x/max(x)?
  2. make this more generic?
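Both suggestions can be sketched together by selecting parameters by name rather than by column position and dividing each by its own maximum. This assumes the data are available as a cells x parameters matrix with parameter names as column names; applying it to the cells of all samples pooled (rather than per sample) keeps samples comparable, as the single maxval in the wiki does. `rescale_by_max` is a hypothetical helper.

```r
# Rescale each selected parameter by its own maximum, selecting columns by
# name instead of by position (column 9 is FL1-H only for some instruments).
rescale_by_max <- function(m, params) {
  missing_params <- setdiff(params, colnames(m))
  if (length(missing_params) > 0)
    stop("Parameters not found: ", paste(missing_params, collapse = ", "))
  for (p in params) m[, p] <- m[, p] / max(m[, p])
  m
}

m <- cbind(`FL1-H` = c(1, 2, 4), `FL3-H` = c(10, 5, 20), `FSC-H` = c(3, 3, 3))
rescale_by_max(m, c("FL1-H", "FL3-H"))  # FSC-H is left untouched
```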

Allow group_label in RandomF_FCS()

This would allow creating multiple random forest models, one for each "group": for example, when you have three groups of strains and want to distinguish the strains within each group. This should be straightforward to implement by combining apply() and returning the results in a list().
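The split-and-apply idea can be sketched in base R, assuming the cell data are in a data frame with a column holding the group label. `fit_per_group` is a hypothetical helper, not the RandomF_FCS interface; the training function is passed in, so the same wrapper works for a random forest or anything else.

```r
# Hypothetical sketch: fit one model per group by splitting on a group label
# and applying the same training function to each subset.
fit_per_group <- function(data, group_label, fit_fun, ...) {
  stopifnot(group_label %in% names(data))
  # one data frame per group; drop = TRUE skips empty factor levels
  subsets <- split(data, data[[group_label]], drop = TRUE)
  lapply(subsets, function(d) {
    d[[group_label]] <- NULL        # the group column is constant within a subset
    fit_fun(d, ...)
  })
}

# e.g. one random forest per group, distinguishing the strains within it
# ('cells' is a hypothetical data frame with group, strain and parameter columns):
# models <- fit_per_group(cells, "group", function(d) {
#   d$strain <- factor(d$strain)    # re-level to the strains present in d
#   randomForest::randomForest(strain ~ ., data = d)
# })
```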

Adding non-CRAN packages

We have to look into how to automatically install non-CRAN packages (flowCore, flowViz, easyGgplot2).
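A minimal sketch of an installer for missing non-CRAN dependencies, assuming BiocManager for the Bioconductor packages and remotes for GitHub packages. The helper name is hypothetical, and the GitHub path "kassambara/easyGgplot2" is an assumption to verify. Only missing packages are installed, so calling it with already-installed packages is a no-op.

```r
# Hypothetical helper: install missing Bioconductor and GitHub dependencies.
# github is a named vector: package name = "owner/repo" (repo path assumed).
install_non_cran <- function(bioc = c("flowCore", "flowViz"),
                             github = c(easyGgplot2 = "kassambara/easyGgplot2")) {
  is_missing <- function(p) !requireNamespace(p, quietly = TRUE)
  bioc_todo <- Filter(is_missing, bioc)
  gh_todo <- github[vapply(names(github), is_missing, logical(1))]
  if (length(bioc_todo) > 0) {
    if (is_missing("BiocManager")) install.packages("BiocManager")
    BiocManager::install(bioc_todo)
  }
  if (length(gh_todo) > 0) {
    if (is_missing("remotes")) install.packages("remotes")
    for (repo in gh_todo) remotes::install_github(repo)
  }
  invisible(c(bioc_todo, names(gh_todo)))  # what was (attempted to be) installed
}

# install_non_cran()
```

For the package itself, declaring GitHub dependencies in a Remotes: field of DESCRIPTION is an alternative worth checking, since remotes/devtools honour it when installing from GitHub.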

Implement feature importance evaluation

From the caret documentation on random forest feature importance:

"Random Forest: from the R package: “For each tree, the prediction accuracy on the out-of-bag portion of the data is recorded. Then the same is done after permuting each predictor variable. The difference between the two accuracies are then averaged over all trees, and normalized by the standard error. For regression, the MSE is computed on the out-of-bag data for each tree, and then the same computed after permuting a variable. The differences are averaged and normalized by the standard error. If the standard error is equal to 0 for a variable, the division is not done.”"
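The quoted procedure is permutation importance. The sketch below is a simplified, model-agnostic variant under stated assumptions: it uses a single evaluation set (ideally held-out data) with repeated permutations instead of per-tree out-of-bag samples, and accepts any prediction function. As in the quoted text, the mean accuracy drop is divided by its standard error unless that is 0. `perm_importance` is a hypothetical name; for a fitted randomForest model, `randomForest(..., importance = TRUE)` followed by `importance(fit, type = 1)` returns the package's own mean decrease in accuracy.

```r
# Model-agnostic permutation importance (simplified, see lead-in).
# predict_fun: function(X) returning predicted class labels for data frame X
# X: data frame of predictors; y: true labels; n_rep: permutations per predictor
perm_importance <- function(predict_fun, X, y, n_rep = 10) {
  stopifnot(is.data.frame(X), nrow(X) == length(y), n_rep >= 2)
  base_acc <- mean(predict_fun(X) == y)
  vapply(names(X), function(v) {
    drops <- vapply(seq_len(n_rep), function(r) {
      Xp <- X
      Xp[[v]] <- Xp[[v]][sample.int(nrow(Xp))]  # break the link between v and y
      base_acc - mean(predict_fun(Xp) == y)      # accuracy lost by permuting v
    }, numeric(1))
    se <- sd(drops) / sqrt(n_rep)
    # normalize by the standard error, unless it is 0 (as in the quote)
    if (se == 0) mean(drops) else mean(drops) / se
  }, numeric(1))
}

set.seed(1)
X <- data.frame(a = rnorm(50), b = rnorm(50))
y <- ifelse(X$a > 0, "pos", "neg")
classify <- function(X) ifelse(X$a > 0, "pos", "neg")  # toy model using only a
perm_importance(classify, X, y, n_rep = 20)  # b scores 0, a scores positive
```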

Issue with masking

Phyloseq is masking some functions from other packages (even though these functions are never used), resulting in a failed Travis build...

Add flowAI to imports?

Now that the latest Bioconductor release deals with buggy Accuri data, shouldn't we consider making flowAI part of the default packages loaded/installed with Phenoflow?

replace FCS_clean

Replace the flowClean approach in FCS_clean with the flowAI one, and call this function in Diversity_rf and RandomF_FCS.

tidy up code

install.packages("formatR")
formatR::tidy_dir("R")

install.packages("lintr")
lintr::lint_package()

RandomF_FCS throws warnings when more than 1 dash occurs in the selected params (e.g. PerCP-Cy5.5-A)

RandomF_FCS throws warnings when more than 1 dash occurs in the selected params (e.g. PerCP-Cy5.5-A). This is specifically due to

add_measuredparam <- unique(do.call(rbind, strsplit(param,"-"))[, 2])[1]

The warning that is returned is:

Warning message:
In (function (..., deparse.level = 1)  :  number of columns of result is not a multiple of vector length (arg 1)

Although this (luckily) still returns the desired parameter in the current cases, the behaviour is unpredictable and we should circumvent it. As a suggestion, maybe we should rather grep (greedily) or use a final character anchor like .*-[AWH]$ or .*-[A-Z]$ to be more generic?
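A minimal sketch of the suggested fix, assuming parameter names always end in a dash followed by A, H or W: a greedy `sub()` removes everything up to the last dash, so "PerCP-Cy5.5-A" yields "A" without the ragged `rbind()` that triggers the warning. Validating names against `-[AHW]$` first (swap in `[A-Z]` for the more generic variant) is a design choice of this sketch, and `get_measured_param` is a hypothetical name; the `unique(...)[1]` step keeps the behaviour of the original line.

```r
# Extract the measurement suffix (A, H or W) from names such as "FL1-H" or
# "PerCP-Cy5.5-A", using only the text after the LAST dash.
get_measured_param <- function(param) {
  ok <- grepl("-[AHW]$", param)
  if (!all(ok))
    stop("Unexpected parameter name(s): ", paste(param[!ok], collapse = ", "))
  suffix <- sub("^.*-", "", param)   # greedy ".*-" consumes up to the last dash
  unique(suffix)[1]
}

get_measured_param(c("PerCP-Cy5.5-A", "FSC-A", "SSC-A"))
```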

Implement FCS_pool in RandomF_FCS

This would allow samples to be pre-merged per group before training the model when the data sets are unbalanced across groups:

  • FCS_pool on groups
  • FCS_resample downsample to specified number of cells
  • train classifier
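The first two steps can be sketched in base R on plain matrices, assuming each sample is a cells x parameters matrix and each sample has a group label. This is a stand-in for the idea only, not the FCS_pool or FCS_resample interface, and `pool_and_downsample` is a hypothetical name. The balanced pools would then be handed to the classifier.

```r
# Hypothetical base-R stand-in for pooling per group + downsampling:
# samples is a list of cells x parameters matrices, group gives each one's group.
pool_and_downsample <- function(samples, group, n_cells, replace = FALSE) {
  stopifnot(length(samples) == length(group))
  # step 1: pool all samples of a group into one matrix
  pooled <- lapply(split(samples, group), function(s) do.call(rbind, s))
  # step 2: draw the same number of cells from every group
  lapply(pooled, function(m) {
    if (!replace && nrow(m) < n_cells)
      stop("A group has fewer than n_cells cells; use replace = TRUE")
    m[sample.int(nrow(m), n_cells, replace = replace), , drop = FALSE]
  })
}

set.seed(1)
samples <- list(matrix(rnorm(20), 10, 2), matrix(rnorm(40), 20, 2),
                matrix(rnorm(200), 100, 2))
balanced <- pool_and_downsample(samples, c("a", "a", "b"), n_cells = 25)
sapply(balanced, nrow)
```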
