The phenoflow_package from rprops

Growth curve analysis by FCM

Direct analysis of growth curves from FCM data.

Add option to average across replicates

Error on unix

Build passed Travis CI but failed to install on a Unix platform:

Need to troubleshoot (problem lies with flowCore installation).

Previous mention: http://stackoverflow.com/questions/40721182/error-in-installing-flowcore-package-r

UNIX information:
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.2.1511 (Core)
Release: 7.2.1511
Codename: Core

Everything installs smoothly on cmet server:
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.1 LTS
Release: 16.04
Codename: xenial

Implement bootstrapping for evenness and So functions

Reduce memory footprint of diversity_rf

When multi-threading with diversity_rf, the memory footprint gets excessive for certain samples.
We should consider:

Profile (profvis) & optimize the code (Rcpp?)
Set a system-wide 75% RAM cutoff to avoid crashing systems.

Leinster & Cobbold implementation

A similarity-sensitive diversity metric may be of added value. Phyloseq's data structure allows a phylogenetic tree to be included in the object, from which distance information should be relatively easy to extract. However, the question remains: how would you define a similarity-sensitive metric on flow cytometric fingerprint data?
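As a starting point for discussion, here is a minimal base-R sketch of the Leinster & Cobbold diversity of order q, (sum_i p_i (Zp)_i^(q-1))^(1/(1-q)), with the q = 1 limit exp(-sum_i p_i log((Zp)_i)). The function name `diversity_lc` is hypothetical and not part of Phenoflow. How to build the similarity matrix Z between fingerprint bins is exactly the open question above, so Z is simply taken as an input here.

```r
# Similarity-sensitive diversity of order q (Leinster & Cobbold).
# p: abundances (normalised to sum to 1 inside the function)
# Z: square similarity matrix with values in [0, 1] and 1s on the diagonal
# NOTE: diversity_lc is a hypothetical helper, not an existing Phenoflow function.
diversity_lc <- function(p, Z, q = 1) {
  stopifnot(length(p) == nrow(Z), nrow(Z) == ncol(Z))
  p <- p / sum(p)
  keep <- p > 0                       # drop empty categories (bins)
  p <- p[keep]
  Z <- Z[keep, keep, drop = FALSE]
  Zp <- as.vector(Z %*% p)            # "ordinariness" of each category
  if (isTRUE(all.equal(q, 1))) {
    exp(-sum(p * log(Zp)))            # limit for q -> 1
  } else {
    sum(p * Zp^(q - 1))^(1 / (1 - q))
  }
}

p <- rep(0.25, 4)
d_naive    <- diversity_lc(p, diag(4), q = 2)          # Z = identity: ordinary Hill number
d_same     <- diversity_lc(p, matrix(1, 4, 4), q = 2)  # all categories identical
d_shannon  <- diversity_lc(p, diag(4), q = 1)
```

With Z equal to the identity matrix the metric reduces to the usual Hill numbers, so the existing diversity estimates would be a special case of this formulation.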

Implement bootstrapping for beta diversity analysis

Growth curve analysis

I can start with this if you can supply some data/guidelines.

Make the package pass CRAN checks

Ultimately, using a repository such as CRAN for the stable release version of our package would be more user friendly.

Add fcs to csv export of flowdata_transformed_all for downstream machine learning

A simple R alternative to python fcsparser:parse (https://github.com/eyurtsev/fcsparser/blob/master/fcsparser/api.py) should be able to do the trick.
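A minimal sketch of what such an export could look like, assuming the events of each sample are already available as a numeric matrix (for example via flowCore::exprs() on each flowFrame). The helper name `events_to_csv` and the `sample` column are assumptions for illustration only.

```r
# Write the events of several samples to one CSV file with a `sample` column.
# event_list: named list of numeric matrices (rows = events, columns = parameters)
# NOTE: events_to_csv is a hypothetical helper, not an existing Phenoflow function.
events_to_csv <- function(event_list, file) {
  stopifnot(!is.null(names(event_list)))
  rows <- lapply(names(event_list), function(nm) {
    m <- event_list[[nm]]
    # check.names = FALSE keeps parameter names such as "FL1-H" intact
    data.frame(sample = rep(nm, nrow(m)), m, check.names = FALSE)
  })
  out <- do.call(rbind, rows)
  write.csv(out, file = file, row.names = FALSE)
  invisible(out)
}

# Toy data standing in for transformed FCM events
toy <- list(
  s1 = matrix(1:6, ncol = 2, dimnames = list(NULL, c("FL1-H", "FL3-H"))),
  s2 = matrix(7:10, ncol = 2, dimnames = list(NULL, c("FL1-H", "FL3-H")))
)
csv_file <- tempfile(fileext = ".csv")
events_to_csv(toy, csv_file)
exported <- read.csv(csv_file, check.names = FALSE)
```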

ggCyto: ggplot-style plotting for FCM data?

There's a new Bioconductor package online that allows ggplot-style plotting of flow cytometry data structures. Might be nice to look into?

Here's the paper: https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/bty441/5026650

Clean install on win10, R 3.3.3 does not work ==> dependency settings?

Issue lies with flowFDA, flowCore or matrixStats:

* installing *source* package 'flowFDA' ...
** R
** inst
** preparing package for lazy loading
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) : 
  there is no package called 'matrixStats'
Error : package 'flowCore' could not be loaded
ERROR: lazy loading failed for package 'flowFDA'
* removing 'C:/Users/fpkerckh/Documents/R/win-library/3.3/flowFDA'

After installing flowCore, flowViz still needs to be installed:

* installing *source* package 'flowFDA' ...
** R
** inst
** preparing package for lazy loading
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) : 
  there is no package called 'IDPmisc'
Error : package 'flowViz' could not be loaded
ERROR: lazy loading failed for package 'flowFDA'
* removing 'C:/Users/fpkerckh/Documents/R/win-library/3.3/flowFDA'

Then, an issue with multcomp appears to be present:

* installing *source* package 'flowFDA' ...
** R
** inst
** preparing package for lazy loading
Error : package 'TH.data' required by 'multcomp' could not be found
ERROR: lazy loading failed for package 'flowFDA'
* removing 'C:/Users/fpkerckh/Documents/R/win-library/3.3/flowFDA'

Installing it manually appears to clear the issue.
So Phenoflow could only be installed after manually installing flowViz, flowCore and multcomp.

Provide options for dealing with minority classes in RandomForest function

Documentation for resampling strategies here

Rename repo

http://r-pkgs.had.co.nz/package.html : see name criteria

Incorporate inext functions Diversity_16S()

include error propagation for replicates

In silico community function

Make function that creates in silico communities from real data of axenic cultures.
in_silico()
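A rough sketch of how such a function might work, assuming each axenic culture is available as a matrix of events (rows = cells, columns = parameters). The name `in_silico_mix` and its arguments are hypothetical; cells are drawn with replacement according to the requested community composition.

```r
# Build an in silico community by sampling cells from axenic cultures.
# cultures : named list of numeric matrices (rows = cells, columns = parameters)
# fractions: desired relative abundance of each culture (normalised internally)
# n_cells  : total number of cells in the in silico community
# NOTE: in_silico_mix is a hypothetical helper, not an existing Phenoflow function.
in_silico_mix <- function(cultures, fractions, n_cells, seed = NULL) {
  stopifnot(length(cultures) == length(fractions), all(fractions >= 0))
  if (!is.null(seed)) set.seed(seed)
  fractions <- fractions / sum(fractions)
  # Number of cells taken from each culture
  counts <- as.vector(rmultinom(1, size = n_cells, prob = fractions))
  parts <- lapply(seq_along(cultures), function(i) {
    idx <- sample.int(nrow(cultures[[i]]), counts[i], replace = TRUE)
    data.frame(cultures[[i]][idx, , drop = FALSE],
               label = rep(names(cultures)[i], counts[i]),
               check.names = FALSE)
  })
  do.call(rbind, parts)
}

# Toy axenic cultures with two parameters each
set.seed(42)
cultures <- list(
  A = matrix(rnorm(400, mean = 2), ncol = 2, dimnames = list(NULL, c("FL1-H", "FL3-H"))),
  B = matrix(rnorm(400, mean = 4), ncol = 2, dimnames = list(NULL, c("FL1-H", "FL3-H")))
)
mix <- in_silico_mix(cultures, fractions = c(0.8, 0.2), n_cells = 1000, seed = 1)
```

Keeping the `label` column makes the output directly usable as ground truth for a classifier.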

Bootstrap fp_contrasts

This would make the current implementation more sound/robust. The downside is that it will be computationally intensive and require more memory due to the inefficiency of foreach.

Evaluate effect sample size on Diversity()

Supply datasets

The examples section of each function should contain executable code that uses data that can be loaded from the package.

Error in the wiki: some alpha-diversity components are not calculated

In the wiki, at one point,

### Export ecological data to .csv file in the chosen directory
write.csv2(file = "results.metrics.csv",
           cbind(Diversity.fbasis, Evenness.fbasis,
                 Structural.organization.fbasis,
                 Coef.var.fbasis))

is called.

However, currently the objects Evenness.fbasis, Structural.organization.fbasis, Coef.var.fbasis are not generated in the alpha-diversity section.
In the corresponding vignette, I will remove them from this chunk. However, @rprops: how would you like to proceed here? Shall we keep them in the wiki? Should they be added to the vignette?

Where does TimeChannel come from in RandomF_predict, line 47?

Where is this initialized? Is this a global variable?

Phenoflow_package/R/RandomF_predict.R

Line 47 in 00687dc

    
           filter_param <- filter_param[!filter_param %in% param_f & filter_param!= TimeChannel]

add unit tests

See http://r-pkgs.had.co.nz/tests.html

Make test data directly available + examples for all major functions.

Test package.

Rescaling for bandwidth calculations

In the Phenotypic Diversity Analysis wiki we select maxval <- max(summary[,9]) at one point. Here the column index depends on which parameter happens to sit in that column (FL1-H for the BD Accuri C6, but possibly something completely different for e.g. the BD FACSVerse). Why don't we:

use vector-based rescaling and rescale each parameter by its largest value? Or adapt mytrans to mytrans <- function(x) x/max(x)?
make this more generic?
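To make the suggestion concrete, here is a small base-R sketch of per-parameter rescaling on a plain event matrix. The helper name `rescale_by_max` is hypothetical, and whether per-parameter scaling (rather than one shared maximum) is appropriate for the bandwidth calculation remains the question raised above.

```r
# Rescale every parameter (column) by its own maximum, independent of
# where that parameter sits in the summary table.
# NOTE: rescale_by_max is a hypothetical helper, not an existing Phenoflow function.
rescale_by_max <- function(x) {
  col_max <- apply(x, 2, max)   # one maximum per parameter
  sweep(x, 2, col_max, "/")     # divide each column by its maximum
}

events <- cbind("FL1-H" = c(1, 2, 4), "FL3-H" = c(10, 5, 20))
rescaled <- rescale_by_max(events)
```

Because the columns are addressed by position only inside `apply()` and `sweep()`, the same call works regardless of instrument-specific parameter names or column order.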

Include function for absolute quantification of OTUs

Adjust vignette/tutorial with RandomForest example

A test dataset from the PLOS paper has been made available in Phenoflow.

add flowclean for QC in default pipeline (on wiki?)

Recently, we saw with the Accuri that stability issues in the data can go unnoticed in high-throughput experiments. Could we implement a standardized check? We already (briefly) evaluated flowClean (http://bioconductor.org/packages/release/bioc/html/flowClean.html); however, it appears to be quite slow.

Suggestions:

We develop our own autocorrelation-based approach
We optimize the flowClean code and include it here

Shouldn't -W be included in the case of FACSVerse data in the build up to filter params for flowAI::autoQC?

See e.g.

Phenoflow_package/R/RandomF_predict.R

Line 44 in 00687dc

    
           param_f <- BiocGenerics::unique(gsub(param, pattern = "-H|-A", replacement = ""))

Allow group_label in RandomF_FCS()

This would allow creating a separate random forest model for each "group", for example if you have three groups of strains that you would like to distinguish on a strain-by-strain basis within each group. This should be straightforward to implement by combining the current training step with apply() and returning the results in a list().
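The per-group idea could be sketched as below, using base-R split() and lapply() rather than apply(). Both `fit_per_group` and the `fit_fun` argument are hypothetical; in practice `fit_fun` would wrap the existing random forest training step.

```r
# Fit one model per group and return the fits as a named list.
# data       : data frame with one row per cell (or sample)
# group_label: name of the column that defines the groups
# fit_fun    : function that takes a data frame and returns a fitted model
# NOTE: fit_per_group and fit_fun are hypothetical, not part of RandomF_FCS().
fit_per_group <- function(data, group_label, fit_fun) {
  stopifnot(group_label %in% names(data))
  lapply(split(data, data[[group_label]]), fit_fun)
}

toy <- data.frame(
  group  = rep(c("g1", "g2", "g3"), times = c(4, 3, 5)),
  strain = rep(c("x", "y"), length.out = 12),
  FL1    = seq_len(12)
)
# Stand-in for a training function: just report the number of rows seen
fits <- fit_per_group(toy, "group", function(d) nrow(d))
```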

Add citation

Add progress tag in Diversity()

Columns in confusion matrix after RandomForest training seem to be swapped around

This makes interpretation counter-intuitive, as the goal for most classification purposes is to achieve a diagonal matrix.
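One way to guarantee a consistent layout, sketched in base R under the assumption that observed and predicted labels are available as vectors, is to build the table from factors that share the same level order. The helper name `confusion_table` is hypothetical.

```r
# Confusion matrix with observed classes as rows and predicted classes as
# columns, both in the same level order, so correct predictions sit on the
# diagonal.
# NOTE: confusion_table is a hypothetical helper, not an existing Phenoflow function.
confusion_table <- function(observed, predicted) {
  lv <- sort(unique(c(as.character(observed), as.character(predicted))))
  table(observed  = factor(observed, levels = lv),
        predicted = factor(predicted, levels = lv))
}

obs  <- c("a", "a", "b", "c")
pred <- c("a", "b", "b", "c")
cm <- confusion_table(obs, pred)
```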

Implement random forest in R

This should be fairly straightforward by translating prubbens' Python code (https://github.com/prubbens/InSilicoFlow/blob/master/insilico.py)
to R with: https://cran.r-project.org/web/packages/randomForest/index.html and https://cran.r-project.org/web/packages/party/index.html

Adding non-CRAN packages

Have to look into how to automatically install non-CRAN packages (flowCore, flowViz, easyGgplot2)

maybe add travis ci build checks?

So that users have confidence that devtools::install_github() will deliver
https://docs.travis-ci.com/user/languages/r/

Parallelize Diversity_16S

Consider using parallel::mclapply to speed up the computation by putting different samples on different nodes.
It could replace the for loop at https://github.com/rprops/Phenoflow_package/blob/master/R/Diversity_16S.R#L44

However:

How cross-platform is this?
It will not speed up computation per sample

In the long run using Rcpp would be the more efficient solution here.
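On the cross-platform question: parallel::mclapply relies on forking, so on Windows it only runs with a single core. A possible workaround, sketched here with a hypothetical wrapper `par_over_samples`, is to fall back to a socket cluster with parallel::parLapply on Windows.

```r
library(parallel)

# Apply `fun` to every sample in parallel, one sample per worker.
# Uses forking where available and a socket cluster on Windows.
# NOTE: par_over_samples is a hypothetical wrapper, not part of Diversity_16S().
par_over_samples <- function(samples, fun, cores = 2L) {
  if (.Platform$OS.type == "windows") {
    cl <- makeCluster(cores)
    on.exit(stopCluster(cl))
    parLapply(cl, samples, fun)
  } else {
    mclapply(samples, fun, mc.cores = cores)
  }
}

# Toy per-sample computation standing in for the body of the for loop
res <- par_over_samples(list(1:3, 4:6), sum)
```

As noted above, this only distributes samples over workers; the per-sample computation itself is not faster.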

Justify text in markdown vignette

Will require custom css file.

Create release branch and tags

When beta release is ready, use tags to create release branch

Implement feature importance evaluation

From Caret documentation for RandomForest feature importances:

"Random Forest: from the R package: “For each tree, the prediction accuracy on the out-of-bag portion of the data is recorded. Then the same is done after permuting each predictor variable. The difference between the two accuracies are then averaged over all trees, and normalized by the standard error. For regression, the MSE is computed on the out-of-bag data for each tree, and then the same computed after permuting a variable. The differences are averaged and normalized by the standard error. If the standard error is equal to 0 for a variable, the division is not done.”"

Issue with masking

Phyloseq is masking some functions from other packages (even though these functions are never used) resulting in a failed Travis build...

Add flowAI to imports?

Now that the latest Bioconductor release deals with buggy Accuri data, shouldn't we consider making flowAI part of the default packages loaded/installed with Phenoflow?

RandomF_FCS error

Random forest on time series fcm data

Implement createTimeSlices for random forest inference on time series FCM data.
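For reference, the rolling-origin idea behind caret's createTimeSlices can be sketched in base R as follows (fixed-width training window, no skipping). The function `rolling_slices` is a hypothetical stand-in written for illustration, not caret's implementation.

```r
# Rolling-origin resampling for an ordered series of n time points:
# each slice trains on `initial_window` consecutive points and tests on
# the `horizon` points that immediately follow.
# NOTE: rolling_slices is a hypothetical helper illustrating the idea only.
rolling_slices <- function(n, initial_window, horizon = 1L) {
  stopifnot(n >= initial_window + horizon)
  starts <- seq_len(n - initial_window - horizon + 1L)
  lapply(starts, function(s) {
    train_end <- s + initial_window - 1L
    list(train = s:train_end,
         test  = (train_end + 1L):(train_end + horizon))
  })
}

slices <- rolling_slices(n = 10, initial_window = 5, horizon = 2)
```

Each element of `slices` holds the row indices for one train/test split, so a model is never evaluated on time points that precede its training data.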

replace FCS_clean

Replace flowclean approach in FCS_clean with the flowAI one, and call this function in Diversity_rf and RandomF_FCS

tidy up code

install.packages("formatR")
formatR::tidy_dir("R")

install.packages("lintr")
lintr::lint_package()

Server side reproducibility of diversity_rf?

Problem: different plots from div_rf()

RandomF_FCS throws warnings when more than 1 dash occurs in the selected params (e.g. PerCP-Cy5.5-A)

RandomF_FCS throws warnings when more than 1 dash occurs in the selected params (e.g. PerCP-Cy5.5-A). This is specifically due to

add_measuredparam <- unique(do.call(rbind, strsplit(param,"-"))[, 2])[1]

The warning that is returned is:

Warning message:
In (function (..., deparse.level = 1)  :  number of columns of result is not a multiple of vector length (arg 1)

Although this (luckily) still returns the desired parameter in the current cases, it is rather unpredictable behaviour that we should avoid. As a suggestion: maybe we should grep (greedily) instead, or use a final-character anchor such as .*-[AWH]$ or, more generically, .*-[A-Z]$?
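A minimal base-R sketch of the anchored approach, assuming the measured-parameter suffix is always a single capital letter after the last dash. The variable names are for illustration only.

```r
param <- c("FL1-H", "FSC-H", "PerCP-Cy5.5-H")

# Greedy .* runs up to the LAST dash, so extra dashes inside the
# fluorochrome name (e.g. PerCP-Cy5.5) no longer matter.
measured  <- unique(sub("^.*-([A-Z])$", "\\1", param))

# Strip only the trailing suffix to get the channel names
base_name <- sub("-[A-Z]$", "", param)
```

Unlike the strsplit()/rbind() construction, this never has to combine pieces of unequal length, so the recycling warning cannot occur.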

Implement FCS_pool in RandomF_FCS

This will allow pre-merging prior to training the model if unbalanced data sets per group are provided.

FCS_pool on groups
FCS_resample downsample to specified number of cells
train classifier
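The first two steps above can be sketched in base R on plain event matrices, as a stand-in for what FCS_pool and FCS_resample do on flowSets. The helper `pool_and_downsample` and its arguments are hypothetical, and the sketch assumes sampling without replacement.

```r
# Pool samples per group, then downsample every pooled group to the same
# number of cells so the classifier sees balanced classes.
# samples: list of numeric matrices (rows = cells, columns = parameters)
# groups : group label for each element of `samples`
# n_cells: number of cells to keep per group (capped at the cells available)
# NOTE: pool_and_downsample is a hypothetical helper, not part of RandomF_FCS().
pool_and_downsample <- function(samples, groups, n_cells, seed = NULL) {
  stopifnot(length(samples) == length(groups))
  if (!is.null(seed)) set.seed(seed)
  pooled <- lapply(split(samples, groups), function(x) do.call(rbind, x))
  lapply(pooled, function(m) {
    m[sample.int(nrow(m), min(n_cells, nrow(m))), , drop = FALSE]
  })
}

# Unbalanced toy data: group "a" has 150 cells in total, group "b" has 60
toy <- list(matrix(rnorm(200), ncol = 2), matrix(rnorm(100), ncol = 2),
            matrix(rnorm(120), ncol = 2))
balanced <- pool_and_downsample(toy, groups = c("a", "a", "b"),
                                n_cells = 50, seed = 1)
```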
