ohdsi / cyclops

Cyclops (Cyclic coordinate descent for logistic, Poisson and survival analysis) is an R package for performing large scale regularized regressions.

Home Page: http://ohdsi.github.io/Cyclops/

R 8.00% C++ 81.23% CMake 1.86% C 0.91% Cuda 7.85% Shell 0.02% Perl 0.01% Dockerfile 0.13%
hades

cyclops's Introduction

Cyclops

Cyclops is part of HADES.

Introduction

Cyclops (Cyclic coordinate descent for logistic, Poisson and survival analysis) is an R package for performing large scale regularized regressions.

Features

  • Regression of very large problems: up to millions of observations, millions of variables
  • Supports (conditional) logistic regression, (conditional) Poisson regression, as well as (conditional) Cox regression
  • Uses a sparse representation of the independent variables when appropriate
  • Supports using no prior, a normal prior or a Laplace prior
  • Supports automatic selection of hyperparameter through cross-validation
  • Efficient estimation of confidence intervals for a single variable using a profile-likelihood for that variable

Examples

  library(Cyclops)
  cyclopsData <- createCyclopsDataFrame(formula)
  cyclopsFit <- fitCyclopsModel(cyclopsData)

Technology

Cyclops is an R package, with most functionality implemented in C++. Cyclops uses cyclic coordinate descent to optimize the likelihood function, which makes use of the sparse nature of the data.

System Requirements

Requires R (version 3.1.0 or higher). Compilation on Windows requires RTools >= 3.4.

Installation

In R, to install the latest stable version, install from CRAN:

install.packages("Cyclops")

To install the latest development version, install from GitHub. Note that this will require RTools to be installed.

install.packages("devtools")
devtools::install_github("OHDSI/Cyclops")

User Documentation

Documentation can be found on the package website.

PDF versions of the documentation are also available:

Support

Contributing

Read here how you can contribute to this package.

License

Cyclops is licensed under Apache License 2.0. Cyclops contains the TinyThread library.

The TinyThread library is licensed under the zlib/libpng license as described here.

Development

Cyclops is being developed in RStudio.

Acknowledgements

  • This project is supported in part through the National Science Foundation grants IIS 1251151 and DMS 1264153.

cyclops's People

Contributors

chrisknoll, erickawaguchi, jianxiaoyang, kalibera, lhjohn, msuchard, schuemie, sushilmittal, yuxitian

cyclops's Issues

(Solved) 'Rcpp_precious_remove' not provided by package 'Rcpp'

Not a Cyclops bug, but I did encounter it while running Cyclops. The error 'Rcpp_precious_remove' not provided by package 'Rcpp' is thrown when calling convertToCyclopsData() with certain versions of Rcpp. Oddly, some variants of Rcpp 1.0.6 are fine, while other variants (with the same version number but a different MD5) are not.

The solution seems to be to upgrade to Rcpp 1.0.7.

Use `futile.logger` within `Cyclops`

Tasks (this is the usual contract):

  • Refactor message-level check into *Logger class. May address this later, since Cyclops only constructs strings if message-level <= threshold for efficiency.
  • Delegate printing of everything to flog commands

Alternative (ignoring usual contract for efficiency):

  • Pass message-level from currently active flog
  • Delegate printing to flog command of appropriate level

Note: this alternative is probably what is already happening by wrapping Rcpp::Rcout output

Embarrassing parallelization of cross-validation folds

Enable parallelization of model fitting during cross-validation via C++11 threads. Tasks are large enough that nothing fancy (a pool, fine-scale control, etc.) should be necessary. In terms of coding:

  • Refactor across-fold at grid-point computing into a lambda
  • Pass lambda to accumulate
  • Delegate accumulate to std::accumulate or threaded version

Each instance would have its own CyclicCoordinateDescent (so this needs a copy constructor) and JointPrior (also needs a copy constructor), but share:

  • ModelData (very large)
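
The three coding tasks above can be sketched roughly as follows. This is a minimal illustration, not the Cyclops implementation: `FoldTask`, `accumulateSequential` and `accumulateThreaded` are hypothetical names, and the lambda passed in stands in for "fit on the other folds, score on the held-out fold". It assumes each task owns its own optimizer and prior copies and only reads the shared data.

```cpp
#include <functional>
#include <future>
#include <vector>

// Hypothetical stand-in for the per-fold work at one grid point:
// takes a fold index and returns that fold's predictive score.
using FoldTask = std::function<double(int)>;

// Sequential version: run the folds one after another and sum the scores.
inline double accumulateSequential(int folds, const FoldTask& task) {
    double total = 0.0;
    for (int fold = 0; fold < folds; ++fold) {
        total += task(fold);
    }
    return total;
}

// Threaded version: one std::async task per fold, no pool or fine-scale control.
inline double accumulateThreaded(int folds, const FoldTask& task) {
    std::vector<std::future<double>> results;
    results.reserve(static_cast<std::size_t>(folds));
    for (int fold = 0; fold < folds; ++fold) {
        results.push_back(std::async(std::launch::async, task, fold));
    }
    double total = 0.0;
    for (auto& result : results) {
        total += result.get(); // collected in fold order, so the sum order is fixed
    }
    return total;
}
```

Collecting the futures in fold order keeps the final sum independent of which thread finishes first, which may matter for reproducibility.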

Small differences in prediction between operating systems

Just trying to understand: we're fitting a not-so-large logistic regression (for a propensity score) using the exact same data on two platforms (Windows and Linux). The fitted model coefficients are identical, but there are tiny differences in the predicted propensity scores. The maximum difference between PS is 9.99e-16. (Ironically, this leads to different PS matching, leading to larger differences in the effect size estimate). There's no sampling before fitting the model, so we're calling Cyclops' predict() on the same data used to fit the model.

Repeat runs on the same OS produce the exact same result, so results are reproducible in that sense. We compared the output of .Machine in R, and the only difference we see is sizeof.long = 4 on Windows and sizeof.long = 8 on Linux.

Any thoughts what could explain these differences?
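
For scale, the reported maximum difference can be expressed in units of double-precision machine epsilon (2^-52, about 2.22e-16). The helper below is a hypothetical illustration, not part of Cyclops; it only shows that 9.99e-16 is roughly 4.5 epsilons, which is the size of a handful of last-bit rounding differences. It does not identify where those roundings come from.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: express an absolute difference as a multiple of
// machine epsilon for double.
inline double inMachineEpsilons(double difference) {
    return std::fabs(difference) / std::numeric_limits<double>::epsilon();
}
```

Calling `inMachineEpsilons(9.99e-16)` gives a value close to 4.5.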

Add test for proportionality assumption

As part of our suite of study diagnostics, it would be good to evaluate whether the proportionality assumption holds when performing a Cox regression. Could we add something like Schoenfeld residuals to Cyclops?

Issue with convertToCyclopsData - ORDER BY is ignored in subqueries

In CohortMethod when calling fitOutcomeModel the following warning is thrown:

ORDER BY is ignored in subqueries without LIMIT
ℹ Do you need to move arrange() later in the pipeline or use window_order() instead?
Backtrace:
  1. CohortMethod::runCmAnalyses(...) test-eunomia.R:116:2
  2. ParallelLogger::clusterApply(cluster, modelsToFit, doFitOutcomeModelPlus) /Users/jamie/PycharmProjects/CohortMethod/R/RunAnalyses.R:426:6
  3. base::lapply(x, fun, ...)
  4. CohortMethod:::FUN(X[[i]], ...)
  6. CohortMethod::fitOutcomeModel(...)
  8. Cyclops:::convertToCyclopsData.tbl_dbi(...)
  9. Andromeda::batchApply(covariates, loadCovariates, batchSize = 1e+05)
 11. dbplyr:::sql_render.tbl_lazy(tbl, connection)
 13. dbplyr:::sql_render.op(query$ops, con = con, ..., subquery = subquery)
 15. dbplyr:::sql_render.select_query(qry, con = con, ..., subquery = subquery)
 24. dbplyr:::sql_render.join_query(query$from, con, ..., subquery = TRUE)
 33. dbplyr:::sql_render.tbl_lazy(query$y, con, ..., subquery = TRUE)
 35. dbplyr:::sql_render.op(query$ops, con = con, ..., subquery = subquery)
 37. dbplyr:::sql_render.select_query(qry, con = con, ..., subquery = subquery)
 38. dbplyr:::dbplyr_query_select(...)
 39. dbplyr:::dbplyr_fallback(con, "sql_select", ...)
 41. dbplyr:::sql_select.DBIConnection(con, ...)
 43. dbplyr:::sql_query_select.DBIConnection(...)
 45. dbplyr:::sql_clause_order_by(con, order_by, subquery, limit)
 46. dbplyr:::warn_drop_order_by()

This appears to be a problem with how Cyclops is calling Andromeda. Reproducible in the unit tests found here

Allow user to specify initial values for betas

We'd like to do warm starts where the underlying data has slightly changed (different rows have been excluded). Providing a way to specify the initial values for the betas would achieve that, because we can then insert the estimated coefficients from the previous run.

Cross-validation for conditional PR and LR failing

Using this code:

library(Cyclops)
set.seed(1)
simData <- simulateCyclopsData(model = "poisson")

cyclopsData <- convertToCyclopsData(outcomes = simData$outcomes,
                                    covariates = simData$covariates,
                                    addIntercept = FALSE,
                                    modelType = "cpr")
prior <- createPrior(priorType = "laplace", useCrossValidation = TRUE)
control <- createControl(fold = 4, cvRepetitions = 1, threads = 1, noiseLevel = "quiet")
fit <- fitCyclopsModel(cyclopsData = cyclopsData,
                       prior = prior,
                       control = control)
fit

On v2.0.2 (CRAN) generates:

Cyclops model fit object

Call: fitCyclopsModel(cyclopsData = cyclopsData, prior = prior, control = control)

           Model: cpr
           Prior: Laplace(nan)
  Hyperparameter: NaN
     Return flag: SUCCESS
Log likelikehood: -265300.2
       Log prior: NaN

On v1.3.4 it generates:

Cyclops model fit object

Call: fitCyclopsModel(cyclopsData = cyclopsData, prior = prior, control = control)

           Model: cpr
           Prior: Laplace(13.1473)
  Hyperparameter: 0.0115706880060403
     Return flag: SUCCESS
Log likelikehood: -263547.1
       Log prior: -7.770069

This problem is the same for clr, but not pr, lr, and Cox.
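
A side note on reading the two outputs: in the v1.3.4 result, the number inside `Laplace(...)` and the printed hyperparameter appear to be related by the usual Laplace rate for a given variance, rate = sqrt(2 / variance). This is an inference from the printed values, not something stated in this thread, and `laplaceRateFromVariance` is a hypothetical name.

```cpp
#include <cmath>

// Hypothetical helper: rate (inverse scale) of a Laplace distribution
// with the given variance, using variance = 2 / rate^2.
inline double laplaceRateFromVariance(double variance) {
    return std::sqrt(2.0 / variance);
}
```

With the v1.3.4 hyperparameter 0.0115706880060403 this gives approximately 13.1473, and a NaN hyperparameter propagates to a NaN rate, which matches the `Laplace(nan)` shown for v2.0.2.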

(Tiny) differences in optimal hyperparameter depending on number of threads

I'm observing tiny differences in the optimal hyperparameter and therefore the fitted model, depending on the number of threads. For reproducibility purposes, it would be nice if the model was always exactly the same.

Code to reproduce:

library(Cyclops)

# Simulate data ----------------------------------------
set.seed(7)
data <- simulateCyclopsData(nstrata = 1,
                            nrows = 25000,
                            ncovars = 100,
                            effectSizeSd = 1,
                            zeroEffectSizeProp = 0.95,
                            model = "logistic")

# Fit models ------------------------------------------------
prior <- createPrior(priorType = "laplace",
                     exclude = 0,
                     useCrossValidation = TRUE)
control <- createControl(tolerance = 2.0E-7,
                         seed = 1,
                         fold = 10,
                         cvRepetitions = 10,
                         startingVariance = 0.01,
                         threads = 10)

# 1st fit
cyclopsData1 <- convertToCyclopsData(outcomes = data$outcomes,
                                    covariates = data$covariates,
                                    modelType = "lr")


fit1 <- fitCyclopsModel(cyclopsData = cyclopsData1,
                        prior = prior,
                        control = control)

# 2nd fit
cyclopsData2 <- convertToCyclopsData(outcomes = data$outcomes,
                                    covariates = data$covariates,
                                    modelType = "lr")
control$threads <- 5

fit2 <- fitCyclopsModel(cyclopsData = cyclopsData2,
                        prior = prior,
                        control = control)

# Compare fits -------------------------------------------------
fit1$variance
[1] 0.02582132
fit2$variance
[1] 0.02582133

We do get the exact same result if we use the same number of threads.

Error with Cyclops Install on Windows with R 3.2.3

When I install Cyclops from RStudio with R 3.2.3 and the latest RTools, I get this (unhelpful) error:

make: *** [cyclops/engine/AbstractModelSpecifics.o] Error 1
Warning: running command 'make -f "Makevars.win" -f "C:/PROGRA~1/R/R-32~1.3/etc/i386/Makeconf" -f "C:/PROGRA~1/R/R-32~1.3/share/make/winshlib.mk" CXX='$(CXX1X) $(CXX1XSTD)' CXXFLAGS='$(CXX1XFLAGS)' CXXPICFLAGS='$(CXX1XPICFLAGS)' SHLIB_LDFLAGS='$(SHLIB_CXX1XLDFLAGS)' SHLIB_LD='$(SHLIB_CXX1XLD)' SHLIB="Cyclops.dll" ' had status 2
ERROR: compilation failed for package 'Cyclops'

upper and lower bound of CI are equal, but no failure flag

Sometimes the confint() function returns upper and lower CI bounds that are identical, yet fit$return_flag is “SUCCESS”. Other times, on the same data, it doesn’t; in the latter case I don’t know whether the bounds or the coefficient itself is wrong. Running the function twice in a row on the same object can lead to different results (unpredictable, but it sometimes happens with the example below).

library(Cyclops)
test_dat <- data.frame(exposure = c(0, 1), 
                       person_time = c(13775, 10115), 
                       outcomes = c(0, 3))
cyclops_dat <- createCyclopsData(outcomes ~ exposure, data = test_dat,
                                 time = log(test_dat$person_time), modelType = "pr")
cyclops_fit <- fitCyclopsModel(cyclops_dat)
coef(cyclops_fit)
confint(cyclops_fit, "exposure")

Receiving warnings when building from source in R 3.3.1

I'm seeing warnings while installing Cyclops (as a dependency of CohortMethod) from source using the following command:

install_github("OHDSI/CohortMethod")

The warning is:

RcppCyclopsInterface.cpp: In member function 'bsccs::priors::JointPriorPtr bsccs::RcppCcdInterface::makePrior(const std::vector<std::basic_string<char> >&, const std::vector<double>&, const ProfileVector&, const HierarchicalChildMap&, const NeighborhoodMap&)':
RcppCyclopsInterface.cpp:581:33: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
         && basePriorName.size() == length
                                 ^
RcppCyclopsInterface.cpp:582:32: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
         && baseVariance.size() == length) {
                                ^

I think the warning is harmless but could this be cleaned up?

-Chris
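
A common way to silence this kind of warning is to make both sides of the comparison the same unsigned type after ruling out negative values. The sketch below is a hypothetical standalone helper, not the actual `makePrior` code; it assumes `length` is a signed `int`, which is what the warning suggests but is not shown in the excerpt.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper mirroring the two flagged comparisons.
// Assumption: 'length' is a signed int in the original code.
inline bool sizesMatch(const std::vector<std::string>& basePriorName,
                       const std::vector<double>& baseVariance,
                       int length) {
    if (length < 0) {
        return false; // a negative length can never equal a container size
    }
    const std::size_t expected = static_cast<std::size_t>(length);
    return basePriorName.size() == expected && baseVariance.size() == expected;
}
```

Declaring `length` as `std::size_t` where it is defined would be an alternative, if nothing relies on it being signed.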

Add location parameters for coefficient specific priors

Hi, @msuchard,

@lhjohn and I are working on transfer learning at ErasmusMC. One of the methods we are interested in evaluating applies to linear models and is called the prior method. It involves first training a model on a large source database as usual. Then, for a target database that does not have as much data, we train a regularized regression model, but instead of pushing the coefficients toward zero we push them toward the coefficients of the source model. I believe this is equivalent to having coefficient-specific priors whose location parameter is the coefficient from the source model.

I can already deduce from the Cyclops source code (correct me if I'm wrong) that it's already possible to have coefficient-specific priors with different variances. My question is: do you think it would be complicated to add location/scale information for the priors as well? This is something @lhjohn and I are interested in implementing, for example for the Laplace prior, if it doesn't require extensive changes to the code and if we can get some guidance.

Although we are mostly thinking about prediction, I think this can benefit the community wherever we are working in the small-data regime.

Regards,
Egill Fridgeirsson
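
To make the request concrete, here is a minimal sketch of a Laplace log-prior with a location parameter, written as a standalone function. It is an illustration of the math only, not Cyclops code: `laplaceLogPrior` is a hypothetical name, and the variance parameterization (variance = 2 * scale^2) is the standard one for the Laplace distribution.

```cpp
#include <cmath>

// Hypothetical sketch: log density of a Laplace prior centred at 'location'
// (e.g. the source-model coefficient) with the given variance.
inline double laplaceLogPrior(double beta, double location, double variance) {
    const double scale = std::sqrt(variance / 2.0); // Laplace: variance = 2 * scale^2
    return -std::log(2.0 * scale) - std::fabs(beta - location) / scale;
}
```

Setting `location` to zero recovers the usual shrink-toward-zero penalty; a nonzero `location` penalizes the distance from the source coefficient instead.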

Use KKT conditions to speed up mode search

Hi @schuemie and @pbr6cornell ,

If you can generate a dataset with 100,000 or more covariates and many, many rows, I just implemented (in a15bb66) a mode search strategy that should be much faster than before. Please let me know your mileage, so I can tinker a bit with performance. The R commands are:

slowFit <- fitCyclopsModel(massiveCyclopsData,
                           forceNewObject = TRUE, # Cold start for fair comparison
                           prior = createPrior("laplace"))

fastFit <- fitCyclopsModel(massiveCyclopsData,
                          forceNewObject = TRUE, # Cold start for fair comparison
                          prior = createPrior("laplace"),
                          control = createControl(noiseLevel = "quiet",
                                                  useKKTSwindle = TRUE,
                                                  tuneSwindle = 10)) # Maybe try 50, 100 as well

Cv.txt is created by default

Whenever you use cross-validation, Cyclops automatically creates a file called cv.txt in the working folder. Although useful on some occasions, it often is annoying to find it in your folder. Should we change the default to not generating this file?

Fit a model on one set of data and predict on another?

I'm trying to do some analysis with a train/test setup, but I'm having trouble seeing how to do it with Cyclops. I've been using the predict function (https://github.com/OHDSI/Cyclops/blob/master/R/ModelFit.R#L474), but it seems (to me) to be pretty hard-wired to taking only a fit object. I'd like to pass it a fit object and the test set (or do something equivalent), but I can't see how to do that.

Is there a way to do this within Cyclops? I wanted to make sure I wasn't missing anything before I did my own home-cooked stuff. Thanks a lot!

error message when running Cyclops

When I run Cyclops I get this message:

Error in ifelse(!is.null(index$index), max(index$index), 1): object 'index' not found

I did a fresh install from GitHub (version 1.2.1-1).

Any ideas how to solve this?

Thanks.

Report seed used in cross-validation

This would be a nice-to-have: after fitting a model, could the cyclopsFit object report the random seed used to randomize the folds? The use case: when we've fitted a model and later want to rerun our analysis, we can choose the same seed so that, if the data hasn't changed, the result will be the same.

Cyclops error in createPs function of CohortMethod package

When running createPs on our local instance from the CohortMethod package single study vignette (and when running our own studies), we get the following error when run against some of our databases (for others it works fine):

Error in .loadCyclopsDataMultipleX(object, covariateId, rowId, covariateValue, :
Repeated row-column entry at 9 - 1008

Do you have any insight as to why this might be happening? Any hints would be greatly appreciated.
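
One reading of the message, offered as an assumption rather than a confirmed diagnosis, is that the same (row, covariate) pair occurs more than once in the covariate data. A check along those lines could look like the sketch below; `Entry` and `findRepeatedEntries` are hypothetical names, and which of the two numbers in the message is the row and which is the covariate is not stated in the thread.

```cpp
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical representation of one sparse entry: (rowId, covariateId).
using Entry = std::pair<std::int64_t, std::int64_t>;

// Return every (rowId, covariateId) pair that occurs more than once.
inline std::vector<Entry> findRepeatedEntries(const std::vector<Entry>& entries) {
    std::set<Entry> seen;
    std::set<Entry> repeated;
    for (const Entry& entry : entries) {
        if (!seen.insert(entry).second) {
            repeated.insert(entry); // insert failed, so this pair was seen before
        }
    }
    return std::vector<Entry>(repeated.begin(), repeated.end());
}
```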

'NAs produced by integer overflow' when calling `confint()`

In various places in confint() the covariate ID is cast to a (32-bit) integer, causing the message 'NAs produced by integer overflow' to be generated when the covariate ID does not fit in a 32-bit integer. Many covariate IDs (like those generated by the FeatureExtraction package) do not fit in a 32-bit integer, but are allowed in Cyclops.
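
The underlying problem is a narrowing conversion from a 64-bit ID to a 32-bit integer. As a language-neutral illustration (the issue itself concerns R code), the range check that decides whether such a cast is safe can be sketched as follows; `fitsInInt32` is a hypothetical helper.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: true if a 64-bit covariate ID can be stored in a
// 32-bit signed integer without loss.
inline bool fitsInInt32(std::int64_t covariateId) {
    return covariateId >= std::numeric_limits<std::int32_t>::min() &&
           covariateId <= std::numeric_limits<std::int32_t>::max();
}
```

An ID such as 1008 passes the check, while any ID above 2147483647 does not and would need to stay in a 64-bit type throughout.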

what is the meaning of "POOR_BLR_STEP"?

Hi,
I ran a study package using CohortMethod and Cyclops, but I got the error below.
I think there is a problem with the fitting step of the PS model, but I don't know exactly what this error message means or how I can fix it.
Please find attached the error message and my sessionInfo. If you know how to solve this problem, or anything related to it, please let me know.

[Attached images: error message and session info]

Create new release?

For several months now, when using Cyclops for example in CohortMethod, I get the following warnings:

Warning messages:
1: ORDER BY is ignored in subqueries without LIMIT
i Do you need to move arrange() later in the pipeline or use window_order() instead? 
2: In as.integer.integer64(parm) : NAs produced by integer overflow

I think all of these have already been addressed in the various develop branches of Cyclops. Would it be possible to create a Cyclops release, just to keep the console clean?

Consistent cross-validation folds

Currently, we generate different (random) cross-validation folds for each hyperparameter value. Should we reuse the same (randomly generated) folds? I suspect the answer is "yes," but remain unsure.
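
If the answer is "yes", one simple approach is to draw the fold assignment once from a seed and reuse it for every hyperparameter value. The sketch below is a hypothetical illustration (`assignFolds` is not a Cyclops function) and assumes `folds` is positive.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sketch: assign each of n rows to one of 'folds' folds.
// The same seed always yields the same assignment on a given platform.
inline std::vector<int> assignFolds(std::size_t n, int folds, unsigned seed) {
    std::vector<int> assignment(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Round-robin labels keep the fold sizes balanced before shuffling.
        assignment[i] = static_cast<int>(i % static_cast<std::size_t>(folds));
    }
    std::mt19937 rng(seed);
    std::shuffle(assignment.begin(), assignment.end(), rng);
    return assignment;
}
```

Computing the assignment once outside the loop over hyperparameter values means every grid point is scored on identical splits, so differences in score reflect the hyperparameter rather than the partition.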

Using old control object throws uninformative error

We tend to store the control object as part of our study settings. Control objects created using Cyclops versions < 2.0.0 cause an uninformative error to be thrown in versions >= 2.0.0:

library(Cyclops)
sim <- simulateCyclopsData()
cyclopsData <- convertToCyclopsData(sim$outcomes, sim$covariates, modelType = "cox")
control <- createControl()
# Older control object doesn't have algorithm variable:
control$algorithm <- NULL
fit <- fitCyclopsModel(cyclopsData, control = control)
# Error in if (is.na(control$algorithm)) { : argument is of length zero

Recommended behavior: if algorithm variable not found, assume "ccd". Alternative: throw meaningful error message stating the control object is out of date.

Cox regression cross-validation seems broken

We see some weird behavior in the cross-validation, but it's hard to reproduce. At higher variance levels, we often (but not always) see 'ill configured for prior'. Also, in the example below, it seems a bit unlikely that the log likelihood really stays almost constant from 0.000001 to 10. Finally, at the 'optimal' variance (e.g. 0.000215443) all betas are typically zero, which hardly seems optimal.

data <- simulateData(nstrata=1,nrows=1000,ncovars=200,model="survival")
cyclopsData <- convertToCyclopsData(data$outcomes,data$covariates,modelType = "cox")

prior <- createPrior("laplace", useCrossValidation = TRUE)
control <- createControl(noiseLevel = "quiet",lowerLimit = 0.000001,upperLimit = 10)
fit <- fitCyclopsModel(cyclopsData,prior=prior,control=control)

Efron's method of handling ties in survival data

Cyclops is currently lacking support for Efron's method (default behavior in R as @schuemie brought to my attention). Default Cyclops behavior for ties in Cox regression is Breslow's method, which I believe is also the default behavior in SAS. More critically, Cyclops supports at scale an experimental exact method for handling ties that should reduce bias (manuscript in process, instructions to come); in general, exact methods are quite slow.

Notes and performance tests on Cyclops from MacBook Pro

  1. Working with a new install of R on a new MacBook. FYI, I had to change my workspace to "/" to get the install to complete properly.
  2. Some interesting performance observations with R on Macs; I'm interested to hear thoughts. Unlike on Windows and Linux, R on Mac uses multiple cores by default through ATLAS (an optimized BLAS). So I think these tests should reflect the difference between single-core Cyclops and multicore R.
  3. Based on the results below, it would indeed be helpful to separate the data frame from the model that is being estimated. For example, my probit and logit models might perform slightly differently on the data, but it would not be unheard of to estimate both on the same data set.
  4. I hypothesize that Cyclops could be improved even further if it were deployed with Revolution R using the Revolution linear algebra kernels. When I get back to a Windows OS I will try vanilla R and Revolution R, with and without Cyclops.

http://mran.revolutionanalytics.com

So here is the benchmarking, I haven't looped for statistics.

# Generate 10 X 20000000 and 1 X 20000000
> mlgen<-mlbench.friedman1(20000000,sd=.1)
> mlgen$bin<-mlgen$y<mean(mlgen$y)
> 
> # OLS
> system.time(cyclopsData <- createCyclopsDataFrame(mlgen$y ~ mlgen$x, modelType="ls"), gcFirst=TRUE)
   user  system elapsed 
 60.604   5.544  71.226 
> system.time(cyclopsFit <- fitCyclopsModel(cyclopsData), gcFirst=TRUE)
   user  system elapsed 
111.649   0.766 111.861 
> 
> #IWLS
> system.time(glmFit<-glm.fit(mlgen$x,mlgen$y), gcFirst=TRUE)
   user  system elapsed 
 33.594  10.497  50.574 
> 
> rm(cyclopsData)
> rm(cyclopsFit)
> 
> #LR
> system.time(cyclopsData <- createCyclopsDataFrame(mlgen$bin ~ mlgen$x, modelType="lr"), gcFirst=TRUE)
   user  system elapsed 
 25.297   5.661  35.693 
> system.time(cyclopsFit <- fitCyclopsModel(cyclopsData), gcFirst=TRUE)
   user  system elapsed 
451.624   1.447 452.686 
> 
> #GLMwLogit
> system.time(glmFit<-glm.fit(mlgen$x,mlgen$bin,family=binomial(link = "logit")), gcFirst=TRUE)
   user  system elapsed 
 48.457  19.237  94.046 
> 
> 
> Sys.info()
                                                                                            sysname 
                                                                                           "Darwin" 
                                                                                            release 
                                                                                           "13.3.0" 
                                                                                            version 
"Darwin Kernel Version 13.3.0: Tue Jun  3 21:27:35 PDT 2014; root:xnu-2422.110.17~1/RELEASE_X86_64" 
                                                                                           nodename 
                                                                             "DM-MacBook-Pro.local" 
                                                                                            machine 
                                                                                           "x86_64" 
                                                                                              login 
                                                                                   "daniellameeker" 
                                                                                               user 
                                                                                   "daniellameeker" 
                                                                                     effective_user 
                                                                                   "daniellameeker" 
> 

Compilation error on win-builder

g++  -std=c++0x -I"D:/RCompile/recent/R/include"         -I. -Icyclops -DR_BUILD -DWIN_BUILD -DDOUBLE_PRECISION   -I"d:/RCompile/CRANpkg/lib/3.3/Rcpp/include" -I"d:/RCompile/CRANpkg/lib/3.3/BH/include" -I"d:/RCompile/CRANpkg/lib/3.3/RcppEigen/include" -I"d:/RCompile/CRANpkg/lib/3.3/RcppParallel/include" -I"d:/RCompile/r-compiling/local/local320/include"     -O2 -Wall  -mtune=core2 -c RcppExports.cpp -o RcppExports.o
RcppExports.cpp:152:1: error: 'Eigen' does not name a type
RcppExports.cpp: In function 'SEXPREC* Cyclops_cyclopsGetFisherInformation(SEXP, SEXP)':
RcppExports.cpp:160:9: error: 'Eigen' has not been declared
RcppExports.cpp:160:25: error: expected ';' before '__result'
RcppExports.cpp:161:9: error: '__result' was not declared in this scope
make: *** [RcppExports.o] Error 1

Any suggestions would be much appreciated!

Error installing Cyclops 2.0.0 or 2.0.1 using "Build & Reload"

I am trying to install Cyclops 2.0.0 (or 2.0.1) in order to install CohortMethod 3.0.1 on R 3.4.2 using "Build & Reload".
But I keep getting this error and it seems like I am the only one with the error.
What could be the reason and how can I solve this issue?
Thank you.

src/RcppCyclopsInterface.cpp
Line 286 comparison between signed and unsigned integer expression [-Wsign-compare]
src/RcppExports.cpp
Line 153 'Eigen' does not name a type
Line 160 'cyclopsGetFisherInformation' was not declared in this scope

Rtools 3.4

If I install the latest Rtools (version 3.4), the Cyclops install in R still asks for 3.3.

Add an option to automatically pick selectorType in createControl

The selectorType option allows you to specify what the unit of sampling is for cross-validation: rows or strata. We could implement a simple heuristic that can automatically make a choice that is probably correct, for example:
if rows / strata > folds then sample by rows else sample by strata
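
Written out as code, the proposed heuristic could look like the sketch below. The enum and function names are hypothetical and are not the values accepted by `selectorType`; the comparison is rearranged to avoid a division.

```cpp
#include <cstddef>

// Hypothetical names for the two sampling units discussed above.
enum class SamplingUnit { Rows, Strata };

// Heuristic from the text: if rows / strata > folds, sample by rows,
// otherwise sample by strata.
inline SamplingUnit chooseSamplingUnit(std::size_t rows, std::size_t strata, std::size_t folds) {
    // rows / strata > folds is rewritten as rows > folds * strata,
    // so strata == 0 cannot cause a division by zero.
    return rows > folds * strata ? SamplingUnit::Rows : SamplingUnit::Strata;
}
```

For example, 1000 rows in a single stratum with 10 folds would sample by rows, while 1000 rows in 500 strata with 10 folds would sample by strata.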

Faster computation of confidence intervals

Currently, computing the confidence intervals of non-regularized betas takes a long time (often longer than the time needed to create the original fit). When computing the confidence intervals for several variables this can be a bit cumbersome. Is there a way to speed up the computation of CIs? Computing the CI of each variable in a separate thread would already help a lot.

Install error on linux ( RHEL 6 and R 3.2 )

I am sure I am missing a dependency, but I have no idea what it is.
g++ 4.4.7
R-3.2.3

I tried the downgrade of BH that @msuchard suggested, but it did not help.

'/usr/lib64/R/bin/R' --no-site-file --no-environ --no-save --no-restore CMD INSTALL
'/tmp/RtmpgDiCmU/devtools19a34d0cb2af/OHDSI-Cyclops-5dea025'
--library='/home/rstarr7/R/x86_64-redhat-linux-gnu-library/3.2' --install-tests

* installing *source* package ‘Cyclops’ ...
    ** libs
    g++ -m64 -std=c++0x -I/usr/include/R -DNDEBUG -I. -Icyclops -DR_BUILD -DDOUBLE_PRECISION -I/usr/local/include -I"/home/rstarr7/R/x86_64-redhat-linux-gnu-library/3.2/Rcpp/include" -I"/home/rstarr7/R/x86_64-redhat-linux-gnu-library/3.2/BH/include" -I"/home/rstarr7/R/x86_64-redhat-linux-gnu-library/3.2/RcppEigen/include" -I"/home/rstarr7/R/x86_64-redhat-linux-gnu-library/3.2/RcppParallel/include" -s -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -c cyclops/CcdInterface.cpp -o cyclops/CcdInterface.o
    In file included from cyclops/CyclicCoordinateDescent.h:12,
    from cyclops/CcdInterface.cpp:29:
    cyclops/CompressedDataMatrix.h: In member function ‘void bsccs::CompressedDataMatrix::push_back(IntVectorItr, IntVectorItr, RealVectorItr, RealVectorItr, bsccs::FormatType)’:
    cyclops/CompressedDataMatrix.h:304: error: no matching function for call to ‘bsccs::CompressedDataMatrix::push_back(NULL, bsccs::RealVectorPtr&, bsccs::FormatType&)’
    cyclops/CompressedDataMatrix.h:344: note: candidates are: void bsccs::CompressedDataMatrix::push_back(bsccs::FormatType)
    cyclops/CompressedDataMatrix.h:399: note: void bsccs::CompressedDataMatrix::push_back(bsccs::IntVectorPtr, bsccs::RealVectorPtr, bsccs::FormatType)
    cyclops/CompressedDataMatrix.h:314: error: no matching function for call to ‘bsccs::CompressedDataMatrix::push_back(bsccs::IntVectorPtr&, NULL, bsccs::FormatType&)’
    cyclops/CompressedDataMatrix.h:344: note: candidates are: void bsccs::CompressedDataMatrix::push_back(bsccs::FormatType)
    cyclops/CompressedDataMatrix.h:399: note: void bsccs::CompressedDataMatrix::push_back(bsccs::IntVectorPtr, bsccs::RealVectorPtr, bsccs::FormatType)
    cyclops/CompressedDataMatrix.h:316: error: no matching function for call to ‘bsccs::CompressedDataMatrix::push_back(NULL, NULL, bsccs::FormatType&)’
    cyclops/CompressedDataMatrix.h:344: note: candidates are: void bsccs::CompressedDataMatrix::push_back(bsccs::FormatType)
    cyclops/CompressedDataMatrix.h:399: note: void bsccs::CompressedDataMatrix::push_back(bsccs::IntVectorPtr, bsccs::RealVectorPtr, bsccs::FormatType)
    cyclops/CompressedDataMatrix.h: In member function ‘void bsccs::CompressedDataMatrix::replace(int, IntVectorItr, IntVectorItr, RealVectorItr, RealVectorItr, bsccs::FormatType)’:
    cyclops/CompressedDataMatrix.h:329: error: no matching function for call to ‘bsccs::CompressedDataMatrix::replace(int&, NULL, bsccs::RealVectorPtr&, bsccs::FormatType&)’
    cyclops/CompressedDataMatrix.h:409: note: candidates are: void bsccs::CompressedDataMatrix::replace(int, bsccs::IntVectorPtr, bsccs::RealVectorPtr, bsccs::FormatType)
    cyclops/CompressedDataMatrix.h:336: error: no matching function for call to ‘bsccs::CompressedDataMatrix::replace(int&, bsccs::IntVectorPtr&, NULL, bsccs::FormatType&)’
    cyclops/CompressedDataMatrix.h:409: note: candidates are: void bsccs::CompressedDataMatrix::replace(int, bsccs::IntVectorPtr, bsccs::RealVectorPtr, bsccs::FormatType)
    cyclops/CompressedDataMatrix.h:338: error: no matching function for call to ‘bsccs::CompressedDataMatrix::replace(int&, NULL, NULL, bsccs::FormatType&)’
    cyclops/CompressedDataMatrix.h:409: note: candidates are: void bsccs::CompressedDataMatrix::replace(int, bsccs::IntVectorPtr, bsccs::RealVectorPtr, bsccs::FormatType)
    cyclops/CompressedDataMatrix.h: In member function ‘void bsccs::CompressedDataMatrix::push_back(bsccs::FormatType)’:
    cyclops/CompressedDataMatrix.h:348: error: no matching function for call to ‘bsccs::CompressedDataMatrix::push_back(NULL, bsccs::RealVectorPtr&, bsccs::FormatType)’
    cyclops/CompressedDataMatrix.h:344: note: candidates are: void bsccs::CompressedDataMatrix::push_back(bsccs::FormatType)
    cyclops/CompressedDataMatrix.h:399: note: void bsccs::CompressedDataMatrix::push_back(bsccs::IntVectorPtr, bsccs::RealVectorPtr, bsccs::FormatType)
    cyclops/CompressedDataMatrix.h:358: error: no matching function for call to ‘bsccs::CompressedDataMatrix::push_back(bsccs::IntVectorPtr&, NULL, bsccs::FormatType)’
    cyclops/CompressedDataMatrix.h:344: note: candidates are: void bsccs::CompressedDataMatrix::push_back(bsccs::FormatType)
    cyclops/CompressedDataMatrix.h:399: note: void bsccs::CompressedDataMatrix::push_back(bsccs::IntVectorPtr, bsccs::RealVectorPtr, bsccs::FormatType)
    cyclops/CompressedDataMatrix.h:360: error: no matching function for call to ‘bsccs::CompressedDataMatrix::push_back(NULL, NULL, bsccs::FormatType)’
    cyclops/CompressedDataMatrix.h:344: note: candidates are: void bsccs::CompressedDataMatrix::push_back(bsccs::FormatType)
    cyclops/CompressedDataMatrix.h:399: note: void bsccs::CompressedDataMatrix::push_back(bsccs::IntVectorPtr, bsccs::RealVectorPtr, bsccs::FormatType)
    cyclops/CompressedDataMatrix.h: In member function ‘void bsccs::CompressedDataMatrix::insert(size_t, bsccs::FormatType)’:
    cyclops/CompressedDataMatrix.h:370: error: no matching function for call to ‘bsccs::CompressedDataMatrix::insert(__gnu_cxx::__normal_iterator<std::unique_ptr<bsccs::CompressedDataColumn, std::default_delete<bsccs::CompressedDataColumn> >*, std::vector<std::unique_ptr<bsccs::CompressedDataColumn, std::default_delete<bsccs::CompressedDataColumn> >, std::allocator<std::unique_ptr<bsccs::CompressedDataColumn, std::default_delete<bsccs::CompressedDataColumn> > > > >, NULL, bsccs::RealVectorPtr&, bsccs::FormatType)’
    cyclops/CompressedDataMatrix.h:366: note: candidates are: void bsccs::CompressedDataMatrix::insert(size_t, bsccs::FormatType)
    cyclops/CompressedDataMatrix.h:421: note: void bsccs::CompressedDataMatrix::insert(__gnu_cxx::__normal_iterator<std::unique_ptr<bsccs::CompressedDataColumn, std::default_delete<bsccs::CompressedDataColumn> >*, std::vector<std::unique_ptr<bsccs::CompressedDataColumn, std::default_delete<bsccs::CompressedDataColumn> >, std::allocator<std::unique_ptr<bsccs::CompressedDataColumn, std::default_delete<bsccs::CompressedDataColumn> > > > >, bsccs::IntVectorPtr, bsccs::RealVectorPtr, bsccs::FormatType)
    In file included from cyclops/CyclicCoordinateDescent.h:13,
    from cyclops/CcdInterface.cpp:29:
    cyclops/ModelData.h: In function ‘typename Itr::value_type bsccs::quantile(Itr, Itr, double)’:
    cyclops/ModelData.h:297: error: ‘floor’ is not a member of ‘std’
    cyclops/ModelData.h:297: error: unable to deduce ‘const auto’ from ‘<expression error>’
    cyclops/ModelData.h:298: error: ‘ceil’ is not a member of ‘std’
    cyclops/ModelData.h:298: error: unable to deduce ‘const auto’ from ‘<expression error>’
    In file included from cyclops/priors/JointPrior.h:14,
    from cyclops/CyclicCoordinateDescent.h:15,
    from cyclops/CcdInterface.cpp:29:
    cyclops/priors/CovariatePrior.h: In member function ‘virtual const std::string bsccs::priors::FusedLaplacePrior::getDescription() const’:
    cyclops/priors/CovariatePrior.h:256: error: expected initializer before ‘:’ token
    cyclops/priors/CovariatePrior.h:260: error: expected primary-expression before ‘return’
    cyclops/priors/CovariatePrior.h:260: error: expected ‘)’ before ‘return’
    In file included from cyclops/CyclicCoordinateDescent.h:15,
    from cyclops/CcdInterface.cpp:29:
    cyclops/priors/JointPrior.h: In member function ‘void bsccs::priors::JointPrior::addVarianceParameters(const std::vector<std::shared_ptr<double>, std::allocator<std::shared_ptr<double> > >&)’:
    cyclops/priors/JointPrior.h:53: error: expected initializer before ‘:’ token
    cyclops/priors/JointPrior.h:56: error: expected primary-expression before ‘}’ token
    cyclops/priors/JointPrior.h:56: error: expected ‘;’ before ‘}’ token
    cyclops/priors/JointPrior.h:56: error: expected primary-expression before ‘}’ token
    cyclops/priors/JointPrior.h:56: error: expected ‘)’ before ‘}’ token
    cyclops/priors/JointPrior.h:56: error: expected primary-expression before ‘}’ token
    cyclops/priors/JointPrior.h:56: error: expected ‘;’ before ‘}’ token
    cyclops/priors/JointPrior.h: In member function ‘std::vector<double, std::allocator<double> > bsccs::priors::JointPrior::getVariance() const’:
    cyclops/priors/JointPrior.h:65: error: expected initializer before ‘:’ token
    cyclops/priors/JointPrior.h:68: error: expected primary-expression before ‘return’
    cyclops/priors/JointPrior.h:68: error: expected ‘;’ before ‘return’
    cyclops/priors/JointPrior.h:68: error: expected primary-expression before ‘return’
    cyclops/priors/JointPrior.h:68: error: expected ‘)’ before ‘return’
    cyclops/priors/JointPrior.h: In member function ‘virtual const std::string bsccs::priors::MixtureJointPrior::getDescription() const’:
    cyclops/priors/JointPrior.h:96: error: expected initializer before ‘:’ token
    cyclops/priors/JointPrior.h:99: error: expected primary-expression before ‘return’
    cyclops/priors/JointPrior.h:99: error: expected ‘;’ before ‘return’
    cyclops/priors/JointPrior.h:99: error: expected primary-expression before ‘return’
    cyclops/priors/JointPrior.h:99: error: expected ‘)’ before ‘return’
    cyclops/priors/JointPrior.h: In member function ‘virtual bool bsccs::priors::MixtureJointPrior::getSupportsKktSwindle() const’:
    cyclops/priors/JointPrior.h:148: error: expected initializer before ‘:’ token
    cyclops/priors/JointPrior.h:154: error: expected primary-expression at end of input
    cyclops/priors/JointPrior.h:154: error: expected ‘;’ at end of input
    cyclops/priors/JointPrior.h:154: error: expected primary-expression at end of input
    cyclops/priors/JointPrior.h:154: error: expected ‘)’ at end of input
    cyclops/priors/JointPrior.h:154: error: expected statement at end of input
    cyclops/priors/JointPrior.h:154: error: expected ‘}’ at end of input
    cyclops/priors/JointPrior.h:154: warning: no return statement in function returning non-void
    In file included from /home/rstarr7/R/x86_64-redhat-linux-gnu-library/3.2/RcppEigen/include/Eigen/Core:288,
    from /home/rstarr7/R/x86_64-redhat-linux-gnu-library/3.2/RcppEigen/include/Eigen/Dense:1,
    from cyclops/CyclicCoordinateDescent.h:18,
    from cyclops/CcdInterface.cpp:29:
    /home/rstarr7/R/x86_64-redhat-linux-gnu-library/3.2/RcppEigen/include/Eigen/src/Core/DenseStorage.h: In constructor ‘Eigen::DenseStorage<T, -0x00000000000000001, -0x00000000000000001, -0x00000000000000001, _Options>::DenseStorage(Eigen::DenseStorage<T, -0x00000000000000001, -0x00000000000000001, -0x00000000000000001, _Options>&&)’:
    /home/rstarr7/R/x86_64-redhat-linux-gnu-library/3.2/RcppEigen/include/Eigen/src/Core/DenseStorage.h:281: error: ‘nullptr’ was not declared in this scope
    /home/rstarr7/R/x86_64-redhat-linux-gnu-library/3.2/RcppEigen/include/Eigen/src/Core/DenseStorage.h: In constructor ‘Eigen::DenseStorage<T, -0x00000000000000001, _Rows, -0x00000000000000001, _Options>::DenseStorage(Eigen::DenseStorage<T, -0x00000000000000001, _Rows, -0x00000000000000001, _Options>&&)’:
    /home/rstarr7/R/x86_64-redhat-linux-gnu-library/3.2/RcppEigen/include/Eigen/src/Core/DenseStorage.h:339: error: ‘nullptr’ was not declared in this scope
    /home/rstarr7/R/x86_64-redhat-linux-gnu-library/3.2/RcppEigen/include/Eigen/src/Core/DenseStorage.h: In constructor ‘Eigen::DenseStorage<T, -0x00000000000000001, -0x00000000000000001, _Cols, _Options>::DenseStorage(Eigen::DenseStorage<T, -0x00000000000000001, -0x00000000000000001, _Cols, _Options>&&)’:
    /home/rstarr7/R/x86_64-redhat-linux-gnu-library/3.2/RcppEigen/include/Eigen/src/Core/DenseStorage.h:393: error: ‘nullptr’ was not declared in this scope
    In file included from cyclops/drivers/GridSearchCrossValidationDriver.h:11,
    from cyclops/CcdInterface.cpp:48:
    cyclops/drivers/AbstractCrossValidationDriver.h: At global scope:
    cyclops/drivers/AbstractCrossValidationDriver.h:25: error: ‘nullptr’ was not declared in this scope
    cyclops/CcdInterface.cpp: In member function ‘double bsccs::CcdInterface::profileModel(bsccs::CyclicCoordinateDescent*, bsccs::ModelData*, const bsccs::ProfileVector&, bsccs::ProfileInformationMap&, int, double, bool, bool)’:
    cyclops/CcdInterface.cpp:333: error: expected primary-expression before ‘[’ token
    cyclops/CcdInterface.cpp:334: warning: left-hand operand of comma has no effect
    cyclops/CcdInterface.cpp:335: warning: right-hand operand of comma has no effect
    cyclops/CcdInterface.cpp:335: warning: right-hand operand of comma has no effect
    cyclops/CcdInterface.cpp:335: warning: right-hand operand of comma has no effect
    cyclops/CcdInterface.cpp:336: warning: right-hand operand of comma has no effect
    cyclops/CcdInterface.cpp:336: warning: right-hand operand of comma has no effect
    cyclops/CcdInterface.cpp:336: warning: right-hand operand of comma has no effect
    cyclops/CcdInterface.cpp:336: warning: right-hand operand of comma has no effect
    cyclops/CcdInterface.cpp:336: warning: right-hand operand of comma has no effect
    cyclops/CcdInterface.cpp:337: error: expected primary-expression before ‘const’
    cyclops/CcdInterface.cpp:337: error: expected primary-expression before ‘*’ token
    cyclops/CcdInterface.cpp:337: error: unable to deduce ‘auto’ from ‘<expression error>’
    cyclops/CcdInterface.cpp:337: error: expected ‘,’ or ‘;’ before ‘{’ token
    cyclops/CcdInterface.cpp:268: warning: unused variable ‘time2’
    cyclops/CcdInterface.cpp:333: warning: unused variable ‘getBound’
    cyclops/CcdInterface.cpp:719: error: expected ‘}’ at end of input
    cyclops/CcdInterface.cpp:719: warning: no return statement in function returning non-void
    cyclops/CcdInterface.cpp: At global scope:
    cyclops/CcdInterface.cpp:719: error: expected ‘}’ at end of input
    make: *** [cyclops/CcdInterface.o] Error 1
    ERROR: compilation failed for package ‘Cyclops’
    * removing ‘/home/rstarr7/R/x86_64-redhat-linux-gnu-library/3.2/Cyclops’
    Error: Command failed (1)

How to enable GPU in the current R package?

Dear Dr. Suchard, I am a PhD student in biostatistics at the University of Texas, trying to speed up Cyclops using a GPU, inspired by your 2013 TOMACS paper. However, I could not find any guidance on how to enable GPU support in the current package. I tried uncommenting find_package(CUDA) in the CMake file, but that does not seem to work. What else should I do? Thank you so much!

Automatically normalize covariate values

Large covariate values (> 1) can lead to numeric overflow errors. We currently work around this by dividing each covariate value by the maximum value for that covariate before loading the data into Cyclops. It would be nice if Cyclops did this automatically and rescaled the beta coefficients so that the normalization is invisible to the user.

For consistency we should also automatically scale the prior for each covariate.

In addition, we could add the option to automatically scale the prior by the standard deviation of the covariate (instead of the max).
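
A minimal sketch of the rescaling idea, using base R's glm() as a stand-in because it needs no Cyclops objects. The variable names (xMax, xScaled, betaRescaled) are illustrative assumptions, not part of the Cyclops API. It shows that fitting on x / max(x) and then dividing the coefficient by the same constant recovers the coefficient on the original scale:

```r
# Illustration only: base R glm() stands in for a Cyclops fit.
set.seed(1)
n <- 500
x <- runif(n, min = 0, max = 200)              # covariate with large values
y <- rbinom(n, size = 1, prob = plogis(-1 + 0.01 * x))

xMax <- max(abs(x))                            # scaling constant (assumed choice: the maximum)
xScaled <- x / xMax                            # scaled covariate now lies in [-1, 1]

fitScaled <- glm(y ~ xScaled, family = binomial())
fitRaw <- glm(y ~ x, family = binomial())

# Undo the scaling: beta on the original scale = beta on the scaled scale / xMax
betaRescaled <- unname(coef(fitScaled)["xScaled"]) / xMax
betaRaw <- unname(coef(fitRaw)["x"])

# A normal prior with variance v on the original coefficient corresponds to
# variance v * xMax^2 on the scaled coefficient, since betaScaled = xMax * beta.
all.equal(betaRescaled, betaRaw, tolerance = 1e-4)
```

The same reasoning would apply when scaling by the standard deviation instead of the maximum; only the constant changes.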

Compilation on Mac OS X <= 10.8

R build for OS X <= 10.8 does not support C++11 by default (as discovered by @jduke99). To enable C++11 compilation locally, add into ~/.R/Makevars:

CXX1X = clang++
CXX1XFLAGS += -stdlib=libc++

Could we rename prior() to createPrior()?

The problem is that I would like to have a parameter called 'prior' with a default value, but it conflicts with the function name.

This breaks:

f <- function(prior = prior("none")){
  print(prior$priorType)
}

f()

but this doesn't:

f <- function(myPrior = prior("none")){
  print(myPrior$priorType)
}

f()
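
The failure can be reproduced without Cyclops. In the sketch below, prior() is a stand-in defined only for this example, and createPrior is simply the name proposed in the title. R evaluates default arguments lazily inside the function's own environment, so the name prior in the default expression resolves to the argument itself rather than to the outer function:

```r
# Stand-in for the package function, defined here so the example is self-contained.
prior <- function(priorType) list(priorType = priorType)

# The default refers to `prior`, which inside f is the argument being evaluated.
f <- function(prior = prior("none")) {
  prior$priorType
}
result <- tryCatch(f(), error = function(e) conditionMessage(e))
print(result)  # an error message about a recursive default argument reference

# With a differently named function, the argument can keep the name `prior`.
createPrior <- prior
g <- function(prior = createPrior("none")) {
  prior$priorType
}
print(g())
```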

When excluding vars from regularization, reported prior variance is always 0

data <- simulateData(nstrata=1,nrows=1000,ncovars=2000,model="logistic")
cyclopsData <- convertToCyclopsData(data$outcomes,data$covariates,modelType = "lr",addIntercept = TRUE)
prior <- createPrior("laplace", useCrossValidation = TRUE)
control <- createControl(noiseLevel = "silent")
fit <- fitCyclopsModel(cyclopsData,prior=prior,control=control)
fit$variance #is always 0 because intercept is excluded from prior

More efficient support for time-varying covariates in cyclopsData for Cox models

From the release package documentation:

These columns are expected in the outcome object:
- stratumId (integer) (optional) Stratum ID for conditional regression models
- rowId (integer) Row ID is used to link multiple covariates (x) to a single outcome (y)
- y (real) The outcome variable
- time (real) For models that use time (e.g. Poisson or Cox regression) this contains time (e.g. number of days)
- weights (real) (optional) Non-negative weights to apply to outcome
- censorWeights (real) (optional) Non-negative censoring weights for competing risk model; will be computed if not provided.

These columns are expected in the covariates object:
- stratumId (integer) (optional) Stratum ID for conditional regression models
- rowId (integer) Row ID is used to link multiple covariates (x) to a single outcome (y)
- covariateId (integer) A numeric identifier of a covariate
- covariateValue (real) The value of the specified covariate

The correct way to deal with time-varying data in a Cox model is to split each individual's follow-up period into multiple intervals, one at each change in their covariate values. A time-varying dataset for Cox analysis would therefore have more than one row per person, and the data specification above would require the covariates object to have the same number of rows as the outcome object. In a Cox model with both time-varying and time-invariant variables, all of the time-invariant values would need to be repeated for every interval within each participant. A more efficient data structure would allow a time-invariant covariates object, joined to the outcome object on participant ID, along with a time-varying covariates object, linked to the outcome object on both participant ID and time.
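
A small base R sketch of the proposed layout. Every table and column name here (personId, startTime, endTime, invariantCovariates, varyingCovariates) is a hypothetical choice for illustration and not part of the Cyclops data specification. It compares how many rows the time-invariant covariates need when stored once per person with how many they need when repeated for every interval:

```r
# Hypothetical layout; not the Cyclops data specification.
outcomes <- data.frame(
  personId  = c(1, 1, 1, 2, 2),
  startTime = c(0, 30, 60, 0, 45),
  endTime   = c(30, 60, 90, 45, 80),
  y         = c(0, 0, 1, 0, 0)
)

# Time-invariant covariates: one row per person and covariate
invariantCovariates <- data.frame(
  personId       = c(1, 2),
  covariateId    = c(101, 101),
  covariateValue = c(1, 0)
)

# Time-varying covariates: one row per person, interval, and covariate
varyingCovariates <- data.frame(
  personId       = c(1, 1, 1, 2, 2),
  startTime      = c(0, 30, 60, 0, 45),
  covariateId    = 202,
  covariateValue = c(0.5, 0.7, 0.9, 0.2, 0.4)
)

# Repeating the invariant values for every interval, as a one-row-per-interval
# covariates object would require
expandedInvariant <- merge(outcomes[, c("personId", "startTime")],
                           invariantCovariates, by = "personId")

# Time-varying values join on both person and time
joinedVarying <- merge(outcomes, varyingCovariates,
                       by = c("personId", "startTime"))

nrow(invariantCovariates)  # rows stored under the proposed layout
nrow(expandedInvariant)    # rows once repeated per interval
```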

Add a function to profile likelihood

Would it be possible to add a function that returns the (log) likelihood for a vector of coefficient values? Ideally, the function would work similarly to the confint function and take the following arguments:

  • object: A fitted Cyclops model object
  • parm: A specification of which parameter requires profiling; either a number or a covariateId name
  • x: vector of values of the parameter for which we want to compute the likelihood
  • includePenalty: Logical: Include regularized covariate penalty in profile

A special case might be an unfitted model with only 1 independent variable. In that case it would be nice if it wasn't necessary to first fit the model (since an optimum may not exist).

Right now I have to use this workaround, which works but is very slow.
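
To make the request concrete, here is a rough base R sketch of a profile log likelihood for a logistic model. It is not the workaround mentioned above and does not use Cyclops. The function name profileLogLik and its arguments are assumptions made for this example. For each candidate value, the coefficient of interest is held fixed through an offset and the remaining coefficients are re-estimated with glm(), which is simple but refits the model once per value:

```r
set.seed(42)
n <- 400
x <- rnorm(n)   # covariate whose coefficient is profiled
z <- rnorm(n)   # nuisance covariate
y <- rbinom(n, size = 1, prob = plogis(0.5 * x - 0.3 * z))

# Hypothetical helper: log likelihood maximized over the other coefficients,
# with the coefficient of x fixed at each entry of `values`.
profileLogLik <- function(y, x, z, values) {
  vapply(values, function(b) {
    fit <- glm(y ~ z + offset(b * x), family = binomial())
    as.numeric(logLik(fit))
  }, numeric(1))
}

fullFit <- glm(y ~ z + x, family = binomial())
betaHat <- unname(coef(fullFit)["x"])

grid <- c(0, betaHat, 1)
ll <- profileLogLik(y, x, z, grid)
print(data.frame(value = grid, logLikelihood = ll))
```

At the maximum likelihood estimate the profile value matches the log likelihood of the full fit, which gives a quick sanity check for any implementation.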

Issue with survfit()

I'm getting an error: "Variable -2147483648 is unknown"

Debugging suggests it is caused by this call chain:
survfit() -> meanLinearPredictor() -> summary.cyclopsData() -> reduce() -> .cyclopsSum()

The same error occurs for .cyclopsSumByStratum().

Could this be an int64 issue?
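
One observation that may help narrow this down, offered as a guess rather than a diagnosis: -2147483648 is -2^31, the smallest 32-bit signed integer, which R reserves internally as NA_integer_. A covariate ID that does not fit in a 32-bit integer becomes NA when coerced, and native code that reads that NA as a plain int would see -2147483648. The ID value below is hypothetical:

```r
intMax <- .Machine$integer.max               # largest value an R integer can hold
bigId <- 2147483648                          # hypothetical covariate ID stored as a double
coerced <- suppressWarnings(as.integer(bigId))

print(intMax)
print(is.na(coerced))                        # TRUE: the ID does not survive coercion
print(-2^31 == -(intMax + 1))                # TRUE: the reported number is one below -intMax
```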

Fancy iterators

The Fancy_iterator branch provides substantial performance increases, but some range functions do not compile under gcc 4.6.3 (reproduced on TravisCI and locally).

Cross validation output is incorrect when fitting inside a function

This is a bit scary:

  sim <- simulateData(nstrata = 1, nrows = 1000, ncovars = 10, eCovarsPerRow = 1, effectSizeSd = 1,model = "logistic")
  covariates <- sim$covariates
  outcomes <- sim$outcomes

  cyclopsData <- convertToCyclopsDataObject(outcomes,covariates,modelType = "lr",addIntercept = TRUE)
  fit <- fitCyclopsModel(cyclopsData,prior = prior("laplace", useCrossValidation = TRUE),
                         control = control(lowerLimit=0.01, upperLimit=10, fold=5, noiseLevel = "quiet"))  
  fit$variance
  #This shows the real selected variance

  f <- function(){
    cyclopsData <- convertToCyclopsDataObject(outcomes,covariates,modelType = "lr",addIntercept = TRUE)
    fit <- fitCyclopsModel(cyclopsData,prior = prior("laplace", useCrossValidation = TRUE),
                           control = control(lowerLimit=0.01, upperLimit=10, fold=5, noiseLevel = "quiet"))  
    fit$variance
  }
  f()
  #This always returns 10
