Comments (8)
Certainly easy to believe; I have not personally checked CV under the Cox model (which handles observation weights slightly differently than the other models). Will look into this.
from cyclops.
Try the latest commit to master (112353a) and let me know how it goes. The CCD update steps and log likelihoods were being calculated incorrectly when data were held out and there were tied survival times.
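For anyone who wants to check the fix independently, here is a minimal base-R sketch of a weighted Cox partial log likelihood. It assumes that held-out rows enter with weight 0 and that tied event times are handled with the Breslow approximation. Both are assumptions for illustration: this is not Cyclops's C++ implementation, and `coxBreslowLogLik` is a hypothetical name. The property worth testing is that giving a row weight 0 is equivalent to deleting it, including when that row shares an event time with other rows.

```r
# Hypothetical reference implementation: weighted Breslow partial
# log likelihood for a single-stratum Cox model.
#   time   - observed times
#   status - 1 = event, 0 = censored
#   eta    - linear predictor (X %*% beta)
#   w      - observation weights; held-out rows get weight 0
coxBreslowLogLik <- function(time, status, eta, w = rep(1, length(time))) {
  stopifnot(length(status) == length(time),
            length(eta) == length(time),
            length(w) == length(time))
  ll <- 0
  for (i in which(status == 1 & w > 0)) {
    # Breslow: every row with time >= time[i], including ties, is at risk
    atRisk <- time >= time[i]
    # Zero-weight (held-out) rows contribute nothing to the risk-set sum
    denom <- sum(w[atRisk] * exp(eta[atRisk]))
    ll <- ll + w[i] * (eta[i] - log(denom))
  }
  ll
}

# Tied event times, with the third row held out
time   <- c(1, 1, 2)
status <- c(1, 1, 1)
eta    <- c(0, 0, 0)
coxBreslowLogLik(time, status, eta, w = c(1, 1, 0))  # equals -2 * log(2)
```

Comparing a fit with zero-weight rows against the same data with those rows dropped is a quick way to catch mistakes in how risk sets are accumulated across ties.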
Almost! But there's still some funky stuff. When I do:
library(Cyclops)
set.seed(100)
data <- simulateCyclopsData(nstrata = 1, nrows = 1000, ncovars = 200, model = "survival")
cyclopsData <- convertToCyclopsData(data$outcomes, data$covariates, modelType = "cox")
prior <- createPrior("laplace", useCrossValidation = TRUE)
control <- createControl(noiseLevel = "quiet", lowerLimit = 0.000001, upperLimit = 10, seed = 100)
fit <- fitCyclopsModel(cyclopsData, prior = prior, control = control)
Only the first fold reports a nonzero predictive log likelihood:
Performing 10-fold cross-validation [seed = 100]
Running at Laplace(1414.21) Grid-point #1 at 1e-006 Fold #1 Rep #1 pred log like = -4841.02
Running at Laplace(1414.21) Grid-point #1 at 1e-006 Fold #2 Rep #1 pred log like = 0
Running at Laplace(1414.21) Grid-point #1 at 1e-006 Fold #3 Rep #1 pred log like = 0
Running at Laplace(1414.21) Grid-point #1 at 1e-006 Fold #4 Rep #1 pred log like = 0
Running at Laplace(1414.21) Grid-point #1 at 1e-006 Fold #5 Rep #1 pred log like = 0
...
I believe I have pinpointed the problem. The Cox model uses the pid column to identify strata, not subjects, but the code in AbstractSelector.cpp assumes that pid values identify exchangeable sampling units. This assumption holds for all other models except conditional logistic regression. We should discuss what we mean by an exchangeable sampling unit for this model. Basically, do we assume:
- Strata are small and numerous, so that each stratum is sampled independently, or
- Strata are large and few, so that we want to sample rows independently within each stratum.
In either case, I am working on a fix for the Cox model.
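The two options can be made concrete with a small base-R sketch of fold assignment. Both functions are hypothetical illustrations, not the logic in AbstractSelector.cpp. With a single stratum (as in the `nstrata = 1` example), assigning whole strata to folds puts every row into one fold, which would be consistent with folds 2 through 10 reporting a predictive log likelihood of 0.

```r
# Hypothetical helper: a shuffled, balanced vector of n fold labels.
balancedFolds <- function(n, nFolds) {
  folds <- rep_len(seq_len(nFolds), n)
  folds[sample.int(n)]
}

# Option 1: strata are the exchangeable units, so each whole stratum
# is assigned to a single fold.
assignFoldsByStratum <- function(stratumId, nFolds, seed = 1) {
  set.seed(seed)
  strata <- unique(stratumId)
  stratumFold <- balancedFolds(length(strata), nFolds)
  stratumFold[match(stratumId, strata)]
}

# Option 2: rows are exchangeable within a stratum, so each stratum's
# rows are spread across all folds.
assignFoldsWithinStratum <- function(stratumId, nFolds, seed = 1) {
  set.seed(seed)
  fold <- integer(length(stratumId))
  for (s in unique(stratumId)) {
    idx <- which(stratumId == s)
    fold[idx] <- balancedFolds(length(idx), nFolds)
  }
  fold
}

# One stratum of 1000 rows, 10 folds
stratumId <- rep(1L, 1000)
table(assignFoldsByStratum(stratumId, nFolds = 10))      # all 1000 rows in fold 1
table(assignFoldsWithinStratum(stratumId, nFolds = 10))  # 100 rows per fold
```

Option 1 keeps each stratum's risk set intact within a fold; option 2 avoids empty folds when there are fewer strata than folds.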
@schuemie, try again with commit 1129a39. Of course, I have probably broken something else now.
This certainly works, but I now see a different issue: at least in this example, the cross-validated predictive log likelihood, as a function of the prior variance, has a local optimum far from the global one, and the auto-search lands us in the wrong spot. Any thoughts?
library(Cyclops)
set.seed(1)
data <- simulateCyclopsData(nstrata = 1, nrows = 1000, ncovars = 200, model = "survival")
cyclopsData <- convertToCyclopsData(data$outcomes, data$covariates, modelType = "cox")
prior <- createPrior("laplace", useCrossValidation = TRUE)
# Grid search finds a small optimal variance
control <- createControl(noiseLevel = "quiet", lowerLimit = 0.000001, upperLimit = 10, seed = 1)
fit <- fitCyclopsModel(cyclopsData, prior = prior, control = control, forceNewObject = TRUE)
# The auto-search starts at large values, and settles in a local optimum
control <- createControl(noiseLevel = "quiet", seed = 1, cvType = "auto")
fit <- fitCyclopsModel(cyclopsData, prior = prior, control = control, forceNewObject = TRUE)
# Forcing a different starting variance helps find the global(?) optimum
control <- createControl(noiseLevel = "quiet", seed = 1, cvType = "auto", startingVariance = 0.000001)
fit <- fitCyclopsModel(cyclopsData, prior = prior, control = control, forceNewObject = TRUE)
I noticed similar behavior with the previous seed (100) as well. The default variance estimate from Genkin et al. seems very far off the mark. Prior expert knowledge (i.e., me running cross-validation in MSCCS in the OMOP experiment) suggests variances < 1. Why not just start your searches with `startingVariance = 0.1` until we develop a better heuristic?
Will do. I'm closing this issue now. Thanks so much Marc, this is just in time for our next study!
Related Issues (20)
- (Solved) 'Rcpp_precious_remove' not provided by package 'Rcpp'
- 'NAs produced by integer overflow' when calling `confint()`
- Add test for proportionality assumption
- (Tiny) differences in optimal hyperparameter depending on number of threads
- Small differences in prediction between operating systems
- Add location parameters for coefficient specific priors
- Create new release?
- Merge fix for likelihood profile computation into main at some point?
- upper and lower bound of CI are equal, but no failure flag
- what is the meaning of "POOR_BLR_STEP"?
- as(<numLike>, "dgeMatrix") is deprecated
- remove dependency on boost-headers `BH`
- arrow_S4 branch cross-validation generating unexpected output to the console
- considerable time spent in `setPrior` for IHT and BAR
- Cox model handling of intervals?
- listOpenCLDevices returns character(0)
- Please don't forget to update package website when creating a release
- Faster likelihood profiling?
- Add option for `getCyclopsProfileLogLikelihood`
- getCyclopsProfileLogLikelihood crashes when all slope values are NA