Comments (7)
What is size, X, y, ...?
from biglasso.
Sorry, I forgot the line
size = length(y)
Not sure if this answers your question but
str(x)
Formal class 'big.matrix' [package "bigmemory"] with 1 slot
..@ address:
str(y)
logi [1:21989] FALSE TRUE FALSE FALSE FALSE FALSE ...
I mean, could you provide a reproducible example with some example data, so that we can run your code and see the error?
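library(biglasso)  # also attaches bigmemory, which provides as.big.matrix()
# 21989 x 1 matrix: the second argument to rnorm() is the mean, not a column count
x <- as.big.matrix(matrix(rnorm(n = 21989, mean = 2790), nrow = 21989))
y <- sample(c(0, 1), size = 21989, replace = TRUE)
train <- sample(c(TRUE, FALSE), length(y), replace = TRUE, prob = c(.5, .5))
fit <- cv.biglasso(X = x, y = y, row.idx = which(train), penalty = "lasso", family = "binomial", nfolds = 10)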
Error in `[<-`(`*tmp*`, cv.ind == i, 1:res$nl, value = res$loss) :
  (subscript) logical subscript too long
If we run the code step by step after using debugonce(cv.biglasso), we see a first problem: cv.ind defines folds for the whole sample instead of only for the indices of the training set.
Specifying cv.ind = sample(rep_len(1:10, sum(train))) returns another error.
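The length mismatch behind the first error can be reproduced in base R without biglasso. This is only a minimal sketch of the mechanism: the names `make_loss`, `fill_fold`, `cv_ind_full` and `cv_ind_train` are hypothetical and do not come from the package, and it says nothing about the second error mentioned above.

```r
# Base-R sketch; all names here are hypothetical, not biglasso internals.
set.seed(1)
n <- 20                                        # full sample size
train <- rep(c(TRUE, FALSE), length.out = n)   # 10 training rows
n_train <- sum(train)
nfolds <- 5
nlambda <- 3

# One row of loss per training observation, one column per lambda.
make_loss <- function() matrix(NA_real_, nrow = n_train, ncol = nlambda)

# Try to fill the rows belonging to fold 1; return "ok" or the error text.
fill_fold <- function(cv_ind) {
  E <- make_loss()
  tryCatch({
    E[cv_ind == 1, 1:nlambda] <- 0
    "ok"
  }, error = function(e) conditionMessage(e))
}

cv_ind_full  <- sample(rep_len(1:nfolds, n))        # folds over all n rows
cv_ind_train <- sample(rep_len(1:nfolds, n_train))  # folds over training rows only

res_full  <- fill_fold(cv_ind_full)   # logical index longer than nrow(E): error
res_train <- fill_fold(cv_ind_train)  # index length matches nrow(E)
print(res_full)
print(res_train)
```

A fold vector of length n cannot index a matrix that only has sum(train) rows, which is the situation the traceback points at.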
I used deepcopy to work around this bug, but it took too much memory and the server would sometimes crash. When I use deepcopy on a big.matrix, a corresponding file is created at "/dev/shm" (I tried changing the backingfile to my external HD but wasn't successful).
I found that the bigstatsr package makes it easier to deal with these problems (although I'm not sure if what I'm doing is OK). The code is now something like this:
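library(biglasso)   # also attaches bigmemory (as.big.matrix, write.big.matrix)
library(bigstatsr)  # big_read, big_copy
exthd_path <- "."
file <- file.path(exthd_path, "test.txt")
# 21989 x 1 matrix: the second argument to rnorm() is the mean, not a column count
x <- as.big.matrix(matrix(rnorm(n = 21989, mean = 2790), nrow = 21989))
write.big.matrix(x, filename = file, row.names = FALSE, col.names = TRUE, sep = " ")
rm(x); gc()
y <- sample(c(0, 1), size = 21989, replace = TRUE)
train <- sample(c(TRUE, FALSE), length(y), replace = TRUE, prob = c(.5, .5))
# Read the text file back as a file-backed matrix (column 1 only)
x <- big_read(file, select = 1)
# Copy only the training rows into a second backing file
xtrain <- big_copy(x, ind.row = which(train), backingfile = paste0(bigstatsr::sub_bk(x$bk), "-train"))
fit <- cv.biglasso(X = xtrain$bm(), y = y[train], penalty = "lasso", family = "binomial", nfolds = 10)
# Clean up the backing files
unlink(xtrain$bk); rm(xtrain)
unlink(c(x$bk, x$rds)); rm(x)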
I think you can directly use big_copy() when x is a big.matrix.