I don't think furrr is the problem here. When you split the matrix and add the row names back onto every slice, the resulting list of columns is much larger than the original matrix, because the row-name values are repeated once per column. So you end up shipping much larger pieces to the workers than necessary.
If you avoid adding the row names onto the slices, it might improve performance, but I'm not sure.
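If the row names are still needed inside the mapped function, one option is to ship unnamed columns and re-attach a single shared copy of the names inside the function. This is a minimal sketch, not something furrr does for you: the helper `top_row()` and its "largest value per column" task are hypothetical, and base R's `lapply()` stands in for `future_map()` so the example runs without a parallel backend.

```r
# Example data: a 50 x 1000 integer matrix with row and column names
dims <- c(50, 1000)
mat <- array(seq_len(prod(dims)), dim = dims)
rownames(mat) <- as.character(seq_len(nrow(mat)))
colnames(mat) <- as.character(seq_len(ncol(mat)))

# Split into column vectors WITHOUT attaching names to each slice
cols <- split(mat, rep(colnames(mat), each = nrow(mat)))

# Keep a single copy of the row names
row_ids <- rownames(mat)

# Hypothetical per-column task: re-attach the names, then return the
# largest value in the column, labelled with its row name
top_row <- function(x, row_ids) {
  names(x) <- row_ids
  x[which.max(x)]
}

# Sequential stand-in; with furrr this would be something like
# furrr::future_map(cols, top_row, row_ids = row_ids)
res <- lapply(cols, top_row, row_ids = row_ids)
```

Passing `row_ids` as an extra argument means it travels alongside the work as one object, instead of being copied into each of the 1000 slices.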
```r
library(purrr)
library(rlang)

dims <- c(50, 1000)
mat <- array(seq_len(prod(dims)), dim = dims)
colnames(mat) <- seq_len(ncol(mat))
rownames(mat) <- seq_len(nrow(mat))

# Do the splitting, attaching the row names to every column slice
data <- mat |>
  split(rep(colnames(mat), each = nrow(mat))) |>
  map(set_names, rownames(mat))

object.size(mat)
#> 267688 bytes
object.size(data)
#> 3680208 bytes

# Over 13 times larger!
unclass(object.size(data) / object.size(mat))
#> [1] 13.74812

# The problem seems to be the repeated row names, so let's try not
# adding those on
data <- mat |>
  split(rep(colnames(mat), each = nrow(mat)))

# That is more what you'd expect, since you are just sharding a matrix.
# The extra ~0.2 is the memory overhead of the list itself
unclass(object.size(data) / object.size(mat))
#> [1] 1.196199
```

Created on 2023-04-05 with reprex v2.0.2.9000
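`object.size()` reports in-memory size, while future's multisession backend has to serialize data to send it to the background R processes. As a rough cross-check, assuming transfer cost tracks serialized size, you can compare `length(serialize(x, NULL))` for the matrix and the two lists. This sketch rebuilds the reprex objects in base R, with `stats::setNames()` standing in for `purrr::set_names()`; `serialized_bytes()` is a hypothetical helper.

```r
# Rebuild the objects from the reprex (same 50 x 1000 matrix)
dims <- c(50, 1000)
mat <- array(seq_len(prod(dims)), dim = dims)
colnames(mat) <- seq_len(ncol(mat))
rownames(mat) <- seq_len(nrow(mat))
groups <- rep(colnames(mat), each = nrow(mat))

unnamed <- split(mat, groups)
# Same as the purrr version: attach the row names to every slice
named <- lapply(unnamed, stats::setNames, rownames(mat))

# Hypothetical helper: number of bytes serialize() produces for an object
serialized_bytes <- function(x) length(serialize(x, connection = NULL))

sizes <- c(
  matrix  = serialized_bytes(mat),
  unnamed = serialized_bytes(unnamed),
  named   = serialized_bytes(named)
)
print(sizes)
# Size of each representation relative to the original matrix
print(round(sizes / sizes[["matrix"]], 2))
```

If the named list also serializes to several times the size of the matrix, that points to the repeated row names rather than furrr as the source of the extra transfer cost.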