Comments (4)
Hi. I ran into a similar issue and wrote some code for it in a separate package: https://github.com/pearcemc/ClusterUtils.jl. The function `describepids` returns a dict whose keys are the lowest-numbered process on each node and whose values are lists of the processes on those nodes (by default the master node is excluded). I also have some code somewhere for ensuring the resulting arrays are named everywhere they need to be; I can fish it out tomorrow if that would help?
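For readers without ClusterUtils, the grouping that `describepids` is described as producing can be sketched with the `Distributed` standard library (Julia 1.x assumed). This is not the ClusterUtils implementation: `group_by_host` and `describe_hosts` are hypothetical names, and identifying a node by the host name from `Libc.gethostname` is an assumption that breaks if two machines report the same name.

```julia
using Distributed

# Hypothetical helper (not the ClusterUtils API): given a map pid => host name,
# group the pids by host and key each group by its lowest pid (the "representative").
function group_by_host(hostof::AbstractDict)
    groups = Dict{String,Vector{Int}}()
    for (pid, host) in hostof
        push!(get!(groups, host, Int[]), pid)
    end
    topo = Dict{Int,Vector{Int}}()
    for pids in values(groups)
        sort!(pids)
        topo[first(pids)] = pids   # representative => all pids on that host
    end
    return topo
end

# Hypothetical helper: ask each process in `pids` for its host name.
# Defaulting to workers() leaves out the master whenever workers exist,
# similar to the default described for describepids.
function describe_hosts(pids = workers())
    hostof = Dict(p => remotecall_fetch(Libc.gethostname, p) for p in pids)
    return group_by_host(hostof)
end
```

After `addprocs(...)`, `keys(describe_hosts())` would give one representative pid per machine, and each value is the list you would pass as `pids`.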
from clustermanagers.jl.
@pearcemc I couldn't get the ClusterUtils code to work, so I filed an issue on the repo.
But while I have your attention: is the premise behind `describepids` that I can use the process IDs it returns to initialize the `SharedArray`s?
Thanks for filing the issue there. Yes, that's the idea: it provides the ID of a representative process on each machine as the key to a list of the processes on that machine, and that list can be passed as the `pids` argument of a `SharedArray`. So you can create an appropriate array on each machine by iterating over the keys. The representative processes are also useful for BLAS operations, since these use implicit parallelism, and for message passing: pass messages between the representative processes only, and shared memory takes care of the rest.
EDIT: My workflow for getting a `SharedArray` on each machine looks like this:
```julia
@everywhere using ClusterUtils
T, V = (19,23)
sow(:T, T)
sow(:V, V)
Y_init = randn(T,V)
topo = describepids()
reps = collect(keys(topo))
sow(:topor, topo);
Yrefs = sow(reps, :Y, :( copy!(SharedArray(Float32, (T, V), pids=[topor[myid()]]), $Y_init) ));
for r in reps
    sow(topo[r], :Y, :(fetch($Yrefs[$r]))); #this makes every proc on machine r request the SharedArray from its rep
end
```
Don't know how useful this will be to you due to versioning, but it was painfully acquired ;) If you find another solution I'd be interested to hear about it!
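A sketch of the same per-machine setup against the Julia 1.x `SharedArrays` standard library (instead of the older `SharedArray(Float32, dims, pids=...)` form above), without ClusterUtils. `shared_per_node` is a hypothetical helper; it assumes `topo` maps each representative pid to all pids on its host, representative included, as in the dict described for `describepids`. It relies on the documented `pids` and `init` keywords of the `SharedArray` constructor; it is untested on a real multi-node cluster.

```julia
using Distributed, SharedArrays

# Hypothetical helper: for each representative `rep`, build one SharedArray
# mapped only on the processes topo[rep] (which must all share one host),
# filled with a copy of `data`.
function shared_per_node(topo::AbstractDict{Int,<:AbstractVector{Int}},
                         data::AbstractArray{T}) where {T}
    arrays = Dict{Int,SharedArray{T}}()
    for (rep, pids) in topo
        # `init` is called on every participating process; each one copies
        # only its own slice (localindices) of `data` into shared memory.
        arrays[rep] = SharedArray{T}(size(data); pids = collect(pids),
            init = S -> for i in localindices(S)
                S[i] = data[i]
            end)
    end
    # Assumption: when the caller is not in topo[rep], its copy is only a
    # handle to keep the array alive; processes on that node do the reading/writing.
    return arrays
end
```

On a real cluster, load the packages on every process first (`@everywhere using SharedArrays`). The design choice here is to call the constructor from the master, so the references that keep each array alive stay on one process rather than on the representatives.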
Too old to reproduce. Please consider the latest release of the project.
Related Issues (20)
- Extra options on SGE
- Error in `rmprocs` SGE
- Ship telnet via jll?
- addprocs(SGEManager) fails
- SGE fails in rmprocs
- Singularity images does not work with SLURM
- Error launching workers: no such file or directory
- TagBot trigger issue
- lsf_bpeek makes strong assumptions on iterator state of retry_delays
- [SlurmManager] 100 % CPU usage while waiting for the job to get created
- Better handling of SLURM job submission timing
- Handling of busy LSF deamon
- SLURM 10 nodes good, 16 nodes error
- pbs error
- LSF manager broken in Julia 1.8.1
- -o argument in addprocs_slurm leads to an error
- ClusterManagers can be run on top of dask clusters!
- Elastic auto IP address function
- Limiting number of cores per node on with LSF
- Finalizer task switch bug