Comments (9)
I do have a clarifying question: are the packages and the IP address supposed to refer to the compute node the job is running on? In other words, could this mean that the compute node, rather than the master AMI, is lacking batchtools?
from future.batchtools.
Update: I checked. batchtools was missing on the compute node and is now installed, but that didn't fix the problem. However, the listed IP doesn't match the public or private address of either the master node or the compute node. I don't understand what's going on here.
Hi. Some quick comments:

- I've updated your top comment to surround your code blocks with triple backticks (```) above and below. You want to use that for code blocks, since it's easier to read. The single backtick (`) is only used for in-line code, e.g. `somecode here` but not here. See https://guides.github.com/features/mastering-markdown/
- When you troubleshoot, especially when you want to get started, rule out as many packages as possible. In your case, you don't need to involve foreach and doFuture. Instead, use a bare-bones future. Use the first example on https://github.com/HenrikBengtsson/future.batchtools. I don't think the problems are with foreach + doFuture but it's always easier if you can simplify your example as far as possible.
- Yes, you need to have future.batchtools and friends installed on compute nodes too.
- I don't understand what you mean by IP numbers.
- If this is the first time you ever used future.batchtools and batchtools, it might be easier if you start by getting a simple batchtools `batchMap()` example going. This way you don't have to worry at all about future and future.batchtools. (https://cran.r-project.org/package=batchtools)
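As a rough illustration of the "bare-bones future" suggestion above, here is a minimal sketch with no foreach or doFuture involved. It first runs one future through batchtools on the local machine, which needs no scheduler. The commented-out last line assumes an SGE template file named `sge.tmpl` in the working directory, as used elsewhere in this thread; adjust it to your own setup.

```r
library(future)
library(future.batchtools)

# Step 1: run a single future via batchtools on the local machine.
# No scheduler is involved, so this only checks that the packages load
# and that a future can be created and resolved.
plan(batchtools_local)

f <- future({
  Sys.info()[["nodename"]]
})
node <- value(f)
print(node)

# Step 2 (assumption: "sge.tmpl" is an SGE template in the working
# directory): switch the plan and rerun the same future on the cluster.
# plan(batchtools_sge, template = "sge.tmpl")
```

If step 1 works and step 2 fails, the problem is more likely in the scheduler setup or the compute-node installation than in foreach or doFuture.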
Sorry, to clarify: the inside of the foreach loop intentionally didn't perform any computation, just a system call. This was meant to be a test of the piping. This is why I was trying to understand whether and why any of the batch* packages were even necessary on the nodes just to run a 'hostname' call. I suppose let's start at the beginning. I don't understand the traceback. Where in the process is it happening? At what stage? I will look at batchMap in parallel as well, but given that the inside of the foreach loop is completely gutted, I doubt that this should be hard to troubleshoot even in its current form. Agreed that there's a lot to the future* set of packages for a new user, even one with decades of HPC experience, as in my case.
Oops, closed the issue by accident. Thank you for the comments. I also have relatively little GitHub experience, especially with the web UI.
Regarding IP numbers: the traceback refers to a machine by IP. I don't see any such machine in my cluster, either by private or by public IP. That's what I was referring to. It's possible that the default SGE cluster configuration is somehow mangled, though the SGE compute node duly spins up, so it's unclear whether this is a real issue at all.
Ok, this simpler use case - using only the batchtools package - also doesn't work, probably because I can't find adequate documentation with a single example of usage anywhere at all. The jobs crash because the job folder in the tmp directory doesn't seem to get created prior to submission. First the code then the error message:
```r
library(batchtools)
reg = makeRegistry(file.dir = NA, seed = 42)
reg$cluster.functions = makeClusterFunctionsSGE(template = "sge.tmpl")
piApprox = function(n) {
  nums = matrix(runif(2 * n), ncol = 2)
  d = sqrt(nums[, 1]^2 + nums[, 2]^2)
  4 * mean(d <= 1)
}
piApprox(1000)
ids = batchMap(fun = piApprox, n = rep(1e3, 3))
names(getJobTable())
submitJobs(resources = list(walltime = 60, memory = 1024, ncpus = 3, chunks.as.array.jobs = TRUE))
```

and the error message from a job that ends up hanging with an Eqw status:

```
ubuntu@ip-172-31-97-248:~$ qstat -j 39 |grep error
error reason 1: 04/02/2020 21:40:28 [1000:6762]: error: can't open output file "/tmp/RtmpymOXuF/registry271e3e24ad73/logs/jobd266f15145f80da477cb855810ee5790.log": No such file or directory
```
Everything up to "logs" exists.
When you use:

```r
> reg <- makeRegistry(file.dir = NA, seed = 42)
```

your registry ends up in your local temp folder:

```r
> reg
Job Registry
  Backend  : Interactive
  File dir : /tmp/alice/RtmpkEU3OV/registry6f7933696c60
  Work dir : /home/alice/projects/SegalM_2017-FISH/article
  Jobs     : 0
  Seed     : 42
  Writeable: TRUE
```

The compute nodes have their own, independent local `/tmp/`, so a registry created there on the master is not visible to them. Avoid this by not setting `file.dir = NA`, e.g.

```r
reg <- makeRegistry(seed = 42)
```

With the default `file.dir`, the registry is created under the working directory, so make sure your working directory is accessible by all machines.
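To make the advice above concrete, here is a minimal sketch that creates the registry at an explicit path under the working directory instead of under `/tmp`. The directory name `bt-registry` is hypothetical, and the sketch assumes the working directory sits on a file system that the master and every compute node mount at the same path.

```r
library(batchtools)

# Hypothetical directory name. Assumption: the working directory is on a
# file system shared by the master and all compute nodes.
registry_dir <- file.path(getwd(), "bt-registry")

# makeRegistry() expects a path that does not exist yet.
reg <- makeRegistry(file.dir = registry_dir, seed = 42)

# The registry path should now be under the working directory, not /tmp.
print(reg$file.dir)

# The scheduler writes job logs below this folder, so it has to be
# reachable from the compute nodes at the same path.
print(dir.exists(file.path(reg$file.dir, "logs")))
```

Running `ls` on the printed path from a compute node is a quick way to confirm that the directory really is shared before submitting jobs.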
Btw, that tip resolved the issue, thank you.