Comments (8)
Hi,
Did you try looking at the error produced by the worker? I faced a similar problem, and looking at the output files in ~/.julia-htc/julia--1.e I discovered that the environment was not exported, which gave an error like this:
fatal: error thrown and no exception handler available.
Base.InitError(mod=:Pkg, error=Base.KeyError(key="HOME"))
I'm still working on how to resolve that issue, though.
from clustermanagers.jl.
Hi,
I get the same error as grero, but I am using plain Julia with Condor and running the Julia file from a bash script. At the start of the bash script I run
eval julia --version
which nicely outputs
julia version 0.5.0
However, running any julia file produces the same error as @grero.
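Since the error above is a `KeyError` for `HOME`, one possible workaround in a bash wrapper like the one described is to set `HOME` before Julia starts. The following is only a sketch under that assumption: `ensure_home` is a hypothetical helper name, the `getent` lookup assumes a Linux worker node, and the commented-out script name is illustrative.

```shell
#!/bin/sh
# Hypothetical wrapper sketch: make sure HOME is set before starting Julia.
ensure_home() {
    if [ -z "${HOME:-}" ]; then
        # Look up the current user's home directory in the passwd database.
        HOME="$(getent passwd "$(id -u)" | cut -d: -f6)"
        # Fall back to the current directory if the lookup returned nothing.
        HOME="${HOME:-$PWD}"
    fi
    export HOME
}

ensure_home
echo "HOME is $HOME"
# julia my_script.jl   # hypothetical script name; enable in a real job
```

If `HOME` is already set, the function leaves it unchanged and only exports it.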
I just submitted a patch that fixes it on my machine. All it does is add the HOME environment variable to the Condor worker submission.
Just applied the patch from @azraq27 on my local clone of the repo. It seemed to help (I also had to add the PATH environment variable), in that the worker doesn't die immediately, but now it just seems to wait indefinitely to connect to the master process. This could be a network issue, but I'm not at all sure how to debug it, since no output is produced.
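For anyone submitting jobs by hand, passing HOME and PATH to the job can be sketched with the `environment` command of an HTCondor submit description. This is not the patch itself, only an illustration: `make_submit_file`, `run_worker.sh`, and the output file names are hypothetical, and values containing spaces would need extra quoting.

```shell
#!/bin/sh
# Sketch: print an HTCondor submit description that passes HOME and PATH to the job.
# run_worker.sh is a hypothetical wrapper script that starts Julia.
make_submit_file() {
    cat <<EOF
executable  = run_worker.sh
universe    = vanilla
environment = "HOME=$HOME PATH=$PATH"
output      = worker.out
error       = worker.err
log         = worker.log
queue
EOF
}

make_submit_file
```

An alternative is `getenv = True` in the submit description, which copies the submitting shell's whole environment into the job.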
It writes STDOUT and STDERR to *.o and *.e files in the ~/.julia-htc directory. Does the STDERR give you some idea of why the process isn't starting?
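A quick way to check those files is to print every non-empty `*.e` file in that directory. A minimal sketch, where `show_worker_errors` is a hypothetical helper name and the directory defaults to the one mentioned above:

```shell
#!/bin/sh
# Sketch: print every non-empty *.e file in the worker log directory.
show_worker_errors() {
    swe_dir="${1:-$HOME/.julia-htc}"
    for swe_file in "$swe_dir"/*.e; do
        # Skip the unexpanded glob (no matches) and empty files.
        [ -s "$swe_file" ] || continue
        echo "==> $swe_file"
        cat "$swe_file"
    done
}

show_worker_errors
```

Passing a directory as the first argument overrides the default location.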
Thanks, it turns out that there was a problem with my network setup, unrelated to ClusterManagers. The patch completely solved the problem on the julia end for me.
Hi, does condor.jl support running HTCondor in a Windows environment?
Too old to reproduce.