Comments (5)
This seemed to happen exclusively on the compute optimized (c3) instances, where the memory per core is lowest...
from bazaar.
@raphaelhoffmann Now I am getting abort messages (Aborting. Fatal error: run() received nonzero return code 123 while executing!) from some of the running nodes, but I can't find anything in the logs. In previous instances of this I at least saw that the run.sh processes had been killed; now I can't find any error output passed back to me by Fabric.
Have you seen this before / any ideas?
This might be a specific error with my data or something in the XML parsing I did to it, but either way the current distribute paradigm makes it difficult to trace errors back, or to do a partial restart, when they do occur.
Even just knowing which servers aborted would be helpful. Fabric apparently does not report this by default, but I will see if it has an option for it.
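A note on the exit code, plus a rough sketch: 123 is the status GNU xargs returns when any command it invoked exited with a status between 1 and 125, so if run.sh is launched through xargs (an assumption based on this thread), the per-segment status is already lost by the time Fabric reports the abort. The sketch below runs one command per segment and records which segments failed and with what code. The name `run_segments`, the segment names, and the stand-in command are hypothetical and not part of bazaar.

```python
import subprocess
import sys


def run_segments(command, segments):
    """Run `command + [segment]` once per segment.

    Returns a dict mapping each failed segment to its exit code.
    """
    failures = {}
    for segment in segments:
        result = subprocess.run(command + [segment])
        if result.returncode != 0:
            failures[segment] = result.returncode
    return failures


if __name__ == "__main__":
    # Stand-in for run.sh (hypothetical): exits 1 only for "segment-b".
    stand_in = [
        sys.executable,
        "-c",
        "import sys; sys.exit(1 if sys.argv[1] == 'segment-b' else 0)",
    ]
    failed = run_segments(stand_in, ["segment-a", "segment-b", "segment-c"])
    for segment, code in failed.items():
        print(f"{segment} failed with exit code {code}")
```

Printing the failed segment names from the remote side would also let them show up in whatever output Fabric relays back, which is most of what is being asked for here.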
from bazaar.
I haven't experienced this particular error before, but I do remember
encountering the memory/core issue. CoreNLP just needs so much memory.
Regarding debugging: It would be great if stderr were logged on each node
and also sent back to the submission node. Not sure how it is handled now
(maybe not at all?), but this would be a great feature.
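A minimal sketch of that idea, assuming a small Python wrapper around the per-node command is acceptable: stderr is appended to a log file on the node, and the last few lines are returned so the caller can print them back to the submission node. The function name `run_and_log_stderr` and the log path are hypothetical; how bazaar currently handles stderr is not something this sketch relies on.

```python
import os
import subprocess
import sys
import tempfile


def run_and_log_stderr(command, log_path, tail_lines=20):
    """Run `command`, append its stderr to `log_path`, and return
    (exit_code, last `tail_lines` lines of stderr)."""
    result = subprocess.run(command, stderr=subprocess.PIPE, text=True)
    with open(log_path, "a") as log_file:
        log_file.write(result.stderr)
    tail = result.stderr.splitlines()[-tail_lines:]
    return result.returncode, tail


if __name__ == "__main__":
    # Stand-in command (hypothetical) that writes to stderr and fails.
    failing = [
        sys.executable,
        "-c",
        "import sys; sys.stderr.write('something went wrong\\n'); sys.exit(2)",
    ]
    log_path = os.path.join(tempfile.mkdtemp(), "node.err")
    code, tail = run_and_log_stderr(failing, log_path)
    print(f"exit code {code}; stderr kept in {log_path}")
    for line in tail:
        print(f"stderr: {line}")
```

This buffers all of stderr in memory before writing it out, which is fine for a sketch but may need streaming for long-running parse jobs.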
On Wed, Aug 5, 2015 at 9:59 AM, Alex Ratner wrote:
@raphaelhoffmann Now I am getting
abort messages (Aborting. Fatal error: run() received nonzero return code
123 while executing!) from some of the nodes running, but can't find
anything in the logs... (in previous instances of this I at least saw that
the run.sh processes had been killed- now I can't find any error output
passed back to me by fabric...) Have you seen this before / any ideas?
This might be a specific error with my data / something in the XML parsing
I did to it... but either way the current distribute paradigm seems to make
it a bit difficult to trace back errors / partial restart when they do occur
from bazaar.
@raphaelhoffmann Yeah, regarding crashes due to the high memory usage of CoreNLP: on EC2 this seems to have been a problem only on the compute-optimized instances (e.g. c3.4xlarge), which have half the memory per core of the general purpose instances (e.g. m3.2xlarge).
However, a simple grep "Killed" fab_parse.log | wc -l shows nothing this time, so something else has to be happening, and the problem is that I don't know on which nodes or for which segments. It would be enough if the abort error also reported the segment that failed.
I will look through the xargs and fab documentation to see if there's a way to do this...
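As a rough extension of the grep above, the sketch below counts failure markers per host instead of in total. It assumes the log contains lines in the shape Fabric 1.x uses when relaying remote output, "[host] out: ...", and that failures leave one of the listed markers in the output; both are assumptions about fab_parse.log, and the function name `count_markers_by_host` is hypothetical.

```python
import re
from collections import Counter

# Assumed line shape: "[host] out: <remote output>".
HOST_LINE = re.compile(r"^\[(?P<host>[^\]]+)\] out: (?P<text>.*)$")

# Markers worth checking: a SIGKILLed process, a full disk, a Java heap failure.
MARKERS = ("Killed", "No space left on device", "OutOfMemoryError")


def count_markers_by_host(lines, markers=MARKERS):
    """Return a Counter keyed by (host, marker) over all matching log lines."""
    counts = Counter()
    for line in lines:
        match = HOST_LINE.match(line.rstrip("\n"))
        if not match:
            continue  # not a relayed remote-output line
        for marker in markers:
            if marker in match.group("text"):
                counts[(match.group("host"), marker)] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines, only to show the expected input shape.
    sample = [
        "[node-1] out: run.sh: line 4: 2211 Killed",
        "[node-2] out: write error: No space left on device",
        "[node-1] out: parsed segment ok",
    ]
    for (host, marker), n in sorted(count_markers_by_host(sample).items()):
        print(f"{host}\t{marker}\t{n}")
```

Passing an open file object for fab_parse.log in place of the sample list works the same way, since the function only iterates over lines.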
from bazaar.
@raphaelhoffmann Stored the logs and looked through them a bit more; I should have caught this before. The aborts I ran into were all caused by either out-of-memory or out-of-disk-space errors. See #11.
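Since both causes are resource limits, a pre-flight check before starting work on a node might catch them earlier. The sketch below is only an illustration: the 3 GiB per-worker figure is a made-up placeholder rather than a measured CoreNLP requirement, and the function names are hypothetical. The 30 GiB with 16 or 8 cores figures correspond roughly to a c3.4xlarge and an m3.2xlarge.

```python
import os
import shutil

GIB = 1024 ** 3


def max_workers(total_memory_bytes, cores, memory_per_worker_bytes):
    """Largest worker count that fits both the core count and the memory budget."""
    return max(0, min(cores, total_memory_bytes // memory_per_worker_bytes))


def has_free_space(path, required_bytes):
    """True if the filesystem holding `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    per_worker = 3 * GIB  # placeholder, not a measured CoreNLP figure
    print("16 cores, 30 GiB:", max_workers(30 * GIB, 16, per_worker), "workers")
    print(" 8 cores, 30 GiB:", max_workers(30 * GIB, 8, per_worker), "workers")
    print("1 GiB free here:", has_free_space(os.getcwd(), 1 * GIB))
```

With the same total memory, the machine with more cores ends up memory-bound rather than core-bound, which matches the observation that only the compute-optimized instances were hitting the memory problem.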
from bazaar.