Comments (5)

ajratner avatar ajratner commented on June 12, 2024

This seemed to happen exclusively on the compute-optimized (c3) instances, where the memory per core is lowest.

from bazaar.

ajratner avatar ajratner commented on June 12, 2024

@raphaelhoffmann Now I am getting abort messages (Aborting. Fatal error: run() received nonzero return code 123 while executing!) from some of the running nodes, but I can't find anything in the logs. In previous instances of this I at least saw that the run.sh processes had been killed; now I can't find any error output passed back to me by Fabric.

Have you seen this before / any ideas?

This might be an error specific to my data, or to the XML parsing I did on it. Either way, the current distribute paradigm makes it difficult to trace errors back and to do a partial restart when they occur.

Even just knowing which servers were aborting would be helpful. Fabric apparently does not report this by default, but I will see if it has an option.

raphaelhoffmann avatar raphaelhoffmann commented on June 12, 2024

I haven't experienced this particular error before, but I do remember
encountering the memory/core issue. CoreNLP just needs so much memory.

Regarding debugging: It would be great if stderr were logged on each node
and also sent back to the submission node. Not sure how it is handled now
(maybe not at all?), but this would be a great feature.
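A minimal sketch of that idea, assuming each node runs its work through a small Python wrapper: the wrapper appends the command's stderr to a log file on the node and also returns it, so the caller can pass it back to the submission node. The function name `run_and_log` and the log path are hypothetical and not part of bazaar.

```python
import subprocess
import sys


def run_and_log(cmd, log_path):
    """Run cmd, append its stderr to log_path, and return (returncode, stderr)."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # Keep a copy on the node so the error survives even if the caller drops it.
    with open(log_path, "a") as log:
        log.write(proc.stderr)
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    # Stand-in command that fails and writes to stderr; in practice this
    # would be the real job command (for example the run.sh invocation).
    failing = [sys.executable, "-c", "import sys; sys.exit('disk full')"]
    code, err = run_and_log(failing, "stderr.log")
    print("exit code:", code)
    print("stderr:", err.strip())
```

Returning the stderr text alongside the exit code means the submission side does not have to log in to the node to find out why a task aborted.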

ajratner avatar ajratner commented on June 12, 2024

@raphaelhoffmann Yeah, regarding crashes due to the high memory usage of CoreNLP: this seems to have been a problem only in EC2 on the compute-optimized instances (e.g. c3.4xlarge), which have half the memory per core of the general-purpose instances (e.g. m3.2xlarge).

However, a simple grep "Killed" fab_parse.log | wc -l shows nothing this time, so something else has to be happening. The problem is that I don't know on which nodes or in which segments. It would really be enough if the abort error also reported the segment that failed.
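To see which nodes a marker such as "Killed" came from, the same log can be tallied per host instead of counted as a whole. This is only a sketch: it assumes each log line starts with a bracketed host prefix such as `[node1] out: ...`, which may not match the actual layout of fab_parse.log, and the function name is hypothetical.

```python
import re
from collections import Counter

# Assumed line shape: "[host] rest of line". Adjust the pattern if the
# real log uses a different prefix.
HOST_LINE = re.compile(r"^\[(?P<host>[^\]]+)\]\s+(?P<rest>.*)$")


def count_marker_by_host(lines, marker="Killed"):
    """Count lines containing marker, grouped by the host in the line prefix."""
    counts = Counter()
    for line in lines:
        match = HOST_LINE.match(line)
        if match and marker in match.group("rest"):
            counts[match.group("host")] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines for illustration only, not real log output.
    sample = [
        "[node1] out: Killed",
        "[node2] out: finished segment",
        "[node1] out: Killed",
    ]
    for host, n in count_marker_by_host(sample).items():
        print(host, n)
```

To run it on a real log, pass an open file object, for example `count_marker_by_host(open("fab_parse.log"))`.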

I'll look through the xargs and fab documentation to see if there's a way to do this.
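For context, GNU xargs exits with status 123 when any invocation of its command exits with a status between 1 and 125, so that code alone does not identify the failing input. Whether the 123 seen here actually comes from xargs is an assumption. One way to get the failing segment reported is to run the command once per segment and record each nonzero exit code. The sketch below does that with the standard library; `run_segments` and the segment names are hypothetical, and it runs segments serially rather than in parallel as xargs can.

```python
import subprocess
import sys


def run_segments(command, segments):
    """Run command once per segment; return [(segment, returncode)] for failures."""
    failures = []
    for segment in segments:
        returncode = subprocess.call(list(command) + [segment])
        if returncode != 0:
            failures.append((segment, returncode))
    return failures


if __name__ == "__main__":
    # Stand-in command: exits nonzero only for the segment named "bad".
    check = [
        sys.executable,
        "-c",
        "import sys; sys.exit(1 if sys.argv[1] == 'bad' else 0)",
    ]
    for segment, code in run_segments(check, ["seg-0", "bad", "seg-2"]):
        print("segment %s failed with exit code %d" % (segment, code))
```

With a list of failed segments in hand, a partial restart only needs to resubmit those segments instead of the whole job.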

ajratner avatar ajratner commented on June 12, 2024

@raphaelhoffmann I stored the logs and looked through them some more; I should have caught this before. The aborts I ran into were all caused by either out-of-memory or out-of-disk-space errors. See #11.
