
Comments (17)

aswinjoseroy avatar aswinjoseroy commented on May 22, 2024 3

I am getting this on Spark 1.6.0 (tested on standalone and a yarn cluster).

from sparkling-water.

nftw avatar nftw commented on May 22, 2024

I get a similar issue if I use Spark 1.2.1; it works with Spark 1.1.1.

val h2oContext = new H2OContext(sc).start()
java.lang.IllegalArgumentException: Cannot execute H2O on all Spark executors:
  numH2OWorkers = -1
  executorStatus = (0,false),(1,false),(2,false), ... [the same (executorId,false) pairs repeated for every attempt; output truncated]
    at org.apache.spark.h2o.H2OContext.start(H2OContext.scala:112)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:18)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:23)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:25)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
    at $iwC$$iwC$$iwC.<init>(<console>:31)
    at $iwC$$iwC.<init>(<console>:33)
    at $iwC.<init>(<console>:35)
    at <init>(<console>:37)
    at .<init>(<console>:41)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:828)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:873)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:785)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:628)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:636)
    at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:641)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:968)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:916)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1011)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


nftw avatar nftw commented on May 22, 2024

Spark 1.2.1 works fine with latest commit (265c1be). I was using the 0.2.10-81 release previously.


mmalohlava avatar mmalohlava commented on May 22, 2024

Perfect, thanks for trying!

Let us know how Sparkling Water works for you and whether you are missing anything!
Thank you!
michal

On 3/12/15 8:21 AM, nftw wrote:

Spark 1.2.1 works fine with latest commit (265c1be). I was using the 0.2.10-81 release previously.




lev112 avatar lev112 commented on May 22, 2024

Hi,
I'm working with spark 1.4.0 and sparkling water 1.4.3
and getting the same error:

ApplicationMaster: User class threw exception: java.lang.IllegalArgumentException: Cannot execute H2O on all Spark executors:
 Expected number of H2O workers is 12
 Detected number of Spark workers is 11
 Num of Spark executors before is 11
 Num of Spark executors after is 11

I'm running on a YARN cluster, and I suspect that because the cluster is busy, I'm getting only some of the executors I requested.
Looking at the H2OContext code, I see that the number of H2O workers must equal the number of Spark executors.

Maybe it would be better to have spark.ext.h2o.cluster.size indicate the minimal size of the cluster, to handle this situation?

I can try to do a PR if you think it is a good idea.
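For reference, the property mentioned above can already be passed at submit time. A sketch only: the jar name is illustrative, and whether spark.ext.h2o.cluster.size acts as a fixed or a minimal size is exactly what is being proposed here:

```shell
# Illustrative spark-submit invocation. spark.ext.h2o.cluster.size is the
# property discussed above; today it is matched against the exact number
# of Spark executors, which is why a busy YARN queue can fail the check.
spark-submit \
  --master yarn \
  --num-executors 12 \
  --conf spark.ext.h2o.cluster.size=12 \
  my-sparkling-water-app.jar
```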


mmalohlava avatar mmalohlava commented on May 22, 2024

Hi,

you are right!
Sometimes the YARN cluster is too busy to provide all executors at once.
Sparkling Water waits a little, but if we cannot see all executors within a limited amount of time, we currently give up.

You can try to modify H2OContext; however, the main problem is that we co-locate H2O data with Spark data. For example, if an RDD is created and one of its partitions lands on a node which H2O did not see during launch, we will not be able to create an H2O chunk (~partition) there. (A simple solution for this situation is to download the data from that node to an existing H2O worker.)

Does it make sense?


lev112 avatar lev112 commented on May 22, 2024

It does,
thanks!

Since I don't think it is possible to tell Spark to delay execution until all of the requested resources have been received,
I will just configure a high number of retries.
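A sketch of what "a high number of retries" can look like at submit time. The property name spark.ext.h2o.spreadrdd.retries is an assumption based on the internal backend's executor-discovery loop, so verify it against the H2OConf of your Sparkling Water version:

```shell
# Assumed property name (check your Sparkling Water release); raises the
# number of attempts made to discover all executors before giving up.
spark-submit \
  --conf spark.ext.h2o.spreadrdd.retries=100 \
  my-app.jar
```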


mmalohlava avatar mmalohlava commented on May 22, 2024

You are right - we provided a PR for Spark to introduce hooks into the lifecycle of executors, but it was not accepted. So right now, we do a number of retries to figure out the number of executors.


idanz avatar idanz commented on May 22, 2024

On a similar note, I'd like to ask what will happen to H2O if executors die (because of dynamic allocation or YARN preemption) - will the application fail?


mmalohlava avatar mmalohlava commented on May 22, 2024

On 9/25/15 1:27 AM, Idan Zalzberg wrote:

In a similar note, I'd like to ask what will happen to H2O if executors die (because of dynamic
allocation or yarn pre-emption) - will the application fail?

Right now, the H2O cluster will go down and you have to repeat the computation (e.g., creation of a model).

michal
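Since losing an executor takes the whole H2O cluster down, one common mitigation with this backend is to pin the executor set using standard Spark properties. A minimal sketch; note that YARN preemption can still kill executors, but Spark will not release them voluntarily:

```shell
# Standard Spark properties: disable dynamic allocation and request a
# fixed number of executors for the lifetime of the application.
spark-submit \
  --conf spark.dynamicAllocation.enabled=false \
  --num-executors 11 \
  my-app.jar
```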




jakubhava avatar jakubhava commented on May 22, 2024

This is a known technical problem; however, we created a new Sparkling Water backend to solve this issue. Please refer to the External backend documentation (https://github.com/h2oai/sparkling-water/blob/master/doc/backends.md) for more information.
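Switching backends is a configuration change. A hedged sketch: the property names are as I understand them from the linked backends.md, and my-h2o-cloud is a placeholder, so verify both against your version:

```shell
# spark.ext.h2o.backend.cluster.mode accepts "internal" (the default) or
# "external"; in external mode the H2O cluster is started separately and
# is therefore unaffected by Spark executors coming and going.
spark-submit \
  --conf spark.ext.h2o.backend.cluster.mode=external \
  --conf spark.ext.h2o.cloud.name=my-h2o-cloud \
  my-app.jar
```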


NkululekoThangelane avatar NkululekoThangelane commented on May 22, 2024

Hi @jakubhava,

Was this issue resolved? I am not sure what the problem is, and I am still getting it with Sparkling Water.
When trying to convert a Spark DataFrame to an H2OFrame I get the following:

Py4JJavaError: An error occurred while calling o245.asH2OFrame.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 192 in stage 418.0 failed 4 times, most recent failure: Lost task 192.3 in stage 418.0 : java.lang.ArrayIndexOutOfBoundsException: 65535


jakubhava avatar jakubhava commented on May 22, 2024

Hi @NkululekoThangelane,
without further information, I would suspect that in your case one Spark executor died. Have you started H2OContext? Can you please share your code?


NkululekoThangelane avatar NkululekoThangelane commented on May 22, 2024

Hi @jakubhava,
I restarted my Spark context and H2OContext and the problem simply went away.



JeremyLG avatar JeremyLG commented on May 22, 2024

@NkululekoThangelane This issue is still around.

Sometimes when I run my jar on a YARN cluster, it gives me this error. I'm running on HDP 2.6.4 with Spark 2.2 and Sparkling Water 2.2.6

I still don't know how to reproduce it.

Is there a Spark config option, or a way in my Scala code, to make the jar robust to this kind of error? For example, waiting longer for the initialisation.

Thanks


jakubhava avatar jakubhava commented on May 22, 2024

@JeremyLG please have a look at https://github.com/h2oai/sparkling-water/blob/master/doc/backends.md. In some environments this is a known issue, which can be eliminated by the external backend solution.

In the original (internal) backend, we have to face the problem of dynamic allocation and YARN preemption. In case a new executor joins the Spark cluster or an existing one disconnects, we are not able to handle this cluster change in H2O and have to stop the cluster.

