Comments (17)
I am getting this on Spark 1.6.0 (tested on standalone and on a YARN cluster).
from sparkling-water.
I get a similar issue if I use Spark 1.2.1; it works with Spark 1.1.1.
val h2oContext = new H2OContext(sc).start()
java.lang.IllegalArgumentException: Cannot execute H2O on all Spark executors:
numH2OWorkers = -1
executorStatus = (0,false),(1,false),(2,false),(0,false),(1,false),(2,false),... (the same three executor ids, all with status false, repeated many times)
at org.apache.spark.h2o.H2OContext.start(H2OContext.scala:112)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:18)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:23)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:25)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
at $iwC$$iwC$$iwC.<init>(<console>:31)
at $iwC$$iwC.<init>(<console>:33)
at $iwC.<init>(<console>:35)
at <init>(<console>:37)
at .<init>(<console>:41)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:828)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:873)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:785)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:628)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:636)
at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:641)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:968)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:916)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1011)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Spark 1.2.1 works fine with the latest commit (265c1be). I was using the 0.2.10-81 release previously.
Perfect, thanks for trying!
Let us know how Sparkling Water works for you and whether you are missing anything!
Thank you!
michal
Hi,
I'm working with Spark 1.4.0 and Sparkling Water 1.4.3, and I am getting the same error:
ApplicationMaster: User class threw exception: java.lang.IllegalArgumentException: Cannot execute H2O on all Spark executors:
Expected number of H2O workers is 12
Detected number of Spark workers is 11
Num of Spark executors before is 11
Num of Spark executors after is 11
I'm running on a YARN cluster, and I suspect that because the cluster is busy, I'm getting only part of the executors I requested.
Looking at the H2OContext code, I see that the number of H2O workers must equal the number of Spark executors.
Maybe it would be better to have spark.ext.h2o.cluster.size indicate the minimal size of the cluster, to handle this situation?
I can try to do a PR if you think it is a good idea.
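A workaround along these lines is to pin the executor count up front so YARN cannot hand the application fewer executors than H2O expects (spark.ext.h2o.cluster.size is the property mentioned above; the concrete values here are only illustrative for a 12-executor cluster):

```
# spark-defaults.conf (or --conf flags on spark-submit)
spark.executor.instances        12
spark.dynamicAllocation.enabled false
# Sparkling Water internal backend: expected H2O cloud size
spark.ext.h2o.cluster.size      12
```

With dynamic allocation disabled, the executor count stays fixed for the lifetime of the application, which is what the internal backend assumes.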
Hi,
you are right!
Sometimes the YARN cluster is too busy to provide all executors at once.
Sparkling Water will wait a little, but if we cannot see all executors within a limited amount of time, right now we give up.
You can try to modify H2OContext; however, the main problem there is that we co-locate H2O data with Spark data. For example, if an RDD is created and one of its partitions is on a new node which H2O did not see during launch, we will not be able to create an H2O chunk (~partition) there. (A simple solution for this situation is to pull the data from that node to an existing H2O worker.)
Does it make sense?
It does,
thanks!
Since I don't think it is possible to tell Spark to delay execution until all of the requested resources have been received,
I will just configure a high number of retries.
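For reference, Sparkling Water releases of that era exposed the retry count as a Spark property (the exact property name below is an assumption from the internal-backend configuration of the time; verify it against your version's docs):

```
# Number of retries used while detecting Spark executors
# (check the property name against your Sparkling Water version)
spark.ext.h2o.spreadrdd.retries 100
```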
You are right - we proposed a PR for Spark to introduce hooks into the lifecycle of executors, but it was not accepted. So right now we do a number of retries to figure out the number of executors.
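The retry approach can be sketched roughly as follows (a simplified illustration in Python, not the actual Sparkling Water code; `count_executors` stands in for whatever call reports the currently visible executors):

```python
import time

def discover_executors(count_executors, expected, retries, delay_s=0.0):
    """Poll until at least `expected` executors are visible or retries run out.

    count_executors: zero-argument callable returning the current executor count
    Returns the detected count, or None if the cluster never reached `expected`.
    """
    for _ in range(retries):
        seen = count_executors()
        if seen >= expected:
            return seen
        time.sleep(delay_s)  # back off before asking again
    return None

# Example with a stub that "discovers" executors gradually
counts = iter([3, 7, 12])
print(discover_executors(lambda: next(counts, 12), expected=12, retries=5))  # -> 12
```

If the cluster never reaches the expected size within the retry budget, the caller has to give up, which is exactly the failure mode reported above.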
On a similar note, I'd like to ask what will happen to H2O if executors die (because of dynamic allocation or YARN preemption) - will the application fail?
Right now, the H2O cluster will go down and you have to repeat the computation (e.g., creation of the model).
michal
This is a known technical problem; however, we created a new Sparkling Water backend to solve this issue. Please refer to the External backend documentation (https://github.com/h2oai/sparkling-water/blob/master/doc/backends.md) for more information.
Hi @jakubhava
Was this issue resolved?
I am not sure what the problem is.
I am still getting this problem with Sparkling Water.
When trying to convert a Spark DataFrame to an H2OFrame, I get the following:
Py4JJavaError: An error occurred while calling o245.asH2OFrame.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 192 in stage 418.0 failed 4 times, most recent failure: Lost task 192.3 in stage 418.0 : java.lang.ArrayIndexOutOfBoundsException: 65535
Hi @NkululekoThangelane,
without further information, I would suspect that in your case one Spark executor died. Have you started H2OContext? Can you please share your code?
Hi @jakubhava,
I restarted my Spark context and H2OContext and the problem simply went away.
@NkululekoThangelane This issue is still around.
Sometimes when I run my jar on a YARN cluster, it gives me this error. I'm running on HDP 2.6.4 with Spark 2.2 and Sparkling Water 2.2.6.
I still don't know how to reproduce it.
Is there a Spark config option, or a way in my Scala code, to make the jar robust to this kind of error? Like waiting longer for the initialisation, or anything else.
Thanks
@JeremyLG please have a look at https://github.com/h2oai/sparkling-water/blob/master/doc/backends.md. In some environments this is a known issue which can be eliminated by the external backend solution.
In the original (internal) backend, we have to face the problem of dynamic allocation and YARN preemption. In case a new executor joins the Spark cluster or an existing one disconnects, we are not able to handle this cluster change in H2O and have to stop the cluster.
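As a configuration sketch, switching to the external backend is done through Spark properties (the mode property is described in the linked backends document; note that in this mode the H2O cluster runs outside the Spark executors and must be provisioned separately):

```
# Run H2O in a separate cluster, decoupled from Spark executors,
# so executor churn no longer kills the H2O cloud
spark.ext.h2o.backend.cluster.mode external
```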