Comments (8)
Hi @gurumoorthy208524,
Can you share the logs from the pods of the H2O cluster (external backend)?
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by ai.h2o.xgboost4j.java.NativeLibLoader (file:/opt/h2oai/h2o-3/h2o.jar) to field java.lang.ClassLoader.usr_paths
WARNING: Please consider reporting this to the maintainers of ai.h2o.xgboost4j.java.NativeLibLoader
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
16:20:18.897 [main] WARN hex.tree.xgboost.util.NativeLibrary - Cannot load library from path lib/linux_64/libxgboost4j_gpu.so
16:20:18.903 [main] WARN hex.tree.xgboost.util.NativeLibrary - Failed to load library from both native path and jar!
16:20:18.904 [main] INFO hex.tree.xgboost.util.NativeLibraryLoaderChain - Cannot load library: xgboost4j_gpu (lib/linux_64/libxgboost4j_gpu.so)
16:20:18.956 [main] INFO hex.tree.xgboost.util.NativeLibrary - Loaded library from lib/linux_64/libxgboost4j_minimal.so (/tmp/libxgboost4j_minimal3709312247772050036.so)
16:20:19.175 [main] INFO water.k8s.H2OCluster - Starting Kubernetes-related REST API services
16:20:19.197 [main] INFO water.k8s.H2OCluster - Kubernetes REST API services successfully started.
16:20:19.197 [main] INFO water.k8s.H2OCluster - Initializing H2O Kubernetes cluster
16:20:19.198 [main] INFO water.k8s.H2OCluster - Timeout contraint: 180 seconds.
16:20:19.199 [main] INFO water.k8s.H2OCluster - Cluster size constraint: 3 nodes.
16:20:19.230 [main] INFO water.k8s.lookup.KubernetesDnsLookup - Timeout for node discovery is set to 180 seconds.
16:20:19.230 [main] INFO water.k8s.lookup.KubernetesDnsLookup - Desired cluster size is set to 3 nodes.
16:20:19.252 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:20.254 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:21.257 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:22.259 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:23.260 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:24.262 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:25.264 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:26.266 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:27.269 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:28.271 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:29.273 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:30.275 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:31.277 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:32.279 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:33.281 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:34.283 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:35.284 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:36.286 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:37.288 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:38.290 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:39.292 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:40.294 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:41.295 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:42.297 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:43.299 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:44.301 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:45.303 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:46.305 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:47.307 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:48.309 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:49.310 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:50.312 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:51.314 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:52.316 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:53.318 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:54.320 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:55.322 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:56.324 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:57.326 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:58.328 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:59.330 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:21:00.332 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:21:01.334 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:21:02.336 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:21:03.338 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:21:04.339 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:21:05.341 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:21:06.343 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:21:07.345 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:21:08.347 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:21:09.349 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:21:10.351 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:23:13.578 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:23:14.580 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:23:15.582 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:23:16.584 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:23:17.586 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:23:18.588 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:23:19.589 [main] INFO water.k8s.H2OCluster - Using the following pods to form H2O cluster: []
2022-03-21 16:23:19.868:INFO::main: Logging initialized @183284ms to org.eclipse.jetty.util.log.StdErrLog
03-21 16:23:20.345 X.X.X.X:54321 1 main INFO water.default: Dynamically loaded 'water.k8s.KubernetesEmbeddedConfigProvider' as AbstractEmbeddedH2OConfigProvider.
03-21 16:23:20.346 X.X.X.X:54321 1 main INFO water.default: ----- H2O started -----
03-21 16:23:20.346 X.X.X.X:54321 1 main INFO water.default: Build git branch: rel-zorn
03-21 16:23:20.347 X.X.X.X:54321 1 main INFO water.default: Build git hash: 717d8bf831d5d6b0decda9c37a2a20de9a491754
03-21 16:23:20.347 X.X.X.X:54321 1 main INFO water.default: Build git describe: jenkins-3.36.0.2-53-g717d8bf
03-21 16:23:20.347 X.X.X.X:54321 1 main INFO water.default: Build project version: 3.36.0.3
03-21 16:23:20.348 X.X.X.X:54321 1 main INFO water.default: Build age: 1 month and 4 days
03-21 16:23:20.348 X.X.X.X:54321 1 main INFO water.default: Built by: 'jenkins'
03-21 16:23:20.348 X.X.X.X:54321 1 main INFO water.default: Built on: '2022-02-16 17:51:32'
03-21 16:23:20.349 X.X.X.X:54321 1 main INFO water.default: Found H2O Core extensions: [StackTraceCollector, XGBoost, KrbStandalone, Infogram]
03-21 16:23:20.349 X.X.X.X:54321 1 main INFO water.default: Processed H2O arguments: []
03-21 16:23:20.350 X.X.X.X:54321 1 main INFO water.default: Java availableProcessors: 1
03-21 16:23:20.350 X.X.X.X:54321 1 main INFO water.default: Java heap totalMemory: 889.4 MB
03-21 16:23:20.351 X.X.X.X:54321 1 main INFO water.default: Java heap maxMemory: 27.73 GB
03-21 16:23:20.351 X.X.X.X:54321 1 main INFO water.default: Java version: Java 11.0.14 (from Red Hat, Inc.)
03-21 16:23:20.351 X.X.X.X:54321 1 main INFO water.default: JVM launch parameters: [-XX:+UseContainerSupport, -XX:MaxRAMPercentage=50]
03-21 16:23:20.352 X.X.X.X:54321 1 main INFO water.default: JVM process id: 1@h2o-stateful-set-0
03-21 16:23:20.352 X.X.X.X:54321 1 main INFO water.default: OS version: Linux 5.4.144+ (amd64)
03-21 16:23:20.352 X.X.X.X:54321 1 main INFO water.default: Machine physical memory: 57.38 GB
03-21 16:23:20.353 X.X.X.X:54321 1 main INFO water.default: Machine locale: en_US
03-21 16:23:20.353 X.X.X.X:54321 1 main INFO water.default: X-h2o-cluster-id: 1647879616788
03-21 16:23:20.353 X.X.X.X:54321 1 main INFO water.default: User name: 'root'
03-21 16:23:20.354 X.X.X.X:54321 1 main INFO water.default: IPv6 stack selected: false
03-21 16:23:20.354 X.X.X.X:54321 1 main INFO water.default: Possible IP Address: eth0 (eth0), X.X.X.X
03-21 16:23:20.354 X.X.X.X:54321 1 main INFO water.default: Possible IP Address: lo (lo), 127.0.0.1
03-21 16:23:20.354 X.X.X.X:54321 1 main INFO water.default: H2O node running in unencrypted mode.
03-21 16:23:20.357 X.X.X.X:54321 1 main INFO water.default: Internal communication uses port: 54322
03-21 16:23:20.357 X.X.X.X:54321 1 main INFO water.default: Listening for HTTP and REST traffic on http://X.X.X.X:54321/
03-21 16:23:20.358 X.X.X.X:54321 1 main WARN water.default: Flatfile configuration does not include self: /X.X.X.X:54321, but contains []
03-21 16:23:20.359 X.X.X.X:54321 1 main INFO water.default: H2O cloud name: 'root' on /X.X.X.X:54321, static configuration based on -flatfile null
03-21 16:23:20.359 X.X.X.X:54321 1 main INFO water.default: If you have trouble connecting, try SSH tunneling from your local machine (e.g., via port 55555):
03-21 16:23:20.360 X.X.X.X:54321 1 main INFO water.default: 1. Open a terminal and run 'ssh -L 55555:localhost:54321 [email protected]'
03-21 16:23:20.360 X.X.X.X:54321 1 main INFO water.default: 2. Point your browser to http://localhost:55555
03-21 16:23:21.293 X.X.X.X:54321 1 main INFO water.default: Kerberos not configured
03-21 16:23:21.293 X.X.X.X:54321 1 main INFO water.default: Log dir: '/tmp/h2o-root/h2ologs'
03-21 16:23:21.293 X.X.X.X:54321 1 main INFO water.default: Cur dir: '/'
03-21 16:23:21.304 X.X.X.X:54321 1 main INFO water.default: Subsystem for distributed import from HTTP/HTTPS successfully initialized
03-21 16:23:21.305 X.X.X.X:54321 1 main INFO water.default: HDFS subsystem successfully initialized
03-21 16:23:21.309 X.X.X.X:54321 1 main INFO water.default: S3 subsystem successfully initialized
03-21 16:23:21.333 X.X.X.X:54321 1 main INFO water.default: GCS subsystem successfully initialized
03-21 16:23:21.333 X.X.X.X:54321 1 main INFO water.default: Flow dir: '/root/h2oflows'
03-21 16:23:21.351 X.X.X.X:54321 1 main INFO water.default: Cloud of size 1 formed [h2o-stateful-set-0.h2o-service.sparkling-water.svc.cluster.local/X.X.X.X:54321]
03-21 16:23:21.352 X.X.X.X:54321 1 main INFO water.default: Created cluster of size 1, leader node IP is 'h2o-stateful-set-0.h2o-service.sparkling-water.svc.cluster.local/X.X.X.X'
03-21 16:23:21.371 X.X.X.X:54321 1 main INFO water.default: Registered parsers: [GUESS, ARFF, XLS, SVMLight, AVRO, PARQUET, CSV]
03-21 16:23:21.373 X.X.X.X:54321 1 main INFO water.default: StackTraceCollector extension initialized
03-21 16:23:21.374 X.X.X.X:54321 1 main INFO water.default: XGBoost extension initialized
03-21 16:23:21.375 X.X.X.X:54321 1 main INFO water.default: KrbStandalone extension initialized
03-21 16:23:21.376 X.X.X.X:54321 1 main INFO water.default: Infogram extension initialized
03-21 16:23:21.376 X.X.X.X:54321 1 main INFO water.default: Registered 4 core extensions in: 2077ms
03-21 16:23:21.377 X.X.X.X:54321 1 main INFO water.default: Registered H2O core extensions: [StackTraceCollector, XGBoost, KrbStandalone, Infogram]
03-21 16:23:21.378 X.X.X.X:54321 1 main INFO water.default: Registered: 1 auth extensions in: 180633ms
03-21 16:23:21.378 X.X.X.X:54321 1 main INFO water.default: Registered Auth extensions: [water.server.LeaderNodeRequestFilter]
03-21 16:23:21.677 X.X.X.X:54321 1 main INFO hex.tree.xgboost.XGBoostExtension: Found XGBoost backend with library: xgboost4j_minimal
03-21 16:23:21.678 X.X.X.X:54321 1 main WARN hex.tree.xgboost.XGBoostExtension: Your system supports only minimal version of XGBoost (no GPUs, no multithreading)!
03-21 16:23:21.880 X.X.X.X:54321 1 main INFO water.default: Registered: 275 REST APIs in: 502ms
03-21 16:23:21.881 X.X.X.X:54321 1 main INFO water.default: Registered REST API extensions: [Amazon S3, XGBoost, Algos, Sparkling Water REST API Extensions, Infogram, AutoML, Core V3, TargetEncoder, Core V4]
03-21 16:23:22.034 X.X.X.X:54321 1 main INFO water.default: Registered: 330 schemas in 152ms
03-21 16:23:22.035 X.X.X.X:54321 1 main INFO water.default: H2O started in 185238ms
03-21 16:23:22.035 X.X.X.X:54321 1 main INFO water.default:
03-21 16:23:22.035 X.X.X.X:54321 1 main INFO water.default: Open H2O Flow in your web browser: http://X.X.X.X:54321
03-21 16:23:22.035 X.X.X.X:54321 1 main INFO water.default:
@mn-mikke
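A minimal sketch for reading a log like the one above: it counts the failed DNS lookups and extracts the desired cluster size, the discovered pod list, and the size of the cloud that was finally formed. The helper name `summarize_h2o_log` is hypothetical, and the patterns are copied only from the log lines quoted in this thread, so other H2O versions may word these messages differently.

```python
import re

# Patterns taken from the log lines quoted in this thread (assumption:
# other H2O versions may phrase these messages differently).
EXPECTED_RE = re.compile(r"Desired cluster size is set to (\d+) nodes")
PODS_RE = re.compile(r"Using the following pods to form H2O cluster: \[(.*)\]")
CLOUD_RE = re.compile(r"Cloud of size (\d+) formed")


def summarize_h2o_log(lines):
    """Collect node-discovery facts from H2O pod log lines (hypothetical helper)."""
    summary = {
        "expected": None,         # desired cluster size announced at startup
        "dns_failures": 0,        # number of "DNS name not found" warnings
        "discovered_pods": None,  # pods found by the Kubernetes lookup
        "cloud_size": None,       # size of the cloud that was actually formed
    }
    for line in lines:
        if "DNS name not found" in line:
            summary["dns_failures"] += 1
        match = EXPECTED_RE.search(line)
        if match:
            summary["expected"] = int(match.group(1))
        match = PODS_RE.search(line)
        if match:
            inner = match.group(1).strip()
            summary["discovered_pods"] = (
                [pod.strip() for pod in inner.split(",")] if inner else []
            )
        match = CLOUD_RE.search(line)
        if match:
            summary["cloud_size"] = int(match.group(1))
    return summary


# A few lines copied from the log above.
sample = [
    "16:20:19.230 [main] INFO water.k8s.lookup.KubernetesDnsLookup - Desired cluster size is set to 3 nodes.",
    "16:20:19.252 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]",
    "16:23:19.589 [main] INFO water.k8s.H2OCluster - Using the following pods to form H2O cluster: []",
    "03-21 16:23:21.351 X.X.X.X:54321 1 main INFO water.default: Cloud of size 1 formed [h2o-stateful-set-0.h2o-service.sparkling-water.svc.cluster.local/X.X.X.X:54321]",
]
print(summarize_h2o_log(sample))
```

A summary where `cloud_size` is smaller than `expected` and `discovered_pods` is empty matches what this log shows: the lookup timed out and the pod started as a single-node cluster.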
I don't see the H2O nodes communicating with each other. How did you deploy the H2O-3 cluster?
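One way to check discovery is to compare the DNS name the pods look up with the names Kubernetes actually publishes. The sketch below builds those names using the standard StatefulSet convention (`<pod>.<service>.<namespace>.svc.<cluster domain>`), which matches the hostname printed in the log (`h2o-stateful-set-0.h2o-service.sparkling-water.svc.cluster.local`). The function names are hypothetical, the default cluster domain `cluster.local` is an assumption, and `resolve` is only meaningful when run from a pod inside the cluster.

```python
import socket


def service_dns_name(service, namespace, cluster_domain="cluster.local"):
    """DNS name of a (headless) service; the default cluster domain is an assumption."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def pod_dns_names(stateful_set, service, namespace, replicas,
                  cluster_domain="cluster.local"):
    """Per-pod DNS names that a StatefulSet gets behind its governing service."""
    suffix = service_dns_name(service, namespace, cluster_domain)
    return [f"{stateful_set}-{i}.{suffix}" for i in range(replicas)]


def resolve(name):
    """Return the sorted IPs a name resolves to, or [] when the lookup fails."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


# Values taken from the log in this thread: namespace "sparkling-water",
# service "h2o-service", stateful set "h2o-stateful-set", 3 expected nodes.
print(service_dns_name("h2o-service", "sparkling-water"))
for name in pod_dns_names("h2o-stateful-set", "h2o-service", "sparkling-water", 3):
    print(name)
```

Calling `resolve(service_dns_name("h2o-service", "sparkling-water"))` from inside the cluster should return one address per ready pod if the service name and namespace are the ones the pods are configured to look up; an empty list would be consistent with the repeated "DNS name not found" warnings.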
I followed the steps in this doc, with one exception: I used the image h2oai/sparkling-water-external-backend:3.36.0.3-1-3.2 to deploy the external backend.
Was the H2O-3 cluster deployed with the configuration specified in the documentation, or did you use the configuration I recommended here?
As per the documentation @mn-mikke
@mn-mikke With your configuration I can see N nodes in the cluster. Thanks for the help. One last question: even though I pass h2o-cluster-1 as the value of spark.ext.h2o.cloud.name, the H2O cluster name is shown as root. Am I passing something incorrectly?
H2O_cluster_uptime: 9 mins 54 secs
H2O_cluster_timezone: Etc/GMT
H2O_data_parsing_timezone: UTC
H2O_cluster_version: 3.36.0.3
H2O_cluster_version_age: 1 month and 14 days
H2O_cluster_name: root
H2O_cluster_total_nodes: 2
H2O_cluster_free_memory: 55.42 Gb
H2O_cluster_total_cores: 2
H2O_cluster_allowed_cores: 2
H2O_cluster_status: locked, healthy
H2O_connection_url: http://inith2o-py-6af5037fdfdba01e-driver-svc.spark.svc:54321
H2O_connection_proxy: null
H2O_internal_security: False
Python_version: 3.9.2 final
Hi @gurumoorthy208524,
The Sparkling Water parameter spark.ext.h2o.cloud.name is used only when Sparkling Water itself creates the H2O cluster (internal backend, or an automatically started external backend). Since you deploy the H2O cluster manually, you need to specify the cluster name in the YAML file that defines the cluster's stateful set.
By default, the SW external backend image executes the following command:
java -cp /opt/h2oai/h2o-3/sparkling-water-assembly-extensions_${scalaBaseVersion}-${version}-all.jar:/opt/h2oai/h2o-3/h2o.jar -XX:+UseContainerSupport -XX:MaxRAMPercentage=50 water.H2OApp
You can set the name of the H2O cluster by adding the extra parameter -name to this command.
The YAML file of the stateful set will then look like this:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: h2o-stateful-set
  namespace: default
spec:
  serviceName: h2o-service
  replicas: 2
  selector:
    matchLabels:
      app: h2o-k8s
  template:
    metadata:
      labels:
        app: h2o-k8s
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: h2o-k8s
          image: 'h2oai/sparkling-water-external-backend:3.36.0.3-1-3.1'
          resources:
            requests:
              memory: "4Gi"
          ports:
            - containerPort: 54321
              protocol: TCP
          readinessProbe:
            httpGet:
              path: /kubernetes/isLeaderNode
              port: 8081
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 1
          env:
            - name: H2O_KUBERNETES_SERVICE_DNS
              value: h2o-service.default.svc.cluster.local
            - name: H2O_NODE_LOOKUP_TIMEOUT
              value: '180'
            - name: H2O_NODE_EXPECTED_COUNT
              value: '2'
            - name: H2O_KUBERNETES_API_PORT
              value: '8081'
          command: ["java"]
          args: ["-cp", "/opt/h2oai/h2o-3/sparkling-water-assembly-extensions_2.12-3.36.0.3-1-3.1-all.jar:/opt/h2oai/h2o-3/h2o.jar", "-XX:+UseContainerSupport", "-XX:MaxRAMPercentage=50", "water.H2OApp", "-name", "myh2ocluster"]
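The args list in the YAML is the default image command with the two placeholders filled in and -name appended. As a rough sketch, the list can be generated so the jar path stays in sync with the image tag; the helper name `h2o_container_args` and its parameters are hypothetical, and the path template is simply the one quoted in the default command.

```python
import json


def h2o_container_args(scala_base_version, sw_version, cluster_name=None,
                       max_ram_percentage=50):
    """Build the container args for water.H2OApp (hypothetical helper).

    The classpath template is copied from the default command quoted above;
    "-name" is appended only when a cluster name is given.
    """
    classpath = (
        "/opt/h2oai/h2o-3/sparkling-water-assembly-extensions_"
        f"{scala_base_version}-{sw_version}-all.jar:/opt/h2oai/h2o-3/h2o.jar"
    )
    args = [
        "-cp", classpath,
        "-XX:+UseContainerSupport",
        f"-XX:MaxRAMPercentage={max_ram_percentage}",
        "water.H2OApp",
    ]
    if cluster_name:
        args += ["-name", cluster_name]
    return args


# JSON lists are valid YAML flow sequences, so the output can be pasted
# after "args:" in the stateful set definition.
print(json.dumps(h2o_container_args("2.12", "3.36.0.3-1-3.1", "myh2ocluster")))
```

The version string passed as `sw_version` has to match the tag of the image in use (for example 3.36.0.3-1-3.1 in the YAML above), otherwise the jar path in the classpath will not exist inside the container.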