Comments (8)

mn-mikke commented on June 15, 2024

Hi @gurumoorthy208524,
Can you share the logs from the pods of the H2O cluster (external backend)?

from sparkling-water.

KunfuPanda24 commented on June 15, 2024

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by ai.h2o.xgboost4j.java.NativeLibLoader (file:/opt/h2oai/h2o-3/h2o.jar) to field java.lang.ClassLoader.usr_paths
WARNING: Please consider reporting this to the maintainers of ai.h2o.xgboost4j.java.NativeLibLoader
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
16:20:18.897 [main] WARN hex.tree.xgboost.util.NativeLibrary - Cannot load library from path lib/linux_64/libxgboost4j_gpu.so
16:20:18.903 [main] WARN hex.tree.xgboost.util.NativeLibrary - Failed to load library from both native path and jar!
16:20:18.904 [main] INFO hex.tree.xgboost.util.NativeLibraryLoaderChain - Cannot load library: xgboost4j_gpu (lib/linux_64/libxgboost4j_gpu.so)
16:20:18.956 [main] INFO hex.tree.xgboost.util.NativeLibrary - Loaded library from lib/linux_64/libxgboost4j_minimal.so (/tmp/libxgboost4j_minimal3709312247772050036.so)
16:20:19.175 [main] INFO water.k8s.H2OCluster - Starting Kubernetes-related REST API services
16:20:19.197 [main] INFO water.k8s.H2OCluster - Kubernetes REST API services successfully started.
16:20:19.197 [main] INFO water.k8s.H2OCluster - Initializing H2O Kubernetes cluster
16:20:19.198 [main] INFO water.k8s.H2OCluster - Timeout contraint: 180 seconds.
16:20:19.199 [main] INFO water.k8s.H2OCluster - Cluster size constraint: 3 nodes.
16:20:19.230 [main] INFO water.k8s.lookup.KubernetesDnsLookup - Timeout for node discovery is set to 180 seconds.
16:20:19.230 [main] INFO water.k8s.lookup.KubernetesDnsLookup - Desired cluster size is set to 3 nodes.
16:20:19.252 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:20.254 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:21.257 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:22.259 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:23.260 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:24.262 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:25.264 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:26.266 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:27.269 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:28.271 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:29.273 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:30.275 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:31.277 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:32.279 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:33.281 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:34.283 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:35.284 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:36.286 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:37.288 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:38.290 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:39.292 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:40.294 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:41.295 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:42.297 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:43.299 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:44.301 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:45.303 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:46.305 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:47.307 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:48.309 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:49.310 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:50.312 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:51.314 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:52.316 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:53.318 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:54.320 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:55.322 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:56.324 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:57.326 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:58.328 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:20:59.330 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:21:00.332 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:21:01.334 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:21:02.336 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:21:03.338 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:21:04.339 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:21:05.341 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:21:06.343 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:21:07.345 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:21:08.347 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:21:09.349 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:21:10.351 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:23:13.578 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:23:14.580 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:23:15.582 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:23:16.584 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:23:17.586 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:23:18.588 [main] WARN water.k8s.lookup.KubernetesDnsLookup - DNS name not found [response code 3]
16:23:19.589 [main] INFO water.k8s.H2OCluster - Using the following pods to form H2O cluster: []
2022-03-21 16:23:19.868:INFO::main: Logging initialized @183284ms to org.eclipse.jetty.util.log.StdErrLog
03-21 16:23:20.345 X.X.X.X:54321 1 main INFO water.default: Dynamically loaded 'water.k8s.KubernetesEmbeddedConfigProvider' as AbstractEmbeddedH2OConfigProvider.
03-21 16:23:20.346 X.X.X.X:54321 1 main INFO water.default: ----- H2O started -----
03-21 16:23:20.346 X.X.X.X:54321 1 main INFO water.default: Build git branch: rel-zorn
03-21 16:23:20.347 X.X.X.X:54321 1 main INFO water.default: Build git hash: 717d8bf831d5d6b0decda9c37a2a20de9a491754
03-21 16:23:20.347 X.X.X.X:54321 1 main INFO water.default: Build git describe: jenkins-3.36.0.2-53-g717d8bf
03-21 16:23:20.347 X.X.X.X:54321 1 main INFO water.default: Build project version: 3.36.0.3
03-21 16:23:20.348 X.X.X.X:54321 1 main INFO water.default: Build age: 1 month and 4 days
03-21 16:23:20.348 X.X.X.X:54321 1 main INFO water.default: Built by: 'jenkins'
03-21 16:23:20.348 X.X.X.X:54321 1 main INFO water.default: Built on: '2022-02-16 17:51:32'
03-21 16:23:20.349 X.X.X.X:54321 1 main INFO water.default: Found H2O Core extensions: [StackTraceCollector, XGBoost, KrbStandalone, Infogram]
03-21 16:23:20.349 X.X.X.X:54321 1 main INFO water.default: Processed H2O arguments: []
03-21 16:23:20.350 X.X.X.X:54321 1 main INFO water.default: Java availableProcessors: 1
03-21 16:23:20.350 X.X.X.X:54321 1 main INFO water.default: Java heap totalMemory: 889.4 MB
03-21 16:23:20.351 X.X.X.X:54321 1 main INFO water.default: Java heap maxMemory: 27.73 GB
03-21 16:23:20.351 X.X.X.X:54321 1 main INFO water.default: Java version: Java 11.0.14 (from Red Hat, Inc.)
03-21 16:23:20.351 X.X.X.X:54321 1 main INFO water.default: JVM launch parameters: [-XX:+UseContainerSupport, -XX:MaxRAMPercentage=50]
03-21 16:23:20.352 X.X.X.X:54321 1 main INFO water.default: JVM process id: 1@h2o-stateful-set-0
03-21 16:23:20.352 X.X.X.X:54321 1 main INFO water.default: OS version: Linux 5.4.144+ (amd64)
03-21 16:23:20.352 X.X.X.X:54321 1 main INFO water.default: Machine physical memory: 57.38 GB
03-21 16:23:20.353 X.X.X.X:54321 1 main INFO water.default: Machine locale: en_US
03-21 16:23:20.353 X.X.X.X:54321 1 main INFO water.default: X-h2o-cluster-id: 1647879616788
03-21 16:23:20.353 X.X.X.X:54321 1 main INFO water.default: User name: 'root'
03-21 16:23:20.354 X.X.X.X:54321 1 main INFO water.default: IPv6 stack selected: false
03-21 16:23:20.354 X.X.X.X:54321 1 main INFO water.default: Possible IP Address: eth0 (eth0), X.X.X.X
03-21 16:23:20.354 X.X.X.X:54321 1 main INFO water.default: Possible IP Address: lo (lo), 127.0.0.1
03-21 16:23:20.354 X.X.X.X:54321 1 main INFO water.default: H2O node running in unencrypted mode.
03-21 16:23:20.357 X.X.X.X:54321 1 main INFO water.default: Internal communication uses port: 54322
03-21 16:23:20.357 X.X.X.X:54321 1 main INFO water.default: Listening for HTTP and REST traffic on http://X.X.X.X:54321/
03-21 16:23:20.358 X.X.X.X:54321 1 main WARN water.default: Flatfile configuration does not include self: /X.X.X.X:54321, but contains []
03-21 16:23:20.359 X.X.X.X:54321 1 main INFO water.default: H2O cloud name: 'root' on /X.X.X.X:54321, static configuration based on -flatfile null
03-21 16:23:20.359 X.X.X.X:54321 1 main INFO water.default: If you have trouble connecting, try SSH tunneling from your local machine (e.g., via port 55555):
03-21 16:23:20.360 X.X.X.X:54321 1 main INFO water.default: 1. Open a terminal and run 'ssh -L 55555:localhost:54321 [email protected]'
03-21 16:23:20.360 X.X.X.X:54321 1 main INFO water.default: 2. Point your browser to http://localhost:55555
03-21 16:23:21.293 X.X.X.X:54321 1 main INFO water.default: Kerberos not configured
03-21 16:23:21.293 X.X.X.X:54321 1 main INFO water.default: Log dir: '/tmp/h2o-root/h2ologs'
03-21 16:23:21.293 X.X.X.X:54321 1 main INFO water.default: Cur dir: '/'
03-21 16:23:21.304 X.X.X.X:54321 1 main INFO water.default: Subsystem for distributed import from HTTP/HTTPS successfully initialized
03-21 16:23:21.305 X.X.X.X:54321 1 main INFO water.default: HDFS subsystem successfully initialized
03-21 16:23:21.309 X.X.X.X:54321 1 main INFO water.default: S3 subsystem successfully initialized
03-21 16:23:21.333 X.X.X.X:54321 1 main INFO water.default: GCS subsystem successfully initialized
03-21 16:23:21.333 X.X.X.X:54321 1 main INFO water.default: Flow dir: '/root/h2oflows'
03-21 16:23:21.351 X.X.X.X:54321 1 main INFO water.default: Cloud of size 1 formed [h2o-stateful-set-0.h2o-service.sparkling-water.svc.cluster.local/X.X.X.X:54321]
03-21 16:23:21.352 X.X.X.X:54321 1 main INFO water.default: Created cluster of size 1, leader node IP is 'h2o-stateful-set-0.h2o-service.sparkling-water.svc.cluster.local/X.X.X.X'
03-21 16:23:21.371 X.X.X.X:54321 1 main INFO water.default: Registered parsers: [GUESS, ARFF, XLS, SVMLight, AVRO, PARQUET, CSV]
03-21 16:23:21.373 X.X.X.X:54321 1 main INFO water.default: StackTraceCollector extension initialized
03-21 16:23:21.374 X.X.X.X:54321 1 main INFO water.default: XGBoost extension initialized
03-21 16:23:21.375 X.X.X.X:54321 1 main INFO water.default: KrbStandalone extension initialized
03-21 16:23:21.376 X.X.X.X:54321 1 main INFO water.default: Infogram extension initialized
03-21 16:23:21.376 X.X.X.X:54321 1 main INFO water.default: Registered 4 core extensions in: 2077ms
03-21 16:23:21.377 X.X.X.X:54321 1 main INFO water.default: Registered H2O core extensions: [StackTraceCollector, XGBoost, KrbStandalone, Infogram]
03-21 16:23:21.378 X.X.X.X:54321 1 main INFO water.default: Registered: 1 auth extensions in: 180633ms
03-21 16:23:21.378 X.X.X.X:54321 1 main INFO water.default: Registered Auth extensions: [water.server.LeaderNodeRequestFilter]
03-21 16:23:21.677 X.X.X.X:54321 1 main INFO hex.tree.xgboost.XGBoostExtension: Found XGBoost backend with library: xgboost4j_minimal
03-21 16:23:21.678 X.X.X.X:54321 1 main WARN hex.tree.xgboost.XGBoostExtension: Your system supports only minimal version of XGBoost (no GPUs, no multithreading)!
03-21 16:23:21.880 X.X.X.X:54321 1 main INFO water.default: Registered: 275 REST APIs in: 502ms
03-21 16:23:21.881 X.X.X.X:54321 1 main INFO water.default: Registered REST API extensions: [Amazon S3, XGBoost, Algos, Sparkling Water REST API Extensions, Infogram, AutoML, Core V3, TargetEncoder, Core V4]
03-21 16:23:22.034 X.X.X.X:54321 1 main INFO water.default: Registered: 330 schemas in 152ms
03-21 16:23:22.035 X.X.X.X:54321 1 main INFO water.default: H2O started in 185238ms
03-21 16:23:22.035 X.X.X.X:54321 1 main INFO water.default:
03-21 16:23:22.035 X.X.X.X:54321 1 main INFO water.default: Open H2O Flow in your web browser: http://X.X.X.X:54321
03-21 16:23:22.035 X.X.X.X:54321 1 main INFO water.default:
@mn-mikke

mn-mikke commented on June 15, 2024

I don't see the H2O nodes communicating with each other. How did you deploy the H2O-3 cluster?
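
The repeated "DNS name not found [response code 3]" warnings in the log above suggest that the pod could not resolve the service name it was given for node discovery (DNS response code 3 is NXDOMAIN). That would be consistent with the empty pod list and the cloud of size 1. A minimal Python sketch for checking resolution from inside a pod, using only the standard library, might look like this. The helper name resolve_service is hypothetical, and the service name in the usage note is an assumption inferred from the pod hostname in the log.

```python
import socket
from typing import List


def resolve_service(dns_name: str, port: int = 54321) -> List[str]:
    """Return the sorted, unique IP addresses behind dns_name, or [] if it does not resolve."""
    try:
        # Ask the system resolver for TCP endpoints of the given name.
        infos = socket.getaddrinfo(dns_name, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # The name could not be resolved (for example NXDOMAIN).
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Run inside one of the H2O pods, a call such as resolve_service("h2o-service.sparkling-water.svc.cluster.local") would be expected to list one address per ready pod if the name points at a headless service. An empty list points to a mismatch between the value of H2O_KUBERNETES_SERVICE_DNS and the actual service name or namespace.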

KunfuPanda24 commented on June 15, 2024

I just followed the steps in this doc, with one exception: I used h2oai/sparkling-water-external-backend:3.36.0.3-1-3.2 for deploying the external backend.

mn-mikke commented on June 15, 2024

Has the H2O-3 cluster been deployed with the configuration specified in the documentation, or did you use the configuration I recommended here?

KunfuPanda24 commented on June 15, 2024

As per the documentation @mn-mikke

KunfuPanda24 commented on June 15, 2024

@mn-mikke With your configuration I am able to see N nodes in the cluster. Thanks for the help. One last question: even though I pass h2o-cluster-1 as the value of spark.ext.h2o.cloud.name, the H2O cluster name still shows as root. Am I passing anything wrong?

H2O_cluster_uptime:         9 mins 54 secs
H2O_cluster_timezone:       Etc/GMT
H2O_data_parsing_timezone:  UTC
H2O_cluster_version:        3.36.0.3
H2O_cluster_version_age:    1 month and 14 days
H2O_cluster_name:           root
H2O_cluster_total_nodes:    2
H2O_cluster_free_memory:    55.42 Gb
H2O_cluster_total_cores:    2
H2O_cluster_allowed_cores:  2
H2O_cluster_status:         locked, healthy
H2O_connection_url:         http://inith2o-py-6af5037fdfdba01e-driver-svc.spark.svc:54321
H2O_connection_proxy:       null
H2O_internal_security:      False
Python_version:             3.9.2 final

mn-mikke commented on June 15, 2024

Hi @gurumoorthy208524,
The Sparkling Water parameter spark.ext.h2o.cloud.name is used only when Sparkling Water itself creates the H2O cluster (internal backend, or automatically started external backend). Since you deploy the H2O cluster manually, you need to specify the cluster name in the YAML file that defines the cluster's stateful set.

By default, the Sparkling Water external backend image executes the following command:
java -cp /opt/h2oai/h2o-3/sparkling-water-assembly-extensions_${scalaBaseVersion}-${version}-all.jar:/opt/h2oai/h2o-3/h2o.jar -XX:+UseContainerSupport -XX:MaxRAMPercentage=50 water.H2OApp

You can set the name of the H2O cluster by adding the extra parameter -name to this command.
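
Because the extensions jar path in the default command embeds ${scalaBaseVersion} and ${version}, the overriding arguments presumably need to match the image tag you actually deploy. As a rough illustration, the argument list could be generated rather than typed by hand. This is only a sketch: the helper name build_h2o_args is hypothetical, and the example values are illustrative.

```python
import json
from typing import List


def build_h2o_args(scala_base_version: str, version: str, cluster_name: str) -> List[str]:
    """Build the java arguments for water.H2OApp with -name appended."""
    # Path pattern taken from the default command quoted above.
    extensions_jar = (
        "/opt/h2oai/h2o-3/sparkling-water-assembly-extensions_"
        f"{scala_base_version}-{version}-all.jar"
    )
    classpath = f"{extensions_jar}:/opt/h2oai/h2o-3/h2o.jar"
    return [
        "-cp", classpath,
        "-XX:+UseContainerSupport",
        "-XX:MaxRAMPercentage=50",
        "water.H2OApp",
        "-name", cluster_name,
    ]


# Example values only; match them to the image tag you deploy.
args = build_h2o_args("2.12", "3.36.0.3-1-3.1", "myh2ocluster")
# A JSON list is also valid YAML flow-sequence syntax, so it can be pasted as the args value.
print(json.dumps(args))
```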

The YAML file of the stateful set will then look like this:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: h2o-stateful-set
  namespace: default
spec:
  serviceName: h2o-service
  replicas: 2
  selector:
    matchLabels:
      app: h2o-k8s
  template:
    metadata:
      labels:
        app: h2o-k8s
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: h2o-k8s
          image: 'h2oai/sparkling-water-external-backend:3.36.0.3-1-3.1'
          resources:
            requests:
              memory: "4Gi"
          ports:
            - containerPort: 54321
              protocol: TCP
          readinessProbe:
            httpGet:
              path: /kubernetes/isLeaderNode
              port: 8081
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 1
          env:
          - name: H2O_KUBERNETES_SERVICE_DNS
            value: h2o-service.default.svc.cluster.local
          - name: H2O_NODE_LOOKUP_TIMEOUT
            value: '180'
          - name: H2O_NODE_EXPECTED_COUNT
            value: '2'
          - name: H2O_KUBERNETES_API_PORT
            value: '8081'
          command: ["java"]
          args: ["-cp", "/opt/h2oai/h2o-3/sparkling-water-assembly-extensions_2.12-3.36.0.3-1-3.1-all.jar:/opt/h2oai/h2o-3/h2o.jar", "-XX:+UseContainerSupport", "-XX:MaxRAMPercentage=50", "water.H2OApp", "-name", "myh2ocluster"]
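
Once the stateful set is redeployed, one way to confirm that the name and node count took effect is to query the cluster over REST. The sketch below assumes the H2O endpoint /3/Cloud and its cloud_name and cloud_size response fields; the function names are hypothetical, and the base URL must be replaced with an address reachable from where you run it.

```python
import json
from typing import Any, Dict, Tuple
from urllib.request import urlopen


def fetch_cloud(base_url: str, timeout: float = 10.0) -> Dict[str, Any]:
    """GET <base_url>/3/Cloud and return the decoded JSON body."""
    with urlopen(f"{base_url.rstrip('/')}/3/Cloud", timeout=timeout) as resp:
        return json.load(resp)


def summarize_cloud(payload: Dict[str, Any]) -> Tuple[Any, Any]:
    """Extract (cluster name, node count) from a /3/Cloud response; missing keys become None."""
    return payload.get("cloud_name"), payload.get("cloud_size")


def cluster_matches(payload: Dict[str, Any], expected_name: str, expected_size: int) -> bool:
    """True if the response reports the expected cluster name and node count."""
    return summarize_cloud(payload) == (expected_name, expected_size)
```

For the YAML above, cluster_matches(fetch_cloud("http://X.X.X.X:54321"), "myh2ocluster", 2) would be expected to return True once both nodes have joined.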
