The Python with H2O (version 3.44.0.2) AutoML with only the XGB model included

[Errno 111] Connection refused') when running XGB model with > 850 columns and 1634870 rows of data

magrenimish commented on June 2, 2024

[Errno 111] Connection refused') when running XGB model with > 850 columns and 1634870 rows of data

from h2o-3.

Comments (9)

tomasfryda commented on June 2, 2024

@magrenimish This is likely a memory issue: XGBoost allocates its memory outside of the JVM, so H2O and XGBoost compete for the same RAM. Please see the documentation to find out how to limit H2O's memory so that XGBoost fits in what remains.

magrenimish commented on June 2, 2024

@tomasfryda As mentioned in the documentation, I allotted less than 2/3 of the total available RAM to H2O, leaving the rest for XGB. The memory available to XGB is well beyond 100 GB.
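
One way to apply the split described in this comment is to derive H2O's heap cap from the machine's total RAM. The following is only a minimal sketch: the helper names `h2o_heap_size` and `start_h2o` are hypothetical, the 2/3 default mirrors the share mentioned here, and `max_mem_size` is the `h2o.init` argument that caps the JVM heap.

```python
import math


def h2o_heap_size(total_ram_gb: float, h2o_fraction: float = 2 / 3) -> str:
    """Return a heap-size string such as '128G' for the H2O JVM.

    Whatever is not given to the JVM stays available to native XGBoost.
    """
    if not 0 < h2o_fraction < 1:
        raise ValueError("h2o_fraction must be strictly between 0 and 1")
    heap_gb = math.floor(total_ram_gb * h2o_fraction)
    if heap_gb < 1:
        raise ValueError("not enough RAM to give H2O at least 1 GB")
    return f"{heap_gb}G"


def start_h2o(total_ram_gb: float, h2o_fraction: float = 2 / 3) -> None:
    """Start (or connect to) a local H2O backend with a capped heap."""
    import h2o  # imported lazily so the sizing helper works without h2o installed

    h2o.init(max_mem_size=h2o_heap_size(total_ram_gb, h2o_fraction))
```

For example, `start_h2o(256, 0.5)` would request a 128 GB heap and leave the other half of the machine to XGBoost. Note that a heap cap passed this way only takes effect when `h2o.init` launches the backend itself; a backend that is already running keeps the limit it was started with.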

tomasfryda commented on June 2, 2024

@magrenimish Thanks for adding the available memory information. I don't see any obvious reason why it should fail like this. Would you be able to provide us with logs? Please make sure there are no confidential data in the logs (the log might contain user name, column names, loaded file names etc).

magrenimish commented on June 2, 2024

@tomasfryda Here is the log:
automodeler.log

tomasfryda commented on June 2, 2024

Thank you, @magrenimish. Unfortunately, that's not the H2O (backend) log. Please see https://docs.h2o.ai/h2o/latest-stable/h2o-docs/logs.html to find out how to get the H2O (backend) logs.

magrenimish commented on June 2, 2024

Hi @tomasfryda I tried to access the H2O logs zip folder and after downloading it, I only see a 'nohup.out' file as attached here:
automodeler_h2o_logs (1).zip

tomasfryda commented on June 2, 2024

That's exactly the kind of log we need, thank you @magrenimish.
It looks like the failure occurs during the data load, so it doesn't even get to AutoML.

The log ends in the middle of a line, which I think might be due to an OOM error, but that's odd because the file should be much smaller than the available memory. @wendycwong I think this is a bug related to the Parquet parser.

The end of the log:

12-26 21:21:23.722 127.0.0.1:16822       9972      FJ-3-43 DEBUG org.apache.parquet.hadoop.InternalParquetRecordReader: read value: 122370
12-26 21:21:23.722 127.0.0.1:16822       9972     FJ-3-113 DEBUG org.apache.parquet.hadoop.InternalParquetRecordReader: read value: 123275
12-26 21:21:23.722 127.0.0.1:16822       9972     FJ-3-105 DEBUG org.apache.parquet.hadoop.InternalParquetRecordReader: read value: 121590
12-26 21:21:23.722 127.0.0.1:16822       9972      FJ-3-87 DEBUG org.apache.parquet.hadoop.InternalParquetRecordReader: read value: 126722
12-26 21:21:23.722 127.0.0.1:16822       9972      FJ-3-47 DEBUG org.apache.parquet.hadoop.InternalParquetRecordReader: read value: 125185
12-26 21:21:23.722 127.0.0.1:16822       9972      FJ-3-19 DEBUG org.apache.parquet.hadoop.InternalParquetRecordReader: read value: 125131
12-26 21:21:23.722 127.0.0.1:16822       9972      FJ-3-43 DEBUG org.apache.parquet.hadoop.InternalParquetRecordReader: read value: 122371
12-26 21:21:23.722 127.0.0.1:16822       9972     FJ-3-113 DEBUG org.apache.parquet.hadoop.InternalParquetRecordReader: read value: 123276
12-26 21:21:23.722 127.0.0.1:16822       9972     FJ-3-105 DEBUG org.apache.parquet.hadoop.InternalParquetRecordReader: read value: 121591
12-26 21:21:23.722 127.0.0.1:16822       9972      FJ-3-87 DEBUG org.apache.parquet.hadoop.InternalParquetRecordReader: read value
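
The "ends in the middle of a line" observation can be checked mechanically. The sketch below uses only the standard library; the function names `ends_mid_line` and `last_line` are hypothetical. A log whose final byte is not a newline suggests the process stopped abruptly while writing, although that alone does not prove an OOM kill.

```python
import os


def ends_mid_line(log_path: str) -> bool:
    """Return True if the file is non-empty and does not end with a newline."""
    with open(log_path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return False  # an empty file has no partial line
        f.seek(-1, os.SEEK_END)
        return f.read(1) != b"\n"


def last_line(log_path: str, max_bytes: int = 4096) -> str:
    """Return the final line of the file, reading at most max_bytes from the end."""
    with open(log_path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - max_bytes))
        tail = f.read()
    lines = tail.splitlines()
    return lines[-1].decode("utf-8", errors="replace") if lines else ""
```

Running both helpers on the `nohup.out` file from the logs archive would show whether the last record was cut off and what it contained.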

magrenimish commented on June 2, 2024

Hi @tomasfryda @wendycwong, were you able to confirm if this was an error related to parquet parsing?

wendycwong commented on June 2, 2024

Hi Nimish:

I don't have your parquet file, so I created one for myself. I started my backend using this command:

java -Xmx50g -jar build/h2o.jar

I ran the following code. Please change the directory path to your path if you want to run my code:

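import h2o
from h2o.estimators import H2OXGBoostEstimator

h2o.init()  # connects to a running local backend (or starts one)

# Build a synthetic all-numeric frame with a two-level response column.
fr = h2o.create_frame(rows=163481, cols=851, real_fraction=1.0, categorical_fraction=0, has_response=True,
                      response_factors=2, seed=12345, missing_fraction=0.0)
h2o.export_file(fr, "/Users/wendycwong/temp/gh_16011.parquet", header=True, format="parquet")  # export as parquet file
h2o.remove_all()
fr = h2o.import_file("/Users/wendycwong/temp/gh_16011.parquet")  # re-import the exported parquet file
m = H2OXGBoostEstimator(ntrees=10, seed=1234)
m.train(x=list(range(1, fr.ncol)), y="response", training_frame=fr)  # predictors: every column after the first
print("Done")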

The code ran fine for me, so the file size is not the issue here (I was worried about that).

Without access to your Parquet file, I cannot debug what the problem is with it. If you can convert your Parquet file to .csv, it may run for you.

Thanks,
Wendy
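
The CSV workaround suggested above could be done outside H2O. This is a rough sketch, assuming `pyarrow` and `pandas` are installed; the helper names `csv_path_for` and `parquet_to_csv` and the batch size are hypothetical choices. It streams the file in row batches, so the whole table never has to sit in memory at once.

```python
from pathlib import Path


def csv_path_for(parquet_path: str) -> Path:
    """Return the output path: same location and stem, with a .csv suffix."""
    return Path(parquet_path).with_suffix(".csv")


def parquet_to_csv(parquet_path: str, batch_size: int = 50_000) -> Path:
    """Convert a Parquet file to CSV one row batch at a time."""
    import pyarrow.parquet as pq  # assumption: pyarrow (and pandas) are available

    out = csv_path_for(parquet_path)
    parquet_file = pq.ParquetFile(parquet_path)
    with open(out, "w", newline="") as f:
        for i, batch in enumerate(parquet_file.iter_batches(batch_size=batch_size)):
            # Write the header only with the first batch, then append rows.
            batch.to_pandas().to_csv(f, header=(i == 0), index=False)
    return out
```

The resulting path can then be passed to `h2o.import_file` in place of the Parquet file. If the CSV loads while the Parquet file does not, that would point toward the Parquet parser rather than memory.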
