datacanvasio / hypergbm
A full pipeline AutoML tool for tabular data
Home Page: https://hypergbm.readthedocs.io/
License: Apache License 2.0
System information
pip list:
Package Version
--------------------- ---------
asttokens 2.0.8
backcall 0.2.0
bcrypt 4.0.0
catboost 1.0.6
certifi 2022.6.15
cffi 1.15.1
charset-normalizer 2.1.1
click 8.1.3
cloudpickle 2.1.0
convertdate 2.4.0
cryptography 37.0.4
cycler 0.11.0
dask 2022.7.1
decorator 5.1.1
distributed 2022.7.1
executing 1.0.0
featuretools 1.13.0
fonttools 4.37.1
fsspec 2022.8.2
graphviz 0.20.1
HeapDict 1.0.1
hijri-converter 2.2.4
holidays 0.15
hypergbm 0.2.5.4
hypernets 0.2.5.4
idna 3.3
imbalanced-learn 0.9.1
ipython 8.4.0
jedi 0.18.1
Jinja2 3.1.2
joblib 1.1.0
kiwisolver 1.4.4
korean-lunar-calendar 0.2.1
lightgbm 3.3.2
locket 1.0.0
MarkupSafe 2.1.1
matplotlib 3.5.3
matplotlib-inline 0.1.6
msgpack 1.0.4
numpy 1.23.2
packaging 21.3
pandas 1.4.4
paramiko 2.11.0
parso 0.8.3
partd 1.3.0
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.2.0
pip 22.1.2
plotly 5.10.0
prettytable 3.4.0
prompt-toolkit 3.0.30
psutil 5.9.1
ptyprocess 0.7.0
pure-eval 0.2.2
pyarrow 9.0.0
pycparser 2.21
Pygments 2.13.0
PyMeeus 0.5.11
PyNaCl 1.5.0
pyparsing 3.0.9
python-dateutil 2.8.2
python-geohash 0.8.5
pytz 2022.2.1
PyYAML 6.0
requests 2.28.1
scikit-learn 1.1.2
scipy 1.9.1
seaborn 0.11.2
setuptools 63.4.1
six 1.16.0
sortedcontainers 2.4.0
stack-data 0.5.0
tblib 1.7.0
tenacity 8.0.1
threadpoolctl 3.1.0
toolz 0.12.0
tornado 6.1
tqdm 4.64.0
traitlets 5.3.0
urllib3 1.26.12
wcwidth 0.2.5
wheel 0.37.1
woodwork 0.18.0
xgboost 1.6.2
XlsxWriter 3.0.3
zict 2.2.0
Describe the current behavior
Failed to run a simple demo.
Describe the expected behavior
Standalone code to reproduce the issue
Environment:
conda clean -a
pip cache purge
conda create -n gbm
conda activate gbm
conda install pip
pip install hypergbm
Log:
Successfully installed MarkupSafe-2.1.1 XlsxWriter-3.0.3 asttokens-2.0.8 backcall-0.2.0 bcrypt-4.0.0 catboost-1.0.6 cffi-1.15.1 charset-normalizer-2.1.1 click-8.1.3 cloudpickle-2.1.0 convertdate-2.4.0 cryptography-37.0.4 cycler-0.11.0 dask-2022.7.1 decorator-5.1.1 distributed-2022.7.1 executing-1.0.0 featuretools-1.13.0 fonttools-4.37.1 fsspec-2022.8.2 graphviz-0.20.1 heapdict-1.0.1 hijri-converter-2.2.4 holidays-0.15 hypergbm-0.2.5.4 hypernets-0.2.5.4 idna-3.3 imbalanced-learn-0.9.1 ipython-8.4.0 jedi-0.18.1 jinja2-3.1.2 joblib-1.1.0 kiwisolver-1.4.4 korean-lunar-calendar-0.2.1 lightgbm-3.3.2 locket-1.0.0 matplotlib-3.5.3 matplotlib-inline-0.1.6 msgpack-1.0.4 numpy-1.23.2 packaging-21.3 pandas-1.4.4 paramiko-2.11.0 parso-0.8.3 partd-1.3.0 pexpect-4.8.0 pickleshare-0.7.5 pillow-9.2.0 plotly-5.10.0 prettytable-3.4.0 prompt-toolkit-3.0.30 psutil-5.9.1 ptyprocess-0.7.0 pure-eval-0.2.2 pyarrow-9.0.0 pycparser-2.21 pygments-2.13.0 pymeeus-0.5.11 pynacl-1.5.0 pyparsing-3.0.9 python-dateutil-2.8.2 python-geohash-0.8.5 pytz-2022.2.1 pyyaml-6.0 requests-2.28.1 scikit-learn-1.1.2 scipy-1.9.1 seaborn-0.11.2 six-1.16.0 sortedcontainers-2.4.0 stack-data-0.5.0 tblib-1.7.0 tenacity-8.0.1 threadpoolctl-3.1.0 toolz-0.12.0 tornado-6.1 tqdm-4.64.0 traitlets-5.3.0 urllib3-1.26.12 wcwidth-0.2.5 woodwork-0.18.0 xgboost-1.6.2 zict-2.2.0
Code sample:
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
X,y = datasets.load_breast_cancer(as_frame=True,return_X_y=True)
X_train,X_test,y_train,y_test = train_test_split(X,y,train_size=0.7,random_state=335)
train_data = pd.concat([X_train,y_train],axis=1)
from hypergbm import make_experiment
experiment = make_experiment(train_data, target='target', reward_metric='precision')
estimator = experiment.run()
The output:
09-01 14:28:01 E hypernets.m.hyper_model.py 71 - run_trail failed! trail_no=7
09-01 14:28:01 E hypernets.m.hyper_model.py 73 - Traceback (most recent call last):
File "~/miniconda3/envs/gbm/lib/python3.10/site-packages/hypernets/model/hyper_model.py", line 61, in _run_trial
scores, oof, oof_scores = estimator.fit_cross_validation(X, y, stratified=True, num_folds=num_folds,
File "~/miniconda3/envs/gbm/lib/python3.10/site-packages/hypergbm/hyper_gbm.py", line 312, in fit_cross_validation
fold_est.fit(x_train_fold, y_train_fold, **fit_kwargs)
File "~/miniconda3/envs/gbm/lib/python3.10/site-packages/hypergbm/estimators.py", line 482, in fit
return self.fit_with_encoder(super().fit, X, y, kwargs)
File "~/miniconda3/envs/gbm/lib/python3.10/site-packages/hypergbm/estimators.py", line 181, in fit_with_encoder
return fn_fit(X, y, **kwargs)
File "~/miniconda3/envs/gbm/lib/python3.10/site-packages/xgboost/core.py", line 575, in inner_f
return f(**kwargs)
File "~/miniconda3/envs/gbm/lib/python3.10/site-packages/xgboost/sklearn.py", line 1400, in fit
self._Booster = train(
File "~/miniconda3/envs/gbm/lib/python3.10/site-packages/xgboost/core.py", line 575, in inner_f
return f(**kwargs)
File "~/miniconda3/envs/gbm/lib/python3.10/site-packages/xgboost/training.py", line 159, in train
_assert_new_callback(callbacks)
File "~/miniconda3/envs/gbm/lib/python3.10/site-packages/xgboost/training.py", line 25, in _assert_new_callback
raise ValueError(
ValueError: Old style callback was removed in version 1.6. See: https://xgboost.readthedocs.io/en/latest/python/callbacks.html.
09-01 14:28:01 E hypernets.m.hyper_model.py 71 - run_trail failed! trail_no=8
09-01 14:28:01 E hypernets.m.hyper_model.py 73 - Traceback (most recent call last):
File "~/miniconda3/envs/gbm/lib/python3.10/site-packages/hypernets/model/hyper_model.py", line 61, in _run_trial
scores, oof, oof_scores = estimator.fit_cross_validation(X, y, stratified=True, num_folds=num_folds,
File "~/miniconda3/envs/gbm/lib/python3.10/site-packages/hypergbm/hyper_gbm.py", line 312, in fit_cross_validation
fold_est.fit(x_train_fold, y_train_fold, **fit_kwargs)
File "~/miniconda3/envs/gbm/lib/python3.10/site-packages/hypergbm/estimators.py", line 482, in fit
return self.fit_with_encoder(super().fit, X, y, kwargs)
File "~/miniconda3/envs/gbm/lib/python3.10/site-packages/hypergbm/estimators.py", line 181, in fit_with_encoder
return fn_fit(X, y, **kwargs)
File "~/miniconda3/envs/gbm/lib/python3.10/site-packages/xgboost/core.py", line 575, in inner_f
return f(**kwargs)
File "~/miniconda3/envs/gbm/lib/python3.10/site-packages/xgboost/sklearn.py", line 1400, in fit
self._Booster = train(
File "~/miniconda3/envs/gbm/lib/python3.10/site-packages/xgboost/core.py", line 575, in inner_f
return f(**kwargs)
File "~/miniconda3/envs/gbm/lib/python3.10/site-packages/xgboost/training.py", line 159, in train
_assert_new_callback(callbacks)
File "~/miniconda3/envs/gbm/lib/python3.10/site-packages/xgboost/training.py", line 25, in _assert_new_callback
raise ValueError(
ValueError: Old style callback was removed in version 1.6. See: https://xgboost.readthedocs.io/en/latest/python/callbacks.html.
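The traceback says xgboost removed old-style callbacks in version 1.6, and the environment listed above has xgboost 1.6.2. A small check like the one below can flag that combination before an experiment is started. The helper name is made up for illustration, and pinning xgboost below 1.6 (or upgrading hypergbm) is only an assumed workaround, not a confirmed fix.

```python
import re


def old_style_callbacks_removed(xgboost_version):
    """Return True if this xgboost version is 1.6 or newer.

    Per the error message above, those versions reject old-style callbacks.
    """
    match = re.match(r"(\d+)\.(\d+)", xgboost_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {xgboost_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (1, 6)


# The version from the pip list in this report:
print(old_style_callbacks_removed("1.6.2"))
```

In a real environment the version string could be taken from `xgboost.__version__`.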
When I use a dask dataframe as input for HyperGBM (v0.2.2) with the option use_cache=False and without specifying cache_dir, it gives the error below:
[ERROR] E hypergbm.hyper_gbm.py 584 - FileNotFoundError: [Errno 2] No such file or directory: '/tmp/workdir/hypergbm_cache/22805_16_32f8e48ef673a57e6644a4b1b295fb66_4355a2f3c42cdb4a7b3c3f53ee8a26b5.parquet/part.0.parquet'
[ERROR] Traceback (most recent call last):
[ERROR] File "/usr/local/lib/python3.7/site-packages/hypergbm/hyper_gbm.py", line 582, in _save_df
[ERROR] to_parquet(df, filepath, fs)
[ERROR] File "/usr/local/lib/python3.7/site-packages/tabular_toolbox/persistence.py", line 93, in to_parquet
[ERROR] result = dask.compute(parts)
[ERROR] File "/usr/local/lib/python3.7/site-packages/dask/base.py", line 565, in compute
[ERROR] results = schedule(dsk, keys, **kwargs)
[ERROR] File "/usr/local/lib/python3.7/site-packages/distributed/client.py", line 2654, in get
[ERROR] results = self.gather(packed, asynchronous=asynchronous, direct=direct)
[ERROR] File "/usr/local/lib/python3.7/site-packages/distributed/client.py", line 1969, in gather
[ERROR] asynchronous=asynchronous,
[ERROR] File "/usr/local/lib/python3.7/site-packages/distributed/client.py", line 838, in sync
[ERROR] self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
[ERROR] File "/usr/local/lib/python3.7/site-packages/distributed/utils.py", line 351, in sync
[ERROR] raise exc.with_traceback(tb)
[ERROR] File "/usr/local/lib/python3.7/site-packages/distributed/utils.py", line 334, in f
[ERROR] result[0] = yield future
[ERROR] File "/usr/local/lib/python3.7/site-packages/tornado/gen.py", line 762, in run
[ERROR] value = future.result()
[ERROR] File "/usr/local/lib/python3.7/site-packages/distributed/client.py", line 1828, in _gather
[ERROR] raise exception.with_traceback(traceback)
[ERROR] File "/usr/local/lib/python3.7/site-packages/tabular_toolbox/persistence.py", line 54, in _arrow_write_parquet
[ERROR] pq.write_table(tbl, target_path, filesystem=filesystem, **pa_options)
[ERROR] File "/usr/local/lib/python3.7/site-packages/pyarrow/parquet.py", line 1797, in write_table
[ERROR] **kwargs) as writer:
[ERROR] File "/usr/local/lib/python3.7/site-packages/pyarrow/parquet.py", line 609, in __init__
[ERROR] path, compression=None)
[ERROR] File "pyarrow/_fs.pyx", line 660, in pyarrow._fs.FileSystem.open_output_stream
[ERROR] out_handle = GetResultValue(self.fs.OpenOutputStream(pathstr))
[ERROR] File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
[ERROR] return check_status(status)
[ERROR] File "pyarrow/_fs.pyx", line 1072, in pyarrow._fs._cb_open_output_stream
[ERROR] stream = handler.open_output_stream(frombytes(path))
[ERROR] File "/usr/local/lib/python3.7/site-packages/pyarrow/fs.py", line 314, in open_output_stream
[ERROR] return PythonFile(self.fs.open(path, mode="wb"), mode="w")
[ERROR] File "/usr/local/lib/python3.7/site-packages/hypernets/utils/_fsutils.py", line 105, in execute
[ERROR] result = fn(self.to_rpath(rpath), *args, **kwargs)
[ERROR] File "/usr/local/lib/python3.7/site-packages/fsspec/spec.py", line 943, in open
[ERROR] **kwargs,
[ERROR] File "/usr/local/lib/python3.7/site-packages/fsspec/implementations/local.py", line 118, in _open
[ERROR] return LocalFileOpener(path, mode, fs=self, **kwargs)
[ERROR] File "/usr/local/lib/python3.7/site-packages/fsspec/implementations/local.py", line 200, in __init__
[ERROR] self._open()
[ERROR] File "/usr/local/lib/python3.7/site-packages/fsspec/implementations/local.py", line 205, in _open
[ERROR] self.f = open(self.path, mode=self.mode)
[ERROR]
And sometimes results in another error:
[ERROR] 07-14 16:24:18 E hypernets.e._experiment.py 85 - ExperiementID:[None] - evaluate feature importance:
[ERROR] File "/usr/local/lib/python3.7/site-packages/hypernets/experiment/_experiment.py", line 75, in run
[ERROR] y_eval=self.y_eval, eval_size=self.eval_size, **kwargs)
[ERROR] File "/usr/local/lib/python3.7/site-packages/hypergbm/experiment.py", line 1116, in train
[ERROR] return super().train(hyper_model, X_train, y_train, X_test, X_eval, y_eval, **kwargs)
[ERROR] File "/usr/local/lib/python3.7/site-packages/hypergbm/experiment.py", line 839, in train
[ERROR] step.fit_transform(hyper_model, X_train, y_train, X_test=X_test, X_eval=X_eval, y_eval=y_eval, **kwargs)
[ERROR] File "/usr/local/lib/python3.7/site-packages/hypergbm/experiment.py", line 431, in fit_transform
[ERROR] importances = feature_importance_batch(estimators, X_eval, y_eval, self.scorer, n_repeats=5)
[ERROR] File "/usr/local/lib/python3.7/site-packages/hypergbm/feature_importance.py", line 73, in feature_importance_batch
[ERROR] random_state=random_state)
[ERROR] File "/usr/local/lib/python3.7/site-packages/tabular_toolbox/dask_ex.py", line 356, in permutation_importance
[ERROR] col_scores.append(scorer(estimator, X_permuted, y))
[ERROR] File "/usr/local/lib/python3.7/site-packages/sklearn/metrics/_scorer.py", line 170, in __call__
[ERROR] sample_weight=sample_weight)
[ERROR] File "/usr/local/lib/python3.7/site-packages/sklearn/metrics/_scorer.py", line 247, in _score
[ERROR] y_pred = method_caller(clf, "predict_proba", X)
[ERROR] File "/usr/local/lib/python3.7/site-packages/sklearn/metrics/_scorer.py", line 53, in _cached_call
[ERROR] return getattr(estimator, method)(*args, **kwargs)
[ERROR] File "/usr/local/lib/python3.7/site-packages/tabular_toolbox/dask_ex.py", line 274, in call_and_compute
[ERROR] r = fn_call(*args, **kwargs)
[ERROR] File "/usr/local/lib/python3.7/site-packages/hypergbm/hyper_gbm.py", line 483, in predict_proba
[ERROR] proba = getattr(self.gbm_model, method)(X)
[ERROR] File "/usr/local/lib/python3.7/site-packages/hypergbm/estimators.py", line 382, in predict_proba
[ERROR] proba = dex.fix_binary_predict_proba_result(proba)
[ERROR] File "/usr/local/lib/python3.7/site-packages/tabular_toolbox/dask_ex.py", line 261, in fix_binary_predict_proba_result
[ERROR] proba = make_chunk_size_known(proba)
[ERROR] File "/usr/local/lib/python3.7/site-packages/tabular_toolbox/dask_ex.py", line 142, in make_chunk_size_known
[ERROR] a = a.compute_chunk_sizes()
[ERROR] File "/usr/local/lib/python3.7/site-packages/dask/array/core.py", line 1274, in compute_chunk_sizes
[ERROR] [tuple([int(chunk) for chunk in chunks]) for chunks in compute(tuple(c))[0]]
[ERROR] File "/usr/local/lib/python3.7/site-packages/dask/base.py", line 565, in compute
[ERROR] results = schedule(dsk, keys, **kwargs)
[ERROR] File "/usr/local/lib/python3.7/site-packages/distributed/client.py", line 2654, in get
[ERROR] results = self.gather(packed, asynchronous=asynchronous, direct=direct)
[ERROR] File "/usr/local/lib/python3.7/site-packages/distributed/client.py", line 1969, in gather
[ERROR] asynchronous=asynchronous,
[ERROR] File "/usr/local/lib/python3.7/site-packages/distributed/client.py", line 838, in sync
[ERROR] self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
[ERROR] File "/usr/local/lib/python3.7/site-packages/distributed/utils.py", line 351, in sync
[ERROR] raise exc.with_traceback(tb)
This first error is logged by hyper_gbm.py:
def _save_df(self, filepath, df):
    try:
        # with fs.open(filepath, 'wb') as f:
        #     df.to_parquet(f)
        if not isinstance(df, pd.DataFrame):
            fs.mkdirs(filepath, exist_ok=True)
        to_parquet(df, filepath, fs)
    except Exception as e:
        logger.error(e)
        # traceback.print_exc()
        if fs.exists(filepath):
            fs.rm(filepath, recursive=True)
I can see that before to_parquet is invoked, the filepath has already been created by fs. Here is what confuses me:
While the default cache_dir is hypergbm_cache, where does the prefix /tmp/workdir/ come from? The only location related to /tmp/workdir/ is hypernets\utils\_fsutils.py:
if type(fs).__name__.lower().find('local') >= 0:
    if fs_root is None or fs_root == '':
        fs_root = os.path.join(tempfile.gettempdir(), 'workdir')
Is the path created by fs the same as the path used inside to_parquet, which also performs some file-system operations?
In Jupyter, the error disappears?
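For the question about the prefix, the quoted _fsutils.py snippet can be reproduced on its own. This is a sketch that only mirrors those three lines; the function names `default_fs_root` and `to_local_path` are invented here, and how hypernets actually maps relative paths onto the root is an assumption.

```python
import os
import tempfile


def default_fs_root(fs_root=None):
    """Mirror the quoted fallback: an empty root becomes <tempdir>/workdir."""
    if fs_root is None or fs_root == '':
        fs_root = os.path.join(tempfile.gettempdir(), 'workdir')
    return fs_root


def to_local_path(relative_path, fs_root=None):
    """Hypothetical helper: resolve a cache-relative path under the root."""
    return os.path.join(default_fs_root(fs_root), relative_path)


# On a typical Linux host tempfile.gettempdir() is /tmp, which would give
# a path under /tmp/workdir/hypergbm_cache/ like the one in the error above.
print(to_local_path('hypergbm_cache/example.parquet'))
```

If a dask worker resolves the same relative path on a different machine or with a different temp directory, the directory created by the client would not exist there; that is one possible reading of the FileNotFoundError, not a verified diagnosis.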
Then there is the cache mechanism:
The use_cache option cannot control the cache behavior of hypergbm.hyper_gbm.HyperGBMEstimator.predict and hypergbm.hyper_gbm.HyperGBMEstimator.predict_proba in steps such as hypergbm.experiment.PermutationImportanceSelectionStep or hypergbm.experiment.EnsembleStep, where the HyperGBMEstimator is loaded from training trials and the predict method is invoked with use_cache=None. Then, in hypergbm.hyper_gbm.HyperGBMEstimator.transform_data:
def transform_data(self, X, y=None, fit=False, use_cache=None, verbose=0):
    if use_cache is None:
        use_cache = False
this results in intermediate data being saved.
To avoid this, I think the easiest way may be:
Change use_cache to False when it's None in HyperGBMEstimator's transform_data.
Or, fix to_parquet.
Looking forward to the fix.
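One way to read this proposal is as a tri-state flag: an explicit True or False wins, and None falls back to whatever the estimator was configured with. The class below is a hypothetical stand-in written for illustration; it is not HyperGBMEstimator and does not reflect how the project resolved the issue.

```python
class CachedTransformer:
    """Hypothetical stand-in for an estimator with an optional data cache."""

    def __init__(self, use_cache=False):
        self.use_cache = use_cache
        self.saved = []  # records what would have been written to the cache

    def _resolve_use_cache(self, use_cache):
        # None means "not specified by the caller": use the instance setting.
        return self.use_cache if use_cache is None else bool(use_cache)

    def transform_data(self, X, use_cache=None):
        if self._resolve_use_cache(use_cache):
            self.saved.append(X)
        return X


est = CachedTransformer(use_cache=False)
est.transform_data([1, 2, 3])          # called with use_cache=None, as in predict
print(est.saved)
```

With this shape, steps that call predict without passing use_cache would inherit the setting chosen when the experiment was created.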
Hi! I'm a beginner in machine learning and I want to use hypergbm for a classification task. But when I ran pip install hypergbm, I hit a problem that made the installation fail. Could you please take a look and help me with it? I'd greatly appreciate it!
ERROR: Failed building wheel for python-geohash
Running setup.py clean for python-geohash
Successfully built jieba
Failed to build python-geohash
Installing collected packages: packaging, jedi, llvmlite, distributed, seaborn, python-geohash, pyarrow, plotly, numba, lightgbm, featuretools, dask-glm, slicer, hypernets, dask-ml, catboost, shap, jieba, hypernets-jupyter-widget, hypergbm
Attempting uninstall: packaging
Found existing installation: packaging 20.9
Uninstalling packaging-20.9:
Successfully uninstalled packaging-20.9
Attempting uninstall: jedi
Found existing installation: jedi 0.14.1
Uninstalling jedi-0.14.1:
Successfully uninstalled jedi-0.14.1
Attempting uninstall: llvmlite
Found existing installation: llvmlite 0.31.0
ERROR: Cannot uninstall 'llvmlite'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
When we ran HyperGBM on k8s with CPU and memory limits set far below the machine's total resources, HyperGBM triggered early stopping with a timeout error. We found that the k8s pod can still see all of the machine's CPUs even when resource limits are set.
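The mismatch described here can be checked from inside the pod. The sketch below compares the host CPU count with the cgroup v2 CPU quota; it assumes the container exposes /sys/fs/cgroup/cpu.max (cgroup v1 uses different files), and the function names are invented for illustration. Whether HyperGBM sizes its workers from the host CPU count is not confirmed here.

```python
import os


def cpus_from_cpu_max(text, fallback):
    """Parse a cgroup v2 cpu.max value such as '200000 100000' or 'max 100000'.

    Returns the whole number of CPUs allowed by the quota (at least 1),
    or `fallback` when the quota is unlimited ('max').
    """
    quota, period = text.split()[:2]
    if quota == "max":
        return fallback
    return max(1, int(quota) // int(period))


def effective_cpu_count(path="/sys/fs/cgroup/cpu.max"):
    """Return the smaller of the host CPU count and the cgroup quota."""
    host_cpus = os.cpu_count() or 1
    try:
        with open(path) as f:
            limit = cpus_from_cpu_max(f.read(), host_cpus)
    except (OSError, ValueError, ZeroDivisionError):
        # File missing or unreadable (e.g. not cgroup v2): keep the host count.
        return host_cpus
    return min(limit, host_cpus)


print("host CPUs:", os.cpu_count(), "usable CPUs:", effective_cpu_count())
```

A value like this could then be passed to whatever parallelism setting the training job uses, so that it does not oversubscribe the pod.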
HyperGBM[notebook] does not display visualizations in Jupyter
It would be really nice to be able to run HyperGBM on a csv/tsv file with a few tuning parameters from the command line, rather than having to write my own Python code.
Update the default early-stopping rounds for CatBoost from None to 200
10-29 08:52:16 I hypernets.d.percentile.py 26 - direction:max, promising:False, percentile_score:0.4766159871250443, current_trial_score:0.476461461619065, trajectory size:11
10-29 08:52:16 I hypernets.m.hyper_model.py 68 - unpromising trial:[0.6487236394773013, 0.613287478701026, 0.5840506989466481, 0.5604754071974789, 0.5409672820410317, 0.5247583446365381, 0.5112257482275185, 0.4997889628536849, 0.4906462009430983, 0.4828609200866817, 0.476461461619065]
It seems the model got a good result, but the log says 'promising:False'.
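The numbers in the log are consistent with a simple percentile comparison: with direction max, the current score 0.47646 is below the percentile score 0.47662, so the trial is flagged as unpromising. The function below is a sketch of that comparison under the assumption that this is how the discriminator decides; the actual hypernets implementation may differ in detail.

```python
def is_promising(current_score, percentile_score, direction):
    """Compare a trial's current score with the percentile of earlier trials.

    Assumed rule: for 'max' the score must reach the percentile,
    for 'min' it must not exceed it.
    """
    if direction == "max":
        return current_score >= percentile_score
    if direction == "min":
        return current_score <= percentile_score
    raise ValueError(f"unknown direction: {direction!r}")


# Values taken from the log lines above.
print(is_promising(0.476461461619065, 0.4766159871250443, "max"))
```

Note that the logged trajectory falls steadily from about 0.649 to 0.476. If those values are a loss-like quantity where lower is better, the direction would need to be min for the trial to count as promising.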
Try passing eval_data as in the code below.
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from hypergbm import make_experiment
X,y = datasets.load_breast_cancer(as_frame=True,return_X_y=True)
X_train,X_test,y_train,y_test = train_test_split(X,y,train_size=0.7,random_state=335)
dftrain = pd.concat([X_train,y_train],axis=1)
dfval = pd.concat([X_test,y_test],axis=1)
experiment = make_experiment(train_data=dftrain.copy(),
                             target='target',
                             reward_metric='precision',
                             eval_data=dfval.copy(),
                             max_trials=5)
estimator = experiment.run()
estimator
It throws an error like this:
03-21 22:28:22 E hypernets.e._experiment.py 100 - ExperimentID:[HyperGBM_51f6515f9945cce8cf9365412d041ebe] - None: 'CompeteExperiment' object has no attribute 'y_eval_pred'
Traceback (most recent call last):
File "/conda/envs/notebook/lib/python3.6/site-packages/hypernets/experiment/_experiment.py", line 86, in run
callback.experiment_start(self)
File "/conda/envs/notebook/lib/python3.6/site-packages/hypernets/experiment/_callback.py", line 577, in experiment_start
d = ExperimentExtractor(exp).extract()
File "/conda/envs/notebook/lib/python3.6/site-packages/hypernets/experiment/_extractor.py", line 668, in extract
if self.is_evaluated():
File "/conda/envs/notebook/lib/python3.6/site-packages/hypernets/experiment/_extractor.py", line 654, in is_evaluated
return self.exp.X_eval is not None and self.exp.y_eval is not None and self.exp.y_eval_pred is not None
AttributeError: 'CompeteExperiment' object has no attribute 'y_eval_pred'
Could this bug be fixed in the next version?
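The failing line reads self.exp.y_eval_pred on an object that does not define it at experiment start. A defensive version of that check could use getattr with a default, as sketched below. This is only an illustration of the pattern with a stand-in object; it is not the fix the maintainers applied.

```python
from types import SimpleNamespace


def is_evaluated(exp):
    """Return True only if eval data and eval predictions are all present.

    getattr(..., None) treats a missing attribute the same as None
    instead of raising AttributeError.
    """
    return all(
        getattr(exp, name, None) is not None
        for name in ("X_eval", "y_eval", "y_eval_pred")
    )


# Stand-in for an experiment that has eval data but no predictions yet.
exp = SimpleNamespace(X_eval=[[1.0]], y_eval=[1])
print(is_evaluated(exp))
```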
ImportError Traceback (most recent call last)
Cell In[17], line 1
----> 1 from hypergbm import make_experiment
File ~.conda\envs\hypergbm\lib\site-packages\hypergbm_init_.py:7
3 """
4
5 """
6 from hypernets.experiment import CompeteExperiment
----> 7 from .hyper_gbm import HyperGBM, HyperGBMEstimator, HyperGBMExplainer, HyperEstimator, HyperModel
8 from .experiment import make_experiment
10 from ._version import version
File ~.conda\envs\hypergbm\lib\site-packages\hypergbm\hyper_gbm.py:12
10 import re
11 import time
---> 12 from imblearn.over_sampling import RandomOverSampler, SMOTE, ADASYN
13 from imblearn.under_sampling import RandomUnderSampler, NearMiss, TomekLinks, EditedNearestNeighbours
14 from sklearn import pipeline as sk_pipeline
File ~.conda\envs\hypergbm\lib\site-packages\imblearn_init_.py:52
48 sys.stderr.write("Partial import of imblearn during the build process.\n")
49 # We are not importing the rest of scikit-learn during the build
50 # process, as it may not be compiled yet
51 else:
---> 52 from . import (
53 combine,
54 ensemble,
55 exceptions,
56 metrics,
57 over_sampling,
58 pipeline,
59 tensorflow,
60 under_sampling,
61 utils,
62 )
63 from ._version import version
64 from .base import FunctionSampler
File ~.conda\envs\hypergbm\lib\site-packages\imblearn\combine_init_.py:5
1 """The :mod:imblearn.combine
provides methods which combine
2 over-sampling and under-sampling.
3 """
----> 5 from ._smote_enn import SMOTEENN
6 from ._smote_tomek import SMOTETomek
8 all = ["SMOTEENN", "SMOTETomek"]
File ~.conda\envs\hypergbm\lib\site-packages\imblearn\combine_smote_enn.py:12
9 from sklearn.base import clone
10 from sklearn.utils import check_X_y
---> 12 from ..base import BaseSampler
13 from ..over_sampling import SMOTE
14 from ..over_sampling.base import BaseOverSampler
File ~.conda\envs\hypergbm\lib\site-packages\imblearn\base.py:21
18 from sklearn.utils.multiclass import check_classification_targets
20 from .utils import check_sampling_strategy, check_target_type
---> 21 from .utils._param_validation import validate_parameter_constraints
22 from .utils._validation import ArraysTransformer
25 class SamplerMixin(BaseEstimator, metaclass=ABCMeta):
File ~.conda\envs\hypergbm\lib\site-packages\imblearn\utils_param_validation.py:908
906 from sklearn.utils._param_validation import generate_valid_param # noqa
907 from sklearn.utils._param_validation import validate_parameter_constraints # noqa
--> 908 from sklearn.utils._param_validation import (
909 HasMethods,
910 Hidden,
911 Interval,
912 Options,
913 StrOptions,
914 _ArrayLikes,
915 _Booleans,
916 _Callables,
917 _CVObjects,
918 _InstancesOf,
919 _IterablesNotString,
920 _MissingValues,
921 _NoneConstraint,
922 _PandasNAConstraint,
923 _RandomStates,
924 _SparseMatrices,
925 _VerboseHelper,
926 make_constraint,
927 validate_params,
928 )
ImportError: cannot import name '_MissingValues' from 'sklearn.utils._param_validation' (C:\Users\fish.conda\envs\hypergbm\lib\site-packages\sklearn\utils_param_validation.py)
Below is the pip list output:
PS D:\codes\hyperGBM> pip list
Package Version
anyio 3.7.0
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
arrow 1.2.3
asttokens 2.2.1
async-lru 2.0.2
attrs 23.1.0
Babel 2.12.1
backcall 0.2.0
bcrypt 4.0.1
beautifulsoup4 4.12.2
bleach 6.0.0
catboost 1.2
certifi 2023.5.7
cffi 1.15.1
charset-normalizer 3.1.0
click 8.1.3
cloudpickle 2.2.1
colorama 0.4.6
comm 0.1.3
cryptography 41.0.1
cycler 0.11.0
dask 2023.6.1
dask-glm 0.2.0
dask-ml 2023.3.24
debugpy 1.6.7
decorator 5.1.1
defusedxml 0.7.1
distributed 2023.6.1
exceptiongroup 1.1.1
executing 1.2.0
fastjsonschema 2.17.1
featuretools 1.26.0
fonttools 4.40.0
fqdn 1.5.1
fsspec 2023.6.0
graphviz 0.20.1
hboard 0.1.1
hboard-widget 0.1.1
holidays 0.27.1
hypergbm 0.2.5.7
hypernets 0.2.5.7
idna 3.4
imbalanced-learn 0.10.1
importlib-metadata 6.7.0
importlib-resources 5.12.0
iniconfig 2.0.0
ipykernel 6.23.3
ipython 8.14.0
ipywidgets 8.0.6
isoduration 20.11.0
jedi 0.18.2
jieba 0.42.1
Jinja2 3.1.2
joblib 1.3.1
json5 0.9.14
jsonpointer 2.4
jsonschema 4.17.3
jupyter_client 8.3.0
jupyter_core 5.3.1
jupyter-events 0.6.3
jupyter-lsp 2.2.0
jupyter_server 2.7.0
jupyter_server_terminals 0.4.4
jupyterlab 4.0.2
jupyterlab-pygments 0.2.2
jupyterlab_server 2.23.0
jupyterlab-widgets 3.0.7
kiwisolver 1.4.4
lightgbm 3.3.5
llvmlite 0.40.1
locket 1.0.0
MarkupSafe 2.1.3
matplotlib 3.5.3
matplotlib-inline 0.1.6
mistune 3.0.1
msgpack 1.0.5
multipledispatch 1.0.0
nbclient 0.8.0
nbconvert 7.6.0
nbformat 5.9.0
nest-asyncio 1.5.6
notebook_shim 0.2.3
numba 0.57.1
numpy 1.24.4
overrides 7.3.1
packaging 23.1
pandas 1.5.3
pandocfilters 1.5.0
paramiko 3.2.0
parso 0.8.3
partd 1.4.0
pickleshare 0.7.5
Pillow 10.0.0
pip 23.1.2
platformdirs 3.8.0
plotly 5.15.0
pluggy 1.2.0
prettytable 3.8.0
prometheus-client 0.17.0
prompt-toolkit 3.0.38
psutil 5.9.5
pure-eval 0.2.2
pyarrow 12.0.1
pycparser 2.21
Pygments 2.15.1
PyNaCl 1.5.0
pyparsing 3.1.0
pyrsistent 0.19.3
pytest 7.4.0
python-dateutil 2.8.2
python-json-logger 2.0.7
pytz 2023.3
pywin32 306
pywinpty 2.0.10
PyYAML 6.0
pyzmq 25.1.0
requests 2.31.0
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
scikit-learn 1.2.0
scipy 1.11.1
seaborn 0.12.2
Send2Trash 1.8.2
setuptools 67.8.0
shap 0.41.0
six 1.16.0
slicer 0.0.7
sniffio 1.3.0
sortedcontainers 2.4.0
soupsieve 2.4.1
stack-data 0.6.2
tblib 2.0.0
tenacity 8.2.2
terminado 0.17.1
threadpoolctl 3.1.0
tinycss2 1.2.1
tomli 2.0.1
toolz 0.12.0
tornado 6.3.2
tqdm 4.65.0
traitlets 5.9.0
typing_extensions 4.7.1
uri-template 1.3.0
urllib3 2.0.3
wcwidth 0.2.6
webcolors 1.13
webencodings 0.5.1
websocket-client 1.6.1
wheel 0.38.4
widgetsnbextension 4.0.7
woodwork 0.24.0
xgboost 1.7.6
XlsxWriter 3.1.2
zict 3.0.0
zipp 3.15.0
At first I thought the scikit-learn version might not match, but I tried several versions and they all raise the same error.
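Because the traceback comes from a notebook kernel and the pip list from a PowerShell prompt, it may be worth ruling out that they refer to different environments (this is an assumption, not a diagnosis). The sketch below reports the versions visible to the running interpreter; run it in the same kernel that raises the ImportError. The helper name is invented for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in package_names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(
    ["scikit-learn", "imbalanced-learn", "hypergbm", "hypernets"]
).items():
    print(f"{name}: {ver}")
```

If the reported scikit-learn and imbalanced-learn versions differ from the pip list above, the kernel is using another environment.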
The search often terminates early because max_no_improvement_trials is reached. How can I disable this setting, or set it higher?
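For context, a no-improvement limit usually works like the loop below: a counter grows each time a trial fails to beat the best reward so far, and the search stops once the counter reaches the limit. This is a generic sketch, not hypernets' code. As far as I know, the related argument of make_experiment is early_stopping_rounds, but please treat that name as an assumption and check it against the documentation.

```python
def trials_before_stop(rewards, max_no_improvement_trials=None):
    """Return how many trials run before the search stops (direction: max).

    A limit of None or 0 disables the check in this sketch.
    """
    best = float("-inf")
    stale = 0  # consecutive trials without a new best reward
    for n, reward in enumerate(rewards, start=1):
        if reward > best:
            best, stale = reward, 0
        else:
            stale += 1
        if max_no_improvement_trials and stale >= max_no_improvement_trials:
            return n
    return len(rewards)


rewards = [0.50, 0.60, 0.60, 0.59, 0.58, 0.70]
print(trials_before_stop(rewards, max_no_improvement_trials=3))
print(trials_before_stop(rewards, max_no_improvement_trials=None))
```

In this example a small limit stops the search before the last trial, which would have been the best one; raising or disabling the limit lets it run.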
Does HyperGBM's make_experiment return the best model?
How does parameter tuning work? That is, what is its search space (e.g. for XGBoost)?
I want to build a dataset for my application. How should I build it? Is there any standard that HyperGBM expects for the input data, or a user guide that can help me figure this out?
Thanks for your time.
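How does the code handle NaN values internally? I know that XGBoost and LightGBM themselves support training on data containing NaN, but I would like to know whether HyperGBM processes NaN values before training or feeds them directly into the model, and where I can find the relevant handling logic.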
Describe the current behavior
Hi, I got a nice lightgbm model using EvolutionSearcher. Then I used PlaybackSearcher to get the cross-validation score, but I get a different score every time I run it. I found that the values of the model's random_state differ between runs.
Describe the expected behavior
For the same hyper_model_.history, PlaybackSearcher should give the same reward.
Standalone code to reproduce the issue
import numpy as np
import pandas as pd
from hypernets.searchers import EvolutionSearcher
from hypergbm.search_space import GeneralSearchSpaceGenerator
from hypergbm import make_experiment
from hypernets.searchers import PlaybackSearcher
from hypernets.core.trial import TrialHistory
from hypernets.tabular.datasets import dsutils
train_data = dsutils.load_adult()
coloums_ = [f'feature{i}' for i in range(14)]+['target']
train_data.columns = coloums_
target = 'target'
print(train_data.shape)
history_file = 'history.txt'
search_space_ = GeneralSearchSpaceGenerator(
n_estimators=600,
enable_xgb=False,
enable_catboost=False,
)
rs = EvolutionSearcher(search_space_,optimize_direction='max',population_size=50, sample_size=6, candidates_size=5)
exp = make_experiment(train_data.copy(),eval_data=None,test_data=None,target=target,max_trials=1,random_state=1,ensemble_size = 0, searcher=rs,log_level='info')
estimator = exp.run()
_,ensemble_model = estimator.steps[-1]
print(f'model\'s random_state is {ensemble_model.cv_gbm_models_[0].random_state}')
exp.hyper_model_.history.save(history_file)
history = TrialHistory.load_history(search_space_, history_file)
playback = PlaybackSearcher(history, top_n=1, optimize_direction='max')
exp = make_experiment(train_data.copy(),eval_data=None,test_data=None,target=target,max_trials=1,random_state=1,ensemble_size = 0,searcher=playback,log_level='info')
estimator = exp.run()
_,ensemble_model = estimator.steps[-1]
print(f'model\'s random_state is {ensemble_model.cv_gbm_models_[0].random_state}')
Are you willing to submit PR?(Yes/No)
Yes
Other info / logs
The process is as follows:
Describe the current behavior
In hypergbm, training CatBoost on GPU reports an error.
Describe the expected behavior
CatBoost should support training on GPU.
Standalone code to reproduce the issue
import numpy as np
import pandas as pd
from hypernets.searchers import EvolutionSearcher
from hypergbm.search_space import GeneralSearchSpaceGenerator
from hypergbm import make_experiment
from hypernets.searchers import PlaybackSearcher
from hypernets.core.trial import TrialHistory
from hypernets.tabular.datasets import dsutils
train_data = dsutils.load_adult()
coloums_ = [f'feature{i}' for i in range(14)]+['target']
train_data.columns = coloums_
target = 'target'
print(train_data.shape)
history_file = 'history.txt'
search_space_ = GeneralSearchSpaceGenerator(
n_estimators=600,
enable_xgb=False,
enable_lightgbm=False,
catboost_init_kwargs={'task_type': 'GPU',
'devices':'1'
}
)
rs = EvolutionSearcher(search_space_,optimize_direction='max',population_size=50, sample_size=6, candidates_size=5)
exp = make_experiment(train_data.copy(),eval_data=None,test_data=None,target=target,max_trials=1,random_state=1,ensemble_size = 0, searcher=rs,log_level='info')
estimator = exp.run()
Are you willing to submit PR?(Yes/No)
No
Other info / logs
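Does collinearity detection compute the variance inflation factor (VIF)? The output keeps one feature and drops a corresponding one; what is the basis for that decision?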
Using from hypergbm.hyper_gbm import HyperGBM raises an error:
ModuleNotFoundError: No module named "dask_ml"
@jackguagua
Just a test
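Running the HyperGBM operator on the same dataset sometimes finishes in a few minutes, and sometimes produces no result even after 1-2 days.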
Describe the current behavior
Some trials that were stopped by the discriminator are reloaded by PlaybackSearcher when optimize_direction='min'. Because those trials' reward is 0, which is the minimum, they are treated as the optimal model.
Describe the expected behavior
Drop the trials that were stopped by the discriminator when reloading the history.
Are you willing to submit PR?(Yes/No)
Yes.
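The expected behavior can be sketched as a filter applied before ranking. The trial records below are plain dictionaries with made-up field names (`reward`, `succeeded`); they do not reflect the actual TrialHistory format, and this is not the change that was submitted.

```python
def top_n_trials(trials, n, optimize_direction):
    """Rank only trials that ran to completion, then keep the best n.

    Trials stopped early carry a placeholder reward of 0, which would
    otherwise look like the best score when minimizing.
    """
    finished = [t for t in trials if t["succeeded"]]
    reverse = optimize_direction == "max"
    return sorted(finished, key=lambda t: t["reward"], reverse=reverse)[:n]


trials = [
    {"trial_no": 1, "reward": 0.31, "succeeded": True},
    {"trial_no": 2, "reward": 0.0, "succeeded": False},  # stopped by discriminator
    {"trial_no": 3, "reward": 0.27, "succeeded": True},
]
print([t["trial_no"] for t in top_n_trials(trials, 1, "min")])
```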
The reward_metric function in the example seems to receive only the training set's predicted and actual values?
I want to use a new evaluation criterion defined by myself.
Describe the current behavior
I want to use hypergbm on Kaggle, but I got an error. It looks like a version conflict.
Describe the expected behavior
Just run !pip3 install -U hypergbm and then be able to use hypergbm on Kaggle.
Standalone code to reproduce the issue
!pip3 install -U hypergbm
import hypergbm
Are you willing to submit PR?(Yes/No)
I noticed this part of the log:
Attempting uninstall: scikit-learn
Found existing installation: scikit-learn 0.23.2
Uninstalling scikit-learn-0.23.2:
Successfully uninstalled scikit-learn-0.23.2
If I add !pip3 install -U scikit-learn==0.23.2, it works.
Installing collected packages: scikit-learn
Attempting uninstall: scikit-learn
Found existing installation: scikit-learn 1.0
Uninstalling scikit-learn-1.0:
Successfully uninstalled scikit-learn-1.0
Other info / logs
Collecting hypergbm
Downloading hypergbm-0.2.3-py3-none-any.whl (2.9 MB)
|████████████████████████████████| 2.9 MB 809 kB/s eta 0:00:01
Requirement already satisfied: plotly in /opt/conda/lib/python3.7/site-packages (from hypergbm) (5.3.1)
Requirement already satisfied: catboost>=0.26 in /opt/conda/lib/python3.7/site-packages (from hypergbm) (0.26.1)
Requirement already satisfied: distributed in /opt/conda/lib/python3.7/site-packages (from hypergbm) (2021.9.0)
Requirement already satisfied: dask in /opt/conda/lib/python3.7/site-packages (from hypergbm) (2021.9.0)
Collecting hypernets==0.2.3
Downloading hypernets-0.2.3-py3-none-any.whl (3.2 MB)
|████████████████████████████████| 3.2 MB 54.2 MB/s eta 0:00:01
Collecting dask-ml
Downloading dask_ml-1.9.0-py3-none-any.whl (143 kB)
|████████████████████████████████| 143 kB 64.3 MB/s eta 0:00:01
Requirement already satisfied: xgboost>=1.3.0 in /opt/conda/lib/python3.7/site-packages (from hypergbm) (1.4.2)
Requirement already satisfied: imbalanced-learn>=0.7.0 in /opt/conda/lib/python3.7/site-packages (from hypergbm) (0.8.0)
Requirement already satisfied: tqdm in /opt/conda/lib/python3.7/site-packages (from hypergbm) (4.62.1)
Requirement already satisfied: fsspec>=0.8.0 in /opt/conda/lib/python3.7/site-packages (from hypergbm) (2021.8.1)
Requirement already satisfied: lightgbm>=3.2.0 in /opt/conda/lib/python3.7/site-packages (from hypergbm) (3.2.1)
Requirement already satisfied: scikit-learn>=0.22.1 in /opt/conda/lib/python3.7/site-packages (from hypernets==0.2.3->hypergbm) (0.23.2)
Requirement already satisfied: numpy>=1.16.5 in /opt/conda/lib/python3.7/site-packages (from hypernets==0.2.3->hypergbm) (1.19.5)
Requirement already satisfied: pandas>=0.25.3 in /opt/conda/lib/python3.7/site-packages (from hypernets==0.2.3->hypergbm) (1.3.2)
Collecting seaborn==0.11.0
Downloading seaborn-0.11.0-py3-none-any.whl (283 kB)
|████████████████████████████████| 283 kB 64.3 MB/s eta 0:00:01
Requirement already satisfied: pyarrow in /opt/conda/lib/python3.7/site-packages (from hypernets==0.2.3->hypergbm) (5.0.0)
Requirement already satisfied: scipy in /opt/conda/lib/python3.7/site-packages (from hypernets==0.2.3->hypergbm) (1.7.1)
Requirement already satisfied: featuretools in /opt/conda/lib/python3.7/site-packages (from hypernets==0.2.3->hypergbm) (0.27.1)
Requirement already satisfied: ipython in /opt/conda/lib/python3.7/site-packages (from hypernets==0.2.3->hypergbm) (7.26.0)
Collecting python-geohash
Downloading python-geohash-0.8.5.tar.gz (17 kB)
Requirement already satisfied: traitlets in /opt/conda/lib/python3.7/site-packages (from hypernets==0.2.3->hypergbm) (5.0.5)
Requirement already satisfied: six in /opt/conda/lib/python3.7/site-packages (from hypernets==0.2.3->hypergbm) (1.15.0)
Requirement already satisfied: matplotlib>=2.2 in /opt/conda/lib/python3.7/site-packages (from seaborn==0.11.0->hypernets==0.2.3->hypergbm) (3.4.3)
Requirement already satisfied: graphviz in /opt/conda/lib/python3.7/site-packages (from catboost>=0.26->hypergbm) (0.8.4)
Requirement already satisfied: joblib>=0.11 in /opt/conda/lib/python3.7/site-packages (from imbalanced-learn>=0.7.0->hypergbm) (1.0.1)
Collecting scikit-learn>=0.22.1
Downloading scikit_learn-1.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (23.1 MB)
|████████████████████████████████| 23.1 MB 52.6 MB/s eta 0:00:01
Requirement already satisfied: wheel in /opt/conda/lib/python3.7/site-packages (from lightgbm>=3.2.0->hypergbm) (0.37.0)
Requirement already satisfied: python-dateutil>=2.7 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=2.2->seaborn==0.11.0->hypernets==0.2.3->hypergbm) (2.8.0)
Requirement already satisfied: pillow>=6.2.0 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=2.2->seaborn==0.11.0->hypernets==0.2.3->hypergbm) (8.2.0)
Requirement already satisfied: pyparsing>=2.2.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=2.2->seaborn==0.11.0->hypernets==0.2.3->hypergbm) (2.4.7)
Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=2.2->seaborn==0.11.0->hypernets==0.2.3->hypergbm) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=2.2->seaborn==0.11.0->hypernets==0.2.3->hypergbm) (1.3.1)
Requirement already satisfied: pytz>=2017.3 in /opt/conda/lib/python3.7/site-packages (from pandas>=0.25.3->hypernets==0.2.3->hypergbm) (2021.1)
Requirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from scikit-learn>=0.22.1->hypernets==0.2.3->hypergbm) (2.2.0)
Requirement already satisfied: toolz>=0.8.2 in /opt/conda/lib/python3.7/site-packages (from dask->hypergbm) (0.11.1)
Requirement already satisfied: cloudpickle>=1.1.1 in /opt/conda/lib/python3.7/site-packages (from dask->hypergbm) (1.6.0)
Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.7/site-packages (from dask->hypergbm) (21.0)
Requirement already satisfied: partd>=0.3.10 in /opt/conda/lib/python3.7/site-packages (from dask->hypergbm) (1.2.0)
Requirement already satisfied: pyyaml in /opt/conda/lib/python3.7/site-packages (from dask->hypergbm) (5.4.1)
Requirement already satisfied: locket in /opt/conda/lib/python3.7/site-packages (from partd>=0.3.10->dask->hypergbm) (0.2.1)
Collecting dask-glm>=0.2.0
Downloading dask_glm-0.2.0-py2.py3-none-any.whl (12 kB)
Requirement already satisfied: multipledispatch>=0.4.9 in /opt/conda/lib/python3.7/site-packages (from dask-ml->hypergbm) (0.6.0)
Requirement already satisfied: numba>=0.51.0 in /opt/conda/lib/python3.7/site-packages (from dask-ml->hypergbm) (0.53.1)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /opt/conda/lib/python3.7/site-packages (from distributed->hypergbm) (2.4.0)
Requirement already satisfied: psutil>=5.0 in /opt/conda/lib/python3.7/site-packages (from distributed->hypergbm) (5.8.0)
Requirement already satisfied: tornado>=5 in /opt/conda/lib/python3.7/site-packages (from distributed->hypergbm) (6.1)
Requirement already satisfied: setuptools in /opt/conda/lib/python3.7/site-packages (from distributed->hypergbm) (57.4.0)
Requirement already satisfied: tblib>=1.6.0 in /opt/conda/lib/python3.7/site-packages (from distributed->hypergbm) (1.7.0)
Requirement already satisfied: msgpack>=0.6.0 in /opt/conda/lib/python3.7/site-packages (from distributed->hypergbm) (1.0.2)
Requirement already satisfied: jinja2 in /opt/conda/lib/python3.7/site-packages (from distributed->hypergbm) (3.0.1)
Requirement already satisfied: click>=6.6 in /opt/conda/lib/python3.7/site-packages (from distributed->hypergbm) (8.0.1)
Requirement already satisfied: zict>=0.1.3 in /opt/conda/lib/python3.7/site-packages (from distributed->hypergbm) (2.0.0)
Requirement already satisfied: importlib-metadata in /opt/conda/lib/python3.7/site-packages (from click>=6.6->distributed->hypergbm) (3.4.0)
Requirement already satisfied: llvmlite<0.37,>=0.36.0rc1 in /opt/conda/lib/python3.7/site-packages (from numba>=0.51.0->dask-ml->hypergbm) (0.36.0)
Requirement already satisfied: heapdict in /opt/conda/lib/python3.7/site-packages (from zict>=0.1.3->distributed->hypergbm) (1.0.1)
Requirement already satisfied: typing-extensions>=3.6.4 in /opt/conda/lib/python3.7/site-packages (from importlib-metadata->click>=6.6->distributed->hypergbm) (3.7.4.3)
Requirement already satisfied: zipp>=0.5 in /opt/conda/lib/python3.7/site-packages (from importlib-metadata->click>=6.6->distributed->hypergbm) (3.5.0)
Requirement already satisfied: backcall in /opt/conda/lib/python3.7/site-packages (from ipython->hypernets==0.2.3->hypergbm) (0.2.0)
Requirement already satisfied: decorator in /opt/conda/lib/python3.7/site-packages (from ipython->hypernets==0.2.3->hypergbm) (5.0.9)
Requirement already satisfied: pexpect>4.3 in /opt/conda/lib/python3.7/site-packages (from ipython->hypernets==0.2.3->hypergbm) (4.8.0)
Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from ipython->hypernets==0.2.3->hypergbm) (3.0.19)
Requirement already satisfied: matplotlib-inline in /opt/conda/lib/python3.7/site-packages (from ipython->hypernets==0.2.3->hypergbm) (0.1.2)
Requirement already satisfied: pygments in /opt/conda/lib/python3.7/site-packages (from ipython->hypernets==0.2.3->hypergbm) (2.10.0)
Requirement already satisfied: pickleshare in /opt/conda/lib/python3.7/site-packages (from ipython->hypernets==0.2.3->hypergbm) (0.7.5)
Requirement already satisfied: jedi>=0.16 in /opt/conda/lib/python3.7/site-packages (from ipython->hypernets==0.2.3->hypergbm) (0.18.0)
Requirement already satisfied: parso<0.9.0,>=0.8.0 in /opt/conda/lib/python3.7/site-packages (from jedi>=0.16->ipython->hypernets==0.2.3->hypergbm) (0.8.2)
Requirement already satisfied: ptyprocess>=0.5 in /opt/conda/lib/python3.7/site-packages (from pexpect>4.3->ipython->hypernets==0.2.3->hypergbm) (0.7.0)
Requirement already satisfied: wcwidth in /opt/conda/lib/python3.7/site-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->ipython->hypernets==0.2.3->hypergbm) (0.2.5)
Requirement already satisfied: ipython-genutils in /opt/conda/lib/python3.7/site-packages (from traitlets->hypernets==0.2.3->hypergbm) (0.2.0)
Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/lib/python3.7/site-packages (from jinja2->distributed->hypergbm) (2.0.1)
Requirement already satisfied: tenacity>=6.2.0 in /opt/conda/lib/python3.7/site-packages (from plotly->hypergbm) (8.0.1)
Building wheels for collected packages: python-geohash
Building wheel for python-geohash (setup.py) ... done
Created wheel for python-geohash: filename=python_geohash-0.8.5-cp37-cp37m-linux_x86_64.whl size=50692 sha256=0e0dc40b118fd1a5d26f093ec91dfc5cfbe771e34e3f469b9a788f8100ba7926
Stored in directory: /root/.cache/pip/wheels/ea/62/7a/e8b943f1d8025cd93a93928a162319e56843301c8c06610ffe
Successfully built python-geohash
Installing collected packages: scikit-learn, seaborn, python-geohash, dask-glm, hypernets, dask-ml, hypergbm
Attempting uninstall: scikit-learn
Found existing installation: scikit-learn 0.23.2
Uninstalling scikit-learn-0.23.2:
Successfully uninstalled scikit-learn-0.23.2
Attempting uninstall: seaborn
Found existing installation: seaborn 0.11.2
Uninstalling seaborn-0.11.2:
Successfully uninstalled seaborn-0.11.2
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pdpbox 0.2.1 requires matplotlib==3.1.1, but you have matplotlib 3.4.3 which is incompatible.
hypertools 0.7.0 requires scikit-learn!=0.22,<0.24,>=0.19.1, but you have scikit-learn 1.0 which is incompatible.
Successfully installed dask-glm-0.2.0 dask-ml-1.9.0 hypergbm-0.2.3 hypernets-0.2.3 python-geohash-0.8.5 scikit-learn-1.0 seaborn-0.11.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
AttributeError Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/pkg_resources/__init__.py in _dep_map(self)
3015 try:
-> 3016 return self.__dep_map
3017 except AttributeError:
/opt/conda/lib/python3.7/site-packages/pkg_resources/__init__.py in __getattr__(self, attr)
2812 if attr.startswith('_'):
-> 2813 raise AttributeError(attr)
2814 return getattr(self._provider, attr)
AttributeError: _DistInfoDistribution__dep_map
During handling of the above exception, another exception occurred:
AttributeError Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/pkg_resources/__init__.py in _parsed_pkg_info(self)
3006 try:
-> 3007 return self._pkg_info
3008 except AttributeError:
/opt/conda/lib/python3.7/site-packages/pkg_resources/__init__.py in __getattr__(self, attr)
2812 if attr.startswith('_'):
-> 2813 raise AttributeError(attr)
2814 return getattr(self._provider, attr)
AttributeError: _pkg_info
During handling of the above exception, another exception occurred:
FileNotFoundError Traceback (most recent call last)
/tmp/ipykernel_43/4208425762.py in <module>
----> 1 import hypergbm
/opt/conda/lib/python3.7/site-packages/hypergbm/__init__.py in <module>
4
5 """
----> 6 from hypernets.experiment import CompeteExperiment
7 from .hyper_gbm import HyperGBM, HyperGBMEstimator, HyperGBMExplainer, HyperEstimator, HyperModel
8 from .experiment import make_experiment
/opt/conda/lib/python3.7/site-packages/hypernets/experiment/__init__.py in <module>
7 from ._experiment import Experiment, ExperimentCallback
8 from .general import GeneralExperiment
----> 9 from .compete import CompeteExperiment, SteppedExperiment, StepNames
10 from ._callback import ConsoleCallback, SimpleNotebookCallback
11 from ._maker import make_experiment
/opt/conda/lib/python3.7/site-packages/hypernets/experiment/compete.py in <module>
19 from hypernets.core import set_random_state
20 from hypernets.experiment import Experiment
---> 21 from hypernets.tabular import dask_ex as dex, column_selector as cs
22 from hypernets.tabular import drift_detection as dd, feature_importance as fi, pseudo_labeling as pl
23 from hypernets.tabular.cache import cache
/opt/conda/lib/python3.7/site-packages/hypernets/tabular/dask_ex/__init__.py in <module>
18
19 try:
---> 20 import dask_ml.preprocessing as dm_pre
21 import dask_ml.model_selection as dm_sel
22
/opt/conda/lib/python3.7/site-packages/dask_ml/__init__.py in <module>
7
8 try:
----> 9 __version__ = get_distribution(__name__).version
10 __all__.append("__version__")
11 except DistributionNotFound:
/opt/conda/lib/python3.7/site-packages/pkg_resources/__init__.py in get_distribution(dist)
464 dist = Requirement.parse(dist)
465 if isinstance(dist, Requirement):
--> 466 dist = get_provider(dist)
467 if not isinstance(dist, Distribution):
468 raise TypeError("Expected string, Requirement, or Distribution", dist)
/opt/conda/lib/python3.7/site-packages/pkg_resources/__init__.py in get_provider(moduleOrReq)
340 """Return an IResourceProvider for the named module or requirement"""
341 if isinstance(moduleOrReq, Requirement):
--> 342 return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
343 try:
344 module = sys.modules[moduleOrReq]
/opt/conda/lib/python3.7/site-packages/pkg_resources/__init__.py in require(self, *requirements)
884 included, even if they were already activated in this working set.
885 """
--> 886 needed = self.resolve(parse_requirements(requirements))
887
888 for dist in needed:
/opt/conda/lib/python3.7/site-packages/pkg_resources/__init__.py in resolve(self, requirements, env, installer, replace_conflicting, extras)
778
779 # push the new requirements onto the stack
--> 780 new_requirements = dist.requires(req.extras)[::-1]
781 requirements.extend(new_requirements)
782
/opt/conda/lib/python3.7/site-packages/pkg_resources/__init__.py in requires(self, extras)
2732 def requires(self, extras=()):
2733 """List of Requirements needed for this distro if extras are used"""
-> 2734 dm = self._dep_map
2735 deps = []
2736 deps.extend(dm.get(None, ()))
/opt/conda/lib/python3.7/site-packages/pkg_resources/__init__.py in _dep_map(self)
3016 return self.__dep_map
3017 except AttributeError:
-> 3018 self.__dep_map = self._compute_dependencies()
3019 return self.__dep_map
3020
/opt/conda/lib/python3.7/site-packages/pkg_resources/__init__.py in _compute_dependencies(self)
3025 reqs = []
3026 # Including any condition expressions
-> 3027 for req in self._parsed_pkg_info.get_all('Requires-Dist') or []:
3028 reqs.extend(parse_requirements(req))
3029
/opt/conda/lib/python3.7/site-packages/pkg_resources/__init__.py in _parsed_pkg_info(self)
3007 return self._pkg_info
3008 except AttributeError:
-> 3009 metadata = self.get_metadata(self.PKG_INFO)
3010 self._pkg_info = email.parser.Parser().parsestr(metadata)
3011 return self._pkg_info
/opt/conda/lib/python3.7/site-packages/pkg_resources/__init__.py in get_metadata(self, name)
1405 return ""
1406 path = self._get_metadata_path(name)
-> 1407 value = self._get(path)
1408 try:
1409 return value.decode('utf-8')
/opt/conda/lib/python3.7/site-packages/pkg_resources/__init__.py in _get(self, path)
1609
1610 def _get(self, path):
-> 1611 with open(path, 'rb') as stream:
1612 return stream.read()
1613
FileNotFoundError: [Errno 2] No such file or directory: '/opt/conda/lib/python3.7/site-packages/scikit_learn-0.23.2.dist-info/METADATA'
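The pip message in this report says that hypertools 0.7.0 requires scikit-learn below 0.24, while the install pulled in scikit-learn 1.0. A check of that kind can be sketched with the standard library alone. The helper names below (parse_version, is_below, installed_version) are hypothetical, the version parsing is deliberately simplistic, and importlib.metadata is only in the standard library from Python 3.8 onward, so this is an illustration and not a replacement for pip's resolver.

```python
from importlib import metadata  # standard library from Python 3.8 onward
from itertools import takewhile, zip_longest


def parse_version(text):
    """Turn '0.23.2' into (0, 23, 2), reading only the leading digits of each part."""
    parts = []
    for piece in text.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_below(installed, upper_bound):
    """True if installed < upper_bound, padding the shorter version with zeros."""
    pairs = zip_longest(parse_version(installed), parse_version(upper_bound), fillvalue=0)
    for have, limit in pairs:
        if have != limit:
            return have < limit
    return False  # equal versions are not strictly below the bound


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


# Upper bound taken from the pip message: scikit-learn must stay below 0.24.
print(is_below("0.23.2", "0.24"))  # True: the preinstalled version satisfied it
print(is_below("1.0", "0.24"))     # False: the upgraded version violates it
print(installed_version("scikit-learn"))  # whatever is installed locally, or None
```

Running such a check right after an install in a notebook shows whether an upgrade has moved a shared dependency outside the range another package expects.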
The program raises an error at runtime.
System information
pip list:
backcall 0.2.0
bcrypt 4.0.1
catboost 1.2.2
certifi 2023.7.22
cffi 1.15.1
charset-normalizer 3.3.0
click 8.1.7
cloudpickle 2.2.1
cryptography 41.0.4
cycler 0.11.0
dask 2022.2.0
dask-glm 0.3.0
dask-ml 2022.5.27
decorator 5.1.1
distributed 2022.2.0
fonttools 4.38.0
fsspec 2023.1.0
graphviz 0.20.1
HeapDict 1.0.1
hypergbm 0.3.0
hypernets 0.3.0
idna 3.4
imbalanced-learn 0.11.0
importlib-metadata 6.7.0
ipython 7.34.0
jedi 0.19.1
Jinja2 3.1.2
joblib 1.3.2
kiwisolver 1.4.5
lightgbm 4.1.0
llvmlite 0.39.1
locket 1.0.0
MarkupSafe 2.1.3
matplotlib 3.5.3
matplotlib-inline 0.1.6
msgpack 1.0.5
multipledispatch 1.0.0
numba 0.56.4
numpy 1.21.6
packaging 23.2
pandas 1.3.5
paramiko 3.3.1
parso 0.8.3
partd 1.4.1
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.5.0
pip 23.0.1
plotly 5.17.0
prettytable 3.7.0
prompt-toolkit 3.0.39
psutil 5.8.0
ptyprocess 0.7.0
pyarrow 12.0.1
pycparser 2.21
Pygments 2.16.1
PyNaCl 1.5.0
pyparsing 3.1.1
python-dateutil 2.8.2
pytz 2023.3.post1
PyYAML 6.0.1
requests 2.31.0
scikit-learn 1.0.2
scipy 1.7.3
setuptools 58.1.0
six 1.16.0
sortedcontainers 2.4.0
sparse 0.13.0
tblib 2.0.0
tenacity 8.2.3
threadpoolctl 3.1.0
toolz 0.12.0
tornado 6.2
tqdm 4.66.1
traitlets 5.9.0
typing_extensions 4.7.1
urllib3 2.0.7
wcwidth 0.2.8
xgboost 1.6.2
XlsxWriter 3.1.8
zict 2.2.0
zipp 3.15.0
distributed.worker - WARNING - Compute Failed
Function: execute_task
args: ((<function pipe at 0x7f54ae8260e0>, [0 1
1 0
Name: Class, dtype: int64], <Serialize: functools.partial(<function _concat at 0x7f54916d0cb0>, ignore_index=False)>, <Serialize: functools.partial(<function unique at 0x7f54916c97a0>, series_name='Class')>))
kwargs: {}
Exception: 'TypeError("'Serialize' object is not callable")'
Traceback (most recent call last):
File "", line 2, in
File "", line 5, in train
File "/usr/local/python3.7.17/lib/python3.7/site-packages/hypergbm/experiment.py", line 246, in make_experiment
**kwargs
File "/usr/local/python3.7.17/lib/python3.7/site-packages/hypernets/experiment/_maker.py", line 312, in make_experiment
task, _ = tb.infer_task_type(y_train, excludes=dc_nan_chars if dc_nan_chars is not None else None)
File "/usr/local/python3.7.17/lib/python3.7/site-packages/hypernets/tabular/toolbox.py", line 320, in infer_task_type
uniques = cls.unique(y)
File "/usr/local/python3.7.17/lib/python3.7/site-packages/hypernets/tabular/dask_ex/_toolbox.py", line 230, in unique
uniques = y.unique().compute()
File "/usr/local/python3.7.17/lib/python3.7/site-packages/dask/base.py", line 290, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/usr/local/python3.7.17/lib/python3.7/site-packages/dask/base.py", line 573, in compute
results = schedule(dsk, keys, **kwargs)
File "/usr/local/python3.7.17/lib/python3.7/site-packages/distributed/client.py", line 2994, in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
File "/usr/local/python3.7.17/lib/python3.7/site-packages/distributed/client.py", line 2152, in gather
asynchronous=asynchronous,
File "/usr/local/python3.7.17/lib/python3.7/site-packages/distributed/utils.py", line 310, in sync
self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
File "/usr/local/python3.7.17/lib/python3.7/site-packages/distributed/utils.py", line 376, in sync
raise exc.with_traceback(tb)
File "/usr/local/python3.7.17/lib/python3.7/site-packages/distributed/utils.py", line 349, in f
result = yield future
File "/usr/local/python3.7.17/lib/python3.7/site-packages/tornado/gen.py", line 769, in run
value = future.result()
File "/usr/local/python3.7.17/lib/python3.7/site-packages/distributed/client.py", line 2009, in _gather
raise exception.with_traceback(traceback)
File "/usr/local/python3.7.17/lib/python3.7/site-packages/toolz/functoolz.py", line 628, in pipe
data = func(data)
TypeError: 'Serialize' object is not callable
Describe the expected behavior
Standalone code to reproduce the issue
Are you willing to submit PR?(Yes/No)
Other info / logs
System information
Describe the current behavior
If I set log_level='error' in make_experiment, no training output appears when I run it on my local PC. In a Jupyter notebook, however, the setting has no effect: I tried every log level and the output is the same.
Describe the expected behavior
With log_level='error', only error messages should be shown when running in a Jupyter notebook.
Standalone code to reproduce the issue
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
X,y = datasets.load_breast_cancer(as_frame=True,return_X_y=True)
X_train,X_test,y_train,y_test = train_test_split(X,y,train_size=0.7,random_state=335)
from hypergbm import make_experiment
train_data = pd.concat([X_train,y_train],axis=1)
experiment = make_experiment(train_data, target='target', log_level = 'error',reward_metric='precision')
estimator = experiment.run()
System information
Describe the current behavior
When I use hypergbm on the House Prices dataset, only the XGBoost result is very poor (reference RMSE: LightGBM 25000, CatBoost 25000, XGBoost 170000). When I then train native XGBoost directly, its results are very close to those of the LightGBM model.
Describe the expected behavior
XGBoost inside hypergbm and native XGBoost should give the same results.
Are you willing to submit PR?(Yes/No)
Yes
In the feature selection stage, no feature is selected, and an error was returned:
07-11 13:49:59 I hypernets.e.compete.py 882 - feature_selection drop 10 columns, 0 kept
raise ValueError("No data output, ??? ") ValueError: No data output, ???
Could a safeguard be added so that at least a few features are retained when none are selected?
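The safeguard requested here can be sketched as a threshold filter with a floor on the number of kept features. This is a hypothetical illustration in plain Python: the function select_features and its min_features parameter are not part of hypernets, and the library's actual selection step may work differently.

```python
def select_features(importances, threshold, min_features=1):
    """Keep features scoring at least `threshold`, but never fewer than `min_features`.

    `importances` maps feature name -> importance score.
    """
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    kept = [name for name, score in ranked if score >= threshold]
    if len(kept) < min_features:
        # Fallback: the threshold removed too much, so take the top-ranked features.
        kept = [name for name, _ in ranked[:min_features]]
    return kept


scores = {"a": 0.01, "b": 0.05, "c": 0.0}
print(select_features(scores, threshold=0.1, min_features=2))  # ['b', 'a']
```

With the floor in place, a threshold that is too strict degrades to "keep the best few" instead of producing an empty dataset.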
System information
Describe the current behavior
We want to train on a dataset using Dask distributed with 3 nodes.
Describe the expected behavior
Training finished successfully
Standalone code to reproduce the issue
dask.zip
Are you willing to submit PR?(Yes/No)
Yes
Other info / logs
Hello, I could not find an example of using SHAP in the documentation. How can I use SHAP to explain the resulting ensemble model?
scikit-learn's cross-validation functions accept a groups parameter. I am looking for the same option here.
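For readers unfamiliar with the request: group-aware cross-validation keeps all rows that share a group label on the same side of every split, which is what scikit-learn's GroupKFold provides. The sketch below shows the idea in plain Python with a simple round-robin assignment of groups to folds. The function group_k_fold is hypothetical, it does not balance fold sizes the way scikit-learn does, and it is not an existing HyperGBM option.

```python
def group_k_fold(groups, n_splits):
    """Yield (train_indices, test_indices) so that no group is split across both."""
    unique_groups = sorted(set(groups))
    if n_splits > len(unique_groups):
        raise ValueError("n_splits cannot exceed the number of distinct groups")
    # Round-robin assignment: group i goes to fold i % n_splits.
    fold_of = {group: i % n_splits for i, group in enumerate(unique_groups)}
    for fold in range(n_splits):
        test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train, test


groups = ["a", "a", "b", "b", "c", "c"]
for train, test in group_k_fold(groups, n_splits=3):
    print(train, test)
```

Each test fold here contains exactly one group, so a model is always evaluated on groups it never saw during training.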
System information
pip list:
anyio 3.6.1
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
asttokens 2.0.5
attrs 22.1.0
Babel 2.10.3
backcall 0.2.0
bcrypt 3.2.2
beautifulsoup4 4.11.1
bleach 5.0.1
bokeh 2.4.3
catboost 1.0.6
certifi 2022.6.15
cffi 1.15.1
charset-normalizer 2.1.0
click 8.1.3
cloudpickle 2.1.0
convertdate 2.4.0
cryptography 37.0.4
cycler 0.11.0
dask 2022.8.0
dask-glm 0.2.0
dask-ml 2022.5.27
debugpy 1.6.2
decorator 5.1.1
defusedxml 0.7.1
distributed 2022.8.0
entrypoints 0.4
executing 0.9.1
fastjsonschema 2.16.1
featuretools 1.12.1
Flask 2.2.1
fonttools 4.34.4
fsspec 2022.7.1
graphviz 0.20.1
gunicorn 20.1.0
hboard 0.1.0
hboard-widget 0.1.0
HeapDict 1.0.1
hijri-converter 2.2.4
holidays 0.14.2
hypergbm 0.2.5.4
hypernets 0.2.5.4
idna 3.3
imbalanced-learn 0.9.1
importlib-metadata 4.12.0
iniconfig 1.1.1
ipykernel 6.15.1
ipython 8.4.0
ipython-genutils 0.2.0
ipywidgets 7.7.1
itsdangerous 2.1.2
jedi 0.18.1
jieba 0.42.1
Jinja2 3.1.2
joblib 1.1.0
json5 0.9.9
jsonschema 4.9.1
jupyter 1.0.0
jupyter-client 7.3.4
jupyter-console 6.4.4
jupyter-core 4.11.1
jupyter-server 1.18.1
jupyterlab 3.4.4
jupyterlab-pygments 0.2.2
jupyterlab-server 2.15.0
jupyterlab-widgets 1.1.1
kiwisolver 1.4.4
korean-lunar-calendar 0.2.1
lightgbm 3.3.2
llvmlite 0.39.0
locket 1.0.0
MarkupSafe 2.1.1
matplotlib 3.5.2
matplotlib-inline 0.1.3
mistune 0.8.4
msgpack 1.0.4
multipledispatch 0.6.0
nbclassic 0.4.3
nbclient 0.6.6
nbconvert 6.5.0
nbformat 5.4.0
nest-asyncio 1.5.5
notebook 6.4.12
notebook-shim 0.1.0
numba 0.56.0
numpy 1.22.4
packaging 21.3
pandas 1.4.3
pandocfilters 1.5.0
paramiko 2.11.0
parso 0.8.3
partd 1.2.0
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.2.0
pip 22.0.4
plotly 5.9.0
pluggy 1.0.0
prettytable 3.3.0
prometheus-client 0.14.1
prompt-toolkit 3.0.30
psutil 5.9.1
ptyprocess 0.7.0
pure-eval 0.2.2
py 1.11.0
py4j 0.10.9.5
pyarrow 9.0.0
pycparser 2.21
Pygments 2.12.0
PyMeeus 0.5.11
PyNaCl 1.5.0
pyparsing 3.0.9
pyrsistent 0.18.1
pyspark 3.3.0
pytest 7.1.2
python-dateutil 2.8.2
python-geohash 0.8.5
pytz 2022.1
PyYAML 6.0
pyzmq 23.2.0
qtconsole 5.3.2
QtPy 2.2.0
requests 2.28.1
scikit-learn 1.1.2
scikit-plot 0.3.7
scipy 1.9.0
seaborn 0.11.2
Send2Trash 1.8.0
setuptools 58.1.0
shap 0.41.0
six 1.16.0
slicer 0.0.7
sniffio 1.2.0
sortedcontainers 2.4.0
soupsieve 2.3.2.post1
stack-data 0.3.0
tblib 1.7.0
tenacity 8.0.1
terminado 0.15.0
threadpoolctl 3.1.0
tinycss2 1.1.1
tomli 2.0.1
toolz 0.12.0
tornado 6.1
tqdm 4.64.0
traitlets 5.3.0
typing_extensions 4.3.0
urllib3 1.26.11
wcwidth 0.2.5
webencodings 0.5.1
websocket-client 1.3.3
Werkzeug 2.2.1
wheel 0.37.1
widgetsnbextension 3.6.1
woodwork 0.17.2
xgboost 1.6.1
XlsxWriter 3.0.3
zict 2.2.0
zipp 3.8.1
Describe the current behavior
After enabling feature generation and selecting categorical and continuous features, training fails with an error.
Describe the expected behavior
Finish the training successfully
Standalone code to reproduce the issue
from hypergbm import make_experiment
import json
test={"train_data":"./train_1010.csv",
"test_data":"./test_1010.csv",
"pseudo_labeling_sample_number":10,
"pseudo_labeling":True,
"max_trials":10,
"eval_size":0.2,
"feature_selection_quantile":0.3,
"train_test_split_strategy":False,
"ensemble_size":10,
"random_state":8888,
"feature_selection":True,
"feature_reselection_estimator_size":10,
"drift_detection_remove_size":0.1,
"uuid":"train_2da2bceca48c4a568cd904ed8aa66db7",
"test_ratio":0.3,
"drift_detection_num_folds":5,
"feature_selection_threshold":0.1,
"pseudo_labeling_strategy":"threshold",
"feature_reselection_strategy":"number",
"down_sample_search":False,
"feature_generation":True,
"feature_generation_categories_cols":["join_chl_type","is_permonth_fee"],
#"feature_generation_date_cols":["],
"feature_generation_continuous_cols":["sph_num","total_tax_fee"],
"pseudo_labeling_resplit":True,
"down_sample_search_max_trials":10,
"feature_selection_number":0.8,
"collinearity_detection":True,
"drift_detection_remove_shift_variable":True,
"pseudo_labeling_proba_threshold":0.1,
"early_stopping_time_limit":3600,
"num_folds":5,
"searcher":"evolution",
"drift_detection_min_features":10,
"down_sample_search_size":0.9,
"drift_detection":True,
"retrain_on_wholedata":True,
"early_stopping_rounds":10,
"log_level":"debug",
"data_cleaner_args":{
"drop_idness_columns":True,
"drop_constant_columns":True,
"drop_label_nan_rows":True,
"drop_duplicated_columns":True,
"nan_chars":"/t"
},
"feature_reselection":False,
"drift_detection_variable_shift_threshold":0.7,
"drift_detection_threshold":0.7,
"target":"churn_target",
"testRatio":0.3,
"task":"binary",
"cv":True,
"feature_selection_strategy":"threshold",
"reward_metric":"auc",
"feature_generation":True,
"early_stopping_reward":0.8,
"feature_reselection_number":10,
"down_sample_search_time_limit":300
}
ex = make_experiment(**test)
model = ex.run()
Are you willing to submit PR?(Yes/No)
Yes
Other info / logs
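Hello, I created a new virtual environment just for HyperGBM, and all dependencies installed successfully, but running the simplest example program fails. Could you help me see what went wrong? Thanks!
I ran the following example program: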
from hypergbm import make_experiment
from hypernets.tabular.datasets import dsutils
train_data = dsutils.load_blood()
experiment = make_experiment(train_data, target='Class')
estimator = experiment.run()
print(estimator)
The error output is very long; here are the first few entries:
Traceback (most recent call last):
File "", line 1, in
File "D:\Software\Anaconda3\envs\hp\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "D:\Software\Anaconda3\envs\hp\lib\multiprocessing\spawn.py", line 125, in _main
prepare(preparation_data)
File "D:\Software\Anaconda3\envs\hp\lib\multiprocessing\spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "D:\Software\Anaconda3\envs\hp\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
It also includes several MemoryError entries, for example:
import scipy.linalg._interpolative_backend as _backend
File "", line 991, in _find_and_load
File "", line 975, in _find_and_load_unlocked
File "", line 671, in _load_unlocked
File "", line 839, in exec_module
File "", line 934, in get_code
File "", line 1033, in get_data
MemoryError
System information
Describe the feature and the current behavior/state.
PlaybackSearcher - Could not find the help page for this.
Will this change the current api? How?
No,
Hello. I trained an estimator with make_experiment, setting the parameter feature_selection=True. However, estimator.get_params() does not give me the selected features, and I am unable to extract feature importances either.
HyperGBM version - 0.2.5.2
experiment = make_experiment(
train_data=train, target='Target', eval_data=test,
cv=True, num_folds=5,
reward_metric='auc', pos_label=1,
max_trials=2,
feature_selection=True,
feature_selection_threshold=0.0001,
drift_detection=True,
class_balancing='RandomUnderSampler',
njobs = -1)
estimator = experiment.run()
The estimator.get_params() call gives the following output:
{'memory': None, 'steps': [('data_adaption', DataAdaptionStep(name='data_adaption')), ('data_clean', DataCleanStep(cv=True, data_cleaner_args={'correct_object_dtype': True, 'drop_columns': None, 'drop_constant_columns': True, 'drop_duplicated_columns': False, 'drop_idness_columns': True, 'drop_label_nan_rows': True, 'int_convert_to': 'float', 'nan_chars': None, 'reduce_mem_usage': False, 'reserve_columns': None}, name='data_clean')), ('feature_selection', FeatureImportanceSelectionStep(name='feature_selection', number=15, quantile=None, strategy='number', threshold=None)), ('estimator', GreedyEnsemble(weight=[0.0, 1.0], scores=[0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 0.7872902176075832]))], 'verbose': False, 'data_adaption': DataAdaptionStep(name='data_adaption'), 'data_clean': DataCleanStep(cv=True, data_cleaner_args={'correct_object_dtype': True, 'drop_columns': None, 'drop_constant_columns': True, 'drop_duplicated_columns': False, 'drop_idness_columns': True, 'drop_label_nan_rows': True, 'int_convert_to': 'float', 'nan_chars': None, 'reduce_mem_usage': False, 'reserve_columns': None}, name='data_clean'), 'feature_selection': FeatureImportanceSelectionStep(name='feature_selection', number=15, quantile=None, strategy='number', threshold=None), 'estimator': GreedyEnsemble(weight=[0.0, 1.0], scores=[0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 0.7872902176075832, 
0.7872902176075832, 0.7872902176075832, 0.7872902176075832]), 'data_adaption__memory_limit': 0.05, 'data_adaption__min_cols': 0.3, 'data_adaption__name': 'data_adaption', 'data_adaption__target': None, 'data_clean__cv': True, 'data_clean__data_cleaner_args': {'nan_chars': None, 'correct_object_dtype': True, 'drop_constant_columns': True, 'drop_label_nan_rows': True, 'drop_idness_columns': True, 'drop_columns': None, 'reserve_columns': None, 'drop_duplicated_columns': False, 'reduce_mem_usage': False, 'int_convert_to': 'float'}, 'data_clean__name': 'data_clean', 'data_clean__train_test_split_strategy': None, 'feature_selection__name': 'feature_selection', 'feature_selection__number': 15, 'feature_selection__quantile': None, 'feature_selection__strategy': 'number', 'feature_selection__threshold': None}
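In the scikit-learn convention, get_params() reports only constructor arguments, while values learned during fitting are stored on attributes whose names end with an underscore. A generic way to look for such state on each pipeline step is sketched below. DummySelectionStep and its selected_features_ attribute are hypothetical stand-ins, so the names that the real hypernets steps expose have to be discovered by running the helper against them.

```python
def fitted_attributes(step):
    """Return public attributes that end with '_' (the scikit-learn 'learned state' naming)."""
    return {
        name: value
        for name, value in vars(step).items()
        if name.endswith("_") and not name.startswith("_")
    }


class DummySelectionStep:
    """Hypothetical stand-in for a fitted pipeline step."""

    def __init__(self, threshold=None):
        self.threshold = threshold  # constructor parameter, as get_params() would report

    def fit(self, columns):
        self.selected_features_ = [c for c in columns if c != "id"]  # learned state
        return self


step = DummySelectionStep(threshold=0.1).fit(["id", "x1", "x2"])
print(fitted_attributes(step))  # {'selected_features_': ['x1', 'x2']}

# Assumption: the estimator above is pipeline-like and exposes `steps`, as its
# get_params() output suggests, so the same helper could be applied per step:
#     for name, step in estimator.steps:
#         print(name, fitted_attributes(step))
```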
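When I start with the distributed training example:
import dask.dataframe as dd
from dask.distributed import Client, LocalCluster
from hypergbm import make_experiment
from hypernets.tabular.datasets import dsutils

def train():
    # Start an in-process Dask cluster and connect a client to it
    cluster = LocalCluster(processes=False)
    client = Client(cluster)
    train_data = dd.from_pandas(dsutils.load_blood(), npartitions=1)
    experiment = make_experiment(train_data, target='Class')
    estimator = experiment.run()
    print(estimator)
An error occurred: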
Traceback (most recent call last):
File "", line 2, in
File "", line 5, in train
File "/root/anaconda3/lib/python3.9/site-packages/hypergbm/experiment.py", line 104, in make_experiment
tb = get_tool_box(train_data)
File "/root/anaconda3/lib/python3.9/site-packages/hypernets/tabular/_base.py", line 68, in get_tool_box
raise ValueError(f'No toolbox found for your data with types: {[type(x) for x in data]}. '
ValueError: No toolbox found for your data with types: [<class 'dask.dataframe.core.DataFrame'>]. Registered tabular toolboxes are ['ToolBox'].
How can I solve it?
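The error message lists only 'ToolBox' as registered, so no Dask-aware toolbox was available in this environment. One possible cause, stated here as an assumption rather than something confirmed in this thread, is that the Dask toolbox is only registered when optional packages such as dask_ml can be imported. A minimal sketch for checking which of those modules are missing (the helper name `missing_modules` and the module list are illustrative):

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


# Assumed list of optional modules needed for Dask support; adjust as needed.
print(missing_modules(["dask", "distributed", "dask_ml"]))
```

If the printed list is not empty, installing the missing packages and re-running the example is a reasonable first step.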
System information
pip list:
asttokens 2.0.5
backcall 0.2.0
bcrypt 3.2.2
catboost 1.0.6
certifi 2022.6.15
cffi 1.15.0
charset-normalizer 2.0.12
click 8.1.3
cloudpickle 2.1.0
convertdate 2.4.0
cryptography 37.0.3
cycler 0.11.0
dask 2022.6.1
dask-glm 0.2.0
dask-ml 2022.5.27
decorator 5.1.1
distributed 2022.6.1
executing 0.8.3
featuretools 1.10.0
Flask 2.1.2
fonttools 4.33.3
fsspec 2022.5.0
graphviz 0.20
gunicorn 20.1.0
HeapDict 1.0.1
hijri-converter 2.2.4
holidays 0.14.2
hypergbm 0.2.5.2
hypernets 0.2.5.2
idna 3.3
imbalanced-learn 0.9.1
importlib-metadata 4.12.0
ipython 8.4.0
itsdangerous 2.1.2
jedi 0.18.1
Jinja2 3.1.2
joblib 1.1.0
kiwisolver 1.4.3
korean-lunar-calendar 0.2.1
lightgbm 3.3.2
llvmlite 0.38.1
locket 1.0.0
MarkupSafe 2.1.1
matplotlib 3.5.2
matplotlib-inline 0.1.3
msgpack 1.0.4
multipledispatch 0.6.0
numba 0.55.2
numpy 1.22.4
packaging 21.3
pandas 1.4.1
paramiko 2.11.0
parso 0.8.3
partd 1.2.0
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.1.1
pip 22.0.4
plotly 5.9.0
prettytable 3.3.0
prompt-toolkit 3.0.30
psutil 5.9.1
ptyprocess 0.7.0
pure-eval 0.2.2
pyarrow 8.0.0
pycparser 2.21
Pygments 2.12.0
PyMeeus 0.5.11
PyNaCl 1.5.0
pyparsing 3.0.9
python-dateutil 2.8.2
python-geohash 0.8.5
pytz 2022.1
PyYAML 6.0
requests 2.28.0
scikit-learn 1.1.1
scikit-plot 0.3.7
scipy 1.8.1
seaborn 0.11.2
setuptools 58.1.0
six 1.16.0
sortedcontainers 2.4.0
stack-data 0.3.0
tblib 1.7.0
tenacity 8.0.1
threadpoolctl 3.1.0
toolz 0.11.2
tornado 6.1
tqdm 4.64.0
traitlets 5.3.0
urllib3 1.26.9
wcwidth 0.2.5
Werkzeug 2.1.2
wheel 0.37.1
woodwork 0.16.4
xgboost 1.6.1
XlsxWriter 3.0.3
zict 2.2.0
zipp 3.8.0
Describe the current behavior
When training on a multiclass dataset with F1 as the reward metric, the estimator reports an error whenever the search space selects CatBoost.
File "/usr/local/lib/python3.9/site-packages/hypernets/model/hyper_model.py", line 61, in _run_trial
scores, oof, oof_scores = estimator.fit_cross_validation(X, y, stratified=True, num_folds=num_folds,
File "/usr/local/lib/python3.9/site-packages/hypergbm/hyper_gbm.py", line 304, in fit_cross_validation
fold_est.fit(x_train_fold, y_train_fold, **fit_kwargs)
File "/usr/local/lib/python3.9/site-packages/hypergbm/estimators.py", line 656, in fit
super().fit(X, y, **kwargs)
File "/usr/local/lib/python3.9/site-packages/catboost/core.py", line 5007, in fit
self._fit(X, y, cat_features, text_features, embedding_features, None, sample_weight, None, None, None, None, baseline, use_best_model,
File "/usr/local/lib/python3.9/site-packages/catboost/core.py", line 2278, in _fit
self._train(
File "/usr/local/lib/python3.9/site-packages/catboost/core.py", line 1705, in _train
self._object._train(train_pool, test_pool, params, allow_clear_pool, init_model._object if init_model else None)
File "_catboost.pyx", line 4585, in _catboost._CatBoost._train
File "_catboost.pyx", line 4634, in _catboost._CatBoost._train
_catboost.CatBoostError: catboost/libs/metrics/metric.cpp:6252: Eval metric should have a single value. Metric F1 provides a value for each class, thus it cannot be used as a single value to select best iteration or to detect overfitting. If you just want to look on the values of this metric use custom_metric parameter.
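The CatBoost message says that F1 yields one value per class in a multiclass task, so it cannot serve as a single number for choosing the best iteration. The sketch below illustrates that distinction in plain Python: `per_class_f1` returns one score per label, while `macro_f1` averages them into a single value. Both helper names are illustrative and are not part of CatBoost or HyperGBM. CatBoost documents an aggregated `TotalF1` metric for multiclass; whether HyperGBM maps `f1` to it is not confirmed here.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, one entry per class."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Collapse the per-class scores into a single unweighted mean."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(per_class_f1(y_true, y_pred))  # three values, one per class
print(macro_f1(y_true, y_pred))      # one value
```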
Describe the expected behavior
Finish the training successfully and get the model result.
Standalone code to reproduce the issue
test.json
{"pseudo_labeling_sample_number": 10, "pseudo_labeling": false, "max_trials": 10, "eval_size": 0.2, "feature_selection_quantile": 0.3, "train_test_split_strategy": false, "ensemble_size": 10, "random_state": 8888, "feature_selection": true, "feature_reselection_estimator_size": 10, "drift_detection_remove_size": 0.1, "uuid": "train_5fedecea53b14abbab7196c49cc93a82", "test_ratio": 0.3, "drift_detection_num_folds": 5, "feature_selection_threshold": 0.1, "pseudo_labeling_strategy": "threshold", "feature_reselection_strategy": "number", "down_sample_search": true, "pseudo_labeling_resplit": true, "down_sample_search_max_trials": 10, "test_data": "/root/data/test.csv", "feature_selection_number": 0.8, "collinearity_detection": true, "drift_detection_remove_shift_variable": true, "pseudo_labeling_proba_threshold": 0.1, "early_stopping_time_limit": 3600, "train_data": "/root/data/train.csv", "num_folds": 3, "searcher": "evolution", "drift_detection_min_features": 10, "down_sample_search_size": 0.1, "drift_detection": true, "retrain_on_wholedata": true, "early_stopping_rounds": 10, "data_cleaner_args": {"drop_idness_columns": true, "drop_constant_columns": true, "drop_label_nan_rows": true, "drop_duplicated_columns": true, "nan_chars": "/t", "reserve_columns": ["dt_m_1000", "dt_m_1003", "dt_m_1004", "dt_m_1005", "dt_m_1006", "dt_m_1009", "dt_m_1011", "dt_m_1012", "dt_m_1015", "dt_m_1017", "dt_m_1027", "dt_m_1028", "dt_m_1032", "dt_m_1034", "dt_m_1035", "dt_m_1041", "dt_m_1043", "dt_m_1051", "dt_m_1052", "dt_m_1067", "dt_m_1068", "dt_m_1073", "dt_m_1074", "dt_m_1075", "dt_m_1085", "dt_m_1086", "dt_m_1087", "dt_m_1096", "dt_m_1099", "dt_m_1102", "dt_m_1105", "dt_m_1108", "dt_m_1111", "dt_m_1594", "dt_m_1601", "dt_m_1617", "dt_m_1618", "dt_m_1620", "dt_m_1630", "dt_m_1633", "cust_access_net_dt", "credit_level", "membership_level", "gender", "birth_date", "cust_point", "inet_pd_inst_cnt", "star_level", "product_nbr", "open_date", "last_year_capture_user_flag", "app1_visits", 
"app2_visits", "app3_visits", "app4_visits", "app5_visits", "app6_visits", "app7_visits", "app8_visits", "pro_brand", "term_model", "register_date", "market_price", "label"]}, "feature_reselection": false, "drift_detection_variable_shift_threshold": 0.7, "drift_detection_threshold": 0.7, "target": "label", "testRatio": 0.3, "task": "multiclass", "cv": true, "feature_selection_strategy": "threshold", "reward_metric": "f1", "feature_generation": false, "early_stopping_reward": 0.8, "feature_reselection_number": 10, "down_sample_search_time_limit": 300}
import json
import pickle
from hypergbm import make_experiment
from sklearn.model_selection import train_test_split
import pandas as pd
import sys

if __name__ == "__main__":
    args = sys.argv[1:]
    # Load the experiment settings from test.json
    with open("test.json", "r") as f:
        params = json.load(f)
    ex = make_experiment(**params)
    model = ex.run()
    # Persist the trained pipeline
    with open('/root/model/model.pkl', 'wb') as f:
        pickle.dump(model, f)
    print("Training finished")
Are you willing to submit PR?(Yes/No)
Yes
Describe the current behavior
2023-03-01 17:32:55.332 [ERROR] 03-01 17:32:55 E hypernets.m.hyper_model.py 71 - run_trail failed! trail_no=5
2023-03-01 17:32:55.333 [ERROR] 03-01 17:32:55 E hypernets.m.hyper_model.py 73 - Traceback (most recent call last):
2023-03-01 17:32:55.333 [ERROR] File "/usr/local/lib/python3.7/site-packages/hypernets/model/hyper_model.py", line 64, in _run_trial
2023-03-01 17:32:55.333 [ERROR] **fit_kwargs)
2023-03-01 17:32:55.333 [ERROR] ValueError: not enough values to unpack (expected 3, got 2)
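This ValueError is what Python raises when a call returns two values but the caller unpacks three. The sketch below reproduces the message with hypothetical stand-ins (`fit_cv_two_values` and `run_trial` are not HyperGBM or Hypernets functions). A mismatch between the installed hypergbm and hypernets versions is one possible cause, but that is an assumption and is not confirmed in this report.

```python
def fit_cv_two_values():
    """Hypothetical stand-in for a cross-validation fit that returns two values."""
    return [0.8], "oof"


def run_trial(fit_cv):
    # The caller expects three values, as in `scores, oof, oof_scores = ...`
    scores, oof, oof_scores = fit_cv()
    return scores, oof, oof_scores


try:
    run_trial(fit_cv_two_values)
except ValueError as exc:
    message = str(exc)
    print(message)
```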
Describe the expected behavior
Training completed
Standalone code to reproduce the issue
from hypergbm import make_experiment
experiment = make_experiment(train_data, target='y', reward_metric='auc', cv=True)
estimator = experiment.run()
Are you willing to submit PR?(Yes/No)
Yes
Dear HyperGBM team,
I'd like to use XGBoost to select features for my survival model. The original XGBoost package already supports Cox proportional hazards regression for survival analysis (see https://github.com/dmlc/xgboost/blob/f9302a56fbdabb28d071ed9e402ca75e03673b4d/src/objective/regression_obj.cu#L273-L358).
I would like to know whether Cox regression is available in HyperGBM, and how I should organize my input data for it: as a NumPy structured array or something else (survival data has two labels)?
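For context on the two-label question: XGBoost's `survival:cox` objective takes a single label column in which right-censored observations are encoded as negative survival times. The sketch below shows that encoding in plain Python. The helper name `cox_labels` is illustrative, and whether HyperGBM accepts such labels is not established in this thread.

```python
def cox_labels(times, events):
    """Encode (time, event) pairs as signed labels.

    Positive label: the event was observed at that time.
    Negative label: the observation was right-censored at that time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    labels = []
    for time, event in zip(times, events):
        if time <= 0:
            raise ValueError("survival times must be positive")
        labels.append(float(time) if event else -float(time))
    return labels


# Three subjects: the second one was censored at t=3.
print(cox_labels([5, 3, 8], [1, 0, 1]))
```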
All the best,
Wenyi Jin
I installed hypergbm from pip, but running the quick start example raised an error.
from hypergbm import make_experiment
from tabular_toolbox.datasets import dsutils
train_data = dsutils.load_blood()
experiment = make_experiment(train_data, target='Class')
estimator = experiment.run()
print(estimator)
Traceback (most recent call last):
File "E:/git/HyperGBM/test.py", line 1, in <module>
from hypergbm import make_experiment
File "E:\git\HyperGBM\hypergbm\__init__.py", line 6, in <module>
from hypernets.experiment import CompeteExperiment
ImportError: cannot import name 'CompeteExperiment' from 'hypernets.experiment' (C:\Users\LENOVO\AppData\Local\Programs\Python\Python37\lib\site-packages\hypernets\experiment\__init__.py)
My Python version is 3.7.5, and all other dependencies were installed automatically from requirements.txt.
Is my configuration correct?
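An ImportError like this one often means the installed hypernets does not match what the hypergbm source tree expects. That is an assumption here, not something the report confirms. A minimal sketch for listing the installed versions of both packages so they can be compared (`importlib.metadata` requires Python 3.8 or later; on 3.7 the `importlib_metadata` backport offers the same functions):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def report(packages=("hypergbm", "hypernets")):
    """Map each package name to its installed version (or None)."""
    return {name: installed_version(name) for name in packages}


print(report())
```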
When the parameter webui=True is set,
the following error is raised: AttributeError: 'CompeteExperiment' object has no attribute 'y_eval_pred'
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\hypernets\experiment_experiment.py", line 86, in run
callback.experiment_start(self)
File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\hboard\callbacks.py", line 125, in experiment_start
super(WebVisExperimentCallback, self).experiment_start(exp)
File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\hypernets\experiment_callback.py", line 577, in experiment_start
d = ExperimentExtractor(exp).extract()
File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\hypernets\experiment_extractor.py", line 671, in extract
if self.is_evaluated():
File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\hypernets\experiment_extractor.py", line 657, in is_evaluated
return self.exp.X_eval is not None and self.exp.y_eval is not None and self.exp.y_eval_pred is not None
AttributeError: 'CompeteExperiment' object has no attribute 'y_eval_pred'
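The traceback shows `is_evaluated` reading `self.exp.y_eval_pred` directly, which fails when the attribute was never set. As a sketch of a defensive alternative only, not the library's actual fix, `getattr` with a default treats a missing attribute as "not evaluated". `ExperimentStub` below is a hypothetical stand-in for the experiment object.

```python
class ExperimentStub:
    """Hypothetical stand-in for an experiment that never set y_eval_pred."""

    def __init__(self, X_eval=None, y_eval=None):
        self.X_eval = X_eval
        self.y_eval = y_eval


def is_evaluated(exp):
    # getattr with a default returns None instead of raising AttributeError
    names = ("X_eval", "y_eval", "y_eval_pred")
    return all(getattr(exp, name, None) is not None for name in names)


exp = ExperimentStub(X_eval=[[1, 2]], y_eval=[0])
print(is_evaluated(exp))  # False: y_eval_pred is missing
```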