
deephyper / deephyper


DeepHyper: Scalable Asynchronous Neural Architecture and Hyperparameter Search for Deep Neural Networks

Home Page: https://deephyper.readthedocs.io

License: BSD 3-Clause "New" or "Revised" License

Python 99.58% Shell 0.39% Dockerfile 0.03%
autodl automl deep-learning deep-neural-networks hpc hyperparameter-optimization hyperparameter-search keras machine-learning ml multi-fidelity neural-architecture-search python pytorch scalability tensorflow uncertainty-quantification

deephyper's Introduction


What is DeepHyper?

DeepHyper is a Python package for automating machine learning tasks, with a focus on hyperparameter optimization, neural architecture search, and uncertainty quantification through deep ensembles. These tasks can run on a single machine or be distributed across multiple machines, which makes DeepHyper suitable for a variety of environments. Whether you are a beginner optimizing your first machine learning models or an experienced data scientist streamlining your workflow, DeepHyper has something to offer.

Install Instructions

Installation with pip:

# For the most basic set of features (hyperparameter search)
pip install deephyper

# For the default set of features including:
# - hyperparameter search with transfer-learning
# - neural architecture search
# - deep ensembles
# - Ray-based distributed computing
# - Learning-curve extrapolation for multi-fidelity hyperparameter search
pip install "deephyper[default]"

More details about the installation process can be found at DeepHyper Installations.
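To check which of these features are available in the current environment, a small standard-library check like the following can help. It is a sketch that only reads installed package metadata (Python 3.8+); it assumes the distribution names `deephyper` and `ray`, the latter standing in for the Ray-based distributed computing listed under the default features.

```python
# Minimal sketch: report installed versions using only package metadata.
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "ray" is assumed to be present only after installing "deephyper[default]".
for name in ("deephyper", "ray"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```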

Quickstart


The black-box function run takes a single input, job, whose job.parameters dictionary holds the values of the variables being optimized. The run function is then bound to an Evaluator, which is in charge of distributing the computation of multiple evaluations. Finally, a Bayesian search named CBO is created and executed to find the values of job.parameters that MAXIMIZE the return value of run(job).

def run(job):
    # The suggested parameters are accessible in job.parameters (dict)
    x = job.parameters["x"]
    b = job.parameters["b"]

    if job.parameters["function"] == "linear":
        y = x + b
    elif job.parameters["function"] == "cubic":
        y = x**3 + b

    # Maximization!
    return y


# Necessary IF statement: otherwise, when the 'run' function is loaded
# from a new process, the search code would run again in an infinite loop
if __name__ == "__main__":
    from deephyper.problem import HpProblem
    from deephyper.search.hps import CBO
    from deephyper.evaluator import Evaluator

    # define the variables you want to optimize
    problem = HpProblem()
    problem.add_hyperparameter((-10.0, 10.0), "x") # real parameter
    problem.add_hyperparameter((0, 10), "b") # discrete parameter
    problem.add_hyperparameter(["linear", "cubic"], "function") # categorical parameter

    # define the evaluator to distribute the computation
    evaluator = Evaluator.create(
        run,
        method="process",
        method_kwargs={
            "num_workers": 2,
        },
    )

    # define your search and execute it
    search = CBO(problem, evaluator, random_state=42)

    results = search.search(max_evals=100)
    print(results)

Running it prints results similar to the following (the table is truncated). The best parameters found are function == "cubic", x ≈ 9.998 and b == 10, with an objective of about 1009.25.

    p:b p:function       p:x    objective  job_id  m:timestamp_submit  m:timestamp_gather
0     7     linear  8.831019    15.831019       1            0.064874            1.430992
1     4     linear  9.788889    13.788889       0            0.064862            1.453012
2     0      cubic  2.144989     9.869049       2            1.452692            1.468436
3     9     linear -9.236860    -0.236860       3            1.468123            1.483654
4     2      cubic -9.783865  -934.550818       4            1.483340            1.588162
..  ...        ...       ...          ...     ...                 ...                 ...
95    6      cubic  9.862098   965.197192      95           13.538506           13.671872
96   10      cubic  9.997512  1009.253866      96           13.671596           13.884530
97    6      cubic  9.965615   995.719961      97           13.884188           14.020144
98    5      cubic  9.998324  1004.497422      98           14.019737           14.154467
99    9      cubic  9.995800  1007.740379      99           14.154169           14.289366
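As a quick arithmetic check of the claim above, the sketch below copies only the visible rows of the table (the rows hidden by "..." are not included), picks the row with the highest objective, and compares it with the theoretical maximum of the search space, which is reached with function == "cubic", x = 10.0 and b = 10, giving 10.0³ + 10 = 1010. Assuming, as the printed output suggests, that results is a pandas DataFrame, the equivalent one-liner would be results.loc[results["objective"].idxmax()].

```python
# Visible rows of the results table above ("p:" columns are parameters).
rows = [
    {"job_id": 1, "p:b": 7, "p:function": "linear", "p:x": 8.831019, "objective": 15.831019},
    {"job_id": 0, "p:b": 4, "p:function": "linear", "p:x": 9.788889, "objective": 13.788889},
    {"job_id": 2, "p:b": 0, "p:function": "cubic", "p:x": 2.144989, "objective": 9.869049},
    {"job_id": 3, "p:b": 9, "p:function": "linear", "p:x": -9.236860, "objective": -0.236860},
    {"job_id": 4, "p:b": 2, "p:function": "cubic", "p:x": -9.783865, "objective": -934.550818},
    {"job_id": 95, "p:b": 6, "p:function": "cubic", "p:x": 9.862098, "objective": 965.197192},
    {"job_id": 96, "p:b": 10, "p:function": "cubic", "p:x": 9.997512, "objective": 1009.253866},
    {"job_id": 97, "p:b": 6, "p:function": "cubic", "p:x": 9.965615, "objective": 995.719961},
    {"job_id": 98, "p:b": 5, "p:function": "cubic", "p:x": 9.998324, "objective": 1004.497422},
    {"job_id": 99, "p:b": 9, "p:function": "cubic", "p:x": 9.995800, "objective": 1007.740379},
]

# Best observed configuration among the visible rows.
best = max(rows, key=lambda row: row["objective"])

# Theoretical maximum: cubic branch with x at its upper bound and b at its upper bound.
upper_bound = 10.0**3 + 10
gap = upper_bound - best["objective"]

print(best["job_id"], best["objective"], round(gap, 6))
```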

The code defines a function run that takes a RunningJob, job, as input and returns the objective y to be maximized. The if __name__ == "__main__": block at the end sets up and runs a black-box optimization with the CBO (Centralized Bayesian Optimization) algorithm from the deephyper library.
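Because run only reads job.parameters, it can be sanity-checked on its own before launching a search. In the sketch below, types.SimpleNamespace is a hypothetical stand-in for RunningJob (it is not part of DeepHyper) that only provides the parameters attribute; the sketch also adds an explicit error for unexpected function values, where the original run would fail with an UnboundLocalError on y.

```python
from types import SimpleNamespace


def run(job):
    # Same objective as the quickstart: read the suggested parameters.
    x = job.parameters["x"]
    b = job.parameters["b"]

    if job.parameters["function"] == "linear":
        y = x + b
    elif job.parameters["function"] == "cubic":
        y = x**3 + b
    else:
        raise ValueError(f"unknown function: {job.parameters['function']!r}")

    # Maximization!
    return y


# Hypothetical stand-in for a RunningJob: only `parameters` is provided.
fake_job = SimpleNamespace(parameters={"x": 2.0, "b": 3, "function": "cubic"})
print(run(fake_job))  # 11.0
```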

The optimization process is defined as follows:

  1. A hyperparameter optimization problem is created using the HpProblem class from deephyper. In this case, the problem has three variables: x is a real variable in the range -10.0 to 10.0, b is a discrete (integer) variable in the range 0 to 10, and function is a categorical variable with two possible values, "linear" and "cubic".
  2. An evaluator is created with the Evaluator.create method. It evaluates the run function on the configurations of hyperparameters suggested for the problem. With method="process", the evaluations are distributed across local worker processes; here, num_workers=2 creates two of them.
  3. A search object is created from the CBO class with the problem and evaluator defined earlier (random_state=42 seeds its random number generator). CBO is a derivative-free optimization method that uses Bayesian optimization to explore the hyperparameter space.
  4. The optimization is executed by calling the search.search method, which evaluates the run function on different hyperparameter configurations until the maximum number of evaluations (here max_evals=100) is reached.
  5. The results are printed to the console: one row per evaluation, with the suggested hyperparameters (p: columns), the objective value, the job ID, and timing metadata (m: columns). The best configuration is the row with the highest objective.
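To make the sample, evaluate and record loop of steps 3 to 5 concrete, here is a deliberately simplified sketch. It is not CBO: it replaces the Bayesian surrogate with uniform random sampling and evaluates configurations serially instead of through an Evaluator. It samples the same search space as the HpProblem above and can serve as a random-search baseline to compare CBO results against; the names objective, sample and random_search are hypothetical.

```python
import random


def objective(params):
    """Quickstart objective, taking a plain dict instead of a job object."""
    if params["function"] == "linear":
        return params["x"] + params["b"]
    return params["x"] ** 3 + params["b"]  # "cubic"


def sample(rng):
    """Draw one configuration from the same space as the HpProblem above."""
    return {
        "x": rng.uniform(-10.0, 10.0),                 # real in [-10.0, 10.0]
        "b": rng.randint(0, 10),                       # integer in [0, 10], inclusive
        "function": rng.choice(["linear", "cubic"]),   # categorical
    }


def random_search(max_evals, seed=42):
    """Sample, evaluate and record configurations one at a time."""
    rng = random.Random(seed)
    history = []
    for job_id in range(max_evals):
        params = sample(rng)
        history.append({**params, "objective": objective(params), "job_id": job_id})
    return history


history = random_search(max_evals=100)
best = max(history, key=lambda row: row["objective"])
print(best)
```

Random search needs no model of past results, which keeps the sketch short; CBO instead uses the history to decide where to sample next, which is why it concentrates its later evaluations near x = 10 in the table above.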

How do I learn more?

Contributions

Find the list of contributors on the DeepHyper Authors page of the Documentation.

Citing DeepHyper

If you wish to cite the Software, please use the following:

@misc{deephyper_software,
    title = {"DeepHyper: A Python Package for Scalable Neural Architecture and Hyperparameter Search"},
    author = {Balaprakash, Prasanna and Egele, Romain and Salim, Misha and Maulik, Romit and Vishwanath, Venkat and Wild, Stefan and others},
    organization = {DeepHyper Team},
    year = 2018,
    url = {https://github.com/deephyper/deephyper}
} 

Find all our publications on the Research & Publication page of the Documentation.

How can I participate?

Questions, comments, feature requests, bug reports, etc. can be directed to:

  • Issues on GitHub

Patches through pull requests are much appreciated, for the software itself as well as for the documentation. Optionally, please include in your first patch a credit for yourself in the list of contributors.

The DeepHyper Team uses git-flow to organize development (see the Git-Flow cheatsheet) and Pytest for testing.

Acknowledgments

  • Scalable Data-Efficient Learning for Scientific Domains, U.S. Department of Energy 2018 Early Career Award funded by the Advanced Scientific Computing Research program within the DOE Office of Science (2018--Present)
  • Argonne Leadership Computing Facility: This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.
  • SLIK-D: Scalable Machine Learning Infrastructures for Knowledge Discovery, Argonne Computing, Environment and Life Sciences (CELS) Laboratory Directed Research and Development (LDRD) Program (2016--2018)

Copyright and license

Copyright © 2019, UChicago Argonne, LLC

DeepHyper is distributed under the terms of the BSD License. See LICENSE.

Argonne Patent & Intellectual Property File Number: SF-19-007

deephyper's People

Contributors

albertkklam, aperezdieguez, bethanyl, bigwater, boneyag, craymichael, deathn0t, dipendra009, felixeperez, felker, iamyixuan, jgouneau, jtchilders, masalim2, mdorier, nesar, pbalapra, rmjcs2020, romit-maulik, saforem2, sande33p, sjiang87, sunhaozhe, thchang, wildsm, z223i


deephyper's Issues

[BUG] Gaussian Process broken when using ConfigSpace AMBS with custom DH-skopt

Describe the bug

Gaussian Process broken when using ConfigSpace AMBS with custom DH-skopt

To Reproduce

Just choose "GP" as base_estimator in AMBS.

Expected behavior

Raising an exception that some sampled points are not in accepted space.


Desktop:

  • OS: macOS
  • Python version: 3.7.4
  • DeepHyper Version 0.1.11

[BUG] Typo in NAS tutorial

Describe the bug
The NAS tutorial at https://deephyper.readthedocs.io/en/latest/tutorials/nas.html gives the command deephyper nas random --evaluator ray --problem nas_polynome2.problem.Problem, which fails because the module nas_polynome2 cannot be found.

I think it should be deephyper nas random --evaluator ray --problem nas_problems.polynome2.problem.Problem

Screenshots

Run of deephyper nas random --evaluator ray --problem nas_polynome2.problem.Problem

(dh-env) [jkoo@blueslogin1 polynome2]$ deephyper nas random --evaluator ray --problem nas_polynome2.problem.Problem
Uncaught exception <class 'deephyper.core.exceptions.loading.GenericLoaderError'>: Traceback (most recent call last):
  File "/gpfs/fs1/home/jkoo/github/deephyper/deephyper/search/util.py", line 164, in generic_loader
    attr = load_attr_from(target)
  File "/gpfs/fs1/home/jkoo/github/deephyper/deephyper/search/util.py", line 124, in load_attr_from
    module = importlib.import_module(str_module)
  File "/home/jkoo/.conda/envs/dh-env/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'nas_polynome2'

The attribute 'Problem' cannot be importe from 'nas_polynome2.problem.Problem'.Traceback (most recent call last):
  File "/gpfs/fs1/home/jkoo/github/deephyper/deephyper/search/util.py", line 164, in generic_loader
    attr = load_attr_from(target)
  File "/gpfs/fs1/home/jkoo/github/deephyper/deephyper/search/util.py", line 124, in load_attr_from
    module = importlib.import_module(str_module)
  File "/home/jkoo/.conda/envs/dh-env/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'nas_polynome2'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jkoo/.conda/envs/dh-env/bin/deephyper", line 33, in <module>
    sys.exit(load_entry_point('deephyper', 'console_scripts', 'deephyper')())
  File "/gpfs/fs1/home/jkoo/github/deephyper/deephyper/core/cli/cli.py", line 42, in main
    args.func(**vars(args))
  File "/gpfs/fs1/home/jkoo/github/deephyper/deephyper/core/cli/nas.py", line 37, in main
    search_obj = search_cls(**kwargs)
  File "/gpfs/fs1/home/jkoo/github/deephyper/deephyper/search/nas/full_random.py", line 24, in __init__
    super().__init__(problem=problem, run=run, evaluator=evaluator, **kwargs)
  File "/gpfs/fs1/home/jkoo/github/deephyper/deephyper/search/nas/__init__.py", line 23, in __init__
    super().__init__(problem, run=run, evaluator=evaluator, **kwargs)
  File "/gpfs/fs1/home/jkoo/github/deephyper/deephyper/search/search.py", line 52, in __init__
    self.problem = util.generic_loader(problem, "Problem")
  File "/gpfs/fs1/home/jkoo/github/deephyper/deephyper/search/util.py", line 168, in generic_loader
    raise GenericLoaderError(target, attribute, trace_source)
deephyper.core.exceptions.loading.GenericLoaderError: Traceback (most recent call last):
  File "/gpfs/fs1/home/jkoo/github/deephyper/deephyper/search/util.py", line 164, in generic_loader
    attr = load_attr_from(target)
  File "/gpfs/fs1/home/jkoo/github/deephyper/deephyper/search/util.py", line 124, in load_attr_from
    module = importlib.import_module(str_module)
  File "/home/jkoo/.conda/envs/dh-env/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'nas_polynome2'

The attribute 'Problem' cannot be importe from 'nas_polynome2.problem.Problem'.
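The traceback shows how the --problem value is resolved: load_attr_from passes the module part of the dotted path to importlib.import_module, so the whole path must be importable from the Python path (sys.path). The sketch below is a hypothetical stand-alone loader written for illustration, not DeepHyper's actual load_attr_from; it reproduces the same ModuleNotFoundError for the tutorial's path, while a path whose top-level package is importable (such as nas_problems in the corrected command) resolves normally.

```python
import importlib
import os.path


def load_attr(dotted_path):
    """Hypothetical sketch: resolve 'package.module.Attribute' to an object.

    Everything before the last dot is imported as a module; the last
    component is then read from that module as an attribute.
    """
    module_name, _, attr_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.attribute', got {dotted_path!r}")
    module = importlib.import_module(module_name)  # may raise ModuleNotFoundError
    return getattr(module, attr_name)  # may raise AttributeError


# A path importable from sys.path resolves to the object itself.
print(load_attr("os.path.join") is os.path.join)  # True

# The tutorial's path fails at the import step, as in the traceback above.
try:
    load_attr("nas_polynome2.problem.Problem")
except ModuleNotFoundError as exc:
    print(exc)  # No module named 'nas_polynome2'
```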

Run of deephyper nas random --evaluator ray --problem nas_problems.polynome2.problem.Problem

deephyper nas random --evaluator ray --problem nas_problems.polynome2.problem.Problem
 *************************************************************************************
   Maximizing the return value of function: deephyper.search.nas.model.run.alpha.run
 *************************************************************************************
2020-09-01 12:42:44,724 INFO resource_spec.py:231 -- Starting Ray with 33.35 GiB memory available for workers and up to 16.69 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-09-01 12:42:47,575 INFO services.py:1193 -- View the Ray dashboard at localhost:8265
This search doesn't have an exiting procedure...
(pid=3581) train_X shape: (8000, 10)
(pid=3581) train_y shape: (8000, 1)
(pid=3581) valid_X shape: (2000, 10)
(pid=3581) valid_y shape: (2000, 1)
(pid=3581) WARNING:tensorflow:From /home/jkoo/.conda/envs/dh-env/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
(pid=3581) Instructions for updating:
(pid=3581) If using Keras pass *_constraint arguments to layers.
(pid=3581) WARNING:tensorflow:From /gpfs/fs1/home/jkoo/github/deephyper/deephyper/search/nas/model/trainer/train_valid.py:28: The name tf.keras.backend.get_session is deprecated. Please use tf.compat.v1.keras.backend.get_session instead.
(pid=3581)
(pid=3581) WARNING:tensorflow:OMP_NUM_THREADS is no longer used by the default Keras config. To configure the number of threads, use tf.config.threading APIs.
(pid=3581) 2020-09-01 12:43:00.710118: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7:/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin:/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin:/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib:/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mpi-2017.3-dfphq6kavje2olnichisvjjndtridrok/compilers_and_libraries_2017.4.196/linux/mpi/mic/lib:/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mpi-2017.3-dfphq6kavje2olnichisvjjndtridrok/compilers_and_libraries_2017.4.196/linux/mpi/intel64/lib:/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4:/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64:/soft/mpich3/intel-13.1/lib:/soft/mpich3/intel-13.1/lib
(pid=3581) 2020-09-01 12:43:00.710704: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
(pid=3581) 2020-09-01 12:43:00.710786: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (blueslogin1): /proc/driver/nvidia/version does not exist
(pid=3581) 2020-09-01 12:43:00.824195: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2568190000Hz
(pid=3581) 2020-09-01 12:43:00.826721: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x557fe00be950 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
(pid=3581) 2020-09-01 12:43:00.826758: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
(pid=3581) WARNING:tensorflow:Expected a shuffled dataset but input dataset `x` is not shuffled. Please invoke `shuffle()`on input dataset.
(pid=3581) Train on 250 steps, validate on 63 steps
(pid=3581) Epoch 1/20
  1/250 [..............................] - ETA: 2:19 - loss: 16.1466 - r2: -13.2896
 26/250 [==>...........................] - ETA: 5s - loss: 2.7257 - r2: -1.8484
 47/250 [====>.........................] - ETA: 2s - loss: 1.9577 - r2: -1.0532
 70/250 [=======>......................] - ETA: 1s - loss: 1.6224 - r2: -0.6890
 93/250 [==========>...................] - ETA: 1s - loss: 1.4246 - r2: -0.4893
115/250 [============>.................] - ETA: 0s - loss: 1.2935 - r2: -0.3564
140/250 [===============>..............] - ETA: 0s - loss: 1.1997 - r2: -0.2524
166/250 [==================>...........] - ETA: 0s - loss: 1.1169 - r2: -0.1614
190/250 [=====================>........] - ETA: 0s - loss: 1.0597 - r2: -0.1005
218/250 [=========================>....] - ETA: 0s - loss: 0.9931 - r2: -0.0333
249/250 [============================>.] - ETA: 0s - loss: 0.9348 - r2: 0.0317
250/250 [==============================] - 1s 5ms/step - loss: 0.9325 - r2: 0.0336 - val_loss: 0.4275 - val_r2: 0.5657
(pid=3581) Epoch 2/20
  1/250 [..............................] - ETA: 0s - loss: 0.3700 - r2: 0.6726
 28/250 [==>...........................] - ETA: 0s - loss: 0.3953 - r2: 0.5612
 56/250 [=====>........................] - ETA: 0s - loss: 0.3604 - r2: 0.6124
 85/250 [=========>....................] - ETA: 0s - loss: 0.3396 - r2: 0.6378
116/250 [============>.................] - ETA: 0s - loss: 0.3203 - r2: 0.6562
144/250 [================>.............] - ETA: 0s - loss: 0.3007 - r2: 0.6792

Desktop:

  • OS: Linux
  • System: LCRC
  • Python version 3.7
  • DeepHyper Version 0.1.11

[BUG] Installation fails when OPEN-MPI is not installed

Describe the bug

Stacktrace:

Failed to build mpi4py horovod
ERROR: ray 0.8.5 has requirement numpy>=1.16, but you'll have numpy 1.15.4 which is incompatible.
ERROR: tensorboard 1.15.0 has requirement setuptools>=41.0.0, but you'll have setuptools 40.2.0 which is incompatible.
ERROR: tensorflow 1.15.2 has requirement numpy<2.0,>=1.16.0, but you'll have numpy 1.15.4 which is incompatible.
ERROR: keras 2.4.0 has requirement tensorflow>=2.2.0, but you'll have tensorflow 1.15.2 which is incompatible.
Installing collected packages: async-timeout, multidict, yarl, aiohttp, msgpack, py-spy, protobuf, google, redis, grpcio, ray, mpi4py, deap, horovod, pyaml, dh-scikit-optimize, termcolor, markdown, absl-py, tensorboard, gast, tensorflow-estimator, keras-applications, wrapt, astor, google-pasta, keras-preprocessing, opt-einsum, tensorflow, pydot, psycopg2-binary, sphinx-rtd-theme, asgiref, sqlparse, django, balsam-flow, ConfigSpace, keras, deephyper
  Attempting uninstall: msgpack
    Found existing installation: msgpack 0.5.6
    Uninstalling msgpack-0.5.6:
      Successfully uninstalled msgpack-0.5.6
    Running setup.py install for mpi4py ... error
    ERROR: Command errored out with exit status 1:
     command: /anaconda3/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/l2/7nm8thsj2kz3cwsrl9njdz6r0000gn/T/pip-install-ckllhusd/mpi4py/setup.py'"'"'; __file__='"'"'/private/var/folders/l2/7nm8thsj2kz3cwsrl9njdz6r0000gn/T/pip-install-ckllhusd/mpi4py/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/l2/7nm8thsj2kz3cwsrl9njdz6r0000gn/T/pip-record-xenpo_c8/install-record.txt --single-version-externally-managed --compile --install-headers /anaconda3/include/python3.7m/mpi4py
         cwd: /private/var/folders/l2/7nm8thsj2kz3cwsrl9njdz6r0000gn/T/pip-install-ckllhusd/mpi4py/
    Complete output (126 lines):
    running install
    running build
    running build_src
    running build_py
    creating build
    creating build/lib.macosx-10.7-x86_64-3.7
    creating build/lib.macosx-10.7-x86_64-3.7/mpi4py
    copying src/mpi4py/run.py -> build/lib.macosx-10.7-x86_64-3.7/mpi4py
    copying src/mpi4py/__init__.py -> build/lib.macosx-10.7-x86_64-3.7/mpi4py
    copying src/mpi4py/bench.py -> build/lib.macosx-10.7-x86_64-3.7/mpi4py
    copying src/mpi4py/__main__.py -> build/lib.macosx-10.7-x86_64-3.7/mpi4py
    creating build/lib.macosx-10.7-x86_64-3.7/mpi4py/futures
    copying src/mpi4py/futures/_base.py -> build/lib.macosx-10.7-x86_64-3.7/mpi4py/futures
    copying src/mpi4py/futures/server.py -> build/lib.macosx-10.7-x86_64-3.7/mpi4py/futures
    copying src/mpi4py/futures/__init__.py -> build/lib.macosx-10.7-x86_64-3.7/mpi4py/futures
    copying src/mpi4py/futures/pool.py -> build/lib.macosx-10.7-x86_64-3.7/mpi4py/futures
    copying src/mpi4py/futures/aplus.py -> build/lib.macosx-10.7-x86_64-3.7/mpi4py/futures
    copying src/mpi4py/futures/__main__.py -> build/lib.macosx-10.7-x86_64-3.7/mpi4py/futures
    copying src/mpi4py/futures/_lib.py -> build/lib.macosx-10.7-x86_64-3.7/mpi4py/futures
    copying src/mpi4py/__init__.pxd -> build/lib.macosx-10.7-x86_64-3.7/mpi4py
    copying src/mpi4py/libmpi.pxd -> build/lib.macosx-10.7-x86_64-3.7/mpi4py
    copying src/mpi4py/MPI.pxd -> build/lib.macosx-10.7-x86_64-3.7/mpi4py
    creating build/lib.macosx-10.7-x86_64-3.7/mpi4py/include
    creating build/lib.macosx-10.7-x86_64-3.7/mpi4py/include/mpi4py
    copying src/mpi4py/include/mpi4py/mpi4py.MPI.h -> build/lib.macosx-10.7-x86_64-3.7/mpi4py/include/mpi4py
    copying src/mpi4py/include/mpi4py/mpi4py.MPI_api.h -> build/lib.macosx-10.7-x86_64-3.7/mpi4py/include/mpi4py
    copying src/mpi4py/include/mpi4py/mpi4py.h -> build/lib.macosx-10.7-x86_64-3.7/mpi4py/include/mpi4py
    copying src/mpi4py/include/mpi4py/mpi4py.i -> build/lib.macosx-10.7-x86_64-3.7/mpi4py/include/mpi4py
    copying src/mpi4py/include/mpi4py/mpi.pxi -> build/lib.macosx-10.7-x86_64-3.7/mpi4py/include/mpi4py
    running build_clib
    MPI configuration: [mpi] from 'mpi.cfg'
    checking for library 'lmpe' ...
    gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -c _configtest.c -o _configtest.o
    gcc -flat_namespace -undefined suppress -arch x86_64 _configtest.o -llmpe -o _configtest
    ld: library not found for -llmpe
    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    failure.
    removing: _configtest.c _configtest.o
    building 'mpe' dylib library
    creating build/temp.macosx-10.7-x86_64-3.7
    creating build/temp.macosx-10.7-x86_64-3.7/src
    creating build/temp.macosx-10.7-x86_64-3.7/src/lib-pmpi
    gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -c src/lib-pmpi/mpe.c -o build/temp.macosx-10.7-x86_64-3.7/src/lib-pmpi/mpe.o
    creating build/lib.macosx-10.7-x86_64-3.7/mpi4py/lib-pmpi
    gcc -shared -undefined dynamic_lookup -L/anaconda3/lib -arch x86_64 -L/anaconda3/lib -arch x86_64 -arch x86_64 -install_name libmpe.dylib build/temp.macosx-10.7-x86_64-3.7/src/lib-pmpi/mpe.o -o build/lib.macosx-10.7-x86_64-3.7/mpi4py/lib-pmpi/libmpe.dylib
    checking for library 'vt-mpi' ...
    gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -c _configtest.c -o _configtest.o
    gcc -flat_namespace -undefined suppress -arch x86_64 _configtest.o -lvt-mpi -o _configtest
    ld: library not found for -lvt-mpi
    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    failure.
    removing: _configtest.c _configtest.o
    checking for library 'vt.mpi' ...
    gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -c _configtest.c -o _configtest.o
    gcc -flat_namespace -undefined suppress -arch x86_64 _configtest.o -lvt.mpi -o _configtest
    ld: library not found for -lvt.mpi
    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    failure.
    removing: _configtest.c _configtest.o
    building 'vt' dylib library
    gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -c src/lib-pmpi/vt.c -o build/temp.macosx-10.7-x86_64-3.7/src/lib-pmpi/vt.o
    gcc -shared -undefined dynamic_lookup -L/anaconda3/lib -arch x86_64 -L/anaconda3/lib -arch x86_64 -arch x86_64 -install_name libvt.dylib build/temp.macosx-10.7-x86_64-3.7/src/lib-pmpi/vt.o -o build/lib.macosx-10.7-x86_64-3.7/mpi4py/lib-pmpi/libvt.dylib
    checking for library 'vt-mpi' ...
    gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -c _configtest.c -o _configtest.o
    gcc -flat_namespace -undefined suppress -arch x86_64 _configtest.o -lvt-mpi -o _configtest
    ld: library not found for -lvt-mpi
    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    failure.
    removing: _configtest.c _configtest.o
    checking for library 'vt.mpi' ...
    gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -c _configtest.c -o _configtest.o
    gcc -flat_namespace -undefined suppress -arch x86_64 _configtest.o -lvt.mpi -o _configtest
    ld: library not found for -lvt.mpi
    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    failure.
    removing: _configtest.c _configtest.o
    building 'vt-mpi' dylib library
    gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -c src/lib-pmpi/vt-mpi.c -o build/temp.macosx-10.7-x86_64-3.7/src/lib-pmpi/vt-mpi.o
    gcc -shared -undefined dynamic_lookup -L/anaconda3/lib -arch x86_64 -L/anaconda3/lib -arch x86_64 -arch x86_64 -install_name libvt-mpi.dylib build/temp.macosx-10.7-x86_64-3.7/src/lib-pmpi/vt-mpi.o -o build/lib.macosx-10.7-x86_64-3.7/mpi4py/lib-pmpi/libvt-mpi.dylib
    checking for library 'vt-hyb' ...
    gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -c _configtest.c -o _configtest.o
    gcc -flat_namespace -undefined suppress -arch x86_64 _configtest.o -lvt-hyb -o _configtest
    ld: library not found for -lvt-hyb
    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    failure.
    removing: _configtest.c _configtest.o
    checking for library 'vt.ompi' ...
    gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -c _configtest.c -o _configtest.o
    gcc -flat_namespace -undefined suppress -arch x86_64 _configtest.o -lvt.ompi -o _configtest
    ld: library not found for -lvt.ompi
    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    failure.
    removing: _configtest.c _configtest.o
    building 'vt-hyb' dylib library
    gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -c src/lib-pmpi/vt-hyb.c -o build/temp.macosx-10.7-x86_64-3.7/src/lib-pmpi/vt-hyb.o
    gcc -shared -undefined dynamic_lookup -L/anaconda3/lib -arch x86_64 -L/anaconda3/lib -arch x86_64 -arch x86_64 -install_name libvt-hyb.dylib build/temp.macosx-10.7-x86_64-3.7/src/lib-pmpi/vt-hyb.o -o build/lib.macosx-10.7-x86_64-3.7/mpi4py/lib-pmpi/libvt-hyb.dylib
    running build_ext
    MPI configuration: [mpi] from 'mpi.cfg'
    checking for dlopen() availability ...
    checking for header 'dlfcn.h' ...
    gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -I/anaconda3/include/python3.7m -c _configtest.c -o _configtest.o
    success!
    removing: _configtest.c _configtest.o
    success!
    checking for library 'dl' ...
    gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -I/anaconda3/include/python3.7m -c _configtest.c -o _configtest.o
    gcc -flat_namespace -undefined suppress -arch x86_64 _configtest.o -Lbuild/temp.macosx-10.7-x86_64-3.7 -ldl -o _configtest
    success!
    removing: _configtest.c _configtest.o _configtest
    checking for function 'dlopen' ...
    gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -I/anaconda3/include/python3.7m -c _configtest.c -o _configtest.o
    gcc -arch x86_64 _configtest.o -Lbuild/temp.macosx-10.7-x86_64-3.7 -ldl -o _configtest
    success!
    removing: _configtest.c _configtest.o _configtest
    building 'mpi4py.dl' extension
    gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -DHAVE_DLFCN_H=1 -DHAVE_DLOPEN=1 -I/anaconda3/include/python3.7m -c src/dynload.c -o build/temp.macosx-10.7-x86_64-3.7/src/dynload.o
    gcc -bundle -undefined dynamic_lookup -L/anaconda3/lib -arch x86_64 -L/anaconda3/lib -arch x86_64 -arch x86_64 build/temp.macosx-10.7-x86_64-3.7/src/dynload.o -Lbuild/temp.macosx-10.7-x86_64-3.7 -ldl -o build/lib.macosx-10.7-x86_64-3.7/mpi4py/dl.cpython-37m-darwin.so
    checking for MPI compile and link ...
    gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -I/anaconda3/include/python3.7m -c _configtest.c -o _configtest.o
    _configtest.c:2:10: fatal error: 'mpi.h' file not found
    #include <mpi.h>
             ^~~~~~~
    1 error generated.
    failure.
    removing: _configtest.c _configtest.o
    error: Cannot compile MPI programs. Check your configuration!!!
    ----------------------------------------
ERROR: Command errored out with exit status 1: /anaconda3/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/l2/7nm8thsj2kz3cwsrl9njdz6r0000gn/T/pip-install-ckllhusd/mpi4py/setup.py'"'"'; __file__='"'"'/private/var/folders/l2/7nm8thsj2kz3cwsrl9njdz6r0000gn/T/pip-install-ckllhusd/mpi4py/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/l2/7nm8thsj2kz3cwsrl9njdz6r0000gn/T/pip-record-xenpo_c8/install-record.txt --single-version-externally-managed --compile --install-headers /anaconda3/include/python3.7m/mpi4py Check the logs for full command output.

To Reproduce

pip install deephyper

Expected behavior

The package installs without any issue.

Desktop (please complete the following information):

  • OS: MacOS
  • DeepHyper Version 0.1.11
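The log above ends with mpi4py's configure step failing on `'mpi.h' file not found`, which means no MPI implementation (headers plus compiler wrapper) was visible to the build. The following is a minimal preflight sketch that assumes the only question is whether an MPI compiler wrapper can be found. mpi4py's source build honors the `MPICC` environment variable and otherwise typically relies on `mpicc`. The helper name `find_mpi_compiler` is hypothetical.

```python
import os
import shutil


def find_mpi_compiler(env=None, which=shutil.which):
    """Return the MPI C compiler wrapper a source build would likely use, or None.

    Checks the MPICC environment variable first, then falls back to looking
    up ``mpicc`` on PATH. This is a heuristic preflight check, not a
    reimplementation of mpi4py's own configuration logic.
    """
    env = os.environ if env is None else env
    explicit = env.get("MPICC")
    if explicit:
        # MPICC may be a bare command name or an absolute path.
        return which(explicit) or (explicit if os.path.isfile(explicit) else None)
    return which("mpicc")


if __name__ == "__main__":
    compiler = find_mpi_compiler()
    if compiler is None:
        print("No MPI compiler wrapper found: install an MPI implementation "
              "(e.g. Open MPI or MPICH) before running pip install.")
    else:
        print(f"MPI compiler wrapper: {compiler}")
```

If it prints a path, `env MPICC=<that path> pip install mpi4py` is the usual way to point the build at it.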

Warning with deap package when running ambs search

When I start an AMBS search locally with the following command:

python -m deephyper.search.hps.ambs --problem deephyper.benchmark.hps.polynome2.Problem --run deephyper.benchmark.hps.polynome2.run

This warning is raised in the terminal:

/anaconda3/envs/deeptest/lib/python3.6/site-packages/deap/tools/_hypervolume/pyhv.py:33: ImportWarning: Falling back to the python version of hypervolume module. Expect this to be very slow.
  "module. Expect this to be very slow.", ImportWarning)

My current environment is:

Package             Version   
------------------- ----------
absl-py             0.7.0     
astor               0.7.1     
certifi             2018.11.29
chardet             3.0.4     
deap                1.2.2     
decorator           4.3.0     
deephyper           0.0.1     
future              0.17.1    
gast                0.2.2     
grpcio              1.18.0    
gym                 0.10.9    
h5py                2.9.0     
idna                2.8       
joblib              0.13.1    
Keras               2.2.4     
Keras-Applications  1.0.6     
Keras-Preprocessing 1.0.5     
Markdown            3.0.1     
mpi4py              3.0.0     
networkx            2.2       
numpy               1.16.0    
pip                 18.1      
protobuf            3.6.1     
pyglet              1.3.2     
PyYAML              3.13      
requests            2.21.0    
scikit-learn        0.20.2    
scikit-optimize     0.5.2     
scipy               1.2.0     
setuptools          40.6.3    
six                 1.12.0    
tensorboard         1.12.2    
tensorflow          1.12.0    
termcolor           1.1.0     
tqdm                4.29.1    
urllib3             1.24.1    
Werkzeug            0.14.1    
wheel               0.32.3    
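The warning comes from deap. Its hypervolume module falls back to a pure-Python implementation when the compiled extension cannot be imported. If the slower fallback is acceptable, the following minimal sketch silences only this one warning. It should be called before deephyper or deap is imported, and the helper name is hypothetical.

```python
import warnings


def silence_deap_hypervolume_warning():
    """Ignore deap's 'pure-Python hypervolume' ImportWarning, and nothing else."""
    # `message` is a regular expression matched against the start of the
    # warning text; filterwarnings() inserts the filter at the front of the
    # list, so it takes precedence over broader filters added earlier.
    warnings.filterwarnings(
        "ignore",
        message=r"Falling back to the python version of hypervolume",
        category=ImportWarning,
    )


silence_deap_hypervolume_warning()
# import deephyper  # import only after the filter is installed
```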

[Doc] Best practices documentation for scaling

It would be very valuable to have best-practices documentation on how to scale a problem on large-scale systems with DeepHyper. Which scales should one evaluate first, for example 8, 16, then 32 nodes? When is one ready to scale out further, and which parameters are involved?
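One common way to frame the question is a doubling sweep. Run the same search at 8, 16, then 32 nodes, and compare measured throughput (for example, completed evaluations per hour) against the smallest run. The minimal sketch below assumes throughput is the quantity you care about. The function name is hypothetical and this is not a DeepHyper feature.

```python
def scaling_efficiency(throughput_by_nodes):
    """Relative parallel efficiency of each run versus the smallest node count.

    `throughput_by_nodes` maps a node count to a measured throughput, e.g.
    completed evaluations per hour. Efficiency 1.0 means perfect linear
    scaling relative to the smallest run; lower values mean each added node
    contributes less.
    """
    base_nodes = min(throughput_by_nodes)
    base_rate = throughput_by_nodes[base_nodes]
    result = {}
    for nodes, rate in sorted(throughput_by_nodes.items()):
        speedup = rate / base_rate
        result[nodes] = speedup / (nodes / base_nodes)
    return result
```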

[BUG] Bad address error for python in miniconda-3 environment on Theta

Describe the bug

I tried running a hyperparameter search job on ALCF's Theta using DeepHyper. I followed the Theta installation instructions (https://deephyper.readthedocs.io/en/latest/install/theta.html) and the tutorial (https://deephyper.readthedocs.io/en/latest/tutorials/hps.html), and I adapted the instructions for my optimization problem.

The run failed. The contents of the log file serial-launcher_2020-08-26_022719.log are as follows:

cat serial-launcher_2020-08-26_022719.log
26-Aug-2020 02:27:20|80434|    INFO|balsam.core.models:292] Job source filtering for un-scheduled BalsamJobs
26-Aug-2020 02:27:20|80434| INFO|balsam.launcher.mpi_ensemble:56] Not scheduled by Balsam service
26-Aug-2020 02:27:20|80434| INFO|balsam.launcher.mpi_ensemble:57] Assigning jobs to 255 worker ranks
26-Aug-2020 02:27:20|80434| INFO|balsam.launcher.mpi_ensemble:274] MPI Ensemble pulling jobs with WF currentopt
26-Aug-2020 02:27:20|80434| INFO|balsam.launcher.mpi_ensemble:125] Acquired lock on 1 out of 1 jobs marked for running
26-Aug-2020 02:27:20|80434| INFO|balsam.launcher.mpi_ensemble:139] Sent 1 jobs to rank 1: occupancy is now 1.0
26-Aug-2020 02:27:20|80451| INFO|balsam.launcher.mpi_ensemble:425] rank 1 [currentopt | 5d081b10]  Popen (shell=False): ['/lus/theta-fs0/projects/ATPESC2020/valetov/deephyper/dh-env/bin/python', '-m', 'deephyper.search.hps.ambs', '--evaluator', 'balsam', '--problem', 'g4blopt.currentopt.problem.Problem', '--run', 'g4blopt.currentopt.model_run.run']
26-Aug-2020 02:27:20|80451| ERROR|balsam.launcher.mpi_ensemble:435] rank 1 [currentopt | 5d081b10] Popen error: [Errno 14] Bad address: '/lus/theta-fs0/projects/ATPESC2020/valetov/deephyper/dh-env/bin/python'

26-Aug-2020 02:27:24|80451| ERROR|balsam.launcher.mpi_ensemble:380] rank 1 [currentopt | 5d081b10] nonzero return 12345:

26-Aug-2020 02:27:24|80451| ERROR|balsam.launcher.mpi_ensemble:391] rank 1 [currentopt | 5d081b10] can retry task (err occured after 3.41 sec; attempt 1/3)
26-Aug-2020 02:27:24|80451| INFO|balsam.launcher.mpi_ensemble:425] rank 1 [currentopt | 5d081b10]  Popen (shell=False): ['/lus/theta-fs0/projects/ATPESC2020/valetov/deephyper/dh-env/bin/python', '-m', 'deephyper.search.hps.ambs', '--evaluator', 'balsam', '--problem', 'g4blopt.currentopt.problem.Problem', '--run', 'g4blopt.currentopt.model_run.run']
26-Aug-2020 02:27:25|80451| ERROR|balsam.launcher.mpi_ensemble:380] rank 1 [currentopt | 5d081b10] nonzero return -11:

26-Aug-2020 02:27:25|80451| ERROR|balsam.launcher.mpi_ensemble:391] rank 1 [currentopt | 5d081b10] can retry task (err occured after 1.12 sec; attempt 2/3)
26-Aug-2020 02:27:25|80451| INFO|balsam.launcher.mpi_ensemble:425] rank 1 [currentopt | 5d081b10]  Popen (shell=False): ['/lus/theta-fs0/projects/ATPESC2020/valetov/deephyper/dh-env/bin/python', '-m', 'deephyper.search.hps.ambs', '--evaluator', 'balsam', '--problem', 'g4blopt.currentopt.problem.Problem', '--run', 'g4blopt.currentopt.model_run.run']
26-Aug-2020 02:27:25|80451| ERROR|balsam.launcher.mpi_ensemble:435] rank 1 [currentopt | 5d081b10] Popen error: [Errno 14] Bad address: '/lus/theta-fs0/projects/ATPESC2020/valetov/deephyper/dh-env/bin/python'

26-Aug-2020 02:27:28|80451| ERROR|balsam.launcher.mpi_ensemble:380] rank 1 [currentopt | 5d081b10] nonzero return 12345:

26-Aug-2020 02:27:28|80451| ERROR|balsam.launcher.mpi_ensemble:391] rank 1 [currentopt | 5d081b10] can retry task (err occured after 3.35 sec; attempt 3/3)
26-Aug-2020 02:27:28|80451| INFO|balsam.launcher.mpi_ensemble:425] rank 1 [currentopt | 5d081b10]  Popen (shell=False): ['/lus/theta-fs0/projects/ATPESC2020/valetov/deephyper/dh-env/bin/python', '-m', 'deephyper.search.hps.ambs', '--evaluator', 'balsam', '--problem', 'g4blopt.currentopt.problem.Problem', '--run', 'g4blopt.currentopt.model_run.run']
26-Aug-2020 02:27:28|80451| ERROR|balsam.launcher.mpi_ensemble:435] rank 1 [currentopt | 5d081b10] Popen error: [Errno 14] Bad address: '/lus/theta-fs0/projects/ATPESC2020/valetov/deephyper/dh-env/bin/python'

26-Aug-2020 02:27:30|80451| ERROR|balsam.launcher.mpi_ensemble:380] rank 1 [currentopt | 5d081b10] nonzero return 12345:

26-Aug-2020 02:27:54|80434| INFO|balsam.launcher.mpi_ensemble:306] Nothing to do for 20.0 seconds: quitting
26-Aug-2020 02:27:55|80434| INFO|balsam.launcher.mpi_ensemble:288] Shutting down with 0 jobs still running..timing out
26-Aug-2020 02:27:55|80434| INFO|balsam.launcher.mpi_ensemble:293] master calling MPI Finalize
26-Aug-2020 02:27:55|80434| INFO|balsam.launcher.mpi_ensemble:295] ensemble master exit gracefully

I checked that /lus/theta-fs0/projects/ATPESC2020/valetov/deephyper/dh-env/bin/python does exist, and I was able to run this Python executable from a login node.

To Reproduce
Steps to reproduce the behavior:

Submit a hyperparameter search on ALCF's Theta using a command like the following:

deephyper balsam-submit hps currentopt -p g4blopt.currentopt.problem.Problem -r g4blopt.currentopt.model_run.run -t 360 -q default -n 256 -A ATPESC2020 -j serial

I think that the essential parts are deephyper balsam-submit hps, -j serial, and a number of nodes (-n) larger than one.

Expected behavior
The hyperparameter search runs and completes successfully.

System:

  • OS: SUSE Linux Enterprise Server 15 SP1
  • System: Theta
  • Python version: 3.7.6
  • DeepHyper Version: 0.1.0-beta0
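On Linux, Errno 14 is EFAULT ("Bad address"), not ENOENT (errno 2, "No such file or directory"). A successful run from a login node therefore does not settle whether the same `Popen` call works where the launcher runs it. The following minimal diagnostic sketch assumes it is executed on a compute node, for example from inside the job script. The helper name `check_interpreter` is hypothetical.

```python
import errno
import os
import subprocess
import sys


def check_interpreter(path):
    """Report whether `path` exists, is executable, and can actually start Python."""
    report = {
        "exists": os.path.isfile(path),
        "executable": os.access(path, os.X_OK),
        "runs": False,
        "error": None,
    }
    try:
        # Same style of call as the Balsam launcher: an argument list, shell=False.
        completed = subprocess.run(
            [path, "-c", "import sys; print(sys.version)"],
            capture_output=True, text=True, timeout=60,
        )
        report["runs"] = completed.returncode == 0
    except OSError as exc:
        name = errno.errorcode.get(exc.errno, "?")
        report["error"] = f"[Errno {exc.errno}] {name}: {exc.strerror}"
    return report


if __name__ == "__main__":
    # Replace with the interpreter from the failing job, e.g. .../dh-env/bin/python
    print(check_interpreter(sys.executable))
```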

[BUG] Ray tasks are not distributed across the different threads of the same node

Describe the bug

From @ekourlit,

Could I use Ray to parallelize the HPS on a single machine? For example, if I switch back to CPU-only execution, can the different Ray tasks run in parallel on the different cores or threads of my CPU? At the moment, Ray creates 24 tasks (as many as my CPU threads), but only one is actually running; the rest are idle.

How to get the best model?

I have tried exactly the steps mentioned here for a NAS problem using regevo:

deephyper start-project nas_problems
cd nas_problems/nas_problems/
deephyper new-problem nas polynome2
cd nas_problems/nas_problems/polynome2
python load_data.py
python problem.py
deephyper nas regevo --evaluator ray --problem nas_problems.polynome2.problem.Problem --max-evals 100

After this, I get a deephyper.log file.

Now, how do I make predictions on new data? Where is the best model located?
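The following minimal sketch is a starting point for the second question. It scans a results file and returns the best-scoring row. It assumes the search wrote a CSV (for example `results.csv`) with an `objective` column to maximize, plus whatever columns describe the configuration (such as an architecture sequence for NAS). The file name, column names and helper name are assumptions to check against your DeepHyper version. Rebuilding and retraining the model described by that row is a separate step.

```python
import csv


def best_row(csv_path, objective="objective"):
    """Return the row (as a dict) with the largest numeric `objective` value."""
    best, best_score = None, float("-inf")
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            try:
                score = float(row[objective])
            except (KeyError, ValueError):
                continue  # skip rows without a usable objective (e.g. failures)
            if score > best_score:
                best, best_score = row, score
    return best
```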

Deprecation Warning: sklearn.externals.joblib

DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
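As the warning says, the fix is to import joblib directly (after `pip install joblib`) instead of through `sklearn.externals`. For code that must also run in older environments, a fallback import is one option. The following is a minimal sketch; the helper name `import_first` is hypothetical.

```python
import importlib


def import_first(*module_names):
    """Import and return the first module in `module_names` that is available."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(f"none of {module_names} could be imported")


# Prefer the standalone package; fall back to the deprecated vendored copy.
# joblib = import_first("joblib", "sklearn.externals.joblib")
```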

Use path to python scripts in NAS

Enable passing a path such as folder/problem.py to the --problem argument in NAS. Be careful with "local" imports that are not installed, because the --run function "alpha" will try to import this file.
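Here is a minimal sketch of how a --problem value could accept either a dotted path (`pkg.module.Problem`) or a file path (`folder/problem.py`, defaulting to an attribute named `Problem`). The helper name and the `.py` suffix rule are assumptions for illustration. Prepending the file's folder to `sys.path` is one way to keep "local" imports next to problem.py working.

```python
import importlib
import importlib.util
import os
import sys


def load_problem(spec, attr="Problem"):
    """Load a problem object from 'pkg.module.Attr' or from a 'folder/problem.py' path."""
    if spec.endswith(".py"):
        path = os.path.abspath(spec)
        folder = os.path.dirname(path)
        # Make sibling "local" modules importable from problem.py.
        if folder not in sys.path:
            sys.path.insert(0, folder)
        module_name = os.path.splitext(os.path.basename(path))[0]
        module_spec = importlib.util.spec_from_file_location(module_name, path)
        module = importlib.util.module_from_spec(module_spec)
        sys.modules[module_name] = module  # register before executing the file
        module_spec.loader.exec_module(module)
        return getattr(module, attr)
    # Dotted form: everything before the last dot is the module path.
    module_path, _, name = spec.rpartition(".")
    return getattr(importlib.import_module(module_path), name)
```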

Noisy warning while using the DeepHyper CLI

I quote @felker from #23

Still, curious that DeepHyper loads TensorFlow (and its noisy warnings) even for the --help dialog.

The problem is happening when using:

$ deephyper --help

Indeed, these warnings flood the shell and are very annoying; I am creating this issue so that we keep thinking about how to solve it.
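One common way to keep `--help` fast and quiet is to import TensorFlow only inside the subcommand that needs it. The following is a minimal argparse sketch. Its subcommand layout is hypothetical and much simpler than the real DeepHyper CLI. `TF_CPP_MIN_LOG_LEVEL` is TensorFlow's environment variable for filtering its C++ log messages, and it only takes effect if set before tensorflow is imported.

```python
import argparse
import os


def run_hps(args):
    # Heavy imports happen only when this subcommand actually runs.
    os.environ.setdefault("TF_CPP_MIN_LOG_LEVEL", "2")  # hide TF INFO and WARNING logs
    import tensorflow as tf  # noqa: F401  (stand-in for whatever the search needs)
    print(f"running HPS with problem={args.problem}")


def build_parser():
    parser = argparse.ArgumentParser(prog="deephyper")
    subparsers = parser.add_subparsers(dest="command")
    hps = subparsers.add_parser("hps", help="hyperparameter search")
    hps.add_argument("--problem", required=True)
    hps.set_defaults(func=run_hps)
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    if hasattr(args, "func"):
        args.func(args)
    else:
        build_parser().print_help()
```

With this layout, `deephyper --help` only builds the parser, so TensorFlow is never loaded.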

Installation error. Can't find the specified version of tensorflow.

I am using Python 3.8 with pip 20.0.2:
ERROR: Could not find a version that satisfies the requirement tensorflow<=1.15.2,>=1.13.1 (from deephyper==0.1.7) (from versions: 2.2.0rc1, 2.2.0rc2, 2.2.0rc3)
ERROR: No matching distribution found for tensorflow<=1.15.2,>=1.13.1 (from deephyper==0.1.7)
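The pip output lists only TensorFlow 2.2 release candidates for this interpreter. This is consistent with TensorFlow 1.15 having been published only for Python versions up to 3.7. The following minimal preflight sketch works under that assumption; the helper name and the version bound are illustrative.

```python
import sys


def supports_tensorflow_1x(version_info=sys.version_info):
    """True if this Python version had TensorFlow 1.x wheels (assumed: 3.7 or older)."""
    return tuple(version_info[:2]) <= (3, 7)


if __name__ == "__main__" and not supports_tensorflow_1x():
    print("deephyper==0.1.7 requires tensorflow<=1.15.2; use a Python 3.7 "
          "environment, e.g. conda create --name dh python=3.7")
```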

How to use AMBNAS algorithm?

Hi @Deathn0t

For regevo we can use deephyper nas regevo --problem nas_problems...................................................
For random we can use deephyper nas random --problem nas_problems...................................................

What is the command for using AMBNAS? Nothing is mentioned here. Is it still under development?

Is it something like deephyper nas ambs --problem nas_problems................................................... ?

When I run deephyper nas regevo --evaluator subprocess --problem nas_problems_train_final.polynome2.problem.Problem --max-evals 1, it works, but when I run deephyper nas ambs --evaluator subprocess --problem nas_problems_train_final.polynome2.problem.Problem --max-evals 1, I get the following error:

2021-04-15 05:07:27.656897: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/gurobi/linux64/lib/:/opt/gurobi/linux64/lib
2021-04-15 05:07:27.656939: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
 ************************************************************************
   Maximizing the return value of function: deephyper.nas.run.alpha.run
 ************************************************************************
train_X shape: (29327, 34)
train_y shape: (29327, 2)
valid_X shape: (12569, 34)
valid_y shape: (12569, 2)
2021-04-15 05:07:30.397783: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/gurobi/linux64/lib/:/opt/gurobi/linux64/lib
2021-04-15 05:07:30.397824: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
train_X shape: (29327, 34)
train_y shape: (29327, 2)
valid_X shape: (12569, 34)
valid_y shape: (12569, 2)
Uncaught exception <class 'AssertionError'>: Number of possible operations is: 2, but index given is: 13 (index starts from 0)!Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/deephyper/evaluator/runner.py", line 36, in <module>
    retval = func(d)
  File "/usr/local/lib/python3.6/dist-packages/deephyper/nas/run/alpha.py", line 65, in run
    search_space = setup_search_space(config, input_shape, output_shape, seed=seed)
  File "/usr/local/lib/python3.6/dist-packages/deephyper/nas/run/util.py", line 134, in setup_search_space
    search_space.set_ops(arch_seq)
  File "/usr/local/lib/python3.6/dist-packages/deephyper/nas/space/keras_search_space.py", line 112, in set_ops
    node.set_op(op_i)
  File "/usr/local/lib/python3.6/dist-packages/deephyper/nas/space/node.py", line 111, in set_op
    self.get_op(index).init(self)
  File "/usr/local/lib/python3.6/dist-packages/deephyper/nas/space/node.py", line 122, in get_op
    ), f"Number of possible operations is: {len(self._ops)}, but index given is: {index} (index starts from 0)!"
AssertionError: Number of possible operations is: 2, but index given is: 13 (index starts from 0)!
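The assertion says that the architecture sequence proposed by the search contains an operation index (13) for a variable node that has only 2 candidate operations. This suggests that the optimizer's encoding of the search space and the search space itself disagree. The following minimal sketch of that consistency check reports every offending position instead of failing on the first one. Its names are hypothetical and it is not DeepHyper's code.

```python
def invalid_positions(arch_seq, num_ops_per_node):
    """List (position, index, num_ops) for each out-of-range operation index.

    arch_seq[i] selects one of num_ops_per_node[i] operations for the i-th
    variable node; valid indices are 0 .. num_ops - 1.
    """
    if len(arch_seq) != len(num_ops_per_node):
        raise ValueError(
            f"arch_seq has {len(arch_seq)} entries but the space has "
            f"{len(num_ops_per_node)} variable nodes"
        )
    return [
        (i, index, n)
        for i, (index, n) in enumerate(zip(arch_seq, num_ops_per_node))
        if not 0 <= index < n
    ]
```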

How to use AUC metric in DeepHyper

This refers to the conversation with @anuragverma77 in #62 (comment).

To install the develop version of DeepHyper:

git clone https://github.com/deephyper/deephyper.git
cd deephyper/
git checkout develop
pip install -e .

Then, the string to use for the area under the ROC curve in Problem.metric(...) is "auroc"; for the area under the precision-recall curve, it is "aucpr".
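For reference, the area under the ROC curve equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one, with ties counting as one half. The following small pure-Python sketch of that definition can be used to sanity-check an "auroc" value on a tiny dataset. It is not DeepHyper's implementation.

```python
def auroc(y_true, y_score):
    """Exact ROC AUC for binary labels (1 = positive, 0 = negative), O(P * N)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # tie counts as half
    return wins / (len(pos) * len(neg))
```

For example, `auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75: three of the four positive/negative pairs are ordered correctly.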

Process put in state sleep after a few minutes

commit: 8500715

I ran this command to run NAS locally on my computer:

mpirun -np 2 python -m deephyper.search.nas.nas_a3c_sync --evaluator subprocess --problem 'deephyper.benchmark.nas.linearReg.problem.Problem' --run 'deephyper.search.nas.model.run.alpha.run'

The code starts to run, the logger writes the expected output to deephyper.log, and the training of submodels starts; after 1, 2, 3 epochs, all processes are put in the SLEEP state.

No error is returned.

Installation on macOS currently requires OpenSSL workaround

On macOS Catalina with Anaconda, I have successfully installed DeepHyper with pip install deephyper. However,

python -m deephyper.search.hps.ambs --problem deephyper.benchmark.hps.polynome2.Problem --run deephyper.benchmark.hps.polynome2.run

returns:

/usr/local/anaconda3/envs/frnn/lib/python3.7/site-packages/scikit_learn-0.21.3-py3.7-macosx-10.9-x86_64.egg/sklearn/externals/joblib/__init__.py:15: DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
  warnings.warn(msg, category=DeprecationWarning)
 **********************************************************************************
   Maximizing the return value of function: deephyper.benchmark.hps.polynome2.run
 **********************************************************************************
Uncaught exception <class 'ImportError'>: dlopen(/usr/local/anaconda3/envs/frnn/lib/python3.7/site-packages/ray/pyarrow_files/pyarrow/lib.cpython-37m-darwin.so, 2): Library not loaded: /usr/local/opt/openssl/lib/libcrypto.1.0.0.dylib
  Referenced from: /usr/local/anaconda3/envs/frnn/lib/python3.7/site-packages/ray/pyarrow_files/pyarrow/libarrow.14.dylib
  Reason: image not foundTraceback (most recent call last):
  File "/usr/local/anaconda3/envs/frnn/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/anaconda3/envs/frnn/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/anaconda3/envs/frnn/lib/python3.7/site-packages/deephyper/search/hps/ambs.py", line 112, in <module>
    search = AMBS(**vars(args))
  File "/usr/local/anaconda3/envs/frnn/lib/python3.7/site-packages/deephyper/search/hps/ambs.py", line 48, in __init__
    super().__init__(problem, run, evaluator, **kwargs)
  File "/usr/local/anaconda3/envs/frnn/lib/python3.7/site-packages/deephyper/search/search.py", line 53, in __init__
    self.evaluator = Evaluator.create(self.run_func, method=evaluator, **kwargs)
  File "/usr/local/anaconda3/envs/frnn/lib/python3.7/site-packages/deephyper/evaluator/evaluate.py", line 126, in create
    from deephyper.evaluator.ray_evaluator import RayEvaluator
  File "/usr/local/anaconda3/envs/frnn/lib/python3.7/site-packages/deephyper/evaluator/ray_evaluator.py", line 7, in <module>
    import ray
  File "/usr/local/anaconda3/envs/frnn/lib/python3.7/site-packages/ray/__init__.py", line 28, in <module>
    import pyarrow  # noqa: F401
  File "/usr/local/anaconda3/envs/frnn/lib/python3.7/site-packages/ray/pyarrow_files/pyarrow/__init__.py", line 49, in <module>
    from pyarrow.lib import cpu_count, set_cpu_count
ImportError: dlopen(/usr/local/anaconda3/envs/frnn/lib/python3.7/site-packages/ray/pyarrow_files/pyarrow/lib.cpython-37m-darwin.so, 2): Library not loaded: /usr/local/opt/openssl/lib/libcrypto.1.0.0.dylib
  Referenced from: /usr/local/anaconda3/envs/frnn/lib/python3.7/site-packages/ray/pyarrow_files/pyarrow/libarrow.14.dylib
  Reason: image not found

Apple has had a complicated relationship with OpenSSL, LibreSSL, and CommonCrypto. I normally install OpenSSL when necessary with brew install openssl, but Homebrew bumped their formula for OpenSSL from 1.0 to 1.1 on 2019-11-21 and removed the OpenSSL 1.0 formulas, since OpenSSL 1.0 will reach EOL on 2019-12-31. This has broken dependencies for a lot of people, including for Ray, a DeepHyper dependency.

One workaround, given here, is to downgrade to v1.0:

brew install https://raw.githubusercontent.com/Homebrew/homebrew-core/30fd2b68feb458656c2da2b91e577960b11c42f4/Formula/openssl.rb

I also think that bumping

'ray==0.7.2',

to https://github.com/ray-project/ray/releases/tag/ray-0.7.6 fixes the issue.

This is obviously not an issue with DeepHyper per se, but I figured that other macOS users might be encountering this.

definition of R^2 incorrect

R^2 is defined in deephyper/deephyper/search/nas/model/train_utils.py. For multivariate regression (predicting a vector output), this definition is incorrect: it sums over both dimensions (number of examples and number of responses/outputs), producing a single R^2 score. This is equivalent to flattening the data before calculating R^2. We should change this to match the default behavior of scikit-learn: calculate one R^2 score for each response/output, then average those R^2 scores.
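To make the difference concrete, the following minimal pure-Python sketch implements both definitions for a list of samples, each a list of outputs. The averaged version corresponds to scikit-learn's `r2_score` default (`multioutput="uniform_average"`). The flattened version is the behavior described above. Neither function is the DeepHyper code itself, and both assume every output has nonzero variance.

```python
def _r2(t, p):
    """Coefficient of determination for one sequence of targets/predictions."""
    mean = sum(t) / len(t)
    ss_res = sum((a - b) ** 2 for a, b in zip(t, p))
    ss_tot = sum((a - mean) ** 2 for a in t)
    return 1.0 - ss_res / ss_tot


def r2_flattened(y_true, y_pred):
    """Sums over samples AND outputs, i.e. R^2 of the flattened data."""
    return _r2([v for row in y_true for v in row], [v for row in y_pred for v in row])


def r2_per_output_mean(y_true, y_pred):
    """One R^2 per output column, then their unweighted average."""
    n_outputs = len(y_true[0])
    scores = [
        _r2([row[j] for row in y_true], [row[j] for row in y_pred])
        for j in range(n_outputs)
    ]
    return sum(scores) / n_outputs
```

On `y_true = [[0.5, 1], [-1, 1], [7, -6]]` and `y_pred = [[0, 2], [-1, 2], [8, -5]]`, the per-output average is about 0.937, while the flattened score is about 0.951.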

Cannot install on non-x86_64 architectures due to Ray dependency

I am using two IBM AC922 systems (including OLCF Summit) with POWER9 CPUs and V100 GPUs, and it would be great to deploy DeepHyper on those systems. Although the DeepHyper wheel on PyPI is compatible, since it specifies arch=any, I cannot install it because of its dependencies.

Ray does not distribute any .whl on PyPI for arch=ppc64le, only manylinux1 and macOS. There is an inactive open issue about this: ray-project/ray#4309

As I noted in #20 (comment), building Ray from source appears to be impossible right now.

Is there a way to bypass this dependency if only the BalsamEvaluator will be used?
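One packaging pattern that would allow this is to move Ray from the hard requirements into an optional extra. With that change, `pip install deephyper` works on ppc64le, and `pip install "deephyper[ray]"` pulls Ray in where wheels exist. Below is a hypothetical setup.py excerpt; the requirement lists are placeholders, not DeepHyper's actual ones. The evaluator code would also have to import ray only when the Ray evaluator is selected.

```python
# Hypothetical setup.py excerpt: Ray becomes optional instead of mandatory.
INSTALL_REQUIRES = [
    "numpy",            # placeholder core dependencies
    "scikit-optimize",
]

EXTRAS_REQUIRE = {
    # pip install "deephyper[ray]"  -> also installs Ray where wheels exist.
    # An alternative is a PEP 508 environment marker in INSTALL_REQUIRES, e.g.
    # "ray>=0.7.6; platform_machine == 'x86_64'".
    "ray": ["ray>=0.7.6"],
}

# from setuptools import setup
# setup(name="deephyper", install_requires=INSTALL_REQUIRES,
#       extras_require=EXTRAS_REQUIRE, ...)
```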

AssertionError in AMBS optimization when there are no continuous hyperparameters

Bethany ran a custom HPS search, which resulted in:

File "/Users/blusch/Documents/deephyper/deephyper/deephyper/search/optimizers/optimizer.py", line 91, in tell
    assert len(XX) == len(YY) == self.counter
AssertionError

deephyper.log

This invariant seems to break only after the first repeated evaluation, which makes sense because this code path was not exercised in prior AMBS experiments, which always had at least one continuous hyperparameter. In this case, all hyperparameters are discrete, which causes repeated evaluations near convergence.
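One way to keep such an invariant intact is to detect repeated configurations before they reach the optimizer. Repeats can be answered from a cache of already-observed objectives instead of being told to the optimizer twice. The following is a minimal sketch with hypothetical names; it is neither DeepHyper's Optimizer class nor its actual fix.

```python
class DedupHistory:
    """Keeps XX (configs), YY (objectives) and a counter consistent under repeats."""

    def __init__(self):
        self.XX, self.YY, self.counter = [], [], 0
        self._seen = {}  # config tuple -> objective

    def lookup(self, x):
        """Return the cached objective for config `x`, or None if it is new."""
        return self._seen.get(tuple(x))

    def tell(self, x, y):
        """Record a result; return False (and change nothing) for a repeat."""
        key = tuple(x)
        if key in self._seen:
            return False  # repeated config: nothing new for the optimizer to learn
        self._seen[key] = y
        self.XX.append(list(x))
        self.YY.append(y)
        self.counter += 1
        assert len(self.XX) == len(self.YY) == self.counter
        return True
```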

[BUG] DeepHyper Installation on Cooley

Describe the bug
I installed DeepHyper from source on Cooley as described in the documentation and got the following error:

from deephyper.benchmark.datasets import airlines as dataset
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/lus/theta-fs0/projects/hpcbdsm/deephyper/deephyper/benchmark/__init__.py", line 5, in <module>
    from deephyper.benchmark.problem import HpProblem, NaProblem
  File "/lus/theta-fs0/projects/hpcbdsm/deephyper/deephyper/benchmark/problem.py", line 1, in <module>
    from deephyper.problem import HpProblem
  File "/lus/theta-fs0/projects/hpcbdsm/deephyper/deephyper/problem/__init__.py", line 1, in <module>
    import ConfigSpace as config_space
ImportError: No module named 'ConfigSpace'

System information:

  • System: Cooley
  • Python version: 3.7
  • DeepHyper Version: 0.0.7
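The traceback shows that the ConfigSpace package, imported by deephyper.problem, cannot be found by the Python that is running. Installing it into that same environment (`pip install ConfigSpace`) is the likely fix for this particular import. Below is a minimal sketch for listing missing top-level modules before importing DeepHyper. The module list is an illustrative assumption, not DeepHyper's full dependency list.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Illustrative subset; adjust to the packages your DeepHyper version requires.
REQUIRED = ["ConfigSpace", "numpy", "scipy", "sklearn"]

if __name__ == "__main__":
    print("missing:", missing_modules(REQUIRED) or "none")
```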

Unable to execute deephyper command from Linux terminal

I pip-installed deephyper on a Linux server. However, when I try to run an HPS using deephyper hps ambs --problem problem.Problem --run model_run.run, I get the following error:

bash: deephyper: command not found

Hardware arch: x86_64
Server CPU: Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz
OS: Ubuntu 18.04.2

Python version: 3.7.5
TensorFlow version: 1.14 (tried with 2.0 as well with the same result)

Please suggest a solution.
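`command not found` often means that the directory where pip installed the deephyper console script is not on PATH. A common example is `~/.local/bin` after a `--user` install. The following minimal sketch prints the candidate script directories for the current interpreter and whether each one is on PATH. Run it with the same python that ran pip; the helper names are hypothetical.

```python
import os
import shutil
import sysconfig


def script_dirs():
    """Script directories for the default and (if defined) user install schemes."""
    dirs = [sysconfig.get_path("scripts")]
    user_scheme = f"{os.name}_user"  # e.g. "posix_user" on Linux
    if user_scheme in sysconfig.get_scheme_names():
        dirs.append(sysconfig.get_path("scripts", user_scheme))
    return dirs


def report(command="deephyper"):
    path_entries = os.environ.get("PATH", "").split(os.pathsep)
    for d in script_dirs():
        print(f"{d}  on PATH: {d in path_entries}")
    print(f"{command} resolves to: {shutil.which(command)}")


if __name__ == "__main__":
    report()
```

If the script exists in a directory that is not on PATH, add that directory to PATH or invoke the script by its full path.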

DeepHyper on TensorFlow 2 & GPUs

Hi,

I cloned DH from git and changed setup.py to point to TF 2.2.0-GPU.
I'm working on an HPS, so I developed my model_run.py; when I test it with a command like:

python model_run.py

I get the expected behavior: TF uses my machine's GPU(s) to train the model. However, when I run my run() function with DH:

deephyper hps ambs --problem <myProblem> --run <myRun>

TF is not seeing the GPUs:

failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected

and the training proceeds on my CPU.

Have you faced a similar issue before? Or does DH make a call to TF that prevents it from seeing the GPUs?

Thanks in advance!
Vangelis
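CUDA reports `CUDA_ERROR_NO_DEVICE` when no GPU is visible to the process. One common cause is the `CUDA_VISIBLE_DEVICES` environment variable being set to an empty or invalid value in the environment where the evaluation runs. This thread does not establish whether DeepHyper's evaluator changes that variable. The following minimal sketch logs what the run function actually sees; call it at the top of `run()`. The helper name is hypothetical.

```python
import os


def describe_gpu_visibility(env=None):
    """Explain what CUDA_VISIBLE_DEVICES implies for GPU visibility in this process."""
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return "CUDA_VISIBLE_DEVICES unset: all GPUs visible to CUDA"
    if value.strip() in ("", "-1"):
        return f"CUDA_VISIBLE_DEVICES={value!r}: GPUs hidden from CUDA"
    return f"CUDA_VISIBLE_DEVICES={value!r}: only the listed devices are visible"


# Inside run(config): print(describe_gpu_visibility())
```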

[BUG] Installation error due to Horovod on JLSE

Describe the bug
Building DeepHyper with pip or from GitHub fails on the horovod and mpi4py dependencies.

To Reproduce
Steps to reproduce the behavior:

  1. conda create --name dh python=3.7
  2. conda activate dh
  3. pip install deephyper

Expected behavior
DeepHyper installs successfully on Argonne nodes.

Screenshots

  Building wheel for horovod (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: /home/jkoo/anaconda3/envs/dh/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-9o1ukjdt/horovod/setup.py'"'"'; __file__='"'"'/tmp/pip-install-9o1ukjdt/horovod/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-3c8nukd0
       cwd: /tmp/pip-install-9o1ukjdt/horovod/
  Complete output (283 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-3.7
  creating build/lib.linux-x86_64-3.7/horovod
  copying horovod/__init__.py -> build/lib.linux-x86_64-3.7/horovod
  creating build/lib.linux-x86_64-3.7/horovod/_keras
  copying horovod/_keras/__init__.py -> build/lib.linux-x86_64-3.7/horovod/_keras
  copying horovod/_keras/callbacks.py -> build/lib.linux-x86_64-3.7/horovod/_keras
  creating build/lib.linux-x86_64-3.7/horovod/common
  copying horovod/common/__init__.py -> build/lib.linux-x86_64-3.7/horovod/common
  copying horovod/common/basics.py -> build/lib.linux-x86_64-3.7/horovod/common
  copying horovod/common/util.py -> build/lib.linux-x86_64-3.7/horovod/common
  creating build/lib.linux-x86_64-3.7/horovod/keras
  copying horovod/keras/__init__.py -> build/lib.linux-x86_64-3.7/horovod/keras
  copying horovod/keras/callbacks.py -> build/lib.linux-x86_64-3.7/horovod/keras
  creating build/lib.linux-x86_64-3.7/horovod/mxnet
  copying horovod/mxnet/__init__.py -> build/lib.linux-x86_64-3.7/horovod/mxnet
  copying horovod/mxnet/mpi_ops.py -> build/lib.linux-x86_64-3.7/horovod/mxnet
  creating build/lib.linux-x86_64-3.7/horovod/run
  copying horovod/run/__init__.py -> build/lib.linux-x86_64-3.7/horovod/run
  copying horovod/run/gloo_run.py -> build/lib.linux-x86_64-3.7/horovod/run
  copying horovod/run/js_run.py -> build/lib.linux-x86_64-3.7/horovod/run
  copying horovod/run/mpi_run.py -> build/lib.linux-x86_64-3.7/horovod/run
  copying horovod/run/run_task.py -> build/lib.linux-x86_64-3.7/horovod/run
  copying horovod/run/runner.py -> build/lib.linux-x86_64-3.7/horovod/run
  copying horovod/run/task_fn.py -> build/lib.linux-x86_64-3.7/horovod/run
  creating build/lib.linux-x86_64-3.7/horovod/spark
  copying horovod/spark/__init__.py -> build/lib.linux-x86_64-3.7/horovod/spark
  copying horovod/spark/gloo_run.py -> build/lib.linux-x86_64-3.7/horovod/spark
  copying horovod/spark/mpi_run.py -> build/lib.linux-x86_64-3.7/horovod/spark
  copying horovod/spark/runner.py -> build/lib.linux-x86_64-3.7/horovod/spark
  creating build/lib.linux-x86_64-3.7/horovod/tensorflow
  copying horovod/tensorflow/__init__.py -> build/lib.linux-x86_64-3.7/horovod/tensorflow
  copying horovod/tensorflow/compression.py -> build/lib.linux-x86_64-3.7/horovod/tensorflow
  copying horovod/tensorflow/mpi_ops.py -> build/lib.linux-x86_64-3.7/horovod/tensorflow
  copying horovod/tensorflow/util.py -> build/lib.linux-x86_64-3.7/horovod/tensorflow
  creating build/lib.linux-x86_64-3.7/horovod/torch
  copying horovod/torch/__init__.py -> build/lib.linux-x86_64-3.7/horovod/torch
  copying horovod/torch/compression.py -> build/lib.linux-x86_64-3.7/horovod/torch
  copying horovod/torch/mpi_ops.py -> build/lib.linux-x86_64-3.7/horovod/torch
  copying horovod/torch/sync_batch_norm.py -> build/lib.linux-x86_64-3.7/horovod/torch
  creating build/lib.linux-x86_64-3.7/horovod/run/common
  copying horovod/run/common/__init__.py -> build/lib.linux-x86_64-3.7/horovod/run/common
  creating build/lib.linux-x86_64-3.7/horovod/run/driver
  copying horovod/run/driver/__init__.py -> build/lib.linux-x86_64-3.7/horovod/run/driver
  copying horovod/run/driver/driver_service.py -> build/lib.linux-x86_64-3.7/horovod/run/driver
  creating build/lib.linux-x86_64-3.7/horovod/run/http
  copying horovod/run/http/__init__.py -> build/lib.linux-x86_64-3.7/horovod/run/http
  copying horovod/run/http/http_client.py -> build/lib.linux-x86_64-3.7/horovod/run/http
  copying horovod/run/http/http_server.py -> build/lib.linux-x86_64-3.7/horovod/run/http
  creating build/lib.linux-x86_64-3.7/horovod/run/task
  copying horovod/run/task/__init__.py -> build/lib.linux-x86_64-3.7/horovod/run/task
  copying horovod/run/task/task_service.py -> build/lib.linux-x86_64-3.7/horovod/run/task
  creating build/lib.linux-x86_64-3.7/horovod/run/util
  copying horovod/run/util/__init__.py -> build/lib.linux-x86_64-3.7/horovod/run/util
  copying horovod/run/util/cache.py -> build/lib.linux-x86_64-3.7/horovod/run/util
  copying horovod/run/util/lsf.py -> build/lib.linux-x86_64-3.7/horovod/run/util
  copying horovod/run/util/network.py -> build/lib.linux-x86_64-3.7/horovod/run/util
  copying horovod/run/util/threads.py -> build/lib.linux-x86_64-3.7/horovod/run/util
  creating build/lib.linux-x86_64-3.7/horovod/run/common/service
  copying horovod/run/common/service/__init__.py -> build/lib.linux-x86_64-3.7/horovod/run/common/service
  copying horovod/run/common/service/driver_service.py -> build/lib.linux-x86_64-3.7/horovod/run/common/service
  copying horovod/run/common/service/task_service.py -> build/lib.linux-x86_64-3.7/horovod/run/common/service
  creating build/lib.linux-x86_64-3.7/horovod/run/common/util
  copying horovod/run/common/util/__init__.py -> build/lib.linux-x86_64-3.7/horovod/run/common/util
  copying horovod/run/common/util/codec.py -> build/lib.linux-x86_64-3.7/horovod/run/common/util
  copying horovod/run/common/util/config_parser.py -> build/lib.linux-x86_64-3.7/horovod/run/common/util
  copying horovod/run/common/util/env.py -> build/lib.linux-x86_64-3.7/horovod/run/common/util
  copying horovod/run/common/util/host_hash.py -> build/lib.linux-x86_64-3.7/horovod/run/common/util
  copying horovod/run/common/util/network.py -> build/lib.linux-x86_64-3.7/horovod/run/common/util
  copying horovod/run/common/util/safe_shell_exec.py -> build/lib.linux-x86_64-3.7/horovod/run/common/util
  copying horovod/run/common/util/secret.py -> build/lib.linux-x86_64-3.7/horovod/run/common/util
  copying horovod/run/common/util/settings.py -> build/lib.linux-x86_64-3.7/horovod/run/common/util
  copying horovod/run/common/util/timeout.py -> build/lib.linux-x86_64-3.7/horovod/run/common/util
  copying horovod/run/common/util/tiny_shell_exec.py -> build/lib.linux-x86_64-3.7/horovod/run/common/util
  creating build/lib.linux-x86_64-3.7/horovod/spark/common
  copying horovod/spark/common/__init__.py -> build/lib.linux-x86_64-3.7/horovod/spark/common
  copying horovod/spark/common/_namedtuple_fix.py -> build/lib.linux-x86_64-3.7/horovod/spark/common
  copying horovod/spark/common/backend.py -> build/lib.linux-x86_64-3.7/horovod/spark/common
  copying horovod/spark/common/cache.py -> build/lib.linux-x86_64-3.7/horovod/spark/common
  copying horovod/spark/common/constants.py -> build/lib.linux-x86_64-3.7/horovod/spark/common
  copying horovod/spark/common/estimator.py -> build/lib.linux-x86_64-3.7/horovod/spark/common
  copying horovod/spark/common/params.py -> build/lib.linux-x86_64-3.7/horovod/spark/common
  copying horovod/spark/common/serialization.py -> build/lib.linux-x86_64-3.7/horovod/spark/common
  copying horovod/spark/common/store.py -> build/lib.linux-x86_64-3.7/horovod/spark/common
  copying horovod/spark/common/util.py -> build/lib.linux-x86_64-3.7/horovod/spark/common
  creating build/lib.linux-x86_64-3.7/horovod/spark/driver
  copying horovod/spark/driver/__init__.py -> build/lib.linux-x86_64-3.7/horovod/spark/driver
  copying horovod/spark/driver/driver_service.py -> build/lib.linux-x86_64-3.7/horovod/spark/driver
  copying horovod/spark/driver/job_id.py -> build/lib.linux-x86_64-3.7/horovod/spark/driver
  copying horovod/spark/driver/mpirun_rsh.py -> build/lib.linux-x86_64-3.7/horovod/spark/driver
  copying horovod/spark/driver/rsh.py -> build/lib.linux-x86_64-3.7/horovod/spark/driver
  creating build/lib.linux-x86_64-3.7/horovod/spark/keras
  copying horovod/spark/keras/__init__.py -> build/lib.linux-x86_64-3.7/horovod/spark/keras
  copying horovod/spark/keras/bare.py -> build/lib.linux-x86_64-3.7/horovod/spark/keras
  copying horovod/spark/keras/estimator.py -> build/lib.linux-x86_64-3.7/horovod/spark/keras
  copying horovod/spark/keras/optimizer.py -> build/lib.linux-x86_64-3.7/horovod/spark/keras
  copying horovod/spark/keras/remote.py -> build/lib.linux-x86_64-3.7/horovod/spark/keras
  copying horovod/spark/keras/tensorflow.py -> build/lib.linux-x86_64-3.7/horovod/spark/keras
  copying horovod/spark/keras/util.py -> build/lib.linux-x86_64-3.7/horovod/spark/keras
  creating build/lib.linux-x86_64-3.7/horovod/spark/task
  copying horovod/spark/task/__init__.py -> build/lib.linux-x86_64-3.7/horovod/spark/task
  copying horovod/spark/task/gloo_exec_fn.py -> build/lib.linux-x86_64-3.7/horovod/spark/task
  copying horovod/spark/task/mpirun_exec_fn.py -> build/lib.linux-x86_64-3.7/horovod/spark/task
  copying horovod/spark/task/task_info.py -> build/lib.linux-x86_64-3.7/horovod/spark/task
  copying horovod/spark/task/task_service.py -> build/lib.linux-x86_64-3.7/horovod/spark/task
  creating build/lib.linux-x86_64-3.7/horovod/spark/torch
  copying horovod/spark/torch/__init__.py -> build/lib.linux-x86_64-3.7/horovod/spark/torch
  copying horovod/spark/torch/estimator.py -> build/lib.linux-x86_64-3.7/horovod/spark/torch
  copying horovod/spark/torch/remote.py -> build/lib.linux-x86_64-3.7/horovod/spark/torch
  copying horovod/spark/torch/util.py -> build/lib.linux-x86_64-3.7/horovod/spark/torch
  creating build/lib.linux-x86_64-3.7/horovod/tensorflow/keras
  copying horovod/tensorflow/keras/__init__.py -> build/lib.linux-x86_64-3.7/horovod/tensorflow/keras
  copying horovod/tensorflow/keras/callbacks.py -> build/lib.linux-x86_64-3.7/horovod/tensorflow/keras
  creating build/lib.linux-x86_64-3.7/horovod/torch/mpi_lib
  copying horovod/torch/mpi_lib/__init__.py -> build/lib.linux-x86_64-3.7/horovod/torch/mpi_lib
  creating build/lib.linux-x86_64-3.7/horovod/torch/mpi_lib_impl
  copying horovod/torch/mpi_lib_impl/__init__.py -> build/lib.linux-x86_64-3.7/horovod/torch/mpi_lib_impl
  running build_ext
  gcc -pthread -B /home/jkoo/anaconda3/envs/dh/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -std=c++11 -fPIC -O3 -Wall -fassociative-math -ffast-math -ftree-vectorize -funsafe-math-optimizations -mavx -I/home/jkoo/anaconda3/envs/dh/include/python3.7m -c build/temp.linux-x86_64-3.7/test_compile/test_cpp_flags.cc -o build/temp.linux-x86_64-3.7/test_compile/test_cpp_flags.o
  cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
  gcc -pthread -shared -B /home/jkoo/anaconda3/envs/dh/compiler_compat -L/home/jkoo/anaconda3/envs/dh/lib -Wl,-rpath=/home/jkoo/anaconda3/envs/dh/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.7/test_compile/test_cpp_flags.o -o build/temp.linux-x86_64-3.7/test_compile/test_cpp_flags.so
  gcc -pthread -B /home/jkoo/anaconda3/envs/dh/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/jkoo/anaconda3/envs/dh/include/python3.7m -c build/temp.linux-x86_64-3.7/test_compile/test_link_flags.cc -o build/temp.linux-x86_64-3.7/test_compile/test_link_flags.o
  cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
  gcc -pthread -shared -B /home/jkoo/anaconda3/envs/dh/compiler_compat -L/home/jkoo/anaconda3/envs/dh/lib -Wl,-rpath=/home/jkoo/anaconda3/envs/dh/lib -Wl,--no-as-needed -Wl,--sysroot=/ -Wl,--version-script=horovod.lds build/temp.linux-x86_64-3.7/test_compile/test_link_flags.o -o build/temp.linux-x86_64-3.7/test_compile/test_link_flags.so
  Traceback (most recent call last):
    File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 341, in get_mpi_flags
      shlex.split(show_command), universal_newlines=True).strip()
    File "/home/jkoo/anaconda3/envs/dh/lib/python3.7/subprocess.py", line 411, in check_output
      **kwargs).stdout
    File "/home/jkoo/anaconda3/envs/dh/lib/python3.7/subprocess.py", line 488, in run
      with Popen(*popenargs, **kwargs) as process:
    File "/home/jkoo/anaconda3/envs/dh/lib/python3.7/subprocess.py", line 800, in __init__
      restore_signals, start_new_session)
    File "/home/jkoo/anaconda3/envs/dh/lib/python3.7/subprocess.py", line 1551, in _execute_child
      raise child_exception_type(errno_num, err_msg, err_filename)
  FileNotFoundError: [Errno 2] No such file or directory: 'mpicxx': 'mpicxx'
  
  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 622, in get_common_options
      mpi_flags = get_mpi_flags()
    File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 354, in get_mpi_flags
      '%s' % (show_command, traceback.format_exc()))
  distutils.errors.DistutilsPlatformError: mpicxx -show failed (see error below), is MPI in $PATH?
  Note: If your version of MPI has a custom command to show compilation flags, please specify it with the HOROVOD_MPICXX_SHOW environment variable.
  
  Traceback (most recent call last):
    File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 341, in get_mpi_flags
      shlex.split(show_command), universal_newlines=True).strip()
    File "/home/jkoo/anaconda3/envs/dh/lib/python3.7/subprocess.py", line 411, in check_output
      **kwargs).stdout
    File "/home/jkoo/anaconda3/envs/dh/lib/python3.7/subprocess.py", line 488, in run
      with Popen(*popenargs, **kwargs) as process:
    File "/home/jkoo/anaconda3/envs/dh/lib/python3.7/subprocess.py", line 800, in __init__
      restore_signals, start_new_session)
    File "/home/jkoo/anaconda3/envs/dh/lib/python3.7/subprocess.py", line 1551, in _execute_child
      raise child_exception_type(errno_num, err_msg, err_filename)
  FileNotFoundError: [Errno 2] No such file or directory: 'mpicxx': 'mpicxx'
  
  
  INFO: Cannot find MPI compilation flags, will skip compiling with MPI.
  INFO: Unable to build TensorFlow plugin, will skip it.
  
  Traceback (most recent call last):
    File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 75, in check_tf_version
      import tensorflow as tf
  ModuleNotFoundError: No module named 'tensorflow'
  
  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 1526, in build_extensions
      build_tf_extension(self, options)
    File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 933, in build_tf_extension
      check_tf_version()
    File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 82, in check_tf_version
      'import tensorflow failed, is it installed?\n\n%s' % traceback.format_exc())
  distutils.errors.DistutilsPlatformError: import tensorflow failed, is it installed?
  
  Traceback (most recent call last):
    File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 75, in check_tf_version
      import tensorflow as tf
  ModuleNotFoundError: No module named 'tensorflow'
  
  
  INFO: Unable to build PyTorch plugin, will skip it.
  
  Traceback (most recent call last):
    File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 1134, in check_torch_version
      import torch
  ModuleNotFoundError: No module named 'torch'
  
  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 1538, in build_extensions
      torch_version = check_torch_version()
    File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 1141, in check_torch_version
      'import torch failed, is it installed?\n\n%s' % traceback.format_exc())
  distutils.errors.DistutilsPlatformError: import torch failed, is it installed?
  
  Traceback (most recent call last):
    File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 1134, in check_torch_version
      import torch
  ModuleNotFoundError: No module named 'torch'
  
  
  -- The CXX compiler identification is GNU 4.8.5
  -- The C compiler identification is GNU 4.8.5
  -- Check for working CXX compiler: /usr/bin/c++
  -- Check for working CXX compiler: /usr/bin/c++ -- works
  -- Detecting CXX compiler ABI info
  -- Detecting CXX compiler ABI info - done
  -- Check for working C compiler: /usr/bin/cc
  -- Check for working C compiler: /usr/bin/cc -- works
  -- Detecting C compiler ABI info
  -- Detecting C compiler ABI info - done
  -- Configuring done
  -- Generating done
  -- Build files have been written to: /tmp/pip-install-9o1ukjdt/horovod/build/temp.linux-x86_64-3.7/gloo/mxnet
  Scanning dependencies of target gloo
  [  6%] [  6%] [  9%] [ 12%] Building CXX object gloo/CMakeFiles/gloo.dir/allgather.cc.o
  Building CXX object gloo/CMakeFiles/gloo.dir/allreduce.cc.o
  Building CXX object gloo/CMakeFiles/gloo.dir/allgatherv.cc.o
  Building CXX object gloo/CMakeFiles/gloo.dir/algorithm.cc.o
  [ 16%] Building CXX object gloo/CMakeFiles/gloo.dir/allreduce_local.cc.o
  [ 19%] Building CXX object gloo/CMakeFiles/gloo.dir/barrier.cc.o
  [ 22%] Building CXX object gloo/CMakeFiles/gloo.dir/broadcast.cc.o
  [ 25%] Building CXX object gloo/CMakeFiles/gloo.dir/context.cc.o
  [ 29%] Building CXX object gloo/CMakeFiles/gloo.dir/gather.cc.o
  [ 32%] Building CXX object gloo/CMakeFiles/gloo.dir/reduce.cc.o
  [ 35%] Building CXX object gloo/CMakeFiles/gloo.dir/scatter.cc.o
  [ 38%] Building CXX object gloo/CMakeFiles/gloo.dir/types.cc.o
  [ 41%] Building CXX object gloo/CMakeFiles/gloo.dir/common/logging.cc.o
  [ 45%] Building CXX object gloo/CMakeFiles/gloo.dir/common/linux.cc.o
  [ 48%] Building CXX object gloo/CMakeFiles/gloo.dir/rendezvous/context.cc.o
  [ 51%] Building CXX object gloo/CMakeFiles/gloo.dir/rendezvous/file_store.cc.o
  [ 54%] Building CXX object gloo/CMakeFiles/gloo.dir/rendezvous/hash_store.cc.o
  [ 58%] Building CXX object gloo/CMakeFiles/gloo.dir/rendezvous/prefix_store.cc.o
  [ 61%] Building CXX object gloo/CMakeFiles/gloo.dir/rendezvous/store.cc.o
  [ 64%] Building CXX object gloo/CMakeFiles/gloo.dir/transport/address.cc.o
  [ 67%] Building CXX object gloo/CMakeFiles/gloo.dir/transport/buffer.cc.o
  [ 70%] Building CXX object gloo/CMakeFiles/gloo.dir/transport/context.cc.o
  [ 74%] Building CXX object gloo/CMakeFiles/gloo.dir/transport/device.cc.o
  [ 77%] Building CXX object gloo/CMakeFiles/gloo.dir/transport/pair.cc.o
  [ 80%] Building CXX object gloo/CMakeFiles/gloo.dir/transport/unbound_buffer.cc.o
  [ 83%] Building CXX object gloo/CMakeFiles/gloo.dir/transport/tcp/address.cc.o
  [ 87%] Building CXX object gloo/CMakeFiles/gloo.dir/transport/tcp/buffer.cc.o
  [ 90%] Building CXX object gloo/CMakeFiles/gloo.dir/transport/tcp/context.cc.o
  [ 93%] Building CXX object gloo/CMakeFiles/gloo.dir/transport/tcp/device.cc.o
  [ 96%] Building CXX object gloo/CMakeFiles/gloo.dir/transport/tcp/pair.cc.o
  [100%] Building CXX object gloo/CMakeFiles/gloo.dir/transport/tcp/unbound_buffer.cc.o
  Linking CXX static library /tmp/pip-install-9o1ukjdt/horovod/build/temp.linux-x86_64-3.7/lib/mxnet/libgloo.a
  [100%] Built target gloo
  INFO: Unable to build MXNet plugin, will skip it.
  
  Traceback (most recent call last):
    File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 91, in check_mx_version
      import mxnet as mx
  ModuleNotFoundError: No module named 'mxnet'
  
  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 1554, in build_extensions
      build_mx_extension(self, options)
    File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 1082, in build_mx_extension
      check_mx_version()
    File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 98, in check_mx_version
      'import mxnet failed, is it installed?\n\n%s' % traceback.format_exc())
  distutils.errors.DistutilsPlatformError: import mxnet failed, is it installed?
  
  Traceback (most recent call last):
    File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 91, in check_mx_version
      import mxnet as mx
  ModuleNotFoundError: No module named 'mxnet'
  
  
  error: None of TensorFlow, PyTorch, or MXNet plugins were built. See errors above.
  ----------------------------------------
  ERROR: Failed building wheel for horovod
  Running setup.py clean for horovod
  Building wheel for mpi4py (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: /home/jkoo/anaconda3/envs/dh/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-9o1ukjdt/mpi4py/setup.py'"'"'; __file__='"'"'/tmp/pip-install-9o1ukjdt/mpi4py/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-2vbbxt4_
       cwd: /tmp/pip-install-9o1ukjdt/mpi4py/
  Complete output (126 lines):
  running bdist_wheel
  running build
  running build_src
  running build_py
  creating build
  creating build/lib.linux-x86_64-3.7
  creating build/lib.linux-x86_64-3.7/mpi4py
  copying src/mpi4py/run.py -> build/lib.linux-x86_64-3.7/mpi4py
  copying src/mpi4py/__main__.py -> build/lib.linux-x86_64-3.7/mpi4py
  copying src/mpi4py/bench.py -> build/lib.linux-x86_64-3.7/mpi4py
  copying src/mpi4py/__init__.py -> build/lib.linux-x86_64-3.7/mpi4py
  creating build/lib.linux-x86_64-3.7/mpi4py/futures
  copying src/mpi4py/futures/__main__.py -> build/lib.linux-x86_64-3.7/mpi4py/futures
  copying src/mpi4py/futures/_base.py -> build/lib.linux-x86_64-3.7/mpi4py/futures
  copying src/mpi4py/futures/server.py -> build/lib.linux-x86_64-3.7/mpi4py/futures
  copying src/mpi4py/futures/pool.py -> build/lib.linux-x86_64-3.7/mpi4py/futures
  copying src/mpi4py/futures/_lib.py -> build/lib.linux-x86_64-3.7/mpi4py/futures
  copying src/mpi4py/futures/__init__.py -> build/lib.linux-x86_64-3.7/mpi4py/futures
  copying src/mpi4py/futures/aplus.py -> build/lib.linux-x86_64-3.7/mpi4py/futures
  copying src/mpi4py/MPI.pxd -> build/lib.linux-x86_64-3.7/mpi4py
  copying src/mpi4py/__init__.pxd -> build/lib.linux-x86_64-3.7/mpi4py
  copying src/mpi4py/libmpi.pxd -> build/lib.linux-x86_64-3.7/mpi4py
  creating build/lib.linux-x86_64-3.7/mpi4py/include
  creating build/lib.linux-x86_64-3.7/mpi4py/include/mpi4py
  copying src/mpi4py/include/mpi4py/mpi4py.MPI_api.h -> build/lib.linux-x86_64-3.7/mpi4py/include/mpi4py
  copying src/mpi4py/include/mpi4py/mpi4py.MPI.h -> build/lib.linux-x86_64-3.7/mpi4py/include/mpi4py
  copying src/mpi4py/include/mpi4py/mpi4py.h -> build/lib.linux-x86_64-3.7/mpi4py/include/mpi4py
  copying src/mpi4py/include/mpi4py/mpi4py.i -> build/lib.linux-x86_64-3.7/mpi4py/include/mpi4py
  copying src/mpi4py/include/mpi4py/mpi.pxi -> build/lib.linux-x86_64-3.7/mpi4py/include/mpi4py
  running build_clib
  MPI configuration: [mpi] from 'mpi.cfg'
  checking for library 'lmpe' ...
  gcc -pthread -B /home/jkoo/anaconda3/envs/dh/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -c _configtest.c -o _configtest.o
  gcc -pthread -B /home/jkoo/anaconda3/envs/dh/compiler_compat -Wl,--sysroot=/ _configtest.o -llmpe -o _configtest
  /home/jkoo/anaconda3/envs/dh/compiler_compat/ld: cannot find -llmpe
  collect2: error: ld returned 1 exit status
  failure.
  removing: _configtest.c _configtest.o
  building 'mpe' dylib library
  creating build/temp.linux-x86_64-3.7
  creating build/temp.linux-x86_64-3.7/src
  creating build/temp.linux-x86_64-3.7/src/lib-pmpi
  gcc -pthread -B /home/jkoo/anaconda3/envs/dh/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -c src/lib-pmpi/mpe.c -o build/temp.linux-x86_64-3.7/src/lib-pmpi/mpe.o
  creating build/lib.linux-x86_64-3.7/mpi4py/lib-pmpi
  gcc -pthread -shared -B /home/jkoo/anaconda3/envs/dh/compiler_compat -L/home/jkoo/anaconda3/envs/dh/lib -Wl,-rpath=/home/jkoo/anaconda3/envs/dh/lib -Wl,--no-as-needed -Wl,--sysroot=/ -Wl,--no-as-needed build/temp.linux-x86_64-3.7/src/lib-pmpi/mpe.o -o build/lib.linux-x86_64-3.7/mpi4py/lib-pmpi/libmpe.so
  checking for library 'vt-mpi' ...
  gcc -pthread -B /home/jkoo/anaconda3/envs/dh/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -c _configtest.c -o _configtest.o
  gcc -pthread -B /home/jkoo/anaconda3/envs/dh/compiler_compat -Wl,--sysroot=/ _configtest.o -lvt-mpi -o _configtest
  /home/jkoo/anaconda3/envs/dh/compiler_compat/ld: cannot find -lvt-mpi
  collect2: error: ld returned 1 exit status
  failure.
  removing: _configtest.c _configtest.o
  checking for library 'vt.mpi' ...
  gcc -pthread -B /home/jkoo/anaconda3/envs/dh/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -c _configtest.c -o _configtest.o
  gcc -pthread -B /home/jkoo/anaconda3/envs/dh/compiler_compat -Wl,--sysroot=/ _configtest.o -lvt.mpi -o _configtest
  /home/jkoo/anaconda3/envs/dh/compiler_compat/ld: cannot find -lvt.mpi
  collect2: error: ld returned 1 exit status
  failure.
  removing: _configtest.c _configtest.o
  building 'vt' dylib library
  gcc -pthread -B /home/jkoo/anaconda3/envs/dh/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -c src/lib-pmpi/vt.c -o build/temp.linux-x86_64-3.7/src/lib-pmpi/vt.o
  gcc -pthread -shared -B /home/jkoo/anaconda3/envs/dh/compiler_compat -L/home/jkoo/anaconda3/envs/dh/lib -Wl,-rpath=/home/jkoo/anaconda3/envs/dh/lib -Wl,--no-as-needed -Wl,--sysroot=/ -Wl,--no-as-needed build/temp.linux-x86_64-3.7/src/lib-pmpi/vt.o -o build/lib.linux-x86_64-3.7/mpi4py/lib-pmpi/libvt.so
  checking for library 'vt-mpi' ...
  gcc -pthread -B /home/jkoo/anaconda3/envs/dh/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -c _configtest.c -o _configtest.o
  gcc -pthread -B /home/jkoo/anaconda3/envs/dh/compiler_compat -Wl,--sysroot=/ _configtest.o -lvt-mpi -o _configtest
  /home/jkoo/anaconda3/envs/dh/compiler_compat/ld: cannot find -lvt-mpi
  collect2: error: ld returned 1 exit status
  failure.
  removing: _configtest.c _configtest.o
  checking for library 'vt.mpi' ...
  gcc -pthread -B /home/jkoo/anaconda3/envs/dh/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -c _configtest.c -o _configtest.o
  gcc -pthread -B /home/jkoo/anaconda3/envs/dh/compiler_compat -Wl,--sysroot=/ _configtest.o -lvt.mpi -o _configtest
  /home/jkoo/anaconda3/envs/dh/compiler_compat/ld: cannot find -lvt.mpi
  collect2: error: ld returned 1 exit status
  failure.
  removing: _configtest.c _configtest.o
  building 'vt-mpi' dylib library
  gcc -pthread -B /home/jkoo/anaconda3/envs/dh/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -c src/lib-pmpi/vt-mpi.c -o build/temp.linux-x86_64-3.7/src/lib-pmpi/vt-mpi.o
  gcc -pthread -shared -B /home/jkoo/anaconda3/envs/dh/compiler_compat -L/home/jkoo/anaconda3/envs/dh/lib -Wl,-rpath=/home/jkoo/anaconda3/envs/dh/lib -Wl,--no-as-needed -Wl,--sysroot=/ -Wl,--no-as-needed build/temp.linux-x86_64-3.7/src/lib-pmpi/vt-mpi.o -o build/lib.linux-x86_64-3.7/mpi4py/lib-pmpi/libvt-mpi.so
  checking for library 'vt-hyb' ...
  gcc -pthread -B /home/jkoo/anaconda3/envs/dh/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -c _configtest.c -o _configtest.o
  gcc -pthread -B /home/jkoo/anaconda3/envs/dh/compiler_compat -Wl,--sysroot=/ _configtest.o -lvt-hyb -o _configtest
  /home/jkoo/anaconda3/envs/dh/compiler_compat/ld: cannot find -lvt-hyb
  collect2: error: ld returned 1 exit status
  failure.
  removing: _configtest.c _configtest.o
  checking for library 'vt.ompi' ...
  gcc -pthread -B /home/jkoo/anaconda3/envs/dh/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -c _configtest.c -o _configtest.o
  gcc -pthread -B /home/jkoo/anaconda3/envs/dh/compiler_compat -Wl,--sysroot=/ _configtest.o -lvt.ompi -o _configtest
  /home/jkoo/anaconda3/envs/dh/compiler_compat/ld: cannot find -lvt.ompi
  collect2: error: ld returned 1 exit status
  failure.
  removing: _configtest.c _configtest.o
  building 'vt-hyb' dylib library
  gcc -pthread -B /home/jkoo/anaconda3/envs/dh/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -c src/lib-pmpi/vt-hyb.c -o build/temp.linux-x86_64-3.7/src/lib-pmpi/vt-hyb.o
  gcc -pthread -shared -B /home/jkoo/anaconda3/envs/dh/compiler_compat -L/home/jkoo/anaconda3/envs/dh/lib -Wl,-rpath=/home/jkoo/anaconda3/envs/dh/lib -Wl,--no-as-needed -Wl,--sysroot=/ -Wl,--no-as-needed build/temp.linux-x86_64-3.7/src/lib-pmpi/vt-hyb.o -o build/lib.linux-x86_64-3.7/mpi4py/lib-pmpi/libvt-hyb.so
  running build_ext
  MPI configuration: [mpi] from 'mpi.cfg'
  checking for dlopen() availability ...
  checking for header 'dlfcn.h' ...
  gcc -pthread -B /home/jkoo/anaconda3/envs/dh/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/jkoo/anaconda3/envs/dh/include/python3.7m -c _configtest.c -o _configtest.o
  success!
  removing: _configtest.c _configtest.o
  success!
  checking for library 'dl' ...
  gcc -pthread -B /home/jkoo/anaconda3/envs/dh/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/jkoo/anaconda3/envs/dh/include/python3.7m -c _configtest.c -o _configtest.o
  gcc -pthread -B /home/jkoo/anaconda3/envs/dh/compiler_compat -Wl,--sysroot=/ _configtest.o -Lbuild/temp.linux-x86_64-3.7 -ldl -o _configtest
  success!
  removing: _configtest.c _configtest.o _configtest
  checking for function 'dlopen' ...
  gcc -pthread -B /home/jkoo/anaconda3/envs/dh/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/jkoo/anaconda3/envs/dh/include/python3.7m -c _configtest.c -o _configtest.o
  gcc -pthread -B /home/jkoo/anaconda3/envs/dh/compiler_compat -Wl,--sysroot=/ _configtest.o -Lbuild/temp.linux-x86_64-3.7 -ldl -o _configtest
  success!
  removing: _configtest.c _configtest.o _configtest
  building 'mpi4py.dl' extension
  gcc -pthread -B /home/jkoo/anaconda3/envs/dh/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DHAVE_DLFCN_H=1 -DHAVE_DLOPEN=1 -I/home/jkoo/anaconda3/envs/dh/include/python3.7m -c src/dynload.c -o build/temp.linux-x86_64-3.7/src/dynload.o
  gcc -pthread -shared -B /home/jkoo/anaconda3/envs/dh/compiler_compat -L/home/jkoo/anaconda3/envs/dh/lib -Wl,-rpath=/home/jkoo/anaconda3/envs/dh/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.7/src/dynload.o -Lbuild/temp.linux-x86_64-3.7 -ldl -o build/lib.linux-x86_64-3.7/mpi4py/dl.cpython-37m-x86_64-linux-gnu.so
  checking for MPI compile and link ...
  gcc -pthread -B /home/jkoo/anaconda3/envs/dh/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/jkoo/anaconda3/envs/dh/include/python3.7m -c _configtest.c -o _configtest.o
  _configtest.c:2:17: fatal error: mpi.h: No such file or directory
   #include <mpi.h>
                   ^
  compilation terminated.
  failure.
  removing: _configtest.c _configtest.o
  error: Cannot compile MPI programs. Check your configuration!!!
  ----------------------------------------
  ERROR: Failed building wheel for mpi4py
  Running setup.py clean for mpi4py
Failed to build horovod mpi4py
DEPRECATION: Could not build wheels for horovod, mpi4py which do not use PEP 517. pip will fall back to legacy 'setup.py install' for these. pip 21.0 will remove support for this functionality. A possible replacement is to fix the wheel build issue reported above. You can find discussion regarding this at https://github.com/pypa/pip/issues/8368.
Installing collected packages: termcolor, six, absl-py, gast, numpy, keras-preprocessing, opt-einsum, astor, google-pasta, tensorflow-estimator, grpcio, h5py, keras-applications, wrapt, zipp, importlib-metadata, markdown, protobuf, werkzeug, tensorboard, tensorflow, scipy, joblib, threadpoolctl, scikit-learn, pyyaml, pyaml, dh-scikit-optimize, xgboost, deap, soupsieve, beautifulsoup4, google, opencensus-context, cachetools, pyasn1, pyasn1-modules, rsa, google-auth, googleapis-common-protos, idna, chardet, urllib3, requests, pytz, google-api-core, opencensus, py-spy, filelock, prometheus-client, msgpack, colorful, psutil, nvidia-ml-py3, blessings, gpustat, multidict, attrs, typing-extensions, yarl, async-timeout, aiohttp, click, redis, pyrsistent, jsonschema, hiredis, aioredis, colorama, ray, cloudpickle, pycparser, cffi, horovod, decorator, networkx, python-dateutil, pandas, sphinxcontrib-serializinghtml, imagesize, alabaster, MarkupSafe, Jinja2, sphinxcontrib-applehelp, sphinxcontrib-htmlhelp, sphinxcontrib-devhelp, snowballstemmer, pyparsing, packaging, Pygments, docutils, babel, sphinxcontrib-qthelp, sphinxcontrib-jsmath, sphinx, sphinx-rtd-theme, psycopg2-binary, mpi4py, sqlparse, asgiref, django, balsam-flow, pydot, keras, tqdm, cython, ConfigSpace, future, pyglet, gym, deephyper
    Running setup.py install for horovod ... error
    ERROR: Command errored out with exit status 1:
     command: /home/jkoo/anaconda3/envs/dh/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-9o1ukjdt/horovod/setup.py'"'"'; __file__='"'"'/tmp/pip-install-9o1ukjdt/horovod/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-lznddl7n/install-record.txt --single-version-externally-managed --compile --install-headers /home/jkoo/anaconda3/envs/dh/include/python3.7m/horovod
         cwd: /tmp/pip-install-9o1ukjdt/horovod/
    Complete output (271 lines):
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.7
    creating build/lib.linux-x86_64-3.7/horovod
    copying horovod/__init__.py -> build/lib.linux-x86_64-3.7/horovod
    creating build/lib.linux-x86_64-3.7/horovod/_keras
    copying horovod/_keras/__init__.py -> build/lib.linux-x86_64-3.7/horovod/_keras
    copying horovod/_keras/callbacks.py -> build/lib.linux-x86_64-3.7/horovod/_keras
    creating build/lib.linux-x86_64-3.7/horovod/common
    copying horovod/common/__init__.py -> build/lib.linux-x86_64-3.7/horovod/common
    copying horovod/common/basics.py -> build/lib.linux-x86_64-3.7/horovod/common
    copying horovod/common/util.py -> build/lib.linux-x86_64-3.7/horovod/common
    creating build/lib.linux-x86_64-3.7/horovod/keras
    copying horovod/keras/__init__.py -> build/lib.linux-x86_64-3.7/horovod/keras
    copying horovod/keras/callbacks.py -> build/lib.linux-x86_64-3.7/horovod/keras
    creating build/lib.linux-x86_64-3.7/horovod/mxnet
    copying horovod/mxnet/__init__.py -> build/lib.linux-x86_64-3.7/horovod/mxnet
    copying horovod/mxnet/mpi_ops.py -> build/lib.linux-x86_64-3.7/horovod/mxnet
    creating build/lib.linux-x86_64-3.7/horovod/run
    copying horovod/run/__init__.py -> build/lib.linux-x86_64-3.7/horovod/run
    copying horovod/run/gloo_run.py -> build/lib.linux-x86_64-3.7/horovod/run
    copying horovod/run/js_run.py -> build/lib.linux-x86_64-3.7/horovod/run
    copying horovod/run/mpi_run.py -> build/lib.linux-x86_64-3.7/horovod/run
    copying horovod/run/run_task.py -> build/lib.linux-x86_64-3.7/horovod/run
    copying horovod/run/runner.py -> build/lib.linux-x86_64-3.7/horovod/run
    copying horovod/run/task_fn.py -> build/lib.linux-x86_64-3.7/horovod/run
    creating build/lib.linux-x86_64-3.7/horovod/spark
    copying horovod/spark/__init__.py -> build/lib.linux-x86_64-3.7/horovod/spark
    copying horovod/spark/gloo_run.py -> build/lib.linux-x86_64-3.7/horovod/spark
    copying horovod/spark/mpi_run.py -> build/lib.linux-x86_64-3.7/horovod/spark
    copying horovod/spark/runner.py -> build/lib.linux-x86_64-3.7/horovod/spark
    creating build/lib.linux-x86_64-3.7/horovod/tensorflow
    copying horovod/tensorflow/__init__.py -> build/lib.linux-x86_64-3.7/horovod/tensorflow
    copying horovod/tensorflow/compression.py -> build/lib.linux-x86_64-3.7/horovod/tensorflow
    copying horovod/tensorflow/mpi_ops.py -> build/lib.linux-x86_64-3.7/horovod/tensorflow
    copying horovod/tensorflow/util.py -> build/lib.linux-x86_64-3.7/horovod/tensorflow
    creating build/lib.linux-x86_64-3.7/horovod/torch
    copying horovod/torch/__init__.py -> build/lib.linux-x86_64-3.7/horovod/torch
    copying horovod/torch/compression.py -> build/lib.linux-x86_64-3.7/horovod/torch
    copying horovod/torch/mpi_ops.py -> build/lib.linux-x86_64-3.7/horovod/torch
    copying horovod/torch/sync_batch_norm.py -> build/lib.linux-x86_64-3.7/horovod/torch
    creating build/lib.linux-x86_64-3.7/horovod/run/common
    copying horovod/run/common/__init__.py -> build/lib.linux-x86_64-3.7/horovod/run/common
    creating build/lib.linux-x86_64-3.7/horovod/run/driver
    copying horovod/run/driver/__init__.py -> build/lib.linux-x86_64-3.7/horovod/run/driver
    copying horovod/run/driver/driver_service.py -> build/lib.linux-x86_64-3.7/horovod/run/driver
    creating build/lib.linux-x86_64-3.7/horovod/run/http
    copying horovod/run/http/__init__.py -> build/lib.linux-x86_64-3.7/horovod/run/http
    copying horovod/run/http/http_client.py -> build/lib.linux-x86_64-3.7/horovod/run/http
    copying horovod/run/http/http_server.py -> build/lib.linux-x86_64-3.7/horovod/run/http
    creating build/lib.linux-x86_64-3.7/horovod/run/task
    copying horovod/run/task/__init__.py -> build/lib.linux-x86_64-3.7/horovod/run/task
    copying horovod/run/task/task_service.py -> build/lib.linux-x86_64-3.7/horovod/run/task
    creating build/lib.linux-x86_64-3.7/horovod/run/util
    copying horovod/run/util/__init__.py -> build/lib.linux-x86_64-3.7/horovod/run/util
    copying horovod/run/util/cache.py -> build/lib.linux-x86_64-3.7/horovod/run/util
    copying horovod/run/util/lsf.py -> build/lib.linux-x86_64-3.7/horovod/run/util
    copying horovod/run/util/network.py -> build/lib.linux-x86_64-3.7/horovod/run/util
    copying horovod/run/util/threads.py -> build/lib.linux-x86_64-3.7/horovod/run/util
    creating build/lib.linux-x86_64-3.7/horovod/run/common/service
    copying horovod/run/common/service/__init__.py -> build/lib.linux-x86_64-3.7/horovod/run/common/service
    copying horovod/run/common/service/driver_service.py -> build/lib.linux-x86_64-3.7/horovod/run/common/service
    copying horovod/run/common/service/task_service.py -> build/lib.linux-x86_64-3.7/horovod/run/common/service
    creating build/lib.linux-x86_64-3.7/horovod/run/common/util
    copying horovod/run/common/util/__init__.py -> build/lib.linux-x86_64-3.7/horovod/run/common/util
    copying horovod/run/common/util/codec.py -> build/lib.linux-x86_64-3.7/horovod/run/common/util
    copying horovod/run/common/util/config_parser.py -> build/lib.linux-x86_64-3.7/horovod/run/common/util
    copying horovod/run/common/util/env.py -> build/lib.linux-x86_64-3.7/horovod/run/common/util
    copying horovod/run/common/util/host_hash.py -> build/lib.linux-x86_64-3.7/horovod/run/common/util
    copying horovod/run/common/util/network.py -> build/lib.linux-x86_64-3.7/horovod/run/common/util
    copying horovod/run/common/util/safe_shell_exec.py -> build/lib.linux-x86_64-3.7/horovod/run/common/util
    copying horovod/run/common/util/secret.py -> build/lib.linux-x86_64-3.7/horovod/run/common/util
    copying horovod/run/common/util/settings.py -> build/lib.linux-x86_64-3.7/horovod/run/common/util
    copying horovod/run/common/util/timeout.py -> build/lib.linux-x86_64-3.7/horovod/run/common/util
    copying horovod/run/common/util/tiny_shell_exec.py -> build/lib.linux-x86_64-3.7/horovod/run/common/util
    creating build/lib.linux-x86_64-3.7/horovod/spark/common
    copying horovod/spark/common/__init__.py -> build/lib.linux-x86_64-3.7/horovod/spark/common
    copying horovod/spark/common/_namedtuple_fix.py -> build/lib.linux-x86_64-3.7/horovod/spark/common
    copying horovod/spark/common/backend.py -> build/lib.linux-x86_64-3.7/horovod/spark/common
    copying horovod/spark/common/cache.py -> build/lib.linux-x86_64-3.7/horovod/spark/common
    copying horovod/spark/common/constants.py -> build/lib.linux-x86_64-3.7/horovod/spark/common
    copying horovod/spark/common/estimator.py -> build/lib.linux-x86_64-3.7/horovod/spark/common
    copying horovod/spark/common/params.py -> build/lib.linux-x86_64-3.7/horovod/spark/common
    copying horovod/spark/common/serialization.py -> build/lib.linux-x86_64-3.7/horovod/spark/common
    copying horovod/spark/common/store.py -> build/lib.linux-x86_64-3.7/horovod/spark/common
    copying horovod/spark/common/util.py -> build/lib.linux-x86_64-3.7/horovod/spark/common
    creating build/lib.linux-x86_64-3.7/horovod/spark/driver
    copying horovod/spark/driver/__init__.py -> build/lib.linux-x86_64-3.7/horovod/spark/driver
    copying horovod/spark/driver/driver_service.py -> build/lib.linux-x86_64-3.7/horovod/spark/driver
    copying horovod/spark/driver/job_id.py -> build/lib.linux-x86_64-3.7/horovod/spark/driver
    copying horovod/spark/driver/mpirun_rsh.py -> build/lib.linux-x86_64-3.7/horovod/spark/driver
    copying horovod/spark/driver/rsh.py -> build/lib.linux-x86_64-3.7/horovod/spark/driver
    creating build/lib.linux-x86_64-3.7/horovod/spark/keras
    copying horovod/spark/keras/__init__.py -> build/lib.linux-x86_64-3.7/horovod/spark/keras
    copying horovod/spark/keras/bare.py -> build/lib.linux-x86_64-3.7/horovod/spark/keras
    copying horovod/spark/keras/estimator.py -> build/lib.linux-x86_64-3.7/horovod/spark/keras
    copying horovod/spark/keras/optimizer.py -> build/lib.linux-x86_64-3.7/horovod/spark/keras
    copying horovod/spark/keras/remote.py -> build/lib.linux-x86_64-3.7/horovod/spark/keras
    copying horovod/spark/keras/tensorflow.py -> build/lib.linux-x86_64-3.7/horovod/spark/keras
    copying horovod/spark/keras/util.py -> build/lib.linux-x86_64-3.7/horovod/spark/keras
    creating build/lib.linux-x86_64-3.7/horovod/spark/task
    copying horovod/spark/task/__init__.py -> build/lib.linux-x86_64-3.7/horovod/spark/task
    copying horovod/spark/task/gloo_exec_fn.py -> build/lib.linux-x86_64-3.7/horovod/spark/task
    copying horovod/spark/task/mpirun_exec_fn.py -> build/lib.linux-x86_64-3.7/horovod/spark/task
    copying horovod/spark/task/task_info.py -> build/lib.linux-x86_64-3.7/horovod/spark/task
    copying horovod/spark/task/task_service.py -> build/lib.linux-x86_64-3.7/horovod/spark/task
    creating build/lib.linux-x86_64-3.7/horovod/spark/torch
    copying horovod/spark/torch/__init__.py -> build/lib.linux-x86_64-3.7/horovod/spark/torch
    copying horovod/spark/torch/estimator.py -> build/lib.linux-x86_64-3.7/horovod/spark/torch
    copying horovod/spark/torch/remote.py -> build/lib.linux-x86_64-3.7/horovod/spark/torch
    copying horovod/spark/torch/util.py -> build/lib.linux-x86_64-3.7/horovod/spark/torch
    creating build/lib.linux-x86_64-3.7/horovod/tensorflow/keras
    copying horovod/tensorflow/keras/__init__.py -> build/lib.linux-x86_64-3.7/horovod/tensorflow/keras
    copying horovod/tensorflow/keras/callbacks.py -> build/lib.linux-x86_64-3.7/horovod/tensorflow/keras
    creating build/lib.linux-x86_64-3.7/horovod/torch/mpi_lib
    copying horovod/torch/mpi_lib/__init__.py -> build/lib.linux-x86_64-3.7/horovod/torch/mpi_lib
    creating build/lib.linux-x86_64-3.7/horovod/torch/mpi_lib_impl
    copying horovod/torch/mpi_lib_impl/__init__.py -> build/lib.linux-x86_64-3.7/horovod/torch/mpi_lib_impl
    running build_ext
    gcc -pthread -B /home/jkoo/anaconda3/envs/dh/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -std=c++11 -fPIC -O3 -Wall -fassociative-math -ffast-math -ftree-vectorize -funsafe-math-optimizations -mavx -I/home/jkoo/anaconda3/envs/dh/include/python3.7m -c build/temp.linux-x86_64-3.7/test_compile/test_cpp_flags.cc -o build/temp.linux-x86_64-3.7/test_compile/test_cpp_flags.o
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
    gcc -pthread -shared -B /home/jkoo/anaconda3/envs/dh/compiler_compat -L/home/jkoo/anaconda3/envs/dh/lib -Wl,-rpath=/home/jkoo/anaconda3/envs/dh/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.7/test_compile/test_cpp_flags.o -o build/temp.linux-x86_64-3.7/test_compile/test_cpp_flags.so
    gcc -pthread -B /home/jkoo/anaconda3/envs/dh/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/jkoo/anaconda3/envs/dh/include/python3.7m -c build/temp.linux-x86_64-3.7/test_compile/test_link_flags.cc -o build/temp.linux-x86_64-3.7/test_compile/test_link_flags.o
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
    gcc -pthread -shared -B /home/jkoo/anaconda3/envs/dh/compiler_compat -L/home/jkoo/anaconda3/envs/dh/lib -Wl,-rpath=/home/jkoo/anaconda3/envs/dh/lib -Wl,--no-as-needed -Wl,--sysroot=/ -Wl,--version-script=horovod.lds build/temp.linux-x86_64-3.7/test_compile/test_link_flags.o -o build/temp.linux-x86_64-3.7/test_compile/test_link_flags.so
    Traceback (most recent call last):
      File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 341, in get_mpi_flags
        shlex.split(show_command), universal_newlines=True).strip()
      File "/home/jkoo/anaconda3/envs/dh/lib/python3.7/subprocess.py", line 411, in check_output
        **kwargs).stdout
      File "/home/jkoo/anaconda3/envs/dh/lib/python3.7/subprocess.py", line 488, in run
        with Popen(*popenargs, **kwargs) as process:
      File "/home/jkoo/anaconda3/envs/dh/lib/python3.7/subprocess.py", line 800, in __init__
        restore_signals, start_new_session)
      File "/home/jkoo/anaconda3/envs/dh/lib/python3.7/subprocess.py", line 1551, in _execute_child
        raise child_exception_type(errno_num, err_msg, err_filename)
    FileNotFoundError: [Errno 2] No such file or directory: 'mpicxx': 'mpicxx'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 622, in get_common_options
        mpi_flags = get_mpi_flags()
      File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 354, in get_mpi_flags
        '%s' % (show_command, traceback.format_exc()))
    distutils.errors.DistutilsPlatformError: mpicxx -show failed (see error below), is MPI in $PATH?
    Note: If your version of MPI has a custom command to show compilation flags, please specify it with the HOROVOD_MPICXX_SHOW environment variable.
    
    Traceback (most recent call last):
      File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 341, in get_mpi_flags
        shlex.split(show_command), universal_newlines=True).strip()
      File "/home/jkoo/anaconda3/envs/dh/lib/python3.7/subprocess.py", line 411, in check_output
        **kwargs).stdout
      File "/home/jkoo/anaconda3/envs/dh/lib/python3.7/subprocess.py", line 488, in run
        with Popen(*popenargs, **kwargs) as process:
      File "/home/jkoo/anaconda3/envs/dh/lib/python3.7/subprocess.py", line 800, in __init__
        restore_signals, start_new_session)
      File "/home/jkoo/anaconda3/envs/dh/lib/python3.7/subprocess.py", line 1551, in _execute_child
        raise child_exception_type(errno_num, err_msg, err_filename)
    FileNotFoundError: [Errno 2] No such file or directory: 'mpicxx': 'mpicxx'
    
    
    INFO: Cannot find MPI compilation flags, will skip compiling with MPI.
    INFO: Compiler /usr/bin/g++ (version 4.8.5 20150623 (Red Hat 4.8.5-36)) is not usable for this TensorFlow installation. Require g++ (version >=7.3.1 20180303, <999).
    INFO: Unable to build TensorFlow plugin, will skip it.
    
    Traceback (most recent call last):
      File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 1526, in build_extensions
        build_tf_extension(self, options)
      File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 989, in build_tf_extension
        'Could not find compiler compatible with this TensorFlow installation.\n'
    distutils.errors.DistutilsPlatformError: Could not find compiler compatible with this TensorFlow installation.
    Please check the Horovod website for recommended compiler versions.
    To force a specific compiler version, set CC and CXX environment variables.
    
    INFO: Unable to build PyTorch plugin, will skip it.
    
    Traceback (most recent call last):
      File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 1134, in check_torch_version
        import torch
    ModuleNotFoundError: No module named 'torch'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 1538, in build_extensions
        torch_version = check_torch_version()
      File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 1141, in check_torch_version
        'import torch failed, is it installed?\n\n%s' % traceback.format_exc())
    distutils.errors.DistutilsPlatformError: import torch failed, is it installed?
    
    Traceback (most recent call last):
      File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 1134, in check_torch_version
        import torch
    ModuleNotFoundError: No module named 'torch'
    
    
    -- The CXX compiler identification is GNU 4.8.5
    -- The C compiler identification is GNU 4.8.5
    -- Check for working CXX compiler: /usr/bin/c++
    -- Check for working CXX compiler: /usr/bin/c++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Check for working C compiler: /usr/bin/cc
    -- Check for working C compiler: /usr/bin/cc -- works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /tmp/pip-install-9o1ukjdt/horovod/build/temp.linux-x86_64-3.7/gloo/mxnet
    Scanning dependencies of target gloo
    [  3%] [  6%] [  9%] [ 12%] Building CXX object gloo/CMakeFiles/gloo.dir/algorithm.cc.o
    Building CXX object gloo/CMakeFiles/gloo.dir/allgather.cc.o
    Building CXX object gloo/CMakeFiles/gloo.dir/allreduce.cc.o
    Building CXX object gloo/CMakeFiles/gloo.dir/allgatherv.cc.o
    [ 16%] Building CXX object gloo/CMakeFiles/gloo.dir/allreduce_local.cc.o
    [ 19%] Building CXX object gloo/CMakeFiles/gloo.dir/barrier.cc.o
    [ 22%] Building CXX object gloo/CMakeFiles/gloo.dir/broadcast.cc.o
    [ 25%] Building CXX object gloo/CMakeFiles/gloo.dir/context.cc.o
    [ 29%] Building CXX object gloo/CMakeFiles/gloo.dir/gather.cc.o
    [ 32%] Building CXX object gloo/CMakeFiles/gloo.dir/reduce.cc.o
    [ 35%] Building CXX object gloo/CMakeFiles/gloo.dir/scatter.cc.o
    [ 38%] Building CXX object gloo/CMakeFiles/gloo.dir/types.cc.o
    [ 41%] Building CXX object gloo/CMakeFiles/gloo.dir/common/logging.cc.o
    [ 45%] Building CXX object gloo/CMakeFiles/gloo.dir/common/linux.cc.o
    [ 48%] [ 51%] Building CXX object gloo/CMakeFiles/gloo.dir/rendezvous/context.cc.o
    Building CXX object gloo/CMakeFiles/gloo.dir/rendezvous/file_store.cc.o
    [ 54%] Building CXX object gloo/CMakeFiles/gloo.dir/rendezvous/hash_store.cc.o
    [ 58%] Building CXX object gloo/CMakeFiles/gloo.dir/rendezvous/prefix_store.cc.o
    [ 61%] Building CXX object gloo/CMakeFiles/gloo.dir/rendezvous/store.cc.o
    [ 64%] Building CXX object gloo/CMakeFiles/gloo.dir/transport/address.cc.o
    [ 67%] Building CXX object gloo/CMakeFiles/gloo.dir/transport/buffer.cc.o
    [ 70%] Building CXX object gloo/CMakeFiles/gloo.dir/transport/context.cc.o
    [ 74%] Building CXX object gloo/CMakeFiles/gloo.dir/transport/device.cc.o
    [ 77%] [ 80%] Building CXX object gloo/CMakeFiles/gloo.dir/transport/pair.cc.o
    Building CXX object gloo/CMakeFiles/gloo.dir/transport/unbound_buffer.cc.o
    [ 83%] Building CXX object gloo/CMakeFiles/gloo.dir/transport/tcp/address.cc.o
    [ 87%] Building CXX object gloo/CMakeFiles/gloo.dir/transport/tcp/buffer.cc.o
    [ 90%] Building CXX object gloo/CMakeFiles/gloo.dir/transport/tcp/context.cc.o
    [ 93%] Building CXX object gloo/CMakeFiles/gloo.dir/transport/tcp/device.cc.o
    [ 96%] Building CXX object gloo/CMakeFiles/gloo.dir/transport/tcp/pair.cc.o
    [100%] Building CXX object gloo/CMakeFiles/gloo.dir/transport/tcp/unbound_buffer.cc.o
    Linking CXX static library /tmp/pip-install-9o1ukjdt/horovod/build/temp.linux-x86_64-3.7/lib/mxnet/libgloo.a
    [100%] Built target gloo
    INFO: Unable to build MXNet plugin, will skip it.
    
    Traceback (most recent call last):
      File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 91, in check_mx_version
        import mxnet as mx
    ModuleNotFoundError: No module named 'mxnet'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 1554, in build_extensions
        build_mx_extension(self, options)
      File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 1082, in build_mx_extension
        check_mx_version()
      File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 98, in check_mx_version
        'import mxnet failed, is it installed?\n\n%s' % traceback.format_exc())
    distutils.errors.DistutilsPlatformError: import mxnet failed, is it installed?
    
    Traceback (most recent call last):
      File "/tmp/pip-install-9o1ukjdt/horovod/setup.py", line 91, in check_mx_version
        import mxnet as mx
    ModuleNotFoundError: No module named 'mxnet'
    
    
    error: None of TensorFlow, PyTorch, or MXNet plugins were built. See errors above.
    ----------------------------------------
ERROR: Command errored out with exit status 1: /home/jkoo/anaconda3/envs/dh/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-9o1ukjdt/horovod/setup.py'"'"'; __file__='"'"'/tmp/pip-install-9o1ukjdt/horovod/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-lznddl7n/install-record.txt --single-version-externally-managed --compile --install-headers /home/jkoo/anaconda3/envs/dh/include/python3.7m/horovod Check the logs for full command output.

Desktop (please complete the following information):

  • OS: Linux fedora
  • System: JLSE, GCE
  • Python version: 3.7
  • DeepHyper Version: 0.1.11?

Running NAS: mpi4py import failed on Windows

Hi

I am wondering why your base example does not work.

python -m deephyper.search.nas.ppo_a3c_sync --problem deephyper.benchmark.nas.mnist1D.problem.Problem --run deephyper.search.nas.model.run.alpha.run

When I run it, the following error occurs:

from mpi4py import MPI

ImportError: DLL load failed: The specified module could not be found.

However, mpi4py was installed successfully.

Does anyone have any suggestions?
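A small check like the one below can separate "mpi4py is not installed" from "mpi4py is installed but its MPI runtime cannot be loaded". On Windows, mpi4py is built against Microsoft MPI, so one common cause of `DLL load failed` is a missing MS-MPI runtime (msmpi.dll). That is an assumption about the reporter's setup, not something confirmed in this thread, and the function name `diagnose_mpi4py` is hypothetical.

```python
import ctypes.util
import sys


def diagnose_mpi4py():
    """Try importing mpi4py.MPI and return (ok, message) with a hint on failure."""
    try:
        # Importing mpi4py.MPI loads the MPI shared library (and, by default, initializes MPI).
        from mpi4py import MPI
    except ImportError as exc:
        hint = ""
        # Assumption: on Windows the MS-MPI runtime must be discoverable as msmpi.dll.
        if sys.platform == "win32" and ctypes.util.find_library("msmpi") is None:
            hint = " msmpi.dll was not found on PATH; is the Microsoft MPI runtime installed?"
        return False, f"mpi4py import failed: {exc}.{hint}"
    return True, f"mpi4py OK, MPI library: {MPI.Get_library_version().strip()}"


if __name__ == "__main__":
    ok, message = diagnose_mpi4py()
    print(message)
```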

CLI --help flag does not work

I just tried deephyper --help, deephyper nas --help, and deephyper nas regevo --help as suggested in https://deephyper.readthedocs.io/en/latest/usage/command_line.html, and they all return (the standard) TensorFlow warnings and MPI errors instead of any help:

/usr/local/anaconda3/envs/frnn/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/anaconda3/envs/frnn/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/anaconda3/envs/frnn/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/anaconda3/envs/frnn/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/anaconda3/envs/frnn/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/anaconda3/envs/frnn/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
/usr/local/anaconda3/envs/frnn/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/anaconda3/envs/frnn/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/anaconda3/envs/frnn/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/anaconda3/envs/frnn/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/anaconda3/envs/frnn/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/anaconda3/envs/frnn/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(572)..............:
MPID_Init(224).....................: channel initialization failed
MPIDI_CH3_Init(105)................:
MPID_nem_init(324).................:
MPID_nem_tcp_init(178).............:
MPID_nem_tcp_get_business_card(425):
MPID_nem_tcp_init(384).............: gethostbyname failed, K-MBP-ANL.local (errno 1)
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=3191567
:
system msg for write_line failure : Bad file descriptor
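The MPI abort above most likely comes from `mpi4py.MPI` being imported (and therefore initializing MPI) as a side effect of loading the CLI, before argparse gets a chance to print help. One way to keep `--help` working is to defer that import into the code path that actually runs a search. The sketch below uses argparse with a hypothetical `nas` subcommand and `run_nas` handler; it is not DeepHyper's actual CLI layout.

```python
import argparse
import sys


def run_nas(args):
    # Deferred import: MPI is only initialized when a search actually runs,
    # so parsing arguments or printing --help never calls MPI_Init.
    from mpi4py import MPI  # noqa: F401
    print(f"running NAS with {args.algorithm}")
    return 0


def build_parser():
    parser = argparse.ArgumentParser(prog="deephyper")
    subparsers = parser.add_subparsers(dest="command")
    nas = subparsers.add_parser("nas", help="neural architecture search")
    nas.add_argument("algorithm", nargs="?", default="regevo",
                     help="search algorithm to use (hypothetical default)")
    nas.set_defaults(func=run_nas)
    return parser


def main(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)  # --help exits here, before any heavy import
    if not hasattr(args, "func"):
        parser.print_help()
        return 0
    return args.func(args)


if __name__ == "__main__":
    sys.exit(main())
```

The same idea applies to TensorFlow: keeping framework imports inside the handlers would also silence the FutureWarning noise when the user only asks for help.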

[BUG] Deephyper and Ray Cluster using GPUs on Cori

Describe the bug
I'm facing an issue when I try to run DeepHyper on a Ray cluster with GPUs allocated on Cori (NERSC).
I'm using Tuster to deploy everything, but when Tuster (most likely) submits the job with srun, I receive the following error about the --gres argument:

srun: error: Unable to create step for job X: Invalid generic resource (gres) specification

This is the sbatch script I'm submitting: Run_Batch_Ray_Bepop_GPU.zip

To Reproduce
Steps to reproduce the behavior:
module load cgpu
sbatch Run_Batch_Ray_Bepop_GPU.sh

Desktop (please complete the following information):

  • System: Cori dev with GPUs
  • Python version: 3.7.4
  • DeepHyper Version: 0.2.1
  • Tuster version: 0.0.1
  • Ray version: 0.7.6

Could you please point me to where I should look to debug and resolve this issue?

Add ability to log output quantities other than the objective

The objective of a problem is defined by the return value of run(). As far as I know, this is the only way that a quantity other than the search space dimensions will ever be logged to results.csv by the evaluators.

It might be useful to track other output metrics that are not directly optimized by the search algorithm. I could envision a counterpart to:

    def add_dim(self, p_name, p_space):
        """Add a dimension to the search space.

        Args:
            p_name (str): name of the parameter/dimension.
            p_space (Object): space corresponding to the new dimension.
        """
        self._space[p_name] = p_space

e.g., an add_output(out_name) method that declares a dictionary key that run() could write to.

If there is already a convenient way to do this in DeepHyper, please let me know.
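The proposal can be sketched as follows, assuming run() returns a dict holding the objective plus any declared outputs. The `Problem` class, `add_output`, `csv_header`, and `log_result` here are hypothetical illustrations of the requested feature, not DeepHyper's API; only `add_dim` mirrors the snippet quoted above.

```python
import csv


class Problem:
    """Hypothetical problem definition that also declares extra logged outputs."""

    def __init__(self):
        self._space = {}
        self._outputs = []

    def add_dim(self, p_name, p_space):
        """Add a dimension to the search space."""
        self._space[p_name] = p_space

    def add_output(self, out_name):
        """Declare a key that run() may fill in next to the objective."""
        self._outputs.append(out_name)

    def csv_header(self):
        # Column order: search dimensions, extra outputs, objective last.
        return list(self._space) + self._outputs + ["objective"]


def log_result(writer, problem, config, result):
    """Append one results row for a finished evaluation.

    `result` is assumed to be the dict returned by run(), e.g.
    {"objective": 0.93, "train_time": 12.4}; undeclared keys are ignored
    and missing declared outputs are written as empty cells.
    """
    row = [config[name] for name in problem._space]
    row += [result.get(name, "") for name in problem._outputs]
    row.append(result["objective"])
    writer.writerow(row)
```

With this design the search algorithm would still optimize only `result["objective"]`, while the evaluator writes the extra columns to results.csv untouched.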

[BUG] Aging Evolution Search Crash for NAS when one of the Variable Nodes has only one possible operation

Describe the bug

The search process crashes:

Uncaught Exception <class 'ValueError'>: 'a' cannot be empty unless no samples are taken
Traceback (most recent call last):
  File "/home/rmaulik/.conda/envs/ae_search_env/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/rmaulik/.conda/envs/ae_search_env/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/lus/theta-fs0/projects/datascience/rmaulik/AE_Search/deephyper_repo/deephyper/search/nas/regevo.py", line 160, in <module>
    search.main()
  File "/lus/theta-fs0/projects/datascience/rmaulik/AE_Search/deephyper_repo/deephyper/search/nas/regevo.py", line 105, in main
    child = self.copy_mutate_arch(parent)
  File "/lus/theta-fs0/projects/datascience/rmaulik/AE_Search/deephyper_repo/deephyper/search/nas/regevo.py", line 149, in copy_mutate_arch
    sample = np.random.choice(elements, 1)[0]
  File "mtrand.pyx", line 900, in numpy.random.mtrand.RandomState.choice
ValueError: 'a' cannot be empty unless no samples are taken
Application 22199611 exit codes: 1
Application 22199611 resources: utime ~36s, stime ~18s, Rss ~296852, inblocks ~18490, outblocks ~1416

Additional context

The issue comes from:

    def copy_mutate_arch(self, parent_arch: list) -> dict:
        """
        # ! Time performance is critical because called sequentially
        Args:
            parent_arch (list(int)): [description]
        Returns:
            dict: [description]
        """
        i = np.random.choice(len(parent_arch))
        child_arch = parent_arch[:]
        range_upper_bound = self.space_list[i][1]
        elements = [j for j in range(range_upper_bound + 1) if j != child_arch[i]]
        # The mutation has to create a different search_space!
        sample = np.random.choice(elements, 1)[0]
        child_arch[i] = sample
        cfg = self.pb_dict.copy()
        cfg["arch_seq"] = child_arch
        return cfg

So the line sample = np.random.choice(elements, 1)[0] fails because len(elements) == 0 (cf. the NumPy documentation), which means that one of the Variable Nodes of the search space has only one operation.
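One way to avoid the crash is to mutate only positions that offer at least two operations. The sketch below is a standalone function rather than a patch to the `copy_mutate_arch` method itself; it assumes, as the original code does, that `space_list[i][1]` is the inclusive upper bound of the choices at position `i` (counting from 0), and it returns the mutated list instead of the config dict.

```python
import numpy as np


def copy_mutate_arch(parent_arch, space_list, rng=None):
    """Return a copy of parent_arch with one mutable position changed.

    Positions whose variable node has a single operation (upper bound 0)
    are never selected, so the choice below is never made from an empty list.
    """
    rng = np.random.default_rng() if rng is None else rng
    child_arch = list(parent_arch)
    # Only positions with at least two options can take a different value.
    mutable = [i for i, (_, upper) in enumerate(space_list) if upper >= 1]
    if not mutable:
        # Every variable node has exactly one operation: nothing to mutate.
        return child_arch
    i = mutable[rng.integers(len(mutable))]
    upper = space_list[i][1]
    # The mutation must produce a different architecture.
    elements = [j for j in range(upper + 1) if j != child_arch[i]]
    child_arch[i] = int(rng.choice(elements))
    return child_arch
```

Returning an unchanged copy when nothing is mutable keeps the search alive; raising a clear error at that point would be an equally reasonable design.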

Benchmarks should load their data automatically

Some benchmarks, like deephyper.benchmark.hps.b2, do not check whether their data is already available locally and download it if it is not. Benchmarks serve different kinds of tests: software testing, algorithm efficiency, etc.
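A benchmark's data loader could follow a check-then-download pattern like the sketch below. The helper name `ensure_data` and its parameters are hypothetical, not part of deephyper.benchmark; the `download` argument defaults to `urllib.request.urlretrieve` and can be replaced, for example in tests or behind a proxy.

```python
import os
import urllib.request


def ensure_data(filename, url, data_dir, download=urllib.request.urlretrieve):
    """Return the local path of `filename`, downloading it from `url` only if missing."""
    os.makedirs(data_dir, exist_ok=True)
    path = os.path.join(data_dir, filename)
    if not os.path.exists(path):
        # First use on this machine: fetch the file into the cache directory.
        download(url, path)
    return path
```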

Unfriendly exception when given a nonexistent path

Executing:

python -m deephyper.search.hps.ambs --problem ../deephyper/deephyper/benchmark/hps/toyCond/problem.py --run ../deephyper/deephyper/benchmark/hps/toyCond/problem.py --evaluator subprocess

Stack trace example:

Uncaught exception <class 'TypeError'>: the 'package' argument is required to perform a relative import for '../deephyper/deephyper/benchmark/hps/toyCond/problem'
Traceback (most recent call last):
  File "/Users/romainegele/opt/anaconda3/envs/dh/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/romainegele/opt/anaconda3/envs/dh/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/romainegele/Documents/Argonne/deephyper/deephyper/search/hps/ambs.py", line 115, in <module>
    search = AMBS(**vars(args))
  File "/Users/romainegele/Documents/Argonne/deephyper/deephyper/search/hps/ambs.py", line 48, in __init__
    super().__init__(problem, run, evaluator, **kwargs)
  File "/Users/romainegele/Documents/Argonne/deephyper/deephyper/search/search.py", line 43, in __init__
    self.problem = util.generic_loader(problem, 'Problem')
  File "/Users/romainegele/Documents/Argonne/deephyper/deephyper/search/util.py", line 159, in generic_loader
    return load_attr_from(target)
  File "/Users/romainegele/Documents/Argonne/deephyper/deephyper/search/util.py", line 123, in load_attr_from
    module = import_module(str_module)
  File "/Users/romainegele/opt/anaconda3/envs/dh/lib/python3.7/importlib/__init__.py", line 122, in import_module
    raise TypeError(msg.format(name))
TypeError: the 'package' argument is required to perform a relative import for '../deephyper/deephyper/benchmark/hps/toyCond/problem'
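A friendlier loader could detect that the argument is a file path rather than a dotted module path, and fail with an explicit message when the file does not exist. The sketch below is a hypothetical replacement for the loading step, not DeepHyper's `load_attr_from`; it loads `.py` files with `importlib.util.spec_from_file_location` and everything else with `importlib.import_module`.

```python
import importlib
import importlib.util
import os


def load_attr(target, attr):
    """Load attribute `attr` from a dotted module path or from a .py file path."""
    looks_like_path = target.endswith(".py") or "/" in target or os.sep in target
    if looks_like_path:
        if not os.path.isfile(target):
            raise FileNotFoundError(
                f"No such file: {target!r} (expected an existing .py file "
                f"or a dotted module path such as 'package.module')"
            )
        spec = importlib.util.spec_from_file_location("_dh_user_module", target)
        if spec is None:
            raise ImportError(f"Cannot import {target!r}: not a Python source file")
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
    else:
        module = importlib.import_module(target)
    if not hasattr(module, attr):
        raise AttributeError(f"{target!r} does not define {attr!r}")
    return getattr(module, attr)
```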

autodoc not working with readthedocs

The autodoc directives are not producing any output in the documentation build. I have inspected the logs of the build process, which I attached to this issue:
build_logs.txt
It looks like this is due to a syntax error caused by our use of f-strings (f'...'), which require Python 3.6 or later. The interpreter selected in the Read the Docs settings is CPython 3.X.

DeepHyper with TF 2.0

Hello,

I would like to know whether DeepHyper could support TensorFlow 2.x.

Thanks,
FM

Feature importance based on tree interpreter

  1. First, I created a CSV like the following for all the operations; the last column is the reward.
    nas_result.csv

dim(4)[cell1],repeat(1)[cell1],attn(linear)[cell1],head(1)[cell1],aggr(mean)[cell1],update(mlp)[cell1],act(linear)[cell1],skip[link1],dim(16)[cell2],repeat(1)[cell2],attn(const)[cell2],head(4)[cell2],aggr(mean)[cell2],update(mlp)[cell2],act(linear)[cell2],skip[link2],skip[link3],dim(16)[cell3],repeat(1)[cell3],attn(gat)[cell3],head(1)[cell3],aggr(max)[cell3],update(mlp)[cell3],act(softplus)[cell3],connect[link4],skip[link5],skip[link6],GlobalSumPool(node),-0.3519964814186096
dim(16)[cell1],repeat(1)[cell1],attn(gen-linear)[cell1],head(1)[cell1],aggr(sum)[cell1],update(gru)[cell1],act(elu)[cell1],connect[link1],dim(16)[cell2],repeat(1)[cell2],attn(gcn)[cell2],head(4)[cell2],aggr(mean)[cell2],update(gru)[cell2],act(tanh)[cell2],connect[link2],connect[link3],dim(8)[cell3],repeat(1)[cell3],attn(gen-linear)[cell3],head(2)[cell3],aggr(max)[cell3],update(mlp)[cell3],act(relu)[cell3],skip[link4],connect[link5],connect[link6],GlobalAvgPool(node),-0.3885207176208496

  2. Then, I used the following function:
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler
from treeinterpreter import treeinterpreter as ti

def feature_importance(DATA_DIR, PLOT_DIR):
    # Each row: one architecture's operations (categorical columns) followed by its reward.
    df = pd.read_csv(os.path.join(DATA_DIR, 'nas_result.csv'), header=None)
    # One-hot encode every categorical column; keep numeric columns (the reward) as-is.
    df_new = pd.DataFrame()
    for i in range(df.shape[1]):
        if df.dtypes.iloc[i] == 'object':
            vals = pd.get_dummies(df.iloc[:, i])
        else:
            vals = df.iloc[:, i]
        df_new = pd.concat([df_new.reset_index(drop=True), vals.reset_index(drop=True)], axis=1)
    X = df_new.iloc[:, :-1]
    y = df_new.iloc[:, -1]
    # Standardize the reward so contributions are on a comparable scale.
    scaler = StandardScaler()
    y = scaler.fit_transform(y.values[..., np.newaxis]).squeeze()
    reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(X.values, y)
    # Decompose each prediction into a bias term plus per-feature contributions.
    prediction, bias, contributions = ti.predict(reg, X.values)
    # +1 where the operation is present, -1 where it is absent.
    mask = X.values.astype(float)
    mask[mask == 0] = -1
    importance = np.multiply(contributions, mask).mean(axis=0)
    importance = importance / np.max(np.abs(importance))
    indices = np.argsort(importance)[-5:]  # 5 most positive operations
    indices_neg = np.argsort(importance)[:5]  # 5 most negative operations
    plt.figure(figsize=(8, 3.5))
    plt.barh(range(5, 10), importance[indices], align='center')
    plt.barh(range(5), importance[indices_neg], align='center')
    plt.yticks(range(10), [X.columns[i] for i in indices_neg] + [X.columns[i] for i in indices])
    plt.xlabel('Relative Importance')
    plt.tight_layout()
    plt.savefig(os.path.join(PLOT_DIR, 'feature_importance.png'), dpi=300, bbox_inches='tight')
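The scoring step at the end of feature_importance can be isolated and checked on a toy input. In the sketch below (the helper name `signed_importance` is hypothetical), a contribution is kept as-is where the one-hot operation is present and sign-flipped where it is absent, so an operation whose absence lowers the predicted reward still counts in its favor. The result is normalized so the largest magnitude is 1.

```python
import numpy as np


def signed_importance(contributions, X):
    """Mean signed contribution per one-hot feature, scaled to [-1, 1].

    contributions: (n_samples, n_features) array, e.g. from treeinterpreter.
    X: (n_samples, n_features) one-hot matrix of the same shape.
    """
    mask = np.asarray(X, dtype=float).copy()
    mask[mask == 0] = -1.0  # absent operation: flip the sign of its contribution
    importance = (np.asarray(contributions, dtype=float) * mask).mean(axis=0)
    return importance / np.max(np.abs(importance))
```

For contributions [[0.2, -0.1], [0.4, 0.3]] with X = [[1, 0], [1, 1]], the signed values are [[0.2, 0.1], [0.4, 0.3]], their column means are [0.3, 0.2], and the normalized importances are [1.0, 0.667].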

[BUG] deephyper on laptop results in error about numpy.ndarray size

I tried this example from the documentation on my laptop:
deephyper hps ambs --problem deephyper.benchmark.hps.polynome2.Problem --run deephyper.benchmark.hps.polynome2.run

I got this error:

Traceback (most recent call last):
  File "/opt/anaconda3/envs/deephyper-Feb21/bin/deephyper", line 33, in <module>
    sys.exit(load_entry_point('deephyper', 'console_scripts', 'deephyper')())
  File "/opt/anaconda3/envs/deephyper-Feb21/bin/deephyper", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/opt/anaconda3/envs/deephyper-Feb21/lib/python3.7/site-packages/importlib_metadata/__init__.py", line 98, in load
    module = import_module(match.group('module'))
  File "/opt/anaconda3/envs/deephyper-Feb21/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/Users/blusch/deephyper/deephyper/core/cli/__init__.py", line 1, in <module>
    from deephyper.core.cli.cli import main
  File "/Users/blusch/deephyper/deephyper/core/cli/cli.py", line 9, in <module>
    from deephyper.core.cli import hps, nas, balsam_submit
  File "/Users/blusch/deephyper/deephyper/core/cli/hps.py", line 6, in <module>
    from deephyper.search.util import load_attr_from
  File "/Users/blusch/deephyper/deephyper/search/__init__.py", line 6, in <module>
    from deephyper.search.search import Search
  File "/Users/blusch/deephyper/deephyper/search/search.py", line 8, in <module>
    from deephyper.evaluator.evaluate import Evaluator
  File "/Users/blusch/deephyper/deephyper/evaluator/__init__.py", line 5, in <module>
    from deephyper.evaluator.evaluate import Encoder
  File "/Users/blusch/deephyper/deephyper/evaluator/evaluate.py", line 14, in <module>
    import skopt
  File "/opt/anaconda3/envs/deephyper-Feb21/lib/python3.7/site-packages/skopt/__init__.py", line 45, in <module>
    from . import callbacks
  File "/opt/anaconda3/envs/deephyper-Feb21/lib/python3.7/site-packages/skopt/callbacks.py", line 17, in <module>
    from skopt.utils import dump
  File "/opt/anaconda3/envs/deephyper-Feb21/lib/python3.7/site-packages/skopt/utils.py", line 19, in <module>
    from .sampler import Sobol, Lhs, Hammersly, Halton, Grid
  File "/opt/anaconda3/envs/deephyper-Feb21/lib/python3.7/site-packages/skopt/sampler/__init__.py", line 4, in <module>
    from .lhs import Lhs
  File "/opt/anaconda3/envs/deephyper-Feb21/lib/python3.7/site-packages/skopt/sampler/lhs.py", line 9, in <module>
    from ..space import Space, Categorical
  File "/opt/anaconda3/envs/deephyper-Feb21/lib/python3.7/site-packages/skopt/space/__init__.py", line 5, in <module>
    from .space import *
  File "/opt/anaconda3/envs/deephyper-Feb21/lib/python3.7/site-packages/skopt/space/space.py", line 27, in <module>
    import ConfigSpace as CS
  File "/opt/anaconda3/envs/deephyper-Feb21/lib/python3.7/site-packages/ConfigSpace/__init__.py", line 37, in <module>
    from ConfigSpace.configuration_space import Configuration,
  File "ConfigSpace/configuration_space.pyx", line 39, in init ConfigSpace.configuration_space
  File "ConfigSpace/hyperparameters.pyx", line 1, in init ConfigSpace.hyperparameters
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

To Reproduce
First I created a fresh conda environment to install deephyper:
conda create -n deephyper-Feb21 python=3.7
conda activate deephyper-Feb21
git clone https://github.com/deephyper/deephyper.git
cd deephyper/
pip install -e .

Then I ran this example from the deephyper documentation:
deephyper hps ambs --problem deephyper.benchmark.hps.polynome2.Problem --run deephyper.benchmark.hps.polynome2.run
(from this page: https://deephyper.readthedocs.io/en/latest/run/local.html)
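This `ValueError` usually points to a binary-compatibility mismatch: a compiled extension (here ConfigSpace, imported through skopt) was most likely built against a newer NumPy C API than the NumPy installed at runtime. Upgrading NumPy, or reinstalling the extension so that it is rebuilt against the installed NumPy, is the usual remedy, though the right fix depends on the environment. When reporting such errors, a small helper like the hypothetical `report_versions` below records the relevant versions (it falls back to the `importlib_metadata` backport on Python 3.7):

```python
try:
    from importlib.metadata import PackageNotFoundError, version
except ImportError:  # Python 3.7: use the importlib_metadata backport
    from importlib_metadata import PackageNotFoundError, version


def report_versions(packages=("numpy", "ConfigSpace", "scikit-optimize", "deephyper")):
    """Return {distribution name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


for name, ver in report_versions().items():
    print(f"{name}: {ver}")
```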


[BUG] mpirun not found when starting an AMBS job on ALCF's Cooley

Describe the bug
AMBS jobs fail on ALCF's Cooley within about a minute of starting. The log file shows the error "FileNotFoundError: [Errno 2] No such file or directory: 'mpirun'". Jobs run successfully on ALCF's Theta.

To Reproduce
Set up and submit an AMBS job on ALCF's Cooley using a command like the following:

deephyper balsam-submit hps currentopt-cooley-06 -p g4blopt-cooley2.currentopt.problem.Problem -r g4blopt-cooley2.currentopt.model_run.run -t 720 -q default -n 4 -A HiMB_Beamline -j serial

Expected behavior
The AMBS job runs and completes successfully.

Desktop (please complete the following information):

  • OS: Red Hat Enterprise Linux Server 7.8 (Maipo)
  • System: Cooley
  • Python version: 3.6
  • DeepHyper Version: 0.1.11

Log file contents

[BalsamDB: g4bldb-cooley] (dh-env-cooley2)[valetov@cooleylogin1 g4bl]$ cat g4bldb-cooley/log/launcher_2020-09-20_235849.log 

20-Sep-2020 23:58:49|1371|    INFO|balsam.launcher.launcher:442] Loading Balsam Launcher
20-Sep-2020 23:58:49|1371|    INFO|balsam.core.transitions:65] Starting 5 transition processes
20-Sep-2020 23:58:49|1371|    INFO|balsam.launcher.worker:53] Built 4 COOLEY workers
20-Sep-2020 23:58:49|1371|    INFO|balsam.launcher.launcher:381] Starting MPI Fork ensemble process:
mpirun -n 4 --ppn 1  --hosts cc066.cooley,cc108.cooley,cc112.cooley,cc039.cooley   /home/valetov/anaconda/x86_64/envs/dh-env-cooley2/bin/python /home/valetov/anaconda/x86_64/envs/dh-env-cooley2/lib/python3.6/site-packages/balsam/launcher/mpi_ensemble.py --time-limit-min=716.999999988079 --wf-name=currentopt-cooley-06
20-Sep-2020 23:58:49|1389|    INFO|balsam.core.transitions:207] EXIT_FLAG: exiting main loop
20-Sep-2020 23:58:50|1390|    INFO|balsam.core.transitions:207] EXIT_FLAG: exiting main loop
20-Sep-2020 23:58:50|1388|    INFO|balsam.core.transitions:207] EXIT_FLAG: exiting main loop
20-Sep-2020 23:58:50|1387|    INFO|balsam.core.transitions:207] EXIT_FLAG: exiting main loop
20-Sep-2020 23:58:50|1386|    INFO|balsam.core.transitions:207] EXIT_FLAG: exiting main loop
20-Sep-2020 23:58:50|1371|    INFO|balsam.core.transitions:78] All Transition processes joined: done.
20-Sep-2020 23:58:50|1371|    INFO|balsam.launcher.launcher:428] Exit: Launcher exit graceful



20-Sep-2020 23:58:50|1371|   ERROR|balsam:47] Uncaught Exception <class 'FileNotFoundError'>: [Errno 2] No such file or directory: 'mpirun'
Traceback (most recent call last):
  File "/home/valetov/anaconda/x86_64/envs/dh-env-cooley2/lib/python3.6/site-packages/balsam/launcher/launcher.py", line 443, in <module>
    main(args)
  File "/home/valetov/anaconda/x86_64/envs/dh-env-cooley2/lib/python3.6/site-packages/balsam/launcher/launcher.py", line 423, in main
    launcher.run()
  File "/home/valetov/anaconda/x86_64/envs/dh-env-cooley2/lib/python3.6/site-packages/balsam/launcher/launcher.py", line 389, in run
    shell=False)
  File "/home/valetov/anaconda/x86_64/envs/dh-env-cooley2/lib/python3.6/subprocess.py", line 707, in __init__
    restore_signals, start_new_session)
  File "/home/valetov/anaconda/x86_64/envs/dh-env-cooley2/lib/python3.6/subprocess.py", line 1333, in _execute_child
    raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'mpirun'
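`subprocess.Popen` raises `FileNotFoundError` when the executable it is asked to start (here `mpirun`) cannot be found on the `PATH` of the launching process, so the Balsam launcher environment on Cooley apparently has no MPI launcher on its `PATH`. A minimal pre-flight check using only the standard library; the helper name and the candidate list are assumptions:

```python
import shutil


def find_mpi_launcher(candidates=("mpirun", "mpiexec")):
    """Return the full path of the first launcher found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


launcher = find_mpi_launcher()
if launcher is None:
    print("No MPI launcher on PATH; set up MPI in the job environment before submitting.")
else:
    print("Using", launcher)
```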

Issue with relative "Problem" import in HPS demo

I've been working through the HPS demo and have run into an issue with relative imports when trying to run the scan. Executing

deephyper hps ambs --problem polynome2.problem.Problem --run polynome2.model_run.run

Throws: ModuleNotFoundError: No module named 'polynome2'

I can, however, get the HPS to run if I instead give the explicit full path:

deephyper hps ambs --problem $(PATH_TO_DEMO)/demo/demo/polynome2/problem.py --run $(PATH_TO_DEMO)/demo/demo/polynome2/model_run.py

Hopefully this isn't something obvious I'm missing; I didn't see any other issues or notes raised about this. Thanks!


Full traceback:

Uncaught exception <class 'deephyper.core.exceptions.loading.GenericLoaderError'>: Traceback (most recent call last):
  File "/gpfs/mira-home/tburch/dh-env/lib/python3.7/site-packages/deephyper/search/util.py", line 164, in generic_loader
    attr = load_attr_from(target)
  File "/gpfs/mira-home/tburch/dh-env/lib/python3.7/site-packages/deephyper/search/util.py", line 124, in load_attr_from
    module = importlib.import_module(str_module)
  File "/gpfs/mira-home/tburch/dh-env/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
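A dotted name such as `polynome2.problem.Problem` can only be imported if the directory that contains the `polynome2` package is on `sys.path`. A console-script entry point like `deephyper` puts its own `bin/` directory at the front of `sys.path`, not the current working directory, which would explain the `ModuleNotFoundError`; adding the demo directory to `PYTHONPATH` is a common workaround. A minimal sketch of the loading logic, with a hypothetical `load_attr` helper (not DeepHyper's `load_attr_from`):

```python
import importlib
import sys


def load_attr(dotted_name, search_dir=None):
    """Import 'package.module.attr' and return attr.

    If search_dir is given it is put at the front of sys.path first, which is
    what exporting PYTHONPATH to that directory achieves from the shell.
    """
    if search_dir is not None and str(search_dir) not in sys.path:
        sys.path.insert(0, str(search_dir))
    importlib.invalidate_caches()  # make sure freshly created files are seen
    module_name, _, attr_name = dotted_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)
```

From the shell, the equivalent would be running `export PYTHONPATH=$PWD:$PYTHONPATH` in the directory that contains `polynome2` before calling `deephyper hps ambs`.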

Not working HPS Example

I am running the following example:
https://deephyper.readthedocs.io/en/latest/benchmark/hps.html

I put problem.py and model_ren.py in the same folder, called Test. When I run the code, only model_ren.py works properly, and it uses the default configuration. The issue is that problem.py is not linked to model_ren.py. Following the authors' suggested os-based import, I also gave the path to the file as follows:

import os
import sys
# raw string (r'...') keeps the Windows backslashes literal
here = os.path.dirname(os.path.abspath(r'C:\Users\Matt\deephyper-master\deephyper\benchmark\nas\Test'))
top = os.path.dirname(os.path.dirname(os.path.dirname(here)))
sys.path.append(top)
BNAME = os.path.splitext(os.path.basename('problem'))[0]

But nothing happens.
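Windows paths need care in Python: in an ordinary string literal a backslash starts an escape sequence (`\U` begins a Unicode escape and is a syntax error, `\b` is a backspace, `\n` is a newline), so raw strings are safer. The sketch below uses `ntpath`, which is `os.path` on Windows and importable on any platform, to show what the chain of `dirname` calls resolves to for the path in the report:

```python
import ntpath
from pathlib import PureWindowsPath

# Raw string: every backslash is kept literally
raw = r"C:\Users\Matt\deephyper-master\deephyper\benchmark\nas\Test"
# Ordinary literal: the "\n" at the start of "\nas" becomes a newline character
escaped = "C:\\deephyper\\benchmark\nas"

path = PureWindowsPath(raw)
here = ntpath.dirname(raw)  # parent of Test -> ...\benchmark\nas
top = ntpath.dirname(ntpath.dirname(ntpath.dirname(here)))  # three levels further up
print(path.name, here, top, sep="\n")
```

Note also that the usual form of this idiom computes `here` from `__file__` (the location of the file itself), so passing a directory instead shifts every `dirname` result up by one level.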

[BUG] Unable to run NAS using the commands from the readme file

Describe the bug
I followed the instructions in the Quickstart section of README.md. However, running the following command generated an error:

deephyper nas ambs --evaluator ray --problem deephyper.benchmark.nas.polynome2Reg.Problem --n-jobs 1

The error suggests that there is no dense_skipco module in this repository:

  File "/home/hyliu/anl/work/deephyper/deephyper/benchmark/nas/polynome2Reg/__init__.py", line 1, in <module>
    from deephyper.benchmark.nas.polynome2Reg.problem import Problem
  File "/home/hyliu/anl/work/deephyper/deephyper/benchmark/nas/polynome2Reg/problem.py", line 3, in <module>
    from deephyper.nas.space.dense_skipco import create_search_space
ModuleNotFoundError: No module named 'deephyper.nas.space.dense_skipco'
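Before launching a search, it can help to check whether a module that `problem.py` imports actually exists in the installed checkout. A minimal sketch with the standard library's `importlib.util.find_spec` (the helper name is hypothetical); `find_spec` imports the parent packages and raises `ModuleNotFoundError` if one of them is missing, hence the `try`:

```python
import importlib.util


def module_available(name):
    """Return True if `name` can be found, without executing the module itself."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # A parent package (e.g. 'deephyper.nas.space') is itself missing
        return False
```

For example, `module_available("deephyper.nas.space.dense_skipco")` returns `False` in a checkout where that module has been removed or renamed.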

I tried another command but ran into a different error:

deephyper nas ambs --evaluator subprocess --problem deephyper.benchmark.nas.linearReg.Problem

and the error message:

(test001) [hyliu@in02 deephyper]$ deephyper nas ambs --evaluator subprocess --problem deephyper.benchmark.nas.linearReg.Problem
2021-01-26 23:20:07.103393: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
 ************************************************************************
   Maximizing the return value of function: deephyper.nas.run.alpha.run
 ************************************************************************
Uncaught exception <class 'TypeError'>: __init__() got multiple values for argument 'num_workers'
Traceback (most recent call last):
  File "/home/hyliu/miniconda/miniconda3/envs/test001/bin/deephyper", line 33, in <module>
    sys.exit(load_entry_point('deephyper', 'console_scripts', 'deephyper')())
  File "/home/hyliu/anl/work/deephyper/deephyper/core/cli/cli.py", line 42, in main
    args.func(**vars(args))
  File "/home/hyliu/anl/work/deephyper/deephyper/core/cli/nas.py", line 37, in main
    search_obj = search_cls(**kwargs)
  File "/home/hyliu/anl/work/deephyper/deephyper/search/nas/ambs.py", line 69, in __init__
    **kwargs,
TypeError: __init__() got multiple values for argument 'num_workers'
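Python raises this `TypeError` when one parameter receives a value twice, typically once positionally and once through `**kwargs`; since the CLI forwards the parsed options as keyword arguments (`search_cls(**kwargs)` in the traceback), a `num_workers` entry can collide with a value already bound to that parameter. The classes below are hypothetical and only reproduce the mechanism; they are not DeepHyper's `AMBS` code:

```python
class BaseSearch:
    """Hypothetical base class; 'num_workers' is its second positional parameter."""

    def __init__(self, problem, num_workers=1, **kwargs):
        self.problem = problem
        self.num_workers = num_workers
        self.extra = kwargs


class BrokenSearch(BaseSearch):
    def __init__(self, problem, evaluator, **kwargs):
        # 'evaluator' fills the 'num_workers' slot positionally while kwargs
        # also carries 'num_workers' -> TypeError: got multiple values
        super().__init__(problem, evaluator, **kwargs)


class FixedSearch(BaseSearch):
    def __init__(self, problem, evaluator, **kwargs):
        # Pass arguments by keyword and forward 'num_workers' exactly once
        num_workers = kwargs.pop("num_workers", 1)
        super().__init__(problem, num_workers=num_workers, evaluator=evaluator, **kwargs)
```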

To Reproduce
To set up the environment, I took these steps:

  1. conda create --name test001 python=3.7
  2. conda activate test001
  3. mkdir work && cd work
  4. git clone git@github.com:deephyper/deepspace.git
  5. git clone git@github.com:deephyper/deephyper.git
  6. pip install -e deephyper
  7. pip install -e deepspace

Then, run the two commands to generate the error messages above:

deephyper nas ambs --evaluator ray --problem deephyper.benchmark.nas.polynome2Reg.Problem --n-jobs 1
deephyper nas ambs --evaluator subprocess --problem deephyper.benchmark.nas.linearReg.Problem

Expected behavior
The framework runs NAS on both problems.


Desktop (please complete the following information):

  • OS: Red Hat Enterprise Linux Server release 7.2
  • Python version 3.7.9
  • DeepHyper Version a267505

pip list

Package                  Version             Location
------------------------ ------------------- ------------------------------------
absl-py                  0.11.0
aiohttp                  3.7.3
aiohttp-cors             0.7.0
aioredis                 1.3.1
antlr4-python3-runtime   4.7.2
astunparse               1.6.3
async-timeout            3.0.1
attrs                    19.1.0
blessings                1.7
cachetools               4.2.1
certifi                  2020.12.5
chardet                  3.0.4
click                    7.1.2
cloudpickle              1.6.0
colorama                 0.4.4
colorful                 0.5.4
ConfigSpace              0.4.16
cycler                   0.10.0
Cython                   0.29.21
deap                     1.3.1
decorator                4.4.0
deephyper                0.2.1               /home/hyliu/anl/work/deephyper
deepspace                0.0.3               /home/hyliu/anl/work/deepspace
dh-scikit-optimize       0.8.2
filelock                 3.0.12
future                   0.18.2
gast                     0.3.3
google-api-core          1.25.1
google-auth              1.24.0
google-auth-oauthlib     0.4.2
google-pasta             0.2.0
googleapis-common-protos 1.52.0
gpustat                  0.6.0
graphviz                 0.8.4
grpcio                   1.35.0
gym                      0.18.0
h5py                     2.10.0
hiredis                  1.1.0
idna                     2.10
importlib-metadata       3.4.0
Jinja2                   2.11.2
joblib                   1.0.0
jsonschema               3.2.0
Keras                    2.4.3
Keras-Preprocessing      1.1.2
kiwisolver               1.1.0
liac-arff                2.5.0
Markdown                 3.3.3
MarkupSafe               1.1.1
matplotlib               3.1.0
msgpack                  1.0.2
multidict                5.1.0
mxnet                    1.4.1
mypy                     0.710
mypy-extensions          0.4.1
networkx                 2.5
numpy                    1.18.5
nvidia-ml-py3            7.352.0
oauthlib                 3.1.0
opencensus               0.7.12
opencensus-context       0.1.2
openml                   0.10.2
opt-einsum               3.3.0
orderedset               2.0.1
pandas                   0.24.2
Pillow                   7.2.0
pip                      20.3.3
prometheus-client        0.9.0
protobuf                 3.14.0
psutil                   5.6.3
py-spy                   0.3.4
pyaml                    20.4.0
pyasn1                   0.4.8
pyasn1-modules           0.2.8
pydot                    1.4.1
pyglet                   1.5.0
pyparsing                2.4.0
pyrsistent               0.17.3
python-dateutil          2.8.0
pytz                     2019.1
PyYAML                   5.4.1
ray                      1.1.0
redis                    3.5.3
requests                 2.25.1
requests-oauthlib        1.3.0
rsa                      4.7
scikit-learn             0.24.1
scipy                    1.3.0
setuptools               52.0.0.post20210125
six                      1.15.0
tensorboard              2.4.1
tensorboard-plugin-wit   1.8.0
tensorflow               2.3.2
tensorflow-estimator     2.3.0
termcolor                1.1.0
threadpoolctl            2.1.0
tornado                  6.0.2
tqdm                     4.56.0
typed-ast                1.4.0
typeguard                2.10.0
typing-extensions        3.7.4.3
urllib3                  1.26.3
Werkzeug                 1.0.1
wheel                    0.36.2
wrapt                    1.12.1
xgboost                  0.90
xmltodict                0.12.0
yarl                     1.6.3
zipp                     3.4.0

conda list

# packages in environment at /home/hyliu/miniconda/miniconda3/envs/test001:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
absl-py                   0.11.0                   pypi_0    pypi
aiohttp                   3.7.3                    pypi_0    pypi
aiohttp-cors              0.7.0                    pypi_0    pypi
aioredis                  1.3.1                    pypi_0    pypi
astunparse                1.6.3                    pypi_0    pypi
async-timeout             3.0.1                    pypi_0    pypi
blessings                 1.7                      pypi_0    pypi
ca-certificates           2021.1.19            h06a4308_0  
cachetools                4.2.1                    pypi_0    pypi
certifi                   2020.12.5        py37h06a4308_0  
chardet                   3.0.4                    pypi_0    pypi
click                     7.1.2                    pypi_0    pypi
cloudpickle               1.6.0                    pypi_0    pypi
colorama                  0.4.4                    pypi_0    pypi
colorful                  0.5.4                    pypi_0    pypi
configspace               0.4.16                   pypi_0    pypi
cython                    0.29.21                  pypi_0    pypi
deap                      1.3.1                    pypi_0    pypi
deephyper                 0.2.1                     dev_0    <develop>
deepspace                 0.0.3                     dev_0    <develop>
dh-scikit-optimize        0.8.2                    pypi_0    pypi
filelock                  3.0.12                   pypi_0    pypi
future                    0.18.2                   pypi_0    pypi
gast                      0.3.3                    pypi_0    pypi
google-api-core           1.25.1                   pypi_0    pypi
google-auth               1.24.0                   pypi_0    pypi
google-auth-oauthlib      0.4.2                    pypi_0    pypi
google-pasta              0.2.0                    pypi_0    pypi
googleapis-common-protos  1.52.0                   pypi_0    pypi
gpustat                   0.6.0                    pypi_0    pypi
grpcio                    1.35.0                   pypi_0    pypi
gym                       0.18.0                   pypi_0    pypi
h5py                      2.10.0                   pypi_0    pypi
hiredis                   1.1.0                    pypi_0    pypi
idna                      2.10                     pypi_0    pypi
importlib-metadata        3.4.0                    pypi_0    pypi
jinja2                    2.11.2                   pypi_0    pypi
joblib                    1.0.0                    pypi_0    pypi
jsonschema                3.2.0                    pypi_0    pypi
keras                     2.4.3                    pypi_0    pypi
keras-preprocessing       1.1.2                    pypi_0    pypi
ld_impl_linux-64          2.33.1               h53a641e_7  
liac-arff                 2.5.0                    pypi_0    pypi
libedit                   3.1.20191231         h14c3975_1  
libffi                    3.3                  he6710b0_2  
libgcc-ng                 9.1.0                hdf63c60_0  
libstdcxx-ng              9.1.0                hdf63c60_0  
markdown                  3.3.3                    pypi_0    pypi
markupsafe                1.1.1                    pypi_0    pypi
msgpack                   1.0.2                    pypi_0    pypi
multidict                 5.1.0                    pypi_0    pypi
ncurses                   6.2                  he6710b0_1  
networkx                  2.5                      pypi_0    pypi
numpy                     1.18.5                   pypi_0    pypi
nvidia-ml-py3             7.352.0                  pypi_0    pypi
oauthlib                  3.1.0                    pypi_0    pypi
opencensus                0.7.12                   pypi_0    pypi
opencensus-context        0.1.2                    pypi_0    pypi
openml                    0.10.2                   pypi_0    pypi
openssl                   1.1.1i               h27cfd23_0  
opt-einsum                3.3.0                    pypi_0    pypi
pillow                    7.2.0                    pypi_0    pypi
pip                       20.3.3           py37h06a4308_0  
prometheus-client         0.9.0                    pypi_0    pypi
protobuf                  3.14.0                   pypi_0    pypi
py-spy                    0.3.4                    pypi_0    pypi
pyaml                     20.4.0                   pypi_0    pypi
pyasn1                    0.4.8                    pypi_0    pypi
pyasn1-modules            0.2.8                    pypi_0    pypi
pydot                     1.4.1                    pypi_0    pypi
pyglet                    1.5.0                    pypi_0    pypi
pyrsistent                0.17.3                   pypi_0    pypi
python                    3.7.9                h7579374_0  
pyyaml                    5.4.1                    pypi_0    pypi
ray                       1.1.0                    pypi_0    pypi
readline                  8.0                  h7b6447c_0  
redis                     3.5.3                    pypi_0    pypi
requests                  2.25.1                   pypi_0    pypi
requests-oauthlib         1.3.0                    pypi_0    pypi
rsa                       4.7                      pypi_0    pypi
scikit-learn              0.24.1                   pypi_0    pypi
setuptools                52.0.0           py37h06a4308_0  
six                       1.15.0                   pypi_0    pypi
sqlite                    3.33.0               h62c20be_0  
tensorboard               2.4.1                    pypi_0    pypi
tensorboard-plugin-wit    1.8.0                    pypi_0    pypi
tensorflow                2.3.2                    pypi_0    pypi
tensorflow-estimator      2.3.0                    pypi_0    pypi
termcolor                 1.1.0                    pypi_0    pypi
threadpoolctl             2.1.0                    pypi_0    pypi
tk                        8.6.10               hbc83047_0  
tqdm                      4.56.0                   pypi_0    pypi
typeguard                 2.10.0                   pypi_0    pypi
typing-extensions         3.7.4.3                  pypi_0    pypi
urllib3                   1.26.3                   pypi_0    pypi
werkzeug                  1.0.1                    pypi_0    pypi
wheel                     0.36.2             pyhd3eb1b0_0  
wrapt                     1.12.1                   pypi_0    pypi
xmltodict                 0.12.0                   pypi_0    pypi
xz                        5.2.5                h7b6447c_0  
yarl                      1.6.3                    pypi_0    pypi
zipp                      3.4.0                    pypi_0    pypi
zlib                      1.2.11               h7b6447c_3  


Thank you so much for your help!

Best regards,
Hongyuan Liu

[BUG] Post-training pipeline raises KeyError: 'post_train'

Encountered by @Romit-Maulik.

Edit by @Romit-Maulik to fill in details.

Describe the bug

Uncaught exception <class 'KeyError'>: 'post_train'
Traceback (most recent call last):
  File "/lus/theta-fs0/projects/datascience/rmaulik/AE_Search/deephyper_repo/deephyper/evaluator/runner.py", line 36, in <module>
    retval = func(d)
  File "/lus/theta-fs0/projects/datascience/rmaulik/AE_Search/deephyper_repo/deephyper/post/pipeline.py", line 47, in train
    repeat = config["post_train"]["repeat"]
KeyError: 'post_train'
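The traceback shows `pipeline.py` reading `config["post_train"]["repeat"]` directly, so any configuration that reaches the post-training step without a `post_train` entry fails this way. A minimal sketch of a more tolerant lookup; the helper name and the default of one repetition are assumptions for illustration, not DeepHyper's documented behaviour (failing loudly with a clearer message may be preferable):

```python
def get_post_train_repeat(config, default=1):
    """Read config['post_train']['repeat'], falling back to `default`.

    `default=1` is an assumption for illustration only.
    """
    post_train = config.get("post_train") or {}
    return post_train.get("repeat", default)
```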

To Reproduce
After a successful NAS run, I followed the steps in the NAS post-training documentation. Briefly, these are:

  1. Create a new app for post-training:
balsam app --name POST --exe "$(which python) -m deephyper.post.train"
  2. Find the best architectures within a Balsam database:
deephyper-analytics json best -n 100 -p /home/rmaulik/project_link/AE_Search/swe_ae_search_db/data/swe_ensemble_0/swe_ensemble_0_0022b977/swe_ensemble_0_2021-03-06_21.json
  3. Create and submit a post-training job:
balsam job --name post_ensemble_0 --workflow post_ensemble_0 --app POST --args '--evaluator balsam --problem swe_ae_search.swe_ensemble_0.problem.Problem --fbest /home/rmaulik/project_link/AE_Search/swe_ae_search_db/data/swe_ensemble_0/swe_ensemble_0_0022b977/best_archs.json'

and

balsam submit-launch -n 8 -q debug-cache-quad -t 60 -A datascience --job-mode mpi --wf-filter post_ensemble_0

Expected behavior
A post-training run of the best architectures, with their metrics written to /home/rmaulik/project_link/AE_Search/swe_ae_search_db/data/post_ensemble_0. It would also be desirable to save the trained models as *.h5 files in the individual task subdirectories.

System
This was performed on Theta KNL.

Update documentation for NAS Analytics on Theta

Currently, the documentation only shows the following code snippet for analytics:

deephyper-analytics json -h
usage: deephyper-analytics json [-h] {best} ...

positional arguments:
  {best}      Kind of analytics.
    best      Select the best n search_spaces and save them into a JSON file.

whereas what is actually needed is something like

deephyper-analytics json best -n 100 -p json_file_name.json

This may confuse new users.
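The confusion presumably comes from nested subcommands: `deephyper-analytics json -h` only lists the `{best}` choice, and options such as `-n` and `-p` only appear under `deephyper-analytics json best -h`. A hypothetical `argparse` mock of the command shape quoted above illustrates this; it is not DeepHyper's implementation, and the option descriptions and the default for `-n` are assumptions:

```python
import argparse


def build_parser():
    """Mock of 'deephyper-analytics json best -n N -p FILE' (hypothetical)."""
    parser = argparse.ArgumentParser(prog="deephyper-analytics")
    commands = parser.add_subparsers(dest="command")
    json_cmd = commands.add_parser("json")
    kinds = json_cmd.add_subparsers(dest="kind", help="Kind of analytics.")
    best = kinds.add_parser(
        "best", help="Select the best n search_spaces and save them into a JSON file."
    )
    # These options are listed by 'json best -h', not by 'json -h'
    best.add_argument("-n", type=int, default=10, help="number of search spaces to keep (assumed)")
    best.add_argument("-p", dest="path", required=True, help="JSON file of search results (assumed)")
    return parser


args = build_parser().parse_args(["json", "best", "-n", "100", "-p", "json_file_name.json"])
print(args.kind, args.n, args.path)
```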

Another issue is this command

deephyper-analytics parse deephyper.log

should be replaced with

deephyper-analytics parse $(pwd)/deephyper.log

It should also be ensured that the Balsam database is active before running this command, so that the workload is computed correctly.

[BUG] Error when a starting point is given with constraints

Describe the bug

  • Integer parameters are defined, constraints (conditions) are set among them, and a starting point is provided:
nhid = Problem.add_hyperparameter(name='Nhid', value=(1,100),default_value=50) # Number of hidden neurons, used only by N2 and N3
nhid2 = Problem.add_hyperparameter(name='Nhid2', value=(1,100),default_value=None) # Number of hidden neurons, used only by N3 
Problem.add_condition(cs.InCondition(child=nhid, parent=network, values=['N2','N3']))
Problem.add_condition(cs.InCondition(child=nhid2, parent=network, values=['N3']))
Problem.add_starting_point(
    network='N2',
    Nhid=50,
    Nhid2=None,
...

  • When running DeepHyper with a Ray cluster, the following error occurred after the "Requested eval x:" log line:

2021-03-07 14:11:58|23998|INFO|deephyper.evaluator.evaluate:293] New eval finished: {"batch_size": 100, ..., "Nhid": 50, "Nhid2": null, "seed": 4247772107} --> 0.8573999999999999
2021-03-07 14:11:58|23998|INFO|deephyper.evaluator.evaluate:304] Requested eval x: {'batch_size': 100, ..., 'Nhid': 50, 'Nhid2': None, 'seed': 4247772107} y: 0.8573999999999999
2021-03-07 14:11:58|23998|INFO|deephyper.search.hps.ambs:141] >>> {"type": "env_stats", "num_cache_used": 0}
2021-03-07 14:11:58|23998|ERROR|deephyper:72] Uncaught exception:
Traceback (most recent call last):
  File "/blues/gpfs/home/jkoo/.conda/envs/dh-env2/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/blues/gpfs/home/jkoo/.conda/envs/dh-env2/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/blues/gpfs/home/jkoo/.conda/envs/dh-env2/lib/python3.7/site-packages/deephyper/search/hps/ambs.py", line 231, in <module>
    search.main()
  File "/blues/gpfs/home/jkoo/.conda/envs/dh-env2/lib/python3.7/site-packages/deephyper/search/hps/ambs.py", line 156, in main
    new_X = self.opt.ask(n_points=len(new_results))
  File "/blues/gpfs/home/jkoo/.conda/envs/dh-env2/lib/python3.7/site-packages/skopt/optimizer/optimizer.py", line 443, in ask
    opt._tell(x, y_lie)
  File "/blues/gpfs/home/jkoo/.conda/envs/dh-env2/lib/python3.7/site-packages/skopt/optimizer/optimizer.py", line 573, in _tell
    est.fit(Xtt, self.yi)
  File "/blues/gpfs/home/jkoo/.conda/envs/dh-env2/lib/python3.7/site-packages/sklearn/ensemble/_forest.py", line 305, in fit
    accept_sparse="csc", dtype=DTYPE)
  File "/blues/gpfs/home/jkoo/.conda/envs/dh-env2/lib/python3.7/site-packages/sklearn/base.py", line 433, in _validate_data
    X, y = check_X_y(X, y, **check_params)
  File "/blues/gpfs/home/jkoo/.conda/envs/dh-env2/lib/python3.7/site-packages/sklearn/utils/validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "/blues/gpfs/home/jkoo/.conda/envs/dh-env2/lib/python3.7/site-packages/sklearn/utils/validation.py", line 821, in check_X_y
    estimator=estimator)
  File "/blues/gpfs/home/jkoo/.conda/envs/dh-env2/lib/python3.7/site-packages/sklearn/utils/validation.py", line 63, in inner_f

To Reproduce
The steps to reproduce the behavior are given above: define integer parameters, constraints among them, and a starting point.

Desktop (please complete the following information):

  • OS: CentOS Linux
  • System: lcrc-bebop
  • Python version 3.7
  • DeepHyper Version 0.2.1

Additional context
We suspect the error is caused by an inconsistency between the Skopt and ConfigSpace interfaces when a starting point is provided together with constraints.
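The traceback ends in scikit-learn's `check_X_y`, which by default rejects non-finite input. One plausible reading, not confirmed by the truncated log, is that the inactive conditional value `Nhid2=None` from the starting point becomes `NaN` in the numeric matrix passed to the random-forest surrogate. A minimal NumPy sketch of one workaround, assuming each inactive value may be replaced by a fixed placeholder outside its active range (0 for `Nhid`/`Nhid2`, whose range is 1 to 100); `impute_inactive` is a hypothetical helper:

```python
import numpy as np


def impute_inactive(rows, fill_values):
    """Replace None entries (inactive conditional hyperparameters) column-wise.

    rows: list of numeric rows that may contain None.
    fill_values: one placeholder per column, ideally outside the active range.
    """
    X = np.array(rows, dtype=float)  # None is converted to nan here
    fill = np.asarray(fill_values, dtype=float)
    return np.where(np.isnan(X), fill, X)


# Columns: batch_size, Nhid, Nhid2 (values taken from the log above)
X = impute_inactive([[100, 50, None]], fill_values=[0, 0, 0])
print(X)  # every entry is now finite, so a scikit-learn surrogate can be fitted
```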

[BUG] GPU memory overload

Describe the bug
When running HPS on multiple GPUs (i.e., one Ray task per GPU), GPU memory is not released after each model is trained and evaluated. As a result, memory accumulates with every evaluated HPS point, and the GPUs eventually run out of memory before the search converges.

To Reproduce

  1. Use tensorflow-gpu.
  2. Start as many Ray tasks as you have GPUs, as described here (but with GPUs instead of CPUs).
  3. In your model_run.py, enable dynamic GPU memory allocation with tf.config.experimental.set_memory_growth, as described here.
  4. You should be ready! Execute your GPU-based HPS with:
deephyper hps ambs --problem <myProblem> --run <myRun>

Expected behavior
I would expect GPU memory to be released after each HPS point is trained and evaluated. That way I could run as many HPS points as needed without accumulating the memory load of previous points.

Desktop:

  • OS: Ubuntu 18.04.4 LTS
  • System: Local machine
  • Python version: 3.7.7
  • DeepHyper Version: 0.1.11
  • TensorFlow Version: 2.2.0 (GPU)

Additional context
This is a follow-up to issue #40. After some investigation, the problem appears to be related to TensorFlow's behavior, as reported here.
I tried the multiprocessing-based solution proposed here, but it did not work for me. I get the error:

E tensorflow/stream_executor/cuda/cuda_driver.cc:1045] failed to enqueue async memcpy from host to device: CUDA_ERROR_NOT_INITIALIZED: initialization error; GPU dst: 0x7f7f63000d00; host src: 0x5614a238a040; size: 8=0x8

Note, however, that people report this solution does not work with TF 2.2 on RTX cards, which is exactly my setup. It would be great if someone could cross-check on different cards.

AddByProjecting axis argument

The current AddByProjecting operation has no option to choose the projection axis. Such an option would be useful for LSTM/CNN-type operations involving skip connections.

I suggest the following update to __init__:

    def __init__(self, search_space, stacked_nodes=None, activation=None, axis=-1):
        self.search_space = search_space
        self.node = None # current_node of the operation
        self.stacked_nodes = stacked_nodes
        self.activation = activation
        self.axis = axis

and the following update to this line
proj_size = values[0].get_shape()[self.axis]

With these changes, I am able to build LSTM search spaces, as shown in the attached figure:
[Attachment: sampled_neural_network]
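To illustrate what the proposed `axis` argument changes, here is a NumPy sketch of "add by projecting" along an arbitrary axis: every input whose size along `axis` differs from `values[0]` is linearly projected to `proj_size = values[0].shape[axis]`, and the results are summed. The random projection matrix stands in for learned weights, and `add_by_projecting` is a hypothetical helper, not DeepHyper's `AddByProjecting` code:

```python
import numpy as np


def add_by_projecting(values, axis=-1, rng=None):
    """Project each array to values[0]'s size along `axis`, then sum them."""
    rng = np.random.default_rng(0) if rng is None else rng
    proj_size = values[0].shape[axis]
    out = np.zeros_like(values[0], dtype=float)
    for v in values:
        if v.shape[axis] != proj_size:
            # Random matrix standing in for a learned linear projection
            w = rng.standard_normal((v.shape[axis], proj_size))
            # Move the projection axis last, apply the map, move it back
            v = np.moveaxis(np.moveaxis(v, axis, -1) @ w, -1, axis)
        out = out + v
    return out
```

For a sequence tensor shaped `(batch, time, features)`, `axis=1` projects along the time dimension, while the default `axis=-1` projects the feature dimension.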

Could not install deephyper

I am installing deephyper using pip install deephyper, but I am getting this error:

Running setup.py clean for yarl
 Building wheel for multidict (PEP 517) ... error
 Complete output from command /usr/bin/python3 /usr/lib/python3.6/dist-packages/pip/_vendor/pep517/_in_process.py build_wheel /tmp/tmp0rmzbrl6:
 *********************
 * Accelerated build *
 *********************
 running bdist_wheel
 running build
 running build_py
 creating build
 creating build/lib.linux-x86_64-3.6
 creating build/lib.linux-x86_64-3.6/multidict
 copying multidict/_compat.py -> build/lib.linux-x86_64-3.6/multidict
 copying multidict/__init__.py -> build/lib.linux-x86_64-3.6/multidict
 copying multidict/_multidict_py.py -> build/lib.linux-x86_64-3.6/multidict
 copying multidict/_abc.py -> build/lib.linux-x86_64-3.6/multidict
 copying multidict/_multidict_base.py -> build/lib.linux-x86_64-3.6/multidict
 running egg_info
 writing multidict.egg-info/PKG-INFO
 writing dependency_links to multidict.egg-info/dependency_links.txt
 writing top-level names to multidict.egg-info/top_level.txt
 adding license file 'LICENSE' (matched pattern 'LICENSE')
 reading manifest file 'multidict.egg-info/SOURCES.txt'
 reading manifest template 'MANIFEST.in'
 warning: no previously-included files matching '*.pyc' found anywhere in distribution
 warning: no previously-included files found matching 'multidict/_multidict.html'
 warning: no previously-included files found matching 'multidict/*.so'
 warning: no previously-included files found matching 'multidict/*.pyd'
 warning: no previously-included files found matching 'multidict/*.pyd'
 no previously-included directories found matching 'docs/_build'
 writing manifest file 'multidict.egg-info/SOURCES.txt'
 copying multidict/__init__.pyi -> build/lib.linux-x86_64-3.6/multidict
 copying multidict/_multidict.c -> build/lib.linux-x86_64-3.6/multidict
 copying multidict/py.typed -> build/lib.linux-x86_64-3.6/multidict
 creating build/lib.linux-x86_64-3.6/multidict/_multilib
 copying multidict/_multilib/defs.h -> build/lib.linux-x86_64-3.6/multidict/_multilib
 copying multidict/_multilib/dict.h -> build/lib.linux-x86_64-3.6/multidict/_multilib
 copying multidict/_multilib/istr.h -> build/lib.linux-x86_64-3.6/multidict/_multilib
 copying multidict/_multilib/iter.h -> build/lib.linux-x86_64-3.6/multidict/_multilib
 copying multidict/_multilib/pair_list.h -> build/lib.linux-x86_64-3.6/multidict/_multilib
 copying multidict/_multilib/views.h -> build/lib.linux-x86_64-3.6/multidict/_multilib
 running build_ext
 building 'multidict._multidict' extension
 creating build/temp.linux-x86_64-3.6
 creating build/temp.linux-x86_64-3.6/multidict
 gcc -pthread -Wno-unused-result -Wsign-compare -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/usr/include/python3.6m -c multidict/_multidict.c -o build/temp.linux-x86_64-3.6/multidict/_multidict.o -O2 -std=c99 -Wall -Wsign-compare -Wconversion -fno-strict-aliasing -pedantic
 multidict/_multidict.c:1:20: fatal error: Python.h: No such file or directory
  #include "Python.h"
                     ^
 compilation terminated.
 error: command 'gcc' failed with exit status 1
 
 ----------------------------------------
 Failed building wheel for multidict
 Running setup.py clean for multidict
Failed to build yarl multidict
Could not build wheels for yarl, multidict which use PEP 517 and cannot be installed directly
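
The build stops at multidict/_multidict.c:1 because gcc cannot find Python.h. This means the C headers for the interpreter pip is running under (/usr/bin/python3, Python 3.6 here) are not installed. On most Linux distributions these headers ship in a separate development package, for example python3-dev on Debian/Ubuntu or python3-devel on RHEL/CentOS. The small check below uses the standard-library sysconfig module to report where that interpreter expects its headers and whether Python.h is there. It assumes you run it with the same interpreter pip uses. The helper name python_headers_available is hypothetical.

```python
import os
import sysconfig


def python_headers_available():
    """Return (found, path) for the Python.h of the running interpreter (hypothetical helper)."""
    # "include" is the directory Python reports for its C API headers.
    include_dir = sysconfig.get_paths()["include"]
    header = os.path.join(include_dir, "Python.h")
    return os.path.isfile(header), header


if __name__ == "__main__":
    found, header = python_headers_available()
    if found:
        print(f"Found {header}; C extensions such as multidict should be able to compile.")
    else:
        print(f"Missing {header}; install the development headers for this Python version.")
```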
