DEHB: Evolutionary Hyperband for Scalable, Robust and Efficient Hyperparameter Optimization

Installation

# from pypi
pip install dehb

# to run examples, install from github
git clone https://github.com/automl/DEHB.git
pip install -e DEHB  # -e stands for editable, lets you modify the code and rerun things

Tutorials/Example notebooks

To run the PyTorch example (note the additional requirements):

python examples/03_pytorch_mnist_hpo.py \
     --min_budget 1 --max_budget 3 --verbose --runtime 60

Running DEHB in a parallel setting

DEHB has been designed to interface with a Dask client. DEHB can either create a Dask client during instantiation and close it during garbage collection, or use a client passed as an argument during instantiation.

  • Setting n_workers during instantiation
    If set to 1 (default) then the entire process is a sequential run without invoking Dask.
    If set to >1 then a Dask Client is initialized with as many workers as n_workers.
    This parameter is ignored if client is not None.
  • Setting client during instantiation
    When None (default), a Dask client is created using the n_workers specified.
    Otherwise, any custom-configured Dask Client can be created and passed as the client argument to DEHB.
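
The precedence between n_workers and client described above can be summarised in a few lines of plain Python. This is only a sketch of the documented rules: describe_execution_mode is a hypothetical helper and is not part of DEHB, and it creates no Dask objects.

```python
from typing import Optional


def describe_execution_mode(n_workers: int = 1, client: Optional[object] = None) -> str:
    """Return how work would be dispatched under the rules listed above.

    Hypothetical helper for illustration only; it is not part of DEHB.
    """
    if client is not None:
        # An explicitly passed client takes precedence; n_workers is ignored.
        return "use the supplied client"
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    if n_workers > 1:
        # No client given: a Dask client with n_workers workers is created.
        return f"create a Dask client with {n_workers} workers"
    # n_workers == 1 and no client: sequential run, Dask is not invoked.
    return "run sequentially without Dask"


print(describe_execution_mode())
print(describe_execution_mode(n_workers=4))
print(describe_execution_mode(n_workers=4, client=object()))
```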

Using GPUs in a parallel run

Certain target function evaluations (especially for Deep Learning) require computations to be carried out on GPUs. The GPU devices are often ordered by device ID and, if not configured, all spawned worker processes access these devices in the same order. The workers can then either run out of memory or fail to exhibit parallelism.

For n_workers>1 and when running on a single node (or locally), the single_node_with_gpus argument can be passed to the run() call to DEHB. Setting it to False (default) has no effect on the default setup of the machine. Setting it to True reorders the GPU device IDs dynamically by setting the environment variable CUDA_VISIBLE_DEVICES for each worker process executing a target function evaluation. The reordering puts first the device with the fewest active jobs assigned to it by that DEHB run.
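
The reordering rule can be illustrated with a minimal sketch. It assumes the per-device job counts are already known; least_loaded_device_order is a hypothetical helper written for this example and is not DEHB's implementation, and the tie-breaking by device ID is an assumption.

```python
import os
from typing import Dict


def least_loaded_device_order(active_jobs: Dict[int, int]) -> str:
    """Build a CUDA_VISIBLE_DEVICES value listing the least busy GPU first.

    `active_jobs` maps a GPU device ID to the number of jobs currently
    assigned to it. Ties are broken by the lower device ID (an assumption).
    Hypothetical helper, not DEHB's implementation.
    """
    ordered = sorted(active_jobs, key=lambda dev: (active_jobs[dev], dev))
    return ",".join(str(dev) for dev in ordered)


# Example: GPU 1 is idle, GPU 2 has one job, GPU 0 has two jobs.
order = least_loaded_device_order({0: 2, 1: 0, 2: 1})
print(order)  # 1,2,0

# Environment a worker process could be given before the target function
# initialises CUDA; built as a copy so the current process is left untouched.
worker_env = dict(os.environ, CUDA_VISIBLE_DEVICES=order)
```

With this value, a framework that picks "device 0" by default inside the worker ends up on the physical GPU with the fewest active jobs.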

To run the PyTorch MNIST example on a single node using 2 workers:

python examples/03_pytorch_mnist_hpo.py --min_budget 1 --max_budget 3 \
  --verbose --runtime 60 --n_workers 2 --single_node_with_gpus

Multi-node runs

Multi-node parallelism often depends on the setup of the cluster being deployed on. Dask provides useful frameworks to interface with various cluster designs. As long as the client passed to DEHB during instantiation is of type dask.distributed.Client, DEHB can interact with this client and distribute its optimisation process in a parallel manner.

For instance, Dask-CLI can be used to create a dask-scheduler that dumps its connection details to a file on a cluster node accessible to all processes. Multiple dask-worker processes can then be created to interface with the dask-scheduler by connecting with the details read from the dumped file. Each dask-worker can be started on any remote machine. Each worker can be configured as required, including mapping to specific GPU devices.

Some helper scripts can be found here; they can be used as a reference for running DEHB in a multi-node manner on clusters managed by SLURM. (They are not expected to work off-the-shelf.)

To run the PyTorch MNIST example on a multi-node setup using 4 workers:

bash utils/run_dask_setup.sh -f dask_dump/scheduler.json -e env_name -n 4
sleep 5
python examples/03_pytorch_mnist_hpo.py --min_budget 1 --max_budget 3 \
  --verbose --runtime 60 --scheduler_file dask_dump/scheduler.json 

DEHB Hyperparameters

We recommend the default settings. The default settings were chosen based on ablation studies over a collection of diverse problems and were found to be generally useful across all cases tested. However, the parameters are still available for tuning to a specific problem.

The Hyperband components:

  • min_budget: Needs to be specified for every DEHB instantiation and is used in determining the budget spacing for the problem at hand.
  • max_budget: Needs to be specified for every DEHB instantiation. Represents the full-budget evaluation or the actual black-box setting.
  • eta: (default=3) Sets the aggressiveness of Hyperband's early stopping by retaining 1/eta of the configurations every round.
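
To make the role of the three parameters concrete, the sketch below computes a budget ladder under the assumption of standard Hyperband geometric spacing, where each budget is the next higher one divided by eta. geometric_budgets is a hypothetical helper; DEHB's own code may compute or round the budgets differently.

```python
from typing import List


def geometric_budgets(min_budget: float, max_budget: float, eta: float = 3) -> List[float]:
    """Return the budgets max_budget / eta**k (k = 0, 1, ...) that are
    >= min_budget, in ascending order.

    Hypothetical helper assuming standard Hyperband spacing.
    """
    if not (0 < min_budget <= max_budget) or eta <= 1:
        raise ValueError("need 0 < min_budget <= max_budget and eta > 1")
    budgets = []
    budget = float(max_budget)
    while budget >= min_budget:
        budgets.append(budget)
        budget /= eta  # step down to the next lower budget
    return budgets[::-1]


print(geometric_budgets(1, 27, eta=3))  # [1.0, 3.0, 9.0, 27.0]
print(geometric_budgets(1, 3, eta=3))   # [1.0, 3.0]
```

With min_budget 1 and max_budget 3, as in the example commands, this spacing yields only two budget levels; a larger ratio between max_budget and min_budget gives more levels at which configurations can be stopped early.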

The DE components:

  • strategy: (default=rand1_bin) Chooses the mutation and crossover strategies for DE. rand1 represents the mutation strategy while bin represents the binomial crossover strategy.
    Other mutation strategies include: {rand2, rand2dir, best, best2, currenttobest1, randtobest1}
    Other crossover strategies include: {exp}
    Mutation and crossover strategies can be combined with a _ separator, e.g. rand2dir_exp.
  • mutation_factor: (default=0.5) A fraction within [0, 1] weighing the difference operation in DE
  • crossover_prob: (default=0.5) A probability within [0, 1] weighing the traits from a parent or the mutant
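
As an illustration of what mutation_factor and crossover_prob control, here is a minimal sketch of the textbook rand1 mutation and binomial crossover on plain lists of floats. The function names are hypothetical, the code is not taken from DEHB, and details such as handling values that leave the search bounds are omitted.

```python
import random
from typing import List, Optional, Sequence


def rand1_mutation(a: Sequence[float], b: Sequence[float], c: Sequence[float],
                   mutation_factor: float = 0.5) -> List[float]:
    """rand1 mutation: a + F * (b - c), with a, b, c three population members."""
    return [ai + mutation_factor * (bi - ci) for ai, bi, ci in zip(a, b, c)]


def binomial_crossover(parent: Sequence[float], mutant: Sequence[float],
                       crossover_prob: float = 0.5,
                       rng: Optional[random.Random] = None) -> List[float]:
    """Take each dimension from the mutant with probability crossover_prob,
    otherwise from the parent."""
    rng = rng or random.Random()
    # One randomly chosen dimension always comes from the mutant, so the
    # offspring inherits at least one trait from it.
    forced = rng.randrange(len(parent))
    return [
        m if (i == forced or rng.random() < crossover_prob) else p
        for i, (p, m) in enumerate(zip(parent, mutant))
    ]


mutant = rand1_mutation([0.25, 0.5], [0.75, 1.0], [0.25, 0.5], mutation_factor=0.5)
print(mutant)  # [0.5, 0.75]
offspring = binomial_crossover([0.0, 0.0], mutant, crossover_prob=0.5,
                               rng=random.Random(0))
print(offspring)
```

A larger mutation_factor moves the mutant further along the difference vector, and a larger crossover_prob makes the offspring inherit more dimensions from the mutant than from the parent.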

To cite the paper or code

@inproceedings{awad-ijcai21,
  author    = {N. Awad and N. Mallik and F. Hutter},
  title     = {{DEHB}: Evolutionary Hyperband for Scalable, Robust and Efficient Hyperparameter Optimization},
  pages     = {2147--2153},
  booktitle = {Proceedings of the Thirtieth International Joint Conference on
               Artificial Intelligence, {IJCAI-21}},
  publisher = {ijcai.org},
  editor    = {Z. Zhou},
  year      = {2021}
}
