
maxim5 / hyper-engine


Python library for Bayesian hyper-parameters optimization

Home Page: https://pypi.python.org/pypi/hyperengine

License: Apache License 2.0

Python 100.00%
machine-learning deep-learning tensorflow hyperparameter-optimization neural-network data-science python bayesian-optimization gaussian-processes convolutional-neural-networks

hyper-engine's Introduction

Hyper-parameters Tuning for Machine Learning

Overview

About

HyperEngine is a toolbox for model selection and hyper-parameters tuning. It aims to provide the most state-of-the-art techniques via an intuitive API and with minimal dependencies. HyperEngine is not a framework, which means it doesn't enforce any structure or design on the main code, thus making integration local and non-intrusive.

Installation

pip install hyperengine

Dependencies:

  • six, numpy, scipy
  • tensorflow (optional)
  • matplotlib (optional, only for development)

Compatibility:

Build status: https://travis-ci.org/maxim5/hyper-engine.svg?branch=master
  • Python 2.7, 3.5, 3.6

License: Apache License 2.0

HyperEngine is designed to be ML-platform agnostic, but currently provides only a simple TensorFlow binding.

How to use

Adapting your code to HyperEngine usually boils down to migrating hard-coded hyper-parameters to a dictionary (or an object) and giving names to particular tensors.

Before:

def my_model():
  x = tf.placeholder(...)
  y = tf.placeholder(...)
  ...
  optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
  ...

After:

def my_model(params):
  x = tf.placeholder(..., name='input')
  y = tf.placeholder(..., name='label')
  ...
  optimizer = tf.train.GradientDescentOptimizer(learning_rate=params['learning_rate'])
  ...

# Now the model can be run with any set of hyper-parameters

The rest of the integration code is isolated and can be placed in the main script. See the hyper-parameter tuning examples in the examples package.
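For illustration, here is a minimal sketch of that integration code, modeled on the generic HyperTuner pattern used in the black-box optimizer example further down this page. The solver contract (train()/terminate()) and the toy metric are assumptions inferred from that example; the TensorFlow-specific solver is shown in the examples package.

import numpy as np
import hyperengine as hype

class ToySolver:
  # Hypothetical stand-in for your training code: the tuner drives objects
  # exposing train() and terminate() (pattern taken from the black-box example below).
  def __init__(self, params):
    self.params = params

  def train(self):
    # Return the metric being optimized; here a toy function of the learning rate.
    return -abs(np.log10(self.params['learning_rate']) + 2)

  def terminate(self):
    pass

hyper_params_spec = {
  'learning_rate': 10**hype.spec.uniform(-3, -1),
}

def solver_generator(params):
  return ToySolver(params)

tuner = hype.HyperTuner(hyper_params_spec, solver_generator, strategy='bayesian')
# tune() keeps proposing new points; the black-box example below shows one way
# to cap the number of iterations.
tuner.tune()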

Features

Straight-forward specification

The crucial part of hyper-parameter tuning is the definition of a domain over which the engine is going to optimize the model. Some variables are continuous (e.g., the learning rate), some are integers in a certain range (e.g., the number of hidden units), and some are categorical and represent architecture knobs (e.g., the choice of non-linearity).

You can define all these variables and their ranges in a numpy-like fashion:

hyper_params_spec = {
  'optimizer': {
    'learning_rate': 10**spec.uniform(-3, -1),          # makes the continuous range [0.001, 0.1]
    'epsilon': 1e-8,                                    # constants work too
  },
  'conv': {
    'filters': [[3, 3, spec.choice(range(32, 48))],     # an integer in [32, 48)
                [3, 3, spec.choice(range(64, 96))],     # an integer in [64, 96)
                [3, 3, spec.choice(range(128, 192))]],  # an integer in [128, 192)
    'activation': spec.choice(['relu','prelu','elu']),  # a categorical range: 1 of 3 activations
    'down_sample': {
      'size': [2, 2],
      'pooling': spec.choice(['max_pool', 'avg_pool'])  # a categorical range: 1 of 2 pooling methods
    },
    'residual': spec.random_bool(),                     # either True or False
    'dropout': spec.uniform(0.75, 1.0),                 # a uniform continuous range
  },
}

Note that 10**spec.uniform(-3, -1) is not the same distribution as spec.uniform(0.001, 0.1) (though both define the same range of values). In the first case, the whole logarithmic spectrum (-3, -1) is equally probable, while in the second case, small values around 0.001 are much less likely than values around the mean 0.0505. Specifying the learning rate domain as spec.uniform(0.001, 0.1) will therefore likely skew the results towards higher learning rates. This highlights the importance of random variable transformations and arithmetic operations.
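As a quick sanity check (plain numpy, not part of HyperEngine), sampling both distributions shows how differently they spread over the same range:

import numpy as np

rng = np.random.default_rng(0)
log_uniform = 10 ** rng.uniform(-3, -1, size=100_000)   # like 10**spec.uniform(-3, -1)
plain_uniform = rng.uniform(0.001, 0.1, size=100_000)   # like spec.uniform(0.001, 0.1)

# Roughly half of the log-uniform samples fall below 0.01 ...
print((log_uniform < 0.01).mean())    # ~0.50
# ... but only about 9% of the plain uniform samples do.
print((plain_uniform < 0.01).mean())  # ~0.09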

Exploration-exploitation trade-off

Machine learning model selection is expensive. Each model evaluation requires full training from scratch and may take minutes, hours, or days, depending on the problem complexity and available computational resources. HyperEngine provides an algorithm that explores the parameter space efficiently, focuses on the most promising areas, and thus converges to the maximum as fast as possible.

Example 1: the true function is 1-dimensional, f(x) = x * sin(x) (black curve), on the [-10, 10] interval. Red dots represent each trial, the red curve is the Gaussian Process mean, and the blue curve is the mean plus or minus one standard deviation. The optimizer happened to choose the negative mode as the more promising one.

1D Bayesian Optimization

Example 2: the 2-dimensional function f(x, y) = (x + y) / ((x - 1) ** 2 - sin(y) + 2) (black surface) on the [0,9]x[0,9] square. Red dots represent each trial; the Gaussian Process mean and standard deviation are not shown for simplicity. Note that to achieve the maximum, both variables must be picked accurately.

2D Bayesian Optimization

2D Bayesian Optimization

The code for these and other examples is here.

Learning Curve Estimation

HyperEngine can monitor the model performance during training and stop early if it's learning too slowly. This is done via learning curve prediction. Note that this technique is compatible with Bayesian Optimization, since it estimates the model accuracy after full training - this value can be safely used to update the Gaussian Process parameters.

Example code:

curve_params = {
  'burn_in': 30,                # burn-in period: 30 models
  'min_input_size': 5,          # start predicting after 5 epochs
  'value_limit': 0.80,          # stop if the estimate is less than 80% with high probability
}
curve_predictor = LinearCurvePredictor(**curve_params)

Currently there is only one implementation of the predictor, LinearCurvePredictor, which is very efficient but requires a relatively large burn-in period to predict model accuracy without flaws.

Note that learning curves can be reused between different models and work quite well for the burn-in, so it's recommended to serialize and load curve data via the io_save_dir and io_load_dir parameters.
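For example, a sketch of wiring the predictor to a curve-data directory (the directory path is illustrative; the parameter names follow the sentence above):

curve_params = {
  'burn_in': 30,
  'min_input_size': 5,
  'value_limit': 0.80,
  'io_save_dir': 'temp-curves/mnist',   # illustrative path: where curve data is saved
  'io_load_dir': 'temp-curves/mnist',   # and re-loaded on the next run
}
curve_predictor = LinearCurvePredictor(**curve_params)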

See also the following paper: Speeding up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves

Bayesian Optimization

Implements the following methods:

  • Probability of improvement (See H. J. Kushner. A new method of locating the maximum of an arbitrary multipeak curve in the presence of noise. J. Basic Engineering, 86:97–106, 1964.)
  • Expected Improvement (See J. Mockus, V. Tiesis, and A. Zilinskas. Toward Global Optimization, volume 2, chapter The Application of Bayesian Methods for Seeking the Extremum, pages 117–128. Elsevier, 1978)
  • Upper Confidence Bound
  • Mixed / Portfolio strategy
  • Naive random search

The PI method prefers exploitation to exploration, while UCB is the opposite. One of the best strategies we've seen is a mixed one: start with a high probability of UCB and gradually decrease it, increasing the PI probability.
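For reference, the standard forms of these acquisition functions look roughly like this (an illustrative numpy sketch written against a GP posterior mean mu and standard deviation sigma; not HyperEngine's internal code):

import numpy as np
from scipy.stats import norm

def probability_of_improvement(mu, sigma, best, xi=0.01):
  # Probability that the GP posterior exceeds the best value seen so far.
  return norm.cdf((mu - best - xi) / sigma)

def expected_improvement(mu, sigma, best, xi=0.01):
  # Expected amount by which the posterior exceeds the best value so far.
  z = (mu - best - xi) / sigma
  return (mu - best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def upper_confidence_bound(mu, sigma, kappa=2.0):
  # Optimistic estimate: larger kappa means more exploration.
  return mu + kappa * sigma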

The default kernel function is the RBF kernel, but it is extensible.
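A minimal sketch of such an RBF (squared-exponential) kernel, using scipy's cdist as the pairwise distance routine (the same routine that appears in the kernel-related traceback further down this page; the exact HyperEngine implementation may differ):

import numpy as np
from scipy.spatial.distance import cdist

def rbf_kernel(batch_x, batch_y, length_scale=1.0):
  # k(x, y) = exp(-||x - y||^2 / (2 * length_scale^2)), computed pairwise.
  sq_dist = cdist(batch_x, batch_y, 'sqeuclidean')
  return np.exp(-sq_dist / (2.0 * length_scale ** 2))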

hyper-engine's People

Contributors

maxim5


hyper-engine's Issues

TensorFlow is not optional

The README lists tensorflow as optional, but importing the package pulls it in unconditionally:

__init__ -> impl.tensorflow.__init__ -> impl.tensorflow.tensorflow_model_io -> tensorflow

Command "python setup.py egg_info" failed with error code 1 when installing hyper-engine

System:

  • platform: win10 1709
  • anaconda: 5.1.0
  • python: 3.6.4
  • tensorflow: 1.8

The following error occurred while installing hyper-engine:

(base) C:\Users\Samantha>pip install hyperengine
Collecting hyperengine
  Using cached https://files.pythonhosted.org/packages/d7/de/cc05d99e18ddb74012bf5d5ec8f7932fd5d667a5373c576d26dfad6f598a/hyperengine-0.1.1.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\Samantha\AppData\Local\Temp\pip-install-7qdfd7at\hyperengine\setup.py", line 21, in <module>
        long_description = read('README.rst'),
      File "C:\Users\Samantha\AppData\Local\Temp\pip-install-7qdfd7at\hyperengine\setup.py", line 9, in read
        return open(os.path.join(os.path.dirname(__file__), file_)).read()
    UnicodeDecodeError: 'gbk' codec can't decode byte 0x93 in position 8018: illegal multibyte sequence

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in C:\Users\Samantha\AppData\Local\Temp\pip-install-7qdfd7at\hyperengine\

Any idea about this? Thanks for your time!
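For context, the failing read() in setup.py (shown in the traceback above) opens README.rst with the platform default codec, which is 'gbk' on this Windows locale. A hedged sketch of the kind of fix, assuming the read helper from the traceback:

import io
import os

def read(file_):
  # Read with an explicit encoding so installation does not depend on the
  # platform default codec (the traceback above shows 'gbk' failing).
  path = os.path.join(os.path.dirname(__file__), file_)
  return io.open(path, encoding='utf-8').read()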

Only supports TensorFlow?

Hello, thanks for your great work. I see that all the examples are implemented in TensorFlow. Now I want to optimize hyper-parameters for a PyTorch model. How can I do that?

Using a fine-tuned CNN model from the example without hyper-engine

Hello! First of all, thank you for your work, it is really helpful! I am experimenting with the code from 1_3_saving_best_models_mnist.py. Could you tell me, please, how I can use the fine-tuned models directly from TensorFlow? I am asking because I don't see any TensorFlow variable declarations in this code. How can I do that easily?

XA and XB must have the same number of columns (i.e. feature dimension.)

Hello,

I am trying to build on your example 2_1_cnn_mnist.py, and I want to add some layers, like this:
layer = conv_layer(x, params.conv[0])
layer = conv_layer(layer, params.conv[1])
layer = conv_layer(layer, params.conv[2])
layer = dense_layer(layer, params.dense)
logits = tf.layers.dense(inputs=layer, units=10)

conv = [
  # Layer 1
  hype.spec.new(
    filter_num = hype.spec.choice([32]),
    filter_size = [hype.spec.choice([5])] * 2,
    activation = hype.spec.choice(ACTIVATIONS.keys()),
    batch_norm = hype.spec.random_bool(),
    dropout = hype.spec.uniform(0.0, 0.0),
  ),
  # Layer 2
  hype.spec.new(
    filter_num = hype.spec.choice([32]),
    filter_size = [hype.spec.choice([5])] * 2,
    activation = hype.spec.choice(ACTIVATIONS.keys()),
    batch_norm = hype.spec.random_bool(),
    dropout = hype.spec.uniform(0.0, 0.0),
  ),
  # Layer 3
  hype.spec.new(
    filter_num = hype.spec.choice([64]),
    filter_size = [hype.spec.choice([5])] * 2,
    activation = hype.spec.choice(ACTIVATIONS.keys()),
    batch_norm = hype.spec.random_bool(),
    dropout = hype.spec.uniform(0.0, 0.0),
  ),
]

I just added one more layer and one more params.conv[...] entry, and I got this error:
Traceback (most recent call last):
  File "/Users/dongyijie/Downloads/hyper-engine/hyperengine/examples/2_1_cnn_mnist_try.py", line 118, in <module>
    tuner.tune()
  File "/Users/dongyijie/Downloads/hyper-engine/hyperengine/model/hyper_tuner.py", line 45, in tune
    point = self._strategy.next_proposal()
  File "/Users/dongyijie/Downloads/hyper-engine/hyperengine/bayesian/strategy.py", line 150, in next_proposal
    return self._maximizer.compute_max_point()
  File "/Users/dongyijie/Downloads/hyper-engine/hyperengine/bayesian/maximizer.py", line 41, in compute_max_point
    values = self._utility.compute_values(batch)
  File "/Users/dongyijie/Downloads/hyper-engine/hyperengine/bayesian/utility.py", line 132, in compute_values
    mu, sigma = self.mean_and_std(batch)
  File "/Users/dongyijie/Downloads/hyper-engine/hyperengine/bayesian/utility.py", line 64, in mean_and_std
    k_star = np.swapaxes(self.kernel.compute(self.points, batch), 0, 1)
  File "/Users/dongyijie/Downloads/hyper-engine/hyperengine/bayesian/kernel.py", line 57, in compute
    dist = cdist(batch_x, batch_y, **self._params)
  File "/Users/dongyijie/Downloads/hyper-engine/venv/lib/python3.7/site-packages/scipy/spatial/distance.py", line 2721, in cdist
    raise ValueError('XA and XB must have the same number of columns '
ValueError: XA and XB must have the same number of columns (i.e. feature dimension.)

How can I fix it?
Thank you so much.

Selecting the method

Hi! Thank you for making a great library.

I was wondering how I can specify different methods (utilities) when optimizing?
It seems that "UpperConfidenceBound" is the default option, and I tried to specify something like

strategy_params = {
  'utility': 'ExpectedImprovement'
}

and some other attempts, but without much success. In the source code I have not found how to do this either. Could you kindly provide some explanation?

Thank you very much in advance!

Black box optimizer example

from functools import partial

import numpy as np
import hyperengine

# Alias for the spec builder (the README refers to it as hyperengine.spec).
hp = hyperengine.spec

def rosenbrock(hyperparams):
  return (hyperparams["x"] - 1)**2 + 10*(hyperparams["x"]**2 - hyperparams["y"])**2

class BlackBoxSolver:
  def __init__(self, func):
    self.func = func
    self._val_loss_curve = []

  def train(self):
    loss = self.func()
    self._val_loss_curve.append(loss)
    return self._reducer(self._val_loss_curve)

  def _reducer(self, *args, **kwargs):
    return np.min(*args, **kwargs)

  def terminate(self):
    pass

def solver_generator(hyperparams):
  return BlackBoxSolver(partial(rosenbrock, hyperparams))

class IterLimitedHyperTuner(hyperengine.HyperTuner):
  def __init__(self, hyper_params_spec, solver_generator, iterLimit, *args, **kwargs):
    j = 0
    def solver_generator_limited(hyperparams):
      nonlocal j
      if j < iterLimit:
        j += 1
        return solver_generator(hyperparams)
      else:
        raise StopIteration()

    super().__init__(hyper_params_spec, solver_generator_limited, *args, **kwargs)

  def tune(self):
    try:
      super().tune()
    except StopIteration:
      minLossPointNum = np.argmin(self.strategy.values)
      return dict(zip(self.parsed._spec.keys(), self.strategy.points[minLossPointNum]))

spec = hp.new({
  "x": hp.uniform(-10, 10),
  "y": hp.uniform(-10, 10),
})
tuner = IterLimitedHyperTuner(spec, solver_generator, iterLimit=10, strategy='bayesian')  # or 'portfolio'
tuner.tune()

Missing usage examples

I found this project from your post on Stack Overflow. I'd like to experiment with it, but I'm unclear on how to use it - there isn't any obvious documentation describing how to use hyper-engine.

The README says hyper-engine provides a tensorflow binding. Depending on what that binding entails, it may or may not accommodate my use-case.

How does the system choose hyper-parameters?

I am just a little confused.

I set the neural network to concrete parameters because I only want to optimize the learning rate, so the second training run uses a new learning rate to train the neural network.

For example, after the first training I got a final accuracy of 0.8172.
Then it runs a second training with a new learning rate, but the final accuracy may be very bad.
I thought that after one training, the second training would use the old results to train the model and get a better result. But actually, it may get a worse result.

So I want to know: how does the system choose the next learning rate?
Thank you so much.

Curve Predictor

Hi, I am from Stack Overflow. I am trying to understand your implementation of the paper "Extrapolation of Learning Curves...". As far as I understand, they use 11 different mathematical models to fit the learning curve and then predict with a Monte Carlo estimator. But I can't find where in your code these models are built and where the Monte Carlo calculations are. Can you please clarify? Thanks.


Comparison to Hyperopt

Hello, can you please let me know the advantages and disadvantages of this library compared to Hyperopt? Also, it would be great if you could add support for PyTorch. Thank you very much.
