
Comments (6)

ili3p commented on May 25, 2024
  1. Sure, Hyperband is essentially smart (most of the time) early stopping, so you can use Hyperband and HORD together. I like this implementation of Hyperband: https://github.com/zygmuntz/hyperband. Alternatively, you can implement simple early stopping in your training script, i.e. check the validation or training error from time to time and decide when a hyperparameter set is not worth training any further (a sketch of this idea is shown after this list). However, Hyperband does not work well when optimizing the learning rate or dropout rate, since these (and possibly other) hyperparameters affect the shape of the training error curve. A low learning rate might look very bad at the beginning and thus get stopped by Hyperband, yet that same low learning rate can give the best final performance if it is allowed to run for as long as it needs. The same goes for dropout: networks without dropout converge quickly, but to a lower final performance than networks with dropout.

  2. I am not familiar enough with MaxLIPO+TR to comment in depth, but from what I can read it seems to use gradient information. I personally don't like this, since the objective function in hyperparameter optimization is very spiky, so gradients do not work well most of the time.
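
Here is a minimal sketch of the kind of simple early stopping mentioned in point 1, assuming hypothetical `train_one_epoch` and `evaluate` callables that stand in for your own training code (none of this comes from HORD itself, and the caveat about learning rate and dropout applies here too):

```python
# Minimal early-stopping sketch. `train_one_epoch` and `evaluate` are hypothetical
# stand-ins for your own training code; they are passed in so the sketch stays self-contained.
def train_with_early_stopping(model, train_one_epoch, evaluate, max_epochs=100, patience=10):
    best_val_error = float("inf")
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        train_one_epoch(model)            # one pass over the training data
        val_error = evaluate(model)       # validation error after this epoch

        if val_error < best_val_error:
            best_val_error = val_error
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1

        # This hyperparameter set is not worth training any further.
        if epochs_without_improvement >= patience:
            break

    return best_val_error                 # value reported back to the optimizer
```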


ili3p commented on May 25, 2024

This repo is only for reproducing the AAAI paper experiments. The optimization tool itself is at https://github.com/dme65/pySOT/. You can use the code in this repo as a usage example and as a guideline for which combination of surrogate and search strategy works best for hyperparameter optimization.


adrianog commented on May 25, 2024

Thanks, clearer.

I have another question (apologies in advance if it is very basic; I'm relatively new to this).
Can I use pySOT (or is there any code in HORD I can use for inspiration) to define a heterogeneous, possibly nested/conditional parameter space, possibly across different models (using a variety of parameter types), or is it only useful for numeric parameters?
E.g. see the hyperopt example here: https://goo.gl/i1zynY

And if it is not currently possible, could it reasonably be implemented, or is there anything inherent in the model/implementation that would prevent such a parameter space from being usable with HORD?

In any case, thanks for the unique resource.


ili3p commented on May 25, 2024

I understand your question as:

Does HORD or pySOT support optimization of categorical parameters, i.e. choices such as the model type or the activation function, and of conditional parameters, such as parameters specific to a given model type, or simply the number of layers and the number of nodes in each layer?

The answer is that they do not support this out of the box. The search strategies are designed specifically for numerical optimization; however, they can easily be modified to support it.
For categorical parameters, you can assign an integer to each of the choices, e.g. ReLU is 1, tanh is 2, etc., and optimize this parameter as an integer.
For conditional parameters, you optimize all possible parameters but ignore the ones whose condition is not satisfied.

This should work since pySOT and HORD don't use gradient information, so they do not require comparisons between parameter values, i.e. 1 < 2 doesn't need to have a meaning. They work by exploring the parameter space efficiently, and the only assumption is that nearby points in the parameter space have similar objective function values.
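
A small sketch of what that integer encoding and conditional masking could look like inside the objective function you hand to the optimizer. The parameter names and choices here are purely illustrative, not anything defined by HORD or pySOT, and `train_and_evaluate` is a hypothetical stand-in for your own training code:

```python
# Illustrative objective wrapper; all parameter names and choices are hypothetical.
ACTIVATIONS = {1: "relu", 2: "tanh", 3: "sigmoid"}   # categorical choice encoded as an integer

def make_objective(train_and_evaluate):
    def objective(x):
        # x = [learning_rate, activation_code, use_dropout, dropout_rate]
        learning_rate = x[0]
        activation = ACTIVATIONS[int(round(x[1]))]    # map the integer back to the actual choice
        use_dropout = int(round(x[2]))                # 0 or 1

        # Conditional parameter: the optimizer always proposes a dropout_rate,
        # but we simply ignore it when its condition (use_dropout == 1) is not met.
        dropout_rate = x[3] if use_dropout == 1 else 0.0

        return train_and_evaluate(learning_rate, activation, dropout_rate)
    return objective
```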

The other way to do it is to run the optimization separately for each condition and category, e.g. run one optimization per model type and/or per activation function.


adrianog commented on May 25, 2024

Well, thanks for the explanation, it makes perfect sense.
While we are at it, I'll take advantage and ask another couple of questions.

  1. I notice that Hyperband leverages hyperopt to "sample" the runs and concentrate on the most promising ones to run fully. Could HORD be plugged in for the parameter sampling instead of hyperopt? Am I missing something?

  2. How does HORD compare to MaxLIPO+TR, as implemented in Dlib? Do you have any comments in that respect?

I reiterate my thanks for your inputs thus far!


adrianog commented on May 25, 2024

Re: 2) I thought that was gradient-free as well, i.e. see this comment from the author. The LIPO part is gradient-free (it only relies on estimating k), and the "classic trust region method" (e.g. BOBYQA) is also derivative-free.

But yes, it does still use gradient-like information somehow, even though it never performs additional function evaluations to estimate a gradient. It estimates k (used for the upper bound) from the largest observed slope so far, so the point you made still seems to be a problem: "the objective function of hyperparameter optimization is very spiky so gradients do not work well most of the time", since evaluating two function points too close to an irregularity might cause the constant k to explode.
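
To make that concern concrete, here is a rough sketch of the plain LIPO-style bound being discussed (this is the basic formulation, not dlib's exact noise-tolerant variant): k is estimated as the largest observed slope between already-evaluated points, so two nearby evaluations straddling a spike can inflate k and make the bound very loose.

```python
import itertools
import math

def estimate_k(points, values):
    """Estimate the Lipschitz constant k as the largest observed slope
    between any pair of already-evaluated points (plain LIPO-style estimate)."""
    k = 0.0
    for (xi, fi), (xj, fj) in itertools.combinations(zip(points, values), 2):
        dist = math.dist(xi, xj)
        if dist > 0:
            k = max(k, abs(fi - fj) / dist)
    return k

def upper_bound(x, points, values, k):
    """Upper bound on f(x) for a k-Lipschitz f: min_i f(x_i) + k * ||x - x_i||.
    A spiky objective inflates k, which makes this bound very loose."""
    return min(fi + k * math.dist(x, xi) for xi, fi in zip(points, values))
```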

Furthermore, the author also seems to discourage using it for neural network hyperparameter optimisation, probably for that reason.

Would be interested in knowing your comments on this statement on that same page:
"I wouldn't attempt to optimize functions with more than 10s of parameters with a derivative free optimizer."

Thanks again, your comments have been invaluable so far.
