
hypertorch's People

Contributors

bamos, lucfra, prolearner


hypertorch's Issues

Hyperparameter Optimization on MLP

Hello,
I am an undergraduate student trying to perform hyperparameter optimization on a simple MLP with the MNIST dataset. I started from the code in logistic_regression.ipynb, but I found that the conversion is not trivial. Would you mind giving me some hints on how to do that? Thanks a lot!
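One possible starting point (not from the repository; the function names and the choice of a scalar log-regularization hyperparameter are illustrative assumptions) is to replace the linear model of logistic_regression.ipynb with a functional two-layer MLP, so that the flattened weight list plays the role of the inner variables and can be passed to torch.autograd.grad:

```python
import torch
import torch.nn.functional as F

def mlp_forward(params, x):
    # Functional two-layer MLP: params is a plain list of tensors, so it
    # can serve as the inner variables of a bilevel problem.
    w1, b1, w2, b2 = params
    h = F.relu(F.linear(x.view(x.size(0), -1), w1, b1))
    return F.linear(h, w2, b2)

def train_loss(params, hparams, x, y):
    # Inner objective: cross-entropy plus an L2 penalty whose weight is
    # the (scalar, log-space) hyperparameter hparams[0].
    ce = F.cross_entropy(mlp_forward(params, x), y)
    l2 = sum((p ** 2).sum() for p in params)
    return ce + torch.exp(hparams[0]) * l2
```

Because both params and hparams enter the loss as explicit tensor lists, gradients with respect to either can be taken with torch.autograd.grad, which is the shape the hypergradient routines expect.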

Reproducing results of the paper with iMAML

Hi,

Thanks so much for your work, it's very nice and useful.
At the moment I am struggling to reproduce the results of the iMAML paper any way I can (I tried with the original implementation, see the issue here, and I am currently developing my own).
I tried with your implementation and got the following results after quite a few outer steps:

MT k=37500 (1.726s F: 0.877s, B: 0.847s) Val Loss: 1.80e-02, Val Acc: 99.50.
Test loss 1.04e-01 +- 1.51e-01: Test acc: 96.71 +- 4.27e+00 (mean +- std over 1000 tasks).
MT k=37510 (1.758s F: 0.875s, B: 0.880s) Val Loss: 3.04e-02, Val Acc: 99.33.
MT k=37520 (1.767s F: 0.877s, B: 0.888s) Val Loss: 2.14e-02, Val Acc: 99.42.
MT k=37530 (1.791s F: 0.880s, B: 0.910s) Val Loss: 4.65e-02, Val Acc: 98.33.
MT k=37540 (1.814s F: 0.882s, B: 0.929s) Val Loss: 3.73e-02, Val Acc: 98.50.
MT k=37550 (1.715s F: 0.877s, B: 0.835s) Val Loss: 2.39e-02, Val Acc: 99.17.
MT k=37560 (1.709s F: 0.873s, B: 0.834s) Val Loss: 2.13e-02, Val Acc: 99.08.
MT k=37570 (1.724s F: 0.877s, B: 0.845s) Val Loss: 4.14e-02, Val Acc: 98.17.
MT k=37580 (1.742s F: 0.881s, B: 0.858s) Val Loss: 6.90e-02, Val Acc: 98.08.
MT k=37590 (1.697s F: 0.879s, B: 0.815s) Val Loss: 2.95e-02, Val Acc: 98.67.
MT k=37600 (1.742s F: 0.874s, B: 0.865s) Val Loss: 2.82e-02, Val Acc: 99.08.
MT k=37610 (1.784s F: 0.877s, B: 0.904s) Val Loss: 4.80e-02, Val Acc: 98.50.
MT k=37620 (1.695s F: 0.877s, B: 0.816s) Val Loss: 5.39e-02, Val Acc: 98.50.
MT k=37630 (1.760s F: 0.873s, B: 0.885s) Val Loss: 6.28e-02, Val Acc: 97.92.
MT k=37640 (1.737s F: 0.880s, B: 0.855s) Val Loss: 2.16e-02, Val Acc: 99.33.
MT k=37650 (1.759s F: 0.880s, B: 0.877s) Val Loss: 2.55e-02, Val Acc: 99.08.
MT k=37660 (1.707s F: 0.877s, B: 0.828s) Val Loss: 2.95e-02, Val Acc: 99.08.
MT k=37670 (1.808s F: 0.873s, B: 0.933s) Val Loss: 3.21e-02, Val Acc: 98.92.
MT k=37680 (1.821s F: 0.875s, B: 0.944s) Val Loss: 6.20e-02, Val Acc: 97.67.
MT k=37690 (1.737s F: 0.877s, B: 0.858s) Val Loss: 4.85e-02, Val Acc: 98.58.

As you can see, the test accuracy is well below that reported in the paper (96.7% vs. 99.1%) and the standard deviation is much higher (4.27 vs. 0.35).

I was wondering if you had managed to reproduce the results of the original paper.

I also had a few questions regarding your implementation choices:

  • You use a batch size of 16, while the original implementation uses 32. How did you choose that number?
  • Could you explain what you mean here? In particular, I am not sure which approximation from the paper you are referring to.
  • You apply batch norm after the activation and the pooling, unlike the original implementation. Why?
  • Another difference from the original implementation: you initialize the linear weights to zero. Why?

Thanks so much for taking the time to read this; it's very much appreciated.

Cheers!

Optimizers of inner problem and fixed point maps in approximate implicit hypergradient computations do not necessarily match

In some of the examples (e.g. iMAML) I notice that the inner optimization corresponds to a different fixed point map than the one used in the approximate implicit hypergradient call (e.g. both use GradientDescent steps, but with different learning rates). Is it required that the same fixed point map be used in both places, or is there no such necessity? I see in the paper that the approximate implicit hypergradient does not use information from the trajectory to the approximate solution of the inner problem (it depends only on the approximate solution itself), but I'm confused about the semantics of the fp_map argument in the implementation of the implicit hypergradient functions.
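For intuition, here is a toy sketch (not repo code; the quadratic inner problem and the Neumann-series approximation are illustrative assumptions) suggesting why the step size of the fixed point map need not match the one used by the inner optimizer. For the inner problem g(w, lam) = 0.5*(w - lam)^2, the gradient-descent map is Phi(w, lam) = w - lr*(w - lam); the implicit hypergradient of the outer loss f(w) = 0.5*w^2 at the solution w* = lam is v * (1 - dPhi/dw)^{-1} * dPhi/dlam with v = df/dw, and the learning rate cancels:

```python
import torch

def implicit_hypergrad(lam_val, lr, K=200):
    # Exact inner solution of g(w, lam) = 0.5*(w - lam)^2 is w* = lam.
    lam = torch.tensor(lam_val, requires_grad=True)
    w = lam.detach().clone().requires_grad_(True)
    # One gradient-descent step evaluated AT the solution; `lr` only needs
    # to make Phi a contraction, not to match the inner optimizer's rate.
    phi = w - lr * (w - lam)
    v = w.detach()                      # df/dw = w, evaluated at w*
    # Neumann series: (1 - dPhi/dw)^{-1} v ~= sum_k (dPhi/dw)^k v
    q, acc = v.clone(), torch.zeros_like(v)
    for _ in range(K):
        acc = acc + q
        q = torch.autograd.grad(phi, w, grad_outputs=q, retain_graph=True)[0]
    # Chain through dPhi/dlam to get the hypergradient df/dlam.
    return torch.autograd.grad(phi, lam, grad_outputs=acc)[0]
```

In this example the true hypergradient is w* = lam, and both lr = 0.5 and lr = 0.1 recover it, even though neither needs to equal the learning rate that produced the inner solution: only the map at the solution matters, provided it is a contraction whose fixed point is that solution.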

Add support for unused variables in the graph

Variables not used in the graph are not supported and give a runtime error: RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
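A minimal reproduction and the workaround suggested by the error message (plain PyTorch, not repo code): passing allow_unused=True makes the unused tensor's gradient come back as None, which can then be replaced with zeros.

```python
import torch

used = torch.randn(3, requires_grad=True)
unused = torch.randn(3, requires_grad=True)
loss = (used ** 2).sum()

# Without allow_unused=True this call raises the RuntimeError above,
# because `unused` does not appear in the graph of `loss`.
grads = torch.autograd.grad(loss, [used, unused], allow_unused=True)

# Replace the None gradient of the unused tensor with zeros.
grads = [g if g is not None else torch.zeros_like(p)
         for g, p in zip(grads, [used, unused])]
```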

Modification for the feature-head setting of meta-learning?

Thanks for the excellent hyper-grad implementation! I read the iMAML example; there, the whole network is the meta-model. In the feature-head setting, however, we have a common feature extractor as the meta-model and a task-specific head as the inner variables. What are the major modifications to the iMAML example for this scenario?

In particular, how should the meta_model argument be passed to the Task class? Should we still pass the whole network (feature + head) as meta_model, while specifying params as the parameters of the head and hparams as the parameters of the feature extractor? A simple example pointing out what should be changed would be perfect. Thanks!
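One way this split is sometimes done in plain PyTorch (the module shapes and names here are illustrative assumptions, not from the repo): the shared feature extractor's parameters become the hyperparameters of the outer problem, while only detached copies of the head's parameters are adapted in the inner loop.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical feature extractor (shared, meta-learned) and task head.
feature = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())
head = nn.Linear(8, 5)

hparams = list(feature.parameters())        # outer variables (meta)
params = [p.detach().clone().requires_grad_(True)
          for p in head.parameters()]       # inner variables (per task)

def head_forward(params, x):
    # Only the head is applied functionally with the inner variables;
    # gradients still flow into `feature`'s parameters through feature(x).
    w, b = params
    return F.linear(feature(x), w, b)
```

The inner loss would then be differentiated with respect to params only, while the hypergradient routine receives hparams; whether meta_model should still be the full network depends on how the example's Task class uses it.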

imaml.py issue regarding inner_loop solver

Hi,

I was going through the imaml.py example code you provided.
I don't understand the benefit of passing both meta_model.parameters() and params as arguments to the inner_loop function here and here, since they hold the same data; in fact, this seems to make the regularization term in train_loss_f redundant, because it is always zero:
0.5 * self.reg_param * self.bias_reg_f(hparams, params) -> always 0
Can you please explain?
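A small sketch (illustrative, not repo code) of the biased regularizer in question: it is indeed zero at the start of the inner loop, when the inner variables are fresh copies of the meta parameters, but it becomes nonzero as soon as the inner optimizer moves them apart, so it is not redundant over the course of the inner loop.

```python
import torch

def bias_reg(params, hparams):
    # iMAML-style biased regularizer: squared distance between the
    # task parameters and the meta parameters.
    return sum(((p - h) ** 2).sum() for p, h in zip(params, hparams))

hparams = [torch.randn(4, requires_grad=True)]
# Inner variables start as exact copies of the meta parameters.
params = [h.detach().clone().requires_grad_(True) for h in hparams]

with torch.no_grad():
    params[0] -= 0.1   # simulate one inner optimization step
```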

support for custom datasets

Hi, thanks for your great work. Does hypertorch support custom datasets?
I would like to train on open datasets and then test on my own dataset.
Thank you!
