prolearner / hypertorch
License: MIT License
Hello,
I am an undergraduate student trying to perform hyperparameter optimization on a simple MLP with the MNIST dataset. I started building from the code in logistic_regression.ipynb, but I found that the conversion is not trivial. Would you mind giving me some hints on how to do it? Thanks a lot!
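For reference, this is roughly the model I have in mind; just a minimal sketch of swapping the notebook's logistic regression for an MLP (the class name and layer sizes are mine, not from the repo):

```python
import torch.nn as nn

class MLP(nn.Module):
    """Simple two-layer MLP for MNIST: 28x28 inputs, 10 classes."""
    def __init__(self, in_dim=28 * 28, hidden=256, out_dim=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                 # (N, 1, 28, 28) -> (N, 784)
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)
```

The part I find non-trivial is wiring this into the notebook's inner/outer loop, not the model itself.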
Hi,
Could you put a license (e.g., a MIT license) in this repo so that it's officially an open-source code repo? I need to use your code for research purposes.
Thanks a lot!
As the title says: what does "stochastic" mean in hg.CG and hg.fix_point? Does it mean the inner loop can be updated with SGD?
hypertorch/hypergrad/hypergradients.py, line 143 (commit 2482634)
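My current guess, written out as a sketch (this is purely my assumption about what "stochastic" means here, not something I found in the code): the fixed-point map resamples a minibatch on every call, so each application of the map is an SGD step rather than a full-batch gradient step.

```python
import torch

# Hypothetical stochastic fixed-point map: every call draws a fresh
# minibatch, so the map is a random function of the parameters.
# train_loss and sample_minibatch are placeholders, not repo functions.
def make_stochastic_fp_map(train_loss, sample_minibatch, lr=0.1):
    def fp_map(params, hparams):
        batch = sample_minibatch()  # new minibatch on each evaluation
        loss = train_loss(params, hparams, batch)
        grads = torch.autograd.grad(loss, params, create_graph=True)
        return [p - lr * g for p, g in zip(params, grads)]  # one SGD step
    return fp_map
```

Is this the intended reading?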
Hi,
Thanks so much for your work, it's very nice and useful.
I am currently struggling to reproduce the results of the iMAML paper any way I can (I tried with the original implementation, see the issue here, and I am now developing my own).
I tried with your implementation and got the following results after quite a few outer steps:
MT k=37500 (1.726s F: 0.877s, B: 0.847s) Val Loss: 1.80e-02, Val Acc: 99.50.
Test loss 1.04e-01 +- 1.51e-01: Test acc: 96.71 +- 4.27e+00 (mean +- std over 1000 tasks).
MT k=37510 (1.758s F: 0.875s, B: 0.880s) Val Loss: 3.04e-02, Val Acc: 99.33.
MT k=37520 (1.767s F: 0.877s, B: 0.888s) Val Loss: 2.14e-02, Val Acc: 99.42.
MT k=37530 (1.791s F: 0.880s, B: 0.910s) Val Loss: 4.65e-02, Val Acc: 98.33.
MT k=37540 (1.814s F: 0.882s, B: 0.929s) Val Loss: 3.73e-02, Val Acc: 98.50.
MT k=37550 (1.715s F: 0.877s, B: 0.835s) Val Loss: 2.39e-02, Val Acc: 99.17.
MT k=37560 (1.709s F: 0.873s, B: 0.834s) Val Loss: 2.13e-02, Val Acc: 99.08.
MT k=37570 (1.724s F: 0.877s, B: 0.845s) Val Loss: 4.14e-02, Val Acc: 98.17.
MT k=37580 (1.742s F: 0.881s, B: 0.858s) Val Loss: 6.90e-02, Val Acc: 98.08.
MT k=37590 (1.697s F: 0.879s, B: 0.815s) Val Loss: 2.95e-02, Val Acc: 98.67.
MT k=37600 (1.742s F: 0.874s, B: 0.865s) Val Loss: 2.82e-02, Val Acc: 99.08.
MT k=37610 (1.784s F: 0.877s, B: 0.904s) Val Loss: 4.80e-02, Val Acc: 98.50.
MT k=37620 (1.695s F: 0.877s, B: 0.816s) Val Loss: 5.39e-02, Val Acc: 98.50.
MT k=37630 (1.760s F: 0.873s, B: 0.885s) Val Loss: 6.28e-02, Val Acc: 97.92.
MT k=37640 (1.737s F: 0.880s, B: 0.855s) Val Loss: 2.16e-02, Val Acc: 99.33.
MT k=37650 (1.759s F: 0.880s, B: 0.877s) Val Loss: 2.55e-02, Val Acc: 99.08.
MT k=37660 (1.707s F: 0.877s, B: 0.828s) Val Loss: 2.95e-02, Val Acc: 99.08.
MT k=37670 (1.808s F: 0.873s, B: 0.933s) Val Loss: 3.21e-02, Val Acc: 98.92.
MT k=37680 (1.821s F: 0.875s, B: 0.944s) Val Loss: 6.20e-02, Val Acc: 97.67.
MT k=37690 (1.737s F: 0.877s, B: 0.858s) Val Loss: 4.85e-02, Val Acc: 98.58.
As you can see, the test accuracy is well below the one reported in the paper (96.7% vs. 99.1%) and the standard deviation is very high (4.27 vs. 0.35).
I was wondering if you had managed to reproduce the results of the original paper.
I also had a few questions regarding your implementation choices.
Thanks so much for taking the time to read this; it's very much appreciated.
Cheers!
In some of the examples (e.g., in iMAML) I notice that the inner optimization corresponds to a different fixed-point map than the one used in the approximate implicit hypergradient function call (e.g., both use GradientDescent steps, but with different learning rates). Is it required that the same fixed-point map be used in both places, or is there no such necessity? I see in the paper that the approximate implicit hypergradient does not use information from the trajectory to the approximate solution of the inner problem (it depends only on the approximate solution itself); however, I am confused by the semantics of the fp_map argument in the implementation of the implicit hypergradient functions.
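To make the question concrete, here is a stripped-down, pure-PyTorch sketch of the pattern I mean (the names and the toy loss are mine; only the structure matters):

```python
import torch

# Toy inner loss, just to make the sketch self-contained.
def inner_loss(params, hparams):
    return ((params[0] - hparams[0]) ** 2).sum()

def grad_step(params, hparams, lr):
    """One gradient-descent step on the inner loss, used as a fixed-point map."""
    loss = inner_loss(params, hparams)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    return [p - lr * g for p, g in zip(params, grads)]

# The inner loop iterates the map with one learning rate...
inner_map = lambda p, h: grad_step(p, h, lr=0.1)
# ...while the implicit hypergradient call receives a map with another.
hyper_map = lambda p, h: grad_step(p, h, lr=0.5)
```

Both are valid fixed-point maps with the same fixed points, which is why I am unsure whether the mismatch is intentional.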
Variables not used in the graph are not supported and give a runtime error: RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
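For reference, a minimal reproduction of the error with plain torch.autograd.grad, and the documented allow_unused=True workaround (unused inputs then receive None gradients):

```python
import torch

x = torch.randn(3, requires_grad=True)
unused = torch.randn(3, requires_grad=True)  # never enters the graph
loss = (x ** 2).sum()

# torch.autograd.grad(loss, [x, unused])  # raises the RuntimeError above
gx, gu = torch.autograd.grad(loss, [x, unused], allow_unused=True)
print(gu)  # None: the unused tensor simply gets no gradient
```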
Thanks for the excellent hypergradient implementation! I read the iMAML example, but in MAML the whole network is the meta-model, whereas in the feature-extractor/head setting we have a shared feature extractor as the meta-model and a task-specific head as the inner variables. What are the major modifications to the iMAML example for this scenario?
In particular, how should the meta_model argument be passed to the Task class? Should we still pass the whole network (feature extractor + head) as meta_model, while specifying params to be the parameters of the head and hparams to be the parameters of the feature extractor? A simple example pointing out what should change would be perfect. Thanks!
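Concretely, is something like the following split what you would recommend? (A sketch with made-up module names, just to illustrate what I mean by params vs. hparams:)

```python
import torch.nn as nn

class FeatureHeadNet(nn.Module):
    """Shared feature extractor plus task-specific head (names are mine)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
        self.head = nn.Linear(256, 10)

    def forward(self, x):
        return self.head(self.features(x))

net = FeatureHeadNet()
hparams = list(net.features.parameters())  # meta-learned feature extractor
params = list(net.head.parameters())       # task-specific inner variables
```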
Hi,
I was going through the imaml.py example code that you provide.
I don't understand the benefit of passing meta_model.parameters() and params as arguments to the inner_loop function here and here. Since they hold the same data, this makes the regularization part in train_loss_f redundant, as it is always zero:
0.5 * self.reg_param * self.bias_reg_f(hparams, params) -> 0 always
Can you please explain?
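For reference, this is how I read the regularizer (assuming bias_reg_f is the squared L2 distance between corresponding parameter tensors; that reading is mine, not copied from the repo):

```python
import torch

def bias_reg_f(hparams, params):
    # Squared L2 distance between corresponding parameter tensors.
    return sum(((p - h) ** 2).sum() for p, h in zip(params, hparams))

w = torch.randn(5, requires_grad=True)
print(bias_reg_f([w], [w]))  # tensor(0., ...): identical tensors give zero
```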
Hi, thanks for your great work. I wonder whether your code supports custom datasets?
I would like to train on open datasets and then test on my own dataset.
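In case it helps clarify what I'm after: I would like to plug in a standard torch Dataset like the one below (a minimal sketch; the tensor layout is my own assumption):

```python
from torch.utils.data import Dataset

class MyDataset(Dataset):
    """Minimal custom dataset: a tensor of images and integer labels."""
    def __init__(self, images, labels):
        self.images = images  # e.g. a (N, 1, 28, 28) float tensor
        self.labels = labels  # e.g. a (N,) long tensor

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx]
```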
Thank you!