
hypertorch's People

Contributors

bamos, lucfra, prolearner


hypertorch's Issues

Hyperparameter Optimization on MLP

Hello,
I am an undergraduate student trying to perform hyperparameter optimization on a simple MLP with the MNIST dataset. I started from the code in logistic_regression.ipynb, but I found that the conversion is not trivial. Would you mind giving me some hints on how to do that? Thanks a lot!
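One possible starting point (not from the repository; the function names and the choice of a scalar log-regularization hyperparameter are illustrative assumptions) is to replace the linear model of logistic_regression.ipynb with a functional two-layer MLP, so that the flattened weight list plays the role of the inner variables and can be passed to torch.autograd.grad:

```python
import torch
import torch.nn.functional as F

def mlp_forward(params, x):
    # Functional two-layer MLP: params is a plain list of tensors, so it
    # can serve as the inner variables of a bilevel problem.
    w1, b1, w2, b2 = params
    h = F.relu(F.linear(x.view(x.size(0), -1), w1, b1))
    return F.linear(h, w2, b2)

def train_loss(params, hparams, x, y):
    # Inner objective: cross-entropy plus an L2 penalty whose weight is
    # the (scalar, log-space) hyperparameter hparams[0].
    ce = F.cross_entropy(mlp_forward(params, x), y)
    l2 = sum((p ** 2).sum() for p in params)
    return ce + torch.exp(hparams[0]) * l2
```

Because both params and hparams enter the loss as explicit tensor lists, gradients with respect to either can be taken with torch.autograd.grad, which is the shape the hypergradient routines expect.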

Reproducing results of the paper with iMAML

Hi,

Thanks so much for your work, it's very nice and useful.
At the moment I am struggling to reproduce the results of the iMAML paper any way I can (I tried with the original implementation, see the issue here, and I am currently developing my own).
I tried with your implementation and got the following results after quite a few outer steps:

MT k=37500 (1.726s F: 0.877s, B: 0.847s) Val Loss: 1.80e-02, Val Acc: 99.50.
Test loss 1.04e-01 +- 1.51e-01: Test acc: 96.71 +- 4.27e+00 (mean +- std over 1000 tasks).
MT k=37510 (1.758s F: 0.875s, B: 0.880s) Val Loss: 3.04e-02, Val Acc: 99.33.
MT k=37520 (1.767s F: 0.877s, B: 0.888s) Val Loss: 2.14e-02, Val Acc: 99.42.
MT k=37530 (1.791s F: 0.880s, B: 0.910s) Val Loss: 4.65e-02, Val Acc: 98.33.
MT k=37540 (1.814s F: 0.882s, B: 0.929s) Val Loss: 3.73e-02, Val Acc: 98.50.
MT k=37550 (1.715s F: 0.877s, B: 0.835s) Val Loss: 2.39e-02, Val Acc: 99.17.
MT k=37560 (1.709s F: 0.873s, B: 0.834s) Val Loss: 2.13e-02, Val Acc: 99.08.
MT k=37570 (1.724s F: 0.877s, B: 0.845s) Val Loss: 4.14e-02, Val Acc: 98.17.
MT k=37580 (1.742s F: 0.881s, B: 0.858s) Val Loss: 6.90e-02, Val Acc: 98.08.
MT k=37590 (1.697s F: 0.879s, B: 0.815s) Val Loss: 2.95e-02, Val Acc: 98.67.
MT k=37600 (1.742s F: 0.874s, B: 0.865s) Val Loss: 2.82e-02, Val Acc: 99.08.
MT k=37610 (1.784s F: 0.877s, B: 0.904s) Val Loss: 4.80e-02, Val Acc: 98.50.
MT k=37620 (1.695s F: 0.877s, B: 0.816s) Val Loss: 5.39e-02, Val Acc: 98.50.
MT k=37630 (1.760s F: 0.873s, B: 0.885s) Val Loss: 6.28e-02, Val Acc: 97.92.
MT k=37640 (1.737s F: 0.880s, B: 0.855s) Val Loss: 2.16e-02, Val Acc: 99.33.
MT k=37650 (1.759s F: 0.880s, B: 0.877s) Val Loss: 2.55e-02, Val Acc: 99.08.
MT k=37660 (1.707s F: 0.877s, B: 0.828s) Val Loss: 2.95e-02, Val Acc: 99.08.
MT k=37670 (1.808s F: 0.873s, B: 0.933s) Val Loss: 3.21e-02, Val Acc: 98.92.
MT k=37680 (1.821s F: 0.875s, B: 0.944s) Val Loss: 6.20e-02, Val Acc: 97.67.
MT k=37690 (1.737s F: 0.877s, B: 0.858s) Val Loss: 4.85e-02, Val Acc: 98.58.

As you can see, the test accuracy is well below that reported in the paper (96.7% vs. 99.1%) and the standard deviation is much higher (4.27 vs. 0.35).

I was wondering if you had managed to reproduce the results of the original paper.

I also had a few questions regarding your implementation choices:

  • You use a batch size of 16, while the original implementation uses 32. How did you choose that number?
  • Could you explain what you mean here? In particular, I am not sure which approximation from the paper you are referring to.
  • You apply batch norm after the activation and the pooling, unlike the original implementation. Why?
  • Another difference from the original implementation: you initialize the linear weights to zero. Why?

Thanks so much for taking the time to read this; it's very much appreciated.

Cheers!

Optimizers of inner problem and fixed point maps in approximate implicit hypergradient computations do not necessarily match

In some of the examples (e.g. iMAML) I notice that the inner optimization corresponds to a different fixed point map than the one used in the approximate implicit hypergradient call (e.g. both use GradientDescent steps, but with different learning rates). Is it required that the same fixed point map be used in both places, or is there no such necessity? I see in the paper that the approximate implicit hypergradient does not use information from the trajectory to the approximate solution of the inner problem (it depends only on the approximate solution itself), but I'm confused about the semantics of the fp_map argument in the implementation of the implicit hypergradient functions.
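For intuition, here is a toy sketch (not repo code; the quadratic inner problem and the Neumann-series approximation are illustrative assumptions) suggesting why the step size of the fixed point map need not match the one used by the inner optimizer. For the inner problem g(w, lam) = 0.5*(w - lam)^2, the gradient-descent map is Phi(w, lam) = w - lr*(w - lam); the implicit hypergradient of the outer loss f(w) = 0.5*w^2 at the solution w* = lam is v * (1 - dPhi/dw)^{-1} * dPhi/dlam with v = df/dw, and the learning rate cancels:

```python
import torch

def implicit_hypergrad(lam_val, lr, K=200):
    # Exact inner solution of g(w, lam) = 0.5*(w - lam)^2 is w* = lam.
    lam = torch.tensor(lam_val, requires_grad=True)
    w = lam.detach().clone().requires_grad_(True)
    # One gradient-descent step evaluated AT the solution; `lr` only needs
    # to make Phi a contraction, not to match the inner optimizer's rate.
    phi = w - lr * (w - lam)
    v = w.detach()                      # df/dw = w, evaluated at w*
    # Neumann series: (1 - dPhi/dw)^{-1} v ~= sum_k (dPhi/dw)^k v
    q, acc = v.clone(), torch.zeros_like(v)
    for _ in range(K):
        acc = acc + q
        q = torch.autograd.grad(phi, w, grad_outputs=q, retain_graph=True)[0]
    # Chain through dPhi/dlam to get the hypergradient df/dlam.
    return torch.autograd.grad(phi, lam, grad_outputs=acc)[0]
```

In this example the true hypergradient is w* = lam, and both lr = 0.5 and lr = 0.1 recover it, even though neither needs to equal the learning rate that produced the inner solution: only the map at the solution matters, provided it is a contraction whose fixed point is that solution.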

Add support for unused variables in the graph

Variables not used in the graph are not supported and give a runtime error: RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
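A minimal reproduction and the workaround suggested by the error message (plain PyTorch, not repo code): passing allow_unused=True makes the unused tensor's gradient come back as None, which can then be replaced with zeros.

```python
import torch

used = torch.randn(3, requires_grad=True)
unused = torch.randn(3, requires_grad=True)
loss = (used ** 2).sum()

# Without allow_unused=True this call raises the RuntimeError above,
# because `unused` does not appear in the graph of `loss`.
grads = torch.autograd.grad(loss, [used, unused], allow_unused=True)

# Replace the None gradient of the unused tensor with zeros.
grads = [g if g is not None else torch.zeros_like(p)
         for g, p in zip(grads, [used, unused])]
```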

Modification for the feature-head setting of meta-learning?

Thanks for the excellent hyper-grad implementation! I read the iMAML example; there, the whole network is the meta-model. In the feature-head setting, however, we have a common feature extractor as the meta-model and a task-specific head as the inner variables. What are the major modifications to the iMAML example for this scenario?

In particular, how should the meta_model argument be passed to the Task class? Should we still pass the whole network (feature + head) as meta_model, while specifying params as the parameters of the head and hparams as the parameters of the feature extractor? A simple example pointing out what should be changed would be perfect. Thanks!
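One way this split is sometimes done in plain PyTorch (the module shapes and names here are illustrative assumptions, not from the repo): the shared feature extractor's parameters become the hyperparameters of the outer problem, while only detached copies of the head's parameters are adapted in the inner loop.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical feature extractor (shared, meta-learned) and task head.
feature = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())
head = nn.Linear(8, 5)

hparams = list(feature.parameters())        # outer variables (meta)
params = [p.detach().clone().requires_grad_(True)
          for p in head.parameters()]       # inner variables (per task)

def head_forward(params, x):
    # Only the head is applied functionally with the inner variables;
    # gradients still flow into `feature`'s parameters through feature(x).
    w, b = params
    return F.linear(feature(x), w, b)
```

The inner loss would then be differentiated with respect to params only, while the hypergradient routine receives hparams; whether meta_model should still be the full network depends on how the example's Task class uses it.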

imaml.py issue regarding inner_loop solver

Hi,

I was going through the imaml.py example code you provided.
I don't understand the benefit of passing both meta_model.parameters() and params as arguments to the inner_loop function here and here, since they hold the same data; in fact, this seems to make the regularization term in train_loss_f redundant, because it is always zero:
0.5 * self.reg_param * self.bias_reg_f(hparams, params) -> always 0
Can you please explain?
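A small sketch (illustrative, not repo code) of the biased regularizer in question: it is indeed zero at the start of the inner loop, when the inner variables are fresh copies of the meta parameters, but it becomes nonzero as soon as the inner optimizer moves them apart, so it is not redundant over the course of the inner loop.

```python
import torch

def bias_reg(params, hparams):
    # iMAML-style biased regularizer: squared distance between the
    # task parameters and the meta parameters.
    return sum(((p - h) ** 2).sum() for p, h in zip(params, hparams))

hparams = [torch.randn(4, requires_grad=True)]
# Inner variables start as exact copies of the meta parameters.
params = [h.detach().clone().requires_grad_(True) for h in hparams]

with torch.no_grad():
    params[0] -= 0.1   # simulate one inner optimization step
```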

support for custom datasets

Hi, thanks for your great work. Does hypertorch support custom datasets?
I would like to train on open datasets and then test on my own dataset.
Thank you!
