
Comments (1)

Kait0 avatar Kait0 commented on September 27, 2024

1-1. You forgot to freeze the batch norm weights in the encoder. Batch norm layers behave differently from other layers: their running statistics are updated during the forward pass rather than via gradients, so setting requires_grad = False alone does not freeze them.

It is a bit tricky to freeze batch norms.
Methods like DETR use a custom batch norm implementation with frozen statistics and affine parameters:
https://github.com/facebookresearch/detr/blob/main/models/backbone.py
Other people recommend calling module.eval() on the batch norm modules:
https://discuss.pytorch.org/t/how-to-freeze-bn-layers-while-training-the-rest-of-network-mean-and-var-wont-freeze/89736/10
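The eval()-based approach from that thread can be sketched as follows. This is a minimal sketch, not the repository's code: it assumes a standard PyTorch model containing nn.BatchNorm2d layers, and freeze_batchnorm is a hypothetical helper name.

```python
import torch
import torch.nn as nn

def freeze_batchnorm(model: nn.Module) -> None:
    """Fully freeze batch norm layers: stop both running-stat
    updates and gradient updates to the affine parameters."""
    for m in model.modules():
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            m.eval()                           # stop updating running_mean / running_var
            if m.affine:
                m.weight.requires_grad_(False)  # freeze gamma
                m.bias.requires_grad_(False)    # freeze beta

# Caveat: calling model.train() recursively re-enables BN stat updates,
# so freeze_batchnorm must be re-applied after every model.train() call
# (e.g. at the start of each training epoch).
```

The DETR approach avoids this caveat entirely by replacing the BN modules with a FrozenBatchNorm2d class whose statistics are plain buffers, which is more robust if you cannot control every train()/eval() call site.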

This matters for these particular pre-trained weights because their training code had some bugs and the batch norms were not trained properly, so if you train them now you change the network quite a bit.

You could instead use these weights, where the batch norms were trained properly (the models have similar performance): https://drive.google.com/file/d/1CeWcADEOf4DoPywxaWmKIGSpvUDTuLRK/view?usp=sharing

right_dataset_23_11

Fine-tuning the model only on right turns seems like a bad idea to me; it might forget how to turn left, for example.
If you want to fine-tune with less data overall, you could instead use the whole dataset for fewer epochs.

Your figures are broken; I can't see them.

Single GPU; I didn't modify hyper-parameters such as learning rate or optimizer (kept the code as is).

You did implicitly change the batch size. In PyTorch the batch size is set per GPU, so reducing the number of GPUs from 8 to 1 makes the effective batch size 8x smaller. We trained with 2080 Ti GPUs, which have ~11 GB of memory. If your GPU has more, you could simply increase the batch size. If not, a common trick is to reduce the learning rate proportionally (8x in this case): with a smaller batch size you take more gradient steps overall, and the smaller learning rate can counter the noisier gradients.
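The proportional (linear) learning-rate scaling trick can be sketched as follows. Note the base_lr value here is a placeholder for illustration, not the repository's actual hyper-parameter:

```python
# Linear LR scaling when the effective batch size shrinks with fewer GPUs.
# Assumption: the original run used 8 GPUs; base_lr is a placeholder value.
original_gpus = 8
current_gpus = 1
base_lr = 1e-4  # placeholder, not the repo's actual setting

# Effective batch size shrinks by a factor of original_gpus / current_gpus,
# so scale the learning rate down by the same factor.
scaled_lr = base_lr * current_gpus / original_gpus
print(scaled_lr)  # 1.25e-05
```

An alternative that keeps the learning rate untouched is gradient accumulation: accumulate gradients over 8 forward/backward passes before each optimizer step, which restores the original effective batch size on a single GPU.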

I used transfer learning to maintain the visual and spatial capabilities of the Transfuser model while updating the parameters involved in waypoint prediction to fit the new dataset. Is the research design of "only unfreezing some parameters of the Transfuser pre-trained model for training" fundamentally flawed?

No, I think the freezing just didn't work as intended. Fine-tuning on top of frozen layers should work (if the trained layers are at the end of the network).

If that is not the case, then following point 1, I set wp_only = 1 to consider only the waypoint ...

Same problem as point 1.

When evaluating with the pre-trained model, the results are good, but when re-training and performing additional training, the results deteriorate. Could this be due to using fewer data samples or different training environments compared to you?

See above, you shouldn't just train on right turns.

from transfuser.
