First off, thank you for building this! 3 questions regarding the two heads of the pol

lm_head and v_head, why re-initialize and why dropout? about trl HOT 4 CLOSED

huggingface commented on July 30, 2024

lm_head and v_head, why re-initialize and why dropout?

from trl.

Comments (4)

clam004 commented on July 30, 2024

So I did some research on my own and basically my first 2 questions can be answered by looking at the huggingface transformers repository: https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py

danjohnvelasco commented on July 30, 2024

So I did some research on my own and basically my first 2 questions can be answered by looking at the huggingface transformers repository: https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py

Hi @clam004, do you mind explaining your answer/understanding on why they do it? Thanks!

clam004 commented on July 30, 2024

@danjohnvelasco As long as you use the same name, self.lm_head, these linear parameters are replaced with the trained ones when you load the pretrained model from its dictionary of parameters. That's why the model still works (question 2). Regarding question 3, I suspect it somehow doesn't matter, although I'm not sure why, because when I run this repo without the dropout layer it behaves the same, as expected.
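To make the name-matching point concrete, here is a minimal sketch in plain PyTorch. It is not the trl or transformers code: `TinyLM` and `TinyLMWithValueHead` are hypothetical stand-ins, and the sizes are arbitrary. It only illustrates that `load_state_dict` copies weights by parameter name, so a freshly initialized `lm_head` ends up with the pretrained values, while a new `v_head` with no pretrained counterpart keeps its random initialization.

```python
import torch
import torch.nn as nn


class TinyLM(nn.Module):
    """Hypothetical stand-in for a pretrained language model."""

    def __init__(self, hidden=8, vocab=16):
        super().__init__()
        self.body = nn.Linear(hidden, hidden)
        self.lm_head = nn.Linear(hidden, vocab, bias=False)


class TinyLMWithValueHead(nn.Module):
    """Hypothetical policy model: same body and lm_head names, plus a v_head."""

    def __init__(self, hidden=8, vocab=16):
        super().__init__()
        self.body = nn.Linear(hidden, hidden)
        # Re-created (randomly initialized) here, but under the same name.
        self.lm_head = nn.Linear(hidden, vocab, bias=False)
        # New head with no counterpart in the pretrained weights.
        self.v_head = nn.Linear(hidden, 1)


torch.manual_seed(0)
pretrained = TinyLM()
policy = TinyLMWithValueHead()

# Keep a copy of the random lm_head weights to compare against later.
lm_head_before = policy.lm_head.weight.detach().clone()

# strict=False tolerates keys that exist in the model but not in the state dict.
result = policy.load_state_dict(pretrained.state_dict(), strict=False)

lm_head_restored = torch.equal(policy.lm_head.weight, pretrained.lm_head.weight)
lm_head_changed = not torch.equal(policy.lm_head.weight, lm_head_before)
missing = sorted(result.missing_keys)

print("lm_head matches pretrained:", lm_head_restored)
print("keys left at random init:", missing)
```

In this sketch only the `v_head` parameters are reported as missing, which matches the observation that the language-model head still works after loading.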

lvwerra commented on July 30, 2024

Regarding 3, I agree, and we moved the dropout before the linear layer in #70.
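For illustration, a value head with the dropout placed before the linear layer could look like the sketch below. This is an assumption about the general shape, not the code from #70: the class name `ValueHead`, the attribute names, the hidden size and the dropout probability are all hypothetical. It also shows one general PyTorch fact that may be relevant to the observation above: in eval mode `nn.Dropout` is the identity, so the head gives the same output with or without it.

```python
import torch
import torch.nn as nn


class ValueHead(nn.Module):
    """Hypothetical value head: dropout on the hidden states, then a linear layer."""

    def __init__(self, hidden=8, p=0.1):
        super().__init__()
        self.dropout = nn.Dropout(p)
        self.summary = nn.Linear(hidden, 1)

    def forward(self, hidden_states):
        # Dropout is applied to the input features, not to the scalar value.
        return self.summary(self.dropout(hidden_states))


torch.manual_seed(0)
head = ValueHead()
x = torch.randn(2, 5, 8)  # (batch, sequence, hidden)

head.eval()  # dropout becomes a no-op in eval mode
with torch.no_grad():
    with_dropout = head(x)
    without_dropout = head.summary(x)

same_in_eval = torch.equal(with_dropout, without_dropout)
print("output shape:", tuple(with_dropout.shape))
print("identical in eval mode:", same_in_eval)
```

In training mode the two paths would differ, since dropout then zeroes a random subset of the input features and rescales the rest.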
