
why is detach necessary · pytorch/examples · 17 comments · CLOSED

pytorch commented on May 18, 2024 · 37
why is detach necessary

from examples.

Comments (17)

soumith commented on May 18, 2024 · 43

you are correct. it is done for speed, not correctness. The computation of gradients wrt the weights of netG can be fully avoided in the backward pass if the graph is detached where it is.

from examples.
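
For context, a minimal sketch of the discriminator step being discussed, written in the DCGAN style; netG, netD, criterion, optimizerD, batch_size, nz, device and fake_label are assumed to exist and are not taken verbatim from this thread:

    import torch

    # Sketch of the D update on fake data (names assumed, DCGAN-style).
    noise = torch.randn(batch_size, nz, 1, 1, device=device)
    fake = netG(noise)                     # the forward pass through G is still needed
    output = netD(fake.detach()).view(-1)  # the graph is cut here: backward() stops at the detach
    label = torch.full_like(output, fake_label)
    errD_fake = criterion(output, label)
    errD_fake.backward()                   # gradients are computed for netD only, never for netG
    optimizerD.step()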

dantp-ai commented on May 18, 2024 · 34

@soumith This is not true. Detaching fake from the graph is necessary so that we do not have to forward-pass the noise through G again when we actually update the generator. If we do not detach, then, although fake is not needed for the gradient update of D, it will still be part of the computational graph, and because the backward pass frees all the intermediate buffers in the graph (retain_graph=False by default), fake's graph won't be available anymore when G is updated.

from examples.
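
A minimal sketch of the failure mode described above, with the same assumed DCGAN-style names (fake_labels and real_labels stand for the usual 0/1 label tensors); omitting the detach while keeping retain_graph=False makes the later G step fail:

    # D step without detach; fake = netG(noise), NOT detached.
    output = netD(fake).view(-1)
    errD_fake = criterion(output, fake_labels)
    errD_fake.backward()        # traverses AND frees the buffers of netG's subgraph
    optimizerD.step()

    # G step later reuses the same `fake`:
    output = netD(fake).view(-1)
    errG = criterion(output, real_labels)
    errG.backward()   # RuntimeError: trying to backward through the graph a second time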

sunshineatnoon commented on May 18, 2024 · 5

I missed detach when implementing DCGAN in PyTorch, and it gave me this error:

RuntimeError: Trying to backward through the graph second time, but the buffers have already been freed. Please specify retain_variables=True when calling backward for the first time.

from examples.

shiyuanyin commented on May 18, 2024 · 4

@Einstellung
Hi, I have a question about the gradient update of the G model.
First, there are three independent networks: the discriminator network D, the generator network G, and a source network S.
1. The discriminator network comes first. Input: the merged feature values; output: LogSoftmax(); the loss is a binary classification loss with labels 1 and 0, and the gradients are updated with backward().
2. Input: first use the discriminator; its input is the output features of generator G; its output is LogSoftmax().
What I don't understand is how this gets tied back to the generator G. They are two independent networks, so how does the discriminator's gradient update connect to G???
The input of network G is the target data and labels, and its output is an fc feature.
The input of network D is G's features together with the concatenated features of G and S; its output is LogSoftmax().

from examples.

soumith commented on May 18, 2024 · 2

Ah yes, what I said above is only true if we also set retain_graph=True. My bad, I stand corrected.

from examples.
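
In other words, without the detach the G step can only reuse the same fake if the first backward keeps the graph alive. A hedged sketch of that alternative (same assumed names); it runs, but it holds G's buffers in memory and still computes G gradients during the D step, which is why the detach version is preferred:

    output = netD(fake).view(-1)
    errD_fake = criterion(output, fake_labels)
    errD_fake.backward(retain_graph=True)   # keep netG's buffers alive for a second backward
    optimizerD.step()

    netG.zero_grad()                        # throw away the G gradients accumulated just above
    output = netD(fake).view(-1)
    errG = criterion(output, real_labels)
    errG.backward()                         # second traversal is now legal
    optimizerG.step()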

colinshenc commented on May 18, 2024 · 2

@plopd what you are saying doesn't make any sense to me

from examples.

MeirGavish commented on May 18, 2024 · 1

Hello,
I am a little confused by this and will greatly appreciate help in understanding.
According to my understanding, detach() prevents further computations from being tracked. (I suppose it also prevents previous computations from being taken into account in the backward pass?)

Either way, wouldn't you want to track the next computation, the operation of D over fake, for the backward pass of D?
If you wanted to prevent tracking of the Generator, wouldn't it make sense to detach before applying G and then restore tracking for D right at the point where detach is now called? (With requires_grad_(True)?)
Thank you

from examples.
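
As a self-contained illustration (my own toy example, not code from the thread): detach() only cuts the history behind the tensor; the operations D performs on the detached tensor are still tracked, because D's own parameters require gradients:

    import torch
    import torch.nn as nn

    G = nn.Linear(4, 4)   # stand-in "generator"
    D = nn.Linear(4, 1)   # stand-in "discriminator"

    fake = G(torch.randn(2, 4))
    out = D(fake.detach())            # the graph is cut between G and D
    out.sum().backward()

    print(G.weight.grad)              # None: nothing flowed back into G
    print(D.weight.grad is not None)  # True: D's forward on the detached tensor was still
                                      # tracked, because D's parameters require gradients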

rogertrullo commented on May 18, 2024

Thanks for the quick reply @soumith!

from examples.

tusharkr commented on May 18, 2024

What @plopd said is absolutely right. Detaching fake from the graph is necessary, and not doing so will lead to an error.

from examples.

Einstellung commented on May 18, 2024

Let me explain. The role of detach is to stop gradients from flowing. Whether we are updating the discriminator or the generator, the update is based on logD(G(z)). For the discriminator, treating G as frozen does not affect its gradient update (the inner function is simply treated as a constant, which does not change the gradient of the outer function), but conversely, if D were frozen, there would be no way to compute the generator's update at all. That is why we do not freeze D's gradients when training the generator. So for the generator step we do compute gradients through D, but we never update D's weights (only optimizer_g.step() is called), so the discriminator is not changed while the generator is trained. You may ask: in that case, why add detach when training the discriminator? Isn't that an extra, unnecessary move?
Because cutting the graph there saves the work of backpropagating through G, it speeds up training, so we use it wherever we can; it is not an extra task. When we train the generator, on the other hand, the loss logD(G(z)) gives us no way to cut D out of the graph, so we do not write detach there.

from examples.
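
A hedged sketch of the generator step this describes (assumed DCGAN-style names): gradients are computed through D, but only G's optimizer takes a step, and the unwanted D gradients are simply cleared before D's own update:

    # G step: no detach, because log D(G(z)) must backpropagate through D into G.
    netG.zero_grad()
    output = netD(fake).view(-1)            # full chain: noise -> netG -> netD
    errG = criterion(output, real_labels)   # fake samples are labelled as real for the G loss
    errG.backward()                         # fills .grad for BOTH netD and netG parameters ...
    optimizerG.step()                       # ... but only netG's weights are updated
    # netD.zero_grad() at the start of the next D step discards the unused D gradients.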

shiyuanyin commented on May 18, 2024

S feature ->
G feature ->
[image of the network diagram, not preserved in this copy]

from examples.

Einstellung commented on May 18, 2024

@shiyuanyin
I don't really follow what you are describing. You should use G's output to update D.

from examples.

shiyuanyin commented on May 18, 2024

@Einstellung

I don't really follow what you are describing. You should use G's output to update D.

    ############################
    # (2) Update G network: maximize log(D(G(z)))
    ###########################
    netG.zero_grad()
    label.data.fill_(real_label)  # fake labels are real for generator cost
    output = netD(fake)
    errG = criterion(output, label)
    errG.backward()
    D_G_z2 = output.data.mean()
    optimizerG.step()

########
output = netD(fake): output is produced by the D model, not the G model. When backward() runs, D gets the gradient with respect to its input (G's features), but how is that gradient passed on to G, since they are not connected?
I think the process is roughly: netG.backward(D's gradient with respect to its input, i.e. G's output features).
Is this right???

from examples.
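
A small toy example (my own, the names are stand-ins) showing that the two networks really are one graph when fake is not detached: D's backward produces a gradient with respect to its input, and autograd propagates it through fake's grad_fn into G's parameters automatically, so there is no explicit netG.backward(...) call:

    import torch
    import torch.nn as nn

    G = nn.Linear(3, 3)   # stand-in generator
    D = nn.Linear(3, 1)   # stand-in discriminator

    fake = G(torch.randn(1, 3))       # fake carries a grad_fn pointing back into G
    out = D(fake)                     # NOT detached: G and D form a single graph
    out.sum().backward()

    print(fake.grad_fn)               # the node linking D's input back to G
    print(G.weight.grad is not None)  # True: the gradient D computes w.r.t. its input
                                      # is handed to that node and propagated into G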

disanda commented on May 18, 2024

detach just saves the work of computing G()'s gradients during the training step of D(), because G() will be trained in the next step anyway.

from examples.

tjysdsg commented on May 18, 2024

@soumith This is not true. Detaching fake from the graph is necessary to avoid forward-passing the noise through G when we actually update the generator. If we do not detach, then, although fake is not needed for gradient update of D, it will still be added to the computational graph and as a consequence of backward pass which clears all the variables in the graph (retain_graph=False by default), fake won't be available when G is updated.

If I understand correctly, then if I created a new noise input for G, there's no need for the detach() call?

from examples.

Shanmukh0028 commented on May 18, 2024

@soumith This is not true. Detaching fake from the graph is necessary to avoid forward-passing the noise through G when we actually update the generator. If we do not detach, then, although fake is not needed for gradient update of D, it will still be added to the computational graph and as a consequence of backward pass which clears all the variables in the graph (retain_graph=False by default), fake won't be available when G is updated.

If I understand correctly, then if I created a new noise input for G, there's no need for the detach() call?

Do you mean creating fake1 = netG(noise), which is the same as fake before it was detached? I have the same doubt; can someone please clarify this?

from examples.
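
If I read the question correctly, the variant being asked about looks roughly like the sketch below (assumed DCGAN-style names again). Recomputing fake = netG(noise) for the G step builds a fresh graph, so the second backward no longer fails even without detach; the cost is an extra forward pass through G, plus wasted G gradients during the D step that have to be zeroed out:

    # D step without detach: backward also runs (uselessly) through netG.
    netD.zero_grad()
    fake = netG(noise)
    errD_fake = criterion(netD(fake).view(-1), fake_labels)
    errD_fake.backward()              # frees this graph, including G's part
    optimizerD.step()

    # G step with a recomputed fake: a brand-new graph, so this backward is legal.
    netG.zero_grad()                  # also discards the unwanted G grads from the D step
    fake = netG(noise)                # extra forward pass through G
    errG = criterion(netD(fake).view(-1), real_labels)
    errG.backward()
    optimizerG.step()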

batman47steam commented on May 18, 2024

I think that's because if you don't use fake.detach() in output = netD(fake.detach()).view(-1), then fake is just an intermediate variable in the whole computational graph, which is tracked from netG() all the way through netD(). When you call backward() during the D step, all gradient buffers except those of the leaf nodes are released, which means there is no more gradient information about netG() left in the graph. Then when you call errG.backward(), it causes the error.

from examples.
