
Comments (6)

universome commented on August 24, 2024

I agree that the code is not very transparent and might take a lot of time to get one's head around. I looked through it, and I think these are the main places to change and think about:

  • GridInput class here. It looks like input_column can be removed altogether: for 2D generation, vertical patches would also need to be generated independently, so input_column would shrink to an "input_pixel", which makes it quite useless.
  • patchwise_op function here, which is hard-coded for vertical patches only.
  • (The most difficult one) fast_bilinear_mult_row function here, which is hard-coded to interpolate in 1D only.
  • Right now, the central latent code w and its context ws_context are passed as separate arguments everywhere (like here), which is very inconvenient (I did it to preserve backward compatibility with StyleGAN2). Maybe it's worth dropping this in favor of a single w_grid argument of shape [batch_size, grid_h, grid_w, num_ws, w_dim] (or something like this), but I'm not sure about that. The problem with the current variant is that it is quite annoying to arrange w and ws_context into a grid every time you need to interpolate.
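To make the w_grid idea concrete, here is a minimal NumPy sketch of packing the current 1D inputs into a single grid tensor. It is not the repository's code: the helper name to_w_grid_1d is hypothetical, and the layout of ws_context (shape [batch_size, 2, num_ws, w_dim], ordered left then right) is an assumption made only for illustration.

```python
import numpy as np

def to_w_grid_1d(w, ws_context):
    """Pack a central code and its two context codes into one grid tensor.

    Assumed (hypothetical) layouts:
      w:          [batch_size, num_ws, w_dim]
      ws_context: [batch_size, 2, num_ws, w_dim], ordered (left, right)
    Returns:
      w_grid:     [batch_size, grid_h=1, grid_w=3, num_ws, w_dim]
    """
    left, right = ws_context[:, 0], ws_context[:, 1]
    row = np.stack([left, w, right], axis=1)  # [batch_size, 3, num_ws, w_dim]
    return row[:, None]                       # add the grid_h axis

# Usage with toy sizes
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 14, 512))
ws_context = rng.standard_normal((4, 2, 14, 512))
w_grid = to_w_grid_1d(w, ws_context)
print(w_grid.shape)  # (4, 1, 3, 14, 512)
```

With this layout, a 2D version would only need to grow grid_h from 1 to 3, and the interpolation code could index the grid directly instead of re-assembling it at every call site.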

Also, it might be useful to think about which tricks can be borrowed from the recent Alias-Free GAN, which is also a coordinate-based GAN model.

Feel completely free to ask any further questions if you have some!

from alis.

universome commented on August 24, 2024

Hi! In my head, it was supposed to work the following way. You sample 9 anchors and assemble them into a 3x3 grid (right now, 3 anchors are sampled and assembled into a 1x3 grid). This grid is projected onto the coordinate plane in such a way that its lower-left point is at location (-d, -d) and its upper-right point is at (d, d). Here, d is a hyperparameter denoting the distance between anchors; in the paper we used d=2 (assuming that a frame has a size of 1x1 units).

Now, you randomly sample a square frame on this grid, render it, and pass it to the discriminator. For each coordinate location, you interpolate the style codes from the above 3x3 anchor grid.
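The procedure above can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the ALIS implementation: the function names are hypothetical, a plain bilinear interpolation is used for the style codes, the frame is sampled uniformly so that it fits inside [-d, d], and anchors[i, j] is assumed to sit at (x, y) = (-d + j*d, -d + i*d), so row 0 is the bottom row.

```python
import numpy as np

def sample_frame_coords(rng, resolution, d=2.0, frame_size=1.0):
    """Sample a square frame inside [-d, d]^2 and return its pixel coordinates.

    Returns an array of shape [resolution, resolution, 2] holding (x, y).
    """
    # Lower-left corner, chosen so the whole frame stays on the anchor grid
    x0, y0 = rng.uniform(-d, d - frame_size, size=2)
    xs = x0 + np.linspace(0.0, frame_size, resolution)
    ys = y0 + np.linspace(0.0, frame_size, resolution)
    gx, gy = np.meshgrid(xs, ys)
    return np.stack([gx, gy], axis=-1)

def interpolate_styles(anchors, coords, d=2.0):
    """Bilinearly interpolate style codes from a 3x3 anchor grid.

    anchors: [3, 3, w_dim], anchors[i, j] located at (-d + j*d, -d + i*d)
    coords:  [..., 2] array of (x, y) positions inside [-d, d]
    Returns: [..., w_dim]
    """
    coords = np.asarray(coords, dtype=float)
    u = (coords[..., 0] + d) / d  # continuous column index in [0, 2]
    v = (coords[..., 1] + d) / d  # continuous row index in [0, 2]
    # Index of the lower-left anchor of the cell containing each point
    j0 = np.clip(np.floor(u).astype(int), 0, 1)
    i0 = np.clip(np.floor(v).astype(int), 0, 1)
    tx = (u - j0)[..., None]  # horizontal weight inside the cell
    ty = (v - i0)[..., None]  # vertical weight inside the cell
    low = (1 - tx) * anchors[i0, j0] + tx * anchors[i0, j0 + 1]
    high = (1 - tx) * anchors[i0 + 1, j0] + tx * anchors[i0 + 1, j0 + 1]
    return (1 - ty) * low + ty * high

# Usage: one style code per pixel of a randomly placed 8x8 frame
rng = np.random.default_rng(0)
anchors = rng.standard_normal((3, 3, 512))
coords = sample_frame_coords(rng, resolution=8)
styles = interpolate_styles(anchors, coords)
print(styles.shape)  # (8, 8, 512)
```

Exactly at an anchor location the interpolation returns that anchor's code, and halfway between two neighbouring anchors it returns their average.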

Note that training will be quite slow (about 2x slower than the current 1D implementation) unless one implements a specialized CUDA kernel for the fused interpolate+multiply operation.


ivanlen commented on August 24, 2024

What you say makes complete sense. Thanks!
I am still reading the code, comparing it with StyleGAN2 and with your manuscript, and trying to understand some lines before starting the implementation. It's a big piece of code and it is taking me a while, but I am advancing slowly.
As soon as I understand the important lines, I will start coding and doing some testing.

By the way, congrats on the manuscript. It is super interesting and very well written; I enjoyed it a lot.

I will try not to spam here, but if I have any doubts about the implementation, I'll come back.

Cheers!


ivanlen commented on August 24, 2024

Hi @universome, thank you very much for your hints.
I was quite busy with other stuff, but over the next few days I will check the code together with your suggestions and see if I can make progress on the 2D implementation.

Thank you again for your answer; I think it was what I needed to move forward. I will also check out Alias-Free GAN.
If I have further questions during the implementation I will definitely let you know.
Cheers!


zengxianyu commented on August 24, 2024

Have you made any progress on the 2D version? I'm also interested


ivanlen commented on August 24, 2024

Have you made any progress on the 2D version? I'm also interested

Not really. I have some draft notebooks in which I was testing things, but they are still very far from something that can be shared or turned into a PR.

I don't have much free time lately... I hope to be able to continue this soon, but I don't know when I will be able to.

