
Comments (5)

CR-Gjx avatar CR-Gjx commented on September 4, 2024

Thanks for your suggestion! We also considered this idea and tried MCTS in our framework, but in text generation the action count is always more than 5,000, compared with 361 in Go, so the algorithm is limited by GPU memory. I think it could be a great idea if you have enough resources.
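A rough way to see that memory pressure (a hypothetical sketch, not code from this repo): in PUCT-style MCTS each tree node keeps per-action statistics, so node size grows linearly with the action count, i.e. with the vocabulary size in text generation versus 361 moves in Go. The tree size below is purely illustrative.

```python
import numpy as np

class MCTSNode:
    """Per-node statistics for PUCT-style search (AlphaGo Zero style).

    Each node stores one float per legal action for the prior P, the visit
    count N, and the total value W, so memory per node scales with the
    number of actions (the vocabulary size in text generation).
    """
    def __init__(self, num_actions):
        self.P = np.zeros(num_actions, dtype=np.float32)  # prior from the policy net
        self.N = np.zeros(num_actions, dtype=np.float32)  # visit counts
        self.W = np.zeros(num_actions, dtype=np.float32)  # summed values
        self.children = {}                                 # action -> MCTSNode

def node_bytes(num_actions):
    # three float32 arrays per node, ignoring Python object overhead
    return 3 * 4 * num_actions

# 361 actions (Go board) vs. a 5,000-word vocabulary, for a tree of 100k nodes
for actions in (361, 5000):
    print(actions, "actions ->", node_bytes(actions) * 100_000 / 1e9, "GB of node statistics")
```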

from leakgan.

NickShahML avatar NickShahML commented on September 4, 2024

Yes, I should have addressed that issue. I tested your repo with a larger vocab size (80k) and ran out of memory quickly. However, I think there are several ways to address this memory issue:

  1. The biggest memory problem with your approach is that LSTMs take an excessive amount of memory. With the Transformer network, on the other hand, you not only get improved results over a vanilla LSTM, it also takes significantly less memory. On a single 1080 Ti I can train with a batch size of 2048, while for a comparable LSTM I can train with at most a batch size of 64.

Another huge benefit of this network is that it can be trained in linear time, which means you can reduce the batch size even further (though this would affect the ranking part of your algorithm).

In your paper you use a small LSTM network of 128 units. If you used a comparable Transformer network (just the decoder portion), the memory footprint would be minimal.

  2. Yes, there are over 5,000 actions in text generation, but one way to reduce this problem is to use subword units instead of word-level tokens. You can get reasonable text generation with a 4k vocab (see the tokenizer sketch after this list).

  3. Finally, I have a four-1080 Ti system, and I would be happy to run any experiments you have. Additionally, AWS just released Volta GPUs for rent.
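To illustrate the subword point above, here is a hedged sketch using SentencePiece (not part of the LeakGAN code; `corpus.txt` and the model prefix are placeholder names): training a 4k BPE vocabulary keeps the action space small while still covering an open vocabulary, because rare words fall back to smaller pieces instead of an unknown token.

```python
import sentencepiece as spm

# Train a 4k-piece BPE model on a plain-text corpus (one sentence per line).
spm.SentencePieceTrainer.Train(
    "--input=corpus.txt --model_prefix=bpe4k --vocab_size=4000 --model_type=bpe"
)

sp = spm.SentencePieceProcessor()
sp.Load("bpe4k.model")

# Rare words are split into smaller pieces, so the generator's
# action space stays at 4,000 regardless of the raw word vocabulary.
print(sp.EncodeAsPieces("A man riding a skateboard down a ramp."))
print(sp.EncodeAsIds("A man riding a skateboard down a ramp."))
```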

from leakgan.

AranKomat avatar AranKomat commented on September 4, 2024

I'm working on implementing this AlphaZero + GAN + Transformer idea. A good thing about our case compared with board games is that the forward FLOPs required at each move are smaller by roughly 100x, since the input to each layer in our case is bs x hidden_dim, whereas in Go it is bs x hidden_dim x 19 x 19. Furthermore, we can use fewer layers (e.g. 6) and drastically decrease the number of simulations per move, the latter of which I have some justification for. So I believe you can do reasonable training with one or several GPUs without decreasing the hidden dimension from 256.

For simplicity, I've omitted the leakage of information and the hierarchical components in order to compare with SeqGAN. I'm not confident about the discriminability of unfinished sentences, so I'll try two cases: (1) assign the D score of a (finished) sentence to each leaf (no non-leaf nodes); (2) assign the D score of any sentence to any node, where z of a node is the mean over its child nodes.

Without a proper cache, the Transformer's inference is much slower than an LSTM's, whereas with a cache it can perform fast decoding like faster WaveNet, which makes it slightly faster than an LSTM. In my case, both G and D are Transformers.
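To make the caching point concrete, here is a minimal sketch (PyTorch, my own illustration rather than code from this thread) of incremental self-attention decoding: at each step only the newest token's query is computed, and its key/value are appended to a cache, so one step costs O(t) attention instead of rerunning the full O(t^2) pass over the prefix.

```python
import torch
import torch.nn.functional as F

class CachedSelfAttention(torch.nn.Module):
    """Single-head masked self-attention with a key/value cache for decoding."""
    def __init__(self, d_model):
        super().__init__()
        self.q = torch.nn.Linear(d_model, d_model)
        self.k = torch.nn.Linear(d_model, d_model)
        self.v = torch.nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5

    def forward(self, x_t, cache):
        # x_t: (batch, 1, d_model) -- only the newest token's hidden state
        q = self.q(x_t)
        k_new, v_new = self.k(x_t), self.v(x_t)
        if cache is None:
            k, v = k_new, v_new
        else:
            k = torch.cat([cache[0], k_new], dim=1)   # (batch, t, d_model)
            v = torch.cat([cache[1], v_new], dim=1)
        attn = F.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        out = attn @ v                                 # (batch, 1, d_model)
        return out, (k, v)                             # return the updated cache

# Toy decoding loop: feed one token-sized tensor per step, reusing the cache.
layer = CachedSelfAttention(d_model=256)
cache = None
x = torch.randn(8, 1, 256)   # stand-in for an embedded token
for _ in range(20):
    x, cache = layer(x, cache)
```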

@NickShahML I don't see why the Transformer is 32x more memory-efficient than the LSTM, since most of the memory is consumed by the embedding and the softmax layers, which are identical in both architectures. How did you make the comparison? The batch size used in T2T's Transformer implementation corresponds to the total number of tokens rather than the total number of sentences. Is your Transformer batch size of 2048 really the same thing as a batch of 2048 sentences?
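For what it's worth, here is the back-of-the-envelope arithmetic behind that question (my own illustrative numbers, assuming an 80k vocabulary, a 256-dimensional model, and ~32 tokens per sentence; parameter counts only, activation memory not included):

```python
vocab, d, d_ff, avg_len = 80_000, 256, 1024, 32   # illustrative sizes

embedding = vocab * d                          # shared by both architectures
softmax   = d * vocab + vocab                  # output projection + bias
lstm_layer = 4 * ((d + d) * d + d)             # 4 gates, input + recurrent weights
transformer_layer = 4 * d * d + 2 * d * d_ff   # attention projections + FFN

print("embedding + softmax:   ", embedding + softmax)    # ~41M parameters
print("one LSTM layer:        ", lstm_layer)              # ~0.5M
print("one Transformer layer: ", transformer_layer)       # ~0.8M

# If T2T's batch_size counts tokens, 2048 tokens at ~32 tokens/sentence
# is only about 64 sentences per batch -- the same order as the LSTM figure.
print("sentences per 2048-token batch:", 2048 // avg_len)
```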

from leakgan.

AranKomat avatar AranKomat commented on September 4, 2024

So, I've completed the aforementioned implementation and hyperparameter tuning, and I'm now trying to reach full convergence on ImageCOCO. I've detected significant mode collapse in LeakGAN on ImageCOCO. For example, according to generated_coco_examples.txt, the word "skateboard" appears 3,261 times over nearly 10k sentences, but it does not appear nearly that often in the actual dataset. A similar thing can be said about other words such as "A" and "man". This can be attributed to the small generator and to REINFORCE. AlphaZero allows a larger generator architecture, so hopefully this issue will be mitigated.
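That frequency check can be reproduced with a few lines (a hedged sketch; generated_coco_examples.txt is the file named above, while the training-corpus path is a placeholder): compare each token's relative frequency in the generated samples against the real data to spot over-represented modes.

```python
from collections import Counter

def token_freq(path):
    """Relative frequency of each whitespace token in a file of sentences."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            counts.update(line.split())
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}, counts

gen_freq, gen_counts = token_freq("generated_coco_examples.txt")
real_freq, _ = token_freq("image_coco_train.txt")   # placeholder name for the real data

for tok in ("skateboard", "A", "man"):
    ratio = gen_freq.get(tok, 0) / max(real_freq.get(tok, 0), 1e-9)
    print(f"{tok}: {gen_counts.get(tok, 0)} occurrences, "
          f"{ratio:.1f}x over-represented vs. training data")
```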

from leakgan.

CR-Gjx avatar CR-Gjx commented on September 4, 2024

I'm doing some work on the aforementioned problems, and I think there is still a lot of work to do. We can share our progress on solving these problems~

from leakgan.
