lantaoyu / seqgan



Implementation of Sequence Generative Adversarial Nets with Policy Gradient

Python 100.00%

seqgan's Introduction

SeqGAN

Requirements:

TensorFlow r1.0.1
Python 2.7
CUDA 7.5+ (For GPU)

Introduction

Apply Generative Adversarial Nets to generate sequences of discrete tokens.

Illustration of SeqGAN. Left: D is trained on the real data and on the data generated by G. Right: G is trained by policy gradient, where the final reward signal is provided by D and passed back to the intermediate action values via Monte Carlo search.
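The Monte Carlo search in the caption can be sketched in plain Python. This is a minimal sketch, not the repository's rollout.py: `sample_next` stands in for the rollout policy and `discriminator` for D, and both toy functions below are hypothetical. For every prefix length t < T, the prefix is completed N times and the discriminator scores of the completions are averaged; at t = T the finished sequence is scored directly.

```python
import random
from typing import Callable, List

Token = int
Sequence = List[Token]


def mc_rewards(
    sequence: Sequence,
    sample_next: Callable[[Sequence, random.Random], Token],
    discriminator: Callable[[Sequence], float],
    num_rollouts: int = 16,
    seed: int = 0,
) -> List[float]:
    """Estimate an action value for each position of `sequence`.

    Position t (1-based) gets the mean discriminator score of `num_rollouts`
    completions of the prefix sequence[:t]; the last position is scored on
    the finished sequence itself, since there is nothing left to roll out.
    """
    rng = random.Random(seed)
    length = len(sequence)
    rewards = []
    for t in range(1, length):
        total = 0.0
        for _ in range(num_rollouts):
            completion = list(sequence[:t])
            # Finish the sequence with the rollout policy, one sampled token at a time.
            while len(completion) < length:
                completion.append(sample_next(completion, rng))
            total += discriminator(completion)
        rewards.append(total / num_rollouts)
    rewards.append(discriminator(sequence))
    return rewards


# Hypothetical toy stand-ins: a uniform policy over tokens 0..4 and a
# "discriminator" that scores the fraction of even tokens as "real-looking".
def uniform_policy(prefix: Sequence, rng: random.Random) -> Token:
    return rng.randrange(5)


def even_fraction(seq: Sequence) -> float:
    return sum(1 for tok in seq if tok % 2 == 0) / len(seq)


if __name__ == "__main__":
    print(mc_rewards([0, 2, 1, 4, 3], uniform_policy, even_fraction))
```

The per-position averages are what a policy-gradient update would then use as rewards for the individual tokens.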

The research paper SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient has been accepted at the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17).

We provide example code to reproduce the synthetic-data experiments with the oracle evaluation mechanism. To run the experiment with default parameters:

$ python sequence_gan.py

You can change all the parameters in sequence_gan.py.

The experiment has two stages. In the first stage, the generator is pre-trained with supervised learning (Maximum Likelihood Estimation) on the positive data provided by the oracle model. In the second stage, adversarial training is used to improve the generator.
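In the oracle setting, the NLL logged in both stages scores generator samples under the oracle: NLL_oracle = -E_{Y~G_θ}[Σ_t log G_oracle(y_t | Y_{1:t-1})]. The sketch below is not the repository's TARGET_LSTM code. It assumes a toy first-order (bigram) oracle given as a table of transition probabilities, treats the samples as already drawn from the generator, and averages per token; `oracle_nll`, `toy_oracle` and `START` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical bigram oracle: oracle[previous][next] = P(next | previous).
START = 0  # start token; only ever used as the "previous" token
Oracle = Dict[int, Dict[int, float]]


def oracle_nll(samples: Sequence[List[int]], oracle: Oracle) -> float:
    """Average per-token negative log-likelihood of generator samples under the oracle."""
    total, count = 0.0, 0
    for seq in samples:
        prev = START
        for tok in seq:
            total -= math.log(oracle[prev][tok])
            prev = tok
            count += 1
    return total / count


toy_oracle: Oracle = {
    0: {1: 0.5, 2: 0.5},
    1: {1: 0.9, 2: 0.1},
    2: {1: 0.1, 2: 0.9},
}
```

Samples that follow the oracle's likely transitions get a lower NLL, which is why a falling curve indicates the generator is moving toward the oracle distribution.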

After running the experiment, the negative log-likelihood (NLL) performance is saved in save/experiment-log.txt, for example:

pre-training...
epoch:	0	nll:	10.1716
epoch:	5	nll:	9.42939
epoch:	10	nll:	9.2388
epoch:	15	nll:	9.11899
epoch:	20	nll:	9.13099
epoch:	25	nll:	9.14474
epoch:	30	nll:	9.12539
epoch:	35	nll:	9.13982
epoch:	40	nll:	9.135
epoch:	45	nll:	9.13081
epoch:	50	nll:	9.10678
epoch:	55	nll:	9.10694
epoch:	60	nll:	9.10349
epoch:	65	nll:	9.10403
epoch:	70	nll:	9.07613
epoch:	75	nll:	9.091
epoch:	80	nll:	9.08909
epoch:	85	nll:	9.0807
epoch:	90	nll:	9.08434
epoch:	95	nll:	9.08936
epoch:	100	nll:	9.07443
epoch:	105	nll:	9.08305
epoch:	110	nll:	9.06973
epoch:	115	nll:	9.07058
adversarial training...
epoch:	0	nll:	9.08457
epoch:	5	nll:	9.04511
epoch:	10	nll:	9.03079
epoch:	15	nll:	8.99239
epoch:	20	nll:	8.96401
epoch:	25	nll:	8.93864
epoch:	30	nll:	8.91642
epoch:	35	nll:	8.87761
epoch:	40	nll:	8.88582
epoch:	45	nll:	8.8592
epoch:	50	nll:	8.83388
epoch:	55	nll:	8.81342
epoch:	60	nll:	8.80247
epoch:	65	nll:	8.77778
epoch:	70	nll:	8.7567
epoch:	75	nll:	8.73002
epoch:	80	nll:	8.72488
epoch:	85	nll:	8.72233
epoch:	90	nll:	8.71473
epoch:	95	nll:	8.71163
epoch:	100	nll:	8.70113
epoch:	105	nll:	8.69879
epoch:	110	nll:	8.69208
epoch:	115	nll:	8.69291
epoch:	120	nll:	8.68371
epoch:	125	nll:	8.689
epoch:	130	nll:	8.68989
epoch:	135	nll:	8.68269
epoch:	140	nll:	8.68647
epoch:	145	nll:	8.68066
epoch:	150	nll:	8.6832
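To plot or compare runs, the log can be split into its two phases. A minimal sketch, assuming the tab-separated "epoch: N nll: X" lines and the "pre-training..." / "adversarial training..." headers shown in the excerpt above; `parse_experiment_log` is a hypothetical helper, not part of the repository.

```python
import re
from typing import Dict, List, Optional, Tuple

# Matches lines such as "epoch:\t15\tnll:\t9.11899" (any whitespace between fields).
LINE = re.compile(r"epoch:\s*(\d+)\s+nll:\s*([0-9.]+)")


def parse_experiment_log(text: str) -> Dict[str, List[Tuple[int, float]]]:
    """Split the log into its phases and collect (epoch, nll) pairs for each."""
    phases: Dict[str, List[Tuple[int, float]]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("..."):
            # Phase headers look like "pre-training..." or "adversarial training...".
            current = line[:-3]
            phases[current] = []
            continue
        match = LINE.match(line)
        if match and current is not None:
            phases[current].append((int(match.group(1)), float(match.group(2))))
    return phases
```

Usage: `parse_experiment_log(open("save/experiment-log.txt").read())` returns a dict keyed by "pre-training" and "adversarial training".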

Note: this code is based on the previous work by ofirnachum. Many thanks to ofirnachum.

seqgan's People

Contributors

Stargazers

Watchers

Forkers

haozesun2016 udibr federicov hedgefair vseledkin g-wang codeaudit caomw lxj0276 amoliu techscientist benjamesbabala lfthwjx bloodd madaiqian ml-lab zhenyangiacas vyraun xuerenlv vikingmew chingyaoc anuroopsriram andrewliao11 sainid77 quickresolve nikogamulin shaoyizhang wjbianjason miradel51 richardkelley polaris79 stevenlol victormelo sohuren tranlm yunfanz mansteinliliang ajsutrave yala jalused deepmusic buptpriswang jinyu0310 johndpope shashankg7 codyhan zuiwufenghua emigmo jzhang45 tonyan nipengmath wsjeon nopey moheo tonydeep ajaytalati soledad89 sygi peterjliu wjssx sungjinlees yangliuy aquastar yanyankangkang karimpedia elviswf lkang chagge cosmozhang jozef-mokry iamsile solderzzc hit-computer liwei606 drjzhou speedcell4 yutong91 markwunlp lovingliferwj zshwuhan yaokaichun vanpersie32 andreasveit cjxh shatu amber819 ubergarm caoge4 m12sl snoopyboyang scml tonytongzhao zhouliang1979 tandychao fangzheng354 violet-zct napsternxg davidpeng11 yingdongucas qianwangthu

seqgan's Issues

global_variables_initializer error - TensorFlow 0.12.1

Hi.

I would like to start training models on my own data, but I'm unable to get the code working. I'm running 64-bit Ubuntu 16 and 14 with the current TensorFlow 0.12.1 on an NVIDIA GPU.
I've checked several forums, and similar issues are reported with TensorFlow 0.12.

If this cannot be fixed, could you post your previous code that works on earlier versions of TensorFlow?

Thanks

Traceback (most recent call last):
  File "pretrain_experiment.py", line 123, in <module>
    main()
  File "pretrain_experiment.py", line 93, in main
    sess.run(tf.global_variables_initializer)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 766, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 951, in _run
    fetch_handler = _FetchHandler(self._graph, fetches, feed_dict_string)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 407, in __init__
    self._fetch_mapper = _FetchMapper.for_fetch(fetches)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 238, in for_fetch
    return _ElementFetchMapper(fetches, contraction_fn)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 271, in __init__
    % (fetch, type(fetch), str(e)))
TypeError: Fetch argument <function global_variables_initializer at 0x7fb92909bb18> has invalid type <type 'function'>, must be a string or Tensor. (Can not convert a function into a Tensor or Operation.)

A similar error occurs when running sequence_gan.py.

Once the training is done, how can we reuse the model?

After training the model, how can we load it again and reuse it to generate new molecules (e.g. SMILES)? I hope you can tell me how to write a program that reuses it. Thanks in advance!

Training strategy?

Hi, thanks for your excellent work.
I found that your code trains the discriminator for 15 epochs for each generator step. This seems like a very unbalanced strategy. How did you choose it?

Why is the rollout policy updated this way?

According to the paper, the rollout policy is the same as the generator policy, so self.Wi should equal self.lstm.Wi, but the code updates the rollout policy's parameters differently. Could you please explain why? Thank you very much @LantaoYu @wnzhang
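Judging from the 0.8 default mentioned in a later issue, the rollout parameters are not copied from the generator but blended toward it, which amounts to an exponential moving average. A sketch of such an update on plain lists of floats, assuming the rule new = update_rate * old + (1 - update_rate) * generator (an assumption about rollout.py, not a quote from it); `ema_update` is a hypothetical name.

```python
from typing import List


def ema_update(rollout: List[float], generator: List[float], update_rate: float = 0.8) -> List[float]:
    """Move rollout parameters a fraction (1 - update_rate) of the way toward the generator's."""
    return [update_rate * r + (1.0 - update_rate) * g for r, g in zip(rollout, generator)]
```

With update_rate = 0 this reduces to copying the generator, which matches the paper's description; values closer to 1 make the rollout policy a slower, smoothed copy that lags behind recent generator updates.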

Question about Rollout

In this loop:

SeqGAN/rollout.py, line 79 (commit 5f2c0a5):

for i in range(rollout_num):

This is N-time Monte Carlo sampling, with N = 16 in the code. But how are the different samples generated? given_num is the number of tokens taken from the input, and i indexes the i-th sample. Why are the samples different for different values of i? Is the rollout network being updated somewhere within the call to get_reward, and I'm missing it? I also don't see where the randomness comes from in the Monte Carlo estimate of the partial-sequence reward.

From my reading of the code, the network doesn't get updated and the session parameters stay the same, so I'm not sure how different samples are being generated.

Can someone help me understand a) how different samples are generated, b) where the randomness comes from, and c) if the rollout network has the same parameters as the generator network, how it generates samples that differ from the generator's?

Any help is greatly appreciated! Thank you for providing this code it has been very helpful to me.
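One way to see where the variation can come from: if each rollout step draws the next token from the softmax distribution instead of taking the argmax (a later issue notes that the generator samples with tf.multinomial), then identical parameters and an identical prefix still give different completions. A minimal stdlib sketch with a fixed, hypothetical next-token distribution; `complete` and `fixed_probs` are illustrative names.

```python
import random
from typing import List, Sequence


def complete(prefix: Sequence[int], probs: Sequence[float], length: int, rng: random.Random) -> List[int]:
    """Extend `prefix` to `length` tokens by sampling each new token from `probs`.

    The parameters (`probs`) never change; only the random draws differ.
    """
    seq = list(prefix)
    while len(seq) < length:
        seq.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
    return seq


rng = random.Random(42)
fixed_probs = [0.2, 0.2, 0.2, 0.2, 0.2]  # same "network output" for every draw
rollouts = [tuple(complete([3, 1], fixed_probs, 8, rng)) for _ in range(16)]
```

All 16 rollouts share the prefix (3, 1) but differ afterwards, which is the spread that the Monte Carlo average relies on.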

Issue regarding new review generation

Hi,
I don't understand how to use the Yelp dataset to generate new reviews. DP-GAN works with text data.
In SeqGAN, if I have a text data file, what are the steps to preprocess it, and after generating a new review file, how do I convert it back to text? Here everything comes out in numerical format. How do I convert it into actual text? Please help. I want to compare the quality of the text generated by SeqGAN and DP-GAN.
Thanks

strange behavior of reward signal

Hi,

I'm seeing very strange behavior that I can't explain when running your code, and I'd like to know whether you can reproduce it or help me understand it.
I wanted to see whether I could calculate a better reward, and along the way I tested with fixed values. That is, I replaced the implementation of rollout.py:get_reward() with:

rewards = np.zeros((64,20))
rewards.fill(2)
return rewards

Surprisingly, this made the generator converge faster to a lower test error (see attached log). I got much the same behavior when I used rewards sampled uniformly from [0, 1]. I'm not sure what to make of it.

Also, a question: why does the rollout network lag behind the generator (the default update rate is 0.8)? In theory, don't we want to sample from the latest generator?

fixed-seqgan-log.txt

How to run the code with Python 3.5.2

I want to know how to modify the code to run with Python 3.5.2.
thank you!
@lantao Yu

What exactly is the 'start state s0'?

In the paper I saw 'generate a sequence from the start state s0 to maximize its expected end reward ...'. What exactly does s0 mean? In the code I see START_TOKEN=0 and h0=zeros; which one is the start state?

How to train and generate from our own Data?

Can you provide any guidance on how to train and generate from my own data? I would like to try SeqGAN with various English poetry and prose, but I am not sure how to change this code to train on my own data and then generate new writing.
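The synthetic experiment appears to read save/real_data.txt as one sequence per line of space-separated integer token ids, SEQ_LENGTH of them (20, judging from the (64, 20) reward array in the "strange behavior of reward signal" issue). For your own text, one approach is to build a word-to-id vocabulary, write each sentence as a fixed-length line of ids, and invert the mapping to read generated samples. A sketch under stated assumptions: whitespace tokenization, a hypothetical padding word `<pad>` with id 0, and truncation at seq_length; none of these names come from the repository.

```python
from typing import Dict, Iterable, Tuple

PAD = "<pad>"  # hypothetical padding word; any unused id would do


def build_vocab(sentences: Iterable[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Map every whitespace-separated word to an integer id (PAD gets id 0)."""
    word_to_id = {PAD: 0}
    for sentence in sentences:
        for word in sentence.split():
            if word not in word_to_id:
                word_to_id[word] = len(word_to_id)
    id_to_word = {i: w for w, i in word_to_id.items()}
    return word_to_id, id_to_word


def encode_line(sentence: str, word_to_id: Dict[str, int], seq_length: int) -> str:
    """Turn a sentence into one line of exactly seq_length space-separated ids."""
    ids = [word_to_id[w] for w in sentence.split()][:seq_length]
    ids += [word_to_id[PAD]] * (seq_length - len(ids))
    return " ".join(str(i) for i in ids)


def decode_line(line: str, id_to_word: Dict[int, str]) -> str:
    """Turn a line of generated ids back into words, dropping padding."""
    words = [id_to_word[int(tok)] for tok in line.split()]
    return " ".join(w for w in words if w != PAD)
```

The model's vocabulary size would then need to match len(word_to_id), and the saved id_to_word mapping is what turns generated files back into text.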

Opening target_params.pkl with Python 3 pickle fails

Hi, I am trying to run this repo under Python 3 (the original implementation uses Python 2). The code is fine, but target_params.pkl cannot be opened. Do you have an alternate version of that file that can be opened with Python 3?

target_params.pkl encoding

Thanks for your project, it is very useful to me. But 'target_params.pkl' cannot be loaded in Python 3. How can I fix this? Thanks very much!
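A common workaround for reading a Python 2 pickle under Python 3 is the `encoding` argument of `pickle.load`: `encoding="latin1"` maps each byte of a Python 2 `str` to one character, which is lossless and is what NumPy arrays pickled under Python 2 typically need. A sketch, assuming target_params.pkl is an ordinary Python 2 pickle; `load_py2_pickle` is a hypothetical helper.

```python
import pickle
from typing import Any


def load_py2_pickle(path: str) -> Any:
    """Load a pickle written by Python 2 from Python 3.

    encoding="latin1" decodes Python 2 byte strings one byte per character,
    so no data is lost; the default ("ASCII") fails on any byte above 127.
    """
    with open(path, "rb") as handle:  # pickles must be read in binary mode
        return pickle.load(handle, encoding="latin1")
```

Usage (path assumed): `target_params = load_py2_pickle("save/target_params.pkl")`.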

Sequential data as vectors?

Do you know if it would be possible to adjust this to work for an input where each token in the sequence is a vector? Thanks!

About the sequences to process

I wonder whether I could use this framework to process multiple sequences. If it is possible, could you please give me some suggestions?

Confusion about generator training in adversarial train

I still don't understand exactly what N-time Monte Carlo sampling is.

Could someone please explain it?
Thank you @LantaoYu @sygi @jozef-mokry @wnzhang @andreasveit

About baseline of reward function

Hello everyone,
I have learned that, to reduce the variance of the gradient estimator, a "reward baseline" is usually subtracted from the reward in the policy-gradient update.

However, I cannot find any reward baseline technique in the SeqGAN code.
Am I missing something?

thanks in advance!
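The baseline technique the question refers to can be sketched generically: the REINFORCE loss weights each log-probability by (reward - b) instead of the reward, where b must not depend on the sample's own actions; that keeps the gradient estimate unbiased while often reducing its variance. The sketch uses a leave-one-out batch mean as b, which is one common choice and an assumption here, not something taken from the SeqGAN code; both function names are hypothetical.

```python
from typing import List, Sequence


def leave_one_out_advantages(rewards: Sequence[float]) -> List[float]:
    """Subtract from each reward the mean reward of the *other* samples in the batch.

    Because sample i's baseline ignores sample i's own reward, it does not
    depend on that sample's actions, so the policy-gradient estimate stays unbiased.
    """
    n = len(rewards)
    if n < 2:
        raise ValueError("need at least two samples for a leave-one-out baseline")
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]


def reinforce_loss(log_probs: Sequence[float], advantages: Sequence[float]) -> float:
    """Surrogate loss whose gradient is the REINFORCE estimate: -sum(A_i * log p_i)."""
    return -sum(a * lp for a, lp in zip(advantages, log_probs))
```

The leave-one-out advantages always sum to zero, so samples rewarded above their peers are pushed up and the rest are pushed down, rather than every sample being pushed up by a positive reward.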

about the loss of generator

Hi, I have read your code in generator.py (lines 106-113):
# Unsupervised Training
self.g_loss = -tf.reduce_sum(
    tf.reduce_sum(
        tf.one_hot(tf.to_int32(tf.reshape(self.x, [-1])), self.num_emb, 1.0, 0.0) * tf.log(
            tf.clip_by_value(tf.reshape(self.g_predictions, [-1, self.num_emb]), 1e-20, 1.0)
        ), 1) * tf.reshape(self.rewards, [-1])
)
I find that the variables self.g_predictions and self.x are the same as those used in self.pretrain_loss. But since this is for unsupervised training, shouldn't self.g_predictions be replaced with the variable at line 50 (tf.nn.softmax(o_t))? After all, lines 47-56 do not use the supervised information (self.x). Is there a reason for this?
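Read token by token, the quoted g_loss multiplies a one-hot vector of the token in x by the log of the clipped predicted distribution, so the inner reduce_sum keeps only log p(x_t); the outer sum weights each of those by its reward. A plain-Python equivalent of that arithmetic for one flattened batch, as a sketch; `policy_gradient_loss` and its argument names are hypothetical.

```python
import math
from typing import Sequence


def policy_gradient_loss(
    predictions: Sequence[Sequence[float]],  # one probability vector per token position
    tokens: Sequence[int],                   # the token actually sampled at each position
    rewards: Sequence[float],                # reward for each position
    eps: float = 1e-20,
) -> float:
    """-sum_t reward_t * log p_t(token_t), mirroring the one-hot/clip/log/reduce_sum chain."""
    loss = 0.0
    for probs, tok, reward in zip(predictions, tokens, rewards):
        one_hot = [1.0 if k == tok else 0.0 for k in range(len(probs))]
        # Inner reduce_sum over the vocabulary: only the sampled token's log-prob survives.
        log_p = sum(h * math.log(min(max(p, eps), 1.0)) for h, p in zip(one_hot, probs))
        loss -= log_p * reward
    return loss
```

In this reading, x only selects which entry of each predicted distribution is used, and g_predictions supplies the probability of that entry under the current generator.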

Understanding of reward and loss function

Hello,

I don't understand how the reward and the loss function fit together. The labels given to the discriminator are defined as follows:

positive_labels = [[0, 1] for _ in positive_examples]
negative_labels = [[1, 0] for _ in negative_examples]

The reward is then always taken from the second (positive) label:

ypred_for_auc = sess.run(discriminator.ypred_for_auc, feed)
ypred = np.array([item[1] for item in ypred_for_auc])

So as the reward gets larger, the samples are identified as the real class.
But then, in the generator's loss function, the reward is multiplied into the loss:

self.g_loss = -tf.reduce_sum(
    tf.reduce_sum(
        tf.one_hot(tf.to_int32(tf.reshape(self.x, [-1])), self.num_emb, 1.0, 0.0) * tf.log(
            tf.clip_by_value(tf.reshape(self.g_predictions, [-1, self.num_emb]), 1e-20, 1.0)
        ), 1) * tf.reshape(self.rewards, [-1])
)

I don't understand this: since the loss is minimized, won't the rewards be minimized too? Shouldn't item[0] be used for ypred instead?
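On the sign question: with r ≥ 0, minimizing -r · log p(y) raises log p(y), and does so more strongly for larger r; the leading minus turns reward maximization into loss minimization. A small numerical check with a softmax over three logits (all values hypothetical; `softmax` and `descent_step` are illustrative helpers): one gradient-descent step on the loss raises the probability of the sampled token, and a larger reward raises it more.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def descent_step(logits: Sequence[float], token: int, reward: float, lr: float = 0.5) -> List[float]:
    """One gradient-descent step on loss = -reward * log softmax(logits)[token].

    dLoss/dz_j = reward * (p_j - [j == token]).
    """
    probs = softmax(logits)
    return [z - lr * reward * (p - (1.0 if j == token else 0.0))
            for j, (z, p) in enumerate(zip(logits, probs))]
```

Since item[1] is the discriminator's probability that a sample is real, a sample that looks more real gets a larger reward and therefore a larger push toward being generated again.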

Question about the corpus

The same question about the input corpus: what if I want to generate speech or lyrics? Looking forward to your answer!

the input data?

Can you describe the format of the input data?
Thank you very much!

Training on custom dataset

I want to train Seq-GAN on AOL search query logs to generate queries. Is there an implementation of Seq-GAN that works with a custom dataset?

where does the randomness of the generator come from?

It seems that in the generator, the input at the start of unrolling is START_TOKEN, and no source of randomness is fed into the network. So shouldn't the trained network be deterministic?

How can I run this code with Python 3.5? I don't know how to install tensorflow-gpu for Python 2.7.

Support for Python 3.5.2?

Hi,
Do you plan to support a Python 3.5.2 environment? TensorFlow for Windows only supports Python 3.5.2.

control_flow_ops.While has been replaced by tf.while_loop

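Where is the input data?

Hello, I'd like to ask which dataset this source code uses as input, and in what format it is fed in. Could you share some input data that is compatible with the source code?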

Int-to-word mapping needed

Can you upload the integer-to-word mappings, or the pickle file containing them? I can't make sense of the data generated in real_data.txt.

Please Cite

Hey - looks like you heavily based this code on my implementation at https://github.com/ofirnachum/sequence_gan

It's nice to see that you got it working on bigger problems, but please cite/reference my work.

View text data in results

What modifications do we need to make to the code to see the generated text? I would like to see the generated Obama speeches mentioned in the paper, along with the loss. Also, could you share the data used for training?

Loss of the generator

I have found that the generator loss is calculated as follows:

self.g_loss = -tf.reduce_sum(
        tf.reduce_sum(
            tf.one_hot(tf.to_int32(tf.reshape(self.x, [-1])), self.num_emb, 1.0, 0.0) * tf.log(
                tf.clip_by_value(tf.reshape(self.g_predictions, [-1, self.num_emb]), 1e-20, 1.0)
            ), 1) * tf.reshape(self.rewards, [-1])
    )

I am confused by g_predictions here, which seems to be the softmax output of the generator, while x holds the samples generated from g_predictions. So why are they combined in the same sum? Any help is appreciated. Thanks in advance!

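About sampling

Hi, a question: is it necessary to use tf.gather for sampling? Would it work to simply multiply the entries that were not sampled by 0? I tried it, and multiplying by 0 seems to make the gradient 0. Have you run into this problem too?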

Why do you define your own create_recurrent_unit function instead of using an LSTM cell?

Monte-Carlo roll-out

Hi,
Thank you for your work.
I have a question: how are variable-length sequences handled in the MC rollout?
Thanks

How do the gradients pass back if there is a tf.multinomial() sampling step?

Since the sampling procedure is not differentiable, how does the error from the discriminator pass back to train the generator?
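In policy gradient (REINFORCE), no gradient flows through the sampling step itself. It relies on the identity ∇θ E_{y~p_θ}[f(y)] = E_{y~p_θ}[f(y) ∇θ log p_θ(y)], which needs only the gradient of log p_θ, not of the sampler or of f (here, the discriminator score). A stdlib check for a softmax policy over three tokens compares this Monte Carlo estimate with the exact gradient; the logits, f values and sample count are hypothetical.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def exact_gradient(logits: Sequence[float], f: Sequence[float]) -> List[float]:
    """d/dz_j of sum_k p_k f_k for p = softmax(z), which equals p_j * (f_j - E[f])."""
    p = softmax(logits)
    mean_f = sum(pk * fk for pk, fk in zip(p, f))
    return [pj * (fj - mean_f) for pj, fj in zip(p, f)]


def score_function_gradient(logits: Sequence[float], f: Sequence[float],
                            num_samples: int, seed: int = 0) -> List[float]:
    """REINFORCE estimate: average of f(y) * d log p(y) / dz over sampled y.

    Only log p is differentiated; the sampling step and f are treated as black boxes.
    """
    rng = random.Random(seed)
    p = softmax(logits)
    grad = [0.0] * len(p)
    for _ in range(num_samples):
        y = rng.choices(range(len(p)), weights=p, k=1)[0]
        for j in range(len(p)):
            # d log softmax(z)[y] / dz_j = [j == y] - p_j
            grad[j] += f[y] * ((1.0 if j == y else 0.0) - p[j])
    return [g / num_samples for g in grad]
```

In the SeqGAN setting, f(y) plays the role of the discriminator-derived reward and log p is the generator's log-probability of the sampled tokens, which is the quantity the g_loss quoted in other issues differentiates.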

question about update_params in roll out policy

Hi, thanks for sharing your code @LantaoYu. I have some questions. I don't understand why the parameters of the rollout policy are updated the way they are here and here. According to the paper, the rollout policy should be the same as the generator, but in this code it is not. Also, I don't understand the use of identity here. Why is identity necessary?

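Model question

Looking at the code, the parameters of target_lstm are fixed, its initial state is fixed, and its initial input is fixed. Why isn't the output of each batch deterministic?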

AttributeError: 'module' object has no attribute 'While'

rzai@rzai00:/prj/SeqGAN/MLE_SeqGAN$ CUDA_VISIBLE_DEVICES=0 python pretrain_experiment.py
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
Traceback (most recent call last):
  File "pretrain_experiment.py", line 123, in <module>
    main()
  File "pretrain_experiment.py", line 85, in main
    generator = get_trainable_model(vocab_size)
  File "pretrain_experiment.py", line 36, in get_trainable_model
    return PoemGen(num_emb, BATCH_SIZE, EMB_DIM, HIDDEN_DIM, SEQ_LENGTH, START_TOKEN)
  File "/home/rzai/prj/SeqGAN/MLE_SeqGAN/model.py", line 62, in __init__
    _, _, _, self.gen_o, self.gen_x = control_flow_ops.While(
AttributeError: 'module' object has no attribute 'While'
rzai@rzai00:/prj/SeqGAN/MLE_SeqGAN$

Did you track discriminator accuracy and loss?

I am interested in the discriminator accuracy and loss during training. Do you have this data? I ask because with my dataset the discriminator is too strong.

Three questions about this model

Is this model fixed? Once training is finished, can users only change the output by changing the "start token"?
What if I want to generate something else?
In the original GAN for computer vision, people generate images from a noise input. Can this SeqGAN model do the same, i.e. generate a sentence from a noise input?

Keras implementation of SeqGAN

Almost all the popular GANs have Keras implementations except SeqGAN, which makes it difficult to optimize and to use alongside other GANs.

problems happened

Traceback (most recent call last):
  File "/home/gan/eclipse-workspace/SeqGAN-master/sequence_gan.py", line 182, in <module>
    main()
  File "/home/gan/eclipse-workspace/SeqGAN-master/sequence_gan.py", line 91, in main
    generator = Generator(vocab_size, BATCH_SIZE, EMB_DIM, HIDDEN_DIM, SEQ_LENGTH, START_TOKEN)
  File "/home/gan/eclipse-workspace/SeqGAN-master/generator.py", line 43, in __init__
    self.h0 = tf.stack(values, name)([self.h0, self.h0])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 817, in stack
    value_shape = ops.convert_to_tensor(values[0], name=name).get_shape()
TypeError: 'type' object has no attribute '__getitem__'

Is the discriminator loss in adversarial training the same as in pretraining?

In the paper they are not the same. See Eq. (5) in the paper.

evaluation issues

Hi there, I have a question about the evaluation of text generation. In your AAAI 2017 paper, you mention that for Chinese poem generation you "use the whole test set as the references" to calculate the BLEU score. What does "references" mean here? How are the test samples used as the "positive examples" you mention? Are all the test samples fed into the model as input, and are the BLEU scores then calculated from the model's corresponding outputs? I would appreciate more details about the evaluation procedure. Thanks in advance!

Vocab Dictionary

Hello, I want to translate sequences of tokens into sentences. Could you please provide the complete vocabulary file?

How to get the parameters of a different target_lstm?

I noticed that there is a class named TARGET_LSTM which uses the predefined parameters from target_params.pkl.
My question is: if I use my own data and different global parameters, such as EMB_DIM, HIDDEN_DIM, SEQ_LENGTH, ..., how do I obtain the parameters for TARGET_LSTM? What is TARGET_LSTM used for?