taoxugit / attngan

License: MIT License

Python 100.00%

attngan's Issues

Discriminator loss function differs from paper?

Can someone explain what the function of cond_wrong_errD is in the discriminator loss function? It does not seem to be part of the discriminator loss given in the paper (Eq. 5), and it does not make sense to me. Why ignore the last entry in the batch?

cond_wrong_logits = netD.COND_DNET(real_features[:(batch_size - 1)], conditions[1:batch_size])
cond_wrong_errD = nn.BCELoss()(cond_wrong_logits, fake_labels[1:batch_size])
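
One plausible reading, not confirmed by the authors in this thread: the two slices pair each real image with the caption of its neighbour in the batch, so the discriminator also sees real images with mismatched text and is asked to label them as fake. A shift by one can only form batch_size - 1 such pairs, which would explain why one entry is dropped. A minimal pure-Python sketch of the pairing (the image and caption names are made up):

```python
# Hypothetical batch of 4 (image, caption) pairs; names are illustrative only.
images = ["img0", "img1", "img2", "img3"]
captions = ["cap0", "cap1", "cap2", "cap3"]
batch_size = len(images)

# Same slicing as in the snippet above: images[:B-1] against captions[1:B].
wrong_pairs = list(zip(images[:batch_size - 1], captions[1:batch_size]))

# Every pair combines an image with the caption of the *next* sample.
for image, caption in wrong_pairs:
    print(image, "<->", caption)

# A shift by one yields only B - 1 mismatched pairs.
assert len(wrong_pairs) == batch_size - 1
```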

Out of memory - 1060 6GB while Sampling

RuntimeError: CUDA error: out of memory
I am running sampling for the birds model (python main.py --cfg cfg/eval_bird.yml --gpu 0).

External dataset

Hey. First of all, I have to say that the work and the results are absolutely amazing. Thank you for your work and for sharing the code with the community.

Going through the steps to run your code with your data is quite easy, and I get the same results as yours. But when I tried to figure out how to test your architecture on external data, I ran into the issue of the preprocessed metadata for each dataset you work with.

So can you please list the steps for feeding external data (i.e. a bunch of images with captions) to your model, including pretraining the DAMSM and the embedding vectors? I think this would be very useful information for extending your research to broader areas.

Transition from Python 2.7 to 3.0+

After invoking "python main.py --cfg cfg/bird_attn2.yml --gpu 2" for bird dataset, I get the following error:

Total filenames: 11788 001.Black_footed_Albatross/Black_Footed_Albatross_0046_18.jpg
Load filenames from: ../data/birds/train/filenames.pickle (8855)
Load filenames from: ../data/birds/test/filenames.pickle (2933)
Load from: ../data/birds/captions.pickle
Traceback (most recent call last):
  File "main.py", line 129, in <module>
    transform=image_transform)
  File "/Users/user/Desktop/AttnGAN-master/code/datasets.py", line 118, in __init__
    self.class_id = self.load_class_id(split_dir, len(self.filenames))
  File "/Users/user/Desktop/AttnGAN-master/code/datasets.py", line 254, in load_class_id
    class_id = pickle.load(f)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)

I couldn't find an answer through Google so I posted back to the community.
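
This message is typical of a pickle written by Python 2 being read by Python 3, which decodes legacy 8-bit strings as ASCII by default. A commonly used workaround is to pass `encoding='latin1'` (or `encoding='bytes'`) to `pickle.load`. The sketch below reproduces the error with a hand-built Python 2 style payload; whether the class-id pickle in the dataset is affected in exactly this way is an assumption.

```python
import pickle

# Hand-built protocol-2 pickle of the Python 2 byte string '\x80'
# (illustrative payload, not taken from the repository's data files).
py2_payload = b"\x80\x02U\x01\x80q\x00."

try:
    pickle.loads(py2_payload)  # default encoding is ASCII
except UnicodeDecodeError as err:
    print("default load fails:", err)

# latin1 maps every byte to a code point, so decoding cannot fail.
value = pickle.loads(py2_payload, encoding="latin1")
print(repr(value))

# encoding="bytes" keeps legacy strings as raw bytes instead.
raw = pickle.loads(py2_payload, encoding="bytes")
print(repr(raw))
```

In `load_class_id` this would amount to calling `pickle.load(f, encoding='latin1')` on a file opened in binary mode.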

Training on CPU

I have no GPU, could I use my CPU for the training?
Thank you very much

How to run evaluation code using pre-trained model locally?

Hi,

I want to run the evaluation code locally using the pre-trained model on the COCO dataset. I don't want to create a Docker container or make any API-based call. I tried to run eval/eval.py but got an error related to the Azure setup. How can I use the existing code for "Generating images from the captions"?

AttributeError: 'module' object has no attribute 'Upsample'

training is very slow

The training is very slow and the GPU is not fully used. Is this expected with this code, or did I set something up wrongly?

I run python pretrain_DAMSM.py --cfg cfg/DAMSM/bird.yml --gpu 0 to pretrain the DAMSM.
It sometimes shows:

+-------------------------------+----------------------+----------------------+
|   7  TITAN Xp            Off  | 0000:8A:00.0     Off |                  N/A |
| 34%   58C    P2    84W / 250W |   3952MiB / 12189MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

And sometimes it shows:

+-------------------------------+----------------------+----------------------+
|   7  TITAN Xp            Off  | 0000:8A:00.0     Off |                  N/A |
| 34%   59C    P2   171W / 250W |   3952MiB / 12189MiB |     74%      Default |
+-------------------------------+----------------------+----------------------+

The GPU is not fully used and training is quite slow. 15 epochs take 1 hour, so 600 epochs would take about 40 hours. Is this expected, or have I done something wrong?

Why does the hash value of saved models change every time I save it?

I am using torch.save() to save a model file. However, every time I save it, the file changes. Why is that?

netG_1 = torch.load('netG.pth')
netG_2 = torch.load('netG.pth')

torch.save(netG_1, 'netG_1.pth')
torch.save(netG_2, 'netG_2.pth')

Using md5sum *.pth:

779f0fefca47d17a0644033f9b65e594  netG_1.pth
476f502ec2d1186c349cdeba14983d09  netG_2.pth
b0ceec8ac886a11b79f73fc04f51c6f9  netG.pth

The inception score on coco dataset

Hi, in my experiment the inception score of the pretrained model on the COCO dataset is 16.16. Do you know why the IS is not stable? How can I get the IS up to 25.89?

Error when training DAMSM

Traceback (most recent call last):
  File "pretrain_DAMSM.py", line 275, in <module>
    dataset.ixtoword, image_dir)
  File "pretrain_DAMSM.py", line 61, in train
    for step, data in enumerate(dataloader, 0):
  File "/media/server009/seagate/liuhan/anaconda2/envs/attngan/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 310, in __iter__
    return DataLoaderIter(self)
  File "/media/server009/seagate/liuhan/anaconda2/envs/attngan/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 180, in __init__
    self._put_indices()
  File "/media/server009/seagate/liuhan/anaconda2/envs/attngan/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 219, in _put_indices
    indices = next(self.sample_iter, None)
  File "/media/server009/seagate/liuhan/anaconda2/envs/attngan/lib/python2.7/site-packages/torch/utils/data/sampler.py", line 119, in __iter__
    for idx in self.sampler:
  File "/media/server009/seagate/liuhan/anaconda2/envs/attngan/lib/python2.7/site-packages/torch/utils/data/sampler.py", line 50, in __iter__
    return iter(torch.randperm(len(self.data_source)).long())
RuntimeError: invalid argument 1: must be strictly positive at /pytorch/torch/lib/TH/generic/THTensorMath.c:2184

@taoxugit
Could you please help me solve this problem?
Thank you very much!
I'm looking forward to your reply.

p(image|caption)

Hello,

What's the easiest way to calculate p(ground truth image | caption)?
I would highly appreciate any suggestions.

division by zero

I am seeing:

C:\AttnGAN\code\miscc\utils.py:235: RuntimeWarning: invalid value encountered in true_divide
one_map = (one_map - minV) / (maxV - minV)

Is this OK?
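
The warning indicates a 0/0 division, which happens when a map is constant (`maxV == minV`) and which fills the result with NaN values. A minimal sketch of a guarded min-max normalisation, assuming `one_map` is a NumPy array; the helper name `normalize_map` is hypothetical and not part of the repository:

```python
import numpy as np

def normalize_map(one_map):
    """Min-max scale to [0, 1]; return zeros for a constant map."""
    one_map = np.asarray(one_map, dtype=np.float64)
    min_v, max_v = one_map.min(), one_map.max()
    span = max_v - min_v
    if span == 0:
        # Constant map: avoid 0/0 and return an all-zero map instead.
        return np.zeros_like(one_map)
    return (one_map - min_v) / span

print(normalize_map([[1.0, 3.0], [2.0, 5.0]]))
print(normalize_map([[0.7, 0.7], [0.7, 0.7]]))
```

Returning zeros for a constant map is a design choice made for this sketch; any fixed fallback value avoids the NaN output.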

size mismatch for encoder.weight: copying a param with shape torch.Size([5450, 300]) from checkpoint, the shape in current model is torch.Size([279, 300]).

Why does netG generate different data for same input?

In many places, the fake images are generated via:

fake_imgs, _, _, _ = netG(noise, sent_emb, words_embs, mask)

The netG is of class G_NET defined in https://github.com/taoxugit/AttnGAN/blob/master/code/model.py#L397.

When I keep noise, sent_emb, words_embs and mask constant and rerun the generation, I get different fake images. Shouldn't the model be outputting a constant output for a constant input? Is there any stochastic behaviour of the G_NET?
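
One likely source of randomness, stated here as an assumption about `G_NET` rather than a confirmed answer: conditioning-augmentation layers of the kind used in StackGAN-style generators sample fresh Gaussian noise inside the forward pass, so the output changes even when every argument is fixed. A pure-Python sketch of that reparameterisation step (the function name and values are illustrative):

```python
import math
import random

def reparametrize(mu, logvar, rng):
    """Sample c = mu + eps * std with eps ~ N(0, 1), elementwise."""
    return [m + rng.gauss(0.0, 1.0) * math.exp(0.5 * lv)
            for m, lv in zip(mu, logvar)]

mu = [0.1, -0.3, 0.5]        # illustrative values
logvar = [0.0, -1.0, -2.0]

# Fresh noise on every call: identical inputs, different outputs.
rng = random.Random()
first = reparametrize(mu, logvar, rng)
second = reparametrize(mu, logvar, rng)
print(first != second)

# Re-seeding before each call makes the sample repeatable.
a = reparametrize(mu, logvar, random.Random(0))
b = reparametrize(mu, logvar, random.Random(0))
print(a == b)
```

In PyTorch the analogous step would be calling `torch.manual_seed` before each generation. Dropout or batch normalisation left in training mode is another possible source of variation, which `netG.eval()` switches off.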

question about the paper

Hi,
Professor, I am very excited about the results of your paper, and the idea has inspired me a lot. I think it is awesome work, but I still have a question about the paper.
First, we pretrain the DAMSM to get the text encoder. I think this step brings the word features from the text encoder close to the sub-region features from the image encoder. But I am confused. At the beginning, the word features from the untrained text encoder are random, so how can a word feature be matched to the right sub-region? For example, if the feature of the word 'bird' happens to be close to the feature of the sub-region 'tree' at the first step, then the word 'bird' would be matched to the sub-region 'tree' more and more strongly as DAMSM pretraining proceeds. That does not seem correct, yet the results are amazing.
I don't know whether I understand it correctly. I would be grateful if you could answer the question. Thanks.

Validation generates 2928 instead of 2933 reported in paper and dataset.len

Hi, I ran the following command to validate the trained AttnGAN generator:
python main.py --cfg cfg/eval_bird.yml --gpu 1
but it produced 2928 images instead of the 2933 reported in the paper ("Statistics of datasets") and returned by dataset.len (2933 as well).

I think the PyTorch DataLoader initialization parameter "drop_last" is the key. However, when I set it to False, the code raised an error:
RuntimeError: Expected hidden[0] size (2, 5, 128), got (2, 16, 128)
This indicates that the last batch failed because batch_size, rather than input.size[0], is used to construct the model, so the number of generated images varies with the batch_size setting.

How can I solve this issue?
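
The numbers in this report are consistent with a batch size of 16 and `drop_last=True`: 2933 samples give 183 full batches and a remainder of 5, which matches the `(2, 5, 128)` hidden size in the error. A short check of that arithmetic (the batch size is inferred from the error message, not read from the config):

```python
num_samples = 2933   # test split size reported in the issue
batch_size = 16      # inferred from the error message above

full_batches, leftover = divmod(num_samples, batch_size)

generated = full_batches * batch_size
print(full_batches, leftover, generated)  # 183 5 2928
```

Keeping the last batch would require the recurrent hidden state to be initialised with the actual size of each batch rather than the configured one.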

The inception score of the pretrained model on birds dataset is 4.17.

Hi,
I want to reproduce the experiment result in the paper. However, the inception score of the pretrained model on birds dataset is 4.17. I compute the inception score using https://github.com/hanzhanggit/StackGAN-inception-model. I have tried both pytorch 0.3 and 0.4, the inception score is still lower than 4.36 (reported in the AttnGAN paper).

"GLU" function and "class_ids"

Hi,
Professor, your work is so exciting. I have read the paper and the code, but I am puzzled about two things. First, I don't understand the effect of the "GLU" function. Second, in the DAMSM model, what is the meaning of class_ids (for example: imgs, captions, captions_lens, class_ids, keys = data)?

I hope you can help, and thank you in advance.
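
On the first question: a gated linear unit splits its input channels into two halves and uses the sigmoid of one half as a gate on the other, so the layer outputs half as many channels as it receives. A minimal pure-Python sketch on a single channel vector (the repository works on tensors; the values here are illustrative):

```python
import math

def glu(channels):
    """Gated linear unit over one vector of channel values."""
    assert len(channels) % 2 == 0, "channel count must be even"
    half = len(channels) // 2
    values, gates = channels[:half], channels[half:]
    # Each value is scaled by sigmoid(gate), a factor in (0, 1).
    return [v / (1.0 + math.exp(-g)) for v, g in zip(values, gates)]

out = glu([2.0, -1.0, 0.0, 100.0])
print(out)  # gate 0.0 halves the first value; gate 100.0 passes the second
```

The gate lets the network learn, per channel and per position, how much of each feature to pass on to the next layer.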

RuntimeError: Couldn't open shared file mapping: <torch_18044_1286185042>, error code: <1455>

I tried to run the pretraining step, but it fails like this:

  File "pretrain_DAMSM.py", line 274, in <module>
    dataset.ixtoword, image_dir)
  File "pretrain_DAMSM.py", line 59, in train
    for step, data in enumerate(dataloader, 0):
  File "D:\Work\Aconda\AnacodaPython3.7\envs\py37\lib\site-packages\torch\utils\data\dataloader.py", line 281, in __next__
    return self._process_next_batch(batch)
  File "D:\Work\Aconda\AnacodaPython3.7\envs\py37\lib\site-packages\torch\utils\data\dataloader.py", line 301, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "D:\Work\Aconda\AnacodaPython3.7\envs\py37\lib\site-packages\torch\utils\data\dataloader.py", line 55, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "D:\Work\Aconda\AnacodaPython3.7\envs\py37\lib\site-packages\torch\utils\data\dataloader.py", line 135, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "D:\Work\Aconda\AnacodaPython3.7\envs\py37\lib\site-packages\torch\utils\data\dataloader.py", line 135, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "D:\Work\Aconda\AnacodaPython3.7\envs\py37\lib\site-packages\torch\utils\data\dataloader.py", line 135, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "D:\Work\Aconda\AnacodaPython3.7\envs\py37\lib\site-packages\torch\utils\data\dataloader.py", line 135, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "D:\Work\Aconda\AnacodaPython3.7\envs\py37\lib\site-packages\torch\utils\data\dataloader.py", line 110, in default_collate
    storage = batch[0].storage()._new_shared(numel)
  File "D:\Work\Aconda\AnacodaPython3.7\envs\py37\lib\site-packages\torch\storage.py", line 114, in _new_shared
    return cls._new_using_filename(size)
RuntimeError: Couldn't open shared file mapping: <torch_18044_1286185042>, error code: <1455> at C:\Anaconda2\conda-bld\pytorch_1519496000060\work\torch\lib\TH\THAllocator.c:157

Could anyone help me?

COCO caption.pickle

Hi,
I can't find caption.pickle for COCO in your repo; I only found caption.pickle for birds. Can you please also add the COCO caption.pickle to the repo or, if it's too big, send it to me directly?
Thanks a lot

Higher resolution?

Hi, the model currently generates 256x256 pixel images. Would it be possible to produce a higher resolution such as 1024x1024? If yes, how could I do it? Thanks.

One small spelling mistake

In "Dependencies" part, I think it should be "scikit-image" instead of "skikit-image". :)

Issue with loading caption method

I tried to display the caption and image after loading them for training, and I found that the caption-loading method seems to drop some keywords from the original caption.

For example:
Image name: 148.Green_tailed_Towhee/Green_Tailed_Towhee_0030_797417
The loaded caption: this bird has a red and blue crown brown and secondaries and coverts a grey and white chest
The original caption: this bird has a red and blue crown, brown and yellow secondaries, grey and green coverts and a grey and white chest.

As you can see, it is missing important keywords like yellow and green. This could hurt the performance of the model. I think the get_caption method should load the original caption from the dataset.
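
The loaded caption has exactly 18 words while the original has 22, which is what one would see if the loader drops punctuation and, for captions longer than a fixed word budget, keeps a random subset of the words in their original order. A sketch of that behaviour under the assumption of an 18-word limit (the function name `subsample_caption` is hypothetical):

```python
import random

def subsample_caption(words, max_words, rng):
    """Keep at most max_words words, chosen at random, in original order."""
    if len(words) <= max_words:
        return list(words)
    keep = sorted(rng.sample(range(len(words)), max_words))
    return [words[i] for i in keep]

original = ("this bird has a red and blue crown brown and yellow secondaries "
            "grey and green coverts and a grey and white chest").split()

shortened = subsample_caption(original, max_words=18, rng=random.Random(0))
print(len(original), len(shortened))  # 22 18
print(" ".join(shortened))
```

If this is what happens, a different subset is drawn on each load, so the dropped words vary between epochs. Raising the word limit (the config dump elsewhere in these issues shows a `TEXT.WORDS_NUM` entry) would presumably keep the full caption.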

trainer.py not working with CPU

Hi,

trainer.py has lots of cuda() calls without asking about the CUDA flag set in config file, so evaluation/training on a CPU is not possible.

I fixed the issue by adding "if cfg.CUDA" loops before every cuda() call and it works just fine: https://github.com/KCool/AttnGAN/blob/master/code/trainer.py

Best
KCool

How to continue to train from the snapshot?

I saved the models for netG and netD and I want to continue training them. What should I do?
I found this in the code:
torch.save(netG.state_dict()
How can I load the saved dict back into netG?
Could anyone help me?
Thank you so much.
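
A state dict saved with `torch.save(netG.state_dict(), path)` can be loaded back into a freshly constructed network with `load_state_dict`. The sketch below uses a small stand-in module instead of the real generator, and the file name is illustrative; `map_location="cpu"` additionally lets a checkpoint saved on a GPU be loaded on a machine without one.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Stand-in for the real generator; any nn.Module works the same way.
net_g = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "netG_epoch_10.pth")  # illustrative name
    torch.save(net_g.state_dict(), path)

    # Rebuild the architecture first, then copy the saved weights into it.
    restored = nn.Linear(4, 2)
    state_dict = torch.load(path, map_location="cpu")
    restored.load_state_dict(state_dict)

same = all(torch.equal(p, q) for p, q in
           zip(net_g.state_dict().values(), restored.state_dict().values()))
print(same)
```

The config dump elsewhere in these issues shows a `TRAIN.NET_G` entry, which appears to be where a checkpoint path is supplied for this purpose.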

AssertionError

Error message:

Total filenames: 11788 001.Black_footed_Albatross/Black_Footed_Albatross_0046_18.jpg
Load from: ../data/birds/captions.pickle
1 10
Traceback (most recent call last):
  File "pretrain_DAMSM.py", line 247, in <module>
    assert dataset
AssertionError

Could you please help me solve this problem? @taoxugit

DAMSM model choice

Hello,
Thanks for making this code publicly available, it's of great use.
I want to know how to choose the best pretrained DAMSM when training the AttnGAN models. Can you please guide me a little on this?
Thank you.

How many epochs is performed when training DAMSM model?

There is a difference between the trained DAMSM model you provide and the "config".yml. The provided DAMSM model seems to have been trained for 200 epochs, but the .yml suggests training this model for 600 epochs. I am confused.

FileNotFoundError: File b'../data/coco\\CUB_200_2011/bounding_boxes.txt' does not exist

(base) H:\AttnGAN-master\code>python pretrain_DAMSM.py --cfg cfg/DAMSM/coco.yml --gpu 0
Using config:
{'B_VALIDATION': False,
 'CONFIG_NAME': 'DAMSM',
 'CUDA': True,
 'DATASET_NAME': 'coco',
 'DATA_DIR': '../data/coco',
 'GAN': {'B_ATTENTION': True,
         'B_DCGAN': False,
         'CONDITION_DIM': 100,
         'DF_DIM': 64,
         'GF_DIM': 128,
         'R_NUM': 2,
         'Z_DIM': 100},
 'GPU_ID': 0,
 'RNN_TYPE': 'LSTM',
 'TEXT': {'CAPTIONS_PER_IMAGE': 5, 'EMBEDDING_DIM': 256, 'WORDS_NUM': 15},
 'TRAIN': {'BATCH_SIZE': 48,
           'B_NET_D': True,
           'DISCRIMINATOR_LR': 0.0002,
           'ENCODER_LR': 0.002,
           'FLAG': True,
           'GENERATOR_LR': 0.0002,
           'MAX_EPOCH': 600,
           'NET_E': '',
           'NET_G': '',
           'RNN_GRAD_CLIP': 0.25,
           'SMOOTH': {'GAMMA1': 4.0,
                      'GAMMA2': 5.0,
                      'GAMMA3': 10.0,
                      'LAMBDA': 1.0},
           'SNAPSHOT_INTERVAL': 5},
 'TREE': {'BASE_SIZE': 299, 'BRANCH_NUM': 1},
 'WORKERS': 1}
C:\ProgramData\Anaconda3\lib\site-packages\torchvision\transforms\transforms.py:188: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
  "please use transforms.Resize instead.")
Traceback (most recent call last):
  File "pretrain_DAMSM.py", line 243, in <module>
    transform=image_transform)
  File "H:\AttnGAN-master\code\datasets.py", line 110, in __init__
    self.bbox = self.load_bbox()
  File "H:\AttnGAN-master\code\datasets.py", line 126, in load_bbox
    header=None).astype(int)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 440, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 787, in __init__
    self._make_engine(self.engine)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1014, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1708, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas\_libs\parsers.pyx", line 384, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas\_libs\parsers.pyx", line 695, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: File b'../data/coco\\CUB_200_2011/bounding_boxes.txt' does not exist

No such file or directory: '../data/birds/text/180.Wilson_Warbler/Wilson_Warbler_0007_175618.txt'

I tried to run the code with the pre-trained model provided, but it showed:
No such file or directory: '../data/birds/text/180.Wilson_Warbler/Wilson_Warbler_0007_175618.txt'

Where is this txt file? Do I need to generate it or download it?

AssertionError: dataset = TextDataset()

global ignore in /data excludes example files

The global git ignore file in the /data directory currently excludes example files such as /data/birds/example_filenames.txt and /data/birds/example_captions.txt.

Preserving these files by using git ignore more selectively should help new users test this network more easily.

How to generate image from CPU?

Does anyone know how to generate images from CPU? Thanks

Alan

The inception score on coco dataset

Hi, I want to reproduce the experimental results in the paper. However, the inception score of the pretrained model on the COCO dataset is 12.32. I compute the inception score using https://github.com/openai/improved-gan/tree/master/inception_score. I use the pretrained DAMSM model you provided and directly run 'python main.py --cfg cfg/coco_attn2.yml --gpu 0'. Are there any tricks for training? The training process is very slow.

Not being able to train on CPU even though I altered trainer.py

In another issue about training on CPU, I read this fix:

"trainer.py has lots of cuda() calls without asking about the CUDA flag set in config file, so evaluation/training on a CPU is not possible.

I fixed the issue by adding "if cfg.CUDA" loops before every cuda() call and it works just fine"

This fix still doesn't enable me to evaluate and train on CPU. I keep getting this error: AttributeError: module 'torch._C' has no attribute '_cuda_getDevice'
Does anybody know how I can fix this, please?

Permission to deploy

I notice in the readme you want references if people use your work for research. What about production? Are we given permission to use AttnGAN to deploy for production?

Images for new text

Where should I place my text if I want to generate new images?
I have successfully reproduced your images, but when I put my own images into example_captions.txt they don't come up.
So where should I put new text?

Telemetry key error

Hi,
I'm getting this error when running python main.py --cfg cfg/eval_coco.yml --gpu 1

Traceback (most recent call last):
  File "main.py", line 12, in <module>
    enable(os.environ["TELEMETRY"])
  File "/usr/lib/python2.7/UserDict.py", line 40, in __getitem__
    raise KeyError(key)
KeyError: 'TELEMETRY'
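
`os.environ["TELEMETRY"]` raises `KeyError` when the variable is unset. A minimal sketch of a tolerant lookup; the helper name `get_telemetry_key` is hypothetical, and skipping telemetry when the key is missing is an assumption about the desired behaviour:

```python
import os

def get_telemetry_key(env=os.environ):
    """Return the TELEMETRY value, or None when the variable is unset."""
    return env.get("TELEMETRY")

key = get_telemetry_key()
if key is None:
    print("TELEMETRY not set; skipping telemetry")
else:
    print("telemetry would be enabled here")
```

In `main.py` the call to `enable(...)` would then be made only when the returned key is not `None`.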

How do you do the metadata preprocessing?

I want to know about the specific operation of metadata preprocessing. Can you upload the corresponding code?

FileNotFoundError: [Errno 2] No such file or directory: '../DAMSMencoders/coco/image_encoder100.pth'

When I run this code, I get this error.

RuntimeError: sizes must be non-negative

Hello, I was running python3 main.py --cfg cfg/eval_bird.yml --gpu 0 to start sampling and got the above error. Can anyone explain why this error occurred?

NOTE:
I used the pretrained models for both the DAMSM and AttnGAN. I am also currently using the birds dataset.

Thank you.

Parallel multi-GPU training

Hello,
Thanks for making this code publicly available, it's of great use.
It's running smoothly except when I try to run it on multiple GPUs, then it raises errors like this:
RuntimeError: Expected hidden size (2, 24, 128), got (1L, 48L, 128L)
Can you please guide me a little on this?
Thanks.
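
The shapes in the error are what one would expect if `nn.DataParallel` splits every tensor argument along dimension 0 across two GPUs: an input batch of 48 becomes 24 per device, while an RNN hidden state shaped (layers × directions, batch, hidden) = (2, 48, 128) is cut along its first axis into (1, 48, 128). A small sketch of that arithmetic (two GPUs, the caption shape, and the helper name are assumptions):

```python
def split_dim0(shape, num_devices):
    """Shape of each chunk when a tensor is split evenly along dim 0."""
    assert shape[0] % num_devices == 0, "sketch assumes an even split"
    return (shape[0] // num_devices,) + tuple(shape[1:])

num_gpus = 2                      # assumed from the shapes in the error
captions = (48, 15)               # (batch, words), illustrative
hidden = (2, 48, 128)             # (layers * directions, batch, hidden)

print(split_dim0(captions, num_gpus))  # (24, 15)
print(split_dim0(hidden, num_gpus))    # (1, 48, 128)
```

A common way around this kind of mismatch is to create the hidden state inside the module's forward pass, from the batch size each device actually receives.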

RNN parameters initialization

Hello, in 'model.py' there is the comment "# Do not need to initialize RNN parameters, which have been initialized", as shown below. Why is it done this way? Any help is appreciated.

def init_weights(self):
    initrange = 0.1
    self.encoder.weight.data.uniform_(-initrange, initrange)
    # Do not need to initialize RNN parameters, which have been initialized
    # http://pytorch.org/docs/master/_modules/torch/nn/modules/rnn.html#LSTM
    # self.decoder.weight.data.uniform_(-initrange, initrange)
    # self.decoder.bias.data.fill_(0)

Validation error when running python main.py --cfg cfg/eval_coco.yml --gpu 1

While copying the parameter named "encoder.weight", whose dimensions in the model are torch.Size([27552, 300]) and whose dimensions in the checkpoint are torch.Size([27297, 300]).

eval/eval.py and code/main.py produce different image sizes?

What is the difference between eval/eval.py and code/main.py code for the generation part?

When I run eval/eval.py the images are of size 64x64, whereas when I sample from the test folder using code/main.py with cfg.B_VALIDATION set to True, it generates 256x256 images. Both scripts use the same cfg/eval_****.yml.

Why does the eval.py script generate 64x64 images? I don't see any other hyperparameter in the eval.py code that decides the size of the images.

I added print(im.size) after line 110 in eval/eval.py https://github.com/taoxugit/AttnGAN/blob/master/eval/eval.py#L110

I added print(im.size) after line 423 in code/trainer.py
https://github.com/taoxugit/AttnGAN/blob/master/code/trainer.py#L423

Question about the Code

In /code/model.py, c_code is passed to the forward method of the class NEXT_STAGE_G as a parameter. However, it is overwritten by the assignment c_code, att = self.att(h_code, word_embs), so the parameter is redundant. I'm not sure whether the c_code in the definition is necessary or whether it is just misused in c_code, att = self.att(h_code, word_embs).
Thanks~

def forward(self, h_code, c_code, word_embs, mask):
    """
        h_code1(query):  batch x idf x ih x iw (queryL=ihxiw)
        word_embs(context): batch x cdf x sourceL (sourceL=seq_len)
        c_code1: batch x idf x queryL
        att1: batch x sourceL x queryL
    """
    self.att.applyMask(mask)
    c_code, att = self.att(h_code, word_embs)
    h_c_code = torch.cat((h_code, c_code), 1)
    out_code = self.residual(h_c_code)

    # state size ngf/2 x 2in_size x 2in_size
    out_code = self.upsample(out_code)

    return out_code, att

How to control max epochs ? and best number of workers for 64 GB ram.

I changed the max epochs in coco.yml, but the max iterations still go up to 600.
I also changed the batch_size to fully use my GPU. What is the optimal number of workers with 64 GB of RAM to make the training faster?

Thanks!!