taoxugit / attngan
License: MIT License
Can someone explain what the function of cond_wrong_errD is in the discriminator loss function? It does not seem to be part of the discriminator loss given in the paper (Eq. 5), and it does not make sense to me. Why ignore the last entry in the batch?
cond_wrong_logits = netD.COND_DNET(real_features[:(batch_size - 1)], conditions[1:batch_size])
cond_wrong_errD = nn.BCELoss()(cond_wrong_logits, fake_labels[1:batch_size])
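One plausible reading of the slicing, offered as an assumption rather than something the paper or the authors confirm: shifting the conditions by one position pairs each real image with the caption of its neighbour, which gives the discriminator "real image, wrong text" examples. A shift by one leaves only batch_size - 1 such pairs, which would explain why one entry is dropped on each side. The sketch below uses plain Python lists and a hypothetical helper name to show the pairing only.

```python
def mismatched_pairs(image_ids, caption_ids):
    """Pair image i with caption i+1, mirroring features[:B-1] vs conditions[1:B]."""
    batch_size = len(image_ids)
    return list(zip(image_ids[:batch_size - 1], caption_ids[1:batch_size]))


# Hypothetical batch of four matched (image, caption) items.
pairs = mismatched_pairs(["img0", "img1", "img2", "img3"],
                         ["cap0", "cap1", "cap2", "cap3"])
print(pairs)  # [('img0', 'cap1'), ('img1', 'cap2'), ('img2', 'cap3')]
```

Under this reading, the last image and the first caption are left out simply because they have no shifted partner.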
RuntimeError: CUDA error: out of memory
I am running sampling for the birds model (python main.py --cfg cfg/eval_bird.yml --gpu 0).
Hey. First of all, I have to say that the work and the results are absolutely amazing. Thank you for sharing the code with the community.
Running your code with your data is quite easy, and I get the same results as you. But when I tried to test your architecture on external data, I ran into the issue of the preprocessed metadata required for each dataset you work with.
Could you please list the steps for feeding external data (i.e. a set of images with captions) to your model, including pretraining the DAMSM and the embedding vectors? I think this would be very useful for extending your research to other domains.
After invoking "python main.py --cfg cfg/bird_attn2.yml --gpu 2" for bird dataset, I get the following error:
Total filenames: 11788 001.Black_footed_Albatross/Black_Footed_Albatross_0046_18.jpg
Load filenames from: ../data/birds/train/filenames.pickle (8855)
Load filenames from: ../data/birds/test/filenames.pickle (2933)
Load from: ../data/birds/captions.pickle
Traceback (most recent call last):
File "main.py", line 129, in <module>
transform=image_transform)
File "/Users/user/Desktop/AttnGAN-master/code/datasets.py", line 118, in __init__
self.class_id = self.load_class_id(split_dir, len(self.filenames))
File "/Users/user/Desktop/AttnGAN-master/code/datasets.py", line 254, in load_class_id
class_id = pickle.load(f)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)
I couldn't find an answer through Google, so I am posting it here.
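A likely cause of the UnicodeDecodeError above, stated as an assumption: the pickle was written by Python 2 and is being read by Python 3, whose pickle module decodes Python 2 byte strings as ASCII by default. Passing encoding="latin1" to pickle.load is the usual workaround. The sketch below is self-contained: it uses a tiny hand-written Python 2 style payload rather than the real class_info file, and the helper name is hypothetical.

```python
import io
import pickle


def load_legacy_pickle(fileobj):
    """Load a pickle written by Python 2, decoding its 8-bit strings as Latin-1."""
    return pickle.load(fileobj, encoding="latin1")


# Hand-written protocol-2 pickle of the Python 2 byte string '\x80' (illustrative only).
py2_payload = b"\x80\x02U\x01\x80q\x00."

try:
    pickle.load(io.BytesIO(py2_payload))  # default encoding is ASCII
except UnicodeDecodeError as err:
    print("default load fails:", err)

value = load_legacy_pickle(io.BytesIO(py2_payload))
print(repr(value))
```

If this is the cause, the same keyword argument would go on the pickle.load call named in the traceback. Running the code under Python 2.7 would be the other way around the problem.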
I have no GPU. Could I use my CPU for training?
Thank you very much
Hi,
I want to run the evaluation code locally using the pre-trained model on the COCO dataset. I don't want to create a Docker container or any API-based service. I tried to run eval/eval.py but got an error related to the Azure setup. How can I use the existing code for "Generating images from the captions"?
The training is very slow and the GPU is not fully used. Is this expected with this code, or did I set something up wrongly?
I run python pretrain_DAMSM.py --cfg cfg/DAMSM/bird.yml --gpu 0 to pretrain the DAMSM.
It sometimes shows:
+-------------------------------+----------------------+----------------------+
| 7 TITAN Xp Off | 0000:8A:00.0 Off | N/A |
| 34% 58C P2 84W / 250W | 3952MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
Sometimes shows
+-------------------------------+----------------------+----------------------+
| 7 TITAN Xp Off | 0000:8A:00.0 Off | N/A |
| 34% 59C P2 171W / 250W | 3952MiB / 12189MiB | 74% Default |
+-------------------------------+----------------------+----------------------+
The GPU is not fully used and training is quite slow: 15 epochs take 1 hour, so 600 epochs would take about 40 hours. Is this expected, or am I doing something wrong?
I am using torch.save() to save a model file. However, every time I save it, the file changes. Why?
netG_1 = torch.load('netG.pth')
netG_2 = torch.load('netG.pth')
torch.save(netG_1, 'netG_1.pth')
torch.save(netG_2, 'netG_2.pth')
Using md5sum *.pth:
779f0fefca47d17a0644033f9b65e594 netG_1.pth
476f502ec2d1186c349cdeba14983d09 netG_2.pth
b0ceec8ac886a11b79f73fc04f51c6f9 netG.pth
Hi, in my experiment the inception score of the pretrained model on the COCO dataset is 16.16. Do you know why the IS is not stable? How can I get the IS to 25.89?
Traceback (most recent call last):
File "pretrain_DAMSM.py", line 275, in <module>
dataset.ixtoword, image_dir)
File "pretrain_DAMSM.py", line 61, in train
for step, data in enumerate(dataloader, 0):
File "/media/server009/seagate/liuhan/anaconda2/envs/attngan/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 310, in __iter__
return DataLoaderIter(self)
File "/media/server009/seagate/liuhan/anaconda2/envs/attngan/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 180, in __init__
self._put_indices()
File "/media/server009/seagate/liuhan/anaconda2/envs/attngan/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 219, in _put_indices
indices = next(self.sample_iter, None)
File "/media/server009/seagate/liuhan/anaconda2/envs/attngan/lib/python2.7/site-packages/torch/utils/data/sampler.py", line 119, in __iter__
for idx in self.sampler:
File "/media/server009/seagate/liuhan/anaconda2/envs/attngan/lib/python2.7/site-packages/torch/utils/data/sampler.py", line 50, in __iter__
return iter(torch.randperm(len(self.data_source)).long())
RuntimeError: invalid argument 1: must be strictly positive at /pytorch/torch/lib/TH/generic/THTensorMath.c:2184
@taoxugit
Could you please help me solve this problem?
Thank you very much!
I'm looking forward to your reply.
Hello,
What's the easiest way to calculate p(ground truth image | caption)?
I would highly appreciate any suggestions.
I am seeing:
C:\AttnGAN\code\miscc\utils.py:235: RuntimeWarning: invalid value encountered in true_divide
one_map = (one_map - minV) / (maxV - minV)
Is this OK?
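One likely cause of this warning, given as an assumption since the input that triggered it is not shown: when a map is constant, maxV equals minV and the min-max scaling computes 0 / 0, which NumPy reports as an invalid value and turns into NaN. A guard on the value range avoids that. The sketch below uses a flat Python list in place of the NumPy array, and the helper name and eps threshold are hypothetical.

```python
def normalize_map(values, eps=1e-8):
    """Min-max scale a flat list of floats to [0, 1]; constant maps become all zeros."""
    min_v, max_v = min(values), max(values)
    span = max_v - min_v
    if span < eps:
        # A constant map has no contrast to rescale, so return zeros instead of 0/0.
        return [0.0 for _ in values]
    return [(v - min_v) / span for v in values]


print(normalize_map([1.0, 3.0, 2.0]))  # [0.0, 1.0, 0.5]
print(normalize_map([2.0, 2.0, 2.0]))  # [0.0, 0.0, 0.0]
```

Whether the warning is harmless depends on what is done with the map afterwards; NaN values would normally show up as blank or corrupted regions in the saved attention image.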
In many places, the fake images are generated via:
fake_imgs, _, _, _ = netG(noise, sent_emb, words_embs, mask)
The netG is of class G_NET, defined in https://github.com/taoxugit/AttnGAN/blob/master/code/model.py#L397.
When I keep noise, sent_emb, words_embs and mask constant and rerun the generation, I get different fake images. Shouldn't the model produce a constant output for a constant input? Is there any stochastic behaviour in G_NET?
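One possible source of the variation, stated as an assumption about the model rather than a confirmed diagnosis: StackGAN-style generators usually pass the sentence embedding through a conditioning-augmentation step that draws fresh Gaussian noise on every forward pass (c = mu + sigma * eps), independently of the noise argument. Dropout or batch norm left in training mode would be another candidate. The sketch below shows only the sampling step, in plain Python with hypothetical names, and how fixing the random seed makes it repeatable.

```python
import math
import random


def reparametrize(mu, logvar, rng):
    """Return mu + sigma * eps, with sigma = exp(0.5 * logvar) and eps ~ N(0, 1) from rng."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


mu, logvar = [0.5, -1.0], [0.0, 0.2]
first = reparametrize(mu, logvar, random.Random(0))
second = reparametrize(mu, logvar, random.Random(0))  # same seed as first
third = reparametrize(mu, logvar, random.Random(1))   # different seed
print(first == second, first == third)
```

If this is what happens in G_NET, seeding the framework's random number generator before each call, or reusing the sampled conditioning code, should make the output repeatable for fixed inputs.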
Hi,
Professor, I am very excited about the results of your paper, and the idea has inspired me a lot. I think it is awesome work. But I still have a question about the paper.
First, we pretrain the DAMSM to get the text encoder. I think this step pulls the word features from the text encoder close to the sub-region features from the image encoder. But I am confused. At the beginning, before any training, the word features from the text encoder are random, so how can a word feature be matched to the right sub-region? For example, if the feature of the word 'bird' happens to be close to the feature of the sub-region 'tree' at the first step, then the word 'bird' would be matched to the sub-region 'tree' more and more strongly as pretraining proceeds. That does not seem correct. Yet the results are amazing.
I don't know if I understand it correctly. I would be grateful if you could answer the question. Thanks.
Hi, I ran the following command to validate the trained attention generator:
python main.py --cfg cfg/eval_bird.yml --gpu 1
but got 2928 images instead of the 2933 reported in the paper's "Statistics of datasets" table (the dataset length is 2933 as well).
I think the PyTorch DataLoader initialization parameter "drop_last" is the key. However, when I set it to False, the code raised an error:
RuntimeError: Expected hidden[0] size (2, 5, 128), got (2, 16, 128)
This indicates that the last batch failed because batch_size, rather than the actual input.size(0), is used to build the hidden state. As a result, the number of generated images varies with the batch_size setting.
How can I solve this issue?
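The numbers in this report are consistent with simple batch arithmetic, assuming a batch size of 16 (inferred from the error message, not stated in the post): 2933 samples give 183 full batches plus a final batch of 5. The sketch below reproduces that with a hypothetical helper in plain Python.

```python
def batch_sizes(num_samples, batch_size, drop_last):
    """List the size of every batch a sequential loader would yield."""
    full, remainder = divmod(num_samples, batch_size)
    sizes = [batch_size] * full
    if remainder and not drop_last:
        sizes.append(remainder)  # the short final batch
    return sizes


kept = batch_sizes(2933, 16, drop_last=True)
all_batches = batch_sizes(2933, 16, drop_last=False)
print(sum(kept), all_batches[-1])  # 2928 5
```

With drop_last=False, the short final batch of 5 only works if the recurrent hidden state is sized from the actual batch (the first dimension of the input) rather than from the configured batch size, which matches the "(2, 5, 128) vs (2, 16, 128)" mismatch above.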
Hi,
I want to reproduce the experimental results in the paper. However, the inception score of the pretrained model on the birds dataset is 4.17. I computed the inception score using https://github.com/hanzhanggit/StackGAN-inception-model. I have tried both PyTorch 0.3 and 0.4; the inception score is still lower than the 4.36 reported in the AttnGAN paper.
Hi,
Professor, your work is very exciting. I have read the paper and the code, but I have some questions. First, I don't understand the purpose of the "GLU" function. Second, in the DAMSM model, what is the meaning of class_ids (for example: imgs, captions, captions_lens, class_ids, keys = data)?
I hope you can help. Thank you in advance.
I tried to run the pre-train part, but it shows like this:
File "pretrain_DAMSM.py", line 274, in <module>
dataset.ixtoword, image_dir)
File "pretrain_DAMSM.py", line 59, in train
for step, data in enumerate(dataloader, 0):
File "D:\Work\Aconda\AnacodaPython3.7\envs\py37\lib\site-packages\torch\utils\data\dataloader.py", line 281, in __next__
return self._process_next_batch(batch)
File "D:\Work\Aconda\AnacodaPython3.7\envs\py37\lib\site-packages\torch\utils\data\dataloader.py", line 301, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
File "D:\Work\Aconda\AnacodaPython3.7\envs\py37\lib\site-packages\torch\utils\data\dataloader.py", line 55, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "D:\Work\Aconda\AnacodaPython3.7\envs\py37\lib\site-packages\torch\utils\data\dataloader.py", line 135, in default_collate
return [default_collate(samples) for samples in transposed]
File "D:\Work\Aconda\AnacodaPython3.7\envs\py37\lib\site-packages\torch\utils\data\dataloader.py", line 135, in <listcomp>
return [default_collate(samples) for samples in transposed]
File "D:\Work\Aconda\AnacodaPython3.7\envs\py37\lib\site-packages\torch\utils\data\dataloader.py", line 135, in default_collate
return [default_collate(samples) for samples in transposed]
File "D:\Work\Aconda\AnacodaPython3.7\envs\py37\lib\site-packages\torch\utils\data\dataloader.py", line 135, in <listcomp>
return [default_collate(samples) for samples in transposed]
File "D:\Work\Aconda\AnacodaPython3.7\envs\py37\lib\site-packages\torch\utils\data\dataloader.py", line 110, in default_collate
storage = batch[0].storage()._new_shared(numel)
File "D:\Work\Aconda\AnacodaPython3.7\envs\py37\lib\site-packages\torch\storage.py", line 114, in _new_shared
return cls._new_using_filename(size)
RuntimeError: Couldn't open shared file mapping: <torch_18044_1286185042>, error code: <1455> at C:\Anaconda2\conda-bld\pytorch_1519496000060\work\torch\lib\TH\THAllocator.c:157
Could anyone help me?
Hi
I can't find captions.pickle for COCO in your repo; I only found captions.pickle for birds. Could you please update the repo with the COCO captions.pickle, or, if it's too big, send it to me directly?
Thanks a lot
Hi, it currently generates 256x256 pixel images. Would it be possible to produce higher resolutions such as 1024x1024? If yes, how could I do it? Thanks.
In the "Dependencies" section, I think it should be "scikit-image" instead of "skikit-image". :)
I displayed the caption and image after loading during training and found that the caption loading method seems to drop some words from the original caption.
For example:
Image name: 148.Green_tailed_Towhee/Green_Tailed_Towhee_0030_797417
The loaded caption: this bird has a red and blue crown brown and secondaries and coverts a grey and white chest
The original caption: this bird has a red and blue crown, brown and yellow secondaries, grey and green coverts and a grey and white chest.
As you can see, important keywords such as yellow and green are missing. This could hurt the performance of the model. I think the get_caption method should load the original caption from the dataset.
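One explanation that fits this example, offered as an assumption rather than a confirmed reading of get_caption: the original caption has 22 word tokens and the loaded one has 18, in the same order, which is what you would see if captions longer than a fixed word limit were randomly subsampled down to that limit. The limit of 18 and the helper below are hypothetical and only illustrate that behaviour.

```python
import random
import re


def subsample_caption(caption, max_words, rng):
    """Keep at most max_words tokens, chosen at random, in their original order."""
    tokens = re.findall(r"\w+", caption.lower())
    if len(tokens) <= max_words:
        return tokens
    keep = sorted(rng.sample(range(len(tokens)), max_words))
    return [tokens[i] for i in keep]


original = ("this bird has a red and blue crown, brown and yellow secondaries, "
            "grey and green coverts and a grey and white chest.")
loaded = subsample_caption(original, max_words=18, rng=random.Random(0))
print(len(re.findall(r"\w+", original)), len(loaded))  # 22 18
```

If the loader works this way, which words are dropped changes from one load to the next, and raising the word limit in the config would be the way to keep long captions intact.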
Hi,
trainer.py has lots of cuda() calls that do not check the CUDA flag set in the config file, so evaluation/training on a CPU is not possible.
I fixed the issue by adding "if cfg.CUDA" checks before every cuda() call, and it works just fine: https://github.com/KCool/AttnGAN/blob/master/code/trainer.py
Best
KCool
I saved the models for netG and netD and want to continue training them. What should I do?
I found in the code
torch.save(netG.state_dict()
How can I load the saved dict back into netG?
Could anyone help me?
Thank you so much.
Error message:
Total filenames: 11788 001.Black_footed_Albatross/Black_Footed_Albatross_0046_18.jpg
Load from: ../data/birds/captions.pickle
1 10
Traceback (most recent call last):
File "pretrain_DAMSM.py", line 247, in <module>
assert dataset
AssertionError
Could you please help me solve this problem? @taoxugit
Hello,
Thanks for making this code publicly available, it's of great use.
I want to know how to choose the best pretrained DAMSM when training the AttnGAN models. Can you please guide me a little on this?
Thank you.
There is a difference between the trained DAMSM model you provide and the config .yml. The provided DAMSM model seems to have been trained for 200 epochs, but the .yml suggests training for 600 epochs. I am confused.
(base) H:\AttnGAN-master\code>python pretrain_DAMSM.py --cfg cfg/DAMSM/coco.yml --gpu 0
Using config:
{'B_VALIDATION': False,
'CONFIG_NAME': 'DAMSM',
'CUDA': True,
'DATASET_NAME': 'coco',
'DATA_DIR': '../data/coco',
'GAN': {'B_ATTENTION': True,
'B_DCGAN': False,
'CONDITION_DIM': 100,
'DF_DIM': 64,
'GF_DIM': 128,
'R_NUM': 2,
'Z_DIM': 100},
'GPU_ID': 0,
'RNN_TYPE': 'LSTM',
'TEXT': {'CAPTIONS_PER_IMAGE': 5, 'EMBEDDING_DIM': 256, 'WORDS_NUM': 15},
'TRAIN': {'BATCH_SIZE': 48,
'B_NET_D': True,
'DISCRIMINATOR_LR': 0.0002,
'ENCODER_LR': 0.002,
'FLAG': True,
'GENERATOR_LR': 0.0002,
'MAX_EPOCH': 600,
'NET_E': '',
'NET_G': '',
'RNN_GRAD_CLIP': 0.25,
'SMOOTH': {'GAMMA1': 4.0,
'GAMMA2': 5.0,
'GAMMA3': 10.0,
'LAMBDA': 1.0},
'SNAPSHOT_INTERVAL': 5},
'TREE': {'BASE_SIZE': 299, 'BRANCH_NUM': 1},
'WORKERS': 1}
C:\ProgramData\Anaconda3\lib\site-packages\torchvision\transforms\transforms.py:188: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
"please use transforms.Resize instead.")
Traceback (most recent call last):
File "pretrain_DAMSM.py", line 243, in <module>
transform=image_transform)
File "H:\AttnGAN-master\code\datasets.py", line 110, in __init__
self.bbox = self.load_bbox()
File "H:\AttnGAN-master\code\datasets.py", line 126, in load_bbox
header=None).astype(int)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 678, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 440, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 787, in __init__
self._make_engine(self.engine)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1014, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1708, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas\_libs\parsers.pyx", line 384, in pandas._libs.parsers.TextReader.__cinit__
File "pandas\_libs\parsers.pyx", line 695, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: File b'../data/coco\CUB_200_2011/bounding_boxes.txt' does not exist
I tried to run the code with the provided pre-trained model, but it showed
No such file or directory: '../data/birds/text/180.Wilson_Warbler/Wilson_Warbler_0007_175618.txt'
Where is this txt file? Do I need to generate it or download it?
The global .gitignore file in the /data directory currently excludes example files such as /data/birds/example_filenames.txt and /data/birds/example_captions.txt.
Preserving these files by using .gitignore more selectively should help new users test this network more easily.
Hi
Does anyone know how to generate images on a CPU? Thanks
Alan
Hi, I want to reproduce the experimental results in the paper. However, the inception score of the pretrained model on the COCO dataset is 12.32. I computed the inception score using https://github.com/openai/improved-gan/tree/master/inception_score. I used the pretrained DAMSM model you provided and directly ran 'python main.py --cfg cfg/coco_attn2.yml --gpu 0'. Are there any tricks for training? The training process is very slow.
In another issue about training on CPU, I read this fix:
"trainer.py has lots of cuda() calls that do not check the CUDA flag set in the config file, so evaluation/training on a CPU is not possible.
I fixed the issue by adding "if cfg.CUDA" checks before every cuda() call, and it works just fine"
This fix still doesn't let me evaluate and train on a CPU. I keep getting this error: AttributeError: module 'torch._C' has no attribute '_cuda_getDevice'
Does anybody know how I can fix this please?
I notice in the README that you ask to be cited if people use your work for research. What about production? Do we have permission to deploy AttnGAN in production?
Where should I put my text if I want to generate new images?
I have successfully reproduced your images, but when I put my own text into example_captions.txt, nothing new comes up.
So where should I put new text?
Hi,
I'm getting the following error when running python main.py --cfg cfg/eval_coco.yml --gpu 1:
Traceback (most recent call last):
File "main.py", line 12, in <module>
enable(os.environ["TELEMETRY"])
File "/usr/lib/python2.7/UserDict.py", line 40, in __getitem__
raise KeyError(key)
KeyError: 'TELEMETRY'
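The KeyError comes from indexing os.environ with a variable that is not set in the shell. One workaround is to export TELEMETRY before running the script; another is to read the variable with a default and skip the telemetry setup when it is missing. The sketch below shows the second option with a hypothetical helper. It does not call the enable function from the traceback, whose behaviour is not shown in the post.

```python
import os


def telemetry_key(environ=os.environ):
    """Return the TELEMETRY value, or None when the variable is unset or empty."""
    return environ.get("TELEMETRY") or None


key = telemetry_key({})  # an empty mapping stands in for a shell without TELEMETRY
if key is None:
    print("TELEMETRY not set; skipping telemetry setup")
```

In the script itself, the call on line 12 would then be guarded by a check on the returned key.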
I want to know the specific steps of the metadata preprocessing. Could you upload the corresponding code?
When I run this code, I meet this error.
Hello,
Thanks for making this code publicly available, it's of great use.
It's running smoothly except when I try to run it on multiple GPUs, then it raises errors like this:
RuntimeError: Expected hidden size (2, 24, 128), got (1L, 48L, 128L)
Can you please guide me a little on this?
Thanks.
Hello, in model.py there is the comment "# Do not need to initialize RNN parameters, which have been initialized", as shown below. Why is that? Any help is appreciated.
def init_weights(self):
    initrange = 0.1
    self.encoder.weight.data.uniform_(-initrange, initrange)
    # Do not need to initialize RNN parameters, which have been initialized
    # http://pytorch.org/docs/master/_modules/torch/nn/modules/rnn.html#LSTM
    # self.decoder.weight.data.uniform_(-initrange, initrange)
    # self.decoder.bias.data.fill_(0)
While copying the parameter named "encoder.weight", whose dimensions in the model are torch.Size([27552, 300]) and whose dimensions in the checkpoint are torch.Size([27297, 300]).
What is the difference between the eval/eval.py and code/main.py code for the generation part?
When I run eval/eval.py, the images are of size 64x64, whereas when I sample from the test folder using code/main.py with cfg.B_VALIDATION set to True, it generates 256x256 images. Both scripts use the same cfg/eval_****.yml.
Why does the eval.py script generate 64x64 images? I don't see any other hyperparameter in the eval.py code that decides the size of the images.
I added print(im.size) after line 110 in eval/eval.py:
https://github.com/taoxugit/AttnGAN/blob/master/eval/eval.py#L110
I added print(im.size) after line 423 in code/trainer.py:
https://github.com/taoxugit/AttnGAN/blob/master/code/trainer.py#L423
In /code/model.py, c_code is passed to the forward method of class NEXT_STAGE_G as a parameter. However, c_code is overwritten by c_code, att = self.att(h_code, word_embs), so the parameter is redundant. I'm not sure whether the c_code in the definition is necessary or is just misused in c_code, att = self.att(h_code, word_embs).
Thanks~
def forward(self, h_code, c_code, word_embs, mask):
    """
        h_code1(query): batch x idf x ih x iw (queryL=ihxiw)
        word_embs(context): batch x cdf x sourceL (sourceL=seq_len)
        c_code1: batch x idf x queryL
        att1: batch x sourceL x queryL
    """
    self.att.applyMask(mask)
    c_code, att = self.att(h_code, word_embs)
    h_c_code = torch.cat((h_code, c_code), 1)
    out_code = self.residual(h_c_code)
    # state size ngf/2 x 2in_size x 2in_size
    out_code = self.upsample(out_code)
    return out_code, att
I changed the max epochs in coco.yml, but the maximum number of iterations still goes up to 600.
I also changed the batch_size to make full use of my GPU. What is the optimal number of workers for 64 GB of RAM to make training faster?
Thanks!!