hanzhanggit / stackgan-pytorch

License: MIT License

Python 100.00%

stackgan-pytorch's Introduction

StackGAN-pytorch

PyTorch implementation for reproducing COCO results in the paper StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks by Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas. The network structure is slightly different from that of the TensorFlow implementation.

Dependencies

python 2.7

Pytorch

In addition, please add the project folder to PYTHONPATH and pip install the following packages:

tensorboard
python-dateutil
easydict
pandas
torchfile
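
A quick way to confirm that the packages listed above are importable is sketched below. The import names are assumptions about how each pip package is exposed (for example, python-dateutil is imported as dateutil), the helper name missing_modules is illustrative and not part of this repository, and the snippet is written for Python 3 rather than the Python 2.7 named above.

```python
import importlib.util

# Import names assumed for the pip packages listed above
# (python-dateutil installs the module "dateutil").
REQUIRED = ["tensorboard", "dateutil", "easydict", "pandas", "torchfile"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("missing:", missing if missing else "none")
```

An empty result only shows that the modules can be located; it does not check that their versions match what the code expects.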

Data

Download our preprocessed char-CNN-RNN text embeddings for training coco and evaluating coco, save them to data/coco.

[Optional] Follow the instructions in reedscot/icml2016 to download the pretrained char-CNN-RNN text encoders and extract text embeddings.

Download the coco image data. Extract them to data/coco/.

Training

The steps to train a StackGAN model on the COCO dataset using our preprocessed embeddings.
- Step 1: train the Stage-I GAN (e.g., for 120 epochs): python main.py --cfg cfg/coco_s1.yml --gpu 0
- Step 2: train the Stage-II GAN (e.g., for another 120 epochs): python main.py --cfg cfg/coco_s2.yml --gpu 1
The *.yml files are example configuration files for training/evaluating our models.
If you want to try your own datasets, here are some good tips about how to train GANs. Also, we encourage you to try different hyper-parameters and architectures, especially for more complex datasets.
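
The two training steps can be scripted so that Stage-II only starts after Stage-I has finished. The sketch below only builds and prints the commands; the config paths and GPU ids are the ones from the steps above, while the helper build_command is a hypothetical name.

```python
import sys

# Config paths and GPU ids are taken from the two training steps above.
STAGES = [
    ("Stage-I", "cfg/coco_s1.yml", 0),
    ("Stage-II", "cfg/coco_s2.yml", 1),
]


def build_command(cfg_path, gpu_id):
    """Build the argument list for one training run."""
    return [sys.executable, "main.py", "--cfg", cfg_path, "--gpu", str(gpu_id)]


if __name__ == "__main__":
    for name, cfg_path, gpu_id in STAGES:
        print(name, "->", " ".join(build_command(cfg_path, gpu_id)))
        # To actually launch a stage: subprocess.run(build_command(...), check=True)
```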

Pretrained Model

StackGAN for coco. Download and save it to models/coco.
Our current implementation has a higher inception score (10.62±0.19) than reported in the StackGAN paper.

Evaluating

Run python main.py --cfg cfg/coco_eval.yml --gpu 2 to generate samples from captions in COCO validation set.

Examples for COCO:

Save your favorite pictures generated by our models, since the randomness from noise z and conditioning augmentation makes them creative enough to generate objects with different poses and viewpoints from the same description 😃

Citing StackGAN

If you find StackGAN useful in your research, please consider citing:

@inproceedings{han2017stackgan,
Author = {Han Zhang and Tao Xu and Hongsheng Li and Shaoting Zhang and Xiaogang Wang and Xiaolei Huang and Dimitris Metaxas},
Title = {StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks},
Year = {2017},
booktitle = {{ICCV}},
}

Our follow-up work

References

Generative Adversarial Text-to-Image Synthesis Paper Code
Learning Deep Representations of Fine-grained Visual Descriptions Paper Code

stackgan-pytorch's Issues

Truncated caption feature file?

Hi,

Thank you for sharing the code. We are trying to run evaluation of pre-trained StackGAN model but encountered a problem with loading caption features for validation images.

Specifically, when we try to load the feature from (val_captions.t7) using torchfile, the following error occurs:

*** error: unpack requires a string argument of length 4

We suspect that the caption feature file may be truncated, since it is only ~13.5 MB. Could you please check whether the file is valid?
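
The error text matches what Python 2's struct module reports when it is asked to unpack a 4-byte integer from fewer than 4 bytes, which is what a binary reader would hit at the end of a truncated file. The sketch below reproduces that situation in isolation; read_int32 is a hypothetical helper, not torchfile's own function.

```python
import io
import struct


def read_int32(stream):
    """Read one little-endian 32-bit integer, failing loudly on a short read."""
    raw = stream.read(4)
    if len(raw) != 4:
        raise EOFError("expected 4 bytes, got %d: file may be truncated" % len(raw))
    return struct.unpack("<i", raw)[0]


complete = io.BytesIO(struct.pack("<i", 7))
truncated = io.BytesIO(b"\x07\x00")  # only 2 of the 4 bytes survive

print(read_int32(complete))
try:
    read_int32(truncated)
except EOFError as err:
    print("short read:", err)
```

Comparing the size of the downloaded file with the size reported by the hosting service is a simple first check before suspecting the reader.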

How to create new image using my own sentence as input?

I have a trained model, but I do not know how to generate an image using my own sentence as input. Please help.

access denied

I wanted to download the preprocessed char-CNN-RNN text embeddings for coco and the pretrained StackGAN model, but my access got denied in the google drive. I tried to leave a message to request access, but I didn't get any response. Is there any way to access the embeddings and pretrained model or can someone share these files? I will be very grateful if anyone can help me.

Inconsistent with the paper (Stage-II inputs)

In models.py, it looks like the Stage-II generator takes a text embedding and noise as input. In the paper, it also takes the Stage-I generator output.
In the trainer, there appears to be no difference between Stage-I and Stage-II training, as both take only the text embedding as input.
From the paper: "The Stage-II GAN takes Stage-I results and text descriptions as inputs".
Let me know if I have misinterpreted anything.
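
The module printout in the "GPU out of memory during evaluation" issue further down lists STAGE1_G as a sub-module of STAGE2_G, which suggests that the Stage-II generator runs Stage-I internally instead of receiving its image as a separate argument. The sketch below illustrates that kind of nesting with stand-in classes; the class names and dictionary outputs are hypothetical, and only the 64 and 256 image sizes come from the config dumps on this page.

```python
class Stage1Stub:
    """Stand-in for the Stage-I generator: (text, noise) -> low-res image."""

    def __call__(self, text_embedding, noise):
        return {"size": 64, "from": (text_embedding, noise)}


class Stage2Sketch:
    """Stage-II wrapper that owns Stage-I, mirroring the nesting in the printout."""

    def __init__(self, stage1):
        self.stage1 = stage1  # nested generator, run inside forward()

    def forward(self, text_embedding, noise):
        low_res = self.stage1(text_embedding, noise)  # Stage-I result
        # Stage-II then refines low_res together with the text condition.
        return {"size": 256, "refined_from": low_res, "text": text_embedding}


g2 = Stage2Sketch(Stage1Stub())
out = g2.forward("embedding", "z")
print(out["refined_from"]["size"], "->", out["size"])
```

With this layout the caller passes the same two inputs to both stages, even though Stage-II still consumes the Stage-I result.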

Generator loss

I am new to GANs and I am wondering about the generator loss used in this project: why not use a pixel-level BCE loss between real and fake images?
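
In the usual GAN setup the generator is not compared to real images pixel by pixel. Its samples are scored by the discriminator, and binary cross-entropy is applied to those scores against the "real" label. The sketch below illustrates that idea in plain Python. The KL weight of 2.0 is the COEFF.KL value shown in the config dumps of later issues; the function names and the averaging convention are assumptions, not this repository's code.

```python
import math


def bce(prob, target):
    """Binary cross-entropy for one probability and a 0/1 target."""
    eps = 1e-12  # avoid log(0)
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))


def kl_to_standard_normal(mu, logvar):
    """Mean over dimensions of KL(N(mu, exp(logvar)) || N(0, 1))."""
    terms = [-0.5 * (1 + lv - m * m - math.exp(lv)) for m, lv in zip(mu, logvar)]
    return sum(terms) / len(terms)


def generator_loss(d_scores_on_fakes, mu, logvar, kl_coeff=2.0):
    """Adversarial term (fakes labelled as real) plus weighted KL term."""
    adv = sum(bce(p, 1.0) for p in d_scores_on_fakes) / len(d_scores_on_fakes)
    return adv + kl_coeff * kl_to_standard_normal(mu, logvar)


print(generator_loss([0.9, 0.8], mu=[0.0, 0.0], logvar=[0.0, 0.0]))
```

A pixel-level loss would need a specific target image for each caption and noise sample, whereas the discriminator score only asks whether the sample looks plausible for the caption.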

Error: Parameter to MergeFrom() must be instance of same class:

File "", line 1, in
runfile('D:/Projects/GAN-Text2Image/code/main.py', args='--cfg cfg/coco_s1.yml --gpu 0', wdir='D:/Projects/GAN-Text2Image/code')

File "C:\Users\Anaconda3\envs\tf_gpu\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 786, in runfile
execfile(filename, namespace)

File "C:\Users\Anaconda3\envs\tf_gpu\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "D:/Projects/GAN-Text2Image/code/main.py", line 73, in
algo.train(dataloader, cfg.STAGE)

File "D:\Projects\GAN-Text2Image\code\trainer.py", line 205, in train
self.summary_writer.add_summary(summary_D, count).eval()

File "C:\Users\Anaconda3\envs\tf_gpu\lib\site-packages\tensorboard\writer.py", line 94, in add_summary
event = event_pb2.Event(summary=summary)

File "C:\Users\Anaconda3\envs\tf_gpu\lib\site-packages\google\protobuf\internal\python_message.py", line 520, in __init__
_ReraiseTypeErrorWithFieldName(message_descriptor.name, field_name)

File "C:\Users\Anaconda3\envs\tf_gpu\lib\site-packages\google\protobuf\internal\python_message.py", line 448, in _ReraiseTypeErrorWithFieldName
six.reraise(type(exc), exc, sys.exc_info()[2])

File "C:\Users\Anaconda3\envs\tf_gpu\lib\site-packages\six.py", line 692, in reraise
raise value.with_traceback(tb)

File "C:\Users\Anaconda3\envs\tf_gpu\lib\site-packages\google\protobuf\internal\python_message.py", line 518, in __init__
copy.MergeFrom(new_val,)

File "C:\Users\Anaconda3\envs\tf_gpu\lib\site-packages\google\protobuf\internal\python_message.py", line 1230, in MergeFrom
'expected %s got %s.' % (cls.__name__, msg.__class__.__name__))

TypeError: Parameter to MergeFrom() must be instance of same class: expected Summary got Tensor. for field Event.summary

I don't have the oxford_flower model of stageI_G for pytorch

same as above

Use 'mu' as "conditional augmentation" and pass it to Discriminator?

In the TensorFlow version, you use a different network to compute the "conditional augmentation", but in this PyTorch version, you use the mean value computed by the Generator as the "conditional augmentation" and pass it to the Discriminator: https://github.com/hanzhanggit/StackGAN-Pytorch/blob/master/code/trainer.py#L189. In the Discriminator, you concatenate 'mu' to the encoded images directly, without computing another "conditional augmentation".

However, in the Generator, you compute c_code and concatenate it to the noise?

Did you do this on purpose? Does this improve the quality of images generated or something else?
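
For readers following this question: the CA_NET printed in later issues maps the 1024-d text embedding to 256 values while CONDITION_DIM is 128, which is consistent with a 128-d mean plus a 128-d log-variance. The sketch below shows the reparameterization step that would turn those two halves into a sampled c_code. The split order and the function names are assumptions, not code from this repository.

```python
import math
import random


def split_ca_output(fc_out, cond_dim):
    """Split the CA layer output into (mu, logvar); the order is assumed."""
    return fc_out[:cond_dim], fc_out[cond_dim:2 * cond_dim]


def sample_c_code(mu, logvar, eps=None):
    """Reparameterization: c = mu + eps * exp(0.5 * logvar)."""
    if eps is None:
        eps = [random.gauss(0.0, 1.0) for _ in mu]
    return [m + e * math.exp(0.5 * lv) for m, lv, e in zip(mu, logvar, eps)]


fc_out = [0.1, -0.2, 0.0, 0.0]  # toy output with cond_dim = 2
mu, logvar = split_ca_output(fc_out, 2)
print(sample_c_code(mu, logvar))                  # random sample around mu
print(sample_c_code(mu, logvar, eps=[0.0, 0.0]))  # equals mu when eps is zero
```

Under this reading, mu is the deterministic part of the condition and c_code is a noisy sample drawn around it.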

How to preprocess char-CNN-RNN?

@hanzhanggit Hello!
Thank you for your contributions on this code.
I'm trying to train this on my own dataset.
I followed reedscot/icml2016, and trained a char-CNN-RNN text encoder.
But it's a .t7 file, not .pickle as your preprocessed char-CNN-RNN text embeddings.
So I'm wondering how to preprocess the char-CNN-RNN text embeddings to a .pickle file?
Thanks again for your contributions.
I'm looking forward to your reply!
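
One possible conversion path is sketched below. It assumes the embeddings have already been read into memory (for instance with torchfile.load, which the tracebacks in later issues show this repository calling) and arranged as one list of caption vectors per image, matching the (82783, 5, 1024) shape printed for the COCO training embeddings in a later issue. The output file name and the exact layout expected by the data loader are assumptions to verify against the dataset code.

```python
import os
import pickle
import tempfile


def save_embeddings(embeddings, out_path):
    """Write per-image caption embeddings to a pickle file."""
    with open(out_path, "wb") as f:
        # Protocol 2 keeps the file readable from Python 2.7 as well.
        pickle.dump(embeddings, f, protocol=2)


def load_embeddings(path):
    """Read the embeddings back, as a data loader would."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Toy data: 2 images, 5 captions each, 4-d vectors instead of 1024-d.
toy = [[[0.0] * 4 for _ in range(5)] for _ in range(2)]
out_path = os.path.join(tempfile.mkdtemp(), "embeddings.pickle")
save_embeddings(toy, out_path)
restored = load_embeddings(out_path)
print(len(restored), len(restored[0]), len(restored[0][0]))
```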

How to match the generated images with the caption?

I have finished training both stages on CUB and can generate samples at test time.
For testing, I use the .pickle file extracted from the char-CNN-RNN text embeddings of CUB as the embedding, but I cannot match the captions to the generated images. I tried using the corresponding descriptions from self.captions (the commented-out line # self.captions = self.load_all_captions()), but they do not match.
How do I get the correct caption for each generated image?

Much appreciation!

inception score on coco pretrained model is 8.50526228697

@hanzhanggit The readme says

Our current implementation has a higher inception score(10.62±0.19) than reported in the StackGAN paper

With the provided pre-trained model, I get 8.50526228697. Is the 10+ score for StackGAN v2?
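
For reference when comparing numbers, the inception score is the exponential of the average KL divergence between each image's predicted class distribution and the marginal distribution over all images. The sketch below computes it from toy probability vectors in plain Python; it is not the evaluation script used for the figures quoted above, and real scores also depend on the classifier, the number of samples, and how the samples are split.

```python
import math


def inception_score(probs):
    """exp of the mean KL(p(y|x) || p(y)) over a list of class distributions."""
    n = len(probs)
    num_classes = len(probs[0])
    marginal = [sum(p[k] for p in probs) / n for k in range(num_classes)]
    total_kl = 0.0
    for p in probs:
        total_kl += sum(
            pk * math.log(pk / mk) for pk, mk in zip(p, marginal) if pk > 0
        )
    return math.exp(total_kl / n)


uniform = [[0.5, 0.5], [0.5, 0.5]]    # uninformative predictions
confident = [[1.0, 0.0], [0.0, 1.0]]  # sharp and diverse predictions
print(inception_score(uniform), inception_score(confident))
```

The score is bounded below by 1 and above by the number of classes, so differences in the evaluation setup can shift it noticeably.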

Images are chaotic

I ran the pre-trained model and the images are pretty incoherent. Did I load something wrong or is this the current state-of-the-art?

OSError: [Errno 22] Invalid argument

@hanzhanggit
@taoxugit
please help me, what is the main problem behind this?

(base) H:\StackGAN\StackGAN-Pytorch-master\code>python main.py --cfg cfg/coco_eval.yml --gpu 0
Using config:
{'CONFIG_NAME': 'stageI',
'CUDA': True,
'DATASET_NAME': 'coco',
'DATA_DIR': '../data/coco',
'EMBEDDING_TYPE': 'cnn-rnn',
'GAN': {'CONDITION_DIM': 128, 'DF_DIM': 96, 'GF_DIM': 192, 'R_NUM': 4},
'GPU_ID': '0',
'IMSIZE': 64,
'NET_D': '',
'NET_G': '',
'STAGE': 1,
'STAGE1_G': '',
'TEXT': {'DIMENSION': 1024},
'TRAIN': {'BATCH_SIZE': 128,
'COEFF': {'KL': 2.0},
'DISCRIMINATOR_LR': 0.0002,
'FLAG': True,
'GENERATOR_LR': 0.0002,
'LR_DECAY_EPOCH': 20,
'MAX_EPOCH': 120,
'PRETRAINED_EPOCH': 600,
'PRETRAINED_MODEL': '',
'SNAPSHOT_INTERVAL': 10},
'VIS_COUNT': 64,
'WORKERS': 4,
'Z_DIM': 100}
Load filenames from: ../data/coco\train\filenames.pickle (82783)
embeddings: (82783, 5, 1024)
This section is run successfully...
STAGE1_G(
(ca_net): CA_NET(
(fc): Linear(in_features=1024, out_features=256, bias=True)
(relu): ReLU()
)
(fc): Sequential(
(0): Linear(in_features=228, out_features=24576, bias=False)
(1): BatchNorm1d(24576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace)
)
(upsample1): Sequential(
(0): Upsample(scale_factor=2, mode=nearest)
(1): Conv2d(1536, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(2): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace)
)
(upsample2): Sequential(
(0): Upsample(scale_factor=2, mode=nearest)
(1): Conv2d(768, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(2): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace)
)
(upsample3): Sequential(
(0): Upsample(scale_factor=2, mode=nearest)
(1): Conv2d(384, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(2): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace)
)
(upsample4): Sequential(
(0): Upsample(scale_factor=2, mode=nearest)
(1): Conv2d(192, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(2): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace)
)
(img): Sequential(
(0): Conv2d(96, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): Tanh()
)
)
STAGE1_D(
(encode_img): Sequential(
(0): Conv2d(3, 96, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(1): LeakyReLU(negative_slope=0.2, inplace)
(2): Conv2d(96, 192, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(3): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(4): LeakyReLU(negative_slope=0.2, inplace)
(5): Conv2d(192, 384, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(6): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(7): LeakyReLU(negative_slope=0.2, inplace)
(8): Conv2d(384, 768, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(9): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(10): LeakyReLU(negative_slope=0.2, inplace)
)
(get_cond_logits): D_GET_LOGITS(
(outlogits): Sequential(
(0): Conv2d(896, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): LeakyReLU(negative_slope=0.2, inplace)
(3): Conv2d(768, 1, kernel_size=(4, 4), stride=(4, 4))
(4): Sigmoid()
)
)
)
Preparing training data...
Traceback (most recent call last):
File "main.py", line 77, in <module>
algo.train(dataloader, cfg.STAGE)
File "H:\StackGAN\StackGAN-Pytorch-master\code\trainer.py", line 158, in train
for i, data in enumerate(data_loader, 0):
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 501, in __iter__
return _DataLoaderIter(self)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 289, in __init__
w.start()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
reduction.dump(process_obj, to_child)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
OSError: [Errno 22] Invalid argument
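
The traceback ends inside multiprocessing's spawn start method, which on Windows pickles the worker's arguments (including the dataset) and sends them to the child process. A commonly tried workaround is to load data in the main process by using zero workers instead of the 'WORKERS': 4 shown in the config above. The sketch below shows that choice as a small helper; pick_num_workers is a hypothetical name, and whether this resolves this particular error has not been confirmed here.

```python
import sys


def pick_num_workers(requested, platform=sys.platform):
    """Fall back to in-process loading on Windows, keep the request elsewhere."""
    if platform.startswith("win"):
        return 0
    return requested


print(pick_num_workers(4, platform="win32"))  # 0
print(pick_num_workers(4, platform="linux"))  # 4
```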

Is training the model on the Bird and Flower datasets straightforward?

Hi Han,

I wanted to train the model on the Bird and Flower datasets (like in the TensorFlow version). Would it be as straightforward as downloading the datasets and calling main? I'm guessing you haven't tried this yet; are there any potential pitfalls you see?

ImportError: cannot import name 'FileWriter' from 'tensorboard'

I am getting the following error trace while trying to run the Stage I of training using coco dataset

It looks like tensorboard 1.10.0 (which is what I have installed in my virtual env) does not have a class called FileWriter. My PyCharm IDE is also complaining about the same thing.

Traceback (most recent call last):
File "C:/PyCharmProjects/StackGAN-Pytorch/code/main.py", line 19, in
from trainer import GANTrainer
File "C:\PyCharmProjects\StackGAN-Pytorch\code\trainer.py", line 24, in
from tensorboard import FileWriter
ImportError: cannot import name 'FileWriter' from 'tensorboard' (C:\PythonVEnvs\StackGANPyTorch\lib\site-packages\tensorboard\__init__.py)

Process finished with exit code 1
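
The import in trainer.py expects a top-level FileWriter in a module named tensorboard, which the tensorboard package that ships alongside TensorFlow does not provide. As far as I know, an older PyTorch logging package was installed under that module name and is now distributed as tensorboardX; treat that as an assumption to verify. The sketch below probes both module names without failing when neither is installed; find_file_writer is a hypothetical helper.

```python
import importlib


def find_file_writer(candidates=("tensorboard", "tensorboardX")):
    """Return (module_name, FileWriter) from the first module that has it."""
    for name in candidates:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # module not installed, try the next candidate
        writer = getattr(module, "FileWriter", None)
        if writer is not None:
            return name, writer
    return None, None


if __name__ == "__main__":
    source, file_writer = find_file_writer()
    print("FileWriter found in:", source)
```

If the probe reports None, neither installed package exposes the class and a different logging package or version is needed.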

add the project folder to PYTHONPATH

How do I perform this action?

Currently I have typed the following terminal commands:
python
import sys
sys.path.append("path/to/Modules")
print (sys.path)
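
Appending to sys.path in an interactive session only affects that session, so the change is gone when main.py is started as a new process. A minimal sketch of doing the same thing from inside a script is shown below; the folder name is a placeholder and add_to_path is a hypothetical helper.

```python
import os
import sys


def add_to_path(project_dir):
    """Put project_dir at the front of sys.path unless it is already there."""
    project_dir = os.path.abspath(project_dir)
    if project_dir not in sys.path:
        sys.path.insert(0, project_dir)
    return project_dir


# Placeholder location; replace with the folder that contains this project.
added = add_to_path("path/to/StackGAN-Pytorch")
print(added in sys.path)
```

The alternative is to set the PYTHONPATH environment variable before launching Python, for example export PYTHONPATH=/path/to/StackGAN-Pytorch:$PYTHONPATH in a Linux or macOS shell, or set PYTHONPATH=C:\path\to\StackGAN-Pytorch in a Windows command prompt.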

cuda runtime error

THCudaCheck FAIL file=..\aten\src\THC\THCGeneral.cpp line=87 error=30 : unknown error
Traceback (most recent call last):
  File "main.py", line 77, in <module>
    algo.sample(datapath, cfg.STAGE)
  File "C:\Users\hunte\OneDrive\Documents\Projects\EAD Project\StackGAN-Pytorch-master\code\trainer.py", line 238, in sample
    netG, _ = self.load_network_stageII()
  File "C:\Users\hunte\OneDrive\Documents\Projects\EAD Project\StackGAN-Pytorch-master\code\trainer.py", line 110, in load_network_stageII
    netG.cuda()
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 260, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 187, in _apply
    module._apply(fn)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 187, in _apply
    module._apply(fn)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 187, in _apply
    module._apply(fn)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 193, in _apply
    param.data = fn(param.data)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 260, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\cuda\__init__.py", line 162, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (30) : unknown error at ..\aten\src\THC\THCGeneral.cpp:87

Why is the text condition input of netD `mu`?

Hi,
In trainer.py, line 180, the text condition passed to netD is 'mu'.
I wonder why it is not 'c_code', which is the text condition for netG.

Also, 'mu' only carries half of the information from the text embedding, so I'm a little confused about it.

Waiting for your reply.

Best regards!

How to create images for new text?

Hi, I am using the pretrained COCO model shared in this repo. How can I run it on other sentences as input? Since the model is not run in training mode, it will use val_captions.t7. However, I am not clear on how to convert a text file to a .t7 file. Could you please elaborate on this?
Thanks

GPU out of memory during evaluation.

Hi Han,

I'm getting a "cuda runtime error (2) : out of memory" error when I try to evaluate the model using the pretrained weights. What is the hardware requirement to run this code? I have an Nvidia gtx 1080.

Console:

$ python main.py --cfg cfg/coco_eval.yml --gpu 0

Using config:
{'CONFIG_NAME': 'stageII',
 'CUDA': True,
 'DATASET_NAME': 'coco',
 'DATA_DIR': '../data/coco',
 'EMBEDDING_TYPE': 'cnn-rnn',
 'GAN': {'CONDITION_DIM': 128, 'DF_DIM': 96, 'GF_DIM': 192, 'R_NUM': 2},
 'GPU_ID': '0',
 'IMSIZE': 256,
 'NET_D': '',
 'NET_G': '../models/coco/netG_epoch_90.pth',
 'STAGE': 2,
 'STAGE1_G': '',
 'TEXT': {'DIMENSION': 1024},
 'TRAIN': {'BATCH_SIZE': 40,
           'COEFF': {'KL': 2.0},
           'DISCRIMINATOR_LR': 0.0002,
           'FLAG': False,
           'GENERATOR_LR': 0.0002,
           'LR_DECAY_EPOCH': 600,
           'MAX_EPOCH': 600,
           'PRETRAINED_EPOCH': 600,
           'PRETRAINED_MODEL': '',
           'SNAPSHOT_INTERVAL': 50},
 'VIS_COUNT': 64,
 'WORKERS': 4,
 'Z_DIM': 100}
STAGE2_G (
  (STAGE1_G): STAGE1_G (
    (ca_net): CA_NET (
      (fc): Linear (1024 -> 256)
      (relu): ReLU ()
    )
    (fc): Sequential (
      (0): Linear (228 -> 24576)
      (1): BatchNorm1d(24576, eps=1e-05, momentum=0.1, affine=True)
      (2): ReLU (inplace)
    )
    (upsample1): Sequential (
      (0): Upsample(scale_factor=2, mode=nearest)
      (1): Conv2d(1536, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (2): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True)
      (3): ReLU (inplace)
    )
    (upsample2): Sequential (
      (0): Upsample(scale_factor=2, mode=nearest)
      (1): Conv2d(768, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (2): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True)
      (3): ReLU (inplace)
    )
    (upsample3): Sequential (
      (0): Upsample(scale_factor=2, mode=nearest)
      (1): Conv2d(384, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (2): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True)
      (3): ReLU (inplace)
    )
    (upsample4): Sequential (
      (0): Upsample(scale_factor=2, mode=nearest)
      (1): Conv2d(192, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (2): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True)
      (3): ReLU (inplace)
    )
    (img): Sequential (
      (0): Conv2d(96, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (1): Tanh ()
    )
  )
  (ca_net): CA_NET (
    (fc): Linear (1024 -> 256)
    (relu): ReLU ()
  )
  (encoder): Sequential (
    (0): Conv2d(3, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (1): ReLU (inplace)
    (2): Conv2d(192, 384, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (3): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True)
    (4): ReLU (inplace)
    (5): Conv2d(384, 768, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (6): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True)
    (7): ReLU (inplace)
  )
  (hr_joint): Sequential (
    (0): Conv2d(896, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (1): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True)
    (2): ReLU (inplace)
  )
  (residual): Sequential (
    (0): ResBlock (
      (block): Sequential (
        (0): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (1): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True)
        (2): ReLU (inplace)
        (3): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (4): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True)
      )
      (relu): ReLU (inplace)
    )
    (1): ResBlock (
      (block): Sequential (
        (0): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (1): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True)
        (2): ReLU (inplace)
        (3): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (4): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True)
      )
      (relu): ReLU (inplace)
    )
  )
  (upsample1): Sequential (
    (0): Upsample(scale_factor=2, mode=nearest)
    (1): Conv2d(768, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (2): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True)
    (3): ReLU (inplace)
  )
  (upsample2): Sequential (
    (0): Upsample(scale_factor=2, mode=nearest)
    (1): Conv2d(384, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (2): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True)
    (3): ReLU (inplace)
  )
  (upsample3): Sequential (
    (0): Upsample(scale_factor=2, mode=nearest)
    (1): Conv2d(192, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (2): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True)
    (3): ReLU (inplace)
  )
  (upsample4): Sequential (
    (0): Upsample(scale_factor=2, mode=nearest)
    (1): Conv2d(96, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (2): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True)
    (3): ReLU (inplace)
  )
  (img): Sequential (
    (0): Conv2d(48, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (1): Tanh ()
  )
)
Load from:  ../models/coco/netG_epoch_90.pth
STAGE2_D (
  (encode_img): Sequential (
    (0): Conv2d(3, 96, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (1): LeakyReLU (0.2, inplace)
    (2): Conv2d(96, 192, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (3): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True)
    (4): LeakyReLU (0.2, inplace)
    (5): Conv2d(192, 384, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (6): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True)
    (7): LeakyReLU (0.2, inplace)
    (8): Conv2d(384, 768, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (9): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True)
    (10): LeakyReLU (0.2, inplace)
    (11): Conv2d(768, 1536, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (12): BatchNorm2d(1536, eps=1e-05, momentum=0.1, affine=True)
    (13): LeakyReLU (0.2, inplace)
    (14): Conv2d(1536, 3072, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (15): BatchNorm2d(3072, eps=1e-05, momentum=0.1, affine=True)
    (16): LeakyReLU (0.2, inplace)
    (17): Conv2d(3072, 1536, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (18): BatchNorm2d(1536, eps=1e-05, momentum=0.1, affine=True)
    (19): LeakyReLU (0.2, inplace)
    (20): Conv2d(1536, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (21): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True)
    (22): LeakyReLU (0.2, inplace)
  )
  (get_cond_logits): D_GET_LOGITS (
    (outlogits): Sequential (
      (0): Conv2d(896, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (1): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True)
      (2): LeakyReLU (0.2, inplace)
      (3): Conv2d(768, 1, kernel_size=(4, 4), stride=(4, 4))
      (4): Sigmoid ()
    )
  )
  (get_uncond_logits): D_GET_LOGITS (
    (outlogits): Sequential (
      (0): Conv2d(768, 1, kernel_size=(4, 4), stride=(4, 4))
      (1): Sigmoid ()
    )
  )
)
Successfully load sentences from:  ../data/coco/test/val_captions.t7
Total number of sentences: 40470
num_embeddings: 40470 (40470, 1024)
THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
Traceback (most recent call last):
  File "main.py", line 77, in <module>
    algo.sample(datapath, cfg.STAGE)
  File "/home/shenkev/Downloads/StackGAN-Pytorch/code/trainer.py", line 278, in sample
    nn.parallel.data_parallel(netG, inputs, self.gpus)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 102, in data_parallel
    return module(*inputs[0], **module_kwargs[0])
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/shenkev/Downloads/StackGAN-Pytorch/code/model.py", line 257, in forward
    h_code = self.upsample4(h_code)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/container.py", line 67, in forward
    input = module(input)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/upsampling.py", line 80, in forward
    return F.upsample(input, self.size, self.scale_factor, self.mode)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/functional.py", line 911, in upsample
    return _functions.thnn.UpsamplingNearest2d(_pair(size), scale_factor)(input)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/thnn/upsampling.py", line 52, in forward
    self.scale_factor
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:66
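
The failure occurs in the upsampling step of upsample4, where the 96-channel feature map is enlarged to the 256x256 output resolution. A rough size estimate for that single activation, assuming float32 values and the batch size of 40 from the config above, can be worked out as follows; the helper name and the alternative batch size of 8 are illustrative.

```python
def tensor_bytes(batch, channels, height, width, bytes_per_value=4):
    """Size of one dense float32 activation tensor in bytes."""
    return batch * channels * height * width * bytes_per_value


# Output of the nearest-neighbour upsample inside upsample4:
# 96 channels at 256x256, batch size 40 from the config above.
size = tensor_bytes(40, 96, 256, 256)
print(size, "bytes =", size / 2.0 ** 30, "GiB")

# The same tensor with a smaller, hypothetical batch size of 8.
print(tensor_bytes(8, 96, 256, 256) / 2.0 ** 30, "GiB")
```

This counts only one intermediate tensor, so the total footprint is larger once weights and the other activations are included. Lowering TRAIN.BATCH_SIZE in the evaluation config is one thing to try.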

Steps to train on a new dataset

Thanks for the great work! Could you please provide the steps for using this model on a new dataset? The dataset has multiple captions per image. It would be really helpful if these steps could be elaborated upon.
Thanks

Work on Windows 10?

Hi, I'm working on Windows 10, I get this issue:

DATAPATH:  ../data/coco/test/val_captions.t7
Traceback (most recent call last):
  File "main.py", line 77, in <module>
    algo.sample(datapath, cfg.STAGE)
  File "D:\documenti\Monica\StackGAN-Pytorch\code\trainer.py", line 243, in sample
    t_file = torchfile.load(datapath)
  File "C:\Users\Utente\venv\lib\site-packages\torchfile.py", line 424, in load
    return reader.read_obj()
  File "C:\Users\Utente\venv\lib\site-packages\torchfile.py", line 386, in read_obj
    v = self.read_obj()
  File "C:\Users\Utente\venv\lib\site-packages\torchfile.py", line 386, in read_obj
    v = self.read_obj()
  File "C:\Users\Utente\venv\lib\site-packages\torchfile.py", line 414, in read_obj
    "unknown object type / typeidx: {}".format(typeidx))
torchfile.T7ReaderException: unknown object type / typeidx: -1112529805

Can anyone help me? Does StackGAN-Pytorch work on Windows?
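
A type index such as -1112529805 is what one would expect when the reader has lost its place in the file. One possible cause on Windows, offered here as an assumption rather than a confirmed diagnosis, is integer width: a native C long is 4 bytes there, while files written on 64-bit Linux typically store 8-byte longs. The sketch below shows the difference using only the struct module.

```python
import struct

# Native C "long": 4 bytes on 64-bit Windows, typically 8 on 64-bit Linux.
print("native long:", struct.calcsize("l"), "bytes")

# Standard-size formats do not depend on the platform.
print("fixed int64:", struct.calcsize("<q"), "bytes")

value = 1234567890123
packed = struct.pack("<q", value)          # 8-byte little-endian integer
print(struct.unpack("<q", packed)[0])      # round-trips on any platform
print(struct.unpack("<i", packed[:4])[0])  # reading only 4 bytes gives a different number
```

If the installed torchfile version offers an option to force 8-byte longs when loading, enabling it would be worth testing; check the library's own documentation for the exact parameter name.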

Some losses keep increasing

What are the proper parameters for training on the CUB-birds (200-2011) dataset?

I tried several parameter sets, but the G loss always increases and then oscillates around 2.0 after 50 epochs, and the KL loss keeps increasing.

How can I improve the loss trend to get better images? Any suggestions? Thanks!

Link to preprocessed char-CNN-RNN text embeddings for birds is down

ImportError: cannot import name FileWriter

Traceback (most recent call last):
File "main.py", line 22, in
from trainer import GANTrainer
File "/media/server009/seagate/liuhan/text2img/StackGAN-Pytorch-master/code/trainer.py", line 24, in
from tensorboard import FileWriter
ImportError: cannot import name FileWriter

@hanzhanggit
Did I do something wrong?
Why cannot import FileWriter?
Thank you very much!

Output images aligned?

Hi,

I notice that the output saves fake and real images. Are they supposed to be aligned? Or only the last image is aligned?

OBS: I only ran the "stage 1".