raywzy / ICT
High-Fidelity Pluralistic Image Completion with Transformers (ICCV 2021)
Hi, @raywzy
I am trying to train the model on ImageNet with the following setting:
--data_path /ILSVRC2012/train --validation_path /ILSVRC2012/val --mask_path /PConv-Keras/irregular_mask/train_mask --batch_size 32 --train_epoch 100 --nodes 1 --gpus 8 --node_rank 0 --n_layer 35 --n_embd 1024 --n_head 8 --GELU_2 --image_size 32 --use_ImageFolder
But I am getting Nan for train and test loss (screenshot attached).
This is happening for smaller datasets like Paris streetview (#train_images: 14900, #test_images 100) as well (screenshot attached).
Any suggestions on how to fix this issue?
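Not this repo's code, but a generic guard that often helps when chasing NaN losses: skip the optimizer step entirely when the loss is non-finite, and clip gradients otherwise. The helper name and the max_norm value are my own, a minimal sketch only:

```python
import torch

def step_if_finite(loss, model, optimizer, max_norm=1.0):
    # Skip the update entirely when the loss is NaN/Inf instead of
    # propagating it into the weights; clip gradients otherwise.
    if not torch.isfinite(loss):
        optimizer.zero_grad(set_to_none=True)
        return False
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    return True
```

If the NaNs disappear with clipping enabled, a too-high learning rate for this batch size is a likely suspect.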
Has anyone encountered this error: torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with signal SIGKILL? I am looking for a solution.
I really appreciate your work, but I only have one GPU. Can I still train and test?
Hello,
Congrats on your amazing work. Could you explain the dataset specification for training on one's own dataset?
Could you provide a small example dataset with a few samples?
And finally, what is the folder structure for the dataset?
Have a good day,
Hi, thanks for your nice work.
There are some details that confuse me. I would appreciate any advice.
The --BERT option is used in every Transformer training and inference command. In which situations should this option not be selected?
Also, in CausalSelfAttention I found that the model captures information at all positions, and the attention mask is not applied anywhere except the input occlusion. How can it guarantee that attention is paid only to unmasked information? Thanks again.
Hello, wonderful work! I have a question: how does your method ensure that the completion results are diverse?
How much time does inference need for one image?
Thanks
Hi, thanks for your work on this.
I'm trying to download the pretrained models, but it looks like I can't download from Baidu Drive unless I have a Baidu account (which requires a Chinese phone number).
Would it be possible to upload the model to a non-Baidu source as well?
Hello,
First of all, super cool model, and thanks for being so helpful with past questions. I was just wondering specifically how you generated pixel-wise completion probability maps as in Fig. 9 of your paper (I get how it's done in theory, I just wanted to see code if possible).
Thanks!
Hello, I finished the two training stages and am ready to test, but although the test path is correct, the images inside the test set cannot be loaded. Can you help me? Thank you!
Could you release the pretrained models? The given link is empty.
Can you tell me the training data format?
Thanks in advance.
Hi, @raywzy
I tried both the FFHQ and Places2 pretrained weights of the upsampler.
However, the Places upsampler's pretrained weight does not give sufficient quality.
We know that the generator's weights were trained for 322000 iterations.
Do you think the attached results are correct?
From left to right,
1st stage output | blended result with masked input image | raw output of upsampler | GT image | masked input image | raw output of upsampler within given mask
Hello, thank you for the great work. I tried to download the pretrained model but I get this error:
--2021-11-23 18:50:20-- https://www.dropbox.com/s/cqjgcj0serkbdxd/ckpts_ICT.zip?dl=1
Resolving www.dropbox.com (www.dropbox.com)... 162.125.3.18, 2620:100:6018:18::a27d:312
Connecting to www.dropbox.com (www.dropbox.com)|162.125.3.18|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /s/dl/cqjgcj0serkbdxd/ckpts_ICT.zip [following]
--2021-11-23 18:50:20-- https://www.dropbox.com/s/dl/cqjgcj0serkbdxd/ckpts_ICT.zip
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 404 Not Found
2021-11-23 18:50:20 ERROR 404: Not Found.
The link doesn't seem to work.
There is a parameter --validation_path, described as "validation_image_path", for training the Transformer, but I could not find a parameter like "validation_mask_path" in "./Transformer/main.py". Does this mean the validation set does not need its own mask set, that it uses the same mask set as the training set, or something else? Sorry for my poor English.
Hi, thank you for sharing code.
I want to run the training code on grayscale images, but I got the following error.
# Mask is 12022, # Image is 12022
# Mask is 12022, # Image is 0
Warnning: There is no trained model found. An initialized model will be used.
Warnning: There is no previous optimizer found. An initialized optimizer will be used.
Resume from Epoch 0
Traceback (most recent call last):
File "main.py", line 139, in <module>
mp.spawn(main_worker, nprocs=opts.gpus, args=(opts,))
File "/home/naoki/.pyenv/versions/3.8.6/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/naoki/.pyenv/versions/3.8.6/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/home/naoki/.pyenv/versions/3.8.6/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home/naoki/.pyenv/versions/3.8.6/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/home/naoki/ICT/Transformer/main.py", line 73, in main_worker
trainer.train(loaded_ckpt)
File "/home/naoki/ICT/Transformer/DDP_trainer.py", line 203, in train
run_epoch('train')
File "/home/naoki/ICT/Transformer/DDP_trainer.py", line 139, in run_epoch
logits, loss = model(x, y)
File "/home/naoki/.pyenv/versions/3.8.6/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/naoki/.pyenv/versions/3.8.6/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 799, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/naoki/.pyenv/versions/3.8.6/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/naoki/ICT/Transformer/models/model.py", line 254, in forward
loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
File "/home/naoki/.pyenv/versions/3.8.6/lib/python3.8/site-packages/torch/nn/functional.py", line 2824, in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: Expected object of scalar type Long but got scalar type Float for argument #2 'target' in call to _thnn_nll_loss_forward
I checked the size and data type before cross entropy loss.
'''
logits: torch.Size([12, 1024, 512]), torch.float32
targets: torch.Size([12, 1024]), torch.float32
'''
Could you tell me how to solve this problem?
Thank you in advance.
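For the dtype mismatch above (Float targets where cross_entropy expects Long class indices), a likely fix is casting the targets before computing the loss. A minimal sketch, not the repo's actual code:

```python
import torch
import torch.nn.functional as F

def token_loss(logits, targets):
    # F.cross_entropy expects class-index targets of dtype Long;
    # Float targets raise the RuntimeError shown in the traceback.
    return F.cross_entropy(logits.view(-1, logits.size(-1)),
                           targets.view(-1).long())
```

The underlying cause may be the grayscale loader producing float targets; casting at the loss site is the smallest change.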
Dear researcher, please also consider checking our newly introduced face inpainting method, which addresses the symmetry problems of general inpainting methods by using a Swin transformer and semantic-aware discriminators.
Our proposed method showed better results in terms of FID score and a newly proposed metric focused on face symmetry, compared to several state-of-the-art methods including LaMa.
Our paper is available at:
https://www.researchgate.net/publication/366984165_SFI-Swin_Symmetric_Face_Inpainting_with_Swin_Transformer_by_Distinctly_Learning_Face_Components_Distributions
The code will also be published at:
https://github.com/mohammadrezanaderi4/SFI-Swin
Hi,
I am confused by the sample_mask function in transformer/utils/util.py: it seems that it never uses the argument num_sample and keeps num_sample=1. Is that intended?
Moreover, you use top_k=40 but the paper uses top_k=50. Which is the better choice?
Thanks,
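For reference on the top_k question, the usual iGPT-style filtering (which samplers like this one commonly follow, with k trading fidelity against diversity) looks roughly like this sketch:

```python
import torch

def top_k_logits(logits, k):
    # Keep only the k largest logits per row; everything else gets -inf
    # so it receives zero probability after softmax.
    v, _ = torch.topk(logits, k)
    out = logits.clone()
    out[out < v[..., [-1]]] = -float('inf')
    return out
```

A larger k admits lower-probability tokens, so samples become more diverse but can drift from the ground truth.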
Hi, I would like to know the mask ratio used by the pretrained models provided for download; it is not mentioned in the downloaded files. Thanks!
I see in the paper that you use bidirectional attention in the Transformer stage, but I don't see bidirectional attention being used in /Transformer/model.py. Instead of CausalSelfAttention, do you use bidirectional attention elsewhere?
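My guess (not confirmed from the source) is that the bidirectional behaviour comes from the attention mask rather than a separate module: with an all-ones mask, a class named CausalSelfAttention in fact attends to every position. A minimal illustration:

```python
import torch

def attention_mask(block_size, bidirectional=True):
    # BERT-style: all ones, every token may attend to every other token.
    # GPT-style: lower-triangular, tokens cannot attend to the future.
    full = torch.ones(block_size, block_size)
    return full if bidirectional else torch.tril(full)
```

So the class name alone does not determine causality; the registered mask buffer does.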
Congratulations on the great paper!! I was wondering if you could give a date as to when the model and the code will be released
Where is the mask path?
Thank you for proposing this good idea of using a Transformer as prior information!
But why not use an end-to-end network for training? Is it because the results are not as good?
I've been playing around trying to create a minimal Google Colab demo for this repository, but I am running into some error messages that you can see here:
As you can see, the ImageNet inference command (the same as in the README) throws an error like this:
RuntimeError: Error(s) in loading state_dict for InpaintGenerator_5:
Missing key(s) in state_dict: "encoder.1.weight", "encoder.1.bias", "encoder.3.weight", ...............
Unexpected key(s) in state_dict: "module.encoder.1.weight", "module.encoder.1.bias", ...............
I haven't dug into the other (FFHQ) error yet - it says something about transparency, which may be a mistake on my part with preparing the input image and mask (image, mask).
Thanks for your work on this repo and publicly releasing the code and pre-trained models! Can't wait to try it out.
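Regarding the state_dict error above: the Missing/Unexpected key pairs differ only by a "module." prefix, which checkpoints saved from nn.DataParallel or DistributedDataParallel carry. A common workaround (hypothetical helper, not from this repo) strips the prefix before load_state_dict:

```python
def strip_module_prefix(state_dict):
    # Checkpoints saved through nn.DataParallel / DDP prefix every key
    # with "module."; a bare (unwrapped) model expects the keys without it.
    prefix = "module."
    return {k[len(prefix):] if k.startswith(prefix) else k: v
            for k, v in state_dict.items()}
```

Usage would be along the lines of model.load_state_dict(strip_module_prefix(torch.load(path))).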
Hi, thanks for sharing your code.
Could you tell me how I can get mask images for a dataset?
I'd like to use FFHQ.
Thank you.
Thanks for the nice code! I tried to enable "--random_stroke" so the masks are generated on the fly, but it seems that "--mask_path" still needs an input; otherwise it points to a default path that does not exist on my PC.
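A possible local patch, sketched under the assumption that the repo parses these flags with argparse: give --mask_path a None default and only require it when --random_stroke is off.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--mask_path', type=str, default=None)
parser.add_argument('--random_stroke', action='store_true')
opts = parser.parse_args(['--random_stroke'])  # example invocation

# Only demand a mask folder when masks are not generated on the fly.
if not opts.random_stroke and opts.mask_path is None:
    parser.error('--mask_path is required unless --random_stroke is set')
```

This keeps the existing behaviour for mask-folder users while letting --random_stroke run alone.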
I tried several times to download the ckpts but could not, due to a server problem. Has anyone else encountered this?
Trying to evaluate your code:
WSL2 under Windows 10
Nvidia RTX3090
Upon starting the provided inference script (regardless of which ckpts are used), from the very beginning I got the message:
### Something Wrong ###
0%|
After that, the computation continues.
If "--FFHQ" or "--Places2_Nature" is specified, inference finishes with no error.
However, if "--ImageNet" is specified, inference finishes with an error:
raise AssertionError("Invalid device id")
AssertionError: Invalid device id
NVCC report:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_19:00:59_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
Tried to change line 44 in "run.py" from
CUDA_VISIBLE_DEVICES=0,1
to
CUDA_VISIBLE_DEVICES=0
-- no luck.
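A small sketch (my own helper, not from run.py) to guard against "Invalid device id" by clamping the requested device list to what the machine actually has:

```python
def usable_device_ids(requested, available):
    # Drop requested GPU ids that don't exist on this machine, e.g.
    # requested=[0, 1] on a single RTX 3090 yields [0].
    return [i for i in requested if i < available]
```

In practice one would call it as usable_device_ids([0, 1], torch.cuda.device_count()) and pass the result wherever device_ids is expected.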
Found the reason for message:
### Something Wrong ###
0%|
That happened due to differing image/mask file names (they should be identical).
However the rest of the issue still exists.
Besides that, after inference finishes (regardless of the specified ckpts), the "output" subfolder created in the "Guided_Upsample" folder is empty. The "output" subfolder in the "Transformer" folder is created and contains the generated 32- or 48-pixel images.
Hi, the link to your supplemental material is empty. Can you provide it?
I have 8 V100 GPUs, and the training code runs without error with image size 32 and batch size 64, but CUDA out of memory occurs when I set the image size to 48. Has anyone met this?
Hello, I am training on a single 3090 graphics card. Supposedly one can train with a single card, and I have turned num_workers and batch_size down, but it still reports this error.
Hello, what do you think is the minimum recommended GPU specs (memory etc) for good performance, both for training on a new dataset and for testing the pluralistic completion? Thanks!
Hi,
Thanks for sharing this great work! I have a question about kmeans_centers.npy.
According to your paper, you use clustering data generated from ImageNet to reduce the computational cost.
To further reduce the dimension and faithfully re-represent the low-resolution image,
an extra visual vocabulary with spatial size 512 × 3 is generated using KMeans cluster centers of the whole ImageNet [8] RGB pixel spaces.
Will your clustering data work well for other domains (like faces, paintings or maps)?
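For context, the paper's 512 x 3 vocabulary reduces each RGB pixel to the index of its nearest KMeans centroid. A toy sketch of that quantization step (the centers here are made up, not the actual kmeans_centers.npy):

```python
import numpy as np

def quantize_pixels(pixels, centers):
    # pixels: (N, 3) RGB values; centers: (K, 3) KMeans centroids.
    # Returns, for each pixel, the index of the nearest centroid.
    d = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)
```

Since the centroids live in raw RGB space rather than a semantic feature space, one might expect them to transfer across domains better than learned features would, but that remains the authors' question to answer.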
Hello,
Congrats on your nice work.
I use a 16 GB GPU, but a single card can only run batch_size 3. I turned on mixed-precision training, and the other settings are --n_layer 35 --n_embd 512 --n_head 8, the same as your model trained on Places2.
So I want to know: how did you use 8 GPUs with batch_size 64 to train the transformer model?
Hello, does the code support debugging on Windows? And how should I modify the configuration to train with my own dataset?
(ICT) E:\ZSL\ICT-main\Transformer>python main.py --name ICT --ckpt_path ./ckpt --data_path E:\ZSL\ICT-main\data\train_256_png --validation_path E:\ZSL\ICT-main\data\test_256
_png --mask_path E:\ZSL\ICT-main\data\mask --BERT --batch_size 1 --train_epoch 100 --nodes 1 --gpus 1 --node_rank 0 --n_layer 12 --n_embd 512 --n_head 8 --GELU_2 --image_s
ize 256 --AMP
Traceback (most recent call last):
File "main.py", line 139, in <module>
mp.spawn(main_worker, nprocs=opts.gpus, args=(opts,))
File "D:\anaconda\envs\ICT\lib\site-packages\torch\multiprocessing\spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "D:\anaconda\envs\ICT\lib\site-packages\torch\multiprocessing\spawn.py", line 188, in start_processes
while not context.join():
File "D:\anaconda\envs\ICT\lib\site-packages\torch\multiprocessing\spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "D:\anaconda\envs\ICT\lib\site-packages\torch\multiprocessing\spawn.py", line 59, in _wrap
fn(i, *args)
File "E:\ZSL\ICT-main\Transformer\main.py", line 57, in main_worker
IGPT_model=GPT(model_config)
File "E:\ZSL\ICT-main\Transformer\models\model.py", line 142, in __init__
self.blocks = nn.Sequential(*[Block_2(config) for _ in range(config.n_layer)])
File "E:\ZSL\ICT-main\Transformer\models\model.py", line 142, in <listcomp>
self.blocks = nn.Sequential(*[Block_2(config) for _ in range(config.n_layer)])
File "E:\ZSL\ICT-main\Transformer\models\model.py", line 88, in __init__
self.attn = CausalSelfAttention(config)
File "E:\ZSL\ICT-main\Transformer\models\model.py", line 44, in __init__
self.register_buffer("mask", torch.tril(torch.ones(config.block_size, config.block_size))
RuntimeError: [enforce fail at ..\c10\core\CPUAllocator.cpp:75] data. DefaultCPUAllocator: not enough memory: you tried to allocate 17179869184 bytes. Buy new RAM!
What causes this problem?
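For what it's worth, the failed allocation size matches the attention mask for a 256 x 256 token sequence exactly, which suggests --image_size 256 is the culprit (the Transformer stage elsewhere in this thread produces 32- or 48-pixel outputs, not 256):

```python
# CausalSelfAttention registers a block_size x block_size float32 buffer.
# With --image_size 256 the sequence length is 256 * 256 tokens:
image_size = 256
block_size = image_size * image_size           # 65536 tokens
mask_bytes = block_size * block_size * 4       # 4 bytes per float32
print(mask_bytes)  # 17179869184 -> the exact number in the error message
```

So the mask alone would need about 17 GB of host RAM before any model weights are allocated.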