mat's People

Contributors

chongjiange, fenglinglwb


mat's Issues

Questions about error and mask path

Thank you for sharing this great work!!

I am interested in your work, and I have two questions.

  1. I tried to train the network on a custom dataset, but I got the following index error (a diagnostic sketch follows below). Do you have a solution?
Setting up augmentation...
Distributing across 1 GPUs...
Setting up training phases...
Downloading: "https://download.pytorch.org/models/vgg19-dcbb9e9d.pth" to /home/naoki/.cache/torch/hub/checkpoints/vgg19-dcbb9e9d.pth
100%|##########| 548M/548M [00:26<00:00, 22.0MB/s]
Exporting sample images...
Initializing logs...
Skipping tfevents export: No module named 'tensorboard'
Training for 50000 kimg...

tick 0     kimg 0.0      time 1m 47s       sec/tick 14.1    sec/kimg 1767.80 maintenance 92.5   cpumem 5.40   gpumem 37.61  augment 0.000
Evaluating metrics...
Traceback (most recent call last):
  File "train.py", line 648, in <module>
    main() # pylint: disable=no-value-for-parameter
  File "/home/naoki/.pyenv/versions/3.7.6/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/naoki/.pyenv/versions/3.7.6/lib/python3.7/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/naoki/.pyenv/versions/3.7.6/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/naoki/.pyenv/versions/3.7.6/lib/python3.7/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/naoki/.pyenv/versions/3.7.6/lib/python3.7/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "train.py", line 641, in main
    subprocess_fn(rank=0, args=args, temp_dir=temp_dir)
  File "train.py", line 471, in subprocess_fn
    training_loop.training_loop(rank=rank, **args)
  File "/home/naoki/MAT/training/training_loop.py", line 418, in training_loop
    dataset_kwargs=val_set_kwargs, num_gpus=num_gpus, rank=rank, device=device)
  File "/home/naoki/MAT/metrics/metric_main.py", line 47, in calc_metric
    results = _metric_dict[metric](opts)
  File "/home/naoki/MAT/metrics/metric_main.py", line 93, in fid36k5_full
    fid = frechet_inception_distance.compute_fid(opts, max_real=36500, num_gen=36500)
  File "/home/naoki/MAT/metrics/frechet_inception_distance.py", line 31, in compute_fid
    rel_lo=0, rel_hi=1, capture_mean_cov=True, max_items=num_gen).get_mean_cov()
  File "/home/naoki/MAT/metrics/metric_utils.py", line 273, in compute_feature_stats_for_generator
    **data_loader_kwargs):
  File "/home/naoki/.pyenv/versions/3.7.6/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/home/naoki/.pyenv/versions/3.7.6/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
    return self._process_data(data)
  File "/home/naoki/.pyenv/versions/3.7.6/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
    data.reraise()
  File "/home/naoki/.pyenv/versions/3.7.6/lib/python3.7/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/home/naoki/.pyenv/versions/3.7.6/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/naoki/.pyenv/versions/3.7.6/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/naoki/.pyenv/versions/3.7.6/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/naoki/MAT/datasets/dataset_512.py", line 265, in __getitem__
    image = self._load_raw_image(self._raw_idx[idx])
IndexError: index 2032 is out of bounds for axis 0 with size 2032
  2. How can I specify the mask path for training?
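Regarding the first question, here is a minimal diagnostic sketch, assuming the validation set is loaded with datasets.dataset_512.ImageFolderMaskDataset (the class used in the training commands elsewhere in this thread) and that its constructor accepts a path keyword as in the StyleGAN-ADA-style dataset API; '/path/to/val' is a placeholder. It compares the dataset length against the indices stored in _raw_idx, since the traceback shows _load_raw_image(self._raw_idx[idx]) requesting index 2032 from an array of size 2032:

```
# Hedged diagnostic sketch (not a fix): check dataset size versus the indices that
# will be requested during metric evaluation.
from datasets.dataset_512 import ImageFolderMaskDataset

dataset = ImageFolderMaskDataset(path='/path/to/val')   # constructor args are an assumption
print('len(dataset):      ', len(dataset))
print('raw image count:   ', dataset._raw_shape[0] if hasattr(dataset, '_raw_shape') else 'n/a')
print('max raw index used:', int(dataset._raw_idx.max()))  # must stay below the raw image count
```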

Issues with the network

Thanks for sharing your great work!

I'm researching image inpainting solutions and find MAT the best I've seen so far, though there are some issues I'd like to point out:

  1. The network only supports square 512px input images, while other solutions like LaMa support any image aspect ratio.
  2. The Face model doesn't give any output; only the Places model works.

error: assert (name in src_tensors) or (not require_all)

Traceback (most recent call last):
  File "generate_image.py", line 155, in <module>
    generate_images() # pylint: disable=no-value-for-parameter
  File "/root/anaconda3/envs/yyy/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/root/anaconda3/envs/yyy/lib/python3.7/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/root/anaconda3/envs/yyy/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/root/anaconda3/envs/yyy/lib/python3.7/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/root/anaconda3/envs/yyy/lib/python3.7/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "generate_image.py", line 102, in generate_images
    copy_params_and_buffers(G_saved, G, require_all=True)
  File "generate_image.py", line 46, in copy_params_and_buffers
    assert (name in src_tensors) or (not require_all)
AssertionError

I fine-tuned the model on my own dataset and want to test it, but I get this error. Do you have any solutions?
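A hedged diagnostic sketch for this assertion, assuming G_saved and G are the loaded and freshly constructed generators as in the generate_image.py traceback above: list which parameter and buffer names the destination network expects but the fine-tuned checkpoint does not provide. Differences usually mean the architecture options used for fine-tuning and for testing do not match.

```
# Hedged diagnostic sketch: show which tensor names are missing on either side.
src_names = {name for name, _ in [*G_saved.named_parameters(), *G_saved.named_buffers()]}
dst_names = {name for name, _ in [*G.named_parameters(), *G.named_buffers()]}
print('expected but missing in checkpoint:', sorted(dst_names - src_names))
print('present in checkpoint but unused:  ', sorted(src_names - dst_names))

# As a stopgap one could pass require_all=False, at the risk of leaving the
# missing tensors randomly initialized:
# copy_params_and_buffers(G_saved, G, require_all=False)
```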

ImportError: No module named 'upfirdn2d_plugin'

Hello,
When I run the code, I get the following warnings. I have followed all the steps in the ReadMe file; installing VS C++, the Windows SDK, and the other techniques suggested on GitHub did not help.
I still get a result, but the result is not good at all. I attach the input, mask, and the output.

Loading data from: C:\Users\User 2\Downloads\mat input
Loading mask from: C:\Users\User 2\Downloads\mat mask
Loading networks from: pretrained/Places_512_FullData.pkl
Prcessing: indoor1.png
Setting up PyTorch plugin "bias_act_plugin"... Failed!
C:\Users\User 2\PycharmProjects\MAT\torch_utils\ops\bias_act.py:50: UserWarning: Failed to build CUDA kernels for bias_act. Falling back to slow reference implementation. Details:

Traceback (most recent call last):
  File "C:\fatemeh\project\environments\MAT\lib\site-packages\torch\utils\cpp_extension.py", line 1533, in _run_ninja_build
    subprocess.run(
  File "C:\Users\Python\Python38\lib\subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\User 2\PycharmProjects\MAT\torch_utils\ops\bias_act.py", line 48, in _init
    _plugin = custom_ops.get_plugin('bias_act_plugin', sources=sources, extra_cuda_cflags=['--use_fast_math'])
  File "C:\Users\User 2\PycharmProjects\MAT\torch_utils\custom_ops.py", line 110, in get_plugin
    torch.utils.cpp_extension.load(name=module_name, verbose=verbose_build, sources=sources, **build_kwargs)
  File "C:\fatemeh\project\environments\MAT\lib\site-packages\torch\utils\cpp_extension.py", line 986, in load
    return _jit_compile(
  File "C:\fatemeh\project\environments\MAT\lib\site-packages\torch\utils\cpp_extension.py", line 1193, in _jit_compile
    _write_ninja_file_and_build_library(
  File "C:\fatemeh\project\environments\MAT\lib\site-packages\torch\utils\cpp_extension.py", line 1297, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "C:\fatemeh\project\environments\MAT\lib\site-packages\torch\utils\cpp_extension.py", line 1555, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'bias_act_plugin': [1/3] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\nvcc -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=bias_act_plugin -DTORCH_API_INCLUDE_EXTENSION_H -IC:\fatemeh\project\environments\MAT\lib\site-packages\torch\include -IC:\fatemeh\project\environments\MAT\lib\site-packages\torch\include\torch\csrc\api\include -IC:\fatemeh\project\environments\MAT\lib\site-packages\torch\include\TH -IC:\fatemeh\project\environments\MAT\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\include" -IC:\Users\Python\Python38\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --use_fast_math -c "C:\Users\User 2\PycharmProjects\MAT\torch_utils\ops\bias_act.cu" -o bias_act.cuda.o 
FAILED: bias_act.cuda.o 
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\nvcc -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=bias_act_plugin -DTORCH_API_INCLUDE_EXTENSION_H -IC:\fatemeh\project\environments\MAT\lib\site-packages\torch\include -IC:\fatemeh\project\environments\MAT\lib\site-packages\torch\include\torch\csrc\api\include -IC:\fatemeh\project\environments\MAT\lib\site-packages\torch\include\TH -IC:\fatemeh\project\environments\MAT\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\include" -IC:\Users\Python\Python38\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --use_fast_math -c "C:\Users\User 2\PycharmProjects\MAT\torch_utils\ops\bias_act.cu" -o bias_act.cuda.o 
nvcc fatal   : Unsupported gpu architecture 'compute_86'
[2/3] cl /showIncludes -DTORCH_EXTENSION_NAME=bias_act_plugin -DTORCH_API_INCLUDE_EXTENSION_H -IC:\fatemeh\project\environments\MAT\lib\site-packages\torch\include -IC:\fatemeh\project\environments\MAT\lib\site-packages\torch\include\torch\csrc\api\include -IC:\fatemeh\project\environments\MAT\lib\site-packages\torch\include\TH -IC:\fatemeh\project\environments\MAT\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\include" -IC:\Users\Python\Python38\Include -D_GLIBCXX_USE_CXX11_ABI=0 /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc -c "C:\Users\User 2\PycharmProjects\MAT\torch_utils\ops\bias_act.cpp" /Fobias_act.o 
Microsoft (R) C/C++ Optimizing Compiler Version 19.33.31630 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

C:\fatemeh\project\environments\MAT\lib\site-packages\torch\include\pybind11\detail/common.h(106): warning C4005: 'HAVE_SNPRINTF': macro redefinition
C:\Users\Python\Python38\Include\pyerrors.h(315): note: see previous definition of 'HAVE_SNPRINTF'
ninja: build stopped: subcommand failed.


  warnings.warn('Failed to build CUDA kernels for bias_act. Falling back to slow reference implementation. Details:\n\n' + traceback.format_exc())

I'm using:

  • Python 3.7
  • Pytorch 1.7.1
  • Cuda 11.0
  • Windows 10

I wonder what the role of upfirdn2d_plugin is and how it affects the predictions.
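For what it's worth, the decisive line in the log above is "nvcc fatal : Unsupported gpu architecture 'compute_86'": the CUDA 11.0 toolkit cannot compile for a compute-capability-8.6 (Ampere) GPU, so both bias_act_plugin and upfirdn2d_plugin fall back to the slow reference path. The warning itself states the fallback is only slower, so the plugin failure alone would not be expected to degrade output quality. A hedged workaround sketch follows (the cleaner fix is installing a CUDA >= 11.1 toolkit with a matching PyTorch build); the '7.5+PTX' value is an assumption:

```
# Hedged workaround sketch: restrict the architectures torch.utils.cpp_extension
# targets to ones the installed CUDA 11.0 toolkit supports. Must be set before the
# plugins are (re)built; '+PTX' lets the driver JIT-compile for the newer GPU.
import os
os.environ['TORCH_CUDA_ARCH_LIST'] = '7.5+PTX'
```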

finetuning cfg_specs

Hi @fenglinglwb
Great work!
I have a dataset with about 500000 images and 1 GPU.
I plan to train at 128x128 resolution first and, if the results are good, scale up. Can I copy the celeba512 entry directly and change ref_gpus to 1, or could you suggest parameters for cfg_specs in train.py? The existing entry is:
'celeba512': dict(ref_gpus=1, kimg=25000, mb=64, mbstd=8, fmaps=1, lrate=0.002, gamma=10, ema=10, ramp=None, map=8),
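For example, would something like the following entry make sense? The mb and mbstd values here are only guesses to fit a single GPU at 128x128, not tuned settings:

```
# Hypothetical single-GPU entry for 128x128 experiments, copied from 'celeba512'
# with the batch sizes scaled down; values are guesses, not recommendations.
'custom128': dict(ref_gpus=1, kimg=25000, mb=8, mbstd=8, fmaps=1, lrate=0.002, gamma=10, ema=10, ramp=None, map=8),
```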
Thanks

Pretrained Model on Places2

Hi! I am very impressed by your great work. Currently, I am doing research on inpainting models on the Places2 dataset. I see there is only a pretrained model on the Places dataset. May I know if there is a pretrained model on Places2?

Train on custom dataset

Thanks for the nice code! I was wondering whether I can train the model with my own dataset (It is a small one so I hope to train from the pretrained model on Places2).

About the mirror

Thank you for your great work, but I still have a question: during training, what should I do if I don't want to use mirrored images? I've already set it to false.

FFHQ weights

Thank you for the great work.

Do you happen to have the weights for FFHQ that you could share?

How long is the training time?

Hi @fenglinglwb.
Thank you so much for your awesome work.
Could you tell me the training time for the CelebA-HQ 512 and Places2 512 datasets?
Another question: why did you finally settle on 1.8M images for training on the Places2 dataset, instead of other amounts, e.g. 1M, 2M, or 3.5M?

Thanks!

What is the environment on the A100?

You mentioned that you trained MAT on A100 GPUs. #23

I would like to know what the environment on the A100 was. Is it the same as the environment in the readme, i.e. Python 3.7, PyTorch 1.7.1, CUDA 11.0?

Thank you

import mask

Is it possible to import masks from files?
I have 64 PNG files (256x256 px).
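A hedged sketch of loading such a mask file, assuming MAT's convention that 1 marks known pixels and 0 marks the hole (as suggested by the `images_in * masks_in` line quoted in another issue below); the path is a placeholder:

```
# Hedged sketch: read a PNG mask and convert it to the [1, 1, H, W] float tensor
# the generator expects (assumption: white = keep, black = hole).
import numpy as np
import torch
from PIL import Image

mask = np.array(Image.open('masks/mask_000.png').convert('L'), dtype=np.float32) / 255.0
mask = (mask > 0.5).astype(np.float32)        # binarize
mask = torch.from_numpy(mask)[None, None]     # shape [1, 1, H, W]
```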

input with other resolution

Could the model take input at other resolutions? Can the network's input use a custom resolution?

Loss functions comprehension used in work

Hi!
Thank you for the great work!

Currently, I am going to fine-tune your Places model on my own dataset and am trying to understand the loss functions used in your solution.

Looking at the TensorBoard dashboard, I see a lot of charts that are confusing to me.
Could you explain, in simple words, the loss functions shown in the TensorBoard dashboard (specifically Loss/D/loss, Loss/D/loss_s1, Loss/D/reg, Loss/D/reg_s1, Loss/G/l1_loss, Loss/G/loss, Loss/G/loss_s1, Loss/G/pcp_loss, Loss/r1_penalty, Loss/r1_penaly_s1, Loss/scores/fake, Loss/scores/fake_s1, Loss/scores/real, Loss/scores/real_s1, Loss/signs/fake, Loss/signs/fake_s1, Loss/signs/real, Loss/signs/real_s1)?

As I understand it, after some training the process should converge. However, so far I cannot understand what we are minimizing and what we are maximizing.

  1. What is a simple interpretation of these loss functions?
  2. What are the key points to pay attention to?
  3. Put simply, which losses should fall and which should rise?
  4. In particular, how should Loss/scores/fake and Loss/scores/real behave?

I will be grateful for your answer. Thanks in advance!

Train and val splits of FFHQ

Hi, @fenglinglwb
Thanks for your wonderful work! I am using the provided FFHQ-512 model to get some metrics. But I do not know the train and val splits you used. Could you please provide the train or val file list you used? Thanks.

CUDA issue

Hi, I tried to run my dataset with the pretrained model. However, I got this error:

Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

when running : python generate_image.py --network pretrained/CelebA-HQ.pkl --dpath test_sets/CelebA-HQ/images --mpath test_sets/CelebA-HQ/masks --outdir samples

What should I do?
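The error message itself points at the general fix: load the checkpoint with map_location set to the CPU. A hedged, generic sketch of that call (whether MAT's pickle-based network loading exposes this option directly is an assumption; you may need to adapt the loading code in the repo instead, or run on a machine where CUDA is available):

```
# Hedged generic sketch: torch.load with an explicit CPU map_location.
# 'checkpoint.pt' is a placeholder, not one of MAT's .pkl files.
import torch
state = torch.load('checkpoint.pt', map_location=torch.device('cpu'))
```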

upfirdn2d

Excuse me,

I want to ask what upfirdn2d is; it seems to be related to C++. Can I build upfirdn2d in pure Python?
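For context: upfirdn2d stands for UPsample, FIR filter, DowNsample in 2D, and the plugin is a fused CUDA version of that operation; when it fails to build, torch_utils falls back to a slower reference implementation with the same semantics. A rough pure-PyTorch sketch of the idea (an illustration only, not the repo's reference code; padding and flipping conventions are simplified):

```
# Hedged sketch of the upfirdn2d idea in plain PyTorch: upsample by zero insertion,
# apply a 2D FIR filter depthwise, then downsample by striding.
import torch
import torch.nn.functional as F

def upfirdn2d_sketch(x, kernel, up=1, down=1, pad=(0, 0)):
    # x: [N, C, H, W] float tensor, kernel: [kh, kw] float FIR filter
    n, c, h, w = x.shape
    # 1) upsample: insert (up - 1) zeros after every pixel in each spatial dim
    x = x.reshape(n, c, h, 1, w, 1)
    x = F.pad(x, [0, up - 1, 0, 0, 0, up - 1])
    x = x.reshape(n, c, h * up, w * up)
    # 2) pad and filter (depthwise convolution with the FIR kernel)
    x = F.pad(x, [pad[0], pad[1], pad[0], pad[1]])
    k = kernel.flip([0, 1])[None, None].repeat(c, 1, 1, 1)
    x = F.conv2d(x, k, groups=c)
    # 3) downsample by keeping every `down`-th sample
    return x[:, :, ::down, ::down]
```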

About the mirror

Hello, your work is great! How do I disable mirroring during training? It still seems to be applied even though I set it to FALSE.

Test set for comparison

Hello, you reported results on CelebA-HQ at 256 × 256 in Table F.3. What is your test set, and how can we access it for comparison?
How did you use Places (512 × 512) to train and test the model?
@fenglinglwb

Where is Style Manipulation Module?

Excuse me, I have read your code for several hours, but I can't find where the Style Manipulation Module is in the code.

Can you tell me where it is? Thank you.

Setting up PyTorch plugin "upfirdn2d_plugin"... Failed!

I ran the test code on Linux and Windows and got the same problem on both:
Traceback (most recent call last):
  File "/data1/mingqi/MAT-main/torch_utils/ops/upfirdn2d.py", line 32, in _init
    _plugin = custom_ops.get_plugin('upfirdn2d_plugin', sources=sources, extra_cuda_cflags=['--use_fast_math'])
  File "/data1/mingqi/MAT-main/torch_utils/custom_ops.py", line 110, in get_plugin
    torch.utils.cpp_extension.load(name=module_name, verbose=verbose_build, sources=sources, **build_kwargs)
  File "/home/mingqi/.conda/envs/Mat/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 997, in load
    keep_intermediates=keep_intermediates)
  File "/home/mingqi/.conda/envs/Mat/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1213, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/home/mingqi/.conda/envs/Mat/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1560, in _import_module_from_library
    file, path, description = imp.find_module(module_name, [path])
  File "/home/mingqi/.conda/envs/Mat/lib/python3.7/imp.py", line 296, in find_module
    raise ImportError(_ERR_MSG.format(name), name=name)
ImportError: No module named 'upfirdn2d_plugin'

warnings.warn('Failed to build CUDA kernels for upfirdn2d. Falling back to slow reference implementation. Details:\n\n' + traceback.format_exc())

Could you provide a runnable Colab demo?

My local environment throws a pile of errors and I gave up. I hope a simple, minimal demo can be provided.

ImportError: DLL load failed while importing upfirdn2d_plugin: The specified module could not be found.

warnings.warn('Failed to build CUDA kernels for upfirdn2d. Falling back to slow reference implementation. Details:\n\n' + traceback.format_exc())

The warning appears when running: python generate_image.py --network pretrained/CelebA-HQ_512.pkl --dpath test_sets/CelebA-HQ/images --mpath test_sets/CelebA-HQ/masks --outdir

Question about the covariance matrix in metrics

Hello, and thank you for your work; it has helped me a lot.
While reading the FID computation, I found that the covariance is computed as sum(x64.T@x64) / nums - np.outer(mean, mean), which differs from the covariance formula I found online. Is your implementation an alternative method?
Looking forward to your reply.
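For reference, this looks like the standard identity relating the covariance to uncentered second moments. Assuming x64 stacks the N feature vectors row-wise and mean is their average, a minimal sketch of the algebra:

$$\operatorname{Cov}(X) \;=\; \mathbb{E}\!\left[(X-\mu)(X-\mu)^{\top}\right] \;=\; \mathbb{E}\!\left[XX^{\top}\right]-\mu\mu^{\top} \;\approx\; \frac{1}{N}\sum_{i=1}^{N} x_i x_i^{\top}-\bar{x}\,\bar{x}^{\top},$$

which matches sum(x64.T @ x64) / nums - np.outer(mean, mean), up to the usual N versus N-1 normalization of the sample covariance.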

Training from scratch

Hi, @fenglinglwb!
Congratulations on CVPR2022 Best Paper Finalists!!

I want to train MAT from scratch on CelebA-HQ at 128x128, but training does not proceed correctly with lr=0.001.
Input images are normalized to [-1, 1] and a 0 value in the mask indicates a missing pixel.
I confirmed this input format works with your pretrained weights.

Did you see this problem?

(Attached: input images and images generated during my training run.)

mask_ratio

I want to know the mask ratio of the masks you provided for evaluation.
Do they have a percentage range?

The small_masks contain a lot of large-ratio masks, and the same is true of the large_masks.

(Attached: a partial display of the large_masks you provided, and a partial display of the small_masks you provided.)

I also tried generating masks with mask_generator_512.py and mask_generator_512_small.py and got similar results.

This is confusing to me, so I wonder: do the large masks and small masks each have a defined ratio range?
Sorry for my poor English; looking forward to your reply.
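A hedged sketch for measuring the ratios yourself, assuming the masks are single-channel PNGs in which dark pixels mark the hole (matching the convention used elsewhere in the repo); the glob path is a placeholder:

```
# Hedged sketch: print the hole ratio of every mask in a folder so the range can
# be inspected directly.
import glob
import numpy as np
from PIL import Image

for path in sorted(glob.glob('test_sets/CelebA-HQ/masks/*.png')):
    m = np.array(Image.open(path).convert('L')) < 128   # True where the pixel is masked out
    print(path, f'{m.mean():.2%} masked')
```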

About evaluation

Hi, @fenglinglwb!

Thank you for sharing your nice code. Congratulations on CVPR22!

I wonder how to quantitatively evaluate the generated images.
Which images did you use: the raw generated images or the blended images?

I succeeded in generating samples, but the generated images include some artifacts caused by the input mask pixels.

(Attached. Left: a generated image. Right: the blended image composed from the input and the generated image.)
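For reference, a hedged sketch of the blending mentioned above, assuming the convention that mask == 1 marks known pixels:

```
import torch

def blend(image, mask, generated):
    # Hedged sketch: keep known pixels from the input, take the hole region from the output.
    # image, mask, generated: float tensors of shape [N, C, H, W]; mask == 1 marks known pixels.
    return mask * image + (1.0 - mask) * generated
```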

assert L == H * W, "input feature has wrong size"

I traced the pre-trained model but got this error:

 assert L == H * W, "input feature has wrong size"
C:\Users\seoer\Downloads\MAT-main\networks\mat.py:64: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
 File "C:\Users\seoer\Downloads\MAT-main\torch_utils\ops\conv2d_resample.py", line 89, in conv2d_resample
    assert isinstance(groups, int) and (groups >= 1)
AssertionError

Here is my code:

```
batch = 1
res = 512
image1 = torch.randn(batch, 3, res, res).to(device)
mask1 = torch.randn(batch, 1, res, res).to(device)
G.eval()
batch_gen = 4
z1 = torch.zeros([1, 512], device=device)
c1 = torch.zeros([1, 0], device=device)
traced_model = torch.jit.trace(G, (image1, mask1, z1, c1), check_trace=False, strict=False).to(device)
```

Please help!

train.py tries to allocate too much RAM when running with multiple GPUs

I am working on training the MAT network from scratch with the full Places dataset (512x512 resolution) and I am facing this error.
(screenshot of the error attached)
The above error was produced by the following command:
python train.py --outdir=/home/work/outputs/MAT/baseline_places_full --gpus=2 --batch=8 --metrics=fid36k5_full --data=/home/work/Dataset/places/full/data_large --data_val=/home/work/Dataset/places/full/val_large --dataloader=datasets.dataset_512.ImageFolderMaskDataset --mirror=True --cond=False --cfg=places512 --aug=noaug --generator=networks.mat.Generator --discriminator=networks.mat.Discriminator --loss=losses.loss.TwoStageLoss --pr=0.1 --pl=False --truncation=0.5 --style_mix=0.5 --ema=10 --lr=0.001
However, this error does not occur when I use only one GPU with the same per-GPU batch size.
(screenshot of the successful run attached)
The above result was produced by the following command:
python train.py --outdir=/home/work/outputs/MAT/baseline_places_full --gpus=1 --batch=4 --metrics=fid36k5_full --data=/home/work/Dataset/places/full/data_large --data_val=/home/work/Dataset/places/full/val_large --dataloader=datasets.dataset_512.ImageFolderMaskDataset --mirror=True --cond=False --cfg=places512 --aug=noaug --generator=networks.mat.Generator --discriminator=networks.mat.Discriminator --loss=losses.loss.TwoStageLoss --pr=0.1 --pl=False --truncation=0.5 --style_mix=0.5 --ema=10 --lr=0.001
I am using 8 GPUs with 40GB of memory each.

check_ddp_consistency error

Traceback (most recent call last):
  File "/home/dmsheng/anaconda3/envs/lama/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/dmsheng/demo/image_inpainting/MAT/my_train.py", line 405, in subprocess_fn
    my_training_loop.training_loop(rank=rank, **args)
  File "/home/dmsheng/demo/image_inpainting/MAT/training/my_training_loop.py", line 404, in training_loop
    misc.check_ddp_consistency(module, ignore_regex=[r'.*\.w_avg', r'.*\.relative_position_index', r'.*\.avg_weight', r'.*\.attn_mask', r'.*\.resample_filter'])
  File "/home/dmsheng/demo/image_inpainting/MAT/torch_utils/misc.py", line 195, in check_ddp_consistency
    assert (nan_to_num(tensor) == nan_to_num(other)).all(), fullname
AssertionError: Generator.synthesis.first_stage.conv_first.conv.weight
Thanks for your great work! I have no idea what the check_ddp_consistency function is for. Any ideas how to solve this problem?

Train on CelebA-HQ

Hi, Thank you for your work!

If I want to train a model on CelebA-HQ, can you give me an example bash command?
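Not an official recipe, but a hedged sketch adapted from the Places training command shown elsewhere in this thread, swapping in the celeba512 config from cfg_specs; the paths, GPU count, batch size, and metric are placeholders to adjust:

python train.py --outdir=/path/to/output --gpus=8 --batch=32 --metrics=fid36k5_full --data=/path/to/CelebA-HQ/train --data_val=/path/to/CelebA-HQ/val --dataloader=datasets.dataset_512.ImageFolderMaskDataset --mirror=True --cond=False --cfg=celeba512 --aug=noaug --generator=networks.mat.Generator --discriminator=networks.mat.Discriminator --loss=losses.loss.TwoStageLoss --pr=0.1 --pl=False --truncation=0.5 --style_mix=0.5 --ema=10 --lr=0.001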

The confusion about the code.

Hello, thank you for your work. I wonder why the masks need to subtract 0.5 in the code "x = torch.cat([masks_in - 0.5, images_in * masks_in], dim=1)" of the first-stage network.

An inquiry

Hello, I don't quite understand what this ws is used for. (screenshot attached)

training time

You mentioned in your paper that you used 8 V100 GPUs. Could you tell me how long the training took?

How to create Mask image by myself

Creating a mask image that contains some pixels with alpha < 1 did not work. I don't know how to create a mask image in which all pixels have alpha = 1. Please help.
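A hedged sketch for creating such a mask, assuming the repo's convention that the mask is a single-channel image where 255 (white) marks known pixels and 0 (black) marks the region to inpaint; saving in 'L' mode avoids the alpha channel entirely. The hole coordinates and filename are arbitrary examples:

```
# Hedged sketch: build a fully opaque, single-channel mask with no alpha channel.
import numpy as np
from PIL import Image

mask = np.full((512, 512), 255, dtype=np.uint8)      # everything known
mask[128:384, 128:384] = 0                            # rectangular region to inpaint
Image.fromarray(mask, mode='L').save('my_mask.png')   # 'L' mode has no alpha channel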

Questions with places512 training

Hi @fenglinglwb, thanks for your work. I used the source code to train on the Places2 dataset, but I hit an out-of-bounds error in self._load_raw_image in datasets/dataset_512.py. Executing that file alone shows no problem. Thanks for your reply!

The test results were terrible.

Thank you so much for a great job. I had a few problems while testing. I made a few minor changes to the code, which worked fine in training, but during testing it printed out a lot of intermediate parameters (probably from executing the code in fig. 3), and the test results were terrible. Have you encountered this problem, and how can it be solved?

(Attached: fig. 1 is the test result and fig. 2 is a snapshot from training.) Thanks a lot!

module 'torch' has no attribute 'Assert'

Hello, I ran into the following error while running the code. Do you know what might be wrong?

MAT-main\torch_utils\misc.py", line 64, in
symbolic_assert = torch.Assert # 1.7.0
AttributeError: module 'torch' has no attribute 'Assert'
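A hedged compatibility sketch (assumption: the installed PyTorch build provides neither torch._assert nor torch.Assert, so the fallback on line 64 of misc.py fails). One option is to substitute a plain Python assert with the same two-argument signature; upgrading to a PyTorch version that has torch._assert is the cleaner route:

```
# Hedged sketch: a drop-in replacement for the missing torch.Assert / torch._assert.
def symbolic_assert(condition, message):
    assert condition, message
```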

About reconstruction metrics (PSNR, SSIM)

Thanks for the great work!

I understand PSNR and SSIM are not good metrics for evaluating inpainting tasks.
Nevertheless, could I get the PSNR and SSIM of your method MAT compared to other methods (CoModGAN, LaMa, ICT, MADF, AOTGAN)?
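If the numbers are not published, a hedged sketch for computing them yourself with scikit-image, assuming ground-truth and predicted images loaded as uint8 RGB arrays; the paths are placeholders:

```
# Hedged sketch: PSNR / SSIM for one image pair using scikit-image.
import numpy as np
from PIL import Image
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

gt   = np.array(Image.open('gt/0001.png').convert('RGB'))
pred = np.array(Image.open('pred/0001.png').convert('RGB'))
psnr = peak_signal_noise_ratio(gt, pred, data_range=255)
# channel_axis requires a recent scikit-image; older versions use multichannel=True
ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=255)
print(f'PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}')
```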

About the ICT comparison in the paper

Hi! Thanks for the great work again.

Is there a reason why there is no result of ICT on Places2 in the paper?

Thank you in advance

How can I use 512x512 pretrained model to inpaint 1024x1024 images?

Hi! Thanks for the great work.
Though I have a few small questions. As you said in the README:

"Our implementation only supports generating an image whose size is a multiple of 512. You need to pad or resize the image to make its size a multiple of 512. Please pad the mask with 0 values."

So I used generate_image.py and set --resolution to 1024 with the 512x512 pretrained model you provided, but it did not work. The error is below:

File "MAT-main/networks/mat.py", line 20, in nf
return NF[2 ** stage]
KeyError: 1024

It seems the network lacks parameters for 1024x1024 resolution. How can I solve this? In other words, how can I use the 512x512 pretrained model to inpaint 1024x1024 images, as the README suggests?
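One possible hedged workaround (not the authors' recommendation): run the 512 model on a downscaled copy and paste the result back into the full-resolution image only inside the hole. The generator call signature follows the generate_image.py usage quoted in another issue below; image and mask are assumed to be [1, C, 1024, 1024] float tensors with mask == 1 for known pixels, and G, z, label, truncation_psi, noise_mode come from the usual inference setup:

```
# Hedged sketch: inpaint a 1024x1024 image with the 512x512 generator by resizing.
import torch.nn.functional as F

img512  = F.interpolate(image, size=(512, 512), mode='bilinear', align_corners=False)
mask512 = (F.interpolate(mask, size=(512, 512), mode='nearest') > 0.5).float()
out512  = G(img512, mask512, z, label, truncation_psi=truncation_psi, noise_mode=noise_mode)
out1024 = F.interpolate(out512, size=(1024, 1024), mode='bilinear', align_corners=False)
result  = mask * image + (1.0 - mask) * out1024   # keep original pixels where the mask is known
```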

Questions about the Quick Test

Setting up PyTorch plugin "bias_act_plugin"... Failed!
..\torch_utils\ops\bias_act.py:50: UserWarning: Failed to build CUDA kernels for bias_act. Falling back to slow reference implementation. Details:

Traceback (most recent call last):
  File "..\torch_utils\ops\bias_act.py", line 48, in _init
    _plugin = custom_ops.get_plugin('bias_act_plugin', sources=sources, extra_cuda_cflags=['--use_fast_math'])
  File "..\torch_utils\custom_ops.py", line 64, in get_plugin
    raise RuntimeError(f'Could not find MSVC/GCC/CLANG installation on this computer. Check _find_compiler_bindir() in "{file}".')
RuntimeError: Could not find MSVC/GCC/CLANG installation on this computer. Check _find_compiler_bindir() in "..\torch_utils\custom_ops.py".
Hello, I didn't install Visual Studio. What is the specific solution to this problem?

During inference, does your Generator model take the ground-truth image as input?

Image inpainting is the task of completing holes in an image. Thus, during inference, when there is no ground-truth image, the generator should receive only the masked image as input. But your code in 'generate_image.py' seems to pass the ground-truth image as well.

image = read_image(ipath)
output = G(image, mask, z, label, truncation_psi=truncation_psi, noise_mode=noise_mode)

So I ran an experiment with your pretrained 'CelebA-HQ' (256) model and passed two different images as the image argument of G: the masked image and the ground-truth image.

Then it seems the model's output is largely affected by which image is passed in.
In the attached examples below, the left one is the input image, the middle one is the masked image, and the right one is the output.

(Attached: result images 267, 267_gt, 1400, 1400_gt, 8755, 8755_gt.)

Can you please tell me how the ground truth image affects the results of the inference?
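A hedged sketch of how one could rule out ground-truth leakage at the call site (the first stage already computes images_in * masks_in internally, as quoted in another issue above, so masking before the call should ideally make no difference; if it does, that would confirm the observation). mask == 0 is assumed to mark the hole:

```
# Hedged sketch: zero out the hole region explicitly before calling the generator.
masked_image = image * mask
output = G(masked_image, mask, z, label, truncation_psi=truncation_psi, noise_mode=noise_mode)
```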
