
advimman / lama

7.2K stars · 79 watchers · 784 forks · 6.46 MB

🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022

Home Page: https://advimman.github.io/lama-project/

License: Apache License 2.0

Python 9.75% Shell 0.33% Jupyter Notebook 89.89% Dockerfile 0.03%
inpainting inpainting-methods inpainting-algorithm computer-vision cnn deep-learning deep-neural-networks image-inpainting fourier fourier-transform

lama's People

Contributors

ankuprk, calpt, chenbinghui1, cohimame, hanswolff, mallman, moldoteck, sanster, senya-ashukha, tobi823, windj007, zeerizvee


lama's Issues

Spectral Positional Encoding

I see in your FourierUnit you added an optional spectral_pos_encoding argument. Have you experimented at all with this? Has it improved/reduced performance?

Influence: Amount of training data and data augmentation via Detectron2

Hi,

Thank you for sharing LaMa! The inpainting quality is really impressive!

I was wondering:

  1. Comparing "LaMa-Fourier" with "Big LaMa-Fourier": How much did the larger training data (4.5M images from the Places-Challenge dataset) contribute to the improved quality of Big LaMa-Fourier? Do you think that similar results could have also been achieved for Big LaMa-Fourier with less data?

  2. You have proposed a sophisticated approach for data augmentation. How much did the training and the inference benefit from data augmentation using segmentation masks from Detectron2?

Best wishes,
Alex

Integrate into iOS

I am trying to integrate this into an iOS project, but I couldn't find a way to do it. Can anyone help me with this?

There are some strange white areas in my results

First of all, thanks for your exciting work.
When I use -cn lama-fourier to train on my own dataset, I find there are some white areas in some train and test images (not in all images, and from my observation it is unrelated to mask size), like the two examples below (selected from epoch 33/40):

[two example images showing the white areas]

Do you know how to avoid this? Thanks in advance.

PS:
My dataset is a food image set with 150,000 images, and I use this command to train the model:
CUDA_VISIBLE_DEVICES=0,1,2,3 python bin/train.py -cn lama-fourier location=food data.batch_size=10 data.num_workers=8 trainer.kwargs.gpus=[0,1,2,3] trainer.kwargs.limit_train_batches=12360 optimizers.generator.lr=0.001 optimizers.discriminator.lr=0.0001

Training batch size and other parameters?

Hi, thanks for the great work!
I am trying to reproduce the training results.
I used the default batch size and ran the lama-fourier model on 4 V100 GPUs for 40 epochs. The training takes about 12 hours, and the results on the training dataset look very nice, but things go wrong on the test set and other validation images. There are texture artifacts like this:

[example output with texture artifacts]

I wonder what the reason could be: batch size, number of training epochs, or something else?
If I set the batch size to 10, the training time for lama-fourier will be too long.

How long should training usually take on 4 V100 GPUs (1 day or 1 week), and what batch size should be set so that the model generalizes well to other images?

Thanks so much!

Image2Image model

Congrats on your great work and thanks for sharing your code. I'm using this model for image-to-image translation, but after training for some time I'm losing details from the input image in the predicted results. Do you have any suggestions on how to modify the loss functions for better detail preservation? Thanks.

How to convert bestresult to onnx

At present, the best-result checkpoint is in CKPT format, and I hope to convert it to ONNX. I tried this code:

import torch
from saicinpainting.training.trainers import load_checkpoint

device = torch.device('cpu')
# train_config and checkpoint_path come from the model directory, as in bin/predict.py
model = load_checkpoint(train_config, checkpoint_path, strict=False, map_location='cpu')
model.freeze()
model.to(device)
torch.save(model, 'bestresult.pth')

img = torch.rand(1, 3, 320, 320, requires_grad=False)
img = img.to(device)
torch.onnx.export(model, img, 'bestresult.onnx', opset_version=11)
print('================= best onnx result is saved! =================')

But it reported an error. How can I solve it? Thank you.

  [2021-11-26 06:25:35,036][__main__][CRITICAL] - Prediction failed due to too many indices for tensor of dimension 4:
Traceback (most recent call last):
  File "bin/predict.py", line 68, in main

integrate with Lightning ecosystem CI

Hello and so happy to see you use Pytorch-Lightning! 🎉
Just wondering if you have already heard about the new PyTorch Lightning (PL) ecosystem CI, which we would like to invite you to join. You can check out our blog post about it: Stay Ahead of Breaking Changes with the New Lightning Ecosystem CI.
As you use the PL framework for your cool project, we would like to enhance your experience and offer you safe updates to our future releases. At the moment you run tests against a particular PL version, but it may accidentally happen that the next version is incompatible with your project... 😕 We do not intend to change anything on our project side, but we have a solution: the ecosystem CI, which tests both your latest code and our latest development head, so we can catch incompatibilities very early and prevent releasing a bad version... 👍

What needs to be done?

What will you get?

  • scheduled nightly testing configured for development/stable versions
  • Slack notification if something goes wrong, so you can investigate
  • testing on a multi-GPU machine as well, as our gift to you 🐰

Environment variable 'USER' not found

Thanks for sharing your work! When I run python bin/train.py -cn lama-fourier location=my_dataset data.batch_size=10, the following error occurred:
omegaconf.errors.InterpolationResolutionError: ValidationError raised while resolving interpolation: Environment variable 'USER' not found
full_key: hydra.run.dir
object_type=dict

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "bin/train.py", line 74, in
main()
File "/root/lama-main/saicinpainting/utils.py", line 163, in new_main
main_func(*args, **kwargs)
File "/root/.local/lib/python3.7/site-packages/hydra/main.py", line 53, in decorated_main
config_name=config_name,
File "/root/.local/lib/python3.7/site-packages/hydra/_internal/utils.py", line 368, in _run_hydra
lambda: hydra.run(
File "/root/.local/lib/python3.7/site-packages/hydra/_internal/utils.py", line 270, in run_and_report
cur.tb_lasti = iter_tb.tb_lasti
AttributeError: 'NoneType' object has no attribute 'tb_lasti'

Could you tell me how to solve it? Thanks!
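
For anyone hitting the same thing: since the failing key is hydra.run.dir, the training config most likely builds the run directory from an ${env:USER} interpolation, so the error simply means the USER environment variable is unset in your shell (common inside Docker containers). Exporting it before launching, e.g. export USER=$(whoami), should let the interpolation resolve; this is inferred from the error message rather than verified against every config.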

Export to Onnx

Got this failed attempt to convert the model to ONNX

Code example:
import torch

# `model` is the loaded LaMa inpainting module
save_onnx_path = "/content/lama.onnx"

img = torch.rand(1, 3, 120, 120)
mask = torch.rand(1, 1, 120, 120)
inputs = {
    "image": img,
    "mask": mask
}

torch.onnx.export(model,
                  inputs,
                  save_onnx_path,
                  opset_version=12,
                  do_constant_folding=True,
                  input_names=['img', 'mask'],
                  output_names=['output'],
                  dynamic_axes={
                      'img': {0: 'batch_size', 2: 'width', 3: 'height'},
                      'mask': {0: 'batch_size', 2: 'width', 3: 'height'},
                      'output': {0: 'batch_size', 2: 'width', 3: 'height'},
                  })

Stack trace:
RuntimeError: Exporting the operator fft_rfftn to ONNX opset version 12 is not supported. Please feel free to request support or submit a pull request on PyTorch GitHub.

As far as I understood, the problem is that ONNX does not support the "torch.fft.rfftn" operation used in the FourierUnit module.
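
One workaround people use for FFT-based models (not something this repo ships) is to re-express the real FFT as matrix multiplications against a precomputed DFT basis, so the exported graph contains only MatMul ops. A minimal sketch for a single dimension; a full replacement of FourierUnit's rfftn/irfftn over two spatial dims would need to apply this along each axis and also implement the inverse, and newer ONNX opsets may offer native DFT support instead:

import math
import torch

def rfft_last_dim_as_matmul(x):
    # Real FFT over the last dimension expressed as two matmuls against a
    # precomputed DFT basis; returns the real and imaginary parts separately.
    n = x.shape[-1]
    k = torch.arange(n // 2 + 1, dtype=torch.float32).unsqueeze(1)  # frequency bins
    t = torch.arange(n, dtype=torch.float32).unsqueeze(0)           # sample positions
    angle = -2.0 * math.pi * k * t / n
    real = x @ torch.cos(angle).T.to(x.dtype)  # (..., n // 2 + 1)
    imag = x @ torch.sin(angle).T.to(x.dtype)
    return real, imag

# Sanity check against the reference implementation:
# ref = torch.fft.rfft(x, dim=-1)
# assert torch.allclose(real, ref.real, atol=1e-4) and torch.allclose(imag, ref.imag, atol=1e-4)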

sync_batchnorm

I tried to set "sync_batchnorm: True" in configs/training/trainer, and the training just got stuck. Why?

Questions about training big-lama and the full-checkpoint

Hi, thanks again for your excellent work.
Is the big-lama model trained on the Places-Challenge dataset? Does it perform significantly better than a big-lama trained on Places2-Standard?
Is it possible to release the full checkpoints of the big-lama model, so we can fine-tune it on other data? Thanks.

Random Mask Generation

Can you please give some guidance on how random masks can be created for custom images? The link to the script given in the description does not seem to be working.
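
In the meantime, irregular masks can be generated locally. Below is a rough sketch (not the repo's own mask generator) that draws random thick strokes with OpenCV; the output naming follows the <image>_mask*.png convention mentioned in other issues, and all parameters are illustrative:

import cv2
import numpy as np

def random_stroke_mask(height, width, max_strokes=5, max_thickness=40, rng=None):
    # Draw a few random thick polylines on an empty canvas; 1 marks the region
    # to be inpainted, 0 is kept as-is.
    rng = rng if rng is not None else np.random.default_rng()
    mask = np.zeros((height, width), dtype=np.uint8)
    for _ in range(int(rng.integers(1, max_strokes + 1))):
        num_points = int(rng.integers(2, 6))
        xs = rng.integers(0, width, num_points)
        ys = rng.integers(0, height, num_points)
        points = np.stack([xs, ys], axis=1).reshape(-1, 1, 2).astype(np.int32)
        thickness = int(rng.integers(10, max_thickness))
        cv2.polylines(mask, [points], isClosed=False, color=1, thickness=thickness)
    return mask

# cv2.imwrite('img1_mask001.png', random_stroke_mask(512, 512) * 255)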

Is there a way to train in the fp16 mode?

It looks like fp16 is not supported for this operation in PyTorch:

ffted = torch.fft.rfftn(x, dim=fft_dim, norm=self.fft_norm)
RuntimeError: Unsupported dtype Half

Is there a way to make the parts of the network that do not support fp16 run in fp32, while the parts that do support it run in fp16?
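
A common pattern (a sketch, not something wired into this repo) is to keep the overall training in AMP/fp16 but locally disable autocast and upcast around the FFT calls inside FourierUnit, for example:

import torch

def rfftn_fp32(x, fft_dim=(-2, -1), fft_norm='ortho'):
    # Run the half-unsupported FFT in fp32 while the rest of the network stays
    # in fp16 under AMP; the spectral conv that follows would consume the fp32
    # result, and the output can be cast back to x.dtype afterwards.
    with torch.cuda.amp.autocast(enabled=False):
        return torch.fft.rfftn(x.float(), dim=fft_dim, norm=fft_norm)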

No inpainted results generated on JPG

I have followed all the instructions to set up lama on my system. I went with the conda installation and the big-lama.zip model.

I'm running on a custom .jpg image and have accordingly modified configs/prediction/default.yaml, changing .png to .jpg.

I created a folder named ts_images containing img.jpg and img_mask.jpg (also tried with img_mask001.jpg), and ran the command:

python3 bin/predict.py model.path=$(pwd)/big-lama indir=$(pwd)/ts_images outdir=$(pwd)/output

I get a "Detectron v2 is not installed" message, and then after some processing I get:

[2022-02-14 16:18:45,353][saicinpainting.training.trainers.base][INFO] - BaseInpaintingTrainingModule init done
[2022-02-14 16:18:49,151][saicinpainting.training.data.datasets][INFO] - Make val dataloader default from /home2/varungupta/lama/ts_images/
0it [00:00, ?it/s]

An outputs folder gets created in lama/ (and NO output folder), containing the file predict.log, the first 5 lines of which are:

[2022-02-14 16:18:44,873][saicinpainting.utils][WARNING] - Setting signal 10 handler <function print_traceback_handler at 0x14fe587bcf28>
[2022-02-14 16:18:44,905][root][INFO] - Make training model default
[2022-02-14 16:18:44,905][saicinpainting.training.trainers.base][INFO] - BaseInpaintingTrainingModule init called
[2022-02-14 16:18:44,905][root][INFO] - Make generator ffc_resnet
[2022-02-14 16:18:45,352][saicinpainting.training.trainers.base][INFO] - Generator

and ending with:

[2022-02-14 16:18:45,353][saicinpainting.training.trainers.base][INFO] - BaseInpaintingTrainingModule init done
[2022-02-14 16:18:49,151][saicinpainting.training.data.datasets][INFO] - Make val dataloader default from /home2/varungupta/lama/ts_images/

I created my mask image using OpenCV, and it looks like this:

[img_mask image]

No inpainting results are being generated. What am I missing?
Kindly help me out,
Thanks :)

@windj007

Edit: I converted my images from .jpg to .png, and now the code works! As mentioned, I updated the .yaml file:

indir: no  # to be overriden in CLI
outdir: no  # to be overriden in CLI

model:
  path: no  # to be overriden in CLI
  checkpoint: best.ckpt

dataset:
  kind: default
  img_suffix: .jpg
  pad_out_to_modulo: 8

device: cuda
out_key: inpainted

So, has anyone else tested the model with .jpg images and got it working?
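
If anyone just wants the quick workaround from the edit above, here is a minimal sketch (the folder name is only an example) that re-saves a folder of .jpg inputs and masks as .png with Pillow:

from pathlib import Path
from PIL import Image

# Re-save every .jpg (images and masks alike) as a .png next to the original
for jpg in Path('ts_images').glob('*.jpg'):
    Image.open(jpg).save(jpg.with_suffix('.png'))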

How to use ddp during training

Hi, I use the Places365-Standard dataset to train the model, which has more than 1.8 million images. When I set batch_size=15, accelerator=ddp, gpus=8, I would expect the number of batches in one epoch to be 1.8M/15/8, since each GPU only gets visibility into a subset of the overall dataset. But I found that the number of batches on each GPU is 1.8M/15 in lama.
I run python3 bin/train.py -cn mylama-fourier to start training.
Is this the right way to use DDP to accelerate training?
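
For reference, and assuming the trainer options follow the same trainer.kwargs.* pattern as the training commands in the other issues here (the exact keys depend on the config and the PyTorch Lightning version), DDP across 8 GPUs would be requested with something like:

python3 bin/train.py -cn mylama-fourier data.batch_size=15 trainer.kwargs.gpus=8 trainer.kwargs.accelerator=ddp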

ValueError: `val_check_interval` (25000) must be less than or equal to the number of the training batches (3006)

Thank you for sharing your great work!!

When I ran train.py with my own dataset, I got the following error.

[2021-12-07 11:28:26,947][__main__][CRITICAL] - Training failed due to `val_check_interval` (25000) must be less than or equal to the number of the training batches (3006). If you want to disable validation set `limit_val_batches` to 0.0 instead.:
Traceback (most recent call last):
  File "/home/naoki/lama/bin/train.py", line 64, in main
    trainer.fit(training_model)
  File "/home/naoki/miniconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 499, in fit
    self.dispatch()
  File "/home/naoki/miniconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 546, in dispatch
    self.accelerator.start_training(self)
  File "/home/naoki/miniconda3/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 73, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/home/naoki/miniconda3/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 114, in start_training
    self._results = trainer.run_train()
  File "/home/naoki/miniconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 620, in run_train
    self.train_loop.reset_train_val_dataloaders(model)
  File "/home/naoki/miniconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 218, in reset_train_val_dataloaders
    self.trainer.reset_train_dataloader(model)
  File "/home/naoki/miniconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/data_loading.py", line 243, in reset_train_dataloader
    raise ValueError(
ValueError: `val_check_interval` (25000) must be less than or equal to the number of the training batches (3006). If you want to disable validation set `limit_val_batches` to 0.0 instead.

My dataset consists of the following numbers of images:

  • Train: 12022
  • val_source: 2068
  • visual_test_source: 188
  • eval_source: 2032

Is this due to having too few training images?
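
Not an authoritative answer, but the message itself points at the mismatch: the default val_check_interval (25000) is larger than the 3006 training batches that one epoch of this dataset produces, so lowering it to at most 3006 should unblock training. Assuming the trainer options follow the trainer.kwargs.* pattern used in the training commands in other issues, an override could look like:

python bin/train.py -cn lama-fourier location=my_dataset trainer.kwargs.val_check_interval=3000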

Evaluate predictions on multiple GPUs

Hi, thanks for the amazing codebase. Is it possible to run the bin/evaluate_predicts.py process on multiple GPUs? As far as I understand from my experiments, it only uses a single GPU, which makes evaluation on Places2 relatively slow.

Batch/Video Processing

I was looking to install this and was wondering if it supports some way to process a batch of images from a sequence, or even a video? Or do I have to manually highlight the object I want removed in each image? Thanks, everyone.

[Proposal] Proposal for a better code hygiene

Hi, I love the project, and the code quality is relatively high.

But there could be a few more minor steps to make it even more readable.

Could you please consider adding

to the project?

It should be relatively straightforward and will not change any code logic but benefit maintainers and other users.

At some point, I wrote a blog post on the topic with examples of all the steps: I trained a model. What is next?

Or you can see how we do it in Albumentations: https://github.com/albumentations-team/albumentations

The repository will also become a role model for other research projects on GitHub :)

P.S. If you consider doing this and encounter problems, I will be happy to answer any questions.

Perceptual loss weight

In your paper, you write that:
Naive supervised losses require the generator to reconstruct the ground truth precisely. However, the visible parts of the image often do not contain enough information for the exact reconstruction of the masked part. Therefore, using naive supervision leads to blurry results due to the averaging of multiple plausible modes of the inpainted content. In contrast, perceptual loss evaluates a distance between features extracted from the predicted and the target images by a base pre-trained network.

But in some of the main configs, you set the perceptual weight to 0.
Is this a config issue, or do you train the models without the perceptual loss?

perceptual:
    weight: 0

In the lama-fourier config

no yaml file

hydra.main(config_path='../configs/training', config_name='tiny_test.yaml')

Where is the config file? configs/training/tiny_test.yaml does not exist.

Invalid signal value

Hello, and thank you for your great work. I encountered this error trying to run predict.py... what does it tell me? I'm not quite sure how to react and what to change. :D

python bin\predict.py model.path=B:\ProgrammierDateiBaum\Bachelor\lama\big-lama indir=B:\ProgrammierDateiBaum\Bachelor\bachelorarbeit\Image_data\lama\In outdir=B:\ProgrammierDateiBaum\Bachelor\bachelorarbeit\Image_data\lama\Out
Detectron v2 is not installed
[2022-02-26 16:09:34,785][saicinpainting.utils][WARNING] - Setting signal 1 handler <function print_traceback_handler at 0x0000028C6828E670>
[2022-02-26 16:09:34,787][__main__][CRITICAL] - Prediction failed due to invalid signal value:
Traceback (most recent call last):
  File "bin\predict.py", line 41, in main
  File "B:\ProgrammierDateiBaum\Bachelor\lama\bin\saicinpainting\utils.py", line 109, in register_debug_signal_handlers
    signal.signal(sig, handler)
  File "B:\Python3\lib\signal.py", line 47, in signal
    handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
ValueError: invalid signal value

Edit:
I made everything work by commenting out register_debug_signal_handlers() and changing the device to cpu in the default.yaml file. It seems like torch 1.8.0 struggles with the CUDA version I have?! I'm not an expert here...

I'm working on a Windows machine, by the way. I had some trouble creating the env at first; it seems that with conda on Windows some packages are not available. I switched to a "normal" Python install and used pip, and then it worked!

I would still like to know what the signal error is, and whether torch 1.8.1 would be sufficient as well, because that version seems to provide better CUDA support.
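
On the signal part: the debug handlers are registered for POSIX-only signals (the log shows "Setting signal 1 handler", i.e. SIGHUP), and signal.signal() rejects those on Windows, hence "invalid signal value". Instead of commenting the call out, a platform guard is enough; a minimal sketch of how the call in bin/predict.py could be wrapped:

import sys

from saicinpainting.utils import register_debug_signal_handlers

# Register the POSIX-only debug signal handlers only off Windows;
# on Windows signal.signal() raises ValueError for signals like SIGHUP.
if sys.platform != 'win32':
    register_debug_signal_handlers()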

[Bug] In `predict.py`, images are not unpadded

For inference, images are padded to be divisible by 8, but after the prediction is done they are not unpadded.

=> when one uses bin/predict.py, the input images and output images have different shapes.
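
A minimal sketch of the missing step (the names are hypothetical, not the repo's own variables): record the original size before padding and crop the prediction back to it before saving:

import numpy as np

def unpad_to_original(pred_hwc: np.ndarray, orig_height: int, orig_width: int) -> np.ndarray:
    # Crop the padded H x W x C prediction back to the pre-padding size
    return pred_hwc[:orig_height, :orig_width]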

Mask "shadows" in some images?

In some predicted images, there is a noticeable imprint of the applied masks. I attached a small example here (part of a bigger image): https://imgur.com/a/19pIbQo
I circled in red the "shadows" that I'm referring to. They are exactly where the random masks were applied.

Worth mentioning: this is a model trained on my own data, using the architecture and configuration proposed here: https://github.com/saic-mdal/lama/blob/main/configs/training/lama-regular.yaml

I am not certain what other information is relevant here but I will provide more if necessary.

Any suggestions on what the issue is here and what could fix it are welcomed. Thanks and thank you for the great project.

Discriminator config

Hello, your paper mentions that the discriminator uses Fourier or dilated convolutions, but I see that the discriminator is set to 'pix2pixhd_nlayer' in the config files. What is the reason?

Question about L1 loss: weight_known vs weight_missing

Hi! First of all, thanks for sharing the code. I have a question about the L1 loss. First, this loss does not appear in the paper, right? Second, about weight_known vs weight_missing: why do you set weight_missing to 0 in most of the configs? As I understand it, this weights the masked part of the image so that the network matches the prediction to the ground truth in the region to be inpainted, i.e. where mask == 1. Why do you set it to 0? Have you studied the effect of this parameter on convergence?

Proposal for the handling of B&W images

Hello!
Thank you for this fantastic model.

I am using it with black-and-white (i.e. grayscale) images, which do not have three channels.
Your code reads images using matplotlib, which loads a B&W image without the expected three channels.

It would be better to read images using OpenCV, which adds the missing channels.

So, replace everything that reads the image like this:

img = plt.imread(fname)[:,:,:3] #this won't work in BnW

by using:

img = cv2.imread(fname) #this adds the extra channel and converts to np array on the fly

I can make the changes in your code and create a pull request if you want.

Cheers, Lucia.
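
A sketch of what the loader could look like (an illustration of the proposal, not the repo's code); note that cv2.imread returns BGR, so converting keeps the result consistent with the existing matplotlib-based RGB loading:

import cv2

def load_rgb(fname):
    # IMREAD_COLOR makes OpenCV return 3 channels even for grayscale files;
    # convert BGR -> RGB to match what the plt.imread-based code expects.
    img = cv2.imread(fname, cv2.IMREAD_COLOR)
    return cv2.cvtColor(img, cv2.COLOR_BGR2RGB)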

the input_size and out_size of big-lama

Hi,
First of all, thank you for making such a great project open source.
I found that out_size in the released big-lama config.yaml is 256. Was the big-lama model trained with images of size 256?

Doubt about optimizer_idx

Looking at the code, I found the _do_step function; as far as I can tell, via PyTorch Lightning it trains the generator in one iteration and the discriminator in the other. Is this always the recipe in image inpainting? Have you studied other strategies for training the generator and the discriminator, maybe both in the same iteration?

GPU load is severely unbalanced

I have 8 2080 Ti GPUs (11 GB each). When I train, I can only use 4 cards, and the batch size can only be set to 5. GPU 0 occupies 10481 MB, while the other three cards occupy 6906 MB each. I don't know how to solve this; I am not very familiar with PyTorch Lightning. When DDP is used for parallel training in plain PyTorch, the load is balanced.

Question about big-lama.yaml

Hi, thanks for your excellent work.
I have a question about what "weights_path: ${env:TORCH_HOME}" means in big-lama.yaml. When I retrain the model, I don't know what I should write here.
Thank you for your time!
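
A partial answer from the config-mechanics side: ${env:TORCH_HOME} is standard OmegaConf syntax for "resolve to the TORCH_HOME environment variable at runtime", so the value ends up being whatever directory that variable points to. Exporting TORCH_HOME before training (or replacing the interpolation with an explicit path) is enough to make the config resolve; which pretrained weights that directory is expected to contain is best checked in the repo itself.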

Colab error

FileNotFoundError: [Errno 2] No such file or directory: '/content/output/1224276_original_mask.png'


It happens with both custom images and the example ones.
