advimman / lama
🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022
Home Page: https://advimman.github.io/lama-project/
License: Apache License 2.0
At present, bestresult is in CKPT format, and I hope to convert it to ONNX format. I tried this code:
import torch
from saicinpainting.training.trainers import load_checkpoint  # repo helper

model = load_checkpoint(train_config, checkpoint_path, strict=False, map_location='cpu')
model.freeze()
model.to(device)
torch.save(model, 'bestresult.pth')

img = torch.rand(1, 3, 320, 320, requires_grad=False)
img = img.to(torch.device('cpu'))
torch.onnx.export(model, img, 'bestresult.onnx', opset_version=11)
print('================ best onnx result is saved! ================')
but it reported an error. How can I solve it? Thank you.
[2021-11-26 06:25:35,036][__main__][CRITICAL] - Prediction failed due to too many indices for tensor of dimension 4:
Traceback (most recent call last):
File "bin/predict.py", line 68, in main
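A minimal sketch of one possible fix, assuming (the truncated traceback doesn't confirm it) that the failure comes from the training module expecting a batch dict rather than a bare tensor. The wrapper class and the 'inpainted' output key are assumptions based on the public repo, and the Fourier layers may still be unexportable at this opset (see the fft_rfftn issue further down):

import torch
import torch.nn as nn

class LamaOnnxWrapper(nn.Module):
    # Hypothetical adapter: builds the batch dict the LaMa training module
    # consumes and returns only the inpainted tensor, so torch.onnx.export
    # sees plain tensor inputs and outputs.
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, image, mask):
        batch = self.model({'image': image, 'mask': mask})
        return batch['inpainted']  # output key assumed from the predict script

wrapper = LamaOnnxWrapper(model).eval()
dummy_image = torch.rand(1, 3, 320, 320)
dummy_mask = torch.rand(1, 1, 320, 320)
torch.onnx.export(wrapper, (dummy_image, dummy_mask), 'bestresult.onnx', opset_version=11)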
I have followed all the instructions to set up LaMa on my system. I went with the conda installation and the big-lama.zip model. I'm running on a custom .jpg image and have modified configs/prediction/default.yaml accordingly, changing .png to .jpg. I created a folder named ts_images containing img.jpg and img_mask.jpg (I also tried img_mask001.jpg), and ran the command:
python3 bin/predict.py model.path=$(pwd)/big-lama indir=$(pwd)/ts_images outdir=$(pwd)/output
I get a "Detectron v2 not installed" message, then after a series of processing messages I get:
[2022-02-14 16:18:45,353][saicinpainting.training.trainers.base][INFO] - BaseInpaintingTrainingModule init done
[2022-02-14 16:18:49,151][saicinpainting.training.data.datasets][INFO] - Make val dataloader default from /home2/varungupta/lama/ts_images/
0it [00:00, ?it/s]
An outputs folder gets created in lama/ (and NO output folder), containing the file predict.log, the top 5 lines of which are:
[2022-02-14 16:18:44,873][saicinpainting.utils][WARNING] - Setting signal 10 handler <function print_traceback_handler at 0x14fe587bcf28>
[2022-02-14 16:18:44,905][root][INFO] - Make training model default
[2022-02-14 16:18:44,905][saicinpainting.training.trainers.base][INFO] - BaseInpaintingTrainingModule init called
[2022-02-14 16:18:44,905][root][INFO] - Make generator ffc_resnet
[2022-02-14 16:18:45,352][saicinpainting.training.trainers.base][INFO] - Generator
and ending with:
[2022-02-14 16:18:45,353][saicinpainting.training.trainers.base][INFO] - BaseInpaintingTrainingModule init done
[2022-02-14 16:18:49,151][saicinpainting.training.data.datasets][INFO] - Make val dataloader default from /home2/varungupta/lama/ts_images/
I created my masked image using OpenCV, and it looks like the following:
No inpainting results are being generated. What am I missing?
Kindly help me out.
Thanks :)
Edit: I converted my images from .jpg to .png, and now the code works! As mentioned, I had updated the .yaml file:
indir: no  # to be overridden in CLI
outdir: no  # to be overridden in CLI
model:
  path: no  # to be overridden in CLI
  checkpoint: best.ckpt
dataset:
  kind: default
  img_suffix: .jpg
  pad_out_to_modulo: 8
device: cuda
out_key: inpainted
So, has anyone else tested the model with .jpg images and got it working?
Hi, thanks again for your excellent work.
Is the big-lama model trained on the Places-Challenge dataset? Does it perform significantly better than a big-lama trained on Places2-Standard?
Is it possible to release the full checkpoints of the big-lama model, so we can finetune it on other data? Thanks.
I see that in your FourierUnit you added an optional spectral_pos_encoding argument. Have you experimented with this at all? Did it improve or reduce performance?
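For context, a paraphrased sketch of what such an option typically does inside the unit: concatenate normalized coordinate grids to the spectral features before the pointwise convolution, so the conv can condition on frequency position. Shapes and names here are assumptions, not the repository's exact code.

import torch

def add_spectral_pos_encoding(ffted: torch.Tensor) -> torch.Tensor:
    # ffted: (B, C, H, W) stacked real/imaginary features after the rFFT.
    b, _, h, w = ffted.shape
    coords_v = torch.linspace(0, 1, h, device=ffted.device)[None, None, :, None].expand(b, 1, h, w)
    coords_h = torch.linspace(0, 1, w, device=ffted.device)[None, None, None, :].expand(b, 1, h, w)
    # Two extra channels in [0, 1] marking vertical/horizontal frequency position.
    return torch.cat((coords_v, coords_h, ffted), dim=1)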
In some predicted images there is a noticeable mark where the masks were applied. I attached a small example here (it is part of a bigger image): https://imgur.com/a/19pIbQo
I circled in red the "shadows" I am referring to; those are exactly where the random masks were applied.
Worth mentioning: this is a model trained on my own data, using the architecture and configuration proposed here: https://github.com/saic-mdal/lama/blob/main/configs/training/lama-regular.yaml
I am not certain what other information is relevant here, but I will provide more if necessary.
Any suggestions on what the issue is and what could fix it are welcome. Thanks, and thank you for the great project.
Thank you for sharing your great work!!
When I ran train.py with my own dataset, I got the following error.
[2021-12-07 11:28:26,947][__main__][CRITICAL] - Training failed due to `val_check_interval` (25000) must be less than or equal to the number of the training batches (3006). If you want to disable validation set `limit_val_batches` to 0.0 instead.:
Traceback (most recent call last):
File "/home/naoki/lama/bin/train.py", line 64, in main
trainer.fit(training_model)
File "/home/naoki/miniconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 499, in fit
self.dispatch()
File "/home/naoki/miniconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 546, in dispatch
self.accelerator.start_training(self)
File "/home/naoki/miniconda3/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 73, in start_training
self.training_type_plugin.start_training(trainer)
File "/home/naoki/miniconda3/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 114, in start_training
self._results = trainer.run_train()
File "/home/naoki/miniconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 620, in run_train
self.train_loop.reset_train_val_dataloaders(model)
File "/home/naoki/miniconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 218, in reset_train_val_dataloaders
self.trainer.reset_train_dataloader(model)
File "/home/naoki/miniconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/data_loading.py", line 243, in reset_train_dataloader
raise ValueError(
ValueError: `val_check_interval` (25000) must be less than or equal to the number of the training batches (3006). If you want to disable validation set `limit_val_batches` to 0.0 instead.
My dataset consists of the following number of images.
Is this due to having too few training images?
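The error message itself points at the fix: either lower the validation interval below the number of training batches per epoch, or disable validation. A hedged example via a Hydra override (the exact key is an assumption based on the trainer.kwargs.* overrides used elsewhere in this repo; check your trainer config):

python bin/train.py -cn lama-fourier location=my_dataset trainer.kwargs.val_check_interval=3000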
Thanks for sharing! When I run python bin/train.py -cn lama-fourier location=my_dataset data.batch_size=10, the following error occurred:
omegaconf.errors.InterpolationResolutionError: ValidationError raised while resolving interpolation: Environment variable 'USER' not found
full_key: hydra.run.dir
object_type=dict
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "bin/train.py", line 74, in <module>
main()
File "/root/lama-main/saicinpainting/utils.py", line 163, in new_main
main_func(*args, **kwargs)
File "/root/.local/lib/python3.7/site-packages/hydra/main.py", line 53, in decorated_main
config_name=config_name,
File "/root/.local/lib/python3.7/site-packages/hydra/_internal/utils.py", line 368, in _run_hydra
lambda: hydra.run(
File "/root/.local/lib/python3.7/site-packages/hydra/_internal/utils.py", line 270, in run_and_report
cur.tb_lasti = iter_tb.tb_lasti
AttributeError: 'NoneType' object has no attribute 'tb_lasti'
Could you tell me how to solve it? Thanks!
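For what it's worth, Hydra's ${env:USER} interpolation just reads the USER environment variable, which is often unset inside Docker containers. A likely workaround (an assumption, since the environment isn't shown) is to define it before training:

export USER=$(whoami)
python bin/train.py -cn lama-fourier location=my_dataset data.batch_size=10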
Please specify the Python version.
Hello, and thank you for your great work. I encountered this error trying to start predict.py... what does it tell me? I'm not quite sure how to react or what to change :D
python bin\predict.py model.path=B:\ProgrammierDateiBaum\Bachelor\lama\big-lama indir=B:\ProgrammierDateiBaum\Bachelor\bachelorarbeit\Image_data\lama\In outdir=B:\ProgrammierDateiBaum\Bachelor\bachelorarbeit\Image_data\lama\Out
Detectron v2 is not installed
[2022-02-26 16:09:34,785][saicinpainting.utils][WARNING] - Setting signal 1 handler <function print_traceback_handler at 0x0000028C6828E670>
[2022-02-26 16:09:34,787][__main__][CRITICAL] - Prediction failed due to invalid signal value:
Traceback (most recent call last):
File "bin\predict.py", line 41, in main
File "B:\ProgrammierDateiBaum\Bachelor\lama\bin\saicinpainting\utils.py", line 109, in register_debug_signal_handlers
signal.signal(sig, handler)
File "B:\Python3\lib\signal.py", line 47, in signal
handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
ValueError: invalid signal value
Edit:
I made everything work by commenting out register_debug_signal_handlers() and changing device to cpu in the default.yaml file. It seems like torch 1.8.0 struggles with the CUDA version I have?! I'm not an expert here...
I'm working on a Windows machine, by the way. I had some trouble creating the env at first; it seems some packages are not available with conda on Windows. I switched to a "normal" Python install and used pip, and then it worked!
I would still like to know what the signal error means, and whether torch 1.8.1 would be sufficient as well, since that version seems to provide better CUDA support.
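On the signal error: POSIX signals such as SIGHUP and SIGUSR1 do not exist on Windows, which is why signal.signal raises "invalid signal value" there. Rather than commenting the registration out, a platform guard is a gentler fix; a minimal sketch (names assumed, not the repo's exact code):

import signal
import sys

def safe_signal(sig_name: str, handler) -> None:
    # Register a handler only if this platform actually defines the signal;
    # SIGHUP/SIGUSR1 and friends are unavailable on Windows.
    sig = getattr(signal, sig_name, None)
    if sig is None or sys.platform == 'win32':
        return
    signal.signal(sig, handler)

# usage sketch: safe_signal('SIGUSR1', print_traceback_handler)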
Looking at the code, I found the _do_step function. As far as I can tell from the PyTorch Lightning setup, it trains the generator on one iteration and the discriminator on the next. Is this always the recipe in image inpainting? Have you studied other strategies for training the generator and the discriminator, e.g. updating both in the same iteration?
I am not sure whether it is a GitHub problem or something on the repo owner's side... where is my issue? I posted it recently...
Executing the command bash docker/2_predict.sh /usr/local/docker/big-lama /usr/local/docker/LaMa_test_images /usr/local/docker/output device=cpu
gives the error FileNotFoundError: [Errno 2] No such file or directory: '/data/checkpoint/config.yaml'
I don't know where this file comes from. Could you please explain? Thank you!
Hello, your paper mentions that the discriminator uses Fourier or dilated convolutions, but I see that the discriminator is set to 'pix2pixhd_nlayer' in the config files. What is the reason?
For inference, images are padded to be divisible by 8, but after the prediction is done they are not unpadded.
=> When one uses bin/predict.py, input images and output images have different shapes.
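A minimal sketch of the missing step, assuming (as in the repo's padding helper) that padding is added on the bottom and right edges only:

import numpy as np

def unpad_to_original(result: np.ndarray, orig_height: int, orig_width: int) -> np.ndarray:
    # Crop the padded H x W (x C) prediction back to the input resolution.
    return result[:orig_height, :orig_width, ...]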
I got this failed attempt at converting the model to ONNX.
Code example:
save_onnx_path = "/content/lama.onnx"
img = torch.rand(1, 3, 120, 120)
mask = torch.rand(1, 1, 120, 120)
inputs = {
    "image": img,
    "mask": mask
}
torch.onnx.export(model,
                  inputs,
                  save_onnx_path,
                  opset_version=12,
                  do_constant_folding=True,
                  input_names=['img', 'mask'],
                  output_names=['output'],
                  dynamic_axes={
                      'img': {0: 'batch_size', 2: 'height', 3: 'width'},
                      'mask': {0: 'batch_size', 2: 'height', 3: 'width'},
                      'output': {0: 'batch_size', 2: 'height', 3: 'width'},
                  })
Stack trace:
RuntimeError: Exporting the operator fft_rfftn to ONNX opset version 12 is not supported. Please feel free to request support or submit a pull request on PyTorch GitHub.
As far as I understood, the problem is that ONNX does not support the torch.fft.rfftn operation used in the FourierUnit module.
Hi, thank you so much for making LaMa happen. I'd like to share an app/implementation that uses DE:TR, an object detection model, and LaMa together.
Thanks for your brilliant work! When I run predict.py, the following error occurred:
Detectron v2 is not installed
mismatched input '(' expecting
See https://hydra.cc/docs/next/advanced/override_grammar/basic for details
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
How could I solve it? Thanks!
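Without the full command it is hard to be sure, but "mismatched input '('" from Hydra usually means a literal parenthesis reached the override parser, e.g. an unexpanded $(pwd) (as happens in shells that don't support it) or an unquoted value. Quoting the overrides often helps; paths below are placeholders:

python3 bin/predict.py model.path="$(pwd)/big-lama" indir="$(pwd)/LaMa_test_images" outdir="$(pwd)/output"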
hydra.main(config_path='../configs/training', config_name='tiny_test.yaml')
Where is the config file? configs/training/tiny_test.yaml does not exist.
Thanks for your exciting work, first of all.
When I use -cn lama-fourier to train on my own dataset, I find there are some white areas in some train and test images (not all of them; from my observation it is unrelated to mask size), like the two images below (selected from epoch 33/40):
and
Do you know how to avoid this? Thanks in advance.
PS:
My dataset is a food image set with 150,000 images, and I use this command to train my model:
CUDA_VISIBLE_DEVICES=0,1,2,3 python bin/train.py -cn lama-fourier location=food data.batch_size=10 data.num_workers=8 trainer.kwargs.gpus=[0,1,2,3] trainer.kwargs.limit_train_batches=12360 optimizers.generator.lr=0.001 optimizers.discriminator.lr=0.0001
This amazing model is now integrated into a telegram bot.
Here is the source code https://github.com/Moldoteck/MagicEraser
And here is the bot: https://t.me/MagicEraser
Here is the bot username: @MagicEraserBot
I have 8 2080 Ti GPUs (11 GB each). When I train, I can only use 4 cards, and the batch size can only be set to 5. GPU 0 occupies 10481 MB while the other three occupy 6906 MB each. I don't know how to solve this, and I am not very familiar with PyTorch Lightning. When DDP is used for parallel training in plain PyTorch, the load is balanced.
I am trying to integrate this into an iOS project, but I couldn't find any way to do it. Can anyone help me with this?
Hello!
Thank you for this fantastic model.
I am using it with black-and-white (grayscale) images, which matplotlib reads as 2-D arrays with no channel axis.
Your code reads images using matplotlib, so the channel slice fails on a grayscale image.
It would be better to read images using OpenCV, which returns a 3-channel array even for grayscale files.
So, replacing everything that reads the image like this:
img = plt.imread(fname)[:,:,:3]  # fails on grayscale: the array has no channel axis
with:
img = cv2.imread(fname)  # returns a 3-channel (BGR) numpy array even for grayscale files
I can make the changes in your code and create a pull request if you want.
Cheers, Lucia.
Hi, thanks for your excellent work.
I have a question: what does "weights_path: ${env:TORCH_HOME}" mean in big-lama.yaml? When I retrain, I don't know what I should write here.
Thank you for your time!
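For reference, ${env:TORCH_HOME} is an OmegaConf interpolation that reads the TORCH_HOME environment variable when the config is loaded, so it has to be set in the shell before training; the repo README sets it to the working directory, roughly:

export TORCH_HOME=$(pwd)  # wherever the pretrained perceptual-loss weights should be cached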
Can you please guide me to the part of the code where the Fourier convolution is defined? I am not able to find it...
Hello guys,
Thanks for the code and the project! I am trying to understand your model, and I found that you are using this layer after the FFC ResBlock. Can you explain what you are trying to do with this layer? I cannot find any details in the paper.
Hi, I use the Places365-Standard dataset to train the model, which has more than 1.8 million images. When I set batch_size=15, accelerator=ddp, gpus=8, I would expect the number of batches in one epoch to be 1.8M / 15 / 8 = 15,000, since each GPU only gets visibility into a subset of the overall dataset. But I found that in lama the number of batches on each GPU is 1.8M / 15 = 120,000.
I run python3 bin/train.py -cn mylama-fourier to start training.
Is this the right way to use DDP to accelerate training?
Hi! First of all, thanks for sharing the code. I have a doubt about the L1 loss. First, this loss does not appear in the paper, right? Second, about weight_known vs. weight_missing: why do you set weight_missing to 0 in most of the configs? As I understand it, this term weights the masked part of the image so the network matches the ground truth in the zone to be inpainted, i.e. where mask == 1. Why set it to 0? Have you studied the effect of this parameter on convergence?
I was looking to install this, and I was wondering if it supports some way to process a batch of images from a sequence, or even a video. Or do I have to manually highlight the object I want removed in each image? Thanks, everyone.
Hello and so happy to see you use Pytorch-Lightning! 🎉
Just wondering if you have already heard about the quite new PyTorch Lightning (PL) ecosystem CI, to which we would like to invite you... You can check out our blog post about it: Stay Ahead of Breaking Changes with the New Lightning Ecosystem CI ⚡
As you use the PL framework for your cool project, we would like to enhance your experience and offer you safe updates in our future releases. At the moment you run tests against a particular PL version, but it may accidentally happen that the next version is incompatible with your project... 😕 We do not intend to change anything on our side, but we have a solution: an ecosystem CI that tests both your latest code and our latest development head, so we can catch problems very early and prevent an eventually bad release... 👍
What needs to be done?
What will you get?
I can't load the trained checkpoint. How can I modify the code to load it?
https://www.hama.app/en - A fast web application that lets you erase objects from your photo with a single brush by https://github.com/coxwave
Are the training parameters you provide the same as those used by the website (https://cleanup.pictures/)? Why do the same pictures and masks give different results?
Can you please provide guidance on how random masks can be created for custom images? The link to the script given in the description does not seem to be working.
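If the script in question is bin/gen_mask_dataset.py (an assumption based on the repo layout), the usual invocation looks roughly like this, with a mask-generator config taken from configs/data_gen and placeholder paths:

python3 bin/gen_mask_dataset.py \
    $(pwd)/configs/data_gen/random_medium_512.yaml \
    /path/to/source_images /path/to/output_dir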
It seems that the website hosting the model is down. May I ask you to upload a backup of the model? Thank you very much.
Could the authors release the full checkpoints including the discriminator?
I would like to finetune the model.
Thanks so much.
Hi,
Thank you for sharing LaMa! The inpainting quality is really impressive!
I was wondering:
Comparing "LaMa-Fourier" with "Big LaMa-Fourier": How much did the larger training data (4.5M images from the Places-Challenge dataset) contribute to the improved quality of Big LaMa-Fourier? Do you think that similar results could have also been achieved for Big LaMa-Fourier with less data?
You have proposed a sophisticated approach for data augmentation. How much did the training and the inference benefit from data augmentation using segmentation masks from Detectron2?
Best wishes,
Alex
Hi, thanks for the great work!
I am trying to reproduce the training results.
I used the default batch size and ran the lama-fourier model on 4 V100 GPUs for 40 epochs. The training takes about 12 hours, and the results on the training dataset look very nice, but it goes wrong on the test and other validation images: there are some texture artifacts like this.
I wonder what the reason could be: batch size, number of training epochs, or something else?
If I set the batch size to 10, the training time on lama-fourier becomes too long.
How long should training usually take on 4 V100 GPUs (1 day or 1 week), and what batch size should be set so the model generalizes well to other images?
Thanks so much!
For the function masked_l1_loss (https://github.com/saic-mdal/lama/blob/main/saicinpainting/training/losses/feature_matching.py#L13), I found weight_known=10 and weight_missing=0 in all config yaml files, so the masked_l1_loss won't calculate the difference on the masked region. This seems counter-intuitive. Can you explain the reason?
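For readers of this thread, a paraphrase of the linked function (a sketch from the public code, not an authoritative copy), showing why weight_missing=0 removes the L1 term inside the hole:

import torch.nn.functional as F

def masked_l1_loss(pred, target, mask, weight_known, weight_missing):
    # mask == 1 inside the hole; with weight_missing=0 the per-pixel L1
    # only counts on the known (unmasked) region.
    per_pixel_l1 = F.l1_loss(pred, target, reduction='none')
    pixel_weights = mask * weight_missing + (1 - mask) * weight_known
    return (pixel_weights * per_pixel_l1).mean()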
Hi! The website http://sceneparsing.csail.mit.edu/model/pytorch/ade20k-resnet50dilated-ppm_deepsup/encoder_epoch_20.pth seems to be unavailable right now. Do you happen to know whether it's possible to download it from anywhere else?
I tried to set sync_batchnorm: True in configs/training/trainer, and the training just got stuck. Why?
Hi, thanks for the amazing codebase. Is it possible to run the bin/evaluate_predicts.py process on multiple GPUs? As per my understanding and experiments, it only uses a single GPU, which makes evaluation on Places2 relatively slow.
Hi,
Firstly, thank you for making such a great project open source.
I found that out_size in the released big-lama config.yaml is 256. Was the big-lama model trained with images of size 256?
Congrats on your great work, and thanks for sharing your code. I'm using this model for image-to-image translation, but after training for some time I'm losing details from the input image in the predicted results. Do you have any suggestions on how to modify the loss functions for better detail preservation? Thanks.
It looks like fp16 is not supported by torch.fft in PyTorch:
ffted = torch.fft.rfftn(x, dim=fft_dim, norm=self.fft_norm)
RuntimeError: Unsupported dtype Half
Is there a way to make the parts of the network that do not support fp16 run in fp32, and the parts that do support it run in fp16?
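One common workaround (a sketch under the assumption that training runs under torch.cuda.amp.autocast) is to disable autocast locally and compute the FFT in fp32:

import torch

with torch.cuda.amp.autocast(enabled=False):
    # torch.fft does not support half precision, so cast to fp32 here;
    # the result can be cast back after leaving the spectral branch if the
    # rest of the network runs in fp16.
    ffted = torch.fft.rfftn(x.float(), dim=fft_dim, norm=self.fft_norm)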
Hi, I love the project, and the code quality is relatively high.
But there could be a few more minor steps to make it even more readable.
Could you please consider adding them to the project?
It should be relatively straightforward and will not change any code logic but benefit maintainers and other users.
At some point, I wrote a blog post on the topic with examples of all the steps: I trained a model. What is next?
Or you can see how we do it in Albumentations: https://github.com/albumentations-team/albumentations
The repository will also become a role model for all other Research Projects at GitHub :)
P.S. If you consider doing this and encounter problems, I will be happy to answer any questions.
In your paper, you write:
"Naive supervised losses require the generator to reconstruct the ground truth precisely. However, the visible parts of the image often do not contain enough information for the exact reconstruction of the masked part. Therefore, using naive supervision leads to blurry results due to the averaging of multiple plausible modes of the inpainted content. In contrast, perceptual loss evaluates a distance between features extracted from the predicted and the target images by a base pre-trained network."
But in some main configs you set the perceptual weight to 0, e.g. in the lama-fourier config:
perceptual:
  weight: 0
Is it a config problem, or do you train the models without the perceptual loss?
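Worth noting (an observation from the public configs, so please verify against your checkout): the same config also enables a separate high-receptive-field perceptual loss, resnet_pl, which appears to stand in for the VGG-based perceptual term discussed in the paper:

resnet_pl:
  weight: 30
  weights_path: ${env:TORCH_HOME}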
Hey there, thanks for making the code & model public.
I've made a small streamlit tool out of it, together with seam-carving for image retargeting.
It's currently quite rudimentary but I'll add new features soon.