
hitachinsk / fgt


[ECCV 2022] Flow-Guided Transformer for Video Inpainting

Home Page: https://hitachinsk.github.io/publication/2022-10-01-Flow-Guided-Transformer-for-Video-Inpainting

License: MIT License

Python 36.40% Shell 0.01% Jupyter Notebook 63.59%
Topics: eccv2022 video-inpainting

fgt's Introduction

Hi there 👋


  • 🔭 I'm currently working on low-level computer vision and 3D-related modeling / rendering.
  • 🌱 I'm currently learning medical image processing.
  • 👯 I'm looking to collaborate on ground-breaking computer vision projects.
  • 💬 If you have any questions about my research projects or about me, feel free to contact me.
  • 📫 How to reach me: [email protected]



fgt's Issues

Add Python to env creation so the correct pip will be used

The installation instructions currently read:

conda create -n FGT
conda activate FGT
pip install -r requirements.txt
pip install imageio-ffmpeg

Please add a step that installs Python (and therefore pip) into the conda environment, so that the correct pip is used and the packages are installed into the FGT environment. This can trip up less experienced developers if the root environment's pip is accidentally used.
-->

conda create -n FGT python=3.x
conda activate FGT
pip install -r requirements.txt
pip install imageio-ffmpeg
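
A quick way to confirm that the environment's own interpreter and pip are the ones being used (a minimal check in Python, not part of the repo):

import shutil
import sys

print(sys.executable)        # the interpreter actually running; should point inside .../envs/FGT
print(shutil.which("pip"))   # the pip that a bare `pip install ...` would invoke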

Outpainting Training on Custom Data

Good day,

I am looking to train the outpainting model on a custom dataset, but I am not sure what would be most effective. I was thinking about fine-tuning FGT by continuing training from the pretrained model, though I am not certain. What would you do?

CUDA OOM (1k+ frame video) on A100

Hi @hitachinsk. Thanks for the great work.

Are there any recommendations you or anyone else can give on how to solve a CUDA OOM issue when using FGT? I tried experimenting with the allocation size and cleared my cache, but nothing works. I see the batch size is already low.
I'm on an A100, which should be plenty.

The OOM is raised at this line of code:

filled_frames = FGT_model(masked_frames, selected_flows, selected_masks)
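
For anyone hitting this: a generic workaround is to run inference under torch.no_grad() and process long videos in overlapping temporal chunks. A minimal sketch, assuming masked_frames / selected_flows / selected_masks are indexed along time as in the line above; the chunk sizes and stitching logic here are illustrative, not the repo's code:

import torch

def inpaint_in_chunks(fgt_model, masked_frames, selected_flows, selected_masks,
                      chunk=80, overlap=10):
    # chunk/overlap are illustrative; smaller chunks trade speed for memory
    filled = []
    with torch.no_grad():                    # inference needs no autograd buffers
        for start in range(0, len(masked_frames), chunk - overlap):
            end = min(start + chunk, len(masked_frames))
            out = fgt_model(masked_frames[start:end],
                            selected_flows[start:end],
                            selected_masks[start:end])
            # keep only the part not already produced by the previous chunk
            filled.extend(out[(0 if start == 0 else overlap):])
    return filled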

test result resolution

Is the resolution of the final output video the same as, larger than, or smaller than that of the input? And how can I adjust the parameters to fit my limited GPU memory (6 GB)?

Questions about the correspondence between the published pre-trained models and the algorithms in the paper.

Sorry to bother you.
I have some questions about the correspondence between the published pre-trained models and the algorithms in the paper. I noticed that the published pre-trained model list contains two flow completion weights, 'lafc' and 'lafc_single'. Meanwhile, Tab. 4 of the paper 'Exploiting Optical Flow Guidance for Transformer-Based Video Inpainting' compares three variants: 'S', 'LA' and 'LA+Le'. Could you clarify which of these three variants the two published weights correspond to?
Thanks very much. Looking forward to your reply!

About image resolution and optical flow resolution

I noticed that in './tool/video_inpainting.py' the resolution used for optical flow estimation is doubled for low-resolution images, which causes the resolutions not to align when rf.regionfill() is called.

if imgH < 350:
    flowH, flowW = imgH * 2, imgW * 2
else:
    flowH, flowW = imgH, imgW

After this code is deleted, the sample is generated correctly. Can the author explain the reason for this?
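
For reference, a common way to keep the shapes aligned is to resize the flow back to the frame resolution and rescale its displacement values before filling; a hedged sketch (resize_flow is a hypothetical helper, not the repo's code):

import cv2
import numpy as np

def resize_flow(flow: np.ndarray, h: int, w: int) -> np.ndarray:
    # flow has shape (fh, fw, 2); resize it to (h, w, 2) and rescale the
    # displacement values so they remain valid at the new resolution
    fh, fw = flow.shape[:2]
    out = cv2.resize(flow, (w, h), interpolation=cv2.INTER_LINEAR)
    out[..., 0] *= w / fw   # horizontal displacements
    out[..., 1] *= h / fh   # vertical displacements
    return out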

supplementary material

May I ask where the “supplementary material” mentioned multiple times in the paper can be found?

Add a way to extract an `ONNX` file

First of all, thanks a lot for this amazing project and your hard work!

In https://github.com/burn-rs/burn we are trying to load different ONNX models, convert them into Burn models, and then run them on different backends. It would be fantastic if we could try your model too. Could you add a way to export the model to ONNX? From PyTorch this should be quite feasible.

Thanks a lot in advance!
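
Not an official answer, but for anyone who wants to experiment, PyTorch's built-in torch.onnx.export may serve as a starting point. A hedged sketch only: the input shapes and names below are placeholders, and a transformer with data-dependent control flow may need tracing or opset adjustments to export cleanly.

import torch

def export_to_onnx(fgt_model, onnx_path="fgt.onnx"):
    # placeholder (b, t, c, h, w) shapes for frames, flows and masks;
    # adjust them to whatever the inference script actually feeds the model
    fgt_model.eval()
    frames = torch.randn(1, 5, 3, 240, 432)
    flows = torch.randn(1, 5, 2, 240, 432)
    masks = torch.randn(1, 5, 1, 240, 432)
    torch.onnx.export(
        fgt_model,
        (frames, flows, masks),
        onnx_path,
        input_names=["masked_frames", "flows", "masks"],
        output_names=["filled_frames"],
        opset_version=16,
    )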

Run on image

Hi @hitachinsk, is it possible to run your awesome project on a single image instead of a video? If yes, could you point out which part I need to modify? Thank you.

T dimension problems in encoders

inputs = inputs.view(b * t, self.in_channels, h, w)
enc_feats = self.frame_endoder(inputs)
Why is the t dimension folded into the batch dimension rather than the channel dimension? Can you explain it? Thank you.
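
A minimal, self-contained illustration of the pattern (not the repo's code): folding t into the batch dimension lets a shared 2D encoder process every frame independently, whereas folding it into the channel dimension would mix all frames of a clip inside each convolution.

import torch

b, t, c, h, w = 2, 5, 3, 64, 64
frames = torch.randn(b, t, c, h, w)

flat = frames.view(b * t, c, h, w)                # fold time into the batch axis
encoder = torch.nn.Conv2d(c, 16, 3, padding=1)    # stand-in for the frame encoder
feats = encoder(flat)                             # each frame encoded independently, shared weights
feats = feats.view(b, t, 16, h, w)                # restore the temporal axis afterwards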

FGT++

I see FGT++ mentioned in the Readme. Is there any related code in the repo?

How can I test a custom video when I only have the raw video?

Your study is brilliant!

I would like to try a custom video. Do we need a mask for every frame of the video? If yes, what do you suggest for generating the masks?

Will you release a demo for this scenario (where only the raw video is available)?

Thank you.

About square mask settings

Hope everything is well with you!

Recently I've been trying to reproduce FGT's performance under the square-mask setting but failed. Under our own setting (using the "object_removal.yaml" config), FGT's performance on YouTube-VOS is about 1 dB lower than STTN's, while it outperforms STTN on the DAVIS dataset. Could you share the config for the square-mask setting if it exists?

Thank you for open-sourcing your great work :)

Best,

Jinsu

output video size

Dear Author,
It is a great project. I have a question: can I set the output size of the video, i.e. its height and width?

RuntimeError: CUDA out of memory

I have 4 Tesla T4 GPUs (16 GB each). When I run video_inpainting.py for object removal on my video (1440x720), it reports a CUDA out-of-memory error, and only one GPU is used.

RuntimeError: CUDA out of memory. Tried to allocate 6.95 GiB (GPU 0; 14.76 GiB total capacity; 9.42 GiB already allocated; 2.78 GiB free; 10.66 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

So how can I use all 4 of my GPUs? (I would prefer not to change the image size.)

Originally posted by @wangdi9 in #23 (comment)

FGT model config problem

When I tried to train your FGT model, I found that I am missing the right .yaml config file in the flowCheckPoint dir, which caused this to happen:

RuntimeError: Error(s) in loading state_dict for Model: Missing key(s) in state_dict:

..
Could you please release the correct, corresponding config file?

The flow propagation

Hi, thanks for the great work! The flow propagation mentioned in the paper seems to be missing from the released code. Does that matter?

add models to Hugging Face Hub

Hi!

Would you be interested in sharing your models in the Hugging Face Hub? The Hub offers free hosting and it would make your work more accessible and visible to the rest of the ML community.

Some of the benefits of sharing your models through the Hub would be:

  • versioning, commit history and diffs
  • repos provide useful metadata about their tasks, languages, metrics, etc., which makes them discoverable
  • multiple features such as TensorBoard visualizations, PapersWithCode integration, and more
  • wider reach of your work to the ecosystem

Creating the repos and adding new models should be a relatively straightforward process if you've used Git before. This is a step-by-step guide explaining the process in case you're interested. Please let us know if you would be interested and if you have any questions.

You can also set up a Gradio demo for your model by following this guide: https://gradio.app/getting_started/

Here is an example of a Gradio demo: https://huggingface.co/spaces/adirik/OWL-ViT
and the code: https://huggingface.co/spaces/adirik/OWL-ViT/blob/main/app.py

Happy to hear your thoughts,
Ahsen and the Hugging Face team

Organization of the YouTube-VOS dataset in the ‘myData’ folder for LAFC Net training?

Hi, I have organized the YouTube-VOS dataset in the 'myData' folder, as shown below, for LAFC Network training
$ cd LAFC
$ python train.py

FGT/myData
|- youtubevos_frames
|  |- bmx-bumps
|  |  |- <00000>.jpg
|  |  |- <00001>.jpg
|- youtubevos_flows
|  |- backward_flo
|  |  |- bmx-bumps
|  |  |  |- <00000>.flo
|  |  |  |- <00001>.flo
|  |- forward_flo
|  |  |- bmx-bumps
|  |  |  |- <00000>.flo
|  |  |  |- <00001>.flo

But the problem I'm having is that there is no such file or directory: '/myData/youtubevos_flows'. The logs are shown below.

using GPU 4-4 for training
using GPU 3-3 for training
using GPU 1-1 for training
self.opt[datasetName_train] train_dataset_edge
self.opt[datasetName_train] train_dataset_edge
self.opt[datasetName_train] train_dataset_edge
self.opt[datasetName_train] train_dataset_edge
self.opt[datasetName_train] train_dataset_edge
Traceback (most recent call last):
File "train.py", line 70, in
main(args_obj)
File "train.py", line 59, in main
mp.spawn(main_worker, nprocs=opt['world_size'], args=(opt,))
File "/raid2/hwpeng/miniconda3/envs/FGT_ENV/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/raid2/hwpeng/miniconda3/envs/FGT_ENV/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/raid2/hwpeng/miniconda3/envs/FGT_ENV/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 2 terminated with the following error:
Traceback (most recent call last):
File "/raid2/hwpeng/miniconda3/envs/FGT_ENV/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/raid2/hwpeng/Project_Coding/FGT/LAFC/train.py", line 30, in main_worker
trainer = pkg.Network(opt, rank)
File "/raid2/hwpeng/Project_Coding/FGT/LAFC/trainer.py", line 24, in init
self.dataInfo, self.valInfo, self.trainSet, self.trainSize, self.totalIterations, self.totalEpochs, self.trainLoader, self.trainSampler = self.prepareDataset()
File "/raid2/hwpeng/Project_Coding/FGT/LAFC/trainer.py", line 129, in prepareDataset
train_set = create_dataset(dataset, dataInfo, phase, self.opt['datasetName_train'])
File "/raid2/hwpeng/Project_Coding/FGT/LAFC/data/init.py", line 38, in create_dataset
dataset = dataset_package.VideoBasedDataset(dataset_opt, dataInfo)
File "/raid2/hwpeng/Project_Coding/FGT/LAFC/data/train_dataset_edge.py", line 29, in init
self.train_list = os.listdir(self.data_path)
FileNotFoundError: [Errno 2] No such file or directory: '/myData/youtubevos_flows'
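
A quick sanity check one can run from the repo root (the paths below are assumptions based on the tree above; note that the traceback resolves an absolute '/myData/...' path, which suggests the configured data root is missing the repository prefix):

import os

data_root = os.path.join(os.getcwd(), "myData")   # e.g. run from the FGT repo root
for sub in ("youtubevos_frames",
            "youtubevos_flows/forward_flo",
            "youtubevos_flows/backward_flo"):
    path = os.path.join(data_root, sub)
    print(path, "->", "ok" if os.path.isdir(path) else "MISSING")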

About setting parameter batch_size=1, num_frames=5

In the second stage of FGT training, I found that batch_size is set to only 1 and only 5 frames per video are selected for training. Thus, the size of the input tensor is (b, t, c, h, w) => (1, 5, c, h, w). I would like to know why the batch size is set so small.

Testing my own video: is a mask needed?

If I want to test my own video, does it need a corresponding mask for each frame? In fact, I think the difficulty lies in obtaining the mask for each frame.

About DDP gradient aggregation across different GPUs

In your FGT/FGT/networks/network.py module (see 'code mark 1' and 'code mark 2' below), I didn't find an .all_reduce() call to aggregate gradients across different GPUs.

#####==============code mark 1===============#####
dis_loss = (dis_real_loss + dis_fake_loss) / 2
self.dist_optim.zero_grad()
dis_loss.backward()
self.dist_optim.step()

#####==============code mark 2===============#####
loss = m_loss_valid + m_loss_masked + gen_loss
self.optimizer.zero_grad()
loss.backward()
self.optimizer.step()

Should the code be rewritten as in the following form (see 'rewritten 1' and 'rewritten 2' below) to aggregate gradients across GPUs? If not, will each GPU be isolated and compute its gradient update alone?

#####==============rewritten 1===============#####
dis_loss = (dis_real_loss + dis_fake_loss) / 2
self.dist_optim.zero_grad()
dis_loss.backward()
dis_loss=reduce_value(dis_loss, average=True)
self.dist_optim.step()

#####==============rewritten 2===============#####
loss = m_loss_valid + m_loss_masked + gen_loss
self.optimizer.zero_grad()
loss.backward()
loss=reduce_value(loss, average=True)
self.optimizer.step()

#####==============introduction function===============#####
import torch.distributed as dist
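
A note for readers: if the generator and discriminator are wrapped in DistributedDataParallel (not verified here), no explicit all_reduce on the loss is needed, because DDP averages the gradients across ranks during backward(). A minimal sketch of that behaviour (names are illustrative, not the repo's code):

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train_step(model: DDP, optimizer, loss_fn, inputs, targets):
    loss = loss_fn(model(inputs), targets)
    optimizer.zero_grad()
    loss.backward()       # DDP's hooks all-reduce (average) gradients across ranks here
    optimizer.step()      # so every rank applies the same averaged update

    # Reducing the scalar loss is only useful for logging a global average;
    # it has no effect on the gradients themselves.
    with torch.no_grad():
        logged = loss.detach().clone()
        dist.all_reduce(logged, op=dist.ReduceOp.SUM)
        logged /= dist.get_world_size()
    return logged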

About FGT++

Hi there,
Thank you for your great work. I'm interested in your latest work on FGT++.
Do you have plans to release video results, a test script, or the source code of FGT++?

Default watermark-removal mode config does not work well

Hi, thanks for uploading this great code. I really enjoy your work.

I ran the demo video in object-removal and watermark-removal modes respectively, and found that the watermark-removal results contain residual artifacts and are not stable enough.

I also tried using the object-removal parameters:

consistencyThres: 5
flow_mask_dilates: 12
frame_dilates: 4

and this time the generated result shows some improvement.

I am confused about the default parameters used in watermark_removal. Is there any way to improve the quality by adjusting the parameters? Thanks.

Not inpainting beginning and end of videos

Hi @hitachinsk. I was testing on a few longer videos and found that the inpainting works well for the middle of the video, but the object (despite having the necessary masks) is still present at the beginning and end of the video.

To process the longer videos, I set the step size to 1/10th of the total number of frames and neighbor_stride to 10. Can you please explain how I can modify the parameters so the whole video is inpainted? I am not quite clear on how the neighbor frames and the step play a role together.
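
For context, many video inpainting inference loops select frames with a sliding-window pattern like the sketch below (names are illustrative, not the repo's exact code); if the window and the reference-frame sampling never cover the first and last frames, those frames are left untouched.

def select_frames(f, video_length, neighbor_stride=10, step=10):
    # local neighbors: a window around the current frame f
    neighbor_ids = list(range(max(0, f - neighbor_stride),
                              min(video_length, f + neighbor_stride + 1)))
    # distant references: every `step`-th frame not already in the window
    ref_ids = [i for i in range(0, video_length, step) if i not in neighbor_ids]
    return neighbor_ids, ref_ids

# Example: make sure the loop over f starts at 0 and reaches video_length - 1,
# so every frame falls inside at least one window.
print(select_frames(0, 1000))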
