OmnimatteRF: Robust Omnimatte with 3D Background Modeling

Project Page | arXiv

Video matting has broad applications, from adding interesting effects to casually captured movies to assisting video production professionals. Matting with associated effects like shadows and reflections has also attracted increasing research activity, and methods like Omnimatte have been proposed to separate foreground objects of interest into their own layers. However, prior works represent video backgrounds as 2D image layers, limiting their capacity to express more complicated scenes and thus hindering application to real-world videos. In this paper, we propose a novel video matting method, OmnimatteRF, that combines 2D foreground layers and a 3D background model. The 2D layers preserve the details of the subjects, while the 3D background robustly reconstructs scenes in real-world videos. Extensive experiments demonstrate that our method reconstructs scenes with better quality on various videos.

Geng Lin, Chen Gao, Jia-Bin Huang, Changil Kim, Yipeng Wang, Matthias Zwicker, Ayush Saraf

in ICCV 2023

Setup

Docker

If you have a containerized environment, you can run our code with this image: logchan/matting:20221229.01 on Docker Hub. It is recommended that you mount the following paths inside the container:

  • /code for this repository
  • /data for video datasets (see data format)
  • /output for experiment output
  • /home/user for storing shell config and PyTorch cache; copy .bashrc to this folder to use fish by default

Check here for an example docker-compose.yaml.
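
If you prefer plain docker run over Compose, the following is a minimal sketch of an equivalent invocation (the host-side paths are placeholders to adjust; --gpus all assumes the NVIDIA Container Toolkit is installed):

docker run --rm -it --gpus all \
    -v /path/to/OmnimatteRF:/code \
    -v /path/to/datasets:/data \
    -v /path/to/output:/output \
    -v /path/to/home:/home/user \
    logchan/matting:20221229.01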

Virtual Environment / Conda

You can set up a Python environment with these packages installed:

torch
torch-efficient-distloss
tinycudann
dataclasses-json
detectron2
hydra-core
kornia
lpips
scikit-image
tensorboard
tqdm

# for running RoDynRF
easydict
ConfigArgParse
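
For reference, a minimal pip-based install sketch (assuming a CUDA-enabled torch build is already installed; tinycudann and detectron2 are typically installed from their GitHub repositories rather than from PyPI):

pip install torch-efficient-distloss dataclasses-json hydra-core kornia \
    lpips scikit-image tensorboard tqdm easydict ConfigArgParse
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
pip install git+https://github.com/facebookresearch/detectron2.git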

Required software in PATH:

  • ffmpeg
  • colmap (for pose estimation only)
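
For example, on Debian or Ubuntu both can be installed from the system package manager (a sketch; the packaged colmap may be older than the latest release):

sudo apt install ffmpeg colmap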

Data

Download our synthetic and captured datasets from Google Drive.

The following data are needed to run our method:

  • rgb_1x, input video sequence as image files
  • poses_bounds.npy or transforms.json, camera poses in the LLFF or NeRF Blender format
  • flow/flow and flow/flow_backward are forward and backward optical flows written with RAFT writeFlow; flow/confidence contains confidence maps generated by omnimatte
  • masks/mask, containing one or more subfolders, each providing a coarse mask sequence.
    • Note: our mask layer order is the reverse of omnimatte's
  • depth, monocular depth estimation (required only if using depth loss)

While all paths are configurable with command line arguments, the code by default recognizes the following structure:

/data/matting/wild/bouldering
├── colmap
│   └── poses_bounds.npy
├── depth
│   └── depth
│       └── 00000.npy
├── flow
│   ├── confidence
│   │   └── 0001.png
│   ├── flow
│   │   └── 00000.flo
│   └── flow_backward
│       └── 00000.flo
├── homography
│   └── homographies.npy
├── masks
│   └── mask
│       └── 00
│           └── 00000.png
└── rgb_1x
    └── 00000.png

We also provide scripts for preparing all data required to run our pipeline, and for converting our data format to Omnimatte or Nerfies formats. See using your video for details.

Running our code

We use hydra for configuring the pipeline, training parameters, and evaluation setups. The entrypoint files and predefined configurations are located in the workflows folder.

You can find the documented config structure in code files under core/config.

All-in-One CLI

To make it easy to prepare data and run experiments, we have created a simple command line interface, ui/cli.py. It requires some setup as it enforces the data organization shown above. See how to use it in Using the CLI.

If you can't use the CLI, it basically wraps the commands described below.

Train

Basic configuration (without depth supervision)

# Using CLI

python ./ui/cli.py train_ours wild/walk

python ./ui/cli.py train_ours wild/bouldering -- \
    data_sources.llff_camera.scene_scale=0.2

# Invoke workflow directly

python workflows/train.py \
    --config-name train_both \
    output=/output/train/wild/walk/matting/basic-exp \
    dataset.path=/data/matting/wild/walk \
    dataset.scale=0.25 \
    contraction=ndc

python workflows/train.py \
    --config-name train_both \
    output=/output/train/wild/bouldering/matting/basic-exp \
    dataset.path=/data/matting/wild/bouldering \
    dataset.scale=0.25 \
    data_sources=[flow,mask,colmap] \
    contraction=ndc \
    data_sources.llff_camera.scene_scale=0.2

In the above commands,

  • dataset.scale sets the resolution scale of the images. The bouldering video is 1080p and training at 0.5x scale would require ~40GB of VRAM.
  • data_sources specifies which data folders (apart from images) should be loaded for training.
    • The minimal requirement of our pipeline is [flow,mask,{pose}], where pose should be one of colmap, blender (for synthetic data), or rodynrf (if pose is from RoDynRF). The default is [flow,mask,colmap].
    • The rodynrf config uses the same npy file format as colmap, but assumes that the file is stored under rodynrf/poses_bounds.npy. It also disables some pose preprocessing steps.
  • contraction sets how rays should be contracted into a fixed volume for TensoRF. We use ndc for synthetic and COLMAP-reconstructed poses, and mipnerf for RoDynRF-predicted poses.
  • data_sources.llff_camera.scene_scale scales all camera origins to fit the scene in a smaller volume. In practice this prevents TensoRF from getting OOM errors for some videos.

With depth supervision

# Using CLI

python ./ui/cli.py \
    train_ours \
    wild/bouldering \
    --use_depths \
    -- \
    fg_losses=[alpha_reg,brightness_reg,flow_recons,mask,recons,warped_alpha,bg_tv_reg,robust_depth_matching,bg_distortion] \
    fg_losses.robust_depth_matching.config.alpha=0.1 \
    fg_losses.bg_distortion.config.alpha=0.01 \
    data_sources.llff_camera.scene_scale=0.2

# Invoke workflow directly

python workflows/train.py \
    --config-name train_both \
    output=/output/train/wild/bouldering/matting/exp-with-depths \
    dataset.path=/data/matting/wild/bouldering \
    dataset.scale=0.25 \
    data_sources=[flow,mask,colmap,depths] \
    contraction=ndc \
    fg_losses=[alpha_reg,brightness_reg,flow_recons,mask,recons,warped_alpha,bg_tv_reg,robust_depth_matching,bg_distortion] \
    fg_losses.robust_depth_matching.config.alpha=0.1 \
    fg_losses.bg_distortion.config.alpha=0.01 \
    data_sources.llff_camera.scene_scale=0.2

The robust_depth_matching and bg_distortion configs enable monocular depth supervision and the distortion loss, respectively.

Evaluate

By default, the evaluation script loads pipeline and dataset configurations from training:

# Using CLI

python ./ui/cli.py eval_ours wild/bouldering/exp-with-depths --step 15000

# Invoke workflow directly

python workflows/eval.py \
    output=/output/train/wild/bouldering/matting/exp-with-depths/eval/15000 \
    checkpoint=/output/train/wild/bouldering/matting/exp-with-depths/checkpoints/checkpoint_15000.pth

Clean background retraining

If you find shadows captured in both the foreground and background layers, it may be possible to obtain a clean background by training the TensoRF model from scratch, using the mask from the jointly trained foreground.

The eval script generates fg_alpha, which is the combined alpha of the foreground layers. You can train the background RF using:

# Using CLI

python ui/cli.py \
    train_ours \
    --config train_bg \
    --name retrain_bg \
    --mask /output/train/wild/walk/matting/basic-exp/eval/15000/fg_alpha \
    wild/walk

# Invoke workflow directly

python workflows/train.py \
    --config-name train_bg \
    output=/output/train/wild/walk/retrain-bg \
    dataset.path=/data/matting/wild/walk \
    dataset.scale=0.25 \
    data_sources=[mask,colmap] \
    data_sources.mask.subpath=/output/train/wild/walk/matting/basic-exp/eval/15000/fg_alpha \
    contraction=ndc

Contact

For any issues related to code and data, file an issue or email [email protected].

Citation

@InProceedings{Lin_2023_ICCV,
  author    = {Geng Lin and Chen Gao and Jia-Bin Huang and Changil Kim and Yipeng Wang and Matthias Zwicker and Ayush Saraf},
  title     = {OmnimatteRF: Robust Omnimatte with 3D Background Modeling},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2023}
}

Acknowledgements

The code is available under the MIT license.

Our codebase contains code from MiDaS, omnimatte, RAFT, RoDynRF, and TensoRF. Their licenses can be found under the licenses folder.

omnimatterf's Issues

Preprocess Data

Great project, thank you to the authors for their excellent work.
I encountered two issues during data preprocessing: one in the COLMAP stage and the other in the depth estimation stage. In the COLMAP stage, only 10 of my 200 images were registered; in the depth estimation stage, there is a network structure mismatch when loading the dpt_beit_large_512.pt weights.

The first stage:

[2023-10-03 19:16:45,343][main][INFO] - Run: colmap feature_extractor --database_path /data_1/ldw_models/OmnimatteRF/data/matting/wild/obama/run_colmap/database.db --image_path /data_1/ldw_models/OmnimatteRF/data/matting/wild/obama/rgb_1x --SiftExtraction.use_gpu 0 --SiftExtraction.upright 0 --ImageReader.camera_model OPENCV --ImageReader.single_camera 1
[2023-10-03 19:16:45,418][main][INFO] - Run: colmap exhaustive_matcher --database_path /data_1/ldw_models/OmnimatteRF/data/matting/wild/obama/run_colmap/database.db --SiftMatching.use_gpu 0
[2023-10-03 19:16:45,543][main][INFO] - Run: colmap mapper --database_path /data_1/ldw_models/OmnimatteRF/data/matting/wild/obama/run_colmap/database.db --image_path /data_1/ldw_models/OmnimatteRF/data/matting/wild/obama/rgb_1x --output_path /data_1/ldw_models/OmnimatteRF/data/matting/wild/obama/run_colmap/sparse --Mapper.ba_refine_principal_point 1 --Mapper.filter_max_reproj_error 2 --Mapper.tri_complete_max_reproj_error 2 --Mapper.min_num_matches 32
Post-colmap
Cameras 5
hwf = 450 450 1505.2784385924213
Images # 10
Points (313, 3) Visibility (313, 10)
Depth stats 0.013273805525082845 11.832586834420141 10.408927046544216
Done with imgs2poses
[2023-10-03 19:28:29,752][main][ERROR] - Colmap only recovered 10 for 200 images

The last stage:

[2023-10-03 18:44:15,308][main][INFO] - Process 200 files
/data_1/anaconda3/envs/omnimatte/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Error executing job with overrides: ['input=/data_1/ldw_models/OmnimatteRF/data/matting/wild/obama/rgb_1x', 'output=/data_1/ldw_models/OmnimatteRF/data/matting/wild/obama/depth', 'model=/data_1/ldw_models/OmnimatteRF/data/pretrained/midas/dpt_beit_large_512.pt']
Traceback (most recent call last):
File "/data_1/ldw_models/OmnimatteRF/preprocess/run_depth.py", line 51, in main
model, transform, net_w, net_h = load_model(device, cfg.model, cfg.type, optimize=False)
File "/data_1/ldw_models/OmnimatteRF/third_party/MiDaS/midas/model_loader.py", line 52, in load_model
model = DPTDepthModel(
File "/data_1/ldw_models/OmnimatteRF/third_party/MiDaS/midas/dpt_depth.py", line 165, in init
self.load(path)
File "/data_1/ldw_models/OmnimatteRF/third_party/MiDaS/midas/base_model.py", line 18, in load
self.load_state_dict(parameters)
File "/data_1/anaconda3/envs/omnimatte/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DPTDepthModel:
Unexpected key(s) in state_dict: "pretrained.model.blocks.0.attn.relative_position_index", "pretrained.model.blocks.1.attn.relative_position_index", "pretrained.model.blocks.2.attn.relative_position_index", "pretrained.model.blocks.3.attn.relative_position_index", "pretrained.model.blocks.4.attn.relative_position_index", "pretrained.model.blocks.5.attn.relative_position_index", "pretrained.model.blocks.6.attn.relative_position_index", "pretrained.model.blocks.7.attn.relative_position_index", "pretrained.model.blocks.8.attn.relative_position_index", "pretrained.model.blocks.9.attn.relative_position_index", "pretrained.model.blocks.10.attn.relative_position_index", "pretrained.model.blocks.11.attn.relative_position_index", "pretrained.model.blocks.12.attn.relative_position_index", "pretrained.model.blocks.13.attn.relative_position_index", "pretrained.model.blocks.14.attn.relative_position_index", "pretrained.model.blocks.15.attn.relative_position_index", "pretrained.model.blocks.16.attn.relative_position_index", "pretrained.model.blocks.17.attn.relative_position_index", "pretrained.model.blocks.18.attn.relative_position_index", "pretrained.model.blocks.19.attn.relative_position_index", "pretrained.model.blocks.20.attn.relative_position_index", "pretrained.model.blocks.21.attn.relative_position_index", "pretrained.model.blocks.22.attn.relative_position_index", "pretrained.model.blocks.23.attn.relative_position_index".
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

Could you help me? Thanks a lot.

High resolution output with affordable memory

Great work!
I have two questions:

  1. Resolution: if I want to render scenes at a higher resolution (e.g. the solo video at 1080p), what should I do?
  2. Memory: rendering at high resolution costs too much memory. Is there some approach to saving memory (e.g. reducing cached features)?

Thanks very much!

For depth supervision command

Use the '+' Sign to Add New Configurations: If scene_scale is a new field that you are adding to the configuration, you should prefix it with a '+' sign in your command. This tells Hydra that you are adding a new field rather than modifying an existing one. Modify your command as follows:

python ./ui/cli.py train_ours wild/visd_3_rp_2_depth --use_depths -- fg_losses=[alpha_reg,brightness_reg,flow_recons,mask,recons,warped_alpha,bg_tv_reg,robust_depth_matching,bg_distortion] fg_losses.robust_depth_matching.config.alpha=0.1 fg_losses.bg_distortion.config.alpha=0.01 +data_sources.llff_camera.scene_scale=0.2

errors in run_rodynrf

Thanks for your great work. I wonder where I can get the correct ~/data/pretrained/midas/midas_v21-f6b98070.pt and ~/data/pretrained/raft/raft-things.pth files.
Thanks!

errors in run_depth.py

I downloaded dpt_beit_large_512.pt from https://github.com/isl-org/MiDaS and it works on its own, but it fails with OmnimatteRF's commands, giving the following error messages:

    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DPTDepthModel:
        Unexpected key(s) in state_dict: "pretrained.model.blocks.0.attn.relative_position_index", "pretrained.model.blocks.1.attn.relative_position_index", "pretrained.model.blocks.2.attn.relative_position_index", "pretrained.model.blocks.3.attn.relative_position_index", "pretrained.model.blocks.4.attn.relative_position_index", "pretrained.model.blocks.5.attn.relative_position_index", "pretrained.model.blocks.6.attn.relative_position_index", "pretrained.model.blocks.7.attn.relative_position_index", "pretrained.model.blocks.8.attn.relative_position_index", "pretrained.model.blocks.9.attn.relative_position_index", "pretrained.model.blocks.10.attn.relative_position_index", "pretrained.model.blocks.11.attn.relative_position_index", "pretrained.model.blocks.12.attn.relative_position_index", "pretrained.model.blocks.13.attn.relative_position_index", "pretrained.model.blocks.14.attn.relative_position_index", "pretrained.model.blocks.15.attn.relative_position_index", "pretrained.model.blocks.16.attn.relative_position_index", "pretrained.model.blocks.17.attn.relative_position_index", "pretrained.model.blocks.18.attn.relative_position_index", "pretrained.model.blocks.19.attn.relative_position_index", "pretrained.model.blocks.20.attn.relative_position_index", "pretrained.model.blocks.21.attn.relative_position_index", "pretrained.model.blocks.22.attn.relative_position_index", "pretrained.model.blocks.23.attn.relative_position_index". 

Thank you very much for your support in advance!
