
microsoft / maskflownet

366 stars · 9 watchers · 73 forks · 239.94 MB

[CVPR 2020, Oral] MaskFlownet: Asymmetric Feature Matching with Learnable Occlusion Mask

Home Page: https://arxiv.org/abs/2003.10955

License: MIT License

Language: Python 100.00%
Topics: optical-flow, occlusion, cvpr2020, sintel, kitti, feature-matching, feature-warping

maskflownet's Introduction

MaskFlownet: Asymmetric Feature Matching with Learnable Occlusion Mask, CVPR 2020 (Oral)

By Shengyu Zhao, Yilun Sheng, Yue Dong, Eric I-Chao Chang, Yan Xu.

[arXiv] [ResearchGate]

@inproceedings{zhao2020maskflownet,
  author = {Zhao, Shengyu and Sheng, Yilun and Dong, Yue and Chang, Eric I-Chao and Xu, Yan},
  title = {MaskFlownet: Asymmetric Feature Matching with Learnable Occlusion Mask},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2020}
}

Introduction

[figure: mask visualization]

Feature warping is a core technique in optical flow estimation; however, the ambiguity caused by occluded areas during warping is a major problem that remains unsolved. We propose an asymmetric occlusion-aware feature matching module, which can learn a rough occlusion mask that filters useless (occluded) areas immediately after feature warping without any explicit supervision. The proposed module can be easily integrated into end-to-end network architectures and enjoys performance gains while introducing negligible computational cost. The learned occlusion mask can be further fed into a subsequent network cascade with dual feature pyramids with which we achieve state-of-the-art performance. For more details, please refer to our paper.
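
A minimal NumPy sketch of the idea (an illustration based on the paper's formulation, not the module implemented in this repository): warped features of the second image are gated by the learned occlusion mask and complemented by a learnable trade-off term.

import numpy as np

# Illustration only -- not the repository's actual module. After features of
# image 2 are warped by the current flow estimate, they are multiplied by a
# soft occlusion mask (a sigmoid output learned without explicit supervision)
# and a learnable trade-off term is added, so occluded regions are filtered.
def masked_feature_warp(warped_feat2, mask_logits, mu):
    """warped_feat2: (C, H, W) warped features of image 2;
    mask_logits: (1, H, W) predicted mask logits;
    mu: (C, 1, 1) learnable trade-off features (assumed shapes)."""
    occ_mask = 1.0 / (1.0 + np.exp(-mask_logits))  # soft occlusion mask in (0, 1)
    return warped_feat2 * occ_mask + mu            # gate, then add trade-off term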

This repository includes:

  • Training and inferring scripts using Python and MXNet; and
  • Pretrained models of MaskFlownet-S and MaskFlownet.

Code has been tested with Python 3.6 and MXNet 1.5.
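
For example, one possible environment setup (the package list here is an assumption based on the imports used throughout this repository and its issues; pick the mxnet-cuXX variant matching your CUDA version for GPU support):

pip install mxnet==1.5.1 opencv-python scikit-image pyyaml    # or e.g. mxnet-cu90==1.5.1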

Datasets

We follow the common training schedule for optical flow using the following datasets: FlyingChairs, FlyingThings3D (subset), Sintel, KITTI 2012 & 2015, and HD1K.

Please modify the paths specified in main.py (for FlyingChairs), reader/things3d.py (for FlyingThings3D), reader/sintel.py (for Sintel), reader/kitti.py (for KITTI 2012 & KITTI 2015), and reader/hd1k.py (for HD1K) according to where you store the corresponding datasets. Please be aware that the FlyingThings3D subset is still very large, so you may want to load only a relatively small proportion of it (see main.py).

Training

The following script is for training:

python main.py CONFIG [-dataset_cfg DATASET_CONFIG] [-g GPU_DEVICES] [-c CHECKPOINT, --clear_steps] [--debug]

where:

  • CONFIG specifies the network and training configuration;
  • DATASET_CONFIG specifies the dataset configuration (defaults to chairs.yaml);
  • GPU_DEVICES specifies the GPU IDs to use (default: CPU only), split by commas, with multi-GPU support. Please make sure that the number of GPUs evenly divides BATCH_SIZE, which depends on DATASET_CONFIG (BATCH_SIZE is 8 or 4 in the given configurations, so 4, 2, or 1 GPU(s) will be fine);
  • CHECKPOINT specifies the previous checkpoint to start from;
  • --clear_steps clears the step history and starts from step 0;
  • --debug enters DEBUG mode, where only a small fragment of the data is read.

To test whether your environment has been set up properly, run: python main.py MaskFlownet.yaml -g 0 --debug.

Here, we present the procedure to train a complete MaskFlownet model for validation on the Sintel dataset. About 20% of the sequences (ambush_2, ambush_6, bamboo_2, cave_4, market_6, temple_2) are split off as Sintel val, while the rest serve as Sintel train (see Sintel_train_val_maskflownet.txt). CHECKPOINT in each command line should correspond to the name of the checkpoint generated in the previous step.

| # | Network | Training | Validation | Command Line |
|---|---------|----------|------------|--------------|
| 1 | MaskFlownet-S | Flying Chairs | Sintel train + val | python main.py MaskFlownet_S.yaml -g 0,1,2,3 |
| 2 | MaskFlownet-S | Flying Things3D | Sintel train + val | python main.py MaskFlownet_S_ft.yaml --dataset_cfg things3d.yaml -g 0,1,2,3 -c [CHECKPOINT] --clear_steps |
| 3 | MaskFlownet-S | Sintel train + KITTI 2015 + HD1K | Sintel val | python main.py MaskFlownet_S_sintel.yaml --dataset_cfg sintel_kitti2015_hd1k.yaml -g 0,1,2,3 -c [CHECKPOINT] --clear_steps |
| 4 | MaskFlownet | Flying Chairs | Sintel val | python main.py MaskFlownet.yaml -g 0,1,2,3 -c [CHECKPOINT] --clear_steps |
| 5 | MaskFlownet | Flying Things3D | Sintel val | python main.py MaskFlownet_ft.yaml --dataset_cfg things3d.yaml -g 0,1,2,3 -c [CHECKPOINT] --clear_steps |
| 6 | MaskFlownet | Sintel train + KITTI 2015 + HD1K | Sintel val | python main.py MaskFlownet_sintel.yaml --dataset_cfg sintel_kitti2015_hd1k.yaml -g 0,1,2,3 -c [CHECKPOINT] --clear_steps |

Pretrained Models

Pretrained models for steps 2, 3, and 6 in the above procedure are provided (see ./weights/).

Inferring

The following script is for inferring:

python main.py CONFIG [-g GPU_DEVICES] [-c CHECKPOINT] [--valid or --predict] [--resize INFERENCE_RESIZE]

where:

  • CONFIG specifies the network configuration (MaskFlownet_S.yaml or MaskFlownet.yaml);
  • GPU_DEVICES specifies the GPU IDs to use, split by commas, with multi-GPU support;
  • CHECKPOINT specifies the checkpoint to run inference on;
  • --valid runs validation;
  • --predict runs prediction;
  • INFERENCE_RESIZE specifies the resize used for inference.

For example,

  • to do validation for MaskFlownet-S on checkpoint fffMar16, run python main.py MaskFlownet_S.yaml -g 0 -c fffMar16 --valid (the output will be under ./logs/val/).

  • to do prediction for MaskFlownet on checkpoint 000Mar17, run python main.py MaskFlownet.yaml -g 0 -c 000Mar17 --predict (the output will be under ./flows/).

Inference on New Data

For those who do not wish to train the model and would simply like to obtain flow images from a pretrained model on their own data, please use predict_new_data.py. You do not need to download any of the optical flow datasets to use predict_new_data.py, although you will have to additionally pip install flow_vis and moviepy. The functions in this script load a model and either perform inference on a given pair of images or produce a series of flow images corresponding to the motion between consecutive frames of a given video. They can be called from another script, or you can invoke the program from a terminal/Anaconda prompt like so (a small visualization sketch follows these examples):

  • to obtain a video composed of the flow images corresponding to input_video.mp4, run python predict_new_data.py C:/Users/my_username/flow_video_filepath.mp4 MaskFlownet.yaml --video_filepath C:/Users/my_username/input_video.mp4 -g 0 -c 8caNov12

  • to obtain a flow image from 2 input images image_1.png and image_2.png, run python predict_new_data.py C:/Users/my_username/flow_image_filepath.png MaskFlownet.yaml --image_1 C:/Users/my_username/image_1.png --image_2 C:/Users/my_username/image_2.png -g 0 -c 8caNov12
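
For the visualization step, a minimal sketch using the flow_vis dependency named above ('flow.npy' is a hypothetical saved prediction, not a file produced by this repository):

import numpy as np
import flow_vis  # pip install flow_vis

# Convert a predicted flow field of shape (H, W, 2) into the standard
# color-wheel visualization used in the optical flow literature.
flow = np.load('flow.npy')
flow_color = flow_vis.flow_to_color(flow, convert_to_bgr=False)  # (H, W, 3) uint8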

Acknowledgement

We thank Tingfung Lau for the initial implementation of the FlyingChairs pipeline.

maskflownet's People

Contributors

microsoftopensource, simon1727, zsyzzsoft

maskflownet's Issues

Error in operator maskflownet0_maskflownet_s0_upsample0_reshape_like0

Dear all,

I have recently been trying to use the code to run some inference of my own. I basically use the code from the master branch, combining the code in main.py and predict.py, but I receive strange errors during the forward pass of the network. Please tell me if I have got anything wrong. Thank you very much!

My code is as follows:

import os
import sys
import argparse
import json
import yaml
import numpy as np
import mxnet as mx
import cv2

import network.config
from network import get_pipeline

# NOTE: Use Default Value when running
parser = argparse.ArgumentParser()
parser.add_argument('--config', default='MaskFlownet.yaml', type=str)
parser.add_argument('--gpu_device', default='0', type=str)
parser.add_argument('--network', default='MaskFlownet', type=str)
parser.add_argument('--data_folder', type=str, default='./data/')
parser.add_argument('--img_folder', type=str, default='images/')
parser.add_argument('--checkpoint', type=str, default='5adNov03-0005_1000000.params')
args = parser.parse_args()
args.img_folder = os.path.join(args.data_folder, args.img_folder)


def main():
    ctx = [mx.gpu(gpu_id) for gpu_id in map(int, args.gpu_device.split(','))]
    prefix = os.path.dirname(__file__)
    config_file = 'MaskFlownet.yaml'
    config_path = os.path.join(prefix, 'network/config', config_file)
    with open(config_path) as f:
        config = network.config.Reader(yaml.load(f))
    
    pipe = get_pipeline('MaskFlownet', ctx=ctx, config=config)
    checkpoint_path = os.path.join(prefix, 'weights', args.checkpoint)
    pipe.load(checkpoint_path)
    pipe.fix_head()

    pre_img_name = '1.jpg'
    cur_img_name = '2.jpg'
    pre_img_path = os.path.join(args.img_folder, pre_img_name)
    cur_img_path = os.path.join(args.img_folder, cur_img_name)
    # read the image and resize
    h, w = 576, 1024
    pre_img = cv2.imread(pre_img_path)
    pre_img = cv2.resize(pre_img, (w, h))
    cur_img = cv2.imread(cur_img_path)
    cur_img = cv2.resize(cur_img, (w, h))
    
    # Output the shape of images, making sure the size is correct
    # The shapes are both (576, 1024, 3), as confirmed by the log below
    print('Image Shapes {:} {:}'.format(pre_img.shape, cur_img.shape))
    flow = list(pipe.predict([pre_img], [cur_img], batch_size=1))[0][0]
    print(flow)
    return


if __name__ == '__main__':
    main()

And my error log is

Default FLAGS..network.flow_multiplier to 1.0
Default FLAGS..network.deform_bias to True
Default FLAGS..network.upfeat_ch to [16, 16, 16, 16]
Default FLAGS..network.flow_multiplier to 1.0
Default FLAGS..network.deform_bias to True
Default FLAGS..network.upfeat_ch to [16, 16, 16, 16]
Default FLAGS..network.mw to [0.005, 0.01, 0.02, 0.08, 0.32]
Default FLAGS..optimizer.q to None
Image Shapes (576, 1024, 3) (576, 1024, 3)
Traceback (most recent call last):
  File "/mnt/truenas/scratch/ziqi.pang/MaskFlowNet/infer.py", line 67, in <module>
    main()
  File "/mnt/truenas/scratch/ziqi.pang/MaskFlowNet/infer.py", line 61, in main
    flow = list(pipe.predict([pre_img], [cur_img], batch_size=1))[0][0]
  File "/mnt/truenas/scratch/ziqi.pang/MaskFlowNet/network/pipeline.py", line 209, in predict
    flow, occ_mask, warped, _ = self.do_batch(img1s, img2s, resize = resize)
  File "/mnt/truenas/scratch/ziqi.pang/MaskFlowNet/network/pipeline.py", line 136, in do_batch
    flows, occ_masks, _ = self.do_batch_mx(img1, img2, resize = resize)
  File "/mnt/truenas/scratch/ziqi.pang/MaskFlowNet/network/pipeline.py", line 131, in do_batch_mx
    pred, flows, warpeds = self.network(img1, img2)
  File "/root/.tspkg/lib/python3/mxnet/gluon/block.py", line 471, in __call__
    return self.forward(*args)
  File "/root/.tspkg/lib/python3/mxnet/gluon/block.py", line 705, in forward
    return self._call_cached_op(x, *args)
  File "/root/.tspkg/lib/python3/mxnet/gluon/block.py", line 612, in _call_cached_op
    out = self._cached_op(*cargs)
  File "/root/.tspkg/lib/python3/mxnet/_ctypes/ndarray.py", line 149, in __call__
    ctypes.byref(out_stypes)))
  File "/root/.tspkg/lib/python3/mxnet/base.py", line 149, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: Error in operator maskflownet0_maskflownet_s0_upsample0_reshape_like0: [15:34:33] src/operator/tensor/elemwise_unary_op_basic.cc:348: Check failed: (*in_attrs)[0].Size() == (*in_attrs)[1].Size() (1152 vs. 288) Cannot reshape lhs with shape [2,1,18,32]to rhs with shape [1,2,9,16] because they have different size.

Stack trace returned 10 entries:
[bt] (0) /root/.tspkg/lib/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x5b) [0x7f2f8ed8507b]
[bt] (1) /root/.tspkg/lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f2f8ed85be8]
[bt] (2) /root/.tspkg/lib/libmxnet.so(+0x15e128a) [0x7f2f8f7fe28a]
[bt] (3) /root/.tspkg/lib/libmxnet.so(+0x2f8b571) [0x7f2f911a8571]
[bt] (4) /root/.tspkg/lib/libmxnet.so(mxnet::exec::InferShape(nnvm::Graph&&, std::vector<nnvm::TShape, std::allocator<nnvm::TShape> >&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1ada) [0x7f2f911aa72a]
[bt] (5) /root/.tspkg/lib/libmxnet.so(mxnet::imperative::CheckAndInferShape(nnvm::Graph*, std::vector<nnvm::TShape, std::allocator<nnvm::TShape> >&&, bool, std::pair<unsigned int, unsigned int>, std::pair<unsigned int, unsigned int>)+0x13c) [0x7f2f912abdfc]
[bt] (6) /root/.tspkg/lib/libmxnet.so(mxnet::Imperative::CachedOp::GetForwardGraph(bool, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&)+0x548) [0x7f2f9129a5a8]
[bt] (7) /root/.tspkg/lib/libmxnet.so(mxnet::Imperative::CachedOp::Forward(std::shared_ptr<mxnet::Imperative::CachedOp> const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&)+0xb5) [0x7f2f912a23f5]
[bt] (8) /root/.tspkg/lib/libmxnet.so(MXInvokeCachedOp+0xc39) [0x7f2f917c8569]
[bt] (9) /root/.tspkg/lib/libmxnet.so(MXInvokeCachedOpEx+0x3ee) [0x7f2f917c975e]

Can I use the code to reproduce the performance reported in the paper?

Hi!
I am really interested in your paper and this repo.
I have trained MaskFlownet-S on the FlyingChairs dataset with the code provided in this repo. However, the EPE on MPI-Sintel clean and final is 2.999 and 4.399, respectively; there is quite a gap from the reported 2.88 and 4.25.
So here are my questions: Is the code provided in this repo the code you used for the experiments in the paper? Can it reproduce the performance reported in the paper without further modification? And is the reported performance from the best checkpoint or the final checkpoint?
Thanks!

Confusion about FlyingThings3D dataset preloading

Hi!
Thanks for sharing!
When I read the code, I noticed that all the training data seems to be preloaded into RAM. When training on the FlyingThings3D dataset, you separate it into several parts and only load one part during training. I didn't find the code that reloads the remaining parts; could you please point it out for me? Thanks again!

RAM leak in the training loop around batch_queue.get(). Where do you release resources once a batch is trained?

I have noticed that your training loop leaks small amounts of RAM. Any idea what may have caused this?

time taken= 9.865329265594482 | steps= 1 | cpu= 51.8 | ram= 34.50078675328186 | gpu= [3101]
[5613]
time taken= 0.934636116027832 | steps= 2 | cpu= 27.0 | ram= 29.34866251942084 | gpu= [5613]
[3045]
time taken= 0.8695635795593262 | steps= 3 | cpu= 29.4 | ram= 29.217970957706278 | gpu= [3045]
[3021]
time taken= 0.8483304977416992 | steps= 4 | cpu= 29.8 | ram= 29.033316428574086 | gpu= [3021]
[2997]
time taken= 0.8630681037902832 | steps= 5 | cpu= 30.2 | ram= 28.87988403913803 | gpu= [2997]
[2997]
time taken= 0.8645083904266357 | steps= 6 | cpu= 29.4 | ram= 28.714746447210654 | gpu= [2997]
[2997]
time taken= 0.864253044128418 | steps= 7 | cpu= 29.3 | ram= 28.573093657739385 | gpu= [2997]
[2997]
time taken= 0.8693573474884033 | steps= 8 | cpu= 29.3 | ram= 28.389703885656044 | gpu= [2997]
[2997]
time taken= 0.8704898357391357 | steps= 9 | cpu= 29.4 | ram= 28.298690976454438 | gpu= [2997]
[2997]
time taken= 0.8670341968536377 | steps= 10 | cpu= 29.5 | ram= 28.13385097442091 | gpu= [2997]
[2997]
time taken= 0.8750414848327637 | steps= 11 | cpu= 29.5 | ram= 27.959884882309396 | gpu= [2997]
[2997]
time taken= 0.8624210357666016 | steps= 12 | cpu= 29.9 | ram= 27.784356443255188 | gpu= [2997]
[2997]
time taken= 0.8561670780181885 | steps= 13 | cpu= 29.8 | ram= 27.644241201568796 | gpu= [2997]
[2997]
time taken= 0.8609695434570312 | steps= 14 | cpu= 29.7 | ram= 27.51883186047002 | gpu= [2997]
[2997]
time taken= 0.8462607860565186 | steps= 15 | cpu= 29.7 | ram= 27.36641623650461 | gpu= [2997]
[2997]
time taken= 0.8624782562255859 | steps= 16 | cpu= 29.2 | ram= 27.23760941078441 | gpu= [2997]
[2997]
time taken= 0.8649694919586182 | steps= 17 | cpu= 29.4 | ram= 27.113514425050127 | gpu= [2997]
[2997]
time taken= 0.8661544322967529 | steps= 18 | cpu= 29.3 | ram= 27.004993310427178 | gpu= [2997]
[2997]
time taken= 0.8687705993652344 | steps= 19 | cpu= 29.8 | ram= 26.82090916192486 | gpu= [2997]
[2997]
time taken= 0.8823645114898682 | steps= 20 | cpu= 29.6 | ram= 26.688630454109777 | gpu= [2997]
[2997]
time taken= 0.8795809745788574 | steps= 21 | cpu= 29.4 | ram= 26.517987449146226 | gpu= [2997]
[2997]
time taken= 0.8857841491699219 | steps= 22 | cpu= 29.1 | ram= 26.40289455770082 | gpu= [2997]
[2997]
time taken= 0.8605339527130127 | steps= 23 | cpu= 29.5 | ram= 26.274509317663572 | gpu= [2997]
[2997]
time taken= 0.8524265289306641 | steps= 24 | cpu= 29.8 | ram= 26.16445065525575 | gpu= [2997]

Mask visualization problem

I tried to visualize the mask image with the code below:

import os
from skimage import io

# normalize the mask to [0, 1] and invert it before saving
output = 1 - (occ_mask - occ_mask.min()) / (occ_mask.max() - occ_mask.min())
io.imsave(os.path.join(seq_output_folder, fname), output)

The result is not the same as what your paper shows.
Is there anything wrong here?

How can I run inference on my own data with MaskFlownet_S?

AssertionError: Parameter 'maskflownet_s0_hybridsequential0_conv1aweight' is missing in file './MaskFlownet/weights/dbbSep30-1206_1000000.params', which contains parameters: 'maskflownet0_hybridsequential0_conv1aweight', 'maskflownet0_hybridsequential0_conv1abias', 'maskflownet0_hybridsequential1_conv1bweight', ..., 'maskflownet0_hybridsequential55_conv3fweight', 'maskflownet0_hybridsequential55_conv3fbias', 'maskflownet0_hybridsequential56_conv2fweight', 'maskflownet0_hybridsequential56_conv2fbias'. Please make sure source and target networks have the same prefix.

I get the error above. Also, how can I get occlusion masks as in the paper?
This is the command I used:
python predict_new_data.py ./test.png MaskFlownet_S.yaml --image_1 ./image_1.png --image_2 ./image_2.png -g 0 -c 5adNov03

How to do post-processing when submitting flow results to the KITTI eval server?

Hi, @simon1727, I noticed you mentioned forcing the ground truth to be dense in the training stage (#5).
How should we post-process when submitting to the KITTI eval server in the prediction stage?
In other words, how do we obtain sparse results from the dense flow results in a way that ensures promising eval results, given that the ground truth is also sparse in the test dataset?

Some Questions About the Ablation Experiments

Thanks for your great work!
I understand you stress the unsupervised learning of the mask, and I have read your code to confirm that you do learn the mask in an unsupervised manner. But note that occlusion maps could also be obtained as supervision to train MaskFlownet-S: simply adding an EPE or cross-entropy loss might guide MaskFlownet-S to learn a better attention mask. I understand it would take a long time to generate all of the mask maps for these datasets; that is indeed a problem.
Here are some questions about the intermediate results:

  1. Did you compare supervised and unsupervised learning of the mask, and their final influence on the result?
  2. The mask seems to be right in the foreground-background case, but we don't really care about background flow that is far from the foreground. Do you have more visualizations of the mask on objects that are moving close to each other, like the third and seventh rows in Figure 12?

Thanks for your attention.

Some problems about the prediction of optical flow and occlusion

Thanks for your great work!

When I run python main.py MaskFlownet.yaml -g 0 -c 000Mar17 --predict, I get outputs in /flows.

But they don't seem to be the right size.
[image: 000000_10]

The only thing I changed in the code was to replace imread from skimage.io with imread from cv2.

import skimage.io

My environment:
python 3.6.8
mxnet-cu90 1.5.1
CUDA 9.0.176
cudnn 7.6.5

In addition, I would like to ask: how do I correctly visualize the occlusion in binary form?

Hello, I also work in this area. Could we get in touch?

Hello, I also work in this area (occlusion + optical flow). Last year I completed similar work, predicting an occlusion mask and feeding it into PWC-Net as auxiliary information for a decoder branch; unfortunately, I did not pursue it further.
I haven't read your paper in depth yet, but the idea seems very close to what I had at the time. If convenient, could you leave contact information so we can discuss?

Question regarding Occlusion-Aware Pyramid

Hi,

I have a question regarding the Occlusion-Aware Pyramid.

In the paper, it says:

[image: equation from the paper]

in the code, it is

mask0 = Upsample(4)(mask2)  
mask0 = F.sigmoid(mask0) - 0.5  
c30 = c10  
c40 = self.warp(c20, Upsample(4)(flow2)*self.scale)  
# concat image 1 with zero mask
c30 = F.concat(c30, F.zeros_like(mask0), dim=1)  
# concat warped image 2 with occlusion mask
c40 = F.concat(c40, mask0, dim=1)  

From my understanding, the occlusion mask is a probability map (where 1 stands for occlusion and 0 for non-occlusion); after subtracting 0.5, the range becomes [-0.5, 0.5], and a value of 0 would then mean "don't know whether there is occlusion or not".

Then the question is: why is image 1 (I1) concatenated with a zero mask instead of a -0.5 mask, or the same occlusion map as image 2 (I2)? Since the follow-up conv layers are shared between c30 and c40, shouldn't the concatenated occlusion mask have the same meaning for both I1 and I2?
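
(A quick numeric check of the range described above; a tiny standalone sketch, not code from this repository:)

import numpy as np

# sigmoid maps logits to (0, 1); subtracting 0.5 gives (-0.5, 0.5),
# with 0 exactly at logit 0 -- the "don't know" point discussed above.
logits = np.array([-10.0, 0.0, 10.0])
mask = 1.0 / (1.0 + np.exp(-logits)) - 0.5
print(mask)  # approximately [-0.5, 0.0, 0.5]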

Thanks a lot!

Does image size matter?

Hi, I really appreciate you open-sourcing the code! May I ask a few questions?

Does the image size influence the performance? In the training stage, image patches (896x320 for KITTI) are used. But in testing, do you still use image patches, or the entire image (around 1240x370 for KITTI)?

I know cropping is for augmentation, but if small patches are used for training and larger images for testing, will this strategy affect the performance?

Thank you very much!

Can you provide the .states file?

I want to fine-tune the pretrained MaskFlownet on ChairsSDHom.
I have written the dataset scripts for it but get this error while training.

Traceback (most recent call last):
  File "main.py", line 143, in <module>
    pipe.trainer.load_states(checkpoint.replace('params', 'states'))
  File "/home/mask/miniconda3/envs/mask/lib/python3.6/site-packages/mxnet/gluon/trainer.py", line 515, in load_states
    with open(fname, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/mask/maskflownet/MaskFlownet/weights/5adNov03-0005_1000000.states'

Can you please advise on how to proceed?

Question about provided models

Hi,

Many thanks for your nice work.

I have a question: in the weights folder, what is the difference between '771Sep25-0735_500000' and 'abbSep15-1037_500000'? I saw 771Sep25-0735_500000 listed in the README under Pre-trained Models, but it is not in the evaluation table.

Could you explain the difference between them?

Best regards,
Meow

No activation function in FeaturePyramidExtractor?

Hi!
Thanks for your great work! I am porting your work from MXNet to PyTorch, but I noticed that, unlike PWC-Net, there is no activation function in FeaturePyramidExtractor.
Is that a bug?

Finetuning on ChairsSDHom: EPE doesn't go down

The issue is that during training the validation loss goes up very quickly; check the logs below.

I have added a ChairsSDHom data loading script as follows.
Changes:

  1. Loading data in iterate_data instead of reading all images into a list in main.py
  2. Added chairsSDHom.py and chairsSDHom.yaml

I have attached all the code I updated below.

1. main.py

...
...
elif dataset_cfg.dataset.value == "chairsSDHom":
        batch_size=3
        orig_shape= [384,512]
        # training
        chairsSDHom_dataset = chairsSDHom.list_data()
        print(chairsSDHom_dataset['flow'][0])
        from pympler.asizeof import asizeof
        trainImg1 = [file for file in chairsSDHom_dataset['image_0']]
        trainImg2 = [file for file in chairsSDHom_dataset['image_1']]
        trainFlow = [file for file in chairsSDHom_dataset['flow']]
        trainMask = [file for file in chairsSDHom_dataset['mask']]
        trainSize = len(trainFlow)
        training_datasets = [(trainImg1, trainImg2, trainFlow,trainMask)] * batch_size

        # validaion- sintel
        sintel_dataset = sintel.list_data()
        divs = ('training',) if not getattr(config.network, 'class').get() == 'MaskFlownet' else ('training2',)
        for div in divs:
                for k, dataset in sintel_dataset[div].items():
                        dataset = dataset[:samples]
                        img1, img2, flow, mask = [[sintel.load(p) for p in data] for data in zip(*dataset)]
                        validationSize = len(flow)
                        validation_datasets['sintel.' + k] = (img1, img2, flow, mask)
...
...
def iterate_data(iq, dataset):
    if dataset_cfg.dataset.value == 'chairsSDHom' or dataset_cfg.dataset.value == "things3d":
        gen = index_generator(len(dataset[0]))
        while True:
            i = next(gen)
            data = [item[i] for item in dataset]
            if dataset_cfg.dataset.value == "chairsSDHom":
                data = [skimage.io.imread(data[0]),skimage.io.imread(data[1]),chairsSDHom.load(data[2]),skimage.io.imread(data[3])]
            elif dataset_cfg.dataset.value == "things3d":
                data = [cv2.imread(data[0]).astype('uint8'),skimage.io.imread(data[1]).astype('uint8'),things3d.load(data[2]).astype('float16')]
            space_x, space_y = data[0].shape[0] - orig_shape[0], data[0].shape[1] - orig_shape[1]
            crop_x, crop_y = space_x and np.random.randint(space_x), space_y and np.random.randint(space_y)
            data = [np.transpose(arr[crop_x: crop_x + orig_shape[0], crop_y: crop_y + orig_shape[1]], (2, 0, 1)) for arr in data]
            # vertical flip
            if np.random.randint(2):
                data = [arr[:, :, ::-1] for arr in data]
                data[2] = np.stack([-data[2][0, :, :], data[2][1, :, :]], axis = 0)
            iq.put(data)
    else:
        gen = index_generator(len(dataset[0]))
        while True:
            i = next(gen)
            data = [item[i] for item in dataset]
            space_x, space_y = data[0].shape[0] - orig_shape[0], data[0].shape[1] - orig_shape[1]
            crop_x, crop_y = space_x and np.random.randint(space_x), space_y and np.random.randint(space_y)
            data = [np.transpose(arr[crop_x: crop_x + orig_shape[0], crop_y: crop_y + orig_shape[1]], (2, 0, 1)) for arr in data]
            # vertical flip
            if np.random.randint(2):
                data = [arr[:, :, ::-1] for arr in data]
                data[2] = np.stack([-data[2][0, :, :], data[2][1, :, :]], axis = 0)
            iq.put(data)
...

Everything else is the same, yet training behaves as in the logs below.

updated code.zip


Logs:

[2020/12/22 21:36:48] start=0, train=21670, val=224, host=ludwig, batch=3
[2020/12/22 21:36:48] batch=8, config='MaskFlownet_ft.yaml', dataset_cfg='chairsSDHom.yaml', shard=1, gpu_device='1', checkpoint='5adNov03', clear_steps=True, network='MaskFlownet', debug=False, valid=Fa
lse, predict=False, resize=''
[2020/12/22 21:36:54] steps=1, epe=81.23613661839343, total_time=0.00
[2020/12/22 21:37:20] steps=1, sintel.clean=1.4036083221435547, sintel.final=**1.7385120391845703**
[2020/12/22 21:37:20] steps=2, epe=82.52426050579368, total_time=31.65
[2020/12/22 21:37:21] steps=3, epe=70.33922181313649, total_time=15.62
[2020/12/22 21:37:21] steps=4, epe=64.53729546698513, total_time=10.30
[2020/12/22 21:37:21] steps=5, epe=73.13790790314701, total_time=7.64
[2020/12/22 21:37:22] steps=6, epe=69.97008332644914, total_time=6.04
[2020/12/22 21:37:22] steps=7, epe=63.190831684866595, total_time=4.98
[2020/12/22 21:37:23] steps=8, epe=69.54386270096657, total_time=4.23
[2020/12/22 21:37:23] steps=9, epe=71.65906570549198, total_time=3.66
[2020/12/22 21:37:24] steps=10, epe=70.68287622669239, total_time=3.22
[2020/12/22 21:37:24] steps=11, epe=68.10887379487774, total_time=2.88
[2020/12/22 21:37:24] steps=12, epe=65.31357897717663, total_time=2.59
[2020/12/22 21:37:25] steps=13, epe=67.39865911195284, total_time=2.36
[2020/12/22 21:37:25] steps=14, epe=66.05316386284305, total_time=2.16
[2020/12/22 21:37:26] steps=15, epe=62.74090359794587, total_time=1.99
[2020/12/22 21:37:26] steps=16, epe=65.24516708995266, total_time=1.85
[2020/12/22 21:37:27] steps=17, epe=61.783343363284466, total_time=1.72
[2020/12/22 21:37:27] steps=18, epe=66.12157773880946, total_time=1.61
[2020/12/22 21:37:27] steps=19, epe=65.41601491031372, total_time=1.51
[2020/12/22 21:37:28] steps=20, epe=67.27401184191667, total_time=1.42
[2020/12/22 21:37:41] steps=50, epe=64.05605013410363, total_time=0.57
[2020/12/22 21:38:03] steps=100, epe=60.72789733634401, total_time=0.45
[2020/12/22 21:38:30] steps=100, sintel.clean=3.107024669647217, sintel.final=**3.6572041511535645**
[2020/12/22 21:38:51] steps=150, epe=58.168171286698964, total_time=0.55
[2020/12/22 21:39:14] steps=200, epe=55.366796654848244, total_time=0.45
[2020/12/22 21:39:41] steps=200, sintel.clean=4.636238098144531, sintel.final=**5.08129358291626**
[2020/12/22 21:40:03] steps=250, epe=52.92103477169547, total_time=0.56
[2020/12/22 21:40:25] steps=300, epe=50.651504112365515, total_time=0.45
[2020/12/22 21:40:52] steps=300, sintel.clean=5.46751070022583, sintel.final=**5.855245113372803**
[2020/12/22 21:41:13] steps=350, epe=48.90560261388807, total_time=0.55
[2020/12/22 21:41:36] steps=400, epe=47.090479957163055, total_time=0.45
[2020/12/22 21:42:02] steps=400, sintel.clean=6.850785255432129, sintel.final=**7.147568702697754**
[2020/12/22 21:42:24] steps=450, epe=45.47630244939083, total_time=0.55
[2020/12/22 21:42:47] steps=500, epe=43.721847967473224, total_time=0.45
[2020/12/22 21:43:14] steps=500, sintel.clean=7.392406940460205, sintel.final=**7.563663005828857**
[2020/12/22 21:43:36] steps=550, epe=41.861068025751216, total_time=0.56
[2020/12/22 21:43:59] steps=600, epe=40.728338542736246, total_time=0.45
[2020/12/22 21:44:25] steps=600, sintel.clean=8.37342643737793, sintel.final=**8.398472785949707**
[2020/12/22 21:44:47] steps=650, epe=39.22414651439415, total_time=0.55
[2020/12/22 21:45:09] steps=700, epe=38.01273616706755, total_time=0.45
[2020/12/22 21:45:36] steps=700, sintel.clean=8.904271125793457, sintel.final=**8.86906623840332**
[2020/12/22 21:45:57] steps=750, epe=36.68394209224638, total_time=0.55
[2020/12/22 21:46:20] steps=800, epe=35.51223404091925, total_time=0.45
[2020/12/22 21:46:46] steps=800, sintel.clean=9.723841667175293, sintel.final=**9.715934753417969**
[2020/12/22 21:47:08] steps=850, epe=34.441762749200876, total_time=0.55
[2020/12/22 21:47:30] steps=900, epe=33.21928807435762, total_time=0.45
[2020/12/22 21:47:56] steps=900, sintel.clean=10.129880905151367, sintel.final=**10.09166431427002**

Question 1) Any idea why the network output is like this? And how might I fix it?
Question 2) Is there anything you think is very wrong in the edits I have made?

Thank you so much. I highly appreciate your work. <3 :D

bugfix for kitti

Hi Simon,
some bugfixes, as follows.

  1. reader/kitti.py: line44 and line 99
    samples = None ->samples =-1;
    because samples = 32 if args.debug else -1 in main.py line194;
  2. reader/kitti.py: line98
    num_files = (len(os.listdir(path_testing)) - 1) // 2 -> num_files = len(os.listdir(path_testing)) // 2;
  3. reader/kitti.py: line105 & line106
    img0 = cv2.resize(img0, resize)
    img1 = cv2.resize(img1, resize)
    ->
    adj_resize = (resize[1], resize[0])
    img0 = cv2.resize(img0, adj_resize)
    img1 = cv2.resize(img1, adj_resize)
    because cv2.resize() should be dim(width, height) -> dim(cols, rows)
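
(A quick standalone check of the dsize convention behind fix 3; the shapes here are illustrative:)

import cv2
import numpy as np

# cv2.resize expects dsize as (width, height), i.e. (cols, rows), which is
# the reverse of NumPy's shape order -- the root cause of the fix above.
img = np.zeros((370, 1240, 3), dtype=np.uint8)  # rows x cols, KITTI-like
out = cv2.resize(img, (1024, 576))              # dsize = (width, height)
print(out.shape)                                # (576, 1024, 3)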

Memory corruption

Hi,

Your results are very impressive. Unfortunately, I'm getting the following error:

*** Error in `python3': double free or corruption (!prev): 0x000055c2f1f46400 ***
Aborted

Did you ever encounter this type of error, or have any idea how to fix it?

Thanks

How to adjust the parameters?

I trained MaskFlownet on Sintel train + KITTI 2015 + HD1K without any change to your code; however, the performance of my own trained model is not as good as your pretrained model "8caNov12". Is there any difference in your uploaded MaskFlownet_sintel.yaml and sintel_kitti2015_hd1k.yaml? Or should I adjust any other parameters? Thank you very much for your help in advance.

Details about fine-tuning on FlyingThings3D

Can you tell me the number of training iterations for fine-tuning on FlyingThings3D after pre-training on FlyingChairs? Do you use S_fine to train 0.5M iterations after the 1.2M FlyingChairs iterations, or restart using S_long + S_fine for a total of 1.7M iterations?

Finetuning on KITTI dataset with sparse ground truth

Hi, authors,

Thanks for sharing the code, it is a great work!

When fine-tuning on the KITTI dataset, only sparse ground truth is available. In this case, if we employ geometric transformations such as scaling and rotation with bilinear sampling, problems arise, because there are many zeros at the non-labeled pixels. Besides, the binary mask will no longer be binary.

In your paper, you state that for sparse ground-truth flow in KITTI, the augmented flow is weighted-averaged based on the interpolated valid mask. But I cannot find how you handle this in detail in the code. Could you please explain how to apply geometric transformations to sparse ground-truth datasets (e.g., the interpolated valid mask)?

Thanks for your attention!

What is the variable F in your code?

Could you please explain what F should be in this line of code:
x2_warp = self.deform5(x2, F.repeat(F.expand_dims(flow*self.scale/self.strides[1], axis=1), 9, axis=1).reshape((0, -3, -2)))

Thank you!
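
(For context: in MXNet Gluon, the hybrid_forward method of a HybridBlock receives the backend namespace as its F argument -- mxnet.nd when run imperatively, mxnet.sym when hybridized -- and the quoted line presumably sits inside such a method. A minimal standalone illustration:)

import mxnet as mx
from mxnet.gluon import nn

# F is supplied by the framework: mx.nd (imperative) or mx.sym (hybridized).
class TinyBlock(nn.HybridBlock):
    def hybrid_forward(self, F, x):
        # Same ops as the quoted MaskFlownet line, applied to a toy tensor.
        return F.repeat(F.expand_dims(x, axis=1), 9, axis=1)

net = TinyBlock()
out = net(mx.nd.ones((2, 3)))  # here F resolves to mx.nd
print(out.shape)               # (2, 9, 3)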

How to run inference on my own dataset?

Hi, thanks for sharing.
I am trying to test your model on a pair of images but could not make it work.
I installed Python 3.6.10 and MXNet 1.5 using Anaconda, along with all necessary modules.
It crashes when reading the model; something is missing. Here is my command:
python main.py MaskFlownet_S.yaml -c 8caNov12 --predict --clear_steps --debug
and the result is:
[('C:\Users\cvestri\Work\Dev\RDVision\Code\MaskFlownet\logs\8caNov12-1532.log', '8caNov12-1532', '-1532')]
Default FLAGS..network.flow_multiplier to 1.0
Default FLAGS..network.deform_bias to True
Default FLAGS..network.upfeat_ch to [16, 16, 16, 16]
Default FLAGS..network.mw to [0.005, 0.01, 0.02, 0.08, 0.32]
Default FLAGS..optimizer.q to None
Default FLAGS..optimizer.learning_rate to None
Load Checkpoint C:\Users\cvestri\Work\Dev\RDVision\Code\MaskFlownet\weights\8caNov12-1532_300000.params
load the weight for the network
Traceback (most recent call last):
File "main.py", line 136, in
pipe.load(checkpoint)
File "C:\Users\cvestri\Work\Dev\RDVision\Code\MaskFlownet\network\pipeline.py", line 57, in load
self.network.load_parameters(checkpoint, ctx=self.ctx)
File "C:\Users\cvestri\AppData\Local\conda\conda\envs\py36_mxnet\lib\site-packages\mxnet\gluon\block.py", line 394, in load_parameters
cast_dtype=cast_dtype, dtype_source=dtype_source)
File "C:\Users\cvestri\AppData\Local\conda\conda\envs\py36_mxnet\lib\site-packages\mxnet\gluon\parameter.py", line 968, in load
name[lprefix:], filename, _brief_print_list(arg_dict.keys()))
AssertionError: Parameter 'hybridsequential0_conv1aweight' is missing in file 'C:\Users\cvestri\Work\Dev\RDVision\Code\MaskFlownet\weights\8caNov12-1532_300000.params', which contains parameters: 'maskflownet_s0_maskflownet_s0_hybridsequential0_conv1aweight', 'maskflownet_s0_maskflownet_s0_hybridsequential0_conv1abias', 'maskflownet_s0_maskflownet_s0_hybridsequential1_conv1bweight', ..., 'maskflownet_s0_deform3weight', 'maskflownet_s0_deform3bias', 'maskflownet_s0_deform2weight', 'maskflownet_s0_deform2bias'. Please make sure source and target networks have the same prefix.

It is the same with MXNet 1.6.
Thanks

Some problems about MXNetError

Hi,
When I run main.py with "MaskFlownet_S_sintel.yaml --dataset_cfg sintel_kitti2015_hd1k.yaml -g 0 -c dbbSep30-1206 --clear_steps", I get the error below:
Traceback (most recent call last):
File "/MaskFlownet-master/main.py", line 537, in
train_log = pipe.train_batch(img1, img2, flow, geo_aug, color_aug, mask = mask)
File "/MaskFlownet-master/network/pipeline.py", line 101, in train_batch
img1s, img2s, labels, masks = geo_aug(img1s, img2s, labels, masks)
File "/anaconda3/envs/maskflownet2/lib/python3.6/site-packages/mxnet/gluon/block.py", line 548, in call
out = self.forward(*args)
File "/anaconda3/envs/maskflownet2/lib/python3.6/site-packages/mxnet/gluon/block.py", line 915, in forward
return self._call_cached_op(x, *args)
File "/anaconda3/envs/maskflownet2/lib/python3.6/site-packages/mxnet/gluon/block.py", line 821, in _call_cached_op
out = self._cached_op(*cargs)
File "/anaconda3/envs/maskflownet2/lib/python3.6/site-packages/mxnet/_ctypes/ndarray.py", line 150, in call
ctypes.byref(out_stypes)))
File "/anaconda3/envs/maskflownet2/lib/python3.6/site-packages/mxnet/base.py", line 253, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: Error in operator geometryaugmentation0_bilinearsampler0: [14:00:56] src/operator/./bilinear_sampler-inl.h:158: Check failed: dshape[0] == lshape[0] (1 vs. 4) :
Can I ask for help? Thank you very much in advance.

Could not find the yaml file

Hello, thanks for your work. I could not find the .yaml config file in your project, which I think is needed for inference. Could you upload it? I want to use it to run inference on my own data with your pretrained model. Thank you.

Questions about the provided checkpoints

Hi,

Thank you very much for providing the source code; it's really awesome.

I have several questions about the provided checkpoints:

  1. For the provided dbbSep30-1206_1000000 checkpoint, the actual validation result seems to differ from the score mentioned in the README (i.e., 2.07 / 4.07 for Sintel): I ran it on the validation set and got 1.47 / 1.90 for Sintel val. There is also an inconsistency between the log file and the provided checkpoint, as the last line of dbbSep30-1206.log is correct. I am guessing that this checkpoint was trained on the whole Sintel dataset; am I correct?

  2. Following the guess in (1), I uploaded test results to see whether the dbbSep30-1206_1000000 checkpoint can reproduce the results reported in the paper and on the website, but there is a gap: I got 4.877 / 3.182 on FINAL and CLEAN, respectively, while the reported numbers are 4.38 / 2.77.

I would appreciate it if you could help clarify these questions and provide checkpoints that reproduce the results.

Thank you for your time and consideration again!

BGR vs RGB input

Hi,

I have noticed that your code mainly uses cv2.imread for image IO, which reads images in BGR format. But in sintel.py, I found a mixed use of skimage.io.imread, which reads RGB. Is this expected?
Though I found that both RGB and BGR work fine, could you clarify the expected input format for the network? What input format did you use for benchmarking?
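
(A small standalone illustration of the mismatch; 'frame.png' is a hypothetical RGB image file:)

import cv2
import skimage.io

# cv2 loads channels as BGR, skimage loads them as RGB; reversing the
# channel axis aligns the two.
img_bgr = cv2.imread('frame.png')          # (H, W, 3), BGR order
img_rgb = skimage.io.imread('frame.png')   # (H, W, 3), RGB order
assert (img_bgr[:, :, ::-1] == img_rgb).all()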

Thanks
Min
