megvii-research / crestereo

Official MegEngine implementation of CREStereo(CVPR 2022 Oral).

License: Apache License 2.0

Languages: Shell 0.82%, Python 97.39%, Dockerfile 1.79%
Topics: stereo-matching, cvpr, dataset, megengine, computer-vision, deep-learning, stereo, stereo-vision

crestereo's Introduction

[CVPR 2022] Practical Stereo Matching via Cascaded Recurrent Network with Adaptive Correlation

This repository contains MegEngine implementation of our paper:

Practical Stereo Matching via Cascaded Recurrent Network with Adaptive Correlation
Jiankun Li, Peisen Wang, Pengfei Xiong, Tao Cai, Ziwei Yan, Lei Yang, Jiangyu Liu, Haoqiang Fan, Shuaicheng Liu
CVPR 2022 (Oral)

Paper | ArXiv | BibTeX

Datasets

The Proposed Dataset

Download

There are two ways to download the dataset (~400 GB) proposed in our paper:

  • Download it with the shell script dataset_download.sh:
sh dataset_download.sh

The dataset will be downloaded and extracted to ./stereo_trainset/crestereo.

  • Download it from BaiduCloud here (extraction code: aa3g) and extract the tar files manually.

Disparity Format

The disparity is saved in .png uint16 format (the stored values are the true disparity multiplied by 32) and can be loaded with OpenCV's imread function:

import cv2
import numpy as np

def get_disp(disp_path):
    # The stored uint16 values are the true disparity scaled by 32.
    disp = cv2.imread(disp_path, cv2.IMREAD_UNCHANGED)
    return disp.astype(np.float32) / 32

Other Public Datasets

Other public datasets we use include SceneFlow, Sintel, Middlebury, ETH3D, KITTI 2012/2015, Falling Things, InStereo2K, and HR-VS.

Dependencies

CUDA Version: 10.1, Python Version: 3.6.9

  • MegEngine v1.8.2
  • opencv-python v3.4.0
  • numpy v1.18.1
  • Pillow v8.4.0
  • tensorboardX v2.1
python3 -m pip install -r requirements.txt

We also provide a Docker image to run the code quickly:

docker run --gpus all -it -v /tmp:/tmp ylmegvii/crestereo
shotwell /tmp/disparity.png

Inference

Download the pretrained MegEngine model from here and run:

python3 test.py --model_path path_to_mge_model --left img/test/left.png --right img/test/right.png --size 1024x1536 --output disparity.png

Training

Modify the configurations in cfgs/train.yaml and run the following command:

python3 train.py

You can launch a TensorBoard to monitor the training process:

tensorboard --logdir ./train_log

and navigate to the page at http://localhost:6006 in your browser.

Acknowledgements

Part of the code is adapted from previous works; we thank all the authors for their awesome repos.

Citation

If you find the code or datasets helpful in your research, please cite:

@inproceedings{li2022practical,
  title={Practical stereo matching via cascaded recurrent network with adaptive correlation},
  author={Li, Jiankun and Wang, Peisen and Xiong, Pengfei and Cai, Tao and Yan, Ziwei and Yang, Lei and Liu, Jiangyu and Fan, Haoqiang and Liu, Shuaicheng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={16263--16272},
  year={2022}
}

crestereo's People

Contributors

diyer22, jacklee396, martinperis, xxr3376, yanziwei


crestereo's Issues

The issue of disparity to depth conversion

Hi, thanks for your work; it is great work.
When I wanted to convert the disparity map into a depth map using the formula Z = b * f / d, I found that the depth was not accurate. How can the disparity map be converted to depth? I measured the disparity of an object at a known distance and derived the value of b * f, and found that this value changes, which does not comply with the formula. Is the value of the disparity map correct?
I really hope for your reply.
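
For reference, a minimal sketch of the standard pinhole conversion (not an official answer from the authors), assuming a rectified pair; focal_length_px and baseline_m are placeholder names for values from your own calibration:

import numpy as np

def disparity_to_depth(disp, focal_length_px, baseline_m):
    # disp: disparity in pixels. For the dataset .png files, divide the raw
    # uint16 values by 32 first; for the network output, use the prediction as-is.
    disp = np.asarray(disp, dtype=np.float32)
    depth = np.zeros_like(disp)
    valid = disp > 0
    depth[valid] = focal_length_px * baseline_m / disp[valid]  # Z = f * b / d
    return depth

Note that if the images are resized before inference, the predicted disparity should be multiplied by the ratio of the original width to the inference width before applying the formula; otherwise the apparent b * f value will change with the resize factor.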

Datasets in training and schedule

dataset = CREStereoDataset(args.training_data_path)

Thank you for supplying this code and training procedure!
In the paper (and the git readme), you say you train using other datasets as well ([SceneFlow], [Sintel], [Middlebury], [ETH3D], [KITTI 2012/2015], [Falling Things], [InStereo2K], [HR-VS]).
Yet, in train.py, you only refer to your CREStereo dataset.
Can you elaborate? Do you train on the other datasets before this stage, or after?

Thank you!

Results on Holopix50k dataset

Hello! Thank you for sharing the codes and the model.
I tested the pre-trained model on the Holopix50k test dataset, but didn't get results similar to those shown in the paper.
If I run the crestereo_eth3d.mge model on this dataset, does it require different parameter settings or pre-processing? How can I get similar results on the Holopix50k dataset?
Any advice would be very helpful. Thank you in advance!
(attached: example result images 0001, 0002, 0007, 0008)

SceneFlow pre-trained model

This is a really promising project. Can you please release the SceneFlow pre-trained model? Thank you very much!

Get poor results on holopix 50k

Hi, I can get good results on other datasets, but on Holopix50k the generated disparity does not make much sense.

(attached: example results on DIML and on Holopix50k)

What datasets are used for pretraining?

The pretrained model works amazingly well on real-life photos! What datasets are used for pretraining?
Can you please provide the training details of the pretrained model? Thanks!

Predicting disparity for the right image

Hi, I am trying out your model and the results are awesome when predicting the disparity for images from the left camera. However, I have tried swapping the left and right input images to also get a prediction for images from the right camera, and the performance declines drastically. Am I doing something wrong? What would you recommend?

I have also tried projecting the estimation for the left image on the right image, but of course this leaves holes in the prediction :)
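
For reference, a common workaround (not something this repo documents) is to horizontally flip both images and swap their roles, so the network still sees a standard left-right pair, and then flip the result back. A minimal sketch, assuming the inference function from test.py is in scope:

import cv2

def infer_right_disparity(left, right, model, n_iter=20):
    # Flip both views horizontally and swap them so the right image plays the
    # role of the left view, then flip the predicted disparity back.
    disp_flipped = inference(cv2.flip(right, 1), cv2.flip(left, 1), model, n_iter=n_iter)
    return cv2.flip(disp_flipped, 1)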

nan

2022/06/01 14:17:17 Model params saved: train_logs/models/epoch-1.mge
2022/06/01 14:17:25 0.66 b/s,passed:00:13:16,eta:21:41:36,data_time:0.16,lr:0.0004,[2/100:5/500] ==> loss:26.19
2022/06/01 14:17:32 0.65 b/s,passed:00:13:24,eta:21:40:40,data_time:0.17,lr:0.0004,[2/100:10/500] ==> loss:6.847
2022/06/01 14:17:40 0.68 b/s,passed:00:13:31,eta:21:39:57,data_time:0.14,lr:0.0004,[2/100:15/500] ==> loss:6.83
2022/06/01 14:17:47 0.67 b/s,passed:00:13:39,eta:21:39:12,data_time:0.16,lr:0.0004,[2/100:20/500] ==> loss:16.89
2022/06/01 14:17:55 0.66 b/s,passed:00:13:46,eta:21:38:28,data_time:0.17,lr:0.0004,[2/100:25/500] ==> loss:43.18
2022/06/01 14:18:02 0.66 b/s,passed:00:13:54,eta:21:37:36,data_time:0.17,lr:0.0004,[2/100:30/500] ==> loss:20.37
2022/06/01 14:18:10 0.65 b/s,passed:00:14:01,eta:21:36:52,data_time:0.18,lr:0.0004,[2/100:35/500] ==> loss:15.24
2022/06/01 14:18:17 0.65 b/s,passed:00:14:09,eta:21:36:18,data_time:0.19,lr:0.0004,[2/100:40/500] ==> loss:9.399
2022/06/01 14:18:25 0.67 b/s,passed:00:14:16,eta:21:35:41,data_time:0.16,lr:0.0004,[2/100:45/500] ==> loss:40.27
2022/06/01 14:18:32 0.68 b/s,passed:00:14:24,eta:21:34:58,data_time:0.14,lr:0.0004,[2/100:50/500] ==> loss:15.02
2022/06/01 14:18:40 0.69 b/s,passed:00:14:31,eta:21:34:14,data_time:0.14,lr:0.0004,[2/100:55/500] ==> loss:32.48
2022/06/01 14:18:47 0.65 b/s,passed:00:14:39,eta:21:33:42,data_time:0.18,lr:0.0004,[2/100:60/500] ==> loss:9.96
2022/06/01 14:18:55 0.65 b/s,passed:00:14:46,eta:21:33:16,data_time:0.18,lr:0.0004,[2/100:65/500] ==> loss:14.69
2022/06/01 14:19:02 0.68 b/s,passed:00:14:54,eta:21:32:35,data_time:0.13,lr:0.0004,[2/100:70/500] ==> loss:nan
2022/06/01 14:19:10 0.65 b/s,passed:00:15:01,eta:21:31:55,data_time:0.19,lr:0.0004,[2/100:75/500] ==> loss:nan
2022/06/01 14:19:17 0.68 b/s,passed:00:15:09,eta:21:31:14,data_time:0.15,lr:0.0004,[2/100:80/500] ==> loss:nan
2022/06/01 14:19:25 0.67 b/s,passed:00:15:16,eta:21:30:34,data_time:0.15,lr:0.0004,[2/100:85/500] ==> loss:nan
2022/06/01 14:19:32 0.67 b/s,passed:00:15:24,eta:21:30:08,data_time:0.17,lr:0.0004,[2/100:90/500] ==> loss:nan
2022/06/01 14:19:40 0.69 b/s,passed:00:15:31,eta:21:29:28,data_time:0.14,lr:0.0004,[2/100:95/500] ==> loss:nan
2022/06/01 14:19:47 0.65 b/s,passed:00:15:39,eta:21:28:54,data_time:0.17,lr:0.0004,[2/100:100/500] ==> loss:nan
2022/06/01 14:19:55 0.68 b/s,passed:00:15:46,eta:21:28:11,data_time:0.14,lr:0.0004,[2/100:105/500] ==> loss:nan
2022/06/01 14:20:02 0.65 b/s,passed:00:15:54,eta:21:27:38,data_time:0.17,lr:0.0004,[2/100:110/500] ==> loss:nan
2022/06/01 14:20:10 0.64 b/s,passed:00:16:01,eta:21:27:04,data_time:0.2,lr:0.0004,[2/100:115/500] ==> loss:nan
2022/06/01 14:20:17 0.67 b/s,passed:00:16:09,eta:21:26:28,data_time:0.16,lr:0.0004,[2/100:120/500] ==> loss:nan
2022/06/01 14:20:25 0.66 b/s,passed:00:16:16,eta:21:26:04,data_time:0.17,lr:0.0004,[2/100:125/500] ==> loss:nan
2022/06/01 14:20:32 0.68 b/s,passed:00:16:24,eta:21:25:20,data_time:0.15,lr:0.0004,[2/100:130/500] ==> loss:nan

Hello!
These are my training logs. Why does the loss become NaN?

About Model Conversion!

Thanks for open-sourcing this!

Your project is very interesting. I would like to ask: can mgeconvert be used to convert this model to ONNX, or is there another way?

Looking forward to your reply!

Finetuning: in the second batch, the loss is nan

Hi, it's really nice work! But when I fine-tune starting from your pre-trained model, the loss in the second batch becomes NaN. I checked the data input to the model: the left and right images are the original data without any preprocessing, and the disparity is the absolute value. I don't know where the problem is. Can you offer some advice? Thanks.
The log follows:

left.max(), left.min():
Tensor(255.0, device=xpux:0) Tensor(0.0, device=xpux:0)
right.max(), right.min():
Tensor(255.0, device=xpux:0) Tensor(0.0, device=xpux:0)
gt_disp.max(), gt_disp.min():
Tensor(65.625, device=xpux:0) Tensor(0.0, device=xpux:0)
valid_mask.max(), valid_mask.min():
Tensor(1.0, device=xpux:0) Tensor(0.0, device=xpux:0)
The i-th iteration prediction loss :
0 Tensor(68.409615, device=xpux:0) Tensor(-0.72061765, device=xpux:0)
1 Tensor(69.27495, device=xpux:0) Tensor(-7.1237144, device=xpux:0)
2 Tensor(68.630264, device=xpux:0) Tensor(-2.3412788, device=xpux:0)
3 Tensor(67.001595, device=xpux:0) Tensor(-0.64989996, device=xpux:0)
4 Tensor(67.27512, device=xpux:0) Tensor(-0.53194094, device=xpux:0)
5 Tensor(66.031105, device=xpux:0) Tensor(-1.1353028, device=xpux:0)
6 Tensor(66.7748, device=xpux:0) Tensor(-2.5566366, device=xpux:0)
7 Tensor(66.69823, device=xpux:0) Tensor(-0.30609164, device=xpux:0)
8 Tensor(66.8682, device=xpux:0) Tensor(-0.37459654, device=xpux:0)
9 Tensor(66.893974, device=xpux:0) Tensor(-0.80092835, device=xpux:0)
10 Tensor(66.295364, device=xpux:0) Tensor(-1.110324, device=xpux:0)
11 Tensor(67.22122, device=xpux:0) Tensor(-3.059827, device=xpux:0)
12 Tensor(66.74182, device=xpux:0) Tensor(-0.807206, device=xpux:0)
13 Tensor(66.88104, device=xpux:0) Tensor(-0.45083997, device=xpux:0)
14 Tensor(67.27106, device=xpux:0) Tensor(-0.62685704, device=xpux:0)
15 Tensor(67.43465, device=xpux:0) Tensor(-0.7094991, device=xpux:0)
16 Tensor(67.55379, device=xpux:0) Tensor(-0.38040105, device=xpux:0)
17 Tensor(67.453476, device=xpux:0) Tensor(-1.5267422, device=xpux:0)
18 Tensor(67.46704, device=xpux:0) Tensor(-0.3359019, device=xpux:0)
19 Tensor(67.47497, device=xpux:0) Tensor(-0.32194442, device=xpux:0)
Tensor(255.0, device=xpux:0) Tensor(0.0, device=xpux:0)
Tensor(255.0, device=xpux:0) Tensor(0.0, device=xpux:0)
Tensor(69.34766, device=xpux:0) Tensor(0.0, device=xpux:0)
Tensor(1.0, device=xpux:0) Tensor(0.0, device=xpux:0)
0 Tensor(nan, device=xpux:0) Tensor(nan, device=xpux:0)
1 Tensor(nan, device=xpux:0) Tensor(nan, device=xpux:0)
2 Tensor(nan, device=xpux:0) Tensor(nan, device=xpux:0)
3 Tensor(nan, device=xpux:0) Tensor(nan, device=xpux:0)
4 Tensor(nan, device=xpux:0) Tensor(nan, device=xpux:0)
5 Tensor(nan, device=xpux:0) Tensor(nan, device=xpux:0)
6 Tensor(nan, device=xpux:0) Tensor(nan, device=xpux:0)
7 Tensor(nan, device=xpux:0) Tensor(nan, device=xpux:0)
8 Tensor(nan, device=xpux:0) Tensor(nan, device=xpux:0)
9 Tensor(nan, device=xpux:0) Tensor(nan, device=xpu

MegEngine 1.9.0 causes test.py error

I have been playing around a bit with the code (thank you so much, by the way. Having heaps of fun with it) and found out that MegEngine 1.9.0 causes test.py to die with the following output:

Images resized: 1024x1536
Model Forwarding...
Traceback (most recent call last):
  File "test.py", line 94, in <module>
    pred = inference(left_img, right_img, model_func, n_iter=20)
  File "test.py", line 45, in inference
    pred_flow_dw2 = model(imgL_dw2, imgR_dw2, iters=n_iter, flow_init=None)
  File "/usr/local/lib/python3.6/dist-packages/megengine/module/module.py", line 149, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "/home/dgxmartin/workspace/CREStereo/nets/crestereo.py", line 210, in forward
    align_corners=True,
  File "/usr/local/lib/python3.6/dist-packages/megengine/functional/vision.py", line 663, in interpolate
    [wscale, Tensor([0, 0], dtype="float32", device=inp.device)], axis=0
  File "/usr/local/lib/python3.6/dist-packages/megengine/functional/tensor.py", line 405, in concat
    (result,) = apply(builtin.Concat(axis=axis, comp_node=device.to_c()), *inps)
TypeError: py_apply expects tensor as inputs

For the time being, the MegEngine version should be pinned to exactly 1.8.2.

The effect of distortion on results?

Excuse me: if the images are distorted, or the binocular stereo rectification still leaves some distortion, will the depth recovered from disparity be significantly affected? For example, stitching point clouds from multiple frames produces multiple overlapping layers of points (the rotation and translation poses are not the problem).

Dataset question

Why are the disparity maps in your synthetic dataset completely black?
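
(For what it's worth, the ground-truth .png files store the disparity multiplied by 32 as uint16, so most values are tiny compared to the 65535 range and a normal image viewer shows them as nearly black. A minimal sketch for visualizing one, using a hypothetical file name:)

import cv2
import numpy as np

# "example_disp.png" is a placeholder for one of the dataset's disparity files.
disp = cv2.imread("example_disp.png", cv2.IMREAD_UNCHANGED).astype(np.float32) / 32.0
vis = (disp - disp.min()) / (disp.max() - disp.min() + 1e-6) * 255.0
cv2.imwrite("disp_vis.png", vis.astype(np.uint8))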

Assistance Requested: Issues Encountered with train.py Script in CREStereo Repository

Hello,

I hope this message finds you well. I am currently working on a project that involves the train.py script from the CREStereo repository. However, I have encountered some issues while running the script and would like to seek assistance from the community.

The challenges I am facing are as follows:

1. Issue with Batch Size: The train.py script only seems to work with a batch size of 1. When attempting to use a batch size other than 1, the script fails to execute properly. I would appreciate guidance on how to make the script compatible with different batch sizes.

2. RuntimeError: cuda error 700: an illegal memory access was encountered (cudaMemcpyAsync( device_ptr, host_ptr, size, cudaMemcpyHostToDevice, m_env.cuda_env().stream) at ../../../../../../src/core/impl/comp_node/cuda/comp_node.cpp:copy_to_device:230)

If anyone in the community has experience working with the train.py script and has successfully addressed these issues, I kindly request your guidance and assistance. Any insights regarding the correct configurations, dependencies, or steps needed to overcome these challenges would be greatly appreciated.

Thank you for your attention and support. I am eagerly looking forward to hearing from you or anyone who can provide valuable assistance in resolving these issues.

Colab or Huggingface demo?

Thanks for sharing this great work!
Would you consider making a Google Colab notebook or Huggingface demo of this code so that the less technically inclined like myself can try it out?
Thanks!

Model initialization takes a long time

I'm running python test.py.
In load_model(), the line
model = Model(max_disp=256, mixed_precision=False, test_mode=True)
takes a long time, about 30 minutes.
My computer: 10900k, rtx3090, 32G RAM
top info:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8825 xwl 20 0 9.929g 1.122g 250360 S 100.3 3.6 8:01.38 interpreter

Is CUDA 11.6 supported?

This is a really promising project, congratulations and thanks for releasing it!

I'm trying to run the test script with your Eth3d model and this command:
python3 test.py --model_path path_to_mge_model --left img/test/left.png --right img/test/right.png --size 1024x1536 --output disparity.png

But the code hangs up and doesn't return from this line in extractor.py:82:
self.conv2 = M.Conv2d(128, output_dim, kernel_size=1)

which is called from load_model in test.py:15
model = Model(max_disp=256, mixed_precision=False, test_mode=True)

My GPU is NVIDIA RTX A6000 and the CUDA version on the system is v11.6

CREStereo not able to run inside thread with Python

I do not seem to be able to run inference with CREStereo inside a thread using Python's threading module. Below is a minimal example based on the test.py script from this repo. It loads the pretrained model and runs inference in a child thread (lines 96-98). Attached is the error that appears when this is run:
(attached: screenshot of the error, CREStereo_thread_error)

import os

import megengine as mge
import megengine.functional as F
import argparse
import numpy as np
import cv2

from nets import Model

#NOTE: added threading import statement
import threading

def load_model(model_path):
    print("Loading model:", os.path.abspath(model_path))
    pretrained_dict = mge.load(model_path)
    model = Model(max_disp=256, mixed_precision=False, test_mode=True)

    model.load_state_dict(pretrained_dict["state_dict"], strict=True)

    model.eval()
    return model


def inference(left, right, model, n_iter=20):
    imgL = left.transpose(2, 0, 1)
    imgR = right.transpose(2, 0, 1)
    imgL = np.ascontiguousarray(imgL[None, :, :, :])
    imgR = np.ascontiguousarray(imgR[None, :, :, :])

    imgL = mge.tensor(imgL).astype("float32")
    imgR = mge.tensor(imgR).astype("float32")

    imgL_dw2 = F.nn.interpolate(
        imgL,
        size=(imgL.shape[2] // 2, imgL.shape[3] // 2),
        mode="bilinear",
        align_corners=True,
    )
    imgR_dw2 = F.nn.interpolate(
        imgR,
        size=(imgL.shape[2] // 2, imgL.shape[3] // 2),
        mode="bilinear",
        align_corners=True,
    )
    pred_flow_dw2 = model(imgL_dw2, imgR_dw2, iters=n_iter, flow_init=None)

    pred_flow = model(imgL, imgR, iters=n_iter, flow_init=pred_flow_dw2)
    pred_disp = F.squeeze(pred_flow[:, 0, :, :]).numpy()

    return pred_disp


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="A demo to run CREStereo.")
    parser.add_argument(
        "--model_path",
        default="crestereo_eth3d.mge",
        help="The path of pre-trained MegEngine model.",
    )
    parser.add_argument(
        "--left", default="img/test/left.png", help="The path of left image."
    )
    parser.add_argument(
        "--right", default="img/test/right.png", help="The path of right image."
    )
    parser.add_argument(
        "--size",
        default="1024x1536",
        help="The image size for inference. Te default setting is 1024x1536. \
                        To evaluate on ETH3D Benchmark, use 768x1024 instead.",
    )
    parser.add_argument(
        "--output", default="disparity.png", help="The path of output disparity."
    )
    args = parser.parse_args()

    assert os.path.exists(args.model_path), "The model path do not exist."
    assert os.path.exists(args.left), "The left image path do not exist."
    assert os.path.exists(args.right), "The right image path do not exist."

    model_func = load_model(args.model_path)
    left = cv2.imread(args.left)
    right = cv2.imread(args.right)

    assert left.shape == right.shape, "The input images have inconsistent shapes."

    in_h, in_w = left.shape[:2]

    print("Images resized:", args.size)
    eval_h, eval_w = [int(e) for e in args.size.split("x")]
    left_img = cv2.resize(left, (eval_w, eval_h), interpolation=cv2.INTER_LINEAR)
    right_img = cv2.resize(right, (eval_w, eval_h), interpolation=cv2.INTER_LINEAR)

    #NOTE: put inference in a thread here
    inference_thread = threading.Thread(target=inference, args=(left_img, right_img, model_func,))
    inference_thread.start()
    inference_thread.join()

Disparity with uint16 format

Hi, thanks for your work; it is great work.
I want to generate a 3D point cloud from the disparity output of your script, but for this I need the disparity in 16-bit. As mentioned in the readme, the dataset disparity is saved in 16-bit, but when I checked test.py lines 105 to 114
I saw that the disparity is saved as 8-bit. Anyway, I commented out those lines to save the raw predicted disparity.
However, when I tried to produce a 3D point cloud from this disparity, I ended up with a very coarse, discrete point cloud (see image), which is most likely caused by using 8 bits instead of 16; it seems that even the raw predicted disparity is effectively 8-bit.
I verified my script and the camera calibration to make sure the problem comes from the 8-bit disparity.
Could you please let me know whether it is possible to save the disparity in 16-bit correctly?

Lines 107 to 117 (my modification):

disp_vis = inference(left_img, right_img, model_func, n_iter=20)
# disp_vis = (disp - disp.min()) / (disp.max() - disp.min()) * 255.0
# disp_vis = disp_vis.astype("uint8")
# disp_vis = disp_vis.astype(np.uint16)
# disp_vis = cv2.applyColorMap(disp_vis, cv2.COLORMAP_INFERNO)
parent_path = os.path.abspath(os.path.join(args.output, os.pardir))
if not os.path.exists(parent_path):
    os.makedirs(parent_path)
cv2.imwrite(args.output, disp_vis)

(attached: Picture1, the resulting point cloud)
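
For reference, a minimal sketch (not part of the repo) of writing the raw prediction as a 16-bit PNG using the same *32 convention as the dataset ground truth, so it can be read back with the get_disp function from the README; disp here is the float output of inference:

import cv2
import numpy as np

disp = inference(left_img, right_img, model_func, n_iter=20)  # float32 disparity
# Scale by 32 and store as uint16, matching the dataset format; recover the
# float disparity later with cv2.imread(path, cv2.IMREAD_UNCHANGED) / 32.
disp_u16 = np.clip(disp * 32.0, 0, 65535).astype(np.uint16)
cv2.imwrite(args.output, disp_u16)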

Running CREStereo with the latest version of MegEngine

Hello,

I hope you are doing well. I am interested in using CREStereo with the latest version of MegEngine for my project. However, I have some questions regarding the compatibility and requirements of the environment. I would appreciate it if you could provide some guidance.

1. Is the latest version of CREStereo fully compatible with the current version of MegEngine?
2. Are there any specific dependencies or library versions required to ensure smooth integration between CREStereo and MegEngine?
3. Are there any known issues or considerations when setting up the environment for running CREStereo with MegEngine?

Any information or insights you can provide would be very helpful. If there are any documentation or resources available that address these concerns, please let me know.

Thank you for your time and assistance. I look forward to your response.

WRN Not FormattedTensorValue input for AttachGrad op: AttachGradValue{key=grad_1}

Thank you for the excellent work!
I have a problem: I fine-tuned the model using my own data, but it gets stuck in step 2 at
flow_predictions = model(left, right)
After one optimizer.step().clear_grad(), the network cannot run inference on any image.
I used gdb to debug and found that it gets stuck in random layers during the network forward pass.

I checked that my data is correct. Even with the same data, the model gets stuck after one optimizer.step().clear_grad().
Do you have any suggestions?

I upgraded MegEngine from 1.9.1 to 1.11.1 and the model can now train without getting stuck.
However, the first time optimizer.step().clear_grad() is called, it prints:

WRN Not FormattedTensorValue input for AttachGrad op: AttachGradValue{key=grad_1}, (49342:49342) Handle{ptr=0x5616b860dd58, name="update_block.encoder.conv.bias"}

The parameter updates are abnormal and the results are worse.
Has anyone met the same problem, or does anyone have a suggestion?

Dataset for reproducing the results

Thank you for the great work!
Is it possible, or is there a plan, to release the datasets used for training the model so that others can reproduce the results reported in the paper?

Can this be used for 3D reconstruction?

Can this project be used for 3D reconstruction? Compared with the traditional SGM method, the network output lacks a cost map, so depth map fusion cannot be performed.

About the ground truth of Holopix50K

Hello, thank you for CREStereo. I would like to ask about the prediction results on Holopix50K: was the model pretrained on Holopix50K? If it was, how was the ground truth for this dataset obtained?

CREStereo Dataset

Dear authors,
thanks a lot for your paper, code and trained models.

We have seen that your model generalizes well to common stereo cameras. However, we are currently working with stereo cameras that have a rather large baseline when compared to other commercial models and non-parallel image planes.

Are you planning to release the code and environments for generating the CREStereo Dataset at some point? This would help us generate data with specs closer to our sensor setup to retrain your model.

Thanks a lot in advance!

A Problem About Code

Thanks for your great work!
I met a problem in the code: CREStereo always uses negated flow predictions from the lower-resolution RUM, e.g. "flow_dw8 = -scale * F.nn.interpolate" in the code below.
This seems strange and not consistent with the paper. Could you explain the reason for it?

# Recurrent Update Module
            # RUM: 1/16
            for itr in range(iters // 2):
                if itr % 2 == 0:
                    small_patch = False
                else:
                    small_patch = True

                flow_dw16 = flow_dw16.detach()
                out_corrs = corr_fn_att_dw16(
                    flow_dw16, offset_dw16, small_patch=small_patch
                )

                with amp.autocast(enabled=self.mixed_precision):
                    net_dw16, up_mask, delta_flow = self.update_block(
                        net_dw16, inp_dw16, out_corrs, flow_dw16
                    )

                flow_dw16 = flow_dw16 + delta_flow
                flow = self.convex_upsample(flow_dw16, up_mask, rate=4)
                flow_up = -4 * F.nn.interpolate(
                    flow,
                    size=(4 * flow.shape[2], 4 * flow.shape[3]),
                    mode="bilinear",
                    align_corners=True,
                )
                predictions.append(flow_up)

            scale = fmap1_dw8.shape[2] / flow.shape[2]
            flow_dw8 = -scale * F.nn.interpolate(
                flow,
                size=(fmap1_dw8.shape[2], fmap1_dw8.shape[3]),
                mode="bilinear",
                align_corners=True,
            )

TypeError: pad() got an unexpected keyword argument 'pad_witdth' in test.py

Good job! May I ask a question?

I tried to run test.py on a V100 with CUDA version 10.2. The data is from ./img, and I set the size to 1280x720, the same as the original size. But I get the following error:

File "CREStereo/nets/corr.py", line 42, in get_correlation (0, 0), (0, 0), (pady, pady), (padx, padx)), mode="replicate") TypeError: pad() got an unexpected keyword argument 'pad_witdth'

This means I may be passing the wrong argument, but I checked the code and did not find the problem:
def pad(
    src: Tensor,
    pad_width: Tuple[Tuple[int, int], ...],
    mode: str = "constant",
    constant_value: float = 0.0,
) -> Tensor:
    r"""Pads the input tensor.

    Args:
        pad_width: A tuple. Each element in the tuple is a tuple of 2 elements,
            representing the padding size on both sides of the current dimension, ``(front_offset, back_offset)``
        mode: One of the following string values. Default: ``'constant'``

            * ``'constant'``: Pads with a constant value.
            * ``'reflect'``: Pads with the reflection of the tensor mirrored on the first and last values of the tensor along each axis.
            * ``'replicate'``: Pads with the edge values of tensor.
        constant_val: Fill value for ``'constant'`` padding. Default: 0

    Examples:
        >>> import numpy as np
        >>> inp = Tensor([[1., 2., 3.],[4., 5., 6.]])
        >>> inp
        Tensor([[1. 2. 3.]
         [4. 5. 6.]], device=xpux:0)
        >>> F.nn.pad(inp, pad_width=((1, 1),), mode="constant")

I passed the correct tuple type, but something still went wrong.

Pretrained Middlebury weights

Amazing work!
I understand from the name of the weights file that the published weights are those trained for the ETH3D dataset.
Is it possible to publish the weights for Middlebury as well?
Thanks!

testing result is better in RGB format than default BGR format?

I am testing the provided model. By default, the input is in BGR format since it uses cv2.imread. I found that if the images are converted to RGB with cv2.COLOR_BGR2RGB, the depth map is even better. I checked the training code; it also reads images using cv2.imread (i.e., BGR). So I am wondering why this is the case. Has the author or anyone else seen a similar phenomenon?
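
For anyone who wants to reproduce the observation, a minimal sketch of swapping the channel order before inference, assuming left_img and right_img were read with cv2.imread as in test.py:

import cv2

# cv2.imread returns BGR; convert to RGB before running inference.
left_rgb = cv2.cvtColor(left_img, cv2.COLOR_BGR2RGB)
right_rgb = cv2.cvtColor(right_img, cv2.COLOR_BGR2RGB)
pred_rgb = inference(left_rgb, right_rgb, model_func, n_iter=20)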

Model size and number of params?

Hey, great job!

Have you ever compared the model size and number of parameters with other SOTA works, such as LEAStereo, RAFT-Stereo, etc.? Your model seems very compact.
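
For reference, a minimal sketch for counting the parameters yourself (train.py also logs a parameter count at startup), under the assumption that the Model class from nets is importable as in test.py:

import numpy as np
from nets import Model

model = Model(max_disp=256, mixed_precision=False, test_mode=True)
# Sum the element counts of all parameter tensors.
n_params = sum(int(np.prod(p.shape)) for p in model.parameters())
print(f"Parameters: {n_params} ({n_params / 1e6:.2f} M)")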

corr.py syntax error

Lines 39 and 40 have a typo that prevents the code from running:

right_pad = F.pad(right_feature, pad_witdth=(
    (0, 0), (0, 0), (pady, pady), (padx, padx)), mode="replicate")

Basically, change "pad_witdth" to "pad_width"; the word "width" is just misspelled.

RuntimeError: bad input shape for polyadic operator

Has anyone encountered this problem? It occurred when I tried to train. I think the dataset setup is probably wrong; does anyone know how to fix it?
Environment: Windows 10
MegEngine version: mge 1.8.2

err: failed to load cuda func: cuDeviceGetNvSciSyncAttributes
2022/08/04 11:50:55 Use 1 GPU(s)
2022/08/04 11:50:55 Params: 5432948
2022/08/04 11:50:55 Dataset size: 5000
Traceback (most recent call last):
File "c:/CREStereo-master/train.py", line 309, in
run(args)
File "c:/CREStereo-master/train.py", line 207, in main
flow_predictions = model(left, right)
File "C:\Users\wd\AppData\Local\Programs\Python\Python38\lib\site-packages\megengine\module\module.py", line 149, in call
outputs = self.forward(*inputs, **kwargs)
File "c:\CREStereo-master\nets\crestereo.py", line 263, in forward
out_corrs = corr_fn(flow, None, small_patch=small_patch, iter_mode=True)
File "c:\CREStereo-master\nets\corr.py", line 25, in call
corr = self.corr_iter(self.fmap1, self.fmap2, flow, small_patch)
File "c:\CREStereo-master\nets\corr.py", line 72, in corr_iter
corr = self.get_correlation(
File "c:\CREStereo-master\nets\corr.py", line 48, in get_correlation
corr_mean = F.mean(left_feature * right_slid, axis=1, keepdims=True)
File "C:\Users\wd\AppData\Local\Programs\Python\Python38\lib\site-packages\megengine\core\tensor\array_method.py", line 176, in f
return _elwise(self, value, mode=mode)
File "C:\Users\wd\AppData\Local\Programs\Python\Python38\lib\site-packages\megengine\core\tensor\array_method.py", line 73, in _elwise
return _elwise_apply(args, mode)
File "C:\Users\wd\AppData\Local\Programs\Python\Python38\lib\site-packages\megengine\core\tensor\array_method.py", line 36, in _elwise_apply (result,) = apply(op, *args)
RuntimeError: bad input shape for polyadic operator: {2,64,128,96}, {18,64,128,96}

backtrace:
(frames 2 through 11 are all null)

GPU memory usage is too large

@zsc Thank you for sharing! As your paper says, you can train with batch size 16 on 8 2080 Ti GPUs using the PyTorch framework. But when I train your network, GPU memory usage is as large as 8.5 GB with batch size 1. What is the problem?
