megvii-research / crestereo

Official MegEngine implementation of CREStereo(CVPR 2022 Oral).

License: Apache License 2.0

Languages: Shell 0.82%, Python 97.39%, Dockerfile 1.79%
Topics: stereo-matching, cvpr, dataset, megengine, computer-vision, deep-learning, stereo, stereo-vision

crestereo's Introduction

[CVPR 2022] Practical Stereo Matching via Cascaded Recurrent Network with Adaptive Correlation

This repository contains MegEngine implementation of our paper:

Practical Stereo Matching via Cascaded Recurrent Network with Adaptive Correlation
Jiankun Li, Peisen Wang, Pengfei Xiong, Tao Cai, Ziwei Yan, Lei Yang, Jiangyu Liu, Haoqiang Fan, Shuaicheng Liu
CVPR 2022 (Oral)

Paper | ArXiv | BibTeX

Datasets

The Proposed Dataset

Download

There are two ways to download the dataset (~400 GB) proposed in our paper:

  • Download it with the shell script dataset_download.sh:
sh dataset_download.sh

The dataset will be downloaded and extracted to ./stereo_trainset/crestereo.

  • Download it from BaiduCloud here (extraction code: aa3g) and extract the tar files manually.

Disparity Format

The disparity is saved in .png uint16 format (the stored values are the true disparity multiplied by 32) and can be loaded with OpenCV's imread function:

import cv2
import numpy as np

def get_disp(disp_path):
    # The stored uint16 values are the true disparity scaled by 32.
    disp = cv2.imread(disp_path, cv2.IMREAD_UNCHANGED)
    return disp.astype(np.float32) / 32

Other Public Datasets

Other public datasets we use include SceneFlow, Sintel, Middlebury, ETH3D, KITTI 2012/2015, Falling Things, InStereo2K, and HR-VS.

Dependencies

CUDA Version: 10.1, Python Version: 3.6.9

  • MegEngine v1.8.2
  • opencv-python v3.4.0
  • numpy v1.18.1
  • Pillow v8.4.0
  • tensorboardX v2.1
python3 -m pip install -r requirements.txt

We also provide a Docker image to run the code quickly:

docker run --gpus all -it -v /tmp:/tmp ylmegvii/crestereo
shotwell /tmp/disparity.png

Inference

Download the pretrained MegEngine model from here and run:

python3 test.py --model_path path_to_mge_model --left img/test/left.png --right img/test/right.png --size 1024x1536 --output disparity.png

Training

Modify the configurations in cfgs/train.yaml and run the following command:

python3 train.py

You can launch a TensorBoard to monitor the training process:

tensorboard --logdir ./train_log

and navigate to the page at http://localhost:6006 in your browser.

Acknowledgements

Part of the code is adapted from previous works; we thank all the authors for their awesome repos.

Citation

If you find the code or datasets helpful in your research, please cite:

@inproceedings{li2022practical,
  title={Practical stereo matching via cascaded recurrent network with adaptive correlation},
  author={Li, Jiankun and Wang, Peisen and Xiong, Pengfei and Cai, Tao and Yan, Ziwei and Yang, Lei and Liu, Jiangyu and Fan, Haoqiang and Liu, Shuaicheng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={16263--16272},
  year={2022}
}

crestereo's People

Contributors

diyer22, jacklee396, martinperis, xxr3376, yanziwei


crestereo's Issues

The issue of disparity to depth conversion

Hi, thanks for your work; it is great work.
When I wanted to convert the disparity map into a depth map using the formula Z = b * f / d, I found that the depth was not accurate. How can the disparity map be converted to depth? I measured the disparity of an object at a known distance and derived the value of b * f, and found that this value changes, which does not comply with the formula. Is the value of the disparity map correct?
I really hope for your reply.
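
For reference, a minimal sketch of the standard pinhole conversion (not an official answer from the authors), assuming a rectified pair; focal_length_px and baseline_m are placeholder names for values from your own calibration:

import numpy as np

def disparity_to_depth(disp, focal_length_px, baseline_m):
    # disp: disparity in pixels. For the dataset .png files, divide the raw
    # uint16 values by 32 first; for the network output, use the prediction as-is.
    disp = np.asarray(disp, dtype=np.float32)
    depth = np.zeros_like(disp)
    valid = disp > 0
    depth[valid] = focal_length_px * baseline_m / disp[valid]  # Z = f * b / d
    return depth

Note that if the images are resized before inference, the predicted disparity should be multiplied by the ratio of the original width to the inference width before applying the formula; otherwise the apparent b * f value will change with the resize factor.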

Datasets in training and schedule

dataset = CREStereoDataset(args.training_data_path)

Thank you for supplying this code and training procedure!
In the paper (and the git readme), you say you train using other datasets as well ([SceneFlow], [Sintel], [Middlebury], [ETH3D], [KITTI 2012/2015], [Falling Things], [InStereo2K], [HR-VS]).
Yet, in train.py, you only refer to your CREStereo dataset.
Can you elaborate? Do you train on the other datasets before this stage, or after?

Thank you!

Results on Holopix50k dataset

Hello! Thank you for sharing the codes and the model.
I tested the pre-trained model on the Holopix50k test dataset, but didn't get results similar to those shown in the paper.
If I run the crestereo_eth3d.mge model on this dataset, does it require different parameter settings or pre-processing? How can I get similar results on the Holopix50k dataset?
Any advice would be very helpful. Thank you in advance!
(attached: example result images 0001, 0002, 0007, 0008)

SceneFlow pre-trained model

This is a really promising project. Can you please release the SceneFlow pre-trained model? Thank you very much!

Get poor results on holopix 50k

Hi, I can get good results on other datasets, but on Holopix50k the generated disparity does not make much sense.

(attached: example results on DIML and on Holopix50k)

What datasets are used for pretraining?

The pretrained model works amazingly well on real-life photos! What datasets are used for pretraining?
Can you please provide the training details of the pretrained model? Thanks!

Predicting disparity for the right image

Hi, I am trying out your model and the results are awesome when predicting the disparity for images from the left camera. However, I have tried swapping the left and right input images to also get a prediction for images from the right camera, and the performance declines drastically. Am I doing something wrong? What would you recommend?

I have also tried projecting the estimation for the left image on the right image, but of course this leaves holes in the prediction :)
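
For reference, a common workaround (not something this repo documents) is to horizontally flip both images and swap their roles, so the network still sees a standard left-right pair, and then flip the result back. A minimal sketch, assuming the inference function from test.py is in scope:

import cv2

def infer_right_disparity(left, right, model, n_iter=20):
    # Flip both views horizontally and swap them so the right image plays the
    # role of the left view, then flip the predicted disparity back.
    disp_flipped = inference(cv2.flip(right, 1), cv2.flip(left, 1), model, n_iter=n_iter)
    return cv2.flip(disp_flipped, 1)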

nan

2022/06/01 14:17:17 Model params saved: train_logs/models/epoch-1.mge
2022/06/01 14:17:25 0.66 b/s,passed:00:13:16,eta:21:41:36,data_time:0.16,lr:0.0004,[2/100:5/500] ==> loss:26.19
2022/06/01 14:17:32 0.65 b/s,passed:00:13:24,eta:21:40:40,data_time:0.17,lr:0.0004,[2/100:10/500] ==> loss:6.847
2022/06/01 14:17:40 0.68 b/s,passed:00:13:31,eta:21:39:57,data_time:0.14,lr:0.0004,[2/100:15/500] ==> loss:6.83
2022/06/01 14:17:47 0.67 b/s,passed:00:13:39,eta:21:39:12,data_time:0.16,lr:0.0004,[2/100:20/500] ==> loss:16.89
2022/06/01 14:17:55 0.66 b/s,passed:00:13:46,eta:21:38:28,data_time:0.17,lr:0.0004,[2/100:25/500] ==> loss:43.18
2022/06/01 14:18:02 0.66 b/s,passed:00:13:54,eta:21:37:36,data_time:0.17,lr:0.0004,[2/100:30/500] ==> loss:20.37
2022/06/01 14:18:10 0.65 b/s,passed:00:14:01,eta:21:36:52,data_time:0.18,lr:0.0004,[2/100:35/500] ==> loss:15.24
2022/06/01 14:18:17 0.65 b/s,passed:00:14:09,eta:21:36:18,data_time:0.19,lr:0.0004,[2/100:40/500] ==> loss:9.399
2022/06/01 14:18:25 0.67 b/s,passed:00:14:16,eta:21:35:41,data_time:0.16,lr:0.0004,[2/100:45/500] ==> loss:40.27
2022/06/01 14:18:32 0.68 b/s,passed:00:14:24,eta:21:34:58,data_time:0.14,lr:0.0004,[2/100:50/500] ==> loss:15.02
2022/06/01 14:18:40 0.69 b/s,passed:00:14:31,eta:21:34:14,data_time:0.14,lr:0.0004,[2/100:55/500] ==> loss:32.48
2022/06/01 14:18:47 0.65 b/s,passed:00:14:39,eta:21:33:42,data_time:0.18,lr:0.0004,[2/100:60/500] ==> loss:9.96
2022/06/01 14:18:55 0.65 b/s,passed:00:14:46,eta:21:33:16,data_time:0.18,lr:0.0004,[2/100:65/500] ==> loss:14.69
2022/06/01 14:19:02 0.68 b/s,passed:00:14:54,eta:21:32:35,data_time:0.13,lr:0.0004,[2/100:70/500] ==> loss:nan
2022/06/01 14:19:10 0.65 b/s,passed:00:15:01,eta:21:31:55,data_time:0.19,lr:0.0004,[2/100:75/500] ==> loss:nan
2022/06/01 14:19:17 0.68 b/s,passed:00:15:09,eta:21:31:14,data_time:0.15,lr:0.0004,[2/100:80/500] ==> loss:nan
2022/06/01 14:19:25 0.67 b/s,passed:00:15:16,eta:21:30:34,data_time:0.15,lr:0.0004,[2/100:85/500] ==> loss:nan
2022/06/01 14:19:32 0.67 b/s,passed:00:15:24,eta:21:30:08,data_time:0.17,lr:0.0004,[2/100:90/500] ==> loss:nan
2022/06/01 14:19:40 0.69 b/s,passed:00:15:31,eta:21:29:28,data_time:0.14,lr:0.0004,[2/100:95/500] ==> loss:nan
2022/06/01 14:19:47 0.65 b/s,passed:00:15:39,eta:21:28:54,data_time:0.17,lr:0.0004,[2/100:100/500] ==> loss:nan
2022/06/01 14:19:55 0.68 b/s,passed:00:15:46,eta:21:28:11,data_time:0.14,lr:0.0004,[2/100:105/500] ==> loss:nan
2022/06/01 14:20:02 0.65 b/s,passed:00:15:54,eta:21:27:38,data_time:0.17,lr:0.0004,[2/100:110/500] ==> loss:nan
2022/06/01 14:20:10 0.64 b/s,passed:00:16:01,eta:21:27:04,data_time:0.2,lr:0.0004,[2/100:115/500] ==> loss:nan
2022/06/01 14:20:17 0.67 b/s,passed:00:16:09,eta:21:26:28,data_time:0.16,lr:0.0004,[2/100:120/500] ==> loss:nan
2022/06/01 14:20:25 0.66 b/s,passed:00:16:16,eta:21:26:04,data_time:0.17,lr:0.0004,[2/100:125/500] ==> loss:nan
2022/06/01 14:20:32 0.68 b/s,passed:00:16:24,eta:21:25:20,data_time:0.15,lr:0.0004,[2/100:130/500] ==> loss:nan

Hello!
These are my training logs. Why does the loss become NaN?

About Model Conversion!

Thanks for open-sourcing this!

Your project is very interesting. I would like to ask: can mgeconvert be used to convert this model to ONNX, or is there another way?

Looking forward to your reply!

Finetuning: in the second batch, the loss is nan

Hi, it's really nice work! But when I fine-tune starting from your pre-trained model, the loss in the second batch becomes NaN. I checked the data input to the model: the left and right images are the original data without any preprocessing, and the disparity is the absolute value. I don't know where the problem is. Can you offer some advice? Thanks.
The log follows:

left.max(), left.min():
Tensor(255.0, device=xpux:0) Tensor(0.0, device=xpux:0)
right.max(), right.min():
Tensor(255.0, device=xpux:0) Tensor(0.0, device=xpux:0)
gt_disp.max(), gt_disp.min():
Tensor(65.625, device=xpux:0) Tensor(0.0, device=xpux:0)
valid_mask.max(), valid_mask.min():
Tensor(1.0, device=xpux:0) Tensor(0.0, device=xpux:0)
The i-th iteration prediction loss :
0 Tensor(68.409615, device=xpux:0) Tensor(-0.72061765, device=xpux:0)
1 Tensor(69.27495, device=xpux:0) Tensor(-7.1237144, device=xpux:0)
2 Tensor(68.630264, device=xpux:0) Tensor(-2.3412788, device=xpux:0)
3 Tensor(67.001595, device=xpux:0) Tensor(-0.64989996, device=xpux:0)
4 Tensor(67.27512, device=xpux:0) Tensor(-0.53194094, device=xpux:0)
5 Tensor(66.031105, device=xpux:0) Tensor(-1.1353028, device=xpux:0)
6 Tensor(66.7748, device=xpux:0) Tensor(-2.5566366, device=xpux:0)
7 Tensor(66.69823, device=xpux:0) Tensor(-0.30609164, device=xpux:0)
8 Tensor(66.8682, device=xpux:0) Tensor(-0.37459654, device=xpux:0)
9 Tensor(66.893974, device=xpux:0) Tensor(-0.80092835, device=xpux:0)
10 Tensor(66.295364, device=xpux:0) Tensor(-1.110324, device=xpux:0)
11 Tensor(67.22122, device=xpux:0) Tensor(-3.059827, device=xpux:0)
12 Tensor(66.74182, device=xpux:0) Tensor(-0.807206, device=xpux:0)
13 Tensor(66.88104, device=xpux:0) Tensor(-0.45083997, device=xpux:0)
14 Tensor(67.27106, device=xpux:0) Tensor(-0.62685704, device=xpux:0)
15 Tensor(67.43465, device=xpux:0) Tensor(-0.7094991, device=xpux:0)
16 Tensor(67.55379, device=xpux:0) Tensor(-0.38040105, device=xpux:0)
17 Tensor(67.453476, device=xpux:0) Tensor(-1.5267422, device=xpux:0)
18 Tensor(67.46704, device=xpux:0) Tensor(-0.3359019, device=xpux:0)
19 Tensor(67.47497, device=xpux:0) Tensor(-0.32194442, device=xpux:0)
Tensor(255.0, device=xpux:0) Tensor(0.0, device=xpux:0)
Tensor(255.0, device=xpux:0) Tensor(0.0, device=xpux:0)
Tensor(69.34766, device=xpux:0) Tensor(0.0, device=xpux:0)
Tensor(1.0, device=xpux:0) Tensor(0.0, device=xpux:0)
0 Tensor(nan, device=xpux:0) Tensor(nan, device=xpux:0)
1 Tensor(nan, device=xpux:0) Tensor(nan, device=xpux:0)
2 Tensor(nan, device=xpux:0) Tensor(nan, device=xpux:0)
3 Tensor(nan, device=xpux:0) Tensor(nan, device=xpux:0)
4 Tensor(nan, device=xpux:0) Tensor(nan, device=xpux:0)
5 Tensor(nan, device=xpux:0) Tensor(nan, device=xpux:0)
6 Tensor(nan, device=xpux:0) Tensor(nan, device=xpux:0)
7 Tensor(nan, device=xpux:0) Tensor(nan, device=xpux:0)
8 Tensor(nan, device=xpux:0) Tensor(nan, device=xpux:0)
9 Tensor(nan, device=xpux:0) Tensor(nan, device=xpu

MegEngine 1.9.0 causes test.py error

I have been playing around a bit with the code (thank you so much, by the way. Having heaps of fun with it) and found out that MegEngine 1.9.0 causes test.py to die with the following output:

Images resized: 1024x1536
Model Forwarding...
Traceback (most recent call last):
  File "test.py", line 94, in <module>
    pred = inference(left_img, right_img, model_func, n_iter=20)
  File "test.py", line 45, in inference
    pred_flow_dw2 = model(imgL_dw2, imgR_dw2, iters=n_iter, flow_init=None)
  File "/usr/local/lib/python3.6/dist-packages/megengine/module/module.py", line 149, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "/home/dgxmartin/workspace/CREStereo/nets/crestereo.py", line 210, in forward
    align_corners=True,
  File "/usr/local/lib/python3.6/dist-packages/megengine/functional/vision.py", line 663, in interpolate
    [wscale, Tensor([0, 0], dtype="float32", device=inp.device)], axis=0
  File "/usr/local/lib/python3.6/dist-packages/megengine/functional/tensor.py", line 405, in concat
    (result,) = apply(builtin.Concat(axis=axis, comp_node=device.to_c()), *inps)
TypeError: py_apply expects tensor as inputs

For the time being, the MegEngine version should be pinned to exactly 1.8.2.

The effect of distortion on results?

Excuse me: if the images are distorted, or the binocular stereo rectification still leaves some distortion, will the depth recovered from disparity be significantly affected? For example, stitching point clouds from multiple frames produces multiple overlapping layers of points (the rotation and translation poses are not the problem).

Dataset question

Why are the disparity maps in your synthetic dataset completely black?
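
(For what it's worth, the ground-truth .png files store the disparity multiplied by 32 as uint16, so most values are tiny compared to the 65535 range and a normal image viewer shows them as nearly black. A minimal sketch for visualizing one, using a hypothetical file name:)

import cv2
import numpy as np

# "example_disp.png" is a placeholder for one of the dataset's disparity files.
disp = cv2.imread("example_disp.png", cv2.IMREAD_UNCHANGED).astype(np.float32) / 32.0
vis = (disp - disp.min()) / (disp.max() - disp.min() + 1e-6) * 255.0
cv2.imwrite("disp_vis.png", vis.astype(np.uint8))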

Assistance Requested: Issues Encountered with train.py Script in CREStereo Repository

Hello,

I hope this message finds you well. I am currently working on a project that involves the train.py script from the CREStereo repository. However, I have encountered some issues while running the script and would like to seek assistance from the community.

The challenges I am facing are as follows:

1. Issue with Batch Size: The train.py script only seems to work with a batch size of 1. When attempting to use a batch size other than 1, the script fails to execute properly. I would appreciate guidance on how to make the script compatible with different batch sizes.

2. RuntimeError: cuda error 700: an illegal memory access was encountered (cudaMemcpyAsync( device_ptr, host_ptr, size, cudaMemcpyHostToDevice, m_env.cuda_env().stream) at ../../../../../../src/core/impl/comp_node/cuda/comp_node.cpp:copy_to_device:230)

If anyone in the community has experience working with the train.py script and has successfully addressed these issues, I kindly request your guidance and assistance. Any insights regarding the correct configurations, dependencies, or steps needed to overcome these challenges would be greatly appreciated.

Thank you for your attention and support. I am eagerly looking forward to hearing from you or anyone who can provide valuable assistance in resolving these issues.

Colab or Huggingface demo?

Thanks for sharing this great work!
Would you consider making a Google Colab notebook or Huggingface demo of this code so that the less technically inclined like myself can try it out?
Thanks!

Model initialization takes a long time

I'm running python test.py.
In load_model(), the line
model = Model(max_disp=256, mixed_precision=False, test_mode=True)
takes a long time, about 30 minutes.
My computer: 10900k, rtx3090, 32G RAM
top info:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8825 xwl 20 0 9.929g 1.122g 250360 S 100.3 3.6 8:01.38 interpreter

Is CUDA 11.6 supported?

This is a really promising project, congratulations and thanks for releasing it!

I'm trying to run the test script with your Eth3d model and this command:
python3 test.py --model_path path_to_mge_model --left img/test/left.png --right img/test/right.png --size 1024x1536 --output disparity.png

But the code hangs up and doesn't return from this line in extractor.py:82:
self.conv2 = M.Conv2d(128, output_dim, kernel_size=1)

which is called from load_model in test.py:15
model = Model(max_disp=256, mixed_precision=False, test_mode=True)

My GPU is NVIDIA RTX A6000 and the CUDA version on the system is v11.6

CREStereo not able to run inside thread with Python

I do not seem to be able to run inference with CREStereo inside a thread using Python's threading module. Below is a minimal example based on the test.py script from this repo. It loads the pretrained model and runs inference in a child thread (lines 96-98). Attached is the error that appears when this is run:
(attached: screenshot of the error, CREStereo_thread_error)

import os

import megengine as mge
import megengine.functional as F
import argparse
import numpy as np
import cv2

from nets import Model

#NOTE: added threading import statement
import threading

def load_model(model_path):
    print("Loading model:", os.path.abspath(model_path))
    pretrained_dict = mge.load(model_path)
    model = Model(max_disp=256, mixed_precision=False, test_mode=True)

    model.load_state_dict(pretrained_dict["state_dict"], strict=True)

    model.eval()
    return model


def inference(left, right, model, n_iter=20):
    imgL = left.transpose(2, 0, 1)
    imgR = right.transpose(2, 0, 1)
    imgL = np.ascontiguousarray(imgL[None, :, :, :])
    imgR = np.ascontiguousarray(imgR[None, :, :, :])

    imgL = mge.tensor(imgL).astype("float32")
    imgR = mge.tensor(imgR).astype("float32")

    imgL_dw2 = F.nn.interpolate(
        imgL,
        size=(imgL.shape[2] // 2, imgL.shape[3] // 2),
        mode="bilinear",
        align_corners=True,
    )
    imgR_dw2 = F.nn.interpolate(
        imgR,
        size=(imgL.shape[2] // 2, imgL.shape[3] // 2),
        mode="bilinear",
        align_corners=True,
    )
    pred_flow_dw2 = model(imgL_dw2, imgR_dw2, iters=n_iter, flow_init=None)

    pred_flow = model(imgL, imgR, iters=n_iter, flow_init=pred_flow_dw2)
    pred_disp = F.squeeze(pred_flow[:, 0, :, :]).numpy()

    return pred_disp


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="A demo to run CREStereo.")
    parser.add_argument(
        "--model_path",
        default="crestereo_eth3d.mge",
        help="The path of pre-trained MegEngine model.",
    )
    parser.add_argument(
        "--left", default="img/test/left.png", help="The path of left image."
    )
    parser.add_argument(
        "--right", default="img/test/right.png", help="The path of right image."
    )
    parser.add_argument(
        "--size",
        default="1024x1536",
        help="The image size for inference. Te default setting is 1024x1536. \
                        To evaluate on ETH3D Benchmark, use 768x1024 instead.",
    )
    parser.add_argument(
        "--output", default="disparity.png", help="The path of output disparity."
    )
    args = parser.parse_args()

    assert os.path.exists(args.model_path), "The model path do not exist."
    assert os.path.exists(args.left), "The left image path do not exist."
    assert os.path.exists(args.right), "The right image path do not exist."

    model_func = load_model(args.model_path)
    left = cv2.imread(args.left)
    right = cv2.imread(args.right)

    assert left.shape == right.shape, "The input images have inconsistent shapes."

    in_h, in_w = left.shape[:2]

    print("Images resized:", args.size)
    eval_h, eval_w = [int(e) for e in args.size.split("x")]
    left_img = cv2.resize(left, (eval_w, eval_h), interpolation=cv2.INTER_LINEAR)
    right_img = cv2.resize(right, (eval_w, eval_h), interpolation=cv2.INTER_LINEAR)

    #NOTE: put inference in a thread here
    inference_thread = threading.Thread(target=inference, args=(left_img, right_img, model_func,))
    inference_thread.start()
    inference_thread.join()

Disparity with uint16 format

Hi, thanks for your work; it is great work.
I want to generate a 3D point cloud from the disparity output of your script, but for this I need the disparity in 16-bit. As mentioned in the readme, the dataset disparity is saved in 16-bit, but when I checked test.py lines 105 to 114
I saw that the disparity is saved as 8-bit. Anyway, I commented out those lines to save the raw predicted disparity.
However, when I tried to produce a 3D point cloud from this disparity, I ended up with a very coarse, discrete point cloud (see image), which is most likely caused by using 8 bits instead of 16; it seems that even the raw predicted disparity is effectively 8-bit.
I verified my script and the camera calibration to make sure the problem comes from the 8-bit disparity.
Could you please let me know whether it is possible to save the disparity in 16-bit correctly?

Lines 107 to 117 (my modification):

disp_vis = inference(left_img, right_img, model_func, n_iter=20)
# disp_vis = (disp - disp.min()) / (disp.max() - disp.min()) * 255.0
# disp_vis = disp_vis.astype("uint8")
# disp_vis = disp_vis.astype(np.uint16)
# disp_vis = cv2.applyColorMap(disp_vis, cv2.COLORMAP_INFERNO)
parent_path = os.path.abspath(os.path.join(args.output, os.pardir))
if not os.path.exists(parent_path):
    os.makedirs(parent_path)
cv2.imwrite(args.output, disp_vis)

(attached: Picture1, the resulting point cloud)
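
For reference, a minimal sketch (not part of the repo) of writing the raw prediction as a 16-bit PNG using the same *32 convention as the dataset ground truth, so it can be read back with the get_disp function from the README; disp here is the float output of inference:

import cv2
import numpy as np

disp = inference(left_img, right_img, model_func, n_iter=20)  # float32 disparity
# Scale by 32 and store as uint16, matching the dataset format; recover the
# float disparity later with cv2.imread(path, cv2.IMREAD_UNCHANGED) / 32.
disp_u16 = np.clip(disp * 32.0, 0, 65535).astype(np.uint16)
cv2.imwrite(args.output, disp_u16)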

Running CREStereo with the latest version of MegEngine

Hello,

I hope you are doing well. I am interested in using CREStereo with the latest version of MegEngine for my project. However, I have some questions regarding the compatibility and requirements of the environment. I would appreciate it if you could provide some guidance.

1. Is the latest version of CREStereo fully compatible with the current version of MegEngine?
2. Are there any specific dependencies or library versions required to ensure smooth integration between CREStereo and MegEngine?
3. Are there any known issues or considerations when setting up the environment for running CREStereo with MegEngine?

Any information or insights you can provide would be very helpful. If there are any documentation or resources available that address these concerns, please let me know.

Thank you for your time and assistance. I look forward to your response.

WRN Not FormattedTensorValue input for AttachGrad op: AttachGradValue{key=grad_1}

Thank you for the excellent work!
I have a problem: I fine-tuned the model using my own data, but it gets stuck in step 2 at
flow_predictions = model(left, right)
After one optimizer.step().clear_grad(), the network cannot run inference on any image.
I used gdb to debug and found that it gets stuck in random layers during the network forward pass.

I checked that my data is correct. Even with the same data, the model gets stuck after one optimizer.step().clear_grad().
Do you have any suggestions?

I upgraded MegEngine from 1.9.1 to 1.11.1 and the model can now train without getting stuck.
However, the first time optimizer.step().clear_grad() is called, it prints:

WRN Not FormattedTensorValue input for AttachGrad op: AttachGradValue{key=grad_1}, (49342:49342) Handle{ptr=0x5616b860dd58, name="update_block.encoder.conv.bias"}

The parameter updates are abnormal and the results are worse.
Has anyone met the same problem, or does anyone have a suggestion?

Dataset for reproducing the results

Thank you for the great work!
Is it possible, or is there a plan, to release the datasets used for training the model so that others can reproduce the results reported in the paper?

Can this be used for 3D reconstruction?

Can this project be used for 3D reconstruction? Compared with the traditional SGM method, the network output lacks a cost map, so depth map fusion cannot be performed.

About the ground truth of Holopix50K

Hello, thank you for CREStereo. I would like to ask about the prediction results on Holopix50K: was the model pretrained on Holopix50K? If it was, how was the ground truth for this dataset obtained?

CREStereo Dataset

Dear authors,
thanks a lot for your paper, code and trained models.

We have seen that your model generalizes well to common stereo cameras. However, we are currently working with stereo cameras that have a rather large baseline when compared to other commercial models and non-parallel image planes.

Are you planning to release the code and environments for generating the CREStereo Dataset at some point? This would help us generate data with specs closer to our sensor setup to retrain your model.

Thanks a lot in advance!

A Problem About Code

Thanks for your great work!
I met a problem in the code: CREStereo always uses negated flow predictions from the lower-resolution RUM, e.g. "flow_dw8 = -scale * F.nn.interpolate" in the code below.
This seems strange and not consistent with the paper. Could you explain the reason for it?

# Recurrent Update Module
            # RUM: 1/16
            for itr in range(iters // 2):
                if itr % 2 == 0:
                    small_patch = False
                else:
                    small_patch = True

                flow_dw16 = flow_dw16.detach()
                out_corrs = corr_fn_att_dw16(
                    flow_dw16, offset_dw16, small_patch=small_patch
                )

                with amp.autocast(enabled=self.mixed_precision):
                    net_dw16, up_mask, delta_flow = self.update_block(
                        net_dw16, inp_dw16, out_corrs, flow_dw16
                    )

                flow_dw16 = flow_dw16 + delta_flow
                flow = self.convex_upsample(flow_dw16, up_mask, rate=4)
                flow_up = -4 * F.nn.interpolate(
                    flow,
                    size=(4 * flow.shape[2], 4 * flow.shape[3]),
                    mode="bilinear",
                    align_corners=True,
                )
                predictions.append(flow_up)

            scale = fmap1_dw8.shape[2] / flow.shape[2]
            flow_dw8 = -scale * F.nn.interpolate(
                flow,
                size=(fmap1_dw8.shape[2], fmap1_dw8.shape[3]),
                mode="bilinear",
                align_corners=True,
            )

TypeError: pad() got an unexpected keyword argument 'pad_witdth' in test.py

Good job! May I ask a question?

I tried to run test.py on a V100 with CUDA version 10.2. The data is from ./img, and I set the size to 1280x720, the same as the original size. But I get the following error:

File "CREStereo/nets/corr.py", line 42, in get_correlation (0, 0), (0, 0), (pady, pady), (padx, padx)), mode="replicate") TypeError: pad() got an unexpected keyword argument 'pad_witdth'

This means I may be passing the wrong argument, but I checked the code and did not find the problem:
def pad(
    src: Tensor,
    pad_width: Tuple[Tuple[int, int], ...],
    mode: str = "constant",
    constant_value: float = 0.0,
) -> Tensor:
    r"""Pads the input tensor.

    Args:
        pad_width: A tuple. Each element in the tuple is a tuple of 2 elements,
            representing the padding size on both sides of the current dimension, ``(front_offset, back_offset)``
        mode: One of the following string values. Default: ``'constant'``

            * ``'constant'``: Pads with a constant value.
            * ``'reflect'``: Pads with the reflection of the tensor mirrored on the first and last values of the tensor along each axis.
            * ``'replicate'``: Pads with the edge values of tensor.
        constant_val: Fill value for ``'constant'`` padding. Default: 0

    Examples:
        >>> import numpy as np
        >>> inp = Tensor([[1., 2., 3.],[4., 5., 6.]])
        >>> inp
        Tensor([[1. 2. 3.]
         [4. 5. 6.]], device=xpux:0)
        >>> F.nn.pad(inp, pad_width=((1, 1),), mode="constant")

I passed the correct tuple type, but something still went wrong.

Pretrained Middlebury weights

Amazing work!
I understand from the name of the weights file that the published weights are those trained for the ETH3D dataset.
Is it possible to publish the weights for Middlebury as well?
Thanks!

testing result is better in RGB format than default BGR format?

I am testing the provided model. By default, the input is in BGR format since it uses cv2.imread. I found that if the images are converted to RGB with cv2.COLOR_BGR2RGB, the depth map is even better. I checked the training code; it also reads images using cv2.imread (i.e., BGR). So I am wondering why this is the case. Has the author or anyone else seen a similar phenomenon?
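
For anyone who wants to reproduce the observation, a minimal sketch of swapping the channel order before inference, assuming left_img and right_img were read with cv2.imread as in test.py:

import cv2

# cv2.imread returns BGR; convert to RGB before running inference.
left_rgb = cv2.cvtColor(left_img, cv2.COLOR_BGR2RGB)
right_rgb = cv2.cvtColor(right_img, cv2.COLOR_BGR2RGB)
pred_rgb = inference(left_rgb, right_rgb, model_func, n_iter=20)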

Model size and number of params?

Hey, great job!

Have you ever compared the model size and number of parameters with other SOTA works, such as LEAStereo, RAFT-Stereo, etc.? Your model seems very compact.
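
For reference, a minimal sketch for counting the parameters yourself (train.py also logs a parameter count at startup), under the assumption that the Model class from nets is importable as in test.py:

import numpy as np
from nets import Model

model = Model(max_disp=256, mixed_precision=False, test_mode=True)
# Sum the element counts of all parameter tensors.
n_params = sum(int(np.prod(p.shape)) for p in model.parameters())
print(f"Parameters: {n_params} ({n_params / 1e6:.2f} M)")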

corr.py syntax error

Lines 39 and 40 have a typo that prevents the code from running:

right_pad = F.pad(right_feature, pad_witdth=(
    (0, 0), (0, 0), (pady, pady), (padx, padx)), mode="replicate")

Basically, change "pad_witdth" to "pad_width"; the word "width" is just misspelled.

RuntimeError: bad input shape for polyadic operator

Has anyone encountered this problem? It occurred when I tried to train. I think the dataset setup is probably wrong; does anyone know how to fix it?
Environment: Windows 10
MegEngine version: mge 1.8.2

err: failed to load cuda func: cuDeviceGetNvSciSyncAttributes
2022/08/04 11:50:55 Use 1 GPU(s)
2022/08/04 11:50:55 Params: 5432948
2022/08/04 11:50:55 Dataset size: 5000
Traceback (most recent call last):
File "c:/CREStereo-master/train.py", line 309, in
run(args)
File "c:/CREStereo-master/train.py", line 207, in main
flow_predictions = model(left, right)
File "C:\Users\wd\AppData\Local\Programs\Python\Python38\lib\site-packages\megengine\module\module.py", line 149, in call
outputs = self.forward(*inputs, **kwargs)
File "c:\CREStereo-master\nets\crestereo.py", line 263, in forward
out_corrs = corr_fn(flow, None, small_patch=small_patch, iter_mode=True)
File "c:\CREStereo-master\nets\corr.py", line 25, in call
corr = self.corr_iter(self.fmap1, self.fmap2, flow, small_patch)
File "c:\CREStereo-master\nets\corr.py", line 72, in corr_iter
corr = self.get_correlation(
File "c:\CREStereo-master\nets\corr.py", line 48, in get_correlation
corr_mean = F.mean(left_feature * right_slid, axis=1, keepdims=True)
File "C:\Users\wd\AppData\Local\Programs\Python\Python38\lib\site-packages\megengine\core\tensor\array_method.py", line 176, in f
return _elwise(self, value, mode=mode)
File "C:\Users\wd\AppData\Local\Programs\Python\Python38\lib\site-packages\megengine\core\tensor\array_method.py", line 73, in _elwise
return _elwise_apply(args, mode)
File "C:\Users\wd\AppData\Local\Programs\Python\Python38\lib\site-packages\megengine\core\tensor\array_method.py", line 36, in _elwise_apply (result,) = apply(op, *args)
RuntimeError: bad input shape for polyadic operator: {2,64,128,96}, {18,64,128,96}

backtrace:
(frames 2 through 11 are all null)

GPU memory usage is too large

@zsc Thank you for sharing! As your paper says, you can train with batch size 16 on 8 2080 Ti GPUs using the PyTorch framework. But when I train your network, GPU memory usage is as large as 8.5 GB with batch size 1. What is the problem?
