SfMLearner shows that the best result on KITTI with delta<1.25 is around 0.73, but

Why the baseline is so high? (monodepth2, closed)

nianticlabs commented on June 2, 2024

Why the baseline is so high?

Comments (1)

mdfirman commented on June 2, 2024

Hi @placeforyiming,

Thanks for your interest in this project, and thanks for a great question.

The high baseline mostly comes down to sensible design decisions and the use of up-to-date machine learning methods and architectures. None of these are particularly 'innovative', so we don't claim them as contributions. However, they do help our baseline model achieve good scores.

In particular, I think that the following help to make our baseline strong:

1. Better depth architecture

Zhou et al. use a DispNet architecture, while we use a ResNet18.

2. Better pose architecture

Zhou et al. use a small-ish pose+mask CNN to predict frame-to-frame poses. Instead, we use a ResNet18, modified to accept a pair of frames as input.
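One way to let a network designed for single RGB images accept a pair of frames is to stack the two frames along the channel axis (6 channels) and widen the first convolution to match. The sketch below shows the idea in plain Python for a single pixel and a 1x1 kernel. Tiling the original 3-channel weights and dividing by the number of frames is an assumption made here for illustration, not something stated in this comment, and all function and variable names are hypothetical.

```python
def inflate_first_layer(weights, num_frames=2):
    """Widen per-filter input weights from C to C * num_frames channels.

    `weights` is a list of filters, each a list of C per-channel weights
    (a 1x1 kernel, for simplicity). Each filter is tiled `num_frames`
    times and scaled by 1 / num_frames, so the output is unchanged when
    every stacked frame is identical.
    """
    return [[w / num_frames for w in filt] * num_frames for filt in weights]


def apply_filters(weights, pixel):
    """Dot product of each filter with one pixel's channel values."""
    return [sum(w * v for w, v in zip(filt, pixel)) for filt in weights]


# Hypothetical 3-channel weights for two filters.
rgb_weights = [[0.5, -1.0, 2.0], [1.0, 1.0, 1.0]]
pair_weights = inflate_first_layer(rgb_weights, num_frames=2)

frame_a = [0.2, 0.4, 0.6]      # RGB values at one pixel of the first frame
frame_b = [0.2, 0.4, 0.6]      # same pixel in the second frame
stacked = frame_a + frame_b    # channel-wise concatenation: 6 values

single = apply_filters(rgb_weights, frame_a)
paired = apply_filters(pair_weights, stacked)
```

With identical frames, `paired` matches `single` up to floating-point rounding, which is the point of the scaling: the widened layer starts out on the same activation scale as the original one.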

3. Pairwise pose prediction

Zhou et al. predict the poses for all source frames simultaneously with a single pass through their pose network. Instead, we predict poses between pairs of frames. For a single training image, we run the pose network twice: once for the frame forward in time, and once for the frame backwards in time.
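The pairwise scheme can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: `predict_pairwise_poses` and `dummy_pose_net` are hypothetical names, the offsets `-1` and `+1` stand for the previous and next frames, and passing each pair in temporal order is an assumption.

```python
def predict_pairwise_poses(frames, pose_net, source_offsets=(-1, 1)):
    """Run `pose_net` once per (target, source) pair.

    `frames` maps a temporal offset (0 = target frame) to an image.
    `pose_net` is any callable taking two images and returning a pose.
    Returns a dict mapping each source offset to its predicted pose.
    """
    poses = {}
    for offset in source_offsets:
        # Assumption: each pair is passed in temporal order (earlier first).
        if offset < 0:
            pair = (frames[offset], frames[0])
        else:
            pair = (frames[0], frames[offset])
        poses[offset] = pose_net(*pair)
    return poses


# Stand-in "network" that just records which pair it was given.
calls = []


def dummy_pose_net(first, second):
    calls.append((first, second))
    return f"pose({first}->{second})"


frames = {-1: "frame_t-1", 0: "frame_t", 1: "frame_t+1"}
poses = predict_pairwise_poses(frames, dummy_pose_net)
```

The network is called twice per training image, and each call only ever sees two frames, so the same network can be reused for any number of source frames.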

4. Border padding

As outlined in #20, we use border padding for pixels which don't reproject into the target image, instead of masking with an explainability mask.
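To illustrate what border padding means when sampling, here is a small pure-Python sketch that clamps out-of-view coordinates to the nearest edge pixel before bilinear interpolation. In PyTorch, this behaviour corresponds to `padding_mode="border"` in `torch.nn.functional.grid_sample`. The function name and the toy image are hypothetical, and pixel-unit coordinates are an assumption for readability.

```python
import math


def sample_bilinear_border(image, x, y):
    """Bilinearly sample `image` (a list of rows) at float coords (x, y).

    Coordinates outside the image are clamped to the nearest edge pixel,
    so every query returns a valid intensity and no mask is needed.
    """
    height, width = len(image), len(image[0])
    # Border padding: clamp instead of discarding out-of-view coordinates.
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)

    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    wx, wy = x - x0, y - y0

    top = (1 - wx) * image[y0][x0] + wx * image[y0][x1]
    bottom = (1 - wx) * image[y1][x0] + wx * image[y1][x1]
    return (1 - wy) * top + wy * bottom


image = [
    [10.0, 20.0, 30.0],
    [40.0, 50.0, 60.0],
]

inside = sample_bilinear_border(image, 0.5, 0.5)    # between four pixels
outside = sample_bilinear_border(image, -2.0, 4.0)  # off the bottom-left corner
```

The out-of-view query simply returns the bottom-left pixel's value, so the photometric loss stays defined everywhere without a learned mask.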

5. ImageNet pretraining

Our main baseline result (top row in Table 2) uses ImageNet pretraining. This helps the network to learn more about depth in a shorter amount of time (just 20 epochs!). Using a pretrained network follows standard practice in areas such as semantic segmentation, object detection, etc. To enable fair comparisons with papers which do not pretrain on ImageNet, we repeat almost all our experiments without pretraining.

6. Data augmentations

We augment our input data with a standard set of colour and flip transforms, as outlined in our paper. The Zhou et al. paper doesn't mention augmentations, but they have since added some augmentations to their codebase, which helps to boost their scores.
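As a rough illustration of flip and colour augmentation for a multi-frame training sample, the sketch below draws one set of parameters and applies it to every frame. Several things here are assumptions for the sake of the example: brightness scaling stands in for a fuller colour transform, the 50% flip chance and the 0.2 brightness range are placeholder values, sharing parameters across frames is a design choice of this sketch, and all names are hypothetical.

```python
import random


def sample_augmentation(rng, max_brightness_shift=0.2):
    """Draw one set of augmentation parameters for a whole training sample.

    The 50% flip chance and the brightness range are placeholder values.
    """
    do_flip = rng.random() < 0.5
    brightness = 1.0 + rng.uniform(-max_brightness_shift, max_brightness_shift)
    return do_flip, brightness


def augment_frames(frames, do_flip, brightness):
    """Apply the same horizontal flip and brightness scaling to every frame.

    Each frame is a list of rows of intensities in [0, 1].
    """
    augmented = []
    for frame in frames:
        rows = [row[::-1] for row in frame] if do_flip else [row[:] for row in frame]
        rows = [[min(1.0, max(0.0, v * brightness)) for v in row] for row in rows]
        augmented.append(rows)
    return augmented


frames = [
    [[0.1, 0.2], [0.3, 0.4]],   # frame t-1
    [[0.5, 0.6], [0.7, 0.8]],   # frame t
]

# Random parameters, drawn once and shared by all frames.
do_flip, brightness = sample_augmentation(random.Random(0))
randomly_augmented = augment_frames(frames, do_flip, brightness)

# Fixed parameters, to make the effect easy to inspect.
flipped_and_brighter = augment_frames(frames, do_flip=True, brightness=2.0)
```

Sharing the parameters keeps the frames photometrically and geometrically consistent with each other, which matters when the training signal comes from comparing one frame against another.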
