
mdfirman commented on June 2, 2024

Hi @placeforyiming ,

Thanks for your interest in this project, and thanks for a great question.

Our strong baseline comes mostly from sensible design decisions and from using up-to-date machine learning methods and architectures. None of these are particularly 'innovative', so we don't claim them as contributions. However, they do help our baseline model reach good scores.

In particular, I think the following choices help to make our baseline strong:

1. Better depth architecture

Zhou et al. use a DispNet architecture, while we use a ResNet18.

2. Better pose architecture

Zhou et al. use a fairly small pose+mask CNN to predict frame-to-frame poses. Instead, we use a ResNet18, modified to accept a pair of frames as input.
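
As a rough illustration of what "modified to accept a pair of frames" can mean, the sketch below stacks two RGB frames along the channel axis and widens a first-layer filter from 3 to 6 input channels. The tile-and-halve initialisation, the helper names (`stack_frames`, `widen_filter`, `apply_1x1`) and the toy 1x1 filter are all assumptions made for illustration; this is not the repository's code.

```python
def stack_frames(frame_a, frame_b):
    """Concatenate two RGB frames along the channel axis.

    Each frame is a list of rows, and each row is a list of [r, g, b] pixels.
    The result has 6 channels per pixel.
    """
    return [
        [pixel_a + pixel_b for pixel_a, pixel_b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]


def widen_filter(weights_rgb, num_frames=2):
    """Tile a 3-channel 1x1 filter to 3 * num_frames channels.

    The weights are divided by num_frames so that feeding the same frame
    num_frames times gives the same response as the original filter.
    (Assumed initialisation scheme, for illustration only.)
    """
    return [w / num_frames for w in weights_rgb] * num_frames


def apply_1x1(filter_weights, image):
    """Apply a single 1x1 filter: a weighted sum over channels at each pixel."""
    return [
        [sum(w * c for w, c in zip(filter_weights, pixel)) for pixel in row]
        for row in image
    ]


frame = [[[0.2, 0.4, 0.6], [1.0, 0.0, 0.5]]]  # toy 1x2 RGB image
rgb_filter = [0.5, -1.0, 2.0]                   # hypothetical pretrained weights

single = apply_1x1(rgb_filter, frame)
paired = apply_1x1(widen_filter(rgb_filter), stack_frames(frame, frame))
```

With this initialisation, `paired` matches `single` when both stacked frames are identical, so the widened layer starts out behaving like the original 3-channel one.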

3. Pairwise pose prediction

Zhou et al. predict the poses for all source frames simultaneously with a single pass through their pose network. Instead, we predict poses between pairs of frames. For a single training image, we run the pose network twice: once for the frame forward in time, and once for the frame backwards in time.
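
The two-pass scheme can be sketched roughly as follows. `predict_pairwise_poses` and `dummy_pose_net` are hypothetical names, and passing each pair in temporal order is an assumed convention; a real pose network would return a transformation rather than a string.

```python
def predict_pairwise_poses(frames, pose_net, target_id=0):
    """Run pose_net once per (target, source) pair.

    frames maps a temporal offset (e.g. -1, 0, +1) to a frame.
    Each pair is passed in temporal order (an assumed convention).
    """
    poses = {}
    for offset, source in frames.items():
        if offset == target_id:
            continue  # the target frame is not paired with itself
        if offset < target_id:
            pair = (source, frames[target_id])
        else:
            pair = (frames[target_id], source)
        poses[offset] = pose_net(*pair)
    return poses


calls = []


def dummy_pose_net(first, second):
    """Hypothetical stand-in: records the call and returns a placeholder."""
    calls.append((first, second))
    return f"pose({first}->{second})"


frames = {-1: "frame_prev", 0: "frame_target", 1: "frame_next"}
poses = predict_pairwise_poses(frames, dummy_pose_net)
```

Here `calls` ends up with two entries, one per source frame, rather than a single pass covering all source frames at once.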

4. Border padding

As outlined in #20, we use border padding for pixels which don't reproject into the target image, instead of masking with an explainability mask.
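
To make the difference concrete, here is a minimal sketch contrasting the two behaviours for a sampling coordinate that lands outside the image. It uses nearest-neighbour lookup on a nested list for brevity; the function names are hypothetical, and real warping code would normally use bilinear sampling on tensors.

```python
def sample_border(image, x, y):
    """Nearest-neighbour lookup with border padding.

    Coordinates outside the image are clamped to the closest edge pixel.
    """
    height, width = len(image), len(image[0])
    col = min(max(int(round(x)), 0), width - 1)
    row = min(max(int(round(y)), 0), height - 1)
    return image[row][col]


def sample_masked(image, x, y):
    """For comparison: return None (pixel masked out) when out of bounds."""
    height, width = len(image), len(image[0])
    col, row = int(round(x)), int(round(y))
    if 0 <= col < width and 0 <= row < height:
        return image[row][col]
    return None


source = [
    [10, 20, 30],
    [40, 50, 60],
]

inside = sample_border(source, 1.0, 1.0)
outside_border = sample_border(source, 4.2, -3.0)  # clamped to top-right pixel
outside_masked = sample_masked(source, 4.2, -3.0)  # dropped instead
```

In PyTorch, the analogous behaviour is available through `torch.nn.functional.grid_sample` with `padding_mode="border"`.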

5. ImageNet pretraining

Our main baseline result (top row in Table 2) uses ImageNet pretraining. This helps the network to learn more about depth in a shorter amount of time (just 20 epochs!). Using a pretrained network follows standard practice in areas such as semantic segmentation and object detection. To enable fair comparisons with papers which do not pretrain on ImageNet, we repeat almost all our experiments without pretraining.

6. Data augmentations

We augment our input data with a standard set of colour and flip transforms, as outlined in our paper. The Zhou et al. paper doesn't mention augmentations, but they have since added some augmentations to their codebase, which has helped to boost their scores.
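
A minimal sketch of this kind of augmentation, assuming a random horizontal flip followed by a random brightness scaling, might look like the code below. The function name and the values of `flip_prob` and `brightness_range` are illustrative assumptions; see the paper for the actual set of transforms and their ranges.

```python
import random


def augment(image, rng, flip_prob=0.5, brightness_range=0.2):
    """Randomly flip an image horizontally and scale its brightness.

    image is a list of rows of [r, g, b] pixels with values in [0, 1].
    flip_prob and brightness_range are illustrative values only.
    """
    if rng.random() < flip_prob:
        image = [list(reversed(row)) for row in image]
    factor = 1.0 + rng.uniform(-brightness_range, brightness_range)
    # Scale every channel and clip back into the valid [0, 1] range.
    return [
        [[min(max(c * factor, 0.0), 1.0) for c in pixel] for pixel in row]
        for row in image
    ]


rng = random.Random(0)  # seeded so the sketch is repeatable
image = [[[0.1, 0.2, 0.3], [0.7, 0.8, 0.9]]]
augmented = augment(image, rng)
```

When several frames belong to the same training sample, it usually makes sense to draw the flip decision and colour parameters once and apply them to every frame, so the frames stay consistent with each other.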

from monodepth2.
