Hi @placeforyiming ,
Thanks for your interest in this project, and thanks for a great question.
Our strong baseline comes mostly from sensible design decisions and from using up-to-date machine learning methods and architectures. None of these are particularly 'innovative', so we don't claim them as contributions, but they do help our baseline model to score well.
In particular, I think the following choices help to make our baseline strong:
1. Better depth architecture
Zhou et al. use a DispNet architecture, while we use a ResNet18.
2. Better pose architecture
Zhou et al. use a fairly small pose+mask CNN to predict frame-to-frame poses. Instead, we use a ResNet18, modified to accept a pair of frames as input.
3. Pairwise pose prediction
Zhou et al. predict the poses for all source frames simultaneously, with a single pass through their pose network. Instead, we predict poses between pairs of frames. For a single training image, we run the pose network twice: once with the next frame in time, and once with the previous frame.
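The pairwise scheme can be sketched roughly as follows. This is only an illustration, not the code from the repository: `predict_relative_poses` and the `pose_net` callable are hypothetical names, and passing each pair in temporal order (then flagging the backward case for inversion) is an assumption of this sketch.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical signature: takes (earlier_frame, later_frame), returns a pose.
PoseNet = Callable[[Any, Any], Any]


def predict_relative_poses(
    frames: Dict[int, Any],
    pose_net: PoseNet,
) -> Dict[int, Tuple[Any, bool]]:
    """Run the pose network once per (target, source) pair.

    `frames` maps a temporal offset to a frame: 0 is the target image,
    -1 the previous frame and +1 the next frame.
    For each source offset, returns the network output and a flag saying
    whether that output must be inverted to obtain the target-to-source pose.
    """
    target = frames[0]
    poses = {}
    for offset in sorted(k for k in frames if k != 0):
        if offset < 0:
            # Source precedes the target: feed (source, target) so the pair
            # is in temporal order; the result then needs inverting.
            poses[offset] = (pose_net(frames[offset], target), True)
        else:
            # Source follows the target: (target, source) is already ordered.
            poses[offset] = (pose_net(target, frames[offset]), False)
    return poses
```

With frames at offsets -1, 0 and +1 this calls `pose_net` exactly twice, once per source frame, rather than once for all source frames together.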
4. Border padding
As outlined in #20, we use border padding for pixels which don't reproject into the target image, instead of masking them with an explainability mask.
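To make the idea of border padding concrete, here is a small pure-Python sketch of bilinear sampling in which out-of-image coordinates are clamped to the nearest edge pixel. The function name `sample_bilinear_border` is made up for this example; in PyTorch, similar behaviour is available through `torch.nn.functional.grid_sample` with `padding_mode="border"`.

```python
import math
from typing import Sequence


def sample_bilinear_border(
    image: Sequence[Sequence[float]], x: float, y: float
) -> float:
    """Bilinearly sample `image` at pixel coordinates (x, y).

    Coordinates outside the image are clamped to the nearest border pixel,
    so out-of-view samples repeat the edge value instead of returning zero.
    """
    height, width = len(image), len(image[0])
    # Clamp the sampling location to the valid pixel range (border padding).
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)

    # Integer corners surrounding the sample, kept inside the image.
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    wx, wy = x - x0, y - y0

    # Interpolate along x on the two rows, then along y between them.
    top = (1 - wx) * image[y0][x0] + wx * image[y0][x1]
    bottom = (1 - wx) * image[y1][x0] + wx * image[y1][x1]
    return (1 - wy) * top + wy * bottom
```

A pixel that reprojects far outside the image, such as `(x=-5, y=0)`, simply returns the value of the closest border pixel, so no separate mask is needed to handle it.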
5. Imagenet pretraining
Our main baseline result (top row in Table 2) uses ImageNet pretraining. This helps the network to learn more about depth in a shorter amount of time (just 20 epochs!). Using a pretrained network follows standard practice in areas such as semantic segmentation and object detection. To enable fair comparisons with papers which do not pretrain on ImageNet, we repeat almost all our experiments without pretraining.
6. Data augmentations
We augment our input data with a standard set of colour and flip transforms, as outlined in our paper. The Zhou et al. paper doesn't mention augmentations, but they have since added some augmentations to their codebase, which helps to boost their scores.
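As a rough illustration of colour and flip augmentation, the sketch below applies one random horizontal flip and one random brightness gain to every frame of a training sample. It is not the repository's implementation: `augment_frames`, the default probability and the brightness range are illustrative assumptions, and so is the choice to share the same random draw across all frames of a sample.

```python
import random
from typing import Dict, List

Image = List[List[float]]  # toy grayscale image with values in [0, 1]


def augment_frames(
    frames: Dict[int, Image],
    rng: random.Random,
    flip_prob: float = 0.5,             # assumed value, for illustration
    max_brightness_shift: float = 0.2,  # assumed value, for illustration
) -> Dict[int, Image]:
    """Apply one randomly drawn flip and brightness gain to every frame."""
    # Draw the random parameters once so all frames get the same transform.
    do_flip = rng.random() < flip_prob
    gain = 1.0 + rng.uniform(-max_brightness_shift, max_brightness_shift)

    augmented = {}
    for offset, image in frames.items():
        # Horizontal flip reverses each row; otherwise copy the rows.
        rows = [row[::-1] if do_flip else list(row) for row in image]
        # Scale brightness and clip back into [0, 1].
        augmented[offset] = [
            [min(max(v * gain, 0.0), 1.0) for v in row] for row in rows
        ]
    return augmented
```

Sharing the draw across frames keeps the target and source images photometrically comparable, which matters when the training signal comes from comparing reprojected frames.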
from monodepth2.