defective prediction for large focal length about metric3d HOT 7 CLOSED

mwdotzom commented on June 3, 2024

defective prediction for large focal length

from metric3d.

Comments (7)

JUGGHM commented on June 3, 2024

In training, most of the data is not at 1000 either. For example, in taskonomy most values are around 500-700, and test datasets such as NYU are also not around 500. So the average is not 1000. The ablation on this part could be explored further.
Originally posted by @YvanYin in #3 (comment)

Hi, thanks for your great work! I think I ran into a duplicate of this issue when testing on a different dataset. When inferring on outdoor images with 'intrinsic': [1966.9, 1969.5, 948.7, 498.4], the predicted depth is unexpectedly poor. I tried different crop_size values, but it didn't help much; RMSE is about 10m on sparse lidar measurements.

For the SHIFT dataset, where 'intrinsic': [640, 640, 640, 400], the result is much more reasonable: RMSE is about 7m.

This performance difference seems to be related to the training data. Could you share your ideas on this? Thank you!
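For readers who want to reproduce this kind of number, here is a minimal sketch of an RMSE evaluated only at pixels that have a lidar return. The function name `sparse_rmse`, the convention that `0.0` marks "no measurement", and the toy values are all assumptions for illustration; the evaluation script actually used in this issue is not shown in the thread.

```python
import math


def sparse_rmse(pred, lidar, min_depth=0.0):
    """RMSE in metres over pixels that have a valid lidar depth.

    pred, lidar: flat sequences of depths in metres, same length.
    A lidar value <= min_depth is treated as "no measurement" (assumed convention).
    """
    sq_errors = [(p - g) ** 2 for p, g in zip(pred, lidar) if g > min_depth]
    if not sq_errors:
        raise ValueError("no valid lidar measurements")
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Toy values, made up for illustration (not taken from the issue).
pred = [10.0, 22.0, 35.0, 50.0]
lidar = [12.0, 0.0, 30.0, 0.0]  # 0.0 = no lidar return at that pixel
print(sparse_rmse(pred, lidar))
```

Only the two pixels with lidar returns contribute to the error here, which is why sparse ground truth can make the metric sensitive to where the lidar points happen to fall.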

Thanks for your cases, we will check the first sample later, the depth map looks terrible.


mwdotzom commented on June 3, 2024

Appreciated! Please use the original image here.
Size: 1920*1080
Intrinsic: [1966.9, 1969.5, 948.7, 498.4]
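To put the two cameras side by side, the horizontal field of view can be estimated from the focal length and image width with the pinhole model. This is a rough sketch: the helper `fov_degrees` is hypothetical, and the SHIFT image width of 1280 is an assumption inferred from the principal point cx = 640 (it is not stated in this thread).

```python
import math


def fov_degrees(focal_px, size_px):
    """Pinhole field of view (degrees) along one image axis."""
    return math.degrees(2.0 * math.atan(size_px / (2.0 * focal_px)))


# Values from this issue: 1920x1080 image, intrinsic = [fx, fy, cx, cy].
fx, fy, cx, cy = 1966.9, 1969.5, 948.7, 498.4
hfov_tele = fov_degrees(fx, 1920)

# SHIFT case: fx = 640; width 1280 is an assumption (twice cx = 640).
hfov_shift = fov_degrees(640.0, 1280)

print(round(hfov_tele, 1), round(hfov_shift, 1))
```

Under these assumptions the first camera sees roughly half the horizontal angle of the SHIFT camera, which is one way to quantify how far apart the two test cases are.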


JUGGHM commented on June 3, 2024

We tested this case with several models. Our ConvNeXt models cannot predict reasonable depth maps, while a ViT model that is still in progress can output the basic elements (road, vehicles, sound insulation wall). Since this model still has problems, it will not be released very soon.


mwdotzom commented on June 3, 2024

Thanks for looking into this! Metric3D has a convincing theory and shows good generalization ability in practice. As a novice, I can only make blind guesses about this extreme scenario (focal length = 1967):

Is it due to the large difference from the training data distribution (focal length mainly below 1000), such that the ConvNeXt model doesn't perform well in the CNN regression fashion?
Or does it have something to do with the drastic hypothetical move in canonical camera space? I'm trying to tell which of the 2 patterns shown on this website applies to Metric3D:

https://exposuretherapy.ca/photography-guide/perspective-and-camera-position/

In the first set of images, 4 cameras shoot from the same spot, and the images are then cropped/enlarged to be identical.

However, in the second set of images, the 4 cameras are positioned differently so that the target object appears the same size in each shot. The background gets wider as a result of pinhole imaging with a shorter focal length.

Are we "cropping / dragging" the objects or "moving closer/away" ourselves? If the latter, does this cone of vision contribute to the distorted prediction?

Forgive me for any silly mistakes, and please do correct me. Clearly I could use some help on optics... Cheers!


JUGGHM commented on June 3, 2024

Personally, I do not think focal length affects the shapes of objects in the prediction. However, it might affect how the scale is learned.
For your case, I think the following explains it well:

CROP does not affect the scale.
RESIZE to a LARGER / SMALLER size means the focal length becomes larger / smaller.
For one specific object, this can be regarded as moving closer to / away from it. But in the real world, when the cameras are posed differently, objects at different depths are resized differently, as we derived in the figure above.
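The three points above can be sketched with the standard pinhole model. This is only an illustration under stated assumptions: the helper names, the scene (two 1.5 m objects at 10 m and 40 m) and the focal length of 1000 px are made up, and the code is not taken from the Metric3D repository.

```python
def crop_intrinsics(K, x0, y0):
    """Crop with top-left corner (x0, y0): focal lengths stay, principal point shifts."""
    fx, fy, cx, cy = K
    return (fx, fy, cx - x0, cy - y0)


def resize_intrinsics(K, s):
    """Resize by factor s: focal lengths and principal point scale by s.

    Ignores half-pixel centre conventions, so this is an approximation.
    """
    fx, fy, cx, cy = K
    return (fx * s, fy * s, cx * s, cy * s)


def projected_height(f, H, Z):
    """Pinhole projection: pixel height of an object H metres tall at depth Z metres."""
    return f * H / Z


# Hypothetical scene: two 1.5 m objects at 10 m and 40 m, f = 1000 px.
f, H = 1000.0, 1.5
near, far = projected_height(f, H, 10.0), projected_height(f, H, 40.0)

# Resizing the image by 2x doubles f, so both objects grow by the same factor.
near_rs, far_rs = projected_height(2 * f, H, 10.0), projected_height(2 * f, H, 40.0)

# Moving the camera 5 m forward also doubles the near object, but not the far one.
near_mv, far_mv = projected_height(f, H, 5.0), projected_height(f, H, 35.0)

print("resize:", near_rs / near, far_rs / far)
print("move:  ", near_mv / near, far_mv / far)
```

The resize case scales every object by the same factor, while the camera move scales objects by a factor that depends on their depth. That is the sense in which a resize only looks like "moving closer" for one specific object.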


kwea123 commented on June 3, 2024

I'm also very suspicious of training on image crops. If your crop looks like this, how do you know how far away it is? It looks the same at multiple different distances, since you don't know the surroundings.


YvanYin commented on June 3, 2024

The case shown has a very small field of view. If you enlarge the training crop size, this problem can be alleviated.
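Both points can be made concrete with the pinhole model: an object's pixel size depends only on the ratio of its size to its depth, and the angle a crop covers grows with the crop width. The sketch below uses fx = 1966.9 from this issue; the object sizes, depths, and crop widths are hypothetical values chosen for illustration.

```python
import math


def projected_size(f, size_m, depth_m):
    """Pinhole projection: pixel extent of an object of size_m metres at depth_m metres."""
    return f * size_m / depth_m


def crop_fov_degrees(f, crop_px):
    """Angle (degrees) covered by a centred crop of crop_px pixels at focal length f."""
    return math.degrees(2.0 * math.atan(crop_px / (2.0 * f)))


f = 1966.9  # fx from the issue

# A 2 m object at 20 m and a 4 m object at 40 m cover the same number of pixels,
# so a tight crop around either one carries no cue to tell them apart.
a = projected_size(f, 2.0, 20.0)
b = projected_size(f, 4.0, 40.0)
print(a, b)

# Hypothetical crop widths: a wider crop sees more of the surroundings.
for crop in (512, 1024, 1920):
    print(crop, round(crop_fov_degrees(f, crop), 1))
```

A larger crop does not remove the size/distance ambiguity in principle, but it brings in more context (ground plane, other objects) that the network can use to resolve it.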
