Hi! The colab demo doesn't seem to be working w

Problem with aspect ratios in Colab demo (facebookresearch/detr, closed)

facebookresearch commented on June 19, 2024

Comments (4)

alcinos commented on June 19, 2024

Just to give slightly more context here for reference: The demo model has been trained with a preprocessing that ensures the longest edge of the image is at most 1333.

In the demo model, the positional encodings are learned embeddings. If you feed it images with an edge longer than 1333 pixels, it will use untrained embeddings (or crash outright if you go really big, since there are only so many embeddings), hence the weird results. As noted by @tomek-l, the problem goes away if you resize the image to a smaller size, which brings it back "in-domain". We might update the demo Colab to safeguard against these out-of-distribution sizes.
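A minimal sketch of such a safeguard, assuming DETR-style preprocessing: the shorter edge is resized to 800 pixels (an assumed default, not stated in this thread) unless that would push the longest edge past 1333, in which case the longest edge is capped at 1333 instead. The helper name `demo_safe_size` is hypothetical.

```python
def demo_safe_size(width, height, min_size=800, max_size=1333):
    """Hypothetical helper: return a (width, height) with longest edge <= max_size.

    Assumes inputs should match the training preprocessing: shorter edge
    scaled to `min_size` (assumed 800), longest edge never above `max_size`
    (1333, as described in the comment above).
    """
    shortest, longest = min(width, height), max(width, height)
    scale = min_size / shortest
    if longest * scale > max_size:
        # Scaling the short edge to min_size would overshoot, so cap the long edge.
        scale = max_size / longest
    return round(width * scale), round(height * scale)


print(demo_safe_size(640, 480))    # ordinary 4:3 photo -> (1067, 800)
print(demo_safe_size(4000, 1000))  # very wide panorama -> (1333, 333)
```

Capping the longest edge matters most for extreme aspect ratios: resizing only the shorter edge to 800 would turn the 4000×1000 panorama into 3200×800, well outside the range the learned embeddings were trained on.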

Our torch hub models are also trained with the same maximum longest edge of 1333, but as noted by @fmassa, the encodings we use in them are based on sines and cosines (as in the original Transformer model), appropriately scaled to the image size. As such, they are more robust to varying image sizes.
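To illustrate why size-scaled sine/cosine encodings generalize, here is a one-axis sketch. It assumes the normalization described above: the position is divided by the axis length and multiplied by 2π before passing through sines and cosines at geometrically spaced frequencies. The function name and its defaults (128 features, temperature 10000, borrowed from the original Transformer encoding) are assumptions; the actual DETR code operates on 2-D feature maps with padding masks.

```python
import math


def sine_position_encoding(index, length, num_feats=128, temperature=10000.0):
    """Hypothetical 1-D sketch of a size-normalized sine/cosine encoding.

    `index` runs from 1 to `length`. Dividing by `length` and scaling by 2*pi
    maps every axis, whatever its size, onto the same (0, 2*pi] range, so no
    lookup table has to be large enough for the biggest possible image.
    """
    pos = index / length * 2 * math.pi
    feats = []
    for i in range(num_feats):
        # Channels 2k and 2k+1 share one frequency: even -> sin, odd -> cos.
        freq = temperature ** (2 * (i // 2) / num_feats)
        angle = pos / freq
        feats.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return feats


# A short 25-cell axis and a long 100-cell axis both get valid encodings,
# and their last positions map to identical vectors.
small = sine_position_encoding(25, 25)
large = sine_position_encoding(100, 100)
print(small == large)  # -> True
```

Because the encoding is computed rather than looked up, a larger image never indexes past the end of a table; it only samples the same range more densely.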

tomasz-lewicki commented on June 19, 2024

Thank you for your detailed explanation!

By the way, I think it's worth keeping the notebook with this toy example even if it has limitations. It's really refreshing to see something like that as starter code, as opposed to a wall of argparse 👍

fmassa commented on June 19, 2024

Hi,

Thanks for opening this issue!

The underlying problem is that the `DETRdemo` model we use in the Colab (a demo model for illustration purposes) is not very robust to out-of-distribution images, such as images whose longest side is larger than 1600 pixels. This is because its positional embeddings, `row_embed` and `col_embed`, are not invariant to the image size.

In this case, your image is too large, and it goes beyond the range covered by the positional embeddings we use in the demo.
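The 1600-pixel limit can be reproduced with back-of-the-envelope arithmetic. This sketch assumes a backbone downsampling stride of 32 and learned `row_embed`/`col_embed` tables with 50 rows each (50 × 32 = 1600, consistent with the limit quoted above); neither value is stated in this thread, and the helper `embedding_status` is hypothetical. Grid positions beyond those produced by a 1333-pixel training edge land on rows the model was not trained to use, and positions beyond the table size have no embedding at all.

```python
import math

STRIDE = 32              # assumed downsampling factor of the ResNet-50 backbone
TABLE_ROWS = 50          # assumed number of rows in row_embed / col_embed
TRAINED_MAX_EDGE = 1333  # longest image edge seen during training


def embedding_status(width, height):
    """Hypothetical: classify an input size against a learned embedding table."""
    # Approximate feature-map size along the longer axis.
    cells = max(math.ceil(width / STRIDE), math.ceil(height / STRIDE))
    trained_cells = math.ceil(TRAINED_MAX_EDGE / STRIDE)  # 42 grid positions
    if cells > TABLE_ROWS:
        return "out of range"  # more grid positions than embeddings exist
    if cells > trained_cells:
        return "untrained"     # uses rows never reached during training
    return "in-domain"


print(embedding_status(1067, 800))   # -> in-domain
print(embedding_status(1500, 800))   # -> untrained
print(embedding_status(2000, 1000))  # -> out of range
```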

But if you replace the `DETRdemo` model with the models we used for the paper results:

```python
import torch

detr = torch.hub.load('facebookresearch/detr', 'detr_resnet50', pretrained=True)
detr.eval()
```

then all the issues that you were facing disappear, as the positional encoding that we use in the paper is more robust to varying image sizes.

Here are the results I obtained with your images and the torchhub model:

[Images: detection results on the two example images]

But this makes me think that we should upload a new colab notebook illustrating how to use the torchhub models, which work better and are more robust.

I believe I have answered your question, and as such I'm closing the issue, but let us know if you have other problems or questions.

priya-dwivedi commented on June 19, 2024

Awesome work. Thank you for clarifying this issue.
