
Comments (4)

alcinos commented on June 19, 2024

Just to give slightly more context here for reference: The demo model has been trained with a preprocessing that ensures the longest edge of the image is at most 1333.

In the demo model, the positional encodings are learnt embeddings, so if you feed images with an edge longer than 1333 it will be using untrained embeddings (or outright crash if you go really big, since we have only so many embeddings), hence the weird results. As noted by @tomek-l, the problem goes away if you resize to a smaller size (going back "in-domain"). We might update the demo colab to safeguard against these out-of-distribution sizes.
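As a rough sketch of the resizing workaround, the helper below computes a target size whose longest edge does not exceed 1333 while keeping the aspect ratio. The name `fit_longest_edge` and its signature are hypothetical, introduced only for illustration; they are not part of the DETR code base.

```python
def fit_longest_edge(width, height, max_edge=1333):
    """Return (new_width, new_height) with the longest edge capped at max_edge.

    Hypothetical helper: max_edge=1333 is the training limit quoted in this
    thread. Images that already fit are returned unchanged.
    """
    longest = max(width, height)
    if longest <= max_edge:
        return width, height
    scale = max_edge / longest
    # Round to the nearest pixel, but never let a side collapse to 0.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

The resulting size can then be passed to whatever resize routine you already use, for example `PIL.Image.resize`.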

Our torch-hub models are also trained with the same max longest edge of 1333, but as noted by @fmassa, the encodings we use in them are based on sines/cosines (as in the original transformer model), appropriately scaled to the image size. As such, they are more robust to varying image sizes.
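To illustrate why a sine/cosine encoding copes better with unseen sizes, here is a simplified 1D sketch. It is not the exact DETR implementation: the function name, the `dim` and `temperature` defaults, and the rescaling of positions to [0, 2π] are assumptions made for this example. The point is that the encoding is computed from a formula rather than looked up in a fixed-size table, and that it depends on the position relative to the axis length.

```python
import math

def sine_encoding(position, length, dim=8, temperature=10000.0):
    """Simplified 1D sinusoidal encoding of `position` on an axis of `length`.

    Assumption: positions are rescaled to [0, 2*pi] relative to the axis
    length, so the same relative position yields the same encoding.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    scaled = position / length * 2 * math.pi
    encoding = []
    for i in range(dim // 2):
        # Each sin/cos pair uses a progressively lower frequency.
        freq = temperature ** (2 * i / dim)
        encoding.append(math.sin(scaled / freq))
        encoding.append(math.cos(scaled / freq))
    return encoding
```

Because nothing is indexed, any position on an axis of any length gets a well-defined encoding.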

from detr.

tomasz-lewicki commented on June 19, 2024

Thank you for your detailed explanation!

By the way, I think it's worth keeping the notebook with this toy example even if it has limitations. It's really refreshing to see something like that as starter code, as opposed to a wall of argparse 👍


fmassa commented on June 19, 2024

Hi,

Thanks for opening this issue!

The underlying problem is that the DETRdemo model that we use in the colab (which is a demo model for illustration purposes) is not very robust to out-of-distribution images, like images which have a max size larger than 1600. This is due to the positional embedding row_embed and col_embed not being invariant to the image size.

In this case, your image size is too large, and it goes out of the boundaries for the positional embedding that we are using for the demo.
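The boundary can be sketched as a simple check. The table size of 50 and the backbone stride of 32 are assumptions (not stated in this thread), chosen so that the limit works out to the 1600 pixels mentioned above. The helper name is hypothetical, and the ceiling division only approximates how a convolutional backbone computes its output size.

```python
import math

def fits_embedding_table(width, height, table_size=50, stride=32):
    """Check whether an image's feature map fits a learned embedding table.

    Assumptions: the backbone downsamples by `stride`, and the row and
    column embedding tables each hold `table_size` entries.
    """
    cols = math.ceil(width / stride)
    rows = math.ceil(height / stride)
    return rows <= table_size and cols <= table_size
```

Under these assumptions a 1600x1200 image fits, while a side of 1601 pixels would need a 51st entry that the table does not have.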

But if you replace the DETRdemo model with the models we used for the paper results

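import torch
# Load the pretrained DETR ResNet-50 model from torch hub and switch to inference mode.
detr = torch.hub.load('facebookresearch/detr', 'detr_resnet50', pretrained=True)
detr.eval()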

then all the issues that you were facing disappear, as the positional encoding that we use in the paper is more robust to varying image sizes.

Here are the results I obtained with your images and the torchhub model: [image] and [image]

But this makes me think that we should upload a new colab notebook illustrating how to use the torchhub models, which work better and are more robust.

I believe I have answered your question, and as such I'm closing the issue, but let us know if you have other problems or questions.


priya-dwivedi commented on June 19, 2024

Awesome work. Thank you for clarifying this issue.

