Comments (4)
Just to give slightly more context here for reference: the demo model has been trained with preprocessing that ensures the longest edge of the image is at most 1333 pixels.
In the demo model, the positional encodings are learnt embeddings, so if you feed images with an edge longer than 1333 it will be using untrained embeddings (or outright crash if you go really big, since we only have so many embeddings), hence the weird results. As noted by @tomek-l, the problem goes away if you resize to a smaller size (going back "in-domain"). We might update the demo colab to safeguard against these out-of-distribution sizes.
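A safeguard of this kind could look roughly like the minimal sketch below. It only computes the target size for a resize and leaves the actual resampling to whatever image library is in use. The function name `fit_longest_edge` is hypothetical, and the default of 1333 is simply the training-time limit quoted in this comment.

```python
def fit_longest_edge(width, height, max_edge=1333):
    """Return (new_width, new_height) with the longest edge capped at max_edge.

    Hypothetical helper: images that are already small enough are left
    unchanged, larger ones are scaled down with the aspect ratio preserved.
    """
    longest = max(width, height)
    if longest <= max_edge:
        return width, height
    # Scale both sides by the same factor; never return a zero-sized edge.
    new_width = max(1, round(width * max_edge / longest))
    new_height = max(1, round(height * max_edge / longest))
    return new_width, new_height
```

Resizing the image to the returned size before running the demo model keeps the input within the size range seen during training.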
Our torch-hub models are also trained with the same max longest edge of 1333, but as noted by @fmassa, the encodings we use in them are based on sines/cosines (as in the original transformer model), appropriately scaled to the image size. As such, they are more robust to varying image sizes.
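To illustrate why a sine/cosine encoding scaled to the input size copes with any length, here is a simplified one-dimensional sketch in plain Python. It is not the repository's implementation: the function name, the feature count of 8, and the temperature of 10000 are assumptions made for the example.

```python
import math

def sine_position_encoding(length, num_feats=8, temperature=10000.0):
    """Return `length` vectors of size `num_feats` built from sines and cosines.

    Illustrative only: positions are normalised by the sequence length, so
    there is no fixed-size table that a long input could run past.
    """
    scale = 2 * math.pi
    encodings = []
    for index in range(length):
        # Map the 1-based position into (0, 2*pi], whatever the length is.
        position = (index + 1) / length * scale
        vector = []
        for feat in range(num_feats):
            # Each pair of features shares one frequency.
            frequency = temperature ** (2 * (feat // 2) / num_feats)
            angle = position / frequency
            vector.append(math.sin(angle) if feat % 2 == 0 else math.cos(angle))
        encodings.append(vector)
    return encodings
```

Because positions are normalised, the last position of a 10-element input and of a 500-element input receive the same encoding, and every value is computed from a formula rather than looked up, so nothing is left untrained for unusual sizes.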
from detr.
Thank you for your detailed explanation!
By the way - I think it's worth keeping the notebook with this toy example even if it has limitations. It's really refreshing to see something like that as starter code, as opposed to a wall of argparse.
👍
from detr.
Hi,
Thanks for opening this issue!
The underlying problem is that the DETRdemo model that we use in the colab (which is a demo model for illustration purposes) is not very robust to out-of-distribution images, like images which have a max size larger than 1600. This is due to the positional embeddings row_embed and col_embed not being invariant to the image size.
In this case, your image size is too large, and it goes out of the boundaries for the positional embedding that we are using for the demo.
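The boundary can be pictured with the small sketch below. It assumes (this is not confirmed in the thread) that the backbone downsamples by a factor of 32 and that the learned row and column tables each hold 50 positions, values chosen so that the limit matches the 1600 pixels quoted above. The function name `fits_learned_table` is hypothetical.

```python
import math

def fits_learned_table(width, height, stride=32, table_size=50):
    """Check whether the feature map stays inside a fixed-size embedding table.

    stride and table_size are assumed values for illustration, not settings
    read from the demo notebook.
    """
    # Approximate feature-map size after the backbone's downsampling.
    feat_w = math.ceil(width / stride)
    feat_h = math.ceil(height / stride)
    # Each axis needs one learned embedding per feature-map position.
    return feat_w <= table_size and feat_h <= table_size
```

Under these assumptions a 1600 x 1200 image still fits, while a 1601 x 1200 image would need a 51st column embedding that the table does not have.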
But if you replace the DETRdemo model with the models we used for the paper results:
import torch
detr = torch.hub.load('facebookresearch/detr', 'detr_resnet50', pretrained=True)
detr.eval();  # inference mode; the trailing semicolon suppresses notebook output
then all the issues that you were facing disappear, as the positional encoding that we use in the paper is more robust to varying image sizes.
Here are the results I obtained with your images and the torchhub model:
[two result images]
But this makes me think that we should upload a new colab notebook illustrating how to use the torchhub models, which work better and are more robust.
I believe I have answered your question, and as such I'm closing the issue, but let us know if you have other problems or questions.
from detr.
Awesome work. Thank you for clarifying this issue.
from detr.