Comments (11)
Hi,
The current implementation for class_error takes into account the error for predicting "no-object", so it can be misleadingly high.
So if you replace
Line 117 in b7b62c0
with
losses['class_error'] = 100 - accuracy(src_logits[idx][..., :-1], target_classes_o)[0]
you'll get the class error without taking into account the "no-object" class.
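To illustrate why dropping the last logit changes the number, here is a minimal pure-Python sketch rather than the repository's code. The helper `top1_error` and the toy logits are hypothetical; it only assumes, as in the replacement line above, that the "no-object" logit is the last entry of each row and that the targets are the classes of the matched ground-truth objects.

```python
def top1_error(logits, targets, ignore_last=False):
    """Percentage of rows whose highest-scoring class differs from the target."""
    if not targets:
        return 0.0
    wrong = 0
    for row, target in zip(logits, targets):
        # Optionally drop the trailing "no-object" logit before taking the argmax.
        scores = row[:-1] if ignore_last else row
        pred = max(range(len(scores)), key=scores.__getitem__)
        wrong += pred != target
    return 100.0 * wrong / len(targets)


# Three matched queries, two real classes plus a trailing "no-object" logit.
matched_logits = [
    [2.0, 0.5, 3.0],  # "no-object" wins overall, class 0 wins among real classes
    [0.1, 1.5, 0.2],  # class 1 wins either way
    [0.3, 0.9, 2.5],  # "no-object" wins overall, class 1 wins among real classes
]
matched_targets = [0, 1, 0]

error_with_no_object = top1_error(matched_logits, matched_targets)
error_without_no_object = top1_error(matched_logits, matched_targets, ignore_last=True)
```

In this toy case two of the three queries count as errors only because "no-object" has the highest score, so the error is lower once that logit is excluded.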
That being said, training takes a long time for DETR, and we provide training logs for a reduced schedule in https://github.com/facebookresearch/detr#training
By looking at the training logs, you can see that after the first epoch of COCO training the class error is at 89, going down progressively.
As an additional note, if you change the number of object queries, you might need to change other hyperparameters as well (like the eos_coef). Note that having more queries means that you'll have more candidate boxes to move around, which could lead to slower training.
from detr.
@ThomasDougherty as a rule of thumb, DETR takes much longer than Faster R-CNN to train as of now, as the Transformer needs to learn region priors that are included by default in Faster R-CNN.
So depending on the size of your dataset, I would say that you should aim for at least 300k training iterations to get competitive results.
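Since training is usually configured in epochs, it can help to convert an iteration target into an epoch count. The sketch below is plain arithmetic; the function name and the dataset size of 2,000 images are hypothetical, and the batch size of 2 is just an example value.

```python
import math


def epochs_for_iterations(target_iterations, dataset_size, batch_size):
    """Number of full epochs needed to reach at least target_iterations steps."""
    iterations_per_epoch = math.ceil(dataset_size / batch_size)
    return math.ceil(target_iterations / iterations_per_epoch)


# Hypothetical dataset of 2,000 images trained with batches of 2.
epochs_small = epochs_for_iterations(300_000, dataset_size=2_000, batch_size=2)
```

With these example numbers one epoch is 1,000 iterations, so 300k iterations corresponds to 300 epochs.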
from detr.
@ThomasDougherty I don't think we have a recipe that always works as of now.
I would maybe encourage you to first use the 100 queries.
Also note that the eos_coef is a function of the average number of objects in the image.
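A rough way to see the dependence, as a sketch under stated assumptions and not a tuning recipe: assume each ground-truth object is matched to exactly one query, real classes have weight 1 in the classification loss, and "no-object" targets have weight eos_coef. The helper name and the average of 7 objects per image are illustrative assumptions; 0.1 is the default eos_coef.

```python
def no_object_weight_share(num_queries, avg_objects, eos_coef):
    """Share of the total classification-loss weight carried by "no-object" targets."""
    unmatched = num_queries - avg_objects      # queries assigned "no-object"
    no_object_weight = eos_coef * unmatched    # each one weighted by eos_coef
    object_weight = 1.0 * avg_objects          # matched queries keep weight 1
    return no_object_weight / (no_object_weight + object_weight)


# Illustrative numbers only: 7 objects per image on average.
share_100 = no_object_weight_share(num_queries=100, avg_objects=7, eos_coef=0.1)
share_30 = no_object_weight_share(num_queries=30, avg_objects=7, eos_coef=0.1)
```

Under these assumptions, fewer queries or more objects per image shrink the weight share of "no-object", which is why eos_coef may need revisiting when either changes.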
from detr.
@ThomasDougherty What GPU are you using? What batch size? For how many epochs have you trained so far? Does it happen for both COCO and your dataset?
I'm asking because in my case, when trying to finetune on my dataset using an RTX 2080 Ti, which can hold batches of 2 (my images are 1280x720), I experienced the same behavior.
However when training on an RTX Titan (24GB) with batches of 6, it started better but is now deteriorating :(
I'm using a learning rate of 0.0001
from detr.
I'm using a single Tesla P100. Batch size has been fixed at 2. Learning rate 0.0001. I've trained on my personal dataset for 20 epochs and the class error is mostly 100. I've only trained on COCO2017 for 2 epochs since it takes so long haha, but I've seen logs where the class error starts dropping after the first epoch.
My personal dataset works well with Detectron packages like Faster RCNN when training, so I know there can be better results than what I'm seeing.
Very cool, I got some wiggle room to increase my batch size so I will try that and pray that'll fix it.
from detr.
Hi fmassa,
Thank you for your reply! I'm dropping my num_queries to 30 since my max objects is ~20. How much should I lower the eos_coef? Is there some scaling ratio you recommend between num_queries and eos_coef?
from detr.
@fmassa Awesome, thank you! Changed the loss and kept the default arguments. I'm already seeing my class_error not stuck at 100 haha.
from detr.
@ThomasDougherty Did you make any progress with training on your own dataset?
from detr.
@raviv I'm so far 80 epochs in. My errors are still relatively high, but not stuck at 100 anymore. How's yours going?
from detr.
@ThomasDougherty
Mine is acting differently. It started very promisingly, with nice detections as early as a few thousand iterations. But as you can see, it spikes to a new maximum.
I saw this behavior with other ResNet-based networks, such as SSD+ResNet50, where it spikes and then eventually converges. I'll have to wait and see if this one does as well...
from detr.
@fmassa Hello, is this high class error harmful to the final performance?
from detr.