Hi, so I've tried training with a personal dataset and COCO2017 for a sanity check. My

High class error about detr HOT 11 CLOSED

facebookresearch commented on June 16, 2024 2

High class error

from detr.

Comments (11)

fmassa commented on June 16, 2024 29

Hi,

The current implementation for class_error takes into account the error for predicting "no-object", so it can be misleadingly high.

So if you replace

detr/models/detr.py

Line 117 in b7b62c0

losses['class_error'] = 100 - accuracy(src_logits[idx], target_classes_o)[0]

with

losses['class_error'] = 100 - accuracy(src_logits[idx][..., :-1], target_classes_o)[0]

you'll get the class error without taking into account the "no-object" class.
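To illustrate why dropping the last logit changes the reported number, here is a minimal sketch in plain Python (not the DETR code itself). The `top1_error` helper and the logit values are hypothetical; the only assumption taken from the thread is that the "no-object" class occupies the last logit column.

```python
def top1_error(logits, targets):
    """Percentage of rows whose highest-scoring column differs from the target label."""
    wrong = 0
    for row, target in zip(logits, targets):
        predicted = max(range(len(row)), key=row.__getitem__)
        wrong += predicted != target
    return 100.0 * wrong / len(targets)


# Hypothetical logits for 4 matched queries: 3 real classes plus a trailing "no-object" column.
matched_logits = [
    [2.0, 0.1, 0.3, 3.0],  # best real class is 0, but "no-object" scores highest
    [0.2, 1.5, 0.1, 2.5],  # best real class is 1, but "no-object" scores highest
    [0.1, 0.2, 1.0, 0.5],  # class 2 scores highest overall
    [1.2, 0.3, 0.9, 4.0],  # "no-object" scores highest; best real class is 0
]
targets = [0, 1, 2, 2]

# Error over all columns, "no-object" included (analogous to the original line).
error_with_no_object = top1_error(matched_logits, targets)

# Error after slicing off the last column (analogous to the [..., :-1] replacement).
error_without_no_object = top1_error([row[:-1] for row in matched_logits], targets)

print(error_with_no_object, error_without_no_object)
```

In this made-up example, early in training the "no-object" logit dominates most matched queries, so the first number is much higher than the second even though the ranking among real classes is mostly right.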

That being said, training takes a long time for DETR, and we provide training logs for a reduced schedule in https://github.com/facebookresearch/detr#training

By looking at the training logs, you can see that after the first epoch of COCO training the class error is at 89, going down progressively.

As an additional note, if you change the number of object queries, you might need to change other hyperparameters as well (like the eos_coef). Note that having more queries means that you'll have more candidate boxes to move around, which could lead to slower training.

fmassa commented on June 16, 2024 1

@ThomasDougherty as a rule of thumb, DETR takes much longer than Faster R-CNN to train as of now, as the Transformer needs to learn region priors that are included by default in Faster R-CNN.

So depending on the size of your dataset, I would say that you should aim for at least 300k training iterations to get competitive results.
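To translate an iteration budget into epochs for a given dataset, a rough conversion could look like the sketch below. The function name, the dataset size of 5000 images, and the assumption that an incomplete final batch is dropped are all illustrative, not taken from the DETR code or from this thread.

```python
import math


def epochs_for_iterations(target_iterations, num_images, batch_size):
    """Number of full epochs needed to reach at least target_iterations optimizer steps."""
    # Assumption: an incomplete final batch is dropped, so use floor division.
    iterations_per_epoch = num_images // batch_size
    if iterations_per_epoch == 0:
        raise ValueError("batch_size is larger than the dataset")
    return math.ceil(target_iterations / iterations_per_epoch)


# Hypothetical dataset of 5000 images, trained with a batch size of 2 on one GPU.
print(epochs_for_iterations(300_000, num_images=5000, batch_size=2))  # 120
```

Under these made-up numbers, 20 epochs would cover only 50k iterations, well short of the suggested budget.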

fmassa commented on June 16, 2024 1

@ThomasDougherty I don't think we have a recipe that always works as of now.
I would maybe encourage you to first use the 100 queries.
Also note that the eos_coef is a function of the average number of objects in the image.
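One rough way to see how eos_coef interacts with the number of queries and the number of objects per image is to estimate what share of the weighted classification terms comes from unmatched ("no-object") queries. This is only a back-of-the-envelope sketch: it assumes every query contributes one term weighted by 1 (matched) or eos_coef (unmatched) and ignores the actual loss values. The function name and the example numbers are hypothetical, and this is not an official tuning rule.

```python
def no_object_weight_share(num_queries, avg_objects, eos_coef):
    """Fraction of the total class weight carried by unmatched ("no-object") queries."""
    unmatched = num_queries - avg_objects
    if unmatched < 0:
        raise ValueError("avg_objects cannot exceed num_queries")
    weighted_unmatched = unmatched * eos_coef  # each unmatched query is down-weighted
    weighted_matched = avg_objects * 1.0       # each matched query keeps weight 1
    return weighted_unmatched / (weighted_unmatched + weighted_matched)


# Hypothetical settings: many queries and few objects vs. few queries and many objects.
share_many_queries = no_object_weight_share(num_queries=100, avg_objects=7, eos_coef=0.1)
share_few_queries = no_object_weight_share(num_queries=30, avg_objects=20, eos_coef=0.1)

print(f"{share_many_queries:.3f} {share_few_queries:.3f}")
```

With the same eos_coef, the share of "no-object" weight differs a lot between the two settings, which is one way to read the remark that eos_coef depends on the average number of objects per image.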

raviv commented on June 16, 2024

@ThomasDougherty What GPU are you using? What batch size? How many epochs have you trained for so far? Does it happen for both COCO and your dataset?

I'm asking because in my case, when fine-tuning on my dataset with an RTX 2080 Ti, which can only hold batches of 2 (my images are 1280x720), I experienced the same behavior.

However when training on an RTX Titan (24GB) with batches of 6, it started better but is now deteriorating :(

I'm using a learning rate of 0.0001

ThomasDougherty commented on June 16, 2024

I'm using a single Tesla P100. Batch size has been fixed at 2. Learning rate is 0.0001. I've trained on my personal dataset for 20 epochs and the class error is mostly 100. I've only trained on COCO2017 for 2 epochs since it takes so long haha, but I've seen logs showing that the class error starts dropping after the first epoch.

My personal dataset works well with Detectron packages like Faster RCNN when training, so I know there can be better results than what I'm seeing.

Very cool, I got some wiggle room to increase my batch size so I will try that and pray that'll fix it.

ThomasDougherty commented on June 16, 2024

Hi fmassa,

Thank you for your reply! I'm dropping my num_queries to 30 since my max number of objects is ~20. How much should I lower the eos_coef? Is there some scaling ratio you recommend between num_queries and eos_coef?

ThomasDougherty commented on June 16, 2024

@fmassa Awesome thank you! Changed the loss and kept the default arguments. I'm already seeing my class_error not stuck at 100 haha.

raviv commented on June 16, 2024

@ThomasDougherty Did you make any progress with training on your own dataset?

ThomasDougherty commented on June 16, 2024

@raviv I'm 80 epochs in so far. My errors are still relatively high, but not stuck at 100 anymore. How's yours going?

raviv commented on June 16, 2024

@ThomasDougherty
Mine is acting differently. It started very promising, with nice detections as early as a few thousand iterations. But as you can see, it spikes to a new maximum.
I saw this behavior with other ResNet-based networks, such as SSD+ResNet50, where the loss spikes and then eventually converges. I'll have to wait and see if this one does as well...

SISTMrL commented on June 16, 2024

@fmassa Hello, is this high class error harmful to the final performance?
