
High class error (detr) — 11 comments, closed

facebookresearch commented on June 16, 2024
High class error


Comments (11)

fmassa commented on June 16, 2024

Hi,

The current implementation of class_error counts a prediction of "no-object" on a matched query as an error, so it can be misleadingly high.

So if you replace

losses['class_error'] = 100 - accuracy(src_logits[idx], target_classes_o)[0]

with

losses['class_error'] = 100 - accuracy(src_logits[idx][..., :-1], target_classes_o)[0]

you'll get the class error without taking into account the "no-object" class.
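For reference, here is a minimal standalone sketch of that metric with the "no-object" logit excluded. The helper name is mine and not part of DETR; in the repo, the line above lives in SetCriterion.loss_labels and uses the accuracy helper shown in the snippet instead.

```python
import torch


@torch.no_grad()
def class_error_without_no_object(src_logits: torch.Tensor,
                                  target_classes_o: torch.Tensor) -> torch.Tensor:
    """Top-1 class error (%) over the matched queries, ignoring "no-object".

    src_logits:        [num_matched, num_classes + 1] logits of the matched
                       queries (i.e. src_logits[idx] in the snippet above),
                       where the last channel is the "no-object" class.
    target_classes_o:  [num_matched] ground-truth class indices.
    """
    # Drop the trailing "no-object" logit so a query whose top score is
    # "no-object" can still count as correct among the real classes.
    pred = src_logits[..., :-1].argmax(dim=-1)
    acc = (pred == target_classes_o).float().mean() * 100.0
    return 100.0 - acc
```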

That being said, training takes a long time for DETR, and we provide training logs for a reduced schedule in https://github.com/facebookresearch/detr#training

By looking at the training logs, you can see that after the first epoch of COCO training the class error is at 89, going down progressively.

As an additional note, if you change the number of object queries, you might need to change other hyperparameters as well (like the eos_coef). Note that having more queries means that you'll have more candidate boxes to move around, which could lead to slower training.


fmassa commented on June 16, 2024

@ThomasDougherty as a rule of thumb, DETR takes much longer than Faster R-CNN to train as of now, as the Transformer needs to learn region priors that are included by default in Faster R-CNN.

So depending on the size of your dataset, I would say you should plan for at least 300k training iterations to get competitive results.


fmassa commented on June 16, 2024

@ThomasDougherty I don't think we have a recipe that always works as of now.
I would encourage you to first try the default 100 queries.
Also note that the eos_coef is a function of the average number of objects per image.
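The thread does not give a concrete scaling rule, so purely as an illustration of "eos_coef is a function of the average number of objects per image", here is one hypothetical heuristic. The function, the 1.0 cap, and the ~7 objects per COCO image figure are my own assumptions, not an official DETR recipe; it simply keeps the relative weight of the "no-object" queries roughly where the defaults put it.

```python
def scaled_eos_coef(num_queries: int,
                    avg_objects_per_image: float,
                    ref_queries: int = 100,
                    ref_avg_objects: float = 7.0,
                    ref_eos_coef: float = 0.1) -> float:
    """Hypothetical heuristic: choose eos_coef so that the total weight on
    unmatched ("no-object") queries, relative to matched queries, stays close
    to what DETR's defaults give on COCO (100 queries, ~7 objects per image,
    eos_coef = 0.1). Not an official recipe."""
    # Weighted background-to-foreground ratio under the reference setting.
    ref_ratio = ref_eos_coef * (ref_queries - ref_avg_objects) / ref_avg_objects
    # Expected number of unmatched queries in the new setting.
    background = max(num_queries - avg_objects_per_image, 1e-6)
    # Reproduce the reference ratio, capped at 1.0 so "no-object" is never
    # weighted above the real classes.
    return min(1.0, ref_ratio * avg_objects_per_image / background)


# Examples: ~300 queries on COCO-like data suggests roughly 0.03; 30 queries
# with ~20 objects per image hits the cap, i.e. no down-weighting at all.
print(scaled_eos_coef(300, 7.0))   # ~0.032
print(scaled_eos_coef(30, 20.0))   # 1.0
```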


raviv commented on June 16, 2024

@ThomasDougherty What GPU are you using? What batch size? How many epochs have you trained for so far? Does it happen for both COCO and your dataset?

I'm asking because in my case, when trying to finetune on my dataset using an RTX 2080 Ti, which can only hold a batch size of 2 (my images are 1280x720), I experienced the same behavior.

[screenshot: training curves]

However, when training on an RTX Titan (24 GB) with a batch size of 6, it started off better but is now deteriorating :(

[screenshot: training curves]

I'm using a learning rate of 0.0001


ThomasDougherty commented on June 16, 2024

I'm using a single Tesla P100. Batch size has been fixed at 2, learning rate 0.0001. I've trained on my personal dataset for 20 epochs and the class error is still mostly at 100. I've only trained COCO 2017 for 2 epochs since it takes so long haha, but I've seen logs where the class error starts dropping after the first epoch.

My personal dataset trains well with Detectron models like Faster R-CNN, so I know better results are possible than what I'm seeing.

Very cool, I have some wiggle room to increase my batch size, so I will try that and pray that fixes it.


ThomasDougherty commented on June 16, 2024

Hi fmassa,

Thank you for your reply! I'm dropping my num_queries to 30 since my maximum number of objects is ~20. How much should I lower the eos_coef? Is there some scaling ratio you recommend between num_queries and eos_coef?


ThomasDougherty commented on June 16, 2024

@fmassa Awesome, thank you! I changed the loss and kept the default arguments. I'm already seeing that my class_error is no longer stuck at 100 haha.


raviv commented on June 16, 2024

@ThomasDougherty Did you make any progress with training on your own dataset?


ThomasDougherty commented on June 16, 2024

@raviv I'm 80 epochs in so far. My errors are still relatively high, but not stuck at 100 anymore. How's yours going?

[screenshot: training curves]


raviv commented on June 16, 2024

@ThomasDougherty
Mine is behaving differently. It started out very promising, with nice detections after only a few thousand iterations, but as you can see it spikes to a new maximum.
I've seen this behavior with other ResNet-based networks, such as SSD+ResNet50, where the loss spikes and then eventually converges. I'll have to wait and see if this one does as well...

[screenshot: loss curve with spike]


SISTMrL commented on June 16, 2024

@fmassa Hello, is this high class error harmful to the final performance?

