Very impressed with the all new innovative architecture in Detr! Can you clarify r


Recommendations for training Detr on custom dataset?,about facebookresearch/detr

Comments (205)

alcinos commented on June 16, 2024 35

If you just want to replace the classification head, you need to erase it before loading the state dict. One approach would be:

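import torch

# Build DETR with a fresh 50-class head; pretrained=False skips loading the COCO weights here.
model = torch.hub.load('facebookresearch/detr', 'detr_resnet50', pretrained=False, num_classes=50)
checkpoint = torch.hub.load_state_dict_from_url(
            url='https://dl.fbaipublicfiles.com/detr/detr-r50-e632da11.pth',
            map_location='cpu',
            check_hash=True)
# Drop the COCO classification head so its shape no longer conflicts with the new head.
del checkpoint["model"]["class_embed.weight"]
del checkpoint["model"]["class_embed.bias"]
# strict=False tolerates the now-missing class_embed keys.
model.load_state_dict(checkpoint["model"], strict=False)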
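
The same key removal can be written as a small reusable filter. This is only a sketch: `drop_head_weights` is a hypothetical helper (not part of DETR), and it works on any plain mapping of parameter names to values, so the toy dict below stands in for `checkpoint["model"]`.

```python
def drop_head_weights(state_dict, prefixes=("class_embed.",)):
    """Return a copy of state_dict without entries whose key starts with a prefix."""
    # str.startswith accepts a tuple, so several heads can be dropped at once.
    return {k: v for k, v in state_dict.items() if not k.startswith(prefixes)}


# Toy stand-in for checkpoint["model"]; real values would be tensors.
toy_state = {
    "class_embed.weight": "W",
    "class_embed.bias": "b",
    "bbox_embed.layers.0.weight": "B",
}
print(sorted(drop_head_weights(toy_state)))
```

The filtered dict would still need to be loaded with `strict=False`, since the model expects the `class_embed` keys that are now absent.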

Best of luck.

from detr.

alcinos commented on June 16, 2024 26

Hello,
Thanks for your interest in DETR.
It depends on the size of your dataset. If you have enough data (say at least 10K), training from scratch should work just fine. You'll need to prepare the data in the COCO format and then follow the instructions in the README. Note that if your dataset has a substantially different average number of objects per image than COCO, you might need to adjust the number of object queries (--num_queries). It should be strictly higher than the maximum number of objects you may have to detect, and it's good to have some slack (in COCO we use 100; the maximum number of objects in a COCO image is ~70).
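
One way to pick a value is to count annotations per image in the COCO-format file. The sketch below assumes the usual COCO layout (an `annotations` list whose entries carry an `image_id`); `suggest_num_queries` and the 1.4 slack factor are illustrative assumptions, not something DETR provides.

```python
from collections import Counter


def suggest_num_queries(coco, slack=1.4):
    """Return (max objects in any image, a suggested --num_queries value)."""
    per_image = Counter(ann["image_id"] for ann in coco["annotations"])
    max_objects = max(per_image.values(), default=0)
    # Strictly above the maximum, with some slack on top (slack factor is an assumption).
    suggested = max(max_objects + 1, round(max_objects * slack))
    return max_objects, suggested


# Toy annotation file: image 1 has three objects, image 2 has one.
toy = {"annotations": [{"image_id": 1}, {"image_id": 1}, {"image_id": 1}, {"image_id": 2}]}
print(suggest_num_queries(toy))
```

On a real dataset you would load the JSON with `json.load` and pass the resulting dict.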

Fine-tuning should work in theory, but at the moment it's not tested/supported. If you want to give it a go anyway, you just need to --resume from one of the checkpoints we provide. Feel free to report back any results you obtain :)

Best of luck


m-klasen commented on June 16, 2024 19

@tanulsingh I wrote a quick gist on how you can modify DETR to fine-tune on your own COCO-formatted dataset Link. Hope this helps.


lessw2020 commented on June 16, 2024 15

Just a quick update that I am getting really outstanding results on my object detection inference with DETR (via fine tuning the res101 model).
I still have some oddities (no mAP score, and the class loss test curve is stuck at 100% error), but in running the actual detections on test images today it just smoked EfficientDet (D0 and D1) by comparison. I'm sure this is b/c DETR can understand relationships, which is a big leap for this diagnostic work where all the items are inter-related, and it was the key reason I was super fired up to switch to DETR as soon as I read about the transformer architecture.
Anyway just wanted to post a big thanks to @fmassa and @alcinos esp. both for the help in getting training going (and also inventing DETR), and @raviv and @mlk1337 for additional feedback here.
This is for malaria and covid work fyi, so it has real life impact.
Thanks again!
(note I'm not signing off here, still have lots more datasets to train and fix mAP etc. but did want to provide an update and thanks!)


cbasavaraj commented on June 16, 2024 9

It would be easier (or at least more standard practice) to first load the pre-trained model, and then replace the classification head.


m-klasen commented on June 16, 2024 9

Hi, currently working with my custom dataset. Relatively small with ~2k Train, 400 valid images (32 video sequence clips) and only 4 classes with a maximum of 6 instances per image.
For my first training attempt I set num_queries=20 and discarded all transformer weights etc.
I trained 400 epochs with apex fp16 at lr 1e-4 with an lr_drop to 1e-5 at 200.

Evaluation at ep400 gives me a mAP of 0.45, which I can benchmark against a known good MaskRCNN from my colleague that achieves 0.63 mAP.
My question now is: what is the primary reason for the weaker performance?

More training? Better LR adjustments with a decay for example (hard to do with the first attempt when you are going in blind)?
reduce num_queries further?
class/bg loss coef adjustment?
...?


lessw2020 commented on June 16, 2024 9

I started on a colab notebook today to walk through fine tuning, though I didn't get as far as I thought b/c I have some design decisions to make re: the easiest way to wrap custom datasets, and whether we should do the training with all the args right in the notebook or run it with the shell command as it is currently.
(I wrote my own class to handle, but might be easier way and I have trained both by setting all args in a notebook and with shell command. Personally I like having the args listed and all available so I think I'll proceed with that...)
Here's a link to the start though it just ramps up to the real issues atm.
https://github.com/lessw2020/training-detr/blob/master/training_detr_colab.ipynb


raviv commented on June 16, 2024 7

Hi,

When fine-tuning from model zoo, using my own dataset, how should I modify the number of classes?
Loading the model fails (as expected) on:

RuntimeError: Error(s) in loading state_dict for DETR:
	size mismatch for class_embed.weight: copying a param with shape torch.Size([92, 256]) from checkpoint, the shape in current model is torch.Size([51, 256]).
	size mismatch for class_embed.bias: copying a param with shape torch.Size([92]) from checkpoint, the shape in current model is torch.Size([51]).

As I have 50 labels, and the checkpointed model has 91.
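
The shapes in that error follow from the class head having one more output than the number of classes (the extra one being the "no-object" class discussed elsewhere in this thread). A tiny sketch of the arithmetic; `class_head_shape` is a hypothetical helper, and 256 is simply the hidden size shown in the error message:

```python
def class_head_shape(num_classes, hidden_dim=256):
    """Weight shape of a linear class head with one extra 'no-object' output."""
    return (num_classes + 1, hidden_dim)


print(class_head_shape(91))  # checkpoint head: (92, 256)
print(class_head_shape(50))  # 50-label model:  (51, 256)
```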

Thanks!


lgvaz commented on June 16, 2024 6

For those having problems training on custom datasets,

I'm writing a library that unifies a data API for object detection, I just finished a tutorial on how to use it with Detr here.

The project provides a very flexible API for custom datasets while still using Detr original source code for training, be sure to take a look =)

@MHI4 , @AlexAndrei98 I'm tagging you because you're interested in a tutorial (the source code of the link I shared is a notebook btw, so you can run that).


m-klasen commented on June 16, 2024 5

After training with the following parameters, I noticed that the model is not quite successful at learning. Has anyone had success fine-tuning on their own dataset?
args.num_classes = 11
args.epochs = 3000
args.batch_size = 2
args.lr = 0.05
args.train_only_head = True

Your learning rate is really high for this architecture. Try something lower like 1e-4.


alcinos commented on June 16, 2024 4

If you're fine-tuning, I don't recommend changing the number of queries on the fly, it is extremely unlikely to work out of the box. In this case you're probably better off retraining from scratch (you can change the --num_queries arg from our training script).

As for the initialization of class_embed, the solution I posted above makes sure it is initialized as it should.

Best of luck


lessw2020 commented on June 16, 2024 4

My dataset has images of various sizes.
Do I need to resize them to a specific size?

I can't answer definitively but if you look at the code in datasets/coco.py, you can see how they handled their image resizing for coco training. Basically they do random rescaling per the scales list, with the largest size dimension maxed at 1333:
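scales = [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]

if image_set == 'train':
    return T.Compose([
        T.RandomHorizontalFlip(),
        T.RandomSelect(
            # either a plain multi-scale resize...
            T.RandomResize(scales, max_size=1333),
            # ...or resize, random crop, then the multi-scale resize
            T.Compose([
                T.RandomResize([400, 500, 600]),
                T.RandomSizeCrop(384, 600),
                T.RandomResize(scales, max_size=1333),
            ])
        ),
        normalize,  # ToTensor + Normalize, defined earlier in that file
    ])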

The colab example used a max size of 800, with half precision weights.

Thus if your images are all larger than 1333 in one dimension, then they'll all be resized below that with padding anyway.

Hopefully others can add more info here but hope this provides some starter info for you.


fmassa commented on June 16, 2024 4

@raviv no, the architecture doesn't expect a maximum size, but note that the Transformer Encoder is quadratic wrt the number of pixels in the feature map, so if your image is very large (like larger than 2000 pixels), you might face memory issues.
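
To get a feel for that quadratic growth, here is a back-of-the-envelope sketch. It assumes a backbone with a total stride of 32 (an assumption about the non-dilated ResNet variants; the DC5 models would have a smaller stride) and counts the entries of a single self-attention map, ignoring heads, layers and batch size. `encoder_attention_cost` is a hypothetical name.

```python
import math


def encoder_attention_cost(height, width, stride=32):
    """Return (number of feature-map tokens, entries in one attention map)."""
    tokens = math.ceil(height / stride) * math.ceil(width / stride)
    # Self-attention compares every token with every other token.
    return tokens, tokens * tokens


for h, w in [(800, 1333), (1600, 2666)]:
    tokens, entries = encoder_attention_cost(h, w)
    print(f"{h}x{w}: {tokens} tokens, {entries} attention entries")
```

Doubling both image sides multiplies the token count by roughly 4 and the attention map by roughly 16, which is where the memory pressure on very large images comes from.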


lessw2020 commented on June 16, 2024 4

Hi @MHI4 - I can make a colab this weekend if no one beats me to it.

1 - Do you have a custom dataset I can use to test on and private email for testing it so we don't clog up this thread during dev/testing? My work datasets are private so I can't use those, or we can also pick a general smaller detection dataset for example.

2 - Did you already see @mlk1337's gist link? It gives you the key steps needed to fine-tune, though it may take a bit more knowledge to set up than a colab with pluggable params for a dataset.


lessw2020 commented on June 16, 2024 4

Hi @AlexAndrei98,
I have been very successful with fine tuning. My recommendation is stick with the provided defaults first and only modify after you get a feel for how those do on your dataset.
In your example above, as @mlk1337 correctly points out, your learning rate is too high. Thus you are just oscillating without learning.
1e-4 is the default lr for detr transformer and head and 1e-5 for the backbone. I would recommend starting with those same learning rates. Training for transformers tends to be more of a long steady process so I'd change your lr and run with that to start.
Also try with 300 epochs, not the 3000 you have and check your results at that point.
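
As a sketch of how two learning rates can coexist, the parameters can be split into optimizer groups by name. The rule used here (anything with "backbone" in its name gets the lower rate) and the parameter names are illustrative assumptions; with a real model you would collect parameters from `named_parameters()` and pass the groups to the optimizer.

```python
def split_param_groups(param_names, lr=1e-4, lr_backbone=1e-5):
    """Group parameter names into (transformer + heads) and backbone, each with its own lr."""
    backbone = [n for n in param_names if "backbone" in n]
    rest = [n for n in param_names if "backbone" not in n]
    return [
        {"params": rest, "lr": lr},
        {"params": backbone, "lr": lr_backbone},
    ]


# Hypothetical parameter names, for illustration only.
names = [
    "backbone.0.body.conv1.weight",
    "transformer.encoder.layers.0.linear1.weight",
    "class_embed.weight",
]
for group in split_param_groups(names):
    print(group["lr"], group["params"])
```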
Hope that helps!


alcinos commented on June 16, 2024 3

@mlk1337 with such a small dataset, I'd recommend trying to fine-tune the class head, while starting from a pre-trained encoder/decoder. You'll have to keep the 100 queries if you do that, but unless you're after very marginal speed improvement it shouldn't hurt.


m-klasen commented on June 16, 2024 3

@kratosld

Hi,

If I want to train on Openimages v6 dataset with 600 classes in 30 GB sets, is it recommended to train all the layers or just the classification head?

And, does the classification head consist of class_embed and bbox_embed or just the class_embed

Finally, if I set num_queries = 700, and 500 epochs, would that be alright?

Just remove class_embed.weight & class_embed.bias and keep the rest.
Unless you literally have 700 items in each image, do not change num_queries; keep it at 100, which will give you 100 proposed boxes for each image. Changing num_queries means retraining the whole transformer, which is costly.


lessw2020 commented on June 16, 2024 3

I've made more progress on the colab for custom training - it's at the point of building the model/post-processor/criterion but I have a bit of a sticking point b/c num_classes in detr.py::build(args) is determined from dataset name.
I simply modified detr.py for my own training but that's not a good solution b/c it will break over time with new updates. I will open an issue and probably do a PR tonight if that's of interest, to close the loop on this so that simply passing in an args.num_classes is supported directly (i.e. defaults to 20 per the current code, but if not coco or coco_panoptic, it will adjust num_classes)?
I think that's the cleanest solution w/o disrupting anything and avoid the need to manually edit detr.py.


alcinos commented on June 16, 2024 3

Hi everyone, we released a Detectron2 wrapper for DETR. This can be used as an alternative option to train on custom datasets (though it requires more dependencies).
You can refer to Detectron2's tutorial and colab to get started.

Best of luck.


alcinos commented on June 16, 2024 2

My dataset has images of various sizes.
Do I need to resize them to a specific size?

As was noted by @lessw2020, the images will be randomly resized in an appropriate range by our data-augmentation. The images will then be padded, so having different sizes is not an issue.

Thanks for wonderful work,
What is your recommendation to use DETR for single object detection(e.g., scene text detection) datasets?

I'm not sure about the specifics of your dataset, but in general I'd say all the general advice provided in this thread apply to the case where there is only one object class.


raviv commented on June 16, 2024 2

This is what my losses look like so far.
I would love to get others' input on their attempts to train DETR on custom datasets.


lessw2020 commented on June 16, 2024 2

@raviv - happy to share my training results but can you post your plot code for the graphs and I'll use that? Right now I just have text output as the detr plot_utils wasn't working (wasn't sure if I should debug that or just move it to tensorboard, looking at that now).
@mlk1337 - same question, can you share your plot code for the logs?


m-klasen commented on June 16, 2024 2

@lessw2020

detr/util/plot_utils.py

Line 20 in 5617b89

coco_eval = pd.DataFrame(pd.np.stack(df.test_coco_eval.dropna().values)[:, 1]).ewm(com=ewm_col).mean()

changed to
pd.DataFrame(pd.np.stack(df.test_coco_eval_bbox.dropna().values)[:, 1]).ewm(com=ewm_col).mean()
worked for me (for bounding boxes)


MHI4 commented on June 16, 2024 2

Hello All,
I am quite a beginner in Python. My experience is only in MATLAB-based training. I was wondering whether anyone would be willing to prepare a Google Colab notebook for us to train on our custom dataset. It might help us learn the sequential training and validation steps.
I appreciate your contribution, @lessw2020 @mlk1337 @raviv @fmassa @alcinos.
Thank you all in Advance.


lgvaz commented on June 16, 2024 2

args.num_classes is supported directly

@lessw2020 In the example I shared I implemented exactly that, in a backwards compatible way:

def build(args):
    if args.num_classes is not None:
        num_classes = args.num_classes
    else:
        num_classes = 20 if args.dataset_file != 'coco' else 91
        if args.dataset_file == "coco_panoptic":
            num_classes = 250

If it's of interest I can do a PR, just let me know =)


fmassa commented on June 16, 2024 2

@tazu786 in our implementation of COCO datasets, we convert the boxes to x1y1x2y2 format, see

detr/datasets/coco.py

Line 67 in 1fcfc65

boxes[:, 2:] += boxes[:, :2]
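
Putting the two steps discussed in this thread together (COCO xywh to x1y1x2y2 in the dataset, then to normalized cxcywh in the transforms), here is a plain-Python sketch for a single box. The function names are illustrative, and the real code operates on tensors rather than lists.

```python
def xywh_to_xyxy(box):
    """COCO [x, y, w, h] (top-left corner) -> [x0, y0, x1, y1]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def xyxy_to_cxcywh(box):
    """[x0, y0, x1, y1] -> [center x, center y, width, height]."""
    x0, y0, x1, y1 = box
    return [(x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0]


def coco_box_to_detr_target(box, img_w, img_h):
    """Chain both conversions and scale by the image size so values land in [0, 1]."""
    cx, cy, w, h = xyxy_to_cxcywh(xywh_to_xyxy(box))
    return [cx / img_w, cy / img_h, w / img_w, h / img_h]


print(coco_box_to_detr_target([10, 20, 30, 40], img_w=100, img_h=200))
# [0.25, 0.2, 0.3, 0.2]
```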


alcinos commented on June 16, 2024 2

Hi @vickraj ,

500 queries is quite high, it's fairly possible that the default hyper-parameters are not optimal for this setting. You can at least try to play with the eos-coef to see if it helps (maybe try 0.2 and 0.05 instead of the default 0.1)

As for duplicate boxes, are the duplicates high confidence?
In general, it really depends what is your end-goal and how you use the predicted boxes down the road. If you only care about AP, and your duplicate boxes are somewhat low confidence, then NMS is unlikely to improve the AP score. As a matter of fact, when evaluating AP for DETR, we provide to the evaluator ALL the queries, even those for which the highest scoring class is "no-object" (in this case we use the second best scoring class and its associated confidence).

Now if you don't really care about AP but care about the quality of the boxes, then it is likely that you can threshold the boxes based on the detection confidence to solve your issue of lesser-quality boxes (e.g. duplicates), similarly to what we do in the example Colab. Note that this threshold might be class-dependent, especially if the classes are unbalanced. In COCO, for example, "person" is the majority class by far, and using a threshold like 0.9 or even 0.95 is likely to retain all the salient people in the image without duplicates. For rare classes like "hair-drier", you'll want to use a much lower threshold.
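
A minimal sketch of class-dependent thresholding on already-decoded detections. The detection format (dicts with `label` and `score`), the helper name, and the specific threshold values are all assumptions made for illustration; suitable thresholds would have to be tuned per dataset.

```python
def filter_by_class_threshold(detections, thresholds, default=0.7):
    """Keep detections whose score reaches the threshold of their class."""
    return [
        d for d in detections
        if d["score"] >= thresholds.get(d["label"], default)
    ]


detections = [
    {"label": "person", "score": 0.92},
    {"label": "person", "score": 0.85},      # likely a duplicate, dropped at 0.9
    {"label": "hair drier", "score": 0.40},  # rare class, kept with a low threshold
]
thresholds = {"person": 0.9, "hair drier": 0.3}  # illustrative values
print(filter_by_class_threshold(detections, thresholds))
```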

Hope this helps, and best of luck in your trainings with DETR.


tanulsingh commented on June 16, 2024 2

Hi @lessw2020 Please see this https://www.kaggle.com/tanulsingh077/end-to-end-object-detection-with-transformers-detr
I don't know whether it helps or not but I would love to have some views on it


raviv commented on June 16, 2024 1

@lessw2020 I'm using https://github.com/allegroai/trains/ to track training


vickraj commented on June 16, 2024 1

Hey all,

I've been training DETR on a custom dataset that has a few pretty dense scenes (350-450ish objects). I trained from scratch with 500 num-queries and default everything else (I had to change the learning rate a bit to make it converge, but it appears to have converged at this point).

The only problem is that there are SEVERAL duplicate boxes on the images with <200 objects, sometimes with the same class, sometimes with different classes... NMS can clearly help here, but I was hoping for a solution that doesn't use NMS, since to my understanding DETR was not supposed to require it. The converged class error is also pretty high, which I imagine is contributing here. Any suggestions on other parameters to change, or should I just do NMS and be done with it?
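
For reference, this is roughly what the NMS fallback looks like: a plain-Python sketch of greedy, class-agnostic NMS (class-agnostic because the duplicates here sometimes carry different classes). It is not part of DETR; the 0.5 IoU threshold is an assumption, and boxes are expected in x0, y0, x1, y1 format.

```python
def iou(a, b):
    """Intersection over union of two [x0, y0, x1, y1] boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score, keep those not overlapping a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```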


martinj3456 commented on June 16, 2024 1

Hi @lessw2020,
I am looking forward to your colab script to train DETR on my custom dataset. When might the script be ready for us?

Thank you in advance for this awesome endeavor.


Dicko87 commented on June 16, 2024 1

Great stuff @lessw2020 😃 waiting in anticipation haha... patiently of course!


fmassa commented on June 16, 2024 1

@tanulsingh I believe I've answered your question in #109


Dicko87 commented on June 16, 2024 1

Hey @tanulsingh, great work.
I was wondering if you know how to create a test set and calculate the mAP of it, that would be a great thing to see.


lessw2020 commented on June 16, 2024 1

Hi @tanulsingh - nice job! I like how you integrated Albumentations as I will be doing the same for my own work.

@Dicko87 - I had the same thing with the large class_error fluctuations, so it's pretty normal. Using the average meter that Tanul used in his kernel might smooth that out. Alternatively I hope to setup with tensorboard for the training visuals soon and can run an average there.
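
A sketch of the kind of smoothing meant here: a running average over the last few logged values. `WindowedAverage` is a hypothetical stand-in, not the meter from the kernel mentioned above, and the window size is an arbitrary choice.

```python
from collections import deque


class WindowedAverage:
    """Average of the most recent `window` values."""

    def __init__(self, window=20):
        self.values = deque(maxlen=window)  # old values fall out automatically

    def update(self, value):
        self.values.append(value)
        return sum(self.values) / len(self.values)


meter = WindowedAverage(window=3)
for class_error in [100.0, 0.0, 50.0, 0.0]:
    print(round(meter.update(class_error), 2))
```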

I did not get mAP working yet on my datasets so definitely seeing more info on that from anyone that has it working would be great.


tanulsingh commented on June 16, 2024 1

@Dicko87 Thanks, yeah, that is the next step and I will do it soon. I have a lot of things planned; I will report the mAP here as well as in the Kaggle discussion forums.
I have also prepared a separate kernel for calculating mAP for any dataset and will be updating that soon too @lessw2020


Dicko87 commented on June 16, 2024 1

@tanulsingh Exciting stuff, I can’t wait to see the results! :)


lessw2020 commented on June 16, 2024 1

oh forgot to answer @Dicko87 - yes, you can pick up again if training is interrupted. Look for the latest weights in the output directory, then use the --resume option and point it to those weights to continue.
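
A small sketch for locating the weights to pass to --resume. The file naming is an assumption (a rolling `checkpoint.pth` plus numbered `checkpointNNNN.pth` snapshots); check what your output directory actually contains. `latest_checkpoint` is a hypothetical helper.

```python
import re
from pathlib import Path


def latest_checkpoint(output_dir):
    """Return the most recent checkpoint path in output_dir, or None if there is none."""
    output_dir = Path(output_dir)
    rolling = output_dir / "checkpoint.pth"  # assumed to be overwritten every epoch
    if rolling.exists():
        return rolling
    numbered = []
    for path in output_dir.glob("checkpoint*.pth"):
        match = re.fullmatch(r"checkpoint(\d+)\.pth", path.name)
        if match:
            numbered.append((int(match.group(1)), path))
    # Fall back to the snapshot with the highest epoch number.
    return max(numbered)[1] if numbered else None
```

The returned path can then be given to the training script through --resume.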

@tanulsingh - kernel for calculating mAP for any dataset sounds outstanding! Definitely look forward to that.

I also got the heatmaps going on my work datasets from the checkin that was done yesterday (thanks again @fmassa and @alcinos!) and was able to then interpolate them onto the test images directly to give a nicer visual result and will show how to do that soon.
I'm hoping to mimic the way they did it in the paper today as they apparently used seaborn vs I was using magma and jet cmaps which look good but not as nice as the ones in the paper.


Zumbalamambo commented on June 16, 2024


PancakeAwesome commented on June 16, 2024

agree


lessw2020 commented on June 16, 2024

Related question: how should we reduce the number of queries when there are fewer classes (continuing from the approach above)?
For example, I only have 5 classes to detect and each image will have exactly 5 classes, so I was planning to run with queries = 12 instead of the default 100 (or should it be 5 if we know that's the max our images will ever have...)

I'm looking at model.query_embed with shape (100, 256) and assume that is the right place to adjust, but it's unclear. If we adjust via model.query_embed.num_embeddings=my_new_query_count, is that enough?
(update - I'm working on this, and the DETR model stores a self.num_queries as well, but this is only referenced later for segmentation.
To be correct, though, both model.num_queries and model.query_embed.num_embeddings would need to be adjusted together...)


lessw2020 commented on June 16, 2024

Also wouldn't we want to re-init the weights in class_embed to normal or uniform after wiping the checkpoint weights to kick off the new training?


lessw2020 commented on June 16, 2024

Hi @alcinos - excellent, thanks tremendously for the advice here, esp on a Sat night.
I will try both fine tuning for now (with smaller dataset and will not touch num_queries) and from scratch as we'll have a larger dataset soon, and update here to share results.
Thanks again!


raviv commented on June 16, 2024

My dataset has images of various sizes.
Do I need to resize them to a specific size?


raviv commented on June 16, 2024

@alcinos, @lessw2020 It seems that these resizes are for data augmentation when training.
As I'm using my own dataloader and augmentations, my question is: does the architecture (or implementation) expect images to have some maximum size?
Thanks.


fmassa commented on June 16, 2024

@mlk1337 thanks for sharing the results!

I think you are at a good starting point. I would say that from the logs you might want to change the eos_coef a bit and try different values. I think the number of num_queries is ok, but the eos_coef probably needs to be adapted.
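
For context, eos_coef acts as the relative classification weight of the "no-object" class. The sketch below shows, with plain lists instead of tensors, how such a per-class weight vector could be laid out; `ce_class_weights` is a hypothetical helper, and placing "no-object" last is an assumption of this sketch.

```python
def ce_class_weights(num_classes, eos_coef=0.1):
    """Per-class cross-entropy weights with the last ('no-object') class down-weighted."""
    weights = [1.0] * (num_classes + 1)
    weights[-1] = eos_coef  # most queries match no object, so this class dominates otherwise
    return weights


print(ce_class_weights(4))        # [1.0, 1.0, 1.0, 1.0, 0.1]
print(ce_class_weights(4, 0.05))  # [1.0, 1.0, 1.0, 1.0, 0.05]
```

A lower value makes the loss care less about predicting "no object" correctly; a higher value does the opposite, which is why it is worth trying a few settings on a new dataset.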

I don't know if using apex with fp16 affects something or not as I haven't tried, but maybe @szagoruyko can comment on this?

@raviv your training logs are very weird; it seems that the model stopped working at some point early in training. Are you using gradient clipping (it's on by default)?


raviv commented on June 16, 2024

@fmassa I'm running with the default args.
To keep things simple, I'm using 1 class and disabled all augmentations.
The behavior was similar when training multiple classes and with aug enabled.
To speed things up I'm using a subset of my dataset with 8K train and 2K test


tanulsingh commented on June 16, 2024

Hey, I wanted to fine-tune DETR myself on custom datasets, but I am new to all of this; I have been using torchvision models all the time to fine-tune on my dataset. I would be glad if someone could share demo code for fine-tuning @alcinos


lessw2020 commented on June 16, 2024

Thanks very much @raviv and @mlk1337 - here's my first two training runs, I used num queries = 12 (6 classes) and trained from scratch.
I modified eos_coef from .1 to .01 to compare. As you can see, training loss looks great but validation not doing so well.
(one caveat though is I can't hflip b/c this is for medical and flipped = no no, so I will be adding in more augmentations which for EffDet made a big difference and may alone be the validation issue here.)
I'm trying with higher queries now just as a fast check and will go 10x higher on eos_coef, and then will also compare the fine tuning only option (with default 100 queries) and then kick in augmentations.
Anyway at least for train loss, it's learning rapidly and easily:


raviv commented on June 16, 2024

@lessw2020 What does the dotted line represent?


lessw2020 commented on June 16, 2024

Here's fine tuning vs training from scratch - everything looks much better relatively. (not sure why test class error never changes though...need to review loss criterion?)

@raviv - dotted line represents test (validation) scores, solid is training scores.


lessw2020 commented on June 16, 2024

related question - has anyone written visualization code for viewing sample output images with bboxes during training (i.e. with and/or without gt boxes in same image)?
edit - actually can leverage this code here for part of the visuals:
https://github.com/plotly/dash-detr/blob/master/model.py
https://github.com/plotly/dash-detr/blob/master/app.py


lessw2020 commented on June 16, 2024

lastly, here's fine tuning with detr101-dc5 - for <30 epochs curves look great.
Still unclear what is happening with test_class_error and test_loss_ce (dotted lines = test):


raviv commented on June 16, 2024

@lessw2020 Re: visualizing sample output, most of the code is in the project's Colab notebook
You would just have to adapt it to show GT bboxes as well.


commented on June 16, 2024

Hi,

If I want to train on Openimages v6 dataset with 600 classes in 30 GB sets, is it recommended to train all the layers or just the classification head?

And, does the classification head consist of class_embed and bbox_embed or just the class_embed

Finally, if I set num_queries = 700, and 500 epochs, would that be alright?


commented on June 16, 2024

@kratosld

Hi,
If I want to train on Openimages v6 dataset with 600 classes in 30 GB sets, is it recommended to train all the layers or just the classification head?
And, does the classification head consist of class_embed and bbox_embed or just the class_embed
Finally, if I set num_queries = 700, and 500 epochs, would that be alright?

Just remove class_embed.weight & class_embed.bias and keep the rest.
Unless you literally have 700 items in each image, do not change num_queries; keep it at 100, which will give you 100 proposed boxes for each image. Changing num_queries means retraining the whole transformer, which is costly.

Alright, thank you!


MHI4 commented on June 16, 2024

Hello @lessw2020,
Thank you in advance for your Colab notebook.

I have started with a small open dataset from here. I have re-annotated object (1 object) in MATLAB and converted it to XML files. Using voc2coco I've generated the json annotation files. All are uploaded here.

You can use this. It would be wonderful. Maybe you can add the command for the voc2coco-based XML to JSON conversion to the same Colab script for all users.
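
To illustrate what such a conversion involves, here is a minimal standard-library sketch that turns Pascal VOC style XML into a COCO-style dict. It is not the voc2coco tool itself; `voc_to_coco` is a hypothetical function, and it assumes each XML has `filename`, `size` and `object/bndbox` elements.

```python
import xml.etree.ElementTree as ET


def voc_to_coco(xml_strings, class_names):
    """Convert VOC annotation XML strings into a COCO-style dict (boxes as [x, y, w, h])."""
    cat_ids = {name: i + 1 for i, name in enumerate(class_names)}
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": i, "name": n} for n, i in cat_ids.items()],
    }
    ann_id = 1
    for image_id, xml_text in enumerate(xml_strings, start=1):
        root = ET.fromstring(xml_text)
        size = root.find("size")
        coco["images"].append({
            "id": image_id,
            "file_name": root.findtext("filename"),
            "width": int(size.findtext("width")),
            "height": int(size.findtext("height")),
        })
        for obj in root.iter("object"):
            box = obj.find("bndbox")
            xmin, ymin, xmax, ymax = (
                float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
            )
            w, h = xmax - xmin, ymax - ymin  # VOC stores corners, COCO stores width/height
            coco["annotations"].append({
                "id": ann_id,
                "image_id": image_id,
                "category_id": cat_ids[obj.findtext("name")],
                "bbox": [xmin, ymin, w, h],
                "area": w * h,
                "iscrowd": 0,
            })
            ann_id += 1
    return coco
```

The result can be written out with `json.dump` to obtain an annotation file in the layout that COCO-style loaders expect.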

Thank You Again


lgvaz commented on June 16, 2024

@mlk1337

@lessw2020

detr/util/plot_utils.py

Line 20 in 5617b89

coco_eval = pd.DataFrame(pd.np.stack(df.test_coco_eval.dropna().values)[:, 1]).ewm(com=ewm_col).mean()

changed to
pd.DataFrame(pd.np.stack(df.test_coco_eval_bbox.dropna().values)[:, 1]).ewm(com=ewm_col).mean()
worked for me (for bounding boxes)

Do we have to do this when training with only bboxes (without masks)?


AlexAndrei98 commented on June 16, 2024

Hi @MHI4 - I can make a colab this weekend if no one beats me to it.

1 - Do you have a custom dataset I can use to test on and private email for testing it so we don't clog up this thread during dev/testing? My work datasets are private so I can't use those, or we can also pick a general smaller detection dataset for example.

2 - Did you already see @mlk1337's gist link? It gives you the key steps needed to fine-tune, though it may take a bit more knowledge to set up than a colab with pluggable params for a dataset.

I think a notebook to run on a custom dataset would be very helpful 🙌🏼🙌🏼


AlexAndrei98 commented on June 16, 2024

How does it compare to a FasterRCNN in terms of training time?

I have 2k images with roughly 10 classes each. I currently have a model that took me two days to train and performs rather well (roughly 10000 iters, batch size 4, using Colab Pro). Any tips to better tune some parameters? Thank you


lessw2020 commented on June 16, 2024

1 - I made a new PR to hopefully cleanly and robustly handle supporting args.num_classes with full backwards compat: #89

2 - @lgvaz - thanks for the code snippet! I initially had something similar (a None check for args.num_classes, with a default of None), but that will throw an exception if num_classes is not present in args at all.
I have several modified copies of main.py (and others likely do as well) where args.num_classes would not be present at all, so I wrapped the check in a try/except block and also defaulted it to 20 in both cases to be backward compatible with the previous code. I also went with simple if/then blocks to check for coco and coco_panoptic respectively to keep it uber-readable.

3 - @lgvaz or others - do you happen to have a lightweight wrapper class for datasets in COCO format that handles class_id mapping, which I could use for the colab? I have my own COCO class, but it needs reworking imo, though it does handle the class id mapping issue that blew up my initial detr training.
For the colab I'm trying to keep things very light, with minimal new/external requirements, and integrated with detr as cleanly as possible... so I don't want to integrate a larger project like mantisshrimp for it. But from a quick look tonight at the mantisshrimp project, I definitely like some of the abstraction work you've done there with the parser and datasets (reminds me of fastai), so if it could be split off as just a custom class wrapper that would be ideal.
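
Regarding point 2, one way to tolerate an args object that has no num_classes attribute at all is `getattr` with a default. Note this swaps the try/except described above for `getattr`; it is a sketch of the idea, not the code from the PR, and `resolve_num_classes` is a hypothetical name. The dataset defaults (20, 91, 250) are the ones quoted earlier in this thread.

```python
from argparse import Namespace


def resolve_num_classes(args):
    """Use args.num_classes when given, else fall back to the dataset-based defaults."""
    explicit = getattr(args, "num_classes", None)  # no exception if the attribute is missing
    if explicit is not None:
        return explicit
    if args.dataset_file == "coco_panoptic":
        return 250
    if args.dataset_file == "coco":
        return 91
    return 20


print(resolve_num_classes(Namespace(dataset_file="coco")))                    # 91
print(resolve_num_classes(Namespace(dataset_file="custom", num_classes=11)))  # 11
```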


lessw2020 commented on June 16, 2024

@AlexAndrei98 - I was able to train my custom dataset via fine-tuning in half a day on a v100.

For a 2K dataset though I would try fine-tuning first and I don't envision you would need 2 days though obviously what gpu is going to impact that.
I had good results with the default params supplied (bs = 2, etc) so I would also start with those, but you could do a shorter cycle up front and review before committing to the default 300 epochs.
As an initial test, I trained for 60 epochs, and did the lr drop at 50 as a first test. That was plenty to review the model with test data and get an idea, so you could try that and check to determine how much total training time you might need.
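
The shortened schedule described here amounts to a single step decay. A sketch of the learning rate per epoch, assuming the rate is multiplied by 0.1 at each drop (the factor is an assumption; `lr_at_epoch` is a hypothetical helper):

```python
def lr_at_epoch(epoch, base_lr=1e-4, lr_drop=50, gamma=0.1):
    """Step decay: multiply the learning rate by gamma every lr_drop epochs."""
    return base_lr * gamma ** (epoch // lr_drop)


# 60-epoch trial run with the drop at epoch 50.
for epoch in (0, 49, 50, 59):
    print(epoch, f"{lr_at_epoch(epoch):.0e}")
```

With these settings the first 50 epochs run at 1e-4 and the last 10 at 1e-5, which gives a quick preview of how the model behaves after the drop.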


m-klasen commented on June 16, 2024

@lessw2020 Hi, did you have any success fine tuning on a different backbone in your experiments? Afaik the resnet used is quite standard except for a short stem. However, despite training quite extensively I never came close to the AP achieved with the provided resnet. Any thoughts on how to address this issue?


tazu786 commented on June 16, 2024

Hi, while formatting my custom dataset into coco format, I came across this (datasets/transforms line 253 on):

    if "boxes" in target:
        boxes = target["boxes"]
        boxes = box_xyxy_to_cxcywh(boxes)
        boxes = boxes / torch.tensor([w, h, w, h], dtype=torch.float32)
        target["boxes"] = boxes

Why should I apply xyxy_to_cxcywh in the normalization of the bbox target if the coco format for the bbox is already xywh (with xy top left corner)?


AlexAndrei98-LI commented on June 16, 2024

After training with the following parameters, I noticed that the model is not quite successful at learning. Has anyone had success fine-tuning on their own dataset?

args.num_classes = 11
args.epochs = 3000
args.batch_size = 2
args.lr = 0.05
args.train_only_head = True


defqoon commented on June 16, 2024

yes, I was successful using this

If you just want to replace the classification head, you need to erase it before loading the state dict. One approach would be:


vickraj commented on June 16, 2024

Hi @alcinos, thanks for the tips. These are all pretty high confidence scores for the duplicate boxes unfortunately, which is why I was thinking NMS would help a lot. I removed the "no-object" ones for visualization purposes, and there were still quite a few duplicate boxes.

The eos-coef isn't something I've played around with - will definitely try it out. Thanks for the suggestion.

The class-dependent thresholding is interesting, and certainly worth a try. It's possible that I could find some good thresholds to ensure one (or close to one) box per class for this case.

from detr.

Mihir-Mavalankar commented on June 16, 2024

Hi,
I am trying to use the DETR model to fine tune it for a dataset with only one class. As expected the class_error for the model goes to zero immediately but the loss_bbox and loss_giou error don't seem to be going down.
loss_bbox error remains in the range of 1500-3000 and loss_giou error remains at about 1.5.
I have trained multiple times on a GPU for about 5 epochs but these losses keep randomly fluctuating and don't go down.

I checked the bboxes I am supplying: they are correct and in the x_center, y_center, h, w format.
The learning rate is 1e-4 and I am using the torch SGD optimizer, with the default 100 queries and all other hyperparameters at their defaults.

What might I be doing wrong that leads to the loss not going down? Any suggestions would be greatly appreciated!
@alcinos

from detr.

alcinos commented on June 16, 2024

Hi @Mihir-Mavalankar

SGD is not a good idea here, for two reasons: 1) the network was pre-trained with a different optimizer, and switching is never recommended; 2) SGD is known to work poorly with transformers anyway. You should stick to the default (AdamW).

A loss_bbox of 1500-3000 seems really high to me. Could you check that the targets you provide to the loss are normalized in [0, 1]? Normally the code does it for you here:

detr/datasets/transforms.py, lines 255 to 257 in 1fcfc65:

    boxes = box_xyxy_to_cxcywh(boxes)
    boxes = boxes / torch.tensor([w, h, w, h], dtype=torch.float32)
    target["boxes"] = boxes

but you may be skipping this part for some reason?
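
A quick way to run that check is a plain-Python sketch like the one below. The helper name is hypothetical, and the boxes are assumed to be available as lists of four floats.

```python
def boxes_look_normalized(boxes, eps=1e-6):
    """True if every coordinate of every box lies in [0, 1] (within eps)."""
    return all(-eps <= v <= 1 + eps for box in boxes for v in box)


normalized = [[0.5, 0.5, 0.2, 0.3]]            # cxcywh relative to image size
pixel_space = [[320.0, 240.0, 100.0, 80.0]]    # same kind of box, in pixels

print(boxes_look_normalized(normalized))
print(boxes_look_normalized(pixel_space))
```

Targets left in pixel units would produce box losses in the hundreds or thousands, which would be consistent with the numbers reported.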

from detr.

lessw2020 commented on June 16, 2024

Hi @martinj3456 - thanks for the feedback! I was planning to finish it this weekend but I was asked to do an org wide demo of DETR on Monday so I'm working on that first.
I should be able to have a working setup posted later this week.
It looks like num_args support won't be added to the codebase so I'll just show how to modify the current code as needed for that and use my current Coco class to show how to setup a custom class, and that should enable anyone to start training their custom dataset. (will then work on improving the process and add image visualizations etc).

from detr.

tanulsingh commented on June 16, 2024

Hello everyone, @alcinos, I want to use DETR as an nn.Module and build a custom model with it, just as we do with Hugging Face transformers, where we can add things like an fc layer on top of BERT, RoBERTa, etc. I want to do something like:

    import torch
    from torch import nn

    class CustomDETR(nn.Module):
        def __init__(self):
            super(CustomDETR, self).__init__()
            self.model = torch.hub.load('facebookresearch/detr', 'detr_resnet50', pretrained=True)

Here I want to be able to change the queries and everything else, and when I override the forward function I want to be able to use the DETR model.

The forward function of DETR only gives the labels and not the losses. How do I get the losses, backpropagate, and fine-tune the network? I really tried, but I don't know what to do.

from detr.

Dicko87 commented on June 16, 2024

That’s great, is it possible to use this to fine tune the DETR using our custom dataset, rather than training from scratch? I only have about 2K images

from detr.

tanulsingh commented on June 16, 2024

@fmassa after cloning, I am now trying to import detr.models and it gives an error, although it was working perfectly before the latest commit. It says that util is not found:

ModuleNotFoundError                       Traceback (most recent call last)
in
     21
     22 #DETR
---> 23 import detr.models
     24 #from detr.models import HungarianMatcher
     25 #from detr.models.detr import SetCriterion

/kaggle/working/detr/models/__init__.py in
      1 # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
----> 2 from .detr import build
      3
      4
      5 def build_model(args):

/kaggle/working/detr/models/detr.py in
      7 from torch import nn
      8
----> 9 from util import box_ops
     10 from util.misc import (NestedTensor, nested_tensor_from_tensor_list,
     11                        accuracy, get_world_size, interpolate,

ModuleNotFoundError: No module named 'util'
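
One possible workaround, offered as a guess rather than a confirmed fix: the traceback shows detr/models/detr.py doing `from util import box_ops`, which only resolves when the cloned repository root itself is on the import path. The sketch below adds it; the helper name is hypothetical and the path is the one from the traceback.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_dir):
    """Put the repository root on sys.path so `from util import ...` can resolve."""
    repo = str(Path(repo_dir))
    if repo not in sys.path:
        sys.path.insert(0, repo)
    return repo


repo_root = add_repo_to_path("/kaggle/working/detr")
# After this, importing from inside the repo (e.g. `import models`) may work.
```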

from detr.

fmassa commented on June 16, 2024

@tanulsingh can you please open a new issue with the error you are having, so that we keep this issue only for the original topic of custom DETR training and tips?

from detr.

Dicko87 commented on June 16, 2024

Hi there @lessw2020, I have been following your lovely notebook and am wondering how to execute training now that I have set up my custom dataset. I can see from @mlk1337 that on the command line I can run:

    python main.py --dataset_file your_dataset --coco_path data --epochs 50 --lr=1e-4 --batch_size=2 --num_workers=4 --output_dir="outputs" --resume="detr-r50_no-class-head.pth"

However, I am doing this through the notebook and am not sure how to go about it, or how to run what was done in the notebook from the command line so that the command above will work. Thank you very much 😃
I ran the script but got some missing keys... here's my output.
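
One way to launch that command from a notebook cell is via subprocess, sketched below. The flags are the ones from the command above; the helper name is hypothetical, and the actual call is left commented out because it assumes the working directory is the (modified) detr checkout.

```python
import subprocess
import sys


def build_train_command(dataset_file, coco_path, resume, epochs=50, lr=1e-4,
                        batch_size=2, num_workers=4, output_dir="outputs"):
    """Assemble the main.py invocation as an argument list."""
    return [
        sys.executable, "main.py",
        "--dataset_file", dataset_file,
        "--coco_path", coco_path,
        "--epochs", str(epochs),
        "--lr", str(lr),
        "--batch_size", str(batch_size),
        "--num_workers", str(num_workers),
        "--output_dir", output_dir,
        "--resume", resume,
    ]


cmd = build_train_command("your_dataset", "data", "detr-r50_no-class-head.pth")
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # run from inside the cloned detr directory
```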

from detr.

Dicko87 commented on June 16, 2024

I managed to fix the above by reading https://gist.github.com/mlk1337/651297e28199b4bb7907fc413c49f58f#gistcomment-3331643
The model now seems to be training away :) looking forward to seeing the results. :)
just started it and it looks like this...

from detr.

tanulsingh commented on June 16, 2024

@fmassa I managed to solve the issue and was able to write a complete pipeline that can be used to train or fine-tune a DETR model on any dataset with some changes. I have tried to make it as generic and reproducible as I can:
https://www.kaggle.com/tanulsingh077/end-to-end-object-detection-with-transformers-detr
I plan to upload it to GitHub too. I would really appreciate your suggestions and views on this; it would be highly motivating for me to keep using DETR @alcinos

from detr.

lessw2020 commented on June 16, 2024

Hi @Dicko87 - great, glad to see you got your training going! I'll integrate the training aspect here shortly into the colab and of course include the strict=False requirement.
The one question I was debating was where to run the training -
a) in the notebook directly (i.e. a live training loop with code exposed) vs
b) running the modified main.py as a shell process, so that's why I hadn't added that yet.
I would like to look at using tensorboard for visuals basically.
I'll just setup with main.py first since that's faster though now that I think about it.

Note if anyone has a good public dataset for sample training that would be great if you can post a link (saw someone had a mask detector for example). The dataset that was posted earlier for stairs has some issues with how the train/test is setup (they had pre-augmented so you can get nearly identical images in train and test..) so hoping to use a different example to train on and demo with in the colab.

from detr.

Dicko87 commented on June 16, 2024

Hi @lessw2020 , thanks very much for the reply :) Brilliant, I'll take another look at your notebook once you've added to it, will keep checking.
Just something I've noticed while training is going on, on every epoch the class_error goes between 0.00, 25.00, 33.00, 50.00
these numbers seem a little ... emmm... high, just wondering if this is normal (I was expecting something like 0.6, 0.3 etc etc) Running for 50 epochs on a training set of 839 images and validation set of 160 images. Currently on epoch 14.
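
A possible explanation for those particular values, assuming class_error is reported as a percentage of misclassified objects within a single small batch: with only a handful of objects per batch, one mistake already moves the number by a large step.

```python
def class_error(num_wrong, num_objects):
    """Percentage of objects in the batch whose predicted class is wrong."""
    return 100.0 * num_wrong / num_objects


# Hypothetical batches: (wrong predictions, total objects in the batch)
for wrong, total in [(0, 4), (1, 4), (1, 3), (1, 2)]:
    print(f"{wrong}/{total} wrong -> class_error {class_error(wrong, total):.2f}")
```

Under that assumption, values such as 25 or 50 would reflect one wrong object out of four or two, not a consistently high error rate.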

Oooo, one more thing ... was just thinking, what happens if for some reason the training got interrupted and it disconnected after epoch 14 say and I want to resume from where it left off, back from epoch 14 rather than starting from the beginning .... does anyone know how to do this ... if it's even possible !
Thanks :)

from detr.

Dicko87 commented on June 16, 2024

@lessw2020 hiya, where do I find the latest check-in haha, wondering if I’ve been looking in the wrong place. I was looking at the training ipynb in your repo ... detr

from detr.

lessw2020 commented on June 16, 2024

Hi @Dicko87 - it's actually not in the codebase proper (might be good to put it in a notebooks folder?). What I am referring to is that yesterday they added a link on the readme to a new colab that shows how to run predictions and do heatmaps. You can find it in the notebooks section on the github homepage for detr, or here's the direct link (jumps you into colab):
https://colab.research.google.com/github/facebookresearch/detr/blob/colab/notebooks/detr_attention.ipynb

from detr.

Dicko87 commented on June 16, 2024

Ah right I see, well thank you very much for the link - much appreciated.
I was thinking .. is there a way to 'Save the Best Epoch' for example say we got an mAP of 0.5 at epoch 10 but at epoch 20 we got mAP 0.2 - would be best to use the model on epoch 10.
Very intrigued to see the heatmaps, will have a gander :)
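
To make the 'Save the Best Epoch' idea concrete, here is a minimal sketch of the bookkeeping, independent of DETR's own training script. The class name is hypothetical, the mAP values are the ones from the example above, and the save call is only indicated in a comment.

```python
class BestCheckpointTracker:
    """Remember which epoch produced the highest validation score so far."""

    def __init__(self):
        self.best_score = float("-inf")
        self.best_epoch = None

    def update(self, epoch, score):
        """Return True when this epoch beats every earlier one (i.e. save now)."""
        if score > self.best_score:
            self.best_score, self.best_epoch = score, epoch
            return True
        return False


tracker = BestCheckpointTracker()
saved_at = []
for epoch, val_map in [(10, 0.5), (20, 0.2)]:
    if tracker.update(epoch, val_map):
        saved_at.append(epoch)  # a real loop would save the model weights here

print(tracker.best_epoch, tracker.best_score)
```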

Also, has anybody tried it with Resnet101 ? just wondering if the results are much better, might try that sometime this week.

Just thought of something else haha... augmentations: does DETR automatically apply data augmentations to my dataset to help it generalize? All I did was keep my data in the same format as coco so it reads it as coco; I'm not sure whether the default pipeline applies some augmentations on the fly.

from detr.

eslambakr commented on June 16, 2024

Hello All,

I am using DETR on custom data, which contains 2k images for training. I have followed your suggestion to fine-tune to avoid getting zeros, and I succeeded in achieving comparable accuracy.

But when I tried to train from scratch using the default configuration in main.py, I have gotten zeros for the first 100 epochs so far. Should I wait for more epochs? It seems very strange to me.
What do you think I should do to get good accuracy when training from scratch?

from detr.

lessw2020 commented on June 16, 2024

@Dicko87 - I tested with both 50 and 101 on my dataset. Got a bit better with 101 but the difference wasn't huge. This is likely going to vary a lot based on dataset though so I think you ultimately want to test both.
Note that the dc5 variants will require about 2x more memory and I hit CUDA out of memory on both...so I just tested regular 50 and 101.

from detr.

Dicko87 commented on June 16, 2024

@lessw2020 Cool, thanks for the info :) I am just curious ...
In my training set I have 1678 images and in my validation set I have 320 images. During training, next to the epoch number it shows Epoch:[n] [n/839], and during testing Epoch:[n] [n/160]. Only half of my data is shown, but I have just realised I have the batch size set to 2. I assume all my images are being used; I'm just wondering why there aren't two batches of 839 and two batches of 160.
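
The arithmetic behind those counters, as a small sketch: assuming the progress counter counts batches rather than images, each step consumes batch_size images, so the total shown is the number of images divided by the batch size.

```python
import math


def iterations_per_epoch(num_images, batch_size, drop_last=False):
    """Number of batches needed to see every image once."""
    if drop_last:
        return num_images // batch_size
    return math.ceil(num_images / batch_size)


print(iterations_per_epoch(1678, 2))  # training set
print(iterations_per_epoch(320, 2))   # validation set
```

So 839 steps of 2 images each cover all 1678 training images, and 160 steps cover the 320 validation images.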

from detr.

Dicko87 commented on June 16, 2024

Woooohooo @lessw2020 my training has now completed ... after 5 hours, just wondering how I go about running my model in evaluation mode to run some predictions on a test image ?

from detr.

lessw2020 commented on June 16, 2024

Hi @Dicko87 - congrats, that's great to hear!
For predictions, I can post some code but the basic steps are to load the model with your final weights, be 1,000% sure you put it in eval mode, and then you can just setup a detect and show_results function.
The detect function needs to take the image, resize it, normalize it, convert it to a tensor, and then run it through the model. You can then filter the results by confidence so that only the confident predictions are kept.
Then it's just a matter of resizing/rescaling the bboxes and plotting them onto the original image to 'see' the results.
Hang on let me post the link to their colab that has some good starting code
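
In the meantime, the filtering and rescaling steps described above can be sketched in plain Python. This assumes the model outputs boxes as normalized [cx, cy, w, h] plus one confidence score per box. The function names, the 0.7 threshold, and the sample numbers are illustrative only.

```python
def cxcywh_to_xyxy(box):
    """[center_x, center_y, width, height] -> [x0, y0, x1, y1]."""
    cx, cy, w, h = box
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]


def to_pixels(box, img_w, img_h):
    """Scale a normalized corner-format box back to the original image size."""
    x0, y0, x1, y1 = box
    return [x0 * img_w, y0 * img_h, x1 * img_w, y1 * img_h]


def confident_boxes(scores, boxes, img_w, img_h, threshold=0.7):
    """Keep predictions above the threshold and return them in pixel coordinates."""
    results = []
    for score, box in zip(scores, boxes):
        if score >= threshold:
            results.append((score, to_pixels(cxcywh_to_xyxy(box), img_w, img_h)))
    return results


scores = [0.92, 0.10]
boxes = [[0.5, 0.5, 0.2, 0.4], [0.1, 0.1, 0.05, 0.05]]
kept = confident_boxes(scores, boxes, img_w=640, img_h=480)
print(kept)
```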

from detr.

lessw2020 commented on June 16, 2024

Here's a jump to their colab with their prediction process:
https://colab.research.google.com/github/facebookresearch/detr/blob/colab/notebooks/detr_demo.ipynb#scrollTo=-CVJRl28-_wS

Note that there's some gotchas if you want to use the gpu for predictions (much faster but you have to move things back and forth) and I'll add that to a prediction colab soon.

from detr.

Dicko87 commented on June 16, 2024

Awww @lessw2020 that will be great, thank you.
I was playing about, trying torch.load('latest.pth') and then loading the state dictionary, but got an error: AttributeError: 'dict' object has no attribute 'load_state_dict'
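
A likely cause, sketched without torch: the loaded checkpoint is a plain dict, so load_state_dict has to be called on a model that was built first, passing it the weights stored inside the checkpoint. TinyModel is a stand-in for the real network, and the "epoch" entry is an assumed part of the layout; the "model" key follows the snippet at the top of this thread.

```python
class TinyModel:
    """Stand-in for a torch.nn.Module; it only mimics load_state_dict."""

    def __init__(self):
        self.weights = {}

    def load_state_dict(self, state_dict):
        self.weights = dict(state_dict)


# Stand-in for what loading 'latest.pth' returns: a plain dict.
checkpoint = {"model": {"class_embed.bias": [0.0, 0.0]}, "epoch": 49}

try:
    checkpoint.load_state_dict(checkpoint)  # the mistake: dicts have no such method
except AttributeError as err:
    message = str(err)

model = TinyModel()                          # build the model first...
model.load_state_dict(checkpoint["model"])   # ...then load the weights into it
print(message)
```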

from detr.

Dicko87 commented on June 16, 2024

Amazing, I’ll take a look! 😃😃😃😃

from detr.

Dicko87 commented on June 16, 2024

Well, I managed to load an image and tried to run predictions, but the predicted bounding boxes aren't around any object at all, and the class names aren't visible either. I was wondering if it's a problem with how my data is being read into the model in the first place. Is there a way to check and visualize the input data to make sure the ground truth bounding boxes are being read in properly? Or maybe I've messed something up while trying to make predictions :s

I even ran a prediction on one of the training images and ... yeah it did rubbish haha, so something isn't right.

from detr.

lessw2020 commented on June 16, 2024

Hi @Dicko87 - might be easiest to start with the training image that didn't work and look at the underlying annotation vs the predictions and start working backwards. Issues could be scaling is off (i.e. if image was resized with padding but not accounted for).
For training data review, you can call your dataset's __getitem__ (i.e. index into it) to grab an image and its annotation, then plot those to make sure your training data looks good.
I'm currently testing out Detectron2 since that support was added yesterday and that has a nice visualization tool there as well, though a lot more overhead.
Also, you asked about augmentation - the default DETR coco.py and the custom_dataset.py I posted already have standard augmentations such as hflip and rescaling, and that is likely enough to start with.
But if you ran with no augmentation then that could also produce poor results though it sounds like right now there's more of a fundamental issue related to either annotation interpretations/scaling or similar esp if your class names aren't showing.
Might be worth posting your work as a colab if it's not work confidential and then we can try to help resolve (we = me and/or others here).
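
As a first step in working backwards, a sanity check on the annotation file itself can rule out the simplest problems. The sketch below assumes a standard COCO-style dict (images with id/width/height, annotations with id/image_id/bbox in [x, y, w, h]); the helper name and sample data are made up.

```python
def find_out_of_bounds(coco):
    """Return ids of annotations whose box is empty or leaves its image."""
    sizes = {img["id"]: (img["width"], img["height"]) for img in coco["images"]}
    bad = []
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]
        img_w, img_h = sizes[ann["image_id"]]
        if w <= 0 or h <= 0 or x < 0 or y < 0 or x + w > img_w or y + h > img_h:
            bad.append(ann["id"])
    return bad


# Tiny made-up example: the second box runs past the right and bottom edges.
coco = {
    "images": [{"id": 1, "width": 100, "height": 80}],
    "annotations": [
        {"id": 10, "image_id": 1, "bbox": [10, 10, 20, 20]},
        {"id": 11, "image_id": 1, "bbox": [90, 70, 20, 20]},
    ],
}
print(find_out_of_bounds(coco))
```

With a real file, the dict would come from json.load on the annotations JSON.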

from detr.

lessw2020 commented on June 16, 2024

hi @eslambakr - glad to hear the fine tuning went well!
For 'scratch' training, it was proposed that 10k images might be the size needed so not sure if 2k will work that well.
That said, it likely can be done.
Transformers take a long time to train so probably need to think about getting to 200 or 300 epochs to make a better determination.
Hope that helps!

from detr.

Dicko87 commented on June 16, 2024

Hi there, I have started again to try to resolve some of the issues I am having with the predicted bounding boxes being way off. I know something isn't right. Previously I didn't create a custom data class; I kept everything the same and created the following directories:

    datasets/coco/
        annotations/
            instances_train2017.json   (my training annotations, renamed)
            instances_val2017.json     (my validation annotations, renamed)
        train2017/                     (my training images)
        val2017/                       (my validation images)

I have been trying to find a way to view the data going into the model so I can see whether the ground truth bounding boxes are in the right location. I know I need to use __getitem__ and have been trying, but can't figure it out just yet.

This time round I have also tried using the custom_dataset.py file by @lessw2020, but I got the following error:

This was resolved by changing the following line in the __init__.py file in the datasets folder:

Any ideas on how to inspect my input data would be great. I will keep plugging away at it in the meantime, thank you :)

https://github.com/lessw2020/training-detr/blob/master/custom_dataset.py

from detr.

Dicko87 commented on June 16, 2024

Just an update: I managed to have a peek at the training data (I think), and the ground truth bounding boxes look to be in the correct place... not sure what's going on :s I do have another side question haha: I keep hearing about setting the number of queries. What is it, and how do you set it? I've just left it as default and I have 2 classes.
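
On the queries question: going by @alcinos's advice near the top of this thread, --num_queries should be strictly higher than the largest number of objects in any one image, with some slack, so it relates to objects per image rather than to the number of classes. A sketch for finding that maximum from COCO-style annotations follows; the helper name and sample data are hypothetical.

```python
from collections import Counter


def max_objects_per_image(annotations):
    """Largest number of annotated objects found in any single image."""
    counts = Counter(ann["image_id"] for ann in annotations)
    return max(counts.values(), default=0)


# Made-up annotations: image 1 has two objects, image 2 has one.
annotations = [{"image_id": 1}, {"image_id": 1}, {"image_id": 2}]
busiest = max_objects_per_image(annotations)
print(busiest)
```

If the result is far below the default of 100, the default already leaves plenty of slack.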

from detr.

Dicko87 commented on June 16, 2024

Hi @lessw2020 is there a way to private message you on here?

from detr.

lessw2020 commented on June 16, 2024

Hi @Dicko87 - would be great if github had a PM system (sorely needed imo) but none that I am aware of.
Are you on the pytorch forums or fastai deep learning forums? Both have a PM setup, and my id is same there on both as here...so that would work - https://discuss.pytorch.org or https://forums.fast.ai

from detr.

Dicko87 commented on June 16, 2024

@lessw2020 cool, I’ll head over to pytorch and pm you there :D ... hehe I’ve found you on pytorch.. just trying to find the pm bit haha .... found it !

from detr.

Dicko87 commented on June 16, 2024

Oh @lessw2020 it says I can’t send you a message!

from detr.

lessw2020 commented on June 16, 2024

@Dicko87 - oh no, that must be some anti-spam issue (i.e. a new account can't send PM's till X days later). Try forums.fast.ai and hopefully it's less travelled so may not have that.

from detr.
