woctezuma / finetune-detr

135.0 4.0 23.0 81.38 MB

Fine-tune Facebook's DETR (DEtection TRansformer) on Colaboratory.

License: MIT License

Jupyter Notebook 100.00%

facebook detr finetune finetuning finetunes instance-segmentation instance instances segementation segment

finetune-detr's Introduction

Finetune DETR

The goal of this Google Colab notebook is to fine-tune Facebook's DETR (DEtection TRansformer).

From left to right: results obtained with pre-trained DETR, and after fine-tuning on the balloon dataset.

Usage

- Acquire a dataset, e.g. the balloon dataset,
- Convert the dataset to the COCO format,
- Run finetune_detr.ipynb to fine-tune DETR on this dataset.

Alternatively, run finetune_detectron2.ipynb to rely on the detectron2 wrapper.
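To illustrate the conversion step, the COCO detection format boils down to three lists: images, annotations and categories. The sketch below builds such a dictionary from plain Python records. `build_coco_dict` and its record layout are hypothetical (this is not the VIA2COCO API), and `first_class_index` only mimics the option of the same name used in the notebook. Boxes follow the COCO convention `[x_min, y_min, width, height]` in pixels; DETR converts them to normalized `(center_x, center_y, w, h)` in its own data transforms.

```python
import json


def build_coco_dict(image_records, category_names, first_class_index=0):
    """Build a minimal COCO object-detection dictionary (hypothetical helper).

    image_records: list of (file_name, width, height, boxes), where boxes is a
    list of (category_name, x_min, y_min, box_width, box_height) in pixels.
    first_class_index: id given to the first category (typically 0 or 1).
    """
    categories = [
        {"id": first_class_index + i, "name": name, "supercategory": "N/A"}
        for i, name in enumerate(category_names)
    ]
    name_to_id = {c["name"]: c["id"] for c in categories}
    images, annotations = [], []
    for image_id, (file_name, width, height, boxes) in enumerate(image_records, start=1):
        images.append({"id": image_id, "file_name": file_name,
                       "width": width, "height": height})
        for name, x, y, w, h in boxes:
            annotations.append({
                # Ids start at 1: pycocotools' evaluation uses 0 to mean "unmatched".
                "id": len(annotations) + 1,
                "image_id": image_id,
                "category_id": name_to_id[name],
                "bbox": [x, y, w, h],  # COCO convention: top-left corner + size, in pixels
                "area": w * h,         # box area, used here in place of a mask area
                "iscrowd": 0,
            })
    return {"images": images, "annotations": annotations, "categories": categories}


coco_dict = build_coco_dict(
    [("example.jpg", 1024, 768, [("balloon", 100.0, 150.0, 200.0, 300.0)])],
    category_names=["balloon"],
)
# This string could then be written to annotations/custom_train.json.
serialized = json.dumps(coco_dict, indent=2)
```

Polygon masks (the "segmentation" field) are left out, since they are only needed for segmentation, not for box detection.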

NB: Fine-tuning is recommended if your dataset has less than 10k images. Otherwise, training from scratch would be an option.

Data

DETR will be fine-tuned on a tiny dataset: the balloon dataset. We refer to it as the custom dataset.

There are 61 images in the training set, and 13 images in the validation set.

We expect the directory structure to be the following:

path/to/coco/
├ annotations/  # JSON annotations
│  ├ annotations/custom_train.json
│  └ annotations/custom_val.json
├ train2017/    # training images
└ val2017/      # validation images
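A quick way to catch path mistakes before launching a training run is to check that the expected files and folders exist. The helpers below are hypothetical and only encode the layout shown above (custom_train.json and custom_val.json, train2017 and val2017); adjust the names if your copy of DETR expects different ones.

```python
from pathlib import Path


def expected_coco_layout(coco_path):
    """Paths implied by the directory structure above (hypothetical helper)."""
    root = Path(coco_path)
    return {
        "train_annotations": root / "annotations" / "custom_train.json",
        "val_annotations": root / "annotations" / "custom_val.json",
        "train_images": root / "train2017",
        "val_images": root / "val2017",
    }


def missing_paths(coco_path):
    """Names of the expected files or folders that do not exist on disk."""
    return [name for name, path in expected_coco_layout(coco_path).items()
            if not path.exists()]


missing = missing_paths("path/to/coco")
if missing:
    print("Missing from the dataset folder:", missing)
```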

NB: if you are confused about the number of classes, check this GitHub issue.
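As far as I know, a comment in DETR's models/detr.py explains that `num_classes` is somewhat misnamed: it corresponds to the largest category id in the dataset plus one, and the classifier then has `num_classes + 1` outputs, the extra one being "no-object". A minimal sketch of that rule, where `detr_num_classes` is a hypothetical helper operating on a loaded COCO annotation dictionary:

```python
def detr_num_classes(coco_dict):
    """Value for --num_classes: the largest category id plus one.

    DETR then adds one more output for the "no-object" class internally.
    With a file: detr_num_classes(json.load(open(path_to_annotation_json)))
    """
    return max(category["id"] for category in coco_dict["categories"]) + 1


# A balloon-only dataset whose single category id is 0 or 1.
print(detr_num_classes({"categories": [{"id": 0, "name": "balloon"}]}))  # 1
print(detr_num_classes({"categories": [{"id": 1, "name": "balloon"}]}))  # 2
```

When category ids start at 1, class index 0 is simply never used as a target.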

Metrics

Typical metrics to monitor, partially shown in this notebook, include:

- the Average Precision (AP), which is the primary challenge metric for the COCO dataset,
- losses (total loss, classification loss, L1 bbox distance loss, GIoU loss),
- errors (cardinality error, class error).

As mentioned in the paper, there are 3 components to the matching cost and to the total loss:

classification loss,

def loss_labels(self, outputs, targets, indices, num_boxes, log=True):
    """Classification loss (NLL)
    targets dicts must contain the key "labels" containing a tensor of dim [nb_target_boxes]
    """
    [...]
    loss_ce = F.cross_entropy(src_logits.transpose(1, 2), target_classes, self.empty_weight)
    losses = {'loss_ce': loss_ce}

L1 bounding box distance loss,

def loss_boxes(self, outputs, targets, indices, num_boxes):
    """Compute the losses related to the bounding boxes, the L1 regression loss and the GIoU loss
       targets dicts must contain the key "boxes" containing a tensor of dim [nb_target_boxes, 4]
       The target boxes are expected in format (center_x, center_y, w, h), normalized by the image
       size.
    """
    [...]
    loss_bbox = F.l1_loss(src_boxes, target_boxes, reduction='none')
    losses['loss_bbox'] = loss_bbox.sum() / num_boxes

Generalized Intersection over Union (GIoU) loss, which is scale-invariant.

    loss_giou = 1 - torch.diag(box_ops.generalized_box_iou(
        box_ops.box_cxcywh_to_xyxy(src_boxes),
        box_ops.box_cxcywh_to_xyxy(target_boxes)))
    losses['loss_giou'] = loss_giou.sum() / num_boxes
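The three loss terms above can be reproduced on toy numbers with the pure-Python sketch below. It is not DETR's implementation: it uses plain lists instead of tensors, assumes the Hungarian matching has already been done (only matched queries get box losses), and leaves out the auxiliary decoder losses. The class weights mimic DETR's `empty_weight` (1 for real classes, `eos_coef`, 0.1 by default, for "no-object"), and the final weighting uses what I understand to be DETR's default coefficients (`--bbox_loss_coef 5`, `--giou_loss_coef 2`, weight 1 for the classification loss).

```python
import math


def weighted_cross_entropy(logits, targets, class_weights):
    """Classification loss: per-query cross-entropy weighted by the weight of
    the target class and normalized by the sum of those weights, which is
    what F.cross_entropy does when given a `weight` tensor (mean reduction)."""
    total, weight_sum = 0.0, 0.0
    for row, target in zip(logits, targets):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        w = class_weights[target]
        total += w * (log_sum_exp - row[target])
        weight_sum += w
    return total / weight_sum


def l1_box_loss(src_boxes, target_boxes, num_boxes):
    """Sum of absolute coordinate differences over matched pairs / num_boxes."""
    return sum(abs(s - t) for src, tgt in zip(src_boxes, target_boxes)
               for s, t in zip(src, tgt)) / num_boxes


def cxcywh_to_xyxy(box):
    cx, cy, w, h = box
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]


def generalized_iou(a, b):
    """GIoU of two boxes given as (x0, y0, x1, y1)."""
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    inter = (max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
             * max(0.0, min(a[3], b[3]) - max(a[1], b[1])))
    union = area_a + area_b - inter
    # Area of the smallest axis-aligned box enclosing both inputs.
    hull = ((max(a[2], b[2]) - min(a[0], b[0]))
            * (max(a[3], b[3]) - min(a[1], b[1])))
    return inter / union - (hull - union) / hull


def giou_loss(src_boxes, target_boxes, num_boxes):
    """Sum of (1 - GIoU) over matched pairs of (cx, cy, w, h) boxes / num_boxes."""
    return sum(1 - generalized_iou(cxcywh_to_xyxy(s), cxcywh_to_xyxy(t))
               for s, t in zip(src_boxes, target_boxes)) / num_boxes


# Toy setup: one real class ("balloon", index 0) plus "no-object" (index 1).
num_classes, eos_coef = 1, 0.1
empty_weight = [1.0] * num_classes + [eos_coef]  # down-weight "no-object"

logits = [[2.0, 0.0], [0.0, 3.0], [0.5, 0.5]]  # 3 queries
target_classes = [0, 1, 1]  # query 0 is matched to a balloon; 1 and 2 are unmatched
src_boxes = [[0.50, 0.50, 0.20, 0.20]]     # prediction of the matched query
target_boxes = [[0.60, 0.50, 0.20, 0.20]]  # its ground-truth box

loss_ce = weighted_cross_entropy(logits, target_classes, empty_weight)
loss_bbox = l1_box_loss(src_boxes, target_boxes, num_boxes=1)
loss_giou = giou_loss(src_boxes, target_boxes, num_boxes=1)
total_loss = 1 * loss_ce + 5 * loss_bbox + 2 * loss_giou
print(loss_ce, loss_bbox, loss_giou, total_loss)
```

Scaling both boxes by the same factor leaves `giou_loss` unchanged, which is the scale invariance mentioned above, whereas the L1 term grows with the scale of the boxes.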

Moreover, there are two errors:

cardinality error,

def loss_cardinality(self, outputs, targets, indices, num_boxes):
    """ Compute the cardinality error, ie the absolute error in the number of predicted non-empty
    boxes. This is not really a loss, it is intended for logging purposes only. It doesn't
    propagate gradients
    """
    [...]
    # Count the number of predictions that are NOT "no-object" (which is the last class)
    card_pred = (pred_logits.argmax(-1) != pred_logits.shape[-1] - 1).sum(1)
    card_err = F.l1_loss(card_pred.float(), tgt_lengths.float())
    losses = {'cardinality_error': card_err}

class error,

    # TODO this should probably be a separate loss, not hacked in this one here
    losses['class_error'] = 100 - accuracy(src_logits[idx], target_classes_o)[0]

where accuracy is:

def accuracy(output, target, topk=(1,)):
    """Computes the precision@k for the specified values of k"""

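Both logged errors can be sketched in the same pure-Python style. `cardinality_error` and `accuracy_percent` are hypothetical stand-ins for `loss_cardinality` and `accuracy`, working on lists rather than tensors. As in the excerpts above, the last logit of each query is the "no-object" class, and `class_error` is 100 minus the top-1 accuracy computed on matched queries only.

```python
def argmax(values):
    """Index of the largest value (first one in case of ties)."""
    return max(range(len(values)), key=values.__getitem__)


def cardinality_error(pred_logits, target_lengths):
    """Mean over images of |#queries not predicted as "no-object" - #targets|.
    The "no-object" class is the last logit of each query."""
    errors = []
    for queries, n_targets in zip(pred_logits, target_lengths):
        no_object = len(queries[0]) - 1
        n_pred = sum(1 for q in queries if argmax(q) != no_object)
        errors.append(abs(n_pred - n_targets))
    return sum(errors) / len(errors)


def accuracy_percent(output, target, topk=(1,)):
    """Precision@k in percent for each k: share of samples whose true class
    is among the k highest scores."""
    if not target:
        return [0.0 for _ in topk]  # no matched queries: nothing to score
    results = []
    for k in topk:
        correct = 0
        for scores, label in zip(output, target):
            ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
            correct += label in ranked[:k]
        results.append(100.0 * correct / len(target))
    return results


# Batch of 2 images, 3 queries each, classes = ["balloon", "no-object"].
pred_logits = [
    [[2.0, 0.0], [0.0, 1.0], [1.0, 0.0]],  # 2 queries predict "balloon"
    [[0.0, 3.0], [0.0, 2.0], [0.0, 1.0]],  # every query predicts "no-object"
]
print(cardinality_error(pred_logits, target_lengths=[1, 0]))  # 0.5

# class_error is computed on the matched queries only (here: 3 balloons).
matched_logits = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]]
matched_targets = [0, 0, 0]
class_error = 100 - accuracy_percent(matched_logits, matched_targets)[0]
print(class_error)
```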
Results

You should obtain acceptable results with 10 epochs, which require a few minutes of fine-tuning.

Out of curiosity, I have over-finetuned the model for 300 epochs (close to 3 hours). Here are:

- the last checkpoint (~500 MB),
- the log file.

All of the validation results are shown in view_balloon_validation.ipynb.

References

Official repositories:
- Facebook's DETR (and the paper)
- Facebook's detectron2 wrapper for DETR; caveat: this wrapper only supports box detection
- DETR checkpoints: remove the classification head, then fine-tune
My forks:
- My fork of DETR to fine-tune on a dataset with a single class
- My fork of VIA2COCO to convert annotations from VIA format to COCO format
Official notebooks:
- An official notebook showcasing DETR
- An official notebook showcasing the COCO API
- An official notebook showcasing the detectron2 wrapper for DETR
Tutorials:
- A GitHub issue discussing the fine-tuning of DETR
- A GitHub Gist explaining how to fine-tune DETR
- A GitHub issue explaining how to load a fine-tuned DETR
Datasets:
- A blog post about another approach (Mask R-CNN) and the balloon dataset
- A notebook about the nucleus dataset
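The "DETR checkpoints" entry in the references above (remove the classification head, then fine-tune) amounts to dropping the `class_embed` weights, whose shape depends on the number of classes, before loading the rest of the model. A minimal sketch, where `strip_class_head` is a hypothetical helper and the checkpoint file names in the commented PyTorch lines are placeholders:

```python
def strip_class_head(state_dict, prefix="class_embed."):
    """Return a copy of a DETR state dict without the classification head.

    The head's output size depends on num_classes, so its weights cannot be
    loaded into a model built for a different number of classes.
    """
    return {key: value for key, value in state_dict.items()
            if not key.startswith(prefix)}


# Toy state dict with string placeholders instead of tensors.
dummy = {"class_embed.weight": "W", "class_embed.bias": "b",
         "query_embed.weight": "Q"}
print(sorted(strip_class_head(dummy)))  # ['query_embed.weight']

# With PyTorch, the same helper could be applied to a real checkpoint
# (file names are placeholders):
# checkpoint = torch.load("detr-r50.pth", map_location="cpu")
# checkpoint["model"] = strip_class_head(checkpoint["model"])
# torch.save(checkpoint, "detr-r50_no-class-head.pth")
```

The stripped checkpoint can then be loaded with `load_state_dict(..., strict=False)`, so that the missing head is freshly initialized instead of raising an error.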

finetune-detr's Issues

Quadrilateral Boxes?

How can we fine-tune DETR on a dataset that contains quadrilateral (8-point) bounding boxes?

First I tried with batch size 2. Somewhere near the end of the 1st epoch, the process was killed due to memory usage.
Then I changed to batch size 1 and 1 epoch, and I was able to finish the training. But I observed that memory usage kept increasing during the process: around 25 GB of RAM was being used at the end.

Is this normal behavior for DETR, or is this an issue?

Updated parts of the network during fine-tuning?

Hey there!

Thank you so much for sharing quality tutorials and code.

I'd like to know exactly which parts are updated during the fine-tuning stage.
Is the classification head (class_embed) the only part that is updated, while the backbone and the rest of the network are not?

I wonder whether all of the revised code is in your gist (https://gist.github.com/woctezuma/e9f8f9fe1737987351582e9441c46b5d), or whether there are other parts that you changed for fine-tuning.

If that's the case, I wonder how you froze the whole network except for the last class_embed fc layer.

Thanks!

Updating num_queries throws an error

I tried to change num_queries to 500, as my images contain approximately 450 objects,
and received the following error:

Traceback (most recent call last):
  File "main.py", line 248, in <module>
    main(args)
  File "main.py", line 178, in main
    model_without_ddp.load_state_dict(checkpoint['model'], strict=False)
  File "/home/rsharma/git/detr/.venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 847, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DETR:
size mismatch for query_embed.weight: copying a param with shape torch.Size([100, 256]) from checkpoint, the shape in current model is torch.Size([500, 256]).

I had followed all the steps and was able to get decent results, but as soon as I change num_queries, I am lost.
Any help is appreciated.

Initial Update

The bot created this issue to inform you that pyup.io has been set up on this repo.
Once you have closed it, the bot will open pull requests for updates as soon as they are available.

Trying to finetune the DeepDrive Berkeley dataset

It consists of 8 classes (9 with the no-object class). (https://bdd-data.berkeley.edu/)
First I converted it to the COCO format and then followed the Colab notebook.
I fine-tuned it for 10 epochs with the standard learning rates, and I get no convergence at all.
Could that be an implementation error on my part, or do you think it is normal? Have you fine-tuned on other datasets with more than 2 classes?
Any suggestions? I find it really bizarre that it doesn't converge, since most of the classes overlap with those of the original model.

ImportError

Hi, I am getting the following import error:

Traceback (most recent call last):
  File "main.py", line 13, in <module>
    import datasets
  File "/content/detr/datasets/__init__.py", line 5, in <module>
    from .coco import build as build_coco
  File "/content/detr/datasets/coco.py", line 14, in <module>
    import datasets.transforms as T
  File "/content/detr/datasets/transforms.py", line 13, in <module>
    from util.misc import interpolate
  File "/content/detr/util/misc.py", line 22, in <module>
    from torchvision.ops import _new_empty_tensor
ImportError: cannot import name '_new_empty_tensor' from 'torchvision.ops' (/usr/local/lib/python3.7/dist-packages/torchvision/ops/__init__.py)

I also structured the data folder as you suggest:
path_detr = '/content/drive/MyDrive/detr_final'
detr_final/
├ annotations/  # JSON annotations
│  ├ annotations/train.json
│  └ annotations/val.json
├ train_img/    # training images
└ val_img/      # validation images

And here is my call to main.py:

!python main.py \
  --dataset_file "detr_final" \
  --coco_path "/content/drive/MyDrive/detr_final" \
  --output_dir "outputs" \
  --resume "detr-r50_no-class-head.pth" \
  --num_classes $num_classes \
  --epochs 10

The only part of your notebook that I skipped is the following, since I already had COCO-format annotations and thought I could skip it.

import convert as via2coco

data_path = '/content/VIA2COCO/'

for keyword in ['train', 'val']:

  input_dir = data_path + 'balloon/' + keyword + '/'
  input_json = input_dir + 'via_region_data.json'
  categories = ['balloon']
  super_categories = ['N/A']
  output_json = input_dir + 'custom_' + keyword + '.json'

  print('Converting {} from VIA format to COCO format'.format(input_json))

  coco_dict = via2coco.convert(
      imgdir=input_dir,
      annpath=input_json,
      categories=categories,
      super_categories=super_categories,
      output_file_name=output_json,
      first_class_index=first_class_index,
  )

Any ideas? Thanks a lot

Fine-tuning vs Training from Scratch

Hi, I'm sorry, I am confused by this statement:

Fine-tuning is recommended if your dataset has less than 10k images. Otherwise, training from scratch would be an option.

I know you're just repeating the advice of the DETR team here, but I hope you can help me clarify my understanding.

Wouldn't the model always benefit from transfer learning, conceptually? That is, wouldn't first training on COCO and then fine-tuning on a custom dataset always be better than training on the custom dataset from scratch, regardless of the size of the custom dataset?

I thought that was the whole point of transfer learning... Or is it because we are using a backbone pretrained on ImageNet (which is huge), so that training on COCO, on the custom dataset, or on both does not make a big difference?

Many thanks in advance for your thoughts

convert data into COCO format

Hey, I have a set of images and I want them in the COCO dataset format. Can you tell me how to do that, i.e. how to create a proper annotation file like the ones in COCO?

specific changes in original detr to fine-tune

Hello @woctezuma
Thanks a lot for your valuable contributions. I am working on Deformable DETR, but there isn't much help with the fine-tuning part. I came across your DETR fine-tuning notebook, and it's amazing.

I wanted to ask what changes you made in your DETR fork to accommodate fine-tuning. If I delete the class_embed.weight and class_embed.bias layers in Deformable DETR, I get a size mismatch error.

I was hoping you could briefly describe the changes you made to the original DETR model, so that I can apply them to Deformable DETR.

It would be a huge help. Looking forward to your response!