woctezuma / finetune-detr

Fine-tune Facebook's DETR (DEtection TRansformer) on Colaboratory.

License: MIT License

Jupyter Notebook 100.00%
facebook detr finetune finetuning finetunes instance-segmentation instance instances segementation segment

finetune-detr's Introduction

Finetune DETR

The goal of this Google Colab notebook is to fine-tune Facebook's DETR (DEtection TRansformer).

From left to right: results obtained with pre-trained DETR, and after fine-tuning on the balloon dataset.

Usage

  • Acquire a dataset, e.g. the balloon dataset,
  • Convert the dataset to the COCO format,
  • Run finetune_detr.ipynb to fine-tune DETR on this dataset.
  • Alternatively, run finetune_detectron2.ipynb to rely on the detectron2 wrapper.
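The COCO conversion step can be sketched as follows. This is not the VIA2COCO converter used in the notebook: it assumes a hypothetical, simplified input (one record per image, with pixel boxes given as corners plus a category name) and fills only the fields a detection dataset needs. The `first_class_index` parameter mirrors the notebook variable of the same name, which decides whether category ids start at 0 or 1; `to_coco` itself is a hypothetical helper.

```python
import json


def to_coco(records, categories, first_class_index=0):
    """Build a COCO-style detection dict from simplified per-image records.

    Each record is a dict with keys "file_name", "width", "height" and "boxes",
    where every box is (x_min, y_min, x_max, y_max, category_name) in pixels.
    This input format is hypothetical; adapt it to your annotation tool.
    """
    cat_ids = {name: first_class_index + i for i, name in enumerate(categories)}
    coco = {
        "images": [],
        "annotations": [],
        "categories": [
            {"id": cat_ids[name], "name": name, "supercategory": "N/A"}
            for name in categories
        ],
    }
    # Start ids at 1, a common precaution: pycocotools' evaluation stores
    # matched ids and uses 0 to mean "unmatched".
    ann_id = 1
    for image_id, rec in enumerate(records, start=1):
        coco["images"].append({
            "id": image_id,
            "file_name": rec["file_name"],
            "width": rec["width"],
            "height": rec["height"],
        })
        for x_min, y_min, x_max, y_max, name in rec["boxes"]:
            w, h = x_max - x_min, y_max - y_min
            coco["annotations"].append({
                "id": ann_id,
                "image_id": image_id,
                "category_id": cat_ids[name],
                "bbox": [x_min, y_min, w, h],  # COCO: top-left corner, then width and height
                "area": w * h,  # box area; COCO uses the mask area when masks exist
                "iscrowd": 0,
            })
            ann_id += 1
    return coco


if __name__ == "__main__":
    records = [{"file_name": "0001.jpg", "width": 640, "height": 480,
                "boxes": [(10, 20, 110, 220, "balloon")]}]
    print(json.dumps(to_coco(records, ["balloon"]), indent=2))
```

The resulting dict can be written with `json.dump` to the annotation file expected by the training script.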

NB: Fine-tuning is recommended if your dataset has fewer than 10k images. Otherwise, training from scratch would be an option.

Data

DETR will be fine-tuned on a tiny dataset: the balloon dataset. We refer to it as the custom dataset.

There are 61 images in the training set, and 13 images in the validation set.

We expect the directory structure to be the following:

path/to/coco/
├ annotations/  # JSON annotations
│  ├ annotations/custom_train.json
│  └ annotations/custom_val.json
├ train2017/    # training images
└ val2017/      # validation images
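Path mistakes are easy to make, so it can help to check the layout before launching training. The sketch below reads the tree above as `annotations/custom_train.json` and `annotations/custom_val.json` next to `train2017/` and `val2017/`; the function name `check_layout` is hypothetical, and the file names should be adapted to your own dataset.

```python
from pathlib import Path


def check_layout(coco_path):
    """Return a list of problems with a custom-dataset folder (empty if it looks fine).

    Assumed layout (adapt the names to your dataset):
        coco_path/annotations/custom_train.json
        coco_path/annotations/custom_val.json
        coco_path/train2017/
        coco_path/val2017/
    """
    root = Path(coco_path)
    files = [root / "annotations" / f"custom_{split}.json" for split in ("train", "val")]
    dirs = [root / f"{split}2017" for split in ("train", "val")]
    problems = [f"missing file: {p}" for p in files if not p.is_file()]
    problems += [f"missing directory: {p}" for p in dirs if not p.is_dir()]
    return problems
```

For example, `check_layout("path/to/coco")` returns an empty list when everything is in place, and one message per missing item otherwise.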

NB: if you are confused about the number of classes, check this GitHub issue.
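One way to reason about `num_classes`, without relying on the linked issue: DETR's classifier outputs `num_classes + 1` logits per query, and the last one means "no object" (see the comment in `loss_cardinality` below). Every category id in the annotations must therefore be strictly smaller than `num_classes`. The helper below is a hypothetical sketch of that rule.

```python
def suggest_num_classes(coco):
    """Smallest DETR num_classes compatible with a COCO-style dict.

    DETR predicts num_classes + 1 logits per query, the last one meaning
    "no object", so every category id must satisfy 0 <= id < num_classes.
    """
    ids = [cat["id"] for cat in coco["categories"]]
    ids += [ann["category_id"] for ann in coco.get("annotations", [])]
    if min(ids) < 0:
        raise ValueError("category ids must be non-negative")
    return max(ids) + 1
```

With the balloon dataset, a single category with id 0 gives `num_classes = 1`, while the same category stored with id 1 gives `num_classes = 2` (index 0 then simply never appears as a target).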

Metrics

Typical metrics to monitor, partially shown in this notebook, include:

  • the Average Precision (AP), which is the primary challenge metric for the COCO dataset,
  • losses (total loss, classification loss, L1 bounding box distance loss, GIoU loss),
  • errors (cardinality error, class error).

As mentioned in the paper, there are 3 components to the matching cost and to the total loss:

  • classification loss,
def loss_labels(self, outputs, targets, indices, num_boxes, log=True):
    """Classification loss (NLL)
    targets dicts must contain the key "labels" containing a tensor of dim [nb_target_boxes]
    """
    [...]
    loss_ce = F.cross_entropy(src_logits.transpose(1, 2), target_classes, self.empty_weight)
    losses = {'loss_ce': loss_ce}
  • L1 bounding box distance loss,
def loss_boxes(self, outputs, targets, indices, num_boxes):
    """Compute the losses related to the bounding boxes, the L1 regression loss and the GIoU loss
       targets dicts must contain the key "boxes" containing a tensor of dim [nb_target_boxes, 4]
       The target boxes are expected in format (center_x, center_y, w, h), normalized by the image
       size.
    """
    [...]
    loss_bbox = F.l1_loss(src_boxes, target_boxes, reduction='none')
    losses['loss_bbox'] = loss_bbox.sum() / num_boxes
  • Generalized Intersection over Union (GIoU) loss,
    loss_giou = 1 - torch.diag(box_ops.generalized_box_iou(
        box_ops.box_cxcywh_to_xyxy(src_boxes),
        box_ops.box_cxcywh_to_xyxy(target_boxes)))
    losses['loss_giou'] = loss_giou.sum() / num_boxes
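To make the three terms concrete, here is a pure-Python sketch (no PyTorch) that mirrors them for predictions that have already been matched to targets. It is not DETR's code: the helper names are hypothetical, the matching step is assumed to be done, and `weighted_cross_entropy` reproduces the semantics of `F.cross_entropy(..., weight=...)` with mean reduction, i.e. a weighted sum divided by the sum of the weights of the target classes. In DETR's criterion, `empty_weight` is all ones except its last ("no object") entry, which is set to `eos_coef` (0.1 by default), so the many unmatched queries weigh less in the classification loss.

```python
import math


def cxcywh_to_xyxy(box):
    """Convert a (center_x, center_y, w, h) box to (x0, y0, x1, y1) corners."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def generalized_iou(a, b):
    """Generalized IoU of two (x0, y0, x1, y1) boxes: IoU - (hull - union) / hull."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    # Area of the smallest axis-aligned box enclosing both a and b
    hull = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (hull - union) / hull


def box_losses(src_boxes, target_boxes, num_boxes):
    """L1 and GIoU losses over matched (cx, cy, w, h) pairs, normalized by num_boxes."""
    pairs = list(zip(src_boxes, target_boxes))
    loss_bbox = sum(abs(s - t) for src, tgt in pairs for s, t in zip(src, tgt)) / num_boxes
    loss_giou = sum(
        1 - generalized_iou(cxcywh_to_xyxy(src), cxcywh_to_xyxy(tgt)) for src, tgt in pairs
    ) / num_boxes
    return {"loss_bbox": loss_bbox, "loss_giou": loss_giou}


def weighted_cross_entropy(logits, targets, class_weights):
    """Class-weighted mean cross-entropy over queries (one logit list per query)."""
    num = den = 0.0
    for row, y in zip(logits, targets):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        num += class_weights[y] * (log_sum_exp - row[y])
        den += class_weights[y]
    return num / den


if __name__ == "__main__":
    # Hypothetical example: 2 real classes + "no object" (last index), eos_coef = 0.1
    empty_weight = [1.0, 1.0, 0.1]
    logits = [[2.0, 0.5, 0.1], [0.2, 0.1, 3.0]]
    targets = [0, 2]  # the second query is unmatched, so its target is "no object"
    print(weighted_cross_entropy(logits, targets, empty_weight))
    print(box_losses([(0.5, 0.5, 0.2, 0.2)], [(0.55, 0.5, 0.2, 0.2)], num_boxes=1))
```

GIoU is 1 for identical boxes and becomes negative for disjoint ones, so `1 - GIoU` still provides a gradient signal when a predicted box does not overlap its target, which plain IoU would not.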

Moreover, there are two errors:

  • cardinality error,
def loss_cardinality(self, outputs, targets, indices, num_boxes):
    """ Compute the cardinality error, ie the absolute error in the number of predicted non-empty
    boxes. This is not really a loss, it is intended for logging purposes only. It doesn't
    propagate gradients
    """
    [...]
    # Count the number of predictions that are NOT "no-object" (which is the last class)
    card_pred = (pred_logits.argmax(-1) != pred_logits.shape[-1] - 1).sum(1)
    card_err = F.l1_loss(card_pred.float(), tgt_lengths.float())
    losses = {'cardinality_error': card_err}
  • class error,
    # TODO this should probably be a separate loss, not hacked in this one here
    losses['class_error'] = 100 - accuracy(src_logits[idx], target_classes_o)[0]

where accuracy is:

def accuracy(output, target, topk=(1,)):
    """Computes the precision@k for the specified values of k"""

Results

You should obtain acceptable results with 10 epochs, which require a few minutes of fine-tuning.

Out of curiosity, I have over-finetuned the model for 300 epochs (close to 3 hours).

All of the validation results are shown in view_balloon_validation.ipynb.

finetune-detr's People

Contributors

pyup-bot, woctezuma

finetune-detr's Issues

Quadrilateral Boxes?

How can we fine-tune on a dataset that contains quadrilateral (8-point) bounding boxes?

Memory usage: Exit code 137

First, I tried batch size 2. Near the end of the first epoch, the process was killed due to memory usage.
Then I switched to batch size 1 and a single epoch. I was able to finish the training, but I observed that memory usage kept increasing during the process. Around 25 GB of RAM was being used at the end.

Is this normal behavior for DETR, or is this an issue?

Updated parts of the network during fine-tuning?

Hey there!

Thank you so much for sharing quality tutorials and code.

I'd like to know exactly which parts are updated during the fine-tuning stage.
Is the classification head (class_embed) the only part that is updated, while the backbone and the rest of the network are not?

I wonder whether all of the code changes are stated in your gist (https://gist.github.com/woctezuma/e9f8f9fe1737987351582e9441c46b5d), or whether there are other parts that you changed for fine-tuning.

If that's the case, I wonder how you froze the whole network except for the last class_embed fc layer.

Thanks!

Updating num_queries throws an error

I tried to change num_queries to 500, since my images have approximately 450 objects, and I received the following error:

Traceback (most recent call last):
  File "main.py", line 248, in <module>
    main(args)
  File "main.py", line 178, in main
    model_without_ddp.load_state_dict(checkpoint['model'], strict=False)
  File "/home/rsharma/git/detr/.venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 847, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DETR:
size mismatch for query_embed.weight: copying a param with shape torch.Size([100, 256]) from checkpoint, the shape in current model is torch.Size([500, 256]).

I followed all the steps and was able to get decent results, but as soon as I change num_queries, I am lost.
Any help is appreciated.

Initial Update

The bot created this issue to inform you that pyup.io has been set up on this repo.
Once you have closed it, the bot will open pull requests for updates as soon as they are available.

Trying to fine-tune on the Berkeley DeepDrive dataset

It is a dataset with 8 classes (9 with the no-object class). (https://bdd-data.berkeley.edu/)
First, I converted it to the COCO format, and then I followed the Colab notebook.
I fine-tuned it for 10 epochs with the standard learning rates, and it does not converge at all.
Could that be an implementation error on my part, or do you think it is normal? Have you fine-tuned on other datasets with more than 2 classes?
Any suggestions? I find it really bizarre that it doesn't converge, since most of the classes overlap with those of the original model.

ImportError

Hi, I am getting the following import error:

Traceback (most recent call last):
  File "main.py", line 13, in <module>
    import datasets
  File "/content/detr/datasets/__init__.py", line 5, in <module>
    from .coco import build as build_coco
  File "/content/detr/datasets/coco.py", line 14, in <module>
    import datasets.transforms as T
  File "/content/detr/datasets/transforms.py", line 13, in <module>
    from util.misc import interpolate
  File "/content/detr/util/misc.py", line 22, in <module>
    from torchvision.ops import _new_empty_tensor
ImportError: cannot import name '_new_empty_tensor' from 'torchvision.ops' (/usr/local/lib/python3.7/dist-packages/torchvision/ops/__init__.py)

I also structured the data folder as you suggest:
path_detr = '/content/drive/MyDrive/detr_final'
detr_final/
├ annotations/  # JSONs
│  ├ annotations/train.json
│  └ annotations/val.json
├ train_img/    # training images
└ val_img/      # validation images

And here is my call to main.py:

!python main.py \
  --dataset_file "detr_final" \
  --coco_path "/content/drive/MyDrive/detr_final" \
  --output_dir "outputs" \
  --resume "detr-r50_no-class-head.pth" \
  --num_classes $num_classes \
  --epochs 10

The only part of your notebook that I skipped is the following, since I already had COCO-format annotations and thought I could skip it.

import convert as via2coco

data_path = '/content/VIA2COCO/'

for keyword in ['train', 'val']:

  input_dir = data_path + 'balloon/' + keyword + '/'
  input_json = input_dir + 'via_region_data.json'
  categories = ['balloon']
  super_categories = ['N/A']
  output_json = input_dir + 'custom_' + keyword + '.json'

  print('Converting {} from VIA format to COCO format'.format(input_json))

  coco_dict = via2coco.convert(
      imgdir=input_dir,
      annpath=input_json,
      categories=categories,
      super_categories=super_categories,
      output_file_name=output_json,
      first_class_index=first_class_index,
  )

Any ideas? Thanks a lot

Fine-tuning vs Training from Scratch

Hi, I'm sorry, I am confused by this statement:

Fine-tuning is recommended if your dataset has less than 10k images. Otherwise, training from scratch would be an option.

I know you're just repeating the advice of the DETR team here, but I hope you can help me clarify my understanding.

Conceptually, wouldn't the model always benefit from transfer learning? That is, wouldn't first training on COCO and then fine-tuning on a custom dataset always be better than training on the custom dataset from scratch, regardless of the size of the custom dataset?

I thought that was the whole point of transfer learning... Or is it because we are using a backbone pretrained on ImageNet (which is huge), so that training on COCO, on another custom dataset, or on both COCO and the custom dataset does not make a big difference?

Many thanks in advance for your thoughts.

convert data into COCO format

Hey, I have a set of images and I want to convert them to the COCO dataset format. Can you tell me how to do that, i.e. how to create a proper annotation file like the ones in COCO?

Specific changes in the original DETR for fine-tuning

Hello @woctezuma
Thanks a lot for your valuable contributions. I am working on Deformable DETR, but there isn't much help with the fine-tuning part. I came across your DETR fine-tuning notebook, and it's amazing.

I wanted to ask what changes you made in your DETR fork to accommodate fine-tuning. If I delete the class_embed.weight and class_embed.bias layers in Deformable DETR, I get a size mismatch error.

Could you briefly describe the changes you made to the original DETR model, so that I can apply them to Deformable DETR?

It will be a huge help. Looking forward to your response!
