Object-Detection

Object detection and localization for a custom dataset

The task is to train a model for object detection and localization on a dataset with 2 classes:

  1. Car
  2. Person

The dataset is annotated in COCO format.

Documentation

The file structure is:

├── eagleView
│   ├── eagle_data
│   │   ├── images
│   │   │   ├── train
│   │   │   └── validation
│   │   └── labels
│   │       ├── train
│   │       └── validation
│   └── yolov5 (cloned repo)
│       └── contains the downloaded weights
├── Object Detection.ipynb
└── data.yaml

Approach

  1. I decided to use YOLOv5 because it is pretrained on the COCO dataset. It is also fast and easy to set up.
  2. I converted the annotations to YOLO format. Further details are in the Jupyter notebook.
  3. I then split the data into train and validation sets, 85% and 15% respectively.
  4. YOLOv5 comes in several sizes; I used three of them: small, medium and large. Starting from the pretrained weights, I fine-tuned all three models on the dataset, each for a maximum of 10 epochs.
  5. Since my GPU was not working with the required PyTorch version, I trained the small and medium models on my CPU, which took about 2 hrs and 4 hrs respectively (https://github.com/bansalraghav/Object-Detection/blob/main/Object%20Detection.ipynb).
  6. For the large model I used Google Colab to train on a GPU, which took about 1.5 hrs (https://github.com/bansalraghav/Object-Detection/blob/main/YOLOv5_large.ipynb).
  7. To train a YOLOv5 model, we need to define 4 fields in a yaml file:
     • train images path
     • validation images path
     • nc (number of classes)
     • names (class names)
  8. I have provided 2 yaml files. data.yaml contains the configuration I used to train the small and medium models locally, whereas colab_data.yaml contains the configuration I used to train the large model in Google Colab.
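
The data preparation in steps 2, 3 and 7 can be sketched roughly as follows. This is a minimal illustration, not the code from the notebook: the function names, the class order, the random seed and the paths in the usage section are all assumptions. It relies on the COCO convention of storing a box as [x_min, y_min, width, height] in pixels and the YOLO convention of normalized [x_center, y_center, width, height].

```python
import random

# Assumed class order; it must match the class ids written to the label files.
# COCO category ids often start at 1, while YOLO class ids start at 0.
CLASS_NAMES = ["car", "person"]


def coco_to_yolo_bbox(bbox, img_w, img_h):
    """Convert a COCO box [x_min, y_min, width, height] in pixels to
    YOLO format (x_center, y_center, width, height), normalized to 0-1."""
    x_min, y_min, w, h = bbox
    return (
        (x_min + w / 2) / img_w,
        (y_min + h / 2) / img_h,
        w / img_w,
        h / img_h,
    )


def yolo_label_line(class_id, bbox, img_w, img_h):
    """Format one annotated object as a line of a YOLO .txt label file."""
    xc, yc, w, h = coco_to_yolo_bbox(bbox, img_w, img_h)
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


def split_train_val(items, val_fraction=0.15, seed=0):
    """Shuffle items reproducibly and return (train, validation) lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_val = round(len(items) * val_fraction)
    return items[n_val:], items[:n_val]


def data_yaml_text(train_dir, val_dir, names):
    """Render the four fields listed in step 7 as yaml text."""
    return (
        f"train: {train_dir}\n"
        f"val: {val_dir}\n"
        f"nc: {len(names)}\n"
        f"names: {names}\n"
    )


if __name__ == "__main__":
    # Hypothetical 100x200 image with one "car" box.
    print(yolo_label_line(0, [10, 20, 30, 40], img_w=100, img_h=200))
    # Hypothetical image file names.
    train, val = split_train_val([f"img_{i}.jpg" for i in range(100)])
    print(len(train), len(val))
    # Paths are assumptions based on the file structure above.
    print(data_yaml_text("eagle_data/images/train",
                         "eagle_data/images/validation",
                         CLASS_NAMES))
```

YOLOv5 expects one label file per image, with one such line per object, stored under labels/ in a layout that mirrors images/.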

Assumptions

  1. The annotations provided are accurate.
  2. The classes are balanced, so the model can learn both classes equally well.

Metrics

  1. While training the YOLOv5 network, we can link our runtime to wandb.ai, which keeps track of all the epochs. It reports metrics such as precision, recall, mAP and losses for both training and validation.
  2. I have attached images of all the metrics in the Results folder.
  3. The confusion matrices show that the false positive rate for the "car" category is lower for the medium and large models than for the small model. For the "person" category the false positive rate is high for all three models.
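
When reading the confusion matrices in point 3, it can help to recall how false positives and false negatives feed into precision and recall for a single class. A minimal sketch follows; the counts are made up for illustration and are not results from this project, and the function name is hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Per-class precision and recall from detection counts.

    tp: detections matched to a ground-truth object of the class
    fp: detections of the class with no matching ground-truth object
    fn: ground-truth objects of the class that were missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up counts: more false positives lower precision, recall is unchanged.
    print(precision_recall(tp=80, fp=20, fn=10))
    print(precision_recall(tp=80, fp=60, fn=10))
```

A class with many false positives, as described for "person", would show up as low precision even when recall is reasonable.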

Other artifacts

  1. The comparison between the three YOLOv5 networks shows that the large network performs best, but it also takes the most time to train.

Conclusion

  1. YOLOv5 is a state-of-the-art (SOTA) model that is easy to set up and quick to train. The only drawback is the need to convert custom annotations to YOLO format, but that can be done with relative ease. Using the dataset provided to me, I reached an mAP of 0.7 for the large and medium models, which is good considering that the models were trained for only 10 epochs with a batch size of 8.

Recommendations

  1. Training the large model for more epochs and with a larger batch size may give slightly better results.
  2. Hyperparameter tuning can also lead to better results.
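
In practice, the first recommendation comes down to changing two arguments of YOLOv5's train.py. The sketch below only builds the command and does not run it. The helper name, the data.yaml path and the "longer" values (50 epochs, batch size 16) are illustrative assumptions, not settings that were tested here; yolov5l.pt is the name of the pretrained large checkpoint in the YOLOv5 releases.

```python
import sys


def yolov5_train_command(data_yaml, weights, epochs, batch_size):
    """Build the argument list for YOLOv5's train.py (nothing is executed here)."""
    return [
        sys.executable, "train.py",
        "--data", data_yaml,
        "--weights", weights,
        "--epochs", str(epochs),
        "--batch-size", str(batch_size),
    ]


# Settings reported in the Conclusion: 10 epochs, batch size 8.
baseline = yolov5_train_command("data.yaml", "yolov5l.pt", epochs=10, batch_size=8)

# Hypothetical longer run; 50 and 16 are illustrative values only.
longer = yolov5_train_command("data.yaml", "yolov5l.pt", epochs=50, batch_size=16)

if __name__ == "__main__":
    print(" ".join(longer))
```

The resulting list could be passed to subprocess.run with the cloned yolov5 directory as the working directory. A larger batch size needs more GPU memory, so the usable value depends on the hardware.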
