Object-detection-yolov8

Brief:

  • Object detection refers to locating the instances of objects in an image. A bounding box is drawn around the identified instances to signify the detection.
  • YOLO (You Only Look Once) is an object detection algorithm. It treats detection as a regression problem rather than a classification problem. It has 4 steps:
    • Dividing the input image into NxN grid cells
    • Bounding box regression: to predict the box for each cell that contains an object
    • IOU (Intersection over Union): to determine the relevant bounding box
    • Non-Maximum Suppression: to eliminate redundant, lower-confidence boxes
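
The last two steps can be illustrated with a short sketch. This is only a minimal, pure-Python illustration of IoU and greedy non-maximum suppression, not the code Ultralytics runs internally. The function names, the (x_min, y_min, x_max, y_max) box layout, and the 0.5 threshold are assumptions made for the example.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    # Corners of the overlapping rectangle (if any).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of the boxes to keep, best score first.

    The 0.5 default threshold is an illustrative assumption.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

Calling non_max_suppression with two heavily overlapping boxes and one distant box keeps the higher-scoring box of the overlapping pair plus the distant one.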

Tools Used:

  • Libraries: numpy, argparse, Ultralytics, OpenCV, PIL, python-xml

Dataset:

  • The dataset contains 416 images (.jpg) and 416 annotations (.xml) used for a PPE-detection task. There are 10 classes in this dataset.

Approach:

  1. The given dataset had images and their respective annotations. The annotations were in PascalVOC (Visual Object Classes) format. It has the data of the objects and metadata of the respective image. It is organized hierarchically and saved as an .xml file.
  2. To train YOLO, the annotations must be in .txt files that follow YOLO's format: <class_id, x_c, y_c, width, height>, where (x_c, y_c) are the coordinates of the center of the bounding box for the object with class class_id, and width and height are the width and height of the bounding box. YOLO expects these four values normalized by the image width and height.
  3. To read the .xml files, Python's built-in xml package was used to parse all the data. For each object in the image, the .xml file stores the coordinates of the bounding box: top-left (x_min, y_min) and bottom-right (x_max, y_max). The object names were mapped to class_ids, and the boxes were then converted into YOLO's format.
  4. The first step is person detection. Copies of the annotation files were created that contain only the “person” class; all other classes were excluded.
  5. These images and new annotations were split into training and validation sets.
  6. Yolov8-nano is trained using this data with image size: 640x640 and batch-size:
  7. The hyperparameters were automatically chosen by the model.
  8. After training, the weights were used to create a new dataset for the second task, PPE detection. The trained model detected each “person” in every image, and the cropped person images were saved separately. New annotations were then created for each cropped image, with coordinates expressed relative to the crop. These annotations cover the remaining 9 classes.
  9. The new data has 1227 images (1227 persons from 416 images) and 1227 annotations with at most 9 classes in each.
  10. These annotations were again converted into YOLOv8 format, and the dataset was split into train and test sets. A model was then trained to detect PPE objects. Both the YOLOv8-nano and YOLOv8-medium sizes were considered for training.
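
The annotation conversion in steps 2 to 4 can be sketched as follows. This is a minimal sketch, not the contents of pascalVOC_to_yolo.py: the function names, the class_ids mapping, and the sample XML (including the "helmet" class name) are hypothetical, and it assumes the usual PascalVOC tags (size, object, name, bndbox).

```python
import xml.etree.ElementTree as ET

# Made-up PascalVOC annotation used only for illustration.
SAMPLE_XML = """
<annotation>
  <size><width>200</width><height>100</height></size>
  <object><name>person</name>
    <bndbox><xmin>50</xmin><ymin>20</ymin><xmax>150</xmax><ymax>80</ymax></bndbox>
  </object>
  <object><name>helmet</name>
    <bndbox><xmin>90</xmin><ymin>20</ymin><xmax>110</xmax><ymax>30</ymax></bndbox>
  </object>
</annotation>
"""


def voc_box_to_yolo(x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert corner coordinates in pixels to normalized (x_c, y_c, w, h)."""
    x_c = (x_min + x_max) / 2.0 / img_w
    y_c = (y_min + y_max) / 2.0 / img_h
    return x_c, y_c, (x_max - x_min) / img_w, (y_max - y_min) / img_h


def voc_xml_to_yolo_lines(xml_text, class_ids):
    """Return one 'class_id x_c y_c width height' line per known object.

    Objects whose name is missing from class_ids are skipped, which is one
    way to build the person-only annotation files of step 4.
    """
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)
    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        if name not in class_ids:
            continue
        bndbox = obj.find("bndbox")
        coords = [float(bndbox.find(tag).text)
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        x_c, y_c, w, h = voc_box_to_yolo(*coords, img_w, img_h)
        lines.append(f"{class_ids[name]} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}")
    return lines


if __name__ == "__main__":
    # Person-only conversion: the "helmet" object is ignored.
    print("\n".join(voc_xml_to_yolo_lines(SAMPLE_XML, {"person": 0})))
```

Passing a mapping that contains only "person" yields person-only label files; passing the full name-to-id mapping converts every class.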

Inference:

  • pascalVOC_to_yolo.py takes an input directory (annotations) containing .xml files and an output directory. It converts every file in the input directory into YOLOv8 format and saves the results in the given output location.
  • inference.py takes an input directory (images), an output directory, the path to the person-detection model's weights, and the path to the PPE-detection model's weights. The script detects the person instances in each image, crops them, and sets the crops aside. Bounding boxes are then drawn on the original image, and this image is saved. Next, each cropped image is passed to the PPE-detection model to detect the PPE objects on that person. Bounding boxes are drawn using OpenCV's text and rectangle functions, and this image is saved.
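
Cropping each person means that boxes given in original-image pixels have to be re-expressed relative to the crop, as was done when building the PPE dataset. The sketch below shows one way to do that. The function name and the min_overlap rule for deciding whether a PPE box belongs to a person crop are assumptions; the source does not say how boxes were assigned to persons.

```python
def box_to_crop_coords(box, crop, min_overlap=0.5):
    """Re-express a box relative to a crop of the same image.

    box and crop are (x_min, y_min, x_max, y_max) in original-image pixels.
    Returns the clipped box in crop coordinates, or None when less than
    min_overlap of the box area lies inside the crop (an assumed rule).
    """
    bx0, by0, bx1, by1 = box
    cx0, cy0, cx1, cy1 = crop
    # Part of the box that falls inside the crop.
    ix0, iy0 = max(bx0, cx0), max(by0, cy0)
    ix1, iy1 = min(bx1, cx1), min(by1, cy1)
    if ix1 <= ix0 or iy1 <= iy0:
        return None  # no overlap at all
    box_area = (bx1 - bx0) * (by1 - by0)
    inter_area = (ix1 - ix0) * (iy1 - iy0)
    if box_area <= 0 or inter_area / box_area < min_overlap:
        return None
    # Shift the origin to the crop's top-left corner.
    return (ix0 - cx0, iy0 - cy0, ix1 - cx0, iy1 - cy0)
```

The returned box can then be normalized by the crop's width and height to produce a YOLO-format label for the cropped image.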

Testing:

[Figure: sample output]

Results:

Note: The person-detection task was also performed with YOLOv8-medium. It showed results almost similar to the nano model, but reached them in fewer epochs.


Model                 Person-det (nano)   Ppe-det (nano)   Ppe-det (medium)
Precision             0.949               0.796            0.854
Recall                0.91                0.514            0.528
mAP50                 0.978               0.646            0.604
mAP50-95              0.76                0.448            0.435
Inference time (ms)   1.9                 1.8              8.4
Weights size (MB)     5.95                5.92             49.5

Try out:

  pascalVOC_to_yolo.py --ip_dir INPUT_DIR_PATH --op_dir OUTPUT_DIR_PATH

where,

  • INPUT_DIR_PATH: folder which has all the .xml files
  • OUTPUT_DIR_PATH: where you would want to store the annotated .txt files

note:

  • if OUTPUT_DIR_PATH doesn't exist, it will be created

  inference.py --ip_dir IP_PATH --op_dir OP_PATH --person_det_model WEIGHTS_PERSON --ppe_det_model WEIGHTS_PPE

where,

  • IP_PATH: folder which has the images
  • OP_PATH: where you would want to save the detected object images
  • WEIGHTS_PERSON: name of weight file of the person-detection model
  • WEIGHTS_PPE: name of weight file of the ppe-detection model

note:

  • if OP_PATH doesn't exist, it will be created
  • WEIGHTS_PERSON and WEIGHTS_PPE must be full paths that include the weight file's name, not just the containing folder.
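
For readers who want to see how such a command line could be wired up, here is a minimal argparse sketch. It is not the actual inference.py: the flag names come from the command above, while the function names, help texts, and the choice to create the output folder during argument parsing are assumptions.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser with the four flags used by inference.py above."""
    parser = argparse.ArgumentParser(
        description="Two-stage person and PPE detection (illustrative sketch)."
    )
    parser.add_argument("--ip_dir", type=Path, required=True,
                        help="folder that contains the input images")
    parser.add_argument("--op_dir", type=Path, required=True,
                        help="folder where annotated images are saved")
    parser.add_argument("--person_det_model", type=Path, required=True,
                        help="path including file name of the person weights")
    parser.add_argument("--ppe_det_model", type=Path, required=True,
                        help="path including file name of the PPE weights")
    return parser


def parse_args(argv=None):
    """Parse arguments and create the output folder if it is missing."""
    args = build_parser().parse_args(argv)
    # Mirrors the note above: a missing output folder is created.
    args.op_dir.mkdir(parents=True, exist_ok=True)
    return args
```

Marking every flag as required makes a missing flag fail immediately with a usage message instead of partway through a run.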
