Use the YOLO v3 (ONNX) model for object detection in C# using ML.Net

License: MIT License

yolov3mlnet's Introduction

Another case study, based on this YOLO v3 model, is available here.

See here for YOLO v4 use.

YOLO v3 in ML.Net

Use the YOLO v3 algorithm for object detection in C# using ML.Net. We start with a PyTorch model, convert it to ONNX format, and then use it in ML.Net.

This is a case study on a YOLO model trained for document layout analysis. The model can be found in the following Medium article: Object Detection - Document Layout Analysis Using Monk AI.

Main differences

  • The ONNX conversion removes one feature, the objectness score (pc). The original model has (5 + classes) features for each bounding box, while the ONNX model has (4 + classes) features per bounding box. We will use the class probability as a proxy for the objectness score when performing the Non-Maximum Suppression (NMS) step. This is a known issue, more info here.
  • Image resizing is not optimised and always yields a 416x416 image. This is not the case in the original model (see this issue: RECTANGULAR INFERENCE).
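
As a rough illustration of the first point, the sketch below takes the highest class probability of each box and uses it as the confidence score in place of the missing objectness. The helper name `best_class_scores`, the toy data, and the 0.2 threshold (borrowed from the `conf_thres` used in the Python test further down) are assumptions for illustration only, not the repository's C# code.

```python
from typing import List, Tuple

def best_class_scores(class_probs: List[List[float]],
                      conf_thres: float = 0.2) -> List[Tuple[int, int, float]]:
    """Return (box_index, class_index, score) for boxes above the threshold.

    The ONNX output has no objectness column, so the highest class
    probability of each box stands in for it (an approximation).
    """
    kept = []
    for box_index, probs in enumerate(class_probs):
        # Index of the most likely class for this box
        class_index = max(range(len(probs)), key=probs.__getitem__)
        score = probs[class_index]
        if score >= conf_thres:
            kept.append((box_index, class_index, score))
    return kept

# Toy example: 3 boxes, 4 classes (the real model has 10,647 boxes and 18 classes).
toy_probs = [
    [0.05, 0.90, 0.02, 0.01],
    [0.10, 0.05, 0.08, 0.03],
    [0.01, 0.02, 0.30, 0.60],
]
candidates = best_class_scores(toy_probs)
```

Only the boxes that survive this filter would then need to go through NMS.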

Export to ONNX in Python

This is based on the article Object Detection - Document Layout Analysis Using Monk AI.

Load the model

import os
import sys
from IPython.display import Image
sys.path.append("../Monk_Object_Detection/7_yolov3/lib")
from infer_detector import Infer

gtf = Infer()

# Read the class names, one per line
with open("dla_yolov3/classes.txt") as f:
    class_list = f.readlines()

model_name = "yolov3"
weights = "dla_yolov3/dla_yolov3.pt"
gtf.Model(model_name, class_list, weights, use_gpu=False, input_size=(416, 416))

Test the model

img_path = "test_square.jpg"
gtf.Predict(img_path, conf_thres=0.2, iou_thres=0.5)
Image(filename='output/test_square.jpg')

Export the model

You need to set ONNX_EXPORT = True in ...\Monk_Object_Detection\7_yolov3\lib\models.py before loading the model.

We name the input layer image and the 2 output layers classes and bboxes. This is optional, but it makes the exported model easier to read.

import torch

# Dummy input with the shape the model expects (e.g. for an image)
dummy_input = torch.randn(1, 3, 416, 416)
# Limit values to between 0 and 1 (possibly superfluous)
dummy_input = torch.sigmoid(dummy_input)
torch.onnx.export(gtf.system_dict["local"]["model"],
                  dummy_input, 
                  "dla_yolov3.onnx",
                  input_names=["image"],
                  output_names=["classes", "bboxes"],
                  opset_version=9)

Check exported model with Netron

The ONNX model can be viewed in Netron. (Image: the exported model as displayed in Netron.)

  • The input layer size is [1 x 3 x 416 x 416]. This corresponds to a batch size of 1 x 3 color channels x 416 pixels high x 416 pixels wide (more info about fixed batch size here).
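
To make the input layout concrete, here is a minimal pure-Python sketch that rearranges a height x width x RGB pixel grid into the [1 x 3 x H x W] order. Scaling pixel values to the 0-1 range is an assumption (suggested by the sigmoid applied to the dummy input during export); `to_nchw` is a hypothetical helper, and the 2 x 2 image stands in for a 416 x 416 one.

```python
from typing import List

Pixel = List[int]  # [R, G, B], each value 0-255

def to_nchw(image: List[List[Pixel]]) -> List[List[List[List[float]]]]:
    """Convert an H x W x 3 image into a 1 x 3 x H x W nested list in [0, 1]."""
    height, width = len(image), len(image[0])
    channels = [
        [[image[y][x][c] / 255.0 for x in range(width)] for y in range(height)]
        for c in range(3)  # one plane per color channel: R, G, B
    ]
    return [channels]  # leading batch dimension of size 1

tiny = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]
tensor = to_nchw(tiny)
shape = (len(tensor), len(tensor[0]), len(tensor[0][0]), len(tensor[0][0][0]))
```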

As per this article:

For an image of size 416 x 416, YOLO predicts ((52 x 52) + (26 x 26) + (13 x 13)) x 3 = 10,647 bounding boxes.
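
The count in the quote can be reproduced with a few lines of arithmetic. The sketch assumes the usual YOLO v3 strides of 8, 16 and 32 and 3 anchors per grid cell; `count_boxes` is a hypothetical helper.

```python
def count_boxes(image_size: int = 416,
                strides=(8, 16, 32),
                anchors_per_cell: int = 3) -> int:
    """Number of boxes predicted for a square input of the given size."""
    # Each stride gives a grid of (image_size // stride) cells per side
    cells = sum((image_size // stride) ** 2 for stride in strides)
    return cells * anchors_per_cell

total_boxes = count_boxes()
```

For the 416 x 416 input, the grids are 52 x 52, 26 x 26 and 13 x 13, which gives the 10,647 boxes quoted above.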

  • The bboxes output layer is of size [10,647 x 4]. This corresponds to 10,647 bounding boxes x 4 bounding box coordinates (x, y, h, w).
  • The classes output layer is of size [10,647 x 18]. This corresponds to 10,647 bounding boxes x 18 classes (this model has only 18 classes).

Hence, each bounding box has (4 + classes) = 22 features. The total number of predicted values in this model is 22 x 10,647 = 234,234.
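
A minimal sketch of how the two outputs line up: row i of bboxes and row i of classes describe the same box, so joining them gives 22 values per box. The helper `merge_outputs` and the zero-filled placeholder data are illustrative assumptions.

```python
from typing import List

def merge_outputs(bboxes: List[List[float]],
                  classes: List[List[float]]) -> List[List[float]]:
    """Concatenate row i of `bboxes` (4 values) with row i of `classes`."""
    if len(bboxes) != len(classes):
        raise ValueError("both outputs must describe the same number of boxes")
    return [box + probs for box, probs in zip(bboxes, classes)]

NUM_BOXES, NUM_CLASSES = 10647, 18
bboxes = [[0.0] * 4 for _ in range(NUM_BOXES)]             # placeholder values
classes = [[0.0] * NUM_CLASSES for _ in range(NUM_BOXES)]  # placeholder values

merged = merge_outputs(bboxes, classes)
features_per_box = len(merged[0])
total_values = len(merged) * features_per_box
```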

NB: The ONNX conversion removes one feature, the objectness score (pc). The original model has (5 + classes) features for each bounding box. We will use the class probability as a proxy for the objectness score.
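
The NMS step with this proxy score can be sketched as follows. This is a generic greedy, class-agnostic NMS in pure Python, not the repository's C# implementation. It assumes the boxes have already been converted to corner format (x1, y1, x2, y2) and that each score is the highest class probability of its box; the 0.5 IoU threshold mirrors the `iou_thres` used in the Python test.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) corners

def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes: List[Box], scores: List[float], iou_thres: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score, keep a box only if it
    does not overlap an already kept box by more than `iou_thres`."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thres for k in kept):
            kept.append(i)
    return kept

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]  # proxy scores: highest class probability per box
kept = nms(boxes, scores)
```

In this toy example the second box overlaps the first too much and is suppressed, while the third box does not overlap anything and is kept.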

More information can be found in this article: YOLO v3 theory explained

Load model in C#

Predict in C#

Resources
