
csvance / keras-mobile-detectnet


Fast Object Detector for the Jetson Nano

License: MIT License

Python 61.31% Jupyter Notebook 38.69%
keras object-detection machine-learning deep-learning tensorrt jetson-nano

keras-mobile-detectnet's Introduction

Keras MobileDetectNet

Example

MobileDetectNet is an object detector which uses a MobileNet CNN to predict bounding boxes. It was designed to be computationally efficient for deployment on embedded systems and easy to train with limited data. It was inspired by the simple yet effective design of DetectNet and enhanced with the anchor system from Faster R-CNN.

Network Architecture

Example

Training

python train.py --help
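
For example, a full training command might look like the following (the flags mirror a command used in one of the issues below; the paths are placeholders):

python3 train.py --batch-size 24 --epochs 500 --train-path /path/to/train --eval-path /path/to/val --workers 4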

Label Format

MobileDetectNet uses the KITTI label format and directory structure. See here for more details.
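
For reference, each line of a KITTI label file describes one object with 15 space-separated fields (class, truncation, occlusion, alpha, the left/top/right/bottom pixel coordinates of the 2D box, 3D dimensions, 3D location, and rotation); for a 2D detector such as this one, the class name and the four box coordinates are presumably the fields that matter. A representative line:

car 0.00 0 0.00 387.63 181.54 423.81 203.12 1.57 1.65 3.35 2.45 1.59 34.22 0.00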

Preprocessing

Images are scaled to the range [-1, 1] to take advantage of transfer learning from the pretrained MobileNet weights.
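
A minimal sketch of that scaling (this mirrors the preprocessing in inference.py; the zero array stands in for a real image):

import numpy as np

image = np.zeros((224, 224, 3), dtype=np.uint8)     # stand-in for an image loaded with cv2.imread()
net_input = image.astype(np.float32) / 127.5 - 1.0  # pixel values now lie in [-1, 1]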

Anchors

MobileNet outputs a 7x7x256 feature map from its last layer for a 224x224x3 input. In each of the 7x7 cells we place 9 anchors with combinations of the following settings:

  • Scale 1, 2, and 3
  • Aspect Ratio 1, 4/3, and 3/4

An anchor's class target is set to 1 if a ground-truth rectangle has an IoU greater than 0.3 with the anchor. Each ground-truth bounding box is assigned to the anchor with which it has the highest IoU above 0.3.

Due to the network's small receptive field and the low spatial resolution of MobileNet's output, anchors partially outside the image can be used.
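
As a rough illustration, the 9 anchor shapes per grid cell could be enumerated as below; the exact parameterization in the repository may differ, and the variable names are illustrative only:

import itertools
import numpy as np

scales = [1, 2, 3]
aspect_ratios = [1.0, 4.0 / 3.0, 3.0 / 4.0]
cell = 224 / 7  # each of the 7x7 cells covers a 32x32 pixel region

anchor_shapes = []
for scale, ratio in itertools.product(scales, aspect_ratios):
    w = cell * scale * np.sqrt(ratio)   # wider for aspect ratio 4/3
    h = cell * scale / np.sqrt(ratio)   # taller for aspect ratio 3/4
    anchor_shapes.append((w, h))        # 9 (width, height) pairs per cell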

Augmentation

Training is done with imgaug utilizing Keras Sequences for multicore preprocessing and online data augmentation:

return iaa.Sequential([
    iaa.Fliplr(0.5),
    iaa.CropAndPad(px=(0, 112), sample_independently=False),
    iaa.Affine(translate_percent={"x": (-0.4, 0.4), "y": (-0.4, 0.4)}),
    iaa.SomeOf((0, 3), [
        iaa.AddToHueAndSaturation((-10, 10)),
        iaa.Affine(scale={"x": (0.9, 1.1), "y": (0.9, 1.1)}),
        iaa.GaussianBlur(sigma=(0, 1.0)),
        iaa.AdditiveGaussianNoise(scale=0.05 * 255)
    ])
])

Data augmentation is also applied during validation to make sure smaller objects are detected.

return iaa.Sequential([
    iaa.CropAndPad(px=(0, 112), sample_independently=False),
    iaa.Affine(translate_percent={"x": (-0.4, 0.4), "y": (-0.4, 0.4)}),
])

If a dataset contains many small bounding boxes, or if detecting small objects is not a concern, this should be adjusted for both the training and validation augmentation.

Loss

Standard loss functions are used for everything other than the bounding box regression, which uses 10 * class_true_ij * |y_pred_ij - y_true_ij| so that the network is not penalized for bounding box predictions where no object is present and so the regression loss is normalized against the class loss. Class loss is binary cross-entropy and region loss is mean absolute error.
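
A sketch of what such a masked regression loss can look like in Keras is shown below; the repository's actual implementation may wire the class mask in differently, and class_true here simply stands for the per-cell object indicator:

from keras import backend as K

def masked_bbox_loss(class_true):
    # class_true: (batch, 7, 7, 1) object indicator; cells without an object contribute no loss
    def loss(y_true, y_pred):
        return 10.0 * K.mean(class_true * K.abs(y_pred - y_true), axis=-1)
    return loss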

Optimization

Nadam is the recommended optimizer. A base learning rate of 0.001 is used, and the ReduceLROnPlateau callback reduces it during training. The model should generally converge to an optimal solution within 50 epochs, depending on the amount of training data used.
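
A minimal sketch of this setup (the factor and patience values are illustrative, not necessarily those used in train.py):

from keras.optimizers import Nadam
from keras.callbacks import ReduceLROnPlateau

optimizer = Nadam(lr=1e-3)
# Reduce the learning rate when the validation loss stops improving
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2, verbose=1)
# Both are then passed to model.compile(...) and the callbacks list of model.fit_generator(...)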

Inference

python inference.py --help
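
For example, to run TF-TRT FP16 inference on a folder of test images (flag names follow the plac annotations in inference.py; the weights path is a placeholder):

python inference.py --inference-type FP16 --weights model.h5 --test-path /path/to/images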

TensorRT

A TF-TRT helper function has been integrated into the model, which allows for easy inference acceleration on the NVIDIA Jetson platform. In model.py, MobileDetectNet.tftrt_engine() creates a TensorRT-accelerated TensorFlow graph. An example of how to use it is included in inference.py.
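
Based on the usage in inference.py, a minimal sketch looks like this (the weights filename is a placeholder and the random array stands in for a real batch of preprocessed images):

import numpy as np
from model import MobileDetectNetModel

keras_model = MobileDetectNetModel.complete_model()
keras_model.load_weights('weights.h5', by_name=True)   # placeholder path

# Build a TensorRT-accelerated TensorFlow graph and run inference on one 224x224 image
tftrt_engine = keras_model.tftrt_engine(precision='FP16', batch_size=1)
classes, bboxes = tftrt_engine.infer(np.random.random((1, 224, 224, 3)))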

Performance

Using an FP16 TF-TRT graph, the model runs at ~55 FPS on the Jetson Nano in mode 1 (5W). Performance does not seem to be affected by running in mode 0 (10W).

keras-mobile-detectnet's People

Contributors

csvance


keras-mobile-detectnet's Issues

Multi-class object detection problem

Thank you for sharing the code; it is useful for single-object detection. Now I am trying to modify it to distinguish the category of the objects. Could you give me some advice? Thanks :)

Train with rectangular images

How can I train with images of shape (128, 512, 3)?
After changing some configuration of the model, I got this error:
[error screenshot]

Bounding boxes display incorrectly

Hi, thanks for this project. I'm running the inference script with
python inference.py --inference-type FP16 --test-path (path to folder containing images)
but the bounding boxes displayed do not correspond to the image itself. For example, here are the outputs for three sample images:

[output screenshots for the three sample images]

The first one is particularly worrying since only one object should be detected, but numerous boxes are still being displayed.

Thanks

Epoch length incorrect?

Hi, is it possible that
steps_per_epoch=np.ceil(len(train_seq) / batch_size), validation_steps=np.ceil(len(val_seq) / batch_size),

should be
steps_per_epoch=np.ceil(len(train_seq.images) / batch_size), validation_steps=np.ceil(len(val_seq.images) / batch_size),

in train.py?

Otherwise the model doesn't seem to train for a full epoch. Also (and maybe this should be a separate thread), the model doesn't seem to train with use_multiprocessing enabled?

Thanks for your work btw!

Training Error. AttributeError: 'NoneType' object has no attribute 'shape'

I'm trying to train and I used this command:

python3 train.py --batch-size 24 --epochs 500 --train-path ~/my/folder/train --eval-path ~/my/folder/val --workers 4

but I get the following error:

File "/home/angelo/keras-mobile-detectnet/generator.py", line 99, in __getitem__
old_shape = image.shape
AttributeError: 'NoneType' object has no attribute 'shape'

How do I get around this?

Using a live camera feed as input?

For a little background, I trained a custom object detector model using your train.py code. After that I tested it with inference.py, filling out the necessary terminal flags to make sure my model is used.

After that, I tried to edit your inference.py so that instead of going through the photos in a folder, it uses the input of the Raspberry Pi Camera that I have. I know the Raspberry Pi Camera works because I can open it via OpenCV. I made a lot of code alterations, but basically what I added is a camera flag that, when set to True, uses the camera feed instead.

Here is the code that I have:

import numpy as np
import time
import plac
import os
import cv2

import gi

from model import MobileDetectNetModel

os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'

gi.require_version('Gst', '1.0')

@plac.annotations(
    inference_type=("Type of inference to test (TF, FP32, FP16, INT8)", 'option', 'T', str),
    batch_size=("Size of the TensorRT batch", 'option', 'B', int),
    weights=("Model weights", 'option', 'W', str),
    test_path=("Test images path", 'option', 'I', str),
    merge=("Test images only: Merge detected regions", 'flag', 'm', bool),
    stage=("Test images only: Augmentation training stage", 'option', 's', str),
    limit=("Test images only: Max number of images to run inference on", 'option', 'l', int),
    confidence=("Test images only: Minimum confidence in coverage to draw bbox", "option", "c", float),
    visualize=("Visualize the inference", "option", "V", bool),
    camera=("Use camera feed. Ignores test_path. Boolean.", "option", "C", bool)
)



# Set inference_type to FP16 to use TensorRT
def main(inference_type: str = "FP16",
         batch_size: int = 1,
         test_path: str = None,
         weights: str = None,
         merge: bool = False,
         stage: str = "test",
         limit: int = 20,
         confidence: float = 0.1,
         visualize: bool = True,
         camera: bool = False):

    keras_model = MobileDetectNetModel.complete_model()

    if weights is not None:
        keras_model.load_weights(weights, by_name=True)

    images_done = 0

    if test_path is not None:
        # import cv2

        if stage != 'test':
            from generator import MobileDetectNetSequence
            seq = MobileDetectNetSequence.create_augmenter(stage)
        else:
            seq = None

        images_full = []
        images_input = []
        images_scale = []

        for r, d, f in os.walk(test_path):
            for file in f:
                image_full = cv2.imread(os.path.join(r, file))
                image_input = cv2.resize(image_full, (224, 224))

                scale_width = image_full.shape[1] / 224
                scale_height = image_full.shape[0] / 224
                images_scale.append((scale_width, scale_height))

                if stage != 'test':
                    seq_det = seq.to_deterministic()
                    image_aug = (seq_det.augment_image(image_input).astype(np.float32) / 127.5) - 1.
                else:
                    image_aug = image_input.astype(np.float32) / 127.5 - 1.

                images_full.append(image_full)
                images_input.append(image_aug)

                images_done += 1

                if images_done == limit:
                    break

            if images_done == limit:
                break

        x_test = np.array(images_input)
    else:
        #x_test = np.random.random((limit, 224, 224, 3))
        x_test = np.random.random((224, 224, 3))
    
        
    # x_test = np.random.random((224, 224, 3))

    x_cold = np.random.random((batch_size, 224, 224, 3))

    print(f'Inference Type is {inference_type}')

    if inference_type == 'K':
        keras_model.predict(x_cold)
        t0 = time.time()
        model_outputs = keras_model.predict(x_test)
        t1 = time.time()
    elif inference_type == 'TF':
        tf_engine = keras_model.tf_engine()
        tf_engine.infer(x_cold)
        t0 = time.time()
        model_outputs = tf_engine.infer(x_test)
        t1 = time.time()
    elif inference_type == 'FP32':
        tftrt_engine = keras_model.tftrt_engine(precision='FP32', batch_size=batch_size)
        tftrt_engine.infer(x_cold)
        t0 = time.time()
        model_outputs = tftrt_engine.infer(x_test)
        t1 = time.time()
    	
    # WE ARE USING THIS INFERENCE TYPE, TFTRT
    elif inference_type == 'FP16':
        tftrt_engine = keras_model.tftrt_engine(precision='FP16', batch_size=batch_size)
        tftrt_engine.infer(x_cold)
        #t0 = time.time()
        #model_outputs = tftrt_engine.infer(x_test)
        #t1 = time.time()
        
    elif inference_type == 'INT8':
        tftrt_engine = keras_model.tftrt_engine(precision='INT8', batch_size=batch_size)
        tftrt_engine.infer(x_cold)
        t0 = time.time()
        model_outputs = tftrt_engine.infer(x_test)
        t1 = time.time()
    else:
        raise ValueError("Invalid inference type")

    #print('Time: ', t1 - t0)
    #print('FPS: ', x_test.shape[0]/(t1 - t0))

    if not visualize:
        return

#    if len(model_outputs) == 2:
#        classes, bboxes = model_outputs

    # TF / TensorRT models won't output regions (not useful for production)
    #elif len(model_outputs) == 3:
    #    regions, bboxes, classes = model_outputs
    #else:
    #    raise ValueError("Invalid model length output")


    if test_path is not None and camera is False:
        import matplotlib.pyplot as plt
        from matplotlib.colors import LinearSegmentedColormap

        # get colormap
        ncolors = 256
        color_array = plt.get_cmap('viridis')(range(ncolors))

        # change alpha values
        color_array[:, -1] = np.linspace(0.0, 1.0, ncolors)

        # create a colormap object
        map_object = LinearSegmentedColormap.from_list(name='viridis_alpha', colors=color_array)

        # register this new colormap with matplotlib
        plt.register_cmap(cmap=map_object)

        for idx in range(0, len(images_full)):

            rectangles = []

            # Does this only get the first 7 items? 
            for y in range(0, 7):
                for x in range(0, 7):

                    if classes[idx, y, x, 0] >= confidence:
                        rect = [
                            int(bboxes[idx, int(y), int(x), 0] * 224),
                            int(bboxes[idx, int(y), int(x), 1] * 224),
                            int(bboxes[idx, int(y), int(x), 2] * 224),
                            int(bboxes[idx, int(y), int(x), 3] * 224)]
                        rectangles.append(rect)

            if merge:
                rectangles, merges = cv2.groupRectangles(rectangles, 1, eps=0.75)

            scale_width, scale_height = images_scale[idx]

            for rect in rectangles:
                cv2.rectangle(images_full[idx],
                              (int(rect[0]*scale_width), int(rect[1]*scale_height)),
                              (int(rect[2]*scale_width), int(rect[3]*scale_height)),
                              (0, 255, 0), 5)

            plt.imshow(cv2.cvtColor(images_full[idx], cv2.COLOR_BGR2RGB), alpha=1.0, aspect='auto')
            plt.imshow(
                cv2.resize(classes[idx].reshape((7, 7)),
                           (images_full[idx].shape[1], images_full[idx].shape[0])),
                interpolation='nearest', alpha=0.5, cmap='viridis_alpha', aspect='auto')
            plt.show()


    font = cv2.FONT_HERSHEY_SIMPLEX
    bottomLeftCornerOfText = (10, 500)
    fontScale = 1
    fontColor = (255, 255, 255)
    lineType = 2

    if camera is True:
        print('camera flag detected!')
        
        #cap = cv2.VideoCapture("nvarguscamerasrc ! video/x-raw(memory:NVMM), width=(int)1280, height=(int)720, format=(string)NV12, framerate=(fraction)21/1 ! nvvidconv flip-method=2 ! video/x-raw, format=(string)BGRx, width=(int)960, height=(int)616 ! videoconvert ! video/x-raw, format=(string)BGR ! appsink")

        cap = cv2.VideoCapture("nvarguscamerasrc ! video/x-raw(memory:NVMM), width=(int)1280, height=(int)720, format=(string)NV12, framerate=(fraction)60/1 ! nvvidconv flip-method=2 ! video/x-raw, format=(string)BGRx, width=(int)960, height=(int)616 ! videoconvert ! appsink")
        
        if cap.isOpened():
            cv2.namedWindow("demo")
            while True:
                ret_val, image_np = cap.read()
                image_raw = image_np
                
                #print(f'*** original shape is {image_np.shape}')
                
                # Expand the dimensions
                image_np_expanded = np.expand_dims(image_np, axis=0)

                #print(f'*** image expanded shape is {image_np_expanded.shape}')


                images_full = []
                images_input = []
                images_scale = []
                
                dim = (224, 224)
                
                image_input = cv2.resize(image_raw, (224, 224))
                #image_input = image_np_expanded

                #print(f'image_raw shape is = {image_input.shape}')
                image_full = np.expand_dims(image_input, axis=0)

                #print(f'image_full shape after expanding is = {image_full.shape}')
                                
                #scale_width = image_full.shape[1] / 224
                #scale_height = image_full.shape[0] / 224
                #images_scale.append((scale_width, scale_height))

                if stage != 'test':
                    seq_det = seq.to_deterministic()
                    image_aug = (seq_det.augment_image(image_input).astype(np.float32) / 127.5) - 1.
                else:
                    image_aug = image_input.astype(np.float32) / 127.5 - 1.

                #images_full.append(image_full)
                #images_full.append(image_aug)
                
                t0 = time.time()
                #print(f'shape of image full before sending to ')
                model_outputs = tftrt_engine.infer(image_full)

                t1 = time.time()

                rectangles = []

                #print(f'length of model_outputs is = {len(model_outputs)}')

                if len(model_outputs) == 2:
                    classes, bboxes = model_outputs

                # TF / TensorRT models won't output regions (not useful for production)
                elif len(model_outputs) == 3:
                    regions, bboxes, classes = model_outputs
                else:
                    raise ValueError("Invalid model length output")


                framerate = 1.0/(t1 - t0)

                #print('Time: ', t1 - t0)
                #print('FPS: ', framerate)

                print()


                for y in range(0, 7):
                    for x in range(0, 7):
                        #print(f'confidence is = {classes[0, y, x, 0]}')
                        if classes[0, y, x, 0] >= confidence:
                            #print('confidence is enough!')
                            rect = [
                                int(bboxes[0, int(y), int(x), 0] * 224),
                                int(bboxes[0, int(y), int(x), 1] * 224),
                                int(bboxes[0, int(y), int(x), 2] * 224),
                                int(bboxes[0, int(y), int(x), 3] * 224)]
                            
                            print(f'rectangle is = {rect}')
                            
                            rectangles.append(rect)
                        
                        #else:
                        #    print('confidence not high enough')

                rectangles, merges = cv2.groupRectangles(rectangles, 1, eps=0.75)

                #scale_width, scale_height = images_scale[idx]

                if len(rectangles) > 0:
                    print(f'rectangle count is = {len(rectangles)}')
                    
                for rect in rectangles:
                    cv2.rectangle(image_raw,
                          (int(rect[0]), int(rect[1])),
                          (int(rect[2]), int(rect[3])),
                          (0, 255, 0), 5)
                
                cv2.putText(image_raw, "FPS: {0:.2f}".format(framerate), bottomLeftCornerOfText, font, fontScale, fontColor, lineType)
                
                cv2.imshow("demo", image_raw)
		
                if cv2.waitKey(1) == ord('q'):
                    break
        else:
            print('camera open failed')

        cv2.destroyAllWindows()



if __name__ == '__main__':
    plac.call(main)


Basically what happens is that I get the captured frame and run it through inference. When I run the script (again, using my own model), the camera feed opens just fine, but when I point it at a photo of the object I trained it on (the same photos from the folder I test with), it doesn't detect my object anymore.

Basically, I'm trying to use my model and your base code in inference.py to run object detection on the camera feed, but I haven't had any luck.
