gustavz / realtime_object_detection


Plug and Play Real-Time Object Detection App with Tensorflow and OpenCV

License: MIT License

Python 53.98% Shell 3.68% C++ 42.34%

tensorflow object-detection api google python opencv deep-learning deep-neural-networks real-time

realtime_object_detection's Introduction

realtime_object_detection

Realtime Object Detection based on Tensorflow's Object Detection API and DeepLab Project

Version 1: use branch r1.0 for the original repo that was focused on high performance inference of ssd_mobilenet
(x10 Performance Increase on Nvidia Jetson TX2)

Version 2: use branch master to additionally run and test Mask-Detection Models, KCF-Tracking and DeepLab Models (merge of the repo realtime_segmenation)

ROS Support: To use this repo as a ROS package including detection and segmentation ROS nodes, use branch ros. Alternatively use the repo objectdetection_ros

About the Project

The idea was to create a scalable, realtime-capable object detection pipeline that runs on various systems.
Plug and play, ready to use without deep prior knowledge.

The project includes the following work:

optional download of pretrained TensorFlow models
run inference with OpenCV, either on video input or on selected test_images
supported models are all research/object_detection as well as research/deeplab models
use this project's own ssd_mobilenet speed hack, which splits the model into a multithreaded CPU and GPU session.
This results in up to a x10 performance increase, depending on the system it runs on
⇒ which makes it (one of) the fastest inference pipelines out there
run statistical tests on sets of images and get statistical information like mean and median fps, std dev and much more
create timeline files measuring the exact time consumption of each operation in your model
inspect, summarize, quantize, transform and benchmark models with the provided scripts/
use this repo as a ROS package: the detection subscribes to a ROS Image topic and publishes the detections as a ROS node.
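
The statistics named in the list above (mean and median fps, std dev) can be derived from per-frame processing times. The following is a minimal sketch using only the Python standard library; `fps_statistics` and its result keys are hypothetical names chosen for illustration, not this repo's API.

```python
import statistics


def fps_statistics(frame_times):
    """Summarize per-frame durations (in seconds) as FPS statistics."""
    # Convert each positive duration into an instantaneous frame rate.
    fps = [1.0 / t for t in frame_times if t > 0]
    if not fps:
        raise ValueError("need at least one positive frame time")
    return {
        "mean_fps": statistics.mean(fps),
        "median_fps": statistics.median(fps),
        "std_fps": statistics.pstdev(fps),  # population standard deviation
    }


# Three frames that took 10 ms, 20 ms and 40 ms.
stats = fps_statistics([0.01, 0.02, 0.04])
print(stats)
```

Note that the mean of the per-frame rates is not the same as total frames divided by total time; which one to report is a design choice.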

Inference:

Create a copy of config.sample.yml named config.yml and only change configurations inside this file.
For example: if you are not interested in visualization, set VISUALIZE to False,
or if you want to switch off the speed hack, set SPLIT_MODEL to False.
To use KCF tracking, run bash build_kcf.sh inside scripts/ to build it, then set USE_TRACKER to True
(currently only works for pure object detection models without SPLIT_MODEL)
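
The restriction above (USE_TRACKER only works without SPLIT_MODEL) can be checked before starting inference. This is a small sketch that assumes the configuration has already been loaded into a plain dict; `check_config` is a hypothetical helper, not part of the repo.

```python
def check_config(cfg):
    """Return warnings for option combinations the README flags as unsupported."""
    warnings = []
    # The README states that tracking currently only works without SPLIT_MODEL.
    if cfg.get("USE_TRACKER") and cfg.get("SPLIT_MODEL"):
        warnings.append("USE_TRACKER currently only works without SPLIT_MODEL")
    return warnings


cfg = {"VISUALIZE": False, "SPLIT_MODEL": True, "USE_TRACKER": True}
problems = check_config(cfg)
for message in problems:
    print("config warning:", message)
```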

New class structure (Model, Config, Visualizer). Simply create your own test file with:

from rod.model import ObjectDetectionModel, DeepLabModel
from rod.config import Config

model_type = 'od'                                              #or 'dl'
input_type = 'video'                                           #or 'image'
config = Config(model_type)
model = ObjectDetectionModel(config).prepare_model(input_type) #or DeepLabModel
model.run()

Alternatively run python with one of objectdetection_video.py, objectdetection_image.py, deeplab_video.py, deeplab_image.py or allmodels_image.py

Scripts:

To make use of the tools provided inside scripts/ follow this guide:

First change all paths and variables inside config_tools.sh according to your system.
On first use, run source config_tools.sh and then, in the same terminal, run source build_tools.sh once to build the tools. This will take a while.
For all following uses, first run source config_tools.sh (needed for the exported variables); after that you can run the wanted scripts from the same terminal with source script.sh.
All scripts log the terminal output to test_results/

Setup:

Use the following setup for best and verified performance

Ubuntu 16.04
Python 2.7
Tensorflow 1.4 (this repo provides pre-built tf wheel files for jetson tx2)
OpenCV 3.3.1

Note: tensorflow v1.7.0 seems to have massive performance issues (try to use other versions)

Current max Performance on `ssd_mobilenet`:

Dell XPS 15 with i7 @ 2.80GHZ x8 and GeForce GTX 1050 4GB: 100 FPS
Nvidia Jetson Tx2 with Tegra 8GB: 30 FPS

Related Work:

objectdetection_ros: This Repository as ROS Package ready to use
test_models: A repo for models I am currently working on for benchmark tests
deeptraining_hands: A repo for setting up the ego- and oxford hands-datasets.
It also contains several scripts to convert various annotation formats to be able to train Networks on different deep learning frameworks
currently supports .xml, .mat, .csv, .record, .txt annotations
yolo_for_tf_od_api: A repo to be able to include Yolo V2 in tf's object detection api
realtime_segmenation: This repo was merged into v2.0
Mobile_Mask_RCNN: a Keras Model for training Mask R-CNN for mobile deployment
tf_training: Train Mobile Mask R-CNN Models on AWS Cloud
tf_models: My tensorflow/models fork which includes yolov2 and mask_rcnn_mobilenet_v1_coco
eetfm_automation: Export and Evaluation of TensorFlow Models Automation based on the Object Detection API


realtime_object_detection's Issues

is it possible to input a gray-style video?

My video is grayscale. How would I modify your code to use it with your API?

No bounding boxes with deeplab_video.py

HI,

when I run
$ python objectdetection_video.py,
a window pops up, which shows the video captured by my camera and it has bounding boxes around the objects. However, When I run
$ python deeplab_video.py,
I see the same window, but this time without the bounding boxes around the objects.
Is this the expected behavior of deeplab_video.py?
I used the default settings in the config file.

run_objectdetection can not be run successfully

Environment: Jetson TX2
OpenCV: 4.0.0
Camera: CSI camera
When I run run_objectdetection, it fails with:

('frame', None)

Traceback (most recent call last):
File "run_objectdetection.py", line 189, in <module>
detection(model, config)
File "run_objectdetection.py", line 84, in detection
gpu_feeds = {image_tensor: vs.expanded()}
File "/home/nvidia/Downloads/realtime_object_detection-master/rod/helper.py", line 220, in expanded
return np.expand_dims(cv2.cvtColor(self.frame, cv2.COLOR_BGR2RGB), axis=0)
cv2.error: OpenCV(4.0.0-pre) /home/nvidia/opencv/modules/imgproc/src/color.hpp:253: error: (-215:Assertion failed) VScn::contains(scn) && VDcn::contains(dcn) && VDepth::contains(depth) in function 'CvtHelper'

I find that the frame returned by "frame = vs.read()" is always None.
If you have any ideas, please tell me.
Thanks !
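
The traceback above is consistent with a None frame being passed to cv2.cvtColor. A guard like the following would make that failure explicit instead of crashing inside OpenCV. This is only a sketch: `read_valid_frame` and `FakeStream` are hypothetical names, and the guard does not fix the underlying camera or pipeline problem.

```python
import time


def read_valid_frame(stream, max_retries=10, delay=0.0):
    """Poll stream.read() until it returns a frame, or raise after max_retries."""
    for _ in range(max_retries):
        frame = stream.read()
        if frame is not None:
            return frame
        time.sleep(delay)  # give the capture thread time to deliver a frame
    raise RuntimeError("no frame received; check the camera source or pipeline")


class FakeStream:
    """Stand-in for a video stream that needs a few reads before delivering."""

    def __init__(self, frames):
        self._frames = list(frames)

    def read(self):
        return self._frames.pop(0) if self._frames else None


frame = read_valid_frame(FakeStream([None, None, "frame-0"]))
print(frame)
```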

Running run_objectdetection.py results in indentation error

Running run_objectdetection.py results in indentation error, should be an easy fix

 File "C:\Git\realtime_object_detection\rod\helper.py", line 438
    json.dump(self._timeline_dict, f)
                                    ^
TabError: inconsistent use of tabs and spaces in indentation

Jetson Env

Dear GustavZ,

I want to know the environment that you run on Jetson for this model. Are you using tensorRT and cuDNN? We tried to run tensorflow zoo model on tensorRT, but it seems not supported well.

Thanks for your reply in advance!

Michael

Node 'Pad_17': Unknown input node 'Relu_18' when call import_graph_def for pb file

When I want to convert pb file to tflite OR when I just want to load pb file, I get this error for both cases.
PS: pb file generated from pth (onnx) file.

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_frozen_graph(
    'saved_model.pb',
    input_arrays=['Pad'],    # input arrays
    output_arrays=['RELU'],  # output arrays; in my model's case it is add_10
)

import tensorflow as tf
from tensorflow.python.platform import gfile

with tf.Session() as sess:
    model_filename = 'saved_model.pb'
    with gfile.FastGFile(model_filename, 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        # print(graph_def)
        # a parsed GraphDef is imported with tf.import_graph_def
        g_in = tf.import_graph_def(graph_def, name='')

Does this repo support Faster-RCNN network with Jetson-TX2 in tensor-RT related stuffs?

I'm trying to implement object detection using Faster-RCNN with jetson-tx2 and tensor-RT. Please help.. @gustavz

Trying to use the GPU on my computer

I am currently trying to run this program on my custom-built computer. It has a GeForce GTX 1070 and an i7 on board. However, whenever I run the program, I don't see any change in GPU utilization. This is indicated by a simple nvidia-smi, which constantly gives me:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.24.02              Driver Version: 396.24.02                 |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:01:00.0  On |                  N/A |
| 40%   51C    P0    38W / 151W |   1649MiB /  8118MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1319      G   /usr/lib/xorg/Xorg                           989MiB |
|    0      3617      G   compiz                                       414MiB |
|    0      3799      G   ...are/jetbrains-toolbox/jetbrains-toolbox     3MiB |
|    0      4318      G   ...-token=D21D68637A4C960C2EA136F424DD9CBC   240MiB |

My configuration seems to indicate that I am using the split between the CPU and the GPU, so I am not too sure of what is happening here. My configuration lies below:

### Inference Config
VIDEO_INPUT: /home/dcs_user/barberry.mp4                      # Input Must be OpenCV readable
ROS_INPUT: /camera/color/image_raw  # ROS Image Topic
VISUALIZE: True                     # Disable for performance increase
VIS_FPS: True                       # Draw current FPS in the top left Image corner
CPU_ONLY: False                     # CPU Placement for speed test
USE_OPTIMIZED: False                # whether to use the optimized model (only possible if transform with script)
DISCO_MODE: False                   # Secret Disco Visualization Mode
DOWNLOAD_MODEL: False               # Only for Models available at the TF model_zoo


### Testing
IMAGE_PATH: 'test_images'           # path for test_*.py test_images
LIMIT_IMAGES: None                  # if set to None, all images are used
WRITE_TIMELINE: False                # write json timeline file (slows infrence)
SAVE_RESULT: False                  # save detection results to disk
RESULT_PATH: 'test_results'         # path to save detection results
SEQ_MODELS: []                      # List of Models to sequentially test (Default all Models)


### Object_Detection
WIDTH: 600                          # OpenCV Video stream width
HEIGHT: 600                         # OpenCV Video stream height
MAX_FRAMES: 5000                    # only used if visualize==False
FPS_INTERVAL: 5                     # Interval [s] to print fps of the last interval in console
PRINT_INTERVAL: 500                 # intervall [frames] to print detections to console
PRINT_TH: 0.5                       # detection threshold for det_intervall
## speed hack
SPLIT_MODEL: True                   # Splits Model into a GPU and CPU session (currently only works for ssd_mobilenets)
MULTI_THREADING: True               # Additional Split Model Speed up through multi threading
SSD_SHAPE: 300                      # used for the split model algorithm (currently only supports ssd networks trained on 300x300 and 600x600 input)
SPLIT_NODES: ['Postprocessor/convert_scores','Postprocessor/ExpandDims_1']
                                    # hardcoded split points for ssd_mobilenet_v1
## Tracking
USE_TRACKER: False                  # Use a Tracker (currently only works properly WITHOUT split_model)
TRACKER_FRAMES: 20                  # Number of tracked frames between detections
NUM_TRACKERS: 5                     # Max number of objects to track
## Model
OD_MODEL_NAME: 'ssd_mobilenet_v11_coco'
OD_MODEL_PATH: 'models/ssd_mobilenet_v11_coco/{}'
LABEL_PATH: 'rod/data/tf_coco_label_map.pbtxt'
NUM_CLASSES: 90


### DeepLab
ALPHA: 0.3                     # mask overlay factor (also for mask_rcnn)
BBOX: True                     # compute boundingbox in postprocessing
MINAREA: 500                   # min Pixel Area to apply bounding boxes (avoid noise)
## Model
DL_MODEL_NAME: 'deeplabv3_mnv2_pascal_train_aug_2018_01_29'
DL_MODEL_PATH: 'models/deeplabv3_mnv2_pascal_train_aug/{}'

Low FPS when with visualisation

The FPS that my Jetson TX2 outputs fluctuates a lot when I set ‘visualize’ to true in the config.yml file. The FPS drops significantly (down to 7-8 FPS from 33-35 FPS) when there are many objects present. I was wondering if anyone else experienced this and knew where the bottleneck was occurring.
I changed the object_detection.py script slightly, so that it can use a video as an Input, and the drop in frames does not occur when I set ‘visualize’ to false (in the config.yml).
I understand that visualising the output will reduce the FPS, but I was not expecting it to be so drastic. It would be greatly appreciated if someone could explain why the drop is so drastic or even how to reduce the drop in frames

License for repository

Could you put a LICENSE file to your repository?
For me it is always unclear what I am allowed to do, even after forking it and I don't feel well contributing to a repository without license.

Train SSD 600

HI, I want to train my custom dataset with ssd 600. Is it enough to set

 image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }

to 600 / 600 instead of 300 / 300 ?
Thank you very much.

ValueError: Node 'Preprocessor/map/TensorArray_2': Unknown input node 'Preprocessor/map/strided_slice'

@gustavz
Hi, I am new to tensorflow. I trained Tensorflow's object detection api with my own data using ssd_mobilenet_v2_coco model. When I use this 'realtime_object_detection' code, I get the error stated as follows:

File "run_objectdetection.py", line 178, in <module>
  config.NUM_CLASSES,config.SPLIT_MODEL, config.SSD_SHAPE).prepare_od_model()
File "D:\realtime_object_detection_2\rod\model.py", line 163, in prepare_od_model
  self.load_frozenmodel()
File "D:\realtime_object_detection_2\rod\model.py", line 135, in load_frozenmodel
  tf.import_graph_def(remove, name='')
File "C:\Users\bubblelab\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\util\deprecation.py", line 432, in new_func
  return func(*args, **kwargs)
File "C:\Users\bubblelab\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\importer.py", line 493, in import_graph_def
  raise ValueError(str(e))
ValueError: Node 'Preprocessor/map/TensorArray_2': Unknown input node 'Preprocessor/map/strided_slice'

The error comes from the model.py file line 130 when importing the remove graph:

keep = graph_pb2.GraphDef()
for n in nodes_to_keep_list:
    keep.node.extend([copy.deepcopy(name_to_node_map[n])])

remove = graph_pb2.GraphDef()
remove.node.extend([score_def])
remove.node.extend([expand_def])
for n in nodes_to_remove_list:
    remove.node.extend([copy.deepcopy(name_to_node_map[n])])

with tf.device('/gpu:0'):
    tf.import_graph_def(keep, name='')
with tf.device('/cpu:0'):
    tf.import_graph_def(remove, name='')

I checked that the node named "Preprocessor/map/strided_slice" is really in the frozen model:

name: "Preprocessor/map/strided_slice"
op: "StridedSlice"
input: "Preprocessor/map/Shape"
input: "Preprocessor/map/strided_slice/stack"
input: "Preprocessor/map/strided_slice/stack_1"
input: "Preprocessor/map/strided_slice/stack_2"
attr { key: "Index" value { type: DT_INT32 } }
attr { key: "T" value { type: DT_INT32 } }
attr { key: "begin_mask" value { i: 0 } }
attr { key: "ellipsis_mask" value { i: 0 } }
attr { key: "end_mask" value { i: 0 } }
attr { key: "new_axis_mask" value { i: 0 } }
attr { key: "shrink_axis_mask" value { i: 1 } }

I found that another node, 'Preprocessor/map/TensorArray_1', also has the node 'Preprocessor/map/strided_slice' as its input:

name: "Preprocessor/map/TensorArray_1"
op: "TensorArrayV3"
input: "Preprocessor/map/strided_slice"
attr { key: "clear_after_read" value { b: true } }
attr { key: "dtype" value { type: DT_FLOAT } }
attr { key: "dynamic_size" value { b: false } }
attr { key: "element_shape" value { shape { unknown_rank: true } } }
attr { key: "identical_element_shapes" value { b: true } }
attr { key: "tensor_array_name" value { s: "" } }

I can not fix this error. Any help would be appreciated!
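
One plausible reading of the error is that the node exists in the frozen model but not in the partial GraphDef being imported, so one of that part's nodes refers to an input that was placed in the other part. The sketch below shows how such dangling inputs could be listed. It works on a plain dict instead of a real GraphDef, and `missing_inputs` is a hypothetical helper, not part of the repo or of TensorFlow.

```python
def missing_inputs(nodes):
    """Map each node to those of its inputs that are absent from `nodes`.

    `nodes` maps node name -> list of input names as written in a GraphDef,
    where an input may carry a '^' prefix (control dependency) or a
    ':<index>' suffix (output slot).
    """

    def base_name(inp):
        name = inp.lstrip("^")
        head, sep, tail = name.rpartition(":")
        return head if sep and tail.isdigit() else name

    missing = {}
    for name, inputs in nodes.items():
        absent = [i for i in inputs if base_name(i) not in nodes]
        if absent:
            missing[name] = absent
    return missing


# Toy stand-in for the part imported on the CPU: TensorArray_2 is present,
# but the node it reads from is not.
cpu_part = {
    "Preprocessor/map/TensorArray_2": ["Preprocessor/map/strided_slice"],
    "Postprocessor/ExpandDims_1": [],
}
report = missing_inputs(cpu_part)
print(report)
```

Running such a check on both parts before calling tf.import_graph_def would show which nodes ended up on the wrong side of the split.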

How to achieve 8|10 FPS on TX2

Hey @gustavz ,

I have basically the exact same setup as you on the TX2 but my FPS is only half of what you have achieved. I was wondering how you were able to do so? Was the improvement mostly from adjusting the batch_non_max_suppression.score_threshold during export?

Jetson TX2 onboard camera

Hi GustavZ!

How did you configure config.yml file to open onboard camera on Jetson TX2?

Thanks!

Implement frame skipping

Implementing frame skipping would increase performance by a lot, as usually not every frame a second is needed for specific tasks. I tried to do it like this:

skip_frames = 2
process_frame = cur_frames % skip_frames == 0
if process_frame:
    pass  # Do the split_model part
else:
    pass  # Just display the image
This approach leads to a lot of problems, where the split_model part uses and displays older images. As I am not familiar with Python threading, multiprocessing and queues, probably someone with more experience could implement this or give hints for the right direction.
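
A single-threaded sketch of the idea: run detection only on every n-th frame and reuse the most recent detections for the frames in between, so the displayed image is always the current one. `run_with_frame_skipping` and `fake_detect` are hypothetical names, and this does not address the queue handling of the multi-threaded split model.

```python
def run_with_frame_skipping(frames, detect, skip_frames=2):
    """Yield (frame, detections), running detect only on every skip_frames-th frame."""
    last_detections = None
    for index, frame in enumerate(frames):
        if index % skip_frames == 0:
            last_detections = detect(frame)
        # Skipped frames are paired with the latest available detections.
        yield frame, last_detections


calls = []


def fake_detect(frame):
    """Stand-in for the detection step; records which frames were processed."""
    calls.append(frame)
    return "boxes-for-%s" % frame


results = list(run_with_frame_skipping(["f0", "f1", "f2", "f3"], fake_detect))
print(results)
```

The trade-off is that boxes drawn on skipped frames lag behind the image by up to skip_frames - 1 frames.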

Hi Gustav I have questions!

First, I was amazed at your work. It fits perfectly in my work.

I am working at JetsonTX2 & DrivePX2, and as you know, there is a speed issue.

I got information about the various works and github.

Q1. How can you achieve 30 fps with SSD mobilenet on the Jetson TX2?
As mentioned in (1), you manually assign the CNN-related nodes to the GPU and the remaining nodes to the CPU in TensorFlow? How?

Q2. Have you experimented with other frameworks?
I have experimented with openCV DNN (SSD-mobilenet), caffe (SSD-mobilenet), darknet (YOLO v2, v3) and tensorflow (SSD-mobilenet).

However, i got performance up to only 9 fps.

Do you think the above frameworks lacks the ability to optimize GPU / CPU allocation?

Thank you
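
Regarding Q1: the README describes the speed hack as splitting the model into a GPU and a CPU session at fixed split nodes. A toy sketch of how such a partition could be computed follows: everything the split nodes depend on goes to one part, the rest to the other. The graph here is a plain dict with made-up node names, and `split_graph` is a hypothetical helper, not the repo's implementation.

```python
def split_graph(nodes, split_nodes):
    """Partition node names into (gpu_part, cpu_part) at the given split nodes.

    `nodes` maps node name -> list of input node names. The GPU part holds the
    split nodes and everything they depend on; the CPU part holds the rest.
    """
    gpu_part = set(split_nodes)
    stack = list(split_nodes)
    while stack:
        # Walk backwards along the inputs of each collected node.
        for inp in nodes[stack.pop()]:
            if inp not in gpu_part:
                gpu_part.add(inp)
                stack.append(inp)
    cpu_part = set(nodes) - gpu_part
    return gpu_part, cpu_part


toy = {
    "image": [],
    "conv": ["image"],
    "scores": ["conv"],
    "boxes": ["conv"],
    "nms": ["scores", "boxes"],
}
gpu_part, cpu_part = split_graph(toy, ["scores", "boxes"])
print(sorted(gpu_part), sorted(cpu_part))
```

In a real graph the CPU part would additionally need placeholder inputs that stand in for the outputs of the split nodes.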

low fps on tx2 is only 0.5

Operating environment: TX2
System: L28.2
TensorFlow version: 1.8
Hi @gustavz,
First, thanks for your work.
I have two questions:
First: when I run your code on the TX2, the FPS is only 0.5 and I do not know why.
Second: CPU and GPU usage are both low.
If you have any ideas, please tell me.
Thanks

Optimize for inference

Hi @gustavz, thanks for the great work!

I saw the section about optimizing for inference in your howto_wiki.
I wonder if you tried to use tensorflow.python.tools.optimize_for_inference for the current SSD-Mobilenet model and if so what are the results?

Did you also try quantization and graph transform tools?

Thank you.

AssertionError: Postprocessor/convert_scores is not in graph

Hi GustavZ
I try this on my jetson-tx2 and it show this error.

/usr/local/lib/python3.5/dist-packages/matplotlib/backends/backend_gtk3agg.py:16: UserWarning: The Gtk3Agg backend is known to not work on Python 3.x with pycairo. Try installing cairocffi.
  "The Gtk3Agg backend is known to not work on Python 3.x with pycairo. "
Model found. Proceed.
Loading frozen model into memory
2018-03-22 12:57:30.670896: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:858] ARM has no NUMA node, hardcoding to return zero
2018-03-22 12:57:30.671162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.66GiB freeMemory: 1.76GiB
2018-03-22 12:57:30.671251: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
Traceback (most recent call last):
  File "/home/nvidia/realtime_object_detection-master/object_detection.py", line 301, in <module>
    main()
  File "/home/nvidia/realtime_object_detection-master/object_detection.py", line 295, in main
    graph, score, expand = load_frozenmodel()
  File "/home/nvidia/realtime_object_detection-master/object_detection.py", line 128, in load_frozenmodel
    assert d in name_to_node_map, "%s is not in graph" % d
AssertionError: Postprocessor/convert_scores is not in graph

What should I do?

Is it possible to retrain your model?

Hi,

I was wondering if there is any method that would let us retrain this model using Pascal VOC annotation files and images?

jetson TX2 camera

Hi,
I set up my TX2 with
Ubuntu 16.04
Python 2.7
Tensorflow 1.7
OpenCV 3.3.1
but when I run object_detection.py, the following issue appears:

vidia@tegra-ubuntu:~/tf/realtime_object_detection$ python object_detection.py

Model found. Proceed.
Loading frozen model into memory
2018-03-30 07:11:08.517989: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:865] ARM64 does not support NUMA - returning NUMA node zero
2018-03-30 07:11:08.518123: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.66GiB freeMemory: 2.29GiB
2018-03-30 07:11:08.518172: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-03-30 07:11:10.014011: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-03-30 07:11:10.014083: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0
2018-03-30 07:11:10.014109: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N
2018-03-30 07:11:10.014269: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1694 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
Loading label map
Building Graph
2018-03-30 07:11:33.197609: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-03-30 07:11:33.197707: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-03-30 07:11:33.197737: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0
2018-03-30 07:11:33.197758: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N
2018-03-30 07:11:33.197847: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1694 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
start!!!
2018-03-30 07:11:33.199546: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-03-30 07:11:33.199636: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-03-30 07:11:33.199671: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0
2018-03-30 07:11:33.199713: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N
2018-03-30 07:11:33.199817: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1694 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2018-03-30 07:11:33.200175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-03-30 07:11:33.200259: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-03-30 07:11:33.200292: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0
2018-03-30 07:11:33.200315: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N
2018-03-30 07:11:33.200394: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1694 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
VIDEOIO ERROR: V4L2: Pixel format of incoming image is unsupported by OpenCV
OpenCV Error: Unspecified error (GStreamer: unable to start pipeline
) in cvCaptureFromCAM_GStreamer, file /home/nvidia/opencv/modules/videoio/src/cap_gstreamer.cpp, line 887
VIDEOIO(cvCreateCapture_GStreamer(CV_CAP_GSTREAMER_V4L2, reinterpret_cast<char *>(index))): raised OpenCV exception:

/home/nvidia/opencv/modules/videoio/src/cap_gstreamer.cpp:887: error: (-2) GStreamer: unable to start pipeline
in function cvCaptureFromCAM_GStreamer

Start video stream with shape: 0,0
Press 'q' to Exit
Starting Detection
OpenCV Error: Assertion failed (scn == 3 || scn == 4) in cvtColor, file /home/nvidia/opencv/modules/imgproc/src/color.cpp, line 11016
Traceback (most recent call last):
File "object_detection.py", line 302, in <module>
main()
File "object_detection.py", line 298, in main
detection(graph, category, score, expand)
File "object_detection.py", line 213, in detection
image_expanded = np.expand_dims(cv2.cvtColor(image, cv2.COLOR_BGR2RGB), axis=0)
cv2.error: /home/nvidia/opencv/modules/imgproc/src/color.cpp:11016: error: (-215) scn == 3 || scn == 4 in function cvtColor

Can you help me? Should I change a parameter in config.yml?

What is the difference between your model and other mobilenet models?

Hi, I ran your project using mobilenet v1, but the speed is only half of what your model achieves. When I tried mobilenet v2, I got an error. So I want to ask: what is the difference between your model and other mobilenet models?

The second problem:
when I used your .config file to train on Pascal VOC data, I got a model file after training and ran object detection with your code, but got this error:
ValueError: graph_def is invalid at node u'Preprocessor/map/TensorArray_2': Input tensor 'Preprocessor/map/strided_slice:0' not found in graph_def.
Any suggestions? Thank you!

Amazing Performance possible on iOS / Android?

I can't believe how fast this runs on my MacBook pro in CPU mode!!?
Why is this not the default method of TensorFlow?

Are you able to explain how you were able to know which nodes to process on CPU and which on GPU for max performance?

Also is it possible to implement this in C++ for iOS / Android?

How can I fix it?

When I run "objectdetection_video.py", it shows "No module named 'tf_utils'".

Over 20 FPS on TX2 with thread

Hi, GustavZ

Nice work!
Splitting the model into a detection part and an NMS part works very well.
I changed the CPU part to run in a thread. It now reaches over 20 FPS on the Jetson TX2.

Thank you.
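
The approach described in this issue, running the CPU part in its own thread, can be sketched as a two-stage pipeline connected by a queue. The stage functions here are placeholders (simple arithmetic), and `run_pipeline` is a hypothetical name, not the code from the issue or the repo.

```python
import queue
import threading


def run_pipeline(frames, gpu_stage, cpu_stage):
    """Run gpu_stage in the calling thread and cpu_stage in a worker thread."""
    handoff = queue.Queue(maxsize=2)  # small buffer keeps latency bounded
    results = []
    sentinel = object()  # marks the end of the stream

    def cpu_worker():
        while True:
            item = handoff.get()
            if item is sentinel:
                break
            results.append(cpu_stage(item))

    worker = threading.Thread(target=cpu_worker)
    worker.start()
    for frame in frames:
        # While the worker post-processes frame n, this thread can already
        # start on frame n + 1.
        handoff.put(gpu_stage(frame))
    handoff.put(sentinel)
    worker.join()
    return results


out = run_pipeline([1, 2, 3], lambda x: x * 10, lambda y: y + 1)
print(out)
```

With a single worker and a FIFO queue, results keep the input order. Any speedup depends on both stages releasing the GIL while they work, as calls into native code typically do.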

Query on Performance of r1.0 version of the code

Hi,
I ported the r1.0 version of the code to the Jetson TX2 and got 30 fps at a resolution of 320x240. However, I observe that the frame rate depends on the number of objects in the scene. With one object in the scene it reaches 30 fps, whereas with 5 to 6 objects it drops to 20 to 25 fps, sometimes even 15 fps. Is my observation correct, i.e. does the code's performance depend on the number of objects too? Also, the test images in the folder each contain a single object. Can you share the performance for multiple objects too?

Thanks in advance.
Niran

Half FPS using tensorflow's official .pb file

There is something really weird happening to the execution times when using the official weights. If I use the frozen_graph.pb of this repo I get around 120 fps on my laptop (a Predator Helios 300). But if I use the official frozen_graph that you can download from this link, I get only 60 FPS. That was using the split version, but the same behavior occurs without splitting the graph...

What kind of sorcery did you do to your weights? 0:

To reproduce this just download the weights from the link and replace them in the folder models

PS: This pull request fixes the URL for the download of the mobilenet's weights.

Any plan to upgrade object detection model to mobilenetV2-ssdlite?

Great work! I can easily reproduce FPS 25 object detection on TX2.
Wondering if any plan to upgrade object detection model to mobilenetV2-ssdlite? should be another performance lift with new model. Thanks.

ValueError: graph_def is invalid

Hi , I trained my own model using original tensorflow object detection,
but I get this error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/importer.py", line 633, in import_graph_def
    source_op = name_to_op[operation_name]
KeyError: 'Preprocessor/map/strided_slice'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "object_detection.py", line 301, in <module>
    main()
  File "object_detection.py", line 295, in main
    graph, score, expand = load_frozenmodel()
  File "object_detection.py", line 161, in load_frozenmodel
    tf.import_graph_def(remove, name='')
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 316, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/importer.py", line 640, in import_graph_def
    % (input_name,)))
ValueError: graph_def is invalid at node 'Preprocessor/map/TensorArray_2': Input tensor 'Preprocessor/map/strided_slice:0' not found in graph_def..

Is there something special to do when training with my own dataset?
Thank you, your code is great!
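
The `ValueError` above says that node `Preprocessor/map/TensorArray_2` still lists `Preprocessor/map/strided_slice:0` as an input, while no node with that name exists in the graph definition being imported. A minimal sketch of that consistency check in plain Python follows. The dict-based graph representation and the `find_dangling_inputs` helper are hypothetical and only illustrate the idea; they are not part of TensorFlow or of this repo.

```python
def find_dangling_inputs(nodes):
    """Return {node_name: [missing input tensors]} for a toy graph.

    `nodes` maps each node name to the list of input tensor names it
    consumes (hypothetical representation, not a real GraphDef).
    """
    missing = {}
    for name, inputs in nodes.items():
        for tensor in inputs:
            # Strip the control-dependency marker '^' and the ':<index>' suffix.
            op_name = tensor.lstrip("^").split(":")[0]
            if op_name not in nodes:
                missing.setdefault(name, []).append(tensor)
    return missing


# Toy graph in which 'Preprocessor/map/strided_slice' has been dropped
# while a node that consumes its output was kept.
toy_graph = {
    "image_tensor": [],
    "Preprocessor/map/TensorArray_2": ["Preprocessor/map/strided_slice:0"],
}

dangling = find_dangling_inputs(toy_graph)
for node, tensors in dangling.items():
    print(node, "is missing inputs:", tensors)
```

Running the same kind of check on the node names of a custom-trained model might show whether the nodes the script expects are actually present under those names.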

Problem running objectdetection_image.py with MULTI_THREADING: True

The results look like this! I think it is a multi-threading problem. How should I solve it?

How to convert the Mask RCNN model to tflite model

Have you successfully converted a Mask R-CNN model to .tflite?
When I run the script as shown in scripts/create_tflite_graph.sh, I get the following error:

tensorflow/lite/toco/tooling_util.cc:908] Check failed: GetOpWithOutput(model, output_array) Specified output array "num_detections,detection_boxes,detection_scores,detection_classes,detection_masks" is not produced by any op in this graph. Is it a typo? To silence this message, pass this flag:  allow_nonexistent_arrays

Can I change the environment? I need Python 3.6 and TF 1.8

High inference time using r1.0 and master

Hi @gustavz
The model ran successfully on the Jetson TX2, but inference was quite slow. I tried both the r1.0 branch and the master branch; the inference times were:
For master:
18.15, 2.39, 2.62, 2.53 seconds
While for r1.0:
22.34, 0.27, 0.17, 0.13 seconds
for 4 images respectively.
Visualization was switched off.
Is there anything I'm missing that makes it this slow?

Thanks
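
The first image in both runs is far slower than the rest, which would be consistent with one-time start-up cost (graph and GPU initialization) being included in the first call. That interpretation is an assumption, not something the report confirms. A minimal sketch that separates the first run from the later runs, using the timings quoted above (the `summarize` helper is hypothetical):

```python
from statistics import mean


def summarize(times_s):
    """Split per-image inference times into first run and steady state."""
    first, rest = times_s[0], times_s[1:]
    steady = mean(rest)
    return {"first_s": first, "steady_mean_s": steady, "steady_fps": 1.0 / steady}


# Timings in seconds as reported in the issue, four images per branch.
timings = {
    "master": [18.15, 2.39, 2.62, 2.53],
    "r1.0": [22.34, 0.27, 0.17, 0.13],
}

for branch, times in timings.items():
    s = summarize(times)
    print("%s: first=%.2fs steady=%.2fs (%.2f FPS)"
          % (branch, s["first_s"], s["steady_mean_s"], s["steady_fps"]))
```

Comparing branches on the steady-state numbers rather than on the first image gives a fairer picture of per-frame inference speed.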

RuntimeError: Attempted to use a closed Session.

I'm trying to disable the speed hack by setting SPLIT_MODEL to False, but I keep getting an error "RuntimeError: Attempted to use a closed Session.":

> Press 'q' to Exit
Traceback (most recent call last):
  File "objectdetection_image.py", line 20, in <module>
    main()
  File "objectdetection_image.py", line 17, in main
    model.run()
  File "/Volumes/Data/Projects/realtime_object_detection/src/realtime_object_detection/rod/model.py", line 235, in run
    self.detect()
  File "/Volumes/Data/Projects/realtime_object_detection/src/realtime_object_detection/rod/model.py", line 521, in detect
    self.run_default_sess()
  File "/Volumes/Data/Projects/realtime_object_detection/src/realtime_object_detection/rod/model.py", line 430, in run_default_sess
    options=self._run_options, run_metadata=self._run_metadata)
  File "/Volumes/Data/Projects/realtime_object_detection/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/Volumes/Data/Projects/realtime_object_detection/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1075, in _run
    raise RuntimeError('Attempted to use a closed Session.')
RuntimeError: Attempted to use a closed Session.

Is there a fix to this?

Thanks

Always got 'None' using g = gpu_worker.get_result_queue().

@gustavz Hey, when I run object_detection.py on the TX2, I always get 'None' from g = gpu_worker.get_result_queue(), and no detection results come out. What is the reason for this problem?
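
One way a call like this can return `None` is when a worker's result queue is polled without blocking before any result has been produced. Whether `get_result_queue()` in this repo behaves that way is an assumption. The `ToyWorker` class below is a hypothetical stand-in, not the project's worker; the sketch only shows the polling pattern and what an empty poll looks like.

```python
import queue
import threading


class ToyWorker:
    """Hypothetical worker: squares inputs on a background thread."""

    def __init__(self):
        self.inputs = queue.Queue()
        self.results = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            item = self.inputs.get()
            if item is None:  # sentinel: stop the worker
                break
            self.results.put(item * item)

    def put(self, item):
        self.inputs.put(item)

    def get_result(self):
        """Non-blocking poll: returns None if nothing is ready yet."""
        try:
            return self.results.get(block=False)
        except queue.Empty:
            return None

    def stop(self):
        self.inputs.put(None)
        self.thread.join()


worker = ToyWorker()
empty_poll = worker.get_result()  # nothing submitted yet, so this is None
worker.put(3)
worker.stop()                     # join guarantees the result was produced
result = worker.get_result()
print(empty_poll, result)
```

If polls stay empty forever in a setup like this, the worker is either never fed input or never finishes a run, so checking both sides of the queue is a reasonable first step.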

Does not work with Tensorflow 1.8

I get this error when running source build_tools.py

I know you said your code is supported by Tensorflow 1.4 but I thought it would be good for you to know.

Internal: CUDA runtime implicit initialization on GPU:0 failed. Status: unknown error

Hello GustavZ,
I ran into some problems running your code on the Jetson TX2.
At first there were no problems at all, but after a few days I keep receiving this error.
Full terminal log:

Model found. Proceed.
Loading frozen model into memory
2018-02-14 12:34:03.208044: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:881] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2018-02-14 12:34:03.208210: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties: 
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.66GiB freeMemory: 4.71GiB
2018-02-14 12:34:03.208272: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2018-02-14 12:34:04.632828: I tensorflow/core/common_runtime/gpu/gpu_device.cc:859] Could not identify NUMA node of /job:localhost/replica:0/task:0/device:GPU:0, defaulting to 0.  Your kernel may not have been built with NUMA support.
Loading label map
Starting detection
2018-02-14 12:34:28.283994: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2018-02-14 12:34:28.284142: E tensorflow/core/common_runtime/direct_session.cc:168] Internal: CUDA runtime implicit initialization on GPU:0 failed. Status: unknown error
Traceback (most recent call last):
  File "object_detection.py", line 249, in <module>
    main()
  File "object_detection.py", line 245, in main
    detection(graph, category, score, expand)
  File "object_detection.py", line 170, in detection
    with tf.Session(graph=detection_graph,config=config) as sess:
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1509, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 628, in __init__
    self._session = tf_session.TF_NewDeprecatedSession(opts, status)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.

If I first run the program without splitting the model and afterwards run it again with the split turned on, it all works fine! But after I reboot, the problem arises again...
Do you have any idea?

Real-time segmentation

Hey @gustavz, as mentioned in the previous threads, I am looking into efficient segmentation networks that can be run on the TX2. Would you mind looping me in to the conversation regarding the mask implementation? The rest of the team at my university and I are all very interested in this subject and would like to figure out the best approach for doing real-time segmentation.

Error when run with video

Hi, I tried your project on a video and I got an error. The video was captured successfully.

The error is:
OpenCV Error: Assertion failed (scn == 3 || scn == 4) in cvtColor, file /home/nghia/opencv/modules/imgproc/src/color.cpp, line 10606
Traceback (most recent call last):
  File "object_detection.py", line 304, in <module>
    main()
  File "object_detection.py", line 300, in main
    detection(graph, category, score, expand)
  File "object_detection.py", line 216, in detection
    image_expanded = np.expand_dims(cv2.cvtColor(image, cv2.COLOR_BGR2RGB), axis=0)
cv2.error: /home/nghia/opencv/modules/imgproc/src/color.cpp:10606: error: (-215) scn == 3 || scn == 4 in function cvtColor

I also get this kind of error

2018-03-09 11:00:12.881476: E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2018-03-09 11:00:12.881514: E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2018-03-09 11:00:12.881528: F tensorflow/core/kernels/conv_ops.cc:667] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)
pip install -U numpy
Aborted (core dumped)

I used Python3.5
Tensorflow 1.4.1
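
The `scn == 3 || scn == 4` assertion in `cvtColor` means the input did not have 3 or 4 channels, which commonly happens when a frame could not be read (empty or `None`) or is grayscale. A minimal sketch of a guard to run before the color conversion follows. `is_color_frame` is a hypothetical helper, and the stand-in `FakeFrame` class only mimics the `ndim` and `shape` attributes of a NumPy array so the example runs without OpenCV.

```python
class FakeFrame:
    """Stand-in for a NumPy image array (hypothetical, for illustration)."""

    def __init__(self, shape):
        self.shape = shape
        self.ndim = len(shape)


def is_color_frame(frame):
    """True if `frame` looks like an H x W x C image with 3 or 4 channels."""
    if frame is None:
        return False
    if getattr(frame, "ndim", 0) != 3:
        return False
    height, width, channels = frame.shape
    return height > 0 and width > 0 and channels in (3, 4)


print(is_color_frame(None))                     # failed read
print(is_color_frame(FakeFrame((480, 640))))    # grayscale, no channel axis
print(is_color_frame(FakeFrame((480, 640, 3)))) # regular BGR frame
```

In a capture loop, a frame that fails this check could be skipped or logged instead of being passed to `cv2.cvtColor`.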