Comments (30)
I was able to convert "faster_rcnn_resnet101_coco"; however, to use it you need to modify the config file so that the model takes fixed-size input images. Modify lines 4-8:
keep_aspect_ratio_resizer { min_dimension: 600 max_dimension: 1024 }
==> fixed_shape_resizer { height: 600 width: 1024 }
Use any dimensions you like.
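The resizer change described above can also be scripted instead of edited by hand. This is only a minimal sketch that treats the pipeline config as plain text; the helper name `use_fixed_shape_resizer` and the sample config are made up for illustration, and it assumes the resizer block contains no nested braces.

```python
import re


def use_fixed_shape_resizer(config_text, height, width):
    """Replace a keep_aspect_ratio_resizer block with a fixed_shape_resizer block.

    Hypothetical helper: works on the raw text of the config and assumes the
    resizer block has no nested braces.
    """
    pattern = re.compile(r"keep_aspect_ratio_resizer\s*\{[^{}]*\}")
    replacement = "fixed_shape_resizer {{ height: {} width: {} }}".format(height, width)
    new_text, count = pattern.subn(replacement, config_text)
    if count == 0:
        raise ValueError("no keep_aspect_ratio_resizer block found")
    return new_text


# Made-up config fragment, shaped like the lines quoted above.
sample_config = """image_resizer {
  keep_aspect_ratio_resizer {
    min_dimension: 600
    max_dimension: 1024
  }
}"""

patched_config = use_fixed_shape_resizer(sample_config, height=600, width=1024)
print(patched_config)
```

The height and width are taken from the comment above; any other fixed size could be passed in the same way.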
from tf_trt_models.
It's possible that it would work, but we haven't tested it. Currently, the build_detection_graph method that we provide in this repository is tested to work only against the listed models.
That said, it is possible that for similar meta-architectures (SSD), configurations with different feature extractors would work. A list of feature extractors registered with the tensorflow/models repository is listed here.
You would need to update the object detection configuration proto to select the desired feature extractor.
Theoretically, the TensorRT integration in TensorFlow should support any model, as the operations that are not supported by TensorRT are run in native TensorFlow. That said, there may be caveats.
Please let me know if you run into issues.
from tf_trt_models.
Thank you for the answer. I will report here after I try faster-rcnn with different backbones.
from tf_trt_models.
@jaybdub-nv , @bezero , When I tried to convert "faster_rcnn_resnet50_coco" with TF-TRT on TX2, I met a few other issues. I wonder how you got around them. Any help/suggestion is highly appreciated.
- TX2 ran out of memory, especially when I tried to load an image and do tf_sess.run(...). And the program just got killed.
- Same issue as #11:
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tensorrt/python/trt_convert.py", line 115, in create_inference_graph
int(msg[0]))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Invalid graph: Frame ids for node BatchMultiClassNonMaxSuppression/map/while/Reshape_1 does not match frame ids for it's fanout.
- The following error, which seems to be solved by bezero's fix as shown above.
<log time omitted>: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: ../builder/Network.cpp::addInput::377, condition: isValidDims(dims)
- The following error, which I think is because the 2nd stage classifier needs to handle input tensor of larger batch size (300).
<log time omitted>: F tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:82] input tensor batch larger than max_batch_size: 1
from tf_trt_models.
@jkjung-avt I also had memory issues. I solved them by closing my browser, since it uses memory (if possible, close all idle applications that are using memory). If I am not wrong, the scripts in this repo work with max_batch_size=1, so try to work with single images. For batch sizes > 1, TX2 memory might not be sufficient.
from tf_trt_models.
@bezero Thanks for the reply. But closing the web browser and all other applications on the TX2 did not solve the OOM issue for me. I also used single-image input for faster_rcnn_resnet50. I had to reduce the number of proposals/detections in the model config to some very small numbers to get around that...
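For anyone trying the same workaround, here is a rough sketch of lowering the proposal and detection counts in the config text. The field names `first_stage_max_proposals`, `max_detections_per_class` and `max_total_detections` are the ones the Object Detection API uses for Faster R-CNN configs; the helper `set_int_field`, the sample fragment and the value 32 are illustrative assumptions, not the numbers actually used in this thread.

```python
import re


def set_int_field(config_text, field, value):
    """Set every `field: <integer>` occurrence in the config text to `value`.

    Hypothetical helper operating on the raw text of the pipeline config.
    """
    pattern = re.compile(r"(\b{}\s*:\s*)\d+".format(re.escape(field)))
    new_text, count = pattern.subn(lambda m: m.group(1) + str(value), config_text)
    if count == 0:
        raise KeyError(field)
    return new_text


# Made-up fragment with typical default values.
sample_config = """first_stage_max_proposals: 300
second_stage_post_processing {
  batch_non_max_suppression {
    max_detections_per_class: 100
    max_total_detections: 300
  }
}"""

reduced_config = sample_config
for field_name in ("first_stage_max_proposals",
                   "max_detections_per_class",
                   "max_total_detections"):
    # 32 is an arbitrary small value chosen for the example.
    reduced_config = set_int_field(reduced_config, field_name, 32)

print(reduced_config)
```

Smaller counts mean fewer boxes go through the second stage, which lowers memory use and latency, presumably at some cost in recall.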
from tf_trt_models.
I am able to run faster_rcnn_resnet50_coco, which is included in the list of supported models, but I don't seem to be getting any speedup, which makes me skeptical that any subgraphs are being optimized at all.
In order to get it to run, I used the following command to build the graph (along with the other code included in the Jupyter example):
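import tensorflow.contrib.tensorrt as trt  # TF 1.x contrib module, as in the Jupyter example

# frozen_graph and output_names come from build_detection_graph(...) in the notebook.
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,  # 32 MiB
    precision_mode='FP16',
    minimum_segment_size=3,  # only convert segments with at least 3 nodes
    maximum_cached_engines=3
)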
I am wondering if anyone has had success in speeding up any form of Faster R-CNN, and if so, could you share some insight into what settings need to be adjusted or how to go about getting the graph conversions to work correctly?
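One way to check whether any subgraphs were converted is to count the TRTEngineOp nodes in the returned graph. The sketch below is an assumption about how one might do this, not something from the repository: it works on any iterable of objects with an `.op` attribute (the nodes of a GraphDef have one), and the stand-in nodes exist only so the example runs without TensorFlow.

```python
from collections import Counter
from types import SimpleNamespace


def count_ops(nodes):
    """Count nodes by op type for any iterable of objects with an `.op` attribute."""
    return Counter(node.op for node in nodes)


def trt_engine_count(nodes):
    """Return (number of TRTEngineOp nodes, total number of nodes)."""
    counts = count_ops(nodes)
    return counts.get("TRTEngineOp", 0), sum(counts.values())


# Stand-in nodes for illustration; with a real graph one would pass trt_graph.node.
demo_nodes = [
    SimpleNamespace(op="Placeholder"),
    SimpleNamespace(op="TRTEngineOp"),
    SimpleNamespace(op="Where"),
    SimpleNamespace(op="TRTEngineOp"),
]

engines, total = trt_engine_count(demo_nodes)
print("TRTEngineOp nodes: {} of {}".format(engines, total))
```

If the count is zero, or the engines only cover tiny segments, little of the model is running in TensorRT and no speedup should be expected.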
from tf_trt_models.
I shared my test results on Jetson TX2 developer forum before: https://devtalk.nvidia.com/default/topic/1037019/jetson-tx2/tensorflow-object-detection-and-image-classification-accelerated-for-nvidia-jetson/post/5288250/#5288250
Note that I had to reduce the number of region proposals in the Faster RCNN models; otherwise they run too slowly. All the code I used for testing can be found in my GitHub repository: https://github.com/jkjung-avt/tf_trt_models
from tf_trt_models.
I am facing the following error while trying to get the FasterRCNN model running on TensorRT. I have tried changing the resizer as per @bezero's comment 6, but it still doesn't help. Any pointers would be highly appreciated.
.cc:724] Can't determine the device, constructing an allocator at device 0
2018-12-13 09:49:37.182205: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: Network.cpp::addInput::281, condition: isIndexedCHW(dims) && volume(dims) < MAX_TENSOR_SIZE
2018-12-13 09:49:37.182317: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:857] Engine creation for segment 0, composed of 3 nodes failed: Invalid argument: Failed to create Input layer tensor InputPH_0 rank=-2. Skipping...
2018-12-13 09:49:37.182353: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:724] Can't determine the device, constructing an allocator at device 0
from tf_trt_models.
@bezero @jkjung-avt Did you run your faster rcnn models in the jupyter notebook? The notebook code works fine for me for the ssd models, but if I try the faster rcnn models I'm getting "Engine buffer is full. buffer limit=1, current entries=1, requested batch=100". I'm using the exact notebook code with three modifications:
1: MODEL = 'faster_rcnn_resnet50_coco'
2: removed score_threshold=0.3 from build_detection_graph(...
3: changed to fixed_shape_resizer { height: 600 width: 1024 } in the config file
Can either of you reproduce this issue? I'm using Tensorflow 1.12 and TensorRT 5.0
from tf_trt_models.
I haven't managed to get the faster_rcnn_resnet50 model to work with tensorflow 1.12.0 and TensorRT. Previously I got it to work using tensorflow 1.8.0, with some tweaks. Details are all in my GitHub repository: https://github.com/jkjung-avt/tf_trt_models/blob/master/data/faster_rcnn_resnet50_egohands.config
from tf_trt_models.
Hi @jkjung-avt, I used your config file: https://github.com/jkjung-avt/tf_trt_models/blob/master/data/faster_rcnn_inception_v2_egohands.config to train a model on my dataset (13 classes) and then tried to convert it to TRT, but I still got the error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Invalid graph: Frame ids for node BatchMultiClassNonMaxSuppression/map/while/Reshape_1 does not match frame ids for it's fanout.
How did you get rid of this error?
Another issue I'm facing is that my trained FRCNN-inception-v2 checkpoint file (103.5MB) is about twice the size of the fine-tuned checkpoint file (53.3MB). Do you have any idea about this?
Thanks in advance.
from tf_trt_models.
@CharlieXie, try setting 'remove_assert' to False. I recall that's how I got rid of the problem previously.
https://github.com/NVIDIA-AI-IOT/tf_trt_models/blob/master/tf_trt_models/detection.py#L108
from tf_trt_models.
@jkjung-avt, I am using your tf_trt_models. When I run python3 camera_tf_trt.py --image --filename=xxx, --model=faster_rcnn_resnet50_coco --build, I get an error like:
I do not know how to deal with it. Can you help me?
And when I run python3 camera_tf_trt.py --image --filename=xxx, --model=faster_rcnn_resnet50_coco without --build, there is no error, but the detection result is not ideal, like:
from tf_trt_models.
@xiaowenhe The segmentation fault could be caused by an "out of memory" issue. You could use 'tegrastats' to monitor Jetson TX2 memory usage and try to confirm whether that's the case.
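If you log the tegrastats output while the script runs, the RAM figure can be pulled out with a few lines of Python. This is a sketch under the assumption that each line contains a field of the form "RAM used/totalMB"; the sample line is made up, and the exact format may differ between JetPack releases.

```python
import re

# Assumed tegrastats field format: "RAM <used>/<total>MB".
RAM_PATTERN = re.compile(r"RAM (\d+)/(\d+)MB")


def parse_ram(line):
    """Return (used_mb, total_mb) from one tegrastats line, or None if absent."""
    match = RAM_PATTERN.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Made-up sample line for illustration only.
sample_line = "RAM 6912/7846MB (lfb 2x1MB) CPU [12%@2035,0%@2035]"

used_mb, total_mb = parse_ram(sample_line)
print("RAM in use: {:.0%}".format(used_mb / total_mb))
```

If usage climbs close to the total just before the crash, running out of memory is the likely cause.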
As to the bad detection result by TF-TRT optimized faster_rcnn_resnet50_coco model, I'm not exactly sure what the problem is. There could be many causes, e.g.
- mismatching tensorflow versions between training and inferencing,
- TF-TRT does not optimize certain operations in the model correctly,
- ...
from tf_trt_models.
@jkjung-avt, thank you! But I am not using a TX2. I want to test it on another machine first and then move to the TX2. The GPU looks like:
From the picture, only 5285M is used!
from tf_trt_models.
I cannot improve performance by using the TensorRT-optimized graph. Can someone tell me why? After optimizing the frozen graph, I get a bigger model.
from tf_trt_models.
@hoangtuanvu What do you mean by not being able to optimize? TensorRT optimizes your frozen model for inference, which does not mean that you get a smaller model. Did you compare inference time before and after TensorRT optimization?
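A simple way to make that comparison is to time the same call on both graphs with a few warm-up runs first, since the logs in this thread show TensorRT engines being built at run time. The sketch below is generic: `average_latency` is a hypothetical helper, and the lambda stands in for something like a tf_sess.run(...) call on one image.

```python
import time


def average_latency(run_once, warmup=5, runs=20):
    """Return the mean wall-clock seconds per call of `run_once`.

    The warm-up calls are not timed, so one-off costs such as engine
    building do not distort the average.
    """
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(runs):
        run_once()
    return (time.perf_counter() - start) / runs


# Stand-in workload; replace with the inference call for each graph.
latency = average_latency(lambda: sum(range(1000)))
print("average latency: {:.6f} s".format(latency))
```

Running this once with the original frozen graph and once with the converted graph gives two numbers that can be compared directly.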
from tf_trt_models.
@bezero I used TensorRT to optimize the frozen graph, but I did not get better speed for inference. I am currently working on person detection.
from tf_trt_models.
I'm having a similar situation to @atyshka - no improvement whatsoever. Only difference after generating an 'optimized' graph is that with every frame I'm getting a warning "Engine buffer is full". Has anyone figured out how to deal with this?
Xavier TF1.12+TRT5
from tf_trt_models.
I ran the detection demo using ssd_mobilenet_v1_coco.pb. With FP16 in trt.create_inference_graph(), the benchmark is about 0.041013 seconds; with INT8, it is about 0.383557 seconds. Why is INT8 slower than FP16?
from tf_trt_models.
@hoangtuanvu I am facing an issue while running inference on a TensorFlow object detection model (242MB). I have TF 1.13 and TensorRT 5.1.2. Below are the log details.
2019-06-03 15:05:21.432164: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:1030] TensorRT node resnet_v1_101/conv1/TRTEngineOp_123 added for segment 123 consisting of 2 nodes succeeded.
2019-06-03 15:05:21.432437: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:1030] TensorRT node rpn_proposals/softmax/TRTEngineOp_124 added for segment 124 consisting of 3 nodes succeeded.
2019-06-03 15:05:22.384389: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:616] Optimization results for grappler item: tf_graph
2019-06-03 15:05:22.384595: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:618] constant folding: Graph size after: 2014 nodes (-599), 2353 edges (-637), time = 4514.9751ms.
2019-06-03 15:05:22.384653: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:618] layout: Graph size after: 2063 nodes (49), 2422 edges (69), time = 462.632ms.
2019-06-03 15:05:22.384702: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:618] constant folding: Graph size after: 2059 nodes (-4), 2422 edges (0), time = 908.786ms.
2019-06-03 15:05:22.384748: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:618] TensorRTOptimizer: Graph size after: 1653 nodes (-406), 2000 edges (-422), time = 57351.3477ms.
time(s) (trt_conversion): 72.7292
graph_size(MB)(native_tf): 230.8
graph_size(MB)(trt): 493.0
num_nodes(trt_only): 125
2019-06-03 15:05:49.531006: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:496] Building a new TensorRT engine for TRTEngineOp_0 with batch size 720
2019-06-03 15:05:49.543881: W tensorflow/contrib/tensorrt/log/trt_logger.cc:34] DefaultLogger Tensor DataType is determined at build time for tensors not marked as input or output.
2019-06-03 15:05:55.369363: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:496] Building a new TensorRT engine for resnet_v1_101/TRTEngineOp_23 with batch size 1
2019-06-03 15:05:55.837386: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:496] Building a new TensorRT engine for resnet_v1_101/conv1/TRTEngineOp_123 with batch size 1
2019-06-03 15:05:57.403776: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:496] Building a new TensorRT engine for resnet_v1_101/TRTEngineOp_24 with batch size 1
2019-06-03 15:06:10.529445: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:496] Building a new TensorRT engine for resnet_v1_101/block1/unit_1/bottleneck_v1/TRTEngineOp_25 with batch size 1
2019-06-03 15:06:13.628441: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:496] Building a new TensorRT engine for resnet_v1_101/block1/unit_1/bottleneck_v1/TRTEngineOp_26 with batch size 1
2019-06-03 15:06:20.675574: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:496] Building a new TensorRT engine for resnet_v1_101/block1/TRTEngineOp_27 with batch size 1
2019-06-03 15:06:25.591558: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:496] Building a new TensorRT engine for resnet_v1_101/block1/unit_2/bottleneck_v1/TRTEngineOp_28 with batch size 1
2019-06-03 15:06:28.377901: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:496] Building a new TensorRT engine for resnet_v1_101/block1/unit_2/bottleneck_v1/TRTEngineOp_29 with batch size 1
2019-06-03 15:06:35.168358: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:496] Building a new TensorRT engine for resnet_v1_101/block1/TRTEngineOp_31 with batch size 1
Killed
========================================================================
When I run dmesg --follow to check the process details:
[10663.666441] [12648] 1000 12648 6243614 1507814 3582 12 0 0 python3
[10663.666444] Out of memory: Kill process 12648 (python3) score 751 or sacrifice child
[10663.674368] Killed process 12648 (python3) total-vm:24974456kB, anon-rss:5768628kB, file-rss:262628kB, shmem-rss:0kB
[10664.011176] oom_reaper: reaped process 12648 (python3), now anon-rss:0kB, file-rss:262708kB, shmem-rss:0kB
Any suggestion or feedback is appreciated.
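To read the kernel message above more easily, the kB figures can be converted to GiB. This is a small sketch that parses the "Killed process" line quoted in this comment; the regular expression and the helper `parse_oom_kill` are assumptions that match this one line and may not cover other kernel versions.

```python
import re

# Pattern written against the "Killed process" line quoted above.
KILLED_PATTERN = re.compile(
    r"Killed process (\d+) \(([^)]+)\) total-vm:(\d+)kB, anon-rss:(\d+)kB"
)

KB_PER_GIB = 1024 * 1024


def parse_oom_kill(line):
    """Extract pid, process name and memory figures (in GiB) from an OOM-kill line."""
    match = KILLED_PATTERN.search(line)
    if match is None:
        return None
    return {
        "pid": int(match.group(1)),
        "name": match.group(2),
        "total_vm_gib": int(match.group(3)) / KB_PER_GIB,
        "anon_rss_gib": int(match.group(4)) / KB_PER_GIB,
    }


oom_line = ("[10663.674368] Killed process 12648 (python3) total-vm:24974456kB, "
            "anon-rss:5768628kB, file-rss:262628kB, shmem-rss:0kB")

info = parse_oom_kill(oom_line)
print("{name} (pid {pid}): {anon_rss_gib:.2f} GiB resident".format(**info))
```

The resident figure is roughly 5.5 GiB, so the Python process alone was holding most of the memory when the kernel killed it.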
from tf_trt_models.
Hi @zhucheng725
Why is INT8 slower than FP16?
Do you have any update on this?
Thanks
from tf_trt_models.
Hello,
Did anyone manage to resolve this issue? Or is it still an issue with TF-TRT?
I see the same issue with TF2.0 as well.
from tf_trt_models.
Hi @zhucheng725
Why is INT8 slower than FP16?
Do you have any update on this?
Thanks
Not yet
from tf_trt_models.
I tried to run faster_rcnn_inception_v2 and got the following error. Does anyone have any clue about this? Any suggestion or advice would definitely help me to continue my learning by understanding these concepts.
thanks
InvalidArgumentError: node BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/Slice (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) has inputs from different frames. The input node BatchMultiClassNonMaxSuppression/map/while/Reshape_1 (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) is in frame 'BatchMultiClassNonMaxSuppression/map/while/while_context'. The input node BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/Slice/begin (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) is in frame ''.
from tf_trt_models.
I'm having the same issue. Is there any update on this one? What is the meaning of this error anyway!
from tf_trt_models.
If I am not wrong, the error states that the system does not have enough memory to run the faster_rcnn_inception_v2 model.
Thanks for the quick answer! I'm running a different model using TRT and my memory is normal during execution... Do you know the meaning of 'has inputs from different frames' in the error message?
from tf_trt_models.
Hi, I'm using TF-TRT on Windows 10, with tf_gpu = 2.10.0 and tensorrt = 7.2.3, based on CUDA 11.2 and cuDNN 8.1.0. I hit the same error while building the TRT engine for inference. Do you know how to deal with it? Thanks a lot for your reply.
from tf_trt_models.