Comments (3)
Steps to reproduce:

```
./pplnn-build/tools/pplnn --in-shapes 32_3_224_224 --dims 32_3_224_224 --warmuptimes 200 --runningtimes 200 --onnx-model vgg16.onnx
[INFO][2021-07-05 08:31:30.885][pplnn.cc:683] ppl.nn version: v0.1.0-dirty
[INFO][2021-07-05 08:31:32.207][pplnn.cc:88] ***** register CudaEngine *****
[INFO][2021-07-05 08:31:32.940][simple_graph_partitioner.cc:90] total partition(s) of graph[torch-jit-export]: 1.
[INFO][2021-07-05 08:31:33.295][opt_graph.cc:187] Create 71 TensorImpl
[INFO][2021-07-05 08:31:33.295][opt_graph.cc:299] added 56 new bridge kernels
[INFO][2021-07-05 09:46:30.989][opt_graph.cc:461] deleted 52 bridge kernels
[INFO][2021-07-05 09:46:46.325][pplnn.cc:523] ----- input info -----
[INFO][2021-07-05 09:46:46.326][pplnn.cc:526] input[0]:
[INFO][2021-07-05 09:46:46.326][pplnn.cc:527] name: input.1
[INFO][2021-07-05 09:46:46.326][pplnn.cc:534] dim(s): 32 3 224 224
[INFO][2021-07-05 09:46:46.326][pplnn.cc:536] DataType: FLOAT32
[INFO][2021-07-05 09:46:46.326][pplnn.cc:537] DataFormat: NDARRAY
[INFO][2021-07-05 09:46:46.326][pplnn.cc:538] NumBytesIncludePadding: 19267584
[INFO][2021-07-05 09:46:46.326][pplnn.cc:539] NumBytesExcludePadding: 19267584
[INFO][2021-07-05 09:46:46.326][pplnn.cc:542] ----- output info -----
[INFO][2021-07-05 09:46:46.326][pplnn.cc:545] output[0]:
[INFO][2021-07-05 09:46:46.326][pplnn.cc:546] name: 70
[INFO][2021-07-05 09:46:46.326][pplnn.cc:553] dim(s): 32 1000
[INFO][2021-07-05 09:46:46.326][pplnn.cc:555] DataType: FLOAT32
[INFO][2021-07-05 09:46:46.326][pplnn.cc:556] DataFormat: NDARRAY
[INFO][2021-07-05 09:46:46.326][pplnn.cc:557] NumBytesIncludePadding: 128000
[INFO][2021-07-05 09:46:46.326][pplnn.cc:558] NumBytesExcludePadding: 128000
[INFO][2021-07-05 09:46:46.326][pplnn.cc:561] ----------------------
[INFO][2021-07-05 09:46:46.326][pplnn.cc:791] Run() costs: 9175.929688 ms.
[INFO][2021-07-05 09:46:46.326][pplnn.cc:799] Run ok
```
As shown in the log, the run starts at 08:31 but inference does not begin until 09:46, so preparation took about 75 minutes. Is this normal? The model was imported from torchvision and exported to ONNX:
```python
import torch
import torchvision

# Export torchvision's pretrained VGG16 to ONNX with a batch-32 dummy input.
dummy_input = torch.randn(32, 3, 224, 224)
model = torchvision.models.vgg16(pretrained=True)
model.eval()
torch.onnx.export(model, dummy_input, "vgg16.onnx", opset_version=11)
```

Also, when testing with batch size = 1, the time is normal:
```
# ./pplnn-build/tools/pplnn --onnx-model vgg16.onnx --in-shapes 1_3_224_224 --dims 1_3_224_224 --warmuptimes 100 --runningtimes 100
[INFO][2021-07-05 05:21:44.428][pplnn.cc:683] ppl.nn version: v0.1.0-dirty
[INFO][2021-07-05 05:21:46.437][pplnn.cc:88] ***** register CudaEngine *****
[INFO][2021-07-05 05:21:47.230][simple_graph_partitioner.cc:90] total partition(s) of graph[torch-jit-export]: 1.
[INFO][2021-07-05 05:21:47.511][opt_graph.cc:187] Create 71 TensorImpl
[INFO][2021-07-05 05:21:47.511][opt_graph.cc:299] added 56 new bridge kernels
[INFO][2021-07-05 05:24:30.634][opt_graph.cc:461] deleted 52 bridge kernels
[INFO][2021-07-05 05:24:31.300][pplnn.cc:523] ----- input info -----
[INFO][2021-07-05 05:24:31.300][pplnn.cc:526] input[0]:
[INFO][2021-07-05 05:24:31.300][pplnn.cc:527] name: input.1
[INFO][2021-07-05 05:24:31.300][pplnn.cc:534] dim(s): 1 3 224 224
[INFO][2021-07-05 05:24:31.300][pplnn.cc:536] DataType: FLOAT32
[INFO][2021-07-05 05:24:31.300][pplnn.cc:537] DataFormat: NDARRAY
[INFO][2021-07-05 05:24:31.300][pplnn.cc:538] NumBytesIncludePadding: 602112
[INFO][2021-07-05 05:24:31.300][pplnn.cc:539] NumBytesExcludePadding: 602112
[INFO][2021-07-05 05:24:31.300][pplnn.cc:542] ----- output info -----
[INFO][2021-07-05 05:24:31.300][pplnn.cc:545] output[0]:
[INFO][2021-07-05 05:24:31.300][pplnn.cc:546] name: 70
[INFO][2021-07-05 05:24:31.300][pplnn.cc:553] dim(s): 1 1000
[INFO][2021-07-05 05:24:31.300][pplnn.cc:555] DataType: FLOAT32
[INFO][2021-07-05 05:24:31.300][pplnn.cc:556] DataFormat: NDARRAY
[INFO][2021-07-05 05:24:31.300][pplnn.cc:557] NumBytesIncludePadding: 4000
[INFO][2021-07-05 05:24:31.300][pplnn.cc:558] NumBytesExcludePadding: 4000
[INFO][2021-07-05 05:24:31.300][pplnn.cc:561] ----------------------
[INFO][2021-07-05 05:24:31.300][pplnn.cc:791] Run() costs: 344.269989 ms.
[INFO][2021-07-05 05:24:31.300][pplnn.cc:799] Run ok
```
Actually, it may take hours to select the fastest algorithm for the conv and gemm ops in the prepare stage, especially when the batch size is large.
from ppl.nn.
The time cost for batch = 32 is reasonable. The algorithm selection process runs candidates with the real tensor size and picks the one with the shortest execution time out of over 6000 kernels. Thus, the preparation time for the 32-batch model will be approximately 32 times longer than for a single batch.
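A back-of-the-envelope check of that linear-scaling claim, using the batch-1 preparation time visible in the logs (roughly 2.7 minutes between "added 56 new bridge kernels" at 05:21:47 and "deleted 52 bridge kernels" at 05:24:30):

```python
# Rough first-order model of the prepare stage: algorithm selection
# benchmarks candidate kernels at the real tensor size, so its cost
# grows about linearly with the batch dimension.
def estimated_prepare_minutes(batch_size, minutes_per_unit_batch=2.7):
    # 2.7 min is read off the batch-1 log (05:21:47 -> 05:24:30).
    return batch_size * minutes_per_unit_batch

# Batch 32 extrapolates to ~86 min, the same order of magnitude as the
# ~75 min actually observed between 08:31 and 09:46 in the batch-32 log.
print(round(estimated_prepare_minutes(32)))
```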
If over an hour of preparation for batch 32 is unacceptable, there are two ways to reduce the time cost of the prepare stage:
1. Use '--quick-select' to skip algorithm selection.
2. Reduce the dim size with '--dims', e.g. '--dims 3_3_224_224'.
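The two options above, sketched as concrete invocations. This is only an illustration built from the flags already shown in this thread (`--quick-select`, `--dims`, `--in-shapes`, `--warmuptimes`, `--runningtimes`); exact behavior may vary by ppl.nn version.

```shell
# Option 1: skip exhaustive algorithm selection entirely
# (fast startup, possibly slower kernels at run time).
./pplnn-build/tools/pplnn --onnx-model vgg16.onnx \
    --in-shapes 32_3_224_224 --dims 32_3_224_224 \
    --quick-select --warmuptimes 200 --runningtimes 200

# Option 2: keep selection, but benchmark candidates with a smaller
# leading dim so each of the ~6000 kernel trials finishes faster.
./pplnn-build/tools/pplnn --onnx-model vgg16.onnx \
    --in-shapes 32_3_224_224 --dims 3_3_224_224 \
    --warmuptimes 200 --runningtimes 200
```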
Thanks for the explanation.
Related Issues (20)
- Slice op question HOT 1
- pplnn run mobilenet v2 model failed. (use cuda) HOT 7
- linux compile error protobuf static assertion failed HOT 3
- malloc_consolidate(): invalid chunk size HOT 2
- The NDARRAY shape obtained from pplnn save-input is incorrect HOT 1
- How to build ppl.nn together with code that depends on ppl.nn using cmake? HOT 3
- Segmentation fault at ppl::nn::x86::X86Kernel::DumpOutputTensors HOT 5
- Fetching model inference results (GetOutputs) takes a long time HOT 2
- Install Error HOT 1
- The compilation passed, but an error was reported in test phase HOT 2
- Floating point exception (core dumped) ? HOT 4
- Core dump when running a resnet50 fp16 onnx model with the x86 engine
- (Ask) why InferInheritedType handle int8 to fp16 out? HOT 3
- Got wrong output shape when run a Gemm op(transB=0) use cuda HOT 4
- Crash with ONNX Split operator
- Performance degradation when a global engine is referenced from other threads HOT 4
- Troubleshooting inference accuracy errors
- Example of a multi-model pipeline
- Can int8 inference run on the ARM platform?
- cuda build error HOT 1