
triton-inference-server / model_analyzer


Triton Model Analyzer is a CLI tool that helps you understand the compute and memory requirements of models served by Triton Inference Server.

License: Apache License 2.0

Dockerfile 0.14% Smarty 0.09% Python 92.62% Shell 7.15%
deep-learning inference gpu performance-analysis


model_analyzer's Issues

ONNX model analyzer error: the provided PTX was compiled with an unsupported toolchain

Error log

2021-07-16 08:44:13.777 WARNING[entrypoint.py:232] Overriding the output model repo path "/tmp/model_output_repo/model_output"...
2021-07-16 08:44:13.779 INFO[entrypoint.py:112] Starting a Triton Server using docker...
2021-07-16 08:44:14.888 INFO[analyzer_state_manager.py:119] No checkpoint file found, starting a fresh run.
2021-07-16 08:44:14.888 INFO[analyzer.py:80] Profiling server only metrics...
2021-07-16 08:46:18.958 INFO[gpu_monitor.py:73] Using GPU(s) with UUID(s) = { GPU-f6e2ae57-5841-b13b-c882-04fd5172d292 } for profiling.
2021-07-16 08:46:19.980 INFO[server_docker.py:128] Stopping triton server.
2021-07-16 08:46:20.614 INFO[run_search.py:292] Instance count set to 1, and dynamic batching is disabled.
2021-07-16 08:46:24.71 INFO[client.py:83] Model dpr_i0 load failed: [StatusCode.INVALID_ARGUMENT] load failed for model 'dpr_i0': version 1: Internal: onnx runtime error 1: /workspace/onnxruntime/onnxruntime/core/providers/cuda/cuda_call.cc:121 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] /workspace/onnxruntime/onnxruntime/core/providers/cuda/cuda_call.cc:115 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] CUDA failure 222: the provided PTX was compiled with an unsupported toolchain. ; GPU=0 ; hostname=8c086208d7cc ; expr=cudaDeviceSynchronize();

;

2021-07-16 08:46:24.71 INFO[server_docker.py:128] Stopping triton server.
2021-07-16 08:46:30.301 INFO[client.py:83] Model dpr_i0 load failed: [StatusCode.INVALID_ARGUMENT] load failed for model 'dpr_i0': version 1: Internal: onnx runtime error 1: /workspace/onnxruntime/onnxruntime/core/providers/cuda/cuda_call.cc:121 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] /workspace/onnxruntime/onnxruntime/core/providers/cuda/cuda_call.cc:115 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] CUDA failure 222: the provided PTX was compiled with an unsupported toolchain. ; GPU=0 ; hostname=6defe604979f ; expr=cudaDeviceSynchronize();

;

2021-07-16 08:46:30.301 INFO[server_docker.py:128] Stopping triton server.
^C2021-07-16 08:46:33.211 INFO[analyzer_state_manager.py:161] Received SIGINT 1/3. Will attempt to exit after current measurement.
^C^C2021-07-16 08:46:34.156 INFO[analyzer_state_manager.py:161] Received SIGINT 2/3. Will attempt to exit after current measurement.
2021-07-16 08:46:34.156 INFO[client.py:83] Model dpr_i0 load failed: [StatusCode.INVALID_ARGUMENT] load failed for model 'dpr_i0': version 1: Internal: onnx runtime error 1: /workspace/onnxruntime/onnxruntime/core/providers/cuda/cuda_call.cc:121 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] /workspace/onnxruntime/onnxruntime/core/providers/cuda/cuda_call.cc:115 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] CUDA failure 222: the provided PTX was compiled with an unsupported toolchain. ; GPU=0 ; hostname=7c238636601e ; expr=cudaDeviceSynchronize();

;

2021-07-16 08:46:34.157 INFO[server_docker.py:128] Stopping triton server.
2021-07-16 08:46:34.804 INFO[analyzer_state_manager.py:145] Saved checkpoint to ./checkpoints/0.ckpt.
2021-07-16 08:46:34.804 INFO[analyzer.py:117] Finished profiling. Obtained measurements for models: [].
2021-07-16 08:46:34.804 INFO[server_docker.py:128] Stopping triton server.

model analyzer command
model-analyzer profile --config /config.yaml

model analyzer config

model_repository: /triton_models
batch_sizes: [1,2,4,8,16,32,64,128,256,512,1024]
concurrency:
    start: 1
    stop: 10
    step: 2
profile_models:
    - dpr
override_output_model_repository: true
triton_launch_mode: docker
gpus: all
client_protocol: grpc
triton_grpc_endpoint: localhost:8001
output_model_repository_path: /tmp/model_output_repo/model_output
triton_docker_image: nvcr.io/nvidia/tritonserver:21.06.1-py3
triton_server_flags:
        strict_model_config: False
        log_verbose: True

model-analyzer docker run command

docker run -it --rm --gpus 0 -v /var/run/docker.sock:/var/run/docker.sock -v /tmp/model_output_repo:/tmp/model_output_repo -v /home/ubuntu/ml-stew/serving_benchmarker/model-analyzer.yaml:/config.yaml -v /home/ubuntu/ml-stew/serving_benchmarker/triton_models:/triton_models --net host nvcr.io/nvidia/tritonserver:21.06-py3-sdk

model-analyzer build command

docker pull nvcr.io/nvidia/tritonserver:21.06-py3-sdk

Model repo

triton_models
├── dpr
│   ├── 1
│   │   └── model.onnx
│   └── config.pbtxt
├── dpr-1
│   ├── 1
│   │   └── model.onnx
│   └── config.pbtxt
└── dpr.dvc

Model was converted from pytorch to onnx, conversion code

torch.onnx.export(
    context_encoder,
    sample_data.data,
    "context_encoder.onnx",
    export_params=True,
    opset_version=11,
    do_constant_folding=True,
    input_names=["input_ids","token_type_ids","attention_mask"],
    output_names=["output"],
    dynamic_axes={
        "input_ids": {0: "batch_size", 1: "sequence_length"},
        "token_type_ids": {0: "batch_size", 1: "sequence_length"},
        "attention_mask": {0: "batch_size", 1: "sequence_length"},
        "output": {0: "batch_size"},
    },
)

Running the Triton Inference Server on the GPU works fine, but the model analyzer fails and I've tried everything I could. I would really appreciate some help here.
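CUDA failure 222 ("the provided PTX was compiled with an unsupported toolchain") generally means the host NVIDIA driver is older than the CUDA toolkit the 21.06 container was built against, so the driver cannot JIT the PTX. A quick check on the host (standard nvidia-smi commands; the exact minimum driver depends on the container release, roughly R465+ for 21.06, so treat that number as an assumption and confirm against the release notes):

nvidia-smi --query-gpu=driver_version --format=csv,noheader
nvidia-smi | head -n 4   # the "CUDA Version" field shows the newest CUDA the driver supports

If the driver is too old, either upgrade it or point triton_docker_image at a Triton release that matches the installed driver.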

Fail to analyze ensemble model: "inference.ModelConfig" should not have multiple "scheduling_choice" oneof fields

When I use model-analyzer to analyze an ensemble model in local launch mode, it always fails with the following error:

root@dl:/inference# model-analyzer profile --checkpoint-directory checkpoints -m $PWD/model_repo --profile-models quartznet-ensemble --output-model-repository-path=/output_repo/temp --override-output-model-repository --client-protocol grpc --run-config-search-max-concurrency 800 --run-config-search-max-instance-count 2 --run-config-search-max-preferred-batch-size 64

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/google/protobuf/json_format.py", line 538, in _ConvertFieldValuePair
    raise ParseError('Message type "{0}" should not have multiple '
google.protobuf.json_format.ParseError: Message type "inference.ModelConfig" should not have multiple "scheduling_choice" oneof fields.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/model-analyzer", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/entrypoint.py", line 315, in main
    analyzer.profile(client=client)
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/analyzer.py", line 104, in profile
    self._model_manager.run_model(model=model)
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/model_manager.py", line 84, in run_model
    self._run_model_with_search(model)
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/model_manager.py", line 138, in _run_model_with_search
    self._run_model_config_sweep(model, search_model_config=True)
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/model_manager.py", line 167, in _run_model_config_sweep
    self._run_config_generator.generate_run_config_for_model_sweep(
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/config/run/run_config_generator.py", line 98, in generate_run_config_for_model_sweep
    model_config = ModelConfig.create_from_dictionary(
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/triton/model/model_config.py", line 117, in create_from_dictionary
    protobuf_message = json_format.ParseDict(model_dict,
  File "/usr/local/lib/python3.8/dist-packages/google/protobuf/json_format.py", line 454, in ParseDict
    parser.ConvertMessage(js_dict, message)
  File "/usr/local/lib/python3.8/dist-packages/google/protobuf/json_format.py", line 485, in ConvertMessage
    self._ConvertFieldValuePair(value, message)
  File "/usr/local/lib/python3.8/dist-packages/google/protobuf/json_format.py", line 599, in _ConvertFieldValuePair
    raise ParseError(str(e))
google.protobuf.json_format.ParseError: Message type "inference.ModelConfig" should not have multiple "scheduling_choice" oneof fields.

The model repository I used can be downloaded here.

Cannot access reports locally!

Hello,
I have just started using model_analyzer, so I am somewhat confused. I followed the instructions in the Quickstart section. I am running Model Analyzer inside a Docker container. Specifically, I run the following commands:

docker run -it --rm --gpus all \
        -v /var/run/docker.sock:/var/run/docker.sock \
        -v $HOME/model_analyzer/examples/quick-start:/quick_start_repository \
        -v $HOME/model_analyzer/output_model_repository:/output_model_repository \
        --net=host --name model-analyzer \
        model-analyzer /bin/bash
  • model-analyzer profile -m /quick_start_repository/ --profile-models add_sub
  • mkdir analysis_results
  • model-analyzer analyze --analysis-models add_sub -e analysis_results

It seems that everything works well, but the generated reports (the whole analysis_results directory) are only visible inside the Docker container and do not appear anywhere on the host.

How can I export them to the host?
Thanks in advance.
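The analyze step writes everything under the export path inside the container, so that directory has to be bind-mounted from the host to be visible outside. A minimal sketch based on the quick-start command above ($HOME/analysis_results is just an example host path):

docker run -it --rm --gpus all \
        -v /var/run/docker.sock:/var/run/docker.sock \
        -v $HOME/model_analyzer/examples/quick-start:/quick_start_repository \
        -v $HOME/analysis_results:/results \
        --net=host --name model-analyzer \
        model-analyzer /bin/bash
# inside the container, export into the mounted directory
model-analyzer analyze --analysis-models add_sub -e /results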

ERROR:Fork support is only compatible with the epoll1 and poll polling strategies

version : 21.11
install : Building the Dockerfile
When I followed the quick start, I ran into this error.
Does this error affect how the model analyzer works?

2022-03-12 03:55:30.802 INFO[run_search.py:292] [Search Step] Concurrency set to 8. Instance count set to 4, and preferred batch size is set to 2.
E0312 03:55:30.807687018     776 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
2022-03-12 03:55:30.826 INFO[server_local.py:99] Triton Server started.
2022-03-12 03:55:32.987 INFO[client.py:83] Model add_sub_i18 loaded.
2022-03-12 03:55:32.988 INFO[model_manager.py:221] Profiling model add_sub_i18...
E0312 03:55:33.000142416     776 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
2022-03-12 03:55:44.109 INFO[server_local.py:120] Stopped Triton Server.
2022-03-12 03:55:44.110 INFO[run_search.py:292] [Search Step] Concurrency set to 16. Instance count set to 4, and preferred batch size is set to 2.
E0312 03:55:44.115560469     776 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
2022-03-12 03:55:44.134 INFO[server_local.py:99] Triton Server started.
2022-03-12 03:55:46.296 INFO[client.py:83] Model add_sub_i18 loaded.
2022-03-12 03:55:46.297 INFO[model_manager.py:221] Profiling model add_sub_i18...
E0312 03:55:46.309274939     776 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
2022-03-12 03:55:56.371 INFO[server_local.py:120] Stopped Triton Server.
2022-03-12 03:55:56.372 INFO[run_search.py:292] [Search Step] Concurrency set to 32. Instance count set to 4, and preferred batch size is set to 2.
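The fork_posix.cc lines come from the gRPC core used by the Triton Python client, and in the run above they appear between otherwise successful profiling steps, so they look like warnings rather than fatal errors. If they are noisy, one hedged workaround is to pin gRPC's polling strategy with its standard environment variable before launching (GRPC_POLL_STRATEGY is a gRPC setting, not a Model Analyzer option):

export GRPC_POLL_STRATEGY=epoll1
model-analyzer profile ...   # then run the same profile command as before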

Not able to load models with custom plugins

How do we pass custom plugins to load the model in the model analyzer?

sudo docker run --gpus 1 --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -ti -v /home/ubuntu/mount-disk/sec_models/:/models -v /home/ubuntu/mount-disk/plugins/:/plugins  --env LD_PRELOAD="/plugins/libyolo_layer.so:/plugins/libdecodeplugin.so" triton_modelanalyzer bash 

Running the below command inside the docker

model-analyzer -m /models/ -n yolo1 --batch-size 1 -c 1

Facing this issue while running the above command

Error

2021-01-20 05:12:59.773 INFO[entrypoint.py:368] Triton Model Analyzer started: config={'model_repository': '/models/', 'model_names': 'yolo1', 'batch_sizes': '1', 'concurrency': '1', 'export': None, 'export_path': '.', 'filename_model_inference': 'metrics-model-inference.csv', 'filename_model_gpu': 'metrics-model-gpu.csv', 'filename_server_only': 'metrics-server-only.csv', 'max_retries': 100, 'duration_seconds': 5.0, 'monitoring_interval': 0.01, 'client_protocol': 'grpc', 'perf_analyzer_path': 'perf_analyzer', 'perf_measurement_window': 5000, 'no_perf_output': None, 'triton_launch_mode': 'local', 'triton_version': '20.11-py3', 'log_level': 'INFO', 'triton_http_endpoint': 'localhost:8000', 'triton_grpc_endpoint': 'localhost:8001', 'triton_metrics_url': 'http://localhost:8002/metrics', 'triton_server_path': 'tritonserver', 'triton_output_path': None, 'gpus': ['all'], 'config_file': None}
2021-01-20 05:12:59.780 INFO[entrypoint.py:94] Starting a local Triton Server...
2021-01-20 05:12:59.788 INFO[server_local.py:62] Triton Server started.
2021-01-20 05:13:00.800 INFO[entrypoint.py:209] Triton Server is ready.
2021-01-20 05:13:00.801 INFO[driver.py:236] init
2021-01-20 05:13:01.946 INFO[entrypoint.py:383] Starting perf_analyzer...
2021-01-20 05:13:01.946 INFO[analyzer.py:91] Profiling server only metrics...
2021-01-20 05:13:02.968 INFO[monitor.py:74] Using GPU(s) with UUID(s) = { GPU-5df6aea1-a690-25ee-c16e-bd46a1d95792 } for the analysis.
2021-01-20 05:13:27.431 ERROR[entrypoint.py:387] Model Analyzer encountered an error: Unable to load the model : [StatusCode.INVALID_ARGUMENT] load failed for model 'yolo1': version 1: Internal: unable to create TensorRT engine;
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/triton/client/client.py", line 79, in load_model
    self._client.load_model(model.name())
  File "/usr/local/lib/python3.6/dist-packages/tritonclient/grpc/__init__.py", line 555, in load_model
    raise_error_grpc(rpc_error)
  File "/usr/local/lib/python3.6/dist-packages/tritonclient/grpc/__init__.py", line 61, in raise_error_grpc
    raise get_error_grpc(rpc_error) from None
tritonclient.utils.InferenceServerException: [StatusCode.INVALID_ARGUMENT] load failed for model 'yolo1': version 1: Internal: unable to create TensorRT engine;


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/entrypoint.py", line 384, in main
    run_analyzer(config, analyzer, client, run_configs)
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/entrypoint.py", line 323, in run_analyzer
    client.load_model(model=model)
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/triton/client/client.py", line 82, in load_model
    f"Unable to load the model : {e}")
model_analyzer.model_analyzer_exceptions.TritonModelAnalyzerException: Unable to load the model : [StatusCode.INVALID_ARGUMENT] load failed for model 'yolo1': version 1: Internal: unable to create TensorRT engine;

2021-01-20 05:13:27.433 INFO[server_local.py:71] Triton Server stopped.
2021-01-20 05:13:28.251 INFO[server_local.py:80] Triton Server stopped.

Also, I am able to load an ONNX model that doesn't require a plugin.

How do we point --triton-version at a specific versioned Docker image, i.e. 20.09-py3?

sudo docker run --gpus 1 --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -ti -v /home/ubuntu/mount-disk/sec_models/:/models -v /home/ubuntu/mount-disk/plugins/:/plugins --env LD_PRELOAD="/plugins/libyolo_layer.so:/plugins/libdecodeplugin.so" triton_modelanalyzer bash

Inside docker

model-analyzer -m /models/ -n yolo1 --batch-size 1 -c 1  --triton-launch-mode docker --triton-version nvcr.io/nvidia/tritonserver:20.09-py3

Error

2021-01-21 09:30:39.160 INFO[entrypoint.py:368] Triton Model Analyzer started: config={'model_repository': '/models/', 'model_names': 'yolo1', 'batch_sizes': '1', 'concurrency': '1', 'export': None, 'export_path': '.', 'filename_model_inference': 'metrics-model-inference.csv', 'filename_model_gpu': 'metrics-model-gpu.csv', 'filename_server_only': 'metrics-server-only.csv', 'max_retries': 100, 'duration_seconds': 5.0, 'monitoring_interval': 0.01, 'client_protocol': 'grpc', 'perf_analyzer_path': 'perf_analyzer', 'perf_measurement_window': 5000, 'no_perf_output': None, 'triton_launch_mode': 'docker', 'triton_version': 'nvcr.io/nvidia/tritonserver:20.09-py3', 'log_level': 'INFO', 'triton_http_endpoint': 'localhost:8000', 'triton_grpc_endpoint': 'localhost:8001', 'triton_metrics_url': 'http://localhost:8002/metrics', 'triton_server_path': 'tritonserver', 'triton_output_path': None, 'gpus': ['all'], 'config_file': None}
2021-01-21 09:30:39.166 INFO[entrypoint.py:105] Starting a Triton Server using docker...
2021-01-21 09:30:39.166 INFO[driver.py:236] init
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 706, in urlopen
chunked=chunked,
File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 394, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/lib/python3.6/http/client.py", line 1281, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/lib/python3.6/http/client.py", line 1327, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/lib/python3.6/http/client.py", line 1276, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/lib/python3.6/http/client.py", line 1042, in _send_output
self.send(msg)
File "/usr/lib/python3.6/http/client.py", line 980, in send
self.connect()
File "/usr/local/lib/python3.6/dist-packages/docker/transport/unixconn.py", line 43, in connect
sock.connect(self.unix_socket)
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 756, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "/usr/local/lib/python3.6/dist-packages/urllib3/util/retry.py", line 531, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/local/lib/python3.6/dist-packages/urllib3/packages/six.py", line 734, in reraise
raise value.with_traceback(tb)
File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 706, in urlopen
chunked=chunked,
File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 394, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/lib/python3.6/http/client.py", line 1281, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/lib/python3.6/http/client.py", line 1327, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/lib/python3.6/http/client.py", line 1276, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/lib/python3.6/http/client.py", line 1042, in _send_output
self.send(msg)
File "/usr/lib/python3.6/http/client.py", line 980, in send
self.connect()
File "/usr/local/lib/python3.6/dist-packages/docker/transport/unixconn.py", line 43, in connect
sock.connect(self.unix_socket)
urllib3.exceptions.ProtocolError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/docker/api/client.py", line 214, in _retrieve_server_version
return self.version(api_version=False)["ApiVersion"]
File "/usr/local/lib/python3.6/dist-packages/docker/api/daemon.py", line 181, in version
return self._result(self._get(url), json=True)
File "/usr/local/lib/python3.6/dist-packages/docker/utils/decorators.py", line 46, in inner
return f(self, *args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/docker/api/client.py", line 237, in _get
return self.get(url, **self._set_request_timeout(kwargs))
File "/usr/local/lib/python3.6/dist-packages/requests/sessions.py", line 555, in get
return self.request('GET', url, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.6/dist-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/requests/adapters.py", line 498, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/bin/model-analyzer", line 8, in
sys.exit(main())
File "/usr/local/lib/python3.6/dist-packages/model_analyzer/entrypoint.py", line 372, in main
client, server = get_triton_handles(config)
File "/usr/local/lib/python3.6/dist-packages/model_analyzer/entrypoint.py", line 207, in get_triton_handles
server = get_server_handle(config)
File "/usr/local/lib/python3.6/dist-packages/model_analyzer/entrypoint.py", line 109, in get_server_handle
gpus=get_analyzer_gpus(config))
File "/usr/local/lib/python3.6/dist-packages/model_analyzer/triton/server/server_factory.py", line 40, in create_server_docker
return TritonServerDocker(image=image, config=config, gpus=gpus)
File "/usr/local/lib/python3.6/dist-packages/model_analyzer/triton/server/server_docker.py", line 48, in init
self._docker_client = docker.from_env()
File "/usr/local/lib/python3.6/dist-packages/docker/client.py", line 101, in from_env
**kwargs_from_env(**kwargs)
File "/usr/local/lib/python3.6/dist-packages/docker/client.py", line 45, in init
self.api = APIClient(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/docker/api/client.py", line 197, in init
self._version = self._retrieve_server_version()
File "/usr/local/lib/python3.6/dist-packages/docker/api/client.py", line 222, in _retrieve_server_version
'Error while fetching server API version: {0}'.format(e)
docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

Also, I tried both 20.09-py3 and 20.09 for --triton-version; still the same error.
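With --triton-launch-mode docker, Model Analyzer talks to the host Docker daemon through /var/run/docker.sock, and the traceback ends in sock.connect(self.unix_socket) raising FileNotFoundError, which points at the socket not being visible inside the container (the docker run command above does not mount it). A sketch of the same run command with the socket mounted; also note that this older CLI appears to expect a plain version tag such as 20.09-py3 for --triton-version rather than a full image name (my reading of the config dump above, not a verified rule):

sudo docker run --gpus 1 --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -ti \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /home/ubuntu/mount-disk/sec_models/:/models \
    -v /home/ubuntu/mount-disk/plugins/:/plugins \
    --env LD_PRELOAD="/plugins/libyolo_layer.so:/plugins/libdecodeplugin.so" \
    triton_modelanalyzer bash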

How to get gpu_used_memory in perf_analyzer

Hi,
Currently I'm working with the model analyzer and I need to find the GPU memory allocated by the models. However, the latency report file only contains latency measurements. Is there any way to also report the GPU metrics?

My concern is about models with dynamic input shapes: I need to find the latency and the allocated GPU memory for different input lengths.
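perf_analyzer's own latency report only covers client-side metrics; GPU memory is gathered by Model Analyzer's monitors and exported in its GPU metrics table/CSV. Assuming a recent Model Analyzer, the reported columns can be chosen in the YAML config; the option name below is my recollection of the config documentation, so treat it as an assumption and verify:

gpu_output_fields:
    - model_name
    - batch_size
    - concurrency
    - gpu_used_memory
    - gpu_utilization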

Error: Must specify at least one target to fetch or execute

Hi!

I'm having an error when trying to analyze my model. I'm running the docker container using the following command:
sudo docker run -it --rm --gpus all -v/var/run/docker.sock:/var/run/docker.sock -v/home/XXX/MODELS/triton/:/models -v/home/XXX/results:/results memory-analyzer model-analyzer --batch 1 --concurrency 1,2,4 --model-names 210222_26_damages --triton-version 20.12-py3 --model-repository=/models --export --export-path /results/

Here's the stdout output:

2021-05-04 19:15:02.640 INFO[entrypoint.py:338] Triton Model Analyzer started Namespace(batch_sizes='1', client_protocol='grpc', concurrency='1,2,4', duration_seconds=5, export=True, export_path='/results/', filename_model='metrics-model.csv', filename_server_only='metrics-server-only.csv', gpus=['all'], log_level='INFO', max_retries=100, model_names='210222_26_damages', model_repository='/models', monitoring_interval=0.01, perf_analyzer_path='perf_analyzer', triton_grpc_endpoint='localhost:8001', triton_http_endpoint='localhost:8000', triton_launch_mode='local', triton_metrics_url='http://localhost:8002/metrics', triton_server_path='tritonserver', triton_version='20.12-py3') arguments
2021-05-04 19:15:02.642 INFO[entrypoint.py:93] Starting a local Triton Server...
2021-05-04 19:15:02.645 INFO[server_local.py:61] Triton Server started.
2021-05-04 19:15:03.656 INFO[entrypoint.py:208] Triton Server is ready.
2021-05-04 19:15:03.656 INFO[driver.py:236] init
2021-05-04 19:15:04.700 INFO[entrypoint.py:353] Starting perf_analyzer...
2021-05-04 19:15:04.700 INFO[analyzer.py:87] Profiling server only metrics...
2021-05-04 19:15:05.721 INFO[monitor.py:74] Using GPU(s) with UUID(s) = { GPU-0df7cc6b-cb4a-937c-08dc-eb6c9b19780c } for the analysis.
2021-05-04 19:15:14.718 INFO[client.py:83] Model 210222_26_damages loaded.
2021-05-04 19:15:14.718 INFO[analyzer.py:124] Profiling model 210222_26_damages...
2021-05-04 19:15:15.739 INFO[monitor.py:74] Using GPU(s) with UUID(s) = { GPU-0df7cc6b-cb4a-937c-08dc-eb6c9b19780c } for the analysis.
2021-05-04 19:15:21.927 INFO[client.py:107] Model 210222_26_damages unloaded.
2021-05-04 19:15:21.927 ERROR[entrypoint.py:357] Model Analyzer encountered an error: Running perf_analyzer with ['perf_analyzer', '-m', '210222_26_damages', '-b', '1', '-u', 'localhost:8001', '-i', 'grpc', '--concurrency-range', '1'] failed with exit status 1 : *** Measurement Settings ***
  Batch size: 1
  Measurement window: 5000 msec
  Using synchronous calls for inference
  Stabilizing using average latency

Request concurrency: 1
Failed to maintain requested inference load. Worker thread(s) failed to generate concurrent requests.
Thread [0] had error: Must specify at least one target to fetch or execute.

2021-05-04 19:15:21.927 INFO[server_local.py:70] Triton Server stopped.

In case it is relevant, here's my model configuration:

name: "210222_26_damages"
platform: "tensorflow_savedmodel"
max_batch_size: 64
dynamic_batching {
  preferred_batch_size: [ 1,2,4,8,16,32,64 ]
  max_queue_delay_microseconds: 30000

}
version_policy: { latest { num_versions : 1 }}
optimization { execution_accelerators {
  gpu_execution_accelerator : [ {
    name : "auto_mixed_precision"}]
}}
input [
    {
    	  name: "image_tensor_input"
    	  data_type: TYPE_FP32
    	  dims: [1024, 1024, 3]
    }
]

What am I missing?

Thanks!
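The "Must specify at least one target to fetch or execute" message comes from the TensorFlow session being run with no fetch targets, and the quoted config.pbtxt declares an input but no output block, so perf_analyzer has nothing to request. A hedged sketch of the missing section; the tensor name, type, and dims below are placeholders that must match the SavedModel's actual output signature:

output [
    {
        name: "output_tensor"    # placeholder; use the real output name from the SavedModel signature
        data_type: TYPE_FP32     # placeholder type
        dims: [ -1 ]             # placeholder dims
    }
]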

Quick start instructions not working

$ git log --format="%H" -n 1
d9cfe0dec18aaa70a760a14e06c019d4f81a3070
$ docker build . -t triton_modelanalyzer
$ docker run --gpus 1 --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -ti -v `pwd`/examples/quick-start:/workspace/examples triton_modelanalyzer bash
$ model-analyzer -m /workspace/examples/ -n add_sub
2021-01-26 19:15:46.221 INFO[entrypoint.py:378] Triton Model Analyzer started: config={'model_repository': '/workspace/examples/', 'model_names': ['add_sub'], 'objectives': [], 'constraints': {}, 'batch_sizes': 1, 'concurrency': 1, 'export': None, 'export_path': '.', 'filename_model_inference': 'metrics-model-inference.csv', 'filename_model_gpu': 'metrics-model-gpu.csv', 'filename_server_only': 'metrics-server-only.csv', 'max_retries': 100, 'duration_seconds': 5, 'monitoring_interval': 0.01, 'client_protocol': 'grpc', 'perf_analyzer_path': 'perf_analyzer', 'perf_measurement_window': 5000, 'no_perf_output': None, 'triton_launch_mode': 'local', 'triton_version': '20.11-py3', 'log_level': 'INFO', 'triton_http_endpoint': 'localhost:8000', 'triton_grpc_endpoint': 'localhost:8001', 'triton_metrics_url': 'http://localhost:8002/metrics', 'triton_server_path': 'tritonserver', 'triton_output_path': None, 'gpus': ['all'], 'config_file': None}
2021-01-26 19:15:46.223 INFO[entrypoint.py:91] Starting a local Triton Server...
2021-01-26 19:15:46.227 INFO[server_local.py:63] Triton Server started.
2021-01-26 19:15:47.232 INFO[entrypoint.py:207] Triton Server is ready.
2021-01-26 19:15:47.233 INFO[driver.py:236] init
2021-01-26 19:15:50.387 INFO[server_local.py:80] Triton Server stopped.
Traceback (most recent call last):
  File "/usr/local/bin/model-analyzer", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/entrypoint.py", line 390, in main
    run_configs = create_run_configs(config)
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/entrypoint.py", line 237, in create_run_configs
    param_combinations = list(product(*tuple(sweep_params.values())))
TypeError: 'int' object is not iterable

Starting Triton inside the container works fine: tritonserver --model-repository /workspace/examples/

AWS EC2 hangs

Hey, I have been trying to run the model analyzer to get the best configuration for my model on my EC2 machine (T4, 16 GB memory), but it just hangs after a while. It works if I reduce the batch size. Is this caused by memory issues? The image size my model accepts is 720x1280.
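Assuming the hang is GPU memory exhaustion at the large end of the sweep (consistent with it working at smaller batch sizes on a 16 GB T4), one way to keep the run bounded is to cap the batch sizes and concurrencies in the profile config, using the same options shown in other configs on this page (the model name and repository path below are placeholders):

model_repository: /models
profile_models:
    - my_model
batch_sizes: [1, 2, 4, 8]
concurrency:
    start: 1
    stop: 4
    step: 2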

No analysis found with add_sub model

2021-07-31 02:57:26.573 INFO[server_docker.py:145] Stopping triton server.
2021-07-31 02:57:28.733 INFO[analyzer_state_manager.py:146] Saved checkpoint to ./checkpoints/0.ckpt.
2021-07-31 02:57:28.733 INFO[analyzer.py:115] Finished profiling. Obtained measurements for models: [].
2021-07-31 02:57:28.733 INFO[server_docker.py:145] Stopping triton server.

From the above you can see that when I run model-analyzer analyze, the result is empty and no data is found for the model add_sub.
How can I solve these errors and get all the metrics?

First,
docker run -it --rm --gpus all
-v /var/run/docker.sock:/var/run/docker.sock
-v {pwd}/model_analyzer/examples/quick-start:/quick_start_repository
-v {pwd}/model_analyzer:{pwd}/model_analyzer
-w {pwd}/model_analyzer
--net=host --name model-analyzer
model-analyzer /bin/bash

Second,
model-analyzer profile -m /quick_start_repository/ --profile-models add_sub --override-output-model-repository --triton-launch-mode=docker --triton-docker-image nvcr.io/nvidia/tritonserver:21.02-py3

Lastly, I run the following and get the error:
model-analyzer analyze --analysis-models add_sub -e analysis_results


Quick start out of date

Hi folks.

I'm trying to run the quick start instructions https://github.com/triton-inference-server/model_analyzer/blob/main/docs/quick_start.md

I have successfully installed Triton server and model_analyzer from the Dockerfile.

But I'm getting an error at this step

$ model-analyzer -m /quick_start_repository -n add_sub --triton-launch-mode=local --export-path=analysis_results

Where the error looks like

2021-04-30 22:06:37.785 INFO[client.py:82] Model add_sub_i34 load failed: [StatusCode.UNAVAILABLE] explicit model load / unload is not allowed if polling is enabled
2021-04-30 22:06:37.785 INFO[analyzer_state_manager.py:140] Saved checkpoint to analysis_results/checkpoints/0.ckpt.

And then finally fails at this step

File "/usr/local/bin/model-analyzer", line 8, in <module> sys.exit(main()) File "/usr/local/lib/python3.8/dist-packages/model_analyzer/entrypoint.py", line 233, in main analyzer.run() File "/usr/local/lib/python3.8/dist-packages/model_analyzer/analyzer.py", line 116, in run self._result_manager.collect_and_sort_results( File "/usr/local/lib/python3.8/dist-packages/model_analyzer/result/result_manager.py", line 309, in collect_and_sort_results result_dict = results[model_name] KeyError: 'add_sub'

analysis_results/ and output_model_repository seem to be correctly filled out with directories for the results and the variations of add_sub, respectively.

Here is the full log error https://gist.github.com/msaroufim/912ab9a5ae17b5ed444bf790ead0612e

broken symlinks when using relative path for output_model_repository

When providing a relative path for the output model repository, the symlinks get broken, as in the following example:
with output_model_repository_path: ./output-model-repository, the model-analyzer generates the following directory structure:

output_model_repository/
├── encoder_i0
│   ├── 1 -> ./output_model_repository/encoder_i1/1
│   └── config.pbtxt
└── encoder_i1
    ├── 1
    │   └── model.py
    └── config.pbtxt

As you can see, the symlink inside encoder_i0 points to ./output_model_repository/encoder_i1/1, which resolves to output_model_repository/encoder_i0/1/output_model_repository/encoder_i1/1. That path doesn't exist, so you get the following error:
2021-10-26 12:35:40.491 INFO[client.py:85] Model encoder_i0 load failed: load failed for model 'encoder_i0': failed to stat file ./output_model_repository/encoder_i0/1

I believe we want the symlink to point to ../encoder_i1/1
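Until the relative-path handling is fixed, a workaround consistent with the analysis above is to pass an absolute path so the generated symlinks resolve regardless of the working directory (the location below is an arbitrary example):

output_model_repository_path: /home/user/output-model-repository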

Unable to fetch pre-built model_analyzer docker container

It's mentioned here that we can pull a pre-built container using the docker pull nvcr.io/nvidia/clara/model-analyzer:latest command. However, this command always fails with the message "Error response from daemon: unauthorized: authentication required". I've tried finding the container here and it doesn't seem to be listed.
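Two hedged alternatives: authenticate to NGC before pulling (the clara registry path may require an NGC account/API key), or build the image from this repository the way the quick start does, which avoids the registry entirely (the tag name is arbitrary):

docker login nvcr.io            # log in with your NGC API key, then retry the pull
# or build locally
git clone https://github.com/triton-inference-server/model_analyzer.git
cd model_analyzer
docker build . -t model-analyzer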

Error when running model-analyzer in docker mode

version : 21.11
install : Triton SDK Container
I'm trying to run model-analyzer in quick start but it is failing with the following error:

2022-03-12 04:48:52.559 ERROR[entrypoint.py:214] Model Analyzer encountered an error: Failed to set the value for field "triton_server_path". Error: Either the binary '/opt/tritonserver/' is not on the PATH, or Model Analyzer does not have permissions to execute os.stat on this path.

I have set the triton_server_path field to a binary file that exists; it did not work.
This is my command :

model-analyzer profile -m /workspace/model_analyzer/examples/quick-start/ --profile-models add_sub --triton-launch-mode=docker --output-model-repository=/home/outputmodel/output --triton-server-path=/opt/tritonserver/

I don't want to change the version. Can anyone help me?
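The message complains that '/opt/tritonserver/' is not an executable on the PATH, and that path is a directory; in the NGC Triton containers the server binary itself is at /opt/tritonserver/bin/tritonserver. A sketch of the same command with the path pointed at the binary (whether the flag is needed at all in docker launch mode depends on the Model Analyzer version, so treat this as a workaround rather than the documented fix):

model-analyzer profile -m /workspace/model_analyzer/examples/quick-start/ \
    --profile-models add_sub --triton-launch-mode=docker \
    --output-model-repository=/home/outputmodel/output \
    --triton-server-path=/opt/tritonserver/bin/tritonserver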

perf_analyzer: unrecognized option '--verbose-csv'

Hey! I'm new to model analyzer.

I was following the quick start guide mentioned at https://github.com/triton-inference-server/model_analyzer/blob/main/docs/quick_start.md .

Upon profiling the sample model, the model-analyzer gives the following output:

Profile complete. Profiled 0 configurations for models: [].

When I try to have a look at the output above, I see this error:
INFO[perf_analyzer.py:264] Running perf_analyzer ['perf_analyzer', '-m', 'add_sub_config_8', '-b', '1', '-u', 'localhost:8001', '-i', 'grpc', '-f', 'add_sub_config_8', '--verbose-csv', '--concurrency-range', '1', '--measurement-mode', 'count_windows'] failed with exit status 1 : perf_analyzer: unrecognized option '--verbose-csv'
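--verbose-csv was added to perf_analyzer after some of the older SDK releases, so this usually means the perf_analyzer binary on the PATH is older than the model-analyzer driving it. A quick check, and the two obvious remedies: use an SDK container whose release matches the installed model-analyzer, or point the perf_analyzer_path option at a newer binary.

perf_analyzer --help 2>&1 | grep -c 'verbose-csv'   # 0 means this perf_analyzer predates the flag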

Not able to provide <perf-analyzer-flags> 'shape' for perf_analyzer in config.yaml, results in termination

Hi,

Following the perf-analyzer-flags documentation, I have added two flags for perf_analyzer as shown below, since I have a model with dynamic input shapes. --shape images:3,640,640 works when used with perf_analyzer directly.

But adding to my config.yaml:

perf_analyzer_flags: percentile: 95 shape: 'images:3,640,640'

or:
perf_analyzer_flags: percentile: 95 shape: images:3,640,640

Results in a termination:
ERROR[perf_analyzer.py:164] perf_analyzer was terminated by signal: SIGABRT

When I remove the shape flag, that error is gone, but I still need to provide the input shape, since otherwise it fails with:
failed with exit status 1 : error: failed to create concurrency manager: input images contain a dynamic shape, provide shapes to send along with the request

So I am not sure how to add images:3,640,640 to the shape flag.
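The examples above put perf_analyzer_flags and its values on a single line, which YAML reads as one scalar rather than a nested mapping. Based on the perf-analyzer-flags documentation referenced above, I would expect the block form below to be what Model Analyzer parses (quoting the shape value keeps YAML from tripping over the colon); treat it as a sketch to verify:

perf_analyzer_flags:
    percentile: 95
    shape: 'images:3,640,640'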

Model Analyzer on a remote host and in Docker

Can you please help me with this issue?

I have generated models that work with Triton 20.09 in a standalone Triton Inference Server container. I have built the model analyzer, which by default supports Triton 20.11. When I pass models and plugins generated with 20.09, loading them in the model analyzer fails, since the model analyzer runs 20.11. On the other hand, when I generate the models and plugins with the 20.11 TensorRT NGC container and load them into the 20.11 model analyzer, it runs without any issue. My requirement is to load the models and plugins generated for 20.09 into the model analyzer.

Running

sudo docker run --gpus 1 --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -ti -v /var/run/docker.sock:/var/run/docker.sock --net host --privileged -v /home/ubuntu/cuda/sec_models:/models -v /home/ubuntu/cuda/plugins/:/plugins --env LD_PRELOAD="/plugins/libyolo_layer.so:/plugins/libdecodeplugin.so" triton_modelanalyzer bash

The models and plugins given in the above command were generated for 20.09-py3. The models load fine in the 20.09-py3 Triton Inference Server.

Command inside the docker.

model-analyzer -m /models/ -n yolo1 --batch-size 1 -c 1 --triton-launch-mode docker --triton-version 20.09-py3

Error

model-analyzer -m /models/ -n yolo1 --batch-size 1 -c 1 --triton-launch-mode docker --triton-version 20.09-py3
2021-01-23 19:39:10.854 INFO[entrypoint.py:368] Triton Model Analyzer started: config={'model_repository': '/models/', 'model_names': 'yolo1', 'batch_sizes': '1', 'concurrency': '1', 'export': None, 'export_path': '.', 'filename_model_inference': 'metrics-model-inference.csv', 'filename_model_gpu': 'metrics-model-gpu.csv', 'filename_server_only': 'metrics-server-only.csv', 'max_retries': 100, 'duration_seconds': 5.0, 'monitoring_interval': 0.01, 'client_protocol': 'grpc', 'perf_analyzer_path': 'perf_analyzer', 'perf_measurement_window': 5000, 'no_perf_output': None, 'triton_launch_mode': 'docker', 'triton_version': '20.09-py3', 'log_level': 'INFO', 'triton_http_endpoint': 'localhost:8000', 'triton_grpc_endpoint': 'localhost:8001', 'triton_metrics_url': 'http://localhost:8002/metrics', 'triton_server_path': 'tritonserver', 'triton_output_path': None, 'gpus': ['all'], 'config_file': None}
2021-01-23 19:39:10.859 INFO[entrypoint.py:105] Starting a Triton Server using docker...
2021-01-23 19:39:10.859 INFO[driver.py:236] init
2021-01-23 19:39:13.687 INFO[entrypoint.py:209] Triton Server is ready.
2021-01-23 19:39:14.714 INFO[entrypoint.py:383] Starting perf_analyzer...
2021-01-23 19:39:14.714 INFO[analyzer.py:91] Profiling server only metrics...
2021-01-23 19:39:15.737 INFO[monitor.py:74] Using GPU(s) with UUID(s) = { GPU-5df6aea1-a690-25ee-c16e-bd46a1d95792 } for the analysis.
2021-01-23 19:39:21.852 ERROR[entrypoint.py:387] Model Analyzer encountered an error: Unable to load the model : [StatusCode.INTERNAL] failed to load 'yolo1', no version is available
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/triton/client/client.py", line 79, in load_model
    self._client.load_model(model.name())
  File "/usr/local/lib/python3.6/dist-packages/tritonclient/grpc/__init__.py", line 555, in load_model
    raise_error_grpc(rpc_error)
  File "/usr/local/lib/python3.6/dist-packages/tritonclient/grpc/__init__.py", line 61, in raise_error_grpc
    raise get_error_grpc(rpc_error) from None
tritonclient.utils.InferenceServerException: [StatusCode.INTERNAL] failed to load 'yolo1', no version is available

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/entrypoint.py", line 384, in main
    run_analyzer(config, analyzer, client, run_configs)
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/entrypoint.py", line 323, in run_analyzer
    client.load_model(model=model)
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/triton/client/client.py", line 82, in load_model
    f"Unable to load the model : {e}")
model_analyzer.model_analyzer_exceptions.TritonModelAnalyzerException: Unable to load the model : [StatusCode.INTERNAL] failed to load 'yolo1', no version is available
2021-01-23 19:39:21.854 INFO[server_docker.py:128] Stopping triton server.

Also, how do we run Docker in remote mode?

Standalone inference server (20.09-py3)

sudo docker run --gpus all --rm --shm-size=1g --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v /var/run/docker.sock:/var/run/docker.sock --net host --privileged -v/home/ubuntu/cuda/sec_models:/models -v/home/ubuntu/cuda/plugins/:/plugins --env LD_PRELOAD="/plugins/libyolo_layer.so:/plugins/libdecodeplugin.so" bdb0cbe1c039 tritonserver --model-repository=/models --grpc-infer-allocation-pool-size=512 --log-verbose 1

Output

I0123 19:44:29.564053 1 grpc_server.cc:2078] Thread started for ModelStreamInferHandler
I0123 19:44:29.564070 1 grpc_server.cc:3897] Started GRPCInferenceService at 0.0.0.0:8001
I0123 19:44:29.564351 1 http_server.cc:2705] Started HTTPService at 0.0.0.0:8000
I0123 19:44:29.605837 1 http_server.cc:2724] Started Metrics Service at 0.0.0.0:8002

Model-analyser command

sudo docker run --gpus 1 --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -ti -v /var/run/docker.sock:/var/run/docker.sock --net host --privileged -v /home/ubuntu/cuda/sec_models:/models -v /home/ubuntu/cuda/plugins/:/plugins --env LD_PRELOAD="/plugins/libyolo_layer.so:/plugins/libdecodeplugin.so" triton_modelanalyzer bash

inside docker

model-analyzer -m /models/ -n yolo1 --batch-size 1 -c 1 --triton-launch-mode remote --triton-grpc-endpoint localhost:8001
2021-01-23 19:53:10.191 INFO[entrypoint.py:368] Triton Model Analyzer started: config={'model_repository': '/models/', 'model_names': 'yolo1', 'batch_sizes': '1', 'concurrency': '1', 'export': None, 'export_path': '.', 'filename_model_inference': 'metrics-model-inference.csv', 'filename_model_gpu': 'metrics-model-gpu.csv', 'filename_server_only': 'metrics-server-only.csv', 'max_retries': 100, 'duration_seconds': 5.0, 'monitoring_interval': 0.01, 'client_protocol': 'grpc', 'perf_analyzer_path': 'perf_analyzer', 'perf_measurement_window': 5000, 'no_perf_output': None, 'triton_launch_mode': 'remote', 'triton_version': '20.11-py3', 'log_level': 'INFO', 'triton_http_endpoint': 'localhost:8000', 'triton_grpc_endpoint': 'localhost:8001', 'triton_metrics_url': 'http://localhost:8002/metrics', 'triton_server_path': 'tritonserver', 'triton_output_path': None, 'gpus': ['all'], 'config_file': None}
2021-01-23 19:53:10.197 INFO[entrypoint.py:84] Using remote Triton Server...
2021-01-23 19:53:10.199 INFO[entrypoint.py:209] Triton Server is ready.
2021-01-23 19:53:10.199 INFO[driver.py:236] init
2021-01-23 19:53:11.299 INFO[entrypoint.py:383] Starting perf_analyzer...
2021-01-23 19:53:11.299 INFO[analyzer.py:91] Profiling server only metrics...
2021-01-23 19:53:12.323 INFO[monitor.py:74] Using GPU(s) with UUID(s) = { GPU-5df6aea1-a690-25ee-c16e-bd46a1d95792 } for the analysis.
2021-01-23 19:53:18.438 ERROR[entrypoint.py:387] Model Analyzer encountered an error: Unable to load the model : [StatusCode.UNAVAILABLE] explicit model load / unload is not allowed if polling is enabled
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/triton/client/client.py", line 79, in load_model
    self._client.load_model(model.name())
  File "/usr/local/lib/python3.6/dist-packages/tritonclient/grpc/__init__.py", line 555, in load_model
    raise_error_grpc(rpc_error)
  File "/usr/local/lib/python3.6/dist-packages/tritonclient/grpc/__init__.py", line 61, in raise_error_grpc
    raise get_error_grpc(rpc_error) from None
tritonclient.utils.InferenceServerException: [StatusCode.UNAVAILABLE] explicit model load / unload is not allowed if polling is enabled

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/entrypoint.py", line 384, in main
    run_analyzer(config, analyzer, client, run_configs)
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/entrypoint.py", line 323, in run_analyzer
    client.load_model(model=model)
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/triton/client/client.py", line 82, in load_model
    f"Unable to load the model : {e}")
model_analyzer.model_analyzer_exceptions.TritonModelAnalyzerException: Unable to load the model : [StatusCode.UNAVAILABLE] explicit model load / unload is not allowed if polling is enabled
root@tensorgo-rppg:/opt/triton-model-analyzer# 
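Model Analyzer loads and unloads each model variant through Triton's model-control API, which the server only allows in explicit model-control mode; the standalone 20.09 server above runs with the default polling mode, hence the [StatusCode.UNAVAILABLE] error. Restarting the remote server with explicit control enabled should let remote mode proceed (same tritonserver invocation as above, one flag added):

tritonserver --model-repository=/models --model-control-mode=explicit --grpc-infer-allocation-pool-size=512 --log-verbose 1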


Error when running model-analyzer in remote mode

I'm trying to run model-analyzer remotely for my custom model but it is failing with the following error:

2021-12-03 20:16:15.587 ERROR[entrypoint.py:214] Model Analyzer encountered an error: Failed to set the value for field "triton_server_path". Error: Either the binary 'tritonserver' is not on the PATH, or Model Analyzer does not have permissions to execute os.stat on this path.

I cannot specify triton_server_path because I run model-analyzer in remote mode, with tritonserver and model-analyzer in different containers.
I take the following steps to reproduce the error:

  1. Run tritonserver container on custom ports.
  2. Run sdk container and then run command:

Singularity> model-analyzer profile --model-repository=/models --profile-models=detector_trt --triton-launch-mode=remote --triton-http-endpoint=localhost:5000 --triton-grpc-endpoint=localhost:5001 --triton-metrics-url=localhost:5002

I use Singularity to run the containers, but that shouldn't matter.

Error raised when installing model_analyzer

E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/c/cheese/libcheese-gtk25_3.34.0-1ubuntu1_amd64.deb 502 Server UnReachable [IP: 91.189.88.142 80]
E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/u/ubuntu-docs/ubuntu-docs_20.04.3_all.deb 502 Server UnReachable [IP: 91.189.88.152 80]
E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/g/gnome-user-docs/gnome-user-docs_3.36.2+git20200704-0ubuntu0.1_all.deb 502 Server UnReachable [IP: 91.189.88.152 80]
E: Failed to fetch http://security.ubuntu.com/ubuntu/pool/main/g/gst-plugins-base1.0/gstreamer1.0-gl_1.16.2-4ubuntu0.1_amd64.deb 502 Server UnReachable [IP: 91.189.88.152 80]

I think this happens because the build installs some packages from the Ubuntu archives. I used --fix-missing to ignore the failures, but that may introduce other bugs.

Can you provide a new Docker image version to fix this problem?

quickstart example failed `no version is available`

background

Trying to run model_analyzer as described in the quick start.
I have the same issue as a previous issue, though the solution explained there did not work for me.

Steps:

  1. Run docker, mapping all directories:
docker run -it --rm --gpus all         -v /var/run/docker.sock:/var/run/docker.sock         -v $HOME/model_analyzer/examples/quick-start:/quick_start_repository    -v /home/shai/output:/home/shai/output     --net=host --name model-analyzer   nvcr.io/nvidia/tritonserver:21.08-py3-sdk /bin/bash
  2. In the container bash, run model-analyzer profile:
model-analyzer profile -m /quick_start_repository/ --profile-models add_sub --triton-launch-mode=docker --output-model-repository /home/shai/output/model_output --override-output-model-repository

results:

2021-09-09 09:15:32.428 INFO[entrypoint.py:113] Starting a Triton Server using docker...
2021-09-09 09:15:33.561 INFO[analyzer_state_manager.py:130] No checkpoint file found, starting a fresh run.
2021-09-09 09:15:33.561 INFO[analyzer.py:81] Profiling server only metrics...
2021-09-09 09:15:38.975 INFO[gpu_monitor.py:73] Using GPU(s) with UUID(s) = { GPU-52faa4b8-dd4c-0ab4-6676-9f39263202b8 } for profiling.
2021-09-09 09:15:39.998 INFO[server_docker.py:162] Stopping triton server.
2021-09-09 09:15:43.250 INFO[model_manager.py:87] Running auto config search for model: add_sub
2021-09-09 09:15:43.251 INFO[run_search.py:144] Will sweep both the concurrency and model config parameters...
2021-09-09 09:15:43.251 INFO[run_search.py:290] [Search Step] Concurrency set to 1. Instance count set to 1, and dynamic batching is disabled.
2021-09-09 09:15:46.61 INFO[client.py:80] Model add_sub_i0 load failed: [StatusCode.INTERNAL] failed to load 'add_sub_i0', no version is available
2021-09-09 09:15:46.62 INFO[server_docker.py:162] Stopping triton server.
2021-09-09 09:15:47.685 INFO[run_search.py:290] [Search Step] Concurrency set to 1. Instance count set to 2, and dynamic batching is disabled.
2021-09-09 09:15:50.555 INFO[client.py:80] Model add_sub_i1 load failed: [StatusCode.INTERNAL] failed to load 'add_sub_i1', no version is available
2021-09-09 09:15:50.555 INFO[server_docker.py:162] Stopping triton server.
2021-09-09 09:15:52.106 INFO[run_search.py:290] [Search Step] Concurrency set to 1. Instance count set to 3, and dynamic batching is disabled.
2021-09-09 09:15:54.989 INFO[client.py:80] Model add_sub_i2 load failed: [StatusCode.INTERNAL] failed to load 'add_sub_i2', no version is available
2021-09-09 09:15:54.989 INFO[server_docker.py:162] Stopping triton server.

and no profiling results are found.
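In docker launch mode the Triton container is started through the host Docker daemon, so every path Model Analyzer hands to it (the model repository and the output model repository) has to exist at the same absolute path on the host as inside the SDK container. A hedged sketch of the run, mounting both repositories at identical host/container paths (the /home/shai paths follow the example above and are placeholders):

docker run -it --rm --gpus all \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /home/shai/model_analyzer/examples/quick-start:/home/shai/model_analyzer/examples/quick-start \
    -v /home/shai/output:/home/shai/output \
    --net=host nvcr.io/nvidia/tritonserver:21.08-py3-sdk /bin/bash
# inside the container, use those same absolute paths
model-analyzer profile -m /home/shai/model_analyzer/examples/quick-start --profile-models add_sub \
    --triton-launch-mode=docker --output-model-repository /home/shai/output/model_output \
    --override-output-model-repository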

file system loop detected

Hello, what might be the reason for the following error?

$ docker run -it --rm --gpus all         -v /var/run/docker.sock:/var/run/docker.sock         -v $HOME/model_analyzer/examples/quick-start:/quick_start_repository         --net=host --name model-analyzer         model-analyzer /bin/bash
=============================
=== Triton Model Analyzer ===
=============================
NVIDIA Release 21.10 (build 28453983)
Copyright (c) 2020-2021, NVIDIA CORPORATION.  All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
find: File system loop detected; '/usr/bin/X11' is part of the same file system loop as '/usr/bin'.

failed to load 'add_sub_ixx', no version is available

I run the quick start sample in docker launch mode, but I always get the following exceptions:

root@dl:/inference/model_analyzer/examples# model-analyzer profile -m $PWD/quick-start/ --profile-models add_sub --triton-launch-mode docker --override-output-model-repository

2021-06-10 06:44:29.890 WARNING[entrypoint.py:288] Overriding the output model repo path "./output_model_repository"...
2021-06-10 06:44:29.894 INFO[entrypoint.py:111] Starting a Triton Server using docker...
2021-06-10 06:44:37.172 INFO[server_docker.py:128] Stopping triton server.
2021-06-10 06:44:38.363 INFO[analyzer_state_manager.py:103] Loaded checkpoint from file ./0.ckpt
2021-06-10 06:44:38.364 INFO[analyzer.py:82] Profiling server only metrics...
2021-06-10 06:44:43.623 INFO[gpu_monitor.py:72] Using GPU(s) with UUID(s) = { GPU-4bea21a5-cc1a-c29f-2e2e-206c9b866a5f } for profiling.
2021-06-10 06:44:44.649 INFO[server_docker.py:128] Stopping triton server.
2021-06-10 06:44:45.879 INFO[run_search.py:146] Will sweep both the concurrency and model config parameters...
2021-06-10 06:44:45.879 INFO[run_search.py:289] Concurrency set to 1. Instance count set to 1, and dynamic batching is disabled.
2021-06-10 06:44:50.763 INFO[client.py:82] Model add_sub_i0 load failed: [StatusCode.INTERNAL] failed to load 'add_sub_i0', no version is available
2021-06-10 06:44:50.763 INFO[server_docker.py:128] Stopping triton server.
2021-06-10 06:44:51.983 INFO[run_search.py:289] Concurrency set to 1. Instance count set to 2, and dynamic batching is disabled.
2021-06-10 06:44:56.424 INFO[client.py:82] Model add_sub_i1 load failed: [StatusCode.INTERNAL] failed to load 'add_sub_i1', no version is available
2021-06-10 06:44:56.424 INFO[server_docker.py:128] Stopping triton server.
2021-06-10 06:44:57.777 INFO[run_search.py:289] Concurrency set to 1. Instance count set to 3, and dynamic batching is disabled.
2021-06-10 06:45:02.432 INFO[client.py:82] Model add_sub_i2 load failed: [StatusCode.INTERNAL] failed to load 'add_sub_i2', no version is available
2021-06-10 06:45:02.432 INFO[server_docker.py:128] Stopping triton server.
2021-06-10 06:45:03.712 INFO[run_search.py:289] Concurrency set to 1. Instance count set to 4, and dynamic batching is disabled.
2021-06-10 06:45:08.401 INFO[client.py:82] Model add_sub_i3 load failed: [StatusCode.INTERNAL] failed to load 'add_sub_i3', no version is available
2021-06-10 06:45:08.402 INFO[server_docker.py:128] Stopping triton server.
2021-06-10 06:45:09.645 INFO[run_search.py:289] Concurrency set to 1. Instance count set to 5, and dynamic batching is disabled.
2021-06-10 06:45:14.161 INFO[client.py:82] Model add_sub_i4 load failed: [StatusCode.INTERNAL] failed to load 'add_sub_i4', no version is available
2021-06-10 06:45:14.162 INFO[server_docker.py:128] Stopping triton server.
2021-06-10 06:45:15.452 INFO[run_search.py:289] Concurrency set to 1. Instance count set to 1, and dynamic batching is enabled.
2021-06-10 06:45:20.223 INFO[client.py:82] Model add_sub_i5 load failed: [StatusCode.INTERNAL] failed to load 'add_sub_i5', no version is available
2021-06-10 06:45:20.223 INFO[server_docker.py:128] Stopping triton server.
2021-06-10 06:45:21.523 INFO[run_search.py:289] Concurrency set to 1. Instance count set to 2, and dynamic batching is enabled.
2021-06-10 06:45:26.374 INFO[client.py:82] Model add_sub_i6 load failed: [StatusCode.INTERNAL] failed to load 'add_sub_i6', no version is available
2021-06-10 06:45:26.374 INFO[server_docker.py:128] Stopping triton server.
2021-06-10 06:45:27.612 INFO[run_search.py:289] Concurrency set to 1. Instance count set to 3, and dynamic batching is enabled.
2021-06-10 06:45:32.498 INFO[client.py:82] Model add_sub_i7 load failed: [StatusCode.INTERNAL] failed to load 'add_sub_i7', no version is available
2021-06-10 06:45:32.498 INFO[server_docker.py:128] Stopping triton server.
2021-06-10 06:45:33.775 INFO[run_search.py:289] Concurrency set to 1. Instance count set to 4, and dynamic batching is enabled.
2021-06-10 06:45:38.674 INFO[client.py:82] Model add_sub_i8 load failed: [StatusCode.INTERNAL] failed to load 'add_sub_i8', no version is available
2021-06-10 06:45:38.675 INFO[server_docker.py:128] Stopping triton server.
2021-06-10 06:45:39.847 INFO[run_search.py:289] Concurrency set to 1. Instance count set to 5, and dynamic batching is enabled.
2021-06-10 06:45:44.549 INFO[client.py:82] Model add_sub_i9 load failed: [StatusCode.INTERNAL] failed to load 'add_sub_i9', no version is available
2021-06-10 06:45:44.549 INFO[server_docker.py:128] Stopping triton server.
2021-06-10 06:45:45.786 INFO[run_search.py:289] Concurrency set to 1. Instance count set to 1, and preferred batch size is set to 1.
2021-06-10 06:45:50.210 INFO[client.py:82] Model add_sub_i10 load failed: [StatusCode.INTERNAL] failed to load 'add_sub_i10', no version is available
2021-06-10 06:45:50.210 INFO[server_docker.py:128] Stopping triton server.
2021-06-10 06:45:51.510 INFO[run_search.py:289] Concurrency set to 1. Instance count set to 2, and preferred batch size is set to 1.
2021-06-10 06:45:56.361 INFO[client.py:82] Model add_sub_i11 load failed: [StatusCode.INTERNAL] failed to load 'add_sub_i11', no version is available
2021-06-10 06:45:56.361 INFO[server_docker.py:128] Stopping triton server.
2021-06-10 06:45:57.734 INFO[run_search.py:289] Concurrency set to 1. Instance count set to 3, and preferred batch size is set to 1.
2021-06-10 06:46:02.310 INFO[client.py:82] Model add_sub_i12 load failed: [StatusCode.INTERNAL] failed to load 'add_sub_i12', no version is available
2021-06-10 06:46:02.310 INFO[server_docker.py:128] Stopping triton server.

Unable to analyze DistilBERT base model with model analyzer

Issue

Hello, my team uses the DistilBERT base model (uncased) as the base model for development. I would like to use model analyzer to check the model's performance, and I have confirmed that the analyzer loads the model with Triton. But after being stuck for a while, there is an error saying "perf_analyzer took very long to exit, killing perf_analyzer...".

2021-10-06 21:50:03.203 INFO[entrypoint.py:117] Starting a Triton Server using docker...
2021-10-06 21:50:03.237 INFO[analyzer_state_manager.py:120] Loaded checkpoint from file ./checkpoints/0.ckpt
2021-10-06 21:50:03.238 INFO[analyzer.py:104] Profiling server only metrics...
2021-10-06 21:50:04.845 INFO[server_docker.py:135] Triton Server started.
2021-10-06 21:50:12.844 INFO[server_docker.py:181] Stopped Triton Server.
2021-10-06 21:50:12.846 INFO[model_manager.py:91] Running auto config search for model: test
2021-10-06 21:50:12.847 INFO[run_search.py:146] Will sweep both the concurrency and model config parameters...
2021-10-06 21:50:12.847 INFO[run_search.py:292] [Search Step] Concurrency set to 1. Instance count set to 1, and dynamic batching is disabled.
2021-10-06 21:50:14.575 INFO[server_docker.py:135] Triton Server started.
2021-10-06 21:50:21.396 INFO[client.py:83] Model test_i0 loaded.
2021-10-06 21:50:21.404 INFO[model_manager.py:221] Profiling model test_i0...
2021-10-06 21:50:21.404 WARNING[metrics_manager.py:179] CPU metric(s) are being collected.
2021-10-06 21:50:21.404 WARNING[metrics_manager.py:180] Collecting CPU metric(s) can affect the latency or throughput numbers reported by perf analyzer.
2021-10-06 21:50:21.405 INFO[metrics_manager.py:183] CPU metric(s) collection can be disabled by removing the CPU metrics (e.g. cpu_used_ram) from the --metrics flag.
2021-10-06 22:00:22.817 INFO[perf_analyzer.py:214] perf_analyzer took very long to exit, killing perf_analyzer...
2021-10-06 22:00:24.535 INFO[server_docker.py:181] Stopped Triton Server.
2021-10-06 22:00:24.540 INFO[run_search.py:292] [Search Step] Concurrency set to 1. Instance count set to 2, and dynamic batching is disabled.
2021-10-06 22:00:25.572 INFO[server_docker.py:135] Triton Server started.
2021-10-06 22:00:33.417 INFO[client.py:83] Model test_i1 loaded.
...

This happens with every config variation that the analyzer automatically generates. Any idea why this happens and how to fix it?
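For reference, Model Analyzer exposes a perf_analyzer_timeout option (it appears, with a default of 600, in the config dump of an older run further down this page, which matches the roughly ten-minute gap in the log above). A minimal config.yaml sketch that raises it, assuming the option name and unit are unchanged in the 21.09 release:

model_repository: /models
profile_models:
  test:
    cpu_only: true
# assumption: option name and seconds unit match the older config dump on this page
perf_analyzer_timeout: 1800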

To reproduce

  1. Use the following code to download the model and convert it to a TorchScript model through tracing, inside the NGC PyTorch container (version 21.09).
import torch
from transformers import DistilBertForSequenceClassification, DistilBertTokenizerFast

# Load the pretrained classifier and disable dict-style outputs so tracing returns plain tensors
model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")
model.config.return_dict = False

inputs = ["test"]
tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased",
                                                    model_max_length=512,
                                                    torchscript=True)

encodings = tokenizer(inputs, truncation=True, padding=True, return_tensors="pt")

# Trace with example inputs and save the TorchScript module
traced_model = torch.jit.trace(model, (encodings["input_ids"], encodings["attention_mask"]))
torch.jit.save(traced_model, "model.pt")
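As a minimal sketch of where the traced file is assumed to land (paths follow the mount used in steps 2-3 below; the 1/ directory and model.pt filename are Triton's defaults for pytorch_libtorch, not something stated in this issue):

mkdir -p /home/workspace/model_repository_dev/test/1
cp model.pt /home/workspace/model_repository_dev/test/1/model.pt
# config.pbtxt (shown further below) sits next to the version folder:
#   /home/workspace/model_repository_dev/test/config.pbtxt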
  2. Run the Triton SDK container:
docker run -it \
      -v /var/run/docker.sock:/var/run/docker.sock \
      -v /home/workspace/model_repository_dev:/models \
      -v /home/workspace/output:/home/workspace/output \
      --net=host nvcr.io/nvidia/tritonserver:21.09-py3-sdk
  3. Inside the container, run Model Analyzer:
model-analyzer profile -m /models/ --config-file="/models/config.yaml" --triton-launch-mode=docker --override-output-model-repository --output-model-repository-path="/home/workspace/output/model_output" --collect-cpu-metrics=true

Here is the content of the config.yaml

model_repository: /models/
profile_models:
  test:
    cpu_only: true

Here is the content of the config.pbtxt

name: "test"
platform: "pytorch_libtorch"
input [
  {
    name: "INPUT__0"
    data_type: TYPE_INT32
    dims: [1, 512]
  },
  {
    name: "INPUT__1"
    data_type: TYPE_INT32
    dims: [1, 512]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [1, 662]
  }
]

We currently do not have GPU resources available, so we expect to do the testing on CPU.
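For reference, a hedged sketch of constraining the auto search to CPU instance groups through Model Analyzer's model_config_parameters (the same instance_group override mechanism used in a later issue on this page; the counts are placeholders):

profile_models:
  test:
    cpu_only: true
    model_config_parameters:
      instance_group:
        -
          kind: KIND_CPU
          # placeholder counts to sweep over
          count: [1, 2]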

Error when starting the Triton server

@igobypenn
The model.plan was converted in the 20.11-py3 container. I can start the model with the 20.11-py3 container, but I cannot load models with the model_analyzer container.

Note: I modified some code to print the error message.

When I run model-analyzer -m /workspace/examples/ -n fcos_lift_6c:

root@2fb036737861:/workspace/examples/fcos_lift_6c# model-analyzer -m /workspace/examples/ -n fcos_lift_6c --triton-launch-mode local
2020-12-29 11:41:44.849 INFO[entrypoint.py:360] Triton Model Analyzer started Namespace(batch_sizes='1', client_protocol='grpc', concurrency='1', duration_seconds=5, export=False, export_path='.', filename_model='metrics-model.csv', filename_server_only='metrics-server-only.csv', gpus=['all'], log_level='INFO', max_retries=100, model_names='fcos_lift_6c', model_repository='/workspace/examples/', monitoring_interval=0.01, perf_analyzer_path='perf_analyzer', triton_grpc_endpoint='localhost:8001', triton_http_endpoint='localhost:8000', triton_launch_mode='local', triton_metrics_url='http://localhost:8002/metrics', triton_output_path=None, triton_server_path='tritonserver', triton_version='20.11-py3') arguments
2020-12-29 11:41:44.854 INFO[entrypoint.py:93] Starting a local Triton Server...
2020-12-29 11:41:44.863 INFO[server_local.py:62] Triton Server started.
2020-12-29 11:41:44.867 INFO[entrypoint.py:208] Triton Server is ready.
2020-12-29 11:41:44.868 INFO[driver.py:236] init
2020-12-29 11:41:45.982 INFO[analyzer.py:87] Profiling server only metrics...
2020-12-29 11:41:47.20 INFO[monitor.py:74] Using GPU(s) with UUID(s) = { GPU-a09e2347-cecb-8a34-a15d-50641a837031 } for the analysis.
Traceback (most recent call last):
  File "/opt/triton-model-analyzer/model_analyzer/triton/client/client.py", line 79, in load_model
    self._client.load_model(model.name())
  File "/usr/local/lib/python3.6/dist-packages/tritonclient/grpc/__init__.py", line 555, in load_model
    raise_error_grpc(rpc_error)
  File "/usr/local/lib/python3.6/dist-packages/tritonclient/grpc/__init__.py", line 61, in raise_error_grpc
    raise get_error_grpc(rpc_error) from None
tritonclient.utils.InferenceServerException: [StatusCode.INTERNAL] failed to load 'fcos_lift_6c', no version is available

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/model-analyzer", line 33, in <module>
    sys.exit(load_entry_point('nvidia-triton-model-analyzer', 'console_scripts', 'model-analyzer')())
  File "/opt/triton-model-analyzer/model_analyzer/entrypoint.py", line 366, in main
    run_analyzer(args, analyzer, client, run_configs)
  File "/opt/triton-model-analyzer/model_analyzer/entrypoint.py", line 318, in run_analyzer
    client.load_model(model=model)
  File "/opt/triton-model-analyzer/model_analyzer/triton/client/client.py", line 82, in load_model
    f"Unable to load the model : {e}")
model_analyzer.model_analyzer_exceptions.TritonModelAnalyzerException: Unable to load the model : [StatusCode.INTERNAL] failed to load 'fcos_lift_6c', no version is available

Does the model analyzer tool support Jetson Nano?

The model analyzer is an amazing tool; it is very useful when we do model performance analysis.
I am deploying Triton Inference Server on a Jetson Nano and on Windows 10, so I tried to install the model analyzer on my Jetson Nano board, and it always fails.
So I want to check: does it support Jetson Nano? And does it support Windows 10?
Thanks a lot in advance!

Can't run .pb file on model_analyzer

Hello, I'm new to using model_analyzer.

I got results from model_analyzer by running the QuickStart example, but I was not able to get results for the .pb file models in particular.
I also checked that I can execute perf_analyzer with these models; however, it is not possible to run them in model-analyzer with the same settings.

Additionally, I have a question about the results of model analyzer.

  1. First, I ran these commands to see the results of the QuickStart.
    (1) Run the Triton Inference Server first
    docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v /home/model_analyzer/examples/quick-start:/home/model_analyzer/examples/quick-start nvcr.io/nvidia/tritonserver:20.11-py3 tritonserver --model-control-mode=explicit --model-repository=/home/model_analyzer/examples/quick-start/
    (2) Open the model-analyzer container in Docker
    docker run -it --privileged --rm --gpus all \
          -v /var/run/docker.sock:/var/run/docker.sock \
          -v /home/model_analyzer/examples/quick-start:/home/model_analyzer/examples/quick-start \
          --net=host --name model-analyzer \
          model-analyzer /bin/bash
    (3)
    model-analyzer -m /home/model_analyzer/examples/quick-start -n add_sub --triton-launch-mode=remote --export-path=analysis_results

(4) Results
Server Only:

| Model | GPU ID | GPU Memory Usage (MB) | GPU Utilization (%) | GPU Power Usage (W) |
|---|---|---|---|---|
| triton-server | 0 | 626.0 | 0.0 | 15.0 |

Models (Inference):

| Model | Batch | Concurrency | Model Config Path | Instance Group | Dynamic Batcher Sizes | Satisfies Constraints | Throughput (infer/sec) | p99 Latency (ms) | RAM Usage (MB) |
|---|---|---|---|---|---|---|---|---|---|
| add_sub | 1 | 16 | add_sub | 1/GPU | Disabled | Yes | 12387.4 | 1.5 | 0.0 |
| add_sub | 1 | 4 | add_sub | 1/GPU | Disabled | Yes | 12164.8 | 0.4 | 0.0 |
| add_sub | 1 | 8 | add_sub | 1/GPU | Disabled | Yes | 12083.8 | 0.7 | 0.0 |
| add_sub | 1 | 2 | add_sub | 1/GPU | Disabled | Yes | 11931.6 | 0.2 | 0.0 |
| add_sub | 1 | 1 | add_sub | 1/GPU | Disabled | Yes | 3057.6 | 1.4 | 0.0 |

Models (GPU Metrics):

| Model | GPU ID | Batch | Concurrency | Model Config Path | Instance Group | Dynamic Batcher Sizes | Satisfies Constraints | GPU Memory Usage (MB) | GPU Utilization (%) | GPU Power Usage (W) |
|---|---|---|---|---|---|---|---|---|---|---|
| add_sub | 0 | 1 | 16 | add_sub | 1/GPU | Disabled | Yes | 624.0 | 8.2 | 31.8 |
| add_sub | 0 | 1 | 4 | add_sub | 1/GPU | Disabled | Yes | 624.0 | 8.1 | 31.5 |
| add_sub | 0 | 1 | 8 | add_sub | 1/GPU | Disabled | Yes | 624.0 | 8.1 | 31.7 |
| add_sub | 0 | 1 | 2 | add_sub | 1/GPU | Disabled | Yes | 624.0 | 8.1 | 31.6 |
| add_sub | 0 | 1 | 1 | add_sub | 1/GPU | Disabled | Yes | 624.0 | 2.4 | 30.9 |

What I want to know is how the GPU memory usage and GPU utilization can be almost the same from concurrency 2 to concurrency 16. I thought that concurrency means executing many models in parallel. Could you explain concurrency in more detail, please?
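For reference, in perf_analyzer (which Model Analyzer drives underneath), concurrency is the number of inference requests kept outstanding against the already-loaded model, not the number of model instances, so GPU memory stays roughly constant and utilization plateaus once the single instance is saturated. A rough way to observe this directly (the model name is the quick-start one; the range values are placeholders):

# start:end:step sweep over outstanding requests
perf_analyzer -m add_sub --concurrency-range 1:16:2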

  2. I want to run other models on model-analyzer. Each model folder has a '1' folder containing the model.pb file, plus config.pbtxt and output_labels.txt. These model folders are in the same directory as the QuickStart.
    ('add_sub', 'apple', 'new' location: /home/model_analyzer/examples/quick-start)
    My custom model is in the 'apple' folder and the simple_identity model is in the 'new' folder.

The config file (config.pbtxt) is shown below:

`name: "simple_identity"
platform: "tensorflow_savedmodel"
max_batch_size: 8

input [
{
name: "INPUT0"
data_type: TYPE_STRING
dims: [ -1 ]
}
]
output [
{
name: "OUTPUT0"
data_type: TYPE_STRING
dims: [ -1 ]
label_filename: "output0_labels.txt"
}
]`
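For reference, a sketch of the repository layout Triton conventionally expects for this platform (the version-folder contents below are the backend defaults, not taken from this issue: tensorflow_savedmodel looks for a model.savedmodel directory, while a bare frozen-graph .pb would normally use the tensorflow_graphdef platform with the default filename model.graphdef):

/home/model_analyzer/examples/quick-start/
  new/
    config.pbtxt
    output0_labels.txt
    1/
      model.savedmodel/
        saved_model.pb
        variables/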

The error message that I got was:
root@d1:/# model-analyzer -m /home/model_analyzer/examples/quick-start -n new --triton-launch-mode=remote --export-path=analysis_results
2021-04-08 07:49:53.278 INFO[entrypoint.py:288] Triton Model Analyzer started: config={'model_repository': '/home/model_analyzer/examples/quick-start', 'model_names': [{'model_name': 'new', 'objectives': {'perf_throughput': 10}, 'parameters': {'batch_sizes': [1], 'concurrency': []}}], 'objectives': {'perf_throughput': 10}, 'constraints': {}, 'batch_sizes': [1], 'concurrency': [], 'perf_analyzer_timeout': 600, 'perf_analyzer_cpu_util': 80.0, 'run_config_search_max_concurrency': 1024, 'run_config_search_max_instance_count': 5, 'run_config_search_disable': False, 'run_config_search_max_preferred_batch_size': 16, 'export': True, 'export_path': 'analysis_results', 'summarize': True, 'filename_model_inference': 'metrics-model-inference.csv', 'filename_model_gpu': 'metrics-model-gpu.csv', 'filename_server_only': 'metrics-server-only.csv', 'max_retries': 100, 'duration_seconds': 5, 'monitoring_interval': 0.01, 'client_protocol': 'grpc', 'perf_analyzer_path': 'perf_analyzer', 'perf_measurement_window': 5000, 'perf_output': False, 'triton_launch_mode': 'remote', 'triton_docker_image': 'nvcr.io/nvidia/tritonserver:21.02-py3', 'triton_http_endpoint': 'localhost:8000', 'triton_grpc_endpoint': 'localhost:8001', 'triton_metrics_url': 'http://localhost:8002/metrics', 'triton_server_path': 'tritonserver', 'triton_output_path': None, 'triton_server_flags': {}, 'log_level': 'INFO', 'gpus': ['all'], 'output_model_repository_path': './output_model_repository', 'override_output_model_repository': False, 'config_file': None, 'inference_output_fields': ['model_name', 'batch_size', 'concurrency', 'model_config_path', 'instance_group', 'dynamic_batch_sizes', 'satisfies_constraints', 'perf_throughput', 'perf_latency', 'cpu_used_ram'], 'gpu_output_fields': ['model_name', 'gpu_id', 'batch_size', 'concurrency', 'model_config_path', 'instance_group', 'dynamic_batch_sizes', 'satisfies_constraints', 'gpu_used_memory', 'gpu_utilization', 'gpu_power_usage'], 'server_output_fields': ['model_name', 'gpu_id', 'gpu_used_memory', 'gpu_utilization', 'gpu_power_usage'], 'plots': [{'name': 'throughput_v_latency', 'title': 'Throughput vs. Latency', 'x_axis': 'perf_latency', 'y_axis': 'perf_throughput', 'monotonic': True}, {'name': 'gpu_mem_v_latency', 'title': 'GPU Memory vs. Latency', 'x_axis': 'perf_latency', 'y_axis': 'gpu_used_memory', 'monotonic': False}], 'top_n_configs': 3}
2021-04-08 07:49:53.280 INFO[entrypoint.py:79] Using remote Triton Server...
2021-04-08 07:49:53.280 WARNING[entrypoint.py:82] GPU memory metrics reported in the remote mode are not accuracte. Model Analyzer uses Triton explicit model control to load/unload models. Some frameworks do not release the GPU memory even when the memory is not being used. Consider using the "local" or "docker" mode if you want to accurately monitor the GPU memory usage for different models.
2021-04-08 07:49:53.280 WARNING[entrypoint.py:89] Config sweep parameters are ignored in the "remote" mode because Model Analyzer does not have access to the model repository of the remote Triton Server.
2021-04-08 07:49:53.337 INFO[driver.py:236] init
2021-04-08 07:49:54.404 INFO[entrypoint.py:327] Starting perf_analyzer...
2021-04-08 07:49:54.404 INFO[analyzer.py:82] Profiling server only metrics...
2021-04-08 07:49:55.431 INFO[gpu_monitor.py:73] Using GPU(s) with UUID(s) = { GPU-c8fdb676-2c11-669a-4cff-f300b28eb26a } for the analysis.
2021-04-08 07:49:56.464 INFO[run_search.py:155] Will sweep only through the concurrency values...
2021-04-08 07:49:56.464 INFO[run_search.py:262] Concurrency set to 1.
2021-04-08 07:49:56.468 INFO[client.py:82] Model new load failed: [StatusCode.INTERNAL] failed to load 'new', no version is available
2021-04-08 07:50:01.584 INFO[client.py:143] Model readiness failed for model new. Error None
Traceback (most recent call last):
  File "/usr/local/bin/model-analyzer", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/entrypoint.py", line 328, in main
    analyzer.run()
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/analyzer.py", line 95, in run
    run_config_generator = RunConfigGenerator(
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/config/run/run_config_generator.py", line 65, in __init__
    self._generate_run_configs()
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/config/run/run_config_generator.py", line 290, in _generate_run_configs
    self._generate_run_config_for_model_sweep(
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/config/run/run_config_generator.py", line 229, in _generate_run_config_for_model_sweep
    model_config = ModelConfig.create_from_triton_api(
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/triton/model/model_config.py", line 111, in create_from_triton_api
    model_config_dict = client.get_model_config(model_name, num_retries)
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/triton/client/grpc_client.py", line 54, in get_model_config
    model_config_dict = self._client.get_model_config(model_name,
  File "/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py", line 476, in get_model_config
    raise_error_grpc(rpc_error)
  File "/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py", line 61, in raise_error_grpc
    raise get_error_grpc(rpc_error) from None
tritonclient.utils.InferenceServerException: [StatusCode.UNAVAILABLE] Request for unknown model: 'new' is not found

I can run perf_analyzer with the same settings, but I can't run the model in model-analyzer.

Model Analyzer randomly skipping some combinations of parameters

I was running a sweep over two models, covering concurrency, batch size, GPU instance count, and dynamic batching.
The config.yaml file looks like this:

model_repository: /models
run_config_search_disable: True
profile_models:
  model_1:
    model_config_parameters:
      instance_group:
        -
          kind: KIND_GPU
          count: [2,4]
      dynamic_batching:
        max_queue_delay_microseconds: [2000, 5000]
    perf_analyzer_flags:
      input-data: /models/model_1/input.json
    parameters:
      concurrency: [1,2,3,4,5,6,7,8,9,10]
      batch_sizes: [1,2,4,8,16,32,64,128,256,512,1000]
  model_2:
    model_config_parameters:
      instance_group:
        -
          kind: KIND_GPU
          count: [2,4]
      dynamic_batching:
        max_queue_delay_microseconds: [2000, 5000]
    perf_analyzer_flags:
      input-data: /models/model_2/input.json
    parameters:
      concurrency: [1,2,3,4,5,6,7,8,9,10]
      batch_sizes: [1,2,4,8,16,32,64,128,256,512,1000]
analysis_models: model_1,model_2
inference_output_fields: [ 'model_name', 'batch_size', 'concurrency', 'model_config_path', 'instance_group', 'perf_server_queue', 'perf_throughput', 'perf_latency_p99']
export_path: analysis_results

When I go through the metrics-model-inference CSV file, I observe that the model analyzer is seemingly randomly skipping some combinations of batch sizes and concurrencies. For example, in model_1_config_2 there is no test run for batch size 1000 and concurrencies > 5, and in model_1_config_0 there is no recording for batch size 32 and concurrency 10.

Am I missing something here, or is there an issue with the model-analyzer?

I am using the latest release, R22.03 branch.
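For what it's worth, a minimal sketch for surfacing why individual runs go missing, based on the perf_output option that appears in Model Analyzer's config dump elsewhere on this page (assumed to still exist in r22.03); with it enabled, perf_analyzer's own output for each measurement is written to the log, so failed or timed-out combinations show up explicitly instead of silently dropping out of the CSV:

# added to the same config.yaml; assumption: option unchanged in r22.03
perf_output: True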

The metrics-model-inference file looks like this:

Model Batch Concurrency Model Config Path Instance Group Server Queue time (ms) Throughput (infer/sec) p99 Latency (ms)
model_1 1000 2 model_1_config_2 4/GPU 0.0 22000.0 78.3
model_1 128 8 model_1_config_2 4/GPU 1.9 21376.0 47.1
model_1 256 4 model_1_config_2 4/GPU 2.1 21248.0 46.6
model_1 512 2 model_1_config_2 4/GPU 2.1 21248.0 42.8
model_1 1000 3 model_1_config_2 4/GPU 0.0 21000.0 125.0
model_1 1000 5 model_1_config_2 4/GPU 38.5 21000.0 225.1
model_1 1000 4 model_1_config_2 4/GPU 0.0 20659.8 172.3
model_1 512 3 model_1_config_2 4/GPU 1.6 20725.6 66.6
model_1 128 5 model_1_config_2 4/GPU 2.0 20736.0 29.4
model_1 256 6 model_1_config_2 4/GPU 2.0 20736.0 69.3
model_1 256 7 model_1_config_2 4/GPU 2.0 20736.0 82.5
model_1 128 9 model_1_config_2 4/GPU 1.9 20736.0 53.0
model_1 512 4 model_1_config_2 4/GPU 1.1 20480.0 90.5
model_1 512 5 model_1_config_2 4/GPU 19.4 20480.0 119.2
model_1 256 5 model_1_config_2 4/GPU 2.0 20480.0 58.1
model_1 128 10 model_1_config_2 4/GPU 1.8 20480.0 59.8
model_1 128 7 model_1_config_2 4/GPU 1.9 19712.0 43.4
model_1 64 8 model_1_config_2 4/GPU 1.9 19456.0 26.2
model_1 64 7 model_1_config_2 4/GPU 2.0 19264.0 23.7
model_1 128 6 model_1_config_2 4/GPU 1.9 19200.0 38.0
model_1 32 10 model_1_config_2 4/GPU 1.9 18880.0 16.9
model_1 64 6 model_1_config_2 4/GPU 2.0 18816.0 19.5
model_1 128 3 model_1_config_2 4/GPU 2.1 18816.0 18.6
model_1 64 5 model_1_config_2 4/GPU 2.0 18560.0 16.9
model_1 256 3 model_1_config_2 4/GPU 2.0 18432.0 37.6
model_1 128 4 model_1_config_2 4/GPU 2.0 18432.0 26.1
model_1 32 9 model_1_config_2 4/GPU 1.9 17856.0 16.1
model_1 256 2 model_1_config_2 4/GPU 2.1 17920.0 25.7
model_1 32 8 model_1_config_2 4/GPU 2.0 17408.0 14.5
model_1 1000 1 model_1_config_2 4/GPU 0.0 16994.3 44.4
model_1 64 4 model_1_config_2 4/GPU 2.0 16896.0 14.6
model_1 32 7 model_1_config_2 4/GPU 2.0 16559.4 13.2
model_1 32 6 model_1_config_2 4/GPU 2.0 16320.0 11.5
model_1 16 10 model_1_config_2 4/GPU 1.9 15840.0 9.9
model_1 512 1 model_1_config_2 4/GPU 2.1 15872.0 25.2
model_1 128 2 model_1_config_2 4/GPU 2.1 15872.0 14.4
model_1 64 3 model_1_config_2 4/GPU 2.1 15744.0 11.5
model_1 32 5 model_1_config_2 4/GPU 2.0 15360.0 10.1
model_1 256 1 model_1_config_2 4/GPU 2.1 14592.0 14.4
model_1 16 9 model_1_config_2 4/GPU 2.0 14400.0 10.0
model_1 16 8 model_1_config_2 4/GPU 2.0 14080.0 8.9
model_1 32 4 model_1_config_2 4/GPU 2.0 13824.0 8.9
model_1 16 7 model_1_config_2 4/GPU 2.0 13328.0 8.3
model_1 64 2 model_1_config_2 4/GPU 2.1 13184.0 8.8
model_1 16 6 model_1_config_2 4/GPU 2.0 12288.0 7.7
model_1 128 1 model_1_config_2 4/GPU 2.1 12160.0 8.8
model_1 32 3 model_1_config_2 4/GPU 2.0 11808.0 7.7
model_1 16 5 model_1_config_2 4/GPU 2.0 11440.0 6.8
model_1 8 10 model_1_config_2 4/GPU 1.9 11440.0 6.9
model_1 8 9 model_1_config_2 4/GPU 2.0 10584.0 6.8
model_1 8 8 model_1_config_2 4/GPU 2.0 10240.0 6.2
model_1 16 4 model_1_config_2 4/GPU 2.0 10101.9 6.2
model_1 8 7 model_1_config_2 4/GPU 2.0 9296.0 6.0
model_1 64 1 model_1_config_2 4/GPU 2.1 9152.0 6.1
model_1 8 6 model_1_config_2 4/GPU 2.0 8256.0 5.8
model_1 16 3 model_1_config_2 4/GPU 2.1 8160.0 5.7
model_1 8 5 model_1_config_2 4/GPU 2.0 7320.0 5.4
model_1 4 10 model_1_config_2 4/GPU 1.9 7200.0 5.6
model_1 4 9 model_1_config_2 4/GPU 1.9 6624.0 5.5
model_1 8 4 model_1_config_2 4/GPU 2.0 5952.0 5.3
model_1 4 8 model_1_config_2 4/GPU 2.0 5920.0 5.4
model_1 16 2 model_1_config_2 4/GPU 2.1 5856.0 5.2
model_1 32 2 model_1_config_2 4/GPU 2.1 5600.0 5.3
model_1 32 1 model_1_config_2 4/GPU 2.1 5568.0 5.3
model_1 4 7 model_1_config_2 4/GPU 2.0 5258.7 5.3
model_1 4 6 model_1_config_2 4/GPU 2.0 4536.0 5.3
model_1 8 3 model_1_config_2 4/GPU 2.1 4536.0 5.2
model_1 4 5 model_1_config_2 4/GPU 2.0 3680.0 5.4
model_1 2 10 model_1_config_2 4/GPU 1.9 3580.0 5.7
model_1 2 9 model_1_config_2 4/GPU 1.9 3240.0 5.6
model_1 4 4 model_1_config_2 4/GPU 2.0 3024.0 5.3
model_1 8 2 model_1_config_2 4/GPU 2.1 2992.0 5.3
model_1 2 8 model_1_config_2 4/GPU 2.0 2960.0 5.5
model_1 16 1 model_1_config_2 4/GPU 2.1 2944.0 5.2
model_1 2 7 model_1_config_2 4/GPU 2.0 2716.0 5.2
model_1 4 3 model_1_config_2 4/GPU 2.1 2340.0 5.3
model_1 2 6 model_1_config_2 4/GPU 2.0 2340.0 5.2
model_1 2 5 model_1_config_2 4/GPU 2.0 1910.0 5.3
model_1 1 10 model_1_config_2 4/GPU 1.9 1870.0 5.4
model_1 1 9 model_1_config_2 4/GPU 1.9 1692.0 5.4
model_1 4 2 model_1_config_2 4/GPU 2.1 1520.0 5.3
model_1 2 4 model_1_config_2 4/GPU 2.1 1512.0 5.3
model_1 8 1 model_1_config_2 4/GPU 2.1 1504.0 5.2
model_1 1 8 model_1_config_2 4/GPU 2.0 1496.0 5.5
model_1 1 7 model_1_config_2 4/GPU 2.0 1309.0 5.4
model_1 2 3 model_1_config_2 4/GPU 2.0 1146.0 5.3
model_1 1 6 model_1_config_2 4/GPU 2.0 1138.0 5.4
model_1 1 5 model_1_config_2 4/GPU 2.0 935.0 5.4
model_1 2 2 model_1_config_2 4/GPU 2.1 756.0 5.3
model_1 1 4 model_1_config_2 4/GPU 2.0 752.0 5.4
model_1 4 1 model_1_config_2 4/GPU 2.1 752.0 5.3
model_1 1 3 model_1_config_2 4/GPU 2.1 552.0 5.5
model_1 2 1 model_1_config_2 4/GPU 2.1 360.0 5.6
model_1 1 2 model_1_config_2 4/GPU 2.1 358.0 5.6
model_1 1 1 model_1_config_2 4/GPU 2.1 215.0 4.7
model_1 1000 2 model_1_config_1 2/GPU 0.0 21666.7 78.7
model_1 1000 3 model_1_config_1 2/GPU 34.9 21666.7 128.0
model_1 1000 4 model_1_config_1 2/GPU 76.7 21326.2 173.9
model_1 128 8 model_1_config_1 2/GPU 4.9 21376.0 47.2
model_1 128 10 model_1_config_1 2/GPU 4.9 21376.0 59.9
model_1 256 4 model_1_config_1 2/GPU 5.0 21248.0 46.5
model_1 512 2 model_1_config_1 2/GPU 5.1 20981.5 43.1
model_1 128 9 model_1_config_1 2/GPU 4.6 20864.0 53.4
model_1 1000 5 model_1_config_1 2/GPU 123.8 20666.7 266.8
model_1 512 3 model_1_config_1 2/GPU 18.7 20725.6 67.3
model_1 256 6 model_1_config_1 2/GPU 5.0 20736.0 68.8
model_1 256 7 model_1_config_1 2/GPU 23.4 20736.0 82.2
model_1 512 5 model_1_config_1 2/GPU 64.7 20480.0 139.3
model_1 512 4 model_1_config_1 2/GPU 39.7 20480.0 90.4
model_1 256 5 model_1_config_1 2/GPU 5.0 20480.0 56.8
model_1 64 10 model_1_config_1 2/GPU 4.9 19200.0 32.8
model_1 128 7 model_1_config_1 2/GPU 4.9 18816.0 46.0
model_1 128 5 model_1_config_1 2/GPU 4.9 18560.0 32.9
model_1 64 9 model_1_config_1 2/GPU 4.9 18432.0 31.4
model_1 128 6 model_1_config_1 2/GPU 4.9 18432.0 39.7
model_1 64 8 model_1_config_1 2/GPU 4.9 17920.0 28.3
model_1 256 3 model_1_config_1 2/GPU 5.0 17664.0 40.0
model_1 64 7 model_1_config_1 2/GPU 4.9 17472.0 25.6
model_1 1000 1 model_1_config_1 2/GPU 0.0 17000.0 44.3
model_1 128 4 model_1_config_1 2/GPU 5.0 16896.0 28.4
model_1 64 6 model_1_config_1 2/GPU 5.0 16512.0 22.6
model_1 256 2 model_1_config_1 2/GPU 5.1 16384.0 28.2
model_1 32 10 model_1_config_1 2/GPU 4.9 16320.0 19.6
model_1 128 3 model_1_config_1 2/GPU 5.1 16128.0 22.6
model_1 64 5 model_1_config_1 2/GPU 5.0 15680.0 19.7
model_1 32 9 model_1_config_1 2/GPU 4.9 15264.0 18.8
model_1 512 1 model_1_config_1 2/GPU 5.1 14848.0 27.8
model_1 32 8 model_1_config_1 2/GPU 5.0 14592.0 17.6
model_1 64 4 model_1_config_1 2/GPU 5.0 14321.7 17.3
model_1 32 7 model_1_config_1 2/GPU 5.0 13664.0 16.7
model_1 128 2 model_1_config_1 2/GPU 5.1 13568.0 17.3
model_1 32 6 model_1_config_1 2/GPU 5.0 13056.0 14.9
model_1 64 3 model_1_config_1 2/GPU 5.1 12851.1 14.2
model_1 256 1 model_1_config_1 2/GPU 5.1 12544.0 17.2
model_1 16 10 model_1_config_1 2/GPU 5.0 12000.0 13.6
model_1 32 5 model_1_config_1 2/GPU 5.0 11840.0 13.5
model_1 16 9 model_1_config_1 2/GPU 5.0 11088.0 12.9
model_1 16 8 model_1_config_1 2/GPU 5.0 10368.0 12.5
model_1 32 4 model_1_config_1 2/GPU 5.1 10240.0 12.5
model_1 64 2 model_1_config_1 2/GPU 5.1 10112.0 11.8
model_1 16 7 model_1_config_1 2/GPU 5.0 9856.0 11.7
model_1 128 1 model_1_config_1 2/GPU 5.1 9472.0 11.7
model_1 16 6 model_1_config_1 2/GPU 5.0 8640.0 11.4
model_1 32 3 model_1_config_1 2/GPU 5.1 8448.0 11.3
model_1 8 10 model_1_config_1 2/GPU 5.0 7600.0 11.0
model_1 16 5 model_1_config_1 2/GPU 5.0 7520.0 10.8
model_1 8 9 model_1_config_1 2/GPU 5.0 6768.0 10.9
model_1 8 8 model_1_config_1 2/GPU 5.0 6144.0 10.8
model_1 16 4 model_1_config_1 2/GPU 5.1 6016.0 10.8
model_1 32 2 model_1_config_1 2/GPU 5.1 5888.0 10.6
model_1 64 1 model_1_config_1 2/GPU 5.1 5696.0 10.3
model_1 8 7 model_1_config_1 2/GPU 5.0 5368.0 10.8
model_1 8 6 model_1_config_1 2/GPU 5.1 4656.0 10.7
model_1 16 3 model_1_config_1 2/GPU 5.1 4560.0 10.8
model_1 8 5 model_1_config_1 2/GPU 5.0 4080.0 9.7
model_1 4 10 model_1_config_1 2/GPU 4.9 4000.0 10.1
model_1 4 9 model_1_config_1 2/GPU 5.0 3456.0 10.8
model_1 8 4 model_1_config_1 2/GPU 5.1 3264.0 9.7
model_1 4 8 model_1_config_1 2/GPU 5.0 3104.0 10.7
model_1 16 2 model_1_config_1 2/GPU 5.1 3040.0 10.7
model_1 32 1 model_1_config_1 2/GPU 5.2 2976.0 10.7
model_1 4 7 model_1_config_1 2/GPU 5.0 2744.0 10.6
model_1 8 3 model_1_config_1 2/GPU 5.1 2472.0 9.7
model_1 4 6 model_1_config_1 2/GPU 5.0 2376.0 10.6
model_1 2 10 model_1_config_1 2/GPU 4.9 2020.0 9.9
model_1 4 5 model_1_config_1 2/GPU 5.1 1980.0 10.6
model_1 2 9 model_1_config_1 2/GPU 4.9 1836.0 9.9
model_1 8 2 model_1_config_1 2/GPU 5.1 1696.0 9.3
model_1 4 4 model_1_config_1 2/GPU 5.1 1648.0 10.1
model_1 2 8 model_1_config_1 2/GPU 5.0 1632.0 10.2
model_1 16 1 model_1_config_1 2/GPU 5.1 1616.0 9.9
model_1 2 7 model_1_config_1 2/GPU 5.0 1414.0 10.1
model_1 4 3 model_1_config_1 2/GPU 5.1 1282.7 9.8
model_1 2 6 model_1_config_1 2/GPU 5.0 1236.0 10.1
model_1 2 5 model_1_config_1 2/GPU 5.1 1090.0 9.6
model_1 1 10 model_1_config_1 2/GPU 4.9 1080.0 9.8
model_1 1 9 model_1_config_1 2/GPU 5.0 972.0 9.7
model_1 8 1 model_1_config_1 2/GPU 5.1 904.0 8.7
model_1 4 2 model_1_config_1 2/GPU 5.1 904.0 8.8
model_1 2 4 model_1_config_1 2/GPU 5.1 880.0 9.6
model_1 1 8 model_1_config_1 2/GPU 5.0 880.0 9.7
model_1 1 7 model_1_config_1 2/GPU 5.0 777.0 9.5
model_1 2 3 model_1_config_1 2/GPU 5.1 666.0 9.2
model_1 1 6 model_1_config_1 2/GPU 5.0 660.0 9.5
model_1 1 5 model_1_config_1 2/GPU 5.0 560.0 9.3
model_1 4 1 model_1_config_1 2/GPU 5.1 460.0 8.6
model_1 1 4 model_1_config_1 2/GPU 5.1 444.0 9.4
model_1 1 3 model_1_config_1 2/GPU 5.1 333.0 9.3
model_1 2 1 model_1_config_1 2/GPU 5.1 222.0 9.3
model_1 2 2 model_1_config_1 2/GPU 5.1 222.0 9.2
model_1 1 2 model_1_config_1 2/GPU 5.1 222.0 9.2
model_1 1 1 model_1_config_1 2/GPU 5.1 129.0 8.2
model_1 1000 2 model_1_config_3 4/GPU 0.0 21666.7 79.1
model_1 1000 3 model_1_config_3 4/GPU 0.0 21666.7 127.3
model_1 128 8 model_1_config_3 4/GPU 4.8 21504.0 46.3
model_1 256 6 model_1_config_3 4/GPU 5.0 21504.0 68.3
model_1 128 10 model_1_config_3 4/GPU 5.0 21376.0 58.8
model_1 256 4 model_1_config_3 4/GPU 5.1 21248.0 46.5
model_1 256 5 model_1_config_3 4/GPU 5.0 21248.0 57.1
model_1 512 2 model_1_config_3 4/GPU 5.1 21237.4 42.3
model_1 1000 4 model_1_config_3 4/GPU 0.0 20659.8 172.4
model_1 1000 5 model_1_config_3 4/GPU 39.0 20659.8 222.6
model_1 512 3 model_1_config_3 4/GPU 3.5 20736.0 66.6
model_1 128 9 model_1_config_3 4/GPU 5.0 20736.0 53.1
model_1 512 4 model_1_config_3 4/GPU 2.7 20480.0 90.9
model_1 512 5 model_1_config_3 4/GPU 20.9 20480.0 119.1
model_1 256 7 model_1_config_3 4/GPU 3.4 20480.0 81.5
model_1 128 7 model_1_config_3 4/GPU 4.9 18816.0 45.4
model_1 128 5 model_1_config_3 4/GPU 4.9 18560.0 32.9
model_1 64 9 model_1_config_3 4/GPU 4.9 18413.6 31.2
model_1 128 6 model_1_config_3 4/GPU 4.9 18432.0 40.2
model_1 64 8 model_1_config_3 4/GPU 4.9 17920.0 28.5
model_1 256 3 model_1_config_3 4/GPU 5.0 17664.0 40.3
model_1 64 7 model_1_config_3 4/GPU 4.9 17472.0 25.5
model_1 1000 1 model_1_config_3 4/GPU 0.0 17333.3 44.6
model_1 128 4 model_1_config_3 4/GPU 5.0 17408.0 28.6
model_1 64 6 model_1_config_3 4/GPU 5.0 16896.0 22.7
model_1 256 2 model_1_config_3 4/GPU 5.1 16384.0 28.2
model_1 128 3 model_1_config_3 4/GPU 5.1 16128.0 22.6
model_1 64 5 model_1_config_3 4/GPU 5.0 15680.0 19.5
model_1 32 9 model_1_config_3 4/GPU 4.9 15264.0 18.8
model_1 512 1 model_1_config_3 4/GPU 5.1 14840.6 27.8
model_1 32 8 model_1_config_3 4/GPU 5.0 14577.4 17.3
model_1 64 4 model_1_config_3 4/GPU 5.0 14336.0 17.3
model_1 32 7 model_1_config_3 4/GPU 5.0 13888.0 15.9
model_1 128 2 model_1_config_3 4/GPU 5.1 13568.0 17.3
model_1 32 6 model_1_config_3 4/GPU 5.0 13248.0 14.2
model_1 64 3 model_1_config_3 4/GPU 5.1 12672.0 14.1
model_1 256 1 model_1_config_3 4/GPU 5.1 12416.0 17.2
model_1 16 10 model_1_config_3 4/GPU 4.9 12160.0 13.0
model_1 32 5 model_1_config_3 4/GPU 5.0 12000.0 12.9
model_1 16 9 model_1_config_3 4/GPU 5.0 11232.0 12.6
model_1 16 8 model_1_config_3 4/GPU 5.0 10624.0 11.8
model_1 32 4 model_1_config_3 4/GPU 5.0 10496.0 11.8
model_1 64 2 model_1_config_3 4/GPU 5.1 10112.0 11.8
model_1 16 7 model_1_config_3 4/GPU 5.0 9846.1 11.2
model_1 128 1 model_1_config_3 4/GPU 5.1 9472.0 11.7
model_1 16 6 model_1_config_3 4/GPU 5.0 8832.0 10.7
model_1 32 3 model_1_config_3 4/GPU 5.1 8640.0 10.7
model_1 16 5 model_1_config_3 4/GPU 5.0 7920.0 9.9
model_1 8 10 model_1_config_3 4/GPU 4.9 7840.0 10.1
model_1 8 9 model_1_config_3 4/GPU 5.0 7128.0 10.0
model_1 8 8 model_1_config_3 4/GPU 5.0 6400.0 10.0
model_1 16 4 model_1_config_3 4/GPU 5.0 6336.0 9.9
model_1 32 2 model_1_config_3 4/GPU 5.1 6144.0 10.0
model_1 64 1 model_1_config_3 4/GPU 5.1 5696.0 10.3
model_1 8 7 model_1_config_3 4/GPU 5.0 5600.0 10.0
model_1 8 6 model_1_config_3 4/GPU 5.0 4800.0 9.9
model_1 16 3 model_1_config_3 4/GPU 5.1 4752.0 9.9
model_1 8 5 model_1_config_3 4/GPU 5.0 4080.0 9.7
model_1 4 10 model_1_config_3 4/GPU 4.9 4040.0 9.9
model_1 4 9 model_1_config_3 4/GPU 5.0 3672.0 9.8
model_1 4 8 model_1_config_3 4/GPU 5.0 3264.0 9.8
model_1 8 4 model_1_config_3 4/GPU 5.0 3264.0 9.7
model_1 16 2 model_1_config_3 4/GPU 5.1 3200.0 9.8
model_1 32 1 model_1_config_3 4/GPU 5.1 3104.0 9.9
model_1 4 7 model_1_config_3 4/GPU 5.0 2884.0 9.7
model_1 4 6 model_1_config_3 4/GPU 5.0 2448.0 9.8
model_1 8 3 model_1_config_3 4/GPU 5.1 2448.0 9.7
model_1 4 5 model_1_config_3 4/GPU 5.0 2060.0 9.7
model_1 2 10 model_1_config_3 4/GPU 4.9 2020.0 9.9
model_1 2 9 model_1_config_3 4/GPU 4.9 1836.0 9.9
model_1 4 4 model_1_config_3 4/GPU 5.0 1696.0 9.4
model_1 2 8 model_1_config_3 4/GPU 5.0 1680.0 9.6
model_1 16 1 model_1_config_3 4/GPU 5.1 1680.0 9.3
model_1 2 7 model_1_config_3 4/GPU 5.0 1470.0 9.6
model_1 4 3 model_1_config_3 4/GPU 5.1 1284.0 9.3
model_1 2 6 model_1_config_3 4/GPU 5.0 1284.0 9.4
model_1 2 5 model_1_config_3 4/GPU 5.0 1120.0 8.9
model_1 1 10 model_1_config_3 4/GPU 4.9 1110.0 9.1
model_1 1 9 model_1_config_3 4/GPU 4.9 999.0 9.1
model_1 4 2 model_1_config_3 4/GPU 5.1 912.0 8.8
model_1 8 2 model_1_config_3 4/GPU 5.1 904.0 8.7
model_1 2 4 model_1_config_3 4/GPU 5.1 904.0 8.8
model_1 1 8 model_1_config_3 4/GPU 5.0 904.0 8.9
model_1 8 1 model_1_config_3 4/GPU 5.1 904.0 8.7
model_1 1 7 model_1_config_3 4/GPU 5.0 791.0 8.9
model_1 2 3 model_1_config_3 4/GPU 5.1 684.0 8.8
model_1 1 6 model_1_config_3 4/GPU 5.0 684.0 8.9
model_1 1 5 model_1_config_3 4/GPU 5.0 570.0 8.8
model_1 2 2 model_1_config_3 4/GPU 5.1 460.0 8.7
model_1 4 1 model_1_config_3 4/GPU 5.1 460.0 8.6
model_1 1 4 model_1_config_3 4/GPU 5.0 456.0 8.8
model_1 1 3 model_1_config_3 4/GPU 5.1 342.0 8.8
model_1 2 1 model_1_config_3 4/GPU 5.1 228.0 8.8
model_1 1 1 model_1_config_3 4/GPU 5.1 132.0 7.6
model_1 1 2 model_1_config_3 4/GPU 5.1 131.0 7.7
model_1 1000 2 model_1_config_default 1/GPU 29.6 21666.7 78.0
model_1 512 2 model_1_config_default 1/GPU 16.3 21248.0 43.1
model_1 1000 3 model_1_config_default 1/GPU 75.2 21333.3 123.1
model_1 1000 4 model_1_config_default 1/GPU 121.9 21333.3 170.1
model_1 1000 5 model_1_config_default 1/GPU 168.0 21000.0 216.6
model_1 512 3 model_1_config_default 1/GPU 39.9 20992.0 66.9
model_1 512 4 model_1_config_default 1/GPU 64.4 20736.0 91.1
model_1 512 5 model_1_config_default 1/GPU 88.6 20736.0 114.9
model_1 256 2 model_1_config_default 1/GPU 8.7 19968.0 22.1
model_1 256 3 model_1_config_default 1/GPU 21.4 19968.0 35.0
model_1 256 4 model_1_config_default 1/GPU 34.0 19712.0 47.5
model_1 256 5 model_1_config_default 1/GPU 46.6 19712.0 60.0
model_1 128 2 model_1_config_default 1/GPU 4.9 18304.0 12.5
model_1 128 3 model_1_config_default 1/GPU 11.9 18176.0 19.9
model_1 128 4 model_1_config_default 1/GPU 18.9 18176.0 27.0
model_1 128 5 model_1_config_default 1/GPU 25.9 18048.0 34.4
model_1 1000 1 model_1_config_default 1/GPU 0.0 17333.3 44.7
model_1 512 1 model_1_config_default 1/GPU 0.0 16887.6 23.5
model_1 256 1 model_1_config_default 1/GPU 0.0 16128.0 12.4
model_1 64 3 model_1_config_default 1/GPU 6.7 16064.0 11.1
model_1 64 4 model_1_config_default 1/GPU 10.8 16000.0 15.2
model_1 64 5 model_1_config_default 1/GPU 14.8 16000.0 19.3
model_1 64 6 model_1_config_default 1/GPU 18.7 16000.0 23.1
model_1 128 1 model_1_config_default 1/GPU 0.0 14848.0 6.9
model_1 64 2 model_1_config_default 1/GPU 0.0 13120.0 4.0
model_1 64 1 model_1_config_default 1/GPU 0.0 13120.0 4.0
model_1 32 3 model_1_config_default 1/GPU 5.4 10688.0 8.7
model_1 32 4 model_1_config_default 1/GPU 8.4 10592.0 11.8
model_1 32 5 model_1_config_default 1/GPU 11.4 10592.0 14.8
model_1 32 2 model_1_config_default 1/GPU 2.5 10496.0 5.8
model_1 32 1 model_1_config_default 1/GPU 0.0 8960.0 3.1
model_1 16 3 model_1_config_default 1/GPU 5.1 5952.0 7.9
model_1 16 5 model_1_config_default 1/GPU 10.5 5904.0 13.4
model_1 16 4 model_1_config_default 1/GPU 7.7 5920.0 10.6
model_1 16 2 model_1_config_default 1/GPU 2.4 5888.0 5.3
model_1 16 1 model_1_config_default 1/GPU 0.0 5328.0 2.8
model_1 8 4 model_1_config_default 1/GPU 7.3 3184.0 10.0
model_1 8 5 model_1_config_default 1/GPU 9.8 3176.0 12.5
model_1 8 2 model_1_config_default 1/GPU 2.3 3168.0 5.0
model_1 8 3 model_1_config_default 1/GPU 4.9 3148.8 7.7
model_1 8 1 model_1_config_default 1/GPU 0.0 2952.0 2.6
model_1 4 2 model_1_config_default 1/GPU 2.3 1614.4 4.9
model_1 4 5 model_1_config_default 1/GPU 9.8 1604.0 12.4
model_1 4 3 model_1_config_default 1/GPU 4.9 1596.0 7.5
model_1 4 4 model_1_config_default 1/GPU 7.4 1588.0 10.1
model_1 4 1 model_1_config_default 1/GPU 0.0 1524.0 2.6
model_1 2 4 model_1_config_default 1/GPU 7.9 748.0 10.7
model_1 2 2 model_1_config_default 1/GPU 2.6 744.0 5.4
model_1 2 3 model_1_config_default 1/GPU 5.3 740.0 8.1
model_1 2 1 model_1_config_default 1/GPU 0.0 718.0 2.8
model_1 1 3 model_1_config_default 1/GPU 4.7 416.0 7.3
model_1 1 4 model_1_config_default 1/GPU 7.1 416.0 9.7
model_1 1 2 model_1_config_default 1/GPU 2.3 414.0 5.0
model_1 1 1 model_1_config_default 1/GPU 0.0 399.0 2.6
model_1 1000 2 model_1_config_0 2/GPU 0.0 21659.4 78.7
model_1 128 7 model_1_config_0 2/GPU 2.0 21504.0 40.3
model_1 1000 4 model_1_config_0 2/GPU 76.3 21333.3 173.3
model_1 512 2 model_1_config_0 2/GPU 2.1 20992.0 42.8
model_1 512 4 model_1_config_0 2/GPU 38.0 20981.5 93.9
model_1 1000 3 model_1_config_0 2/GPU 35.5 21000.0 129.2
model_1 1000 5 model_1_config_0 2/GPU 123.1 20666.7 265.1
model_1 128 6 model_1_config_0 2/GPU 2.0 20736.0 35.6
model_1 256 6 model_1_config_0 2/GPU 2.0 20736.0 68.8
model_1 512 3 model_1_config_0 2/GPU 18.6 20736.0 66.4
model_1 256 7 model_1_config_0 2/GPU 23.5 20736.0 82.5
model_1 256 4 model_1_config_0 2/GPU 2.1 20736.0 46.3
model_1 512 5 model_1_config_0 2/GPU 64.6 20480.0 139.1
model_1 128 5 model_1_config_0 2/GPU 1.9 20459.5 30.5
model_1 256 5 model_1_config_0 2/GPU 2.0 20480.0 57.4
model_1 128 4 model_1_config_0 2/GPU 2.1 20224.0 23.8
model_1 64 7 model_1_config_0 2/GPU 2.0 19264.0 23.8
model_1 128 3 model_1_config_0 2/GPU 2.1 19200.0 18.6
model_1 64 8 model_1_config_0 2/GPU 2.0 18944.0 26.8
model_1 64 6 model_1_config_0 2/GPU 2.0 18816.0 19.8
model_1 256 3 model_1_config_0 2/GPU 2.1 18432.0 37.8
model_1 64 5 model_1_config_0 2/GPU 2.0 18240.0 17.2
model_1 32 9 model_1_config_0 2/GPU 2.0 17568.0 16.5
model_1 1000 1 model_1_config_0 2/GPU 0.0 17333.3 44.3
model_1 256 2 model_1_config_0 2/GPU 2.1 17408.0 26.2
model_1 32 8 model_1_config_0 2/GPU 2.0 17152.0 15.0
model_1 64 4 model_1_config_0 2/GPU 2.1 16640.0 15.2
model_1 32 7 model_1_config_0 2/GPU 2.0 16352.0 13.7
model_1 32 6 model_1_config_0 2/GPU 2.0 16111.9 11.9
model_1 512 1 model_1_config_0 2/GPU 2.1 15864.1 25.6
model_1 128 2 model_1_config_0 2/GPU 2.1 15872.0 14.9
model_1 64 3 model_1_config_0 2/GPU 2.1 15360.0 12.1
model_1 16 10 model_1_config_0 2/GPU 2.0 15200.0 10.8
model_1 32 5 model_1_config_0 2/GPU 2.0 15040.0 10.6
model_1 256 1 model_1_config_0 2/GPU 2.1 14336.0 14.8
model_1 16 9 model_1_config_0 2/GPU 2.0 14112.0 10.5
model_1 16 8 model_1_config_0 2/GPU 2.0 13680.0 9.5
model_1 32 4 model_1_config_0 2/GPU 2.1 13440.0 9.6
model_1 64 2 model_1_config_0 2/GPU 2.1 12928.0 9.4
model_1 16 7 model_1_config_0 2/GPU 2.0 12992.0 8.8
model_1 16 6 model_1_config_0 2/GPU 2.0 12000.0 8.4
model_1 128 1 model_1_config_0 2/GPU 2.1 11776.0 9.5
model_1 32 3 model_1_config_0 2/GPU 2.1 11424.0 8.4
model_1 8 10 model_1_config_0 2/GPU 2.0 11120.0 7.6
model_1 16 5 model_1_config_0 2/GPU 2.0 10960.0 7.5
model_1 8 9 model_1_config_0 2/GPU 2.0 10224.0 7.2
model_1 8 8 model_1_config_0 2/GPU 2.0 9920.0 6.9
model_1 16 4 model_1_config_0 2/GPU 2.1 9664.0 6.8
model_1 32 2 model_1_config_0 2/GPU 2.1 9216.0 6.9
model_1 8 7 model_1_config_0 2/GPU 2.0 9016.0 6.7
model_1 64 1 model_1_config_0 2/GPU 2.1 8832.0 6.8
model_1 8 6 model_1_config_0 2/GPU 2.0 7984.0 6.1
model_1 16 3 model_1_config_0 2/GPU 2.1 7824.0 6.4
model_1 8 5 model_1_config_0 2/GPU 2.1 6920.0 6.0
model_1 4 10 model_1_config_0 2/GPU 2.0 6880.0 6.2
model_1 4 9 model_1_config_0 2/GPU 2.0 6408.0 6.0
model_1 4 8 model_1_config_0 2/GPU 2.0 5664.0 6.0
model_1 8 4 model_1_config_0 2/GPU 2.1 5504.0 6.0
model_1 32 1 model_1_config_0 2/GPU 2.1 5376.0 6.2
model_1 16 2 model_1_config_0 2/GPU 2.1 5344.0 6.1
model_1 4 7 model_1_config_0 2/GPU 2.0 4928.0 6.1
model_1 8 3 model_1_config_0 2/GPU 2.1 4296.0 6.0
model_1 4 6 model_1_config_0 2/GPU 2.0 4272.0 6.0
model_1 2 10 model_1_config_0 2/GPU 1.9 3620.0 5.6
model_1 4 5 model_1_config_0 2/GPU 2.0 3540.0 6.0
model_1 2 9 model_1_config_0 2/GPU 1.9 3240.0 5.7
model_1 2 8 model_1_config_0 2/GPU 2.0 2970.0 5.8
model_1 4 4 model_1_config_0 2/GPU 2.1 2848.0 6.0
model_1 8 2 model_1_config_0 2/GPU 2.1 2816.0 6.0
model_1 16 1 model_1_config_0 2/GPU 2.1 2720.0 5.9
model_1 2 7 model_1_config_0 2/GPU 2.0 2562.0 5.9
model_1 4 3 model_1_config_0 2/GPU 2.1 2352.0 5.1
model_1 2 6 model_1_config_0 2/GPU 2.0 2172.0 6.0
model_1 2 5 model_1_config_0 2/GPU 2.0 1810.0 6.0
model_1 1 10 model_1_config_0 2/GPU 1.9 1740.0 6.1
model_1 1 9 model_1_config_0 2/GPU 2.0 1584.0 6.1
model_1 8 1 model_1_config_0 2/GPU 2.1 1424.0 5.9
model_1 2 4 model_1_config_0 2/GPU 2.1 1424.0 6.0
model_1 1 8 model_1_config_0 2/GPU 2.0 1400.0 6.1
model_1 1 7 model_1_config_0 2/GPU 2.0 1267.0 6.1
model_1 2 3 model_1_config_0 2/GPU 2.1 1074.0 6.0
model_1 1 6 model_1_config_0 2/GPU 2.0 1044.0 6.1
model_1 1 5 model_1_config_0 2/GPU 2.0 875.0 6.1
model_1 4 2 model_1_config_0 2/GPU 2.1 760.0 5.3
model_1 4 1 model_1_config_0 2/GPU 2.1 756.0 5.3
model_1 2 2 model_1_config_0 2/GPU 2.1 732.0 6.0
model_1 1 4 model_1_config_0 2/GPU 2.1 712.0 6.1
model_1 1 3 model_1_config_0 2/GPU 2.1 516.0 6.1
model_1 1 2 model_1_config_0 2/GPU 2.1 346.0 6.3
model_1 2 1 model_1_config_0 2/GPU 2.1 338.0 6.3
model_1 1 1 model_1_config_0 2/GPU 2.1 203.0 5.3
model_2 128 7 model_2_config_0 2/GPU 2.0 21504.0 38.4
model_2 1000 3 model_2_config_0 2/GPU 31.1 21000.0 127.5
model_2 1000 6 model_2_config_0 2/GPU 167.0 20666.7 268.5
model_2 128 9 model_2_config_0 2/GPU 1.9 20736.0 52.7
model_2 256 6 model_2_config_0 2/GPU 2.0 20736.0 68.7
model_2 512 3 model_2_config_0 2/GPU 17.3 20736.0 67.8
model_2 1000 4 model_2_config_0 2/GPU 73.3 20666.7 174.9
model_2 1000 5 model_2_config_0 2/GPU 119.3 20666.7 261.9
model_2 512 2 model_2_config_0 2/GPU 2.1 20736.0 41.4
model_2 512 4 model_2_config_0 2/GPU 38.7 20480.0 90.2
model_2 256 4 model_2_config_0 2/GPU 2.1 20480.0 45.6
model_2 256 5 model_2_config_0 2/GPU 2.0 20459.5 56.5
model_2 128 5 model_2_config_0 2/GPU 2.0 20480.0 28.9
model_2 128 8 model_2_config_0 2/GPU 2.0 20480.0 48.6
model_2 128 10 model_2_config_0 2/GPU 9.5 20480.0 60.1
model_2 512 5 model_2_config_0 2/GPU 63.4 19968.0 139.1
model_2 256 7 model_2_config_0 2/GPU 23.6 19712.0 83.0
model_2 64 9 model_2_config_0 2/GPU 1.9 18989.0 29.5
model_2 64 6 model_2_config_0 2/GPU 2.0 18816.0 19.2
model_2 64 7 model_2_config_0 2/GPU 2.0 18368.0 24.2
model_2 128 6 model_2_config_0 2/GPU 1.9 18432.0 38.7
model_2 64 8 model_2_config_0 2/GPU 1.9 18432.0 26.7
model_2 128 4 model_2_config_0 2/GPU 2.0 17920.0 26.3
model_2 256 3 model_2_config_0 2/GPU 2.0 17664.0 38.4
model_2 64 5 model_2_config_0 2/GPU 2.0 17600.0 17.3
model_2 128 3 model_2_config_0 2/GPU 2.1 17280.0 20.5
model_2 32 9 model_2_config_0 2/GPU 1.9 17280.0 16.3
model_2 256 2 model_2_config_0 2/GPU 2.1 16896.0 26.2
model_2 32 8 model_2_config_0 2/GPU 1.9 16640.0 14.9
model_2 32 7 model_2_config_0 2/GPU 1.9 16128.0 13.5
model_2 64 4 model_2_config_0 2/GPU 2.0 16128.0 14.8
model_2 1000 2 model_2_config_0 2/GPU 0.0 16000.0 44.7
model_2 32 6 model_2_config_0 2/GPU 2.0 15744.0 11.8
model_2 1000 1 model_2_config_0 2/GPU 0.0 15750.0 44.7
model_2 128 2 model_2_config_0 2/GPU 2.1 15104.0 14.8
model_2 64 3 model_2_config_0 2/GPU 2.1 14976.0 11.8
model_2 16 10 model_2_config_0 2/GPU 1.9 14880.0 10.6
model_2 512 1 model_2_config_0 2/GPU 2.1 14848.0 25.4
model_2 32 5 model_2_config_0 2/GPU 2.0 14720.0 10.5
model_2 16 9 model_2_config_0 2/GPU 1.9 14256.0 9.8
model_2 16 8 model_2_config_0 2/GPU 1.9 13568.0 9.2
model_2 256 1 model_2_config_0 2/GPU 2.1 13312.0 14.7
model_2 32 4 model_2_config_0 2/GPU 2.0 13184.0 9.2
model_2 16 7 model_2_config_0 2/GPU 2.0 12880.0 8.5
model_2 64 2 model_2_config_0 2/GPU 2.1 12544.0 9.1
model_2 16 6 model_2_config_0 2/GPU 2.0 11616.0 8.0
model_2 32 3 model_2_config_0 2/GPU 2.0 11232.0 8.0
model_2 128 1 model_2_config_0 2/GPU 2.1 11252.7 9.0
model_2 16 5 model_2_config_0 2/GPU 2.0 10880.0 7.1
model_2 8 10 model_2_config_0 2/GPU 1.9 10880.0 7.2
model_2 8 9 model_2_config_0 2/GPU 1.9 10080.0 7.1
model_2 8 8 model_2_config_0 2/GPU 2.0 9728.0 6.5
model_2 16 4 model_2_config_0 2/GPU 2.0 9536.0 6.4
model_2 32 2 model_2_config_0 2/GPU 2.1 9152.0 6.4
model_2 8 7 model_2_config_0 2/GPU 2.0 8760.0 6.3
model_2 64 1 model_2_config_0 2/GPU 2.1 8448.0 6.4
model_2 8 6 model_2_config_0 2/GPU 2.0 7872.0 6.0
model_2 16 3 model_2_config_0 2/GPU 2.1 7728.0 6.0
model_2 8 5 model_2_config_0 2/GPU 2.0 6920.0 5.7
model_2 4 10 model_2_config_0 2/GPU 1.9 6840.0 5.9
model_2 4 9 model_2_config_0 2/GPU 1.9 6264.0 5.8
model_2 4 8 model_2_config_0 2/GPU 1.9 5600.0 5.8
model_2 8 4 model_2_config_0 2/GPU 2.0 5600.0 5.6
model_2 16 2 model_2_config_0 2/GPU 2.1 5472.0 5.6
model_2 32 1 model_2_config_0 2/GPU 2.1 5120.0 5.6
model_2 4 7 model_2_config_0 2/GPU 2.0 4951.1 5.7
model_2 4 6 model_2_config_0 2/GPU 2.0 4272.0 5.6
model_2 8 3 model_2_config_0 2/GPU 2.1 4224.0 5.6
model_2 4 5 model_2_config_0 2/GPU 2.0 3480.0 5.7
model_2 2 10 model_2_config_0 2/GPU 1.9 3376.6 6.0
model_2 2 9 model_2_config_0 2/GPU 1.9 3060.0 6.0
model_2 4 4 model_2_config_0 2/GPU 2.0 2832.0 5.6
model_2 2 8 model_2_config_0 2/GPU 1.9 2800.0 5.8
model_2 16 1 model_2_config_0 2/GPU 2.1 2784.0 5.5
model_2 8 2 model_2_config_0 2/GPU 2.1 2784.0 5.6
model_2 2 7 model_2_config_0 2/GPU 2.0 2548.0 5.6
model_2 4 3 model_2_config_0 2/GPU 2.1 2208.0 5.4
model_2 2 6 model_2_config_0 2/GPU 2.0 2196.0 5.5
model_2 2 5 model_2_config_0 2/GPU 2.0 1800.0 5.6
model_2 1 10 model_2_config_0 2/GPU 1.9 1760.0 5.8
model_2 1 9 model_2_config_0 2/GPU 1.9 1593.0 5.7
model_2 4 2 model_2_config_0 2/GPU 2.1 1448.0 5.5
model_2 8 1 model_2_config_0 2/GPU 2.1 1432.0 5.5
model_2 2 4 model_2_config_0 2/GPU 2.0 1432.0 5.6
model_2 1 8 model_2_config_0 2/GPU 1.9 1416.0 5.8
model_2 1 7 model_2_config_0 2/GPU 2.0 1218.0 5.8
model_2 2 3 model_2_config_0 2/GPU 2.0 1086.0 5.5
model_2 1 6 model_2_config_0 2/GPU 2.0 1068.0 5.7
model_2 1 5 model_2_config_0 2/GPU 2.0 885.0 5.7
model_2 2 2 model_2_config_0 2/GPU 2.1 724.0 5.5
model_2 4 1 model_2_config_0 2/GPU 2.1 720.0 5.5
model_2 1 4 model_2_config_0 2/GPU 2.0 712.0 5.7
model_2 1 3 model_2_config_0 2/GPU 2.1 522.0 5.8
model_2 2 1 model_2_config_0 2/GPU 2.1 343.7 5.8
model_2 1 2 model_2_config_0 2/GPU 2.1 344.0 5.9
model_2 1 1 model_2_config_0 2/GPU 2.1 206.0 4.8
model_2 1000 2 model_2_config_2 4/GPU 0.0 21333.3 75.0
model_2 1000 5 model_2_config_2 4/GPU 24.1 21000.0 219.8
model_2 1000 3 model_2_config_2 4/GPU 0.0 21000.0 125.6
model_2 1000 4 model_2_config_2 4/GPU 0.0 20666.7 170.6
model_2 128 9 model_2_config_2 4/GPU 1.9 20736.0 53.0
model_2 512 2 model_2_config_2 4/GPU 2.1 20736.0 41.1
model_2 512 3 model_2_config_2 4/GPU 1.5 20736.0 65.2
model_2 64 8 model_2_config_2 4/GPU 2.0 20480.0 24.5
model_2 512 5 model_2_config_2 4/GPU 19.0 20480.0 121.4
model_2 128 10 model_2_config_2 4/GPU 0.9 20480.0 58.9
model_2 512 4 model_2_config_2 4/GPU 1.4 20469.8 90.3
model_2 256 4 model_2_config_2 4/GPU 2.1 20480.0 46.2
model_2 256 5 model_2_config_2 4/GPU 2.0 20480.0 56.2
model_2 64 10 model_2_config_2 4/GPU 1.9 20480.0 30.9
model_2 128 8 model_2_config_2 4/GPU 2.0 20480.0 48.1
model_2 256 6 model_2_config_2 4/GPU 2.0 19968.0 68.9
model_2 128 5 model_2_config_2 4/GPU 1.9 19840.0 30.6
model_2 256 7 model_2_config_2 4/GPU 1.4 19712.0 82.4
model_2 128 7 model_2_config_2 4/GPU 1.9 19712.0 43.7
model_2 64 9 model_2_config_2 4/GPU 1.9 19584.0 29.5
model_2 64 6 model_2_config_2 4/GPU 2.0 18816.0 19.2
model_2 64 7 model_2_config_2 4/GPU 1.9 18816.0 23.9
model_2 256 3 model_2_config_2 4/GPU 2.0 18432.0 37.9
model_2 128 6 model_2_config_2 4/GPU 1.9 18432.0 39.0
model_2 128 4 model_2_config_2 4/GPU 2.0 18432.0 26.3
model_2 32 10 model_2_config_2 4/GPU 1.9 18221.8 17.4
model_2 64 5 model_2_config_2 4/GPU 2.0 17920.0 17.2
model_2 128 3 model_2_config_2 4/GPU 2.1 17646.4 20.4
model_2 32 9 model_2_config_2 4/GPU 1.9 17280.0 16.4
model_2 32 8 model_2_config_2 4/GPU 1.9 16896.0 14.9
model_2 256 2 model_2_config_2 4/GPU 2.1 16896.0 25.9
model_2 64 4 model_2_config_2 4/GPU 2.0 16128.0 14.8
model_2 32 7 model_2_config_2 4/GPU 2.0 15904.0 13.6
model_2 32 6 model_2_config_2 4/GPU 2.0 15744.0 11.7
model_2 1000 1 model_2_config_2 4/GPU 0.0 15750.0 44.5
model_2 128 2 model_2_config_2 4/GPU 2.1 15104.0 14.8
model_2 64 3 model_2_config_2 4/GPU 2.1 14976.0 11.7
model_2 16 10 model_2_config_2 4/GPU 1.9 14880.0 10.6
model_2 512 1 model_2_config_2 4/GPU 2.1 14848.0 25.5
model_2 32 5 model_2_config_2 4/GPU 2.0 14705.3 10.5
model_2 16 9 model_2_config_2 4/GPU 1.9 14112.0 10.1
model_2 16 8 model_2_config_2 4/GPU 1.9 13440.0 9.3
model_2 256 1 model_2_config_2 4/GPU 2.1 13312.0 14.7
model_2 32 4 model_2_config_2 4/GPU 2.0 13184.0 9.2
model_2 16 7 model_2_config_2 4/GPU 2.0 12867.1 8.5
model_2 64 2 model_2_config_2 4/GPU 2.1 12416.0 9.1
model_2 16 6 model_2_config_2 4/GPU 1.9 11616.0 8.0
model_2 32 3 model_2_config_2 4/GPU 2.1 11232.0 8.0
model_2 128 1 model_2_config_2 4/GPU 2.1 11136.0 9.0
model_2 8 10 model_2_config_2 4/GPU 1.9 10949.1 7.3
model_2 16 5 model_2_config_2 4/GPU 2.0 10880.0 7.1
model_2 8 9 model_2_config_2 4/GPU 1.9 10080.0 7.0
model_2 8 8 model_2_config_2 4/GPU 2.0 9664.0 6.5
model_2 16 4 model_2_config_2 4/GPU 2.0 9526.5 6.5
model_2 8 7 model_2_config_2 4/GPU 2.0 8848.0 6.3
model_2 64 1 model_2_config_2 4/GPU 2.1 8448.0 6.4
model_2 8 6 model_2_config_2 4/GPU 2.0 7824.0 6.0
model_2 16 3 model_2_config_2 4/GPU 2.1 7728.0 6.0
model_2 8 5 model_2_config_2 4/GPU 2.0 6960.0 5.7
model_2 4 10 model_2_config_2 4/GPU 1.9 6804.0 5.9
model_2 4 9 model_2_config_2 4/GPU 1.9 6264.0 5.8
model_2 4 8 model_2_config_2 4/GPU 1.9 5600.0 5.7
model_2 8 4 model_2_config_2 4/GPU 2.0 5562.4 5.7
model_2 16 2 model_2_config_2 4/GPU 2.1 5408.0 5.6
model_2 32 1 model_2_config_2 4/GPU 2.1 5152.0 5.6
model_2 32 2 model_2_config_2 4/GPU 2.1 5088.0 5.7
model_2 4 7 model_2_config_2 4/GPU 1.9 4928.0 5.7
model_2 4 6 model_2_config_2 4/GPU 2.0 4272.0 5.6
model_2 8 3 model_2_config_2 4/GPU 2.1 4224.0 5.6
model_2 4 5 model_2_config_2 4/GPU 2.0 3440.0 5.8
model_2 2 10 model_2_config_2 4/GPU 1.9 3380.0 6.1
model_2 2 9 model_2_config_2 4/GPU 1.9 3060.0 5.9
model_2 4 4 model_2_config_2 4/GPU 2.0 2816.0 5.7
model_2 8 2 model_2_config_2 4/GPU 2.1 2816.0 5.6
model_2 2 8 model_2_config_2 4/GPU 1.9 2784.0 5.8
model_2 16 1 model_2_config_2 4/GPU 2.1 2736.0 5.6
model_2 2 7 model_2_config_2 4/GPU 1.9 2534.0 5.6
model_2 4 3 model_2_config_2 4/GPU 2.0 2196.0 5.5
model_2 2 6 model_2_config_2 4/GPU 2.0 2193.8 5.5
model_2 2 5 model_2_config_2 4/GPU 2.0 1790.0 5.7
model_2 1 10 model_2_config_2 4/GPU 1.9 1760.0 5.8
model_2 1 9 model_2_config_2 4/GPU 1.9 1584.0 5.8
model_2 4 2 model_2_config_2 4/GPU 2.1 1448.0 5.5
model_2 8 1 model_2_config_2 4/GPU 2.1 1424.0 5.5
model_2 2 4 model_2_config_2 4/GPU 2.0 1424.0 5.7
model_2 1 8 model_2_config_2 4/GPU 1.9 1408.0 5.8
model_2 1 7 model_2_config_2 4/GPU 2.0 1239.0 5.8
model_2 2 3 model_2_config_2 4/GPU 2.1 1074.0 5.6
model_2 1 6 model_2_config_2 4/GPU 2.0 1068.0 5.7
model_2 1 5 model_2_config_2 4/GPU 2.0 880.0 5.9
model_2 2 2 model_2_config_2 4/GPU 2.1 720.0 5.6
model_2 1 4 model_2_config_2 4/GPU 2.0 712.0 5.7
model_2 4 1 model_2_config_2 4/GPU 2.1 711.3 5.6
model_2 1 3 model_2_config_2 4/GPU 2.1 525.0 5.8
model_2 2 1 model_2_config_2 4/GPU 2.1 344.0 5.8
model_2 1 2 model_2_config_2 4/GPU 2.1 342.0 5.9
model_2 1 1 model_2_config_2 4/GPU 2.1 206.0 4.9
model_2 1000 2 model_2_config_3 4/GPU 0.0 21333.3 75.2
model_2 1000 3 model_2_config_3 4/GPU 0.0 21000.0 121.5
model_2 1000 4 model_2_config_3 4/GPU 0.0 20666.7 170.2
model_2 1000 5 model_2_config_3 4/GPU 30.1 20666.7 215.8
model_2 128 9 model_2_config_3 4/GPU 4.9 20736.0 53.4
model_2 256 6 model_2_config_3 4/GPU 5.0 20736.0 69.2
model_2 512 2 model_2_config_3 4/GPU 5.1 20736.0 40.8
model_2 512 3 model_2_config_3 4/GPU 3.7 20736.0 68.3
model_2 512 5 model_2_config_3 4/GPU 17.8 20480.0 118.5
model_2 128 10 model_2_config_3 4/GPU 4.8 20480.0 59.3
model_2 512 4 model_2_config_3 4/GPU 2.7 20480.0 89.6
model_2 256 4 model_2_config_3 4/GPU 5.1 20480.0 45.3
model_2 256 5 model_2_config_3 4/GPU 5.0 20480.0 56.5
model_2 256 7 model_2_config_3 4/GPU 3.2 20224.0 81.9
model_2 128 8 model_2_config_3 4/GPU 4.9 19584.0 47.3
model_2 128 7 model_2_config_3 4/GPU 4.8 18816.0 45.9
model_2 64 10 model_2_config_3 4/GPU 4.9 18560.0 33.3
model_2 64 9 model_2_config_3 4/GPU 4.9 17856.0 31.7
model_2 128 5 model_2_config_3 4/GPU 4.9 17920.0 33.1
model_2 128 6 model_2_config_3 4/GPU 4.9 17664.0 40.6
model_2 64 8 model_2_config_3 4/GPU 4.9 17408.0 28.9
model_2 64 7 model_2_config_3 4/GPU 5.0 17024.0 25.9
model_2 256 3 model_2_config_3 4/GPU 5.0 16896.0 40.6
model_2 128 4 model_2_config_3 4/GPU 5.0 16384.0 28.8
model_2 64 6 model_2_config_3 4/GPU 5.0 16128.0 23.0
model_2 1000 1 model_2_config_3 4/GPU 0.0 16000.0 44.8
model_2 32 10 model_2_config_3 4/GPU 4.9 15680.0 19.9
model_2 64 5 model_2_config_3 4/GPU 5.0 15680.0 19.3
model_2 256 2 model_2_config_3 4/GPU 5.1 15360.0 28.4
model_2 128 3 model_2_config_3 4/GPU 5.0 15360.0 22.8
model_2 32 9 model_2_config_3 4/GPU 4.9 14976.0 19.1
model_2 32 8 model_2_config_3 4/GPU 4.9 14208.0 17.6
model_2 64 4 model_2_config_3 4/GPU 5.0 13824.0 17.7
model_2 32 7 model_2_config_3 4/GPU 4.9 13440.0 16.3
model_2 512 1 model_2_config_3 4/GPU 5.1 13312.0 28.4
model_2 128 2 model_2_config_3 4/GPU 5.1 13056.0 17.6
model_2 32 6 model_2_config_3 4/GPU 5.0 12864.0 14.5
model_2 64 3 model_2_config_3 4/GPU 5.1 12288.0 14.5
model_2 16 10 model_2_config_3 4/GPU 4.9 11840.0 13.3
model_2 256 1 model_2_config_3 4/GPU 5.1 11648.0 17.5
model_2 32 5 model_2_config_3 4/GPU 5.0 11520.0 13.2
model_2 16 9 model_2_config_3 4/GPU 4.9 10944.0 12.9
model_2 16 8 model_2_config_3 4/GPU 4.9 10368.0 12.1
model_2 32 4 model_2_config_3 4/GPU 5.0 10112.0 12.1
model_2 64 2 model_2_config_3 4/GPU 5.1 9600.0 12.0
model_2 16 7 model_2_config_3 4/GPU 5.0 9520.0 11.4
model_2 128 1 model_2_config_3 4/GPU 5.1 8832.0 12.0
model_2 16 6 model_2_config_3 4/GPU 5.0 8544.0 10.9
model_2 32 3 model_2_config_3 4/GPU 5.0 8352.0 11.1
model_2 8 10 model_2_config_3 4/GPU 4.9 7520.0 10.5
model_2 16 5 model_2_config_3 4/GPU 5.0 7440.0 10.4
model_2 8 9 model_2_config_3 4/GPU 4.9 6808.0 10.5
model_2 8 8 model_2_config_3 4/GPU 5.0 6144.0 10.3
model_2 16 4 model_2_config_3 4/GPU 5.0 6016.0 10.3
model_2 32 2 model_2_config_3 4/GPU 5.1 5824.0 10.4
model_2 8 7 model_2_config_3 4/GPU 5.0 5488.0 10.1
model_2 64 1 model_2_config_3 4/GPU 5.1 5376.0 10.7
model_2 8 6 model_2_config_3 4/GPU 5.0 4648.0 10.2
model_2 16 3 model_2_config_3 4/GPU 5.1 4560.0 10.3
model_2 8 5 model_2_config_3 4/GPU 5.0 3920.0 10.1
model_2 4 10 model_2_config_3 4/GPU 4.9 3880.0 10.2
model_2 4 9 model_2_config_3 4/GPU 4.9 3508.0 10.2
model_2 8 4 model_2_config_3 4/GPU 5.0 3136.0 10.1
model_2 4 8 model_2_config_3 4/GPU 4.9 3136.0 10.1
model_2 16 2 model_2_config_3 4/GPU 5.1 3104.0 10.0
model_2 32 1 model_2_config_3 4/GPU 5.1 2976.0 10.2
model_2 4 7 model_2_config_3 4/GPU 5.0 2772.0 10.1
model_2 4 6 model_2_config_3 4/GPU 5.0 2376.0 10.1
model_2 8 3 model_2_config_3 4/GPU 5.1 2352.0 10.1
model_2 4 5 model_2_config_3 4/GPU 5.0 1980.0 10.1
model_2 2 10 model_2_config_3 4/GPU 4.9 1960.0 10.2
model_2 2 9 model_2_config_3 4/GPU 4.9 1764.0 10.3
model_2 4 4 model_2_config_3 4/GPU 5.0 1632.0 9.7
model_2 2 8 model_2_config_3 4/GPU 4.9 1632.0 9.9
model_2 8 2 model_2_config_3 4/GPU 5.1 1632.0 9.7
model_2 16 1 model_2_config_3 4/GPU 5.1 1616.0 9.6
model_2 2 7 model_2_config_3 4/GPU 5.0 1414.0 9.9
model_2 4 3 model_2_config_3 4/GPU 5.1 1248.0 9.6
model_2 2 6 model_2_config_3 4/GPU 5.0 1236.0 9.7
model_2 2 5 model_2_config_3 4/GPU 5.0 1080.0 9.3
model_2 1 10 model_2_config_3 4/GPU 4.9 1070.0 9.5
model_2 1 9 model_2_config_3 4/GPU 4.9 963.0 9.4
model_2 2 4 model_2_config_3 4/GPU 5.0 872.0 9.1
model_2 4 2 model_2_config_3 4/GPU 5.1 872.0 9.1
model_2 8 1 model_2_config_3 4/GPU 5.1 872.0 9.0
model_2 1 8 model_2_config_3 4/GPU 4.9 864.0 9.3
model_2 1 7 model_2_config_3 4/GPU 5.0 763.0 9.3
model_2 2 3 model_2_config_3 4/GPU 5.1 660.0 9.0
model_2 1 6 model_2_config_3 4/GPU 5.0 660.0 9.1
model_2 1 5 model_2_config_3 4/GPU 5.0 550.0 9.2
model_2 2 2 model_2_config_3 4/GPU 5.1 444.0 9.0
model_2 4 1 model_2_config_3 4/GPU 5.1 444.0 8.9
model_2 1 4 model_2_config_3 4/GPU 5.0 440.0 9.1
model_2 1 3 model_2_config_3 4/GPU 5.1 330.0 9.1
model_2 2 1 model_2_config_3 4/GPU 5.1 220.0 9.1
model_2 1 2 model_2_config_3 4/GPU 5.1 220.0 9.1
model_2 1 1 model_2_config_3 4/GPU 5.1 127.0 7.9
model_2 1000 2 model_2_config_default 1/GPU 25.4 21000.0 74.6
model_2 1000 3 model_2_config_default 1/GPU 73.6 21000.0 122.6
model_2 1000 4 model_2_config_default 1/GPU 120.1 20666.7 170.0
model_2 1000 5 model_2_config_default 1/GPU 166.6 20666.7 216.2
model_2 512 3 model_2_config_default 1/GPU 39.0 20480.0 66.3
model_2 512 4 model_2_config_default 1/GPU 63.5 20469.8 90.2
model_2 512 5 model_2_config_default 1/GPU 88.0 20213.9 114.8
model_2 512 6 model_2_config_default 1/GPU 112.4 20224.0 139.5
model_2 256 3 model_2_config_default 1/GPU 20.7 19456.0 34.5
model_2 256 4 model_2_config_default 1/GPU 33.6 19456.0 47.5
model_2 256 5 model_2_config_default 1/GPU 46.6 19200.0 60.5
model_2 256 6 model_2_config_default 1/GPU 59.5 18944.0 73.5
model_2 128 2 model_2_config_default 1/GPU 4.6 17664.0 12.4
model_2 128 3 model_2_config_default 1/GPU 11.8 17536.0 19.5
model_2 128 4 model_2_config_default 1/GPU 19.0 17536.0 27.0
model_2 128 5 model_2_config_default 1/GPU 26.2 17518.5 34.6
model_2 1000 1 model_2_config_default 1/GPU 0.0 15750.0 44.6
model_2 512 2 model_2_config_default 1/GPU 0.0 15616.0 23.6
model_2 512 1 model_2_config_default 1/GPU 0.0 15616.0 23.6
model_2 64 3 model_2_config_default 1/GPU 7.0 15168.0 11.8
model_2 64 2 model_2_config_default 1/GPU 2.8 15168.0 8.5
model_2 64 4 model_2_config_default 1/GPU 11.2 15168.0 16.0
model_2 64 5 model_2_config_default 1/GPU 15.4 15104.0 20.2
model_2 256 1 model_2_config_default 1/GPU 0.0 14848.0 12.7
model_2 256 2 model_2_config_default 1/GPU 0.0 14592.0 12.7
model_2 128 1 model_2_config_default 1/GPU 0.0 13568.0 7.2
model_2 64 1 model_2_config_default 1/GPU 0.0 11776.0 4.3
model_2 32 2 model_2_config_default 1/GPU 2.5 9824.0 6.1
model_2 32 3 model_2_config_default 1/GPU 5.8 9760.0 9.5
model_2 32 5 model_2_config_default 1/GPU 12.3 9792.0 15.8
model_2 32 4 model_2_config_default 1/GPU 9.1 9728.0 12.8
model_2 32 1 model_2_config_default 1/GPU 0.0 8000.0 3.4
model_2 16 3 model_2_config_default 1/GPU 5.5 5440.0 8.5
model_2 16 5 model_2_config_default 1/GPU 11.4 5392.0 14.7
model_2 16 4 model_2_config_default 1/GPU 8.4 5392.0 11.7
model_2 16 2 model_2_config_default 1/GPU 2.6 5408.0 5.8
model_2 16 1 model_2_config_default 1/GPU 0.0 4715.3 3.1
model_2 8 4 model_2_config_default 1/GPU 8.0 2888.0 11.1
model_2 8 5 model_2_config_default 1/GPU 10.7 2896.0 13.7
model_2 8 3 model_2_config_default 1/GPU 5.3 2872.0 8.4
model_2 8 2 model_2_config_default 1/GPU 2.5 2872.0 5.4
model_2 8 1 model_2_config_default 1/GPU 0.0 2656.0 2.9
model_2 4 4 model_2_config_default 1/GPU 8.0 1456.0 11.1
model_2 4 3 model_2_config_default 1/GPU 5.3 1456.0 8.3
model_2 4 2 model_2_config_default 1/GPU 2.6 1440.0 5.7
model_2 4 1 model_2_config_default 1/GPU 0.0 1372.0 3.0
model_2 2 3 model_2_config_default 1/GPU 5.8 679.3 9.0
model_2 2 4 model_2_config_default 1/GPU 8.6 680.0 11.9
model_2 2 5 model_2_config_default 1/GPU 11.7 674.0 14.9
model_2 2 2 model_2_config_default 1/GPU 2.8 676.0 6.1
model_2 2 1 model_2_config_default 1/GPU 0.0 640.0 3.1
model_2 1 2 model_2_config_default 1/GPU 2.5 378.6 5.5
model_2 1 4 model_2_config_default 1/GPU 7.9 374.0 11.4
model_2 1 3 model_2_config_default 1/GPU 5.2 375.0 8.2
model_2 1 1 model_2_config_default 1/GPU 0.0 362.0 2.8
model_2 1000 3 model_2_config_1 2/GPU 31.5 20993.0 128.3
model_2 1000 4 model_2_config_1 2/GPU 73.0 20666.7 175.7
model_2 1000 5 model_2_config_1 2/GPU 119.8 20659.8 262.6
model_2 1000 6 model_2_config_1 2/GPU 166.5 20666.7 270.7
model_2 128 9 model_2_config_1 2/GPU 4.9 20736.0 53.3
model_2 256 6 model_2_config_1 2/GPU 5.0 20736.0 68.9
model_2 512 2 model_2_config_1 2/GPU 5.1 20736.0 41.2
model_2 512 3 model_2_config_1 2/GPU 17.4 20725.6 66.4
model_2 512 5 model_2_config_1 2/GPU 63.1 20480.0 138.7
model_2 128 8 model_2_config_1 2/GPU 4.9 20480.0 47.1
model_2 128 10 model_2_config_1 2/GPU 5.0 20480.0 59.1
model_2 256 4 model_2_config_1 2/GPU 5.0 20459.5 45.5
model_2 256 5 model_2_config_1 2/GPU 5.0 20480.0 56.8
model_2 256 7 model_2_config_1 2/GPU 20.5 20224.0 81.6
model_2 512 4 model_2_config_1 2/GPU 38.9 19968.0 91.4
model_2 128 7 model_2_config_1 2/GPU 4.9 18816.0 46.3
model_2 64 10 model_2_config_1 2/GPU 4.9 18560.0 33.5
model_2 64 9 model_2_config_1 2/GPU 4.9 17856.0 31.5
model_2 128 5 model_2_config_1 2/GPU 4.9 17920.0 33.1
model_2 128 6 model_2_config_1 2/GPU 4.9 17664.0 40.6
model_2 64 8 model_2_config_1 2/GPU 4.9 17408.0 29.2
model_2 64 7 model_2_config_1 2/GPU 4.9 17024.0 25.9
model_2 256 3 model_2_config_1 2/GPU 5.0 16896.0 40.7
model_2 128 4 model_2_config_1 2/GPU 5.0 16384.0 28.8
model_2 64 6 model_2_config_1 2/GPU 5.0 16128.0 23.1
model_2 1000 1 model_2_config_1 2/GPU 0.0 16000.0 44.8
model_2 32 10 model_2_config_1 2/GPU 4.9 15680.0 19.9
model_2 64 5 model_2_config_1 2/GPU 5.0 15680.0 19.0
model_2 1000 2 model_2_config_1 2/GPU 0.0 15500.0 44.7
model_2 256 2 model_2_config_1 2/GPU 5.1 15344.7 28.3
model_2 128 3 model_2_config_1 2/GPU 5.0 15360.0 23.0
model_2 32 9 model_2_config_1 2/GPU 4.9 14961.0 19.1
model_2 32 8 model_2_config_1 2/GPU 4.9 14336.0 17.7
model_2 512 1 model_2_config_1 2/GPU 5.1 13824.0 28.1
model_2 64 4 model_2_config_1 2/GPU 5.0 13810.2 17.6
model_2 32 7 model_2_config_1 2/GPU 5.0 13440.0 16.4
model_2 32 6 model_2_config_1 2/GPU 5.0 12864.0 14.4
model_2 128 2 model_2_config_1 2/GPU 5.1 12800.0 17.6
model_2 64 3 model_2_config_1 2/GPU 5.1 12288.0 14.4
model_2 16 10 model_2_config_1 2/GPU 4.9 11840.0 13.3
model_2 32 5 model_2_config_1 2/GPU 5.0 11520.0 13.2
model_2 256 1 model_2_config_1 2/GPU 5.1 11520.0 17.5
model_2 16 9 model_2_config_1 2/GPU 4.9 10944.0 12.9
model_2 16 8 model_2_config_1 2/GPU 5.0 10368.0 12.1
model_2 32 4 model_2_config_1 2/GPU 5.0 10101.9 12.1
model_2 64 2 model_2_config_1 2/GPU 5.1 9600.0 12.1
model_2 16 7 model_2_config_1 2/GPU 5.0 9520.0 11.4
model_2 128 1 model_2_config_1 2/GPU 5.1 8832.0 12.1
model_2 16 6 model_2_config_1 2/GPU 5.0 8544.0 11.0
model_2 32 3 model_2_config_1 2/GPU 5.1 8256.0 11.0
model_2 8 10 model_2_config_1 2/GPU 4.9 7520.0 10.5
model_2 16 5 model_2_config_1 2/GPU 5.0 7440.0 10.5
model_2 8 9 model_2_config_1 2/GPU 4.9 6840.0 10.4
model_2 8 8 model_2_config_1 2/GPU 5.0 6144.0 10.3
model_2 16 4 model_2_config_1 2/GPU 5.0 6080.0 10.3
model_2 32 2 model_2_config_1 2/GPU 5.1 5824.0 10.4
model_2 8 7 model_2_config_1 2/GPU 5.0 5488.0 10.1
model_2 64 1 model_2_config_1 2/GPU 5.1 5376.0 10.7
model_2 8 6 model_2_config_1 2/GPU 5.0 4656.0 10.2
model_2 16 3 model_2_config_1 2/GPU 5.0 4608.0 10.2
model_2 8 5 model_2_config_1 2/GPU 5.0 3920.0 10.1
model_2 4 10 model_2_config_1 2/GPU 4.9 3920.0 10.2
model_2 4 9 model_2_config_1 2/GPU 4.9 3492.0 10.2
model_2 4 8 model_2_config_1 2/GPU 4.9 3136.0 10.1
model_2 8 4 model_2_config_1 2/GPU 5.0 3136.0 10.0
model_2 16 2 model_2_config_1 2/GPU 5.1 3104.0 9.9
model_2 32 1 model_2_config_1 2/GPU 5.1 2976.0 10.1
model_2 4 7 model_2_config_1 2/GPU 5.0 2772.0 10.1
model_2 4 6 model_2_config_1 2/GPU 5.0 2376.0 10.1
model_2 8 3 model_2_config_1 2/GPU 5.1 2376.0 10.0
model_2 4 5 model_2_config_1 2/GPU 5.0 1980.0 10.1
model_2 2 10 model_2_config_1 2/GPU 4.9 1960.0 10.2
model_2 2 9 model_2_config_1 2/GPU 4.9 1782.0 10.2
model_2 4 4 model_2_config_1 2/GPU 5.0 1632.0 9.7
model_2 2 8 model_2_config_1 2/GPU 4.9 1632.0 9.9
model_2 8 2 model_2_config_1 2/GPU 5.1 1632.0 9.6
model_2 16 1 model_2_config_1 2/GPU 5.1 1616.0 9.6
model_2 2 7 model_2_config_1 2/GPU 5.0 1418.0 9.9
model_2 2 6 model_2_config_1 2/GPU 5.0 1246.8 9.6
model_2 4 3 model_2_config_1 2/GPU 5.1 1248.0 9.6
model_2 2 5 model_2_config_1 2/GPU 5.0 1080.0 9.2
model_2 1 10 model_2_config_1 2/GPU 4.9 1070.0 9.5
model_2 1 9 model_2_config_1 2/GPU 4.9 963.0 9.4
model_2 4 2 model_2_config_1 2/GPU 5.1 880.0 9.0
model_2 8 1 model_2_config_1 2/GPU 5.1 872.0 9.0
model_2 2 4 model_2_config_1 2/GPU 5.0 872.0 9.2
model_2 1 8 model_2_config_1 2/GPU 4.9 872.0 9.2
model_2 1 7 model_2_config_1 2/GPU 5.0 763.0 9.2
model_2 2 3 model_2_config_1 2/GPU 5.1 665.3 9.0
model_2 1 6 model_2_config_1 2/GPU 5.0 660.0 9.2
model_2 1 5 model_2_config_1 2/GPU 5.0 550.0 9.1
model_2 1 4 model_2_config_1 2/GPU 5.0 444.0 9.0
model_2 4 1 model_2_config_1 2/GPU 5.1 444.0 8.9
model_2 1 3 model_2_config_1 2/GPU 5.1 329.7 9.1
model_2 2 2 model_2_config_1 2/GPU 5.1 220.0 9.0
model_2 2 1 model_2_config_1 2/GPU 5.1 220.0 9.2
model_2 1 2 model_2_config_1 2/GPU 5.1 220.0 9.1
model_2 1 1 model_2_config_1 2/GPU 5.1 127.0 7.9

Running Model Analyzer and Triton Server on separate host machines

From the user:

"I am running the model analyzer in remote mode. One thing I am not clear is does the model analyzer and triton server needs to be run on same host? Current setup I am trying is different host machine running model-analyzer and triton inference server running on AKS cluster GPU node. With this setup I am getting error that GPUs are not matching for model-analyzer and triton. Attaching error log."

Error log can be found here:

error.log
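
For reference, remote mode does not by itself require Model Analyzer and Triton to share a host, but Model Analyzer still enumerates the GPUs visible on its own machine for monitoring, which appears to be what trips the mismatch check here. Below is a minimal sketch of a remote-mode config that pins the endpoint and the monitored devices explicitly; the endpoint and the GPU UUID are placeholders, and whether restricting gpus resolves the AKS mismatch is an assumption:

    triton_launch_mode: remote
    triton_grpc_endpoint: <AKS-node-address>:8001
    gpus:
      - 'GPU-00000000-0000-0000-0000-000000000000'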

Failed to run Quickstart

Hello, I'm new to Triton Inference Server.

Here's an explanation of what I did.

  1. Run Triton inference server first
    docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v $(pwd)/model_repository:/models nvcr.io/nvidia/tritonserver:20.11-py3 tritonserver --model-control-mode=explicit --model-repository=/models

Check that Triton server is working through IP_ADDRESS:8000/v2 on Postman
{
"name": "triton",
"version": "2.5.0",
"extensions": [
"classification",
"sequence",
"model_repository",
"schedule_policy",
"model_configuration",
"system_shared_memory",
"cuda_shared_memory",
"binary_tensor_data",
"statistics"
]
}

  2. I followed the instructions in the Quickstart of model-analyzer
    model-analyzer -m /quick_start_repository -n add_sub --triton-launch-mode=remote --export-path=analysis_results

  3. The error message was
    root@d1:/# model-analyzer -m /quick_start_repository -n add_sub --triton-launch-mode=remote --export-path=analysis_results 2021-04-05 05:32:45.685 INFO[entrypoint.py:288] Triton Model Analyzer started: config={'model_repository': '/quick_start_repository', 'model_names': [{'model_name': 'add_sub', 'objectives': {'perf_throughput': 10}, 'parameters': {'batch_sizes': [1], 'concurrency': []}}], 'objectives': {'perf_throughput': 10}, 'constraints': {}, 'batch_sizes': [1], 'concurrency': [], 'perf_analyzer_timeout': 600, 'perf_analyzer_cpu_util': 80.0, 'run_config_search_max_concurrency': 1024, 'run_config_search_max_instance_count': 5, 'run_config_search_disable': False, 'run_config_search_max_preferred_batch_size': 16, 'export': True, 'export_path': 'analysis_results', 'summarize': True, 'filename_model_inference': 'metrics-model-inference.csv', 'filename_model_gpu': 'metrics-model-gpu.csv', 'filename_server_only': 'metrics-server-only.csv', 'max_retries': 100, 'duration_seconds': 5, 'monitoring_interval': 0.01, 'client_protocol': 'grpc', 'perf_analyzer_path': 'perf_analyzer', 'perf_measurement_window': 5000, 'perf_output': False, 'triton_launch_mode': 'remote', 'triton_docker_image': 'nvcr.io/nvidia/tritonserver:21.02-py3', 'triton_http_endpoint': 'localhost:8000', 'triton_grpc_endpoint': 'localhost:8001', 'triton_metrics_url': 'http://localhost:8002/metrics', 'triton_server_path': 'tritonserver', 'triton_output_path': None, 'triton_server_flags': {}, 'log_level': 'INFO', 'gpus': ['all'], 'output_model_repository_path': './output_model_repository', 'override_output_model_repository': False, 'config_file': None, 'inference_output_fields': ['model_name', 'batch_size', 'concurrency', 'model_config_path', 'instance_group', 'dynamic_batch_sizes', 'satisfies_constraints', 'perf_throughput', 'perf_latency', 'cpu_used_ram'], 'gpu_output_fields': ['model_name', 'gpu_id', 'batch_size', 'concurrency', 'model_config_path', 'instance_group', 'dynamic_batch_sizes', 'satisfies_constraints', 'gpu_used_memory', 'gpu_utilization', 'gpu_power_usage'], 'server_output_fields': ['model_name', 'gpu_id', 'gpu_used_memory', 'gpu_utilization', 'gpu_power_usage'], 'plots': [{'name': 'throughput_v_latency', 'title': 'Throughput vs. Latency', 'x_axis': 'perf_latency', 'y_axis': 'perf_throughput', 'monotonic': True}, {'name': 'gpu_mem_v_latency', 'title': 'GPU Memory vs. Latency', 'x_axis': 'perf_latency', 'y_axis': 'gpu_used_memory', 'monotonic': False}], 'top_n_configs': 3} 2021-04-05 05:32:45.687 INFO[entrypoint.py:79] Using remote Triton Server... 2021-04-05 05:32:45.687 WARNING[entrypoint.py:82] GPU memory metrics reported in the remote mode are not accuracte. Model Analyzer uses Triton explicit model control to load/unload models. Some frameworks do not release the GPU memory even when the memory is not being used. Consider using the "local" or "docker" mode if you want to accurately monitor the GPU memory usage for different models. 2021-04-05 05:32:45.687 WARNING[entrypoint.py:89] Config sweep parameters are ignored in the "remote" mode because Model Analyzer does not have access to the model repository of the remote Triton Server. 2021-04-05 05:32:45.753 INFO[driver.py:236] init 2021-04-05 05:32:46.823 INFO[entrypoint.py:327] Starting perf_analyzer... 2021-04-05 05:32:46.823 INFO[analyzer.py:82] Profiling server only metrics... 2021-04-05 05:32:47.849 INFO[gpu_monitor.py:73] Using GPU(s) with UUID(s) = { GPU-c8fdb676-2c11-669a-4cff-f300b28eb26a } for the analysis. 
2021-04-05 05:32:48.869 INFO[run_search.py:155] Will sweep only through the concurrency values... 2021-04-05 05:32:48.869 INFO[run_search.py:262] Concurrency set to 1. 2021-04-05 05:32:48.870 INFO[client.py:82] Model add_sub load failed: [StatusCode.INTERNAL] failed to load 'add_sub', no version is available 2021-04-05 05:32:53.922 INFO[client.py:143] Model readiness failed for model add_sub. Error None Traceback (most recent call last): File "/usr/local/bin/model-analyzer", line 8, in <module> sys.exit(main()) File "/usr/local/lib/python3.8/dist-packages/model_analyzer/entrypoint.py", line 328, in main analyzer.run() File "/usr/local/lib/python3.8/dist-packages/model_analyzer/analyzer.py", line 95, in run run_config_generator = RunConfigGenerator( File "/usr/local/lib/python3.8/dist-packages/model_analyzer/config/run/run_config_generator.py", line 65, in __init__ self._generate_run_configs() File "/usr/local/lib/python3.8/dist-packages/model_analyzer/config/run/run_config_generator.py", line 290, in _generate_run_configs self._generate_run_config_for_model_sweep( File "/usr/local/lib/python3.8/dist-packages/model_analyzer/config/run/run_config_generator.py", line 229, in _generate_run_config_for_model_sweep model_config = ModelConfig.create_from_triton_api( File "/usr/local/lib/python3.8/dist-packages/model_analyzer/triton/model/model_config.py", line 111, in create_from_triton_api model_config_dict = client.get_model_config(model_name, num_retries) File "/usr/local/lib/python3.8/dist-packages/model_analyzer/triton/client/grpc_client.py", line 54, in get_model_config model_config_dict = self._client.get_model_config(model_name, File "/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py", line 476, in get_model_config raise_error_grpc(rpc_error) File "/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py", line 61, in raise_error_grpc raise get_error_grpc(rpc_error) from None tritonclient.utils.InferenceServerException: [StatusCode.UNAVAILABLE] Request for unknown model: 'add_sub' has no available versions

I didn't change the add_sub folder, so I have no idea what to do.

Old protobuf install in Dockerfile build

If building the model-analyzer from the provided Dockerfile, protobuf version 3.6.1 is installed, leading to a failure when running the profile command:

  File "/usr/local/bin/model-analyzer", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/entrypoint.py", line 280, in main
    analyzer.cb_search(client=client)
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/analyzer.py", line 166, in cb_search
    self._model_manager.cb_search_models(self._config.profile_models)
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/model_manager.py", line 113, in cb_search_models
    self._execute_vw_search(self._config.iterations)
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/model_manager.py", line 296, in _execute_vw_search
    if not self._create_and_load_model_variant(
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/model_manager.py", line 338, in _create_and_load_model_variant
    variant_config.write_config_to_file(new_model_dir,
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/triton/model/model_config.py", line 182, in write_config_to_file
    model_config_bytes = text_format.MessageToBytes(self._model_config)
AttributeError: module 'google.protobuf.text_format' has no attribute 'MessageToBytes'

In the pre-built image nvcr.io/nvidia/tritonserver:21.06-py3-sdk, protobuf version 3.17 is installed which does have an implementation for MessageToBytes.

handle server_config unexpectedly when CLI parameter contains "="

Hi guys,
I'm running model_analyzer in local mode and I want to run tritonserver with the TensorFlow backend, version 2, so I set server_config as below:

    triton_server_flags:
      backend_config: tensorflow,version=2
      strict_model_config: false

But when I execute model-analyzer, tritonserver logs this error:
--backend-config option format is '<backend name>,<setting>=<value>'. Got tensorflow,version

And I see this in the source code of server_local.py:

        if self._server_path:
            # Create command list and run subprocess
            cmd = [self._server_path]
            cmd += self._server_config.to_cli_string().replace('=', ' ').split()

There it just replaces every '=' with a space.

So how can I set a server_config parameter with '='?

Can't profile dynamic batch model

Hey guys

I have a model with the config.pbtxt below

name: "ner"
platform: "onnxruntime_onnx"
max_batch_size: 100
input [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ -1 ]
  },
  {
    name: "token_type_ids"
    data_type: TYPE_INT64
    dims: [ -1 ]
  },
  {
    name: "attention_mask"
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]
output [
  {
    name: "start_logits"
    data_type: TYPE_FP32
    dims: [ -1, 5 ]
  },
  {
    name: "end_logits"
    data_type: TYPE_FP32
    dims: [ -1, 5 ]
  }
]

and when I try to run model_analyzer, I get the following message:

2021-12-14 12:31:48.240 INFO[model_manager.py:224] Profiling model ner_i0...
2021-12-14 12:31:49.260 INFO[perf_analyzer.py:258] Running perf_analyzer ['perf_analyzer', '-m', 'ner_i0', '-b', '1', '-u', 'localhost:8001', '-i', 'grpc', '--concurrency-range', '1', '--measurement-mode', 'count_windows'] failed with exit status 1 : error: failed to create concurrency manager: input attention_mask contains dynamic shape, provide shapes to send along with the request

How can I configure model_analyzer to send the shapes as well, or is it not possible to use it on a model with dynamic input shapes? Thank you
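
One way that may work, assuming the per-model perf_analyzer_flags section of the Model Analyzer config is available in your release, is to forward perf_analyzer's --shape option so that fixed dimensions are supplied for the dynamic inputs. A hedged sketch; the sequence length of 128 is an arbitrary placeholder value:

    profile_models:
      ner:
        perf_analyzer_flags:
          shape:
            - input_ids:128
            - token_type_ids:128
            - attention_mask:128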

Fail to run quick start [StatusCode.UNAVAILABLE]

When I follow the Quick Start, model-analyzer fails on the add_sub example.
Steps:

docker run -it --rm --gpus all     -v $(pwd)/examples/quick-start:/quick_start_repository     --net=host --name model-analyzer     model-analyzer /bin/bash
model-analyzer profile --model-repository /quick_start_repository --profile-models add_sub

Results:

2022-03-02 08:25:05.758 INFO[entrypoint.py:161] Starting a local Triton Server...
2022-03-02 08:25:05.759 INFO[analyzer_state_manager.py:132] No checkpoint file found, starting a fresh run.
2022-03-02 08:25:05.964 INFO[analyzer.py:106] Profiling server only metrics...
2022-03-02 08:25:05.976 INFO[server_local.py:100] Triton Server started.
2022-03-02 08:25:10.15 INFO[server_local.py:121] Stopped Triton Server.
2022-03-02 08:25:10.35 INFO[server_local.py:100] Triton Server started.
2022-03-02 08:25:11.39 INFO[client.py:85] Model add_sub_config_default load failed: [StatusCode.UNAVAILABLE] explicit model load / unload is not allowed if polling is enabled
2022-03-02 08:25:11.55 INFO[server_local.py:121] Stopped Triton Server.
2022-03-02 08:25:11.70 INFO[server_local.py:100] Triton Server started.
2022-03-02 08:25:12.74 INFO[client.py:85] Model add_sub_config_0 load failed: [StatusCode.UNAVAILABLE] explicit model load / unload is not allowed if polling is enabled
2022-03-02 08:25:12.82 INFO[server_local.py:121] Stopped Triton Server.
2022-03-02 08:25:12.96 INFO[server_local.py:100] Triton Server started.
2022-03-02 08:25:13.100 INFO[client.py:85] Model add_sub_config_1 load failed: [StatusCode.UNAVAILABLE] explicit model load / unload is not allowed if polling is enabled
2022-03-02 08:25:13.116 INFO[server_local.py:121] Stopped Triton Server.
2022-03-02 08:25:13.134 INFO[server_local.py:100] Triton Server started.
2022-03-02 08:25:14.138 INFO[client.py:85] Model add_sub_config_2 load failed: [StatusCode.UNAVAILABLE] explicit model load / unload is not allowed if polling is enabled
2022-03-02 08:25:14.139 INFO[server_local.py:121] Stopped Triton Server.
2022-03-02 08:25:14.156 INFO[server_local.py:100] Triton Server started.
2022-03-02 08:25:15.160 INFO[client.py:85] Model add_sub_config_3 load failed: [StatusCode.UNAVAILABLE] explicit model load / unload is not allowed if polling is enabled
2022-03-02 08:25:15.225 INFO[server_local.py:121] Stopped Triton Server.
2022-03-02 08:25:15.243 INFO[server_local.py:100] Triton Server started.
2022-03-02 08:25:16.247 INFO[client.py:85] Model add_sub_config_4 load failed: [StatusCode.UNAVAILABLE] explicit model load / unload is not allowed if polling is enabled
2022-03-02 08:25:16.263 INFO[server_local.py:121] Stopped Triton Server.
2022-03-02 08:25:16.265 INFO[analyzer_state_manager.py:158] Saved checkpoint to /opt/triton-model-analyzer/checkpoints/0.ckpt.
2022-03-02 08:25:16.265 INFO[analyzer.py:129] Profile complete. Profiled 0 configurations for models: [].
2022-03-02 08:25:16.265 INFO[analyzer.py:130] To analyze the profile results and find the best configurations, run `model-analyzer analyze --analysis-models `
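
The failures above all come from Triton rejecting explicit load/unload while the launched server is in polling mode, whereas Model Analyzer relies on explicit model control to swap model config variants. A possible workaround, assuming triton_server_flags accepts this key and that forcing the mode is all that is missing here, is to pin the launched server to explicit model control in the profile config:

    triton_server_flags:
      model_control_mode: explicit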

Unable to run model_analyzer

I am running a Triton Inference Server in a docker container. I installed model-analyzer from source (not docker).
These are the commands I used:
Triton Inference Server:
docker run --gpus=1 --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v /home/ubuntu/model_repository:/models nvcr.io/nvidia/tritonserver:21.03-py3 tritonserver --model-repository=/models --model-control-mode=explicit

Model Analyzer:
model-analyzer profile -m /home/ubuntu/model_repository/ --profile-models tf_savedmodel_effdet1 --override-output-model-repository --triton-launch-mode=remote

I am getting this:

2021-08-12 10:56:38.639 WARNING[entrypoint.py:235] Overriding the output model repo path "./output_model_repository"...
2021-08-12 10:56:38.641 INFO[entrypoint.py:72] Using remote Triton Server...
2021-08-12 10:56:39.778 WARNING[entrypoint.py:77] GPU memory metrics reported in the remote mode are not accuracte. Model Analyzer uses Triton explicit model control to load/unload models. Some frameworks do not release the GPU memory even when the memory is not being used. Consider using the "local" or "docker" mode if you want to accurately monitor the GPU memory usage for different models.
2021-08-12 10:56:39.778 WARNING[entrypoint.py:84] Config sweep parameters are ignored in the "remote" mode because Model Analyzer does not have access to the model repository of the remote Triton Server.
2021-08-12 10:56:39.779 INFO[analyzer_state_manager.py:117] Loaded checkpoint from file ./checkpoints/2.ckpt
2021-08-12 10:56:39.780 INFO[analyzer.py:81] Profiling server only metrics...
2021-08-12 10:56:41.947 INFO[gpu_monitor.py:73] Using GPU(s) with UUID(s) = { GPU-c094d241-f952-eeb3-5fcc-41377057fe37 } for profiling.
2021-08-12 10:56:42.968 INFO[model_manager.py:87] Running auto config search for model: tf_savedmodel_effdet1
2021-08-12 10:56:42.968 INFO[run_search.py:149] Will sweep only through the concurrency values...
2021-08-12 10:56:42.968 INFO[run_search.py:288] [Search Step] Concurrency set to 1. 
2021-08-12 10:57:06.7 INFO[client.py:78] Model tf_savedmodel_effdet1 loaded.
2021-08-12 10:57:06.11 INFO[client.py:102] Model tf_savedmodel_effdet1 unloaded.
2021-08-12 10:57:27.27 INFO[client.py:78] Model tf_savedmodel_effdet1 loaded.
2021-08-12 10:57:27.27 INFO[model_manager.py:214] Profiling model tf_savedmodel_effdet1...
2021-08-12 10:57:28.94 INFO[gpu_monitor.py:73] Using GPU(s) with UUID(s) = { GPU-c094d241-f952-eeb3-5fcc-41377057fe37 } for profiling.
2021-08-12 10:57:29.125 INFO[perf_analyzer.py:219] Running perf_analyzer ['perf_analyzer', '-m', 'tf_savedmodel_effdet1', '-b', '1', '-u', 'localhost:8001', '-i', 'grpc', '--concurrency-range', '1', '--measurement-mode', 'count_windows'] failed with exit status 1 : perf_analyzer: unrecognized option '--measurement-mode'
Usage: perf_analyzer [options]
.
.
.

Appreciate your help.

Failed to maintain requested inference load. Worker thread(s) failed to generate concurrent requests.

Version: 21.11
Install: Triton SDK container and building the Dockerfile
What's wrong with my setup?

2022-04-01 01:41:15.266 INFO[model_manager.py:221] Profiling model yolov5tensorrt_i11...
2022-04-01 01:41:33.333 INFO[perf_analyzer.py:252] perf_analyzer's request count is too small, increased to 100.
2022-04-01 01:41:52.415 INFO[perf_analyzer.py:252] perf_analyzer's request count is too small, increased to 150.
2022-04-01 01:41:56.436 INFO[perf_analyzer.py:258] Running perf_analyzer ['perf_analyzer', '-m', 'yolov5tensorrt_i11', '-b', '1', '-u', 'localhost:8001', '-i', 'grpc', '--concurrency-range', '256', '--measurement-mode', 'count_windows', '--measurement-request-count', '150'] failed with exit status 1 : *** Measurement Settings ***
  Batch size: 1
  Using "count_windows" mode for stabilization
  Minimum number of samples in each window: 150
  Using synchronous calls for inference
  Stabilizing using average latency

Request concurrency: 256
Failed to maintain requested inference load. Worker thread(s) failed to generate concurrent requests.
Thread [0] had error: Broken pipe
Thread [1] had error: Broken pipe
Thread [2] had error: Broken pipe
Thread [3] had error: Broken pipe
...
Thread [253] had error: Broken pipe
Thread [254] had error: Broken pipe
Thread [255] had error: Broken pipe

computer:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.54       Driver Version: 510.54       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   39C    P0    29W /  N/A |   2711MiB /  8192MiB |     12%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1167      G   /usr/lib/xorg/Xorg                174MiB |
|    0   N/A  N/A      1445      G   /usr/bin/gnome-shell               33MiB |
|    0   N/A  N/A      6367      G   ...AAAAAAAAA= --shared-files       35MiB |
|    0   N/A  N/A      7035      C   /usr/bin/python3                  133MiB |
|    0   N/A  N/A      7253      G   ...,69796438478906170,131072       64MiB |
|    0   N/A  N/A     23106      C   tritonserver                     2265MiB |
+-----------------------------------------------------------------------------+


Installation instructions perf_analyzer

I tried running the quick start instructions using docker containers on the Deep Learning AMI on AWS and ran into this error, which I suspect is my fault:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown.

So I tried the pip instructions instead and got stuck at this step:

$ ./build_wheel.sh <path to perf_analyzer> true

And it seems there's a missing step noting that Triton server (https://github.com/triton-inference-server) needs to be set up as well, which would add perf_analyzer to $PATH. In that case it makes sense to remove it as a user-supplied argument in the script, so it would just be

$ ./build_wheel.sh

and also drop the true argument, since you do mention that it's Linux-specific later in the doc.

Error starting embedded DCGM engine

I'm trying to run model-analyzer in Kubernetes, but it is failing with the following error:

Unhandled exception. System.TypeInitializationException: The type initializer for 'Triton.MemoryAnalyzer.Metrics.GpuMetrics' threw an exception.
 ---> System.InvalidOperationException: Error starting embedded DCGM engine. DCGM initialization error.
   at Triton.MemoryAnalyzer.Metrics.GpuMetrics..cctor()
   --- End of inner exception stack trace ---
   at Triton.MemoryAnalyzer.Metrics.GpuMetrics..ctor()
   at Triton.MemoryAnalyzer.MetricsCollector..ctor(MetricsCollectorConfig config)
   at Triton.MemoryAnalyzer.Program.<>c__DisplayClass7_0.<Main>b__2(K8sOptions options)
   at CommandLine.ParserResultExtensions.MapResult[T1,T2,TResult](ParserResult`1 result, Func`2 parsedFunc1, Func`2 parsedFunc2, Func`2 notParsedFunc)
   at Triton.MemoryAnalyzer.Program.Main(String[] args)
stream closed

Has anyone seen this before?

How to use analyze with specific parameters?

I have an 18-config sweep with a concurrency test from 1 to 40. Is there a way to analyze (or summarize) the model at a specific concurrency or set of test parameters? Right now, if I set an objective for lowest latency, it recommends the result from a concurrency of 1 (which is the lowest), but the best configuration might be different at higher concurrency.
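
There is no flag I am aware of that pins the analysis to a single concurrency value, but constraints in the analyze config can filter out measurements that fall below a target load before the objectives are applied, which gets close to the same effect. A hedged sketch, assuming the analysis_models/constraints keys of this release; the model name and threshold are placeholders:

    analysis_models:
      my_model:
        constraints:
          perf_throughput:
            min: 1000
        objectives:
          - perf_latency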

Segmentation fault (core dumped)

Description
When I run the demo command "model-analyzer profile --model-repository quick-start/ --profile-models add_sub --override-output-model-repository", I get a segmentation fault (core dumped).

Triton Information
I use the container nvcr.io/nvidia/tritonserver:20.09-py3, and create a python=3.8 environment with triton-model-analyzer

Are you using the Triton container or did you build it yourself?
yes
To Reproduce

use the container nvcr.io/nvidia/tritonserver:20.09-py3, and create a python=3.8 environment with triton-model-analyzer
use "model-analyzer profile --model-repository quick-start/ --profile-models add_sub --override-output-model-repository" command
2022-03-06 11:39:26.997 INFO[gpu_device_factory.py:50] Initiliazing GPUDevice handles...
2022-03-06 11:39:28.85 INFO[gpu_device_factory.py:246] Using GPU 0 NVIDIA GeForce RTX 2080 Ti with UUID GPU-f0873fa2-bf8b-0fd9-776b-1594a007e458
2022-03-06 11:39:28.88 WARNING[entrypoint.py:362] Overriding the output model repo path "/root/model_analyzer-main/examples/output_model_repository"...
2022-03-06 11:39:28.98 INFO[entrypoint.py:161] Starting a local Triton Server...
2022-03-06 11:39:28.100 INFO[analyzer_state_manager.py:132] No checkpoint file found, starting a fresh run.
2022-03-06 11:39:28.212 INFO[analyzer.py:106] Profiling server only metrics...
2022-03-06 11:39:28.229 INFO[server_local.py:100] Triton Server started.
Segmentation fault (core dumped)

Expected behavior
How can I solve this problem?

Run models concurrently

We know that the command below (with the -n flag) benchmarks the models in sequence:
model-analyzer -m /models/ -n model1, model2 --batch-sizes 1,2,4,8,16 -c 1,2,3

My requirement is to benchmark multiple models concurrently on a single GPU. I was exploring whether there is a flag that accepts multiple models as arguments so that Model Analyzer runs them concurrently instead of in sequence.
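
Later Model Analyzer releases appear to add an option for profiling several models concurrently during the search; both the exact option name below and its availability in the release discussed here are assumptions, so treat this as a sketch rather than a confirmed answer:

    run_config_profile_models_concurrently_enable: true
    profile_models:
      - model1
      - model2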

No such file or directory: 'tritonserver'

Using the 21.09 py and 21.09 py-sdk docker images. When I run

root@VM-74-225-centos:/workspace#  model-analyzer profile  --profile-models TIS-recommend-onnx11 -m /data/home/guirongguo/ai/model_repository --override-output-model-repository

it raises this error:

System.ComponentModel.Win32Exception (2): No such file or directory

Running the cmd on Linux threw these errors.

docker run -v /var/run/docker.sock:/var/run/docker.sock
-v /ABSOLUTE/PATH/TO/MODELS:ABSOLUTE/PATH/TO/MODELS
-v /ABSOLUTE/PATH/TO/EXPORT/DIRECTORY:/results --net=host
nvcr.io/nvidia/clara/model-analyzer:ANALYZER-VERSION
--batch BATCH-SIZES
--concurrency CONCURRENCY-VALUES
--model-names MODEL-NAMES
--triton-version TRITON-VERSION
--model-folder /ABSOLUTE/PATH/TO/MODELS
--export --export-path /results/

[Bug] add_sub example failed

Problem:
The add_sub example failed; the log is below.
INFO[client.py:82] Model add_sub_i0 load failed: [StatusCode.INTERNAL] failed to load 'add_sub_i0', no version is available
It's an INFO, but in the end I get empty metrics:

Server Only:
Model           GPU ID   Batch   Concurrency   Max GPU Memory Usage(MB)   Max GPU Memory Available(MB)   Max GPU Utilization(%)
triton-server   0        0       0             166.0                      14943.0                        0.0
triton-server   1        0       0             166.0                      14943.0                        0.0
triton-server   2        0       0             166.0                      14943.0                        0.0
triton-server   3        0       0             166.0                      14943.0                        0.0

Models (GPU Metrics):
Model   GPU ID   Batch   Concurrency   Model Config Path   Max GPU Memory Usage(MB)   Max GPU Memory Available(MB)   Max GPU Utilization(%)

Models (Inference):
Model   Batch   Concurrency   Model Config Path   Throughput(infer/sec)   Average Latency(us)   Max RAM Usage(MB)   Max RAM Available(MB)

Models (GPU Metrics - Failed Constraints):
Model   GPU ID   Batch   Concurrency   Model Config Path   Max GPU Memory Usage(MB)   Max GPU Memory Available(MB)   Max GPU Utilization(%)

Models (Inference - Failed Constraints):
Model   Batch   Concurrency   Model Config Path   Throughput(infer/sec)   Average Latency(us)   Max RAM Usage(MB)   Max RAM Available(MB)

All I've done is:

  • Pull image from ngc
    • nvcr.io/nvidia/tritonserver:21.03-py3-sdk as doc says
    • nvcr.io/nvidia/tritonserver:21.03-py3 for --triton-launch-mode=docker
  • Clone model_analyzer repo to $HOME
    • cd $HOME && git clone https://github.com/triton-inference-server/model_analyzer.git
  • Start docker container as doc says: docker run -it --rm --gpus all \ -v /var/run/docker.sock:/var/run/docker.sock \ -v $HOME/model_analyzer/examples/quick-start:/quick_start_repository \ --net=host --name model-analyzer \ nvcr.io/nvidia/tritonserver:21.03-py3-sdk /bin/bash
  • Under /workspace folder, run model-analyzer -m /quick_start_repository -n add_sub --triton-launch-mode=docker --triton-version=21.03-py3 --export-path=analysis_results --log-level=DEBUG --override-output-model-repository

Did I miss something?

Failed to load model

I am trying to use memory analyzer for the resnet50_netdef model. It seems that I can start the tritonserver but cannot load the model. What may be the problem here?

kingsleyl@prm-dgx-05:/gpfs/fs1/kingsley/server/docs/examples$ docker run -v /var/run/docker.sock:/var/run/docker.sock -v /gpfs/fs1/kingsley/server/docs/examples/embed_model_repository:/models -v /gpfs/fs1/kingsley/server/docs/examples/results:/results --net=host memory-analyzer:latest --batch 1,2,4 --concurrency 1,2,4 --model-names resnet50_netdef --model-folder /models --export --export-path /results/
Failed to load resnet50_netdef on inference server: skipping model

Server Only:
Model                         Batch               Concurrency         Throughput          Max Memory Util(%)  Max GPU Util(%)     Max BAR1(MB)        Max Framebuffer(MB)
triton-server                 0                   0                   0 infer/sec         0                   0                   8                   308

Models:
Model                         Batch               Concurrency         Throughput          Max Memory Util(%)  Max GPU Util(%)     Max BAR1(MB)        Max Framebuffer(MB)

By the way, I can start the tritonserver with resnet50_netdef model successfully in the environment inside the nvcr.io/nvidia/tritonserver:20.09-py3 container.

model_analyzer profile mode time out

I'm running model_analyzer with the tensorflow_savedmodel backend and perf_analyzer timed out without any checkpoints saved. See the sample command used (note that increasing --perf-analyzer-timeout does not help either):

root@eea5d043723b:/opt/tritonserver/model_analyzer/examples# model-analyzer profile -m /models --profile-models pdv1 --override-output-model-repository --collect-cpu-metrics true -c 1 --client-protocol http --run-config-search-max-concurrency 8 --run-config-search-max-instance-count 1 --run-config-search-preferred-batch-size-disable true --perf-analyzer-timeout 20
2021-09-10 22:18:23.453 WARNING[entrypoint.py:235] Overriding the output model repo path "./output_model_repository"...
2021-09-10 22:18:23.453 INFO[entrypoint.py:97] Starting a local Triton Server...
2021-09-10 22:18:23.453 INFO[analyzer_state_manager.py:130] No checkpoint file found, starting a fresh run.
2021-09-10 22:18:23.453 INFO[analyzer.py:81] Profiling server only metrics...
2021-09-10 22:18:23.459 INFO[server_local.py:98] Triton Server started.
2021-09-10 22:18:24.148 INFO[server_local.py:117] Triton Server stopped.
2021-09-10 22:18:24.148 INFO[model_manager.py:87] Running auto config search for model: pdv1
2021-09-10 22:18:24.148 INFO[run_search.py:296] [Search Step] Instance count set to 1, and dynamic batching is disabled.
2021-09-10 22:18:24.157 INFO[server_local.py:98] Triton Server started.
2021-09-10 22:18:24.974 INFO[client.py:78] Model pdv1_i0 loaded.
2021-09-10 22:18:24.974 INFO[model_manager.py:214] Profiling model pdv1_i0...
2021-09-10 22:18:24.974 WARNING[metrics_manager.py:173] CPU metric(s) are being collected.
2021-09-10 22:18:24.974 WARNING[metrics_manager.py:174] Collecting CPU metric(s) can affect the latency or throughput numbers reported by perf analyzer.
2021-09-10 22:18:24.974 INFO[metrics_manager.py:177] CPU metric(s) collection can be disabled by removing the CPU metrics (e.g. cpu_used_ram) from the --metrics flag.
2021-09-10 22:18:45.5 INFO[perf_analyzer.py:175] perf_analyzer took very long to exit, killing perf_analyzer...
2021-09-10 22:18:46.73 INFO[server_local.py:117] Triton Server stopped.
2021-09-10 22:18:46.73 INFO[analyzer_state_manager.py:156] Saved checkpoint to ./checkpoints/0.ckpt.
2021-09-10 22:18:46.73 INFO[analyzer.py:116] Finished profiling. Obtained measurements for models: [].

A follow-up question: how does model_analyzer compose the inference request body? Is it possible to set the payload directly?
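
By default perf_analyzer generates synthetic tensor contents itself, so model_analyzer does not compose a payload you can edit directly. perf_analyzer does accept a JSON file of real inputs through its --input-data option, and that flag should be forwardable through the per-model perf_analyzer_flags section of the profile config; the file path below is a placeholder and the exact key spelling is an assumption:

    profile_models:
      pdv1:
        perf_analyzer_flags:
          input-data: /path/to/real_inputs.json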
