I ran the server on merlin-inference:22.04 with the same model and settings as in #40 (which was able to receive queries), but this time it died with SIGBUS.
docker run --rm --runtime=nvidia --net=host -e HUGECTR_LOG_LEVEL=0 -it \
-v `pwd`:/models \
nvcr.io/nvidia/merlin/merlin-inference:22.04 \
tritonserver \
--model-repository=/models/ \
--load-model=meb \
--model-control-mode=explicit \
--backend-directory=/usr/local/hugectr/backends \
--backend-config=hugectr,ps=/models/meb/ps.json \
--log-info=true \
--log-verbose=0
$ ./run.sh
Unable to find image 'nvcr.io/nvidia/merlin/merlin-inference:22.04' locally
22.04: Pulling from nvidia/merlin/merlin-inference
4d32b49e2995: Already exists
45893188359a: Pulling fs layer
5ad1f2004580: Pulling fs layer
6ddc1d0f9183: Pulling fs layer
4cc43a803109: Waiting
e94a4481e933: Waiting
3e7e4c9bc2b1: Waiting
9463aa3f5627: Waiting
a4a0c690bc7d: Waiting
59d451175f69: Waiting
eaf45e9f32d1: Waiting
d8d16d6af76d: Waiting
9e04bda98b05: Waiting
4f4fb700ef54: Pull complete
98e1b8b4cf4b: Pull complete
3ba4cd25cab4: Pull complete
e07a05c28244: Pull complete
6a99482f27f4: Pull complete
0a9c87e68332: Pull complete
6d909763dff3: Pull complete
7f01a1b77738: Pull complete
c70caad572e6: Pull complete
c0b57c72d7c7: Pull complete
3b7c493bb8f8: Pull complete
70f21191d5fa: Pull complete
b72ef49a1648: Pull complete
1735193fce1a: Pull complete
6f0a31eb4fc9: Pull complete
5a83b81d8cfd: Pull complete
24c069e055bb: Pull complete
9c90284fcd0f: Pull complete
405c3b74edb7: Pull complete
2c2cfec47605: Pull complete
f9e5bf6b037e: Pull complete
69b1183a0dc9: Pull complete
73133bf37ddc: Pull complete
187e35d56f89: Pull complete
23ec4ade6dcd: Pull complete
4fba3dd7f97c: Pull complete
11923c954056: Pull complete
95b67db4aa6d: Pull complete
73d16c81d9c9: Pull complete
f0a024c8b08f: Pull complete
099d0dd31169: Pull complete
96d82345047b: Pull complete
188b63e153b6: Pull complete
de97abb09153: Pull complete
01be5700f44b: Pull complete
9f01a696bb8b: Pull complete
3e3d4a57ff34: Pull complete
ccb0ee9eb079: Pull complete
a496569779c9: Pull complete
e9c89d74ffd4: Pull complete
e225aafaa730: Pull complete
ef7b62a5bf12: Pull complete
43449b45a07c: Pull complete
09fcbe8e254c: Pull complete
b85ef4e24a81: Pull complete
Digest: sha256:eeb55e2463291d83b8ad3d05f63a8be641dd1684daa13cafc2ea044546130fd5
Status: Downloaded newer image for nvcr.io/nvidia/merlin/merlin-inference:22.04
==================================
== Triton Inference Server Base ==
==================================
NVIDIA Release 22.03 (build 33743047)
Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
NOTE: CUDA Forward Compatibility mode ENABLED.
Using CUDA 11.6 driver version 510.47.03 with kernel driver version 460.73.01.
See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.
I0509 02:01:17.740583 1 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f7998000000' with size 268435456
I0509 02:01:17.750397 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0509 02:01:17.750411 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 1 with size 67108864
I0509 02:01:17.750417 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 2 with size 67108864
I0509 02:01:17.750422 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 3 with size 67108864
I0509 02:01:17.750428 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 4 with size 67108864
I0509 02:01:17.750434 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 5 with size 67108864
I0509 02:01:17.750439 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 6 with size 67108864
I0509 02:01:17.750446 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 7 with size 67108864
I0509 02:01:18.869713 1 model_repository_manager.cc:997] loading: meb:1
I0509 02:01:18.999772 1 hugectr.cc:1597] TRITONBACKEND_Initialize: hugectr
I0509 02:01:18.999796 1 hugectr.cc:1604] Triton TRITONBACKEND API version: 1.8
I0509 02:01:18.999803 1 hugectr.cc:1608] 'hugectr' TRITONBACKEND API version: 1.8
I0509 02:01:18.999810 1 hugectr.cc:1631] The HugeCTR backend Repository location: /usr/local/hugectr/backends/hugectr
I0509 02:01:18.999816 1 hugectr.cc:1640] The HugeCTR backend configuration: {"cmdline":{"ps":"/models/meb/ps.json"}}
I0509 02:01:18.999842 1 hugectr.cc:344] *****Parsing Parameter Server Configuration from /models/meb/ps.json
I0509 02:01:18.999897 1 hugectr.cc:365] Support 64-bit keys = 1
I0509 02:01:18.999912 1 hugectr.cc:376] Volatile database -> type = parallel_hash_map
I0509 02:01:18.999918 1 hugectr.cc:381] Volatile database -> address = 127.0.0.1:7000
I0509 02:01:18.999924 1 hugectr.cc:386] Volatile database -> user name = default
I0509 02:01:18.999932 1 hugectr.cc:390] Volatile database -> password = <empty>
I0509 02:01:18.999938 1 hugectr.cc:397] Volatile database -> algorithm = phm
I0509 02:01:18.999945 1 hugectr.cc:402] Volatile database -> number of partitions = 16
I0509 02:01:18.999951 1 hugectr.cc:408] Volatile database -> max. batch size (GET) = 100000
I0509 02:01:18.999958 1 hugectr.cc:415] Volatile database -> max. batch size (SET) = 100000
I0509 02:01:18.999964 1 hugectr.cc:423] Volatile database -> refresh time after fetch = 0
I0509 02:01:18.999972 1 hugectr.cc:430] Volatile database -> overflow margin = 120000000
I0509 02:01:18.999979 1 hugectr.cc:436] Volatile database -> overflow policy = evict_oldest
I0509 02:01:18.999999 1 hugectr.cc:442] Volatile database -> overflow resolution target = 0.8
I0509 02:01:19.000007 1 hugectr.cc:450] Volatile database -> initial cache rate = 1
I0509 02:01:19.000014 1 hugectr.cc:456] Volatile database -> cache missed embeddings = 0
I0509 02:01:19.000021 1 hugectr.cc:466] Volatile database -> update filters = []
I0509 02:01:19.000060 1 hugectr.cc:583] Model name = meb
I0509 02:01:19.000067 1 hugectr.cc:592] Model 'meb' -> network file = /models/meb/1/meb.json
I0509 02:01:19.000074 1 hugectr.cc:599] Model 'meb' -> max. batch size = 100
I0509 02:01:19.000081 1 hugectr.cc:605] Model 'meb' -> dense model file = /models/meb/1/meb_dense_0.model
I0509 02:01:19.000089 1 hugectr.cc:611] Model 'meb' -> sparse model files = [/models/meb/1/meb0_sparse_0.model]
I0509 02:01:19.000097 1 hugectr.cc:622] Model 'meb' -> use GPU embedding cache = 1
I0509 02:01:19.000106 1 hugectr.cc:631] Model 'meb' -> hit rate threshold = 0.8
I0509 02:01:19.000115 1 hugectr.cc:639] Model 'meb' -> per model GPU cache = 1
I0509 02:01:19.000132 1 hugectr.cc:655] Model 'meb' -> num. pool worker buffers = 4
I0509 02:01:19.000142 1 hugectr.cc:662] Model 'meb' -> num. pool refresh buffers = 1
I0509 02:01:19.000149 1 hugectr.cc:669] Model 'meb' -> cache refresh rate per iteration = 0.2
I0509 02:01:19.000158 1 hugectr.cc:678] Model 'meb' -> deployed device list = [0, 1, 2, 3, 4, 5, 6, 7]
I0509 02:01:19.000167 1 hugectr.cc:686] Model 'meb' -> default value for each table = [0]
I0509 02:01:19.000179 1 hugectr.cc:706] *****The HugeCTR Backend Parameter Server is creating... *****
I0509 02:01:19.000351 1 hugectr.cc:714] ***** Parameter Server(Int64) is creating... *****
I0509 02:03:04.076790 1 hugectr.cc:725] *****The HugeCTR Backend Backend created the Parameter Server successfully! *****
I0509 02:03:04.076884 1 hugectr.cc:1703] TRITONBACKEND_ModelInitialize: meb (version 1)
I0509 02:03:04.076891 1 hugectr.cc:1716] Repository location: /models/meb
I0509 02:03:04.076899 1 hugectr.cc:1731] backend configuration in mode: {"cmdline":{"ps":"/models/meb/ps.json"}}
I0509 02:03:04.078340 1 hugectr.cc:974] Verifying model configuration: {
"name": "meb",
"platform": "",
"backend": "hugectr",
"version_policy": {
"latest": {
"num_versions": 1
}
},
"max_batch_size": 100,
"input": [
{
"name": "DES",
"data_type": "TYPE_FP32",
"format": "FORMAT_NONE",
"dims": [
-1
],
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": false
},
{
"name": "CATCOLUMN",
"data_type": "TYPE_INT64",
"format": "FORMAT_NONE",
"dims": [
-1
],
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": false
},
{
"name": "ROWINDEX",
"data_type": "TYPE_INT32",
"format": "FORMAT_NONE",
"dims": [
-1
],
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": false
}
],
"output": [
{
"name": "OUTPUT0",
"data_type": "TYPE_FP32",
"dims": [
-1
],
"label_filename": "",
"is_shape_tensor": false
}
],
"batch_input": [],
"batch_output": [],
"optimization": {
"priority": "PRIORITY_DEFAULT",
"input_pinned_memory": {
"enable": true
},
"output_pinned_memory": {
"enable": true
},
"gather_kernel_buffer_threshold": 0,
"eager_batching": false
},
"instance_group": [
{
"name": "meb_0",
"kind": "KIND_GPU",
"count": 4,
"gpus": [
0,
1,
2,
3,
4,
5,
6,
7
],
"secondary_devices": [],
"profile": [],
"passive": false,
"host_policy": ""
}
],
"default_model_filename": "",
"cc_model_filenames": {},
"metric_tags": {},
"parameters": {
"max_nnz": {
"string_value": "8000"
},
"embedding_vector_size": {
"string_value": "15"
},
"gpucacheper": {
"string_value": "1.0"
},
"des_feature_num": {
"string_value": "0"
},
"hit_rate_threshold": {
"string_value": "0.8"
},
"gpucache": {
"string_value": "true"
},
"embeddingkey_long_type": {
"string_value": "true"
},
"slots": {
"string_value": "15"
},
"config": {
"string_value": "/models/meb/1/meb.json"
},
"cat_feature_num": {
"string_value": "8000"
},
"label_dim": {
"string_value": "1"
}
},
"model_warmup": []
}
I0509 02:03:04.078466 1 hugectr.cc:1060] The model configuration: {
"name": "meb",
"platform": "",
"backend": "hugectr",
"version_policy": {
"latest": {
"num_versions": 1
}
},
"max_batch_size": 100,
"input": [
{
"name": "DES",
"data_type": "TYPE_FP32",
"format": "FORMAT_NONE",
"dims": [
-1
],
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": false
},
{
"name": "CATCOLUMN",
"data_type": "TYPE_INT64",
"format": "FORMAT_NONE",
"dims": [
-1
],
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": false
},
{
"name": "ROWINDEX",
"data_type": "TYPE_INT32",
"format": "FORMAT_NONE",
"dims": [
-1
],
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": false
}
],
"output": [
{
"name": "OUTPUT0",
"data_type": "TYPE_FP32",
"dims": [
-1
],
"label_filename": "",
"is_shape_tensor": false
}
],
"batch_input": [],
"batch_output": [],
"optimization": {
"priority": "PRIORITY_DEFAULT",
"input_pinned_memory": {
"enable": true
},
"output_pinned_memory": {
"enable": true
},
"gather_kernel_buffer_threshold": 0,
"eager_batching": false
},
"instance_group": [
{
"name": "meb_0",
"kind": "KIND_GPU",
"count": 4,
"gpus": [
0,
1,
2,
3,
4,
5,
6,
7
],
"secondary_devices": [],
"profile": [],
"passive": false,
"host_policy": ""
}
],
"default_model_filename": "",
"cc_model_filenames": {},
"metric_tags": {},
"parameters": {
"max_nnz": {
"string_value": "8000"
},
"embedding_vector_size": {
"string_value": "15"
},
"gpucacheper": {
"string_value": "1.0"
},
"des_feature_num": {
"string_value": "0"
},
"hit_rate_threshold": {
"string_value": "0.8"
},
"gpucache": {
"string_value": "true"
},
"embeddingkey_long_type": {
"string_value": "true"
},
"slots": {
"string_value": "15"
},
"config": {
"string_value": "/models/meb/1/meb.json"
},
"cat_feature_num": {
"string_value": "8000"
},
"label_dim": {
"string_value": "1"
}
},
"model_warmup": []
}
I0509 02:03:04.078548 1 hugectr.cc:1105] slots set = 15
I0509 02:03:04.078555 1 hugectr.cc:1111] desene number = 0
I0509 02:03:04.078562 1 hugectr.cc:1117] cat_feature number = 8000
I0509 02:03:04.078569 1 hugectr.cc:1129] embedding size = 15
I0509 02:03:04.078576 1 hugectr.cc:1135] maxnnz = 8000
I0509 02:03:04.078583 1 hugectr.cc:1153] HugeCTR model config path = /models/meb/1/meb.json
I0509 02:03:04.078593 1 hugectr.cc:1176] support gpu cache = 1
I0509 02:03:04.078610 1 hugectr.cc:1199] gpu cache per = 1
I0509 02:03:04.078619 1 hugectr.cc:1216] hit-rate threshold = 0.8
I0509 02:03:04.078626 1 hugectr.cc:1232] Label dim = 1
I0509 02:03:04.078633 1 hugectr.cc:1238] support 64-bit embedding key = 1
I0509 02:03:04.078639 1 hugectr.cc:1252] Model_Inference_Para.max_batchsize: 100
I0509 02:03:04.078645 1 hugectr.cc:1256] max_batch_size in model config.pbtxt is 100
I0509 02:03:04.078654 1 hugectr.cc:1326] ******Creating Embedding Cache for model meb in device 0
I0509 02:03:04.078661 1 hugectr.cc:1326] ******Creating Embedding Cache for model meb in device 1
I0509 02:03:04.078667 1 hugectr.cc:1326] ******Creating Embedding Cache for model meb in device 2
I0509 02:03:04.078673 1 hugectr.cc:1326] ******Creating Embedding Cache for model meb in device 3
I0509 02:03:04.078680 1 hugectr.cc:1326] ******Creating Embedding Cache for model meb in device 4
I0509 02:03:04.078686 1 hugectr.cc:1326] ******Creating Embedding Cache for model meb in device 5
I0509 02:03:04.078693 1 hugectr.cc:1326] ******Creating Embedding Cache for model meb in device 6
I0509 02:03:04.078699 1 hugectr.cc:1326] ******Creating Embedding Cache for model meb in device 7
I0509 02:03:04.078705 1 hugectr.cc:1353] ******Creating Embedding Cache for model meb successfully
I0509 02:03:04.084639 1 hugectr.cc:1851] TRITONBACKEND_ModelInstanceInitialize: meb_0_0 (device 0)
I0509 02:03:04.084655 1 hugectr.cc:1495] Triton Model Instance Initialization on device 0
I0509 02:03:04.092570 1 hugectr.cc:1505] Dense Feature buffer allocation:
I0509 02:03:04.092584 1 hugectr.cc:1512] Categorical Feature buffer allocation:
I0509 02:03:04.119280 1 hugectr.cc:1530] Categorical Row Index buffer allocation:
I0509 02:03:04.119445 1 hugectr.cc:1540] Predict result buffer allocation:
I0509 02:03:04.119527 1 hugectr.cc:1864] ******Loading HugeCTR Model******
I0509 02:03:04.119535 1 hugectr.cc:1558] The model origin json configuration file path is: /models/meb/1/meb.json
I0509 02:03:06.406007 1 hugectr.cc:1565] ******Loading HugeCTR model successfully
I0509 02:03:06.406216 1 hugectr.cc:1851] TRITONBACKEND_ModelInstanceInitialize: meb_0_0 (device 1)
I0509 02:03:06.406245 1 hugectr.cc:1495] Triton Model Instance Initialization on device 1
I0509 02:03:06.406305 1 hugectr.cc:1505] Dense Feature buffer allocation:
I0509 02:03:06.406313 1 hugectr.cc:1512] Categorical Feature buffer allocation:
I0509 02:03:06.423536 1 hugectr.cc:1530] Categorical Row Index buffer allocation:
I0509 02:03:06.423681 1 hugectr.cc:1540] Predict result buffer allocation:
I0509 02:03:06.423781 1 hugectr.cc:1864] ******Loading HugeCTR Model******
I0509 02:03:06.423789 1 hugectr.cc:1558] The model origin json configuration file path is: /models/meb/1/meb.json
I0509 02:03:07.636730 1 hugectr.cc:1565] ******Loading HugeCTR model successfully
I0509 02:03:07.636912 1 hugectr.cc:1851] TRITONBACKEND_ModelInstanceInitialize: meb_0_0 (device 2)
I0509 02:03:07.636927 1 hugectr.cc:1495] Triton Model Instance Initialization on device 2
I0509 02:03:07.636933 1 hugectr.cc:1505] Dense Feature buffer allocation:
I0509 02:03:07.636941 1 hugectr.cc:1512] Categorical Feature buffer allocation:
I0509 02:03:07.653589 1 hugectr.cc:1530] Categorical Row Index buffer allocation:
I0509 02:03:07.653740 1 hugectr.cc:1540] Predict result buffer allocation:
I0509 02:03:07.653842 1 hugectr.cc:1864] ******Loading HugeCTR Model******
I0509 02:03:07.653850 1 hugectr.cc:1558] The model origin json configuration file path is: /models/meb/1/meb.json
I0509 02:03:08.864003 1 hugectr.cc:1565] ******Loading HugeCTR model successfully
I0509 02:03:08.864193 1 hugectr.cc:1851] TRITONBACKEND_ModelInstanceInitialize: meb_0_0 (device 3)
I0509 02:03:08.864208 1 hugectr.cc:1495] Triton Model Instance Initialization on device 3
I0509 02:03:08.864215 1 hugectr.cc:1505] Dense Feature buffer allocation:
I0509 02:03:08.864239 1 hugectr.cc:1512] Categorical Feature buffer allocation:
I0509 02:03:08.881026 1 hugectr.cc:1530] Categorical Row Index buffer allocation:
I0509 02:03:08.881182 1 hugectr.cc:1540] Predict result buffer allocation:
I0509 02:03:08.881300 1 hugectr.cc:1864] ******Loading HugeCTR Model******
I0509 02:03:08.881309 1 hugectr.cc:1558] The model origin json configuration file path is: /models/meb/1/meb.json
I0509 02:03:10.089367 1 hugectr.cc:1565] ******Loading HugeCTR model successfully
I0509 02:03:10.089537 1 hugectr.cc:1851] TRITONBACKEND_ModelInstanceInitialize: meb_0_0 (device 4)
I0509 02:03:10.089549 1 hugectr.cc:1495] Triton Model Instance Initialization on device 4
I0509 02:03:10.089556 1 hugectr.cc:1505] Dense Feature buffer allocation:
I0509 02:03:10.089563 1 hugectr.cc:1512] Categorical Feature buffer allocation:
I0509 02:03:10.106748 1 hugectr.cc:1530] Categorical Row Index buffer allocation:
I0509 02:03:10.106901 1 hugectr.cc:1540] Predict result buffer allocation:
I0509 02:03:10.107012 1 hugectr.cc:1864] ******Loading HugeCTR Model******
I0509 02:03:10.107020 1 hugectr.cc:1558] The model origin json configuration file path is: /models/meb/1/meb.json
I0509 02:03:11.315555 1 hugectr.cc:1565] ******Loading HugeCTR model successfully
I0509 02:03:11.315748 1 hugectr.cc:1851] TRITONBACKEND_ModelInstanceInitialize: meb_0_0 (device 5)
I0509 02:03:11.315763 1 hugectr.cc:1495] Triton Model Instance Initialization on device 5
I0509 02:03:11.315770 1 hugectr.cc:1505] Dense Feature buffer allocation:
I0509 02:03:11.315776 1 hugectr.cc:1512] Categorical Feature buffer allocation:
I0509 02:03:11.333771 1 hugectr.cc:1530] Categorical Row Index buffer allocation:
I0509 02:03:11.333941 1 hugectr.cc:1540] Predict result buffer allocation:
I0509 02:03:11.334056 1 hugectr.cc:1864] ******Loading HugeCTR Model******
I0509 02:03:11.334065 1 hugectr.cc:1558] The model origin json configuration file path is: /models/meb/1/meb.json
I0509 02:03:12.548906 1 hugectr.cc:1565] ******Loading HugeCTR model successfully
I0509 02:03:12.549089 1 hugectr.cc:1851] TRITONBACKEND_ModelInstanceInitialize: meb_0_0 (device 6)
I0509 02:03:12.549103 1 hugectr.cc:1495] Triton Model Instance Initialization on device 6
I0509 02:03:12.549110 1 hugectr.cc:1505] Dense Feature buffer allocation:
I0509 02:03:12.549116 1 hugectr.cc:1512] Categorical Feature buffer allocation:
I0509 02:03:12.566637 1 hugectr.cc:1530] Categorical Row Index buffer allocation:
I0509 02:03:12.566807 1 hugectr.cc:1540] Predict result buffer allocation:
I0509 02:03:12.566930 1 hugectr.cc:1864] ******Loading HugeCTR Model******
I0509 02:03:12.566939 1 hugectr.cc:1558] The model origin json configuration file path is: /models/meb/1/meb.json
I0509 02:03:13.789554 1 hugectr.cc:1565] ******Loading HugeCTR model successfully
I0509 02:03:13.789746 1 hugectr.cc:1851] TRITONBACKEND_ModelInstanceInitialize: meb_0_0 (device 7)
I0509 02:03:13.789761 1 hugectr.cc:1495] Triton Model Instance Initialization on device 7
I0509 02:03:13.789768 1 hugectr.cc:1505] Dense Feature buffer allocation:
I0509 02:03:13.789775 1 hugectr.cc:1512] Categorical Feature buffer allocation:
I0509 02:03:13.807076 1 hugectr.cc:1530] Categorical Row Index buffer allocation:
I0509 02:03:13.807262 1 hugectr.cc:1540] Predict result buffer allocation:
I0509 02:03:13.807390 1 hugectr.cc:1864] ******Loading HugeCTR Model******
I0509 02:03:13.807399 1 hugectr.cc:1558] The model origin json configuration file path is: /models/meb/1/meb.json
I0509 02:03:15.054373 1 hugectr.cc:1565] ******Loading HugeCTR model successfully
I0509 02:03:15.054532 1 hugectr.cc:1851] TRITONBACKEND_ModelInstanceInitialize: meb_0_1 (device 0)
I0509 02:03:15.054545 1 hugectr.cc:1495] Triton Model Instance Initialization on device 0
I0509 02:03:15.054551 1 hugectr.cc:1505] Dense Feature buffer allocation:
I0509 02:03:15.054558 1 hugectr.cc:1512] Categorical Feature buffer allocation:
I0509 02:03:15.059515 1 hugectr.cc:1530] Categorical Row Index buffer allocation:
I0509 02:03:15.059646 1 hugectr.cc:1540] Predict result buffer allocation:
I0509 02:03:15.059775 1 hugectr.cc:1864] ******Loading HugeCTR Model******
I0509 02:03:15.059784 1 hugectr.cc:1558] The model origin json configuration file path is: /models/meb/1/meb.json
I0509 02:03:15.435527 1 hugectr.cc:1565] ******Loading HugeCTR model successfully
I0509 02:03:15.435710 1 hugectr.cc:1851] TRITONBACKEND_ModelInstanceInitialize: meb_0_1 (device 1)
I0509 02:03:15.435724 1 hugectr.cc:1495] Triton Model Instance Initialization on device 1
I0509 02:03:15.435733 1 hugectr.cc:1505] Dense Feature buffer allocation:
I0509 02:03:15.435742 1 hugectr.cc:1512] Categorical Feature buffer allocation:
I0509 02:03:15.440908 1 hugectr.cc:1530] Categorical Row Index buffer allocation:
I0509 02:03:15.441018 1 hugectr.cc:1540] Predict result buffer allocation:
I0509 02:03:15.441116 1 hugectr.cc:1864] ******Loading HugeCTR Model******
I0509 02:03:15.441124 1 hugectr.cc:1558] The model origin json configuration file path is: /models/meb/1/meb.json
I0509 02:03:15.809277 1 hugectr.cc:1565] ******Loading HugeCTR model successfully
I0509 02:03:15.809436 1 hugectr.cc:1851] TRITONBACKEND_ModelInstanceInitialize: meb_0_1 (device 2)
I0509 02:03:15.809449 1 hugectr.cc:1495] Triton Model Instance Initialization on device 2
I0509 02:03:15.809455 1 hugectr.cc:1505] Dense Feature buffer allocation:
I0509 02:03:15.809462 1 hugectr.cc:1512] Categorical Feature buffer allocation:
I0509 02:03:15.814312 1 hugectr.cc:1530] Categorical Row Index buffer allocation:
I0509 02:03:15.814412 1 hugectr.cc:1540] Predict result buffer allocation:
I0509 02:03:15.814511 1 hugectr.cc:1864] ******Loading HugeCTR Model******
I0509 02:03:15.814519 1 hugectr.cc:1558] The model origin json configuration file path is: /models/meb/1/meb.json
I0509 02:03:16.182679 1 hugectr.cc:1565] ******Loading HugeCTR model successfully
I0509 02:03:16.182843 1 hugectr.cc:1851] TRITONBACKEND_ModelInstanceInitialize: meb_0_1 (device 3)
I0509 02:03:16.182857 1 hugectr.cc:1495] Triton Model Instance Initialization on device 3
I0509 02:03:16.182863 1 hugectr.cc:1505] Dense Feature buffer allocation:
I0509 02:03:16.182870 1 hugectr.cc:1512] Categorical Feature buffer allocation:
I0509 02:03:16.188940 1 hugectr.cc:1530] Categorical Row Index buffer allocation:
I0509 02:03:16.189040 1 hugectr.cc:1540] Predict result buffer allocation:
I0509 02:03:16.189140 1 hugectr.cc:1864] ******Loading HugeCTR Model******
I0509 02:03:16.189148 1 hugectr.cc:1558] The model origin json configuration file path is: /models/meb/1/meb.json
I0509 02:03:16.567086 1 hugectr.cc:1565] ******Loading HugeCTR model successfully
I0509 02:03:16.567323 1 hugectr.cc:1851] TRITONBACKEND_ModelInstanceInitialize: meb_0_1 (device 4)
I0509 02:03:16.567338 1 hugectr.cc:1495] Triton Model Instance Initialization on device 4
I0509 02:03:16.567344 1 hugectr.cc:1505] Dense Feature buffer allocation:
I0509 02:03:16.567350 1 hugectr.cc:1512] Categorical Feature buffer allocation:
I0509 02:03:16.572950 1 hugectr.cc:1530] Categorical Row Index buffer allocation:
I0509 02:03:16.573061 1 hugectr.cc:1540] Predict result buffer allocation:
I0509 02:03:16.573169 1 hugectr.cc:1864] ******Loading HugeCTR Model******
I0509 02:03:16.573177 1 hugectr.cc:1558] The model origin json configuration file path is: /models/meb/1/meb.json
I0509 02:03:16.954819 1 hugectr.cc:1565] ******Loading HugeCTR model successfully
I0509 02:03:16.954976 1 hugectr.cc:1851] TRITONBACKEND_ModelInstanceInitialize: meb_0_1 (device 5)
I0509 02:03:16.954989 1 hugectr.cc:1495] Triton Model Instance Initialization on device 5
I0509 02:03:16.954996 1 hugectr.cc:1505] Dense Feature buffer allocation:
I0509 02:03:16.955001 1 hugectr.cc:1512] Categorical Feature buffer allocation:
I0509 02:03:16.960033 1 hugectr.cc:1530] Categorical Row Index buffer allocation:
I0509 02:03:16.960144 1 hugectr.cc:1540] Predict result buffer allocation:
I0509 02:03:16.960262 1 hugectr.cc:1864] ******Loading HugeCTR Model******
I0509 02:03:16.960270 1 hugectr.cc:1558] The model origin json configuration file path is: /models/meb/1/meb.json
I0509 02:03:17.335444 1 hugectr.cc:1565] ******Loading HugeCTR model successfully
I0509 02:03:17.335611 1 hugectr.cc:1851] TRITONBACKEND_ModelInstanceInitialize: meb_0_1 (device 6)
I0509 02:03:17.335624 1 hugectr.cc:1495] Triton Model Instance Initialization on device 6
I0509 02:03:17.335631 1 hugectr.cc:1505] Dense Feature buffer allocation:
I0509 02:03:17.335637 1 hugectr.cc:1512] Categorical Feature buffer allocation:
I0509 02:03:17.340769 1 hugectr.cc:1530] Categorical Row Index buffer allocation:
I0509 02:03:17.340877 1 hugectr.cc:1540] Predict result buffer allocation:
I0509 02:03:17.340976 1 hugectr.cc:1864] ******Loading HugeCTR Model******
I0509 02:03:17.340983 1 hugectr.cc:1558] The model origin json configuration file path is: /models/meb/1/meb.json
I0509 02:03:17.725151 1 hugectr.cc:1565] ******Loading HugeCTR model successfully
I0509 02:03:17.725387 1 hugectr.cc:1851] TRITONBACKEND_ModelInstanceInitialize: meb_0_1 (device 7)
I0509 02:03:17.725400 1 hugectr.cc:1495] Triton Model Instance Initialization on device 7
I0509 02:03:17.725407 1 hugectr.cc:1505] Dense Feature buffer allocation:
I0509 02:03:17.725412 1 hugectr.cc:1512] Categorical Feature buffer allocation:
I0509 02:03:17.730393 1 hugectr.cc:1530] Categorical Row Index buffer allocation:
I0509 02:03:17.730501 1 hugectr.cc:1540] Predict result buffer allocation:
I0509 02:03:17.730597 1 hugectr.cc:1864] ******Loading HugeCTR Model******
I0509 02:03:17.730606 1 hugectr.cc:1558] The model origin json configuration file path is: /models/meb/1/meb.json
[qgpuvb813-cvision:1 :0:381] Caught signal 7 (Bus error: nonexistent physical address)
==== backtrace (tid: 381) ====
0 0x00000000000143c0 __funlockfile() ???:0
1 0x000000000018ba51 __nss_database_lookup() ???:0
2 0x0000000000068d6c ncclGroupEnd() ???:0
3 0x000000000005de2d ncclGroupEnd() ???:0
4 0x0000000000008609 start_thread() ???:0
5 0x000000000011f163 clone() ???:0
=================================
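
To narrow down where in the startup sequence the crash happens, a small script along these lines can summarize the log. This is only a sketch: the function name `summarize_log` and the regular expressions are my own, written against the log line formats shown above, and are not part of Triton or HugeCTR.

```python
import re

# Patterns written against the hugectr backend log lines pasted above
# (assumption: other backend versions may word these messages differently).
INIT_RE = re.compile(r"TRITONBACKEND_ModelInstanceInitialize: (\S+) \(device (\d+)\)")
DONE_RE = re.compile(r"Loading HugeCTR model successfully")
SIGNAL_RE = re.compile(r"Caught signal (\d+) \((.+)\)")


def summarize_log(lines):
    """Summarize model instance startup from an iterable of log lines.

    Returns a tuple (finished, pending, signal):
      finished: list of (instance, device) pairs whose model load completed
      pending:  the (instance, device) pair still loading at the end, or None
      signal:   (number, description) of a caught signal, or None
    """
    finished = []
    pending = None
    signal = None
    for line in lines:
        m = INIT_RE.search(line)
        if m:
            # A new instance started initializing on some device.
            pending = (m.group(1), int(m.group(2)))
            continue
        if DONE_RE.search(line) and pending is not None:
            # The instance that was initializing finished loading its model.
            finished.append(pending)
            pending = None
            continue
        m = SIGNAL_RE.search(line)
        if m:
            signal = (int(m.group(1)), m.group(2))
    return finished, pending, signal
```

With the log saved to a file (for example a hypothetical `triton.log`), `summarize_log(open("triton.log"))` on the output above reports 15 instances as loaded, `("meb_0_1", 7)` as still loading, and signal 7 (Bus error) as the caught signal, so the crash occurs while the second instance on device 7 is loading.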