nod-ai / shark
SHARK - High Performance Machine Learning Distribution
License: Apache License 2.0
The Python script to run:
Output TOSA file:
https://storage.googleapis.com/shark_tank/chi-nod/gpt2/gpt2_torch_tosa.mlir
TOSA file after eliding large attributes:
https://storage.googleapis.com/shark_tank/chi-nod/gpt2/gpt2_torch_tosa_elide.mlir
Let's make it easier for users to browse and find models in SHARK by turning each model's name in the text (e.g. BERT, Albert, Alexnet) into a hyperlink to its respective tank model directory (e.g. bert link, albert link, alexnet link).
To reproduce:
pytest tank/tf/hf_masked_lm/tapas-base_tf_test.py -k "static_cpu"
Error output:
E ImportError:
E TFTapasMainLayer requires the tensorflow_probability library but it was not found in your environment. You can install it with pip as
E explained here: https://github.com/tensorflow/probability.
I wasn't able to get this to work by pip installing tfp-nightly -- if we can get it to work let's make it run out of the box for IMPORTER=1.
Currently, there is no support for benchmarking PyTorch models on CUDA via pytest.
SHARK/setup_venv.sh should be updated with a GPU_BENCHMARKS flag that uninstalls the CPU version of PyTorch Nightly and replaces it with the CUDA version.
SHARK/shark/shark_benchmark_runner.py::SharkBenchmarkRunner has a torch_benchmark method that should be updated to run on GPU/CUDA for the GPU pytest cases.
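As a rough illustration of what a device-aware timing loop could look like (a sketch only, not the actual torch_benchmark method): the function name and the `sync` parameter below are hypothetical. On CUDA, a caller would typically pass `torch.cuda.synchronize` as `sync` so that asynchronously launched kernels are finished before the clock is read.

```python
import time


def time_callable(fn, warmup=2, iters=10, sync=None):
    """Return the average wall-clock seconds per call of fn().

    sync is an optional zero-argument callable invoked before each clock
    read. For CUDA runs it would typically be torch.cuda.synchronize; that
    wiring is an assumption of this sketch, not SHARK's existing code.
    """
    if iters <= 0:
        raise ValueError("iters must be positive")
    # Warm-up calls are executed but not timed.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / iters
```

With a CPU-only model the `sync` argument can simply be left as None.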
Add --iree-flow-demote-i64-to-i32 (default false) and --iree-flow-demote-f64-to-f32 to compile MiniLM via Torch-mlir and run on IREE.
Please add RNNT (speech recognition) to the Shark Tank: https://github.com/mlcommons/inference/tree/master/speech_recognition/rnnt
Can we get a save method to checkpoint/save the model/save the vmfb so that we do not need to recompile from scratch every time we run the script?
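One possible shape for this, as a minimal sketch: cache the compiled bytes on disk, keyed by a hash of the MLIR contents and the target device. `load_or_compile`, `compile_fn`, and `cache_dir` are hypothetical names for illustration; `compile_fn` stands in for whatever step currently produces the vmfb.

```python
import hashlib
from pathlib import Path


def load_or_compile(mlir_bytes, device, compile_fn, cache_dir):
    """Return compiled bytes, reusing a cached copy when one exists.

    compile_fn(mlir_bytes, device) -> bytes is a hypothetical stand-in for
    the step that produces the .vmfb flatbuffer.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key on both the module contents and the target device.
    key = hashlib.sha256(device.encode() + b"\0" + mlir_bytes).hexdigest()
    path = cache_dir / f"{key}.vmfb"
    if path.exists():
        return path.read_bytes()
    blob = compile_fn(mlir_bytes, device)
    path.write_bytes(blob)
    return blob
```

Keying on the content hash means a changed model or a different device triggers a recompile, while repeated runs of the same script reuse the saved file.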
Several features/improvements to SHARK's pytest --benchmark option are tracked in this issue:
--benchmark for TensorFlow shark tank module tests.
HF Benchmarker is a module within SHARK that enables easy testing of HF models with ONNX, Torch, TF, and, of course, SHARK-RT. This work is based on SharkBenchmarker for the MLIR part and on the Microsoft Transformer Benchmark.
EDIT: nightly ORT did not fix GPU nor did it fix TF.
There are some runtime issues: RuntimeError: Intra op parallelism cannot be modified after initialization. and RuntimeError: Visible devices cannot be modified after being initialized. See https://github.com/microsoft/onnxruntime/issues/11751 for more details.
Currently the only supported device is CPU, since we get OOM on GPU. The problem is that importing onnxruntime loads 39GB of data onto the GPU, which leaves very little space to load our model or run anything.
To reproduce:
Cloned the repo and tried running the examples, both resnet and minilm.
I keep getting
RuntimeError: required keyword attribute 'is_zero' is undefined
It seems to have something to do with ModuleBuilder -> mb.import_module(module._c, class_annotator)
Environment:
Using the Apple Silicon M1 snapshot version of torch-mlir.
Running on an M1 MacBook, Python 3.9.
Attached screenshots of both resnet50_script and minilm.
Disclaimer: new to torch-mlir
In Shark Downloader, we check if local hash for shark_tank artifacts matches upstream hash, and if it doesn't, all artifacts are downloaded from gs://shark_tank for the latest upstream hash, replacing local files.
This becomes a problem if one uses generate_sharktank.py to populate local shark_tank and run tests, as the upstream artifacts are used instead of the local artifacts (in my case, with significant changes).
I think this is of critical importance to our correspondence with the IREE team as well as our SHARK team's development process.
We have a few options to handle this:
To reproduce:
python generate_sharktank.py
pytest -s tank/MiniLM-L12-H384-uncased/
It will be evident that the artifacts are replaced by the contents of gs://shark_tank/microsoft_MiniLM-L12-H384-uncased_tf/
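To make the behaviour concrete, the download decision described above could be sketched as follows. This is an illustration under stated assumptions, not the Shark Downloader's actual code: `should_download` and the `prefer_local` opt-out are hypothetical names, and the opt-out is just one conceivable way to keep locally generated artifacts.

```python
def should_download(local_hash, upstream_hash, prefer_local=False):
    """Decide whether to replace local artifacts with upstream ones.

    prefer_local is a hypothetical opt-out: when it is set and local
    artifacts exist, they are kept even if their hash differs from upstream.
    """
    if local_hash is None:
        # Nothing local yet, so a download is unavoidable.
        return True
    if prefer_local:
        return False
    # Current behaviour as described: any mismatch triggers a re-download.
    return local_hash != upstream_hash
```

With `prefer_local=False` this reproduces the reported problem: artifacts produced by generate_sharktank.py have a different hash and are overwritten.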
Add support for saving MLIR files when running the tests.
(new_dylib_venv) 139 anush@nod-shared-a100-3:~/github/shark$ IREE_SAVE_TEMPS=iree_temps_bert_dynamic pytest tank/pytorch/bert_test.py::BertModuleTest::test_module_dynamic_cpu --save_mlir
ERROR: usage: pytest [options] [file_or_dir] [file_or_dir] [...]
pytest: error: unrecognized arguments: --save_mlir
inifile: /home/anush/github/shark/pytest.ini
rootdir: /home/anush/github/shark
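The "unrecognized arguments" error above is what pytest reports for any flag that has not been registered. A minimal sketch of registering it in a conftest.py, using pytest's `pytest_addoption` hook, might look like this. The help text and the `save_mlir_enabled` helper are assumptions for illustration; actually writing the MLIR files out is not shown.

```python
# conftest.py (sketch)


def pytest_addoption(parser):
    """Register --save_mlir so pytest no longer rejects it."""
    parser.addoption(
        "--save_mlir",
        action="store_true",
        default=False,
        help="Save MLIR files generated while running the tests.",
    )


def save_mlir_enabled(config):
    """Hypothetical helper: True when --save_mlir was passed on the command line."""
    return bool(config.getoption("--save_mlir"))
```

A test or fixture could then call `save_mlir_enabled(request.config)` to decide whether to keep the generated files.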
Because of hash checking, the local artifacts of the nightly build aren't being tested; the existing upstream latest artifacts are tested instead. This has the downstream effect of making it impossible for checks to pass automatically when a change to the tank is made.
Add a CI job that tests the generated pip packages the way an end user would install and use them.
pytest benchmarks/tests/test_benchmark.py::test_bench_xlm_roberta[False-cpu]
fails with:
====================================================== short test summary info =======================================================
FAILED benchmarks/tests/test_benchmark.py::test_bench_xlm_roberta[False-cpu] - OSError: Can't load tokenizer for 'xlm-roberta-base'...
========================================================= 1 failed in 12.31s =========================================================
We need to model the CUDA backend in SHARK similarly to:
if use_gpu:
    backend = "cuda"
    backend_config = "cuda"
    args = ["--iree-cuda-llvm-target-arch=sm_80", "--iree-hal-cuda-disable-loop-nounroll-wa"]
    ireert.flags.FUNCTION_INPUT_VALIDATION = False
    ireert.flags.parse_flags("--cuda_allow_inline_execution")
...
# Setting up input on host and moving to device.
host_inputs = [encoded_input["input_ids"], encoded_input["attention_mask"], encoded_input["token_type_ids"]]
if use_gpu:
    device_inputs = [ireert.asdevicearray(config.device, a) for a in host_inputs]
else:
    device_inputs = host_inputs
https://huggingface.co/docs/transformers/v4.21.1/en/model_doc/bert#transformers.BertForMaskedLM
This model is currently 17x slower than Torch on A100 GPU and we would like to track this.
(new_dylib_venv) anush@nod-shared-a100-3:~/github/shark$ pytest tank/pytorch/tests/resnet101_test.py::Resnet101ModuleTest::test_module_static_cpu
================================================================================================= test session starts ==================================================================================================
platform linux -- Python 3.10.4, pytest-7.1.2, pluggy-1.0.0 -- /home/anush/github/shark/new_dylib_venv/bin/python3
cachedir: .pytest_cache
rootdir: /home/anush/github/shark, configfile: pytest.ini
plugins: forked-1.4.0, xdist-2.5.0, typeguard-2.13.3
collecting ... Fatal Python error: Aborted
Current thread 0x00007efd4103a1c0 (most recent call first):
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/framework/constant_op.py", line 304 in _constant_eager_impl
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/framework/constant_op.py", line 279 in _constant_impl
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/framework/constant_op.py", line 267 in constant
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/framework/constant_op.py", line 343 in _constant_tensor_conversion_function
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/framework/ops.py", line 1623 in convert_to_tensor
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/profiler/trace.py", line 183 in wrapped
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 264 in args_to_matching_eager
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/ops/gen_stateful_random_ops.py", line 77 in non_deterministic_ints_eager_fallback
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/ops/gen_stateful_random_ops.py", line 50 in non_deterministic_ints
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/ops/stateful_random_ops.py", line 80 in non_deterministic_ints
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/ops/stateful_random_ops.py", line 381 in from_non_deterministic_state
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/transformers/generation_tf_utils.py", line 349 in TFGenerationMixin
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/transformers/generation_tf_utils.py", line 344 in <module>
File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 883 in exec_module
File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/transformers/modeling_tf_utils.py", line 41 in <module>
File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 883 in exec_module
File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/transformers/models/bert/modeling_tf_bert.py", line 38 in <module>
File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 883 in exec_module
File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
File "<frozen importlib._bootstrap>", line 1050 in _gcd_import
File "/usr/lib/python3.10/importlib/__init__.py", line 126 in import_module
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 872 in _get_module
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 862 in __getattr__
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 863 in __getattr__
File "<frozen importlib._bootstrap>", line 1075 in _handle_fromlist
File "/home/anush/github/shark/tank/pytorch/tests/test_utils.py", line 7 in <module>
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/assertion/rewrite.py", line 168 in exec_module
File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
File "/home/anush/github/shark/tank/pytorch/tests/resnet101_test.py", line 3 in <module>
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/assertion/rewrite.py", line 168 in exec_module
File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
File "<frozen importlib._bootstrap>", line 1050 in _gcd_import
File "/usr/lib/python3.10/importlib/__init__.py", line 126 in import_module
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/pathlib.py", line 533 in import_path
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/python.py", line 608 in _importtestmodule
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/python.py", line 519 in _getobj
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/python.py", line 301 in obj
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/python.py", line 536 in _inject_setup_module_fixture
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/python.py", line 522 in collect
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 369 in <lambda>
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 338 in from_call
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 369 in pytest_make_collect_report
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 537 in collect_one_node
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/main.py", line 768 in collect
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 369 in <lambda>
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 338 in from_call
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 369 in pytest_make_collect_report
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 537 in collect_one_node
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/main.py", line 643 in perform_collect
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/main.py", line 332 in pytest_collection
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/main.py", line 321 in _main
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/main.py", line 268 in wrap_session
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/main.py", line 315 in pytest_cmdline_main
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/config/__init__.py", line 164 in main
File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/config/__init__.py", line 187 in console_main
File "/home/anush/github/shark/new_dylib_venv/bin/pytest", line 8 in <module>
Extension modules: torch._C, torch._C._fft, torch._C._linalg, torch._C._nn, torch._C._sparse, torch._C._special, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, PIL._imaging, PIL._imagingft, google.protobuf.pyext._message, tensorflow.python.framework.fast_tensor_util, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.h5r, h5py.utils, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5t, h5py._conv, h5py.h5z, h5py._proxy, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5o, h5py.h5l, h5py._selector, scipy._lib._ccallback_c, scipy.sparse._sparsetools, scipy.sparse._csparsetools, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.strptime, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.tslib, pandas._libs.lib, pandas._libs.hashing, pandas._libs.ops, pandas._libs.arrays, pandas._libs.index, pandas._libs.join, pandas._libs.sparse, pandas._libs.reduction, pandas._libs.indexing, pandas._libs.internals, 
pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.testing, pandas._libs.parsers, pandas._libs.json, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg._cythonized_array_utils, scipy.linalg._flinalg, scipy.linalg._solve_toeplitz, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg.cython_lapack, scipy.linalg._decomp_update, scipy.ndimage._nd_image, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, _ni_label, scipy.ndimage._ni_label, sentencepiece._sentencepiece (total: 116)
Aborted (core dumped)
Pinning to 4.18 as a workaround
To reproduce:
pytest tank/tf/MiniLM-L12-H384-uncased_tf_test.py -k "gpu"
FAILED tank/facebook_convnext-tiny-224_tf/facebook_convnext-tiny-224_tf_test.py::ConvNextTinyModuleTest::test_module_dynamic_gpu
FAILED tank/facebook_convnext-tiny-224_tf/facebook_convnext-tiny-224_tf_test.py::ConvNextTinyModuleTest::test_module_static_gpu
FAILED tank/facebook_deit-small-distilled-patch16-224_torch/facebook_deit-small-distilled-patch16-224_torch_test.py::DeitModuleTest::test_module_static_gpu
FAILED tank/google_vit-base-patch16-224_tf/google_vit-base-patch16-224_tf_test.py::VitBaseModuleTest::test_module_dynamic_gpu
FAILED tank/google_vit-base-patch16-224_tf/google_vit-base-patch16-224_tf_test.py::VitBaseModuleTest::test_module_static_gpu
FAILED tank/google_vit-base-patch16-224_torch/google_vit-base-patch16-224_torch_test.py::VitBaseModuleTest::test_module_static_gpu
FAILED tank/nvidia_mit-b0_torch/nvidia_mit-b0_torch_test.py::MitModuleTest::test_module_static_gpu
Error Log (common for cases shown above):
E iree.compiler.tools.binaries.CompilerToolError: Error invoking IREE compiler tool iree-compile
E Diagnostics:
E
E
E Invoked with:
E iree-compile /data/anush/actions-runner/_work/SHARK/SHARK/shark.venv/lib/python3.10/site-packages/iree/compiler/tools/../_mlir_libs/iree-compile - --iree-input-type=none --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=cuda --iree-llvm-embedded-linker-path=/data/anush/actions-runner/_work/SHARK/SHARK/shark.venv/lib/python3.10/site-packages/iree/compiler/tools/../_mlir_libs/iree-lld --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --iree-llvm-target-cpu-features=host --iree-hal-cuda-disable-loop-nounroll-wa --iree-stream-resource-index-bits=64 --iree-vm-target-index-bits=64
E
E Need more information? Set IREE_SAVE_TEMPS=/some/dir in your environment to save all artifacts and reproducers.
Alexnet seems to fail the static cases on AMD for some reason, but it looks like an issue in the test script rather than in the underlying infra.
anush@alderlake ~/github/shark
% pytest tank/alexnet_torch/alexnet_torch_test.py::AlexnetModuleTest::test_module_static_vulkan
================================================================================= test session starts =================================================================================
platform linux -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0 -- /home/anush/github/shark/shark.venv/bin/python3
cachedir: .pytest_cache
rootdir: /home/anush/github/shark, configfile: pytest.ini
plugins: forked-1.4.0, xdist-2.5.0
collected 1 item
tank/alexnet_torch/alexnet_torch_test.py::AlexnetModuleTest::test_module_static_vulkan FAILED [100%]
====================================================================================== FAILURES =======================================================================================
_____________________________________________________________________ AlexnetModuleTest.test_module_static_vulkan _____________________________________________________________________
a = (<alexnet_torch_test.AlexnetModuleTest testMethod=test_module_static_vulkan>,)
@wraps(func)
def standalone_func(*a):
> return func(*(a + p.args), **p.kwargs)
shark.venv/lib/python3.10/site-packages/parameterized/parameterized.py:533:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tank/alexnet_torch/alexnet_torch_test.py:78: in test_module
self.module_tester.create_and_check_module(dynamic, device)
tank/alexnet_torch/alexnet_torch_test.py:43: in create_and_check_module
shark_module.compile()
shark/shark_inference.py:87: in compile
self.shark_runner = SharkRunner(
shark/shark_runner.py:81: in __init__
) = get_iree_compiled_module(
shark/iree_utils/compile_utils.py:122: in get_iree_compiled_module
return get_iree_module(flatbuffer_blob, device, func_name)
shark/iree_utils/compile_utils.py:106: in get_iree_module
ctx.add_vm_module(vm_module)
shark.venv/lib/python3.10/site-packages/iree/runtime/system_api.py:255: in add_vm_module
self.add_vm_modules((vm_module,))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <iree.runtime.system_api.SystemContext object at 0x7f62b0332e60>, vm_modules = (<VmModule module : [forward, __init]>,)
def add_vm_modules(self, vm_modules):
assert self._is_dynamic, "Cannot 'add_module' on a static context"
for m in vm_modules:
if m.name in self._bound_modules:
raise ValueError(f"Attempt to register duplicate VmModule: '{m.name}'")
bound_module = BoundModule(self, m)
self._bound_modules[m.name] = bound_module
if self._tracer:
self._tracer.add_module(bound_module.traced_module)
> self._vm_context.register_modules(vm_modules)
E RuntimeError: Error registering modules: iree/runtime/src/iree/hal/drivers/vulkan/native_executable.cc:127: UNAVAILABLE; VK_ERROR_INITIALIZATION_FAILED; while invoking native function hal.executable.create; while calling import;
E [ 1] native hal.executable.create:0 -
E [ 0] bytecode module.__init:1788 <stdin>:134:11
E at <stdin>:9:3
shark.venv/lib/python3.10/site-packages/iree/runtime/system_api.py:252: RuntimeError
-------------------------------------------------------------------------------- Captured stdout call ---------------------------------------------------------------------------------
Found Radeon XT Device. Using rdna2-unknown-linux
The models are present in the /home/anush/.local/shark_tank/. If you want a fresh
download, consider deleting the directory.
Found Radeon XT Device. Using rdna2-unknown-linux
-------------------------------------------------------------------------------- Captured stderr call ---------------------------------------------------------------------------------
Copying gs://shark_tank/latest/alexnet_torch/hash.npy...
/ [1 files][ 640.0 B/ 640.0 B]
Operation completed over 1 objects/640.0 B.
'DISPLAY' environment variable not set... skipping surface info
=============================================================================== short test summary info ===============================================================================
FAILED tank/alexnet_torch/alexnet_torch_test.py::AlexnetModuleTest::test_module_static_vulkan - RuntimeError: Error registering modules: iree/runtime/src/iree/hal/drivers/vulkan/na...
================================================================================= 1 failed in 12.42s ==================================================================================
The nodai-shark pip package specifies minimum version requirements for iree-runtime and iree-compiler that are too old.
$ pipdeptree -p nodai-shark
nodai-SHARK==20220810.173
- iree-compiler [required: >=20220427.13, installed: 20220714.204]
- numpy [required: Any, installed: 1.22.4]
- PyYAML [required: Any, installed: 6.0]
- iree-runtime [required: >=20220427.13, installed: 20220714.204]
- numpy [required: Any, installed: 1.22.4]
- PyYAML [required: Any, installed: 6.0]
- numpy [required: Any, installed: 1.22.4]
- PyYAML [required: Any, installed: 6.0]
- torch-mlir [required: >=20220428.420, installed: 20220606.495]
- numpy [required: Any, installed: 1.22.4]
- torch [required: ==1.13.0.dev20220606+cpu, installed: 1.13.0.dev20220606+cpu]
- typing-extensions [required: Any, installed: 4.2.0]
When running with
iree-compiler 20220604.24
iree-runtime 20220604.24
I get this error
$ python ./resnet50_script.py --device="cpu"
/home/petkantchin/.local/shark_tank/
load image from https://upload.wikimedia.org/wikipedia/commons/2/26/YellowLabradorLooking_new.jpg
Copying gs://shark_tank/274650f/resnet50_torch/function_name.npy...
Copying gs://shark_tank/274650f/resnet50_torch/golden_out.npz...
Copying gs://shark_tank/274650f/resnet50_torch/hash.npy...
Copying gs://shark_tank/274650f/resnet50_torch/inputs.npz...
\ [4 files][593.2 KiB/593.2 KiB]
==> NOTE: You are performing a sequence of gsutil operations that may
run significantly faster if you instead use gsutil -m cp ... Please
see the -m section under "gsutil help options" for further information
about when gsutil -m can be advantageous.
Copying gs://shark_tank/274650f/resnet50_torch/resnet50_dynamic_torch.mlir...
Copying gs://shark_tank/274650f/resnet50_torch/resnet50_torch.mlir...
- [6 files][391.5 MiB/391.5 MiB] 10.7 MiB/s
Operation completed over 6 objects/391.5 MiB.
Target triple found:x86_64-linux-gnu
ERROR:root:Could not create driver local-task (not registered)
Traceback (most recent call last):
File "./resnet50_script.py", line 72, in <module>
shark_module.compile()
File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/shark/shark_inference.py", line 87, in compile
self.shark_runner = SharkRunner(
File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/shark/shark_runner.py", line 81, in __init__
) = get_iree_compiled_module(
File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/shark/iree_utils/compile_utils.py", line 120, in get_iree_compiled_module
return get_iree_module(flatbuffer_blob, device, func_name)
File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/shark/iree_utils/compile_utils.py", line 102, in get_iree_module
config = ireert.Config(IREE_DEVICE_MAP[device])
File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/iree/runtime/system_api.py", line 115, in __init__
self.driver = _create_default_iree_driver(
File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/iree/runtime/system_api.py", line 97, in _create_default_iree_driver
raise RuntimeError(
RuntimeError: Could not create any requested driver ['local-task'] (available=['cuda', 'dylib', 'dylib-sync', 'vmvx', 'vmvx-sync', 'vulkan']) : {}
Updating IREE to 20220714.204 fixed the issue.
iree-compiler 20220714.204
iree-runtime 20220714.204
I suspect that the dependency version requirements have to be fixed. Some other, earlier versions may be OK as well; I have not checked.
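Until the package metadata is corrected, a script could guard against a too-old IREE install with a check along these lines. This is a sketch: the helper names are hypothetical, and treating 20220714.204 as the minimum is only an assumption based on the report above, since earlier versions were not checked.

```python
from importlib import metadata


def parse_version(text):
    """Turn a dotted numeric version such as '20220714.204' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    """Compare two dotted numeric versions component by component."""
    return parse_version(installed) >= parse_version(minimum)


def installed_meets(package, minimum):
    """Return False if the package is missing or older than minimum."""
    try:
        return meets_minimum(metadata.version(package), minimum)
    except metadata.PackageNotFoundError:
        return False
```

For example, `installed_meets("iree-runtime", "20220714.204")` could be asserted at start-up to fail early with a clearer message than the driver error shown above.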
Title says it all
will disable them for now
Many users have a favourite deep learning framework and do not use the others. We should give setup_venv.sh an option to choose whether they intend to use torch-frontend, tf-frontend, or both. This way users can have a leaner environment!
Error log for resnet101 that is similar (if not identical) to the error messages produced from the dynamic vulkan case on a few of our PyTorch models: gist
This error is also encountered for the dynamic vulkan case on the following models:
These cases will be xfailed.
Generate a webpage that can be rendered in the main README.md from each perf-linux run
(shark.venv) anush@MacStudio shark % pytest tank/mobilebert/mobilebert_tflite_test.py::MobilebertTfliteModuleTest::test_module_static_cpu
========================================================================================================================== test session starts ===========================================================================================================================
platform darwin -- Python 3.10.5, pytest-7.1.2, pluggy-1.0.0 -- /Users/anush/github/shark/shark.venv/bin/python3
cachedir: .pytest_cache
rootdir: /Users/anush/github/shark, configfile: pytest.ini
plugins: xdist-2.5.0, forked-1.4.0
collected 1 item
tank/mobilebert/mobilebert_tflite_test.py::MobilebertTfliteModuleTest::test_module_static_cpu Fatal Python error: Segmentation fault
Current thread 0x0000000104f34580 (most recent call first):
File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/iree/runtime/system_api.py", line 75 in _create_default_iree_driver
File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/iree/runtime/system_api.py", line 115 in __init__
File "/Users/anush/github/shark/shark/iree_utils/compile_utils.py", line 102 in get_iree_module
File "/Users/anush/github/shark/shark/iree_utils/compile_utils.py", line 120 in get_iree_compiled_module
File "/Users/anush/github/shark/shark/shark_runner.py", line 80 in __init__
File "/Users/anush/github/shark/shark/shark_inference.py", line 73 in compile
File "/Users/anush/github/shark/tank/mobilebert/mobilebert_tflite_test.py", line 111 in create_and_check_module
File "/Users/anush/github/shark/tank/mobilebert/mobilebert_tflite_test.py", line 137 in test_module_static_cpu
File "/opt/homebrew/Cellar/python@3.10/3.10.5/Frameworks/Python.framework/Versions/3.10/lib/python3.10/unittest/case.py", line 549 in _callTestMethod
File "/opt/homebrew/Cellar/python@3.10/3.10.5/Frameworks/Python.framework/Versions/3.10/lib/python3.10/unittest/case.py", line 591 in run
File "/opt/homebrew/Cellar/python@3.10/3.10.5/Frameworks/Python.framework/Versions/3.10/lib/python3.10/unittest/case.py", line 650 in __call__
File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/unittest.py", line 327 in runtest
File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/runner.py", line 166 in pytest_runtest_call
File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/runner.py", line 259 in <lambda>
File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/runner.py", line 338 in from_call
File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/runner.py", line 258 in call_runtest_hook
File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/runner.py", line 219 in call_and_report
File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/runner.py", line 130 in runtestprotocol
File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/runner.py", line 111 in pytest_runtest_protocol
File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/main.py", line 347 in pytest_runtestloop
File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/main.py", line 322 in _main
File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/main.py", line 268 in wrap_session
File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/main.py", line 315 in pytest_cmdline_main
File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/config/__init__.py", line 164 in main
File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/config/__init__.py", line 187 in console_main
File "/Users/anush/github/shark/shark.venv/bin/pytest", line 8 in <module>
Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, yaml._yaml, tensorflow.python.framework.fast_tensor_util, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.h5r, h5py.utils, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5t, h5py._conv, h5py.h5z, h5py._proxy, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5o, h5py.h5l, h5py._selector, PIL._imaging (total: 40)
zsh: segmentation fault pytest
It would be helpful for debugging to be able to see the result of each dispatch when running a module. IREE already does this, e.g.
$ iree-run-module --device=vulkan --entry_function=forward --function_input=1x4xf32=1.0 --module_file=model.vmfb
EXEC @forward
=== forward_dispatch_0::forward_dispatch_0_generic_3x4 inputs ===
=== forward_dispatch_0::forward_dispatch_0_generic_3x4 outputs ===
4x3xf32=[1 0 0][0 0 0][0 0 0][0 0 0]
=== forward_dispatch_1::forward_dispatch_1_matmul_1x3x4 inputs ===
1x4xf32=[1 1 1 1]
4x3xf32=[1 0 0][0 0 0][0 0 0][0 0 0]
=== forward_dispatch_1::forward_dispatch_1_matmul_1x3x4 outputs ===
1x3xf32=[1 0 0]
result[0]: hal.buffer_view
1x3xf32=[1 0 0]
This could be a flag such as
shark_module = SharkInference(
model, func_name, device="vulkan", mlir_dialect="linalg", print_dispatches=True
)
and gives a numpy view of the dispatch results or just shows the iree output.
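As a rough illustration of the "numpy view" idea, the buffer lines in the trace above could be parsed into a shape, a dtype, and a flat list of values. This is only a sketch: `parse_buffer_line` is a hypothetical helper, not part of SHARK or IREE, and it assumes buffers are printed exactly as in the sample output (shape and dtype, then bracketed rows in row-major order).

```python
import re
from typing import List, Tuple

# Hypothetical helper (not a SHARK or IREE API). Matches lines such as
# "4x3xf32=[1 0 0][0 0 0][0 0 0][0 0 0]" from the trace shown above.
_BUFFER_RE = re.compile(r"^((?:\d+x)+)([a-z]+\d+)=(.*)$")


def parse_buffer_line(line: str) -> Tuple[Tuple[int, ...], str, List[float]]:
    """Return (shape, dtype, flat_values) for one printed buffer."""
    match = _BUFFER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a buffer line: {line!r}")
    dims, dtype, payload = match.groups()
    shape = tuple(int(d) for d in dims.rstrip("x").split("x"))
    # Brackets only delimit rows; the values are assumed to be row-major.
    values = [float(tok) for tok in re.findall(r"[^\s\[\]]+", payload)]
    expected = 1
    for dim in shape:
        expected *= dim
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    return shape, dtype, values


shape, dtype, values = parse_buffer_line("4x3xf32=[1 0 0][0 0 0][0 0 0][0 0 0]")
print(shape, dtype, values[:3])
```

With numpy available, `np.array(values).reshape(shape)` would turn the result into an array for comparison against golden values.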
`pytest shark/tests/models
pytest shark/tests/models -n auto`
seems incorrect
(shark.venv) a@debian-1:~/github/dshark$ python -m shark.examples.minilm_jit
/home/a/github/dshark/shark.venv/lib/python3.7/site-packages/torch/nn/modules/module.py:1403: UserWarning: positional arguments and argument "destination" are deprecated. nn.Module.state_dict will not accept them in the future. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
" and ".join(warn_msg) + " are deprecated. nn.Module.state_dict will not accept them in the future. "
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at microsoft/MiniLM-L12-H384-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Target triple found:x86_64-linux-gnu
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
(shark.venv) a@debian-1:~/github/dshark$
To reproduce:
pytest tank/tf/hf_masked_lm/roberta-base_tf_test.py
The illegal operation is tf.BatchMatMulV2.
Allows for easy switching from the command line for the included examples.
Catch all tracker for putting together a quickstart guide.
XLM-roberta assert failure:
> np.testing.assert_allclose(golden_out, result, rtol=1e-01, atol=1e-02)
E AssertionError:
E Not equal to tolerance rtol=0.1, atol=0.01
E
E Mismatched elements: 5505 / 4000032 (0.138%)
E Max absolute difference: 0.09074688
E Max relative difference: 3171.7234
E x: array([[[ 2.683771, 0.183121, 10.453473, ..., 6.315439, 2.047505,
E 3.32532 ],
E [-0.482143, 0.061366, 9.494564, ..., 6.593861, 1.620899,...
E y: array([[[ 2.671124, 0.182537, 10.456981, ..., 6.322483, 2.051546,
E 3.322179],
E [-0.481575, 0.061454, 9.495419, ..., 6.59101 , 1.619549,...
roberta-base-tf assert failure:
> np.testing.assert_allclose(golden_out, result, rtol=1e-01, atol=1e-02)
E AssertionError:
E Not equal to tolerance rtol=0.1, atol=0.01
E
E Mismatched elements: 453 / 804240 (0.0563%)
E Max absolute difference: 0.04533577
E Max relative difference: 763.70135
E x: array([[[33.55235 , -3.827327, 18.863625, ..., 3.420343, 6.171632,
E 11.648125],
E [-0.598835, -4.141003, 14.904708, ..., -4.515923, -1.790529,...
E y: array([[[33.567413, -3.829913, 18.870962, ..., 3.422938, 6.174327,
E 11.656706],
E [-0.58585 , -4.141752, 14.913631, ..., -4.516505, -1.788759,...
To reproduce:
On a100 instance,
pytest tank/*roberta -k "gpu"
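Both failures above report a small mismatched fraction and a small max absolute difference, but a very large max relative difference. That pattern is what one would expect when a few elements are close to zero: np.testing.assert_allclose checks each element against `atol + rtol * abs(desired)`, so near-zero elements are effectively held to `atol` alone. The sketch below reproduces that criterion in plain Python. The first pair is taken from the roberta-base output above; the near-zero pair is made up for illustration.

```python
def within_tolerance(actual: float, desired: float, rtol: float, atol: float) -> bool:
    """Element-wise criterion applied by np.testing.assert_allclose."""
    return abs(actual - desired) <= atol + rtol * abs(desired)


def relative_difference(actual: float, desired: float) -> float:
    """Relative difference as reported in the assertion message."""
    return abs(actual - desired) / abs(desired)


rtol, atol = 1e-01, 1e-02

# First element pair from the roberta-base log above: large magnitude,
# so the rtol term dominates and the element passes easily.
print(within_tolerance(33.55235, 33.567413, rtol, atol))

# Illustrative near-zero pair (not from the log): the absolute error is
# similar in size, but the allowed error collapses to roughly atol.
print(within_tolerance(0.03, 1e-5, rtol, atol))
print(relative_difference(0.03, 1e-5))
```

If the mismatches do sit on near-zero elements, raising `atol` slightly would address them without loosening `rtol` for the rest of the tensor.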
Our SHARK model tests (all gpu cases) do not free some (maybe all) allocated CUDA memory after test execution is completed.
ERROR root:system_api.py:88 Could not create default driver device cuda
Traceback (most recent call last):
File "/data/anush/actions-runner/_work/SHARK/SHARK/shark.venv/lib/python3.10/site-packages/iree/runtime/system_api.py", line 86, in _create_default_iree_driver
device = driver.create_default_device()
RuntimeError: Error creating default device: iree/runtime/src/iree/hal/drivers/cuda/cuda_device.c:146: INTERNAL; CUDA driver error 'CUDA_ERROR_OUT_OF_MEMORY' (2): out of memory
To reproduce:
pytest tank -k "gpu"
(Run watch nvidia-smi concurrently to observe in real time.)
I just made a fresh Python venv and followed the readme instructions to run resnet50_script.py.
curl -O https://raw.githubusercontent.com/nod-ai/SHARK/main/shark/examples/shark_inference/resnet50_script.py
#Install deps for test script
pip install pillow requests tqdm torch --extra-index-url https://download.pytorch.org/whl/nightly/cpu
python ./resnet50_script.py --device="cpu" #use cuda or vulkan or metal
I got this error:
Traceback (most recent call last):
File "./resnet50_script.py", line 7, in <module>
from shark.shark_inference import SharkInference
File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/shark/shark_inference.py", line 12, in <module>
from shark.torch_mlir_utils import get_torch_mlir_module, run_on_refbackend
File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/shark/torch_mlir_utils.py", line 22, in <module>
from torch_mlir.dialects.torch.importer.jit_ir import (
File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/torch_mlir/__init__.py", line 13, in <module>
from torch_mlir.dialects.torch.importer.jit_ir import ClassAnnotator, ModuleBuilder
File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/torch_mlir/dialects/torch/importer/jit_ir/__init__.py", line 14, in <module>
from ....._mlir_libs._jit_ir_importer import *
ImportError: /home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/torch_mlir/_mlir_libs/_jit_ir_importer.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN3c1022getCustomClassTypeImplERKSt10type_index
With iree-org/iree#9975 and other upcoming changes, we'll be looking to enable TensorCore on more kernels for performance. This may change results in some tests enough to fail assertions checking for correctness.
EX: distilbert tf
========================================================================== FAILURES ===========================================================================
_________________________________________________________ DistilBertModuleTest.test_module_static_gpu _________________________________________________________
self = <distilbert-base-uncased_tf_test.DistilBertModuleTest testMethod=test_module_static_gpu>
@pytest.mark.skipif(
check_device_drivers("gpu"), reason=device_driver_info("gpu")
)
def test_module_static_gpu(self):
dynamic = False
device = "gpu"
> self.module_tester.create_and_check_module(dynamic, device)
tank/distilbert-base-uncased_tf/distilbert-base-uncased_tf_test.py:48:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <distilbert-base-uncased_tf_test.DistilBertModuleTester object at 0x7fcf101fb430>, dynamic = False, device = 'gpu'
def create_and_check_module(self, dynamic, device):
model, func_name, inputs, golden_out = download_tf_model(
"distilbert-base-uncased"
)
shark_module = SharkInference(
model, func_name, device=device, mlir_dialect="mhlo"
)
shark_module.compile()
result = shark_module.forward(inputs)
> np.testing.assert_allclose(golden_out, result, rtol=1e-02, atol=1e-03)
E AssertionError:
E Not equal to tolerance rtol=0.01, atol=0.001
E
E Mismatched elements: 4292 / 488352 (0.879%)
E Max absolute difference: 0.02955437
E Max relative difference: 48.456425
E x: array([[[ -6.442754, -6.393649, -6.419188, ..., -5.638614,
E -5.491579, -3.414548],
E [ -7.036943, -6.988676, -7.100483, ..., -6.865986,...
E y: array([[[ -6.442857, -6.394039, -6.419235, ..., -5.639162,
E -5.492108, -3.414864],
E [ -7.039788, -6.991871, -7.102982, ..., -6.868385,...
tank/distilbert-base-uncased_tf/distilbert-base-uncased_tf_test.py:28: AssertionError
=================================================================== short test summary info ===================================================================
FAILED tank/distilbert-base-uncased_tf/distilbert-base-uncased_tf_test.py::DistilBertModuleTest::test_module_static_gpu - AssertionError:
===================================================================== 1 failed in 45.37s ======================================================================
Looks like the max absolute difference is 0.02955437, just above the 0.01 tolerance. This and other tolerances may need to be updated.
Backend selection is currently done with a string. That works for now, but it can be unclear which backends are valid, and it may produce bugs later on. For example, a typo in the string can flow through the compile phase and compile "something", only to produce an error later that is hard to debug if no one notices the typo.
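One possible shape for this, sketched under assumptions: a string-valued enum keeps existing call sites working while rejecting typos up front. The `Device` class, its members, and `parse_device` are hypothetical names; the real list of valid backends may differ.

```python
from enum import Enum


class Device(str, Enum):
    """Hypothetical set of backends; the real list may differ."""

    CPU = "cpu"
    GPU = "gpu"
    VULKAN = "vulkan"


def parse_device(name: str) -> Device:
    """Convert a user-supplied string to a Device, failing fast on typos."""
    try:
        return Device(name.strip().lower())
    except ValueError:
        valid = ", ".join(d.value for d in Device)
        raise ValueError(
            f"unknown device {name!r}; expected one of: {valid}"
        ) from None


print(parse_device("vulkan"))
```

Because `Device` subclasses `str`, a member still compares equal to the plain string (`Device.CPU == "cpu"`), so code that checks the string value would not need to change at once.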
Importer tools other than the torch-mlir tools need to be marked so that pytest can run in venvs that don't have IMPORTER=1.
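A minimal sketch of one way to mark such tests, assuming the importer-only dependency can be detected by checking whether its package is importable. `has_module` and `requires_tf_importer` are hypothetical names, and "tensorflow" stands in for whichever importer package a test needs. The standard-library `unittest.skipUnless` decorator is used here because pytest also honors it on unittest-style test classes.

```python
import importlib.util
import unittest


def has_module(name: str) -> bool:
    """Return True if `name` can be imported in the current environment."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


# Hypothetical marker; "tensorflow" stands in for any importer-only dependency.
requires_tf_importer = unittest.skipUnless(
    has_module("tensorflow"),
    "needs TensorFlow importer tools (venv set up with IMPORTER=1)",
)


class ExampleModuleTest(unittest.TestCase):
    @requires_tf_importer
    def test_module_static_cpu(self):
        import tensorflow  # noqa: F401  (only reached when installed)

        self.assertTrue(True)
```

The importer package is imported inside the test body rather than at module level, so collection does not fail in a venv that lacks it.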
Error output:
error: failed to legalize operation 'torch.aten.view' that was explicitly marked illegal
note: see current operation: %416 = "torch.aten.view"(%414, %415) : (!torch.vtensor<[?,?,768],f32>, !torch.list<int>) -> !torch.vtensor<[?,?,12,64],f32>
Traceback (most recent call last):
File "/home/ean/SHARK/generate_sharktank.py", line 180, in <module>
save_torch_model(args.torch_model_csv)
File "/home/ean/SHARK/generate_sharktank.py", line 68, in save_torch_model
mlir_importer.import_debug(
File "/home/ean/SHARK/shark/shark_importer.py", line 163, in import_debug
imported_mlir = self.import_mlir(
File "/home/ean/SHARK/shark/shark_importer.py", line 109, in import_mlir
return self._torch_mlir(is_dynamic, tracing_required), func_name
File "/home/ean/SHARK/shark/shark_importer.py", line 74, in _torch_mlir
return get_torch_mlir_module(
File "/home/ean/SHARK/shark/torch_mlir_utils.py", line 150, in get_torch_mlir_module
pm.run(mb.module)
RuntimeError: Failure while executing pass pipeline.
Reproduce:
Add
distilbert-base-uncased,True,hf
to tank/pytorch/torch_model_list.csv, then run
python generate_sharktank.py
Upstream issue is here: llvm/torch-mlir#853
Workaround:
# Replace shark_venv with whatever your venv is
cd shark_venv/lib/python3.10/site-packages/torch_mlir/.dylibs
rm *.dylib
ln -s ../../torch/lib/libc10.dylib
ln -s ../../torch/lib/libshm.dylib
ln -s ../../torch/lib/libtorch.dylib
ln -s ../../torch/lib/libtorch_cpu.dylib
ln -s ../../torch/lib/libtorch_python.dylib
--iree-cuda-llvm-target-arch=sm_80 should be set with something like:
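The original snippet is not preserved here. As a stand-in sketch, the flag value could be derived from the GPU's compute capability instead of being hard-coded. `cuda_target_arch_flag` is a hypothetical helper, and how the compute capability string is obtained (for example from the CUDA runtime or a driver tool) is left to the caller.

```python
def cuda_target_arch_flag(compute_capability: str) -> str:
    """Map a compute capability such as "8.0" to the IREE CUDA arch flag."""
    major, _, minor = compute_capability.strip().partition(".")
    if not (major.isdigit() and minor.isdigit()):
        raise ValueError(f"unexpected compute capability: {compute_capability!r}")
    return f"--iree-cuda-llvm-target-arch=sm_{major}{minor}"


# An A100 reports compute capability 8.0.
print(cuda_target_arch_flag("8.0"))
```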
To reproduce all failing cases (requires #199 ):
pytest tank/tf/hf_masked_lm/funnel-transformer_tf_test.py -k "not cpu"
There's a numerics issue on gpu cases (perhaps IREE upstream issue?)
Vulkan cases fail during iree-compile.
All cases fail with the following error on TensorFlow longformer:
E <unknown>:0: error: The following illegal operations still remain:
E tf.BatchMatMulV2 (count: 24)
E tf.StridedSlice (count: 24)
E tf.Tile (count: 12)
E tf.TensorScatterAdd (count: 36)
E tf.Where (count: 3)
To reproduce:
pytest tank/tf/hf_masked_lm/longformer-base-4096_tf_test.py -k "static_cpu"
(errors are the same for all cases, so one test case should be sufficient for repro purposes.)
To reproduce:
pytest -s tank/tf/hf_masked_lm/tiny-random-flaubert_tf_test.py -k "vulkan"
SHARK results fail to validate against TF golden values:
E assert True == False
E + where False = compare_tensors_tf(<tf.Tensor: shape=(1, 16, 68729), dtype=float32, numpy=\narray([[[ 0.53806955, 0.14671442, 0. , ..., -0.2818507 ,\n 0.08806332, 0.14761735],\n [-0.00822675, -0.0385315 , 0. , ..., 0.00425125,\n 0.06710303, -0.04765199],\n [-0.1951161 , -0.1519102 , 0. , ..., 0.1955705 ,\n 0.13747491, -0.2091976 ],\n ...,\n [ 0. , 0. , 0. , ..., 0. ,\n 0. , 0. ],\n [ 0. , 0. , 0. , ..., 0. ,\n 0. , 0. ],\n [ 0. , 0. , 0. , ..., 0. ,\n 0. , 0. ]]], dtype=float32)>, array([[[ 0.5380049 , 0.13949418, 0. , ..., -0.28169703,\n 0.08681311, 0.14958172],\n [-0.00976601, -0.03920554, 0. , ..., 0.00616576,\n 0.06795865, -0.0488795 ],\n [-0.1871761 , -0.15056488, 0. , ..., 0.19165687,\n 0.13996662, -0.20523356],\n ...,\n [ 0. , 0. , 0. , ..., 0. ,\n 0. , 0. ],\n [ 0. , 0. , 0. , ..., 0. ,\n 0. , 0. ],\n [ 0. , 0. , 0. , ..., 0. ,\n 0. , 0. ]]], dtype=float32))
tank/tf/hf_masked_lm/tiny-random-flaubert_tf_test.py:86: AssertionError
Use TempFileSaver to save each pytest output artifact and recreate it if the test fails.
Looks like you can also write your own TempFileSaver to do whatever you want (see the section just below that), like in this test: https://github.com/google/iree/blob/427a94a09be70631c5d9f89d12616bb7f1954257/compiler/src/iree/compiler/API/python/test/tools/compiler_core_test.py#L176-L199
https://discord.com/channels/689900678990135345/689900680009482386/985726147448950805
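The behavior asked for above (keep artifacts only when a test fails) can be sketched with the standard library alone. This is not IREE's TempFileSaver, just an illustration of the idea; `keep_artifacts_on_failure` is a hypothetical name.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def keep_artifacts_on_failure(prefix: str = "shark_test_"):
    """Yield a scratch directory; delete it on success, keep it on failure."""
    path = Path(tempfile.mkdtemp(prefix=prefix))
    try:
        yield path
    except BaseException:
        # Leave the directory in place so the failing inputs can be inspected.
        print(f"test failed; artifacts kept in {path}")
        raise
    else:
        shutil.rmtree(path, ignore_errors=True)


# Example: a passing block leaves nothing behind.
with keep_artifacts_on_failure() as scratch:
    (scratch / "module.mlir").write_text("module {}")
```

A test would write its intermediate files into the yielded directory, so a failing run leaves them on disk for reproduction.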
Currently, two cases of GPU memory management issues appear when running pytests for Tensorflow masked_lm models.
When running gpu tests for albert_base_v2, the static_gpu case (currently included in this issue) passes if tolerance values for compare_tensors_tf are increased to rtol=1e-02 and atol=1e-01. All of the tests mentioned in that issue pass with the increased tolerances. This isn't really acceptable accuracy, but we are waiting on the IREE team, so we can work around it for now to get memory management squared away.
TF albert on CPU passes for dynamic and static cases only if the tests are run individually. Tensorflow's allocated memory in CUDA does not free up for the second GPU test whether the first passes or not.
If we try bert_static_gpu, however, cuda runs out of memory even when the test is run by itself -- TF allocates ~39GB of gpu memory for the model at the beginning of the test and we run into cuda OOM when shark_module.compile() is called (hal allocation in IREE).
All of the TF model tests in tank/tf/hf_masked_lm/ share this issue.
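One thing that may be worth trying: by default TensorFlow reserves nearly all memory on each visible GPU when it first initializes the device, which would be consistent with the ~39GB allocation described above. Enabling memory growth asks it to allocate on demand instead. The sketch below uses the TensorFlow calls `tf.config.list_physical_devices` and `tf.config.experimental.set_memory_growth`; the wrapper name `enable_tf_memory_growth` is hypothetical, and whether this resolves the OOM in these tests is untested.

```python
def enable_tf_memory_growth() -> int:
    """Ask TensorFlow to allocate GPU memory on demand.

    Returns the number of GPUs configured, or -1 if TensorFlow is not
    installed. Must be called before TensorFlow initializes the GPUs.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return -1
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)
```

Calling this at the top of a test module, before any model is loaded, would leave more memory for the later shark_module.compile() step. It does not by itself release memory that TensorFlow has already allocated in an earlier test in the same process.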