nod-ai / shark Goto Github PK

SHARK - High Performance Machine Learning Distribution

License: Apache License 2.0

Python 72.32% Shell 0.35% CMake 0.47% C++ 22.29% Jupyter Notebook 2.48% C 1.07% MLIR 0.01% PowerShell 0.17% CSS 0.73% JavaScript 0.11%

amd apple-silicon deep-learning machine-learning mlir nvidia pytorch

shark's People

Contributors

Stargazers

Watchers

Forkers

pashu123 ramiro050 nirvedhmeshram shukla-gaurav raikonenfnu vivekkhandelwal1 cinj eliasj42 yzhang93 dan-garvey amoslewis sogartar zhaohb gpetters-amd tigerneil vidsinghal kooljblack phaneeshb qedawkins danielschulz erman-gurses codes1gn iannod jakopina abhishek-varma ianferreira joker-eph mariecwhite harishanand95 zeta1999 aldesilv cpietsch wildjerry stellaraccident claforte jinchen62 one-lithe-rune gtd-carthage zenafey deinferno nithinsubbiah fokin33 calcifer11 fraserhum erkinalp hbcbh1999 kilgorio ukaserge dymil 5l1v3r1 evanguansf algisjab averad seerge risharde techthiyanes wuhuqifei678 m68k-fr yoshyteru fetpo mengchengxxx marormur schwabischesbauernbrot diazid kencheren cstueckrath naruto-raj stevenet1980 zippynetworks phoenix-meadowlark kyozomi webzone minisparrow cmlibo nuclei-foundation raktorx ianscrivener vguerra leandrossce mycpuorg kp-forks 00mjk godot73 rprasad2 tsabbir96 harshita9 karan-kankariya holmes-jack assistaplus zzl133 suryajasper ranvirsv ayaanshah2204 mcx monorimet cqtqlyst njsharpe turical aleans-ru yota79

shark's Issues

GPT-2 torch to tosa

The python script to run:

gpt2tosa.py

Output tosa file:
https://storage.googleapis.com/shark_tank/chi-nod/gpt2/gpt2_torch_tosa.mlir
Tosa file after elide big attributes:
https://storage.googleapis.com/shark_tank/chi-nod/gpt2/gpt2_torch_tosa_elide.mlir

Add hyperlinks to README's Model list

-Let's make it easier for users to browse and find models in SHARK, by putting up hyperlink in the text part of the model's name (i.e BERT, Albert, Alexnet, etc) which links to their respective tank model directory (e.g bert link, albert link, alexnet link).

TF tapas-base import requirements aren't met with IMPORTER=1 ./setup_venv.sh

To reproduce:

pytest tank/tf/hf_masked_lm/tapas-base_tf_test.py -k "static_cpu"

Error output:

E           ImportError: 
E           TFTapasMainLayer requires the tensorflow_probability library but it was not found in your environment. You can install it with pip as
E           explained here: https://github.com/tensorflow/probability.

I wasn't able to get this to work by pip installing tfp-nightly -- if we can get it to work let's make it run out of the box for IMPORTER=1.

GPU benchmarks for PyTorch tests benchmark on CPU instead.

Currently, there is no support for benchmarking pytorch models on CUDA via pytest.

SHARK/setup_venv.sh should be updated with a GPU_BENCHMARKS flag to uninstall the CPU version of Pytorch Nightly and replace with CUDA version.

SHARK/shark/shark_benchmark_runner.py::SharkBenchmarkRunner has a torch_benchmark method that should be updated to run with GPU/CUDA for gpu pytest cases.

Add --iree-flow-demote-i64-to-i32 and --iree-flow-demote-f64-to-f32

Add --iree-flow-demote-i64-to-i32 (default false) and --iree-flow-demote-f64-to-f32 to compile MiniLM via Torch-mlir and run on IREE.

Add RNNT model to Shark Tank

Please add RNNT (speech recognition) to the Shark Tank: https://github.com/mlcommons/inference/tree/master/speech_recognition/rnnt

Checkpoint model

Can we get a save method to checkpoint/save the model/save the vmfb S.T we do not need to recompile from scratch every time we run the script.

Improvements to pytest --benchmark option.

Several features/improvements to SHARK's pytest --benchmark option are tracked in this issue:

Enhancements/Fixes to HF Benchmark Runtime

HF Benchmarker is a module within SHARK that enable easy testing of HF models with ONNX, Torch, TF, and SHARK-RT of course. this work is based of SharkBenchmarker for MLIR part and Microsoft Transformer Benchmark.
EDIT: nightly ORT did not fix GPU nor did it fix TF.

Some issues/Enhancements that need fixing

1. Integrate running of TF in HF-Benchmarker.

Has some Runtime issues wrt RuntimeError: Intra op parallelism cannot be modified after initialization. and RuntimeError: Visible devices cannot be modified after being initialized. See https://github.com/microsoft/onnxruntime/issues/ 11751 for more details.

2. Fix up HF Benchmark Runtime with GPU

Currently the only supported device is CPU, since we will get OOM with GPU. The problem lies within importing of onnxruntime causes to load 39GB of data into the GPU, this leaves very little space for us to load our model and even run anything.

"is_zero" is undefined running resnet50 script.

To reproduce...

cloned the repo and tried running the examples both resnet and minilm.

I keep getting

RuntimeError: required keyword attribute 'is_zero' is undefined

seems to have something to do with ModuleBuilder -> mb.import_module(module._c, class_annotator)
env
Using the Apple Silicon M1 Snapshot version of torch-mlir.
Running on M1 macbook, python 3.9

attached screenshot of both resnet50_script and minilm

Disclaimer: new to torch-mlir

Locally generated shark_tank artifacts are not usable for pytests.

In Shark Downloader, we check if local hash for shark_tank artifacts matches upstream hash, and if it doesn't, all artifacts are downloaded from gs://shark_tank for the latest upstream hash, replacing local files.
This becomes a problem if one uses generate_sharktank.py to populate local shark_tank and run tests, as the upstream artifacts are used instead of the local artifacts (in my case, with significant changes).

I think this is of critical importance to our correspondence with the IREE team as well as our SHARK team's development process.
We have a few options to handle this:

add a pytest option to use local files (avoid SHARK downloader)
have shark downloader look in SHARK/gen_shark_tank/ before doing anything with google storage -- if locally generated artifacts are present, don't touch gs://shark_tank and simply use local artifacts.

to reproduce:

python generate_sharktank.py
pytest -s tank/MiniLM-L12-H384-uncased/

it will be evident that the artifacts are replaced by contents of gs://shark_tank/microsoft_MiniLM-L12-H384-uncased_tf/

add option to --save_mlir to pytest runs

Add support to save mlir files when running the tests

(new_dylib_venv) 139 anush@nod-shared-a100-3:~/github/shark$ IREE_SAVE_TEMPS=iree_temps_bert_dynamic  pytest tank/pytorch/bert_test.py::BertModuleTest::test_module_dynamic_cpu --save_mlir
ERROR: usage: pytest [options] [file_or_dir] [file_or_dir] [...]
pytest: error: unrecognized arguments: --save_mlir
  inifile: /home/anush/github/shark/pytest.ini
  rootdir: /home/anush/github/shark```

ci - improvment to-do-list

because of hash checking local artifacts of the nightly build aren't being tested, the existing latest will instead, this has the downstream effect of making impossible to automatically pass checks when a change to the tank is made.
add a ci that tests the generated pip packages in an end-user style

fix xlm-roberta lowering

pytest benchmarks/tests/test_benchmark.py::test_bench_xlm_roberta[False-cpu]```

fails with:
====================================================== short test summary info =======================================================FAILED benchmarks/tests/test_benchmark.py::test_bench_xlm_roberta[False-cpu] - OSError: Can't load tokenizer for 'xlm-roberta-base'...========================================================= 1 failed in 12.31s =========================================================

CUDA needs to default to sm_80 and use devicearrays

We need model the CUDA backend in SHARK to be similar to:

https://github.com/nod-ai/transformer-benchmarks/blob/435984a420a2f285f717aa4752c14c0cabfd8c96/benchmark.py#L397-L437


    if use_gpu:
        backend = "cuda"
        backend_config = "cuda"
        args = ["--iree-cuda-llvm-target-arch=sm_80", "--iree-hal-cuda-disable-loop-nounroll-wa"]
        ireert.flags.FUNCTION_INPUT_VALIDATION = False
        ireert.flags.parse_flags("--cuda_allow_inline_execution")

...

    # Setting up input on host and moving to device.
    host_inputs =[encoded_input["input_ids"], encoded_input["attention_mask"], encoded_input["token_type_ids"]]
    if use_gpu:
        device_inputs = [ireert.asdevicearray(config.device, a) for a in host_inputs]
    else:
        device_inputs = host_inputs

Add BertForMaskedLM to Shark Tank

https://huggingface.co/docs/transformers/v4.21.1/en/model_doc/bert#transformers.BertForMaskedLM

This model is currently 17x slower than Torch on A100 GPU and we would like to track this.

GPT-2 torch to linalg

HF transformers 4.19.x is broken

(new_dylib_venv) anush@nod-shared-a100-3:~/github/shark$ pytest tank/pytorch/tests/resnet101_test.py::Resnet101ModuleTest::test_module_static_cpu
================================================================================================= test session starts ==================================================================================================
platform linux -- Python 3.10.4, pytest-7.1.2, pluggy-1.0.0 -- /home/anush/github/shark/new_dylib_venv/bin/python3
cachedir: .pytest_cache
rootdir: /home/anush/github/shark, configfile: pytest.ini
plugins: forked-1.4.0, xdist-2.5.0, typeguard-2.13.3
collecting ... Fatal Python error: Aborted

Current thread 0x00007efd4103a1c0 (most recent call first):
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/framework/constant_op.py", line 304 in _constant_eager_impl
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/framework/constant_op.py", line 279 in _constant_impl
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/framework/constant_op.py", line 267 in constant
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/framework/constant_op.py", line 343 in _constant_tensor_conversion_function
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/framework/ops.py", line 1623 in convert_to_tensor
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/profiler/trace.py", line 183 in wrapped
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 264 in args_to_matching_eager
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/ops/gen_stateful_random_ops.py", line 77 in non_deterministic_ints_eager_fallback
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/ops/gen_stateful_random_ops.py", line 50 in non_deterministic_ints
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/ops/stateful_random_ops.py", line 80 in non_deterministic_ints
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/ops/stateful_random_ops.py", line 381 in from_non_deterministic_state
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/transformers/generation_tf_utils.py", line 349 in TFGenerationMixin
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/transformers/generation_tf_utils.py", line 344 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 883 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/transformers/modeling_tf_utils.py", line 41 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 883 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/transformers/models/bert/modeling_tf_bert.py", line 38 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 883 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "<frozen importlib._bootstrap>", line 1050 in _gcd_import
  File "/usr/lib/python3.10/importlib/__init__.py", line 126 in import_module
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 872 in _get_module
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 862 in __getattr__
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 863 in __getattr__
  File "<frozen importlib._bootstrap>", line 1075 in _handle_fromlist
  File "/home/anush/github/shark/tank/pytorch/tests/test_utils.py", line 7 in <module>
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/assertion/rewrite.py", line 168 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "/home/anush/github/shark/tank/pytorch/tests/resnet101_test.py", line 3 in <module>
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/assertion/rewrite.py", line 168 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "<frozen importlib._bootstrap>", line 1050 in _gcd_import
  File "/usr/lib/python3.10/importlib/__init__.py", line 126 in import_module
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/pathlib.py", line 533 in import_path
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/python.py", line 608 in _importtestmodule
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/python.py", line 519 in _getobj
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/python.py", line 301 in obj
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/python.py", line 536 in _inject_setup_module_fixture
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/python.py", line 522 in collect
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 369 in <lambda>
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 338 in from_call
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 369 in pytest_make_collect_report
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 537 in collect_one_node
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/main.py", line 768 in collect
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 369 in <lambda>
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 338 in from_call
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 369 in pytest_make_collect_report
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 537 in collect_one_node
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/main.py", line 643 in perform_collect
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/main.py", line 332 in pytest_collection
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/main.py", line 321 in _main
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/main.py", line 268 in wrap_session
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/main.py", line 315 in pytest_cmdline_main
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/config/__init__.py", line 164 in main
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/config/__init__.py", line 187 in console_main
  File "/home/anush/github/shark/new_dylib_venv/bin/pytest", line 8 in <module>

Extension modules: torch._C, torch._C._fft, torch._C._linalg, torch._C._nn, torch._C._sparse, torch._C._special, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, PIL._imaging, PIL._imagingft, google.protobuf.pyext._message, tensorflow.python.framework.fast_tensor_util, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.h5r, h5py.utils, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5t, h5py._conv, h5py.h5z, h5py._proxy, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5o, h5py.h5l, h5py._selector, scipy._lib._ccallback_c, scipy.sparse._sparsetools, scipy.sparse._csparsetools, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.strptime, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.tslib, pandas._libs.lib, pandas._libs.hashing, pandas._libs.ops, pandas._libs.arrays, pandas._libs.index, pandas._libs.join, pandas._libs.sparse, pandas._libs.reduction, pandas._libs.indexing, pandas._libs.internals, pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.testing, pandas._libs.parsers, pandas._libs.json, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg._cythonized_array_utils, scipy.linalg._flinalg, scipy.linalg._solve_toeplitz, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg.cython_lapack, scipy.linalg._decomp_update, scipy.ndimage._nd_image, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, _ni_label, scipy.ndimage._ni_label, sentencepiece._sentencepiece (total: 116)
Aborted (core dumped)```

Pinning to 4.18 as a workaround

TensorFlow MiniLM segfaults for GPU cases on A100.

To reproduce:

pytest tank/tf/MiniLM-L12-H384-uncased_tf_test.py -k "gpu"

iree-compile fails on some vision transformers with GPU

FAILED tank/facebook_convnext-tiny-224_tf/facebook_convnext-tiny-224_tf_test.py::ConvNextTinyModuleTest::test_module_dynamic_gpu
FAILED tank/facebook_convnext-tiny-224_tf/facebook_convnext-tiny-224_tf_test.py::ConvNextTinyModuleTest::test_module_static_gpu
FAILED tank/facebook_deit-small-distilled-patch16-224_torch/facebook_deit-small-distilled-patch16-224_torch_test.py::DeitModuleTest::test_module_static_gpu
FAILED tank/google_vit-base-patch16-224_tf/google_vit-base-patch16-224_tf_test.py::VitBaseModuleTest::test_module_dynamic_gpu
FAILED tank/google_vit-base-patch16-224_tf/google_vit-base-patch16-224_tf_test.py::VitBaseModuleTest::test_module_static_gpu
FAILED tank/google_vit-base-patch16-224_torch/google_vit-base-patch16-224_torch_test.py::VitBaseModuleTest::test_module_static_gpu
FAILED tank/nvidia_mit-b0_torch/nvidia_mit-b0_torch_test.py::MitModuleTest::test_module_static_gpu

Error Log (common for cases shown above):

E         iree.compiler.tools.binaries.CompilerToolError: Error invoking IREE compiler tool iree-compile
E         Diagnostics:
E         
E         
E         Invoked with:
E          iree-compile /data/anush/actions-runner/_work/SHARK/SHARK/shark.venv/lib/python3.10/site-packages/iree/compiler/tools/../_mlir_libs/iree-compile - --iree-input-type=none --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=cuda --iree-llvm-embedded-linker-path=/data/anush/actions-runner/_work/SHARK/SHARK/shark.venv/lib/python3.10/site-packages/iree/compiler/tools/../_mlir_libs/iree-lld --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --iree-llvm-target-cpu-features=host --iree-hal-cuda-disable-loop-nounroll-wa --iree-stream-resource-index-bits=64 --iree-vm-target-index-bits=64
E         
E         Need more information? Set IREE_SAVE_TEMPS=/some/dir in your environment to save all artifacts and reproducers.

Alexnet failures on AMD

Alexnet seems to fail static cases on AMD for some reason - but seems like something in the test script than underlying infra


anush@alderlake ~/github/shark
 % pytest tank/alexnet_torch/alexnet_torch_test.py::AlexnetModuleTest::test_module_static_vulkan
================================================================================= test session starts =================================================================================
platform linux -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0 -- /home/anush/github/shark/shark.venv/bin/python3
cachedir: .pytest_cache
rootdir: /home/anush/github/shark, configfile: pytest.ini
plugins: forked-1.4.0, xdist-2.5.0
collected 1 item                                                                                                                                                                      

tank/alexnet_torch/alexnet_torch_test.py::AlexnetModuleTest::test_module_static_vulkan FAILED                                                                                   [100%]

====================================================================================== FAILURES =======================================================================================
_____________________________________________________________________ AlexnetModuleTest.test_module_static_vulkan _____________________________________________________________________

a = (<alexnet_torch_test.AlexnetModuleTest testMethod=test_module_static_vulkan>,)

    @wraps(func)
    def standalone_func(*a):
>       return func(*(a + p.args), **p.kwargs)

shark.venv/lib/python3.10/site-packages/parameterized/parameterized.py:533: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tank/alexnet_torch/alexnet_torch_test.py:78: in test_module
    self.module_tester.create_and_check_module(dynamic, device)
tank/alexnet_torch/alexnet_torch_test.py:43: in create_and_check_module
    shark_module.compile()
shark/shark_inference.py:87: in compile
    self.shark_runner = SharkRunner(
shark/shark_runner.py:81: in __init__
    ) = get_iree_compiled_module(
shark/iree_utils/compile_utils.py:122: in get_iree_compiled_module
    return get_iree_module(flatbuffer_blob, device, func_name)
shark/iree_utils/compile_utils.py:106: in get_iree_module
    ctx.add_vm_module(vm_module)
shark.venv/lib/python3.10/site-packages/iree/runtime/system_api.py:255: in add_vm_module
    self.add_vm_modules((vm_module,))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <iree.runtime.system_api.SystemContext object at 0x7f62b0332e60>, vm_modules = (<VmModule module : [forward, __init]>,)

    def add_vm_modules(self, vm_modules):
      assert self._is_dynamic, "Cannot 'add_module' on a static context"
      for m in vm_modules:
        if m.name in self._bound_modules:
          raise ValueError(f"Attempt to register duplicate VmModule: '{m.name}'")
        bound_module = BoundModule(self, m)
        self._bound_modules[m.name] = bound_module
        if self._tracer:
          self._tracer.add_module(bound_module.traced_module)
>     self._vm_context.register_modules(vm_modules)
E     RuntimeError: Error registering modules: iree/runtime/src/iree/hal/drivers/vulkan/native_executable.cc:127: UNAVAILABLE; VK_ERROR_INITIALIZATION_FAILED; while invoking native function hal.executable.create; while calling import; 
E     [ 1]   native hal.executable.create:0 -
E     [ 0] bytecode module.__init:1788 <stdin>:134:11
E           at <stdin>:9:3

shark.venv/lib/python3.10/site-packages/iree/runtime/system_api.py:252: RuntimeError
-------------------------------------------------------------------------------- Captured stdout call ---------------------------------------------------------------------------------
Found Radeon XT Device. Using rdna2-unknown-linux
The models are present in the /home/anush/.local/shark_tank/. If you want a fresh 
                download, consider deleting the directory.
Found Radeon XT Device. Using rdna2-unknown-linux
-------------------------------------------------------------------------------- Captured stderr call ---------------------------------------------------------------------------------
Copying gs://shark_tank/latest/alexnet_torch/hash.npy...
/ [1 files][  640.0 B/  640.0 B]                                                
Operation completed over 1 objects/640.0 B.                                      
'DISPLAY' environment variable not set... skipping surface info
=============================================================================== short test summary info ===============================================================================
FAILED tank/alexnet_torch/alexnet_torch_test.py::AlexnetModuleTest::test_module_static_vulkan - RuntimeError: Error registering modules: iree/runtime/src/iree/hal/drivers/vulkan/na...
================================================================================= 1 failed in 12.42s ==================================================================================

Incompatible version range of IREE dependency

The nodai-shark pip package specifies version dependencies to iree-runtime and iree-compiler that are too old.

$ pipdeptree -p nodai-shark
nodai-SHARK==20220810.173
  - iree-compiler [required: >=20220427.13, installed: 20220714.204]
    - numpy [required: Any, installed: 1.22.4]
    - PyYAML [required: Any, installed: 6.0]
  - iree-runtime [required: >=20220427.13, installed: 20220714.204]
    - numpy [required: Any, installed: 1.22.4]
    - PyYAML [required: Any, installed: 6.0]
  - numpy [required: Any, installed: 1.22.4]
  - PyYAML [required: Any, installed: 6.0]
  - torch-mlir [required: >=20220428.420, installed: 20220606.495]
    - numpy [required: Any, installed: 1.22.4]
    - torch [required: ==1.13.0.dev20220606+cpu, installed: 1.13.0.dev20220606+cpu]
      - typing-extensions [required: Any, installed: 4.2.0]

When running with

iree-compiler      20220604.24
iree-runtime       20220604.24

I get this error

$ python ./resnet50_script.py --device="cpu"
/home/petkantchin/.local/shark_tank/
load image from https://upload.wikimedia.org/wikipedia/commons/2/26/YellowLabradorLooking_new.jpg
Copying gs://shark_tank/274650f/resnet50_torch/function_name.npy...
Copying gs://shark_tank/274650f/resnet50_torch/golden_out.npz...                
Copying gs://shark_tank/274650f/resnet50_torch/hash.npy...                      
Copying gs://shark_tank/274650f/resnet50_torch/inputs.npz...                    
\ [4 files][593.2 KiB/593.2 KiB]                                                
==> NOTE: You are performing a sequence of gsutil operations that may
run significantly faster if you instead use gsutil -m cp ... Please
see the -m section under "gsutil help options" for further information
about when gsutil -m can be advantageous.

Copying gs://shark_tank/274650f/resnet50_torch/resnet50_dynamic_torch.mlir...
Copying gs://shark_tank/274650f/resnet50_torch/resnet50_torch.mlir...           
- [6 files][391.5 MiB/391.5 MiB]   10.7 MiB/s                                   
Operation completed over 6 objects/391.5 MiB.                                    
Target triple found:x86_64-linux-gnu
ERROR:root:Could not create driver local-task (not registered)
Traceback (most recent call last):
  File "./resnet50_script.py", line 72, in <module>
    shark_module.compile()
  File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/shark/shark_inference.py", line 87, in compile
    self.shark_runner = SharkRunner(
  File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/shark/shark_runner.py", line 81, in __init__
    ) = get_iree_compiled_module(
  File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/shark/iree_utils/compile_utils.py", line 120, in get_iree_compiled_module
    return get_iree_module(flatbuffer_blob, device, func_name)
  File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/shark/iree_utils/compile_utils.py", line 102, in get_iree_module
    config = ireert.Config(IREE_DEVICE_MAP[device])
  File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/iree/runtime/system_api.py", line 115, in __init__
    self.driver = _create_default_iree_driver(
  File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/iree/runtime/system_api.py", line 97, in _create_default_iree_driver
    raise RuntimeError(
RuntimeError: Could not create any requested driver ['local-task'] (available=['cuda', 'dylib', 'dylib-sync', 'vmvx', 'vmvx-sync', 'vulkan']) : {}

Updating IREE to 20220714.204 fixed the issue.

iree-compiler      20220714.204
iree-runtime       20220714.204

I suspect that the dependency version requirements has to be fixed. Other earlier versions may be OK as well. I have not checked.

Shark downloader has a lot of duplicate code

Title says it all

hook up tflitehub tests to pytest

will disable them for now

Add Option for setup_venv.sh to choose frontends

Many users have their favourite deep learning framework of choice and not use others. We should set the setup_venv.sh to be have an option to choose whether they are intending to use torch-frontend, tf-frontend, or both. This way users can have a leaner environment!

Torchvision Models failing for dynamic case on Vulkan backend.

Error log for resnet101 that is similar (if not identical) to the error messages produced from the dynamic vulkan case on a few of our PyTorch models: gist

This error is also encountered for the dynamic vulkan case on the following models:

alexnet_torch
mobilenet_v3_small_torch
resnet101_torch
resnet18_torch
resnet50_torch
squeezenet1_0_torch
wide_resnet50_2

These cases will be xfailed.

autogenerate supported model list

Generate a webpage that can be rendered in the main README.md from each perf-linux run

some tflite tests fail on macOS

(shark.venv) anush@MacStudio shark % pytest tank/mobilebert/mobilebert_tflite_test.py::MobilebertTfliteModuleTest::test_module_static_cpu 
========================================================================================================================== test session starts ===========================================================================================================================
platform darwin -- Python 3.10.5, pytest-7.1.2, pluggy-1.0.0 -- /Users/anush/github/shark/shark.venv/bin/python3
cachedir: .pytest_cache
rootdir: /Users/anush/github/shark, configfile: pytest.ini
plugins: xdist-2.5.0, forked-1.4.0
collected 1 item                                                                                                                                                                                                                                                         

tank/mobilebert/mobilebert_tflite_test.py::MobilebertTfliteModuleTest::test_module_static_cpu Fatal Python error: Segmentation fault

Current thread 0x0000000104f34580 (most recent call first):
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/iree/runtime/system_api.py", line 75 in _create_default_iree_driver
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/iree/runtime/system_api.py", line 115 in __init__
  File "/Users/anush/github/shark/shark/iree_utils/compile_utils.py", line 102 in get_iree_module
  File "/Users/anush/github/shark/shark/iree_utils/compile_utils.py", line 120 in get_iree_compiled_module
  File "/Users/anush/github/shark/shark/shark_runner.py", line 80 in __init__
  File "/Users/anush/github/shark/shark/shark_inference.py", line 73 in compile
  File "/Users/anush/github/shark/tank/mobilebert/mobilebert_tflite_test.py", line 111 in create_and_check_module
  File "/Users/anush/github/shark/tank/mobilebert/mobilebert_tflite_test.py", line 137 in test_module_static_cpu
  File "/opt/homebrew/Cellar/[email protected]/3.10.5/Frameworks/Python.framework/Versions/3.10/lib/python3.10/unittest/case.py", line 549 in _callTestMethod
  File "/opt/homebrew/Cellar/[email protected]/3.10.5/Frameworks/Python.framework/Versions/3.10/lib/python3.10/unittest/case.py", line 591 in run
  File "/opt/homebrew/Cellar/[email protected]/3.10.5/Frameworks/Python.framework/Versions/3.10/lib/python3.10/unittest/case.py", line 650 in __call__
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/unittest.py", line 327 in runtest
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/runner.py", line 166 in pytest_runtest_call
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/runner.py", line 259 in <lambda>
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/runner.py", line 338 in from_call
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/runner.py", line 258 in call_runtest_hook
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/runner.py", line 219 in call_and_report
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/runner.py", line 130 in runtestprotocol
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/runner.py", line 111 in pytest_runtest_protocol
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/main.py", line 347 in pytest_runtestloop
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/main.py", line 322 in _main
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/main.py", line 268 in wrap_session
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/main.py", line 315 in pytest_cmdline_main
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/config/__init__.py", line 164 in main
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/config/__init__.py", line 187 in console_main
  File "/Users/anush/github/shark/shark.venv/bin/pytest", line 8 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, yaml._yaml, tensorflow.python.framework.fast_tensor_util, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.h5r, h5py.utils, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5t, h5py._conv, h5py.h5z, h5py._proxy, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5o, h5py.h5l, h5py._selector, PIL._imaging (total: 40)
zsh: segmentation fault  pytest

Feature Request: Flag for showing result of each dispatch

It would be helpful for debugging to be able to see the result of each dispatch when running a module. IREE already does this, e.g.

$ iree-run-module --device=vulkan --entry_function=forward --function_input=1x4xf32=1.0 --module_file=model.vmfb
EXEC @forward
=== forward_dispatch_0::forward_dispatch_0_generic_3x4 inputs ===

=== forward_dispatch_0::forward_dispatch_0_generic_3x4 outputs ===
4x3xf32=[1 0 0][0 0 0][0 0 0][0 0 0]

=== forward_dispatch_1::forward_dispatch_1_matmul_1x3x4 inputs ===
1x4xf32=[1 1 1 1]
4x3xf32=[1 0 0][0 0 0][0 0 0][0 0 0]

=== forward_dispatch_1::forward_dispatch_1_matmul_1x3x4 outputs ===
1x3xf32=[1 0 0]

result[0]: hal.buffer_view
1x3xf32=[1 0 0]

This could be a flag such as

shark_module = SharkInference(
    model, func_name, device="vulkan", mlir_dialect="linalg", print_dispatches=True
)

and gives a numpy view of the dispatch results or just shows the iree output.

Documentation out of date for running tests

`pytest shark/tests/models

If on Linux for quicker results:

pytest shark/tests/models -n auto`

seems incorrect

minilm_jit example doesn't work

(shark.venv) a@debian-1:~/github/dshark$ python -m  shark.examples.minilm_jit
/home/a/github/dshark/shark.venv/lib/python3.7/site-packages/torch/nn/modules/module.py:1403: UserWarning: positional arguments and argument "destination" are deprecated. nn.Module.state_dict will not accept them in the future. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  " and ".join(warn_msg) + " are deprecated. nn.Module.state_dict will not accept them in the future. "
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at microsoft/MiniLM-L12-H384-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Target triple found:x86_64-linux-gnu
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
(shark.venv) a@debian-1:~/github/dshark$

TF deberta-base fails during iree-import-tf (illegal ops)

To reproduce:
pytest tank/tf/hf_masked_lm/roberta-base_tf_test.py

The illegal operation is tf.BatchMatMulV2.

add SHARK backend command line arg

Allows for easy switching from the command line for the included examples.

Add documentation for using SHARK APIs on Apple M1

Catch all tracker for putting together a quickstart guide.

TF roberta/XLM roberta numerics issues on A100 if num_iterations >= 100

XLM-roberta assert failure:

>       np.testing.assert_allclose(golden_out, result, rtol=1e-01, atol=1e-02)
E       AssertionError: 
E       Not equal to tolerance rtol=0.1, atol=0.01
E       
E       Mismatched elements: 5505 / 4000032 (0.138%)
E       Max absolute difference: 0.09074688
E       Max relative difference: 3171.7234
E        x: array([[[ 2.683771,  0.183121, 10.453473, ...,  6.315439,  2.047505,
E                 3.32532 ],
E               [-0.482143,  0.061366,  9.494564, ...,  6.593861,  1.620899,...
E        y: array([[[ 2.671124,  0.182537, 10.456981, ...,  6.322483,  2.0[515](https://github.com/nod-ai/SHARK/runs/7868468050?check_suite_focus=true#step:9:516)46,
E                 3.322179],
E               [-0.481575,  0.061454,  9.495419, ...,  6.59101 ,  1.619549,...

roberta-base-tf assert failure:

>       np.testing.assert_allclose(golden_out, result, rtol=1e-01, atol=1e-02)
E       AssertionError: 
E       Not equal to tolerance rtol=0.1, atol=0.01
E       
E       Mismatched elements: 453 / 804240 (0.0563%)
E       Max absolute difference: 0.04533577
E       Max relative difference: 763.70135
E        x: array([[[33.55235 , -3.827327, 18.863625, ...,  3.420343,  6.171632,
E                11.648125],
E               [-0.598835, -4.141003, 14.904708, ..., -4.515923, -1.790529,...
E        y: array([[[33.567413, -3.829913, 18.870962, ...,  3.422938,  6.174327,
E                11.656706],
E               [-0.58585 , -4.141752, 14.913631, ..., -4.516505, -1.788759,...

To reproduce:

On a100 instance,

remove xfail for gpu case in tank/roberta-base_tf/roberta-base_tf_test.py
remove xfail for gpu case in tank/xlm-roberta-base_tf/xlm-roberta-base_tf.py
run: pytest tank/*roberta -k "gpu"

CUDA memory is not released after individual test cases.

Our SHARK model tests (all gpu cases) do not free some (maybe all) allocated CUDA memory after test execution is completed.

ERROR    root:system_api.py:88 Could not create default driver device cuda
Traceback (most recent call last):
  File "/data/anush/actions-runner/_work/SHARK/SHARK/shark.venv/lib/python3.10/site-packages/iree/runtime/system_api.py", line 86, in _create_default_iree_driver
    device = driver.create_default_device()
RuntimeError: Error creating default device: iree/runtime/src/iree/hal/drivers/cuda/cuda_device.c:146: INTERNAL; CUDA driver error 'CUDA_ERROR_OUT_OF_MEMORY' (2): out of memory

To reproduce:

Setup a system+environment to run GPU tests for SHARK.
Run:

pytest tank -k "gpu"

(optional but highly recommended to run watch nvidia-smi concurrently to observe in real-time)

undefined symbol: _ZN3c1022getCustomClassTypeImplERKSt10type_index when running resnet50_script.py

I just made a fresh Python venv and followed the readme instructions to run resnet50_script.py.

curl -O https://raw.githubusercontent.com/nod-ai/SHARK/main/shark/examples/shark_inference/resnet50_script.py
#Install deps for test script
pip install pillow requests tqdm torch --extra-index-url https://download.pytorch.org/whl/nightly/cpu
python ./resnet50_script.py --device="cpu"  #use cuda or vulkan or metal

I got this error:

Traceback (most recent call last):
  File "./resnet50_script.py", line 7, in <module>
    from shark.shark_inference import SharkInference
  File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/shark/shark_inference.py", line 12, in <module>
    from shark.torch_mlir_utils import get_torch_mlir_module, run_on_refbackend
  File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/shark/torch_mlir_utils.py", line 22, in <module>
    from torch_mlir.dialects.torch.importer.jit_ir import (
  File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/torch_mlir/__init__.py", line 13, in <module>
    from torch_mlir.dialects.torch.importer.jit_ir import ClassAnnotator, ModuleBuilder
  File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/torch_mlir/dialects/torch/importer/jit_ir/__init__.py", line 14, in <module>
    from ....._mlir_libs._jit_ir_importer import *
ImportError: /home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/torch_mlir/_mlir_libs/_jit_ir_importer.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN3c1022getCustomClassTypeImplERKSt10type_index

Numerical Errors Due to Reduced Precision from TF32

With iree-org/iree#9975 and other upcoming changes, we'll be looking to enable TensorCore on more kernels for performance. This may change results in some tests enough to fail assertions checking for correctness.

EX: distilbert tf

========================================================================== FAILURES ===========================================================================
_________________________________________________________ DistilBertModuleTest.test_module_static_gpu _________________________________________________________

self = <distilbert-base-uncased_tf_test.DistilBertModuleTest testMethod=test_module_static_gpu>

    @pytest.mark.skipif(
        check_device_drivers("gpu"), reason=device_driver_info("gpu")
    )
    def test_module_static_gpu(self):
        dynamic = False
        device = "gpu"
>       self.module_tester.create_and_check_module(dynamic, device)

tank/distilbert-base-uncased_tf/distilbert-base-uncased_tf_test.py:48: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <distilbert-base-uncased_tf_test.DistilBertModuleTester object at 0x7fcf101fb430>, dynamic = False, device = 'gpu'

    def create_and_check_module(self, dynamic, device):
        model, func_name, inputs, golden_out = download_tf_model(
            "distilbert-base-uncased"
        )
    
        shark_module = SharkInference(
            model, func_name, device=device, mlir_dialect="mhlo"
        )
        shark_module.compile()
        result = shark_module.forward(inputs)
>       np.testing.assert_allclose(golden_out, result, rtol=1e-02, atol=1e-03)
E       AssertionError: 
E       Not equal to tolerance rtol=0.01, atol=0.001
E       
E       Mismatched elements: 4292 / 488352 (0.879%)
E       Max absolute difference: 0.02955437
E       Max relative difference: 48.456425
E        x: array([[[ -6.442754,  -6.393649,  -6.419188, ...,  -5.638614,
E                 -5.491579,  -3.414548],
E               [ -7.036943,  -6.988676,  -7.100483, ...,  -6.865986,...
E        y: array([[[ -6.442857,  -6.394039,  -6.419235, ...,  -5.639162,
E                 -5.492108,  -3.414864],
E               [ -7.039788,  -6.991871,  -7.102982, ...,  -6.868385,...

tank/distilbert-base-uncased_tf/distilbert-base-uncased_tf_test.py:28: AssertionError
=================================================================== short test summary info ===================================================================
FAILED tank/distilbert-base-uncased_tf/distilbert-base-uncased_tf_test.py::DistilBertModuleTest::test_module_static_gpu - AssertionError: 
===================================================================== 1 failed in 45.37s ======================================================================

Looks like the expected value difference is 0.02955437 just above 0.01 tolerance. This and other tolerances may needed to be updated.

Reduce amount of dependencies and disk space with an install

add a TF /PT flag to disable TF32

https://www.tensorflow.org/api_docs/python/tf/config/experimental/enable_tensor_float_32_execution

Change frontend from strings to enum

Currently backend selection is in string, while it's great/working for now. May be confusing on what backends are valid, and may produce bugs later on (for example: typos in string can just flow through compile phase and compile "something" but will produce error later on, will be hard to debug this issue if someone doesn't realize the typo).

Mark tests that require IMPORTER=1 tools

Importer tools other than torch-mlir tools need to be marked so pytests can run on venvs that dont have IMPORTER=1

DistilBert fails to lower through torch-mlir pass pipeline (illegal ops)

Error output:

error: failed to legalize operation 'torch.aten.view' that was explicitly marked illegal
note: see current operation: %416 = "torch.aten.view"(%414, %415) : (!torch.vtensor<[?,?,768],f32>, !torch.list<int>) -> !torch.vtensor<[?,?,12,64],f32>                                                                                   
Traceback (most recent call last):
  File "/home/ean/SHARK/generate_sharktank.py", line 180, in <module>
    save_torch_model(args.torch_model_csv)
  File "/home/ean/SHARK/generate_sharktank.py", line 68, in save_torch_model
    mlir_importer.import_debug(
  File "/home/ean/SHARK/shark/shark_importer.py", line 163, in import_debug
    imported_mlir = self.import_mlir(
  File "/home/ean/SHARK/shark/shark_importer.py", line 109, in import_mlir
    return self._torch_mlir(is_dynamic, tracing_required), func_name
  File "/home/ean/SHARK/shark/shark_importer.py", line 74, in _torch_mlir
    return get_torch_mlir_module(
  File "/home/ean/SHARK/shark/torch_mlir_utils.py", line 150, in get_torch_mlir_module
    pm.run(mb.module)
RuntimeError: Failure while executing pass pipeline.

Reproduce:

add distilbert-base-uncased,True,hf to tank/pytorch/torch_model_list.csv
run python generate_sharktank.py

Intel macOS crashes with loading libtorch twice

Upstream issue is here: llvm/torch-mlir#853

Workaround:

# Replace shark_venv with whatever your venv is
cd shark_venv/lib/python3.10/site-packages/torch_mlir/.dylibs
rm *.dylib
ln -s ../../torch/lib/libc10.dylib
ln -s ../../torch/lib/libshm.dylib
ln -s ../../torch/lib/libtorch.dylib
ln -s ../../torch/lib/libtorch_cpu.dylib
ln -s ../../torch/lib/libtorch_python.dylib

Check CUDA version and enable the corresponding flag in IREE cuda

--iree-cuda-llvm-target-arch=sm_80 should be set with something like:

https://gist.githubusercontent.com/f0k/63a664160d016a491b2cbea15913d549/raw/7f3ada81a4807771f2e5fcfc20a4e60da8aced4f/cuda_check.py

TF Funnel Transformer fails for gpu and vulkan cases.

To reproduce all failing cases (requires #199 ):

pytest tank/tf/hf_masked_lm/funnel-transformer_tf_test.py -k "not cpu"

There's a numerics issue on gpu cases (perhaps IREE upstream issue?)

Vulkan cases fail during iree-compile.

Longformer-base-4096 fails import to IREE (illegal ops)

All cases fail with the following error on TensorFlow longformer:

E         <unknown>:0: error: The following illegal operations still remain: 
E               tf.BatchMatMulV2 (count: 24)
E               tf.StridedSlice (count: 24)
E               tf.Tile (count: 12)
E               tf.TensorScatterAdd (count: 36)
E               tf.Where (count: 3)

To reproduce:

pytest tank/tf/hf_masked_lm/longformer-base-4096_tf_test.py -k "static_cpu"

(errors are the same for all cases, so one test case should be sufficient for repro purposes.)

TF tiny-random-flaubert numerics issue on Vulkan. (A100)

To reproduce:

pytest -s tank/tf/hf_masked_lm/tiny-random-flaubert_tf_test.py -k "vulkan"

SHARK results fail to validate against TF golden values:

E       assert True == False
E        +  where False = compare_tensors_tf(<tf.Tensor: shape=(1, 16, 68729), dtype=float32, numpy=\narray([[[ 0.53806955,  0.14671442,  0.        , ..., -0.2818507 ,\n          0.08806332,  0.14761735],\n        [-0.00822675, -0.0385315 ,  0.        , ...,  0.00425125,\n          0.06710303, -0.04765199],\n        [-0.1951161 , -0.1519102 ,  0.        , ...,  0.1955705 ,\n          0.13747491, -0.2091976 ],\n        ...,\n        [ 0.        ,  0.        ,  0.        , ...,  0.        ,\n          0.        ,  0.        ],\n        [ 0.        ,  0.        ,  0.        , ...,  0.        ,\n          0.        ,  0.        ],\n        [ 0.        ,  0.        ,  0.        , ...,  0.        ,\n          0.        ,  0.        ]]], dtype=float32)>, array([[[ 0.5380049 ,  0.13949418,  0.        , ..., -0.28169703,\n          0.08681311,  0.14958172],\n        [-0.00976601, -0.03920554,  0.        , ...,  0.00616576,\n          0.06795865, -0.0488795 ],\n        [-0.1871761 , -0.15056488,  0.        , ...,  0.19165687,\n          0.13996662, -0.20523356],\n        ...,\n        [ 0.        ,  0.        ,  0.        , ...,  0.        ,\n          0.        ,  0.        ],\n        [ 0.        ,  0.        ,  0.        , ...,  0.        ,\n          0.        ,  0.        ],\n        [ 0.        ,  0.        ,  0.        , ...,  0.        ,\n          0.        ,  0.        ]]], dtype=float32))

tank/tf/hf_masked_lm/tiny-random-flaubert_tf_test.py:86: AssertionError

Use TEmpFileSaver API in IREE

Use TempFileSAver to save each PyTest output artifacts and recreate if the test fails.

looks like you can also write your own TempFileSaver to do whatever you want (see the section just below that), like in this test: https://github.com/google/iree/blob/427a94a09be70631c5d9f89d12616bb7f1954257/compiler/src/iree/compiler/API/python/test/tools/compiler_core_test.py#L176-L199

https://discord.com/channels/689900678990135345/689900680009482386/985726147448950805

Fix tensorflow GPU memory management for pytest runs.

Currently, two cases of GPU memory management issues appear when running pytests for Tensorflow masked_lm models.

When running gpu tests for albert_base_v2, the static_gpu case (currently included in this issue) passes if tolerance values for compare_tensors_tf are increased to rtol=1e-02 and atol=1e-01. All of the tests mentioned in that issue pass with the increased tolerances. This isn't really acceptible accuracy, but we are waiting from the IREE team, so we can work around it for now to get memory management squared away.

TF albert on CPU passes for dynamic and static cases only if the tests are run individually. Tensorflow's allocated memory in CUDA does not free up for the second GPU test whether the first passes or not.

If we try bert_static_gpu, however, cuda runs out of memory even when the test is run by itself -- TF allocates ~39GB of gpu memory for the model at the beginning of the test and we run into cuda OOM when shark_module.compile() is called (hal allocation in IREE).

All of the TF model tests in tank/tf/hf_masked_lm/ share this issue.

nod-ai / shark Goto Github PK

shark's People

Contributors

Stargazers

Watchers

Forkers

shark's Issues

Some issues/Enhancements that need fixing

1. Integrate running of TF in HF-Benchmarker.

2. Fix up HF Benchmark Runtime with GPU

If on Linux for quicker results:

Recommend Projects

Recommend Topics

Recommend Org