
Apple-M1-BERT Inference

Setup Environment:

  1. Install Miniforge from their official [page](https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh)

Once downloaded:

chmod +x ~/Downloads/Miniforge3-MacOSX-arm64.sh
sh ~/Downloads/Miniforge3-MacOSX-arm64.sh
source ~/miniforge3/bin/activate
  2. Create an environment with all the dependencies needed to run this repo:
# Create a conda environment
conda create --name tvm-m1 python=3.8
conda activate tvm-m1

# Install TensorFlow and dependencies
conda install -c apple tensorflow-deps
python -m pip install tensorflow-macos
python -m pip install tensorflow-metal

# Install PyTorch and Transformers
conda install -c pytorch pytorch torchvision
conda install -c huggingface transformers -y

# Install TVM dependencies
conda install numpy decorator attrs cython
conda install llvmdev
conda install cmake
conda install tornado psutil xgboost cloudpickle pytest
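Once everything is installed, a quick way to confirm the environment resolved correctly is to probe for each package before moving on (a minimal sketch; the package list mirrors the installs above):

```python
import importlib.util

def check_packages(packages):
    """Return a dict mapping each package name to True if it is importable."""
    return {pkg: importlib.util.find_spec(pkg) is not None for pkg in packages}

# Packages installed in the steps above.
required = ["tensorflow", "torch", "transformers", "numpy", "xgboost"]
for pkg, ok in check_packages(required).items():
    print(f"{pkg}: {'ok' if ok else 'MISSING'}")
```

Any `MISSING` entry means the corresponding conda/pip step above needs to be rerun.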
  3. Clone and Build TVM

Clone TVM

git clone --recursive https://github.com/apache/tvm tvm
cd tvm && mkdir build
cp cmake/config.cmake build

Edit the config.cmake file in the build directory, setting the following:

set(USE_METAL ON)
set(USE_LLVM ON)
set(USE_OPENMP gnu)

Build TVM

cd build
cmake -DCMAKE_OSX_ARCHITECTURES=arm64 ..
make -j

Set the following environment variables (adjusting TVM_HOME to your clone path), or add them to ~/.zshrc for persistence:

export TVM_HOME=/Users/tvmuser/git/tvm
export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH}

For more details on setting up TensorFlow on macOS, see Apple's tensorflow-metal documentation.

Run the TF and Keras benchmarks:

  1. Dump bert-base-uncased model into a graph by running python dump_tf_graph.py.
  2. Get Keras CPU benchmark by running python run_keras.py --device cpu. Sample output: [Keras] Mean Inference time (std dev) on cpu: 579.0056343078613 ms (20.846548561801576 ms)
  3. Get Keras GPU benchmark by running python run_keras.py --device gpu. Sample output: [Keras] Mean Inference time (std dev) on gpu: 1767.4337482452393 ms (27.00876036973127 ms)
  4. Get CPU benchmark by running python run_graphdef.py --device cpu --graph-path ./models/bert-base-uncased.pb. Sample output: [Graphdef] Mean Inference time (std dev) on cpu: 512.3187007904053 ms (6.115432898641167 ms)
  5. Get GPU benchmark by running python run_graphdef.py --device gpu --graph-path ./models/bert-base-uncased.pb. Sample output: [Graphdef] Mean Inference time (std dev) on gpu: 543.5237417221069 ms (4.210226676450006 ms)
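The "Mean Inference time (std dev)" figures in the sample outputs come from timing repeated forward passes. A minimal sketch of such a timing loop (the warm-up and repeat counts here are illustrative assumptions, not the scripts' actual values):

```python
import statistics
import time

def benchmark(fn, warmup=3, repeats=10):
    """Time fn() repeatedly and return (mean_ms, stdev_ms)."""
    for _ in range(warmup):  # warm-up runs are discarded
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples), statistics.stdev(samples)

# Stand-in workload; the real scripts call the model's forward pass here.
mean_ms, std_ms = benchmark(lambda: sum(i * i for i in range(100_000)))
print(f"Mean inference time (std dev): {mean_ms:.2f} ms ({std_ms:.2f} ms)")
```

Warming up first matters on the M1 GPU in particular, since the first call pays one-time compilation and data-transfer costs.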

Running TVM AutoScheduler Search

We have provided search_dense_cpu.py and search_dense_gpu.py for searching on M1 CPUs and M1 GPUs. Both scripts use RPC. Run each of the commands below in a separate window, or use a session manager like screen or tmux for each command.

The scripts require that you have converted HuggingFace's bert-base-uncased model to Relay, which can be done via the dump_pt.py script.

  1. Start RPC Tracker: python -m tvm.exec.rpc_tracker --host 0.0.0.0 --port 9190
  2. Start RPC Server: python -m tvm.exec.rpc_server --tracker 127.0.0.1:9190 --port 9090 --key m1 --no-fork

Before continuing make sure all existing logs have been removed from the ./assets folder.
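Stale tuning logs in ./assets would otherwise be picked up as search history. A minimal sketch for clearing them (assuming the logs are JSON files, as AutoScheduler records are):

```python
import pathlib

def clear_logs(assets_dir, pattern="*.json"):
    """Delete matching log files and return how many were removed."""
    removed = 0
    for log in pathlib.Path(assets_dir).glob(pattern):
        log.unlink()
        removed += 1
    return removed

# Example: print(clear_logs("./assets"))
```

A plain `rm ./assets/*.json` does the same thing; the function form just makes the cleanup scriptable.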

Once the scripts have completed you should see an output like this:

Extract tasks...
Compile...
Upload
run
Evaluate inference time cost...
Mean inference time (std dev): 35.54 ms (1.39 ms)
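If you want to collect results across multiple runs, the final summary line can be parsed with a small regex (a sketch; the line format is taken from the sample output above):

```python
import re

def parse_result(line):
    """Extract (mean_ms, std_ms) from an AutoScheduler summary line."""
    m = re.search(
        r"Mean inference time \(std dev\): ([\d.]+) ms \(([\d.]+) ms\)", line)
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    return float(m.group(1)), float(m.group(2))

line = "Mean inference time (std dev): 35.54 ms (1.39 ms)"
print(parse_result(line))  # (35.54, 1.39)
```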

Running TVM Inference

If you decide not to run a search and instead use the pre-searched AutoScheduler logs provided in the assets folder for the M1 Pro CPU and GPU, comment out the following:

    if not os.path.exists(log_file):
        tasks, task_weights = auto_scheduler.extract_tasks(
            mod["main"], params, target=target_host, target_host=target_host)
        for idx, task in enumerate(tasks):
            print("========== Task %d  (workload key: %s) ==========" %
                  (idx, task.workload_key))
            print(task.compute_dag)

        run_tuning(tasks, task_weights, log_file)

Why is TVM much faster than Apple TensorFlow with MLCompute?

  • TVM's AutoScheduler uses machine learning to search for CPU/GPU code optimizations; human experts cannot cover the full optimization space by hand.
  • TVM can fuse any subgraph whose computation pattern qualifies and generate code for the target directly; human experts can only add fusion patterns and optimize particular subgraphs manually.
  • We visualized the bert-base-uncased graph in Apple TensorFlow. Here is a sample block in BERT. As the graph shows, MLCompute tries to rewrite the TF graph, replacing some operators with ones it supports. In practice a perfect conversion is hard: in the BERT case, the MatMul operator is swapped to MLCMatMul and the LayerNorm operator to MLCLayerNorm, while all other operators are not covered by MLCompute. On the GPU, data is copied between CPU and GPU at almost every step. TVM, on the other hand, generates ALL operators directly on the GPU, so it can maximize GPU utilization.
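The fusion point can be illustrated with a toy example: a fused kernel computes exactly the same result as a chain of separate operators, but in a single pass without materializing intermediate buffers (a plain-Python sketch, not TVM's actual codegen):

```python
def unfused(xs):
    """Chain of separate 'operators', each materializing an intermediate."""
    t1 = [v * 2.0 for v in xs]        # scale (stand-in for MatMul)
    t2 = [v + 1.0 for v in t1]        # bias add
    return [max(v, 0.0) for v in t2]  # ReLU

def fused(xs):
    """One fused kernel: a single pass, no intermediate buffers."""
    return [max(v * 2.0 + 1.0, 0.0) for v in xs]

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
assert unfused(xs) == fused(xs)
```

On a GPU, each unfused step also implies extra memory traffic (and, under MLCompute's partial coverage, CPU/GPU copies), which is the overhead fusion removes.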

If you'd like to learn more about TVM, please visit the Apache TVM project site, the OctoML site, or the OctoML blog.

apple-m1-bert's People

Contributors: antinucleon, philatoctoml

apple-m1-bert's Issues

ImportError: TFBertForSequenceClassification requires the TensorFlow library but it was not found in your environment.

Even though TensorFlow is installed, as can be seen below:

python -c 'import tensorflow as tf; print(tf.__version__)'
# 2.4.0-rc0

upon running the following:

python dump_tf_graph.py

I am getting an error that the TensorFlow library is not found:

Traceback (most recent call last):
  File "dump_tf_graph.py", line 25, in <module>
    main()
  File "/Users/ankitadhar/miniforge3/envs/tf_env/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/ankitadhar/miniforge3/envs/tf_env/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/ankitadhar/miniforge3/envs/tf_env/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/ankitadhar/miniforge3/envs/tf_env/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "dump_tf_graph.py", line 19, in main
    model = tf_utils.get_huggingface_model(model_name, batch_size, seq_len)
  File "/Users/ankitadhar/Documents/NLP/worksheet/git-bert/Apple-M1-BERT/tf_utils.py", line 40, in get_huggingface_model
    model = _load_keras_model(
  File "/Users/ankitadhar/Documents/NLP/worksheet/git-bert/Apple-M1-BERT/tf_utils.py", line 21, in _load_keras_model
    model = module.from_pretrained(name)
  File "/Users/ankitadhar/miniforge3/envs/tf_env/lib/python3.8/site-packages/transformers/utils/dummy_tf_objects.py", line 372, in from_pretrained
    requires_tf(self)
  File "/Users/ankitadhar/miniforge3/envs/tf_env/lib/python3.8/site-packages/transformers/file_utils.py", line 550, in requires_tf
    raise ImportError(TENSORFLOW_IMPORT_ERROR.format(name))
ImportError: TFBertForSequenceClassification requires the TensorFlow library but it was not found in your environment. Checkout the instructions on the installation page: https://www.tensorflow.org/install and follow the ones that match your environment.
