hfuse's Introduction

Artifact Evaluation

Prerequisites:

First install CUDA and Docker on your machine: https://www.celantur.com/blog/run-cuda-in-docker-on-linux/
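
Before pulling the container, it may help to confirm that the required tools are on your PATH. The following is a minimal sketch and not part of HFuse: `check_prereqs` is a hypothetical helper name, and it assumes `docker` and `nvidia-smi` are the command names your installation provides.

```shell
# check_prereqs (hypothetical helper): print each command that is not on
# PATH and return non-zero if any of them is missing.
check_prereqs() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_prereqs docker nvidia-smi` prints nothing and returns 0 when both commands are found.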

Step 1: Download container

docker run --rm --privileged --gpus all -it leeleo3x/hfuse:latest bash

Step 2: Generate Fused Kernels (Optional)

Note that we have provided pre-fused kernels and you can skip this step.

  • Fuse DL kernels:
cd /root
mkdir fused-torch
cd fused-torch
HFUSE_PARALLEL=1 ../HFuse/build/tools/llvm-smart-fuser/llvm-smart-fuser ../HFuse/configs/ml_fusion.yaml ../HFuse/configs/ml_kernels.yaml  ../TorchKernel/fused/
  • Fuse crypto kernels:
cd /root
mkdir fused-crypto
cd fused-crypto
HFUSE_PARALLEL=0 ../HFuse/build/tools/llvm-smart-fuser/llvm-smart-fuser ../HFuse/configs/crypto_fusion.yaml ../HFuse/configs/crypto_kernels.yaml  ../ethminer/libethash-cuda

Set HFUSE_PARALLEL=1 to enable parallel fusing. Parallel fusing is supported only when fusing DL kernels. Fusing all kernels takes about 30 minutes.

  • Next, you need to move the fused kernels to the corresponding folders.
cd /root
mv ./fused-torch/* ./TorchKernel/fused
mv ./fused-crypto/* ./ethminer/libethash-cuda/
  • Note: HFuse makes several in-place modifications to the input kernels due to limitations of the Clang toolchain. If you cancel a fusing task (Ctrl-C) or the system crashes for any reason, you need to go to the source repository and revert all changes made by HFuse.

  • To revert changes of crypto kernels:

cd /root/ethminer/
git checkout .
  • To revert changes of DL kernels:
cd /root/TorchKernel
git checkout .
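
The revert steps above can be wrapped in a small check that only runs `git checkout .` when tracked files actually differ from the index. This is a sketch under assumptions: `revert_if_modified` is a hypothetical helper, not something shipped with HFuse, and it relies on `git diff --quiet` returning non-zero when the working tree has unstaged changes.

```shell
# revert_if_modified (hypothetical helper): if tracked files in the given
# repository have unstaged changes, discard them with `git checkout .`.
revert_if_modified() {
  repo="$1"
  if git -C "$repo" diff --quiet; then
    echo "clean: $repo"
  else
    git -C "$repo" checkout . && echo "reverted: $repo"
  fi
}
```

For example, `revert_if_modified /root/TorchKernel` leaves a clean checkout untouched and reverts an interrupted one. Untracked files are not removed by this sketch.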

Step 3: Build Fused Kernels

  • Build DL kernels:
cd /root/TorchKernel
./build.sh
  • Build crypto kernels:
cd /root/ethminer
mkdir build
cd build
cmake ..
make fuser -j4

Building both projects takes about 30 minutes.

Step 4: Run Fused Kernels

  • To run DL kernels
cd /root/TorchKernel
/usr/local/cuda-11.5/bin/nvprof --csv --log-file performance.csv python3 ./call_{arch}.py
/usr/local/cuda-11.5/bin/nvprof -f -o dl.nvprof python3 ./call_{arch}.py
python3 ~/nvprof2json/nvprof2json.py dl.nvprof > dl.json

Replace {arch} with the GPU you use (1080 or v100).
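
To avoid typing the script name by hand, the substitution can be sketched as a small shell function. `script_for_arch` is a hypothetical helper; it only knows the two GPU names listed in this guide and rejects anything else.

```shell
# script_for_arch (hypothetical helper): map a GPU name to the driver script
# name used in the commands above. Only 1080 and v100 appear in this guide.
script_for_arch() {
  case "$1" in
    1080|v100) echo "call_$1.py" ;;
    *) echo "unsupported arch: $1" >&2; return 1 ;;
  esac
}
```

For example, `python3 "./$(script_for_arch v100)"` runs `./call_v100.py`.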

  • To run crypto kernels
cd /root/ethminer
/usr/local/cuda-11.5/bin/nvprof --csv --log-file performance.csv ./build/fuse/fuser
/usr/local/cuda-11.5/bin/nvprof -f -o crypto.nvprof ./build/fuse/fuser
python3 ~/nvprof2json/nvprof2json.py crypto.nvprof > crypto.json

The execution time of each kernel is stored in /root/ethminer/performance.csv and /root/TorchKernel/performance.csv.
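
If you want to load these files into a spreadsheet or script, the profiler's own status lines may need to be dropped first. The sketch below assumes those lines start with `==` (the usual `==PID==` prefix of nvprof messages) and that everything else is CSV; `csv_rows` is a hypothetical helper name.

```shell
# csv_rows (hypothetical helper): drop profiler status lines, which are
# assumed to start with "==", and print the remaining CSV rows.
csv_rows() {
  grep -v '^==' "$1"
}
```

For example, `csv_rows /root/TorchKernel/performance.csv | head` shows the header and the first few kernel rows.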

  • To visualize kernel execution time results (Figure 7)
mv /root/TorchKernel/dl.json /root/HFuse/scripts/data-new
mv /root/ethminer/crypto.json /root/HFuse/scripts/data-new
cd /root/HFuse/scripts/
python3 analyze_nvprof.py ./data-new/dl.json ./data-new/crypto.json

The graph is stored in /root/HFuse/scripts/fused.png

  • To collect kernel metrics (Figure 8)
cd /root/ethminer
/usr/local/cuda-11.5/bin/nvprof --csv --log-file metrics.csv --events "elapsed_cycles_pm" --metrics "issue_slot_utilization,achieved_occupancy,stall_memory_dependency" ./build/fuse/fuser
cd /root/TorchKernel
/usr/local/cuda-11.5/bin/nvprof --csv --log-file metrics.csv --events "elapsed_cycles_pm" --metrics "issue_slot_utilization,achieved_occupancy,stall_memory_dependency" python3 call.py

The metrics are written to /root/ethminer/metrics.csv and /root/TorchKernel/metrics.csv. Each row shows the name of the kernel, the type of metric collected, and the value of the metric.
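
Since each row names the metric it reports, the rows for a single metric can be pulled out with a fixed-string search. This is a minimal sketch: `metric_rows` is a hypothetical helper, and it assumes the metric name appears verbatim in every row that reports it.

```shell
# metric_rows (hypothetical helper): print the rows of a metrics file that
# mention the given metric name, e.g. achieved_occupancy.
metric_rows() {
  grep -F -- "$2" "$1"
}
```

For example, `metric_rows /root/ethminer/metrics.csv achieved_occupancy` lists the occupancy rows for the crypto kernels.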
