iree-nvgpu's Introduction

OpenXLA NVIDIA GPU Compiler and Runtime

This project contains the compiler and runtime plugins that enable specialized targeting of NVIDIA GPUs for the OpenXLA platform. It builds on top of the core IREE toolkit.

Development setup

The project can be built either as part of IREE, by manually specifying the plugin path via -DIREE_COMPILER_PLUGIN_PATHS, or directly, for development focused specifically on NVIDIA GPUs:

cmake -GNinja -B build/ -S . \
    -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    -DIREE_ENABLE_ASSERTIONS=ON \
    -DCMAKE_C_COMPILER=clang \
    -DCMAKE_CXX_COMPILER=clang++ \
    -DIREE_ENABLE_LLD=ON

# Recommended:
# -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache
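
For reference, here is a minimal sketch of the alternative in-tree build, where IREE itself is configured with this repository as a compiler plugin (paths below are illustrative; the exact plugin directory may differ):

cmake -GNinja -B ../iree/build/ -S ../iree \
    -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    -DIREE_COMPILER_PLUGIN_PATHS=/path/to/openxla-nvgpu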

Note that you will need a checkout of the IREE codebase at ../iree relative to the directory where the openxla-nvgpu compiler was checked out. Running the sync_deps.py script should bring in all source dependencies at the needed versions (into the parent directory).
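
A typical invocation, from this repository's checkout (illustrative):

python sync_deps.py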

See the IREE getting started guide for additional configuration options and details of how to set this up.

Installing dependencies

You must have the CUDA Toolkit installed, together with cuDNN (see instructions).

See the project settings for options to build without the components that require the full dependencies.

On Linux, the directory containing libcudnn.so must be added to LD_LIBRARY_PATH; cuDNN is commonly installed into the CUDA toolkit tree:

export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda/lib64
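
As an optional sanity check (path illustrative), confirm that the library is actually present in that directory:

ls /usr/local/cuda/lib64/libcudnn*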

Running tests

Some tests can run only on Ampere or newer devices because they rely on the cuDNN runtime fusion engine.

Tests that depend on having a device present can be disabled with -DOPENXLA_NVGPU_INCLUDE_DEVICE_TESTS=OFF.
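
For example, the flag can be passed when (re)configuring the build directory created above, before running the test target:

cmake -B build/ -DOPENXLA_NVGPU_INCLUDE_DEVICE_TESTS=OFF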

cmake --build build --target openxla-nvgpu-run-tests

Project Maintenance

This section is a work in progress describing various project maintenance tasks.

Prerequisite: Install openxla-devtools

pip install git+https://github.com/openxla/openxla-devtools.git

Sync all deps to pinned versions

openxla-workspace sync

Update IREE to head

This updates the pinned IREE revision to the HEAD revision at the remote.

# Updates the sync_deps.py metadata.
openxla-workspace roll iree
# Brings all dependencies to pinned versions.
openxla-workspace sync

Full update of all deps

This updates the pinned revisions of all dependencies. This is presently done by updating openxla-pjrt-plugin to remote HEAD and deriving the IREE dependency from its pin.

# Updates the sync_deps.py metadata.
openxla-workspace roll nightly
# Brings all dependencies to pinned versions.
openxla-workspace sync

Pin current versions of all deps

This can be done after local, cross-project changes have been made and landed. It snapshots the state of all deps as actually checked out and updates the metadata.

openxla-workspace pin

iree-nvgpu's People

Contributors

bviyer, chsigg, ezhulenev, frgossen, ftynse, gmngeoffrey, iree-github-actions-bot, jpienaar, matthias-springer, mjsml, sherhut, stellaraccident

iree-nvgpu's Issues

[Epic] Production integration of cuBLAS, cuDNN, and Triton

MS2 epic-level item to track library integration work.

### P1 cuBLAS
- [ ] cuBLAS integration for GEMMs - needs owner
### P1 Triton
- [ ] Triton integration @ezhulenev 
- [ ] #54 
- [ ] #13848
### P2 cuDNN
- [ ] cuDNN for MHA (flash attention) integration
- [ ] https://github.com/openxla/openxla-nvgpu/issues/98

[cuDNN] Use destination-passing style @cudnn.execute API

Instead of allocating result buffers inside the custom cuDNN module, we should pass them in as arguments (destination-passing style).

Example:

// Allocate the result tensor inside the program.
%ret = flow.tensor.empty : tensor<4x4x4x4xf32>
// Export it to a raw HAL buffer that the custom module can write into.
%ret_buffer = hal.tensor.export %ret : tensor<4x4x4x4xf32> -> !hal.buffer
// Pass the destination buffer as an argument; the call returns no results.
call @cudnn.execute(..., %ret_buffer) : (...) -> ()
