fbusabiaga / kff

This project is forked from wahibium/kff.


Scalable GPU Kernel Fission/Fusion Transformation for Memory-Bound Kernels

License: MIT License

Makefile 0.13% Cuda 81.87% C++ 10.38% C 1.80% MATLAB 5.79% Objective-C 0.02% M 0.02%

kff's Introduction


      An End-To-End Automated Method for 
      GPU Multi-Kernel Transformations to
      Exploit Inter-Kernel Data Locality 
          of Stencil Applications

author: Mohamed Wahib

version: 0.1 Alpha

released: May 2015

license: MIT License

language: C++

This project includes the components of an end-to-end framework for automatically transforming stencil-based CUDA programs to exploit inter-kernel data locality. The transformation is based on two basic operations, kernel fission and fusion, and relies on a series of steps: gathering metadata, generating graphs that express dependency and precedence constraints, searching for optimal kernel fissions/fusions, and generating code. Simple annotations are provided to enable CUDA-to-CUDA transformations in which the user-written kernels are collectively replaced by auto-generated kernels optimized for locality.

Driven by the flexibility required to accommodate different applications, we propose a workflow transformation approach that enables user intervention at any of the transformation steps. We demonstrate the practicality and effectiveness of the automatic transformations in exploiting exposed data locality using real-world weather models: large codebases with dozens of kernels and data arrays. Experimental results show that the proposed end-to-end automated approach, with minimal intervention from the user, yields performance improvements comparable to manual kernel fusion.

The project includes the following components:

1- LOGGA: a grouped genetic algorithm that searches for the kernel fissions/fusions yielding the best data reuse for the exposed locality.

2- Translator: a program that translates the original CUDA code into new CUDA code with the kernel transformation applied. The translator uses the ROSE compiler to parse, transform, and unparse the original source code.

3- Metadata Gatherer: a set of tools to gather metadata about the performance and characteristics of the original program.

4- DDG and OEG Generators: tools that apply heuristics to extract the Data Dependency Graph (DDG) and Order Execution Graph (OEG) from the source code. The tools also allow amending the graphs.

The components mentioned above will be released in the stated order after testing and verifying each component individually. Each component is designed to be used as a standalone tool or as part of the end-to-end framework.

Latest component --- LOGGA ---


      Locality Optimization Grouped 
        Genetic Algorithm (LOGGA)

  1. INTRODUCTION

The instructions for compiling and using the implementation, version 0.1, can be found below.

A short version of the instructions for compiling the code and using the resulting executable follows, along with some further comments. However, we encourage you to read the report in order to take full advantage of the implementation's features and to understand what is actually going on when you see the outputs.

  2. EXTERNAL DEPENDENCIES

KFF depends on the following external software:

In addition, the following platform-specific tools and libraries are required when using the respective platform:

  3. COMPILATION

For compiling LOGGA, in Makefile, change the following two lines:

  1. Line 35 - In the statement CC = CC, the CC on the right-hand side should be changed to the name of your preferred C++ compiler. With gcc, for instance, the line becomes CC = gcc.

  2. Line 38 - In the statement FLAGS = -O3, set the required optimization level (for GNU gcc this is -O4; for SGI CC it is -O3). For no code optimization, leave the value empty: FLAGS = . For maximal optimization with gcc, i.e. -O4, use FLAGS = -O4. All modules are compiled at once because some compilers (such as SGI CC) perform inter-module optimization and cannot compile each source file separately and link the objects afterwards.
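After the two edits, the relevant Makefile lines might read as follows (illustrative values, assuming the gcc example from the steps above):

```makefile
CC    = gcc
FLAGS = -O4
```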

Run the following command line:

make all

After compiling the sources, your directory should contain an executable file: run.

  4. COMMAND LINE PARAMETERS

There are three parameters that can be passed to the program:

input file name -> the input file to process
-h -> help on command line parameters
-paramDescription -> print out the description of input file parameters

  5. EXAMPLE INPUT FILES

Example input programs are located in the sub-directory examples. In the "logga" directory, files whose names start with input are input files, and files starting with output are the output files produced with the parameters specified in the corresponding input files. The "applications" directory includes example applications used in the end-to-end framework.

  6. COMMENTS

This code is distributed for academic purposes with absolutely no warranty of any kind, either expressed or implied, to the extent permitted by applicable state law. We are not responsible for any damage resulting from its proper or improper use.

If you have any comments or identify any bugs, please contact the author at the following address (email is the preferred form of communication):

Mohamed Wahib
HPC Programming Framework Research Team
RIKEN Advanced Institute for Computational Science
7-1-26, Minatojima-minami-machi, Chuo-ku
Kobe, Hyogo 650-0047
email: [email protected]

