platformaware.jl's Introduction

PlatformAware.jl


A package for improving the practice of platform-aware programming in Julia.

It helps HPC package developers write different versions of computationally intensive functions (kernels), each tailored to different assumptions about the features of the execution platform.

What is platform-aware programming?

We define platform-aware programming as the practice of coding computationally intensive functions, called kernels, using the most appropriate abstractions and programming interfaces, as well as performance tuning techniques, to take better advantage of the features of the target execution platform. This is a well-known practice in programming for HPC applications.

Platform-aware programming is especially suitable when the developer is interested in employing heterogeneous computing resources, such as accelerators (e.g., GPUs, FPGAs, and MICs), especially in conjunction with multicore and cluster computing.

For example, suppose a package developer is interested in providing a specialized kernel implementation for NVIDIA A100 Tensor Core GPUs, meeting the demand from users of a specific cloud provider offering virtual machines with accelerators of this model. The developer would like to use CUDA programming with this device's supported compute capability (8.0). However, other users may require support for other cloud providers that offer different accelerator models from different vendors (for example, AMD Instinct™ MI210 and Intel® Agilex™ F-Series FPGA and SoC FPGA). In this scenario, the developer faces the challenge of coding and deploying for multiple devices. This is a typical platform-aware programming scenario in which PlatformAware.jl should be useful, and such scenarios are becoming increasingly common as heterogeneous computing platforms are adopted to accelerate AI and data analytics applications.

Target users

PlatformAware.jl is aimed primarily at package developers dealing with HPC concerns, especially those using heterogeneous computing resources. We assume that package users are only interested in using package operations without being concerned about how they are implemented.

Usage tutorial

We present a simple example that readers may reproduce to test PlatformAware.jl features.

Consider the problem of performing a convolution using a Fast Fourier Transform (FFT). To do this, the user can implement an fftconv function that uses an fft function offered by a user-defined package called MyFFT.jl, capable of performing the FFT on an accelerator (e.g., a GPU) if one is present.

using MyFFT
fftconv(X,K) = fft(X) .* conj.(fft(K)) 

This tutorial shows how to create MyFFT.jl, demonstrating the basics of how to install PlatformAware.jl and how to use it to create a platform-aware package.

Creating the MyFFT.jl project

In the Julia REPL, as shown in the screenshot below, run ] generate MyFFT.jl to create a new project called MyFFT.jl, press backspace to return to the julia> prompt and run cd("MyFFT.jl") to move into the project directory, and then run ] activate . to activate the MyFFT.jl project in the current REPL session.
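The command sequence looks roughly like this (a sketch; the pkg prompt reflects your Julia version):

(@v1.9) pkg> generate MyFFT.jl
julia> cd("MyFFT.jl")
(@v1.9) pkg> activate .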

[screenshot: Julia REPL commands for creating and activating the MyFFT.jl project]

These operations create a standard "hello world" project, whose contents are shown in the following snapshot:

[screenshot: file tree of the generated MyFFT.jl project]

Installing PlatformAware.jl

Before coding the platform-aware package, it is necessary to add PlatformAware.jl as a dependency of MyFFT.jl by running the following command in the Julia REPL:

] add PlatformAware

Now, load the PlatformAware.jl package (using PlatformAware or import PlatformAware) and read the output message:

[screenshot: output message after loading PlatformAware.jl]

Platform.toml is the platform description file, containing a set of key-value pairs, each describing a feature of the underlying platform. It must be created by the user by running PlatformAware.setup(), which performs a sequence of feature detection operations on the platform.
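For example (a sketch of the REPL session; the detected features vary by machine):

julia> using PlatformAware
julia> PlatformAware.setup()   # runs feature detection and writes Platform.toml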

Platform.toml is written in a human-editable format. Therefore, it can be modified by users to add undetected platform features or ignore detected features.
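For illustration, an excerpt of a hypothetical Platform.toml might look like the following; the key names match the platform parameters used in this tutorial, and the accelerator_api value is the one discussed later for an NVIDIA GeForce 940MX:

# hypothetical excerpt of Platform.toml
accelerator_count = 1
accelerator_api = "CUDA_5_0;OpenCL_3_0;unset;unset;OpenGL_4_6;Vulkan_1_3;DirectX_11_0"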

Sketching the MyFFT.jl code

In order to implement the fft kernel function, we edit the src/MyFFT.jl file. First, we sketch the code of the fft kernel methods:

module MyFFT

  import PlatformAware

  # set up platform features (parameters)
  @platform feature clear
  @platform feature accelerator_count
  @platform feature accelerator_api

  # Fallback kernel
  @platform default fft(X) = ...

  # OpenCL kernel
  @platform aware fft({accelerator_count::(@atleast 1), accelerator_api::(@api OpenCL)}, X) = ...

  # CUDA kernel
  @platform aware fft({accelerator_count::(@atleast 1), accelerator_api::(@api CUDA)}, X) = ...

  export fft

end

The sequence of @platform feature macro declarations specifies the set of platform parameters that will be used by subsequent kernel method declarations, that is, the assumptions that will be made to distinguish them. You can refer to this table for a list of all supported platform parameters. By default, they are all included. In the case of fft, the kernel methods are differentiated using only two parameters: accelerator_count and accelerator_api. They denote, respectively, assumptions about the number of accelerator devices and the native API they support.

The @platform default macro declares the default kernel method, which will be called if none of the assumptions of the other kernel methods declared using @platform aware macro calls are valid. The default kernel must be unique to avoid ambiguity.

Finally, the kernels for accelerators that support the OpenCL and CUDA APIs are declared using the @platform aware macro. The list of platform parameters is declared in braces, just before the regular parameters, such as X. Their types denote assumptions. For example, @atleast 1 is a quantifier representing one or more units of a resource, while @api CUDA and @api OpenCL are qualifiers that refer to the CUDA and OpenCL APIs.

The programmer must be careful not to declare kernel methods with overlapping assumptions in order to avoid ambiguities.
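For illustration, the following hypothetical pair of declarations (not part of MyFFT.jl) overlaps: a platform with at least one accelerator whose native API is CUDA satisfies the assumptions of both methods, so neither is more specific than the other and dispatch raises an ambiguity error:

# hypothetical overlapping kernel methods; both match a platform
# with one or more accelerators supporting CUDA
@platform aware g({accelerator_count::(@atleast 1)}, x) = x .+ 1
@platform aware g({accelerator_api::(@api CUDA)}, x) = x .+ 2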

Other dependencies

Before adding the code for the kernels, add the code to load their dependencies. This can be done directly by adding the following code to the src/MyFFT.jl file, right after import PlatformAware:

import CUDA
import OpenCL
import CLFFT
import FFTW

Also, you should add CUDA.jl, OpenCL.jl, CLFFT.jl, and FFTW.jl as dependencies of MyFFT.jl. To do this, execute the following command in the Julia REPL:

] add CUDA OpenCL CLFFT FFTW

NOTE: CLFFT.jl is not available on JuliaHub due to compatibility issues with recent versions of Julia. We're working with the CLFFT.jl maintainers to address this issue. If you get an error with the CLFFT dependency, point to our CLFFT.jl fork by running ] add https://github.com/JuliaGPU/CLFFT.jl#master.

As a performance optimization, we can take advantage of platform-aware features to selectively load dependencies, speeding up the loading of MyFFT.jl. To do this, we first declare a kernel function called which_api in src/MyFFT.jl, right after the @platform feature declarations:

@platform default which_api() = :fftw
@platform aware which_api({accelerator_count::(@atleast 1), accelerator_api::(@api CUDA)}) = :cufft
@platform aware which_api({accelerator_count::(@atleast 1), accelerator_api::(@api OpenCL)}) = :clfft

Next, we add the code for selective dependency loading:

api = which_api()
if (api == :cufft) 
    import CUDA
elseif (api == :clfft) 
    import OpenCL
    import CLFFT
else # api == :fftw 
    import FFTW
end

Full src/MyFFT.jl code

Finally, we present the complete code for src/MyFFT.jl, with the implementation of the kernel methods:

module MyFFT

    using PlatformAware

    @platform feature clear
    @platform feature accelerator_count
    @platform feature accelerator_api

    @platform default which_api() = :fftw
    @platform aware which_api({accelerator_count::(@atleast 1), accelerator_api::(@api CUDA)}) = :cufft
    @platform aware which_api({accelerator_count::(@atleast 1), accelerator_api::(@api OpenCL)}) = :clfft

    api = which_api()
    @info "seleted FFT API" api
    
    if (api == :cufft) 
        using CUDA; const cufft = CUDA.CUFFT
    elseif (api == :clfft) 
        using OpenCL
        using CLFFT; const clfft = CLFFT
    else # api == :fftw 
        using FFTW; const fftw = FFTW
    end

    # Fallback kernel
    @platform default fft(X) = fftw.fft(X)

    # OpenCL kernel
    @platform aware function fft({accelerator_count::(@atleast 1), accelerator_api::(@api OpenCL)}, X)
        T = eltype(X)
        _, ctx, queue = cl.create_compute_context()
        bufX = cl.Buffer(T, ctx, :copy, hostbuf=X)
        p = clfft.Plan(T, ctx, size(X))
        clfft.set_layout!(p, :interleaved, :interleaved)
        clfft.set_result!(p, :inplace)
        clfft.bake!(p, queue)
        clfft.enqueue_transform(p, :forward, [queue], bufX, nothing)
        reshape(cl.read(queue, bufX), size(X))
    end

    # CUDA kernel
    @platform aware fft({accelerator_count::(@atleast 1), accelerator_api::(@api CUDA)}, X) = cufft.fft(X |> CuArray)

    export fft

end # module MyFFT

Running and testing the fft kernel methods

To test fft in a convolution, open a Julia REPL session in the MyFFT.jl directory and execute the following commands:

NOTE: If you receive an ambiguity error after executing fftconv, don't panic! Read the next paragraphs.

 import Pkg; Pkg.activate(".")
 using MyFFT
 
 function fftconv(img,krn) 
   padkrn = zeros(size(img))
   copyto!(padkrn,CartesianIndices(krn),krn,CartesianIndices(krn))
   fft(img) .* conj.(fft(padkrn))  
 end
 
 img = rand(Float64,(20,20,20))  # image
 krn = rand(Float64,(4,4,4))     # kernel
 
 fftconv(img,krn) 

The fft kernel method that matches the current Platform.toml will be selected. If Platform.toml has not been created, the default kernel method is selected instead. The reader can consult the Platform.toml file to find out which platform features were detected by PlatformAware.setup(), and can also see the selected FFT API in the logging message emitted after using MyFFT.

By carefully modifying the Platform.toml file, the reader can test all kernel methods. For example, if an NVIDIA GPU was recognized by PlatformAware.setup(), the accelerator_api entry in Platform.toml will probably include the supported CUDA and OpenCL versions. For example, for an NVIDIA GeForce 940MX GPU, accelerator_api = "CUDA_5_0;OpenCL_3_0;unset;unset;OpenGL_4_6;Vulkan_1_3;DirectX_11_0". This may lead to an ambiguity error, as multiple dispatch will not be able to distinguish between the OpenCL and CUDA kernel methods based on the accelerator_api parameter alone. In this case, there are two alternatives:

  • Edit Platform.toml and set the CUDA or OpenCL platform type (e.g., CUDA_5_0 or OpenCL_3_0) to unset in the accelerator_api entry, so that the desired kernel method is selected manually;
  • Modify the CUDA kernel signature by including, for example, accelerator_manufacturer::NVIDIA in the list of platform parameters, so that NVIDIA GPUs give preference to CUDA while OpenCL is applied to accelerators of other vendors (recommended); see the sketch below.
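A minimal sketch of the second (recommended) alternative, assuming an NVIDIA platform type is available for the accelerator_manufacturer parameter:

# sketch: restrict the CUDA kernel to NVIDIA accelerators, so that
# the OpenCL kernel handles accelerators of other vendors
@platform aware fft({accelerator_count::(@atleast 1),
                     accelerator_manufacturer::NVIDIA,
                     accelerator_api::(@api CUDA)}, X) = cufft.fft(X |> CuArray)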

A general guideline

We suggest the following general guideline for package developers who want to take advantage of PlatformAware.jl:

  1. Identify the kernel functions, that is, the functions with high computational requirements in your package, which are the natural candidates to exploit parallel computing, acceleration resources, or both.

  2. Provide a default (fallback) method for each kernel function, using the @platform default macro.

  3. Identify the target execution platforms to which you want to provide specialized methods for each kernel function. You can choose a set of execution platforms for all kernels, or you can select one or more platforms for each kernel independently. The table of supported platform parameters mentioned earlier can help guide this choice.

  4. For each platform you select, define a set of assumptions about its features that will guide your implementation decisions. In fact, it is possible to define different assumptions for the same platform, leading to multiple implementations of a kernel for the same platform. For example, you might decide to implement different parallel algorithms to solve a problem according to the number of nodes and the interconnection characteristics of a cluster.

  5. Provide platform-aware methods for each kernel function using the @platform aware macro.

  6. After implementing and testing all platform-aware methods, you have a list of platform parameters that were used to make assumptions about the target execution platform(s). You can optionally instruct PlatformAware.jl to use only those parameters by using the @platform feature macro, as in the sketch below.
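For example, MyFFT.jl narrows the parameter set to the two parameters its kernel methods actually use:

@platform feature clear
@platform feature accelerator_count
@platform feature accelerator_api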

Contributing

Contributions are very welcome, as are feature requests and suggestions.

Please open an issue if you encounter any problems.

License

PlatformAware.jl is licensed under the MIT License.


platformaware.jl's Issues

Extend/correct/update databases of processors and accelerators

  • Initially, there are databases for Intel (processors and accelerators), AMD (processors and accelerators), and NVIDIA (accelerators).
  • Contributors are invited to update databases in the following ways:
    • by inserting new accelerators and processors;
    • by providing missing information about accelerators and processors in the current databases;
    • by correcting wrong information about accelerators and processors in the current databases;
    • by inserting new databases, for different vendors.
  • To introduce new accelerators and processors, new platform types need to be created.

We are working on documentation to explain how to update databases and platform types. When this documentation is ready, we will post a comment on this issue. For now, talk to the main project maintainer for instructions.

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Macro coding style

Interesting talk at JuliaCon 2022!

From the commented code at https://github.com/platform-aware-programming/PlatformAware.jl/blob/master/src/quantifiers/atmost.jl:

abstract type AtMostInf end                     # 0

multiplier_super = "Inf"
magnitude_ = ""
for magnitude in reverse(['n', 'u', 'm', ' ', 'K', 'M', 'G', 'T', 'P', 'E'])
    for multiplier in reverse([1, 2, 4, 8, 16, 32, 64, 128, 256, 512])
        magnitude_super = multiplier == 512 ? magnitude_ : magnitude   
        code = "abstract type AtMost" * string(multiplier) * magnitude * " <: AtMost" * string(multiplier_super) * magnitude_super * " end"        
        eval(Meta.parse(code)); println(code)
        multiplier_super = multiplier
    end
    magnitude_ = magnitude
end

abstract type AtMost0 <: AtMost1n end

One can derive a proper impl in macro land

abstract type AtMostInf end                     # 0
let mul_super = "Inf" ,
    mag_ = "" ;
    for mag in reverse(['n', 'u', 'm', ' ', 'K', 'M', 'G', 'T', 'P', 'E'])
        for mul in reverse([1, 2, 4, 8, 16, 32, 64, 128, 256, 512])
            mag_super = mul==512 ? mag_ : mag
            nm1 = Symbol("AtMost" * string(mul) * mag)
            nm2 = Symbol("AtMost" * string(mul_super) * mag_super)
            @eval abstract type $nm1 <: $nm2 end
            mul_super = mul
        end
        mag_ = mag
    end
end
abstract type AtMost0 <: AtMost1n end

Proof: open a Julia session, run the previous code block, then the next one:

using Test
@test AtMost16G |> supertype == AtMost32G
@test AtMost128u |> supertype == AtMost256u
@test AtMost4P |> supertype == AtMost8P
@test AtMost256M |> supertype == AtMost512M

Obviously, the same approach applies to AtLeast.

BTW, you should not call parse in a macro but return a quote instead ("must not" would be more appropriate :) ).
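A minimal sketch of that advice (hypothetical macro, not part of PlatformAware.jl): build the expression directly and return it, instead of concatenating a string and calling eval(Meta.parse(code)):

# hypothetical helper: declares `child <: parent` without string parsing
macro subtype(child, parent)
    esc(:(abstract type $child <: $parent end))
end

abstract type Top end
@subtype Mid Top        # expands to `abstract type Mid <: Top end`
@subtype Bottom Mid

Here esc prevents macro hygiene from renaming the type names, so the types are defined in the calling module.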
