ugemm's Introduction

ugemm

Simple, minimalistic, fast GEMM library (public domain)

How to build on macOS

$ make

How to build on Linux

# cat /etc/yum.repos.d/rocm.repo 
[ROCm]
name=ROCm
#baseurl=http://repo.radeon.com/rocm/yum/2.2/
baseurl=http://repo.radeon.com/rocm/yum/4.0/
enabled=1
gpgcheck=0

# dnf install opencl-headers mesa-libOpenCL ocl-icd-devel
# dnf install rocm-clang-ocl rocm-opencl rocm-opencl-devel rocm-utils
$ gcc -O3 sgemm_ocl.c -o sgemm_ocl -lOpenCL -lm

$ make

How to use

$ ./sgemm_ocl1
AMD Radeon HD 7800 Series (TAHITI, DRM 3.35.0, 5.4.6-berry, LLVM 9.0.0) (platform 0/1, device 0/2)
Maximum memory allocation size is 2576980377 bytes
>>> Done: took 0.032 seconds per run, 62.9 GFLOPS
0.000e+00/1.235e+20=0.000e+00. 0.000e+00 at [  0,  0]  -3.9929586175e+14 vs  -3.9929586175e+14 

$ ./sgemm_ocl2
AMD Radeon HD 7800 Series (TAHITI, DRM 3.35.0, 5.4.6-berry, LLVM 9.0.0) (platform 0/1, device 0/2)
Maximum memory allocation size is 2576980377 bytes
>>> Done: took 0.016 seconds per run, 122.3 GFLOPS
0.000e+00/1.235e+20=0.000e+00. 0.000e+00 at [  0,  0]  -3.9929586175e+14 vs  -3.9929586175e+14 

$ ./sgemm_ocl3
AMD Radeon HD 7800 Series (TAHITI, DRM 3.35.0, 5.4.6-berry, LLVM 9.0.0) (platform 0/1, device 0/2)
Maximum memory allocation size is 2576980377 bytes
>>> Done: took 0.018 seconds per run, 112.6 GFLOPS
0.000e+00/1.235e+20=0.000e+00. 0.000e+00 at [  0,  0]  -3.9929586175e+14 vs  -3.9929586175e+14 

$ ./sgemm_ocl4
AMD Radeon HD 7800 Series (TAHITI, DRM 3.35.0, 5.4.6-berry, LLVM 9.0.0) (platform 0/1, device 0/2)
Maximum memory allocation size is 2576980377 bytes
>>> Done: took 0.015 seconds per run, 131.8 GFLOPS
0.000e+00/1.235e+20=0.000e+00. 0.000e+00 at [  0,  0]  -3.9929586175e+14 vs  -3.9929586175e+14 

$ ./sgemm_ocl6
AMD Radeon HD 7800 Series (TAHITI, DRM 3.35.0, 5.4.6-berry, LLVM 9.0.0) (platform 0/1, device 0/2)
Maximum memory allocation size is 2576980377 bytes
>>> Done: took 0.012 seconds per run, 163.9 GFLOPS
0.000e+00/1.463e+20=0.000e+00. 0.000e+00 at [  0,  0]  -3.6711264766e+20 vs  -3.6711264766e+20 

$ ./sgemm-fast_ocl 
AMD Radeon HD 7800 Series (TAHITI, DRM 3.35.0, 5.4.6-berry, LLVM 9.0.0) (platform 0/1, device 0/2)
Maximum memory allocation size is 2576980377 bytes
>>> Done: took 0.012 seconds per run, 162.1 GFLOPS
0.000e+00/1.463e+20=0.000e+00. 0.000e+00 at [  0,  0]  -3.6711264766e+20 vs  -3.6711264766e+20 

$ FORCE_CPU=1 ./sgemm_ocl
pthread-Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz (platform 0/2, device 0/1)
Maximum memory allocation size is 4294967296 bytes
>>> Done: took 0.108 seconds per run, 19.8 GFLOPS
0.000e+00/3.849e+21=0.000e+00. 0.000e+00 at [  0,  0]   2.3661284071e+18 vs   2.3661284071e+18 

$ ./sgemm_ocl -p 1
AMD Radeon HD 7800 Series (TAHITI, DRM 3.35.0, 5.4.6-berry, LLVM 9.0.0) (platform 1/2, device 0/2)
Maximum memory allocation size is 2576980377 bytes
>>> Done: took 0.015 seconds per run, 146.7 GFLOPS
0.000e+00/3.849e+21=0.000e+00. 0.000e+00 at [  0,  0]   2.3661284071e+18 vs   2.3661284071e+18 
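
The two numeric lines in each run can be read with a small sketch like the one below. It assumes the usual convention of counting 2*M*N*K floating-point operations for an M x K by K x N product, and it assumes the last line compares the device result against a reference result. The function names (`gemm_gflops`, `sgemm_ref`, `rel_error_sq`) are hypothetical and not taken from ugemm, the matrix sizes are not stated in the output above, and the exact error metric ugemm prints may differ.

```c
#include <stddef.h>

/* GFLOPS for C = A * B with A (m x k) and B (k x n), counting one
   multiply and one add per inner-product term (assumed convention). */
double gemm_gflops(size_t m, size_t n, size_t k, double seconds)
{
    double flops = 2.0 * (double)m * (double)n * (double)k;
    return flops / seconds / 1e9;
}

/* Naive row-major reference product: C = A * B. Hypothetical helper. */
void sgemm_ref(size_t m, size_t n, size_t k,
               const float *a, const float *b, float *c)
{
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; p++)
                acc += a[i * k + p] * b[p * n + j];
            c[i * n + j] = acc;
        }
    }
}

/* Sum of squared differences divided by the sum of squared reference
   values. Returns the raw squared difference if the reference is all
   zeros, to avoid dividing by zero. */
double rel_error_sq(const float *x, const float *ref, size_t len)
{
    double diff = 0.0, norm = 0.0;
    for (size_t i = 0; i < len; i++) {
        double d = (double)x[i] - (double)ref[i];
        diff += d * d;
        norm += (double)ref[i] * (double)ref[i];
    }
    return norm > 0.0 ? diff / norm : diff;
}
```

Under these assumptions, a result of 0 from `rel_error_sq` corresponds to the `0.000e+00` figures printed above, and `gemm_gflops` shows why halving the time per run roughly doubles the reported GFLOPS.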

ugemm's Issues

Doesn't build currently

I was planning to use your ugemm headers. To test them, I tried building your project, but the build fails. Any hints on how to fix this would be appreciated.

[linux@linux ugemm]$ make
clang -Wfloat-conversion -Ofast -march=native -funroll-loops -finline-functions -ffp-contract=fast -mf16c -ftree-vectorize `pkg-config --libs --cflags OpenCL`  -lm -Wl,-s -Wl,--gc-sections  saxpy_ocl.c   -o saxpy_ocl
saxpy_ocl.c:166:25: error: incompatible pointer to integer conversion initializing 'int' with an expression of type 'float *' [-Wint-conversion]
        { 0, sizeof(float), 0, &ALPHA, 0 },
                               ^~~~~~
saxpy_ocl.c:167:46: error: incompatible pointer to integer conversion initializing 'int' with an expression of type 'float[1048576]' [-Wint-conversion]
        { CL_MEM_READ_ONLY,  sizeof(float)*SIZE, 0, X, OCL_INPUT },
                                                    ^
saxpy_ocl.c:167:49: error: incompatible integer to pointer conversion initializing 'cl_mem' (aka 'struct _cl_mem *') with an expression of type 'int' [-Wint-conversion]
        { CL_MEM_READ_ONLY,  sizeof(float)*SIZE, 0, X, OCL_INPUT },
                                                       ^~~~~~~~~
./ocl.h:28:19: note: expanded from macro 'OCL_INPUT'
#define OCL_INPUT       2
                        ^
saxpy_ocl.c:168:46: error: incompatible pointer to integer conversion initializing 'int' with an expression of type 'float[1048576]' [-Wint-conversion]
        { CL_MEM_READ_WRITE, sizeof(float)*SIZE, 0, Y, OCL_INPUT|OCL_OUTPUT },
                                                    ^
saxpy_ocl.c:168:49: error: incompatible integer to pointer conversion initializing 'cl_mem' (aka 'struct _cl_mem *') with an expression of type 'int' [-Wint-conversion]
        { CL_MEM_READ_WRITE, sizeof(float)*SIZE, 0, Y, OCL_INPUT|OCL_OUTPUT },
                                                       ^~~~~~~~~~~~~~~~~~~~
./ocl.h:28:19: note: expanded from macro 'OCL_INPUT'
#define OCL_INPUT       2
                        ^
saxpy_ocl.c:166:25: error: initializer element is not a compile-time constant
        { 0, sizeof(float), 0, &ALPHA, 0 },
                               ^~~~~~
saxpy_ocl.c:172:4: warning: incompatible pointer types initializing 'args_t *' with an expression of type 'char[13]' [-Wincompatible-pointer-types]
        { "XaxpyFastest", 0, 1,{SIZE/WPT},{WGS}, args },
          ^~~~~~~~~~~~~~
saxpy_ocl.c:172:23: error: incompatible integer to pointer conversion initializing 'cl_kernel' (aka 'struct _cl_kernel *') with an expression of type 'int' [-Wint-conversion]
        { "XaxpyFastest", 0, 1,{SIZE/WPT},{WGS}, args },
                             ^
saxpy_ocl.c:172:25: warning: braces around scalar initializer [-Wbraced-scalar-init]
        { "XaxpyFastest", 0, 1,{SIZE/WPT},{WGS}, args },
                               ^~~~~~~~~~
saxpy_ocl.c:172:43: error: incompatible pointer to integer conversion initializing 'size_t' (aka 'unsigned long') with an expression of type 'args_t[4]' [-Wint-conversion]
        { "XaxpyFastest", 0, 1,{SIZE/WPT},{WGS}, args },
                                                 ^~~~
2 warnings and 8 errors generated.
make: *** [<builtin>: saxpy_ocl] Error 1
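
One possible reading of the log above: the positional initializers in saxpy_ocl.c no longer line up with the field order of the structs declared in ocl.h, so a host pointer lands in an `int` field and a flag lands in a `cl_mem` field. This has not been checked against the actual header. The sketch below shows how C99 designated initializers avoid that class of mismatch. The struct `arg_desc`, its fields, and the `DEMO_*` constants are hypothetical stand-ins, not the real `args_t` or `OCL_*` definitions.

```c
#include <stddef.h>

/* Hypothetical flag values, for illustration only. */
#define DEMO_READ_ONLY  1
#define DEMO_READ_WRITE 2
#define DEMO_INPUT      2
#define DEMO_OUTPUT     4

/* Hypothetical argument descriptor; NOT the real args_t from ocl.h. */
typedef struct {
    int    mem_flags;   /* buffer access flags, 0 for a plain scalar */
    size_t size;        /* size of the argument in bytes */
    void  *device_buf;  /* stand-in for a cl_mem handle, filled in later */
    void  *host_ptr;    /* host-side data */
    int    io_flags;    /* copy-in / copy-out behaviour */
} arg_desc;

static float alpha = 2.0f;
static float x[8], y[8];

/* Fields are named, so reordering or extending arg_desc does not silently
   shift values into the wrong member. Unnamed fields are zero-initialized. */
static arg_desc demo_args[] = {
    { .size = sizeof(float), .host_ptr = &alpha },
    { .mem_flags = DEMO_READ_ONLY, .size = sizeof x,
      .host_ptr = x, .io_flags = DEMO_INPUT },
    { .mem_flags = DEMO_READ_WRITE, .size = sizeof y,
      .host_ptr = y, .io_flags = DEMO_INPUT | DEMO_OUTPUT },
};
```

If the field order in ocl.h did change, either reordering the positional values in saxpy_ocl.c to match the header or switching to named fields as sketched here should clear the `-Wint-conversion` errors.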