flame / blis

BLAS-like Library Instantiation Software Framework

License: Other

Shell 1.01% C 89.29% Assembly 0.24% Makefile 2.26% Python 0.18% Fortran 5.22% MATLAB 1.19% Emacs Lisp 0.01% C++ 0.60%

blas blas-libraries blis high-performance high-performance-computing hpc linear-algebra linear-algebra-library matrix matrix-calculations matrix-functions matrix-library matrix-multiplication optimization

blis's Issues

4mh failures with icc

The following operations fail with the dunnington and sandybridge configurations (and probably haswell too, but not tested) using icc 16.0.1 and compiling with CVECOPTS = '-xSSE4.2' and '-xAVX' respectively:

cgemm4mh
chemm4mh
csymm4mh
csyrk4mh
csyr2k4mh
ctrmm34mh

multi thread crash

Hello,
It seems libblis is not thread safe. I have the following gemm invocation:

static inline void matmul(cv::Mat &c, cv::Mat &a, cv::Mat b)
{
        float   alphap = 1.0;
        //since beta is zero, we don't need to init c to zero
        float   betap = 0.0;

        cntx_t cntx;
        bli_gemm_cntx_init(&cntx);

        bli_sgemm(BLIS_NO_TRANSPOSE, BLIS_NO_TRANSPOSE, a.rows, b.cols, a.cols,
                &alphap,
                (float *)a.data, a.cols, 1,
                (float *)b.data, b.cols, 1,
                &betap,
                (float *)c.data, b.cols, 1, &cntx);
        bli_gemm_cntx_finalize(&cntx);
}

Several threads invoke matmul, and the program then crashes as shown below: the parameter p has been changed to 0 (p=0x0). If I change the program so that only one thread invokes matmul, everything works.
#0 0x0000000000545abc in bli_spackm_6xk_ref (conja=BLIS_NO_CONJUGATE, n=25, kappa=0x7fffec000dc0, a=0x7fffb8091540, inca=25, lda=1, p=0x0, ldp=6)

at frame/1m/packm/ukernels/bli_packm_cxk_ref.c:414

#1 0x00000000004ee461 in bli_spackm_cxk (conja=BLIS_NO_CONJUGATE, panel_dim=6, panel_len=25, kappa=0x7fffec000dc0, a=0x7fffb8091540, inca=25, lda=1, p=0x0, ldp=6,

cntx=0x7fffd247c190) at frame/1m/packm/bli_packm_cxk.c:216

#2 0x00000000004c4806 in bli_spackm_struc_cxk (strucc=BLIS_GENERAL, diagoffc=0, diagc=BLIS_NONUNIT_DIAG, uploc=BLIS_DENSE, conjc=BLIS_NO_CONJUGATE,

schema=BLIS_PACKED_COL_PANELS, invdiag=0, m_panel=25, n_panel=6, m_panel_max=25, n_panel_max=6, kappa=0x7fffec000dc0, c=0x7fffb8091540, rs_c=1, cs_c=25, p=0x0, 
rs_p=6, cs_p=1, is_p=1, cntx=0x7fffd247c190) at frame/1m/packm/bli_packm_struc_cxk.c:255

#3 0x00000000004bbbe6 in bli_spackm_blk_var1 (strucc=BLIS_GENERAL, diagoffc=0, diagc=BLIS_NONUNIT_DIAG, uploc=BLIS_DENSE, transc=BLIS_NO_TRANSPOSE,

schema=BLIS_PACKED_COL_PANELS, invdiag=0, revifup=0, reviflo=0, m=25, n=1936, m_max=25, n_max=1938, kappa=0x7fffec000dc0, c=0x7fffb8091540, rs_c=1, cs_c=25, p=0x0, 
rs_p=6, cs_p=1, is_p=1, pd_p=6, ps_p=150, packm_ker=0x4c46f9 <bli_spackm_struc_cxk>, cntx=0x7fffd247c190, thread=0x7fffb8007600)
at frame/1m/packm/bli_packm_blk_var1.c:668

#4 0x00000000004bb133 in bli_packm_blk_var1 (c=0x7fffd2479e50, p=0x7fffd24798e0, cntx=0x7fffd247c190, t=0x7fffb8007600) at frame/1m/packm/bli_packm_blk_var1.c:234
#5 0x00000000004aed11 in bli_packm_int (a=0x7fffd2479e50, p=0x7fffd24798e0, cntx=0x7fffd247c190, cntl=0x7fffec002c00, thread=0x7fffb8007600)

at frame/1m/packm/bli_packm_int.c:125

#6 0x00000000004b23ca in bli_gemm_blk_var1f (a=0x7fffd2479d80, b=0x7fffd2479e50, c=0x7fffd2479f20, cntx=0x7fffd247c190, cntl=0x7fffec002d00, thread=0x7fffb8007660)

at frame/3/gemm/bli_gemm_blk_var1f.c:79

#7 0x00000000004488b2 in bli_gemm_int (alpha=0x7c66a0 <BLIS_ONE>, a=0x7fffd247a160, b=0x7fffd247a230, beta=0x7c66a0 <BLIS_ONE>, c=0x7fffd247a090, cntx=0x7fffd247c190,

cntl=0x7fffec002d00, thread=0x7fffb8007660) at frame/3/gemm/bli_gemm_int.c:154

#8 0x00000000004b304b in bli_gemm_blk_var3f (a=0x7fffd247a530, b=0x7fffd247a600, c=0x7fffd247a6d0, cntx=0x7fffd247c190, cntl=0x7fffec002da0, thread=0x7fffb80c0ae0)

at frame/3/gemm/bli_gemm_blk_var3f.c:121

#9 0x00000000004488b2 in bli_gemm_int (alpha=0x7c66a0 <BLIS_ONE>, a=0x7fffd247a840, b=0x7fffd247a910, beta=0x7c66a0 <BLIS_ONE>, c=0x7fffd247a9e0, cntx=0x7fffd247c190,

cntl=0x7fffec002da0, thread=0x7fffb80c0ae0) at frame/3/gemm/bli_gemm_int.c:154

#10 0x00000000004b2b28 in bli_gemm_blk_var2f (a=0x7fffd247ace0, b=0x7fffd247adb0, c=0x7fffd247ae80, cntx=0x7fffd247c190, cntl=0x7fffec002e40, thread=0x7fffb80c0ca0)

at frame/3/gemm/bli_gemm_blk_var2f.c:123

#11 0x00000000004488b2 in bli_gemm_int (alpha=0x7fffd247bcf0, a=0x7fffd247b0c0, b=0x7fffd247b190, beta=0x7fffd247bf60, c=0x7fffd247b260, cntx=0x7fffd247c190,

cntl=0x7fffec002e40, thread=0x7fffb80c0ca0) at frame/3/gemm/bli_gemm_int.c:154

#12 0x0000000000423d60 in bli_level3_thread_decorator (n_threads=1, func=0x447f75 <bli_gemm_int>, alpha=0x7fffd247bcf0, a=0x7fffd247b0c0, b=0x7fffd247b190,

beta=0x7fffd247bf60, c=0x7fffd247b260, cntx=0x7fffd247c190, cntl=0x7fffec002e40, thread=0x7fffb80070c0) at frame/base/bli_threading.c:92

#13 0x0000000000447f5a in bli_gemm_front (alpha=0x7fffd247bcf0, a=0x7fffd247bdc0, b=0x7fffd247be90, beta=0x7fffd247bf60, c=0x7fffd247c030, cntx=0x7fffd247c190,

cntl=0x7fffec002e40) at frame/3/gemm/bli_gemm_front.c:86

#14 0x0000000000429be5 in bli_gemmnat (alpha=0x7fffd247bcf0, a=0x7fffd247bdc0, b=0x7fffd247be90, beta=0x7fffd247bf60, c=0x7fffd247c030, cntx=0x7fffd247c190)

at frame/ind/oapi/bli_l3_nat_oapi.c:80

#15 0x000000000049242b in bli_gemmind (alpha=0x7fffd247bcf0, a=0x7fffd247bdc0, b=0x7fffd247be90, beta=0x7fffd247bf60, c=0x7fffd247c030, cntx=0x7fffd247c190)

at frame/ind/oapi/bli_l3_ind_oapi.c:59

#16 0x000000000044701e in bli_gemm_ex (alpha=0x7fffd247bcf0, a=0x7fffd247bdc0, b=0x7fffd247be90, beta=0x7fffd247bf60, c=0x7fffd247c030, cntx=0x7fffd247c190)

at frame/3/bli_l3_oapi.c:74

#17 0x0000000000419e3c in bli_sgemm (transa=BLIS_NO_TRANSPOSE, transb=BLIS_NO_TRANSPOSE, m=1936, n=16, k=25, alpha=0x7fffd247c188, a=0x7fffb8091540, rs_a=25, cs_a=1,

b=0x7fffb8003790, rs_b=16, cs_b=1, beta=0x7fffd247c18c, c=0x7fffb8040e30, rs_c=16, cs_c=1, cntx=0x7fffd247c190) at frame/3/bli_l3_tapi.c:93

Sandybridge configuration incorrect for recent Mac OS X

Because Apple chose to alias gcc to clang and g++ to clang++, the current Sandybridge configuration will fail on recent Mac OS X, since Clang does not (yet) support OpenMP. The relevant line is:

blis/config/sandybridge/make_defs.mk

Line 79 in ec25807

CC := gcc

packm breaks with 1x1 micro-kernels

packm implicitly assumes that the register blocksizes are both non-unit. While this is not a bad assumption in practice, it would be nice to lift this constraint so that the right thing happens even if MR or NR (or both) happen to be 1. The problem boils down to the definition of the bli_is_row_stored_f() and bli_is_col_stored_f() macros, which only look at the row and column strides [of the packed micro-panel]. Naturally, if both are unit, then a "row-stored" mx1 micro-panel is indistinguishable from a "column-stored" 1xn micro-panel.
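
The ambiguity can be reproduced in a few lines. This is only a sketch: the two predicates below are hypothetical stand-ins that look at strides alone, the way the issue describes. They are not the actual definitions of bli_is_row_stored_f() and bli_is_col_stored_f(), and the enum names are made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical storage classes, named here only for illustration. */
typedef enum {
    STORED_GENERAL,
    STORED_BY_ROWS,
    STORED_BY_COLS,
    STORED_AMBIGUOUS
} storage_t;

/* Assumed stride-only checks: "row-stored" means unit column stride,
   "column-stored" means unit row stride. */
static bool strides_say_row_stored(long rs, long cs) { (void)rs; return cs == 1; }
static bool strides_say_col_stored(long rs, long cs) { (void)cs; return rs == 1; }

/* Classify a micro-panel from its strides alone. When both strides are
   unit (for example MR == 1 or NR == 1), both predicates hold and the
   strides cannot tell the two layouts apart. */
static storage_t classify_storage(long rs, long cs)
{
    bool row = strides_say_row_stored(rs, cs);
    bool col = strides_say_col_stored(rs, cs);

    if (row && col) return STORED_AMBIGUOUS;
    if (row)        return STORED_BY_ROWS;
    if (col)        return STORED_BY_COLS;
    return STORED_GENERAL;
}
```

With strides (6, 1) or (1, 6) the classification is unique, while (1, 1) lands in the ambiguous case, so some extra piece of information (such as the panel dimensions or the pack schema) would be needed to decide.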

How to cross compile for use on windows?

Hi, I would like to try blis. Is there a guide on how to cross compile on linux so that I can use blis on windows?

support LLVM OpenMP

The build system's assumption that OpenMP is not supported with Clang has been false for almost a year now (http://blog.llvm.org/2015/05/openmp-support_22.html). While Mac does not appear to support it in the default toolchain, I've been using Homebrew's clang-omp just fine.

I think that the BLIS build system should test for OpenMP support "the old fashioned way" (i.e. like configure does) and use OpenMP if it is available.
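
A configure-style probe could compile a tiny translation unit with the compiler's OpenMP flag (for example -fopenmp) and look at the standard _OPENMP macro, which conforming compilers define when OpenMP is enabled. The sketch below only illustrates the idea; the function names are hypothetical and nothing here is part of the BLIS build system.

```c
/* Returns the OpenMP version date (yyyymm) that the compiler advertises
   through the standard _OPENMP macro, or 0 if OpenMP was not enabled for
   this translation unit. */
static long detected_openmp_version(void)
{
#ifdef _OPENMP
    return (long)_OPENMP;
#else
    return 0L;
#endif
}

/* Exit-status style result for a build script: 0 means "OpenMP is
   available with the flags used", 1 means "fall back to no OpenMP". */
static int openmp_probe_status(void)
{
    return detected_openmp_version() > 0 ? 0 : 1;
}
```

A real check would presumably also confirm that the probe links and runs, since the compiler can accept the flag while the OpenMP runtime library is missing.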

Issue with the new blis library

I am running the libflame benchmark test routine (test_libflame.x in the test folder). The test aborts because FLA_Hemv_check(), called from FLA_Hemv_external(), reports "Detecting unequal object datatypes". The data is corrupted before this call, and the culprit could be Trsm from the blis library.
Note: With an older version of blis, the test suite works just fine.

Test parameters: single precision, row-major format. FLA_Chol_solve() corrupts the data due to the call to Trsm_external.

Support for building with CMake

Having a CMake build system available makes it a lot easier to build with a variety of compilers in a variety of environments. In particular, it makes it a lot easier to build things on Windows.
Is there interest in this?
I'm considering implementing this myself, though it'll probably take a while since I have several side projects going at the moment.

Sandybridge OpenMP Failure

Hi,

I compiled BLIS on my server (CentOS 7.1, gcc 4.8.5, 2*Xeon E5-2620v2), but it seems that BLIS only uses a single thread. I used the commands

./configure sandybridge
make -j
make install

to compile and install. I didn't change any file in config/sandybridge. I checked that

#define BLIS_ENABLE_OPENMP

is uncommented in bli_config.h, so it should be able to use OpenMP. Then I used the makefile from the wiki/BuildSystem#linking-against-blis section and added the -fopenmp flag to compile the test program. However, the program is still single-threaded. I also tried the script below to test the program, but it still failed to use OpenMP:

export OMP_NUM_THREADS=12
make -f BLIS-Makefile
./testBLIS.x

I also tried to use

#define BLIS_ENABLE_PTHREAD

instead of the OpenMP setting, but it still fails.

What should I do to use OpenMP?

Should cblas files be using same includes as Fortran blas compat files?

Compare:

blis/frame/compat/bla_gemm.c

Line 35 in 0b126de

#include "blis.h"

blis/frame/compat/cblas/src/cblas_dgemm.c

Lines 1 to 4 in 0b126de

    
#include "bli_config.h"
#include "bli_system.h"
#include "bli_type_defs.h"
#include "bli_cblas.h"

BLIS test suite uses aligned ldims for column storage cases, but not row (or general) cases

Due to the current implementation of libblis_test_mobj_create(), tests with column-stored matrix operands cause those operands to be created with aligned leading dimensions. However, when row storage is tested, matrix operands are created with leading dimensions that are NOT aligned. This is because bli_obj_create() applies alignment to the default storage case (when "0, 0" is passed in for rs, cs), which currently is column storage. However, when explicit strides are passed in, such as is necessary in order to request row storage, alignment is not applied.

Proposed solution: Add a new parameter to input.general that controls globally whether the test suite will align its operands or not. Then, update libblis_test_mobj_create() so that it keys off of this parameter and then manually aligns the strides (using the SIMD alignment value), if needed, regardless of whether row, column, or general storage is being used, and then passes those strides into bli_obj_create().
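
The manual alignment step in the proposal could look roughly like the sketch below. The helper names, the row_storage flag, and the use of a byte alignment passed in by the caller are all assumptions made for illustration; this is not the actual libblis_test_mobj_create() code, and general storage is left out.

```c
#include <stddef.h>

/* Round a leading dimension (in elements) up so that ldim * elem_size is a
   multiple of align_bytes. Assumes align_bytes is a multiple of elem_size. */
static size_t align_ldim(size_t ldim, size_t elem_size, size_t align_bytes)
{
    size_t elems_per_align = align_bytes / elem_size;

    if (elems_per_align <= 1) return ldim;
    return ((ldim + elems_per_align - 1) / elems_per_align) * elems_per_align;
}

/* Compute explicit, aligned strides for an m x n matrix. For row storage
   the leading dimension is the row stride; for column storage it is the
   column stride. The resulting (rs, cs) pair would then be passed to the
   object-creation routine instead of "0, 0". */
static void aligned_strides(size_t m, size_t n, int row_storage,
                            size_t elem_size, size_t align_bytes,
                            size_t *rs, size_t *cs)
{
    if (row_storage) {
        *rs = align_ldim(n, elem_size, align_bytes);
        *cs = 1;
    } else {
        *rs = 1;
        *cs = align_ldim(m, elem_size, align_bytes);
    }
}
```

For a 5 x 7 matrix of doubles with a 32-byte alignment, this gives strides (8, 1) for row storage and (1, 8) for column storage, so both cases end up with an aligned leading dimension.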

configure does not obviously fail if non-existent configuration is used

For example, if one runs configure knc instead of configure mic, the result is:

[dam879@stampede knc]$~/src/blis/configure -p `pwd`/install knc
configure: checking whether we need to update the version file.
configure: checking version file '/home1/02742/dam879/src/blis/version'.
configure: starting configuration of BLIS 0.2.0.
configure: manual configuration requested.
configure: configuring with 'knc' configuration sub-directory.
configure: using install prefix '/home1/02742/dam879/build/blis/knc/install'.
configure: debug symbols disabled.
configure: disabling verbose make output, enable with 'make V=1'.
configure: building BLIS as a static library.
configure: threading is disabled.
configure: the CBLAS compatibility layer is disabled.
configure: the BLAS compatibility layer is enabled.
configure: the internal integer size is automatically determined.
configure: the BLAS/CBLAS interface integer size is 32-bit.
configure: creating ./config.mk from /home1/02742/dam879/src/blis/build/config.mk.in
configure: creating ./bli_config.h from /home1/02742/dam879/src/blis/build/bli_config.h.in
configure: creating ./obj/knc
configure: creating ./obj/knc/config
configure: creating ./obj/knc/frame
configure: creating ./obj/knc/testsuite
configure: creating ./lib/knc
configure: mirroring /home1/02742/dam879/src/blis/config/knc to ./obj/knc/config
ls: cannot access /home1/02742/dam879/src/blis/config/knc: No such file or directory
configure: mirroring /home1/02742/dam879/src/blis/frame to ./obj/knc/frame
configure: creating makefile fragment in /home1/02742/dam879/src/blis/config/knc
ls: cannot access /home1/02742/dam879/src/blis/config/knc: No such file or directory
ls: cannot access /home1/02742/dam879/src/blis/config/knc: No such file or directory
/home1/02742/dam879/src/blis/build/gen-make-frags/gen-make-frag.sh: line 230: /home1/02742/dam879/src/blis/config/knc/.fragment.mk: No such file or directory
ls: cannot access /home1/02742/dam879/src/blis/config/knc: No such file or directory
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/0
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/0/copysc
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/1
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/1/kernels
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/1/packv
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/1/scalv
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/1/unpackv
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/1d
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/1f
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/1f/kernels
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/1m
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/1m/packm
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/1m/packm/ukernels
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/1m/scalm
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/1m/unpackm
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/1m/unpackm/ukernels
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/2
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/2/gemv
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/2/ger
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/2/hemv
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/2/her
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/2/her2
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/2/symv
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/2/syr
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/2/syr2
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/2/trmv
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/2/trsv
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/3
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/3/gemm
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/3/gemm/ind
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/3/hemm
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/3/her2k
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/3/herk
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/3/symm
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/3/syr2k
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/3/syrk
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/3/trmm
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/3/trmm3
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/3/trsm
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/3/ukernels
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/base
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/base/check
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/base/noopt
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/cntl
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/compat
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/compat/cblas
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/compat/cblas/f77_sub
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/compat/cblas/src
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/compat/check
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/compat/f2c
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/compat/f2c/util
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/include
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/include/level0
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/include/level0/io
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/include/level0/ri
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/include/level0/ri3
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/include/level0/rih
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/include/level0/ro
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/include/level0/rpi
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/ind
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/ind/cntx
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/ind/include
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/ind/oapi
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/ind/tapi
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/ind/ukernels
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/ind/ukernels/gemm
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/ind/ukernels/trsm
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/util
configure: creating symbolic link to Makefile.
configure: creating symbolic link to common.mk.
configure: configured to build outside of source distribution.

Without looking carefully this seems to indicate a successful configuration. Failure in this case should be quick and obvious.

use generic paths for toolchain in POWER7

[jhammond@ftlogin2 git]$ cat 0002-generic-gcc-path-instead-of-something-at-IBM-Austin.patch 0003-generic-gcc-path-instead-of-something-at-IBM-Austin.patch 
From f02aca90c2c045c3ed7573ff5bb8a82b3e45938b Mon Sep 17 00:00:00 2001
From: Jeff Hammond <[email protected]>
Date: Mon, 31 Mar 2014 21:53:56 +0000
Subject: [PATCH 2/5] generic gcc path instead of something at IBM Austin

---
 kernels/power7/3/test/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernels/power7/3/test/Makefile b/kernels/power7/3/test/Makefile
index 356cde9d..15f27b81 100644
--- a/kernels/power7/3/test/Makefile
+++ b/kernels/power7/3/test/Makefile
@@ -1,5 +1,5 @@

-CC = /opt/at6.0/bin/powerpc64-linux-gcc
+CC = gcc
 TARGET_ARCH = -m64 -mvsx

 TGTS   = exp
-- 
1.9.1

From edd5efef2508cf3623d55dd47bb92159ebb2ee34 Mon Sep 17 00:00:00 2001
From: Jeff Hammond <[email protected]>
Date: Mon, 31 Mar 2014 21:54:15 +0000
Subject: [PATCH 3/5] generic gcc path instead of something at IBM Austin

---
 config/power7/make_defs.mk | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/config/power7/make_defs.mk b/config/power7/make_defs.mk
index df3eb363..de3f05b5 100644
--- a/config/power7/make_defs.mk
+++ b/config/power7/make_defs.mk
@@ -76,7 +76,7 @@ GIT_LOG    := $(GIT) log --decorate
 #

 # --- Determine the C compiler and related flags ---
-CC             := /opt/at6.0/bin/powerpc64-linux-gcc
+CC             := gcc
 # Enable IEEE Standard 1003.1-2004 (POSIX.1d). 
 # NOTE: This is needed to enable posix_memalign().
 CPPROCFLAGS    := -D_POSIX_C_SOURCE=200112L
@@ -96,7 +96,7 @@ CFLAGS_KERNELS := $(CDBGFLAGS) $(CKOPTFLAGS) $(CVECFLAGS) $(CWARNFLAGS) $(CMISCF
 CFLAGS_NOOPT   := $(CDBGFLAGS)                            $(CWARNFLAGS) $(CMISCFLAGS) $(CPPROCFLAGS)

 # --- Determine the archiver and related flags ---
-AR             := /opt/at6.0/bin/powerpc64-linux-ar
+AR             := ar
 ARFLAGS        := cru

 # --- Determine the linker and related flags ---
-- 
1.9.1

uninitialized variable warnings in frame/util/norm1m/bli_norm1m_unb_var1.c

While perhaps innocuous, such compiler warnings are not ideal...

frame/util/norm1m/bli_norm1m_unb_var1.c: In function 'bli_znorm1m_unb_var1':
frame/util/norm1m/bli_norm1m_unb_var1.c:230: warning: 'ij0' may be used uninitialized in this function
frame/util/norm1m/bli_norm1m_unb_var1.c:230: warning: 'n_shift' may be used uninitialized in this function
frame/util/norm1m/bli_norm1m_unb_var1.c: In function 'bli_cnorm1m_unb_var1':
frame/util/norm1m/bli_norm1m_unb_var1.c:230: warning: 'ij0' may be used uninitialized in this function
frame/util/norm1m/bli_norm1m_unb_var1.c:230: warning: 'n_shift' may be used uninitialized in this function
frame/util/norm1m/bli_norm1m_unb_var1.c: In function 'bli_dnorm1m_unb_var1':
frame/util/norm1m/bli_norm1m_unb_var1.c:230: warning: 'ij0' may be used uninitialized in this function
frame/util/norm1m/bli_norm1m_unb_var1.c:230: warning: 'n_shift' may be used uninitialized in this function
frame/util/norm1m/bli_norm1m_unb_var1.c: In function 'bli_snorm1m_unb_var1':
frame/util/norm1m/bli_norm1m_unb_var1.c:230: warning: 'ij0' may be used uninitialized in this function
frame/util/norm1m/bli_norm1m_unb_var1.c:230: warning: 'n_shift' may be used uninitialized in this function

Test suite always exits with code 0

Because the test suite always exits with code 0 (success), it is hard to use it as a unit test. The test suite should return a non-zero exit code if any of the tests failed.
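
A minimal sketch of the requested behavior: count the failures while running the tests and turn that count into the process exit status. The test_fn type, the sample tests, and the helper names are hypothetical and do not reflect how the BLIS test suite is actually organized.

```c
#include <stdlib.h>

/* Hypothetical test signature: returns nonzero when the test passes. */
typedef int (*test_fn)(void);

/* Placeholder tests, present only so the sketch can be exercised. */
static int sample_pass(void) { return 1; }
static int sample_fail(void) { return 0; }

/* Run every test and return how many of them failed. */
static int count_failures(const test_fn *tests, size_t n_tests)
{
    int n_failed = 0;

    for (size_t i = 0; i < n_tests; ++i)
        if (!tests[i]()) ++n_failed;

    return n_failed;
}

/* Map the failure count to an exit status; a driver's main() would end
   with: return exit_status_for(count_failures(tests, n_tests)); */
static int exit_status_for(int n_failed)
{
    return n_failed == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

With this shape, a shell or CI script can rely on the exit status alone instead of parsing the test suite's output.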

Fused gemvs and trmvs?

BLIS often refers to fused level 1 BLAS-like operations, but I have not seen any fused level 2 operations (e.g., a single-sweep y := A x and u := A^T v). Is there a plan to support such kernels?
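
To make the request concrete, here is an unoptimized reference sketch of the fused operation, assuming a row-major m x n matrix of doubles and output vectors that do not alias the inputs. The function name is hypothetical; this is not an existing BLIS kernel.

```c
#include <stddef.h>

/* Single sweep over A that computes both y := A x and u := A^T v.
   A is m x n, row-major, with leading dimension lda (in elements).
   x and u have length n; v and y have length m. */
static void fused_gemv_pair(size_t m, size_t n,
                            const double *a, size_t lda,
                            const double *x, double *y,
                            const double *v, double *u)
{
    for (size_t j = 0; j < n; ++j)
        u[j] = 0.0;

    for (size_t i = 0; i < m; ++i) {
        const double *row = a + i * lda;
        const double  vi  = v[i];
        double        yi  = 0.0;

        /* Each element of A is loaded once and used for both products. */
        for (size_t j = 0; j < n; ++j) {
            yi   += row[j] * x[j];
            u[j] += row[j] * vi;
        }
        y[i] = yi;
    }
}
```

The point of fusing is that A is read from memory only once, which matters because level 2 operations are typically memory-bound.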

Assignment between restrict pointers "alpha" and "alpha_in" is not allowed. Only outer-to-inner scope assignments between restrict pointers are allowed.

XLC is the only compiler I've ever seen that gives these warnings, but I think they are formally legit. On the other hand, they may be false positives in practice.
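
For context, the sketch below is a hypothetical reduction of the pattern, not the BLIS code that triggers the warning. As far as I can tell, the restrict rules in C99 (section 6.7.3.1) treat a restrict pointer initialized from another restrict pointer as well defined when the new pointer lives in a nested block, which matches the "outer-to-inner" wording in the diagnostic.

```c
/* Pattern of the kind the warning appears to target (shown as a comment
   only): both restrict pointers belong to the same block, the function
   body.

       float scale(const float *restrict alpha_in, float x)
       {
           const float *restrict alpha = alpha_in;
           return *alpha * x;
       }
*/

/* Possible rewrite: declare the second restrict pointer in a nested block,
   so the initialization goes from an outer scope to an inner one. */
static float scale_by_alpha(const float *restrict alpha_in, float x)
{
    float result;

    {
        const float *restrict alpha = alpha_in;
        result = (*alpha) * x;
    }

    return result;
}
```

Dropping the restrict qualifier from the local copy would be another way to silence the warning, at the possible cost of some optimization opportunities.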

[jhammond@vestalac1 blis]$ ./configure -p $HOME/BLIS/bgq bgq  && make -j32 && make install
configure: checking whether we need to update the version file.
configure: checking version file './version'.
configure: found '.git' directory; assuming git clone.
configure: executing git describe --tags.
configure: got back 0.1.1-14-gd531a24.
configure: truncating to 0.1.1-14.
configure: updating version file './version'.
configure: starting configuration of BLIS 0.1.1-14.
configure: configuring with 'bgq' configuration sub-directory.
configure: detected -p option; using install prefix '/home/jhammond/BLIS/bgq'.
configure: creating ./config.mk from ./build/config.mk.in
configure: creating ./obj/bgq
configure: creating ./obj/bgq/config
configure: creating ./obj/bgq/frame
configure: creating ./obj/bgq/testsuite
configure: creating ./lib/bgq
configure: mirroring ./config/bgq to ./obj/bgq/config
configure: mirroring ./frame to ./obj/bgq/frame
configure: creating makefile fragment in ./config/bgq
configure: creating makefile fragment in ./config/bgq/kernels
configure: creating makefile fragment in ./config/bgq/kernels/1
configure: creating makefile fragment in ./config/bgq/kernels/1f
configure: creating makefile fragment in ./config/bgq/kernels/3
configure: creating makefile fragment in ./frame
configure: creating makefile fragment in ./frame/0
configure: creating makefile fragment in ./frame/0/absqsc
configure: creating makefile fragment in ./frame/0/addsc
configure: creating makefile fragment in ./frame/0/copysc
configure: creating makefile fragment in ./frame/0/divsc
configure: creating makefile fragment in ./frame/0/getsc
configure: creating makefile fragment in ./frame/0/mulsc
configure: creating makefile fragment in ./frame/0/normfsc
configure: creating makefile fragment in ./frame/0/setsc
configure: creating makefile fragment in ./frame/0/sqrtsc
configure: creating makefile fragment in ./frame/0/subsc
configure: creating makefile fragment in ./frame/0/unzipsc
configure: creating makefile fragment in ./frame/0/zipsc
configure: creating makefile fragment in ./frame/1
configure: creating makefile fragment in ./frame/1/addv
configure: creating makefile fragment in ./frame/1/axpyv
configure: creating makefile fragment in ./frame/1/copyv
configure: creating makefile fragment in ./frame/1/dotv
configure: creating makefile fragment in ./frame/1/dotxv
configure: creating makefile fragment in ./frame/1/invertv
configure: creating makefile fragment in ./frame/1/packv
configure: creating makefile fragment in ./frame/1/scal2v
configure: creating makefile fragment in ./frame/1/scalv
configure: creating makefile fragment in ./frame/1/setv
configure: creating makefile fragment in ./frame/1/subv
configure: creating makefile fragment in ./frame/1/swapv
configure: creating makefile fragment in ./frame/1/unpackv
configure: creating makefile fragment in ./frame/1d
configure: creating makefile fragment in ./frame/1d/addd
configure: creating makefile fragment in ./frame/1d/axpyd
configure: creating makefile fragment in ./frame/1d/copyd
configure: creating makefile fragment in ./frame/1d/invertd
configure: creating makefile fragment in ./frame/1d/scal2d
configure: creating makefile fragment in ./frame/1d/scald
configure: creating makefile fragment in ./frame/1d/setd
configure: creating makefile fragment in ./frame/1d/subd
configure: creating makefile fragment in ./frame/1f
configure: creating makefile fragment in ./frame/1f/axpy2v
configure: creating makefile fragment in ./frame/1f/axpyf
configure: creating makefile fragment in ./frame/1f/dotaxpyv
configure: creating makefile fragment in ./frame/1f/dotxaxpyf
configure: creating makefile fragment in ./frame/1f/dotxf
configure: creating makefile fragment in ./frame/1m
configure: creating makefile fragment in ./frame/1m/addm
configure: creating makefile fragment in ./frame/1m/axpym
configure: creating makefile fragment in ./frame/1m/copym
configure: creating makefile fragment in ./frame/1m/packm
configure: creating makefile fragment in ./frame/1m/packm/ukernels
configure: creating makefile fragment in ./frame/1m/scal2m
configure: creating makefile fragment in ./frame/1m/scalm
configure: creating makefile fragment in ./frame/1m/setm
configure: creating makefile fragment in ./frame/1m/subm
configure: creating makefile fragment in ./frame/1m/unpackm
configure: creating makefile fragment in ./frame/1m/unpackm/ukernels
configure: creating makefile fragment in ./frame/2
configure: creating makefile fragment in ./frame/2/gemv
configure: creating makefile fragment in ./frame/2/ger
configure: creating makefile fragment in ./frame/2/hemv
configure: creating makefile fragment in ./frame/2/her
configure: creating makefile fragment in ./frame/2/her2
configure: creating makefile fragment in ./frame/2/symv
configure: creating makefile fragment in ./frame/2/syr
configure: creating makefile fragment in ./frame/2/syr2
configure: creating makefile fragment in ./frame/2/trmv
configure: creating makefile fragment in ./frame/2/trsv
configure: creating makefile fragment in ./frame/3
configure: creating makefile fragment in ./frame/3/gemm
configure: creating makefile fragment in ./frame/3/gemm/3m
configure: creating makefile fragment in ./frame/3/gemm/3m/ukernels
configure: creating makefile fragment in ./frame/3/gemm/4m
configure: creating makefile fragment in ./frame/3/gemm/4m/ukernels
configure: creating makefile fragment in ./frame/3/gemm/ukernels
configure: creating makefile fragment in ./frame/3/hemm
configure: creating makefile fragment in ./frame/3/hemm/3m
configure: creating makefile fragment in ./frame/3/hemm/4m
configure: creating makefile fragment in ./frame/3/her2k
configure: creating makefile fragment in ./frame/3/her2k/3m
configure: creating makefile fragment in ./frame/3/her2k/4m
configure: creating makefile fragment in ./frame/3/herk
configure: creating makefile fragment in ./frame/3/herk/3m
configure: creating makefile fragment in ./frame/3/herk/4m
configure: creating makefile fragment in ./frame/3/symm
configure: creating makefile fragment in ./frame/3/symm/3m
configure: creating makefile fragment in ./frame/3/symm/4m
configure: creating makefile fragment in ./frame/3/syr2k
configure: creating makefile fragment in ./frame/3/syr2k/3m
configure: creating makefile fragment in ./frame/3/syr2k/4m
configure: creating makefile fragment in ./frame/3/syrk
configure: creating makefile fragment in ./frame/3/syrk/3m
configure: creating makefile fragment in ./frame/3/syrk/4m
configure: creating makefile fragment in ./frame/3/trmm
configure: creating makefile fragment in ./frame/3/trmm/3m
configure: creating makefile fragment in ./frame/3/trmm/4m
configure: creating makefile fragment in ./frame/3/trmm3
configure: creating makefile fragment in ./frame/3/trmm3/3m
configure: creating makefile fragment in ./frame/3/trmm3/4m
configure: creating makefile fragment in ./frame/3/trsm
configure: creating makefile fragment in ./frame/3/trsm/3m
configure: creating makefile fragment in ./frame/3/trsm/3m/ukernels
configure: creating makefile fragment in ./frame/3/trsm/4m
configure: creating makefile fragment in ./frame/3/trsm/4m/ukernels
configure: creating makefile fragment in ./frame/3/trsm/ukernels
configure: creating makefile fragment in ./frame/base
configure: creating makefile fragment in ./frame/base/check
configure: creating makefile fragment in ./frame/base/noopt
configure: creating makefile fragment in ./frame/cntl
configure: creating makefile fragment in ./frame/compat
configure: creating makefile fragment in ./frame/compat/check
configure: creating makefile fragment in ./frame/compat/f2c
configure: creating makefile fragment in ./frame/compat/f2c/util
configure: creating makefile fragment in ./frame/include
configure: creating makefile fragment in ./frame/include/level0
configure: creating makefile fragment in ./frame/include/level0/ri
configure: creating makefile fragment in ./frame/include/level0/ri3
configure: creating makefile fragment in ./frame/util
configure: creating makefile fragment in ./frame/util/amaxv
configure: creating makefile fragment in ./frame/util/asumv
configure: creating makefile fragment in ./frame/util/mkherm
configure: creating makefile fragment in ./frame/util/mksymm
configure: creating makefile fragment in ./frame/util/mktrim
configure: creating makefile fragment in ./frame/util/norm1m
configure: creating makefile fragment in ./frame/util/norm1v
configure: creating makefile fragment in ./frame/util/normfm
configure: creating makefile fragment in ./frame/util/normfv
configure: creating makefile fragment in ./frame/util/normim
configure: creating makefile fragment in ./frame/util/normiv
configure: creating makefile fragment in ./frame/util/printm
configure: creating makefile fragment in ./frame/util/printv
configure: creating makefile fragment in ./frame/util/randm
configure: creating makefile fragment in ./frame/util/randv
configure: creating makefile fragment in ./frame/util/sumsqv
configure: configured to build within top-level directory of source distribution.
Compiling config/bgq/kernels/1/bli_axpyv_opt_var1.c (NOTE: using flags for kernels)
Compiling frame/0/unzipsc/bli_unzipsc.c
Compiling frame/0/unzipsc/bli_unzipsc_check.c
Compiling frame/0/unzipsc/bli_unzipsc_unb_var1.c
Compiling frame/0/zipsc/bli_zipsc.c
Compiling frame/0/zipsc/bli_zipsc_check.c
Compiling frame/0/zipsc/bli_zipsc_unb_var1.c
Compiling frame/1/addv/bli_addv.c
Compiling frame/1/addv/bli_addv_check.c
Compiling frame/1/addv/bli_addv_kernel.c
Compiling frame/1/addv/bli_addv_ref.c
Compiling frame/1/axpyv/bli_axpyv.c
Compiling frame/1/axpyv/bli_axpyv_check.c
Compiling frame/1/axpyv/bli_axpyv_kernel.c
Compiling frame/1/axpyv/bli_axpyv_ref.c
Compiling frame/1/copyv/bli_copyv.c
Compiling frame/1/copyv/bli_copyv_check.c
Compiling frame/1/copyv/bli_copyv_kernel.c
Compiling frame/1/copyv/bli_copyv_ref.c
Compiling frame/1/dotv/bli_dotv.c
Compiling frame/1/dotv/bli_dotv_check.c
Compiling frame/1/dotv/bli_dotv_kernel.c
Compiling frame/1/dotv/bli_dotv_ref.c
Compiling frame/1/dotxv/bli_dotxv.c
Compiling frame/1/dotxv/bli_dotxv_check.c
Compiling frame/1/dotxv/bli_dotxv_kernel.c
Compiling frame/1/dotxv/bli_dotxv_ref.c
Compiling frame/1/invertv/bli_invertv.c
Compiling frame/1/invertv/bli_invertv_check.c
Compiling frame/1/invertv/bli_invertv_kernel.c
Compiling frame/1/invertv/bli_invertv_ref.c
Compiling frame/1/packv/bli_packv.c
Compiling frame/1/packv/bli_packv_check.c
Compiling frame/1/packv/bli_packv_cntl.c
Compiling frame/1/packv/bli_packv_init.c
"config/bgq/kernels/1/bli_axpyv_opt_var1.c", line 45.33: 1506-1418 (E) Assignment between restrict pointers "alpha" and "alpha_in" is not allowed. Only outer-to-inner scope assignments between restrict pointers are allowed.
"config/bgq/kernels/1/bli_axpyv_opt_var1.c", line 46.29: 1506-1418 (E) Assignment between restrict pointers "x" and "x_in" is not allowed. Only outer-to-inner scope assignments between restrict pointers are allowed.
"config/bgq/kernels/1/bli_axpyv_opt_var1.c", line 47.29: 1506-1418 (E) Assignment between restrict pointers "y" and "y_in" is not allowed. Only outer-to-inner scope assignments between restrict pointers are allowed.
"config/bgq/kernels/1/bli_axpyv_opt_var1.c", line 68.28: 1506-754 (S) The parameter type is not valid for a function of this linkage type.
make: *** [obj/bgq/config/kernels/1/bli_axpyv_opt_var1.o] Error 1
make: *** Waiting for unfinished jobs....

Add configure option to enable profiling at compile time

For GNU gcc, this takes the form of the -pg option. Should be disabled by default.

Intel Skylake is not autodetected.

On my i5-6500, the configuration is auto-detected as 'reference'. Since there is no skylake ukernel, the config should be set to haswell.

problem using only cblas interface

If I only want to use the cblas interface so I only include cblas.h, I get the error: ‘f77_int’ was not declared.

Kernel symlinks cause build failure in msys2

Another one for you, following up on #9

When I try to build the Sandy Bridge configuration on Windows in MSYS2 with MinGW compiler (on an i7-2630QM), I get a failure to link the test executable:

Archiving lib/sandybridge/libblis.a
Linking test_libblis.x against './lib/sandybridge/libblis.a -lm'
./lib/sandybridge/libblis.a(bli_gemm_cntl.o):bli_gemm_cntl.c:(.text+0x1bc): undefined reference to `bli_dgemm_opt_8x4_ref_u4_nodupl_avx1'
./lib/sandybridge/libblis.a(bli_gemm_ukernel.o):bli_gemm_ukernel.c:(.text+0x11): undefined reference to `bli_dgemm_opt_8x4_ref_u4_nodupl_avx1'
./lib/sandybridge/libblis.a(bli_gemmtrsm_l_ukr_ref.o):bli_gemmtrsm_l_ukr_ref.c:(.text+0x10d): undefined reference to `bli_dgemm_opt_8x4_ref_u4_nodupl_avx1'
./lib/sandybridge/libblis.a(bli_gemmtrsm_u_ukr_ref.o):bli_gemmtrsm_u_ukr_ref.c:(.text+0x10d): undefined reference to `bli_dgemm_opt_8x4_ref_u4_nodupl_avx1'
./lib/sandybridge/libblis.a(bli_gemm4m_ukr_ref.o):bli_gemm4m_ukr_ref.c:(.text+0xe94): undefined reference to `bli_dgemm_opt_8x4_ref_u4_nodupl_avx1'
./lib/sandybridge/libblis.a(bli_gemm4m_ukr_ref.o):bli_gemm4m_ukr_ref.c:(.text+0xedb): more undefined references to `bli_dgemm_opt_8x4_ref_u4_nodupl_avx1' follow
d:/code/mingw-builds/x64-4.8.1-win32-seh-rev5/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/4.8.1/../../../../x86_64-w64-mingw32/bin/ld.exe: ./lib/sandybridge/libblis.a(bli_gemm4m_ukr_ref.o): bad reloc address 0x0 in section `.pdata'
collect2.exe: error: ld returned 1 exit status
Makefile:531: recipe for target 'test_libblis.x' failed
make: *** [test_libblis.x] Error 1

If I try in Cygwin, setting CC := x86_64-w64-mingw32-gcc and AR := x86_64-w64-mingw32-ar in config/sandybridge/make_defs.mk to use the MinGW cross-compiler, then the executable links correctly but segfaults when running the tests. The backtrace is more interesting if I set BLIS_SIMD_ALIGN_SIZE to 1 in config/sandybridge/config.h, since my patch in #9 didn't completely fix the alignment problems. Backtrace with alignment=1 (also uncommented CDBGFLAGS := -g to get debug info) posted here https://gist.github.com/tkelman/25d290b131c0a1205b27. Everything passes until blis_dgemm_nn_ccc. The same bli_dgemm_opt_8x4_ref_u4_nodupl_avx1 that was an undefined reference in MSYS2 is causing the segfault in Cygwin-to-MinGW cross-compile.

Undefined references to ``GOMP_parallel``

When I try to test for BLIS's support for dgemm_, I see the following link error, which does not seem to be covered by your build system wiki:

"""
/usr/bin/gcc-4.9 -DCHECK_FUNCTION_EXISTS=dgemm_ CMakeFiles/cmTryCompileExec3961259244.dir/CheckFunctionExists.c.o -o cmTryCompileExec3961259244 -rdynamic /home/poulson/Install/lib/libblis.a -lpthread -lm
/home/poulson/Install/lib/libblis.a(bli_init.o): In function `bli_init':
bli_init.c:(.text+0xb4): undefined reference to `GOMP_critical_name_start'
bli_init.c:(.text+0xc6): undefined reference to `GOMP_critical_name_end'
/home/poulson/Install/lib/libblis.a(bli_init.o): In function `bli_finalize':
bli_init.c:(.text+0x1d0): undefined reference to `GOMP_critical_name_start'
bli_init.c:(.text+0x1e2): undefined reference to `GOMP_critical_name_end'
/home/poulson/Install/lib/libblis.a(bli_mem.o): In function `bli_mem_acquire_m':
bli_mem.c:(.text+0x77): undefined reference to `GOMP_critical_name_start'
bli_mem.c:(.text+0x8f): undefined reference to `GOMP_critical_name_end'
/home/poulson/Install/lib/libblis.a(bli_mem.o): In function `bli_mem_release':
bli_mem.c:(.text+0xfa): undefined reference to `GOMP_critical_name_start'
bli_mem.c:(.text+0x112): undefined reference to `GOMP_critical_name_end'
/home/poulson/Install/lib/libblis.a(bli_threading_omp.o): In function `bli_level3_thread_decorator._omp_fn.0':
bli_threading_omp.c:(.text+0x5): undefined reference to `omp_get_thread_num'
/home/poulson/Install/lib/libblis.a(bli_threading_omp.o): In function `bli_level3_thread_decorator':
bli_threading_omp.c:(.text+0x99): undefined reference to `GOMP_parallel'
collect2: error: ld returned 1 exit status
"""

pthreads does not compile on gcc/linux

Commands:

 ./configure -t pthreads sandybridge
make

First error is:

In file included from ./frame/base/bli_threading.h:88:0,
                 from ./frame/include/blis.h:73,
                 from config/sandybridge/kernels/3/bli_gemm_asm_d8x4.c:38:
./frame/base/bli_threading_pthreads.h:43:13: error: conflicting types for ‘pthread_barrierattr_t’
 typedef int pthread_barrierattr_t;
             ^
In file included from /usr/include/pthread.h:26:0,
                 from ./frame/base/bli_threading_pthreads.h:40,
                 from ./frame/base/bli_threading.h:88,
                 from ./frame/include/blis.h:73,
                 from config/sandybridge/kernels/3/bli_gemm_asm_d8x4.c:38:
/usr/include/x86_64-linux-gnu/bits/pthreadtypes.h:249:3: note: previous declaration of ‘pthread_barrierattr_t’ was here
 } pthread_barrierattr_t;
   ^

versions:

gcc version 5.3.1 20160409 (Debian 5.3.1-14)
blis dd62080

Issue with pthreads and memory broker abstraction

When compiling with pthreads now, I get a bunch of warnings and errors like the below. If I back up to the commit ce59f81, these go away.

In file included from ./frame/thread/bli_mutex.h:43:0,
from ./frame/thread/bli_thread.h:56,
from ./frame/include/blis.h:74,
from frame/base/bli_membrk.c:36:
frame/base/bli_membrk.c: In function ‘bli_membrk_init’:
./frame/base/bli_membrk.h:56:2: warning: passing argument 1 of ‘pthread_mutex_init’ from incompatible pointer type
( &( (membrk_p)->mutex ) )
^
./frame/thread/bli_mutex_pthreads.h:53:22: note: in definition of macro ‘bli_mutex_init’
pthread_mutex_init( mtx_p );
^
frame/base/bli_membrk.c:44:18: note: in expansion of macro ‘bli_membrk_mutex’
bli_mutex_init( bli_membrk_mutex( membrk ) );
^
In file included from ./frame/thread/bli_mutex_pthreads.h:42:0,
from ./frame/thread/bli_mutex.h:43,
from ./frame/thread/bli_thread.h:56,
from ./frame/include/blis.h:74,
from frame/base/bli_membrk.c:36:
/usr/include/pthread.h:723:12: note: expected ‘union pthread_mutex_t *’ but argument is of type ‘struct mtx_s *’
extern int pthread_mutex_init (pthread_mutex_t *__mutex,
^
In file included from ./frame/thread/bli_mutex.h:43:0,
from ./frame/thread/bli_thread.h:56,
from ./frame/include/blis.h:74,
from frame/base/bli_membrk.c:36:
./frame/thread/bli_mutex_pthreads.h:53:2: error: too few arguments to function ‘pthread_mutex_init’
pthread_mutex_init( mtx_p );
^
frame/base/bli_membrk.c:44:2: note: in expansion of macro ‘bli_mutex_init’
bli_mutex_init( bli_membrk_mutex( membrk ) );
^
In file included from ./frame/thread/bli_mutex_pthreads.h:42:0,
from ./frame/thread/bli_mutex.h:43,
from ./frame/thread/bli_thread.h:56,
from ./frame/include/blis.h:74,
from frame/base/bli_membrk.c:36:
/usr/include/pthread.h:723:12: note: declared here
extern int pthread_mutex_init (pthread_mutex_t *__mutex,
^
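
For reference, the "too few arguments" error above comes from calling pthread_mutex_init with one argument; POSIX declares it with two (the mutex and an attribute pointer, where NULL selects the defaults). The sketch below shows one way a wrapper could look. The mtx_t layout and the mtx_* helper names are assumptions made for illustration, not the actual BLIS definitions.

```c
#include <pthread.h>

/* Hypothetical wrapper type; the real BLIS mutex type may be laid out differently. */
typedef struct mtx_s
{
    pthread_mutex_t mutex;
} mtx_t;

/* Pass the embedded pthread_mutex_t (not the wrapper struct) and a NULL
   attribute pointer, which requests the default mutex attributes. */
static inline int mtx_init( mtx_t* m )     { return pthread_mutex_init( &m->mutex, NULL ); }
static inline int mtx_lock( mtx_t* m )     { return pthread_mutex_lock( &m->mutex ); }
static inline int mtx_unlock( mtx_t* m )   { return pthread_mutex_unlock( &m->mutex ); }
static inline int mtx_finalize( mtx_t* m ) { return pthread_mutex_destroy( &m->mutex ); }
```

Passing the inner member also addresses the "incompatible pointer type" warning, since pthread_mutex_init expects a pthread_mutex_t pointer rather than a pointer to the enclosing struct.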

Problems about specifying compiler with configure

Hi, I'm trying to compile BLIS on my server, which has a Xeon E5-2620v2 CPU, and I want to use my ICC 2016. I used the following command to configure:
./configure CC=icc sandybridge
But when I ran make, it failed with:
config/sandybridge/make_defs.mk:84: *** gcc is required for this configuration.. Stop.
It seems that the file in config/sandybridge did not change.
Is there anything wrong with my command? Please help, many thanks!

bli_obj_create_with_attached_buffer mishandling empty matrices

The BLAS interface does not appear to work for dtrsm when one of the input matrices has a zero dimension. It appears to be the result of bli_trsm calling bli_obj_create_with_attached_buffer on the input matrices, which leads to a check that improperly aborts if the corresponding buffer is null, even if the matrix dimensions were zero. I would assume that this bug affects a large number of routines.
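
A minimal sketch of the check I would expect, assuming a standalone helper (the name buffer_is_invalid is hypothetical and this is not the BLIS check itself): a NULL buffer only counts as an error when the matrix actually has elements.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: a NULL buffer is only an error when the matrix
   has at least one element to read or write. */
static bool buffer_is_invalid( const void* buf, long m, long n )
{
    bool is_empty = ( m == 0 || n == 0 );
    return buf == NULL && !is_empty;
}
```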

BLAS-like kernel for symmetric updates in LDL factorization without pivoting

There is a long history of proposals to include simple modifications of ?syrk and ?herk to support C += alpha A (D A)^T, where D is diagonal. This kernel turns out to be important for Interior Point Methods, which often make use of LDL factorizations (without pivoting) of modified saddle-point systems.

Does such a routine already exist in BLIS? If not, is there a good place to start for adding support? Or a preferred name/convention?
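
To make the request concrete, here is an unoptimized reference sketch. It assumes the update is read as C += alpha A D A^T with A of size m x k, D a k x k diagonal stored as a vector, column-major storage, and only the lower triangle of C updated (as ?syrk does with uplo = lower). The name syrk_diag_ref is made up for this example.

```c
#include <stddef.h>

/* Reference update C := C + alpha * A * D * A^T (lower triangle only).
   a: m x k, column-major, leading dimension lda
   d: the k diagonal entries of D
   c: m x m, column-major, leading dimension ldc */
static void syrk_diag_ref( size_t m, size_t k, double alpha,
                           const double* a, size_t lda,
                           const double* d,
                           double* c, size_t ldc )
{
    for ( size_t j = 0; j < m; ++j )
        for ( size_t p = 0; p < k; ++p )
        {
            /* Scale column p's contribution once per (j,p) pair. */
            double t = alpha * d[ p ] * a[ j + p*lda ];
            for ( size_t i = j; i < m; ++i )
                c[ i + j*ldc ] += a[ i + p*lda ] * t;
        }
}
```

The only difference from a plain syrk loop nest is the extra d[p] factor, which is why a small modification of the existing ?syrk/?herk code seems plausible.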

Broken link in README.md

The "new BLAS-like API" link in the README is empty and leads to a 404.

Should it instead point to https://github.com/flame/blis/wiki/BLISAPIQuickReference?

reg-BLAS.R test fails

Not being a programmer, I'd like to ask: what might the problem be?

Type 'q()' to quit R.

## PR#4582 %*% with NAs
stopifnot(is.na(NA %*% 0), is.na(0 %*% NA))
## depended on the BLAS in use.

## found from fallback test in slam 0.1-15
## most likely indicates an inaedquate BLAS.
x <- matrix(c(1, 0, NA, 1), 2, 2)
y <- matrix(c(1, 0, 0, 2, 1, 0), 3, 2)
(z <- tcrossprod(x, y))
[,1] [,2] [,3]
[1,] NA NA 0
[2,] 2 1 0
stopifnot(identical(z, x %*% t(y)))
Error: identical(z, x %*% t(y)) is not TRUE

License/copyright headers in need of update

The license/copyright headers at the top of each source file need to be updated. The copyright year needs to be changed to "2016". (I never got around to updating it in 2015 and so it still reads "2014".)

Unfortunately, this change is going to touch virtually every file in the repository. If you have any objections, or this will disrupt your work, please speak up.

Constants in bli_const.c "undefined" on OSX

It seems that the linker on OSX is not able to find any of the constants defined in bli_const.c, such as BLIS_ONE etc. The problem, according to this page is that undefined constants end up as "common" symbols which are ignored by the OSX linker. The two available solutions seem to be:

Initialize the variables. Since these are obj_t's, default-initialization with ...={} should be OK (and they are initialized for real later).
Compile with -fno-common. This would have to be added to each configuration or in common.mk.

I am not sure when this problem first appeared, since I thought I had successfully compiled after the "big commit", but I am seeing it in my branch based off of cbcd0b7. Using gcc-5.3.0 from Homebrew instead of the system "gcc" may also be a contributing factor.
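
The first option can be sketched as follows. The obj_t here is a simplified stand-in for the real BLIS type, and { 0 } is used instead of {} because an empty initializer list is not standard in C99.

```c
#include <stddef.h>

/* Simplified stand-in for BLIS's obj_t, for illustration only. */
typedef struct
{
    void* buffer;
    long  m, n;
} obj_t;

/* A tentative definition such as "obj_t BLIS_ONE;" may be emitted as a
   "common" symbol. Giving it an explicit initializer makes it an ordinary
   definition, so the linker can find it in a static archive. */
obj_t BLIS_ONE = { 0 };
```

The real values would still be filled in later during library initialization, as noted above.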

Configuring maximum number of threads at runtime

A user of Elemental has been running into strange performance issues when building on top of BLIS. Setting the environment variable OMP_NUM_THREADS to one does not appear to disable threading, so performance still degrades sharply when the number of independent uses of BLIS times the number of configured threads is larger than the number of cores on the machine.

While they have configured BLIS to use OpenMP with 16 threads, it is often preferred when using the library from within an MPI environment to be able to disable threading at runtime (or at least, decrease the number of threads).

What environment variables should be set to one to have the same effect as exporting OMP_NUM_THREADS=1 for other BLAS libraries? I would humbly suggest either adding support for OMP_NUM_THREADS or adding the full list of variables (including BLIS_IR_NT) to the wiki that need to be configured to have the same effect.

BLIS Test Failure in BlueGene/Q

Hi,

I compiled BLIS 0.1.8 on our BG/Q system without any modification, but I got the following failures when I ran the test. The complete test log is given here: https://goo.gl/vGXz6S (libblis.a binary: https://goo.gl/8lQ999).

Note: The version of our software stack is V1R2M0, and the job was submitted interactively through the SLURM scheduler.

e.g.
...
blis_cgemm4mh_ct_ccc 200 200 200 0.582 1.28e-04 FAILURE
blis_cgemm4mh_ch_ccc 100 100 100 0.515 3.80e-04 FAILURE
blis_cgemm4mh_ch_ccc 200 200 200 0.582 1.30e-04 FAILURE
blis_cgemm4mh_tn_ccc 100 100 100 0.528 3.60e-04 FAILURE
...
blis_zgemm4mh_hc_ccc 200 200 200 2.446 1.40e-01 FAILURE
blis_zgemm4mh_hc_ccc 300 300 300 2.798 1.51e-01 FAILURE
blis_zgemm4mh_hc_ccc 400 400 400 3.141 1.72e-01 FAILURE
blis_zgemm4mh_ht_ccc 100 100 100 1.449 1.43e-01 FAILURE
blis_zgemm4mh_ht_ccc 200 200 200 2.308 1.56e-01 FAILURE
blis_zgemm4mh_ht_ccc 300 300 300 2.682 1.63e-01 FAILURE
...
blis_zsymm4mh_ruch_ccc 100 100 1.449 3.44e-01 FAILURE
blis_zsymm4mh_ruch_ccc 200 200 2.332 3.72e-01 FAILURE
blis_zsymm4mh_ruch_ccc 300 300 2.718 3.99e-01 FAILURE
blis_zsymm4mh_ruch_ccc 400 400 3.051 3.70e-01 FAILURE
blis_csyrk4mh_ln_cc 100 100 0.421 3.72e-04 FAILURE
blis_csyrk4mh_ln_cc 200 200 0.528 1.26e-04 FAILURE
blis_csyrk4mh_lc_cc 100 100 0.421 3.38e-04 FAILURE
blis_csyrk4mh_lc_cc 200 200 0.527 1.25e-04 FAILURE
blis_csyrk4mh_lt_cc 100 100 0.439 3.34e-04 FAILURE
...

Please advise.

Thanks!

Rgds,
Dominic Chien

Document level-0 operations in BLASAPIQuickReference wiki

To clarify, I'm referring to the level-0 object-based APIs in frame/0, not the level-0 scalar macros in frame/include/level0.

sandybridge: Does not compile with -dnoopt

The sandybridge configuration does not compile with -dnoopt because it requires AVX.

The following would fix it:

COPTFLAGS      := -O0 -march=native

Sample error, using gcc (gcc version 5.3.1 20160316 (Debian 5.3.1-12))

config/sandybridge/kernels/3/bli_gemm_int_d8x4.c: In function ‘bli_dgemm_int_8x4’:
config/sandybridge/kernels/3/bli_gemm_int_d8x4.c:111:10: warning: AVX vector return without AVX enabled changes the ABI [-Wpsabi]
  va0_3b0 = _mm256_setzero_pd();
          ^
In file included from /usr/lib/gcc/x86_64-linux-gnu/5/include/immintrin.h:41:0,
                 from config/sandybridge/kernels/3/bli_gemm_int_d8x4.c:36:
/usr/lib/gcc/x86_64-linux-gnu/5/include/avxintrin.h:834:1: error: inlining failed in call to always_inline ‘_mm256_load_pd’: target specific option mismatch
 _mm256_load_pd (double const *__P)
 ^

``configure`` should default to the ``auto`` configuration

It seems strange for configure not to default to the auto configuration. It took a bit of digging for me to find that this was supported. Why not attempt the auto-detection instead of failing?

OMP problem with Intel compiler

Compiling frame/base/bli_threading_omp.c
frame/base/bli_threading_omp.c: In function 'bli_barrier':
frame/base/bli_threading_omp.c:88: error: expected end of line before 'capture'
frame/base/bli_threading_omp.c:89: error: invalid operator for '#pragma omp atomic' before '=' token
make: *** [obj/sandybridge/frame/base/bli_threading_omp.o] Error 1

I know that earlier Intel compilers did not support the full OMP 3.1 standard, but this is Intel 15.

Document / detect required compiler versions per configuration

Title was: Build failure on piledriver

BLIS commit a41e68e
OS: Ubuntu 12.04 (on Travis CI)
CPU: AMD Opteron 6376
Compiler: GCC 4.6.3

Short version of the error message is:

Compiling frame/0/setsc/bli_setsc_check.c
frame/0/getsc/bli_getsc_check.c:1:0: error: bad value (bdver2) for -march= switch

(repeated several times for other files in frame/0)

Full log is at https://travis-ci.org/tkelman/BLIS.jl/jobs/30217818

Fork safety

The combination of thread pools + fork is unfortunate: it tends to lead to random freezes. Unfortunately, the best way to achieve high-level parallelism in Python is to use fork (via the multiprocessing module).

Fundamentally the problem is that if you spawn a thread pool, and then fork, then the child ends up thinking that it has a thread pool, and dispatching work to it, but there aren't actually any threads running. This doesn't end well.

When using OMP for threading, dealing with this is the responsibility of the OMP implementation, and out-of-scope for BLIS itself. But when using pthreads, this should be handled by using pthread_atfork to register a pre-fork callback that shuts down the thread pool.

The equivalent issue in OpenBLAS was fixed in OpenMathLib/OpenBLAS#343, specifically with this code (which should probably actually be called openblas_install_fork_handler...):

+void openblas_fork_handler()
+{
+  // This handler shuts down the OpenBLAS-managed PTHREAD pool when OpenBLAS is
+  // built with "make USE_OPENMP=0".
+  // Hanging can still happen when OpenBLAS is built against the libgomp
+  // implementation of OpenMP. The problem is tracked at:
+  //   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60035
+  // In the mean time build with USE_OPENMP=0 or link against another
+  // implementation of OpenMP.
+#ifndef OS_WINDOWS
+  int err;
+  err = pthread_atfork (BLASFUNC(blas_thread_shutdown), NULL, NULL);
+  if(err != 0)
+    openblas_warning(0, "OpenBLAS Warning ... cannot install fork handler. You may meet hang after fork.\n");
+#endif
+}

The attached test case can also probably be re-used with trivial tweaks to use the BLIS API instead of cblas: https://github.com/ogrisel/OpenBLAS/blob/49bd98f410369c9604031296f8ff47c5c20052bb/utest/test_fork.c
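
For the pthreads case, a minimal sketch of the same idea outside OpenBLAS might look like the following. The pool_* names and the pool_running flag are placeholders for whatever thread pool state the library keeps; only pthread_atfork itself is a real API here.

```c
#include <pthread.h>

/* Placeholder pool state; a real library would signal and join its workers. */
static int pool_running = 0;

static void pool_start( void )    { pool_running = 1; }
static void pool_shutdown( void ) { pool_running = 0; }

/* Register pool_shutdown as the "prepare" handler, which runs in the parent
   just before fork(). The child then starts with no pool and can create a
   fresh one on first use. Returns 0 on success. */
static int install_fork_handler( void )
{
    return pthread_atfork( pool_shutdown, NULL, NULL );
}
```

With this design the parent also loses its pool at fork time, so it would need to restart the pool lazily on its next call; a "parent" handler passed as the second argument could restart it eagerly instead.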

sandybridge segfault with 32-bit dim_t

The culprit seems to be the load of k in the micro-kernel which is explicitly a movq. Changing type of k_iter and k_next to [u?]int64_t should fix it.

add an AVX1 / Sandy Bridge DGEMM kernel

I may have a go at exploring that if I find some time, but if someone else beats me to it, that'd be great too!

Shared library versions of BLIS?

Is there a simple means of modifying BLIS to build a shared library instead of a static library? It seems to be missing from the configure script.

Illegal Instruction in bli_init()

I've just set up my first blis program, but calling bli_init() causes the program to trigger a SIGILL. I'm using Ubuntu 14.04 GNU/Linux using the Sandymount configuration for BLIS. It also breaks here when I run the test_gemm_blis.x test in blis/test/ or the test suite with the given makefiles.

I've traced it down to inside of bli_const_init() using gdb.

bli_obj_create_const(2.0, &BLIS_TWO);

I've tried re-configuring & rebuilding the library several times but it doesn't seem to help.

BLIS should allow simultaneously exporting both 32- and 64-bit variants of BLAS/CBLAS

The de facto standard is that the standard BLAS/CBLAS functions take 32-bit integers in their API. Julia experimented with changing this so that they could use 64-bit integers in their main BLAS wrappers, and this worked great for a little while until they discovered that when people started trying to link in other existing BLAS-using code, this code was assuming that BLAS uses 32-bit integers and was causing segfaults. Their solution was to continue to use a 64-bit integer version of BLAS, but with symbols renamed to avoid collisions (so e.g. dgemm_ uses 32-bit integers, and dgemm_64_ uses 64-bit integers... [edited to get the 64-bit symbol names correct])

As mentioned in #37 (comment) , it would be great if a single BLIS library could export both 32- and 64-bit versions of these symbols simultaneously. It doesn't look like this would be too hard, since both the BLAS2BLIS interface is already generated using C preprocessor magic, and the CBLAS wrapper is already getting programmatically patched...
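
As a rough illustration of the preprocessor approach, the sketch below stamps out the same routine twice with different integer types and symbol names. The routine (a sum of absolute values) and the demo_ names are invented for the example and ignore details such as negative increments; only the _64_ suffix convention comes from the description above.

```c
#include <stdint.h>

/* Generate one wrapper per integer width from a single template. */
#define GEN_ASUM( fname, int_t ) \
double fname( const int_t* n, const double* x, const int_t* incx ) \
{ \
    double sum = 0.0; \
    for ( int_t i = 0; i < *n; ++i ) \
    { \
        double v = x[ i * (*incx) ]; \
        sum += ( v < 0.0 ? -v : v ); \
    } \
    return sum; \
}

GEN_ASUM( demo_dasum_,    int32_t )  /* classic 32-bit integer interface */
GEN_ASUM( demo_dasum_64_, int64_t )  /* 64-bit variant under a distinct symbol */
```

Both symbols end up in the same object file, so callers that assume 32-bit integers and callers that assume 64-bit integers can link against one library without colliding.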

Edge cases for row-major ukernels not handled quite right

For gemm ukernels which prefer contiguous rows, the temporary buffer for edge cases (when m != mr or n != nr) is treated as column-major (i.e. rs_c = 1 and cs_c = mr). If the general-stride pathway in the ukernel is very slow then this might have a performance impact for small to medium size matrices.

The ukernel wrapper should treat the temporary buffer as row-major (rs_c = nr and cs_c = 1) in this case.
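
A small sketch of the stride selection, with hypothetical names (temp_c_strides and the prefers_rows flag are not BLIS identifiers):

```c
#include <stddef.h>

/* Offset of element (i,j) in a matrix with row stride rs and column stride cs. */
static size_t elem_offset( size_t i, size_t j, size_t rs, size_t cs )
{
    return i*rs + j*cs;
}

/* Choose strides for the mr x nr edge-case buffer so that they match the
   storage the microkernel prefers. */
static void temp_c_strides( int prefers_rows, size_t mr, size_t nr,
                            size_t* rs_ct, size_t* cs_ct )
{
    if ( prefers_rows ) { *rs_ct = nr; *cs_ct = 1;  }  /* row-major    */
    else                { *rs_ct = 1;  *cs_ct = mr; }  /* column-major */
}
```

With row-major strides, consecutive elements of a row are adjacent in the temporary buffer, so a row-preferring ukernel can stay on its contiguous-store path instead of falling back to general stride.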

Incorrect result in axpys (theoretically)

The ???axpys family of macros could give the incorrect result for alpha and x complex and y real. For example, zzdaxpys calls daxpyris which computes yr += ar * xr instead of the correct result yr += ar * xr - ai * xi.

One way to fix this is to drop all of the scalar macros and just use C99 complex operators like normal people.
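
For comparison, here is what the mixed-domain case looks like with C99 complex arithmetic, next to the hand-written real/imaginary form. The function names are made up for this example.

```c
#include <complex.h>

/* y := y + alpha * x with complex alpha and x but real y:
   only the real part of the product is accumulated. */
static double axpy_zzd( double complex alpha, double complex x, double y )
{
    return y + creal( alpha * x );
}

/* The same update written out by hand; the "- ai * xi" term is the one
   that is missing when only ar * xr is accumulated. */
static double axpy_zzd_manual( double ar, double ai,
                               double xr, double xi, double y )
{
    return y + ( ar * xr - ai * xi );
}
```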

bli_gemm_8x8.h?

I attempted to build BLIS version 0.2.0 on bgq, but I got this error:
"./config/bgq/bli_kernel.h", line 174.10: 1506-296 (S) #include file "bli_gemm_8x8.h" not found.

I checked the file "bli_kernel.h" for bgq, and line 174 contains

#include "bli_gemm_8x8.h"

but I cannot locate this header file anywhere in the package.

Is there any follow-up on my previous ticket, "BLIS Test Failure in BlueGene/Q #34"? It seems that BLIS is still not working correctly for all complex test cases.
Are there any instructions for building LAPACK to work with BLIS?

Thanks!

Intermittent NaNs appearing in results from calling shared-library blis from Julia

CPU: Core2 Duo E8400 (old machine)
OS: Ubuntu 14.04, x86-64

Compiled BLIS reference configuration, setting BLIS_ENABLE_DYNAMIC_BUILD := yes. By itself, BLIS passes its own make test.

I'm calling into BLIS from Julia by the following steps:

git clone https://github.com/julialang/julia
cd julia
mkdir -p $PWD/usr/lib
cp /path/to/libblis.so $PWD/usr/lib
echo 'override USE_SYSTEM_BLAS = 1' > Make.user
echo 'override USE_BLAS64 = 0' >> Make.user
echo 'override LIBBLAS = -L$(build_libdir) -lblis -lm' >> Make.user
echo 'override LIBBLASNAME = libblis' >> Make.user
make -j8   # this will take quite a while - Julia has lots of big dependencies
make testall

This gives me a different failure each time I repeat make testall. Here are some examples, from the first couple of files in Julia's test suite (linalg1 and linalg2):
https://gist.github.com/47dbd5517c4a6f56fb2e
https://gist.github.com/51835795c8ada7c0f2a1
https://gist.github.com/552ec09e5e78d1cd3da7
https://gist.github.com/7cce0057bf9a3009a92a
https://gist.github.com/0f01364072634cc95be9
https://gist.github.com/9292fc3fe1a2e9afe09d

I'll see if I can translate a few of these test cases into C to figure out whether the problem is reproducible outside of Julia. I'll also try setting the BLIS integer size to 64-bit and delete the USE_BLAS64 = 0 line, see whether that changes anything.

More compilation error on BG/Q

I got another error when I attempted to compile the latest version (0.2) for BG/Q.

(1) Undeclared identifier bli_daxpyf_fusefac
Compiling ../config/bgq/kernels/1f/bli_axpyf_opt_var1.c (NOTE: using flags for kernels)
"../config/bgq/kernels/1f/bli_axpyf_opt_var1.c", line 57.20: 1506-045 (S) Undeclared identifier bli_daxpyf_fusefac.
"../config/bgq/kernels/1f/bli_axpyf_opt_var1.c", line 64.39: 1506-098 (E) Missing argument(s).
make: *** [obj/bgq/config/kernels/1f/bli_axpyf_opt_var1.o] Error 1

in file ./config/bgq/kernels/1f/bli_axpyf_opt_var1.c, line 57

if ( b_n < PASTEMAC(d,axpyf_fusefac) || inca != 1 || incx != 1 || incy != 1 || bli_is_unaligned_to( a, 32 ) || bli_is_unaligned_to( y, 32 ) )
use_ref = TRUE;

I cannot find where "axpyf_fusefac" is defined, so I simply commented out this line to call the reference DAXPYF function.

(2) More errors in bli_gemm_int_8x8.c
But when I commented out line 57, I got a bunch of error messages in bli_gemm_int_8x8.c:

Compiling ../config/bgq/kernels/3/bli_gemm_int_8x8.c (NOTE: using flags for kernels)
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 133.28: 1506-754 (S) The parameter type is not valid for a function of this linkage type.
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 134.28: 1506-754 (S) The parameter type is not valid for a function of this linkage type.
...
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 231.28: 1506-045 (S) Undeclared identifier c_z.
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 262.14: 1506-754 (S) The parameter type is not valid for a function of this linkage type.
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 263.14: 1506-754 (S) The parameter type is not valid for a function of this linkage type.
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 264.14: 1506-754 (S) The parameter type is not valid for a function of this linkage type.
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 265.14: 1506-754 (S) The parameter type is not valid for a function of this linkage type.
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 267.14: 1506-754 (S) The parameter type is not valid for a function of this linkage type.
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 268.14: 1506-754 (S) The parameter type is not valid for a function of this linkage type.
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 302.19: 1506-196 (S) Initialization between types "double" and "struct {...}" is not allowed.
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 303.19: 1506-196 (S) Initialization between types "double" and "struct {...}" is not allowed.
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 304.18: 1506-196 (S) Initialization between types "double" and "struct {...}" is not allowed.
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 305.18: 1506-196 (S) Initialization between types "double" and "struct {...}" is not allowed.
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 362.5: 1506-068 (S) Operation between types "double" and "struct {...}" is not allowed.
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 362.5: 1506-068 (S) Operation between types "double" and "struct {...}" is not allowed.
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 362.5: 1506-068 (S) Operation between types "double" and "struct {...}" is not allowed.
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 362.5: 1506-068 (S) Operation between types "double" and "struct {...}" is not allowed.
...

Allow BLIS developer to specify arbitrary malloc()-style allocation functions

As part of the configuration, BLIS should allow the developer to specify the function to call for allocating memory for the following three use cases that occur in BLIS:

blocks that are allocated within the BLIS memory pools, which get used for packing buffers.
internally-invoked allocation for things such as control tree nodes.
user-invoked allocation, primarily exemplified by bli_obj_create() and friends.

The BLIS developer would specify, in bli_kernel.h, cpp macros to identify the names of the malloc()-style functions to use in any of the above cases. It would then be the developer's responsibility to ensure that the object code that defines the malloc() substitutes are available at link-time. The developer does not need to provide a prototype for the malloc() substitutes since we will require those functions to adhere to the same function signature as malloc(), and therefore BLIS can/will provide those prototypes on behalf of the developer, similar to what is done when a developer defines custom kernels/micro-kernels.
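
A rough sketch of how the macro mechanism could work for one of the three use cases (the memory pool blocks). The DEMO_* macro names and the pool_block_* wrappers are hypothetical; the proposal above does not fix any names.

```c
#include <stdlib.h>

/* Hypothetical macro names: a configuration header such as bli_kernel.h
   would define these to pick the allocator for the pool blocks. The
   defaults fall back to malloc()/free(). */
#ifndef DEMO_MALLOC_POOL
#define DEMO_MALLOC_POOL malloc
#endif
#ifndef DEMO_FREE_POOL
#define DEMO_FREE_POOL free
#endif

/* The framework can declare the substitutes itself because they must share
   the signatures of malloc() and free(). */
void* DEMO_MALLOC_POOL( size_t size );
void  DEMO_FREE_POOL( void* p );

/* Framework code then calls through the macros only. */
static void* pool_block_alloc( size_t size ) { return DEMO_MALLOC_POOL( size ); }
static void  pool_block_free( void* p )      { DEMO_FREE_POOL( p ); }
```

A developer who wants, say, huge-page or NUMA-aware allocation would define the two macros to their own function names and supply the object code at link time.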

	#include "bli_config.h"
	#include "bli_system.h"
	#include "bli_type_defs.h"
	#include "bli_cblas.h"
