Giter Site home page Giter Site logo

Comments (30)

ViralBShah avatar ViralBShah commented on September 13, 2024 1

@jishnub The default algorithm seems to call syevd and not syev, based on the docs. But all the 3 algorithms are failing. :-(

from julia.

jishnub avatar jishnub commented on September 13, 2024 1

The segfault happens for all the algorithms

from julia.

ViralBShah avatar ViralBShah commented on September 13, 2024

We just bumped openblas yesterday, so it is quite likely an upstream bug.

On my mac I'm not getting a segmentation fault, but it seems to keep running endlessly. We should report this upstream.

from julia.

ViralBShah avatar ViralBShah commented on September 13, 2024

Works fine for size 63x63 for me and stops working 64x64 onwards.

from julia.

jishnub avatar jishnub commented on September 13, 2024

For the record, I am being able to call dsyev in C using OpenBLAS's master branch when compiling with gcc directly.
E.g., using the following code:

#include <stdio.h>
#include <stdlib.h>
#include <lapacke.h>

int main() {
    int n = 64;
    double *a;
    a = (double*) malloc(n*n*sizeof(double));

    // Check if the memory has been successfully
    // allocated by malloc or not
    if (a == NULL) {
        printf("Memory not allocated.\n");
        exit(0);
    }

    for(int row=0; row<n; row++){
        for(int col=0; col<=row; col++){
            a[row*n + col] = (double)rand()/(double)(RAND_MAX);
        }
    }

    // Allocate space for eigenvalues
    double *w;
    w = (double*) malloc(n * sizeof(double));

    if (w == NULL) {
        printf("Memory not allocated.\n");
        exit(0);
    }

    // Leading dimension of the matrix
    int lda = n;

    // Compute eigenvalues and eigenvectors
    LAPACKE_dsyev(LAPACK_ROW_MAJOR, 'N', 'U', n, a, lda, w);

    // Print eigenvalues
    printf("Eigenvalues:\n");
    for (int i = 0; i < n; i++) {
        printf("%f\n", w[i]);
    }

    free(a);
    free(w);

    return 0;
}

Compiling with

gcc -o hermeig hermeig.c -Iopenblasinstallation/include -Lopenblasinstallation/lib -lopenblas

This runs correctly and produces results.

from julia.

jishnub avatar jishnub commented on September 13, 2024

I've noticed that this error occurs in julia when I set BLAS.set_num_threads(4) or higher. There is no segfault for 3 or fewer threads.

rr trace attached: https://julialang-dumps.s3.amazonaws.com/reports/2024-08-13T11-47-58-jishnub.tar.zst

from julia.

giordano avatar giordano commented on September 13, 2024

For the record, I am being able to call dsyev in C using OpenBLAS's master branch when compiling with gcc directly.

It'd be useful to make sure you can reproduce the segfault when linking to Julia's libopenblas build: if not, that test may not be useful.

from julia.

ViralBShah avatar ViralBShah commented on September 13, 2024

Doesn't segfault when linking to our openblas with 1 or 4 threads. Used slightly modified driver below:

➜  lib git:(master) gcc -o hermeig hermeig.c -Iopenblasinstallation/include -Lopenblasinstallation/lib -L. -lopenblas64_ -I../include -Wno-implicit-function-declaration
ld: warning: search path 'openblasinstallation/lib' not found
ld: warning: reexported library with install name '@rpath/libunwind.1.dylib' found at '/Users/viral/julia/usr/lib/libunwind.1.0.dylib' couldn't be matched with any parent library and will be linked directly
➜  lib git:(master) OPENBLAS_NUM_THREADS=4 DYLD_LIBRARY_PATH=. ./hermeig
Eigenvalues:
0.000008
0.022556
0.036368
0.050084
#include <stdio.h>
#include <stdlib.h>
#include <lapacke.h>

int main() {
    long n = 64;
    double *a;
    a = (double*) malloc(n*n*sizeof(double));

    openblas_set_num_threads64_(4);

    // Check if the memory has been successfully
    // allocated by malloc or not
    if (a == NULL) {
        printf("Memory not allocated.\n");
        exit(0);
    }

    for(long row=0; row<n; row++){
        for(long col=0; col<=row; col++){
            a[row*n + col] = (double)rand()/(double)(RAND_MAX);
        }
    }

    // Allocate space for eigenvalues
    double *w;
    w = (double*) malloc(n * sizeof(double));

    if (w == NULL) {
        printf("Memory not allocated.\n");
        exit(0);
    }

    // Leading dimension of the matrix
    long lda = n;

    // Compute eigenvalues and eigenvectors
    LAPACKE_dsyev64_(LAPACK_ROW_MAJOR, 'N', 'U', n, a, lda, w);

    // Print eigenvalues
    printf("Eigenvalues:\n");
    for (int i = 0; i < n; i++) {
        printf("%f\n", w[i]);
    }

    free(a);
    free(w);

    return 0;
}

from julia.

giordano avatar giordano commented on September 13, 2024

Then that doesn't seem to be a good reproducer 🙂

from julia.

ViralBShah avatar ViralBShah commented on September 13, 2024

Or there could perhaps be a bug on the Julia side?

The LAPACKE interface seems to hide the lwork discovery.

from julia.

giordano avatar giordano commented on September 13, 2024

Or there could perhaps be a bug on the Julia side?

Perhaps, but I'd spend more time to make sure the C reproducer is 100% faithful to what we're doing on the Julia side. Also, if you want to build OpenBLAS locally, to reproduce the Yggdrasil build I'd suggest compiling OpenBLAS with something like

make USE_THREAD=1 GEMM_MULTITHREADING_THRESHOLD=400 NO_AFFINITY=1 NO_STATIC=1 NUM_THREADS=512 INTERFACE64=1 SYMBOLSUFFIX=64_ DYNAMIC_ARCH=1 TARGET=GENERIC all

from julia.

giordano avatar giordano commented on September 13, 2024

Also, do we have a backtrace with gdb?

from julia.

ViralBShah avatar ViralBShah commented on September 13, 2024

svd also fails. The failures are only on Hermitian matrices 64x64 and larger. Other factorizations seem to be working fine on Hermitians.

Of course svd failing may not be additional data, in that it is probably using the same internal LAPACK routines.

from julia.

ViralBShah avatar ViralBShah commented on September 13, 2024

@giordano I could try get a gdb trace, but on M1 mac, it just goes into an infinite loop (should I ctrl-C?) - not a segfault. @jishnub Which platform are you seeing the segfault on?

from julia.

giordano avatar giordano commented on September 13, 2024

Also, do we have a backtrace with gdb?

julia> eigen(H);

Thread 9 "julia" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd23f9640 (LWP 3380628)]
0x00007fffde86df6c in dgemm_itcopy_ZEN () from /home/mose/.julia/juliaup/julia-nightly/bin/../lib/julia/libopenblas64_.so
(gdb) where
#0  0x00007fffde86df6c in dgemm_itcopy_ZEN () from /home/mose/.julia/juliaup/julia-nightly/bin/../lib/julia/libopenblas64_.so
#1  0x00007fffdda5f70f in dsyr2k_UN () from /home/mose/.julia/juliaup/julia-nightly/bin/../lib/julia/libopenblas64_.so
#2  0x00007fffddb99cd8 in exec_threads () from /home/mose/.julia/juliaup/julia-nightly/bin/../lib/julia/libopenblas64_.so
#3  0x00007fffddb99efc in blas_thread_server () from /home/mose/.julia/juliaup/julia-nightly/bin/../lib/julia/libopenblas64_.so
#4  0x00007ffff7e08ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#5  0x00007ffff7e9a850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

So, it looks to me the segmentation fault is in OpenBLAS. Certainly the issue could be in what Julia passes to OpenBLAS, but this again begs for a more faithful C reproducer.

from julia.

jishnub avatar jishnub commented on September 13, 2024

x86_64-linux-gnu

from julia.

ViralBShah avatar ViralBShah commented on September 13, 2024

So, if I set BLAS num threads to 1, then no crash on the Hermitian 64x64.

julia> H =Hermitian(rand(64,64))

julia> BLAS.set_num_threads(4)

julia> eigen(H)
^C^C^C^C^CWARNING: Force throwing a SIGINT
^C^C^C^C^CERROR: ^C^C^C^C^C^CInterruptException:

julia> ^C

julia> BLAS.set_num_threads(1)

julia> eigen(H)
Eigen{Float64, Float64, Matrix{Float64}, Vector{Float64}}
values:
64-element Vector{Float64}:
 -4.105758427499317
 -3.971213922931517

from julia.

ViralBShah avatar ViralBShah commented on September 13, 2024

I believe a more faithful reproducer using the direct fortran API calling the workspace query, which does not crash:

➜  lib git:(vs/55471) ✗ gcc -g -o hermeig hermeig.c -L. -lopenblas64_ -I../include -Wno-implicit-function-declaration && DYLD_LIBRARY_PATH=. ./hermeig
ld: warning: reexported library with install name '@rpath/libunwind.1.dylib' found at '/Users/viral/julia/usr/lib/libunwind.1.0.dylib' couldn't be matched with any parent library and will be linked directly
First 5 Eigenvalues:
-4.396484
-4.053237
-3.783926
-3.699819
-3.197730
#include <stdio.h>
#include <stdlib.h>

int main() {
    long n = 64;
    double *a;
    a = (double*) malloc(n*n*sizeof(double));

    openblas_set_num_threads64_(4);

    // Check if the memory has been successfully                                                                                                     
    // allocated by malloc or not                                                                                                                    
    if (a == NULL) {
        printf("Memory not allocated.\n");
        exit(0);
    }

    for(long row=0; row<n; row++){
        for(long col=0; col<=row; col++){
            a[row*n + col] = (double)rand()/(double)(RAND_MAX);
        }
    }

    // Allocate space for eigenvalues                                                                                                                
    double *w;
    w = (double*) malloc(n * sizeof(double));

    if (w == NULL) {
        printf("Memory not allocated.\n");
        exit(0);
    }

    // Leading dimension of the matrix                                                                                                               
    long lda = n;

    // Allocate work                                                                                                                                 
    long lwork = -1;
    double *work = (double*) malloc(sizeof(double));
    long info = -100;

    char nchar = 'N';
    char uchar = 'U';

    // Compute eigenvalues and eigenvectors                                                                                                          
    dsyev_64_(&nchar, &uchar, &n, a, &lda, w, work, &lwork, &info);
    lwork = (long)work[0];
    work = (double*)realloc(work, lwork*sizeof(double));
    dsyev_64_(&nchar, &uchar, &n, a, &lda, w, work, &lwork, &info);

    // Print eigenvalues                                                                                                                             
    printf("First 5 Eigenvalues:\n");
    for (int i = 0; i < 5; i++) {
        printf("%f\n", w[i]);
    }

    free(a);
    free(w);

    return 0;
}

from julia.

giordano avatar giordano commented on September 13, 2024

@jishnub can you try with https://github.com/giordano/OpenBLAS_jll.jl/releases/download/OpenBLAS-v0.3.28%2B1/OpenBLAS.v0.3.28.x86_64-linux-gnu-libgfortran5.tar.gz?

I get no crashes:

julia> using LinearAlgebra

julia> BLAS.lbt_forward("./libopenblas64_.so"; clear=true)
5037

julia> H = Hermitian(ones(1024, 1024));

julia> eigen(H);

julia>

This was compiled with GCC 12. As I mentioned in OpenMathLib/OpenBLAS#4868 (comment), the segmentation fault appears to go away depending on the version of GCC used (but I can't tell whether it's a bug in OpenBLAS which is hidden/surfaced by some compiler versions, or a genuine compiler error)

from julia.

jishnub avatar jishnub commented on September 13, 2024

Oddly, I seem to still face the issue using this:

julia> using LinearAlgebra

julia> BLAS.lbt_forward("./libopenblas64_.so"; clear=true)
5037

julia> H = Hermitian(ones(1024, 1024));

julia> eigen(H);

julia> BLAS.get_num_threads()
1

julia> BLAS.set_num_threads(4)

julia> eigen(H);

[88907] signal 11 (1): Segmentation fault

[88907] signal 11 (1): Segmentation fault
in expression starting at REPL[7]:1
Allocations: 2069146 (Pool: 2069051;Allocations: 2069146 (Pool: 2069051; Big: 95); GC: 4
Allocations: 2069146 (Pool: 2069051; Big: 95); GC: 4
[1]    88907 segmentation fault (core dumped)  julia +nightly

from julia.

giordano avatar giordano commented on September 13, 2024

Aaah, right, when forwarding the blas library the number of threads is reset.

from julia.

giordano avatar giordano commented on September 13, 2024

which does not crash:

I think we need two cases:

  1. a code which linked to julia's build of openblas crashes in the same way as within julia
  2. the exact same code which doesn't crash when linked to a different build of openblas, but please provide all information of compiler used and build configuration

Any other combination of reproducers which doesn't involve the two above is probably not very useful to narrow down the issue, because it doesn't necessarily show much.

from julia.

giordano avatar giordano commented on September 13, 2024

@jishnub can you please test again https://github.com/giordano/OpenBLAS_jll.jl/releases/download/OpenBLAS-v0.3.28%2B1/OpenBLAS.v0.3.28.x86_64-linux-gnu-libgfortran5.tar.gz (same URL as before, but it's a new build, which includes OpenMathLib/OpenBLAS#4871)? It seems to work for me now:

julia> using LinearAlgebra

julia> BLAS.lbt_forward("./libopenblas64_.so"; clear=true)
5037

julia> BLAS.set_num_threads(4)

julia> H = Hermitian(ones(1024, 1024));

julia> eigen(H);

julia>

from julia.

jishnub avatar jishnub commented on September 13, 2024

This works for me as well, thanks!

from julia.

giordano avatar giordano commented on September 13, 2024

A more minimal Julia reproducer is

using LinearAlgebra
LAPACK.syevd!('V', 'U', ones(1024, 1024));

We're entering this function

Base.@constprop :aggressive function syevd!(jobz::AbstractChar, uplo::AbstractChar, A::AbstractMatrix{$elty})
The lapack function being called is DSYEVD. Can someone please confirm this is a sensible program (NOTE: I updated the code compared the a previous version of this message):

#include <stdio.h>
#include <stdlib.h>

int main() {
    long n = 1024;
    double *a;
    a = (double*) malloc(n*n*sizeof(double));

    openblas_set_num_threads64_(64);

    // Check if the memory has been successfully                                                                                                     
    // allocated by malloc or not                                                                                                                    
    if (a == NULL) {
        printf("Memory not allocated.\n");
        exit(0);
    }

    for(long row=0; row<n; row++){
        for(long col=0; col<=row; col++){
            a[row*n + col] = 1.0;
        }
    }

    // Allocate space for eigenvalues                                                                                                                
    double *w;
    w = (double*) malloc(n * sizeof(double));

    if (w == NULL) {
        printf("Memory not allocated.\n");
        exit(0);
    }

    // Leading dimension of the matrix                                                                                                               
    long lda = n;

    // Allocate work                                                                                                                                 
    double *work = (double*) malloc(sizeof(double));
    long lwork = -1;
    long *iwork = (long*) malloc(sizeof(long));
    long liwork = -1;
    long info = -100;

    char jobz = 'V';
    char uplo = 'U';

    dsyevd_64_(&jobz, &uplo, &n, a, &lda, w, work, &lwork, iwork, &liwork, &info, 1, 1);

    // Print eigenvalues                                                                                                                             
    printf("First 5 Eigenvalues:\n");
    for (int i = 0; i < 5; i++) {
        printf("%f\n", w[i]);
    }

    free(a);
    free(w);

    return 0;
}

But this doesn't crash for me

$ gcc -o test test.c -L${HOME}/.julia/juliaup/julia-nightly/lib/julia -lopenblas64_ -Wl,-rpath,${HOME}/.julia/juliaup/julia-nightly/lib/julia
test.c: In function ‘main’:
test.c:9:5: warning: implicit declaration of function ‘openblas_set_num_threads64_’ [-Wimplicit-function-declaration]
     openblas_set_num_threads64_(64);
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~
test.c:46:5: warning: implicit declaration of function ‘dsyevd_64_’ [-Wimplicit-function-declaration]
     dsyevd_64_(&jobz, &uplo, &n, a, &lda, w, work, &lwork, iwork, &liwork, &info, 1, 1);
     ^~~~~~~~~~
$ ./test
First 5 Eigenvalues:
0.000000
0.000000
0.000000
0.000000
0.000000

from julia.

ViralBShah avatar ViralBShah commented on September 13, 2024

You have to call dsyevd twice. First time for the workspace query. Then take that value and allocate the work array and then actually call it again. It should segfault in the second call where the computation will actually happen.

#include <stdio.h>
#include <stdlib.h>

int main() {
    long n = 1024;
    double *a;
    a = (double*) malloc(n*n*sizeof(double));

    openblas_set_num_threads64_(64);

    // Check if the memory has been successfully
    // allocated by malloc or not
    if (a == NULL) {
        printf("Memory not allocated.\n");
        exit(0);
    }

    for(long row=0; row<n; row++){
        for(long col=0; col<=row; col++){
            a[row*n + col] = 1.0;
        }
    }

    // Allocate space for eigenvalues
    double *w;
    w = (double*) malloc(n * sizeof(double));

    if (w == NULL) {
        printf("Memory not allocated.\n");
        exit(0);
    }

    // Leading dimension of the matrix
    long lda = n;

    // workspace query
    double *work = (double*) malloc(sizeof(double));
    long lwork = -1;
    long *iwork = (long*) malloc(sizeof(long));
    long liwork = -1;
    long info = -100;

    char jobz = 'V';
    char uplo = 'U';

    dsyevd_64_(&jobz, &uplo, &n, a, &lda, w, work, &lwork, iwork, &liwork, &info, 1, 1);

    // Workspace allocation
    lwork = work[0];
    work = (double *) malloc(lwork * sizeof(double));
    liwork = iwork[0];
    iwork = (long *) malloc(liwork * sizeof(long));

    dsyevd_64_(&jobz, &uplo, &n, a, &lda, w, work, &lwork, iwork, &liwork, &info, 1, 1);

    // Print eigenvalues
    printf("First 5 Eigenvalues:\n");
    for (int i = 0; i < 5; i++) {
        printf("%f\n", w[i]);
    }

    return 0;
}

from julia.

giordano avatar giordano commented on September 13, 2024

Thanks. Sadly, adding

    lwork = (long)work[0];
    work = (double*)realloc(work, lwork*sizeof(double));
    liwork = (long)iwork[0];
    iwork = (long*)realloc(iwork, liwork*sizeof(long));
    dsyevd_64_(&jobz, &uplo, &n, a, &lda, w, work, &lwork, iwork, &liwork, &info, 1L, 1L);

after the first dsyevd_64_ call doesn't seem to change the outcome 😞 However it looks like OpenMathLib/OpenBLAS#4879 works for me on the machines where I was still observing a segmentation fault after the first patch. Hopefully we'll be good after that, but I'm still puzzled by the fact we can't prepare a C reproducer. I also tried to use dlopen/dlsym, to even more closely match what we do in Julia, to no avail.

from julia.

ViralBShah avatar ViralBShah commented on September 13, 2024

The bug is real, but it seems that Julia is just better at triggering it than C - whether it is our allocator or LBT or something else.

from julia.

ViralBShah avatar ViralBShah commented on September 13, 2024

The C script does crash for me with the second call to dsyevd on mac.

from julia.

giordano avatar giordano commented on September 13, 2024

The C script does crash for me with the second call to dsyevd on mac.

Ah, interesting, I can confirm it segfaults for me as well on an M1 with OpenBLAS from Julia nightly, but not with OpenBLAS from Julia v1.10.4:

% clang -Wno-implicit-function-declaration -o test test.c -L${HOME}/repo/julia/usr/lib -Wl,-rpath,${HOME}/repo/julia/usr/lib -lopenblas64_
% ./test
zsh: segmentation fault  ./test
% clang -Wno-implicit-function-declaration -o test test.c -L${HOME}/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/lib/julia -Wl,-rpath,${HOME}/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/lib/julia -lopenblas64_
% ./test
First 5 Eigenvalues:
-0.000000
-0.000000
-0.000000
-0.000000
-0.000000

However it never segfaults for me on an x86_64-linux-gnu machine where I could reproduce the segfault in Julia

from julia.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.