Comments (20)

cnpetra commented on August 28, 2024

Thanks, I'll look into it once I find a Mac. Can you provide any info on the LAPACK/BLAS you're using? At first sight, the issue looks related to that.

from hiop.

goxberry commented on August 28, 2024

I think it's linking to the Accelerate framework right now. I'll try linking to a different LAPACK library.

goxberry commented on August 28, 2024

I see BLAS calls in HiOp that look like, for instance, dgemv_, which assumes Fortran symbols are exported with trailing underscores. On my system, Fortran name-mangling uses leading and trailing underscores, so the BLAS library from reference LAPACK exports the symbol _dgemv_, and so does OpenBLAS:

snippet of nm libopenblas.a from OpenBLAS:

/Users/oxberry1/spack/opt/spack/darwin-sierra-x86_64/clang-8.1.0-apple/openblas-0.2.20-dxpaoysvipwyqcscj5n5j6t5cvqtldr2/lib/libopenblas.a(dgemv.o):
                   U ___assert_rtn
                   U ___stack_chk_fail
                   U ___stack_chk_guard
                   U _blas_memory_alloc
                   U _blas_memory_free
  0000000000000000 T _dgemv_
                   U _dgemv_n
                   U _dgemv_t
                   U _dscal_k
                   U _xerbla_
  0000000000000300 s l_dgemv_.gemv

snippet of nm libblas.a from reference LAPACK:

/Users/oxberry1/spack/opt/spack/darwin-sierra-x86_64/clang-8.1.0-apple/netlib-lapack-3.7.1-hxqwiepbfb3forawjin3gmj6rpb6cmxk/lib/libblas.a(dgemv.f.o):
 0000000000000780 s EH_frame1
 0000000000000000 T _dgemv_
                  U _lsame_
                  U _xerbla_
 00000000000006c3 s lC1
 00000000000006c4 s lC2
 00000000000006c5 s lC3
 00000000000006d0 s lC4
 00000000000006c6 s lC5

The compiler stack I've been using is Apple Clang 8.1 (from Xcode 8.3) and gfortran 7.2.0 (from GCC 7.2.0). There are a bunch of ways around this issue, but none of them are quick:

  • on my side, I could force gfortran to export Fortran symbols without a leading underscore using -fno-leading-underscore, but that option means that other libraries will also expect LAPACK without leading underscores, so it might break other libraries

  • hypre uses C preprocessor macros to add leading and trailing underscores as needed, and then uses GNU Autotools to detect how Fortran symbols are exported

  • use CBLAS and LAPACKE, which is a standard part of LAPACK as of version 3.4.0

  • add a fake Fortran file and use CMake to detect the symbol mangling, which adds an unnecessary Fortran dependency to the project
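As a rough illustration of the preprocessor approach from the second bullet, the sketch below wraps a BLAS name in a macro that applies the expected Fortran decoration. The macro `FORTRAN_NAME`, the `FORTRAN_MANGLE_*` switches, and the helper `mangled_dgemv_name` are hypothetical names made up for this example; they are not taken from HiOp or hypre, and a configure step would be expected to choose the switch. The leading underscore visible in the nm output is not handled here, on the assumption that the platform's C/C++ toolchain adds it on its own.

```cpp
#include <string>

// Hypothetical switches; a configure step would define at most one of them.
#if defined(FORTRAN_MANGLE_UPPERCASE)
#  define FORTRAN_NAME(lower, UPPER) UPPER
#elif defined(FORTRAN_MANGLE_NO_UNDERSCORE)
#  define FORTRAN_NAME(lower, UPPER) lower
#else
// Default assumed here: lowercase name plus one trailing underscore.
#  define FORTRAN_NAME(lower, UPPER) lower##_
#endif

// Two-level stringification so the argument is expanded before it is quoted.
#define FORTRAN_STR_IMPL(x) #x
#define FORTRAN_STR(x) FORTRAN_STR_IMPL(x)

// The declaration is written once; the macro decides the C-level symbol name.
// Arguments follow the Fortran BLAS convention of passing everything by reference.
extern "C" void FORTRAN_NAME(dgemv, DGEMV)(
    const char* trans, const int* m, const int* n, const double* alpha,
    const double* a, const int* lda, const double* x, const int* incx,
    const double* beta, double* y, const int* incy);

// Returns the name the macro produced, for inspection only.
inline std::string mangled_dgemv_name() {
  return FORTRAN_STR(FORTRAN_NAME(dgemv, DGEMV));
}
```

Call sites would then use `FORTRAN_NAME(dgemv, DGEMV)(...)` instead of a hard-coded `dgemv_`, so only the macro definition changes between platforms.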

For now, I'll stick to using HiOp on the clusters, because I know it works there. If I need to run HiOp on my laptop for some reason, I can consult with you and put together a patch, if you're interested.

cnpetra commented on August 28, 2024

@junkudo has a fix for the Fortran name mangling and will get in here soon...

junkudo commented on August 28, 2024

I can fix the Fortran name mangling this weekend. :)

jandrej commented on August 28, 2024

I get segfaults as well using clang5 on linux when I try to run the examples. With GCC everything works fine.

cnpetra commented on August 28, 2024

Julian, thanks for reporting. Apparently, hiop has all kinds of issues with clang.

Does clang5 mean it comes with LLVM 5, or is it clang version 5.0?

I only have clang v3.4.2 on my linux box.

clang --version
clang version 3.4.2 (tags/RELEASE_34/dot2-final)
Target: x86_64-redhat-linux-gnu
Thread model: posix

jandrej commented on August 28, 2024

It's clang version 5.0 from the llvm ubuntu repositories. Let me know if I can give further information which could help.

~$ clang -v
clang version 5.0.1-svn325091-1~exp1 (branches/release_50)
Target: x86_64-pc-linux-gnu
Thread model: posix

jandrej commented on August 28, 2024

Just for your info: the disassembly shows a UD2 instruction, which means clang recognized undefined behavior in the code. It's located in

bool hiopHessianLowRank::updateLogBarrierDiagonal(const hiopVector& Dx)

A run with -fsanitize=undefined returns

/home/juan/repos/hiop/src/LinAlg/hiopMatrix.cpp:135:10: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:43:28: note: nonnull attribute specified here
/home/juan/repos/hiop/src/LinAlg/hiopMatrix.cpp:135:16: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:43:28: note: nonnull attribute specified here
/home/juan/repos/hiop/src/Optimization/hiopKKTLinSys.cpp:539:27: runtime error: execution reached the end of a value-returning function without returning a value
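The last sanitizer message concerns a function that is declared to return a value but whose control flow can run off the end, which is undefined behavior in C++. The sketch below is not the HiOp code; `update_diagonal` and its arguments are hypothetical and only show the pattern of returning explicitly on every path.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a bool-returning update routine.
// Every control path ends in an explicit return, so execution can never
// reach the end of the function without returning a value.
bool update_diagonal(std::vector<double>& diag, const std::vector<double>& dx) {
  if (diag.size() != dx.size()) {
    return false;  // size mismatch: report failure instead of continuing
  }
  for (std::size_t i = 0; i < diag.size(); ++i) {
    diag[i] += dx[i];
  }
  return true;  // the return that is easy to forget after the loop
}
```

Compiling with `-Wreturn-type` (available in both GCC and Clang) is one way to have such missing returns flagged at build time rather than at run time.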

cnpetra commented on August 28, 2024

@goxberry: I've finally got my hands on a Mac laptop and tested the solver. @junkudo's Fortran name mangling works like a champ. I've only fixed a couple of compilation warnings. Everything works fine. I've used clang + gfortran + BLAS from Accelerate (thanks again for the instructions!). It would be awesome if you could give it a try on your system.

cnpetra commented on August 28, 2024

@jandrej: I tried -fsanitize and could not replicate your errors, probably because those problems were fixed in the meantime. I recently did a lot of valgrinding on Linux on the library under many use cases within mfem and fixed a couple of uninitialized memory accesses. It would be awesome if you could check again and see what happens. Thanks.

jandrej commented on August 28, 2024

My cmake command is

CC=clang cmake -DCMAKE_CXX_FLAGS="-fsanitize=undefined" ../

which still produces the runtime error when running ex1

/home/juan/repos/hiop/src/LinAlg/hiopMatrix.cpp:137:56: runtime error: null pointer passed as argument 1, which is declared to never be null
/home/juan/repos/hiop/src/LinAlg/hiopMatrix.cpp:137:56: runtime error: null pointer passed as argument 2, which is declared to never be null

using the latest version of the repo.

goxberry commented on August 28, 2024

Thank you both for patching this issue! I’m currently on travel, and will take a look after I return, probably no later than Monday.

cnpetra commented on August 28, 2024

Thanks! I did get runtime errors with your cmake command, one coming from hiopMatrix.cpp:137:56, though they were all related to allocating an array of size 0. It may be that we're using different versions of clang. In any case, I've addressed all the errors I was seeing. Could you please pull master and see what you're getting?

jandrej commented on August 28, 2024

The example nlpDenseCons_ex1 is not crashing but still reports the "runtime error" from the clang sanitizer.

cnpetra commented on August 28, 2024

I could not replicate this on my Red Hat machine, though I used clang 3.4.2.

@goxberry Could you please see if you get any errors with -fsanitize when you're testing the fix for this issue?

Use something like

rm -rf *; CC=clang CXX=clang++ cmake -DCMAKE_CXX_FLAGS="-fsanitize=nullability,undefined,integer,alignment" -DHIOP_USE_MPI=ON -DHIOP_DEEPCHECKS=ON -DCMAKE_BUILD_TYPE=DEBUG ..; make -j4

and run ./src/Drivers/nlpDenseCons_ex1.exe

goxberry commented on August 28, 2024

@cnpetra HiOp compiles and runs successfully with and without the -fsanitize=... flags on macOS 10.12.6 with OpenBLAS 0.3.3, gfortran 8.2.0, and AppleClang 9.0.0 (based on LLVM clang 4.0). I can't reproduce the behavior @jandrej is seeing. Maybe I need to build with a later version of LLVM's clang?

cnpetra commented on August 28, 2024

Yeah, it's strange that we don't get @jandrej's runtime error.

I looked at the code again and, apparently, passing null pointers to memcpy is not allowed even when the number of bytes to be copied is zero. I safeguarded memcpy against null pointers in the latest commit.
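A guard of the kind described above might look like the following minimal sketch. It is not the actual HiOp commit; the helper name `copy_doubles` is hypothetical, and it assumes that empty arrays are the only case in which null pointers reach the copy.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies n doubles from src to dest.
// memcpy is skipped entirely when n is zero, so the null pointers that
// come from zero-sized arrays are never passed to it.
// For n > 0 the caller must still supply valid, non-overlapping buffers.
inline void copy_doubles(double* dest, const double* src, std::size_t n) {
  if (n == 0) {
    return;  // nothing to copy; avoids memcpy(nullptr, nullptr, 0)
  }
  std::memcpy(dest, src, n * sizeof(double));
}
```

Routing all copies through one helper keeps the check in a single place instead of repeating it at every memcpy call site.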

@jandrej : is it too much to ask to try again? :)

jandrej commented on August 28, 2024

The last commit seemed to clear the errors for clang! I don't see any warnings/errors anymore during the runs of the examples.

cnpetra commented on August 28, 2024

great! closing the issue.
