Comments (6)

v-dobrev commented on September 21, 2024

This looks like memory corruption from somewhere else, not from NewtonSolver. Can you try building MFEM in debug mode (option MFEM_DEBUG=YES) and running your code? In debug mode, the library performs some additional checks which may catch the issue. If this build fails with the same error, try running the code under valgrind, which can also help you find memory-related issues.

from mfem.

lmolin3 commented on September 21, 2024

You were right, the valgrind issue was caused by the MPI implementation. Apparently the Docker image I was working with had Intel MPI, which caused the trouble, while OpenMPI v4.1 works.
Thanks for the suppression file as well, it will help me spot the error in my code.

jandrej commented on September 21, 2024

https://godbolt.org/z/8a3T6rEfa

works as expected

lmolin3 commented on September 21, 2024

I know the NewtonSolver is not the problem: I've used it before without any issues. I was wondering if you have any idea where the problem may come from in this case. Thanks again!

lmolin3 commented on September 21, 2024

@v-dobrev Appreciate your help! I was already working in debug mode. I've experienced issues in the past using valgrind on MFEM in debug mode, but installing valgrind 3.22.0 seemed to solve that issue.

Running it on one of the parallel examples (ex1p) raises SIGILL in Mpi::Init(). Am I doing anything wrong?

mpirun -np 2 valgrind --tool=memcheck --verbose --trace-children=yes --leak-check=full --show-reachable=yes --track-origins=yes --log-file=/home/euler/develop/valgrind-logs/test.log ./ex1p
==99063==    at 0x65D7932: hwloc_list_special_objects (in /usr/local/lib/libhwloc.so.15.6.2)
==99063==    by 0x65D7F75: hwloc_connect_special_levels (in /usr/local/lib/libhwloc.so.15.6.2)
==99063==    by 0x65DDBA2: hwloc_topology_reconnect (in /usr/local/lib/libhwloc.so.15.6.2)
==99063==    by 0x65E2AA4: hwloc_topology_load (in /usr/local/lib/libhwloc.so.15.6.2)
==99063==    by 0x5123A1A: MPII_hwtopo_init (in /usr/local/lib/libmpi.so.12.3.0)
==99063==    by 0x50B5AB7: MPII_Init_thread (in /usr/local/lib/libmpi.so.12.3.0)
==99063==    by 0x50B577E: MPIR_Init_impl (in /usr/local/lib/libmpi.so.12.3.0)
==99063==    by 0x4EC23A8: internal_Init (in /usr/local/lib/libmpi.so.12.3.0)
==99063==    by 0x4EC2445: PMPI_Init (in /usr/local/lib/libmpi.so.12.3.0)
==99063==    by 0x1690EE: mfem::Mpi::Init_(int*, char***) (source/mfem/mesh/../general/communication.hpp:79)
==99063==    by 0x1675FF: mfem::Mpi::Init() (source/mfem/mesh/../general/communication.hpp:36)
==99063==    by 0x165E4D: main (source/mfem/examples/ex1p.cpp:72)

v-dobrev commented on September 21, 2024

My suspicion is that the valgrind + MPI issue is not MFEM-related. Can you try with a simple "hello world"-like MPI program?

We do nightly tests with valgrind + MPI and they do not show any issues. We do need to use a suppression file for some MPI-related issues and older hypre versions. Here's what it looks like:

{
   MPI-related-write
   Memcheck:Param
   writev(vector[...])
   fun:writev
}

{
   MPI-related-syscall
   Memcheck:Param
   sched_setaffinity(mask)
   fun:syscall
}

{
   MPI_Init-memcpy
   Memcheck:Overlap
   fun:memcpy
   fun:pmgr_allgather
   fun:MPID_PSM_Init
   fun:MPID_Init
   fun:MPIR_Init
   fun:main
}

{
   OMPI_Init-param-getsockopt-optlen
   Memcheck:Param
   socketcall.getsockopt(optlen)
   ...
   fun:ompi_mpi_init
   fun:PMPI_Init*
}

{
   OMPI_Init-param-getsockopt-optlen_out
   Memcheck:Param
   socketcall.getsockopt(optlen_out)
   ...
   fun:ompi_mpi_init
   fun:PMPI_Init*
}

{
   OMPI_Init-param-setsockopt-optlen
   Memcheck:Param
   setsockopt(optlen)
   ...
   fun:ompi_mpi_init
   fun:PMPI_Init*
}

{
   OMPI_Init-leak
   Memcheck:Leak
   ...
   fun:ompi_mpi_init
}

{
   OMPI_Finalize-leak
   Memcheck:Leak
   ...
   fun:ompi_mpi_finalize
}

{
   hypre-BoomerAMG-GaussElimSetup
   Memcheck:Overlap
   fun:memcpy
   fun:ompi_ddt_copy_content_same_ddt
   fun:ompi_coll_tuned_allgather_intra_bruck
   fun:PMPI_Allgather
   fun:hypre_GaussElimSetup
   fun:hypre_BoomerAMGSetup
}

{
   hypre-BoomerAMG_GMExpandInterp
   Memcheck:Leak
   match-leak-kinds: definite
   fun:calloc
   fun:hypre_CAlloc
   fun:hypre_BoomerAMG_GMExpandInterp
   fun:hypre_BoomerAMGSetup
}

What MPI implementation are you using? I had some issues with OpenMPI v5.0.x but v4.1.x should work fine. I'm not sure about MPICH, or other implementations.
