gridap / GridapDistributed.jl

Parallel distributed-memory version of Gridap

License: MIT License

Julia 99.86% Shell 0.14%
parallel mpi parallel-computing distributed finite-elements partial-differential-equations

GridapDistributed.jl's Introduction

GridapDistributed

Parallel distributed-memory version of Gridap.jl.

Purpose

GridapDistributed.jl provides a fully-parallel distributed-memory extension of the Gridap.jl library. It allows users to approximate PDEs on parallel computers, from multi-core CPU desktop computers to HPC clusters and supercomputers. The package is designed to be as non-intrusive as possible. As a result, sequential Julia scripts written in the high-level API of Gridap.jl can be used almost verbatim, up to minor adjustments, in a parallel context using GridapDistributed.jl.

At present, GridapDistributed.jl provides scalable parallel data structures for grid handling, finite element space setup, and distributed linear system assembly. For the latter, i.e., global distributed sparse matrices and vectors, GridapDistributed.jl relies on PartitionedArrays.jl as the distributed linear algebra backend.

Documentation

GridapDistributed.jl and Gridap.jl share almost the same high-level API. We refer to the documentation of Gridap.jl for more details about the API. In the example below, we show the minor differences between the APIs of Gridap.jl and GridapDistributed.jl. We also refer to the following tutorial and the GridapDistributed.jl documentation for additional examples and rationale.

Execution modes and how to execute the program in each mode

GridapDistributed.jl driver programs can be run either in debug execution mode (very useful for developing/debugging parallel programs, see the test/sequential/ folder for examples) or in message-passing (MPI) execution mode (when you want to deploy the code on an actual parallel computer and perform a fast simulation, see the test/mpi/ folder for examples). In any case, even if you do not have access to a parallel machine, you should be able to run both modes on your local desktop/laptop.

A GridapDistributed.jl driver program written in debug execution mode, such as the one available at test/sequential/PoissonTests.jl, is executed from the terminal just like any other Julia script:

julia test/sequential/PoissonTests.jl

On the other hand, a driver program written in MPI execution mode, such as the one shown in the snippet in the next section, involves an invocation of the mpiexecjl script (see below):

mpiexecjl -n 4 julia gridap_distributed_mpi_mode_example.jl

with the appropriate number of MPI tasks, -n 4 in this particular example.

Simple example (MPI-parallel execution mode)

The following Julia code snippet solves a 2D Poisson problem in parallel on the unit square. The example follows the MPI-parallel execution mode (note the with_mpi() function call) and thus it must be executed on 4 MPI tasks (note the mesh is partitioned into 4 parts) using the instructions below. If you want to use the debug execution mode, just replace with_mpi() by with_debug(). GridapDistributed.jl debug-mode scripts are executed like any other sequential Julia script.

using Gridap
using GridapDistributed
using PartitionedArrays
function main(ranks)
  domain = (0,1,0,1)
  mesh_partition = (2,2) 
  mesh_cells = (4,4)
  model = CartesianDiscreteModel(ranks,mesh_partition,domain,mesh_cells)
  order = 2
  u((x,y)) = (x+y)^order
  f(x) = -Δ(u,x)
  reffe = ReferenceFE(lagrangian,Float64,order)
  V = TestFESpace(model,reffe,dirichlet_tags="boundary")
  U = TrialFESpace(u,V)
  Ω = Triangulation(model)
  dΩ = Measure(Ω,2*order)
  a(u,v) = ∫( ∇(v)⋅∇(u) )dΩ
  l(v) = ∫( v*f )dΩ
  op = AffineFEOperator(a,l,U,V)
  uh = solve(op)
  writevtk(Ω,"results",cellfields=["uh"=>uh,"grad_uh"=>(uh)])
end
with_mpi() do distribute 
  ranks = distribute_with_mpi(LinearIndices((4,)))
  main(ranks)
end

The domain is discretized using the parallel Cartesian-like mesh generator built into GridapDistributed. The only minimal difference with respect to the sequential Gridap script is the call to the with_mpi function of PartitionedArrays.jl right at the beginning of the program. With this function, the programmer sets up the PartitionedArrays.jl communication backend (i.e., MPI in the example) and provides a function to be run on each part (using Julia's do-block syntax in the example). The function body is equivalent to a sequential Gridap script, except for the CartesianDiscreteModel call, which in GridapDistributed also requires the ranks and mesh_partition arguments.
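For completeness, the following sketch shows the debug-mode counterpart of the driver above (an illustration, not part of the original example): with_debug() runs all parts within a single Julia process, so the code can be inspected with standard sequential debugging tools before switching to MPI.

# Debug-mode variant of the MPI driver above (sketch); reuses the main(ranks) function.
with_debug() do distribute
  ranks = distribute(LinearIndices((4,)))  # 4 parts, all handled in one process
  main(ranks)
end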

Using parallel solvers

GridapDistributed.jl is not a library of parallel linear solvers. The linear solver kernel within GridapDistributed.jl, defined with the backslash operator \, is just a sparse LU solver applied to the global system gathered on a master task (not scalable, but very useful for testing and debugging purposes).
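As an illustration, the sketch below requests this default behaviour explicitly (an assumption for illustration: it reuses the op object from the example above and Gridap's BackslashSolver, which simply applies \ to the assembled system):

# Sketch: explicitly select the backslash-based solver described above.
solver = BackslashSolver()
uh = solve(solver,op)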

Scalable linear and nonlinear solvers from the PETSc library are available through GridapPETSc.jl. For an example that combines GridapDistributed with GridapPETSc.jl, we refer to the following tutorial. Additional examples can be found in the test/ folder of GridapPETSc. Other linear solver libraries on top of GridapDistributed can be developed in the future.

Partitioned meshes

GridapDistributed.jl provides a built-in parallel generator of Cartesian-like meshes of arbitrary-dimensional, topologically n-cube domains.
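For instance, a 3D variant of the model construction used in the first example could look as follows (a sketch; the ranks object is assumed to come from with_mpi/with_debug as above, and the number of parts in mesh_partition must match the number of ranks):

# Sketch: distributed Cartesian model of the unit cube over a 2x2x1 processor layout (4 parts).
domain = (0,1,0,1,0,1)
mesh_partition = (2,2,1)
mesh_cells = (8,8,8)
model = CartesianDiscreteModel(ranks,mesh_partition,domain,mesh_cells)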

Distributed unstructured meshes are generated using GridapGmsh.jl. We also refer to GridapP4est.jl for peta-scale handling of meshes that can be decomposed as forests of quadtrees/octrees of the computational domain. Examples of distributed solvers that combine all these building blocks can be found in the following tutorial.

A more complex example (MPI-parallel execution mode)

In the following example, we combine GridapDistributed (for the parallel implementation of the PDE discretisation), GridapGmsh (for the distributed unstructured mesh), and GridapPETSc (for the linear solver step). The mesh file can be found here.

using Gridap
using GridapGmsh
using GridapPETSc
using GridapDistributed
using PartitionedArrays
function main(ranks)
  options = "-ksp_type cg -pc_type gamg -ksp_monitor"
  GridapPETSc.with(args=split(options)) do
    model = GmshDiscreteModel(ranks,"demo.msh")
    order = 1
    dirichlet_tags = ["boundary1","boundary2"]
    u_boundary1(x) = 0.0
    u_boundary2(x) = 1.0
    reffe = ReferenceFE(lagrangian,Float64,order)
    V = TestFESpace(model,reffe,dirichlet_tags=dirichlet_tags)
    U = TrialFESpace(V,[u_boundary1,u_boundary2])
    Ω = Interior(model)
    dΩ = Measure(Ω,2*order)
    a(u,v) = ∫( ∇(u)⋅∇(v) )dΩ
    l(v) = 0
    op = AffineFEOperator(a,l,U,V)
    solver = PETScLinearSolver()
    uh = solve(solver,op)
    writevtk(Ω,"demo",cellfields=["uh"=>uh])
  end
end
with_mpi() do distribute 
  ranks = distribute_with_mpi(LinearIndices((6,)))
  main(ranks)
end

Build

Before using the GridapDistributed.jl package, one needs to build the MPI.jl package. We refer to the main documentation of this package for configuration instructions.
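For instance, on a cluster one typically configures MPI.jl to use the system MPI installation; a common way to do this (an assumption here, not part of this package's instructions; check the MPI.jl documentation for your setup) is via MPIPreferences.jl:

julia --project=. -e 'using MPIPreferences; MPIPreferences.use_system_binary()'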

MPI-parallel Julia script execution instructions

In order to execute an MPI-parallel GridapDistributed.jl driver, we can leverage the mpiexecjl script provided by MPI.jl (click here for installation instructions). As an example, assuming that we are located in the root directory of GridapDistributed.jl, a hypothetical MPI-parallel GridapDistributed.jl driver named driver.jl can be executed on 4 MPI tasks as:

mpiexecjl --project=. -n 4 julia -J sys-image.so driver.jl

where -J sys-image.so is optional, but highly recommended in order to reduce JIT compilation times. Here, sys-image.so is assumed to be a Julia system image pre-generated for the driver at hand using the PackageCompiler.jl package. See the test/TestApp/compile folder for example scripts that generate such a system image, along with a test application whose source is available at test/TestApp/. These scripts are triggered from the .github/workflows/ci.yml file by the GitHub CI actions.
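For convenience, the commands below sketch one way to install mpiexecjl and to build such a system image (the package list and the use of driver.jl as precompile script are illustrative assumptions; adapt them to your driver):

julia --project=. -e 'using MPI; MPI.install_mpiexecjl()'
julia --project=. -e 'using PackageCompiler; create_sysimage([:GridapDistributed,:PartitionedArrays]; sysimage_path="sys-image.so", precompile_execution_file="driver.jl")'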

Known issues

A warning when executing MPI-parallel drivers: data race conditions may occur in the generation of precompiled modules in the cache. See here.

How to cite GridapDistributed

In order to give credit to the Gridap and GridapDistributed contributors, we simply ask you to cite the Gridap main project as indicated here and the sub-packages you use as indicated in the corresponding repositories. Please use the reference below in any publication in which you have made use of GridapDistributed:

@article{Badia2022,
  doi = {10.21105/joss.04157},
  url = {https://doi.org/10.21105/joss.04157},
  year = {2022},
  publisher = {The Open Journal},
  volume = {7},
  number = {74},
  pages = {4157},
  author = {Santiago Badia and Alberto F. Martín and Francesc Verdugo},
  title = {GridapDistributed: a massively parallel finite element toolbox in Julia},
  journal = {Journal of Open Source Software}
}

Contributing to GridapDistributed

GridapDistributed is a collaborative project open to contributions. If you want to contribute, please take into account:

  • Before opening a PR with a significant contribution, contact the project administrators by opening an issue describing what you are willing to implement. Wait for feedback from other community members.
  • We adhere to the contribution and code-of-conduct instructions of the Gridap.jl project, available here and here, respectively. Please carefully read and follow the instructions in these files.
  • Open a PR with your contribution.

Want to help? We have issues waiting for help. You can start contributing to the GridapDistributed project by solving some of those issues.

GridapDistributed.jl's Issues

To cut down non-Julia dependencies of GridapDistributed

Hi @amartinhuertas !

At some point in the future we want to cut down the number of non-Julia dependencies of GridapDistributed. My idea is that all plugins, e.g., GridapEmbedded, will have GridapDistributed among their dependencies, instead of having a distributed version of each plugin, e.g., GridapEmbedded + GridapEmbeddedDistributed, which would be too many packages. And we don't want to force the installation of, e.g., PETSc on users who want to use GridapEmbedded with a direct solver.

This can easily be done by moving non-Julia dependencies to packages like GridapPETSc. A detail to be solved is how to use, e.g., GridapPETSc in the tests of GridapDistributed without adding the former to the dependencies of the latter. This is possible in theory, but I have never explored it...

PETSc.jl Ubuntu + DistributedStokesTests.jl issues

I have found a clear lack of robustness in DistributedStokesTests.jl, at least in the status corresponding to commit 94671c9 (code available here), when combined with the PETSc binaries provided by Ubuntu on my system, in particular:

libpetsc3.7-dev/bionic,now 3.7.7+dfsg1-2build5 amd64 [installed,automatic]
libpetsc3.7.7/bionic,now 3.7.7+dfsg1-2build5 amd64 [installed,automatic]
libpetsc3.7.7-dbg/bionic 3.7.7+dfsg1-2build5 amd64
libpetsc3.7.7-dev/bionic,now 3.7.7+dfsg1-2build5 amd64 [installed,automatic]

(I did not test with other systems, nor with other versions of the Ubuntu PETSc package.)

The error has the form of a SEGFAULT within the second invocation of the run method here. You may see the error output below. It is also worth mentioning that if I comment out the two run() invocations in the middle, then the program gets stuck (deadlock) in the second run() invocation. When hitting Ctrl+C to abort the program, I get the following stack trace (presumably the point at which the program is stuck):

in expression starting at /home/amartin/git-repos/GridapDistributed.jl/test/DistributedStokesTests.jl:147
pthread_cond_signal at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
exec_blas_async at /home/amartin/software_installers/julia-1.4.0/bin/../lib/julia/libopenblas64_.so (unknown line)
exec_blas at /home/amartin/software_installers/julia-1.4.0/bin/../lib/julia/libopenblas64_.so (unknown line)
gemm_thread_m at /home/amartin/software_installers/julia-1.4.0/bin/../lib/julia/libopenblas64_.so (unknown line)
dtrsm_64_ at /home/amartin/software_installers/julia-1.4.0/bin/../lib/julia/libopenblas64_.so (unknown line)
umfdl_blas3_update at /home/amartin/software_installers/julia-1.4.0/bin/../lib/julia/libumfpack.so.5 (unknown line)
umfdl_kernel at /home/amartin/software_installers/julia-1.4.0/bin/../lib/julia/libumfpack.so.5 (unknown line)
umfpack_dl_numeric at /home/amartin/software_installers/julia-1.4.0/bin/../lib/julia/libumfpack.so.5 (unknown line)
umfpack_numeric! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/SuiteSparse/src/umfpack.jl:268
#lu#1 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/SuiteSparse/src/umfpack.jl:161
lu at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/SuiteSparse/src/umfpack.jl:155 [inlined]
numerical_setup at /home/amartin/git-repos/Gridap.jl/src/Algebra/LinearSolvers.jl:237 [inlined]
solve! at /home/amartin/git-repos/Gridap.jl/src/Algebra/LinearSolvers.jl:184 [inlined]
solve! at /home/amartin/git-repos/Gridap.jl/src/Algebra/NonlinearSolvers.jl:22
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2144 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2322
solve! at /home/amartin/git-repos/Gridap.jl/src/FESpaces/FESolvers.jl:103
solve! at /home/amartin/git-repos/Gridap.jl/src/FESpaces/FESolvers.jl:15
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2144 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2322
solve at /home/amartin/git-repos/Gridap.jl/src/FESpaces/FESolvers.jl:37
solve at /home/amartin/git-repos/Gridap.jl/src/FESpaces/FESolvers.jl:43 [inlined]
run at /home/amartin/git-repos/GridapDistributed.jl/test/DistributedStokesTests.jl:122
unknown function (ip: 0x7effd26b78ab)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2144 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2322
#23 at /home/amartin/git-repos/GridapDistributed.jl/test/DistributedStokesTests.jl:151
SequentialCommunicator at /home/amartin/git-repos/GridapDistributed.jl/src/SequentialCommunicators.jl:8
unknown function (ip: 0x7effd26b68b2)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2158 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2322
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1692 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:369
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:458
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:409 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:817
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:911
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:814
jl_eval_module_expr at /buildworker/worker/package_linux64/build/src/toplevel.c:181
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:640
jl_parse_eval_all at /buildworker/worker/package_linux64/build/src/ast.c:872
jl_load at /buildworker/worker/package_linux64/build/src/toplevel.c:872
include at ./Base.jl:377
exec_options at ./client.jl:288
_start at ./client.jl:484
jfptr__start_30059 at /home/amartin/git-repos/Gridap.jl/compile/Gridapv0.12.0.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2144 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2322
unknown function (ip: 0x401931)
unknown function (ip: 0x401533)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4015d4)
unknown function (ip: (nil))
Allocations: 72926458 (Pool: 72918467; Big: 7991); GC: 85

I tried to find the cause of the problem, but I have not succeeded so far. It seems to be happening in the numerical solution stage within UMFPACK. The surprising thing is that the issue does not seem to be reproduced either with (1) my own compilation of PETSc 3.9.0 (which was not compiled with UMFPACK support) or (2) the PETSc binaries in Travis (which, as far as I know, are also compiled with UMFPACK support). My hypothesis (closer to speculation than to an actual fact) is that the UMFPACK compiled within PETSc might somehow be conflicting with the UMFPACK within the Julia REPL, but I do not have any evidence in this direction.

1.3061638074924613e-30 < 1.0e-9
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run 
[0]PETSC ERROR: to get more information on the crash.
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Signal received
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.7.7, Sep, 25, 2017 
[0]PETSC ERROR: julia on a x86_64-linux-gnu-real named sistemas-ThinkPad-X1-Carbon-6th by amartin Fri Jul 17 18:41:22 2020
[0]PETSC ERROR: Configure options --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --with-silent-rules=0 --libdir=${prefix}/lib/x86_64-linux-gnu --libexecdir=${prefix}/lib/x86_64-linux-gnu --with-maintainer-mode=0 --with-dependency-tracking=0 --with-debugging=0 --shared-library-extension=_real --with-clanguage=C++ --with-shared-libraries --with-pic=1 --useThreads=0 --with-fortran-interfaces=1 --with-mpi-dir=/usr/lib/x86_64-linux-gnu/openmpi --with-blas-lib=-lblas --with-lapack-lib=-llapack --with-scalapack=1 --with-scalapack-lib=-lscalapack-openmpi --with-mumps=1 --with-mumps-include="[]" --with-mumps-lib="-ldmumps -lzmumps -lsmumps -lcmumps -lmumps_common -lpord" --with-suitesparse=1 --with-suitesparse-include=/usr/include/suitesparse --with-suitesparse-lib="-lumfpack -lamd -lcholmod -lklu" --with-spooles=1 --with-spooles-include=/usr/include/spooles --with-spooles-lib=-lspooles --with-ptscotch=1 --with-ptscotch-include=/usr/include/scotch --with-ptscotch-lib="-lptesmumps -lptscotch -lptscotcherr" --with-fftw=1 --with-fftw-include="[]" --with-fftw-lib="-lfftw3 -lfftw3_mpi" --with-superlu=1 --with-superlu-include=/usr/include/superlu --with-superlu-lib=-lsuperlu --with-hdf5=1 --with-hdf5-dir=/usr/lib/x86_64-linux-gnu/hdf5/openmpi --CXX_LINKER_FLAGS=-Wl,--no-as-needed --with-hypre=1 --with-hypre-include=/usr/include/hypre --with-hypre-lib="-lHYPRE_IJ_mv -lHYPRE_parcsr_ls -lHYPRE_sstruct_ls -lHYPRE_sstruct_mv -lHYPRE_struct_ls -lHYPRE_struct_mv -lHYPRE_utilities" --prefix=/usr/lib/petscdir/3.7.7/x86_64-linux-gnu-real PETSC_DIR=/build/petsc-vurd6G/petsc-3.7.7+dfsg1 --PETSC_ARCH=x86_64-linux-gnu-real CFLAGS="-g -O2 -fdebug-prefix-map=/build/petsc-vurd6G/petsc-3.7.7+dfsg1=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC" CXXFLAGS="-g -O2 -fdebug-prefix-map=/build/petsc-vurd6G/petsc-3.7.7+dfsg1=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC" FCFLAGS="-g -O2 -fdebug-prefix-map=/build/petsc-vurd6G/petsc-3.7.7+dfsg1=. -fstack-protector-strong -fPIC" FFLAGS="-g -O2 -fdebug-prefix-map=/build/petsc-vurd6G/petsc-3.7.7+dfsg1=. -fstack-protector-strong -fPIC" CPPFLAGS="-Wdate-time -D_FORTIFY_SOURCE=2" LDFLAGS="-Wl,-Bsymbolic-functions -Wl,-z,relro -fPIC" MAKEFLAGS=w
[0]PETSC ERROR: #1 User provided function() line 0 in  unknown file
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 59.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------

This function belongs to an interface and could not be used - non linear solver with GridapDistributed.jl

Hello, I would like to ask if it is possible to use nonlinear solvers with GridapDistributed.jl.

I'm able to solve models with AffineFEOperator, but I always get the error below (stacktrace attached) when trying to solve with NLSolver and FEOperator.

It runs fine without GridapDistributed.jl.

using Gridap
using PartitionedArrays
using GridapGmsh
using Gridap.Geometry
using GridapDistributed
using LineSearches: BackTracking
using Random
using GridapPETSc

partition = (2,2)

function euler_bernoulli(parts)
  options = "-ksp_type cg -pc_type gamg -ksp_monitor"
  GridapPETSc.with(args=split(options)) do
    # Parameters
    L = 0.48 # length
    H = 0.12 # height
    I = H^3/12 # moment of inertia
    P = 1e2 # load applied
    E = 25e2 # modulus of elasticity (Young's modulus)

    model = GmshDiscreteModel(parts,joinpath(@__DIR__,"pirelliFR_lowres3D_wpg_wg.msh"))

      order = 1

      reffe = ReferenceFE(lagrangian,VectorValue{3,Float64},order)
      V0 = TestFESpace(model,reffe;
        conformity=:H1,
        dirichlet_tags=["ground","tyre"],
        dirichlet_masks=[(false,false,true),(false,false,true)])
      
      d1(x) = VectorValue(0,0.0,0)
      d2(x) = VectorValue(0,0.0,-5e-3)
      Ug = TrialFESpace(V0,[d1,d2])
      
      degree = 2*order
      Ω = Triangulation(model)
      dΩ = Measure(Ω,degree)
      
      neumanntags = ["tyre","ground"]
      Γ = BoundaryTriangulation(model,tags=neumanntags)
      dΓ = Measure(Γ,degree)
      n_Γ = get_normal_vector(Γ)
      
      res(u,v) = ∫( E*I*(  Δ(v) ⋅ Δ(u) ) - P ⋅ ∇(v)⊙∇(u))*dΩ #+ ∫(v ⋅ (-P))*dΓ
      op = FEOperator(res,Ug,V0)
      nls = NLSolver(
        show_trace=true, method=:newton, linesearch=BackTracking())
      solver = FESolver(nls)
      Random.seed!(1234)
      x = rand(Float64,num_free_dofs(Ug))
      uh0 = FEFunction(Ug,x)
      uh, = solve!(uh0,solver,op)
      
      writevtk(Ω,"re-tire-deformation-field",cellfields=["uh"=>uh])
    end
  end

  prun(euler_bernoulli, mpi, partition)

error stacktrace

┌ Error: 
│   exception =
│    This function belongs to an interface definition and cannot be used.
│    Stacktrace:
│      [1] error(s::String)
│        @ Base ./error.jl:33
│      [2] macro expansion
│        @ ~/.julia/packages/Gridap/kce3i/src/Helpers/Macros.jl:9 [inlined]
│      [3] consistent_local_views(a::Vector{Float64}, ids::PRange{MPIData{IndexSet, 2}, Exchanger{MPIData{Vector{Int32}, 2}, MPIData{Table{Int32}, 2}}, Nothing}, isconsistent::Bool)
│        @ GridapDistributed ~/.julia/packages/GridapDistributed/d0dd5/src/Algebra.jl:33
│      [4] FEFunction(f::GridapDistributed.DistributedSingleFieldFESpace{MPIData{TrialFESpace{Gridap.FESpaces.UnconstrainedFESpace{Vector{Float64}, Gridap.FESpaces.NodeToDofGlue{VectorValue{3, Int32}}}}, 2}, PRange{MPIData{IndexSet, 2}, Exchanger{MPIData{Vector{Int32}, 2}, MPIData{Table{Int32}, 2}}, Nothing}, PVector{Float64, MPIData{Vector{Float64}, 2}, PRange{MPIData{IndexSet, 2}, Exchanger{MPIData{Vector{Int32}, 2}, MPIData{Table{Int32}, 2}}, Nothing}}}, free_values::Vector{Float64}, isconsistent::Bool) (repeats 2 times)
│        @ GridapDistributed ~/.julia/packages/GridapDistributed/d0dd5/src/FESpaces.jl:282
│      [5] (::var"#1#2"{MPIData{Int64, 2}})()
│        @ Main ~/apps/FEStudy/euler-bernoulli-mpi.jl:52
│      [6] #with#2
│        @ ~/.julia/packages/GridapPETSc/goJza/src/Environment.jl:38 [inlined]
│      [7] main_ex1(parts::MPIData{Int64, 2})
│        @ Main ~/apps/FEStudy/euler-bernoulli-mpi.jl:14
│      [8] prun(driver::typeof(main_ex1), b::MPIBackend, nparts::Tuple{Int64, Int64})
│        @ PartitionedArrays ~/.julia/packages/PartitionedArrays/fem1y/src/MPIBackend.jl:33
│      [9] top-level scope
│        @ ~/apps/FEStudy/euler-bernoulli-mpi.jl:59
│     [10] include(mod::Module, _path::String)
│        @ Base ./Base.jl:418
│     [11] exec_options(opts::Base.JLOptions)
│        @ Base ./client.jl:292
│     [12] _start()
│        @ Base ./client.jl:495
└ @ PartitionedArrays ~/.julia/packages/PartitionedArrays/fem1y/src/MPIBackend.jl:35
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
ERROR: failed process: Process(`/home/ricardo/.julia/artifacts/e3cd65be9b22b5e4a245c774d02f88f9a4928794/bin/mpiexec -n 4 julia /home/ricardo/apps/FEStudy/euler-bernoulli-mpi.jl`, ProcessExited(1)) [1]

Stacktrace:
  [1] pipeline_error
    @ ./process.jl:531 [inlined]
  [2] run(::Cmd; wait::Bool)
    @ Base ./process.jl:446
  [3] run(::Cmd)
    @ Base process.jl:444
  [4] (::var"#1#2")(exe::Cmd)
    @ Main none:4
  [5] (::MPI.var"#10#11"{var"#1#2"})(cmd::String)
    @ MPI ~/.julia/packages/MPI/08SPr/src/environment.jl:25
  [6] (::JLLWrappers.var"#2#3"{MPI.var"#10#11"{var"#1#2"}, String})()
    @ JLLWrappers ~/.julia/packages/JLLWrappers/QpMQW/src/runtime.jl:49
  [7] withenv(::JLLWrappers.var"#2#3"{MPI.var"#10#11"{var"#1#2"}, String}, ::Pair{String, String}, ::Vararg{Pair{String, String}})
    @ Base env.jl:172
  [8] withenv_executable_wrapper(f::Function, executable_path::String, PATH::String, LIBPATH::String, adjust_PATH::Bool, adjust_LIBPATH::Bool)
    @ JLLWrappers ~/.julia/packages/JLLWrappers/QpMQW/src/runtime.jl:48
  [9] invokelatest(::Any, ::Any, ::Vararg{Any}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Base ./essentials.jl:716
 [10] invokelatest(::Any, ::Any, ::Vararg{Any})
    @ Base ./essentials.jl:714
 [11] mpiexec(f::Function; adjust_PATH::Bool, adjust_LIBPATH::Bool)
    @ MPICH_jll ~/.julia/packages/JLLWrappers/QpMQW/src/products/executable_generators.jl:21
 [12] mpiexec(f::Function)
    @ MPICH_jll ~/.julia/packages/JLLWrappers/QpMQW/src/products/executable_generators.jl:21
 [13] mpiexec(fn::var"#1#2")
    @ MPI ~/.julia/packages/MPI/08SPr/src/environment.jl:25
 [14] top-level scope
    @ none:4

thank you

Move assemble_coo_with_column_owner! to PArrays?

In PR #115, we introduced in GridapDistributed a variant of PArrays' assemble_coo! named assemble_coo_with_column_owner!, which also exchanges the process column owners of the entries. I guess it would be ideal for this feature to actually go into PArrays.

Issues in MPI PETSc Communicator implementation

This thread enumerates the most important issues/limitations that I (@amartinhuertas) found during the development carried out so far in the branch mpi_petsc_communicator. The list corresponds to the current status of this branch, reflected in commit ID f595e3e. I propose to go over them in a meeting with @fverdugo and @santiagobadia.

MPIPETScDistributedIndexSet/MPIPETScDistributedVector

The current implementation of MPIPETScDistributedVector is subject to three main limitations:

  1. Its eltype (let's call it T) must be such that T<:Union{Number,AbstractVector{<:Number}}.
  2. T must be such that either sizeof(T)=sizeof(Float64) (if T<:Number) or sizeof(eltype(T))=sizeof(Float64) (if T<:AbstractVector{<:Number}), resp.
  3. [UPDATE: solved in c57f212] If T<:AbstractVector{<:Number}, the length of all entries of the MPIPETScDistributedVector must match.

(1) and (2) are caused because we are implementing MPIPETScDistributedVector using the services provided by the ghosted variant of the parallel distributed-memory PETSc vector PETSc.Vec{Float64}, which is restricted to Float64 entries. We could consider primitive element types with a size different from Float64 (e.g., Int32 or Float16), but I still have to think about how to handle the case in which the size in bytes of the local part of the vector is not a multiple of 8 (i.e., the size in bytes of Float64). Question: is such a feature actually going to be needed? Not so far (see below). On the other hand, (3) might be relatively easily circumvented using a smarter algorithm than the one available now (see here for more details) in order to create the ad-hoc MPIPETScDistributedIndexSet. Note that this smarter algorithm requires, though, additional communication.

We note that we can currently cope with the above three limitations for two main reasons:

  1. The algorithm that currently generates the global numbering of DoFs only requires exchanging vectors of type DistributedVector{Vector{Int64}} among nearest neighbours (see here, here, and here), and the FE spaces tested so far are such that we have the same number of DoFs per cell.
  2. The communication required in order to assemble the coefficient matrix and the RHS vector of the linear system is, at present, entirely managed by PETSc (here and here, resp.).

On the other hand, the current implementation is such that:

  1. [UPDATE: solved in e98d5f4] The entries of the MPIPETScDistributedVector are actually stored twice, in particular, in a Julia array, and in a PETSc.Vec data structure; see here and here, resp.

  2. The setup of an MPIPETScDistributedIndexSet requires communication, in particular here.

(1) is caused because the local numbering of entries (e.g., cells or DoFs) in GridapDistributed.jl does not match the one hard-coded in PETSc ghosted vectors, i.e., owned entries first, then ghost entries.
There are two main performance implications of (1). First, prior to communication, we have to copy the local entries from the Julia array into the PETSc.Vec array (see here), and perform the reverse operation after communication (see here). Second, right after solving a linear system using PETSc, we have to communicate (in order to get the values associated to ghost DoFs; see here), and then copy the local entries from the PETSc.Vec array to the Julia array (see here). (2) is required because the global numbering of entries (e.g., cells or DoFs) generated by GridapDistributed.jl is not such that the ones owned by the first processor are numbered first, followed by those of the second processor, and so on, as required by PETSc. As a consequence, one has to manage, in PETSc parlance, a global APP ordering and a PETSc ordering, and transform global identifiers in the APP ordering into PETSc global numbering identifiers (this is where the communication in (2) stems from). Question: is this something reasonable that we can afford, or should we look for a more efficient solution instead? (Options: force a local/global ordering, use PETSc.VecScatter{T} directly instead of ghosted vectors, etc.)

Parallel Assembly

The current parallel assembly process is implemented in the assembly_matrix_vector method (available here), dispatched for the DistributedAssembler <: Assembler data type, and in the specialization of the methods allocate_coo_vectors, finalize_coo!, and sparse_from_coo for the PETSc matrix data type PETSc.Mat{Float64}, available here. In the current implementation of the parallel assembly, there are both performance pitfalls and temporary workarounds that I had to apply in order to accommodate the current algorithm into the software design of Gridap.jl/GridapDistributed.jl. While I am positive that some of these can be solved by additional work, I believe that others are caused because I had to stick to the current way the assembly_matrix_vector method drives the assembly process. In the sequel, I will explicitly mention when I believe the corresponding issue to be caused by this.

Let us enumerate and elaborate a little bit more on them:

  1. [UPDATE: Solved in 1aae8ae] The current algorithm does not compute, before the actual numerical assembly, the (full) sparsity pattern of the rows which are locally owned by each processor. Instead, we preallocate some storage for the matrix (with a rough upper estimate of the number of nonzero elements per row), and entries are dynamically introduced into the matrix; see here and here. Our experience with FEMPAR reveals that this has a number of practical performance and memory implications that we want to avoid in the final version of the algorithm (see also issue #9).

  2. [UPDATE: Solved in 1aae8ae] As a complementary side note to 1., let me recall (so as not to forget/to reuse) that: (1) we discussed here a possible way to accommodate the final/performant algorithm so that it sticks to the current structure of assembly_matrix_vector; (2) I already wrote an implementation of this algorithm, but it does not stick to the current structure of assembly_matrix_vector, see here for more details; (3) in the most general case (non-conforming meshes, hanging DoF constraints), the final/performant algorithm requires performing an ad-hoc communication among nearest neighbours in order to set up the sparsity pattern of the locally owned rows, see here for more details. At first glance, I do not see how it can be implemented given the aforementioned limitations of MPIPETScDistributedVector.

  3. [UPDATE: Solved in 25457d3] At present, the raw matrix contributions stored in the I,J,V COO vectors are added into the global matrix on a one-by-one basis; see here. This is not acceptable from the performance point of view when dealing with PETSc matrices. To my knowledge, there is no function in the PETSc API that lets one inject all entries in the I,J,V arrays in one shot. The highest-granularity function in the PETSc API lets one add a 2D dense matrix (e.g., a cell matrix) into the global matrix in one shot, so it would be more useful if we could delimit the contribution of each cell within the I,J,V arrays, or avoid the COO arrays during the actual numerical assembly and inject the whole cell matrix.

  4. [UPDATE: Solved in 25457d3] My major objections/concerns with respect to the current structure of assembly_matrix_vector are related to the way the assembly of the RHS is handled, which is somehow borrowed from SparseMatrixAssembler and its data type extensions in Gridap.jl. In particular, this function relies on the fill_matrix_and_vector_coo_numeric!(I,J,V,b,assem,data) method, with b being the global (but distributed) vector data structure, and assem the local assembler corresponding to each MPI task, with a type (extending) SparseMatrixAssembler. There are two issues with this method. First, the contributions of the entries of the cell vector to those of the global vector are added on a one-by-one basis (see here), which is not acceptable from the performance point of view when dealing with PETSc vectors. Second, before updating a global entry, the code is written such that the entry is first read. PETSc vectors are such that you cannot read an entry that is not locally owned when using global numbering. To work around this issue, I currently have a temporary hack, which consists of re-defining _assemble_matrix_and_vector_fill! in GridapDistributed.jl (see here). This hack is not acceptable.

  5. [UPDATE: Option discarded during meeting in favor of the one that subassembles local portions of the RHS vector.] The solution that I think best balances all constraints to address 4. is to gather all cell contributions to the global vector into a pair of arrays, say (Ib, b), with the subassembled raw contributions, and then assemble them at once in a single method, say vector_from_raw_entries. This has two inherent benefits: (1) no need to expose the "finalization" of the global vector in assembly_matrix_vector (see here for more details); (2) see here, "Other issues to be discussed".

  6. (Not really an issue, but something to discuss.) I had to parameterize the type of the vals member variable of DistributedFEFunction (see here). This enables the possibility of writing a specific variant of solve! for the particular FE functions at hand; see here for more details. Is this the way to go?

Misc pending tasks associated with refactoring in branch release-0.2

Merging into master PR #50

  • Update README
  • remove src/OLD, test/OLD, compile
  • Add NEWS.md

High priority

Medium priority

  • Nicer user API for PETScSolver (to be done in GridapPETSc). In particular, better handling of the lifetime of PETSc objects.
  • Implement distributed models from gmsh (in GridapGmsh) @fverdugo
  • implement lu! for PSparseMatrix following the same strategy as with \
  • Move mpi tests to their own CI job @amartinhuertas PR #47
  • Automatically generate sysimage to reduce mpi tests runtime @amartinhuertas PR #48
  • Think how the user can define the local and global vector type in the FESpace
  • Showing the NLSolve.jl solver trace only in the master MPI rank leads to a deadlock.
  • Periodic BCs
  • Implement ZeroMeanFESpace

Low priority

  • Implement a lazier initialization of the matrix exchanger in PartitionedArrays, since the exchanger is not always needed.
  • Overlap compression of the sparse matrix with communication of RHS assembly
  • Implement another strategy to represent local matrices in PartitionedArrays with a data layout compatible with PETSc
  • interpolate_everywhere not available in GridapDistributed.jl, only interpolate. Solved in PR #74

Periodic boundary conditions on multiple processors

I am having some strange behaviour with periodic boundary conditions. For the case in this repo, I have generated the .msh file using Gmsh and imposed the periodic boundary condition on the plane z = 0 and on the parallel plane on the other side at z = const.
The code used to generate the model is in the repo (.sh file); OpenMesh.jl is the one used for the serial case. In the folder n64 there is the model printed by the code using 64 cores, and so on for the other folders. As you can see, if you look at the zm in the _0 files, there are some points also over the other plane, and the number and distribution of these "random extra points" change with the number of processors; there are none in the sequential case.

Thanks for the support

repo:
https://github.com/carlodev/naca_3d_problem.git

PSparseMatrix * PVector fails for certain case

Hi all,

The following fails with an "A check failed" assertion error:

using Gridap
using GridapDistributed
using PartitionedArrays

function main_ex1(parts)
    domain = (0,1,0,1)
    mesh_partition = (4,4)
    model = CartesianDiscreteModel(parts,domain,mesh_partition)
    order = 2
    u((x,y)) = (x+y)^order
    f(x) = -Δ(u,x)
    reffe = ReferenceFE(lagrangian,Float64,order)
    V = TestFESpace(model,reffe,dirichlet_tags="boundary")
    U = TrialFESpace(u,V)
    Ω = Triangulation(model)
    dΩ = Measure(Ω,2*order)
    a(u,v) = ∫( ∇(v)⋅∇(u) )dΩ
    l(v) = ∫( v*f )dΩ
    op = AffineFEOperator(a,l,U,V)
    uh = solve(op)
    
    K = op.op.matrix;
    U = get_free_dof_values(uh)
    K*U
    # dot(U,K*U) 
end

partition = (2,2)
with_backend(main_ex1, SequentialBackend(), partition)

This outputs:

ERROR: AssertionError: A check failed
Stacktrace:
 [1] macro expansion
   @ C:\Users\Zac\.julia\packages\PartitionedArrays\mKknH\src\Helpers.jl:59 [inlined]
 [2] mul!(c::PVector{Float64, SequentialData{Vector{Float64}, 2}, PRange{SequentialData{IndexRange, 2}, Exchanger{SequentialData{Vector{Int32}, 2}, SequentialData{Table{Int32}, 2}}, Nothing}}, a::PSparseMatrix{Float64, SequentialData{SparseMatrixCSC{Float64, Int64}, 2}, PRange{SequentialData{IndexRange, 2}, Exchanger{SequentialData{Vector{Int32}, 2}, SequentialData{Table{Int32}, 2}}, Nothing}, PRange{SequentialData{IndexRange, 2}, Exchanger{SequentialData{Vector{Int32}, 2}, SequentialData{Table{Int32}, 2}}, Nothing}, Exchanger{SequentialData{Vector{Int32}, 2}, SequentialData{Table{Int64}, 2}}}, b::PVector{Float64, SequentialData{Vector{Float64}, 2}, PRange{SequentialData{IndexSet, 2}, Exchanger{SequentialData{Vector{Int32}, 2}, SequentialData{Table{Int32}, 2}}, Nothing}}, α::Bool, β::Bool)
   @ PartitionedArrays C:\Users\Zac\.julia\packages\PartitionedArrays\mKknH\src\Interfaces.jl:2403
 [3] mul!
   @ C:\Users\Zac\AppData\Local\Programs\Julia-1.8.5\share\julia\stdlib\v1.8\LinearAlgebra\src\matmul.jl:276 [inlined]
 [4] *(a::PSparseMatrix{Float64, SequentialData{SparseMatrixCSC{Float64, Int64}, 2}, PRange{SequentialData{IndexRange, 2}, Exchanger{SequentialData{Vector{Int32}, 2}, SequentialData{Table{Int32}, 2}}, Nothing}, PRange{SequentialData{IndexRange, 2}, Exchanger{SequentialData{Vector{Int32}, 2}, SequentialData{Table{Int32}, 2}}, Nothing}, Exchanger{SequentialData{Vector{Int32}, 2}, SequentialData{Table{Int64}, 2}}}, 
b::PVector{Float64, SequentialData{Vector{Float64}, 2}, PRange{SequentialData{IndexSet, 2}, Exchanger{SequentialData{Vector{Int32}, 2}, SequentialData{Table{Int32}, 2}}, Nothing}})
   @ PartitionedArrays C:\Users\Zac\.julia\packages\PartitionedArrays\mKknH\src\Interfaces.jl:2774
 [5] main_ex1(parts::SequentialData{Int64, 2})
   @ Main c:\Users\Zac\Desktop\zacs-phd\scripts\TopOpt_Thermo_Distributed\bug.jl:24
 [6] with_backend(::typeof(main_ex1), ::SequentialBackend, ::Tuple{Int64, Int64}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ PartitionedArrays C:\Users\Zac\.julia\packages\PartitionedArrays\mKknH\src\Interfaces.jl:65
 [7] with_backend(::Function, ::SequentialBackend, ::Tuple{Int64, Int64})
   @ PartitionedArrays C:\Users\Zac\.julia\packages\PartitionedArrays\mKknH\src\Interfaces.jl:63
 [8] top-level scope
   @ c:\Users\Zac\Desktop\zacs-phd\scripts\TopOpt_Thermo_Distributed\bug.jl:29

If you instead do U = op.op.matrix\op.op.vector, K*U works as expected.

Version info:

[f9701e48] GridapDistributed v0.2.7
[5a9dfac6] PartitionedArrays v0.2.15
Julia Version 1.8.5
Commit 17cfb8e65e (2023-01-08 06:45 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 20 × 12th Gen Intel(R) Core(TM) i7-12700KF
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, goldmont)
  Threads: 20 on 20 virtual cores
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 20

Name inconsistency for the vector of spaces between MultiFieldFESpace and DistributedMultiFieldFESpace

Hi @amartinhuertas , @fverdugo,

In Gridap, the member variable that contains the vector of SingleFieldFESpace objects in the MultiFieldFESpace is called spaces, while in DistributedMultiFieldFESpace it is called field_fe_space.

In GridapODEs we are accessing this variable as U.spaces, which results in an error when U is a DistributedMultiFieldFESpace. Do you think the name can be changed to match the definition of MultiFieldFESpace? Otherwise, we'll have to define a getter for both types:

get_spaces(U::MultiFieldFESpace) = U.spaces
get_spaces(U::DistributedMultiFieldFESpace) = U.field_fe_space

What would be the best solution?

Error: Unable to perform "inner" on the given cell-fields

Hello, I have reproduced below an MWE, based on the linear-elasticity bi-material tutorial code and parallelised using GridapDistributed.jl, that shows the error It is not possible to perform the operation "inner" on the given cell fields. Any help in this regard will be highly appreciated.

using Gridap.Geometry
using GridapDistributed
using PartitionedArrays
using Gridap
using GridapGmsh

function main(parts)
      model = GmshDiscreteModel(parts,"/path/to/solid.msh")
      order = 1

      reffe = ReferenceFE(lagrangian,VectorValue{3,Float64},order)
      V0 = TestFESpace(model,reffe; conformity=:H1,dirichlet_tags=["surface_1","surface_2"], dirichlet_masks=[(true,false,false), (true,true,true)])

      g1(x) = VectorValue(0.005,0.0,0.0)
      g2(x) = VectorValue(0.0,0.0,0.0)


      U = TrialFESpace(V0,[g1,g2])

      degree = 2*order
      Ω = Triangulation(model)
      dΩ = Measure(Ω,degree)

      labels = get_face_labeling(model)
      labels = labels.labels.part

      dimension = 3
      tags = get_face_tag(labels,dimension)
      alu_tag = get_tag_from_name(labels,"material_1")

      function lame_parameters(E,ν)
           λ = (E*ν)/((1+ν)*(1-2*ν))
           μ = E/(2*(1+ν))
          (λ, μ)
      end

      E_alu = 70.0e9
      ν_alu = 0.33
      (λ_alu,μ_alu) = lame_parameters(E_alu,ν_alu)

      E_steel = 200.0e9
      ν_steel = 0.33
      (λ_steel,μ_steel) = lame_parameters(E_steel,ν_steel)

      function σ_bimat(ε,tag)
         if tag == alu_tag
             return λ_alu*tr(ε)*one(ε) + 2*μ_alu*ε
         else
              return λ_steel*tr(ε)*one(ε) + 2*μ_steel*ε
         end
      end

       a(u,v) = ∫( ε(v) ⊙ (σ_bimat∘(ε(u),tags)) )*dΩ
       l(v) = 0

       op = AffineFEOperator(a,l,U,V0)
       uh = solve(op)

       writevtk(Ω,"demo_bimat",cellfields=["uh"=>uh,"epsi"=>ε(uh),"sigma"=>σ_bimat(ε(uh),tags)])
end
partition = (1,3,2)
prun(main, mpi, partition)

`BoundaryTriangulation(model)` returns the faces in the whole subdomain boundary

At present, BoundaryTriangulation(model), where model is the local portion of a DistributedDiscreteModel, returns all faces in the whole subdomain boundary, and not only those that are in the actual boundary of the whole domain.

We are bypassing this issue at present with OwnedCellsStrategy, as we are only integrating those boundary faces of the whole subdomain boundary that belong to local cells via a TriangulationPortion.

At present, an alternative solution that I see, perhaps cleaner than the one we have now, is to define a BoundaryTriangulation(DistributedDiscreteModel) method that returns DistributedData{TriangulationPortion{GenericBoundaryTriangulation}} (easier) or DistributedData{GenericBoundaryTriangulation} (harder, as we cannot rely on identifying boundary faces as faces that lie on a single cell only).

Issues found/lessons learned with MPI-parallel Stokes solver

In this issue, I summarize the issues/lessons that I found/learned while developing MPIPETScDistributedStokesTests.jl, a parallel MPI program for the solution of the Stokes PDE. In order to avoid building a pressure FE space composed of zero-mean functions (for reasons made clear below), the script, as of today (see link above), solves a problem with a manufactured solution on a square, with Dirichlet boundary conditions everywhere except for the lower edge of the square, where we have Neumann boundary conditions. The issues are (prioritized by importance):

  • [UPDATE: Solved in 2a71fe7 1663849 c754797 0b95e0f 94df2cd e92b9c3] We are currently using a software design pattern in which we have a DistributedFESpace composed of local FE spaces. The polymorphism is in the local FE spaces (e.g., SingleFieldFESpace versus MultiFieldFESpace); DistributedFESpace is unique. DistributedFESpace glues together the parts by building a global DoF numbering across processors. While this pattern has quite a lot of expressivity, I do not see how it can be used to cleanly build a global zero-mean FE space. We need some sort of global layer of code in which the processors agree on the single DoF to be fixed, and the zero-mean constraint has to be imposed globally by computing the pressure mean/domain volume by assembling (reduce-sum) local contributions from each of the subdomains. Is it possible to accommodate these operations in the interplay of DistributedFESpace + local FE spaces? Do you foresee any other similar scenario? (e.g., FE spaces with multipoint linear constraints?)

  • [UPDATE: Solved in 2a71fe7 1663849 c754797 0b95e0f 94df2cd e92b9c3] Just guessing, the problem in the point above could be solved if we had at least two types of SingleField DistributedFESpaces (the standard one, and one that imposes the constraint), and a MultiFieldFESpace defined as the composition of global FE spaces, but this clearly breaks our original intended design pattern of having a single DistributedFESpace and, besides, it needs extra communication (see issue #3).

  • I had to define a new num_dirichlet_dofs() method for MultiFieldFESpace, even though this FE space is not a SingleFieldFESpace. This is required in this line.

  • It is also worth mentioning a very annoying BUG that drove me crazy for many hours, and that I consider convenient to document here in case it is useful for others. In particular, when restricting the multifield trial basis into its velocity and pressure components within the function that defines the bilinear form (see this line), if I use u as the variable name for the velocity (note that the code is currently using y), then this variable assignment has a side effect in the local scope of the run function: in particular, it modifies the binding of the local variable u of run previously set up in this line. This causes this line to crash later on, as u is no longer an object of type Function. (See the minimal sketch after this list.)

  • [UPDATE: This BUG has been already solved by @fverdugo in Gridap.jl] We need a bug-fix for the following issue in Gridap.jl: gridap/Gridap.jl#323

  • [UPDATE: This issue has been solved after the excellent work by @fverdugo in PR #376.] In order to let the method that generates the global DoF numbering (see here) work with local MultiFieldFESpaces, I had to modify the current implementation of MultiFieldArray{T,N} in Gridap.jl so that it now extends AbstractArray{T,N}. The particular lines in which this is required are this one and this one. This has been implemented in a branch of Gridap.jl. See here for the current implementation of MultiFieldArray{T,N}. However, the following concerns/issues remain:

  1. For obvious reasons, getindex() semantics changed from block-based to scalar-based, i.e., A[1,1] does not refer to the upper-left block in the block structure of A, but to the entry in the upper-left corner of A. This already causes some Gridap.jl tests to fail. See here for details.

  2. I added two member variables in order to accelerate indexing operations of MultiFieldArray{T,N}. However, there are use cases of this data type in which the blocks member variable is set up with undef (see here), and thus these member variables cannot be set up during creation and cannot be created later on, as the struct is immutable. In conclusion, I am not quite happy with the current solution.

  3. Again guessing, I have the impression that MultiFieldArray{T,N} is a multi-purpose data structure that is re-used in many scenarios. A blocked cell-DoFs array is just one of these scenarios. Perhaps we could create a separate ad-hoc data type for this purpose. This would solve 1., but 2. would still be there. Any ideas?
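The following minimal sketch (unrelated to the actual Stokes driver, just an illustration of the Julia scoping rule behind the BUG mentioned in the list above) shows how an assignment inside a nested function rebinds a variable of the enclosing local scope:

function scoping_sketch()
  u = x -> 2*x                 # u is initially bound to a Function
  biform(y) = (u = y; u + 1)   # this assignment rebinds the outer u
  biform(3.0)
  return u                     # returns 3.0; u is no longer a Function
end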

PETSc.jl related issues/caveats/pending tasks

In order to develop this work, and to have something to start with (i.e., I am not claiming it to be necessarily the definitive solution; perhaps something more clever is to reduce it to a subset of strictly required features), I decided to use the PETSc.jl Julia package available here, mainly developed by Jared Crean and Steven G. Johnson. See also the following issue for a bit of history. Unfortunately, the development of this project was abandoned around Oct 2016, and it was frozen at Julia v0.4 compatibility. Starting from the branch uptodate of the official PETSc.jl git repository, I had to perform a vast amount of changes to the code in this branch in order to modernize it to Julia v1.4. I did this work in the branch uptodate1.4 of my own fork of PETSc.jl, available here. Note: I have very recently become aware of the existence of PETSc2.jl, available here. This project stems from PETSc.jl, but I have not yet taken a close look at it.

Prior to the installation of the package on my machine, I had to set up the following bash environment variables before adding the package to the Pkg environment (please use your own paths).

MKLDIRPATH=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin
export LD_PRELOAD=$MKLDIRPATH/libmkl_avx2.so:$MKLDIRPATH/libmkl_def.so:$MKLDIRPATH/libmkl_sequential.so:$MKLDIRPATH/libmkl_core.so
export JULIA_MPI_BINARY=system
export JULIA_PETSC_RealDouble_DIR=/home/amartin/software_installers/petsc-3.9.2-build-dbg-gnu9-mpi
export JULIA_PETSC_RealDouble_ARCH=arch-linux2-c-debug
export JULIA_PETSC_RealSingle_NOBUILD=1
export JULIA_PETSC_ComplexDouble_NOBUILD=1

To keep the presentation concise, I am not going to thoroughly document or justify the work that I performed to rehabilitate PETSc.jl for v1.4. I will only list the issues about which I am most concerned.

  1. There are no high-level Julia wrappers for SNES (PETSc non-linear solvers and preconditioners) in PETSc.jl. I am positive that we will need them for GridapDistributed.jl.

  2. The PETSc.jl developers used (an old version of, I do not know which one) Clang.jl in order to (semi-?)automatically generate the low-level Julia wrappers for PETSc's C API (available here). The interface of these wrappers corresponds to PETSc v3.6.0, but the latest stable release is v3.15.0. While PETSc's C API is already quite stable, I have realized that there are added, removed, and changed functions in v3.15.0 versus v3.6.0. Indeed, I had to manually change the interface of some Julia wrappers to reflect that of v3.15.0 and fix some broken tests in the test suite of PETSc.jl. Ideally, we should be able to apply Clang.jl to v3.15.0, replace the newly generated wrappers within the generated source directory, and expect that everything works with minimal changes, but I am not sure whether this is the situation; see here and here for more details.

  3. [I have successfully used PETSc.jl so far with the following PETSc versions: v3.7.7, v3.9.2, v3.12.4.] I used PETSc v3.9.2 (the one installed on my machine, for convenience) all the way through in this work, i.e., I did not check that v3.15.0 also works. However, when I had to manually fix the Clang.jl-generated Julia wrappers, I made sure that the status of v3.9.2 and v3.15.0 matched for the functions at hand.

  4. [Solved in 55d8c0db739a613042a34add4958a3110d39ac4d]. When the Julia statement using PETSc triggers the precompilation of the PETSc module (typically the first time it is used in a Julia REPL session), I get two warnings from the Julia REPL whose origin I do not understand. In particular,

  julia> using PETSc
[ Info: Precompiling PETSc [743b6f9b-fae8-55fc-a981-4e9a2885e829]
WARNING: Method definition (::Type{PETSc.Mat{T} where T})(Type{T}) where {T} in module PETSc at /home/amartin/git-repos/PETSc.jl/src/mat.jl:58 overwritten at /home/amartin/git-repos/PETSc.jl/src/mat.jl:75.
  ** incremental compilation may be fatally broken for this module **

WARNING: Method definition Type##kw(Any, Type{PETSc.Mat{T} where T}, Type{T}) where {T} in module PETSc at /home/amartin/git-repos/PETSc.jl/src/mat.jl:58 overwritten at /home/amartin/git-repos/PETSc.jl/src/mat.jl:75.
  ** incremental compilation may be fatally broken for this module **
  5. [Update: I have confirmed that the error is triggered within one of the finalizer functions installed for the data types in PETSc.jl. The finalizer calls an MPI function when MPI has already been finalized. I have tried to use this without success.] When I execute the test suite of PETSc.jl from the pkg> shell with the environment of PETSc.jl activated, I get the following execution error at the end:
(PETSc) pkg> test
    Testing PETSc
Status `/tmp/jl_5zIqOe/Manifest.toml`
  ....
testing types: DataType[Float64, Float32, Complex{Float64}]
testing datatype Float64
errnum = 73
errnum = 76
Test Summary:           | Pass  Total
Error Handling{Float64} |    2      2
Test Summary: | Pass  Total
KSP{Float64}  |    6      6
Test Summary: | Pass  Total
Vec{Float64}  |   91     91
Test Summary: | Pass  Total
IS{Float64}   |   18     18
Test Summary: | Pass  Total
Mat{Float64}  |  167    167
*** The MPI_Comm_compare() function was called after MPI_FINALIZE was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[sistemas-ThinkPad-X1-Carbon-6th:10823] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
ERROR: Package PETSc errored during testing

This error, however, does not appear if I execute include("runtests.jl") from the Julia REPL, which I do not understand.

  6. [Update: This error disappears if I add the __precompile__(false) statement right at the beginning of the main module of GridapDistributed.jl. We do not want to deactivate precompilation, but does this tell you something?] Each time GridapDistributed.jl is modified, the first time I execute a parallel driver the program fails during initialization. On the second and subsequent runs, however, it works fine. I suspect that it has to do with incremental precompilation, automatic initialization, etc., but I am not totally sure about the actual cause. In particular, the error looks like:
[1,3]<stderr>:[sistemas-ThinkPad-X1-Carbon-6th:17680] PMIX ERROR: NOT-FOUND in file ptl_tcp.c at line 688
[1,3]<stderr>:[sistemas-ThinkPad-X1-Carbon-6th:17680] OPAL ERROR: Unreachable in file pmix2x_client.c at line 109
[1,3]<stderr>:*** An error occurred in MPI_Init
[1,3]<stderr>:*** on a NULL communicator
[1,3]<stderr>:*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[1,3]<stderr>:***    and potentially your MPI job)
[1,3]<stderr>:[sistemas-ThinkPad-X1-Carbon-6th:17680] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
  7. I commented out the deactivation of the PETSc error handler that PETSc.jl was performing on module initialization. See here for more details. I realized that the error messages generated by the PETSc error handler are much more detailed and helpful than the ones that PETSc.jl generates otherwise, which barely include an error code and a generic error message (see here for more details).

  8. From this commit of PETSc (petsc/petsc@2ebc710#diff-d46e9870b0b2f6361c8563135bfdaa89eab41a56290d02afb6ca42f5463ea629), the value of PETSC_INT changed from 0 to 16. This has implications for the PETSc Julia wrappers, which have to define the associated constant accordingly. According to the PETSc release dates, this change applies from v3.10.3 onwards.
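Regarding this last point, a minimal sketch of what the wrapper-side constant has to do (petsc_version is assumed to be detected at build/configure time; this is not the actual wrapper code):

petsc_version = v"3.15.0"                       # assumed to be detected at build time
const PETSC_INT = petsc_version >= v"3.10.3" ? Cint(16) : Cint(0)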

Finally, a list of pending tasks:

  • [Update: solved in gridap/GridapDistributedPETScWrappers.jl@68a8d3d] The current build.jl script is a mess, with advanced (e.g., auto-compilation of PETSc from source) but obsolete features that I do not even know whether they work or can be modified to work. I would reduce it to the minimum, assuming that PETSc has already been installed on the system where PETSc.jl is about to run. We could take as a basis the excellent script by @fverdugo and @vsande in GridapPETSc.jl. Click here for more details. MOST IMPORTANTLY: the current script does not allow one to customize the size of PETSc integers (i.e., it is hard-coded).

Allow access to physical groups in distributed fashion

Maybe this is possible, but I haven't found a way to do it in parallel. I am trying to adapt some parts of Tutorial 3 to a distributed setting.

The parts I want to reproduce are:

model = GmshDiscreteModel(parts, "./geometry.msh")
labels = get_face_labeling(model)
tags = get_face_tag(labels,dimension)

function σ_bimat(ε,tag)
  if tag == alu_tag
    return λ_alu*tr(ε)*one(ε) + 2*μ_alu*ε
  else
    return λ_steel*tr(ε)*one(ε) + 2*μ_steel*ε
  end
end

a(u,v) = ∫( ε(v) ⊙ (σ_bimat∘(ε(u),tags)) )*dΩ

So I want to use different parameters depending on the 3D physical group of the element. I was wondering if and how this is possible in a distributed fashion, since with this direct code I get:

ERROR: LoadError: MethodError: no method matching get_face_tag(::GridapDistributed.DistributedFaceLabeling{MPIData{FaceLabeling, 1}}, ::Int64)
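One direction I have been considering is to extract the tags part by part (a sketch only, not a confirmed GridapDistributed API; accessing the local models through model.models relies on internals, as seen in other snippets in this document):

tags = map_parts(model.models) do local_model
  local_labels = get_face_labeling(local_model)
  get_face_tag(local_labels, dimension)   # per-part tag vector for the d-faces
end

Whether these per-part tag vectors can then be fed into σ_bimat∘(ε(u),tags) inside the distributed weak form is exactly the part I do not know how to do.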

Issues found/lessons learned with MPI-parallel p-Laplacian solver using `NLsolve.jl`

In this issue, I summarize the issues/lessons that I found/learned while developing MPIPETScDistributePLaplacianTests.jl, a parallel MPI program for the solution of the p-Laplacian non-linear PDE. As agreed, the effort has been put into trying to leverage the NLsolve.jl Julia package. In the development process, as a result of marathon debugging sessions, I could explore quite deeply the call path within this package that is triggered when setting up the nonlinear solver in Gridap.jl as nls = NLSolver(show_trace=true, method=:newton). All observations refer to the current version of NLsolve.jl, which at the time of writing is v4.4.0.

Although I did not explore NLsolve.jl in full, and thus the observations below may not necessarily apply to other parts of the software package, I am not sure whether this package has been carefully developed with HPC/performance in mind, or whether it is rather conceived as a toy/academic software effort. In particular, I have the following concerns:

  • It performs a deep copy of the Jacobian matrix and the residual vector into its own data structures right at the beginning of the nonlinear solution process. See here.
  • Depending on the combination of input parameters, the Newton solver ends up performing unnecessary, costly computations. See this issue that I opened in NLsolve.jl. This is particularly severe with PETSc.jl, as transpose(A) currently does not return a lazy transpose of A but explicitly builds the transposed matrix. (This can be worked around in PETSc.jl, though.)
  • Perhaps with fault tolerance in mind, the Newton solver performs, at each iteration, an O(n) check on the current iterate in search of NaNs. See here. It would be great if this sort of check could be optionally activated/deactivated.

Apart from these concerns, NLsolve.jl makes extensive use of generic Julia interfaces, mainly the AbstractArray and broadcasting interfaces. While some of the methods in these interfaces are already efficiently supported by PETSc.jl, others are either not supported at all or not supported efficiently by the generic AbstractArray fallbacks. In order to have MPIPETScDistributePLaplacianTests.jl working, I had to overload a number of different methods for PETSc.Vec{Float64}. In the sequel, I enumerate what I had to overload and why (a generic sketch of the flavour of one such overload follows the list). At present, these method definitions are written at the driver level, but obviously we will have to move them to the appropriate packages. As I mentioned earlier, I did not go through the whole NLsolve.jl package, so it is likely that there are other methods to be defined as well if we switch to other solvers in NLsolve.jl.

  • The minimal broadcasting-related methods required to efficiently support this line. See lines start-end.
  • Defined in Base module: copyto!(...), maximum(...), any(...) for PETSc.Vec{Float64} vectors. Required to support this, this, and this line, resp.
  • Defined in LinearAlgebra module: mul! and rmul!. Required to support this, and this line, resp.
  • Defined in NLSolversBase module: x_of_nans. Required to support this line. I think this particular line could be supported by extending the broadcast machinery above with op == Number; see the following line.
  • Defined in NLsolve module: check_isfinite. Required to support this line.
  • Defined in Distances module: SqEuclidean. Required to support this line. See also here.
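To give a flavour of what these overloads boil down to, here is a generic sketch (illustrative names only, not the actual PETSc.jl or driver code): a global reduction is a local reduction over the owned entries followed by an MPI allreduce.

using MPI

# Illustrative only: local_values stands for the locally-owned entries of a
# distributed vector (e.g., the local part of a PETSc.Vec{Float64}).
function global_maximum(local_values::AbstractVector{Float64}, comm::MPI.Comm)
  local_max = isempty(local_values) ? -Inf : maximum(local_values)
  MPI.Allreduce(local_max, MPI.MAX, comm)   # every rank gets the global maximum
end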

Finally, I also have the following concerns:

  • Once the Jacobian matrix has been built for the first time, further Jacobian assembly operations are such that cell matrix entries are assembled into the global matrix on a one-by-one basis; see here and here. The same observation applies to residual assembly. I think that this may hurt performance when dealing with PETSc matrices and vectors (I did not evaluate how severe this actually is). The highest-granularity function in the PETSc API lets one add a 2D dense matrix (typically a cell matrix) into the global matrix in one shot, so it would be more performant to inject the whole cell matrix at once (see the sketch after this list). Should we modify Gridap.jl so that the granularity of these operations is increased?

  • The output generated by NLsolve.jl on screen (triggered with show_trace=true) is printed by all MPI tasks simultaneously, so the output gets messed up. See, e.g., https://travis-ci.com/github/gridap/GridapDistributed.jl/jobs/359332310#L1008
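The sketch announced in the first bullet above: a toy illustration (plain Julia arrays standing in for the global PETSc matrix, not Gridap's actual assembly code) of entry-by-entry insertion versus injecting the whole cell matrix in one shot, which is what a single MatSetValues call with ADD_VALUES allows on the PETSc side.

# Only the number of insertion calls matters for the argument.
function assemble_entrywise!(A, cell_dofs, cell_mat)
  for (j, gj) in enumerate(cell_dofs), (i, gi) in enumerate(cell_dofs)
    A[gi, gj] += cell_mat[i, j]            # one insertion per entry
  end
  A
end

function assemble_blockwise!(A, cell_dofs, cell_mat)
  A[cell_dofs, cell_dofs] .+= cell_mat     # one insertion per cell matrix
  A
end

A1, A2 = zeros(3, 3), zeros(3, 3)
cell_dofs = [1, 3]
cell_mat = [2.0 -1.0; -1.0 2.0]
assemble_entrywise!(A1, cell_dofs, cell_mat)
assemble_blockwise!(A2, cell_dofs, cell_mat)
A1 == A2   # true: same result, very different call granularity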

High-level API

Spare ideas, issues, limitations, etc.:

  • I have the feeling that the code somehow (implicitly) manages the concepts of communicator (parallel programming model) and backend. Two backends (e.g., PETSc and PartitionedArrays) might rely on the same communicator. At present, both concepts are somewhat fused; there is no clear separation between them.

  • The current way we are designing the distributed data structures leads to a large amount of boilerplate code, which is a concern.

  • It might be possible to improve the auto-generation of this boilerplate code using, e.g., metaprogramming, and better interplay mechanisms between Gridap and GridapDistributed (see, e.g., the next point).

  • It would be great if Gridap published in appropriate constants the list of unary and binary functions, etc. that it overloads in order to support the high-level API. The implementation of GridapDistributed's high-level API also requires it, and it is being copy & pasted now. With this feature, if we add more functionality to Gridap.jl's high-level API, we would automatically get it in GridapDistributed.jl. (A sketch of the idea is given after this list.)

  • I did not systematically try to support Gridap's high-level API in GridapDistributed.jl. Instead, as exploratory work, I implemented the part of the high-level API which is strictly necessary to support the implementation of the scripts in the test source directory.

  • write VTK in parallel?

  • revisit FESpaces interfaces. (no kw-args)

  • Ambiguities in Base.:(∘) overloads.

  • kwargs_triangulation... issue in Triangulation methods.

  • We do not have reduce, only gather.

  • How to separate the args required for generating the Gridap Triangulation instances from those required to adjust them so that they are suitable in a distributed context.

  • Frozen Triangulation constructor interfaces in a distributed context. We assume that part,gids,model are enough to adjust the proc-local triangulation portion no matter the strategy we are dealing with.
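As a rough illustration of the bullet above about published constants of overloaded functions (a sketch with toy names, not Gridap's actual API): Gridap could export the list of functions it overloads, and GridapDistributed could loop over it to generate its own overloads instead of copy & pasting the list.

struct ToyDistributedCellField{A}
  fields::A   # per-part objects (toy stand-in, not the real DistributedCellField)
end

const PUBLISHED_UNARY_OPS = (:sqrt, :abs, :sin)   # the list Gridap would publish

for op in PUBLISHED_UNARY_OPS
  @eval Base.$op(f::ToyDistributedCellField) = ToyDistributedCellField(map($op, f.fields))
end

sqrt(ToyDistributedCellField([1.0, 4.0])).fields   # -> [1.0, 2.0], applied part by part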

Auto-diff error in GridapDistributed code

Hello, I am attaching a MWE below where I am trying to use auto-diff in GridapDistributed code. It would be really good if there is any way to use auto-diff with GridapDistributed here. The code below throws an error when trying to do that, but works fine when I explicitly pass both the residual and the Jacobian into the TransientFEOperator() function. The error is at line 44. Any help will be really appreciated, as it would reduce the code run-time very much.

using Gridap
using Gridap.ODEs
using GridapDistributed
using PartitionedArrays
using Test
using BenchmarkTools

function main(parts)
  θ = 0.2

  u(x,t) = (1.0-x[1])*x[1]*(1.0-x[2])*x[2]*t
  u(t::Real) = x -> u(x,t)
  f(t) = x -> ∂t(u)(x,t)-Δ(u(t))(x)

  domain = (0,1,0,1)
  partition = (10,10)
  model = CartesianDiscreteModel(parts,domain,partition)

  order = 2

  reffe = ReferenceFE(lagrangian,Float64,order)
  V0 = FESpace(
    model,
    reffe,
    conformity=:H1,
    dirichlet_tags="boundary"
  )
  U = TransientTrialFESpace(V0,u)

  Ω = Triangulation(model)
  degree = 2*order
  dΩ = Measure(Ω,degree)

  #
  m(u,v) = ∫(u*v)dΩ
  a(u,v) = ∫(∇(v)⋅∇(u))dΩ
  b(t,v) = ∫(v*f(t))dΩ

  res(t,u,v) = a(u,v) + m(∂t(u),v) - b(t,v)
  jac(t,u,du,v) = a(du,v)
  jac_t(t,u,dut,v) = m(dut,v)

  op = TransientFEOperator(res,U,V0;order = 1)
  op_constant = TransientConstantMatrixFEOperator(m,a,b,U,V0)

  t0 = 0.0
  tF = 1.0
  dt = 0.1

  U0 = U(0.0)
  uh0 = interpolate_everywhere(u(0.0),U0)

  ls = LUSolver()
  ode_solver = ThetaMethod(ls,dt,θ)

  sol_t = solve(ode_solver,op,uh0,t0,tF)
  sol_t_const = solve(ode_solver,op_constant,uh0,t0,tF)

  # l2(w) = w*w

  # tol = 1.0e-6

  # for (uh_tn, tn) in sol_t
  #   e = u(tn) - uh_tn
  #   el2 = sqrt(sum( ∫(l2(e))dΩ ))
  #   @test el2 < tol
  # end

  # for (uh_tn, tn) in sol_t_const
  #   e = u(tn) - uh_tn
  #   el2 = sqrt(sum( ∫(l2(e))dΩ ))
  #   @test el2 < tol
  # end

  # createpvd("test-transient-heat-equation") do pvd  
    for (uh,t) in sol_t
      writevtk(Ω,joinpath(@__DIR__,"..","mpi_test","test-transient-heat-equation_$t"),cellfields=["u"=>uh])
      # println(evaluate(uh.fields.part,Point(0.5,0.5)))
      # println(fieldnames(typeof(Ω.model.models.part.grid)))
    end
  # end

end

# end #module
partition = (2,2)
prun(main, mpi, partition)
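For reference, the variant that works for me passes the Jacobians explicitly instead of relying on auto-diff (the names res, jac, jac_t, U, V0 are the ones defined in the MWE above):

op = TransientFEOperator(res, jac, jac_t, U, V0)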

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Use mpiexecjl

MPI.jl comes with a wrapper for mpiexec, called mpiexecjl. See here. Could this be used instead of the instructions given in the README here?

Error when interpolating DistributedFEFunction

Hi @amartinhuertas , @fverdugo,

I'm facing an error when interpolating a DistributedFEFunction. The interpolate function works well when the input is a Function, but it gives an error for an FE function defined on the same FE space. Do you have any idea why this is happening and how to solve it? Thank you!

This is a reproducer:

using Gridap
using GridapDistributed
using PartitionedArrays

parts = get_part_ids(sequential,(1,))
domain = (0,1)
cells = (2,)
𝒯 = CartesianDiscreteModel(parts,domain,cells)
refFE = ReferenceFE(lagrangian,Float64,1)
V = TestFESpace(𝒯,refFE)
U = TrialFESpace(V)
u₀(x) = 1.0
uₕ₀ = interpolate(u₀,U)
uₕ = interpolate(uₕ₀,U)

And this is the error

ERROR: LoadError: MethodError: no method matching Gridap.CellData.CellField(::GridapDistributed.DistributedCellField{PartitionedArrays.SequentialData{Gridap.FESpaces.SingleFieldFEFunction{Gridap.CellData.GenericCellField{Gridap.CellData.ReferenceDomain}}, 1}, GridapDistributed.DistributedFEFunctionData{PartitionedArrays.PVector{Float64, PartitionedArrays.SequentialData{Vector{Float64}, 1}, PartitionedArrays.PRange{PartitionedArrays.SequentialData{PartitionedArrays.IndexSet, 1}, PartitionedArrays.Exchanger{PartitionedArrays.SequentialData{Vector{Int32}, 1}, PartitionedArrays.SequentialData{PartitionedArrays.Table{Int32}, 1}}, Nothing}}}}, ::Gridap.Geometry.BodyFittedTriangulation{1, 1, Gridap.Geometry.CartesianDiscreteModel{1, Float64, typeof(identity)}, Gridap.Geometry.CartesianGrid{1, Float64, typeof(identity)}, Gridap.Arrays.IdentityVector{Int64}}, ::Gridap.CellData.ReferenceDomain)
Closest candidates are:
  Gridap.CellData.CellField(::AbstractArray{<:Number}, ::Gridap.Geometry.Triangulation, ::Gridap.CellData.DomainStyle) at C:\Users\ocolomesgene\.julia\packages\Gridap\775Gn\src\CellData\CellFields.jl:89
  Gridap.CellData.CellField(::Number, ::Gridap.Geometry.Triangulation, ::Gridap.CellData.DomainStyle) at C:\Users\ocolomesgene\.julia\packages\Gridap\775Gn\src\CellData\CellFields.jl:83
  Gridap.CellData.CellField(::Function, ::Gridap.Geometry.Triangulation, ::Gridap.CellData.DomainStyle) at C:\Users\ocolomesgene\.julia\packages\Gridap\775Gn\src\CellData\CellFields.jl:77
  ...
Stacktrace:
  [1] _cell_vals(fs::Gridap.FESpaces.UnconstrainedFESpace{Vector{Float64}, Gridap.FESpaces.NodeToDofGlue{Int32}}, object::GridapDistributed.DistributedCellField{PartitionedArrays.SequentialData{Gridap.FESpaces.SingleFieldFEFunction{Gridap.CellData.GenericCellField{Gridap.CellData.ReferenceDomain}}, 1}, GridapDistributed.DistributedFEFunctionData{PartitionedArrays.PVector{Float64, PartitionedArrays.SequentialData{Vector{Float64}, 1}, PartitionedArrays.PRange{PartitionedArrays.SequentialData{PartitionedArrays.IndexSet, 1}, PartitionedArrays.Exchanger{PartitionedArrays.SequentialData{Vector{Int32}, 1}, PartitionedArrays.SequentialData{PartitionedArrays.Table{Int32}, 1}}, Nothing}}}})
    @ Gridap.FESpaces C:\Users\ocolomesgene\.julia\packages\Gridap\775Gn\src\FESpaces\SingleFieldFESpaces.jl:193
  [2] interpolate!(object::GridapDistributed.DistributedCellField{PartitionedArrays.SequentialData{Gridap.FESpaces.SingleFieldFEFunction{Gridap.CellData.GenericCellField{Gridap.CellData.ReferenceDomain}}, 1}, GridapDistributed.DistributedFEFunctionData{PartitionedArrays.PVector{Float64, PartitionedArrays.SequentialData{Vector{Float64}, 1}, PartitionedArrays.PRange{PartitionedArrays.SequentialData{PartitionedArrays.IndexSet, 1}, PartitionedArrays.Exchanger{PartitionedArrays.SequentialData{Vector{Int32}, 1}, PartitionedArrays.SequentialData{PartitionedArrays.Table{Int32}, 1}}, Nothing}}}}, free_values::Vector{Float64}, fs::Gridap.FESpaces.UnconstrainedFESpace{Vector{Float64}, Gridap.FESpaces.NodeToDofGlue{Int32}})
    @ Gridap.FESpaces C:\Users\ocolomesgene\.julia\packages\Gridap\775Gn\src\FESpaces\SingleFieldFESpaces.jl:185
  [3] (::GridapDistributed.var"#158#159"{GridapDistributed.DistributedCellField{PartitionedArrays.SequentialData{Gridap.FESpaces.SingleFieldFEFunction{Gridap.CellData.GenericCellField{Gridap.CellData.ReferenceDomain}}, 1}, GridapDistributed.DistributedFEFunctionData{PartitionedArrays.PVector{Float64, PartitionedArrays.SequentialData{Vector{Float64}, 1}, PartitionedArrays.PRange{PartitionedArrays.SequentialData{PartitionedArrays.IndexSet, 1}, PartitionedArrays.Exchanger{PartitionedArrays.SequentialData{Vector{Int32}, 1}, PartitionedArrays.SequentialData{PartitionedArrays.Table{Int32}, 1}}, Nothing}}}}})(V::Gridap.FESpaces.UnconstrainedFESpace{Vector{Float64}, Gridap.FESpaces.NodeToDofGlue{Int32}}, vec::Vector{Float64})
    @ GridapDistributed C:\Users\ocolomesgene\Progs\git-repos\gridap\GridapDistributed.jl\src\FESpaces.jl:356
  [4] (::Base.var"#4#5"{GridapDistributed.var"#158#159"{GridapDistributed.DistributedCellField{PartitionedArrays.SequentialData{Gridap.FESpaces.SingleFieldFEFunction{Gridap.CellData.GenericCellField{Gridap.CellData.ReferenceDomain}}, 1}, GridapDistributed.DistributedFEFunctionData{PartitionedArrays.PVector{Float64, PartitionedArrays.SequentialData{Vector{Float64}, 1}, PartitionedArrays.PRange{PartitionedArrays.SequentialData{PartitionedArrays.IndexSet, 1}, PartitionedArrays.Exchanger{PartitionedArrays.SequentialData{Vector{Int32}, 1}, PartitionedArrays.SequentialData{PartitionedArrays.Table{Int32}, 1}}, Nothing}}}}}})(a::Tuple{Gridap.FESpaces.UnconstrainedFESpace{Vector{Float64}, Gridap.FESpaces.NodeToDofGlue{Int32}}, Vector{Float64}})
    @ Base .\generator.jl:36
  [5] iterate
    @ .\generator.jl:47 [inlined]
  [6] collect(itr::Base.Generator{Base.Iterators.Zip{Tuple{Vector{Gridap.FESpaces.UnconstrainedFESpace{Vector{Float64}, Gridap.FESpaces.NodeToDofGlue{Int32}}}, Vector{Vector{Float64}}}}, Base.var"#4#5"{GridapDistributed.var"#158#159"{GridapDistributed.DistributedCellField{PartitionedArrays.SequentialData{Gridap.FESpaces.SingleFieldFEFunction{Gridap.CellData.GenericCellField{Gridap.CellData.ReferenceDomain}}, 1}, GridapDistributed.DistributedFEFunctionData{PartitionedArrays.PVector{Float64, PartitionedArrays.SequentialData{Vector{Float64}, 1}, PartitionedArrays.PRange{PartitionedArrays.SequentialData{PartitionedArrays.IndexSet, 1}, PartitionedArrays.Exchanger{PartitionedArrays.SequentialData{Vector{Int32}, 1}, PartitionedArrays.SequentialData{PartitionedArrays.Table{Int32}, 1}}, Nothing}}}}}}})
    @ Base .\array.jl:724
  [7] map(::Function, ::Vector{Gridap.FESpaces.UnconstrainedFESpace{Vector{Float64}, Gridap.FESpaces.NodeToDofGlue{Int32}}}, ::Vector{Vector{Float64}})
    @ Base .\abstractarray.jl:2948
  [8] map_parts(::Function, ::PartitionedArrays.SequentialData{Gridap.FESpaces.UnconstrainedFESpace{Vector{Float64}, Gridap.FESpaces.NodeToDofGlue{Int32}}, 1}, ::Vararg{PartitionedArrays.SequentialData})
    @ PartitionedArrays C:\Users\ocolomesgene\Progs\git-repos\gridap\PartitionedArrays.jl\src\SequentialBackend.jl:56
  [9] interpolate!
    @ C:\Users\ocolomesgene\Progs\git-repos\gridap\GridapDistributed.jl\src\FESpaces.jl:355 [inlined]
 [10] interpolate(u::GridapDistributed.DistributedCellField{PartitionedArrays.SequentialData{Gridap.FESpaces.SingleFieldFEFunction{Gridap.CellData.GenericCellField{Gridap.CellData.ReferenceDomain}}, 1}, GridapDistributed.DistributedFEFunctionData{PartitionedArrays.PVector{Float64, PartitionedArrays.SequentialData{Vector{Float64}, 1}, PartitionedArrays.PRange{PartitionedArrays.SequentialData{PartitionedArrays.IndexSet, 1}, PartitionedArrays.Exchanger{PartitionedArrays.SequentialData{Vector{Int32}, 1}, PartitionedArrays.SequentialData{PartitionedArrays.Table{Int32}, 1}}, Nothing}}}}, f::GridapDistributed.DistributedSingleFieldFESpace{PartitionedArrays.SequentialData{Gridap.FESpaces.UnconstrainedFESpace{Vector{Float64}, Gridap.FESpaces.NodeToDofGlue{Int32}}, 1}, PartitionedArrays.PRange{PartitionedArrays.SequentialData{PartitionedArrays.IndexSet, 1}, PartitionedArrays.Exchanger{PartitionedArrays.SequentialData{Vector{Int32}, 1}, PartitionedArrays.SequentialData{PartitionedArrays.Table{Int32}, 1}}, Nothing}, PartitionedArrays.PVector{Float64, PartitionedArrays.SequentialData{Vector{Float64}, 1}, PartitionedArrays.PRange{PartitionedArrays.SequentialData{PartitionedArrays.IndexSet, 1}, PartitionedArrays.Exchanger{PartitionedArrays.SequentialData{Vector{Int32}, 1}, PartitionedArrays.SequentialData{PartitionedArrays.Table{Int32}, 1}}, Nothing}}})
    @ GridapDistributed C:\Users\ocolomesgene\Progs\git-repos\gridap\GridapDistributed.jl\src\FESpaces.jl:350

TO-THINK: DistributedDiscreteModel also handles gids for lower dimensional objects

@fverdugo ... TO-Think ...

The current implementation of DistributedDiscreteModel only handles a global numbering of the cells.

Given that a DistributedDiscreteModel represents not only the cells, but all the objects of lower dimension in the mesh, wouldn't it be reasonable for DistributedDiscreteModel to handle a global numbering for all objects?

Let me know your thoughts ...

Infinite recursion in `get_comm` definition

Hi @fverdugo ...

the following construction:

# return an object for which
# its restriction to the parts of a communicator is defined.
# The returned object is not necessarily an instance
# of DistributedData
# Do nothing by default.
function get_distributed_data(object)
  object
end

get_comm(a) = get_comm(get_distributed_data(a))

leads to infinite recursion with the default definition of get_distributed_data, if get_comm is not defined. Here I foresee two possible options (perhaps there are more):

  1. get_distributed_data becomes an @abstractmethod.
    or
  2. get_comm becomes an @abstractmethod (see the sketch below).
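A minimal sketch of option 2, assuming Gridap.Helpers' @abstractmethod macro (which throws a descriptive error) can be imported like this:

using Gridap.Helpers: @abstractmethod   # assumption: the macro is reachable from here

# Option 2: a type that forgets to overload get_comm fails with a clear error
# instead of recursing through the get_distributed_data fallback.
get_comm(a) = @abstractmethod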

On the other hand, at several places we have sentences like comm = get_comm(get_distributed_data(first(args))). Aren't these somewhat redundant if we keep the current definition of get_comm?

To be honest, I do not understand the overall logic of the interplay of get_distributed_data and get_comm ...

Unable to create *.pvd files.

Hello, I am unable to create *.pvd files with the following demo code. The code runs correctly, but no *.pvd files are generated.

using Gridap
using Gridap.ODEs
using GridapDistributed
using PartitionedArrays
using Test

function main(parts)
  θ = 0.2

  u(x,t) = (1.0-x[1])*x[1]*(1.0-x[2])*x[2]*t
  u(t::Real) = x -> u(x,t)
  f(t) = x -> ∂t(u)(x,t)-Δ(u(t))(x)

  domain = (0,1,0,1)
  partition = (4,4)
  model = CartesianDiscreteModel(parts,domain,partition)

  order = 2

  reffe = ReferenceFE(lagrangian,Float64,order)
  V0 = FESpace(
    model,
    reffe,
    conformity=:H1,
    dirichlet_tags="boundary"
  )
  U = TransientTrialFESpace(V0,u)

  Ω = Triangulation(model)
  degree = 2*order
  dΩ = Measure(Ω,degree)

  #
  m(u,v) = ∫(u*v)dΩ
  a(u,v) = ∫(∇(v)⋅∇(u))dΩ
  b(t,v) = ∫(v*f(t))dΩ

  res(t,u,v) = a(u,v) + m(∂t(u),v) - b(t,v)
  jac(t,u,du,v) = a(du,v)
  jac_t(t,u,dut,v) = m(dut,v)

  op = TransientFEOperator(res,jac,jac_t,U,V0)
  op_constant = TransientConstantMatrixFEOperator(m,a,b,U,V0)

  t0 = 0.0
  tF = 1.0
  dt = 0.1

  U0 = U(0.0)
  uh0 = interpolate_everywhere(u(0.0),U0)

  ls = LUSolver()
  ode_solver = ThetaMethod(ls,dt,θ)

  sol_t = solve(ode_solver,op,uh0,t0,tF)
  sol_t_const = solve(ode_solver,op_constant,uh0,t0,tF)

  l2(w) = w*w

  tol = 1.0e-6

  for (uh_tn, tn) in sol_t
    e = u(tn) - uh_tn
    el2 = sqrt(sum( ∫(l2(e))dΩ ))
    print(el2)
    @test el2 < tol
  end

  for (uh_tn, tn) in sol_t_const
     e = u(tn) - uh_tn
     el2 = sqrt(sum( ∫(l2(e))dΩ ))
     @test el2 < tol
  end
   
createpvd("test-transient-heat-equation") do pvd  
  for (uh,t) in sol_t
    createvtk(Ω,"test-transient-heat-equation_$t"*".vtu",cellfields=["u"=>uh])
  end
end

end # function main
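For reference, a hedged sketch of how this loop would look following the distributed pvd draft further below (the names createpvd/createvtk follow that draft, not a released API). Two differences I notice with respect to the snippet above: createpvd receives the parts object, and the collection is filled via pvd[t], which the snippet above never does.

createpvd(parts, "test-transient-heat-equation") do pvd
  for (uh, t) in sol_t
    pvd[t] = createvtk(Ω, "test-transient-heat-equation_$t", cellfields=["u"=>uh])
  end
end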

Upgrade to Gridap 0.16 issues

Hi @fverdugo, @santiagobadia,

FYI ... in PR #32, I ported GridapDistributed.jl to Gridap 0.16.3. Along the way, I annotated a set of concerns that you can find in the list below. Apart from solving these, I think it would be great to talk about the priorities/future plans for the development of GridapDistributed.jl.

These are the concerns I detected along the way:

  • SparseMatrixAssembler in Gridap v0.16 no longer conceptually holds the test and trial FE spaces inside, while DistributedSparseMatrixAssembler does.
  • Introduction of ArtificeSparseMatrixToLeverageDefaults. Is there something more clever to solve this issue?
  • Review the need for the allocate_local_vector function and the workflow associated with it. Is there a better way to design this, taking into account the way assembly operations have been redesigned in Gridap v0.16?
  • I am skipping symbolic_loop_vector! all the way through.
  • [Solved in cc56f4d] Bilinear and linear functions for multifield problems cannot be iterated over (i.e., tuples) as in Gridap.
  • I had to recover Reindexed and some of the associated machinery from Gridap 0.14.3. I need it for GridapDistributed.jl. Should we bypass this need? If yes, how?
  • Review DistributedFEFunctions and their current hierarchy.
  • Gridap.Algebra.add_entry!: We are inserting entries one by one. To see how we can introduce all cell entries at once with the new assembly interface.

Draft for generating pvd collection files

Draft for generating pvd collection files. Cc @oriolcg

parts = nothing # For serial
parts = get_part_ids(...) # For distributed

#Serial only
model = CartesianDiscreteModel(domain,cells)

#Serial and parallel
model = CartesianDiscreteModel(parts,domain,cells)

# Serial only
createpvd("results") do pvd
  for (x,t) in xt
    pvtk = createvtk(Ω,"filename_$t",cellfields=["x"=>x])
    pvd[t] = pvtk
  end
end

#Serial and parallel
createpvd(parts,"results") do pvd
  for (x,t) in xt
    pvtk = createvtk(Ω,"filename_$t",cellfields=["x"=>x])
    pvd[t] = pvtk
  end
end


# Serial. #TODO the versions taking nothing

function createpvd(args...;kwargs...)
  paraview_collection(args...;kwargs...)
end

function savepvd(pvd)
  vtk_save(pvd)
end


# Parallel

struct DistributedPvd{T}
  pvds::T   # per-part paraview_collection objects (only the main part holds one)
end

function createpvd(parts::AbstractPData,args...;kwargs...)
  pvds = map_main(parts) do part
    paraview_collection(args...;kwargs...)
  end
  DistributedPvd(pvds)
end

function savepvd(pvd::DistributedPvd)
  map_main(pvd.pvds) do pvd
    vtk_save(pvd)
  end
end

function createpvd(f,parts::AbstractPData,args...;kwargs...)
  pvd = createpvd(parts,args...;kwargs...)
  try
    f(pvd)
  finally
    savepvd(pvd)
  end
end

function Base.setindex!(pvd::DistributedPvd,pvtk::AbstractPData,t)
  map_parts(vtk_save,pvtk)
  map_main(pvtk,pvd.pvds) do pvtk,pvd
    pvd[t] = pvtk
  end
end

Truly global FE spaces, including multifield FE spaces

Related to issue #24

I have a first beta release of the distributed Stokes tests using truly global distributed multifield FE spaces. You can, e.g., see the differences between the previous and the new version in lines 52-57 and lines 53-57, resp.

Recall that the motivation of this work resides in the following two points (quoting directly from issue #24):

  • We are currently using a software design pattern in which we have a DistributedFESpace composed of local FE spaces. The polymorphism is in the local FE spaces (e.g., SingleFieldFESpace versus MultiFieldFESpace). DistributedFESpace is unique. DistributedFESpace glues together the parts by building a global DoF numbering across processors. While this pattern has quite a lot of expressivity, I do not see how it can be used to cleanly build a global zero-mean FE space. We need some sort of global layer of code in which the processors agree on the single DoF to be fixed, and the zero-mean constraint imposition has to be computed globally by computing the pressure mean/domain volume from assembled (reduce-sum) local contributions of each of the subdomains. Is it possible to accommodate these operations in the interplay of DistributedFESpace + local FE spaces? Do you foresee any other similar scenario (e.g., FE spaces with multipoint linear constraints)?

  • Just guessing, the problem in the point above could be solved if we had at least two types of SingleField DistributedFESpaces (the standard one, and the one that imposes the constraint), and a MultiFieldFESpace defined as the composition of global FE spaces, but this clearly breaks our original intended design pattern of having a single DistributedFESpace and, besides, it needs extra communication (see issue #3).

There are some points that I would like to discuss in a meeting; some of them are issues to be resolved, others not necessarily:

  • We now have a hierarchy of global distributed FE spaces rooted at the abstract type DistributedFESpace. Currently, this data type only has three subtypes. I would like to discuss with you the current status of the hierarchy, how to appropriately design it to naturally accommodate growth, the API of the different types, etc. Also, discuss the differences and common aspects between the hierarchy of FE spaces in Gridap.jl and GridapDistributed.jl. What else has to be implemented in GridapDistributed.jl that can be expressed in Gridap.jl but cannot be expressed with the current machinery in GridapDistributed.jl? E.g., what happens with global spaces with linear multi-point constraints?

  • [UPDATE: We decided the trait is not necessary at the DistributedFESpace level; the one of the sequential version is to be re-used. The second issue has been solved in fbac993] Issue: I guess that MultiFieldDistributedFESpace should have a trait-like type parameter in the spirit of its sequential counterpart to control the generation of the global DoF identifiers of the multifield FE space, namely the so-called MultiFieldStyle type parameter. At present, we are generating the global DoF identifiers of MultiFieldDistributedFESpace in a hard-coded way that follows the strategy here to assign DoFs to processor ownership. Essentially, within each processor's local portion, we have first the DoFs of the first field, then those of the second, and so on. In other words, the global vector is ordered and partitioned among processors as follows: [F1_P0, F2_P0, F3_P0 ... FN_P0 | F1_P1, F2_P1 ... FN_P1 | ... | F1_Pp, F2_Pp ... FN_Pp]. I wonder how flexible this should be in order to support all possible solvers in PETSc, e.g., GAMG for multifield PDEs.

  • [UPDATE: solved in 392108e] Issue: At several points (in particular, here, here, here and here) we are assuming in a hard-coded way that the MultiFieldStyle is ConsecutiveFieldStyle. Clearly, we should be able to write code in GridapDistributed.jl which is independent of the MultiFieldStyle of the local FE spaces, but I am afraid that in order to achieve such a goal we should improve the API and abstractions of MultiFieldFESpace in Gridap.jl.

  • [UPDATE: We decided the trait is not necessary at the DistributedFESpace level; the one of the sequential version is to be re-used, thus most of this point no longer applies.] Issue: in multifield FE spaces there is an operation to be abstracted towards single-field FE space code re-use, namely to restrict the DoF values of a multifield FE function to a single field. Following Gridap.jl, I have named this operation restrict_to_field. See here and here. I would like to discuss the current interfaces of these functions; I guess that we should also dispatch on MultiFieldStyle, what else? Do you foresee limitations in the current interfaces? I do not like having communicator-dependent code in MultiFieldDistributedFESpace; how can we avoid that?

New tag for interface faces

Hello all,

Currently, the face labeling of a DistributedDiscreteModel contains a "boundary" tag which contains the entities that belong to the physical boundary (i.e., the boundary of the global model). The faces belonging to the interface between processors (i.e., belonging to a local boundary but not to the global boundary) currently belong to the "interior" entity.

It would probably be beneficial to be able to distinguish these faces by creating a new entity/tag ("ghost_boundary", "interface", ...?), for instance when implementing DD methods, patch-based smoothers, etc.

The question would then be how to select the "interface" faces in general for any distributed model (dimensions, conformal/non-conformal, etc.).
Ideas are most welcome!

Missing DistributedData.jl file

Hi @fverdugo ! ... I guess that in commit b39ed64 you forgot to add the DistributedData.jl file. This is what I get when I execute using GridapDistributed in the Julia REPL:

[ Info: Precompiling GridapDistributed [f9701e48-63b3-45aa-9a63-9bc6c271f355] ERROR: LoadError: could not open file /home/amartin/git-repos/GridapDistributed.jl/src/DistributedData.jl

evaluate function not working

Hello, below I am attaching a minimum working example based on GridapDistributed.jl where I want to evaluate uh at a given point, which gives me an error. I would like to know if there is any other way of implementing the evaluate function for the type GridapDistributed.DistributedCellField.

using Gridap
using Gridap.ODEs
using GridapDistributed
using PartitionedArrays
using Test
using BenchmarkTools
# GridapDistributed.PRange
function main(parts)
  θ = 0.2

  u(x,t) = (1.0-x[1])*x[1]*(1.0-x[2])*x[2]*t
  u(t::Real) = x -> u(x,t)
  f(t) = x -> ∂t(u)(x,t)-Δ(u(t))(x)

  domain = (0,1,0,1)
  partition = (10,10)
  model = CartesianDiscreteModel(parts,domain,partition)

  order = 2

  reffe = ReferenceFE(lagrangian,Float64,order)
  V0 = FESpace(
    model,
    reffe,
    conformity=:H1,
    dirichlet_tags="boundary"
  )
  U = TransientTrialFESpace(V0,u)

  Ω = Triangulation(model)
  degree = 2*order
  dΩ = Measure(Ω,degree)

  #
  m(u,v) = ∫(u*v)dΩ
  a(u,v) = ∫(∇(v)⋅∇(u))dΩ
  b(t,v) = ∫(v*f(t))dΩ

  res(t,u,v) = a(u,v) + m(∂t(u),v) - b(t,v)
  jac(t,u,du,v) = a(du,v)
  jac_t(t,u,dut,v) = m(dut,v)

  op = TransientFEOperator(res,jac,jac_t,U,V0)
  op_constant = TransientConstantMatrixFEOperator(m,a,b,U,V0)

  t0 = 0.0
  tF = 1.0
  dt = 0.1

  U0 = U(0.0)
  uh0 = interpolate_everywhere(u(0.0),U0)

  ls = LUSolver()
  ode_solver = ThetaMethod(ls,dt,θ)

  sol_t = solve(ode_solver,op,uh0,t0,tF)
  sol_t_const = solve(ode_solver,op_constant,uh0,t0,tF)

  # l2(w) = w*w

  # tol = 1.0e-6

  # for (uh_tn, tn) in sol_t
  #   e = u(tn) - uh_tn
  #   el2 = sqrt(sum( ∫(l2(e))dΩ ))
  #   @test el2 < tol
  # end

  # for (uh_tn, tn) in sol_t_const
  #   e = u(tn) - uh_tn
  #   el2 = sqrt(sum( ∫(l2(e))dΩ ))
  #   @test el2 < tol
  # end

  # createpvd("test-transient-heat-equation") do pvd  
    for (uh,t) in sol_t
      writevtk(Ω,joinpath(@__DIR__,"..","mpi_test","test-transient-heat-equation_$t"),cellfields=["u"=>uh])
      println(evaluate(uh,Point(0.5,0.5)))
    end
  # end

end

# end #module
partition = (2,2)
# @btime prun(main, mpi, partition)
prun(main, mpi, partition)

Changes in Gridap#master break Heat equation tests

Hello all,

Just a heads up, the changes to Gridap#master introduced in this PR break the current GridapDistributed tests...
The error can be reproduced by running the tests on the following branch, which is master but tracking Gridap#master instead of the current release.

This should probably be fixed before the next release of Gridap.

The log of the error:

HeatEquation: Error During Test at /home/user1/Documents/GridapDistributed.jl/test/sequential/runtests.jl:31
  Got exception outside of a @test
  LoadError: This function is not yet implemented
  Stacktrace:
    [1] error(s::String)
      @ Base ./error.jl:35
    [2] macro expansion
      @ ~/.julia/packages/PartitionedArrays/sB3N8/src/Helpers.jl:20 [inlined]
    [3] getindex(a::PartitionedArrays.PVector{Float64, PartitionedArrays.SequentialData{Vector{Float64}, 2}, PartitionedArrays.PRange{PartitionedArrays.SequentialData{PartitionedArrays.IndexSet, 2}, PartitionedArrays.Exchanger{PartitionedArrays.SequentialData{Vector{Int32}, 2}, PartitionedArrays.SequentialData{PartitionedArrays.Table{Int32}, 2}}, Nothing}}, gid::Int64)
      @ PartitionedArrays ~/.julia/packages/PartitionedArrays/sB3N8/src/Interfaces.jl:1715
    [4] _broadcast_getindex
      @ ./broadcast.jl:636 [inlined]
    [5] _getindex
      @ ./broadcast.jl:666 [inlined]
    [6] _broadcast_getindex
      @ ./broadcast.jl:642 [inlined]
    [7] _getindex
      @ ./broadcast.jl:666 [inlined]
    [8] _broadcast_getindex
      @ ./broadcast.jl:642 [inlined]
    [9] getindex
      @ ./broadcast.jl:597 [inlined]
   [10] macro expansion
      @ ./broadcast.jl:961 [inlined]
   [11] macro expansion
      @ ./simdloop.jl:77 [inlined]
   [12] copyto!
      @ ./broadcast.jl:960 [inlined]
   [13] copyto!
      @ ./broadcast.jl:913 [inlined]
   [14] materialize!
      @ ./broadcast.jl:871 [inlined]
   [15] materialize!
      @ ./broadcast.jl:868 [inlined]
   [16] solve_step!(uf::PartitionedArrays.PVector{Float64, PartitionedArrays.SequentialData{Vector{Float64}, 2}, PartitionedArrays.PRange{PartitionedArrays.SequentialData{PartitionedArrays.IndexSet, 2}, PartitionedArrays.Exchanger{PartitionedArrays.SequentialData{Vector{Int32}, 2}, PartitionedArrays.SequentialData{PartitionedArrays.Table{Int32}, 2}}, Nothing}}, solver::Gridap.ODEs.ODETools.ThetaMethod, op::Gridap.ODEs.TransientFETools.ODEOpFromFEOp{Gridap.ODEs.ODETools.Nonlinear}, u0::PartitionedArrays.PVector{Float64, PartitionedArrays.SequentialData{Vector{Float64}, 2}, PartitionedArrays.PRange{PartitionedArrays.SequentialData{PartitionedArrays.IndexSet, 2}, PartitionedArrays.Exchanger{PartitionedArrays.SequentialData{Vector{Int32}, 2}, PartitionedArrays.SequentialData{PartitionedArrays.Table{Int32}, 2}}, Nothing}}, t0::Float64, cache::Nothing)
      @ Gridap.ODEs.ODETools ~/.julia/packages/Gridap/3B6HJ/src/ODEs/ODETools/ThetaMethod.jl:47
   [17] solve_step!(uF::PartitionedArrays.PVector{Float64, PartitionedArrays.SequentialData{Vector{Float64}, 2}, PartitionedArrays.PRange{PartitionedArrays.SequentialData{PartitionedArrays.IndexSet, 2}, PartitionedArrays.Exchanger{PartitionedArrays.SequentialData{Vector{Int32}, 2}, PartitionedArrays.SequentialData{PartitionedArrays.Table{Int32}, 2}}, Nothing}}, solver::Gridap.ODEs.ODETools.ThetaMethod, op::Gridap.ODEs.TransientFETools.ODEOpFromFEOp{Gridap.ODEs.ODETools.Nonlinear}, u0::PartitionedArrays.PVector{Float64, PartitionedArrays.SequentialData{Vector{Float64}, 2}, PartitionedArrays.PRange{PartitionedArrays.SequentialData{PartitionedArrays.IndexSet, 2}, PartitionedArrays.Exchanger{PartitionedArrays.SequentialData{Vector{Int32}, 2}, PartitionedArrays.SequentialData{PartitionedArrays.Table{Int32}, 2}}, Nothing}}, t0::Float64)
      @ Gridap.ODEs.ODETools ~/.julia/packages/Gridap/3B6HJ/src/ODEs/ODETools/ODESolvers.jl:27
   [18] iterate(sol::Gridap.ODEs.ODETools.GenericODESolution{PartitionedArrays.PVector{Float64, PartitionedArrays.SequentialData{Vector{Float64}, 2}, PartitionedArrays.PRange{PartitionedArrays.SequentialData{PartitionedArrays.IndexSet, 2}, PartitionedArrays.Exchanger{PartitionedArrays.SequentialData{Vector{Int32}, 2}, PartitionedArrays.SequentialData{PartitionedArrays.Table{Int32}, 2}}, Nothing}}})
      @ Gridap.ODEs.ODETools ~/.julia/packages/Gridap/3B6HJ/src/ODEs/ODETools/ODESolutions.jl:47
   [19] iterate(sol::Gridap.ODEs.TransientFETools.TransientFESolution)
      @ Gridap.ODEs.TransientFETools ~/.julia/packages/Gridap/3B6HJ/src/ODEs/TransientFETools/TransientFESolutions.jl:65
   [20] main(parts::PartitionedArrays.SequentialData{Int64, 2})
      @ Main.GridapDistributedTests.SequentialTests.HeatEquationTestsSeq.HeatEquationTests ~/Documents/GridapDistributed.jl/test/HeatEquationTests.jl:64
   [21] with_backend(::typeof(Main.GridapDistributedTests.SequentialTests.HeatEquationTestsSeq.HeatEquationTests.main), ::PartitionedArrays.SequentialBackend, ::Tuple{Int64, Int64}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
      @ PartitionedArrays ~/.julia/packages/PartitionedArrays/sB3N8/src/Interfaces.jl:65
   [22] with_backend(::Function, ::PartitionedArrays.SequentialBackend, ::Tuple{Int64, Int64})
      @ PartitionedArrays ~/.julia/packages/PartitionedArrays/sB3N8/src/Interfaces.jl:63
   [23] top-level scope
      @ ~/Documents/GridapDistributed.jl/test/sequential/HeatEquationTests.jl:4
   [24] include(mod::Module, _path::String)
      @ Base ./Base.jl:419
   [25] include
      @ ~/Documents/GridapDistributed.jl/test/sequential/runtests.jl:1 [inlined]
   [26] macro expansion
      @ ~/Documents/GridapDistributed.jl/test/sequential/runtests.jl:31 [inlined]
   [27] macro expansion
      @ ~/bin/julia-1.8.5/share/julia/stdlib/v1.8/Test/src/Test.jl:1363 [inlined]
   [28] macro expansion
      @ ~/Documents/GridapDistributed.jl/test/sequential/runtests.jl:31 [inlined]
   [29] top-level scope
      @ ./timing.jl:262 [inlined]
   [30] top-level scope
      @ ~/Documents/GridapDistributed.jl/test/sequential/runtests.jl:0
   [31] include(mod::Module, _path::String)
      @ Base ./Base.jl:419
   [32] include(x::String)
      @ Main.GridapDistributedTests ~/Documents/GridapDistributed.jl/test/runtests.jl:1
   [33] top-level scope
      @ ~/Documents/GridapDistributed.jl/test/runtests.jl:3
   [34] include(fname::String)
      @ Base.MainInclude ./client.jl:476
   [35] top-level scope
      @ none:6
   [36] eval
      @ ./boot.jl:368 [inlined]
   [37] exec_options(opts::Base.JLOptions)
      @ Base ./client.jl:276
   [38] _start()
      @ Base ./client.jl:522
  in expression starting at /home/user1/Documents/GridapDistributed.jl/test/sequential/HeatEquationTests.jl:1

Symbolic algorithms sparsity pattern

Consider the possibility of implementing algorithms to determine, before the actual numerical assembly, the sparsity pattern of the locally-owned rows. This becomes a must for performance and scalability in the case of PETSc.
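A minimal sketch of such a symbolic pass (not an actual GridapDistributed algorithm): count the distinct column indices of each locally-owned row from the cell-to-DoF map, which is the information PETSc's preallocation routines need.

# cell_to_dofs: global DoF ids touched by each locally-owned cell (toy input);
# owned_rows: the contiguous range of rows owned by this processor.
function count_row_nnz(cell_to_dofs::Vector{Vector{Int}}, owned_rows::UnitRange{Int})
  cols = [Set{Int}() for _ in owned_rows]
  for dofs in cell_to_dofs, i in dofs, j in dofs
    i in owned_rows && push!(cols[i - first(owned_rows) + 1], j)
  end
  map(length, cols)   # number of nonzeros per locally-owned row
end

count_row_nnz([[1, 2], [2, 3]], 1:3)   # -> [2, 3, 2]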

Misc tasks (in the lack of a better name)

I would suggest the following tasks for @amartinhuertas:

  • Create a new version of the assembly process that makes use of a GridPortion based on the owned cells (thus generating the owned vefs of lower dimension) and thus considers a sub-assembled local matrix, which can be used for DD methods. We could provide a flag when creating the distributed FE space to decide which one to use, or even compare both implementations (adding a communication in the sub-assembled one). I think he can learn many things about Gridap in this process (possibly related to some extent to issue #2).

  • Create a new version of the current approach for multifield that only performs two communications, and extend the sub-assembled approach + communication to this case too.

Retrieving/pushing global data

Hi all,
GridapDistributed looks like a great package. I'm interested in integrating it into my existing code base for level set-based topology optimisation. To do so (easily), I need to be able to retrieve free_values and cell_dof_values over the whole mesh (as in Gridap) and also to be able to push such data from the global mesh to the parts.

Is this type of operation currently possible in GridapDistributed?

Tutorials that show-case all building blocks in action

cc @fverdugo @oriolcg

To not forget: I think it would be great to write some tutorials showcasing all the building blocks in action (GridapGmsh, GridapDistributed, GridapP4est, GridapPETSc). At present, we have the building blocks there, and a set of tests/examples, but everything is not that well tied together in a single place.

  • Add link to tutorial on GridapDistributed once it is already on main.

MPIPETScDistributedStokesTests.jl crashes with PETSc 3.7.7 + OpenMPI 2.1.1 (Ubuntu Bionic default versions)

The crash is available at the following Travis job log, which corresponds to commit 2a71fe7

https://travis-ci.com/github/gridap/GridapDistributed.jl/builds/184848670

I have also been able to reproduce this crash on my local machine. I have invested some time in debugging, but I now find myself at a dead end (it seems to be some sort of inconsistency in the internal data structures PETSc uses to implement VecScatter, but I am not sure; this is just a hypothesis). With a compiled-from-source version of PETSc 3.9.2 + OpenMPI 3.1.2 this crash is not reproduced. Thus, I am going to try to update the Travis environment using PETSc_jll to see whether this error disappears. Anyway, I found it useful to document it here.

Treatment of Dirichlet values

There are no interpolate_dirichlet and interpolate_everywhere functions implemented. Is there any particular reason for that? Do they need to be implemented?

PLaplacianTests fail with Julia 1.7.3

The error is ERROR: Scalar indexing on DebugArray is not allowed for performance reasons, and it occurs at this line.

With Julia 1.9.3 it runs successfully.

The reason for this behavior is the change in the implementation of copy!(dst::AbstractVector, src::AbstractVector), which indexed dst directly in v1.7.3 (see here), whereas it calls copyto! in v1.9.3 (see here).

Possible solutions I see:

  1. increase compat
  2. implement copy! for DebugArray and MPIArray in PartitionedArrays.jl (see the sketch after this list)
  3. eliminate this overload, which is not required after this commit to PartitionedArrays.jl (although the test will fail with previous releases). I verified that copy(a::PSparseMatrix) and copy!(b::PSparseMatrix,a::PSparseMatrix) work with the latest PartitionedArrays.jl.
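A hedged sketch of option 2 above (assuming that map over two DebugArrays visits the local parts pairwise, as the map-based PartitionedArrays API suggests; signatures not verified, and the definition would of course belong in PartitionedArrays.jl, with an analogous method for MPIArray):

using PartitionedArrays

# Sketch only: delegate copy! to the underlying local vectors, part by part.
function Base.copy!(dst::PartitionedArrays.DebugArray, src::PartitionedArrays.DebugArray)
  map(copy!, dst, src)
  dst
end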

@fverdugo , @amartinhuertas , @JordiManyer any thoughts?

The "Simple" example does not run on a serial machine

I am testing this on a tablet which I use to play with software.
The example from the README that starts on line 26 does not run.
The error

ERROR: Number of MPI processors must be equivalent to number of parts.

is probably coming from MPI, but still, this being the "simple" example, there should be some way of making it run even if someone does not have access to a parallel machine.
