gpusph / gpusph

The world's first CUDA implementation of Weakly-Compressible Smoothed Particle Hydrodynamics

Makefile 1.89% Shell 0.59% Python 2.38% Cuda 24.98% Awk 0.08% C++ 67.46% C 2.64%

cuda gpu multi-gpu sph fsi hpc multi-platform

gpusph's Introduction

GPUSPH

What is it

This repository holds the source code for GPUSPH, the first implementation of weakly-compressible Smoothed Particle Hydrodynamics (WCSPH) to run fully on Graphics Processing Units (GPUs), using NVIDIA CUDA.

Quick start guide

Run make followed by make test to compile and run the default test problem. You can see a list of available test problems using make list-problems and run any of them with make $problem && ./GPUSPH where $problem is the selected problem.

Requirements

GPUSPH requires a recent version of the NVIDIA CUDA SDK (7.5 or higher, preferably 8.0 or higher), and a compatible host compiler. Please consult the NVIDIA CUDA documentation for further information.

Further information can be found on the project's website.

Contributing

If you wish to contribute to GPUSPH, please consult CONTRIBUTING.

gpusph's Issues

src/main.cc:64:3: error: ‘COMPUTE’ was not declared in this scope

Hello,

I have an issue installing gpusph; I'm stuck on this error:

[root@server gpusph]# make test
[CC] main.o
src/main.cc: In function ‘void show_version()’:
src/main.cc:64:3: error: ‘COMPUTE’ was not declared in this scope
COMPUTE/10, COMPUTE%10);
^
make: *** [build/main.o] Error 1
[root@server gpusph]#

[root@server gpusph]# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
[root@server gpusph]#

[root@servergpusph]# gcc --version
gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-15)

Server on CentOS 6.4

Thank you for your help.

Features of GPUSPH

Dear developers,

I recently learned about this code while reading the (Wei et al., 2017) paper. I have a few questions about the code for which I could not find clear answers in the documentation.

Can we use various particle size in v4.1? If so, is there any example available?
Any plan to release the surface tension code as described in (Wei et al., 2017)? If so, when?
How was the mooring system modeled as shown in the gallery section?
Are there any restrictions on the number of floating objects that can be used in a simulation?

Thanks.

Messy whole program structure

You are clearly not the first CFD library to ask users to code their own solver (see e.g. Palabos, ASL and OpenFOAM). However, the way GPUSPH does it is a mess...

In my opinion, GPUSPH should be refactored so that everything outside src/problems can be compiled as one or more libraries, and everything inside src/problems can subsequently be linked against those libraries.

Right now, there is some sort of "main", which basically manages everything in order to relay control to the selected problem... As a consequence, each compiled problem is a whole new GPUSPH instance... Of all the issues potentially associated with this behaviour, my favourite is error hiding.

Fluid models in GPUSPH

I have decided to work with GPUSPH and am trying to learn it in my spare time. I have a question about the fluid model used in GPUSPH.
What is the default fluid model in GPUSPH, and which models are available (pre-defined) in the software?
For example, Newtonian and non-Newtonian fluids? Bingham plastic? Herschel-Bulkley?

gcc cannot find sph

Checking for library sph
==>

int main(int argc, char **argv) {
(void)argc; (void)argv;
return 0;
}
err: /usr/bin/ld: cannot find -lsph
collect2: error: ld returned 1 exit status

Rigid bodies limitation

Hi all,
I have been using GPUSPH for a few weeks for my simulation project. It's really great, thanks indeed for making it open source.

My research is about the dynamic behaviour of rigid bodies when they hit fluids with different viscosities. For modelling rigid bodies, I see that only the Cube, Sphere and Cylinder shapes are available, and not the Cone shape. I am aware that this is not inherently a limitation of GPUSPH, but of the ODE engine (and unluckily I need to use the Cone shape!).

My question is: is it possible to make an arbitrarily shaped geometry (using Crixus) and use it as a rigid body in GPUSPH, with a Trimesh collision shape set so that ODE can take care of that object?
If such an option is not available yet, could it be considered as an enhancement for the next releases of GPUSPH?

KEPSVISC with DYN_BOUNDARY issue

Dear gpusph developers,

First of all, thank you for making an excellent tool such as gpusph available to the public.

I am having an issue with running a simulation of a Dam Break with a gate using KEPSVISC with DYN_BOUNDARY boundary condition. The code compiles just fine, as it does not seem to be recognized as an invalid combination, however, when running the simulation, the following error is thrown and the simulation stops:

"Device 0 thread 139842695374592 iteration 0 last command: 7. Exception: src/cuda/forces.cu(504) : in unbind_textures() @ thread 0x139842695374592 : cudaSafeCall() runtime API error 77 : an illegal memory access was encountered
GPUSPH aborted by worker thread"

Below is a copy of the summary file of simulation parameters:

Simulation parameters:
deltap = 0.002
sfactor = 1.3
slength = 0.0026
kerneltype: 3 (Wendland)
kernelradius = 2
influenceRadius = 0.0052
SPH formulation: 1 (F1)
viscosity type: 5 (k-e model)
periodicity: 0 (none)
initial dt = 3.12e-05
simulation end time = 1.2
neib list construction every 1 iterations
adaptive time stepping enabled
safety factor for adaptive time step = 0.3
internal energy computation disabled
XSPH correction disabled
Density diffusion disabled
Ferrari correction disabled
moving bodies enabled
open boundaries disabled
water depth computation disabled
time-dependent gravity disabled
geometric boundaries:
DEM: disabled
planes: enabled, 6 defined
Testpoints post-processing enabled

Physical parameters:
gravity = (0, 0, -9.8) [9.8] fixed
numFluids = 1
rho0[ 0 ] = 999.7
B[ 0 ] = 89258.9
gamma[ 0 ] = 7
sscoeff[ 0 ] = 25
sspowercoeff[ 0 ] = 3
sound speed[ 0 ] = 25
partsurf = 0
Dynamic boundary parameters:
r0 = 0.002
k-e model viscosity parameters:
kinematicvisc[ 0 ] = 1.30739e-06 (m^2/s)
visccoeff[ 0 ] = 1.30739e-06
I have also uploaded the code of the problem's .cu and .h files to: http://www.mediafire.com/folder/9160kahh2zp1fcf,czc6m5a4i5zl5g1/shared

I have run a similar simulation but without moving bodies using the Semi-Analytical boundary condition without any issue.

I would greatly appreciate your help figuring out a solution for this problem.

Best regards,

Jabir
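
As a consistency check on the summary above, the derived values agree with the relations commonly used in weakly-compressible SPH: slength = sfactor * deltap, influenceRadius = kernelradius * slength, and the Tait equation-of-state coefficient B = rho0 * c0^2 / gamma. The sketch below is not GPUSPH code; the struct and function names are hypothetical and only reproduce the arithmetic.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical container for the derived quantities (not a GPUSPH type).
struct DerivedParams {
    double slength;          // smoothing length
    double influence_radius; // kernel support radius
    double B;                // Tait equation-of-state coefficient
};

// Hypothetical helper: recomputes the derived values from the inputs
// listed in the summary file.
DerivedParams derive_params(double deltap, double sfactor, double kernel_radius,
                            double rho0, double c0, double gamma)
{
    DerivedParams p;
    p.slength = sfactor * deltap;
    p.influence_radius = kernel_radius * p.slength;
    p.B = rho0 * c0 * c0 / gamma;
    return p;
}
```

With the values from the summary (deltap = 0.002, sfactor = 1.3, kernelradius = 2, rho0 = 999.7, sound speed = 25, gamma = 7) this gives slength = 0.0026, influenceRadius = 0.0052 and B of about 89258.9, matching the reported numbers.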

gcc can't compile AVX instructions for Chrono (error: ‘__m256d’ was not declared...)

/usr/local/include/chrono/core/ChMatrix.h: In member function ‘void chrono::ChMatrix<Real>::MatrMultiplyAVX(const chrono::ChMatrix<double>&, const chrono::ChMatrix<double>&)’:
/usr/local/include/chrono/core/ChMatrix.h:558:17: error: ‘__m256d’ was not declared in this scope
                 __m256d sum = _mm256_setzero_pd();

Reported by 2 users, gcc 4.6 and gcc 4.8.4, latest Chrono and GPUSPH.

Workaround: add CXXFLAGS+=-march=native to Makefile.local. The cause might be a missing include in the Chrono sources; the error does not occur with gcc 4.9.
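
One plausible explanation, offered here as an assumption rather than a confirmed diagnosis, is that older gcc versions only make AVX types such as __m256d available when AVX code generation is enabled (for example through -march=native on an AVX-capable machine). The sketch below is not Chrono code; add4 is a hypothetical helper that shows how AVX intrinsics can be guarded by the __AVX__ macro so that the same source still compiles when AVX is disabled.

```cpp
#include <cassert>
#ifdef __AVX__
#include <immintrin.h>   // AVX intrinsics, only pulled in when AVX is enabled
#endif

// Hypothetical helper: element-wise sum of two arrays of 4 doubles.
// Uses AVX when the compiler was asked to generate AVX code,
// and a plain scalar loop otherwise.
void add4(const double* a, const double* b, double* out)
{
#ifdef __AVX__
    __m256d va = _mm256_loadu_pd(a);              // unaligned load of 4 doubles
    __m256d vb = _mm256_loadu_pd(b);
    _mm256_storeu_pd(out, _mm256_add_pd(va, vb)); // unaligned store of the sum
#else
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

Both paths produce the same result; only the AVX path needs the compiler flag.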

Unclosed files

In STLMesh.cc:307 and STLMesh.cc:342 you are opening (to read) files, which are apparently never closed.

It's even more dangerous in GlobalData.h:663, where a file is written but never closed.
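
A common way to avoid this class of problem in C++ is to let a stream object own the file, so that it is closed automatically when the object goes out of scope. The sketch below is not taken from GPUSPH; count_lines and write_text are hypothetical helpers used only to illustrate the pattern.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: writes text to a file.
// The std::ofstream destructor closes the file when the function returns.
bool write_text(const std::string& path, const std::string& text)
{
    std::ofstream out(path);
    if (!out.is_open())
        return false;
    out << text;
    out.flush();            // push buffered data out so errors show up in good()
    return out.good();
}

// Hypothetical helper: counts the lines of a text file, or returns -1
// if the file cannot be opened. The std::ifstream destructor closes the
// file on every return path.
long count_lines(const std::string& path)
{
    std::ifstream in(path);
    if (!in.is_open())
        return -1;

    long lines = 0;
    std::string line;
    while (std::getline(in, line))
        ++lines;
    return lines;           // `in` is destroyed here and the file is closed
}
```

No explicit close() call is needed on any path, which also covers early returns and exceptions.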

addXYZFile Trouble

Hello,

I am attempting to modify the provided CompleteSaExample problem to use the addXYZFile method, instead of the addHDF5File method, for adding semi-analytical boundaries to a problem. I created three XYZ files (just representing point clouds of x,y,z coordinates, no file extensions) from the provided hdfsph files for the example (ignoring the floating body cube at this time) and am able to upload them in the simulation for the t=0 timestep. However, within a few timesteps the fluid particles jump to high velocities and move rapidly up and out of bounds of the simulation (at least this is what it seems like).

If I comment out the fluid particles and only leave the inlet and container then no fluid particles get added, and nothing happens with the simulation. It runs fine, but nothing changes between each timestep.

Is there any documentation on using the addXYZFile method, or do you have any advice or directions you could provide me to help me get it working? If there is a different, preferred space to discuss topics like this please let me know and I will gladly move there.

Thank you!

Error: an illegal memory access was encountered

Hi all,

I am trying to model a box of mud (from the StillWater example) with a cylinder (a rigid body from ODE), and to let the cylinder fall onto the mud.

The code can be found here: https://github.com/Ehsanizadi/GPUSPH-Examples

When I run it with the "./GPUSPH" command, I get the following log:

 * No devices specified, falling back to default (dev 0)...
GPUSPH version v3.0+50-c1c7ca3
Release version without fastmath for compute capability 3.5
Compiled for problem "Gprobe"
Hashkey is 64bits long
NetworkManager: no complete thread safety, current level: 2
[Network] rank 0 (1/1), host ehsan-UGent
 tot devs = 1 (1 * 1)
Initializing...
dt = 0.0001 (CFL conditions from soundspeed: 0.000975, from gravity 0.0273023, from viscosity 0.00660156)
Using problem-set max neibs num 3321 (safe computed value was 192)
Problem calling set grid params
Influence radius / expected cell side   : 0.1625, 0.19375
 - World origin: -0.5 , -0.5 , -0.5
 - World size:   1 x 1 x 1
 - Cell size:    0.2 x 0.2 x 0.2
 - Grid size:    5 x 5 x 5 (125 cells)
 - Cell linearizazion: x,z,y
 - Dp:   0.0625
 - R0:   0.0625
Generating problem particles...
Allocating shared host buffers...
  allocated 630.22 KiB on host for 5,041 particles
Copying the particles to shared arrays...

---
Boundary parts: 0
Boundary part mass: 0
Fluid parts: 1800
Fluid part mass: 0.472412
Vertex parts: 0
Vertex part mass: 0.472412

---
 - device at index 0 has 5,041 particles assigned and offset 0
Starting workers...
1 CUDA devices detected
Estimated memory consumption: 6880B/particle
number of rigid bodies particles = 3241
Device idx 0 (CUDA: 0) allocated 0 B on host, 33.16 MiB on device
  assigned particles: 5,041; allocated: 5,041
GPUSPH: initialized
Performing first write...
Letting threads upload the subdomains...
Thread 0 uploading 5041 Position items (78.77 KiB) on device 0 from position 0
Thread 0 uploading 5041 Velocity items (78.77 KiB) on device 0 from position 0
Thread 0 uploading 5041 Info items (39.38 KiB) on device 0 from position 0
Thread 0 uploading 5041 Hash items (39.38 KiB) on device 0 from position 0
Thread 0 uploading 5041 Boundary Elements items (78.77 KiB) on device 0 from position 0
Thread 0 uploading 5041 Gamma Gradient items (78.77 KiB) on device 0 from position 0
Thread 0 uploading 5041 Vertices items (78.77 KiB) on device 0 from position 0
Entering the main simulation cycle
Simulation time t=0.000000e+00s, iteration=0, dt=1.000000e-04s, 5,041 parts (0, cum. 0 MIPPS), maxneibs 3321
src/forces.cu(419) : cudaSafeCall() Runtime API error 77: an illegal memory access was encountered.
Deallocating...

Do you know what the problem is?

Thank you in advance for your help.
Ehsan

Can variable particle resolution be used in GPUSPH?

Hello,
Can variable particle resolution be used in GPUSPH? Or is anyone working on this?
Thanks a lot!
Best regards,
Henry

where is gpusph V4.0?

The online documentation http://www.gpusph.org/documentation/install-guide/ from July 2016 indicates that there is a version 4.0 on github:

git clone https://github.com/GPUSPH/gpusph.git
cd gpusph
git checkout v4.0
to get version 4.0 specifically. Otherwise, download the .zipped archive from http://github.com/GPUSPH/gpusph/archive/v4.0.zip, and then
unzip v4.0.zip
cd gpusph-4.0
(you may remove v4.0.zip afterwards).

But these instructions don't work and v4.0 is not visible on github.

I am unable to compile v3.0 on OS X because of the following compile error:

make: *** No rule to make target `ode/ode.h', needed by `build/Cone.o'. Stop.

Thanks for any help.

Problem DamBreak3D does not run with GPUSPH 4.0

Hello everyone,

I try to run DamBreak3D which successfully compiles on my Ubuntu machine. Execution leads to termination of the program with the following error message:

"Trying to set inertia on a plane!"

This function is called during fill_parts() of class XProblem which leads to the execution of SetInertia in class Plane which has not been implemented (yet).

I don't know if there is some mistake on my side. Anyway, the other problems such as DamBreakGate work just fine. Any comment is highly appreciated! :)

Cheers
Mathias

Minor memory leaks

In main.cc you are dynamically building the networkManager entity:

gdata.networkManager = new NetworkManager();

However, you are not checking whether the memory has been correctly allocated (the same applies to the problem), and there are some exit conditions where the object is not deleted:

	if (nm_worldsize > MAX_NODES_PER_CLUSTER) {
		cerr << "Too many nodes in cluster: " << nm_worldsize << " > " << MAX_NODES_PER_CLUSTER << endl;
		exit(1);
	}
(...)
		if (!gdata.clOptions->gpudirect) {
			// since H2D and D2H transfers have to wait for network transfers
			fprintf(stderr, "FATAL: asynchronous network transfers require --gpudirect\n");
			gdata.networkManager->finalizeNetwork();
			return 1;
		}

		if (gdata.devices > 1) {
			// since we were too lazy to implement a more complex mechanism
			fprintf(stderr, "FATAL: asynchronous network transfers only supported with 1 process per device\n");
			gdata.networkManager->finalizeNetwork();
			return 1;
		}

In NetworkManager.cc:70, you are reallocating memory in place. If that operation fails, the memory in m_requestsList is not released, and the pointer is set to null.
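
The usual way to avoid losing the original block is to store the result of realloc in a temporary pointer and only overwrite the original pointer on success. The sketch below is not the NetworkManager code; grow_buffer is a hypothetical helper and the int element type is an assumption chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <limits>

// Hypothetical helper: grows a malloc'd array of int to new_count elements.
// On failure it returns false and leaves *buf untouched, so the caller
// still owns a valid block and can free it or keep using it.
bool grow_buffer(int** buf, std::size_t new_count)
{
    // Reject empty requests and sizes whose byte count would overflow.
    if (new_count == 0 ||
        new_count > std::numeric_limits<std::size_t>::max() / sizeof(int))
        return false;

    void* tmp = std::realloc(*buf, new_count * sizeof(int));
    if (tmp == nullptr)
        return false;       // old block is still valid and still in *buf

    *buf = static_cast<int*>(tmp);
    return true;
}
```

Writing `p = realloc(p, n)` directly would instead overwrite the only pointer to the old block with null when the call fails.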

Can GPUSPH be used for the failure process of Soil?

Hello,
Can GPUSPH be used to simulate the failure process of soil under scour by water? Thanks a lot!
Best regards,
Henry

shall asy/ folder be moved to doc/ ?? (or even to the wiki)

Not much to say... It seems to me that the files in asy/ folder are part of the documentation.

Particle Temperature

Hi, good morning!
I'm now using GPUSPH to simulate fluids. I find it an excellent tool to use, but I have some problems modifying it.
I want to add some temperature properties to the particles, but I find the code highly interdependent, and I need to edit a lot of places to make the modification work. So I'm wondering: is there an interface we could use to add other properties to each particle?
Thanks a lot. I look forward to your reply.

Tutorial!?

Hey all,
I am new to SPH, but I would like to try it and possibly integrate it in my rigid body simulations.
I have two questions:

Is there any tutorial for learning GPUSPH? A PDF document...?
Or a hello world example to demonstrate the basic steps to run a simple code? And what commands are needed to run an example? (I am running Ubuntu.)
What do you suggest for (quickly) learning the basics of SPH theory, the mathematical formulation and the fundamentals in general?

Make Error (Opensuse 15, Cuda 10)

[SCRIPTS] list-cuda-cc
nvcc fatal : Don't know what to do with '/usr/local/lib'
[CC] XProblem.o
In file included from src/GlobalData.h:58:0,
from src/XProblem.cc:43:
src/linearization.h:38:36: fatal error: linearization_select.opt: No such file or directory
#include "linearization_select.opt"

illegal memory access was encountered

Hello, I am compiling gpusph with "make" then I execute " make ProblemExample".
And then I execute ./GPUSPH and I get the following error:

Device 0 thread 140735808663952 iteration 0 last command: 7. Exception: src/cuda/forces.cu(516) : in unbind_textures() @ thread 0x140735808663952 : cudaSafeCall() runtime API error 77 : an illegal memory access was encountered

The same error also arises when executing other problems.

My system information is the following:
g++ : (GCC) 6.4.0
nvcc : release 9.1, V9.1.85
GPU devices: 4 x Tesla V100-SXM2.

Documentation

Currently the documentation is horribly outdated and needs to be updated to GPUSPH 3.0.

All interested users who need help can in the meantime visit our interactive documentation in IRC (#gpusph on freenode)

Rigidbodies: time dependent properties / and / rigid body outputs

Hi,
I've simulated a falling cylinder hitting a mud layer. The cylinder penetrates into the mud layer. I have two questions:

As you see here, the mud layer is still fluctuating when the cylinder hits it. I need the mud layer to settle and reach an almost stable state before the cylinder hits it. So, I tried to delay the time at which gravity is applied to the cylinder. I used g_callback, but I did not change the gravity of the fluid (since it should be there until the movements of the mud are damped). In the g_callback method I used the following code:

float3 Trial::g_callback(const float t)
{
    // Keep the rigid body (ODE world) free of gravity while t < 15,
    // so the mud layer can settle, then switch ODE gravity on.
    if (t < 15) {
        ODEGravity = make_float3(0.0, 0.0, 0.0);
    } else {
        ODEGravity = make_float3(0.0, 0.0, -9.81);
    }
    dWorldSetGravity(m_ODEWorld, ODEGravity.x, ODEGravity.y, ODEGravity.z);

    // Gravity acting on the fluid is the same at all times.
    m_physparams.gravity = make_float3(0., 0., -9.81f);
    return m_physparams.gravity;
}

It worked. My question is: can I change any parameter I wish over time using this method?
I was thinking that a function for changing the properties of rigid bodies, or even of the fluid itself, with respect to time would be appreciated. For example, changing viscosity with time.

For the case I described above, how can I get the acceleration, velocity and position of the rigid body (cylinder)? I don't think I can get them using "TESTPOINTSPART", since the rigid body is not always in a fixed position.
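
The delayed-gravity logic from the first question can be separated from the ODE and GPUSPH calls as a small pure function, which makes the switch-on time easy to change and to check in isolation. This is only a sketch: Vec3 is a stand-in for CUDA's float3 so the example compiles without CUDA headers, and body_gravity is a hypothetical helper, not part of GPUSPH or ODE.

```cpp
#include <cassert>

// Minimal stand-in for CUDA's float3 (assumption, for a standalone example).
struct Vec3 {
    float x, y, z;
};

// Hypothetical helper: gravity applied to the rigid body at time t.
// Returns zero before t_release and the full gravity vector afterwards.
Vec3 body_gravity(float t, float t_release, Vec3 g_full)
{
    if (t < t_release)
        return Vec3{0.0f, 0.0f, 0.0f};
    return g_full;
}
```

Inside a callback such as g_callback, the returned vector could then be passed on to dWorldSetGravity, while the fluid gravity is left unchanged.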

Issue with DYN_BOUNDARY and Moving Boundaries

Dear GPUSPH developers,

I am having a problem running a dam break simulation with a moving gate using DYN_BOUNDARY.
The simulation runs just fine without the gate, but once the gate is added, particles stick to the gate after it clears the top of the water column and act erratically, dropping dt drastically and crashing the simulation. Decreasing the initial time step size, increasing the deltap safety factor or increasing the speed of sound did not seem to help; neither did using Ferrari or density-diffusion filtering.

I have uploaded the .cu and .h files of the problem to:
http://www.mediafire.com/?v2ph2bzfss0iz

I would greatly appreciate any ideas you might have on the cause of the problem.

Best regards,

Jabir

choosing input & simulation parameters and visualizing H5SPH files

Hi,

I am very new to GPUSPH. I want to conduct simulations with new geometries and visualize them.

How do I enter the input and simulation parameters for a particular simulation? Do I need to use Crixus for this?

Moreover, I see that the H5SPH file format is being used for input and I assume for output too. Which tool do I need to use for visualizing these files?

Thank you very much for making an advanced SPH code open source,

Best Regards,
Sunil

Running ./GPUSPH error

I have Ubuntu 14.04 with an Nvidia GT 750M and CUDA installed. I can build GPUSPH without any problem. However, when I run ./GPUSPH, I get this error:

 * No devices specified, falling back to default (dev 0)...
GPUSPH version v3.0+48-d3a6449
Release version without fastmath for compute capability 3.0
Compiled for problem "DamBreak3D"
Hashkey is 64bits long
[ehsan-xps15:03212] [[INVALID],INVALID] ORTE_ERROR_LOG: A system-required executable either could not be found or was not executable by this user in file ess_singleton_module.c at line 231
[ehsan-xps15:03212] [[INVALID],INVALID] ORTE_ERROR_LOG: A system-required executable either could not be found or was not executable by this user in file ess_singleton_module.c at line 140
[ehsan-xps15:03212] [[INVALID],INVALID] ORTE_ERROR_LOG: A system-required executable either could not be found or was not executable by this user in file runtime/orte_init.c at line 128
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value A system-required executable either could not be found or was not executable by this user (-127) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: orte_init failed
  --> Returned "A system-required executable either could not be found or was not executable by this user" (-127) instead of "Success" (0)
--------------------------------------------------------------------------
[ehsan-xps15:3212] *** An error occurred in MPI_Init_thread
[ehsan-xps15:3212] *** on a NULL communicator
[ehsan-xps15:3212] *** Unknown error
[ehsan-xps15:3212] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
--------------------------------------------------------------------------
An MPI process is aborting at a time when it cannot guarantee that all
of its peer processes in the job will be killed properly.  You should
double check that everything has shut down cleanly.

  Reason:     Before MPI_INIT completed
  Local host: ehsan-xps15
  PID:        3212
--------------------------------------------------------------------------

Could you please help me?

Segmentation fault error while trying to run GPUSPH

Dear all,

I am getting this error after running GPUSPH in Ubuntu 16.04.

The code I am trying to run is available in the attached file and the error message is shown below.

wind_tunnel.tar.gz

I had to change the "XProblem.cc" because Crixus is creating some dummy points out of my mesh, shown in red in the image below.

* No devices specified, falling back to default (dev 0)...
GPUSPH version v4.1+11-c8a7fd6+custom
Release version with fastmath for compute capability 5.0
Chrono : enabled
HDF5   : enabled
MPI    : enabled
Compiled for problem "WindTunnel"
NetworkManager: no complete thread safety, current level: 2
[Network] rank 0 (1/1), host saullo-Aspire-VN7-592G
 tot devs = 1 (1 * 1)

paddle_amplitude (radians): 0.588003
Info stream: GPUSPH-13697
Initializing...
Reading particle data from the input: ../0.fluid.h5sph
Reading particle data from the input: ../0.walls.h5sph
Reading particle data from the input: ../0.flow_in.h5sph
Reading particle data from the input: ../0.constant_pressure.h5sph
Reading particle data from the input: ../0.flow_out.h5sph
Max fall height not set, autocomputed: 2.1
dt = 0.001 (CFL conditions from soundspeed: 0.0013, from gravity inf, from viscosity 135.669)
Using problem-set max neibs num 640 (safe computed value was 192)
Ferrari coefficient: 1.000000e+00
Problem calling set grid params
Influence radius / neighbor search radius / expected cell side	: 0.26 / 0.26 / 0.31
 - World origin: 0 , 0 , 0
 - World size:   4 x 3 x 3
 - Cell size:    0.333333 x 0.333333 x 0.333333
 - Grid size:    12 x 9 x 9 (972 cells)
 - Cell linearizazion: y,z,x
 - Dp:   0.1
 - R0:   0.1
Generating problem particles...
WARNING: setting the mass by density can't work with a point-based geometry without a mesh!
WARNING: setting the mass by density can't work with a point-based geometry without a mesh!
WARNING: setting the mass by density can't work with a point-based geometry without a mesh!
WARNING: setting the mass by density can't work with a point-based geometry without a mesh!
WARNING: setting the mass by density can't work with a point-based geometry without a mesh!
  estimating 35999 particles to fill the world
VTKWriter will write every 0.1 (simulated) seconds
HotStart checkpoints every 0.1 (simulated) seconds
	will keep the last 8 checkpoints
v4.1-11-gc8a7fd6+custom
Allocating shared host buffers...
Numbodies : 1
Numforcesbodies : 0
numOpenBoundaries : 3
  allocated 4.81 MiB on host for 35,999 particles (35,564 active)
Copying the particles to shared arrays...
---
Fixing connectivity...WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35522 index 35522 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35523 index 35523 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35524 index 35524 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35525 index 35525 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35526 index 35526 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35527 index 35527 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35528 index 35528 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35529 index 35529 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35530 index 35530 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35531 index 35531 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35532 index 35532 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35533 index 35533 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35534 index 35534 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35535 index 35535 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35536 index 35536 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35537 index 35537 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35538 index 35538 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35539 index 35539 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35540 index 35540 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35541 index 35541 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35542 index 35542 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35543 index 35543 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35544 index 35544 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35545 index 35545 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35546 index 35546 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35547 index 35547 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35548 index 35548 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35549 index 35549 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35550 index 35550 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35551 index 35551 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35552 index 35552 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35553 index 35553 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35554 index 35554 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35555 index 35555 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35556 index 35556 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35557 index 35557 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35558 index 35558 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35559 index 35559 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35560 index 35560 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35561 index 35561 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35562 index 35562 loaded from HDF5 points to non-existing vertices (0,0,0)!
WARNING CHANGED BY SAULLO XProblem.cc: connectivity: particle id 35563 index 35563 loaded from HDF5 points to non-existing vertices (0,0,0)!
DONE
Open boundaries: 3
Fluid: 70 parts, mass 0.0011845
Boundary: 23488 parts, mass 0
Vertices: 11964 parts, mass 9.12646e-05
Testpoint: 0 parts
Tot: 35564 particles
---
RB First/Last Index:
Preparing the problem...
Body: 0
	 Cg grid pos: 2 3 0
	 Cg pos: -0.0274784 -0.166667 0.0745509
[saullo-Aspire-VN7-592G:13697] *** Process received signal ***
[saullo-Aspire-VN7-592G:13697] Signal: Segmentation fault (11)
[saullo-Aspire-VN7-592G:13697] Signal code: Invalid permissions (2)
[saullo-Aspire-VN7-592G:13697] Failing at address: 0x7055c0000
[saullo-Aspire-VN7-592G:13697] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7fe1a99b7390]
[saullo-Aspire-VN7-592G:13697] [ 1] ./GPUSPH[0x4e2ff0]
[saullo-Aspire-VN7-592G:13697] [ 2] ./GPUSPH[0x454ec3]
[saullo-Aspire-VN7-592G:13697] [ 3] ./GPUSPH[0x4595e7]
[saullo-Aspire-VN7-592G:13697] [ 4] /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7fe1a99ad6ba]
[saullo-Aspire-VN7-592G:13697] [ 5] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fe1a7aa341d]
[saullo-Aspire-VN7-592G:13697] *** End of error message ***

How to: add structural simulation !?

Good afternoon,

I would like to know how I can add a new type of simulation to the GPUSPH software.
For example, I am a structural engineer and I want to add the ability to simulate solid structures. So I am interested in the procedure for adding this type of simulation, related only to the solid model.
Any advice on the first steps is welcome (where I must add libraries, what programming language I must use, and so on).

I await your feedback with great interest.
Thank you for your support and collaboration.
Keep in touch.

OpenGL window

Hi,
Is the run-time OpenGL window still available? If so, how do I enable it?
It was a useful way to visually debug the simulation.

Ehsan

GPUSPH creates a context on the first GPU system even if it's not in use

This is due to the main thread using cudaMallocHost to allocate pinned memory for the cell data. Since cudaMallocHost requires a CUDA context, and no context is active on the main thread at the time, the runtime automatically creates one on the first available device. This can cause issues if the first GPU is set to exclusive mode.

A possible solution is to do a standard (or possibly page-aligned) allocation from the main thread, and then pin it and register for CUDA from one of the GPUWorkers (cudaHostRegister).
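
A minimal sketch of the first half of that idea, assuming a POSIX host: the main thread makes a plain page-aligned allocation, which never touches the CUDA runtime. The helper names `round_up` and `alloc_page_aligned` are hypothetical and not part of GPUSPH; the pinning step appears only as a comment, since it has to run in a thread that already owns a device context.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>   // posix_memalign, free (POSIX)
#include <unistd.h>   // sysconf (POSIX)

// Round `bytes` up to a multiple of `align` (align must be a power of two).
inline std::size_t round_up(std::size_t bytes, std::size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (bytes + align - 1) & ~(align - 1);
}

// Hypothetical helper: page-aligned host allocation done by the main thread
// without any CUDA call, so no context is created on the first device.
// Returns nullptr on failure; release the buffer with free().
inline void *alloc_page_aligned(std::size_t bytes, std::size_t *padded_bytes)
{
	const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
	const std::size_t padded = round_up(bytes, page);
	void *ptr = nullptr;
	if (posix_memalign(&ptr, page, padded) != 0)
		return nullptr;
	if (padded_bytes)
		*padded_bytes = padded;
	return ptr;
}

// Later, from a worker thread that has already selected its device, the
// buffer could be pinned with the CUDA runtime (not called in this sketch):
//   cudaHostRegister(ptr, padded, cudaHostRegisterPortable);
// and unpinned with cudaHostUnregister(ptr) before free(ptr).
```

Padding the size to a whole number of pages is a conservative choice made for this sketch; whether GPUSPH would need it is not established here.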

searching for help

Hi:
I used GPUSPH to compile a model (called OffshorePile). After that, I could not compile any other models: all the other tests are linked to that model (OffshorePile), see below. Does anybody know how to solve this problem? Thanks very much.

XXX:~/Downloads/gpusph$ make problems=AccuracyTest
[CU] OffshorePile.o
Compiled with problem OffshorePile
Compiled without fastmath
[LINK] dist/linux/x86_64/GPUSPH
Success.

Compiled with problem OffshorePile
Compiled without fastmath
[LINK] dist/linux/x86_64/GPUSPH
Success.
XXX:~/Downloads/gpusph$ make problems=Mesh_2

Compiled with problem OffshorePile
Compiled without fastmath
[LINK] dist/linux/x86_64/GPUSPH
Success.

MPI error - Communication failed between nodes

Hi,

I tried to run GPUSPH version 5 on the cluster using two nodes, two GPUs at each node, using the following line:
mpirun -np 4 -npernode 2 ./GPUSPH --device 0,1 # each device of Mahuika has two GPUs

However, the simulation ran on only one node and I got the following error messages:
[vgpuwbg001][[30539,1],0][connect/btl_openib_connect_udcm.c:1236:udcm_rc_qp_to_rtr] [vgpuwbg001][[30539,1],1][connect/btl_openib_connect_udcm.c:1236$ error modifing QP to RTR errno says Invalid argument [vgpuwbg002][[30539,1],3][connect/btl_openib_connect_udcm.c:1236:udcm_rc_qp_to_rtr] error modifing QP to RTR errno says Invalid argument [vgpuwbg002][[30539,1],2][connect/btl_openib_connect_udcm.c:1236:udcm_rc_qp_to_rtr] error modifing QP to RTR errno says Invalid argument
FATAL: cannot handle 1436584140 > 1073741823 cells
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD with errorcode 1.

Does anyone know how to fix it?

Thanks.

Rigid body sliding along an inclined plane

Hi, everyone. I want to simulate a rigid body sliding along an inclined plane. Its motion is influenced by water particles and ramp particles. Can GPUSPH deal with the interaction between the rigid body and the ramp?
Thank you!

errors when run BuoyancyTest with chrono

Hi
I have compiled GPUSPH successfully and can run some examples in src/problems. But when I run the examples BuoyancyTest and Objects, I get errors with Chrono.

I think the error is related to Chrono, but I do not know what causes it:
1. I installed Chrono successfully, but what should I do to link Chrono and GPUSPH?
2. When installing Chrono, at the cmake step, which modules should be set "ON" to support the basic functions of GPUSPH? From the GPUSPH install document, I understand that all the modules should be set "OFF".
3. On my computer, I have installed another program which is linked to Chrono. Will there be conflicts when I use Chrono in GPUSPH? The error seems to be related to that other program, "Yade-Dual4.3".

Best regards

Flexible Beam in GPUSPH

Hi there,
I am very new to GPUSPH and I am currently looking into the capabilities of different SPH codes.

Is it possible to simulate a flexible elastic beam (for example based on Hooke's elasticity, or even a more sophisticated solid-mechanics-based constitutive law) in GPUSPH?
The aim is to simulate something like this: https://www.youtube.com/watch?v=Te9LCqPgT-Q
but with an open-surface fluid, and to obtain the stresses/forces exerted by the fluid on the flexible object at any desired time frame.

Edit 1: I forgot to mention that the case will be a two-way interaction between fluid and structure.

Bad formats in printf

In NetworkManager.cc:121, you are printing the following:

printf("[Network] rank %u (%u/%u), host %s\n", process_rank, process_rank + 1, world_size, processor_name);

However, both process_rank and world_size are signed integers (indeed, process_rank is initialized to -1), so %u is not the matching conversion specifier for them.
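
A minimal sketch of the matching format string, using `%d` for the signed arguments. The wrapper function `format_rank_line` is hypothetical and only exists to make the example self-contained; the argument names mirror the quoted call.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Builds the same message as the quoted printf, but with %d, the conversion
// that matches signed int arguments. Hypothetical helper, not GPUSPH code.
inline std::string format_rank_line(int process_rank, int world_size,
                                    const char *processor_name)
{
	assert(processor_name != nullptr);
	char buf[512];
	std::snprintf(buf, sizeof(buf), "[Network] rank %d (%d/%d), host %s\n",
	              process_rank, process_rank + 1, world_size, processor_name);
	return std::string(buf);
}
```

With `%d`, an uninitialized rank shows up as `-1` in the output. With `%u`, the same value would typically be printed as 4294967295 on platforms with a 32-bit int.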

Open boundaries do not work when running multi-GPU simulations

This concerns the semi-analytical boundary formulation only (the only one which has open boundaries).

Release the limit of objects number

Hi guys:
Currently the number of objects we can generate in our model is limited, which makes it impossible to simulate a large-scale model. If you have a good idea for lifting the limit, feel free to comment at any time.
Best regards
Dongxueyang

Whats the current status of GPUSPH?

Hi,

It's been a long time since there was any news or commits on this repository.
Is it still under development? What features have been added, and when is the new version going to be released?

Ehsan

aux/UsingGit.txt is obsolete (and in my opinion useless)

Apparently the information in aux/UsingGit.txt refers to an old internal Git repo. Also, if this kind of documentation is required (which in my opinion it is not, as there are thousands of superb publicly available Git tutorials), it should be moved to a wiki.

Large simulations may lock up when running on Maxwell GPUs

When running large simulations, we seem to hit thrust issue 742 during the particle sort.

Since no solution is known yet for that issue, we might have to replace that part of the sorting code with something else (a possible candidate being the sort from Sean Baxter's Modern GPU solution, as proposed in Oblomov/titanxstall#2; license compatibility needs to be checked).

Adding New Properties and Equations to Solve

Hello again @agnesLeroy ,

Thank you again for your responses on my previous issue, your help was invaluable and has really helped me get started using this great tool! I am hoping you or someone else can help me with another question.

I would like to add some additional physical parameters/properties and the corresponding equations to be solved, and would like some guidance on where to start. I've seen issue #23 regarding adding particle temperature, and while this is helpful I was wondering if I could get some additional guidance and tips on which pieces of the code I should modify to include new properties and equations. Also, since the particle temperature issue #23 was from 2017, I would like to confirm that the response is still true with the current version of GPUSPH.

Thank you in advance for your response!

fluid-rigid body coupling simulation

Hi guys:
Has anyone run simulations of floating bodies driven by fluid?
I am interested in the interaction of fluid and rigid body and hope to find someone to discuss and communicate.
My email: [email protected]
All regards.
dong

GPUSPH on Nvidia Jetson TX2 (Tegra X2)

Currently I do not have an Nvidia GPU installed on my PC, nor do I have the capacity for one. I do however have an Nvidia Jetson TX2, which is an embedded platform hosting a CUDA-enabled GPU (compute capability 6.2). On the Jetson, I have CUDA 9.0 installed via JetPack 3.2.1 (see Nvidia's website on embedded systems for more info). Based on the dependencies for running GPUSPH, I believe I meet the necessary requirements. Here is the device info given by the script:

~/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery'

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA Tegra X2"
  CUDA Driver Version / Runtime Version          9.0 / 9.0
  CUDA Capability Major/Minor version number:    6.2
  Total amount of global memory:                 7846 MBytes (8227401728 bytes)
  ( 2) Multiprocessors, (128) CUDA Cores/MP:     256 CUDA Cores
  GPU Max Clock rate:                            1301 MHz (1.30 GHz)
  Memory Clock rate:                             1600 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 524288 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS

I realize this may be the first you've heard of someone trying to run GPUSPH on this type of system, and I didn't expect it to run out of the box. Reading through the Makefile supports this assumption, as I don't see anything relating to embedded platforms. That being said, I took the initiative to adjust the Makefile in the hope that I could successfully compile and run the software.

Carefully reading through the Makefile and comparing the execution results of "shell" commands against my system, I was able to ensure that all the necessary "includes" and "libs" were found. The only adjustment that I had to make was here

# override: TARGET_ARCH - set the target architecture
# override:               defaults to -m64 for 64-bit machines
# override:                           -m32 for 32-bit machines
ifeq ($(arch), x86_64)
	TARGET_ARCH ?= -m64
	# on Linux, toolkit libraries are under /lib64 for 64-bit
	ifeq ($(platform), Linux)
		LIB_PATH_SFX = 64
	endif
else
	ifeq ($(arch), aarch64)
		# Had to comment this out
		#TARGET_ARCH ?= -m64
		ifeq ($(platform), Linux)
			LIB_PATH_SFX = 64
		endif
	else
		# i386 or i686
		TARGET_ARCH ?= -m32
	endif
endif

This adjustment was made because the machine-dependent option "-m64" does not exist for AArch64, so appending the option on the line below causes compile errors

CXXFLAGS += $(TARGET_ARCH)

As a result, no machine-dependent option is passed to CXXFLAGS. I did however include the "-m64" option at the beginning of the nvcc-specific flags

CUFLAGS += -m64

Following those adjustments, the code compiles successfully after running

make

Note that I'm following the "default" options for GPUSPH (i.e. dam break problem with defaults) just to see if I can get the software to run. When I run the executable (./GPUSPH), here is the output I receive

 * No devices specified, falling back to default (dev 0)...
GPUSPH version v4.1+custom
Release version without fastmath for compute capability 6.2
Chrono : disabled
HDF5   : disabled
MPI    : disabled
Compiled for problem "DamBreak3D"
[Network] rank 0 (1/1), host 
 tot devs = 1 (1 * 1)
WARNING: setting number of layers for dynamic boundaries but not using DYN_BOUNDARY!
WARNING: number of layers for dynamic boundaries is low (3), suggested number is 4
Info stream: GPUSPH-24065
Initializing...
Water level not set, autocomputed: 0.4
Max fall height not set, autocomputed: 0.41
Max particle speed not set, autocomputed from max fall: 2.83623
setting dt = 0.00039 from CFL conditions (soundspeed: 0.00039, gravity: 0.0154445, viscosity: nan)
Using problem-set max neibs num 192 (safe computed value was 128)
Ferrari coefficient: 0.000000e+00 (default value, disabled)
Problem calling set grid params
Influence radius / neighbor search radius / expected cell side	: 0.052 / 0.052 / 0.052
 - World origin: 0 , 0 , 0
 - World size:   1.6 x 0.67 x 0.6
 - Cell size:    0.0533333 x 0.0558333 x 0.0545455
 - Grid size:    30 x 12 x 11 (3,960 cells)
 - Cell linearizazion: y,z,x
 - Dp:   0.02
 - R0:   0.02
Generating problem particles...
VTKWriter will write every 0.005 (simulated) seconds
HotStart checkpoints every 0.005 (simulated) seconds
	will keep the last 8 checkpoints
v4.1+custom
Allocating shared host buffers...
Numbodies : 1
Numforcesbodies : 1
numOpenBoundaries : 0
  allocated 1009.6 KiB on host for 13,601 particles (13,601 active)
Copying the particles to shared arrays...
---
Rigid body 1: 798 parts, mass nan, object mass 0
Open boundaries: 0
Fluid: 12800 parts, mass 0.008125
Boundary: 0 parts, mass 0
Testpoint: 3 parts
Tot: 13601 particles
---
RB First/Last Index:
	-12803	797
Preparing the problem...
Body: 0
	 Cg grid pos: 17 6 5
	 Cg pos: -0.00848052 -0.0279167 3.46945e-18
 - device at index 0 has 13,601 particles assigned and offset 0
Starting workers...
Thread 0x7faf074000 global device id: 0 (1)
thread 0x7faea9c1e0 device idx 0: CUDA device 0/1, PCI device 0000:00:00.0: NVIDIA Tegra X2
Device idx 0: free memory 648 MiB, total memory 7846 MiB
Estimated memory consumption: 508B/particle
number of forces rigid bodies particles = 798
Device idx 0 (CUDA: 0) allocated 0 B on host, 6.54 MiB on device
  assigned particles: 13,601; allocated: 13,601
GPUSPH: initialized
Performing first write...
Letting threads upload the subdomains...
Thread 0 uploading 13601 Position items (212.52 KiB) on device 0 from position 0
Thread 0 uploading 13601 Velocity items (212.52 KiB) on device 0 from position 0
Thread 0 uploading 13601 Info items (106.26 KiB) on device 0 from position 0
Thread 0 uploading 13601 Hash items (53.13 KiB) on device 0 from position 0
Entering the main simulation cycle
Simulation time t=0.000000e+00s, iteration=0, dt=3.900000e-04s, 13,601 parts (0, cum. 0 MIPPS), maxneibs 0
Device 0 thread 548391207392 iteration 0 last command: 7. Exception: src/cuda/forces.cu(516) : in unbind_textures() @ thread 0x548391207392 : cudaSafeCall() runtime API error 4 : unspecified launch failure
GPUSPH aborted by worker thread
Elapsed time of simulation cycle: 2.6s
Peak particle speed was ~0 m/s at 0 s -> can set maximum vel 0 for this problem
Simulation end, cleaning up...
src/GPUWorker.cc(1018) : in deallocateDeviceBuffers() @ thread 0x548391207392 : cudaSafeCall() runtime API error 4 : unspecified launch failure
Deallocating...

To get a better clue as to what caused the failure, I ran

cuda-memcheck ./GPUSPH

Note that running this doesn't require any special compile options. Here is the output beginning with "Entering the main simulation cycle" (i.e. the output before that point matches the one presented above):

:
:
:
Entering the main simulation cycle
Simulation time t=0.000000e+00s, iteration=0, dt=3.900000e-04s, 13,601 parts (0, cum. 0 MIPPS), maxneibs 0
========= Invalid __global__ read of size 16
=========     at 0x00000930 in /home/nvidia/GPUSPH/gpusph/src/cuda/forces_kernel.def:2359:void cuforces::forcesDevice<KernelType=3, SPHFormulation=1, BoundaryType=0, ViscosityType=1, unsigned long=261>(forces_params<KernelType=3, SPHFormulation=1, BoundaryType=0, ViscosityType=1, unsigned long=261>)
=========     by thread (71,0,0) in block (15,0,0)
=========     Address 0xfc0e152c0 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 (cuLaunchKernel + 0x1e8) [0x1fe770]
=========     Host Frame:/usr/local/cuda-9.0/lib64/libcudart.so.9.0 [0xc984]
=========
:
:
:
========= Program hit cudaErrorLaunchFailure (error 4) due to "unspecified launch failure" on CUDA API call to cudaDeviceSynchronize. 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 [0x2e69e8]
=========     Host Frame:/usr/local/cuda-9.0/lib64/libcudart.so.9.0 (cudaDeviceSynchronize + 0x118) [0x2dec4]
=========
Device 0 thread 548319310304 iteration 0 last command: 7. Exception: src/cuda/forces.cu(516) : in unbind_textures() @ thread 0x548319310304 : cudaSafeCall() runtime API error 4 : unspecified launch failure
GPUSPH aborted by worker thread
Elapsed time of simulation cycle: 0.5s
Peak particle speed was ~0 m/s at 0 s -> can set maximum vel 0 for this problem
Simulation end, cleaning up...
========= Program hit cudaErrorInvalidResourceHandle (error 33) due to "invalid resource handle" on CUDA API call to cudaStreamDestroy. 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 [0x2e69e8]
=========     Host Frame:/usr/local/cuda-9.0/lib64/libcudart.so.9.0 (cudaStreamDestroy + 0x134) [0x31aa4]
=========
:
:
:
========= Program hit cudaErrorLaunchFailure (error 4) due to "unspecified launch failure" on CUDA API call to cudaFree. 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 [0x2e69e8]
=========     Host Frame:/usr/local/cuda-9.0/lib64/libcudart.so.9.0 (cudaFree + 0x12c) [0x34d10]
=========
:
:
:
src/GPUWorker.cc(1018) : in deallocateDeviceBuffers() @ thread 0x548319310304 : cudaSafeCall() runtime API error 4 : unspecified launch failure
Deallocating...
========= ERROR SUMMARY: 41 errors

Note that I've only shown the unique errors and removed those that repeat, for the sake of presentation here. Naturally, I started with the first error, located "at 0x00000930 in /home/nvidia/GPUSPH/gpusph/src/cuda/forces_kernel.def:2359". Here is the code (in "forces_kernel.def") that the error refers to

			#if PREFER_L1
			const float4 relPos = pos_corr - params.posArray[neib_index];  //   <-------- HERE
			#else
			const float4 relPos = pos_corr - tex1Dfetch(posTex, neib_index);
			#endif

Since the code was wrapped in an "#if" block, I decided to try the alternative branch, which required that I change the definition in the "textures.cuh" source file to

#if defined(__COMPUTE__)
#if __COMPUTE__ >= 20 && __COMPUTE__/10 != 3
#define PREFER_L1 0
#else
#define PREFER_L1 0
#endif
#endif

In other words, I hard-coded it such that "PREFER_L1" would always evaluate to false. I read the comments in the code about the L1 cache vs the shared memory, and I also noticed that there is a preference setting for this in the source file "cudautil.cu". I changed this as well to

		// Hard-code this to use "shared" for compute capability 6.x
		if (deviceProp.major == 3)
		{
			cacheConfig = cudaFuncCachePreferShared;
		}
		else if (deviceProp.major == 6)
		{
			cacheConfig = cudaFuncCachePreferShared;
		}

So I'm basically testing the code with a shared-memory preference instead of an L1 cache preference. I ran "make clean", then recompiled the code via "make", and everything compiled as before. Running the code now (via ./GPUSPH) no longer produces the errors I was seeing before. Unfortunately, the simulation now blows up with the following output

 * No devices specified, falling back to default (dev 0)...
GPUSPH version v4.1+custom
Release version without fastmath for compute capability 6.2
Chrono : disabled
HDF5   : disabled
MPI    : disabled
Compiled for problem "DamBreak3D"
[Network] rank 0 (1/1), host 
 tot devs = 1 (1 * 1)
WARNING: setting number of layers for dynamic boundaries but not using DYN_BOUNDARY!
WARNING: number of layers for dynamic boundaries is low (3), suggested number is 4
Info stream: GPUSPH-25331
Initializing...
Water level not set, autocomputed: 0.4
Max fall height not set, autocomputed: 0.41
Max particle speed not set, autocomputed from max fall: 2.83623
setting dt = 0.00039 from CFL conditions (soundspeed: 0.00039, gravity: 0.0154445, viscosity: nan)
Using problem-set max neibs num 192 (safe computed value was 128)
Ferrari coefficient: 0.000000e+00 (default value, disabled)
Problem calling set grid params
Influence radius / neighbor search radius / expected cell side	: 0.052 / 0.052 / 0.052
 - World origin: 0 , 0 , 0
 - World size:   1.6 x 0.67 x 0.6
 - Cell size:    0.0533333 x 0.0558333 x 0.0545455
 - Grid size:    30 x 12 x 11 (3,960 cells)
 - Cell linearizazion: y,z,x
 - Dp:   0.02
 - R0:   0.02
Generating problem particles...
VTKWriter will write every 0.005 (simulated) seconds
HotStart checkpoints every 0.005 (simulated) seconds
	will keep the last 8 checkpoints
v4.1+custom
Allocating shared host buffers...
Numbodies : 1
Numforcesbodies : 1
numOpenBoundaries : 0
  allocated 1009.6 KiB on host for 13,601 particles (13,601 active)
Copying the particles to shared arrays...
---
Rigid body 1: 798 parts, mass nan, object mass 0
Open boundaries: 0
Fluid: 12800 parts, mass 0.008125
Boundary: 0 parts, mass 0
Testpoint: 3 parts
Tot: 13601 particles
---
RB First/Last Index:
	-12803	797
Preparing the problem...
Body: 0
	 Cg grid pos: 17 6 5
	 Cg pos: -0.00848052 -0.0279167 3.46945e-18
 - device at index 0 has 13,601 particles assigned and offset 0
Starting workers...
Thread 0x7faee0f000 global device id: 0 (1)
thread 0x7fae8371e0 device idx 0: CUDA device 0/1, PCI device 0000:00:00.0: NVIDIA Tegra X2
Device idx 0: free memory 614 MiB, total memory 7846 MiB
Estimated memory consumption: 508B/particle
number of forces rigid bodies particles = 798
Device idx 0 (CUDA: 0) allocated 0 B on host, 6.54 MiB on device
  assigned particles: 13,601; allocated: 13,601
GPUSPH: initialized
Performing first write...
Letting threads upload the subdomains...
Thread 0 uploading 13601 Position items (212.52 KiB) on device 0 from position 0
Thread 0 uploading 13601 Velocity items (212.52 KiB) on device 0 from position 0
Thread 0 uploading 13601 Info items (106.26 KiB) on device 0 from position 0
Thread 0 uploading 13601 Hash items (53.13 KiB) on device 0 from position 0
Entering the main simulation cycle
Simulation time t=0.000000e+00s, iteration=0, dt=3.900000e-04s, 13,601 parts (0, cum. 0 MIPPS), maxneibs 0
Simulation time t=5.154129e-03s, iteration=14, dt=3.231076e-04s, 13,601 parts (1.6, cum. 1.6 MIPPS), maxneibs 80
Simulation time t=7.129510e-03s, iteration=20, dt=3.713324e-04s, 13,601 parts (1.7, cum. 1.7 MIPPS), maxneibs 80
Simulation time t=1.010620e-02s, iteration=28, dt=3.651520e-04s, 13,601 parts (1.4, cum. 1.6 MIPPS), maxneibs 81
Simulation time t=1.083435e-02s, iteration=30, dt=3.626913e-04s, 13,601 parts (1.5, cum. 1.6 MIPPS), maxneibs 81
Simulation time t=1.514638e-02s, iteration=42, dt=3.499534e-04s, 13,601 parts (1.6, cum. 1.6 MIPPS), maxneibs 81
Simulation time t=1.800033e-02s, iteration=50, dt=3.665025e-04s, 13,601 parts (1.7, cum. 1.6 MIPPS), maxneibs 81
Simulation time t=2.019732e-02s, iteration=56, dt=3.769902e-04s, 13,601 parts (1.3, cum. 1.5 MIPPS), maxneibs 81
Simulation time t=2.165445e-02s, iteration=60, dt=3.777308e-04s, 13,601 parts (1.6, cum. 1.5 MIPPS), maxneibs 81
Simulation time t=2.533739e-02s, iteration=70, dt=3.600024e-04s, 13,601 parts (1.5, cum. 1.5 MIPPS), maxneibs 81
Simulation time t=3.015958e-02s, iteration=83, dt=3.818462e-04s, 13,601 parts (1.6, cum. 1.5 MIPPS), maxneibs 82
Simulation time t=3.277038e-02s, iteration=90, dt=3.818220e-04s, 13,601 parts (1.7, cum. 1.6 MIPPS), maxneibs 82
Simulation time t=3.500048e-02s, iteration=96, dt=3.805821e-04s, 13,601 parts (1.3, cum. 1.5 MIPPS), maxneibs 82
Simulation time t=3.647532e-02s, iteration=100, dt=3.681548e-04s, 13,601 parts (1.6, cum. 1.5 MIPPS), maxneibs 82
Simulation time t=4.020716e-02s, iteration=110, dt=3.576268e-04s, 13,601 parts (1.5, cum. 1.5 MIPPS), maxneibs 84
Simulation time t=4.534000e-02s, iteration=124, dt=3.738058e-04s, 13,601 parts (1.6, cum. 1.5 MIPPS), maxneibs 85
Simulation time t=4.755537e-02s, iteration=130, dt=3.684289e-04s, 13,601 parts (1.6, cum. 1.5 MIPPS), maxneibs 85
Simulation time t=5.013070e-02s, iteration=137, dt=3.830418e-04s, 13,601 parts (1.3, cum. 1.5 MIPPS), maxneibs 87
Simulation time t=5.127811e-02s, iteration=140, dt=3.555218e-04s, 13,601 parts (1.5, cum. 1.5 MIPPS), maxneibs 87
Simulation time t=5.528104e-02s, iteration=151, dt=3.819206e-04s, 13,601 parts (1.4, cum. 1.5 MIPPS), maxneibs 93
Simulation time t=5.850480e-02s, iteration=160, dt=3.546642e-04s, 13,601 parts (1.6, cum. 1.5 MIPPS), maxneibs 93
Simulation time t=6.032260e-02s, iteration=165, dt=3.295202e-04s, 13,601 parts (1.1, cum. 1.5 MIPPS), maxneibs 99
Simulation time t=6.215305e-02s, iteration=170, dt=3.509206e-04s, 13,601 parts (1.4, cum. 1.5 MIPPS), maxneibs 99
Simulation time t=6.500402e-02s, iteration=178, dt=3.668144e-04s, 13,601 parts (1.3, cum. 1.5 MIPPS), maxneibs 99
Simulation time t=6.574544e-02s, iteration=180, dt=3.755178e-04s, 13,601 parts (1.4, cum. 1.5 MIPPS), maxneibs 99
Simulation time t=7.015466e-02s, iteration=192, dt=3.795494e-04s, 13,601 parts (1.5, cum. 1.5 MIPPS), maxneibs 106
Simulation time t=7.312123e-02s, iteration=200, dt=3.776072e-04s, 13,601 parts (1.6, cum. 1.5 MIPPS), maxneibs 106
Simulation time t=7.534076e-02s, iteration=206, dt=3.711822e-04s, 13,601 parts (1.1, cum. 1.5 MIPPS), maxneibs 109
Simulation time t=7.681028e-02s, iteration=210, dt=3.629877e-04s, 13,601 parts (1.3, cum. 1.5 MIPPS), maxneibs 109
Simulation time t=8.007353e-02s, iteration=219, dt=3.739279e-04s, 13,601 parts (1.3, cum. 1.5 MIPPS), maxneibs 110
Simulation time t=8.044746e-02s, iteration=220, dt=3.575024e-04s, 13,601 parts (1.1, cum. 1.5 MIPPS), maxneibs 110
Simulation time t=8.528919e-02s, iteration=234, dt=3.611622e-04s, 13,601 parts (1.4, cum. 1.5 MIPPS), maxneibs 119
Simulation time t=8.731700e-02s, iteration=240, dt=3.615249e-04s, 13,601 parts (1.4, cum. 1.5 MIPPS), maxneibs 119
Simulation time t=9.020710e-02s, iteration=249, dt=3.563746e-04s, 13,601 parts (1.3, cum. 1.5 MIPPS), maxneibs 123
Simulation time t=9.056347e-02s, iteration=250, dt=3.531481e-04s, 13,601 parts (1.1, cum. 1.5 MIPPS), maxneibs 123
Simulation time t=9.516194e-02s, iteration=264, dt=2.900131e-04s, 13,601 parts (1.4, cum. 1.5 MIPPS), maxneibs 127
Simulation time t=9.707781e-02s, iteration=270, dt=3.319965e-04s, 13,601 parts (1.4, cum. 1.5 MIPPS), maxneibs 127
Simulation time t=1.000128e-01s, iteration=279, dt=3.541400e-04s, 13,601 parts (1.3, cum. 1.4 MIPPS), maxneibs 136
Simulation time t=1.003670e-01s, iteration=280, dt=3.588895e-04s, 13,601 parts (1.2, cum. 1.4 MIPPS), maxneibs 136
Simulation time t=1.051173e-01s, iteration=297, dt=2.594648e-04s, 13,601 parts (0.98, cum. 1.4 MIPPS), maxneibs 145
Simulation time t=1.058675e-01s, iteration=300, dt=2.323153e-04s, 13,601 parts (1.1, cum. 1.4 MIPPS), maxneibs 145
Simulation time t=1.100258e-01s, iteration=315, dt=2.829456e-04s, 13,601 parts (1.4, cum. 1.4 MIPPS), maxneibs 149
Simulation time t=1.115843e-01s, iteration=320, dt=3.023205e-04s, 13,601 parts (1.3, cum. 1.4 MIPPS), maxneibs 149
Simulation time t=1.150202e-01s, iteration=332, dt=2.643027e-04s, 13,601 parts (1.3, cum. 1.4 MIPPS), maxneibs 161
Simulation time t=1.170675e-01s, iteration=340, dt=3.592710e-04s, 13,601 parts (1.4, cum. 1.4 MIPPS), maxneibs 161
Simulation time t=1.200720e-01s, iteration=353, dt=2.100472e-04s, 13,601 parts (1.2, cum. 1.4 MIPPS), maxneibs 168
Simulation time t=1.219651e-01s, iteration=360, dt=2.795108e-04s, 13,601 parts (1.4, cum. 1.4 MIPPS), maxneibs 168
Simulation time t=1.252873e-01s, iteration=374, dt=1.990724e-04s, 13,601 parts (1.3, cum. 1.4 MIPPS), maxneibs 173
Simulation time t=1.265921e-01s, iteration=380, dt=2.693839e-04s, 13,601 parts (1.3, cum. 1.4 MIPPS), maxneibs 173
Simulation time t=1.301633e-01s, iteration=395, dt=2.499819e-04s, 13,601 parts (1.1, cum. 1.4 MIPPS), maxneibs 179
Simulation time t=1.313476e-01s, iteration=400, dt=2.050968e-04s, 13,601 parts (1.2, cum. 1.4 MIPPS), maxneibs 179
Simulation time t=1.350395e-01s, iteration=417, dt=1.727778e-04s, 13,601 parts (1.4, cum. 1.4 MIPPS), maxneibs 189
Simulation time t=1.356967e-01s, iteration=420, dt=2.657721e-04s, 13,601 parts (1.3, cum. 1.4 MIPPS), maxneibs 189
WARNING: current max. neighbors numbers 193 greather than MAXNEIBSNUM (192) at iteration 420
	possible culprit: -1 (neibs: 0)
WARNING: current max. neighbors numbers 198 greather than MAXNEIBSNUM (192) at iteration 430
	possible culprit: -1 (neibs: 0)
WARNING: current max. neighbors numbers 198 greather than MAXNEIBSNUM (192) at iteration 440
	possible culprit: -1 (neibs: 0)
Simulation time t=1.401277e-01s, iteration=441, dt=1.918957e-04s, 13,601 parts (1.3, cum. 1.4 MIPPS), maxneibs 198
Simulation time t=1.419053e-01s, iteration=450, dt=1.062711e-04s, 13,601 parts (1.2, cum. 1.4 MIPPS), maxneibs 198
WARNING: current max. neighbors numbers 203 greather than MAXNEIBSNUM (192) at iteration 450
	possible culprit: -1 (neibs: 0)
WARNING: current max. neighbors numbers 205 greather than MAXNEIBSNUM (192) at iteration 460
	possible culprit: -1 (neibs: 0)
Simulation time t=1.451039e-01s, iteration=467, dt=2.603823e-04s, 13,601 parts (1.3, cum. 1.4 MIPPS), maxneibs 205
Simulation time t=1.456839e-01s, iteration=470, dt=1.017485e-04s, 13,601 parts (1.2, cum. 1.4 MIPPS), maxneibs 205
WARNING: current max. neighbors numbers 209 greather than MAXNEIBSNUM (192) at iteration 470
	possible culprit: -1 (neibs: 0)
FATAL: timestep 2.09296e-08 under machine epsilon at iteration 476 - requesting quit...
WARNING: particle 1271 (id 645) has NAN position! (nan, nan, nan) @ (1, 0, 0) = (nan, nan, nan)
Simulation time t=1.463384e-01s, iteration=476, dt=2.092963e-08s, 13,601 parts (1, cum. 1.4 MIPPS), maxneibs 209
Elapsed time of simulation cycle: 4.8s
Peak particle speed was ~200.458 m/s at 0.146338 s -> can set maximum vel 2.2e+02 for this problem
Simulation end, cleaning up...
Deallocating...
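
The last lines of the log show the timestep collapsing to about 2.09e-08 before the run aborts. As a rough illustration of why that value trips the "under machine epsilon" check, here is a minimal sketch that assumes the threshold is the single-precision machine epsilon (about 1.19e-07). The actual check in GPUSPH is not shown in this report, and the function name is hypothetical.

```cpp
#include <cassert>
#include <limits>

// Assumption: the threshold is the single-precision machine epsilon;
// the real GPUSPH check is not quoted in this report.
inline bool timestep_below_epsilon(float dt)
{
	assert(dt >= 0.0f); // a timestep must be non-negative (this also rejects NaN)
	return dt < std::numeric_limits<float>::epsilon();
}
```

Under that assumption, the final `dt=2.092963e-08s` is below the threshold, while the preceding `dt=1.017485e-04s` is not.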

This is as far as I got before I decided to reach out for help. Based on the results from the test above, I believe it might have something to do with the memory. Here are some rudimentary comments concerning the memory, based on what I found after some "Google research"

The GPU on the Tegra TK1, TX1, and TX2 does not have its own memory. It is hard-wired to the memory controller and shares system RAM. This also implies that the GPU is not limited by PCIe bus speeds and that the PCIe management functions of a GPU don't apply.

I do not know what limits there might be on how much system RAM can be used by the GPU.

The Maxwell and Pascal architectures combine the texture and L1 caches into a single unified cache. All global, local, surface, and texture operations go through this cache.

Here are some further links:
Jetson TX2 GPU memory
L1 cache vs shared memory

Of course, I realize I'm not defining the source of the problem, but I've tried to provide as much info as I've gathered in my effort. I also realize that this is likely not the intended system for this application. I'm mainly interested in resolving this for the purpose of development (again, it's the only Nvidia GPU I have and it's cheap to buy for students ~$300). After testing and development, I would then later run the code on a more dedicated machine with better resources. So if there is anything that can be done to help me resolve these issues, I'd be grateful.

Oh, and as for the host system, here are some of my specs:

Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/aarch64-linux-gnu/5/lto-wrapper
Target: aarch64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.10' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libquadmath --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-arm64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-arm64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-arm64 --with-arch-directory=aarch64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-multiarch --enable-fix-cortex-a53-843419 --disable-werror --enable-checking=release --build=aarch64-linux-gnu --host=aarch64-linux-gnu --target=aarch64-linux-gnu
Thread model: posix
gcc version 5.4.0 20160609 (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.10)

Building without GPU is possible?

I am wondering whether it would be possible to build GPUSPH without CUDA.
I am using Ubuntu 14.04 (32-bit version).

My graphics card is outdated and not supported by CUDA.

Expected Viscosity Model Error

Dear All Developers,

I am a research intern with EDF's GPUSPH group, and while benchmarking the software I noticed a discrepancy that seems to affect the viscosity models as a whole (as far as I can tell).

There appears to be a factor of two multiplying the viscosity parameter set by the call "set_kinematic_visc(0, 1.0e-2f);": the output velocity in one of my benchmarked case studies is half the theoretical velocity profile.

I have tested this with the different viscosity models available (KINEMATIC and DYNAMIC) and with both SA and DYNBOUNDARY walls; the discrepancy is there regardless of the models used.
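
To illustrate why a halved velocity points to a doubled viscosity, here is a minimal sketch. It assumes, purely for illustration, a plane Poiseuille flow between two no-slip plates driven by a constant body force; the benchmark case itself is not specified in this report. The function names and the parameters `h` (plate spacing), `F` (body force per unit mass), and `nu` (kinematic viscosity) are hypothetical and are not part of the GPUSPH API.

```cpp
#include <cmath>

// Steady-state velocity of plane Poiseuille flow between no-slip plates
// at z = 0 and z = h, driven by a body force per unit mass F:
//   u(z) = F / (2 nu) * z * (h - z)
// (assumed test case; all names here are illustrative only)
double poiseuille_velocity(double z, double h, double F, double nu)
{
    return F / (2.0 * nu) * z * (h - z);
}

// Peak velocity, reached at mid-channel z = h/2: F h^2 / (8 nu)
double poiseuille_peak(double h, double F, double nu)
{
    return poiseuille_velocity(0.5 * h, h, F, nu);
}

// Since u is proportional to 1/nu, the ratio of theoretical to measured
// velocity estimates the factor by which the effective viscosity
// exceeds the requested one.
double effective_visc_factor(double u_theory, double u_measured)
{
    return u_theory / u_measured;
}
```

With h = 1, F = 0.08 and nu = 1.0e-2, the peak velocity under these assumptions is 1.0; a simulation that effectively used 2 nu would give 0.5, so `effective_visc_factor` would return 2, which matches the symptom described above.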

Feel free to contact me either here or via my work email if more details or data are required.
[email protected]

How do you handle 3rd party contributions?

Related to #29 and #30

How do you handle 3rd party contributions? I mean, let's say I have something superb to merge into your code (which is not the case yet)... May I create a Pull Request (PR) here, on GitHub? Would you then handle it manually?

I strongly suggest that you give up the private repository and use GitHub as the upstream one, keeping the number of owners/contributors as small as possible, so that the development process is based on forks and PRs. This has a number of benefits (in fact, it is the standard nowadays). The most notable ones are:

Simplified relations with the community, including both users and devs
Significant reduction of management effort, since only the GitHub infrastructure has to be maintained, and it also has a lot of awesome features
Unit testing can be easily implemented
The wiki can be handled as a git repo as well (some tasks can even be automated)
Better and more accurate stats

MPI errors unhandled?

AFAIK, you are neither checking the error codes returned by the C interface nor installing the C++ MPI::ERRORS_THROW_EXCEPTIONS handler, so MPI errors go unhandled.

no-slip boundary condition

Hello everyone!
Is a no-slip boundary condition already built into the GPUSPH code?
I am currently using GPUSPH to simulate Poiseuille flow, but I have not found a proper setting to achieve the no-slip effect on the wall boundaries.
If there is no such boundary condition setting, how could I add one?
Thanks and regards!

DEMExample does not work with SPS viscosity

DEMExample does not work with SPS viscosity. Particles are leaving the domain for no known reason.