Currently I do not have an Nvidia GPU installed on my PC, nor do I have the capacity for one. I do, however, have an Nvidia Jetson TX2, which is an embedded platform hosting a CUDA-enabled GPU (compute capability 6.2). On the Jetson, I have CUDA 9.0 installed via JetPack 3.2.1 (see Nvidia's website on embedded systems for more info). Based on the dependencies for running GPUSPH, I believe I meet the necessary requirements. Here is the device info given by the script:
I realize this may be the first you've heard of someone trying to run GPUSPH on this type of system, and I didn't expect it to run out of the box. Reading through the Makefile supports this assumption, as I don't see anything relating to embedded platforms. That being said, I took the initiative to adjust the Makefile in the hope that I could successfully compile and run the software.
By carefully reading through the Makefile and comparing the results of its "shell" commands against my system, I confirmed that all the necessary "includes" and "libs" were found. The only adjustment I had to make was here:
This adjustment was made because the machine-dependent option "-m64" does not exist for AArch64, so appending the option on the line below causes compile errors:
With that change, no machine-dependent option is passed to CXXFLAGS. I did, however, include the "-m64" option at the beginning of the nvcc-specific flags:
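For illustration only, the architecture check behind this adjustment could be sketched in Python as follows. This is not the actual Makefile change: `machine_flag` is a hypothetical helper, and the list of architecture strings is an assumption about what `platform.machine()` (or `uname -m`) typically reports.

```python
import platform


def machine_flag(arch: str) -> str:
    """Return the host-compiler machine flag for an architecture string.

    Hypothetical helper for illustration; GPUSPH's Makefile does not use it.
    """
    # -m64 / -m32 are x86-style options; GCC targeting AArch64 rejects them,
    # so nothing is returned for that (or any unrecognized) architecture.
    if arch in ("x86_64", "amd64"):
        return "-m64"
    if arch in ("i386", "i686"):
        return "-m32"
    return ""


if __name__ == "__main__":
    arch = platform.machine()  # reports "aarch64" on a 64-bit ARM Linux system
    print(f"{arch}: {machine_flag(arch)!r}")
```

On the Jetson this would yield an empty flag, which corresponds to leaving "-m64" out of CXXFLAGS.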
Note that I'm following the "default" options for GPUSPH (i.e. the dam break problem with defaults) just to see if I can get the software to run. When I run the executable (./GPUSPH), here is the output I receive:
Note that running this doesn't require any special compile options. Here is the output beginning with "Entering the main simulation cycle" (i.e. everything before that line matches the output presented above):
Note that I've only shown the unique errors and removed those that repeat, for the sake of presentation. Naturally, I started with the first error, located "at 0x00000930 in /home/nvidia/GPUSPH/gpusph/src/cuda/forces_kernel.def:2359". Here is the code (in "forces_kernel.def") that the error refers to:
Since the code was wrapped in an "if" statement, I decided to try the alternative branch, which required changing the definition in the "textures.cuh" source file to:
In other words, I hard-coded it so that "PREFER_L1" would always evaluate to false. I read the comments in the code about the L1 cache vs. shared memory, and I noticed that the source file "cudautil.cu" also has a preference setting. I changed this as well, to:
In effect, I'm testing how the code behaves with a shared memory preference instead of an L1 preference. I ran "make clean", then recompiled the code via "make", and everything compiled as before. Running the code now (via ./GPUSPH) no longer produces the errors I was seeing before. Unfortunately, the simulation now blows up with the following output:
* No devices specified, falling back to default (dev 0)...
GPUSPH version v4.1+custom
Release version without fastmath for compute capability 6.2
Chrono : disabled
HDF5 : disabled
MPI : disabled
Compiled for problem "DamBreak3D"
[Network] rank 0 (1/1), host
tot devs = 1 (1 * 1)
WARNING: setting number of layers for dynamic boundaries but not using DYN_BOUNDARY!
WARNING: number of layers for dynamic boundaries is low (3), suggested number is 4
Info stream: GPUSPH-25331
Initializing...
Water level not set, autocomputed: 0.4
Max fall height not set, autocomputed: 0.41
Max particle speed not set, autocomputed from max fall: 2.83623
setting dt = 0.00039 from CFL conditions (soundspeed: 0.00039, gravity: 0.0154445, viscosity: nan)
Using problem-set max neibs num 192 (safe computed value was 128)
Ferrari coefficient: 0.000000e+00 (default value, disabled)
Problem calling set grid params
Influence radius / neighbor search radius / expected cell side : 0.052 / 0.052 / 0.052
- World origin: 0 , 0 , 0
- World size: 1.6 x 0.67 x 0.6
- Cell size: 0.0533333 x 0.0558333 x 0.0545455
- Grid size: 30 x 12 x 11 (3,960 cells)
- Cell linearizazion: y,z,x
- Dp: 0.02
- R0: 0.02
Generating problem particles...
VTKWriter will write every 0.005 (simulated) seconds
HotStart checkpoints every 0.005 (simulated) seconds
will keep the last 8 checkpoints
v4.1+custom
Allocating shared host buffers...
Numbodies : 1
Numforcesbodies : 1
numOpenBoundaries : 0
allocated 1009.6 KiB on host for 13,601 particles (13,601 active)
Copying the particles to shared arrays...
---
Rigid body 1: 798 parts, mass nan, object mass 0
Open boundaries: 0
Fluid: 12800 parts, mass 0.008125
Boundary: 0 parts, mass 0
Testpoint: 3 parts
Tot: 13601 particles
---
RB First/Last Index:
-12803 797
Preparing the problem...
Body: 0
Cg grid pos: 17 6 5
Cg pos: -0.00848052 -0.0279167 3.46945e-18
- device at index 0 has 13,601 particles assigned and offset 0
Starting workers...
Thread 0x7faee0f000 global device id: 0 (1)
thread 0x7fae8371e0 device idx 0: CUDA device 0/1, PCI device 0000:00:00.0: NVIDIA Tegra X2
Device idx 0: free memory 614 MiB, total memory 7846 MiB
Estimated memory consumption: 508B/particle
number of forces rigid bodies particles = 798
Device idx 0 (CUDA: 0) allocated 0 B on host, 6.54 MiB on device
assigned particles: 13,601; allocated: 13,601
GPUSPH: initialized
Performing first write...
Letting threads upload the subdomains...
Thread 0 uploading 13601 Position items (212.52 KiB) on device 0 from position 0
Thread 0 uploading 13601 Velocity items (212.52 KiB) on device 0 from position 0
Thread 0 uploading 13601 Info items (106.26 KiB) on device 0 from position 0
Thread 0 uploading 13601 Hash items (53.13 KiB) on device 0 from position 0
Entering the main simulation cycle
Simulation time t=0.000000e+00s, iteration=0, dt=3.900000e-04s, 13,601 parts (0, cum. 0 MIPPS), maxneibs 0
Simulation time t=5.154129e-03s, iteration=14, dt=3.231076e-04s, 13,601 parts (1.6, cum. 1.6 MIPPS), maxneibs 80
Simulation time t=7.129510e-03s, iteration=20, dt=3.713324e-04s, 13,601 parts (1.7, cum. 1.7 MIPPS), maxneibs 80
Simulation time t=1.010620e-02s, iteration=28, dt=3.651520e-04s, 13,601 parts (1.4, cum. 1.6 MIPPS), maxneibs 81
Simulation time t=1.083435e-02s, iteration=30, dt=3.626913e-04s, 13,601 parts (1.5, cum. 1.6 MIPPS), maxneibs 81
Simulation time t=1.514638e-02s, iteration=42, dt=3.499534e-04s, 13,601 parts (1.6, cum. 1.6 MIPPS), maxneibs 81
Simulation time t=1.800033e-02s, iteration=50, dt=3.665025e-04s, 13,601 parts (1.7, cum. 1.6 MIPPS), maxneibs 81
Simulation time t=2.019732e-02s, iteration=56, dt=3.769902e-04s, 13,601 parts (1.3, cum. 1.5 MIPPS), maxneibs 81
Simulation time t=2.165445e-02s, iteration=60, dt=3.777308e-04s, 13,601 parts (1.6, cum. 1.5 MIPPS), maxneibs 81
Simulation time t=2.533739e-02s, iteration=70, dt=3.600024e-04s, 13,601 parts (1.5, cum. 1.5 MIPPS), maxneibs 81
Simulation time t=3.015958e-02s, iteration=83, dt=3.818462e-04s, 13,601 parts (1.6, cum. 1.5 MIPPS), maxneibs 82
Simulation time t=3.277038e-02s, iteration=90, dt=3.818220e-04s, 13,601 parts (1.7, cum. 1.6 MIPPS), maxneibs 82
Simulation time t=3.500048e-02s, iteration=96, dt=3.805821e-04s, 13,601 parts (1.3, cum. 1.5 MIPPS), maxneibs 82
Simulation time t=3.647532e-02s, iteration=100, dt=3.681548e-04s, 13,601 parts (1.6, cum. 1.5 MIPPS), maxneibs 82
Simulation time t=4.020716e-02s, iteration=110, dt=3.576268e-04s, 13,601 parts (1.5, cum. 1.5 MIPPS), maxneibs 84
Simulation time t=4.534000e-02s, iteration=124, dt=3.738058e-04s, 13,601 parts (1.6, cum. 1.5 MIPPS), maxneibs 85
Simulation time t=4.755537e-02s, iteration=130, dt=3.684289e-04s, 13,601 parts (1.6, cum. 1.5 MIPPS), maxneibs 85
Simulation time t=5.013070e-02s, iteration=137, dt=3.830418e-04s, 13,601 parts (1.3, cum. 1.5 MIPPS), maxneibs 87
Simulation time t=5.127811e-02s, iteration=140, dt=3.555218e-04s, 13,601 parts (1.5, cum. 1.5 MIPPS), maxneibs 87
Simulation time t=5.528104e-02s, iteration=151, dt=3.819206e-04s, 13,601 parts (1.4, cum. 1.5 MIPPS), maxneibs 93
Simulation time t=5.850480e-02s, iteration=160, dt=3.546642e-04s, 13,601 parts (1.6, cum. 1.5 MIPPS), maxneibs 93
Simulation time t=6.032260e-02s, iteration=165, dt=3.295202e-04s, 13,601 parts (1.1, cum. 1.5 MIPPS), maxneibs 99
Simulation time t=6.215305e-02s, iteration=170, dt=3.509206e-04s, 13,601 parts (1.4, cum. 1.5 MIPPS), maxneibs 99
Simulation time t=6.500402e-02s, iteration=178, dt=3.668144e-04s, 13,601 parts (1.3, cum. 1.5 MIPPS), maxneibs 99
Simulation time t=6.574544e-02s, iteration=180, dt=3.755178e-04s, 13,601 parts (1.4, cum. 1.5 MIPPS), maxneibs 99
Simulation time t=7.015466e-02s, iteration=192, dt=3.795494e-04s, 13,601 parts (1.5, cum. 1.5 MIPPS), maxneibs 106
Simulation time t=7.312123e-02s, iteration=200, dt=3.776072e-04s, 13,601 parts (1.6, cum. 1.5 MIPPS), maxneibs 106
Simulation time t=7.534076e-02s, iteration=206, dt=3.711822e-04s, 13,601 parts (1.1, cum. 1.5 MIPPS), maxneibs 109
Simulation time t=7.681028e-02s, iteration=210, dt=3.629877e-04s, 13,601 parts (1.3, cum. 1.5 MIPPS), maxneibs 109
Simulation time t=8.007353e-02s, iteration=219, dt=3.739279e-04s, 13,601 parts (1.3, cum. 1.5 MIPPS), maxneibs 110
Simulation time t=8.044746e-02s, iteration=220, dt=3.575024e-04s, 13,601 parts (1.1, cum. 1.5 MIPPS), maxneibs 110
Simulation time t=8.528919e-02s, iteration=234, dt=3.611622e-04s, 13,601 parts (1.4, cum. 1.5 MIPPS), maxneibs 119
Simulation time t=8.731700e-02s, iteration=240, dt=3.615249e-04s, 13,601 parts (1.4, cum. 1.5 MIPPS), maxneibs 119
Simulation time t=9.020710e-02s, iteration=249, dt=3.563746e-04s, 13,601 parts (1.3, cum. 1.5 MIPPS), maxneibs 123
Simulation time t=9.056347e-02s, iteration=250, dt=3.531481e-04s, 13,601 parts (1.1, cum. 1.5 MIPPS), maxneibs 123
Simulation time t=9.516194e-02s, iteration=264, dt=2.900131e-04s, 13,601 parts (1.4, cum. 1.5 MIPPS), maxneibs 127
Simulation time t=9.707781e-02s, iteration=270, dt=3.319965e-04s, 13,601 parts (1.4, cum. 1.5 MIPPS), maxneibs 127
Simulation time t=1.000128e-01s, iteration=279, dt=3.541400e-04s, 13,601 parts (1.3, cum. 1.4 MIPPS), maxneibs 136
Simulation time t=1.003670e-01s, iteration=280, dt=3.588895e-04s, 13,601 parts (1.2, cum. 1.4 MIPPS), maxneibs 136
Simulation time t=1.051173e-01s, iteration=297, dt=2.594648e-04s, 13,601 parts (0.98, cum. 1.4 MIPPS), maxneibs 145
Simulation time t=1.058675e-01s, iteration=300, dt=2.323153e-04s, 13,601 parts (1.1, cum. 1.4 MIPPS), maxneibs 145
Simulation time t=1.100258e-01s, iteration=315, dt=2.829456e-04s, 13,601 parts (1.4, cum. 1.4 MIPPS), maxneibs 149
Simulation time t=1.115843e-01s, iteration=320, dt=3.023205e-04s, 13,601 parts (1.3, cum. 1.4 MIPPS), maxneibs 149
Simulation time t=1.150202e-01s, iteration=332, dt=2.643027e-04s, 13,601 parts (1.3, cum. 1.4 MIPPS), maxneibs 161
Simulation time t=1.170675e-01s, iteration=340, dt=3.592710e-04s, 13,601 parts (1.4, cum. 1.4 MIPPS), maxneibs 161
Simulation time t=1.200720e-01s, iteration=353, dt=2.100472e-04s, 13,601 parts (1.2, cum. 1.4 MIPPS), maxneibs 168
Simulation time t=1.219651e-01s, iteration=360, dt=2.795108e-04s, 13,601 parts (1.4, cum. 1.4 MIPPS), maxneibs 168
Simulation time t=1.252873e-01s, iteration=374, dt=1.990724e-04s, 13,601 parts (1.3, cum. 1.4 MIPPS), maxneibs 173
Simulation time t=1.265921e-01s, iteration=380, dt=2.693839e-04s, 13,601 parts (1.3, cum. 1.4 MIPPS), maxneibs 173
Simulation time t=1.301633e-01s, iteration=395, dt=2.499819e-04s, 13,601 parts (1.1, cum. 1.4 MIPPS), maxneibs 179
Simulation time t=1.313476e-01s, iteration=400, dt=2.050968e-04s, 13,601 parts (1.2, cum. 1.4 MIPPS), maxneibs 179
Simulation time t=1.350395e-01s, iteration=417, dt=1.727778e-04s, 13,601 parts (1.4, cum. 1.4 MIPPS), maxneibs 189
Simulation time t=1.356967e-01s, iteration=420, dt=2.657721e-04s, 13,601 parts (1.3, cum. 1.4 MIPPS), maxneibs 189
WARNING: current max. neighbors numbers 193 greather than MAXNEIBSNUM (192) at iteration 420
possible culprit: -1 (neibs: 0)
WARNING: current max. neighbors numbers 198 greather than MAXNEIBSNUM (192) at iteration 430
possible culprit: -1 (neibs: 0)
WARNING: current max. neighbors numbers 198 greather than MAXNEIBSNUM (192) at iteration 440
possible culprit: -1 (neibs: 0)
Simulation time t=1.401277e-01s, iteration=441, dt=1.918957e-04s, 13,601 parts (1.3, cum. 1.4 MIPPS), maxneibs 198
Simulation time t=1.419053e-01s, iteration=450, dt=1.062711e-04s, 13,601 parts (1.2, cum. 1.4 MIPPS), maxneibs 198
WARNING: current max. neighbors numbers 203 greather than MAXNEIBSNUM (192) at iteration 450
possible culprit: -1 (neibs: 0)
WARNING: current max. neighbors numbers 205 greather than MAXNEIBSNUM (192) at iteration 460
possible culprit: -1 (neibs: 0)
Simulation time t=1.451039e-01s, iteration=467, dt=2.603823e-04s, 13,601 parts (1.3, cum. 1.4 MIPPS), maxneibs 205
Simulation time t=1.456839e-01s, iteration=470, dt=1.017485e-04s, 13,601 parts (1.2, cum. 1.4 MIPPS), maxneibs 205
WARNING: current max. neighbors numbers 209 greather than MAXNEIBSNUM (192) at iteration 470
possible culprit: -1 (neibs: 0)
FATAL: timestep 2.09296e-08 under machine epsilon at iteration 476 - requesting quit...
WARNING: particle 1271 (id 645) has NAN position! (nan, nan, nan) @ (1, 0, 0) = (nan, nan, nan)
Simulation time t=1.463384e-01s, iteration=476, dt=2.092963e-08s, 13,601 parts (1, cum. 1.4 MIPPS), maxneibs 209
Elapsed time of simulation cycle: 4.8s
Peak particle speed was ~200.458 m/s at 0.146338 s -> can set maximum vel 2.2e+02 for this problem
Simulation end, cleaning up...
Deallocating...
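To make the trend in the output above easier to see, the status lines can be parsed with a short script. This is a minimal Python sketch: the regular expression is based only on the "Simulation time ..." line format shown in this log, and the function names are hypothetical.

```python
import re

# Matches status lines such as:
# "Simulation time t=..., iteration=441, dt=1.918957e-04s, ... maxneibs 198"
STEP_RE = re.compile(r"iteration=(\d+), dt=([0-9.eE+-]+)s.*maxneibs (\d+)")


def parse_steps(log_text):
    """Return a list of (iteration, dt, maxneibs) tuples from the log text."""
    steps = []
    for line in log_text.splitlines():
        match = STEP_RE.search(line)
        if match:
            steps.append(
                (int(match.group(1)), float(match.group(2)), int(match.group(3)))
            )
    return steps


def first_over_limit(steps, limit):
    """Return the first reported iteration whose maxneibs exceeds limit."""
    for iteration, _dt, maxneibs in steps:
        if maxneibs > limit:
            return iteration
    return None


if __name__ == "__main__":
    # Three status lines copied from the output above.
    sample = """\
Simulation time t=1.356967e-01s, iteration=420, dt=2.657721e-04s, 13,601 parts (1.3, cum. 1.4 MIPPS), maxneibs 189
Simulation time t=1.401277e-01s, iteration=441, dt=1.918957e-04s, 13,601 parts (1.3, cum. 1.4 MIPPS), maxneibs 198
Simulation time t=1.463384e-01s, iteration=476, dt=2.092963e-08s, 13,601 parts (1, cum. 1.4 MIPPS), maxneibs 209
"""
    steps = parse_steps(sample)
    print("first status line over 192 neighbors: iteration", first_over_limit(steps, 192))
    print("smallest dt:", min(dt for _it, dt, _n in steps))
```

Run against the full log, this shows the neighbor count climbing steadily from 80 and passing the problem-set limit of 192 shortly before the timestep collapses at iteration 476.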
This is as far as I got before I decided to reach out for help. Based on the results from the test above, I believe the problem might have something to do with the memory. Here are some rudimentary comments on the memory, based on what I found after some "google research":
Of course, I realize I haven't pinned down the source of the problem, but I've tried to provide as much info as I could gather. I also realize that this is likely not the intended system for this application. I'm mainly interested in resolving this for the purpose of development (again, it's the only Nvidia GPU I have, and at ~$300 it's cheap for students to buy). After testing and development, I would later run the code on a more dedicated machine with better resources. So if there is anything that can be done to help me resolve these issues, I'd be grateful.