sakalisc / splash-3
The Splash-3 benchmark suite
Some applications omit the count parameter for BARINIT. While this works with the default macro file (pthread.m4.stougie), it does not work when BARINIT is mapped to pthread_barrier_init, which requires a count to be passed.
As an example, inside codes/apps/barnes/code.c.in:
- BARINIT(Global->Barload);
- BARINIT(Global->Bartree);
- BARINIT(Global->Barcom);
- BARINIT(Global->Baraccel);
- BARINIT(Global->Barstart);
- BARINIT(Global->Barpos);
+ BARINIT(Global->Barload, NPROC);
+ BARINIT(Global->Bartree, NPROC);
+ BARINIT(Global->Barcom, NPROC);
+ BARINIT(Global->Baraccel, NPROC);
+ BARINIT(Global->Barstart, NPROC);
+ BARINIT(Global->Barpos, NPROC);
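With POSIX barriers the number of participants is fixed when the barrier is initialized, and no waiter is released until that many threads have called pthread_barrier_wait. That is why a count-less BARINIT has nothing to pass to pthread_barrier_init. The following is a minimal sketch, assuming a hypothetical BARINIT_SKETCH macro (it is not the macro defined in pthread.m4.stougie) and a hypothetical run_barrier_demo helper:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* pthread_barrier_* are POSIX options */
#endif
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical count-aware BARINIT expansion backed by POSIX barriers. */
#define BARINIT_SKETCH(bar, count) pthread_barrier_init(&(bar), NULL, (count))

static pthread_barrier_t demo_bar;
static int serial_count;  /* threads that received the "serial" result */
static pthread_mutex_t serial_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    int rc = pthread_barrier_wait(&demo_bar);
    /* Exactly one waiter per barrier cycle gets PTHREAD_BARRIER_SERIAL_THREAD. */
    assert(rc == 0 || rc == PTHREAD_BARRIER_SERIAL_THREAD);
    if (rc == PTHREAD_BARRIER_SERIAL_THREAD) {
        pthread_mutex_lock(&serial_lock);
        serial_count++;
        pthread_mutex_unlock(&serial_lock);
    }
    return NULL;
}

/* Runs nproc threads through one barrier initialized with count nproc.
   Returns the number of serial threads (expected: 1), or -1 on error. */
int run_barrier_demo(unsigned nproc) {
    if (nproc == 0 || BARINIT_SKETCH(demo_bar, nproc) != 0)
        return -1;
    pthread_t *tids = malloc(nproc * sizeof *tids);
    if (tids == NULL) {
        pthread_barrier_destroy(&demo_bar);
        return -1;
    }
    serial_count = 0;
    for (unsigned i = 0; i < nproc; i++)
        if (pthread_create(&tids[i], NULL, worker, NULL) != 0)
            abort();  /* a missing thread would leave the others stuck */
    for (unsigned i = 0; i < nproc; i++)
        pthread_join(tids[i], NULL);
    free(tids);
    pthread_barrier_destroy(&demo_bar);
    return serial_count;
}
```

If the count passed at initialization were larger than the number of threads that actually reach the barrier, every waiter would block forever, so NPROC has to match the number of participating threads.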
I was trying to run Barnes-Hut on macOS and noticed some deadlocks in hackcofm.
It seems that the condition variables in most cells are never initialized properly (only the root cell's is?). As a result, a waiter on Done(cell) never releases the mutex, because its call to pthread_cond_wait fails. Consequently, the thread trying to signal that waiter gets blocked too.
The Linux distribution I tried to run it on does not seem to mind the uninitialized CVs.
I assume this can be fixed either by properly initializing the CVs or by using an array similar to the locks. Initializing the CV in makecell/makeleaf makes it run for me, but calling pthread_cond_init that often might not be an ideal solution.
I don't know if this is a relevant issue for you, but it's probably not a bad thing to let you know.
Best regards,
Robert
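The second fix suggested above ("an array similar to the locks") can be sketched as a fixed pool of mutex/condition-variable pairs that is initialized once before the workers start, with each cell mapped onto a slot by its id. Everything here (CV_POOL_SIZE, cell_t, cell_wait_done, cell_mark_done, cv_pool_demo) is hypothetical and is not Barnes' actual data structure:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define CV_POOL_SIZE 64  /* hypothetical pool size; cells share slots by id */

static pthread_mutex_t pool_lock[CV_POOL_SIZE];
static pthread_cond_t pool_cond[CV_POOL_SIZE];

/* Initialize every mutex/condvar exactly once, before any worker runs. */
int cv_pool_init(void) {
    for (size_t i = 0; i < CV_POOL_SIZE; i++) {
        if (pthread_mutex_init(&pool_lock[i], NULL) != 0 ||
            pthread_cond_init(&pool_cond[i], NULL) != 0)
            return -1;
    }
    return 0;
}

/* Hypothetical cell: a completion flag plus the pool slot it uses. */
typedef struct {
    bool done;
    size_t slot;
} cell_t;

void cell_init(cell_t *c, size_t id) {
    assert(c != NULL);
    c->done = false;
    c->slot = id % CV_POOL_SIZE;
}

/* Sleep until another thread marks the cell done. The while loop re-checks
   the predicate, which covers spurious wakeups and shared condvars. */
void cell_wait_done(cell_t *c) {
    pthread_mutex_lock(&pool_lock[c->slot]);
    while (!c->done)
        pthread_cond_wait(&pool_cond[c->slot], &pool_lock[c->slot]);
    pthread_mutex_unlock(&pool_lock[c->slot]);
}

/* Broadcast rather than signal, because several cells may share a slot. */
void cell_mark_done(cell_t *c) {
    pthread_mutex_lock(&pool_lock[c->slot]);
    c->done = true;
    pthread_cond_broadcast(&pool_cond[c->slot]);
    pthread_mutex_unlock(&pool_lock[c->slot]);
}

static void *mark_done_thread(void *arg) {
    cell_mark_done(arg);
    return NULL;
}

/* Demo: the calling thread waits on a cell that a second thread completes. */
bool cv_pool_demo(size_t id) {
    cell_t cell;
    pthread_t t;
    cell_init(&cell, id);
    if (pthread_create(&t, NULL, mark_done_thread, &cell) != 0)
        return false;
    cell_wait_done(&cell);
    pthread_join(t, NULL);
    return cell.done;
}
```

The trade-off is that unrelated cells hashing to the same slot may wake each other up; the predicate loop makes that harmless at the cost of occasional extra wakeups, while pthread_cond_init is called only CV_POOL_SIZE times.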
Some of the inputs (e.g. tk25.O) cause Cholesky to segfault. The issue seems to arise when accessing a block that has an invalid pointer to the next block.
Commit d895904 incorporates Parsec 3's Splash-2X enhancements to allow for larger input set sizes. However, the input folders themselves have not been updated to include the larger input sets. For example, the input_native file for the barnes application in PARSEC looks like this:
4194304
123

0.025
0.05
1.0
2.0
5.0
0.075
0.25
NUMPROCS
However, the largest input provided by Splash-3x is 16384 (as opposed to 4194304 above). It would be nice if the native inputs released with Parsec 3.0 were also released here.
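Until larger inputs are published, one stopgap is to derive a bigger input from an existing file by replacing only the body count on its first line. This is a sketch assuming, as in the file shown above, that the first line holds the number of bodies; rewrite_nbody is a hypothetical helper, and whether the remaining parameters are appropriate for a much larger run is an assumption this sketch does not check:

```c
#include <assert.h>
#include <stdio.h>

/* Copies an input file from `in` to `out`, replacing its first line (the body
   count) with `nbody`. Returns 0 on success, -1 on error. */
int rewrite_nbody(FILE *in, FILE *out, long nbody) {
    assert(in != NULL && out != NULL && nbody > 0);
    int ch;
    /* Skip the original first line. */
    while ((ch = fgetc(in)) != EOF && ch != '\n')
        ;
    if (ch == EOF)
        return -1;  /* no newline: not a multi-line parameter file */
    if (fprintf(out, "%ld\n", nbody) < 0)
        return -1;
    /* Copy the remaining parameters verbatim, including blank lines. */
    while ((ch = fgetc(in)) != EOF)
        if (fputc(ch, out) == EOF)
            return -1;
    return ferror(in) ? -1 : 0;
}
```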
Hi,
Sorry for bothering you. I have to build Splash-3 with an older toolchain, DragonEgg 4.6 + LLVM 3.0, which doesn't support the C11 standard, and I can't migrate our older LLVM compiler to the latest one. Do you have any suggestions for replacing the C11 features, such as atomic_thread_fence, with older alternatives?
BTW, the reason I don't use Splash-2 is that our research needs to avoid data races.
Best,
Jianping.
I followed the instructions and ran the benchmarks with the recommended parameters. The programs seem to execute and exit normally. However, the output below shows that the compute time is always zero. Is this normal?
gcc (Ubuntu 5.5.0-12ubuntu1~16.04) 5.5.0 20171010
m4 (GNU M4) 1.4.17
Hack code: Plummer model
nbody dtime eps tol dtout tstop fcells NPROC
16384 0.02500 0.0500 1.00 0.250 0.075 2.00 4
COMPUTESTART = 1535169466
COMPUTEEND = 1535169466
COMPUTETIME = 0
TRACKTIME = 0
PARTITIONTIME = 0 -nan
TREEBUILDTIME = 0 -nan
FORCECALCTIME = 0 -nan
RESTTIME = 0 -nan
Creating a two cluster, non uniform distribution for 16384 particles
Starting FMM with 4 processors
Finished FMM
PROCESS STATISTICS
Track Tree List Part Pass Inter Bar Intra Other
Proc Time Time Time Time Time Time Time Time Time
0 0 0 0 0 0 0 0 0 0
TIMING INFORMATION
Start time : 1535169466
Initialization finish time : 1535169466
Overall finish time : 1535169466
Total time with initialization : 0
Total time without initialization : 0
Total time for steps 3 to 5 : 0
Ocean simulation with W-cycle multigrid solver
Processors : 4
Grid size : 258 x 258
Grid resolution (meters) : 20000.00
Time between relaxations (seconds) : 28800
Error tolerance : 1e-07
PROCESS STATISTICS
Total Multigrid Multigrid
Proc Time Time Fraction
0 0 0 -nan
TIMING INFORMATION
Start time : 1535169466
Initialization finish time : 1535169466
Overall finish time : 1535169466
Total time with initialization : 0
Total time without initialization : 0
(excludes first timestep)
Ocean simulation with W-cycle multigrid solver
Processors : 4
Grid size : 258 x 258
Grid resolution (meters) : 20000.00
Time between relaxations (seconds) : 28800
Error tolerance : 1e-07
PROCESS STATISTICS
Total Multigrid Multigrid
Proc Time Time Fraction
0 1 1 1.000
TIMING INFORMATION
Start time : 1535169466
Initialization finish time : 1535169466
Overall finish time : 1535169467
Total time with initialization : 1
Total time without initialization : 1
(excludes first timestep)
TIMING STATISTICS MEASURED BY MAIN PROCESS:
Overall start time 1535169467
Overall end time 1535169467
Total time with initialization 0
Total time without initialization 0
Rasiosity Statistics
Histogram of interactions/elem
Interactions Occurrence
-------------------------------
(Over 100) 168 (126291.523438)
100 3 (4874.575684)
98 6 (1521.356079)
97 5 (1734.110107)
96 5 (1352.056885)
94 2 (2449.900391)
93 1 (9905.465820)
92 2 (50253.605469)
91 1 (22364.552734)
90 7 (20320.214844)
89 1 (1195.967529)
88 2 (9905.674805)
87 1 (1311.672607)
86 4 (3626.220459)
85 4 (1450.736816)
84 5 (4989.483398)
83 1 (1512.704102)
82 4 (81670.343750)
81 4 (6313.362793)
80 1 (13567.184570)
79 1 (1191.440430)
78 1 (72000.250000)
77 2 (1345.461182)
76 1 (28252.785156)
75 4 (23670.160156)
72 1 (2930.986328)
71 2 (8949.666992)
70 3 (11491.526367)
69 1 (28252.785156)
67 1 (29830.343750)
66 1 (22372.632812)
65 3 (8065.339844)
64 4 (17389.933594)
63 2 (7623.680664)
62 3 (23875.185547)
61 1 (22680.052734)
60 1 (36381.175781)
59 4 (8072.653320)
57 4 (3139.145508)
54 1 (21323.609375)
52 2 (8099.991699)
49 1 (4661.109863)
46 1 (4661.109863)
45 2 (71999.750000)
44 4 (67961.914062)
41 3 (4641.196777)
39 1 (42452.933594)
38 2 (3360.855957)
37 3 (15079.329102)
36 1 (5919.844727)
35 4 (2469.837891)
34 1 (42452.933594)
33 2 (19543.990234)
32 1 (3889.380859)
30 1 (14858.612305)
29 1 (3889.380859)
28 6 (16926.972656)
27 3 (7069.240723)
26 3 (9905.535156)
25 1 (3241.141113)
24 2 (14858.602539)
22 2 (1904.186157)
21 1 (9905.674805)
20 3 (42897.675781)
13 1 (360186.750000)
12 18 (5481.369141)
11 1 (127359.734375)
10 24 (3665.947998)
8 1 (360186.750000)
7 1 (360186.750000)
6 12 (68938.164062)
5 4 (45358.750000)
4 266 (6870.366211)
3 154 (18073.607422)
2 926 (10214.654297)
1 558 (12964.450195)
0 403 (38763.160156)
Configurations
Patch assignment: Static equal number
Always inserting at top of list for visibility testing (not sorted)
Recursive pruning enabled for BSP tree traversal
Patch cache: Enabled
Always check all other queues when task stealing (not neighbor scheme)
Parameters
Number of processors: 4
Number of task queues: 4
Number of tasks / queue: 200
Area epsilon: 5000.000000
#inter parallel refine: 5
#visibility comp / task: 4
BF epsilon: 0.100000
Energy convergence: 0.050000
Iterations to converge: 3 times
Resource Usage
Number of patches: 364
Total number of elements: 2688
Total number of interactions: 39973
completely visible: 6656
completely invisible: 13166
partially visible: 20151
Interaction coherence (root interaction not counted)
Common for 4 siblings: 3212
Common for 3 siblings: 396
Common for 2 siblings: 246
Common for no sibling: 192
Avg. elements per patch: 7.4
Avg. interactions per patch: 109.8
Avg. interactions per element:14.9
Number of elements in equivalent uniform mesh: 7783
Elem(hierarchical)/Elem(uniform): 34.54%
Number of processors: 4
Global shared memory size: 64 MB
Samples per pixel: 1
Number of primitive objects: 7629
Number of primitive elements: 46423
****** Hierarchial uniform grid memory allocation summary *******
< struct >: < current > < maximum > < sizeof >
< bytes >: < bytes > < bytes > < bytes >
grid: 59760 59760 144
hashtable entries: 678968 678968 8
emptycell entries: 6632 6632 8
voxel: 1251480 1251480 40
bintree_node: 12370320 12370320 120
Totals: 14367160 14367160
TIMING STATISTICS MEASURED BY MAIN PROCESS:
Overall start time 1535169467
Overall end time 1535169467
Total time with initialization 0
Total time without initialization 0
usage: VOLREND num_processes input_file ROTATE_STEPS
Using 4 procs on 3 steps of 512 mols
Other parameters:
TSTEP = 1.50e-16
NORDER = 6
NSAVE = -1
NRST = 3000
NPRINT = 3
NFMC = 0
CUTOFF = 6.212752
TEMPERATURE = 298.00 K
DENSITY = 0.99800 G/C.C.
NUMBER OF MOLECULES = 512
NUMBER OF PROCESSORS = 4
TIME STEP = 1.50e-01 SEC
ORDER USED TO SOLVE F=MA = 6
NO. OF TIME STEPS = 3
FREQUENCY OF DATA SAVING = -1
FREQUENCY TO WRITE RST FILE= 3000
SPHERICAL CUTOFF RADIUS = 6.2128 ANGSTROM
NS = 7.9999899999999995
BOXL = 24.851010
CUTOFF = 6.212752
XS = 3.106380
ZERO = 1.55319
WCOS = 0.585882
WSIN = 0.756950
***** NEW RUN STARTING FROM REGULAR LATTICE *****
3 1.57495 0.05127 10.55761 -2.15831
10.026 305.74022 -19.57198
COMPUTESTART (after initialization) = 1535169467
COMPUTEEND = 1535169467
COMPUTETIME (after initialization) = 0
Measured Time (2nd timestep onward) = 0
Intramolecular time only (2nd timestep onward) = 0
Intermolecular time only (2nd timestep onward) = 0
Other time (2nd timestep onward) = 0
Exited Happily with XTT = 10.0255 (note: XTT value is garbage if NPRINT > NSTEP)
Using 4 procs on 3 steps of 512 mols
Other parameters:
TSTEP = 1.50e-16
NORDER = 6
NSAVE = -1
NRST = 3000
NPRINT = 3
NFMC = 0
CUTOFF = 6.212752
64 boxes with 4 processors
TEMPERATURE = 298.00 K
DENSITY = 0.99800 G/C.C.
NUMBER OF MOLECULES = 512
NUMBER OF PROCESSORS = 4
TIME STEP = 1.50e-01 SEC
ORDER USED TO SOLVE F=MA = 6
NO. OF TIME STEPS = 3
FREQUENCY OF DATA SAVING = -1
FREQUENCY TO WRITE RST FILE= 3000
xprocs = 1 yprocs = 2 zprocs = 2
x_inc = 4 y_inc = 2 z_inc = 2
x_left = 0 y_left = 0 z_left = 0
SPHERICAL CUTOFF RADIUS = 6.2128 ANGSTROM
NS = 7.9999999999999893
BOXL = 24.851010
CUTOFF = 6.212752
BOX_LENGTH = 6.212752
BOX_PER_SIDE = 4
XS = 3.106376
ZERO = 1.55319
WCOS = 0.585882
WSIN = 0.756950
***** NEW RUN STARTING FROM REGULAR LATTICE *****
3 4711.30613 1586.02005 9.16081 -1.85845
6304.629 1430180.83630 414.12462
COMPUTESTART (after initialization) = 1535169467
COMPUTEEND = 1535169467
COMPUTETIME (after initialization) = 0
Measured Time (2nd timestep onward) = 0
Intramolecular time only (2nd timestep onward) = 0
Intermolecular time only (2nd timestep onward) = 0
Other time (2nd timestep onward) = 0
Exited Happily with XTT = 6304.63 (note: XTT value is garbage if NPRINT > NSTEP)
Sparse Cholesky Factorization
Problem:
4 Processors
Postpass partition size: 32
16384 byte cache
true partitions
Fan-out, no block copy-across
LB domain, embedded distribution
No ordering
1295 supers, 3.05 nodes/super, 211 max super
1295/531 supers before/after
165039042/170264150 (1.03) ops before/after amalgamation
before partition
Divide for 4 P, 17 domains, 0.43 of work static, 0.95 eff, (inf overall)
284946 total domain updates
970 max height, 170264150 ops, 58510.02 conc, 120.94 bl for 4 P
Target partition size 0, postpass size 32
Processor array is 2 by 2
No redistribution
Supers: 69: 1 85: 1 104: 1 111: 1 137: 1 142: 1 396: 1
Blocks: 27: 1 28: 5 29: 1 33: 12 34: 5 35: 6 36: 2
32 partitions
32 partitions, 493 blocks
170264150 operations for factorization
PROCESS STATISTICS
Total
Proc Time
0
TIMING INFORMATION
Start time : 1535169467
Initialization finish time : 1535169467
Overall finish time : 1535169467
Total time with initialization : 0
Total time without initialization : 0
FFT with Blocking Transpose
65536 Complex Doubles
4 Processors
65536 Cache lines
16 Byte line size
4096 Bytes per page
iter_num = 64
iter_num = 64
iter_num = 64
iter_num = 64
Transpose: iter_num = 0
Transpose: iter_num = 4096
Transpose: iter_num = 8192
FFt1DOnce: iter_num = 1024
Transpose: iter_num = 12288
Step 1: 0
Step 2: 0
Transpose: iter_num = 4096
Transpose: iter_num = 0
Transpose: iter_num = 8192
Transpose: iter_num = 12288
Step 3: 0
Transpose: iter_num = 0
Transpose: iter_num = 4096
Step 4: 0
Transpose: iter_num = 8192
Transpose: iter_num = 12288
Step 5: 0
PROCESS STATISTICS
Computation Transpose Transpose
Proc Time Time Fraction
0 0 0 -nan
TIMING INFORMATION
Start time : 1535169467
Initialization finish time : 1535169467
Overall finish time : 1535169467
Total time with initialization : 0
Total time without initialization : 0
Overall transpose time : 0
Overall transpose fraction : -nan
Blocked Dense LU Factorization
512 by 512 Matrix
4 Processors
16 by 16 Element Blocks
PROCESS STATISTICS
Total Diagonal Perimeter Interior Barrier
Proc Time Time Time Time Time
0 0 0 0 0 0
TIMING INFORMATION
Start time : 1535169467
Initialization finish time : 1535169467
Overall finish time : 1535169467
Total time with initialization : 0
Total time without initialization : 0
Blocked Dense LU Factorization
512 by 512 Matrix
4 Processors
16 by 16 Element Blocks
PROCESS STATISTICS
Total Diagonal Perimeter Interior Barrier
Proc Time Time Time Time Time
0 0 0 0 0 0
TIMING INFORMATION
Start time : 1535169467
Initialization finish time : 1535169467
Overall finish time : 1535169467
Total time with initialization : 0
Total time without initialization : 0
Integer Radix Sort
1048576 Keys
4 Processors
Radix = 1024
Max key = 67108864
PROCESS STATISTICS
Total Rank Sort
Proc Time Time Time
0 0 0 0
TIMING INFORMATION
Start time : 1535169467
Initialization finish time : 1535169467
Overall finish time : 1535169467
Total time with initialization : 0
Total time without initialization : 0
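The timestamps in the output above (e.g. 1535169466, which as Unix time falls in August 2018) appear to be whole seconds, so any phase shorter than one second would plausibly print 0, and fractions computed from a zero total print -nan. Assuming that is the cause, either larger inputs or a finer-grained timer would make the numbers meaningful. The sketch below is not Splash-3's CLOCK macro; now_usec and safe_fraction are hypothetical names, and it assumes a POSIX system with CLOCK_MONOTONIC:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for clock_gettime / CLOCK_MONOTONIC */
#endif
#include <assert.h>
#include <time.h>

/* Monotonic time in microseconds; unaffected by wall-clock adjustments. */
unsigned long long now_usec(void) {
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;  /* silence unused-variable warnings when NDEBUG is set */
    return (unsigned long long)ts.tv_sec * 1000000ULL
         + (unsigned long long)ts.tv_nsec / 1000ULL;
}

/* Ratio that reports 0 instead of NaN when the total time is zero. */
double safe_fraction(unsigned long long part, unsigned long long total) {
    return total != 0 ? (double)part / (double)total : 0.0;
}
```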
Instead of relying on the system version of libtiff, Volrend should preferably use the bundled version. This requires
codes/apps/ocean/non_contiguous_partitions/jacobcalc.c fails to cross-compile for ARM when arm-linux-gnueabi-gcc is used as the compiler.
Steps to reproduce:
1. In Makefile.config, change CC := gcc to CC := arm-linux-gnueabi-gcc (assuming arm-linux-gnueabi-gcc is available for use).
2. Run make in the codes/ directory.
Error output:
make -C apps/ocean/non_contiguous_partitions
make[1]: Entering directory '/home/agdhruv/Desktop/Splash-3/codes/apps/ocean/non_contiguous_partitions'
m4 -Ulen -Uindex /home/agdhruv/Desktop/Splash-3/codes/pthread_macros/pthread.m4.stougie decs.h.in > decs.h
m4 -Ulen -Uindex /home/agdhruv/Desktop/Splash-3/codes/pthread_macros/pthread.m4.stougie jacobcalc.c.in > jacobcalc.c
arm-linux-gnueabi-gcc -c -O2 -pthread -D_XOPEN_SOURCE=500 -D_POSIX_C_SOURCE=200112 -std=c11 -g -fno-strict-aliasing -static jacobcalc.c
In file included from jacobcalc.c:25:0:
decs.h:115:11: error: size of array ‘q_multi’ is too large
double q_multi[MAX_LEVELS][IMAX][JMAX];
^~~~~~~
decs.h:116:11: error: size of array ‘rhs_multi’ is too large
double rhs_multi[MAX_LEVELS][IMAX][JMAX];
^~~~~~~~~
../../../Makefile.config:31: recipe for target 'jacobcalc.o' failed
make[1]: *** [jacobcalc.o] Error 1
make[1]: Leaving directory '/home/agdhruv/Desktop/Splash-3/codes/apps/ocean/non_contiguous_partitions'
Makefile:4: recipe for target 'all' failed
make: *** [all] Error 2
Why is this error occurring, and how can I get around it?
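A likely explanation, stated as an assumption: arm-linux-gnueabi targets 32-bit ARM, where size_t and ptrdiff_t are 32 bits, and GCC refuses to create a single object larger than PTRDIFF_MAX bytes (about 2 GiB). Arrays of MAX_LEVELS * IMAX * JMAX doubles that compile on a 64-bit host can exceed that limit. Reducing IMAX/JMAX in decs.h.in, or allocating the grids at run time with an explicit size check, avoids the compile-time error, although a 32-bit process still cannot hold more data than its address space allows. The helpers below (alloc_grid3, grid3_index) are hypothetical and sketch only the run-time allocation approach:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Heap-allocates levels*imax*jmax doubles. Returns NULL if any dimension is
   zero, if the byte count would exceed PTRDIFF_MAX (assumed here to be the
   same limit that makes GCC reject the static arrays), or if malloc fails. */
double *alloc_grid3(size_t levels, size_t imax, size_t jmax) {
    const size_t limit = (size_t)PTRDIFF_MAX / sizeof(double);
    if (levels == 0 || imax == 0 || jmax == 0)
        return NULL;
    if (imax > limit / jmax)        /* imax * jmax would overflow the limit */
        return NULL;
    size_t plane = imax * jmax;
    if (levels > limit / plane)     /* levels * plane would overflow the limit */
        return NULL;
    return malloc(levels * plane * sizeof(double));
}

/* Flat index replacing q_multi[l][i][j] for planes of imax x jmax. */
size_t grid3_index(size_t imax, size_t jmax, size_t l, size_t i, size_t j) {
    assert(i < imax && j < jmax);
    return (l * imax + i) * jmax + j;
}
```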