sakalisc / splash-3


The Splash-3 benchmark suite


splash-3's Issues

BARINIT incomplete for some applications

Some applications omit the count parameter for BARINIT. While this works for the default macro (pthread.m4.stougie), it does not work when one wishes to use pthread_barrier_init, which requires a count to be passed.

As an example, inside codes/apps/barnes/code.c.in:

-   BARINIT(Global->Barload);
-   BARINIT(Global->Bartree);
-   BARINIT(Global->Barcom);
-   BARINIT(Global->Baraccel);
-   BARINIT(Global->Barstart);
-   BARINIT(Global->Barpos);
+   BARINIT(Global->Barload, NPROC);
+   BARINIT(Global->Bartree, NPROC);
+   BARINIT(Global->Barcom, NPROC);
+   BARINIT(Global->Baraccel, NPROC);
+   BARINIT(Global->Barstart, NPROC);
+   BARINIT(Global->Barpos, NPROC);

Uninitialized condition variables in Barnes

I was trying to run Barnes-Hut on macOS and noticed some deadlocks in hackcofm.

It seems that the condition variables in most cells are never initialized properly (only the root one?). As a result, waiters for Done(cell) never release the mutex, because the call to pthread_cond_wait fails. Consequently, the thread trying to signal that waiter gets blocked too.

The Linux distribution I tried running it on does not seem to mind the uninitialized CVs.

I assume this can be fixed either by properly initializing the CVs or by using an array similar to the one used for the locks. Initializing the CV in makecell/makeleaf makes it run for me, but calling pthread_cond_init that often might not be an ideal solution.
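The first option, initializing the CV when the node is created, could look roughly like the sketch below. The struct and the cell_* functions are hypothetical stand-ins and do not reproduce the real Barnes data structures or its makecell/makeleaf code.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Barnes cell; not the real struct. */
struct cell {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    int             done;
};

/* Allocate a cell and initialize its synchronization objects up front. */
struct cell *cell_new(void)
{
    struct cell *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    if (pthread_mutex_init(&c->lock, NULL) != 0) {
        free(c);
        return NULL;
    }
    if (pthread_cond_init(&c->cv, NULL) != 0) {
        pthread_mutex_destroy(&c->lock);
        free(c);
        return NULL;
    }
    c->done = 0;
    return c;
}

/* Block until another thread marks the cell as done. */
void cell_wait_done(struct cell *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->cv, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* Mark the cell as done and wake every waiter. */
void cell_set_done(struct cell *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = 1;
    pthread_cond_broadcast(&c->cv);
    pthread_mutex_unlock(&c->lock);
}

/* Destroy the synchronization objects before releasing the memory. */
void cell_free(struct cell *c)
{
    pthread_cond_destroy(&c->cv);
    pthread_mutex_destroy(&c->lock);
    free(c);
}
```

If cells are recycled rather than freed, the initialization could instead happen once per pool slot, which would avoid the repeated pthread_cond_init calls.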

I don't know if this is a relevant issue for you, but it's probably not a bad thing to let you know.

Best regards,
Robert

Segfault in Cholesky with certain inputs

Some of the inputs (e.g. tk25.O) cause Cholesky to segfault. The issue seems to arise when accessing a block that has an invalid pointer to the next block.

Splash-3x does not provide native inputs

Commit d895904 incorporates Parsec 3's Splash-2X enhancements to allow for larger input set sizes. However, the input folders themselves have not been updated to include the larger input sets. For example, the barnes application input_native on PARSEC looks like this:

4194304
123

0.025
0.05
1.0
2.0
5.0
0.075
0.25
NUMPROCS

However, the largest input provided by Splash-3x is 16384 (as opposed to 4194304 above). It would be nice if the native inputs released with Parsec 3.0 were also released here.

Do you know how to replace C11 features, such as atomic_thread_fence, with older alternatives?

Hi,
Sorry for bothering you. I have to build Splash-3 with an older toolchain, dragonegg 4.6 + LLVM 3.0, which does not support the C11 standard, and I can't migrate our older LLVM compiler to the latest one. Do you have any suggestions for replacing C11 features such as atomic_thread_fence with older alternatives?
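One possible direction, sketched under assumptions: GCC has long provided the legacy builtin __sync_synchronize(), which issues a full memory barrier, so a fence macro can fall back to it when C11 atomics are unavailable. FULL_FENCE and fence_demo are hypothetical names, and whether dragonegg 4.6 with LLVM 3.0 accepts the builtin has not been tested here.

```c
/* Pick a full memory fence depending on what the compiler offers. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
    !defined(__STDC_NO_ATOMICS__)
#include <stdatomic.h>
#define FULL_FENCE() atomic_thread_fence(memory_order_seq_cst)
#else
/* Legacy GCC builtin: a full barrier, with no C11 support required. */
#define FULL_FENCE() __sync_synchronize()
#endif

static int counter;

/* Writes a value, issues the fence, and reads the value back. */
int fence_demo(void)
{
    counter = 1;
    FULL_FENCE();
    return counter;
}
```

A full barrier is at least as strong as any acquire or release fence, so this substitution trades some performance for simplicity. Other C11 operations such as atomic loads and stores would need their own replacements.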

By the way, the reason I don't use Splash-2 is that our research needs to avoid data races.

Best,
Jianping.

The program does not execute.

I followed the instructions and ran the programs with the recommended parameters. They seem to execute and exit normally, but the output looks like this, and the compute time is always zero. Is this normal?

gcc (Ubuntu 5.5.0-12ubuntu1~16.04) 5.5.0 20171010
m4 (GNU M4) 1.4.17


		Hack code: Plummer model

     nbody     dtime       eps       tol     dtout     tstop    fcells     NPROC
     16384   0.02500    0.0500      1.00     0.250     0.075      2.00         4

COMPUTESTART  =   1535169466
COMPUTEEND    =   1535169466
COMPUTETIME   =            0
TRACKTIME     =            0
PARTITIONTIME =            0	 -nan
TREEBUILDTIME =            0	 -nan
FORCECALCTIME =            0	 -nan
RESTTIME      =            0	 -nan
Creating a two cluster, non uniform distribution for 16384 particles
Starting FMM with 4 processors
Finished FMM
                                   PROCESS STATISTICS
             Track        Tree        List        Part        Pass       Inter        Bar        Intra       Other
 Proc        Time         Time        Time        Time        Time       Time         Time       Time        Time
    0            0           0           0           0           0           0           0           0           0

                                   TIMING INFORMATION
Start time                        :       1535169466
Initialization finish time        :       1535169466
Overall finish time               :       1535169466
Total time with initialization    :                0
Total time without initialization :                0

Total time for steps 3 to 5 :            0


Ocean simulation with W-cycle multigrid solver
    Processors                         : 4
    Grid size                          : 258 x 258
    Grid resolution (meters)           : 20000.00
    Time between relaxations (seconds) : 28800
    Error tolerance                    : 1e-07


                       PROCESS STATISTICS
                  Total          Multigrid         Multigrid
 Proc             Time             Time            Fraction
    0                 0                  0              -nan

                       TIMING INFORMATION
Start time                        :       1535169466
Initialization finish time        :       1535169466
Overall finish time               :       1535169466
Total time with initialization    :                0
Total time without initialization :                0
    (excludes first timestep)


Ocean simulation with W-cycle multigrid solver
    Processors                         : 4
    Grid size                          : 258 x 258
    Grid resolution (meters)           : 20000.00
    Time between relaxations (seconds) : 28800
    Error tolerance                    : 1e-07


                       PROCESS STATISTICS
                  Total          Multigrid         Multigrid
 Proc             Time             Time            Fraction
    0                 1                  1             1.000

                       TIMING INFORMATION
Start time                        :       1535169466
Initialization finish time        :       1535169466
Overall finish time               :       1535169467
Total time with initialization    :                1
Total time without initialization :                1
    (excludes first timestep)

TIMING STATISTICS MEASURED BY MAIN PROCESS:
	Overall start time	          1535169467
	Overall end time	          1535169467
	Total time with initialization	                   0
	Total time without initialization	                   0
Rasiosity Statistics

    Histogram of interactions/elem
	 Interactions  Occurrence
	 -------------------------------
	 (Over 100)      168 (126291.523438)
	    100          3 (4874.575684)
	    98          6 (1521.356079)
	    97          5 (1734.110107)
	    96          5 (1352.056885)
	    94          2 (2449.900391)
	    93          1 (9905.465820)
	    92          2 (50253.605469)
	    91          1 (22364.552734)
	    90          7 (20320.214844)
	    89          1 (1195.967529)
	    88          2 (9905.674805)
	    87          1 (1311.672607)
	    86          4 (3626.220459)
	    85          4 (1450.736816)
	    84          5 (4989.483398)
	    83          1 (1512.704102)
	    82          4 (81670.343750)
	    81          4 (6313.362793)
	    80          1 (13567.184570)
	    79          1 (1191.440430)
	    78          1 (72000.250000)
	    77          2 (1345.461182)
	    76          1 (28252.785156)
	    75          4 (23670.160156)
	    72          1 (2930.986328)
	    71          2 (8949.666992)
	    70          3 (11491.526367)
	    69          1 (28252.785156)
	    67          1 (29830.343750)
	    66          1 (22372.632812)
	    65          3 (8065.339844)
	    64          4 (17389.933594)
	    63          2 (7623.680664)
	    62          3 (23875.185547)
	    61          1 (22680.052734)
	    60          1 (36381.175781)
	    59          4 (8072.653320)
	    57          4 (3139.145508)
	    54          1 (21323.609375)
	    52          2 (8099.991699)
	    49          1 (4661.109863)
	    46          1 (4661.109863)
	    45          2 (71999.750000)
	    44          4 (67961.914062)
	    41          3 (4641.196777)
	    39          1 (42452.933594)
	    38          2 (3360.855957)
	    37          3 (15079.329102)
	    36          1 (5919.844727)
	    35          4 (2469.837891)
	    34          1 (42452.933594)
	    33          2 (19543.990234)
	    32          1 (3889.380859)
	    30          1 (14858.612305)
	    29          1 (3889.380859)
	    28          6 (16926.972656)
	    27          3 (7069.240723)
	    26          3 (9905.535156)
	    25          1 (3241.141113)
	    24          2 (14858.602539)
	    22          2 (1904.186157)
	    21          1 (9905.674805)
	    20          3 (42897.675781)
	    13          1 (360186.750000)
	    12          18 (5481.369141)
	    11          1 (127359.734375)
	    10          24 (3665.947998)
	    8          1 (360186.750000)
	    7          1 (360186.750000)
	    6          12 (68938.164062)
	    5          4 (45358.750000)
	    4          266 (6870.366211)
	    3          154 (18073.607422)
	    2          926 (10214.654297)
	    1          558 (12964.450195)
	    0          403 (38763.160156)
    Configurations
	Patch assignment: Static equal number
	Always inserting at top of list for visibility testing (not sorted)
	Recursive pruning enabled for BSP tree traversal
	Patch cache:      Enabled
	Always check all other queues when task stealing (not neighbor scheme)
    Parameters
	Number of processors:    4
	Number of task queues:   4
	Number of tasks / queue: 200
	Area epsilon:            5000.000000
	#inter parallel refine:  5
	#visibility comp / task: 4
	BF epsilon:              0.100000
	Energy convergence:      0.050000
    Iterations to converge:   3 times
    Resource Usage
	Number of patches:            364
	Total number of elements:     2688
	Total number of interactions: 39973
	          completely visible: 6656
	        completely invisible: 13166
	           partially visible: 20151
	Interaction coherence (root interaction not counted)
	       Common for 4 siblings: 3212
	       Common for 3 siblings: 396
	       Common for 2 siblings: 246
	       Common for no sibling: 192
	Avg. elements per patch:      7.4
	Avg. interactions per patch:  109.8
	Avg. interactions per element:14.9
	Number of elements in equivalent uniform mesh: 7783
	Elem(hierarchical)/Elem(uniform): 34.54%

Number of processors:     	4
Global shared memory size:	64 MB
Samples per pixel:        	1

Number of primitive objects: 	7629
Number of primitive elements:	46423

****** Hierarchial uniform grid memory allocation summary ******* 

     < struct >:            < current >   < maximum >    < sizeof > 
     <  bytes >:             <  bytes >   <   bytes >    <  bytes > 

     grid:                      59760         59760           144 
     hashtable entries:        678968        678968             8 
     emptycell entries:          6632          6632             8 
     voxel:                   1251480       1251480            40 
     bintree_node:           12370320      12370320           120 

     Totals:                 14367160      14367160      

TIMING STATISTICS MEASURED BY MAIN PROCESS:
        Overall start time               1535169467
        Overall end time             1535169467
        Total time with initialization                     0
        Total time without initialization                     0
usage:  VOLREND num_processes input_file ROTATE_STEPS
Using 4 procs on 3 steps of 512 mols
Other parameters:
	TSTEP = 1.50e-16
	NORDER = 6
	NSAVE = -1
	NRST = 3000
	NPRINT = 3
	NFMC = 0
	CUTOFF = 6.212752


TEMPERATURE                =   298.00 K
DENSITY                    =  0.99800 G/C.C.
NUMBER OF MOLECULES        =      512
NUMBER OF PROCESSORS       =        4
TIME STEP                  = 1.50e-01 SEC
ORDER USED TO SOLVE F=MA   =        6 
NO. OF TIME STEPS          =        3 
FREQUENCY OF DATA SAVING   =       -1 
FREQUENCY TO WRITE RST FILE=     3000 
SPHERICAL CUTOFF RADIUS    =   6.2128 ANGSTROM

NS = 7.9999899999999995
BOXL =  24.851010
CUTOFF =   6.212752
XS =   3.106380
ZERO = 1.55319
WCOS = 0.585882
WSIN = 0.756950
***** NEW RUN STARTING FROM REGULAR LATTICE *****
         3        1.57495      0.05127     10.55761                      -2.15831
           10.026        305.74022        -19.57198
COMPUTESTART (after initialization) = 1535169467
COMPUTEEND = 1535169467
COMPUTETIME (after initialization) = 0
Measured Time (2nd timestep onward) = 0
Intramolecular time only (2nd timestep onward) = 0
Intermolecular time only (2nd timestep onward) = 0
Other time (2nd timestep onward) = 0

Exited Happily with XTT = 10.0255 (note: XTT value is garbage if NPRINT > NSTEP)
Using 4 procs on 3 steps of 512 mols
Other parameters:
	TSTEP = 1.50e-16
	NORDER = 6
	NSAVE = -1
	NRST = 3000
	NPRINT = 3
	NFMC = 0
	CUTOFF = 6.212752

64 boxes with 4 processors


TEMPERATURE                =   298.00 K
DENSITY                    =  0.99800 G/C.C.
NUMBER OF MOLECULES        =      512
NUMBER OF PROCESSORS       =        4
TIME STEP                  = 1.50e-01 SEC
ORDER USED TO SOLVE F=MA   =        6 
NO. OF TIME STEPS          =        3 
FREQUENCY OF DATA SAVING   =       -1 
FREQUENCY TO WRITE RST FILE=     3000 
xprocs = 1	yprocs = 2	zprocs = 2
x_inc = 4	 y_inc = 2	 z_inc = 2
x_left = 0	 y_left = 0	 z_left = 0
SPHERICAL CUTOFF RADIUS    =   6.2128 ANGSTROM

NS = 7.9999999999999893
BOXL =  24.851010
CUTOFF =   6.212752
BOX_LENGTH =   6.212752
BOX_PER_SIDE = 4
XS =   3.106376
ZERO = 1.55319
WCOS = 0.585882
WSIN = 0.756950
***** NEW RUN STARTING FROM REGULAR LATTICE *****
         3     4711.30613   1586.02005      9.16081     -1.85845 
         6304.629    1430180.83630        414.12462
COMPUTESTART (after initialization) = 1535169467
COMPUTEEND = 1535169467
COMPUTETIME (after initialization) = 0
Measured Time (2nd timestep onward) = 0
Intramolecular time only (2nd timestep onward) = 0
Intermolecular time only (2nd timestep onward) = 0
Other time (2nd timestep onward) = 0

Exited Happily with XTT = 6304.63 (note: XTT value is garbage if NPRINT > NSTEP)

Sparse Cholesky Factorization
     Problem:         
     4 Processors
     Postpass partition size: 32
     16384 byte cache


true partitions
Fan-out, no block copy-across
LB domain, embedded distribution
No ordering
1295 supers, 3.05 nodes/super, 211 max super
1295/531 supers before/after
165039042/170264150 (1.03) ops before/after amalgamation
before partition
Divide for 4 P, 17 domains, 0.43 of work static, 0.95 eff, (inf overall)
284946 total domain updates
970 max height, 170264150 ops, 58510.02 conc, 120.94 bl for 4 P
Target partition size 0, postpass size 32
Processor array is 2 by 2
No redistribution
Supers: 69: 1  85: 1  104: 1  111: 1  137: 1  142: 1  396: 1  
Blocks: 27: 1  28: 5  29: 1  33: 12  34: 5  35: 6  36: 2  
32 partitions
32 partitions, 493 blocks
170264150 operations for factorization

                            PROCESS STATISTICS
              Total
 Proc         Time 
    0              
                            TIMING INFORMATION
Start time                        :       1535169467
Initialization finish time        :       1535169467
Overall finish time               :       1535169467
Total time with initialization    :                0
Total time without initialization :                0


FFT with Blocking Transpose
   65536 Complex Doubles
   4 Processors
   65536 Cache lines
   16 Byte line size
   4096 Bytes per page

iter_num = 64
iter_num = 64
iter_num = 64
iter_num = 64
Transpose: iter_num = 0
Transpose: iter_num = 4096
Transpose: iter_num = 8192
FFt1DOnce: iter_num = 1024
Transpose: iter_num = 12288
Step 1:        0
Step 2:        0
Transpose: iter_num = 4096
Transpose: iter_num = 0
Transpose: iter_num = 8192
Transpose: iter_num = 12288
Step 3:        0
Transpose: iter_num = 0
Transpose: iter_num = 4096
Step 4:        0
Transpose: iter_num = 8192
Transpose: iter_num = 12288
Step 5:        0

                 PROCESS STATISTICS
            Computation      Transpose     Transpose
 Proc          Time            Time        Fraction
    0                 0              0          -nan

                 TIMING INFORMATION
Start time                        :       1535169467
Initialization finish time        :       1535169467
Overall finish time               :       1535169467
Total time with initialization    :                0
Total time without initialization :                0
Overall transpose time            :                0
Overall transpose fraction        :             -nan


Blocked Dense LU Factorization
     512 by 512 Matrix
     4 Processors
     16 by 16 Element Blocks


                            PROCESS STATISTICS
              Total      Diagonal     Perimeter      Interior       Barrier
 Proc         Time         Time         Time           Time          Time
    0             0             0             0             0             0

                            TIMING INFORMATION
Start time                        :       1535169467
Initialization finish time        :       1535169467
Overall finish time               :       1535169467
Total time with initialization    :                0
Total time without initialization :                0


Blocked Dense LU Factorization
     512 by 512 Matrix
     4 Processors
     16 by 16 Element Blocks


                            PROCESS STATISTICS
              Total      Diagonal     Perimeter      Interior       Barrier
 Proc         Time         Time         Time           Time          Time
    0             0             0             0             0             0

                            TIMING INFORMATION
Start time                        :       1535169467
Initialization finish time        :       1535169467
Overall finish time               :       1535169467
Total time with initialization    :                0
Total time without initialization :                0


Integer Radix Sort
     1048576 Keys
     4 Processors
     Radix = 1024
     Max key = 67108864


                 PROCESS STATISTICS
               Total            Rank            Sort
 Proc          Time             Time            Time
    0              0               0               0

                 TIMING INFORMATION
Start time                        :       1535169467
Initialization finish time        :       1535169467
Overall finish time               :       1535169467
Total time with initialization    :                0
Total time without initialization :                0

Have volrend use the bundled libtiff

Instead of relying on the system version of libtiff, Volrend should preferably use the bundled version. This requires:

- Having the makefile extract and build the sources
- Fixing the errors/warnings in the sources

Unsuccessful build for ARM (cross-compilation)

codes/apps/ocean/non_contiguous_partitions/jacobcalc.c fails to cross-compile for ARM using arm-linux-gnueabi-gcc as the compiler.

Steps to reproduce:

Edit Makefile.config and change CC := gcc to CC := arm-linux-gnueabi-gcc (assuming arm-linux-gnueabi-gcc is available for use).
Run make in the codes/ directory.

Error output:

make -C apps/ocean/non_contiguous_partitions
make[1]: Entering directory '/home/agdhruv/Desktop/Splash-3/codes/apps/ocean/non_contiguous_partitions'
m4 -Ulen -Uindex /home/agdhruv/Desktop/Splash-3/codes/pthread_macros/pthread.m4.stougie decs.h.in > decs.h
m4 -Ulen -Uindex /home/agdhruv/Desktop/Splash-3/codes/pthread_macros/pthread.m4.stougie jacobcalc.c.in > jacobcalc.c
arm-linux-gnueabi-gcc -c -O2 -pthread -D_XOPEN_SOURCE=500 -D_POSIX_C_SOURCE=200112 -std=c11 -g -fno-strict-aliasing -static jacobcalc.c
In file included from jacobcalc.c:25:0:
decs.h:115:11: error: size of array ‘q_multi’ is too large
    double q_multi[MAX_LEVELS][IMAX][JMAX];
           ^~~~~~~
decs.h:116:11: error: size of array ‘rhs_multi’ is too large
    double rhs_multi[MAX_LEVELS][IMAX][JMAX];
           ^~~~~~~~~
../../../Makefile.config:31: recipe for target 'jacobcalc.o' failed
make[1]: *** [jacobcalc.o] Error 1
make[1]: Leaving directory '/home/agdhruv/Desktop/Splash-3/codes/apps/ocean/non_contiguous_partitions'
Makefile:4: recipe for target 'all' failed
make: *** [all] Error 2

Why does this error occur, and how can I work around it?
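A likely explanation, offered as an assumption: arm-linux-gnueabi is a 32-bit target, and GCC rejects any object larger than roughly half the address space (about 2 GiB there), so a statically sized three-dimensional array of doubles can exceed the limit even though it compiles on a 64-bit host. The sketch below checks a given set of dimensions against a byte limit. The function name and the dimension values used with it are hypothetical, since the actual values of MAX_LEVELS, IMAX and JMAX are not shown in the report.

```c
#include <stdint.h>

/* Returns 1 if a double[levels][imax][jmax] array fits in max_bytes,
 * and 0 otherwise. Division is used instead of multiplication so the
 * check cannot overflow. */
int array_fits(uint64_t levels, uint64_t imax, uint64_t jmax,
               uint64_t max_bytes)
{
    uint64_t budget;

    /* An array with a zero dimension has no elements and always fits. */
    if (levels == 0 || imax == 0 || jmax == 0)
        return 1;
    /* Largest jmax for which the element count stays within the limit. */
    budget = max_bytes / sizeof(double) / levels / imax;
    return jmax <= budget;
}
```

For example, with a limit of INT32_MAX bytes, hypothetical dimensions of 9 x 258 x 258 fit, while 9 x 8194 x 8194 do not. If this is the cause, possible workarounds are building for a 64-bit ARM target, lowering the dimension constants, or allocating the arrays dynamically.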
