Bunch of small bug fixes
Conditional expressions (?:) should be printed correctly now.
Added precedence levels for operations.
Added &&*, ||* and /=*
OpenCL and C code generation is currently broken!!
This is because of addition of new intermediate representation and
common subexpression elimination experiments. Been working on
getting the CUDA part to work first.
Global Arrays
Experimenting with a new kind of Kernel. These kernels
have types like: GlobalArray Pull (Exp a) -> Kernel (GlobalArray Push (Exp a))
This also lets the Obsidian programmer express some aspects of what we
usually refer to as Kernel coordination.
Two functions (block and unblock) chop global arrays up into small arrays suitable
for processing by what we have so far called a Kernel.
There are some restrictions (maybe totally necessary): There is no way to,
within a kernel, go from (GlobalArray Push a) to (GlobalArray Pull a).
Push Arrays: Explanation to come
Notes
(Solved, it works, see below)
OpenCL: The kind of pointer arithmetic + casting, that has worked so well
in CUDA, seems to not work in OpenCL on the local memory.
Will need to scan the specification/documentation to find exactly what is
allowed and not on pointers to local memory in an OpenCL kernel. (Solved, see below)
The address space qualifiers are important! (__local int*)shared vs (int*)shared.
(int*)shared tried to cast local memory array called shared into a global array, so illegal!
Obsidian Int are β64βbit on 64bit architectures. Is it also true that
a CUDA int is 64bit on 64bit architectures?
and to look into:
Break out SPMDC into its own library and try to abstract this
from any OpenCL, CUDA Specifics.
Update the Cabal file. Or fix whatever the problem with that is.
PRIO => Detect if output is computed directly from input.
If that happens store result fist in a variable and
then to output array from there.
Many-in -> Many-out operations
PRIO => Coordination
Improve upon the things in Sync.hs
Syncing needs to be improved. (work on more kinds of arrays, needs type class magic)
Fix SyncAnalysis
IN PROGRESS: Perform CSE
IN PROGRESS: Kernel Coordination layer
DONE: Generate the vSwap/iSwap kind of kernels. That is Kernels that operate
on an array based on its global tid instead of local tid.
DONE: Generate Kernels that take scalar parameters
PRIO => Filters and unsafe operations that lead to nondeterminism