Name: Hüseyin Tuğrul BÜYÜKIŞIK
Type: User
Company: ZeroDensity
Bio: Physics engineer, web developer. GPGPU, weird equations, optimizations, lens undistortion, signal filtering.
Location: Earth, Sun System, Milkyway Galaxy, Observable Universe
Hüseyin Tuğrul BÜYÜKIŞIK's Projects
GPU-accelerated version of The Powder Toy, just an attempt using cellular automata.
2D RPG/RTS/Simulation game that lets you design a CPU & manage your corporation against other corporations.
Multi-device OpenCL kernel load balancer and pipeliner API for C#. Uses a shared-distributed memory model to keep GPUs updated quickly while running the same kernel on all devices (for simplicity).
Project (OpenCL 1.2 backend) that generates KutuphaneCL.dll, which is used by the Cekirdekler GPGPU API (Cekirdekler.dll).
OpenCL 2.0 support for Cekirdekler
Heavy weight string with compression
Testing the bitonic sort algorithm on CUDA.
C# fully OpenCL (C99)-accelerated game demo and benchmark; pre-alpha stage abandonware.
C++ compressed FASTA sequence cache backed by the combined video memory of the system to decrease RAM usage.
C++ adaptive grid for fast collision detection between AABB particles.
GPU-accelerated neural network trainer that supports multiple GPUs with OpenCL.
Simple load-balancing library for balancing GPGPU workloads between a GPU and a CPU or any number of devices in a computer or multiple computers.
Computing a function when only its inverse is known, using the Newton-Raphson method for 1D, 2D, and 3D arrays in parallel.
This is a Kalman filter used to calculate the angle, rate and bias from the input of an accelerometer/magnetometer and a gyroscope.
Async Test
Multi-GPU & CPU OpenCL kernel executor with load-balancing as if there is one big GPU.
A low-latency LRU approximation cache in C++ using the CLOCK second-chance algorithm. Also includes a multi-level cache. Up to 2.5 billion lookups per second.
Asynchronous cache that implements the Least Recently Used (LRU) approximation known as the CLOCK second-chance algorithm, with O(1) hit and O(1) miss complexity. This async cache hides the latency of cache misses behind each other and behind cache hits.
Thomas Wang's random number generation function, implicitly parallelized and pipelined at a speed of 0.6 cycles per 32-bit integer.
Classic Snake-Game With Independent Grid-Updates For Efficient Parallelization And Constant Computation Time
Simple (2 files), fast (1.8 GB/s on one core of an FX-8150) video (mp4, ogg, ...) stream cache (LRU implementation) for Node.js.
User stats info.
Ultra fast simulated annealing with OpenCL & multiple accelerators, GPUs, CPUs.
Deforming a sphere surface using vertices, normals and time.
Running GPGPU-like kernels on CPU with auto-vectorization for SSE/AVX/AVX512 SIMD Architectures
C++ virtual-array implementation that uses all graphics cards in the system as storage (with LRU cache eviction on RAM) and uses OpenCL for data transfers. (Random access: faster than HDD) (Sequential access: faster than SSD) (Big objects: faster than NVMe)
Back your array of data with graphics card memory (multi-GPU) using a paging system (as a virtual memory simulation).