Histogram_with_CUDA

First, a basic histogram and the Kogge-Stone scan algorithm are executed. The basic histogram kernel does not use shared memory. It updates the bins with atomicAdd, because atomicAdd performs a read-modify-write operation on a memory location as a single hardware instruction. This prevents data races when many threads run in parallel.

For the scan, each thread is assigned to evolve the content of one scan element. The compile-time constant SECTION_SIZE defines the size of a section. SECTION_SIZE is also used as the block size at kernel launch, so the number of threads equals the number of section elements. For large input sequences, a final adjustment is then applied to these per-section scan results. All threads in the block cooperatively load the elements of the histo[] array into the shared memory array scan[]. Barrier synchronization ensures that every thread has finished its current additions before the next iteration starts. The minimum CDF value (cdf min) is also found, because histogram equalization needs it. At the end of the kernel, each thread writes its result to its assigned position in the output array scanning[]. Then the equalized histogram is calculated.
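
The first part can be sketched roughly as follows. This is an illustration, not the code from this repository: the kernel names `histo_basic` and `kogge_stone_scan`, the host helper `histogram_and_cdf_basic`, and the assumption of 8-bit input with 256 bins (so a single section of SECTION_SIZE covers the whole histogram and no cross-section adjustment is needed) are all hypothetical. Error checking is omitted for brevity and `n` is assumed to be greater than zero.

```cuda
#include <cuda_runtime.h>

#define SECTION_SIZE 256  // block size; also the bin count here (assumption: 8-bit data)

// Basic histogram: every thread updates the global bins directly, no shared memory.
__global__ void histo_basic(const unsigned char *data, int n, unsigned int *histo) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        atomicAdd(&histo[data[i]], 1u);  // atomic read-modify-write avoids data races
    }
}

// Kogge-Stone inclusive scan of one section; one thread per scan element.
__global__ void kogge_stone_scan(const unsigned int *histo, unsigned int *scanning, int n) {
    __shared__ unsigned int scan[SECTION_SIZE];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    scan[threadIdx.x] = (i < n) ? histo[i] : 0u;  // cooperative load into shared memory

    for (unsigned int stride = 1; stride < blockDim.x; stride *= 2) {
        __syncthreads();  // all loads / previous additions are finished
        unsigned int partial = 0u;
        if (threadIdx.x >= stride) partial = scan[threadIdx.x - stride];
        __syncthreads();  // all reads are finished before anyone overwrites
        if (threadIdx.x >= stride) scan[threadIdx.x] += partial;
    }
    if (i < n) scanning[i] = scan[threadIdx.x];  // each thread writes its own result
}

// Hypothetical host helper: histogram of n bytes, then its inclusive scan (CDF).
void histogram_and_cdf_basic(const unsigned char *h_data, int n, unsigned int *h_cdf) {
    unsigned char *d_data;
    unsigned int *d_histo, *d_cdf;
    size_t bins_bytes = SECTION_SIZE * sizeof(unsigned int);
    cudaMalloc(&d_data, n);
    cudaMalloc(&d_histo, bins_bytes);
    cudaMalloc(&d_cdf, bins_bytes);
    cudaMemcpy(d_data, h_data, n, cudaMemcpyHostToDevice);
    cudaMemset(d_histo, 0, bins_bytes);

    int blocks = (n + SECTION_SIZE - 1) / SECTION_SIZE;
    histo_basic<<<blocks, SECTION_SIZE>>>(d_data, n, d_histo);
    kogge_stone_scan<<<1, SECTION_SIZE>>>(d_histo, d_cdf, SECTION_SIZE);

    cudaMemcpy(h_cdf, d_cdf, bins_bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_histo);
    cudaFree(d_cdf);
}
```

The two barriers per iteration separate the read of the neighbor element from the write to the thread's own element, which is what keeps the in-place update in scan[] free of races.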

My device is a GeForce 940MX, and its warp size is 32. My SECTION_SIZE is 256, so each block uses 256 / 32 = 8 warps.
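
The same arithmetic can be checked against whatever device is installed. The sketch below is only an illustration: `warps_per_block` and `report_warps` are hypothetical helper names, and device 0 is assumed to be the GPU of interest.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define SECTION_SIZE 256  // block size used by the kernels

// Number of warps needed for one block, rounded up for a partial last warp.
int warps_per_block(int block_size, int warp_size) {
    return (block_size + warp_size - 1) / warp_size;
}

// Queries device 0 and prints its warp size and the warps per block.
// Returns the warp count, or -1 if the device cannot be queried.
int report_warps() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        return -1;
    }
    int warps = warps_per_block(SECTION_SIZE, prop.warpSize);
    printf("%s: warp size %d, %d warps per block of %d threads\n",
           prop.name, prop.warpSize, warps, SECTION_SIZE);
    return warps;
}
```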

Then, a private histogram and the Brent-Kung scan algorithm are executed. These two algorithms run faster than the previous pair. The private histogram uses shared memory: each block accumulates into its own private copy, which means much less contention and serialization when accessing both the private copies and the final copy. Therefore, it improves performance. Since the Brent-Kung algorithm always uses consecutive threads in each iteration, the control divergence problem does not arise until the number of active threads drops below the warp size. This can increase the efficiency of the algorithm. Then cdf min is found, because histogram equalization needs it. At the end of the kernel, each thread writes its result to its assigned position in the output array scanning[]. Then the equalized histogram is calculated.
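
A rough sketch of this second part is shown below. As before, it is not the repository's code: the names `histo_private`, `brent_kung_scan` and `histogram_and_cdf_private` are hypothetical, 8-bit input with 256 bins is assumed, and the Brent-Kung kernel is assumed to be launched with SECTION_SIZE / 2 threads so that each thread handles two elements of the section. Error checking is omitted and `n` is assumed to be greater than zero.

```cuda
#include <cuda_runtime.h>

#define SECTION_SIZE 256  // elements per scan section; also the bin count (assumption)

// Privatized histogram: each block counts into its own shared-memory copy first.
__global__ void histo_private(const unsigned char *data, int n, unsigned int *histo) {
    __shared__ unsigned int histo_s[SECTION_SIZE];
    for (int b = threadIdx.x; b < SECTION_SIZE; b += blockDim.x) histo_s[b] = 0u;
    __syncthreads();

    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += blockDim.x * gridDim.x)
        atomicAdd(&histo_s[data[i]], 1u);  // contention stays inside the block
    __syncthreads();

    // Merge the private copy into the final copy, skipping empty bins.
    for (int b = threadIdx.x; b < SECTION_SIZE; b += blockDim.x)
        if (histo_s[b] > 0u) atomicAdd(&histo[b], histo_s[b]);
}

// Brent-Kung inclusive scan of one section; requires blockDim.x == SECTION_SIZE / 2.
__global__ void brent_kung_scan(const unsigned int *histo, unsigned int *scanning, int n) {
    __shared__ unsigned int scan[SECTION_SIZE];
    int t = threadIdx.x;
    int i = 2 * blockIdx.x * blockDim.x + t;
    scan[t] = (i < n) ? histo[i] : 0u;
    scan[t + blockDim.x] = (i + blockDim.x < n) ? histo[i + blockDim.x] : 0u;

    // Reduction tree: consecutive threads stay active as the stride grows.
    for (unsigned int stride = 1; stride <= blockDim.x; stride *= 2) {
        __syncthreads();
        unsigned int index = (t + 1) * 2 * stride - 1;
        if (index < SECTION_SIZE) scan[index] += scan[index - stride];
    }
    // Reverse tree: distribute the partial sums to the remaining positions.
    for (unsigned int stride = SECTION_SIZE / 4; stride > 0; stride /= 2) {
        __syncthreads();
        unsigned int index = (t + 1) * 2 * stride - 1;
        if (index + stride < SECTION_SIZE) scan[index + stride] += scan[index];
    }
    __syncthreads();
    if (i < n) scanning[i] = scan[t];
    if (i + blockDim.x < n) scanning[i + blockDim.x] = scan[t + blockDim.x];
}

// Hypothetical host helper: privatized histogram of n bytes, then its CDF.
void histogram_and_cdf_private(const unsigned char *h_data, int n, unsigned int *h_cdf) {
    unsigned char *d_data;
    unsigned int *d_histo, *d_cdf;
    size_t bins_bytes = SECTION_SIZE * sizeof(unsigned int);
    cudaMalloc(&d_data, n);
    cudaMalloc(&d_histo, bins_bytes);
    cudaMalloc(&d_cdf, bins_bytes);
    cudaMemcpy(d_data, h_data, n, cudaMemcpyHostToDevice);
    cudaMemset(d_histo, 0, bins_bytes);

    int blocks = (n + SECTION_SIZE - 1) / SECTION_SIZE;
    histo_private<<<blocks, SECTION_SIZE>>>(d_data, n, d_histo);
    brent_kung_scan<<<1, SECTION_SIZE / 2>>>(d_histo, d_cdf, SECTION_SIZE);

    cudaMemcpy(h_cdf, d_cdf, bins_bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_histo);
    cudaFree(d_cdf);
}
```

Because the active threads in both trees are always threads 0, 1, 2, and so on, whole warps drop out together, and divergence only appears once fewer than 32 threads remain active.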

So in the second part, using more efficient code for both the histogram and the scan sped up the whole process.
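
Both parts end with the same equalization step, which uses the scan result and cdf min. The sketch below assumes the commonly used mapping round((cdf(v) - cdf_min) / (N - cdf_min) * (L - 1)), with L = 256 gray levels and N equal to the last CDF value. The exact formula used in this project is not stated here, so treat this as an assumption. The names `equalize_lut` and `build_equalization_lut` are hypothetical, and cdf min is found on the host only to keep the example short.

```cuda
#include <cuda_runtime.h>

#define NUM_LEVELS 256  // gray levels of an 8-bit image (assumption)

// One thread per gray level: map the CDF value to an equalized gray level.
__global__ void equalize_lut(const unsigned int *cdf, unsigned int cdf_min,
                             unsigned int total, unsigned char *lut) {
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v >= NUM_LEVELS) return;
    // Levels below the first occupied bin have cdf 0 and map to 0.
    float num = (cdf[v] > cdf_min) ? (float)(cdf[v] - cdf_min) : 0.0f;
    // Guard against a constant image, where total == cdf_min.
    float den = (total > cdf_min) ? (float)(total - cdf_min) : 1.0f;
    lut[v] = (unsigned char)(num / den * (NUM_LEVELS - 1) + 0.5f);  // round to nearest
}

// Hypothetical host helper: builds the 256-entry lookup table from a host-side CDF.
void build_equalization_lut(const unsigned int *h_cdf, unsigned char *h_lut) {
    // cdf min is the first non-zero CDF value.
    unsigned int cdf_min = 0;
    for (int v = 0; v < NUM_LEVELS; ++v) {
        if (h_cdf[v] != 0) { cdf_min = h_cdf[v]; break; }
    }
    unsigned int total = h_cdf[NUM_LEVELS - 1];  // inclusive scan ends at the pixel count

    unsigned int *d_cdf;
    unsigned char *d_lut;
    cudaMalloc(&d_cdf, NUM_LEVELS * sizeof(unsigned int));
    cudaMalloc(&d_lut, NUM_LEVELS);
    cudaMemcpy(d_cdf, h_cdf, NUM_LEVELS * sizeof(unsigned int), cudaMemcpyHostToDevice);

    equalize_lut<<<1, NUM_LEVELS>>>(d_cdf, cdf_min, total, d_lut);

    cudaMemcpy(h_lut, d_lut, NUM_LEVELS, cudaMemcpyDeviceToHost);
    cudaFree(d_cdf);
    cudaFree(d_lut);
}
```

Applying the table to an image is then a per-pixel lookup, out[i] = lut[in[i]].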
