Comments (5)

envp commented on July 28, 2024

I ran cuda-memcheck on the executable; here's the output in case anyone else is looking at this. It seems that computeGradient makes invalid writes into __shared__ memory.

memerrors.txt

Edit: I found the issue. The computeGradient kernel was being launched without a shared memory size. We fixed this by passing a third kernel launch parameter that specifies how much shared memory to allocate for the kernel. To get it running on a K20c, we had to change lines 633-637 in localcues/rotate.cu.

The original looked like this:

...
    computeGradient<48, 25, true, false><<<gridDim, blockDim>>>(width, height, width * height, border, rotatedWidth, topNorm, bottomNorm, kernelRadius, kernelLength, devIntegrals, integralImagePitchInInts, devGradientA);
    computeGradient<48, 25, true, true><<<gridDim, blockDim>>>(width, height, width * height, border, rotatedWidth, leftNorm, rightNorm, kernelRadius, kernelLength, devIntegrals, integralImagePitchInInts, devGradientB);
  } else {
    computeGradient<32, 32, false, false><<<gridDim, blockDim>>>(width, height, width * height, border, rotatedWidth, topNorm, bottomNorm, 0, 0, devIntegrals, integralImagePitchInInts, devGradientA);
    computeGradient<32, 32, false, true><<<gridDim, blockDim>>>(width, height, width * height, border, rotatedWidth, leftNorm, rightNorm, 0, 0, devIntegrals, integralImagePitchInInts, devGradientB);
...

It should be changed to:

...
    // Internally, computeGradient requires 3 __shared__ float arrays,
    // each of which contains nthreads * UNROLL (48 * 4) elements
    size_t sharedMemorySize = 48 * 3 * UNROLL * sizeof(float);
    computeGradient<48, 25, true, false><<<gridDim, blockDim, sharedMemorySize>>>(width, height, width * height, border, rotatedWidth, topNorm, bottomNorm, kernelRadius, kernelLength, devIntegrals, integralImagePitchInInts, devGradientA);
    computeGradient<48, 25, true, true><<<gridDim, blockDim, sharedMemorySize>>>(width, height, width * height, border, rotatedWidth, leftNorm, rightNorm, kernelRadius, kernelLength, devIntegrals, integralImagePitchInInts, devGradientB);
  } else {
    // Internally, computeGradient requires 3 __shared__ float arrays,
    // each of which contains nthreads * UNROLL (32 * 4) elements
    size_t sharedMemorySize = 32 * 3 * UNROLL * sizeof(float);
    computeGradient<32, 32, false, false><<<gridDim, blockDim, sharedMemorySize>>>(width, height, width * height, border, rotatedWidth, topNorm, bottomNorm, 0, 0, devIntegrals, integralImagePitchInInts, devGradientA);
    computeGradient<32, 32, false, true><<<gridDim, blockDim, sharedMemorySize>>>(width, height, width * height, border, rotatedWidth, leftNorm, rightNorm, 0, 0, devIntegrals, integralImagePitchInInts, devGradientB);
...

The sizes are written out explicitly above for the sake of clarity and could be refactored further.
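For anyone unfamiliar with the third launch parameter, here is a minimal sketch of the mechanism, assuming computeGradient declares its scratch space as an `extern __shared__` array. When that parameter is omitted the kernel gets zero bytes of dynamic shared memory, so any write into such an array is out of bounds. The kernel `sumThree`, the helpers `sharedBytesFor` and `runSumThree`, and the constant `kUnroll` are hypothetical stand-ins for illustration and are not taken from rotate.cu; `kUnroll = 4` mirrors the value of UNROLL assumed in the comments above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for UNROLL in rotate.cu (assumed to be 4).
constexpr int kUnroll = 4;

// Bytes needed for three float arrays of nthreads * kUnroll elements each.
constexpr size_t sharedBytesFor(int nthreads) {
  return 3 * static_cast<size_t>(nthreads) * kUnroll * sizeof(float);
}

// Toy kernel (not computeGradient): carves three arrays out of one
// dynamically sized shared buffer and sums them element-wise.
__global__ void sumThree(const float* in, float* out) {
  extern __shared__ float scratch[];  // size comes from the 3rd launch parameter
  const int n = blockDim.x * kUnroll;
  float* a = scratch;
  float* b = scratch + n;
  float* c = scratch + 2 * n;
  for (int i = threadIdx.x; i < n; i += blockDim.x) {
    a[i] = in[i];
    b[i] = 2.0f * in[i];
    c[i] = 3.0f * in[i];
  }
  __syncthreads();
  for (int i = threadIdx.x; i < n; i += blockDim.x) {
    out[i] = a[i] + b[i] + c[i];
  }
}

// Launches one block of nthreads threads and checks out[i] == 6 * in[i].
bool runSumThree(int nthreads) {
  const int n = nthreads * kUnroll;
  float* in = nullptr;
  float* out = nullptr;
  if (cudaMallocManaged(&in, n * sizeof(float)) != cudaSuccess) return false;
  if (cudaMallocManaged(&out, n * sizeof(float)) != cudaSuccess) {
    cudaFree(in);
    return false;
  }
  for (int i = 0; i < n; ++i) in[i] = static_cast<float>(i);
  // Without the third parameter, scratch[] would have zero bytes available.
  sumThree<<<1, nthreads, sharedBytesFor(nthreads)>>>(in, out);
  bool ok = (cudaDeviceSynchronize() == cudaSuccess);
  for (int i = 0; ok && i < n; ++i) ok = (out[i] == 6.0f * in[i]);
  cudaFree(in);
  cudaFree(out);
  return ok;
}
```

Calling `runSumThree(48)` or `runSumThree(32)` exercises the same two block sizes as the launches above. Deriving the byte count from the thread count in one helper keeps the size and the launch configuration from drifting apart.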

References:
[1] https://stackoverflow.com/questions/25500961/using-shared-memory-in-cuda-gives-memory-write-error

from damascene.

bryancatanzaro commented on July 28, 2024

Hi Prasanna -
Unfortunately, we haven't had time to maintain this code. It worked with CUDA 2.0 and GPUs from 2009, but I have not gotten it running on anything modern. I'm sure there are bugs in the code, but just don't have time to find them. =(

I have seen this bug as well - just don't have a fix for it at this time.

prasannavk commented on July 28, 2024

Ok, good to know. Thanks!

hyenal commented on July 28, 2024

Did you try changing SMVERSIONFLAGS in common.mk to sm_20?
Support for the sm_12 architecture was dropped in CUDA 7.0, so that might be one reason why you get an illegal memory access.
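If it is unclear which value to use, the compute capability can be queried from the device itself. The following is a small sketch using `cudaGetDeviceProperties`; the helper names `smFlag` and `printDeviceArchs` are made up for this example, and how the resulting string is wired into SMVERSIONFLAGS depends on common.mk, which is not shown here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Formats "sm_<major><minor>" for a given compute capability into buf.
void smFlag(int major, int minor, char* buf, size_t len) {
  snprintf(buf, len, "sm_%d%d", major, minor);
}

// Prints the matching architecture name for every visible device.
// Returns the device count, or -1 if a runtime call fails.
int printDeviceArchs() {
  int count = 0;
  if (cudaGetDeviceCount(&count) != cudaSuccess) return -1;
  for (int d = 0; d < count; ++d) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, d) != cudaSuccess) return -1;
    char flag[16];
    smFlag(prop.major, prop.minor, flag, sizeof(flag));
    printf("device %d: %s -> %s\n", d, prop.name, flag);
  }
  return count;
}
```

A Tesla K20c reports compute capability 3.5, which corresponds to sm_35.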

acherunilam commented on July 28, 2024

I've changed all instances of SMVERSIONFLAGS to sm_35 since I'm running it on a Tesla K20. But I too face the same error:

Using cuda device 1: Tesla K20c
Processing: damascene/polynesia.ppm, output in damascene/polynesiaPb.pgm and damascene/polynesia.pb

 Eig 9 Tol 0.001000 Texton 1Image found: 321 x 481 pixels
Available 246022144 bytes on GPU
>+< rgbUtoGrayF | 0.729000 | ms
Convolving
Beginning kmeans
	Changes: 162604
	Changes: 83239
	Changes: 53176
	Changes: 40912
	Changes: 33108
	Changes: 25228
	Changes: 22345
	Changes: 19411
	Changes: 1971686769
	Changes: -153835340
	9 iterations until termination
Kmeans completed
>+< texton | 375.548004 | ms
>+< rgbUtoLab3F | 1.990000 | ms
>+< normalizeLab | 0.015000 | ms
>+< mirrorImage | 1.276000 | ms
Beginning Local cues computation
CUDA error at parabola.cu:58 code=77(cudaErrorIllegalAddress) "cudaMemcpy2DToArray(cuda_parabola_pixels, 0, 0, devPixels, border_width*sizeof(int), border_width*sizeof(int), border_height*norients, cudaMemcpyDeviceToDevice)"

Any suggestions?
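One thing that may help narrow this down: kernel launches are asynchronous, so an illegal address inside a kernel is usually reported by a later runtime call. The cudaMemcpy2DToArray at parabola.cu:58 may therefore only be where the error surfaced, not where it happened. A sketch of a check that can be placed after each suspect launch follows; `cudaOk`, `fill` and `fillChecked` are hypothetical names for this example and do not exist in damascene.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Reports the first pending CUDA error under the given label.
// Returns true when no error is pending.
bool cudaOk(const char* label) {
  cudaError_t err = cudaGetLastError();  // errors detected at launch time
  if (err == cudaSuccess) {
    err = cudaDeviceSynchronize();       // errors raised while the kernel ran
  }
  if (err != cudaSuccess) {
    fprintf(stderr, "%s: %s\n", label, cudaGetErrorString(err));
    return false;
  }
  return true;
}

// Toy kernel (not from damascene): bounds-checked fill.
__global__ void fill(int* data, int n, int value) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] = value;
}

// Launches the kernel, then checks immediately so a failure is
// attributed to this launch rather than to a later API call.
bool fillChecked(int* devData, int n, int value) {
  fill<<<(n + 127) / 128, 128>>>(devData, n, value);
  return cudaOk("fill");
}
```

Synchronizing after every launch slows the program down, so this is meant for debugging only. Running the binary under cuda-memcheck, as in the first comment, gives the same kind of attribution without code changes.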
