Comments (5)
I ran cuda-memcheck on the executable; here's the output if anyone else is looking at this. It seems that computeGradient makes invalid writes into __shared__ memory.
Edit: I found the issue. The computeGradient kernel was being launched with insufficient launch parameters: the third launch parameter, which specifies the amount of dynamic shared memory to allocate for the kernel, was missing. To get it running on a K20c we had to change lines 633-637 in localcues/rotate.cu.
The original code looked like this:
...
computeGradient<48, 25, true, false><<<gridDim, blockDim>>>(width, height, width * height, border, rotatedWidth, topNorm, bottomNorm, kernelRadius, kernelLength, devIntegrals, integralImagePitchInInts, devGradientA);
computeGradient<48, 25, true, true><<<gridDim, blockDim>>>(width, height, width * height, border, rotatedWidth, leftNorm, rightNorm, kernelRadius, kernelLength, devIntegrals, integralImagePitchInInts, devGradientB);
} else {
computeGradient<32, 32, false, false><<<gridDim, blockDim>>>(width, height, width * height, border, rotatedWidth, topNorm, bottomNorm, 0, 0, devIntegrals, integralImagePitchInInts, devGradientA);
computeGradient<32, 32, false, true><<<gridDim, blockDim>>>(width, height, width * height, border, rotatedWidth, leftNorm, rightNorm, 0, 0, devIntegrals, integralImagePitchInInts, devGradientB);
...
This should be changed to:
...
// Internally, computeGradient allocates 3 __shared__ float arrays,
// each of which contains nthreads * UNROLL (48 * 4) elements
size_t sharedMemorySize = 48 * 3 * UNROLL * sizeof(float);
computeGradient<48, 25, true, false><<<gridDim, blockDim, sharedMemorySize>>>(width, height, width * height, border, rotatedWidth, topNorm, bottomNorm, kernelRadius, kernelLength, devIntegrals, integralImagePitchInInts, devGradientA);
computeGradient<48, 25, true, true><<<gridDim, blockDim, sharedMemorySize>>>(width, height, width * height, border, rotatedWidth, leftNorm, rightNorm, kernelRadius, kernelLength, devIntegrals, integralImagePitchInInts, devGradientB);
} else {
// Internally, computeGradient allocates 3 __shared__ float arrays,
// each of which contains nthreads * UNROLL (32 * 4) elements
size_t sharedMemorySize = 32 * 3 * UNROLL * sizeof(float);
computeGradient<32, 32, false, false><<<gridDim, blockDim, sharedMemorySize>>>(width, height, width * height, border, rotatedWidth, topNorm, bottomNorm, 0, 0, devIntegrals, integralImagePitchInInts, devGradientA);
computeGradient<32, 32, false, true><<<gridDim, blockDim, sharedMemorySize>>>(width, height, width * height, border, rotatedWidth, leftNorm, rightNorm, 0, 0, devIntegrals, integralImagePitchInInts, devGradientB);
...
The values could be refactored further; they are written out explicitly above for the sake of clarity.
References:
[1] https://stackoverflow.com/questions/25500961/using-shared-memory-in-cuda-gives-memory-write-error
from damascene.
Hi Prasanna -
Unfortunately, we haven't had time to maintain this code. It worked with CUDA 2.0 and GPUs from 2009, but I haven't gotten it running on anything modern. I'm sure there are bugs in the code; I just don't have time to find them. =(
I have seen this bug as well - just don't have a fix for it at this time.
Ok, good to know. Thanks!
Did you try changing SMVERSIONFLAGS in common.mk to sm_20? Support for the sm_12 architecture was dropped in CUDA 7.0, so that might be why you get the illegal memory access.
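For reference, the change being suggested is to the architecture flag in common.mk. The exact variable layout varies between versions of the old CUDA SDK makefiles, so treat this as a sketch rather than the literal damascene file; `-arch=sm_35` targets a Tesla K20.

```makefile
# common.mk: target Kepler (Tesla K20, compute capability 3.5)
# instead of the original sm_12 target, which CUDA 7.0 no longer supports.
SMVERSIONFLAGS := -arch=sm_35
```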
I've changed all instances of SMVERSIONFLAGS to sm_35, since I'm running it on a Tesla K20, but I still hit the same error:
Using cuda device 1: Tesla K20c
Processing: damascene/polynesia.ppm, output in damascene/polynesiaPb.pgm and damascene/polynesia.pb
Eig 9 Tol 0.001000 Texton 1
Image found: 321 x 481 pixels
Available 246022144 bytes on GPU
>+< rgbUtoGrayF | 0.729000 | ms
Convolving
Beginning kmeans
Changes: 162604
Changes: 83239
Changes: 53176
Changes: 40912
Changes: 33108
Changes: 25228
Changes: 22345
Changes: 19411
Changes: 1971686769
Changes: -153835340
9 iterations until termination
Kmeans completed
>+< texton | 375.548004 | ms
>+< rgbUtoLab3F | 1.990000 | ms
>+< normalizeLab | 0.015000 | ms
>+< mirrorImage | 1.276000 | ms
Beginning Local cues computation
CUDA error at parabola.cu:58 code=77(cudaErrorIllegalAddress) "cudaMemcpy2DToArray(cuda_parabola_pixels, 0, 0, devPixels, border_width*sizeof(int), border_width*sizeof(int), border_height*norients, cudaMemcpyDeviceToDevice)"
Any suggestions?