Comments (5)
I ran cuda-memcheck on the executable; here's the output if anyone else is looking at this. It seems that computeGradient makes invalid writes into __shared__ memory.
Edit: I found the issue. The computeGradient kernel was being launched with insufficient launch parameters: the third launch parameter, which specifies the amount of dynamic shared memory to allocate for the kernel, was missing. To get it running on a K20c we had to change lines 633-637 in localcues/rotate.cu.
The original code looked like this:
...
computeGradient<48, 25, true, false><<<gridDim, blockDim>>>(width, height, width * height, border, rotatedWidth, topNorm, bottomNorm, kernelRadius, kernelLength, devIntegrals, integralImagePitchInInts, devGradientA);
computeGradient<48, 25, true, true><<<gridDim, blockDim>>>(width, height, width * height, border, rotatedWidth, leftNorm, rightNorm, kernelRadius, kernelLength, devIntegrals, integralImagePitchInInts, devGradientB);
} else {
computeGradient<32, 32, false, false><<<gridDim, blockDim>>>(width, height, width * height, border, rotatedWidth, topNorm, bottomNorm, 0, 0, devIntegrals, integralImagePitchInInts, devGradientA);
computeGradient<32, 32, false, true><<<gridDim, blockDim>>>(width, height, width * height, border, rotatedWidth, leftNorm, rightNorm, 0, 0, devIntegrals, integralImagePitchInInts, devGradientB);
...
This should be changed to:
...
// Internally, computeGradient allocates 3 __shared__ float arrays,
// each of which contains nthreads * UNROLL (48 * 4) elements
size_t sharedMemorySize = 48 * 3 * UNROLL * sizeof(float);
computeGradient<48, 25, true, false><<<gridDim, blockDim, sharedMemorySize>>>(width, height, width * height, border, rotatedWidth, topNorm, bottomNorm, kernelRadius, kernelLength, devIntegrals, integralImagePitchInInts, devGradientA);
computeGradient<48, 25, true, true><<<gridDim, blockDim, sharedMemorySize>>>(width, height, width * height, border, rotatedWidth, leftNorm, rightNorm, kernelRadius, kernelLength, devIntegrals, integralImagePitchInInts, devGradientB);
} else {
// Internally, computeGradient allocates 3 __shared__ float arrays,
// each of which contains nthreads * UNROLL (32 * 4) elements
size_t sharedMemorySize = 32 * 3 * UNROLL * sizeof(float);
computeGradient<32, 32, false, false><<<gridDim, blockDim, sharedMemorySize>>>(width, height, width * height, border, rotatedWidth, topNorm, bottomNorm, 0, 0, devIntegrals, integralImagePitchInInts, devGradientA);
computeGradient<32, 32, false, true><<<gridDim, blockDim, sharedMemorySize>>>(width, height, width * height, border, rotatedWidth, leftNorm, rightNorm, 0, 0, devIntegrals, integralImagePitchInInts, devGradientB);
...
The values could be refactored further; they are written out explicitly above for the sake of clarity.
References:
[1] https://stackoverflow.com/questions/25500961/using-shared-memory-in-cuda-gives-memory-write-error
from damascene.
Hi Prasanna -
Unfortunately, we haven't had time to maintain this code. It worked with CUDA 2.0 and GPUs from 2009, but I haven't gotten it running on anything modern. I'm sure there are bugs in the code; I just don't have time to find them. =(
I have seen this bug as well - just don't have a fix for it at this time.
Ok, good to know. Thanks!
Did you try changing SMVERSIONFLAGS in common.mk to sm_20? Support for the sm_12 architecture was dropped in CUDA 7.0, so that might be why you get the illegal memory access.
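For reference, the change being suggested is to the architecture flag in common.mk. The exact variable layout varies between versions of the old CUDA SDK makefiles, so treat this as a sketch rather than the literal damascene file; `-arch=sm_35` targets a Tesla K20.

```makefile
# common.mk: target Kepler (Tesla K20, compute capability 3.5)
# instead of the original sm_12 target, which CUDA 7.0 no longer supports.
SMVERSIONFLAGS := -arch=sm_35
```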
I've changed all instances of SMVERSIONFLAGS to sm_35, since I'm running it on a Tesla K20, but I still hit the same error:
Using cuda device 1: Tesla K20c
Processing: damascene/polynesia.ppm, output in damascene/polynesiaPb.pgm and damascene/polynesia.pb
Eig 9 Tol 0.001000 Texton 1
Image found: 321 x 481 pixels
Available 246022144 bytes on GPU
>+< rgbUtoGrayF | 0.729000 | ms
Convolving
Beginning kmeans
Changes: 162604
Changes: 83239
Changes: 53176
Changes: 40912
Changes: 33108
Changes: 25228
Changes: 22345
Changes: 19411
Changes: 1971686769
Changes: -153835340
9 iterations until termination
Kmeans completed
>+< texton | 375.548004 | ms
>+< rgbUtoLab3F | 1.990000 | ms
>+< normalizeLab | 0.015000 | ms
>+< mirrorImage | 1.276000 | ms
Beginning Local cues computation
CUDA error at parabola.cu:58 code=77(cudaErrorIllegalAddress) "cudaMemcpy2DToArray(cuda_parabola_pixels, 0, 0, devPixels, border_width*sizeof(int), border_width*sizeof(int), border_height*norients, cudaMemcpyDeviceToDevice)"
Any suggestions?