Comments (9)
Just a guess: it could be a failure in one of these calls:
cuMemAlloc()
cuMemcpyHtoD()
cuMemsetD8()
All three are called before m_name_view_cls is set, so a failure in any of them returns early and leaves it unset:
DVVector::DVVector(const char* elem_cls, size_t size, void* hdata)
    : DVVectorLike(elem_cls, (std::string(elem_cls)+"&").c_str(), size)
{
    TRTC_Try_Init();
    CUdeviceptr dptr;
    if (!CheckCUresult(cuMemAlloc(&dptr, m_elem_size*m_size), "cuMemAlloc()")) return;
    m_data = (void*)dptr;
    if (hdata)
    {
        if (!CheckCUresult(cuMemcpyHtoD(dptr, hdata, m_elem_size*m_size), "cuMemcpyHtoD()")) return;
    }
    else
    {
        if (!CheckCUresult(cuMemsetD8(dptr, 0, m_elem_size*m_size), "cuMemsetD8()")) return;
    }
    m_name_view_cls = std::string("VectorView<") + m_elem_cls + ">";
}
If this is the case, there should be a related error message in stdout, before the code dump.
Since this happens in a constructor, the failure is not properly returned to Python, which is a design flaw.
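The failure mode described above can be illustrated with a small Python toy model. This is not ThrustRTC code: `SilentVector`, `StrictVector`, `view_name`, and the `alloc_ok` flag are hypothetical names invented for the illustration. The first class mimics an initializer that returns early on failure; the second raises instead, so the caller learns about the failure where it happens.

```python
class SilentVector:
    """Mimics the early-return pattern: a failure leaves the object half-built."""

    def __init__(self, elem_cls, size, alloc_ok=True):
        self.elem_cls = elem_cls
        self.size = size
        if not alloc_ok:
            # Analogous to `return;` in the C++ constructor: nothing is reported
            # to the caller, and the field below is never assigned.
            return
        self.name_view_cls = "VectorView<" + elem_cls + ">"


class StrictVector:
    """Same fields, but a failed allocation is raised to the caller."""

    def __init__(self, elem_cls, size, alloc_ok=True):
        self.elem_cls = elem_cls
        self.size = size
        if not alloc_ok:
            raise RuntimeError("allocation failed for %d x %s" % (size, elem_cls))
        self.name_view_cls = "VectorView<" + elem_cls + ">"


def view_name(vec):
    """Return the view class name, or None if the object was never fully built."""
    return getattr(vec, "name_view_cls", None)
```

With `SilentVector`, the problem only shows up later, when something needs the missing view name. With `StrictVector`, it shows up at construction time as an ordinary exception.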
from thrustrtc.
Thanks! Exactly!
This is what is shown above the error message:
cuMemAlloc() failed with Error code: 700
Error Name: CUDA_ERROR_ILLEGAL_ADDRESS
Error Description: an illegal memory access was encountered
What could be the reason?
from thrustrtc.
It is quite unlikely that cuMemAlloc() itself returns CUDA_ERROR_ILLEGAL_ADDRESS.
But the following link reminded me that this could be "the result of a previous kernel call":
https://forums.developer.nvidia.com/t/cumemalloc-throw-cuda-error-illegal-address/59068
So maybe you should check the previous kernel call. Is it a built-in kernel or a custom kernel?
Then we can check for things like an out-of-bounds access.
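Why an allocation can end up reporting a kernel's error is easier to see in a toy model. The sketch below is plain Python and does not call CUDA. `ToyContext` and its methods are hypothetical and only imitate the ordering: a launch returns immediately, and a bad access is reported by whichever API call comes next.

```python
class ToyContext:
    """Toy model of a context where kernel errors are reported late."""

    def __init__(self):
        self._pending = None  # error recorded by a launch, not yet reported

    def launch(self, name, index, length):
        # The launch itself never fails; an out-of-bounds access is only recorded.
        if not 0 <= index < length:
            self._pending = "illegal address in kernel '%s'" % name

    def synchronize(self):
        self._check("synchronize")

    def alloc(self, nbytes):
        self._check("alloc")
        return bytearray(nbytes)

    def _check(self, api):
        # The error stays pending, so every later call keeps failing.
        if self._pending is not None:
            raise RuntimeError("%s() failed: %s" % (api, self._pending))
```

In this model, a bad `launch` followed by `alloc` produces an error that names `alloc()`, even though the fault lies in the kernel. Calling `synchronize()` right after each launch moves the report next to the launch that caused it.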
from thrustrtc.
Thank you!
It's tricky. Each test in the set works OK individually, but when they are run in sequence using pytest, some fail. Moreover, the error messages differ from run to run... Will continue investigating. Thank you for your help!
from thrustrtc.
I've just released a 0.3.16 patch, adding a new switch ThrustRTC.Set_Kernel_Debug(), which could be helpful in this case.
When it is set True, cuCtxSynchronize() will be called after each kernel launch.
If an error is returned, the CUDA code of the kernel will be dumped and the program will be terminated.
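A minimal sketch of turning the switch on from Python. Only `Set_Kernel_Debug` comes from this thread. The helper `enable_kernel_debug` is hypothetical, and it takes the module as a parameter so that it can be exercised without a GPU by passing a stand-in object. It returns `False` when the given module does not provide the switch, for example with a release older than 0.3.16.

```python
def enable_kernel_debug(trtc_module):
    """Enable per-launch error checking if the module offers the switch.

    Returns True when Set_Kernel_Debug(True) was called, False when the
    attribute is missing from the given module.
    """
    setter = getattr(trtc_module, "Set_Kernel_Debug", None)
    if setter is None:
        return False
    setter(True)
    return True
```

In a test suite this could be called once at start-up, for instance `import ThrustRTC as trtc` followed by `enable_kernel_debug(trtc)` near the top of `conftest.py`, so that every test runs with the check active.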
from thrustrtc.
Thank you, @fynv!
I'll test it tomorrow
from thrustrtc.
@fynv, running with 0.3.16 and Set_Kernel_Debug(True) does change the behavior: a "Failed to build kernel" message is printed and test execution stops there. This is progress. Yet the dumped code is the same as before, so I still need to debug more to understand where the CUDA_ERROR_ILLEGAL_ADDRESS comes from. Will write back. Thanks!
from thrustrtc.
getting close (something related to passing size=0), will fix tomorrow
Thanks again for support!
from thrustrtc.
well, fixed. the cause was passing 0 as second arg to device_vector()
Again, thanks
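A guard like the following would catch this kind of mistake before it reaches the GPU. It is a sketch, not part of ThrustRTC: `checked_device_vector` is a hypothetical helper, and it assumes the (element type, size) argument order implied above. The factory is passed in, so the check can be tested without a device.

```python
import operator


def checked_device_vector(factory, elem_cls, size):
    """Validate the size, then delegate to a factory such as device_vector."""
    n = operator.index(size)  # raises TypeError for floats, strings, None
    if n <= 0:
        raise ValueError("device vector size must be positive, got %d" % n)
    return factory(elem_cls, n)
```

Usage would look like `checked_device_vector(trtc.device_vector, 'int32_t', n)`, which raises a `ValueError` in the test that passes `n = 0` instead of failing later with an unrelated CUDA error.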
from thrustrtc.