
Comments (9)

fynv commented on June 1, 2024

Just a guess: it could be a failure of one of the following calls:
cuMemAlloc()
cuMemcpyHtoD()
cuMemsetD8()
These could be called before m_name_view_cls is set.

DVVector::DVVector(const char* elem_cls, size_t size, void* hdata)
	: DVVectorLike(elem_cls, (std::string(elem_cls)+"&").c_str(), size)
{
	TRTC_Try_Init();
	
	CUdeviceptr dptr;
	if (!CheckCUresult(cuMemAlloc(&dptr, m_elem_size*m_size), "cuMemAlloc()")) return;
	m_data = (void*)dptr;
	if (hdata)
	{
		if (!CheckCUresult(cuMemcpyHtoD(dptr, hdata, m_elem_size*m_size), "cuMemcpyHtoD()")) return;
	}
	else
	{
		if (!CheckCUresult(cuMemsetD8(dptr, 0, m_elem_size*m_size), "cuMemsetD8()")) return;
	}

	m_name_view_cls = std::string("VectorView<") + m_elem_cls + ">";
}

If this is the case, there should be some related error message in stdout, before the code dump.

Since this happens in a constructor, the failure is not properly reported back to Python, which is a design flaw.
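To illustrate the point about the constructor, here is a minimal sketch, not ThrustRTC code. `fake_alloc`, `EarlyReturnVector` and `ThrowingVector` are hypothetical names invented for this example, and `fake_alloc` only stands in for a driver call that can fail. The sketch contrasts the early-return pattern, which leaves the object alive with an empty view-class name, with a constructor that throws so the caller (or a binding layer) can notice the failure.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an allocation call that can fail.
// It is not part of CUDA or ThrustRTC.
static bool fake_alloc(void** out, std::size_t bytes)
{
    static char pool[1024];
    if (bytes == 0 || bytes > sizeof(pool)) { *out = nullptr; return false; }
    *out = pool;
    return true;
}

// Mirrors the early-return pattern: on failure the object still exists,
// but name_view_cls is never assigned.
struct EarlyReturnVector
{
    void* data = nullptr;
    std::string name_view_cls;

    EarlyReturnVector(const std::string& elem_cls, std::size_t bytes)
    {
        if (!fake_alloc(&data, bytes)) return;  // half-initialized object
        name_view_cls = "VectorView<" + elem_cls + ">";
    }
};

// One possible alternative: throw, so no half-initialized object escapes.
struct ThrowingVector
{
    void* data = nullptr;
    std::string name_view_cls;

    ThrowingVector(const std::string& elem_cls, std::size_t bytes)
    {
        if (!fake_alloc(&data, bytes))
            throw std::runtime_error("allocation of " + std::to_string(bytes) + " bytes failed");
        assert(data != nullptr);
        name_view_cls = "VectorView<" + elem_cls + ">";
    }
};
```

With the first type, `EarlyReturnVector("float", 0)` constructs "successfully" and only misbehaves later, when the empty name is used. With the second, the same call throws at the point of failure.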

from thrustrtc.

slayoo commented on June 1, 2024

Thanks! Exactly!
This is what is shown above the error message:

cuMemAlloc() failed with Error code: 700
Error Name: CUDA_ERROR_ILLEGAL_ADDRESS
Error Description: an illegal memory access was encountered

What could be the reason???


fynv commented on June 1, 2024

It is quite unlikely that cuMemAlloc() itself produces a CUDA_ERROR_ILLEGAL_ADDRESS.
But the following link reminded me that this could be "the result of a previous kernel call":
https://forums.developer.nvidia.com/t/cumemalloc-throw-cuda-error-illegal-address/59068

So maybe you should check the previous kernel call: is it a built-in kernel or a custom kernel?
Then we can check for things like an out-of-bounds access.


slayoo commented on June 1, 2024

Thank you!
It's tricky. The tests work OK individually; however, when run in sequence using pytest, some fail. Moreover, the error messages differ from run to run... I will continue investigating. Thank you for your help!


fynv commented on June 1, 2024

I've just released a 0.3.16 patch, adding a new switch, ThrustRTC.Set_Kernel_Debug(), which could be helpful in this case.
When it is set to True, cuCtxSynchronize() will be called after each kernel launch.
If an error is returned, the CUDA code of the kernel will be dumped and the program will be terminated.
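Why synchronizing after each launch helps can be shown with a toy model. This is only a sketch: `ToyContext`, `Status` and all member names are hypothetical and are not part of CUDA or ThrustRTC. The model assumes that a faulty kernel's error stays hidden until the next call that checks it, which is the behavior described in the linked forum post, and that a debug switch forces that check immediately after the launch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy model only: these types are hypothetical, not CUDA or ThrustRTC.
enum class Status { Success, IllegalAddress };

class ToyContext
{
public:
    explicit ToyContext(bool kernel_debug) : m_kernel_debug(kernel_debug) {}

    // "Launch" a kernel. A faulty kernel records an error that stays hidden
    // until something checks for it. With kernel_debug on, check right away.
    Status launch(const std::string& name, bool faulty)
    {
        assert(!name.empty());
        if (faulty) m_pending = Status::IllegalAddress;
        if (m_kernel_debug) return check("launch " + name);
        return Status::Success;
    }

    // An unrelated later call that happens to surface the pending error.
    Status alloc(std::size_t /*bytes*/) { return check("alloc"); }

    // Where the error was first observed (empty if never observed).
    const std::string& first_report() const { return m_first_report; }

private:
    Status check(const std::string& where)
    {
        if (m_pending != Status::Success && m_first_report.empty())
            m_first_report = where;
        return m_pending;  // the error is sticky: it is never cleared
    }

    bool m_kernel_debug;
    Status m_pending = Status::Success;
    std::string m_first_report;
};
```

Without the switch, the model reports the failure at `alloc`, which matches the misleading "cuMemAlloc() failed" message earlier in the thread. With the switch on, the failure is attributed to the launch that caused it.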


slayoo commented on June 1, 2024

Thank you, @fynv!
I'll test it tomorrow


slayoo commented on June 1, 2024

@fynv Trying 0.3.16 with Set_Kernel_Debug(True) called does indeed change the behavior: a "Failed to build kernel" message is printed and test execution stops there. This is progress. Yet the dumped code is the same as before, so I still need to debug further to understand where the CUDA_ERROR_ILLEGAL_ADDRESS comes from. Will write back. Thanks!


slayoo commented on June 1, 2024

Getting close (something related to passing size=0); will fix tomorrow.
Thanks again for the support!


slayoo commented on June 1, 2024

Well, fixed. The cause was passing 0 as the second argument to device_vector().
Again, thanks!
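A zero element count like the one found here can be rejected before any allocation or kernel launch is attempted. The following is a minimal sketch of such a guard. `checked_byte_count` is a hypothetical helper written for this example, not a ThrustRTC function, and the thread does not say how (or whether) the library itself validates sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>

// Hypothetical helper: validate an element size and count, then return the
// number of bytes to allocate. Throws instead of passing a bad size on.
inline std::size_t checked_byte_count(std::size_t elem_size, std::size_t count)
{
    if (elem_size == 0 || count == 0)
        throw std::invalid_argument("element size and element count must be non-zero");
    if (count > std::numeric_limits<std::size_t>::max() / elem_size)
        throw std::overflow_error("elem_size * count overflows size_t");
    const std::size_t bytes = elem_size * count;
    assert(bytes / elem_size == count);  // sanity check on the multiplication
    return bytes;
}
```

Failing at the call site with a clear message is easier to trace than an error that only shows up at a later, unrelated call.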

