
Comments (7)

fredRos commented on July 22, 2024

Perhaps a good way to go is to first throw C++ exceptions? This project wraps all of what we need from CUDA in C++. This example also shows how to time with microsecond precision in CUDA:

https://github.com/eyalroz/cuda-api-wrappers/blob/master/examples/modified_cuda_samples/simpleStreams/simpleStreams.cu

from galario.

fredRos commented on July 22, 2024

A nice overview of assert in C++:
https://www.softwariness.com/articles/assertions-in-cpp/

Bottom line: don't use assert, as it is optimized away in a release build and it doesn't report useful information.
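The alternative the article suggests can be sketched as a check that is always compiled in (unlike assert, which disappears under NDEBUG) and throws with the failing expression, file, and line. A minimal sketch; `CHECK` and `check` are illustrative names, not galario's actual API:

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Unlike assert, this check survives release builds and reports
// the failing condition together with its source location.
inline void check(bool condition, const char* expr,
                  const char* file, int line) {
    if (!condition) {
        std::ostringstream msg;
        msg << "Check failed: " << expr << " at " << file << ":" << line;
        throw std::runtime_error(msg.str());
    }
}

#define CHECK(cond) check((cond), #cond, __FILE__, __LINE__)
```

A caller can then catch `std::runtime_error` and decide whether to recover or abort, instead of having the process killed unconditionally.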

from galario.

fredRos commented on July 22, 2024

For the current C interface, we would have to register a callback function that, in case of error, could throw a python exception instead of aborting as it does now. Or perhaps there is a simpler way to create a generic cython function that we call internally and which forwards all arguments and return values; I think cython is not flexible enough to do this, though.

If we expose galario as a C++ library, we should throw C++ exceptions; the cython docs and this question explain how to translate those into python exceptions: https://stackoverflow.com/questions/10684983/handling-custom-c-exceptions-in-cython

from galario.

fredRos commented on July 22, 2024

We could handle memory problems inside the cuda code and retry, say, 5 times (perhaps user-configurable). It's annoying that the whole MCMC run exits if several processes happen to request too much memory at the same time.

We are not the only ones:
https://devtalk.nvidia.com/default/topic/909017/should-cudamallochost-need-retry-/

From numba/numba#1531

A PR was merged to improve the deallocation code. There are user configurable options exposed as environment variables. See #2046 (comment) for details. Hopefully, those options will solve most of the problems. Otherwise, you can force a deallocation with cuda.current_context().deallocations.clear().

Another reason that users should rarely need to force deallocation is that the gpu allocation function in numba will now force deallocation and retry if the first attempt resulted in an out-of-memory error.

https://stackoverflow.com/questions/30909368/once-cudamalloc-returns-out-of-memory-every-cuda-api-call-returns-failure
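The retry idea above can be sketched as a small generic loop with a configurable attempt count. `alloc_with_retry` is a hypothetical name, and the callable is a stand-in for a real `cudaMalloc` call (which one cannot exercise in a plain C++ snippet):

```cpp
// Retry an allocation up to max_attempts times before giving up.
// Alloc is any callable returning true on success -- a stand-in
// for a cudaMalloc-style call in this sketch.
template <typename Alloc>
bool alloc_with_retry(Alloc alloc, int max_attempts = 5) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        if (alloc())
            return true;
        // In real CUDA code one would release cached buffers and/or
        // back off briefly before the next attempt.
    }
    return false;
}
```

With several MCMC processes competing for GPU memory, a transient failure then costs a few retries instead of killing the whole run; only a persistent failure propagates.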

from galario.

fredRos commented on July 22, 2024

From a CUDA guru:

I read through a few pages online and found this StackOverflow post with a lot of information: https://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api
Two entries stand out:
https://stackoverflow.com/a/14051069/1548394 uses Thrust's headers for a try-catch mechanism… Might be OK for you. Thrust ships with CUDA, so it should always be there when CUDA is. But Thrust is a whole new programming paradigm, so who knows what side effects this has?
https://stackoverflow.com/a/20478474/1548394 pitches their CUDA C++ wrappers, which include modern error handling. This is probably too much for you, but you might want to get inspired by the way they handle errors; i.e. https://github.com/eyalroz/cuda-api-wrappers/blob/master/examples/by_runtime_api_module/error_handling.cu. Also: https://github.com/eyalroz/cuda-api-wrappers/blob/master/src/cuda/api/error.hpp

In general, I think you want some flexibility: one macro that aborts, which you can use for most CUDA calls, and an additional macro that does not abort but catches the error and performs a specific, pre-programmed task on a case-by-case basis.
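The two-flavor scheme suggested here might look like the sketch below. Since real CUDA calls cannot run in a plain C++ snippet, `fake_error_t` and `fake_error_string` are mocks standing in for `cudaError_t` and `cudaGetErrorString`; the macro names are illustrative:

```cpp
#include <cstdio>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Stand-ins for cudaError_t / cudaGetErrorString in this sketch.
enum fake_error_t { fakeSuccess = 0, fakeErrorMemoryAllocation = 2 };
inline const char* fake_error_string(fake_error_t e) {
    return e == fakeSuccess ? "no error" : "out of memory";
}

// Fatal flavor: print location and abort, for calls whose failure
// cannot be handled sensibly.
#define CHECK_ABORT(call)                                            \
    do {                                                             \
        fake_error_t err_ = (call);                                  \
        if (err_ != fakeSuccess) {                                   \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,  \
                         fake_error_string(err_));                   \
            std::abort();                                            \
        }                                                            \
    } while (0)

// Recoverable flavor: throw, so the caller can handle the error
// case by case (e.g. retry an allocation).
#define CHECK_THROW(call)                                            \
    do {                                                             \
        fake_error_t err_ = (call);                                  \
        if (err_ != fakeSuccess)                                     \
            throw std::runtime_error(fake_error_string(err_));       \
    } while (0)
```

Wrapping the call in `do { … } while (0)` keeps each macro a single statement, so it composes safely with `if`/`else`.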

from galario.

fredRos commented on July 22, 2024

I think all our users currently use the python interface, so nobody will be sad to see the C interface go. We can then require C++ and raise C++ exceptions that can be translated to python via cython like this: http://cython.readthedocs.io/en/latest/src/userguide/wrapping_CPlusPlus.html#exceptions

To begin with, memory errors (bad_alloc) should be separated from all other cuda errors, so they are easier to handle from python.
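The separation might be sketched as a single translation point that maps a status code to distinct exception types; cython's default C++ exception translation turns `std::bad_alloc` into `MemoryError` and `std::runtime_error` into `RuntimeError` on the python side. `status_t` and `raise_for` are hypothetical names for illustration:

```cpp
#include <new>
#include <stdexcept>

// Hypothetical status codes returned by the CUDA-facing layer.
enum status_t { ok = 0, out_of_memory = 1, other_failure = 2 };

// Map a status to a distinct exception type, so the python side can
// catch MemoryError separately from generic RuntimeError.
inline void raise_for(status_t s) {
    switch (s) {
        case ok:
            return;
        case out_of_memory:
            throw std::bad_alloc();                    // -> python MemoryError
        default:
            throw std::runtime_error("cuda error");    // -> python RuntimeError
    }
}
```

Python code can then retry only on `MemoryError` (the recoverable case) while letting every other CUDA failure surface immediately.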

from galario.

fredRos commented on July 22, 2024

Fixed by #125

from galario.
