Comments (7)
Perhaps a good way to go is to first throw C++ errors? This project wraps all of what we need from CUDA in C++. This example also shows how to time to microsecond precision with CUDA.
from galario.
A nice view on assert in C++
https://www.softwariness.com/articles/assertions-in-cpp/
Bottom line: don't use assert, because it is compiled out in a release build (when NDEBUG is defined) and it doesn't report useful info.
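As a rough illustration of the alternative, here is a minimal sketch of a check that stays active in release builds and reports where it failed. The names `GALARIO_CHECK`, `check_failed` and `safe_divide` are hypothetical and only serve the example; nothing like this exists in galario as far as this thread shows.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not galario API): build a message with the failed
// expression and its source location, then throw.
inline void check_failed(const char* expr, const char* file, int line,
                         const std::string& msg) {
    std::ostringstream os;
    os << file << ":" << line << ": check `" << expr << "` failed: " << msg;
    throw std::runtime_error(os.str());
}

// Unlike assert(), this macro does not depend on NDEBUG, so the check is
// still evaluated in a release build.
#define GALARIO_CHECK(cond, msg)                                         \
    do {                                                                 \
        if (!(cond)) check_failed(#cond, __FILE__, __LINE__, (msg));     \
    } while (0)

// Hypothetical example function that validates its input with the macro.
inline int safe_divide(int a, int b) {
    GALARIO_CHECK(b != 0, "divisor must be non-zero");
    return a / b;
}
```

Because the failure is an exception rather than an abort, a caller (or a wrapper layer) can still decide what to do with it.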
from galario.
For the current C interface we would have to register a callback function that is invoked on error and raises a Python exception instead of aborting as we do now. Or perhaps there is a simpler way: a generic Cython function that we call internally and that forwards all arguments and return values. I think Cython is not flexible enough to do this.
If we expose galario as a C++ library, we should throw C++ exceptions. The Cython docs and this question explain how to translate those into Python exceptions: https://stackoverflow.com/questions/10684983/handling-custom-c-exceptions-in-cython
from galario.
We could handle memory problems inside the CUDA code and retry, say, 5 times (perhaps the user can set the number). It's annoying that the whole MCMC exits if several processes happen to request too much memory at the same time.
We are not the only ones
https://devtalk.nvidia.com/default/topic/909017/should-cudamallochost-need-retry-/
From numba/numba#1531
A PR was merged to improve the deallocation code. There are user configurable options exposed as environment variables. See #2046 (comment) for details. Hopefully, those options will solve most of the problems. Otherwise, you can force a deallocation with cuda.current_context().deallocations.clear().
Another reason that users should rarely need to force deallocation is that the gpu allocation function in numba will now force deallocation and retry if the first attempt resulted in out-of-memory error.
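The retry idea could look roughly like the sketch below. It assumes the allocation failure surfaces as `std::bad_alloc`; the helper name `retry_on_bad_alloc`, the default of 5 attempts and the 100 ms wait are all hypothetical choices for illustration, not galario or numba behaviour.

```cpp
#include <chrono>
#include <new>
#include <thread>

// Hypothetical helper (not galario API): call `alloc` up to `max_attempts`
// times. After a failed attempt, sleep for `wait` so that other processes
// get a chance to release memory. If the last attempt also fails, the
// std::bad_alloc is rethrown to the caller.
template <typename Alloc>
auto retry_on_bad_alloc(Alloc&& alloc, int max_attempts = 5,
                        std::chrono::milliseconds wait =
                            std::chrono::milliseconds(100))
    -> decltype(alloc()) {
    for (int attempt = 1;; ++attempt) {
        try {
            return alloc();
        } catch (const std::bad_alloc&) {
            if (attempt >= max_attempts) throw;  // give up and propagate
            std::this_thread::sleep_for(wait);
        }
    }
}
```

In a real GPU build, `alloc` would wrap the device allocation and convert an out-of-memory status into `std::bad_alloc` before this helper sees it.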
from galario.
From a CUDA guru
I read across a few pages online and found this StackOverflow post with a lot of information: https://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api
Especially two entries
• https://stackoverflow.com/a/14051069/1548394 uses Thrust’s headers for a try-catch-mechanism… Might be ok for you. Thrust comes with CUDA, so it should always be there when CUDA is as well. But Thrust is a whole new programming paradigm, so who knows what side effect this has?
• https://stackoverflow.com/a/20478474/1548394 pitches their CUDA-C++ wrappers, which include modern error handling. Also this is probably too much for you, but you might want to get inspired by the way they handle errors; i.e. https://github.com/eyalroz/cuda-api-wrappers/blob/master/examples/by_runtime_api_module/error_handling.cu. Also: https://github.com/eyalroz/cuda-api-wrappers/blob/master/src/cuda/api/error.hpp
In general, I think you probably want some flexibility: one macro that aborts, which you can use for most of the CUDA calls, and an additional macro that does not abort but catches the error and performs a specific, pre-programmed task chosen case by case.
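A minimal sketch of that two-macro idea follows. To keep it compilable without the CUDA toolkit, it uses a stand-in `Status` enum and `status_string` function; in real code these would be `cudaError_t` and `cudaGetErrorString`. The macro names `CHECK_OR_ABORT` and `CHECK_OR_HANDLE` are hypothetical.

```cpp
#include <cstdio>
#include <cstdlib>

// Stand-in for cudaError_t so the sketch builds without CUDA (assumption).
enum Status { kSuccess = 0, kOutOfMemory = 2 };

// Stand-in for cudaGetErrorString (assumption).
inline const char* status_string(Status s) {
    return s == kSuccess       ? "success"
           : s == kOutOfMemory ? "out of memory"
                               : "unknown error";
}

// Macro 1: print the location and error, then abort. Meant for most calls.
#define CHECK_OR_ABORT(call)                                             \
    do {                                                                 \
        Status s_ = (call);                                              \
        if (s_ != kSuccess) {                                            \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,      \
                         status_string(s_));                             \
            std::abort();                                                \
        }                                                                \
    } while (0)

// Macro 2: do not abort. Pass the error to a named handler so the caller
// can run a case-specific recovery step.
#define CHECK_OR_HANDLE(call, handler)                                   \
    do {                                                                 \
        Status s_ = (call);                                              \
        if (s_ != kSuccess) { handler(s_); }                             \
    } while (0)
```

The handler is expected to be a named callable (a function or a lambda stored in a variable), since an inline lambda containing commas would be split into several macro arguments.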
from galario.
I think all our users currently use the python interface, so nobody will be sad to see the C interface go. We can then require C++ and raise C++ exceptions that can be translated to python via cython like this http://cython.readthedocs.io/en/latest/src/userguide/wrapping_CPlusPlus.html#exceptions
To begin with, memory errors (bad_alloc) should be separate from all other CUDA errors, so they are easier to handle from Python.
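The separation could be sketched as below. `GpuStatus` is a stand-in for `cudaError_t` so the example builds without CUDA, and `throw_on_error` is a hypothetical helper name; the actual change in #125 may look different.

```cpp
#include <new>
#include <stdexcept>
#include <string>

// Stand-in for cudaError_t (assumption); a real build would use the CUDA
// runtime type and its error codes.
enum class GpuStatus { Success, OutOfMemory, LaunchFailure };

// Hypothetical helper (not galario API): memory errors become
// std::bad_alloc, every other failure becomes std::runtime_error.
inline void throw_on_error(GpuStatus status, const std::string& what) {
    switch (status) {
        case GpuStatus::Success:
            return;
        case GpuStatus::OutOfMemory:
            throw std::bad_alloc();
        default:
            throw std::runtime_error(what + ": GPU call failed");
    }
}
```

With a function declared `except +` in Cython, `std::bad_alloc` is translated to `MemoryError` and `std::runtime_error` to `RuntimeError`, so Python code can catch the memory case on its own.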
from galario.
Fixed by #125
from galario.