Comments (10)
Unfortunately, I've been unable to reproduce this issue so far. I've just pushed a few shot-in-the-dark changes that'll hopefully help narrow things down. If you have the time, it'd be very helpful if you could try the following things:
- In `src/cpp_api.cu`, try removing the three occurrences of `SyncedMultiStream synced_stream{stream, 2};` and replacing all occurrences of `synced_stream.get(1)` with `stream`.
- Just to clarify: do I understand correctly that your previous answer means `TCNN_MIN_GPU_ARCH=70` worked for `mlp_learning_an_image`, or was that with default architecture settings?
- Does `config_oneblob.json`, as opposed to the default `config_hash.json`, work?
  `py .\samples\mlp_learning_an_image_pytorch.py .\data\images\albert.jpg .\data\config_oneblob.json`
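For the first bullet, the mechanical shape of the suggested edit can be sketched like this — on a stand-in snippet, not the real `src/cpp_api.cu` (the actual occurrences may differ slightly):

```shell
# Demo of the substitution on a fabricated two-line stand-in file.
cat > /tmp/demo.cu <<'EOF'
SyncedMultiStream synced_stream{stream, 2};
kernel<<<1, 1, 0, synced_stream.get(1)>>>();
EOF

# Delete the SyncedMultiStream declaration; route work onto `stream` directly.
sed -i '/SyncedMultiStream synced_stream{stream, 2};/d; s/synced_stream\.get(1)/stream/g' /tmp/demo.cu

cat /tmp/demo.cu   # prints: kernel<<<1, 1, 0, stream>>>();
```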
Thanks again!
from tiny-cuda-nn.
I have upgraded my CUDA version to 11.3 and then met the same issue as in #48 when running `python setup.py install`. I referred to the solution in #49, modified `setup.py` to the latest version (fb8f845), and tinycudann was finally installed successfully. `mlp_learning_an_image_pytorch` seems to be running fine now. Many thanks!
Hi,
I had the same issue, although I had to use `CUDA_LAUNCH_BLOCKING=1` to see the error. It points to this check.
My environment:
- Ubuntu 18.04 in Docker container, Ubuntu 20.04.3 on host
- CUDA 11.5
- Drivers version: 495.29.05
- GPU: Tesla T4
- Python 3.9
- PyTorch 1.9.1+cu111
- I installed `tinycudann` through `python3.9 setup.py bdist_wheel` with the latest `master` commit (e1ccb40)
- Command used: `CUDA_LAUNCH_BLOCKING=1 gdb --args python3.9 samples/mlp_learning_an_image_pytorch.py data/images/albert.jpg data/config_hash.json 1000`
Here's a full backtrace (code compiled in debug here):
================================================================
This script replicates the behavior of the native CUDA example
mlp_learning_an_image.cu using tiny-cuda-nn's PyTorch extension.
This extension >> runs ~2x slower than native << as of now.
================================================================
Warning: FullyFusedMLP is not supported for the selected architecture 70. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Warning: FullyFusedMLP is not supported for the selected architecture 70. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
NetworkWithInputEncoding(n_input_dims=2, n_output_dims=1, seed=1337, dtype=torch.float16, hyperparams={'encoding': {'base_resolution': 16, 'interpolation': 'Linear', 'log2_hashmap_size': 15, 'n_features_per_level': 2, 'n_levels': 16, 'otype': 'Grid', 'per_level_scale': 1.5, 'type': 'Hash'}, 'network': {'activation': 'ReLU', 'n_hidden_layers': 2, 'n_neurons': 64, 'otype': 'CutlassMLP', 'output_activation': 'None'}, 'otype': 'NetworkWithInputEncoding'})
Writing 'reference.jpg'... done.
Beginning optimization with 1000 training steps.
/mnt/tiny_cuda_nn/samples/mlp_learning_an_image_pytorch.py:68: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
xs = xs * torch.tensor([shape[1], shape[0]], device=xs.device).float()
Got cutlass error: Error Internal at: 363
Thread 1 "python3.9" hit Breakpoint 2, __GI_exit (status=1) at exit.c:139
139 exit.c: No such file or directory.
(gdb) bt
#0 __GI_exit (status=1) at exit.c:139
#1 0x00007f19f9d67db4 in tcnn::fc_multiply_impl<cutlass::gemm::device::Gemm<cutlass::half_t, cutlass::layout::RowMajor, cutlass::half_t, cutlass::layout::RowMajor, cutlass::half_t, cutlass::layout::ColumnMajor, cutlass::half_t, cutlass::arch::OpClassTensorOp, cutlass::arch::Sm70, cutlass::gemm::GemmShape<128, 128, 32, false>, cutlass::gemm::GemmShape<64, 64, 32, false>, cutlass::gemm::GemmShape<8, 8, 4, false>, tcnn::ActivationEpilogue<cutlass::half_t, 8, cutlass::half_t, cutlass::half_t, (cutlass::FloatRoundStyle)2>, cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>, 2, 8, 8, false, cutlass::arch::OpMultiplyAdd, false> > (stream=0xb2131300, args=...) at /mnt/tiny_cuda_nn/include/tiny-cuda-nn/cutlass_matmul.h:363
#2 0x00007f19f9d5c24d in tcnn::fc_multiply<tcnn::LayerConfig<cutlass::gemm::GemmShape<128, 128, 32, false>, cutlass::gemm::GemmShape<64, 64, 32, false> >, __half, (tcnn::MatrixLayout)0, __half, (tcnn::MatrixLayout)0, __half, (tcnn::MatrixLayout)1, __half, (tcnn::MatrixLayout)1> (stream=0xb2131300, A=..., B=..., C=..., D=..., act=tcnn::Activation::ReLU, transfer=false, sum_source=false) at /mnt/tiny_cuda_nn/include/tiny-cuda-nn/cutlass_matmul.h:444
#3 0x00007f19f9d566f6 in tcnn::fc_multiply<tcnn::LayerConfig<cutlass::gemm::GemmShape<128, 128, 32, false>, cutlass::gemm::GemmShape<64, 64, 32, false> >, __half, (tcnn::MatrixLayout)0, __half, __half, __half> (stream=0xb2131300, A=..., B=..., C=..., D=..., act=tcnn::Activation::ReLU, transfer=false, sum_source=false) at /mnt/tiny_cuda_nn/include/tiny-cuda-nn/cutlass_matmul.h:476
#4 0x00007f19f9d51f17 in tcnn::fc_multiply<tcnn::LayerConfig<cutlass::gemm::GemmShape<128, 128, 32, false>, cutlass::gemm::GemmShape<64, 64, 32, false> >, __half, (tcnn::MatrixLayout)0, __half, __half> (stream=0xb2131300, A=..., B=..., D=..., act=tcnn::Activation::ReLU) at /mnt/tiny_cuda_nn/include/tiny-cuda-nn/cutlass_matmul.h:482
#5 0x00007f19f9d4d5d2 in tcnn::compute_layer<tcnn::LayerConfig<cutlass::gemm::GemmShape<128, 128, 32, false>, cutlass::gemm::GemmShape<64, 64, 32, false> >, __half> (stream=0xb2131300, is_inference=false, activation=tcnn::Activation::ReLU, weights=..., input=..., output=..., activation_output=...) at /mnt/tiny_cuda_nn/src/cutlass_mlp.cu:146
#6 0x00007f19f9d49bc9 in tcnn::CutlassMLP<__half>::forward (this=0xb0216020, stream=0xb2131300, input=..., output=0x7ffe92a07700, use_inference_matrices=false, prepare_input_gradients=false) at /mnt/tiny_cuda_nn/src/cutlass_mlp.cu:241
#7 0x00007f19f9d250f0 in tcnn::NetworkWithInputEncoding<__half>::forward (this=0x5d38290, stream=0xb2131300, input=..., output=0x7ffe92a07700, use_inference_matrices=false, prepare_input_gradients=false) at /mnt/tiny_cuda_nn/include/tiny-cuda-nn/network_with_input_encoding.h:99
#8 0x00007f19f9d22333 in tcnn::cpp::NetworkWithInputEncoding::forward (this=0x5d38250, stream=0x0, n_elements=65536, input=0x7f19ef407800, output=0x7f190ec00000, params=0x7f19ee2ba600, prepare_input_gradients=false) at /mnt/tiny_cuda_nn/src/cpp_api.cu:83
#9 0x00007f19f9e7185f in Module::fwd (this=0x5bb26b0, input=..., params=...) at /mnt/tiny_cuda_nn/bindings/torch/tinycudann/bindings.cpp:99
#10 0x00007f19f9e8ac9d in pybind11::cpp_function::cpp_function<std::tuple<tcnn::cpp::Context, at::Tensor>, Module, at::Tensor, at::Tensor, pybind11::name, pybind11::is_method, pybind11::sibling>(std::tuple<tcnn::cpp::Context, at::Tensor> (Module::*)(at::Tensor, at::Tensor), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(Module*, at::Tensor, at::Tensor)#1}::operator()(Module*, at::Tensor, at::Tensor) const (this=0x53c0fd8, c=0x5bb26b0, args#0=..., args#1=...)
at /usr/local/lib/python3.9/dist-packages/torch/include/pybind11/pybind11.h:84
#11 0x00007f19f9ea74ba in pybind11::detail::argument_loader<Module*, at::Tensor, at::Tensor>::call_impl<std::tuple<tcnn::cpp::Context, at::Tensor>, pybind11::cpp_function::cpp_function<std::tuple<tcnn::cpp::Context, at::Tensor>, Module, at::Tensor, at::Tensor, pybind11::name, pybind11::is_method, pybind11::sibling>(std::tuple<tcnn::cpp::Context, at::Tensor> (Module::*)(at::Tensor, at::Tensor), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(Module*, at::Tensor, at::Tensor)#1}&, 0ul, 1ul, 2ul, pybind11::detail::void_type>(pybind11::cpp_function::cpp_function<std::tuple<tcnn::cpp::Context, at::Tensor>, Module, at::Tensor, at::Tensor, pybind11::name, pybind11::is_method, pybind11::sibling>(std::tuple<tcnn::cpp::Context, at::Tensor> (Module::*)(at::Tensor, at::Tensor), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(Module*, at::Tensor, at::Tensor)#1}&, std::integer_sequence<unsigned long, 0ul, 1ul, 2ul>, pybind11::detail::void_type&&) && (this=0x7ffe92a079d0, f=...) at /usr/local/lib/python3.9/dist-packages/torch/include/pybind11/cast.h:2042
#12 0x00007f19f9ea084f in pybind11::detail::argument_loader<Module*, at::Tensor, at::Tensor>::call<std::tuple<tcnn::cpp::Context, at::Tensor>, pybind11::detail::void_type, pybind11::cpp_function::cpp_function<std::tuple<tcnn::cpp::Context, at::Tensor>, Module, at::Tensor, at::Tensor, pybind11::name, pybind11::is_method, pybind11::sibling>(std::tuple<tcnn::cpp::Context, at::Tensor> (Module::*)(at::Tensor, at::Tensor), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(Module*, at::Tensor, at::Tensor)#1}&>(pybind11::cpp_function::cpp_function<std::tuple<tcnn::cpp::Context, at::Tensor>, Module, at::Tensor, at::Tensor, pybind11::name, pybind11::is_method, pybind11::sibling>(std::tuple<tcnn::cpp::Context, at::Tensor> (Module::*)(at::Tensor, at::Tensor), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(Module*, at::Tensor, at::Tensor)#1}&) && (this=0x7ffe92a079d0, f=...) at /usr/local/lib/python3.9/dist-packages/torch/include/pybind11/cast.h:2014
#13 0x00007f19f9e945b2 in pybind11::cpp_function::initialize<pybind11::cpp_function::initialize<std::tuple<tcnn::cpp::Context, at::Tensor>, Module, at::Tensor, at::Tensor, pybind11::name, pybind11::is_method, pybind11::sibling>(std::tuple<tcnn::cpp::Context, at::Tensor> (Module::*)(at::Tensor, at::Tensor), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(Module*, at::Tensor, at::Tensor)#1}, std::tuple<tcnn::cpp::Context, at::Tensor>, Module*, at::Tensor, at::Tensor, pybind11::name, pybind11::is_method, pybind11::sibling>(pybind11::cpp_function::initialize<std::tuple<tcnn::cpp::Context, at::Tensor>, Module, at::Tensor, at::Tensor, pybind11::name, pybind11::is_method, pybind11::sibling>(std::tuple<tcnn::cpp::Context, at::Tensor> (Module::*)(at::Tensor, at::Tensor), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(Module*, at::Tensor, at::Tensor)#1}&&, std::tuple<tcnn::cpp::Context, at::Tensor> (*)(Module*, at::Tensor, at::Tensor), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#3}::operator()(pybind11::detail::function_call&) const (this=0x0, call=...) at /usr/local/lib/python3.9/dist-packages/torch/include/pybind11/pybind11.h:192
#14 0x00007f19f9e94864 in pybind11::cpp_function::initialize<pybind11::cpp_function::initialize<std::tuple<tcnn::cpp::Context, at::Tensor>, Module, at::Tensor, at::Tensor, pybind11::name, pybind11::is_method, pybind11::sibling>(std::tuple<tcnn::cpp::Context, at::Tensor> (Module::*)(at::Tensor, at::Tensor), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(Module*, at::Tensor, at::Tensor)#1}, std::tuple<tcnn::cpp::Context, at::Tensor>, Module*, at::Tensor, at::Tensor, pybind11::name, pybind11::is_method, pybind11::sibling>(pybind11::cpp_function::initialize<std::tuple<tcnn::cpp::Context, at::Tensor>, Module, at::Tensor, at::Tensor, pybind11::name, pybind11::is_method, pybind11::sibling>(std::tuple<tcnn::cpp::Context, at::Tensor> (Module::*)(at::Tensor, at::Tensor), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(Module*, at::Tensor, at::Tensor)#1}&&, std::tuple<tcnn::cpp::Context, at::Tensor> (*)(Module*, at::Tensor, at::Tensor), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) () at /usr/local/lib/python3.9/dist-packages/torch/include/pybind11/pybind11.h:170
#15 0x00007f19f9e699ad in pybind11::cpp_function::dispatcher (self=0x7f19ffa24a80, args_in=0x7f19f068aa40, kwargs_in=0x0) at /usr/local/lib/python3.9/dist-packages/torch/include/pybind11/pybind11.h:767
#16 0x00000000005ef5c6 in ?? ()
#17 0x00000000005110d9 in _PyObject_MakeTpCall ()
#18 0x00000000005e7f5b in ?? ()
#19 0x0000000000573b7c in _PyEval_EvalFrameDefault ()
#20 0x00000000005117d4 in _PyFunction_Vectorcall ()
#21 0x00007f1af1b7416b in THPFunction_apply(_object*, _object*) () from /usr/local/lib/python3.9/dist-packages/torch/lib/libtorch_python.so
#22 0x00000000005ef5e8 in ?? ()
#23 0x00000000005110d9 in _PyObject_MakeTpCall ()
#24 0x0000000000574af2 in _PyEval_EvalFrameDefault ()
#25 0x00000000005117d4 in _PyFunction_Vectorcall ()
#26 0x00000000005e7ff8 in ?? ()
#27 0x0000000000423f31 in ?? ()
#28 0x00000000005717ef in _PyEval_EvalFrameDefault ()
#29 0x000000000056ee02 in ?? ()
#30 0x0000000000512269 in _PyObject_Call_Prepend ()
#31 0x000000000054f353 in ?? ()
#32 0x00000000005110d9 in _PyObject_MakeTpCall ()
#33 0x0000000000573aa6 in _PyEval_EvalFrameDefault ()
#34 0x000000000056ee02 in ?? ()
#35 0x000000000056eb2b in _PyEval_EvalCodeWithName ()
#36 0x000000000056ead3 in PyEval_EvalCode ()
#37 0x0000000000644f79 in ?? ()
#38 0x0000000000644821 in ?? ()
#39 0x0000000000470edb in ?? ()
#40 0x000000000064458c in PyRun_SimpleFileExFlags ()
#41 0x0000000000613a0a in Py_RunMain ()
#42 0x00000000006135a9 in Py_BytesMain ()
#43 0x00007f1b0a030bf7 in __libc_start_main (main=0x50d0b0, argc=5, argv=0x7ffe92a08cf8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe92a08ce8) at ../csu/libc-start.c:310
#44 0x00000000006134ba in _start ()
Thanks for reporting this!
To help narrow this down: does the native mlp_learning_an_image
sample run correctly?
Also: which specific GPU(s) are you using?
@Tom94: thanks for the quick feedback, and for the entire project! `mlp_learning_an_image` seems to be running fine, and I'm on a Tesla T4. I'll try to see if I can reproduce it on other GPUs.
EDIT: In my case I had to play a bit with the `TCNN_MIN_GPU_ARCH` variable to have a single wheel that I can run on different GPUs (`TCNN_MIN_GPU_ARCH=70` here), but since @OctoberKat seems to be using the default `setup.py`, that may not be the cause of this.
Some more tests seem to indicate that this happens with `TCNN_MIN_GPU_ARCH=70` but not `TCNN_MIN_GPU_ARCH=75`. @OctoberKat's GeForce GTX 1080 Ti has a CC of 6.1 if I'm not mistaken, so maybe the problem comes from there? Does `cutlass` have a minimum supported CC? Or maybe the older architectures are not as well tested with recent `cutlass` releases? 🤷♂️
@Tom94 you might be able to reproduce the error if you force `TCNN_MIN_GPU_ARCH` in `setup.py`.
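One way to force this when building the PyTorch extension — a sketch, assuming the `TCNN_CUDA_ARCHITECTURES` environment variable that the repository's README documents (from which `TCNN_MIN_GPU_ARCH` is derived; verify against your checkout):

```shell
# Hypothetical reproduction recipe: pin the compiled architecture to 70 so
# TCNN_MIN_GPU_ARCH ends up at 70 even when building on a newer GPU.
cd bindings/torch
TCNN_CUDA_ARCHITECTURES=70 python3.9 setup.py bdist_wheel
```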
I also tried to reproduce this with the CMake build and `TCNN_CUDA_ARCHITECTURES="70;75"` (so that `TCNN_MIN_GPU_ARCH=70`), but I could not reproduce the error.
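For reference, the CMake reproduction attempt described above can be sketched as follows (exact paths and build type are assumptions, not taken from the report):

```shell
# Configure with both architectures so the minimum compiled arch is 70,
# then build the native samples.
cmake . -B build -DTCNN_CUDA_ARCHITECTURES="70;75"
cmake --build build --config RelWithDebInfo -j
```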
- This does not seem to make a difference, even with the latest commits.
- Yes, `TCNN_MIN_GPU_ARCH=70` worked with `mlp_learning_an_image` on a CC 7.5 GPU.
- Same error with `config_oneblob.json`.
But since everything worked with the CMake build, I went back to my modified `setup.py` that allows me to generate debug builds. For debug builds I use `-G` for `nvcc`, but apparently it can cause miscompilations with `cutlass`, and that seems to be the cause of my problems: if I add it to the `nvcc` flags in the CMake build, I can reproduce the problem. With the current master branch (f32361e) and no `-G`, I'm no longer seeing the error with the PyTorch build. I was really not expecting that to be a source of problems!
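If some source-level correlation is still wanted for debugging, a commonly suggested middle ground is `nvcc`'s `-lineinfo` flag, which embeds line mapping without `-G`'s unoptimized device-debug code generation. A sketch (the file name is just illustrative):

```shell
# -G        : full device-side debug info, disables device optimizations --
#             the flag implicated in the cutlass miscompilation above.
# -lineinfo : keeps source/line correlation for profilers and backtraces
#             while leaving device code optimized.
nvcc -O2 -lineinfo -c src/cutlass_mlp.cu -o cutlass_mlp.o
```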
Still, the original bug report uses the original `setup.py` without all my shenanigans, so there is probably another problem. @OctoberKat can run the various tests you suggested to see if that leads to different results.
I'll continue to run some tests today, and get back to you if anything strange happens again. Thanks again for the help and the fast feedback, greatly appreciated!
Likewise, thank you very much for helping out! Unfortunately, I can't think of anything beyond what you've found based on your latest results...
Knowing about `-G` being problematic is good to say the least (and, frankly, shocking).
@OctoberKat one more idea that could help with your setup would be upgrading CUDA (I see you have 11.0) to a more recent version (such as 11.6).
Ok, I will give it a try. Thank you, everyone.