Comments (4)
Yes, that's correct! The command line was

```
tiny-cuda-nn> .\build\bench_image_ours.exe .\data\images\albert.exr .\data\config.json
```

with `n_neurons: 128` and `n_neurons: 64`, respectively.
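For context, `n_neurons` is set in the `network` block of the sample `config.json`. Here is a minimal sketch of switching between the two widths; it assumes the key layout of the shipped sample config, which may differ in detail:

```python
import json

# Load the sample config and switch the MLP width between the two
# benchmarked settings. Assumes n_neurons lives under the "network"
# object, as in tiny-cuda-nn's sample config.json.
with open("data/config.json") as f:
    config = json.load(f)

config["network"]["n_neurons"] = 128  # or 64 for the second run
with open("data/config.json", "w") as f:
    json.dump(config, f, indent=4)
```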
The benchmark was run on Windows / MSVC 2019 / CUDA 11.3. Fan speed & power envelope of the GPU were also cranked to 100% and 114%, respectively, to minimize the impact of dynamic clocking. Unfortunately, the artificial 10-second pauses in between the measurements aren't quite enough to work around this in all cases. It's best to monitor GPU clock and temperature (e.g. using MSI Afterburner) to confirm.
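If MSI Afterburner isn't available, a minimal monitoring sketch using the NVML Python bindings (`pynvml`; not part of tiny-cuda-nn) could look like this:

```python
import time
import pynvml  # NVML bindings: pip install nvidia-ml-py

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0

# Poll graphics clock and core temperature once per second while the
# benchmark runs; stable readings mean dynamic clocking isn't skewing results.
try:
    while True:
        clock = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_GRAPHICS)
        temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
        print(f"graphics clock: {clock} MHz, temperature: {temp} C")
        time.sleep(1)
finally:
    pynvml.nvmlShutdown()
```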
Thanks for clarifying!
I'm getting somewhat confusing results, though. I had issues building the project in my environment, so it is linked against CUTLASS 2.3 and some loop unrolling failed; unfortunately, it's hard to pinpoint which loop failed.
In any case, I observe lower performance than yours, except for the case of neurons=128, where I get 2x throughput, which is actually faster than the case of neurons=64 (close to 1e9 elements per second). Maybe there is a bug in my patches; I haven't checked for correctness. I also haven't looked into the profiler carefully -- I'm guessing some of the kernels are spilling. I am benchmarking on a 3090 without any power/fan tricks.
Also, what is the extent of the modifications to CUTLASS with respect to the latest version available on GitHub? I saw the `PreReLU` options in `GemmShape`, but those are only used for the ResNet, so I ignored them.
> In any case, I observe lower performance than yours, except for the case of neurons=128, where I get 2x throughput, which is actually faster than the case of neurons=64 (close to 1e9 elements per second). Maybe there is a bug in my patches; I haven't checked for correctness. I also haven't looked into the profiler carefully -- I'm guessing some of the kernels are spilling. I am benchmarking on a 3090 without any power/fan tricks.
It might be worth verifying the correctness of the results (are the output images trained correctly?) to see whether something wrong under the hood is affecting the performance numbers. As you say, 2x higher throughput sounds too good to be true. :)
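As a quick sanity check, one could compare the trained output against the reference image, e.g. via PSNR. A hedged sketch, assuming the trained result has been saved to a float image file (the file names here are hypothetical):

```python
import numpy as np
import imageio.v3 as iio

# Hypothetical output path; reading .exr with imageio also requires an
# EXR-capable backend (e.g. OpenEXR).
reference = np.asarray(iio.imread("data/images/albert.exr"), dtype=np.float32)
trained = np.asarray(iio.imread("trained_output.exr"), dtype=np.float32)

mse = float(np.mean((reference - trained) ** 2))
psnr = 10.0 * np.log10(1.0 / mse)  # assumes pixel values in [0, 1]
print(f"PSNR: {psnr:.2f} dB")  # an implausibly low value suggests a broken patch
```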
Also, if your assessment is based on the console output rather than the emitted `.json` files, it's worth double-checking the ordering: the program first benchmarks `CutlassMLP`, which is expected to be slower than the graphs from the README, before benchmarking `FullyFusedMLP`. It also interleaves training and inference. In pseudocode, the ordering is:
```python
for network in ["CutlassMLP", "FullyFusedMLP"]:
    for batch_size in [2**i for i in range(14, 21)]:
        bench_training_speed(network, batch_size)
        bench_inference_speed(network, batch_size)
```
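If you do go by the emitted `.json` files instead, a trivial reader is sketched below; note that the file name here is a pure assumption and the actual schema should be inspected rather than relied upon:

```python
import json

# Hypothetical file name; the actual emitted .json name and layout may differ.
with open("bench_result.json") as f:
    results = json.load(f)

# Pretty-print everything to see how records map to network / batch size / phase.
print(json.dumps(results, indent=2))
```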
> Also, what is the extent of the modifications to CUTLASS with respect to the latest version available on GitHub? I saw the `PreReLU` options in `GemmShape`, but those are only used for the ResNet, so I ignored them.
I haven't actually followed CUTLASS development for a while, but the `PreReLU` option is indeed the only change I remember making at the time.