Comments (3)
Calling input.precomputeMetadata(..) can help when it runs inside multiple data preparation threads; see e.g.
https://github.com/facebookresearch/SparseConvNet/blob/master/examples/Assamese_handwriting/data.py
If you are downsampling with size-2 stride-2 operations, use input.precomputeMetadata(2).
If you are downsampling with size-3 stride-2 operations, use input.precomputeMetadata(3).
precomputeMetadata uses Convolution_InputSgsToRulesAndOutputSgs as it is assumed multiple threads will be running anyway.
If you don't call precomputeMetadata, then the Convolution_InputSgsToRulesAndOutputSgs_OMP function is called as needed.
What is your network architecture? What is the input?
from sparseconvnet.
Thanks for your reply. I found a mistake in my previous experiments. precomputeMetadata spends most of its time in the ValidConvolution_SgsToRules computation. I replaced it with the OMP version, and it works. Now precomputeMetadata takes about 4.3s. In addition, setLocations takes about 2s in total.
But data preparation (CPU) still takes longer than GPU computation, 6.2s vs 3.0s, which causes the GPU to wait for data.
My data preparation code is similar to your examples. My input is 256x192x256, and my network is
self.sparseModel = scn.Sequential().add(scn.ValidConvolution(3, 1, 16, 3, False))
self.sparseModel.add(scn.MaxPooling(3, 2, 2))
self.sparseModel.add(scn.SparseResNet(3, 16,[['b', 64, 1, 1]]))
self.sparseModel.add(scn.MaxPooling(3, 2, 2))
self.sparseModel.add(scn.ResNetUNet(3, 64, 2, 4))
from sparseconvnet.
At training time, you should be able to use threads to run the single-threaded precomputeMetadata in parallel to keep the GPU busy.
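As a rough illustration of that suggestion, here is a minimal sketch of a prefetching loop built only on the Python standard library. The names prepare_batch and prefetch are hypothetical and not part of sparseconvnet; prepare_batch is a placeholder for the CPU-side work described in this thread (building the input, setLocations, precomputeMetadata). Whether threads actually overlap depends on the native code releasing the Python GIL, which is assumed here rather than confirmed.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def prepare_batch(index):
    # Hypothetical placeholder for the per-batch CPU work discussed above
    # (building the input, setLocations, precomputeMetadata).
    return {"index": index, "payload": [index] * 3}


def prefetch(indices, prepare, num_workers=4, lookahead=8):
    """Yield prepare(i) for each i, in order, computed by worker threads.

    At most `lookahead` batches are in flight or waiting at any time,
    which bounds memory use while the consumer is busy.
    """
    pending = deque()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for i in indices:
            pending.append(pool.submit(prepare, i))
            if len(pending) >= lookahead:
                # Hand the oldest batch to the consumer; workers keep going.
                yield pending.popleft().result()
        # Drain the batches that are still outstanding.
        while pending:
            yield pending.popleft().result()


if __name__ == "__main__":
    for batch in prefetch(range(6), prepare_batch, num_workers=3, lookahead=4):
        # In training, the GPU forward/backward step would go here.
        print(batch["index"])
```

With the timings reported above (about 6.2s of preparation against 3.0s of GPU work per step), at least three workers would be needed to keep up, assuming preparation scales across threads and takes no longer per batch when run single-threaded, which may not hold.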
from sparseconvnet.