References (in code)
- http://eigen.tuxfamily.narkive.com/EJttLl1E/tensor-execution-on-gpus
- https://eigen.tuxfamily.org/dox/TopicCUDA.html
- https://www.tensorflow.org/versions/r1.3/extend/adding_an_op#gpu_kernels
- https://bitbucket.org/eigen/eigen/src/94a7dc5d6049149ad474e828a07b9e0484c7760f/unsupported/Eigen/CXX11/src/Tensor/TensorDeviceCuda.h?at=default&fileviewer=file-view-default
References (build)
- "-std=c++11" does not work with FindCUDA: https://stackoverflow.com/questions/34960818/compiling-cu-using-nvcc-in-cmake
- CMake 3.8+: https://stackoverflow.com/questions/36551469/triggering-c11-support-in-nvcc-with-cmake
- Eigen compatibility and pre-processor flags: https://eigen.tuxfamily.org/dox/TopicCUDA.html
- old example but has some helpful hints: https://codeyarns.com/2013/09/13/how-to-build-cuda-programs-using-cmake/
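The CMake 3.8+ route noted above can be sketched as follows (a minimal example, not the project's actual build file; the project and target names are hypothetical). Enabling CUDA as a first-class language sidesteps FindCUDA and its -std=c++11 problems:

```cmake
# CMake 3.8+ treats CUDA as a first-class language, so FindCUDA
# (and its trouble with -std=c++11) can be avoided entirely.
cmake_minimum_required(VERSION 3.8)
project(evonet_gpu LANGUAGES CXX CUDA)  # project name is hypothetical

set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CUDA_STANDARD 11)  # forwards -std=c++11 to nvcc

add_executable(gpu_test main.cpp kernels.cu)  # file names are hypothetical
```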
References (CUDA compilation)
- CUDA compiler flags: https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#compilation-phases
from evonet.
Docker with GPU support is not available on Windows when using CUDA
Will need a dual-boot setup to install the NVIDIA drivers directly on Linux
References:
- https://github.com/NVIDIA/nvidia-docker
- https://hub.docker.com/r/nvidia/cuda/
- https://stackoverflow.com/questions/33834463/getting-access-to-gpu-on-docker-on-windows-10
from evonet.
WINDOWS 10, VS2017, and CUDA 9.2 compilation
Windows SDK
- error: "MSB8036 The Windows SDK version 10.0.15063.0 was not found"
- fix:
- installed Windows universal app SDK https://developer.microsoft.com/en-us/windows/downloads/windows-10-sdk
- retargeted solution for Windows SDK Version 10.0.17 (if necessary)
- reference: https://social.msdn.microsoft.com/Forums/vstudio/en-US/a739a8db-4e6e-478f-99c2-1348fc031985/compilation-error-with-windows-sdk-version-100150630?forum=visualstudiogeneral
VS2017 and CUDA 9.2 compatibility
- error: "unsupported Microsoft Visual Studio version! Only the versions 2012, 2013, 2015 and 2017 are supported!"
- fix:
- modified file "c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2\include\crt\host_config.h"
- replaced line 131 "#if _MSC_VER < 1600 || _MSC_VER > 1913" with "#if _MSC_VER < 1600 || _MSC_VER > [insert very large number]"
- reference: https://devtalk.nvidia.com/default/topic/1022648/cuda-9-unsupported-visual-studio-version-error/
Force VS2017 to build x64 and target x64 architecture
- error:
- errors about not being able to compile CUDA project for 32 bit platforms
- errors about using an x64 compiler to target an x86 machine
- fix: use the CMake "-G" and "-T" options to select the 64-bit generator and the x64 host toolset, respectively
- reference: https://cmake.org/cmake/help/v3.8/generator/Visual%20Studio%2015%202017.html
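For example (a sketch following the CMake 3.8 generator documentation referenced above; the build directory layout is assumed):

```
# "Win64" selects the x64 target platform; "-T host=x64" selects the
# 64-bit host compiler (supported since CMake 3.8).
cmake -G "Visual Studio 15 2017 Win64" -T host=x64 ..
```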
Linking to Boost in VS2017
- error: "LNK1104 cannot open file 'libboost_unit_test_framework-vc141-mt-gd-x64-1_67.lib'"
- fix: multi-select all projects, update the "library" directories to include the Boost "lib" folder
- reference: https://docs.microsoft.com/en-us/cpp/ide/vcpp-directories-property-page
int conversion in Eigen library
- error:
- "C2397 conversion from 'unsigned __int64' to '__int64' requires a narrowing conversion"
- "C2397 conversion from 'unsigned __int64' to 'int' requires a narrowing conversion"
- fix: ensured that all integer types used when allocating the size of Tensors (e.g., Eigen::Tensor<float, 2> tensor(int_type, int_type); ) are the same
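An Eigen-free sketch of the narrowing issue and the fix (TensorIndex and tensor_size are illustrative names, not EvoNet code). Eigen::Tensor dimensions use Eigen::Index, a signed 64-bit type, so mixing it with std::size_t (unsigned __int64 on MSVC) in list-initialization triggers C2397:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using TensorIndex = std::int64_t;  // stands in for Eigen::Index

// Flat element count for a (rows x cols) tensor, with every
// dimension held in the same signed type.
inline TensorIndex tensor_size(TensorIndex rows, TensorIndex cols) {
  // TensorIndex dims[]{rows, std::size_t{0}};  // C2397 on MSVC:
  // unsigned -> signed narrowing inside braces. Cast explicitly
  // (or use one index type throughout) instead.
  return rows * cols;
}
```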
use of POSIX sleep in Eigen library
- error: call to POSIX sleep() does not compile with MSVC
- fix: commented out line 91 ("sleep(1);") in TensorDeviceCuda.h
Missing "math_functions.hpp" in Cuda 9.2
- error: C1083 Cannot open include file: 'math_functions.hpp': No such file or directory
- fix: copied "math_functions.hpp" from \include\crt to \include directory
- reference: https://stackoverflow.com/questions/43113508/math-functions-hpp-not-found-when-using-cuda-with-eigen
MSVC compiler "quirk" and Error in Eigen Macros.h file
- error: C1017 invalid integer constant expression
- fix: use the develop branch of Eigen: https://stackoverflow.com/questions/48341389/error-while-compiling-eigen-library-v3-3-4-with-vs2017-nvcc-cuda-9-0
- references:
- https://stackoverflow.com/questions/26959188/fatal-error-c1017-invalid-integer-constant-expression-when-using-if-false
- https://groups.google.com/forum/#!topic/theano-users/8rAixiyot2w
- https://msdn.microsoft.com/en-us/library/h5sh3k99.aspx
- https://cmake.org/cmake/help/v3.8/manual/cmake-generators.7.html#command-line-build-tool-generators
Cygwin64 linking errors using CUDA
- errors:
- c++: error: /subsystem:console: No such file or directory
- c++: error: opengl32.lib: No such file or directory
- fix: none found
Cygwin64 linking errors to the Boost library
- error: undefined reference to `boost::unit_test::framework::master_test_suite()`
- fix: none found
- references: https://stackoverflow.com/questions/49699013/undefined-reference-to-boostunit-testframeworkmaster-test-suite
from evonet.
Results of CPU profiling
- Bottleneck 1: pruning of nodes/links/weights
  - save node/link/weight pruning until the very end
- Bottleneck 2: forwardPropogateLayerNetInput
  - make a "cache" of all layers and steps of operation before the first epoch of training
  - use the "cache" to allocate memory for the needed tensors
  - update node output/derivative/error values only when requested by the user
- Bottleneck 3: MapValuesToNodes
  - refactor to "materialize" node values on the fly as requested by the user (i.e., retrieve actual values from tensors)
- Bottleneck 4: backwardPropogateLayerError
  - same as Bottleneck 2, but for the back-propagation steps
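The pre-epoch caching idea for Bottleneck 2 could be sketched as follows (an Eigen-free illustration; LayerCache and its members are assumptions, not EvoNet code):

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Sketch of the pre-epoch "cache": allocate each layer's net-input
// buffer once, before the first training epoch, then reuse it at
// every FP step instead of re-allocating per step.
struct LayerCache {
  std::map<std::string, std::vector<float>> net_input;  // layer -> buffer

  // Return the layer's buffer, allocating it on first use.
  std::vector<float>& get(const std::string& layer, std::size_t size) {
    auto& buf = net_input[layer];
    if (buf.size() != size) buf.assign(size, 0.0f);
    return buf;
  }
};
```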
Code changes
LossFunction
- base class for LossFunctionOp and LossFunctionGradOp
- ... operator()(..., Eigen::ThreadPoolDevice& device) const = 0;
- ... operator()(..., Eigen::GpuDevice& device) const = 0;
CalculateActivation
- refactor to use Eigen::Tensor<float, 0>
- ... operator()(..., Eigen::ThreadPoolDevice& device) const = 0;
- ... operator()(..., Eigen::GpuDevice& device) const = 0;
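The device-overloaded operator() pattern above can be sketched without Eigen (the stub device tags stand in for Eigen::ThreadPoolDevice and Eigen::GpuDevice, and MSELossOp is a hypothetical example, not EvoNet code):

```cpp
#include <cassert>

// Stub device tags standing in for Eigen::ThreadPoolDevice and
// Eigen::GpuDevice (the real types carry thread pools / CUDA streams).
struct ThreadPoolDevice {};
struct GpuDevice {};

// Base class with one pure-virtual operator() per device type, so the
// same loss-functor interface can be dispatched to CPU or GPU.
struct LossFunctionOp {
  virtual ~LossFunctionOp() = default;
  virtual float operator()(float predicted, float expected,
                           ThreadPoolDevice& device) const = 0;
  virtual float operator()(float predicted, float expected,
                           GpuDevice& device) const = 0;
};

// Hypothetical concrete functor: squared error on either device.
struct MSELossOp : LossFunctionOp {
  float operator()(float p, float e, ThreadPoolDevice&) const override {
    const float d = p - e;
    return d * d;
  }
  float operator()(float p, float e, GpuDevice&) const override {
    const float d = p - e;
    return d * d;  // real code would evaluate on the CUDA device here
  }
};
```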
Model
- allocateForwardPropogationLayerTensors and allocateBackwardPropogationLayerTensors
  - alternatively convertNodesToTensors
- refactor backPropogateLayerError to another class to implement parallelism
- refactor forwardPropogateLayerNetInput to another class to implement parallelism
Graph of operations
FP:
- sequence 1:
- source nodes (outputs), links (weights) -> (MatMul) net input -> (SplitByActivation) split input -> (PerElement) sink nodes (output) -> (PerElement) sink nodes (derivative)[Do this during BP]
- subsequence 1:
- source nodes (outputs), links (weights) -> (MatMul) net input -> (PerElement) sink nodes (output) -> (PerElement) sink nodes (derivative)
- repeat for all subsequences
- repeat for all sequences
Error:
- output nodes (output), expected output -> (Custom) output nodes (error)
BP:
- sequence 1:
- source nodes (error), links (weights) -> (MatMul) sink nodes (tmp) -> (SplitByTime) sink nodes (tmp), sink nodes (derivative) -> (DotProd) sink nodes (errors)
- subsequence 1:
- source nodes (error), links (weights) -> (MatMul) sink nodes (tmp), sink nodes (derivative) -> (DotProd) sink nodes (errors)
- repeat for all sequences
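The "(MatMul) net input" step in the sequences above amounts to contracting source-node outputs with the weight matrix. A loop-based sketch (plain C++ in place of the Eigen contraction; names are assumptions):

```cpp
#include <cassert>
#include <vector>

// net_input = outputs x weights for one layer step.
// outputs:  [n_src] source-node output values
// weights:  [n_src * n_sink] link weights, row-major (src x sink)
// returns:  [n_sink] net input per sink node
std::vector<float> mat_mul(const std::vector<float>& outputs,
                           const std::vector<float>& weights,
                           int n_src, int n_sink) {
  std::vector<float> net_input(n_sink, 0.0f);
  for (int j = 0; j < n_sink; ++j)
    for (int i = 0; i < n_src; ++i)
      net_input[j] += outputs[i] * weights[i * n_sink + j];
  return net_input;
}
```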
Structures
Tensors:
- Output tensors (batch x time [same as nodes])
- Derivative tensors (batch x time [same as nodes])
- Error tensors (batch x time [same as nodes])
- Link tensors [same as Weight]
Node ids:
- Matching node ID vectors and Link ID vectors
Tensor container:
- std::vector to hold output, derivative, and error tensors in order
Structure to hold Argument
- tensor type
- time-step
- tensor index
enum to hold Operation type
- MatMul
- Dot
- PerElement
- None
Structure to hold single Instruction
- argument 0: Argument
- argument 1: Argument
- operation: Operation
ExecutionGraph
- std::vector<Instruction> of operations
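The structures above might look like the following sketch (field and enum names are assumptions based on the notes):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Which tensor container an argument refers to.
enum class TensorType { Output, Derivative, Error, Weight };

// One operand of an instruction: a tensor, a time slice, and the
// tensor's position in its container.
struct Argument {
  TensorType tensor_type;
  int time_step;
  std::size_t index;
};

// The operation applied to the two arguments.
enum class Operation { MatMul, Dot, PerElement, None };

// A single step of the FP/BP graph.
struct Instruction {
  Argument arg0;
  Argument arg1;
  Operation operation;
};

// The execution graph is just an ordered list of instructions.
using ExecutionGraph = std::vector<Instruction>;
```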
from evonet.
Results of CPU profiling
Results after adding thread support for PopulationTrainer, plus thread support and node caching for the Model FP and BP steps
Bottlenecks
- calculateNetNodeInput_
- saveCurrentOutput() and all other "save..." methods called during FPTT
- Low priority others: BPTT and updateWeights
from evonet.