
Tensor device support (evonet, closed, 5 comments)

dmccloskey commented on July 17, 2024
Tensor device support

from evonet.


dmccloskey commented on July 17, 2024

References (in code)

References (build)

References (CUDA compilation)


dmccloskey commented on July 17, 2024

Docker GPU support is not available on Windows when using CUDA

Will need to dual boot into Linux and install the NVIDIA drivers directly

References:


dmccloskey commented on July 17, 2024

WINDOWS 10, VS2017, and CUDA 9.2 compilation

Windows SDK

VS2017 and CUDA 9.2 compatibility

Force VS2017 to build x64 and target x64 architecture

Linking to Boost in VS2017

int conversion in Eigen library

  • error:
    • "C2397 conversion from 'unsigned __int64' to '__int64' requires a narrowing conversion"
    • "C2397 conversion from 'unsigned __int64' to 'int' requires a narrowing conversion"
  • fix: ensured that all integer types used when specifying Tensor dimensions (e.g., Eigen::Tensor<float, 2> tensor(dim0, dim1);) are the same type
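The pattern behind the fix can be sketched without Eigen itself. Eigen's default dense index is a signed 64-bit integer, so sizing a tensor from an unsigned std::size_t is a narrowing conversion that MSVC rejects (C2397). A minimal sketch, assuming a single signed index alias is used everywhere (tensorSize stands in for the Eigen::Tensor constructor):

```cpp
#include <cstdint>
#include <cstddef>

// Eigen's default dense index (Eigen::Index) is a signed 64-bit integer,
// so passing an unsigned std::size_t narrows under MSVC (error C2397).
// Using one signed alias for every dimension keeps the call homogeneous.
using Index = std::int64_t;

// Stand-in for: Eigen::Tensor<float, 2> tensor(rows, cols);
Index tensorSize(Index rows, Index cols) { return rows * cols; }

// Convert any unsigned sizes explicitly, once, at the call boundary.
Index fromUnsigned(std::size_t n) { return static_cast<Index>(n); }
```

The explicit static_cast documents the one place where sign conversion happens, instead of letting it occur implicitly at every tensor allocation.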

use of POSIX sleep in Eigen library

  • error: sleep is a POSIX function and is not available when compiling with MSVC on Windows
  • fix: commented out line 91 //sleep(1); in TensorDeviceCuda.h

Missing "math_functions.hpp" in CUDA 9.2

MSVC compiler "quirk" and Error in Eigen Macros.h file

Cygwin64 linking errors using CUDA

  • errors:
    • c++: error: /subsystem:console: No such file or directory
    • c++: error: opengl32.lib: No such file or directory
  • fix: none found

Cygwin64 linking errors to the Boost library


dmccloskey commented on July 17, 2024

Results of CPU profiling

  • Bottleneck 1: Pruning of nodes/links/weights
    • Save node/link/weight pruning until the very end
  • Bottleneck 2: forwardPropogateLayerNetInput
    • Make a "cache" before the first epoch of training of all layers and steps of operation
    • Use the "cache" to allocate memory for the needed tensors
    • Update node output/derivative/error values only when requested by the user
  • Bottleneck 3: MapValuesToNodes
    • Refactor to "materialize" node values on the fly as requested by the user (i.e., retrieve actual values from tensors)
  • Bottleneck 4: backwardPropogateLayerError
    • Same as Bottleneck 2, but for the back-propagation steps
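The "cache" proposed for Bottleneck 2 can be sketched as a precomputed plan: before the first epoch, record which source nodes feed which sink nodes through which weights at each step, so later epochs replay the plan instead of re-deriving it from the graph. All names here are hypothetical, not the actual evonet code:

```cpp
#include <vector>

// Hypothetical sketch of the forward-propagation cache: one entry per
// layer/step of operation, recorded once before the first training epoch.
struct LayerStep {
  std::vector<int> source_nodes;  // indices into the node list
  std::vector<int> sink_nodes;    // nodes whose net input this step fills
  std::vector<int> weights;       // indices into the weight list
};

// Built once from the graph, then replayed every epoch.
using ForwardCache = std::vector<LayerStep>;

ForwardCache buildCache() {
  ForwardCache cache;
  // Toy plan: nodes 0 and 1 feed node 2 through weights 0 and 1.
  cache.push_back({{0, 1}, {2}, {0, 1}});
  return cache;
}
```

The cached index vectors also tell the trainer exactly how much tensor memory each step needs, which supports the allocation point above.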

Code changes

LossFunction

  • base class for LossFunctionOp and LossFunctionGradOp
  • ... operator()(..., Eigen::ThreadPoolDevice& device) const = 0;
  • ... operator()(..., Eigen::GpuDevice& device) const = 0;
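The base-class shape above can be sketched as follows. The real signatures take Eigen tensors and Eigen::ThreadPoolDevice / Eigen::GpuDevice; stub device types and raw pointers stand in here so the example is self-contained, and MSELossOp is a hypothetical subclass for illustration:

```cpp
#include <cstddef>

// Stand-ins for Eigen::ThreadPoolDevice and Eigen::GpuDevice.
struct ThreadPoolDeviceStub {};
struct GpuDeviceStub {};

// Base class with one pure-virtual operator() overload per device type,
// as outlined above for LossFunctionOp / LossFunctionGradOp.
struct LossFunctionOp {
  virtual ~LossFunctionOp() = default;
  virtual float operator()(const float* predicted, const float* expected,
                           std::size_t n, ThreadPoolDeviceStub& device) const = 0;
  virtual float operator()(const float* predicted, const float* expected,
                           std::size_t n, GpuDeviceStub& device) const = 0;
};

// Hypothetical concrete loss: mean squared error on the CPU path.
struct MSELossOp : LossFunctionOp {
  float operator()(const float* p, const float* e, std::size_t n,
                   ThreadPoolDeviceStub&) const override {
    float sum = 0.f;
    for (std::size_t i = 0; i < n; ++i) { float d = p[i] - e[i]; sum += d * d; }
    return sum / static_cast<float>(n);
  }
  float operator()(const float*, const float*, std::size_t,
                   GpuDeviceStub&) const override {
    return 0.f;  // the GPU path would launch work on the device instead
  }
};
```

Overloading on the device type lets the caller select the CPU or GPU path at compile time while sharing one loss-function interface.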

CalculateActivation

  • refactor to use Eigen::Tensor<float, 0>
  • ... operator()(..., Eigen::ThreadPoolDevice& device) const = 0;
  • ... operator()(..., Eigen::GpuDevice& device) const = 0;

Model

  • allocateForwardPropogationLayerTensors and allocateBackwardPropogationLayerTensors
  • alternatively convertNodesToTensors
  • refactor backPropogateLayerError to another class to implement parallelism
  • refactor forwardPropogateLayerNetInput to another class to implement parallelism

Graph of operations

FP:

  • sequence 1:
    • source nodes (outputs), links (weights) -> (MatMul) net input -> (SplitByActivation) split input -> (PerElement) sink nodes (output) -> (PerElement) sink nodes (derivative)[Do this during BP]
    • subsequence 1:
      • source nodes (outputs), links (weights) -> (MatMul) net input -> (PerElement) sink nodes (output) -> (PerElement) sink nodes (derivative)
    • repeat for all subsequences
  • repeat for all sequences
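One FP subsequence above can be sketched as plain functions (hypothetical names, not the evonet API): source outputs and link weights are contracted into a net input (the MatMul step), then mapped per element through the activation and its derivative:

```cpp
#include <vector>
#include <cmath>
#include <cstddef>

// MatMul step: contract source node outputs with the link weights to get
// each sink node's net input. weights[sink][src] is the weight on the
// link from source node src to sink node sink.
std::vector<float> netInput(const std::vector<std::vector<float>>& weights,
                            const std::vector<float>& sourceOutputs) {
  std::vector<float> net(weights.size(), 0.f);
  for (std::size_t sink = 0; sink < weights.size(); ++sink)
    for (std::size_t src = 0; src < sourceOutputs.size(); ++src)
      net[sink] += weights[sink][src] * sourceOutputs[src];
  return net;
}

// PerElement steps: sink node output and derivative from the net input.
float sigmoid(float x) { return 1.f / (1.f + std::exp(-x)); }
float sigmoidDeriv(float x) { float s = sigmoid(x); return s * (1.f - s); }
```

Grouping sink nodes by activation (the SplitByActivation step) lets each group run its PerElement map as one tensor operation instead of node by node.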

Error:

  • output nodes (output), expected output -> (Custom) output nodes (error)

BP:

  • sequence 1:
    • source nodes (error), links (weights) -> (MatMul) sink nodes (tmp) -> (SplitByTime) sink nodes (tmp), sink nodes (derivative) -> (DotProd) sink nodes (errors)
    • subsequence 1:
      • source nodes (error), links (weights) -> (MatMul) sink nodes (tmp), sink nodes (derivative) -> (DotProd) sink nodes (errors)
  • repeat for all sequences

Structures

Tensors:

  • Output tensors (batch x time [same as nodes])
  • Derivative tensors (batch x time [same as nodes])
  • Error tensors (batch x time [same as nodes])
  • Link tensors [same as Weight]

Node ids:

  • Matching node ID vectors and Link ID vectors

Tensor container:

  • std::vector to hold output, derivative, and error tensors in order

Structure to hold Argument

  • Tensor type
  • time-step
  • tensor index

enum to hold Operation type

  • MatMul
  • Dot
  • PerElement
  • None

Structure to hold single Instruction

  • 0: Argument
  • 1: Argument
  • operation: Operation

ExecutionGraph

  • std::vector<Instruction> of operations
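The structures above can be sketched directly; the names follow the outline and are assumptions, not the actual evonet code:

```cpp
#include <vector>

// Operation type, as enumerated above.
enum class Operation { MatMul, Dot, PerElement, None };

// Which tensor container an Argument refers to.
enum class TensorType { Output, Derivative, Error };

// Argument: tensor type, time-step, and index into the tensor container.
struct Argument {
  TensorType type;
  int timeStep;     // time slice of the (batch x time) tensor
  int tensorIndex;  // index into the std::vector of tensors
};

// A single Instruction: two arguments and the operation applied to them.
struct Instruction {
  Argument arg0;
  Argument arg1;
  Operation operation;
};

// The ExecutionGraph is simply the ordered list of instructions.
using ExecutionGraph = std::vector<Instruction>;
```

Flattening the network into this instruction list is what lets FP and BP be replayed each epoch without walking the node/link graph again.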


dmccloskey commented on July 17, 2024

Results of CPU profiling

Results after adding thread support for PopulationTrainer, and thread support plus node caching for the Model FP and BP steps

Bottlenecks

  1. calculateNetNodeInput_
  2. saveCurrentOutput() and all other "save..." methods called during FPTT
  3. Lower priority: BPTT and updateWeights

