gorgonia / gorgonia Goto Github PK

Gorgonia is a library that helps facilitate machine learning in Go.

License: Apache License 2.0

Go 96.34% C 2.81% Assembly 0.02% Python 0.05% Cuda 0.78%

machine-learning artificial-intelligence neural-network computation-graph differentiation golang go gradient-descent gorgonia deep-learning

gorgonia's Issues

`concatOp` and `sliceIncrOp` needs serious optimization

In a slice-heavy neural network, concatOp and sliceIncrOp are the major bottlenecks because it relies on FlatIterator.

Here's the relevant pprof:

Showing top 10 nodes out of 104 (cum >= 83.20s)
      flat  flat%   sum%        cum   cum%
    20.38s  7.58%  7.58%     54.31s 20.19%  github.com/chewxy/gorgonia/tensor/types.(*FlatIterator).Next
    17.12s  6.36% 13.94%     59.04s 21.95%  runtime.findrunnable
    15.52s  5.77% 19.71%     15.52s  5.77%  github.com/chewxy/gorgonia/tensor/types.(*FlatIterator).ndNext
    13.97s  5.19% 24.90%     18.15s  6.75%  github.com/chewxy/gorgonia/tensor/types.(*FlatIterator).singleNext
    12.56s  4.67% 29.57%     14.50s  5.39%  runtime.runqgrab
    10.53s  3.91% 33.48%     39.56s 14.70%  runtime.chansend
     8.79s  3.27% 36.75%      9.92s  3.69%  runtime.casgstatus
        8s  2.97% 39.72%         8s  2.97%  runtime.releaseSudog
     7.03s  2.61% 42.34%      7.95s  2.96%  runtime.lock
     6.91s  2.57% 44.91%     83.20s 30.93%  runtime.schedule

and cumulatively:

Showing top 30 nodes out of 104 (cum >= 15.62s)
      flat  flat%   sum%        cum   cum%
     0.02s 0.0074% 0.0074%    172.46s 64.10%  runtime.goexit
         0     0% 0.0074%    100.49s 37.35%  github.com/chewxy/cubNN/TestNN2
         0     0% 0.0074%    100.49s 37.35%  testing.tRunner
     0.02s 0.0074% 0.015%    100.44s 37.33%  github.com/chewxy/cubNN.(*neuralnetwork2).train
     0.37s  0.14%  0.15%     99.69s 37.06%  github.com/chewxy/gorgonia.(*tapeMachine).RunAll
     0.05s 0.019%  0.17%     99.31s 36.91%  github.com/chewxy/gorgonia.(*execOp).exec
     1.05s  0.39%  0.56%     99.22s 36.88%  github.com/chewxy/gorgonia.execOp.exec
     1.07s   0.4%  0.96%     93.24s 34.66%  runtime.mcall
     3.11s  1.16%  2.12%     91.07s 33.85%  runtime.park_m
     6.91s  2.57%  4.68%     83.20s 30.93%  runtime.schedule
     2.77s  1.03%  5.71%     65.08s 24.19%  github.com/chewxy/gorgonia/tensor/types.(*FlatIterator).Chan.func1
    17.12s  6.36% 12.08%     59.04s 21.95%  runtime.findrunnable
    20.38s  7.58% 19.65%     54.31s 20.19%  github.com/chewxy/gorgonia/tensor/types.(*FlatIterator).Next
     2.23s  0.83% 20.48%     43.19s 16.05%  runtime.chansend1
         0     0% 20.48%     41.33s 15.36%  github.com/chewxy/gorgonia.(*sliceIncrOp).Do
     0.08s  0.03% 20.51%     41.33s 15.36%  github.com/chewxy/gorgonia.sliceIncrOp.Do
    10.53s  3.91% 24.42%     39.56s 14.70%  runtime.chansend
         0     0% 24.42%     37.17s 13.82%  github.com/chewxy/gorgonia.(*concatOp).Do
         0     0% 24.42%     37.17s 13.82%  github.com/chewxy/gorgonia.concatOp.Do
     0.04s 0.015% 24.44%     37.07s 13.78%  github.com/chewxy/gorgonia/tensor.Concat
     0.21s 0.078% 24.52%     37.02s 13.76%  github.com/chewxy/gorgonia/tensor/f64.(*Tensor).Concat
     3.02s  1.12% 25.64%     35.35s 13.14%  github.com/chewxy/gorgonia/tensor/f64.assignArray
     3.91s  1.45% 27.09%     33.61s 12.49%  github.com/chewxy/gorgonia/tensor/f64.(*Tensor).VAdd
     1.25s  0.46% 27.56%     30.55s 11.36%  runtime.chanrecv2
     6.22s  2.31% 29.87%     29.30s 10.89%  runtime.chanrecv
     2.24s  0.83% 30.70%     24.64s  9.16%  runtime.systemstack
     5.65s  2.10% 32.80%     20.15s  7.49%  runtime.runqsteal
    13.97s  5.19% 38.00%     18.15s  6.75%  github.com/chewxy/gorgonia/tensor/types.(*FlatIterator).singleNext
     1.79s  0.67% 38.66%     15.65s  5.82%  runtime.recv
     0.27s   0.1% 38.76%     15.62s  5.81%  runtime.goready

Of particular note is also the assignArray function. It uses the (*FlatIterator).Chan() method, which may or may not be a detriment.

Simplify Value type

Currently Value is this:

type Value interface {
    Type() Type
    Shape() types.Shape
    Size() int
    Dtype() Dtype
    Eq(other Value) bool
    Data() interface{}

    clone() (Value, error)
    zero() Value

    fmt.Formatter
}

As I added the Data() inteface method into Value, I realized that this could have been better (better because we're going with the "smaller interfaces means less leaks" idea:

type Value interface {
    Shape() types.Shape
    Size() int
    Data() interface{}
    fmt.Formatter
}

With this definition of Value, we'd of course need to write these functions:

func typeOf(v Value) Type {}
func dtypeOf(v Value) Dtype {}
func valueEq(a, b Value) boo {}
func cloneValue(v Value) (Value, error) {}
func zero(v Value) Value {}

But our interface becomes smaller, and the Tensor type can be removed completely, because types.Tensor inherently already fulfils the new Value interface.

And instead of having a catchall-type for Scalar, this can possibly be done:

type F64 float64
type F32 float32
type I int

then we can get rid of NewScalarValue and NewTensorValue. This would simplify the end API too (see also: #33 )

Add Tile() to Tensor

Related to #16 . Could probably be implemented at the same time

Standardize Solvers

Currently the different Solvers have different features. They should all have the same features: l1reg, l2reg, clip.

Also the Solver code is messy. Clean it up, with tests.

Add Norm For Tensor (and Op for Gorgonia)

API should be something along the lines of this: https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.norm.html

Reason: NTMs need Norms.

Add Stack() to Tensor

It'd be like Numpy's stack.

Preliminary design looks something like this:

func (t *Tensor) Stack(other *Tensor, axis int) (*Tensor, error)

And a package-level function:

func Stack(axis int, ...*Tensor) (*Tensor, error)

Rework errors

This task is broken into two parts:

Wrap all errors in Gorgonia with the errors package, with meaningful error messages.
Remove all of the runtime.Caller() function from the basic errors.

Add Solve() Method to Tensor

Solve a matrix. Should be fairly easy if you are familiar with linear algebra.

return error if Tensor is not a matrix

Table driven tests for Transpose()

Rewrite tests for all the individual Tensor packages to use table driven tests for Transpose.

Reason is I think the current tests are incomplete, and something is leaking. A table driven test would be more complete.

Rationale: I was working on improving the performance of Materialize() and kept running into a bug with Transpose()

Distributed Computing

There are many ways to do distributed computing for something like Gorgonia. There are a few things that need to be cleared up when discussing distributed neural networks.

Firstly, which part is distributed? The currently dominant methods basically works by splitting up the calculation of the gradients and the gradient updates on different parts of the network.

Other more traditional systems have different batches being parallelly trained across the network - but this usually relies on special algorithms that are capable of handling delays and latencies.

Or the entire neural network, if large enough, could be split up across the network. This is Google level engineering that I have no ability to emulate.

The more future-looking method involves synthetic/approximated gradients, functioning more like a database with locks and updates. I am personally in favour of this future-looking design. However, it is a deceptively simple problem and I have run into various hairy issues with this.

Of course, one can also combine the multiple notions of distributedness, but I think that may be a bit too ambitious.

Existing Implementations

These gradient descent methods lend themselves to being easily parallalized:

~~God's~~ Jeff Dean's DownpourSGD
Google's Delay Tolerant Adagrad
HogWild is also worthy of checking out.

Things To Be Aware/Think About

Latency kills progress
CAP theorem - well, marginally. Distributed NNs are far from requiring consistency. In fact I'd argue that distributed NNs require linearizability the most
Network consensus - given the abundance of RAFT implementations in Go, I'd say this is one of the few problems to be least worried about.
CapnProto looks good, but everyone else is using Protobuf to do their talking. Why?

Support Batched BLAS/External (Cgo/CUDA/OpenCL) for *LispMachine

*LispMachine should keep track of node dependencies and then perform batched BLAS calls. This would enable future use of CuBLAS ops

add link

link for http://deeplearning.net/software/theano/ should be added for theano

Add Sort() to Tensor

In-place sort of a Tensor

Restore AVX/SSE code for tensorf32

Something went wrong with the transfer to this repository, and all the assembly files wrt Float32 operations failed to pass the tests. Figure out what's wrong and fix it.

Argmax and Argmin are dead wrong

Writing new tests confirmed this

Add Argmin() to Tensor

Argmax already exists (see reference implementation), should be trivial to implement argmin

Add MaxPool Op

MaxPool basically subsamples a Tensor, and returns the max value of that tensor. It can currently sorta be achieved with funny slicing and maxing, but an op on its own would be better

Remove the panic()'s

There are a few places in Gorgonia where we panic on errors instead of returning meaningful errors. for most (all?) these cases, we can probably return meaningful errors instead. This is a non-trivial amount of work, as the signature of the panic-ing functions will change, so some work needs to be done for to adjust those functions and all places where those functions are called.

Add Col2Im Op

The inverse operation of #13

Add Im2Col Op

Im2Col takes a image as a 3-Tensor and makes it into a colvec. It is extremely useful in building convolutional neural networks for image-related stuff

Add Conv2D to Gorgonia operations

Subtle UX bug

This will fail:

g := NewGraph()
x := NewVector(g, Float64, WithShape(4))
e := NewMatrix(g, Float64, WithShape(4, 10))
w := NewMatrix(g, Float64, WithShape(20, 10))
w2 := NewMatrix(g, Float64, WithShape(10, 20))
xe := Must(Mul(x, e))
act := Must(Cube(Must(Mul(w, xe))))
do := Must(Dropout(act, 0.5))

act2 := Must(Cube(Must(Mul(do, w2))))
cost := Must(Sum(act2))

_, err := Grad(cost, x, w, w2)
if err != nil {
	// ioutil.WriteFile("fullGraph.dot", []byte(g.ToDot()), 0644)
	t.Errorf("%+v", err)
}

Specifically it will fail when calculating the gradients of this line:

Must(Mul(do, w2))

>>> Shape mismatch: (20) and (10)

This is because Mul(a, b) has its semantics overloaded. When a is a vector and b is a matrix, Mul does bᵀ × a, but there is no way for the Grad function to know this. Therefore Mul(vec, Mat) is allowed (no panics), but when it comes to calculating the symbolic derivatives, it fails due to shape mismatch.

Current Solution

A hacky solution would be to do this: where Mul(vec, Mat) is called, switch the mat and vec around, to be this: Mul(Mat, vec), but this should still be fixed because it is poor usability

// +build fastmath

Port over the fastmath functions.

Add Kronecker() to Tensor

It's like Outer() but applies to Tensors greater than dimension 1. Can be quite difficult. A lot of weird corner cases to think about.

Create `May`

May is the maybe monad. Causes bugs like these to be non-existant.

On the plus side, if the maybe monad is exported, it also helps users - they'd now have Must() and May()

Table driven tests for Repeat()

Current tests for repeat may be incomplete.

Which File

github.com/chewxy/gorgonia/tensor/f64/matop_test.go

Which function

TestRepeat

What to Test

Repeat scalar on 1, 2, n axes
Repeat colvec on 1, 2, n axes
Repeat rowvec on 1, 2, n axes
Repeat vector on 1, 2, n axes
Repeat matrix on 1, 2, n axes
Idiotic actions a user might do (these should all return errors).

Rework `Op`

Currently Op is not extensible by 3rd parties who want to write their own ops. The main roadblock were the unexported methods, and the current remaining roadblocks is the type system.

The Ideal Op interface should be this:

type Op interface {
	// metadata
	Type() Type
	Arity() int
	InferShape(types.Shape...) types.Shape
	ReturnsPtr() bool
	CallsExtern() bool
	OverwritesInput() int

	// the actual op
	Do(...Value) (Value, error)

	// serialization and shit
	WriteHash(h hash.Hash)
	Hashcode() uint32
	fmt.Stringer
}

Further optional op types:

type SymDiffOp interface {
	Op

	DiffWRT(int) []bool
	SymDiff(inputs Nodes, outputNode, gradNode *Node) (Nodes, error)
}

How to Get There

Export all the methods
Add Arity() method
Rework all the InferShape() methods
Rework anything that calls SymDiff() and DiffWRT to use ~~SymDiffOp~~ SDOp
Move type system to external package (see #26)
Clean up Value types and interface (see #44)
~~Move Op into its own package~~ (Unfeasible)

Readme example differentiation typo

The code only prints the value if an error occurred. The ifs should test for err == nil.

if xgrad, err := x.Grad(); err != nil {
    fmt.Printf("dz/dx: %v", xgrad)
}

if ygrad, err := y.Grad(); err != nil {
    fmt.Printf("dz/dy: %v", ygrad)
}

Add SVD() to Tensor

Single Value Decomp. Should be fairly easy if familiar with linear algebra.

Returns error if Tensor is not a matrix

Refactor type system out into its own package

Required before #3 happens

Type system should be refined too:

remove typeClass (think about this first!)
concretify (replace *typeVariable with a unified type using pruneCompletely()) more aggressively, instead of relying on *typeVariable everywhere, which leads to a lot of GC pressure
keep functionType but export it.

link, hacktoberfest

add link for Theano on wiki page

Write tests for Apply()

Test with:

vanilla Tensor
view (viewOf != nil)
thunked transposes (transposeWith !=nil)

Upgrade Slice definition to include step

Current Slice definition:

type Slice interface{
    Start() int
    End() int
}

Upgrade to:

type Slice interface {
    Start() int
    End() int
    Step() int
}

Add Max() to Tensor

The skeleton's there:

func (t *Tensor) Max(along ...int) (retVal *Tensor, err error) {
    return nil, nil
}

This may be required to do #15

Fix up TensorDot

TensorDot() is currently broken and under the process of re-writing. This needs to be fixed ASAP

errors clean up

After the initial work on the errors, there is some more work that needs to be done to clean things up:

We need to add a way to handle the errors at the top of the package (probably gorgonia.go) which uses errors.Print to print the stack trace of errors which are caught to the user.
For this, we may wish to have a Handle(err) function which allows the users of this library to cleanly have access to our errors, which will errensially do a `errors.Print(err) if err is not nil.
We may wish to add this error handling to our tests. Currently, we user t.error(err) when an error is encountered which does not give us the stack trace.adding a Handle(err) would be a very cheap and effective way to add a stack trace to the errors in our tests which would help in debugging.

Add RollAxis

func (t *Tensor) RollAxis(axis, start int)

Similar to Numpy's rollaxis, which is essentially this:

 axes = list(range(0, n))
    axes.remove(axis)
    axes.insert(start, axis)
    return a.transpose(axes)

RNN Example

Write an example RNN.

Sigmoid for Tensors are slow

Mainly due to the fact that Tensor.Apply(sigmoidFn) is slow. There should be a way to optimize this for entire arrays

CuBLAS

Now that the float32 bugs appear to have been resolved, time to start porting CuBLAS

CSV read/writing to Tensor

Currently Tensors are gobbable, and certain tensors have WriteNpy methods to write to numpy files. Writing to CSV seems like a good idea too.

Dropout is plain wrong.

Transpose on Views Bug

T := tf64.NewTensor(tf64.WithShape(8, 10), tf64.WithBacking(tf64.RangeFloat64(0, 80)))
T2, _ := T.Slice(ss(0))
T2.T()

fmt.Printf("%v\n", T2.AP)

yields:

Shape: (10, 1), Stride: [1], Dims: 2, Lock: false

Should be:

Shape: (10), Stride: [1], Dims, 1, Lock false

Further investigation shows that this bug is entirely due to an issue in the T() and Slice() method, which doesn't play that well with views.

Add Min() to Tensor

Refer to #19

Fill Out all the Batched Functions in Blase

This issue will only close once every BLAS subroutine is covered by Blase

Break all the User Unfriendly API (and replace them with better ones)

There are a great number of things that I am not happy about with regards to the API of Gorgonia. The original package Gorgonia was based on was designed to do a few machine learning things well (notably LSTMs, and deep perceptrons). As it becomes more and more general purpose, there would need to be some API changes. The only way I can discover these API-unfriendliness is through the creation of varied neural-network stuff.

For now this issue will act as a living document of sorts. Bear in mind that these are extremely trivial to fix with gorename so they will all concentrate here in this issue.

Here are the current ones on my list of bugbears, please feel free to add your own by commenting

`NewMatrix`, `NewVector` functions

Example:

x := NewMatrix(g, Float64, WithName("x"), WithShape(2,3), WithValue(xT)

This is clearly Bad Design with capital letters. There are two things that I'm not happy about with this:

Given that we already know it's a Matrix/Vector, why not enforce the shape right away?
The New... prefix makes one think that one is creating a new Matrix, not a new *Node that represents and holds a Tensor with 2 dimensions. An alternative would be the older IsAVector() name, but the reason for changing from that is because Is...() is typically reserved for functions that return bool

Proposed Fix

x := NewNodeOfVector(g, Float64, 5, WithName("x"))
y := NewNodeOfMatrix(g, Float32, 2, 3, WithName("y"), WithValue(yT))

x := NodeOfVector(g, Float64, 5, WithName("x"))
y := NodeOfMatrix(g, Float32, 2, 3, WithName("y"), WithValue(yT))

`NewNodeFromAny` should be called `NodeFromValue`

It's currently called NewNodeFromAny just to fit into the whole New...() schema.

`NewTensor` in each of the `Tensor` packages should really just be `New`

~~This will be fixed in #71 - the whole tensor package has been rewritten from ground up to be more generic.~~
FIXED. The tensor package was rewritten from ground up to be more generic.

Create Mental Separation When Creating Nodes

The one thing I like about CNTK is BrainScript, which HN user IshKebab's comment made me look deeper into. I find that it creates two modes of thinking - one mode when defining the computation graph, one mode for writing code surrounding the runtime of the computation graph. This was clearly what was lacking in Theano.

On the other hand, Theano and Tensorflow have both shared semantics with Numpy, which made defining the computation graph a lot more familiar.

Add Correlate() method to Tensors

https://en.wikipedia.org/wiki/Cross-correlation

Create axis iterator for AP

What

An AxisIterator is one where you iterate along an axis or multiple axes (but no greater than the len(ap.shape))

type AxisIterator struct {
    *AP

    // additional fields for tracking position etc
}

The AxisIterator conforms to a hypothetical iterator interface:

type Iterator interface {
    next() (int, error)
}

Nice to have features:

support arbitrary starting position
step

Purpose

replace the various iterator implementation in each concrete Tensor type
works as helper struct/function for various access needs - instead of writing them out manually.

Multislice Bug

Simplest Reproduction Case

import T "github.com/chewxy/gorgonia"

x := NewMatrix(g, T.Float64, WithShape(2,3), WithName("x"))

T.Slice(x, T.S(0), T.S(1)

What Happens

Panic. Specifically index out of range when inferring shape of sliceOp

Suggested Fix

Rework all the slicing related stuff to share one common architecture

Add Convolve() to Tensors

Requires #6 before it can be done ideally