gorgonia/gorgonia
Gorgonia is a library that helps facilitate machine learning in Go.
Home Page: https://gorgonia.org/
License: Apache License 2.0
In a slice-heavy neural network, concatOp and sliceIncrOp are the major bottlenecks, because they rely on FlatIterator.
Here's the relevant pprof:
Showing top 10 nodes out of 104 (cum >= 83.20s)
flat flat% sum% cum cum%
20.38s 7.58% 7.58% 54.31s 20.19% github.com/chewxy/gorgonia/tensor/types.(*FlatIterator).Next
17.12s 6.36% 13.94% 59.04s 21.95% runtime.findrunnable
15.52s 5.77% 19.71% 15.52s 5.77% github.com/chewxy/gorgonia/tensor/types.(*FlatIterator).ndNext
13.97s 5.19% 24.90% 18.15s 6.75% github.com/chewxy/gorgonia/tensor/types.(*FlatIterator).singleNext
12.56s 4.67% 29.57% 14.50s 5.39% runtime.runqgrab
10.53s 3.91% 33.48% 39.56s 14.70% runtime.chansend
8.79s 3.27% 36.75% 9.92s 3.69% runtime.casgstatus
8s 2.97% 39.72% 8s 2.97% runtime.releaseSudog
7.03s 2.61% 42.34% 7.95s 2.96% runtime.lock
6.91s 2.57% 44.91% 83.20s 30.93% runtime.schedule
and cumulatively:
Showing top 30 nodes out of 104 (cum >= 15.62s)
flat flat% sum% cum cum%
0.02s 0.0074% 0.0074% 172.46s 64.10% runtime.goexit
0 0% 0.0074% 100.49s 37.35% github.com/chewxy/cubNN/TestNN2
0 0% 0.0074% 100.49s 37.35% testing.tRunner
0.02s 0.0074% 0.015% 100.44s 37.33% github.com/chewxy/cubNN.(*neuralnetwork2).train
0.37s 0.14% 0.15% 99.69s 37.06% github.com/chewxy/gorgonia.(*tapeMachine).RunAll
0.05s 0.019% 0.17% 99.31s 36.91% github.com/chewxy/gorgonia.(*execOp).exec
1.05s 0.39% 0.56% 99.22s 36.88% github.com/chewxy/gorgonia.execOp.exec
1.07s 0.4% 0.96% 93.24s 34.66% runtime.mcall
3.11s 1.16% 2.12% 91.07s 33.85% runtime.park_m
6.91s 2.57% 4.68% 83.20s 30.93% runtime.schedule
2.77s 1.03% 5.71% 65.08s 24.19% github.com/chewxy/gorgonia/tensor/types.(*FlatIterator).Chan.func1
17.12s 6.36% 12.08% 59.04s 21.95% runtime.findrunnable
20.38s 7.58% 19.65% 54.31s 20.19% github.com/chewxy/gorgonia/tensor/types.(*FlatIterator).Next
2.23s 0.83% 20.48% 43.19s 16.05% runtime.chansend1
0 0% 20.48% 41.33s 15.36% github.com/chewxy/gorgonia.(*sliceIncrOp).Do
0.08s 0.03% 20.51% 41.33s 15.36% github.com/chewxy/gorgonia.sliceIncrOp.Do
10.53s 3.91% 24.42% 39.56s 14.70% runtime.chansend
0 0% 24.42% 37.17s 13.82% github.com/chewxy/gorgonia.(*concatOp).Do
0 0% 24.42% 37.17s 13.82% github.com/chewxy/gorgonia.concatOp.Do
0.04s 0.015% 24.44% 37.07s 13.78% github.com/chewxy/gorgonia/tensor.Concat
0.21s 0.078% 24.52% 37.02s 13.76% github.com/chewxy/gorgonia/tensor/f64.(*Tensor).Concat
3.02s 1.12% 25.64% 35.35s 13.14% github.com/chewxy/gorgonia/tensor/f64.assignArray
3.91s 1.45% 27.09% 33.61s 12.49% github.com/chewxy/gorgonia/tensor/f64.(*Tensor).VAdd
1.25s 0.46% 27.56% 30.55s 11.36% runtime.chanrecv2
6.22s 2.31% 29.87% 29.30s 10.89% runtime.chanrecv
2.24s 0.83% 30.70% 24.64s 9.16% runtime.systemstack
5.65s 2.10% 32.80% 20.15s 7.49% runtime.runqsteal
13.97s 5.19% 38.00% 18.15s 6.75% github.com/chewxy/gorgonia/tensor/types.(*FlatIterator).singleNext
1.79s 0.67% 38.66% 15.65s 5.82% runtime.recv
0.27s 0.1% 38.76% 15.62s 5.81% runtime.goready
Also of particular note is the assignArray function. It uses the (*FlatIterator).Chan() method, which may or may not be a detriment.
Currently Value is this:
type Value interface {
Type() Type
Shape() types.Shape
Size() int
Dtype() Dtype
Eq(other Value) bool
Data() interface{}
clone() (Value, error)
zero() Value
fmt.Formatter
}
As I added the Data() interface method into Value, I realized that this could have been better (better because we're going with the "smaller interfaces mean fewer leaks" idea):
type Value interface {
Shape() types.Shape
Size() int
Data() interface{}
fmt.Formatter
}
With this definition of Value
, we'd of course need to write these functions:
func typeOf(v Value) Type {}
func dtypeOf(v Value) Dtype {}
func valueEq(a, b Value) bool {}
func cloneValue(v Value) (Value, error) {}
func zero(v Value) Value {}
But our interface becomes smaller, and the Tensor type can be removed completely, because types.Tensor inherently already fulfils the new Value interface.
And instead of having a catchall-type for Scalar, this can possibly be done:
type F64 float64
type F32 float32
type I int
then we can get rid of NewScalarValue and NewTensorValue. This would simplify the end API too (see also: #33).
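A minimal sketch of how one of these scalar types could satisfy the proposed smaller Value interface directly, with no catch-all Scalar wrapper. The Shape type and the behaviour of Shape()/Size() for scalars are stand-in assumptions, not Gorgonia's actual definitions:

```go
package main

import "fmt"

// Shape is a stand-in for types.Shape.
type Shape []int

// Value is the proposed smaller interface from above.
type Value interface {
	Shape() Shape
	Size() int
	Data() interface{}
	fmt.Formatter
}

// F64 is a scalar float64 that satisfies Value directly.
type F64 float64

func (v F64) Shape() Shape      { return Shape{} } // scalars are shapeless (assumption)
func (v F64) Size() int         { return 1 }
func (v F64) Data() interface{} { return float64(v) }
func (v F64) Format(s fmt.State, c rune) { fmt.Fprintf(s, "%v", float64(v)) }

func main() {
	var v Value = F64(3.14)
	fmt.Printf("%v has size %d\n", v, v.Size()) // 3.14 has size 1
}
```

F32 and I would follow the same pattern, each a defined type over its underlying primitive.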
Related to #16 . Could probably be implemented at the same time
Currently the different Solvers have different features. They should all have the same features: l1reg, l2reg, clip.
Also, the Solver code is messy. Clean it up, with tests.
API should be something along the lines of this: https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.norm.html
Reason: NTMs need Norms.
It'd be like Numpy's stack.
Preliminary design looks something like this:
func (t *Tensor) Stack(other *Tensor, axis int) (*Tensor, error)
And a package-level function:
func Stack(axis int, ...*Tensor) (*Tensor, error)
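To make the intended semantics concrete, here is a sketch of the simplest case: stacking equal-length 1-D slices along a new leading axis, yielding a row-major (n, len) matrix. The function name and the flat-slice-plus-shape representation are assumptions for illustration; the real Stack would handle arbitrary dimensions and axes:

```go
package main

import (
	"errors"
	"fmt"
)

// stackVectors stacks equal-length 1-D slices along a new axis 0,
// returning the row-major data and the resulting shape.
func stackVectors(vs ...[]float64) ([]float64, []int, error) {
	if len(vs) == 0 {
		return nil, nil, errors.New("stack: no tensors given")
	}
	n := len(vs[0])
	out := make([]float64, 0, len(vs)*n)
	for _, v := range vs {
		if len(v) != n {
			return nil, nil, errors.New("stack: shape mismatch")
		}
		out = append(out, v...)
	}
	return out, []int{len(vs), n}, nil
}

func main() {
	data, shape, err := stackVectors([]float64{1, 2}, []float64{3, 4})
	fmt.Println(data, shape, err) // [1 2 3 4] [2 2] <nil>
}
```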
This task is broken into two parts: the runtime.Caller() function from the basic errors.

Solve a matrix. Should be fairly easy if you are familiar with linear algebra.
Return an error if the Tensor is not a matrix.
Rewrite tests for all the individual Tensor packages to use table-driven tests for Transpose.
The reason is that I think the current tests are incomplete, and something is leaking. A table-driven test would be more complete.
Rationale: I was working on improving the performance of Materialize() and kept running into a bug with Transpose().
There are many ways to do distributed computing for something like Gorgonia. There are a few things that need to be cleared up when discussing distributed neural networks.
Firstly, which part is distributed? The currently dominant methods basically work by splitting up the calculation of the gradients and the gradient updates on different parts of the network.
Other, more traditional systems have different batches being trained in parallel across the network - but this usually relies on special algorithms that are capable of handling delays and latencies.
Or the entire neural network, if large enough, could be split up across the network. This is Google-level engineering that I have no ability to emulate.
The more future-looking method involves synthetic/approximated gradients, functioning more like a database with locks and updates. I am personally in favour of this future-looking design. However, it is a deceptively simple problem and I have run into various hairy issues with this.
Of course, one can also combine the multiple notions of distributedness, but I think that may be a bit too ambitious.
These gradient descent methods lend themselves to being easily parallelized:
*LispMachine
should keep track of node dependencies and then perform batched BLAS calls. This would enable future use of CuBLAS ops
A link to http://deeplearning.net/software/theano/ should be added for Theano.
In-place sort of a Tensor
Something went wrong with the transfer to this repository, and all the assembly files wrt Float32 operations failed to pass the tests. Figure out what's wrong and fix it.
Writing new tests confirmed this
Argmax already exists (see reference implementation), should be trivial to implement argmin
MaxPool basically subsamples a Tensor, and returns the max value of that tensor. It can currently sorta be achieved with funny slicing and maxing, but an op on its own would be better
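A sketch of the subsampling this op would perform, for the simplest case: non-overlapping 2D max-pooling over a row-major (h, w) matrix, with h and w assumed divisible by the window size. Names and signature are assumptions, not the op's actual API:

```go
package main

import "fmt"

// maxPool2 pools a row-major (h, w) matrix with a non-overlapping
// (ph, pw) window, returning the (h/ph, w/pw) result.
func maxPool2(data []float64, h, w, ph, pw int) []float64 {
	oh, ow := h/ph, w/pw
	out := make([]float64, oh*ow)
	for i := 0; i < oh; i++ {
		for j := 0; j < ow; j++ {
			best := data[i*ph*w+j*pw]
			for dy := 0; dy < ph; dy++ {
				for dx := 0; dx < pw; dx++ {
					if v := data[(i*ph+dy)*w+(j*pw+dx)]; v > best {
						best = v
					}
				}
			}
			out[i*ow+j] = best
		}
	}
	return out
}

func main() {
	m := []float64{
		1, 2, 3, 4,
		5, 6, 7, 8,
		9, 10, 11, 12,
		13, 14, 15, 16,
	}
	fmt.Println(maxPool2(m, 4, 4, 2, 2)) // [6 8 14 16]
}
```

This is exactly the "funny slicing and maxing" described above, folded into one pass; a dedicated op could also remember argmax positions for the backward pass.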
There are a few places in Gorgonia where we panic on errors instead of returning meaningful errors. For most (all?) of these cases, we can probably return meaningful errors instead. This is a non-trivial amount of work, as the signatures of the panicking functions will change, so some work needs to be done to adjust those functions and all the places where they are called.
The inverse operation of #13
Im2Col takes an image as a 3-Tensor and makes it into a colvec. It is extremely useful in building convolutional neural networks for image-related stuff.
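A sketch of the core transformation, under simplifying assumptions (row-major (c, h, w) layout, stride 1, no padding, and this particular patch ordering - none of which are confirmed by the issue):

```go
package main

import "fmt"

// im2col turns a row-major (c, h, w) 3-tensor into a matrix where
// each row is one flattened (c, kh, kw) patch, stride 1, no padding.
func im2col(data []float64, c, h, w, kh, kw int) [][]float64 {
	var cols [][]float64
	for y := 0; y <= h-kh; y++ {
		for x := 0; x <= w-kw; x++ {
			col := make([]float64, 0, c*kh*kw)
			for ch := 0; ch < c; ch++ {
				for dy := 0; dy < kh; dy++ {
					for dx := 0; dx < kw; dx++ {
						col = append(col, data[ch*h*w+(y+dy)*w+(x+dx)])
					}
				}
			}
			cols = append(cols, col)
		}
	}
	return cols
}

func main() {
	// 1 channel, 3x3 image, 2x2 kernel -> 4 patches of length 4.
	img := []float64{1, 2, 3, 4, 5, 6, 7, 8, 9}
	cols := im2col(img, 1, 3, 3, 2, 2)
	fmt.Println(len(cols), cols[0]) // 4 [1 2 4 5]
}
```

Once the image is in this form, a convolution becomes a single matrix multiply against the flattened kernels, which is why the op is so useful for CNNs.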
This will fail:
g := NewGraph()
x := NewVector(g, Float64, WithShape(4))
e := NewMatrix(g, Float64, WithShape(4, 10))
w := NewMatrix(g, Float64, WithShape(20, 10))
w2 := NewMatrix(g, Float64, WithShape(10, 20))
xe := Must(Mul(x, e))
act := Must(Cube(Must(Mul(w, xe))))
do := Must(Dropout(act, 0.5))
act2 := Must(Cube(Must(Mul(do, w2))))
cost := Must(Sum(act2))
_, err := Grad(cost, x, w, w2)
if err != nil {
// ioutil.WriteFile("fullGraph.dot", []byte(g.ToDot()), 0644)
t.Errorf("%+v", err)
}
Specifically it will fail when calculating the gradients of this line:
Must(Mul(do, w2))
>>> Shape mismatch: (20) and (10)
This is because Mul(a, b) has its semantics overloaded. When a is a vector and b is a matrix, Mul does bᵀ × a, but there is no way for the Grad function to know this. Therefore Mul(vec, Mat) is allowed (no panics), but when it comes to calculating the symbolic derivatives, it fails due to a shape mismatch.
A hacky solution would be this: where Mul(vec, Mat) is called, swap the mat and vec around to Mul(Mat, vec). But this should still be fixed properly, because it is poor usability.
Port over the fastmath functions.
It's like Outer(), but applied to Tensors of dimension greater than 1. Can be quite difficult. A lot of weird corner cases to think about.
May is the maybe monad. It would make bugs like these nonexistent. On the plus side, if the maybe monad is exported, it also helps users - they'd now have Must() and May().
Current tests for repeat may be incomplete. See TestRepeat in github.com/chewxy/gorgonia/tensor/f64/matop_test.go.
Currently Op is not extensible by 3rd parties who want to write their own ops. The main roadblock was the unexported methods, and the current remaining roadblock is the type system.
The ideal Op interface should be this:
type Op interface {
// metadata
Type() Type
Arity() int
InferShape(...types.Shape) types.Shape
ReturnsPtr() bool
CallsExtern() bool
OverwritesInput() int
// the actual op
Do(...Value) (Value, error)
// serialization and shit
WriteHash(h hash.Hash)
Hashcode() uint32
fmt.Stringer
}
Further optional op types:
type SymDiffOp interface {
Op
DiffWRT(int) []bool
SymDiff(inputs Nodes, outputNode, gradNode *Node) (Nodes, error)
}
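To show what a 3rd-party op against the proposed interface could look like, here is a heavily simplified sketch. The Shape, Value, and Op types below are minimal stand-ins (a subset of the proposed interface), and negOp is a hypothetical op, not anything in Gorgonia:

```go
package main

import (
	"fmt"
	"hash"
	"hash/fnv"
)

// Minimal stand-ins for the real gorgonia types.
type Shape []int
type Value interface{}

// Op is a subset of the proposed interface above.
type Op interface {
	Arity() int
	InferShape(...Shape) Shape
	Do(...Value) (Value, error)
	WriteHash(h hash.Hash)
	fmt.Stringer
}

// negOp is a hypothetical elementwise-negation op a third party might write.
type negOp struct{}

func (negOp) Arity() int                   { return 1 }
func (negOp) InferShape(ss ...Shape) Shape { return ss[0] } // same shape out
func (negOp) String() string               { return "neg" }
func (op negOp) WriteHash(h hash.Hash)     { fmt.Fprint(h, op.String()) }

func (negOp) Do(vs ...Value) (Value, error) {
	if len(vs) != 1 {
		return nil, fmt.Errorf("neg: expected 1 input, got %d", len(vs))
	}
	in := vs[0].([]float64)
	out := make([]float64, len(in))
	for i, v := range in {
		out[i] = -v
	}
	return out, nil
}

func main() {
	var op Op = negOp{}
	h := fnv.New32a()
	op.WriteHash(h)
	res, _ := op.Do([]float64{1, -2, 3})
	fmt.Println(op, res) // neg [-1 2 -3]
}
```

The point of the exercise: with every method exported, a third party can implement the whole contract without touching Gorgonia's internals; the type system is the only remaining coupling.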
The code only prints the value if an error occurred. The ifs should test for err == nil.
if xgrad, err := x.Grad(); err != nil {
fmt.Printf("dz/dx: %v", xgrad)
}
if ygrad, err := y.Grad(); err != nil {
fmt.Printf("dz/dy: %v", ygrad)
}
Singular Value Decomposition. Should be fairly easy if you are familiar with linear algebra.
Return an error if the Tensor is not a matrix.
Required before #3 happens
Type system should be refined too:
- typeClass (think about this first!)
- *typeVariable with a unified type (using pruneCompletely()) more aggressively, instead of relying on *typeVariable everywhere, which leads to a lot of GC pressure
- functionType, but export it

Add a link for Theano on the wiki page.

Test with:
- viewOf != nil
- transposeWith != nil

Current Slice definition:
type Slice interface{
Start() int
End() int
}
Upgrade to:
type Slice interface {
Start() int
End() int
Step() int
}
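A sketch of a concrete type satisfying the upgraded interface, plus what consuming it could look like against a plain slice. The rs type and the apply helper are illustrative assumptions (and apply assumes a positive step):

```go
package main

import "fmt"

// Slice is the upgraded interface from above.
type Slice interface {
	Start() int
	End() int
	Step() int
}

// rs is a hypothetical range-slice satisfying the upgraded interface.
type rs struct{ start, end, step int }

func (s rs) Start() int { return s.start }
func (s rs) End() int   { return s.end }
func (s rs) Step() int  { return s.step }

// apply materializes a Slice against a plain []float64 for illustration;
// the real implementation would adjust the AP's strides instead of copying.
func apply(s Slice, data []float64) []float64 {
	var out []float64
	for i := s.Start(); i < s.End(); i += s.Step() {
		out = append(out, data[i])
	}
	return out
}

func main() {
	data := []float64{0, 1, 2, 3, 4, 5, 6, 7}
	fmt.Println(apply(rs{1, 7, 2}, data)) // [1 3 5]
}
```

Existing two-method slices stay compatible by returning Step() == 1.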
TensorDot() is currently broken and in the process of being rewritten. This needs to be fixed ASAP.
After the initial work on the errors, there is some more work that needs to be done to clean things up:
- (gorgonia.go) which uses errors.Print to print the stack trace of caught errors to the user
- a Handle(err) function which allows the users of this library to cleanly have access to our errors, and which will essentially do an errors.Print(err) if err is not nil
- our tests use t.Error(err) when an error is encountered, which does not give us the stack trace; adding a Handle(err) would be a very cheap and effective way to add a stack trace to the errors in our tests, which would help in debugging

func (t *Tensor) RollAxis(axis, start int)
Similar to Numpy's rollaxis, which is essentially this:
axes = list(range(0, n))
axes.remove(axis)
axes.insert(start, axis)
return a.transpose(axes)
Write an example RNN.
This is mainly due to the fact that Tensor.Apply(sigmoidFn) is slow. There should be a way to optimize this for entire arrays.
Now that the float32 bugs appear to have been resolved, time to start porting CuBLAS
Currently Tensors are gob-bable, and certain tensors have WriteNpy methods to write to numpy files. Writing to CSV seems like a good idea too.
T := tf64.NewTensor(tf64.WithShape(8, 10), tf64.WithBacking(tf64.RangeFloat64(0, 80)))
T2, _ := T.Slice(ss(0))
T2.T()
fmt.Printf("%v\n", T2.AP)
yields:
Shape: (10, 1), Stride: [1], Dims: 2, Lock: false
Should be:
Shape: (10), Stride: [1], Dims: 1, Lock: false
Further investigation shows that this bug is entirely due to an issue in the T() and Slice() methods, which don't play that well with views.
Refer to #19
This issue will only close once every BLAS subroutine is covered by Blase
There are a great number of things that I am not happy about with regards to the API of Gorgonia. The original package Gorgonia was based on was designed to do a few machine learning things well (notably LSTMs, and deep perceptrons). As it becomes more and more general purpose, there would need to be some API changes. The only way I can discover these API-unfriendliness is through the creation of varied neural-network stuff.
For now this issue will act as a living document of sorts. Bear in mind that these are extremely trivial to fix with gorename
so they will all concentrate here in this issue.
Here are the current ones on my list of bugbears; please feel free to add your own by commenting.
NewMatrix, NewVector functions
Example:
x := NewMatrix(g, Float64, WithName("x"), WithShape(2,3), WithValue(xT))
This is clearly Bad Design with capital letters. There are two things that I'm not happy about with this:
The New... prefix makes one think that one is creating a new Matrix, not a new *Node that represents and holds a Tensor with 2 dimensions. An alternative would be the older IsAVector() name, but the reason for changing from that is that Is...() is typically reserved for functions that return bool.
x := NewNodeOfVector(g, Float64, 5, WithName("x"))
y := NewNodeOfMatrix(g, Float32, 2, 3, WithName("y"), WithValue(yT))
or
x := NodeOfVector(g, Float64, 5, WithName("x"))
y := NodeOfMatrix(g, Float32, 2, 3, WithName("y"), WithValue(yT))
NewNodeFromAny should be called NodeFromValue. It's currently called NewNodeFromAny just to fit into the whole New...() schema.
NewTensor in each of the Tensor packages should really just be New.
FIXED in #71 - the whole tensor package was rewritten from the ground up to be more generic.
See also: Package names
The one thing I like about CNTK is BrainScript, which HN user IshKebab's comment made me look deeper into. I find that it creates two modes of thinking - one mode when defining the computation graph, one mode for writing code surrounding the runtime of the computation graph. This was clearly what was lacking in Theano.
On the other hand, Theano and Tensorflow have both shared semantics with Numpy, which made defining the computation graph a lot more familiar.
An AxisIterator is one where you iterate along an axis or multiple axes (but no greater than len(ap.shape)).
type AxisIterator struct {
*AP
// additional fields for tracking position etc
}
The AxisIterator conforms to a hypothetical iterator interface:
type Iterator interface {
next() (int, error)
}
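A sketch of one implementation of the hypothetical Iterator interface: a flat iterator that walks a shape in row-major order and returns flat indices computed from strides, so it also works for non-contiguous views. All names and the exhaustion-error convention are assumptions:

```go
package main

import (
	"errors"
	"fmt"
)

// errStop signals exhaustion, in the spirit of io.EOF.
var errStop = errors.New("iterator: done")

// flatIter satisfies the hypothetical Iterator interface:
// next() (int, error).
type flatIter struct {
	shape, strides, coord []int
	done                  bool
}

func (it *flatIter) next() (int, error) {
	if it.done {
		return 0, errStop
	}
	// flat index for the current coordinate
	flat := 0
	for i, c := range it.coord {
		flat += c * it.strides[i]
	}
	// odometer-style increment of the coordinate
	for i := len(it.coord) - 1; i >= 0; i-- {
		it.coord[i]++
		if it.coord[i] < it.shape[i] {
			return flat, nil
		}
		it.coord[i] = 0
	}
	it.done = true
	return flat, nil
}

func main() {
	// 2x3, contiguous strides [3 1]
	it := &flatIter{shape: []int{2, 3}, strides: []int{3, 1}, coord: []int{0, 0}}
	for {
		i, err := it.next()
		if err != nil {
			break
		}
		fmt.Print(i, " ") // 0 1 2 3 4 5
	}
	fmt.Println()
}
```

An AxisIterator would keep the same next() shape but hold some coordinates fixed while incrementing only the chosen axes.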
Nice-to-have features:
- iterator implementation in each concrete Tensor type

import T "github.com/chewxy/gorgonia"
x := NewMatrix(g, T.Float64, WithShape(2,3), WithName("x"))
T.Slice(x, T.S(0), T.S(1))
This panics. Specifically, an index out of range when inferring the shape of sliceOp.
Rework all the slicing-related stuff to share one common architecture.
Ideally, requires #6 before it can be done.