idsia / brainstorm
Fast, flexible and fun neural networks.
License: Other
If we remove monitor_kwargs from the Trainer, and start() from the hooks, it will become very simple to create and call a standalone hook without using a trainer. This seems useful, and perhaps even required.
Some alterations to the call signature for hooks might be needed to make things less awkward.
Any objections to this?
I think we should (or need to) rearrange the axes of the weight matrices. The convention should be that the first axis always corresponds to the number of units/neurons in the layer. This will:
This does not require too much work yet. If it sounds good I can do it.
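As a minimal numpy sketch of the proposed convention (shapes chosen for illustration only, not brainstorm's actual buffer layout):

import numpy as np

num_units, num_inputs, batch_size = 4, 3, 5
W = np.random.randn(num_units, num_inputs)   # first axis = number of units in the layer
x = np.random.randn(batch_size, num_inputs)  # a batch of inputs
Ha = x.dot(W.T)                               # pre-activations, shape (batch_size, num_units)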
I wanted to continue the discussion about merging the forward and backward buffers into a single buffer. The main advantage I see in this is that the code will be easier to read: whenever I show people brainstorm code, one of the first questions that pops up is "what do these buffers mean / why are there two of them?" -- admittedly my sample size so far is 2 (I myself was also confused, so let's make it 3 ;) ). Also, a pattern that emerges often within the code is something along the lines of:
Ha = forward_buffers.internals.Ha
dHa = backward_buffers.internals.Ha
which indicates, to me at least, that the true name of backward_buffers.internals.Ha should actually be buffers.internals.dHa anyway.
If we implement this change, there are two ways to go:
1. Explicitly list the 'backward buffers' in get_internal_structure:
def get_internal_structure(self):
    internals = OrderedDict()
    internals['Ha'] = ShapeTemplate('T', 'B', self.size)
    internals['dHa'] = ShapeTemplate('T', 'B', self.size)
    return internals
2. Have a flag for each buffer requested via get_internal_structure, which indicates whether we need a "backward" buffer as well. In that case, just append a "d" to the name of the "forward buffer" we created, e.g.:
def get_internal_structure(self):
    internals = OrderedDict()
    internals['Ha'] = ShapeTemplate('T', 'B', self.size, needs_gradient_buffer=True)
    return internals
The advantage of the 2nd approach is that it would still be fairly easy to lazily allocate backward buffers, and that it's less wordy. The downside is that it might be a bit "magical", which is why I actually prefer the first approach, even though it wastes memory and requires more typing. Typically, if you have enough memory to run a forward pass, you'll have enough for the backward pass as well, IMO.
Anyways... thoughts?
It should be part of the architecture post-processing
Using PyCudaHandler, I get:
Traceback (most recent call last):
File "mnist_lstm.py", line 85, in <module>
trainer.train(network, train_getter, valid_getter=valid_getter)
File "build/bdist.linux-x86_64/egg/brainstorm/training/trainer.py", line 50, in train
File "build/bdist.linux-x86_64/egg/brainstorm/training/trainer.py", line 126, in _emit_monitoring
File "build/bdist.linux-x86_64/egg/brainstorm/training/trainer.py", line 136, in _call_monitor
File "build/bdist.linux-x86_64/egg/brainstorm/training/monitors.py", line 289, in __call__
File "build/bdist.linux-x86_64/egg/brainstorm/training/trainer.py", line 169, in run_network
File "build/bdist.linux-x86_64/egg/brainstorm/structure/network.py", line 62, in provide_external_data
File "build/bdist.linux-x86_64/egg/brainstorm/handlers/pycuda_handler.py", line 49, in copy_to
pycuda._driver.LogicError: cuMemcpyDtoD failed: invalid argument in <brainstorm.training.monitors.MonitorAccuracy object at 0x7fea2cf61050>
@untom, perhaps you have an idea about what might be causing this?
Layers should be tested with various supported activation functions (instead of just the defaults).
Additionally, we need to make sure we are testing all layers equally extensively.
Currently, all buffers (parameters, internals, gradients, ...) are assumed to have the same dtype (typically either float or double). This is a bit restrictive: for example, in a max-pooling operation one would like to store which cell in the current window has the maximum value (as discussed in #29). Something similar would happen in a Maxout layer, or when implementing a Top-K Autoencoder. I can work around this for the max-pooling OP, but in general it would be nice to be able to specify an optional dtype for each ShapeTemplate.
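A minimal sketch of how this could look (the dtype keyword on ShapeTemplate is hypothetical; the rest mirrors the existing get_internal_structure pattern):

def get_internal_structure(self):
    internals = OrderedDict()
    internals['Ha'] = ShapeTemplate('T', 'B', self.size)
    # hypothetical: an integer buffer to store argmax indices for max-pooling
    internals['argmax'] = ShapeTemplate('T', 'B', self.size, dtype='int32')
    return internals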
We need a layer that has multiple inputs and just concatenates them along the last feature dimension.
For the CPU this layer could be omitted, because the NumpyHandler supports slicing the features, but for usage with the PyCudaHandler this is the only way of merging the outputs of two layers.
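As a rough illustration of the intended forward pass (plain numpy, not the handler API), the layer would simply stack its inputs along the last axis:

import numpy as np

# two inputs of shape (time, batch, features_i)
in1 = np.random.randn(7, 5, 3)
in2 = np.random.randn(7, 5, 4)
out = np.concatenate([in1, in2], axis=-1)  # shape (7, 5, 7)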
It will be useful to allow deepcopy-ing a BufferView e.g. during tests and introspection. The BufferView class currently does not allow this.
From what I can see, the mechanism for generating descriptions and serializing networks etc. does not fully work yet. @Qwlouse, you were working on this. Any comments on what else is needed?
I think we should have network.save() and trainer.save() methods for dumping to disk, and load_network() and load_trainer() functions for reading the dumps. It shouldn't be more complicated than this, I think. Thoughts?
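A sketch of the proposed usage (the file names and extension are placeholders; the on-disk format is left open here):

network.save('my_network.h5')
trainer.save('my_trainer.h5')

network = load_network('my_network.h5')
trainer = load_trainer('my_trainer.h5')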
For double buffering to work properly, the new strategy of copying data to the device in provide_external_data instead of in the iterators requires some changes to how InputLayer works. This is a TODO, but it looks like there is another problem:
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/arkade/Dropbox/codes/brainstorm/brainstorm/training/trainer.py", line 157, in run_it
net.provide_external_data(next(it))
File "/home/arkade/Dropbox/codes/brainstorm/brainstorm/structure/network.py", line 290, in provide_external_data
self.handler.set_from_numpy(buf, data[name])
File "/home/arkade/Dropbox/codes/brainstorm/brainstorm/handlers/pycuda_handler.py", line 73, in set_from_numpy
mem.set(arr.astype(self.dtype))
File "/home/arkade/venv/py2/local/lib/python2.7/site-packages/pycuda/gpuarray.py", line 243, in set
_memcpy_discontig(self, ary, async=async, stream=stream)
File "/home/arkade/venv/py2/local/lib/python2.7/site-packages/pycuda/gpuarray.py", line 1190, in _memcpy_discontig
drv.memcpy_htod(dst.gpudata, src)
LogicError: cuMemcpyHtoD failed: invalid device context
Because the internals.Ha view is now a sliced view, numpy.dot complains
--> H.dot(flat_input, WX, flat_Ha)
ValueError: output array is not acceptable (must have the right type, nr dimensions, and be a C-Array)
This is a bit annoying. We might be able to solve it by manually iterating over time.
Or (for some cases, including this one) the problem would vanish if we switched to the hub-wise memory layout again...
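A rough sketch of the manual-iteration workaround, assuming the handler dot call has the (a, b, out) form used above and that each per-timestep slice is a contiguous 2D array:

# one dot per time step instead of one dot over the flattened (time * batch) view,
# so that every output slice passed to numpy.dot is a C-contiguous 2D array
for t in range(inputs.shape[0]):
    H.dot(inputs[t], WX, Ha[t])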
It should wrap around any other handler and provide lots of checking.
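A minimal sketch of the idea (the class name and the concrete check are hypothetical): a wrapper that delegates every handler call to the wrapped handler and runs simple sanity checks first.

class CheckingHandler(object):
    """Hypothetical wrapper: delegates every call to another handler,
    checking array arguments along the way."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        attr = getattr(self._wrapped, name)
        if not callable(attr):
            return attr

        def checked(*args, **kwargs):
            for a in args:
                # example check: array arguments must match the handler's dtype
                if hasattr(a, 'dtype') and a.dtype != self._wrapped.dtype:
                    raise TypeError('%s received an argument of dtype %s' % (name, a.dtype))
            return attr(*args, **kwargs)
        return checked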
We just briefly discussed this and I think it would be good to change the buffer layout for our convolutional layers from 1 to 2:
1. (time, batch_size, channels, height, width)
2. (time, batch_size, height, width, channels)
The reasoning is that one typically wants to flatten over time, batch_size, height, and width but not channels, and with layout 2 you could do that with a single flatten operation.
We should have naming conventions to make it easier to remember and use operations provided by the handlers.
Suggested naming scheme:
t := tensor (array of any dimensionality)
m := matrix (2D array)
v := vector (1D array)
s := scalar
Prefix element-wise operations with elem_
Suffix operations with the types of their arguments, separated by _, e.g.:
elem_add_tt : adds 2 tensors element-wise
dot_mm : dot product (matrix multiplication) of 2 matrices
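For illustration, handler methods following this scheme might look like the following (the signatures are a sketch, not the actual handler API):

import numpy as np

class NamingSketchHandler(object):
    def elem_add_tt(self, a, b, out):
        # element-wise addition of two tensors
        np.add(a, b, out=out)

    def dot_mm(self, a, b, out):
        # matrix-matrix dot product of two 2D arrays
        np.dot(a, b, out=out)

    def mult_st(self, s, t, out):
        # multiply a tensor by a scalar
        np.multiply(t, s, out=out)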
This is a big feature that would allow you to do
inp = InputLayer(20)
outp = ForwardLayer(4)
inp[:10] >> ForwardLayer() >> outp
inp[10:] >> ForwardLayer() >> outp
Currently, all monitors write to stdout. If brainstorm is used from an IPython notebook and some monitor has an update-level interval (i.e. it fires after every update), this will inevitably lead to a completely frozen browser session, as IPython notebooks usually cannot deal with the massive amount of output produced by many monitors.
It would be nice if the user could set where each monitor writes its output. sys.stdout is a sensible default, but for many applications it makes more sense to log to a file instead. Ideally this setting could be changed on a per-monitor basis, where the destination (or whatever one wants to call it) parameter could either be a file-like object or a string denoting a filename.
(Optionally, we could still print to both stdout and the file if verbose==True, and just to the file if verbose==False.)
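A minimal sketch of how a destination parameter could be resolved inside a monitor (all names are hypothetical):

import sys

def resolve_destination(destination=None, verbose=True):
    """Return the list of file-like objects a monitor should write to."""
    if destination is None:
        return [sys.stdout]
    if isinstance(destination, str):
        destination = open(destination, 'a')  # a string is treated as a filename
    targets = [destination]
    if verbose:
        targets.append(sys.stdout)  # optionally keep echoing to stdout
    return targets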
Behavior such as nesting will depend on the type of data iterators. Currently, we can nest all available iterators such that the innermost ones are Online, Minibatches or Undivided. This works since all iterators work with Numpy arrays.
Once we have database iterators, things may change: the data attribute of the iterator may not contain named Numpy arrays as it currently does, since the entire dataset cannot be held in memory. For such settings, we may choose to generalize iterators (so that they change behavior based on the data type), or implement a separate set of iterators which cannot be mixed. For example, we could have NumpyDataIterators and DatabaseIterators.
The best way forward might become clearer once we start working with larger datasets stored in databases/files.
Right now the construction layers are named the same as the actual layer implementations.
This leads to confusion and should be changed.
Proposals:
Option 2 would prevent code duplication; option 1 might be more intuitive to use?
Additionally, @Qwlouse and I talked some time ago about perhaps renaming Monitors (perhaps calling them 'Hooks'?), because Monitors can do more than just monitor things.
Would it make sense to provide basic plotting functionality to monitors that output per-epoch results (e.g. the accuracy monitor)?
Currently the signature is copy_to(dest, src), following the numpy.copyto signature. But the ordering is confusing. I vote for changing it to copy_to(src, dest) to make it more intuitive, and because I don't think it is that important to be consistent with this numpy function (which is rarely used anyway).
If you think that would still be confusing, we could rename the method so it deviates more clearly from the numpy one.
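For clarity, the two orderings side by side:

# current, following numpy.copyto(dst, src):
handler.copy_to(dest, src)

# proposed:
handler.copy_to(src, dest)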
While implementing convolutions/pooling, I've stumbled upon the following problem: Depending on how you implement the operations, you might need more/less storage. Specifically, the GPU and CPU implementations might need different amounts of it / might not need it at all. Currently I have two examples for this:
In both cases, one of the two handlers needs additional storage while the other doesn't. What's even weirder: the argmax can be seen as a buffer, and could be handled by the buffer manager. However, that would lead to wasting memory on the GPU, where we'd allocate the buffer but never use it (which might also be confusing to users who inspect these buffers expecting them to mean something). The descriptors, OTOH, are cudnn-specific structures and probably not meant to be stored in buffers.
I can think of two solutions:
Add a method like handler.allocate_pool/conv_specific_memory(...) that returns some sort of opaque data structure (maybe a list of descriptors and allocations), which is then stored within each layer and always passed to the conv/pooling methods:
# in the layer ctor:
self._pooling_data = self.handler.allocate_pool_specific_memory()

# in the forward pass
def forward_pass(...):
    # each handler implementation is free to ignore the last argument if it doesn't need it
    self.handler.conv2d_forward_batch(inputs, window, outputs, pad, stride, self._pooling_data)
I'm not super happy with either solution, since both are slightly ugly. I like solution 1 slightly more, but it has the additional problem of making the API a bit more complicated. What do you guys think?
The recently added functionality seems to have broken the PyCudaHandler (including the tests). Running the tests results in:
Traceback (most recent call last):
File "/home/arkade/venv/py2/local/lib/python2.7/site-packages/_pytest/config.py", line 543, in importconftest
mod = conftestpath.pyimport()
File "/home/arkade/venv/py2/local/lib/python2.7/site-packages/py/_path/local.py", line 650, in pyimport
__import__(modname)
File "/home/arkade/Dropbox/codes/brainstorm/test/conftest.py", line 7, in <module>
from brainstorm.structure.architecture import (
File "/home/arkade/Dropbox/codes/brainstorm/brainstorm/__init__.py", line 5, in <module>
from brainstorm.structure import *
File "/home/arkade/Dropbox/codes/brainstorm/brainstorm/structure/__init__.py", line 4, in <module>
from brainstorm.structure.network import Network
File "/home/arkade/Dropbox/codes/brainstorm/brainstorm/structure/network.py", line 11, in <module>
from brainstorm.structure.buffers import BufferManager
File "/home/arkade/Dropbox/codes/brainstorm/brainstorm/structure/buffers.py", line 6, in <module>
from brainstorm.handlers import default_handler
File "/home/arkade/Dropbox/codes/brainstorm/brainstorm/handlers/__init__.py", line 10, in <module>
from brainstorm.handlers.pycuda_handler import PyCudaHandler
File "/home/arkade/Dropbox/codes/brainstorm/brainstorm/handlers/pycuda_handler.py", line 40, in <module>
class PyCudaHandler(Handler):
File "/home/arkade/Dropbox/codes/brainstorm/brainstorm/handlers/pycuda_handler.py", line 79, in PyCudaHandler
array_type = pycuda.gpuarray.GPUArray
NameError: name 'pycuda' is not defined
Changing line 79 in pycuda_handler.py to array_type = gpuarray.GPUArray leads to a bigger problem:
Traceback (most recent call last):
File "/home/arkade/venv/py2/local/lib/python2.7/site-packages/_pytest/config.py", line 543, in importconftest
mod = conftestpath.pyimport()
File "/home/arkade/venv/py2/local/lib/python2.7/site-packages/py/_path/local.py", line 650, in pyimport
__import__(modname)
File "/home/arkade/Dropbox/codes/brainstorm/test/conftest.py", line 7, in <module>
from brainstorm.structure.architecture import (
File "/home/arkade/Dropbox/codes/brainstorm/brainstorm/__init__.py", line 5, in <module>
from brainstorm.structure import *
File "/home/arkade/Dropbox/codes/brainstorm/brainstorm/structure/__init__.py", line 4, in <module>
from brainstorm.structure.network import Network
File "/home/arkade/Dropbox/codes/brainstorm/brainstorm/structure/network.py", line 11, in <module>
from brainstorm.structure.buffers import BufferManager
File "/home/arkade/Dropbox/codes/brainstorm/brainstorm/structure/buffers.py", line 6, in <module>
from brainstorm.handlers import default_handler
File "/home/arkade/Dropbox/codes/brainstorm/brainstorm/handlers/__init__.py", line 10, in <module>
from brainstorm.handlers.pycuda_handler import PyCudaHandler
File "/home/arkade/Dropbox/codes/brainstorm/brainstorm/handlers/pycuda_handler.py", line 627, in <module>
_mod = SourceModule(__softmax_kernel_code)
File "/home/arkade/venv/py2/local/lib/python2.7/site-packages/pycuda/compiler.py", line 262, in __init__
self.module = module_from_buffer(cubin)
LogicError: cuModuleLoadDataEx failed: invalid device context -
Sooner or later, we should think about introducing CUDA streams for our GPU implementation. Side effect: looking at the profiling outputs, across various examples the most expensive call we make is usually the set_from_numpy call in the PyCudaHandler. We should be able to eliminate the cost of that call completely once we use streams, as the memory transfers can all be done asynchronously (and we could finally implement sensible double buffering on GPUs).
I can think of two ways to add Streams:
1. Specify a stream for each call
Add a stream=None optional argument to all the handler functions, so that the caller can specify the stream on which to execute. When the stream is not specified, we run on the default stream. We could pass either real CUDA streams or just stream IDs (integers). Calls would then maybe look like this:
_h.dot_add_mm(dIa[t], x[t], dWi, transa=True, stream=_h.stream[1])
_h.dot_add_mm(dFa[t], x[t], dWf, transa=True, stream=_h.stream[2])
_h.dot_add_mm(dOa[t], x[t], dWo, transa=True, stream=_h.stream[3])
_h.dot_add_mm(dZa[t], x[t], dWz, transa=True, stream=_h.stream[4])
...
_h.synchronize_all_streams()
2. Add a separate function for specifying streams:
_h.set_stream(1)
_h.dot_add_mm(dIa[t], x[t], dWi, transa=True)
_h.set_stream(2)
_h.dot_add_mm(dFa[t], x[t], dWf, transa=True)
_h.set_stream(3)
_h.dot_add_mm(dOa[t], x[t], dWo, transa=True)
_h.set_stream(4)
_h.dot_add_mm(dZa[t], x[t], dWz, transa=True)
...
_h.synchronize_all_streams()
In this short example, option 1 clearly looks better (IMO), but I can see option 2 working out nicely, too.
Another thing to consider is that we might set up some rules about streams. For example, something like "outputs should always be computed on streams 0-4"... or maybe it even makes sense to have different streams for outputs, internals and parameters, so we know which ones we need to synchronize on before starting computations in a new layer (or not, IDK).
I think we should run some rudimentary profiling before the initial release, just to get rid of the worst performance offenders.
On a related note, it'd be nice to have an automatic benchmarking script that times all handler operations and layer passes. That way it would be easy to measure the effect of speeding up handler operations and layer implementations.
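A minimal sketch of such a benchmarking helper (standard library only; the operations and their arguments would be supplied by whoever runs the script):

import timeit

def benchmark(name, fn, repeats=20):
    """Time a single operation and print the best of several runs."""
    best = min(timeit.repeat(fn, number=1, repeat=repeats))
    print('%-20s %8.3f ms' % (name, best * 1e3))

# usage sketch (names assumed):
# benchmark('dot_mm', lambda: handler.dot_mm(a, b, out))
# benchmark('elem_add_tt', lambda: handler.elem_add_tt(a, b, out))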
Adding convolution/pooling layers will significantly extend the capabilities of the library. The base has been designed with this in mind, so having these layers is a pre-alpha goal.
IMHO a strong point of brainstorm is that you can very easily implement your own layer, and distribute it to others without having to fork the library. For that process it'd be very helpful to have a convenient way of running your custom layer against all the layer-tests.
I think that should not be hard to implement and would provide a lot of value.
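As an illustration of the kind of generic check such a test run could contain, here is a standalone finite-difference gradient check, independent of brainstorm's actual test helpers:

import numpy as np

def check_gradient(forward, backward, x, eps=1e-6, tol=1e-4):
    """Compare an analytic gradient against central finite differences.

    forward(x) returns a scalar loss; backward(x) returns d loss / d x.
    """
    analytic = backward(x)
    numeric = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        orig = x[idx]
        x[idx] = orig + eps
        loss_plus = forward(x)
        x[idx] = orig - eps
        loss_minus = forward(x)
        x[idx] = orig
        numeric[idx] = (loss_plus - loss_minus) / (2 * eps)
        it.iternext()
    return np.max(np.abs(analytic - numeric)) < tol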
This happens for things like kernel sizes, strides etc. If possible, this behavior should be avoided.
For the trainer it is important to have a common buffer for all the parameters and one for all the gradients. Right now I have solved this problem by introducing a separate view into the main constant-sized buffer directly in the BufferManager:
net.buffer.parameters
net.buffer.gradients
But that is problematic, because not every constant-sized view is also a parameter. Only views inside a parameters category should go into that view. We could easily solve this without changing the layout code much by providing a fake layer called parameters. In the BufferManager it would then look like this:
net.buffer.forward.parameters # all parameters
net.buffer.backward.parameters # all gradients
The only objection against that could be that it is confusing to have this fake layer.
Injectors perform operations that should only be done through handlers.
Should make sure that each row in the recurrent weight matrix is used as input weights to a single neuron.
How should we format the docstrings? I do like the numpy style, but PyCharm doesn't seem to pick up type hints from them.
NumPy style (no PyCharm support):
def foo(a, b):
    """
    One line summary.

    More detailed description

    Parameters
    ----------
    a : int
        Some number
    b : str
        Some text

    Returns
    -------
    str
        The repeated text
    """
    return "".join([b] * a)
Google style:
def foo(a, b):
    """
    One line summary.

    More detailed description

    Args:
        a (int): Some number
        b (str): Some text

    Returns:
        str: The repeated text
    """
    return "".join([b] * a)
Pycharm works fine with reStructuredText:
def foo(a, b):
    """
    One line summary.

    More detailed description

    :param a: Some number
    :type a: int
    :param b: Some text
    :type b: str
    :returns: the repeated text
    :rtype: str
    """
    return "".join([b] * a)
Epytext also works and is very similar, except for the usage of @:
def foo(a, b):
    """
    One line summary.

    More detailed description

    @param a: Some number
    @type a: int
    @param b: Some text
    @type b: str
    @returns: the repeated text
    @rtype: str
    """
    return "".join([b] * a)
Sphinx should be able to support all of them.
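For the NumPy and Google styles this presumably means enabling the napoleon extension; a minimal conf.py snippet (assuming a recent Sphinx) might look like:

# conf.py (sketch)
extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.napoleon',   # parses NumPy- and Google-style docstrings
]
napoleon_numpy_docstring = True
napoleon_google_docstring = True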
We need a mechanism to transfer data to the device while a forward pass is running.
Same holds for the targets.
Similar to the param buffer.