
LSTM Support · nninit · 9 comments · CLOSED

Kaixhin commented on July 28, 2024
LSTM Support

Comments (9)

bshillingford commented on July 28, 2024

Might be better to try to support https://github.com/Element-Research/rnn/blob/master/FastLSTM.lua instead, since it has batched GEMMs. This LSTM implementation has batched GEMMs as well, but uses nngraph: https://github.com/karpathy/char-rnn/blob/master/model/LSTM.lua.

You'd need a way to apply :init() to a submatrix. In the rnn.FastLSTM case it would mean, for i from 0 to 3, applying the initializations to the square submatrix (assuming LSTM input size = state/output size) given by i2g.weight:narrow(1, 1 + i*outputSize, outputSize); a sketch follows below. Note that the output dimension for weight matrices in nn.Linear is the 1st dim.
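
A minimal sketch of that loop (not using nninit, since applying nninit to a submatrix is exactly what is missing; it assumes rnn's FastLSTM keeps its batched input-to-gate weights in i2g.weight with shape (4*outputSize) x inputSize and that inputSize == outputSize, and uses a QR-based orthogonal fill purely as an example initialisation):

require 'nn'
require 'rnn'

local outputSize = 128
local lstm = nn.FastLSTM(outputSize, outputSize)
for i = 0, 3 do
  -- square block of the batched weight matrix corresponding to gate i
  local gateWeights = lstm.i2g.weight:narrow(1, 1 + i * outputSize, outputSize)
  -- fill it in place with a random orthogonal matrix (Q from a QR decomposition)
  local q = torch.qr(torch.randn(outputSize, outputSize))
  gateWeights:copy(q)
end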

nicholas-leonard commented on July 28, 2024

@Kaixhin So I guess we could just pluck them out from the respective sub-components of the LSTM or FastLSTM modules. For nn.LSTM, we could use the internal references to self.inputGate (https://github.com/Element-Research/rnn/blob/master/LSTM.lua#L53) and so on.

@bshillingford is right about how to do this for FastLSTM.

If you make this extension of the rnn package, I will make nninit a default dependency of the rnn package.

Kaixhin commented on July 28, 2024

@nicholas-leonard I realised that the internal references were sufficient for nn.LSTM, but suspected that something more powerful was needed in general. As @bshillingford pointed out, that something is indexing. I've introduced this for the basic functions in a new branch with 24cf519, along with an nn.FastLSTM example. Since this is general, I don't see any need to introduce nninit as a dependency for rnn, but if you want to introduce it as a full or optional dependency that's fine too.

The main issue with nninit as a dependency is that the API is still in flux, so it would be good to settle on something (even if breaking changes are introduced along with incrementing the luarocks version number). Points to consider and give feedback on:

  • Using wInit and bInit, with initialisation schemes passed as strings, means that the nn.Module namespace is not polluted.
  • Do addConstant and mulConstant make sense, or should they be integrated as modifiers to the constant function?
  • Should indexing be added to every initialiser? Ideally yes, but what should the order of arguments be? Presumably indexing should be the last argument?
  • In relation to the last point, the API for calculating gain needs to be changed to accept one argument - the current idea to support lrelu would be to use a table that contains the string 'lrelu' and the leakiness parameter (see the sketch below).
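
A possible shape for that single-argument gain calculation (function name and table layout here are illustrative assumptions, not nninit's actual code):

-- Hypothetical gain calculation taking a single argument: either a string naming the
-- nonlinearity or a table carrying extra parameters (e.g. leakiness for leaky ReLU).
local function calcGain(gain)
  if type(gain) == 'table' and gain[1] == 'lrelu' then
    return math.sqrt(2 / (1 + gain.leakiness ^ 2))
  elseif gain == 'relu' then
    return math.sqrt(2)
  else
    return 1
  end
end

-- e.g. calcGain('relu') or calcGain({'lrelu', leakiness = 0.3})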

bshillingford commented on July 28, 2024

@Kaixhin Maybe there's a way to more strongly decouple the modified tensors from the nn Modules.

Re: passing initialisation schemes as strings, what do you think about putting the schemes into a namespace and then accepting them directly as arguments?
E.g. it might look like:

local nninit = require 'nninit'
local m = nn.Linear(5,5):wInit(nninit.orthogonal)

where that's either the function itself or a wrapper object. Subjectively cleaner.

You've done some decoupling already but as a private API. Publicising the private API (maybe in a sub-namespace?) for initialisation primitives might look something like:

local nninit = require 'nninit'
local lstm_input = nn.Linear(input_size, rnn_size*4) -- input weights
for _, wt in ipairs(lstm_input.weight:split(rnn_size, 1)) do
  nninit.inits.xavier(wt)
end
local lstm_recur = nn.Linear(rnn_size, rnn_size*4) -- recurrent weights
for _, wt in ipairs(lstm_recur.weight:split(rnn_size, 1)) do
  nninit.inits.orthogonal(wt)
end

Less concise, but more flexible... there might be a way to find a good middle ground here. Perhaps with a function as the arg you can generalise more:

local lstm_recur = nn.Linear(rnn_size, rnn_size*4)
    :init(function(m) return m.weight:narrow(1, 1, rnn_size) end, ...)
    :init(function(m) return m.weight:narrow(1, rnn_size + 1, rnn_size) end, ...)
    :init(function(m) return m.weight:narrow(1, 2*rnn_size + 1, rnn_size) end, ...)...

where the first arg to init is something to initialise (let's call it target). If target is a string like bias or weight, you'd operate on self[target], and if target is a function, the function takes self and returns the tensor to modify. I guess that could be shortened by letting target return multiple things:

local lstm_recur = nn.Linear(rnn_size, rnn_size*4)
    :init(function(m) return unpack(m.weight:split(rnn_size, 1)) end, ...)

Having a single init might clean up the code a bit (in the sense of duplication and coupling) while improving flexibility. This might be making things too general for a small number of use cases, since almost all nn modules only have weight and bias. Also not sure how you'd address addConstant and mulConstant in this case.

Thoughts on these random ideas?

Kaixhin commented on July 28, 2024

I'd like to have init instead of wInit and bInit, where the first argument could be w or b. The second argument would be the namespaced function, e.g. nninit.orthogonal(). All optional arguments within this should be passed through a table, so that (in the case of xavier) dist does not have to be specified in order to specify gain.

Functions for the target are powerful but lengthy - is the indexing operator (https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor--dim1dim2--or--dim1sdim1e-dim2sdim2e-) not general enough? In this case I would have the indices (either in table or binary mask form) as the third and final argument to init. Concretely:

lstm.i2g
    :init('w', nninit.kaiming({dist = 'uniform', gain = {gain = 'lrelu', leakiness = 0.3}}))
    :init('b', nninit.constant(1), {{3*hiddenSize-1, 3*hiddenSize}})

eye is an example of where exposing initialisation primitives would fail, as it depends on the module type. Another potential example would be initialising nn.SpatialConvolution with filters used in image processing, such as a Gaussian filter (see the sketch below) - these initialisations would not make sense for linear layers (and many already do not make sense for biases). A further argument is that the unit of abstraction in Torch is the module/layer, where a conscious decision has been made to tie weights and biases to modules rather than have them act as standalone tensors.
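
For concreteness, a sketch of that Gaussian-filter case (illustrative only; nninit does not provide this, and the layer sizes are arbitrary):

require 'nn'
local image = require 'image'

-- Seed every filter of a convolution with a Gaussian kernel: an initialisation that
-- only makes sense for this module type, not for nn.Linear weights or for biases.
local conv = nn.SpatialConvolution(3, 16, 5, 5)  -- 3 input planes, 16 output planes, 5x5 kernels
local kernel = image.gaussian(5)                 -- 5x5 Gaussian kernel
for i = 1, conv.weight:size(1) do                -- output planes
  for j = 1, conv.weight:size(2) do              -- input planes
    conv.weight[i][j]:copy(kernel)
  end
end
conv.bias:zero()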

bshillingford commented on July 28, 2024

Good points. Indexing should be general enough in most cases, but I'm thinking of other obscure cases: suppose I write a layer for learned biases of some arbitrary shape, and its parameters don't have the name 'weight' or 'bias', or lie inside a table inside the module. It would be useful to expose primitives if you want to do things like initialize conv filters to multiple scales of Gaussian filters too, but again this is an uncommon case and I suppose indexing solves this anyway (I think?).

I agree re Torch's philosophy, but I see your library as a way to address the shortcomings of that philosophy. That all said, I agree with your comments - thanks for sharing the library.

Kaixhin commented on July 28, 2024

I would like to be convinced that this could let weights be treated more generally and hence overcome a shortcoming of Torch, but this seems tricky. For example, nninit.inits.xavier(wt) depends on the fan-in and fan-out of the module - these could be worked out from the weight dimensions, but eye would still fail, since it needs to behave differently for nn.Linear vs. nn.TemporalConvolution.

I do see the case for more arbitrary parameters and am happy to go ahead with target being a string or function. I think that in the more general case if multiple tensors need to be returned then they should be placed in a table (rather than being unpacked), but in what situation would you provide a sequence of tensors when they would all be initialised by the same method?

One trivial way to do more complicated initialisations is to introduce a tensor function that just calls copy on a supplied tensor matching the target size. Rather than operating directly on the weights (for example), using this within :init() would allow the convenience of chaining; a sketch follows below.
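
A sketch of that tensor-copying initialiser (the names and the exact :init() usage are hypothetical, not the released API):

-- Hypothetical 'copy' initialiser: copies a supplied tensor into the target tensor,
-- so arbitrary custom initialisations can still be chained through :init().
local function copyInit(source)
  return function(target) -- target: the tensor selected by the accessor
    assert(target:isSameSizeAs(source), 'supplied tensor must match the target size')
    return target:copy(source)
  end
end

-- e.g. a fixed identity initialisation for a square linear layer:
-- nn.Linear(5, 5):init('weight', copyInit(torch.eye(5)))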

Kaixhin commented on July 28, 2024

OK, the new API is almost ready on the indexing branch. The accessor argument is either a string, a string and a set of indices, or a function (sketched below). The "primitives" are somewhat exposed, and could potentially be used with nil as the first argument (a bit messy, but non-trivial initialisations in this case seem tied to the layer type/properties). A copy method has been added for custom initialisations.
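
My reading of those three accessor forms, for concreteness (the exact shapes, in particular how the indices are bundled with the string and how initialiser options are passed, are assumptions rather than taken from the branch):

require 'nn'
local nninit = require 'nninit'

-- Illustrative only: the three accessor forms, with assumed shapes.
local m = nn.Linear(4, 12)
  :init('weight', nninit.kaiming({dist = 'uniform'}))                            -- 1) a string
  :init({'bias', {{1, 4}}}, nninit.constant(1))                                  -- 2) a string plus a set of indices
  :init(function(mod) return mod.weight:narrow(1, 1, 4) end, nninit.orthogonal()) -- 3) a function returning the tensor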

eye and orthogonal need fixing as they are the most tied to the module. The example will also need revising once those two methods are fixed.

Kaixhin commented on July 28, 2024

The example is now updated, with eye and orthogonal simply processing module weights only. Will merge into master tomorrow unless there is any further API scoping to be done.
