
nninit's Introduction

nninit

Parameter initialisation schemes for Torch7 neural network modules. Works with nn, and therefore nngraph. Allows arbitrary indexing of weights/biases/parameters. Supported modules:

  • nn.Linear / nn.LinearNoBias
  • nn.LookupTable
  • nn.TemporalConvolution
  • nn.SpatialConvolution / cudnn.SpatialConvolution
  • nn.VolumetricConvolution / cudnn.VolumetricConvolution

Readme contents:

  • Installation
  • Usage
  • Example
  • Development
  • Acknowledgements

Installation

luarocks install nninit

Usage

nninit adds an init method to nn.Module, with the following API:

module:init(accessor, initialiser, ...)

The accessor argument is used to extract the tensor to be initialised from the module. The initialiser argument is a function that takes the module, tensor, and further options; it adjusts the tensor and returns the module, allowing init calls to be chained. nninit comes with several initialiser functions. ... represents additional arguments for the initialiser function.
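Because each call returns the module, initialisations can be chained. A minimal sketch (assuming nninit has been required; the initialisers used here are described below):

local nn = require 'nn'
local nninit = require 'nninit'

-- Each init call returns the module it initialised, so calls chain
local module = nn.Linear(10, 10):init('weight', nninit.normal, 0, 0.01)
                                :init('bias', nninit.constant, 0)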

Accessors

The accessor extracts the tensor to be initialised from the module. It can be a string, a table, or a function.

string

The tensor is accessed as a property of the module. For example:

module:init('weight', nninit.constant, 1)

table

The tensor is first accessed as a property of the module using the first element of the table, and a subtensor is then extracted by applying Torch's indexing operator to the second element. For example:

module:init({'weight', {{1, 5}, {}}}, nninit.uniform, -1, 1)

function

The function is applied to the module and must return the tensor to be initialised. For example:

module:init(function(m) return m.weight:narrow(1, 1, 10) end, nninit.normal, 0, 0.01)

Initialisers

nninit.copy(module, tensor, init)

Copies the init tensor to the tensor to be initialised.
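For example, a sketch that copies a hypothetical pre-computed tensor into the weights (its size must match the target tensor):

-- pretrained is assumed to match nn.Linear(20, 10).weight, i.e. a 10x20 tensor
local pretrained = torch.randn(10, 20)
local module = nn.Linear(20, 10):init('weight', nninit.copy, pretrained)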

nninit.constant(module, tensor, val)

Fills tensor with the constant val.

nninit.addConstant(module, tensor, val)

Adds the constant val to the current tensor.

nninit.mulConstant(module, tensor, val)

Multiplies the current tensor by the constant val.
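For example, a sketch that rescales and then shifts the current weights by chaining the two calls:

-- Halve the existing weights, then add 0.1 to every element
module:init('weight', nninit.mulConstant, 0.5)
      :init('weight', nninit.addConstant, 0.1)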

nninit.normal(module, tensor, mean, stdv)

Fills tensor ~ N(mean, stdv).

nninit.addNormal(module, tensor, mean, stdv)

Adds values drawn from N(mean, stdv) to the current tensor.

nninit.uniform(module, tensor, a, b)

Fills tensor ~ U(a, b).

nninit.addUniform(module, tensor, a, b)

Adds values drawn from U(a, b) to the current tensor.

nninit.eye(module, tensor)

Only supports the module weights as the tensor, and relies on the module type to determine the appropriate identity.
Fills weights with the identity matrix (for linear layers/lookup tables).
Fills filters with the Dirac delta function (for convolutional layers), normalising by the number of input planes.
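For example, a sketch using nninit.eye on modules from the supported list above:

-- Identity matrix for a square linear layer
local fc = nn.Linear(16, 16):init('weight', nninit.eye)
-- Centred Dirac delta filters for a convolutional layer
local conv = nn.SpatialConvolution(3, 8, 3, 3):init('weight', nninit.eye)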

nninit.xavier(module, tensor, [{[dist], [gain]}])

Fills tensor with values drawn from a distribution with stdv = gain * sqrt(2 / (fanIn + fanOut)). Uses the uniform distribution by default.
Optional named parameters dist and gain can be passed in via a table.
Also known as Glorot initialisation.

Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In International Conference on Artificial Intelligence and Statistics.
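For example, a sketch using nninit.xavier with the normal distribution and a tanh gain:

module:init('weight', nninit.xavier, {dist = 'normal', gain = 'tanh'})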

nninit.kaiming(module, tensor, [{[dist], [gain]}])

Fills tensor with values drawn from a distribution with stdv = gain * sqrt(1 / fanIn). Uses the normal distribution by default.
Optional named parameters dist and gain can be passed in via a table. The initialisation scheme described in the paper includes the gain for ReLU units; in nninit.kaiming this has to be specified manually with the option {gain = 'relu'}.
Also known as He initialisation.

He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. arXiv preprint arXiv:1502.01852.
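For example, a sketch using nninit.kaiming for weights that feed into ReLU units:

module:init('weight', nninit.kaiming, {gain = 'relu'})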

nninit.orthogonal(module, tensor, [{[gain]}])

Only supports tensors with at least 2 dimensions.
Fills tensor with a (normally distributed) random orthogonal matrix.
Optional named parameter gain can be passed in via a table.

Saxe, A. M., McClelland, J. L., & Ganguli, S. (2013). Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv preprint arXiv:1312.6120.
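For example, a sketch using nninit.orthogonal with a numeric gain:

module:init('weight', nninit.orthogonal, {gain = math.sqrt(2)})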

nninit.sparse(module, tensor, sparsity)

Sets a (1 - sparsity) proportion of the tensor to 0, where sparsity is between 0 and 1. For example, a sparsity of 0.2 zeroes out 80% of the tensor's elements.

Martens, J. (2010). Deep learning via Hessian-free optimization. In Proceedings of the 27th International Conference on Machine Learning (ICML-10).

nninit.convolutionAware(module, tensor, [{[gain], [std]}])

Only supports 2D convolutions with a symmetric filter size.
Fills convolution tensor with matrices that are orthogonal in frequency space. The initialisation scheme described in the paper includes the gain for ReLU units, which has to be manually specified with the option {gain = 'relu'}. The optional named parameter std can also be passed in via a table; it specifies the standard deviation of the noise used to break symmetry in the inverse Fourier transform.

Aghajanyan, A. (2017). Convolution Aware Initialization. arXiv preprint arXiv:1702.06295.
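For example, a sketch using nninit.convolutionAware on a symmetric 3x3 convolution (the std value is purely illustrative):

local conv = nn.SpatialConvolution(3, 8, 3, 3):init('weight', nninit.convolutionAware, {gain = 'relu', std = 0.01})
                                               :init('bias', nninit.constant, 0)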

Dists

The two supported distributions are 'normal' and 'uniform'.

Gains

Gains can be calculated depending on the succeeding nonlinearity. If gain is a number it is used directly; if gain is a string the following mapping is used. By default gains (where applicable) are set to 1.

Gain        Parameters   Mapping
'linear'                 1
'sigmoid'                1
'tanh'                   5 / 3
'relu'                   sqrt(2)
'lrelu'     leakiness    sqrt(2 / (1 + leakiness^2))

If the gain must be calculated from additional parameters, gain must be passed as a table with the string as the first element and the named parameters as additional fields. For example:

module:init('weight', nninit.kaiming, {gain = {'lrelu', leakiness = 0.3}})

Example

local nn = require 'nn'
require 'cunn'
local cudnn = require 'cudnn'
require 'rnn'
local nninit = require 'nninit'

local getBias = function(module)
  return module.bias
end

local batchSize = 5
local imgSize = 16
local nChannels = 3
local nFilters = 8
local rho = 6
local hiddenSize = 2

local cnn = nn.Sequential()
cnn:add(cudnn.SpatialConvolution(nChannels, nFilters, 2, 2):init('weight', nninit.eye)
                                                           :init('weight', nninit.mulConstant, 1/2)
                                                           :init('weight', nninit.addNormal, 0, 0.01)
                                                           :init(getBias, nninit.constant, 0))
cnn:add(nn.View(nFilters*15*15))
cnn:add(nn.Linear(nFilters*15*15, nFilters):init('weight', nninit.kaiming, {
  dist = 'uniform',
  gain = {'lrelu', leakiness = 0.3}
}))
cnn:add(nn.RReLU(1/3, 1/3))
cnn:add(nn.Linear(nFilters, 6):init('weight', nninit.orthogonal, {gain = 'relu'}))
cnn:add(cudnn.ReLU())
cnn:add(nn.Linear(6, 4):init('weight', nninit.xavier, {dist = 'normal', gain = 1.1}))
cnn:add(nn.Linear(4, hiddenSize):init('weight', nninit.sparse, 0.2)
                                :init(getBias, nninit.constant, 0))

local model = nn.Sequential()
model:add(nn.Sequencer(cnn))
local lstm = nn.FastLSTM(hiddenSize, hiddenSize, rho)
-- Note that chaining will pass through the module initialised, never parents
lstm.i2g:init({'bias', {{2*hiddenSize+1, 3*hiddenSize}}}, nninit.constant, 1) -- High forget gate bias
model:add(nn.Sequencer(lstm))
model:cuda()

local inputs = {}
for i = 1, rho do
  table.insert(inputs, torch.ones(batchSize, nChannels, imgSize, imgSize):cuda())
end
print(model:forward(inputs))

Development

To develop nninit, or to use it to test new initialisation schemes, clone/download this repo and run luarocks make rocks/nninit-scm-1.rockspec to install nninit locally.

Acknowledgements


nninit's Issues

nngraph

How would you use it for nngraph layers?
Edit 2: It seems to be a problem with cudnn layers?

Edit: This seems like an awesome addition to Torch :)

Why isn't "convolution-aware initialization" redundant?

Plancherel's theorem implies that orthogonality in the spatial domain is equivalent to orthogonality in the frequency domain. From my understanding, CAI doesn't do anything special in the frequency domain aside from simply initializing the filters of each kernel such that they form an orthonormal set. If my understanding is correct, then vanilla orthogonal initialization should accomplish the same thing, making CAI redundant.

See this Gist for a simple demo illustrating my point.

Specification for `eye`

The eye function is wrong for the convolutional layers. In the 2D case every filter can abide by the specification for torch.eye, and the same can be extended for 3D along the diagonal. In 1D perhaps the closest is a vector of 1s? This solution would be the most consistent with torch.eye, which is good.

Alternatively, considering that these are convolutions, the identity would be the delta function (i.e. a 1 as close to the middle of a tensor as possible). Asking @bshillingford to clarify what he thinks makes more sense.

LSTM Support

Although this library works fine with nngraph, it would be good to also support rnn - specifically the LSTM module. Given the new API introduced with #2, how can the elements of the cell be initialised individually? Any feedback @nicholas-leonard?

A notable reason to support this would be to implement the large forget gate bias introduced in:

Gers, F. A., Schmidhuber, J., & Cummins, F. (2000). Learning to forget: Continual prediction with LSTM. Neural computation, 12(10), 2451-2471.

The idea of nninit is to allow experimentation with initialisations/free maintainers from implementing "best practices".

Better API

Currently nninit is a bit clunky (in an effort to avoid side-effects). I would like to modify nn.Module to have something like an init method, with an API along the lines of:

nn.Linear(4096, 1000):init('weight', 'xavier', 'normal'):init('weight', 'sparse', 0.3):init('bias', 'constant', 0)

I think returning the module and therefore being able to chain calls makes it a lot more elegant. The current way of entering parameters can also be discussed. Any thoughts @soumith / @skaae?

calcFan in nninit.orthogonal

Is there a reason why nninit.orthogonal does not use calcFan and instead calculates the fanIn / fanOut without taking the underlying module into consideration? Thanks!

inconsistencies with nninit.orthogonal

I am experiencing inconsistencies with the orthogonal initialization. In the example below, both modules have the same number of weights but the latter is significantly faster to initialize.

th> nn.SpatialConvolution(100, 100, 3, 3):init('weight', nninit.orthogonal)
nn.SpatialConvolution(100 -> 100, 3x3)
                                                                                  [7.6399s]
th> nn.SpatialConvolution(100, 100, 3, 3).weight:nElement()
90000   
                                                                                  [0.0006s]
----

th> nn.SpatialConvolution(100 * 3 * 3, 100, 1, 1):init('weight', nninit.orthogonal)
nn.SpatialConvolution(900 -> 100, 1x1)
                                                                                  [0.0605s]
th> nn.SpatialConvolution(100 * 3 * 3, 100, 1, 1).weight:nElement()
90000   
                                                                                  [0.0006s]

Is this a desired behavior or a bug? The cause is that nninit.orthogonal uses fanIn and fanOut to determine the size of the matrix that ought to be orthogonalized, which does not seem to be the right way of doing it.

local fanIn = sizes[2]
local fanOut = sizes[1]
for d = 3, #sizes do
    fanIn = fanIn * sizes[d]
    fanOut = fanOut * sizes[d]
end

----

nn.SpatialConvolution(100, 100, 3, 3)
fanIn: 900
fanOut: 900

----

nn.SpatialConvolution(100 * 3 * 3, 100, 1, 1)
fanIn: 900
fanOut: 100

Thank you for this very handy library.
