soumith / cuda-convnet2.torch
Torch7 bindings for cuda-convnet2 kernels!
License: Apache License 2.0
Hi,
I'm trying to run train.lua in the test folder, but with ccn2.SpatialConvolutionLocal (originally ccn2.SpatialConvolution). It keeps giving me this error:
/usr/local/bin/luajit: /usr/local/share/lua/5.1/ccn2/SpatialConvolutionLocal.lua:23: attempt to perform arithmetic on field 'kH' (a nil value)
Is there anything else I should know to use locally connected layers?
Thank you in advance :)
Hyungwon
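For what it's worth, a nil 'kH' in the constructor usually means the layer never received a kernel-size argument. A sketch of a construction that spells out every size argument — the argument order (nInputPlane, nOutputPlane, inputSize, kernelSize) is an assumption inferred from other snippets in this tracker, and running it requires a CUDA device:

```lua
require 'ccn2'

-- Assumed argument order: (nInputPlane, nOutputPlane, inputSize, kernelSize).
-- Unlike ccn2.SpatialConvolution, the local layer also needs the input spatial
-- size, because its per-location weights depend on the output size.
local conv = ccn2.SpatialConvolutionLocal(64, 128, 24, 9):cuda()

-- ccn2 layout: nInputPlane x height x width x batch
local input = torch.randn(64, 24, 24, 32):cuda()
local output = conv:forward(input)
```

If you ported a call like ccn2.SpatialConvolution(64, 128, 9) verbatim, the third argument would be read as the input size and the kernel size would be missing, which matches the nil-'kH' error.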
We discovered today that with 'blocked = true', the normalisation output differs from the ImageNet Caffe network's. Should it be set to false by default and passed as an option?
"Considerable speedup (1.5x under the VGG model with a miniBatch of 32, 1.1x under AlexNet with a miniBatch of 128); the optimizations focus on fully employing GPU-related functions." - @bestimage-tencent
(edit: I just missed something while checking the code, sorry.)
tmp/luarocks_ccn2-scm-1-5152/cuda-convnet2.torch/cudaconv3/src/filter_acts.cu(2086) : getLastCudaError() CUDA error : filterActs: kernel execution failed : (8) invalid device function.
I read that a few others got the same problem because of a different GPU version. My card is a "GeForce 820M".
conv_util.cu has multiple references to THCudaTensor_isSameSizeAs. When I Google this, I currently only find a link to this repository. I think I can work around this temporarily, since all uses are in assertions and therefore should have no side effects.
From the code, the weight matrix of a spatial convolution local layer is a 2D matrix:
self.weight = torch.Tensor(outputSize*nInputPlane*filterSize, nOutputPlane)
Is the 1st dimension's combination order exactly like the multiplications above? Or is there a way to decompose this 2D weight matrix into the form used by nn.SpatialConvolutionLocal,
such that it is a 6D tensor like the following:
self.weight = torch.Tensor(self.oH, self.oW, nOutputPlane, nInputPlane, kH, kW)
thanks.
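If the flat first dimension really is ordered as (oH, oW, nInputPlane, kH, kW) — an assumption about the layout, not something verified against the kernels — unpacking into the 6D nn.SpatialConvolutionLocal form would be a view plus a permute. The helper name is hypothetical:

```lua
-- Hypothetical helper: unpack a ccn2-style 2D local-conv weight of shape
-- (oH*oW*nInputPlane*kH*kW, nOutputPlane) into the 6D layout
-- (oH, oW, nOutputPlane, nInputPlane, kH, kW) used by nn.SpatialConvolutionLocal.
-- The assumed ordering of the flat first dimension is (oH, oW, nInputPlane, kH, kW);
-- verify against the kernels before relying on this.
local function unpackLocalWeight(w2, oH, oW, nInputPlane, nOutputPlane, kH, kW)
   return w2:view(oH, oW, nInputPlane, kH, kW, nOutputPlane)
            :permute(1, 2, 6, 3, 4, 5)  -- move nOutputPlane to dimension 3
            :contiguous()
end
```

If the ordering assumption is wrong, a round-trip check (copy weights into an nn.SpatialConvolutionLocal and compare forward outputs on a fixed input) would expose it quickly.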
Can I use cuda-convnet2 with a GTX 650? The following snippet (extracted from benchmark.lua) fails with the error message
/tmp/luarocks_ccn2-scm-1-7061/cuda-convnet2.torch/cudaconv3/src/filter_acts.cu(2085) : getLastCudaError() CUDA error : filterActs: kernel execution failed : (8) invalid device function .
require 'ccn2'
-- ccn2 layout: nInputPlane x height x width x batch
n = ccn2.SpatialConvolution(64, 128, 9, 1):cuda()
i = torch.randn(64, 64, 64, 128):cuda()
n:forward(i)
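An "invalid device function" error generally means the kernels were not compiled for the GPU's compute capability (the GTX 650 is Kepler, sm_30; the GeForce 820M mentioned above is Fermi, sm_21). A sketch of what to check — the exact file holding the NVCC flags in this repo may differ:

```shell
# Find where the CUDA arch flags are set in the build files
grep -rn "gencode\|arch=" CMakeLists.txt cudaconv3/

# Make sure a -gencode entry matching your card exists, e.g. for sm_30:
#   -gencode arch=compute_30,code=sm_30
# then rebuild and reinstall the rock (assumes a rockspec in the repo root)
luarocks make
```

Note that the cuda-convnet2 kernels were written for Kepler-class GPUs, so even with matching flags, older architectures may remain unsupported.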
The following assert exists in cuda-convnet2:
https://github.com/soumith/cuda-convnet2.torch/blob/master/cudaconv3/src/img_acts.cu#L1208
This causes failure in some cases that cuda-convnet2 should support, for example:
model = nn.Sequential()
model:add(ccn2.SpatialConvolutionLocal(16, 16, 63, 9))
model:add(nn.ReLU())
model:cuda()
input = torch.rand(16, 63, 63, 128):cuda()
-- backward() takes (input, gradOutput); a forward pass yields a gradOutput of the right shape
model:backward(input, model:forward(input))
will fail, because the number of filters is 16, which doesn't pass the assert check. However, the documentation here says that this number should be a multiple of 16, which 16 is.
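Until the assert is relaxed, one possible workaround is to round the filter count up so it passes the stricter check. This sketch assumes the assert actually requires a multiple of 32; the padding wastes compute, and the extra output maps must be ignored downstream:

```lua
require 'ccn2'

-- Hypothetical workaround: pad nOutputPlane up to the next multiple of 32 so
-- the img_acts assert passes; the extra maps are wasted computation.
local function padFilters(n, multiple)
   return math.ceil(n / multiple) * multiple
end

local nOut = padFilters(16, 32)  -- 32
local conv = ccn2.SpatialConvolutionLocal(16, nOut, 63, 9)
```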
I removed all parts of NVMatrix within cudaconv3, so all layers in cuda-convnet2 are now exposed as C functions that take THCudaTensor* arguments.
This weekend I will write lua/ffi wrappers around the now exposed C functions.
Feel free to contribute!
Cheers,
S
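The wrapper pattern described above would look roughly like this. The cdef signature is an illustrative assumption; the real exposed function names and argument lists must be copied from the cudaconv3 headers:

```lua
local ffi = require 'ffi'

-- Illustrative pattern only: the actual signatures live in the cudaconv3
-- headers and must be copied verbatim into the cdef.
ffi.cdef[[
typedef struct THCudaTensor THCudaTensor;
void convFilterActs(THCudaTensor* images, THCudaTensor* filters,
                    THCudaTensor* targets);
]]

local C = ffi.load('cudaconv3')

-- Thin Lua wrapper: unwrap the tensors and call straight into the C function.
local function filterActs(images, filters, targets)
   C.convFilterActs(images:cdata(), filters:cdata(), targets:cdata())
end
```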
Nvidia cards don't allow textures bigger than 512MB. Because this code uses texture memory, this imposes a limit on the sizes of various buffers. For example if your layer has too many filters (such that its output size exceeds 512MB), the code will crash.
TODO: add non-texture-using routines to bypass this.
Already tracked by Alex, this issue here will help me track this repo's progress on it.
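A back-of-the-envelope check for whether a layer's output buffer would hit the limit (float32 elements, the 512MB figure from above; the layer shape in the example is arbitrary):

```lua
-- Estimate an output buffer's size and compare against the 512MB texture limit.
local function outputBytes(nOutputPlane, oH, oW, batch)
   return nOutputPlane * oH * oW * batch * 4  -- 4 bytes per float32
end

local limit = 512 * 1024 * 1024
local bytes = outputBytes(256, 55, 55, 128)  -- 396,492,800 bytes: under the limit
print(bytes, bytes < limit)
```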
https://github.com/soumith/cuda-convnet2.torch/blob/master/SpatialConvolutionLocal.lua#L29
The bias vector is set to length (self.oH x self.oH x nOutputPlane), but it actually only ever holds nOutputPlane distinct values (one bias for each output map).
I just noticed that cuda-convnet2 has this partialSum thing, which is (a) undocumented and (b) much faster for accGradParameters. I'm only noticing it now, fml!
Have to add it.