After using nntools a little bit I quickly noticed th

Data vectors as columns vs rows (lasagne issue, closed)

lasagne commented on May 12, 2024

Data vectors as columns vs rows


Comments (6)

bmcfee commented on May 12, 2024

My $2e-2: stick with the sklearn model because it generalizes gracefully to multi-dimensional examples. It's annoying to be inconsistent with the literature, but it does make for simpler code.


craffel commented on May 12, 2024

Ah I see, because when data points are n-dimensional, we would have to repeat : for each dimension to retrieve the first data point (e.g. A[:, :, 0] for 2-d) in my representation, right? As opposed to A[0] regardless of the number of dimensions.
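A minimal sketch of that indexing difference, assuming NumPy and a made-up batch of four 2x3 examples (the array names are hypothetical, not from nntools):

```python
import numpy as np

# Hypothetical batch: 4 examples, each a 2x3 array.
batch_first = np.arange(4 * 2 * 3).reshape(4, 2, 3)  # examples along axis 0
batch_last = np.moveaxis(batch_first, 0, -1)          # examples along the last axis, shape (2, 3, 4)

# Batch-first: the same expression works for any example dimensionality.
first_a = batch_first[0]

# Batch-last: one ':' is needed per example dimension.
first_b = batch_last[:, :, 0]

print(first_a.shape, first_b.shape)
print(np.array_equal(first_a, first_b))
```

Both expressions pick out the same example; only the batch-first form stays unchanged when the examples gain or lose a dimension.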


bmcfee commented on May 12, 2024

Exactly. In memory, it makes sense to have the data index be the least-frequently varying index, so that data for one example is contiguous in memory.

It also generalizes better to ragged data, so that A could actually be a list (not an ndarray), and each A[i] can be a $whatever.
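Both points can be checked with a short sketch, assuming NumPy's default row-major order. The shapes and the contents of the `ragged` list are made up for illustration:

```python
import numpy as np

# Batch-first: 100 examples of shape (8, 8), default C (row-major) order.
A = np.zeros((100, 8, 8))
one_example = A[0]
print(one_example.flags['C_CONTIGUOUS'])  # the example is one contiguous block

# Batch-last: the same amount of data, but the example index varies fastest.
B = np.zeros((8, 8, 100))
strided_example = B[:, :, 0]
print(strided_example.flags['C_CONTIGUOUS'])  # elements are 100 items apart

# Ragged data: a plain list still supports ragged[i], whatever each item is.
ragged = [np.zeros(3), np.zeros((2, 5)), np.zeros((4, 4, 4))]
print([item.shape for item in ragged])
```

With the example index first, indexing by example works the same way on an ndarray and on a list, so code that only iterates over examples does not need to care which one it gets.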


craffel commented on May 12, 2024

OK, that's convincing enough.


benanne commented on May 12, 2024

Another reason is that it's pretty much the theano default (for the softmax function for example).

Cuda-convnet unfortunately puts the batch size last, but I have some ideas about how to deal with that (also talked to @f0k about it). I'll try to get that done soon.

By the way, you can technically do A[..., 0] to always get example 0 if the batch size is last :) It's not a very commonly used feature though.
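A small sketch of the Ellipsis trick, assuming NumPy arrays with the batch on the last axis. The helper name `example_batch_last` is hypothetical:

```python
import numpy as np

def example_batch_last(A, i):
    """Return example i from an array whose last axis indexes the examples."""
    # '...' expands to as many ':' as needed, so this works for any ndim.
    return A[..., i]

# Made-up shapes: the last axis (size 4) is the batch in every case.
shapes = [(5, 4), (2, 3, 4), (2, 3, 6, 4)]
results = [example_batch_last(np.zeros(s), 0).shape for s in shapes]
print(results)
```

The returned shape is always the input shape with the last axis dropped, regardless of how many dimensions each example has.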


f0k commented on May 12, 2024

+1 for data points in rows!

To elaborate on one of the arguments, as @bmcfee said, memory layout is an important point. Fortran and Matlab use column-major layout, so it makes sense to have data vectors as columns. C and numpy use row-major layout by default, so it makes sense to have data vectors as rows -- you will waste a lot of performance due to frequent cache misses when putting data vectors in columns in numpy, unless you take special care to keep your matrices in column-major layout throughout your code. (Always assuming that you mostly want to iterate over the data points, not over the features.)
On the GPU, things look a little different again... leaning towards Fortran, cuBLAS assumes column-major layout, but Theano prefers to have matrices in row-major layout and just tells cuBLAS to transpose matrices as needed (every BLAS function involving matrices takes a flag for each matrix argument specifying whether it's to be transposed).
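The layout argument can be made concrete with NumPy strides. This is a sketch on the CPU only, with a made-up 2x3 matrix. It does not call cuBLAS or Theano, but it shows why a transpose flag is enough to bridge the two conventions: a row-major matrix and its transpose in column-major layout occupy exactly the same memory.

```python
import numpy as np

# 2 data points (rows) with 3 features each, default C (row-major) order.
X = np.arange(6, dtype=np.float64).reshape(2, 3)
print(X.strides)   # bytes to step along each axis; a row is contiguous

# Same values stored column-major (Fortran order): now a column is contiguous.
XF = np.asfortranarray(X)
print(XF.strides)

# Transposing copies nothing: X.T is a column-major view of the same buffer.
XT = X.T
print(XT.flags['F_CONTIGUOUS'], np.shares_memory(X, XT))
```

Iterating over data points therefore touches adjacent memory when they are rows of a row-major array, and strided memory when they are columns of one.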
