
cs231n.github.io's People

Contributors

agrimgupta92, antheali, anwesham-lab, ashishjain87, brentyi, clarabing, classiccoder16, darkslab, dresimpelo, drjimfan, gamrix, hackmd-deploy, jcjohnson, karpathy, kevinzakka, lfangin, manukastratta, mbosnjak, michguo, moojink, omoindrot, ranjaykrishna, rbedi, sandra-haerin-ha, shyamal-b, sidu28, subhasis256, sumith1896, tgebru, yaw2014


cs231n.github.io's Issues

Sentence truncated in convolutional-networks.md

In the Layer Sizing Patterns section, one sentence reads:

Another sligthly less common setting is to use 3x3 receptive fields with a stride of 2, but this makes.

I suspect you wanted to say something about information loss with a larger receptive field!

(Also, many thanks for the great tutorial with links to recent literature.)

linear-classify incorrect matrix multiplication

Under the Interpreting a linear classifier section, the ship score in the included graphic should be 60.75 instead of 61.95.

Shown below are the simplification steps (it appears that the bias was not applied in the original calculation; see step 5):

1. <0, 0.25, 0.2, -0.3> * <56, 231, 24, 2> + (-1.2)
2. 0*56 + 0.25*231 + 0.2*24 + (-0.3)*2 - 1.2
3. 0 + 57.75 + 4.8 - 0.6 - 1.2
4. 62.55 - 0.6 - 1.2
5. 61.95 - 1.2
6. 60.75
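
A quick numpy check of the arithmetic, using the values from the figure:

import numpy as np

w = np.array([0.0, 0.25, 0.2, -0.3])   # ship row of W, as in the figure
x = np.array([56, 231, 24, 2])         # image pixel values, as in the figure
b = -1.2                               # ship bias

print(np.dot(w, x) + b)                # 60.75, not the 61.95 shown in the graphic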

Cannot find AMI for cs231n_caffe_torch7_keras_lasagne_v2 on AWS

From this tutorial,

After searching for cs231n_caffe_torch7_keras_lasagne_v2 in the Community AMIs search bar, there are no results. The page says "No AMIs found matching your filter criteria".

I tried searching for ami-125b2c72 as well, but AWS still said that no AMIs matched the criteria. I haven't selected any filters on the side bar, and am searching in the Community AMIs tab, not the AWS Marketplace.

Is anyone else having this issue?

Gradient check Taylor expansion

Hi. I think the statement that you make here: http://cs231n.github.io/neural-networks-3/#gradcheck, namely that the first gradient check has an error of O(h) while the second one has an error of O(h^2), is false. It makes no sense that the two expressions have different errors, since f(x+h) can be Taylor expanded around f(x-h) and will give the same error rate as if it had been expanded around f(x).

But formally:

$$ f(x+h) - f(x) = h \nabla f(x) + O(h^2) $$

$$ f(x+h) - f(x-h) = \big(f(x) + h \nabla f(x) + O(h^2)\big) - \big(f(x) - h \nabla f(x) + O(h^2)\big) = 2h \nabla f(x) + O(h^2) $$

This shows that the two expressions have exactly the same error.
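
For anyone who wants to compare the two error terms empirically, here is a quick numerical sketch (using f(x) = x^3 as an arbitrary test function; not part of the notes):

def f(x):
    return x ** 3

def grad_f(x):
    return 3 * x ** 2

x, h = 1.0, 1e-4
forward  = (f(x + h) - f(x)) / h            # one-sided difference
centered = (f(x + h) - f(x - h)) / (2 * h)  # centered difference

print(abs(forward - grad_f(x)))    # error of the one-sided estimate
print(abs(centered - grad_f(x)))   # error of the centered estimate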

typo in Python tutorial

Python Numpy Tutorial has the following typo in Array indexing section:

# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

I'm pretty sure it should be

# Create the following rank **3** array with shape (3, 4)

The image will not load when I read .md file locally with chrome extension

It seems that the address for images is always "file:///C:/assets/..." so the .md file cannot load them when opened locally (for example, from file:///C:/Users/users/Documents/cs231n.github.io/...). I temporarily fixed this by copying the assets folder directly into the root of my C: drive. I am not sure whether this is a bug in the file itself or in the Chrome extension.

Also, there is a minor typo, "noone", in the sentence "In one dimension, the "sum of indicator bumps" function (g(x) = \sum_i c_i \mathbb{1}(a_i < x < b_i)) where (a,b,c) are parameter vectors is also a universal approximator, but noone would suggest that we use this functional form in Machine Learning."

typo in assignment 3 rnn_layers.py

in line 55 I think
- dWx: Gradients of input-to-hidden weights, of shape (N, H)
should have been
- dWx: Gradients of input-to-hidden weights, of shape (D, H)

Possible correction in PCA description

In the Data Preprocessing section, PCA is described in the following way:

# Assume input data matrix X of size [N x D]
X -= np.mean(X, axis = 0) # zero-center the data (important)
cov = np.dot(X.T, X) / X.shape[0] # get the data covariance matrix
U,S,V = np.linalg.svd(cov)

Why do you calculate the SVD of the covariance matrix? As I understand it, there are two equivalent alternatives for performing PCA (you would normally use one or the other, not a mix of both):

  • Calculate the eigenvectors V of the covariance matrix X.T*X
  • Calculate the SVD of the data X: U*S*V.T

If you want to use the eigenbasis of the covariance matrix to project your data X, then X*V (with V the eigenvectors of the covariance matrix) is consistent with the explanation given in the next line, described as 'projecting into the eigenbasis'. However, if you use the SVD of the data, the projection is already given by U*S. It is also possible to compute the projection as X*V, where V holds the right singular vectors from the SVD (this works because the eigenvectors of the covariance matrix are equal to the right singular vectors of X).

I think there was a confusion in describing this part of the course. Or did I miss something?
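
To illustrate the point, here is a minimal sketch (with random placeholder data, not from the notes) showing that the two routes give the same projection up to the sign of each column:

import numpy as np

np.random.seed(0)
X = np.random.randn(100, 5)
X -= np.mean(X, axis=0)                       # zero-center, as in the notes

# Route 1: eigenbasis of the covariance matrix (what the notes do, via SVD of cov)
cov = np.dot(X.T, X) / X.shape[0]
U, S, _ = np.linalg.svd(cov)                  # for a symmetric PSD matrix, U holds the eigenvectors
proj_cov = np.dot(X, U)

# Route 2: SVD of the data matrix itself
U2, S2, Vt = np.linalg.svd(X, full_matrices=False)
proj_svd = np.dot(X, Vt.T)                    # equivalently U2 * S2

print(np.allclose(np.abs(proj_cov), np.abs(proj_svd)))  # True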

Convolutional Layer dot product

There's a tricky point in the paragraph on intuition of convolutional layers (ConvNet Layers > Convolutional Layer > Overview and Intuition). The paragraph reads as follows:

The CONV layer's parameters consist of a set of learnable filters. Every filter is small spatially (along width and height), but extends through the full depth of the input volume. During the forward pass, we slide (more precisely, convolve) each filter across the width and height of the input volume, producing a 2-dimensional activation map of that filter. As we slide the filter, across the input, we are computing the dot product between the entries of the filter and the input.

If I'm not mistaken, the dot product only appears once you stretch the local regions of the image out into columns, as described much later in the "Implementation as Matrix Multiplication" part. This threw me off a bit when studying the chapter. In the simple/intuitive case you do an element-wise multiplication followed by a sum, correct? I think that would make more sense in the first paragraph, and I'd be happy to fix this and create a pull request if the above is accurate!
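
To be concrete, a small sketch (with made-up random volumes) showing that element-wise multiplication followed by a sum gives the same number as flattening both volumes and taking a dot product:

import numpy as np

np.random.seed(0)
patch = np.random.randn(3, 3, 3)   # a local region of the input volume
filt  = np.random.randn(3, 3, 3)   # one learnable filter

elementwise_sum = np.sum(patch * filt)
flattened_dot   = np.dot(patch.ravel(), filt.ravel())

print(np.isclose(elementwise_sum, flattened_dot))  # True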

Bug in linear-classify-demo

As @karpathy said in the video, when you drag and drop a circle point in the demo, it disappears.
That's because the handleMouseDown and handleMouseMove functions use e.clientX and e.clientY.
Try using e.pageX and e.pageY instead.

Typos in Convolutional Networks page

Hi Andrej,

Thanks for putting such a great resource about convnets and Deep Learning online. I believe this would be greatly helpful for many people including me.

I was reading the convnets page and found this minor issue that you might want to fix. The page says:

The Krizhevsky et al. architecture ...  Real-world example: Since (227 - 11)/4 + 1 = 55, we can see that ...

However, they actually use crops of 224x224. So the formula would be (224 - 11)/4 + 1 = 54.25, which is not an integer, but I guess it is still a valid configuration, right?

For the case of Zeiler et al. (Visualizing and Understanding convnets), this formula also gives a non-integer result: (224 - 7)/2 + 1 = 109.5.
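
For reference, a quick sketch of the standard output-size formula (W - F + 2P)/S + 1 with the numbers from above (no padding assumed):

def conv_output_size(W, F, S, P=0):
    # spatial output size for input width W, filter size F, stride S, padding P
    return (W - F + 2 * P) / S + 1

print(conv_output_size(227, 11, 4))  # 55.0   (the number used in the notes)
print(conv_output_size(224, 11, 4))  # 54.25  (not an integer)
print(conv_output_size(224, 7, 2))   # 109.5  (not an integer)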

Best,

Anh

Backpropagation and Symbolic Computation notebooks

@karpathy,

this issue is not really an issue with the course or its associated material in general (in fact, I think they are very well written!). Over the past weeks I've spent my spare time digesting symbolic calculus on computational graphs. I've created three notebooks from my findings that I'm happy to share with the community. Currently these notebooks cover:

  • foundations on computational graphs, expression trees, calculus and backpropagation,
  • symbolic and numeric implementations in Python,
  • application to function optimization

You can find the notebooks here:

https://github.com/cheind/py-cgraph

I hope someone finds them interesting. In case you find them helpful, it would be very welcome if they were linked from an appropriate place.

Thanks,
Christoph

Build instructions?

Could you please include the script or documentation describing how this project is converted into HTML?

Possible Typo in Assignment 3

In the q1 Dropout section, the dropout parameters are set to the following to numerically check the backward pass of dropout:

dropout_param = {'p': 0.8, 'mode': 'test', 'seed': 123}

This doesn't make much sense - shouldn't mode be set to 'train'?
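
For context, here is a minimal sketch of an inverted-dropout forward pass (hypothetical, simplified signature; not the assignment's exact code). In 'test' mode the layer is just the identity, so there is nothing dropout-specific for a numeric gradient check to exercise; only 'train' mode uses the random mask:

import numpy as np

def dropout_forward(x, p, mode, seed=None):
    # Inverted dropout sketch: keep each unit with probability p and rescale at train time.
    if seed is not None:
        np.random.seed(seed)
    if mode == 'train':
        mask = (np.random.rand(*x.shape) < p) / p
        return x * mask
    return x  # 'test' mode: identity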

problem with importing numpy

Hello,
Thanks for the course and the materials. Although I think I prepared the prerequisites completely, when I start to work offline with a Jupyter notebook I have a problem importing numpy. I'd like to know how I can fix it.

[Three screenshots from 2017-08-23 showing the import error were attached.]

Re: neural-networks-3, Loss function

Hi Andrej, Fei-Fei!
First, thank you all for the great course!
Second, I'd like to argue with a slide.

I am doing the caffenet benchmark, and one of its sections is lr_policy.
The linear decay policy gives the best test accuracy, given the same start_lr, among the "step", sqrt and sqr decay policies. And if you compare the curves to your slide, the "step" policy looks much "better", while the linear one looks "too low". :)
https://github.com/ducha-aiki/caffenet-benchmark/blob/master/Lr_policy.md

Possible typo Optimization-1

The loss formulae for the three 1-dimensional points imply that point i belongs to class i, but this assumption has not been stated explicitly before.

$$ \begin{align} L_0 = & \max(0, w_1^Tx_0 - w_0^Tx_0 + 1) + \max(0, w_2^Tx_0 - w_0^Tx_0 + 1) \\ L_1 = & \max(0, w_0^Tx_1 - w_1^Tx_1 + 1) + \max(0, w_2^Tx_1 - w_1^Tx_1 + 1) \\ L_2 = & \max(0, w_0^Tx_2 - w_2^Tx_2 + 1) + \max(0, w_1^Tx_2 - w_2^Tx_2 + 1) \\ L = & (L_0 + L_1 + L_2)/3 \end{align} $$

typo in LSTM_captioning.ipynb

In the markdown cell explaining the LSTM line 8,
I think
We then compute the input gate g∈ℝH,
should have been
We then compute the input gate i∈ℝH,

No Keras on AWS instance

This tutorial (http://cs231n.github.io/aws-tutorial/) says that Caffe, Torch7, Theano, Keras and Lasagne are pre-installed. But when I ran /anaconda2/bin/python and tried import keras, I got ImportError: No module named keras. I assume Keras is not installed. Am I missing something?

typo in assignment 1 softmax

In the second cell of the softmax exercise, the test mask overlaps the training mask (get_CIFAR10_data function)

loading Tensorflow checkpoint fails & proposed fix

NetworkVisualization-TensorFlow - Pretrained Model

After downloading the TensorFlow checkpoint, the script throws an exception because 'cs231n/datasets/squeezenet.ckpt' cannot be found.

There are valid checkpoint files in the directory. [Screenshot of the directory listing was attached.]

TensorFlow, however, can still load the checkpoint from the .ckpt.* files in the directory.

Proposed fix:

Either detect the checkpoint files with a wildcard, if not glob.glob(SAVE_PATH + "*"):, or check for the meta file, if not os.path.exists(SAVE_PATH + ".meta"):. [Screenshot of the proposed change was attached.]
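
Concretely, the proposed check could look something like this (a sketch; the error message is just a placeholder):

import glob
import os

SAVE_PATH = 'cs231n/datasets/squeezenet.ckpt'   # path used in the notebook

# Option 1: accept any of the sharded checkpoint files (.ckpt.index, .ckpt.data-*, .ckpt.meta)
if not glob.glob(SAVE_PATH + '*'):
    raise ValueError('squeezenet.ckpt not found -- download the checkpoint first')

# Option 2: check for the .meta file specifically
if not os.path.exists(SAVE_PATH + '.meta'):
    raise ValueError('squeezenet.ckpt not found -- download the checkpoint first')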

Missing LICENSE file

Since there is no LICENSE file, the copyright status is unclear. Am I, or a company I work for, allowed to use this material, and under what conditions?

possibly a typo in assignment 2?

In 'solver.py', line 30, shouldn't

data = {
    'X_train': # training data
    'y_train': # training labels
    'X_val': # validation data
    'X_train': # validation labels  <--
  }

be as follows?

data = {
    'X_train': # training data
    'y_train': # training labels
    'X_val': # validation data
    'y_val': # validation labels   <--
  }

This could be trivial..

assignment1 jupyter notebook knn import error with `import imread`

This is a very basic issue:

      6 import matplotlib.pyplot as plt
      7 

/Volumes/theory/Experiments/tensorflow/assignment1/cs231n/data_utils.py in <module>()
      4 import numpy as np
      5 import os
----> 6 from scipy.misc import imread

I've tried all sorts of possible solutions online, but the main problem is that the imread module is not found. I've tried to restart from scratch on a macOS system without getting past this. Does anyone have any suggestions?
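
For what it's worth, scipy.misc.imread was deprecated and later removed from SciPy; a common workaround (an assumption about your setup, not an official course fix) is to read images with the imageio package instead, e.g. by changing the import in data_utils.py:

# pip install imageio
from imageio import imread   # drop-in replacement for scipy.misc.imread

img = imread('path/to/some_image.png')   # hypothetical path; returns a numpy array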

typo in rnn_layers.py

I am sorry if I seem like I'm spamming issues; I'm not sure if this is the best way to report typos.
If you want me to report them through another channel, please let me know.

in
def lstm_step_backward()
I think
dx, dh, dc, dWx, dWh, db = None, None, None, None, None, None
is supposed to be
dx, dprev_h, dprev_c, dWx, dWh, db = None, None, None, None, None, None

Possible typo in neural-networks-2?

Hello! I was reading through https://cs231n.github.io/neural-networks-2/#init recently, and was confused by these two sentences (discussing the initialization variance for ReLU layer weights):

[...] the variance of neurons in the network should be 2.0/n.
This gives the initialization w = np.random.randn(n) * sqrt(2.0/n)

I might be misunderstanding this, but it seems to me that the second line and the referenced paper both create weights with a variance of 2.0/n, which would make the neurons have a variance of 2.0 (before any kind of activation function).

Is that a correct interpretation? And if so, shouldn't the first line read that the neurons should have a variance of 2.0 instead of 2.0/n (or that the weights should have a variance of 2.0/n)?
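
A quick numerical sketch (unit-variance inputs and made-up sizes, not from the notes) of what the two variances come out to:

import numpy as np

np.random.seed(0)
n = 1000                                    # fan-in
x = np.random.randn(n)                      # unit-variance inputs
W = np.random.randn(10000, n) * np.sqrt(2.0 / n)

print(np.var(W))             # ~2.0/n (0.002 here): variance of the weights
print(np.var(np.dot(W, x)))  # ~2.0: variance of the neuron pre-activations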

Potential typo

In the file 'neural-networks-3.md' there is a sentence under the heading 'Hyperparameter ranges' reading:
'That is, we are generating a random random with a uniform distribution, but then raising it to the power of 10. '

I have a feeling that 'random random' is a typo; perhaps it should be 'random variable'?

assignment3: Image Generation part pretrained model get 'nan'

I was working on assignment3's Image Generation part. In the 'feature inversion' cell, I got a lot of 'nan' values when testing the model with 'kitten.jpg'. However, I was able to solve this by changing the data type of the pretrained model to 'np.float64'. It cost me a lot of time; is this an issue that needs to be pointed out?

Question about an example in ConvNet note

To be honest, I'm not sure whether I understood this correctly, so I'm writing it here as an issue rather than a pull request.

In the very last paragraph of the "Converting FC layers to CONV layers" section of the ConvNet notes (http://cs231n.github.io/convolutional-networks/#convert), there is an example about efficiently applying the original ConvNet with a stride of 16 (rather than 32). The document says that this can be done by combining the result of the original ConvNet with the result for the input shifted by 16 pixels in both width and height. But as far as I understand, the ConvNet needs to be applied 4 times to fill the matrix (otherwise, thinking of the final result matrix as a chessboard, only the black or only the white squares would be filled).

(not 100% sure.. but maybe..)

Memory usage calculation for VGGNet

In the 'VGGNet in detail' section the amount of memory used by the network is shown as follows:

TOTAL memory: 24M * 4 bytes ~= 93MB

however, adding up all the memory values for the listed layers produces a total of only ~15M values, i.e. ~58MB at 4 bytes each. Is the memory usage calculation correct? If it is, then a quick explanation of how the figure of 24M was derived would be useful.
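
For reference, here is a sketch that sums the per-layer activation counts as listed in the notes' VGG-16 table:

# (height, width, depth) of each activation volume in the VGG-16 table from the notes
shapes = [
    (224, 224, 3),                                   # INPUT
    (224, 224, 64), (224, 224, 64),                  # CONV3-64 x2
    (112, 112, 64),                                  # POOL2
    (112, 112, 128), (112, 112, 128),                # CONV3-128 x2
    (56, 56, 128),                                   # POOL2
    (56, 56, 256), (56, 56, 256), (56, 56, 256),     # CONV3-256 x3
    (28, 28, 256),                                   # POOL2
    (28, 28, 512), (28, 28, 512), (28, 28, 512),     # CONV3-512 x3
    (14, 14, 512),                                   # POOL2
    (14, 14, 512), (14, 14, 512), (14, 14, 512),     # CONV3-512 x3
    (7, 7, 512),                                     # POOL2
    (1, 1, 4096), (1, 1, 4096), (1, 1, 1000),        # FC layers
]

total_values = sum(h * w * d for h, w, d in shapes)
print(total_values)                       # ~15.2M values
print(total_values * 4 / (1024 ** 2))     # ~58 MB at 4 bytes per value, not 93 MB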

Inconsistent regularization in assignment 1 cs231n.classifiers.linear_svm.svm_loss_naive

Regularization defined in the function svm_loss_naive in cs231n/classifiers/linear_svm.py is inconsistent with the formulas in lectures and notes.

Regularization is defined as 0.5 * reg * np.sum(W * W) with an extra factor of 0.5.

This causes erroneous results and headaches when the gradient contribution of the regularization (based on the formulas from the notes) is added in the gradient calculation.
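
Concretely, the two conventions differ by a factor of two in the gradient:

$$ \frac{\partial}{\partial W}\left(\tfrac{1}{2}\,\text{reg}\sum_{k,l} W_{k,l}^2\right) = \text{reg}\cdot W \qquad \text{vs.} \qquad \frac{\partial}{\partial W}\left(\text{reg}\sum_{k,l} W_{k,l}^2\right) = 2\,\text{reg}\cdot W $$

so mixing the 0.5-scaled loss with the unscaled gradient (or vice versa) makes the analytic and numeric gradients disagree.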

Typo in neural-networks-1

In the "Coarse model" paragraph it says "the output spikes in many systems in known to be important"; the last "in" should be an "is".

Python 2.7 travis flake8 problem.

Every time the Travis checks are run after my commits, I get the following error, and I have seen a lot of pull requests with the same problem:

Flake8 test *

+flake8 --exit-zero '--exclude=test_*,six.py' skimage doc/examples viewer_examples

Traceback (most recent call last):
  File "/home/travis/venv/bin/flake8", line 7, in <module>
    from flake8.main import main
  File "/home/travis/venv/local/lib/python2.7/site-packages/flake8/main.py", line 5, in <module>
    import setuptools
  File "/home/travis/venv/local/lib/python2.7/site-packages/setuptools/__init__.py", line 11, in <module>
    from setuptools.extension import Extension
  File "/home/travis/venv/local/lib/python2.7/site-packages/setuptools/extension.py", line 8, in <module>
    from .dist import _get_unpatched
  File "/home/travis/venv/local/lib/python2.7/site-packages/setuptools/dist.py", line 21, in <module>
    packaging = pkg_resources.packaging
AttributeError: 'module' object has no attribute 'packaging'

Is there a way to avoid it?

Thank you

Gradient calculation incorrect? perhaps?

http://cs231n.github.io/optimization-2/

# set some inputs
x = -2; y = 5; z = -4

# perform the forward pass
q = x + y # q becomes 3
f = q * z # f becomes -12

# perform the backward pass (backpropagation) in reverse order:
# first backprop through f = q * z
dfdz = q # df/dz = q, so gradient on z becomes 3
dfdq = z # df/dq = z, so gradient on q becomes -4
# now backprop through q = x + y
dfdx = 1.0 * dfdq # dq/dx = 1. And the multiplication here is the chain rule!
dfdy = 1.0 * dfdq # dq/dy = 1


Shouldn't the above be -4 ?

In the circuit diagram it's not immediately clear where the final 1 comes from.

Also, isn't the partial derivative of f(x,y,z) with respect to x equal to z? And likewise for y.
Just to make sure I am not crazy, I punched it into Wolfram Alpha:
http://www.wolframalpha.com/widgets/view.jsp?id=d052b64db910c143ed1f1a05298ba14c

so I get

fx = z
fy = z
fz = x + y

meaning the gradient is

fxyz = [z,z,x+y]

so for the inputs
gradient is fxyz = [-4,-4,3] ?

I am sorry I am not sure if the mistake is on my end.

Thanks and let me know ? =)

If you can explain where the 1 comes from as well, that would be great =)
Would it be the length of the final vector?
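
For anyone who wants to check numerically, here is a quick sketch using centered finite differences (not from the notes):

def f(x, y, z):
    return (x + y) * z

x, y, z, h = -2.0, 5.0, -4.0, 1e-5

dfdx = (f(x + h, y, z) - f(x - h, y, z)) / (2 * h)  # ~ -4 (equals z)
dfdy = (f(x, y + h, z) - f(x, y - h, z)) / (2 * h)  # ~ -4 (equals z)
dfdz = (f(x, y, z + h) - f(x, y, z - h)) / (2 * h)  # ~  3 (equals x + y)

print(dfdx, dfdy, dfdz)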

linear-classify.md - Redundant transposition

Hi,

in linear-classify.md you first introduce the linear classifier as

$$f(x_i, W, b) = W x_i + b $$
In the above equation, we are assuming that the image (x_i) has all of its pixels flattened out to a single column vector of shape [D x 1]. The matrix W (of size [K x D]), and the vector b (of size [K x 1]) are the parameters of the function

Later, in SVM loss you have

$$ L_i = \sum_{j\neq y_i} \max(0, w_j^T x_i - w_{y_i}^T x_i + \Delta) $$
where (w_j) is the j-th row of (W) reshaped as a column.

This sounds overly complex to me:

(w_j) is the j-th row of (W) reshaped as a column

is the justification for having the transpose in w_j^T x_i. But all it means, referring to your initial definition of W, is that you take the dot product of the row w_j with x_i. So, given your initial definition, the loss could be defined as

$$ L_i = \sum_{j\neq y_i} \max(0, w_j x_i - w_{y_i} x_i + \Delta) $$
where w_j is the j-th row of (W).

Am I overlooking something?

Typo in https://cs231n.github.io/neural-networks-case-study/

In section "Compute the loss":

corect_logprobs = -np.log(probs[range(num_examples),y])

and

# compute the loss: average cross-entropy loss and regularization
data_loss = np.sum(corect_logprobs)/num_examples
reg_loss = 0.5*reg*np.sum(W*W)
loss = data_loss + reg_loss

corect_logprobs => correct_logprobs
