=============================================
* Machine Learning (Fall 2011) - Homework 2 *
* ----------------------------------------- *
*  Daniel Foreman-Mackey --- [email protected]  *
=============================================

NOTE: as with the previous assignment, I implemented this one in Python and it
depends on NumPy.

Implementation Notes
--------------------

The backprop module contains the main code for this assignment.  The basic code
for constructing, training and testing a "Machine" (roughly the equivalent of a
"trainer" in the LUSH skeleton code) is implemented in backprop/backprop.py.
The learning modules are implemented in backprop/modules.py as subclasses of the
abstract LearningModule class.  There is also a test suite provided in
backprop/tests.py that can be run as

    % python backprop/tests.py

or using the Python testing framework nose (http://readthedocs.org/docs/nose)
by running

    % nosetests

in the main directory.  These tests are described later in this document.

The dataset used for the experiments in this assignment is found in the dataset
module.  The raw data is in dataset/isolet*.data and the Python object that
loads, preprocesses and wraps the data is implemented in dataset/dataset.py.

Finally, the required experiments are implemented in main.py and can be run as

    % python main.py


Unit Tests
----------

To numerically test the back-propagation method for a general module
    X_{out} = F(X_{in} | W),
we can set up a simple machine:

        L
        ^
        |
     ------
    | LOSS |
     ------
        |
     ---------------
    | F(X_{in} | W) |
     ---------------
        |
     -----------
    | TestInput |
     -----------

where the TestInput module generates a random dataset { X_{in}, Y }.  Then, we can
place a constraint on the difference between the back-propagated dL/dX_{in} and
the component-wise centered finite-difference calculation of the same gradient.
The finite-difference measurement of (dL/dX_{in})_k is

    [ L ( F(X_{in} + dX_{k} | W), Y ) - L ( F(X_{in} - dX_{k} | W), Y ) ] / (2 delta)

where dX_{k} is zero everywhere except in the k-th element, where it has some
small non-zero value, delta.  A similar procedure is applied to approximate
dL/dW_{j} by varying W with the vector dW_{j} = [0, ..., delta, ..., 0].
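
In code, the component-wise finite-difference measurement amounts to something
like the following minimal NumPy sketch.  The callable loss_of, which evaluates
L ( F(x | W), Y ) for a flat input vector x, is an illustrative stand-in, not
the actual interface in backprop/tests.py:

    import numpy as np

    def fd_gradient(loss_of, x, delta=1.0e-5):
        # Centered finite-difference estimate of dL/dx for a flat float
        # vector x, perturbing one component at a time.
        grad = np.zeros_like(x)
        for k in range(x.size):
            dx = np.zeros_like(x)
            dx[k] = delta
            grad[k] = (loss_of(x + dx) - loss_of(x - dx)) / (2.0 * delta)
        return grad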

Empirically, I found that reasonable constraints on the normalized L1 norms of
the differences between the two gradient estimates were

    1/N || (dL/dX_{in})_bprop - (dL/dX_{in})_fdiff ||_1 < delta

and

    1/N || (dL/dW)_bprop - (dL/dW)_fdiff ||_1 < delta

where N is the number of components in the gradient.
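
The constraint itself reduces to a comparison like the following sketch (again
with illustrative names: g_bprop is the back-propagated gradient and g_fdiff is
the finite-difference estimate from the sketch above):

    import numpy as np

    def gradient_close(g_bprop, g_fdiff, delta=1.0e-5):
        # Normalized L1 distance between the two gradient estimates.
        err = np.sum(np.abs(g_bprop - g_fdiff)) / g_bprop.size
        return err < delta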

These tests are implemented in backprop/tests.py.


Experiments
-----------

The required experiments are implemented in main.py.  Each experiment appends a
row to the file results.dat with the number of parameters in the first column
and the fractional test error in the second column.  To estimate the mean and
variance of the test error for each experiment by Monte Carlo, you can run
something like

    % python main.py 50

which will run each experiment 50 times and aggregate the results in results.dat.
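
Given the two-column format of results.dat, the aggregated results can be
summarized with a few lines of NumPy (a sketch, not part of the assignment
code):

    import numpy as np

    # Each row of results.dat is: <number of parameters> <fractional error>.
    data = np.loadtxt("results.dat")
    for n in np.unique(data[:, 0]):
        errs = data[data[:, 0] == n, 1]
        print("%6d parameters: mean error = %.4f, variance = %.2e"
              % (int(n), errs.mean(), errs.var()))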

The experiments implemented in main.py are:

    1. Logistic regression (linear->bias->sigmoid->Euclidean; sketched below)
    2. Single-layer (linear->bias->soft-max->cross-entropy)
    3. Double-layer (linear->bias->sigmoid->linear->bias->soft-max->cross-entropy)
       with 10, 20, 40 and 80 hidden units.
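
As an illustration of the chaining notation, the first architecture corresponds
to the forward pass below.  This is a self-contained NumPy sketch; the function
name, W, b and the array shapes are placeholders, not the actual module
interface from backprop/modules.py:

    import numpy as np

    def logistic_forward(x, W, b, y):
        # linear + bias: z = W x + b
        z = np.dot(W, x) + b
        # sigmoid non-linearity
        y_hat = 1.0 / (1.0 + np.exp(-z))
        # Euclidean (squared error) loss against the target y
        return 0.5 * np.sum((y_hat - y) ** 2)

The other architectures swap the sigmoid/Euclidean tail for a soft-max and
cross-entropy loss, and add extra linear->bias->sigmoid stages for the hidden
layers.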

The results of a 50-sample Monte Carlo run are given in results.dat and plotted
in results.png.  Since the fractional error for logistic regression is much
worse than for the other networks, a rescaled version of results.png is given
in results_scaled.png.  In both of these figures, the horizontal blue line
indicates a fractional error of 5%.  They show that either the double-layer
network with 40 or 80 hidden units or (surprisingly) the single-layer soft-max
network will consistently achieve <5% test error.  The code to generate these
figures is in plots.py.


Optional Experiments
--------------------

I also implemented the extended architectures:

    1. Triple-layer (linear->bias->sigmoid->linear->bias->sigmoid->
                                linear->bias->soft-max->cross-entropy)
    2. RBF hybrid (linear->bias->sigmoid->RBF->bias->soft-max->cross-entropy)
    3. SVM-like (RBF->neg-exp->linear->bias->soft-max->cross-entropy)

Both the triple-layer and RBF hybrid nets achieved performance similar to the
double-layer network (<5% test error) but not consistently better.  Despite
significant experimentation, I was unable to get the SVM-like architecture to
work at all!

These optional architectures can be run using

    % python main.py --optional
