invinciblejha / ml-class-1
This project forked from dfm/ml-class
Machine learning class at NYU
=============================================
* Machine Learning (Fall 2011) - Homework 2 *
* ----------------------------------------- *
* Daniel Foreman-Mackey --- [email protected] *
=============================================

NOTE: again, I implemented my assignment in Python and it depends on NumPy.

Implementation Notes
--------------------

The backprop module contains the main code for this assignment. The basic
code for constructing, training and testing a "Machine" (roughly the
equivalent of a "trainer" in the LUSH skeleton code) is implemented in
backprop/backprop.py. The learning modules are implemented in
backprop/modules.py as subclasses of the abstract LearningModule class.

There is also a test suite provided in backprop/tests.py that can be run as

    % python backprop/tests.py

or, using the Python unit testing framework nose
(http://readthedocs.org/docs/nose), by running

    % nosetests

in the main directory. These tests are described later in this document.

The dataset used for the experiments in this assignment is found in the
dataset module. The raw data is in dataset/isolet*.data, and the Python
object that loads, preprocesses and wraps the data is implemented in
dataset/dataset.py.

Finally, the required experiments are implemented in main.py and can be run as

    % python main.py

Unit Tests
----------

To numerically test the back-propagation method for a general module
X_{out} = F(X_{in} | W), we can set up a simple machine:

             L
             ^
             |
          --------
          | LOSS |
          --------
             |
      -----------------
      | F(X_{in} | W) |
      -----------------
             |
        -------------
        | TestInput |
        -------------

where the TestInput generates a random dataset { X_{in}, Y }. Then, we can
place a constraint on the difference between the back-propagated dL/dX_{in}
and the component-wise centered finite-difference calculation of the same
gradient. The finite-difference measurement of (dL/dX_{in})_k is

    [ L( F(X_{in} + dX_{k} | W), Y ) - L( F(X_{in} - dX_{k} | W), Y ) ] / (2 delta)

where dX_{k} is zero everywhere except in the k-th element, where it has some
small non-zero value delta. A similar procedure is applied to approximate
dL/dW_{j}, varying W with a vector dW_{j} = [0, ..., delta, ..., 0].

Empirically, I found that reasonable constraints on the (averaged) L1 norms of
the differences between the back-propagated and finite-difference derivatives
were

    1/N || (dL/dX_{in})_bprop - (dL/dX_{in})_fdiff || < delta

and

    1/N || (dL/dW)_bprop - (dL/dW)_fdiff || < delta

These tests are implemented in backprop/tests.py.
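For concreteness, here is a minimal, self-contained sketch of this kind of
centered finite-difference check. It is an illustration only, not the code in
backprop/tests.py, and the function names (fdiff_gradient, gradients_agree)
are made up for this example:

    import numpy as np

    def fdiff_gradient(loss, x, delta=1e-5):
        # Centered finite-difference estimate of dL/dx for a scalar loss(x).
        grad = np.zeros_like(x)
        for k in range(x.size):
            dx = np.zeros_like(x)
            dx.flat[k] = delta
            grad.flat[k] = (loss(x + dx) - loss(x - dx)) / (2 * delta)
        return grad

    def gradients_agree(bprop_grad, loss, x, delta=1e-5):
        # Constrain the mean L1 difference between the back-propagated
        # gradient and the finite-difference estimate, as described above.
        fdiff_grad = fdiff_gradient(loss, x, delta)
        return np.sum(np.abs(bprop_grad - fdiff_grad)) / x.size < delta

    # Example: for the squared-error loss L(x) = 0.5 * ||x - y||^2 the exact
    # gradient is (x - y), so the check should pass.
    y = np.random.randn(10)
    x = np.random.randn(10)
    loss = lambda z: 0.5 * np.sum((z - y) ** 2)
    assert gradients_agree(x - y, loss, x)

The same idea applies to dL/dW by perturbing W instead of X_{in}.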
Experiments
-----------

The required experiments are implemented in main.py. Each experiment appends
a row to the file results.dat with the number of parameters in the first
column and the fractional test error in the second column. To Monte Carlo the
experiments to estimate the mean and variance of the test error for a given
experiment, you can run something like

    % python main.py 50

which will run each experiment 50 times and aggregate the results in
results.dat.

The experiments implemented in main.py are:

    1. Logistic regression (linear->bias->sigmoid->Euclidean)
    2. Single-layer (linear->bias->soft-max->cross-entropy)
    3. Double-layer (linear->bias->sigmoid->linear->bias->soft-max->cross-entropy)
       with 10, 20, 40 and 80 hidden units.

The results of a 50 sample Monte Carlo are given in results.dat and plotted
in results.png. Since the fractional error for logistic regression is much
worse than for the other networks, a scaled version of results.png is given
in results_scaled.png. In both of these figures, the horizontal blue line
indicates a fractional error of 5%. This shows that either the double-layer
network with 40 or 80 hidden units or (surprisingly) the single-layer
soft-max network will consistently provide <5% test error. The code to
generate these figures is in plots.py.

Optional Experiments
--------------------

I also implemented the extended architectures:

    1. Triple-layer (linear->bias->sigmoid->linear->bias->sigmoid->
       linear->bias->soft-max->cross-entropy)
    2. RBF hybrid (linear->bias->sigmoid->RBF->bias->soft-max->cross-entropy)
    3. SVM-like (RBF->neg-exp->linear->bias->soft-max->cross-entropy)

Both the triple-layer and RBF hybrid nets had performance similar to the
double-layer net (<5% test error), but not consistently better. Despite
significant experimentation, I was unable to get the SVM-like architecture to
work at all! These optional architectures can be run using

    % python main.py --optional
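As a final note, the per-experiment mean and variance of the test error can be
read back out of results.dat after any of the Monte Carlo runs above by
grouping rows on the parameter count in the first column. A minimal sketch
(not part of the repository; it assumes distinct experiments have distinct
parameter counts):

    import numpy as np

    # results.dat: first column = number of parameters,
    #              second column = fractional test error.
    data = np.loadtxt("results.dat")
    for n_params in np.unique(data[:, 0]):
        err = data[data[:, 0] == n_params, 1]
        print("%6d parameters: mean error = %.4f, variance = %.6f"
              % (int(n_params), err.mean(), err.var()))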