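cnn_gaze by paulweihan

An attempt at gaze prediction with a CNN.
The plan is to start from a simple neural network implementation and build up a custom CNN model step by step.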


Project progress is documented at:

http://paulweihan.github.io

LeNet5 implementation (model00): done
Modified LeNet5 (model01-03): done
Modified GoogLeNet (model04, model05): done

model00 achieves results close to those reported in [CVPR 2015] Appearance-Based Gaze Estimation in the Wild


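Main references:

    1. Python code from the Deep Learning Tutorials site
        http://deeplearning.net/tutorial/lenet.html#lenet
        http://blog.csdn.net/u012162613/article/details/43225445
        The classic LeNet structure implemented in Python with Theano; the second link annotates the code given in the first.
        The code comes from the Deep Learning Tutorials chapter Convolutional Neural Networks (LeNet) [first link] and implements a simplified LeNet5. The differences are:
        It does not implement location-specific gain and bias parameters
        It uses max pooling rather than average pooling
        The classifier is softmax, whereas LeNet5 uses RBF
        The second layer of LeNet5 is not fully connected; this program implements it as fully connected
        In addition, the code merges the convolution layer and the subsampling layer into a single "LeNetConvPoolLayer" (conv-pool layer), which makes sense because the two always appear as a pair. One point needs attention: the code feeds the convolution output directly into the subsampling layer without first adding the bias b and applying the sigmoid mapping. In terms of the LeNet5 figure in the annotated post, the bx and the sigmoid mapping that follow fx are absent, so Cx is obtained directly from fx.

        Finally, the first convolution layer in the code uses 20 kernels and the second uses 50, rather than the 6 and 16 shown in the LeNet5 figure.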
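
The conv-pool behaviour described in reference 1 can be sketched in plain Python. This is a minimal illustration, not the tutorial's Theano code: the function names, the 5x5 kernel size, the 2x2 pooling window, the 28x28 input size, and the use of tanh after pooling are all assumptions made for the example. It shows the convolution output going straight into max pooling, with bias and nonlinearity applied only afterwards, and traces the feature-map shapes for 20 and 50 kernels.

```python
import math

def conv_valid(image, kernel):
    """'Valid' 2D cross-correlation of one channel with one kernel
    (no padding, stride 1, kernel not flipped)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill
    a complete window are dropped."""
    out_h, out_w = len(fmap) // size, len(fmap[0]) // size
    return [[max(fmap[i * size + a][j * size + b]
                 for a in range(size) for b in range(size))
             for j in range(out_w)]
            for i in range(out_h)]

def conv_pool_layer(image, kernel, bias, pool=2):
    """Hypothetical stand-in for a conv-pool layer: the raw convolution
    output is pooled first; bias and tanh (an assumption) come after."""
    pooled = max_pool(conv_valid(image, kernel), pool)
    return [[math.tanh(v + bias) for v in row] for row in pooled]

def trace_shapes(height, width, kernel_counts=(20, 50), kernel=5, pool=2):
    """Return (channels, height, width) after each conv-pool layer."""
    shapes = []
    for n in kernel_counts:
        height = (height - kernel + 1) // pool
        width = (width - kernel + 1) // pool
        shapes.append((n, height, width))
    return shapes

# 28x28 is an assumed, MNIST-sized input; gaze images may differ.
print(trace_shapes(28, 28))  # [(20, 12, 12), (50, 4, 4)]
```

Under these assumptions the second conv-pool layer yields 50 maps of 4x4, so the fully connected layer that follows would see 50 * 4 * 4 = 800 inputs. A different input size only changes the numbers passed to `trace_shapes`.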


    2. Stanford's cs231n course site:
        http://cs231n.stanford.edu
        http://cs231n.stanford.edu/syllabus.html

 ＊ 3. Hacker's guide to Neural Networks
        Explains backpropagation, SVMs, and neural networks, together with their implementation
        http://karpathy.github.io/neuralnets/

 ＊ 4. Unsupervised Feature Learning and Deep Learning
        http://deeplearning.stanford.edu/wiki/index.php/UFLDL_Tutorial
        Note: the latest tutorial URL is http://ufldl.stanford.edu/
        Edited by Andrew Ng
        Recommended before starting: his Machine Learning course
        http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning

    5. Code based on the Deep Learning toolbox on GitHub
        https://github.com/rasmusbergpalm/DeepLearnToolbox
        http://blog.csdn.net/zouxy09/article/details/9993743/

 ＊ 6. Caffe site:
        http://caffe.berkeleyvision.org

        Jia Y, Shelhamer E, Donahue J, et al. Caffe: Convolutional Architecture for Fast Feature Embedding[J]. Eprint Arxiv, 2014.

        http://www.csdn.net/article/2015-01-22/2823663

        CUDA: https://developer.nvidia.com/cuda-downloads

    7. A beginner learning Deep Learning from scratch (blog series)
        http://blog.csdn.net/yihaizhiyan/article/category/2388401


    Main papers:
    [CVPR 2015] Appearance-Based Gaze Estimation in the Wild
    [CVPR 2014] Learning-by-Synthesis for Appearance-based 3D Gaze Estimation

    GoogLeNet:
        Szegedy C, Liu W, Jia Y, et al. Going Deeper with Convolutions[J]. Eprint Arxiv, 2014.

    DeepID-Net:
        http://www.ee.cuhk.edu.hk/~wlouyang/projects/imagenetDeepId/index.html

        Ouyang W, Luo P, Zeng X, et al. DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection[C]. Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on. IEEE, 2015.

        Ouyang W, Luo P, Zeng X, et al. DeepID-Net: multi-stage and deformable deep convolutional neural networks for object detection[J]. Eprint Arxiv, 2014.

        Sun Y, Wang X, Tang X. Deep Learning Face Representation from Predicting 10,000 Classes[C]// Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on. IEEE, 2014:1891-1898.

    RCNN:
        https://github.com/rbgirshick/rcnn
        bibtex
        @inproceedings{girshick14CVPR,
        Author = {Girshick, Ross and Donahue, Jeff and Darrell, Trevor and Malik, Jitendra},
        Title = {Rich feature hierarchies for accurate object detection and semantic segmentation},
        Booktitle = {Computer Vision and Pattern Recognition},
        Year = {2014}
        }

    Classic papers:
            Notes on Convolutional Neural Networks
            Gradient-Based Learning Applied to Document Recognition