Giter Site home page Giter Site logo

cnn_gaze's Introduction

尝试使用CNN实现视线预测
会逐步从简单的神经网络实现入手,搭起自己的CNN模型。


项目进展参考网站:

http://paulweihan.github.io

实现LeNet5 model00 done
改lenet5 model01-03 done
改googlenet model04,05 done

model00 得到和[ CVPR 2015 ] Appearance-Based Gaze Estimation in the Wild 近似的结果


主要参考:
    
    1,深度学习教程网站的python代码
        http://deeplearning.net/tutorial/lenet.html#lenet
        http://blog.csdn.net/u012162613/article/details/43225445
        基于python theano实现的经典LeNet结构,第二个链接对第一个链接中给出的代码进行了注释:
        代码来自于深度学习教程:Convolutional Neural Networks (LeNet) [第一个链接],这个代码实现的是一个简化了的LeNet5,具体如下:
        没有实现location-specific gain and bias parameters
        用的是maxpooling,而不是average_pooling
        分类器用的是softmax,LeNet5用的是rbf
        LeNet5第二层并不是全连接的,本程序实现的是全连接
        另外,代码里将卷积层和子采用层合在一起,定义为“LeNetConvPoolLayer“(卷积采样层),这好理解,因为它们总是成对出现。但是有个地方需要注意,代码中将卷积后的输出直接作为子采样层的输入,而没有加偏置b再通过sigmoid函数进行映射,即没有了下图中fx后面的bx以及sigmoid映射,也即直接由fx得到Cx。

        最后,代码中第一个卷积层用的卷积核有20个,第二个卷积层用50个,而不是上面那张LeNet5图中所示的6个和16个。


    2,斯坦福的cs231n课程网站:
        http://cs231n.stanford.edu
        http://cs231n.stanford.edu/syllabus.html

 * 3,Hacker's guide to Neural Networks
        理解BP,SVM,NN算法及其实现
        http://karpathy.github.io/neuralnets/

 * 4,Unsupervised Feature Learning and Deep Learning
        http://deeplearning.stanford.edu/wiki/index.php/UFLDL_Tutorial
        注:最新的教程网址是:http://ufldl.stanford.edu/
        吴恩达/Andrew Ng主编
        学之前先看:他的机器学习
        http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning

    5,代码基于githup的Deep Learning的toolbox
        https://github.com/rasmusbergpalm/DeepLearnToolbox
        http://blog.csdn.net/zouxy09/article/details/9993743/

 * 6,Caffe网站:
        http://caffe.berkeleyvision.org

        Jia Y, Shelhamer E, Donahue J, et al. Caffe: Convolutional Architecture for Fast Feature Embedding[J]. Eprint Arxiv, 2014.

        http://www.csdn.net/article/2015-01-22/2823663

        CUDA:https://developer.nvidia.com/cuda-downloads

    7, 菜鸟从零开始学习Deep learning
        http://blog.csdn.net/yihaizhiyan/article/category/2388401


    主要论文:
    [ CVPR 2015 ] Appearance-Based Gaze Estimation in the Wild
    [ CVPR 2014 ]  Learning-by-Synthesis for Appearance-based 3D Gaze Estimation

    GoogLenet
        Szegedy C, Liu W, Jia Y, et al. Going Deeper with Convolutions[J]. Eprint Arxiv, 2014.

    DeepID-Net:
        http://www.ee.cuhk.edu.hk/~wlouyang/projects/imagenetDeepId/index.html

        Ouyang W, Luo P, Zeng X, et al. DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection[C]. Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on. IEEE, 2015.

        Ouyang W, Luo P, Zeng X, et al. DeepID-Net: multi-stage and deformable deep convolutional neural networks for object detection[J]. Eprint Arxiv, 2014.
.       
        Sun Y, Wang X, Tang X. Deep Learning Face Representation from Predicting 10,000 Classes[C]// Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on. IEEE, 2014:1891-1898.

    RCNN:
        https://github.com/rbgirshick/rcnn
        bibtex
        @inproceedings{girshick14CVPR,
        Author = {Girshick, Ross and Donahue, Jeff and Darrell, Trevor and Malik, Jitendra},
        Title = {Rich feature hierarchies for accurate object detection and semantic segmentation},
        Booktitle = {Computer Vision and Pattern Recognition},
        Year = {2014}
        }

    经典论文:
            Notes on Convolutional Neural Networks
            Gradient-Based Learning Applied to Document Recognition
My notes: paulweihan.github.io

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.