paulweihan / cnn_gaze Goto Github PK
View Code? Open in Web Editor NEWMy repository for gaze estimation model based on CNN
My repository for gaze estimation model based on CNN
尝试使用CNN实现视线预测 会逐步从简单的神经网络实现入手,搭起自己的CNN模型。 项目进展参考网站: http://paulweihan.github.io 实现LeNet5 model00 done 改lenet5 model01-03 done 改googlenet model04,05 done model00 得到和[ CVPR 2015 ] Appearance-Based Gaze Estimation in the Wild 近似的结果 主要参考: 1,深度学习教程网站的python代码 http://deeplearning.net/tutorial/lenet.html#lenet http://blog.csdn.net/u012162613/article/details/43225445 基于python theano实现的经典LeNet结构,第二个链接对第一个链接中给出的代码进行了注释: 代码来自于深度学习教程:Convolutional Neural Networks (LeNet) [第一个链接],这个代码实现的是一个简化了的LeNet5,具体如下: 没有实现location-specific gain and bias parameters 用的是maxpooling,而不是average_pooling 分类器用的是softmax,LeNet5用的是rbf LeNet5第二层并不是全连接的,本程序实现的是全连接 另外,代码里将卷积层和子采用层合在一起,定义为“LeNetConvPoolLayer“(卷积采样层),这好理解,因为它们总是成对出现。但是有个地方需要注意,代码中将卷积后的输出直接作为子采样层的输入,而没有加偏置b再通过sigmoid函数进行映射,即没有了下图中fx后面的bx以及sigmoid映射,也即直接由fx得到Cx。 最后,代码中第一个卷积层用的卷积核有20个,第二个卷积层用50个,而不是上面那张LeNet5图中所示的6个和16个。 2,斯坦福的cs231n课程网站: http://cs231n.stanford.edu http://cs231n.stanford.edu/syllabus.html * 3,Hacker's guide to Neural Networks 理解BP,SVM,NN算法及其实现 http://karpathy.github.io/neuralnets/ * 4,Unsupervised Feature Learning and Deep Learning http://deeplearning.stanford.edu/wiki/index.php/UFLDL_Tutorial 注:最新的教程网址是:http://ufldl.stanford.edu/ 吴恩达/Andrew Ng主编 学之前先看:他的机器学习 http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning 5,代码基于githup的Deep Learning的toolbox https://github.com/rasmusbergpalm/DeepLearnToolbox http://blog.csdn.net/zouxy09/article/details/9993743/ * 6,Caffe网站: http://caffe.berkeleyvision.org Jia Y, Shelhamer E, Donahue J, et al. Caffe: Convolutional Architecture for Fast Feature Embedding[J]. Eprint Arxiv, 2014. http://www.csdn.net/article/2015-01-22/2823663 CUDA:https://developer.nvidia.com/cuda-downloads 7, 菜鸟从零开始学习Deep learning http://blog.csdn.net/yihaizhiyan/article/category/2388401 主要论文: [ CVPR 2015 ] Appearance-Based Gaze Estimation in the Wild [ CVPR 2014 ] Learning-by-Synthesis for Appearance-based 3D Gaze Estimation GoogLenet Szegedy C, Liu W, Jia Y, et al. Going Deeper with Convolutions[J]. Eprint Arxiv, 2014. DeepID-Net: http://www.ee.cuhk.edu.hk/~wlouyang/projects/imagenetDeepId/index.html Ouyang W, Luo P, Zeng X, et al. DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection[C]. Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on. IEEE, 2015. Ouyang W, Luo P, Zeng X, et al. DeepID-Net: multi-stage and deformable deep convolutional neural networks for object detection[J]. Eprint Arxiv, 2014. . Sun Y, Wang X, Tang X. Deep Learning Face Representation from Predicting 10,000 Classes[C]// Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on. IEEE, 2014:1891-1898. RCNN: https://github.com/rbgirshick/rcnn bibtex @inproceedings{girshick14CVPR, Author = {Girshick, Ross and Donahue, Jeff and Darrell, Trevor and Malik, Jitendra}, Title = {Rich feature hierarchies for accurate object detection and semantic segmentation}, Booktitle = {Computer Vision and Pattern Recognition}, Year = {2014} } 经典论文: Notes on Convolutional Neural Networks Gradient-Based Learning Applied to Document Recognition My notes: paulweihan.github.io
Thanks for the introduction.
Could you provide the .caffe model for the CVPR gaze estimation paper?
In the paper, the network outputs theta and phi, which means the output number should be two, but in the proto, the ip2 output number is three. How to understande it? Thank you.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.