The articles from jinming-su

articles's Issues

Studying Method

Do you have a dream? If not, you shouldn't continue.

Yesterday I came across Weifeng Liu, a person who is familiar with CS, AI, and other fields. I learned some lessons from his blog.

Interest is everywhere, but devotion and perseverance are truly rare.

In fact, everyone is born curious, and it is hard to find someone who is bored by every domain.

Some people can persevere with an interest for a long time and solve whatever difficulties they run into, but others give up as soon as they face a hard problem.
Success that relies on skills can be copied.

The very existence of schools and education shows that learning skills is an easier path to success.
Thinking is the most important thing.

If you are not someone who becomes absorbed easily, you will find that everything distracts you. You cannot concentrate on one thing for even half an hour, and your time is fragmented, so it is hard to build up knowledge over the long term and think deeply in one domain.

This reality is upsetting, which makes you even easier to distract. It also makes you anxious, and to escape that anxiety you may look for other excitement. The result is a vicious circle.

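Summary of "The Beauty of Mathematics" (《数学之美》)

Note: I had planned to write my blog in English from now on, but I have been reading The Beauty of Mathematics lately, and my English is poor.
What sets this book apart from some others is that the author includes many personal reflections, along with some views on contemporary education and other fields. It also introduces many leading figures from mathematics and engineering.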

Jelinek
Boole
Euler

Recommended books

One Two Three... Infinity
A Brief History of Time

Recommended programs

Through the Wormhole

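### Natural language processing: from rules to statistics
1. Language model:

$$p(S)=p(w_1,w_2,\dots,w_n)=p(w_1)\,p(w_2|w_1)\,p(w_3|w_1,w_2)\cdots p(w_n|w_1,w_2,\dots,w_{n-1})$$
This factorization is the chain rule. p(S) is called the language model, that is, a model for computing the probability of a sentence.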
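
As a minimal sketch of the chain rule, the Python below estimates p(S) under a bigram approximation, where each word is conditioned only on the word before it. The toy corpus, the `<s>` start marker, and the function names are assumptions made for illustration. No smoothing is applied, so any unseen word pair gives a probability of zero.

```python
from collections import Counter

def train_bigram(sentences):
    """Count contexts and bigrams; each sentence is a list of words."""
    contexts, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words  # hypothetical start-of-sentence marker
        contexts.update(tokens[:-1])  # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams

def sentence_probability(words, contexts, bigrams):
    """Approximate p(S) as the product of p(w_i | w_{i-1})."""
    tokens = ["<s>"] + words
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        if contexts[prev] == 0:
            return 0.0  # context never seen in training
        prob *= bigrams[(prev, cur)] / contexts[prev]
    return prob

corpus = [["i", "like", "math"], ["i", "like", "books"], ["you", "like", "math"]]
contexts, bigrams = train_bigram(corpus)
p = sentence_probability(["i", "like", "math"], contexts, bigrams)
print(p)
```

Each factor is a relative frequency: the count of the word pair divided by the count of the preceding word as a context.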

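2. Markov models, Markov chains, and hidden Markov models

$$P(X_{n+1}=x \mid X_0,X_1,X_2,\dots,X_n)=P(X_{n+1}=x \mid X_n)$$
Here x is some state of the process. The identity above can be regarded as the Markov property.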
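
The Markov property can be sketched in Python as a chain whose next state is sampled from the current state alone. The two-state weather table and its probabilities are made up for illustration and do not come from the book.

```python
import random

# Hypothetical transition table: P(next state | current state).
TRANSITIONS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def next_state(current, rng):
    """Sample X_{n+1} using only X_n, which is the Markov property."""
    states = list(TRANSITIONS[current])
    weights = [TRANSITIONS[current][s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]

def simulate(start, steps, seed=0):
    """Generate a chain of steps + 1 states beginning at start."""
    rng = random.Random(seed)
    chain = [start]
    for _ in range(steps):
        chain.append(next_state(chain[-1], rng))
    return chain

def path_probability(chain):
    """Probability of a state sequence, given its first state."""
    prob = 1.0
    for prev, cur in zip(chain, chain[1:]):
        prob *= TRANSITIONS[prev][cur]
    return prob

print(simulate("sunny", 10))
```

Because each step looks only at `chain[-1]`, the probability of a whole path is a product of one-step transition probabilities.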

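Information theory

1. Measuring information (Shannon: entropy, i.e. uncertainty)
$$H(X)=-\sum_x P(x)\log P(x)$$
2. Information is the only way to eliminate the uncertainty of a system
3. A bigram model has less uncertainty than a unigram model (conditional entropy)
$$H(X|Y)=-\sum_{x,y}P(x,y)\log P(x|y)$$
It can be shown that $$H(X)\geq H(X|Y)$$
4. Mutual information (a quantitative measure of how correlated two events are, used for disambiguation)
$$I(X;Y)=H(X)-H(X|Y)$$
5. Relative entropy (measures the similarity of two positive-valued functions)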
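
The quantities in this list can be computed directly from a small joint distribution. The sketch below is illustrative only: the joint table is made up, the logarithm base 2 (bits) is an assumption because the notes do not fix a base, and relative entropy is taken in its usual Kullback-Leibler form, the sum over x of p(x) log(p(x)/q(x)).

```python
import math

def entropy(dist):
    """H(X) = -sum_x P(x) log2 P(x) for a dict {x: P(x)}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(joint, axis):
    """Marginalize a joint dict keyed by (x, y); axis 0 gives P(x), axis 1 gives P(y)."""
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0.0) + p
    return out

def conditional_entropy(joint):
    """H(X|Y) = -sum_{x,y} P(x,y) log2 P(x|y)."""
    p_y = marginal(joint, 1)
    return -sum(p * math.log2(p / p_y[y]) for (_, y), p in joint.items() if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) - H(X|Y)."""
    return entropy(marginal(joint, 0)) - conditional_entropy(joint)

def relative_entropy(p, q):
    """KL(p || q); assumes q[x] > 0 wherever p[x] > 0."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

# Hypothetical joint distribution P(x, y).
joint = {("a", "u"): 0.4, ("a", "v"): 0.1, ("b", "u"): 0.1, ("b", "v"): 0.4}
h_x = entropy(marginal(joint, 0))
h_x_given_y = conditional_entropy(joint)
print(h_x, h_x_given_y, mutual_information(joint))
```

In this example knowing Y lowers the uncertainty about X, which matches the inequality between entropy and conditional entropy.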

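PageRank: the web page ranking algorithm

Links from highly ranked pages carry more weight (Brin; multiplying two-dimensional matrices?)
  I don't fully understand this part.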
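
One way to make the idea concrete is a small power-iteration sketch: every page repeatedly passes its current rank to the pages it links to, so a link from a highly ranked page is worth more. The damping factor of 0.85, the four-page link graph, and the function name are assumptions for illustration, not details taken from the book. The sketch also assumes every linked page appears as a key in `links`.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if not outgoing:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                # Split this page's rank equally among its outgoing links.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
print(sorted(ranks.items(), key=lambda kv: -kv[1]))
```

Each round is equivalent to multiplying the rank vector by the link matrix, which is where the matrix multiplication comes in.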

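TF-IDF (term frequency-inverse document frequency)

Find the frequency of each word in the document
Assign a weight to each word
Stop words get a weight of zero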
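
The three steps above can be sketched as follows. The stop-word list, the toy documents, and the choice of log(N / document frequency) as the weight are assumptions for illustration. Other TF-IDF variants exist.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "of", "is", "a"}  # hypothetical stop-word list

def tf_idf(documents):
    """documents: list of token lists. Returns one {word: weight} dict per document."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # number of documents containing each word
    weights = []
    for doc in documents:
        counts = Counter(doc)
        scores = {}
        for word, count in counts.items():
            if word in STOP_WORDS:
                scores[word] = 0.0  # stop words get a weight of zero
                continue
            tf = count / len(doc)  # term frequency within this document
            idf = math.log(n_docs / doc_freq[word])  # rarer words weigh more
            scores[word] = tf * idf
        weights.append(scores)
    return weights

docs = [
    ["the", "beauty", "of", "mathematics"],
    ["the", "history", "of", "time"],
    ["mathematics", "is", "a", "language"],
]
weights = tf_idf(docs)
print(weights[0])
```

In the first document "beauty" outweighs "mathematics" because it appears in fewer documents.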

Maps

Locating an address
  Use a finite-state machine, with probability-based fuzzy matching
Route planning
  Dynamic programming
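
As a rough sketch of the dynamic-programming idea for route planning, the shortest distance from a city to the goal is the best of "distance to a neighbor plus that neighbor's shortest distance to the goal", and each subproblem is solved only once. The road map below is invented, and the sketch assumes the graph has no cycles.

```python
from functools import lru_cache

# Hypothetical road map as a directed acyclic graph: city -> {neighbor: distance}.
ROADS = {
    "A": {"B": 4, "C": 2},
    "B": {"D": 5},
    "C": {"B": 1, "D": 8},
    "D": {},
}

def shortest_distance(start, goal):
    """Shortest distance from start to goal, or infinity if goal is unreachable."""
    @lru_cache(maxsize=None)
    def best(city):
        if city == goal:
            return 0
        # Reuse the memoized answer for each neighbor instead of re-exploring it.
        options = [dist + best(nxt) for nxt, dist in ROADS[city].items()]
        return min(options, default=float("inf"))
    return best(start)

print(shortest_distance("A", "D"))
```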

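Classifying news with feature vectors

Information fingerprints

Storing and converting URLs in a web crawler

Convert the string into a positive integer
Convert that into a pseudo-random number of a fixed length (Mersenne Twister algorithm)
An information fingerprint is irreversible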
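
A minimal sketch of those two steps, assuming Python's standard `random` module (whose generator is a Mersenne Twister) stands in for the pseudo-random number generator. The function name and the 128-bit length are illustrative choices.

```python
import random

def fingerprint(url, bits=128):
    """Map a URL string to a fixed-length pseudo-random integer."""
    # Step 1: treat the string's bytes as one large non-negative integer.
    as_integer = int.from_bytes(url.encode("utf-8"), "big")
    # Step 2: seed a Mersenne Twister with it and take a fixed-length output.
    return random.Random(as_integer).getrandbits(bits)

a = fingerprint("http://example.com/page?id=1")
b = fingerprint("http://example.com/page?id=2")
print(hex(a), hex(b))
```

A crawler can then store the fixed-size fingerprint instead of the full URL. The Mersenne Twister is not a cryptographic generator, so this sketch shows the fixed-length mapping only and is not a guarantee of irreversibility.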

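Maximum entropy

The maximum entropy principle says that when we predict the probability distribution of a random event, the prediction should satisfy all known conditions and make no subjective assumptions about what is unknown.
For any set of information that does not contradict itself, the maximum entropy model exists and is unique (Csiszár)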
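
A tiny illustration of the principle, using a six-sided die about which nothing is known except that the probabilities sum to one. Under that single constraint the uniform distribution has the highest entropy. The skewed distribution is an arbitrary example chosen for comparison.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = [1 / 6] * 6  # no extra assumptions about any face
skewed = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]  # assumes, without evidence, that one face is favored

print(entropy_bits(uniform), entropy_bits(skewed))
```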

Bloom filter

Checking whether an item is already in a set (duplicate detection)
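
A minimal Bloom filter sketch for duplicate detection. The bit-array size, the number of hash functions, and the use of salted SHA-256 digests as the hash functions are all assumptions for illustration. A Bloom filter can report false positives but never false negatives.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        """Yield one bit position per hash function for the given string."""
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True only if every position is set; may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

seen = BloomFilter()
seen.add("http://example.com/a")
print("http://example.com/a" in seen)
```

Compared with storing every item in a hash set, the filter needs only a fixed number of bits, at the cost of occasional false positives.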

Automatic text classification

Expectation-maximization (EM) algorithm
