jack-cherish / machine-learning


:zap: Machine Learning in Action (Python 3): kNN, decision trees, naive Bayes, logistic regression, SVM, linear regression, tree regression

Home Page: https://cuijiahua.com/blog/ml/

Python 100.00%
adaboost adaboost-algorithm decision-tree knn logistic machine-learning navie-bayes-algorithm python python3 regression smo svm

machine-learning's Introduction

machine-learning's People

Contributors

cugtyt, eecn, jack-cherish, jiangf13, youngmstudio


machine-learning's Issues

A question about the naive Bayes code

Hello. In your bayes.py, at line 127, the element-wise multiplication does not look right. It should multiply the corresponding elements of vec2Classify and p1Vec and then sum the results, because vec2Classify is a 0/1 vector where 0 means a word is absent and 1 means it is present. When computing this value, only words that actually appear should contribute their p(W_N|1); absent words should not be counted.
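
For reference, a minimal sketch of the classification step described above (classify_nb_sketch is an illustrative name, not the repository's classifyNB; it assumes p0Vec and p1Vec already hold log-probabilities):

import numpy as np

def classify_nb_sketch(vec2Classify, p0Vec, p1Vec, pClass1):
    # vec2Classify is a 0/1 vector, so the element-wise product keeps only
    # the log-probabilities of words that actually occur; summing them
    # (plus the log class prior) gives the log-posterior up to a constant.
    p1 = np.sum(vec2Classify * p1Vec) + np.log(pClass1)
    p0 = np.sum(vec2Classify * p0Vec) + np.log(1.0 - pClass1)
    return 1 if p1 > p0 else 0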

I also have a small question from reading the original book: why doesn't naive Bayes compute the word probabilities in the following way?
For the conditional probability p(W_i|C1), the book only considers the case W_i = 1 and never W_i = 0. Why not compute p(W_i=1|C1) and p(W_i=0|C1) separately and then, for each position in a test vector, pick the factor according to whether W_i is 0 or 1? For example, suppose a vector is (0,1,1,0,0) with class 1.
The computation would then be: p(1|W_1=0, W_2=1, W_3=1, W_4=0, W_5=0) = p(W_1=0|1) * p(W_2=1|1) * ... * p(W_5=0|1) * p(1) / p(W)

Whereas the book computes: p(1|W_2=1, W_3=1) = p(W_2=1|1) * p(W_3=1|1) * p(1) / p(W)
In other words, the book never includes the conditional probability of a word being 0. Why is it done this way?

Thanks!


Sorry, I've figured out my own question: the book effectively treats p(W_1=0|1) as 1, so that factor simply drops out.

'utf-8' codec can't decode byte 0x92 in position 880: invalid start byte

Hi. When I cloned your naive Bayes spam-classification code and ran it, I found that Python 3 decodes files differently from Python 2 and raises the error in the title. I searched online but, being new to Python, couldn't find a working fix. Could you advise? The error occurs at lines 191 and 195 of bayes-modify.py.
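
A common workaround (not necessarily the maintainer's recommended fix) is to open the email samples with a permissive single-byte encoding, since they are not valid UTF-8 under Python 3; read_text_file is just an illustrative helper name:

def read_text_file(path):
    # 0x92 is the Windows-1252 curly apostrophe, which the default UTF-8
    # decoder rejects; decoding as ISO-8859-1 (or passing errors='ignore')
    # avoids the UnicodeDecodeError.
    with open(path, 'r', encoding='ISO-8859-1') as f:
        return f.read()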

English version

Hi, this seems like a nice repo. Do you also have an English version of it?

A possible issue in the naive Bayes code (my take)

The article says that once w is expanded into individual independent features, p(w|ci) can be written as p(w0, w1, ... | ci). Over the whole dataset we should then have p(w0=0) + p(w0=1) = 1.
However, in the actual code, when accumulating the denominator for the per-class p(w0), it adds the total number of word occurrences in that document row, which I believe is wrong. If each feature is treated as independent, the denominator should only be incremented by 1 per document, as in the modified version below.

import numpy as np

def trainNB0(trainMatrix, trainCategory):
    nTrainDocs = len(trainMatrix)
    nWords = len(trainMatrix[0])
    pAbusive = sum(trainCategory) / float(nTrainDocs)
    p0Num = np.zeros(nWords)
    p1Num = np.zeros(nWords)
    p0Denom = 0
    p1Denom = 0
    for i in range(nTrainDocs):
        if trainCategory[i] == 1:
            p1Num += trainMatrix[i]
            # The original code adds the total word count of the document:
            # p1Denom += sum(trainMatrix[i])
            # Here the denominator is incremented by 1 per document instead:
            p1Denom += 1
        else:
            p0Num += trainMatrix[i]
            # p0Denom += sum(trainMatrix[i])
            p0Denom += 1
    p1Vect = p1Num / p1Denom
    p0Vect = p0Num / p0Denom
    return p0Vect, p1Vect, pAbusive
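
A quick sanity check of the modified function above, using a tiny made-up 0/1 document-word matrix (the data is hypothetical and only meant to show the call):

import numpy as np

train_matrix = np.array([[1, 0, 1, 0],
                         [0, 1, 1, 1],
                         [1, 1, 0, 0]])
train_category = [0, 1, 0]  # class label per document
p0_vect, p1_vect, p_abusive = trainNB0(train_matrix, train_category)
print(p0_vect, p1_vect, p_abusive)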

Best feature in the decision tree

I'd like to ask: why is the initial best feature set to -1? Could it be assigned some other value?
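
For what it's worth, -1 works purely as a sentinel: it is not a valid feature index, so it signals "no feature has improved the information gain yet". A minimal sketch of the idea (choose_best_feature and info_gains are illustrative names, not the repository's code):

def choose_best_feature(info_gains):
    # Start from a gain of 0 and an impossible index; any other value that
    # cannot collide with a real feature index would work just as well.
    best_gain, best_feature = 0.0, -1
    for i, gain in enumerate(info_gains):
        if gain > best_gain:
            best_gain, best_feature = gain, i
    return best_feature  # still -1 if no feature gave a positive gain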

Translation

A quick question: would you mind if I translated your repo into English?

A question about gradient descent in logistic regression

Hello, I have another question. In the stochastic gradient descent code there is a line I don't quite understand: h = sigmoid(sum(dataMatrix*weights)). I thought only batch gradient descent should sum over terms, so why does the stochastic version also sum? Any pointers would be appreciated, thanks!
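
One way to see why the sum is there (a minimal sketch, not the repository's exact stochastic gradient code): in the per-sample update, dataMatrix[i] is a single example, so sum(dataMatrix[i] * weights) is just the dot product of that one sample with the weights, a scalar, rather than a sum over the whole training set as in the batch version.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def stochastic_grad_ascent_sketch(data, labels, alpha=0.01, passes=150):
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels, dtype=float)
    m, n = data.shape
    weights = np.ones(n)
    for _ in range(passes):
        for i in range(m):
            # dot product of ONE sample with the weights, hence the sum()
            h = sigmoid(np.sum(data[i] * weights))
            error = labels[i] - h
            weights = weights + alpha * error * data[i]
    return weights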

A question about the error rate

Following your approach, I wrote the dating-site test myself and got an 8% error rate, while your blog reports 3%. I then copied this program from your GitHub and ran it directly, and got 4%. What's going on? Thanks!

A few questions about the SMO algorithm

Hello! I have a few questions about the SMO algorithm. I had never implemented an SVM before; your "hand-rolled" walkthrough was great, and I re-derived and re-implemented everything myself. Thanks a lot!
(1) About the question I asked you earlier, that the margins from the support vectors to the hyperplane should be equal: my results are similar to yours and the margins do not look equal. I printed my own model's predictions, and each support vector's distance to the hyperplane is indeed different, sometimes by a lot. I also cloned your SMO code and printed the margins below (one way to double-check this is sketched after this list):
(margin, label, example):
-1.973460528 -1.0 [3.542485, 1.977398]
-1.38201380007 -1.0 [2.114999, -0.004466]
5.08844420727 1.0 [8.127113, 1.274372]
-2.43899604116 -1.0 [4.658191, 3.507396]
5.08764339399 1.0 [8.197181, 1.545132]
4.85883569251 1.0 [7.40786, -0.121961]
4.58800136353 1.0 [6.960661, -0.245353]
3.8990174816 1.0 [6.080573, 0.418886]
-1.8990174816 -1.0 [3.107511, 0.758367]
The margins are not equal and exceed the tolerance; I can't figure this one out.
(2) eta = K11 + K22 - 2*K12. The original SMO paper evaluates the objective at the ends of the feasible interval when eta <= 0; I don't understand why eta <= 0 means alpha lies on the boundary.
(3) I think alphaPairsChanged = 0 should be defined inside the main while loop; with that change, the test code needs only 3 iterations.
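
On point (1), one way to check the numbers above for a linear kernel (margins_from_alphas is an illustrative helper, not code from this repository): recover w from the dual variables and compare the raw functional margin f(x) = w·x + b with the signed geometric distance y·f(x)/||w||. The raw f(x) values naturally differ across support vectors; the quantity the optimality conditions constrain is y·f(x), which should be close to 1 for support vectors with 0 < alpha < C once SMO has converged.

import numpy as np

def margins_from_alphas(alphas, labels, data, b):
    alphas = np.asarray(alphas, dtype=float)
    labels = np.asarray(labels, dtype=float)
    data = np.asarray(data, dtype=float)
    w = (alphas * labels) @ data                 # w = sum_i alpha_i * y_i * x_i
    f = data @ w + b                             # functional margin per sample
    geometric = labels * f / np.linalg.norm(w)   # signed distance to the hyperplane
    return f, geometric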

A question about the full (Platt) SMO

Hi, may I ask: in selectJ, does the error really need to be recomputed with calcEk? If so, could the earlier step that computes and caches the errors be dropped?
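
For context, a minimal sketch of what the error computation does (plain NumPy arrays, linear kernel; calc_error is an illustrative name, not the repository's calcEk): the error depends on all of the current alphas and on b, so any alpha-pair update made since a value was cached can leave the stored error stale, which is presumably why it is recomputed here.

import numpy as np

def calc_error(alphas, labels, data, b, k):
    # E_k = f(x_k) - y_k under the *current* alphas and b
    f_xk = float(np.sum(alphas * labels * (data @ data[k])) + b)
    return f_xk - float(labels[k])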

A step in the SVM code I don't quite understand

In your svm-simple.py, at line 147:
fXi = float(np.multiply(alphas,labelMat).T*(dataMatrix*dataMatrix[i,:].T)) + b
The book says fXi is the predicted class. I'd like to ask why the prediction is computed this way.
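
For reference, that line is the dual form of the SVM decision function, f(x_i) = sum_j alpha_j * y_j * <x_j, x_i> + b: only the support vectors (alpha_j > 0) contribute, and the sign of f(x_i) gives the predicted class. A plain NumPy-array sketch of the same computation for a linear kernel (decision_value is an illustrative name):

import numpy as np

def decision_value(alphas, labels, data, x, b):
    kernel_values = data @ x  # <x_j, x> for every training sample
    return float(np.sum(alphas * labels * kernel_values) + b)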
