apachecn / ailearning

AiLearning: Data Analysis + Machine Learning in Action + Linear Algebra + PyTorch + NLTK + TF2

Home Page: http://ailearning.apachecn.org/

License: Other

Python 93.01% Jupyter Notebook 1.81% Shell 0.05% HTML 0.40% CSS 2.29% Dockerfile 0.01% JavaScript 2.43%
fp-growth apriori mahchine-leaning naivebayes svm adaboost kmeans svd pca logistic

ailearning's People

Contributors

cyrilbois, daskisnow, edxzh, jiangzhonglian, joinb-ai, junxnone, sunfeilong, timgates42, vutting4221, wizardforcel, yhjyh

ailearning's Issues

A small question

Python 3.x is the trend, so why do the Markdown docs shown by default use 2.x? Could they be updated to 3.x?

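AdaBoost for multi-class classification

Hello! Thanks for your code. How should AdaBoost be implemented for multi-class classification? For example, I have 20 classes.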

Logistic regression code: "could not convert string to float:" error when reading the training and test data

Running colic_test() and multi_test() raises "could not convert string to float:".

I tracked the cause down to this code:

for line in f_train.readlines():
    curr_line = line.strip().split('\t')   # on the last line this comes out as [""]
    line_arr = [float(curr_line[i]) for i in range(21)]
    training_set.append(line_arr)
    training_labels.append(float(curr_line[21]))

and:

for line in f_test.readlines():
    num_test_vec += 1
    curr_line = line.strip().split('\t')   # on the last line this comes out as [""]
    #if len(curr_line) == 1: continue
    line_arr = [float(curr_line[i]) for i in range(21)]

I suggest adding a check for whether curr_line is [""] and, if it is, skipping the rest of that iteration.

for line in f_train.readlines():
    curr_line = line.strip().split('\t')
    if len(curr_line) == 1: continue    # a single empty element means a blank line, so skip this iteration
    line_arr = [float(curr_line[i]) for i in range(21)]
    training_set.append(line_arr)
    training_labels.append(float(curr_line[21]))

Tested, and it works.
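
The blank-line guard can also be folded into a small parsing helper. The following is a minimal sketch, not the repository's code: the function name parse_tab_rows and its n_features parameter are hypothetical, and it assumes each data row holds n_features tab-separated values followed by one label (21 features in the snippets above).

```python
def parse_tab_rows(lines, n_features=21):
    """Parse tab-separated rows into (features, labels), skipping blank lines."""
    features, labels = [], []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            # A trailing newline at the end of the file yields an empty line;
            # float('') would raise "could not convert string to float".
            continue
        fields = stripped.split('\t')
        features.append([float(value) for value in fields[:n_features]])
        labels.append(float(fields[n_features]))
    return features, labels


# Usage with in-memory lines; an open file object can be passed the same way.
sample_lines = ["1.0\t2.0\t1\n", "\n", "3.0\t4.0\t0\n", ""]
sample_features, sample_labels = parse_tab_rows(sample_lines, n_features=2)
```

Testing for an empty stripped line, rather than for len(curr_line) == 1, also keeps the check valid for files that really do have a single column.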

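Naive Bayes question

In the demo, all training samples have the same length. If the training samples are messy and vary in length, training on them seems like it would make the matrix handling awkward. So is it a basic requirement that training samples have the same length and a consistent format? Or was the real data "idealized" for the demo so that it is easier to process and present? Thanks.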
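
If the samples in question are text documents, one common approach is to leave the documents at whatever length they have and make only the feature vectors fixed-length, by indexing every document against a shared vocabulary. The sketch below illustrates that idea under this assumption; the helper names are hypothetical and it is not taken from the repository.

```python
def build_vocabulary(documents):
    """Return a sorted list of every distinct word across all documents."""
    return sorted({word for doc in documents for word in doc})


def set_of_words_vector(vocabulary, document):
    """Map one document to a 0/1 vector with one slot per vocabulary word."""
    present = set(document)
    return [1 if word in present else 0 for word in vocabulary]


# Two documents of different lengths become vectors of the same length.
docs = [["my", "dog"], ["stop", "my", "stupid", "dog", "now"]]
vocab = build_vocabulary(docs)
vectors = [set_of_words_vector(vocab, doc) for doc in docs]
```

Here both vectors have len(vocab) entries, so they can be stacked into a training matrix even though the documents contain 2 and 5 words.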

The code doesn't even run...

In MachineLearning/src/py3.x/5.Logistic/logistic.py:

            # random.uniform(x, y) returns a random real number in the range [x, y]; x is the minimum of the range and y is the maximum.
            rand_index = int(np.random.uniform(0, len(data_index)))
            h = sigmoid(np.sum(data_mat[dataIndex[randIndex]] * weights))
            error = class_labels[dataIndex[randIndex]] - h
            weights = weights + alpha * error * data_mat[dataIndex[randIndex]]
            del(data_index[rand_index])

In this snippet, dataIndex and randIndex are the wrong names (the surrounding lines use data_index and rand_index).
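
With the names made consistent, the update loop quoted above might look like the following. This is a dependency-free sketch rather than the repository's function: it uses plain lists instead of NumPy arrays, and the function signature, the step-size schedule, and the seed parameter are assumptions added for illustration.

```python
import math
import random


def sigmoid(x):
    """Logistic function for a scalar, written to avoid overflow in exp()."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def stoc_grad_ascent(data_rows, class_labels, num_iter=150, seed=0):
    """Stochastic gradient ascent; each row is used once per pass, in random order."""
    rng = random.Random(seed)
    weights = [1.0] * len(data_rows[0])
    for j in range(num_iter):
        data_index = list(range(len(data_rows)))
        for i in range(len(data_rows)):
            alpha = 4.0 / (1.0 + j + i) + 0.01  # decaying step size (assumed schedule)
            rand_index = rng.randrange(len(data_index))
            row = data_rows[data_index[rand_index]]
            label = class_labels[data_index[rand_index]]
            h = sigmoid(sum(x * w for x, w in zip(row, weights)))
            error = label - h
            weights = [w + alpha * error * x for w, x in zip(weights, row)]
            del data_index[rand_index]  # sample without replacement within a pass
    return weights


# Tiny separable data set; the first column is a constant bias term.
rows = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
labels = [0, 0, 1, 1]
learned = stoc_grad_ascent(rows, labels, num_iter=50)
predicted = [1 if sigmoid(sum(x * w for x, w in zip(r, learned))) > 0.5 else 0
             for r in rows]
```

The key point is that the same index, data_index[rand_index], is used to read both the row and its label before that entry is deleted.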

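K-means workflow: typo in the translation

In Chapter 10, under "K-Means workflow", the original text reads:

First, randomly pick K initial points as the centroids (they are not points from the data).
Then assign every point in the data set to a cluster: specifically, find the centroid nearest to each point and assign the point to that centroid's cluster. Once this step is done, each cluster's centroid is updated to the mean of "说有" [sic] points in that cluster.

It should be changed to:

First, randomly pick K initial points as the centroids (they are not points from the data).
Then assign every point in the data set to a cluster: specifically, find the centroid nearest to each point and assign the point to that centroid's cluster. Once this step is done, each cluster's centroid is updated to the mean of all ("所有") points in that cluster.

Change "说有" to "所有".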
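
The workflow in the quoted passage can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the chapter's implementation: the function names, the squared Euclidean distance, the choice to draw initial centroids uniformly inside the data's bounding box, and the rule that an empty cluster keeps its previous centroid are all choices made here.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    # Step 1: K random centroids inside the bounding box (not data points).
    centroids = [tuple(rng.uniform(lows[d], highs[d]) for d in range(dims))
                 for _ in range(k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are final
        labels = new_labels
        # Step 3: move each centroid to the mean of all points in its cluster.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids, labels


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
centroids, labels = kmeans(points, k=2)
```

On this one-dimensional example the two groups are far apart, so the loop settles on one centroid per group regardless of where the random start lands.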

Image compression

When compressing an image with SVD, why does the final 32*32 output consist entirely of the digit "0"?

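Problem running the code!!

Hello. For the recommender system part, item-based and user-based, how is the program meant to be run? Which files belong together? When I run RS-itemcf.py directly, I get the following error: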
C:\software\Anaconda3\python.exe D:/python/MachineLearning-dev/src/py3.x/16.RecommenderSystems/RS-itemcf.py
File "D:/python/MachineLearning-dev/src/py3.x/16.RecommenderSystems/RS-itemcf.py", line 3
Created on 2015-06-22
^
SyntaxError: invalid syntax

Process finished with exit code 1
I don't know how to fix this. Could someone please help? Many thanks!!!

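A bug in Chapter 10, K-means clustering

The loadDataSet() function at the start of Chapter 10 uses map to process the data. In Python 2, map returns a list directly, but in Python 3 it returns a lazy map object (an iterator) rather than a list; to get the list, wrap the call as list(map(...)).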
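
The difference is easy to see in isolation. The sketch below is an illustrative stand-in for the chapter's loader, not the repository's code; the name load_data_set and the choice to accept any iterable of lines are assumptions.

```python
def load_data_set(lines):
    """Parse tab-separated numeric rows into a list of float lists."""
    data = []
    for line in lines:
        fields = line.strip().split('\t')
        # In Python 3, map(float, fields) is a lazy iterator, so list(...)
        # is needed to materialize the row as an actual list.
        data.append(list(map(float, fields)))
    return data


rows = load_data_set(["1.5\t2.0\n", "3.0\t4.25\n"])
lazy = map(float, ["1.5", "2.0"])
print(type(lazy).__name__)  # map
print(isinstance(rows[0], list))  # True
```

A list comprehension, [float(v) for v in fields], is an equivalent spelling that behaves the same way on Python 2 and Python 3.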

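A (suspected) bug in the Chapter 12 FP-Growth algorithm

def createTree(dataSet, minSup=1):
    ...
            orderedItems = [v[0] for v in sorted(localD.items(), key=lambda p: p[1], reverse=True)]
    ...

This sorts using only the values of localD as the key, so when two items have the same count, e.g. ('c', 3) and ('d', 3), the result is not deterministic.
I don't know why this never went wrong under Python 2,
but after porting to Python 3 it produces random results.
fix: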

orderedItems = [v[0] for v in sorted(localD.items(), key=lambda p: (p[1],p[0]), reverse=True)]
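
The tie-breaking idea can be checked on its own. Because sorted() is stable, items with equal counts keep whatever order the dictionary happened to yield, so adding the item itself as a secondary key removes that dependence. The sketch below is a standalone illustration with a hypothetical helper name; note that it orders ties by ascending item name, whereas the fix above (with reverse=True) orders them descending. Either choice is deterministic.

```python
def order_items(local_counts):
    """Order items by count (descending), breaking ties by item name (ascending)."""
    return [item for item, _ in
            sorted(local_counts.items(), key=lambda p: (-p[1], p[0]))]


# The same counts inserted in two different orders give the same result.
first = order_items({'d': 3, 'c': 3, 'a': 5})
second = order_items({'c': 3, 'd': 3, 'a': 5})
```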

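Video production details

When recording the tutorial videos, could you scroll the screen using only the keyboard and avoid the mouse wheel?
The presenter may scroll without thinking, but for viewers the screen shakes so much that they can only make out the rough layout, with no chance of actually reading the text on screen.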

1 syntax error

flake8 testing of https://github.com/apachecn/AiLearning on Python 3.7.0

$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

./src/py3.x/dl/perceptron.py:192:22: F821 undefined name 'train_and_perceptron'
    and_perceptron = train_and_perceptron()
                     ^
./src/py3.x/16.RecommenderSystems/test_基于用户.py:74:28: F821 undefined name 'u'
    for v, wuv in sorted(W[u].items, key=itemgetter(1), reverse=True)[0:K]:
                           ^
./src/py3.x/16.RecommenderSystems/test_基于用户.py:74:73: F821 undefined name 'K'
    for v, wuv in sorted(W[u].items, key=itemgetter(1), reverse=True)[0:K]:
                                                                        ^
./src/py3.x/16.RecommenderSystems/test_lfm.py:12:16: F821 undefined name 'items_pool'
        item = items_pool[random.randint(0, len(items_pool) - 1)]
               ^
./src/py3.x/16.RecommenderSystems/test_lfm.py:12:49: F821 undefined name 'items_pool'
        item = items_pool[random.randint(0, len(items_pool) - 1)]
                                                ^
./src/py3.x/16.RecommenderSystems/test_lfm.py:23:14: F821 undefined name 'InitModel'
    [P, Q] = InitModel(user_items, F)
             ^
./src/py3.x/16.RecommenderSystems/test_lfm.py:26:23: F821 undefined name 'RandSelectNegativeSamples'
            samples = RandSelectNegativeSamples(items)
                      ^
./src/py3.x/16.RecommenderSystems/test_lfm.py:28:29: F821 undefined name 'Predict'
                eui = rui - Predict(user, item)
                            ^
./src/py3.x/16.RecommenderSystems/test_基于物品.py:10:18: F821 undefined name 'users'
        for i in users:
                 ^
./src/py3.x/16.RecommenderSystems/test_基于物品.py:12:22: F821 undefined name 'users'
            for j in users:
                     ^
./src/py3.x/16.RecommenderSystems/test_基于物品.py:21:18: F821 undefined name 'v'
            W[u][v] = cij / math.sqrt(N[i] * N[j])
                 ^
./src/py3.x/16.RecommenderSystems/test_基于物品.py:30:18: F821 undefined name 'users'
        for i in users:
                 ^
./src/py3.x/16.RecommenderSystems/test_基于物品.py:32:22: F821 undefined name 'users'
            for j in users:
                     ^
./src/py3.x/16.RecommenderSystems/test_基于物品.py:41:18: F821 undefined name 'v'
            W[u][v] = cij / math.sqrt(N[i] * N[j])
                 ^
./src/py3.x/16.RecommenderSystems/test_evaluation_model.py:22:16: F821 undefined name 'GetRecommendation'
        rank = GetRecommendation(user, N)
               ^
./src/py3.x/16.RecommenderSystems/test_evaluation_model.py:36:16: F821 undefined name 'GetRecommendation'
        rank = GetRecommendation(user, N)
               ^
./src/py3.x/16.RecommenderSystems/test_evaluation_model.py:51:16: F821 undefined name 'GetRecommendation'
        rank = GetRecommendation(user, N)
               ^
./src/py3.x/16.RecommenderSystems/test_evaluation_model.py:68:16: F821 undefined name 'GetRecommendation'
        rank = GetRecommendation(user, N)
               ^
./src/py3.x/nlp/6.LDA/demo.py:13:14: F821 undefined name 'corpora'
dictionary = corpora.Dictionary(train)
             ^
./src/py2.x/dl/perceptron.py:62:68: E999 SyntaxError: invalid syntax
        return self.activator(reduce(lambda a, b: a + b,map(lambda (x, w): x * w, zip(input_vec, self.weights)), 0.0) + self.bias)
                                                                   ^
1     E999 SyntaxError: invalid syntax
19    F821 undefined name 'GetRecommendation'
20
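
The single E999 above comes from the expression lambda (x, w): x * w. Tuple parameters in function signatures were removed in Python 3, and reduce is no longer a builtin there (it lives in functools). A hedged sketch of a Python 3 spelling of the same weighted sum follows; the function names are hypothetical, and the activator call from the original line is left out.

```python
from functools import reduce


def weighted_sum(input_vec, weights, bias):
    """Python 3 form of the reduce/map expression: sum of x * w, plus bias."""
    return reduce(lambda a, b: a + b,
                  # The lambda takes one (x, w) pair and indexes into it,
                  # since tuple unpacking in the parameter list is a syntax error.
                  map(lambda xw: xw[0] * xw[1], zip(input_vec, weights)),
                  0.0) + bias


def weighted_sum_simple(input_vec, weights, bias):
    """Equivalent spelling with a generator expression."""
    return sum(x * w for x, w in zip(input_vec, weights)) + bias


result = weighted_sum([1, 2], [3, 4], 0.5)
print(result)  # 11.5
```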

In def file2matrix()

When I run "datingDataMat, datingLabels = KNN.file2matrix('datingTestSet2.txt')",
it shows:
ValueError: could not convert string to float:
This confuses me. Thanks for your answer.

A question about the rulesFromConseq function in the Apriori algorithm

For this function:

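def rulesFromConseq(freqSet, H, supportData, brl, minConf=0.6):
    # Parameters: a frequent itemset, and H, the list of elements that may appear on the right-hand side of a rule
    m = len(H[0])
    if (len(freqSet) > (m + 1)):  # the frequent itemset has more than m + 1 elements
        Hmp1 = aprioriGen(H, m+1)  # merge sets that share a common part (same elements in a different order count as one set)
        Hmp1 = calcConf(freqSet, Hmp1, supportData, brl, minConf)  # compute the confidence
        if (len(Hmp1) > 1):    # more than one rule meets the minimum confidence, so recurse
            rulesFromConseq(freqSet, Hmp1, supportData, brl, minConf)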

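I think this misses, for 3-itemsets and larger itemsets, association rules of the form {}-->{1}, i.e. rules whose right-hand side contains only one element.
It should be changed to this:

def rulesFromConseq(freqSet, H, supportData, brl, minConf=0.6):
    # Parameters: a frequent itemset, and H, the list of elements that may appear on the right-hand side of a rule
    m = len(H[0])
    if m == 1:  # added: also evaluate rules with a single-element right-hand side
        calcConf(freqSet, H, supportData, brl, minConf)
    if (len(freqSet) > (m + 1)):  # the frequent itemset has more than m + 1 elements
        Hmp1 = aprioriGen(H, m+1)  # merge sets that share a common part (same elements in a different order count as one set)
        Hmp1 = calcConf(freqSet, Hmp1, supportData, brl, minConf)  # compute the confidence
        if (len(Hmp1) > 1):    # more than one rule meets the minimum confidence, so recurse
            rulesFromConseq(freqSet, Hmp1, supportData, brl, minConf)

Could you check whether my reasoning is correct?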
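
The effect of the proposed m == 1 branch can be tried on a toy example. The sketch below is self-contained and is not the repository's code: calc_conf and grow_consequents are simplified stand-ins for calcConf and aprioriGen, the support values are made up for illustration, and rules are stored as (antecedent, consequent, confidence) tuples.

```python
from itertools import combinations


def calc_conf(freq_set, consequents, support, rules, min_conf):
    """Keep consequents whose rule (freq_set - conseq) --> conseq meets min_conf."""
    kept = []
    for conseq in consequents:
        conf = support[freq_set] / support[freq_set - conseq]
        if conf >= min_conf:
            rules.append((freq_set - conseq, conseq, conf))
            kept.append(conseq)
    return kept


def grow_consequents(consequents, size):
    """Stand-in for aprioriGen: unions of two consequents that have `size` items."""
    merged = {a | b for a, b in combinations(consequents, 2) if len(a | b) == size}
    return sorted(merged, key=sorted)


def rules_from_conseq(freq_set, consequents, support, rules, min_conf=0.6):
    m = len(consequents[0])
    if m == 1:
        # The change proposed in the issue: score single-item consequents too.
        calc_conf(freq_set, consequents, support, rules, min_conf)
    if len(freq_set) > m + 1:
        bigger = grow_consequents(consequents, m + 1)
        bigger = calc_conf(freq_set, bigger, support, rules, min_conf)
        if len(bigger) > 1:
            rules_from_conseq(freq_set, bigger, support, rules, min_conf)


# Made-up supports for the itemset {a, b, c} and all of its subsets.
support = {frozenset(k): v for k, v in {
    'a': 0.8, 'b': 0.6, 'c': 0.6, 'ab': 0.6, 'ac': 0.4, 'bc': 0.4, 'abc': 0.4,
}.items()}
rules = []
rules_from_conseq(frozenset('abc'), [frozenset(x) for x in 'abc'], support, rules)
found = {(antecedent, consequent) for antecedent, consequent, _ in rules}
```

With the m == 1 branch in place, the three rules with a one-item right-hand side (for example {a, b} --> {c}) show up in found; without it, only rules with a two-item right-hand side would be produced for this 3-itemset.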

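"/src/py3.x/ml/4.NaiveBayes/bayes.py" #line164 question

Quoting line 164 of that file: "# Can be understood as 1. the probability that the file is in the good class, given that the word is in the vocabulary; or as 2. over the whole space, the probability that the file is both in the vocabulary and in the good class"

Hello, I have a question about what p1Vec means here. I think it should be: given that the file is in the good class, the probability that the word at the corresponding position (index) appears in the document.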
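
One way to examine this reading is to compute such a vector by hand. The sketch below is an assumption about how a vector like p1Vec is typically estimated in a naive Bayes trainer (per-class word counts divided by the total word count of that class, with no smoothing); it is not copied from bayes.py, and the function name is hypothetical.

```python
def train_nb(train_matrix, labels):
    """Estimate P(word_i | class) for classes 0 and 1, plus P(class = 1)."""
    n_words = len(train_matrix[0])
    counts = {0: [0] * n_words, 1: [0] * n_words}
    totals = {0: 0, 1: 0}
    for row, label in zip(train_matrix, labels):
        for i, value in enumerate(row):
            counts[label][i] += value  # occurrences of word i within this class
        totals[label] += sum(row)      # all word occurrences within this class
    p0_vec = [c / totals[0] for c in counts[0]]
    p1_vec = [c / totals[1] for c in counts[1]]
    p_class1 = sum(labels) / len(labels)
    return p0_vec, p1_vec, p_class1


# Three documents over a three-word vocabulary; the first two belong to class 1.
matrix = [[1, 1, 0], [0, 1, 1], [1, 0, 0]]
p0_vec, p1_vec, p_class1 = train_nb(matrix, [1, 1, 0])
```

Under this estimate every entry of p1_vec is conditioned on the class: the entries sum to 1 within class 1, which matches the interpretation given in the question rather than a joint probability over the whole space.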

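A problem in the logistic regression example code

Ascent or descent?

https://github.com/apachecn/MachineLearning/blob/master/src/py2.x/ML/5.Logistic/logistic.py contains two methods:

Regular gradient ascent
https://github.com/apachecn/MachineLearning/blob/f727eda8150e26cea0425b0bb17b0badb67d5b01/src/py2.x/ML/5.Logistic/logistic.py#L54

Stochastic gradient descent
https://github.com/apachecn/MachineLearning/blob/f727eda8150e26cea0425b0bb17b0badb67d5b01/src/py2.x/ML/5.Logistic/logistic.py#L100

The only real difference between these two functions is that the former updates w using the full data set, while the latter updates w using a single sample.
So where is the difference between ascent and descent here? Is the comment wrong?

If possible, I think the formulas for the ascent method and the descent method should each be explained, which would also make it easier to tell whether the following code is ascent or descent: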

weights = weights + alpha * dataMatrix.transpose() * error
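
On the ascent-or-descent question: for logistic regression, gradient ascent on the log-likelihood l(w) and gradient descent on the loss J(w) = -l(w) give the same update, since w + alpha * X^T (y - h) equals w - alpha * X^T (h - y). The sketch below checks this numerically for one batch step. The function names are hypothetical, and plain lists stand in for the matrix code quoted above.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood_grad(rows, labels, w):
    """Gradient of l(w) = sum(y*log(h) + (1-y)*log(1-h)), which is X^T (y - h)."""
    grad = [0.0] * len(w)
    for row, y in zip(rows, labels):
        h = sigmoid(sum(x * wi for x, wi in zip(row, w)))
        for i, x in enumerate(row):
            grad[i] += (y - h) * x
    return grad


def ascent_step(rows, labels, w, alpha):
    """Gradient ascent on the log-likelihood: w <- w + alpha * grad l(w)."""
    grad = log_likelihood_grad(rows, labels, w)
    return [wi + alpha * gi for wi, gi in zip(w, grad)]


def descent_step(rows, labels, w, alpha):
    """Gradient descent on the loss J = -l: w <- w - alpha * grad J(w)."""
    loss_grad = [-gi for gi in log_likelihood_grad(rows, labels, w)]
    return [wi - alpha * gi for wi, gi in zip(w, loss_grad)]


rows = [[1.0, 2.0], [1.0, -1.0]]
labels = [1, 0]
start = [0.0, 0.0]
after_ascent = ascent_step(rows, labels, start, alpha=0.1)
after_descent = descent_step(rows, labels, start, alpha=0.1)
```

Read this way, the line weights = weights + alpha * dataMatrix.transpose() * error is ascent on the log-likelihood when error is defined as label minus prediction; calling the single-sample variant "descent" changes the objective's sign convention, not the arithmetic.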
