apachecn / hands-on-ml-zh

:book: [Translation] Hands-On Machine Learning with Sklearn and TensorFlow [site taken offline over copyright issues!!]

hands-on-ml-zh's Issues

The second edition is out

Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow: the second English edition is now on sale. Anyone up for another round of translation?

https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/

Chapter 4

theta_best = np.linalg.inv(X_b.T.dot(X_B)).dot(X_b.T).dot(y)

The .dot(X_B) here should be .dot(X_b).

English AZW3 (for cross-reference)

https://github.com/iamseancheney/pythonbooks/raw/master/Hands-On%20Machine%20Learning%20with%20-%20Aurelien%20Geron.azw3

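“gradients = 2 * xi.T.dot(xi,dot(theta)-yi)” should be “gradients = 2 * xi.T.dot(xi.dot(theta)-yi)”
“sgd_reg + SGDRregressor(n_iter=50, penalty=None, eta0=0.1)” should be “sgd_reg = SGDRegressor(n_iter=50, penalty=None, eta0=0.1)”
“训练过程使用的代价函数和测试过程使用的评价函数不一样样的。” should end with “评价函数是不一样的。” (the character 样 is duplicated and 是 is missing; the sentence says that the cost function used in training and the evaluation function used in testing are not the same)
“如我定义” should be “如果定义” or “如果我们定义” ("if we define")
“去增加了模型的偏差” should be “却增加了模型的偏差” ("but increases the model's bias")
“对线性回归来说，对于岭回归，我们可以使用封闭方程去计算，也可以使用梯度下降去处理。” should be “就像进行线性回归那样，对于岭回归的处理，我们既可以使用封闭方程去计算，也可以使用梯度下降去处理。” ("As with linear regression, ridge regression can be solved either with the closed-form equation or with gradient descent.")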
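
For context, the corrected gradient line from the first erratum fits into a stochastic gradient descent loop roughly as follows. This is only a sketch: the toy data, the seed, the epoch count and the learning-schedule constants `t0` and `t1` are assumptions chosen for illustration, not values confirmed in this thread.

```python
import numpy as np

np.random.seed(42)  # assumed seed, only so that reruns give the same numbers

# Toy data (assumed): y = 4 + 3x + Gaussian noise
m = 100
X = 2 * np.random.rand(m, 1)
y = 4 + 3 * X + np.random.randn(m, 1)
X_b = np.c_[np.ones((m, 1)), X]  # prepend the bias column x0 = 1

n_epochs = 50
t0, t1 = 5, 50  # assumed learning-schedule hyperparameters


def learning_schedule(t):
    """Shrink the step size as training progresses."""
    return t0 / (t + t1)


theta = np.random.randn(2, 1)  # random initialisation

for epoch in range(n_epochs):
    for i in range(m):
        random_index = np.random.randint(m)
        xi = X_b[random_index:random_index + 1]
        yi = y[random_index:random_index + 1]
        # xi.dot(theta) with a period; the comma in the erratum is the typo
        gradients = 2 * xi.T.dot(xi.dot(theta) - yi)
        eta = learning_schedule(epoch * m + i)
        theta = theta - eta * gradients

print(theta.ravel())
```

With the comma typo, `dot` is looked up as a bare name instead of a method of `xi`, so the line fails before any gradient is computed.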

A small error in Chapter 2, "End-to-End Machine Learning Project"

In the "Training and Evaluating on the Training Set" step:

from sklearn.metrics import mean_squared_error
housing_predictions = lin_reg.predict(housing_prepared)
lin_mse = mean_squared_error(some_labels, housing_predictions)
lin_rmse = np.sqrt(lin_mse)
lin_rmse
68628.413493824875
some_labels should be housing_labels.
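
Whichever variable name is right depends on which rows were passed to predict(): the labels have to cover exactly the same rows as the predictions. The sketch below uses a hypothetical `rmse` helper (not part of the book's code) that makes this requirement explicit with plain NumPy.

```python
import numpy as np


def rmse(labels, predictions):
    """Root mean squared error; labels and predictions must describe the same rows."""
    labels = np.asarray(labels, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    if labels.shape != predictions.shape:
        raise ValueError(
            "labels %s and predictions %s differ in shape"
            % (labels.shape, predictions.shape)
        )
    return float(np.sqrt(np.mean((labels - predictions) ** 2)))


# Same number of rows: fine
print(rmse([3.0, 5.0], [1.0, 5.0]))

# A five-row sample of labels against predictions for two rows: rejected
try:
    rmse([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 2.0])
except ValueError as err:
    print("mismatch:", err)
```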

A code error in Chapter 2?

The text has: from pandas.tools.plotting import scatter_matrix

pandas has since been updated, and the import is now from pandas.plotting import scatter_matrix

Please confirm.
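
If the same script has to run on both old and new pandas releases, one possible pattern is to try the current import path first and fall back to the old one. This is a sketch, not something proposed in the thread.

```python
try:
    # Location in current pandas releases
    from pandas.plotting import scatter_matrix
except ImportError:
    # Location used by the book's original text (older pandas releases)
    from pandas.tools.plotting import scatter_matrix

print(scatter_matrix.__name__)
```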

Thanks for all the hard work. Is there no translation of Appendices A/B?

Is there no translation of Appendices A/B?

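Image display problem in Chapter 12 of the PDF

The images in Chapter 12 of the downloaded PDF do not display. I downloaded it with Safari and opened it with the Preview app that ships with macOS. See the screenshot.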

Where can I find the answers to the end-of-chapter exercises?

As the title says. Thanks in advance for any answers.

A formula error in the "Softmax Regression" section of Chapter 4

The error:

The original text reads:

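Chapter 3 errata

对分类器来说，一个好得多的性能评估指标是混淆矩阵。大体思路是：输出类别A被分类成类别 B 的次数。 ("A much better way to evaluate a classifier's performance is the confusion matrix. The general idea is to output the number of times class A was classified as class B.")
This sentence should be translated as: 类别为A的示例被错分类为类别B的次数。 ("the number of times instances of class A are misclassified as class B")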
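
The wording matters because each cell counts instances, not outputs. A minimal sketch with made-up labels, where rows are actual classes and columns are predicted classes (`confusion_counts` is a hypothetical helper, not the book's code):

```python
import numpy as np


def confusion_counts(y_true, y_pred, n_classes):
    """counts[a, b] = number of instances of actual class a predicted as class b."""
    counts = np.zeros((n_classes, n_classes), dtype=int)
    for actual, predicted in zip(y_true, y_pred):
        counts[actual, predicted] += 1
    return counts


# Made-up digit labels for illustration
y_true = [5, 5, 3, 5, 3]
y_pred = [3, 5, 3, 3, 5]

cm = confusion_counts(y_true, y_pred, n_classes=10)
print("5s misclassified as 3s:", cm[5, 3])
print("3s misclassified as 5s:", cm[3, 5])
```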

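A small translation error in Chapter 5, Support Vector Machines

In the note under "Training Objective":

（因为最小化w值和b值，也是最小化该值一半的平方）

The original is (since the values of w and b that minimize a value also minimize half of its square),
so it should be 该值平方的一半 ("half of the square of that value"), not 该值一半的平方 ("the square of half of that value").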
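
To spell out the distinction, write v(w, b) for the value being minimized and assume it is non-negative (as a norm is). The symbol v is introduced here only for illustration.

```latex
% "Half of its square" versus "the square of its half":
\tfrac{1}{2}\,v^{2} \;\neq\; \left(\tfrac{v}{2}\right)^{2} = \tfrac{1}{4}\,v^{2}
\quad\text{whenever } v \neq 0 .

% Why the minimizer is unchanged: t \mapsto \tfrac{1}{2}t^{2} is strictly
% increasing for t \ge 0, so for non-negative v
\arg\min_{\mathbf{w},\,b}\; v(\mathbf{w}, b)
\;=\;
\arg\min_{\mathbf{w},\,b}\; \tfrac{1}{2}\, v(\mathbf{w}, b)^{2} .
```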

Is prepare_country_stats in the Chapter 1 code example a user-defined function?

Code snippet:

    # Prepare the data
    country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)
    X = np.c_[country_stats["GDP per capita"]]
    y = np.c_[country_stats["Life satisfaction"]]

Where does prepare_country_stats come from? Or is it just an illustration, meaning I have to join the two tables myself with np?
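
The snippet calls prepare_country_stats without importing it, and it is not a NumPy or pandas function, so it has to be defined by the reader or by accompanying code that is not shown in this thread. The version below is a hypothetical, simplified stand-in: the column names follow the snippet, while the toy tables and the join on a `Country` column are assumptions.

```python
import numpy as np
import pandas as pd


def prepare_country_stats(oecd_bli, gdp_per_capita):
    """Hypothetical helper: join the two tables on country and keep two columns."""
    merged = pd.merge(oecd_bli, gdp_per_capita, on="Country")
    merged = merged.sort_values(by="GDP per capita")
    return merged.set_index("Country")[["GDP per capita", "Life satisfaction"]]


# Toy tables standing in for the real data files (values are made up)
oecd_bli = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Life satisfaction": [6.0, 7.3, 5.1],
})
gdp_per_capita = pd.DataFrame({
    "Country": ["B", "C", "A"],
    "GDP per capita": [50000.0, 9000.0, 20000.0],
})

country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)
X = np.c_[country_stats["GDP per capita"]]     # column vector of features
y = np.c_[country_stats["Life satisfaction"]]  # column vector of targets
print(X.shape, y.shape)
```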

2.一个完整的机器学习项目.md: a wrong variable name

In the "Better Evaluation Using Cross-Validation" subsection:

>>> def display_scores(scores):
...     print("Scores:", scores)
...     print("Mean:", scores.mean())
...     print("Standard deviation:", scores.std())
...
>>> display_scores(tree_rmse_scores)

In the last line, “tree_rmse_scores” should be “rmse_scores”.
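
The name passed to display_scores has to match whatever the preceding code assigned. A self-contained sketch, in which the negative MSE values are made up and stand in for the output of a cross-validation run:

```python
import numpy as np


def display_scores(scores):
    print("Scores:", scores)
    print("Mean:", scores.mean())
    print("Standard deviation:", scores.std())


# Hypothetical negative-MSE scores, one per fold
neg_mse_scores = np.array([-4.0, -9.0, -16.0])

# Flip the sign, then take the square root to get one RMSE per fold
rmse_scores = np.sqrt(-neg_mse_scores)

# The argument must be the variable defined just above
display_scores(rmse_scores)
```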

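Part of the code in Chapter 2, "Training and Evaluating on the Training Set", is wrong

Training and Evaluating on the Training Set

It works, although the predictions are not very accurate (for example, the second prediction is off by 50%!). Let's use Scikit-Learn's mean_squared_error function to compute this regression model's RMSE on the whole training set: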

from sklearn.metrics import mean_squared_error
housing_predictions = lin_reg.predict(housing_prepared)
lin_mse = mean_squared_error(housing_labels, housing_predictions)

The last line should be lin_mse = mean_squared_error(some_labels, housing_predictions)

Chapter 4: "Normal Equation" would be better translated as 正规方程

A question about encoding text features in "End-to-End Machine Learning Project"

Why do sklearn's LabelEncoder() and pandas' factorize() give different results?

from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
housing_cat = housing["ocean_proximity"]
housing_cat_encoded1 = encoder.fit_transform(housing_cat)
housing_cat_encoded2, housing_categories = housing_cat.factorize()
housing_cat_encoded1[:10]
housing_cat_encoded2[:10]

Why do the values of housing_cat_encoded1 range over 0-4, while those of housing_cat_encoded2 range over 0-2?
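
A likely explanation is that the two encoders number the categories differently: LabelEncoder assigns codes in sorted order of the class names, while factorize() by default assigns codes in order of first appearance. The sketch below mimics both schemes with plain NumPy and Python rather than calling either library; the five category names and the sample rows are assumptions for illustration.

```python
import numpy as np

# Assumed full set of categories in the column
all_categories = ["<1H OCEAN", "INLAND", "ISLAND", "NEAR BAY", "NEAR OCEAN"]

# Assumed first few rows of the column
sample = ["<1H OCEAN", "<1H OCEAN", "NEAR OCEAN", "INLAND", "<1H OCEAN"]

# Scheme 1 (sorted vocabulary, as LabelEncoder does): code = rank of the name
sorted_vocab = np.array(sorted(all_categories))
sorted_codes = np.searchsorted(sorted_vocab, sample)

# Scheme 2 (first appearance, as factorize() does by default)
first_seen = {}
appearance_codes = np.array(
    [first_seen.setdefault(name, len(first_seen)) for name in sample]
)

print("sorted-order codes:    ", sorted_codes)
print("first-appearance codes:", appearance_codes)
```

Under these assumptions the first rows only contain three distinct categories, so the first-appearance codes stay within 0-2 while the sorted-order codes can already reach 4.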

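The translator's note is somewhat off

"Using the LabelEncoder transformer to convert a text feature column, as the original book does, is wrong; this transformer should only be used to convert labels (as its name suggests). The reason LabelEncoder does not fail here is that the data has only one text feature column; with multiple text feature columns it would fail."

This is the translator's note in Chapter 2.
The LabelEncoder transformer itself requires its argument to be a Series, not a DataFrame, so this warning does not add much.
Of course, it is fine to include such a remark as a reading note, but it does not mean the original author's usage was wrong.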

Variable name typo in Chapter 3

Above Figure 4.2, theta.best should be theta_best

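A small error in the linear regression part of Chapter 3

theta_best = np.linalg.inv(X_b.T.dot(X_B)).dot(X_b.T).dot(y)
This line of code should be
theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)

Also, because no random seed was set for np.random beforehand, the generated x and y differ from one run to the next, so the sentence "we would have hoped for \theta_0=4, \theta_1=3 rather than \theta_0=3.865, \theta_1=3.139" will inevitably disagree with what readers see when they run the code themselves. There is no need to note the translator's own results.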
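
Both points can be seen in a short, self-contained run. The data-generating lines and the seed value are assumptions for illustration; only the theta_best line comes from the issue.

```python
import numpy as np

np.random.seed(42)  # assumed seed; without one, every run prints different values

# Assumed toy data: y = 4 + 3x + Gaussian noise
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

X_b = np.c_[np.ones((100, 1)), X]  # add x0 = 1 to each instance

# Normal equation, with X_b spelled consistently (X_B would raise NameError)
theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)

print(theta_best.ravel())  # close to, but not exactly, 4 and 3 because of the noise
```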

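Typo

Chapter 3

Confusion Matrix
对分类器来说，一个好得多的性能评估指标是混淆矩阵。大体思路是：输出类别A被分类成类别 B 的次数。举个例子，为了知道分类器将 5 误分为 3 的次数，你需要查看混淆矩阵的第五航第三列 ("... For example, to know how many times the classifier confused 5s with 3s, you would look in the fifth row and third column of the confusion matrix.")

Should be: 第五航第三列 -> 第五行第三列 (航 is a typo for 行, "row")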

Error when using the TensorFlow high-level API in Chapter 10

I typed in the code as given in the tutorial:

import tensorflow as tf
import numpy as np
import os
from sklearn.metrics import accuracy_score
from tensorflow.examples.tutorials.mnist import input_data

### TensorFlow log level settings, to avoid red warnings when running the file
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
old_v = tf.logging.get_verbosity()
tf.logging.set_verbosity(tf.logging.ERROR)

(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
X_train = X_train.astype(np.float32).reshape(-1, 28 * 28) / 255.0
X_test = X_test.astype(np.float32).reshape(-1, 28 * 28) / 255.0
y_train = y_train.astype(np.int32)
y_test = y_test.astype(np.int32)

X_valid, X_train = X_train[:5000], X_train[5000:]
y_valid, y_train = y_train[:5000], y_train[5000:]

feature_cols = [tf.feature_column.numeric_column("X", shape=[28 * 28])]
# The code below trains a DNN with two hidden layers (one with 300 neurons, the other with 100 neurons) and a softmax output layer with 10 neurons
dnn_clf = tf.estimator.DNNClassifier(hidden_units=[300, 100], n_classes=10,
                                     feature_columns=feature_cols)

input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"X": X_train}, y=y_train, num_epochs=40, batch_size=50, shuffle=True)
dnn_clf.train(input_fn=input_fn)

y_pred = list(dnn_clf.predict(X_test))
accuracy=accuracy_score(y_test, y_pred)
print(accuracy)

Error:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\python36\lib\inspect.py", line 1119, in getfullargspec
    sigcls=Signature)
  File "C:\ProgramData\Anaconda3\envs\python36\lib\inspect.py", line 2186, in _signature_from_callable
    raise TypeError('{!r} is not a callable object'.format(obj))
TypeError: array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32) is not a callable object

The above exception was the direct cause of the following exception:
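
The traceback is consistent with predict() receiving the raw X_test array where it expects a callable input function, the same kind of object already passed to train(). A possible fix, assuming TensorFlow 1.x (where tf.estimator.inputs.numpy_input_fn exists) and the feature name "X" used above, is sketched below; `predict_class_ids` is a hypothetical helper name.

```python
def predict_class_ids(dnn_clf, X_test):
    """Return predicted class ids from a trained tf.estimator.DNNClassifier.

    Assumes TensorFlow 1.x and a feature column named "X".
    """
    import tensorflow as tf  # imported lazily; TensorFlow 1.x assumed

    # predict() takes an input function, not the array itself
    test_input_fn = tf.estimator.inputs.numpy_input_fn(
        x={"X": X_test}, num_epochs=1, shuffle=False)

    predictions = dnn_clf.predict(input_fn=test_input_fn)

    # Each prediction is a dict; "class_ids" holds a length-1 array
    return [int(pred["class_ids"][0]) for pred in predictions]
```

With this helper, the last lines of the script would become `y_pred = predict_class_ids(dnn_clf, X_test)` followed by the existing accuracy_score call.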

In Chapter 0 (Preface), “go游戏” would be better translated as “围棋游戏” ("the game of Go")

Some Typos

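Thanks for the translation.

1. Page 26, Table 1-1: 发过 -> 法国 ("France")
2. Page 62: “跟喜欢和数字打交道” -> “更喜欢和数字打交道” ("prefer working with numbers")
3. Page 83: “的克隆版本上进行训，” -> “的克隆版本上进行训练，” (the character 练 of 训练, "training", is missing)
4. Page 84: “第五航第三列” -> “第五行第三列” ("fifth row, third column")
5. Page 92: “OvO策略的主要有点” -> “OVO策略的主要优点” ("the main advantage of the OvO strategy")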

A minus sign is missing from a TeX formula in Chapter 2

It is located right after “预测误差是”; the English original is "The prediction error for this district is "

Formula images for the confusion matrix in Chapter 3

“这证明了提高阈值会降调召回率。” -> 这说明提高阈值会使召回率降低。 ("This shows that raising the threshold reduces recall.")
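
The corrected sentence can be checked on a tiny example. The scores and labels below are made up; `recall_at` is a hypothetical helper.

```python
import numpy as np

# Made-up decision scores and true labels (1 = positive class)
scores = np.array([0.10, 0.40, 0.35, 0.80, 0.65, 0.90])
labels = np.array([0, 0, 1, 1, 1, 1])


def recall_at(threshold):
    """Recall = TP / (TP + FN) when predicting positive for score >= threshold."""
    predicted_positive = scores >= threshold
    actual_positive = labels == 1
    true_positives = np.sum(predicted_positive & actual_positive)
    false_negatives = np.sum(~predicted_positive & actual_positive)
    return true_positives / (true_positives + false_negatives)


low_threshold_recall = recall_at(0.3)
high_threshold_recall = recall_at(0.7)
print(low_threshold_recall, high_threshold_recall)
```

Raising the threshold can only turn predicted positives into predicted negatives, so recall never increases.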

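It gets sloppier the further you go; Chapter 4 has almost none of its formulas and figures...

Why are all the formulas in the Markdown images, rather than written directly in TeX?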

A small error in the CategoricalEncoder code in Chapter 2

if self.encoding not in ['onehot', 'onehot-dense', 'ordinal']:
    template = ("encoding should be either 'onehot', 'onehot-dense' "
                "or 'ordinal', got %s")
    raise ValueError(template % self.handle_unknown)

Here self.handle_unknown should be self.encoding.
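
The effect of the fix is that the error message reports the value that actually failed the check. A stripped-down sketch (`EncodingChecker` is a hypothetical stand-in for the real class, which does much more):

```python
class EncodingChecker:
    """Hypothetical stand-in that keeps only the validation step."""

    def __init__(self, encoding="onehot", handle_unknown="error"):
        self.encoding = encoding
        self.handle_unknown = handle_unknown

    def validate(self):
        if self.encoding not in ['onehot', 'onehot-dense', 'ordinal']:
            template = ("encoding should be either 'onehot', 'onehot-dense' "
                        "or 'ordinal', got %s")
            # Report the offending attribute, not handle_unknown
            raise ValueError(template % self.encoding)


try:
    EncodingChecker(encoding="bogus").validate()
except ValueError as err:
    message = str(err)
    print(message)
```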

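In Chapter 1, Table 1-1, “法国” ("France") is written as “发过”

The image path is /images/chapter_1/t-1-1.png. I touched it up in an image editor; see whether you want to use this version: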

The Pipeline example code in Chapter 2 fails at runtime

This is about the feature scaling code, namely the transformation Pipeline below:

num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"]

num_pipeline = Pipeline([
    ('selector', DataFrameSelector(num_attribs)),
    ("imputer", Imputer(strategy="median")),
    ("attribs_adder", CombinedAttributesAdder()),
    ("std_scaler", StandardScaler()),
])
cat_pipeline = Pipeline([
    ('selector', DataFrameSelector(cat_attribs)),
    ('label_binarizer', LabelBinarizer()),
])

full_pipeline = FeatureUnion(transformer_list=[
    ("num_pipeline", num_pipeline),
    ("cat_pipeline", cat_pipeline)
])

housing_prepared = full_pipeline.fit_transform(housing)

Running it raises the following error: TypeError: fit_transform() takes 2 positional arguments but 3 were given

After searching online, I found that this is a version issue: the parameters of LabelBinarizer's fit_transform method have changed:

The pipeline is assuming LabelBinarizer's fit_transform method is defined to take three positional arguments:

def fit_transform(self, x, y):
    ...  # rest of the code

while it is defined to take only two:

def fit_transform(self, x):
    ...  # rest of the code

I hope this can be fixed. I found the cause but don't know how to change the code (newbie here, QAQ).
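
One workaround that is often suggested for this error is to wrap LabelBinarizer in a small transformer whose fit accepts the extra y argument that Pipeline passes. The sketch below is an assumption-laden illustration rather than the project's official fix: `PipelineLabelBinarizer` is a hypothetical name, and the sample values are made up.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelBinarizer


class PipelineLabelBinarizer(BaseEstimator, TransformerMixin):
    """Hypothetical wrapper that tolerates the (X, y) call made by Pipeline."""

    def fit(self, X, y=None):
        # Pipeline passes y; LabelBinarizer only needs the column to encode
        self.encoder_ = LabelBinarizer()
        self.encoder_.fit(X)
        return self

    def transform(self, X):
        return self.encoder_.transform(X)


cat_pipeline = Pipeline([
    ("label_binarizer", PipelineLabelBinarizer()),
])

# Made-up category values for illustration
ocean = ["INLAND", "NEAR BAY", "ISLAND", "INLAND"]
encoded = cat_pipeline.fit_transform(ocean)
print(encoded.shape)
```

In the pipeline from the issue, the wrapper would take the place of LabelBinarizer() in the 'label_binarizer' step.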
