apachecn / hands-on-ml-zh
:book: [Translation] A Practical Guide to Machine Learning with Sklearn and TensorFlow [the site has been taken offline due to copyright issues!!]
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: the second English edition is out. Anyone up for translating it too?
https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/
theta_best = np.linalg.inv(X_b.T.dot(X_B)).dot(X_b.T).dot(y)
Here .dot(X_B) should be .dot(X_b).
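The corrected Normal Equation can be checked numerically. This is a minimal sketch assuming NumPy and synthetic data generated as y = 4 + 3x + Gaussian noise (the data and seed are assumptions for illustration); it compares the closed-form result with np.linalg.lstsq.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the run is reproducible
m = 100
X = 2 * rng.random((m, 1))              # one feature in [0, 2)
y = 4 + 3 * X + rng.standard_normal((m, 1))

X_b = np.c_[np.ones((m, 1)), X]         # add x0 = 1 to every instance (bias term)

# Normal Equation: theta = (X_b^T X_b)^-1 X_b^T y, using X_b everywhere (no X_B)
theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)

# Cross-check with a least-squares solver, which avoids forming the explicit inverse
theta_lstsq, *_ = np.linalg.lstsq(X_b, y, rcond=None)
print(theta_best.ravel(), theta_lstsq.ravel())
```

Both solvers should agree to floating-point precision; lstsq is generally the numerically safer choice.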
“gradients = 2 * xi.T.dot(xi,dot(theta)-yi)” should be “gradients = 2 * xi.T.dot(xi.dot(theta)-yi)”
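With the corrected line, Stochastic Gradient Descent for linear regression can be sketched as follows. Assumptions: NumPy, synthetic data from y = 4 + 3x + noise, and a learning schedule eta = t0 / (t + t1) with t0 = 5, t1 = 50 (illustrative values). xi and yi are kept as one-row 2-D slices so that xi.T.dot(...) yields a column gradient.

```python
import numpy as np

def sgd_linear_regression(X_b, y, n_epochs=50, t0=5, t1=50, seed=42):
    """Stochastic Gradient Descent for linear regression with an MSE cost."""
    rng = np.random.default_rng(seed)
    m = len(X_b)
    theta = rng.standard_normal((X_b.shape[1], 1))   # random initialization
    for epoch in range(n_epochs):
        for i in range(m):
            idx = rng.integers(m)                     # pick one random training instance
            xi = X_b[idx:idx + 1]                     # shape (1, n): keep it 2-D
            yi = y[idx:idx + 1]                       # shape (1, 1)
            # Corrected line: xi.dot(theta), not xi,dot(theta)
            gradients = 2 * xi.T.dot(xi.dot(theta) - yi)
            eta = t0 / (epoch * m + i + t1)           # assumed learning schedule
            theta = theta - eta * gradients
    return theta

# Synthetic data (assumed for illustration): y = 4 + 3x + Gaussian noise
data_rng = np.random.default_rng(0)
m = 100
X = 2 * data_rng.random((m, 1))
y = 4 + 3 * X + data_rng.standard_normal((m, 1))
X_b = np.c_[np.ones((m, 1)), X]                      # add the bias feature x0 = 1

theta = sgd_linear_regression(X_b, y)
print(theta.ravel())
```

Because each step uses a single random instance, the result lands close to, but usually not exactly on, the least-squares solution.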
“sgd_reg + SGDRregressor(n_iter=50, penalty=None, eta0=0.1)” should be “sgd_reg = SGDRegressor(n_iter=50, penalty=None, eta0=0.1)”
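“训练过程使用的代价函数和测试过程使用的评价函数不一样样的。” should be “评价函数是不一样的。” (the sentence means "the cost function used during training and the evaluation function used during testing are different"; the original doubles 样 and drops 是).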
“如我定义” should be “如果定义” or “如果我们定义” ("if we define").
“去增加了模型的偏差” should be “却增加了模型的偏差” ("but increases the model's bias").
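“对线性回归来说,对于岭回归,我们可以使用封闭方程去计算,也可以使用梯度下降去处理。” should be “就像进行线性回归那样,对于岭回归的处理,我们既可以使用封闭方程去计算,也可以使用梯度下降去处理。” ("As with Linear Regression, we can perform Ridge Regression either by computing a closed-form equation or by performing Gradient Descent.")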
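The sentence's point, that Ridge Regression can be solved either in closed form or by Gradient Descent, can be illustrated with a NumPy sketch. Assumptions: the cost is taken as the sum of squared errors plus alpha times the sum of squared non-bias weights (a scaling chosen so the closed form is (X^T X + alpha A)^-1 X^T y), the data are synthetic, and Gradient Descent runs on that cost divided by m, which has the same minimizer.

```python
import numpy as np

def ridge_closed_form(X_b, y, alpha):
    # A is the identity matrix with a 0 in the top-left cell, so the bias term is not regularized
    A = np.eye(X_b.shape[1])
    A[0, 0] = 0.0
    return np.linalg.solve(X_b.T @ X_b + alpha * A, X_b.T @ y)

def ridge_gradient_descent(X_b, y, alpha, eta=0.1, n_iterations=2000):
    m, n = X_b.shape
    A = np.eye(n)
    A[0, 0] = 0.0
    theta = np.zeros((n, 1))
    for _ in range(n_iterations):
        # Gradient of (||X_b theta - y||^2 + alpha * ||A theta||^2) / m
        gradients = 2.0 / m * (X_b.T @ (X_b @ theta - y) + alpha * A @ theta)
        theta -= eta * gradients
    return theta

# Synthetic data (assumed for illustration)
rng = np.random.default_rng(0)
m = 100
X = 2 * rng.random((m, 1))
y = 4 + 3 * X + rng.standard_normal((m, 1))
X_b = np.c_[np.ones((m, 1)), X]

theta_closed = ridge_closed_form(X_b, y, alpha=1.0)
theta_gd = ridge_gradient_descent(X_b, y, alpha=1.0)
print(theta_closed.ravel(), theta_gd.ravel())
```

Both routes minimize the same function, so they should agree closely; the closed form is exact but costs a linear solve, while Gradient Descent scales better to many features.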
In the step "Training and evaluating on the training set":
from sklearn.metrics import mean_squared_error
housing_predictions = lin_reg.predict(housing_prepared)
lin_mse = mean_squared_error(some_labels, housing_predictions)
lin_rmse = np.sqrt(lin_mse)
lin_rmse
68628.413493824875
some_labels should be housing_labels.
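mean_squared_error compares labels and predictions element by element, so both must describe the same instances: housing_predictions is computed from all of housing_prepared, so it pairs with housing_labels, whereas some_labels covers only the few sample rows used earlier in the chapter. A minimal NumPy sketch of RMSE (a hypothetical helper, not Scikit-Learn's implementation) that makes this requirement explicit:

```python
import numpy as np

def rmse(labels, predictions):
    """Root Mean Squared Error; both arrays must describe the same instances."""
    labels = np.asarray(labels, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    if labels.shape != predictions.shape:
        raise ValueError(
            f"shape mismatch: {labels.shape} labels vs {predictions.shape} predictions")
    return np.sqrt(np.mean((labels - predictions) ** 2))

print(rmse([3.0, 5.0], [2.0, 7.0]))   # sqrt((1 + 4) / 2), about 1.581
```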
The text has: from pandas.tools.plotting import scatter_matrix
pandas has since been updated, and the import is now from pandas.plotting import scatter_matrix
Please confirm.
Is there no translation of Appendices A/B?
As the title says. Thanks for answering!
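For classifiers, a much better way to evaluate performance is the confusion matrix. The general idea is: “输出类别A被分类成类别 B 的次数” (literally "the number of times output class A is classified as class B").
This sentence should be translated as “类别为A的示例被错分类为类别B的次数” ("the number of times instances of class A are misclassified as class B").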
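To make the row/column convention concrete: rows are actual classes and columns are predicted classes (the same convention as Scikit-Learn's confusion_matrix), so cell [A][B] counts instances of class A that were classified as B. A minimal pure-Python sketch with made-up digit labels:

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """cm[a][b] = number of instances of actual class a predicted as class b."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for actual, predicted in zip(y_true, y_pred):
        cm[actual][predicted] += 1
    return cm

# Toy labels for digits 0-9 (illustrative values, not results from the book)
y_true = [5, 5, 5, 3, 3, 1]
y_pred = [5, 3, 3, 3, 5, 1]
cm = confusion_matrix(y_true, y_pred, n_classes=10)
print(cm[5][3])  # 5s misclassified as 3s -> 2
print(cm[3][5])  # 3s misclassified as 5s -> 1
```

Off-diagonal cells are the errors; the diagonal counts correct predictions.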
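In the note under "Training Objective" (训练目标), the translation reads
“(因为最小化w值和b值,也是最小化该值一半的平方)” (literally "because minimizing w and b also minimizes the square of half of that value").
The original text is (since the values of w and b that minimize a value also minimize half of its square),
so it should be 该值平方的一半 ("half of its square"), not 该值一半的平方 ("the square of its half").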
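Written out, with the hard-margin constraints on w and b left implicit (the same constraint set applies on both sides), the two readings differ as follows. Since t ↦ ½t² is strictly increasing for t ≥ 0, minimizing ‖w‖ and minimizing ½‖w‖² yield the same w and b; the note refers to the first form below, half of the square.

```latex
\begin{aligned}
\operatorname*{arg\,min}_{\mathbf{w},\,b} \lVert \mathbf{w} \rVert
  &= \operatorname*{arg\,min}_{\mathbf{w},\,b} \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
  && \text{(half of the square)} \\
\bigl(\tfrac{1}{2}\lVert \mathbf{w} \rVert\bigr)^{2}
  &= \tfrac{1}{4}\lVert \mathbf{w} \rVert^{2}
  && \text{(square of the half)}
\end{aligned}
```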
Code snippet:
# Prepare the data
country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)
X = np.c_[country_stats["GDP per capita"]]
y = np.c_[country_stats["Life satisfaction"]]
Where does prepare_country_stats come from? Or is it just for illustration, and I need to join these two matrices together myself with np?
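The snippet only needs two column vectors in which row i of X and row i of y describe the same country. As a hypothetical stand-in (not the book's function, whose definition is not shown in this snippet), the sketch below assumes the two inputs are plain dicts mapping country names to values, keeps the countries present in both, sorts them by GDP per capita, and then builds the column vectors with np.c_ as the snippet does. All numbers are made up.

```python
import numpy as np

def prepare_country_stats_sketch(life_satisfaction, gdp_per_capita):
    """Hypothetical stand-in for prepare_country_stats (not the book's function).

    Joins two {country: value} dicts on country name, keeps countries present
    in both, sorts by GDP per capita, and returns (gdp, satisfaction) pairs.
    """
    countries = sorted(set(life_satisfaction) & set(gdp_per_capita),
                       key=lambda c: gdp_per_capita[c])
    return [(gdp_per_capita[c], life_satisfaction[c]) for c in countries]

# Made-up numbers for illustration only
bli = {"A": 5.0, "B": 6.0, "C": 7.0}
gdp = {"A": 10000.0, "C": 50000.0, "B": 30000.0, "D": 20000.0}
stats = prepare_country_stats_sketch(bli, gdp)

# np.c_ turns a 1-D array into an (n, 1) column vector
X = np.c_[np.array([g for g, _ in stats])]
y = np.c_[np.array([s for _, s in stats])]
print(X.shape, y.shape)  # (3, 1) (3, 1)
```

Country "D" is dropped because it has no life-satisfaction value; the join is what keeps X and y aligned row by row.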
def display_scores(scores):
    print("Scores:", scores)
    print("Mean:", scores.mean())
    print("Standard deviation:", scores.std())

display_scores(tree_rmse_scores)
The "tree_rmse_scores" in the last line should be "rmse_scores".
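display_scores expects a NumPy array of per-fold RMSE values. With Scikit-Learn, cross_val_score(..., scoring="neg_mean_squared_error") returns negated MSEs, which is why the chapter takes np.sqrt(-scores). The sketch below instead runs a hand-written 5-fold cross-validation of ordinary least squares on synthetic data (all assumed for illustration, not the housing data) and passes the resulting rmse_scores to display_scores.

```python
import numpy as np

def display_scores(scores):
    print("Scores:", scores)
    print("Mean:", scores.mean())
    print("Standard deviation:", scores.std())

def kfold_rmse_scores(X, y, k=5, seed=42):
    """Manual k-fold cross-validation of ordinary least squares; one RMSE per fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    scores = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        X_train = np.c_[np.ones(len(train_idx)), X[train_idx]]   # add bias column
        X_val = np.c_[np.ones(len(val_idx)), X[val_idx]]
        theta, *_ = np.linalg.lstsq(X_train, y[train_idx], rcond=None)
        mse = np.mean((X_val @ theta - y[val_idx]) ** 2)
        scores.append(np.sqrt(mse))
    return np.array(scores)

# Synthetic data (assumed): y = 4 + 3x + Gaussian noise with standard deviation 1
rng = np.random.default_rng(0)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X + rng.standard_normal((100, 1))

rmse_scores = kfold_rmse_scores(X, y)
display_scores(rmse_scores)
```

Since the noise has standard deviation 1, the per-fold RMSE values should hover around 1.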
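That works, although the predictions are not exactly accurate (e.g., the second prediction is off by 50%!). Let's use Scikit-Learn's mean_squared_error function to measure this regression model's RMSE on the whole training set: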
from sklearn.metrics import mean_squared_error
housing_predictions = lin_reg.predict(housing_prepared)
lin_mse = mean_squared_error(housing_labels, housing_predictions)
The last line should be lin_mse = mean_squared_error(some_labels, housing_predictions)
Why do sklearn's LabelEncoder() and pandas' factorize() give different results?
from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
housing_cat = housing["ocean_proximity"]
housing_cat_encoded1 = encoder.fit_transform(housing_cat)
housing_cat_encoded2, housing_categories = housing_cat.factorize()
housing_cat_encoded1[:10]
housing_cat_encoded2[:10]
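Why are the values of housing_cat_encoded1 in 0-4, while those of housing_cat_encoded2 are in 0-2?
“In the original book, using the LabelEncoder transformer to convert the text feature column is wrong: this transformer should only be used to transform labels (as its name says). Using LabelEncoder here does not raise an error only because the data has a single text feature column; with several text feature columns it would fail.”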
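The difference comes from the numbering rule: pandas factorize (with its default sort=False) numbers categories in the order they first appear, while LabelEncoder numbers them in the sorted order of all unique values in the column. If the first ten rows contain only three categories, factorize gives them codes 0-2, whereas LabelEncoder can assign codes up to 4 when the full column has five categories. The sketch re-implements both rules (hypothetical function names) on an illustrative sequence of ocean_proximity values.

```python
import numpy as np

def encode_by_first_appearance(values):
    """Same ordering rule as pandas factorize with sort=False."""
    index = {}
    codes = []
    for v in values:
        if v not in index:
            index[v] = len(index)      # next unused code
        codes.append(index[v])
    return np.array(codes), list(index)

def encode_sorted(values):
    """Same ordering rule as LabelEncoder: sorted unique values."""
    categories, codes = np.unique(np.asarray(values), return_inverse=True)
    return codes, list(categories)

# Illustrative column: the first ten rows contain only three categories,
# the remaining rows contain the other two
first_ten = ["<1H OCEAN", "<1H OCEAN", "NEAR OCEAN", "INLAND", "<1H OCEAN",
             "INLAND", "<1H OCEAN", "INLAND", "<1H OCEAN", "<1H OCEAN"]
column = first_ten + ["NEAR BAY", "ISLAND"]

codes_first, cats_first = encode_by_first_appearance(column)
codes_sorted, cats_sorted = encode_sorted(column)
print(codes_first[:10])    # values 0-2
print(codes_sorted[:10])   # values up to 4
```

Neither numbering is wrong; they are just different integer labels for the same categories, which is one reason one-hot encoding is preferred for input features.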
Above Figure 4.2, theta.best should be theta_best.
theta_best = np.linalg.inv(X_b.T.dot(X_B)).dot(X_b.T).dot(y)
This line of code should be
theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)
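Also, since no random seed was set for np.random beforehand, the generated x and y differ from person to person, so the sentence "we hoped to get θ0 = 4, θ1 = 3 rather than θ0 = 3.865, θ1 = 3.139" will inevitably not match what readers get when running the code themselves; there is no need to include the translator's own results.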
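Fixing the seed makes the data, and therefore the fitted parameters, identical on every run, while the noise term still keeps them from landing exactly on θ0 = 4, θ1 = 3. This sketch assumes NumPy's Generator API (np.random.default_rng) and the generating rule y = 4 + 3x + noise; the helper names are hypothetical, and the numbers will not match the book's. The legacy np.random.seed() serves the same purpose for np.random.rand and np.random.randn.

```python
import numpy as np

def make_linear_data(seed, m=100):
    # Assumed generating rule: y = 4 + 3x + Gaussian noise
    rng = np.random.default_rng(seed)   # fixed seed -> identical X and y on every run
    X = 2 * rng.random((m, 1))
    y = 4 + 3 * X + rng.standard_normal((m, 1))
    return X, y

def fit_least_squares(X, y):
    X_b = np.c_[np.ones((len(X), 1)), X]   # add the bias feature x0 = 1
    theta, *_ = np.linalg.lstsq(X_b, y, rcond=None)
    return theta

X1, y1 = make_linear_data(seed=42)
X2, y2 = make_linear_data(seed=42)
theta1 = fit_least_squares(X1, y1)
theta2 = fit_least_squares(X2, y2)
print(theta1.ravel())  # near (4, 3), but the noise keeps it from matching exactly
```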
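Chapter 3
Confusion Matrix
For classifiers, a much better way to evaluate performance is the confusion matrix. The general idea is: "the number of times output class A is classified as class B." For example, to know how many times the classifier misclassified 5s as 3s, you need to look at the “第五航第三列” of the confusion matrix.
It should be: 第五航第三列 -> 第五行第三列 ("5th row, 3rd column"; 航 is a typo for 行, "row").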
I typed in the code from the tutorial as follows:
import tensorflow as tf
import numpy as np
import os
from sklearn.metrics import accuracy_score
from tensorflow.examples.tutorials.mnist import input_data
### Suppress TensorFlow warning logs so that red warnings do not appear when running the file
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
old_v = tf.logging.get_verbosity()
tf.logging.set_verbosity(tf.logging.ERROR)
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
X_train = X_train.astype(np.float32).reshape(-1, 28 * 28) / 255.0
X_test = X_test.astype(np.float32).reshape(-1, 28 * 28) / 255.0
y_train = y_train.astype(np.int32)
y_test = y_test.astype(np.int32)
X_valid, X_train = X_train[:5000], X_train[5000:]
y_valid, y_train = y_train[:5000], y_train[5000:]
feature_cols = [tf.feature_column.numeric_column("X", shape=[28 * 28])]
# The code below trains a DNN with two hidden layers (one with 300 neurons, the other with 100) and a softmax output layer with 10 neurons
dnn_clf = tf.estimator.DNNClassifier(hidden_units=[300, 100], n_classes=10,
feature_columns=feature_cols)
input_fn = tf.estimator.inputs.numpy_input_fn(
x={"X": X_train}, y=y_train, num_epochs=40, batch_size=50, shuffle=True)
dnn_clf.train(input_fn=input_fn)
y_pred = list(dnn_clf.predict(X_test))
accuracy=accuracy_score(y_test, y_pred)
print(accuracy)
Error:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\envs\python36\lib\inspect.py", line 1119, in getfullargspec
sigcls=Signature)
File "C:\ProgramData\Anaconda3\envs\python36\lib\inspect.py", line 2186, in _signature_from_callable
raise TypeError('{!r} is not a callable object'.format(obj))
TypeError: array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]], dtype=float32) is not a callable object
The above exception was the direct cause of the following exception:
In Chapter 0 (Preface), “go游戏” should be translated as “围棋游戏” ("the game of Go").
Thanks for the translation.
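1. Page 26, Table 1-1: 发过 -> 法国 ("France")
2. Page 62: "跟喜欢和数字打交道" -> "更喜欢和数字打交道" ("prefers working with numbers")
3. Page 83: "的克隆版本上进行训," -> "的克隆版本上进行训练," ("trains on a clone of")
4. Page 84: "第五航第三列" -> "第五行第三列" ("5th row, 3rd column")
5. Page 92: "OvO策略的主要有点" -> "OVO策略的主要优点" ("the main advantage of the OvO strategy")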
if self.encoding not in ['onehot', 'onehot-dense', 'ordinal']:
    template = ("encoding should be either 'onehot', 'onehot-dense' "
                "or 'ordinal', got %s")
    raise ValueError(template % self.handle_unknown)
The self.handle_unknown in it should be self.encoding.
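A standalone sketch of the corrected check, with a hypothetical function check_encoding in place of the class method: the error message now reports the rejected encoding value rather than handle_unknown.

```python
VALID_ENCODINGS = ("onehot", "onehot-dense", "ordinal")

def check_encoding(encoding):
    """Raise ValueError naming the bad encoding value (not handle_unknown)."""
    if encoding not in VALID_ENCODINGS:
        template = ("encoding should be either 'onehot', 'onehot-dense' "
                    "or 'ordinal', got %s")
        raise ValueError(template % encoding)
    return encoding

check_encoding("ordinal")   # accepted
# check_encoding("binary") raises: ValueError: ... got binary
```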
About the feature scaling code, namely the Pipeline below:
num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"]

num_pipeline = Pipeline([
    ('selector', DataFrameSelector(num_attribs)),
    ("imputer", Imputer(strategy="median")),
    ("attribs_adder", CombinedAttributesAdder()),
    ("std_scaler", StandardScaler()),
])

cat_pipeline = Pipeline([
    ('selector', DataFrameSelector(cat_attribs)),
    ('label_binarizer', LabelBinarizer()),
])

full_pipeline = FeatureUnion(transformer_list=[
    ("num_pipeline", num_pipeline),
    ("cat_pipeline", cat_pipeline)
])

housing_prepared = full_pipeline.fit_transform(housing)
Running it raises: TypeError: fit_transform() takes 2 positional arguments but 3 were given
After searching online, I found that it is a version issue: the parameter definition of LabelBinarizer's fit_transform function has changed:
"""
The pipeline is assuming LabelBinarizer's fit_transform method is defined to take three positional arguments:
"""
def fit_transform(self, x, y)
...rest of the code
while it is defined to take only two:
def fit_transform(self, x):
...rest of the code
I hope this can be fixed. I found the problem, but I don't know how to change the code (newbie here, QAQ).
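One common workaround is to give the categorical step a fit_transform(X, y=None) signature so that the extra y passed by the Pipeline is accepted and ignored. The class below is hypothetical and re-implements one-hot encoding with NumPy instead of wrapping Scikit-Learn. Unlike LabelBinarizer, it always outputs one column per category (LabelBinarizer returns a single column when there are only two classes). With Scikit-Learn 0.20 or later, OneHotEncoder accepts string categories directly and is the intended tool for input features.

```python
import numpy as np

class LabelBinarizerForPipeline:
    """Hypothetical one-hot encoder for a single categorical column.

    fit() and fit_transform() accept an optional y, so a Pipeline that calls
    fit_transform(X, y) no longer raises TypeError; y is ignored.
    """

    def fit(self, X, y=None):
        self.classes_ = np.unique(np.ravel(X))   # sorted category names
        return self

    def transform(self, X):
        values = np.ravel(X)
        # Row i gets a 1 in the column of its category (all zeros if the category is unseen)
        return (values[:, None] == self.classes_[None, :]).astype(int)

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)

# A one-column 2-D array, like the output of a column selector step
X_cat = np.array([["INLAND"], ["NEAR BAY"], ["INLAND"]])
encoder = LabelBinarizerForPipeline()
print(encoder.fit_transform(X_cat, None))   # X and y passed positionally, as a Pipeline does
```

To drop it into the chapter's cat_pipeline, one would typically also inherit from sklearn.base.BaseEstimator and TransformerMixin so that get_params and cloning work; that part is omitted here to keep the sketch dependency-free.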
In Chapter 9, in the section on computing the gradients manually, there is this line:
gradients = 2/m * tf.matmul(tf.transpose(X), error)
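When actually run, 2/m (with m an integer) evaluates to 0 under Python 2's integer division, so gradients is 0 and the algorithm does not converge.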
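The effect is easy to reproduce outside TensorFlow. This NumPy sketch (synthetic data assumed, not the chapter's housing data) shows that integer division of 2 by m yields 0, and that batch Gradient Descent converges once the factor is written as 2.0 / m. In the TensorFlow code the same change applies, as does from __future__ import division under Python 2.

```python
import numpy as np

m = 100
print(2 // m)    # 0: integer division, which is what 2/m did with two ints in Python 2
print(2.0 / m)   # 0.02: a float literal forces true division

# Synthetic data (assumed for illustration)
rng = np.random.default_rng(42)
X = 2 * rng.random((m, 1))
y = 4 + 3 * X + rng.standard_normal((m, 1))
X_b = np.c_[np.ones((m, 1)), X]

eta = 0.1
theta = rng.standard_normal((2, 1))   # random initialization
for _ in range(1000):
    error = X_b.dot(theta) - y
    gradients = 2.0 / m * X_b.T.dot(error)   # same formula as the TensorFlow line, with float division
    theta = theta - eta * gradients
print(theta.ravel())
```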
In Chapter 2, the training set size is given as 16512 in some places and 16513 in others, which is inconsistent.
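Assuming the 20,640-row California housing dataset and a 20% test ratio, a purely fraction-based split gives 16,512 training instances; this short check shows the arithmetic, rounding the test size up as train_test_split does for a float test_size. Splits based on hashed identifiers (also used in the chapter) only approximate the ratio, so their sizes need not equal this number.

```python
import math

def split_sizes(n_samples, test_ratio):
    # Round the test-set size up, the rule train_test_split applies to a float test_size
    n_test = math.ceil(test_ratio * n_samples)
    return n_samples - n_test, n_test

print(split_sizes(20640, 0.2))  # (16512, 4128)
```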
In Chapter 3 of the PDF, in the precision and recall section (page 86), the y_pred in precision_score(y_train_5, y_pred) should be y_train_pred.
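Precision and recall compare each label with the prediction made for the same instance, so the second argument must be the predictions for the training set, y_train_pred (in the book, obtained with cross_val_predict). A minimal pure-Python sketch (a hypothetical helper, with made-up labels) of what precision_score and recall_score compute for a binary task:

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for a binary task where True marks the positive class."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))          # true positives
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))    # false positives
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Illustrative values: is the digit a 5? (actual vs. predicted)
y_train_5 = [True, True, False, False, True, False]
y_train_pred = [True, False, True, True, True, False]
print(precision_recall(y_train_5, y_train_pred))  # precision 2/4, recall 2/3
```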