Hi, does anyone know the workaround? On page 43/564, Ex

Comments (24)

ageron commented on May 2, 2024 13

Hi everyone,

Apparently this missing code is causing some confusion; I'm sorry about that. It is only there to "whet your appetite", to give you a feel for what's coming next, not to be actually executed. But I understand that some readers might want to run it as is. If you really want to execute it, here's a prepare_country_stats() function you can use:

import pandas as pd

def prepare_country_stats(oecd_bli, gdp_per_capita):
    # Keep only the "TOT" rows, then build one row per country, one column per indicator
    oecd_bli = oecd_bli[oecd_bli["INEQUALITY"]=="TOT"]
    oecd_bli = oecd_bli.pivot(index="Country", columns="Indicator", values="Value")
    # Note: these two calls modify the caller's gdp_per_capita DataFrame in place
    gdp_per_capita.rename(columns={"2015": "GDP per capita"}, inplace=True)
    gdp_per_capita.set_index("Country", inplace=True)
    # Join the two tables on their "Country" index
    full_country_stats = pd.merge(left=oecd_bli, right=gdp_per_capita,
                                  left_index=True, right_index=True)
    full_country_stats.sort_values(by="GDP per capita", inplace=True)
    # Drop a few rows by position (positions refer to the order after sorting)
    remove_indices = [0, 1, 6, 8, 33, 34, 35]
    keep_indices = list(set(range(36)) - set(remove_indices))
    return full_country_stats[["GDP per capita", 'Life satisfaction']].iloc[keep_indices]

Just add this function at the beginning of the code, and run the program in the directory that contains the data files (oecd_bli_2015.csv and gdp_per_capita.csv) and you should be fine (except that you must add an import sklearn.linear_model, at least in recent versions of Scikit-Learn).

As you can see, it's a long and boring function that prepares the data to have a nice and clean matrix in the end. Just Pandas stuff, nothing special about it, and nothing interesting with regards to Machine Learning, which is why I didn't want to include it in the book. In general, I avoid including every single line of code in the book, for readability, to keep it short and focused on what matters most, but hopefully, from chapter 2 onwards, you should be able to follow along in the Jupyter notebook very easily.
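
To make the data preparation a little less opaque, here is a minimal sketch of the same filter, pivot and merge steps on a tiny made-up dataset. The country names and all the values are invented for illustration; only the column names ("Country", "INEQUALITY", "Indicator", "Value", "2015") are taken from the function above. Unlike the function, this sketch does not modify its inputs in place and does not drop any rows by position.

```python
import pandas as pd

# Made-up rows in the long format the function expects (values are invented)
oecd_bli = pd.DataFrame({
    "Country": ["Country A", "Country A", "Country A", "Country B", "Country B"],
    "INEQUALITY": ["TOT", "TOT", "MN", "TOT", "TOT"],
    "Indicator": ["Life satisfaction", "Air pollution", "Life satisfaction",
                  "Life satisfaction", "Air pollution"],
    "Value": [7.0, 10.0, 6.9, 6.0, 20.0],
})

# Made-up GDP table; "Country C" has no life satisfaction data
gdp_per_capita = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country C"],
    "2015": [50000.0, 30000.0, 10000.0],
})

# 1. Keep only the "TOT" rows
totals = oecd_bli[oecd_bli["INEQUALITY"] == "TOT"]
# 2. Reshape: one row per country, one column per indicator
wide = totals.pivot(index="Country", columns="Indicator", values="Value")
# 3. Rename the GDP column and index the GDP table by country
gdp = gdp_per_capita.rename(columns={"2015": "GDP per capita"}).set_index("Country")
# 4. Join on the country index, sort by GDP, keep the two columns of interest
merged = pd.merge(left=wide, right=gdp, left_index=True, right_index=True)
country_stats = merged.sort_values(by="GDP per capita")[["GDP per capita", "Life satisfaction"]]
print(country_stats)
```

"Country C" disappears from the result because pd.merge() performs an inner join by default, so only countries present in both tables are kept.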

In the latest release, I added a footnote saying "The code assumes that prepare_country_stats() is already defined: it merges the GDP and life satisfaction data into a single Pandas dataframe."
Perhaps that's not clear enough, though: I think I will change this to explicitly tell readers that if they want to run the code, they should do so in the Jupyter notebook which contains all the boring details (this is strongly suggested in the preface, but I know not everyone reads the preface, I certainly don't).

What do you think?

from handson-ml.

ageron commented on May 2, 2024 7

As @pprivulet pointed out (thanks!), the function is defined in the notebook. I left some code out of the book when there was really nothing interesting or machine learning specific to it. Things like plotting an image, etc. If you get stuck at any point, check out the corresponding notebook, and don't hesitate to ping me, I'll be happy to help.

Cheers

pprivulet commented on May 2, 2024 6

01_the_machine_learning_landscape.ipynb: "def prepare_country_stats(oecd_bli, gdp_per_capita):\n",

The function is defined in 01_the_machine_learning_landscape.ipynb
Good luck

ageron commented on May 2, 2024 4

I replaced the footnote with this: "The prepare_country_stats() function's definition is not shown here (see this chapter's Jupyter notebook if you want all the gory details). It's just boring Pandas code that joins the life satisfaction data from the OECD with the GDP per capita data from the IMF."

I also updated the notebook to make the example 1-1's code stand out at the beginning, and I added the prepare_country_stats() function from my previous comment.

Thanks everyone for your very useful feedback! Hopefully, the book will get better and better. :)

saravanakumarjsk commented on May 2, 2024 2

Thanks, that was very helpful.

ankursworld commented on May 2, 2024 1

Thank you for your clarification and help! Thanks Ankur

sor3765 commented on May 2, 2024 1

I guess I may have downloaded the wrong file... it wasn't clearly explained where exactly to get the files, so I tried to get them from someone's GitHub... but I can try this again tomorrow morning.

ashokrajv commented on May 2, 2024 1

Thank you @ageron, I shall try this option.
Thanks,
Ashok

rkuma107 commented on May 2, 2024 1

Thanks, Ageron. I am learning ML for the first time, and this forum and your active participation are very helpful.

Jai-GAY commented on May 2, 2024

Yes, thanks. I need to run it from a Jupyter notebook.

ankursworld commented on May 2, 2024

Hi @ageron,
I tried to follow Example 1-1 in your book, and I also tried adding the function from the Jupyter notebook. However, it still causes errors.
I think it is a fair expectation to be able to follow the code in the book without correcting it. Could you please see whether the code in the book can be updated to be self-sufficient? Alternatively, you could refer readers to the notebook and ask them to run only that, not the code in the book.
Thanks
Ankur

McCarthyORAL commented on May 2, 2024

I hope this can be helpful
http://www.cnblogs.com/yaoz/p/6858417.html

McCarthyORAL commented on May 2, 2024

Hello Aurélien Géron, when fetching a dataset from a website for machine learning, could you please help me with a Python script that first pulls the data from the website, and a second script that incrementally updates the dataset daily? Thanks, Richard

ageron commented on May 2, 2024

Hi Richard,

The download part does not have to be in Python: it might be simpler to write a cron job that uses wget or curl to download the file.
That said, there's some example code in the notebook for chapter 2: https://github.com/ageron/handson-ml/blob/master/02_end_to_end_machine_learning_project.ipynb
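
If you would rather stay in Python, the download step could look roughly like the sketch below. It is only an illustration under a few assumptions: download_snapshot() is a made-up helper name, each day's copy is saved under a date-stamped file name so that earlier copies are kept, and the scheduling itself is still left to cron (or any other scheduler).

```python
import datetime
import os
from urllib.request import urlretrieve

def download_snapshot(url, dest_dir, today=None):
    """Download url into dest_dir under a date-stamped name.

    Returns the local path. If today's file already exists, nothing is downloaded.
    """
    today = today or datetime.date.today()
    os.makedirs(dest_dir, exist_ok=True)  # create the target directory if needed
    filename = f"{today.isoformat()}_{os.path.basename(url)}"
    dest = os.path.join(dest_dir, filename)
    if not os.path.exists(dest):
        urlretrieve(url, dest)
    return dest

# Example call (needs network access):
# download_snapshot(
#     "https://raw.githubusercontent.com/ageron/handson-ml/master/datasets/lifesat/gdp_per_capita.csv",
#     os.path.join("datasets", "lifesat"))
```

Running a script like this once a day leaves you with one file per day, which you can then merge or compare in whatever way suits your dataset.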

Hope this helps,
Aurélien

sor3765 commented on May 2, 2024

I copied and pasted Example 1-1, then added the function definition... I keep getting KeyError: 'Country' from line 12: gdp_per_capita.set_index("Country", inplace=True)

How can I fix this error? I tried it in both Jupyter Notebook and Visual Studio, and both end up with the same error.

ageron commented on May 2, 2024

Hi @sor3765 ,
Thanks for your question. Perhaps the problem comes from the data? Are you using oecd_bli_2015.csv and gdp_per_capita.csv which are available in the datasets/lifesat directory or did you try to download the latest data from the OECD and IMF websites?
Are you sure you did not modify the code in any way? Perhaps you should download it again, just to be sure?
If you copy/pasted the code, perhaps the indentation got modified?
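
One quick way to check whether the data file is the culprit is to look at its header row before calling prepare_country_stats(). The sketch below uses only the standard library; missing_columns() is a hypothetical helper written for this illustration. A KeyError: 'Country' would be consistent with a file that has no "Country" column, or with a file parsed using the wrong delimiter, so that the whole header line ends up as a single column name.

```python
import csv

def missing_columns(path, required, delimiter=",", encoding="utf-8"):
    """Return the names in `required` that are absent from the file's header row."""
    with open(path, newline="", encoding=encoding) as f:
        header = next(csv.reader(f, delimiter=delimiter), [])  # [] for an empty file
    header = [name.strip() for name in header]
    return [name for name in required if name not in header]
```

For example, missing_columns("gdp_per_capita.csv", ["Country", "2015"], delimiter="\t", encoding="latin1") should return an empty list for the file in datasets/lifesat, assuming that file is tab-separated and Latin-1 encoded (an assumption based on how the notebook reads it; check the read_csv() call there).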

ashokrajv commented on May 2, 2024

Hi sor3765,
I am facing the same issue. I am running in a Kaggle kernel, and I downloaded the files from this public dataset:
https://www.kaggle.com/abhilashanil/better-life-index-and-gross-domestic-product/kernels
Any help? The complete error is in the attachment.
Thanks,
Ashok
KeyError_Country.txt

ageron commented on May 2, 2024

The files are available directly in this project, in the datasets/lifesat directory:
https://raw.githubusercontent.com/ageron/handson-ml/master/datasets/lifesat/gdp_per_capita.csv
https://raw.githubusercontent.com/ageron/handson-ml/master/datasets/lifesat/oecd_bli_2015.csv

ashokrajv commented on May 2, 2024

Thank you, Aurélien Géron, for the quick reply.
I tried loading both files manually.
I was able to upload gdp_per_capita.csv, but not oecd_bli_2015.csv.
An existing file in a Kaggle dataset is blocking me, but that file doesn't belong to me. How can I handle this? Screenshot attached.

Appreciate your help!

Thanks,
Ashok

ageron commented on May 2, 2024

Hi @ashokrajv ,
I have never run into this issue, sorry. It seems that Kaggle wants to avoid data duplication, so they're asking you to reuse the file from the other dataset. Not sure how this is done in Kaggle, I recommend you ask Kaggle.
Alternatively, you can just update the notebook to download the files instead of using the ones in the project:

import os
from urllib.request import urlretrieve

URL = "https://raw.githubusercontent.com/ageron/handson-ml/master/datasets/lifesat/"
datapath = os.path.join("datasets", "lifesat", "")
os.makedirs(datapath, exist_ok=True)  # urlretrieve() does not create missing directories
urlretrieve(URL + "gdp_per_capita.csv", datapath + "gdp_per_capita.csv")
urlretrieve(URL + "oecd_bli_2015.csv", datapath + "oecd_bli_2015.csv")

Then you can load them using pd.read_csv(), as shown in the notebook.

Hope this helps.

ashokrajv commented on May 2, 2024

Hi Aurélien Géron,
I tried the urlretrieve method. I replaced this line:
#datapath = os.path.join("datasets", "lifesat", "")
with the code you gave to download the files from the URL:
from urllib.request import urlretrieve
URL = "https://raw.githubusercontent.com/ageron/handson-ml/master/datasets/lifesat/"
datapath = os.path.join("datasets", "lifesat", "")
urlretrieve(URL + "gdp_per_capita.csv", datapath + "gdp_per_capita.csv")
urlretrieve(URL + "oecd_bli_2015.csv", datapath + "oecd_bli_2015.csv")

I am getting an error in the urlretrieve function:
URLError: <urlopen error [Errno -3] Temporary failure in name resolution>
The full error is in the text attachment:
URLError.txt
Any suggestions, please?
Thanks,
Ashok

ageron commented on May 2, 2024

Wow, that's bad luck! You seem to have DNS issues. Name resolution means converting a domain name (raw.githubusercontent.com) to an IP address (such as 151.101.8.133). You could try again: as the message says, it's probably a "temporary failure". If the problem persists, check your network settings or your ISP, something's fishy. Or you could just download the files manually by visiting https://raw.githubusercontent.com/ageron/handson-ml/master/datasets/lifesat/gdp_per_capita.csv and https://raw.githubusercontent.com/ageron/handson-ml/master/datasets/lifesat/oecd_bli_2015.csv and selecting File > Save Page As...
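
To check name resolution on its own, separately from the download, something like the following sketch may help. It uses only the standard library; can_resolve() is a name made up for this example, and it only reports whether a lookup succeeds, not why it fails.

```python
import socket

def can_resolve(hostname):
    """Return True if hostname resolves to at least one address, False otherwise."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        # Raised when the lookup fails, e.g. "Temporary failure in name resolution"
        return False

# Example call (the result depends on your network):
# print(can_resolve("raw.githubusercontent.com"))
```

If this prints False in the environment where the notebook runs, the problem lies with the network or DNS setup there rather than with the download code.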
Hope this helps

rkuma107 commented on May 2, 2024

Hi ageron,
After following your first example, I got excited and tested a simple linear regression model to calculate the square of a number.

import numpy as np
import sklearn.linear_model

x = np.array([[1],[2], [3], [4], [5],[6], [7], [8], [9], [10], [11], [12], [13], [14], [15],[20],[25],[30]])
y = np.array([[1],[4], [9], [16], [25], [36], [49], [64], [81], [100], [121], [144], [169], [196], [225],[400],[625],[900]])
m = sklearn.linear_model.LinearRegression()
m.fit(x,y)
m.predict([[10]]) # array([[151.49643705]])
m.predict([[35]]) # array([[881.60332542]])

In such a simple case, why is the linear regression model not able to give the correct square for 10 and 35?
My apologies if I am not supposed to ask such a question here.

ageron commented on May 2, 2024

Hi @rkuma107 ,

A linear regression model assumes that the data you are trying to model is linear. In other words, it assumes that y = w1×x1 + w2×x2 + ... + wn×xn + b (plus some Gaussian noise), and it tries to find the coefficients w1 to wn and the bias term b. In your case there is a single input feature x1, so the model simplifies to: y = w1×x1 + b
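
For the single-feature case, w1 and b can also be computed by hand with the textbook least-squares formulas. The sketch below is a plain-Python illustration of what "finding w1 and b" means, not a description of how Scikit-Learn computes them internally; fit_line() and predict() are names made up for this example. It uses the x values from the question, with y = x².

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (w1, b) for y = w1*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # w1 = covariance(x, y) / variance(x); b makes the line pass through the means
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    w1 = covariance / variance
    b = mean_y - w1 * mean_x
    return w1, b

xs = list(range(1, 16)) + [20, 25, 30]  # same inputs as in the question
ys = [x ** 2 for x in xs]               # the targets are the squares
w1, b = fit_line(xs, ys)

def predict(x):
    return w1 * x + b

for x in (10, 35):
    print(f"x={x}: predicted {predict(x):.2f}, true square {x ** 2}")
```

The predictions come out at about 151.50 and 881.60, the same values as in the question: this is the best straight line through the points, but no choice of w1 and b can reproduce every square.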

However, in your example the data is not linear, it is quadratic, so the linear model makes inaccurate predictions. You can see this clearly in the following plot:

You can get this plot by running the following code in Jupyter (or Colab):

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import sklearn.linear_model

#x = np.array([[1],[2], [3], [4], [5],[6], [7], [8], [9], [10], [11], [12], [13], [14], [15],[20],[25],[30]])
#y = np.array([[1],[4], [9], [16], [25], [36], [49], [64], [81], [100], [121], [144], [169], [196], [225],[400],[625],[900]])
x = np.array([list(range(1, 15)) + list(range(15, 31, 5))]).reshape(-1, 1)
y = x ** 2
m = sklearn.linear_model.LinearRegression()
m.fit(x,y)
m.predict([[10]]) # array([[151.49643705]])
m.predict([[35]]) # array([[881.60332542]])

plt.plot(x, y, "o")
plt.xlabel("x", fontsize=16)
plt.ylabel("y", rotation=0, fontsize=16)
xs = np.linspace(0, 30, 100).reshape(-1, 1)
ys = m.predict(xs)
plt.plot(xs, ys)

Note that I defined x and y a bit differently: you can use range() to avoid typing long lists of integers, and you can run operations on NumPy arrays directly, for example y = x**2.
Hope this helps.

NameError: name 'prepare_country_stats' is not defined about handson-ml HOT 24 CLOSED
