Comments (24)

ageron avatar ageron commented on May 2, 2024 13

Hi everyone,

Apparently this missing code is causing some confusion; I'm sorry about that. It is only there to "whet your appetite", to give you a feel for what's coming next, not to be actually executed. But I understand that some readers might want to run it as is. If you really want to execute it, then here's a prepare_country_stats() function you can use:

import pandas as pd

def prepare_country_stats(oecd_bli, gdp_per_capita):
    # Keep only the rows whose INEQUALITY column is "TOT"
    oecd_bli = oecd_bli[oecd_bli["INEQUALITY"]=="TOT"]
    # One row per country, one column per indicator
    oecd_bli = oecd_bli.pivot(index="Country", columns="Indicator", values="Value")
    # Note: the next two calls modify gdp_per_capita in place
    gdp_per_capita.rename(columns={"2015": "GDP per capita"}, inplace=True)
    gdp_per_capita.set_index("Country", inplace=True)
    # Join the two tables on their country index
    full_country_stats = pd.merge(left=oecd_bli, right=gdp_per_capita,
                                  left_index=True, right_index=True)
    full_country_stats.sort_values(by="GDP per capita", inplace=True)
    # Positions (after sorting) of the rows to leave out
    remove_indices = [0, 1, 6, 8, 33, 34, 35]
    keep_indices = list(set(range(36)) - set(remove_indices))
    return full_country_stats[["GDP per capita", 'Life satisfaction']].iloc[keep_indices]

Just add this function at the beginning of the code, and run the program in the directory that contains the data files (oecd_bli_2015.csv and gdp_per_capita.csv) and you should be fine (except that you must add an import sklearn.linear_model, at least in recent versions of Scikit-Learn).
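
To see what each step of the function does without the real CSV files, here is a minimal sketch on toy data. The tiny DataFrames, their values, and the name prepare_toy_stats are made up for illustration. The row-dropping step (remove_indices) is left out because it only applies to the real 36-row table, and this variant avoids the in-place calls.

```python
import pandas as pd

# Toy stand-ins for the two CSV files (all values are made up).
oecd_bli = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "B"],
    "INEQUALITY": ["TOT", "MN", "TOT", "TOT", "MN", "TOT"],
    "Indicator": ["Life satisfaction", "Life satisfaction", "Employment rate"] * 2,
    "Value": [7.0, 6.9, 70.0, 5.5, 5.4, 60.0],
})
gdp_per_capita = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "2015": [50000.0, 20000.0, 10000.0],
})

def prepare_toy_stats(oecd_bli, gdp_per_capita):
    # Keep the "TOT" rows, then reshape to one row per country.
    totals = oecd_bli[oecd_bli["INEQUALITY"] == "TOT"]
    wide = totals.pivot(index="Country", columns="Indicator", values="Value")
    # Rename and index the GDP table without modifying the caller's DataFrame.
    gdp = gdp_per_capita.rename(columns={"2015": "GDP per capita"}).set_index("Country")
    # Inner join on the country index: country "C" has no OECD rows, so it is dropped.
    merged = pd.merge(left=wide, right=gdp, left_index=True, right_index=True)
    merged = merged.sort_values(by="GDP per capita")
    return merged[["GDP per capita", "Life satisfaction"]]

toy_stats = prepare_toy_stats(oecd_bli, gdp_per_capita)
print(toy_stats)
```

The result has one row per country present in both tables, sorted by GDP per capita, with just the two columns the regression needs.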

As you can see, it's a long and boring function that prepares the data to have a nice and clean matrix in the end. Just Pandas stuff, nothing special about it, and nothing interesting with regard to Machine Learning, which is why I didn't want to include it in the book. In general, I avoid including every single line of code in the book, for readability, to keep it short and focused on what matters most, but hopefully, from chapter 2 onwards, you should be able to follow along in the Jupyter notebook very easily.

In the latest release, I added a footnote saying "The code assumes that prepare_country_stats() is already defined: it merges the GDP and life satisfaction data into a single Pandas dataframe."
Perhaps that's not clear enough, though: I think I will change this to explicitly tell readers that if they want to run the code, they should do so in the Jupyter notebook which contains all the boring details (this is strongly suggested in the preface, but I know not everyone reads the preface, I certainly don't).

What do you think?

from handson-ml.

ageron avatar ageron commented on May 2, 2024 7

As @pprivulet pointed out (thanks!), the function is defined in the notebook. I left some code out of the book when there was really nothing interesting or machine learning specific to it. Things like plotting an image, etc. If you get stuck at any point, check out the corresponding notebook, and don't hesitate to ping me, I'll be happy to help.

Cheers

pprivulet avatar pprivulet commented on May 2, 2024 6

01_the_machine_learning_landscape.ipynb: "def prepare_country_stats(oecd_bli, gdp_per_capita):\n",

The function is defined in 01_the_machine_learning_landscape.ipynb
Good luck

ageron avatar ageron commented on May 2, 2024 4

I replaced the footnote with this: "The prepare_country_stats() function's definition is not shown here (see this chapter's Jupyter notebook if you want all the gory details). It's just boring Pandas code that joins the life satisfaction data from the OECD with the GDP per capita data from the IMF."

I also updated the notebook to make the example 1-1's code stand out at the beginning, and I added the prepare_country_stats() function from my previous comment.

Thanks everyone for your very useful feedback! Hopefully, the book will get better and better. :)

saravanakumarjsk avatar saravanakumarjsk commented on May 2, 2024 2

Thanks, that was very helpful

ashokrajv avatar ashokrajv commented on May 2, 2024 1

Thank you @ageron, I shall try this option.
Thanks,
Ashok

rkuma107 avatar rkuma107 commented on May 2, 2024 1

Thanks Ageron. I am learning ML for the first time, and a forum like this and your active participation are very helpful.

Jai-GAY avatar Jai-GAY commented on May 2, 2024

Yes, thanks. I need to run it from a Jupyter notebook.

ankursworld avatar ankursworld commented on May 2, 2024

Hi @ageron,
I tried to follow example 1-1 from your book, and also tried adding the function from the Jupyter notebook. However, it still causes errors.
I think it is a fair expectation to be able to run the code in the book without correcting it. Could you please see whether the code in the book can be updated to be self-sufficient? Alternatively, you could refer readers to the notebook and ask them to run only that, not the code in the book.
Thanks
Ankur

McCarthyORAL avatar McCarthyORAL commented on May 2, 2024

I hope this can be helpful
http://www.cnblogs.com/yaoz/p/6858417.html

ageron avatar ageron commented on May 2, 2024

Hi Richard,

The download part does not have to be in Python; it might be simpler to just write a cron job that uses wget or curl to download the file.
That said, there's some example code in the notebook for chapter 2: https://github.com/ageron/handson-ml/blob/master/02_end_to_end_machine_learning_project.ipynb

Hope this helps,
Aurélien

sor3765 avatar sor3765 commented on May 2, 2024

I tried example 1-1 by copying and pasting it, then added the function definition. I keep getting KeyError: 'Country' from line 12: gdp_per_capita.set_index("Country", inplace=True)

How can I fix this error? I tried it in both Jupyter Notebook and Visual Studio and got the same error.

ageron avatar ageron commented on May 2, 2024

Hi @sor3765 ,
Thanks for your question. Perhaps the problem comes from the data? Are you using oecd_bli_2015.csv and gdp_per_capita.csv which are available in the datasets/lifesat directory or did you try to download the latest data from the OECD and IMF websites?
Are you sure you did not modify the code in any way? Perhaps you should download it again, just to be sure?
If you copy/pasted the code, perhaps the indentation got modified?
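
Two other possible causes of KeyError: 'Country' can be checked quickly. These are not confirmed for this report. The first is reading a tab-separated file with the default comma delimiter, which leaves a single column whose name merely contains "Country". The second is calling the function twice on the same DataFrame: set_index(..., inplace=True) moves "Country" into the index, so the second call no longer finds that column. A minimal sketch, using a made-up two-row file held in memory:

```python
import io
import pandas as pd

# Hypothetical two-row file standing in for a tab-separated CSV.
tsv_text = "Country\t2015\nA\t1.0\nB\t2.0\n"

# Possible cause 1: wrong delimiter. With the default comma,
# the whole header ends up as one column name.
wrong = pd.read_csv(io.StringIO(tsv_text))
right = pd.read_csv(io.StringIO(tsv_text), delimiter="\t")
wrong_columns = list(wrong.columns)
right_columns = list(right.columns)
print(wrong_columns)
print(right_columns)

# Possible cause 2: in-place set_index applied twice to the same DataFrame.
right.set_index("Country", inplace=True)
try:
    right.set_index("Country", inplace=True)
    second_call_failed = False
except KeyError:
    # "Country" is now the index, not a column, so the lookup fails.
    second_call_failed = True
print(second_call_failed)
```

Printing list(gdp_per_capita.columns) just before the failing line shows which of the two situations applies, if either. Reloading the CSV before each call avoids the second one.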

ashokrajv avatar ashokrajv commented on May 2, 2024

I guess I may have downloaded the wrong file. It wasn't clear where exactly I could get the file, so I got it from someone's GitHub. I can try again tomorrow morning.

Hi sor3765,
I am facing the same issue too. I am running in a Kaggle kernel, and I downloaded the file from a public dataset:
https://www.kaggle.com/abhilashanil/better-life-index-and-gross-domestic-product/kernels
Any help? The complete error is in the attachment.
Thanks,
Ashok
KeyError_Country.txt

ageron avatar ageron commented on May 2, 2024

The files are available directly in this project, in the datasets/lifesat directory:
https://raw.githubusercontent.com/ageron/handson-ml/master/datasets/lifesat/gdp_per_capita.csv
https://raw.githubusercontent.com/ageron/handson-ml/master/datasets/lifesat/oecd_bli_2015.csv

ashokrajv avatar ashokrajv commented on May 2, 2024

Thank you, Aurélien Géron, for the quick reply.
I tried loading both files manually.
I was able to upload gdp_per_capita.csv, but not oecd_bli_2015.csv.
An existing file in a Kaggle dataset is blocking me, but that file doesn't belong to me. How can I handle this? Screenshot attached.
image

Appreciate your help!

Thanks,
Ashok

ageron avatar ageron commented on May 2, 2024

Hi @ashokrajv ,
I have never run into this issue, sorry. It seems that Kaggle wants to avoid data duplication, so they're asking you to reuse the file from the other dataset. Not sure how this is done in Kaggle, I recommend you ask Kaggle.
Alternatively, you can just update the notebook to download the files instead of using the ones in the project:

import os
from urllib.request import urlretrieve

URL = "https://raw.githubusercontent.com/ageron/handson-ml/master/datasets/lifesat/"
datapath = os.path.join("datasets", "lifesat", "")
os.makedirs(datapath, exist_ok=True)  # urlretrieve does not create missing directories
urlretrieve(URL + "gdp_per_capita.csv", datapath + "gdp_per_capita.csv")
urlretrieve(URL + "oecd_bli_2015.csv", datapath + "oecd_bli_2015.csv")

Then you can load them using pd.read_csv(), as shown in the notebook.

Hope this helps.

ashokrajv avatar ashokrajv commented on May 2, 2024

Hi Aurélien Géron,
I tried using the urlretrieve method as below.
I replaced the line below
#datapath = os.path.join("datasets", "lifesat", "")
with the code you gave to read directly from the URL:
from urllib.request import urlretrieve
URL = "https://raw.githubusercontent.com/ageron/handson-ml/master/datasets/lifesat/"
datapath = os.path.join("datasets", "lifesat", "")
urlretrieve(URL + "gdp_per_capita.csv", datapath + "gdp_per_capita.csv")
urlretrieve(URL + "oecd_bli_2015.csv", datapath + "oecd_bli_2015.csv")

I get an error in the urlretrieve function:
URLError: <urlopen error [Errno -3] Temporary failure in name resolution>
The full error is in the text attachment:
URLError.txt
Any suggestions, please?
Thanks,
Ashok

ageron avatar ageron commented on May 2, 2024

Wow, that's bad luck! You seem to have DNS issues. Name resolution means converting the domain name (raw.githubusercontent.com) to an IP address (151.101.8.133). You could try again: as the message says, it's probably a "temporary failure". If the problem persists, check your network settings or your ISP; something's fishy. Or you could just download the files manually by visiting https://raw.githubusercontent.com/ageron/handson-ml/master/datasets/lifesat/gdp_per_capita.csv and https://raw.githubusercontent.com/ageron/handson-ml/master/datasets/lifesat/oecd_bli_2015.csv and selecting File > Save Page As...
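
To check name resolution separately from the download, a small helper built on the standard library's socket module can be used. This is a minimal sketch; the helper name can_resolve is made up for illustration, and "localhost" is used here only so that the example runs without network access.

```python
import socket

def can_resolve(hostname):
    """Return True if the hostname resolves to an IP address, else False."""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        # Raised for address-related errors, including
        # "Temporary failure in name resolution".
        return False

# Replace "localhost" with "raw.githubusercontent.com" to test the real host.
print(can_resolve("localhost"))
```

If the helper returns False for raw.githubusercontent.com but True for localhost, the problem is in the environment's DNS or network access rather than in the notebook code.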
Hope this helps

rkuma107 avatar rkuma107 commented on May 2, 2024

Hi ageron,
After following your first example, I got excited and tested a simple linear regression model to calculate the square of a number.

import numpy as np
import sklearn.linear_model

x = np.array([[1],[2], [3], [4], [5],[6], [7], [8], [9], [10], [11], [12], [13], [14], [15],[20],[25],[30]])
y = np.array([[1],[4], [9], [16], [25], [36], [49], [64], [81], [100], [121], [144], [169], [196], [225],[400],[625],[900]])
m = sklearn.linear_model.LinearRegression()
m.fit(x,y)
m.predict([[10]]) # array([[151.49643705]])
m.predict([[35]]) # array([[881.60332542]])

In such a simple case, why is the linear regression model not able to give the correct square for 10 and 35?
My apologies if I am not supposed to ask such a question here.

ageron avatar ageron commented on May 2, 2024

Hi @rkuma107 ,

A linear regression model assumes that the data you are trying to model is linear. In other words, it assumes that y = w1×x1 + w2×x2 + ... + wn×xn + b (plus some Gaussian noise), and it tries to find the coefficients w1 to wn and the bias term b. In your case there is a single input feature x1, so the model simplifies to: y = w1×x1 + b
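
For the single-feature case, w1 and b have a well-known closed form (ordinary least squares), so the predictions quoted in the question can be reproduced with NumPy alone. A minimal sketch, using the x values from the question; the variable names are illustrative:

```python
import numpy as np

# Same inputs as in the question: 1..15, then 20, 25, 30.
x = np.array(list(range(1, 16)) + [20, 25, 30], dtype=float)
y = x ** 2

# Closed-form least-squares solution for y = w1 * x + b.
x_mean, y_mean = x.mean(), y.mean()
w1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b = y_mean - w1 * x_mean

pred_10 = w1 * 10 + b
pred_35 = w1 * 35 + b
print(w1, b)
print(pred_10, pred_35)  # approximately 151.496 and 881.603
```

The fitted line is the best straight line through the points, which is why it returns about 151.5 for x = 10 rather than 100.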

However, in your example the data is not linear, it is quadratic, so the linear model makes inaccurate predictions. You can see this clearly in the following plot:

[Image: the data points plotted together with the fitted regression line]

You can get this plot by running the following code in Jupyter (or Colab):

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import sklearn.linear_model

#x = np.array([[1],[2], [3], [4], [5],[6], [7], [8], [9], [10], [11], [12], [13], [14], [15],[20],[25],[30]])
#y = np.array([[1],[4], [9], [16], [25], [36], [49], [64], [81], [100], [121], [144], [169], [196], [225],[400],[625],[900]])
x = np.array([list(range(1, 15)) + list(range(15, 31, 5))]).reshape(-1, 1)
y = x ** 2
m = sklearn.linear_model.LinearRegression()
m.fit(x,y)
m.predict([[10]]) # array([[151.49643705]])
m.predict([[35]]) # array([[881.60332542]])

plt.plot(x, y, "o")
plt.xlabel("x", fontsize=16)
plt.ylabel("y", rotation=0, fontsize=16)
xs = np.linspace(0, 30, 100).reshape(-1, 1)
ys = m.predict(xs)
plt.plot(xs, ys)

Note that I defined x and y a bit differently: you can use range() to avoid typing long lists of integers, and you can run operations on NumPy arrays directly, for example y = x**2.
Hope this helps.
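
One common way to let a linear model fit this data is to add x² as an extra input feature, so that the model becomes y = w1×x + w2×x² + b, which is still linear in its coefficients. The following is a minimal sketch of that idea, assuming Scikit-Learn's PolynomialFeatures transformer; it is an illustration, not code from the book or the notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Same data as above: x = 1..15, 20, 25, 30 and y = x squared.
x = np.array(list(range(1, 16)) + [20, 25, 30]).reshape(-1, 1)
y = x ** 2

# Expand each input [x] into [x, x**2]; the bias is handled by LinearRegression.
poly = PolynomialFeatures(degree=2, include_bias=False)
x_poly = poly.fit_transform(x)

model = LinearRegression()
model.fit(x_poly, y)

# New inputs must go through the same transformation before predicting.
pred_10 = model.predict(poly.transform([[10]]))
pred_35 = model.predict(poly.transform([[35]]))
print(pred_10, pred_35)  # close to 100 and 1225
```

With the squared feature available, the model can represent the quadratic relationship exactly, so the predictions for 10 and 35 match their squares up to floating-point error.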
